RNA-seq Data Processing and Variant Calling
DNA genotypes were called from human RNA-seq data with the SAMtools mpileup function, as done previously [21]. The RNA-seq-derived genotypes were then phased and imputed with Beagle version 5.1, which uses a probabilistic hidden Markov model that performs well for sequencing data with sparse genomic coverage [22]. We caution the reader that Beagle was originally developed for genome-wide DNA variant data, not RNA-sequencing data. Quality control (QC) criteria were: genotyping rate > 95%, minor allele frequency > 0.10, Hardy–Weinberg equilibrium p > 1e-6, > 5 reads per sample, Phred score > 20, and imputation score > 0.3. The input for imputation was 40,878 called genotypes that were common to all samples and passed initial QC. These variants were imputed against the full 1000 Genomes Phase III data, yielding 570,755 SNPs, of which 178,598 passed QC. These 178,598 variants were used for the polygenic score and sQTL analyses. Note that 91.9% of these SNPs were present in the AUD GWAS, but that GWAS has 77.9 times more SNPs than the current study. We therefore encourage caution in interpreting our polygenic score and sQTL analyses, given the limited number of individuals and SNPs used.
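The QC thresholds and SNP counts above can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the `VariantStats` class and its field names are hypothetical, it assumes all six statistics are available for each variant, and it reads "77.9 times more SNPs" as a simple ratio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantStats:
    """Per-variant summary statistics (hypothetical field names)."""
    genotyping_rate: float   # fraction of samples with a called genotype
    maf: float               # minor allele frequency
    hwe_p: float             # Hardy-Weinberg equilibrium test p-value
    min_reads: int           # lowest per-sample read depth at the site
    phred: float             # Phred-scaled quality score
    imputation_score: float  # quality score reported by the imputation tool


def passes_qc(v: VariantStats) -> bool:
    """Apply the thresholds stated in the text; every comparison is strict."""
    return (
        v.genotyping_rate > 0.95
        and v.maf > 0.10
        and v.hwe_p > 1e-6
        and v.min_reads > 5
        and v.phred > 20
        and v.imputation_score > 0.3
    )


# Counts reported in the text.
IMPUTED_SNPS = 570_755
PASSED_QC = 178_598
GWAS_FOLD = 77.9  # assumption: the GWAS SNP count is 77.9 x the study's count

pass_fraction = PASSED_QC / IMPUTED_SNPS   # roughly 0.31
implied_gwas_snps = GWAS_FOLD * PASSED_QC  # roughly 13.9 million
```

Under this reading, about 31% of the imputed SNPs survive QC, and the AUD GWAS would contain on the order of 13.9 million SNPs.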
Other organizations : Emory University
Variable analysis
- Removal of Illumina adapters
- Removal of poor-quality reads (reads < 36 bp long; leading or trailing bases with a Phred score < 3; a maximum of 2 mismatches allowed per read)
- RNA-seq data quality
- Paired-end read alignment
- Genotyping rate
- Minor allele frequency
- Hardy–Weinberg equilibrium
- Phred Score
- Imputation score
- Uniform pipeline for RNA-seq data processing
- Alignment of trimmed reads to the human hg19 genome or the rhesus macaque Mmul_10 genome
- Criteria for quality control (QC): genotyping rate > 95%, minor allele frequency > 0.10, Hardy–Weinberg equilibrium p > 1e-6, > 5 reads per sample, Phred score > 20, and imputation score > 0.3
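The read-trimming criteria in the list above can be illustrated with a minimal Python sketch. It is not the tool the authors used (the source does not name one). It assumes the Phred threshold of 3 applies to individual bases at the read ends and that reads shorter than 36 bp after trimming are discarded. Adapter removal and the 2-mismatch rule are not modeled, and the function name `trim_read` is hypothetical.

```python
from typing import List, Optional, Tuple

MIN_LENGTH = 36   # reads shorter than this are discarded
MIN_END_QUAL = 3  # leading/trailing bases below this Phred score are trimmed


def trim_read(seq: str, quals: List[int]) -> Optional[Tuple[str, List[int]]]:
    """Trim low-quality ends; return None if the read becomes too short."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")

    start, end = 0, len(seq)
    # Drop leading bases whose quality is below the threshold.
    while start < end and quals[start] < MIN_END_QUAL:
        start += 1
    # Drop trailing bases whose quality is below the threshold.
    while end > start and quals[end - 1] < MIN_END_QUAL:
        end -= 1

    if end - start < MIN_LENGTH:
        return None  # too short after trimming
    return seq[start:end], quals[start:end]
```

For example, a 40-base read with two low-quality bases at each end is kept as a 36-base read, while a read trimmed to 35 bases is dropped.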