RNA-seq Data Processing and Variant Calling

RNA-seq data were processed using a uniform pipeline. First, we assessed RNA-seq data quality using FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). We then removed Illumina adapters and low-quality sequence using Trimmomatic (version 0.39)^{19}: reads shorter than 36 bp were discarded, leading and trailing bases with a Phred score below 3 were trimmed, and a maximum of 2 mismatches per read was allowed. Next, we aligned trimmed reads to either the human hg19 genome or the Rhesus Macaque mmul_10 genome using STAR version 2.5.3a^{20}. We followed the guidelines outlined by leafcutter (https://davidaknowles.github.io/leafcutter) to align RNA-seq reads and prepare data for differential splicing analyses. RNA-seq read alignment yielded a mean of 78,955,738 paired-end reads in humans (s.d. = 29,804,777; mean alignment rate = 86.16%; mean read size = 188.36) and a mean of 34,551,920 paired-end reads in macaques (s.d. = 8,202,258; mean alignment rate = 79.71%; mean read size = 127.59).
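As an illustration of the trimming step described above, the following minimal Python sketch assembles a Trimmomatic paired-end command line. It is not the authors' script. The step names (`ILLUMINACLIP`, `LEADING`, `TRAILING`, `MINLEN`) are standard Trimmomatic steps, and the values 3, 3, and 36 come from the text. Everything else is an assumption made for illustration: the adapter file name, the jar file name, the output file naming, the palindrome and simple clip thresholds (30 and 10), and the reading of "a maximum of 2 mismatches" as the `ILLUMINACLIP` seed-mismatch setting.

```python
import shlex
from typing import List


def trimmomatic_pe_command(
    r1: str,
    r2: str,
    out_prefix: str,
    adapters: str = "TruSeq3-PE.fa",      # assumed adapter file, not stated in the text
    jar: str = "trimmomatic-0.39.jar",    # assumed jar name for version 0.39
) -> List[str]:
    """Build (but do not run) a Trimmomatic paired-end command."""
    return [
        "java", "-jar", jar, "PE",
        r1, r2,
        # Trimmomatic PE writes paired and unpaired outputs for each mate.
        f"{out_prefix}_R1.paired.fq.gz", f"{out_prefix}_R1.unpaired.fq.gz",
        f"{out_prefix}_R2.paired.fq.gz", f"{out_prefix}_R2.unpaired.fq.gz",
        # 2 = seed mismatches (assumed mapping of "2 mismatches per read");
        # 30 and 10 are assumed clip thresholds, not taken from the text.
        f"ILLUMINACLIP:{adapters}:2:30:10",
        "LEADING:3",    # trim leading bases with Phred score below 3
        "TRAILING:3",   # trim trailing bases with Phred score below 3
        "MINLEN:36",    # discard reads shorter than 36 bp after trimming
    ]


if __name__ == "__main__":
    cmd = trimmomatic_pe_command("sample_1.fq.gz", "sample_2.fq.gz", "sample")
    print(shlex.join(cmd))
```

Building the command as a list keeps it safe to pass to `subprocess.run` without shell quoting issues; the hypothetical file names should be replaced with real paths before use.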
DNA genotypes from human RNA-seq data were ascertained via the SAMtools mpileup function, as done previously^{21}. Human genotypes derived from RNA-seq data were phased and imputed with Beagle version 5.1, which uses a probabilistic hidden Markov model that performs well for sequencing data with sparse genomic coverage^{22}. We caution the reader that Beagle was originally developed for genome-wide DNA variant data, not RNA-sequencing data. Our quality control (QC) criteria were: genotyping rate > 95%, minor allele frequency > 0.10, Hardy–Weinberg equilibrium p > 1e-6, > 5 reads per sample, Phred score > 20, and imputation score > 0.3. The input for imputation was 40,878 called genotypes that were common to all samples and passed initial QC. These variants were imputed to the full 1000 Genomes Phase III data, which resulted in 570,755 SNPs, 178,598 of which passed QC. These 178,598 variants were used for polygenic score and sQTL analyses. Note that 91.9% of these SNPs were present in the alcohol use disorder (AUD) GWAS, but that GWAS included 77.9 times more SNPs than the current study. Thus, we encourage the reader to use caution in interpreting our polygenic score and sQTL analyses, given the limited number of individuals and SNPs used.
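The QC thresholds listed above can be expressed as a single per-variant check. The sketch below is a hedged illustration only: the `VariantMetrics` class, its field names, and the choice to apply all thresholds in one pass are assumptions (the text describes an initial QC before imputation and an imputation-score filter afterwards). It also computes the share of imputed SNPs that passed QC from the two counts reported in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantMetrics:
    """Hypothetical per-variant summary; field names are illustrative."""
    call_rate: float         # genotyping rate across samples (0 to 1)
    maf: float               # minor allele frequency
    hwe_p: float             # Hardy-Weinberg equilibrium test p-value
    min_depth: int           # assumed: lowest per-sample read depth
    phred: float             # variant quality (Phred scale)
    imputation_score: float  # imputation quality score


def passes_qc(v: VariantMetrics) -> bool:
    """Apply the thresholds from the text; all comparisons are strict."""
    return (
        v.call_rate > 0.95
        and v.maf > 0.10
        and v.hwe_p > 1e-6
        and v.min_depth > 5
        and v.phred > 20
        and v.imputation_score > 0.3
    )


# Counts reported in the text: imputed SNPs and those passing QC.
IMPUTED_SNPS = 570_755
PASSING_SNPS = 178_598
PASS_FRACTION = PASSING_SNPS / IMPUTED_SNPS  # roughly 31% of imputed SNPs

if __name__ == "__main__":
    example = VariantMetrics(0.99, 0.25, 0.4, 12, 35.0, 0.8)
    print(passes_qc(example), f"{PASS_FRACTION:.1%}")
```

Because every comparison is strict, a variant sitting exactly on a threshold (for example, a minor allele frequency of exactly 0.10) is excluded, which matches the ">" notation used in the text.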

Huggett S.B., Ikeda A.S., Yuan Q., Benca-Bachman C.E., & Palmer R.H. (2023). Genome- and transcriptome-wide splicing associations with alcohol use disorder. Scientific Reports, 13, 3950.

Publication 2023

Genomes Genotypes Gwas Human Human genome Primates Rhesus macaque Rna seq Snps

Other organizations: Emory University

Variable analysis

independent variables

Removal of Illumina adapters
Removal of poor-quality reads (reads < 36 bp long, leading or trailing bases with a Phred score < 3, allowing a maximum of 2 mismatches per read)

dependent variables

RNA-seq data quality
Paired-end read alignment
Genotyping rate
Minor allele frequency
Hardy–Weinberg equilibrium
Phred Score
Imputation score

control variables

Uniform pipeline for RNA-seq data processing
Alignment of trimmed reads to the human hg19 genome or the Rhesus Macaque mmul_10 genome
Criteria for quality control (QC): genotyping rate > 95%, minor allele frequency > 0.10, Hardy–Weinberg equilibrium p > 1e-6, > 5 reads per sample, Phred score > 20, and imputation score > 0.3

positive controls

None specified

negative controls

None specified
