RNA was isolated from rectal biopsies obtained during diagnostic colonoscopy using the Qiagen AllPrep RNA/DNA Mini Kit. PolyA-RNA selection, fragmentation, cDNA synthesis, adaptor ligation, TruSeq RNA sample library preparation (Illumina, San Diego, CA), and paired-end 75 bp sequencing were performed [9]. An additional validation of the baseline rectal gene expression at diagnosis used the independent RISK cohort of treatment-naïve pediatric patients (55 non-IBD controls, 43 UC patients, and 92 CD patients with rectal inflammation) and single-end 75 bp mRNA sequencing [9]. Reads were quantified by kallisto [51], using Gencode v24 as the reference genome and transcripts per million (TPM) as the output. We included in our downstream analysis the 14,085 protein-coding genes with TPM above 1 in 20% of the samples. Only samples for which the gender determined from gene expression (Y-encoded genes and XIST) matched the clinically reported gender were included in the analyses; only one sample was excluded because of a gender mismatch. Four other PROTECT samples were excluded due to poor read quality. A total of 226 RNAseq samples with a mean read depth of ~47 M (standard deviation 14 M) were stratified into clinical subgroups, Ctl (n = 20) and UC (n = 206), and were further substratified by disease severity and by histologic findings. Differentially expressed genes were determined in GeneSpring® software using a fold change (FC) ≥ 1.5 and the Benjamini–Hochberg false discovery rate correction (FDR < 0.001) for all analyses, except for the corticosteroid response genes, which were identified among the 712 severity genes with FDR < 0.05. Unsupervised hierarchical clustering using the Euclidean distance metric and Ward’s linkage rule was used to test for groups of rectal biopsies with similar patterns of gene expression.
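The expression filter and the differential expression thresholds described above can be illustrated with a minimal Python sketch. It is not the GeneSpring implementation. The function names are hypothetical, the filter assumes that "in 20% of the samples" means at least 20%, and the fold change is assumed to be a ratio of group means that is treated symmetrically for up- and down-regulation. P-values are taken as given, since the underlying statistical test is not specified here.

```python
def filter_expressed_genes(tpm, min_tpm=1.0, min_fraction=0.20):
    """Keep genes whose TPM exceeds min_tpm in at least min_fraction of samples.

    tpm maps a gene name to a list of TPM values, one per sample.
    The "at least" reading of the 20% rule is an assumption.
    """
    kept = []
    for gene, values in tpm.items():
        n_above = sum(1 for v in values if v > min_tpm)
        if n_above >= min_fraction * len(values):
            kept.append(gene)
    return kept


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def passes_thresholds(fold_change, adj_p, min_fc=1.5, max_fdr=0.001):
    """Apply the FC and FDR cut-offs to one gene.

    fold_change is assumed to be a positive ratio of group means; a value
    of 0.5 therefore counts as a 2-fold decrease.
    """
    magnitude = max(fold_change, 1.0 / fold_change)
    return magnitude >= min_fc and adj_p <= max_fdr
```

For the corticosteroid response genes, the same helper would be called with `max_fdr=0.05` on p-values adjusted within the 712 severity genes only.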
ToppGene [6] and ToppCluster [7] were used for functional annotation enrichment analyses of immune cell types, pathways, phenotypes, and biologic functions. The network was visualized using Cytoscape v3.0.2 [52].
For validation of the association between baseline gene expression and outcome, we also generated independent Lexogen QuantSeq 3′ mRNAseq libraries [19] with single-end 100 bp sequencing for 134 participants who also had Illumina mRNAseq data (the Discovery cohort) and for 50 participants who did not have Illumina mRNAseq data (the independent Validation cohort, Table 1). Principal component analysis (PCA) was performed to summarize variation in gene expression between patients, and principal component (PC) values were extracted for downstream analyses. For model building and for associations with the microbial composition, as described below, we considered PC1 of several central gene expression pathways previously identified by the differential expression and functional annotation enrichment analyses of the core 5296 UC genes, the 712 severity genes, and the 115-gene corticosteroid response signature. The PROTECT (GSE109142) and RISK (GSE117993) rectal mRNAseq data sets were deposited in GEO.
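As an illustration of how a per-patient PC1 value can be extracted for a gene set, the following sketch computes the first principal component of a samples-by-genes matrix with NumPy. It is an assumption-laden example and not the software used in the study: the function name is hypothetical, the input is assumed to be already restricted to one gene signature and suitably transformed (for example log-scaled), and genes are centered but not scaled.

```python
import numpy as np


def first_principal_component(expr):
    """Return PC1 scores and the fraction of variance PC1 explains.

    expr is a samples x genes array for one gene set (hypothetical input;
    any transformation such as log-scaling is assumed to be done already).
    """
    x = np.asarray(expr, dtype=float)
    centered = x - x.mean(axis=0)  # center each gene across samples
    # SVD of the centered matrix; rows of vt are the principal axes.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, 0] * s[0]  # projection of each sample onto PC1
    explained = s[0] ** 2 / np.sum(s ** 2)
    return scores, explained
```

The sign of a principal component is arbitrary, so PC1 scores may need to be oriented (for example, so that higher values correspond to higher mean expression of the signature) before they are compared across cohorts.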