Of our EAC cohort, 379 cases (69%) were derived from the esophageal adenocarcinoma WGS ICGC study, for which samples are collected through the UK-wide OCCAMS (Oesophageal Cancer Classification and Molecular Stratification) consortium. The procedures for obtaining the samples, quality control, extraction and whole-genome sequencing are as previously described17. Strict pathology consensus review was applied to these samples, with a 70% cellularity requirement before inclusion. Comprehensive clinical information was available for the ICGC-OCCAMS cases (
Supplementary Table 13). In addition, previously published samples from Dulak et al.19 (149 WES; 27%) and Nones et al.20 (22 WGS samples; 4%) were included in the analysis, to total 551 genome-characterized EACs. RNA-seq data were available for a subset of our ICGC WGS samples (116/379). BAM files for all samples (including those from Dulak et al.19 and Nones et al.20) were run through our alignment (BWA-MEM), mutation (Strelka), copy number (ASCAT) and structural variant (Manta) calling pipelines, as previously described17. Our methods were benchmarked against other available methods and have among the best sensitivity and specificity for variant calling (ICGC benchmarking exercise49,50). Cell lines were whole-genome sequenced at 30X coverage with 150 bp paired-end reads on an Illumina HiSeq4000. Copy number calling was performed with Freec as previously described41. Mutations were called with GATK as previously described41 and filtered to remove germline variants found in the 1000 Genomes Project; any known oncogenic hotspots32 were then recovered. Amplifications were defined as genes with a copy number at least 2x the median copy number of the host chromosome.
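The two cell-line post-processing rules described above (germline filtering with hotspot recovery, and the amplification threshold) can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names, the variant representation as `(chrom, pos, ref, alt)` tuples, and the toy coordinates and copy numbers are all hypothetical. It also assumes that the chromosome median is taken over the per-gene copy numbers supplied, which the text does not specify.

```python
from statistics import median


def filter_germline_keep_hotspots(calls, germline_sites, hotspot_sites):
    """Drop calls found in a germline panel, but retain known hotspots.

    All three arguments hold hashable variant keys, here assumed to be
    (chrom, pos, ref, alt) tuples. This is an illustrative reading of
    "filtered for germline variants ... hotspots were recovered".
    """
    kept = []
    for variant in calls:
        # A hotspot is kept even if it also appears in the germline panel.
        if variant in hotspot_sites or variant not in germline_sites:
            kept.append(variant)
    return kept


def call_amplified_genes(gene_copy_number, gene_chromosome, fold=2.0):
    """Return genes whose copy number is >= fold x their chromosome median.

    Assumption: the chromosome median is computed from the per-gene copy
    numbers passed in, not from segment-level data.
    """
    by_chrom = {}
    for gene, cn in gene_copy_number.items():
        by_chrom.setdefault(gene_chromosome[gene], []).append(cn)
    chrom_median = {chrom: median(vals) for chrom, vals in by_chrom.items()}
    return sorted(
        gene
        for gene, cn in gene_copy_number.items()
        if cn >= fold * chrom_median[gene_chromosome[gene]]
    )


# Toy data only; coordinates and copy numbers are made up for illustration.
toy_calls = [("chr17", 1000, "C", "T"), ("chr1", 100, "A", "G"), ("chr2", 200, "G", "A")]
toy_germline = {("chr17", 1000, "C", "T"), ("chr1", 100, "A", "G")}
toy_hotspots = {("chr17", 1000, "C", "T")}
filtered = filter_germline_keep_hotspots(toy_calls, toy_germline, toy_hotspots)

toy_cn = {"ERBB2": 12, "TP53": 2, "NF1": 3, "BRCA1": 3, "MYC": 5, "RAD21": 4, "PTK2": 4}
toy_chrom = {"ERBB2": "chr17", "TP53": "chr17", "NF1": "chr17", "BRCA1": "chr17",
             "MYC": "chr8", "RAD21": "chr8", "PTK2": "chr8"}
amplified = call_amplified_genes(toy_cn, toy_chrom)
```

In this toy input the chr17 median is 3, so only the gene with copy number 12 clears the 2x threshold, while the chr8 gene with copy number 5 does not reach 2x its chromosome median of 4.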
Total RNA was extracted using the AllPrep DNA/RNA kit (Qiagen), and its quality was checked on an Agilent 2100 Bioanalyzer using the RNA 6000 Nano kit (Agilent). The Qubit High Sensitivity RNA assay kit (Thermo Fisher) was used for quantification. Libraries were prepared from 250 ng RNA using the TruSeq Stranded Total RNA Library Prep Gold (Ribo-Zero) kit. Ribosomal RNA (nuclear, cytoplasmic and mitochondrial rRNA) was depleted using biotinylated probes that selectively bind rRNA molecules to form probe-rRNA hybrids. These hybrids were pulled down using magnetic beads, and the rRNA-depleted total RNA was reverse transcribed. The libraries were prepared according to the Illumina protocol51. Paired-end 75-bp sequencing was performed on a HiSeq4000. For normal expression controls, we chose gastric cardia tissue, from which some hypothesize Barrett’s esophagus may arise, and duodenum, which contains intestinal histology, including goblet cells, that mimics that of Barrett’s esophagus. We did not use Barrett’s esophagus tissue itself as a normal control, given the heterogeneous and plentiful phenotypic and genomic changes that it undergoes early in its pathogenesis.
Frankell A.M., Jammula S., Li X., Contino G., Killcoyne S., Abbas S., Perner J., Bower L., Devonshire G., Ococks E., Grehan N., Mok J., O’Donovan M., MacRae S., Eldridge M.D., Tavaré S., & Fitzgerald R.C. (2019). The landscape of selection in 551 esophageal adenocarcinomas defines genomic biomarkers for the clinic. Nature Genetics, 51(3), 506-516.