CaVEMan (Cancer Variants Through Expectation Maximization:
http://cancerit.github.io/CaVEMan/) was used for calling somatic substitutions.
Indels in the tumor and normal genomes were called using a modified Pindel version 2.0 (http://cancerit.github.io/cgpPindel/) on the NCBI37 genome build (ref. 49).
Structural variants were discovered using a bespoke algorithm, BRASS (BReakpoint AnalySiS; https://github.com/cancerit/BRASS), through discordantly mapping paired-end reads. Next, discordantly mapping read pairs that were likely to span breakpoints, as well as a selection of nearby properly paired reads, were grouped for each region of interest. Using the Velvet de novo assembler (ref. 50), reads were locally assembled within each of these regions to produce a contiguous consensus sequence of each region. Rearrangements, represented by reads from the rearranged derivative as well as the corresponding non-rearranged allele, were instantly recognisable from a particular pattern of five vertices in the de Bruijn graph component of Velvet (a de Bruijn graph is a mathematical structure used in de novo assembly of short read sequences). Exact coordinates and features of the junction sequence (e.g. microhomology or non-templated sequence) were derived from this after aligning the assembled sequences to the reference genome as though they were split reads.
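As a rough illustration of the de Bruijn graph idea mentioned above, the following minimal Python sketch builds a graph from short reads and reports the vertices at which two alleles diverge. It is not the BRASS or Velvet implementation and does not reproduce the five-vertex pattern described in the text; the function names `build_de_bruijn` and `branch_nodes`, the choice of k, and the toy reads are all assumptions made for illustration.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph mapping each (k-1)-mer to its successor (k-1)-mers."""
    graph = defaultdict(set)
    for read in reads:
        # Every k-mer in the read contributes one edge: prefix -> suffix.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def branch_nodes(graph):
    """Return vertices with more than one successor, i.e. points where paths diverge."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)


# Toy reads invented for illustration: both share the prefix ACGTT and then
# diverge, mimicking a non-rearranged and a rearranged allele.
reads = ["ACGTTGCA", "ACGTTCCG"]
graph = build_de_bruijn(reads, k=4)
divergence_points = branch_nodes(graph)
```

In this toy case the two reads share a path through the graph until a single vertex, after which they follow separate branches. A real assembler additionally handles sequencing errors, reverse complements and coverage, none of which is modelled here.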
See Supplementary Table 3 for a summary of somatic variants. Annotation was according to ENSEMBL version 58.
Single nucleotide polymorphism (SNP) array hybridization using the Affymetrix SNP6.0 platform was performed according to Affymetrix protocols. Allele-specific copy number analysis of tumors was performed using ASCAT (v2.1.1) to generate integral allele-specific copy number profiles for the tumor cells (ref. 51; Supplementary Tables 4 and 5). ASCAT was also applied to NGS data directly, with highly comparable results.
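To make the notion of an allele-specific copy number profile more concrete, the sketch below uses a simple two-population mixture model (tumor cells plus admixed normal cells) to relate copy numbers to the two signals measured at a SNP, the log ratio and the B-allele frequency, and then inverts that relationship. The model is an assumption similar in spirit to allele-specific copy number methods, not the ASCAT code itself. The parameter names `rho` (tumor cell fraction), `psi` (average tumor ploidy) and `gamma` (platform-dependent scaling) are illustrative and are taken as given here, whereas a real tool has to estimate them from the data.

```python
import math


def expected_signals(n_a, n_b, rho, psi, gamma=1.0):
    """Expected log ratio and B-allele frequency for copy numbers n_a and n_b."""
    # Normal cells contribute one copy of each allele, tumor cells n_a and n_b.
    total = 2 * (1 - rho) + rho * (n_a + n_b)
    average = 2 * (1 - rho) + rho * psi
    log_r = gamma * math.log2(total / average)
    baf = (1 - rho + rho * n_b) / total
    return log_r, baf


def copy_numbers(log_r, baf, rho, psi, gamma=1.0):
    """Invert the mixture model to recover (non-rounded) allele-specific copy numbers."""
    average = 2 * (1 - rho) + rho * psi
    total = average * 2 ** (log_r / gamma)
    n_a = ((1 - baf) * total - (1 - rho)) / rho
    n_b = (baf * total - (1 - rho)) / rho
    return n_a, n_b


# Hypothetical segment: 2 copies of allele A, 1 copy of allele B,
# 70% tumor cells, average tumor ploidy 2.5 (all values invented).
log_r, baf = expected_signals(2, 1, rho=0.7, psi=2.5)
n_a, n_b = copy_numbers(log_r, baf, rho=0.7, psi=2.5)
integral_profile = (round(n_a), round(n_b))
```

Rounding the recovered values to the nearest integers gives the integral profile for the segment. With noisy real data the recovered values are not exactly integral, which is why the fit of `rho` and `psi` matters.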
Of the breast cancers, 12.5% were sampled for validation of substitutions, indels and/or rearrangements to assess the positive predictive value of mutation calling (Supplementary Table 6).
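For reference, the positive predictive value of a caller is the fraction of tested calls that are confirmed by validation. A minimal sketch follows; the function name and the counts are hypothetical and are not taken from Supplementary Table 6.

```python
def positive_predictive_value(confirmed, not_confirmed):
    """Fraction of tested calls that validation confirmed as true positives."""
    tested = confirmed + not_confirmed
    if tested == 0:
        raise ValueError("no calls were tested")
    return confirmed / tested


# Hypothetical counts for illustration only: 95 calls confirmed, 5 not confirmed.
ppv = positive_predictive_value(confirmed=95, not_confirmed=5)
```

Computing the value separately for substitutions, indels and rearrangements keeps a weakness in one call type from being masked by the others.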
Further details of these processing steps, as well as processing of transcriptomic and miRNA data (Supplementary Tables 7 and 8), can be found in the Supplementary Methods.
Nik-Zainal S., Davies H., Staaf J., Ramakrishna M., Glodzik D., Zou X., Martincorena I., Alexandrov L.B., Martin S., Wedge D.C., Van Loo P., Ju Y.S., Smid M., Brinkman A.B., Morganella S., Aure M.R., Lingjærde O.C., Langerød A., Ringnér M., Ahn S.M., Boyault S., Brock J.E., Broeks A., Butler A., Desmedt C., Dirix L., Dronov S., Fatima A., Foekens J.A., Gerstung M., Hooijer G.K., Jang S.J., Jones D.R., Kim H.Y., King T.A., Krishnamurthy S., Lee H.J., Lee J.Y., Li Y., McLaren S., Menzies A., Mustonen V., O’Meara S., Pauporté I., Pivot X., Purdie C.A., Raine K., Ramakrishnan K., Rodríguez-González F.G., Romieu G., Sieuwerts A.M., Simpson P.T., Shepherd R., Stebbings L., Stefansson O.A., Teague J., Tommasi S., Treilleux I., Van den Eynden G.G., Vermeulen P., Vincent-Salomon A., Yates L., Caldas C., van’t Veer L., Tutt A., Knappskog S., Tan B.K., Jonkers J., Borg Å., Ueno N.T., Sotiriou C., Viari A., Futreal P.A., Campbell P.J., Span P.N., Van Laere S., Lakhani S.R., Eyfjord J.E., Thompson A.M., Birney E., Stunnenberg H.G., van de Vijver M.J., Martens J.W., Børresen-Dale A.L., Richardson A.L., Kong G., Thomas G., & Stratton M.R. (2016). Landscape of somatic mutations in 560 breast cancer whole genome sequences. Nature, 534(7605), 47-54.