The gold standard CNVs utilized in this study were a subset of those CNVs released by the 1000 Genomes Project from 8-fold coverage Illumina paired-end population-scale sequencing data (available at 1000genomes.org) and analysis of the genomes of 2,504 individuals [4 ]. CNV discovery was conducted using the following tools: Delly, VariationHunter, BreakDancer, CNVnator, GenomeStrip, Pindel, and SSF. CNVs were merged by taking into account the confidence intervals around estimated boundaries. The merged set was genotyped across the entire population by GenomeStrip and filtered to remove redundancy and low quality sites based on genotype information. Genotypes were then updated though the integrated imputation of all variants (CNVs, SNPs, indels, etc.) with the MNV tool. The resulting genotypes were estimated to have a very low 3.1% false positive rate. Array-based experimental validation of CNV calls and PCR-based validation of CNV breakpoints were carried out by different contributors to the 1000 Genomes Project. CNVs genotyped as existing in NA12878 and not within 1000 Genomes Project described regions of VDJ recombination were selected to comprise our NA12878 gold standard CNV call set [Additional file 5: Spreadsheet 1].
A gold standard set of CNVs was similarly generated for the genome of NA10851, the aCGH control genome used in this study. 876 CNVs are common to the genomes of both NA12878 and NA10851 and are therefore are not expected to be detectable by aCGH platforms in this study.
Additionally, we generated an unfiltered CNV call set for NA12878 using1000 Genome Project deep sequencing data (2×250 bp paired-end sequencing to 60× coverage on Illumina HiSeq available at ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/phase3/data/NA12878/high_coverage_alignment) and read-depth analysis by the CNVnator algorithm [14 (link)]. This set contained 4293 total CNV calls and 4047 autosomal CNV calls [Additional file 5: Spreadsheet 1]. Of these CNVnator calls, 609 total and 587 autosomal calls overlapped a gold standard call by 50% reciprocally. A further 137 autosomal CNVnator calls overlapped a gold standard call by less than 50% reciprocally.
Free full text: Click here