Reassembled read data was generated for each of the 9.8 million variants in ExAC v0.3.1 by running GATK HaplotypeCaller 3.1 (full version: v3.1-1-ga70dc6e) with the -bamout flag on each sample containing the particular variant (up to a limit of five homozygous and five heterozygous samples). Only samples with a read depth (DP) ≥10 and genotype quality (GQ) ≥20 were included. When a variant was present in more than five such samples, the five samples with the highest GQ were selected. Overall, HaplotypeCaller was run 22.3 million times to produce over 5 Tb of small BAM files, each storing reassembled reads for a window of several hundred base pairs around the variant. Batches of several thousand of these small BAM files were then combined into larger BAM files to improve compression ratios, with read groups used to keep track of the original source of each read. The final dataset comprised ∼23 000 BAMs and spanned 540 Gb. These BAM files were made directly available over the web and visualized in the ExAC browser using IGV.js.
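The sample selection rule described above can be illustrated with a minimal Python sketch. It assumes that the five-sample limit is applied separately to heterozygous and homozygous carriers, and that ties in GQ are broken by sample ID; the function name and the input tuple layout are hypothetical and are not taken from the ExAC pipeline.

```python
from collections import defaultdict

MIN_DP = 10        # minimum read depth, as stated in the text
MIN_GQ = 20        # minimum genotype quality, as stated in the text
MAX_PER_CLASS = 5  # at most five samples per genotype class

def select_samples(calls):
    """Pick the samples to reassemble for one variant.

    `calls` is an iterable of (sample_id, genotype_class, dp, gq) tuples,
    where genotype_class is "het" or "hom" (hypothetical layout).
    Returns a dict mapping genotype class to a list of sample IDs.
    """
    by_class = defaultdict(list)
    for sample_id, genotype_class, dp, gq in calls:
        # Keep only calls that pass both quality thresholds.
        if dp >= MIN_DP and gq >= MIN_GQ:
            by_class[genotype_class].append((gq, sample_id))

    selected = {}
    for genotype_class, candidates in by_class.items():
        # Highest GQ first; ties broken by sample ID (an assumption).
        candidates.sort(key=lambda c: (-c[0], c[1]))
        selected[genotype_class] = [sid for _, sid in candidates[:MAX_PER_CLASS]]
    return selected
```

For a variant with, say, seven qualifying heterozygous carriers, this sketch would keep only the five with the highest GQ and leave homozygous carriers to be ranked independently.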
Besides the -bamout flag, the following additional flags were passed to HaplotypeCaller to ensure that gVCF genotype calls matched the original ExAC gVCF genotypes; they are listed here to facilitate reproducibility:
-ERC GVCF
--paddingAroundSNPs 300
--paddingAroundIndels 300
--max_alternate_alleles 3
-A DepthPerSampleHC
-A StrandBiasBySample
--maxNumHaplotypesInPopulation 200
-stand_call_conf 30.0
-stand_emit_conf 30.0
--disable_auto_index_creation_and_locking_when_reading_rods
--minPruning 3
--variant_index_type LINEAR
--variant_index_parameter 128000
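As an illustration only, a single HaplotypeCaller invocation with these flags might be assembled as in the Python sketch below. The function name, the file paths and the interval passed to -L are hypothetical, and the double-dash spelling of the long flags is assumed; only the flag names and values come from the list above, combined with the standard GATK 3 arguments -T, -R, -I, -L, -o and -bamout.

```python
def build_haplotypecaller_command(gatk_jar, reference, input_bam,
                                  interval, out_gvcf, out_bam):
    """Return the argument list for one HaplotypeCaller run (sketch).

    All paths and the interval string are supplied by the caller and
    are hypothetical; the source does not state how they were chosen.
    """
    return [
        "java", "-jar", gatk_jar,
        "-T", "HaplotypeCaller",
        "-R", reference,
        "-I", input_bam,
        "-L", interval,            # region around the variant
        "-o", out_gvcf,
        "-bamout", out_bam,        # write the reassembled reads
        # Flags reported in the text:
        "-ERC", "GVCF",
        "--paddingAroundSNPs", "300",
        "--paddingAroundIndels", "300",
        "--max_alternate_alleles", "3",
        "-A", "DepthPerSampleHC",
        "-A", "StrandBiasBySample",
        "--maxNumHaplotypesInPopulation", "200",
        "-stand_call_conf", "30.0",
        "-stand_emit_conf", "30.0",
        "--disable_auto_index_creation_and_locking_when_reading_rods",
        "--minPruning", "3",
        "--variant_index_type", "LINEAR",
        "--variant_index_parameter", "128000",
    ]
```

The returned list could be passed to `subprocess.run` once per selected sample and variant; building a list rather than a shell string avoids quoting problems with file paths.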
CNVs were called using XHMM (15) based on GENCODE v19 coding regions; all details of CNV calling and quality control have been published previously (16). Gene summary CNV counts and related constraint scores are presented based on likelihoods of the CNV occurring within the genomic range of the gene, as described (16). Exon CNV counts and CNVs presented in the UCSC browser are based on all confidently called CNVs (XHMM SQ > 60) across the genome. All overlapping CNVs, regardless of the amount of overlap, are included in Exon CNV counts.
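The exon-level counting rule can be sketched in Python as follows. This is a minimal illustration that assumes closed, 1-based coordinates and simple tuples for exons and CNVs; the function names and data layout are hypothetical and do not reproduce the ExAC implementation.

```python
SQ_THRESHOLD = 60  # CNVs must have XHMM SQ strictly greater than this

def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end

def count_exon_cnvs(exons, cnvs):
    """Count confidently called CNVs overlapping each exon.

    `exons` is a list of (exon_id, chrom, start, end) tuples and
    `cnvs` is a list of (chrom, start, end, sq) tuples (hypothetical
    layout). Any overlap, however small, counts toward the exon.
    """
    counts = {}
    for exon_id, chrom, start, end in exons:
        n = 0
        for c_chrom, c_start, c_end, sq in cnvs:
            if (sq > SQ_THRESHOLD and c_chrom == chrom
                    and overlaps(start, end, c_start, c_end)):
                n += 1
        counts[exon_id] = n
    return counts
```

The nested loop is quadratic and is meant only to make the rule explicit; at genome scale an interval tree or a sorted sweep would be the more practical choice.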