Variants were annotated using ANNOVAR18 (version August 2013) with a GRCh37/hg19 gene annotation database. Upstream/downstream status was assigned to variants that mapped ≤1 kb from the transcript start/end. Variants without an intergenic annotation were assigned a genic annotation status (42%). Supplementary Table 8 shows the annotation status of the 9.4M variants included in the CAD additive meta-analysis; 86% of the genic variants map to introns.
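As an illustration of the ≤1 kb upstream/downstream rule, the following minimal Python sketch classifies a single variant position against a single transcript. It is not ANNOVAR's algorithm, which handles multiple transcripts and annotation precedence; the function names, the label strings and the strand handling are assumptions made for this example only.

```python
def classify_variant(pos, tx_start, tx_end, strand="+", flank=1000):
    """Simplified, hypothetical classifier for one variant and one transcript.

    pos, tx_start and tx_end are coordinates on the same chromosome,
    with tx_start <= tx_end. flank is the upstream/downstream window in bp.
    """
    if tx_start <= pos <= tx_end:
        return "within_transcript"
    if tx_start - flank <= pos < tx_start:
        # Left of the transcript: upstream on '+', downstream on '-'.
        return "upstream" if strand == "+" else "downstream"
    if tx_end < pos <= tx_end + flank:
        # Right of the transcript: downstream on '+', upstream on '-'.
        return "downstream" if strand == "+" else "upstream"
    return "intergenic"


def is_genic(label):
    """Everything that is not intergenic counts as genic, as in the text."""
    return label != "intergenic"
```

Under this sketch, a variant 500 bp to the left of a plus-strand transcript is labelled upstream and therefore counts as genic, whereas one 1,001 bp away is labelled intergenic.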
ENCODE features were downloaded from the Ensembl database using the Funcgen Perl API module release 75. The list of the ENCODE experiments stored in the Ensembl database can be browsed at http://Feb2014.archive.ensembl.org/Homo_sapiens/Experiment/Sources?db=core;ex=project-ENCODE-. This set covered 100 different types of functional evidence in 11 different cell types, a total of 379 ENCODE experiments comprising 6,099,034 features. Variants that overlapped one or more of these features were cross-tabulated with their ANNOVAR annotation status (Supplementary Table 10); 50% of variants mapped to one or more ENCODE features, and variants in ENCODE features were strongly enriched for genic annotation status.
Variants were grouped into three functional sets: histone/chromatin modifications (HM), DNase I hypersensitive sites (DHS) and transcription factor binding sites (TFBS) (Supplementary Table 9). Cell types were grouped into CAD-relevant and others (Supplementary Table 12) based on their potential roles in CAD pathophysiology; hepatocytes (e.g. lipid metabolism80), vascular endothelial cells (atherosclerosis81) and myoblasts (injury/repair82) were selected as being most relevant to the CAD phenotype. Multi-way contingency tables reporting ENCODE feature and ANNOVAR annotation status with inclusion in the FDR < 5% variant list (FDR202 status) are summarized for the 11 ENCODE cell types in Supplementary Table 11 and for the three CAD-relevant cell types in Supplementary Table 13. Contingency table counts were modelled by multiple logistic regression predicting FDR202 status from the explanatory variables HM, DHS, TFBS and genic/intergenic status.
The ENCODE83 project has previously mapped 4,492 GWAS-significant SNPs from the NHGRI (June 2011) catalogue74 to TF (12%) and DHS (34%) features in an extended dataset of 1,640 experiments. The 202 FDR variants were slightly less prevalent in these feature groups (10.4% TF and 19.8% DHS), which could reflect a CAD-specific issue or a more general consequence of our analysis being based on a subset of the ENCODE data retrieved from the Ensembl database.
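The logistic regression on contingency table counts can be illustrated with a minimal pure-Python sketch. It assumes each table cell is given as a tuple of 0/1 indicators (HM, DHS, TFBS, genic), the number of FDR202 variants in the cell and the total number of variants in the cell, and it fits a main-effects model by Newton-Raphson. The function names and the synthetic counts are hypothetical and chosen for illustration; this is not the software or the data used in the analysis.

```python
import math
from itertools import product


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_grouped_logistic(cells, n_iter=50, tol=1e-10):
    """Fit a main-effects logistic model to grouped (contingency table) counts.

    cells: list of (x, hits, total), where x is a tuple of 0/1 covariates.
    Returns [intercept, coef_1, ..., coef_k] on the log-odds scale.
    """
    k = len(cells[0][0]) + 1
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, hits, total in cells:
            row = (1.0,) + tuple(float(v) for v in x)
            eta = sum(b * v for b, v in zip(beta, row))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = total * p * (1.0 - p)  # binomial variance weight for the cell
            for i in range(k):
                grad[i] += row[i] * (hits - total * p)
                for j in range(k):
                    hess[i][j] += w * row[i] * row[j]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def synthetic_cells(true_beta, total=1000.0):
    """Build synthetic cells whose counts follow true_beta exactly (not real data)."""
    cells = []
    for x in product((0, 1), repeat=len(true_beta) - 1):
        eta = true_beta[0] + sum(b * v for b, v in zip(true_beta[1:], x))
        p = 1.0 / (1.0 + math.exp(-eta))
        cells.append((x, total * p, total))
    return cells
```

Calling `fit_grouped_logistic(synthetic_cells([-3.0, 0.5, 0.8, 0.3, 0.6]))` recovers the coefficients used to generate the synthetic table; exponentiating a fitted coefficient gives the odds ratio for FDR202 status associated with that feature set, adjusted for the others. In practice an established statistics package would normally be used in place of a hand-written solver.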