Variants were annotated using the ANNOVAR¹⁸ software (version Aug 2013) based on a GRCh37/hg19 gene annotation database. Upstream/downstream status was assigned to variants that mapped ≤1 kb from the transcript start or end. Variants without an intergenic annotation were assigned a genic annotation status (42% of variants).
Supplementary Table 8 shows the annotation status of 9.4M variants included in the CAD additive meta-analysis; 86% of the genic variants map to introns.
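The upstream/downstream and genic/intergenic rules above can be illustrated with a simplified, strand-aware classifier. This is a hypothetical sketch, not ANNOVAR's implementation: the `Transcript` record, the five-class precedence order and the 1-based inclusive coordinates are assumptions, and ANNOVAR distinguishes further classes (e.g. splicing, UTR and ncRNA) that are omitted here.

```python
from dataclasses import dataclass

# Hypothetical, simplified transcript model (1-based, inclusive coordinates).
@dataclass(frozen=True)
class Transcript:
    chrom: str
    start: int    # leftmost genomic coordinate of the transcript
    end: int      # rightmost genomic coordinate of the transcript
    strand: str   # "+" or "-"
    exons: tuple  # ((start, end), ...) in genomic coordinates

FLANK = 1000  # upstream/downstream window of <=1 kb, as described in the text

# Assumed precedence, most to least specific (ANNOVAR's real rules are richer).
PRIORITY = ["exonic", "intronic", "upstream", "downstream", "intergenic"]

def classify_one(chrom, pos, tx):
    """Region class of a single position relative to one transcript."""
    if chrom != tx.chrom:
        return "intergenic"
    if tx.start <= pos <= tx.end:
        if any(s <= pos <= e for s, e in tx.exons):
            return "exonic"
        return "intronic"
    if tx.start - FLANK <= pos < tx.start:
        # Left of the transcript: 5' (upstream) on "+", 3' (downstream) on "-".
        return "upstream" if tx.strand == "+" else "downstream"
    if tx.end < pos <= tx.end + FLANK:
        # Right of the transcript: 3' on "+", 5' on "-".
        return "downstream" if tx.strand == "+" else "upstream"
    return "intergenic"

def annotate(chrom, pos, transcripts):
    """Most specific class over all transcripts, plus genic/intergenic status."""
    classes = {classify_one(chrom, pos, tx) for tx in transcripts}
    best = min(classes, key=PRIORITY.index, default="intergenic")
    # Following the text: anything without an intergenic annotation is genic.
    status = "intergenic" if best == "intergenic" else "genic"
    return best, status
```

When several transcripts overlap a variant, the most specific class wins; upstream and downstream variants count as genic because only an intergenic annotation excludes a variant from genic status.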
ENCODE features were downloaded from the Ensembl database using the Funcgen Perl API module (release 75). The list of ENCODE experiments stored in the Ensembl database can be browsed at http://Feb2014.archive.ensembl.org/Homo_sapiens/Experiment/Sources?db=core;ex=project-ENCODE-. Together these comprised 379 ENCODE experiments covering 100 types of functional evidence in 11 cell types, yielding 6,099,034 features. Variants that overlapped one or more of these features were cross-tabulated with their ANNOVAR annotation status (Supplementary Table 10); 50% of variants mapped to one or more ENCODE features, and variants in ENCODE features were strongly enriched for genic annotation status. Variants were grouped into three functional sets: histone/chromatin modifications (HM), DNase I hypersensitive sites (DHS) and transcription factor binding sites (TFBS) (Supplementary Table 9). Cell types were grouped into CAD-relevant and other cell types (Supplementary Table 12) based on their potential roles in CAD pathophysiology; hepatocytes (e.g., lipid metabolism⁸⁰), vascular endothelial cells (atherosclerosis⁸¹) and myoblasts (injury/repair⁸²) were selected as being most relevant to the CAD phenotype. Multi-way contingency tables cross-classifying ENCODE feature status and ANNOVAR annotation status by inclusion in the FDR < 5% variant list (FDR202 status) are summarized for the 11 ENCODE cell types in Supplementary Table 11 and for the three CAD-relevant cell types in Supplementary Table 13. Contingency table counts were modelled with a multiple logistic regression model predicting FDR202 status, with HM, DHS, TFBS and genic/intergenic status as independent explanatory variables.
The ENCODE⁸³ project has previously mapped 4,492 GWAS-significant SNPs from the NHGRI catalogue⁷⁴ (June 2011) to TF (12%) and DHS (34%) features in an extended dataset of 1,640 experiments. The 202 FDR < 5% variants were less prevalent in these feature groups (10.4% versus 12% for TF; 19.8% versus 34% for DHS), which could reflect a CAD-specific issue or, more generally, a consequence of our analysis being based on a subset of the ENCODE data retrieved from the Ensembl database.
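The regression on contingency-table counts can be sketched as a grouped (binomial) logistic regression: each cell of the multi-way table contributes its number of FDR202 variants out of its total number of variants, and a main-effects model with no interaction terms is fitted by Newton-Raphson. This is a minimal illustration under stated assumptions, not the authors' software: the cell layout `(covariates, n_fdr, n_total)`, the 0/1 encoding of HM, DHS, TFBS and genic status, and all function names are hypothetical, and a small pure-Python linear solver is used only to keep the example self-contained.

```python
import math

# Hypothetical cell layout: (covariates, n_fdr, n_total), where covariates are
# 0/1 indicators for (HM, DHS, TFBS, genic) and n_fdr counts the cell's variants
# on the FDR < 5% list.
NAMES = ["intercept", "HM", "DHS", "TFBS", "genic"]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular information matrix (check the design)")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def score_and_information(beta, cells):
    """Gradient of the binomial log-likelihood and the Fisher information."""
    k = len(beta)
    grad = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for covs, hits, total in cells:
        x = [1.0, *covs]
        eta = sum(b * xi for b, xi in zip(beta, x))
        p = 1.0 / (1.0 + math.exp(-eta))
        w = total * p * (1.0 - p)
        for i in range(k):
            grad[i] += (hits - total * p) * x[i]
            for j in range(k):
                info[i][j] += w * x[i] * x[j]
    return grad, info

def fit_grouped_logistic(cells, max_iter=100, tol=1e-10):
    """Return (name, coefficient, Wald SE, odds ratio) for each model term."""
    hits = sum(c[1] for c in cells)
    total = sum(c[2] for c in cells)
    # Start the intercept at the overall log-odds to speed up convergence.
    beta = [math.log(hits / (total - hits))] + [0.0] * len(cells[0][0])
    for _ in range(max_iter):
        grad, info = score_and_information(beta, cells)
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    # Wald SEs: square roots of the diagonal of the inverse information.
    _, info = score_and_information(beta, cells)
    k = len(beta)
    se = [math.sqrt(solve(info, [1.0 if r == j else 0.0 for r in range(k)])[j])
          for j in range(k)]
    return [(n, b, s, math.exp(b)) for n, b, s in zip(NAMES, beta, se)]
```

Fitting on cell counts gives the same coefficient estimates as a per-variant logistic regression, because each cell's binomial likelihood is a product of identical Bernoulli terms up to a constant; this is what makes modelling the contingency tables directly feasible for millions of variants. In practice a standard GLM routine would be used, for example R's `glm` with `cbind(successes, failures)` and `family = binomial`.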
Nikpay M., Goel A., Won H.H., Hall L.M., Willenborg C., Kanoni S., Saleheen D., Kyriakou T., Nelson C.P., Hopewell J.C., Webb T.R., Zeng L., Dehghan A., Alver M., Armasu S.M., Auro K., Bjonnes A., Chasman D.I., Chen S., Ford I., Franceschini N., Gieger C., Grace C., Gustafsson S., Huang J., Hwang S.J., Kim Y.K., Kleber M.E., Lau K.W., Lu X., Lu Y., Lyytikäinen L.P., Mihailov E., Morrison A.C., Pervjakova N., Qu L., Rose L.M., Salfati E., Saxena R., Scholz M., Smith A.V., Tikkanen E., Uitterlinden A., Yang X., Zhang W., Zhao W., de Andrade M., de Vries P.S., van Zuydam N.R., Anand S.S., Bertram L., Beutner F., Dedoussis G., Frossard P., Gauguier D., Goodall A.H., Gottesman O., Haber M., Han B.G., Huang J., Jalilzadeh S., Kessler T., König I.R., Lannfelt L., Lieb W., Lind L., Lindgren C.M., Lokki M.L., Magnusson P.K., Mallick N.H., Mehra N., Meitinger T., Memon F.U., Morris A.P., Nieminen M.S., Pedersen N.L., Peters A., Rallidis L.S., Rasheed A., Samuel M., Shah S.H., Sinisalo J., Stirrups K.E., Trompet S., Wang L., Zaman K.S., Ardissino D., Boerwinkle E., Borecki I.B., Bottinger E.P., Buring J.E., Chambers J.C., Collins R., Cupples L.A., Danesh J., Demuth I., Elosua R., Epstein S.E., Esko T., Feitosa M.F., Franco O.H., Franzosi M.G., Granger C.B., Gu D., Gudnason V., Hall A.S., Hamsten A., Harris T.B., Hazen S.L., Hengstenberg C., Hofman A., Ingelsson E., Iribarren C., Jukema J.W., Karhunen P.J., Kim B.J., Kooner J.S., Kullo I.J., Lehtimäki T., Loos R.J., Melander O., Metspalu A., März W., Palmer C.N., Perola M., Quertermous T., Rader D.J., Ridker P.M., Ripatti S., Roberts R., Salomaa V., Sanghera D.K., Schwartz S.M., Seedorf U., Stewart A.F., Stott D.J., Thiery J., Zalloua P.A., O’Donnell C.J., Reilly M.P., Assimes T.L., Thompson J.R., Erdmann J., Clarke R., Watkins H., Kathiresan S., McPherson R., Deloukas P., Schunkert H., Samani N.J., & Farrall M. (2015). A comprehensive 1000 Genomes-based genome-wide association meta-analysis of coronary artery disease.
Nature Genetics, 47(10), 1121-1130.