PhenoScanner consists of a Perl interface (with R command line tool) that connects to a MySQL database. To develop the initial database, we collated 137 genotype–phenotype association datasets, including results for anthropometric traits, blood pressure, lipids, cardiometabolic diseases, renal function measures, glycemic traits, inflammatory diseases, psychiatric diseases and smoking phenotypes (Supplementary Table). We also included the NHGRI-EBI GWAS catalog, NHLBI GRASP (Leslie et al., 2014) and dbGaP catalogues of associations. To ensure consistent formatting, we aligned alleles to the plus strand, added or updated chromosome positions to build 37 using dbSNP (release 138) (Sherry et al., 2001) and liftOver (https://genome.ucsc.edu/cgi-bin/hgLiftOver), and updated old rsIDs to dbSNP release 141 (Supplementary Data).
Linkage disequilibrium (LD) measures between neighbouring variants in the autosomal chromosomes were calculated using the phased haplotypes from European samples in 1000 Genomes phase 3 (N = 503) (1000 Genomes Project Consortium et al., 2012). Variants with minor allele frequencies <0.5% were removed along with multiallelic variants and large indels (≥5 bases). For each remaining variant, we calculated D′ and r² for variants within 500 kb in either direction, and kept LD statistics for pairs of variants with r² ≥ 0.6. LD statistics based on the CEU population from HapMap 2 release 24 (Frazer et al., 2007) are also available (Supplementary Data).
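As an illustration of the LD step described above, the following minimal Python sketch computes the standard D′ and r² statistics for a pair of biallelic variants from phased haplotypes, and applies the minor allele frequency, distance and r² cut-offs stated in the text. It is not the PhenoScanner code: the function names (`ld_stats`, `passes_maf`, `keep_pair`) and the 0/1 allele coding are assumptions made for the example, and D′ is reported here as an absolute value.

```python
from typing import Sequence, Tuple

MAF_MIN = 0.005        # variants with MAF < 0.5% are removed
WINDOW_BP = 500_000    # pairs within 500 kb in either direction
R2_MIN = 0.6           # assumed retention cut-off for stored LD statistics


def passes_maf(hap: Sequence[int]) -> bool:
    """True if the minor allele frequency is at least MAF_MIN."""
    p = sum(hap) / len(hap)
    return min(p, 1 - p) >= MAF_MIN


def ld_stats(hap_a: Sequence[int], hap_b: Sequence[int]) -> Tuple[float, float]:
    """Return (|D'|, r^2) for two biallelic variants.

    hap_a and hap_b hold one 0/1 allele code per phased haplotype,
    listed in the same haplotype order (hypothetical input format).
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype vectors must be non-empty and of equal length")
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    # Frequency of haplotypes carrying allele 1 at both variants.
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("LD is undefined for a monomorphic variant")
    r2 = d * d / denom
    # D' rescales D by its maximum attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max, r2


def keep_pair(pos_a: int, pos_b: int, r2: float) -> bool:
    """True if the pair lies within the window and meets the r^2 cut-off."""
    return abs(pos_a - pos_b) <= WINDOW_BP and r2 >= R2_MIN
```

For two variants carried on exactly the same haplotypes, `ld_stats` returns perfect LD for both statistics; for statistically independent variants both statistics are zero.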
The user may enter either one variant into the text box on the website or upload up to 50 variants in a text file. The Perl interface annotates the variant alleles using dbSNP, identifies proxies of the specified variants (if requested) in the database according to a user-specified pairwise r² threshold, and queries the catalogue of genotype–phenotype associations for the specified variants and their proxies. Association results are collated and presented with respect to the same effect and non-effect alleles for each variant. The associations with proxies are aligned according to the effect and non-effect alleles of the corresponding primary variant of interest for added ease of interpretation. The output is a file of associations, which is made available to download. There is also a P value filter option that only retains results with study-specific P values less than the selected threshold.
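The allele harmonisation and P value filter described above can be sketched in Python as follows. This is an assumption-laden illustration rather than the Perl implementation: the `Association` record, its field names and the functions `align_to` and `filter_by_p` are hypothetical, and the sketch covers only results for the same variant reported with swapped alleles. Aligning a proxy to its primary variant additionally requires haplotype phase information, which is not shown.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List


@dataclass(frozen=True)
class Association:
    """One study-level result (hypothetical record layout)."""
    rsid: str
    effect_allele: str
    other_allele: str
    beta: float
    p_value: float


def align_to(assoc: Association, effect_allele: str, other_allele: str) -> Association:
    """Express a result relative to the requested effect/non-effect alleles."""
    reported = (assoc.effect_allele, assoc.other_allele)
    if reported == (effect_allele, other_allele):
        return assoc
    if reported == (other_allele, effect_allele):
        # Swapping the effect allele reverses the sign of the effect estimate.
        return replace(
            assoc,
            effect_allele=effect_allele,
            other_allele=other_allele,
            beta=-assoc.beta,
        )
    raise ValueError(f"alleles of {assoc.rsid} do not match the reference alleles")


def filter_by_p(assocs: Iterable[Association], threshold: float) -> List[Association]:
    """Keep only results with P values strictly below the threshold."""
    return [a for a in assocs if a.p_value < threshold]


# Usage with made-up values (the rsID and numbers are illustrative only).
results = [
    Association("rs1", "A", "G", 0.20, 1e-9),
    Association("rs1", "G", "A", -0.15, 0.03),
]
aligned = [align_to(a, "G", "A") for a in results]
significant = filter_by_p(aligned, 5e-8)
```

After alignment, every result for a variant refers to the same effect allele, so effect directions can be compared across studies before the P value filter is applied.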