Multivariate GWAS Analysis in Diverse Mouse and Human Cohorts

We analyzed two data sets: the Hybrid Mouse Diversity Panel (HMDP)^{31} and the Northern Finland Birth Cohort 1966 (NFBC1966) Study^{32}.
The HMDP data include 100 inbred strains with four phenotypes (high-density lipoprotein, HDL; total cholesterol, TC; triglycerides, TG; unesterified cholesterol, UC) and four million high-quality, fully imputed SNPs (downloaded from http://mouse.cs.ucla.edu/mousehapmap/full.html). We excluded mice with a missing value for any of these four phenotypes. We excluded non-polymorphic SNPs and SNPs with a minor allele frequency below 5%. Among SNPs with identical genotypes, we tried to retain only one (using the “--indep-pairwise 100 5 0.999999” option in PLINK^{33}). This left us with 98 strains, 656 individuals and 108,562 SNPs. We quantile transformed each phenotype to a standard normal distribution to guard against model misspecification. We used the product of the centered genotype matrix as an estimate of relatedness^{16,17,34,35}. Note that the sample size used here is smaller than in the original study^{31}, and that the phenotypes are quantile transformed rather than log transformed, for robustness.
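The two numerical steps in the paragraph above (the per-phenotype quantile transform and the centered relatedness estimate) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, the `(rank - 0.5) / n` offset and the division by the number of SNPs are assumptions that the text does not specify, and tied values are given their average rank.

```python
from statistics import NormalDist


def quantile_normalize(values):
    """Rank-based transform of one phenotype to standard normal quantiles."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Give tied values their average 1-based rank.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    # Assumption: (rank - 0.5) / n offset; the source does not state one.
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]


def centered_relatedness(genotypes):
    """Return K = Xc Xc^T / p for an n-by-p genotype matrix (list of rows).

    Xc is the genotype matrix with each SNP column centered at its mean.
    Assumption: the product is divided by the number of SNPs p.
    """
    n, p = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(ri, rj)) / p for rj in centered]
        for ri in centered
    ]
```

Because every SNP column is centered, each row of the resulting matrix sums to zero, which is a quick sanity check on real data.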
The NFBC1966 data contain 5,402 individuals with multiple metabolic traits measured and 364,590 SNPs typed. We selected four of these phenotypes (high-density lipoprotein, HDL; low-density lipoprotein, LDL; triglycerides, TG; C-reactive protein, CRP), following previous studies^{3}. We selected individuals and SNPs following previous studies^{11,32} with the software PLINK^{33}. Specifically, we excluded individuals with a missing value for any of these four phenotypes or with discrepancies between reported sex and sex determined from the X chromosome. We excluded SNPs with a minor allele frequency below 1%, with missing values in more than 1% of the individuals, or with a Hardy-Weinberg equilibrium p value below 0.0001. This left us with 5,255 individuals and 319,111 SNPs. For each phenotype, we quantile transformed the phenotypic values to a standard normal distribution, regressed out the effects of sex, oral contraceptives and pregnancy status^{32}, and quantile transformed the residuals to a standard normal distribution again. We replaced the missing genotypes for a given SNP with its mean genotype value. We used the product of the centered and scaled genotype matrix as an estimate of relatedness^{11,17,34,35}.
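The SNP-level steps described above (allele frequency and missingness filters, mean imputation, and the centered and scaled relatedness estimate) can be sketched in a few lines. The sketch is illustrative only and rests on stated assumptions: genotypes are coded 0/1/2 with `None` for a missing call, each SNP is scaled by its sample standard deviation, the product is divided by the number of SNPs, and all function names are hypothetical. The authors ran their filters in PLINK; the Hardy-Weinberg test and the sex check are not reproduced here.

```python
import math


def maf_and_missing_rate(snp):
    """Return (minor allele frequency, missing rate) for one SNP vector."""
    observed = [g for g in snp if g is not None]
    missing_rate = 1 - len(observed) / len(snp)
    if not observed:
        return 0.0, missing_rate
    freq = sum(observed) / (2 * len(observed))
    return min(freq, 1 - freq), missing_rate


def passes_qc(snp, min_maf=0.01, max_missing=0.01):
    """Keep a SNP if MAF >= 1% and at most 1% of calls are missing."""
    maf, missing_rate = maf_and_missing_rate(snp)
    return maf >= min_maf and missing_rate <= max_missing


def impute_mean(snp):
    """Replace missing calls with the mean of the observed genotypes."""
    observed = [g for g in snp if g is not None]
    mean = sum(observed) / len(observed)
    return [mean if g is None else float(g) for g in snp]


def standardized_relatedness(snps):
    """Return K = (1/p) * sum over SNPs of z z^T.

    snps is a list of p complete SNP vectors, each of length n, and z is
    the SNP centered and scaled to unit variance. Monomorphic SNPs must
    already be removed, otherwise the scaling divides by zero.
    """
    p, n = len(snps), len(snps[0])
    K = [[0.0] * n for _ in range(n)]
    for snp in snps:
        mean = sum(snp) / n
        sd = math.sqrt(sum((g - mean) ** 2 for g in snp) / n)
        z = [(g - mean) / sd for g in snp]
        for i in range(n):
            for j in range(n):
                K[i][j] += z[i] * z[j] / p
    return K
```

With this scaling the diagonal of the matrix sums to the number of individuals, which gives a simple check that centering and scaling were applied per SNP.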
In both data sets, we quantile transformed each phenotype separately to a standard normal distribution to guard against model misspecification. Although this strategy does not guarantee that the transformed phenotypes jointly follow a multivariate normal distribution, it often works well in practice when the number of phenotypes is small (see, e.g., ref. 22). For both data sets, we used a standard mvLMM with an intercept term (and no other covariates), and tested each SNP in turn. Because the software MTMM relies on the commercial software ASREML to estimate the variance components in the null model, we modified the MTMM source code so that it can read in the variance components estimated by GEMMA.
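For orientation, an intercept-only mvLMM of the kind described above is commonly written as below. The notation is an assumption chosen for illustration, not taken from this excerpt: d is the number of phenotypes (four here), n the number of individuals, Y the d-by-n phenotype matrix, a the d-vector of intercepts, x the n-vector of genotypes at the SNP being tested, beta the d-vector of SNP effects, K the relatedness matrix, and V_g and V_e the d-by-d genetic and environmental covariance matrices.

```latex
\mathbf{Y} = \mathbf{a}\,\mathbf{1}_n^{\top} + \boldsymbol{\beta}\,\mathbf{x}^{\top} + \mathbf{G} + \mathbf{E},
\qquad
\mathbf{G} \sim \mathrm{MN}_{d \times n}(\mathbf{0}, \mathbf{V}_g, \mathbf{K}),
\qquad
\mathbf{E} \sim \mathrm{MN}_{d \times n}(\mathbf{0}, \mathbf{V}_e, \mathbf{I}_n)
```

Under this formulation, testing a SNP amounts to testing the null hypothesis that beta is the zero vector, and the null model (without the SNP term) is the one whose variance components V_g and V_e are estimated once and reused.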

Zhou X, & Stephens M. (2014). Efficient Algorithms for Multivariate Linear Mixed Models in Genome-wide Association Studies. Nature Methods, 11(4), 407-409.

Publication 2014

Birth cohort C reactive protein Cholesterol Genotype High density lipoprotein Hybrid Low density lipoprotein Mice Oral contraceptives Phenotypes Polymorphic Pregnancy Strains Triglycerides X chromosome

Other organizations : University of Chicago

Variable analysis

dependent variables

High-density lipoprotein (HDL)
Total cholesterol (TC)
Triglycerides (TG)
Unesterified cholesterol (UC)
Low-density lipoprotein (LDL)
C-reactive protein (CRP)

control variables

Sex
Oral contraceptives
Pregnancy status
