We analyzed two data sets: the Hybrid Mouse Diversity Panel (HMDP)31 and the Northern Finland Birth Cohort 1966 (NFBC1966) Study32.
The HMDP data includes 100 inbred strains with four phenotypes (high-density lipoprotein, HDL; total cholesterol, TC; triglycerides, TG; unesterified cholesterol, UC) and four million high-quality, fully imputed SNPs (downloaded from http://mouse.cs.ucla.edu/mousehapmap/full.html). We excluded mice with a missing value for any of these four phenotypes. We excluded non-polymorphic SNPs and SNPs with a minor allele frequency less than 5%. For SNPs with identical genotypes, we tried to retain only one of them (using the “--indep-pairwise 100 5 0.999999” option in PLINK33). This left us with 98 strains, 656 individuals and 108,562 SNPs. We quantile transformed each phenotype to a standard normal distribution to guard against model misspecification. We used the product of the centered genotype matrix as an estimate of relatedness16,17,34,35. Note that the sample size used here is smaller than in the original study31, and that the phenotypes are quantile transformed instead of log transformed for robustness.
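The two preprocessing steps described above, the quantile transformation and the centered relatedness estimate, can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function names are hypothetical, and both the rank offset (rank - 0.5)/n and the division of the matrix product by the number of SNPs p are assumptions, since the text does not specify either.

```python
from statistics import NormalDist


def quantile_normalize(values):
    """Map values to standard normal quantiles by rank (ties share an average rank)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # 1-based average rank of the tie group
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    std_normal = NormalDist()
    # Assumed offset: (rank - 0.5) / n keeps every probability strictly inside (0, 1).
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]


def centered_relatedness(genotypes):
    """Return K = X_c X_c^T / p for an n-by-p genotype matrix (rows are individuals)."""
    n = len(genotypes)
    p = len(genotypes[0])
    col_means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    centered = [[row[j] - col_means[j] for j in range(p)] for row in genotypes]
    # Dividing by p is an assumed normalization; the text only says "product".
    return [[sum(a * b for a, b in zip(centered[i], centered[k])) / p
             for k in range(n)] for i in range(n)]
```

Because the transformation depends only on ranks, it is insensitive to outliers in the raw phenotype, which is the robustness argument made above. Each phenotype is transformed separately, so only the marginal distributions are made normal.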
The NFBC1966 data contains 5,402 individuals with multiple metabolic traits measured and 364,590 SNPs typed. We selected four phenotypes (high-density lipoprotein, HDL; low-density lipoprotein, LDL; triglycerides, TG; C-reactive protein, CRP) among them, following previous studies3. We selected individuals and SNPs following previous studies11,32 with the software PLINK33. Specifically, we excluded individuals with a missing value for any of these four phenotypes or with discrepancies between reported sex and sex determined from the X chromosome. We excluded SNPs with a minor allele frequency less than 1%, with missing values in more than 1% of the individuals, or with a Hardy-Weinberg equilibrium p value below 0.0001. This left us with 5,255 individuals and 319,111 SNPs. For each phenotype, we quantile transformed the phenotypic values to a standard normal distribution, regressed out the effects of sex, oral contraceptives and pregnancy status32, and quantile transformed the residuals to a standard normal distribution again. We replaced the missing genotypes for a given SNP with its mean genotype value. We used the product of the centered and scaled genotype matrix as an estimate of relatedness11,17,34,35.
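The SNP-level steps above (the frequency and missingness filters, mean imputation, and the centered and scaled relatedness estimate) can be sketched as follows. The filtering in the study was done with PLINK, so this is only an illustration under stated assumptions: the function names are hypothetical, genotypes are assumed to be coded as 0/1/2 allele counts with None for a missing call, the Hardy-Weinberg filter is omitted, and the use of the population standard deviation and of division by the number of SNPs p are choices made here, not details given in the text.

```python
import math


def passes_filters(snp, min_maf=0.01, max_missing=0.01):
    """Apply the minor allele frequency and missingness thresholds to one SNP."""
    observed = [g for g in snp if g is not None]
    if not observed:
        return False
    missing_rate = 1 - len(observed) / len(snp)
    allele_freq = sum(observed) / (2 * len(observed))  # assumes 0/1/2 coding
    maf = min(allele_freq, 1 - allele_freq)
    return maf >= min_maf and missing_rate <= max_missing


def impute_mean(snp):
    """Replace missing genotypes (None) with the mean of the observed genotypes."""
    observed = [g for g in snp if g is not None]
    mean = sum(observed) / len(observed)
    return [mean if g is None else float(g) for g in snp]


def standardized_relatedness(snps):
    """Return K = Z Z^T / p, with each SNP centered and scaled to unit variance.

    `snps` is a list of p SNPs, each a list of n imputed genotype values.
    """
    n = len(snps[0])
    z_rows = []
    for snp in snps:
        mean = sum(snp) / n
        sd = math.sqrt(sum((g - mean) ** 2 for g in snp) / n)  # assumed: population SD
        if sd == 0:
            continue  # a monomorphic SNP carries no information
        z_rows.append([(g - mean) / sd for g in snp])
    p = len(z_rows)
    return [[sum(z[i] * z[k] for z in z_rows) / p for k in range(n)]
            for i in range(n)]
```

Scaling each SNP to unit variance gives rare variants more weight in the relatedness estimate than centering alone does, which is the practical difference between this estimate and the centered one used for the HMDP data.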
In both data sets, we quantile transformed each phenotype individually to a standard normal distribution to guard against model misspecification. Although this strategy does not guarantee that the transformed phenotypes jointly follow a multivariate normal distribution, it often works well in practice when the number of phenotypes is small (see, e.g., 22). For both data sets, we used a standard mvLMM with an intercept term (without any other covariates), and tested each SNP in turn. Because the software MTMM relies on the commercial software ASREML to estimate the variance components in the null model, we modified the MTMM source code so that it can read in the estimated variance components from GEMMA.