Mixed linear model association (MLMA) methods are effective in preventing false-positive associations due to sample structure in studies of humans and model organisms1–6. In particular, simulations show that the correction for confounding is nearly perfect for common variants even when geographic population structure, which is a fixed effect, is modeled as a random effect based on overall covariance6,17–19 (rare variants, however, pose a greater challenge for all methods, owing to differential confounding of rare and common variants20). MLMA methods also increase power by applying a correction that is specific to sample structure1–6. In the case of geographic population structure, markers with large allele frequency differences between populations receive a larger correction. In the case of relatedness structure, the contribution of related individuals to test statistics is reduced, which prevents overweighting of redundant information due to correlation structure.
An underappreciated point is that MLMA can also increase power in studies without sample structure, by implicitly conditioning on associated loci other than the candidate locus that are not genome-wide significant in the data being analyzed8. For example, a GRM computed from all markers can be used to approximate the set of causal markers (implicitly assuming that all markers are causal), but this approximation can be generalized. The increase in power scales with the ratio N/M of the number of samples (N) to the effective number of independent markers (M), because the information available about unknown associated loci depends on the number of samples. In simulations of a quantitative trait with no sample structure and no LD between markers (Online Methods), applying MLMA instead of linear regression increased the average −log10 P value at causal markers from 2.89 to 2.94 (1.8% increase) when N = 10,000 and M = 100,000, and from 2.92 to 3.46 (18% increase) when N = 10,000 and M = 10,000. We note that this improvement is contingent on the exclusion of the candidate marker from the GRM (see below).
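The comparison described above can be illustrated with a minimal sketch. It is not the simulation code used for the reported results, and it rests on several simplifying assumptions that are ours: all markers are causal with normally distributed effects, allele frequencies are drawn uniformly between 0.1 and 0.9, the heritability `h2` is treated as known rather than estimated by REML, and N and M are kept far smaller than in the text so that the example runs quickly. The function names (`standardize`, `gls_chi2`, `simulate`) are hypothetical. For each tested marker, the GRM is built from all other markers, so the candidate marker is excluded.

```python
import numpy as np


def standardize(G):
    """Scale each column to mean 0 and variance 1."""
    return (G - G.mean(axis=0)) / G.std(axis=0)


def gls_chi2(x, y, V):
    """1-d.f. association statistic for marker x, given phenotypic covariance V.

    With V equal to the identity and y standardized, this reduces to the
    linear regression score statistic N * r^2.
    """
    Vinv_x = np.linalg.solve(V, x)
    return float((Vinv_x @ y) ** 2 / (Vinv_x @ x))


def simulate(N=400, M=400, h2=0.5, n_test=20, seed=0):
    """Return (linear regression, MLMA) statistics for the first n_test markers."""
    rng = np.random.default_rng(seed)

    # Independent markers (no LD), unrelated samples (no sample structure).
    freqs = rng.uniform(0.1, 0.9, size=M)
    Z = standardize(rng.binomial(2, freqs, size=(N, M)).astype(float))

    # Assumption: every marker is causal, with total genetic variance h2.
    beta = rng.normal(0.0, np.sqrt(h2 / M), size=M)
    y = Z @ beta + rng.normal(0.0, np.sqrt(1.0 - h2), size=N)
    y = (y - y.mean()) / y.std()

    ZZt = Z @ Z.T
    I = np.eye(N)
    lr, mlma = [], []
    for j in range(n_test):
        x = Z[:, j]
        lr.append(gls_chi2(x, y, I))

        # GRM from all markers except the candidate marker j.
        A_excl = (ZZt - np.outer(x, x)) / (M - 1)
        # Assumption: h2 is known; in practice it would be estimated.
        V = h2 * A_excl + (1.0 - h2) * I
        mlma.append(gls_chi2(x, y, V))
    return np.array(lr), np.array(mlma)


if __name__ == "__main__":
    lr, mlma = simulate()
    print("mean chi-square, linear regression:", lr.mean())
    print("mean chi-square, MLMA (candidate excluded):", mlma.mean())
```

Varying `N` and `M` in `simulate` changes the ratio N/M. Because any single run is noisy, averages over many tested markers and several seeds would be needed before drawing conclusions from the output.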