Suppose $m$ phenotypes are measured as indicators of one trait, e.g., individual symptoms within a disorder, items within a test, or multiple measures of one trait obtained with different instruments (e.g., an open-field test, a light-dark box, and an elevated plus maze to measure anxiety in mice). Rather than combining these $m$ phenotypes into one general phenotype, we test the association between each of the $m$ phenotypes and each of the $n$ genotyped genetic variants (GVs) using a statistically appropriate method (e.g., linear or logistic regression). Let $p_{(1)}, \dots, p_{(m)}$ be the ascending p-values of the $m$ phenotypes for a given GV. Within each GV, TATES combines the $m$ phenotype-specific p-values into one overall trait-based p-value $P_T$ as follows:

$$P_T = \min_{j = 1, \dots, m} \left( \frac{m_e \, p_j}{m_{ej}} \right)$$

where $m_e$ denotes the effective number of independent p-values of all $m$ phenotypes for a given GV, $m_{ej}$ the effective number of p-values among the top $j$ p-values ($j = 1, \dots, m$), and $p_j$ the $j$th p-value in the list of ordered p-values. $P_T$ is thus the smallest weighted p-value. It tests the null hypothesis that none of the phenotypes is associated with the GV against the alternative hypothesis that at least one of the phenotypes is associated with the GV.
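As a minimal sketch of the combination step, the Python function below (assuming NumPy is available) takes the $m$ phenotype-specific p-values of one GV together with the effective numbers $m_{ej}$ for $j = 1, \dots, m$, which are treated here as already computed, and returns $P_T$. The name `tates_combine` is hypothetical and not part of the TATES software; the sketch takes $m_e$ as the last entry of `m_ej`, i.e., the case $j = m$.

```python
import numpy as np


def tates_combine(pvals, m_ej):
    """Hypothetical helper: combine one GV's phenotype p-values into P_T.

    pvals : the m phenotype-specific p-values (any order).
    m_ej  : effective numbers of p-values among the top j p-values,
            for j = 1, ..., m (assumed to be computed beforehand).
    """
    p_sorted = np.sort(np.asarray(pvals, dtype=float))  # p_(1) <= ... <= p_(m)
    m_ej = np.asarray(m_ej, dtype=float)
    if p_sorted.shape != m_ej.shape:
        raise ValueError("need exactly one m_ej value per p-value")
    m_e = m_ej[-1]                      # m_e is m_ej for j = m (all phenotypes)
    weighted = m_e * p_sorted / m_ej    # weighted p-value m_e * p_j / m_ej per j
    return float(weighted.min())        # P_T: the smallest weighted p-value
```

For three uncorrelated phenotypes ($m_{ej} = j$) with p-values 0.04, 0.01 and 0.03, the weighted values are 0.03, 0.045 and 0.04, so $P_T = 0.03$. With $m_{ej} = j$ the rule reduces to Simes' procedure, $\min_j m \, p_{(j)} / j$; with perfectly correlated phenotypes ($m_{ej} = 1$ for all $j$) it returns the smallest p-value.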
Following Li et al. [15], we estimate the effective number of p-values $m_{ej}$ through a correction based on eigenvalue decomposition of the $m \times m$ correlation matrix $\rho$ between the p-values associated with the $m$ phenotypes. The effective number of p-values $m_{ej}$ for the top $j$ p-values is calculated as:

$$m_{ej} = j - \sum_{i=1}^{j} I(\lambda_i > 1)\,(\lambda_i - 1)$$

where $j$ is the number of top p-values, $\lambda_i$ denotes the $i$th eigenvalue of the correlation matrix of these $j$ p-values (the corresponding $j \times j$ submatrix of $\rho$), and $I(\lambda_i > 1)$ is an indicator function taking the value 0 if $\lambda_i \le 1$ and 1 if $\lambda_i > 1$. That is, $m_{ej}$ is calculated as the observed number of p-values $j$ minus the sum of the differences $\lambda_i - 1$ over those eigenvalues with $\lambda_i > 1$. If the $j$ phenotypes are all uncorrelated, then all $j$ eigenvalues equal 1, and $m_{ej} = j - 0 = j$. In contrast, if the $j$ phenotypes are perfectly correlated, then the first eigenvalue equals $j$ and the other eigenvalues equal 0, rendering $m_{ej} = j - (j - 1) = 1$ (i.e., $j$ perfectly correlated phenotypes represent only 1 unique unit of information). In practice, phenotypes show intercorrelations of intermediate magnitude (neither 0 nor 1), so the effective number of p-values $m_{ej}$ will usually be smaller than $j$ but greater than 1. Note that $m_e$ equals $m_{ej}$ for $j = m$, i.e., when the selection of top phenotypes covers all phenotypes.
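The eigenvalue correction and the full combination can be sketched in Python with NumPy as follows. The sketch assumes the $m \times m$ p-value correlation matrix $\rho$ is already available (how it is obtained is not shown here). For each $j$ it takes the $j \times j$ submatrix of $\rho$ belonging to the $j$ smallest p-values, computes its eigenvalues with `numpy.linalg.eigvalsh`, and subtracts the summed excess $\lambda_i - 1$ of the eigenvalues above 1. The function names `effective_numbers` and `tates` are hypothetical, not the TATES software's interface.

```python
import numpy as np


def effective_numbers(pvals, rho):
    """Hypothetical helper: m_ej for j = 1..m (top j = j smallest p-values).

    rho : m x m correlation matrix between the p-values of the m
          phenotypes, ordered like pvals (assumed to be given).
    """
    pvals = np.asarray(pvals, dtype=float)
    rho = np.asarray(rho, dtype=float)
    order = np.argsort(pvals)              # phenotype indices, ascending p-value
    m_ej = []
    for j in range(1, len(pvals) + 1):
        top = order[:j]
        sub = rho[np.ix_(top, top)]        # j x j correlations of the top j p-values
        lam = np.linalg.eigvalsh(sub)      # eigenvalues of the symmetric submatrix
        excess = np.sum(lam[lam > 1] - 1.0)  # sum of (lambda_i - 1) for lambda_i > 1
        m_ej.append(j - excess)
    return np.array(m_ej)


def tates(pvals, rho):
    """Hypothetical helper: TATES trait-based p-value P_T for one GV."""
    m_ej = effective_numbers(pvals, rho)
    m_e = m_ej[-1]                         # j = m covers all phenotypes
    p_sorted = np.sort(np.asarray(pvals, dtype=float))
    return float(np.min(m_e * p_sorted / m_ej))
```

With $\rho$ equal to the identity matrix this returns $m_{ej} = 1, 2, 3, \dots$, and with an all-ones matrix it returns 1 for every $j$ (up to floating-point rounding), matching the two limiting cases described above. For two phenotypes whose p-values correlate at 0.5, the eigenvalues are 1.5 and 0.5, so $m_{e2} = 2 - 0.5 = 1.5$, which lies between 1 and $j = 2$.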
van der Sluis, S., Posthuma, D., & Dolan, C. V. (2013). TATES: Efficient Multivariate Genotype-Phenotype Analysis for Genome-Wide Association Studies. PLoS Genetics, 9(1), e1003235.