$E(X_i \mid P_i)$ in (1) is computed using Mendel’s laws under the null hypothesis of no association and conditional on the trait $T_i$ as well as the parental genotypes (denoted as $P_i$ for the $i$-th family). Under the same conditional distribution, we can compute $\mathrm{Var}(U)$; the large sample FBAT statistic is defined as $Z = U / \sqrt{\mathrm{Var}(U)}$, where $\mathrm{Var}(U) = \sum_i T_i^2 \, \mathrm{Var}(X_i \mid P_i)$. Under the null hypothesis of no association, $Z$ is approximately $N(0,1)$. The formula extends easily to the case where multiple offspring are sampled in a family, for testing the null hypothesis of no association and no linkage.
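As a minimal sketch of the univariate statistic described above, the following assumes the simplest design: one offspring per family, both parents genotyped, and additive coding (the genotype is the count of a chosen allele, 0, 1 or 2). Under Mendelian transmission the conditional mean is then the average of the two parental genotypes, and each heterozygous parent contributes 1/4 to the conditional variance. The function name `fbat_z` and its arguments are illustrative and are not taken from the FBAT software.

```python
import math

def fbat_z(offspring, father, mother, trait):
    """Large-sample FBAT Z for trios under additive coding (illustrative helper).

    offspring, father, mother: allele counts (0, 1 or 2), one entry per family.
    trait: coded offspring trait, one entry per family.
    """
    u = 0.0
    var_u = 0.0
    for x, gf, gm, t in zip(offspring, father, mother, trait):
        # Mendel's laws: each parent transmits the counted allele with
        # probability genotype / 2, so the mean is the parental average.
        expected = (gf + gm) / 2.0
        # Only heterozygous parents add variance, 1/4 each.
        var_x = 0.25 * ((gf == 1) + (gm == 1))
        u += t * (x - expected)
        var_u += t * t * var_x
    if var_u == 0.0:
        raise ValueError("no informative families: Var(U) is zero")
    return u / math.sqrt(var_u)
```

Families in which both parents are homozygous contribute nothing to either the numerator or the variance, which is why such families are called noninformative for the test.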
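The FBAT Multi-Marker test is a multivariate extension of the univariate FBAT test designed to simultaneously test a set of markers in a defined region, such as a gene. It belongs to the general class of ‘gene-based tests’ since a set of $M$ univariate tests in a gene is replaced by a single multivariate test. Let $U_m$ and $Z_m$ denote the statistics defined above for the $m$-th marker, $m = 1, \dots, M$. The $U_m$ are generally correlated across markers, so a joint test requires an estimate of their covariance.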
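Rakovski et al. [15] estimate the correlation matrix empirically as follows. Let $U = (U_1, \dots, U_M)^{\top}$ be the vector of FBAT statistics, which forms the basis of the multi-marker test. Let $V_E$, the empirical variance estimator, be the matrix with elements $(V_E)_{mm'} = \sum_i U_{im} U_{im'}$, and let $V_D$ be the diagonal matrix with elements equal to the $\mathrm{Var}(U_m)$’s, where $U_{im} = T_i \, (X_{im} - E(X_{im} \mid P_i))$ is the contribution of the $i$-th family to $U_m$. The corresponding adjusted variance matrix is defined by
$$V_A = V_D^{1/2} \, \mathrm{diag}(V_E)^{-1/2} \, V_E \, \mathrm{diag}(V_E)^{-1/2} \, V_D^{1/2}.$$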
Note that $V_E$ is a variance-covariance matrix, with all elements estimated empirically. However, the diagonal elements of $\mathrm{Var}(U)$ can be calculated directly, provided there is no linkage between any marker and the true disease locus. $V_A$ is an ‘adjusted’ variance-covariance matrix that replaces the empirical variances with the exact ones. The multi-marker test is then defined as
$$T = U^{\top} V_A^{-} U,$$
where $V_A^{-}$ denotes a generalized inverse of $V_A$.
In large samples, $T$ will be approximately $\chi^2$ distributed with degrees of freedom equal to the rank of $V_A$. This asymptotic approximation relies on the asymptotic normality of each marker test $Z_m$, and may not be valid in the rare variant setting.
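The construction of the adjusted variance matrix and the resulting quadratic form can be sketched as follows. This is an illustration under stated assumptions rather than the FBAT implementation: the per-family contributions to each marker statistic and the exact variances are taken as already computed, the Moore-Penrose pseudo-inverse from NumPy stands in for the generalized inverse, and the name `fbat_mm` is hypothetical.

```python
import numpy as np

def fbat_mm(contrib, var_exact):
    """Multi-marker quadratic form with an adjusted variance matrix (sketch).

    contrib:   (n_families, M) array of per-family contributions to each
               marker statistic.
    var_exact: length-M array of exact variances of the marker statistics.
    Returns the test statistic and its degrees of freedom.
    """
    contrib = np.asarray(contrib, dtype=float)
    var_exact = np.asarray(var_exact, dtype=float)

    u = contrib.sum(axis=0)            # vector of marker statistics
    v_e = contrib.T @ contrib          # empirical variance estimator
    emp_sd = np.sqrt(np.diag(v_e))
    if np.any(emp_sd == 0.0):
        raise ValueError("a marker has zero empirical variance")

    # Rescale rows and columns so the diagonal holds the exact variances
    # while the empirical correlations are kept unchanged.
    scale = np.sqrt(var_exact) / emp_sd
    v_a = v_e * np.outer(scale, scale)

    stat = float(u @ np.linalg.pinv(v_a) @ u)
    df = int(np.linalg.matrix_rank(v_a))
    return stat, df
```

The statistic would then be referred to a chi-square distribution with `df` degrees of freedom; as noted above, that reference distribution may be unreliable when the markers are rare variants.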
Several papers have noted that tests of multiple markers can be greatly improved upon by taking optimal linear combinations of the individual tests [8], [16], [18], [19], but a major issue is determining the optimal weights, since these depend upon the unknown effect of each marker. Xu et al. [16] proposed a method to handle this problem by using the portion of the family data that is not used in constructing the FBAT statistics, e.g. the noninformative families [13], [20]. The approach is designed for measured outcomes, or at least for cases where both affected and unaffected offspring are sampled. It can be extended in principle to the setting where only affected trios are available [21], but this is beyond the scope of this paper. An additional caveat of the FBAT-LC approach is that estimation of the weights can be invalidated by population substructure.
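To make the idea of a linear combination test concrete, the sketch below combines per-marker Z statistics with a given weight vector and standardizes by the variance implied by their correlation matrix. It does not reproduce the weight estimation of Xu et al.: the weights are assumed to have been obtained from data independent of the FBAT statistics, and the function name `linear_combination_z` is illustrative.

```python
import math

def linear_combination_z(z, weights, corr):
    """Standardized weighted sum of correlated Z statistics (sketch).

    z:       per-marker Z statistics.
    weights: one weight per marker, assumed independent of z.
    corr:    M x M correlation matrix of the Z statistics (nested lists).
    """
    m = len(z)
    numerator = sum(weights[a] * z[a] for a in range(m))
    # Variance of the weighted sum under the null: w' R w.
    variance = sum(
        weights[a] * corr[a][b] * weights[b]
        for a in range(m)
        for b in range(m)
    )
    if variance <= 0.0:
        raise ValueError("weights give a non-positive variance")
    return numerator / math.sqrt(variance)
```

If the weights were instead chosen using the same data that enter the Z statistics, the combined statistic would no longer be approximately N(0,1) under the null, which is why the independent portion of the family data matters.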