We first consider a sample of $n$ trios (one offspring with genotype information available on both parents) and review the single-variant setting. The general FBAT statistic is a covariance between the offspring genotype and trait. Let $X_i$ and $Y_i$ denote the genotype at the variant and the trait, respectively, for the offspring in the $i$-th family. In general, $Y_i$ can be either measured or dichotomous, and we can use an offset $\mu$ to center the trait appropriately [17]. For family samples with dichotomous traits, such as affected trios or discordant sibpairs, $\mu$ is often taken to be zero; with measured outcomes, the mean of the outcome is usually chosen as the offset. For the additive model, $X_i$ is the number of copies of the minor allele at the locus of interest. We define

$$U = \sum_{i=1}^{n} (Y_i - \mu)\,\bigl(X_i - E(X_i \mid P_i)\bigr). \qquad (1)$$
The expectation $E(X_i \mid P_i)$ in (1) is computed using Mendel's laws under the null hypothesis of no association, conditional on the trait as well as the parental genotypes (denoted $P_i$ for the $i$-th family). Under the same conditional distribution, we can compute $\mathrm{Var}(U)$; the large-sample FBAT statistic is defined as

$$Z = \frac{U}{\sqrt{\mathrm{Var}(U)}}, \qquad (2)$$

where $\mathrm{Var}(U) = \sum_{i=1}^{n} (Y_i - \mu)^2\, \mathrm{Var}(X_i \mid P_i)$. Under the null hypothesis of no association, $Z$ is approximately $N(0,1)$. The formula extends easily to the case where multiple offspring are sampled per family, for testing the null hypothesis of no association and no linkage.
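The single-variant computation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the FBAT software itself: it assumes complete trios, an additive coding (0, 1 or 2 minor alleles), and independent Mendelian transmission from each parent. The function names `mendel_moments` and `fbat_single_variant` and the tuple layout of the input are hypothetical choices made for this example.

```python
from math import sqrt
from typing import Iterable, Tuple

Trio = Tuple[int, int, int, float]  # (father, mother, child, trait); genotypes are minor-allele counts


def mendel_moments(father: int, mother: int) -> Tuple[float, float]:
    """Mean and variance of the offspring minor-allele count given parental genotypes.

    Assumption: a parent carrying g minor alleles transmits one with probability g/2,
    and the two transmissions are independent (Mendel's laws under the null).
    """
    pf, pm = father / 2.0, mother / 2.0
    mean = pf + pm
    var = pf * (1.0 - pf) + pm * (1.0 - pm)
    return mean, var


def fbat_single_variant(trios: Iterable[Trio], offset: float = 0.0) -> Tuple[float, float, float]:
    """Return (Z, U, Var(U)) for one variant, following equations (1) and (2)."""
    u = 0.0
    var_u = 0.0
    for father, mother, child, trait in trios:
        mean, var = mendel_moments(father, mother)
        t = trait - offset                 # centered trait, Y_i - mu
        u += t * (child - mean)            # family contribution to U
        var_u += t * t * var               # family contribution to Var(U)
    if var_u == 0.0:
        # Happens when no parent is heterozygous (or all centered traits are zero).
        raise ValueError("no informative families: Var(U) is zero")
    return u / sqrt(var_u), u, var_u


# Four affected trios (trait = 1, offset = 0), purely illustrative data.
example_trios = [(1, 0, 1, 1.0), (1, 0, 1, 1.0), (1, 1, 2, 1.0), (0, 1, 0, 1.0)]
z, u, var_u = fbat_single_variant(example_trios)
```

For the illustrative data above, $U = 1.5$ and $\mathrm{Var}(U) = 1.25$, giving $Z \approx 1.34$. Families in which both parents are homozygous contribute zero to both sums, which is why only heterozygous parents are informative.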
The FBAT Multi-Marker test is a multivariate extension of the univariate FBAT test, designed to test simultaneously a set of markers in a defined region, such as a gene. It belongs to the general class of 'gene-based tests', since a set of $M$ univariate tests in a gene is replaced by a single multivariate test. Let $U_m$ and $Z_m$ denote the statistics in equations (1) and (2), defined for the $m$-th marker. Assuming samples large enough to contain sufficiently many heterozygous parents, each $Z_m$ is approximately $N(0,1)$, but the $M$ markers may be correlated because of linkage disequilibrium (LD) in the region. Provided we have an estimate of the correlation matrix, we can obtain an $M$ degree-of-freedom test of the null hypothesis of no association between any of the $M$ variants and the disease, versus the alternative that at least one marker is in LD with a disease locus.
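Rakovski et al. [15] estimate the correlation matrix empirically as follows. Let $\mathbf{U} = (U_1, \ldots, U_M)^{\top}$ be the vector of FBAT statistics, which forms the basis of the multi-marker test. Let $V_E$, the empirical variance estimator, be the $M \times M$ matrix with elements $V_E[j,k] = \sum_{i} U_{ij} U_{ik}$, and let $V_D$ be the diagonal matrix with elements equal to the $\mathrm{Var}(U_m)$'s, where $U_m = \sum_{i} U_{im}$ and $U_{im}$ is the contribution of the $i$-th family to $U_m$. The corresponding adjusted variance matrix is defined by

$$V_A = V_D^{1/2}\, \mathrm{diag}(V_E)^{-1/2}\, V_E\, \mathrm{diag}(V_E)^{-1/2}\, V_D^{1/2}.$$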
Note that $V_E$ is a variance-covariance matrix with all elements estimated empirically. However, the diagonal elements of $\mathrm{Var}(\mathbf{U})$ can be calculated directly, provided there is no linkage between any marker and the true disease locus. $V_A$ is an 'adjusted' variance-covariance matrix that replaces the empirical variances with the exact ones. The multi-marker test is then defined as

$$T = \mathbf{U}^{\top} V_A^{-}\, \mathbf{U},$$

where $V_A^{-}$ denotes a generalized inverse of $V_A$.
In large samples, $T$ will be approximately $\chi^2$ distributed with degrees of freedom equal to the rank of $V_A$. This approximation relies on the asymptotic normality of each marker test $Z_m$, and may not be valid in the rare variant setting.
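The construction of the adjusted variance matrix and the quadratic-form test can be sketched as follows. This is an illustrative sketch under assumptions, not the implementation of [15]: it takes as input a hypothetical $n \times M$ table `contrib` of per-family contributions $U_{im}$ together with the exact variances $\mathrm{Var}(U_m)$, it assumes every marker has at least one informative family, and it assumes the adjusted matrix has full rank so that an ordinary inverse can replace the generalized inverse. All function names are made up for this example.

```python
from math import sqrt
from typing import List, Sequence

Matrix = List[List[float]]


def adjusted_variance(contrib: Sequence[Sequence[float]], exact_var: Sequence[float]) -> Matrix:
    """Rescale the empirical covariance so its diagonal equals the exact variances."""
    m = len(exact_var)
    # Empirical estimator: V_E[j][k] = sum over families of U_ij * U_ik.
    v_e = [[sum(row[j] * row[k] for row in contrib) for k in range(m)] for j in range(m)]
    if any(v_e[j][j] <= 0.0 for j in range(m)):
        raise ValueError("a marker has no informative family")
    scale = [sqrt(exact_var[j] / v_e[j][j]) for j in range(m)]
    return [[scale[j] * v_e[j][k] * scale[k] for k in range(m)] for j in range(m)]


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting (full rank assumed)."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; a generalized inverse would be needed")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        rest = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - rest) / aug[i][i]
    return x


def multimarker_t(contrib: Sequence[Sequence[float]], exact_var: Sequence[float]) -> float:
    """Quadratic form T = U' V_A^{-1} U, with U the column sums of `contrib`."""
    m = len(exact_var)
    u = [sum(row[j] for row in contrib) for j in range(m)]
    v_a = adjusted_variance(contrib, exact_var)
    x = solve(v_a, u)
    return sum(u[j] * x[j] for j in range(m))
```

With a single marker the statistic reduces to $Z^2$ from the univariate test, which gives a quick sanity check. In the full-rank case, $T$ would be referred to a $\chi^2$ distribution with $M$ degrees of freedom; the rank-deficient case needs a generalized inverse and is deliberately left out of this sketch.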
Several papers have noted that tests of multiple markers can be greatly improved by taking optimal linear combinations of the individual tests [8], [16], [18], [19]. A major issue is determining the weights, since the optimal weights depend on the unknown effect of each marker. Xu et al. [16] proposed a method, FBAT-LC, that handles this problem by using the portion of the family data that is not used in constructing the FBAT statistics, e.g. the noninformative families [13], [20]. The approach is designed for measured outcomes, or at least for cases where both affected and unaffected offspring are sampled. It can be extended in principle to the setting where we have only affected trios [21], but this is beyond the scope of this paper. A further caveat of the FBAT-LC approach is that estimation of the weights can be invalidated by population substructure.
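For intuition, the generic form of a linear-combination test can be written out explicitly. This is a sketch of the general idea only, not necessarily the exact estimator of [16]: the weight vector $w$ and the covariance matrix $V$ are symbols introduced here for illustration, and $w$ is assumed to be fixed or estimated from data independent of the vector of marker statistics $\mathbf{U}$.

```latex
% Generic linear-combination statistic (illustrative; w and V are assumed symbols).
% U = (U_1, ..., U_M)^T is the vector of marker statistics, V its null covariance matrix.
\[
  Z_{LC} \;=\; \frac{w^{\top}\mathbf{U}}{\sqrt{w^{\top} V\, w}},
  \qquad
  w = (w_1, \ldots, w_M)^{\top}.
\]
% If U is approximately N(0, V) under the null and w is independent of U,
% then Z_{LC} is approximately N(0, 1): a one degree-of-freedom test
% in place of the M degree-of-freedom quadratic form.
```

The independence requirement is what motivates estimating the weights from families that do not contribute to $\mathbf{U}$: weights chosen from the same data would bias the null distribution of the combined statistic.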