Estimating Genetic Heritability via LDAK Model

We first construct the n × m genotype matrix X by centering and scaling the allele counts for each SNP according to X_{i,j} = (S_{i,j} − 2f_j) × [2f_j (1 − f_j)]^{α/2}, where f_j = Σ_i S_{i,j} / (2n). If w_j and r_j denote the LD weight9 and information score for SNP j, then the LDAK Model for estimating SNP heritability

h_{SNP}^{2} = σ_{g}^{2} / (σ_{g}^{2} + σ_{e}^{2})

is:

\begin{array}{l} Y_{i} = \sum_{k = 1}^{p} θ_{k} Z_{i, k} + \sum_{j = 1}^{m} β_{j} X_{i, j} + e_{i}, \quad \text{with} \\ β_{j} \sim ℕ (0, r_{j} w_{j} σ_{g}^{2} / W), \quad e_{i} \sim ℕ (0, σ_{e}^{2}) \\ \text{and} \quad W = \sum_{j = 1}^{m} r_{j} w_{j} {[2 f_{j} (1 - f_{j})]}^{1 + α} . \end{array} \qquad (2)

θ_k denotes the fixed-effect coefficient for the kth covariate, β_j and e_i are random effects indicating the effect size of SNP j and the noise component for individual i, while σ_{g}^{2} and σ_{e}^{2} are interpreted as genetic and environmental variances, respectively. Note that the introduction of r_j is an addition to the model we proposed in 2012.9
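
The standardization of the allele counts and the normalizing constant W can be sketched in a few lines of NumPy. This is a minimal illustration, not the LDAK implementation: the function names, the toy allele counts, and the choice α = −1 are assumptions made for the example, and every SNP is assumed to be polymorphic so that 2f_j (1 − f_j) > 0.

```python
import numpy as np

def standardize_genotypes(S, alpha):
    """Return (X, f) with X_ij = (S_ij - 2 f_j) * [2 f_j (1 - f_j)]^(alpha / 2)."""
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    f = S.sum(axis=0) / (2 * n)      # allele frequency f_j = sum_i S_ij / (2n)
    het = 2 * f * (1 - f)            # 2 f_j (1 - f_j); assumed > 0 for every SNP
    X = (S - 2 * f) * het ** (alpha / 2)
    return X, f

def normalizer_W(f, w, r, alpha):
    """W = sum_j r_j w_j [2 f_j (1 - f_j)]^(1 + alpha)."""
    f, w, r = (np.asarray(a, dtype=float) for a in (f, w, r))
    het = 2 * f * (1 - f)
    return float(np.sum(r * w * het ** (1 + alpha)))

# Toy allele counts for n = 4 individuals and m = 3 SNPs (hypothetical data).
S = [[0, 1, 2],
     [1, 1, 0],
     [2, 0, 1],
     [1, 2, 1]]
X, f = standardize_genotypes(S, alpha=-1.0)
W = normalizer_W(f, w=np.ones(3), r=np.ones(3), alpha=-1.0)
print(f)   # per-SNP allele frequencies
print(W)   # with w_j = r_j = 1 and alpha = -1, W equals the number of SNPs
```

Each column of X is centered by construction, so its mean is zero; the exponent α/2 only controls how strongly low-frequency SNPs are up- or down-weighted.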
Model (2) is equivalent to assuming:44,45

Y \sim ℕ (Z θ, K σ_{g}^{2} + I σ_{e}^{2}), \quad \text{with} \quad K = \frac{X Ω X^{T}}{W}, \qquad (3)

where I is an n × n identity matrix and Ω denotes a diagonal matrix with diagonal entries (r₁w₁, …, r_mw_m). The kinship matrix K, also referred to as a genetic relationship matrix (GRM)1 or genomic similarity matrix (GSM),46 consists of average allelic correlations across the SNPs (adjusted for LD and genotype certainty). Model (3) is typically solved using REstricted Maximum Likelihood (REML), which returns estimates of θ₁, …, θ_p, σ_{g}^{2} and σ_{e}^{2}.12
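
As an illustration of how the kinship matrix K = XΩX^T / W could be assembled, the sketch below recomputes the standardized genotypes and applies the diagonal weights r_j w_j. It is a hedged example only: the function name, the toy allele counts, the unit weights, and α = −1 are assumptions, and the code does not reproduce how LDAK or GCTA store or compute kinships.

```python
import numpy as np

def kinship_matrix(S, w, r, alpha):
    """Return K = X Omega X^T / W, with Omega = diag(r_j w_j)."""
    S = np.asarray(S, dtype=float)
    w = np.asarray(w, dtype=float)
    r = np.asarray(r, dtype=float)
    f = S.sum(axis=0) / (2 * S.shape[0])     # allele frequencies f_j
    het = 2 * f * (1 - f)                    # assumed > 0 for every SNP
    X = (S - 2 * f) * het ** (alpha / 2)     # standardized genotypes
    omega = r * w                            # diagonal entries of Omega
    W = np.sum(omega * het ** (1 + alpha))   # normalizing constant
    return (X * omega) @ X.T / W             # scales column j of X by r_j w_j

# Toy allele counts for 4 individuals and 3 SNPs (hypothetical data).
S = [[0, 1, 2],
     [1, 1, 0],
     [2, 0, 1],
     [1, 2, 1]]
K = kinship_matrix(S, w=np.ones(3), r=np.ones(3), alpha=-1.0)
print(K.shape)   # one row and one column per individual
```

Because the columns of X are centered, every row of K sums to zero, which is a convenient sanity check on any implementation.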
The heritability of SNP j can be estimated by

h_{j}^{2} = β_{j}^{2} Var (X_{j}) / Var (Y),

which under Model (2), and assuming Hardy-Weinberg Equilibrium,47,48 has expectation

𝔼 [h_{j}^{2}] = \frac{𝔼 [β_{j}^{2}] \times Var (X_{j})}{Var (Y)} = \frac{r_{j} w_{j} σ_{g}^{2} / W \times {[2 f_{j} (1 - f_{j})]}^{1 + α}}{Var (Y)} .

If P₁ and P₂ index two sets of SNPs of size |P₁| and |P₂|, then under the LDAK Model, they are expected to contribute heritability in the ratio W₁ : W₂, where W_l = Σ_{j∈P_l} r_j w_j [2f_j (1 − f_j)]^{1+α}. The GCTA Model corresponds to setting w_j = r_j = 1, in which case W_l = Σ_{j∈P_l} [2f_j (1 − f_j)]^{1+α}. Most applications of GCTA have further assumed α = −1, so that W_l = |P_l|, which corresponds to the assumption that SNP sets are expected to contribute heritability proportional to the number of SNPs they contain.
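
The expected ratio W₁ : W₂ can be checked numerically. The sketch below is illustrative only: the function name, allele frequencies, LD weights, and SNP sets are hypothetical, and the shares are normalized to sum to one on the assumption that the sets are disjoint.

```python
import numpy as np

def expected_shares(f, w, r, alpha, sets):
    """Expected heritability share of each SNP set, proportional to W_l."""
    f, w, r = (np.asarray(a, dtype=float) for a in (f, w, r))
    per_snp = r * w * (2 * f * (1 - f)) ** (1 + alpha)   # term for SNP j
    W_l = np.array([per_snp[list(idx)].sum() for idx in sets])
    return W_l / W_l.sum()

f = np.array([0.1, 0.2, 0.3, 0.4, 0.5])   # hypothetical allele frequencies
sets = [[0, 1], [2, 3, 4]]                # P1 has 2 SNPs, P2 has 3 SNPs
ones = np.ones(5)

# GCTA Model with alpha = -1: shares are proportional to set sizes (2 : 3).
gcta = expected_shares(f, w=ones, r=ones, alpha=-1.0, sets=sets)

# LDAK Model with hypothetical LD weights: shares follow the summed weights (1 : 2).
ldak = expected_shares(f, w=np.array([1.0, 0.0, 0.5, 0.5, 1.0]), r=ones,
                       alpha=-1.0, sets=sets)
print(gcta, ldak)
```

With α = −1 the frequency term drops out, so the only difference between the two calls is the LD weighting: a SNP with weight zero contributes nothing to its set's expected share.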
Model (2) assumes that all effect sizes can be described by a single prior distribution. This assumption is relaxed by SNP partitioning. Suppose that the SNPs are divided into tranches P₁, …, P_L of sizes |P₁|, …, |P_L|; typically these will partition the genome, so that each SNP appears in exactly one tranche and Σ_l |P_l| = m, but this is not required. This corresponds to generalizing Model (2), so that SNPs in Tranche l have effect-size prior distribution

β_{j} \sim ℕ (0, r_{j} w_{j} σ_{l}^{2} / W_{l}) .

Letting Σ = σ_{1}^{2} + \dots + σ_{L}^{2}, we have h_{SNP}^{2} = Σ / (Σ + σ_{e}^{2}), while σ_{l}^{2} / Σ represents the contribution to h_{SNP}^{2} of SNPs in Tranche l. This model can equivalently be expressed as

Y \sim ℕ (Z θ, K_{1} σ_{1}^{2} + \dots + K_{L} σ_{L}^{2} + I σ_{e}^{2}),

where K_l represents allele correlations across the SNPs in Tranche l.
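
A short sketch of the multi-tranche bookkeeping may help: it assembles the phenotypic covariance K₁σ₁² + … + K_Lσ_L² + Iσ_e² and derives h²_SNP and the per-tranche shares from the variance components. The function names, the identity kinships, and the variance values are hypothetical placeholders; in practice the variance components would come from a REML fit.

```python
import numpy as np

def total_covariance(kinships, sigma2, sigma2_e):
    """V = sum_l K_l * sigma_l^2 + I * sigma_e^2."""
    n = kinships[0].shape[0]
    V = sigma2_e * np.eye(n)
    for K_l, s2 in zip(kinships, sigma2):
        V = V + s2 * K_l
    return V

def heritability_summary(sigma2, sigma2_e):
    """Return (h2_SNP, per-tranche shares sigma_l^2 / Sigma)."""
    sigma2 = np.asarray(sigma2, dtype=float)
    total = sigma2.sum()                      # Sigma = sigma_1^2 + ... + sigma_L^2
    return total / (total + sigma2_e), sigma2 / total

# Hypothetical inputs: two tranches, three individuals, placeholder kinships.
kinships = [np.eye(3), np.eye(3)]
sigma2 = [0.2, 0.1]
sigma2_e = 0.7
V = total_covariance(kinships, sigma2, sigma2_e)
h2_snp, shares = heritability_summary(sigma2, sigma2_e)
print(h2_snp, shares)
```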
For analyses under the LDAK Model, we used LDAK v.5; for analyses under the GCTA Model, we used GCTA v.1.26. For about a third of GCTA-LDMS analyses, the GCTA REML solver failed with the error “information matrix is not invertible,” in which case we reran the analysis using LDAK (while the GCTA and LDAK solvers are both based on Average Information REML,28,49 subtle differences mean that when using a large number of tranches, one might complete while the other fails). For the few occasions when both solvers failed, we instead used “GCTA-LD” (i.e., SNPs divided only by LD, rather than by LD and MAF), which we found gave very similar results to GCTA-LDMS for traits where both completed (Supplementary Fig. 7). For diseases, we converted estimates of h_{SNP}^{2} to the liability scale based on the observed case-control ratio and assumed prevalence.26,27 In general, we copied the prevalences used by previous studies; however, for tuberculosis, where no previous estimate of h_{SNP}^{2} is available, we derived an estimate of prevalence from World Health Organization data50 (see Supplementary Note).
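
The text does not spell out the liability-scale conversion. The sketch below uses a widely used transformation for ascertained case-control data, h²_liab = h²_obs × K²(1 − K)² / [z² P(1 − P)], where K is the assumed prevalence, P is the proportion of cases in the sample, and z is the standard normal density at the liability threshold. That this matches the cited procedure exactly is an assumption; the function name and input values are hypothetical.

```python
from statistics import NormalDist

def observed_to_liability(h2_obs, prevalence, case_fraction):
    """Convert an observed-scale h2 estimate to the liability scale.

    prevalence and case_fraction must lie strictly between 0 and 1.
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1 - prevalence)   # liability threshold
    z = std_normal.pdf(threshold)                    # normal density at threshold
    K, P = prevalence, case_fraction
    return h2_obs * (K * (1 - K)) ** 2 / (z ** 2 * P * (1 - P))

# Hypothetical example: observed-scale estimate 0.25, 1% prevalence,
# and an equal number of cases and controls in the sample.
h2_liability = observed_to_liability(0.25, prevalence=0.01, case_fraction=0.5)
print(h2_liability)
```

The conversion factor depends strongly on the assumed prevalence, which is why the choice of prevalence for a trait such as tuberculosis needs to be justified separately.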

Speed D., Cai N., Johnson M.R., Nejentsev S., & Balding D.J. (2017). Re-evaluation of SNP heritability in complex human traits. Nature Genetics, 49(7), 986-992.

Other organizations : University College London, Wellcome Sanger Institute, Imperial College London, University of Cambridge, University of Melbourne

Variable analysis

independent variables

SNP genotypes (X_i,j)

dependent variables

Phenotype (Y_i)

control variables

Covariates (Z_i)
Minor allele frequency (f_j)
LD weight (w_j)
Information score (r_j)
