Efficient Genotype Calling with optiCall

optiCall uses deviation from Hardy–Weinberg equilibrium (HWE) as an indicator of clustering quality. HWE is tested with a χ² test unless the sample size is small (an expected count below 50 for any genotype under HWE, or an allele count below 100 for either allele), in which case an exact test is used (Wigginton et al., 2005). SNPs with an HWE P-value below a given threshold (P<5×10⁻¹⁵ by default) are deemed to be poorly called. optiCall attempts to improve the genotype calls at these SNPs by running a Student's t-based mixture model again, this time omitting the SNP-wise and sample-wise prior. This rescue step is primarily intended to give better genotype calls at SNPs where the genotype intensity clouds lie outside the expected regions defined by the within-sample and across-sample prior. The statistical model is as described in (1) and (2), with the intensity values first transformed according to (8) to improve calling of SNPs with shifted intensities (Teo et al., 2007).
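The test selection and rescue trigger described above can be sketched in Python. This is a minimal illustration using only the standard library, not optiCall's own code: all function names are hypothetical, the χ² statistic is assumed to have one degree of freedom (the usual choice for a biallelic HWE test), and the exact test of Wigginton et al. is not implemented here.

```python
import math

def hwe_expected(n_aa, n_ab, n_bb):
    """Expected genotype counts under HWE (assumes at least one called genotype)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    return (n * p * p, 2 * n * p * q, n * q * q)

def needs_exact_test(n_aa, n_ab, n_bb):
    """True when the sample is 'small' in the sense described in the text."""
    allele_a = 2 * n_aa + n_ab
    allele_b = 2 * n_bb + n_ab
    small_genotype = min(hwe_expected(n_aa, n_ab, n_bb)) < 50
    small_allele = min(allele_a, allele_b) < 100
    return small_genotype or small_allele

def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square HWE P-value, assuming 1 degree of freedom."""
    observed = (n_aa, n_ab, n_bb)
    expected = hwe_expected(n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2.0))

def triggers_rescue(p_value, threshold=5e-15):
    """A SNP is treated as poorly called when its HWE P-value is below the threshold."""
    return p_value < threshold
```

For example, `hwe_chi2_pvalue(500, 0, 500)` describes a SNP with no heterozygotes at all, which falls far below the default threshold and would be sent to the rescue step.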

Inference is as in (2.2), by the EM algorithm. The ν_i are fixed at 1 for all classes except the heterozygous class, for which ν_i is fixed at 1.3. The values of μ_i and Σ_i for the unknown class are fixed at the same values as in (2.2).
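As a rough illustration of the E-step for such a mixture, the sketch below evaluates a bivariate Student's t density and the resulting class responsibilities. It is not optiCall's implementation: the function names, the example locations and scale matrices, and in particular the values used for the unknown class are placeholders chosen for illustration, since the values from (2.2) are not reproduced in this text.

```python
import math

def t_density_2d(y, mu, sigma, nu):
    """Bivariate Student's t density; sigma is a 2x2 scale matrix as nested tuples."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    dx, dy = y[0] - mu[0], y[1] - mu[1]
    # Quadratic form (y - mu)^T sigma^{-1} (y - mu) via the closed-form 2x2 inverse
    delta = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    # In two dimensions Gamma((nu + 2) / 2) / (Gamma(nu / 2) * nu * pi) reduces to 1 / (2 * pi)
    return (1.0 + delta / nu) ** (-(nu + 2.0) / 2.0) / (2.0 * math.pi * math.sqrt(det))

def e_step(points, components):
    """Posterior class probabilities (responsibilities) for each intensity point."""
    resp = []
    for y in points:
        w = [c["pi"] * t_density_2d(y, c["mu"], c["sigma"], c["nu"]) for c in components]
        total = sum(w)
        resp.append([wi / total for wi in w])
    return resp

identity = ((1.0, 0.0), (0.0, 1.0))
components = [
    {"pi": 0.25, "mu": (0.0, 0.0), "sigma": identity, "nu": 1.0},   # homozygous
    {"pi": 0.25, "mu": (5.0, 0.0), "sigma": identity, "nu": 1.3},   # heterozygous
    {"pi": 0.25, "mu": (10.0, 0.0), "sigma": identity, "nu": 1.0},  # homozygous
    # Unknown class: placeholder location and scale, NOT the values from (2.2)
    {"pi": 0.25, "mu": (5.0, 5.0), "sigma": ((25.0, 0.0), (0.0, 25.0)), "nu": 1.0},
]
responsibilities = e_step([(0.2, -0.1), (5.1, 0.3)], components)
```

Each row of `responsibilities` sums to one, and the heavy tails of the t density (small ν) keep outlying points from dominating the parameter updates in the M-step, which is omitted here.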
All four classes have initial class probabilities set to 0.25, and for the three genotype classes the initial covariance matrices are set to (2c/N)×I₂, with c the cost (Arthur and Vassilvitskii, 2007) of a k-means++ clustering on the data and N the number of intensity points. The transformation of intensities has accounted for shifts, so the location parameters of the two homozygous classes can be initialized to the extremes of y₍₁₎, and the heterozygous class will then fall somewhere in between; thus the μ_i are initialized to

where the min/max are taken over a filtered version of the intensity data, in which the lowest 1% of untransformed intensity values in the x₍₁₎ direction and the lowest 1% in the x₍₂₎ direction are removed. The initial values also use the mean of the y_j over the second axis, and k is a shift parameter for the location of the heterozygous class that takes one of three values (0.45, 0.5 or 0.55), giving three sets of initial values. For each set of starting values, the EM algorithm is run until genotype calls are concordant for two consecutive iterations, and the optimal parameters are chosen to be the final values with the highest likelihood.
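The initialization and multi-start selection can be sketched as follows. The exact initialization formula is not reproduced in this text, so the form used here is an assumption: homozygous locations at the minimum and maximum of the first transformed coordinate, the heterozygous location interpolated linearly between them by k, and the second coordinate of every location set to the mean of the second axis. The function names and the quantile rule in the filter are likewise hypothetical.

```python
def keep_indices(x, frac=0.01):
    """Indices of points outside the lowest `frac` of either untransformed axis.

    Assumes a non-empty list of (x1, x2) pairs; the tie handling is a simplification.
    """
    n_drop = int(frac * len(x))
    cut1 = sorted(p[0] for p in x)[n_drop]
    cut2 = sorted(p[1] for p in x)[n_drop]
    return [i for i, p in enumerate(x) if p[0] >= cut1 and p[1] >= cut2]

def initial_means(y, k):
    """Assumed starting locations for the two homozygous classes and the heterozygous class."""
    lo = min(p[0] for p in y)
    hi = max(p[0] for p in y)
    mean2 = sum(p[1] for p in y) / len(y)
    return [(lo, mean2), (lo + k * (hi - lo), mean2), (hi, mean2)]

def initial_covariance(cost, n):
    """(2c/N) x I_2, with c the k-means++ clustering cost and N the number of points."""
    s = 2.0 * cost / n
    return ((s, 0.0), (0.0, s))

def best_fit(fits):
    """Pick the (log_likelihood, params) pair with the highest likelihood."""
    return max(fits, key=lambda f: f[0])

# Toy transformed intensities, already filtered
y = [(0.0, 1.0), (0.4, 2.0), (1.0, 3.0)]
starts = {k: initial_means(y, k) for k in (0.45, 0.5, 0.55)}
```

Each entry of `starts` would seed one EM run; after all three runs converge, `best_fit` keeps the parameters from the run with the highest likelihood.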
Genotype calls are made using genotype posterior probabilities [using the π_i inferred from this step, unlike (2.3)] with a 0.7 call threshold. By default, SNPs that still fail the HWE test after this step have all genotypes called as unknown.
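A minimal sketch of this calling rule is shown below. The names are hypothetical, and two details are assumptions: the comparison against the threshold is taken to be inclusive, and the post-rescue HWE check is taken to use the same default P-value threshold as the initial check.

```python
GENOTYPES = ("AA", "AB", "BB")

def call_genotype(posteriors, threshold=0.7):
    """posteriors: (P(AA), P(AB), P(BB), P(unknown)) for one sample at one SNP."""
    best = max(range(3), key=lambda i: posteriors[i])
    # A genotype is called only if its posterior reaches the call threshold
    return GENOTYPES[best] if posteriors[best] >= threshold else "unknown"

def finalize_snp(calls, hwe_p_value, hwe_threshold=5e-15):
    """Set every call to unknown when the SNP still fails HWE after the rescue step."""
    if hwe_p_value < hwe_threshold:
        return ["unknown"] * len(calls)
    return list(calls)

calls = [call_genotype(p) for p in [(0.9, 0.05, 0.03, 0.02), (0.5, 0.4, 0.05, 0.05)]]
```

Here the first sample is called AA, while the second stays unknown because no genotype class reaches a posterior of 0.7.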
In our experiments, we have found that how often the rescue step is triggered, and the chance that a rescue then succeeds, both vary with the quality of the dataset. On a number of Immunochip datasets, the rescue step tended to be triggered at between 3% and 10% of SNPs, with 30–50% of these rescues being successful.

Shah T.S., Liu J.Z., Floyd J.A., Morris J.A., Wirth N., Barrett J.C., & Anderson C.A. (2012). optiCall: a robust genotype-calling algorithm for rare, low-frequency and common variants. Bioinformatics, 28(12), 1598-1603.

Publication 2012

Other organizations : Wellcome Sanger Institute

Protocol cited in 45 other protocols

Variable analysis

independent variables

Hardy–Weinberg equilibrium (HWE) P-value threshold
Student's t-based mixture model with omission of SNP and sample-wise prior
Transformation of intensities to account for shifts
Initialization of class probabilities, location parameters, and covariance matrices
Choice of shift parameter k (0.45, 0.5, or 0.55) for heterozygous class location

dependent variables

Genotype calls based on genotype posterior probabilities and a 0.7 call threshold

control variables

Fixed values of ν_i: 1 for all classes except the heterozygous class, which is fixed at 1.3
Fixed values of μ_i and Σ_i for the unknown class, identical to (2.2)
