Genotyping Repeat Expansions from Sequencing Data

Genotype probabilities for repeats no longer than the read length are calculated using a model similar to the one used for SNPs (Li et al. 2009). Namely, P(G|R) = P(R|G) · P(G)/P(R), where the genotype G is a tuple of repeat sizes with the number of entries equal to the ploidy of the chromosome containing the repeat. The probability P(R|G) is expressed in terms of the probabilities P(r_i|H_i) for individual reads r_i and repeat alleles H_i, as described in Li et al. (2009).
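The Bayes step above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `genotype_posteriors`, the uniform prior over genotypes, and the assumption that a read is equally likely to originate from each allele of the genotype are all choices made here for illustration. The per-read likelihoods P(r_i|H_i = n) are taken as given inputs.

```python
from itertools import combinations_with_replacement
from math import prod


def genotype_posteriors(read_likelihoods, allele_sizes, ploidy=2, prior=None):
    """Return {genotype: P(G|R)} for every genotype over allele_sizes.

    read_likelihoods: one dict per read, mapping allele size n -> P(r_i | H = n).
    prior: optional {genotype: P(G)}; uniform if omitted (an assumption).
    """
    genotypes = list(combinations_with_replacement(sorted(allele_sizes), ploidy))
    if prior is None:
        prior = {g: 1.0 / len(genotypes) for g in genotypes}

    unnormalized = {}
    for g in genotypes:
        # Assumption: each read is equally likely to come from any allele in g,
        # so its likelihood is the average over the alleles of the genotype.
        likelihood = prod(sum(lik[n] for n in g) / ploidy for lik in read_likelihoods)
        unnormalized[g] = likelihood * prior[g]

    evidence = sum(unnormalized.values())  # plays the role of P(R)
    if evidence == 0:
        raise ValueError("all genotypes have zero likelihood")
    return {g: v / evidence for g, v in unnormalized.items()}


# Toy input: two reads favor size 10, one read favors size 12.
reads = [{10: 0.9, 12: 0.1}, {10: 0.9, 12: 0.1}, {10: 0.1, 12: 0.9}]
posteriors = genotype_posteriors(reads, [10, 12])
best_genotype = max(posteriors, key=posteriors.get)
```

In this toy input the heterozygous genotype (10, 12) receives the highest posterior, because neither homozygous genotype explains all three reads well.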
If r_i is a spanning read containing m repeat units, P(r_i|H_i = n) = π · f(m|p, n, s), where π is defined as above (“Repeat size estimation from IRRs”). The frequency function f is defined by f(m|p, n, s) ∼ p(1 − p)^d, where m, n, and s are non-negative integers bounded by u, the maximum number of repeat units in a read; p ∈ (0, 1) corresponds to the proportion of molecules with a repeat of the expected size; and d = |n − m| if |n − m| < s and d = s otherwise. Note that f is defined similarly to the geometric frequency function, with the parameter d representing the deviation from the expected repeat size n (d can be at most s). If r_i is a flanking or in-repeat read containing m repeat units,

P(r_i | H_i = n) = π · \sum_{k=m}^{u} f(k | p, n, s).

In all our analyses, the parameters p and s were set to 0.97 and 5, respectively. The values were chosen to maximize Mendelian consistency of genotype calls in Platinum Genome pedigree samples (Eberle et al. 2017) on an unrelated set of repeats.
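The read-level probabilities defined above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the published code: the text defines f only up to proportionality, so normalizing it over m = 0, ..., u is an assumption made here, π is treated as a value supplied by the caller, and the function names are hypothetical.

```python
def frequency(m, n, u, p=0.97, s=5):
    """f(m | p, n, s), normalized over m = 0..u (normalization is an assumption)."""

    def weight(k):
        d = min(abs(n - k), s)  # deviation from the expected size n, capped at s
        return p * (1 - p) ** d

    total = sum(weight(k) for k in range(u + 1))
    return weight(m) / total


def spanning_likelihood(m, n, u, pi, p=0.97, s=5):
    """P(r_i | H_i = n) for a spanning read containing m repeat units."""
    return pi * frequency(m, n, u, p, s)


def flanking_likelihood(m, n, u, pi, p=0.97, s=5):
    """P(r_i | H_i = n) for a flanking or in-repeat read containing m repeat units.

    The read is consistent with any repeat of at least m units, so f is summed
    from m up to u.
    """
    return pi * sum(frequency(k, n, u, p, s) for k in range(m, u + 1))
```

With p = 0.97, the unnormalized weights for deviations d = 0, 1, 2 are 0.97, 0.0291, and 0.000873, so nearly all of the probability mass sits on the expected size n. All deviations of s or more share the same small weight, which keeps distant sizes from being ruled out entirely.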
We use read-length-sized repeats as a stand-in for repeats longer than the read length. If only one allele is expanded, we estimate the full size of the repeat as described above. If both alleles are expanded, the size intervals are estimated similarly by assuming that between 0 and 50% of in-repeat reads come from the short allele and between 50% and 100% of in-repeat reads come from the long allele.
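The interval construction for two expanded alleles can be sketched as below. The size estimator itself is described in an earlier part of the paper that is not reproduced here, so it appears only as a hypothetical caller-supplied function `estimate_size`, assumed to be nondecreasing in the number of in-repeat reads. The linear estimator in the usage lines is a toy placeholder, not the published formula.

```python
def expanded_allele_intervals(num_irrs, estimate_size):
    """Return ((short_lo, short_hi), (long_lo, long_hi)) for two expanded alleles.

    num_irrs: total number of in-repeat reads observed at the locus.
    estimate_size: hypothetical callable mapping an in-repeat read count to a
        repeat size; assumed nondecreasing.
    """
    half = num_irrs / 2
    # Short allele: assumed to account for between 0% and 50% of in-repeat reads.
    short = (estimate_size(0), estimate_size(half))
    # Long allele: assumed to account for between 50% and 100% of in-repeat reads.
    long_ = (estimate_size(half), estimate_size(num_irrs))
    return short, long_


# Toy estimator (placeholder only): 30 repeat units plus 2 units per in-repeat read.
short_interval, long_interval = expanded_allele_intervals(40, lambda k: 30 + 2 * k)
```

The two intervals share an endpoint by construction, which reflects that the 50% split between the alleles is a boundary case for both.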

Dolzhenko E., van Vugt J.J., Shaw R.J., Bekritsky M.A., van Blitterswijk M., Narzisi G., Ajay S.S., Rajan V., Lajoie B.R., Johnson N.H., Kingsbury Z., Humphray S.J., Schellevis R.D., Brands W.J., Baker M., Rademakers R., Kooyman M., Tazelaar G.H., van Es M.A., McLaughlin R., Sproviero W., Shatunov A., Jones A., Al Khleifat A., Pittman A., Morgan S., Hardiman O., Al-Chalabi A., Shaw C., Smith B., Neo E.J., Morrison K., Shaw P.J., Reeves C., Winterkorn L., Wexler N.S., Housman D.E., Ng C.W., Li A.L., Taft R.J., van den Berg L.H., Bentley D.R., Veldink J.H., & Eberle M.A. (2017). Detection of long repeat expansions from PCR-free whole-genome sequence data. Genome Research, 27(11), 1895-1903.

Publication 2017

Other organizations : Illumina (United States), University Medical Center Utrecht, Utrecht University, Illumina (United Kingdom), Mayo Clinic in Florida, New York Genome Center, SURFsara (Netherlands), Trinity College Dublin, Beaumont Hospital, King's College London, University College London, University of Southampton, University of Sheffield, Columbia University, Hereditary Disease Foundation, Massachusetts Institute of Technology

Variable analysis

independent variables

Proportion of molecules with repeat of the expected size (p)
Maximum number of repeat units in a read (u)

dependent variables

Genotype probabilities for repeats of size up to the read length
Probability P(r_i|H_i) for individual reads r_i and repeat alleles H_i

control variables

Maximum deviation from the expected repeat size (s)
Parameters p and s were set to 0.97 and 5 to maximize Mendelian consistency of genotype calls in Platinum Genome pedigree samples
