
BUSTED[S]: Detecting Positive Selection in the Presence of SRV

We adapted the existing BUSTED test of positive selection (Murrell et al. 2015) to account for the presence of synonymous site-to-site substitution rate variation (SRV) and call the new method BUSTED[S]. To explore the generality of our findings about false positive rates (FPRs) in the presence of SRV, we also investigated a second existing test of selection: the M1a versus M2a comparison of Wong et al. (2004), modified slightly to employ MG94 substitution models.
BUSTED[S] is a straightforward extension of BUSTED (Murrell et al. 2015). The nucleotide substitution process is modeled using the standard finite-state, continuous-time Markov process approach of Muse and Gaut (1994). The entries of the instantaneous rate matrix Q corresponding to substitutions between sense codons i and j are

q_{ij} = \begin{cases} \alpha^{s}\,\theta_{ij}\,\pi_{j}^{p} & \text{1-step synonymous change,} \\ \alpha^{s}\,\omega^{bs}\,\theta_{ij}\,\pi_{j}^{p} & \text{1-step nonsynonymous change,} \\ 0 & \text{otherwise.} \end{cases}
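The case structure of q_ij can be made concrete with a minimal Python sketch that evaluates a single off-diagonal entry for a pair of sense codons. The sketch makes several assumptions. It uses the standard genetic code. The GTR exchangeabilities θ are keyed by nucleotide pair, with θ_AG = 1. The position-specific target frequencies π_j^p follow the definitions in the paragraph after the equation. The function name `mg94_rate` and all parameter values are hypothetical illustrations, not HyPhy code. Diagonal entries (minus the row sums) are not computed.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order, '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def mg94_rate(codon_i, codon_j, alpha_s, omega_bs, theta, pi):
    """Off-diagonal MG94-style rate q_ij between two sense codons (hypothetical helper).

    theta: symmetric nucleotide exchangeabilities keyed by sorted pair,
           e.g. "AC", "CG"; theta["AG"] is fixed to 1 (GTR convention).
    pi:    three dicts, pi[p][nuc] = frequency of nucleotide nuc at codon
           position p (0-based, so pi[1] is the second position).
    """
    for codon in (codon_i, codon_j):
        if STANDARD_CODE.get(codon, "*") == "*":
            raise ValueError(f"{codon} is not a sense codon")
    diffs = [p for p in range(3) if codon_i[p] != codon_j[p]]
    if len(diffs) != 1:
        # Multi-step changes have rate 0; i == j (the diagonal, which would be
        # minus the row sum) is not computed by this sketch and also returns 0.
        return 0.0
    p = diffs[0]
    pair = "".join(sorted(codon_i[p] + codon_j[p]))  # theta_ij = theta_ji
    rate = alpha_s * theta[pair] * pi[p][codon_j[p]]  # target nucleotide frequency
    if STANDARD_CODE[codon_i] != STANDARD_CODE[codon_j]:
        rate *= omega_bs  # nonsynonymous change: scale by omega^{bs}
    return rate


# Illustrative (made-up) parameter values.
theta = {"AC": 0.8, "AG": 1.0, "AT": 0.6, "CG": 0.5, "CT": 2.0, "GT": 0.7}
pi = [
    {"A": 0.3, "C": 0.2, "G": 0.3, "T": 0.2},
    {"A": 0.3, "C": 0.25, "G": 0.25, "T": 0.2},
    {"A": 0.2, "C": 0.3, "G": 0.2, "T": 0.3},
]
print(mg94_rate("ACT", "AGT", 1.2, 0.3, theta, pi))  # Thr -> Ser, about 0.045
print(mg94_rate("ACT", "ACC", 1.2, 0.3, theta, pi))  # Thr -> Thr, about 0.72
```

The first call reproduces the q_{ACT,AGT} example from the text. It is a second-position C to G change, so its rate is α^s · ω^bs · θ_CG · π_G^2.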

The θ_ij (with θ_ij = θ_ji) are parameters governing nucleotide substitution biases. For example, θ_{ACT,AGT} = θ_CG. Because we incorporate the standard nucleotide GTR model, there are five identifiable θ_ij parameters: θ_AC, θ_AT, θ_CG, θ_CT, and θ_GT, with θ_AG ≡ 1. The position-specific equilibrium frequency of the target nucleotide of a substitution is π_j^p; for example, it is π_G^2 for the second-position change associated with q_{ACT,AGT}. The π_j^p and the stationary frequencies of codons under this model are estimated using the CF3×4 procedure (Kosakovsky Pond et al. 2010), adding nine parameters to the model.

The ratio of nonsynonymous to synonymous substitution rates for site s along branch b is ω^bs. This ratio is modeled using a 3-bin general discrete distribution (GDD) with five estimated hyperparameters:

0 \leq \omega_{1} \leq \omega_{2} \leq 1 \leq \omega_{3}, \quad p_{1} = P(\omega^{bs} = \omega_{1}), \quad p_{2} = P(\omega^{bs} = \omega_{2}).

The procedure for efficient computation of the phylogenetic likelihood function for these models was described in Kosakovsky Pond et al. (2011).

The quantity α^s is a site-specific synonymous substitution rate (no branch-to-branch variation is modeled) drawn from a separate 3-bin GDD. The mean of this distribution is constrained to equal one to maintain statistical identifiability, resulting in four estimated hyperparameters:

0 \leq c\,\alpha_{1} < \alpha_{2} = c \leq c\,\alpha_{3}, \quad f_{1} = P(\alpha^{s} = \alpha_{1}), \quad f_{2} = P(\alpha^{s} = \alpha_{2}),

with c chosen to ensure that E[α^s] = 1. Typical implementations, including ours, allow the user to adjust the number of α and ω rate categories separately, for example, to minimize AIC_c or to optimize some other measure of model fit. The default of three categories generally provides a good balance between model fit and computational cost for this GDD approach. Our HyPhy implementation of BUSTED[S] warns the user when there is evidence of model overfitting, such as rate categories with very similar estimated values or very low frequencies.
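The two 3-bin GDDs can be sketched in a few lines of Python. This is an illustrative sketch, not HyPhy's internal parametrization. `omega_gdd` takes the five ω hyperparameters and checks the stated ordering constraint. `alpha_gdd` assumes a hypothetical parametrization with four hyperparameters: unscaled rates r1 < 1 ≤ r3 (with the middle unscaled rate fixed at 1) and weights f1 and f2. All three rates are rescaled by c = 1 / E[unscaled rate], so that E[α^s] = 1 and the middle rate equals c. The helper names are made up for this example.

```python
def _third_weight(w1, w2):
    """Return the implied third weight 1 - w1 - w2, rejecting invalid inputs."""
    if w1 < 0.0 or w2 < 0.0 or w1 + w2 > 1.0 + 1e-12:
        raise ValueError("weights must be non-negative and sum to at most 1")
    return max(0.0, 1.0 - w1 - w2)


def omega_gdd(w1, w2, w3, p1, p2):
    """3-bin omega distribution from five hyperparameters (w1, w2, w3, p1, p2)."""
    if not (0.0 <= w1 <= w2 <= 1.0 <= w3):
        raise ValueError("require 0 <= w1 <= w2 <= 1 <= w3")
    return [w1, w2, w3], [p1, p2, _third_weight(p1, p2)]


def alpha_gdd(r1, r3, f1, f2):
    """3-bin synonymous-rate distribution rescaled so that E[alpha^s] = 1.

    Hypothetical parametrization: unscaled rates (r1, 1, r3) with r1 < 1 <= r3.
    The returned rates are (c*r1, c, c*r3), so the middle rate equals c.
    """
    if not (0.0 <= r1 < 1.0 <= r3):
        raise ValueError("require 0 <= r1 < 1 <= r3")
    f3 = _third_weight(f1, f2)
    unscaled_mean = f1 * r1 + f2 * 1.0 + f3 * r3
    if unscaled_mean <= 0.0:
        raise ValueError("distribution has zero mean and cannot be rescaled")
    c = 1.0 / unscaled_mean  # rescaling constant that makes the mean exactly 1
    return [c * r1, c, c * r3], [f1, f2, f3]


rates, weights = alpha_gdd(r1=0.5, r3=4.0, f1=0.4, f2=0.4)
print(rates, sum(r * w for r, w in zip(rates, weights)))  # mean is 1 up to rounding
```

Calling `alpha_gdd(r1, r3, 0.0, 1.0)` places all of the mass at α = 1, which mirrors the α^s = 1 special case that reduces BUSTED[S] to BUSTED. Fixing w3 = 1 in `omega_gdd` gives the constrained (null) ω distribution used in the likelihood ratio test.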
The BUSTED[S] procedure for identifying positive selection is a likelihood ratio test (LRT). It compares the full model described above to the constrained model formed when ω₃ is set equal to 1 (i.e., no positively selected sites). Critical values of the test are derived from a 50:50 mixture of the χ²_0 and χ²_2 distributions. This asymptotic null distribution differs from the three-component mixture used by Murrell et al. (2015). The simulation studies performed in the current study suggest that this less conservative mixture is sufficient to maintain nominal Type I error rates. Both the BUSTED[S] and BUSTED analyses in the current work use the same 50:50 mixture as the null distribution of the test statistic. BUSTED[S] reduces to BUSTED when α^s = 1, that is, when all of the mass of the synonymous rate heterogeneity distribution is placed at α = 1. The method is implemented as part of HyPhy (version 2.5.1 or later). BUSTED[S] is available for free public use on the Datamonkey webserver (Weaver et al. 2018) at https://www.datamonkey.org/BUSTED (last accessed February 24, 2020).
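Under the 50:50 mixture, the p-value has a closed form. χ²_0 is a point mass at zero, and the χ²_2 survival function is exp(-x/2), so for an observed statistic LR > 0 the p-value is p = 0.5 · exp(-LR/2). The sketch below computes the LR statistic from two maximized log-likelihoods, along with the corresponding p-value and critical value. The function names and log-likelihood values are hypothetical, and the code is not meant to mirror HyPhy's internals.

```python
import math


def lrt_statistic(loglik_full, loglik_null):
    """LR = 2 (log L_full - log L_null), clamped at 0 since the null model is nested."""
    return max(0.0, 2.0 * (loglik_full - loglik_null))


def mixture_pvalue(lr):
    """P-value under a 50:50 mixture of chi^2_0 (point mass at 0) and chi^2_2.

    The chi^2_2 survival function is exp(-x / 2); the chi^2_0 component
    contributes nothing to the tail probability when x > 0.
    """
    if lr <= 0.0:
        return 1.0
    return 0.5 * math.exp(-lr / 2.0)


def mixture_critical_value(level):
    """Solve 0.5 * exp(-x / 2) = level for x (valid for 0 < level < 0.5)."""
    if not 0.0 < level < 0.5:
        raise ValueError("level must lie in (0, 0.5)")
    return -2.0 * math.log(2.0 * level)


lr = lrt_statistic(loglik_full=-10234.7, loglik_null=-10238.1)  # made-up values
print(lr, mixture_pvalue(lr))
print(mixture_critical_value(0.05))  # 2 ln 10, about 4.605
```

At the 0.05 level, the mixture critical value is 2 ln 10 ≈ 4.61, compared with about 5.99 for a plain χ²_2 reference distribution.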


Wisotsky S.R., Kosakovsky Pond S.L., Shank S.D., & Muse S.V. (2020). Synonymous Site-to-Site Substitution Rate Variation Dramatically Inflates False Positive Rates of Selection Analyses: Ignore at Your Own Peril. Molecular Biology and Evolution, 37(8), 2430-2439.

Publication 2020

Codons, FPRs, Heterogeneity, Muse, Nucleotide, Sense codons

Corresponding organization: North Carolina State University

Other organizations: Temple University


Protocol cited in 13 other protocols

Variable analysis

independent variables

Presence of SRV

dependent variables

False positive rates (FPRs)

control variables

Finite state continuous time Markov process approach of Muse and Gaut (1994) for modeling the nucleotide substitution process
MG94 substitution models
3-bin general discrete distribution (GDD) for modeling the ratio of nonsynonymous to synonymous substitution rates (ω)
3-bin GDD for modeling the site-specific synonymous substitution rate (α)
CF3 × 4 procedure for estimating the stationary frequencies of codons

