Modeling de novo Mutation Patterns

Our goal was to build an accurate model of de novo mutation for each gene. To do so, we extended a previous sequence context-based model of de novo mutation to derive gene-specific probabilities of mutation for each of the following mutation types: synonymous, missense, nonsense, essential splice site, and frameshift^{3}. In brief, the local sequence context was used to determine the probability of each base in the coding region mutating to each other possible base, and the coding impact of each possible mutation was then determined. These probabilities were summed across all bases of each gene to create a per-gene probability of mutation for each of the aforementioned mutation types (see Supplementary Note for more details). Here, we applied the method to exons and the immediately flanking essential splice sites, but note that the framework is also applicable to non-genic sequences. When fitting the expected rates of mutation to observed data, we added a term for local primate divergence across 1 Mb (to capture additional unmeasured sources of regional mutational variability) and another for the average sequencing depth of each nucleotide (to capture reduced efficiency of variant discovery at lower sequencing depths); both terms significantly improved the fit of the model to observed data (details in Supplementary Note). We also investigated a regional replication timing term^{22}, but found no evidence that it significantly improved the model (Supplementary Note).
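The per-gene summation described above can be sketched in Python. This is a minimal illustration, not the published implementation: it assumes a trinucleotide context (one flanking base on each side), uses a hypothetical rate table (`make_toy_rates`, with an arbitrary tenfold boost for CpG transitions) in place of the real context-specific probabilities, and covers only the synonymous, missense, and nonsense classes; essential splice site and frameshift probabilities are left out. All function names are illustrative.

```python
from itertools import product

BASES = "ACGT"

# Standard genetic code; codons enumerated in TCAG order ('*' marks a stop codon).
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), _AMINO_ACIDS)}


def classify(ref_codon: str, alt_codon: str) -> str:
    """Coding impact of a single-base change (stop-loss is lumped with missense here)."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


def make_toy_rates(base_rate=1e-9, cpg_transition_boost=10.0):
    """Hypothetical (trinucleotide, alt) -> probability table; NOT the published rates."""
    rates = {}
    for left, ref, right in product(BASES, repeat=3):
        is_cpg = (ref == "C" and right == "G") or (ref == "G" and left == "C")
        for alt in BASES:
            if alt == ref:
                continue
            is_transition = {"C": "T", "G": "A"}.get(ref) == alt
            boost = cpg_transition_boost if (is_cpg and is_transition) else 1.0
            rates[(left + ref + right, alt)] = base_rate * boost
    return rates


def per_gene_probabilities(seq: str, rates: dict) -> dict:
    """Sum per-base mutation probabilities by coding impact.

    `seq` is the in-frame coding sequence (uppercase ACGT, no stop codon) with one
    flanking base on each side, so every coding base has a full trinucleotide context.
    """
    cds = seq[1:-1]
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    totals = {"synonymous": 0.0, "missense": 0.0, "nonsense": 0.0}
    for i, ref in enumerate(cds):
        context = seq[i:i + 3]  # cds[i] is seq[i + 1], the middle base of this window
        frame = i % 3
        codon = cds[i - frame:i - frame + 3]
        for alt in BASES:
            if alt == ref:
                continue
            mutant = codon[:frame] + alt + codon[frame + 1:]
            totals[classify(codon, mutant)] += rates[(context, alt)]
    return totals


if __name__ == "__main__":
    # Toy gene: ATG CTG TGG (Met-Leu-Trp) flanked by G ... C.
    print(per_gene_probabilities("GATGCTGTGGC", make_toy_rates()))
```

For the toy gene `ATG CTG TGG` there are 27 possible substitutions (9 bases × 3 alternatives); with a uniform rate, 4 are synonymous, 21 missense, and 2 nonsense, so each class total is that count times the base rate. A context-dependent table simply reweights individual substitutions before they are summed.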
To evaluate the predictive value of the model of de novo coding mutations, we extracted synonymous variants seen 10 times or fewer among the 6,503 individuals in the NHLBI Exome Sequencing Project (ESP) and compared the number of these rare variants in each gene to 1) the length of the gene and 2) the probability of a synonymous mutation for that gene as determined by our model. While gene length alone showed a high correlation with these counts (0.880), our full model showed a significantly greater correlation (0.940, p < 10⁻¹⁶). Of note, the stochastic variability of the NHLBI ESP counts is such that even a perfect model would be expected to reach a correlation of only 0.975 with any instance of these data, indicating that little additional gene-to-gene variability remains to be explained. The relative rates of the different types of coding mutations were quite similar to those from previous work based on primate substitutions^{23}. With this calibrated model of relative mutability, we determined the absolute expected mutation rate per gene by applying a genome-wide mutation rate of 1.2×10⁻⁸ per base pair per generation (Supplementary Note)^{24,25}.


Samocha K.E., Robinson E.B., Sanders S.J., Stevens C., Sabo A., McGrath L.M., Kosmicki J.A., Rehnström K., Mallick S., Kirby A., Wall D.P., MacArthur D.G., Gabriel S.B., DePristo M., Purcell S.M., Palotie A., Boerwinkle E., Buxbaum J.D., Cook E.H. Jr., Gibbs R.A., Schellenberg G.D., Sutcliffe J.S., Devlin B., Roeder K., Neale B.M., & Daly M.J. (2014). A framework for the interpretation of de novo mutation in human disease. Nature Genetics, 46(9), 944-950.

Publication 2014



Other organizations: Harvard University, Massachusetts General Hospital, Yale University, Broad Institute, Baylor College of Medicine, Baylor Genetics, Institute for Molecular Medicine Finland, University of Helsinki, Center for Systems Biology, Icahn School of Medicine at Mount Sinai, University of Illinois at Chicago, University of Pennsylvania, Vanderbilt University, University of Pittsburgh, Carnegie Mellon University


Variable analysis

independent variables

Local sequence context
1 Mb local primate divergence
Average sequencing depth of each nucleotide

dependent variables

Probability of each base in the coding region mutating to each other possible base
Coding impact of each possible mutation
Per-gene probability of mutation for synonymous, missense, nonsense, essential splice site, and frameshift mutations

variables tested but not retained

Regional replication timing (not found to significantly improve the model)
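The covariates listed above enter the fit of expected to observed counts described in the methods; the exact model form is given in the Supplementary Note and is not reproduced here. As a hedged sketch, assuming an ordinary linear regression as a stand-in, a nested-model F test compares a baseline fit (context-based expectation only) with one that adds divergence and depth terms; a candidate term such as replication timing would be assessed the same way. The synthetic data, variable names, and linear form are all assumptions. A p-value could be obtained from the F statistic with, for example, `scipy.stats.f.sf(f_stat, k_f - k_r, n - k_f)`.

```python
import random


def ols(X, y):
    """Least-squares coefficients from the normal equations, via Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in X) for j in range(k)] + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))  # partial pivoting
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def rss(X, y, beta):
    """Residual sum of squares for coefficients `beta`."""
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2 for row, yi in zip(X, y))


def nested_f_test(X_reduced, X_full, y):
    """F statistic for adding the extra columns of X_full to the reduced design X_reduced."""
    n, k_r, k_f = len(y), len(X_reduced[0]), len(X_full[0])
    rss_r = rss(X_reduced, y, ols(X_reduced, y))
    rss_f = rss(X_full, y, ols(X_full, y))
    f_stat = ((rss_r - rss_f) / (k_f - k_r)) / (rss_f / (n - k_f))
    return rss_r, rss_f, f_stat


if __name__ == "__main__":
    rng = random.Random(3)
    # Hypothetical genes: (context-based expectation, divergence factor, coverage factor).
    genes = [(rng.uniform(5, 100), rng.uniform(0.5, 1.5), rng.uniform(0.5, 1.0)) for _ in range(500)]
    y = [e * d * c + rng.gauss(0, 2) for e, d, c in genes]  # synthetic observed counts
    X_base = [[1.0, e] for e, _, _ in genes]
    X_full = [[1.0, e, d, c] for e, d, c in genes]
    print(nested_f_test(X_base, X_full, y))
```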

