Illumina MiSeq Sequencing and Fitness Analysis

Output files from the Illumina MiSeq were first run through FastQC (Andrews et al. 2018) to check read quality. The paired-end reads were merged using PEAR (Zhang et al. 2014) with a minimum assembly length of 150 base pairs, so that merged reads retained high quality scores at both ends of the sequence. Adapters were trimmed from the ends of the antibiotic resistance genes' coding sequences using Trimmomatic (Bolger et al. 2014). Enrich2 (Rubin et al. 2017) was used to count the frequency of each allele for calculating selection coefficients and associated statistical measures. We set Enrich2 to filter out any read containing a base with a quality score below 20, a base called as N, or mutations at more than one codon.
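The three Enrich2 read filters described above can be illustrated with a short Python sketch. It is not Enrich2's own code: `passes_filters` is a hypothetical name, and the sketch assumes each read has already been merged and trimmed, is aligned base-for-base to the in-frame wild-type coding sequence `wt` with no indels, and carries Phred+33 quality scores.

```python
def passes_filters(read: str, quals: str, wt: str,
                   min_q: int = 20, phred_offset: int = 33) -> bool:
    """Hypothetical re-implementation of the read filters described above.

    Assumes `read` is merged, trimmed, and aligned base-for-base to the in-frame
    wild-type coding sequence `wt` (equal lengths, no indels), and that `quals`
    uses Phred+33 encoding. This is an illustration, not Enrich2 code.
    """
    if len(read) != len(wt) or len(quals) != len(read):
        return False  # outside the assumed alignment model
    # Filter 1: reject any base with a quality score below the cutoff.
    if any(ord(q) - phred_offset < min_q for q in quals):
        return False
    # Filter 2: reject any base called as N.
    if "N" in read.upper():
        return False
    # Filter 3: reject reads with mutations at more than one codon.
    mutated_codons = {
        pos // 3
        for pos, (base, ref) in enumerate(zip(read.upper(), wt.upper()))
        if base != ref
    }
    return len(mutated_codons) <= 1


wt = "ATGGCTAAA"
print(passes_filters("ATGGGGAAA", "I" * 9, wt))  # True: two changes, one codon
print(passes_filters("TTGGCTAAC", "I" * 9, wt))  # False: changes in codons 0 and 2
```

Counting mutated codons rather than mutated bases is what allows a single-codon substitution that changes two or three nucleotides to pass while rejecting reads carrying a second, unintended mutation.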
Fitness of an allele ($w_i$) was calculated from the enrichment of the synonyms of the wild-type gene ($\varepsilon_{wt}$), the enrichment of allele $i$ ($\varepsilon_i$), and the fold increase in the number of cells during the growth competition experiment ($r$), as described by Equation 1. We used the frequency of wild-type synonymous alleles as the reference instead of the frequency of wild type because wild-type synonyms occurred more frequently in the library, and wild-type sequencing counts are more prone to the artifact of PCR template jumping during preparation of barcoded amplicons for deep sequencing. Detailed derivations of the following equations (Equations 1–6) can be found in our previous work (Mehlhoff et al. 2020).
We calculated the variance in the fitness as derived in Mehlhoff et al. (2020), where the frequency of allele $i$, $f_i = c_i / c_T$, is calculated from the counts of that allele ($c_i$) and the total sequencing counts ($c_T$).
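The variance expression itself is derived in Mehlhoff et al. (2020). To show how sequencing counts can feed into such a variance, the sketch below uses a generic first-order (delta-method) approximation under illustrative assumptions: each frequency $f = c/c_T$ is a binomial sample, so Var(ln f) ≈ (1 − f)/c; pre- and post-selection samples are independent; and fitness takes the assumed form w = ln(r ε_i)/ln(r ε_wt). The function names are hypothetical, and the result should not be read as the published equation.

```python
import math


def var_log_freq(c: int, c_total: int) -> float:
    """Delta-method Var(ln f) for f = c / c_total under binomial sampling.

    Var(f) = f(1 - f) / c_total and d(ln f)/df = 1/f, so Var(ln f) = (1 - f) / c.
    """
    f = c / c_total
    return (1.0 - f) / c


def fitness_with_variance(ci, cwt, ct, r):
    """ci, cwt, ct are (pre, post) counts for allele i, wild-type synonyms, and all reads.

    Assumes w = a / b with a = ln(r * eps_i) and b = ln(r * eps_wt), and treats
    a and b as independent (covariance through the shared totals is ignored).
    """
    eps_i = (ci[1] / ct[1]) / (ci[0] / ct[0])
    eps_wt = (cwt[1] / ct[1]) / (cwt[0] / ct[0])
    a, b = math.log(r * eps_i), math.log(r * eps_wt)
    # ln(eps) = ln(f_post) - ln(f_pre); independent samples, so variances add.
    var_a = var_log_freq(ci[0], ct[0]) + var_log_freq(ci[1], ct[1])
    var_b = var_log_freq(cwt[0], ct[0]) + var_log_freq(cwt[1], ct[1])
    # First-order propagation: dw/da = 1/b and dw/db = -a/b**2.
    var_w = var_a / b**2 + a**2 * var_b / b**4
    return a / b, var_w


w, var_w = fitness_with_variance((100, 100), (1000, 1000), (10_000, 10_000), r=64)
# w is 1.0 here because allele i and the wild-type synonyms are equally enriched.
print(w, var_w)
```

Because Var(ln f) ≈ (1 − f)/c, the variance is dominated by whichever count is smallest, which is one reason the far more abundant wild-type synonyms make a more stable reference than a single rare allele.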
From the variance in fitness, we calculated a 99% confidence interval. We also calculated a P-value using a two-tailed test. Details of the Z-score and P-value equations are available in Mehlhoff et al. (2020).
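A minimal sketch of how a Z-score, two-tailed P-value, and 99% confidence interval follow from a fitness value and its variance, assuming a normal approximation and a null hypothesis of w = 1 (fitness equal to that of the wild-type synonyms). The function name `z_p_ci` and the choice of null are assumptions; the exact Z-score definition is given in Mehlhoff et al. (2020).

```python
import math
from statistics import NormalDist


def z_p_ci(w: float, var_w: float, null: float = 1.0, level: float = 0.99):
    """Normal-approximation Z-score, two-tailed P-value, and confidence interval."""
    se = math.sqrt(var_w)
    z = (w - null) / se
    # Two-tailed P = 2 * (1 - Phi(|z|)), which equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    # Critical value for a two-sided interval; about 2.576 when level = 0.99.
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    return z, p, (w - z_crit * se, w + z_crit * se)


z, p, ci = z_p_ci(0.5, 0.04)
print(z, round(p, 4), ci)  # z = -2.5, P about 0.0124
```

Using `math.erfc` for the tail probability avoids the loss of precision that `1 - cdf(|z|)` suffers for large |z|, which matters when many alleles have very small P-values.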
To correct for multiple testing in our DMS datasets, we estimated the number of false positives expected at P < 0.01 and P < 0.001 significance (Storey and Tibshirani 2003), as described previously (Mehlhoff et al. 2020). For a single replicate of TEM-1, we estimated that our data would contain on average approximately 55.0 false positives at P < 0.01 and 5.6 false positives at P < 0.001 (Mehlhoff et al. 2020). The corresponding values are 44.1 and 4.3 for CAT-I, 52.8 and 5.3 for NDM-1, and 33.8 and 3.4 for aadB at P < 0.01 and P < 0.001, respectively. To limit the occurrence of false positives, we chose to report the frequency of mutations whose fitness effects met the P-value criterion in both replicate experiments.
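The Storey and Tibshirani (2003) approach estimates the proportion of true null hypotheses, π0, from the P-values above a tuning value λ, and then estimates the expected number of false positives at threshold t as π0·m·t for m tests. The sketch below implements that estimate with a fixed λ = 0.5, which is an assumption; the authors' exact settings are described in Mehlhoff et al. (2020). It also illustrates why requiring significance in both replicates helps: if null P-values from the two replicates were independent and uniform, a null mutation would pass both with probability t² rather than t, so the single-replicate TEM-1 estimate of 55.0 at P < 0.01 would shrink to roughly 0.55. That independence assumption is illustrative and likely only approximate.

```python
def storey_pi0(p_values, lam: float = 0.5) -> float:
    """Storey's estimate of the fraction of true nulls: #{p > lam} / (m * (1 - lam)), capped at 1."""
    m = len(p_values)
    return min(1.0, sum(p > lam for p in p_values) / (m * (1.0 - lam)))


def expected_false_positives(p_values, t: float, lam: float = 0.5) -> float:
    """Expected number of true-null tests with P < t: pi0 * m * t."""
    return storey_pi0(p_values, lam) * len(p_values) * t


def expected_fp_both_replicates(single_replicate_fp: float, t: float) -> float:
    """Assumes independent, uniform null P-values in the two replicates, so a
    null passes both with probability t**2 instead of t (one extra factor of t)."""
    return single_replicate_fp * t


# TEM-1 single-replicate estimates reported above.
print(expected_fp_both_replicates(55.0, 0.01))   # about 0.55
print(expected_fp_both_replicates(5.6, 0.001))   # about 0.0056
```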


Mehlhoff J.D., & Ostermeier M. (2023). Genes Vary Greatly in Their Propensity for Collateral Fitness Effects of Mutations. Molecular Biology and Evolution, 40(3), msad038.

Publication 2023


Corresponding Organization: Johns Hopkins University


Variable analysis

independent variables

Enrichment of the synonyms of the wild-type gene (ε_wt)
Enrichment of allele i (ε_i)
Fold increase in the number of cells during the growth competition experiment (r)

dependent variables

Fitness of an allele (w_i)
Frequency of each allele
Variance in the fitness
99% confidence interval
P-value using a 2-tailed test
Number of false positives included at P < 0.01 and P < 0.001 significance

control variables

Minimum PEAR assembly length of 150 base pairs, retaining high quality scores at both ends of the sequence
Filtering out any reads containing bases with a quality score below 20, bases marked as N, or mutations at more than one codon

positive controls

Frequency of wild-type synonymous alleles, used as the reference instead of the frequency of wild type

negative controls

Not mentioned
