Amplicon Error Correction Pipeline

Output sequences were first compared to the known 16S rRNA gene reference sequences of the members of each mock community. If an output sequence exactly matched a reference sequence, it was classified as Reference, and if it had one mismatch or gap relative to a reference sequence, it was classified as One Off. Output sequences that were at least Hamming distance 2 from every reference sequence were then BLASTed against the nr/nt database. If the best hit was an exact match covering the full output sequence, the sequence was classified as Exact. If the best hit differed by a single mismatch or indel, it was classified as One Off. Output sequences that remained unclassified at this point were classified as Other.
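The classification order described above can be sketched in Python. This is a minimal illustration, not the authors' code. It assumes the reference sequences have already been trimmed to the amplified region so that whole-sequence comparison is meaningful, and it treats "one mismatch or gap" as an edit distance of one. The `blast_best_hit_edits` callback is hypothetical: it stands in for a BLAST search against nr/nt and is assumed to return the number of mismatches plus indels of the best full-length hit, or `None` when there is no usable hit.

```python
from typing import Callable, Iterable, Optional


def within_one_edit(a: str, b: str) -> bool:
    """True if a and b are identical or differ by one substitution, insertion, or deletion."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) sequence
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # one substitution at position i
    return a[i:] == b[i + 1:]  # one extra base in b at position i


def classify(
    seq: str,
    references: Iterable[str],
    blast_best_hit_edits: Optional[Callable[[str], Optional[int]]] = None,
) -> str:
    """Assign Reference, One Off, Exact, or Other in the order given in the text."""
    refs = list(references)
    if seq in refs:
        return "Reference"
    if any(within_one_edit(seq, r) for r in refs):
        return "One Off"
    # Hypothetical stand-in for the nr/nt BLAST step (assumption, not a real API).
    edits = blast_best_hit_edits(seq) if blast_best_hit_edits else None
    if edits == 0:
        return "Exact"
    if edits == 1:
        return "One Off"
    return "Other"
```

In this sketch a sequence reaches the BLAST step only after failing both reference checks, which mirrors the order of the procedure in the text.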
We included the BLAST against nr/nt step because even amplicon sequencing data from communities with a putatively known reference composition will contain contaminant sequences. Contaminants are real, albeit unwanted, biological variation, and should be identified when correcting amplicon errors. While the nr/nt database is imperfect, it is reasonable to expect that Exact matches are far more likely to be real variants than are Others. Output sequences classified as Other, and output sequences classified as One Off that differed by one substitution from a more abundant output sequence, were considered a proxy for false positives. Output sequences classified as Reference or Exact were considered true positives.
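The true-positive and false-positive proxy rule can be expressed as a short Python sketch. It is an illustration under stated assumptions rather than the authors' implementation: the function and label names are hypothetical, `abundance` is assumed to map each output sequence to its read count, and `classes` is assumed to hold the Reference / One Off / Exact / Other label already assigned to each sequence. The text does not say how One Off sequences with no more abundant one-substitution neighbor were treated, so the sketch leaves them unlabeled.

```python
from typing import Dict


def one_substitution_apart(a: str, b: str) -> bool:
    """True if a and b have equal length and differ at exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def label_outputs(abundance: Dict[str, int], classes: Dict[str, str]) -> Dict[str, str]:
    """Label each classified output sequence as a true positive or a false-positive proxy."""
    labels = {}
    for seq, cls in classes.items():
        if cls in ("Reference", "Exact"):
            labels[seq] = "true positive"
        elif cls == "Other":
            labels[seq] = "false positive proxy"
        elif cls == "One Off" and any(
            abundance[other] > abundance[seq] and one_substitution_apart(seq, other)
            for other in abundance
            if other != seq
        ):
            # One substitution away from a more abundant output sequence.
            labels[seq] = "false positive proxy"
        else:
            # Not covered by the rule in the text (assumption: left unlabeled).
            labels[seq] = "unlabeled"
    return labels
```

For example, a One Off sequence with 5 reads that sits one substitution away from an output sequence with 100 reads would be counted toward the false-positive proxy, while an isolated One Off sequence would not.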

Callahan B.J., McMurdie P.J., Rosen M.J., Han A.W., Johnson A.J., & Holmes S.P. (2016). DADA2: High-resolution sample inference from Illumina amplicon data. Nature Methods, 13(7), 581-583.

Publication 2016

Other organizations : Stanford University, Second Genome (United States)

Protocol cited in 2 109 other protocols

Variable analysis

independent variables

None explicitly mentioned

dependent variables

Output sequences classified as Reference, One Off, Exact, and Other

control variables

None explicitly mentioned

controls

Positive control: Output sequences that matched the known 16S rRNA gene reference sequences of the members of each mock community were classified as Reference.
Negative control: Output sequences that were at least Hamming distance 2 from any reference sequence were BLASTed against the nr/nt database. Output sequences classified as Other, and output sequences classified as One Off that differed by one substitution from a more abundant output sequence, were considered a proxy for false positives.
