Kraken 2 Database Enhancements
Additionally, we have implemented masking of low-complexity sequences in the reference sequences used by Kraken 2, using the “dustmasker” [31] (for nucleotide sequences) and “segmasker” [32] (for protein sequences) tools from NCBI. Using the tools’ default settings, nucleotide and protein sequences are checked for low-complexity regions, and the regions identified are masked and not processed further by the Kraken 2 database building process. In this manner, we seek to reduce false positives resulting from these low-complexity sequences, similar to the build process for Centrifuge [1].
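To make "masked and not processed further" concrete, here is a minimal Python sketch of what can happen after a masking tool has run. It is not the Kraken 2 build code. It assumes the masker has already written soft-masked FASTA, in which low-complexity residues appear in lowercase. The function names `hard_mask` and `unmasked_kmers`, the mask character `x`, and the toy sequence are all hypothetical and chosen only for illustration.

```python
from typing import Iterable, Iterator


def hard_mask(fasta_lines: Iterable[str], mask_char: str = "x") -> Iterator[str]:
    """Replace soft-masked (lowercase) residues with mask_char.

    Header lines (starting with '>') are passed through unchanged.
    Assumption: lowercase marks low-complexity regions in the input.
    """
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield line
        else:
            yield "".join(mask_char if c.islower() else c for c in line)


def unmasked_kmers(seq: str, k: int, mask_char: str = "x") -> Iterator[str]:
    """Yield only the k-mers that contain no masked position.

    This mimics the idea that masked regions contribute nothing
    to the database; it is an illustration, not Kraken 2's indexing.
    """
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if mask_char not in kmer:
            yield kmer


if __name__ == "__main__":
    # Toy soft-masked record: the poly-A run is marked as low complexity.
    soft_masked = [">seq1 toy record", "ACGTACGTaaaaaaaaaaGGCATCGA"]
    masked = list(hard_mask(soft_masked))
    print(masked[1])
    print(list(unmasked_kmers(masked[1], k=5)))
```

Because every k-mer overlapping a masked position is dropped, a repetitive stretch in a reference genome cannot produce database entries that would match unrelated reads, which is the false-positive reduction the paragraph describes.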
Other organizations : Johns Hopkins University
Protocol cited in 455 other protocols
Variable analysis
- Inclusion of additional genomic data in Kraken 2's standard reference library (the GRCh38 assembly of the human genome and the 'UniVec_Core' subset of the UniVec database)
- Masking of low-complexity sequences from reference sequences in Kraken 2 using the 'dustmasker' and 'segmasker' tools from NCBI
- Classification accuracy of human microbiome reads
- Classification accuracy of reads containing vector sequences
- Kraken 1's default database, which included data from archaeal, bacterial, and viral genomes but not the additional genomic data included in Kraken 2's default database
- The default settings of the 'dustmasker' and 'segmasker' tools used for masking low-complexity sequences
- Positive control: Kraken 1's default database
- Negative control: Not explicitly mentioned