The three datasets were processed by the read_trim_filter step in MOCAT with the length cutoff set to 30 and the quality cutoff set to 20, using solexaqa for the mock community and the simulated metagenome, and fastx for the 124 gut metagenomes.
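
To make the two cutoffs concrete, the following is a minimal Python sketch of quality trimming followed by length filtering. It is not the solexaqa or fastx implementation: the trimming rule (keep the longest contiguous stretch in which every base quality is at least the cutoff), the inclusive comparisons, and the names `trim_read` and `trim_and_filter` are assumptions made for illustration.

```python
def trim_read(seq, quals, quality_cutoff=20):
    """Return the longest contiguous stretch whose qualities are all >= cutoff.

    Assumption: this simple rule stands in for the trimming done by the
    external tools; their exact algorithms may differ.
    """
    best_start, best_len = 0, 0
    run_start = None
    for i, q in enumerate(quals):
        if q >= quality_cutoff:
            if run_start is None:
                run_start = i
            run_len = i - run_start + 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            # A low-quality base ends the current run.
            run_start = None
    end = best_start + best_len
    return seq[best_start:end], quals[best_start:end]


def trim_and_filter(reads, quality_cutoff=20, length_cutoff=30):
    """Trim each (name, seq, quals) read; keep reads of at least length_cutoff bases."""
    kept = []
    for name, seq, quals in reads:
        trimmed_seq, trimmed_quals = trim_read(seq, quals, quality_cutoff)
        if len(trimmed_seq) >= length_cutoff:
            kept.append((name, trimmed_seq, trimmed_quals))
    return kept
```

With these defaults, a 40 bp read whose last five bases have quality 10 is trimmed to 35 bp and kept, whereas a read whose longest high-quality stretch is only 20 bp is discarded.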
Estimated taxonomic compositions for the simulated metagenome and the mock community were calculated in three steps. First, quality-trimmed and filtered reads from the mock community were screened against a FASTA file of Illumina adapter sequences (Table S5), using the screen_fastafile option with the e-value set to 0.01. Second, screened reads from the mock community and quality-trimmed and filtered reads from the simulated metagenome were mapped and filtered against custom-made reference databases containing chromosome and plasmid sequences from the 22 mock genomes (Table S4) and the 100 genomes of the simulated metagenome (Table S2 in [13] and Table S3), respectively. This was done by executing the screen and filter commands with the length cutoff set to 30, percentage identity set to 90, and paired_end_filtering set to yes for the simulated metagenome and no for the mock community. Finally, the taxonomic composition was estimated using the calculate_coverage command.
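
The filtering and coverage steps can be illustrated with a short Python sketch. It is a simplification under stated assumptions, not MOCAT's code: alignments are represented as plain dictionaries with hypothetical keys (`ref`, `aln_length`, `pct_identity`), base coverage is taken as aligned bases divided by reference length, composition is taken as each genome's share of the summed coverage, and the paired_end_filtering setting is not modeled.

```python
from collections import defaultdict


def filter_alignments(alignments, length_cutoff=30, identity_cutoff=90.0):
    """Keep alignments that meet both the length and percent-identity cutoffs."""
    return [
        a for a in alignments
        if a["aln_length"] >= length_cutoff and a["pct_identity"] >= identity_cutoff
    ]


def base_coverage(alignments, ref_lengths):
    """Aligned bases per reference divided by the reference length (assumed definition)."""
    aligned_bases = defaultdict(int)
    for a in alignments:
        aligned_bases[a["ref"]] += a["aln_length"]
    return {ref: aligned_bases[ref] / length for ref, length in ref_lengths.items()}


def relative_composition(coverage):
    """Normalize coverages so they sum to 1 (assumed way of expressing composition)."""
    total = sum(coverage.values())
    if total == 0:
        return {ref: 0.0 for ref in coverage}
    return {ref: value / total for ref, value in coverage.items()}
```

Normalizing by reference length before computing shares keeps large genomes from appearing more abundant merely because they attract more reads.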
Assembly and gene prediction on the simulated metagenome and mock community were performed using the assembly (SOAPdenovo version 1.06) and gene_prediction (MetaGeneMark) options. Quality-trimmed and filtered reads from the simulated metagenome, and adapter-screened reads from the mock community, were assembled into scaftigs of 60 bp or longer. Predicted complete genes were aligned to their respective metagenomes using blastall v2.2.26 [26] (program blastn, 95% sequence identity, alignment length ≥ 90%, and e-value 0.1), and only the best hit was selected.
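
The hit-selection criteria can be sketched in Python as a filter over 12-column tabular BLAST output (as produced by `blastall -m 8`). Several points are assumptions rather than details given in the text: the 90% alignment-length criterion is interpreted relative to the length of the predicted gene, the best hit is taken to be the one with the highest bit score, and the function name `best_hits` and the `gene_lengths` mapping are hypothetical.

```python
import csv
import io


def best_hits(blast_tabular, gene_lengths, min_identity=95.0,
              min_aln_fraction=0.9, max_evalue=0.1):
    """Return {gene: best subject} for hits passing all three thresholds.

    blast_tabular: text in 12-column tabular BLAST format.
    gene_lengths:  {gene name: gene length in bp} (hypothetical helper input).
    """
    best = {}
    for row in csv.reader(io.StringIO(blast_tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        query, subject = row[0], row[1]
        identity, aln_length = float(row[2]), int(row[3])
        evalue, bitscore = float(row[10]), float(row[11])
        if identity < min_identity:
            continue
        # Assumption: alignment must span at least 90% of the gene length.
        if aln_length < min_aln_fraction * gene_lengths[query]:
            continue
        if evalue > max_evalue:
            continue
        # Assumption: "best hit" means highest bit score per query.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}
```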
The 124 human gut microbiomes were processed with and without 5′ trimming. The 5′-trimmed reads were assembled using SOAPdenovo 1.05, with both the Kmer size determined by MOCAT and a fixed Kmer size of 23. These assemblies were revised using SOAPdenovo 1.06 with the assembly_revision option, and genes were predicted, with MetaGeneMark as the selected software, on scaftigs from both the assemblies and the revised assemblies. Both the untrimmed and the 5′-trimmed reads were mapped to the assembled scaftigs using the screen option with length cutoff 30 and quality cutoff 15.
Complete commands for processing the simulated metagenome and mock community in MOCAT are bundled with the installation of the pipeline.
Kultima J.R., Sunagawa S., Li J., Chen W., Chen H., Mende D.R., Arumugam M., Pan Q., Liu B., Qin J., Wang J., & Bork P. (2012). MOCAT: A Metagenomics Assembly and Gene Prediction Toolkit. PLoS ONE, 7(10), e47656.