Estimated taxonomic compositions for the simulated metagenome and the mock community were calculated in three steps. First, quality-trimmed and filtered reads from the mock community were screened against a FASTA file of Illumina adapter sequences (
Assembly and gene prediction on the simulated metagenome and mock community were performed using the assembly (SOAPdenovo version 1.06) and gene_prediction (MetaGeneMark) options. Quality-trimmed and filtered reads from the simulated metagenome, and adapter-screened reads from the mock community, were assembled into scaftigs of 60 bp or longer. Predicted complete genes were aligned to their respective metagenomes using blastall v2.2.26 [26] (program blastn, 95% sequence identity, alignment length ≥ 90%, and e-value threshold 0.1), and only the best hit was retained.
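The hit selection described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study: it assumes the standard 12-column tabular BLAST output, interprets the 90% alignment-length criterion relative to the length of the query gene, and takes the "best hit" to be the one with the highest bit score. The function name `best_hits` and the example identifiers are hypothetical.

```python
import csv
import io


def best_hits(blast_tab, gene_lengths, min_identity=95.0,
              min_coverage=0.90, max_evalue=0.1):
    """Return {query: (subject, bitscore)} for the best passing hit per query.

    Assumes 12-column tabular BLAST output: query, subject, % identity,
    alignment length, mismatches, gap opens, q.start, q.end, s.start,
    s.end, e-value, bit score.
    """
    best = {}
    for row in csv.reader(blast_tab, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        identity = float(row[2])
        aln_len = int(row[3])
        evalue = float(row[10])
        bitscore = float(row[11])
        if identity < min_identity or evalue > max_evalue:
            continue
        # Assumption: coverage is measured against the query gene length.
        if aln_len < min_coverage * gene_lengths[query]:
            continue
        # Assumption: the best hit is the one with the highest bit score.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


# Made-up example records, for illustration only.
example = io.StringIO(
    "gene1\trefA\t99.0\t300\t3\t0\t1\t300\t1\t300\t1e-150\t540\n"
    "gene1\trefB\t96.0\t290\t12\t0\t1\t290\t5\t294\t1e-130\t480\n"
    "gene2\trefC\t97.0\t400\t12\t0\t1\t400\t1\t400\t1e-180\t700\n"
    "gene3\trefD\t90.0\t150\t15\t0\t1\t150\t1\t150\t1e-40\t180\n"
)
lengths = {"gene1": 300, "gene2": 600, "gene3": 150}
hits = best_hits(example, lengths)
print(hits)
```

In this example only gene1 keeps a hit: gene2 fails the length criterion (400 of 600 bp aligned) and gene3 fails the identity criterion.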
The 124 human gut microbiomes were processed with and without 5′ trimming. The 5′-trimmed reads were assembled with SOAPdenovo 1.05, using both the k-mer size determined by MOCAT and a fixed k-mer size of 23. These assemblies were revised with SOAPdenovo 1.06 using the assembly_revision option, and genes were predicted, with MetaGeneMark as the selected software, on scaftigs from both the assemblies and the revised assemblies. The non-5′-trimmed and 5′-trimmed reads were mapped to the assembled scaftigs using the screen option with a length cutoff of 30 and a quality cutoff of 15.
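To make the design above easier to follow, the scaftig sets that entered gene prediction can be enumerated as below. This is only a reading of the text, assuming that each of the two k-mer settings yields one assembly and one revised assembly; the labels are illustrative and are not MOCAT option names.

```python
from itertools import product

# Two k-mer settings for the SOAPdenovo 1.05 assemblies (from the text).
kmer_settings = ["k-mer determined by MOCAT", "fixed k-mer 23"]
# Each assembly is used as is and after revision with SOAPdenovo 1.06.
stages = ["assembly", "revised assembly"]

# Assumed reading: genes were predicted on every combination.
gene_prediction_inputs = [
    f"{stage} ({kmer})" for kmer, stage in product(kmer_settings, stages)
]

for label in gene_prediction_inputs:
    print(label)
```

Under this reading, gene prediction was run on four scaftig sets per sample.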
Complete commands for processing the simulated metagenome and mock community in MOCAT are bundled with the installation of the pipeline.