Simulated short-read metagenomic datasets were obtained using NeSSM software [55] and the genomic sequences of five bacterial species, Sulfolobus islandicus, Proteus mirabilis, Nitrosospira multiformis, Bacteroides fragilis, and Acidobacterium capsulatum [26]. For the first simulation, we prepared three relative abundance vectors for these species, (0.297, 0.507, 0.116, 0.058, 0.022), (0.345, 0.244, 0.281, 0.088, 0.042), and (0.526, 0.320, 0.042, 0.066, 0.046), according to the simulation method of Jiang et al. [19]. From these abundance vectors, we generated 30 vectors in which each of the five values was increased or decreased by 5%, and used them to generate 30 metagenomic samples by mixing randomly sampled short reads from the five bacteria using NeSSM [55].
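The perturbation step of the first simulation could be sketched roughly as follows. This is a minimal illustration and not the authors' code: the function names, the random seed, the equal-probability choice of direction, the split into 10 vectors per base vector, and the final renormalization to a sum of 1 are all assumptions that the text does not state.

```python
import random

# Base relative abundances of the five species, as given in the text.
BASE_VECTORS = [
    (0.297, 0.507, 0.116, 0.058, 0.022),
    (0.345, 0.244, 0.281, 0.088, 0.042),
    (0.526, 0.320, 0.042, 0.066, 0.046),
]


def perturb_by_5_percent(base, rng):
    """Scale each value up or down by 5%, then renormalize to sum to 1.

    Assumptions (not stated in the text): the direction is chosen
    independently per component with equal probability, and the result
    is renormalized.
    """
    scaled = [v * rng.choice((0.95, 1.05)) for v in base]
    total = sum(scaled)
    return [v / total for v in scaled]


def first_simulation_vectors(n_per_group=10, seed=0):
    """Return 30 perturbed vectors (assumed: 10 per base vector)."""
    rng = random.Random(seed)
    return [
        perturb_by_5_percent(base, rng)
        for base in BASE_VECTORS
        for _ in range(n_per_group)
    ]


vectors_sim1 = first_simulation_vectors()
```

Each resulting vector would then serve as the species composition passed to the read simulator for one sample.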
In the second simulation, we generated 30 vectors from the three original species abundance vectors used in the first simulation. To each component we added one-fifth of the absolute value of Gaussian noise with mean zero and standard deviation equal to the value of that component. Each original vector was randomized and renormalized in this way 10 times, yielding 30 vectors in three groups of 10 (Fig 2A). These vectors were used to generate 30 metagenomic samples, as in the first simulation.
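The noise model of the second simulation can be illustrated with a short sketch. It assumes that "one-fifth Gaussian noise" means the noise term is |N(0, x_i)| / 5 for a component x_i, and that renormalization means dividing by the sum. The function names and the seed are hypothetical.

```python
import random

# Base relative abundances of the five species, as given in the text.
BASE_VECTORS = [
    (0.297, 0.507, 0.116, 0.058, 0.022),
    (0.345, 0.244, 0.281, 0.088, 0.042),
    (0.526, 0.320, 0.042, 0.066, 0.046),
]


def add_scaled_abs_gaussian_noise(base, rng, scale=0.2):
    """Add scale * |N(0, x_i)| to each component x_i, then renormalize.

    Assumption: the standard deviation of the noise equals the component
    value itself, and the one-fifth factor multiplies the absolute noise.
    """
    noisy = [v + scale * abs(rng.gauss(0.0, v)) for v in base]
    total = sum(noisy)
    return [v / total for v in noisy]


def second_simulation_vectors(n_per_group=10, seed=0):
    """Return 30 vectors: three groups of 10, one group per base vector."""
    rng = random.Random(seed)
    return [
        add_scaled_abs_gaussian_noise(base, rng)
        for base in BASE_VECTORS
        for _ in range(n_per_group)
    ]


vectors_sim2 = second_simulation_vectors()
```

Because the added noise is non-negative and scales with each component, the perturbed vectors stay strictly positive and remain clustered around their base vector, which matches the three-group structure described above.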