Both the forward and reverse reads of each pair were truncated at the first base with a Q value of no more than 2. Read pairs with a minimum overlap of 50 bp were then merged into a single complete read. Merged reads were retained only if they were longer than 399 bp and had an expected error of no more than 0.5 [61]. Two batches of sequencing data from healthy and DSS-induced colitis mice were pooled before OTU picking. Quality-filtered reads were dereplicated into unique sequences and sorted by decreasing abundance, and singletons were discarded. Representative non-chimeric OTU sequences were then picked with Uparse using default settings [62]. Further reference-based chimera detection was performed using UCHIME [63] against the RDP classifier training database (v9) [64]. The OTU table was finalized by mapping quality-filtered reads to the remaining OTUs with the Usearch [61] global alignment algorithm at a 97% identity cutoff.
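The read-level filtering and dereplication steps described above can be illustrated with a minimal Python sketch. It is not the Usearch implementation; the function names are hypothetical, quality scores are assumed to be already decoded to integer Phred values, and the expected error is assumed to be the sum of per-base error probabilities, p = 10^(-Q/10). The expected-error threshold is passed in explicitly rather than fixed in the code.

```python
from collections import Counter


def truncate_at_low_quality(seq, quals, q_cutoff=2):
    """Cut a read just before the first base whose Phred score is <= q_cutoff."""
    for i, q in enumerate(quals):
        if q <= q_cutoff:
            return seq[:i], quals[:i]
    return seq, quals


def expected_errors(quals):
    """Sum of per-base error probabilities, p = 10^(-Q/10)."""
    return sum(10 ** (-q / 10) for q in quals)


def passes_filter(seq, quals, max_ee, min_len=400):
    """Keep reads longer than 399 bp whose expected error is at most max_ee."""
    return len(seq) >= min_len and expected_errors(quals) <= max_ee


def dereplicate(seqs):
    """Collapse identical reads, drop singletons, sort by decreasing abundance."""
    counts = Counter(seqs)
    kept = [(s, n) for s, n in counts.items() if n > 1]
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

In this sketch the merging of overlapping read pairs and the OTU clustering itself are left out, since both depend on alignment logic that the text delegates to Usearch and Uparse.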
Two samples had fewer than 9000 high-quality reads and were removed from further analysis. The sequences of all remaining samples were then downsized (rarefied) to 9000 reads per sample (1000 permutations) to equalize sequencing depth. All subsequent analyses were performed on the QIIME platform (version 1.8) [65]. The alpha diversity of each sample was calculated as observed OTUs and the Shannon index. Representative sequences for each OTU were built into a phylogenetic tree with FastTree and classified with the RDP classifier to determine their taxonomy at a bootstrap cutoff of 80% (RDP database version 2.10). The preliminary results of sequencing the 16S rRNA gene V3–V4 region are presented in the Supplementary Results.
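As a rough illustration of the depth normalization and alpha-diversity measures, the following Python sketch subsamples one sample's OTU counts without replacement and computes observed OTUs and the Shannon index. It is not the QIIME code; the function names are hypothetical, a single random draw is shown (how the 1000 permutations were combined is not stated in the text), and a base-2 logarithm is assumed for the Shannon index.

```python
import math
import random


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from {otu: read_count}."""
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        return None  # sample is too shallow and would be excluded
    sub = {}
    for otu in rng.sample(pool, depth):
        sub[otu] = sub.get(otu, 0) + 1
    return sub


def observed_otus(counts):
    """Number of OTUs with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


def shannon(counts, base=2):
    """Shannon index H = -sum(p_i * log(p_i)) over OTUs with nonzero counts."""
    total = sum(counts.values())
    h = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total
            h -= p * math.log(p, base)
    return h
```

A call such as `rarefy(sample_counts, 9000, random.Random(1))` would return a rarefied count table for one sample, or `None` for a sample below the depth cutoff.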
Random forest models [66] were used to identify specific bacterial phylotypes that contributed to the segregation of gut microbiota induced by DSS and/or BPB5. Group pairs with a significant difference (P < 0.05, PERMANOVA based on Bray-Curtis distance) were included for random forest discrimination. Models with a class error of 0 for every group were considered successful. The importance of an OTU was determined from the mean decrease in accuracy of discrimination, and OTUs with a value greater than 0.003 were considered key OTUs.
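The two selection criteria above (zero class error, importance above 0.003) can be expressed as a short Python sketch. It assumes the random forest has already been fitted elsewhere and that its confusion matrix and per-OTU mean decrease in accuracy are available as plain dictionaries; the text does not say how the class error was estimated, and all function names here are hypothetical.

```python
def class_errors(confusion):
    """Per-class error from a confusion matrix {true_label: {predicted_label: count}}."""
    errors = {}
    for true_label, row in confusion.items():
        total = sum(row.values())
        wrong = total - row.get(true_label, 0)
        # A class with no samples gets NaN, so it can never count as error-free.
        errors[true_label] = wrong / total if total else float("nan")
    return errors


def model_is_successful(confusion):
    """A model counts as successful only if every class error is exactly 0."""
    return all(err == 0 for err in class_errors(confusion).values())


def key_otus(importance, threshold=0.003):
    """OTUs whose mean decrease in accuracy exceeds the threshold, highest first."""
    kept = {otu: v for otu, v in importance.items() if v > threshold}
    return sorted(kept, key=kept.get, reverse=True)
```

The comparison is strict, so an OTU with an importance of exactly 0.003 is not selected, matching the "greater than 0.003" wording.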
The correlations among the 83 key OTUs were calculated with the SparCC algorithm [67], using a bootstrap procedure repeated 100 times, and then visualized as a network diagram. The Ward clustering algorithm and PERMANOVA (9999 permutations, P < 0.005) based on SparCC correlation coefficients were used to cluster the 83 key OTUs into 11 co-abundance groups (CAGs) in R.