To identify samples with highly divergent (dysbiotic) metagenomic microbial compositions, as a complement to baseline disease diagnosis, we defined a dysbiosis score based on Bray-Curtis dissimilarities to non-IBD metagenomes. First, a “reference set” of samples was constructed from non-IBD subjects by taking all samples collected after the 20th week following the subject’s first stool sample. This cutoff was chosen because a subset of the non-IBD subjects may not yet, at the start of their respective time series, have overcome the gastrointestinal symptoms that triggered the initial visit to a doctor, even though these symptoms were ultimately not caused by IBD. The dysbiosis score of a given sample was then defined as the median Bray-Curtis dissimilarity to this reference sample set, excluding samples that came from the same subject (Fig. 2C).
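The score definition above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the authors' implementation; the function names, the `(subject_id, profile)` data layout, and the toy abundance vectors are assumptions made for the example.

```python
from statistics import median


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    # Two all-zero profiles are treated as identical (an assumption of this sketch).
    return numerator / denominator if denominator else 0.0


def dysbiosis_score(profile, subject, reference):
    """Median Bray-Curtis dissimilarity to reference samples from other subjects.

    `reference` is a list of (subject_id, profile) pairs (hypothetical layout).
    Raises statistics.StatisticsError if no reference sample remains.
    """
    distances = [
        bray_curtis(profile, ref_profile)
        for ref_subject, ref_profile in reference
        if ref_subject != subject  # exclude samples from the same subject
    ]
    return median(distances)


# Toy reference set: relative abundances of three hypothetical taxa.
reference = [
    ("A", [0.5, 0.3, 0.2]),
    ("A", [0.4, 0.4, 0.2]),
    ("B", [0.6, 0.2, 0.2]),
    ("C", [0.5, 0.25, 0.25]),
]

score_similar = dysbiosis_score([0.5, 0.3, 0.2], "D", reference)
score_divergent = dysbiosis_score([0.05, 0.05, 0.9], "D", reference)
# A sample from subject A is scored only against subjects B and C.
score_own_excluded = dysbiosis_score([0.5, 0.3, 0.2], "A", reference)
```

With real data, a vectorized routine such as `scipy.spatial.distance.braycurtis` would likely be preferable to the pure-Python loop shown here.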
To identify samples that were highly divergent from the reference set, we thresholded the dysbiosis score at the 90th percentile of this score for non-IBD samples. This threshold therefore identifies samples with a feature configuration that has a <10% probability of occurring in a non-IBD subject. By this measure, 272 metagenomes were classified as dysbiotic. Samples from CD and UC subjects were overrepresented in the dysbiotic set, with 24.3% and 11.6% of their samples classified as dysbiotic, respectively. As expected, these samples also tended to lie at the extremes of the taxonomic ordination based on metagenomes (Extended Data Figs. 3B, 3C). Dysbiosis was unevenly distributed among subjects (Extended Data Fig. 3D), with some subjects remaining dysbiotic for all or most of their time series, while others remained non-dysbiotic for their entire time series.
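The thresholding step can be illustrated as follows. This is a sketch under stated assumptions: the scores are invented, the percentile uses linear interpolation between order statistics, and a sample is flagged when its score is strictly greater than the threshold. The text does not specify the interpolation rule or how ties at the threshold are handled.

```python
from statistics import quantiles


def dysbiosis_threshold(non_ibd_scores):
    """90th percentile of non-IBD dysbiosis scores (linear interpolation)."""
    # With n=10, quantiles() returns the nine deciles; the last is the 90th percentile.
    return quantiles(non_ibd_scores, n=10, method="inclusive")[-1]


def classify(scores, threshold):
    """Flag each score as dysbiotic (True) if it exceeds the threshold."""
    # Strict inequality is an assumption of this sketch.
    return [score > threshold for score in scores]


# Invented dysbiosis scores, for illustration only.
non_ibd_scores = [0.50, 0.52, 0.55, 0.57, 0.60, 0.61, 0.63, 0.66, 0.70, 0.74, 0.90]
ibd_scores = [0.58, 0.72, 0.81, 0.95]

threshold = dysbiosis_threshold(non_ibd_scores)
ibd_flags = classify(ibd_scores, threshold)
non_ibd_flags = classify(non_ibd_scores, threshold)
```

By construction, roughly 10% of the non-IBD scores fall above the threshold, so any enrichment is judged by how far the flagged fraction in another group exceeds that baseline.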
To lend additional support to the dysbiosis definition (i.e. as outliers by one type of microbiome profile), we tested the concordance between dysbiosis classifications made using the same statistical definition, but applied to metabolomic rather than taxonomic profiles. That is, we defined a metabolomic dysbiosis score as the median Bray-Curtis dissimilarity of a metabolomic profile to the non-IBD metabolomic profiles (after the 20th week), and defined the dysbiosis threshold as the 90th percentile of this distribution among non-IBD metabolomic profiles. We then compared these dysbiosis classifications with those from the nearest metagenomic sample (up to two weeks apart; see the “Cross-measurement type temporal matching” section) and found that dysbiotic samples identified by metagenomics were 4.6 times more likely to be dysbiotic by metabolomics (Fisher’s exact p = 5.9 × 10⁻⁹), showing that dysbiosis measurements are highly consistent across measurement types.
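A concordance test of this kind operates on a 2×2 contingency table of matched samples. The sketch below implements a two-sided Fisher's exact test from the hypergeometric distribution using only the standard library. The counts are invented, and the choice of a two-sided test and of the sample odds ratio as the effect size are assumptions, since the text does not state which variants were used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Return (odds_ratio, p_value) for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one;
    # the small relative tolerance guards against floating-point ties.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)


# Invented counts of matched sample pairs, for illustration only.
# Rows: dysbiotic / non-dysbiotic by metagenomics.
# Columns: dysbiotic / non-dysbiotic by metabolomics.
odds_ratio, p_value = fisher_exact_two_sided(8, 2, 1, 5)
```

In practice, `scipy.stats.fisher_exact` provides the same test and would normally be used instead of a hand-written version.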
To test the sensitivity of the dysbiosis classification to the choice of reference dataset, we also performed the dysbiosis classification using the HMP1-II stool samples (ref. 10) as the reference sample set instead of the non-IBD samples. The resulting dysbiosis scores (Extended Data Fig. 3E) were highly concordant (Spearman rho = 0.86; p < 2.2 × 10⁻¹⁶), as were the dysbiosis classifications (odds ratio of 56; Fisher’s exact p < 2.2 × 10⁻¹⁶). This shows that, despite the inclusion of subjects with other conditions in the non-IBD group here, as well as large differences in measurement technologies between the datasets, the dysbiosis classification is highly robust. Further, 43/426 (10.1%) of non-IBD samples were classified as dysbiotic using the HMP1-II samples as reference, falling remarkably close to the 10% expected by the definition and showing that the enrichment of IBD samples in the dysbiotic set is not simply a consequence of the definition.
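The score-level comparison between two reference sets amounts to a rank correlation of the two score vectors. The following sketch computes Spearman's rho as the Pearson correlation of average ranks, using only the standard library. The two score vectors are invented; only the 43/426 fraction comes from the text above.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented scores for the same six samples under two reference sets.
scores_non_ibd_reference = [0.55, 0.60, 0.62, 0.70, 0.85, 0.91]
scores_hmp_reference = [0.50, 0.58, 0.57, 0.72, 0.80, 0.95]
rho = spearman_rho(scores_non_ibd_reference, scores_hmp_reference)

# Fraction of non-IBD samples flagged with the HMP1-II reference (counts from the text).
non_ibd_flagged_percent = round(100 * 43 / 426, 1)
```

A library routine such as `scipy.stats.spearmanr` would additionally return the p-value, which this sketch omits.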
Lloyd-Price J., Arze C., Ananthakrishnan A.N., Schirmer M., Avila-Pacheco J., Poon T.W., Andrews E., Ajami N.J., Bonham K.S., Brislawn C.J., Casero D., Courtney H., Gonzalez A., Graeber T.G., Hall A.B., Lake K., Landers C.J., Mallick H., Plichta D.R., Prasad M., Rahnavard G., Sauk J., Shungin D., Vázquez-Baeza Y., White R.A. III, Braun J., Denson L.A., Jansson J.K., Knight R., Kugathasan S., McGovern D.P., Petrosino J.F., Stappenbeck T.S., Winter H.S., Clish C.B., Franzosa E.A., Vlamakis H., Xavier R.J., & Huttenhower C. (2019). Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases. Nature, 569(7758), 655-662.