To identify samples that were highly divergent from the reference set, we thresholded the dysbiosis score at the 90th percentile of this score for non-IBD samples. This threshold therefore identifies samples whose feature configuration has a less than 10% probability of occurring in a participant without IBD. By this measure, 272 metagenomes were classified as dysbiotic. Samples from participants with CD or UC were overrepresented in the dysbiotic set, with 24.3% and 11.6% of their samples classified as dysbiotic, respectively. As expected, these samples also tended to lie at the extremes of the taxonomic ordination based on metagenomes (Extended Data Fig.
To lend additional support to the definition of dysbiosis (that is, as outliers by one type of microbiome profile), we tested the concordance between dysbiosis classifications made using the same statistical definition, but applied to metabolomic rather than taxonomic profiles. That is, we defined a metabolomic dysbiosis score as the median Bray–Curtis dissimilarity of one metabolomic profile to the non-IBD metabolomic profiles (after the 20th week), and defined the dysbiosis threshold as the 90th percentile of this distribution among non-IBD metabolomic profiles. We then compared these dysbiosis classifications with those from the nearest metagenomic sample (at most two weeks apart; see ‘Cross-measurement type temporal matching’) and found that samples identified as dysbiotic by metagenomics were 4.6 times more likely to be dysbiotic by metabolomics (Fisher’s exact P = 5.9 × 10⁻⁹), showing that dysbiosis classifications are highly consistent across measurement types.
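As an illustration of the score, threshold and concordance test described above, the following is a minimal Python sketch and not the analysis code used in the study. The function names, the synthetic Dirichlet profiles and the decision to exclude only a reference sample's own profile when scoring reference samples are all assumptions made for the example. The odds ratio that SciPy returns with the P value is also not necessarily the "4.6 times" statistic reported above.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.stats import fisher_exact


def dysbiosis_scores(profiles, reference):
    """Median Bray-Curtis dissimilarity of each profile to the reference profiles."""
    return np.median(cdist(profiles, reference, metric="braycurtis"), axis=1)


def reference_scores(reference):
    """Score each reference profile against the other reference profiles.

    Assumption: only the profile itself is excluded from its own reference set.
    """
    d = cdist(reference, reference, metric="braycurtis")
    np.fill_diagonal(d, np.nan)  # ignore the zero self-dissimilarity
    return np.nanmedian(d, axis=1)


def dysbiosis_threshold(reference, percentile=90):
    """Percentile of the score distribution among the reference profiles."""
    return np.percentile(reference_scores(reference), percentile)


def concordance(calls_a, calls_b):
    """Fisher's exact test on the 2x2 table of two paired boolean classifications."""
    a = np.asarray(calls_a, dtype=bool)
    b = np.asarray(calls_b, dtype=bool)
    table = [
        [np.sum(a & b), np.sum(a & ~b)],
        [np.sum(~a & b), np.sum(~a & ~b)],
    ]
    return fisher_exact(table)  # (sample odds ratio, two-sided P value)


# Synthetic relative-abundance profiles, for illustration only.
rng = np.random.default_rng(0)
non_ibd = rng.dirichlet(np.ones(20), size=50)       # hypothetical reference set
ibd = rng.dirichlet(np.full(20, 0.3), size=30)      # hypothetical IBD samples

threshold = dysbiosis_threshold(non_ibd)
is_dysbiotic = dysbiosis_scores(ibd, non_ibd) > threshold
print(f"threshold = {threshold:.3f}, dysbiotic IBD samples = {is_dysbiotic.sum()}")
```

In practice, `concordance` would be applied to the metagenomic and metabolomic dysbiosis calls of temporally matched sample pairs, with each classification derived from its own reference distribution.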
To test the sensitivity of the dysbiosis classification to the choice of reference data set, we also performed the dysbiosis classification using the HMP1-II stool samples (ref. 10) as the reference sample set instead of the non-IBD samples. The resulting dysbiosis scores (Extended Data Fig.