Species abundances (using the species delineation from Mende et al (2013)) were used to calculate the Shannon diversity index and species
richness for each sample in study population F using the diversity and
specnumber functions, respectively, of the vegan R package (http://cran.r-project.org/web/packages/vegan/index.html). Differences between tumor-free
controls and CRC patients were assessed by the Kruskal–Wallis test (Supplementary Fig S1D and E).
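For reference, both per-sample indices can be written out explicitly. The sketch below restates them in Python, assuming vegan's defaults: `diversity()` returns the Shannon index H' = -Σ p_i ln p_i (natural logarithm), and `specnumber()` counts species with nonzero abundance. The function names and toy profiles are hypothetical; the study itself used the R functions.

```python
import math

def shannon(abundances):
    """Shannon index H' = -sum(p_i * ln p_i) over species with nonzero abundance."""
    total = sum(abundances)
    h = 0.0
    for a in abundances:
        if a > 0:
            p = a / total
            h -= p * math.log(p)
    return h

def species_richness(abundances):
    """Number of species with nonzero abundance."""
    return sum(1 for a in abundances if a > 0)

# Hypothetical species-abundance profiles (one list per sample); not study data.
samples = {
    "s1": [10, 10, 10, 10],
    "s2": [40, 0, 0, 0],
    "s3": [30, 10, 0, 0],
}
for name, profile in samples.items():
    print(name, round(shannon(profile), 3), species_richness(profile))
```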
Gene richness (the number of genes from the metagenomic gene catalog with nonzero abundance) was
calculated for each sample from study population F after rarefying to 3 million reads per sample;
differences were evaluated using the Kruskal–Wallis test (Supplementary Fig S1F).
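Rarefaction can be sketched as random subsampling of reads without replacement down to a fixed depth, after which gene richness is the number of genes that still have at least one read. The text does not say which tool performed the rarefaction, so `rarefy`, its `seed` argument, and the toy counts below are hypothetical; the example uses a small depth instead of 3 million reads so it runs instantly.

```python
import random

def rarefy(counts, depth, seed=0):
    """Randomly subsample `depth` reads without replacement from per-gene read counts."""
    total = sum(counts)
    if depth > total:
        raise ValueError("sample has fewer reads than the rarefaction depth")
    rng = random.Random(seed)
    # Treat reads as positions 0..total-1 laid out gene by gene, draw `depth` of them,
    # then walk the sorted positions to find which gene each drawn read belongs to.
    picked = sorted(rng.sample(range(total), depth))
    rarefied = [0] * len(counts)
    gene, upper = 0, counts[0]
    for pos in picked:
        while pos >= upper:
            gene += 1
            upper += counts[gene]
        rarefied[gene] += 1
    return rarefied

def gene_richness(counts):
    """Number of genes with nonzero abundance."""
    return sum(1 for c in counts if c > 0)

# Hypothetical per-gene read counts for one sample.
sample_counts = [120, 0, 45, 3, 1, 0, 800]
print(gene_richness(rarefy(sample_counts, depth=500, seed=42)))
```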
As an additional high-level descriptor of gut microbial community composition, we analyzed the
abundance ratio between the phyla Bacteroidetes and Firmicutes (Turnbaugh et al,
2006) and tested whether it differed among the three groups of
participants using the Kruskal–Wallis test (Supplementary Fig S1C).
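The ratio comparison can be sketched with a self-contained Kruskal–Wallis test: rank all values jointly (averaging tied ranks), compute H with the usual tie correction, and refer it to a chi-square distribution with k - 1 degrees of freedom, which is how R's `kruskal.test` reports its p-value. The group names and abundances are invented for illustration.

```python
import math

def ranks_with_ties(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df."""
    y = max(x, 0.0) / 2
    if df % 2 == 0:
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(df // 2))
    series = sum(y ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * series

def kruskal_wallis(*groups):
    """Tie-corrected Kruskal–Wallis H and its chi-square p-value (k - 1 df)."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = ranks_with_ties(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over tie groups of size t.
    tie_sizes = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    h /= 1 - sum(t ** 3 - t for t in tie_sizes.values()) / (n ** 3 - n)
    return h, chi2_sf(h, len(groups) - 1)

# Hypothetical (Bacteroidetes, Firmicutes) relative abundances for three invented groups.
group_1 = [(0.45, 0.40), (0.30, 0.55), (0.50, 0.35), (0.38, 0.47)]
group_2 = [(0.35, 0.50), (0.42, 0.41), (0.28, 0.60)]
group_3 = [(0.55, 0.30), (0.48, 0.36), (0.60, 0.28), (0.52, 0.33)]
ratios = [[b / f for b, f in g] for g in (group_1, group_2, group_3)]
h, p = kruskal_wallis(*ratios)
print(f"H = {h:.2f}, p = {p:.3f}")
```

Because the test uses only ranks, it gives the same result whether the ratio or its logarithm is analyzed.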
Enterotypes were determined on a reference set of the 292 healthy individuals from study
population H (Qin et al, 2010; Le Chatelier
et al, 2013) using the original
computational protocol and PCoA visualization (Supplementary Fig S1A) (for details, see Arumugam
et al, 2011, 2014). We projected the 156 samples from study population F into this PCoA space (Trosset
& Priebe, 2006) and assigned each sample to the enterotype whose medoid was closest in
JSD distance (i.e., the nearest cluster center). Differences in
enterotype composition between CRC patients (all stages) and tumor-free controls (some with
adenomas) of study population F were assessed using Fisher's exact test (Supplementary Fig S1B).
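Nearest-medoid assignment and the subsequent contingency-table test can be sketched as follows. The distance is assumed, as in the publicly available enterotyping tutorial, to be the square root of the Jensen–Shannon divergence (natural log, small pseudocount); the PCoA projection step itself is not reproduced. `fisher_exact_2xk` enumerates every 2 × k table with the observed margins and sums the probabilities of those no more probable than the observed one, the two-sided definition used by R's `fisher.test`. Medoid profiles, labels such as `ET1`, and all counts are hypothetical.

```python
import math

def jsd_distance(p, q, pseudocount=1e-6):
    """Square root of the Jensen–Shannon divergence (natural log) between two profiles."""
    p = [x + pseudocount for x in p]   # pseudocount avoids log(0)
    q = [x + pseudocount for x in q]
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]            # renormalize to relative abundances
    q = [x / sq for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(a, b):
        return sum(x * math.log(x / y) for x, y in zip(a, b))
    return math.sqrt(max(0.0, 0.5 * kl(p, m) + 0.5 * kl(q, m)))

def assign_enterotype(profile, medoids):
    """Label of the medoid with the smallest JSD distance to `profile`."""
    return min(medoids, key=lambda label: jsd_distance(profile, medoids[label]))

def fisher_exact_2xk(table):
    """Two-sided Fisher's exact test for a 2 x k table (exact integer arithmetic)."""
    top, bottom = table
    cols = [a + b for a, b in zip(top, bottom)]

    def weight(cells):
        # Numerator of the hypergeometric probability of a table with these margins.
        w = 1
        for c, a in zip(cols, cells):
            w *= math.comb(c, a)
        return w

    observed = weight(top)
    total = extreme = 0

    def walk(j, remaining, cells):
        nonlocal total, extreme
        if j == len(cols) - 1:
            w = weight(cells + [remaining])
            total += w
            if w <= observed:   # no more probable than the observed table
                extreme += w
            return
        capacity_after = sum(cols[j + 1:])
        for a in range(max(0, remaining - capacity_after), min(cols[j], remaining) + 1):
            walk(j + 1, remaining - a, cells + [a])

    walk(0, sum(top), [])
    return extreme / total

medoids = {  # hypothetical three-taxon medoid profiles, not the study's
    "ET1": [0.6, 0.1, 0.3],
    "ET2": [0.1, 0.7, 0.2],
    "ET3": [0.2, 0.2, 0.6],
}
print(assign_enterotype([0.55, 0.15, 0.30], medoids))
# Rows: cases, controls; columns: samples per enterotype (invented counts).
print(fisher_exact_2xk([[12, 20, 9], [25, 15, 18]]))
```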
Additionally, we performed a PCoA on study population F alone (independently of other datasets) and
investigated the separation of CRC cases from controls (neoplasia-free participants and patients
with small adenomas) along the principal coordinates; significance was assessed using the Wilcoxon rank-sum test
(Supplementary Fig S1G–J).
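A PCoA on a single study population can be sketched as classical multidimensional scaling: double-center the squared distance matrix, take the leading eigenvectors, and scale them by the square roots of their eigenvalues. Each axis can then be compared between cases and controls with a two-sided Wilcoxon rank-sum test, here in its normal approximation with tie and continuity corrections (as R's `wilcox.test` computes when `exact = FALSE`). The distance measure used for this PCoA is not stated in the text, so the Euclidean distances and simulated points below are placeholders, and the function names are hypothetical.

```python
import math
import numpy as np

def pcoa(dist, n_axes=2):
    """Classical PCoA (metric MDS) of a symmetric distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering       # double-centered matrix
    eigvals, eigvecs = np.linalg.eigh(b)                # eigh returns ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:n_axes]
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))  # ignore tiny negative eigenvalues
    return eigvecs[:, top] * scale

def ranks_with_ties(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def wilcoxon_rank_sum(x, y):
    """Two-sided rank-sum test, normal approximation with tie and continuity corrections."""
    x, y = list(x), list(y)
    n1, n2 = len(x), len(y)
    pooled = x + y
    ranks = ranks_with_ties(pooled)
    w = sum(ranks[:n1]) - n1 * (n1 + 1) / 2   # Mann-Whitney W statistic for x
    tie_sizes = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in tie_sizes.values())
    n = n1 + n2
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1))))
    z = w - n1 * n2 / 2
    z -= 0.5 if z > 0 else -0.5 if z < 0 else 0.0   # continuity correction
    z /= sigma
    return w, math.erfc(abs(z) / math.sqrt(2))

rng = np.random.default_rng(0)
# Hypothetical samples: 20 "controls" and 20 "cases" with a shifted mean.
points = np.vstack([rng.normal(0.0, 1.0, (20, 3)), rng.normal(1.5, 1.0, (20, 3))])
dist = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1))
is_case = np.array([0] * 20 + [1] * 20)
coords = pcoa(dist, n_axes=2)
for axis in range(coords.shape[1]):
    w, p = wilcoxon_rank_sum(coords[is_case == 1, axis], coords[is_case == 0, axis])
    print(f"PCo{axis + 1}: W = {w:.0f}, p = {p:.2g}")
```

The sign of each principal coordinate is arbitrary, which does not affect the two-sided p-value.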
To assess whether differences in such high-level descriptors of microbial community structure are
useful for CRC detection, we built a logistic regression model with the first ten principal
coordinates (from Supplementary Fig S1G) and the Bacteroidetes to Firmicutes ratio (Supplementary
Fig S1C) as predictors. Its prediction accuracy was assessed using tenfold cross-validation on study
population F and ROC analysis (Supplementary Fig S1K).
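The classifier evaluation can be sketched with NumPy alone. The text does not specify the fitting procedure, so this sketch assumes gradient-descent logistic regression with a small L2 penalty (to keep weights finite), features standardized within each training fold, and a single AUC computed on predictions pooled across the ten held-out folds. The 11 columns stand in for the ten principal coordinates plus the Bacteroidetes/Firmicutes ratio; all data are simulated and the function names are hypothetical.

```python
import numpy as np

def fit_logistic(x, y, l2=1e-3, lr=0.1, n_iter=2000):
    """Gradient-descent logistic regression; the small L2 term keeps weights finite."""
    xb = np.hstack([np.ones((x.shape[0], 1)), x])   # prepend an intercept column
    w = np.zeros(xb.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-xb @ w))
        grad = xb.T @ (p - y) / len(y) + l2 * np.r_[0.0, w[1:]]   # intercept not penalized
        w -= lr * grad
    return w

def predict(w, x):
    return 1 / (1 + np.exp(-(w[0] + x @ w[1:])))

def auc(scores, labels):
    """ROC AUC: probability a random case outscores a random control (ties count 1/2)."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def cross_validated_auc(x, y, n_folds=10, seed=0):
    """Pool held-out predictions from k folds, then compute one AUC."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    scores = np.empty(len(y))
    for test_idx in folds:
        train = np.setdiff1d(np.arange(len(y)), test_idx)
        mu = x[train].mean(axis=0)
        sd = x[train].std(axis=0) + 1e-12           # scaling learned on the training fold only
        w = fit_logistic((x[train] - mu) / sd, y[train])
        scores[test_idx] = predict(w, (x[test_idx] - mu) / sd)
    return auc(scores, y)

rng = np.random.default_rng(1)
x_demo = rng.normal(size=(156, 11))     # simulated: 10 PCo coordinates + B/F ratio per sample
y_demo = rng.integers(0, 2, size=156)   # simulated labels (1 = case, 0 = control)
x_demo[:, 0] += 1.5 * y_demo            # inject a synthetic group difference
print(round(float(cross_validated_auc(x_demo, y_demo)), 3))
```

Pooling held-out predictions yields one ROC curve for the whole population; averaging per-fold AUCs is a common alternative.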