Gut microbial time series data were collected from 20 women, each of whom donated stool samples for over a month at a sampling frequency close to one sample per day (Vandeputte et al., submitted) [26]. These women also reported data on their menstrual cycle. For each sample, enterotype assignments were carried out as in Vandeputte et al. [27] with Dirichlet multinomial clustering. Samples were assigned to the Bacteroides 1, Bacteroides 2, Ruminococcaceae, or Prevotella enterotype.
Progression through the menstrual cycle was rescaled to 28 days (the average length of a menstrual cycle) for all women. For days with more than one sample, only the first sample was used. Taxa present in fewer than 50% of participants were discarded from the analysis. Association networks were constructed with fastLSA v1.0 [28] on data rarefied to 10,000 sequences per sample, with correlations inferred across a delay of three time points (α = 0.05). Set sizes were analyzed with anuran, by generating 20 random networks per observed network and resampling 100 different groups from these. Positive controls were generated 20 times, with a core size equal to 20% of the union of edges at 10% prevalence (edges present in at least two networks) and at 50% prevalence (edges present in at least ten networks). Set sizes and centralities with a p value below 0.05 in comparisons to values from random networks were considered significantly different from the random networks. The anuran toolbox was also used to assess the effect of increasing the number of participants.
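The prevalence thresholds and the positive-control core size described above can be illustrated with a minimal Python sketch. This is not anuran's implementation: the function names, the toy networks, and the rounding of the core size are assumptions made for illustration only.

```python
import math
from collections import Counter


def edge(a, b):
    """Undirected edge as a sorted tuple, so (a, b) and (b, a) match."""
    return tuple(sorted((a, b)))


def edges_at_prevalence(networks, prevalence):
    """Edges present in at least `prevalence` (a fraction) of the networks."""
    # round() guards against floating-point noise before taking the ceiling.
    min_networks = math.ceil(round(prevalence * len(networks), 9))
    counts = Counter(e for net in networks for e in net)
    return {e for e, n in counts.items() if n >= min_networks}


def positive_control_core_size(networks, fraction=0.2):
    """Core size as a fraction of the union of edges.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    union = set().union(*networks)
    return round(fraction * len(union))


# Toy data (hypothetical): 20 networks, one per participant.
networks = [set() for _ in range(20)]
for i, net in enumerate(networks):
    net.add(edge("taxon_A", "taxon_B"))      # present in all 20 networks
    if i < 10:
        net.add(edge("taxon_B", "taxon_C"))  # present in 10 networks
    if i < 2:
        net.add(edge("taxon_C", "taxon_D"))  # present in 2 networks
    if i < 1:
        net.add(edge("taxon_D", "taxon_E"))  # present in 1 network

at_10 = edges_at_prevalence(networks, 0.10)  # needs at least 2 of 20 networks
at_50 = edges_at_prevalence(networks, 0.50)  # needs at least 10 of 20 networks
core_size = positive_control_core_size(networks)
```

With 20 networks, the 10% and 50% thresholds correspond to two and ten networks, matching the counts given in the text.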
The Walktrap community finding algorithm [29], implemented in the igraph R package v1.2.6 [30], was used to cluster the inferred CAN, as the lack of negative edges in the CAN suggested that random walks could sufficiently identify clusters. To visualize enterotype-specific patterns of relative abundance, we computed the mean relative abundance of each taxon per individual. We then took the median relative abundances across all individuals who belonged predominantly to the Ruminococcaceae enterotype, an enterotype previously linked to lower stool moisture [27], and subtracted from these all other median relative abundances, giving an estimate of taxa with high abundance in the Ruminococcaceae enterotype compared to the other enterotypes.
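The enterotype contrast can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes that "all other median relative abundances" means the median across all individuals who do not belong predominantly to the Ruminococcaceae enterotype, and the data layout, function name, and toy values are hypothetical.

```python
from statistics import mean, median


def enterotype_contrast(samples, dominant, target="Ruminococcaceae"):
    """Per-taxon difference in median abundance: target enterotype minus the rest.

    samples:  {individual: [{taxon: relative_abundance}, ...]} (one dict per sample)
    dominant: {individual: predominant enterotype of that individual}
    """
    taxa = sorted({t for profiles in samples.values() for p in profiles for t in p})
    # Step 1: mean relative abundance of each taxon per individual.
    per_individual = {
        ind: {t: mean(p.get(t, 0.0) for p in profiles) for t in taxa}
        for ind, profiles in samples.items()
    }
    in_group = [ind for ind in per_individual if dominant[ind] == target]
    out_group = [ind for ind in per_individual if dominant[ind] != target]
    # Step 2: median across individuals in each group, then subtract.
    return {
        t: median(per_individual[i][t] for i in in_group)
        - median(per_individual[i][t] for i in out_group)
        for t in taxa
    }


# Toy input (hypothetical individuals and taxa).
samples = {
    "w1": [{"X": 0.4, "Y": 0.6}, {"X": 0.6, "Y": 0.4}],
    "w2": [{"X": 0.1, "Y": 0.9}],
}
dominant = {"w1": "Ruminococcaceae", "w2": "Bacteroides 1"}
contrast = enterotype_contrast(samples, dominant)
```

Positive values in `contrast` mark taxa that are more abundant in the Ruminococcaceae group than in the remaining individuals.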
For the case study on the sponge microbiome, QIIME-processed data were downloaded from Moitinho et al. [31]. Samples with fewer than 1000 counts were removed and the remaining samples were rarefied to an even depth of 1034 sequences. After rarefaction, the abundance data were first filtered for 20% taxon prevalence across all samples, then once more to ensure 20% prevalence across different orders. Counts for removed taxa were retained to preserve the sample sums. After excluding host orders with fewer than 50 samples, 10 orders remained. CoNet v1.1.1 with renormalisation was then used to infer association networks (Faust and Raes [2]). Edges were generated with Pearson correlation, Spearman correlation, mutual information, Bray–Curtis dissimilarity, and Kullback–Leibler distance. Edges were included if at least one method reached significance; only edges with a combined Q-value below 0.05 (estimated using a combination of permutation and bootstrapping) were retained. The CoNet CANs were inferred with anuran, generating 20 negative control random networks per host order and resampling these 100 times. For the positive controls, 20 network groups were generated with a core size equal to 20% of the union of edges at 20% prevalence (edges present in at least two networks) and at 50% prevalence (edges present in at least five networks). Set sizes and centralities with a p value below 0.05 in comparisons to values from random networks were considered significantly different from the random networks. CoNet networks were compared to FlashWeave networks [7]. FlashWeave v0.16.0 was run as FlashWeave-S (sensitive set to true and heterogeneous set to false), with all other settings left at their defaults. To compare FlashWeave networks to CoNet networks, anuran generated five randomized networks per order-specific network and resampled these five times.
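The preprocessing steps (read-depth filter, rarefaction, prevalence filter that preserves sample sums) can be sketched in plain Python. This is a hedged illustration rather than the pipeline actually used: it assumes that counts of removed taxa are pooled into a single placeholder row (here the hypothetical name `filtered_taxa`) and that the rarefaction depth equals the smallest remaining sample. All function names are invented for this sketch.

```python
import random


def rarefy(counts, depth, rng):
    """Subsample one sample's reads without replacement down to `depth`."""
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    out = {}
    for taxon in rng.sample(pool, depth):
        out[taxon] = out.get(taxon, 0) + 1
    return out


def filter_by_prevalence(table, min_prevalence, bin_name="filtered_taxa"):
    """Drop taxa below `min_prevalence`, pooling their counts per sample.

    table: {sample: {taxon: count}}. Pooling into one bin is an assumption.
    """
    n_samples = len(table)
    taxa = {t for counts in table.values() for t in counts}
    keep = {
        t for t in taxa
        if sum(1 for c in table.values() if c.get(t, 0) > 0) / n_samples
        >= min_prevalence
    }
    filtered = {}
    for sample, counts in table.items():
        kept = {t: n for t, n in counts.items() if t in keep}
        # Counts of removed taxa are retained so the sample sum is unchanged.
        kept[bin_name] = sum(n for t, n in counts.items() if t not in keep)
        filtered[sample] = kept
    return filtered


def preprocess(table, min_reads=1000, min_prevalence=0.2, seed=0):
    """Remove shallow samples, rarefy to the smallest remaining depth, filter."""
    rng = random.Random(seed)
    deep = {s: c for s, c in table.items() if sum(c.values()) >= min_reads}
    depth = min(sum(c.values()) for c in deep.values())
    rarefied = {s: rarefy(c, depth, rng) for s, c in deep.items()}
    return filter_by_prevalence(rarefied, min_prevalence)


# Toy table (hypothetical); min_reads is lowered to fit the small counts.
toy = {
    "s1": {"a": 8, "b": 4},
    "s2": {"a": 6, "b": 4},
    "s3": {"a": 3, "b": 1},  # only 4 reads: removed by the depth filter
}
processed = preprocess(toy, min_reads=5)
```

The same `filter_by_prevalence` function could be reapplied to subsets of samples for a second filtering pass; how the second pass was grouped in the original analysis is not specified here.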
Prior research indicated that microbial abundance was a significant driver of community structure in sponges [32]. Therefore, taxa in the CAN were compared to taxa reported as indicators of high microbial abundance (HMA) or low microbial abundance (LMA) [32]. CAN clusters were identified with manta v1.0.0 [33], as this algorithm was designed to handle networks with negative edges such as this CAN. Default settings were used for the clustering algorithm, except for the numbers of iterations and permutations, which were both set to 200. A chi-squared test was used to compare HMA–LMA predictions to CAN cluster assignments (α = 0.05).
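A chi-squared test of independence between HMA–LMA labels and cluster assignments can be sketched with the standard library alone. The contingency table below is invented for illustration, the function names are hypothetical, and no continuity correction is applied; whether the original analysis used one is not stated. In practice a statistics package would normally be used instead of the hand-written p value.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-squared distribution.

    Uses the series expansion of the regularized lower incomplete gamma
    function; adequate for moderate statistics, not for extreme values.
    """
    if stat <= 0:
        return 1.0
    a, x = df / 2, stat / 2
    term = series = 1.0
    for n in range(1, 10000):
        term *= x / (a + n)
        series += term
        if term < 1e-16 * series:
            break
    lower = math.exp(-x + a * math.log(x) - math.lgamma(a + 1)) * series
    return max(0.0, 1.0 - lower)


def chi_squared_independence(table):
    """Pearson chi-squared test on a contingency table (list of rows).

    Assumes every row and column has a non-zero total.
    Returns (statistic, degrees of freedom, p value).
    """
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(row_sums) - 1) * (len(col_sums) - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts: rows are HMA and LMA indicator taxa,
# columns are two CAN clusters.
table = [[10, 0],
         [0, 10]]
stat, df, p_value = chi_squared_independence(table)
significant = p_value < 0.05  # alpha = 0.05, as in the text
```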
Röttjers L., Vandeputte D., Raes J., & Faust K. (2021). Null-model-based network comparison reveals core associations. ISME Communications, 1, 36.