The North Temperate Lakes Long-Term Ecological Research database hosts many
long-term ecological time series. We used five long-term phytoplankton data sets
(two from the North Temperate Lakes Long-Term Ecological Research and three from
the Cascade research group) to validate the cohesion workflow. These data sets met
a number of criteria that made them good candidates for the validation: the
samples were collected regularly, sampling spanned multiple years and many
environmental gradients, and taxa were counted in absolute abundance. The term
‘phytoplankton’ refers to the polyphyletic assemblage of photosynthetic aquatic
microbes (Litchman and Klausmeier, 2008). The data sets are from the following
lakes in Wisconsin, USA:
Lake Mendota (293 samples with 410 taxa over 19 years), Lake Monona (264 samples
with 382 taxa over 19 years), Paul Lake (197 samples with 209 taxa over 12 years),
Peter Lake (197 samples with 237 taxa over 12 years) and Tuesday Lake (115 samples
with 121 taxa over 12 years). These lakes vary in size, productivity and food web
structure. Lake Mendota and Lake Monona are large (39.4 km² and
13.8 km²), urban, eutrophic lakes (Brock, 2012). Peter, Paul and Tuesday lakes
are small (each <0.03 km²) lakes surrounded by forest (Carpenter and Kitchell,
1996). Peter Lake and Tuesday Lake were also
subjected to whole-lake food web manipulations during the sampling timeframe
(detailed in Elser and Carpenter, 1988 and Cottingham et al., 1998). After validating our
workflow using the phytoplankton data sets, we tested the cohesion metrics on a
bacterial data set obtained using 16S rRNA gene amplicon sequencing. These types
of data sets often contain thousands of taxa, most of them rare, which may
influence the results of correlation-based analyses (Faust and Raes, 2012). We
used the Lake Mendota bacterial 16S rRNA gene
sequencing time series (91 samples with 7081 taxa over 11 years) for this analysis
(Hall et al., in review). Sample
processing, sequencing and core amplicon data analysis were performed by the Earth
Microbiome Project (www.earthmicrobiome.org; Gilbert et al., 2014), and all
amplicon sequence data and metadata have
been made public through the data portal (qiita.microbio.me/emp). Briefly,
partial 16S rRNA genes were amplified from community DNA (Kara et al., 2013)
using the 515F-806R primer pair (Caporaso et al., 2011) and sequenced on an
Illumina MiSeq, following standard Earth
Microbiome Project protocols.
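To make the contrast between the phytoplankton and 16S rRNA gene data sets concrete, the sample, taxon and year counts reported above can be tabulated as in the following minimal Python sketch. The dictionary name `DATASETS`, the function `summarize` and the two derived ratios are illustrative assumptions for this sketch and are not part of the cohesion workflow itself.

```python
# Data set sizes as reported in the text: (samples, taxa, years sampled).
# Names and derived ratios below are illustrative, not metrics from the study.
DATASETS = {
    "Lake Mendota (phytoplankton)": (293, 410, 19),
    "Lake Monona (phytoplankton)": (264, 382, 19),
    "Paul Lake (phytoplankton)": (197, 209, 12),
    "Peter Lake (phytoplankton)": (197, 237, 12),
    "Tuesday Lake (phytoplankton)": (115, 121, 12),
    "Lake Mendota (16S rRNA gene)": (91, 7081, 11),
}


def summarize(datasets):
    """Return mean samples per year and total-taxa-to-samples ratio per data set."""
    summary = {}
    for name, (samples, taxa, years) in datasets.items():
        summary[name] = {
            "samples_per_year": samples / years,
            "taxa_to_sample_ratio": taxa / samples,
        }
    return summary


if __name__ == "__main__":
    for name, stats in summarize(DATASETS).items():
        print(
            f"{name}: {stats['samples_per_year']:.1f} samples/year, "
            f"{stats['taxa_to_sample_ratio']:.1f} taxa per sample collected"
        )
```

In this tabulation the 16S rRNA gene data set has far more taxa than samples, whereas each phytoplankton data set has a taxa-to-sample ratio below 2, which is one simple way to see why rare taxa are a concern for correlation-based analyses of amplicon data.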
We present the workflow using results from the Lake Mendota phytoplankton data
set, as it is the largest (longest duration and most taxa) data set available in
absolute abundance. The dominant taxa in the Lake Mendota phytoplankton data set
change throughout the year, with diatoms most abundant during the spring bloom and
cyanobacteria most abundant in summer. Details about phytoplankton data sets can
be found at https://lter.limnology.wisc.edu/. Further details about the Lake
Mendota 16S rRNA gene data set are included in the Supplementary Online Material.