One winter rumen fluid sample was separated by gentle centrifugation (5 min at 3000 g) into a pellet of plant material and a supernatant, which was sequentially filtered through a 0.8 μm filter and then a 0.2 μm filter. DNA was extracted from four fractions: the pellet (1 g), half of the biomass retained on each of the 0.8 and 0.2 μm filters, and the filtrate that passed through the 0.2 μm filter. DNA was sequenced on an Illumina HiSeq 2500 at The Ohio State University (Columbus, OH, USA). 16S rRNA gene sequences were reconstructed from the trimmed, unassembled Illumina reads using EMIRGE (Miller et al., 2011). Trimmed reads were assembled de novo to generate genome fragments using IDBA-UD (Peng et al., 2012). Genes were called, annotated and analyzed as previously described by Wrighton et al. (2012) (see Supplementary Methods for details). A combination of phylogenetic signal, coverage and GC content was used to identify BS11 genomic bins (Sharon et al., 2013). Additional assembly and binning methods and validation information are available in the Supplementary Methods. Genomic completion of the BS11 bins was assessed based on the presence of a core gene set that typically occurs only once per genome and is widely conserved among bacteria and archaea (Wu and Eisen, 2008). For sequence-based comparison, average amino acid identity (AAI) and average nucleotide identity (ANI) values were calculated using the AAI and ANI calculators from the Kostas lab (http://enve-omics.ce.gatech.edu/).
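The completeness assessment described above can be illustrated with a minimal Python sketch. It assumes that marker detection has already been done and that the hits for one bin are available as a plain list of marker names. The function name, the input format and the redundancy measure are hypothetical choices for illustration, not the authors' implementation or the actual Wu and Eisen (2008) marker set.

```python
from collections import Counter


def estimate_completeness(marker_hits, core_markers):
    """Estimate completeness and redundancy for one genome bin.

    marker_hits  : iterable of marker names detected in the bin
                   (a name appears once per detected copy)
    core_markers : collection of single-copy core marker names
                   expected once per genome (hypothetical input)

    Returns (completeness, redundancy), both as fractions of the
    core set: markers found at least once, and markers found more
    than once.
    """
    core = set(core_markers)
    if not core:
        raise ValueError("core_markers must not be empty")

    # Count copies of each core marker; hits outside the core set are ignored.
    counts = Counter(m for m in marker_hits if m in core)

    present = len(counts)
    multi_copy = sum(1 for c in counts.values() if c > 1)
    return present / len(core), multi_copy / len(core)
```

A high fraction of markers found more than once would suggest that a bin mixes fragments from several genomes, which is why the sketch reports redundancy alongside completeness.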
Existing reference datasets for the 11 ribosomal proteins chosen as single-copy phylogenetic marker genes (RpL2, 3, 4, 6, 14, 15, 16 and 18, and RpS8, 17 and 19) were augmented with sequences mined from sequenced Bacteroidales genomes in the NCBI and JGI IMG databases (August 2015). Each individual protein dataset was aligned using MUSCLE 3.8.31 (Edgar, 2004) and then manually curated to remove end gaps. Alignments were concatenated to form an 11-gene, 63-taxon alignment and then run through ProtPipeliner, a Python script developed in-house for generation of phylogenetic trees (https://github.com/lmsolden/protpipeliner). The pipeline runs as follows: alignments are curated with minimal editing by GBLOCKS (Talavera and Castresana, 2007), and model selection is conducted via ProtTest 3.4 (Darriba et al., 2011). A maximum likelihood phylogeny for the concatenated alignment was inferred using RAxML version 8.3.1 under the LG model of evolution with 100 bootstrap replicates (Stamatakis, 2014) and visualized in iTOL (Letunic and Bork, 2007). Glycoside hydrolases of selected functional classes (for example, chitin, hemicellulose and debranching) were identified by a Pfam HMM search. Briefly, the Pfam search results were parsed into an output table organized by function per genome. In addition, we manually identified genes for central carbon metabolism, motility and fermentation product generation in all genomes.
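The concatenation step described above can be sketched in Python as follows. This is an illustrative sketch, not the ProtPipeliner code. It assumes each per-protein alignment is already loaded as a dictionary mapping taxon name to aligned sequence, and that a taxon missing from one protein alignment is padded with gap characters. Both the function name and the padding choice are assumptions.

```python
def concatenate_alignments(alignments, gap="-"):
    """Concatenate per-protein alignments into one supermatrix.

    alignments : list of dicts {taxon: aligned_sequence}, one dict
                 per marker protein, in the desired column order
    gap        : character used to pad taxa missing from a protein

    Returns a dict {taxon: concatenated_sequence}.
    """
    # Union of all taxa seen in any alignment, in a stable order.
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}

    for aln in alignments:
        # All sequences within one alignment must share the same width.
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()

        for taxon in taxa:
            # Pad with gaps when this taxon lacks the current protein.
            parts[taxon].append(aln.get(taxon, gap * width))

    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}
```

Padding absent proteins with gaps keeps every taxon in the matrix. Whether to keep or drop taxa with many missing markers is a separate decision that the source does not specify.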