The finished libraries were quantified using the PicoGreen assay (Invitrogen), and the average library size was determined on a Bioanalyzer 2100 using a DNA 7500 chip (Agilent). Library concentrations were then normalized to 4 nM and validated by qPCR on a ViiA-7 real-time thermocycler (Applied Biosystems), using the qPCR primers recommended in Illumina’s qPCR protocol and Illumina’s PhiX control library as the standard. The libraries were then pooled at equimolar concentrations and sequenced on an Illumina HiSeq 2500 sequencer in rapid mode with 250 bp paired-end reads. Approximately 5 Gb of sequencing data were obtained per sample so as to capture most of the novelty [19].
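Normalizing to 4 nM requires converting a mass concentration (from the fluorometric assay) and a mean fragment size (from the Bioanalyzer trace) into molarity. The sketch below illustrates that arithmetic under the common approximation of 660 g/mol per base pair of double-stranded DNA. The function names, the 20 µL final volume, and the example values are hypothetical and are not taken from the protocol described here.

```python
def library_molarity_nm(conc_ng_per_ul: float, mean_size_bp: float) -> float:
    """Convert a dsDNA concentration (ng/uL) to nM.

    Assumes an average mass of 660 g/mol per base pair (an approximation).
    """
    if conc_ng_per_ul < 0 or mean_size_bp <= 0:
        raise ValueError("concentration must be >= 0 and size must be > 0")
    return conc_ng_per_ul * 1e6 / (660.0 * mean_size_bp)


def dilution_volumes(stock_nm: float, target_nm: float = 4.0,
                     final_ul: float = 20.0) -> tuple:
    """Return (library_ul, diluent_ul) needed to reach target_nm in final_ul.

    Uses C1 * V1 = C2 * V2; the 20 uL default volume is an illustrative choice.
    """
    if stock_nm < target_nm:
        raise ValueError("stock is already below the target concentration")
    library_ul = target_nm * final_ul / stock_nm
    return library_ul, final_ul - library_ul


# Hypothetical library: 2.64 ng/uL with a mean fragment size of 500 bp.
stock = library_molarity_nm(2.64, 500)      # about 8 nM
library_ul, diluent_ul = dilution_volumes(stock)
print(f"{stock:.2f} nM stock: mix {library_ul:.1f} uL library "
      f"with {diluent_ul:.1f} uL diluent")
```

Because fluorometric quantification measures all double-stranded DNA rather than only adapter-ligated fragments, a calculation of this kind is usually treated as a first estimate that qPCR then checks.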
Genomic DNA sequences obtained from Illumina HiSeq paired-end sequencing were analyzed in three steps: (i) sequence quality check, (ii) read-to-protein alignment, and (iii) taxonomic classification. For the quality check, reads shorter than 30 bp and low-quality sequences (
Lastly, lowest common ancestor (LCA)-based taxonomic classification of the aligned sequence reads was carried out in MEGAN6 (MEtaGenome ANalyzer 6) using a bitscore cut-off of 100. MEGAN uses a simple algorithm that assigns each read to the LCA of the set of taxa it hits in the comparison: species-specific reads are assigned to the species taxon, while widely conserved reads are assigned to higher-order taxa [20].
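The LCA assignment described above can be illustrated with a minimal sketch. This is not MEGAN's implementation: it models only the bitscore cut-off and a naive LCA over a toy taxonomy stored as a child-to-parent map, and it ignores MEGAN's other filters. The taxonomy, the function names, and the choice to keep hits scoring exactly 100 are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Toy taxonomy (child -> parent); a real analysis would use the NCBI taxonomy.
PARENT: Dict[str, str] = {
    "Escherichia coli": "Escherichia",
    "Escherichia": "Enterobacteriaceae",
    "Salmonella enterica": "Salmonella",
    "Salmonella": "Enterobacteriaceae",
    "Enterobacteriaceae": "Bacteria",
    "Bacteria": "root",
}


def lineage(taxon: str, parent: Dict[str, str]) -> List[str]:
    """Return the path from the root down to the given taxon."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    path.reverse()
    return path


def lowest_common_ancestor(taxa: Iterable[str],
                           parent: Dict[str, str]) -> Optional[str]:
    """Return the deepest taxon shared by the lineages of all given taxa."""
    lineages = [lineage(t, parent) for t in set(taxa)]
    if not lineages:
        return None
    lca = None
    # Walk down from the root until the lineages diverge.
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        lca = ranks[0]
    return lca


def assign_read(hits: Iterable[Tuple[str, float]], parent: Dict[str, str],
                min_bitscore: float = 100.0) -> Optional[str]:
    """Assign a read to the LCA of the taxa of its hits that pass the cut-off.

    Returns None when no hit reaches min_bitscore (read left unassigned).
    """
    kept = [taxon for taxon, score in hits if score >= min_bitscore]
    return lowest_common_ancestor(kept, parent)


# A read hitting only one species stays at species level.
print(assign_read([("Escherichia coli", 250), ("Escherichia coli", 180)], PARENT))
# A conserved read hitting two genera moves up to their shared family.
print(assign_read([("Escherichia coli", 250), ("Salmonella enterica", 240)], PARENT))
```

In this toy example the first read is assigned to Escherichia coli and the second to Enterobacteriaceae, mirroring the species-specific versus widely conserved cases described in the text.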