A total of 3,300 randomly selected fosmid clones were sequenced on one full lane of the 454 GS-FLX Genome Sequencer System using the Titanium platform (Roche, Branford, CT, USA) following the manufacturer’s protocol. Repeats in the raw reads were removed using RepeatMasker (http://www.repeatmasker.org). Vector and host sequences were filtered out by BLASTN with an E-value cutoff of 1e-3. The filtered reads were assembled using the Newbler assembly software (version 2.6; 454 Life Sciences, Roche). Non-overlapping singleton fragments were clustered using the CD-HIT software [58] to minimize sequence redundancy. The overall process of metagenomic data preparation and analysis is summarized in Additional file 1: Figure S1. The complete sequence data of the bagasse fosmid library have been deposited in the NCBI Sequence Read Archive (SRA) under accession number SRX493840.
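The vector and host filtering step can be illustrated with a minimal sketch. It assumes the BLASTN results are available in the standard 12-column tabular format (`-outfmt 6`), in which the E-value is the 11th column, and that a read is discarded when it has at least one hit at or below the cutoff. The function names, the inclusive comparison, and the in-memory read dictionary are assumptions made for illustration; the scripts actually used in the study are not described in the text.

```python
from typing import Dict, Iterable, Set

EVALUE_CUTOFF = 1e-3  # cutoff stated in the text


def contaminant_ids(blast_lines: Iterable[str],
                    cutoff: float = EVALUE_CUTOFF) -> Set[str]:
    """Return IDs of reads with at least one hit at or below the cutoff.

    Expects BLAST tabular output (-outfmt 6): query ID in column 1,
    E-value in column 11. The inclusive comparison is an assumption.
    """
    hits: Set[str] = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if float(fields[10]) <= cutoff:
            hits.add(fields[0])
    return hits


def drop_contaminants(reads: Dict[str, str], hits: Set[str]) -> Dict[str, str]:
    """Keep only reads whose ID is not flagged as vector or host."""
    return {rid: seq for rid, seq in reads.items() if rid not in hits}


if __name__ == "__main__":
    # Hypothetical example data, not taken from the study.
    blast = [
        "read1\tvector\t99.0\t120\t1\t0\t1\t120\t10\t129\t2e-50\t220",
        "read2\tvector\t85.0\t20\t3\t0\t1\t20\t5\t24\t0.5\t18.0",
    ]
    reads = {"read1": "ACGT", "read2": "GGCC", "read3": "TTAA"}
    kept = drop_contaminants(reads, contaminant_ids(blast))
    print(sorted(kept))
```

In this sketch, reads without any BLASTN hit are retained, as are reads whose only hits are weaker than the cutoff.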