Two types of sequencing approaches were combined to sequence the L. japonicus genome: clone-by-clone sequencing and shotgun sequencing of selected regions of the genome.
TAC/BAC clones were selected from the genomic libraries as seed points using sequence information from ESTs and cDNA markers from L. japonicus and other legumes. The nucleotide sequence of each clone was determined by the shotgun strategy with three- to fivefold redundancy. A total of 1909 TAC/BAC clones, comprising those newly sequenced in this study and those sequenced previously,5–9 were assembled into 954 scaffolds using the Paracel Genome Assembler (PGA; version 2.6.2, Paracel Co., 2002). Manual scaffolding based on TAC/BAC end pairs followed, resulting in high-quality genomic sequence (HGS) contigs.
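The end-pair scaffolding step was done manually in this study. The sketch below only illustrates the underlying linking logic: contigs are joined into one scaffold when the two end sequences of the same clone land in different contigs. It is a minimal union-find example under stated assumptions, not the PGA algorithm or the authors' procedure. The input layout (`end_hits`), the `min_links` threshold, and all clone and contig IDs are hypothetical, and contig order, orientation, and insert-size constraints are deliberately ignored.

```python
from collections import defaultdict


def find(parent, x):
    """Return the representative of x's group (union-find with path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def scaffold_by_end_pairs(contigs, end_hits, min_links=1):
    """Group contigs into scaffolds that are connected by clone end pairs.

    contigs   -- contig IDs; every contig named in end_hits must appear here.
    end_hits  -- hypothetical mapping: clone ID -> (contig hit by one end,
                 contig hit by the other end); None marks an unplaced end.
    min_links -- number of independent clones required to join two contigs.

    Only connectivity is modelled: contig order, orientation and the
    expected insert size are ignored.
    """
    contigs = list(contigs)
    link_counts = defaultdict(int)
    for a, b in end_hits.values():
        if a is None or b is None or a == b:
            continue  # one end unplaced, or both ends inside the same contig
        link_counts[tuple(sorted((a, b)))] += 1

    parent = {c: c for c in contigs}
    for (a, b), n in link_counts.items():
        if n >= min_links:
            root_a, root_b = find(parent, a), find(parent, b)
            if root_a != root_b:
                parent[root_a] = root_b  # merge the two scaffolds

    scaffolds = defaultdict(list)
    for c in contigs:
        scaffolds[find(parent, c)].append(c)
    return sorted(sorted(group) for group in scaffolds.values())


contigs = ["c1", "c2", "c3", "c4"]
end_hits = {
    "TAC001": ("c1", "c2"),
    "TAC002": ("c2", "c1"),
    "TAC003": ("c3", None),
    "BAC004": ("c3", "c4"),
}
print(scaffold_by_end_pairs(contigs, end_hits))               # [['c1', 'c2'], ['c3', 'c4']]
print(scaffold_by_end_pairs(contigs, end_hits, min_links=2))  # [['c1', 'c2'], ['c3'], ['c4']]
```

Raising `min_links` trades connectivity for confidence: a join supported by a single clone (here `BAC004`) is dropped when two independent clones are required.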
In parallel, shotgun sequencing was carried out on two sources: a selected TAC mixture (STM) enriched in gene space, and whole genomic DNA from which highly repetitive and organelle genomic sequences had been subtracted (selected genomic regions, SGRs). For the STM, TAC clones for which neither end sequence hit repetitive or organelle genomic sequences in the L. japonicus genome were selected from the libraries, pooled, and subjected to shotgun sequencing. For the SGRs, a whole-genome shotgun (WGS) library with an average insert size of 2.5 kb was generated using pBluescript SK− as the cloning vector. For subtraction, polymerase chain reaction (PCR)-amplified fragments of LjTR1 were biotinylated using Biotin-High Prime (Roche, Basel, Switzerland) and used as the driver in subtractive hybridization with the WGS library. Before hybridization, the WGS library was converted to single-stranded form by the combined action of gene II protein and exonuclease III. Hybrids were removed using Dynabeads M-280 Streptavidin (Invitrogen, Carlsbad, CA, USA), and the remaining single-stranded WGS library was converted back to double-stranded form using Klenow fragment (Takara Bio, Japan) and transformed into E. coli ElectroTen-Blue host cells (Agilent Technologies, Santa Clara, CA, USA).
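The STM clone-selection rule (keep a TAC clone only if neither end sequence hits repetitive or organelle sequence) can be expressed as a simple filter. The sketch below assumes, hypothetically, that end sequences were searched against a repeat/organelle database with BLAST and that hits are available in the default 12-column tabular format (`-outfmt 6`). The identity and alignment-length thresholds, the clone and end naming scheme, and the subject IDs are illustrative assumptions, not parameters from the study.

```python
def flagged_end_ids(blast_tab_lines, min_identity=90.0, min_length=100):
    """Collect end-sequence IDs with a qualifying hit in BLAST tabular output.

    Expects the default 12-column `-outfmt 6` layout (qseqid, sseqid, pident,
    length, ...) from searching clone end sequences against a database of
    repetitive and organelle sequences. The thresholds are hypothetical.
    """
    flagged = set()
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query_id, identity, aln_length = fields[0], float(fields[2]), int(fields[3])
        if identity >= min_identity and aln_length >= min_length:
            flagged.add(query_id)
    return flagged


def select_clones(clone_ends, flagged):
    """Keep clones for which neither end sequence was flagged.

    clone_ends -- hypothetical mapping: clone ID -> (end 1 ID, end 2 ID).
    """
    return sorted(
        clone
        for clone, (end1, end2) in clone_ends.items()
        if end1 not in flagged and end2 not in flagged
    )


blast_hits = [
    "TAC0002.r\trepeat_LjTR1\t97.1\t350\t10\t0\t1\t350\t12\t361\t1e-150\t520",
    "TAC0003.f\tchloroplast\t99.0\t500\t5\t0\t1\t500\t1\t500\t0.0\t900",
    "TAC0004.f\trepeat_LjTR1\t85.0\t300\t45\t0\t1\t300\t1\t300\t1e-50\t200",
]
clone_ends = {
    name: (f"{name}.f", f"{name}.r")
    for name in ["TAC0001", "TAC0002", "TAC0003", "TAC0004"]
}
flagged = flagged_end_ids(blast_hits)
print(sorted(flagged))                     # ['TAC0002.r', 'TAC0003.f']
print(select_clones(clone_ends, flagged))  # ['TAC0001', 'TAC0004']
```

A single flagged end is enough to exclude a clone, which is what makes the pooled STM enriched in non-repetitive, nuclear gene space. In this example `TAC0004` survives only because its weak hit falls below the assumed identity threshold.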
A total of 808 816 STM reads, generated from 4603 TAC inserts, and 847 513 SGR reads were assembled into 109 986 contigs totaling 147 805 446 bp (the selected genome assembly, SGA) by the Arachne assembler, version 2.01.11. The SGA sequences were then assembled together with the HGS, finally yielding a total of 110 940 supercontigs comprising 315 073 275 bases of tentative genomic sequence (TGS).
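The assembly figures above can be summarized with simple arithmetic. The sketch below derives the total number of reads and the mean sequence lengths from the reported counts. These means are derived values rather than figures reported in the text, and they say nothing about contiguity statistics such as N50, which would require the individual contig lengths.

```python
# Counts reported in the text for the SGA and the final TGS assembly.
STM_READS = 808_816
SGR_READS = 847_513
SGA_CONTIGS = 109_986
SGA_BASES = 147_805_446
TGS_SUPERCONTIGS = 110_940
TGS_BASES = 315_073_275


def mean_length(total_bases, n_sequences):
    """Mean sequence length in bp (total bases divided by sequence count)."""
    return total_bases / n_sequences


total_reads = STM_READS + SGR_READS
sga_mean = mean_length(SGA_BASES, SGA_CONTIGS)
tgs_mean = mean_length(TGS_BASES, TGS_SUPERCONTIGS)

print(f"reads assembled: {total_reads:,}")                    # reads assembled: 1,656,329
print(f"mean SGA contig length: {sga_mean:,.0f} bp")          # mean SGA contig length: 1,344 bp
print(f"mean TGS supercontig length: {tgs_mean:,.0f} bp")     # mean TGS supercontig length: 2,840 bp
```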