A database was established representing a COS of 1,704 tomato unigenes (1,612 Sl, 29 S. habrochaites and 63 S. pennellii) from 113,932 ESTs [15]. For each unigene, the single longest EST was chosen for primer design. Using the tools developed for the Compositae Genome database, the positions of introns were first estimated using the procedures above. A set of 1,268 primers was designed to amplify across estimated intron sites, with primers placed 50–100 bp from the intron. Amplification was tested on a single line, M82.
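The primer placement rule can be illustrated with a minimal sketch. It assumes the estimated intron site is given as a 0-based offset into the EST and that "50–100 bp from the intron" applies to both flanks; the function name `primer_windows` and its parameters are hypothetical and are not part of the original pipeline.

```python
from typing import Optional, Tuple

Window = Tuple[int, int]  # inclusive (start, end) offsets in the EST


def primer_windows(intron_pos: int, est_len: int,
                   min_dist: int = 50, max_dist: int = 100
                   ) -> Optional[Tuple[Window, Window]]:
    """Return candidate (forward, reverse) primer windows around an intron site.

    Each window covers offsets lying min_dist..max_dist bases away from
    the estimated intron position, clipped to the ends of the EST.
    Returns None when either flank is too short to hold a primer window.
    """
    fwd = (max(0, intron_pos - max_dist), intron_pos - min_dist)
    rev = (intron_pos + min_dist, min(est_len - 1, intron_pos + max_dist))
    if fwd[1] < 0 or rev[0] > est_len - 1:
        return None  # intron estimated too close to an EST end
    return fwd, rev


if __name__ == "__main__":
    # Hypothetical EST of 600 bases with an intron estimated at offset 300.
    print(primer_windows(300, 600))
```

An intron estimated near either end of the EST leaves no room for a flanking primer, so the sketch returns `None` for such sites instead of a truncated window.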
Primers that successfully amplified a product were tested for polymorphism by sequencing in a series of three pools representing different degrees of diversity. The design uses complementary pools, each representing a class (fresh market, processing, other) together with one diverse line from an alternate class, to maximize the chance of detecting a polymorphism within or among pools. Using a series of empirical tests on lines with known SNPs mixed in ratios of 1:7, 1:5, 1:3 and 1:1, we determined that an unknown polymorphism can be reliably detected by sequencing at a 1:3 dilution. Pool 1 consisted of O 9242, FL7600, Ha7998 and PI114490; Pool 2 consisted of M82, O 8245, O 88119 and NC84173; and Pool 3 consisted of Sun1642, Heinz1706, O 9242 and FL7600 (Table 2). DNA was extracted from each line and combined in equimolar concentrations.
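The link between the 1:3 detection limit and the four-line pools can be made explicit with a short sketch. It assumes every line is homozygous at the site and that equimolar pooling gives each line an equal share of the template; the function names are hypothetical, and treating 1:3 as a hard threshold is a simplification of the empirical result.

```python
from fractions import Fraction


def ratio_to_fraction(minor: int, major: int) -> Fraction:
    """Convert a minor:major mixing ratio (e.g. 1:3) to a minor-allele fraction."""
    return Fraction(minor, minor + major)


def minor_allele_fraction(n_variant_lines: int, n_lines: int) -> Fraction:
    """Minor-allele fraction in an equimolar pool of homozygous lines."""
    if n_lines <= 0 or not 0 <= n_variant_lines <= n_lines:
        raise ValueError("need 0 <= n_variant_lines <= n_lines and n_lines > 0")
    minor = min(n_variant_lines, n_lines - n_variant_lines)
    return Fraction(minor, n_lines)


def detectable(n_variant_lines: int, n_lines: int,
               limit: Fraction = ratio_to_fraction(1, 3)) -> bool:
    """True if the pooled minor allele reaches the assumed detection limit."""
    frac = minor_allele_fraction(n_variant_lines, n_lines)
    return frac > 0 and frac >= limit


if __name__ == "__main__":
    # One variant line among four gives a 1:3 mixture.
    print(minor_allele_fraction(1, 4), detectable(1, 4))
    # One variant line among eight gives 1:7, below the assumed limit.
    print(minor_allele_fraction(1, 8), detectable(1, 8))
```

Under these assumptions, a pool of four lines is the largest in which an allele carried by a single line still reaches the 1:3 ratio.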
For all sequencing reactions, forward and reverse primers were tailed with M13 sequences, and products were sequenced in both forward and reverse directions using standard Sanger sequencing protocols on an ABI 3730 (Applied Biosystems, Foster City, CA). Trace files were trimmed with the Phred option "-trim_cutoff 0.02", which corresponds to a Phred score of 17 [29]. Assembly was performed with Phrap/Consed using the options "-retainduplicates" and "-forcelevel 5". These options were optimized to give the best trim and assembly parameters for calling SNPs. Stringent trim parameters are favored in this case to minimize the high number of false SNPs associated with poor sequence at the read ends. Amplicon sizes were estimated and are included in Additional file 1, Tables S2 and S3. To obtain a more accurate estimate than gel electrophoresis provides, the size of the sequenced contig(s) was used as a minimum. When more than one contig per locus was obtained as a result of unpredictably large introns, the forward and reverse contig sizes were added.
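Two calculations in this paragraph can be sketched briefly. The first is the standard Phred relation Q = -10 log10(P), applied on the assumption that the trim cutoff of 0.02 is an error probability. The second is the minimum amplicon size, taken as the sum of the contig lengths for a locus. The function names and the example contig lengths are hypothetical.

```python
import math
from typing import Sequence


def error_prob_to_phred(p: float) -> float:
    """Convert a base-call error probability to a Phred quality score."""
    if not 0.0 < p <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(p)


def min_amplicon_size(contig_lengths: Sequence[int]) -> int:
    """Minimum amplicon size for a locus from its sequenced contig(s).

    One contig: its length is the minimum size. Separate forward and
    reverse contigs (unsequenced gap in a large intron): lengths are added.
    """
    if not contig_lengths:
        raise ValueError("at least one contig is required")
    return sum(contig_lengths)


if __name__ == "__main__":
    print(round(error_prob_to_phred(0.02)))  # cutoff 0.02 as a Phred score
    print(min_amplicon_size([640]))          # hypothetical single contig
    print(min_amplicon_size([410, 385]))     # hypothetical fwd + rev contigs
```

Because any unsequenced gap between the forward and reverse contigs is not counted, the summed value is a lower bound on the true amplicon size.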
SNPs were first identified semi-manually using Polyphred as heterozygotes within pools or as homozygous differences among pools. The line M82 was used as the reference to screen amplicons for single copy number. Amplicons with putative SNPs were then amplified in the 12 individual lines (Table 2) and sequenced as described above. Only SNPs showing both homozygous alleles were called. Data were extracted from Polyphred using custom scripts ([30]; see Additional file 1). Data for indels were extracted from Polyphred in the same way. SSRs (di- to tetranucleotide repeats) were extracted from all sequenced loci for M82, our reference line, and for the other genotypes, and were reported for all sequenced individuals. The sequence database was analyzed for all known tomato repeats [1]. All loci were cross-referenced to the SGN COSII for tomato, pepper, potato and coffee and to the associated maps [14].
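The calling rule "only SNPs showing both homozygous alleles were called" can be expressed as a small filter. This is a minimal sketch, not the custom scripts of [30]: it assumes genotypes are available per line as two-letter strings such as "AA" or "AG", and the function names and example genotypes are hypothetical.

```python
from typing import Dict, Set


def homozygous_alleles(genotypes: Dict[str, str]) -> Set[str]:
    """Collect alleles observed in a homozygous state across lines.

    genotypes maps a line name to a two-letter call ("AA", "AG", ...).
    Heterozygous, missing, or ambiguous calls contribute nothing.
    """
    alleles: Set[str] = set()
    for call in genotypes.values():
        call = call.upper()
        if len(call) == 2 and call[0] == call[1] and call[0] in "ACGT":
            alleles.add(call[0])
    return alleles


def call_snp(genotypes: Dict[str, str]) -> bool:
    """Call a SNP only if at least two alleles each appear as homozygotes."""
    return len(homozygous_alleles(genotypes)) >= 2


if __name__ == "__main__":
    # Hypothetical genotypes at one site for three of the sequenced lines.
    site = {"M82": "AA", "Heinz1706": "GG", "FL7600": "AA"}
    print(call_snp(site))
```

Requiring each allele in a homozygous line excludes sites where the second allele appears only in apparent heterozygotes, which in inbred lines may instead reflect paralogous amplification or sequencing error.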