Several sequence resources were combined to form a custom, redundant protein database. Expressed sequence tag (EST) databases from A. thaliana (release 12.1), Brassica napus (release 1), C. reinhardtii (release 5), G. max (release 10), Lotus japonicus (release 3), Lycopersicon esculentum (release 10.1), M. truncatula (release 8), Nicotiana tabacum (release 2), O. sativa (release 16), Solanum tuberosum (release 10), and Z. mays (release 16) were downloaded from the TIGR Gene Indices (now available at the Dana-Farber Cancer Institute gene index project) [49]. TIGR Transcript Assemblies (TA) from A. thaliana, Brassica napus, C. reinhardtii, P. patens, G. max, Glycine soja, Lotus corniculatus, Lupinus albus, Lycopersicon esculentum, M. sativa, M. truncatula, Nicotiana tabacum, O. sativa, Phaseolus coccineus, Phaseolus vulgaris, Pisum sativum, Solanum tuberosum, and Z. mays were added to this set (all release 1, 15 August 2005) [50]. The proteins predicted from the plant genomes of A. thaliana (NCBI GenBank release 5, 03 May 2006) [57], C. reinhardtii (JGI, release 3) [58], M. truncatula (Genome Sequencing Project release 17 July 2006) [59], O. sativa (release 4, 30 December 2005) [60], and P. trichocarpa (JGI, release 1) [61] were also included.
Sequence names were truncated to a unique identifier, and the database of origin was added to that identifier (e.g. OS-TA, OSEST, and OSGEN for O. sativa TA, EST, and genomic sequences, respectively). Nucleotide sequences were translated into protein sequences in all six reading frames (universal code), and frame information was appended to the sequence identifier (e.g. "_+2"). The translated nucleotide sequences and the modified protein sequences derived from genomic data were combined into a single file and formatted using formatdb (options: -p T and -o T) [43]. The resulting database contained 3,631,558 sequences. To determine whether CLE sequences were specific to plants, a separate search was run against the non-redundant protein database (NCBI nr, version 15 June 2006).
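The six-frame translation and identifier tagging described above can be illustrated with a minimal Python sketch. It is not the script used in this study, which the text does not name. The function names, the example identifier, and the choices to write stop codons as "*" and ambiguous codons as "X" are all assumptions made for illustration; only the universal (standard) genetic code and the "_+2"-style frame suffix come from the description above.

```python
from itertools import product

# Standard ("universal") genetic code, listed in the conventional TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(seq):
    """Translate one frame; incomplete trailing codons are dropped.

    Assumption: codons with ambiguous bases become 'X', stops become '*'.
    """
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame(identifier, seq):
    """Return (tagged_identifier, protein) pairs for all six reading frames.

    The frame is appended to the identifier as '_+1'..'_+3' (forward strand)
    and '_-1'..'_-3' (reverse complement), mirroring the '_+2' example.
    """
    forward = seq.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]
    records = []
    for strand, template in (("+", forward), ("-", reverse)):
        for offset in range(3):
            tag = f"{identifier}_{strand}{offset + 1}"
            records.append((tag, translate(template[offset:])))
    return records


if __name__ == "__main__":
    # "OSEST_demo1" is a hypothetical identifier carrying an origin prefix.
    for tag, protein in six_frame("OSEST_demo1", "ATGGCCTAA"):
        print(f">{tag}\n{protein}")
```

Writing the records in FASTA form, as the final loop does, gives a file that could then be indexed with formatdb using the -p T and -o T options mentioned above.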