Automated Genome Annotation for Lotus japonicus

The tentative genomic sequence was subjected to gene prediction and modeling with the Kazusa Annotation PipelinE for Lotus japonicus (KAPSEL).^{5} KAPSEL combines ab initio gene-finding software with similarity searches to generate the elements from which gene models are built. The ab initio gene finders in the pipeline include GeneMark.hmm,^{24} Genscan^{25} and Grail,^{26} each run with the A. thaliana-trained matrix. Splice-site candidates were deduced with NetGene2^{27} and SplicePredictor.^{28} Potential protein-coding exons were detected by similarity searches with BLASTX against the UniProtKB database.^{29} The resulting exon candidates were extracted from the original sequence library and then mapped more precisely onto the TGS with the dps and nap programs of the analysis and annotation tool (AAT) package.^{30} Transcript evidence was obtained by aligning the TGS against the Gene Indices^{31} for legume species including L. japonicus, M. truncatula and Glycine max; the matched transcript sequences were then mapped onto the TGS with the dds and gap2 programs in AAT to confirm working models of protein-encoding genes. The automated annotation process assigned a total of 19 848 partial and 10 951 complete protein-encoding gene models in the TGS, excluding those related to transposable elements (TEs). The 76.4-Mb sequences in the HGS were edited and annotated manually to ensure high-quality gene prediction.
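The source reports partial and complete models but does not state the criteria used to tell them apart. The following is a minimal sketch, assuming a plus-strand model given as 1-based inclusive exon coordinates, and assuming that a "complete" model is one whose spliced coding sequence begins with ATG, ends with a stop codon, is a whole number of codons long and has no premature in-frame stop. The function names splice_exons and classify_cds and the toy sequence are hypothetical and are not part of KAPSEL:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def splice_exons(genomic: str, exons: list[tuple[int, int]]) -> str:
    """Concatenate exon sequences (1-based, inclusive, plus strand only)."""
    return "".join(genomic[start - 1:end] for start, end in sorted(exons))


def classify_cds(cds: str) -> str:
    """Return 'complete' or 'partial' under the assumed criteria.

    Assumption: a model with a premature in-frame stop is also reported
    as 'partial' here, which is a simplification.
    """
    seq = cds.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return "partial"  # too short, or not a whole number of codons
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    has_start = codons[0] == "ATG"
    has_stop = codons[-1] in STOP_CODONS
    premature_stop = any(c in STOP_CODONS for c in codons[:-1])
    if has_start and has_stop and not premature_stop:
        return "complete"
    return "partial"


# Hypothetical toy locus: exon 1 = ATGAAA, intron = GTAAAG, exon 2 = CCCTAA.
genomic = "CCATGAAAGTAAAGCCCTAATT"
print(classify_cds(splice_exons(genomic, [(3, 8), (15, 20)])))  # complete
print(classify_cds(splice_exons(genomic, [(3, 8)])))            # partial (no stop codon)
```

A production pipeline would also have to handle minus-strand models (reverse-complementing the spliced sequence) and models truncated at clone ends, which this sketch ignores.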
The genes thus assigned were given IDs consisting of the clone name (LjT**** for TACs and LjB**** for BACs) or contig name (CM****) followed by a sequential number, counted from one end of the sequence to the other. Genes annotated manually on the HGS received the suffix “.nc”, and all others the suffix “.nd”. Genes assigned on the SGA sequences were given IDs consisting of the assembly consensus name (LjSGA_****) followed by a sequential number, counted from one end of the insert to the other.
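The naming rule can be sketched as follows. This is a minimal illustration, assuming that genes are numbered in order of their start coordinate and that the parts of an ID are joined with a dot; the source gives neither the separator nor a worked example, so the dot separator and the sequence names CM_example and LjSGA_example are hypothetical:

```python
from typing import Dict, List, Optional


def assign_gene_ids(seq_name: str, gene_starts: List[int],
                    manually_annotated: Optional[bool]) -> Dict[int, str]:
    """Map each gene start coordinate to an ID, numbering genes along the sequence.

    manually_annotated=True  -> '.nc' suffix (manually annotated on the HGS)
    manually_annotated=False -> '.nd' suffix (all other genes)
    manually_annotated=None  -> no suffix (used here for SGA consensus sequences)
    The '.' separator between name, number and suffix is an assumption.
    """
    if manually_annotated is None:
        suffix = ""
    else:
        suffix = ".nc" if manually_annotated else ".nd"
    ordered = sorted(gene_starts)  # number genes from one end to the other
    return {start: f"{seq_name}.{rank}{suffix}"
            for rank, start in enumerate(ordered, start=1)}


print(assign_gene_ids("CM_example", [5000, 120, 2300], manually_annotated=True))
# {120: 'CM_example.1.nc', 2300: 'CM_example.2.nc', 5000: 'CM_example.3.nc'}
print(assign_gene_ids("LjSGA_example", [10, 4], manually_annotated=None))
# {4: 'LjSGA_example.1', 10: 'LjSGA_example.2'}
```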
Global alignments between the genome sequences and ESTs were performed with the NEEDLE program^{32,33} provided by EMBOSS (http://emboss.sourceforge.net/). To identify possible TATA box-like motifs recognized by RNA polymerase II, the plant cis-acting regulatory DNA elements (PLACE) database^{34} (http://www.dna.affrc.go.jp/PLACE/) was searched.
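EMBOSS needle computes an optimal global alignment with the Needleman-Wunsch dynamic-programming algorithm, and the recurrence behind it is sketched below. This is not the EMBOSS implementation: needle scores with a substitution matrix and affine gap-open/gap-extend penalties, whereas this sketch uses illustrative match/mismatch scores and a linear gap penalty (the values match=1, mismatch=-1, gap=-2 are assumptions chosen for readability):

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Global alignment of a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # First row/column: aligning a prefix against nothing costs one gap per base.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # Fill the table: each cell takes the best of diagonal, up and left moves.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Traceback from the bottom-right corner; ties prefer diagonal, then a gap in b.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

Because the alignment is global, every base of both sequences appears in the result; spliced ESTs aligned to genomic DNA therefore show introns as long gap runs, which an affine gap model like needle's penalizes less than this linear one.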

Sato S., Nakamura Y., Kaneko T., Asamizu E., Kato T., Nakao M., Sasamoto S., Watanabe A., Ono A., Kawashima K., Fujishiro T., Katoh M., Kohara M., Kishida Y., Minami C., Nakayama S., Nakazaki N., Shimizu Y., Shinpo S., Takahashi C., Wada T., Yamada M., Ohmido N., Hayashi M., Fukui K., Baba T., Nakamichi T., Mori H., & Tabata S. (2008). Genome Structure of the Legume, Lotus japonicus. DNA Research: An International Journal for Rapid Publication of Reports on Genes and Genomes, 15(4), 227-239.

Publication 2008

Keywords: A-DNA, A. thaliana, Clone, ESTs, Exon, Genes, Genome, Glycine max, Legume, Library, Lotus japonicus, Needle, Plant, Protein, Protein genes, RNA polymerase II, Sequence alignment, TATA box

Other organizations: Kazusa DNA Research Institute, Kobe University, Osaka University, Keio University, Nara Institute of Science and Technology

Protocol cited in 23 other protocols

Variable analysis

independent variables

Gene prediction and modeling of the tentative genomic sequence with the Kazusa Annotation PipelinE for Lotus japonicus (KAPSEL)

dependent variables

Gene models produced by the KAPSEL pipeline, including partial and complete protein-encoding gene models

control variables

Similarity searches using BLASTX against the UniProtKB database to detect potential protein-coding exons
Splice-site candidates deduced by NetGene2 and SplicePredictor
Similarity searches of transcript sequences by aligning the TGS against the Gene Indices for legume species including L. japonicus, M. truncatula and Glycine max
Manual editing and annotation of the 76.4-Mb sequences in the HGS to ensure high-quality gene prediction
Global alignment of the genome sequences and ESTs using the NEEDLE program
Search of the plant cis-acting regulatory DNA elements (PLACE) database for possible TATA box-like motifs recognized by RNA polymerase II
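A scan for TATA box-like motifs can be sketched with a regular expression built from an IUPAC consensus. The consensus TATAWAW (W = A or T) used here is an assumption for illustration; PLACE contains several TATA box-related entries, and any of them could be passed as the motif instead. Positions are reported 1-based, overlapping hits are kept, and only the given strand is scanned:

```python
import re

# IUPAC nucleotide codes mapped to regex character classes (subset).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def motif_to_regex(motif: str) -> str:
    """Translate an IUPAC motif such as 'TATAWAW' into a regex pattern."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(seq: str, motif: str = "TATAWAW") -> list[tuple[int, str]]:
    """Return (1-based start, matched text) for every, possibly overlapping, hit."""
    # A zero-width lookahead with a capture group lets consecutive hits overlap.
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq.upper())]


print(find_motif("ggTATAAATgg"))  # [(3, 'TATAAAT')]
print(find_motif("TATATATAT"))    # [(1, 'TATATAT'), (3, 'TATATAT')]
```

In practice such a scan would be restricted to a window upstream of the predicted transcription start site; the window used in the original analysis is not given here.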
