Comprehensive Transcript Assembly and Gene Modeling for P. vulgaris
Other organizations: HudsonAlpha Institute for Biotechnology, Joint Genome Institute, Agricultural Research Service, United States Department of Agriculture, Applied Genetic Technologies (United States), Génétique Quantitative et Évolution Le Moulon, Institut de Biologie Moléculaire des Plantes, Tennessee State University, Michigan State University, University of Arizona
Protocol cited in 61 other protocols
Variable analysis
- Construction of 43,627 transcript assemblies from about 727 million reads of paired-end Illumina RNA-seq data using PERTRAN
- Construction of 47,464 transcript assemblies using PASA from 79,630 P. vulgaris Sanger ESTs and the RNA-seq transcript assemblies
- Identification of loci by transcript assembly alignments and/or EXONERATE alignments of peptides from Arabidopsis, poplar, Medicago truncatula, grape (Vitis vinifera) and rice (Oryza sativa) to the repeat-soft-masked genome
- Gene model prediction by the homology-based predictors FGENESH+, FGENESH_EST, and GenomeScan
- Selection of the highest-scoring prediction for each locus using multiple positive factors, including EST and peptide support, and one negative factor: overlap with repeats
- Improvement of selected gene predictions by PASA, including by adding UTRs, correcting splicing and adding alternative transcripts
- Peptide homology analysis of PASA-improved gene model peptides with the above-mentioned proteomes to obtain Cscore values and peptide coverage
- Selection of transcripts based on Cscore value and peptide coverage, or on EST coverage, provided that less than 20% of their coding sequence overlapped repeats
- Removal of gene models whose encoded peptide contained more than 30% Pfam transposon element domains
- Repeat-soft-masked genome
- Transposon database developed as part of this project
- Threshold of up to 2,000-bp extension on both ends for loci, unless they extended into another locus on the same strand
- Threshold of Cscore value greater than or equal to 0.5 and peptide coverage greater than or equal to 0.5, or EST coverage, provided that the proportion of coding sequence overlapping repeats was less than 20%
- Threshold of Cscore value at least 0.9 and homology coverage at least 70% for gene models where greater than 20% of the coding sequence overlapped with repeats
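
The thresholds listed above can be read as a small decision procedure. The Python sketch below is one possible reading, not the pipeline's actual code. The `Transcript` fields, the function names, and the 1-based inclusive coordinates are hypothetical. Two points are assumptions, because the list does not settle them: a repeat overlap of exactly 20% is handled by the stricter rule, and a locus extension is clipped at a neighbouring same-strand locus rather than skipped entirely.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    """Hypothetical per-transcript summary; field names are illustrative only."""
    cscore: float               # Cscore value, 0..1
    peptide_coverage: float     # homology (peptide) coverage, 0..1
    has_est_support: bool       # True if the transcript has EST coverage
    cds_repeat_fraction: float  # fraction of coding sequence overlapping repeats
    te_domain_fraction: float   # fraction of peptide in Pfam transposon element domains


def keep_transcript(t: Transcript) -> bool:
    """Apply the selection thresholds from the list above."""
    # Models whose peptide is more than 30% transposon element domains are removed.
    if t.te_domain_fraction > 0.30:
        return False
    # Less than 20% repeat overlap: homology support or EST support is enough.
    if t.cds_repeat_fraction < 0.20:
        homology_ok = t.cscore >= 0.5 and t.peptide_coverage >= 0.5
        return homology_ok or t.has_est_support
    # Otherwise (assumption: this branch also covers exactly 20%) the
    # stricter homology requirement applies.
    return t.cscore >= 0.9 and t.peptide_coverage >= 0.70


def extend_locus(start, end, seq_len, prev_end=None, next_start=None, max_ext=2000):
    """Extend a locus by up to max_ext bp on each side (1-based, inclusive).

    prev_end / next_start are the boundaries of neighbouring loci on the
    same strand, if any. Clipping at those boundaries is an assumption.
    """
    new_start = max(1, start - max_ext)
    new_end = min(seq_len, end + max_ext)
    if prev_end is not None:
        # Never shrink the locus, even if the neighbour overlaps it.
        new_start = min(start, max(new_start, prev_end + 1))
    if next_start is not None:
        new_end = max(end, min(new_end, next_start - 1))
    return new_start, new_end
```

For example, a transcript with weak homology but EST support passes when only 5% of its coding sequence overlaps repeats, whereas the same transcript with 50% repeat overlap would need a Cscore of at least 0.9 and homology coverage of at least 70%.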