The P. dulcis genome assembly was annotated by combining transcript alignments, protein alignments and ab initio gene predictions. A flowchart of the annotation process is shown in Figure S13, and the scripts are available at https://github.com/jesgomez/annotation_pipeline. First, almond RNA‐seq reads (NCBI accession SRR1251980) were downloaded and aligned to the genome with STAR (v.2.5.3a) (Dobin et al., 2013). Transcript models were then generated with StringTie (v.1.0.4) (Pertea et al., 2015). PASA (v.2.3.3) (Haas et al., 2008) assembled these models into a non‐redundant set, together with the P. persica transcriptome (annotation Pp2.0a) and 4509 almond expressed sequence tags downloaded from NCBI in July 2015. TransDecoder, which is part of the PASA package, was run on the PASA assemblies to detect coding regions in the transcripts. Second, the complete Rosaceae proteome was downloaded from UniProt in July 2015 and aligned to the genome with Exonerate (v.2.4.7) (Slater and Birney, 2005). Third, ab initio gene predictions were made on the repeat‐masked pdulcis26 assembly with three programs: GeneID v.1.4 (Alioto et al., 2018), Augustus v.3.2.3 (Stanke et al., 2015) and GeneMark‐ES v.2.3e (Lomsadze et al., 2014). Each prediction was run both with and without evidence from the RNA‐seq data. Finally, all the evidence was combined into consensus coding sequence models with EvidenceModeler‐1.1.1 (EVM) (Haas et al., 2008). Untranslated regions and alternative splicing forms were then annotated through two rounds of PASA annotation updates.
Non‐coding RNAs were annotated as follows. First, cmsearch v.1.1 (Cui et al., 2016) from the INFERNAL package (Nawrocki and Eddy, 2013) was run against the Rfam database of RNA families (v.12.0) (Nawrocki et al., 2015). Second, tRNAscan‐SE v.1.23 (Lowe, 1997) was run to detect the transfer RNA genes in the genome assembly.
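Detecting coding regions in assembled transcripts starts from a search for long open reading frames (ORFs). The Python sketch below shows that search. It is not TransDecoder itself. It scans only the forward strand and requires an ATG start and a standard stop codon. It ignores partial ORFs and skips the later scoring of candidates that TransDecoder performs. The `longest_orf` function and its `min_aa` threshold are hypothetical.

```python
from typing import Optional, Tuple

# Standard-code stop codons; ATG is the only start codon considered here.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str, min_aa: int = 100) -> Optional[Tuple[int, int]]:
    """Return 0-based half-open (start, end) of the longest complete ORF
    (ATG ... stop, stop codon included) on the forward strand of `seq`,
    or None if no ORF encodes at least `min_aa` amino acids.

    `min_aa` is an illustrative threshold, not a value taken from the study.
    """
    seq = seq.upper()
    best: Optional[Tuple[int, int]] = None
    for frame in range(3):
        start = None  # position of the first ATG in the current open frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                end = i + 3
                n_aa = (end - start) // 3 - 1  # exclude the stop codon
                if n_aa >= min_aa and (best is None or end - start > best[1] - best[0]):
                    best = (start, end)
                start = None  # look for the next ATG downstream
    return best


print(longest_orf("CCATGAAAAAAAAATAAGG", min_aa=3))  # (2, 17)
```

Keeping only the first ATG of each open frame is enough to find the longest ORF. Any later ATG before the same stop codon would give a shorter ORF.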
To annotate long non‐coding RNAs (lncRNAs), we first selected the PASA assemblies that had not been included in the annotation of protein‐coding genes. Assemblies longer than 200 bp that were not covered by a small ncRNA over at least 80% of their length were added to the ncRNA annotation as lncRNAs. The resulting transcripts were clustered into genes. Transcripts that shared splice sites or showed significant sequence overlap were assigned to the same gene.
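The Python sketch below illustrates the two lncRNA rules: the length and small-ncRNA coverage filter, and the grouping of transcripts into genes. It is not the authors' code, which is in the repository linked above. The sketch makes several assumptions:

- Transcripts use a hypothetical `Transcript` class with half-open exon intervals.
- Small-ncRNA coverage is measured on genomic coordinates regardless of strand.
- "Significant sequence overlap" means exonic overlap of at least a hypothetical `min_frac` (50%) of the shorter transcript.
- Clustering is transitive (union-find).

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]  # half-open [start, end) genome coordinates


@dataclass
class Transcript:  # hypothetical data model
    tid: str
    chrom: str
    strand: str
    exons: List[Interval]  # sorted, non-overlapping

    def length(self) -> int:
        return sum(end - start for start, end in self.exons)

    def splice_sites(self) -> Set[int]:
        # Intron boundaries: the end of each exon and the start of the next one.
        return {p for (_, e1), (s2, _) in zip(self.exons, self.exons[1:]) for p in (e1, s2)}


def merge(intervals: List[Interval]) -> List[Interval]:
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap_bp(a: List[Interval], b: List[Interval]) -> int:
    """Bases shared by two interval lists (each merged first to avoid double counting)."""
    return sum(max(0, min(e1, e2) - max(s1, s2))
               for s1, e1 in merge(a) for s2, e2 in merge(b))


def is_lncrna(t: Transcript, small_ncrnas: Dict[str, List[Interval]],
              min_len: int = 200, max_cov: float = 0.8) -> bool:
    """Keep transcripts longer than min_len bp that are covered by small
    ncRNAs over less than max_cov of their length."""
    if t.length() <= min_len:
        return False
    covered = overlap_bp(t.exons, small_ncrnas.get(t.chrom, []))
    return covered / t.length() < max_cov


def same_gene(a: Transcript, b: Transcript, min_frac: float = 0.5) -> bool:
    if (a.chrom, a.strand) != (b.chrom, b.strand):
        return False
    if a.splice_sites() & b.splice_sites():
        return True  # shared splice site
    shorter = min(a.length(), b.length())
    return overlap_bp(a.exons, b.exons) >= min_frac * shorter


def cluster_into_genes(ts: List[Transcript]) -> List[List[str]]:
    parent = list(range(len(ts)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(ts)):
        for j in range(i + 1, len(ts)):
            if same_gene(ts[i], ts[j]):
                parent[find(i)] = find(j)

    genes: Dict[int, List[str]] = {}
    for i, t in enumerate(ts):
        genes.setdefault(find(i), []).append(t.tid)
    return list(genes.values())
```

Because the links are transitive, a single transcript that bridges two loci merges them into one gene. Stricter criteria would need extra checks beyond this sketch.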
Alioto T., Alexiou K.G., Bardil A., Barteri F., Castanera R., Cruz F., Dhingra A., Duval H., Fernández i Martí Á., Frias L., Galán B., García J.L., Howad W., Gómez‐Garrido J., Gut M., Julca I., Morata J., Puigdomènech P., Ribeca P., Rubio Cabetas M.J., Vlasova A., Wirthensohn M., Garcia‐Mas J., Gabaldón T., Casacuberta J.M., & Arús P. (2019). Transposons played a major role in the diversification between the closely related almond and peach genomes: results from the almond genome sequence. The Plant Journal, 101(2), 455-472.
Corresponding organization: Center for Research in Agricultural Genomics
Other organizations: Pompeu Fabra University, Centre for Genomic Regulation, Washington State University, Genetics and Improvement of Fruit and Vegetables, Innovative Genomics Institute, University of California, Berkeley, Consejo Superior de Investigaciones Científicas, The Pirbright Institute, Centro de Investigación y Tecnología Agroalimentaria de Aragón, Universidad de Zaragoza, Gobierno de Aragón, University of Adelaide, Australian Wine Research Institute, Institució Catalana de Recerca i Estudis Avançats
Independent variables:
4509 almond expressed sequence tags downloaded from NCBI
Rosaceae proteome downloaded from UniProt
Ab initio gene prediction programs (GeneID, Augustus, GeneMark-ES)
Dependent variables:
Transcript models generated using StringTie
Coding regions detected in the PASA assemblies using TransDecoder
Protein alignments to the genome using Exonerate
Consensus coding sequence models using EvidenceModeler-1.1.1 (EVM)
Annotation of untranslated regions and alternative splicing forms
Annotation of non-coding RNAs (Rfam database, tRNAscan-SE, long non-coding RNAs)
Control variables:
Repeat-masked pdulcis26 genome assembly