Ensembl Database Pipeline Architecture

Ensembl databases are built using MySQL, and data input and analysis pipelines are written in Perl, normally utilising the eHive (49) workflow management system. While our database schema is subject to change, our Perl API is stable, with changes deployed and announced in a controlled way. Specifically, we aim to support deprecated functionality for at least a year, giving those who use our API in their pipelines ample time to make the necessary updates.
Falling sequencing costs have facilitated a rapid increase in the quantity of variant data available for a number of species, motivating us to regularly revise and optimise our analysis and storage methods. Key compute-intensive API functions, such as checking whether a variant overlaps another genomic feature, have been rewritten in C and can optionally be used through the Perl-XS interface. This brings considerable performance improvements when analysing large numbers of variants. We have also modified our API to use tabix (50), an efficient file access tool, to extract genotype data from Variant Call Format (51) files, removing the need to load large datasets into MySQL. Variant locations are stored in databases, enabling lookup by name, such as a dbSNP refSNP identifier or ClinVar accession, followed by rapid extraction of genotype and allele frequency data from files.
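The lookup pattern described above can be sketched in Python as follows. This is a minimal illustration, not Ensembl's Perl or C implementation: the `Location` type, the `LOCATION_INDEX` dictionary, the identifier `rs_example` and all coordinates are hypothetical, and coordinates are assumed to be 1-based and inclusive. The region string uses the `chrom:start-end` form that tabix accepts.

```python
from typing import NamedTuple, Optional


class Location(NamedTuple):
    """A genomic interval; 1-based, inclusive coordinates (an assumption)."""
    chrom: str
    start: int
    end: int


def overlaps(a: Location, b: Location) -> bool:
    """Return True if both intervals are on the same sequence and share a base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


# Hypothetical stand-in for the database table mapping variant names to locations.
LOCATION_INDEX = {
    "rs_example": Location("1", 1000, 1000),
}


def region_for(name: str) -> Optional[str]:
    """Look up a variant by name and build a tabix-style region string."""
    loc = LOCATION_INDEX.get(name)
    if loc is None:
        return None
    return f"{loc.chrom}:{loc.start}-{loc.end}"


if __name__ == "__main__":
    variant = LOCATION_INDEX["rs_example"]
    feature = Location("1", 900, 1100)  # hypothetical genomic feature
    print(overlaps(variant, feature))
    print(region_for("rs_example"))
```

The region string could then be handed to a reader of a tabix-indexed VCF file so that only the genotype records for that region are fetched, which is the step that avoids loading the genotype data into a relational database.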
To ensure our tools and data are compatible with other systems, we champion standards for data formatting and have adopted and contributed to the development of many standards. We drove the collaboration to develop the Sequence Ontology (SO) and use SO terms to describe both the type of change a variant represents and its consequence on overlapping genomic features (24). Consequences are annotated on the immutable Locus Reference Genomic (LRG) (52) transcripts as well as the current Ensembl gene set. All variants are annotated using the HGVS (53) nomenclature, which has become the preferred way to describe variants in the clinical community. HGVS descriptions using Ensembl, RefSeq and LRG transcripts are provided where possible.
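As a rough illustration of the two kinds of description mentioned above, the Python sketch below assigns a variant class and formats an HGVS genomic description for a single-base substitution. It is a simplification under stated assumptions, not Ensembl's implementation: alleles are assumed to use "-" for an absent allele, only five class names are covered, both function names are hypothetical, and the accession `REFSEQ_ACC.1` is a placeholder rather than a real identifier.

```python
def variant_class(ref: str, alt: str) -> str:
    """Classify a change by allele lengths (simplified; '-' means no sequence)."""
    if ref == alt:
        raise ValueError("reference and alternate alleles are identical")
    if ref == "-":
        return "insertion"
    if alt == "-":
        return "deletion"
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) == len(alt):
        return "substitution"
    return "indel"


def hgvs_genomic_substitution(accession: str, position: int, ref: str, alt: str) -> str:
    """Format a genomic (g.) HGVS description for a single-base substitution."""
    if variant_class(ref, alt) != "SNV":
        raise ValueError("only single-base substitutions are handled in this sketch")
    return f"{accession}:g.{position}{ref}>{alt}"


if __name__ == "__main__":
    # Placeholder accession and position, chosen only for illustration.
    print(variant_class("A", "G"))
    print(hgvs_genomic_substitution("REFSEQ_ACC.1", 100, "A", "G"))
```

Descriptions on transcripts follow the same pattern with a `c.` prefix and a transcript accession, but they require mapping the genomic position onto transcript coordinates, which this sketch does not attempt.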

Hunt S.E., McLaren W., Gil L., Thormann A., Schuilenburg H., Sheppard D., Parton A., Armean I.M., Trevanion S.J., Flicek P., & Cunningham F. (2018). Ensembl variation resources. Database: The Journal of Biological Databases and Curation, 2018, bay119.

Publication 2018

Gene, Genomic, Genotype, HGVS

Corresponding Organization : European Bioinformatics Institute

Protocol cited in 82 other protocols

Variable analysis

independent variables

Analysis and storage methods for variant data
API functions (e.g., checking variant overlap with genomic features) rewritten in C

dependent variables

Quantity of variant data available for a number of species
Performance improvements when analyzing large numbers of variants

control variables

Database schema
Perl API (stable with changes deployed in a controlled way)

positive controls

Not explicitly mentioned

negative controls

Not explicitly mentioned
