Comprehensive Curation of Gene Expression Data

Data for the Atlas are selected from the ArrayExpress Archive according to the criteria outlined earlier. Because we currently use only microarray data, our first consideration is whether the array annotation is sufficient to map the array design elements to existing gene identifiers. We use two routes for this mapping: we preferentially map array probe sequences to Ensembl genomes (15); otherwise, we attempt to map the design element annotation identifiers to gene annotation in the UniProt database (16). Where re-annotation fails, experiments performed on such arrays cannot be included in the Atlas. The array re-annotation pipeline will be released as a software package, described and published separately (Sarkans et al., in preparation).
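The two mapping routes can be illustrated with a minimal Python sketch. It is not the re-annotation pipeline itself (which is described separately); the function names, the dictionary inputs and the identifiers in the usage note are all hypothetical, and the sketch assumes that sequence alignment and UniProt lookups have already been reduced to simple lookup tables.

```python
from typing import Dict, List, Optional, Tuple


def map_design_element(
    probe_id: str,
    sequence_hits: Dict[str, str],    # hypothetical: probe -> gene found by sequence mapping
    annotation_ids: Dict[str, str],   # hypothetical: probe -> annotation identifier on the array
    uniprot_to_gene: Dict[str, str],  # hypothetical: annotation identifier -> gene
) -> Optional[Tuple[str, str]]:
    """Return (gene_id, route) for one design element, or None if it cannot be mapped."""
    # Route 1 (preferred): the probe sequence was mapped to a gene in the genome.
    gene = sequence_hits.get(probe_id)
    if gene is not None:
        return gene, "sequence"
    # Route 2: fall back to the design element's annotation identifier.
    annotation = annotation_ids.get(probe_id)
    if annotation is not None and annotation in uniprot_to_gene:
        return uniprot_to_gene[annotation], "annotation"
    return None


def annotate_array(
    probe_ids: List[str],
    sequence_hits: Dict[str, str],
    annotation_ids: Dict[str, str],
    uniprot_to_gene: Dict[str, str],
) -> Tuple[Dict[str, Tuple[str, str]], List[str]]:
    """Map every design element; return the mapped elements and the unmapped ones."""
    mapped: Dict[str, Tuple[str, str]] = {}
    unmapped: List[str] = []
    for probe_id in probe_ids:
        result = map_design_element(probe_id, sequence_hits, annotation_ids, uniprot_to_gene)
        if result is None:
            unmapped.append(probe_id)
        else:
            mapped[probe_id] = result
    return mapped, unmapped
```

For example, calling annotate_array(["p1", "p2", "p3"], {"p1": "GENE_A"}, {"p2": "ACC_1", "p3": "ACC_9"}, {"ACC_1": "GENE_B"}) maps p1 by sequence and p2 by annotation, and reports p3 as unmapped. How many unmapped elements make an array unusable is not stated in the text, so the sketch only reports them.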
Experiments in the ArrayExpress Archive that are performed on well-annotated arrays, that have high MIAME scores (2,17), that meet the EF/EFV annotation and sufficient-replication criteria (as well as some other technical criteria not described here), and for which normalized data are present, are annotated as ‘suitable for Atlas’. When all basic criteria are satisfied, experiment selection for the Atlas is motivated by the quality of annotation, the use of standard platforms and large sample sizes, without preference for any biological condition. Recently, we started to produce themed Atlas data releases, e.g. species oriented or addressing a specific research domain, and to curate user-requested studies. Experiments selected for the Atlas are then exported from the Archive. The submitters' normalized data are used; we do not perform any renormalization. Prior to loading into the Atlas, annotations are harmonized, experimental descriptions are checked for consistency and non-standard terms are standardized. Mappings to EFO are added where the required term is present in the ontology. If a term is not in EFO, we examine source ontologies and provide a term name, a definition and mappings to external ontologies. The term is then placed in the EFO hierarchy, which is optimized for the Atlas visualization.
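The basic suitability criteria listed above amount to a conjunction of checks, which can be sketched as follows. The field names, the Experiment class and both thresholds are placeholders chosen for illustration; the text does not give the actual cut-offs, and the additional technical criteria it mentions are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    # All fields are hypothetical stand-ins for Archive metadata.
    accession: str
    array_well_annotated: bool   # the array passed re-annotation
    miame_score: int             # MIAME compliance score
    has_ef_annotation: bool      # EF/EFV annotation is present
    min_replicates_per_efv: int  # smallest replicate count over all EFVs
    has_normalized_data: bool    # submitter-normalized data are available


def suitable_for_atlas(
    exp: Experiment,
    min_miame_score: int = 4,  # assumed threshold, not from the text
    min_replicates: int = 3,   # assumed threshold, not from the text
) -> bool:
    """Return True only if every basic criterion is met."""
    return (
        exp.array_well_annotated
        and exp.miame_score >= min_miame_score
        and exp.has_ef_annotation
        and exp.min_replicates_per_efv >= min_replicates
        and exp.has_normalized_data
    )
```

Passing this check only marks an experiment as a candidate; as the text notes, the final selection also weighs annotation quality, platform and sample size.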
Once data are loaded, the statistical computations described in the previous section are performed: for each new experiment, a P-value is computed for each gene under each EF and EFV.
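The shape of this computation, one P-value per (EF, EFV, gene) combination within an experiment, can be sketched in Python. The statistical test used by the Atlas is described in the previous section and is not reproduced here; the permutation test below is only a stand-in so that the example runs, and the data layout (a dictionary of per-assay expression values and a dictionary of per-assay factor labels) is an assumption.

```python
import random
from statistics import mean


def permutation_p_value(group, rest, n_perm=1000, seed=0):
    """Stand-in test: two-sided permutation test on the difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(group) - mean(rest))
    pooled = list(group) + list(rest)
    k = len(group)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean(pooled[:k]) - mean(pooled[k:])) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


def compute_p_values(expression, factors, test=permutation_p_value):
    """Compute one P-value per (EF, EFV, gene) for a single experiment.

    expression: gene -> list of expression values, one per assay
    factors:    EF   -> list of EFV labels, one per assay (same assay order)
    """
    results = {}
    for ef, labels in factors.items():
        for efv in sorted(set(labels)):
            in_group = [i for i, label in enumerate(labels) if label == efv]
            out_group = [i for i, label in enumerate(labels) if label != efv]
            if not in_group or not out_group:
                continue  # an EF with a single EFV gives nothing to compare
            for gene, values in expression.items():
                group = [values[i] for i in in_group]
                rest = [values[i] for i in out_group]
                results[(ef, efv, gene)] = test(group, rest)
    return results
```

Here each EFV is compared against all other assays of the same experiment, which is one plausible reading of "for each EF and EFV"; any function with the same (group, rest) signature can be passed as test.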
Currently, the Atlas contains data from nine species. Table 1 shows the number of assays and the number of studies (experiments) included for each. Together, the experiments included in the Atlas have more than 40 different EFs, covering over 4500 different EFVs. The distribution of the number of assays for the most frequently studied EFs and EFVs (at least 50 experiments for each factor) is given in Table 2.

Table 1.

Number of studies and assays for each species in the Atlas

Species	Assays	Studies
Homo sapiens	13 703	410
Mus musculus	7539	373
Rattus norvegicus	4858	133
Arabidopsis thaliana	1607	88
Saccharomyces cerevisiae	813	43
Drosophila melanogaster	790	40
Schizosaccharomyces pombe	458	19
Danio rerio	214	13
Caenorhabditis elegans	166	5
Total	30 148	1124

Table 2.

Most frequently used EFs and the number of EFVs and studies for each factor

EFs	EFVs	Studies
Genotype	389	211
Compound treatment	425	196
Disease state	214	137
Organism part	267	98
Cell type	164	61
Growth condition	122	61
Strain or line	227	51

The method used in Gene Expression Atlas analytics allows us to examine trends in differential gene expression across all Atlas data. Figure 5A shows the distribution of the proportion of differentially expressed genes across all experiments. In approximately 400 experiments (out of more than 1000), fewer than 10% of all genes show differential expression; the mean proportion of genes differentially expressed in an experiment, according to our FDR criteria, is 25%. Further, when we examine the number of differentially expressed genes per factor (Figure 5B), we observe that the numbers are highest for the factors ‘observation’, ‘histology’, ‘cell line’, ‘generation’ and ‘organism part’. It appears that, broadly, across species, transcriptional activity is strongly driven by its context: by tissue (‘histology’, ‘organism part’ and, by extension, ‘cell line’), followed by developmental stage and then cell type, while the main extrinsic drivers of transcriptional activity, such as xenobiotic responses (‘compound treatment’) and disease states, contribute to differential expression to a smaller extent. We can also observe that the number of differentially expressed genes is largely independent of the number of EFVs (the median factor value count is around 3 EFVs).
Figure 5.

Distributions of differentially expressed genes over (A) experiments and (B) EFs. Error bars in (B) mark the 25% and 75% quantiles in the differentially expressed gene count for each EF.
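The per-experiment quantity summarized in Figure 5A, the proportion of genes called differentially expressed under an FDR criterion, can be sketched as follows. The text does not specify the FDR procedure or threshold, so the Benjamini-Hochberg adjustment and the 0.05 cut-off used here are assumptions made for illustration only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def proportion_differential(p_values, fdr=0.05):
    """Fraction of genes whose adjusted P-value is at or below the FDR threshold.

    The 0.05 default is an assumed threshold, not one taken from the text.
    """
    if not p_values:
        return 0.0
    adjusted = benjamini_hochberg(p_values)
    return sum(q <= fdr for q in adjusted) / len(adjusted)
```

Given one list of per-gene P-values per experiment, applying proportion_differential to each list yields the values whose distribution a histogram such as Figure 5A would display.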

Kapushesky M., Emam I., Holloway E., Kurnosov P., Zorin A., Malone J., Rustici G., Williams E., Parkinson H., & Brazma A. (2009). Gene Expression Atlas at the European Bioinformatics Institute. Nucleic Acids Research, 38(Database issue), D690-D698.

Publication 2009

Other organizations : European Bioinformatics Institute

Variable analysis

independent variables

Genotype
Compound treatment
Disease state
Organism part
Cell type
Growth condition
Strain or line

dependent variables

Differential gene expression

control variables

Not explicitly mentioned

controls

No positive or negative controls are mentioned.
