Data for the Atlas are selected from the ArrayExpress Archive according to the criteria outlined earlier. Because we currently use only microarray data, our first consideration is whether the array annotation is sufficient to map the array design elements to existing gene identifiers. We use two routes for this mapping: preferentially, we map array probe sequences to Ensembl genomes (15); otherwise, we attempt to map the design element annotation identifiers to gene annotation in the UniProt database (16). Where re-annotation fails, experiments performed on such arrays cannot be included in the Atlas. The array re-annotation pipeline will be released as a software package, described and published separately (Sarkans et al., in preparation).
Experiments in the ArrayExpress Archive are annotated as ‘suitable for Atlas’ when they are performed on well-annotated arrays, have high MIAME scores (2,17), meet the EF/EFV annotation and sufficient replication criteria (as well as some other technical criteria not described here), and include normalized data. When all basic criteria are satisfied, experiment selection for the Atlas is motivated by the quality of annotation, use of standard platforms and large sample sizes, without preference for any biological condition. Recently, we started to produce themed Atlas data releases, e.g. species oriented or addressing a specific research domain, or by curating user-requested studies.
Experiments selected for the Atlas are then exported from the Archive. The submitters' normalized data are used; hence, we do not perform any renormalization. Prior to loading into the Atlas, annotations are harmonized, experimental descriptions are checked for consistency and non-standard terms are standardized. Maps to EFO are added where the required term is present in the ontology.
If a term is not in EFO, we examine source ontologies and provide a term name, definition and maps to external ontologies. The term is then placed in the EFO hierarchy, which is optimized for the Atlas visualization. Once data are loaded, the statistical computations described in the previous section are performed: for each new experiment, a P-value is computed for each gene under each EF and EFV. Currently, the Atlas contains data from nine species. Table 1 shows the number of assays and the number of studies (experiments) included from each. Together, the experiments included in the Atlas have more than 40 different EFs, covering over 4500 different EFVs. The numbers of EFVs and studies for the most frequently studied EFs (at least 50 experiments for each factor) are given in Table 2.
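The ‘suitable for Atlas’ criteria described above can be sketched as a simple filter. This is only an illustration under stated assumptions: the `Experiment` record, its field names, the accession strings and both numeric thresholds are hypothetical, since the text does not give the actual cut-offs or the additional technical criteria.

```python
from dataclasses import dataclass

# Hypothetical thresholds: the source states only that MIAME scores must be
# "high" and replication "sufficient", without giving numbers.
MIN_MIAME_SCORE = 4
MIN_REPLICATES = 3


@dataclass
class Experiment:
    """Hypothetical record summarizing one ArrayExpress experiment."""
    accession: str
    array_reannotated: bool    # design elements mapped to gene identifiers
    miame_score: int
    has_ef_annotation: bool    # EF/EFV annotation is present
    replicates: int            # smallest replicate count per factor value
    has_normalized_data: bool  # submitter-normalized data are available


def suitable_for_atlas(exp: Experiment) -> bool:
    """Return True only if every basic criterion listed in the text holds."""
    return (
        exp.array_reannotated
        and exp.miame_score >= MIN_MIAME_SCORE
        and exp.has_ef_annotation
        and exp.replicates >= MIN_REPLICATES
        and exp.has_normalized_data
    )


if __name__ == "__main__":
    candidates = [
        Experiment("E-EXAMPLE-1", True, 5, True, 3, True),
        Experiment("E-EXAMPLE-2", False, 5, True, 4, True),  # re-annotation failed
    ]
    selected = [e.accession for e in candidates if suitable_for_atlas(e)]
    print(selected)
```

Keeping each criterion as a separate boolean or numeric field makes it straightforward to report which criterion excluded a given experiment.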
Table 1. Number of studies and assays for each species in the Atlas
Species                       Assays    Studies
Homo sapiens                  13 703        410
Mus musculus                    7539        373
Rattus norvegicus               4858        133
Arabidopsis thaliana            1607         88
Saccharomyces cerevisiae         813         43
Drosophila melanogaster          790         40
Schizosaccharomyces pombe        458         19
Danio rerio                      214         13
Caenorhabditis elegans           166          5
Total                         30 148       1124
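As a quick consistency check, the per-species counts in Table 1 can be summed and compared with the reported totals, and the average number of assays per study can be derived for each species. The snippet below is a minimal sketch; the variable and function names are illustrative, and the numbers are transcribed from Table 1.

```python
# Counts transcribed from Table 1: species -> (assays, studies).
TABLE_1 = {
    "Homo sapiens": (13703, 410),
    "Mus musculus": (7539, 373),
    "Rattus norvegicus": (4858, 133),
    "Arabidopsis thaliana": (1607, 88),
    "Saccharomyces cerevisiae": (813, 43),
    "Drosophila melanogaster": (790, 40),
    "Schizosaccharomyces pombe": (458, 19),
    "Danio rerio": (214, 13),
    "Caenorhabditis elegans": (166, 5),
}

# Totals as reported in the last row of Table 1.
REPORTED_TOTAL_ASSAYS = 30148
REPORTED_TOTAL_STUDIES = 1124


def totals(table):
    """Sum assays and studies over all species."""
    assays = sum(a for a, _ in table.values())
    studies = sum(s for _, s in table.values())
    return assays, studies


def assays_per_study(table):
    """Average number of assays per study for each species."""
    return {species: a / s for species, (a, s) in table.items()}


if __name__ == "__main__":
    assert totals(TABLE_1) == (REPORTED_TOTAL_ASSAYS, REPORTED_TOTAL_STUDIES)
    for species, ratio in assays_per_study(TABLE_1).items():
        print(f"{species}: {ratio:.1f} assays per study")
```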
Table 2. Most frequently used EFs and the number of EFVs and studies for each factor
EF                     EFVs    Studies
Genotype                389        211
Compound treatment      425        196
Disease state           214        137
Organism part           267         98
Cell type               164         61
Growth condition        122         61
Strain or line          227         51
The method used in the Gene Expression Atlas analytics allows us to examine trends in differential gene expression across all Atlas data. Figure 5A shows the distribution of the proportion of differentially expressed genes across all experiments. In approximately 400 experiments (out of more than 1000), fewer than 10% of all genes show differential expression; the mean proportion of genes differentially expressed in an experiment, according to our FDR criteria, is 25%. Further, when we examine the number of differentially expressed genes per factor (Figure 5B), we observe that the numbers are highest for the factors ‘observation’, ‘histology’, ‘cell line’, ‘generation’ and ‘organism part’. Broadly, across species, transcriptional activity appears to be driven most strongly by its context: by tissue (‘histology’, ‘organism part’ and, by extension, ‘cell line’), followed by developmental stage and then cell type, whereas the main extrinsic drivers of transcriptional activity, such as xenobiotic responses (‘compound treatment’) and disease states, contribute to differential expression to a smaller extent. We also observe that the number of differentially expressed genes is largely independent of the number of EFVs (the median factor value count is around 3 EFVs).
Figure 5. Distributions of differentially expressed genes over (A) experiments and (B) EFs. Error bars in (B) mark the 25% and 75% quantiles of the differentially expressed gene count for each EF.
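The per-experiment proportion of differentially expressed genes summarized in Figure 5A can be illustrated with a short sketch. The text does not specify the FDR procedure or threshold used by the Atlas, so the example assumes a Benjamini-Hochberg adjustment and a 0.05 cut-off; the function names and the input P-values are made up for illustration.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order.

    Assumption: the Atlas FDR criteria are not given in the text; BH is used
    here only as a common stand-in.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def proportion_differential(pvalues, fdr=0.05):
    """Fraction of genes whose adjusted P-value is at or below the threshold."""
    if not pvalues:
        return 0.0
    adjusted = bh_adjust(pvalues)
    return sum(q <= fdr for q in adjusted) / len(adjusted)


if __name__ == "__main__":
    # Made-up P-values for four genes in one experiment.
    example = [0.001, 0.01, 0.03, 0.5]
    print(proportion_differential(example))
```

Applying `proportion_differential` to each experiment and plotting the resulting values as a histogram would give a distribution of the kind described for Figure 5A.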
Kapushesky M., Emam I., Holloway E., Kurnosov P., Zorin A., Malone J., Rustici G., Williams E., Parkinson H., & Brazma A. (2009). Gene Expression Atlas at the European Bioinformatics Institute. Nucleic Acids Research, 38(Database issue), D690-D698.