The Dashboard is a component of the Pathway Tools software (8). Pathway Tools powers the BioCyc website and is used to construct the organism-specific databases, called Pathway/Genome Databases (PGDBs), that make up the BioCyc database collection. The panel and plot visualizations within the Dashboard are implemented using Google Charts (https://developers.google.com/chart/), which in turn is implemented in JavaScript. The Dashboard also contains client-side (web browser) components implemented in JavaScript, and server-side components implemented in Common Lisp. The pathway and operon diagrams displayed by the Dashboard are generated by existing Pathway Tools algorithms, as is the enrichment analysis operation within the Dashboard.
The Dashboard software defines a mapping from each subsystem (plot) to one or more pathways and/or GO terms. When the Dashboard displays each plot, it dynamically retrieves gene or metabolite lists for each plot from the PGDB for the current organism. For example, it issues PGDB queries to determine what genes (if any) exist in the current organism for the pathway(s) or GO term(s) associated with each plot. More specifically, Dashboard-panel gene groups are obtained from pathways in the PGDB via a Pathway Tools built-in query that returns all genes coding for enzymes catalyzing reactions within a specified metabolic pathway. Similarly, Pathway Tools provides a built-in query for obtaining all genes annotated to a given GO term. When displaying the window of regulators, the Dashboard issues a built-in Pathway Tools query for obtaining a list of all transcriptional regulators of a given gene.
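The subsystem-to-gene lookup described above can be illustrated with a minimal Python sketch. This is not the Dashboard's Common Lisp implementation: the dictionaries `PATHWAY_GENES` and `GO_TERM_GENES` are hypothetical stand-ins for the built-in Pathway Tools queries, and the subsystem names and gene sets are made-up example data.

```python
from typing import Dict, List, Set

# Hypothetical stand-ins for PGDB queries (illustrative data only).
PATHWAY_GENES: Dict[str, Set[str]] = {
    "glycolysis": {"pgi", "pfkA", "fbaA"},
    "TCA cycle": {"gltA", "acnB"},
}
GO_TERM_GENES: Dict[str, Set[str]] = {
    "GO:0006096": {"pgi", "pykF"},
}

# Hypothetical mapping from each subsystem (plot) to pathways and/or GO terms.
SUBSYSTEMS: Dict[str, Dict[str, List[str]]] = {
    "Central carbon metabolism": {
        "pathways": ["glycolysis", "TCA cycle"],
        "go_terms": ["GO:0006096"],
    },
    "Nitrogen fixation": {"pathways": ["nitrogen fixation"], "go_terms": []},
}


def genes_for_subsystem(name: str) -> Set[str]:
    """Union of genes from every pathway and GO term mapped to a subsystem.

    A pathway or GO term absent from the current organism contributes no
    genes, so the result may be empty.
    """
    spec = SUBSYSTEMS[name]
    genes: Set[str] = set()
    for pathway in spec["pathways"]:
        genes |= PATHWAY_GENES.get(pathway, set())
    for term in spec["go_terms"]:
        genes |= GO_TERM_GENES.get(term, set())
    return genes
```

Because the lookup runs against the current organism's data each time a plot is drawn, the same subsystem definition can yield different gene lists, or none, for different organisms.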
PGDBs within the BioCyc collection are highly variable in terms of the completeness of their GO term annotations and regulatory interactions, but the Dashboard is best suited for use with PGDBs with significant numbers of GO terms and regulatory interactions. Table 1 lists the 16 BioCyc databases containing more than 3000 GO term annotations, and the 10 BioCyc databases containing more than 500 transcriptional regulatory interactions. For our next BioCyc release in 2017 we have downloaded available GO term annotations from UniProt for all of the 42 Tier 2 BioCyc PGDBs (Tier 2 PGDBs have undergone a moderate amount of manual curation). GO annotations will be available for even more organisms in the future.
Given a set of genes (the user could specify all genes, or a set of genes whose changes are computed to be statistically significant), the Dashboard computes an enrichment p-value for every subsystem using a Lisp implementation of Grossmann’s parent–child-union analysis, a variation of the Fisher exact test in which the enrichment of a given subsystem is determined relative to its parent subsystem rather than to the entire population (10). An optional multiple-hypothesis correction may be applied; the options are the Bonferroni, Benjamini-Hochberg and Benjamini-Yekutieli corrections, with no correction being the default. The enrichment p-value is then converted to an enrichment score, −log(P-value).
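The calculation can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the Dashboard's Lisp code: it handles a single parent subsystem (the parent–child-union method pools all parents of a term), it shows only the Benjamini-Hochberg correction, and it uses base-10 logarithms for the score, whereas the text does not state the base. All function names are hypothetical.

```python
from math import comb, log10
from typing import List, Set


def hypergeom_upper_tail(k: int, K: int, n: int, N: int) -> float:
    """P(X >= k) when n genes are drawn from N, of which K are in the subsystem."""
    hi = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, hi + 1)) / total


def parent_child_pvalue(study: Set[str], subsystem: Set[str], parent: Set[str]) -> float:
    """Enrichment of a subsystem relative to its parent, not the whole genome."""
    child = subsystem & parent           # subsystem genes inside the parent
    study_in_parent = study & parent     # only study genes in the parent count
    k = len(study_in_parent & child)
    return hypergeom_upper_tail(k, len(child), len(study_in_parent), len(parent))


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):         # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrichment_score(p_value: float) -> float:
    """Convert a p-value to a score; base 10 is an assumption of this sketch."""
    return -log10(p_value)
```

Restricting the population to the parent subsystem means a child is scored as enriched only if it holds more of the study genes than its parent's overall share would predict.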
Experimental designs for which the Dashboard should be appropriate include time-course experiments, dose-response experiments, and experiments that vary growth conditions. The Dashboard performs well with up to 20 columns of data, although the display becomes cramped as columns are added; that effect is lessened on larger monitors.
This paper uses the analysis of two datasets to illustrate the application of the Dashboard toolset: a genome-wide transcriptome analysis of Thalassiosira pseudonana, and an E. coli gene-expression analysis of a 10 min time course following a shift from anaerobic to aerobic growth conditions.
Mock et al. performed a genome-wide transcriptome analysis on T. pseudonana strain CCMP 1335 under five different environmental conditions: low nitrate (low N), low silicic acid (low Si), low iron (low Fe), low temperature (4°C) and high pH (9.4), with nutrient-replete cultures serving as reference conditions (11). Cultures were maintained in natural seawater that had been autoclaved and supplemented with 2 × f/2 nutrients minus one of the limiting nutrients (Si, Fe or N) at 20°C and 100 μmol of photons m⁻² s⁻¹. F/2 provides the major nutrients including N, Si and P, as well as trace metals and vitamins (12). The alkaline pH condition was obtained by increasing the pH of 2 × f/2 seawater to 9.4 by adding 1 M NaOH. Temperature limitation was achieved by transferring a culture maintained in nutrient-replete 2 × f/2 seawater at 20°C to 4°C for 24 h (11). All limitation experiments were conducted in parallel with nutrient-replete cultures. Cells were harvested for RNA when the growth rate began to decrease significantly relative to the control cultures. Differentially expressed genes include those that have a Bayesian t-test P-value ≤ 0.05, and a ≥2-fold difference in mRNA levels with respect to the control samples. Data are available under GEO accession GSE9697.
Methods from von Wulffen et al. (13): Escherichia coli K-12 strain W3110 was used in this study. Cells were grown anaerobically in defined medium at pH 7 and 37°C in a stirred 3-l bioreactor until the culture reached an OD (600 nm) of 3. At that point, the first three replicate samples were drawn and aeration was started subsequently at 1 l/min. At 0.5, 1, 2, 5 and 10 min after the onset of aeration, additional samples were drawn from the three replicates.
Analysis of von Wulffen et al. data performed for this publication: raw gene counts were obtained from the GEO database (accession GSE71562). Replicates were averaged and then normalized using the TPM (Transcripts per Kilobase Million) approach (14). Genes that had zero counts in more than 15% of the samples were removed from further analysis; in addition, two genes (ssrA and rnpB) with high expression values that compressed the scales of two panels were removed; see Supplementary File S1. Differentially expressed genes in the samples at 0.5, 1, 2, 5 and 10 min were identified with respect to the time-zero samples by applying a paired t-test (computed with the Excel T.TEST function). Genes with statistically significant changes (P-value ≤ 0.05) and at least a 2-fold increase or decrease in expression (for any time point relative to time zero) were retained; see Supplementary File S2. At 0.5 min, 33 genes were found to be differentially expressed, versus 487 at 10 min in aerobic growth. Over the 10-min period, 639 genes were identified as significantly differentially expressed; their read counts summed to 12% of the total normalized read counts.
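The normalization and filtering steps above can be sketched in Python. This is a hedged illustration, not the analysis actually run (the t-test was computed in Excel and is not reimplemented here). The function names are hypothetical, gene lengths in kilobases are an assumed input that TPM requires, and treating non-positive expression values as non-differential is a choice of this sketch.

```python
from typing import Dict, Sequence


def tpm(counts: Dict[str, float], lengths_kb: Dict[str, float]) -> Dict[str, float]:
    """TPM for one sample: length-normalize counts, then scale to sum to 1e6."""
    rpk = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    scale = sum(rpk.values()) / 1e6
    return {gene: value / scale for gene, value in rpk.items()}


def passes_zero_filter(sample_counts: Sequence[float],
                       max_zero_fraction: float = 0.15) -> bool:
    """Keep a gene unless it has zero counts in more than 15% of the samples."""
    zeros = sum(1 for count in sample_counts if count == 0)
    return zeros / len(sample_counts) <= max_zero_fraction


def is_differential(p_value: float, expr_t0: float, expr_t: float,
                    p_cutoff: float = 0.05, fold: float = 2.0) -> bool:
    """P-value <= 0.05 and at least a 2-fold increase or decrease vs. time zero."""
    if expr_t0 <= 0 or expr_t <= 0:      # ratio undefined; assumption of this sketch
        return False
    ratio = expr_t / expr_t0
    return p_value <= p_cutoff and (ratio >= fold or ratio <= 1 / fold)
```

A gene would be retained if `is_differential` holds for at least one time point relative to time zero, matching the "any time point" criterion stated above.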