Statistical enrichment of ontology terms depends on the genome-wide gene set used in the analysis. GREAT currently supports the human (Homo sapiens NCBI Build 36.1, or UCSC hg18) and mouse (Mus musculus NCBI Build 37, or UCSC mm9) genomes. To restrict the gene sets to high-confidence genes and gene predictions, we use only the subset of the UCSC Known Genes45 that are protein coding, lie on assembled chromosomes and possess at least one meaningful GO annotation14. GO is an ontological representation of information related to the biological processes, cellular components and molecular functions of genes. The rationale is that a gene annotated with a function should be included in the gene set, whereas a gene with no ascribed function has an unclear status and is best omitted. In GREAT version 1.1.3, we use GO data downloaded on 5 March 2009 for human and 23 March 2009 for mouse, yielding gene sets of 17,217 and 17,506 genes for human and mouse, respectively.
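The three inclusion criteria can be read as a simple filter over candidate genes. The sketch below is illustrative only and is not GREAT's implementation. The `Gene` record, the rule that unassembled sequences carry an underscore in their name (e.g. `chr1_random`, `chr6_cox_hap1` in hg18), and the choice to treat annotations made only to the three GO root terms as "not meaningful" are all assumptions of this example.

```python
from dataclasses import dataclass, field

# GO root terms (biological_process, cellular_component, molecular_function).
# ASSUMPTION: an annotation made only to a root term is treated as "not meaningful".
GO_ROOTS = {"GO:0008150", "GO:0005575", "GO:0003674"}


@dataclass
class Gene:
    """Hypothetical minimal gene record used for illustration."""
    name: str
    chrom: str
    protein_coding: bool
    go_terms: set = field(default_factory=set)


def on_assembled_chrom(chrom: str) -> bool:
    # ASSUMPTION: in UCSC hg18/mm9 naming, unplaced contigs and alternate
    # haplotypes contain an underscore; assembled chromosomes do not.
    return "_" not in chrom


def has_meaningful_go(gene: Gene) -> bool:
    # Keep the gene if it has at least one GO term other than a root term.
    return bool(gene.go_terms - GO_ROOTS)


def build_gene_set(genes):
    """Return genes that are protein coding, on assembled chromosomes,
    and carry at least one meaningful GO annotation."""
    return [
        g for g in genes
        if g.protein_coding and on_assembled_chrom(g.chrom) and has_meaningful_go(g)
    ]


# Toy input: only GENE_A satisfies all three criteria.
genes = [
    Gene("GENE_A", "chr1", True, {"GO:0006915"}),
    Gene("GENE_B", "chr1_random", True, {"GO:0006915"}),  # unassembled sequence
    Gene("GENE_C", "chr2", False, {"GO:0006915"}),        # not protein coding
    Gene("GENE_D", "chr3", True, {"GO:0008150"}),         # root-term annotation only
    Gene("GENE_E", "chrX", True, set()),                  # no GO annotation
]
print([g.name for g in build_gene_set(genes)])
```

Running it prints `['GENE_A']`, since each of the other toy genes fails exactly one criterion.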
A single gene may have multiple splice variants. Because annotations are generally given at the gene level, GREAT uses a single transcription start site (TSS) to specify the location of each gene. The TSS used is that of the gene's ‘canonical isoform’, as defined by the UCSC Known Genes track45.
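Deriving one TSS per gene means picking the canonical transcript and then taking its 5′ end, which depends on strand. The sketch below assumes UCSC genePred-style half-open coordinates (`txStart` 0-based inclusive, `txEnd` exclusive), so the TSS is `txStart` on the plus strand and `txEnd - 1` on the minus strand. The `Transcript` type, the gene-to-canonical-transcript mapping and the toy identifiers are hypothetical stand-ins for the UCSC canonical-isoform data, not GREAT's actual code.

```python
from typing import Dict, NamedTuple, Tuple


class Transcript(NamedTuple):
    chrom: str
    strand: str    # "+" or "-"
    tx_start: int  # 0-based, inclusive (genePred txStart)
    tx_end: int    # 0-based, exclusive (genePred txEnd)


def tss(tx: Transcript) -> int:
    """0-based coordinate of the first transcribed base (the 5' end)."""
    if tx.strand == "+":
        return tx.tx_start
    if tx.strand == "-":
        # txEnd is exclusive, so the last base on the reference is txEnd - 1.
        return tx.tx_end - 1
    raise ValueError(f"unknown strand {tx.strand!r}")


def gene_tss(canonical: Dict[str, str],
             transcripts: Dict[str, Transcript]) -> Dict[str, Tuple[str, int]]:
    """Map each gene to (chrom, TSS) of its canonical isoform only."""
    result = {}
    for gene, tx_id in canonical.items():
        tx = transcripts[tx_id]
        result[gene] = (tx.chrom, tss(tx))
    return result


# Hypothetical toy data: GENE1 has two splice variants; only tx1 is canonical.
transcripts = {
    "tx1": Transcript("chr1", "+", 1000, 5000),
    "tx2": Transcript("chr1", "+", 1200, 5000),  # non-canonical variant, ignored
    "tx3": Transcript("chr2", "-", 3000, 9000),
}
canonical = {"GENE1": "tx1", "GENE2": "tx3"}
print(gene_tss(canonical, transcripts))
```

This prints `{'GENE1': ('chr1', 1000), 'GENE2': ('chr2', 8999)}`. The alternative TSS of `tx2` is never consulted, mirroring the one-TSS-per-gene convention described above.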