To evaluate our annotation methodology and to compare our GeneRIF results with the traditional OMIM resource in detail, we used a well-characterized fifty-gene collection that Homayouni et al. assembled to evaluate semantic indexing of gene functions [20]. This gene collection includes genes in the reelin signaling pathway of Alzheimer's disease and other genes important in cancer biology and development. We refer to it hereafter as the Homayouni gene collection. The five genes with more than 50 diseases mapped to them (APOE, EGFR, ERBB2, TGFB1 and TP53) were excluded from the test set because of the large number of GeneRIFs that would have required manual inspection. This evaluation was performed on February 9, 2006.
Assessing the false positive and false negative error rates for this collection was difficult [24], so several domain experts scored the results, and all results were reviewed by MID (an internal medicine physician), who made the final error determination. In determining gene-disease relationships, a false positive was scored only when the disease was identified incorrectly. No effort was made here to assess the appropriateness of the GeneRIF itself because of the subjective nature of such a process. For Table 3, however, precision and recall were calculated from estimates in which the overall false positive value was corrected to account for false positives arising when a correctly identified disease did not have a relationship to its associated gene as specified in the GeneRIF.
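The corrected precision and recall calculation described above can be illustrated with a minimal Python sketch. The function name, its parameters, and the counts in the usage lines are hypothetical and do not come from Table 3. The sketch also assumes one possible reading of the correction: annotations in which the disease was identified correctly but the gene-disease relationship was not supported are moved from the true positive count to the false positive count. The text states only that the false positive value was corrected, so this reclassification is an assumption.

```python
def corrected_precision_recall(true_pos, false_pos, false_neg, relation_errors=0):
    """Return (precision, recall) after a relationship-error correction.

    All names here are illustrative. `relation_errors` is the estimated
    number of annotations whose disease was identified correctly but whose
    gene-disease relationship was not supported by the GeneRIF.
    ASSUMPTION: these annotations are reclassified from true positives
    to false positives; the source does not spell out this step.
    """
    if min(true_pos, false_pos, false_neg, relation_errors) < 0:
        raise ValueError("counts must be non-negative")
    if relation_errors > true_pos:
        raise ValueError("relation_errors cannot exceed true_pos")

    tp = true_pos - relation_errors   # true positives after correction
    fp = false_pos + relation_errors  # corrected false positive value

    retrieved = tp + fp               # everything the method reported
    relevant = tp + false_neg         # everything it should have reported

    precision = tp / retrieved if retrieved else 0.0
    recall = tp / relevant if relevant else 0.0
    return precision, recall


# Hypothetical counts, for illustration only (not taken from the paper).
uncorrected = corrected_precision_recall(90, 5, 10)
corrected = corrected_precision_recall(90, 5, 10, relation_errors=10)
```

Under this reading, the correction lowers precision (more of the reported annotations count as errors) and also lowers recall slightly, because fewer reported annotations count as correct while the number of missed annotations is unchanged.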