Approximately 1,600 journal articles that had previously been manually curated for CTD were used as a baseline data set, or "gold standard," to evaluate the performance of our prototype text-mining applications. These documents were a subset of the approximately 25,000 documents reviewed by biocurators since CTD manual curation began in 2005. The 1,600 documents contained 6,664 curated actors (chemicals, genes, and diseases) and represented data for 10 priority chemicals: urethane, aspartame, 2-acetylaminofluorene, cyclophosphamide, indomethacin, aniline, raloxifene, amsacrine, phenacetin, and doxorubicin.