Most keywords with which UniProt proteins are annotated were originally defined manually by database curators. They are automatically transferred to homologous proteins according to various rules developed within UniProt (1). The keyword annotation similarity between two proteins x and y with keyword lists Kx and Ky is defined in exactly the same way as the GO annotation similarity, with the similarity between two individual keywords a and b defined as sim(a, b) = I(a = b), where I( · ) is the indicator function. This yields sim(x, y) = 2|Kx ∩ Ky|/(|Kx| + |Ky|). The keywords in the UniProt knowledge base describe functional features in categories such as molecular function, domain, biological process, ligand and cellular component. We ignore the keyword categories "technical term" and "coding sequence diversity", as well as keywords provided by the UniProt automatic annotation team that do not describe biological functions.
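As an illustration only, the keyword similarity above can be sketched in a few lines of Python. The function name `keyword_similarity`, the `excluded` parameter and the example keywords are hypothetical and not taken from the original work. Returning 0.0 when both keyword lists are empty is also an assumption, since the text does not specify that case.

```python
from typing import Iterable


def keyword_similarity(
    kx: Iterable[str],
    ky: Iterable[str],
    excluded: Iterable[str] = (),
) -> float:
    """Return 2|Kx ∩ Ky| / (|Kx| + |Ky|) for two keyword lists.

    `excluded` is a hypothetical hook for dropping keywords that should
    be ignored (for example, those in filtered-out categories).
    """
    skip = set(excluded)
    # Treat each keyword list as a set so duplicates are counted once.
    set_x = set(kx) - skip
    set_y = set(ky) - skip

    total = len(set_x) + len(set_y)
    if total == 0:
        # Assumption: two proteins without usable keywords score 0.
        return 0.0

    shared = len(set_x & set_y)
    return 2.0 * shared / total


if __name__ == "__main__":
    # Made-up keyword lists, for illustration only.
    protein_x = ["Kinase", "ATP-binding", "Transferase"]
    protein_y = ["Kinase", "Transferase", "Membrane", "Signal"]
    print(keyword_similarity(protein_x, protein_y))
```

With the made-up lists above, two keywords are shared out of 3 + 4 in total, so the score is 2 · 2 / 7 = 4/7.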