The benchmarking assessment requires the assignment of positive and negative compound sets. The positive set was derived from the DrugBank database (ref. 32): the 771 compounds with the word “oral” in their “Route of Administration” field were selected. Whilst we endeavoured to obtain a truly independent positive set for the benchmark, significant overlap between the DrugBank set and the drugs used to derive QED was inevitable. Of the 771 compounds, 554 were structurally identical to drugs in the QED set and a further 30 had significant structural similarity to them (Tanimoto score > 0.8). Small-molecule ligands from the Protein Data Bank’s (PDB’s) Ligand Dictionary (ref. 50) were selected as the negative set, as the dictionary provides a large and diverse source of chemical tools, metabolites, natural products and crystallographic buffers, as well as drugs. To prevent ambiguity, 475 compounds with significant structural similarity to the positive set (Tanimoto score > 0.8) were removed, leaving a negative set of 10,250 compounds.
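The similarity filter described above can be illustrated with a minimal sketch. The text does not state which fingerprint or toolkit was used, so the sketch assumes each compound is already represented as a set of "on" fingerprint bit indices. The function names (`tanimoto`, `max_similarity`, `filter_negative_set`) and the toy fingerprints are hypothetical and serve only to show the thresholding logic, not the authors' actual implementation.

```python
from typing import Dict, FrozenSet, Iterable, List

# Assumption: a fingerprint is the set of "on" bit indices for a compound.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared bits / bits present in either set."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def max_similarity(query: Fingerprint, references: Iterable[Fingerprint]) -> float:
    """Highest Tanimoto score between the query and any reference compound."""
    return max((tanimoto(query, ref) for ref in references), default=0.0)


def filter_negative_set(
    candidates: Dict[str, Fingerprint],
    positives: Iterable[Fingerprint],
    threshold: float = 0.8,
) -> List[str]:
    """Keep candidates whose best score against the positive set is not
    strictly greater than the threshold (the text removes scores > 0.8)."""
    positives = list(positives)
    return [
        name
        for name, fp in candidates.items()
        if max_similarity(fp, positives) <= threshold
    ]


# Toy data (hypothetical): one positive compound and three candidates.
positives = [frozenset({1, 2, 3, 4, 5})]
candidates = {
    "near_duplicate": frozenset({1, 2, 3, 4, 5, 6}),  # 5/6 shared, removed
    "borderline": frozenset({1, 2, 3, 4}),            # 4/5 shared, kept
    "unrelated": frozenset({7, 8, 9}),                # no shared bits, kept
}
kept = filter_negative_set(candidates, positives)
```

Here a candidate scoring exactly 0.8 is retained, because the criterion in the text is a strict inequality (Tanimoto score > 0.8). In practice the fingerprints would come from a cheminformatics toolkit, and the resulting counts depend on the fingerprint type chosen.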