we property-matched decoys to ligands using molecular weight, estimated
water–octanol partition coefficient (miLogP), rotatable bonds,
hydrogen bond acceptors, and hydrogen bond donors, and we added net
charge as a sixth property. We generated all ligand protonation states in the pH range 6–8
using Schrödinger’s Epik with arguments “-ph 7.0 -pht 1.0 -tp 0.20”
(Figure S1C). Properties were calculated with
Molinspiration’s mib. Over all the protonated forms of a given
ligand, we kept only those with a unique set of the six physicochemical
properties. For each of these unique property sets, we aimed to generate
50 matched decoys. For example, a single input ligand predicted to
have two alternate charges would get 50 decoys property-matched to
each charge. To accomplish this, a pool of decoys was selected from
ZINC46 (link) using a dynamic protocol that adapted
to local chemical space by narrowing or widening windows in seven
steps around the six properties. The goal was to return 3000–9000
potential decoys that matched the decoy’s reference protonation
state (predicted most prevalent form at pH 7.05). In the final decoy
procedure, ECFP4 fingerprints were generated by Scitegic’s
Pipeline Pilot for ligands and potential decoys. The decoys were sorted
by their maximum Tc to any ligand, and
the most dissimilar 25% were retained through this dissimilarity filter.
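
The dissimilarity filter described above can be sketched as follows. This is a minimal illustration, not the Pipeline Pilot protocol itself: fingerprints are assumed to be plain Python sets of on-bit indices, the function names `tanimoto` and `dissimilarity_filter` are hypothetical, and rounding the 25% cutoff up to a whole decoy is an assumption.

```python
import math


def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints stored as sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def dissimilarity_filter(ligand_fps, decoy_fps, keep_fraction=0.25):
    """Return ids of the decoys least similar to any ligand.

    ligand_fps: non-empty list of fingerprint sets (one per ligand).
    decoy_fps:  dict mapping a decoy id to its fingerprint set.
    """
    # Score each decoy by its highest Tc to any ligand.
    max_tc = {
        decoy_id: max(tanimoto(fp, lig) for lig in ligand_fps)
        for decoy_id, fp in decoy_fps.items()
    }
    # Sort from most dissimilar (lowest max Tc) to most similar;
    # the decoy id breaks ties so the result is deterministic.
    ranked = sorted(max_tc, key=lambda d: (max_tc[d], d))
    # Assumption: round the cutoff up so at least one decoy survives.
    n_keep = math.ceil(len(ranked) * keep_fraction)
    return ranked[:n_keep]
```

Scoring each decoy by its maximum Tc, rather than its mean, means a decoy is discarded if it resembles even a single ligand.
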
We then removed decoys duplicated across the ligand set by sorting decoys
from least to most duplicated and assigning each decoy to the protonated
ligand that had the fewest decoys already assigned. This
ensured that unique decoys were spread across the ligands as evenly as
possible. Finally, if available, 50 decoys were picked randomly from
this deduplicated list.
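
The deduplication and final selection steps can be sketched as below. This is one possible reading of the procedure, not the authors' code: the function name `assign_unique_decoys`, the input layout (a dict from protonated-ligand id to its candidate decoy ids), the tie-breaking by id, and the fixed random seed are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def assign_unique_decoys(candidates, n_per_ligand=50, seed=0):
    """Assign each decoy to exactly one protonated ligand, then sample.

    candidates: dict mapping a ligand id to the list of decoy ids that
                passed the earlier filters for that ligand.
    """
    # Record which ligands each decoy is a candidate for.
    owners = defaultdict(set)
    for ligand_id, decoys in candidates.items():
        for decoy_id in decoys:
            owners[decoy_id].add(ligand_id)

    assigned = {ligand_id: [] for ligand_id in candidates}
    # Process decoys from least to most duplicated (ties broken by id).
    for decoy_id in sorted(owners, key=lambda d: (len(owners[d]), d)):
        # Give the decoy to the candidate ligand with the fewest decoys so far.
        target = min(owners[decoy_id], key=lambda lig: (len(assigned[lig]), lig))
        assigned[target].append(decoy_id)

    # Pick up to n_per_ligand decoys at random from each deduplicated list.
    rng = random.Random(seed)
    return {
        ligand_id: sorted(rng.sample(decoys, min(n_per_ligand, len(decoys))))
        for ligand_id, decoys in assigned.items()
    }
```

Handling the least duplicated decoys first fixes the assignments that have no alternative, so the more widely shared decoys can then be used to even out the counts.
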