UniProt accession codes (species) with any raw ChEMBL compounds (ligands,
decoys, marginal ligands, or marginal decoys). For only those accession
codes, structures were extracted using the ChEMBL-to-PDB mapping,
except that P07700 was manually added to ADRB1 to include six more rare
structures for that GPCR. This procedure neglects those PDB structures
that belong to an accession code having no ChEMBL compounds. For example,
consider the structure of thymidine kinase (KITH) used in the original DUD. This KITH structure
is from herpes virus (UniProt P03176), an accession code with no raw
compounds extracted from ChEMBL, and is thus not included in the ChEMBL/PDB
intersection used to construct the new DUD. Still, 5025 PDB codes
were sent to an updated DOCK Blaster pipeline for automated docking
preparation (
the binding site, but we were able to assign 565 additional ligands
by manually inspecting over 1300 structures. Ultimately, 3692 structures
completed input grid preparation, and all but two finished docking
and enrichment analysis. Clustered ligand sets were docked together with
property-matched decoys (both described below); the decoys were filtered
using ECFP4 fingerprints by removing the most similar 75% of queried
decoys. DOCK 3.6 was run using SEV
ligand desolvation (as below). For each target, enrichment, resolution,
and organism were collected and sorted by enrichment in pdb_analyze.txt,
available online at
on the selection process are recorded in pdb_selection.txt, and the
picked structure is listed in pdb_blessed.txt. AA2AR and DRD3 docking
preparations were provided by Jens Carlson,44 (link),45 CXCR4 partially by Dahlia Weiss,3 (link) ADRB1
by Peter Kolb (personal communication), and AMPC by Sarah Barelier,
Oliv Eidam, and Inbar Fish (unpublished results).
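
The structure-selection step described above (keep only accession codes that have raw ChEMBL compounds, look up their structures through a ChEMBL-to-PDB mapping, then apply manual additions such as P07700) can be illustrated with a minimal Python sketch. This is not the pipeline used in this work: the function name `select_structures`, the dictionary layout, the accession `ACC_EXAMPLE`, and all structure labels are hypothetical placeholders chosen for illustration. Only the accession codes P03176 and P07700 come from the text.

```python
def select_structures(accessions_with_compounds, accession_to_pdb,
                      manual_additions=None):
    """Return {accession: sorted structure labels} for accessions that have
    raw compounds AND at least one mapped structure, plus manual additions."""
    selected = {}
    for accession in sorted(accessions_with_compounds):
        structures = accession_to_pdb.get(accession, [])
        if structures:  # skip accessions with compounds but no structure
            selected[accession] = sorted(set(structures))
    # Manual additions bypass the compound requirement (as for P07700).
    for accession, structures in (manual_additions or {}).items():
        merged = set(selected.get(accession, [])) | set(structures)
        selected[accession] = sorted(merged)
    return selected


# Hypothetical inputs; structure labels are placeholders, not real PDB codes.
accessions_with_compounds = {"ACC_EXAMPLE"}
accession_to_pdb = {
    "ACC_EXAMPLE": ["struct_1", "struct_2"],
    "P03176": ["kith_struct"],  # herpes virus KITH: no raw ChEMBL compounds
}
manual_additions = {"P07700": ["adrb1_struct_1", "adrb1_struct_2"]}

selected = select_structures(accessions_with_compounds, accession_to_pdb,
                             manual_additions)
for accession, structures in selected.items():
    print(accession, structures)
```

In this sketch, P03176 is dropped even though it has a mapped structure, because it has no raw compounds. This mirrors the KITH case discussed in the text. P07700 is retained only because it was added by hand.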