To quantify the similarity of two PPI sites, each represented by a set of patches, we optimize the pairing of patches between the two sites so that the following score (a distance), PatchScore, is minimized:
The first term, pDist, is a weighted sum of the Euclidean distances between the 3DZDs of matched patches A and B. The index i denotes the physicochemical features of a patch: 3D shape, electrostatic potential, visibility, hydrogen-bond acceptor/donor distribution, and hydrophobicity. The relative weights of the features, w_i, were trained on structures of proteins that bind to multiple partner proteins, called the hub protein set, extracted from the PiSite database (Higurashi et al., 2009). PiSite collects structures of protein complexes that share a common component protein and provides protein-protein interaction sites at the residue level. The details of the training are described in the next section.
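As a minimal sketch of the pDist term described above, the following Python function sums, over the five features, the weighted Euclidean distance between the 3DZD vectors of two patches. The dictionary layout, the feature keys, and the function name `p_dist` are assumptions made for illustration; the trained weight values are not given here, so any weights passed in are placeholders.

```python
import math

# Feature keys follow the list in the text; the key names themselves are hypothetical.
FEATURES = ("shape", "electrostatics", "visibility", "hbond", "hydrophobicity")


def p_dist(patch_a, patch_b, weights):
    """Weighted sum over features of the Euclidean distance between 3DZD vectors.

    patch_a, patch_b: dicts mapping a feature key to its 3DZD vector (sequence of floats).
    weights: dict mapping a feature key to its relative weight w_i (placeholder values).
    """
    total = 0.0
    for name in FEATURES:
        # math.dist computes the Euclidean distance between two equal-length vectors.
        total += weights[name] * math.dist(patch_a[name], patch_b[name])
    return total
```

Two patches with identical descriptors give a pDist of zero, and a feature with a larger weight contributes proportionally more to the distance.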
The second term of Eq. 5 is the root-mean-square deviation (RMSD) of the seed points of the matched patches. The coordinates of the seed points of the matched patches on each PPI site are extracted and superimposed to calculate the RMSD. The last term of Eq. 5 is called APPD, an abbreviation of Approximate Patch Position Difference:
APP is a histogram of the geodesic distances from a seed point to the other seed points in the given PPI site, with a bin size of 1.0 Å. APP represents the approximate position of a patch on the PPI surface, i.e., whether the patch lies in the middle or at the edge of the PPI site.
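The APP histogram can be illustrated with a short Python sketch. It assumes that the pairwise geodesic distances between seed points have already been computed and are supplied as a square matrix; the function name `app_histogram`, the parameter `n_bins`, and its default value are hypothetical choices and are not taken from the method itself. Only the 1.0 Å bin size comes from the text.

```python
def app_histogram(geodesic, seed_index, bin_size=1.0, n_bins=30):
    """Histogram of geodesic distances from one seed point to all other seed points.

    geodesic: square matrix (list of lists) of precomputed geodesic distances in Å.
    seed_index: index of the seed point whose position is being described.
    n_bins: number of bins kept (an assumption); larger distances are ignored.
    """
    counts = [0] * n_bins
    for j, d in enumerate(geodesic[seed_index]):
        if j == seed_index:
            continue  # skip the zero distance from the seed point to itself
        b = int(d // bin_size)
        if b < n_bins:
            counts[b] += 1
    return counts
```

A seed point near the middle of a PPI site has most of its counts in the short-distance bins, whereas a seed point at the edge also populates the long-distance bins.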
To find similar patch pairs between PPI sites, a modified version of the auction algorithm, a bipartite matching method, is used (Sael and Kihara, 2010b). The algorithm minimizes the PatchScore (Eq. 5) by iteratively matching similar patch pairs. Once the correspondence of surface patches is finalized, the overall similarity between the two PPI sites is calculated as a different score, the PPI Score:
where Avg(X) is the average of a term X in Eq. 5 over all matched pairs. All the weights (k_i) in Eq. 8 were optimized to maximize the benchmark performance measured on the hub protein set, which is described in the next section. The last term, SD, refers to the Size Difference between the PPI sites, which is estimated by counting the number of patches on each PPI site: the difference between the numbers of patches of the two PPI sites is divided by the number of surface patches of the larger PPI site. PPI Score is a distance metric; similar PPI sites have a small value.
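The matching step and the SD term described above can be sketched in Python as follows. The matching function is the textbook forward auction algorithm, not the modified version of Sael and Kihara, and it assumes that the score decomposes into a precomputed per-pair cost matrix, which ignores the fact that the RMSD term depends on the whole set of matched patches. The names `auction_match`, `size_difference`, and the parameter `eps` are hypothetical.

```python
def auction_match(cost, eps=1e-3):
    """Assign each row to a distinct column with near-minimal total cost.

    cost[i][j]: distance between patch i of site A and patch j of site B.
    The number of rows must not exceed the number of columns.
    Returns a list whose entry i is the column matched to row i.
    """
    n, m = len(cost), len(cost[0])
    if n > m:
        raise ValueError("put the PPI site with fewer patches on the rows")
    prices = [0.0] * m        # current price of each column (patch of site B)
    owner = [None] * m        # row currently holding each column
    match = [None] * n        # column currently held by each row
    unassigned = list(range(n))
    while unassigned:
        i = unassigned.pop()
        # Net value of each column for row i: negative cost minus current price.
        values = [-cost[i][j] - prices[j] for j in range(m)]
        best = max(range(m), key=lambda j: values[j])
        second = max((v for j, v in enumerate(values) if j != best),
                     default=values[best])
        # Raise the price by the bidding increment (at least eps).
        prices[best] += values[best] - second + eps
        if owner[best] is not None:
            # The previous holder is outbid and must bid again.
            match[owner[best]] = None
            unassigned.append(owner[best])
        owner[best] = i
        match[i] = best
    return match


def size_difference(n_patches_a, n_patches_b):
    """SD term: difference in patch counts divided by the larger patch count."""
    larger = max(n_patches_a, n_patches_b)
    if larger == 0:
        raise ValueError("both PPI sites are empty")
    return abs(n_patches_a - n_patches_b) / larger
```

With a small `eps`, the total cost of the returned matching is within n × `eps` of the minimum, where n is the number of rows. For example, PPI sites with 8 and 10 patches give an SD of 0.2.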
Shin W.H., Kumazawa K., Imai K., Hirokawa T., & Kihara D. (2023). Quantitative comparison of protein-protein interaction interface using physicochemical feature-based descriptors of surface patches. Frontiers in Molecular Biosciences, 10, 1110567.