Complete-Linkage Clustering in mutation3D Structural Analysis

The algorithm underlying the mutation3D web interface is complete-linkage (CL) clustering (Sørensen, 1948), a hierarchical clustering method in which each cluster initially comprises a single element and clusters are then successively merged with their nearest neighboring clusters or unassigned elements until a single cluster comprises all elements. Notably, the clusters found by complete-linkage clustering, as opposed to single-linkage clustering (Sneath, 1957), are guaranteed to have a diameter less than or equal to a specified linkage distance, which results in tight, well-defined clusters. Because the dissimilarity between two clusters is measured by the distance between their two most distant elements in n-dimensional space, this method is also referred to as furthest-neighbor clustering.
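The merging rule described above can be illustrated with a minimal sketch. It is not the mutation3D implementation: the function names, the naive pairwise search, and the early stop at the linkage distance (instead of building the full hierarchy and cutting it afterwards) are assumptions made for illustration.

```python
import math
from itertools import combinations


def complete_linkage_distance(cluster_a, cluster_b, coords):
    """Largest pairwise distance between a member of each cluster."""
    return max(math.dist(coords[i], coords[j]) for i in cluster_a for j in cluster_b)


def complete_linkage_clusters(coords, max_diameter):
    """Naive agglomerative complete-linkage clustering (illustrative only).

    coords: list of (x, y, z) tuples; max_diameter: linkage distance cutoff.
    Returns clusters as lists of indices into coords.
    """
    # Every element starts as its own cluster.
    clusters = [[i] for i in range(len(coords))]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest furthest-neighbor distance.
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = complete_linkage_distance(clusters[a], clusters[b], coords)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        # Stop once the closest pair would exceed the allowed diameter.
        if d > max_diameter:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # a < b, so index a is unaffected
    return clusters


# Three points spaced 3 units apart on a line, plus one distant point.
points = [(0, 0, 0), (3, 0, 0), (6, 0, 0), (30, 0, 0)]
print(complete_linkage_clusters(points, max_diameter=5))  # [[0, 1], [2], [3]]
```

With a cutoff of 5, the third point is not merged, because it lies 6 units from the first. Single-linkage clustering would chain all three points together, since each neighbor is only 3 units away.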
In our implementation of this classic machine learning algorithm, we cluster the three-dimensional locations of the α-carbons of those amino acids whose codons contain missense mutations. The coordinates of all atoms within proteins were derived from both PDB structures and structural models (Pieper et al., 2011) based on PDB entries covering proteins either in part or in full. For any given protein, many overlapping models may be available from either or both sources. mutation3D always uses entries from the PDB when they are available, as these experimentally determined crystal structures are considered to be a ‘gold standard’ in structural biology. To increase structural coverage of the proteome, the user may also select a subset of homology-based models to include, based upon several quality metrics available via the Advanced Query page (Supp. Note S2). Once a set of PDB structures and structural models has been established for a single protein, mutation3D attempts to cluster amino acid substitutions on each structure and model separately, and reports any model or experimentally determined structure in which a cluster has been found. In our analyses, we consider it sufficient to implicate a protein in cancer if any of its models is found to contain a cluster.
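The selection and reporting logic in this paragraph might be sketched as follows. Everything named here is hypothetical: the `Structure` record, the single `quality` score standing in for the several quality metrics of Supp. Note S2, and the caller-supplied `find_clusters` function are illustrative assumptions, not the mutation3D data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Structure:
    identifier: str        # hypothetical label, e.g. "PDB_A" or "MODEL_1"
    source: str            # "pdb" for experimental entries, "model" for homology models
    quality: float = 1.0   # hypothetical stand-in for the real quality metrics


def select_structures(structures, include_models=False, min_quality=0.0):
    """Always keep PDB entries; keep models only if the user opts in and they pass the filter."""
    selected = []
    for s in structures:
        if s.source == "pdb":
            selected.append(s)
        elif include_models and s.quality >= min_quality:
            selected.append(s)
    return selected


def structures_with_clusters(selected, find_clusters):
    """Cluster each structure separately and keep those where at least one cluster is found."""
    reported = {}
    for s in selected:
        clusters = find_clusters(s)  # caller-supplied clustering routine (assumed)
        if clusters:
            reported[s.identifier] = clusters
    return reported


def protein_is_implicated(reported):
    """A protein is implicated if any of its structures or models contains a cluster."""
    return bool(reported)
```

Passing the clustering routine in as an argument keeps structure selection separate from the clustering itself, which mirrors the order of the steps in the text.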
Some whole proteins or regions of proteins may not have been crystallized or modeled to date. Because structural coordinates are lacking in these regions, mutation3D cannot identify clusters of mutations there. In some cases, a single genomic mutation may give rise to defects in several distinct proteins; mutation3D then attempts to find clusters across all proteins and models whose protein products are affected by this mutation.
Users may set the CL-distance, the maximum allowable distance between α-carbons in a cluster of substituted amino acids. We refer to this as the maximum cluster diameter, as it is equivalent to the maximum allowable diameter, in Angstroms, of a sphere encapsulating all α-carbons in a cluster. In terms of the complete-linkage clustering algorithm, the CL-distance is the maximal dissimilarity between elements beyond which no further merging of elements and groups of elements occurs. In mutation3D, this parameter is called the Maximum Clustering Diameter: once the distance between amino acid substitutions exceeds it, single mutations are no longer merged with clusters, and clusters are assigned based on the current hierarchical groupings of mutations. For more information on all algorithm parameters and their default values, see Supp. Notes S2 and S3.
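The role of this parameter can be written out compactly. The notation is an assumption introduced for illustration and does not come from the original text: x_i is the α-carbon position of substituted residue i, A and B are clusters, and d_max is the Maximum Clustering Diameter.

```latex
% Hypothetical notation: x_i = alpha-carbon coordinates of substituted residue i,
% A, B, C = clusters, d_max = Maximum Clustering Diameter (Angstroms).
\begin{align}
D(A, B) &= \max_{i \in A,\; j \in B} \lVert x_i - x_j \rVert
  && \text{(complete-linkage dissimilarity)} \\
(A^{*}, B^{*}) &= \operatorname*{arg\,min}_{A \neq B} D(A, B),
  \quad \text{merged only while } D(A^{*}, B^{*}) \le d_{\max}
  && \text{(stopping rule)} \\
\operatorname{diam}(A \cup B) &= \max\bigl(\operatorname{diam}(A),\, \operatorname{diam}(B),\, D(A, B)\bigr)
  && \text{(diameter after a merge)} \\
\operatorname{diam}(C) &= \max_{i, j \in C} \lVert x_i - x_j \rVert \le d_{\max}
  && \text{(for every reported cluster } C\text{)}
\end{align}
```

The last line follows by induction from the third: single elements have diameter zero, and a merge is only performed when all three terms on the right-hand side are at most d_max.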

Meyer M.J., Lapcevic R., Romero A.E., Yoon M., Das J., Beltrán J.F., Mort M., Stenson P.D., Cooper D.N., Paccanaro A., & Yu H. (2016). mutation3D: cancer gene prediction through atomic clustering of coding variants in the structural proteome. Human Mutation, 37(5), 447-456.

Publication 2016

Corresponding Organization : Cornell University

Other organizations : Tri-Institutional PhD Program in Chemical Biology, Royal Holloway University of London, Cardiff University

Protocol cited in 9 other protocols

Variable analysis

independent variables

The maximum allowable distance between α-carbons in a cluster of substituted amino acids, referred to as the 'Maximum Clustering Diameter'

dependent variables

Clustering of amino acid substitutions on protein models and experimentally determined structures
Identification of clusters of mutations in proteins, which are used to implicate a protein in cancer

control variables

PDB structures, which are considered the 'gold standard' in structural biology and are used as the primary source of structural coordinates
Homology-based structural models, which are used to increase structural coverage of the proteome when PDB structures are not available
