Calpain Substrate Cleavage Site Identification
We defined a calpain cleavage peptide CCP(m, n) as a cleavage bond flanked by m residues upstream and n residues downstream. As previously described [23], [24], we regarded all experimentally verified cleavage sites as positive data (+) and all other, non-cleavage sites in the same substrates as negative data (−). If a cleavage site was located so close to the N- or C-terminus of the protein that the peptide was shorter than m+n residues, we added one or more “*” characters as pseudo amino acids to complete the CCP(m, n). The positive data (+) set for training might contain several homologous sites from homologous proteins. If the training data were highly redundant, with too many homologous sites, the prediction accuracy would be overestimated. To avoid such overestimation, we clustered the protein sequences with CD-HIT [25] at a threshold of 40% identity. If two proteins shared ≥40% identity, we re-aligned them with BL2SEQ, a program in the BLAST package [26], and checked the results manually. If two calpain cleavage sites from two homologous proteins were at the same position after sequence alignment, only one was preserved and the other was discarded. Finally, the non-redundant benchmark data set for training contained 368 positive sites from 130 unique substrates (Supplementary
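The peptide construction and labelling described above can be illustrated with a minimal Python sketch. It is not the authors' code. The function names `ccp` and `label_sites`, the 1-based bond index (bond `k` lies between residues `k` and `k + 1`), and the toy sequence are all assumptions made for illustration.

```python
def ccp(sequence: str, bond: int, m: int, n: int, pad: str = "*") -> str:
    """Build CCP(m, n) around the bond between residues `bond` and `bond + 1`.

    `bond` is 1-based (an assumed convention). Missing residues near
    either terminus are filled with the pseudo amino acid `pad`.
    """
    upstream = sequence[max(0, bond - m):bond]   # up to m residues before the bond
    downstream = sequence[bond:bond + n]         # up to n residues after the bond
    return upstream.rjust(m, pad) + downstream.ljust(n, pad)


def label_sites(sequence: str, cleaved_bonds: set, m: int, n: int):
    """Split every peptide bond of one substrate into positive and negative CCPs."""
    positives, negatives = [], []
    for bond in range(1, len(sequence)):         # bonds 1 .. len - 1
        peptide = ccp(sequence, bond, m, n)
        target = positives if bond in cleaved_bonds else negatives
        target.append((bond, peptide))
    return positives, negatives


# Toy substrate (hypothetical) with verified cleavage after residues 2 and 8.
toy_sequence = "ACDEFGHIK"
positives, negatives = label_sites(toy_sequence, {2, 8}, m=3, n=3)
print(positives)       # [(2, '*ACDEF'), (8, 'GHIK**')]
print(len(negatives))  # 6
```

Padding on both sides keeps every peptide at exactly m+n characters, so sites near a terminus can be compared with internal sites position by position.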
Corresponding Organization: Sun Yat-sen University
Protocol cited in 9 other protocols
Variable analysis
- The keyword "calpain" was used to obtain the experimentally verified calpain substrates with cleavage sites
- Experimentally verified calpain substrates with cleavage sites
- Protein sequences were retrieved from the UniProt database
- For cleavage sites so close to the N- or C-terminus of the protein that the peptide is shorter than m+n residues, one or more "*" characters were added as pseudo amino acids to complete the CCP(m, n)
- Because the positive data (+) set for training might contain several homologous sites from homologous proteins, the protein sequences were clustered with CD-HIT at a threshold of 40% identity and re-aligned with BL2SEQ to remove redundant cleavage sites
- All experimentally verified cleavage sites were regarded as positive data (+)
- All other non-cleavage sites in the same substrates were taken as negative data (-)
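
The redundancy filter in the list above (dropping a cleavage site when a homologous protein has a site at the same aligned position) could be sketched as follows. This is only an illustration under stated assumptions. The pairwise alignment is assumed to be already available as two equal-length gapped strings, for example parsed from BL2SEQ output by other code. The helper names `alignment_columns` and `drop_redundant_sites` are hypothetical, and the choice to keep the site from the first protein is arbitrary. The source also reports a manual check, which this sketch does not replace.

```python
def alignment_columns(aligned: str, gap: str = "-") -> dict:
    """Map 1-based residue positions of a gapped sequence to alignment columns."""
    columns = {}
    position = 0
    for column, char in enumerate(aligned):
        if char != gap:
            position += 1
            columns[position] = column
    return columns


def drop_redundant_sites(aligned_a: str, sites_a: list, aligned_b: str, sites_b: list):
    """Keep all sites of protein A and only those sites of protein B that
    do not fall in the same alignment column as a site of A."""
    cols_a = alignment_columns(aligned_a)
    cols_b = alignment_columns(aligned_b)
    occupied = {cols_a[site] for site in sites_a}
    kept_b = [site for site in sites_b if cols_b[site] not in occupied]
    return list(sites_a), kept_b


# Toy alignment of two hypothetical homologs; sites are 1-based residue positions.
aligned_a = "MKT-AYIAK"
aligned_b = "MKTLAY-AK"
kept_a, kept_b = drop_redundant_sites(aligned_a, [4], aligned_b, [5, 7])
print(kept_a, kept_b)  # [4] [7]
```

In the toy alignment, residue 4 of the first protein and residue 5 of the second share a column, so only one of the two sites is retained, while site 7 of the second protein has no aligned counterpart and is kept.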