Data were compiled in two tables, as primary and secondary data. Manually curated information, such as PubMed ID, CPP sequence, name, category, chirality, nature, end modifications and length, together with other relevant experimental information such as uptake efficiency, uptake mechanism, sub-cellular localization, model systems used for CPP testing and cargo types, was organized as the primary information.
We derived further information from the primary data, such as the physicochemical properties and amino acid composition of the CPPs. This information was stored as secondary information in the database.
Since structure plays a major role in defining the function of a peptide, we also performed structural annotation of the peptides present in CPPsite 2.0, following a systematic approach. First, we mapped the sequences of all peptides in CPPsite 2.0 against the sequences in the Protein Data Bank (PDB) (25); if a peptide was already available in PDB, we assigned it the structure present in PDB. If the peptide was not available in PDB, we predicted its structure using structure prediction techniques. Peptides with a length between 5 and 30 residues were predicted using the web service PEPstrMOD (in parallel communication). PEPstrMOD is an updated version of the PEPstr method (26), which predicts the tertiary structure of peptides, and it can handle peptides with natural as well as non-natural or modified amino acids. Many peptides in CPPsite 2.0 contain modified residues such as ornithine and β-alanine; these were predicted using PEPstrMOD, which integrates force field libraries (FFNCAA (27), FFPTM (28) and SwissSideChain (29–30)) to handle non-natural residues. Peptides consisting of natural amino acids but linked to fluorophores (used for labeling) at terminal residues, or carrying other complex modifications, were treated as natural, and their structures were also predicted using PEPstrMOD.
Peptides with a length between 1 and 4 residues were predicted using an alternative approach. An extended conformation of the peptide (with phi and psi torsion angles of 180° for each residue) was used as the initial structure, which was then subjected to energy minimization and molecular dynamics simulation to obtain the predicted structure. Peptides with more than 30 natural amino acids were predicted using the I-TASSER suite (31). I-TASSER (named as ‘Zhang-Server’) was among the best methods in the server category in recent CASP experiments (2011 and 2010) for the assessment of protein structure prediction (32).
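The length-based choice of method described above can be summarized in a short sketch. This is only an illustration of the decision rules as stated in the text, not the pipeline actually used to build the database: the function name `choose_structure_method`, the `in_pdb` flag and the returned labels are hypothetical, and the sketch does not model how peptides longer than 30 residues that carry modified residues were handled, since the text does not specify this.

```python
def choose_structure_method(length: int, in_pdb: bool = False) -> str:
    """Return a label for the structure source implied by the rules in the text.

    `length` is the number of residues; `in_pdb` is a hypothetical flag saying
    whether the peptide sequence was found in PDB. The labels are illustrative.
    """
    if length < 1:
        raise ValueError("peptide length must be at least 1 residue")
    if in_pdb:
        # A structure already present in PDB is assigned directly.
        return "PDB"
    if length <= 4:
        # 1-4 residues: extended conformation (phi = psi = 180 degrees),
        # then energy minimization and molecular dynamics simulation.
        return "extended conformation + minimization/MD"
    if length <= 30:
        # 5-30 residues: PEPstrMOD web service.
        return "PEPstrMOD"
    # More than 30 residues: I-TASSER suite (stated for natural amino acids).
    return "I-TASSER"


if __name__ == "__main__":
    for n in (3, 12, 45):
        print(n, "->", choose_structure_method(n))
```

Keeping the rules in a single function makes the boundary cases (4 versus 5 residues, 30 versus 31 residues) explicit and easy to check.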
The predicted tertiary structures of the peptides were given as input to the DSSP software (version 2.0.4) (33), which assigns eight secondary structure states. DSSP groups these states as helix (alpha helix (H), 3-10 helix (G) and pi helix (I)); strand (extended strand (E) and beta-bridge (B)); turn (T); bend (S); and loop (C).
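As an illustration of the grouping above, the following sketch collapses a per-residue string of DSSP state codes into the five classes named in the text and reports the fraction of residues in each. It does not run DSSP itself; the input string is assumed to have been extracted from DSSP output beforehand. The names `DSSP_CLASS` and `summarize_dssp` are hypothetical, and treating a blank or `-` as loop (C) is an added convenience, since DSSP output leaves residues without an assigned state blank.

```python
from collections import Counter

# Mapping of the eight DSSP state codes to the classes listed in the text.
DSSP_CLASS = {
    "H": "helix",   # alpha helix
    "G": "helix",   # 3-10 helix
    "I": "helix",   # pi helix
    "E": "strand",  # extended strand
    "B": "strand",  # beta-bridge
    "T": "turn",
    "S": "bend",
    "C": "loop",
}

CLASSES = ("helix", "strand", "turn", "bend", "loop")


def summarize_dssp(states: str) -> dict:
    """Return the fraction of residues in each class for a DSSP state string."""
    if not states:
        raise ValueError("empty secondary structure string")
    counts = Counter()
    for code in states:
        # Assumption: blank or '-' means no assigned state, counted as loop.
        if code in " -":
            code = "C"
        counts[DSSP_CLASS[code]] += 1  # raises KeyError on unknown codes
    total = len(states)
    return {cls: counts[cls] / total for cls in CLASSES}


if __name__ == "__main__":
    print(summarize_dssp("HHHHTTCC"))
```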