Peptide L

Peptide L is a small peptide molecule that plays a key role in various biological processes.
Discover how the AI-driven platform PubCompare.ai can enhance the reproducibility and accuracy of your Peptide L research.
Locate relevant protocols from literature, pre-prints, and patents, and leverage the platform's AI-driven comparisons to identify the best protocols and products.
Streamline your research process and improve outcomes with PubCompare.ai's powerful tool.

Most cited protocols related to «Peptide L»

Consider a macromolecular system of n (nonhydrogen) atoms with Cartesian coordinates. To rapidly evaluate the energy of a particular configuration of the system (including hydrogens), we will decompose the system into a collection of distinct chemical groups, {Ai}, consisting of atoms for which the protonation state is unknown and a set P, the part of the system for which there is assumed to be no uncertainty regarding its protonation state.
The decomposition proceeds as follows: implicitly break all bonds between 4-coordinated alkane sp3 carbon atoms and collect the resulting connected (bonded) groups of atoms. For proteins, this will leave the backbone intact, isolate the alkane carbons, and produce a collection of methylamide (Asn, Gln), thiomethanol (Cys), methylimidazoles (His), methylguanidinium (Arg), methyl carboxylic acids (Asp, Glu), methanol (Ser, Thr), indole (Trp), methylphenol (Tyr), methylbenzene (Phe), methylamine (Lys), and thioether (Met) groups. A special-case disconnection of the standard termini will produce a methylamine (N terminus) and a methyl carboxylic acid (C terminus). Solvent and disconnected ions are considered to be separate groups. Collect the backbone and isolated alkane atoms into a set, P, the “known” portion of the system. The remaining atoms in the chemical groups are collected (by connectivity) into m sets, {Ai}, the sets of atoms for which there is uncertainty with respect to their protonation geometry, tautomer, or ionization state. This decomposition procedure assumes that the alkane carbons and the protein peptide backbone have a known protonation state. In principle, Protonate3D can use any partitioning method, provided that (relatively) apolar bonds are used to divide the system; this requirement stems from the thermodynamic approximations and the calculation of partial charges (both described later).
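The bond-breaking rule above amounts to deleting every bond between two sp3 carbons from the molecular graph and taking the connected components of what remains. The minimal Python sketch below assumes atoms are given as ids, bonds as pairs, and a precomputed, hypothetical flag `is_alkane_sp3` marking 4-coordinated sp3 carbons; it illustrates the idea and is not Protonate3D's code.

```python
from collections import defaultdict


def decompose(atoms, bonds, is_alkane_sp3):
    """Group atoms after implicitly breaking every bond between two sp3 carbons.

    atoms: iterable of atom ids; bonds: iterable of (a, b) pairs;
    is_alkane_sp3: dict mapping atom id -> True for 4-coordinated sp3 carbons
    (a hypothetical, precomputed flag). Returns a list of sets of atom ids,
    one per connected group.
    """
    adj = defaultdict(set)
    for a, b in bonds:
        if is_alkane_sp3.get(a, False) and is_alkane_sp3.get(b, False):
            continue  # bond implicitly broken
        adj[a].add(b)
        adj[b].add(a)

    seen, groups = set(), []
    for start in atoms:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:  # depth-first traversal of one connected component
            v = stack.pop()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        groups.append(comp)
    return groups


# Serine fragment: backbone N-CA-C(=O) plus side chain CB-OG (hydrogens omitted).
atoms = ["N", "CA", "C", "O", "CB", "OG"]
bonds = [("N", "CA"), ("CA", "C"), ("C", "O"), ("CA", "CB"), ("CB", "OG")]
sp3 = {"CA": True, "CB": True}
print(decompose(atoms, bonds, sp3))  # a backbone group and a methanol-like CB-OG group
```

Only the CA-CB bond joins two flagged carbons, so the backbone stays intact and the side chain becomes a separate methanol-type group, as the text describes for Ser.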
The hydrogen atoms of the heavy atoms of P (the “known” atoms) are added at standard bond lengths and angles according to the hybridization state of the atoms; for example, the backbone nitrogen in nonproline peptide bonds is given one hydrogen in the peptide plane; the Cα of nonglycine residues is given one hydrogen placed in an ideal tetrahedral geometry; sp3 carbons with two heavy neighbors (e.g., Cβ of Glu) are given two hydrogens placed at ideal tetrahedral geometry; terminal methyls are given three hydrogens in tetrahedral geometry in staggered conformation with respect to their (necessarily) alkane carbon neighbors. Henceforth, P will denote the hydrogen augmented set of atoms in the “known” part of the macromolecule.
For each chemical group Ai, we generate a finite collection Si = {Ai1,Ai2,…} of states consisting of the heavy atoms, flipped states, and all rotamer, tautomer, and ionization/protonation combinations of hydrogen atoms (see Fig. 1). In general, the states of chemical groups are generated according to a parameter file containing definitions of each chemical group and all of its topological tautomer and ionization states. The parameter file also contains, for each state, a tautomer strain energy (to provide for tautomer preferences). The rotamer (conformational) strain energy of each state is also considered; it is generated from force field parameter files such as OPLS-AA [18] by applying the dihedral energy terms to the fragment geometry (as though still connected to P) and the intrafragment van der Waals energy terms (interfragment energies are handled by the matrix formulation of Eq. (1), later).
For proteins, the sp3 carbon atoms with two heavy neighbors are given hydrogens in a similar manner to the carbons of P; sp2 carbon atoms with two heavy neighbors (e.g., aromatic carbons) are given one hydrogen at standard bond lengths and angles in the π system plane. Primary amides are given two hydrogens at standard planar geometry; planar nitrogen atoms with two heavy neighbors and one hydrogen have that hydrogen placed in-plane at standard bond lengths and angles. The polar hydrogens and terminal methyls are given hydrogens appropriate to their ionization state and hybridization at standard bond lengths and angles. The dihedral combinations are determined according to the chemical type of the heavy atom: hydrogens in hydroxyls and thiols are sampled at 60° dihedral increments starting at a staggered rotamer; phenol hydrogens and other conjugated hydroxyls are sampled at 30° dihedral increments starting at an in-plane rotamer; methyls and primary amines are sampled at 60° dihedral increments starting at an extended conformation; hydrogens on other terminal atoms are given similar geometries. The anionic states of phenols, alcohols, thiols, and indoles are generated in addition to the neutral forms. The flip states of terminal amides, sulfonamides, and phosphonamides are generated. The anionic state and both neutral tautomers of carboxylic acids are generated (with the hydrogen cis to the carbonyl oxygen). Primary amines are generated in neutral and cationic forms, with dihedral angles sampled at 60° increments starting at a staggered rotamer. Imidazoles are generated in anionic and cationic forms and as two neutral tautomers (HID and HIE), each also in a flipped state (for a total of eight states). The neutral states of guanidines consist of all planar tautomers and rotamers.
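The sampling rules above can be checked with a few lines of Python. This sketch only enumerates the dihedral grids and the histidine state labels; the starting angles (60° for "staggered", 0° for "in-plane") and the label names `HIS_PROTONATION` and `HIS_FLIP` are illustrative assumptions, and no geometry is built.

```python
from itertools import product


def dihedral_samples(start_deg, step_deg):
    """Dihedral angles (degrees, in [0, 360)) visited when a full turn is sampled
    from start_deg in step_deg increments."""
    n = round(360 / step_deg)
    return [(start_deg + i * step_deg) % 360 for i in range(n)]


# Hydroxyl/thiol: 60-degree steps from a staggered start (60 is an assumed value).
hydroxyl = dihedral_samples(60, 60)
# Phenol and other conjugated hydroxyls: 30-degree steps from an in-plane start (0).
phenol = dihedral_samples(0, 30)

# Imidazole (His): four protonation forms, each unflipped or flipped.
HIS_PROTONATION = ["anionic", "cationic", "HID", "HIE"]
HIS_FLIP = ["unflipped", "flipped"]
his_states = list(product(HIS_PROTONATION, HIS_FLIP))

print(len(hydroxyl), len(phenol), len(his_states))  # 6 12 8
```

The product of four protonation forms and two flip states reproduces the eight imidazole states quoted in the text.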
Water states consist of ∼500 rigid body orientations and isolated metals are given appropriate ionization states for groups I and II and a collection of ionization states from {+1,+2,+3} for transition metals under the assumption of zero ionization potential.
Thus, each Aij is an all-atom chemical group state with an appropriate ionization state, comprising its heavy atoms and all of its hydrogen atoms in reasonable geometry, and it has an associated internal energy, sij, equal to the sum of its conformational and tautomeric energies. Figure 1 depicts a hypothetical fixed part P (with known protonation state and geometry) of a macromolecular system and three chemical groups, each with a collection Si of alternative protonation states; A1 has four alternative states, A2 has two states, and A3 has three states.
To represent the state ensemble of the system, arrange all of the individual chemical group states in all of the {Si} into a single state list, S, divided into contiguous blocks corresponding to the {Si}, each of length mi = |Si|.
The first block of m1 elements in the list contains the states of chemical group 1, the next block of m2 elements contains the states of group 2, and so on. (The reason for this arrangement will become clear shortly.) A configuration of the entire system consists of a selection of exactly one particular state from each block associated with a chemical group. Thus, there are a total of m1 × m2 × m3 × … configurations of the system. In typical proteins, the number of configurations exceeds 10¹⁰⁰. A binary vector x of length equal to the length of the list S conveniently encodes a configuration, with a value 1 denoting the selection of an individual state. For example, in Figure 1, the vector x = (0,1,0,0,1,0,0,0,1) denotes the configuration consisting of state 2 from group 1, state 1 from group 2, and state 3 from group 3; to see this, introduce dividers into x corresponding to the blocks: x = (0,1,0,0 | 1,0 | 0,0,1), so that the position of the 1 value within each block (counting from the left) indicates the number of the state within the group. Admissible, or permitted, configuration vectors, x, have the property that there is exactly one 1 value in each block corresponding to a chemical group; this means that an admissible configuration vector encodes a definite single state for each chemical group. This constraint giving rise to the admissible configuration vectors is called the unary constraint, inspired by unary (base 1) notation of numbers in which “1” = 1, “10” = 2, “100” = 3, “1000” = 4, “10000” = 5, and so on.
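The unary encoding can be made concrete with a small Python sketch; the helper names `encode`, `decode`, and `n_configurations` are hypothetical. It reproduces the Figure 1 example, where the blocks have sizes 4, 2, and 3, and rejects vectors that violate the unary constraint.

```python
from math import prod


def encode(block_sizes, states):
    """Unary-encode one state per group (1-based state numbers) as a 0/1 list."""
    if len(block_sizes) != len(states):
        raise ValueError("need exactly one state per group")
    x = []
    for m, s in zip(block_sizes, states):
        if not 1 <= s <= m:
            raise ValueError(f"state {s} out of range for a block of size {m}")
        x.extend(1 if k == s else 0 for k in range(1, m + 1))
    return x


def decode(block_sizes, x):
    """Inverse of encode; raises ValueError if x violates the unary constraint."""
    if len(x) != sum(block_sizes):
        raise ValueError("vector length does not match the block sizes")
    states, pos = [], 0
    for m in block_sizes:
        block = x[pos:pos + m]
        if any(v not in (0, 1) for v in block) or sum(block) != 1:
            raise ValueError("not admissible: each block needs exactly one 1")
        states.append(block.index(1) + 1)
        pos += m
    return states


def n_configurations(block_sizes):
    """Number of admissible configurations, m1 * m2 * ... ."""
    return prod(block_sizes)


sizes = [4, 2, 3]                      # |S1|, |S2|, |S3| from Figure 1
x = encode(sizes, [2, 1, 3])
print(x)                               # [0, 1, 0, 0, 1, 0, 0, 0, 1]
print(decode(sizes, x))                # [2, 1, 3]
print(n_configurations(sizes))         # 24
```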
Suppose that we are given a pairwise interaction energy function f(i,j) for atoms i and j (e.g., Coulomb's law or a Lennard-Jones van der Waals potential). Without loss of generality, we will assume that f(i,i) is well defined (e.g., for Coulomb's law, f(i,i) = 0). If X and Y are two disjoint sets of atoms (e.g., two chemical group states), then the interaction energy between X and Y is
$$f(X,Y) = \sum_{i \in X} \sum_{j \in Y} f(i,j).$$
Form a matrix U with entries equal to the interaction energy of the various chemical group states in the list S. We will take the interaction energy between two states of the same chemical group to be zero. For notational convenience, let I(k) denote the chemical group to which state k belongs. Thus, the matrix U will have Uij = f(Ai,Aj) if I(i) ≠ I(j) and 0 otherwise, where Ai denotes the i-th state in S. Form a vector u with entries ui = f(P,Ai) + si, the interaction energy between a chemical group state and the known part of the protein, P, plus the internal energy of the state, si (to be described later). Let u0 = f(P,P)/2, the (constant) internal interaction energy of the known part of the protein P. With this matrix notation, we can write the total energy of a particular configuration encoded by an admissible binary vector, x, compactly (and efficiently) as
$$E(x) = u_0 + u^{\mathsf T} x + \tfrac{1}{2}\, x^{\mathsf T} U x. \qquad (1)$$
Thus, the total energy of a configuration of the system specified by x can be evaluated by a multidimensional quadratic form. If all of the values of u and U are calculated in advance, then a matrix–vector multiplication and two inner products are all that is required to evaluate the total energy for any arbitrary configuration of the system. Finding the optimal configuration of the system is now a matter of finding the smallest value of the quadratic form E over all binary vectors x satisfying the unary constraint; this optimization problem is called the “Unary Quadratic Optimization” problem.
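As an illustration, the following Python sketch evaluates the quadratic form, assuming Eq. (1) has the form E(x) = u0 + uᵀx + ½ xᵀUx (the ½ offsets the fact that a symmetric U counts each inter-group pair twice, just as u0 = f(P,P)/2 does for P). Plain lists are used instead of a matrix library, and `energy` is a hypothetical name.

```python
def energy(u0, u, U, x):
    """E(x) = u0 + u.x + 0.5 * x.U.x (assumed form of Eq. (1)).

    U is symmetric with zeros between states of the same group, so the 0.5
    turns the double-counted x.U.x into one term per interacting pair.
    """
    n = len(x)
    Ux = [sum(U[i][j] * x[j] for j in range(n)) for i in range(n)]  # matrix-vector product
    linear = sum(ui * xi for ui, xi in zip(u, x))                     # first inner product
    quadratic = sum(xi * v for xi, v in zip(x, Ux))                   # second inner product
    return u0 + linear + 0.5 * quadratic


# Two groups with two states each: S = [A11, A12, A21, A22].
U = [[0.0, 0.0, 1.0, -2.0],
     [0.0, 0.0, 0.5, 0.0],
     [1.0, 0.5, 0.0, 0.0],
     [-2.0, 0.0, 0.0, 0.0]]
u = [0.1, 0.2, 0.3, 0.4]
print(energy(5.0, u, U, [1, 0, 0, 1]))  # 5.0 + 0.5 + 0.5 * (-4.0) = 3.5
```

Precomputing u and U once makes each configuration cost a single matrix-vector product plus two inner products, which is what makes the later search over many configurations affordable.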
Postponing the details of the energy model, the algorithmic structure of Protonate3D is (a more detailed set of steps is given at the end of this section):
The addition of many (more than 20) water molecules (each with ∼500 orientations) becomes impractical. As a result, most of the water molecules are typically left out of the preceding steps and oriented afterward. This is done by orienting the waters one by one proceeding from the water in the strongest electrostatic field (of the protein and previously oriented waters) to the weakest. The selection of water molecules to include in the main calculation is left to the user—typically, water molecules near the sites of interest are treated in the main calculation.
The unary quadratic optimization algorithm used by Protonate3D proceeds as follows. First, a dead-end elimination [14] procedure is applied to eliminate states that cannot possibly be part of the optimal solution. This has the effect of reducing the dimensions of the U matrix and u vector of the quadratic energy function in a provably correct way. Suppose elements r and s of the list S belong to the same chemical group X. If
$$u_r + \sum_{Y \neq X} \min_{t \in Y} U_{rt} > u_s + \sum_{Y \neq X} \max_{t \in Y} U_{st},$$
where the sums extend over all chemical groups Y different from X, we can eliminate state r. The dead-end elimination criterion, when satisfied, eliminates r because, no matter what state assignment is made for the other groups, replacing r with state s of group X will result in a lower energy. This criterion is applied repeatedly until no more elimination is possible. Typically, the majority of the configurations are eliminated a priori, but it is still not practical to conduct a brute-force search over the remaining configurations.
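A minimal sketch of iterated dead-end elimination in Python follows, assuming the standard (Desmet-style) criterion in which state r is discarded when even its most favorable interactions are worse than the least favorable interactions of some rival state s in the same group. States are integer indices into u and U; `dead_end_eliminate` is a hypothetical name, and Protonate3D's actual implementation may differ.

```python
def _bound(r, u, U, others, pick):
    """u[r] plus, for each other group, pick() of r's pair energies with that group's states."""
    return u[r] + sum(pick(U[r][t] for t in group) for group in others)


def dead_end_eliminate(groups, u, U):
    """Iterated dead-end elimination (assumed Desmet-style criterion).

    groups: list of lists of state indices, one list per chemical group.
    State r is removed if some s in the same group satisfies
        u[r] + sum_Y min_t U[r][t]  >  u[s] + sum_Y max_t U[s][t],
    where Y runs over the surviving states of every other group.
    Returns the surviving groups.
    """
    alive = [list(g) for g in groups]
    changed = True
    while changed:
        changed = False
        for gi, g in enumerate(alive):
            others = [h for gj, h in enumerate(alive) if gj != gi]
            for r in list(g):
                if len(g) == 1:
                    break  # never empty a group
                best_r = _bound(r, u, U, others, min)
                if any(_bound(s, u, U, others, max) < best_r for s in g if s != r):
                    g.remove(r)
                    changed = True
    return alive


U = [[0.0, 0.0, 1.0, -1.0],
     [0.0, 0.0, 0.5, 0.5],
     [1.0, 0.5, 0.0, 0.0],
     [-1.0, 0.5, 0.0, 0.0]]
u = [0.0, 10.0, 0.0, 0.0]
print(dead_end_eliminate([[0, 1], [2, 3]], u, U))  # [[0], [3]]
```

Each elimination shrinks the remaining groups, which can tighten the bounds for other states; that is why the loop repeats until nothing changes, as the text describes.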
In an effort to speed up the subsequent state space search, a “Mean Field Theory” calculation is performed to produce a Boltzmann distribution over all of the remaining individual chemical group states. This results in an estimate of the probability of each state in the Boltzmann-weighted ensemble of configurations. Briefly, the state probabilities pk are determined by solving the nonlinear equation
$$p_k = \frac{\exp\left[-\beta\left(u_k + e_k^{\mathsf T} U p\right)\right]}{\sum_{j:\, I(j) = I(k)} \exp\left[-\beta\left(u_j + e_j^{\mathsf T} U p\right)\right]}, \qquad (3)$$
where p is the probability vector; U and u are as in Eq. (1); ek is a vector of all zeros and a single 1 at position k; and β = 1/kT. The nonlinear equation can be solved efficiently by successive feedback iteration. These probabilities, p, are the population probabilities of the individual states under the assumption that each state feels the Boltzmann-weighted average interactions of the other states. The vector p is used as a heuristic state priority in the subsequent search over states; the idea is to investigate high mean field probability states first under the assumption that they will lead to low energy configurations of the entire system (an approximate best-first search). The mean field probabilities, p, only affect the run-time of the state-space search and not its correctness; moreover, the energy of a system is evaluated using Eq. (1), which does not depend on p. The value of β must be chosen carefully to guarantee the uniqueness of p; in general, the solutions to Eq. (3) depend on the starting p vector. However, for certain values of β, the solution will be independent of the starting point (see the Appendix), and consequently p can be initialized with a uniform distribution on the states of each chemical group.
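The following Python sketch solves the mean field equations by successive substitution, assuming they take the group-normalized Boltzmann form pk ∝ exp[−β(uk + (Up)k)], normalized over the states of group I(k). Whether plain iteration converges depends on β, as the text notes; the iteration cap and tolerance are arbitrary choices, and `mean_field` is a hypothetical name.

```python
import math


def mean_field(groups, u, U, beta, max_iter=1000, tol=1e-10):
    """Solve p_k = exp(-beta*(u_k + (U p)_k)) / sum_{j in group(k)} exp(-beta*(u_j + (U p)_j))
    by successive substitution, starting from a uniform distribution in each group.

    groups must partition range(len(u)). Convergence is not guaranteed for every beta.
    """
    n = len(u)
    p = [0.0] * n
    for g in groups:
        for k in g:
            p[k] = 1.0 / len(g)
    for _ in range(max_iter):
        field = [u[k] + sum(U[k][j] * p[j] for j in range(n)) for k in range(n)]
        new = [0.0] * n
        for g in groups:
            lowest = min(field[k] for k in g)  # shift exponents for numerical stability
            w = [math.exp(-beta * (field[k] - lowest)) for k in g]
            z = sum(w)
            for k, wk in zip(g, w):
                new[k] = wk / z
        delta = max(abs(a - b) for a, b in zip(new, p))
        p = new
        if delta < tol:
            break
    return p


# With no coupling, each group reduces to an ordinary Boltzmann distribution.
p = mean_field([[0, 1]], [0.0, 1.0], [[0.0, 0.0], [0.0, 0.0]], beta=1.0)
print(p)  # approximately [0.731, 0.269]
```

Because U is zero within a group, (Up)k only collects the probability-weighted interactions with other groups, which is the "average interactions of the other states" described above.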
Finally, a recursive tree search is conducted over all admissible binary vectors, x, to locate the lowest energy state as calculated by Eq. (1) (which provides for rapid evaluation of energies). The performance of the search depends critically on the ability to prune the search space without loss of correctness. At any given point in the search, some of the elements of x, corresponding to some set of groups, G, will be assigned and others are yet to be assigned (with zero values). A lower bound, L(x), on the minimum energy of the system assuming the assigned part of x is
If this lower bound exceeds the best energy found thus far, then no further search of configurations containing the assigned part of x is required, thereby pruning the search tree and bypassing the examination of descendant configurations. During the recursive search, trial assignments to the unassigned portion of x are made in decreasing order of mean field probability. This greatly improves the pruning performance of the lower bound because it increases the likelihood of visiting the best configurations first. Moreover, premature termination of the search will produce the best solution with high probability.
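Below is a compact Python sketch of a depth-first branch-and-bound search of this kind. The lower bound used here (for each unassigned group, the cheapest state given the assigned states, plus half of its most favorable pair energy with every other unassigned group) is an assumption chosen because it is provably valid; the bound actually used by Protonate3D is not reproduced. The optional `priority` argument stands in for the mean field probabilities, and all names are hypothetical.

```python
import math


def branch_and_bound(groups, u, U, u0=0.0, priority=None):
    """Minimize E = u0 + sum_k u[k] + sum_{pairs} U[k][l] over one state per group.

    groups: list of lists of state indices. priority: optional per-state score
    (e.g. mean field probabilities); higher-scoring states are tried first.
    The lower bound is an illustrative valid bound, not Protonate3D's own.
    Returns (best_energy, chosen_states) with states listed in group order.
    """
    pr = priority if priority is not None else [0.0] * len(u)
    order = [sorted(g, key=lambda k: -pr[k]) for g in groups]
    best = [math.inf, None]

    def lower_bound(chosen, depth, e_fixed):
        rest = order[depth:]
        lb = e_fixed
        for gi, g in enumerate(rest):
            # Cheapest state of this group given the assigned states, plus half of
            # its most favorable pair energy with each other unassigned group.
            lb += min(u[k] + sum(U[k][c] for c in chosen)
                      + 0.5 * sum(min(U[k][l] for l in h)
                                  for gj, h in enumerate(rest) if gj != gi)
                      for k in g)
        return lb

    def search(chosen, depth, e_fixed):
        if depth == len(order):
            if e_fixed < best[0]:
                best[0], best[1] = e_fixed, list(chosen)
            return
        if lower_bound(chosen, depth, e_fixed) >= best[0]:
            return  # prune: no completion can beat the incumbent
        for k in order[depth]:
            e_new = e_fixed + u[k] + sum(U[k][c] for c in chosen)
            chosen.append(k)
            search(chosen, depth + 1, e_new)
            chosen.pop()

    search([], 0, u0)
    return best[0], best[1]


U = [[0.0, 0.0, 1.0, -2.0],
     [0.0, 0.0, 0.5, 0.0],
     [1.0, 0.5, 0.0, 0.0],
     [-2.0, 0.0, 0.0, 0.0]]
u = [0.1, 0.2, 0.3, 0.4]
print(branch_and_bound([[0, 1], [2, 3]], u, U, u0=5.0))  # lowest-energy selection: states 0 and 3
```

Because the bound never overestimates, the result matches exhaustive enumeration; the priority order only changes how quickly good solutions are found, mirroring the point made above about the mean field probabilities.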
The pseudocode for the recursive tree search procedure is given in Figure 2.
We now turn to the energy model for the macromolecular system. We will use an energy model that contains van der Waals repulsion, Coulomb electrostatic, and Generalized Born implicit solvation energies. Use of the Poisson-Boltzmann Equation (PBE) was not attempted because it was felt that the run-time would be prohibitively long for large systems, requiring at least one PBE solution per state. The van der Waals and Coulomb functional forms are pairwise and fit neatly into the quadratic form of Eq. (1); however, the Generalized Born model is not a two-body potential, and certain approximations will be used to reformulate it as an effective two-body potential. In addition, because the number of particles may change upon ionizing a chemical group, we must introduce free energy terms related to group titration (because potential energies cannot be compared for systems with different numbers of particles).
Each atom of the system, whether in the known part, P, or in one of the group states {Aij}, has an associated van der Waals radius and van der Waals well depth, as well as a partial charge. The van der Waals parameters and partial charges are permitted to depend on the particular tautomer, rotamer, or ionization state of each chemical group. In the interests of efficiency, we impose the requirement that the van der Waals parameters and partial charge assignments of one chemical group do not depend on the particular state selection of another chemical group. In particular, we require that the partial charge model be a nonpolarizable charge model (see the titration theory, later). The system is decomposed along apolar bonds to reduce the potential adverse impact of these independence requirements.
Protonate3D uses a slightly modified version of MMFF94 [19] partial charges because (a) the MMFF94 charge model is based on fixed (topological) bond charge increments; (b) the chemical contexts for atom types in MMFF94 do not cross sp3 carbon atoms; (c) the bond charge increment between two bonded sp3 carbons is zero (a purely apolar bond); and (d) MMFF94 supports general organic compounds. The modification is that the usual zero bond charge increment between alkane hydrogens and carbons was replaced with a bond charge increment of 0.08 electrons, in better agreement with protein force field partial charges such as AMBER [20]. Protonate3D uses Engh–Huber [21] van der Waals parameters; however, hydrogens on oxygen and nitrogen are taken to have zero van der Waals radius, consistent with OPLS-AA. Coulomb's law is used for electrostatic interactions, and a special form of van der Waals interaction is used: only the repulsive part of the van der Waals interaction energy is modeled (although the standard Lennard-Jones functions with the attractive term are not precluded). The special functional form is $800\,\varepsilon_{ij}\,(1 - r/R_{ij})^3$ for $r < R_{ij}$ (and zero otherwise), where r is the interatomic separation, Rij is the sum of the van der Waals radii, and εij is the geometric mean of the van der Waals well depth parameters of the two interacting atoms. Because of the factor of 800, derived from a series expansion, this functional form lies between the 12-6 and 9-6 Lennard-Jones functions at distances below the optimal interaction distance and approximates the 12-6 form well near the energy minimum. Because the OPLS-AA van der Waals parameters for polar hydrogen atoms are zero, the van der Waals terms are used by Protonate3D to handle side-chain “flip” states; the special form was used largely to mimic the sphere overlap test of Reduce [7]. The elements of the U matrix and u vector are populated by a straightforward application of the pairwise formulae given previously.
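The repulsive term above translates directly into code. In this Python sketch (the function name and argument order are illustrative), Rij is the sum of the radii and εij the geometric mean of the well depths; a zero well depth, as for the OPLS-AA polar hydrogens, makes the geometric mean, and hence the term, vanish.

```python
import math


def vdw_repulsion(r, radius_i, radius_j, eps_i, eps_j):
    """Repulsion-only van der Waals term 800 * eps_ij * (1 - r/R_ij)**3 for r < R_ij, else 0.

    R_ij is the sum of the two radii; eps_ij is the geometric mean of the well depths.
    """
    R = radius_i + radius_j
    if r >= R:
        return 0.0  # no sphere overlap, no energy (attraction is not modeled)
    eps = math.sqrt(eps_i * eps_j)
    return 800.0 * eps * (1.0 - r / R) ** 3


print(vdw_repulsion(1.0, 1.0, 1.0, 0.25, 1.0))  # 50.0 (R = 2, eps = 0.5, overlap factor 0.125)
print(vdw_repulsion(2.5, 1.0, 1.0, 1.0, 1.0))   # 0.0 (separation beyond R)
```

The cutoff at r = Rij gives a pure overlap penalty, which is what lets the term act like the sphere overlap test mentioned above.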
Protonate3D uses a modified version of the Generalized Born/Volume Integral (GB/VI) formalism22 (link) for implicit solvent electrostatics (although other Generalized Born models are not precluded):
In this equation, ε is the dielectric constant of the interior of a solute, εsol is the dielectric constant of the solvent, {γi} are (topological) atom-type-dependent constants that account for nonpolar energies including cavitation and dispersion using an inverse sixth-power integral instead of surface area, {Ri} are (topological) atom-type-dependent solvation radii, κ is the Debye ionic screening parameter that depends on salt concentration, {qi} are the atomic partial charges, {Bi} are the Born self-energies (inversely proportional to the Born radii), which are estimated with a pairwise sphere approximation [23] to the solute cavity, and rij denotes the distance between atoms i and j. Were it not for the {Bi}, the GB/VI equations would be a pairwise potential; however, because the Bi of a particular atom i depends on the state assignment of atoms in other chemical groups with possibly unknown state, we must calculate a set of {Bi} that (a) remain fixed regardless of the protonation state of other groups and (b) reasonably preserve the GB/VI energy values.
Consider an atom k in the system (whether in P or in some state Aij). The contribution to Bk from each of the other atoms in the system falls off as the inverse sixth power of distance in the integrand of Eq. (7). Thus, atoms far away from k will contribute little, regardless of whether they are in some other group with unknown state. The various states in the system differ only in the position or absence of hydrogen atoms, which contribute relatively little to the volume integral (because of their small solvation radius); thus, the bulk of the states' contribution (from the heavy atoms) will be accurate no matter which state is selected. In any event, the approximation to the volume integral in the GB/VI is a pairwise summation, over atoms, of a specific function V [22]. To minimize the impact of the hydrogen positions of the unknown states, Protonate3D uses a separate mean field approximation to the volume integrals. A separate U matrix and u vector are created containing only the van der Waals repulsion terms, the states' internal strain energies, and the pH-dependent isolated group titration energies (see later). The mean field equation of Eq. (3) is then solved with these separate quantities to produce a set of state probabilities p. Each atom in each group state is assigned the probability of its chemical group state, and each atom of the known part P is assigned probability 1. The Born factors are then calculated with each atom's pairwise contribution weighted by its probability, resulting in a mean field approximation to the Born factors that takes steric, rotamer/tautomer, and isolated group pKa free energies into account. This approximation works very well in practice; indeed, one can argue that it is in some sense superior to the original in that it takes some protonation state flexibility into account. It should be noted that some GB implicit solvent models do not include hydrogens in the volume integration [24]; consequently, we believe that our calculation of the mean field Born factors is eminently reasonable.
In this way, we approximate the three-body GB/VI model with a close pairwise model more suited for the quadratic form of Eq. (1).
It remains to deal with the pH-dependent free energy of ionization of the chemical groups that must be included in the calculation. Consider the free energy, a, of the reaction PAH → PA + H+, where AH is an acidic group bound (possibly covalently) to a macromolecule P. Our approach is to introduce a thermodynamic cycle linking the reaction to the isolated group reaction AH → A + H+, whose free energy will be assumed to be known. In the covalent case, we consider the thermodynamic cycle in which a = b + c + d. If the pKa of the reaction HAH → HA + H+ is known (say from experiment), then for a given pH, we have that c = −kT (log 10) (pH−pKa), where k is Boltzmann's constant and T is the temperature of the system. Because the (vertical) reaction equation H2 + PAH → PH + HAH is balanced and, by construction, E = ECOUL + ESOL is the free energy of charging and solvating the system, we may simply write
The case of a noncovalently bound group AH near a macromolecule P is simpler in that the H2 molecule is not required to balance the equation and, in this case,
We shall deal with the noncovalent case first, because it is simpler and provides insight into the covalent case. The noncovalent d is similar to b resulting in and using the fact that E(A + B) = E(A) + E(B) we have that
The superscript iso is used to signify that the E is calculated for the isolated AH and A systems (i.e., calculated with Born factors derived from the isolated system, ignoring P). These iso superscripted quantities involve only a small number of atoms—the atoms of AH and A—and direct evaluations of E are used to calculate the required energy. The iso superscripted quantities are included directly in the u vector of Eq. (1) for the corresponding group state so that b + d is simply a difference of configuration energies.
With a similar line of reasoning as in the noncovalent case, we find that as a result of cancellations of E(PH) and E(H2), for the covalent case and, as before, the iso superscripted quantities can be calculated directly (because few atoms are involved) and included in the u vector. In practice, the distinction between covalent and noncovalent groups makes only a small difference—on the order of 0.5 kcal/mol (∼2% error) for ionic species. A small correction to the experimental isolated pKa values for covalently bound species can account for most of this difference. In any event, the static nature of the entire calculation and the approximations inherent in a Generalized Born model will in all likelihood overshadow any lack of distinction between the cases.
The free energy c = −kT (log 10) (pH − pKa) remains to be included in Eq. (1). Consider a polyprotic species AHn with pKa values pKi, corresponding to AHi → AHi−1. The free energy of the reaction AHi → AHi−1 + H+ is then ΔGi = −kT (log 10) (pH − pKi). If we assign we will have that ΔGi = Gi − Gi−1; thus, we can incorporate the Gi values into the relevant u vector entries for each acidic chemical group state with i titratable protons. The reasoning for the b and d quantities generalizes to polyprotic species and multiple-site titration straightforwardly, because of the overall pairwise nature of the energy terms that make up the effective configuration energy.
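The pH-dependent terms are simple to evaluate. This Python sketch assumes energies in kcal/mol, so k is taken as the molar Boltzmann constant (the gas constant, 0.0019872 kcal/(mol·K)), and T defaults to 298.15 K; both values are assumptions, not taken from the text. For a polyprotic group it returns the stepwise ΔGi for each pKi; the pK values in the example are hypothetical.

```python
import math

R_KCAL = 0.0019872  # molar Boltzmann constant (gas constant), kcal/(mol*K); assumed units


def titration_free_energy(pH, pKa, T=298.15):
    """c = -kT ln(10) (pH - pKa) in kcal/mol; negative when pH > pKa (deprotonation favored)."""
    return -R_KCAL * T * math.log(10) * (pH - pKa)


def stepwise_free_energies(pH, pKs, T=298.15):
    """Delta G_i = -kT ln(10) (pH - pK_i) for each step AH_i -> AH_(i-1) + H+."""
    return [titration_free_energy(pH, pK, T) for pK in pKs]


print(titration_free_energy(6.0, 7.0))          # about +1.36 kcal/mol (one pH unit below pKa)
print(stepwise_free_energies(7.0, [4.0, 9.0]))  # approximately [-4.09, 2.73]
```

At room temperature each pH unit away from the pKa shifts the titration free energy by roughly 1.36 kcal/mol.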
We now summarize the Protonate3D procedure:
This brings to a close the exposition of the Protonate3D methodology. Protonate3D was implemented in the Scientific Vector Language of the Molecular Operating Environment [25], version 2006.08. Computational experiments were conducted on a 2 GHz Pentium IV processor running Microsoft Windows.
Publication 2008
Given a DNA/RNA/protein sequence S expressed as
$$S = R_1 R_2 R_3 \cdots R_L$$
where R1 denotes the first residue of S, R2 the second, and so forth, and L is the length of the sequence. To process such a sequence with existing machine-learning algorithms such as SVM (support vector machine) and NN (neural network), it must first be converted into a fixed-dimension vector containing its key features, the so-called feature vector. However, this is by no means an easy job, because different biological sequences may have different lengths, with a huge number of possible sequence patterns.
Here, we propose a web server, called Pse-in-One, with which users can generate all the possible pseudo components for DNA, RNA and protein sequences. It covers a total of 28 different modes: 14 for DNA sequences (5–7, 9–10, 14–17), six for RNA sequences (8, 18), and eight for protein sequences (2–3, 16, 19–21).
Pse-in-One contains three sub web servers: (1) PseDAC-General, (2) PseRAC-General and (3) PseAAC-General. Each of them contains three categories. The first category generates the pseudo components for the short-range or local sequence order information by counting the occurrence frequencies of k neighboring residues along the sequence S. The second and third categories generate the pseudo components for the long-range or global sequence order information by counting, respectively, the auto and special correlations of residues along the sequence, as shown in Figure 1. Because of space limitations, only a brief introduction is given below; for more details about the three sub web servers, see Supplementary Description S1.
PseDAC-General is the abbreviation for ‘pseudo deoxyribonucleic acid compositions for DNA sequences’. It contains 14 different modes to generate various feature vectors for DNA sequences, and can be grouped into the following three categories (Table 1).
The first category is nucleic acid composition, which contains two modes: basic kmer (Kmer) (15) and reverse complementary kmer (RevKmer) (17, 22). A kmer is a subsequence of a DNA sequence containing k neighboring nucleic acids. The reverse complementary kmer is a variant of the kmer in which the kmers are not expected to be strand-specific, so reverse complements are collapsed into a single feature. Therefore, both Kmer and RevKmer can represent the local DNA sequence composition.
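A minimal Python sketch of the two composition modes follows. It assumes an uppercase ACGT sequence and normalizes counts by the number of k-mer positions; Pse-in-One's exact normalization and feature ordering may differ, and the function names are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s):
    """Reverse complement of a DNA string."""
    return s.translate(COMPLEMENT)[::-1]


def kmer_vector(seq, k):
    """Normalized occurrence frequencies of all 4**k DNA k-mers (lexicographic order)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        counts[seq[i:i + k]] += 1
    total = max(1, len(seq) - k + 1)
    return [counts[m] / total for m in kmers]


def revkmer_vector(seq, k):
    """Like kmer_vector, but each k-mer and its reverse complement share one feature."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    canon = sorted({min(m, revcomp(m)) for m in kmers})
    counts = dict.fromkeys(canon, 0)
    for i in range(len(seq) - k + 1):
        m = seq[i:i + k]
        counts[min(m, revcomp(m))] += 1
    total = max(1, len(seq) - k + 1)
    return [counts[m] / total for m in canon]


print(len(kmer_vector("ACGT", 2)))     # 16 features; AC, CG and GT each have frequency 1/3
print(len(revkmer_vector("ACGT", 2)))  # 10 features; AC and GT share one feature (2/3)
```

For k = 2, collapsing reverse complements reduces 16 features to 10, because the four palindromic dinucleotides (AT, CG, GC, TA) map to themselves.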
The second category is autocorrelation, which contains six modes reflecting different ways of correlating two dinucleotides or trinucleotides along a DNA sequence via their physicochemical properties. Of the six modes, three (DAC, DCC and DACC) are based on the 148 physicochemical indices of dinucleotides extracted from (8, 14), and the other three (TAC, TCC and TACC) are based on the 12 physicochemical indices of trinucleotides extracted from (8).
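For the autocorrelation modes, a commonly used form of dinucleotide auto covariance (DAC) at a given lag averages the products of property deviations from the sequence mean. The Python sketch below assumes that form and a caller-supplied dictionary of (normalized) dinucleotide property values; it is not taken from the Pse-in-One source, and its details may differ.

```python
def dac(seq, prop, lag):
    """Dinucleotide auto covariance for one property at one lag (assumed common form).

    prop maps each dinucleotide occurring in seq to a property value.
    """
    vals = [prop[seq[i:i + 2]] for i in range(len(seq) - 1)]  # L - 1 dinucleotides
    n = len(vals) - lag                                       # L - lag - 1 pairs
    if n <= 0:
        raise ValueError("lag too large for this sequence")
    mean = sum(vals) / len(vals)
    return sum((vals[i] - mean) * (vals[i + lag] - mean) for i in range(n)) / n


# Hypothetical property values: AA -> 0.0, AC -> 3.0.
print(dac("AAAC", {"AA": 0.0, "AC": 3.0}, 1))  # -0.5
```

Cross covariance (DCC) follows the same pattern with two different properties in the product; repeating the calculation over several lags and properties yields the feature vector.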
The third category is pseudo nucleotide composition, which contains six modes that incorporate the global or long-range sequence order information into the feature vectors via the physicochemical properties of dinucleotides or trinucleotides. Of the six modes, PseDNC is based on six local DNA structural properties of dinucleotides; PseKNC extends PseDNC so that it can incorporate k-tuple nucleotides as well; PC-PseDNC-General and SC-PseDNC-General are two general modes based on the properties of dinucleotides, with which users can generate parallel correlation components and series correlation components, respectively, by selecting properties from the 148 built-in indices or by uploading properties they define themselves; PC-PseTNC-General and SC-PseTNC-General are two further general modes based on the properties of trinucleotides, with 12 built-in indices, which work in the same way as PC-PseDNC-General and SC-PseDNC-General, respectively.
PseRAC-General is the abbreviation for ‘pseudo ribonucleic acid compositions for RNA sequences’. It contains six different modes to generate various feature vectors for RNA sequences, and can be grouped into the following three categories (Table 2).
The first category is of basic kmer, where the occurrence frequencies of k neighboring nucleic acids (kmers) are used to reflect the short-range or local sequence compositions of RNA.
The second category is autocorrelation, which contains three modes reflecting the correlation between two dinucleotides along an RNA sequence in terms of their physicochemical properties: DAC, DCC, and DACC, the last of which combines DAC and DCC. With each of these modes, users can generate RNA feature vectors by selecting from 22 built-in properties (8, 14) or by supplying properties of their own.
The third category is pseudo nucleotide composition, which contains two modes: PC-PseDNC-General and SC-PseDNC-General. The former generates parallel correlation (8) components for RNA sequences from properties selected among 22 built-in physicochemical indices (8, 14) or from user-defined properties, while the latter generates the corresponding series correlation (8) components in the same manner.
PseAAC-General is the abbreviation for ‘pseudo amino acid composition for protein sequences’. It contains eight different modes to generate various feature vectors for protein sequences, and can be grouped into the following three categories (Table 3).
The first category is of basic kmer, where the occurrence frequencies of k neighboring amino acids (kmers) are used to reflect the short-range or local sequence compositions of protein.
The second category is autocorrelation, which contains three modes reflecting three different ways of computing correlations along a protein chain via the 547 amino acid physicochemical properties extracted from AAindex (19). The first is auto covariance (AC), which captures the correlation of the same property between two amino acids; the second is cross covariance (CC), which captures the correlation between different properties of two amino acids; and the third is auto-cross covariance (ACC), a combination of AC and CC. In addition, all three modes can generate feature vectors from user-defined properties.
The third category is pseudo amino acid composition, which incorporates the global or long-range sequence-order information of protein sequences into their feature vectors via the physicochemical properties of their constituent amino acids. It contains four modes: PC-PseAAC, SC-PseAAC, PC-PseAAC-General, and SC-PseAAC-General. The first and second modes generate protein feature vectors by combining the amino acid composition with global sequence-order effects via parallel correlation (2) and series correlation (3), respectively. The third and fourth modes are the general forms of PC-PseAAC and SC-PseAAC: besides the aforementioned 547 physicochemical properties, they can also incorporate higher-level information such as functional domains (FunD), gene ontology (GO), and sequential evolution (4,23), as well as any user-defined properties.
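As a sketch of the parallel-correlation idea behind PC-PseAAC, assuming Chou's commonly cited type-I formulation: properties are standardized over the 20 amino acids, the correlation factor for lag λ is the mean squared property difference between residues λ apart, and a weight w (0.05 here, an assumed default) scales the sequence-order terms. The property scale below is a hypothetical placeholder, not one of the 547 AAindex indices:

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def standardize(prop):
    """Rescale a property to zero mean and unit (population) std over the 20 amino acids."""
    vals = [prop[aa] for aa in AMINO_ACIDS]
    mu, sd = mean(vals), pstdev(vals)
    return {aa: (prop[aa] - mu) / sd for aa in AMINO_ACIDS}

def pc_pseaac(seq, props, lam, w=0.05):
    """Parallel-correlation PseAAC (assumed type-I form): 20 composition terms
    followed by `lam` sequence-order terms; requires lam < len(seq)."""
    scales = [standardize(p) for p in props]

    def theta(a, b):
        # Mean squared difference of the standardized properties of residues a and b.
        return sum((s[b] - s[a]) ** 2 for s in scales) / len(scales)

    # One correlation factor per lag: average theta over all residue pairs `lag` apart.
    thetas = [sum(theta(seq[i], seq[i + lag]) for i in range(len(seq) - lag)) / (len(seq) - lag)
              for lag in range(1, lam + 1)]
    freqs = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]

# Hypothetical placeholder scale (NOT an AAindex entry): 0, 1, ..., 19 in alphabetical order.
toy_scale = {aa: float(i) for i, aa in enumerate(AMINO_ACIDS)}
vector = pc_pseaac("MKTAYIAKQRQISFVKSHFSRQ", [toy_scale], lam=3)
```

Because every term shares the same denominator, the 20 + λ components sum to 1; the series-correlation variant would instead keep one factor per property and lag.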
Accordingly, with Pse-in-One, users only need to input DNA, RNA, or protein sequences along with their selected parameters. After clicking the Submit button, they immediately obtain the desired feature vectors, ready for most existing machine-learning algorithms to carry out a variety of analyses. The feature vectors can also be visualized on screen as a heat map, which helps users adjust their selected parameters. To the best of our knowledge, Pse-in-One is the first web server that can generate all the possible pseudo components for DNA, RNA, and protein sequences, including those based on properties defined by the users themselves. It is therefore flexible and broad in coverage, giving users many options for generating the pseudo components needed for in-depth analysis of DNA, RNA, or protein/peptide sequences.
Publication 2015
The immunogenicity model is built on the enrichment of amino acids in immunogenic versus non-immunogenic peptides and on the importance scores of the different positions of the MHC-I presented peptide (Table 2). For each MHC-I molecule, the impact of each position of the presented peptides on binding affinity was determined (as explained in [40]). The six positions with the least impact on binding affinity were defined as non-anchor positions; these six positions can differ between MHC-I molecules that use different anchor positions. Only non-anchor positions were used to study differences in immunogenicity, as anchor positions might reflect a difference in binding affinity rather than a difference in immunogenicity. Per amino acid, the enrichment is calculated as the ratio of the fraction of that amino acid in the immunogenic data set to its fraction in the non-immunogenic data set. For instance, Tyr occurs at a frequency of about 2.5% in immunogenic and about 1.5% in non-immunogenic peptides; its enrichment in immunogenic peptides is therefore 1.7-fold, and the natural logarithm of this enrichment is 0.54. We call this quantity the log enrichment score. To predict the immunogenicity of a new pMHC, the log enrichment score of each non-anchor residue of the presented peptide was looked up and weighted by the importance of that position (measured as the Kullback-Leibler divergence; see Table 2). The weighted log enrichment scores of all non-anchor residues were summed, and the resulting score was termed the immunogenicity score. The larger the immunogenicity score, the more the pMHC resembles the immunogenic peptides and the more likely it is to be immunogenic. The log enrichment scores of amino acids at anchor residues are masked, i.e., not used to derive the immunogenicity score.
These assumptions result in the following formula for the immunogenicity score $S$ of a peptide ligand $L$ presented on an HLA molecule $H$:

$$S(L, H) = \sum_{p} M(H, p)\, I_p\, E\big(A(L, p)\big)$$

where, for every position $p$ in the ligand $L$, the log enrichment score $E$ of the amino acid $A(L, p)$ at that position, weighted by the importance $I_p$ of that position (Table 2), is summed. Anchor positions of the HLA molecule are masked by setting $M(H, p)$ to 0; for all other positions $M(H, p) = 1$.
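The scoring rule can be illustrated with a short sketch. The names below are hypothetical: `importance` stands for the per-position importance weight (called I_p here) and `anchor_mask` for M(H, p). The 3-residue peptide and all numbers are toy values, not the published Table 2 or Table S1 parameters:

```python
import math

def log_enrichment(freq_immunogenic, freq_non_immunogenic):
    """Natural log of an amino acid's frequency ratio (immunogenic / non-immunogenic)."""
    return math.log(freq_immunogenic / freq_non_immunogenic)

def immunogenicity_score(peptide, log_enrich, importance, anchor_mask):
    """S(L, H) = sum over positions p of M(H, p) * I_p * E(A(L, p)).
    anchor_mask[p] plays the role of M(H, p): 0 at HLA anchor positions, 1 elsewhere."""
    if not len(peptide) == len(importance) == len(anchor_mask):
        raise ValueError("peptide, importance and anchor_mask must have equal length")
    return sum(m * w * log_enrich[aa]
               for aa, w, m in zip(peptide, importance, anchor_mask))

# Hypothetical toy values; real weights would come from Table 2 / Supplemental Table S1.
E = {"Y": 0.5, "K": -0.2, "L": 1.0}
score = immunogenicity_score("YKL", E, importance=[0.1, 0.2, 0.3], anchor_mask=[0, 1, 1])
```

Masking is done by multiplication rather than by dropping positions, so the same weight vector can be reused across HLA molecules with different anchors.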
The immunogenicity score model was tested in a 3-fold cross-validation experiment, in which a random two-thirds of the data was used to calculate the log enrichment scores. These log enrichment scores, together with the position importance weights (Table 2), were then used to construct the immunogenicity score model as described above, and the remaining one-third of the data was used to test its performance. In total, 25 cross-validations were performed. Our final immunogenicity score model, which is used throughout this paper, is based on all non-redundant HLA class I presented peptides found in HLA-transgenic mice. Because the selected non-redundant set of peptides varies slightly between selections (explained above), the final model was constructed by repeating the non-redundancy selection and model building 100 times and averaging the log enrichment scores per amino acid over these 100 models. The final log enrichment scores, position importance weights and explanations of how the immunogenicity score model is constructed are given in Supplemental Table S1.
Publication 2013
Peak lists (38 058 spectra) were searched with Mascot 2.2 using the following parameters: enzyme = trypsin (allowing cleavage before proline (27)); maximum missed cleavages = 2; variable modifications = carbamidomethylation of cysteine, oxidation of methionine; product mass tolerance = 0.5 Da. The International Protein Index (IPI) database version 337 (Mus musculus) was used as the protein sequence database. Common external contaminants from cRAP (a maintained list of contaminants, laboratory proteins and protein standards provided through the Global Proteome Machine Organisation, http://www.thegpm.org/crap/index.html) were appended. The combined database contained 51 355 sequences and 23 635 027 residues. For FDR assessment, a separate decoy database was generated from the protein sequence database using the decoy.pl Perl script provided by Matrix Science. This script randomizes each entry but retains the average amino acid composition and length of the entries.
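The target-decoy idea can be sketched as below. This does not reproduce decoy.pl: the sketch shuffles each sequence, which keeps each entry's exact composition and length, and it assumes the common convention for a separate decoy search of estimating FDR as the number of decoy hits divided by the number of target hits above a score threshold. Both function names are hypothetical:

```python
import random

def shuffled_decoy(sequence, rng):
    """Decoy entry with exactly the same length and amino acid composition."""
    residues = list(sequence)
    rng.shuffle(residues)
    return "".join(residues)

def estimate_fdr(target_scores, decoy_scores, threshold):
    """Assumed separate-search estimate: decoy PSMs / target PSMs scoring above threshold."""
    n_target = sum(s > threshold for s in target_scores)
    n_decoy = sum(s > threshold for s in decoy_scores)
    return n_decoy / n_target if n_target else 0.0

rng = random.Random(0)  # fixed seed so the decoy database is reproducible
decoys = [shuffled_decoy(seq, rng) for seq in ["MKTAYIAKQR", "PEPTIDEK"]]
fdr = estimate_fdr([40, 35, 20, 50], [31, 10, 5], threshold=30)  # 1 decoy / 3 targets
```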
The data were searched at 100 ppm peptide mass tolerance to evaluate the mass accuracy of the data set. After correction for a systematic mass deviation of 3 ppm (25), 90% and 99% of all PSMs with a Mascot score greater than 30 fell within a ±5 and ±20 ppm mass window, respectively. For the most stringent mass tolerance setting, where Mascot thresholds are most sensitive, the data were searched at 20 ppm. In addition, the data were searched at 500 ppm peptide mass tolerance to enable mass accuracy filtering combined with the adjusted MHT (Adjusted Mascot Threshold, AMT (25)). The mass deviation filter was set to 5 ppm, which was shown to be the most effective filter setting in combination with the AMT (Supporting Information Figure 1).
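The mass-accuracy filter reduces to a parts-per-million calculation. A minimal sketch, assuming the 3 ppm systematic deviation is a positive offset to be subtracted (the sign is not stated) and using the 5 ppm window from the text; the function names are illustrative:

```python
def ppm_error(observed_mass, theoretical_mass):
    """Mass deviation in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6

def passes_mass_filter(observed_mass, theoretical_mass, systematic_ppm=3.0, window_ppm=5.0):
    """Correct for an assumed systematic offset, then keep PSMs within +/- window_ppm."""
    corrected = ppm_error(observed_mass, theoretical_mass) - systematic_ppm
    return abs(corrected) <= window_ppm

# A 1000 Da peptide measured at 1000.007 Da: 7 ppm raw, 4 ppm after correction, so it passes.
keep = passes_mass_filter(1000.007, 1000.0)
```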
Publication 2009
Let $i = (i_1, i_2, \ldots, i_w)$ denote a window of amino acids extracted from a protein sequence, and let $j = 1, \ldots, w$ index the positions in this window. Based on $i$, the hidden Markov model predicts whether the center position of the window is annotated as part of an epitope. At the N- and C-termini, part of an extracted window extends beyond the end of the sequence; these positions are filled with the character 'X', which is ignored when the hidden Markov model makes predictions. The prediction score for a window is given by

$$S(i) = \sum_{j=1}^{w} \log \frac{p_{i_j, j}}{q_{i_j}} \qquad (1)$$

(positions filled with 'X' are omitted from the sum), which is the log odds that the residue at the center position of the window is part of an epitope (Epitope model) as opposed to occurring by chance (Random model).
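Window extraction and the log-odds score of (1) can be sketched as follows. The natural logarithm is an assumption (the base is not stated), and the three-letter probability tables are hypothetical toy values rather than trained model parameters:

```python
import math

def extract_windows(seq, w):
    """One window of odd width w centered on each residue; 'X' pads past the termini."""
    half = w // 2
    padded = "X" * half + seq + "X" * half
    return [padded[c:c + w] for c in range(len(seq))]

def window_score(window, p, q):
    """Log-odds score: sum over positions j of log(p[j][aa] / q[aa]).
    p[j] maps residues to the Epitope-model probability at position j,
    q maps residues to background (Random-model) frequencies; 'X' is skipped."""
    return sum(math.log(p[j][aa] / q[aa])
               for j, aa in enumerate(window) if aa != "X")

# Hypothetical toy model with w = 3 over a three-letter alphabet.
q = {"A": 0.5, "C": 0.25, "D": 0.25}
p = [dict(q), {"A": 0.25, "C": 0.5, "D": 0.25}, dict(q)]
scores = [window_score(win, p, q) for win in extract_windows("ACD", 3)]
```

Skipping 'X' rather than assigning it a probability means terminal windows are simply scored on fewer positions.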
To construct the Random model, the background amino acid frequencies $q_i$ of the Swiss-Prot database [23] are used. For the Epitope model, $p_{i,j}$ is the effective probability of amino acid $i$ at position $j$ according to the model.
To calculate $p_{i,j}$, all windows whose center position is annotated as part of an epitope are extracted from a training data set. As before, positions of an extracted window that extend beyond the N- or C-terminus are filled with the character 'X', which is ignored when calculating the parameters.
These extracted peptide windows form an alignment of width $w$. From this alignment, $p_{i,j}$ is calculated as the pseudo-count-corrected probability of occurrence of amino acid $i$ in column $j$, estimated as in [24]. For the pseudo-count correction, pseudo-count frequencies $g_{i,j}$ are calculated as

$$g_{i,j} = \sum_{k} b_{i,k}\, f_{k,j} \qquad (2)$$

where $f_{k,j}$ is the observed frequency of amino acid $k$ in column $j$ of the alignment [25]. The variable $b_{i,k}$ is the BLOSUM62 substitution matrix frequency, i.e., the frequency with which $i$ is aligned to $k$ [26].
To give an example of using (2), let the window size be $w = 1$. The model then covers only residues annotated as part of linear B-cell epitopes. If the observed peptides consist of the single amino acids L and V, with frequencies $f_{L,1} = 0.5$ and $f_{V,1} = 0.5$, then the pseudo-count frequency for, e.g., I is given by

$$g_{I,1} = b_{I,L}\, f_{L,1} + b_{I,V}\, f_{V,1} = 0.5\, b_{I,L} + 0.5\, b_{I,V}$$

The effective amino acid frequencies are calculated as a weighted average of the observed frequency and the pseudo-count frequency,

$$p_{i,j} = \frac{\alpha\, f_{i,j} + \beta\, g_{i,j}}{\alpha + \beta}$$

Here, $\alpha$ is the effective number of sequences in the alignment minus 1, and $\beta$ is the pseudo-count correction [25], also called the weight on low counts. To finish the calculation example, let $\beta$ be very large, as it is in this work. Then $p_{I,1} \approx g_{I,1} = 0.14$.
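The pseudo-count correction for one alignment column can be sketched in a few lines. The three-letter substitution table `b` is hypothetical (each column sums to 1, but the values are not BLOSUM62, so the result is 0.2 rather than the 0.14 quoted above), and α is set to the raw sequence count minus 1 instead of an effective count:

```python
from collections import Counter

def observed_freqs(column, alphabet):
    """f_{k,j}: observed frequency of each residue in one alignment column ('X' ignored)."""
    counts = Counter(aa for aa in column if aa != "X")
    total = sum(counts.values())
    return {aa: (counts[aa] / total if total else 0.0) for aa in alphabet}

def pseudo_count_freqs(f, b):
    """g_{i,j} = sum_k b[i][k] * f_{k,j}, as in (2)."""
    return {i: sum(b[i][k] * f[k] for k in f) for i in b}

def effective_freqs(f, g, alpha, beta):
    """p_{i,j} = (alpha * f_{i,j} + beta * g_{i,j}) / (alpha + beta)."""
    return {i: (alpha * f[i] + beta * g[i]) / (alpha + beta) for i in f}

# Hypothetical substitution frequencies b[i][k]; each column k sums to 1.
b = {
    "L": {"L": 0.6, "V": 0.2, "I": 0.3},
    "V": {"L": 0.2, "V": 0.6, "I": 0.3},
    "I": {"L": 0.2, "V": 0.2, "I": 0.4},
}
f = observed_freqs(["L", "V"], alphabet="LVI")   # the L/V example with w = 1
g = pseudo_count_freqs(f, b)
p = effective_freqs(f, g, alpha=1, beta=1e6)     # alpha = 2 sequences - 1; very large beta
```

As in the worked example, a very large β makes p track g almost exactly, so I receives nonzero probability even though it was never observed.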
Note that we shall use the term hidden Markov model throughout this work to refer to the weight matrix generated using (1). The parameters of the ungapped Markov model are calculated using a so-called Gibbs sampler, written by Nielsen et al. [24].
The result of applying (1) is a prediction score for every residue of the query sequence. To reduce fluctuations, a smoothing window is applied at every position. The window is made asymmetric at the N- and C-termini so that predictions are retained for the terminal residues.
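The text does not give the smoothing width or scheme. A minimal sketch, assuming a plain moving average whose window is truncated on whichever side runs off the sequence, so that terminal residues keep a smoothed score; `half_width` is a hypothetical parameter:

```python
def smooth(scores, half_width):
    """Moving average of per-residue scores. Near the termini the window is truncated
    on the side that runs off the sequence (asymmetric), so every residue keeps a score."""
    n = len(scores)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed

smoothed = smooth([0.0, 3.0, 6.0, 9.0], half_width=1)  # first/last use 2 values, middle use 3
```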
Publication 2006

Most recent protocols related to «Peptide L»

In the training of the denoising network, we employed a composite loss scheme that includes the denoising score matching (DSM) loss $\mathcal{L}_{\text{dsm}}$, the peptide amino acid type loss $\mathcal{L}_{\text{aa}}$, and a series of auxiliary losses comprising the peptide backbone position loss $\mathcal{L}_{\text{bb}}$, the pairwise atomic distance loss $\mathcal{L}_{\text{dist}}$, the side chain torsion angle loss $\mathcal{L}_{\text{chi}}$, and the Cα atom clash loss $\mathcal{L}_{\text{clash}}$. The SE(3) DSM loss is given by:
where $i \in \{1, \ldots, N_{\text{lig}}\}$ and $t \sim \mathcal{U}[0, 1]$ in Eq. (24). We used the weight schedule of previous works [44, 46]:
The peptide amino acid type loss $\mathcal{L}_{\text{aa}}$ is calculated as a residue-averaged cross-entropy loss over all peptide residues. Among the auxiliary losses, the peptide backbone position loss is formulated as a mean squared error (MSE) loss on backbone atom positions:
where n ∈ {N, C, Cα, O}. The pairwise atomic distance loss is also an MSE loss, computed on pairwise atomic distances:
where $d_{\text{dist}} = 6$ Å, n, m ∈ {N, C, Cα, O}, and the normalizing factor $Z_{\text{dist}}$ is given by:
The side chain torsion angle loss is an MSE loss computed as in AlphaFold 2 [28]. We only considered the χ1 and χ2 angles because χ3 and χ4 are relatively less informative and more challenging to predict accurately [105]. We empirically found that this side chain torsion angle loss helps the model better capture inter-residue interactions. The Cα atom clash loss $\mathcal{L}_{\text{clash}}$ is defined as:
where we set the clash threshold $d_{\text{clash}} = 4$ Å, and the normalizing factor is $Z_{\text{clash}} = N_{\text{rec}} \times N_{\text{lig}}$. The Cα atom clash loss is used to minimize steric clashes between the target and the generated peptide ligand. The full training loss is formulated as:
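The functional form of the clash loss did not survive extraction. The sketch below only illustrates how the stated threshold (4 Å) and normalizer Z_clash = N_rec × N_lig could fit together, assuming a simple hinge penalty max(0, d_clash - d) on every receptor/peptide Cα pair. It is an illustrative guess, not the authors' definition, and uses plain Python rather than a deep-learning framework:

```python
import math

def ca_clash_loss(rec_ca, lig_ca, d_clash=4.0):
    """ASSUMED hinge form: penalize receptor/peptide C-alpha pairs closer than d_clash (Å),
    then normalize by Z_clash = N_rec * N_lig."""
    total = 0.0
    for r in rec_ca:
        for l in lig_ca:
            total += max(0.0, d_clash - math.dist(r, l))
    return total / (len(rec_ca) * len(lig_ca))

# One receptor C-alpha; one peptide C-alpha 3 Å away (penalty 1), one 10 Å away (penalty 0).
loss = ca_clash_loss([(0.0, 0.0, 0.0)], [(3.0, 0.0, 0.0), (10.0, 0.0, 0.0)])
```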
where we set the loss weights $w_1 = 2$, $w_2 = 0.25$, and $w_3 = 0.25$. We applied the amino acid type loss $\mathcal{L}_{\text{aa}}$ and the auxiliary losses primarily near t = 0 to encourage the model to learn fine-grained characteristics.
The denoising network consists of ∼104M parameters and was trained exclusively on the PepPC-F dataset using 8 NVIDIA A800 80GB GPUs. The training lasted ∼5 days. We employed the AdamW optimizer [106] with a learning rate of 1e-5. For multi-GPU training and inference, we used DistributedDataParallel implemented by PyTorch [107] .
Publication 2024
Given the amino acid sequences of an MHC molecule and a peptide, the task is a regression problem: predict the binding affinity between them. Here, the sequence of an MHC molecule of length $L$ is represented as $S_{\text{MHC}} = \{s_1, s_2, \ldots, s_L\}$, where each $s_i$ ($1 \le i \le L$) represents one of the 20 amino acids. Similarly, the sequence of a peptide of length $L'$ is represented as $S_{\text{pep}} = \{s_1, s_2, \ldots, s_{L'}\}$. Note that the sequence of an MHC molecule is usually simplified to a pseudo-sequence of length 34, i.e., a non-contiguous subsequence of the original sequence (Karosiene et al. 2013, You et al. 2022). This MHC pseudo-sequence extracts the amino acid residues considered essential for MHC–peptide binding (Karosiene et al. 2013) and consists of 15 residues in the alpha chain and 19 residues in the beta chain of the MHC molecule. In addition, since most peptide sequences are shorter than 20 residues, all peptide sequences are padded or truncated to a length of 20 to keep the input dimension consistent. Therefore, $L$ and $L'$ are 34 and 20, respectively.
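The fixed-length input preparation can be sketched as follows. The padding symbol 'X', right-padding, and integer ids (0 for padding, 1 to 20 for amino acids) are hypothetical choices; the text only states that peptides are padded or truncated to 20 residues and that the MHC pseudo-sequence has 34 residues:

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"              # hypothetical padding symbol
MHC_PSEUDO_LEN = 34    # L
PEPTIDE_LEN = 20       # L'

# Hypothetical integer ids: 0 for padding, 1..20 for the standard amino acids.
VOCAB = {PAD: 0, **{aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}}

def pad_or_truncate(seq, length=PEPTIDE_LEN):
    """Cut sequences longer than `length`; right-pad shorter ones with PAD."""
    return seq[:length].ljust(length, PAD)

def encode_pair(mhc_pseudo, peptide):
    """Return fixed-length integer encodings of an MHC pseudo-sequence and a peptide."""
    if len(mhc_pseudo) != MHC_PSEUDO_LEN:
        raise ValueError(f"expected a {MHC_PSEUDO_LEN}-residue pseudo-sequence")
    return ([VOCAB[aa] for aa in mhc_pseudo],
            [VOCAB[aa] for aa in pad_or_truncate(peptide)])

mhc_ids, pep_ids = encode_pair("A" * 34, "SIINFEKL")  # toy pseudo-sequence
```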
Publication 2024
For the synthesis of PEG-Peptide compounds, the Michael addition reaction according to Guo et al. (49) was used, whereby peptides were irreversibly bonded to PEG(Mal)4 moieties (Fig. 1E). Briefly, 1 g of PEG(Mal)4 was dissolved in 3 ml DPBS, followed by the addition of the desired amount of peptide dissolved in 4 ml DPBS under vigorous stirring and with exclusion of light. After 3 h, the reaction was stopped, and the solution was transferred into dialysis tubes (molecular weight cut-off: 6–8 kDa) and dialyzed for 24 h against 5 l of ultra-pure water with daily water changes. After dialysis, all compounds were frozen at −20 °C and then lyophilised. Table 1 lists the concentrations of Peptide A, Peptide B and Peptide C corresponding to a theoretical single-arm substitution of PEG(Mal)4, yielding PEG-Peptide A, PEG-Peptide B and PEG-Peptide C. Finally, all three peptides were bonded to a single PEG(Mal)4 compound by adding Peptide A, Peptide B and Peptide C at the same concentration, theoretically occupying 3 of the 4 arms of PEG(Mal)4.
Publication 2024

Protocol full text hidden due to copyright restrictions

Publication 2024
For the irreversible coupling of the peptides to ALG and ADA, the well-known carbodiimide reaction was used, as described by Rowley et al. (48). Peptides were covalently coupled to ALG or ADA by reacting their amine groups with the carboxylic acid groups of the polymer, which results in irreversible amide bond formation (Fig. 1B and C). In general, a monomer activation of 5% was used, which corresponds to a 1 : 20 ratio of EDC to ALG monomers. EDC·HCl was stabilized with Sulfo-NHS in a ratio of 2 : 1. ALG-Peptide A, ALG-Peptide B, ALG-Peptide C and ALG-Peptide ABC, with a theoretical degree of substitution of 0.25% per peptide, were synthesized using Peptide A (MW: 810.92 g mol−1), Peptide B (MW: 1105.22 g mol−1) and Peptide C (MW: 744.95 g mol−1), as shown in Table 1. For the preparation of ALG-Peptide ABC, each peptide was coupled separately to ALG by adding solutions containing Peptide A, Peptide B and Peptide C one after another to the ALG solution, aiming for a final peptide substitution degree of 0.75% (3 × 0.25% per peptide). Briefly, 1 g ALG was dissolved in 100 ml 0.1 M MES/0.3 M NaCl buffer at pH 6.5.
This buffer was prepared by adding 21.33 g MES monohydrate and 17.53 g NaCl to 1 l ultra-pure water. The pH was adjusted using a 1 M NaOH solution. To the dissolved ALG solution, 48.14 mg EDC·HCl and 24.07 mg Sulfo-NHS, each dissolved in 1 ml MES buffer, were added. The solution was stirred for 15 min. Then, the desired amount of peptide dissolved in 1 ml buffer solution was added (see Table 1), and the solution was stirred for a further 24 h. All products were then transferred into dialysis tubes (molecular weight cut-off: 6–8 kDa) and dialyzed for 3 days against 6 l of ultra-pure water with daily water changes. After dialysis, all compounds were frozen at −20 °C and then lyophilized. The same procedure was applied to ADA, which has a lower molecular weight than ALG because the sodium ions present in ALG were removed during the purification of ADA. Therefore, different quantities of the reactants were needed: for 1 g ADA, 54.42 mg EDC·HCl and 30.82 mg Sulfo-NHS were used to synthesize ADA-Peptide A, ADA-Peptide B, ADA-Peptide C and ADA-Peptide ABC with a theoretical degree of substitution of 0.25%, using the same method and Peptides A, B and C (see Table 1).
For ALG-Peptide ABC and ADA-Peptide ABC, all three peptides were used, each with a degree of substitution of 0.25%, giving a combined peptide substitution of 0.75%.
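The reagent amounts follow from simple stoichiometry: moles of repeat units, times the activation fraction for EDC·HCl, halved for Sulfo-NHS (2 : 1 molar). A sketch, assuming molar masses of 191.70 g/mol for EDC·HCl, 217.13 g/mol for Sulfo-NHS (sodium salt) and 176.12 g/mol for the ADA repeat unit; with these values it reproduces the ADA quantities stated above. The function names are hypothetical, and the peptide amounts are not checked against Table 1:

```python
MW_EDC_HCL = 191.70    # g/mol, EDC·HCl (assumed value)
MW_SULFO_NHS = 217.13  # g/mol, Sulfo-NHS sodium salt (assumed value)

def activation_reagents_mg(polymer_mg, monomer_mw, activation=0.05, edc_to_snhs=2.0):
    """EDC·HCl and Sulfo-NHS masses (mg) for a given fraction of activated monomers.
    Units: mg / (g/mol) = mmol, and mmol * (g/mol) = mg."""
    monomer_mmol = polymer_mg / monomer_mw
    edc_mmol = monomer_mmol * activation        # 5% activation = 1 EDC : 20 monomers
    snhs_mmol = edc_mmol / edc_to_snhs          # 2 : 1 molar ratio EDC : Sulfo-NHS
    return edc_mmol * MW_EDC_HCL, snhs_mmol * MW_SULFO_NHS

def peptide_mg(polymer_mg, monomer_mw, peptide_mw, substitution=0.0025):
    """Peptide mass (mg) for a theoretical degree of substitution (0.25% by default)."""
    return polymer_mg / monomer_mw * substitution * peptide_mw

edc, snhs = activation_reagents_mg(1000, monomer_mw=176.12)  # 1 g ADA, assumed repeat unit
pep_a = peptide_mg(1000, 176.12, peptide_mw=810.92)          # Peptide A at 0.25%
```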
Publication 2024

Top products related to «Peptide L»

Sourced in United States, China, United Kingdom, Germany, Australia, Japan, Canada, Italy, France, Switzerland, New Zealand, Brazil, Belgium, India, Spain, Israel, Austria, Poland, Ireland, Sweden, Macao, Netherlands, Denmark, Cameroon, Singapore, Portugal, Argentina, Holy See (Vatican City State), Morocco, Uruguay, Mexico, Thailand, Sao Tome and Principe, Hungary, Panama, Hong Kong, Norway, United Arab Emirates, Czechia, Russian Federation, Chile, Moldova, Republic of, Gabon, Palestine, State of, Saudi Arabia, Senegal
Fetal Bovine Serum (FBS) is a cell culture supplement derived from the blood of bovine fetuses. FBS provides a source of proteins, growth factors, and other components that support the growth and maintenance of various cell types in in vitro cell culture applications.
Sourced in United States, United Kingdom, Germany, France, Canada, Switzerland, Italy, Australia, Belgium, China, Japan, Austria, Spain, Brazil, Israel, Sweden, Ireland, Netherlands, Gabon, Macao, New Zealand, Holy See (Vatican City State), Portugal, Poland, Argentina, Colombia, India, Denmark, Singapore, Panama, Finland, Cameroon
L-glutamine is an amino acid that is commonly used as a dietary supplement and in cell culture media. It serves as a source of nitrogen and supports cellular growth and metabolism.
Sourced in United States, United Kingdom, Germany, China, France, Canada, Australia, Japan, Switzerland, Italy, Belgium, Israel, Austria, Spain, Netherlands, Poland, Brazil, Denmark, Argentina, Sweden, New Zealand, Ireland, India, Gabon, Macao, Portugal, Czechia, Singapore, Norway, Thailand, Uruguay, Moldova, Republic of, Finland, Panama
Streptomycin is a broad-spectrum antibiotic used in laboratory settings. It functions as a protein synthesis inhibitor, targeting the 30S subunit of bacterial ribosomes, which plays a crucial role in the translation of genetic information into proteins. Streptomycin is commonly used in microbiological research and applications that require selective inhibition of bacterial growth.
Sourced in United States, United Kingdom, Germany, China, France, Canada, Japan, Australia, Switzerland, Italy, Israel, Belgium, Austria, Spain, Brazil, Netherlands, Gabon, Denmark, Poland, Ireland, New Zealand, Sweden, Argentina, India, Macao, Uruguay, Portugal, Holy See (Vatican City State), Czechia, Singapore, Panama, Thailand, Moldova, Republic of, Finland, Morocco
Penicillin is a β-lactam antibiotic used in laboratory settings, mainly active against Gram-positive bacteria. It acts by inhibiting bacterial cell wall synthesis, leading to cell death.
Sourced in United States, Germany, United Kingdom, China, Canada, France, Japan, Australia, Switzerland, Israel, Italy, Belgium, Austria, Spain, Gabon, Ireland, New Zealand, Sweden, Netherlands, Denmark, Brazil, Macao, India, Singapore, Poland, Argentina, Cameroon, Uruguay, Morocco, Panama, Colombia, Holy See (Vatican City State), Hungary, Norway, Portugal, Mexico, Thailand, Palestine, State of, Finland, Moldova, Republic of, Jamaica, Czechia
Penicillin/streptomycin is a commonly used antibiotic solution for cell culture applications. It combines penicillin and streptomycin, which together inhibit the growth of both Gram-positive and Gram-negative bacteria.
Sourced in United States, China, United Kingdom, Germany, France, Australia, Canada, Japan, Italy, Switzerland, Belgium, Austria, Spain, Israel, New Zealand, Ireland, Denmark, India, Poland, Sweden, Argentina, Netherlands, Brazil, Macao, Singapore, Sao Tome and Principe, Cameroon, Hong Kong, Portugal, Morocco, Hungary, Finland, Puerto Rico, Holy See (Vatican City State), Gabon, Bulgaria, Norway, Jamaica
DMEM (Dulbecco's Modified Eagle's Medium) is a cell culture medium formulated to support the growth and maintenance of a variety of cell types, including mammalian cells. It provides essential nutrients, amino acids, vitamins, and other components necessary for cell proliferation and survival in an in vitro environment.
Sourced in United States, Austria, Canada, Belgium, United Kingdom, Germany, China, Japan, Poland, Israel, Switzerland, New Zealand, Australia, Spain, Sweden
Prism 8 is a data analysis and graphing software developed by GraphPad. It is designed for researchers to visualize, analyze, and present scientific data.
Sourced in United States, China, Japan, Germany, United Kingdom, Canada, France, Italy, Australia, Spain, Switzerland, Netherlands, Belgium, Lithuania, Denmark, Singapore, New Zealand, India, Brazil, Argentina, Sweden, Norway, Austria, Poland, Finland, Israel, Hong Kong, Cameroon, Sao Tome and Principe, Macao, Taiwan, Province of China, Thailand
TRIzol reagent is a monophasic solution of phenol, guanidine isothiocyanate, and other proprietary components designed for the isolation of total RNA, DNA, and proteins from a variety of biological samples. The reagent maintains the integrity of the RNA while disrupting cells and dissolving cell components.
Sourced in United States, Germany, United Kingdom, Israel, Canada, Austria, Belgium, Poland, Lao People's Democratic Republic, Japan, China, France, Brazil, New Zealand, Switzerland, Sweden, Australia
GraphPad Prism 5 is a data analysis and graphing software. It provides tools for data organization, statistical analysis, and visual representation of results.
Sourced in United States, China, Germany, United Kingdom, Japan, France, Canada, Australia, Italy, Switzerland, Belgium, New Zealand, Spain, Israel, Sweden, Denmark, Macao, Brazil, Ireland, India, Austria, Netherlands, Holy See (Vatican City State), Poland, Norway, Cameroon, Hong Kong, Morocco, Singapore, Thailand, Argentina, Taiwan, Province of China, Palestine, State of, Finland, Colombia, United Arab Emirates
RPMI 1640 medium is a commonly used cell culture medium developed at Roswell Park Memorial Institute. It contains inorganic salts, essential nutrients, vitamins, and amino acids that support the growth and maintenance of a variety of cell types in vitro.

More about "Peptide L"

Peptide L is a small peptide molecule that plays a crucial role in various biological processes.
It is also known as Peptide Ligand, Small Peptide, and Bio-active Peptide.
Peptide L is involved in signaling pathways, cell-cell communication, and regulation of physiological functions.
Understanding the mechanisms and applications of Peptide L is crucial for researchers working in fields such as cell biology, pharmacology, and drug development.
The AI-driven platform PubCompare.ai can greatly enhance the reproducibility and accuracy of your Peptide L research.
This innovative tool allows you to easily locate relevant protocols from scientific literature, pre-prints, and patents.
By leveraging the platform's advanced AI-driven comparisons, you can identify the most effective protocols and products for your Peptide L studies.
PubCompare.ai's powerful features can streamline your research process and improve your outcomes.
For example, you can use the platform to compare cell culture media and supplements, such as DMEM, RPMI 1640, FBS, and L-glutamine, used in Peptide L-related experiments.
Additionally, you can explore the effectiveness of various reagents, like Penicillin, Streptomycin, and TRIzol, in your Peptide L research workflows.
You can also compare data analysis and graphing software, such as GraphPad Prism 5 and Prism 8, used to analyze and present Peptide L findings.
This comprehensive platform empowers researchers to make more informed decisions, leading to more reproducible and accurate results in their Peptide L studies.