Typically, the number of designs that can be created in silico exceeds the number that can be produced and examined experimentally. We therefore used Rosetta to prune the list of designs by one of two methods. For designs consisting of canonical amino acids, Rosetta’s fragment-based ab initio algorithm32 was used to predict a design’s structure from its amino acid sequence, and to determine whether the target structure was a unique minimum in the conformational energy landscape. Disulfide bonds were not allowed to form during these simulations; the designed disulfide bonds are intended to stabilize the folded conformation rather than direct protein folding. Designs that incorporate short stretches of D-amino acids were also validated using Rosetta’s fragment-based ab initio algorithm; the amino acid sequences of the designs, with all D-amino acids mutated to glycine, were provided as input, and we allowed Rosetta to generate on the order of 30,000 predicted structures as output. Unlike the standard ab initio protocol, we did not use secondary structure predictions in fragment picking. Additionally, the lengths of the small and large fragments were set to 4 and 6 amino acid residues, respectively, instead of the default 3 and 9; we found that this produced better sampling for peptides. After conformational sampling, the D-amino acid positions were reverted to their original identities, and the structures were rescored. A small modification to the ab initio algorithm permitted it to build a terminal peptide bond for the N-C cyclic designs during the full-atom refinement stages of structure prediction. Designs that showed no sampling near the design conformation, or for which the design conformation was not the unique, lowest-energy conformation, were discarded.
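The two bookkeeping steps described above (masking D-amino acid positions with glycine before sampling, and discarding designs whose energy landscape does not favour the design conformation) can be illustrated with a minimal Python sketch. This is not Rosetta code and does not reproduce the actual filtering criteria: the function names, the 0-based position convention, the RMSD cutoff of 1.5 Å and the energy gap are all assumptions chosen for illustration.

```python
from typing import Iterable, List, Set, Tuple


def mask_d_residues(sequence: str, d_positions: Set[int]) -> str:
    """Return the sequence with every D-amino acid position replaced by glycine.

    `d_positions` holds 0-based indices (a hypothetical convention).
    """
    return "".join(
        "G" if i in d_positions else aa for i, aa in enumerate(sequence)
    )


def passes_landscape_filter(
    decoys: Iterable[Tuple[float, float]],
    rmsd_cutoff: float = 1.5,  # assumed threshold (angstroms), not from the source
    energy_gap: float = 0.0,   # assumed minimum gap (energy units), not from the source
) -> bool:
    """Decide whether a design survives the energy-landscape check.

    `decoys` is an iterable of (rmsd_to_design, energy) pairs, one per
    predicted structure. A design is kept only if (i) at least one decoy
    lies within `rmsd_cutoff` of the design conformation, and (ii) the best
    near-design energy is lower than the best energy of any other decoy by
    more than `energy_gap`.
    """
    near: List[float] = []
    far: List[float] = []
    for rmsd, energy in decoys:
        (near if rmsd <= rmsd_cutoff else far).append(energy)

    if not near:
        return False  # no sampling near the design conformation
    if not far:
        return True   # every decoy is near the design conformation
    return min(far) - min(near) > energy_gap
```

For example, a decoy set in which the lowest-energy structure lies 5 Å from the design would be rejected, whereas one in which all low-energy structures cluster within 1 Å of the design would be kept. In practice the cutoffs would be chosen by inspecting energy-versus-RMSD plots.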
Since fragment-based methods are poorly suited to the prediction of structures with high D-amino acid content, such as NC_cHLHR_D1, we developed a new, fragment-free algorithm for validation of these topologies. This algorithm, which we call “simple_cycpep_predict”, uses the same GenKIC-based sampling approach used to build backbones for design, with the additional steps of filtering solutions based on disulfide geometry, optimizing sidechain rotamers, and minimizing the energy by gradient descent. Because the search space is vast, even with the constraints imposed by the N-C cyclic geometry and the disulfide bond(s), we further biased the search by setting mainchain torsion values for residues in the middle of the helices to helical values (a Gaussian distribution centred on phi=−61°, psi=−41° for the αR helix and on phi=+61°, psi=+41° for the αL helix); this is analogous to the biased sampling obtained by fragment-based methods, in which sequences with high helix propensity are sampled primarily with helical fragments. As with ab initio validation, designs showing poor sampling near the design conformation or poor energy landscapes were discarded.
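The helical bias described above can be sketched in a few lines of Python. The centres of the Gaussian distributions are those given in the text; the standard deviation (10° here), the function names and the angle-wrapping convention are assumptions made for illustration, and the sketch does not reproduce the simple_cycpep_predict implementation.

```python
import random
from typing import List, Tuple

# Centres of the Gaussian distributions stated in the text (degrees).
HELIX_CENTRES = {
    "alpha_R": (-61.0, -41.0),  # right-handed helix
    "alpha_L": (61.0, 41.0),    # left-handed helix
}


def wrap_degrees(angle: float) -> float:
    """Map an angle in degrees onto the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def sample_helix_torsions(
    helix_type: str,
    n_residues: int,
    rng: random.Random,
    sigma: float = 10.0,  # assumed width (degrees); not given in the source
) -> List[Tuple[float, float]]:
    """Draw (phi, psi) pairs for residues in the middle of a helix.

    Each torsion is drawn independently from a Gaussian centred on the
    ideal value for the requested helix handedness.
    """
    phi0, psi0 = HELIX_CENTRES[helix_type]
    return [
        (wrap_degrees(rng.gauss(phi0, sigma)), wrap_degrees(rng.gauss(psi0, sigma)))
        for _ in range(n_residues)
    ]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so that repeated runs draw the same values
    for phi, psi in sample_helix_torsions("alpha_L", 3, rng):
        print(f"phi = {phi:7.2f}, psi = {psi:7.2f}")
```

In a full prediction run, torsions drawn in this way would seed only the helical residues; the remaining backbone degrees of freedom would still be sampled and closed by the kinematic closure step.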