De novo design of primers is performed by finding short conserved sequences in a given multiple sequence alignment to act as 3′ binding sites for new primers. Once these sites have been identified, full-length forward or reverse de novo primers are generated by incorporating the N upstream or downstream bases, where N is 15 by default. De novo full-length primers can then be sorted according to sensitivity, specificity or degeneracy, and compared with known primers to find matches or significant overlap. Specificity for particular target groups, such as archaea, can be obtained by supplying an optional alignment of sequences from which to exclude matches.
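The site-finding and extension steps can be illustrated with a minimal Python sketch. It is not PrimerProspector's implementation: the function names, the site length `k`, the `min_fraction` conservation threshold and the toy alignment are all assumptions made for illustration, and only forward primers are shown (a reverse primer would instead take the downstream bases and be reverse-complemented).

```python
from collections import Counter


def conserved_sites(alignment, k=5, min_fraction=0.9):
    """Return (start_column, kmer, fraction) for every alignment window whose
    most common gap-free k-mer occurs in at least min_fraction of sequences.
    k and min_fraction are illustrative values, not the tool's defaults."""
    sites = []
    for start in range(len(alignment[0]) - k + 1):
        windows = [seq[start:start + k] for seq in alignment]
        counts = Counter(w for w in windows if "-" not in w)
        if not counts:
            continue  # every sequence has a gap in this window
        kmer, hits = counts.most_common(1)[0]
        fraction = hits / len(alignment)
        if fraction >= min_fraction:
            sites.append((start, kmer, fraction))
    return sites


def forward_primers(alignment, start, kmer, n_extend=15):
    """Build full-length forward primers by prepending up to n_extend
    ungapped upstream bases to the conserved 3' site, counting how many
    sequences yield each primer."""
    primers = Counter()
    for seq in alignment:
        if seq[start:start + len(kmer)] != kmer:
            continue  # this sequence does not carry the conserved site
        upstream = seq[:start].replace("-", "")
        upstream = upstream[max(0, len(upstream) - n_extend):]
        primers[upstream + kmer] += 1
    return primers


# Toy alignment (hypothetical data); '-' marks an alignment gap.
alignment = [
    "ACGTACGGTTCA",
    "ACGTACGGTTCA",
    "ACCTACGGTTCA",
    "AC-TACGGTTCA",
]
for start, kmer, fraction in conserved_sites(alignment, k=5, min_fraction=1.0):
    print(start, kmer, fraction, dict(forward_primers(alignment, start, kmer, 4)))
```

Counting how many sequences produce each full-length primer gives a simple proxy for sensitivity; an exclusion alignment could be handled in the same spirit by discarding sites that are also conserved in the sequences to be excluded.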
Primer analyses, including the prediction of taxonomic coverage, rely upon scoring primers against target sequences. To predict its taxonomic coverage, a primer is locally aligned to full-length target sequences with known taxonomies, and scored based on gap, 3′ mismatch and non-3′ mismatch counts. The final five bases are considered to be the 3′ region by default, and are considered to be the most important for PCR amplification. The scoring scheme is parameterizable. An example of the graphical output is provided in Supplementary Figure S3. The RDP Classifier (Wang et al., 2007) is used to classify the resulting sequence fragments, and the accuracy is displayed both in terms of which taxa are amplified and in terms of classification level of the resulting fragments. PrimerProspector supports retraining of the RDP Classifier for taxa coverage analysis based on different reference taxonomies.
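The idea of weighting 3′ mismatches more heavily than non-3′ mismatches can be sketched as follows. This is a simplified illustration rather than the tool's scoring code: it slides the primer along the target without gaps instead of performing a gapped local alignment, and the function name and the penalty weights `w3` and `w_non3` are hypothetical placeholders for the parameterizable scheme described above. Degenerate primer bases are matched using the standard IUPAC nucleotide codes.

```python
# Standard IUPAC nucleotide codes mapped to the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}


def score_primer(primer, target, three_prime_len=5, w3=1.0, w_non3=0.4):
    """Return (score, position, three_prime_mismatches, non3_mismatches)
    for the best ungapped placement of primer on target, or None if the
    target is shorter than the primer. Lower scores are better.
    w3 and w_non3 are hypothetical weights chosen for illustration."""
    cut = max(0, len(primer) - three_prime_len)  # start of the 3' region
    best = None
    for pos in range(len(target) - len(primer) + 1):
        window = target[pos:pos + len(primer)]
        mismatches = [t not in IUPAC.get(p, p) for p, t in zip(primer, window)]
        non3 = sum(mismatches[:cut])
        three = sum(mismatches[cut:])
        score = w3 * three + w_non3 * non3
        if best is None or score < best[0]:
            best = (score, pos, three, non3)
    return best


# Toy primer and targets (hypothetical sequences); R matches A or G.
primer = "ACGTRCCGGA"
print(score_primer(primer, "TTTACGTACCGGATTT"))  # perfect match
print(score_primer(primer, "TTTACGTACCGGTTTT"))  # mismatch at the 3' end
print(score_primer(primer, "TTTTCGTACCGGATTT"))  # mismatch at the 5' end
```

Under these assumed weights, a single mismatch in the final five bases costs more than a single mismatch elsewhere in the primer, reflecting the greater importance of the 3′ region for amplification. Taxonomic coverage could then be estimated as the fraction of target sequences in each taxon whose best score falls below a chosen threshold.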
Descriptions of the scripts included in PrimerProspector, the various outputs generated by PrimerProspector and an example based on the F515/R806 primer pair are included in the online documentation at http://pprospector.sourceforge.net/.