Single linkage clustering is performed on the aligned sequence data. This approach ordinarily requires the generation of a distance matrix for all pairs of sequences, followed by a clustering step in which sequences are grouped based on a pre-selected distance threshold [45], [46]. RESL performs distance calculations and clustering concurrently, employing the transitive property to avoid distance determinations for sequences that are certain to possess a divergence above the threshold. This strategy is implemented by flushing all clusters to disk while retaining in active memory, for each cluster, one or more representative sequences (the number depending on the diameter of the cluster) and inter-cluster distance statistics, excepting those clusters whose members show high variability (maximum intra-cluster distance >2.2%, see below). The sequence divergence between each new sequence and the representative(s) of all existing clusters is then calculated. If its distance to every existing cluster is more than twice the threshold (>4.4%), it is recognized as the founder of a new cluster. If, on the other hand, it shows lower divergence to one or more clusters, all members of the closest cluster(s) are retrieved from disk to enable more detailed analysis of sequence variation. This approach considerably reduces computational requirements without compromising accuracy, and analysis is further expedited by moving clusters to disk when they have seen no activity (i.e., gained no new members) for a number of cycles.
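The idea of clustering while distances are being computed, and of skipping comparisons that cannot fall below the threshold, can be illustrated with a minimal Python sketch. It is not the RESL implementation: it assumes uncorrected p-distances on equal-length aligned sequences, keeps a single representative (the founder) per cluster together with the cluster's radius around it, and uses the triangle inequality as a stand-in for RESL's multi-representative rule. Disk flushing is omitted, and all names (`p_distance`, `Cluster`, `incremental_single_linkage`, `EPS`) are hypothetical.

```python
from typing import List

EPS = 1e-12  # tolerance for floating-point comparisons (assumption of this sketch)


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


class Cluster:
    """A cluster with one representative and its radius around that representative."""

    def __init__(self, founder: str) -> None:
        self.rep = founder
        self.members: List[str] = [founder]
        self.radius = 0.0  # largest distance from rep to any member

    def add(self, seq: str) -> None:
        self.members.append(seq)
        self.radius = max(self.radius, p_distance(self.rep, seq))


def incremental_single_linkage(seqs: List[str], t: float) -> List[List[str]]:
    """Assign sequences one at a time, pruning clusters that cannot contain a neighbour."""
    clusters: List[Cluster] = []
    for seq in seqs:
        linked: List[Cluster] = []
        for c in clusters:
            # Triangle inequality: d(seq, m) >= d(seq, rep) - radius for every member m,
            # so if that lower bound exceeds t no member can be within the threshold.
            if p_distance(seq, c.rep) - c.radius > t + EPS:
                continue
            # Otherwise inspect the members in detail (RESL would fetch them from disk).
            if any(p_distance(seq, m) <= t + EPS for m in c.members):
                linked.append(c)
        if not linked:
            clusters.append(Cluster(seq))  # founder of a new cluster
            continue
        # Single linkage: the new sequence joins, and thereby merges, every linked cluster.
        target = linked[0]
        target.add(seq)
        for other in linked[1:]:
            for m in other.members:
                target.add(m)
        clusters = [c for c in clusters if all(c is not o for o in linked[1:])]
    return [c.members for c in clusters]
```

With `t = 0.2` on ten-site toy sequences, a chain such as `AAAAAAAAAA`, `AAAAAAAATT`, `AAAAAATTTT` ends up in one cluster even though its end members differ by 0.4, which is the chaining behaviour expected of single linkage; the radius bound only removes comparisons that could not have produced a link.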
The implementation of single linkage clustering requires the selection of a threshold parameter, t, which represents the level of sequence divergence for the designation of OTUs. Early work [13] suggested that a threshold value of 2% was effective because most specimens showing more than this level of divergence represented different species, while those with less divergence were usually conspecific. However, this issue was examined in more detail by inspecting how OTU recovery varied with the distance threshold for eight datasets (Table 1). Sixty single linkage cluster analyses were generated for each dataset by stepping the distance threshold parameter in increments of 0.1% across the range from 0.1% to 6.0%. The OTUs recovered at each threshold were subsequently evaluated for their concordance with recognized species boundaries (Figure 2). These analyses revealed that maximal concordance was achieved by thresholds that varied from a low of t = 0.7% (in North American birds) to a high of t = 1.8% (in Bavarian moths). They also showed that performance, as measured by the number of correctly recognized species, dropped steeply when the threshold deviated on either side of optimality. Thresholds higher than optimal inflated the number of cases where members of different species were merged in a single OTU, while thresholds lower than optimal increased the number of cases where members of what current taxonomy treats as a single species were split into two or more OTUs. Based on these analyses, a threshold (t) of 2.2% was adopted as it represents the upper 99% confidence limit for the optimal thresholds in the eight test datasets (SD = 0.40). Its adoption will lead to the merger of some distinct clusters, but such cases are addressed in the third step of the analysis.
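The threshold sweep described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure actually used: it computes uncorrected p-distances for all pairs, clusters with a simple union-find, and counts a species as correctly recognized only when its specimens form exactly one OTU. The function names (`single_linkage`, `concordant_species`, `sweep`) and the concordance criterion are hypothetical choices made for this example.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def single_linkage(seqs: List[str], t: float) -> List[Set[int]]:
    """Group sequence indices so that any pair within distance t shares an OTU."""
    parent = list(range(len(seqs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if p_distance(seqs[i], seqs[j]) <= t:
            parent[find(i)] = find(j)

    groups: Dict[int, Set[int]] = defaultdict(set)
    for i in range(len(seqs)):
        groups[find(i)].add(i)
    return list(groups.values())


def concordant_species(seqs: List[str], species: List[str], t: float) -> int:
    """Count species whose specimens coincide exactly with one OTU at threshold t."""
    otus = {frozenset(g) for g in single_linkage(seqs, t)}
    by_species: Dict[str, Set[int]] = defaultdict(set)
    for i, name in enumerate(species):
        by_species[name].add(i)
    return sum(frozenset(m) in otus for m in by_species.values())


def sweep(seqs: List[str], species: List[str]) -> Dict[float, int]:
    """Evaluate sixty thresholds, 0.1% to 6.0% in 0.1% steps (as proportions)."""
    # Integer steps divided by 1000 avoid accumulating floating-point drift.
    return {k / 1000: concordant_species(seqs, species, k / 1000) for k in range(1, 61)}
```

The threshold with the highest count, for example `max(result, key=result.get)` on the dictionary returned by `sweep`, would correspond to the optimum for a dataset. Thresholds below it split species and thresholds above it merge them, which is the pattern described in the text.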