The present test builds on three pieces of information: two phylogenetic trees corresponding to hosts and parasites, and a binary matrix (A) coding the host-parasite associations (Fig. 1). Let h and p be the numbers of host and parasite species in the respective phylograms. A is then an h × p matrix in which 1 denotes the presence of a given parasite species in a given host species and 0 denotes its absence (Fig. 1). [Note that the assignment of hosts to rows and parasites to columns is arbitrary. Although the original ParaFit test of Legendre et al. [7] and HCT use A′, we opted to adopt the same input format required by the parafit function of the ape package of R [41] to ease comparison and integration with our R script implementing PACo.] The R code and instructions needed to implement PACo in R are given in File S1. In addition, an annotated version of the code, example input files and the R code for the simulations described below can be downloaded at http://www.uv.es/cophylpaco/index.html.
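As an illustration of the input format described above, the following minimal Python sketch builds an h × p association matrix from a list of host-parasite links and derives its transpose A′. It is not the authors' R script: the function names, species labels and links are hypothetical and serve only to show the layout (hosts in rows, parasites in columns).

```python
def association_matrix(hosts, parasites, links):
    """Return an h x p list of lists with 1 where a host-parasite link exists."""
    host_index = {name: i for i, name in enumerate(hosts)}
    parasite_index = {name: j for j, name in enumerate(parasites)}
    A = [[0] * len(parasites) for _ in hosts]
    for host, parasite in links:
        A[host_index[host]][parasite_index[parasite]] = 1
    return A


def transpose(M):
    """Return the transpose of a rectangular list-of-lists matrix."""
    return [list(col) for col in zip(*M)]


# Hypothetical example: three hosts, two parasites, three associations.
hosts = ["H1", "H2", "H3"]
parasites = ["P1", "P2"]
links = [("H1", "P1"), ("H2", "P1"), ("H3", "P2")]

A = association_matrix(hosts, parasites, links)  # hosts in rows (h x p)
A_t = transpose(A)                               # parasites in rows (p x h)
```

Here parasite P1 occurs in two hosts, so its column in A contains two 1s; A_t is the transposed layout that the text attributes to ParaFit and HCT.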
Figure 1 provides an overview of how PACo works. First, the host and parasite phylogenies are transformed into their respective distance matrices between species. This can be achieved by computing either patristic or genetic distances, or any dissimilarity measure between the species involved. The host and parasite distance matrices are, in turn, transformed into their respective matrices of principal coordinates (PCo), with h and p rows, and h − 1 and p − 1 columns, the latter representing each of the PCo axes. The PCo matrices can be viewed as representations of the host and parasite phylogenies in a Euclidean hyperspace, although they may contain noisy information with respect to the true phylogeny [7], [42].
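The first step, turning a phylogeny into a matrix of patristic distances, can be sketched as follows. This is only an illustration under stated assumptions: the tree is encoded as a hypothetical child-to-parent dictionary with non-negative branch lengths, and the function names are invented for this example. The subsequent principal coordinates step is not shown.

```python
def path_to_root(node, parent):
    """Map the node and each of its ancestors to the distance from the node."""
    dist = {node: 0.0}
    total = 0.0
    while node in parent:
        ancestor, branch_length = parent[node]
        total += branch_length
        dist[ancestor] = total
        node = ancestor
    return dist


def patristic_matrix(tips, parent):
    """Pairwise sums of branch lengths along the path joining two tips."""
    paths = {tip: path_to_root(tip, parent) for tip in tips}
    D = []
    for a in tips:
        row = []
        for b in tips:
            shared = set(paths[a]) & set(paths[b])
            # With non-negative branch lengths, the most recent common
            # ancestor is the shared node that minimises the total path.
            row.append(min(paths[a][n] + paths[b][n] for n in shared))
        D.append(row)
    return D


# Hypothetical host tree ((H1:1,H2:1)n1:2,H3:3)root;
parent = {
    "H1": ("n1", 1.0),
    "H2": ("n1", 1.0),
    "n1": ("root", 2.0),
    "H3": ("root", 3.0),
}
tips = ["H1", "H2", "H3"]
D = patristic_matrix(tips, parent)
```

The resulting matrix is symmetric with a zero diagonal, which is the form of input that a principal coordinates analysis expects.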
PACo allows for a given parasite occurring in more than one host species and, conversely, a host harbouring more than one parasite species (Fig. 1). Since Procrustes analysis requires the same number of observations in both ordinations, A is transformed into an identity matrix by duplicating multiple associations, which in turn are used to replicate, in the right order, the rows of hosts harbouring more than one parasite (PCo hosts) and of the corresponding parasites occurring in more than one host (PCo parasites, see Fig. 1). It has been shown in studies using the Mantel test that the replication of taxa produces incorrect Type I error rates [34]. Although we had insufficient a priori information on the behaviour of Procrustes analysis with duplicated data points, we show below through simulations that no systematic biases in P values were produced and that the Type I errors were mostly correct. This is probably so because the replicated taxa in the corresponding PCo matrices are treated as independent observations occupying identical positions in the hyperspace. Next, the expanded matrices of PCo coordinates of hosts (
X) and parasites (
Y), with column vectors centred on their respective means, are compared by means of Procrustes analysis using least-squares superimposition. Whereas the
X configuration is kept fixed, the
Y counterpart is scaled, centred, mirrored (if necessary) and rotated to minimize the squared differences between the two configurations [43], [44]. If X and Y do not contain the same number of columns, the narrower matrix is padded with the appropriate number of zero columns. The Procrustean fit of
Y onto
X can be visualised in an ordination plot (
Fig. 1) and yields a residual sum of squares $m^2_{XY}$, which is computed as follows:

$$m^2_{XY} = \mathrm{trace}(XX') - \frac{[\mathrm{trace}(W)]^2}{\mathrm{trace}(YY')}$$

where W is obtained by singular value decomposition of X′Y = VWU′ [38]. Given that $m^2_{XY}$ is inversely related to the topological congruence between the two ordinations, it represents a measure of the fit of the parasite phylogeny onto the host phylogeny. Note that the statistic is asymmetric, i.e. $m^2_{XY} \neq m^2_{YX}$. (This is not to be confused with the nature of the Procrustean fit, which itself can be symmetric or asymmetric [43].) It is possible to obtain a symmetric statistic by normalizing the column vectors of X and Y [44], [45]. This approach yields a dimensionless residual sum of squares, which is appropriate in an ecological context [45] where the original variables have different units. Herein, we adopted the asymmetric $m^2_{XY}$ because the PCo axes taken all together preserve the original dissimilarities among the taxa [46] and thus it provides a goodness-of-fit statistic with squared units of the original dissimilarity measure of the host phylogeny. In addition, some of our preliminary analyses using the symmetric sum of squares yielded biased Type I errors, perhaps due to the influence of the replicated taxa on the estimated variances computed for normalization of the column vectors of X and Y.
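The expansion of the coordinate matrices and the asymmetric residual sum of squares can be illustrated with a small Python sketch. Several simplifying assumptions apply. The coordinates are hypothetical and are not real PCo output. The function names are invented for this example. To avoid a full singular value decomposition, the code handles only configurations with exactly two columns, for which the squared sum of the singular values of the 2 × 2 matrix X′Y equals its squared Frobenius norm plus twice the absolute value of its determinant. The statistic is computed from the standard least-squares Procrustes result, trace(XX′) − [trace(W)]²/trace(YY′), after centring the columns.

```python
def expand(A, host_coords, parasite_coords):
    """Build X and Y with one row per host-parasite link, in matching order."""
    X, Y = [], []
    for i, row in enumerate(A):
        for j, present in enumerate(row):
            if present:
                X.append(list(host_coords[i]))      # replicate host row
                Y.append(list(parasite_coords[j]))  # replicate parasite row
    return X, Y


def centre(M):
    """Subtract the column means from every row."""
    n = len(M)
    means = [sum(col) / n for col in zip(*M)]
    return [[v - m for v, m in zip(row, means)] for row in M]


def sum_sq(M):
    """Total sum of squares, i.e. trace(MM')."""
    return sum(v * v for row in M for v in row)


def m2(X, Y):
    """Residual sum of squares of fitting Y onto X (two columns only)."""
    X, Y = centre(X), centre(Y)
    # C = X'Y, the 2 x 2 cross-product matrix.
    C = [[sum(x[a] * y[b] for x, y in zip(X, Y)) for b in range(2)]
         for a in range(2)]
    frob_sq = sum(v * v for row in C for v in row)
    det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
    # For a 2 x 2 matrix: (s1 + s2)^2 = ||C||_F^2 + 2 |det C|.
    trace_w_sq = frob_sq + 2.0 * abs(det)
    return sum_sq(X) - trace_w_sq / sum_sq(Y)


# Hypothetical coordinates for three hosts and three parasites; the parasite
# configuration is the host one rotated by 90 degrees and scaled by 2.
hosts_pc = [(1.0, 0.0), (-1.0, 1.0), (0.0, -1.0)]
parasites_pc = [(0.0, 2.0), (-2.0, -2.0), (2.0, 0.0)]

# One-to-one associations: the fit is perfect, so the residual is zero.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
X1, Y1 = expand(identity, hosts_pc, parasites_pc)
perfect_fit = m2(X1, Y1)

# Host 3 also harbours parasite 2, so one row is replicated in X and in Y.
A = [[1, 0, 0], [0, 1, 0], [0, 1, 1]]
X, Y = expand(A, hosts_pc, parasites_pc)
fit_xy = m2(X, Y)
fit_yx = m2(Y, X)
```

In the second case the expanded matrices have four rows, one per association, and fit_xy differs from fit_yx, which illustrates the asymmetry of the statistic. A general implementation would replace the two-column shortcut with a full singular value decomposition.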
Balbuena J.A., Míguez-Lozano R., & Blasco-Costa I. (2013). PACo: A Novel Procrustes Application to Cophylogenetic Analysis. PLoS ONE, 8(4), e61048.