where $x_i$ and $y_j$ are the proportions of the i-th haplotype in quasispecies X and of the j-th haplotype in quasispecies Y, respectively, and $d_{ij}$ is the genetic distance between these two haplotypes. The sum extends over all pairs of haplotypes, one from each quasispecies. This distance, $d_{XY}$, is interpreted as the average number of nucleotide substitutions between reads from quasispecies X and reads from quasispecies Y.
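A minimal sketch of the between-quasispecies distance $d_{XY} = \sum_{i,j} x_i y_j d_{ij}$ follows. It assumes that haplotypes are aligned sequences of equal length and that $d_{ij}$ is the raw count of mismatched positions, with no substitution-model correction. The function names and the toy data are hypothetical and only illustrate the weighted sum.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Assumed d_ij: number of mismatched positions between two aligned haplotypes."""
    if len(a) != len(b):
        raise ValueError("haplotypes must be aligned to the same length")
    return sum(ca != cb for ca, cb in zip(a, b))


def between_distance(haps_x, freqs_x, haps_y, freqs_y) -> float:
    """d_XY = sum over i, j of x_i * y_j * d_ij, pairing each haplotype of X with each of Y."""
    return sum(
        xi * yj * hamming(hx, hy)
        for (hx, xi), (hy, yj) in product(zip(haps_x, freqs_x), zip(haps_y, freqs_y))
    )


# Hypothetical toy quasispecies: haplotype sequences and their proportions.
haps_x, freqs_x = ["ACGT", "ACGA"], [0.75, 0.25]
haps_y, freqs_y = ["ACGT", "TCGT"], [0.5, 0.5]
print(between_distance(haps_x, freqs_x, haps_y, freqs_y))  # → 0.75
```

Each of the four haplotype pairs contributes its distance weighted by the product of the two proportions, so the result is the expected number of substitutions between one read drawn from X and one drawn from Y.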
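Let $d_X$ and $d_Y$ denote the nucleotide diversity of each quasispecies [2], that is, the average number of nucleotide substitutions between a random pair of reads from the same quasispecies. They may be estimated by:

$$\hat{d}_X = \frac{n_X}{n_X - 1} \sum_{i,j} x_i x_j d_{ij}, \qquad \hat{d}_Y = \frac{n_Y}{n_Y - 1} \sum_{i,j} y_i y_j d_{ij},$$

where $n_X$ and $n_Y$ are the numbers of reads in each quasispecies, and $d_{ij}$ here is the distance between the i-th and j-th haplotypes of the same quasispecies. Taking these diversities into account, the net number of nucleotide substitutions between the two quasispecies [2] is estimated by:

$$d_A = d_{XY} - \frac{\hat{d}_X + \hat{d}_Y}{2},$$

where $d_{XY}$ is the average distance between reads of X and reads of Y defined above. This net distance, $d_A$, will be taken as the genetic distance between two quasispecies.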
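The estimators above can be sketched as follows, again assuming aligned haplotypes of equal length and $d_{ij}$ equal to the raw mismatch count. The function names, the toy haplotypes, and the read counts are hypothetical; the read counts are chosen to be consistent with the stated proportions (3 + 1 reads in X, 2 + 2 reads in Y).

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Assumed d_ij: number of mismatched positions between two aligned haplotypes."""
    if len(a) != len(b):
        raise ValueError("haplotypes must be aligned to the same length")
    return sum(ca != cb for ca, cb in zip(a, b))


def diversity(haps, freqs, n_reads: int) -> float:
    """Within-quasispecies diversity: n/(n-1) * sum over i, j of p_i * p_j * d_ij."""
    if n_reads < 2:
        raise ValueError("at least two reads are needed")
    s = sum(
        pi * pj * hamming(hi, hj)
        for (hi, pi), (hj, pj) in product(zip(haps, freqs), repeat=2)
    )
    return n_reads / (n_reads - 1) * s


def between_distance(haps_x, freqs_x, haps_y, freqs_y) -> float:
    """d_XY = sum over i, j of x_i * y_j * d_ij across the two quasispecies."""
    return sum(
        xi * yj * hamming(hx, hy)
        for (hx, xi), (hy, yj) in product(zip(haps_x, freqs_x), zip(haps_y, freqs_y))
    )


def net_distance(haps_x, freqs_x, n_x, haps_y, freqs_y, n_y) -> float:
    """d_A = d_XY - (d_X + d_Y) / 2, the net number of substitutions."""
    d_xy = between_distance(haps_x, freqs_x, haps_y, freqs_y)
    d_x = diversity(haps_x, freqs_x, n_x)
    d_y = diversity(haps_y, freqs_y, n_y)
    return d_xy - (d_x + d_y) / 2


# Hypothetical toy quasispecies with 4 reads each.
haps_x, freqs_x, n_x = ["ACGT", "ACGA"], [0.75, 0.25], 4
haps_y, freqs_y, n_y = ["ACGT", "TCGT"], [0.5, 0.5], 4
print(net_distance(haps_x, freqs_x, n_x, haps_y, freqs_y, n_y))  # about 0.1667
```

Here $d_{XY} = 0.75$, $\hat{d}_X = 0.5$ and $\hat{d}_Y = 2/3$, so $d_A = 0.75 - 7/12 = 1/6$. Subtracting the mean within-quasispecies diversity removes the part of $d_{XY}$ that is explained by variability inside each quasispecies rather than by divergence between them.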
The quasispecies pairs are simulated so that every haplotype carries a single substitution with respect to the master haplotype of the first quasispecies. As a result, the matrix of distances between all pairs of haplotypes in both quasispecies has the form: