We next identified homologs as those gene pairs that had BLAST hits in both directions meeting a given scaled bit score threshold. We scaled each bit score by the bit score of the query gene's self hit; that is, scaledBitScore(A->B) = bitScore(A->B) / bitScore(A->A). This method has been used previously to identify conserved homologs among bacterial genomes and has been shown to be more stringent than criteria based solely on reciprocal best matches using E values [17].
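The scaling and the reciprocal-hit criterion can be illustrated with a minimal sketch. The function names, the dictionary layout of the BLAST scores, and the reading of the threshold as "scaled score greater than or equal to the threshold in both directions" are assumptions made for illustration; they are not taken from the original pipeline.

```python
def scaled_bit_scores(bit_scores):
    """Divide each hit's bit score by the self-hit bit score of its query.

    bit_scores maps (query, subject) -> raw bit score and is assumed to
    include the self hits (query, query). Hypothetical input layout.
    """
    scaled = {}
    for (query, subject), score in bit_scores.items():
        self_score = bit_scores.get((query, query))
        if self_score:  # skip queries that have no (or a zero) self hit
            scaled[(query, subject)] = score / self_score
    return scaled


def homolog_pairs(bit_scores, threshold):
    """Return unordered gene pairs whose scaled scores pass in both directions.

    Assumption: a pair passes when both scaled scores are >= threshold.
    """
    scaled = scaled_bit_scores(bit_scores)
    pairs = set()
    for (a, b), score in scaled.items():
        if a == b:
            continue  # self hits are only used for scaling
        if score >= threshold and scaled.get((b, a), 0.0) >= threshold:
            pairs.add(frozenset((a, b)))
    return pairs
```

Because each direction is scaled by its own query's self hit, the two scaled scores for a pair generally differ, so a pair can pass in one direction and fail in the other.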
We then formed homolog families by placing two genes in the same family whenever they had been identified as homologs. Note that not every pair of genes in a family needs to have been identified as homologs. For example, if A and B are homologs, and B and C are homologs, then A and C will be in the same family even if A and C have not been identified as homologs. Finally, we identified the putative panorthologs as the genes from homolog families containing exactly one gene from each genome. For each set of genomes, we computed the putative panorthologs while varying the scaled bit score threshold from 0.1 to 0.9 in increments of 0.1, and kept the largest set of panorthologs found.
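The family construction described above amounts to finding connected components of the homolog graph, which the following minimal sketch does with a union-find structure. All names here are hypothetical, `pairs_at` stands in for whatever routine returns the homolog pairs at a given threshold, and breaking ties in favor of the lowest threshold is an assumption; none of these details are specified in the text.

```python
from collections import defaultdict


def homolog_families(genes, pairs):
    """Group genes into families: connected components of the homolog graph."""
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the root, compressing the path on the way.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in pairs:
        parent[find(a)] = find(b)  # merge the two families

    families = defaultdict(set)
    for g in genes:
        families[find(g)].add(g)
    return list(families.values())


def panorthologs(families, genome_of, genomes):
    """Keep families with exactly one gene from each genome."""
    wanted = set(genomes)
    return [
        fam for fam in families
        if len(fam) == len(wanted) and {genome_of[g] for g in fam} == wanted
    ]


def best_threshold(genes, genome_of, genomes, pairs_at):
    """Sweep thresholds 0.1..0.9 and keep the largest panortholog set.

    pairs_at(threshold) is a hypothetical callable returning homolog pairs.
    Assumption: on ties, the lowest threshold is kept.
    """
    best = (None, [])
    for i in range(1, 10):
        threshold = i / 10
        families = homolog_families(genes, pairs_at(threshold))
        found = panorthologs(families, genome_of, genomes)
        if len(found) > len(best[1]):
            best = (threshold, found)
    return best
```

Raising the threshold removes homolog pairs, which can split a large family into panortholog-sized families or break an existing one apart, so the number of panorthologs need not change monotonically with the threshold. This is one reason a sweep over thresholds is informative.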
The following scaled bit score thresholds were used for genome sets A–E depicted in