The MultiParanoid algorithm [10] is an extension of the
graph-based InParanoid clustering algorithm [11], [42] for
identifying orthologs and inparalogs across multiple species.
InParanoid uses bi-directional best BLAST hits [9], [43] to
identify putative orthologs and a clustering algorithm to identify their
inparalogs. To do so, InParanoid assumes that any sequences
from the same species that are more similar to the predicted ortholog than to
any sequence from other species are inparalogs [11], [42].
MultiParanoid generates multi-species orthogroups by
merging all pairwise InParanoid predictions, while minimizing
the number of internal conflicts. Furthermore, the algorithm uses a
‘cut-off’ parameter based on the distance of candidate inparalogs to
the predicted target ortholog to filter out weakly supported candidates.
MultiParanoid was obtained from http://multiparanoid.sbc.su.se and InParanoid
(version 3beta) was obtained upon request from inparanoid@sbc.su.se.
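The inparalog assumption described above can be illustrated with a minimal sketch. It implements only the rule as stated in the text (a same-species sequence is kept if it is more similar to the predicted ortholog than to any sequence from another species); it is not the InParanoid code, and the function name, the data layout, and the toy scores are assumptions made for illustration.

```python
def find_inparalogs(seed, species_of, similarity):
    """Return same-species sequences that qualify as inparalogs of `seed`.

    seed        -- ID of the predicted ortholog (hypothetical input)
    species_of  -- dict mapping sequence ID -> species name
    similarity  -- dict mapping frozenset({id1, id2}) -> similarity score;
                   missing pairs are treated as score 0 (an assumption)
    """
    def score(a, b):
        return similarity.get(frozenset((a, b)), 0.0)

    seed_species = species_of[seed]
    inparalogs = []
    for candidate, species in species_of.items():
        if candidate == seed or species != seed_species:
            continue
        # Best score of the candidate against any sequence from another species.
        best_foreign = max(
            (score(candidate, other)
             for other, sp in species_of.items() if sp != seed_species),
            default=0.0,
        )
        # Keep the candidate only if it is closer to the seed ortholog.
        if score(candidate, seed) > best_foreign:
            inparalogs.append(candidate)
    return sorted(inparalogs)


# Toy data: three sequences from species A, one from species B.
species_of = {"A1": "A", "A2": "A", "A3": "A", "B1": "B"}
similarity = {
    frozenset(("A1", "A2")): 900,
    frozenset(("A1", "A3")): 200,
    frozenset(("A1", "B1")): 700,
    frozenset(("A2", "B1")): 650,
    frozenset(("A3", "B1")): 300,
}
print(find_inparalogs("A1", species_of, similarity))  # ['A2']
```

In this toy case A2 is retained because it scores higher against A1 (900) than against B1 (650), whereas A3 is rejected because it is closer to B1 than to A1.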

The OrthoMCL algorithm also builds upon the InParanoid algorithm [11],
[42] by
using the Markov Cluster (MCL) algorithm for predicting orthogroups across
multiple species based on their sequence similarity information [3]. The algorithm
uses an ‘inflation rate’ parameter to regulate the
‘tightness’ of the predicted orthogroups. OrthoMCL (version
1.4) was obtained from http://orthomcl.org/common/downloads/software/v1.4/.
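The role of the inflation parameter can be illustrated with a bare-bones Markov clustering loop. This is a simplified pure-Python sketch of generic MCL (alternating expansion and inflation of a column-stochastic matrix), not OrthoMCL's implementation; the function names, the added self-loops, the fixed iteration count, and the toy similarity matrix are all assumptions.

```python
def normalize_columns(m):
    """Rescale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(m):
    """Expansion step: square the matrix (two-step random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, inflation):
    """Inflation step: raise entries to a power, then renormalize columns."""
    return normalize_columns([[x ** inflation for x in row] for row in m])


def mcl(weights, inflation=1.5, iterations=50, eps=1e-6):
    """Cluster a symmetric similarity matrix; returns sorted lists of indices."""
    n = len(weights)
    # Self-loops are a common MCL convention (assumed here, weight 1.0).
    m = [[weights[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    # Each row with non-negligible entries defines one cluster.
    clusters = set()
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > eps)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Toy graph: two triangles of mutually similar proteins, no edges between them.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
weights = [[0.0] * 6 for _ in range(6)]
for block in (0, 3):
    for i in range(3):
        for j in range(3):
            weights[block + i][block + j] = float(triangle[i][j])

print(mcl(weights, inflation=2.0))  # [[0, 1, 2], [3, 4, 5]]
```

In MCL generally, larger inflation values sharpen the contrast between strong and weak links and so tend to yield smaller, tighter clusters, which is the behaviour the ‘inflation rate’ parameter controls.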

The Reciprocal Best Hit (RBH) algorithm [4], [6], [12], [13] relies on BLAST [9], [43] to
identify pairwise orthologs between two species. According to the RBH algorithm,
two proteins X and Y from species
x and y, respectively, are considered
orthologs if protein X is the best BLAST hit for protein
Y and protein Y is the best BLAST hit for
protein X. We integrated a ‘filtering’ parameter
r that enabled us to avoid constructing orthogroups that
contained distant homologs by considering the degree to which the two proteins
differed in sequence length or BLAST alignment [44], [45]. Thus, putative
orthogroups are retained if:
From the above equation, it follows that r values close to 1 are
likely to filter out a larger number of putative orthologs, whereas
r values close to 0 are likely to include all putative
orthologs. The default mode of the algorithm does not use the filtering
parameter r.
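The reciprocity criterion can be sketched in a few lines. The example covers only the default mode, without the filtering parameter r, and assumes that BLAST results have already been parsed into (query, subject, score) tuples; the function names and the toy hits are hypothetical.

```python
def best_hits(blast_hits):
    """Map each query to its highest-scoring subject.

    blast_hits -- iterable of (query, subject, score) tuples, e.g. parsed
                  from tabular BLAST output (parsing is not shown here)
    """
    best = {}
    for query, subject, score in blast_hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(x_vs_y, y_vs_x):
    """Return (X, Y) pairs in which each protein is the other's best hit."""
    best_xy = best_hits(x_vs_y)
    best_yx = best_hits(y_vs_x)
    return sorted((x, y) for x, y in best_xy.items() if best_yx.get(y) == x)


# Toy searches: species x proteins against species y, and the reverse.
x_vs_y = [("X1", "Y1", 520.0), ("X1", "Y2", 310.0), ("X2", "Y1", 400.0)]
y_vs_x = [("Y1", "X1", 515.0), ("Y1", "X2", 395.0), ("Y2", "X1", 305.0)]

print(reciprocal_best_hits(x_vs_y, y_vs_x))  # [('X1', 'Y1')]
```

Here X2 is not paired with Y1: Y1 is the best hit of X2, but the best hit of Y1 is X1, so the relationship is not reciprocal.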

The Reciprocal Smallest Distance (RSD) algorithm [14] generates global sequence
alignments for a small number of top BLAST hits against a query gene
X from species x. RSD then calculates the
maximum likelihood evolutionary distance between X and its top
BLAST hits, identifying the gene with the smallest evolutionary distance from
X (e.g., gene Y from species
y). If the RSD search using gene Y from
species y as the query also identifies gene X
from species x as its closest relative, then proteins
X and Y are considered orthologs [14], [15]. In RSD,
the user can modify the shape parameter a of the gamma
distribution, a key determinant of the estimated evolutionary distance between
genes. The RSD algorithm was obtained from http://roundup.hms.harvard.edu/site/.
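The reciprocal step of RSD can be sketched as follows. The sketch assumes that the top BLAST hits and the maximum likelihood distances (which in RSD depend on the gamma shape parameter a) have been computed beforehand and are supplied as plain dictionaries; it does not perform alignment or distance estimation, and all names and values are hypothetical.

```python
def smallest_distance_hit(query, top_hits, distance):
    """Among the query's top BLAST hits, return the one with the smallest
    evolutionary distance, or None if the query has no hits."""
    candidates = top_hits.get(query, [])
    if not candidates:
        return None
    return min(candidates, key=lambda hit: distance[frozenset((query, hit))])


def reciprocal_smallest_distance(genes_x, top_hits_xy, top_hits_yx, distance):
    """Return (X, Y) pairs that are each other's smallest-distance hit."""
    pairs = []
    for x in genes_x:
        y = smallest_distance_hit(x, top_hits_xy, distance)
        if y is not None and smallest_distance_hit(y, top_hits_yx, distance) == x:
            pairs.append((x, y))
    return pairs


# Toy input: for X1 the top BLAST hit is Y2, but Y1 is evolutionarily closer.
top_hits_xy = {"X1": ["Y2", "Y1"], "X2": ["Y1"]}
top_hits_yx = {"Y1": ["X1", "X2"], "Y2": ["X1"]}
distance = {  # precomputed distances (assumed values)
    frozenset(("X1", "Y1")): 0.10,
    frozenset(("X1", "Y2")): 0.35,
    frozenset(("X2", "Y1")): 0.50,
}

print(reciprocal_smallest_distance(["X1", "X2"], top_hits_xy, top_hits_yx, distance))
# [('X1', 'Y1')]
```

The toy case shows where RSD can differ from RBH: the pairing follows the smallest estimated distance rather than the top-ranked BLAST hit.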