graph-based I
identifying orthologs and inparalogs across multiple species.
I
identify putative orthologs and a clustering algorithm to identify their
inparalogs. To do so, I
from the same species that are more similar to the predicted ortholog than to
any sequence from other species are inparalogs [11], [42].
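The inparalog criterion above can be sketched as follows. This sketch reads the rule literally: a candidate from the ortholog's own species is kept if its similarity to the predicted ortholog exceeds its best similarity to any sequence from another species. Similarity scores (e.g., BLAST bit scores) are assumed to be precomputed in a hypothetical dictionary keyed by unordered sequence pairs. The function names are illustrative, and this is not InParanoid's own implementation.

```python
from typing import Dict, List

# Hypothetical input: similarity scores (e.g. BLAST bit scores) keyed by an
# unordered pair of sequence IDs; missing pairs are treated as score 0.
Scores = Dict[frozenset, float]


def score(scores: Scores, a: str, b: str) -> float:
    """Return the similarity of a and b, or 0.0 if no hit was recorded."""
    return scores.get(frozenset((a, b)), 0.0)


def find_inparalogs(ortholog: str, species_of: Dict[str, str],
                    scores: Scores) -> List[str]:
    """Return same-species sequences more similar to `ortholog` than to any
    sequence from another species (literal reading of the stated rule)."""
    sp = species_of[ortholog]
    same = [s for s in species_of if species_of[s] == sp and s != ortholog]
    other = [s for s in species_of if species_of[s] != sp]
    inparalogs = []
    for cand in same:
        to_ortholog = score(scores, cand, ortholog)
        best_other = max((score(scores, cand, o) for o in other), default=0.0)
        if to_ortholog > best_other:
            inparalogs.append(cand)
    return inparalogs


# Toy example: X1 (species x) is the predicted ortholog of Y1 (species y).
species_of = {"X1": "x", "X2": "x", "X3": "x", "Y1": "y"}
scores = {
    frozenset(("X1", "Y1")): 300.0,
    frozenset(("X1", "X2")): 350.0,  # X2 closer to X1 than to Y1
    frozenset(("X2", "Y1")): 280.0,
    frozenset(("X1", "X3")): 100.0,  # X3 closer to Y1 than to X1
    frozenset(("X3", "Y1")): 150.0,
}
print(find_inparalogs("X1", species_of, scores))  # ['X2']
```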
M
merging all pairwise I
the number of internal conflicts. Furthermore, the algorithm uses a
‘cut-off’ parameter based on the distance of candidate inparalogs to
the predicted target ortholog to filter out weakly supported candidates.
M
The O
[42] by
using the Markov Cluster (MCL) algorithm for predicting orthogroups across
multiple species based on their sequence similarity information [3]. The algorithm
uses an ‘inflation rate’ parameter to regulate the
‘tightness’ of the predicted orthogroups. O
1.4) was obtained from
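To illustrate how the inflation rate governs the ‘tightness’ of clusters, the following is a generic sketch of the MCL iteration (expansion by matrix squaring, inflation by element-wise powering, column re-normalization) using NumPy. It is not OrthoMCL's own code: OrthoMCL first converts BLAST results into a weighted similarity graph, which is assumed here to be given as a symmetric `similarity` matrix. The function name `mcl` and its default values are hypothetical.

```python
import numpy as np


def mcl(similarity: np.ndarray, inflation: float = 2.0,
        max_iter: int = 100, tol: float = 1e-6) -> list:
    """Minimal Markov Cluster sketch: alternate expansion and inflation on a
    column-stochastic matrix until it stops changing, then read off clusters."""
    m = similarity.astype(float) + np.eye(len(similarity))  # add self-loops
    m /= m.sum(axis=0, keepdims=True)                      # column-normalize
    for _ in range(max_iter):
        prev = m
        m = m @ m                          # expansion: two-step random walk
        m = m ** inflation                 # inflation: favor strong transitions
        m /= m.sum(axis=0, keepdims=True)  # re-normalize columns
        if np.abs(m - prev).max() < tol:
            break
    # Rows that keep mass act as attractors; the nonzero columns of each
    # attractor row form one cluster. Duplicate clusters are skipped.
    clusters = []
    for row in m:
        members = frozenset(np.nonzero(row > tol)[0].tolist())
        if members and members not in clusters:
            clusters.append(members)
    return [sorted(c) for c in clusters]


# Toy graph: proteins 0-2 are mutually similar, as are proteins 3-4.
adj = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
])
print(mcl(adj))  # [[0, 1, 2], [3, 4]]
```

Raising `inflation` sharpens the strongest transition probabilities at every iteration, which tends to split the graph into more and smaller clusters; lowering it yields coarser groupings.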
The Reciprocal Best Hit (RBH) algorithm [4], [6], [12], [13] relies on BLAST [9], [43] to
identify pairwise orthologs between two species. According to the RBH algorithm,
two proteins X and Y from species
x and y, respectively, are considered
orthologs if protein X is the best BLAST hit for protein
Y and protein Y is the best BLAST hit for
protein X. We integrated a ‘filtering’ parameter
r that enabled us to avoid constructing orthogroups that
contained distant homologs by considering the degree to which the two proteins
differed in sequence length or BLAST alignment [44], [45]. Thus, putative
orthogroups are retained if:
From the above equation, it follows that r values close to 1 are
likely to filter out a larger number of putative orthologs, whereas
r values close to 0 are likely to include all putative
orthologs. The default mode of the algorithm does not use the filtering
parameter r.
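A minimal sketch of RBH in its default mode follows. Because the default mode does not apply the filtering parameter r, the sketch omits it. BLAST results are assumed to have been parsed into a hypothetical dictionary that maps each query ID to its (subject ID, bit score) hits in the other species; `best_hits` and `reciprocal_best_hits` are illustrative names, not part of any published tool.

```python
from typing import Dict, List, Tuple

# Hypothetical input: query ID -> list of (subject ID, bit score) BLAST hits.
Hits = Dict[str, List[Tuple[str, float]]]


def best_hits(hits: Hits) -> Dict[str, str]:
    """Map each query to its highest-scoring subject (first one wins ties)."""
    return {q: max(hs, key=lambda h: h[1])[0] for q, hs in hits.items() if hs}


def reciprocal_best_hits(x_vs_y: Hits, y_vs_x: Hits) -> List[Tuple[str, str]]:
    """Return (X, Y) pairs where Y is X's best hit and X is Y's best hit."""
    best_xy = best_hits(x_vs_y)
    best_yx = best_hits(y_vs_x)
    return sorted((x, y) for x, y in best_xy.items() if best_yx.get(y) == x)


# Toy example: X1 and Y1 are each other's best hit; X2's best hit is Y1,
# but Y1's best hit is X1, so (X2, Y1) is not reciprocal.
x_vs_y = {"X1": [("Y1", 250.0), ("Y2", 90.0)], "X2": [("Y1", 120.0)]}
y_vs_x = {"Y1": [("X1", 245.0), ("X2", 118.0)], "Y2": [("X1", 88.0)]}
print(reciprocal_best_hits(x_vs_y, y_vs_x))  # [('X1', 'Y1')]
```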
The Reciprocal Smallest Distance (RSD) algorithm [14] generates global sequence
alignments for a small number of top BLAST hits against a query gene
X from species x. RSD then calculates the
maximum likelihood evolutionary distance between X and its top
BLAST hits, identifying the gene with the smallest evolutionary distance from
X (e.g., gene Y from species
y). If the RSD search using gene Y from
species y as the query also identifies gene X from species x as its closest relative, then proteins
X and Y are considered orthologs [14], [15]. In RSD,
the user can modify the shape parameter α of the gamma
distribution, a key determinant of the estimated evolutionary distance between
genes. The RSD algorithm was obtained from
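A simplified sketch of the RSD logic follows. Two assumptions stand in for steps described above: sequences are taken as already globally aligned to one another (a hypothetical `seqs` dictionary of equal-length strings), and the maximum likelihood distance is replaced by the closed-form Poisson-gamma distance d = α[(1 - p)^(-1/α) - 1], where p is the proportion of differing aligned sites. The `top_hits` structure and all function names are illustrative.

```python
import math
from typing import Dict, List, Optional


def gamma_distance(seq_a: str, seq_b: str, alpha: float = 1.0) -> float:
    """Poisson-gamma distance between two pre-aligned protein sequences.
    Stand-in for RSD's maximum likelihood estimate; gap columns are skipped."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return math.inf
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 1.0:
        return math.inf
    return alpha * ((1.0 - p) ** (-1.0 / alpha) - 1.0)


def closest(query: str, candidates: List[str], seqs: Dict[str, str],
            alpha: float) -> Optional[str]:
    """Among the candidate hits, return the one at smallest distance."""
    if not candidates:
        return None
    return min(candidates, key=lambda c: gamma_distance(seqs[query], seqs[c], alpha))


def reciprocal_smallest_distance(x: str, top_hits: Dict[str, List[str]],
                                 seqs: Dict[str, str], alpha: float = 1.0,
                                 top_n: int = 3) -> Optional[str]:
    """Return X's ortholog Y if X and Y are each other's closest hit, else None."""
    y = closest(x, top_hits.get(x, [])[:top_n], seqs, alpha)
    if y is None:
        return None
    back = closest(y, top_hits.get(y, [])[:top_n], seqs, alpha)
    return y if back == x else None


# Toy example: BLAST ranks Y2 above Y1 for X1, but Y1 is closer in distance.
seqs = {"X1": "MKTAYIAK", "Y1": "MKTAYIAR", "Y2": "MKSAFIGR"}
top_hits = {"X1": ["Y2", "Y1"], "Y1": ["X1"], "Y2": ["X1"]}
print(reciprocal_smallest_distance("X1", top_hits, seqs))  # Y1
```

In this stand-in, d grows monotonically with p for any fixed α, so α rescales distances (a smaller α gives a larger d for the same p) without reordering candidates; the sketch therefore shows the effect of α on distance magnitude only.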