The algorithm is based on the calculation of the codon adaptation index (CAI) (10). Each codon is given a weight with respect to the subset of highly expressed genes defined for the organism under consideration. This weight, the relative adaptiveness $w_i$ of codon $i$, is defined as

$$w_i = \frac{f_i}{f_{\max(i)}} \tag{1}$$

where $f_i$ is the frequency of codon $i$ and $f_{\max(i)}$ is the frequency of the codon most often used to encode the same amino acid, both counted in the subset of highly expressed genes.
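The relative adaptiveness defined above can be computed from codon counts alone. The following is a minimal Python sketch, assuming the reference set of highly expressed genes is available as a list of in-frame coding sequences and that the standard genetic code applies. `relative_adaptiveness` and `CODON_TABLE` are illustrative names, not part of the PRODORIC software.

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, ..., GGG.
# '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def relative_adaptiveness(reference_genes):
    """Return {codon: w_i} with w_i = f_i / f_max(i), computed on reference_genes.

    Raw counts are used instead of frequencies: both share the same denominator,
    so the ratio f_i / f_max(i) is unchanged.
    """
    counts = Counter()
    for gene in reference_genes:
        gene = gene.upper()
        # Read in frame; a trailing partial codon is ignored.
        for p in range(0, len(gene) - len(gene) % 3, 3):
            codon = gene[p:p + 3]
            # Skip stop codons and anything that is not a valid DNA triplet.
            if CODON_TABLE.get(codon, "*") != "*":
                counts[codon] += 1

    # Count of the most frequently used synonymous codon for each amino acid.
    max_per_aa = {}
    for codon, n in counts.items():
        aa = CODON_TABLE[codon]
        max_per_aa[aa] = max(max_per_aa.get(aa, 0), n)

    # Codons never observed in the reference set receive no weight (an assumption;
    # other CAI implementations assign them a small default value instead).
    return {codon: n / max_per_aa[CODON_TABLE[codon]] for codon, n in counts.items()}
```

For example, `relative_adaptiveness(["GCTGCTGCCAAA"])` returns `{"GCT": 1.0, "GCC": 0.5, "AAA": 1.0}`: GCT is the preferred alanine codon, GCC is used half as often, and AAA is the only lysine codon observed.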
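The CAI of a gene $g$ is then calculated according to Equation 2, as the geometric mean of the relative adaptiveness values of its codons:

$$\mathrm{CAI}_g = \left( \prod_{k=1}^{N} w_k \right)^{1/N} \tag{2}$$

where $w_k$ is the relative adaptiveness of the $k$-th codon of $g$ and $N$ is the number of codons in $g$, excluding the initiation and stop codons.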
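Because the CAI is the geometric mean of the relative adaptiveness values of a gene's codons, it can be sketched in a few lines of Python. This assumes `w` is a dict mapping codons to positive relative adaptiveness values and that the gene is an in-frame coding sequence with a start and a stop codon; the function name `cai` and the choice to skip codons without a weight are assumptions, not the tool's documented behaviour.

```python
import math


def cai(gene, w):
    """Codon adaptation index: geometric mean of w over the codons of gene.

    Assumes gene is an in-frame coding sequence that starts with an initiation
    codon and ends with a stop codon; both are excluded, as in the definition of N.
    w maps codons to relative adaptiveness values (all > 0).
    """
    gene = gene.upper()
    codons = [gene[p:p + 3] for p in range(0, len(gene) - len(gene) % 3, 3)]
    inner = codons[1:-1]
    # Assumption: codons without a weight in w are skipped (and not counted in N).
    weights = [w[c] for c in inner if c in w]
    if not weights:
        raise ValueError("gene has no codons with a known relative adaptiveness")
    # Averaging logarithms is equivalent to the N-th root of the product,
    # but avoids floating-point underflow for long genes.
    return math.exp(sum(math.log(x) for x in weights) / len(weights))
```

With `w = {"GCT": 1.0, "GCC": 0.25}`, the gene `ATGGCTGCCTAA` is scored on GCT and GCC only, giving $\sqrt{1.0 \times 0.25} = 0.5$.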
The relative adaptiveness values for all genomes in the PRODORIC database were precomputed. The subset of highly expressed genes for each organism was defined with the algorithm proposed by Carbone et al. (13), which assumes that every genome contains a set of genes with high codon bias. The algorithm is iterative: starting from all genes of an organism, it shrinks the gene set in each iteration until only the 1% of the initial genes with the highest codon bias remain.
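The iterative narrowing of the gene set can be illustrated with a simplified Python sketch. The details of Carbone et al.'s procedure (the bias measure, how many genes are dropped per round, the stopping rule) are not given here, so this is a hypothetical stand-in rather than their algorithm: each round recomputes codon weights on the current set, scores every gene by the geometric mean of those weights and keeps the better-scoring fraction (`keep_fraction`, a hypothetical parameter) until 1% of the initial genes remain.

```python
import math
from collections import Counter

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def sense_codons(gene):
    """In-frame sense codons of gene (stop codons and invalid triplets dropped)."""
    gene = gene.upper()
    triplets = (gene[p:p + 3] for p in range(0, len(gene) - len(gene) % 3, 3))
    return [c for c in triplets if CODON_TABLE.get(c, "*") != "*"]


def weights_from(genes):
    """Relative adaptiveness w_i = f_i / f_max(i) computed on the given gene set."""
    counts = Counter(c for g in genes for c in sense_codons(g))
    best = Counter()  # highest synonymous-codon count per amino acid
    for codon, n in counts.items():
        aa = CODON_TABLE[codon]
        best[aa] = max(best[aa], n)
    return {codon: n / best[CODON_TABLE[codon]] for codon, n in counts.items()}


def bias_score(gene, w):
    """Geometric mean of w over the gene's sense codons (0.0 if none are scorable)."""
    logs = [math.log(w[c]) for c in sense_codons(gene) if c in w]
    return math.exp(sum(logs) / len(logs)) if logs else 0.0


def select_highly_expressed(genes, final_fraction=0.01, keep_fraction=0.5):
    """Shrink the gene set until final_fraction of the initial genes remain.

    Hypothetical simplification: each round recomputes weights on the current set,
    ranks genes by bias_score and keeps the top keep_fraction (never fewer than the target).
    """
    if not 0 < keep_fraction < 1:
        raise ValueError("keep_fraction must be between 0 and 1")
    target = max(1, math.ceil(final_fraction * len(genes)))
    current = list(genes)
    while len(current) > target:
        w = weights_from(current)
        ranked = sorted(current, key=lambda g: bias_score(g, w), reverse=True)
        current = ranked[:max(target, int(len(current) * keep_fraction))]
    return current
```

Recomputing the weights in every round is what makes the selection self-reinforcing: genes that match the codon preferences of the current set survive, and the preferences are then re-estimated from the survivors.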
The optimization of a given sequence proceeds in two steps. First, the input is checked to determine whether it is a valid gene sequence or a valid amino acid sequence; a gene sequence is then translated into its amino acid sequence. Second, the amino acid sequence is back-translated into a gene sequence by replacing each amino acid with the codon that has the highest relative adaptiveness for that amino acid, until the whole sequence has been retranslated.
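The two optimization steps can be sketched in Python as follows. The validity rules for gene and amino acid sequences are assumptions, since they are not spelled out here, and `to_protein` and `optimize` are hypothetical names; `w` is assumed to map codons to relative adaptiveness values. Stop codons are dropped during translation and are not re-added in this sketch.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def to_protein(seq):
    """Step 1: validate seq and return its amino acid sequence.

    Assumed rules: a gene contains only A/C/G/T (or U), has a length divisible by 3
    and no internal stop codon; an amino acid sequence uses only the 20 standard
    one-letter codes. A trailing stop is dropped. Inputs valid under both rules
    (e.g. 'ACG') are treated as genes.
    """
    s = "".join(seq.split()).upper()
    dna = s.replace("U", "T")
    if dna and set(dna) <= set("ACGT") and len(dna) % 3 == 0:
        protein = "".join(CODON_TABLE[dna[p:p + 3]] for p in range(0, len(dna), 3))
        if protein.endswith("*"):
            protein = protein[:-1]
        if "*" in protein:
            raise ValueError("gene sequence contains an internal stop codon")
        return protein
    if s.endswith("*"):
        s = s[:-1]
    if s and set(s) <= STANDARD_AA:
        return s
    raise ValueError("input is neither a valid gene nor a valid amino acid sequence")


def optimize(seq, w):
    """Step 2: back-translate using, for each amino acid, the codon with the highest w."""
    protein = to_protein(seq)
    best = {}
    for codon, weight in w.items():
        aa = CODON_TABLE[codon]
        # Ties keep the codon encountered first.
        if aa not in best or weight > w[best[aa]]:
            best[aa] = codon
    missing = set(protein) - best.keys()
    if missing:
        raise ValueError(f"no codon weights for amino acids: {sorted(missing)}")
    return "".join(best[aa] for aa in protein)
```

With `w = {"GCT": 1.0, "GCC": 0.5, "AAA": 0.4, "AAG": 1.0, "ATG": 1.0}`, both `optimize("ATGGCCAAATAA", w)` and `optimize("MAK", w)` return `"ATGGCTAAG"`, since a gene and its translation lead to the same back-translated sequence.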