EukHighConfidenceFilter in tRNAscan-SE 2.0 is a post-scan filtering program for better distinguishing tRNA-derived repetitive elements in large eukaryotic genomes (metazoans and plants) from ‘real’ tRNAs that function in protein translation. Three filtering stages are involved in the classification (Supplementary Figure S5). First, tRNA predictions that are labeled as possible pseudogenes are excluded from the high confidence set; these criteria were established by the prior version of tRNAscan-SE (overall score below 55 bits and one of two conditions: primary sequence score below 10 bits or secondary structure score below 5 bits). Second, predictions with any of the following attributes are removed from the high confidence set: isotype-specific model score below 70 bits, overall score below 50 bits, or secondary structure score below 10 bits. Finally, if more than 40 predicted hits remain for any given anticodon, a dynamic score threshold is used: starting at 71 bits, the threshold is raised one bit at a time and lower-scoring hits are filtered out, iteratively, until the number of predictions for that anticodon is no longer over 40 or the score threshold reaches 95 bits. The score thresholds in stages two and three were empirically determined by comparing score distributions of predictions among eukaryotic genomes with and without large numbers of tRNA-derived repetitive elements (Supplementary Figure S6). The remaining tRNA predictions are included in the high confidence set if (i) they have a consistent isotype prediction (inferred from anticodon versus the highest scoring isotype-specific model) and (ii) they have an ‘expected’ anticodon based on known decoding strategies in eukaryotes where 15 anticodons are not used (67). There is no corresponding filter for bacterial or archaeal tRNAs, as these genomes have not been found to contain large families of tRNA-derived repetitive elements.
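The three filtering stages described above can be illustrated with a minimal Python sketch. This is not the EukHighConfidenceFilter source code: the `Prediction` record, its field names, and all function names are hypothetical and introduced only for illustration. The sketch also makes two assumptions that the text does not state explicitly: that the dynamic threshold in stage three is applied to the isotype-specific model score, and that the filter is still applied once at 95 bits before stopping. The final checks on isotype consistency and expected anticodons are not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Hypothetical record for one tRNA prediction (scores in bits)."""
    anticodon: str
    overall: float        # overall score
    primary: float        # primary sequence score
    secondary: float      # secondary structure score
    isotype_model: float  # isotype-specific model score


def is_possible_pseudogene(p: Prediction) -> bool:
    # Stage 1: overall < 55 AND (primary < 10 OR secondary < 5).
    return p.overall < 55 and (p.primary < 10 or p.secondary < 5)


def passes_stage_two(p: Prediction) -> bool:
    # Stage 2: a prediction is removed if ANY of the three cutoffs is missed.
    return not (p.isotype_model < 70 or p.overall < 50 or p.secondary < 10)


def dynamic_filter(hits, max_hits=40, start=71, stop=95):
    # Stage 3: raise the threshold one bit at a time until at most
    # `max_hits` remain or the threshold reaches `stop`.
    # ASSUMPTION: the threshold applies to the isotype-specific model score,
    # and the filter is applied once at `stop` before giving up.
    kept = list(hits)
    threshold = start
    while len(kept) > max_hits:
        kept = [p for p in kept if p.isotype_model >= threshold]
        if threshold >= stop:
            break
        threshold += 1
    return kept


def high_confidence_candidates(predictions):
    # Apply stages 1 and 2 per prediction, then stage 3 per anticodon.
    by_anticodon = defaultdict(list)
    for p in predictions:
        if is_possible_pseudogene(p) or not passes_stage_two(p):
            continue
        by_anticodon[p.anticodon].append(p)
    result = []
    for hits in by_anticodon.values():
        result.extend(dynamic_filter(hits))
    return result
```

For example, given 45 predictions for one anticodon whose isotype-specific model scores run from 70 to 114 bits, the sketch raises the threshold from 71 to 75 bits, at which point exactly 40 predictions remain and the iteration stops.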