pHMMs (25, 26) are a widely used probabilistic representation of protein domain families and provide a convenient way to search for known domains in given protein sequences. pHMMs are versatile. First, models are publicly available, e.g. from the Pfam protein family database (27), which contains various prebuilt models of protein domains associated with the process of retrotransposition. Second, pHMMs can easily be built from custom multiple sequence alignments. Because of this flexibility, pHMMs were chosen to model protein domains in LTR retrotransposon candidates. For the analyses performed in this work, collections of protein domain models associated with LTR retrotransposons were compiled (Supplementary Data 1, Tables B1 and B2). Given such a user-configurable set D of domain models in HMMER format, LTRdigest searches the translations of all six reading frames of an LTR retrotransposon candidate sequence for matches to every model in D. If the candidate contains frameshifts, a single protein domain may yield multiple partial hits that lie in different reading frames. If more than one hit per domain model is found in a candidate, the individual hits are combined using a chaining algorithm adapted from the gene prediction software GenomeThreader (28). This algorithm finds an optimal sequence of individual hits, i.e. the chain that best represents the model-sequence alignment. Finally, for every hit in the optimal chain whose E-value is below a user-defined threshold, the amino acid start and end positions in the translated sequence are mapped back to the corresponding coordinates in the DNA sequence before they are reported.
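The chaining and coordinate mapping steps described above can be illustrated with a minimal Python sketch. It is not the GenomeThreader-derived algorithm used by LTRdigest: it substitutes a simple quadratic-time colinear chaining that forbids overlaps and ignores gap costs. The `Hit` record, the function names, the frame convention (+1 to +3 forward, -1 to -3 on the reverse complement) and the 0-based half-open coordinates are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one pHMM hit in a translated reading frame."""
    frame: int        # +1, +2, +3 (forward) or -1, -2, -3 (reverse complement)
    aa_start: int     # 0-based, inclusive, in the translated frame
    aa_end: int       # 0-based, exclusive
    model_start: int  # 0-based, inclusive, position in the domain model
    model_end: int    # exclusive
    score: float
    evalue: float


def strand_coords(hit: Hit) -> Tuple[int, int]:
    """Nucleotide interval of a hit on the strand it was translated from."""
    offset = abs(hit.frame) - 1
    return offset + 3 * hit.aa_start, offset + 3 * hit.aa_end


def to_forward(start: int, end: int, frame: int, seq_len: int) -> Tuple[int, int]:
    """Convert strand-local coordinates to forward-strand coordinates."""
    if frame < 0:
        return seq_len - end, seq_len - start
    return start, end


def best_chain(hits: List[Hit]) -> List[Hit]:
    """Highest-scoring colinear chain among hits that share one strand."""
    ordered = sorted(hits, key=lambda h: strand_coords(h)[0])
    if not ordered:
        return []
    best = [h.score for h in ordered]  # best chain score ending at hit i
    prev = [-1] * len(ordered)         # predecessor of hit i in that chain
    for i, hi in enumerate(ordered):
        start_i, _ = strand_coords(hi)
        for j in range(i):
            hj = ordered[j]
            _, end_j = strand_coords(hj)
            # hj may precede hi only if it ends earlier in sequence AND model
            if end_j <= start_i and hj.model_end <= hi.model_start:
                if best[j] + hi.score > best[i]:
                    best[i] = best[j] + hi.score
                    prev[i] = j
    i = max(range(len(ordered)), key=best.__getitem__)
    chain = []
    while i != -1:
        chain.append(ordered[i])
        i = prev[i]
    return chain[::-1]


def report_domain_hits(hits: List[Hit], seq_len: int,
                       max_evalue: float) -> List[Tuple[int, int, int]]:
    """Chain the hits of one domain model, then report (start, end, frame)."""
    forward = [h for h in hits if h.frame > 0]
    reverse = [h for h in hits if h.frame < 0]
    chains = [best_chain(forward), best_chain(reverse)]
    chain = max(chains, key=lambda c: sum(h.score for h in c))
    return [to_forward(*strand_coords(h), h.frame, seq_len) + (h.frame,)
            for h in chain if h.evalue < max_evalue]


# Made-up example: a frameshift splits one domain between frames +1 and +3,
# while a weaker hit in frame +2 overlaps the first fragment.
hits = [
    Hit(frame=1, aa_start=10, aa_end=30, model_start=0, model_end=20,
        score=25.0, evalue=1e-8),
    Hit(frame=3, aa_start=31, aa_end=60, model_start=20, model_end=50,
        score=30.0, evalue=1e-10),
    Hit(frame=2, aa_start=12, aa_end=28, model_start=5, model_end=20,
        score=10.0, evalue=1e-3),
]
print(report_domain_hits(hits, seq_len=300, max_evalue=1e-5))
```

In this invented example the chain joins the fragments in frames +1 and +3 and leaves out the overlapping frame +2 hit, so the domain is reported as two DNA intervals on the forward strand. As in the text, the E-value filter is applied only after the chain has been selected.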