To identify the chromosomal coordinates of HML-2 proviruses in human DNA, we searched the most recent genome build (GRCh37/hg19, February 2009) using the UCSC BLAT program [49] for sequences related to the full-length nucleotide sequence of the K113 provirus (AY037928) [16,49]. The DNA flanking each individual hit was manually searched for sequences with high similarity to prototypical HML-2 sequences, as determined by the RepeatMasker program in the UCSC Genome Browser [67]. For each identified locus, complete nucleotide sequences were generated by extracting and concatenating the internal and LTR proviral segments. Additional BLAT searches with individual K113 genes (gag, pro, pol, and env) were performed to identify further HML-2 elements within the available genome. Complete sequence reconstruction was performed as above, with the minimum criterion for a provirus being either the presence of an LTR and a hit matching > 50% of the length of a full gene, or two proximal genes with > 50% hits and no LTR. All full-length sequences were initially aligned to K113 using ClustalW [88] and manually edited in BioEdit v.7.0.9.0 [89]. The full-length sequences for the HML-2 proviruses located at 10p12.1 (K103) and 19p12 (K113) were obtained from NCBI (accession numbers AF164611 and AY037928, respectively). To identify the K105 sequence, we used the flanking sequence of the K105 solo LTR to search the chimpanzee database, which identified a BAC (AC195095.2) with a provirus starting at position 74813. A BLAST search of the NCBI database then returned a sequence with 99% similarity that corresponded to a human provirus labeled K111 (GU476554). Given the high similarity between chimpanzee K105 and this human "K111", as well as its similarity to the deposited K105 5' and 3' LTRs (AH008413.1), we conclude that K111 is the human variant of the K105 provirus.
Furthermore, the K111 provirus clusters most closely with chimpanzee K105 in phylogenetic trees of gag, pol, and env, and with the published chimpanzee and human K105 5' and 3' LTR sequences (data not shown). The 12q13.2 provirus was sequenced in this study (described below). Provirus sequences were deposited in GenBank (accession numbers: JN675007-JN675097), along with their respective flanking sequences (accession numbers: JN675098-JN675187).
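The minimum criterion for calling a provirus can be written as a small decision rule. The following Python sketch is illustrative only and is not the procedure used in this study, where loci were curated manually. The `GeneHit` record, the `is_provirus` function, and the reading of "proximal" as neighbouring genes in the gag-pro-pol-env order are all assumptions, and gene lengths must be supplied by the user.

```python
from dataclasses import dataclass

# Gene order along the provirus; used here to decide which genes are "proximal".
# Treating "proximal" as "adjacent in this order" is an assumption of this sketch.
GENE_ORDER = ["gag", "pro", "pol", "env"]


@dataclass
class GeneHit:
    """One BLAT hit of a K113 gene query at a locus (hypothetical record)."""
    gene: str         # "gag", "pro", "pol" or "env"
    aligned_bp: int   # length of the hit in bp
    gene_length: int  # full length of the query gene in bp (user supplied)


def covers_half(hit: GeneHit) -> bool:
    """True if the hit matches more than 50% of the full gene length."""
    return hit.aligned_bp > 0.5 * hit.gene_length


def is_provirus(hits: list[GeneHit], has_ltr: bool) -> bool:
    """Apply the minimum criterion: an LTR plus one > 50% gene hit,
    or two proximal genes with > 50% hits when no LTR is present."""
    good = {h.gene for h in hits if covers_half(h)}
    if has_ltr:
        return len(good) >= 1
    # No LTR: require two neighbouring genes that both pass the 50% threshold.
    return any(a in good and b in good
               for a, b in zip(GENE_ORDER, GENE_ORDER[1:]))
```

A locus with an LTR and a gag hit covering 60% of the gene would pass, whereas a locus without an LTR that has only gag and env hits would not pass under the adjacency assumption made here.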
Separate searches were performed using the UCSC Genome Browser to identify the chromosomal coordinates of HML-2 solo LTRs. We queried the published sequence for elements corresponding to one of three HML-2 LTR subgroups: LTR5Hs (canonical sequence is ~986 bp); LTR5A (~1004 bp); or LTR5B (~1002 bp). Sequences corresponding to solo LTRs were extracted, aligned using ClustalW, and manually edited in BioEdit v.7.0.9.0 as described above. LTRs associated with internal HML-2 gene sequences, and in the same orientation as those sequences, were excluded to ensure that only solo LTRs were analyzed. For the remaining elements, an arbitrary length cut-off of 750 bp was applied so that only the most intact elements in each group were included.
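The two solo LTR filters described above (excluding LTRs associated with internal sequence in the same orientation, then applying the 750 bp cut-off) could be sketched as follows. This is a minimal illustration and not the authors' procedure. The record types, the function names, and in particular `max_gap_bp`, the distance within which an LTR counts as "associated" with internal sequence, are hypothetical, because the text gives no distance threshold. Treating the cut-off as inclusive (length >= 750 bp) is also an assumption.

```python
from dataclasses import dataclass

MIN_SOLO_LTR_BP = 750  # arbitrary cut-off stated in the text


@dataclass
class LtrHit:
    """An annotated HML-2 LTR element (hypothetical record)."""
    name: str
    subgroup: str  # "LTR5Hs", "LTR5A" or "LTR5B"
    chrom: str
    start: int     # 0-based start
    end: int       # end, exclusive
    strand: str    # "+" or "-"


@dataclass
class InternalHit:
    """An annotated internal HML-2 gene segment (hypothetical record)."""
    chrom: str
    start: int
    end: int
    strand: str


def is_associated(ltr: LtrHit, internal: InternalHit, max_gap_bp: int) -> bool:
    """True if the LTR lies on the same chromosome and strand as the internal
    segment and within max_gap_bp of it (overlap gives a negative gap)."""
    if ltr.chrom != internal.chrom or ltr.strand != internal.strand:
        return False
    gap = max(ltr.start, internal.start) - min(ltr.end, internal.end)
    return gap <= max_gap_bp


def select_solo_ltrs(ltrs: list[LtrHit], internals: list[InternalHit],
                     max_gap_bp: int = 500) -> list[LtrHit]:
    """Keep LTRs that are not associated with internal sequence and that
    meet the minimum length. The default max_gap_bp is an assumption."""
    kept = []
    for ltr in ltrs:
        if any(is_associated(ltr, i, max_gap_bp) for i in internals):
            continue  # provirus-associated LTR, not a solo LTR
        if ltr.end - ltr.start >= MIN_SOLO_LTR_BP:
            kept.append(ltr)
    return kept
```

Filtering on association before length means that a truncated provirus LTR is never counted as a short solo LTR. The order does not change which elements are retained.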