The GS20 emPCR process uses the high-fidelity polymerase Platinum Taq Hifidelity (Invitrogen), an enzyme mixture composed of recombinant Taq DNA polymerase, Pyrococcus spp. GB-D thermostable polymerase and Platinum
Taq antibody. This enzyme is marketed partly on its very low misincorporation rate, 2 × 10⁻⁶ (Invitrogen). In this study we find the actual rate of misincorporation to be higher (≈7 × 10⁻⁴), similar to results from a previous aDNA study that also specifically examined these properties of this enzyme (8). To discriminate between true aDNA damage and either enzyme error or potential damage that may have arisen during the DNA extraction or been present in the DNA before extraction, we analysed a further dataset of GS20 sequences, generated from a modern DNA extract, comprising 390 965 bp of
L. tulipifera cpDNA. These data are part of the first chloroplast genome sequenced using the GS20 (J.E. Carlson, J.H. Leebens-Mack and D.G. Peterson, manuscript in preparation) and constitute all the sequence reads between np 45 000 and 90 000 of the genome (J.E. Carlson, J.H. Leebens-Mack and S. Schuster, unpublished data). Although we are aware that, in theory, comparing cpDNA with mtDNA may introduce some complications, there is currently a paucity of datasets containing enough sequence data to enable meaningful statistical comparisons. This dataset therefore provides the most suitable information available at this time. The data analysed here have a maximal coverage of 36×, with mean and modal coverage of 8.7× and 8×, respectively. The L. tulipifera cpDNA sequences are available at the NCBI Trace Archives (Trace Identifiers 1367656065–1367659980). Analysis of the genomic data indicates that levels of heteroplasmy in the sample are negligible and thus unlikely to affect the analyses (J.E. Carlson, J.H. Leebens-Mack and S. Schuster, unpublished data). Furthermore, as DNA from this sample was freshly extracted from modern tissue, miscoding lesions observed in the data are unlikely to be due to anything other than PCR or other sequencing error arising during the GS20 data production process. The miscoding lesion spectrum was extracted from the data in the same manner as applied to the mtDNA data. For a data summary see Table 1.
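As an illustration of how the maximal, mean and modal coverage figures quoted above can be derived, the following is a minimal sketch that assumes each read has already been aligned and reduced to a half-open (start, end) interval on the reference. The function name `coverage_stats` and the toy read coordinates are hypothetical and are not taken from the GS20 data or from the pipeline used in this study.

```python
from statistics import mean, mode

def coverage_stats(reads, region_start, region_end):
    """Return (max, mean, mode) per-position coverage over a region.

    reads: iterable of (start, end) half-open alignment intervals
           (an assumed input format, for illustration only).
    """
    depth = [0] * (region_end - region_start)
    for start, end in reads:
        # Clip each read to the region of interest before counting.
        lo = max(start, region_start)
        hi = min(end, region_end)
        for pos in range(lo, hi):
            depth[pos - region_start] += 1
    return max(depth), mean(depth), mode(depth)

# Toy example: four overlapping reads on a 10 bp region (made-up data).
toy_reads = [(0, 6), (2, 8), (4, 10), (5, 10)]
max_cov, mean_cov, modal_cov = coverage_stats(toy_reads, 0, 10)
print(max_cov, mean_cov, modal_cov)
```

For a real dataset the same statistics would more likely be computed from an alignment file with a dedicated tool. The explicit per-position loop is used here only to make the definition of coverage visible.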
A χ² test of independence was used to investigate whether the distribution of miscoding lesions was the same in the mammoth and chloroplast sequence data. The data were first summarized into six complementary damage pairs (Table 1). Subsequently, because nucleotide usage differs between the mammoth and chloroplast data, tests were performed separately on those miscoding lesions that originated from an A or T (A+T) and those that originated from a G or C (G+C).
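The two steps described above, collapsing substitutions into complementary pairs and then testing independence within one class, can be sketched as follows. This is an illustrative outline only: the helper names (`pair_label`, `collapse`, `chi2_independence`) are hypothetical, the counts are invented toy values rather than the figures in Table 1, and the code is not the authors' analysis. Each class (A+T or G+C) contains three complementary pairs, so a two-dataset comparison gives a 2 × 3 table with 2 degrees of freedom, for which the χ² upper-tail probability is exactly exp(−χ²/2).

```python
import math

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def pair_label(src, dst):
    """Label shared by a substitution and its complement, e.g. C>T and G>A."""
    forward = f"{src}>{dst}"
    reverse = f"{COMPLEMENT[src]}>{COMPLEMENT[dst]}"
    return "/".join(sorted((forward, reverse)))

def collapse(counts):
    """Sum substitution counts into complementary damage pairs."""
    pairs = {}
    for (src, dst), n in counts.items():
        key = pair_label(src, dst)
        pairs[key] = pairs.get(key, 0) + n
    return pairs

def chi2_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    Assumes every expected count is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Invented toy counts for lesions originating from A or T (not real data).
mammoth_at = {("A", "G"): 6, ("T", "C"): 4, ("A", "C"): 12,
              ("T", "G"): 8, ("A", "T"): 15, ("T", "A"): 15}
chloroplast_at = {("A", "G"): 11, ("T", "C"): 9, ("A", "C"): 10,
                  ("T", "G"): 10, ("A", "T"): 12, ("T", "A"): 8}

m_pairs, c_pairs = collapse(mammoth_at), collapse(chloroplast_at)
labels = sorted(m_pairs)
table = [[m_pairs[k] for k in labels], [c_pairs[k] for k in labels]]

stat, df = chi2_independence(table)
p_value = math.exp(-stat / 2)  # exact upper tail only when df == 2
print(labels, round(stat, 3), df, round(p_value, 4))
```

Running the G+C class through the same functions gives the second of the two separate tests. For tables with other degrees of freedom, the p-value would need a general χ² distribution function from a statistics library rather than the closed form used here.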
Gilbert M.T., Binladen J., Miller W., Wiuf C., Willerslev E., Poinar H., Carlson J.E., Leebens-Mack J.H., & Schuster S.C. (2006). Recharacterization of ancient DNA miscoding lesions: insights in the era of sequencing-by-synthesis. Nucleic Acids Research, 35(1), 1-10.