Metagenomic reads from ancient samples may contain a mixture of reads from the species of interest and reads from genetically similar taxa that represent environmental contamination. To remove such nonspecific reads after extraction with the EToKi prepare module, the EToKi assemble module can be used to align the extracted reads after comparing them with an ingroup of genomes related to the species of interest and with an outgroup of genomes from other species. In the case of Figure 5, the ingroup consisted of Y. pestis genomes CO92 (2001), Pestoides F, KIM10+, and 91001, and the outgroup consisted of Y. pseudotuberculosis genomes IP32953 and IP31758, Y. similis 228, and Y. enterocolitica 8081. Reads that had higher alignment scores to the outgroup genomes than to the ingroup genomes were excluded. Prior to mapping reads to the Y. pestis reference genome (CO92) (2001), a pseudogenome was created in which all nucleotides were masked to ensure that only nucleotides supported by metagenomic reads would be used for phylogenetic analysis. For the 13 ancient genomes whose publications included complete SNP lists, we unmasked the sites in the pseudogenomes that were included in the published SNP lists. For the other 43 genomes, EToKi was used as in Supplemental Figure S6 to map the filtered metagenomic reads onto the pseudogenome with minimap2 (Li 2018), evaluate them with Pilon (Walker et al. 2014), and unmask sites in the pseudogenome that were covered by three or more reads and had a consensus base supported by ≥80% of the mapped reads. All 56 pseudogenomes were uploaded to EnteroBase together with their associated metadata.
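The two decision rules described above (the ingroup/outgroup read filter and the coverage and consensus thresholds for unmasking) can be illustrated with a minimal Python sketch. This is not EToKi's implementation, and it does not call minimap2 or Pilon. The function names `keep_read` and `build_pseudogenome`, the use of `N` as the masking character, and the per-site list of observed bases are all assumptions made for illustration. The sketch also assumes that the 80% threshold applies to the reads covering each site.

```python
from collections import Counter

MASK = "N"  # assumed masking character; the text does not specify one


def keep_read(ingroup_score, outgroup_score):
    """Exclude a read only if it aligns better to the outgroup than to the ingroup.

    Scores are assumed to be the best alignment score of the read against
    any ingroup genome and any outgroup genome, respectively. Ties are kept.
    """
    return not (outgroup_score > ingroup_score)


def build_pseudogenome(reference_length, pileup, min_depth=3, min_support=0.8):
    """Start from a fully masked sequence and unmask only well-supported sites.

    pileup maps a 0-based reference position to the list of bases observed
    at that position in the filtered, mapped reads (a hypothetical structure).
    """
    pseudo = [MASK] * reference_length
    for pos, bases in pileup.items():
        depth = len(bases)
        if depth < min_depth:
            continue  # fewer than three reads: site stays masked
        base, count = Counter(bases).most_common(1)[0]
        if count / depth >= min_support:
            pseudo[pos] = base  # consensus base supported by >= 80% of reads
    return "".join(pseudo)


if __name__ == "__main__":
    example = {
        0: ["A", "A", "A"],            # depth 3, full support: unmasked
        1: ["C", "C"],                 # depth 2: stays masked
        2: ["G", "G", "G", "G", "T"],  # 4 of 5 reads agree: unmasked
        3: ["T", "T", "A"],            # 2 of 3 reads agree: stays masked
    }
    print(build_pseudogenome(5, example))
```

Starting from a fully masked sequence means that a site with no mapped reads can never contribute to the phylogeny, which matches the stated purpose of the masking step.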