panini was applied to a collection of 616 systematically sampled pneumococcal isolates from a vaccine and antimicrobial-resistance surveillance project in Massachusetts, USA [14]. The original analysis of gene content in this collection identified 5442 ‘clusters of orthologous genes’ (COGs) [2], the core set of which was used to define 15 ‘sequence clusters’ with baps (http://www.helsinki.fi/bsg/software/BAPS) [15]. For most sequence clusters, the correspondence between a group in the panini output and the original sequence cluster was exact (Fig. 2a), reflecting the similarity of their members in both the core and accessory genomes [16]. These sets of isolates therefore represent well-defined, distinct lineages. However, SC1, SC6, SC10 and SC12 all exhibited distinct substructuring in the panini output. This substructure corresponded well with the core-genome diversity observed in these clusters (Fig. 2b), and in each case the panini groups were consistent with clades within the sequence cluster. These sequence clusters are therefore likely to represent amalgams of genotypes that should be subdivided into multiple clusters. Conversely, within the previously unclustered SC16, panini revealed clear substructuring that was also consistent with the core-genome phylogeny. Hence, panini facilitates the division of a diverse population into discrete genotypes that are coherent in both their accessory- and core-genome content.
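The comparison described above, between sequence clusters and groups in the accessory-genome output, can be sketched as a simple cross-tabulation. This is a minimal illustration and not the authors' procedure. It assumes each isolate already has two labels: a sequence cluster and an accessory-genome group (how such groups are delineated from the panini output is not covered here). The function names, isolate names and labels are hypothetical, and the toy data are invented for illustration.

```python
from collections import defaultdict


def cross_tabulate(sequence_cluster, accessory_group):
    """Count isolates for each (sequence cluster, accessory group) pair."""
    table = defaultdict(lambda: defaultdict(int))
    for isolate, sc in sequence_cluster.items():
        table[sc][accessory_group[isolate]] += 1
    return table


def substructured_clusters(table):
    """Sequence clusters whose isolates are spread over several accessory groups."""
    return sorted(sc for sc, groups in table.items() if len(groups) > 1)


def exact_matches(table):
    """Sequence clusters that map one-to-one onto a single accessory group."""
    group_to_scs = defaultdict(set)
    for sc, groups in table.items():
        for group in groups:
            group_to_scs[group].add(sc)
    return sorted(
        sc
        for sc, groups in table.items()
        if len(groups) == 1 and len(group_to_scs[next(iter(groups))]) == 1
    )


# Toy labels, invented for illustration only (not data from the study).
sequence_cluster = {
    "iso1": "SC_A", "iso2": "SC_A",
    "iso3": "SC_B", "iso4": "SC_B", "iso5": "SC_B",
}
accessory_group = {
    "iso1": "g1", "iso2": "g1",
    "iso3": "g2", "iso4": "g3", "iso5": "g3",
}

table = cross_tabulate(sequence_cluster, accessory_group)
split = substructured_clusters(table)
exact = exact_matches(table)
print("exact correspondence:", exact)
print("substructured:", split)
```

In this toy input, `SC_A` corresponds exactly to one accessory group, whereas `SC_B` is split across two groups and would be a candidate for subdivision, subject to checking that the split agrees with clades in the core-genome phylogeny.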