Bioinformatics methods are described and updated at https://github.com/BenJamesMetcalf. emm subtypes were obtained on the basis of a database of defined 180-bp sequences maintained at the CDC (ftp://ftp.cdc.gov/pub/infectious_diseases/biotech/tsemm/). This subtyping scheme is based on a sequence that consists of 10 codons corresponding to the C-terminal end of the M protein signal sequence and 50 codons corresponding to the N terminus of the mature M protein (46). The WGS emm typing scheme employs de novo assembly and queries sequences closely linked to the 21-bp emm typing primer 1 (27), which is situated adjacent to the emm type-specific region.
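The 180-bp subtype length follows from the codon counts given above: (10 + 50) codons × 3 bp = 180 bp. As a minimal sketch of the lookup idea only, the Python below searches an assembled contig, on both strands, for an exact match to any 180-bp entry in a subtype database. It is not the pipeline in the repository cited above: plain exact string matching is swapped in for the assembly-based, primer-linked query, and the function names, the subtype labels (`toyA.0`, `toyB.0`), and the randomly generated sequences are all hypothetical placeholders, not real emm alleles.

```python
import random
from typing import Dict, Optional

# 10 signal-sequence codons + 50 mature-protein codons, 3 bp each = 180 bp
SUBTYPE_LENGTH_BP = (10 + 50) * 3

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_emm_subtype(contig: str, subtype_db: Dict[str, str]) -> Optional[str]:
    """Return the first subtype whose 180-bp sequence occurs exactly in the
    contig on either strand, or None if no entry matches.

    Assumption: exact matching stands in for the real alignment step.
    """
    forward = contig.upper()
    for strand in (forward, reverse_complement(forward)):
        for name, ref in subtype_db.items():
            if len(ref) != SUBTYPE_LENGTH_BP:
                raise ValueError(f"{name}: expected {SUBTYPE_LENGTH_BP} bp, got {len(ref)}")
            if ref.upper() in strand:
                return name
    return None


def random_dna(rng: random.Random, length: int) -> str:
    """Generate a toy DNA sequence (hypothetical data, not a real allele)."""
    return "".join(rng.choice("ACGT") for _ in range(length))


# Toy usage: embed one hypothetical subtype sequence between random flanks.
rng = random.Random(1)
toy_db = {"toyA.0": random_dna(rng, 180), "toyB.0": random_dna(rng, 180)}
toy_contig = random_dna(rng, 50) + toy_db["toyB.0"] + random_dna(rng, 50)
print(find_emm_subtype(toy_contig, toy_db))
```

Checking both strands matters because an assembler may emit a contig in either orientation. A production workflow would also need to tolerate mismatches in order to flag novel subtypes, which exact matching cannot do.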
A PBP2x transpeptidase amino acid sequence type was generated for each isolate as described for GBS PBP2x for detection of first-step mutations leading to β-lactam resistance (17). Additionally, the ARG-ANNOT and ResFinder resistance gene databases were incorporated (23, 24). Sequence targets for detection of the presence/absence of 21 T antigen backbone (tee) genes (29), the gacI glycosyl transferase specific for the group A antigen (28), the hyaluronic acid synthetic locus hasA (47), emm-like genes that flank emm (9), four different fibronectin-binding domain repeat proteins (48), the R28 surface antigen (30), the sda1-encoded DNase (49), sequence polymorphisms associated with the nga operon (4, 5), two conserved rocA null mutations (50, 51), 12 exotoxin genes (speA to speC, speG to speM, ssa, smeZ) (52), and the streptococcal inhibitor of complement (31, 32) were obtained through the references indicated.