The sequence definition database was seeded using the core loci identified in finished Neisseria meningitidis genome annotations. Locus tag identifiers consisting of ‘NEIS’ followed by an integer were adopted to allow automated accessioning of loci as they are identified and added to the database. The list of NEIS (short for ‘Neisseria genus’) loci was determined using the genome annotations of FAM18, H44/76, G2136, Z2491 and MC58 and represents, notionally, the pan-genome of the meningococcus. It includes the ribosomal protein loci, a subset of the core loci that are also orthologous across all bacterial species [40]. Each NEIS identifier is linked to an alias table containing additional nomenclature for that locus, such as locus tags from specific finished genomes, KEGG EC numbers or common names. The alias table is searchable, which makes the NEIS identifiers cross-compatible with various annotations. The number of NEIS loci is not static and will change as loci are curated and added to the database over time.
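The identifier scheme and alias lookup described above can be illustrated with a minimal sketch. It is not the BIGSdb implementation: the class name `LocusRegistry`, its methods, the four-digit zero padding and the alias strings are all hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict


class LocusRegistry:
    """Toy registry: sequential NEIS identifiers plus a searchable alias table.

    Hypothetical illustration only; it does not reflect the BIGSdb schema.
    """

    def __init__(self):
        self._next_number = 1
        self._aliases = defaultdict(set)  # NEIS id -> set of alias strings
        self._index = defaultdict(set)    # lower-cased name -> set of NEIS ids

    def add_locus(self, aliases=()):
        """Accession a new locus automatically and register its aliases."""
        # Assumption: identifiers are zero-padded to four digits.
        neis_id = f"NEIS{self._next_number:04d}"
        self._next_number += 1
        # The identifier itself is searchable too.
        self._index[neis_id.lower()].add(neis_id)
        for alias in aliases:
            self.add_alias(neis_id, alias)
        return neis_id

    def add_alias(self, neis_id, alias):
        """Attach an additional name (genome locus tag, common name, ...)."""
        self._aliases[neis_id].add(alias)
        self._index[alias.lower()].add(neis_id)

    def search(self, name):
        """Return every NEIS id known under this name (case-insensitive)."""
        return set(self._index.get(name.lower(), set()))

    def aliases(self, neis_id):
        """Return the aliases recorded for one locus, sorted for stable output."""
        return sorted(self._aliases.get(neis_id, set()))


# Placeholder aliases, not real locus tags or gene names.
registry = LocusRegistry()
first = registry.add_locus(["GENOME_A_0001", "geneA"])
second = registry.add_locus(["GENOME_A_0002"])
hits = registry.search("genea")
```

Because the lookup returns a set, one alias can map to more than one locus, which is a design choice of this sketch; the source does not say how ambiguous aliases are handled.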
The draft genome sequences were queried within BIGSdb using BLAST against the sequence definition database to identify defined allelic variation. For loci with existing definitions, alleles were automatically annotated and assigned the appropriate allele number, a process referred to as ‘tagging’, while new alleles were manually curated and assigned a new allele accession number. Gene sequences containing frameshift mutations, internal stop codons or similar disruptions were assigned an allele designation and flagged as having an internal stop codon. Gene sequences with missing data, i.e. those at the ends of contigs, were flagged as incomplete and not assigned an allele number. Once identified, each allelic variant was linked to the isolate metadata.
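The decision rules in this tagging step can be sketched as follows. This is a simplified illustration under stated assumptions, not the BIGSdb code: an exact dictionary lookup stands in for the BLAST query, automatic numbering stands in for manual curation of new alleles, and the names `TagResult`, `has_internal_stop` and `tag_sequence` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass
class TagResult:
    allele_id: Optional[int]            # None when no allele number is assigned
    flags: List[str] = field(default_factory=list)
    is_new: bool = False                # True if the allele was not yet defined


def has_internal_stop(seq: str) -> bool:
    """Report a stop codon anywhere before the final complete codon."""
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    return any(codon in STOP_CODONS for codon in codons[:-1])


def tag_sequence(seq: str, known: Dict[str, int], complete: bool = True) -> TagResult:
    """Assign an allele number to one locus sequence from a draft genome.

    known maps allele sequence -> allele number for a single locus.
    Assumption: exact matching replaces the BLAST search used in the source.
    """
    seq = seq.upper()
    if not complete:
        # Sequence runs off the end of a contig: flag it, assign no number.
        return TagResult(allele_id=None, flags=["incomplete"])

    flags = ["internal stop codon"] if has_internal_stop(seq) else []
    allele_id = known.get(seq)
    is_new = allele_id is None
    if is_new:
        # Stand-in for manual curation: take the next free allele number.
        allele_id = max(known.values(), default=0) + 1
        known[seq] = allele_id
    return TagResult(allele_id=allele_id, flags=flags, is_new=is_new)


# Made-up sequences used only to exercise the three branches.
definitions = {"ATGAAATAA": 1}
tagged = tag_sequence("atgaaataa", definitions)
disrupted = tag_sequence("ATGTAAAAATAA", definitions)
truncated = tag_sequence("ATGAAA", definitions, complete=False)
```

The incomplete check comes first so that truncated sequences never receive an allele number, matching the order of precedence implied by the text.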