Eight genomes containing between zero and seven CRISPR loci were chosen at random from CRISPRdb (20) (Table 1). Synthetic Illumina data sets (101 bp reads) representing ∼20× coverage were generated for each genome using Grinder 0.4.5 (21) with the command-line options -cf 20 -rd 101 -md poly4. Crass 0.3.1 was run on each data set with a k-mer length of 9 (all other parameters default). The spacers identified by Crass were mapped onto the reference genome using blastn 2.2.25+ (22) to determine whether they were correctly positioned. The spacer graph for each data set was also analysed to determine whether the ordering of spacers generated by Crass accurately reflected the CRISPR loci found in the original genome assembly.
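The positional check on mapped spacers can be sketched as follows. This is a minimal illustration rather than the authors' script: it assumes blastn was run with tabular output (`-outfmt 6`), and the locus coordinates, spacer IDs, sequence name `chr` and helper `classify_hits` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical CRISPR locus interval on the reference (1-based, inclusive).
LOCI: Dict[str, Tuple[int, int]] = {"CRISPR1": (1000, 1600)}

# Hypothetical blastn hits in the default -outfmt 6 column order:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
BLAST_TSV = """\
spacer_1\tchr\t100.00\t32\t0\t0\t1\t32\t1040\t1071\t1e-10\t60.0
spacer_2\tchr\t100.00\t32\t0\t0\t1\t32\t1139\t1108\t1e-10\t60.0
spacer_3\tchr\t100.00\t32\t0\t0\t1\t32\t5000\t5031\t1e-10\t60.0
"""


def classify_hits(tsv: str, loci: Dict[str, Tuple[int, int]]) -> Dict[str, List[str]]:
    """Map each spacer ID to the names of the loci that fully contain its hit."""
    placed: Dict[str, List[str]] = {}
    for line in tsv.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        spacer_id = fields[0]
        # Minus-strand hits report sstart > send, so normalise the interval.
        start, end = sorted((int(fields[8]), int(fields[9])))
        inside = [name for name, (lo, hi) in loci.items() if lo <= start and end <= hi]
        placed.setdefault(spacer_id, []).extend(inside)
    return placed


placed = classify_hits(BLAST_TSV, LOCI)
# Spacers whose hits fall outside every annotated locus.
off_target = [spacer for spacer, names in placed.items() if not names]
print(placed)
print(off_target)
```

In this toy input, `spacer_3` maps outside the annotated locus, so it would count as a detected spacer that did not originate from a CRISPR.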

Table 1. Specificity and sensitivity analysis of Crass on synthetic short-read data sets

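| Genome / CRISPR locus | Total spacers | Detected spacers | Missing edges | Erroneous edges | Specificity | Sensitivity |
| --- | --- | --- | --- | --- | --- | --- |
| Bacteroides fragilis YCH46 | | | | | | |
| CRISPR1 | 9 | 7 | 3 | 0 | 1 | 0.63 |
| Acinetobacter sp. ADP1 | | | | | | |
| CRISPR1 | 6 | 6 | 0 | 0 | 1 | 0.83 |
| CRISPR2 | 21 | 21 | 0 | 0 | 1 | 1.00 |
| CRISPR3 | 90 | 88 | 2 | 0 | 1 | 0.98 |
| Sulfolobus solfataricus P2 | | | | | | |
| CRISPR1 | 102 | 102 | 0 | 5 | 1 | 0.95 |
| CRISPR2 | 94 | 94 | 0 | 0 | 1 | 0.96 |
| CRISPR3 | 31 | 31 | 0 | 0 | 1 | 1.00 |
| CRISPR4 | 95 | 95 | 0 | 3 | 1 | 0.97 |
| CRISPR5 | 6 | 5 | 1 | 0 | 1 | 0.80 |
| CRISPR6 | 22 | 22 | 0 | 0 | 1 | 1.00 |
| CRISPR7 | 65 | 64 | 1 | 0 | 1 | 0.98 |
| Natrialba magadii | | | | | | |
| CRISPR1 | 27 | 18 | 11 | 0 | 0.89 | 0.58 |
| Helicobacter pylori B8 | 0 | 0 | N/A | N/A | 1 | N/A |
| Magnetospirillum magneticum AMB-1 | 0 | 0 | N/A | N/A | 1 | N/A |
| Tsukamurella paurometabola | 0 | 0 | N/A | N/A | 1 | N/A |
| Oligotropha carboxidovorans OM5 | 0 | 0 | N/A | N/A | 1 | N/A |
| Overall | 568 | 553 | 18 | 8 | 0.99 | 0.89 |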

Crass was run on synthetic data sets constructed from four genomes containing between one and seven CRISPR loci and from four genomes containing no CRISPRs. Specificity was calculated from the number of detected spacers that did not originate from CRISPRs; sensitivity was determined by comparing the reconstructed spacer ordering with the ordering found in the genome.
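One way to read the missing-edge and erroneous-edge counts is as a comparison between the spacer adjacencies in the reference locus and those in the reconstructed spacer graph. The sketch below illustrates that comparison under this assumption. It treats edges as directed pairs of neighbouring spacers. The spacer names and the helper `compare_edges` are hypothetical, and nothing here is taken from the Crass source code.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def reference_edges(order: List[str]) -> Set[Edge]:
    """Adjacent spacer pairs implied by the spacer order in the genome."""
    return set(zip(order, order[1:]))


def compare_edges(order: List[str], graph_edges: Iterable[Edge]) -> Tuple[Set[Edge], Set[Edge]]:
    """Return (missing, erroneous) edges of a spacer graph versus the reference order."""
    expected = reference_edges(order)
    observed = set(graph_edges)
    missing = expected - observed    # adjacencies in the genome but absent from the graph
    erroneous = observed - expected  # adjacencies in the graph that the genome does not support
    return missing, erroneous


# Hypothetical locus with five spacers and a hypothetical reconstructed graph.
genome_order = ["sp1", "sp2", "sp3", "sp4", "sp5"]
graph = [("sp1", "sp2"), ("sp2", "sp3"), ("sp3", "sp5")]

missing, erroneous = compare_edges(genome_order, graph)
print(sorted(missing))
print(sorted(erroneous))
```

Here the graph skips `sp4`, which gives two missing edges and one erroneous edge. As a rough consistency check on this reading, the Bacteroides fragilis locus has 9 spacers and therefore 8 adjacencies; 3 missing edges leave 5/8 ≈ 0.63, which matches the reported sensitivity. Not every row follows this ratio exactly, so it should be taken as an interpretation rather than the authors' definition.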