Information on the detection of cas gene clusters with associated arrays (CRISPR–Cas systems) and of CRISPR arrays alone was retrieved, for complete genomes, from the CRISPR–Cas++ database. CRISPR arrays recorded by CRISPR–Cas++ were assigned to Levels 1–4 based on the criteria required to select the minimal structure of a putative CRISPR, as reported by Pourcel et al. (2020). Level 1 is the lowest level of confidence. Levels 2–4 were assigned based on the conservation of repeats (which must be high in a genuine CRISPR) and on the similarity of spacers (which must be low). Level 4 CRISPRs were defined as the most reliable, whereas Levels 1–3 may correspond to false CRISPRs. In our study, only Level 4 CRISPRs were considered. CRISPRs without a set of cas genes in the host genome were defined as “orphans.” Genomes harboring cas gene clusters were then submitted to the CRISPRone analysis suite (http://omics.informatics.indiana.edu/CRISPRone/) (Zhang & Ye, 2017) to visualize the architecture of each cluster. The same suite was used to search for and visualize cas gene clusters in the high‐quality assemblies. Subtypes were assigned to cas gene clusters according to the recent classification update for CRISPR–Cas systems (Makarova et al., 2020).
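The selection and labeling steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the study: the record type `CrisprArray`, its field names, and the functions `select_reliable_arrays` and `label_arrays` are hypothetical, and the array records are assumed to have already been exported from the database into memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrisprArray:
    """Hypothetical record for one CRISPR array (field names are assumptions)."""
    genome_id: str
    array_id: str
    evidence_level: int  # 1 = lowest confidence, 4 = most reliable


def select_reliable_arrays(arrays, required_level=4):
    """Keep only arrays recorded at the required evidence level (Level 4 here)."""
    return [a for a in arrays if a.evidence_level == required_level]


def label_arrays(arrays, genomes_with_cas):
    """Label an array 'orphan' when its host genome has no set of cas genes."""
    return {
        a.array_id: "cas-associated" if a.genome_id in genomes_with_cas else "orphan"
        for a in arrays
    }


# Made-up example records, for illustration only.
example_arrays = [
    CrisprArray("genome_A", "A_1", 4),
    CrisprArray("genome_A", "A_2", 2),  # dropped: below Level 4
    CrisprArray("genome_B", "B_1", 4),  # kept, but genome_B has no cas genes
]
reliable = select_reliable_arrays(example_arrays)
labels = label_arrays(reliable, genomes_with_cas={"genome_A"})
```

Filtering before labeling keeps low-confidence arrays out of the orphan count, which matches the order in which the criteria are stated in the text.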