The sequences and annotations of complete genomes were downloaded from NCBI RefSeq (last accessed in November 2013, http://ftp.ncbi.nih.gov/genomes/refseq/bacteria/). Our analysis included 2484 bacterial genomes (see Supplementary Table S1). We used the classification of replicons into plasmids and chromosomes provided in the GenBank files. Our dataset included 2626 replicons labeled as chromosomes and 2006 labeled as plasmids. The attC sites used to build the covariance model, and the accession numbers of the replicons manually curated for the presence or absence of attC sites, were retrieved from INTEGRALL, the reference database of integron sequences (http://integrall.bio.ua.pt/) (38). We used a set of 291 attC sites (Supplementary File 1) to build and test the model, and a set of 346 sequences with expert annotation of 596 attC sites to assess the quality of the program's predictions (Supplementary Tables S2a and S2b).
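As an illustration of how replicons might be tallied by type, the following is a minimal sketch and not the procedure used in this study. It assumes that the label can be read from the DEFINITION text of each GenBank record by looking for the word "plasmid"; the function names and the example strings are hypothetical placeholders, not records from the dataset.

```python
from collections import Counter


def classify_replicon(definition: str) -> str:
    """Label a replicon from its GenBank DEFINITION text.

    Assumption: a record is a plasmid if the word "plasmid" appears in the
    definition, and a chromosome otherwise. Real records may need more care.
    """
    return "plasmid" if "plasmid" in definition.lower() else "chromosome"


def count_replicons(definitions):
    """Count how many definitions fall into each replicon class."""
    return Counter(classify_replicon(d) for d in definitions)


# Hypothetical DEFINITION strings, for illustration only.
examples = [
    "Genus species strain A, complete genome.",
    "Genus species strain B chromosome II, complete sequence.",
    "Genus species strain B plasmid pEXAMPLE, complete sequence.",
]

counts = count_replicons(examples)
print(counts["chromosome"], counts["plasmid"])
```

Applied to the full set of GenBank files, a tally of this kind would be expected to reproduce the chromosome and plasmid counts reported above only if the keyword heuristic matches the labels in the files, which is not guaranteed.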