A set of 2,831 actinobacterial genomes was downloaded from NCBI by querying for "Whole genome shotgun sequencing project" or "Complete genome" in combination with the taxonomic identifier for Actinobacteria. The orders Propionibacteriales, Micrococcales, Corynebacteriales and Bifidobacteriales were excluded because they contain large numbers of genomes without relevant natural product-producing capacity; the only exception was the family Nocardiaceae (order Corynebacteriales), which was retained (see next section). To this set, 249 additional draft assemblies from the Metcalf lab were added (e.g., Streptomyces sp. B-1348; see BioProject PRJNA488366). Draft genome assemblies from this BioProject were obtained using SPAdes (ref. 50) with default options.
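The taxonomic exclusion rule above can be sketched as a simple predicate. This is only an illustration of the stated rule, not the script used in the study; the function name `keep_genome` and the example records are hypothetical, and the order and family labels for the example organisms are assumptions about how the taxonomy would be recorded.

```python
# Orders excluded from the genome set, as stated in the text.
EXCLUDED_ORDERS = {
    "Propionibacteriales",
    "Micrococcales",
    "Corynebacteriales",
    "Bifidobacteriales",
}

# Families retained even though their order is excluded, as stated in the text.
RETAINED_FAMILIES = {"Nocardiaceae"}


def keep_genome(order: str, family: str) -> bool:
    """Return True if a genome with this order/family passes the filter."""
    if family in RETAINED_FAMILIES:
        return True
    return order not in EXCLUDED_ORDERS


# Hypothetical example records: (organism, order, family).
records = [
    ("Streptomyces sp.", "Streptomycetales", "Streptomycetaceae"),
    ("Rhodococcus sp.", "Corynebacteriales", "Nocardiaceae"),
    ("Corynebacterium sp.", "Corynebacteriales", "Corynebacteriaceae"),
    ("Bifidobacterium sp.", "Bifidobacteriales", "Bifidobacteriaceae"),
]

kept = [name for name, order, family in records if keep_genome(order, family)]
print(kept)
```

In this sketch the family-level exception is checked first, so a Nocardiaceae genome is kept even though its order appears in the exclusion set.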
All files were processed with antiSMASH v4 (ref. 17) (parameters: --minimal). The antiSMASH-annotated genome sequences are available as Online Data (antiSMASH_results_Metcalf_B, antiSMASH_results_Metcalf_J and antiSMASH_results_NCBI).
To the resulting 73,260 predicted biosynthetic gene clusters (BGCs), 1,393 more were added as reference data from the Minimum Information about a Biosynthetic Gene Cluster database (MIBiG (ref. 21), release 1.3, August 2016; antiSMASH-analyzed versions of each entry).
This final BGC set was then analyzed with BiG-SCAPE using version 31 of the Pfam database. The “hybrids” mode was enabled; it allows BGCs with mixed annotations to be analyzed in each of their individual class sets (e.g., a BGC annotated as lantipeptide-t1pks is analyzed as both a RiPP and a PKSI). Two result sets were created (Online Data: BiG-SCAPE Results network files): one with the default "global" mode enabled and the other with "glocal" mode enabled (see Fig. 2).
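The effect of the hybrids mode on a mixed annotation can be illustrated with a short sketch. This is not BiG-SCAPE's own code: the function `classes_for`, the partial product-to-class table, and the fallback to a single "Others" bin (for unknown product types, and for mixed annotations when hybrids handling is off) are assumptions made for illustration only. Only the lantipeptide-t1pks example, with its RiPP and PKSI classes, comes from the text.

```python
# Partial, illustrative mapping from antiSMASH product types to class sets.
# Only "lantipeptide" -> "RiPP" and "t1pks" -> "PKSI" are taken from the text;
# the remaining entries are assumptions.
PRODUCT_TO_CLASS = {
    "lantipeptide": "RiPP",
    "t1pks": "PKSI",
    "nrps": "NRPS",
    "terpene": "Terpene",
}


def classes_for(annotation: str, hybrids: bool = True) -> list[str]:
    """Return the class sets in which a BGC annotation would be analyzed."""
    products = annotation.split("-")
    if len(products) > 1 and not hybrids:
        # Assumption: without hybrids handling, a mixed BGC goes to one bin.
        return ["Others"]
    classes = {PRODUCT_TO_CLASS.get(p, "Others") for p in products}
    return sorted(classes)


print(classes_for("lantipeptide-t1pks"))
print(classes_for("lantipeptide-t1pks", hybrids=False))
print(classes_for("terpene"))
```

With hybrids handling on, the mixed annotation is placed in every class set that one of its component products maps to, so it can be compared against the BGCs of each class.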