Simulated genomes were generated from an initial set of 3604 draft genomes within IMG identified as being of high quality (see Supplemental Methods). To reduce bias toward well-sampled lineages, 280 of the 3604 high-quality draft genomes with identical phylogenetic marker genes were excluded from the generation of simulated genomes. Simulated genomes were generated at varying degrees of completeness and contamination using three distinct random sampling models. Under the random fragment model, each contig comprising a genome was fragmented into nonoverlapping windows of a fixed size between 5 and 50 kbp. This size range was selected because it approximates the contig lengths of genomes recovered from metagenomic data or single-cell genomics: the mean N50 values of the GEBA-MDM single-cell genomes, Wrighton acetate-amended aquifer population genomes, and Sharon infant gut population genomes are ∼28 kbp, ∼17 kbp, and ∼12 kbp, respectively. To generate genomes at a desired level of completeness and contamination, windows were sampled without replacement to set completeness and with replacement to introduce contamination. Windows were sampled until a simulated genome had completeness and contamination equal to or just greater than the target values. Generation of simulated genomes was limited to draft genomes because finished genomes were used to determine appropriate lineage-specific marker sets suitable for evaluating genomes (Fig. 3).
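The random fragment model can be illustrated with a minimal Python sketch. It is not the implementation used in this study. The function names (fragment_contigs, sample_windows) are hypothetical, completeness and contamination are assumed here to be measured as the fraction of genome base pairs covered by the sampled windows, and the final window of each contig is assumed to be allowed to be shorter than the fixed window size.

```python
import random


def fragment_contigs(contig_lengths, window_size):
    """Split each contig into nonoverlapping windows.

    Returns a list of (contig_index, start, end) tuples. The last window
    of a contig may be shorter than window_size (an assumption of this sketch).
    """
    windows = []
    for idx, length in enumerate(contig_lengths):
        for start in range(0, length, window_size):
            windows.append((idx, start, min(start + window_size, length)))
    return windows


def sample_windows(windows, genome_size, target_fraction, with_replacement, rng):
    """Draw windows until the sampled bases reach or just exceed the target.

    Sampling without replacement models completeness; sampling with
    replacement models contamination.
    """
    target_bp = target_fraction * genome_size
    pool = list(windows)
    rng.shuffle(pool)  # random draw order for sampling without replacement
    sampled, sampled_bp = [], 0
    while sampled_bp < target_bp:
        if with_replacement:
            win = rng.choice(windows)
        else:
            if not pool:  # every window has already been used
                break
            win = pool.pop()
        sampled.append(win)
        sampled_bp += win[2] - win[1]
    return sampled, sampled_bp


if __name__ == "__main__":
    rng = random.Random(42)
    contig_lengths = [100_000, 45_000]  # illustrative values only
    genome_size = sum(contig_lengths)
    windows = fragment_contigs(contig_lengths, window_size=20_000)
    kept, kept_bp = sample_windows(windows, genome_size, 0.5, False, rng)
    extra, extra_bp = sample_windows(windows, genome_size, 0.1, True, rng)
    print(len(windows), kept_bp / genome_size, extra_bp / genome_size)
```

Because sampling stops at the first window that reaches the target, the achieved value can overshoot the target by at most one window length, which matches the "equal to or just greater than" criterion described above.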
The 2430 draft reference genomes comprising 20 or more contigs were used to simulate partial and contaminated genomes reflecting the characteristics of assembled contigs. Under this random contig model, genomes were generated by randomly removing contigs until the simulated genome reached or fell below a target completeness level. Contamination was introduced by randomly adding contigs with replacement from a single randomly selected genome until the desired level of contamination was reached or exceeded. These 2430 draft genomes were also used to generate genomes reflecting the limitations of metagenomic binning methods that rely on the statistical properties of contigs (e.g., tetranucleotide signature, coverage) to establish putative population genomes. To simulate this, partial genomes were generated by randomly removing contigs with a probability inversely proportional to their length until the simulated genome reached or fell below a target completeness level. Contamination was introduced by randomly selecting another draft reference genome and adding contigs from this genome with a probability inversely proportional to their length until the desired level of contamination was reached or exceeded.
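The two contig-based procedures can be sketched in Python as follows. This is an illustrative sketch, not the code used in this study. The function names (remove_contigs, add_contamination) and the inverse_length flag are hypothetical. Completeness and contamination are assumed to be measured in base pairs relative to the original genome, contig lengths are assumed to be positive, and contaminating contigs are assumed to be drawn with replacement under both procedures.

```python
import random


def remove_contigs(contig_lengths, target_completeness, rng, inverse_length=False):
    """Remove contigs until the retained fraction of bases is at or below the target.

    With inverse_length=True, a contig is chosen for removal with probability
    proportional to 1/length, so short contigs are lost preferentially.
    Returns the indices of the retained contigs.
    """
    total_bp = sum(contig_lengths)
    retained = list(range(len(contig_lengths)))
    retained_bp = total_bp
    while retained and retained_bp > target_completeness * total_bp:
        weights = None
        if inverse_length:
            weights = [1.0 / contig_lengths[i] for i in retained]
        pick = rng.choices(range(len(retained)), weights=weights, k=1)[0]
        retained_bp -= contig_lengths[retained.pop(pick)]
    return retained


def add_contamination(donor_lengths, genome_bp, target_contamination, rng,
                      inverse_length=False):
    """Add contigs from one donor genome until the target is reached or exceeded.

    Contigs are drawn with replacement. Returns the donor contig indices drawn,
    which may contain repeats.
    """
    weights = [1.0 / n for n in donor_lengths] if inverse_length else None
    added, added_bp = [], 0
    while added_bp < target_contamination * genome_bp:
        pick = rng.choices(range(len(donor_lengths)), weights=weights, k=1)[0]
        added.append(pick)
        added_bp += donor_lengths[pick]
    return added


if __name__ == "__main__":
    rng = random.Random(7)
    genome = [50_000, 30_000, 10_000, 5_000, 5_000]  # illustrative contig lengths
    donor = [20_000, 4_000]                          # illustrative donor genome
    kept = remove_contigs(genome, 0.7, rng, inverse_length=True)
    added = add_contamination(donor, sum(genome), 0.1, rng, inverse_length=True)
    print(sorted(kept), added)
```

Weighting by inverse length means that reaching a given completeness typically requires removing many more contigs than under uniform removal, which mimics binning methods that struggle to assign short contigs.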