We validate Ginkgo by reproducing the major findings of several single-cell sequencing studies that employ three different whole-genome amplification (WGA) techniques: MALBAC, DOP-PCR/WGA4, and MDA. In total, we analyze the data characteristics of nine datasets spanning five tissue types (Table 1). The Ginkgo parameters used for these datasets are described in the main text; additional parameters are noted below.
Reads were mapped to hg19 using Bowtie, and only uniquely mapped reads (mapping quality score ≥ 25) were retained. Mapped read counts ranged from 1,538,234 (Ni et al.) to 30,638,853 (Lu et al.), with a mean of 15,827,886. To allow an unbiased comparison, every sample was randomly downsampled to 1,538,234 reads, matching the lowest available coverage.
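The downsampling step can be sketched as follows. This is a minimal illustration, not Ginkgo's implementation: it assumes each sample's uniquely mapped reads are held in memory as a list (real pipelines would stream alignment files), and `downsample_reads` and `downsample_to_min` are hypothetical helper names. A fixed seed makes the random subsample reproducible.

```python
import random


def downsample_reads(reads, target, seed=0):
    """Keep exactly `target` reads, chosen uniformly at random without
    replacement; the original read order is preserved."""
    if target > len(reads):
        raise ValueError("cannot downsample to more reads than available")
    rng = random.Random(seed)  # seeded so the subsample is reproducible
    keep = sorted(rng.sample(range(len(reads)), target))
    return [reads[i] for i in keep]


def downsample_to_min(samples, seed=0):
    """Downsample every sample to the read count of the shallowest sample."""
    target = min(len(reads) for reads in samples.values())
    return {name: downsample_reads(reads, target, seed)
            for name, reads in samples.items()}


# Toy example: two "cells" with 10 and 25 reads (read IDs are integers here).
samples = {"cell_A": list(range(10)), "cell_B": list(range(25))}
small = downsample_to_min(samples)
print({name: len(reads) for name, reads in small.items()})  # {'cell_A': 10, 'cell_B': 10}
```

Sampling indices and sorting them, rather than sampling the reads directly, keeps the retained reads in their original (for example, coordinate-sorted) order.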
To compute the GC bias across all nine datasets, we fit, for each sample, a lowess curve to the log2 normalized read counts as a function of bin GC content. A sample with no GC bias would have a flat fit at a log2 normalized read count of zero across all bins and all GC values. After the lowess fit, we quantify the bias of each cell as the proportion of bins that show a twofold change from the expected coverage in either direction (±1 on the log2 scale).
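A minimal sketch of this GC-bias measure is shown below, under several stated assumptions. Bins are given as parallel lists of raw read counts and GC fractions. Counts are normalized by the sample's mean bin count with a pseudocount of 1; the text does not specify the normalization, so this is an assumption. The smoother is a single-pass LOWESS: a tricube-weighted local linear regression without the robustness iterations of full LOWESS. The span `frac=0.3` and the helper names `lowess_fit` and `gc_bias_fraction` are hypothetical.

```python
import math


def lowess_fit(x, y, frac=0.3):
    """Single-pass LOWESS: a tricube-weighted local linear fit at each x[i]
    over its nearest neighbours (no robustness iterations)."""
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))  # points in each local window
    fitted = []
    for xi in x:
        # Bandwidth = distance from xi to its k-th nearest point.
        h = sorted(abs(xj - xi) for xj in x)[k - 1] or 1e-12
        sw = swx = swy = swxx = swxy = 0.0
        for xj, yj in zip(x, y):
            u = abs(xj - xi) / h
            w = (1 - u ** 3) ** 3 if u < 1 else 0.0  # tricube weight
            sw += w
            swx += w * xj
            swy += w * yj
            swxx += w * xj * xj
            swxy += w * xj * yj
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:
            fitted.append(swy / sw)  # no spread in x within window: weighted mean
        else:
            slope = (sw * swxy - swx * swy) / denom
            fitted.append((swy - slope * swx) / sw + slope * xi)
    return fitted


def gc_bias_fraction(counts, gc, frac=0.3, threshold=1.0):
    """Fraction of bins whose lowess-fitted log2 normalized count deviates
    from the unbiased expectation (0) by at least `threshold` (1 = twofold)."""
    mean = sum(counts) / len(counts)
    # Assumed normalization: divide by the mean bin count. The pseudocount of 1
    # on both terms avoids log2(0) and keeps a perfectly flat sample at exactly 0.
    log_norm = [math.log2((c + 1) / (mean + 1)) for c in counts]
    fitted = lowess_fit(gc, log_norm, frac)
    return sum(abs(f) >= threshold for f in fitted) / len(fitted)


# Toy bins with GC content from 0.30 to 0.60.
gc = [0.30 + 0.01 * i for i in range(31)]
flat = [100] * len(gc)  # no GC bias
biased = [round(100 * 2 ** (20 * (g - 0.45))) for g in gc]  # coverage rises with GC
print(gc_bias_fraction(flat, gc))  # 0.0
print(gc_bias_fraction(biased, gc))
```

The flat sample yields a fraction of zero, while the strongly GC-dependent sample flags most bins. This pure-Python smoother is quadratic in the number of bins. For real data, a library routine such as R's built-in `lowess()` or the `lowess` function in statsmodels would be faster and also applies robustness iterations.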