Reads were demultiplexed and, where applicable, the remaining poly-A tail of the mRNA was trimmed. Reads were then aligned to the Homo sapiens genome (build 37.68, supplemented with the ERCC spike-in sequences) using GSNAP (ref. 47), with the expected paired-end length set to 400 bp and the allowable deviation from the expected paired-end length set to 100 bp. Reads overlapping uniquely with mRNA genes were counted using HTSeq (ref. 48). As a first filtering step, we retained all cells in which we observed more than 750 genes at a minimum of 10 reads each, and a total of at least 150,000 reads. We removed all genes from the dataset that were not observed by at least 10 reads in at least 5 cells. Statistics on these filtering steps are displayed in Supplementary Fig. 2.
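The cell and gene filters described above can be illustrated with a minimal sketch. It is not the code used in the study: the function name `filter_counts`, its parameter names and the genes-by-cells list layout are hypothetical. It also assumes that the total read count of a cell is the sum over the rows of the count matrix, and that the gene filter is evaluated on the cells that pass the cell filter; neither detail is stated in the text.

```python
def filter_counts(counts, min_reads=10, min_genes=750,
                  min_total=150_000, min_cells=5):
    """Filter a genes x cells matrix of read counts (list of rows).

    Hypothetical helper; thresholds default to the values in the text.
    Returns (filtered_counts, keep_genes, keep_cells).
    """
    n_cells = len(counts[0]) if counts else 0

    # Cell filter: more than `min_genes` genes with at least `min_reads`
    # reads each, and at least `min_total` reads in total.
    keep_cells = []
    for j in range(n_cells):
        column = [row[j] for row in counts]
        genes_detected = sum(1 for c in column if c >= min_reads)
        keep_cells.append(genes_detected > min_genes
                          and sum(column) >= min_total)

    # Assumption: the gene filter only looks at the retained cells.
    kept = [[c for c, keep in zip(row, keep_cells) if keep]
            for row in counts]

    # Gene filter: at least `min_reads` reads in at least `min_cells` cells.
    keep_genes = [sum(1 for c in row if c >= min_reads) >= min_cells
                  for row in kept]

    filtered = [row for row, keep in zip(kept, keep_genes) if keep]
    return filtered, keep_genes, keep_cells


# Toy example with lowered thresholds (4 genes x 3 cells).
toy = [
    [10, 20, 0],
    [15, 0, 5],
    [12, 30, 0],
    [0, 11, 100],
]
filtered, keep_genes, keep_cells = filter_counts(
    toy, min_reads=10, min_genes=1, min_total=30, min_cells=2)
print(keep_cells, keep_genes, filtered)
```

In the toy example the third cell is dropped because only one gene reaches 10 reads, even though its total read count is the highest of the three; the two criteria are applied jointly.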
We then fitted error models (ref. 49) to the read count data (see also below). In 35 cells of individual 2 and 1 cell of individual 1, we observed an extreme overdispersion of the genes classified as non-dropout events. These cells were removed. In individual 1, we further excluded 13 cells with an abnormal CD38-CD90high immunophenotype (Supplementary Fig. 1a). These cells were also clear outliers with regard to gene expression, as they mostly expressed genes associated with various types of mature immune cells (not shown).