The Fluidigm assay is sensitive, and owing to the exponential amplification of starting mRNA, even minute contamination can render a measurement unreliable. Similarly, variation in cell preparation can have a significant impact on the resulting experiment and data; unintentional empty wells, for example, would distort estimates of the frequency of expression. This suggests identifying, and possibly removing, outliers before conducting further analysis. We examine both the discrete component and the continuous component in screening for outliers. Writing $y_{ig}$ for the expression value of gene $g$ in well $i$, we define the robust z-transformed positive expression value as
$$z^{e}_{ig} = \frac{y_{ig} - \operatorname{median}(y_{g})}{K \cdot \operatorname{MAD}(y_{g})},$$
where the median and median absolute deviation (MAD) are calculated, for a given gene, over expressed cells (i.e. $y_{ig} > 0$), and $K \approx 1.48$ is a scaling constant that gives the standard deviation in terms of the MAD for the normal distribution. Next, let $f_{i} = \arcsin\sqrt{p_{i}}$ be the Bernoulli variance-stabilizing transformation of the proportion $p_{i}$ of genes expressed in well $i$. Then, we define the robust z-transformed fraction as
$$z^{f}_{i} = \frac{f_{i} - \operatorname{median}(f)}{K \cdot \operatorname{MAD}(f)},$$
where the median, MAD and $K$ are as defined previously. This leads to the following steps for filtering:
1. Remove null cells with no detected genes, i.e. wells with $f_{i} = 0$.
2. Pick a threshold for z filtering ($z_{\max}$) and the number of outlying genes ($r$) at which a well is rejected.
3. Calculate the robust z-transformed expression values $z^{e}_{ig}$ and the robust z-transformed fractions $z^{f}_{i}$.
4. Remove wells in which at least $r$ genes have $|z^{e}_{ig}| > z_{\max}$, or in which $|z^{f}_{i}| > z_{\max}$.

Further details are given in the Supplementary Material. Using this approach, we find that a single choice of $z_{\max}$ works well for the datasets we consider here; see Section 3.
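The filtering steps can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `robust_z` and `filter_wells`, the argument names `z_max` and `r`, the wells-by-genes layout of `y`, and the convention that a zero MAD flags no outliers are all assumptions made for illustration.

```python
import numpy as np

K = 1.4826  # scales the MAD to the standard deviation under normality


def robust_z(x):
    """Robust z-scores (x - median) / (K * MAD) for a 1-D array (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    if mad == 0:
        # Assumption: with no spread, no value is treated as an outlier.
        return np.zeros_like(x)
    return (x - med) / (K * mad)


def filter_wells(y, z_max, r):
    """Return a boolean mask of wells to keep (hypothetical helper).

    y     : (wells x genes) array, 0 meaning "not detected" (assumed layout)
    z_max : threshold on the absolute robust z-score
    r     : number of outlying genes at which a well is rejected
    """
    y = np.asarray(y, dtype=float)
    expressed = y > 0

    # Step 1: drop null wells, i.e. wells with no detected genes.
    frac = expressed.mean(axis=1)
    keep = frac > 0

    # Step 3a: continuous component, computed per gene over expressed wells only.
    z_e = np.full(y.shape, np.nan)
    for g in range(y.shape[1]):
        idx = keep & expressed[:, g]
        if idx.sum() > 1:
            z_e[idx, g] = robust_z(y[idx, g])

    # Step 3b: discrete component, arcsine square-root transformed fraction.
    f = np.arcsin(np.sqrt(frac))
    z_f = np.full(y.shape[0], np.nan)
    z_f[keep] = robust_z(f[keep])

    # Step 4: reject wells with at least r outlying genes or an outlying fraction.
    # Comparisons against NaN evaluate to False, so undefined scores never flag.
    n_outlying = np.sum(np.abs(z_e) > z_max, axis=1)
    outlier = (n_outlying >= r) | (np.abs(z_f) > z_max)
    return keep & ~outlier
```

A call such as `filter_wells(y, z_max=9, r=2)` returns the mask of retained wells; the values passed for `z_max` and `r` here are placeholders, not recommendations.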