We consider three types of genewise, cell type-relative expression measurements: specificity, enrichment, and absolute expression level (Fig. 1). Specificity is defined as the difference between a gene's expression in the cell type of interest and its expression in the other cell type in which it is most highly expressed. Enrichment is defined as the difference between a gene's expression in the cell type of interest and its expression across all of the other cell types. Finally, absolute expression is defined as the expression level of a gene within a cell type relative to the other genes in that cell type, irrespective of that gene's expression in other cell types. To calculate specificity and enrichment, we first filtered each data set to retain only those genes with an average (arithmetic mean) of at least five read counts in at least one cell type. Next, we estimated the dispersion of each gene and fit a negative binomial generalized linear model to the count data using the R package edgeR [58]. In all data sets, cell type was modeled as a covariate, alongside adjustment covariates specific to each data set (Table 1). We set the prior.count argument in edgeR to 10, which adds a pseudocount to each observation in proportion to the library size of its sample; this increases the shrinkage of log fold-changes toward zero, particularly for low-count genes, and so yields more robust signature estimates. To calculate cell type enrichment, we compared the expression of samples annotated to a given cell type, which we call the cell type of interest, to the expression of samples annotated to all other major brain cell types, which we call the reference cell set. For example, samples annotated to astrocytes were compared to samples annotated to endothelial cells, neurons, microglia, oligodendrocytes, and OPCs. The one exception is that OPCs were excluded from the reference cell set when oligodendrocytes were the cell type of interest, and vice versa, because the expression profiles of these two cell types are too similar to allow meaningful reference comparisons.
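The filtering and reference-set rules above can be sketched in Python as follows. This is a minimal illustration, not the edgeR analysis. It assumes a genes × samples NumPy count matrix and a per-sample array of cell-type labels; the label strings and all function names are hypothetical. Instead of fitting a negative binomial GLM, it approximates enrichment as a difference of mean log2 CPM values computed with a library-size-scaled prior count of 10. This only loosely mimics the shrinkage that edgeR's prior.count provides.

```python
import numpy as np

# Hypothetical label strings for the six major brain cell types.
CELL_TYPES = ["astrocyte", "endothelial", "microglia", "neuron",
              "oligodendrocyte", "opc"]


def filter_genes(counts, labels, min_mean=5.0):
    """Boolean mask of genes whose arithmetic mean count is >= min_mean
    in at least one cell type (counts: genes x samples)."""
    keep = np.zeros(counts.shape[0], dtype=bool)
    for ct in np.unique(labels):
        keep |= counts[:, labels == ct].mean(axis=1) >= min_mean
    return keep


def reference_set(cell_type, cell_types=CELL_TYPES):
    """All other cell types, except that OPCs are dropped when
    oligodendrocytes are the cell type of interest, and vice versa."""
    excluded = {"oligodendrocyte": "opc", "opc": "oligodendrocyte"}.get(cell_type)
    return [ct for ct in cell_types if ct != cell_type and ct != excluded]


def shrunken_log2_cpm(counts, prior_count=10.0):
    """log2 CPM after adding a prior count scaled by each sample's library
    size relative to the mean library size (a simplified, edgeR-like rule)."""
    lib = counts.sum(axis=0).astype(float)
    prior = prior_count * lib / lib.mean()
    return np.log2((counts + prior) / (lib + 2 * prior) * 1e6)


def enrichment_log2fc(counts, labels, cell_type, prior_count=10.0):
    """Mean log2 CPM in the cell type of interest minus the mean over all
    samples in its reference cell set (reference samples are pooled)."""
    logcpm = shrunken_log2_cpm(counts, prior_count)
    ref = np.isin(labels, reference_set(cell_type))
    return logcpm[:, labels == cell_type].mean(axis=1) - logcpm[:, ref].mean(axis=1)
```

Pooling all reference samples into one group is a simplification, and the data set-specific adjustment covariates from the original model are ignored here.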
To calculate cell type specificity, we performed contrasts on the fitted models that compared the cell type of interest to each reference cell type individually, and took the minimum resulting fold-change for each cell type of interest. For each cell type in each data set, we created a volcano plot of the results of the enrichment differential expression analysis and highlighted the genes with a Benjamini-Hochberg [59] adjusted p-value less than 0.05 and a fold-change in the cell type of interest versus the reference cell set greater than or equal to 4.
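The specificity calculation and the volcano-plot highlighting rule can be sketched as follows, again as a hedged illustration rather than the original pipeline. Taking the minimum over the per-reference contrasts selects the comparison against the most highly expressed other cell type, which matches the definition of specificity. The sketch assumes that the per-contrast log2 fold-changes and the enrichment p-values have already been obtained (for example, from edgeR). The helper names are hypothetical, and the Benjamini-Hochberg step is implemented directly in NumPy rather than through R's p.adjust. A fold-change threshold of 4 corresponds to a log2 fold-change of 2.

```python
import numpy as np


def specificity_log2fc(pairwise_log2fc):
    """Minimum log2 fold-change across the per-reference-cell-type contrasts
    (input: genes x reference cell types)."""
    return np.min(pairwise_log2fc, axis=1)


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up, monotone, capped at 1)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity by taking a running minimum from the largest p-value down.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def highlight(log2fc, pvalues, alpha=0.05, min_fc=4.0):
    """Mask of genes to highlight on a volcano plot: BH-adjusted p < alpha
    and fold-change >= min_fc (log2fc is on the log2 scale)."""
    return (bh_adjust(pvalues) < alpha) & (np.asarray(log2fc) >= np.log2(min_fc))
```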
To calculate absolute expression within each cell type, we first normalized count values by the quantile method, as above, and then by gene length to generate RPKM (reads per kilobase of transcript per million mapped reads) values. We then calculated the arithmetic mean of the RPKM values within each cell type and quantified the associated variability using the standard error of the mean. Genes within each sample were ranked by their expression values to facilitate comparisons across data sets.
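A minimal sketch of the absolute-expression summaries follows. It assumes that the counts have already been quantile normalized, that gene lengths in base pairs are available, and that each sample's library size is the column sum of its counts. All three are assumptions, and the function names are hypothetical.

```python
import numpy as np


def rpkm(counts, gene_lengths_bp):
    """Reads per kilobase of transcript per million mapped reads.
    Assumes library size = column sum of the (already normalized) counts."""
    lib = counts.sum(axis=0)
    return counts / (gene_lengths_bp[:, None] / 1e3) / (lib / 1e6)


def mean_and_sem(values, labels, cell_type):
    """Per-gene arithmetic mean and standard error of the mean (sample
    standard deviation, ddof=1, divided by sqrt(n)) within one cell type."""
    x = values[:, labels == cell_type]
    sem = x.std(axis=1, ddof=1) / np.sqrt(x.shape[1])
    return x.mean(axis=1), sem


def rank_within_columns(values):
    """Rank genes within each sample column (1 = lowest expression);
    tied values get distinct ranks in their original gene order."""
    return values.argsort(axis=0, kind="stable").argsort(axis=0, kind="stable") + 1
```

The stable double sort gives tied values distinct ranks. An average-rank scheme would be a reasonable alternative if ties are common.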