We provide a statistical significance assessment for the presence of a cell type in a mixture by learning the distribution of scores for each cell type in random mixtures. For each cell type X, we generate a random matrix as follows. In each reference data set we find all cell types with corresponding samples, excluding X and its parent or descendants (if X is CD8+ Tem cells, then we also exclude CD8+ T cells; if X is CD8+ T cells, we exclude all CD8+ cell types). We then use the same procedure we used for generating training samples, but add an additional 5% random noise. The main difference is that we randomly mix in all cell types (except X) rather than just a small subset. We then run the xCell pipeline on these random mixtures. For most cell types the resulting scores approximately follow a beta distribution; thus, using the fitdistr function from the MASS package, we fit a beta distribution to the scores of cell type X in the mixtures generated without X. For five of the cell types the scores from the random mixtures were consistently 0; thus, we define those distributions as constant 0.001 (Additional file 2: Figure S7). Given an input data set, we can now calculate a p value for each xCell score under the null hypothesis that the cell type is not present in the mixture. The actual distributions we use to calculate the p values are combinations of those learned from FANTOM5, Blueprint, and ENCODE for sequencing-based input, and IRIS, HPCA, and Novershtern for microarray-based input. The p value for the score of a cell type in a sample is the probability that a value drawn from the null distribution of the corresponding cell type exceeds the score. In the testing samples we used a threshold of 20% to define a non-significant score.
We chose this threshold as a trade-off between detecting non-negligible scores of cell types that are not in the mixture and failing to detect scores of cell types that are in the mixture, which affects the power to estimate the underlying cell type fractions (Additional file 4).
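The significance calculation described above can be illustrated with a minimal sketch. This is not the xCell implementation, which is written in R and fits the beta distribution by maximum likelihood with fitdistr. The sketch uses only the Python standard library and swaps in a method-of-moments fit. The function names (`fit_beta_moments`, `regularized_incomplete_beta`, `null_p_value`) and the simulated null scores are hypothetical. The sketch also assumes that the 20% threshold is applied to the p value with a strict inequality, and it does not cover the cell types whose null distribution is defined as a constant.

```python
import math
import random


def fit_beta_moments(scores):
    """Estimate beta shape parameters (alpha, beta) by the method of moments.

    Assumption: a stand-in for the maximum-likelihood fit that
    MASS::fitdistr performs in R; the two estimates can differ slightly.
    """
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    if not 0.0 < mean < 1.0 or not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("scores are not compatible with a beta distribution")
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common


def _beta_continued_fraction(a, b, x, max_iter=300, eps=3e-16, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd steps of the recurrence use different coefficients.
        for aa in (
            m * (b - m) * x / ((qam + m2) * (a + m2)),
            -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2)),
        ):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """Return I_x(a, b), the CDF of a Beta(a, b) distribution at x."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    # Use the symmetry relation where the continued fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def null_p_value(score, alpha, beta):
    """Upper-tail probability P(S > score) for S ~ Beta(alpha, beta)."""
    # 1 - I_x(alpha, beta) equals I_{1-x}(beta, alpha).
    return regularized_incomplete_beta(beta, alpha, 1.0 - score)


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated stand-in for scores of cell type X in random mixtures without X.
    null_scores = [rng.betavariate(2.0, 8.0) for _ in range(5000)]
    alpha, beta = fit_beta_moments(null_scores)
    threshold = 0.20  # assumed to apply to the p value
    for score in (0.05, 0.30, 0.60):
        p = null_p_value(score, alpha, beta)
        label = "significant" if p < threshold else "non-significant"
        print(f"score={score:.2f}  p={p:.4f}  {label}")
```

In this sketch a higher score yields a smaller p value, because less of the null distribution lies above it. Scores whose p value is at or above the threshold would be treated as non-significant.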