We define a pyramid as a directed acyclic graph with a root node. Samples of purified microenvironment cells were labeled according to their reported immune or stromal populations, resulting in 63 distinct labels in the MCP discovery series and 15 additional labels in the MCP validation series, for a total of 78 labels. We organized these labels in a pyramidal graph (Additional file 2: Figure S1) with nodes representing populations (categories) and directed edges representing inclusion relationships. For instance, the labels “CD8+ T cells”, “CD4+ T cells”, “Tγδ cells”, “Memory T cells”, “Activated T cells”, and “Naïve T cells”, together with all labels included in them (for instance “Effector-memory CD8 T cells”), form the “T cells” category, which is itself included in the “T/NK lineage” category. Of these 78 sample labels, some correspond to terminal leaves of the pyramid (e.g., “Canonical CD4 Treg cells”), while others correspond to higher-level nodes (e.g., peripheral-blood mononuclear cells, “PBMC”). In addition to these 78 labels, we added 15 hematopoiesis- or immunology-inspired categories that are not directly represented by samples but are relevant either for organizing the pyramid (for instance “Lymphocytes”) or as potential cell populations (for instance “antigen-experienced B cells”) (Additional file 1: Table S13). Categories corresponding to tumor samples were discarded for the identification of TM and kept only as negative controls, resulting in 68 categories available for screening.
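To make the pyramid definition concrete, the following minimal Python sketch stores inclusion edges as (broader, narrower) pairs, computes every category included in a node by following edges transitively, and checks the two defining properties (acyclicity and a single root). The edge list is a hypothetical fragment built from labels named in the text, not the actual content of Additional file 2: Figure S1, and the function names (`build_children`, `descendants`, `is_pyramid`) are illustrative. Reading "a root node" as "exactly one root" is also an assumption.

```python
from collections import defaultdict

# Hypothetical fragment of a pyramid, using labels mentioned in the text.
# Each pair is (broader category, category directly included in it).
EDGES = [
    ("Lymphocytes", "T/NK lineage"),
    ("T/NK lineage", "T cells"),
    ("T cells", "CD8 T cells"),
    ("T cells", "CD4 T cells"),
    ("T cells", "Memory T cells"),
    ("CD8 T cells", "Effector-memory CD8 T cells"),
    ("Memory T cells", "Effector-memory CD8 T cells"),
    ("CD4 T cells", "Canonical CD4 Treg cells"),
]


def build_children(edges):
    """Map each category to the set of categories directly included in it."""
    children = defaultdict(set)
    for broader, narrower in edges:
        children[broader].add(narrower)
    return children


def descendants(children, node):
    """Return every category included in `node`, directly or by transitivity."""
    seen = set()
    stack = list(children.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, ()))
    return seen


def is_pyramid(edges):
    """True if the graph is acyclic and has exactly one root (a node with no parent)."""
    children = build_children(edges)
    has_parent = {narrower for _, narrower in edges}
    nodes = set(children) | has_parent
    roots = nodes - has_parent
    # A node lying among its own descendants would mean a cycle.
    acyclic = all(node not in descendants(children, node) for node in nodes)
    return acyclic and len(roots) == 1


if __name__ == "__main__":
    children = build_children(EDGES)
    print(sorted(descendants(children, "T cells")))
    print(is_pyramid(EDGES))
```

In this hypothetical fragment, "Effector-memory CD8 T cells" has two parents ("CD8 T cells" and "Memory T cells"), something a tree could not express but a directed acyclic graph can.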
Having defined these 78 labels and 68 categories (53 categories directly represented by labels and 15 additional categories not directly represented in the dataset), we exhaustively encoded the relationship between each label and each category as one of three possible relationships (Additional file 1: Table S13). Relative to a category, we define three sets of samples:
C: “positive samples” are those whose label is included in the category (all cells composing a sample in C are in the category).
C̄: “negative samples” are those whose label is strictly non-overlapping with the category (no cell of a sample in C̄ is in the category).
-1: “mixed samples” are those whose label partly overlaps with the category (some cells of the sample are in the category and some are not).
For instance, for CD8+ T cells, C is the set of samples whose label is “CD8 T cells” or “Effector memory CD8 T cells” (Additional file 2: Figure S1; Additional file 1: Table S13); mixed samples include CD3+ T cells, which mix CD4+ and CD8+ T cells, and PBMC, which mix CD8+ T cells with other populations such as monocytes. C̄ is defined as all samples that are neither positive nor mixed.
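The three-way split can be sketched in Python under two assumptions: positive labels are obtained by following inclusion edges downward from the category, and mixed labels come from an explicit annotation (a stand-in for the mixture relationships recorded in Additional file 1: Table S13, which the figure does not show). Negative samples are then everything else, mirroring the definition of C̄. The sample identifiers, edge list, mixed-label set, and function names are hypothetical; the label "T cells" plays the role of the CD3+ T cell samples.

```python
from collections import defaultdict


def included_labels(edges, category):
    """Return the category plus every category included in it (transitive closure)."""
    children = defaultdict(set)
    for broader, narrower in edges:
        children[broader].add(narrower)
    found, stack = {category}, [category]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def partition_samples(samples, edges, category, mixed_labels):
    """Split sample IDs into positive (C), mixed, and negative (C-bar) sets.

    samples:      dict mapping a sample ID to its label
    edges:        (broader, narrower) inclusion pairs
    mixed_labels: labels annotated as partly overlapping `category`
    """
    positive_labels = included_labels(edges, category)
    positive, mixed, negative = set(), set(), set()
    for sample_id, label in samples.items():
        if label in positive_labels:
            positive.add(sample_id)
        elif label in mixed_labels:
            mixed.add(sample_id)
        else:
            # C-bar: every sample that is neither positive nor mixed
            negative.add(sample_id)
    return positive, mixed, negative


# Hypothetical data loosely following the CD8+ T cell example in the text.
EDGES = [
    ("T cells", "CD8 T cells"),
    ("T cells", "CD4 T cells"),
    ("CD8 T cells", "Effector memory CD8 T cells"),
]
SAMPLES = {
    "s1": "CD8 T cells",
    "s2": "Effector memory CD8 T cells",
    "s3": "T cells",  # stands in for CD3+ T cells (CD4+ and CD8+ mixed)
    "s4": "PBMC",
    "s5": "CD4 T cells",
    "s6": "Monocytes",
}
MIXED_FOR_CD8 = {"T cells", "PBMC"}

if __name__ == "__main__":
    pos, mix, neg = partition_samples(SAMPLES, EDGES, "CD8 T cells", MIXED_FOR_CD8)
    print(sorted(pos), sorted(mix), sorted(neg))
```

Checking positives before mixed labels means a label can never land in two sets, so the three sets always partition the samples.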
Note that Additional file 2: Figure S1 only represents the direct inclusion relationship, which is transitive; for clarity, we removed all arrows that can be inferred by transitivity. Strict exclusion and mixture relationships are therefore not represented in the figure, but they are taken into account during the screening process (the corresponding information is available in Additional file 1: Table S13).
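Removing every arrow that can be inferred by transitivity is known as a transitive reduction. The sketch below, which is not the authors' code, computes it for a directed acyclic graph by keeping an arrow u → v only if v is not reachable through another child of u; on such a graph the reduction is unique. The edge list is a hypothetical example using labels from the text, and `transitive_reduction` is an illustrative name.

```python
def transitive_reduction(edges):
    """Keep only arrows u -> v that cannot be inferred from a longer path u -> ... -> v.

    Assumes `edges` forms a directed acyclic graph, for which the result is unique.
    """
    children = {}
    for u, v in edges:
        children.setdefault(u, set()).add(v)

    def reachable(start):
        """All nodes reachable from `start` by following one or more arrows."""
        seen, stack = set(), [start]
        while stack:
            for nxt in children.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    kept = set()
    for u, v in set(edges):
        # u -> v is redundant if v is also reachable through another child of u.
        if not any(v in reachable(w) for w in children[u] if w != v):
            kept.add((u, v))
    return kept


# Hypothetical inclusion relation listing every arrow, including inferable ones.
FULL_INCLUSIONS = [
    ("T/NK lineage", "T cells"),
    ("T/NK lineage", "CD8 T cells"),
    ("T/NK lineage", "Effector-memory CD8 T cells"),
    ("T cells", "CD8 T cells"),
    ("T cells", "Effector-memory CD8 T cells"),
    ("CD8 T cells", "Effector-memory CD8 T cells"),
]

if __name__ == "__main__":
    for arrow in sorted(transitive_reduction(FULL_INCLUSIONS)):
        print(" -> ".join(arrow))
```

Here only the chain T/NK lineage → T cells → CD8 T cells → Effector-memory CD8 T cells survives, which is the kind of simplified drawing the figure uses; the full relation can always be recovered by transitive closure.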
Becht E., Giraldo N.A., Lacroix L., Buttard B., Elarouci N., Petitprez F., Selves J., Laurent-Puig P., Sautès-Fridman C., Fridman W.H., & de Reyniès A. (2016). Estimating the population abundance of tissue-infiltrating immune and stromal cell populations using gene expression. Genome Biology, 17, 218.