Comprehensive Preprocessing of Sequencing Datasets

Full details of each dataset⁹,¹⁰,¹⁷,²⁰,²¹,³⁷,⁴²,⁵⁰,⁵⁵,⁶⁴–⁶⁹, including data type, sample type, source, and normalization approach, are available in Supplementary Table 1. Briefly, next-generation sequencing datasets were downloaded and analyzed using the original authors’ normalization settings unless otherwise specified; expression values were in transcripts per million (TPM), reads per kilobase of transcript per million (RPKM), or fragments per kilobase of transcript per million (FPKM) space. For analyses in log₂ space, we added 1 to expression values before log₂ transformation. Affymetrix microarray datasets were summarized and normalized as described in ‘Gene expression profiling – Microarrays’ (Supplementary Note 1), using RMA when bulk tissues and ground truth cell subsets were profiled on the same Affymetrix platform and MAS5 normalization otherwise. NanoString nCounter data were downloaded from the supplement of Chen et al.²⁰ and analyzed with batch correction in non-log linear space, without any additional preprocessing.
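The log₂ step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `log2_plus_one` and the example TPM values are assumptions made here for demonstration.

```python
import math


def log2_plus_one(values):
    """Return log2(x + 1) for each non-negative expression value.

    Adding 1 before the transform keeps zero expression values defined
    (log2(0 + 1) = 0). The function name is hypothetical.
    """
    if any(v < 0 for v in values):
        raise ValueError("expression values must be non-negative")
    return [math.log2(v + 1.0) for v in values]


# Illustrative TPM values only; not taken from any dataset in the paper.
tpm = [0.0, 1.0, 3.0, 1023.0]
print(log2_plus_one(tpm))  # [0.0, 1.0, 2.0, 10.0]
```

The same transform applies unchanged to RPKM or FPKM values, since it only depends on the values being non-negative.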
Two publicly available PBMC datasets from healthy donors profiled by Chromium v2 (5’ and 3’ kits) were downloaded (Supplementary Table 1) and preprocessed as described in ‘Gene expression profiling – Single-cell RNA-seq’ (Supplementary Note 1), with the following minor modifications. During quality control, we excluded cells with >5000 expressed genes for 5’ PBMCs, >4000 expressed genes for 3’ PBMCs, and <200 expressed genes for both datasets. Seurat “FindClusters” was applied on the first 20 principal components, with the resolution parameter set to 0.6. Cell labels were assigned as described above. In addition, myeloid cells were defined by high CD68 expression, megakaryocytes by high PPBP expression, and dendritic cells by high FCER1A expression.
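The quality-control thresholds above can be expressed as a simple per-cell filter. The sketch below is an illustration under stated assumptions, not the Seurat workflow the authors used: a gene is treated as "expressed" when its count is greater than zero, and the names `passes_qc`, `MIN_GENES`, and `MAX_GENES` and the kit keys `"5prime"` and `"3prime"` are hypothetical.

```python
# Thresholds taken from the text: cells with fewer than 200 expressed genes
# are excluded in both datasets; the upper limit depends on the kit.
MIN_GENES = 200
MAX_GENES = {"5prime": 5000, "3prime": 4000}  # hypothetical kit keys


def count_expressed_genes(counts):
    """Count genes with a non-zero count (assumed definition of 'expressed')."""
    return sum(1 for c in counts if c > 0)


def passes_qc(counts, kit):
    """Return True if a cell's expressed-gene count lies within the limits."""
    n_genes = count_expressed_genes(counts)
    return MIN_GENES <= n_genes <= MAX_GENES[kit]


# Toy cells: each list holds one count per gene.
cell_ok = [1] * 1500 + [0] * 500
cell_sparse = [1] * 150 + [0] * 1850
print(passes_qc(cell_ok, "3prime"))      # True
print(passes_qc(cell_sparse, "3prime"))  # False
```

Because the text excludes cells strictly above the upper limit and strictly below 200, both bounds are inclusive for retained cells.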
For the 3’ FL signature matrix in Supplementary Figs. 11d and 14a-b, publicly available 10x Chromium v2 scRNA-seq data (3’ kit)⁷⁰ were downloaded (Supplementary Table 1) and preprocessed as described for the 10x PBMC signature matrices above, with the following differences. Seurat “FindClusters” was applied on the first 10 principal components, with the resolution parameter set to 0.6. Cell labels were assigned based on the following canonical marker genes (MS4A1 = B cells; CD3E, CD8A and CD8B = CD8 T cells; CD3E and CD4 = CD4 T cells).
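One way to picture the marker-based labeling above is a lookup over per-cluster mean expression. The text does not state how "high expression" was decided, so the rule used here (every marker for a label must exceed a threshold, with ties broken by mean marker expression) and the threshold value are assumptions for illustration only; `label_cluster` is a hypothetical name.

```python
# Marker genes as listed in the text.
MARKERS = {
    "B cells": ["MS4A1"],
    "CD8 T cells": ["CD3E", "CD8A", "CD8B"],
    "CD4 T cells": ["CD3E", "CD4"],
}


def label_cluster(mean_expr, threshold=1.0):
    """Assign a label to a cluster from its mean marker expression.

    mean_expr maps gene symbol -> mean expression in the cluster.
    A label qualifies only if all of its markers exceed `threshold`
    (an assumed cutoff); among qualifying labels, the one with the
    highest mean marker expression wins. Returns None if none qualify.
    """
    best_label, best_score = None, float("-inf")
    for label, genes in MARKERS.items():
        values = [mean_expr.get(gene, 0.0) for gene in genes]
        if min(values) <= threshold:
            continue  # at least one marker is not highly expressed
        score = sum(values) / len(values)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy cluster profile; values are illustrative only.
cluster = {"CD3E": 2.5, "CD4": 2.0, "CD8A": 0.1, "CD8B": 0.0}
print(label_cluster(cluster))  # CD4 T cells
```

In practice the labels would be attached to the clusters returned by Seurat “FindClusters”; this sketch only shows the gene-to-label mapping itself.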


Newman A.M., Steen C.B., Liu C.L., Gentles A.J., Chaudhuri A.A., Scherer F., Khodadoust M.S., Esfahani M.S., Luca B.A., Steiner D., Diehn M., & Alizadeh A.A. (2019). Determining cell-type abundance and expression from bulk tissues with digital cytometry. Nature Biotechnology, 37(7), 773-782.

Publication 2019

Corresponding Organization : Stanford University
