Gene Expression Data Normalization Protocol

We started from data sets that had already been normalized for their respective studies, and applied no additional normalization to account for platform differences. For the signal intensity data generated by one-channel oligonucleotide microarrays (Affymetrix GeneChip), we applied a lower threshold of 20U and an upper threshold of 16,000U. For the log2-transformed ratio data generated by cDNA microarrays, we first removed genes whose values were missing in more than 5% of the samples, and then imputed the missing values for the remaining genes using a k-nearest neighbor algorithm [15] (ImputeMissingValues.KNN in the GenePattern software package, http://www.broad.mit.edu/genepattern/).
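The thresholding and missing-value filter described above can be sketched as follows. This is only an illustration under stated assumptions, not the GenePattern implementation: the function names and the dictionary layout (gene mapped to a list of per-sample values) are hypothetical, missing values are assumed to be encoded as None or NaN, and the k-nearest neighbor imputation step itself is not reproduced here.

```python
import math


def clamp_intensities(values, floor=20.0, ceiling=16000.0):
    """Clip one-channel signal intensities into the range [floor, ceiling]."""
    return [min(max(v, floor), ceiling) for v in values]


def drop_sparse_genes(log_ratios, max_missing_fraction=0.05):
    """Keep genes whose missing-value fraction is at most max_missing_fraction.

    log_ratios maps a gene identifier to a list of per-sample log2 ratios.
    Assumption: a missing value is stored as None or as float NaN.
    """
    kept = {}
    for gene, row in log_ratios.items():
        n_missing = sum(1 for v in row if v is None or math.isnan(v))
        if n_missing / len(row) <= max_missing_fraction:
            kept[gene] = row
    return kept
```

Genes that survive the filter may still contain a few missing values; in the protocol those are filled in afterwards by the imputation module.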
Before marker gene selection, we applied the following gene filters. For the oligonucleotide array data, only genes exhibiting at least 3-fold differential expression and an absolute difference of at least 100 units across the samples in the experiment were included. For the cDNA array data, only genes with an absolute log2 ratio greater than one and whose difference in log2 ratio across all the samples in the data set was greater than one were included.
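One possible reading of these two filters is sketched below. The interpretation is an assumption: fold change is taken as the maximum divided by the minimum across samples, the absolute difference as maximum minus minimum, and the absolute log2 ratio condition as being met when at least one sample exceeds the cutoff. The function names are hypothetical.

```python
def passes_oligo_filter(intensities, min_fold=3.0, min_delta=100.0):
    """Variation filter for one-channel intensities of a single gene.

    Assumes the values are positive, for example after flooring at 20 units.
    """
    lo, hi = min(intensities), max(intensities)
    if lo <= 0:
        raise ValueError("intensities must be positive; apply the floor first")
    return hi / lo >= min_fold and hi - lo >= min_delta


def passes_cdna_filter(log_ratios, min_abs=1.0, min_range=1.0):
    """Variation filter for log2 ratios of a single gene.

    Assumes missing values have already been imputed.
    """
    peak = max(abs(v) for v in log_ratios)      # largest absolute log2 ratio
    spread = max(log_ratios) - min(log_ratios)  # range across samples
    return peak > min_abs and spread > min_range
```

Both functions take the values of one gene across all samples and return a boolean, so they can be applied row by row to an expression matrix.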
Before applying SubMap, each microarray probe ID was converted into its corresponding HUGO gene symbol (http://www.gene.ucl.ac.uk/nomenclature/), and data from multiple probes corresponding to a single gene symbol were averaged. The numbers of genes remaining for our analyses of the multiple tissue types, DLBCL, breast cancer, and DLBCL (with survival data) data sets were 5565, 661, 1213, and 3795, respectively.
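The probe-to-gene collapsing step can be illustrated with a short sketch. The function name, the dictionary layout, and the handling of unmapped probes (dropped here) are assumptions made for the example; the source does not say how probes without a gene symbol were treated.

```python
from collections import defaultdict


def collapse_probes(expression, probe_to_symbol):
    """Average the per-sample values of all probes that share a gene symbol.

    expression maps a probe ID to a list of per-sample values.
    probe_to_symbol maps a probe ID to a gene symbol.
    Assumption: probes with no symbol in the mapping are dropped.
    """
    grouped = defaultdict(list)
    for probe, row in expression.items():
        symbol = probe_to_symbol.get(probe)
        if symbol is None:
            continue
        grouped[symbol].append(row)
    # zip(*rows) walks the samples column by column across the grouped probes
    return {
        symbol: [sum(column) / len(column) for column in zip(*rows)]
        for symbol, rows in grouped.items()
    }
```

After this step each gene symbol appears once per data set, which is what allows data sets from different platforms to be matched on a shared gene list.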

Hoshida Y., Brunet J.P., Tamayo P., Golub T.R., & Mesirov J.P. (2007). Subclass Mapping: Identifying Common Subtypes in Independent Disease Data Sets. PLoS ONE, 2(11), e1195.

Publication 2007

Breast cancer, cDNA array, Gene, Gene selection, GeneChip, Genes, Microarray, Oligonucleotide array, Tissue types

Other organizations : Massachusetts Institute of Technology, Broad Institute, Dana-Farber Cancer Institute

Protocol cited in 154 other protocols

Variable analysis

independent variables

Affymetrix's GeneChip one-channel oligonucleotide microarray data
cDNA microarray data

dependent variables

Signal intensity data generated by one-channel oligonucleotide microarrays
Log2 transformed ratio data generated by cDNA microarrays

control variables

The data sets were already normalized for their respective study without any additional normalization procedure to account for different platform derivation.
For the signal intensity data generated by one-channel oligonucleotide microarrays, Affymetrix's GeneChip, a lower threshold of 20U and an upper threshold of 16,000U were applied.
For the log2 transformed ratio data generated by cDNA microarrays, genes whose values were missing in more than 5% of the samples were removed, and the missing values for the rest of the genes were imputed using a k-nearest neighbor algorithm.
Before marker gene selection, for the oligonucleotide array data, only genes exhibiting at least 3-fold differential expression and an absolute difference of at least 100 units across the samples were included.
Before marker gene selection, for the cDNA array data, only genes with an absolute log2 ratio greater than one and whose difference in log2 ratio across all the samples in the data set was greater than one were included.
Before applying SubMap, each microarray probe ID was converted into its corresponding HUGO gene symbol, and data from multiple probes corresponding to a single gene symbol were averaged.
