To simulate class-effect proportion (CEP), class effects are applied to 0, 0.2, 0.5 or 0.8 of the measured proteins (Fig. 2). The magnitude of each applied effect is randomly drawn from 0.2, 0.5, 0.8, 1 and 2. The class effect is applied in one class but not the other, as a proportionate increment: for example, a 0.2 class-effect level means a 20% increase over the original value. A high CEP therefore produces sample classes whose basal expression states are drastically different.
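The class-effect simulation described above can be sketched as follows. This is a minimal illustration, assuming expression data in a NumPy array of shape (samples × proteins); the function name and signature are hypothetical, not from the original study's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def apply_class_effect(X, in_class, cep=0.5, levels=(0.2, 0.5, 0.8, 1, 2)):
    """Apply proportionate class effects to a fraction (cep) of proteins.

    X        : (n_samples, n_proteins) expression matrix.
    in_class : boolean array; effects are applied only to samples where True.
    """
    X = X.copy()
    n_proteins = X.shape[1]
    # CEP determines how many proteins receive a class effect.
    affected = rng.choice(n_proteins, size=int(round(cep * n_proteins)),
                          replace=False)
    # Each affected protein gets a randomly drawn effect level, applied as
    # a proportionate increment (e.g. 0.2 -> a 20% increase).
    effects = rng.choice(levels, size=len(affected))
    X[np.ix_(in_class, affected)] *= (1 + effects)
    return X, affected
```

Samples outside the chosen class keep their original values, so the two classes diverge only at the affected proteins.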

Fig. 2. Simulation strategies for data with simulated class and batch effects (A), and for data with real batch effects but simulated class effects (B).

Batch effects are simulated similarly, except that they are inserted according to batch factors (the categorization of samples into technical batches). In this simplistic scenario, half of the samples of each class are assigned to each batch.
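The batch assignment described here (half of each class per batch) can be sketched as below; this is an illustrative helper, not code from the study, and it assumes a two-batch design.

```python
import numpy as np

def assign_batches(class_labels):
    """Assign half of each class's samples to batch 0 and half to batch 1.

    class_labels : 1-D array of class identifiers, one per sample.
    Returns a batch factor (0 or 1) per sample.
    """
    batches = np.zeros(len(class_labels), dtype=int)
    for c in np.unique(class_labels):
        idx = np.flatnonzero(class_labels == c)
        # Second half of each class goes to batch 1.
        batches[idx[len(idx) // 2:]] = 1
    return batches
```

Batch effects can then be injected per batch with the same proportionate-increment scheme used for class effects.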
Since the set of differential variables is known a priori, normalization performance across the five strategies can be evaluated via statistical feature selection (a two-sample t test at the α = 0.05 significance level) and via overall batch-effect correction as measured by the gPCA delta [21] (see below).
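The per-protein feature-selection step can be sketched as below, assuming SciPy's two-sample t test; the function name is illustrative and details such as equal-variance assumptions or multiple-testing correction are not specified in the text.

```python
import numpy as np
from scipy.stats import ttest_ind

def select_features(X, in_class, alpha=0.05):
    """Flag proteins as differential via a two-sample t test per protein.

    X        : (n_samples, n_proteins) expression matrix.
    in_class : boolean array splitting samples into the two classes.
    Returns a boolean mask over proteins (p < alpha).
    """
    a = X[in_class]
    b = X[~in_class]
    _, pvals = ttest_ind(a, b, axis=0)  # one test per protein column
    return pvals < alpha
```

Comparing this mask against the known set of differential proteins yields the TP/FP/FN counts used in the evaluation metrics.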
For statistical feature selection, the precision, recall and their harmonic mean (the F-score) are used. These are expressed as:

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F-score = (2 × Precision × Recall) / (Precision + Recall)
where TP, FP and FN denote true positives, false positives and false negatives, respectively. The efficacy of batch correction is evaluated using gPCA [21]. The gPCA delta measures the proportion of variance in the test data attributable to batch effects and is bounded between 0 and 1; ideally, it should be as low as possible after normalization.
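The three metrics above reduce to a few lines given the selected mask and the known ground truth; a minimal sketch (illustrative helper name, boolean masks over proteins assumed):

```python
import numpy as np

def f_score(selected, truly_differential):
    """Precision, recall and F-score for a feature-selection result."""
    tp = np.sum(selected & truly_differential)   # correctly selected
    fp = np.sum(selected & ~truly_differential)  # selected but not differential
    fn = np.sum(~selected & truly_differential)  # differential but missed
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f
```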