Drug-induced liver injury prediction

The training set was a subset of the Connectivity Map (CMap) consisting of gene-expression data and known DILI status for 190 small molecules (130 of which had been found to cause DILI in patients). The test set consisted of an additional 86 small molecules. The CMap gene-expression data were generated using Affymetrix gene-expression microarrays. In Phase I, we used the Single Channel Array Normalization (SCAN) algorithm [14], a single-sample normalization method, to process the individual CEL files (raw data), which we downloaded from the CMap website (https://portals.broadinstitute.org/cmap/). As part of the normalization process, we used BrainArray annotations to discard faulty probes and to summarize the values at the gene level (using Entrez Gene identifiers) [15]. We wrote custom Python scripts (https://python.org) to summarize the data and execute analytical steps. The scripts we used to normalize and prepare the data are available at https://osf.io/v3qyg/.
For each treatment on each cell line, CMap provides gene-expression data for multiple biological replicates of vehicle-treated cells. For simplicity, we averaged gene-expression values across the multiple vehicle files. We then subtracted these averaged values from the corresponding gene-expression values for the compounds of interest. Finally, we merged the vehicle-adjusted data into one file per cell line (MCF7 and PC3).
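
The vehicle adjustment described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' script (those are at the OSF link above). The function names, the dictionary layout (Entrez Gene ID mapped to expression value), and the toy numbers are all assumptions made for the example.

```python
from statistics import mean


def average_vehicle_profiles(vehicle_profiles):
    """Average expression per gene across vehicle replicate profiles.

    Each profile is a dict mapping a gene identifier to an expression value;
    all profiles are assumed to share the same set of genes.
    """
    genes = vehicle_profiles[0].keys()
    return {
        gene: mean(profile[gene] for profile in vehicle_profiles)
        for gene in genes
    }


def vehicle_adjust(treated_profile, vehicle_profiles):
    """Subtract the averaged vehicle values from a treated profile, gene by gene."""
    baseline = average_vehicle_profiles(vehicle_profiles)
    return {
        gene: value - baseline[gene]
        for gene, value in treated_profile.items()
    }


# Hypothetical toy values keyed by Entrez Gene ID (not real CMap data).
vehicles = [
    {"1017": 8.0, "7157": 6.0},
    {"1017": 9.0, "7157": 7.0},
]
treated = {"1017": 10.0, "7157": 5.5}

adjusted = vehicle_adjust(treated, vehicles)
print(adjusted)
```

A positive adjusted value means the gene was expressed more highly in compound-treated cells than in the averaged vehicle controls; a negative value means lower expression.
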
The SCAN algorithm is designed for precision-medicine workflows in which biological samples may arrive serially and thus may need to be processed one sample at a time [14]. This approach provides logistical advantages and ensures that the data distribution of each sample is similar, but it does not attempt to adjust for systematic differences that may be observed across samples. Therefore, during Phase II, we generated an alternative version of the data, which we normalized using the FARMS algorithm [16], a multi-sample normalization method. This enabled us to evaluate whether the single-sample nature of the SCAN algorithm may have negatively affected classification accuracy in Phase I. Irrespective of normalization method, batch effects can bias a machine-learning analysis, and the CMap data were indeed processed in many batches. Therefore, for both the SCAN-normalized and FARMS-normalized data, we created an additional version of the expression data by adjusting for batch effects using the ComBat algorithm [17].
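
To illustrate what a batch adjustment does, the sketch below applies per-batch mean-centering to a single gene. This is a deliberately simplified stand-in and is not ComBat: ComBat estimates location and scale parameters for each batch and shrinks them with an empirical Bayes procedure, whereas this example only removes each batch's mean shift. The function name and the toy values are assumptions.

```python
from collections import defaultdict
from statistics import mean


def center_by_batch(values, batches):
    """Remove per-batch mean shifts for one gene.

    Simplified stand-in for batch correction (NOT ComBat): each sample has its
    batch mean subtracted and the overall mean added back, so the batches end
    up with the same average while the overall level is preserved.
    """
    overall_mean = mean(values)
    values_by_batch = defaultdict(list)
    for value, batch in zip(values, batches):
        values_by_batch[batch].append(value)
    batch_means = {b: mean(vs) for b, vs in values_by_batch.items()}
    return [
        value - batch_means[batch] + overall_mean
        for value, batch in zip(values, batches)
    ]


# Hypothetical expression values for one gene across four samples in two batches.
expression = [5.0, 7.0, 9.0, 11.0]
batch_labels = ["A", "A", "B", "B"]

centered = center_by_batch(expression, batch_labels)
print(centered)
```

After centering, both batches share the same mean, so a classifier can no longer separate samples on this gene by batch membership alone. Within-batch differences between samples are left unchanged.
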

Sumsion G.R., Bradshaw M.S. III, Beales J.T., Ford E., Caryotakis G.R., Garrett D.J., LeBaron E.D., Nwosu I.O., & Piccolo S.R. (2020). Diverse approaches to predicting drug-induced liver injury using gene-expression profiles. Biology Direct, 15, 1.

Publication 2020

Biological Cell line Cells Gene Gene expression Mcf7 Microarrays Patients Precision medicine Python

Variable analysis

independent variables

Gene-expression data for 190 small molecules in the training set and 86 small molecules in the test set

dependent variables

Known DILI (Drug-Induced Liver Injury) status for the small molecules

control variables

Vehicle-treated cells (multiple biological replicates) were used as controls to adjust the gene-expression data for the compounds of interest
