Perturbation signatures of cell lines treated with drugs from the Connectivity Map [1] were compared to various breast cancer backgrounds to investigate how drug-induced gene expression relates to disease signature. Unweighted SAGES were calculated for all overexpressed genes in each experimental sample and compared to two backgrounds: unweighted SAGES of all proteins from the proteomics samples with log ratio expression greater than one, and unweighted SAGES of all genes from the COSMIC mutated breast cancer gene dataset. Unweighted SAGES were used to ensure that all overexpressed proteins contributed equally to the analysis. For each sample, the number of features significantly different from background was counted, using a significance level of 0.05. Multiple hypothesis testing correction was not applied because the aim was ultimately to assess similarity to background rather than difference from background, and such a correction would increase the type II error rate. The samples were divided into two groups: breast cancer treatment drugs (doxorubicin, fulvestrant, letrozole, megestrol, methotrexate, paclitaxel, raloxifene, tamoxifen, and vinblastine), according to a list published by the National Cancer Institute, and all other drugs in the Connectivity Map database. The average number of significantly different features was calculated for each group, and a two-sided, type 2 Student's t-test was used to determine the p value.
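The counting and group-comparison steps described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that a p value per feature is already available for each sample, that "type 2" denotes a two-sample, equal-variance (pooled) test as in common spreadsheet software, and that the function names and toy numbers are hypothetical. Only the t statistic and degrees of freedom are computed here; the two-sided p value would then be read from the t distribution.

```python
from math import sqrt
from statistics import mean, variance

ALPHA = 0.05  # significance level stated in the text


def count_significant(p_values, alpha=ALPHA):
    """Count features with p < alpha (no multiple-testing correction)."""
    return sum(1 for p in p_values if p < alpha)


def pooled_t_statistic(group_a, group_b):
    """Two-sample, equal-variance Student's t statistic and degrees of freedom.

    Assumption: this is what "type 2" refers to in the text.
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


# Hypothetical per-sample p values, one inner list per perturbation sample.
breast_cancer_samples = [[0.01, 0.20, 0.03], [0.50, 0.04, 0.60]]
other_samples = [[0.01, 0.02, 0.03], [0.04, 0.01, 0.30], [0.02, 0.03, 0.04]]

breast_cancer_counts = [count_significant(p) for p in breast_cancer_samples]
other_counts = [count_significant(p) for p in other_samples]

t_stat, dof = pooled_t_statistic(breast_cancer_counts, other_counts)
```

A lower average count for a group indicates that its samples are, on average, more similar to the background.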
Gene expression of the perturbation samples was also compared directly with the expression signatures used to calculate the SAGES of the two backgrounds. Signature similarity was measured with the Jaccard coefficient, defined as the number of genes overexpressed in both sets (the intersection) divided by the number of genes overexpressed in either set (the union). Samples were divided into breast cancer drugs and other drugs, and the two groups were compared with a two-sided, type 2 Student's t-test. Additionally, the average Jaccard coefficient was determined for each group.
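The Jaccard coefficient defined above can be illustrated with a short sketch. The function name and the toy gene sets are hypothetical and serve only to show the calculation; returning 0.0 for two empty sets is a convention chosen here, not something stated in the text.

```python
def jaccard(set_a, set_b):
    """Jaccard coefficient: |A intersect B| / |A union B|."""
    a, b = set(set_a), set(set_b)
    union = a | b
    if not union:
        # Assumed convention: two empty signatures have similarity 0.0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical sets of overexpressed genes (toy example only).
drug_signature = {"TP53", "MYC", "ESR1", "CCND1"}
background_signature = {"ESR1", "CCND1", "ERBB2"}

# Two shared genes out of five distinct genes in total.
similarity = jaccard(drug_signature, background_signature)
```

The coefficient ranges from 0 (no overexpressed genes in common) to 1 (identical sets of overexpressed genes).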