In the study reported by Bhattacharjee et al. [30], gene expression levels were measured in 6 small-cell lung carcinomas (SMC), 21 squamous cell lung carcinomas (SQ), 20 pulmonary carcinoids (COID) and 17 normal lung specimens. We calculated the t-statistic for every gene, comparing the 6 SMC, 21 SQ or 20 COID samples against the 17 normal samples, which gave three t-score profiles: SMC/normal, SQ/normal and COID/normal. Each of these profiles was taken as an expression differentiation vector e. The binding affinity data were calculated from the 546 vertebrate positional weight matrices (PWMs) extracted from TRANSFAC 9.4 [32]. For each of these 546 PWMs, we used the program MATCH to scan the upstream region of every human gene, up to 1000 bp upstream of the transcription start site [31]. To minimize the false positive rate, we used the pre-calculated cut-off values that the MATCH program provides for these PWMs. The matching scores of all significant hits of the same PWM in each upstream region were aggregated; when no hit was found in the upstream region of a gene, the score was set to 0. The vector of aggregated matching scores for each PWM was taken as the binding vector m. This processing resulted in 3 expression differentiation vectors (SMC, SQ and COID) and 546 binding vectors. For each combination of an expression differentiation vector and a binding vector, we applied our method to calculate the AC score and its significance.
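The construction of the two vectors can be illustrated with a minimal Python sketch. It is not the original pipeline: the text does not say which t-statistic variant was used or how hit scores were aggregated, so the sketch assumes Welch's unequal-variance t-statistic and a simple sum of scores. The function names, gene labels, hit scores and cut-off are all hypothetical, and the hits stand in for MATCH output rather than reproducing its format.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance


def welch_t(tumor, normal):
    """t-statistic for one gene, tumor vs. normal (assumption: Welch's variant)."""
    se = sqrt(variance(tumor) / len(tumor) + variance(normal) / len(normal))
    return (mean(tumor) - mean(normal)) / se


def t_score_profile(tumor_expr, normal_expr):
    """Map each gene to its t-score; this plays the role of the vector e."""
    return {g: welch_t(tumor_expr[g], normal_expr[g]) for g in tumor_expr}


def binding_vector(hits, genes, cutoff):
    """Aggregate hit scores per gene for one PWM; this plays the role of m.

    hits: iterable of (gene, score) pairs for a single PWM.
    Assumption: aggregation is a sum over hits at or above the cut-off.
    Genes without a significant hit get a score of 0.
    """
    total = defaultdict(float)
    for gene, score in hits:
        if score >= cutoff:
            total[gene] += score
    return [total.get(g, 0.0) for g in genes]


# Toy data (hypothetical): expression values per gene in each sample group.
tumor = {"G1": [5.0, 6.0, 7.0], "G2": [2.0, 2.5, 3.0]}
normal = {"G1": [1.0, 2.0, 3.0], "G2": [2.0, 2.5, 3.0]}
e = t_score_profile(tumor, normal)

# Toy hits (hypothetical) for one PWM; G3 has no hit, G2's hit is below the cut-off.
genes = ["G1", "G2", "G3"]
hits = [("G1", 0.9), ("G1", 0.95), ("G2", 0.5)]
m = binding_vector(hits, genes, cutoff=0.8)
```

In the study itself this would be repeated for each of the three tumor types and each of the 546 PWMs, giving one (e, m) pair per combination.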