Additionally, we developed a model that estimates the probability of an observed Single Protein Network arising from the list of genes upregulated in HNSCC relative to paired normal tissue in GSE6631. Each of these upregulated HNSCC genes was translated to its corresponding protein identifier in the network (HNSCC protein). Each HNSCC protein was mapped to every other HNSCC protein according to existing pairs of protein interactions in the original PPIN, yielding an observed number of distinct protein interactions (Observed count of PI). Thereafter, the same procedure was applied to the 10,000 permuted PPINs, yielding control counts of distinct protein interactions for each of the upregulated genes (UG) (Control count of PI). Since each HNSCC protein had a constant node degree in each permutation (see the previous paragraph), this procedure properly controlled for HNSCC proteins having more protein interactions than others, thus providing no statistical advantage to better-connected proteins (such as hub or bottleneck proteins). For each HNSCC protein, a P-value was assigned by measuring the frequency at which the “Observed count of PI” of that HNSCC protein occurred in the empirical distribution of 10,000 “Control count of PI” values for that specific HNSCC protein (Table 11 in Text S2). The HNSCC proteins were subsequently ranked according to their P-values. At each cutoff P-value, a certain number of HNSCC proteins were prioritized. Consequently, an FDR of the prioritized HNSCC proteins (FDR of prioritized proteins) was calculated by dividing the median number of proteins prioritized at that cutoff in the empirical distributions of permuted PPINs by the observed number of prioritized HNSCC proteins in the real PPIN. We refer to this approach as single protein analysis in the network (SPAN).
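The permutation scheme described above can be illustrated with a minimal Python sketch. It is not the implementation used in this study, and several details are assumptions made only for illustration: the P-value is taken as the fraction of permuted networks whose control count is at least as large as the observed count, each permuted network is scored against the same control distribution to obtain its own number of prioritized proteins, and the function names (`empirical_p_values`, `permuted_p_values`, `fdr_at_cutoff`) are hypothetical.

```python
from bisect import bisect_left
from statistics import median


def empirical_p_values(observed, control):
    """Empirical P-value per protein.

    observed: list of observed interaction counts, one per HNSCC protein.
    control:  list of rows, one per permuted network, each row holding the
              control counts for the same proteins in the same order.
    Assumption: P = fraction of permutations with control count >= observed.
    """
    n_perm = len(control)
    return [
        sum(1 for row in control if row[j] >= obs) / n_perm
        for j, obs in enumerate(observed)
    ]


def permuted_p_values(control):
    """Score every permuted network against the control distribution itself."""
    n_perm = len(control)
    n_prot = len(control[0])
    p = [[0.0] * n_prot for _ in range(n_perm)]
    for j in range(n_prot):
        column = sorted(row[j] for row in control)
        for k, row in enumerate(control):
            # Number of control counts >= this permutation's count.
            n_ge = n_perm - bisect_left(column, row[j])
            p[k][j] = n_ge / n_perm
    return p


def fdr_at_cutoff(observed, control, cutoff):
    """Median number prioritized in permuted networks / number observed."""
    n_observed = sum(p <= cutoff for p in empirical_p_values(observed, control))
    if n_observed == 0:
        return None  # FDR is undefined when nothing is prioritized
    per_perm = [sum(p <= cutoff for p in row) for row in permuted_p_values(control)]
    return median(per_perm) / n_observed


# Toy data: 2 proteins, 4 permuted networks (the study used 10,000).
observed_counts = [5, 1]
control_counts = [[0, 1], [1, 2], [2, 0], [1, 1]]
p_values = empirical_p_values(observed_counts, control_counts)
fdr = fdr_at_cutoff(observed_counts, control_counts, cutoff=0.25)
```

In this toy input the first protein has more interactions than in any permuted network and is the only one prioritized at the 0.25 cutoff. Because node degree is held constant across permutations, a well-connected protein gains no advantage from its degree alone.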
A similar procedure was developed to calculate the FDR over pairs of protein interactors among the observed prioritized HNSCC proteins (FDR of links). A “Prioritized HNSCC PPIN” (Figure 3) was predicted from SPAN in the “genome-scale PPIN” with an FDR of 7.14% for the links between labeled genes and of 10.15% for upregulated HNSCC genes in GSE6631. The resulting network was drawn using Cytoscape [84]. Details on the protein interaction dataset supporting each pair of protein interactions are provided in Table 12 in Text S2. Hubs in the PPIN are defined as the proteins in the top 20% by node degree (grey nodes in Figure 3A). Similarly, bottlenecks (grey nodes in Figure 3B) are defined as the proteins in the top 20% by betweenness score, calculated using the “betweenness.c” program we developed (http://www.gersteinlab.org/proj/bottleneck/) [30]. Of the PPIN proteins, 10.4% were observed to have both hub and bottleneck properties. Enrichment studies of the hub, bottleneck and hub-bottleneck proteins presented in Figure 3 were conducted using the one-tailed cumulative hypergeometric distribution.
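The top-20% definition and the one-tailed enrichment test can be sketched as follows. This is an illustrative example under stated assumptions, not the code used in this study. The helper names (`top_fraction`, `hypergeom_upper_tail`) and the toy degrees are hypothetical, ties at the 20% boundary are broken by protein name, and the upper tail P(X >= k) is assumed to be the tail of interest. Betweenness scores, which the study computed with “betweenness.c”, would be passed to `top_fraction` in the same way as node degrees.

```python
from math import comb


def top_fraction(scores, fraction=0.2):
    """Return the proteins ranking in the top `fraction` by score.

    scores: dict mapping protein name -> node degree (or betweenness).
    Assumption: ties at the boundary are broken alphabetically by name.
    """
    n_top = max(1, int(len(scores) * fraction))
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return set(ranked[:n_top])


def hypergeom_upper_tail(k, population, successes, draws):
    """One-tailed cumulative hypergeometric probability P(X >= k).

    population: all proteins in the PPIN
    successes:  proteins carrying the property (e.g. hubs)
    draws:      prioritized proteins
    k:          prioritized proteins that carry the property
    """
    upper = min(successes, draws)
    numerator = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(max(k, 0), upper + 1)
    )
    return numerator / comb(population, draws)


# Toy network of 10 proteins with hypothetical node degrees.
degrees = {"A": 9, "B": 7, "C": 3, "D": 3, "E": 2,
           "F": 2, "G": 1, "H": 1, "I": 1, "J": 1}
hubs = top_fraction(degrees)            # top 20% of 10 proteins -> 2 hubs
prioritized = {"A", "B", "G"}           # hypothetical prioritized set
overlap = len(hubs & prioritized)
p_enrichment = hypergeom_upper_tail(
    overlap, len(degrees), len(hubs), len(prioritized)
)
```

Here both hubs fall among the three prioritized proteins, so the tail probability is the chance of drawing both hubs in three draws from ten proteins.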