Each gene belongs to at least one family, i.e., a group of genes derived from a common ancestral gene. In some cases, a whole family is associated with a single disease. Accordingly, the associations of the genes in each SF3B1 family with breast and hematologic cancer types were investigated and compared with the results of the preceding gene-based analyses. According to the HGNC database, SF3B1 is a member of three families:
Armadillo-like helical domain containing (ARMH): 244 genes that share a common superhelical structure but have different functions [36].
SF3b complex: 7 genes that form the multi-component SF3b complex, which recognizes the branch point of pre-mRNA during splicing. The SF3b complex is also a subgroup of “U2 small nuclear ribonucleoprotein”, whose root family is “Major spliceosome”. Including all subfamilies of “Major spliceosome” gives 145 genes.
B-WICH chromatin-remodelling complex subunits (B-WICH): 8 genes involved in regulating RNA polymerase III transcription [37].
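Counting genes across all subfamilies of a root family, as done for “Major spliceosome” above, amounts to a traversal of the HGNC family hierarchy. The following is a minimal sketch, assuming the hierarchy has already been loaded into two hypothetical in-memory mappings (family to child families, family to directly assigned genes); the toy family names and gene assignments are illustrative only and do not reproduce the HGNC contents. The `exclude` option is likewise a hypothetical convenience for dropping whole subfamilies.

```python
from collections import deque


def collect_family_genes(root, children, members, exclude=frozenset()):
    """Return all genes in `root` and its descendant subfamilies.

    children: dict mapping a family name to a list of its direct subfamilies.
    members:  dict mapping a family name to the genes assigned directly to it.
    exclude:  family names whose entire subtree is skipped (hypothetical option).
    """
    genes = set()          # a set de-duplicates genes shared by several families
    seen = set()           # guards against families reachable by several paths
    queue = deque([root])  # breadth-first traversal of the hierarchy
    while queue:
        family = queue.popleft()
        if family in seen or family in exclude:
            continue
        seen.add(family)
        genes.update(members.get(family, ()))
        queue.extend(children.get(family, ()))
    return genes


# Toy hierarchy loosely mirroring the text; names and genes are hypothetical.
children = {
    "Major spliceosome": ["U2 snRNP", "Complex C"],
    "U2 snRNP": ["SF3b complex"],
}
members = {
    "SF3b complex": ["SF3B1", "SF3B2", "SF3B3"],
    "U2 snRNP": ["SNRPA1"],
    "Complex C": ["GENE_C1"],
}

all_genes = collect_family_genes("Major spliceosome", children, members)
trimmed = collect_family_genes(
    "Major spliceosome", children, members, exclude={"Complex C"}
)
print(len(all_genes), len(trimmed))  # 5 4
```

Because HGNC groups can have more than one parent, the `seen` set keeps a shared subfamily from being visited twice.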
Two approaches were used to test whether including other genes from the same family strengthens the association with each cancer type. First, we accessed the HGNC database to obtain the gene lists of the ARMH and B-WICH families. For the major spliceosome, we did not include all genes, because a cryo-EM study demonstrated that the SF3b complex dissociates after the late Bact state [38]. The spliceosomal C and P complexes are formed after the Bact state. In addition, the spliceosomal E complex is a superfamily of “U1 small nuclear ribonucleoprotein”. Therefore, we excluded these three families. The final list included 244 ARMH genes, 8 genes from the B-WICH complex, and 70 spliceosome components. Then, we merged each family's list separately with the COSMIC tables of MDS, AML, CLL, and BC. The numbers of mutations and samples were retrieved for comparison with the corresponding values for SF3B1 alone.
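The merge-and-count step can be sketched as a filter over one cancer-type table: keep the rows whose gene is in the family list, then count mutation rows and distinct samples. This is a minimal illustration, assuming a tab-separated table with a gene column and a sample-ID column; the default column names "Gene name" and "ID_sample" are assumptions to be adjusted to the actual COSMIC export, and the tiny table below is made up, not COSMIC data.

```python
import csv
import io


def summarize(rows, genes, gene_col="Gene name", sample_col="ID_sample"):
    """Count mutation rows and distinct samples whose gene is in `genes`.

    The default column names are assumptions about the COSMIC table layout.
    """
    n_mutations = 0
    samples = set()
    for row in rows:
        if row[gene_col] in genes:
            n_mutations += 1             # each matching row counts as one mutation
            samples.add(row[sample_col])  # a sample is counted once however many hits
    return n_mutations, len(samples)


# Made-up table standing in for one cancer-type table (e.g., MDS).
tsv = """Gene name\tID_sample
SF3B1\t1
SF3B1\t2
SF3B2\t2
SMARCA5\t3
TP53\t4
"""
rows = list(csv.DictReader(io.StringIO(tsv), delimiter="\t"))

family = {"SF3B1", "SF3B2", "SMARCA5"}  # hypothetical family gene list
print(summarize(rows, {"SF3B1"}))  # (2, 2)
print(summarize(rows, family))     # (4, 3)
```

Running the same function once with SF3B1 alone and once with a family list gives the paired values that the comparison relies on.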
In the second approach, the Reactome FI app was used to derive a gene network for each family and to highlight pathways shared among its genes. Then, a gene-disease network was derived from each family network using the DisGeNET app. Based on the number of associated genes and the degree of each disease node, we could determine whether including genes from any of the three families increases the association with the considered cancer types. As detailed in the Results section, adding genes of common origin increased the association with specific cancer types.
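The degree comparison can be illustrated on a bipartite gene-disease network: a disease node's degree is the number of distinct genes from the chosen set linked to it. This is a minimal sketch, assuming the network has been exported as a list of (gene, disease) pairs; it does not use the DisGeNET app itself, and the associations below are made up for illustration, not DisGeNET results.

```python
from collections import defaultdict


def disease_degrees(associations, genes):
    """Degree of each disease node in the gene-disease network restricted to `genes`.

    associations: iterable of (gene, disease) pairs; this export format is an
    assumption of the sketch.
    """
    linked = defaultdict(set)
    for gene, disease in associations:
        if gene in genes:
            linked[disease].add(gene)  # a set ignores duplicate gene-disease edges
    return {disease: len(g) for disease, g in linked.items()}


# Made-up associations for illustration only.
pairs = [
    ("SF3B1", "Myelodysplastic syndrome"),
    ("SF3B1", "Breast carcinoma"),
    ("SF3B2", "Myelodysplastic syndrome"),
    ("SF3B3", "Myelodysplastic syndrome"),
    ("SF3B3", "Breast carcinoma"),
]

single = disease_degrees(pairs, {"SF3B1"})
family = disease_degrees(pairs, {"SF3B1", "SF3B2", "SF3B3"})
for disease in sorted(family):
    # degree with SF3B1 alone -> degree with the whole (hypothetical) family
    print(disease, single.get(disease, 0), "->", family[disease])
```

A disease whose degree rises when the family is added is one for which including genes of common origin strengthens the association.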