Prediction of potentially disease-causing variant combinations was performed with VarCoPP [70, 109] on an in-house cluster. VarCoPP is designed to process alleles in pairs in order to prioritize disease-causing combinations. This classifier, trained on the digenic cases contained in the digenic disease database (DIDA) [110], uses 11 features at the variant level (e.g., CADD raw scores), the gene level (e.g., haploinsufficiency) and the gene-pair level (e.g., biological distance). Specifically, VarCoPP is an ensemble of 500 random forest predictors, each of which classifies a given variant combination independently. Two scores are assigned to each combination: the classification score (CS), i.e., the median of the pathogenic probabilities returned by the ensemble of predictors, and the support score (SS), i.e., the percentage of the 500 predictors that deem the combination pathogenic. Thresholds on these two scores define confidence zones. We retained the bi-locus variant combinations that fell in the 99% confidence zone (CS ≥ 0.74; SS = 100%). These combinations were further inspected using the ORVAL platform (https://orval.isquare.be) [70], which incorporates VarCoPP [109].
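The two scores and the 99% confidence-zone filter described above can be illustrated with a minimal sketch. It is not VarCoPP's code: the function names, the toy probabilities and, in particular, the per-predictor cut-off of 0.5 used to decide when a predictor "deems" a combination pathogenic are assumptions made for illustration. Only the CS and SS definitions and the zone thresholds (CS ≥ 0.74; SS = 100%) come from the text.

```python
from statistics import median

# Assumption: a predictor votes "pathogenic" when its probability reaches
# this cut-off. The text does not state the value VarCoPP actually uses.
PATHOGENIC_CUTOFF = 0.5

# Thresholds of the 99% confidence zone, as quoted in the text.
CS_MIN = 0.74
SS_MIN = 100.0


def classification_score(probabilities):
    """CS: median pathogenic probability over all predictors."""
    return median(probabilities)


def support_score(probabilities, cutoff=PATHOGENIC_CUTOFF):
    """SS: percentage of predictors that vote pathogenic."""
    votes = sum(1 for p in probabilities if p >= cutoff)
    return 100.0 * votes / len(probabilities)


def in_99_zone(probabilities):
    """True if the combination meets both 99%-zone thresholds."""
    cs = classification_score(probabilities)
    ss = support_score(probabilities)
    return cs >= CS_MIN and ss >= SS_MIN


# Hypothetical per-predictor probabilities for two variant combinations,
# one value per predictor in a 500-member ensemble.
confident = [0.80] * 300 + [0.90] * 200   # every predictor votes pathogenic
mixed = [0.80] * 450 + [0.30] * 50        # 50 predictors disagree

for name, probs in [("confident", confident), ("mixed", mixed)]:
    print(name, classification_score(probs), support_score(probs),
          in_99_zone(probs))
```

In this toy example both combinations have the same CS, but only the first is retained, because the second loses the unanimous support required by SS = 100%.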