Benchmarking MCA-Miner against FP-Growth in BRL

Example 1

The MCA-miner method disclosed herein in FIGS. 2A-2C, when used together with BRL, offers the power of rule list interpretability while maintaining the predictive capabilities of already established machine learning methods.

The performance and computational efficiency of the new MCA-miner are benchmarked on the “Titanic” dataset and on the following five (5) datasets available in the UCI Machine Learning Repository: “Adult,” “Autism Screening Adult,” “Breast Cancer Wisconsin (Diagnostic),” “Heart Disease,” and “HIV-1 protease cleavage,” which are designated as Adult, ASD, Cancer, Heart, and HIV, respectively. These datasets represent a wide variety of real-world experiments and observations, thus enabling the improvements described herein to be compared against the original BRL implementation using the FP-Growth miner.

All six benchmark datasets correspond to binary classification tasks. The experiments were conducted using the same setup for every benchmark. First, the dataset is transformed into a format that is compatible with the disclosed BRL implementation. Second, all continuous attributes are quantized into either two (2) or three (3) categories, while the original categories of all other variables are kept. Depending on the dataset and how its data were originally collected, the existing taxonomy and expert domain knowledge are prioritized in some instances to generate the quantization of continuous variables; a balanced quantization is generated when no such information is available. Third, a model is trained and tested using 5-fold cross-validation, and the average accuracy and Area Under the ROC Curve (AUC) are reported as model performance measurements.
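The setup above can be sketched in a few lines of Python. This is an illustrative sketch only, not the disclosed implementation: the function names (balanced_quantize, k_fold_indices, auc, cross_validate) and the fit/score callables are hypothetical, equal-frequency binning is assumed as one reading of "balanced quantization", and the 0.5 decision threshold used for accuracy is likewise an assumption.

```python
import random
import statistics


def balanced_quantize(values, n_bins=3):
    """Assign each value a bin label 0..n_bins-1 using equal-frequency cut points."""
    if n_bins < 2:
        raise ValueError("n_bins must be at least 2")
    # statistics.quantiles returns n_bins - 1 cut points.
    cuts = statistics.quantiles(values, n=n_bins, method="inclusive")
    # The label is the number of cut points strictly below the value.
    return [sum(v > c for c in cuts) for v in values]


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count one half).

    Requires at least one positive and one negative label.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cross_validate(X, y, fit, score, k=5, seed=0):
    """Return (mean accuracy, mean AUC) over k folds.

    fit(X_train, y_train) returns a model; score(model, x) returns a value in [0, 1].
    """
    accs, aucs = [], []
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores = [score(model, X[i]) for i in test_idx]
        truth = [y[i] for i in test_idx]
        hits = [int((s >= 0.5) == bool(t)) for s, t in zip(scores, truth)]
        accs.append(statistics.mean(hits))
        aucs.append(auc(truth, scores))
    return statistics.mean(accs), statistics.mean(aucs)
```

For instance, balanced_quantize(list(range(1, 10)), 3) places three values in each of the three bins. Where taxonomy or domain knowledge supplies the cut points, those would replace the quantile-based cuts in this sketch.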

Table 1 presents the empirical results of comparing both implementations. The notation in the table follows the definitions above. For a fair comparison, the parameters r_max = 2 and s_min = 0.3 are fixed for both methods, and μ_min = 0.5 and M = 70 are additionally set for MCA-miner. The multi-core implementations of both the new MCA-miner and BRL were executed on six parallel processes and stopped when the Gelman & Rubin parameter satisfied R̂ ≤ 1.05. All the experiments were run on a single AWS EC2 c5.18xlarge instance with 72 cores.
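The stopping rule can be illustrated with the classic Gelman & Rubin potential scale reduction factor. The sketch below is a hedged illustration, not the disclosed code: it assumes each parallel process records one scalar trace of equal length (for example the log-posterior of the current rule list, which is an assumption), and the names gelman_rubin and converged are hypothetical.

```python
import math
import statistics


def gelman_rubin(chains):
    """Potential scale reduction factor for m chains of equal length n."""
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 chains of equal length >= 2")
    chain_means = [statistics.fmean(c) for c in chains]
    # W: mean of the within-chain sample variances.
    w = statistics.fmean([statistics.variance(c) for c in chains])
    # B: n times the sample variance of the chain means.
    b = n * statistics.variance(chain_means)
    # Pooled estimate of the posterior variance.
    var_hat = (n - 1) / n * w + b / n
    return math.sqrt(var_hat / w)


def converged(chains, threshold=1.05):
    """True once the chains agree closely enough to stop sampling."""
    return gelman_rubin(chains) <= threshold
```

With six parallel processes, chains would hold six traces. Chains exploring different regions give a value well above 1.05, while chains that have mixed give a value close to 1.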

TABLE 1

Performance evaluation of FP-Growth against MCA-miner when used with BRL on benchmark datasets. t_train is the full training wall time.

                                        FP-GROWTH + BRL              MCA-MINER + BRL
DATASET  n       p   Σ_{t=1}^{p} |α_t|  ACCURACY  AUC   t_train [s]  ACCURACY  AUC   t_train [s]
Adult    45,222  14  111                0.81      0.85  512          0.81      0.85  115
ASD      248     21  89                 0.87      0.90  198          0.87      0.90  16
Cancer   569     32  150                0.92      0.97  168          0.92      0.94  22
Heart    303     13  49                 0.82      0.86  117          0.82      0.86  15
HIV      5,840   8   160                0.87      0.88  449          0.87      0.88  36
Titanic  2,201   3   8                  0.79      0.76  118          0.79      0.75  10

The experiments in Table 1 show that the new MCA-miner matches the accuracy of FP-Growth on every dataset and its AUC on all but Cancer (0.94 versus 0.97) and Titanic (0.75 versus 0.76), while reducing the time required to mine rules and train a BRL model by a factor of roughly 4.5 (Adult) to 12.5 (HIV).
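The reduction in training time can be checked directly from the t_train columns. The short sketch below transcribes the wall times from Table 1 and computes the ratio of FP-Growth + BRL to MCA-miner + BRL per dataset; the names T_TRAIN and speedups are illustrative only.

```python
# Training wall times in seconds, transcribed from Table 1:
# (FP-Growth + BRL, MCA-miner + BRL)
T_TRAIN = {
    "Adult": (512, 115),
    "ASD": (198, 16),
    "Cancer": (168, 22),
    "Heart": (117, 15),
    "HIV": (449, 36),
    "Titanic": (118, 10),
}


def speedups(times):
    """Ratio of FP-Growth training time to MCA-miner training time per dataset."""
    return {name: fp / mca for name, (fp, mca) in times.items()}


# Print the datasets from smallest to largest speedup.
for name, ratio in sorted(speedups(T_TRAIN).items(), key=lambda kv: kv[1]):
    print(f"{name:8s} {ratio:5.1f}x")
```

The smallest ratio is for Adult (about 4.5x) and the largest for HIV (about 12.5x).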

US11857322B2. Systems and methods for screening, diagnosing, and stratifying patients (2024-01-02). BlackThorn Therapeutics, Inc. [US]. Inventors: Qingzhu Gao [US], Humberto Andres Gonzalez Cabezas [US], Parvez Ahammad [US], Yuelu Liu [US].
