This dataset is split into an 80% development set, used to train predictive models, and a 20% internal holdout test set, used for model evaluation. Datasets are constructed such that no data from a single patient appears in different data splits; i.e., all data splits are done on a per-patient basis. We further split the development set on a per-patient level using an 80–20 split into training and “dev” sets. The training set is used to train the model and the dev set is used to determine when training is completed.
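The per-patient splitting described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `split_by_patient`, the toy patient identifiers, and the random seeds are hypothetical, and it assumes the 80/20 fractions are applied to the number of patients rather than the number of records, which the text does not specify.

```python
import random

def split_by_patient(record_patient_ids, holdout_fraction=0.2, seed=0):
    """Split record indices so each patient's records fall in exactly one split.

    Assumption: the fraction applies to patients, not records.
    Returns (kept_indices, holdout_indices).
    """
    patients = sorted(set(record_patient_ids))
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_holdout = round(len(patients) * holdout_fraction)
    holdout_patients = set(patients[:n_holdout])

    kept, holdout = [], []
    for idx, pid in enumerate(record_patient_ids):
        (holdout if pid in holdout_patients else kept).append(idx)
    return kept, holdout

# Toy data: 10 hypothetical patients with 3 records each.
patient_ids = [f"p{i % 10}" for i in range(30)]

# First split: development set vs. internal holdout test set.
dev_idx, test_idx = split_by_patient(patient_ids, 0.2, seed=0)

# Second split, applied to the development set only: training vs. "dev".
dev_patients = [patient_ids[i] for i in dev_idx]
train_rel, val_rel = split_by_patient(dev_patients, 0.2, seed=1)
train_idx = [dev_idx[i] for i in train_rel]
val_idx = [dev_idx[i] for i in val_rel]
```

Because patients, not records, are shuffled and assigned, the sets of patients in the three splits are disjoint by construction, at the cost of record-level proportions that only approximate 80/20 when patients contribute different numbers of records.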
Our second dataset consists of 2725 records from 1249 unique patients who underwent cardiac catheterization at the Brigham and Women’s Hospital (Hospital 2). As with the data from MGH, all of these patients had a diagnosis of heart failure (according to ICD-9/10 codes in their medical record) within the year before their catheterization date. We used this entire dataset as an external validation set for model evaluation.
Each record in the datasets consists of the mean pulmonary capillary wedge pressure (as measured by cardiac catheterization); a 10-s, 12-lead ECG recorded by the same system (GE Healthcare MUSE) on the same day as the catheterization procedure; and basic demographic information (age and sex). Dataset details are summarized in Table
Model performance (AUROC) on test data. HFNet significantly outperforms the baseline logistic regression (LR) model.
Model | AUROC, internal test set | AUROC, external holdout set |
---|---|---|
LR | 0.71 ± 0.01 | 0.67 ± 0.01 |
HFNet | | |
Significant values are in bold.
Key: *: p value < 1e−10.
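For readers unfamiliar with the metric, the following sketch shows one standard way to compute an AUROC from binary labels and model scores, using its interpretation as the probability that a randomly chosen positive case is scored above a randomly chosen negative case. The function name `auroc` and the toy labels and scores are hypothetical. The sketch does not reproduce the paper's evaluation code, nor how the reported uncertainties or p values were obtained.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Assumes labels are 0 (negative) or 1 (positive).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 3 of the 4 positive/negative pairs are ordered correctly.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
result = auroc(labels, scores)
print(result)  # 0.75
```

This pairwise form is quadratic in the number of records, which is acceptable for illustration; rank-based implementations give the same value more efficiently on large test sets.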