ECG Feature Importance Rankings: Cardiologists vs. Algorithms

This repository accompanies the papers Multi-Class ECG Feature Importance Rankings: Cardiologists vs. Algorithms and ECG Feature Importance Rankings: Cardiologists vs. Algorithms.

Installation

Create and activate an Anaconda environment:

conda env create -f environment.yml
conda activate fi

Compute Feature Importance Rankings

TL;DR

To reproduce the results from ECG Feature Importance Rankings: Cardiologists vs. Algorithms, run:

for i in {1..5}
do
    python code/run_binary_feature_importance_experiments.py --output_dir "output${i}"
done

Then, navigate through the ./scoring_algorithm.ipynb notebook to generate all the tables presented in the study.

There are two different implementations for feature importance rankings:

  1. Binary Classification: One pathology out of [AVBlock, LBBB, RBBB] vs NORM.
  2. One vs All: Train a one-vs-all classification task for one pathology out of [AVBlock, LBBB, NORM, RBBB] vs the others.

The binary classification is implemented in code/run_binary_feature_importance_experiments.py, and the one-vs-all classification is implemented in code/run_multiclass_feature_importance_experiments.py.
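The one-vs-all experiments can be launched analogously; for example (assuming the multiclass script accepts the same --output_dir argument as its binary counterpart):

python code/run_multiclass_feature_importance_experiments.py --output_dir ./output_multiclass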

We offer three model-dependent explanation methods ([lime, shap, permutation]) for four models ([random forest, logistic regression, xgb, neural network]).
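As an illustration of what these model-dependent methods compute, here is a minimal sketch of permutation importance using scikit-learn on placeholder data; this is not the repository's implementation, and the feature matrix X and labels y are stand-ins for the extracted ECG features.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the extracted ECG features and labels
X = np.random.rand(200, 10)
y = np.random.randint(0, 2, 200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Importance = drop in test score when a feature's values are shuffled
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]
print("feature ranking (most to least important):", ranking)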

Additionally, we offer five filter methods, i.e. methods that do not rely on a machine learning model: Maximum Relevance - Minimum Redundancy (MRMR), Neighbourhood Components Analysis (NCA), ReliefF, Modified ROCAUC, and the χ² test.
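For intuition, here is a minimal sketch of one such filter method, a χ² ranking, using scikit-learn on placeholder non-negative features (again, not the repository's implementation):

import numpy as np
from sklearn.feature_selection import chi2

# χ² requires non-negative features; X and y are placeholders
X = np.abs(np.random.rand(200, 10))
y = np.random.randint(0, 2, 200)

scores, p_values = chi2(X, y)
ranking = np.argsort(scores)[::-1]  # higher score = more relevant feature
print("filter ranking:", ranking)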

Some machine learning models implicitly learn feature importance values. We offer the feature importance values of three models: [random forest, logistic regression, Gaussian Process Classifier], and call them the implicit feature importance methods.
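For intuition, a minimal sketch of such implicit importances with scikit-learn, e.g. the impurity-based importances of a random forest and the absolute coefficients of a logistic regression (placeholder data, not the repository's code):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Placeholder feature matrix and binary labels
X = np.random.rand(200, 10)
y = np.random.randint(0, 2, 200)

rf = RandomForestClassifier(random_state=0).fit(X, y)
lr = LogisticRegression(max_iter=1000).fit(X, y)

rf_ranking = np.argsort(rf.feature_importances_)[::-1]
lr_ranking = np.argsort(np.abs(lr.coef_).ravel())[::-1]
print("random forest ranking:", rf_ranking)
print("logistic regression ranking:", lr_ranking)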

Hence, in total, we offer 3*4+5+3=20 feature importance methods. You can run all experiments for all pathologies by running:

python code/run_binary_feature_importance_experiments.py --output_dir ./path/to/output/directory

If you do not specify an output directory, the results are saved to ./output. If you are interested in a single experiment, you can, for example, compute the rankings for the binary LBBB vs. NORM task with shap as the explanation method and logistic regression and random forest as models by running:

python code/run_binary_feature_importance_experiments.py --patho lbbb --method shap --model lr --model rf

You can get all rankings for a pathology (e.g., LBBB) by simply running:

python code/run_binary_feature_importance_experiments.py --patho lbbb

The rankings are saved as CSV files in the chosen output directory (./output by default). Here is an overview of the possible command-line arguments:

| Argument | Explanation |
| --- | --- |
| --patho | The pathology (AVBlock, LBBB, RBBB) for which the feature importance rankings are computed. |
| --model | The machine learning model (not relevant when a filter method is used). |
| --method | The feature importance method. |

For running the code, use the following abbreviations for the models and methods:

| Model | Abbreviation |
| --- | --- |
| Random Forest | rf |
| Logistic Regression | lr |
| XGB | xgb |
| Neural Network | dn |
| Gaussian Process Classifier | gp |

| Method | Abbreviation |
| --- | --- |
| SHAP | shap |
| LIME | lime |
| Permutation | permutation |
| Implicit | implicit |

| Filter Method | Abbreviation |
| --- | --- |
| MRMR | mrmr |
| NCA | nca |
| ReliefF | relieff |
| Modified ROCAUC | modifiedrocauc |
| χ² Test | chisquared |
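Putting the abbreviations together, and assuming the flags combine as in the examples above, the permutation rankings for RBBB with the XGB and neural network models could be computed with:

python code/run_binary_feature_importance_experiments.py --patho rbbb --method permutation --model xgb --model dn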
