OVBLR-SFE: An Optimal Variational Bayesian Logistic Regression (OVBLR) model with a Salient Feature Estimation (SFE) strategy.
OVBLR-SFE is an interpretable machine learning method that leverages feature importance to enhance interpretability. It combines variational inference with a Bayesian framework to approximate the posterior distribution, and uses the estimated posterior parameters as the regression coefficients' weights. In addition, we define salient features via a 95% confidence interval (95% CI) to support the selection of important features in high-dimensional datasets.
The OVBLR-SFE code is implemented on top of PRML and scikit-learn.
- Feature Importance: OVBLR-SFE focuses on identifying and quantifying the importance of features within a dataset.
- Variational Inference: The method utilizes variational inference techniques to approximate the posterior probability distribution.
- Bayesian Framework: OVBLR-SFE adopts a Bayesian framework, allowing for a principled approach to modeling and inference.
- Weighted Regression Coefficients: The estimated parameters of the posterior probability distribution are employed as weights for the regression coefficients.
- Significance-based Feature Selection: The concept of significant features is defined based on a 95% confidence interval (95%CI) to aid in selecting important features in high-dimensional datasets.
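The significance-based selection rule can be sketched as follows. This is a minimal illustration, not the package's implementation: it assumes the 95% CI of each coefficient is formed as posterior mean ± 1.96 × posterior standard deviation, and that a feature counts as salient when that interval excludes zero. The helper name `salient_features` is hypothetical.

```python
import numpy as np

def salient_features(names, means, stds, z=1.96):
    """Flag coefficients whose 95% CI (mean +/- z * std) excludes zero.

    Hypothetical helper illustrating the salient-feature idea; the
    actual criterion used by OVBLR-SFE lives inside the package.
    """
    means, stds = np.asarray(means, float), np.asarray(stds, float)
    lower, upper = means - z * stds, means + z * stds
    # The interval excludes zero iff it lies entirely above or below 0.
    salient = (lower > 0) | (upper < 0)
    return [(n, round(float(lo), 4), round(float(hi), 4), bool(s))
            for n, lo, hi, s in zip(names, lower, upper, salient)]

print(salient_features(['Bare_Nuclei'], [1.9137], [0.2416]))
# -> [('Bare_Nuclei', 1.4402, 2.3872, True)]
```

Note that the interval (1.4402, 2.3872) matches the sample output shown later in this README, consistent with the mean ± 1.96 × std assumption.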
pip = "*"
ipykernel = {version = "*", index = "https://pypi.douban.com/simple"}
ffmpeg = "*"
matplotlib = "*"
scikit-learn = "*"
pandas = "*"
numpy = "*"
imblearn = "*"
seaborn = "*"
openpyxl = "*"
polling = "*"
socks = "*"
lime = "*"
shap = "*"
eli5 = "*"
ipython = "*"
jupyter = "*"
mglearn = "*"
self-paced-ensemble = "*"
tabulate = "*"
pymoo = "*"
python setup.py install
- Import the OVBLR-SFE module:
  from prml.linear import VariationalLogisticRegression
- Prepare your dataset and ensure it is in the appropriate format.
- Train the OVBLR-SFE model:
  vlr = VariationalLogisticRegression(a0=1, b0=1)
  vlr.fit(X_train, y_train, feature_names)
- Obtain feature importance scores:
  importance_scores = vlr.feature_importance()
  Example output, as tuples of (feature_name, weight, lower, upper, is_salient_feature):
  ('Bare_Nuclei', '1.9137 ± 0.2416', Decimal('1.4402'), Decimal('2.3872'), True),
  ('Clump_Thickness', '1.4302 ± 0.2088', Decimal('1.0210'), Decimal('1.8394'), True),
  ...
- Predict with the trained model:
  y_pred_prob = vlr.proba(X_test)
  y_pred = vlr.predict(X_test)
  score = vlr.score(X_test, y_test)
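The proba → predict → score pattern above follows standard logistic-regression conventions. A minimal NumPy sketch of that pattern, assuming a 0.5 decision threshold and mean accuracy as the score (a sketch, not the package's implementation):

```python
import numpy as np

def proba(X, w):
    """Class-1 probability via the logistic sigmoid of X @ w."""
    return 1.0 / (1.0 + np.exp(-X @ w))

def predict(X, w, threshold=0.5):
    """Hard 0/1 labels from probabilities (0.5 threshold assumed)."""
    return (proba(X, w) >= threshold).astype(int)

def score(X, y, w):
    """Mean accuracy, mirroring scikit-learn's classifier score()."""
    return float(np.mean(predict(X, w) == y))

# Toy check on a 1-D problem separable at x = 0
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0, 0, 1, 1])
w = np.array([3.0])
print(score(X, y, w))  # prints 1.0
```

In OVBLR-SFE the weight vector comes from the approximate posterior rather than a point estimate, but the prediction interface behaves the same way.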
You can run single_test.py for a complete example. For more results, see the images and image_acme folders.