
Official implementation of BaggFold

Bagging Folds using Synthetic Majority Oversampling (BaggFold) is a novel approach to binary imbalanced data classification. It is a meta-framework that combines data partitioning, threshold optimization, oversampling of the majority class, and classifier ensembling.

In the first stage, BaggFold divides the data set into perfectly balanced folds, each containing an equal number of minority and majority instances. The majority-class instances are drawn without replacement.
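
As an illustration of this first stage, here is a minimal sketch of the partitioning, assuming NumPy and reading "balanced folds" as: every fold contains the full minority set plus a disjoint, similarly sized chunk of majority instances. The function name make_balanced_folds is an assumption for illustration, not the repository's API.

    import numpy as np

    def make_balanced_folds(X, y, minority_label=1, rng=None):
        """Pair the full minority set with disjoint chunks of majority
        instances (drawn without replacement) so every fold is balanced."""
        rng = np.random.default_rng(rng)
        min_idx = np.flatnonzero(y == minority_label)
        maj_idx = rng.permutation(np.flatnonzero(y != minority_label))
        n_folds = int(np.ceil(len(maj_idx) / len(min_idx)))
        folds = []
        for chunk in np.array_split(maj_idx, n_folds):
            idx = np.concatenate([min_idx, chunk])
            folds.append((X[idx], y[idx]))
        return folds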

In the second step, each fold is assigned to its own base classifier, and each classifier then fine-tunes its decision threshold using Youden's J statistic.
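
For reference, Youden's J statistic is J = sensitivity + specificity - 1 = TPR - FPR. Below is a hedged sketch of choosing the threshold that maximizes J, assuming scikit-learn is available; the helper name youden_threshold is illustrative, not the repository's API.

    import numpy as np
    from sklearn.metrics import roc_curve

    def youden_threshold(y_true, y_score):
        """Return the decision threshold that maximizes Youden's J = TPR - FPR."""
        fpr, tpr, thresholds = roc_curve(y_true, y_score)
        return thresholds[np.argmax(tpr - fpr)]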

The final step trains the ensemble in a modified bagging fashion, in which BaggFold runs training and inference concurrently across folds. This parallelism improves performance and shortens inference time, contributing to the overall efficiency of the framework.

Figure: overview of the BaggFold framework.
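
The snippet below sketches this ensemble step on top of the fold-partitioning sketch above, using joblib to fit one base classifier per fold in parallel and a simple majority vote for aggregation. The class TinyBaggFold and its details are illustrative assumptions; they omit the per-classifier threshold tuning and are not the code shipped in this repository.

    import numpy as np
    from joblib import Parallel, delayed
    from sklearn.base import clone
    from sklearn.tree import DecisionTreeClassifier

    def _fit_fold(estimator, X, y):
        # Fit an independent clone of the base classifier on one balanced fold.
        return clone(estimator).fit(X, y)

    class TinyBaggFold:
        """Illustrative ensemble: one base classifier per balanced fold."""

        def __init__(self, base_estimator=None, n_jobs=-1):
            self.base_estimator = base_estimator or DecisionTreeClassifier()
            self.n_jobs = n_jobs

        def fit(self, folds):
            # Train the per-fold classifiers concurrently.
            self.estimators_ = Parallel(n_jobs=self.n_jobs)(
                delayed(_fit_fold)(self.base_estimator, Xf, yf) for Xf, yf in folds
            )
            return self

        def predict(self, X):
            # Majority vote over the per-fold classifiers (assumes 0/1 labels).
            votes = np.stack([est.predict(X) for est in self.estimators_])
            return (votes.mean(axis=0) >= 0.5).astype(int)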

This repository also implements two SMOTE techniques:

  • Center Point SMOTE (CP-SMOTE),
  • Inner and Outer SMOTE (IO-SMOTE).

Both are presented in the paper "Two novel SMOTE methods for solving imbalanced classification problems" by Yuan Bao and Sibo Yang.
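
For background, classic SMOTE creates a synthetic minority sample by interpolating between an existing minority instance and one of its k nearest minority neighbours; CP-SMOTE and IO-SMOTE differ in how the reference points are chosen. A minimal sketch of the plain interpolation step, assuming NumPy and scikit-learn, is shown below; it is not the repository's implementation of either variant.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def smote_samples(X_min, n_new, k=5, rng=None):
        """Synthesize n_new minority samples by interpolating between
        minority points and their k nearest minority neighbours."""
        rng = np.random.default_rng(rng)
        nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)  # needs len(X_min) > k
        _, neigh = nn.kneighbors(X_min)                      # column 0 is the point itself
        base = rng.integers(len(X_min), size=n_new)
        pick = neigh[base, rng.integers(1, k + 1, size=n_new)]
        gap = rng.random((n_new, 1))
        return X_min[base] + gap * (X_min[pick] - X_min[base])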

To set up the project

  1. Create a virtual environment using your favourite tool.
  2. Activate the virtual environment.
  3. Within the virtual environment, run

python install.py

main.py contains demo code to get started with BaggFold.
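
If you just want a feel for the workflow before opening main.py, the outline below shows a typical scikit-learn-style usage pattern. The import path and the BaggFold constructor are hypothetical placeholders; defer to main.py for the actual API.

    # Hypothetical outline -- main.py contains the real, working demo.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

    # from baggfold import BaggFold    # hypothetical import path
    # clf = BaggFold().fit(X_train, y_train)
    # y_pred = clf.predict(X_test)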
