Skip to content

tabular_augmentation contains some classical and noval methods used for data augmentation, it makes tabular data augmentation easier.

License

Notifications You must be signed in to change notification settings

zhxfei/tabular_augmentation

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Description

tabular_augmentation contains some classical and novel methods used for data augmentation, making tabular data augmentation easier, especially for few-shot learning case.

Usage

SMOTE-based methods

from tabular_augmentation import smote_augmentation
method = 'SVMSMOTE'
x_synthesis, y_synthesis = smote_augmentation(x_few_train, y_few_train, method, seed=seed,
                                              oversample_num=100, positive_ratio=None,
                                              knn_neighbors=3)
tabular_model_test(x_synthesis, y_synthesis, x_test, y_test, model_name='xgb')

Mixup-base methods

from tabular_augmentation import mixup_augmentation_with_weight
method = 'vanilla'
x_synthesis, y_synthesis, sample_weight = mixup_augmentation_with_weight(
            x_few_train, y_few_train, oversample_num=200, alpha=1, beta=1, mixup_type=method, seed=seed, rebalanced_ita=1)
tabular_model_test(x_synthesis, y_synthesis, x_test, y_test, model_name='xgb', sample_weight=sample_weight)

CTGAN/TVAE-based methods

Methods(CTGAN/TVAE/DeltaTVAE/DiffTVAE) use sdv_synthesis function to generate synthetic data, and ConditionalTVAE use sdv_synthesis_cvae function

from tabular_augmentation import sdv_synthesis, sdv_synthesis_cvae
method = 'CTGAN'

x_synthesis, y_synthesis = sdv_synthesis(
            x_few_train, y_few_train, method, oversample_num=5000,
            seed=seed, init_synthesizer=True, positive_ratio=0.5,
        )
tabular_model_test(x_synthesis, y_synthesis, x_test, y_test, model_name='xgb')

TabDDPM-based methods

from tabular_augmentation import ddpm_synthesis

method = "DDPM"

x_synthesis, y_synthesis = ddpm_synthesis(
            x_few_train, y_few_train, method, oversample_num=5000, seed=seed, init_synthesizer=True, positive_ratio=None, train_steps=10000)
tabular_model_test(x_synthesis, y_synthesis, x_test, y_test, model_name='xgb')

Example

For details, please refer to example.ipynb

Cite

SMOTE

imbalanced-learn

MIXUP

[ICLR' 18]mixup: BEYOND EMPIRICAL RISK MINIMIZATION Mixup

[ICLR' 22]Noisy Feature Mixup NoisyMixup

[ECCV' 20]Remix: Rebalanced Mixup

CTGAN/TVAE

[NIPS' 19]Modeling Tabular data using Conditional GAN CTGAN

TabDDPM

[ICML' 23] TabDDPM: Modelling Tabular Data with Diffusion Models TabDDPM

About

tabular_augmentation contains some classical and noval methods used for data augmentation, it makes tabular data augmentation easier.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published