# Watershed-based oversampling for imbalanced dataset classification: WSSMOTE

For several real-world problems, the dataset is composed of two or more imbalanced classes: a minority class and one or more majority classes. With usual machine learning methods, this imbalance often leads to poor results where the majority class is over-fitted while the minority class is misclassified. To alleviate these issues, several pre-processing methods, such as SMOTE and DBSMOTE, create new artificial points for the minority class. Nevertheless, these oversampling methods explicitly or implicitly make hypotheses about the cluster size, shape, or density that may not fit the dataset in practice. We propose to improve these oversampling methods and reduce cluster assumptions by relying on a classifier: the watershed-cut. We call this method WSSMOTE.

The following code generates tables containing the G-mean score for several imbalanced datasets (https://sci2s.ugr.es/keel/datasets.php). These tables correspond to Tables 3, 4 and 5 of the proposed article.

**Note**: To compute results faster, the following code runs only 10 rounds of validation, each on a random 75/25 train/test split. Thus the results are approximations and may differ from those reported in the article.

### Prerequisites to run the notebook:

In [1]:
#pip install higra

In [2]:
#pip install smote-variants

In [3]:
#pip install -U imbalanced-learn

In [4]:
path = './Parameters.xlsx' #Change it with your own path

### Libraries to import

In [5]:
import numpy as np
import pandas as pd
import higra as hg
import scipy as sp
import scipy.stats  # makes sure sp.stats (used for sp.stats.iqr) is loaded
from sklearn import svm
from sklearn import tree
import sys

import math
import random
import wget
import imblearn

from zipfile import ZipFile

from sklearn.neighbors import NearestNeighbors

from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

from smote_variants import DBSMOTE
from imblearn.over_sampling import SMOTE
from gsmote import GeometricSMOTE

from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

from sklearn.metrics import f1_score
from sklearn.metrics import roc_auc_score

In [6]:
from warnings import simplefilter
simplefilter(action='ignore', category=FutureWarning) #Ignore warning comments

### Import imbalanced datasets

#### Import zip file from KEEL datasets

In [7]:
# wget.download('https://sci2s.ugr.es/keel/dataset/data/imbalanced/wisconsin.zip')
# wget.download('https://sci2s.ugr.es/keel/dataset/data/imbalanced/yeast4.zip')
# wget.download('https://sci2s.ugr.es/keel/dataset/data/imbalanced/vehicle1.zip')
# wget.download('https://sci2s.ugr.es/keel/keel-dataset/datasets/imbalanced/imb_noisyBordExamples/03subcl5-600-5-70-BI.zip')
# wget.download('https://sci2s.ugr.es/keel/keel-dataset/datasets/imbalanced/imb_noisyBordExamples/paw02a-800-7-60-BI.zip')
# wget.download('https://sci2s.ugr.es/keel/dataset/data/imbalanced/ecoli1.zip')
# wget.download('https://sci2s.ugr.es/keel/dataset/data/imbalanced/segment0.zip')
# wget.download('https://sci2s.ugr.es/keel/dataset/data/imbalanced/pima.zip')
# wget.download('https://sci2s.ugr.es/keel/dataset/data/imbalanced/cleveland-0_vs_4.zip')
# wget.download('https://sci2s.ugr.es/keel/dataset/data/imbalanced/yeast6.zip')
# wget.download('https://sci2s.ugr.es/keel/dataset/data/imbalanced/page-blocks-1-3_vs_4.zip')
# wget.download('https://sci2s.ugr.es/keel/dataset/data/imbalanced/glass-0-1-5_vs_2.zip')

#### Extract each zip file into a .dat file, then into arrays containing data and labels

In [8]:
def examples_transform(examples_zip, examples_data):
    '''
    Extract KEEL archives and parse each .dat file
    Input:
        - examples_zip: list of zip archive names
        - examples_data: list of .dat file names, in the same order
    Output:
        - examples: list of n feature arrays, with n the number of datasets
        - labels: list of n label arrays (values 1 or 2), with n the number of datasets
    '''
    examples = []  # one feature array per dataset
    labels = []    # one label array per dataset

    for zip_name, dat_name in zip(examples_zip, examples_data):
        with ZipFile(zip_name) as archive:
            archive.extractall()

        # Open the .dat file and keep only the rows after the last '@' header line
        with open(dat_name) as file:
            data = file.read()
        rows = data.split('@')[-1].split('\n')[1:]
        data_f = pd.DataFrame([row.split(',') for row in rows if row.strip()])

        nb = data_f.shape[1] - 1  # last column = labels
        data_f[nb] = data_f[nb].str.strip()
        elment = np.unique(data_f[nb])
        # Map the two class names (in sorted order) to 1 and 2
        Y = data_f[nb].map({elment[0]: 1, elment[1]: 2}).to_numpy(dtype=int)

        X = data_f.drop(columns=nb).astype(float).to_numpy()

        examples.append(X)
        labels.append(Y)
    return examples, labels
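
To make the parsing step above concrete, here is a minimal standalone sketch on an in-memory string. The miniature file content and the helper name `parse_keel` are hypothetical and only mimic the layout assumed by `examples_transform` (header lines starting with `@`, then comma-separated rows after `@data`).

```python
# Hypothetical miniature KEEL-style file: '@' header lines, then rows after '@data'
keel_text = """@relation toy
@attribute a real [0.0, 1.0]
@attribute b real [0.0, 1.0]
@attribute class {negative, positive}
@data
0.1, 0.2, negative
0.4, 0.9, positive
0.3, 0.5, negative
"""

def parse_keel(text):
    """Return (features, labels) from KEEL-style text; labels are 1 or 2."""
    # The text after the last '@' starts with 'data', so skip that first line
    rows = text.split('@')[-1].split('\n')[1:]
    rows = [row.split(',') for row in rows if row.strip()]
    # Class names in sorted order are mapped to 1, 2 (same order as np.unique)
    names = sorted({row[-1].strip() for row in rows})
    mapping = {name: i + 1 for i, name in enumerate(names)}
    features = [[float(v) for v in row[:-1]] for row in rows]
    labels = [mapping[row[-1].strip()] for row in rows]
    return features, labels

features, keel_labels = parse_keel(keel_text)
print(features)     # [[0.1, 0.2], [0.4, 0.9], [0.3, 0.5]]
print(keel_labels)  # [1, 2, 1]
```

Because the class names are sorted before being numbered, `negative` becomes 1 and `positive` becomes 2 in this toy file.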

In [9]:
#Execution of examples function with imbalanced datasets studied in the paper
examples_zip = ['wisconsin.zip', 'yeast4.zip',  'vehicle1.zip', 
            '03subcl5-600-5-70-BI.zip', 'paw02a-800-7-60-BI.zip', 'ecoli1.zip', 'pima.zip', 'yeast6.zip', 
               'page-blocks-1-3_vs_4.zip', 'glass-0-1-5_vs_2.zip']
examples_data = ['wisconsin.dat', 'yeast4.dat',  'vehicle1.dat', 
            '03subcl5-600-5-70-BI.dat', 'paw02a-800-7-60-BI.dat', 'ecoli1.dat', 'pima.dat', 'yeast6.dat', 
               'page-blocks-1-3_vs_4.dat', 'glass-0-1-5_vs_2.dat' ]

examples, labels = examples_transform(examples_zip, examples_data)

## WSSMOTE algorithm

In [10]:
def dijkstra(graph, source):
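    '''
    Shortest paths from a source point
    (implemented by relaxing every edge len(graph)-1 times, in the style of Bellman-Ford)
    Input:
        - graph -- dictionary of dictionaries containing the distances between data points
        - source -- a specific data point
    Output:
        - distances -- dictionary: shortest distance from the source to each point
        - parents -- dictionary: previous point on the shortest path (None for the source)
    '''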
    distances, parents = dict(), dict()

    for node in graph:
        if node not in distances.keys():
            distances[node] = float('Inf')
        if node not in parents.keys():
            parents[node] = None
    distances[source] = 0

    for ind_graph in range(len(graph)-1):
        for pt_i in graph:
            for pt_j in graph[pt_i]:
                assert graph[pt_i][pt_j] >= 0
                if distances[pt_j] > distances[pt_i] + graph[pt_i][pt_j]:
                    distances[pt_j] = distances[pt_i] + graph[pt_i][pt_j]
                    parents[pt_j] = pt_i

    return distances, parents
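
As an illustration of what the function above returns, the following sketch runs the same edge-relaxation idea on a tiny graph and then walks the parent pointers back to the source. The name `shortest_paths` and the toy graph are hypothetical and chosen only for this example.

```python
def shortest_paths(graph, source):
    """Relax every edge len(graph) - 1 times (Bellman-Ford style)."""
    distances = {node: float('inf') for node in graph}
    parents = {node: None for node in graph}
    distances[source] = 0
    for _ in range(len(graph) - 1):
        for u in graph:
            for v, weight in graph[u].items():
                if distances[v] > distances[u] + weight:
                    distances[v] = distances[u] + weight
                    parents[v] = u
    return distances, parents

# Hypothetical toy graph: going 0 -> 1 -> 2 (cost 2) beats the direct edge 0 -> 2 (cost 5)
toy_graph = {
    0: {1: 1.0, 2: 5.0},
    1: {0: 1.0, 2: 1.0},
    2: {0: 5.0, 1: 1.0},
}
distances, parents = shortest_paths(toy_graph, source=0)

# Walk the parent pointers from node 2 back to the source
path = [2]
while parents[path[-1]] is not None:
    path.append(parents[path[-1]])

print(distances)  # {0: 0, 1: 1.0, 2: 2.0}
print(path)       # [2, 1, 0]
```

The `path` loop is the same pattern WSSMOTE uses later to go from a sampled point back to the point nearest the cluster centroid.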

In [11]:
def higra_clustering(X_min, n_nei):
    '''
    Generate clusters using Higra
    Input:
        - X_min: minority dataset 
        - n_nei: Number of neighbors 
    Output:
        - labels: clustering labelling
        - num_labels: numbers of clusters
    '''
    graph, edge_weights = hg.make_graph_from_points(X_min, graph_type='knn+mst', mode='distance', n_neighbors=n_nei)
    labels = hg.labelisation_watershed(graph, edge_weights)
    num_labels = np.max(labels)
    return labels - 1, num_labels

In [12]:
def WSSMOTE(X, labels, n_nei, coeff_min, n_add):
    '''
    WSSMOTE algorithm
    Input:
        - X: data points
        - labels: array of class labels (1 or 2)
        - n_nei: number of neighbors
        - coeff_min: label of the minority class
        - n_add: multiplier applied to the majority/minority size gap to set the number of new points
    Output:
        - np.vstack([X, samples]): oversampled data
        - np.hstack([labels, np.repeat(coeff_min, len(samples))]): oversampled labels
    '''
    #Compute clusters
    X_min = X[labels == coeff_min]
    labels_clust, num_labels = higra_clustering(X_min, n_nei)

    #Construct array of clusters
    clusters = [np.where(labels_clust == ind_label)[0] for ind_label in range(num_labels)]
    cluster_sizes = np.array([np.sum(labels_clust == i) for i in range(num_labels)])
    cluster_dist = cluster_sizes/(np.sum(cluster_sizes))

    #Graphs, shortest path and centroids
    graphs, centroid_indices, sorthest_path = [], [], []
    
    for num_label in range(num_labels):
        cluster = X_min[clusters[num_label]]
        nn = NearestNeighbors(n_neighbors=len(cluster), metric='euclidean', n_jobs=1).fit(cluster)
        centroid_ind = nn.kneighbors([np.mean(cluster, axis=0)])[1][0][0] # closest data of the real centroid
        centroid_indices.append(centroid_ind)
        
        graph = dict() # distance between two points in cluster 
        
        dist, ind = nn.kneighbors(cluster)
#         dist = dist[:, 1:]
#         ind = ind[:, 1:]
        for pt_cluster_i in range(len(cluster)):
            graph[pt_cluster_i] = dict()
            for pt_cluster_j in range(len(cluster)):
                graph[pt_cluster_i][pt_cluster_j] = dist[pt_cluster_i][ind[pt_cluster_i] == pt_cluster_j][0]
                    
        sorthest_path.append(dijkstra(graph, centroid_ind))
        graphs.append(graph)
        
    #New data points creation
    samples = []
    while len(samples) < n_add*abs(len(np.where(labels==coeff_min)[0]) - len(np.where(labels==(1+(coeff_min%2)))[0])):
        cluster_idx = np.random.choice(np.arange(len(clusters)), p=cluster_dist)
        cluster = X_min[clusters[cluster_idx]]
        idx = np.random.choice(range(len(clusters[cluster_idx])))

        distances, parents = sorthest_path[cluster_idx]

        path = [idx]
        while not parents[path[-1]] is None:
            path.append(parents[path[-1]])

        if len(path) == 1:
            X_b = cluster[path[0]]
            samples.append(X_b)
        else:
            X_a = cluster[path[0]]
            X_b = cluster[path[-1]]
            sample = X_a + (X_b-X_a)*np.random.uniform(0,1)
            samples.append(sample)
    return np.vstack([X, samples]), np.hstack([labels, np.repeat(coeff_min, len(samples))])
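
The sampling loop above combines two random choices: a cluster drawn with probability proportional to its size, and a point drawn uniformly on a segment between two minority points. The sketch below shows both steps with the standard library only; the cluster sizes, coordinates, and the helper name `interpolate` are hypothetical.

```python
import random

def interpolate(x_a, x_b, rng):
    """Return a random point on the segment [x_a, x_b]."""
    u = rng.uniform(0, 1)
    return [a + (b - a) * u for a, b in zip(x_a, x_b)]

rng = random.Random(0)  # fixed seed, only to make the sketch reproducible

# Step 1: pick a cluster with probability proportional to its size (hypothetical sizes)
cluster_sizes = [6, 3, 1]
cluster_idx = rng.choices(range(len(cluster_sizes)), weights=cluster_sizes)[0]

# Step 2: interpolate between a sampled point and the end of its path
x_a = [0.0, 0.0]  # hypothetical sampled minority point
x_b = [2.0, 4.0]  # hypothetical point closest to the cluster centroid
new_points = [interpolate(x_a, x_b, rng) for _ in range(5)]

print(cluster_idx)
for point in new_points:
    print(point)  # every point lies on the line y = 2x between x_a and x_b
```

Every generated point stays inside the segment, so the new samples never leave the convex hull of the cluster they were drawn from.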

## Classifier algorithms

In [13]:
def score(res, y_test):
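    '''
    G-mean score
    Input: 
        - res: predicted labels
        - y_test: real labels
    Output:
        - G-mean: geometric mean of the true positive rate and the true negative rate
          (0 if either rate is 0 or if the confusion matrix is not 2x2)
    '''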
    array = confusion_matrix(y_test, res)
    if len(array[0]) == 2:
        tn, fp, fn, tp = array.ravel()
        if (tn*tp) == 0:
            acc = 0
        else:
            acc = math.sqrt((tp*tn)/((tp+fn)*(fp+tn)))
    else: 
        acc =0
    return acc
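
A small worked example may help show why the G-mean is preferred to plain accuracy on imbalanced data. This is a standalone sketch working directly on confusion-matrix counts; the function name `g_mean` and the counts are hypothetical.

```python
import math

def g_mean(tn, fp, fn, tp):
    """Geometric mean of sensitivity and specificity from confusion-matrix counts."""
    if tn * tp == 0:
        return 0.0
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return math.sqrt(sensitivity * specificity)

# Hypothetical test set: 90 majority points (81 correct), 10 minority points (4 correct)
balanced_guess = g_mean(tn=81, fp=9, fn=6, tp=4)
print(balanced_guess)  # approximately 0.6, i.e. sqrt(0.4 * 0.9)

# Always predicting the majority class: 90% accuracy, but a G-mean of 0
majority_only = g_mean(tn=90, fp=0, fn=10, tp=0)
print(majority_only)  # 0.0
```

A classifier that ignores the minority class is therefore scored 0, whatever its accuracy.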

In [14]:
def watershed_cut(X_train, X_test, y_train, y_test):
    '''
    Watershed cut
    Input:
        - X_train
        - X_test
        - y_train
        - y_test
    Output:
        - y_keep = labels predicted
    '''
    X_tot = np.concatenate((X_train, X_test))
    seeds = np.int_(np.concatenate((y_train, np.zeros(len(y_test))))) # concatenate y_train and 0 for the test

    best_acc = 0
    
    for n_nei in np.arange(5,66,10): #to find the best number of neighbors
        graph, edge_weights = hg.make_graph_from_points(X_tot, graph_type='knn', mode='distance', 
                                                        n_neighbors=n_nei, metric='euclidean')
        y_watershed = hg.labelisation_seeded_watershed(graph, edge_weights, seeds)
        y_watershed_test = y_watershed[len(y_train):]

        if len(np.unique(y_watershed_test)) <= 2:
            acc = score(y_watershed_test, y_test)
        else:
            acc = 0
        if acc >= best_acc:
            best_acc = acc 
            y_keep = y_watershed_test
    return y_keep

In [15]:
def SVM_method(X_train, X_test, y_train, y_test):
    '''
    SVM classification algorithm
    Input:
        - X_train
        - X_test
        - y_train
        - y_test
    Output:
        - y_SVM = labels predicted
    '''
    clf = svm.LinearSVC()
    clf = clf.fit(X_train, y_train)
    y_SVM = clf.predict(X_test)
    return y_SVM

In [16]:
def DecisionTree_method(X_train, X_test, y_train, y_test):
    '''
    Decision Tree algorithm
    Input:
        - X_train
        - X_test
        - y_train
        - y_test
    Output:
        - y_tree = labels predicted
    '''
    clf = tree.DecisionTreeClassifier()
    clf =  clf.fit(X_train, y_train)
    y_tree = clf.predict(X_test)
    return y_tree

In [17]:
def KNeiClass_method(X_train, X_test, y_train, y_test):
    '''
    KNN algorithm
    Input:
        - X_train
        - X_test
        - y_train
        - y_test
    Output:
        - y_keep = labels predicted
    '''
    best_acc = 0
    for n_nei in np.arange(2,55,10):  #to find the best number of neighbors
        clf = KNeighborsClassifier(n_neighbors=n_nei)
        clf = clf.fit(X_train, y_train)
        y_nei = clf.predict(X_test)
        acc = score(y_nei, y_test)
        if acc >= best_acc:
            best_acc = acc
            y_keep = y_nei
    return y_keep

## Load the best parameters for each oversampling method, classifier and imbalanced dataset

In [18]:
SMOTE_param = pd.read_excel(path, 'SMOTE') #Load SMOTE parameters
SMOTE_param.index = SMOTE_param['Unnamed: 0']
SMOTE_param = SMOTE_param.drop(columns='Unnamed: 0')
SMOTE_param = SMOTE_param.to_dict()

DBSMOTE_eps = pd.read_excel(path, 'DBSMOTE-eps') #Load DBSMOTE parameters
DBSMOTE_eps.index = DBSMOTE_eps['Unnamed: 0']
DBSMOTE_eps = DBSMOTE_eps.drop(columns='Unnamed: 0')
DBSMOTE_eps = DBSMOTE_eps.to_dict()

DBSMOTE_min = pd.read_excel(path, 'DBSMOTE-min')
DBSMOTE_min.index = DBSMOTE_min['Unnamed: 0']
DBSMOTE_min = DBSMOTE_min.drop(columns='Unnamed: 0')
DBSMOTE_min = DBSMOTE_min.to_dict()

DBSMOTE_prop = pd.read_excel(path, 'DBSMOTE-prop')
DBSMOTE_prop.index = DBSMOTE_prop['Unnamed: 0']
DBSMOTE_prop = DBSMOTE_prop.drop(columns='Unnamed: 0')
DBSMOTE_prop = DBSMOTE_prop.to_dict()

DBSMOTE_param_ = [DBSMOTE_eps, DBSMOTE_min, DBSMOTE_prop]
DBSMOTE_param = {}
for key in DBSMOTE_eps.keys():
    for k in DBSMOTE_eps[key].keys():
        if not key in DBSMOTE_param.keys():
            DBSMOTE_param[key] = dict()
        DBSMOTE_param[key][k] = tuple(d[key][k] for d in DBSMOTE_param_)

WSMOTE_nn = pd.read_excel(path, 'WSSMOTE-knn') #Load WSSMOTE parameters
WSMOTE_nn.index = WSMOTE_nn['Unnamed: 0']
WSMOTE_nn = WSMOTE_nn.drop(columns='Unnamed: 0')
WSMOTE_nn = WSMOTE_nn.to_dict()

WSMOTE_add = pd.read_excel(path, 'WSSMOTE-n_add')
WSMOTE_add.index = WSMOTE_add['Unnamed: 0']
WSMOTE_add = WSMOTE_add.drop(columns='Unnamed: 0')
WSMOTE_add = WSMOTE_add.to_dict()

WSSMOTE_param_ = [WSMOTE_nn, WSMOTE_add]
WSSMOTE_param = {}
for key in WSMOTE_nn.keys():
    for k in WSMOTE_nn[key].keys():
        if not key in WSSMOTE_param.keys():
            WSSMOTE_param[key] = dict()
        WSSMOTE_param[key][k] = tuple(d[key][k] for d in WSSMOTE_param_)
    
    
GSMOTE_nn = pd.read_excel(path, 'GSMOTE-knn') #Load Geometric SMOTE parameters
GSMOTE_nn.index = GSMOTE_nn['Unnamed: 0']
GSMOTE_nn = GSMOTE_nn.drop(columns='Unnamed: 0')
GSMOTE_nn = GSMOTE_nn.to_dict()

GSMOTE_trunc = pd.read_excel(path, 'GSMOTE-trunc') #Geometric SMOTE truncation factor
GSMOTE_trunc.index = GSMOTE_trunc['Unnamed: 0']
GSMOTE_trunc = GSMOTE_trunc.drop(columns='Unnamed: 0')
GSMOTE_trunc = GSMOTE_trunc.to_dict()

GSMOTE_defor = pd.read_excel(path, 'GSMOTE-defor') #Geometric SMOTE deformation factor
GSMOTE_defor.index = GSMOTE_defor['Unnamed: 0']
GSMOTE_defor = GSMOTE_defor.drop(columns='Unnamed: 0')
GSMOTE_defor = GSMOTE_defor.to_dict()

GSMOTE_param_ = [GSMOTE_trunc, GSMOTE_defor, GSMOTE_nn]
GSMOTE_param = {}
for key in GSMOTE_trunc.keys():
    for k in GSMOTE_trunc[key].keys():
        if not key in GSMOTE_param.keys():
            GSMOTE_param[key] = dict()
        GSMOTE_param[key][k] = tuple(d[key][k] for d in GSMOTE_param_)

In [19]:
dicts_paramater = dict() # Dictionary: contains all parameters
dicts_paramater['SMOTE'] = SMOTE_param
dicts_paramater['WSSMOTE'] = WSSMOTE_param
dicts_paramater['DBSMOTE'] = DBSMOTE_param
dicts_paramater['GSSMOTE'] = GSMOTE_param
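
The parameter tables above are merged so that one lookup returns every parameter of an oversampling method as a tuple. The sketch below reproduces that merging pattern on plain dictionaries; the helper name `merge_tables` and the values are hypothetical, and the `{dataset: {classifier: value}}` layout is assumed from the way the dictionaries are indexed later in the notebook.

```python
def merge_tables(tables):
    """Combine {dataset: {classifier: value}} tables into {dataset: {classifier: tuple}}."""
    merged = {}
    for dataset, per_classifier in tables[0].items():
        merged[dataset] = {
            classifier: tuple(table[dataset][classifier] for table in tables)
            for classifier in per_classifier
        }
    return merged

# Hypothetical values standing in for two sheets of the parameter workbook
eps_table = {'pima': {'KNN': 0.5, 'SVM': 1.0}, 'glass': {'KNN': 1.2, 'SVM': 0.8}}
min_table = {'pima': {'KNN': 2, 'SVM': 3}, 'glass': {'KNN': 1, 'SVM': 5}}

params = merge_tables([eps_table, min_table])
eps, min_samples = params['pima']['SVM']
print(eps, min_samples)  # 1.0 3
```

The order of the tables in the list fixes the order of the values in each tuple, which is why the positional indices `[0]`, `[1]`, `[2]` used later must match the order in which the tables were listed.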

## G-mean score

In [20]:
def generate_dico_results(classifiers, classifiers_name, dicts_paramater, oversampling_methods, examples_name, coeffs_min):
    '''
    Generate dictionaries containing the G-mean, F1 and AUC scores for each imbalanced dataset
    (reads the global lists `examples` and `labels` built above)
    Input:
        - classifiers: list of classifier functions used
        - classifiers_name: list of classifier names used
        - dicts_paramater: dictionary containing the best parameters for each oversampling method
        - oversampling_methods: list of oversampling methods used
        - examples_name: list of dataset names
        - coeffs_min: list of the minority class label for each imbalanced dataset
    Output:
        - three (median, IQR) pairs of dictionaries, for G-mean, F1 and AUC
          keys = oversampling method - values: score tables
    '''
    dico_over_med_g_mean, dico_over_std_g_mean = dict(), dict()
    dico_over_med_auc, dico_over_std_auc = dict(), dict()
    dico_over_med_f, dico_over_std_f= dict(), dict()
    
    for over_method in oversampling_methods:
        tab_med_g_mean, tab_std_g_mean = pd.DataFrame(), pd.DataFrame()
        tab_med_auc, tab_std_auc = pd.DataFrame(), pd.DataFrame()
        tab_med_f, tab_std_f = pd.DataFrame(), pd.DataFrame()
        
        for idx, example in enumerate(examples):
            print(examples_name[idx])
            X, y = example, labels[idx]
            data_median_g_mean, data_std_g_mean = [], []
            data_median_auc, data_std_auc = [], []
            data_median_f, data_std_f = [], []
            for ind_class, classifier in enumerate(classifiers):
                acc_g_mean, acc_f, acc_auc = [], [], []
                for num_round in range(10):
                    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)
                    if over_method == 'without':
                        X_train_over, y_train_over = X_train.copy(), y_train.copy()
                    elif over_method == 'SMOTE':
                        mino =len(np.where(y==coeffs_min[idx])[0])
                        majo = len(np.where(y==(1+(coeffs_min[idx])%2))[0])
                        ratio = (1-(mino/majo))*majo + mino 
                        sm = SMOTE(sampling_strategy=ratio/majo,
                                   k_neighbors = 3)
                        X_train_over, y_train_over = sm.fit_resample(X_train, y_train)
                    elif over_method == 'GSSMOTE':
                        sm = GeometricSMOTE(k_neighbors=dicts_paramater[over_method][examples_name[idx]][classifiers_name[ind_class]][2], 
                                            truncation_factor=dicts_paramater[over_method][examples_name[idx]][classifiers_name[ind_class]][0], 
                                            deformation_factor=dicts_paramater[over_method][examples_name[idx]][classifiers_name[ind_class]][1])
                        X_train_over, y_train_over = sm.fit_resample(X_train, y_train)
                    elif over_method == 'DBSMOTE':
                        sm = DBSMOTE(eps = dicts_paramater[over_method][examples_name[idx]][classifiers_name[ind_class]][0], 
                                     min_samples= dicts_paramater[over_method][examples_name[idx]][classifiers_name[ind_class]][1], 
                                     proportion = dicts_paramater[over_method][examples_name[idx]][classifiers_name[ind_class]][2])
                        X_train_over, y_train_over = sm.fit_resample(X_train, y_train)
                    elif over_method == 'WSSMOTE':
                        X_train_over, y_train_over= WSSMOTE(X_train, y_train, n_nei=dicts_paramater[over_method][examples_name[idx]][classifiers_name[ind_class]][0],
                                                        coeff_min=coeffs_min[idx], n_add =dicts_paramater[over_method][examples_name[idx]][classifiers_name[ind_class]][1])
                    res = classifier(X_train_over, X_test, y_train_over, y_test)
                    acc_g_mean.append(score(res, y_test))
                    acc_f.append(f1_score(y_test, res))
                    try:
                        acc_auc.append(roc_auc_score(y_test, res))
                    except ValueError:
                        acc_auc.append(0)
                    
                    
                data_median_g_mean.append(round(np.median(acc_g_mean)*100, 2))
                data_std_g_mean.append(round(sp.stats.iqr(acc_g_mean)*100, 2))
                data_median_f.append(round(np.median(acc_f)*100, 2))
                data_std_f.append(round(sp.stats.iqr(acc_f)*100, 2))
                data_median_auc.append(round(np.median(acc_auc)*100, 2))
                data_std_auc.append(round(sp.stats.iqr(acc_auc)*100, 2))
                
            tab_med_g_mean[examples_name[idx]], tab_std_g_mean[examples_name[idx]]  = data_median_g_mean, data_std_g_mean
            tab_med_f[examples_name[idx]], tab_std_f[examples_name[idx]]  = data_median_f, data_std_f
            tab_med_auc[examples_name[idx]], tab_std_auc[examples_name[idx]]  = data_median_auc, data_std_auc
            
        tab_med_g_mean.index, tab_std_g_mean.index = classifiers_name, classifiers_name
        tab_med_f.index, tab_std_f.index = classifiers_name, classifiers_name
        tab_med_auc.index, tab_std_auc.index = classifiers_name, classifiers_name
        
        dico_over_med_g_mean[over_method], dico_over_std_g_mean[over_method] = tab_med_g_mean, tab_std_g_mean
        dico_over_med_f[over_method], dico_over_std_f[over_method] = tab_med_f, tab_std_f
        dico_over_med_auc[over_method], dico_over_std_auc[over_method] = tab_med_auc, tab_std_auc
        
    return (dico_over_med_g_mean, dico_over_std_g_mean), (dico_over_med_f, dico_over_std_f), (dico_over_med_auc, dico_over_std_auc)
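
Each cell of the result tables is a median and an interquartile range over the rounds, expressed in percent. The following standalone sketch shows that summary step with the standard library; the helper name `summarize` and the scores are hypothetical, and the `'inclusive'` quantile method is used on the assumption that it corresponds to the default linear interpolation of `sp.stats.iqr`.

```python
import statistics

def summarize(scores):
    """Return (median, interquartile range) in percent, rounded to 2 decimals."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method='inclusive')
    median = statistics.median(scores)
    return round(median * 100, 2), round((q3 - q1) * 100, 2)

# Hypothetical G-mean scores from five rounds
scores = [0.70, 0.75, 0.80, 0.85, 0.90]
summary = summarize(scores)
print(summary)  # (80.0, 10.0)
```

The median and IQR are less sensitive than the mean and standard deviation to a single round with a very poor split, which matters when only a few rounds are run.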

In [21]:
classifiers =  [watershed_cut, KNeiClass_method, DecisionTree_method, SVM_method]
classifiers_name = ['WC','KNN', 'DT', 'SVM']
oversampling_methods = ['without', 'SMOTE', 'DBSMOTE', 'GSSMOTE', 'WSSMOTE']
examples_name = ['wisconsin', 'yeast4', 'vehicle1', 'Subcl35', 'Paw', 'ecoli1', 'pima', 'yeast6', 'page_blocks', 'glass']
coeffs_min = [2, 2, 2, 1, 1, 2, 2, 2, 2, 2]

score_g_mean, score_f, score_auc = generate_dico_results(classifiers, classifiers_name, dicts_paramater, oversampling_methods, examples_name, coeffs_min)

2022-01-21 12:24:53,044:INFO:DBSMOTE: Number of clusters is 0, trying to increase eps and decrease min_samples
2022-01-21 12:24:53,044:INFO:DBSMOTE: Running sampling via ('DBSMOTE', "{'proportion': 1, 'eps': 1.2000000000000002, 'min_samples': 4, 'n_jobs': 1, 'random_state': <module 'numpy.random' from 'C:\\\\Users\\\\oucht\\\\anaconda3\\\\lib\\\\site-packages\\\\numpy\\\\random\\\\__init__.py'>}")
2022-01-21 12:24:53,217:INFO:DBSMOTE: Running sampling via ('DBSMOTE', "{'proportion': 1, 'eps': 0.8, 'min_samples': 5, 'n_jobs': 1, 'random_state': None}")
2022-01-21 12:24:53,225:INFO:DBSMOTE: Number of clusters is 0, trying to increase eps and decrease min_samples
2022-01-21 12:24:

page_blocks


2022-01-21 12:24:54,907:INFO:DBSMOTE: Running sampling via ('DBSMOTE', "{'proportion': 2, 'eps': 0.5, 'min_samples': 5, 'n_jobs': 1, 'random_state': None}")
2022-01-21 12:24:55,270:INFO:DBSMOTE: Running sampling via ('DBSMOTE', "{'proportion': 2, 'eps': 0.5, 'min_samples': 5, 'n_jobs': 1, 'random_state': None}")
2022-01-21 12:24:55,701:INFO:DBSMOTE: Running sampling via ('DBSMOTE', "{'proportion': 2, 'eps': 0.5, 'min_samples': 5, 'n_jobs': 1, 'random_state': None}")
2022-01-21 12:24:56,114:INFO:DBSMOTE: Running sampling via ('DBSMOTE', "{'proportion': 2, 'eps': 0.5, 'min_samples': 5, 'n_jobs': 1, 'random_state': None}")
2022-01-21 12:24:56,115:INFO:DBSMOTE: Number of clusters is 0, trying to increase eps and decrease min_samples
2022-01-21 12:24:56,115:INFO:DBSMOTE: Running sampling via ('DBSMOTE', "{'proportion': 2, 'eps': 0.75, 'min_samples': 4, 'n_jobs': 1, 'random_state': <module 'numpy.random' from 'C:\\\\Users\\\\oucht\\\\anaconda3\\\\lib\\\\site-packages\\\\numpy\\\\random\\\\__

2022-01-21 12:25:02,247:INFO:DBSMOTE: Running sampling via ('DBSMOTE', "{'proportion': 2, 'eps': 1.2, 'min_samples': 5, 'n_jobs': 1, 'random_state': None}")
2022-01-21 12:25:02,375:INFO:DBSMOTE: Running sampling via ('DBSMOTE', "{'proportion': 2, 'eps': 0.5, 'min_samples': 1, 'n_jobs': 1, 'random_state': None}")


glass


2022-01-21 12:25:02,589:INFO:DBSMOTE: Running sampling via ('DBSMOTE', "{'proportion': 2, 'eps': 0.5, 'min_samples': 1, 'n_jobs': 1, 'random_state': None}")
2022-01-21 12:25:02,779:INFO:DBSMOTE: Running sampling via ('DBSMOTE', "{'proportion': 2, 'eps': 0.5, 'min_samples': 1, 'n_jobs': 1, 'random_state': None}")
2022-01-21 12:25:02,962:INFO:DBSMOTE: Running sampling via ('DBSMOTE', "{'proportion': 2, 'eps': 0.5, 'min_samples': 1, 'n_jobs': 1, 'random_state': None}")
2022-01-21 12:25:03,118:INFO:DBSMOTE: Running sampling via ('DBSMOTE', "{'proportion': 2, 'eps': 0.5, 'min_samples': 1, 'n_jobs': 1, 'random_state': None}")
2022-01-21 12:25:03,270:INFO:DBSMOTE: Running sampling via ('DBSMOTE', "{'proportion': 2, 'eps': 0.5, 'min_samples': 1, 'n_jobs': 1, 'random_state': None}")
2022-01-21 12:25:03,403:INFO:DBSMOTE: Running sampling via ('DBSMOTE', "{'proportion': 2, 'eps': 0.5, 'min_samples': 1, 'n_jobs': 1, 'random_state': None}")
2022-01-21 12:25:03,577:INFO:DBSMOTE: Running sampling via

wisconsin




yeast4
vehicle1




Subcl35




Paw




ecoli1
pima




yeast6
page_blocks




glass




wisconsin




yeast4
vehicle1




Subcl35




Paw




ecoli1
pima




yeast6
page_blocks




glass




### Results

##### G-Mean
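
The G-mean is the geometric mean of sensitivity (recall on the minority class) and specificity (recall on the majority class). The sketch below is only an illustration of that definition on toy labels. The helper name `g_mean` and the toy vectors are assumptions and are not taken from the notebook's own scoring code, which is not shown in this section.

```python
import math

def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels.

    Hypothetical helper for illustration; not the notebook's scoring code.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)

# Toy labels (assumed): 1 is the minority class, 0 the majority class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
mixed_pred = [1, 1, 1, 0, 0, 0, 0, 1]   # 3/4 of each class recovered
majority_only = [0] * len(y_true)       # never predicts the minority class

gmean_mixed = g_mean(y_true, mixed_pred)
gmean_majority = g_mean(y_true, majority_only)
print(gmean_mixed, gmean_majority)
```

A classifier that never predicts the minority class has zero sensitivity, so its G-mean is 0 regardless of its accuracy. This may explain the 0.00 entries reported for SVM on some datasets in the tables of this section.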

In [22]:
dico_over_med, dico_over_std = score_g_mean[0], score_g_mean[1]

In [23]:
pd.DataFrame.from_dict(dico_over_med['GSSMOTE'].astype(str) + '+-'+ dico_over_std['GSSMOTE'].astype(str)).to_excel('./resultats_G_G_1.xlsx')
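
The export cell above concatenates two tables element by element into `value+-deviation` strings. A minimal sketch of that string concatenation, assuming both tables are pandas DataFrames with identical index and columns; the names `med`, `std` and `combined` are hypothetical, and the numbers are a 2x2 excerpt of the GSSMOTE G-mean output shown further down in this section:

```python
import pandas as pd

# Excerpt of the GSSMOTE G-mean tables (rows WC and KNN, two datasets).
med = pd.DataFrame(
    {'wisconsin': [94.49, 97.27], 'yeast4': [71.07, 88.37]},
    index=['WC', 'KNN'],
)
std = pd.DataFrame(
    {'wisconsin': [1.73, 2.03], 'yeast4': [16.59, 5.14]},
    index=['WC', 'KNN'],
)

# Cast both tables to strings, then join them cell by cell.
combined = med.astype(str) + '+-' + std.astype(str)
print(combined)
```

The concatenation aligns on index and column labels, so both tables need the same classifiers as rows and the same datasets as columns. Writing `combined` to disk with `to_excel` additionally requires an Excel writer engine such as `openpyxl` to be installed.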

In [24]:
print('Tab oversampling method: without' )
print(pd.DataFrame.from_dict(dico_over_med['without']))
print(pd.DataFrame.from_dict(dico_over_std['without']))

Tab oversampling method: without
     wisconsin  yeast4  vehicle1  Subcl35    Paw  ecoli1   pima  yeast6  \
WC       93.90   49.93     61.61    58.28  68.18   83.92  62.20   78.48   
KNN      94.86   30.13     52.54    73.85  76.58   84.28  67.87   62.72   
DT       93.31   58.44     66.22    62.99  62.93   84.73  64.90   71.65   
SVM      95.81    0.00     54.80     0.00   0.00   84.13  31.46    0.00   

     page_blocks  glass  
WC         92.00   52.4  
KNN        78.63    0.0  
DT         99.10   47.7  
SVM        78.15    0.0  
     wisconsin  yeast4  vehicle1  Subcl35    Paw  ecoli1   pima  yeast6  \
WC        3.45   14.97      5.39    13.84   9.70    8.70   3.56   11.63   
KNN       1.86   10.31      1.44     7.82   5.64    6.48   5.11   11.27   
DT        2.01   20.95      5.76     9.48  12.22    3.55   3.93   13.49   
SVM       2.68    0.00     52.54    30.98   0.00    8.22  21.21    0.00   

     page_blocks  glass  
WC          8.28  50.65  
KNN         9.33  37.66  
DT     

In [25]:
print('Tab oversampling method: SMOTE' )
print(pd.DataFrame.from_dict(dico_over_med['SMOTE']))
print(pd.DataFrame.from_dict(dico_over_std['SMOTE']))

Tab oversampling method: SMOTE
     wisconsin  yeast4  vehicle1  Subcl35    Paw  ecoli1   pima  yeast6  \
WC       96.25   66.98     62.84    63.40  72.43   82.75  64.33   67.96   
KNN      96.70   83.92     72.37    76.61  83.45   91.15  72.29   89.63   
DT       93.58   63.73     63.11    71.39  77.12   85.93  66.47   65.91   
SVM      96.96   78.02     53.10    43.43  30.88   88.56  28.54   86.14   

     page_blocks  glass  
WC         99.11  52.24  
KNN        99.32  72.61  
DT        100.00  61.56  
SVM        95.14   8.01  
     wisconsin  yeast4  vehicle1  Subcl35    Paw  ecoli1   pima  yeast6  \
WC        3.12   17.96      7.62    10.48   2.15    9.68   4.17   17.93   
KNN       0.91    2.49      3.55     2.57   3.42    4.64   3.56    5.43   
DT        2.37   15.33      6.49     3.43   8.79    4.48   4.79   14.11   
SVM       1.11    3.59     27.42    13.08  24.19    5.29  21.18    7.26   

     page_blocks  glass  
WC          7.15  16.17  
KNN         3.09   9.63  
DT       

In [26]:
print('Tab oversampling method: DBSMOTE' )
print(pd.DataFrame.from_dict(dico_over_med['DBSMOTE']))
print(pd.DataFrame.from_dict(dico_over_std['DBSMOTE']))

Tab oversampling method: DBSMOTE
     wisconsin  yeast4  vehicle1  Subcl35    Paw  ecoli1   pima  yeast6  \
WC       95.52   58.57     61.92    58.91  73.27   80.60  62.18   75.05   
KNN      97.00   85.71     70.97    79.46  84.34   87.69  68.53   90.00   
DT       93.25   59.47     63.19    71.04  74.24   87.72  63.98   67.62   
SVM      96.85   80.40     73.94    38.84  35.60   87.67  40.62   87.35   

     page_blocks  glass  
WC         91.73  59.28  
KNN        68.09  68.60  
DT         99.77  49.90  
SVM        95.19  23.26  
     wisconsin  yeast4  vehicle1  Subcl35    Paw  ecoli1   pima  yeast6  \
WC        0.70   22.02      4.16     6.75  12.94    9.75   3.71   10.83   
KNN       1.81    3.78      2.52     6.70   2.27    3.01   4.35    3.42   
DT        2.63    9.65      5.46     7.20  12.23   12.24   1.80   13.07   
SVM       1.31    7.82      6.65    22.01  39.53    3.96  42.64    8.24   

     page_blocks  glass  
WC          8.97  64.80  
KNN        25.91   9.30  
DT     

In [27]:
print('Tab oversampling method: GSSMOTE' )
print(pd.DataFrame.from_dict(dico_over_med['GSSMOTE']))
print(pd.DataFrame.from_dict(dico_over_std['GSSMOTE']))

Tab oversampling method: GSSMOTE
     wisconsin  yeast4  vehicle1  Subcl35    Paw  ecoli1   pima  yeast6  \
WC       94.49   71.07     62.01    69.14  70.70   85.43  64.37   81.91   
KNN      97.27   88.37     71.66    75.59  84.23   87.07  71.72   86.05   
DT       93.97   56.79     67.26    69.69  75.82   85.06  66.26   73.11   
SVM      97.88   83.43     52.18    41.82  32.49   88.53  48.59   87.95   

     page_blocks  glass  
WC         93.00  50.69  
KNN        96.26  80.34  
DT         91.95  54.68  
SVM        94.82  22.65  
     wisconsin  yeast4  vehicle1  Subcl35    Paw  ecoli1   pima  yeast6  \
WC        1.73   16.59      5.05     6.98   8.76    5.35   5.54    8.43   
KNN       2.03    5.14      3.96     5.62   3.53    4.67   3.68    1.48   
DT        3.31    5.32      4.32     8.03   5.64    5.01   3.18   11.05   
SVM       1.08    6.49     38.55    25.95  43.87    4.57  41.61    8.63   

     page_blocks  glass  
WC          8.12  28.75  
KNN         2.07  12.69  
DT     

In [28]:
print('Tab oversampling method: WSSMOTE' )
print(pd.DataFrame.from_dict(dico_over_med['WSSMOTE']))
print(pd.DataFrame.from_dict(dico_over_std['WSSMOTE']))

Tab oversampling method: WSSMOTE
     wisconsin  yeast4  vehicle1  Subcl35    Paw  ecoli1   pima  yeast6  \
WC       94.81   62.36     58.84    65.61  75.47   82.45  64.81   71.16   
KNN      98.03   83.13     69.66    76.12  84.13   91.98  72.41   87.49   
DT       93.99   61.35     68.75    69.72  74.37   84.89  65.52   67.82   
SVM      97.39   76.68     71.54    45.56  14.99   89.58  53.93   90.07   

     page_blocks  glass  
WC         92.85  55.44  
KNN        95.13  76.82  
DT        100.00  40.13  
SVM        95.92  25.71  
     wisconsin  yeast4  vehicle1  Subcl35    Paw  ecoli1   pima  yeast6  \
WC        2.38   16.48      3.85     3.72  10.64    6.68   2.81   17.70   
KNN       1.63    3.20      3.82     2.11   2.17    4.61   2.64   10.53   
DT        1.75   16.35      5.31     4.45   8.46    3.82   5.21   15.05   
SVM       1.74   11.89      6.84    19.74  34.66    3.33  31.21    5.94   

     page_blocks  glass  
WC          9.71  22.87  
KNN         3.42  12.31  
DT     

##### F1-Score 
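
The F1-score is the harmonic mean of precision and recall for one class, so its value depends on which class is treated as positive. This section does not show which class or averaging scheme the notebook uses, so the sketch below is only an illustration of the definition. The helper `f1_for_class` and the toy labels are assumptions.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1-score of a single class: 2*TP / (2*TP + FP + FN).

    Hypothetical helper for illustration; not the notebook's scoring code.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Toy labels (assumed): 4 minority samples (1) and 6 majority samples (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

f1_minority = f1_for_class(y_true, y_pred, positive=1)
f1_majority = f1_for_class(y_true, y_pred, positive=0)
print(f1_minority, f1_majority)
```

On the same predictions the majority-class F1 is noticeably higher than the minority-class F1, which is why F1 values on imbalanced data should be read together with the G-mean.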

In [29]:
dico_over_med, dico_over_std = score_f[0], score_f[1]

In [30]:
pd.DataFrame.from_dict(dico_over_med['WSSMOTE'].astype(str) + '+-'+ dico_over_std['WSSMOTE'].astype(str)).to_excel('./resultats_W_F_1.xlsx')

In [31]:
print('Tab oversampling method: without' )
print(pd.DataFrame.from_dict(dico_over_med['without']))
print(pd.DataFrame.from_dict(dico_over_std['without']))

Tab oversampling method: without
     wisconsin  yeast4  vehicle1  Subcl35    Paw  ecoli1   pima  yeast6  \
WC       95.93   97.64     80.07    41.43  52.16   92.94  75.34   98.75   
KNN      96.64   98.35     84.82    50.19  58.20   94.89  82.59   98.97   
DT       95.43   97.23     82.22    45.72  45.45   92.14  78.11   98.69   
SVM      97.66   98.22     85.93     0.00   0.00   93.77  74.79   98.63   

     page_blocks  glass  
WC         99.09  93.75  
KNN        98.21  94.41  
DT         99.10  92.50  
SVM        96.60  93.81  
     wisconsin  yeast4  vehicle1  Subcl35    Paw  ecoli1   pima  yeast6  \
WC        2.61    0.71      2.04    14.71  12.23    4.08   3.47    0.98   
KNN       1.65    0.24      1.94     6.89   9.38    1.75   3.19    0.38   
DT        1.11    0.97      2.29    11.60  17.10    3.00   3.39    0.58   
SVM       1.51    0.45      5.83    19.74   0.00    2.79  45.30    0.62   

     page_blocks  glass  
WC          0.35   3.11  
KNN         1.33   4.20  
DT     

In [32]:
print('Tab oversampling method: SMOTE' )
print(pd.DataFrame.from_dict(dico_over_med['SMOTE']))
print(pd.DataFrame.from_dict(dico_over_std['SMOTE']))

Tab oversampling method: SMOTE
     wisconsin  yeast4  vehicle1  Subcl35    Paw  ecoli1   pima  yeast6  \
WC       97.16   96.75     80.82    43.43  56.00   91.60  72.03   97.91   
KNN      97.52   92.82     73.50    48.24  48.00   92.85  76.53   94.93   
DT       95.64   96.82     80.94    50.77  55.91   93.38  74.80   97.77   
SVM      97.76   91.87     84.79    26.24  25.27   92.67  59.72   94.25   

     page_blocks  glass  
WC         99.33  86.49  
KNN        99.32  81.47  
DT        100.00  92.11  
SVM        97.26  51.65  
     wisconsin  yeast4  vehicle1  Subcl35   Paw  ecoli1   pima  yeast6  \
WC        1.77    0.91      2.95     9.64  3.05    2.44   4.44    0.73   
KNN       0.84    1.42      4.84     5.31  8.03    1.98   2.68    1.17   
DT        0.82    1.58      3.38     3.13  5.40    1.74   3.05    0.39   
SVM       1.10    1.82      9.68     7.38  4.10    4.01  58.25    1.07   

     page_blocks  glass  
WC          0.79   4.20  
KNN         3.16   4.84  
DT          0.

In [33]:
print('Tab oversampling method: DBSMOTE' )
print(pd.DataFrame.from_dict(dico_over_med['DBSMOTE']))
print(pd.DataFrame.from_dict(dico_over_std['DBSMOTE']))

Tab oversampling method: DBSMOTE
     wisconsin  yeast4  vehicle1  Subcl35    Paw  ecoli1   pima  yeast6  \
WC       97.13   97.70     80.32    42.57  54.00   91.64  74.66   98.83   
KNN      98.28   90.62     78.68    54.03  50.77   90.91  74.58   97.04   
DT       95.83   96.84     82.09    48.21  48.37   92.86  75.35   98.76   
SVM      97.71   95.50     73.89    29.35  22.59   89.96  75.96   96.60   

     page_blocks  glass  
WC         99.10  94.74  
KNN        98.43  82.86  
DT         99.77  93.51  
SVM        97.66  12.76  
     wisconsin  yeast4  vehicle1  Subcl35    Paw  ecoli1   pima  yeast6  \
WC        1.00    0.62      1.97    11.04  17.42    2.48   2.63    0.35   
KNN       0.94    4.83      1.12     9.67   4.19    1.30   6.00    1.52   
DT        1.23    0.50      3.24     6.27  13.02    2.12   2.54    0.48   
SVM       1.00    0.87     15.92     7.90  25.05    4.28  13.58    0.56   

     page_blocks  glass  
WC          0.36   4.22  
KNN         0.77   3.00  
DT     

In [34]:
print('Tab oversampling method: GSSMOTE' )
print(pd.DataFrame.from_dict(dico_over_med['GSSMOTE']))
print(pd.DataFrame.from_dict(dico_over_std['GSSMOTE']))

Tab oversampling method: GSSMOTE
     wisconsin  yeast4  vehicle1  Subcl35    Paw  ecoli1   pima  yeast6  \
WC       96.51   96.39     79.87    41.90  49.96   93.13  74.39   98.40   
KNN      97.99   92.23     78.75    50.06  50.00   91.52  77.91   95.70   
DT       95.75   97.20     82.34    45.72  53.18   92.08  75.72   98.48   
SVM      98.13   92.72     83.90    25.71  22.47   91.36  71.09   93.57   

     page_blocks  glass  
WC         99.54  90.91  
KNN        96.19  84.06  
DT         98.60  91.70  
SVM        94.90  16.62  
     wisconsin  yeast4  vehicle1  Subcl35    Paw  ecoli1   pima  yeast6  \
WC        0.82    0.83      2.12     9.50   7.83    3.04   4.65    0.86   
KNN       0.85    2.03      2.96     4.36   3.66    2.58   2.01    1.18   
DT        1.45    0.82      1.81     6.28   9.59    2.28   4.39    0.60   
SVM       1.20    1.32      1.86    27.15  25.30    2.00  57.53    1.07   

     page_blocks  glass  
WC          1.24   3.14  
KNN         2.15   4.92  
DT     

In [35]:
print('Tab oversampling method: WSSMOTE' )
print(pd.DataFrame.from_dict(dico_over_med['WSSMOTE']))
print(pd.DataFrame.from_dict(dico_over_std['WSSMOTE']))

Tab oversampling method: WSSMOTE
     wisconsin  yeast4  vehicle1  Subcl35    Paw  ecoli1   pima  yeast6  \
WC       96.35   97.61     76.61    46.02  58.00   91.20  72.12   98.12   
KNN      98.18   88.27     74.01    49.73  49.44   93.33  76.27   96.43   
DT       96.12   97.02     84.58    49.25  56.21   92.11  73.85   98.41   
SVM      97.93   91.69     69.31    26.11   9.69   90.08  74.12   94.08   

     page_blocks  glass  
WC         99.33  90.67  
KNN        95.01  80.60  
DT        100.00  90.68  
SVM        96.40  20.20  
     wisconsin  yeast4  vehicle1  Subcl35    Paw  ecoli1   pima  yeast6  \
WC        1.68    0.67      2.27     6.07  15.03    5.27   3.01    0.50   
KNN       1.10    3.04      3.09     2.16   8.69    2.41   2.65    1.01   
DT        1.76    0.94      2.90     6.85  10.55    2.51   3.41    0.39   
SVM       1.57    1.62      9.85    25.56  20.75    2.70  10.05    1.18   

     page_blocks  glass  
WC          0.76   4.31  
KNN         3.56   3.13  
DT     

##### AUC-score
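
When the AUC is computed from hard class predictions rather than continuous scores, it reduces to the average of sensitivity and specificity (the balanced accuracy). The sketch below checks this on toy labels with a plain rank-based AUC. Whether the notebook feeds labels or scores to its AUC computation is not shown in this section, so this is an assumption, and the helper names are hypothetical.

```python
def rank_auc(y_true, scores, positive=1):
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half a correct pair. Hypothetical helper."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def balanced_accuracy(y_true, y_pred, positive=1):
    """Arithmetic mean of sensitivity and specificity. Hypothetical helper."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    n_pos = sum(1 for t in y_true if t == positive)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)

# Toy labels (assumed): 4 minority samples (1) and 6 majority samples (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
majority_only = [0] * len(y_true)

auc_hard = rank_auc(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
auc_majority = rank_auc(y_true, majority_only)
print(auc_hard, bal_acc, auc_majority)
```

Under this assumption, a classifier that predicts only the majority class scores exactly 0.5. That would be consistent with the 50.00 AUC entries in this section appearing where the G-mean tables report 0.00.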

In [36]:
dico_over_med, dico_over_std = score_auc[0], score_auc[1]

In [37]:
pd.DataFrame.from_dict(dico_over_med['GSSMOTE'].astype(str) + '+-'+ dico_over_std['GSSMOTE'].astype(str)).to_excel('./resultats_G_A_1.xlsx')

In [38]:
print('Tab oversampling method: without' )
print(pd.DataFrame.from_dict(dico_over_med['without']))
print(pd.DataFrame.from_dict(dico_over_std['without']))

Tab oversampling method: without
     wisconsin  yeast4  vehicle1  Subcl35    Paw  ecoli1   pima  yeast6  \
WC       93.94   62.00     63.81    64.58  71.49   84.38  64.47   80.36   
KNN      94.90   54.48     61.16    74.28  77.64   85.31  70.39   69.52   
DT       93.35   66.25     67.68    67.56  68.04   84.88  66.39   75.56   
SVM      95.84   50.00     63.81    50.00  50.00   84.90  53.91   50.00   

     page_blocks  glass  
WC         92.26  61.99  
KNN        80.55  50.00  
DT         99.10  58.01  
SVM        79.66  50.00  
     wisconsin  yeast4  vehicle1  Subcl35   Paw  ecoli1  pima  yeast6  \
WC        3.35    7.41      4.22     9.41  6.96    8.58  3.33    9.54   
KNN       1.82    3.28      1.61     7.09  4.64    5.74  3.56    6.74   
DT        1.84   11.52      5.09     6.73  8.90    3.41  3.51    9.46   
SVM       2.63    0.00     21.35     1.79  0.00    6.57  3.64    0.00   

     page_blocks  glass  
WC          7.42  16.68  
KNN         7.71   6.80  
DT          0.04 

In [39]:
print('Tab oversampling method: SMOTE' )
print(pd.DataFrame.from_dict(dico_over_med['SMOTE']))
print(pd.DataFrame.from_dict(dico_over_std['SMOTE']))

Tab oversampling method: SMOTE
     wisconsin  yeast4  vehicle1  Subcl35    Paw  ecoli1   pima  yeast6  \
WC       96.25   71.32     64.18    66.95  74.62   83.01  64.39   72.57   
KNN      96.72   83.95     72.87    76.66  84.12   91.25  72.30   89.75   
DT       93.61   69.04     65.71    72.23  77.61   86.17  66.76   70.92   
SVM      96.97   78.39     62.66    51.15  54.77   88.62  53.87   86.16   

     page_blocks  glass  
WC         99.11  57.61  
KNN        99.32  73.48  
DT        100.00  67.37  
SVM        95.17  50.64  
     wisconsin  yeast4  vehicle1  Subcl35   Paw  ecoli1  pima  yeast6  \
WC        3.09   12.57      6.95     7.93  2.24    8.51  4.54   13.07   
KNN       0.91    2.34      3.29     2.56  3.25    4.62  3.42    5.47   
DT        2.35   10.66      5.58     2.16  7.26    4.38  4.75    8.89   
SVM       1.12    3.12     14.01     2.74  7.69    5.24  5.70    6.94   

     page_blocks  glass  
WC          6.92  12.35  
KNN         3.02  10.01  
DT          0.33  1

In [40]:
print('Tab oversampling method: DBSMOTE' )
print(pd.DataFrame.from_dict(dico_over_med['DBSMOTE']))
print(pd.DataFrame.from_dict(dico_over_std['DBSMOTE']))

Tab oversampling method: DBSMOTE
     wisconsin  yeast4  vehicle1  Subcl35    Paw  ecoli1   pima  yeast6  \
WC       95.58   66.47     63.75    65.06  75.09   81.25  63.72   77.65   
KNN      97.02   85.83     70.97    79.61  84.75   87.74  68.72   90.07   
DT       93.31   66.66     65.71    72.24  75.84   87.82  65.11   72.52   
SVM      96.85   81.26     75.85    54.17  55.13   87.75  55.05   87.57   

     page_blocks  glass  
WC         92.04  64.95  
KNN        73.21  69.58  
DT         99.77  60.77  
SVM        95.23  52.70  
     wisconsin  yeast4  vehicle1  Subcl35    Paw  ecoli1   pima  yeast6  \
WC        0.65   13.61      3.01     5.21  11.00    7.57   3.01    8.27   
KNN       1.79    4.13      2.44     6.94   2.75    3.10   4.27    3.30   
DT        2.44    6.01      4.58     5.18  10.04   11.19   1.65    8.58   
SVM       1.31    6.31      4.16     6.06   7.08    3.95  13.02    7.76   

     page_blocks  glass  
WC          8.38  27.84  
KNN        16.98   8.88  
DT     

In [41]:
print('Tab oversampling method: GSSMOTE' )
print(pd.DataFrame.from_dict(dico_over_med['GSSMOTE']))
print(pd.DataFrame.from_dict(dico_over_std['GSSMOTE']))

Tab oversampling method: GSSMOTE
     wisconsin  yeast4  vehicle1  Subcl35    Paw  ecoli1   pima  yeast6  \
WC       94.53   74.21     63.76    70.07  72.73   85.66  64.97   83.27   
KNN      97.27   88.45     71.88    76.53  85.08   87.21  71.80   86.16   
DT       94.02   65.05     68.57    70.25  76.86   85.43  66.88   76.32   
SVM      97.88   83.43     63.17    50.00  55.16   88.53  59.83   87.97   

     page_blocks  glass  
WC         93.19  58.89  
KNN        96.33  80.93  
DT         92.17  61.51  
SVM        94.95  52.56  
     wisconsin  yeast4  vehicle1  Subcl35   Paw  ecoli1   pima  yeast6  \
WC        1.69   11.94      3.90     6.25  7.03    5.16   4.90    7.35   
KNN       2.02    5.53      3.97     6.12  4.08    4.62   3.68    1.43   
DT        3.16    3.35      3.38     6.83  3.82    4.90   2.45    8.70   
SVM       1.07    6.04     14.25    10.85  7.08    4.58  14.06    8.72   

     page_blocks  glass  
WC          7.81  17.41  
KNN         1.99  13.44  
DT          

In [42]:
print('Tab oversampling method: WSSMOTE' )
print(pd.DataFrame.from_dict(dico_over_med['WSSMOTE']))
print(pd.DataFrame.from_dict(dico_over_std['WSSMOTE']))

Tab oversampling method: WSSMOTE
     wisconsin  yeast4  vehicle1  Subcl35    Paw  ecoli1   pima  yeast6  \
WC       94.83   68.61     60.33    68.75  76.91   82.66  65.18   74.86   
KNN      98.04   83.19     70.65    77.17  85.02   92.02  72.45   87.75   
DT       94.08   68.22     70.33    71.19  76.01   85.08  66.02   72.36   
SVM      97.41   77.30     75.20    50.48  52.25   89.89  59.06   90.08   

     page_blocks  glass  
WC         93.08  61.49  
KNN        95.25  78.27  
DT        100.00  53.16  
SVM        96.01  53.34  
     wisconsin  yeast4  vehicle1  Subcl35   Paw  ecoli1   pima  yeast6  \
WC        2.37    9.82      3.84     2.01  8.49    6.22   2.78   13.52   
KNN       1.63    3.45      4.18     2.52  2.56    4.61   2.81    9.89   
DT        1.76   10.05      3.94     4.38  6.66    3.85   5.27   11.31   
SVM       1.75   11.57      5.97     7.38  6.02    3.69  12.07    6.11   

     page_blocks  glass  
WC          9.24   9.15  
KNN         3.28  13.94  
DT          