# Watershed-based oversampling for imbalanced dataset classification: WSSMOTE

In many real-world problems, the dataset is composed of two or more imbalanced classes: a minority class and one or more majority classes. With standard machine learning methods, this imbalance often leads to poor results, where the majority class is over-fitted while the minority class is misclassified. To alleviate these issues, several pre-processing methods, such as SMOTE and DBSMOTE, create new artificial points for the minority class. Nevertheless, these oversampling methods explicitly or implicitly make hypotheses about cluster size, shape, or density that may not fit the dataset in practice. We propose to improve these oversampling methods and reduce cluster assumptions by relying on a classifier: the watershed cut. We call this method WSSMOTE.
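
To make "creating new artificial points for the minority class" concrete, here is a minimal sketch of SMOTE-style interpolation: each synthetic point is drawn on the segment between a minority point and one of its k nearest minority neighbours. The helper `smote_like_samples`, its parameters and the toy data are assumptions made for illustration; this is not the imbalanced-learn implementation, nor the WSSMOTE method defined later in this notebook.

```python
import numpy as np

def smote_like_samples(X_min, n_new, k=3, seed=0):
    """Draw n_new synthetic points by interpolating between minority neighbours.

    Hypothetical helper, for illustration only.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise Euclidean distances between minority points
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(dist, axis=1)[:, :k]

    samples = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))        # random minority point
        j = neighbours[i, rng.integers(k)]  # one of its k nearest neighbours
        lam = rng.uniform(0.0, 1.0)         # position on the segment [x_i, x_j]
        samples.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(samples)

# Toy minority class: the four corners of the unit square
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new_points = smote_like_samples(X_min, n_new=5)
print(new_points.shape)
```

Because every new point lies on a segment between two existing minority points, the result depends on which neighbours are considered valid, which is where the cluster assumptions mentioned above come in.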

The following code generates four tabs containing the G-mean score for several imbalanced datasets (https://sci2s.ugr.es/keel/datasets.php). These 12 tables correspond to Tab. 3, 4 and 5 in the proposed article.

Careful: this notebook uses a 10-round validation (10 random 80/20 train/test splits per configuration), which makes the values quicker to generate. Results may therefore differ slightly from those reported in the article.

### Prerequisites to run the notebook:

In [1]:
#pip install higra

In [2]:
#pip install smote-variants

In [3]:
#pip install -U imbalanced-learn

In [4]:
path = 'C:/Users/oucht/Documents/These/DGMM paper/Code/Parameters.xlsx' #Change it with your own path

### Libraries to import

In [5]:
import numpy as np
import pandas as pd
import higra as hg
import scipy as sp
import scipy.stats  # makes sure sp.stats.iqr is available
from sklearn import svm
from sklearn import tree
import sys

import math
import random
import wget
import imblearn

from zipfile import ZipFile

from sklearn.neighbors import NearestNeighbors



from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

from smote_variants import DBSMOTE
from imblearn.over_sampling import SMOTE


from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

from sklearn.metrics import f1_score
from sklearn.metrics import roc_auc_score

from gsmote import GeometricSMOTE

In [6]:
from warnings import simplefilter
simplefilter(action='ignore', category=FutureWarning) #Ignore warning comments

### Import imbalanced datasets

#### Import zip file from KEEL datasets

In [7]:
# wget.download('https://sci2s.ugr.es/keel/dataset/data/imbalanced/wisconsin.zip')
# wget.download('https://sci2s.ugr.es/keel/dataset/data/imbalanced/yeast4.zip')
# wget.download('https://sci2s.ugr.es/keel/dataset/data/imbalanced/vehicle1.zip')
# wget.download('https://sci2s.ugr.es/keel/keel-dataset/datasets/imbalanced/imb_noisyBordExamples/03subcl5-600-5-70-BI.zip')
# wget.download('https://sci2s.ugr.es/keel/keel-dataset/datasets/imbalanced/imb_noisyBordExamples/paw02a-800-7-60-BI.zip')
# wget.download('https://sci2s.ugr.es/keel/dataset/data/imbalanced/new-thyroid.zip')
# wget.download('https://sci2s.ugr.es/keel/dataset/data/imbalanced/pima.zip')
# wget.download('https://sci2s.ugr.es/keel/dataset/data/imbalanced/ecoli1.zip')

#### Transform zip files into .dat files, then into arrays containing data and labels

In [8]:
def clean_data(data):
    '''
    Function: Clean data: Suppress '@' and '\n' and split it.
    Input: 
        -data 
    Output:
        -data
    '''
    data_split = data.split('@')[-1]
    data_split = data_split.split('\n')[1:]
    data_split = np.array(data_split)
    
    data_end = []
    for index in range(len(data_split)):
      data_end.append(data_split[index].split(','))
    data_end = data_end[:-1]
    data_end = pd.DataFrame(data_end)
    return data_end 

def examples(examples_zip, examples_data):
  """
  Function: Generate data (X) and labels (Y) for each imbalanced dataset from zip files.
  Input:
    - examples_zip: list of .zip file names
    - examples_data: list of .dat file names included in the zip files
  Output: Arrays containing data and labels
  """

  examples = []
  labels = [[0] for i in range(len(examples_zip))]

  for ind_ex in range(len(examples_zip)):
    #Extract zip files
    zip = ZipFile(examples_zip[ind_ex])
    zip.extractall() 
    
    #Open .dat files and find data and labels information
    file = open(examples_data[ind_ex])
    data = file.read()
    
    data_end = clean_data(data)
    
    nb = data_end.shape[1] - 1

    data_end[nb] = [data_end[nb][i].strip() for i in range(len(data_end[nb]))]
    elment = np.unique(data_end[nb])
    
    if len(elment) == 2:
        data_end[nb].replace({elment[0]: '0', elment[1]: '1'}, inplace=True)
    else: # thyroid case (three classes)
        data_end[nb].replace({elment[0]: '0', elment[1]: '0', elment[2]: '1'}, inplace=True)
    
    Y = data_end[nb]
    Y = Y.astype(int)
    Y += 1
    
    X = data_end.drop(columns=nb)
    X = X.astype(float)
    X = np.array(X)
    
    examples.append(X)
    labels[ind_ex] = Y
    file.close()
  return examples, labels
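
The two functions above rely on the layout of KEEL `.dat` files: header lines starting with `@`, followed by comma-separated rows whose last column is the class. The sketch below parses a small in-memory string with that layout. The helper `parse_keel_text` and the sample content are made up for illustration and are not taken from a real KEEL dataset.

```python
def parse_keel_text(text):
    """Split KEEL-style text into feature rows and raw class labels.

    Hypothetical helper: header lines start with '@', data rows are
    comma-separated with the class label in the last column.
    """
    features, raw_labels = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('@'):
            continue  # skip blank lines and header declarations
        *values, label = [field.strip() for field in line.split(',')]
        features.append([float(v) for v in values])
        raw_labels.append(label)
    return features, raw_labels

# Made-up sample in the KEEL layout
sample = """@relation toy
@attribute a real [0.0, 10.0]
@attribute b real [0.0, 10.0]
@attribute class {negative, positive}
@data
1.0, 2.0, negative
3.5, 4.0, positive
5.0, 6.5, negative
"""

X_rows, y_raw = parse_keel_text(sample)
# Map the sorted class names to 1 and 2, mirroring the notebook's `Y += 1`
classes = sorted(set(y_raw))
y = [classes.index(label) + 1 for label in y_raw]
print(X_rows)
print(y)
```

Sorting the class names before mapping them mirrors the use of `np.unique` in `examples`, so which class ends up as 1 or 2 depends on alphabetical order, not on class frequency.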

In [9]:
#Execution of examples function with imbalanced datasets studied in the paper
examples_zip = ['wisconsin.zip', 'yeast4.zip',  'vehicle1.zip', 
            '03subcl5-600-5-70-BI.zip', 'paw02a-800-7-60-BI.zip', 
            'new-thyroid.zip', 'ecoli1.zip']
examples_data = ['wisconsin.dat', 'yeast4.dat',  'vehicle1.dat', 
            '03subcl5-600-5-70-BI.dat', 'paw02a-800-7-60-BI.dat', 
            'new-thyroid.dat', 'ecoli1.dat']

examples, labels = examples(examples_zip, examples_data)

## WSSMOTE algorithm

In [10]:
def dijkstra(graph, source):
  """ 
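  Compute single-source shortest paths (the loop below relaxes every edge
  len(graph)-1 times, which requires non-negative weights):
  Input: 
    - graph: dictionary of dictionaries, graph[u][v] = weight of edge (u, v)
    - source: source vertex
  Output:
    - distances: dictionary mapping each vertex to its distance from source
    - parents: dictionary mapping each vertex to its predecessor on the shortest path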
  """

  distances, parents = dict(), dict()
    
  for node in graph:
      distances[node] = float('Inf')
      parents[node] = None
  distances[source] = 0

  for ind_graph in range(len(graph)-1):
      for pt_i in graph:
          for pt_j in graph[pt_i]:
            assert graph[pt_i][pt_j] >= 0
            if distances[pt_j] > distances[pt_i] + graph[pt_i][pt_j]:
              distances[pt_j] = distances[pt_i] + graph[pt_i][pt_j]
              parents[pt_j] = pt_i
                
  return distances, parents
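
The function above relaxes every edge `len(graph) - 1` times. For comparison, here is a sketch of a heap-based Dijkstra variant over the same dict-of-dicts graph format. `dijkstra_heap` and the toy graph are hypothetical and not part of the notebook; the variant is shown only to illustrate the expected `distances` and `parents` outputs on a small example.

```python
import heapq

def dijkstra_heap(graph, source):
    """Shortest paths from source in a dict-of-dicts graph with weights >= 0."""
    distances = {node: float('inf') for node in graph}
    parents = {node: None for node in graph}
    distances[source] = 0
    heap = [(0, source)]
    while heap:
        dist_u, u = heapq.heappop(heap)
        if dist_u > distances[u]:
            continue  # stale heap entry, a shorter path was already found
        for v, weight in graph[u].items():
            candidate = dist_u + weight
            if candidate < distances[v]:
                distances[v] = candidate
                parents[v] = u
                heapq.heappush(heap, (candidate, v))
    return distances, parents

# Toy graph: going 0 -> 1 -> 2 (cost 3) is cheaper than 0 -> 2 directly (cost 4)
toy_graph = {
    0: {1: 1.0, 2: 4.0},
    1: {0: 1.0, 2: 2.0},
    2: {0: 4.0, 1: 2.0},
}
distances, parents = dijkstra_heap(toy_graph, source=0)
print(distances)  # {0: 0, 1: 1.0, 2: 3.0}
print(parents)    # {0: None, 1: 0, 2: 1}
```

Following `parents` from any vertex back to `None` recovers the path to the source, which is how WSSMOTE later walks from a sampled point to the cluster centroid.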

In [11]:
def higra_clustering(X_min, n_nei):
  """
  Function: Use Higra to generate clusters
  Input:
      - X_min: minority class data points
      - n_nei: number of neighbors used to build the k-nearest-neighbor graph
  Output: 
      - labels: cluster label of each point (starting at 0)
      - num_labels: number of clusters
  """
  graph, edge_weights = hg.make_graph_from_points(X_min, graph_type='knn', mode='distance', n_neighbors=n_nei)
  labels = hg.labelisation_watershed(graph, edge_weights)
  num_labels = np.max(labels)
  return labels - 1, num_labels

def WSSMOTE(X, labels, n_nei, coeff_min, n_add):
  """
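  Function: WSSMOTE algorithm
  Input: 
    - X: data points 
    - labels: data point labels
    - n_nei: parameter k (number of neighbors of the k-nearest-neighbor graph)
    - coeff_min: label of the minority class (1 or 2)
    - n_add: multiplier applied to the class-size difference to set the number of new points
  Output:
    - X augmented with the synthetic minority points, and the matching labels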
  """

  #Compute clusters
  X_min = X[labels == coeff_min]
  labels_clust, num_labels = higra_clustering(X_min, n_nei)

  #Construct array of clusters
  clusters = [np.where(labels_clust == ind_label)[0] for ind_label in range(num_labels)]
  cluster_sizes = np.array([np.sum(labels_clust == i) for i in range(num_labels)])
  cluster_dist = cluster_sizes/(np.sum(cluster_sizes))

  #Graphs, shortest path and centroids
  graphs =[]
  centroid_indices = []
  sorthest_path = []
  for num_label in range(num_labels):
    cluster = X_min[clusters[num_label]]
    graph = dict()
    nn = NearestNeighbors(n_neighbors=len(cluster), metric='euclidean', n_jobs=1).fit(cluster)
    dist, ind = nn.kneighbors(cluster)
    for pt_cluster_i in range(len(cluster)):
      graph[pt_cluster_i] = dict()
      for pt_cluster_j in range(len(cluster)):
        graph[pt_cluster_i][pt_cluster_j] = dist[pt_cluster_i][ind[pt_cluster_i] == pt_cluster_j][0]
    graphs.append(graph)
    centroid_ind = nn.kneighbors(np.mean(cluster, axis=0).reshape(1, -1))[1][0][0]
    centroid_indices.append(centroid_ind)
    sorthest_path.append(dijkstra(graph, centroid_ind))
  
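  #New data points creation
  #Note: labels take values in {1, 2}, so (1+coeff_min)%2 equals 1 when coeff_min == 2
  #but 0 when coeff_min == 1; in that case the second count below is 0.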
  samples = []
  while len(samples) < n_add*abs(len(np.where(labels==coeff_min)[0]) - len(np.where(labels==((1+coeff_min)%2))[0])):
    cluster_idx = np.random.choice(np.arange(len(clusters)), p=cluster_dist)
    cluster = X_min[clusters[cluster_idx]]
    idx = np.random.choice(range(len(clusters[cluster_idx])))
    
    distances, parents = sorthest_path[cluster_idx]

    path = [idx]
    while not parents[path[-1]] is None:
      path.append(parents[path[-1]])
    
    if len(path) == 1:
      X_b = cluster[path[0]]
      samples.append(X_b)
#     elif len(path) == 2:
#       X_a = cluster[path[0]]
#       X_b = cluster[path[1]]
#       sample = X_a + (X_b-X_a)*np.random.uniform(0,1)
#       samples.append(sample)
    else:
      random_vertex = np.random.randint(len(path)-1)
      X_a = cluster[path[random_vertex]]
      X_b = cluster[path[random_vertex+1]]
      sample = X_a + (X_b-X_a)*np.random.uniform(0,1)
      samples.append(sample)
  return np.vstack([X, samples]), np.hstack([labels, np.repeat(coeff_min, len(samples))])
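
The sample-generation loop above walks from a randomly chosen cluster point back to the centroid through `parents`, then interpolates on one randomly chosen edge of that path. The sketch below isolates this step. The helper `sample_on_path` and the toy data are assumptions made for illustration, not code from the notebook.

```python
import numpy as np

def sample_on_path(points, parents, start, rng):
    """Interpolate a new point on a random edge of the path start -> root."""
    path = [start]
    while parents[path[-1]] is not None:
        path.append(parents[path[-1]])
    if len(path) == 1:
        return points[path[0]].copy(), path  # start is already the root
    edge = rng.integers(len(path) - 1)       # pick one edge of the path
    a, b = points[path[edge]], points[path[edge + 1]]
    return a + (b - a) * rng.uniform(0.0, 1.0), path

# Toy cluster: three collinear points, index 0 playing the role of the centroid
points = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
parents = {0: None, 1: 0, 2: 1}
rng = np.random.default_rng(0)
sample, path = sample_on_path(points, parents, start=2, rng=rng)
print(path)    # [2, 1, 0]
print(sample)
```

In this toy case every generated point stays on the segment joining the three points, because interpolation only ever happens along edges of the path.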

## Classifier algorithms

In [12]:
def score(res, y_test):
    '''
    Function: Compute G-mean score
    Input: 
        - res: predicted labels
        - y_test: real labels
    Output:
        - G-mean score (0 if the confusion matrix is not 2x2)
    '''
    array = confusion_matrix(y_test, res)
    if array.size != 4:
        # Fewer or more than two classes present: tn, fp, fn, tp are undefined
        return 0
    tn, fp, fn, tp = array.ravel()
    if (tn*tp) == 0:
      acc = 0
    else:
      acc = math.sqrt((tp*tn)/((tp+fn)*(fp+tn)))
    return acc
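
The G-mean is the geometric mean of sensitivity and specificity, that is sqrt(TP/(TP+FN) * TN/(TN+FP)). As a worked example, the sketch below computes it directly from confusion-matrix counts. The helper `g_mean_from_counts` and the counts are made up for illustration.

```python
import math

def g_mean_from_counts(tn, fp, fn, tp):
    """Geometric mean of sensitivity and specificity from raw counts."""
    positives, negatives = tp + fn, tn + fp
    if positives == 0 or negatives == 0:
        return 0.0  # one class is absent, so the score is not defined
    sensitivity = tp / positives
    specificity = tn / negatives
    return math.sqrt(sensitivity * specificity)

# Made-up counts: sensitivity = 15/20 = 0.75, specificity = 90/100 = 0.9
balanced = g_mean_from_counts(tn=90, fp=10, fn=5, tp=15)
print(round(balanced, 4))

# A classifier that always predicts the majority class
majority_only = g_mean_from_counts(tn=100, fp=0, fn=20, tp=0)
print(majority_only)  # 0.0
```

The second case shows why this metric suits imbalanced data: a classifier that never predicts the minority class is right on 100 of 120 points, yet its G-mean is 0.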

In [13]:
def watershed_cut(X_train, X_test, y_train, y_test):
  """
  Function: Watershed cut
  Parameters: 
    - X_train: points used for the training
    - X_test: points used for the testing
    - y_train: labels used for the training
    -y_test: labels used for the testing
  Return: 
    - y_keep: labels predicted
  """
  X_tot = np.concatenate((X_train, X_test))
  seeds = np.int_(np.concatenate((y_train, np.zeros(len(y_test)))))

  best_acc = 0
    
  for n_nei in np.arange(5,66,10):
    graph, edge_weights = hg.make_graph_from_points(X_tot, graph_type='knn', mode='distance',
                                                    n_neighbors=n_nei, metric='euclidean')
    y_watershed = hg.labelisation_seeded_watershed(graph, edge_weights, seeds)
    y_watershed_test = y_watershed[len(y_train):]
    
    if len(np.unique(y_watershed_test)) <= 2:
      acc = score(y_watershed_test, y_test)
    else:
        acc = 0
    if acc >= best_acc:
        best_acc = acc 
        y_keep = y_watershed_test
  return y_keep

In [14]:
def SVM_method(X_train, X_test, y_train, y_test):
  """
  Function: SVM
  Parameters: 
    - X_train: points used for the training 
    - X_test: points used for the testing 
    - y_train: labels used for the training
    -y_test: labels used for the testing
  Return: 
    - y_SVM: labels predicted 
  """
  clf = svm.LinearSVC()
  clf = clf.fit(X_train, y_train)
  y_SVM = clf.predict(X_test)
  return y_SVM

In [15]:
def DecisionTree_method(X_train, X_test, y_train, y_test):
  """
  Function: Decision Tree
  Parameters: 
    - X_train: points used for the training 
    - X_test: points used for the testing 
    - y_train: labels used for the training
    -y_test: labels used for the testing
  Return: 
    - y_tree: labels predicted
  """
  clf = tree.DecisionTreeClassifier()
  clf =  clf.fit(X_train, y_train)
  y_tree = clf.predict(X_test)
  return y_tree

In [16]:
def KNeiClass_method(X_train, X_test, y_train, y_test):
  """
  Function: KNN
  Parameters: 
    - X_train: points used for the training
    - X_test: points used for the testing
    - y_train: labels used for the training
    -y_test: labels used for the testing
  Return: 
    - y_keep: labels predicted
  """
  best_acc = 0
  for n_nei in np.arange(2,55,10):
    clf = KNeighborsClassifier(n_neighbors=n_nei)
    clf = clf.fit(X_train, y_train)
    y_nei = clf.predict(X_test)
    array = confusion_matrix(y_test, y_nei)
    acc = score(y_nei, y_test)
    if acc >= best_acc:
        best_acc = acc
        y_keep = y_nei
  return y_keep

## Compile the best parameters for each oversampling method, classifier and imbalanced dataset

In [17]:
SMOTE_param = pd.read_excel(path, 'SMOTE') #Compute SMOTE parameter
SMOTE_param.index = SMOTE_param['Unnamed: 0']
SMOTE_param = SMOTE_param.drop(columns='Unnamed: 0')
SMOTE_param = SMOTE_param.to_dict()

DBSMOTE_eps = pd.read_excel(path, 'DBSMOTE-eps') #Compute DBSMOTE parameter
DBSMOTE_eps.index = DBSMOTE_eps['Unnamed: 0']
DBSMOTE_eps = DBSMOTE_eps.drop(columns='Unnamed: 0')
DBSMOTE_eps = DBSMOTE_eps.to_dict()

DBSMOTE_min = pd.read_excel(path, 'DBSMOTE-min')
DBSMOTE_min.index = DBSMOTE_min['Unnamed: 0']
DBSMOTE_min = DBSMOTE_min.drop(columns='Unnamed: 0')
DBSMOTE_min = DBSMOTE_min.to_dict()

DBSMOTE_param_ = [DBSMOTE_eps, DBSMOTE_min]
DBSMOTE_param = {}
for key in DBSMOTE_eps.keys():
  for k in DBSMOTE_eps[key].keys():
    if not key in DBSMOTE_param.keys():
      DBSMOTE_param[key] = dict()
    DBSMOTE_param[key][k] = tuple(d[key][k] for d in DBSMOTE_param_)

WSMOTE_nn = pd.read_excel(path, 'WSSMOTE-nn') #Compute WSSMOTE parameter
WSMOTE_nn.index = WSMOTE_nn['Unnamed: 0']
WSMOTE_nn = WSMOTE_nn.drop(columns='Unnamed: 0')
WSMOTE_nn = WSMOTE_nn.to_dict()

WSMOTE_add = pd.read_excel(path, 'WSSMOTE-n_add')
WSMOTE_add.index = WSMOTE_add['Unnamed: 0']
WSMOTE_add = WSMOTE_add.drop(columns='Unnamed: 0')
WSMOTE_add = WSMOTE_add.to_dict()

WSSMOTE_param_ = [WSMOTE_nn, WSMOTE_add]
WSSMOTE_param = {}
for key in WSMOTE_nn.keys():
  for k in WSMOTE_nn[key].keys():
    if not key in WSSMOTE_param.keys():
      WSSMOTE_param[key] = dict()
    WSSMOTE_param[key][k] = tuple(d[key][k] for d in WSSMOTE_param_)
    
    
GSMOTE_nn = pd.read_excel(path, 'GSMOTE-nn') #Load GSMOTE parameter
GSMOTE_nn.index = GSMOTE_nn['Unnamed: 0']
GSMOTE_nn = GSMOTE_nn.drop(columns='Unnamed: 0')
GSMOTE_nn = GSMOTE_nn.to_dict()

In [18]:
dicts_paramater = dict() # Dictionary: contains all parameters 
dicts_paramater['SMOTE'] = SMOTE_param
dicts_paramater['WSSMOTE'] = WSSMOTE_param
dicts_paramater['DBSMOTE'] = DBSMOTE_param
dicts_paramater['GSMOTE'] = GSMOTE_nn
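
The loops that build `DBSMOTE_param` and `WSSMOTE_param` follow the same pattern: two nested dictionaries with identical keys are combined into one nested dictionary of tuples. A minimal sketch of that pattern is given below. The helper `zip_nested_dicts`, the dataset name and the values are hypothetical and do not come from `Parameters.xlsx`.

```python
def zip_nested_dicts(*dicts):
    """Combine nested dicts sharing the same keys into a nested dict of tuples."""
    first = dicts[0]
    return {
        outer: {
            inner: tuple(d[outer][inner] for d in dicts)
            for inner in first[outer]
        }
        for outer in first
    }

# Made-up parameter values, keyed by dataset name and then classifier name
eps = {'datasetA': {'KNN': 0.5, 'SVM': 0.8}}
min_samples = {'datasetA': {'KNN': 2, 'SVM': 5}}
merged = zip_nested_dicts(eps, min_samples)
print(merged)  # {'datasetA': {'KNN': (0.5, 2), 'SVM': (0.8, 5)}}
```

Each tuple keeps the order of the input dictionaries, so index 0 and index 1 can be read back in the same way `generate_dico_results` reads the DBSMOTE and WSSMOTE parameters.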

# G-means Score

In [19]:
def generate_dico_results(classifiers, classifiers_name, dicts_paramater, oversampling_methods, examples_name, coeffs_min):
    '''
    Function: 
        - generate dictionaries containing the G-mean, F1 and AUC scores for each imbalanced dataset
        Warning: to limit computing time, each score is estimated over 10 random 80/20 train/test splits
    Input:
        - classifiers: list of classifier functions used
        - classifiers_name: list of classifier names
        - dicts_paramater: dictionary containing the best parameters for each oversampling method
        - oversampling_methods: list of oversampling method names
        - examples_name: list of dataset names
        - coeffs_min: list of the minority class label for each imbalanced dataset
    Output:
        - three (median, IQR) pairs of dictionaries, for G-mean, F1 and AUC;
          keys = oversampling method, values = table of scores per classifier and dataset
    '''
    dico_over_med_g_mean, dico_over_std_g_mean = dict(), dict()
    dico_over_med_auc, dico_over_std_auc = dict(), dict()
    dico_over_med_f, dico_over_std_f= dict(), dict()
    
    for over_method in oversampling_methods:
        tab_med_g_mean, tab_std_g_mean = pd.DataFrame(), pd.DataFrame()
        tab_med_auc, tab_std_auc = pd.DataFrame(), pd.DataFrame()
        tab_med_f, tab_std_f = pd.DataFrame(), pd.DataFrame()
        
        for idx, example in enumerate(examples):
            X, y = example, labels[idx]
            data_median_g_mean, data_std_g_mean = [], []
            data_median_auc, data_std_auc = [], []
            data_median_f, data_std_f = [], []
            for ind_class, classifier in enumerate(classifiers):
                acc_g_mean, acc_f, acc_auc = [], [], []
                for num_round in range(10):
                    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)
                    if over_method == 'without':
                        X_train_over, y_train_over = X_train.copy(), y_train.copy()
                    elif over_method == 'SMOTE':
                        sm = SMOTE(k_neighbors = dicts_paramater[over_method][examples_name[idx]][classifiers_name[ind_class]])
                        X_train_over, y_train_over = sm.fit_resample(X_train, y_train)
                    elif over_method == 'GSSMOTE':
                        sm = GeometricSMOTE(k_neighbors=dicts_paramater['GSMOTE'][examples_name[idx]][classifiers_name[ind_class]])
                        X_train_over, y_train_over = sm.fit_resample(X_train, y_train)
                    elif over_method == 'DBSMOTE':
                        sm = DBSMOTE(eps = dicts_paramater[over_method][examples_name[idx]][classifiers_name[ind_class]][0], 
                                 min_samples= dicts_paramater[over_method][examples_name[idx]][classifiers_name[ind_class]][1])
                        X_train_over, y_train_over = sm.fit_resample(X_train, y_train)
                    elif over_method == 'WSSMOTE':
                        X_train_over, y_train_over= WSSMOTE(X_train, y_train, n_nei=dicts_paramater[over_method][examples_name[idx]][classifiers_name[ind_class]][0],
                                                        coeff_min=coeffs_min[idx], n_add =dicts_paramater[over_method][examples_name[idx]][classifiers_name[ind_class]][1])
                    res = classifier(X_train_over, X_test, y_train_over, y_test)
                    acc_g_mean.append(score(res, y_test))
                    acc_f.append(f1_score(y_test, res))
                    acc_auc.append(roc_auc_score(y_test, res))
                    
                data_median_g_mean.append(round(np.median(acc_g_mean)*100, 2))
                data_std_g_mean.append(round(sp.stats.iqr(acc_g_mean)*100, 2))
                data_median_f.append(round(np.median(acc_f)*100, 2))
                data_std_f.append(round(sp.stats.iqr(acc_f)*100, 2))
                data_median_auc.append(round(np.median(acc_auc)*100, 2))
                data_std_auc.append(round(sp.stats.iqr(acc_auc)*100, 2))
                
            tab_med_g_mean[examples_name[idx]], tab_std_g_mean[examples_name[idx]]  = data_median_g_mean, data_std_g_mean
            tab_med_f[examples_name[idx]], tab_std_f[examples_name[idx]]  = data_median_f, data_std_f
            tab_med_auc[examples_name[idx]], tab_std_auc[examples_name[idx]]  = data_median_auc, data_std_auc
            
        tab_med_g_mean.index, tab_std_g_mean.index = classifiers_name, classifiers_name
        tab_med_f.index, tab_std_f.index = classifiers_name, classifiers_name
        tab_med_auc.index, tab_std_auc.index = classifiers_name, classifiers_name
        
        dico_over_med_g_mean[over_method], dico_over_std_g_mean[over_method] = tab_med_g_mean, tab_std_g_mean
        dico_over_med_f[over_method], dico_over_std_f[over_method] = tab_med_f, tab_std_f
        dico_over_med_auc[over_method], dico_over_std_auc[over_method] = tab_med_auc, tab_std_auc
        
    return (dico_over_med_g_mean, dico_over_std_g_mean), (dico_over_med_f, dico_over_std_f), (dico_over_med_auc, dico_over_std_auc)
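
Each table cell produced by this function is a median or an interquartile range (IQR) over the 10 scores of one configuration, scaled to percent and rounded to two decimals. Note that the variables named `*_std` hold the IQR, not a standard deviation. The sketch below reproduces that summary with NumPy only. The helper `summarize_scores` and the scores are made up for illustration.

```python
import numpy as np

def summarize_scores(scores):
    """Return (median, IQR) of the scores, in percent, rounded to two decimals."""
    scores = np.asarray(scores, dtype=float)
    q75, q25 = np.percentile(scores, [75, 25])
    median_pct = round(float(np.median(scores)) * 100, 2)
    iqr_pct = round(float(q75 - q25) * 100, 2)
    return median_pct, iqr_pct

# Made-up scores for 10 random train/test splits
made_up_scores = [0.80, 0.82, 0.85, 0.90, 0.78, 0.88, 0.81, 0.84, 0.86, 0.83]
median_pct, iqr_pct = summarize_scores(made_up_scores)
print(median_pct, iqr_pct)
```

Median and IQR are less sensitive than mean and standard deviation to a single unlucky split, which matters when only 10 rounds are run.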

In [20]:
classifiers =  [watershed_cut, KNeiClass_method, DecisionTree_method, SVM_method]
classifiers_name = ['WC','KNN', 'DT', 'SVM']
oversampling_methods = ['without', 'SMOTE', 'DBSMOTE', 'GSSMOTE', 'WSSMOTE']
examples_name = ['wisconsin', 'yeast4', 'vehicle1', 'Subcl35', 'Paw','thyroid', 'ecoli1']
coeffs_min = [2, 2, 2, 1, 1, 1, 2]

score_g_mean, score_f, score_auc = generate_dico_results(classifiers, classifiers_name, dicts_paramater, oversampling_methods, examples_name, coeffs_min)










### Results

##### G-Mean

In [21]:
dico_over_med, dico_over_std = score_g_mean[0], score_g_mean[1]

In [22]:
print('Tab oversampling method: without' )
print(pd.DataFrame.from_dict(dico_over_med['without']))
print(pd.DataFrame.from_dict(dico_over_std['without']))

Tab oversampling method: without
     wisconsin  yeast4  vehicle1  Subcl35    Paw  thyroid  ecoli1
WC       95.02   60.84     59.88    63.10  67.98    92.19   81.75
KNN      95.47   28.30     55.01    70.94  75.87    93.78   87.90
DT       93.86   51.34     66.32    63.60  72.07    92.96   85.18
SVM      96.15    0.00     62.20     0.00   0.00    81.65   81.68
     wisconsin  yeast4  vehicle1  Subcl35   Paw  thyroid  ecoli1
WC        2.45   10.96      5.09     5.23  7.84     8.07    6.02
KNN       3.84    6.78      6.42     3.70  6.80     7.03    6.32
DT        3.33   18.11      2.73     4.48  9.12     3.82    6.47
SVM       2.13    0.00     39.88    43.63  0.00    12.14    5.17


In [23]:
print('Tab oversampling method: SMOTE' )
print(pd.DataFrame.from_dict(dico_over_med['SMOTE']))
print(pd.DataFrame.from_dict(dico_over_std['SMOTE']))

Tab oversampling method: SMOTE
     wisconsin  yeast4  vehicle1  Subcl35    Paw  thyroid  ecoli1
WC       95.24   77.89     68.15    72.68  76.21    94.06   82.70
KNN      96.46   91.61     72.23    79.00  85.61    91.22   91.23
DT       94.41   67.23     71.55    72.10  72.16    90.71   83.20
SVM      96.97   78.63     62.72    35.40  24.05    82.16   83.88
     wisconsin  yeast4  vehicle1  Subcl35    Paw  thyroid  ecoli1
WC        2.81   10.97      5.72     4.41   4.90     4.22   10.58
KNN       1.60    6.24      2.28     4.22   2.46     6.35    5.02
DT        4.95   12.01     11.62     6.18   3.45     6.36   13.57
SVM       0.79   10.33     48.08    32.12  49.98     9.56    4.76


In [24]:
print('Tab oversampling method: DBSMOTE' )
print(pd.DataFrame.from_dict(dico_over_med['DBSMOTE']))
print(pd.DataFrame.from_dict(dico_over_std['DBSMOTE']))

Tab oversampling method: DBSMOTE
     wisconsin  yeast4  vehicle1  Subcl35    Paw  thyroid  ecoli1
WC       95.44   62.62     60.34    65.89  78.22    93.43   83.89
KNN      97.54   78.32     54.43    78.03  85.45    93.25   86.81
DT       92.20   53.98     66.14    68.89  68.98    89.62   82.25
SVM      97.06   76.84     74.90    47.67   0.00    78.28   82.16
     wisconsin  yeast4  vehicle1  Subcl35    Paw  thyroid  ecoli1
WC        2.63    9.33      5.62     7.54   8.87     7.39    9.06
KNN       0.64   14.35      3.03     3.51   1.45     3.33    6.46
DT        1.77   12.49      4.39     8.61  12.00     3.42    3.99
SVM       1.10    5.31     38.75    23.99  27.97    24.96    4.56


In [25]:
print('Tab oversampling method: GSSMOTE' )
print(pd.DataFrame.from_dict(dico_over_med['GSSMOTE']))
print(pd.DataFrame.from_dict(dico_over_std['GSSMOTE']))

Tab oversampling method: GSSMOTE
     wisconsin  yeast4  vehicle1  Subcl35    Paw  thyroid  ecoli1
WC       95.23   75.86     59.05    66.88  77.13    93.16   81.44
KNN      98.04   86.02     70.65    77.46  85.73    92.32   89.51
DT       92.91   58.13     66.63    71.89  73.97    92.46   86.06
SVM      97.05   85.65     37.61    38.66  28.77    80.36   88.72
     wisconsin  yeast4  vehicle1  Subcl35    Paw  thyroid  ecoli1
WC        1.91   16.73      5.60     7.01   3.45     5.51    4.40
KNN       0.83    5.26      7.05     3.65   2.69     5.62    2.45
DT        4.35   13.46      3.05     6.28   7.69     5.86    5.90
SVM       0.69    5.76     39.03     9.57  28.74     8.26    3.08


In [26]:
print('Tab oversampling method: WSSMOTE')
print(pd.DataFrame.from_dict(dico_over_med['WSSMOTE']))
print(pd.DataFrame.from_dict(dico_over_std['WSSMOTE']))

Tab oversampling method: WSSMOTE
     wisconsin  yeast4  vehicle1  Subcl35    Paw  thyroid  ecoli1
WC       96.61   61.32     64.48    67.32  73.19    93.64   84.92
KNN      98.45   83.84     71.25    77.65  85.09    95.74   91.50
DT       92.29   55.64     67.21    67.64  70.04    89.39   88.38
SVM      98.33   71.64     75.72    35.80   0.00    75.84   87.75
     wisconsin  yeast4  vehicle1  Subcl35    Paw  thyroid  ecoli1
WC        1.77    9.75      2.94     8.93   9.54     4.63    7.42
KNN       1.90    7.57      2.97     3.51   8.11     4.39    5.31
DT        2.85   11.57      4.13     7.53   6.93     2.03   10.62
SVM       1.49    6.34      2.25    13.78  47.64    20.30    7.37


#### F1-score

In [27]:
dico_over_med, dico_over_std = score_f[0], score_f[1]
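
The printed tables suggest that `dico_over_med` and `dico_over_std` are nested as oversampling method, then dataset, then classifier, which is why `pd.DataFrame.from_dict` gives one column per dataset and one row per classifier. The sketch below reproduces that layout on a small dictionary whose values are copied from the WSSMOTE F1 table of this section, and merges the `med` and `std` tables into one. The helper `summary_table` and the names `dico_med` and `dico_std` are hypothetical and not part of the notebook.

```python
import pandas as pd

# Assumed layout: method -> dataset -> classifier -> score in percent.
# Values copied from the WSSMOTE F1 tables (WC and KNN rows, two datasets).
dico_med = {"WSSMOTE": {"wisconsin": {"WC": 97.44, "KNN": 98.44},
                        "yeast4": {"WC": 97.35, "KNN": 90.10}}}
dico_std = {"WSSMOTE": {"wisconsin": {"WC": 1.14, "KNN": 1.92},
                        "yeast4": {"WC": 0.98, "KNN": 3.64}}}


def summary_table(med, std, method):
    """Hypothetical helper: one 'med ± std' cell per (classifier, dataset)."""
    med_df = pd.DataFrame.from_dict(med[method])  # columns = datasets
    std_df = pd.DataFrame.from_dict(std[method])  # rows = classifiers
    fmt = "{:.2f}".format
    med_txt = med_df.apply(lambda col: col.map(fmt))
    std_txt = std_df.apply(lambda col: col.map(fmt))
    return med_txt + " ± " + std_txt


print(summary_table(dico_med, dico_std, "WSSMOTE"))
```

Keeping the spread next to each score makes it easier to judge whether a gap between two oversampling methods is larger than the fold-to-fold variation.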

In [28]:
print('Tab oversampling method: without')
print(pd.DataFrame.from_dict(dico_over_med['without']))
print(pd.DataFrame.from_dict(dico_over_std['without']))

Tab oversampling method: without
     wisconsin  yeast4  vehicle1  Subcl35    Paw  thyroid  ecoli1
WC       97.34   97.83     80.64    44.17  54.25    91.88   90.83
KNN      96.74   98.02     84.87    49.90  54.20    92.58   94.75
DT       95.68   97.13     83.40    40.51  53.42    86.67   92.03
SVM      97.03   98.63     86.27     0.00   0.00    78.51   92.62
     wisconsin  yeast4  vehicle1  Subcl35    Paw  thyroid  ecoli1
WC        1.99    0.79      4.72     5.08  11.94    10.06    3.68
KNN       2.27    0.35      3.13     9.32   3.41     8.25    1.07
DT        0.86    0.35      2.07     5.73  14.56     4.55    3.60
SVM       1.56    1.00      8.40    23.70   0.00    15.19    2.04


In [29]:
print('Tab oversampling method: SMOTE')
print(pd.DataFrame.from_dict(dico_over_med['SMOTE']))
print(pd.DataFrame.from_dict(dico_over_std['SMOTE']))

Tab oversampling method: SMOTE
     wisconsin  yeast4  vehicle1  Subcl35    Paw  thyroid  ecoli1
WC       96.60   94.70     78.95    50.63  54.84    93.21   92.08
KNN      97.19   92.28     75.14    53.14  54.06    89.44   92.79
DT       96.00   96.29     82.90    50.98  44.92    84.62   93.96
SVM      97.47   92.88     84.18    22.76   9.77    76.79   89.24
     wisconsin  yeast4  vehicle1  Subcl35    Paw  thyroid  ecoli1
WC        1.60    1.19      2.37     9.91   5.83     5.12    1.64
KNN       1.40    0.94      2.82     3.67   7.57     7.13    2.94
DT        2.46    0.44      4.15     6.12   9.41     5.56    2.56
SVM       0.72    1.06     20.52    32.34  24.65    10.04    2.80


In [30]:
print('Tab oversampling method: DBSMOTE')
print(pd.DataFrame.from_dict(dico_over_med['DBSMOTE']))
print(pd.DataFrame.from_dict(dico_over_std['DBSMOTE']))

Tab oversampling method: DBSMOTE
     wisconsin  yeast4  vehicle1  Subcl35    Paw  thyroid  ecoli1
WC       97.08   98.25     79.84    47.79  59.44    91.29   92.02
KNN      98.29   95.48     84.40    56.21  49.32    91.29   91.91
DT       95.65   97.66     84.51    51.92  46.93    87.23   92.01
SVM      97.78   93.33     84.36    30.37   0.00    68.82   90.72
     wisconsin  yeast4  vehicle1  Subcl35    Paw  thyroid  ecoli1
WC        1.42    0.47      2.97     7.12  16.20     9.00    2.94
KNN       0.56    2.31      0.92     7.16   4.44     5.27    4.73
DT        1.70    0.63      1.31     9.16  14.77     4.38    2.62
SVM       0.86    5.25      4.79    25.67  20.92    24.26    5.26


In [31]:
print('Tab oversampling method: GSSMOTE')
print(pd.DataFrame.from_dict(dico_over_med['GSSMOTE']))
print(pd.DataFrame.from_dict(dico_over_std['GSSMOTE']))

Tab oversampling method: GSSMOTE
     wisconsin  yeast4  vehicle1  Subcl35    Paw  thyroid  ecoli1
WC       96.58   95.63     80.23    44.07  53.27    91.32   90.47
KNN      98.28   94.56     79.05    50.80  51.93    87.30   92.78
DT       95.60   97.28     82.56    49.56  46.56    87.48   92.45
SVM      97.80   91.23     80.97    18.14  21.64    76.39   91.19
     wisconsin  yeast4  vehicle1  Subcl35    Paw  thyroid  ecoli1
WC        1.13    1.30      4.35     8.48   8.90     5.89    2.07
KNN       0.55    2.60      5.01     8.38   2.91     9.53    2.88
DT        2.27    0.78      3.77     6.88  12.27     5.36    3.07
SVM       0.81    1.61     22.09     8.71  23.00     9.18    1.35


In [32]:
print('Tab oversampling method: WSSMOTE')
print(pd.DataFrame.from_dict(dico_over_med['WSSMOTE']))
print(pd.DataFrame.from_dict(dico_over_std['WSSMOTE']))

Tab oversampling method: WSSMOTE
     wisconsin  yeast4  vehicle1  Subcl35    Paw  thyroid  ecoli1
WC       97.44   97.35     79.02    49.45  51.96    90.99    92.3
KNN      98.44   90.10     70.35    50.65  57.10    93.54    93.0
DT       94.56   97.19     82.52    47.25  50.00    85.71    92.9
SVM      98.39   72.93     73.42    23.66   0.00    67.03    90.0
     wisconsin  yeast4  vehicle1  Subcl35    Paw  thyroid  ecoli1
WC        1.14    0.98      2.28     9.41   2.91     6.12    1.59
KNN       1.92    3.64      4.99     6.25   8.41     2.57    4.12
DT        2.79    0.63      3.82     7.19  16.07     3.66    2.11
SVM       1.00    3.86      5.57     8.65  24.39    22.17    2.01


#### AUC-score

In [33]:
dico_over_med, dico_over_std = score_auc[0], score_auc[1]
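
Several SVM entries in the AUC tables are exactly 50.00 where the matching G-mean entry is 0.00 (for instance Paw with DBSMOTE or WSSMOTE). One plausible reading, which is an assumption and not confirmed by the code shown here, is that the AUC is computed from hard class predictions. In that case the ROC curve has a single operating point and the AUC reduces to (TPR + TNR) / 2, so a classifier that predicts only the majority class scores 50% AUC and 0 G-mean. The confusion-matrix counts below are hypothetical.

```python
import math


def rates(tp, fn, tn, fp):
    """True positive rate and true negative rate from confusion counts."""
    return tp / (tp + fn), tn / (tn + fp)


def g_mean(tp, fn, tn, fp):
    """Geometric mean of TPR and TNR."""
    tpr, tnr = rates(tp, fn, tn, fp)
    return math.sqrt(tpr * tnr)


def auc_from_hard_labels(tp, fn, tn, fp):
    """Area under the ROC polyline (0,0) -> (FPR,TPR) -> (1,1)."""
    tpr, tnr = rates(tp, fn, tn, fp)
    return (tpr + tnr) / 2


# Hypothetical fold: 10 minority and 90 majority samples,
# and a classifier that predicts the majority class everywhere.
counts = dict(tp=0, fn=10, tn=90, fp=0)
print(g_mean(**counts), auc_from_hard_labels(**counts))
```

Under this assumption, an AUC of 50 therefore signals a degenerate classifier rather than a moderately good one, which matches the zero G-mean reported for the same cells.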

In [34]:
print('Tab oversampling method: without')
print(pd.DataFrame.from_dict(dico_over_med['without']))
print(pd.DataFrame.from_dict(dico_over_std['without']))

Tab oversampling method: without
     wisconsin  yeast4  vehicle1  Subcl35    Paw  thyroid  ecoli1
WC       95.09   67.61     63.14    67.15  71.61    92.50   81.77
KNN      95.49   54.01     62.08    71.68  77.15    93.91   88.17
DT       93.87   61.75     67.65    66.02  74.52    93.08   85.69
SVM      96.15   50.00     67.58    50.00  50.00    82.44   82.41
     wisconsin  yeast4  vehicle1  Subcl35   Paw  thyroid  ecoli1
WC        2.35    6.82      2.71     3.80  6.12     7.91    5.68
KNN       3.80    2.25      4.81     2.76  6.10     6.90    5.72
DT        3.06    9.54      2.74     3.89  7.14     3.92    5.15
SVM       2.13    0.00     18.98     0.12  0.00     9.76    5.19


In [35]:
print('Tab oversampling method: SMOTE')
print(pd.DataFrame.from_dict(dico_over_med['SMOTE']))
print(pd.DataFrame.from_dict(dico_over_std['SMOTE']))

Tab oversampling method: SMOTE
     wisconsin  yeast4  vehicle1  Subcl35    Paw  thyroid  ecoli1
WC       95.25   78.84     68.46    73.81  76.72    94.17   83.05
KNN      96.49   91.96     72.68    79.36  86.13    91.61   91.62
DT       94.44   71.24     72.38    72.74  73.21    90.80   83.55
SVM      96.98   79.30     68.93    50.48  53.53    83.55   83.94
     wisconsin  yeast4  vehicle1  Subcl35   Paw  thyroid  ecoli1
WC        2.82    8.64      5.26     3.56  3.73     4.22    9.52
KNN       1.57    6.52      2.25     4.79  3.20     6.05    5.13
DT        4.85    9.18     10.27     5.68  3.22     6.41   11.39
SVM       0.74    9.25     25.94     7.88  9.56     7.46    4.59


In [36]:
print('Tab oversampling method: DBSMOTE')
print(pd.DataFrame.from_dict(dico_over_med['DBSMOTE']))
print(pd.DataFrame.from_dict(dico_over_std['DBSMOTE']))

Tab oversampling method: DBSMOTE
     wisconsin  yeast4  vehicle1  Subcl35    Paw  thyroid  ecoli1
WC       95.46   68.97     62.64    68.53  79.60    93.65   84.67
KNN      97.55   79.56     60.76    78.22  86.21    93.27   87.03
DT       92.28   63.46     68.56    70.95  71.82    90.03   82.92
SVM      97.06   77.48     75.87    52.84  50.00    79.31   82.50
     wisconsin  yeast4  vehicle1  Subcl35   Paw  thyroid  ecoli1
WC        2.56    6.23      3.18     7.16  7.77     7.21    8.76
KNN       0.61   12.47      2.47     4.28  1.81     3.04    6.40
DT        1.51    7.13      3.20     6.29  9.26     3.20    3.00
SVM       1.10    3.90     19.71     6.09  4.22    17.68    4.64


In [37]:
print('Tab oversampling method: GSSMOTE')
print(pd.DataFrame.from_dict(dico_over_med['GSSMOTE']))
print(pd.DataFrame.from_dict(dico_over_std['GSSMOTE']))

Tab oversampling method: GSSMOTE
     wisconsin  yeast4  vehicle1  Subcl35    Paw  thyroid  ecoli1
WC       95.31   77.53     61.72    68.88  77.70    93.34   81.79
KNN      98.05   86.14     70.77    77.99  86.75    92.34   89.51
DT       92.94   65.94     68.08    72.35  74.93    92.47   86.12
SVM      97.05   85.69     57.13    50.00  54.06    81.98   88.73
     wisconsin  yeast4  vehicle1  Subcl35   Paw  thyroid  ecoli1
WC        1.89   13.32      5.72     6.02  3.37     5.18    4.02
KNN       0.83    4.90      6.95     3.21  2.20     5.47    2.69
DT        4.28    7.42      2.91     6.11  6.66     5.88    5.66
SVM       0.71    5.66     12.92     3.95  6.00     7.20    3.31


In [38]:
print('Tab oversampling method: WSSMOTE')
print(pd.DataFrame.from_dict(dico_over_med['WSSMOTE']))
print(pd.DataFrame.from_dict(dico_over_std['WSSMOTE']))

Tab oversampling method: WSSMOTE
     wisconsin  yeast4  vehicle1  Subcl35    Paw  thyroid  ecoli1
WC       96.61   67.90     65.85    69.85  75.38    93.66   85.33
KNN      98.47   84.04     73.39    79.50  85.62    95.77   91.54
DT       92.30   64.35     68.63    69.44  72.34    89.45   88.54
SVM      98.34   73.38     78.06    51.47  50.00    77.49   88.01
     wisconsin  yeast4  vehicle1  Subcl35   Paw  thyroid  ecoli1
WC        1.77    6.10      2.24     6.29  7.69     4.31    6.92
KNN       1.88    8.00      2.19     4.14  8.44     4.20    5.20
DT        2.92    6.36      3.81     6.40  5.49     2.03    9.30
SVM       1.48    6.70      2.49     2.56  9.33    16.29    7.74
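
Because each oversampling method is printed in its own table, comparing methods for a given classifier means reading across five tables. As a minimal sketch, the KNN rows of the five AUC `med` tables above can be stacked into one DataFrame and the top-scoring method per dataset read off with `idxmax`. The names `knn_auc` and `best_method` are illustrative, and the comparison ignores the `std` tables.

```python
import pandas as pd

datasets = ["wisconsin", "yeast4", "vehicle1", "Subcl35", "Paw", "thyroid", "ecoli1"]

# KNN rows copied from the five AUC `med` tables, one row per oversampling method.
knn_auc = pd.DataFrame(
    [
        [95.49, 54.01, 62.08, 71.68, 77.15, 93.91, 88.17],  # without
        [96.49, 91.96, 72.68, 79.36, 86.13, 91.61, 91.62],  # SMOTE
        [97.55, 79.56, 60.76, 78.22, 86.21, 93.27, 87.03],  # DBSMOTE
        [98.05, 86.14, 70.77, 77.99, 86.75, 92.34, 89.51],  # GSSMOTE
        [98.47, 84.04, 73.39, 79.50, 85.62, 95.77, 91.54],  # WSSMOTE
    ],
    index=["without", "SMOTE", "DBSMOTE", "GSSMOTE", "WSSMOTE"],
    columns=datasets,
)

# Label of the row holding the highest AUC in each dataset column.
best_method = knn_auc.idxmax()
print(best_method)
```

On these values WSSMOTE ranks first on four of the seven datasets for KNN, but several margins (91.62 against 91.54 on ecoli1, for example) are well inside the reported standard deviations, so the ranking should be read with care.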
