 --- 
# UCI - Default of Credit Card Clients
---

# Dataset presentation

This dataset contains information on default payments, demographic factors, credit data, history of payment, and bill statements of credit card clients in Taiwan from April 2005 to September 2005.

It can be found here:
https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients

## Variables
There are 25 variables:

* ID: ID of each client
* LIMIT_BAL: Amount of given credit in NT dollars (includes individual and family/supplementary credit)
* SEX: Gender (1=male, 2=female)
* EDUCATION: (1=graduate school, 2=university, 3=high school, 4=others, 5=unknown, 6=unknown)
* MARRIAGE: Marital status (1=married, 2=single, 3=others)
* AGE: Age in years
* PAY_0: Repayment status in September, 2005 (-1=pay duly, 1=payment delay for one month, 2=payment delay for two months, ... 8=payment delay for eight months, 9=payment delay for nine months and above)
* PAY_2: Repayment status in August, 2005 (scale same as above)
* PAY_3: Repayment status in July, 2005 (scale same as above)
* PAY_4: Repayment status in June, 2005 (scale same as above)
* PAY_5: Repayment status in May, 2005 (scale same as above)
* PAY_6: Repayment status in April, 2005 (scale same as above)
* BILL_AMT1: Amount of bill statement in September, 2005 (NT dollar)
* BILL_AMT2: Amount of bill statement in August, 2005 (NT dollar)
* BILL_AMT3: Amount of bill statement in July, 2005 (NT dollar)
* BILL_AMT4: Amount of bill statement in June, 2005 (NT dollar)
* BILL_AMT5: Amount of bill statement in May, 2005 (NT dollar)
* BILL_AMT6: Amount of bill statement in April, 2005 (NT dollar)
* PAY_AMT1: Amount of previous payment in September, 2005 (NT dollar)
* PAY_AMT2: Amount of previous payment in August, 2005 (NT dollar)
* PAY_AMT3: Amount of previous payment in July, 2005 (NT dollar)
* PAY_AMT4: Amount of previous payment in June, 2005 (NT dollar)
* PAY_AMT5: Amount of previous payment in May, 2005 (NT dollar)
* PAY_AMT6: Amount of previous payment in April, 2005 (NT dollar)
* default.payment.next.month: Default payment (1=yes, 0=no)
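
The integer codes listed above can be turned into readable labels before exploring the data. The following is a minimal sketch on a few made-up rows (not the real file); the label dictionaries only restate the list above, and `decode_categories` is a hypothetical helper name that does not appear elsewhere in this notebook.

```python
import pandas as pd

# Code-to-label mappings, restating the variable list above.
SEX_LABELS = {1: "male", 2: "female"}
EDUCATION_LABELS = {1: "graduate school", 2: "university", 3: "high school",
                    4: "others", 5: "unknown", 6: "unknown"}
MARRIAGE_LABELS = {1: "married", 2: "single", 3: "others"}


def decode_categories(df):
    """Return a copy of df with SEX, EDUCATION and MARRIAGE codes replaced by labels."""
    out = df.copy()
    out["SEX"] = out["SEX"].map(SEX_LABELS)
    out["EDUCATION"] = out["EDUCATION"].map(EDUCATION_LABELS)
    out["MARRIAGE"] = out["MARRIAGE"].map(MARRIAGE_LABELS)
    return out


# Three made-up rows, for illustration only.
toy = pd.DataFrame({"SEX": [2, 1, 2], "EDUCATION": [2, 1, 6], "MARRIAGE": [1, 2, 3]})
decoded = decode_categories(toy)
print(decoded)
```

Codes that are missing from a dictionary would come out as `NaN`, which makes undocumented values easy to spot.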


In [94]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


# Useful imports

## Packages

In [95]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import sklearn
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

In [96]:
pd.set_option('display.max_columns', 120)

In [97]:
sns.set_style("darkgrid")

## Data Preparation imports

from sklearn.preprocessing import StandardScaler

## Model imports

In [98]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

In [99]:
import xgboost as xgb

In [100]:
import lightgbm as lgb

In [101]:
# import catboost

## Preprocessing imports

In [102]:
from sklearn.pipeline import Pipeline

## Metrics imports

In [103]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import recall_score
from sklearn.metrics import f1_score
from sklearn.metrics import roc_auc_score
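
As a reminder of what these metrics measure, here is a small sketch that derives precision, recall and F1 by hand from a confusion matrix and compares them with scikit-learn's own functions. The labels are made up for illustration and are not taken from the dataset.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score, f1_score

# Made-up labels, for illustration only (1 = default, 0 = no default).
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0])
y_hat = np.array([0, 1, 0, 0, 1, 0, 1, 0])

# For binary labels, ravel() yields (tn, fp, fn, tp).
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

precision = tp / (tp + fp)   # share of predicted defaults that are real defaults
recall = tp / (tp + fn)      # share of real defaults that were caught
f1 = 2 * precision * recall / (precision + recall)

print(np.isclose(recall, recall_score(y_true, y_hat)))
print(np.isclose(f1, f1_score(y_true, y_hat)))
```

With an imbalanced target such as default payments, recall and F1 on the positive class are usually more informative than plain accuracy.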

In [104]:
import os
# os.pathsep is ';' on Windows and ':' on Linux/macOS
os.environ['PATH'].split(os.pathsep)

['C:\\Users\\twang\\AppData\\Local\\Continuum\\anaconda3',
 'C:\\Users\\twang\\AppData\\Local\\Continuum\\anaconda3\\Library\\mingw-w64\\bin',
 'C:\\Users\\twang\\AppData\\Local\\Continuum\\anaconda3\\Library\\usr\\bin',
 'C:\\Users\\twang\\AppData\\Local\\Continuum\\anaconda3\\Library\\bin',
 'C:\\Users\\twang\\AppData\\Local\\Continuum\\anaconda3\\Scripts',
 'C:\\Users\\twang\\AppData\\Local\\Continuum\\anaconda3\\bin',
 'C:\\Users\\twang\\AppData\\Local\\Continuum\\anaconda3\\condabin',
 'C:\\Program Files\\Docker\\Docker\\Resources\\bin',
 'C:\\ProgramData\\Oracle\\Java\\javapath',
 'C:\\WINDOWS\\system32',
 'C:\\WINDOWS',
 'C:\\WINDOWS\\System32\\Wbem',
 'C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0',
 'C:\\Program Files\\PuTTY',
 'C:\\Program Files\\Intel\\WiFi\\bin',
 'C:\\Program Files\\Common Files\\Intel\\WirelessCommon',
 'C:\\Program Files\\Git LFS',
 'C:\\Program Files\\Pandoc',
 'C:\\Program Files\\MiKTeX 2.9\\miktex\\bin\\x64',
 'C:\\Users\\twang\\AppData\\Local\\Conti

# Importing our Data easily!

In [105]:
path = 'UCI_Credit_Card.csv'

In [106]:
from dataprep.load import load_raw_data
from dataprep.load import load_data
from dataprep.load import load_data_xy


df = load_data(path)
X_raw, y_raw = load_data_xy(path)

# Defining our train/test sets

## Splitting our Df into test/train/val

In [140]:
from sklearn.model_selection import train_test_split

df_train, df_val = train_test_split(df, 
                                    test_size=6000, 
                                    stratify=df.default, 
                                    random_state=42)
df_train, df_test = train_test_split(df_train, 
                                     test_size=6000, 
                                     stratify=df_train.default, 
                                     random_state=42)
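
To make the effect of the two consecutive splits concrete, the sketch below repeats them on a synthetic stand-in for `df`. The 30000-row size matches the shapes printed later in this notebook; the roughly 22% positive rate and the `toy_*` names are assumptions used only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for df: 30000 rows, roughly 22% positives (assumed rate).
rng = np.random.default_rng(42)
toy_df = pd.DataFrame({
    "feature": rng.normal(size=30000),
    "default": (rng.random(30000) < 0.22).astype(int),
})

# Same two-step split as above: hold out 6000 rows for validation, then 6000 for test.
toy_train, toy_val = train_test_split(toy_df, test_size=6000,
                                      stratify=toy_df["default"], random_state=42)
toy_train, toy_test = train_test_split(toy_train, test_size=6000,
                                       stratify=toy_train["default"], random_state=42)

print(len(toy_train), len(toy_val), len(toy_test))  # 18000 6000 6000

# Stratification keeps the positive rate nearly identical in each part.
for part in (toy_train, toy_val, toy_test):
    print(round(part["default"].mean(), 3))
```

The result is an 18000 / 6000 / 6000 partition in which each part keeps the class balance of the full data.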

## Splitting into (X,y)

In [141]:
from dataprep.load import df2xy

X_train, y_train = df2xy(df_train, 'default')
X_test, y_test = df2xy(df_test, 'default')
X_val, y_val = df2xy(df_val, 'default')

# Protocol for pipelined workflow

Next, we need to define the dictionaries / classes that we will use to store the models and their scores for every metric.

That way, we can easily reuse and build on them later.
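
One possible shape for such a store is a nested dictionary keyed by pipeline index, model name and metric name. This is only a sketch under that assumption: `record_score` and `best_model` are hypothetical helper names, and the numbers are placeholders rather than results from this notebook.

```python
from collections import defaultdict

# scores[pipeline_index][model_name][metric_name] -> float
scores = defaultdict(lambda: defaultdict(dict))


def record_score(scores, pipe_idx, model_name, metric_name, value):
    """Store one metric value for a (pipeline, model) pair."""
    scores[pipe_idx][model_name][metric_name] = value


def best_model(scores, metric_name):
    """Return (pipeline index, model name, value) with the highest value of a metric."""
    candidates = [
        (pipe_idx, model_name, metrics[metric_name])
        for pipe_idx, models in scores.items()
        for model_name, metrics in models.items()
        if metric_name in metrics
    ]
    return max(candidates, key=lambda item: item[2])


# Placeholder numbers, not results from this notebook.
record_score(scores, 0, "log", "recall", 0.50)
record_score(scores, 0, "svm", "recall", 0.60)
record_score(scores, 1, "log", "recall", 0.55)

print(best_model(scores, "recall"))
```

Keeping every metric under one structure makes it straightforward to compare pipelines on whichever metric matters for the task.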

# Elementary Pipelines

Elementary Pipelines are Pipelines that apply a single small processing step, such as adding or removing one feature.

## Elementary Transformers from Packages

In [107]:
from sklearn.preprocessing import StandardScaler

## Custom Transformers

In [108]:
# Gather data by age group
from dataprep.pipelines import AgeBinAdder
# Gender x Marriage new category
from dataprep.pipelines import GenderXMarriageAdder
# Gender x AgeBin new category
from dataprep.pipelines import GenderXAgeBinAdder
# Predict next month's bill statement
from dataprep.pipelines import NextBillAdder
# Get_dummies to Df
from dataprep.pipelines import CategoricalWarrior
# Drop a column
from dataprep.pipelines import ColumnDropper
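
The source of `dataprep.pipelines` is not shown in this notebook, so the following is only a sketch of how transformers of this kind are commonly written with scikit-learn's base classes. `SketchAgeBinAdder` and `SketchColumnDropper` are hypothetical stand-ins, and the bin edges are an assumption chosen to reproduce the `20s` ... `60+` labels that appear in the outputs further down.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class SketchAgeBinAdder(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in for AgeBinAdder: adds a decade label column."""

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        X = X.copy()
        bins = [0, 30, 40, 50, 60, np.inf]  # assumed edges; ages under 20 also map to "20s"
        labels = ["20s", "30s", "40s", "50s", "60+"]
        # right=False makes each bin include its lower edge: [30, 40) -> "30s".
        X["age_bin"] = pd.cut(X["age"], bins=bins, labels=labels, right=False).astype(str)
        return X


class SketchColumnDropper(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in for ColumnDropper: removes one column."""

    def __init__(self, column):
        self.column = column

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X.drop(columns=[self.column])


toy = pd.DataFrame({"age": [24, 30, 57, 61],
                    "limit_bal": [20000.0, 120000.0, 50000.0, 90000.0]})
pipe = Pipeline([("age_bin", SketchAgeBinAdder()),
                 ("age_remove", SketchColumnDropper("age"))])
result = pipe.fit_transform(toy)
print(result)
```

Storing constructor arguments unchanged as attributes, as `SketchColumnDropper` does, is what lets scikit-learn print, clone and grid-search such a transformer.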


# Complex Pipelines

Complex Pipelines are combinations of multiple Elementary Pipelines.

We define them below and check that each one runs as expected.

In [122]:
complex_pipelines = []

## Helper function

In [109]:
def show_pipe_result(pipeline, df, n=2):
    result = pipeline.fit_transform(df)
    print("Shape: ", result.shape)
    if isinstance(result, pd.DataFrame):
        # We're using display, because the df doesn't show by itself when it's encapsulated in a function
        display(result.head())
    elif isinstance(result, np.ndarray):
        print(result[:n])

## Defining our Pipelines

### With Scaler only

First, let's scale the data, as our only data modification

In [110]:
# Pipeline with a StandardScaler
complex_pipe1 = Pipeline([
    ("scaler", StandardScaler())
])

In [111]:
show_pipe_result(complex_pipe1, X_raw)

Shape:  (30000, 23)
[[-1.13672015  0.81016074  0.21186989 -1.06879721 -1.24601985  1.84394071
   2.39995869 -0.40419908 -0.36400938 -0.33135449 -0.33818882 -0.64250107
  -0.64739923 -0.66799331 -0.67249727 -0.66305853 -0.65272422 -0.34194162
  -0.22708564 -0.29680127 -0.30806256 -0.31413612 -0.29338206]
 [-0.3659805   0.81016074  0.21186989  0.84913055 -1.02904717 -0.54231679
   2.39995869 -0.40419908 -0.36400938 -0.33135449  2.956928   -0.65921875
  -0.66674657 -0.63925429 -0.62163594 -0.60622927 -0.59796638 -0.34194162
  -0.21358766 -0.24000461 -0.24422965 -0.31413612 -0.18087821]]
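
For reference, `StandardScaler` replaces each column by its z-score: the column mean is subtracted and the result is divided by the column's population standard deviation. The sketch below checks this by hand on a few made-up values (not rows from the dataset).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up values on very different scales, for illustration only.
X_toy = np.array([[20000.0, 24.0],
                  [120000.0, 26.0],
                  [90000.0, 34.0],
                  [50000.0, 57.0]])

scaled = StandardScaler().fit_transform(X_toy)

# Same computation by hand: subtract the column mean, divide by the
# population standard deviation (ddof=0, NumPy's default).
manual = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)

print(np.allclose(scaled, manual))  # True
```

After scaling, every column has mean 0 and unit variance, which is why the credit limit and the small integer codes end up in the same range in the output above.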


### With Next bill & Scaler

Let's add the Next bill prediction

In [112]:
# Pipeline with the next-bill prediction followed by a StandardScaler
complex_pipe2 = Pipeline([
    ("next_bill", NextBillAdder()),
    ("scaler", StandardScaler())
])

In [113]:
show_pipe_result(complex_pipe2, X_raw)

Shape:  (30000, 24)
[[-1.13672015  0.81016074  0.21186989 -1.06879721 -1.24601985  1.84394071
   2.39995869 -0.40419908 -0.36400938 -0.33135449 -0.33818882 -0.64250107
  -0.64739923 -0.66799331 -0.67249727 -0.66305853 -0.65272422 -0.34194162
  -0.22708564 -0.29680127 -0.30806256 -0.31413612 -0.29338206 -0.64264241]
 [-0.3659805   0.81016074  0.21186989  0.84913055 -1.02904717 -0.54231679
   2.39995869 -0.40419908 -0.36400938 -0.33135449  2.956928   -0.65921875
  -0.66674657 -0.63925429 -0.62163594 -0.60622927 -0.59796638 -0.34194162
  -0.21358766 -0.24000461 -0.24422965 -0.31413612 -0.18087821 -0.65937279]]


### With our 4 engineered features

Now let's chain our 4 feature-engineering Elementary Pipelines:

In [114]:
complex_pipe3 = Pipeline([
    ("age_bin", AgeBinAdder()),
    ("gender_age_bin", GenderXAgeBinAdder()),
    ("gender_marriage", GenderXMarriageAdder()),
    ("next_bill", NextBillAdder())
])

In [115]:
show_pipe_result(complex_pipe3, X_raw)

Shape:  (30000, 27)


Unnamed: 0,limit_bal,gender,education,marriage,age,pay_1,pay_2,pay_3,pay_4,pay_5,pay_6,bill_amt1,bill_amt2,bill_amt3,bill_amt4,bill_amt5,bill_amt6,pay_amt1,pay_amt2,pay_amt3,pay_amt4,pay_amt5,pay_amt6,age_bin,gen_ageBin,gen_mar,pred_bill_amt0
0,20000.0,2,2,1,24,1,1,0,0,0,0,3913.0,3102.0,689.0,0.0,0.0,0.0,0.0,689.0,0.0,0.0,0.0,0.0,20s,"(2, 20s)","(2, 1)",3909.683795
1,120000.0,2,2,2,26,0,1,0,0,0,1,2682.0,1725.0,2682.0,3272.0,3455.0,3261.0,0.0,1000.0,1000.0,1000.0,0.0,2000.0,20s,"(2, 20s)","(2, 2)",2678.148995
2,90000.0,2,2,2,34,0,0,0,0,0,0,29239.0,14027.0,13559.0,14331.0,14948.0,15549.0,1518.0,1500.0,1000.0,1000.0,1000.0,5000.0,30s,"(2, 30s)","(2, 2)",29177.529785
3,50000.0,2,2,1,37,0,0,0,0,0,0,46990.0,48233.0,49291.0,28314.0,28959.0,29547.0,2000.0,2019.0,1200.0,1100.0,1069.0,1000.0,30s,"(2, 30s)","(2, 1)",46995.03811
4,50000.0,1,2,1,57,0,0,0,0,0,0,8617.0,5670.0,35835.0,20940.0,19146.0,19131.0,2000.0,36681.0,10000.0,9000.0,689.0,679.0,50s,"(1, 50s)","(1, 1)",8605.584386


### With our 4 engineered features, but without 'age'

Let's remove the 'age' feature:

In [116]:
complex_pipe4 = Pipeline([
    ("age_bin", AgeBinAdder()),
    ("gender_age_bin", GenderXAgeBinAdder()),
    ("gender_marriage", GenderXMarriageAdder()),
    ("next_bill", NextBillAdder()),
    ("age_remove", ColumnDropper('age'))
])

In [117]:
show_pipe_result(complex_pipe4, X_raw)

Shape:  (30000, 26)


Unnamed: 0,limit_bal,gender,education,marriage,pay_1,pay_2,pay_3,pay_4,pay_5,pay_6,bill_amt1,bill_amt2,bill_amt3,bill_amt4,bill_amt5,bill_amt6,pay_amt1,pay_amt2,pay_amt3,pay_amt4,pay_amt5,pay_amt6,age_bin,gen_ageBin,gen_mar,pred_bill_amt0
0,20000.0,2,2,1,1,1,0,0,0,0,3913.0,3102.0,689.0,0.0,0.0,0.0,0.0,689.0,0.0,0.0,0.0,0.0,20s,"(2, 20s)","(2, 1)",3909.683795
1,120000.0,2,2,2,0,1,0,0,0,1,2682.0,1725.0,2682.0,3272.0,3455.0,3261.0,0.0,1000.0,1000.0,1000.0,0.0,2000.0,20s,"(2, 20s)","(2, 2)",2678.148995
2,90000.0,2,2,2,0,0,0,0,0,0,29239.0,14027.0,13559.0,14331.0,14948.0,15549.0,1518.0,1500.0,1000.0,1000.0,1000.0,5000.0,30s,"(2, 30s)","(2, 2)",29177.529785
3,50000.0,2,2,1,0,0,0,0,0,0,46990.0,48233.0,49291.0,28314.0,28959.0,29547.0,2000.0,2019.0,1200.0,1100.0,1069.0,1000.0,30s,"(2, 30s)","(2, 1)",46995.03811
4,50000.0,1,2,1,0,0,0,0,0,0,8617.0,5670.0,35835.0,20940.0,19146.0,19131.0,2000.0,36681.0,10000.0,9000.0,689.0,679.0,50s,"(1, 50s)","(1, 1)",8605.584386


### One-hot encoding

Let's add a CategoricalWarrior to 1-hot encode our categorical features:

In [118]:
# Pipeline with 1-hot encoding of the categorical features
complex_pipe5 = Pipeline([
    ("age_bin", AgeBinAdder()),
    ("gender_age_bin", GenderXAgeBinAdder()),
    ("gender_marriage", GenderXMarriageAdder()),
    ("next_bill", NextBillAdder()),
    ("age_remove", ColumnDropper('age')),
    ("1_hot", CategoricalWarrior(['gender', 'education', 'marriage']))
])

In [119]:
show_pipe_result(complex_pipe5, X_raw)

Shape:  (30000, 50)


Unnamed: 0,limit_bal,pay_1,pay_2,pay_3,pay_4,pay_5,pay_6,bill_amt1,bill_amt2,bill_amt3,bill_amt4,bill_amt5,bill_amt6,pay_amt1,pay_amt2,pay_amt3,pay_amt4,pay_amt5,pay_amt6,pred_bill_amt0,gender_2,gender_1,education_2,education_1,education_3,education_4,marriage_1,marriage_2,marriage_3,age_bin_20s,age_bin_30s,age_bin_40s,age_bin_50s,age_bin_60+,"gen_ageBin_(1, '20s')","gen_ageBin_(1, '30s')","gen_ageBin_(1, '40s')","gen_ageBin_(1, '50s')","gen_ageBin_(1, '60+')","gen_ageBin_(2, '20s')","gen_ageBin_(2, '30s')","gen_ageBin_(2, '40s')","gen_ageBin_(2, '50s')","gen_ageBin_(2, '60+')","gen_mar_(1, 1)","gen_mar_(1, 2)","gen_mar_(1, 3)","gen_mar_(2, 1)","gen_mar_(2, 2)","gen_mar_(2, 3)"
0,20000.0,1,1,0,0,0,0,3913.0,3102.0,689.0,0.0,0.0,0.0,0.0,689.0,0.0,0.0,0.0,0.0,3909.683795,1,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0
1,120000.0,0,1,0,0,0,1,2682.0,1725.0,2682.0,3272.0,3455.0,3261.0,0.0,1000.0,1000.0,1000.0,0.0,2000.0,2678.148995,1,0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0
2,90000.0,0,0,0,0,0,0,29239.0,14027.0,13559.0,14331.0,14948.0,15549.0,1518.0,1500.0,1000.0,1000.0,1000.0,5000.0,29177.529785,1,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0
3,50000.0,0,0,0,0,0,0,46990.0,48233.0,49291.0,28314.0,28959.0,29547.0,2000.0,2019.0,1200.0,1100.0,1069.0,1000.0,46995.03811,1,0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0
4,50000.0,0,0,0,0,0,0,8617.0,5670.0,35835.0,20940.0,19146.0,19131.0,2000.0,36681.0,10000.0,9000.0,689.0,679.0,8605.584386,0,1,1,0,0,0,1,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0
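
The import comment earlier describes `CategoricalWarrior` as "Get_dummies to Df", so the underlying idea can be sketched with `pd.get_dummies` on a few made-up rows. This is not the transformer's actual code: pandas orders the indicator columns by sorted value, whereas the output above lists `gender_2` before `gender_1`, so the real implementation may differ in its details.

```python
import pandas as pd

# Made-up rows, for illustration only.
toy = pd.DataFrame({
    "limit_bal": [20000.0, 120000.0, 50000.0],
    "gender": [2, 2, 1],
    "marriage": [1, 2, 1],
})

# Each listed column is replaced by one indicator column per observed value;
# the untouched columns come first in the result.
encoded = pd.get_dummies(toy, columns=["gender", "marriage"])
print(list(encoded.columns))
# ['limit_bal', 'gender_1', 'gender_2', 'marriage_1', 'marriage_2']
```

Because indicator columns are created only for values seen in the data, fitting the encoding on the training set and applying it to new data requires aligning the columns afterwards.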


### One-hot encoding & Scaler

Let's add a StandardScaler to normalize our inputs:

In [120]:
# Pipeline with 1-hot encoding followed by a StandardScaler
complex_pipe6 = Pipeline([
    ("age_bin", AgeBinAdder()),
    ("gender_age_bin", GenderXAgeBinAdder()),
    ("gender_marriage", GenderXMarriageAdder()),
    ("next_bill", NextBillAdder()),
    ("age_remove", ColumnDropper('age')),
    ("1_hot", CategoricalWarrior(['gender', 'education', 'marriage'])),
    ("scaler", StandardScaler())
])

In [121]:
show_pipe_result(complex_pipe6, X_raw)

Shape:  (30000, 50)
[[-1.13672015  1.84394071  2.39995869 -0.40419908 -0.36400938 -0.33135449
  -0.33818882 -0.64250107 -0.64739923 -0.66799331 -0.67249727 -0.66305853
  -0.65272422 -0.34194162 -0.22708564 -0.29680127 -0.30806256 -0.31413612
  -0.29338206 -0.64264241  0.81016074 -0.81016074  1.06689977 -0.73837457
  -0.44275183 -0.12588573  1.09377971 -1.06647132 -0.11281222  1.31303214
  -0.74528643 -0.5002604  -0.26221495 -0.1069072  -0.38324492 -0.41253329
  -0.30719909 -0.17756815 -0.07747568  1.78424128 -0.51816884 -0.35858507
  -0.18694037 -0.0732252  -0.45737276 -0.52865999 -0.0696908   1.59446883
  -0.67608338 -0.08828139]
 [-0.3659805  -0.54231679  2.39995869 -0.40419908 -0.36400938 -0.33135449
   2.956928   -0.65921875 -0.66674657 -0.63925429 -0.62163594 -0.60622927
  -0.59796638 -0.34194162 -0.21358766 -0.24000461 -0.24422965 -0.31413612
  -0.18087821 -0.65937279  0.81016074 -0.81016074  1.06689977 -0.73837457
  -0.44275183 -0.12588573 -0.91426088  0.93767172 -0.11281222  1.

In [123]:
complex_pipelines = [complex_pipe1,
                    complex_pipe2,
                    complex_pipe3,
                    complex_pipe4,
                    complex_pipe5,
                    complex_pipe6]

# Prediction Classifiers

Let's define the different classifiers that we will be using

In [124]:
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import AdaBoostClassifier

In [125]:
dico_models = {}

In [126]:
dico_models['log'] = log_clf = LogisticRegression(C=0.1,
                                                  solver='liblinear',
                                                  penalty='l2',
                                                  class_weight='balanced',
                                                  random_state=42,
                                                  # n_jobs has no effect with solver='liblinear'
                                                  # and makes scikit-learn emit a UserWarning
                                                  n_jobs=-1)
dico_models['svm'] = SVC(gamma='auto', C=1, class_weight='balanced')
dico_models['tree'] = DecisionTreeClassifier(criterion='entropy',
                                             random_state=42,
                                             max_leaf_nodes=5,
                                             class_weight='balanced')
dico_models['forest'] = RandomForestClassifier(n_estimators=500,
                                               max_leaf_nodes=10,
                                               n_jobs=-1,
                                               class_weight='balanced',
                                               random_state=42)

# AdaBoost is used with its default hyperparameters
dico_models['adaboost'] = AdaBoostClassifier()

# Prediction Pipelines

Prediction Pipelines have the following architecture:  
  [Complex Pipeline, Classifier]

In [128]:
from collections import defaultdict

dico_pipelined_models = defaultdict(dict)

In [129]:
for i in range(len(complex_pipelines)):
    for (name, model) in dico_models.items():
        dico_pipelined_models[i][name] = Pipeline([
            ('preproc', complex_pipelines[i]),
            ('classifier', model)
        ])
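
One point worth keeping in mind about the loop above: `Pipeline` does not copy its steps, so every pipeline built from the same classifier object shares that single instance, and fitting one pipeline refits the object seen by the others. The sketch below shows the difference between sharing and using `sklearn.base.clone`. Whether `clone` also works on the custom transformers from `dataprep` depends on them following scikit-learn's estimator conventions, which is an assumption here.

```python
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

shared_model = LogisticRegression(C=0.1, solver="liblinear")

# Reusing the same object: both pipelines point at one classifier instance.
shared_a = Pipeline([("preproc", StandardScaler()), ("classifier", shared_model)])
shared_b = Pipeline([("preproc", StandardScaler()), ("classifier", shared_model)])
print(shared_a.named_steps["classifier"] is shared_b.named_steps["classifier"])  # True

# Cloning: each pipeline gets its own unfitted copy with the same hyperparameters.
cloned_a = Pipeline([("preproc", StandardScaler()), ("classifier", clone(shared_model))])
cloned_b = Pipeline([("preproc", StandardScaler()), ("classifier", clone(shared_model))])
print(cloned_a.named_steps["classifier"] is cloned_b.named_steps["classifier"])  # False
```

Sharing is harmless as long as each pipeline is evaluated right after it is fitted; cloning matters when the fitted pipelines are meant to be kept and reused later.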

In [134]:
dico_pipelined_models[0]

{'log': Pipeline(memory=None,
          steps=[('preproc',
                  Pipeline(memory=None,
                           steps=[('scaler',
                                   StandardScaler(copy=True, with_mean=True,
                                                  with_std=True))],
                           verbose=False)),
                 ('classifier',
                  LogisticRegression(C=0.1, class_weight='balanced', dual=False,
                                     fit_intercept=True, intercept_scaling=1,
                                     l1_ratio=None, max_iter=100,
                                     multi_class='warn', n_jobs=-1, penalty='l2',
                                     random_state=42, solver='liblinear',
                                     tol=0.0001, verbose=0, warm_start=False))],
          verbose=False), 'svm': Pipeline(memory=None,
          steps=[('preproc',
                  Pipeline(memory=None,
                           steps=[('scaler',
    

In [135]:
dico_pipelined_models[4]

{'log': Pipeline(memory=None,
          steps=[('preproc',
                  Pipeline(memory=None,
                           steps=[('age_bin', AgeBinAdder()),
                                  ('gender_age_bin', GenderXAgeBinAdder()),
                                  ('gender_marriage', GenderXMarriageAdder()),
                                  ('next_bill', NextBillAdder()),
                                  ('age_remove', ColumnDropper(column='age')),
                                  ('1_hot',
                                   CategoricalWarrior(attribute_names=['gender',
                                                                       'education',
                                                                       'marriage']))],
                           verbose=False)),
                 ('classifier',
                  LogisticRegression(C=0.1, class_weight='balanced', dual=False,
                                     fit_intercept=True, intercept_scaling=1,
        

# Training

In [136]:
dico_classification_reports = defaultdict(dict)
dico_confusion_matrices = defaultdict(dict)

In [144]:
for i in range(len(complex_pipelines)):
    for name in dico_models:
        pipe = dico_pipelined_models[i][name]
        # Fit on the training set; np.ravel flattens y to the 1-D shape scikit-learn expects
        pipe.fit(X_train, np.ravel(y_train))
        print('Pipe {}, {}'.format(i, name.capitalize()))
        print('--------')

        # Evaluate on the held-out test set and keep the results for later comparison
        y_pred = pipe.predict(X_test)
        dico_classification_reports[i][name] = classification_report(y_test, y_pred)
        dico_confusion_matrices[i][name] = confusion_matrix(y_test, y_pred)
        print(dico_classification_reports[i][name])

        



Pipe 0, Log
--------
              precision    recall  f1-score   support

           0       0.87      0.83      0.85      4673
           1       0.48      0.57      0.52      1327

    accuracy                           0.77      6000
   macro avg       0.68      0.70      0.69      6000
weighted avg       0.78      0.77      0.78      6000

Pipe 0, Svm
--------
              precision    recall  f1-score   support

           0       0.87      0.80      0.84      4673
           1       0.46      0.59      0.52      1327

    accuracy                           0.76      6000
   macro avg       0.67      0.70      0.68      6000
weighted avg       0.78      0.76      0.77      6000

Pipe 0, Tree
--------
              precision    recall  f1-score   support

           0       0.86      0.83      0.85      4673
           1       0.48      0.54      0.51      1327

    accuracy                           0.77      6000
   macro avg       0.67      0.69      0.68      6000
weighted a



Pipe 1, Log
--------
              precision    recall  f1-score   support

           0       0.87      0.83      0.85      4673
           1       0.48      0.57      0.52      1327

    accuracy                           0.77      6000
   macro avg       0.68      0.70      0.69      6000
weighted avg       0.78      0.77      0.78      6000

Pipe 1, Svm
--------
              precision    recall  f1-score   support

           0       0.87      0.80      0.84      4673
           1       0.46      0.59      0.52      1327

    accuracy                           0.76      6000
   macro avg       0.67      0.70      0.68      6000
weighted avg       0.78      0.76      0.77      6000

Pipe 1, Tree
--------
              precision    recall  f1-score   support

           0       0.86      0.83      0.85      4673
           1       0.48      0.54      0.51      1327

    accuracy                           0.77      6000
   macro avg       0.67      0.69      0.68      6000
weighted a

ValueError: setting an array element with a sequence.
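
The traceback is cut off, so the exact origin of this error is not visible. A plausible reading, which is an assumption rather than something the output confirms, is that pipelines 2 and 3 (`complex_pipe3` and `complex_pipe4`) still contain string and tuple columns (`age_bin`, `gen_ageBin`, `gen_mar`) that a classifier cannot convert to floats. The sketch below reproduces a similar failure on made-up data and shows one way to record it and continue with the remaining combinations; the names `candidates`, `fitted` and `failures` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Made-up data: one numeric column, and a variant with a string column left unencoded.
X_num = pd.DataFrame({"limit_bal": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
X_str = X_num.assign(age_bin=["20s", "30s", "20s", "50s", "30s", "20s"])
y = np.array([0, 0, 0, 1, 1, 1])

candidates = {"numeric_only": X_num, "with_strings": X_str}
fitted, failures = {}, {}

for name, X in candidates.items():
    pipe = Pipeline([("classifier", LogisticRegression(solver="liblinear"))])
    try:
        fitted[name] = pipe.fit(X, y)
    except ValueError as err:
        # Record the failure and move on instead of aborting the whole loop.
        failures[name] = str(err)

print(sorted(fitted), sorted(failures))
```

If that reading is right, only the pipelines that end with one-hot encoding (and, where needed, scaling) produce input that all five classifiers can consume.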

# TO DO Next