# Breast cancer prediction model - ML

Based on this dataset: https://www.kaggle.com/uciml/breast-cancer-wisconsin-data


> Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image.
> The actual linear program used to obtain the separating plane in the 3-dimensional space is that described in: [K. P. Bennett and O. L. Mangasarian: "Robust Linear Programming Discrimination of Two Linearly Inseparable Sets", Optimization Methods and Software 1, 1992, 23-34].

> This database is also available through the UW CS ftp server:
ftp ftp.cs.wisc.edu
cd math-prog/cpo-dataset/machine-learn/WDBC/

> Also can be found on UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29

# 0 - Intro




**My main objective is to build five machine learning models and one deep learning model that predict whether a tumor is benign or malignant.
I also tune the hyperparameters of each model to improve its accuracy.
Finally, I compare the accuracy of all models.**


**Table of contents**

* [1 - Exploratory data analysis](#1-Exploratory-Data-Analysis)
* [2 - Dataset preparation](#2-Dataset-Preparation)
* [3 - Create train and test datasets](#3-Create-Train-And-Test-Datasets)
* [4 - Models](#4-Models)
    - [4.1 - SVM](#4.1-SVM)
        - [4.1.1 - SVM - Build a model with default parameters](#4.1.1-Build-A-Model-With-Default-Parameters)
        - [4.1.2 - SVM - Parameters tuning](#4.1.2-Parameters-Tuning)
        - [4.1.3 - SVM - Confusion Matrix](#4.1.3-Confusion-Matrix)
        - [4.1.4 - SVM - Importance of each feature](#4.1.4-Importance-Of-Each-Feature)
        - [4.1.5 - SVM - Cross Validation](#4.1.5-Cross-Validation)
    - [4.2 - Decision Tree](#4.2-Decision-Tree)
        - [4.2.1 - Decision Tree - Build a model with default parameters](#4.2.1-Build-A-Model-With-Default-Parameters)
        - [4.2.2 - Decision Tree - Parameters tuning](#4.2.2-Parameters-Tuning)
        - [4.2.3 - Decision Tree - Confusion Matrix](#4.2.3-Confusion-Matrix)
        - [4.2.4 - Decision Tree - Importance of each feature](#4.2.4-Importance-Of-Each-Feature)
        - [4.2.5 - Decision Tree - Cross Validation](#4.2.5-Cross-Validation)
    - [4.3 - Logistic regression](#4.3-Logistic-Regression)
        - [4.3.1 - Logistic Regression - Build a model with default parameters](#4.3.1-Build-A-Model-With-Default-Parameters)
        - [4.3.2 - Logistic Regression - Parameters tuning](#4.3.2-Parameters-Tuning)
        - [4.3.3 - Logistic Regression - Confusion Matrix](#4.3.3-Confusion-Matrix)
        - [4.3.4 - Logistic Regression - Importance of each feature](#4.3.4-Importance-Of-Each-Feature)
        - [4.3.5 - Logistic Regression - Cross Validation](#4.3.5-Cross-Validation)
    - [4.4 - Random forest](#4.4-Random-Forest)
        - [4.4.1 - Random Forest - Build a model with default parameters](#4.4.1-Build-A-Model-With-Default-Parameters)
        - [4.4.2 - Random Forest - Parameters tuning](#4.4.2-Parameters-Tuning)
        - [4.4.3 - Random Forest - Confusion Matrix](#4.4.3-Confusion-Matrix)
        - [4.4.4 - Random Forest - Importance of each feature](#4.4.4-Importance-Of-Each-Feature)
        - [4.4.5 - Random Forest - Cross Validation](#4.4.5-Cross-Validation)
    - [4.5 - KNN](#4.5-KNN)
        - [4.5.1 - KNN - Build a model with default parameters](#4.5.1-Build-A-Model-With-Default-Parameters)
        - [4.5.2 - KNN - Parameters tuning](#4.5.2-Parameters-Tuning)
        - [4.5.3 - KNN - Confusion Matrix](#4.5.3-Confusion-Matrix)
        - [4.5.4 - KNN - Importance of each feature](#4.5.4-Importance-Of-Each-Feature)
        - [4.5.5 - KNN - Cross Validation](#4.5.5-Cross-Validation)
* [5 - Deep learning - TensorFlow and Keras](#5-Deep-Learning-TensorFlow-And-Keras)
    - [5.1 - Using test_split](#5.1-Using-Test-Split)
    - [5.2 - Using cross validation](#5.2-Using-Cross-Validation)
* [6 - Conclusion](#6-Conclusion)
* [7 - Appendix](#7-Appendix)

<a id="1-Exploratory-Data-Analysis"></a>
# 1 - Exploratory data analysis

 **Attribute Information:**

1) ID number

2) Diagnosis (M = malignant, B = benign) 

3-32) Ten real-valued features are computed for each cell nucleus:

* a) radius (mean of distances from center to points on the perimeter)
* b) texture (standard deviation of gray-scale values)
* c) perimeter
* d) area
* e) smoothness (local variation in radius lengths)
* f) compactness (perimeter^2 / area - 1.0)
* g) concavity (severity of concave portions of the contour)
* h) concave points (number of concave portions of the contour)
* i) symmetry
* j) fractal dimension ("coastline approximation" - 1)

The mean, standard error and "worst" or largest (mean of the three
largest values) of these features were computed for each image,
resulting in 30 features. For instance, field 3 is Mean Radius, field
13 is Radius SE, field 23 is Worst Radius.
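
The field numbering described above can be reproduced with a short sketch. The column-name pattern `<feature>_<statistic>` is an assumption based on the header of the Kaggle CSV file, and `field_number` is a hypothetical helper written only for this illustration.

```python
# Ten base features, in the order listed above (a to j)
BASE_FEATURES = ["radius", "texture", "perimeter", "area", "smoothness",
                 "compactness", "concavity", "concave points", "symmetry",
                 "fractal_dimension"]
# Three statistics per feature; each statistic occupies a block of ten fields
STATISTICS = ["mean", "se", "worst"]

def field_number(feature, statistic):
    """Return the 1-based field number (fields 1 and 2 are ID and diagnosis)."""
    block = STATISTICS.index(statistic)      # 0, 1 or 2
    offset = BASE_FEATURES.index(feature)    # 0..9
    return 3 + 10 * block + offset

# Assumed column names, e.g. "radius_mean", "radius_se", "radius_worst"
column_names = [f"{f}_{s}" for s in STATISTICS for f in BASE_FEATURES]

print(len(column_names))                  # 30 feature columns
print(field_number("radius", "mean"))     # 3
print(field_number("radius", "se"))       # 13
print(field_number("radius", "worst"))    # 23
```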

All feature values are recoded with four significant digits.

Missing attribute values: none

Class distribution: 357 benign, 212 malignant

https://www.kaggle.com/uciml/breast-cancer-wisconsin-data

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split,cross_val_score,GridSearchCV
from sklearn import neighbors
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import r2_score, confusion_matrix, accuracy_score, ConfusionMatrixDisplay # used to check the accuracy of the fitted models
from sklearn.svm import SVC # SVM classifier
#from sklearn.linear_model import LinearRegression
#import tensorflow
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.wrappers.scikit_learn import KerasClassifier

Necessary functions

In [None]:
def DisplayConfusionMatrix(_classifier, _title):
    # Plot the confusion matrix of a fitted classifier on the held-out test set (x_test, y_test).
    # ConfusionMatrixDisplay.from_estimator replaces plot_confusion_matrix, which was removed in scikit-learn 1.2.
    disp = ConfusionMatrixDisplay.from_estimator(_classifier, x_test, y_test,
                                 display_labels = ['Benign','Malignant'],
                                 cmap = plt.cm.Blues,
                                 normalize = None)
    disp.ax_.set_title(_title)

    #print(_title)
    #print(disp.confusion_matrix)

#https://medium.com/analytics-vidhya/evaluating-a-random-forest-model-9d165595ad56

def DisplayConfusionMatrix_2(_y_pred, _title):
    matrix = confusion_matrix(y_test, _y_pred)
    #matrix = matrix.astype('float') / matrix.sum(axis=1)[:, np.newaxis]  # if we want %

    # Build the plot
    plt.figure(figsize=(7,7))
    sns.set(font_scale=1.4)
    sns.heatmap(matrix, annot=True, fmt='d', annot_kws={'size':10},
            cmap=plt.cm.Blues, linewidths=0.2)

    # Add labels to the plot
    class_names = ['Benign','Malignant']
    # Heatmap cells are centered at 0.5, 1.5, ..., so both axes use the shifted tick positions
    tick_marks = np.arange(len(class_names)) + 0.5
    plt.xticks(tick_marks, class_names, rotation=25)
    plt.yticks(tick_marks, class_names, rotation=0)
    plt.xlabel('Predicted label')
    plt.ylabel('True label')
    plt.title(_title)
    plt.show()
    
def FeatureImportance (_model):
    fi = pd.DataFrame({'feature': list(x_train.columns),
                       'importance': _model.feature_importances_}).sort_values('importance', ascending = False)
    return fi

In [None]:
# Load the dataset as a DataFrame
ds = pd.read_csv('/kaggle/input/breast-cancer-wisconsin-data/data.csv')

In [None]:
ds.shape

The dataset contains 569 records and 33 columns.
Each record corresponds to one digitized FNA image of a breast mass.

In [None]:
ds.info()

In [None]:
ds.describe().T

In [None]:
ds.head()

In [None]:
ds.tail()

The "diagnosis" column is reasonably balanced: 357 benign versus 212 malignant records (about 63% and 37%).

In [None]:
sns.countplot(data=ds, x='diagnosis')
plt.title('Count by diagnosis')
plt.show()
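
To put the class balance in numbers, the sketch below uses only the class counts quoted in the attribute information (357 benign, 212 malignant). It also shows the accuracy of a trivial majority-class predictor, which is a useful lower bound when reading the model scores later on. The variable names are illustrative.

```python
# Class counts quoted in the attribute information above
counts = {"benign": 357, "malignant": 212}
total = sum(counts.values())

# Share of each class in the dataset
shares = {label: n / total for label, n in counts.items()}

# Accuracy of a trivial classifier that always predicts the majority class
baseline_accuracy = max(shares.values())

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
print(f"majority-class baseline accuracy: {baseline_accuracy:.1%}")
```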

<a id="2-Dataset-Preparation"></a>
# 2 - Dataset preparation

In [None]:
# Transform the class column (diagnosis) into an int64
# 0 = Benign
# 1 = Malignant

#ds.diagnosis = ds.diagnosis == 'M'
#ds.diagnosis = ds.diagnosis.astype('int')
# OR
ds['diagnosis'] = ds['diagnosis'].map({'M':1,'B':0})

In [None]:
# Drop columns "id" and "Unnamed: 32"
ds.drop(['id','Unnamed: 32'], axis= 1, inplace=True)

<a id="3-Create-Train-And-Test-Datasets"></a>
# 3 - Create train and test datasets

In [None]:
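# Create separate objects for the features and the target
# Note: after dropping "id" and "Unnamed: 32", column 0 is "diagnosis" and the slice 1:-1
# also leaves out the last feature column, so x holds 29 of the 30 features
# (the neural networks in section 5 rely on this with input_dim = 29)
x = ds.iloc[:, 1:-1]
y = ds.iloc[:, 0]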

In [None]:
# Correlation between all features
f,ax = plt.subplots(figsize=(18, 18))
sns.heatmap(x.corr(), annot=True, linewidths=.5, fmt= '.1f',ax=ax)

# As the heatmap shows, many features are strongly correlated. I decided to keep all features in x for now.
# To be done: eliminate redundant features
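
One possible way to approach the "to be done" item is to drop one feature from every strongly correlated pair. The sketch below is only an illustration on toy data: `drop_correlated` is a hypothetical helper, the 0.95 threshold is an arbitrary assumption, and the toy columns merely borrow names from the dataset.

```python
import numpy as np
import pandas as pd

def drop_correlated(features, threshold=0.95):
    """Drop one feature from every pair whose absolute correlation exceeds threshold."""
    corr = features.corr().abs()
    # Keep only the upper triangle so each pair is inspected once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return features.drop(columns=to_drop), to_drop

# Toy data: "perimeter" is almost a multiple of "radius"; "texture" is independent
rng = np.random.default_rng(0)
radius = rng.normal(14, 3, size=200)
toy = pd.DataFrame({
    "radius": radius,
    "perimeter": 2 * np.pi * radius + rng.normal(0, 0.1, size=200),
    "texture": rng.normal(19, 4, size=200),
})

reduced, dropped = drop_correlated(toy)
print(dropped)                  # columns removed as redundant
print(list(reduced.columns))    # columns that remain
```

Applied to the notebook, the same helper could be called on `x` before the train/test split, but the neural networks would then need a matching `input_dim`.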

* Let's split the dataset:

    1. 80% for training the model
    2. 20% for testing the model

In [None]:
# Create train and test datasets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = 42, stratify = y)
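
The effect of `stratify = y` can be checked in isolation. The sketch below uses synthetic labels with the same class counts as the dataset (an assumption made so that the example runs without the CSV file) and compares the malignant share in each subset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels with the same class counts as the dataset (0 = benign, 1 = malignant)
y_demo = np.array([0] * 357 + [1] * 212)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)   # placeholder feature column

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo)

print(len(y_tr), len(y_te))                       # sizes of the two subsets
print(f"malignant share overall:  {y_demo.mean():.3f}")
print(f"malignant share in train: {y_tr.mean():.3f}")
print(f"malignant share in test:  {y_te.mean():.3f}")
```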

<a id="4-Models"></a>
# 4 - Models

<a id="4.1-SVM"></a>
# 4.1 - SVM

<a id="4.1.1-Build-A-Model-With-Default-Parameters"></a>
# 4.1.1 - SVM - Build a model with default parameters

In [None]:
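# Note: kernel = 'linear' is set explicitly (the SVC default is 'rbf'); all other parameters keep their defaults
modelSVM = SVC(kernel = 'linear')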
modelSVM.fit(x_train, y_train)
labels_svm = modelSVM.predict(x_test)
score_svm = modelSVM.score(x_test, y_test)
print("Score (SVM): %f" % score_svm)
conf_mx_svm = confusion_matrix(y_test, labels_svm)
scores = [score_svm]

<a id="4.1.2-Parameters-Tuning"></a>
# 4.1.2 - SVM - Parameters tuning

In [None]:
params_svm = {'kernel' : ['linear', 'rbf'],   #not used: poly, precomputed and sigmoid
              'gamma' : ['scale', 'auto']}
grid_search_svm = GridSearchCV(estimator = modelSVM,
                           param_grid = params_svm,
                           scoring = 'accuracy',
                           cv = 5)
grid_search_svm = grid_search_svm.fit(x, y)
print(f'The best parameters for SVM are: {grid_search_svm.best_params_} (mean cross-validated accuracy: {grid_search_svm.best_score_ * 100:.2f} %)')
scores.append(grid_search_svm.best_score_)
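
The grid search above is fitted on the full dataset (`x`, `y`), so its best score is a cross-validated estimate and is not directly comparable with a score on `x_test`. A variant that tunes on the training split only and then scores on the held-out split could look like the sketch below. It rests on two assumptions that are not part of this notebook: it loads scikit-learn's bundled copy of the dataset (where the labels are encoded the other way round, 0 = malignant and 1 = benign), and it adds a `StandardScaler` step, which mainly keeps the linear kernel fast.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: scikit-learn's bundled copy of the dataset (there 0 = malignant, 1 = benign)
X_all, y_all = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.2, random_state=42, stratify=y_all)

# Scaling lives inside the pipeline, so each CV fold is scaled using its own training part only
pipe = make_pipeline(StandardScaler(), SVC())
param_grid = {"svc__kernel": ["linear", "rbf"],
              "svc__gamma": ["scale", "auto"]}

search = GridSearchCV(pipe, param_grid, scoring="accuracy", cv=5)
search.fit(X_tr, y_tr)                 # tuning sees the training split only

print(search.best_params_)
print(f"mean CV accuracy: {search.best_score_:.3f}")
print(f"held-out test accuracy: {search.score(X_te, y_te):.3f}")
```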

<a id="4.1.3-Confusion-Matrix"></a>
# 4.1.3 - SVM - Confusion Matrix

In [None]:
DisplayConfusionMatrix(modelSVM,"Confusion matrix for SVM model")
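
Accuracy alone hides which kind of error a model makes, and for a diagnosis task a missed malignant tumor is usually the more costly one. The sketch below derives sensitivity, specificity and precision from a confusion matrix in scikit-learn's layout. The counts are hypothetical and are not results from this notebook.

```python
import numpy as np

# Hypothetical confusion matrix in scikit-learn's layout:
# rows = true class, columns = predicted class, class order [0 = benign, 1 = malignant]
cm = np.array([[70, 2],
               [3, 39]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
sensitivity = tp / (tp + fn)      # recall for the malignant class
specificity = tn / (tn + fp)      # recall for the benign class
precision = tp / (tp + fp)        # share of predicted malignant cases that are malignant

print(f"accuracy={accuracy:.3f} sensitivity={sensitivity:.3f} "
      f"specificity={specificity:.3f} precision={precision:.3f}")
```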

<a id="4.1.4-Importance-Of-Each-Feature"></a>
# 4.1.4 - SVM - Importance of each feature

In [None]:
#To be implemented

#FeatureImportance(modelSVM)
#def f_importances(coef, names):
#    imp = coef
#    imp,names = zip(*sorted(zip(imp,names)))
#    plt.barh(range(len(names)), imp, align='center')
#    plt.yticks(range(len(names)), names)
#    plt.show()

#features_names
#f_importances(modelSVM.coef_, features_names)
#modelSVM.coef_
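
Until this section is implemented, one common option for a linear-kernel SVM is to rank features by the absolute value of the entries in `coef_`. The sketch below shows the idea on synthetic data in which only the first feature carries signal; the data and the feature names are assumptions for illustration. The magnitudes are only comparable when the features share a similar scale, which is not the case for the raw columns of this dataset.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic data: only the first of three features determines the class
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = (X_demo[:, 0] > 0).astype(int)
feature_names = ["informative", "noise_1", "noise_2"]   # hypothetical names

svm_demo = SVC(kernel="linear").fit(X_demo, y_demo)

# coef_ is only available for the linear kernel; its shape is (1, n_features) for two classes
importance = np.abs(svm_demo.coef_[0])
ranking = sorted(zip(feature_names, importance), key=lambda pair: pair[1], reverse=True)
for name, weight in ranking:
    print(f"{name}: {weight:.3f}")
```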

<a id="4.1.5-Cross-Validation"></a>
# 4.1.5 - SVM - Cross Validation

In [None]:
#Cross validation for SVM with default hyper parameters
svm_cv_default = cross_val_score(estimator = modelSVM,
                             X = x, y = y,
                             cv = 10, scoring = 'accuracy')
score_svm_default_cv = svm_cv_default.mean()
print("Score (SVM default CV): %f" % score_svm_default_cv)
scores_cv = [score_svm_default_cv]
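
For a classifier and an integer `cv`, `cross_val_score` uses stratified folds, so each of the 10 validation folds keeps roughly the class ratio of the full dataset. The sketch below makes that visible with `StratifiedKFold` on synthetic labels that have the dataset's class counts (an assumption so that it runs without the CSV file).

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic labels with the dataset's class counts (0 = benign, 1 = malignant)
y_demo = np.array([0] * 357 + [1] * 212)
X_demo = np.zeros((len(y_demo), 1))      # placeholder features; only the labels matter here

folds = StratifiedKFold(n_splits=10)
fold_sizes = []
fold_shares = []
for train_idx, test_idx in folds.split(X_demo, y_demo):
    fold_sizes.append(len(test_idx))
    fold_shares.append(y_demo[test_idx].mean())   # malignant share in this validation fold

print(fold_sizes)
print([round(share, 3) for share in fold_shares])
```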

<a id="4.2-Decision-Tree"></a>
# 4.2 - Decision Tree

<a id="4.2.1-Build-A-Model-With-Default-Parameters"></a>
# 4.2.1 - Decision Tree - Build a model with default parameters

In [None]:
tree = DecisionTreeClassifier()
tree.fit(x_train, y_train)
labels_tree = tree.predict(x_test)
score_tree = tree.score(x_test, y_test)
print("Score (Decision tree): %f" % score_tree)
conf_mx_tree = confusion_matrix(y_test, labels_tree)
#accuracy = accuracy_score(y_test, labels_tree)
#print ("Accuracy using the decision tree:" , accuracy , "\nAs a percentage: ", round(accuracy*100) , "%\n")
scores.append(score_tree)

<a id="4.2.2-Parameters-Tuning"></a>
# 4.2.2 - Decision Tree - Parameters tuning

In [None]:
params_tree = {'criterion' : ['gini', 'entropy'],
               'max_depth' : range(1,10)}
grid_search_tree = GridSearchCV(estimator = tree,
                           param_grid = params_tree,
                           scoring = 'accuracy',
                           cv = 5)
grid_search_tree = grid_search_tree.fit(x, y)
print(f'The best parameters for Decision Tree are: {grid_search_tree.best_params_} (mean cross-validated accuracy: {grid_search_tree.best_score_ * 100:.2f} %)')
scores.append(grid_search_tree.best_score_)

<a id="4.2.3-Confusion-Matrix"></a>
# 4.2.3 - Decision Tree - Confusion Matrix

In [None]:
DisplayConfusionMatrix(tree,"Confusion matrix for Decision Tree model")

<a id="4.2.4-Importance-Of-Each-Feature"></a>
# 4.2.4 - Decision Tree - Importance of each feature

In [None]:
FeatureImportance(tree)

<a id="4.2.5-Cross-Validation"></a>
# 4.2.5 - Decision Tree - Cross Validation

In [None]:
#Cross validation for Decision Tree with default hyper parameters
tree_cv_default = cross_val_score(estimator = tree,
                             X = x, y = y,
                             cv = 10, scoring = 'accuracy')
score_tree_default_cv = tree_cv_default.mean()
print("Score (Decision Tree default CV): %f" % score_tree_default_cv)
scores_cv.append(score_tree_default_cv)

<a id="4.3-Logistic-Regression"></a>
# 4.3 - Logistic regression

<a id="4.3.1-Build-A-Model-With-Default-Parameters"></a>
# 4.3.1 - Logistic Regression - Build a model with default parameters

In [None]:
logreg = LogisticRegression(max_iter=3000)
logreg.fit(x_train, y_train)
labels_logreg = logreg.predict(x_test)
conf_mx_logreg = confusion_matrix(y_test, labels_logreg)
score_lr = logreg.score(x_test, y_test)
print("Score (Logistic Regression): %f" % score_lr)
scores.append(score_lr)

<a id="4.3.2-Parameters-Tuning"></a>
# 4.3.2 - Logistic Regression - Parameters tuning

In [None]:
params_logreg = {"solver":[ 'newton-cg', 'liblinear', 'sag', 'saga'],  #not used: 'lbfgs'
                 "max_iter" : [10000]}
grid_search_logreg = GridSearchCV(estimator = logreg,
                           param_grid = params_logreg,
                           scoring = 'accuracy',
                           cv = 5)
grid_search_logreg = grid_search_logreg.fit(x, y)
scores.append(grid_search_logreg.best_score_)
print(f'The best parameters for Logistic Regression are: {grid_search_logreg.best_params_} (mean cross-validated accuracy: {grid_search_logreg.best_score_ * 100:.2f} %)')

<a id="4.3.3-Confusion-Matrix"></a>
# 4.3.3 - Logistic Regression - Confusion Matrix

In [None]:
DisplayConfusionMatrix(logreg,"Confusion matrix for Logistic Regression model")

<a id="4.3.4-Importance-Of-Each-Feature"></a>
# 4.3.4 - Logistic Regression - Importance of each feature

In [None]:
#To be implemented
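
One way this could be implemented is to fit the logistic regression on standardized features and read the coefficients as log-odds changes per standard deviation. The sketch below uses synthetic data where only the first feature is informative; the data, the feature names and the use of `StandardScaler` are assumptions for illustration, not part of this notebook.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 3)) * [1.0, 10.0, 100.0]   # features on very different scales
# The class depends on the first feature only, with some randomness in the labels
prob = 1 / (1 + np.exp(-3 * X_demo[:, 0]))
y_demo = (rng.random(300) < prob).astype(int)
feature_names = ["informative", "noise_1", "noise_2"]      # hypothetical names

# Standardizing first makes the coefficient magnitudes comparable across features
X_std = StandardScaler().fit_transform(X_demo)
logreg_demo = LogisticRegression(max_iter=3000).fit(X_std, y_demo)

coefs = logreg_demo.coef_[0]
odds_ratios = np.exp(coefs)             # multiplicative change in odds per standard deviation
order = np.argsort(-np.abs(coefs))      # most influential feature first
for i in order:
    print(f"{feature_names[i]}: coef={coefs[i]:+.3f}, odds ratio={odds_ratios[i]:.2f}")
```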

<a id="4.3.5-Cross-Validation"></a>
# 4.3.5 - Logistic Regression - Cross Validation

In [None]:
#Cross validation for Logistic Regression with default hyper parameters
logreg_cv_default = cross_val_score(estimator = logreg,
                             X = x, y = y,
                             cv = 10, scoring = 'accuracy')
score_logreg_default_cv = logreg_cv_default.mean()
print("Score (Logistic Regression default CV): %f" % score_logreg_default_cv)
scores_cv.append(score_logreg_default_cv)

<a id="4.4-Random-Forest"></a>
# 4.4 - Random forest

<a id="4.4.1-Build-A-Model-With-Default-Parameters"></a>
# 4.4.1 - Random Forest - Build a model with default parameters

In [None]:
forest = RandomForestClassifier()
forest.fit(x_train, y_train)
labels_rf = forest.predict(x_test)
conf_mx_rf = confusion_matrix(y_test, labels_rf)
score_rf = forest.score(x_test, y_test)
print("Score (Random forest): %f" % score_rf)
scores.append(score_rf)

<a id="4.4.2-Parameters-Tuning"></a>
# 4.4.2 - Random Forest - Parameters tuning

In [None]:
params_rf = { 
    'n_estimators': [100, 200, 500],
    'max_features': ['sqrt', 'log2'],   # 'auto' was an alias of 'sqrt' for classifiers and is rejected by recent scikit-learn versions
  #  'max_depth' : [None, 4,5,6,7,8],
    'criterion' : ['gini', 'entropy']
}

#params_rf = {"criterion":['gini'], "n_estimators" : range(60,110)}

grid_search_rf = GridSearchCV(estimator = forest,
                           param_grid = params_rf,
                           scoring = 'accuracy',
                           cv = 5)
grid_search_rf = grid_search_rf.fit(x, y)
scores.append(grid_search_rf.best_score_)
print(f'The best parameters for Random Forest are: {grid_search_rf.best_params_} (mean cross-validated accuracy: {grid_search_rf.best_score_ * 100:.2f} %)')

<a id="4.4.3-Confusion-Matrix"></a>
# 4.4.3 - Random Forest - Confusion Matrix

In [None]:
DisplayConfusionMatrix_2(labels_rf, 'Confusion Matrix for Random Forest Model')

<a id="4.4.4-Importance-Of-Each-Feature"></a>
# 4.4.4 - Random Forest - Importance of each feature

In [None]:
FeatureImportance(forest)

<a id="4.4.5-Cross-Validation"></a>
# 4.4.5 - Random Forest - Cross Validation

In [None]:
#Cross validation for Random Forest with default hyper parameters
rf_cv_default = cross_val_score(estimator = forest,
                             X = x, y = y,
                             cv = 10, scoring = 'accuracy')
score_rf_default_cv = rf_cv_default.mean()
print("Score (Random Forest default CV): %f" % score_rf_default_cv)
scores_cv.append(score_rf_default_cv)

<a id="4.5-KNN"></a>
# 4.5 - KNN

<a id="4.5.1-Build-A-Model-With-Default-Parameters"></a>
# 4.5.1 - KNN - Build a model with default parameters

In [None]:
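# Note: n_neighbors = 4 is set explicitly (the scikit-learn default is 5); all other parameters keep their defaults
knn = neighbors.KNeighborsClassifier(n_neighbors = 4)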
knn.fit(x_train, y_train)
labels_knn = knn.predict(x_test)
score_knn = knn.score(x_test, y_test)
print("Score (KNN): %f" % score_knn)
conf_mx_knn = confusion_matrix(y_test, labels_knn)
scores.append(score_knn)
#print("R2 Score %f " % r2_score(y_test, labels_knn))
#knn.score(x_test, y_test) , np.mean(labels_knn == y_test), (labels_knn == y_test).sum() / len(x_test), "R2 Score %f " % r2_score(y_test, labels_knn)

<a id="4.5.2-Parameters-Tuning"></a>
# 4.5.2 - KNN - Parameters tuning

In [None]:
params_knn = {'n_neighbors': [5,7,9,11,13,15,17,19,21],
              'algorithm' : ['auto', 'ball_tree', 'kd_tree', 'brute'],
              'weights' : ['uniform', 'distance']}
grid_search_knn = GridSearchCV(estimator = knn,
                           param_grid = params_knn,
                           scoring = 'accuracy',
                           cv = 5)
grid_search_knn = grid_search_knn.fit(x, y)
print(f'The best parameters for KNN are: {grid_search_knn.best_params_} (mean cross-validated accuracy: {grid_search_knn.best_score_ * 100:.2f} %)')
scores.append(grid_search_knn.best_score_)

<a id="4.5.3-Confusion-Matrix"></a>
# 4.5.3 - KNN - Confusion Matrix

In [None]:
DisplayConfusionMatrix(knn,"Confusion matrix for KNN model")

<a id="4.5.4-Importance-Of-Each-Feature"></a>
# 4.5.4 - KNN - Importance of each feature

In [None]:
#To be implemented
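
KNN exposes neither coefficients nor `feature_importances_`, so a model-agnostic method is needed here. One candidate is permutation importance from `sklearn.inspection`, which measures how much the score drops when a single feature column is shuffled. The sketch below runs it on synthetic data where only the first feature matters; the data and the feature names are assumptions for illustration.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data: only the first of three features determines the class
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 3))
y_demo = (X_demo[:, 0] > 0).astype(int)
feature_names = ["informative", "noise_1", "noise_2"]   # hypothetical names

# Fit on the first 200 rows, measure importance on the remaining 100
X_fit, y_fit = X_demo[:200], y_demo[:200]
X_val, y_val = X_demo[200:], y_demo[200:]
knn_demo = KNeighborsClassifier(n_neighbors=5).fit(X_fit, y_fit)

result = permutation_importance(knn_demo, X_val, y_val,
                                scoring="accuracy", n_repeats=10, random_state=0)

# importances_mean holds the average accuracy drop per shuffled feature
for name, drop in zip(feature_names, result.importances_mean):
    print(f"{name}: {drop:.3f}")
```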

<a id="4.5.5-Cross-Validation"></a>
# 4.5.5 - KNN - Cross Validation

In [None]:
#Cross validation for KNN with default hyper parameters
knn_cv_default = cross_val_score(estimator = knn,
                             X = x, y = y,
                             cv = 10, scoring = 'accuracy')
score_knn_default_cv = knn_cv_default.mean()
print("Score (KNN default CV): %f" % score_knn_default_cv)
scores_cv.append(score_knn_default_cv)

<a id="5-Deep-Learning-TensorFlow-And-Keras"></a>
# 5 - Deep learning - TensorFlow and Keras

"Deep Learning com Python de A a Z - O Curso Completo" - https://www.udemy.com/course/deep-learning-com-python-az-curso-completo/
Udemy course from https://iaexpert.academy/



TBD - Parameter tuning

<a id="5.1-Using-Test-Split"></a>
# 5.1 - Using test_split

In [None]:
classifier_split = Sequential()
#classifier_split.add(Dense(units = 16, activation = 'relu', kernel_initializer = 'random_uniform', input_dim = 29))
classifier_split.add(Dense(units = 8, activation = 'relu', kernel_initializer = 'normal', input_dim = 29))
#classifier_split.add(Dense(units = 16, activation = 'relu', kernel_initializer = 'random_uniform'))
classifier_split.add(Dense(units = 8, activation = 'relu', kernel_initializer = 'normal'))
classifier_split.add(Dense(units = 1, activation = 'sigmoid'))

otimizador = keras.optimizers.Adam(lr = 0.001, decay = 0.0001, clipvalue = 0.5)
classifier_split.compile(optimizer = otimizador, loss = 'binary_crossentropy',
                      metrics = ['binary_accuracy'])

# Fit model
classifier_split.fit(x_train, y_train,
                  batch_size = 10, epochs = 100, verbose = 0)
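# Predict: the sigmoid output is a probability, thresholded at 0.5 and flattened to a 1-D array of 0/1 labels
labels_rn_split = classifier_split.predict(x_test)
labels_rn_split = (labels_rn_split > 0.5).astype(int).ravel()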
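
The thresholding step can be illustrated without a trained network. The sketch below uses a hypothetical array of sigmoid outputs with shape `(n_samples, 1)`, which is the shape `predict` returns for a single output unit, and turns it into a flat array of class labels.

```python
import numpy as np

# Hypothetical sigmoid outputs, shaped (n_samples, 1) like the result of predict()
probabilities = np.array([[0.02], [0.97], [0.50], [0.51], [0.30]])

# Strictly greater than 0.5 means "malignant" (1); exactly 0.5 falls to "benign" (0)
labels = (probabilities > 0.5).astype(int).ravel()

print(labels)          # 1-D array of 0/1 labels
print(labels.shape)
```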

In [None]:
DisplayConfusionMatrix_2(labels_rn_split, 'Confusion Matrix for Neural Network - Using split')

In [None]:
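# accuracy_score returns the fraction of correct predictions (accuracy, not precision)
accuracy_nn_split = accuracy_score(y_test, labels_rn_split)
print(accuracy_nn_split)
scores.append(accuracy_nn_split)
# evaluate returns [loss, binary_accuracy] on the test set
resultado = classifier_split.evaluate(x_test, y_test)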

<a id="5.2-Using-Cross-Validation"></a>
# 5.2 - Using cross validation

In [None]:
# previsores = predictors (features), classe = class (target)
previsores = x
classe = y

def createNeuralNetwork():
    classifier_cv = Sequential()
    classifier_cv.add(Dense(units = 8, activation = 'relu', kernel_initializer = 'normal', input_dim = 29))
    #classifier_cv.add(Dense(units = 16, activation = 'relu', kernel_initializer = 'random_uniform', input_dim = 29))
    classifier_cv.add(Dropout(0.2))
    classifier_cv.add(Dense(units = 8, activation = 'relu', kernel_initializer = 'normal'))
    #classifier_cv.add(Dense(units = 16, activation = 'relu', kernel_initializer = 'random_uniform'))
    classifier_cv.add(Dropout(0.2))
    classifier_cv.add(Dense(units = 1, activation = 'sigmoid'))
    otimizador = keras.optimizers.Adam(lr = 0.001, decay = 0.0001, clipvalue = 0.5)
    classifier_cv.compile(optimizer = otimizador, loss = 'binary_crossentropy',
                      metrics = ['binary_accuracy'])
    return classifier_cv

In [None]:
classifier_cv = KerasClassifier(build_fn = createNeuralNetwork,
                                epochs = 100,
                                batch_size = 10, verbose = 0)
labels_rn_cv = cross_val_score(estimator = classifier_cv,
                             X = previsores, y = classe,
                             cv = 10, scoring = 'accuracy')

In [None]:
mean = labels_rn_cv.mean()
scores.append(mean)
scores_cv.append(mean)
stddev = labels_rn_cv.std()
print(mean)
print(stddev)

<a id="6-Conclusion"></a>
# 6 - Conclusion

In [None]:
models = ['SVM', 'SVM tuned','Decision tree','Decision tree tuned','Logistic regression','Logistic regression tuned', 'Random forest','Random forest tuned','KNN','KNN tuned', 'Neural network using split', 'Neural network using cross validation']
df_scores = pd.DataFrame({'Model': models,
                       'Score': scores}).sort_values(['Score', 'Model'],ascending = [False, True])
df_scores

In [None]:
#models_cv = ['SVM CV', 'SVM tuned CV','Decision tree CV','Decision tree tuned CV','Logistic regression CV','Logistic regression tuned CV', 'Random forest CV','Random forest tuned CV','KNN CV','KNN tuned CV', 'Neural network using CV']
models_cv = ['SVM CV','Decision tree CV','Logistic regression CV','Random forest CV','KNN CV','Neural network using CV']
df_scores_cv = pd.DataFrame({'Model CV': models_cv,
                       'Score CV': scores_cv}).sort_values(['Score CV', 'Model CV'],ascending = [False, True])
df_scores_cv

In [None]:
best_model = df_scores.iloc[0]
#type(best_model.Score)
print(f'The best-scoring entry is "{best_model.Model}", with an accuracy of {best_model.Score * 100:.2f} %')

<a id="7-Appendix"></a>
# 7 - Appendix

In [None]:
# Graphical representation for Decision tree
from sklearn.tree import export_graphviz
import graphviz

export_graphviz(tree, out_file="mytree.dot")
with open("mytree.dot") as f:
    dot_graph = f.read()
graphviz.Source(dot_graph)
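
If Graphviz is not installed, a plain-text view of a fitted tree is available through `sklearn.tree.export_text`. The sketch below fits a small tree on synthetic data so that it runs on its own; the data and the feature names are assumptions for illustration. In the notebook the same call could be made on `tree` with `list(x_train.columns)` as feature names.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic data: the class is decided by the sign of the first feature
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 2))
y_demo = (X_demo[:, 0] > 0).astype(int)

tree_demo = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_demo, y_demo)

# export_text renders the decision rules as indented text, one line per node
rules = export_text(tree_demo, feature_names=["informative", "noise"])
print(rules)
```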