The notebook contains examples of how to use SciKit Learn Neural Net MLPClassifier

The dataset is the Breast Cancer dataset from scikit-learn and is based on the content from the book:

[Introduction to Machine Learning with Python](https://www.amazon.com/Introduction-Machine-Learning-Python-Scientists/dp/1449369413/ref=sr_1_1?ie=UTF8&qid=1519586427&sr=8-1&keywords=introduction+to+machine+learning+with+python&dpID=51ZPksI0E9L&preST=_SX218_BO1,204,203,200_QL40_&dpSrc=srch)

by Andreas Muller and Sarah Guido.

The notebook covers the **Neural Networks (Deep Learning)** section starting on page 106


There are two types of Neural Networks models:

*MLPClassifier*

*MLPRegressor*


Neural Networks require that the features be scaled between -1 and 1 (mean of zero, and a standard deviation of 1 ) which can be accomplished by using the StandardScalar class.


The SciKit implementations are not as robust as others, *keras*, *lasagna*, *tensor flow*.



In [39]:
from sklearn.datasets import load_breast_cancer
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score
from sklearn.metrics import classification_report,confusion_matrix

In [2]:
cancer = load_breast_cancer()

In [5]:
# Print the maximum value for each feature
important_features = pd.DataFrame(list(zip(cancer.feature_names, cancer.data.max(axis=0))), columns=['Feature', 'Max Value']).sort_values('Max Value', ascending=False)
important_features.head(100)

Unnamed: 0,Feature,Max Value
23,worst area,4254.0
3,mean area,2501.0
13,area error,542.2
22,worst perimeter,251.2
2,mean perimeter,188.5
21,worst texture,49.54
1,mean texture,39.28
20,worst radius,36.04
0,mean radius,28.11
12,perimeter error,21.98


You can see from the table above, that the values are not scaled.  We will do that as part of the data preparation.

In [8]:
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, random_state=0)

In [9]:
mlp = MLPClassifier(random_state=42)
mlp.fit(X_train, y_train)

MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(100,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=42, shuffle=True,
       solver='adam', tol=0.0001, validation_fraction=0.1, verbose=False,
       warm_start=False)

In [11]:
training_score = mlp.score(X_train, y_train)
testing_score = mlp.score(X_test, y_test)
print(f"Training Accuracy Score: {training_score}")
print(f"Testing Accuracy Score: {testing_score}")

Training Accuracy Score: 0.9061032863849765
Testing Accuracy Score: 0.8811188811188811


Without scaling the values, the model actually performs poorly.  Neural Networks expect all features to be scaled the same way. i.e. zero mean with a standard deviation of 1.

In [24]:
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.fit_transform(X_test)

In [25]:
X_train[0][0]

11.85

In [28]:
X_train_scaled[0][0]

-0.6507990659615951

In [30]:
mlp = MLPClassifier(random_state=42)
mlp.fit(X_train_scaled, y_train)



MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(100,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=42, shuffle=True,
       solver='adam', tol=0.0001, validation_fraction=0.1, verbose=False,
       warm_start=False)

In [31]:
training_score = mlp.score(X_train_scaled, y_train)
testing_score = mlp.score(X_test_scaled, y_test)
print(f"Training Accuracy Score: {training_score}")
print(f"Testing Accuracy Score: {testing_score}")

Training Accuracy Score: 0.9929577464788732
Testing Accuracy Score: 0.958041958041958


Now the accuracy is much better, but we received an error because we need to increase the maximum interations.

In [32]:
mlp = MLPClassifier(max_iter=1000, random_state=42)
mlp.fit(X_train_scaled, y_train)

MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(100,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=1000, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=42, shuffle=True,
       solver='adam', tol=0.0001, validation_fraction=0.1, verbose=False,
       warm_start=False)

In [33]:
training_score = mlp.score(X_train_scaled, y_train)
testing_score = mlp.score(X_test_scaled, y_test)
print(f"Training Accuracy Score: {training_score}")
print(f"Testing Accuracy Score: {testing_score}")

Training Accuracy Score: 0.9953051643192489
Testing Accuracy Score: 0.958041958041958


Notice that the warning went away.  In the default settings for MLPClassifier, notice that alpha=0.0001.  We can try to decrease the model complexity to get better generalization performance.  If we increase alpha, that will add stronger regularization of the weights with the goal of better testing accuracy.

In [34]:
mlp = MLPClassifier(max_iter=1000, alpha=1, random_state=42)
mlp.fit(X_train_scaled, y_train)

MLPClassifier(activation='relu', alpha=1, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(100,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=1000, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=42, shuffle=True,
       solver='adam', tol=0.0001, validation_fraction=0.1, verbose=False,
       warm_start=False)

In [35]:
training_score = mlp.score(X_train_scaled, y_train)
testing_score = mlp.score(X_test_scaled, y_test)
print(f"Training Accuracy Score: {training_score}")
print(f"Testing Accuracy Score: {testing_score}")

Training Accuracy Score: 0.9882629107981221
Testing Accuracy Score: 0.972027972027972


Increasing the regularization, increasing alpha, has produced the best result so far..

In [38]:
mlp = MLPClassifier(max_iter=1000, alpha=1, random_state=42)
scaler = StandardScaler()
scores = cross_val_score(mlp, scaler.fit_transform(cancer.data), cancer.target)
print(scores)
print(scores.mean())

[0.98421053 0.96315789 0.97883598]
0.9754014666295369


In [42]:
mlp = MLPClassifier(max_iter=1000, alpha=1, random_state=42)
mlp.fit(X_train_scaled, y_train)
predictions = mlp.predict(X_test_scaled)

In [44]:
print(confusion_matrix(y_test,predictions))

[[51  2]
 [ 2 88]]


The confusion matrix tells us that:
51 True negative predictions where the patient did not have cancer.
2 False negatives predictions, where the model false predicted the patient did not have cancer when they did
2 False positive predictions, where the model falsely predicted the patient had cancer when they did not
88 True Positive predictions, where the model accurately predicted the patient had cancer. 

In [45]:
print(classification_report(y_test,predictions))

             precision    recall  f1-score   support

          0       0.96      0.96      0.96        53
          1       0.98      0.98      0.98        90

avg / total       0.97      0.97      0.97       143

