# Neural Networks

Neural networks have grown in popularity because they are more flexible and customisable than many traditional machine learning models.

The basic element of a neural network is called a perceptron. It receives inputs, multiplies them by weights, sums the results, and passes the sum to an activation function (logistic, ReLU, etc.), which in turn produces an output.
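As a minimal sketch of the description above, the following pure-Python function computes the output of a single perceptron. The names `perceptron`, `relu`, and `logistic` are illustrative, and the `bias` term is an assumption that the text does not mention (it defaults to 0 here).

```python
import math

def relu(z):
    """Rectified linear unit: negative values become 0."""
    return max(0.0, z)

def logistic(z):
    """Logistic (sigmoid) function: squashes z into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def perceptron(inputs, weights, activation, bias=0.0):
    """Multiply each input by its weight, sum, then apply the activation."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(weighted_sum)

# Hypothetical inputs and weights, chosen only for illustration
inputs = [1.0, 2.0]
weights = [0.5, 0.25]

relu_output = perceptron(inputs, weights, relu)          # weighted sum is 1.0
logistic_output = perceptron(inputs, weights, logistic)  # logistic of 1.0
print(relu_output, logistic_output)
```

Swapping the activation function changes the output for the same weighted sum, which is why the choice of activation is treated as a model parameter later on.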

A neural network is built by stacking layers (levels) made up of these perceptrons. It has three types of layer: input, hidden, and output.

- Input layer: this is the layer that receives the data
- Output layer: this is the layer that returns a result
- Hidden layer(s): there can be more than one, and this is where the computation takes place
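The layered structure can be sketched as a forward pass through a tiny network with two inputs, one hidden layer of two perceptrons, and one output. All weights and biases below are hypothetical values picked by hand, and the helper name `layer_forward` is illustrative; a real network learns these values during training.

```python
import math

def relu(z):
    return max(0.0, z)

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases, activation):
    """Compute the output of every perceptron in one layer.

    `weights` holds one row of weights per perceptron in the layer.
    """
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# Input layer: simply the data passed in
x = [0.5, 1.0]

# Hidden layer: two perceptrons, each with two weights (hypothetical values)
hidden_weights = [[1.0, -1.0], [0.5, 0.5]]
hidden_biases = [0.0, 0.0]

# Output layer: one perceptron reading both hidden outputs
output_weights = [[1.0, 1.0]]
output_biases = [0.0]

hidden = layer_forward(x, hidden_weights, hidden_biases, relu)
output = layer_forward(hidden, output_weights, output_biases, logistic)
print(hidden, output)
```

Each layer only sees the outputs of the layer before it, so adding more hidden layers amounts to chaining further `layer_forward` calls.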

## Predicting Diabetes

We are going to use the same dataset that was used for classification, and label each person as diabetic or not.

### Loading the data

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Raw string so the backslashes in the Windows path are kept as-is
df = pd.read_csv(r'C:\Users\ivane\Desktop\ACI-3\data\diabetes.csv')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15000 entries, 0 to 14999
Data columns (total 10 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   PatientID               15000 non-null  int64  
 1   Pregnancies             15000 non-null  int64  
 2   PlasmaGlucose           15000 non-null  int64  
 3   DiastolicBloodPressure  15000 non-null  int64  
 4   TricepsThickness        15000 non-null  int64  
 5   SerumInsulin            15000 non-null  int64  
 6   BMI                     15000 non-null  float64
 7   DiabetesPedigree        15000 non-null  float64
 8   Age                     15000 non-null  int64  
 9   Diabetic                15000 non-null  int64  
dtypes: float64(2), int64(8)
memory usage: 1.1 MB


In [3]:
df.head()

| | PatientID | Pregnancies | PlasmaGlucose | DiastolicBloodPressure | TricepsThickness | SerumInsulin | BMI | DiabetesPedigree | Age | Diabetic |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1354778 | 0 | 171 | 80 | 34 | 23 | 43.509726 | 1.213191 | 21 | 0 |
| 1 | 1147438 | 8 | 92 | 93 | 47 | 36 | 21.240576 | 0.158365 | 23 | 0 |
| 2 | 1640031 | 7 | 115 | 47 | 52 | 35 | 41.511523 | 0.079019 | 23 | 0 |
| 3 | 1883350 | 9 | 103 | 78 | 25 | 304 | 29.582192 | 1.28287 | 43 | 1 |
| 4 | 1424119 | 1 | 85 | 59 | 27 | 35 | 42.604536 | 0.549542 | 22 | 0 |


### Check data

The next step is to check the state of the data. We can start with basic statistics.

In [6]:
df.describe()

| | PatientID | Pregnancies | PlasmaGlucose | DiastolicBloodPressure | TricepsThickness | SerumInsulin | BMI | DiabetesPedigree | Age | Diabetic |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 15000.0 | 15000.0 | 15000.0 | 15000.0 | 15000.0 | 15000.0 | 15000.0 | 15000.0 | 15000.0 | 15000.0 |
| mean | 1502922.0 | 3.224533 | 107.856867 | 71.220667 | 28.814 | 137.852133 | 31.509646 | 0.398968 | 30.137733 | 0.333333 |
| std | 289253.4 | 3.39102 | 31.981975 | 16.758716 | 14.555716 | 133.068252 | 9.759 | 0.377944 | 12.089703 | 0.47142 |
| min | 1000038.0 | 0.0 | 44.0 | 24.0 | 7.0 | 14.0 | 18.200512 | 0.078044 | 21.0 | 0.0 |
| 25% | 1252866.0 | 0.0 | 84.0 | 58.0 | 15.0 | 39.0 | 21.259887 | 0.137743 | 22.0 | 0.0 |
| 50% | 1505508.0 | 2.0 | 104.0 | 72.0 | 31.0 | 83.0 | 31.76794 | 0.200297 | 24.0 | 0.0 |
| 75% | 1755205.0 | 6.0 | 129.0 | 85.0 | 41.0 | 195.0 | 39.259692 | 0.616285 | 35.0 | 1.0 |
| max | 1999997.0 | 14.0 | 192.0 | 117.0 | 93.0 | 799.0 | 56.034628 | 2.301594 | 77.0 | 1.0 |


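We can also check the number of unique values in each column. (The `df.info()` output above already shows that no column contains nulls.) Note that PatientID has fewer unique values than there are rows, so some patient IDs appear more than once.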

In [5]:
df.nunique()

PatientID                 14895
Pregnancies                  15
PlasmaGlucose               149
DiastolicBloodPressure       90
TricepsThickness             69
SerumInsulin                663
BMI                       15000
DiabetesPedigree          14999
Age                          56
Diabetic                      2
dtype: int64

### Selecting Labels and Features

Now we can define the features, and the target label.

In [10]:
target = ['Diabetic']
exclude = ['PatientID']

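# All columns except the target and the ID column.
# Note: a set does not preserve the original column order.
predictors = list(set(list(df.columns)) - set(target) - set(exclude))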

### Data Scaling

Neural networks generally train better when all features are on the same scale. This can be done with MinMaxScaler from sklearn, which rescales each feature to the range 0 to 1.

In [11]:
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
scaled_df = scaler.fit_transform(df[predictors])

scaled_df

array([[1.14649682e-02, 0.00000000e+00, 3.13953488e-01, ...,
        5.10511281e-01, 6.02150538e-01, 8.58108108e-01],
       [2.80254777e-02, 3.57142857e-02, 4.65116279e-01, ...,
        3.61229438e-02, 7.41935484e-01, 3.24324324e-01],
       [2.67515924e-02, 3.57142857e-02, 5.23255814e-01, ...,
        4.38385837e-04, 2.47311828e-01, 4.79729730e-01],
       ...,
       [5.47770701e-02, 5.35714286e-02, 4.18604651e-01, ...,
        1.56958511e-01, 6.98924731e-01, 3.31081081e-01],
       [1.87261146e-01, 3.57142857e-02, 1.27906977e-01, ...,
        1.00835769e-01, 7.95698925e-01, 5.94594595e-01],
       [6.34394904e-01, 2.32142857e-01, 4.65116279e-01, ...,
        3.11749422e-02, 4.40860215e-01, 4.72972973e-01]])
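To make the scaling concrete, min-max scaling maps a value x to (x - min) / (max - min). The sketch below applies that formula by hand to SerumInsulin, using the minimum (14) and maximum (799) from the `df.describe()` output and the first three values from `df.head()`. The helper name `min_max_scale` is illustrative and not part of sklearn.

```python
def min_max_scale(value, col_min, col_max):
    """Rescale a single value to the range 0 to 1."""
    return (value - col_min) / (col_max - col_min)

# Minimum and maximum of SerumInsulin, taken from df.describe()
serum_insulin_min = 14
serum_insulin_max = 799

# SerumInsulin for the first three rows, taken from df.head()
first_rows = [23, 36, 35]

scaled = [
    min_max_scale(v, serum_insulin_min, serum_insulin_max)
    for v in first_rows
]

for raw, value in zip(first_rows, scaled):
    print(f"{raw} -> {value:.8e}")
```

These values agree with the first column of the array printed above, which suggests that SerumInsulin ended up as the first predictor in this particular run.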

### Data Splitting

Now, we can split the data making sure to use the scaled data for the features.

In [12]:
from sklearn.model_selection import train_test_split

X = scaled_df
y = df[target[0]].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

print('Training set shape:', X_train.shape)
print('Testing set shape:', X_test.shape)

Training set shape: (10500, 8)
Testing set shape: (4500, 8)


### Training the Model

In this case we will be using a neural network classifier named MLPClassifier. It accepts a number of parameters; here we set:
- hidden_layer_sizes to (8,8,8), which means 3 hidden layers with 8 perceptrons in each layer
- activation to 'relu', which determines the activation function used in the hidden layers
- solver to 'adam', which is the solver used for the weight optimization
- max_iter to 1000, which is the maximum number of iterations; training stops earlier if the solver converges

For more information: **[MLPClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html)**
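To give a sense of the size of the network that hidden_layer_sizes=(8,8,8) describes, the sketch below counts its trainable parameters. It assumes 8 input features (the shape of X_train above), a single output unit for the binary label, and one bias per perceptron. The helper name `count_parameters` is illustrative and is not part of scikit-learn.

```python
def count_parameters(n_inputs, hidden_layer_sizes, n_outputs):
    """Count weights and biases in a fully connected network."""
    layer_sizes = [n_inputs, *hidden_layer_sizes, n_outputs]
    total = 0
    # Walk over each pair of consecutive layers
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += fan_in * fan_out  # one weight per connection
        total += fan_out           # one bias per perceptron (assumption)
    return total

# Assumed: 8 input features and 1 output unit
params_888 = count_parameters(8, (8, 8, 8), 1)
params_864 = count_parameters(8, (8, 6, 4), 1)
print(params_888, params_864)
```

Under these assumptions, narrowing the hidden layers to (8,6,4) gives a noticeably smaller network, which is one way to trade capacity for training cost.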

After the model is trained, we can predict using the test data set.

In [13]:
from sklearn.neural_network import MLPClassifier

mlp = MLPClassifier(hidden_layer_sizes=(8,8,8), activation='relu', solver='adam', max_iter=1000)
mlp.fit(X_train, y_train)

predictions = mlp.predict(X_test)

In [15]:
mlp2 = MLPClassifier(hidden_layer_sizes=(8,6,4), activation='relu', solver='adam', max_iter=1000)
mlp2.fit(X_train, y_train)

predictions2 = mlp2.predict(X_test)

### Evaluating the Model

After predicting the values, we can use any metric we want to measure the performance of the model. In this case we create a confusion matrix and a classification report.

In [17]:
from sklearn.metrics import confusion_matrix, classification_report

print(confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions))

[[2773  213]
 [ 275 1239]]
              precision    recall  f1-score   support

           0       0.91      0.93      0.92      2986
           1       0.85      0.82      0.84      1514

    accuracy                           0.89      4500
   macro avg       0.88      0.87      0.88      4500
weighted avg       0.89      0.89      0.89      4500
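The figures in the classification report can be reproduced by hand from the confusion matrix. The sketch below does so for class 1 (diabetic), assuming the scikit-learn layout in which rows are true labels and columns are predicted labels. Variable names are illustrative.

```python
# Confusion matrix printed above: rows = true label, columns = predicted label
matrix = [[2773, 213],
          [275, 1239]]

tn, fp = matrix[0]  # true class 0: correctly / wrongly predicted
fn, tp = matrix[1]  # true class 1: wrongly / correctly predicted

precision = tp / (tp + fp)   # of those predicted diabetic, how many are
recall = tp / (tp + fn)      # of those actually diabetic, how many were found
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"precision={precision:.2f} recall={recall:.2f} "
      f"f1={f1:.2f} accuracy={accuracy:.2f}")
```

Rounded to two decimals, these match the row for class 1 and the accuracy line of the report.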



## Randomized Search with Cross-Validation

In [18]:
alpha = [0.001, 0.0001]
hidden_layer_sizes = [(8,8,8), (8,6,4)]
max_iter = [5000,1000]
solver = ['adam', 'lbfgs']

random_grid = {'alpha': alpha,
               'hidden_layer_sizes': hidden_layer_sizes,
               'max_iter': max_iter,
               'solver': solver}

print(random_grid)

{'alpha': [0.001, 0.0001], 'hidden_layer_sizes': [(8, 8, 8), (8, 6, 4)], 'max_iter': [5000, 1000], 'solver': ['adam', 'lbfgs']}
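As a rough sketch of how large this search space is, the snippet below enumerates every combination in the grid with `itertools.product` and works out how many model fits a randomized search would perform. The values `n_iter = 10` and `cv = 3` are taken from the search call in the next cell; the variable names are illustrative.

```python
from itertools import product

random_grid = {
    'alpha': [0.001, 0.0001],
    'hidden_layer_sizes': [(8, 8, 8), (8, 6, 4)],
    'max_iter': [5000, 1000],
    'solver': ['adam', 'lbfgs'],
}

# Every possible combination of one value per parameter
combinations = list(product(*random_grid.values()))
n_combinations = len(combinations)

n_iter = 10  # number of combinations the randomized search samples
cv = 3       # number of cross-validation folds

# Each sampled combination is fitted once per fold
n_fits = min(n_iter, n_combinations) * cv

print('Combinations in grid:', n_combinations)
print('Model fits in the search:', n_fits)
```

Sampling only some of the combinations keeps the search cheaper than an exhaustive grid search, at the cost of possibly missing the best combination.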


In [20]:
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPClassifier

mlp = MLPClassifier()
mlp_random = RandomizedSearchCV(estimator=mlp, param_distributions=random_grid,
                                n_iter=10, cv=3, verbose=5, random_state=1)

mlp_random.fit(X_train, y_train)

Fitting 3 folds for each of 10 candidates, totalling 30 fits
[CV 1/3] END alpha=0.001, hidden_layer_sizes=(8, 8, 8), max_iter=1000, solver=lbfgs;, score=0.788 total time=   1.2s
[CV 2/3] END alpha=0.001, hidden_layer_sizes=(8, 8, 8), max_iter=1000, solver=lbfgs;, score=0.866 total time=   6.6s
[CV 3/3] END alpha=0.001, hidden_layer_sizes=(8, 8, 8), max_iter=1000, solver=lbfgs;, score=0.891 total time=   7.4s
[CV 1/3] END alpha=0.0001, hidden_layer_sizes=(8, 6, 4), max_iter=5000, solver=lbfgs;, score=0.853 total time=  14.5s
[CV 2/3] END alpha=0.0001, hidden_layer_sizes=(8, 6, 4), max_iter=5000, solver=lbfgs;, score=0.871 total time=   9.9s
[CV 3/3] END alpha=0.0001, hidden_layer_sizes=(8, 6, 4), max_iter=5000, solver=lbfgs;, score=0.780 total time=   5.6s
[CV 1/3] END alpha=0.001, hidden_layer_sizes=(8, 6, 4), max_iter=1000, solver=lbfgs;, score=0.854 total time=   1.0s


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)


[CV 2/3] END alpha=0.001, hidden_layer_sizes=(8, 6, 4), max_iter=1000, solver=lbfgs;, score=0.873 total time=   6.2s


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)


[CV 3/3] END alpha=0.001, hidden_layer_sizes=(8, 6, 4), max_iter=1000, solver=lbfgs;, score=0.888 total time=   6.4s
[CV 1/3] END alpha=0.001, hidden_layer_sizes=(8, 8, 8), max_iter=1000, solver=adam;, score=0.885 total time=   4.0s
[CV 2/3] END alpha=0.001, hidden_layer_sizes=(8, 8, 8), max_iter=1000, solver=adam;, score=0.861 total time=   2.6s
[CV 3/3] END alpha=0.001, hidden_layer_sizes=(8, 8, 8), max_iter=1000, solver=adam;, score=0.897 total time=   5.4s
[CV 1/3] END alpha=0.001, hidden_layer_sizes=(8, 6, 4), max_iter=1000, solver=adam;, score=0.866 total time=   5.3s
[CV 2/3] END alpha=0.001, hidden_layer_sizes=(8, 6, 4), max_iter=1000, solver=adam;, score=0.867 total time=   2.9s
[CV 3/3] END alpha=0.001, hidden_layer_sizes=(8, 6, 4), max_iter=1000, solver=adam;, score=0.884 total time=   4.6s
[CV 1/3] END alpha=0.0001, hidden_layer_sizes=(8, 8, 8), max_iter=1000, solver=adam;, score=0.884 total time=   3.3s
[CV 2/3] END alpha=0.0001, hidden_layer_sizes=(8, 8, 8), max_iter=1000

In [21]:
from sklearn.metrics import accuracy_score, f1_score

print('Best Estimator: ', mlp_random.best_estimator_)
print('Best Parameters: ', mlp_random.best_params_)

Best Estimator:  MLPClassifier(alpha=0.001, hidden_layer_sizes=(8, 8, 8), max_iter=5000)
Best Parameters:  {'solver': 'adam', 'max_iter': 5000, 'hidden_layer_sizes': (8, 8, 8), 'alpha': 0.001}


In [22]:
predicted = mlp_random.predict(X_test)

print('Accuracy Score: {}'.format(accuracy_score(y_test, predicted)))
print('F1 Score: {}'.format(f1_score(y_test, predicted)))

Accuracy Score: 0.892
F1 Score: 0.8397097625329816


In [23]:
print(confusion_matrix(y_test, predicted))
print(classification_report(y_test, predicted))

[[2741  245]
 [ 241 1273]]
              precision    recall  f1-score   support

           0       0.92      0.92      0.92      2986
           1       0.84      0.84      0.84      1514

    accuracy                           0.89      4500
   macro avg       0.88      0.88      0.88      4500
weighted avg       0.89      0.89      0.89      4500

