In this workshop, we get our hands dirty with neural networks! While implementing a neural network from scratch is not an easy task (check out ACM AI's advanced track!), Python libraries like scikit-learn make it easy to implement simple neural networks. Eventually, it's probably a good idea to implement deep learning models with TensorFlow or PyTorch, but scikit-learn is a pretty good place to start!

In [None]:
import pandas as pd
import numpy as np

data = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/blood-transfusion/transfusion.data')
print(data.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 748 entries, 0 to 747
Data columns (total 5 columns):
 #   Column                                      Non-Null Count  Dtype
---  ------                                      --------------  -----
 0   Recency (months)                            748 non-null    int64
 1   Frequency (times)                           748 non-null    int64
 2   Monetary (c.c. blood)                       748 non-null    int64
 3   Time (months)                               748 non-null    int64
 4   whether he/she donated blood in March 2007  748 non-null    int64
dtypes: int64(5)
memory usage: 29.3 KB
None


In [None]:
data = data.to_numpy()
print(data)

[[    2    50 12500    98     1]
 [    0    13  3250    28     1]
 [    1    16  4000    35     1]
 ...
 [   23     3   750    62     0]
 [   39     1   250    39     0]
 [   72     1   250    72     0]]


In [None]:
np.random.seed(1)
# Exercise: Shuffle the data!
np.random.shuffle(data)
print(data)

[[   4    4 1000   43    1]
 [   4   10 2500   28    1]
 [   2    1  250    2    0]
 ...
 [   4   17 4250   71    1]
 [   8   10 2500   63    0]
 [   4    6 1500   16    1]]
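
`np.random.shuffle` reorders a 2-D array along its first axis only, so whole rows move together and every label stays attached to its own features. A minimal sketch on a toy array (the values are made up for illustration):

```python
import numpy as np

# Toy array: each row is [feature, feature, label]; values are illustrative only.
toy = np.array([[1, 10, 0],
                [2, 20, 1],
                [3, 30, 0],
                [4, 40, 1]])
original_rows = {tuple(row) for row in toy}

np.random.seed(1)
np.random.shuffle(toy)  # shuffles in place and returns None

shuffled_rows = {tuple(row) for row in toy}
# The row order may change, but every row is still intact.
print(shuffled_rows == original_rows)
```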


In [None]:
# Print the seventh row of data
# print(data[6])
# Print the second column of the data
# print(data[:,1])
# Print the last column of the data
# print(data[:,-1])
# Print the 3rd element of the fifth row
# print(data[4][2])

# Print all columns except the first two
data[:,2:]

array([[1000,   43,    1],
       [2500,   28,    1],
       [ 250,    2,    0],
       ...,
       [4250,   71,    1],
       [2500,   63,    0],
       [1500,   16,    1]])

In [None]:
split1 = int(0.6*len(data))
split2 = int(0.8*len(data))
# Split the data into training, cross validation and testing data
train, cv, test = data[:split1,:], data[split1:split2,:], data[split2:,:]

In [None]:
# Exercise: Split train, cv and test into X and Y
trainX = train[:,:4] #train[:,:-1]
trainY = train[:,-1] #train[:,4]

cvX = cv[:,:4]
cvY = cv[:,-1] #cv[:,4]

testX = test[:,:4]
testY = test[:,-1] #test[:,4]
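
The 60/20/20 split and the X/Y separation above can also be wrapped in two small functions. This is only a sketch: `split_60_20_20` and `split_xy` are hypothetical helper names, and the stand-in array simply has the same shape as the dataset.

```python
import numpy as np

def split_60_20_20(rows):
    """Hypothetical helper: slice already-shuffled rows into train/cv/test."""
    split1 = int(0.6 * len(rows))
    split2 = int(0.8 * len(rows))
    return rows[:split1], rows[split1:split2], rows[split2:]

def split_xy(rows):
    """Hypothetical helper: features are every column but the last; the label is the last."""
    return rows[:, :-1], rows[:, -1]

# Stand-in array with the same shape as the dataset (748 rows, 5 columns).
rows = np.arange(748 * 5).reshape(748, 5)
train, cv, test = split_60_20_20(rows)
trainX, trainY = split_xy(train)

print(len(train), len(cv), len(test))  # 448 150 150
print(trainX.shape, trainY.shape)      # (448, 4) (448,)
```

Using `[:, :-1]` instead of `[:, :4]` keeps the helper correct if the number of feature columns ever changes.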

Before implementing an ML model such as a neural network, it's a good idea to standardize our data. This means that for every feature, we subtract the mean and divide by the standard deviation.

In [None]:
#Clearly, we do not have zero mean and unit variance
print(trainX.var(axis=0))
print(trainX.mean(axis=0))

[5.43112046e+01 3.39849928e+01 2.12406205e+06 5.75997748e+02]
[   9.07589286    5.54017857 1385.04464286   34.18303571]


Let Python do the job of scaling for us!

In [None]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# Scale the training data!
scaler.fit(trainX)
trainX = scaler.transform(trainX)

print("Variance: ", trainX.var(axis=0)) #every variance should be 1
print("Mean: ", trainX.mean(axis=0)) #every mean should be 0 or very close to 0

Variance:  [1. 1. 1. 1.]
Mean:  [-4.36159045e-17  6.34413157e-17 -2.77555756e-17 -6.34413157e-17]
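
To see what the scaler is doing, here is the same computation written by hand with NumPy. It is a sketch on a toy matrix (the values are made up), using the population standard deviation (`ddof=0`), which is what `StandardScaler` uses by default.

```python
import numpy as np

# Toy feature matrix (values are illustrative only): 4 samples, 2 features.
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0],
              [6.0, 700.0]])

mean = X.mean(axis=0)  # per-feature mean
std = X.std(axis=0)    # per-feature population standard deviation (ddof=0)

X_scaled = (X - mean) / std

print(X_scaled.mean(axis=0))  # each entry is 0, up to floating-point error
print(X_scaled.var(axis=0))   # each entry is 1, up to floating-point error
```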


We also scale the cross validation and testing data according to the mean and variance of the training data. 
(Discuss: Why don't we scale the cross validation data and testing data with their own mean and variance?)

In [None]:
print("Variance of cv: ", cvX.var(axis=0))
print("Mean of cv: ", cvX.mean(axis=0))
print("Variance of test: ", testX.var(axis=0))
print("Mean of test: ", testX.mean(axis=0))

Variance of cv:  [1.11711156e+02 2.58862667e+01 1.61789167e+06 5.32470400e+02]
Mean of cv:  [  10.71333333    5.02       1255.           32.64      ]
Variance of test:  [5.04291556e+01 4.19955556e+01 2.62472222e+06 6.99971600e+02]
Mean of test:  [   9.58666667    5.93333333 1483.33333333   36.22      ]


In [None]:
# Exercise: Scale cvX and testX as well

cvX = scaler.transform(cvX)
testX = scaler.transform(testX)


#New means and variances
#Note that this won't be all zeros and ones like in the previous case. Who wants to explain why?
print("Variance of cv: ", cvX.var(axis=0))
print("Mean of cv: ", cvX.mean(axis=0))
print("Variance of test: ", testX.var(axis=0))
print("Mean of test: ", testX.mean(axis=0))

Variance of cv:  [2.05687126 0.76169699 0.76169699 0.92443139]
Mean of cv:  [ 0.2221881  -0.08922958 -0.08922958 -0.06429328]
Variance of test:  [0.92852213 1.23570883 1.23570883 1.21523322]
Mean of test:  [0.06930809 0.06744037 0.06744037 0.08487368]
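
One way to think about both discussion questions: the network learns from features measured in the training set's units, so the same raw measurement has to map to the same scaled value in every split. The sketch below uses toy one-feature arrays (made-up values) to compare the two options.

```python
import numpy as np

# Toy one-feature splits (values are illustrative only).
train = np.array([[0.0], [10.0], [20.0]])
cv = np.array([[10.0], [20.0], [30.0]])

train_mean, train_std = train.mean(axis=0), train.std(axis=0)
train_scaled = (train - train_mean) / train_std

# Consistent: reuse the training statistics for cv.
cv_with_train_stats = (cv - train_mean) / train_std
# Inconsistent: give cv its own statistics.
cv_with_own_stats = (cv - cv.mean(axis=0)) / cv.std(axis=0)

# The raw value 10.0 appears in both splits.
print(train_scaled[1, 0], cv_with_train_stats[0, 0], cv_with_own_stats[0, 0])
# cv scaled with the training statistics is not centred on zero.
print(cv_with_train_stats.mean())
```

With the training statistics, the raw value 10.0 becomes the same number in both splits; with cv's own statistics it does not. The price is that the cv and test means and variances are only approximately 0 and 1, as the output above shows.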


It's now time to decide the number of hidden layers and the number of neurons in each layer of our neural network. For now, we just create two hidden layers with 5 and 3 neurons respectively. This is an arbitrary choice. Later, we will see how to choose this appropriately.
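
To make the architecture concrete, this sketch lists the weight-matrix shapes and counts the trainable parameters for that layout. It assumes 4 input features and a single output unit for the binary label.

```python
# Assumed layout: 4 input features, hidden layers of 5 and 3 neurons, 1 output unit.
layer_sizes = [4, 5, 3, 1]

weight_shapes = list(zip(layer_sizes[:-1], layer_sizes[1:]))
# Each layer has one weight per (input, output) pair plus one bias per output.
n_params = sum(n_in * n_out + n_out for n_in, n_out in weight_shapes)

print(weight_shapes)  # [(4, 5), (5, 3), (3, 1)]
print(n_params)       # 47
```

Once the network has been fitted, the shapes can be compared against `[w.shape for w in neural_net.coefs_]`.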

In [None]:
from sklearn.neural_network import MLPClassifier
from sklearn import metrics

# Initialize a neural network as described above
neural_net = MLPClassifier(hidden_layer_sizes=(5, 3), random_state=1,max_iter=1000)

In [None]:
# Fit the model with the training data
neural_net.fit(trainX,trainY)

MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
              beta_2=0.999, early_stopping=False, epsilon=1e-08,
              hidden_layer_sizes=(5, 3), learning_rate='constant',
              learning_rate_init=0.001, max_fun=15000, max_iter=1000,
              momentum=0.9, n_iter_no_change=10, nesterovs_momentum=True,
              power_t=0.5, random_state=1, shuffle=True, solver='adam',
              tol=0.0001, validation_fraction=0.1, verbose=False,
              warm_start=False)

In [None]:
# Exercise:
# 1. Predict on the training data
# 2. Calculate the training accuracy
trainY_pred = neural_net.predict(trainX)
acc = metrics.accuracy_score(trainY, trainY_pred, normalize=True)
print('Neural Network:\t-- train acc %.3f' % acc)

Neural Network:	-- train acc 0.799


In [None]:
# Exercise:
# 1. Predict on the test data
# 2. Calculate the test accuracy
testY_pred = neural_net.predict(testX)
acc = metrics.accuracy_score(testY, testY_pred, normalize=True)
print('Neural Network:\t-- test acc %.3f' % acc)

Neural Network:	-- test acc 0.740
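
With `normalize=True`, `accuracy_score` returns the fraction of predictions that match the true labels. A useful sanity check is to compare that number with a baseline that always predicts the most common class. The sketch below uses toy labels (made up, not taken from the dataset):

```python
import numpy as np

# Toy labels (illustrative only): mostly class 0, as in many imbalanced datasets.
y_true = np.array([0, 0, 0, 1, 0, 1, 0, 0])
y_pred = np.array([0, 0, 1, 1, 0, 0, 0, 0])

# Accuracy is the fraction of predictions that match the true labels.
accuracy = np.mean(y_true == y_pred)

# Baseline: always predict the most common class.
majority_class = np.bincount(y_true).argmax()
baseline = np.mean(y_true == majority_class)

print(accuracy, baseline)  # 0.75 0.75
```

In this toy case the predictions are no better than the baseline, even though 0.75 sounds respectable. In the notebook, the same check would be `np.mean(testY == np.bincount(trainY).argmax())`.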


What is an MLP classifier? Who wants to explain?
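
As a starting point for the discussion: an MLP (multi-layer perceptron) passes the inputs through a stack of fully connected layers. The sketch below is a simplified forward pass with random, untrained weights, ReLU hidden activations (the default shown in the model printout above) and a logistic output for the binary label. `forward` is a hypothetical helper, not scikit-learn's implementation.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, weights, biases):
    """Hypothetical helper: forward pass of a small MLP for binary classification."""
    a = X
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)                       # hidden layers
    return sigmoid(a @ weights[-1] + biases[-1])  # probability of class 1

rng = np.random.default_rng(0)
sizes = [4, 5, 3, 1]  # 4 features, hidden layers of 5 and 3 neurons, 1 output
weights = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

X = rng.normal(size=(6, 4))  # 6 made-up, already-scaled samples
probs = forward(X, weights, biases)
labels = (probs >= 0.5).astype(int).ravel()
print(probs.shape, labels)
```

Training is the process of adjusting `weights` and `biases` so that these probabilities agree with the training labels.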

Okay, so we have a neural network up and running! Is that all? Not quite. Ideally, we should experiment with the settings passed to the neural network to find the combination that gives the best result. These settings are called 'hyperparameters'. We could vary the number of layers, the number of neurons in each layer, the learning rate, the type of gradient descent algorithm, the maximum number of iterations and so on. To decide the optimal values, we make use of the cross validation set.

Play around with these parameters to see what combination of hyperparameters gives the highest cross validation accuracy! Then train this model on the training data and look at its accuracy on the test data. This documentation describes the various hyperparameters: https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier

In [None]:
neural_net = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(7,), random_state=1,max_iter=1000)
neural_net.fit(trainX, trainY)
cv_pred = neural_net.predict(cvX)
acc = metrics.accuracy_score(cv_pred, cvY, normalize=True)
print('Cross validation accuracy: %.3f' % acc)

Cross validation accuracy: 0.860


Having found the hyperparameters that give the best accuracy, we use these hyperparameters to train our model and evaluate its performance on the test data.

In [None]:
neural_net.fit(trainX,trainY)

train_pred = neural_net.predict(trainX)
acc = metrics.accuracy_score(train_pred, trainY, normalize=True)
print('Training accuracy: %.3f' % acc)

test_pred = neural_net.predict(testX)
acc = metrics.accuracy_score(test_pred, testY, normalize=True)
print('Testing accuracy: %.3f' % acc)

Training accuracy: 0.819
Testing accuracy: 0.713


Of course, it's a pain to manually tune the hyperparameters to figure out what the best combination is. This is why we search across a range of possibilities for each hyperparameter and choose the best parameters. An example is given below, but feel free to search amongst more hyperparameters! Of course, we need to keep in mind that the more combinations we introduce, the longer it will take for the model to run. It is always helpful to look at the documentation of the model to figure out which hyperparameters you want to play around with.
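
To get a feel for how quickly the search grows, the combinations in a grid can be enumerated with `itertools.product`. The ranges in this sketch mirror the ones used in the next cell.

```python
from itertools import product

layer_counts = range(1, 3 + 1)  # 1 to max_layers
neuron_counts = range(2, 6)     # 2 to 5, matching range(2, max_neurons_per_layer)
learning_rates = [0.01, 0.001, 0.0001]
solvers = ['adam', 'lbfgs']

grid = list(product(layer_counts, neuron_counts, learning_rates, solvers))
print(len(grid))  # 3 * 4 * 3 * 2 = 72 models to train
print(grid[0])    # (1, 2, 0.01, 'adam')
```

Adding one more hyperparameter with three candidate values would triple the count to 216.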

In [None]:
best_acc = -1
opt_max_layers = None
opt_max_neurons = None
opt_lr = None
opt_solver = None

max_layers = 3
max_neurons_per_layer = 6
learning_rates = [0.01,0.001,0.0001]
solvers = ['adam','lbfgs']

import progressbar

for i in progressbar.progressbar(range(1,max_layers+1)):
  for j in range(2,max_neurons_per_layer):
    for lr in learning_rates:
      for solver in solvers:
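        # Note: `alpha` is MLPClassifier's L2 regularization strength, so `lr` tunes regularization here;
        # the step size for the 'adam' and 'sgd' solvers is `learning_rate_init`.
        neural_net = MLPClassifier(solver=solver, alpha=lr, hidden_layer_sizes=(j,)*i, random_state=1,max_iter=1000)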
        neural_net.fit(trainX,trainY) # train model based on training data
        cv_pred = neural_net.predict(cvX) # evaluate on cv
        acc = metrics.accuracy_score(cv_pred, cvY, normalize=True)
        if acc > best_acc:
          best_acc = acc
          opt_max_layers = i
          opt_max_neurons = j
          opt_lr = lr
          opt_solver = solver
print(opt_max_layers, opt_max_neurons, opt_lr, opt_solver)

100% (3 of 3) |##########################| Elapsed Time: 0:00:24 Time:  0:00:24


1 2 0.01 lbfgs


In [None]:
print(opt_max_layers, opt_max_neurons, opt_lr, opt_solver)
neural_net = MLPClassifier(solver=opt_solver, alpha=opt_lr, hidden_layer_sizes=(opt_max_neurons,)*opt_max_layers, random_state=1,max_iter=500)
neural_net.fit(trainX,trainY)

train_pred = neural_net.predict(trainX)
acc = metrics.accuracy_score(train_pred, trainY, normalize=True)
print('Training accuracy: %.3f' % acc)

test_pred = neural_net.predict(testX)
acc = metrics.accuracy_score(test_pred, testY, normalize=True)
print('Testing accuracy: %.3f' % acc)

1 2 0.01 lbfgs
Training accuracy: 0.819
Testing accuracy: 0.727


Once you're comfortable with this, you may want to look into 'GridSearchCV', which lets you perform hyperparameter tuning without writing the for loops yourself.
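
A minimal sketch of what that could look like is given below. It runs on synthetic data from `make_classification` (an assumption, so that the example is self-contained) and a deliberately small grid. Note that `GridSearchCV` scores each combination with k-fold cross-validation on the data it is given, rather than with a single fixed cross validation split like the loop above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data (an assumption; swap in trainX and trainY from the notebook).
X, y = make_classification(n_samples=200, n_features=4, random_state=1)

# A deliberately small grid so the search finishes quickly.
param_grid = {
    'hidden_layer_sizes': [(2,), (4,), (4, 4)],
    'alpha': [1e-4, 1e-2],
}

search = GridSearchCV(
    MLPClassifier(solver='lbfgs', random_state=1, max_iter=1000),
    param_grid,
    cv=3,                # 3-fold cross-validation for every combination
    scoring='accuracy',
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

By default the search refits the best combination on all the data passed to `fit`, so `search.predict(testX)` can be used directly afterwards.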

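Congrats! You've just built your first neural network and tuned its hyperparameters as well. In this run, the tuned model's test accuracy (0.727) is higher than that of the hand-picked lbfgs model (0.713), though slightly lower than that of the very first (5, 3) network (0.740); with only 150 test rows, gaps this small are worth treating with caution.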

Thanks for coming out today! We would be super grateful if you could fill out our feedback form here: [tinyurl.com/applymlfeedbackthree](https://tinyurl.com/applymlfeedbackthree)