In this workshop, we get our hands dirty with neural networks! While implementing a neural network from scratch is not an easy task (check out ACM AI's advanced track!), Python libraries like Scikit make it easy to implement simple neural networks. Eventually, its probably a good idea to implement deep learning models with Tensor Flow or Pytorch, but Scikit is a pretty good place to start!

In [61]:
import pandas as pd
import numpy as np

data = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/blood-transfusion/transfusion.data')
data = data.to_numpy()
np.random.seed(1)
np.random.shuffle(data)
split1 = int(0.6*len(data))
split2 = int(0.8*len(data))
train, cv, test = data[:split1,:], data[split1:split2,:], data[split2:,:]

trainX = train[:,:4]
trainY = train[:,-1]

cvX = cv[:,:4]
cvY = cv[:,-1]

testX = test[:,:4]
testY = test[:,-1]

Before implementing any ML model, its always a good idea to standardize our data. This means that for every feature, we subtract the mean and divide by the standard deiviation.

In [62]:
#Clearly, we do not have zero mean and unit variance
print(trainX.var(axis=0))
print(trainX.mean(axis=0))

[5.43112046e+01 3.39849928e+01 2.12406205e+06 5.75997748e+02]
[   9.07589286    5.54017857 1385.04464286   34.18303571]


Let Python do the job of scaling for us!

In [63]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(trainX)
trainX = scaler.transform(trainX)
print("Variance: ", trainX.var(axis=0)) #every variance should be 1
print("Mean: ", trainX.mean(axis=0)) #every mean should be 0 or very close to 0

Variance:  [1. 1. 1. 1.]
Mean:  [-4.36159045e-17  6.34413157e-17 -2.77555756e-17 -6.34413157e-17]


We also scale the cross validation and testing data according to the mean and variance of the training data. 
(Discuss: Why don't we scale the cross validation data and testing data with their own mean and variance?)

In [64]:
cvX = scaler.transform(cvX)
testX = scaler.transform(testX)

#New means and variances
#Note that this won't be all zeros and ones like in the previous case. Who wants to explain why?
print("Variance of cv: ", cvX.var(axis=0))
print("Mean of cv: ", cvX.mean(axis=0))
print("Variance of test: ", testX.var(axis=0))
print("Mean of test: ", testX.mean(axis=0))

Variance of cv:  [2.05687126 0.76169699 0.76169699 0.92443139]
Mean of cv:  [ 0.2221881  -0.08922958 -0.08922958 -0.06429328]
Variance of test:  [0.92852213 1.23570883 1.23570883 1.21523322]
Mean of test:  [0.06930809 0.06744037 0.06744037 0.08487368]


Its now time to decide the number of hidden layers and the number of neurons in each layer of our neural network. For now, we just create two hidden layers with 5 and 3 neurons respectively. This is an arbitrary choice. Later, we will see how to choose this appropriately. 

In [66]:
from sklearn.neural_network import MLPClassifier
from sklearn import metrics

neural_net = MLPClassifier(hidden_layer_sizes=(5, 3), random_state=1)
neural_net.fit(trainX,trainY)

train_pred = neural_net.predict(trainX)
acc = metrics.accuracy_score(train_pred, trainY, normalize=True)
print('Training accuracy: %.3f' % acc)

test_pred = neural_net.predict(testX)
acc = metrics.accuracy_score(test_pred, testY, normalize=True)
print('Testing accuracy: %.3f' % acc)

Training accuracy: 0.799
Testing accuracy: 0.733




Okay, so we have a neural network up and running! Is that all? Not quite.. Ideally, we should play around with the inputs to the neural network function to determine what the optimal result should be. These inputs/variables are called the 'hyperparameters'. We could vary the number of layers, the number of neurons in each layer, the learning rate, the type of gradient descent algorithm, the maximum number of iterations and so on. To decide the optimal values, we make use of the cross validation set.

Play around with these parameters to see what combination of hyperparamaters gives the highest cross validation accuracy! Then train this model on the training data and look at its accuracy on the test data. This documentation describes the various hyperparameters: https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier

In [67]:
neural_net = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(15,), random_state=1)
neural_net.fit(cvX, cvY)
cv_pred = neural_net.predict(cvX)
acc = metrics.accuracy_score(cv_pred, cvY, normalize=True)
print('Cross validation accuracy: %.3f' % acc)

Cross validation accuracy: 0.887


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)


Having found the hyperparameters that give the best accuracy, we use these hyperparameters to train our model and evaluate its performance on the test data.

In [68]:
neural_net.fit(trainX,trainY)

train_pred = neural_net.predict(trainX)
acc = metrics.accuracy_score(train_pred, trainY, normalize=True)
print('Training accuracy: %.3f' % acc)

test_pred = neural_net.predict(testX)
acc = metrics.accuracy_score(test_pred, testY, normalize=True)
print('Testing accuracy: %.3f' % acc)

Training accuracy: 0.821
Testing accuracy: 0.720


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)


Of course, it's a pain to manually tune the hyperparamters to figure out what the best combination is. This is why we search across a range of possibilities for each hyperparameter and choose the best parameters. An example is given below, but feel free to search amonst more hyperparaters! Of course, we need to keep in mind that the more combinations we introduce, the longer it will take for the model to run. It is always helpful to look at the documentation of the model to figure out which hyperparameters you want to play around with.

In [74]:
max_layers = 3
max_neurons_per_layer = 5 #every layer need not have the same number of neurons, of course!
learning_rates = [10**(-i) for i in range(1,7)]
solvers = ['lbfgs', 'sgd', 'adam']

best_acc = -1
opt_max_layers = None
opt_max_neurons = None
opt_lr = None
opt_solver = None

for i in range(1,max_layers+1):
  for j in range(2,max_neurons_per_layer):
    for lr in learning_rates:
      for solver in solvers:
        neural_net = MLPClassifier(solver=solver, alpha=lr, hidden_layer_sizes=(j,)*i, random_state=1,max_iter=500)
        neural_net.fit(cvX,cvY)
        acc = metrics.accuracy_score(cv_pred, cvY, normalize=True)
        if acc > best_acc:
          best_acc = acc
          opt_max_layers = i
          opt_max_neurons = j
          opt_lr = lr
          opt_solver = solver
print(opt_max_layers, opt_max_neurons, opt_lr, opt_solver)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)


Done!
1 2 0.1 lbfgs




In [76]:
print(opt_max_layers, opt_max_neurons, opt_lr, opt_solver)
neural_net = MLPClassifier(solver=opt_solver, alpha=opt_lr, hidden_layer_sizes=(opt_max_neurons,)*opt_max_layers, random_state=1,max_iter=500)
neural_net.fit(trainX,trainY)

train_pred = neural_net.predict(trainX)
acc = metrics.accuracy_score(train_pred, trainY, normalize=True)
print('Training accuracy: %.3f' % acc)

test_pred = neural_net.predict(testX)
acc = metrics.accuracy_score(test_pred, testY, normalize=True)
print('Testing accuracy: %.3f' % acc)

1 2 0.1 lbfgs
Training accuracy: 0.808
Testing accuracy: 0.727


Once you're comfortable with this, you may want to search for 'GridSearchCV' that allows you to perform hyperparameter tuning without writing the for loops yourself.

Congrats! You've just built your first neural network and tuned its hyperparameters as well. You can see that the test accuracy after the hypertuning is higher that just any guess.