# Chapter-8: Artificial Neural Network

Importing required packages and libraries

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as sm
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix as cm
import random

In [None]:
!pip install neurolab

In [None]:
import neurolab as nl

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
github_link="https://raw.githubusercontent.com/venkatareddykonasani/ML_DL_py_TF/master/Chapter8_ANN/Datasets/"

## Decision Boundary
The logistic regression line gives us predicted values between 0 and 1 as the final output. For a new point, if the predicted value is 0.95, we consider the predicted class to be class-1; if the predicted value is 0.05, we consider it class-0. Usually, we set a threshold at 0.5. Predicted values below 0.5 are classified as class-0, and the rest are classified as class-1. Though the logistic regression line looks like an “S”-shaped curve, when it comes to decision making, we use it as a decision boundary that separates class-0 and class-1.
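
The thresholding rule described above can be sketched in a few lines. This is a minimal illustration and not part of the original notebook; the function names `sigmoid` and `predict_class` are hypothetical helpers, and the strict `>` comparison follows the notebook code used later in this section.

```python
import math

def sigmoid(z):
    """Map any real-valued score z to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(probability, threshold=0.5):
    """Return 1 when the predicted value is above the threshold, else 0."""
    return 1 if probability > threshold else 0

# The two predicted values quoted in the text
print(predict_class(0.95))  # treated as class-1
print(predict_class(0.05))  # treated as class-0

# A raw score is first squashed by the S-shaped curve, then thresholded
print(predict_class(sigmoid(3.0)), predict_class(sigmoid(-3.0)))
```

The S-shaped curve only produces the predicted value; the class decision comes from comparing that value with the threshold.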

We will now see an example of a decision boundary:

The employee purchase data has three columns: employee Age, Experience, and whether they have purchased the product or not. The product is related to insurance. The objective is to predict the target variable “purchase” by using Age and Experience as predictor variables. We will build a logistic regression line. However, we are interested in the decision boundary that results from the logistic regression. For this demo, we will use a subset of the data. We will use the complete data in the later sections.

In [None]:
#Emp_Purchase_raw = pd.read_csv(r"/content/drive/My Drive/DataSets/Chapter-8/Chapter-8/datasets/Emp_Purchase/Emp_Purchase.csv")
Emp_Purchase_raw = pd.read_csv(github_link+"/Emp_Purchase/Emp_Purchase.csv")

In [None]:
Emp_Purchase1=Emp_Purchase_raw[Emp_Purchase_raw.Sample_Set<3]
print(Emp_Purchase1.shape)
print(Emp_Purchase1.columns.values)
print(Emp_Purchase1.head(10))

In [None]:
Emp_Purchase1.Purchase.value_counts()

There are 74 records in this subset. We will later use Age and Experience to predict Purchase. We will now plot the data to show the relation between the predictor and target variables. The code below plots all three columns.

In [None]:
fig = plt.figure()
ax1 = fig.add_subplot(111)
plt.rcParams["figure.figsize"] = (8,6)
plt.title('Age, Experience  vs Purchase', fontsize=20)

ax1.scatter(Emp_Purchase1.Age[Emp_Purchase1.Purchase==0],Emp_Purchase1.Experience[Emp_Purchase1.Purchase==0], s=100, c='b', marker="o", label='Purchase 0')
ax1.scatter(Emp_Purchase1.Age[Emp_Purchase1.Purchase==1],Emp_Purchase1.Experience[Emp_Purchase1.Purchase==1], s=100, c='r', marker="x", label='Purchase 1')
ax1.set_xlabel('Age',fontsize=15)
ax1.set_ylabel('Experience',fontsize=15)

plt.xlim(min(Emp_Purchase1.Age), max(Emp_Purchase1.Age))
plt.ylim(min(Emp_Purchase1.Experience), max(Emp_Purchase1.Experience))
plt.legend(loc='upper left');

plt.show()

Please note that we need not draw these graphs while solving the actual problems. We are drawing
them here to get a better visual intuition.

From the output plot, we can see both classes of the output: Purchase=0 and Purchase=1. We will now build the logistic regression model, derive the decision boundary, and then draw the decision boundary on top of this plot. We can already guess where the logistic regression boundary is going to appear.

In [None]:
model1 = sm.logit(formula='Purchase ~ Age+Experience', data=Emp_Purchase1)
fitted1 = model1.fit()
print(fitted1.summary2())

In [None]:
predicted_values=fitted1.predict(Emp_Purchase1[["Age"]+["Experience"]])
predicted_values[1:10]
threshold=0.5

In [None]:
import numpy as np
predicted_class=np.zeros(predicted_values.shape)
predicted_class[predicted_values>threshold]=1

In [None]:
predicted_class

In [None]:
from sklearn.metrics import confusion_matrix as cm
ConfusionMatrix = cm(Emp_Purchase1[['Purchase']],predicted_class)
print(ConfusionMatrix)
accuracy=(ConfusionMatrix[0,0]+ConfusionMatrix[1,1])/sum(sum(ConfusionMatrix))
print('Accuracy : ',accuracy)
error=1-accuracy
print('Error: ',error)
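
To make the accuracy calculation above easier to follow, here is a small sketch in plain Python. The labels are made up for illustration and the helper names are hypothetical. As in scikit-learn's `confusion_matrix`, rows are the actual classes and columns are the predicted classes.

```python
def confusion_matrix_2x2(actual, predicted):
    """Rows are actual classes (0, 1); columns are predicted classes (0, 1)."""
    matrix = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

def accuracy_from_matrix(matrix):
    """Correct predictions sit on the diagonal of the matrix."""
    correct = matrix[0][0] + matrix[1][1]
    total = sum(sum(row) for row in matrix)
    return correct / total

# Made-up labels, not the employee purchase data
actual = [0, 0, 0, 1, 1, 1, 1, 0]
predicted = [0, 0, 1, 1, 1, 0, 1, 0]

m = confusion_matrix_2x2(actual, predicted)
acc = accuracy_from_matrix(m)
print(m)
print("Accuracy:", acc, "Error:", 1 - acc)
```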

We will now use the model coefficients to find the slope and intercept of the decision boundary.

In [None]:
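# The boundary is where Intercept + b_Age*Age + b_Exp*Experience = 0, solved for Experience.
# .iloc selects by position; plain [1] on a labelled Series is deprecated in recent pandas.
slope1=fitted1.params.iloc[1]/(-fitted1.params.iloc[2])
intercept1=fitted1.params.iloc[0]/(-fitted1.params.iloc[2])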
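
The slope and intercept formulas come from the fact that the model predicts exactly 0.5 when b0 + b1*Age + b2*Experience = 0. Solving for Experience gives Experience = (b1/(-b2))*Age + b0/(-b2). The sketch below checks this with hypothetical coefficients (not the fitted values from this notebook); `boundary_line` and `probability` are illustrative helper names.

```python
import math

def boundary_line(b0, b1, b2):
    """Slope and intercept of the line b0 + b1*x1 + b2*x2 = 0, solved for x2."""
    slope = b1 / (-b2)
    intercept = b0 / (-b2)
    return slope, intercept

def probability(b0, b1, b2, x1, x2):
    """Logistic regression prediction for one point."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x1 + b2 * x2)))

# Hypothetical coefficients chosen only for illustration
b0, b1, b2 = -10.0, 0.5, -0.25
slope, intercept = boundary_line(b0, b1, b2)

# Any point on the line should get a predicted value of 0.5
x1 = 30.0
x2_on_line = slope * x1 + intercept
p_on_line = probability(b0, b1, b2, x1, x2_on_line)
print(slope, intercept, p_on_line)
```

Points on one side of this line get predicted values above 0.5 and points on the other side get values below 0.5, which is why the line acts as the decision boundary.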

Finally, we will draw the decision boundary for this logistic regression model.

In [None]:
fig = plt.figure()
ax1 = fig.add_subplot(111)
plt.rcParams["figure.figsize"] = (8,6)
plt.title('Decision Boundary', fontsize=20)

ax1.scatter(Emp_Purchase1.Age[Emp_Purchase1.Purchase==0],Emp_Purchase1.Experience[Emp_Purchase1.Purchase==0], s=100, c='b', marker="o", label='Purchase 0')
ax1.scatter(Emp_Purchase1.Age[Emp_Purchase1.Purchase==1],Emp_Purchase1.Experience[Emp_Purchase1.Purchase==1], s=100, c='r', marker="x", label='Purchase 1')
ax1.set_xlabel('Age',fontsize=15)
ax1.set_ylabel('Experience',fontsize=15)

plt.xlim(min(Emp_Purchase1.Age), max(Emp_Purchase1.Age))
plt.ylim(min(Emp_Purchase1.Experience), max(Emp_Purchase1.Experience))
plt.legend(loc='upper left');

x_min, x_max = ax1.get_xlim()
ax1.plot([0, x_max], [intercept1, x_max*slope1+intercept1])

plt.show()

From the output, we can see the decision boundary created by the logistic regression line. As expected, the decision boundary is between the two classes. That concludes this section. Finally, the takeaway is, “Every logistic regression line creates a decision boundary; it looks like a straight line between two classes.” 

## Multiple decision boundaries
The creation of the decision boundary works correctly when the two classes in the target variable are separable with a straight line. Not every dataset has such a clear separating boundary. If the data cannot be divided into two classes using a single decision boundary, then a logistic regression line may fail. In all the cases where the separating boundary is non-linear, or where we need more than one decision boundary, logistic regression fails. In the above example, we considered a subset of the data. We will now look at the full data. The code below plots the data.

### Problem

In [None]:
fig = plt.figure()
ax1 = fig.add_subplot(111)
plt.rcParams["figure.figsize"] = (8,6)
plt.title('Age, Experience  vs Purchase - Overall Data', fontsize=20)


ax1.scatter(Emp_Purchase_raw.Age[Emp_Purchase_raw.Purchase==0],Emp_Purchase_raw.Experience[Emp_Purchase_raw.Purchase==0], s=100, c='b', marker="o", label='Purchase 0')
ax1.scatter(Emp_Purchase_raw.Age[Emp_Purchase_raw.Purchase==1],Emp_Purchase_raw.Experience[Emp_Purchase_raw.Purchase==1], s=100, c='r', marker="x", label='Purchase 1')
ax1.set_xlabel('Age',fontsize=15)
ax1.set_ylabel('Experience',fontsize=15)

plt.xlim(min(Emp_Purchase_raw.Age), max(Emp_Purchase_raw.Age))
plt.ylim(min(Emp_Purchase_raw.Experience), max(Emp_Purchase_raw.Experience))
plt.legend(loc='upper left');
plt.show()

We will now force-fit a logistic regression line to this data and try to draw the decision boundary. In the earlier case, we could easily guess where the decision boundary would end up; now we cannot guess it. A single logistic regression line does not work here. We will force-fit one anyway and see the results.

In [None]:
model = sm.logit(formula='Purchase ~ Age+Experience', data=Emp_Purchase_raw)
fitted = model.fit()
print(fitted.summary2())

Getting slope and intercept of the logistic line

In [None]:
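# .iloc selects by position; plain [1] on a labelled Series is deprecated in recent pandas
slope=fitted.params.iloc[1]/(-fitted.params.iloc[2])
intercept=fitted.params.iloc[0]/(-fitted.params.iloc[2])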

In [None]:
predicted_values=fitted.predict(Emp_Purchase_raw[["Age"]+["Experience"]])
predicted_values[1:10]

In [None]:
threshold=0.5
threshold

In [None]:
import numpy as np
predicted_class=np.zeros(predicted_values.shape)
predicted_class[predicted_values>threshold]=1

In [None]:
predicted_class[1:10]

Creating confusion matrix

In [None]:
from sklearn.metrics import confusion_matrix as cm
ConfusionMatrix = cm(Emp_Purchase_raw[['Purchase']],predicted_class)
print(ConfusionMatrix)
accuracy=(ConfusionMatrix[0,0]+ConfusionMatrix[1,1])/sum(sum(ConfusionMatrix))
print(accuracy)

In [None]:
error=1-accuracy
error

In [None]:
fig = plt.figure()
ax1 = fig.add_subplot(111)
plt.rcParams["figure.figsize"] = (8,6)
plt.title('Decision Boundary - Overall Data', fontsize=20)

ax1.scatter(Emp_Purchase_raw.Age[Emp_Purchase_raw.Purchase==0],Emp_Purchase_raw.Experience[Emp_Purchase_raw.Purchase==0], s=100, c='b', marker="o", label='Purchase 0')
ax1.scatter(Emp_Purchase_raw.Age[Emp_Purchase_raw.Purchase==1],Emp_Purchase_raw.Experience[Emp_Purchase_raw.Purchase==1], s=100, c='r', marker="x", label='Purchase 1')
plt.xlim(min(Emp_Purchase_raw.Age), max(Emp_Purchase_raw.Age))
plt.ylim(min(Emp_Purchase_raw.Experience), max(Emp_Purchase_raw.Experience))
plt.legend(loc='upper left');
ax1.set_xlabel('Age',fontsize=15)
ax1.set_ylabel('Experience',fontsize=15)

x_min, x_max = ax1.get_xlim()
ax1.plot([0, x_max], [intercept, x_max*slope+intercept],linewidth=5, c='r')
plt.show()

From the output, we can see that the decision boundary is at the bottom left side, near the origin. It is
clear that this decision boundary cannot separate the classes.

### Solution
If there are multiple decision boundaries in the data, then directly predicting the target with the input variables does not work. By looking at the data, we can see some patterns, and we are sure that we can have a better classification model. In the overall data plot, if we consider only region-1 (R1), then logistic regression works perfectly to separate the two classes. In the same way, if we consider only region-2 (R2), logistic regression again does a good job of separating the classes. We will build logistic regression model-1 for region-1. The predicted values from that model will be our first intermediate output, h1. Similarly, we will build another model for region-2; its predicted values will be our second intermediate output, h2. Finally, we will use these intermediate outputs h1 and h2 to predict y. Instead of building one logistic regression line, x1, x2 vs. y, we are now building three: x1, x2 vs. h1; x1, x2 vs. h2; and finally h1, h2 vs. y. Since the regions differ, the two intermediate models h1 and h2 are fitted on different values of x1 and x2.
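
The layered idea can be sketched with hand-picked weights before fitting anything. The example below is an illustration under stated assumptions, not the fitted models of this notebook: class-1 points are assumed to lie in a band between the hypothetical lines x1 + x2 = 4 and x1 + x2 = 10, and all weights are chosen by hand.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def h1(x1, x2):
    # First intermediate model: close to 1 above the line x1 + x2 = 4
    return sigmoid(5.0 * (x1 + x2 - 4.0))

def h2(x1, x2):
    # Second intermediate model: close to 1 below the line x1 + x2 = 10
    return sigmoid(-5.0 * (x1 + x2 - 10.0))

def predict(x1, x2):
    # Final model: sees only h1 and h2, and is close to 1 when both are close to 1
    return sigmoid(20.0 * (h1(x1, x2) + h2(x1, x2) - 1.5))

# One point below the band, one inside it, one above it
for point in [(1.0, 1.0), (3.0, 4.0), (8.0, 6.0)]:
    p = predict(*point)
    print(point, round(p, 3), int(p > 0.5))
```

No single straight line in x1, x2 separates the band from the two outer regions, but one straight line in h1, h2 does. That is the effect the three-model construction relies on.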

### Building intermediate output models

**h1 model**

In [None]:
Emp_Purchase1=Emp_Purchase_raw[Emp_Purchase_raw.Sample_Set<3]
model1 = sm.logit(formula='Purchase ~ Age+Experience', data=Emp_Purchase1)
fitted1 = model1.fit()

In [None]:
Emp_Purchase_raw['h1']=fitted1.predict(Emp_Purchase_raw[["Age"]+["Experience"]])

**h2 model**

In [None]:
Emp_Purchase2=Emp_Purchase_raw[Emp_Purchase_raw.Sample_Set>1]
model2 = sm.logit(formula='Purchase ~ Age+Experience', data=Emp_Purchase2)
fitted2 = model2.fit(method="bfgs")

In [None]:
Emp_Purchase_raw['h2']=fitted2.predict(Emp_Purchase_raw[["Age"]+["Experience"]])

In [None]:
print(Emp_Purchase_raw[['Age', 'Experience','h1','h2','Purchase']])

In the above output, we can see the values of h1 and h2. These are the predictions made from the
logistic regression line. Before we go ahead with model building, we will plot h1,h2 vs. target
variable. Below code helps us in plotting the target against h1 and h2

In [None]:
fig = plt.figure()
ax = fig.add_subplot(111)
plt.rcParams["figure.figsize"] = (8,6)
plt.title('h1, h2 vs target ', fontsize=20)

ax.scatter(Emp_Purchase_raw.h1[Emp_Purchase_raw.Purchase==0],Emp_Purchase_raw.h2[Emp_Purchase_raw.Purchase==0], s=100, c='b', marker="o", label='Purchase 0')
ax.scatter(Emp_Purchase_raw.h1[Emp_Purchase_raw.Purchase==1],Emp_Purchase_raw.h2[Emp_Purchase_raw.Purchase==1], s=100, c='r', marker="x", label='Purchase 1')
ax.set_xlabel('h1',fontsize=15)
ax.set_ylabel('h2',fontsize=15)

plt.xlim(min(Emp_Purchase_raw.h1), max(Emp_Purchase_raw.h1)+0.2)
plt.ylim(min(Emp_Purchase_raw.h2), max(Emp_Purchase_raw.h2)+0.2)

plt.legend(loc='lower left');
plt.show()

In the plot, we can see that the input variables h1 and h2 can classify the target variable y. We can draw
one straight line that separates class-0 and class-1. If we build a logistic regression line, it may
appear in the top right corner as a diagonal line. Let us build the model and draw the decision boundary.

In [None]:
model_combined = sm.logit(formula='Purchase ~ h1+h2', data=Emp_Purchase_raw)
fitted_combined = model_combined.fit(method="bfgs")
print(fitted_combined.summary())

**Logistic Regression model with intermediate outputs as input**

In [None]:
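# .iloc selects by position; plain [1] on a labelled Series is deprecated in recent pandas
slope_combined=fitted_combined.params.iloc[1]/(-fitted_combined.params.iloc[2])
intercept_combined=fitted_combined.params.iloc[0]/(-fitted_combined.params.iloc[2])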

Finally draw the decision boundary for this logistic regression model

In [None]:
fig = plt.figure()
ax2 = fig.add_subplot(111)
plt.rcParams["figure.figsize"] = (8,7)
plt.title('h1, h2 vs target ', fontsize=20)

ax2.scatter(Emp_Purchase_raw.h1[Emp_Purchase_raw.Purchase==0],Emp_Purchase_raw.h2[Emp_Purchase_raw.Purchase==0], s=100, c='b', marker="o", label='Purchase 0')
ax2.scatter(Emp_Purchase_raw.h1[Emp_Purchase_raw.Purchase==1],Emp_Purchase_raw.h2[Emp_Purchase_raw.Purchase==1], s=100, c='r', marker="x", label='Purchase 1')
ax2.set_xlabel('h1',fontsize=15)
ax2.set_ylabel('h2',fontsize=15)

plt.xlim(min(Emp_Purchase_raw.h1), max(Emp_Purchase_raw.h1)+0.2)
plt.ylim(min(Emp_Purchase_raw.h2), max(Emp_Purchase_raw.h2)+0.2)

plt.legend(loc='lower left');

x_min, x_max = ax2.get_xlim()
y_min,y_max=ax2.get_ylim()
ax2.plot([x_min, x_max], [x_min*slope_combined+intercept_combined, x_max*slope_combined+intercept_combined],linewidth=4)
plt.show()

The decision boundary is as we expected. We have now solved the problem of multiple decision
boundaries using the intermediate output models. We have transformed the x1, x2 vs. y data into a
different space of h1, h2 vs. y. In the first case, x1, x2 vs. y, we could not separate the classes with a
straight-line decision boundary. When these input variables are transformed into intermediate variables, we can
now find the separating boundary.

**Accuracy and error of the combined model**

In [None]:
predicted_values=fitted_combined.predict(Emp_Purchase_raw[["h1"]+["h2"]])
predicted_values[1:10]

In [None]:
threshold=0.5
threshold

In [None]:
import numpy as np
predicted_class=np.zeros(predicted_values.shape)
predicted_class[predicted_values>threshold]=1

In [None]:
from sklearn.metrics import confusion_matrix as cm
ConfusionMatrix = cm(Emp_Purchase_raw[['Purchase']],predicted_class)
print("ConfusionMatrix\n", ConfusionMatrix)
accuracy=(ConfusionMatrix[0,0]+ConfusionMatrix[1,1])/sum(sum(ConfusionMatrix))
print("accuracy\n", accuracy)

We finally classified y with high accuracy. We could achieve above 90% accuracy with three models.
If the data is more non-linear, then we may need more of these intermediate models in the
first layer. We need two decision boundaries for the classification of this data. The code below helps us
in visualizing the decision boundaries from the two intermediate output models h1 and h2.

In [None]:
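# .iloc selects by position; plain [1] on a labelled Series is deprecated in recent pandas
slope1=fitted1.params.iloc[1]/(-fitted1.params.iloc[2])
intercept1=fitted1.params.iloc[0]/(-fitted1.params.iloc[2])

In [None]:
slope2=fitted2.params.iloc[1]/(-fitted2.params.iloc[2])
intercept2=fitted2.params.iloc[0]/(-fitted2.params.iloc[2])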

In [None]:
fig = plt.figure()
ax1 = fig.add_subplot(111)
plt.rcParams["figure.figsize"] = (8,6)
plt.title('Age, Experience  vs Purchase - Overall Data', fontsize=20)


ax1.scatter(Emp_Purchase_raw.Age[Emp_Purchase_raw.Purchase==0],Emp_Purchase_raw.Experience[Emp_Purchase_raw.Purchase==0], s=100, c='b', marker="o", label='Purchase 0')
ax1.scatter(Emp_Purchase_raw.Age[Emp_Purchase_raw.Purchase==1],Emp_Purchase_raw.Experience[Emp_Purchase_raw.Purchase==1], s=100, c='r', marker="x", label='Purchase 1')
ax1.set_xlabel('Age',fontsize=15)
ax1.set_ylabel('Experience',fontsize=15)

plt.xlim(min(Emp_Purchase_raw.Age), max(Emp_Purchase_raw.Age))
plt.ylim(min(Emp_Purchase_raw.Experience), max(Emp_Purchase_raw.Experience))

x_min, x_max = ax1.get_xlim()
ax1.plot([0, x_max], [intercept1, x_max*slope1+intercept1],linewidth=4)
ax1.plot([0, x_max], [intercept2, x_max*slope2+intercept2],linewidth=4)

plt.legend(loc='upper left');
plt.show()

From the output, we can see that the two models in the layer-1 do an excellent job of finding the
overall boundaries between the two classes. That completes our final point “We used layered
modeling approach and created intermediate outputs to solve the problem of non-linear data or a
dataset with multiple decision boundaries.”

## Neural Network algorithm
* Step-1: Random initialization
* Step-2: Activation and feed forward
* Step-3: Error calculation and back propagation
* Step-4: Weights updating
* Step-5: Stopping criteria
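
The five steps can be sketched as a tiny NumPy network with one hidden layer. This is a minimal illustration, not the neurolab model used later in this chapter. The XOR toy data, the four hidden nodes, the learning rate, and the tolerance are all assumptions chosen for the example, and constant factors of the gradient are folded into the learning rate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data (XOR): an assumption for illustration, not the chapter's dataset
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])

# Step-1: random initialization of the weights (biases start at zero)
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))

learning_rate, max_epochs, tolerance = 0.5, 5000, 1e-3
losses = []
for epoch in range(max_epochs):
    # Step-2: activation and feed forward
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)

    # Step-3: error calculation and back propagation
    loss = float(np.mean((y - y_hat) ** 2))
    losses.append(loss)
    delta_out = (y_hat - y) * y_hat * (1.0 - y_hat)
    delta_hidden = (delta_out @ W2.T) * h * (1.0 - h)

    # Step-4: weights updating
    W2 -= learning_rate * (h.T @ delta_out)
    b2 -= learning_rate * delta_out.sum(axis=0, keepdims=True)
    W1 -= learning_rate * (X.T @ delta_hidden)
    b1 -= learning_rate * delta_hidden.sum(axis=0, keepdims=True)

    # Step-5: stopping criteria (small error or epoch limit reached)
    if loss < tolerance:
        break

print("first loss:", round(losses[0], 4), "last loss:", round(losses[-1], 4))
```

The loop stops either when the error is small enough or when the epoch limit is reached, whichever comes first. How low the error gets depends on the random starting weights.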


## Gradient Descent


In [None]:
def lr_gd(X, y, w1, w0, learning_rate, epochs):
    """Fit y = w1*X + w0 by gradient descent on the sum of squared errors."""
    for i in range(epochs):
        # Predictions and error with the current weights
        y_pred = (w1 * X) + w0
        error = sum([k**2 for k in (y-y_pred)])

        ## Gradients (the constant factor 2 is absorbed into the learning rate)
        w0_gradient = -sum(y - y_pred)
        w1_gradient = -sum(X * (y - y_pred))

        ## Weight updating: move against the gradient
        w0 = w0 - (learning_rate * w0_gradient)
        w1 = w1 - (learning_rate * w1_gradient)

        print("epoch", i, "error =>", round(error,2), "w0 => ", round(w0,2), "w1 => ",round(w1,2))
    return error, w0, w1

We will use the above function to solve the regression problem below. We will generate some
random data using the formula y = 20x + 10. This means that after fitting the regression line, we should get
weights close to w0 = 10 and w1 = 20 as the output. Below is the code for data creation and solving it.

In [None]:
x_data=np.random.random(10)
y_data= x_data*20 + 10 

In [None]:
w0_init=5
w1_init=10

In [None]:
lr_gd(X=x_data, y=y_data, w1=w1_init, w0=w0_init, learning_rate=0.01, epochs=600)	 

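We can see from the output that at 600 epochs the error is almost zero and the final weights given
by GD are [10.21, 19.29], close to the true values [10, 20]. Because the data is generated randomly without a fixed seed, the exact numbers change from run to run.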
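
One way to check the gradient descent result is to compare it with a closed-form least-squares fit. The sketch below is an illustration with a few assumptions. It uses evenly spaced inputs instead of random ones so that the run is repeatable, and it uses a larger learning rate and more epochs than the cell above so that the weights fully converge. The helper `lr_gd_quiet` is a hypothetical variant of `lr_gd` that applies the same update rule without printing.

```python
import numpy as np

def lr_gd_quiet(X, y, w1, w0, learning_rate, epochs):
    """Same update rule as lr_gd above, without the per-epoch printing."""
    for _ in range(epochs):
        y_pred = (w1 * X) + w0
        w0_gradient = -np.sum(y - y_pred)
        w1_gradient = -np.sum(X * (y - y_pred))
        w0 = w0 - learning_rate * w0_gradient
        w1 = w1 - learning_rate * w1_gradient
    return w0, w1

# Evenly spaced inputs (an assumption), generated with y = 20x + 10
x_fixed = np.linspace(0.0, 1.0, 10)
y_fixed = x_fixed * 20 + 10

w0_gd, w1_gd = lr_gd_quiet(x_fixed, y_fixed, w1=10, w0=5,
                           learning_rate=0.05, epochs=5000)

# Closed-form least-squares fit; np.polyfit returns [slope, intercept] for deg=1
w1_ls, w0_ls = np.polyfit(x_fixed, y_fixed, deg=1)

print("gradient descent:", round(w0_gd, 3), round(w1_gd, 3))
print("least squares   :", round(w0_ls, 3), round(w1_ls, 3))
```

Both approaches should recover an intercept near 10 and a slope near 20 on this noise-free data.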

## Recognizing Handwritten Digits
In this case study, we will take images of handwritten digits and build a neural network model
that takes these images as input and predicts the digit inside each image. Before that, we look at
how an image is stored as numbers. Below is the code for importing an image and printing its
pixel values.

In [None]:
#x=plt.imread(r'/content/drive/My Drive/DataSets/Chapter-8/Chapter-8/datasets/Sample_images/Marketvegetables.jpg')

In [None]:
#Image importing
import matplotlib.pyplot as plt
import urllib.request  

#read the image
urllib.request.urlretrieve((github_link+"/Sample_images/Marketvegetables.jpg"), "Marketvegetables.jpg")
x=plt.imread('Marketvegetables.jpg')

In [None]:
plt.rcParams["figure.figsize"] = (12,8)
plt.imshow(x)

In [None]:
print('Shape of the image',x.shape) 
print(x)

From the output, we can see that the image has 2400 rows; this is the number of pixels along the
height of the image. The 1600 columns are the number of pixels along the width. The
depth is 3, which corresponds to the RGB intensities. If we print the pixel values of the
image, we see the values row by row, and each cell holds three numbers between 0 and
255.

In our dataset, we are considering grayscale images. Grayscale images have a height and a width, but
they do not have three values in the depth dimension; there is just one number per pixel.
The objective in this case study is to take the grayscale image as input and predict
the digit inside it using the neural network model. Taking images as input simply means
taking the pixel intensity numbers as input to predict the digit.
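
The difference between a colour image and a grayscale image is easy to see on a tiny synthetic array. The sketch below is only an illustration. The pixel values are made up, and averaging the three channels is one simple way to get a single intensity per pixel (an assumption; other weightings exist).

```python
import numpy as np

# A tiny synthetic colour image: 2 rows, 3 columns, 3 channels (R, G, B)
rgb = np.array([
    [[255, 0, 0], [0, 255, 0], [0, 0, 255]],
    [[0, 0, 0], [255, 255, 255], [30, 60, 90]],
], dtype=np.uint8)
print(rgb.shape)   # rows, columns, channels

# Plain channel average: one intensity per pixel instead of three
gray = rgb.mean(axis=2)
print(gray.shape)  # rows, columns
print(gray)

# Flattening turns the pixel grid into one row of input variables
flat = gray.reshape(-1)
print(flat.shape)
```

In the same way, a 16X16 grayscale digit image becomes a single row of 256 input values.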

### Data
Each image in the data we are considering here has 16X16 pixels, which is 256 pixels overall. Even for
such a small image, we have to work with 256 input variables in our model.

Our data is a standard dataset known as USPS data; USPS stands for United States Postal Service.
This data contains 7,291 scanned images of handwritten digits, and these images are converted to
CSV files by taking the pixel intensities.

AT&T research labs shared the USPS data set. The other standard digits dataset is MNIST data with
28X28 pixels and 60,000 records. We will use that data later. We will use USPS data in this case
study. We can download these datasets from Dr. Yann LeCun's website:
http://yann.lecun.com/exdb/mnist/
The code below imports the data.

USPS Data importing

In [None]:
#digits_data_raw = np.loadtxt(r"/content/drive/My Drive/DataSets/Chapter-8/Chapter-8/datasets/USPS/USPS_train.txt")
digits_data_raw = np.loadtxt(github_link+"/USPS/USPS_train.txt")

The input data is in NumPy array format. We convert it into a DataFrame for easier handling.


In [None]:
digits_data=pd.DataFrame(digits_data_raw)

In [None]:
print(digits_data.shape)

In [None]:
print(digits_data.head())

As expected, there are 256 pixel values. The extra column is the target column: the label
associated with each image. For example, the label for the image in the first row is 6. The labels are
stored in the first column. The code below gives us the frequency of each label in the dataset.

In [None]:
print(digits_data[0:][0].value_counts())

From the above output, we can see that there are more than 1,000 images of the digit 0 and
around 1,000 images of the digit 1. We will now draw a few images.
While creating this dataset, each 16X16 image was flattened into a single row of the data. We
need to take a row from this data and rebuild the pixel matrix of size 16X16. The code below helps us
in drawing the images.

First image

In [None]:
i=0
data_row=digits_data_raw[i][1:]
pixels=np.array(data_row).reshape(16,16)  # rebuild the 16x16 pixel grid
plt.title(f"Row number {i}", fontsize=20)
plt.imshow(pixels, cmap='Greys')

Second image

In [None]:
i=1
data_row=digits_data_raw[i][1:]
pixels=np.array(data_row).reshape(16,16)  # rebuild the 16x16 pixel grid
plt.title(f"Row number {i}", fontsize=20)
plt.imshow(pixels, cmap='Greys')

In the above code, ‘i’ is the row number. By changing the row number (i) in the above code, we
can draw a few more images.

In [None]:
i=5000 
data_row=digits_data_raw[i][1:]
pixels=np.array(data_row).reshape(16,16)  # rebuild the 16x16 pixel grid
plt.title(f"Row number {i}", fontsize=20)
plt.imshow(pixels, cmap='Greys')

We will now prepare the data for model building

Train and Test data creation

In [None]:
X=digits_data.drop(digits_data.columns[[0]], axis=1)
y=digits_data[0:][0]

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y ,test_size=0.2, random_state=33)

Shape of the data

In [None]:
print("X_train shape", X_train.shape)
print("y_train shape", y_train.shape)
print("X_test shape", X_test.shape)
print("y_test shape", y_test.shape)

### Model building
Before building the model, we need an essential data transformation. Until now, we discussed just one
binary output in our target variable. Here our output is not binary. There are ten classes in our
output, because an image can contain any digit from 0 to 9. The output is multi-class, and we are going
to build a multi-class classification model. We can extend our binary target classification concept to
multi-class classification.

There are ten classes in the target, and we are going to create one binary variable for each class. It is a
simple one-hot encoding on the target variable. The below code helps us in creating the one-hot
encoded variables for the target.

Creating multiple binary columns for multiple outputs

In [None]:
digit_labels=pd.DataFrame()

Convert target into onehot encoding

In [None]:
digit_labels = pd.get_dummies(y_train)

Let us look at our newly created labels data

In [None]:
digit_labels.head(10)

In the above output we can see ten newly created binary variables derived from one target column.
Since there are ten nodes in the output layer, for each new data point, we will get ten predicted
values. We will assign the class that has the maximum probability. We now have the input layer with
256 nodes and an output layer with ten nodes. We are now ready to build the model. Before
building the neural network model, we need to create a list with minimum and maximum values of
all the input variables. We are going to use this list later on in the model. Below is the code for
creating a list with 256 pairs of values one pair for each of the input variables.

Getting the minimum and maximum of each column of X_train into a list

In [None]:
min_max_all_cols=[[X_train[i][0:].min(), X_train[i][0:].max()] for i in range(1,X_train.shape[1]+1)]

In [None]:
print(len(min_max_all_cols))
print(min_max_all_cols)


Below is the code for building the neural network model.

**Configure the network**

In [None]:
import neurolab as nl
net = nl.net.newff(minmax=min_max_all_cols,size=[20,10],transf=[nl.trans.LogSig()]*2)
#Training method is Resilient Backpropagation method
net.trainf = nl.train.train_rprop 

**Train the network**

In [None]:
net.train(X_train, digit_labels, show=1, epochs=300)

**Code explanation**

* **newff():** Function to configure the neural network.
* **minmax:** Takes a list of lists as input. We need to supply the minimum and maximum value of each input variable. This helps the algorithm with weight initialization.
* **size=[20,10]:** Takes a list as input. Mention the number of nodes in each layer except the input layer: [nodes in hidden layer 1, nodes in hidden layer 2, …, nodes in hidden layer k, nodes in output layer]. Here, [20,10] means one hidden layer with 20 nodes and an output layer with ten nodes.
* **transf=[nl.trans.LogSig()]*2:** ‘transf’ is the activation function parameter. LogSig() is the sigmoid function. For regression output we can use “PureLin”. The *2 denotes the two activations in this case, one from input to hidden and one from hidden to output. If we have two hidden layers, then we need to mention *3.
* **net.train:** Function to fit the model. Takes the training data as input.
* **show:** show=1 shows the error value in each epoch. show=0 builds the model directly without showing the errors.
* **epochs:** The number of epochs. One epoch is one full run over the data. Mention epochs as a number between 50 and 500.

We can now print the model. The model is nothing but a set of weights. Below is the code for
fetching the weights from the model.

In [None]:
print(net.layers[0].np['w'])
print(net.layers[0].np['b'])

In [None]:
print(net.layers[1].np['w'])
print(net.layers[1].np['b'])

The code below gives us an idea of the count of weights.

In [None]:
print(net.layers[0].np['w'].shape)
print(net.layers[0].np['b'].shape)
print(net.layers[1].np['w'].shape)
print(net.layers[1].np['b'].shape)
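
The total number of weights and biases can also be worked out by hand from the layer sizes. The sketch below is a small illustration for a fully connected network. `count_parameters` is a hypothetical helper and is not part of neurolab.

```python
def count_parameters(n_inputs, layer_sizes):
    """Count weights and biases of a fully connected feed-forward network."""
    total = 0
    previous = n_inputs
    for nodes in layer_sizes:
        total += previous * nodes   # one weight per connection into this layer
        total += nodes              # one bias per node in this layer
        previous = nodes
    return total

# 256 inputs, one hidden layer with 20 nodes, output layer with 10 nodes
print(count_parameters(256, [20, 10]))
```

For this network the count is 256*20 + 20 for the hidden layer plus 20*10 + 10 for the output layer, which is 5,350 values in total.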

The model building is completed. We have to use these weights to get the predictions for new data
points

### Deciding hidden nodes
There are several hyperparameters in the neural network model. The most impactful
hyperparameter is the number of hidden nodes. If this number is too high, then the model will
overfit. If it is too low, then the model will underfit. We need to look at the train and
test data accuracies to fine-tune this parameter. We can use a binary search approach to fine-tune
this parameter.
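
The text does not spell out the binary search, so the sketch below is only one possible reading of it. It is a hypothetical helper, not code from this notebook. It assumes a caller-supplied function `evaluate(n_hidden)` that trains a model and returns the train and test accuracies. Low train accuracy is treated as underfitting (try more nodes), and a large gap between train and test accuracy as overfitting (try fewer nodes). The thresholds and the `fake_evaluate` numbers are made up purely to exercise the logic.

```python
def tune_hidden_nodes(evaluate, low, high, min_train_acc=0.9, max_gap=0.05):
    """Binary-search-style scan over the number of hidden nodes."""
    while low <= high:
        mid = (low + high) // 2
        train_acc, test_acc = evaluate(mid)
        if train_acc < min_train_acc:
            low = mid + 1       # underfitting: try more hidden nodes
        elif train_acc - test_acc > max_gap:
            high = mid - 1      # overfitting: try fewer hidden nodes
        else:
            return mid          # acceptable train accuracy and gap
    return None                 # no setting met both conditions

def fake_evaluate(n_hidden):
    # Made-up accuracies, only to exercise the search logic
    train_acc = min(1.0, 0.5 + n_hidden / 40)
    test_acc = train_acc - max(0.0, (n_hidden - 30) / 100)
    return train_acc, test_acc

best = tune_hidden_nodes(fake_evaluate, low=1, high=100)
print(best)
```

In practice, `evaluate` would train the network with the given number of hidden nodes and compute both accuracies. Each call is a full model fit, which is why halving the search range is attractive.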

### Model prediction and validation
Each new data point in the test data gives us ten probabilities, one probability for each digit. We will
take the final result as the digit with maximum probability.
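
The link between the one-hot encoded target and the "maximum probability" rule can be sketched with NumPy. The labels and the output-layer values below are made up for illustration; they are not outputs of the trained network.

```python
import numpy as np

labels = np.array([6, 0, 3])

# One-hot encoding: each row has a 1 in the column of its digit
one_hot = np.eye(10)[labels]
print(one_hot.shape)

# Decoding a one-hot row: the position of the largest value is the digit
decoded = one_hot.argmax(axis=1)
print(decoded)

# Made-up output-layer values for two images (ten values each)
scores = np.array([
    [0.01, 0.02, 0.00, 0.05, 0.01, 0.03, 0.90, 0.02, 0.01, 0.04],
    [0.10, 0.70, 0.05, 0.02, 0.03, 0.01, 0.04, 0.02, 0.02, 0.01],
])
predicted_digits = scores.argmax(axis=1)
print(predicted_digits)
```

The same idea is applied to the network output in the cells that follow, where `idxmax(axis=1)` picks the column with the highest value in each row.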

Prediction on test data

In [None]:
predicted_values = net.sim(X_test)
predicted=pd.DataFrame(predicted_values)
print(round(predicted.head(10),3))

Converting predicted probabilities into numbers

In [None]:
predicted_number=predicted.idxmax(axis=1)
print(predicted_number.head(15))

In the above output, we can see ten probabilities for each data point. We convert them to single
digits based on the class with the highest probability. For example, the first record has a 0.998
probability for class-6. The last data point has a 1.0 probability for class-3. We can now create the
confusion matrix and calculate the accuracy.

In [None]:
ConfusionMatrix = cm(y_test,predicted_number)
print("ConfusionMatrix on test data \n", ConfusionMatrix)

In [None]:
accuracy=np.trace(ConfusionMatrix)/sum(sum(ConfusionMatrix))
print("Test Accuracy", accuracy)

We can see that the accuracy of the test data is 83%. Let us use this model to get some predictions
on a few data points from test data.

In [None]:
i=623
random_sampel_data=digits_data_raw[[i]]
random_sampel_data1=pd.DataFrame(random_sampel_data)
X_sample=random_sampel_data1.drop(random_sampel_data1.columns[[0]], axis=1)

In [None]:
predicted_values = net.sim(X_sample)
predicted=pd.DataFrame(predicted_values)
predicted_number=predicted.idxmax(axis=1)
predicted_number

In [None]:
data_row=random_sampel_data[0][1:]
pixels=np.array(data_row).reshape(16,16)  # rebuild the 16x16 pixel grid
plt.rcParams["figure.figsize"] = (7,5)
plt.title(f"Row = {i}, Predicted digit = {predicted_number[0]}", fontsize=20)
plt.imshow(pixels, cmap='Greys')

The above code takes one row from the full dataset (row i=623 here) and gives us the predicted digit along with the image. We can change i to try other rows.