Artificial Neural Networks
================

------
**Deep Learning for Computer Vision**<br>
(c) Research Group CAMMA, University of Strasbourg<br>
Website: http://camma.u-strasbg.fr/
-----

### About this notebook

- **Objectives**: 
  - Train and test simple ANNs
  - Experiment with underfitting / overfitting
  - Perform experiments on Spiral3 and MNIST
  

- **Instructions**:
  - To make the best use of this notebook, read the provided instructions and code, fill in the *#TODO* blocks, and run the code.
  - Download the MNIST dataset from https://seafile.unistra.fr/f/11b3075bb2df41cf8db2/?dl=1 (the loading cell below reads it as `mnist.pkl.gz` from Google Drive)
  

### Warm-up

Go to the [TensorFlow Playground](https://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=circle&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=4,2&seed=0.90143&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false&percTrainData_hide=true&discretize_hide=true&noise_hide=true&problem_hide=true)

1. Study the complexity of each of the 4 proposed classification datasets.

2. Experiment with the network structure and training parameters to classify each dataset using only the point coordinates as input features.

3. Study underfitting and overfitting by varying the network capacity (number of hidden layers and neurons) and the regularization rate.

4. If non-linear features are used as input, how much can you simplify the network structure and training process? 
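To illustrate point 4, here is a minimal NumPy sketch that is not part of the original exercise. It uses a hypothetical "circle" dataset similar in spirit to the Playground's (an inner disk labeled 0 surrounded by a ring labeled 1) and a single linear unit (logistic regression, no hidden layer). With only the raw coordinates the unit cannot separate the classes, but once the squared coordinates are added as inputs the boundary x² + y² = c becomes linear in the features, so no hidden layer is needed.

```python
import numpy as np

def make_circle_data(n=400, seed=0):
    """Hypothetical 'circle' data: inner disk (label 0) surrounded by a ring (label 1)."""
    rng = np.random.default_rng(seed)
    half = n // 2
    radius = np.concatenate([rng.uniform(0.0, 2.0, half),       # inner disk
                             rng.uniform(3.0, 5.0, n - half)])   # outer ring
    angle = rng.uniform(0.0, 2 * np.pi, n)
    X = np.c_[radius * np.cos(angle), radius * np.sin(angle)]
    y = np.r_[np.zeros(half), np.ones(n - half)]
    return X, y

def standardize(F):
    """Zero-mean, unit-variance columns so one learning rate suits all features."""
    return (F - F.mean(axis=0)) / F.std(axis=0)

def train_logistic(F, y, lr=0.5, num_iters=2000):
    """Full-batch gradient descent on the mean binary cross-entropy of one linear unit."""
    w, b = np.zeros(F.shape[1]), 0.0
    for _ in range(num_iters):
        p = 1.0 / (1.0 + np.exp(-(F @ w + b)))   # sigmoid probabilities
        err = p - y                               # gradient of the loss w.r.t. the logits
        w -= lr * F.T @ err / len(y)
        b -= lr * err.mean()
    return w, b

def accuracy(F, y, w, b):
    return np.mean(((F @ w + b) > 0) == y)

X, y = make_circle_data()
F_raw = standardize(X)                 # inputs: x, y only
F_sq = standardize(np.c_[X, X ** 2])   # inputs: x, y, x^2, y^2
acc_raw = accuracy(F_raw, y, *train_logistic(F_raw, y))
acc_sq = accuracy(F_sq, y, *train_logistic(F_sq, y))
print("linear unit, raw features:     %.2f" % acc_raw)
print("linear unit, squared features: %.2f" % acc_sq)
```

The same idea applies in the Playground by ticking the X₁² and X₂² input features and removing hidden layers.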

### Neural Networks for Classification of Spiral3 Toy Dataset

Import libraries:

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pickle
import gzip

Create and visualize the dataset:

In [None]:
def toy_spiral3(N=200, K=3, D=2):
  np.random.seed(0)
  #N: number of points per class
  #D: dimensionality
  #K: number of classes
  X = np.zeros((N*K,D))
  y = np.zeros(N*K, dtype='uint8')
  for j in range(K):
    ix = range(N*j,N*(j+1))
    r = np.linspace(0.0,1,N) # radius
    t = 5 + np.linspace(j*4,(j+1)*4,N) + np.random.randn(N)*0.3 # theta
    X[ix] = np.c_[r*np.sin(t), r*np.cos(t)]
    y[ix] = j
  return X,y
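For reference, the sampling performed by `toy_spiral3` can be written as follows, for class $j \in \{0,\dots,K-1\}$ and point index $i \in \{0,\dots,N-1\}$ within the class (this only restates the code above):

$$
\begin{aligned}
r_i &= \frac{i}{N-1}, \\
\theta_{j,i} &= 5 + 4j + \frac{4i}{N-1} + 0.3\,\varepsilon_{j,i}, \qquad \varepsilon_{j,i} \sim \mathcal{N}(0,1), \\
\mathbf{x}_{j,i} &= \left(r_i \sin\theta_{j,i},\; r_i \cos\theta_{j,i}\right), \qquad y_{j,i} = j.
\end{aligned}
$$

Each class sweeps 4 radians (more than half a turn) while its radius grows from 0 to 1, so the three arms interleave and a linear classifier cannot separate them well.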

In [None]:
def toy_plot(X,y):  
  fig = plt.figure()
  plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.Spectral)
  plt.xlim([-1.5,1.5])
  plt.ylim([-1.5,1.5])

In [None]:
np.random.seed(999) # note: toy_spiral3 re-seeds the generator with 0 internally, so the data are always the same
[X,y] = toy_spiral3(200)
toy_plot(X,y)

**Code for training and inference of a 2-layer neural network**

Identify the differences compared to the softmax classifier code from the previous notebook.
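As a reference for spotting these differences, the computations in `train` can be summarized as follows, with $X \in \mathbb{R}^{N \times D}$, a hidden layer of size $H$, $K$ classes, $Y$ the one-hot label matrix ($Y_{nk} = 1$ if $k = y_n$, else 0) and $\lambda$ = `reg_weight`. In the code, `dscores` corresponds to $G_S$ and `dhidden` (after the ReLU mask) to $G_Z$; each parameter is then updated as $W \leftarrow W - \eta\, \partial L / \partial W$ with $\eta$ = `learning_rate`. Compared to a linear softmax classifier, the new ingredients are the hidden ReLU layer and the extra backpropagation step through it.

$$
\begin{aligned}
Z &= X W_1 + b_1, \qquad A = \max(0, Z), \qquad S = A W_2 + b_2, \qquad P_{nk} = \frac{e^{S_{nk}}}{\sum_{k'} e^{S_{nk'}}}, \\
L &= -\frac{1}{N}\sum_{n=1}^{N} \log P_{n y_n} + \frac{\lambda}{2}\left(\lVert W_1\rVert_F^2 + \lVert W_2\rVert_F^2\right), \\
G_S &= \frac{\partial L}{\partial S} = \frac{1}{N}(P - Y), \qquad
\frac{\partial L}{\partial W_2} = A^\top G_S + \lambda W_2, \qquad
\frac{\partial L}{\partial b_2} = \mathbf{1}_N^\top G_S, \\
G_Z &= \frac{\partial L}{\partial Z} = \left(G_S W_2^\top\right) \odot \mathbb{I}[Z > 0], \qquad
\frac{\partial L}{\partial W_1} = X^\top G_Z + \lambda W_1, \qquad
\frac{\partial L}{\partial b_1} = \mathbf{1}_N^\top G_Z .
\end{aligned}
$$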

In [None]:
class NN2LClassifier:

  def __init__(self, num_hidden=100, nclasses=3, ndims=2):
    self.nclasses = nclasses
    self.ndims = ndims
    self.h = num_hidden # size of hidden layer
    self.W1 = None
    self.b1 = None
    self.W2 = None
    self.b2 = None

  # Train the classifier's parameters with full-batch gradient descent
  def train(self, X, y, learning_rate=1e-0, reg_weight=1e-3, num_iters=10000, verbose=True):
    # Hyperparameters
    # learning_rate: gradient descent step size
    # reg_weight: strength of the L2 regularization on W1 and W2
    N = X.shape[0] # number of data points
  
    # initialize parameters randomly
    self.W1 = 0.01 *np.random.randn(self.ndims,self.h) #NOTE: much faster if you remove 0.01 here
    self.b1 = np.zeros((1,self.h))
    self.W2 = 0.01 *np.random.randn(self.h,self.nclasses)
    self.b2 = np.zeros((1,self.nclasses))

    #Gradient descent 
    for i in range(num_iters):
  
      # evaluate class scores, [N x K]
      hidden = np.maximum(0, np.dot(X, self.W1) + self.b1) #ReLU activation
      scores = np.dot(hidden, self.W2) + self.b2
  
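      # compute the class probabilities; subtracting the row-wise max before
      # exponentiating avoids overflow and leaves the softmax unchanged
      expo = np.exp(scores - np.max(scores, axis=1, keepdims=True))
      softm = expo / np.sum(expo, axis=1, keepdims=True) # [N x K]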
  
      # compute the loss: average cross-entropy loss and regularization
      logs = -np.log(softm[range(N),y])
      data_loss = np.sum(logs)/N
      reg_loss = 0.5*reg_weight*np.sum(self.W1*self.W1) + 0.5*reg_weight*np.sum(self.W2*self.W2)
      loss = data_loss + reg_loss

      # compute the gradient on scores
      dscores = softm
      dscores[range(N),y] -= 1
      dscores /= N
  
      # backpropagate the gradient to the parameters
      # first backprop into parameters W2 and b2
      dW2 = np.dot(hidden.T, dscores)
      db2 = np.sum(dscores, axis=0, keepdims=True)
      # next backprop into hidden layer
      dhidden = np.dot(dscores, self.W2.T)
      # backprop the ReLU non-linearity
      dhidden[hidden <= 0] = 0
      # finally into W1,b1
      dW1 = np.dot(X.T, dhidden)
      db1 = np.sum(dhidden, axis=0, keepdims=True)
  
      # add regularization gradient contribution
      dW2 += reg_weight * self.W2
      dW1 += reg_weight * self.W1
  
      # perform a parameter update
      self.W1 += -learning_rate * dW1
      self.b1 += -learning_rate * db1
      self.W2 += -learning_rate * dW2
      self.b2 += -learning_rate * db2
      
      if verbose and (i % 50 == 0):   # print loss and training accuracy every 50 iterations
        print("iteration %d: loss=%f ; train_acc:%f" % (i, loss, self.accuracy(X,y)) )
      
    return self.accuracy(X,y)

  # predict the class of each input sample
  def predict(self,X):
    hidden = np.maximum(0, np.dot(X, self.W1) + self.b1) # ReLU hidden layer
    scores = np.dot(hidden, self.W2) + self.b2
    predicted_classes = np.argmax(scores, axis=1)
    return predicted_classes

  # compute the accuracy (fraction of correctly classified samples) on X
  def accuracy(self,X,y):
    predicted_classes = self.predict(X)
    accuracy = np.mean(predicted_classes == y)
    return accuracy
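A common way to validate hand-written backpropagation such as the one in `train` is a numerical gradient check: compare the analytic gradients with centered finite differences on a tiny random problem. The sketch below re-implements the same loss and gradients as standalone functions; the names `loss_and_grads` and `numerical_grad` are illustrative and not part of the notebook. When the backward pass is correct, the relative errors in float64 are typically very small (well below 1e-4).

```python
import numpy as np

def loss_and_grads(params, X, y, reg_weight):
    """Loss and analytic gradients of the 2-layer ReLU/softmax net, mirroring NN2LClassifier.train."""
    W1, b1, W2, b2 = params["W1"], params["b1"], params["W2"], params["b2"]
    N = X.shape[0]
    hidden = np.maximum(0, X @ W1 + b1)
    scores = hidden @ W2 + b2
    expo = np.exp(scores - scores.max(axis=1, keepdims=True))
    softm = expo / expo.sum(axis=1, keepdims=True)
    loss = (-np.log(softm[np.arange(N), y]).mean()
            + 0.5 * reg_weight * (np.sum(W1 * W1) + np.sum(W2 * W2)))

    dscores = softm.copy()
    dscores[np.arange(N), y] -= 1
    dscores /= N
    dhidden = dscores @ W2.T
    dhidden[hidden <= 0] = 0                 # backprop through the ReLU
    grads = {
        "W1": X.T @ dhidden + reg_weight * W1,
        "b1": dhidden.sum(axis=0, keepdims=True),
        "W2": hidden.T @ dscores + reg_weight * W2,
        "b2": dscores.sum(axis=0, keepdims=True),
    }
    return loss, grads

def numerical_grad(params, name, X, y, reg_weight, eps=1e-5):
    """Centered finite differences of the loss with respect to params[name]."""
    P = params[name]
    grad = np.zeros_like(P)
    for idx in np.ndindex(P.shape):
        old = P[idx]
        P[idx] = old + eps
        loss_plus, _ = loss_and_grads(params, X, y, reg_weight)
        P[idx] = old - eps
        loss_minus, _ = loss_and_grads(params, X, y, reg_weight)
        P[idx] = old                         # restore the original value
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
N, D, H, K = 5, 2, 4, 3                      # tiny problem: 5 points, 2 dims, 4 hidden units, 3 classes
X = rng.standard_normal((N, D))
y = rng.integers(0, K, size=N)
params = {"W1": rng.standard_normal((D, H)), "b1": rng.standard_normal((1, H)),
          "W2": rng.standard_normal((H, K)), "b2": rng.standard_normal((1, K))}

_, analytic = loss_and_grads(params, X, y, reg_weight=1e-3)
rel_errors = {}
for name in params:
    numeric = numerical_grad(params, name, X, y, reg_weight=1e-3)
    denom = np.maximum(1e-8, np.abs(numeric) + np.abs(analytic[name]))
    rel_errors[name] = np.max(np.abs(numeric - analytic[name]) / denom)
    print("%s: max relative error %.2e" % (name, rel_errors[name]))
```

Deliberately breaking the backward pass (for example removing the ReLU mask line) should make the errors jump by several orders of magnitude.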


Train the 2-layer neural network on the Spiral3 dataset and display the accuracy:

In [None]:
classifier = NN2LClassifier(num_hidden=64)
training_acc = classifier.train(X,y,num_iters=3000,learning_rate=1,reg_weight=1e-3)
print('training accuracy: %.2f' %training_acc)

How does this accuracy compare to that of the softmax classifier from the previous notebook on Spiral3?
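To make the comparison concrete, here is a minimal sketch of a linear softmax classifier trained with the same full-batch gradient descent on the same spiral data. It assumes the "previous approach" is such a linear classifier, as the reference to the softmax code suggests; `make_spiral` and `train_linear_softmax` are hypothetical names, and `make_spiral` simply copies `toy_spiral3` so the block runs on its own. Since the decision regions of a linear softmax classifier are convex, it cannot follow the interleaved spiral arms, so its training accuracy is expected to stay well below that of the 2-layer network.

```python
import numpy as np

def make_spiral(N=200, K=3, D=2):
    """Copy of toy_spiral3: K interleaved spiral arms of N points each."""
    np.random.seed(0)
    X = np.zeros((N * K, D))
    y = np.zeros(N * K, dtype='uint8')
    for j in range(K):
        ix = range(N * j, N * (j + 1))
        r = np.linspace(0.0, 1, N)
        t = 5 + np.linspace(j * 4, (j + 1) * 4, N) + np.random.randn(N) * 0.3
        X[ix] = np.c_[r * np.sin(t), r * np.cos(t)]
        y[ix] = j
    return X, y

def train_linear_softmax(X, y, nclasses=3, learning_rate=1.0, reg_weight=1e-3, num_iters=3000):
    """Full-batch gradient descent on softmax cross-entropy with an L2 penalty; no hidden layer."""
    N, D = X.shape
    W = 0.01 * np.random.randn(D, nclasses)
    b = np.zeros((1, nclasses))
    for _ in range(num_iters):
        scores = X @ W + b
        expo = np.exp(scores - scores.max(axis=1, keepdims=True))
        dscores = expo / expo.sum(axis=1, keepdims=True)   # softmax probabilities
        dscores[np.arange(N), y] -= 1                      # gradient of cross-entropy w.r.t. scores
        dscores /= N
        W -= learning_rate * (X.T @ dscores + reg_weight * W)
        b -= learning_rate * dscores.sum(axis=0, keepdims=True)
    return W, b

X, y = make_spiral()
W, b = train_linear_softmax(X, y)
linear_acc = np.mean(np.argmax(X @ W + b, axis=1) == y)
print('linear softmax training accuracy: %.2f' % linear_acc)
```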

Plot the accuracy as a function of the number of hidden units (e.g. 2^i with i = 0, ..., 10), keeping the other parameters fixed:

In [None]:
#TODO<
#TODO>
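One possible sketch of this experiment, written so that it runs on its own: `make_spiral` copies `toy_spiral3`, and `train_2layer` and `accuracy_vs_hidden` are hypothetical helpers, where `train_2layer` is a condensed re-implementation of `NN2LClassifier.train` (same initialization, ReLU hidden layer, softmax loss and update rule) that returns the final training accuracy. The demo call uses only i = 0, 2, 4, 6 (up to the 64 hidden units used above) to keep the runtime short; pass `exponents=range(11)` for the full sweep suggested in the text.

```python
import numpy as np

def make_spiral(N=200, K=3, D=2):
    """Copy of toy_spiral3 so that this cell runs on its own."""
    np.random.seed(0)
    X = np.zeros((N * K, D))
    y = np.zeros(N * K, dtype='uint8')
    for j in range(K):
        ix = range(N * j, N * (j + 1))
        r = np.linspace(0.0, 1, N)
        t = 5 + np.linspace(j * 4, (j + 1) * 4, N) + np.random.randn(N) * 0.3
        X[ix] = np.c_[r * np.sin(t), r * np.cos(t)]
        y[ix] = j
    return X, y

def train_2layer(X, y, num_hidden, nclasses=3, learning_rate=1.0, reg_weight=1e-3, num_iters=3000):
    """Condensed version of NN2LClassifier.train; returns the final training accuracy."""
    N, D = X.shape
    W1 = 0.01 * np.random.randn(D, num_hidden)
    b1 = np.zeros((1, num_hidden))
    W2 = 0.01 * np.random.randn(num_hidden, nclasses)
    b2 = np.zeros((1, nclasses))
    for _ in range(num_iters):
        hidden = np.maximum(0, X @ W1 + b1)                  # ReLU hidden layer
        scores = hidden @ W2 + b2
        expo = np.exp(scores - scores.max(axis=1, keepdims=True))
        dscores = expo / expo.sum(axis=1, keepdims=True)    # softmax probabilities
        dscores[np.arange(N), y] -= 1                       # gradient w.r.t. the scores
        dscores /= N
        dhidden = dscores @ W2.T                            # uses W2 before its update
        dhidden[hidden <= 0] = 0                            # backprop through the ReLU
        W2 -= learning_rate * (hidden.T @ dscores + reg_weight * W2)
        b2 -= learning_rate * dscores.sum(axis=0, keepdims=True)
        W1 -= learning_rate * (X.T @ dhidden + reg_weight * W1)
        b1 -= learning_rate * dhidden.sum(axis=0, keepdims=True)
    hidden = np.maximum(0, X @ W1 + b1)
    return np.mean(np.argmax(hidden @ W2 + b2, axis=1) == y)

def accuracy_vs_hidden(exponents=range(11), **train_kwargs):
    """Train one network per hidden size 2**i, all other hyperparameters fixed."""
    X, y = make_spiral()
    sizes = [2 ** i for i in exponents]
    accs = [train_2layer(X, y, num_hidden=h, **train_kwargs) for h in sizes]
    return sizes, accs

# Coarser grid than suggested, to keep the runtime short
sizes, accs = accuracy_vs_hidden(exponents=range(0, 7, 2), num_iters=3000)
for h, acc in zip(sizes, accs):
    print("hidden units: %4d  training accuracy: %.2f" % (h, acc))
```

The result can then be plotted with `plt.plot(np.log2(sizes), accs, 'o-')`, labeling the x-axis as log2 of the number of hidden units. Inside the notebook, the loop can equally call `NN2LClassifier(num_hidden=h).train(X, y, num_iters=3000, learning_rate=1, reg_weight=1e-3, verbose=False)`, which returns the training accuracy.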

### Classification on MNIST

Load the MNIST dataset (images of size 28x28):

In [None]:
from google.colab import drive
drive.mount('/content/drive')
path = '/content/drive/MyDrive/datasets/' #TO ADAPT IF NEEDED

# The pickle contains three (images, labels) pairs: training, validation and test sets
with gzip.open(path+'mnist.pkl.gz', 'rb') as f:
  train_set, valid_set, test_set = pickle.load(f, encoding='bytes')

# Shuffle the data and define the data variables
X_train,y_train = train_set
X_test,y_test = test_set

inds = np.random.permutation(X_train.shape[0])
X_train,y_train = X_train[inds],y_train[inds]

inds = np.random.permutation(X_test.shape[0])
X_test,y_test = X_test[inds],y_test[inds]

print(X_train.shape)
print(X_test.shape)
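Since `NN2LClassifier` defaults to `ndims=2` and `nclasses=3`, it is worth checking the input dimensionality, the number of classes and the pixel range before training. A minimal helper could look like this (the name `describe_split` is hypothetical, not part of the notebook). If, as is usual for `mnist.pkl.gz`, the images are stored as flattened vectors, the feature dimension is 28 × 28 = 784, and `ndims` and `nclasses` must be set accordingly when creating the classifier.

```python
import numpy as np

def describe_split(X, y):
    """Summarize one split: sample count, feature dimension, classes and value range."""
    classes, counts = np.unique(y, return_counts=True)
    return {
        "n_samples": X.shape[0],
        "n_features": X.shape[1],       # value to pass as ndims to NN2LClassifier
        "n_classes": len(classes),      # value to pass as nclasses
        "class_counts": dict(zip(classes.tolist(), counts.tolist())),
        "value_range": (float(X.min()), float(X.max())),
    }

# Example on a small synthetic stand-in; in the notebook call describe_split(X_train, y_train)
rng = np.random.default_rng(0)
X_demo = rng.random((20, 28 * 28)).astype(np.float32)
y_demo = np.arange(20) % 10
info = describe_split(X_demo, y_demo)
print(info["n_samples"], info["n_features"], info["n_classes"])
```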

Train the classifier on this dataset:

In [None]:
#TODO<
#TODO>

Test the classifier on the test set:

In [None]:
#TODO<
#TODO>
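Beyond the overall test accuracy, a confusion matrix shows which digits get mixed up. Below is a minimal NumPy sketch; `confusion_matrix` and `per_class_accuracy` are hypothetical helper names. In the notebook it would be called as `confusion_matrix(y_test, clf.predict(X_test), 10)`, where `clf` stands for your trained MNIST classifier.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """C[i, j] counts the samples of true class i that were predicted as class j."""
    C = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(C, (y_true, y_pred), 1)   # unbuffered add, so repeated (i, j) pairs all count
    return C

def per_class_accuracy(C):
    """Fraction of each true class classified correctly (diagonal over row sums)."""
    return np.diag(C) / np.maximum(C.sum(axis=1), 1)

# Toy example with 3 classes
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
C = confusion_matrix(y_true, y_pred, 3)
print(C)
print(per_class_accuracy(C))
print(np.trace(C) / C.sum())      # overall accuracy
```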

How do these results compare to the previous classification results on MNIST?