# Q2: Multiclass Support Vector Machine exercise

In this part of the exercise you will:
    
- implement a fully-vectorized **loss function** for the SVM
- implement the fully-vectorized expression for its **analytic gradient**
- **check your implementation** using a numerical gradient
- use a validation set to **tune the learning rate and regularization** strength
- **optimize** the loss function with **SGD**
- **visualize** the final learned weights


In [None]:
# Run some setup code for this notebook.

import random
import numpy as np
from test.data_utils import load_CIFAR10
import matplotlib.pyplot as plt


## CIFAR-10 Data Loading and Preprocessing

In [None]:
# Load the raw CIFAR-10 data.
# As a sanity check, print out the size of the training and test data.


In [None]:
# Visualize some examples from the dataset.
# show a few examples of training images from each class.


In [None]:
# Split the data into train, val, and test sets. In addition we will
# create a small development set as a subset of the training data;
# you can use this for development so our code runs faster.
num_training = 49000
num_validation = 1000
num_test = 1000
num_dev = 500

# Your validation set will be num_validation points from the original
# training set.


# Your training set will be the first num_training points from the original
# training set.


# You will also make a development set, which is a small subset of
# the training set.


# use the first num_test points of the original test set as our
# test set.


In [None]:
# Preprocessing: reshape the image data into rows


# As a sanity check, print out the shapes of the data


In [None]:
# Preprocessing: subtract the mean image
# first: compute the image mean based on the training data


In [None]:
# second: subtract the mean image from train and test data


In [None]:
# third: append the bias dimension of ones (i.e. bias trick) so that our SVM
# only has to worry about optimizing a single weight matrix W.
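
The bias trick described in the last preprocessing step can be illustrated on a tiny made-up array. This is only a sketch, not the notebook's solution; `X_toy`, `W_toy`, and `b_toy` are hypothetical names and values chosen for illustration.

```python
import numpy as np

# Hypothetical toy data: 4 examples with 3 features each.
X_toy = np.arange(12, dtype=np.float64).reshape(4, 3)

# Append a column of ones so the bias can live inside the weight matrix.
X_aug = np.hstack([X_toy, np.ones((X_toy.shape[0], 1))])

# Hypothetical weights (3 features x 2 classes) and biases (one per class).
W_toy = np.full((3, 2), 0.5)
b_toy = np.array([1.0, -1.0])

# Stack the bias vector as the last row of the weight matrix.
W_aug = np.vstack([W_toy, b_toy])

# The single matrix product reproduces the original affine scores.
scores_affine = X_toy @ W_toy + b_toy
scores_aug = X_aug @ W_aug
print(np.allclose(scores_affine, scores_aug))
```

Because the column of ones multiplies the last row of `W_aug`, the optimizer only has to update one matrix instead of a weight matrix and a separate bias vector.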



## SVM Classifier

Your code for this section will all be written inside **test/clf/linear_svm.py**. 

As you can see, we have prefilled the function `svm_loss_naive`, which uses for loops to evaluate the multiclass SVM loss function.
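
For orientation, a loop-based multiclass hinge loss might look roughly like the sketch below. This is not the code shipped in `linear_svm.py`: the function name `hinge_loss_loops`, the margin `delta=1.0`, and the toy inputs are assumptions, and both the regularization term and the gradient are deliberately left out.

```python
import numpy as np

def hinge_loss_loops(W, X, y, delta=1.0):
    """Average multiclass hinge loss, computed with explicit loops.

    W: (D, C) weights, X: (N, D) data, y: (N,) integer labels.
    Regularization and the gradient are omitted from this sketch.
    """
    num_train = X.shape[0]
    num_classes = W.shape[1]
    loss = 0.0
    for i in range(num_train):
        scores = X[i].dot(W)
        correct_score = scores[y[i]]
        for j in range(num_classes):
            if j == y[i]:
                continue  # the correct class contributes no margin term
            margin = scores[j] - correct_score + delta
            if margin > 0:
                loss += margin
    return loss / num_train

# Hypothetical toy inputs: zero weights, 2 examples, 3 features, 4 classes.
W_toy = np.zeros((3, 4))
X_toy = np.ones((2, 3))
y_toy = np.array([0, 2])
loss_toy = hinge_loss_loops(W_toy, X_toy, y_toy)
print(loss_toy)
```

With all-zero weights every score is zero, so each of the three incorrect classes contributes a margin of `delta` and the average loss is 3.0. The same reasoning gives a quick sanity check for small random weights: the loss should be close to the number of classes minus one.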

In [None]:
# Evaluate the naive implementation of the loss we provided for you:
from test.clf.linear_svm import svm_loss_naive
import time

# generate a random SVM weight matrix of small numbers


The `grad` returned by `svm_loss_naive` is currently all zeros. Derive the gradient of the SVM cost function and implement it inline inside `svm_loss_naive`. You will find it helpful to interleave your new code with the existing function body.

To check that you have implemented the gradient correctly, you can numerically estimate the gradient of the loss function and compare that estimate to the gradient you computed analytically. We have provided code that does this for you:

In [None]:
# Once you've implemented the gradient, recompute it with the code below
# and gradient check it with the function we provided for you

# Compute the loss and its gradient at W.


# Numerically compute the gradient along several randomly chosen dimensions, and
# compare them with your analytically computed gradient. The numbers should match
# almost exactly along all dimensions.


# do the gradient check once again with regularization turned on
# you didn't forget the regularization gradient did you?
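
The idea behind a numerical gradient check can be sketched on a simple stand-in loss. This is a generic centered-difference illustration, not the helper provided with the exercise; `numeric_grad_at`, `toy_loss`, and the step size `h=1e-5` are assumptions made for the example.

```python
import numpy as np

def numeric_grad_at(f, w, index, h=1e-5):
    """Centered-difference estimate of the partial derivative of f at w[index]."""
    old_value = w[index]
    w[index] = old_value + h
    f_plus = f(w)
    w[index] = old_value - h
    f_minus = f(w)
    w[index] = old_value  # restore the original entry
    return (f_plus - f_minus) / (2 * h)

def toy_loss(w):
    # Stand-in loss with a known gradient: d/dw sum(w^2) = 2w.
    return np.sum(w ** 2)

rng = np.random.default_rng(0)
w_toy = rng.standard_normal((4, 3))
analytic = 2 * w_toy

# Compare the two gradients along one chosen dimension.
index = (1, 2)
numeric = numeric_grad_at(toy_loss, w_toy, index)
rel_error = abs(numeric - analytic[index]) / (abs(numeric) + abs(analytic[index]))
print(numeric, analytic[index], rel_error)
```

Checking only a handful of randomly chosen indices keeps the cost low, since each numerical estimate needs two full loss evaluations.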


### Inline Question 1:
It is possible that once in a while a dimension in the gradcheck will not match exactly. What could such a discrepancy be caused by? Is it a reason for concern? What is a simple example in one dimension where a gradient check could fail? *Hint: the SVM loss function is not strictly speaking differentiable*

**Your Answer:** *fill here.*

In [None]:
# Next implement the function svm_loss_vectorized; for now only compute the loss;
# we will implement the gradient in a moment.


# The losses should match but your vectorized implementation should be much faster.


In [None]:
# Complete the implementation of svm_loss_vectorized, and compute the gradient
# of the loss function in a vectorized way.



# The naive implementation and the vectorized implementation should match, but
# the vectorized version should still be much faster.




# The loss is a single number, so it is easy to compare the values computed
# by the two implementations. The gradient on the other hand is a matrix, so
# use the Frobenius norm to compare them.
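
Comparing two gradient matrices with the Frobenius norm can be done with `np.linalg.norm`. The sketch below uses two small made-up matrices, `grad_a` and `grad_b`, purely for illustration; they are not outputs of the SVM code.

```python
import numpy as np

# Hypothetical gradients that differ in a single entry.
grad_a = np.array([[1.0, 2.0], [3.0, 4.0]])
grad_b = np.array([[1.0, 2.0], [3.0, 4.5]])

# Frobenius norm of the difference: square root of the sum of squared entries.
difference = np.linalg.norm(grad_a - grad_b, ord='fro')
print(difference)
```

Here the only nonzero entry of the difference is 0.5, so the norm is 0.5. For two correct implementations of the same gradient the value should be very close to zero.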


### Stochastic Gradient Descent

You now have vectorized, efficient expressions for the loss and the gradient, and the analytic gradient matches the numerical gradient. We are therefore ready to use SGD to minimize the loss.
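
The general shape of a minibatch SGD loop is sketched below on a toy least-squares problem, so that it does not presuppose the SVM gradient. It is not the `LinearClassifier.train()` implementation: the data, `toy_loss_and_grad`, and the hyperparameter values are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noise-free regression data: y = X @ w_true.
X_toy = rng.standard_normal((200, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y_toy = X_toy @ w_true

def toy_loss_and_grad(w, X_batch, y_batch):
    """Mean squared error (with a factor of 1/2) and its gradient."""
    residual = X_batch @ w - y_batch
    loss = 0.5 * np.mean(residual ** 2)
    grad = X_batch.T @ residual / X_batch.shape[0]
    return loss, grad

w = np.zeros(5)
learning_rate = 0.1
batch_size = 32
loss_history = []

for it in range(300):
    # Sample a minibatch of indices (with replacement, for simplicity).
    batch_idx = rng.choice(X_toy.shape[0], batch_size, replace=True)
    loss, grad = toy_loss_and_grad(w, X_toy[batch_idx], y_toy[batch_idx])
    loss_history.append(loss)
    # Step against the gradient.
    w -= learning_rate * grad

print(loss_history[0], loss_history[-1])
```

Recording the loss at every iteration, as `loss_history` does here, is what makes it possible to plot the loss against the iteration number when debugging.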

In [None]:
# In the file linear_classifier.py, implement SGD in the function
# LinearClassifier.train() and then run it with the code below.


In [None]:
# A useful debugging strategy is to plot the loss as a function of
# iteration number:


In [None]:
# Write the LinearSVM.predict function and evaluate the performance on both the
# training and validation set


In [None]:
# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of about 0.4 on the validation set.


# results is a dictionary mapping tuples of the form
# (learning_rate, regularization_strength) to tuples of the form
# (training_accuracy, validation_accuracy). The accuracy is simply the fraction
# of data points that are correctly classified.



################################################################################
# TODO:                                                                        #
# Write code that chooses the best hyperparameters by tuning on the validation #
# set. For each combination of hyperparameters, train a linear SVM on the      #
# training set, compute its accuracy on the training and validation sets, and  #
# store these numbers in the results dictionary. In addition, store the best   #
# validation accuracy in best_val and the LinearSVM object that achieves this  #
# accuracy in best_svm.                                                        #
#                                                                              #
# Hint: You should use a small value for num_iters as you develop your         #
# validation code so that the SVMs don't take much time to train; once you are #
# confident that your validation code works, you should rerun the validation   #
# code with a larger value for num_iters.                                      #
################################################################################

    
# Print out results.


In [None]:
# Visualize the validation results


In [None]:
# Evaluate the best svm on test set


In [None]:
# Visualize the learned weights for each class.
# Depending on your choice of learning rate and regularization strength, these may
# or may not be nice to look at.


### Inline question 2:
Describe what your visualized SVM weights look like, and offer a brief explanation for why they look the way they do.

**Your answer:** *fill here*