In [1]:
# Enable HTML/CSS
from IPython.core.display import HTML
HTML("<link href='https://fonts.googleapis.com/css?family=Passion+One' rel='stylesheet' type='text/css'><style>div.attn { font-family: 'Helvetica Neue'; font-size: 30px; line-height: 40px; color: #FFFFFF; text-align: center; margin: 30px 0; border-width: 10px 0; border-style: solid; border-color: #5AAAAA; padding: 30px 0; background-color: #DDDDFF; }hr { border: 0; background-color: #ffffff; border-top: 1px solid black; }hr.major { border-top: 10px solid #5AAA5A; }hr.minor { border: none; background-color: #ffffff; border-top: 5px dotted #CC3333; }div.bubble { width: 65%; padding: 20px; background: #DDDDDD; border-radius: 15px; margin: 0 auto; font-style: italic; color: #f00; }em { color: #AAA; }div.c1{visibility:hidden;margin:0;height:0;}div.note{color:red;}</style>")

___
Enter Team Member Names here (*double click to edit*):

- Name 1: Thomas Adams
- Name 2: Suleiman Hijazeen
- Name 3: Nancy Le
- Name 4: Andrew Whigham

___

# In Class Assignment Two
In the following assignment you will be asked to fill in Python code and derivations for a number of different problems. Please read all instructions carefully and turn in the rendered notebook (or HTML of the rendered notebook) before the end of class (or right after class). The initial portion of this notebook is given before class and the remainder is given during class. Please answer the initial questions before class, to the best of your ability. Once class has started you may rework your answers as a team for the initial part of the assignment.

<a id="top"></a>
## Contents
* <a href="#Loading">Loading the Data</a>
* <a href="#svm">Linear SVMs</a>
* <a href="#svm_using">Using Linear SVMs</a>
* <a href="#nonlinear">Non-linear SVMs</a>

________________________________________________________________________________________________________

<a id="Loading"></a>
<a href="#top">Back to Top</a>
## Loading the Data
Please run the following code to read in the "Labeled Faces in the Wild" (LFW) people dataset from sklearn's data loading module.

This will load the data into the variable `lfw_people`. `lfw_people` is a `Bunch` object with fields like `lfw_people.data` and `lfw_people.target`. The field `lfw_people.data` is a numpy matrix of the continuous features in the dataset. **The object is not a pandas dataframe. It is a numpy matrix.** Each row is one observed instance, and each column is a different feature. The field `lfw_people.target` holds the integer value we are trying to predict (i.e., a specific integer represents a specific person). Each entry in `lfw_people.target` is the label for the corresponding row of the `lfw_people.data` matrix.

In [3]:
# fetch the images for the dataset
# this will take a long time on the first run because it needs to download the data
# after the first time, the dataset will be saved to your disk (in the sklearn package somewhere)
# if this does not run, you may need additional libraries installed on your system (install at your own risk!!)
from sklearn.datasets import fetch_lfw_people

lfw_people = fetch_lfw_people(min_faces_per_person=20, resize=None)

In [5]:
# get some of the specifics of the dataset
X = lfw_people.data
y = lfw_people.target
names = lfw_people.target_names

n_samples, n_features = X.shape
_, h, w = lfw_people.images.shape
n_classes = len(names)



print("n_samples: {}".format(n_samples))
print("n_features: {}".format(n_features))
print("n_classes: {}".format(n_classes))
print("Original Image Sizes {} by {}".format(h,w))
print(h * w)  # each image is flattened, so the feature vector length equals h * w

n_samples: 3023
n_features: 11750
n_classes: 62
Original Image Sizes 125 by 94
11750


**Question 1:** For the faces dataset, describe what the data represents? That is, what is each column? What is each row? What do the unique class values represent?

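Each column represents one pixel position and is used as a feature. Each row is one instance of the data, that is, one specific image. Since there are 3023 samples in the dataset, there are 3023 rows. Since the images are 125 by 94 pixels, we have 125 * 94 = 11750 columns of features. The unique class values are integer labels, one per person, so the 62 classes correspond to the 62 people in `target_names`.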
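
As a minimal sketch of the row/column layout described above, the snippet below uses a small synthetic array as a stand-in for the faces matrix and shows how one row reshapes back into an image. The names `images_demo`, `X_demo`, `h_demo`, and `w_demo` are illustrative assumptions and are not part of the assignment.

```python
import numpy as np

# Hypothetical stand-in for the faces data: 5 "images" of 4 x 3 pixels.
h_demo, w_demo = 4, 3
images_demo = np.arange(5 * h_demo * w_demo, dtype=float).reshape(5, h_demo, w_demo)

# Flatten each image into one row, so each column is one pixel position.
X_demo = images_demo.reshape(len(images_demo), -1)
print(X_demo.shape)  # (n_samples, h * w)

# Reshaping a row recovers the original image.
first_image = X_demo[0].reshape(h_demo, w_demo)
print(np.array_equal(first_image, images_demo[0]))
```

With the real data, `X[i].reshape(h, w)` should recover the i-th face in the same way, assuming the rows are row-major flattenings of `lfw_people.images`.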


___

<a id="svm"></a>
<a href="#top">Back to Top</a>
## Linear Support Vector Machines

**Question 2:** If we were to train a linear Support Vector Machine (SVM) upon the faces data, how many parameters would need to be optimized in the model? That is, how many coefficients would need to be calculated?


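We would need one coefficient per feature for each classifier, which would be 125 * 94 = 11750 coefficients per classifier (plus one intercept if the model fits a bias term). The total depends on how many binary classifiers the multiclass strategy trains. Depending on whether we are using One-Vs-All or One-Vs-One, we would have either: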

One-Vs-All:

    62 classifiers......      (n_classes) 

or 

One-Vs-One:

    1,891 classifiers....     (n_classes * (n_classes - 1) / 2).
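
The arithmetic behind these counts can be sketched in a few lines. The dataset sizes come from the summary printed earlier; counting one intercept per classifier is an assumption that holds only when the model fits a bias term.

```python
# Values taken from the dataset summary printed earlier in the notebook.
n_features = 125 * 94   # pixels per image
n_classes = 62          # people

# One-vs-all trains one classifier per class.
n_ovr = n_classes
# One-vs-one trains one classifier per pair of classes.
n_ovo = n_classes * (n_classes - 1) // 2

# Each linear classifier has one weight per feature plus (assumed) one intercept.
params_per_classifier = n_features + 1

print("one-vs-all:", n_ovr, "classifiers,", n_ovr * params_per_classifier, "parameters")
print("one-vs-one:", n_ovo, "classifiers,", n_ovo * params_per_classifier, "parameters")
```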

In [6]:
%%time 
# Enter any scratchwork or calculations here
from sklearn.svm import LinearSVC
clf = LinearSVC()
clf.fit(X, y)

print(clf.coef_)

# This strategy, also known as one-vs-all, is implemented in OneVsRestClassifier.
# The strategy consists in fitting one classifier per class. For each classifier,
# the class is fitted against all the other classes. In addition to its computational
# efficiency (only n_classes classifiers are needed), one advantage of this approach is
# its interpretability. Since each class is represented by one and only one classifier,
# it is possible to gain knowledge about the class by inspecting its corresponding classifier.
# This is the most commonly used strategy and is a fair default choice.




[[ 0.00233145  0.00089948 -0.00184752 ...  0.01409334  0.00074242
  -0.00358786]
 [-0.04148765 -0.03136253 -0.01567535 ...  0.044099    0.02492506
   0.00118109]
 [ 0.01599051  0.0265195   0.03556783 ...  0.00647601  0.01674502
   0.01670068]
 ...
 [-0.0053116  -0.0162482  -0.0188582  ...  0.01449738  0.03493597
   0.05328726]
 [ 0.10930735  0.03130511 -0.0249525  ...  0.04288251  0.02474159
  -0.01944119]
 [-0.02639129 -0.0184326  -0.01059024 ... -0.02820624 -0.02604045
  -0.02366287]]
Wall time: 8min 43s




In [8]:
print(clf.coef_.shape)

(62, 11750)


**Question 3:** 
- **Part A:** Given the number of parameters calculated above, would you expect the model to train quickly using **batch optimization techniques**? Why or why not? 
- **Part B:** Is there a way to reduce training time?
- **Part C:** If we transformed the X data using principal component analysis (PCA) with 100 components, how many parameters would we need to find for a linear Support Vector Machine (SVM)?

*Enter your answer here (double click)*


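A. No. With 62 one-vs-all classifiers and 11750 coefficients each, there are 62 * 11750 = 728,500 weights to optimize, and batch methods use the whole training set for every update. The fit above took 8 min 43 s of wall time.

B. Yes. If we reduce the number of dimensions in the dataset before fitting (for example with PCA), there are fewer coefficients to optimize, which reduces the computation time.

C. 100 per classifier. By reducing the data to only 100 dimensions, each linear classifier needs 100 coefficients, so the one-vs-all model has 62 * 100 = 6,200 coefficients in total (plus one intercept per classifier).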
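
Dimensionality reduction is one way to cut training time. Another option, sketched here as an assumption rather than something the assignment uses, is to replace batch optimization with stochastic gradient descent. scikit-learn's `SGDClassifier` with `loss="hinge"` optimizes a linear SVM objective one sample at a time. The data below is synthetic and the names `X_demo`, `y_demo`, and `sgd_svm` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Synthetic stand-in data (an assumption; the real X is 3023 x 11750).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=40, n_informative=10,
    n_classes=4, random_state=0,
)

# loss="hinge" gives a linear SVM objective, optimized one sample at a time.
sgd_svm = SGDClassifier(loss="hinge", max_iter=1000, tol=1e-3, random_state=0)
sgd_svm.fit(X_demo, y_demo)

# Still one weight vector per class (one-vs-all), one weight per feature.
print(sgd_svm.coef_.shape)
```

The number of parameters does not change with the optimizer; only the cost of each update does.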
___

In [24]:
# Enter any scratchwork or calculations here
from sklearn.decomposition import PCA

pca = PCA(n_components=100)
pca.fit(X)
X_pca = pca.transform(X)


clf_pca = LinearSVC()
clf_pca.fit(X_pca, y)


print(clf_pca.coef_.shape)


print('Part C. With 100 features: ', clf_pca.coef_.shape[1])



(62, 100)
Part C. With 100 features:  100




___
<a id="svm_using"></a>
<a href="#top">Back to Top</a>

# Using Linear SVMs

**Exercise 1:** Use the block of code below to check if the number of parameters you calculated is equal to the number of parameters returned by `sklearn`'s implementation of the Linear SVM. **Was your calculation correct? If different, can you think of a reason why the parameters would not match?**

In [10]:
from sklearn.svm import LinearSVC
from sklearn.decomposition import PCA

n_components = 100
pca = PCA(n_components=n_components,svd_solver='randomized')
Xpca = pca.fit_transform(X)

clf = LinearSVC()
clf.fit(Xpca,y)




LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
     intercept_scaling=1, loss='squared_hinge', max_iter=1000,
     multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
     verbose=0)

In [26]:
#===================================================================
# Enter your code below to calculate the number of parameters in the model 

print(clf.coef_.shape)

#===================================================================

(62, 100)


Yes, the number of parameters per classifier matched sklearn's implementation of the Linear SVM: `coef_` has shape (62, 100), i.e. 62 one-vs-all classifiers with 100 coefficients each, because PCA has reduced the dimensionality to 100. Two things could make a count differ. First, the multiclass approach (One-Vs-One or One-Vs-All) changes the number of classifiers and therefore the number of rows, although the number of columns would still be 100. Second, `coef_` does not include the intercepts, which are stored separately in `intercept_` (one per classifier, since `fit_intercept=True`).
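
A minimal sketch of a full parameter count, weights plus intercepts, on synthetic data standing in for the PCA-reduced faces. The names `X_demo`, `y_demo`, and `demo_clf` are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Synthetic stand-in for the PCA-reduced data (an assumption for illustration).
rng = np.random.RandomState(0)
n_classes_demo, n_components_demo = 5, 10
X_demo = rng.normal(size=(200, n_components_demo))
y_demo = np.arange(200) % n_classes_demo  # every class is present

demo_clf = LinearSVC(max_iter=5000).fit(X_demo, y_demo)

n_weights = demo_clf.coef_.size          # n_classes * n_features
n_intercepts = demo_clf.intercept_.size  # one per class when fit_intercept=True
print(n_weights, n_intercepts, n_weights + n_intercepts)
```

Applied to the fitted faces model, the same two attributes would give 62 * 100 weights and 62 intercepts.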

___
**Exercise 2:** Use the starter code below to calculate two quantities: 
- **Part A.:** The overall accuracy of the trained linear svm on the training set
- **Part B.:** The *mean, standard deviation, maximum, and minimum* of the **accuracy per class** on the training set

You might be interested in the following documentation of the confusion matrix calculated by `scikit-learn`:
- http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html

And an example matrix returned by the confusion matrix function:
<img src="http://scikit-learn.org/stable/_images/sphx_glr_plot_confusion_matrix_001.png" width="400" height="400">

In [30]:
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
import numpy as np

yhat = clf.predict(Xpca)


#===================================================
# Enter your code below

cm = confusion_matrix(y, yhat)
cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
diag = np.diagonal(cm)

print('Overall Accuracy is ', accuracy_score(y, yhat))


print('The class accuracy is ',np.mean(diag), '+-', np.std(diag),end=' ')
print('(min,max) (',np.min(diag), np.max(diag),')')

#===================================================

Overall Accuracy is  0.9388025140588819
The class accuracy is  0.8966422304532808 +- 0.1398543291869833 (min,max) ( 0.35 1.0 )
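
The per-class accuracy reported here is the diagonal of the row-normalized confusion matrix. A small hand-checkable sketch of that calculation follows, using a made-up 3-class matrix (`cm_demo` is hypothetical and not taken from the faces data).

```python
import numpy as np

# Hypothetical 3-class confusion matrix: rows are true classes, columns are predictions.
cm_demo = np.array([
    [8, 1, 1],
    [0, 5, 5],
    [0, 0, 20],
])

# Overall accuracy: correct predictions over all samples.
overall = np.trace(cm_demo) / cm_demo.sum()

# Per-class accuracy (recall): diagonal divided by each row's total.
per_class = np.diag(cm_demo) / cm_demo.sum(axis=1)

print(overall)
print(per_class, per_class.mean())
```

In this toy case the overall accuracy is higher than the mean per-class accuracy because the largest class is classified best, which is the same pattern as in the output above.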


___
<a id="nonlinear"></a>
<a href="#top">Back to Top</a>

# Non-linear SVMs
Now let's explore the use of non-linear SVMs, that is, SVMs that use different kernels. Take a look at the example training and testing code below for the non-linear SVM. All parameters are left as default, except we change the kernel to be `rbf`. Run the block of code below.



In [44]:
from sklearn.svm import SVC

clf = SVC(kernel='rbf')
clf.fit(Xpca,y)
yhat = clf.predict(Xpca)
accuracy = dict()
accuracy['rbf'] = accuracy_score(y,yhat)
print('Overall Accuracy is ',accuracy_score(y,yhat))



Overall Accuracy is  0.9388025140588819


___
**Exercise 3:** Use the starter code from above to calculate the accuracy for three different non-linear SVM kernels. That is, repeat the code above for different `kernel` parameters. **Only use non-linear kernels.  Which kernel is most accurate with the default parameters?**

You might be interested in the documentation of the scikit-learn SVM implementation, available here:
- http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html

In [54]:
#===================================================
# Enter your code below
clf = SVC(kernel='poly')
clf.fit(Xpca,y)
yhat = clf.predict(Xpca)
accuracy['poly'] = accuracy_score(y,yhat)

clf = SVC(kernel='sigmoid')
clf.fit(Xpca,y)
yhat = clf.predict(Xpca)
accuracy['sigmoid'] = accuracy_score(y,yhat)

# note: the Gram matrix X X^T is the linear kernel, passed in explicitly
kernel_train = np.dot(Xpca, Xpca.T)
clf = SVC(kernel='precomputed')
clf.fit(kernel_train,y)
yhat = clf.predict(kernel_train)
accuracy['precomputed'] = accuracy_score(y,yhat)

#===================================================



In [56]:
print (accuracy)

maximum = max(accuracy, key=accuracy.get) 
print('The highest accuracy is: ', maximum, accuracy[maximum])

{'rbf': 0.9388025140588819, 'precomputed': 0.9986768111147867, 'poly': 0.9751902084022495, 'sigmoid': 0.214356599404565}
The highest accuracy is:  precomputed 0.9986768111147867


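With default parameters, the highest score comes from `precomputed` (0.9987), followed by `poly` (0.9752). However, the precomputed matrix we passed in is `np.dot(Xpca, Xpca.T)`, which is just a linear kernel, so it does not fit the intent of this exercise. The most accurate non-linear kernel is therefore `poly`, and we are proceeding with it for the following exercise. Note that all of these accuracies are measured on the training set.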
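
The precomputed kernel in the cell above is built as `np.dot(Xpca, Xpca.T)`. The sketch below, on synthetic data, checks that such a Gram matrix matches scikit-learn's `linear_kernel` and that an `SVC` trained on it predicts like `SVC(kernel="linear")`. The data and the names `X_demo`, `y_demo`, and `gram` are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics.pairwise import linear_kernel
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption for illustration).
X_demo, y_demo = make_blobs(n_samples=150, centers=3, n_features=5, random_state=0)

# The Gram matrix X X^T is the linear kernel evaluated on all pairs of rows.
gram = np.dot(X_demo, X_demo.T)
same_kernel = np.allclose(gram, linear_kernel(X_demo, X_demo))

# Train once on the precomputed Gram matrix and once with the built-in linear kernel.
pred_precomputed = SVC(kernel="precomputed").fit(gram, y_demo).predict(gram)
pred_linear = SVC(kernel="linear").fit(X_demo, y_demo).predict(X_demo)
agreement = np.mean(pred_precomputed == pred_linear)
print(same_kernel, agreement)
```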


___
**Exercise 4:** Choose the **most accurate kernel** and manipulate the settings for `gamma` to make the classification more accurate. 
- **Part A:** How accurate can you make it? 
- **Part B:** Would you expect the results to generalize well? Why or why not?

In [64]:
import numpy as np

#===================================================
# Enter your code below

kern = 'poly'

# sweep gamma from 0.01 to 0.19 and report the training-set accuracy for each value
for g in np.arange(0.01, .20, .01):
    yhat = SVC(kernel=kern, gamma=g).fit(Xpca,y).predict(Xpca)
    print('With g = ', g, 'Overall Accuracy is ', accuracy_score(y,yhat))


#===================================================

With g =  0.01 Overall Accuracy is  0.9751902084022495
With g =  0.02 Overall Accuracy is  1.0
With g =  0.03 Overall Accuracy is  1.0
With g =  0.04 Overall Accuracy is  1.0
With g =  0.05 Overall Accuracy is  1.0
With g =  0.060000000000000005 Overall Accuracy is  1.0
With g =  0.06999999999999999 Overall Accuracy is  1.0
With g =  0.08 Overall Accuracy is  1.0
With g =  0.09 Overall Accuracy is  1.0
With g =  0.09999999999999999 Overall Accuracy is  1.0
With g =  0.11 Overall Accuracy is  1.0
With g =  0.12 Overall Accuracy is  1.0
With g =  0.13 Overall Accuracy is  1.0
With g =  0.14 Overall Accuracy is  1.0
With g =  0.15000000000000002 Overall Accuracy is  1.0
With g =  0.16 Overall Accuracy is  1.0
With g =  0.17 Overall Accuracy is  1.0
With g =  0.18000000000000002 Overall Accuracy is  1.0
With g =  0.19 Overall Accuracy is  1.0


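*Enter your answer here (double click)*

A. The highest accuracy we could achieve was 1.0 on the training set, reached for every `gamma` value from 0.02 to 0.19.

B. We would not expect this result to generalize well. The accuracy is measured on the same data the model was trained on, so a score of 1.0 shows only that the model can fit the training set perfectly, which is a sign of overfitting. Tuning `gamma` against training accuracy makes this worse; an honest estimate needs a held-out test set or cross-validation. The search is also very trial and error intensive, and compounding this process with large datasets could be computationally expensive.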
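
To see how well a given `gamma` generalizes, the accuracy has to be measured on data the model did not see during training. Below is a minimal sketch of such a check on synthetic data, using `train_test_split`. The data, the gamma values, and the variable names are assumptions for illustration and are not results for the faces dataset.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption for illustration).
X_demo, y_demo = make_classification(
    n_samples=400, n_features=20, n_informative=8,
    n_classes=3, random_state=0,
)

# Hold out 25% of the samples; the model never sees them during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=0,
)

results = []
for gamma in (0.01, 0.05, 0.1):
    model = SVC(kernel="poly", gamma=gamma).fit(X_train, y_train)
    train_acc = accuracy_score(y_train, model.predict(X_train))
    test_acc = accuracy_score(y_test, model.predict(X_test))
    results.append((gamma, train_acc, test_acc))
    print(gamma, train_acc, test_acc)
```

A large gap between the training and held-out columns would indicate overfitting.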



___
**Final Question:** Using the most accurate non-linear SVM you found in the previous question, how many parameter coefficients does the trained model contain?

*Enter your answer here (double click)*



In [74]:
#===================================================
# Enter any scratchwork calculations you need below
polySVC = SVC(kernel=kern, gamma=0.1).fit(Xpca,y)
# dual_coef_ has shape (n_classes - 1, n_support_vectors)
print(polySVC.dual_coef_.shape)
print('The poly SVM contains ', polySVC.dual_coef_.shape[1], ' support vectors.')

(61, 2939)
The poly SVM contains  2939  support vectors.
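
For a kernel SVM the learned quantities are the dual coefficients, the support vectors themselves, and the intercepts, rather than one weight per feature. In `SVC`, `dual_coef_` has shape (n_classes - 1, n_support_vectors), so the (61, 2939) array above holds 61 * 2939 = 179,279 dual coefficients spread over 2939 support vectors. The sketch below inspects the same attributes on a small synthetic problem; the data and names are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption for illustration).
n_classes_demo, n_features_demo = 4, 6
X_demo, y_demo = make_classification(
    n_samples=200, n_features=n_features_demo, n_informative=4,
    n_classes=n_classes_demo, random_state=0,
)

demo_svc = SVC(kernel="poly", gamma=0.1).fit(X_demo, y_demo)
n_sv = demo_svc.support_vectors_.shape[0]

print(demo_svc.dual_coef_.shape)        # (n_classes - 1, n_SV)
print(demo_svc.support_vectors_.shape)  # (n_SV, n_features)
print(demo_svc.intercept_.shape)        # (n_classes * (n_classes - 1) / 2,)

# Total number of stored dual coefficients.
total_dual = demo_svc.dual_coef_.size
print(total_dual)
```

Unlike the linear model, the size of this model grows with the number of support vectors, so it depends on the training data and not only on the number of features.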


________________________________________________________________________________________________________

That's all! Please **save (make sure you saved!!!) and upload your rendered notebook** and please include **team member names** in the notebook submission.