# Machine Learning: Lab 4 - Using Python to Design Classification Systems

Welcome to the final lab of the Machine Learning course at CoderSchool! Your final project is to build a *Spotify Music Recommendation System*. 

In order to do this, you will have to focus on having a good **DESIGN** for your system.

A good design means knowing a few things:
* Which Classifiers to use
* How to Connect them together
* How to produce a final "Score"

You spent the first half of today's lab trying to draw out a design and think of ways to produce your score from your classifiers.

Now, we will look at a few Python and scikit-learn tools that can help you achieve your design. You don't *have* to use any of these tools; the final project can be completed without them. But knowing that they exist gives you some more options and you could find them helpful!

## DataSet

We'll use a consolidated dataset of `Dance`, `Jazz`, `Rock`, and `Rap` from Assignment 2.

In [133]:
import pandas as pd
import numpy as np

# Ignore warnings
import warnings
warnings.filterwarnings("ignore")

In [134]:
songs_dataset = pd.read_csv('Consolidated_DF_6000.csv')

In [135]:
songs_dataset.head()

Unnamed: 0,key,energy,liveliness,tempo,speechiness,acousticness,instrumentalness,time_signature,duration,loudness,valence,danceability,mode,time_signature_confidence,tempo_confidence,key_confidence,mode_confidence,genres
0,6,0.618964,0.375099,114.907,0.03443,0.129149,0.000397,0,4,302.37333,-9.496,0.947362,0.905836,0.712,0.62,0.828,0.87,dance
1,7,0.844817,0.067792,109.935,0.048568,0.127837,0.910389,1,4,435.8,-8.461,0.449367,0.687389,0.359,0.498,0.76,1.0,dance
2,0,0.940507,0.0508,128.046,0.029577,0.034144,0.881883,0,4,235.62667,-7.588,0.932188,0.728027,0.541,0.557,1.0,1.0,dance
3,0,0.965342,0.350438,124.939,0.047233,0.082816,0.000383,1,4,198.65333,-6.038,0.778784,0.638805,0.0,0.045,0.687,1.0,dance
4,1,0.639406,0.064024,88.306,0.116464,0.100388,0.941271,1,4,65.13333,-6.737,0.71993,0.866491,0.051,0.436,0.312,1.0,dance


Let's hold out 10% of the data for testing and train on the remaining 90%. We'll work with `genres` first.

In [136]:
from sklearn.model_selection import train_test_split

In [137]:
X = songs_dataset.drop('genres', axis=1)
y = songs_dataset['genres']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=101)
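As a quick sanity check of what `test_size=0.1` does, here is a minimal sketch on placeholder data. The 6,000-row count is an assumption taken from the file name `Consolidated_DF_6000.csv`, and the features and labels are made up.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 6000 rows with 3 made-up features and 4 made-up genre labels.
rng = np.random.RandomState(0)
X_demo = rng.rand(6000, 3)
y_demo = rng.choice(["dance", "jazz", "rap", "rock"], size=6000)

# Same arguments as the cell above: 10% of the rows go to the test set.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.1, random_state=101
)

print(len(X_tr), len(X_te))  # 5400 600
```

Fixing `random_state` means you get the same split every time you run the cell, which makes results easier to compare.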

## Pipeline

The `Pipeline` object in scikit-learn is a simple way to connect many different processes or steps together. Let's combine the `SelectKBest` step with a `RandomForestClassifier` step to see how we can use a `Pipeline`.

First, from `sklearn.pipeline` import `Pipeline`. Then from `sklearn.feature_selection` import `SelectKBest`, and import `RandomForestClassifier` from `sklearn.ensemble`. 

In [138]:
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest
from sklearn.ensemble import RandomForestClassifier

Now, create a `RandomForestClassifier` called `rfc`, and a `SelectKBest` object called `selector`.

In [139]:
rfc = RandomForestClassifier()
selector = SelectKBest()

The first thing we need to do is tell the `Pipeline` all the different steps that will be involved.

We can arrange steps in a list in the following way:
`steps = [(<name of step 1>, object 1), (<name of step 2>, object 2), etc.]`

So for our example, we can use something like
`steps = [('feature_selection', selector), ('random_forest', rfc)]`

Try it out!

In [140]:
steps = [('feature_selection', selector), ('random_forest', rfc)]

Then, to make a `Pipeline`, simply pass the `steps` to our `Pipeline` object in the following way:<br>
`pipeline = Pipeline(steps)`

In [141]:
pipeline = Pipeline(steps)

Now we can call `.fit` and `.predict` on our `pipeline` object just like any other model!

Call `.fit`, `.predict`, and then print the `classification_report`.

In [142]:
pipeline.fit(X_train, y_train)

Pipeline(memory=None,
     steps=[('feature_selection', SelectKBest(k=10, score_func=<function f_classif at 0x111967d90>)), ('random_forest', RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, ...n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False))])

In [143]:
predictions = pipeline.predict(X_test)

In [144]:
from sklearn.metrics import classification_report
print(classification_report(y_test, predictions))

             precision    recall  f1-score   support

      dance       0.73      0.74      0.74       148
       jazz       0.81      0.83      0.82       154
        rap       0.83      0.83      0.83       151
       rock       0.72      0.69      0.71       147

avg / total       0.77      0.78      0.77       600
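To see what the pipeline does for you, the sketch below runs the same two steps once through a `Pipeline` and once by hand, then checks that the predictions agree. It uses synthetic data from `make_classification` rather than the songs dataset, and the names `pipe`, `manual_selector` and `manual_rfc` are only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the songs data (an assumption for illustration).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=12, n_informative=5, random_state=0
)

# Pipeline version: one object, one call to fit.
pipe = Pipeline([
    ("feature_selection", SelectKBest(k=5)),
    ("random_forest", RandomForestClassifier(n_estimators=20, random_state=0)),
])
pipe.fit(X_demo, y_demo)

# Manual version: run each step yourself, in the same order.
manual_selector = SelectKBest(k=5)
X_selected = manual_selector.fit_transform(X_demo, y_demo)
manual_rfc = RandomForestClassifier(n_estimators=20, random_state=0)
manual_rfc.fit(X_selected, y_demo)

pipe_pred = pipe.predict(X_demo)
manual_pred = manual_rfc.predict(manual_selector.transform(X_demo))
print(np.array_equal(pipe_pred, manual_pred))
```

The manual version makes it easy to forget the `transform` call at prediction time; the pipeline applies it automatically.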



### Combining GridSearchCV and Pipeline

The cool thing about a `Pipeline` is that you can use a single `GridSearchCV` object to try different combinations of parameter values for every step at once!

from `sklearn.model_selection` import `GridSearchCV`

In [145]:
from sklearn.model_selection import GridSearchCV

We will try 2 values of `k` for `SelectKBest`, 2 values for `n_estimators` and 2 values for the `min_samples_split` for our `RandomForestClassifier`.

We have to use the following syntax for defining the parameters:
* `<name of step in pipeline> + '__' + <name of parameter>`.

For example:
* the name of our `SelectKBest` step in our pipeline is `feature_selection`
* we want to change the `k` value
* so the syntax is: `feature_selection__k`
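If you are unsure which names are valid, one option is to ask the pipeline itself. The sketch below rebuilds a pipeline with the same step names as above (the variable `pipe` is just for illustration) and lists its parameter names with `get_params()`.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import Pipeline

pipe = Pipeline([
    ("feature_selection", SelectKBest()),
    ("random_forest", RandomForestClassifier()),
])

# get_params() lists every tunable name, already in step__parameter form.
param_names = sorted(pipe.get_params().keys())
print([name for name in param_names if name.startswith("feature_selection__")])

# The same names also work with set_params.
pipe.set_params(feature_selection__k=5, random_forest__n_estimators=50)
selected_k = pipe.named_steps["feature_selection"].k
print(selected_k)  # 5
```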

In [146]:
parameters = dict(feature_selection__k=[5,10])

Modify `parameters` above to add the following values for the `RandomForestClassifier` step as well:
* `n_estimators`: `[50, 100]`
* `min_samples_split`: `[2, 10]`

In [147]:
parameters = dict(feature_selection__k=[5, 10], 
              random_forest__n_estimators=[50, 100],
              random_forest__min_samples_split=[2, 10])

Great! Now you know how to use `GridSearchCV` -- call it on your `pipeline` object, passing in your new `parameters`. Set `verbose=3` so you can see it in action! Call `.fit` and `.predict` on your `GridSearchCV` object and print the `classification_report`.

In [148]:
grid = GridSearchCV(pipeline, param_grid=parameters, verbose=3)

grid.fit(X_train, y_train)
grid_predictions = grid.predict(X_test)
print(classification_report(y_test, grid_predictions))

Fitting 3 folds for each of 8 candidates, totalling 24 fits
[CV] feature_selection__k=5, random_forest__min_samples_split=2, random_forest__n_estimators=50 
[CV]  feature_selection__k=5, random_forest__min_samples_split=2, random_forest__n_estimators=50, score=0.727929 -   0.4s
[CV] feature_selection__k=5, random_forest__min_samples_split=2, random_forest__n_estimators=50 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.4s remaining:    0.0s


[CV]  feature_selection__k=5, random_forest__min_samples_split=2, random_forest__n_estimators=50, score=0.711272 -   0.4s
[CV] feature_selection__k=5, random_forest__min_samples_split=2, random_forest__n_estimators=50 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.8s remaining:    0.0s


[CV]  feature_selection__k=5, random_forest__min_samples_split=2, random_forest__n_estimators=50, score=0.741935 -   0.6s
[CV] feature_selection__k=5, random_forest__min_samples_split=2, random_forest__n_estimators=100 
[CV]  feature_selection__k=5, random_forest__min_samples_split=2, random_forest__n_estimators=100, score=0.727929 -   0.9s
[CV] feature_selection__k=5, random_forest__min_samples_split=2, random_forest__n_estimators=100 
[CV]  feature_selection__k=5, random_forest__min_samples_split=2, random_forest__n_estimators=100, score=0.716824 -   0.7s
[CV] feature_selection__k=5, random_forest__min_samples_split=2, random_forest__n_estimators=100 
[CV]  feature_selection__k=5, random_forest__min_samples_split=2, random_forest__n_estimators=100, score=0.746385 -   0.7s
[CV] feature_selection__k=5, random_forest__min_samples_split=10, random_forest__n_estimators=50 
[CV]  feature_selection__k=5, random_forest__min_samples_split=10, random_forest__n_estimators=50, score=0.737368 -  

[Parallel(n_jobs=1)]: Done  24 out of  24 | elapsed:   19.3s finished


             precision    recall  f1-score   support

      dance       0.76      0.69      0.72       148
       jazz       0.78      0.86      0.81       154
        rap       0.80      0.82      0.81       151
       rock       0.72      0.69      0.71       147

avg / total       0.77      0.77      0.77       600
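After a grid search finishes, you will usually want to know which combination won. The sketch below shows one way to inspect that, using synthetic data and a smaller grid than the one above so it runs quickly; the data and the names `search` and `best_pipeline` are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the songs data.
X_demo, y_demo = make_classification(
    n_samples=200, n_features=12, n_informative=5, random_state=0
)

pipe = Pipeline([
    ("feature_selection", SelectKBest()),
    ("random_forest", RandomForestClassifier(random_state=0)),
])
param_grid = {
    "feature_selection__k": [5, 10],
    "random_forest__n_estimators": [10, 20],
}

search = GridSearchCV(pipe, param_grid=param_grid, cv=3)
search.fit(X_demo, y_demo)

print(search.best_params_)           # the winning combination
print(round(search.best_score_, 3))  # its mean cross-validated accuracy

# The pipeline refitted on all the training data with the best parameters.
best_pipeline = search.best_estimator_
```

Calling `.predict` on the `GridSearchCV` object, as in the cell above, uses this refitted best pipeline.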



## Pickling

'Pickling' is a way to save your Python objects to disk. You can save almost any Python variable to your computer as a `.pickle` file and read it back later. It saves time and will help you when testing!

In [16]:
import pickle
some_model = [1, 2, 3, 4, 5, 6]

Print `some_model`.

In [17]:
print(some_model)

[1, 2, 3, 4, 5, 6]


The way to save your object is to call `pickle.dump`. You pass in the object and a file opened in `wb` (write-binary) mode; in this case the file is named `my_model.pickle`.

In [18]:
with open('my_model.pickle', 'wb') as f:
    pickle.dump(some_model, f)

Check your computer! You should see a file called `my_model.pickle`.

The way to load your object is to call `pickle.load`. You pass in the file you want to load, opened in `rb` (read-binary) mode.

In [19]:
with open('my_model.pickle', 'rb') as f:
    some_model_2 = pickle.load(f)

Print `some_model_2`.

In [20]:
print(some_model_2)

[1, 2, 3, 4, 5, 6]


Cool! We successfully saved a Python variable to disk and loaded it back.

In our example, we used a list, but you can use almost any object: a pandas DataFrame, a `RandomForestClassifier` model, a Doc2Vec model, a Bag of Words matrix, almost **ANYTHING**!
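Here is a self-contained sketch of the full save-and-load round trip. It writes to a temporary folder (an assumption made so the example cleans up after itself) and uses `with` blocks so the files are closed automatically.

```python
import os
import pickle
import tempfile

some_model = [1, 2, 3, 4, 5, 6]

# Write to a temporary folder so the example does not leave files behind.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "my_model.pickle")

    # 'wb' = write binary; the file is closed when the with-block ends.
    with open(path, "wb") as f:
        pickle.dump(some_model, f)

    # 'rb' = read binary.
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == some_model)  # True
print(restored is some_model)  # False: loading builds a new, equal object
```

Only load pickle files that you created yourself or that come from a source you trust, because loading a pickle can run arbitrary code.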

## Multi-Label Output

A song can be *both* `happy` and `celebratory`. For this reason, `moods` is an example of a variable that can be described as `multi-label`. In your final project, you might choose to work with `moods` and hence with multi-label output. You can choose to do this in several ways. One way is to use one of scikit-learn's classifiers that support multi-label output directly! Examples: `RandomForestClassifier`, `KNeighborsClassifier`.

### DataSet
Remember how we saw that you can save almost anything using `pickle`? Read in `songs_aggressive.pickle` and store it in `songs_aggressive`.

In [21]:
songs_aggressive =  pickle.load(open('songs_aggressive.pickle', 'rb'))

Print the `type` of songs_aggressive.

In [22]:
type(songs_aggressive)

pandas.core.frame.DataFrame

Notice it is a pandas DataFrame -- this means we saved a pandas dataframe as a pickle file. Nice! Print its `head`.

In [23]:
songs_aggressive.head()

Unnamed: 0,_id,album,artist,audio_features,context,decades,genres,lyrics_features,moods,name,new_context,picture,recording_id,sub_context,yt_id,yt_views
47,{'$oid': '52fdfb3d0b9398049f3cbcdf'},Toxicity,System Of A Down,"[7, 0.906388, 0.130576, 127.438, 0.122818, 0.0...","[energetic, energetic]",[],[rock],"[wake, up, wake, up, grab, a, brush, and, put,...","[aggressive, rowdy]",Chop Suey!,energetic,http://images.musicnet.com/albums/013/354/909/...,50574.0,"[energy boost, gaming ]",CSvFpBOe8eY,340810887
182,{'$oid': '52fdfb3f0b9398049f3ce64d'},Awake,Skillet,"[8, 0.9567800000000001, 0.078926, 134.992, 0.0...",[gaming ],['00s rock],[rock],"[the, secret, side, of, me, i, never, let, you...",[aggressive],Monster,,http://images.musicnet.com/albums/032/235/853/...,10525.0,,1mjlM_RnsVE,125635452
201,{'$oid': '52fdfb3f0b9398049f3ce638'},B.Y.O.B. (Parental Advisory),System Of A Down,"[1, 0.9814600000000001, 0.275786, 101.799, 0.1...",[gaming ],['00s rock],[rock],"[you, why, do, they, always, send, the, poor, ...",[aggressive],B.Y.O.B.,,http://images.musicnet.com/albums/003/654/455/...,50567.0,,zUzd9KyIDrM,116388342
212,{'$oid': '52fdfb3f0b9398049f3ce3bc'},Just One Last Time,David Guetta,"[8, 0.58184, 0.07976000000000001, 128.067, 0.0...","[party, party, party]",[],[dance: house & techno],"[this, is, the, end, station, but, i, cant, mo...","[visceral, aggressive, rowdy]",Just One Last Time (Feat. Taped Rai) [Extended],work out,http://images.musicnet.com/albums/076/827/259/...,29494.0,"[driving in the left lane, working out: cardio...",xyqQ4iT4IeU,109577676
249,{'$oid': '52fdfb430b9398049f3d5c2e'},...And Justice For All,Metallica,"[7, 0.690739, 0.11966099999999999, 102.132, 0....","[work out, work out]",[],[rock],"[i, can, t, remember, anything, can, t, tell, ...","[cocky, aggressive]",One,untroubled,http://images.musicnet.com/albums/001/986/907/...,50984.0,"[hanging out in the man cave, working out: wei...",EzgGTTtR0kc,98703416


### Classification

We want to create a `train_test_split`. We want our features (`X`) to be our `audio_features` from `songs_aggressive`.<br>
**Note:** A quick way to get your features into a list format for the `train_test_split` from the pandas Series format is to use `.values.tolist()`. 

In [41]:
audio_features = songs_aggressive['audio_features'].values.tolist()
X = audio_features
X

[[7,
  0.906388,
  0.130576,
  127.438,
  0.122818,
  0.00035299999999999996,
  0.0012000000000000001,
  0,
  4,
  210.42667,
  -5.856,
  0.31845399999999996,
  0.418888,
  0.40700000000000003,
  0.602,
  0.603,
  0.889],
 [8,
  0.9567800000000001,
  0.078926,
  134.992,
  0.074067,
  0.043111,
  0.0,
  1,
  4,
  178.01333,
  -2.336,
  0.6818649999999999,
  0.639745,
  0.552,
  0.591,
  0.916,
  1.0],
 [1,
  0.9814600000000001,
  0.275786,
  101.799,
  0.13866799999999999,
  0.008752,
  0.0,
  0,
  4,
  256.85333,
  -2.656,
  0.699446,
  0.54256,
  0.5730000000000001,
  0.5720000000000001,
  0.28600000000000003,
  0.919],
 [8,
  0.58184,
  0.07976000000000001,
  128.067,
  0.098754,
  0.004038,
  8.800000000000001e-05,
  1,
  4,
  339.38667,
  -3.9779999999999998,
  0.083594,
  0.729426,
  0.227,
  0.313,
  0.67,
  0.788],
 [7,
  0.690739,
  0.11966099999999999,
  102.132,
  0.061406999999999996,
  0.001021,
  0.045790000000000004,
  1,
  3,
  447.44,
  -9.427,
  0.418857,
  0.436285,


We want our labels, `y`,  to be the `moods` from `songs_aggressive`.

In [25]:
y = songs_aggressive['moods']

Print `y`.

In [26]:
y

47                               [aggressive, rowdy]
182                                     [aggressive]
201                                     [aggressive]
212                    [visceral, aggressive, rowdy]
249                              [cocky, aggressive]
287                      [cocky, aggressive, trashy]
298                                     [aggressive]
440                               [cold, aggressive]
446                                     [aggressive]
481                                     [aggressive]
519      [cocky, aggressive, visceral, motivational]
520                                     [aggressive]
522                             [angsty, aggressive]
543                                     [aggressive]
576                                     [aggressive]
580                      [cocky, aggressive, trashy]
589                              [aggressive, rowdy]
593                                     [aggressive]
601                                [aggressive

You should see that, for many entries, there are multiple labels. For example, for index `36359`, the label has 3 values: <br>`['angsty', 'aggressive', 'rowdy']`.

Now what we want to do is convert our labels into numbers. Remember, computers always work with numbers!!! We will do this using the `MultiLabelBinarizer`.

from `sklearn.preprocessing` import `MultiLabelBinarizer`


In [27]:
from sklearn.preprocessing import MultiLabelBinarizer

Create a new instance of `MultiLabelBinarizer()` and save it in `mlb`.

In [28]:
mlb = MultiLabelBinarizer()

Now call `.fit_transform` on our labels above (`y`). Store it in `y_labels`.

In [29]:
# Preliminary split on the raw labels; it is replaced by the split on `y_labels` further down.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=101)

In [30]:
y_labels = mlb.fit_transform(y)

Print `y_labels` to see what it looks like. It should look like a bunch of lists with 1s and 0s. Each list holds one song's labels converted into numbers, with one position per mood.

In [31]:
y_labels

array([[1, 0, 0, ..., 1, 0, 0],
       [1, 0, 0, ..., 0, 0, 0],
       [1, 0, 0, ..., 0, 0, 0],
       ...,
       [1, 1, 0, ..., 0, 1, 0],
       [1, 1, 0, ..., 1, 0, 0],
       [1, 0, 0, ..., 0, 0, 0]])

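Print `mlb.classes_`, `y_labels[0]`, and `y.iloc[0]` and compare them. You will notice how the labels have been encoded. (A notebook cell only displays its last expression, so wrap each one in `print` if you want to see all three.)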

In [32]:
mlb.classes_
y_labels[0]
y.iloc[0]

['aggressive', 'rowdy']
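To make the encoding concrete, here is a tiny sketch with three made-up songs (the list `toy_moods` is an assumption for illustration, not data from `songs_aggressive`).

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Three made-up songs, each with one or more moods.
toy_moods = [
    ["aggressive", "rowdy"],
    ["aggressive"],
    ["cocky", "aggressive"],
]

toy_mlb = MultiLabelBinarizer()
toy_labels = toy_mlb.fit_transform(toy_moods)

# One column per mood, in alphabetical order.
print(toy_mlb.classes_)  # ['aggressive' 'cocky' 'rowdy']

# One row per song: 1 if the song has that mood, 0 otherwise.
print(toy_labels)
# [[1 0 1]
#  [1 0 0]
#  [1 1 0]]

# inverse_transform turns the numbers back into mood names.
decoded = toy_mlb.inverse_transform(toy_labels)
print(decoded)
# [('aggressive', 'rowdy'), ('aggressive',), ('aggressive', 'cocky')]
```

`inverse_transform` is handy later for turning a model's 0/1 predictions back into readable mood names.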

Now, because we are going to use a `KNeighborsClassifier`, we need to scale our features. Use a `StandardScaler` to scale `X`.

In [42]:
from sklearn.preprocessing import StandardScaler
standard_scaler = StandardScaler()
X = standard_scaler.fit_transform(X)
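A common alternative, sketched here on synthetic data, is to put the scaler and the classifier into a `Pipeline`, so that the scaler is fitted on the training rows only and then reused on the test rows. The data and the name `scaled_knn` are assumptions for illustration; this is not the approach used in the cells of this lab.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the audio features.
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.1, random_state=42
)

scaled_knn = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# fit() learns the scaling from X_tr only, then trains kNN on the scaled rows.
scaled_knn.fit(X_tr, y_tr)

# score() scales X_te with the training statistics before predicting.
test_accuracy = scaled_knn.score(X_te, y_te)
print(test_accuracy)
```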

Now we are ready to create our `train_test_split`! Use `test_size=0.1`, `random_state=42`, `X`, and `y_labels`.

In [43]:
X_train, X_test, y_train, y_test = train_test_split(X, y_labels, test_size=0.1, random_state=42)

Finally, create your `KNeighborsClassifier`, `.fit` it to the training data, call `.predict` and print the `classification_report`.

In [48]:
from sklearn.neighbors import KNeighborsClassifier
knn_multi = KNeighborsClassifier()
knn_multi.fit(X_train, y_train)
knn_multi_predictions = knn_multi.predict(X_test)

In [49]:
from sklearn.metrics import classification_report
print(classification_report(y_test, knn_multi_predictions))

             precision    recall  f1-score   support

          0       1.00      1.00      1.00       194
          1       0.54      0.35      0.42        72
          2       0.57      0.14      0.22        29
          3       0.00      0.00      0.00         4
          4       0.00      0.00      0.00        19
          5       0.50      0.09      0.15        11
          6       0.38      0.11      0.17        27
          7       0.43      0.11      0.17        28
          8       0.50      0.11      0.17        38
          9       0.15      0.07      0.10        27

avg / total       0.67      0.53      0.56       449



## Classifier Chains

Classifier Chains link several binary classifiers together, one per label, so that each classifier in the chain gets the predictions of the classifiers before it and uses them as extra features! This might be helpful if your labels are correlated. Moods are probably correlated, so let's see if it helps.

Let's try it out on our current dataset.

from `sklearn.multioutput` import `ClassifierChain`

In [50]:
from sklearn.multioutput import ClassifierChain

`ClassifierChain` takes in as an argument the kind of classifier you want to use. Let's use `KNeighborsClassifier` so we can compare our results.

In [51]:
knn_chain =  ClassifierChain(KNeighborsClassifier())

You can now `.fit` and `.predict` on your chain just like you would a normal classifier.

In [58]:
knn_chain.fit(X_train, y_train)
knn_chain_predictions =  knn_chain.predict(X_test)
knn_chain_report = classification_report(y_test, knn_chain_predictions)
print(knn_chain_report)

             precision    recall  f1-score   support

          0       1.00      1.00      1.00       194
          1       0.54      0.35      0.42        72
          2       0.44      0.14      0.21        29
          3       0.00      0.00      0.00         4
          4       0.00      0.00      0.00        19
          5       0.50      0.09      0.15        11
          6       0.50      0.22      0.31        27
          7       0.57      0.14      0.23        28
          8       0.67      0.16      0.26        38
          9       0.25      0.15      0.19        27

avg / total       0.70      0.54      0.58       449



What did you observe? Did your score improve?

Remember that the number of classifiers in the chain will be equal to the number of labels! You can check this by comparing `len(knn_chain.estimators_)` and `len(mlb.classes_)`.

In [60]:
print(len(knn_chain.estimators_))
print(len(mlb.classes_))

10
10


It's also useful to know that the *order* of the classifiers inside the chain is important and can influence your results. Check the scikit-learn [documentation](http://scikit-learn.org/stable/modules/generated/sklearn.multioutput.ClassifierChain.html) and [example](http://scikit-learn.org/stable/auto_examples/multioutput/plot_classifier_chain_yeast.html#sphx-glr-auto-examples-multioutput-plot-classifier-chain-yeast-py) for the parameters you can pass to your chain.

For more info, you can also check out section 4.1.2 on Classifier Chains in this [article](https://www.analyticsvidhya.com/blog/2017/08/introduction-to-multi-label-classification/).
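As a small illustration of the ordering point, the sketch below fits one chain in the default label order and one in a shuffled order, using the `order` and `random_state` parameters of `ClassifierChain`. The data comes from `make_multilabel_classification`, which is an assumption for illustration and not the moods dataset.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.multioutput import ClassifierChain
from sklearn.neighbors import KNeighborsClassifier

# Synthetic multi-label data: 200 rows, 4 possible labels per row.
X_demo, Y_demo = make_multilabel_classification(
    n_samples=200, n_features=10, n_classes=4, random_state=0
)

# Default: labels are chained in column order 0, 1, 2, 3.
default_chain = ClassifierChain(KNeighborsClassifier())
default_chain.fit(X_demo, Y_demo)
print(list(default_chain.order_))

# order="random" shuffles the label order; random_state makes it repeatable.
shuffled_chain = ClassifierChain(
    KNeighborsClassifier(), order="random", random_state=0
)
shuffled_chain.fit(X_demo, Y_demo)
print(list(shuffled_chain.order_))

# Predictions still come back with one column per label, in the original order.
chain_predictions = shuffled_chain.predict(X_demo)
print(chain_predictions.shape)  # (200, 4)
```

Trying a few different orders and comparing their reports is one simple way to see how much the order matters for your labels.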

## Building Binary Classifiers and looking at Probabilities

Another technique you can try is to instead use individual binary classifiers and look at the probability of their predictions. This can help you do things like declare the *top* 3 moods for example. You can also use the probabilities in the calculation of your similarity score, when deciding which songs are more similar than others to your test song.

Let's take a quick look at how to inspect the probability for a prediction, using a `LogisticRegression` classifier as an example.

### DataSet

We'll work with `genres` here, as in Assignment 1. Read in `Consolidated_Dance_Jazz.csv`, which contains all the audio features and the genres for `Dance` and `Jazz` songs. Store it in `songs_dance_jazz`.

In [61]:
songs_dance_jazz = pd.read_csv('Consolidated_Dance_Jazz.csv')
songs_dance_jazz.head()

Unnamed: 0,key,energy,liveliness,tempo,speechiness,acousticness,instrumentalness,time_signature,duration,loudness,valence,danceability,mode,time_signature_confidence,tempo_confidence,key_confidence,mode_confidence,genres
0,3.0,0.705822,0.053292,126.009,0.126016,0.001966,0.0,0.0,4.0,194.09333,-3.898,0.592798,0.875137,0.004,0.114,1.0,0.742,dance
1,0.0,0.616411,0.171423,130.009,0.059577,0.058936,6.5e-05,1.0,4.0,284.41333,-7.443,0.476111,0.78976,0.499,0.489,0.708,1.0,dance
2,7.0,0.728367,0.84481,112.328,0.307629,0.00975,2.2e-05,1.0,4.0,207.15057,-14.511,0.652412,0.691151,0.844,0.384,1.0,1.0,dance
3,11.0,0.687771,0.890427,118.388,0.069611,0.031731,0.000222,1.0,4.0,357.8,-9.815,0.781912,0.771293,0.851,0.534,0.815,1.0,dance
4,6.0,0.650609,0.061545,122.677,0.066457,0.03827,3e-05,0.0,4.0,267.16,-6.094,0.552483,0.84463,0.55,0.55,0.766,1.0,dance


Perform the following steps:
* Build a `LogisticRegression` classifier called `logReg`.
* Use all the audio features as features.
    * Remember to scale your features!
* Use `genres` as the labels.
* Make a `train_test_split` with `test_size=0.33, random_state=42`. 
* `.fit`, `.predict`, and print the `classification_report` for `logReg`.

In [75]:
from sklearn.linear_model import LogisticRegression
logReg = LogisticRegression()
X =  songs_dance_jazz.drop('genres', axis=1)
y = songs_dance_jazz['genres']

from sklearn.preprocessing import StandardScaler
standard_scaler = StandardScaler()
X = standard_scaler.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

logReg.fit(X_train, y_train)
logReg_predictions = logReg.predict(X_test)
print(classification_report(y_test, logReg_predictions))

             precision    recall  f1-score   support

      dance       0.94      0.96      0.95       682
       jazz       0.95      0.94      0.94       602

avg / total       0.95      0.95      0.95      1284



You should see an `f1-score` of around `0.95`

### Probabilities

Let's look at the song at index 20 and see what the prediction probabilities were!

In [80]:
logReg.predict(X_test[20].reshape(1,-1))

array(['dance'], dtype=object)

You can see the classifier predicted the song at index 20 as `dance`. How confident was it? We use the `predict_proba` method to find out!

In [82]:
logReg.predict_proba(X_test[20].reshape(1,-1))

array([[0.80143092, 0.19856908]])

You should see that it was around `80.14%` sure that it was a `dance` song, and `19.86%` sure it was `jazz`.

The probabilities are listed in the same order as the classes. To see the classes themselves, you can simply print `logReg.classes_`.

In [83]:
logReg.classes_

array(['dance', 'jazz'], dtype=object)
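The sketch below shows two small things you can do with probabilities. The first part pairs the class names with the probabilities printed above. The second part ranks moods by probability to pick a top 3; its numbers are made up and stand in for the `predict_proba` output of one hypothetical binary classifier per mood.

```python
# Pair each class with its probability (values copied from the output above).
classes = ["dance", "jazz"]
probabilities = [0.80143092, 0.19856908]
genre_probabilities = dict(zip(classes, probabilities))
print(genre_probabilities)  # {'dance': 0.80143092, 'jazz': 0.19856908}

# Hypothetical probabilities that one song has each mood, e.g. collected
# from one binary classifier per mood. These numbers are made up.
mood_probabilities = {
    "aggressive": 0.91,
    "rowdy": 0.62,
    "cocky": 0.35,
    "angsty": 0.48,
    "cold": 0.07,
}


def top_moods(probabilities_by_mood, k=3):
    """Return the k moods with the highest probability, best first."""
    ranked = sorted(
        probabilities_by_mood.items(), key=lambda item: item[1], reverse=True
    )
    return [mood for mood, _ in ranked[:k]]


print(top_moods(mood_probabilities))  # ['aggressive', 'rowdy', 'angsty']
```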

## Voting Classifiers

Voting Classifiers are a convenient way to group many classifiers together and take the result that most of them predict. Let's see this with a quick example using 3 classifiers: `KNeighborsClassifier`, `SVC`, and `LogisticRegression`.

Do the following:
* Create a `KNeighborsClassifier` (k-nearest neighbors classifier)
* Create an `SVC` (Support Vector Classifier)
* Create a `LogisticRegression` classifier

In [84]:
from sklearn.svm import SVC
knn = KNeighborsClassifier()
svc = SVC()
logReg = LogisticRegression()

Now, using the same `X_train` / `X_test` / `y_train` / `y_test` as before, fit all your classifiers to the training data.

In [86]:
knn.fit(X_train, y_train)
svc.fit(X_train, y_train)
logReg.fit(X_train, y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)

Now let's look at the prediction each classifier makes for the song at index `20`.

In [92]:
song_index = 20
print(knn.predict(X_test[song_index].reshape(1,-1))) # kNN prediction
print(logReg.predict(X_test[song_index].reshape(1,-1))) # Logistic Regression prediction
print(svc.predict(X_test[song_index].reshape(1,-1))) # Support Vector Machine Prediction

['jazz']
['dance']
['dance']


Notice how `kNN` predicted the song as `jazz`, while the other 2 predicted the song as `dance`.

What we can do is create a `VotingClassifier` that has these 3 classifiers, and its output will automatically be the most popular prediction.

from `sklearn.ensemble` import `VotingClassifier`

In [93]:
from sklearn.ensemble import VotingClassifier

Create a `VotingClassifier` that contains all your above classifiers! It requires you pass in a `list` of your classifiers, each element in the list with the following syntax:

`(<some name for your classifier>, <your classifier object>)`

In [96]:
voting_classifier = VotingClassifier([("knn", knn), ("svc", svc), ("logReg", logReg)])

Now, `.fit` your `voting_classifier` on `X_train` and `y_train`.

In [99]:
voting_classifier.fit(X_train, y_train)

VotingClassifier(estimators=[('knn', KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=5, p=2,
           weights='uniform')), ('svc', SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=...ty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False))],
         flatten_transform=None, n_jobs=1, voting='hard', weights=None)

Let's look at the prediction for song at index 20 !

In [102]:
song_index = 20
print(voting_classifier.predict(X_test[song_index].reshape(1,-1))) # Voting prediction

['dance']




Notice it is `dance`! This is because it was the majority vote from the previous 3 classifiers.
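What the `VotingClassifier` does with `voting='hard'` (the setting shown in its printed parameters) can be imitated in a few lines of plain Python. The helper `majority_vote` below is hypothetical and only illustrates the idea; it is not how scikit-learn implements it.

```python
from collections import Counter

# The three individual predictions for the song at index 20, copied from above.
individual_predictions = {"knn": "jazz", "logReg": "dance", "svc": "dance"}


def majority_vote(predictions):
    """Return the label predicted by the most classifiers."""
    counts = Counter(predictions.values())
    label, _ = counts.most_common(1)[0]
    return label


print(majority_vote(individual_predictions))  # dance
```

`VotingClassifier` also accepts `voting='soft'`, which averages the predicted probabilities instead of counting votes. That requires every classifier to provide `predict_proba`, so an `SVC` would need to be created with `probability=True`.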

## Cosine Similarity

If you decide to use the **Million Song DataSet** *(MSD)*, you won't have any mood information (unless you try to use the *MSD* and `MasterSongList.json` together ;)). In this case, one very popular similarity metric is *cosine similarity*. It measures the angle between 2 vectors: they count as similar when they point in nearly the same direction, regardless of their lengths. You can learn more about cosine similarity in this [picture](https://lh5.googleusercontent.com/lYq5EWtpgku57oUGff4oBcQWNaxmvj9IIXGF7_ILr9uA1wgvlI0_j8dYc00).

Let's create 3 short sentences. We'll make sentence_1 and sentence_2 more 'similar' to each other than either of them is to sentence_3.

In [128]:
sentence_1 = "loves food"
sentence_2 = "Hannah loves food too"
sentence_3 = "hate food"
all_sentences = [sentence_1, sentence_2, sentence_3]

Great, now let's create a bag of words model using `CountVectorizer` and `.fit_transform` it to the above sentences.

In [129]:
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
bag_of_word = count_vect.fit_transform(all_sentences)
bag_of_word

<3x5 sparse matrix of type '<class 'numpy.int64'>'
	with 8 stored elements in Compressed Sparse Row format>

from `sklearn.metrics.pairwise` import `cosine_similarity`

In [130]:
from sklearn.metrics.pairwise import cosine_similarity

Let's see how 'similar' sentence_1 and sentence_2 are. Remember that the feature vectors for these sentences live inside our `bag_of_word` matrix!

In [131]:
cosine_similarity(bag_of_word[0], bag_of_word[1])

array([[0.70710678]])

You see a value of about `0.71`, which means they are pretty similar!

What about sentence_2 and sentence_3?

In [132]:
cosine_similarity(bag_of_word[1], bag_of_word[2])

array([[0.35355339]])

We see a value of about `0.35`, which means they are not as similar.

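Remember, a value of `1` means the vectors point in exactly the same direction, and the closer the value gets to `0`, the less similar they are. `-1` means they are the 'opposite'! (Word counts are never negative, so with a bag of words the value always stays between `0` and `1`.)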
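To see where these numbers come from, the sketch below computes cosine similarity by hand: the dot product of the two vectors divided by the product of their lengths. The word-count vectors are written out manually, assuming the vocabulary is in alphabetical order (`food`, `hannah`, `hate`, `loves`, `too`), and the helper `cosine` is just for illustration.

```python
import math

# Word counts in alphabetical vocabulary order: food, hannah, hate, loves, too
sentence_1_counts = [1, 0, 0, 1, 0]  # "loves food"
sentence_2_counts = [1, 1, 0, 1, 1]  # "Hannah loves food too"
sentence_3_counts = [1, 0, 1, 0, 0]  # "hate food"


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    length_a = math.sqrt(sum(x * x for x in a))
    length_b = math.sqrt(sum(y * y for y in b))
    return dot / (length_a * length_b)


print(round(cosine(sentence_1_counts, sentence_2_counts), 4))  # 0.7071
print(round(cosine(sentence_2_counts, sentence_3_counts), 4))  # 0.3536
print(round(cosine(sentence_1_counts, sentence_3_counts), 4))  # 0.5
```

The first two values match the `cosine_similarity` outputs above. The same function works on any two numeric vectors of equal length, such as the audio features of two songs.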

OK, that's it for now. All the best with the final project!

## Recommended Reading

**Multi-Label Classification: Yelp Example**<br>
http://mondego.ics.uci.edu/projects/yelp/

**scikit-learn page on Multi-Label and Multi-Class Classification**<br>
http://scikit-learn.org/stable/modules/multiclass.html

**GridSearchCV & Pipeline: Example**<br>
https://www.civisanalytics.com/blog/workflows-in-python-using-pipeline-and-gridsearchcv-for-more-compact-and-comprehensive-code/

**Classifier Chains**<br>
https://www.analyticsvidhya.com/blog/2017/08/introduction-to-multi-label-classification/ (see section 4.1.2)<br>
http://scikit-learn.org/stable/auto_examples/multioutput/plot_classifier_chain_yeast.html

---