# Other Models

I'd like to try several other simple models.  SVM, random forest, hidden Markov models, gradient boosting (via lightgbm), and sklearn's `multiclass` module seem like good places to start.  Most of these are in sklearn, but hidden Markov models and lightgbm come from separate packages (e.g. `hmmlearn`, `lightgbm`).  I will make a copy of `lr_train.py` for each model.  In principle I could use a different model for each of root, quality, add, and inversion.
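Using a different model per target amounts to fitting one independent classifier per label column. Below is a minimal sketch of that layout. It assumes the label columns are ordered root, quality, add, inversion (only `labels_train[:,0]` being root is confirmed by this notebook); `TARGETS`, `fit_per_target`, `predict_per_target` and the model choices in `make_model` are hypothetical, and the data is synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Assumed column order of the label array (hypothetical beyond column 0 = root).
TARGETS = ['root', 'quality', 'add', 'inversion']

def fit_per_target(features, labels, make_model):
    """Fit one independent classifier per label column.

    make_model maps a target name to a fresh, unfitted estimator, so each
    target can use a different model family.
    """
    models = {}
    for col, name in enumerate(TARGETS):
        model = make_model(name)
        model.fit(features, labels[:, col])
        models[name] = model
    return models

def predict_per_target(models, features):
    # Stack per-target predictions back into the same column layout as the labels.
    return np.column_stack([models[name].predict(features) for name in TARGETS])

def make_model(name):
    # Hypothetical choice: random forest for root, one-vs-rest logistic regression elsewhere.
    if name == 'root':
        return RandomForestClassifier(n_estimators=50, class_weight='balanced', random_state=0)
    return OneVsRestClassifier(LogisticRegression(max_iter=1000))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))  # synthetic stand-in for features_train
y = np.column_stack([rng.integers(0, k, size=200) for k in (12, 4, 3, 3)])
models = fit_per_target(X, y, make_model)
preds = predict_per_target(models, X)
print(preds.shape)  # (200, 4)
```

Passing a factory rather than a single estimator keeps the per-target choice in one place, so swapping the root model for an SVM or gradient boosting later only touches `make_model`.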

## Setup

In [14]:
import pandas as pd
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt
import sys, os
from sklearn.model_selection import train_test_split
import sklearn.metrics  # import the submodule explicitly so sklearn.metrics.accuracy_score resolves
import re
import pickle
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

if 'chord_loader' in sys.modules:
    del sys.modules['chord_loader']
sys.path.append('.')
import chord_loader

In [2]:
# Copy the loading code from lr_train.py so these models can be tested.
# Use the 10% data subset for this purpose.
source = 'partial_lr'
source_dir = 'Data/processed/'

#get information from processed data directory
data_info = pd.read_csv(source_dir + 'directory.csv')
curr_data_info = data_info.loc[data_info['filepath']==source,:]
if curr_data_info.shape[0] < 1:
    print('Source not found in directory')
    sys.exit(1)
curr_data_info = curr_data_info.iloc[-1,:]

#load data
features_train = np.load(f'{source_dir}{source}_ftrain.npy')
labels_train = np.load(f'{source_dir}{source}_ltrain.npy')
features_valid = np.load(f'{source_dir}{source}_fvalid.npy')
labels_valid = np.load(f'{source_dir}{source}_lvalid.npy')
features_test = np.load(f'{source_dir}{source}_ftest.npy')
labels_test = np.load(f'{source_dir}{source}_ltest.npy')
if curr_data_info['standard']:
    standard_features_train = np.load(f'{source_dir}{source}_fstrain.npy')
    standard_labels_train = np.load(f'{source_dir}{source}_lstrain.npy')
    standard_features_valid = np.load(f'{source_dir}{source}_fsvalid.npy')
    standard_labels_valid = np.load(f'{source_dir}{source}_lsvalid.npy')
    standard_features_test = np.load(f'{source_dir}{source}_fstest.npy')
    standard_labels_test = np.load(f'{source_dir}{source}_lstest.npy')
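Kernel SVMs in particular are sensitive to feature scale (the scikit-learn documentation recommends scaling inputs for SVMs), while the cells below fit on the raw `features_train`. If a source was saved without the `standard` arrays, a minimal sketch of producing standardized splits is shown here, using `StandardScaler` fit on the training split only. I'm assuming the saved `fstrain`/`fsvalid`/`fstest` arrays were made in an equivalent way, which this notebook doesn't confirm; `standardize_splits` is a hypothetical helper and the data is synthetic.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

def standardize_splits(f_train, f_valid, f_test):
    """Scale every split with mean/std estimated on the training split only,
    so no information from validation or test leaks into the scaling."""
    scaler = StandardScaler().fit(f_train)
    return scaler, scaler.transform(f_train), scaler.transform(f_valid), scaler.transform(f_test)

rng = np.random.default_rng(1)
# Synthetic stand-ins for features_train / features_valid / features_test.
f_train = rng.normal(loc=5.0, scale=3.0, size=(100, 8))
f_valid = rng.normal(loc=5.0, scale=3.0, size=(30, 8))
f_test = rng.normal(loc=5.0, scale=3.0, size=(30, 8))
scaler, s_train, s_valid, s_test = standardize_splits(f_train, f_valid, f_test)
# Training columns now have mean approximately 0 and standard deviation 1.
print(np.round(s_train.mean(axis=0), 6))
print(np.round(s_train.std(axis=0), 6))
```

Validation and test columns will only be approximately standardized, which is expected: they reuse the training statistics on purpose.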


## Baseline: logistic regression

I've already trained a logistic regression model, but I'll retrain it in this specific setting (the 10% subset) to get a baseline for comparison.

In [15]:
root_model = LogisticRegression(class_weight='balanced', multi_class='ovr', C=1.0,
                                solver='lbfgs', max_iter=1000)
root_model.fit(features_train, labels_train[:,0])

LogisticRegression(C=1.0, class_weight='balanced', dual=False,
                   fit_intercept=True, intercept_scaling=1, l1_ratio=None,
                   max_iter=1000, multi_class='ovr', n_jobs=None, penalty='l2',
                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)

In [None]:
root_predict_train = root_model.predict(features_train)

In [12]:
sklearn.metrics.accuracy_score(labels_train[:,0],root_predict_train)

0.0361722590297072
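A near-chance training accuracy is easier to interpret next to a trivial reference model scored on the same labels. Below is a minimal sketch that fits a model and reports plain and balanced accuracy on both the training and validation splits, with `DummyClassifier(strategy='uniform')` as the chance reference. Balanced accuracy is included because the models here use `class_weight='balanced'`. The helper `score_model` is hypothetical, and the 12-class synthetic labels are only a stand-in; the real number of root classes may differ.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, balanced_accuracy_score

def score_model(model, X_train, y_train, X_valid, y_valid):
    """Fit on the training split; return {split: (accuracy, balanced accuracy)}."""
    model.fit(X_train, y_train)
    scores = {}
    for split, X, y in (('train', X_train, y_train), ('valid', X_valid, y_valid)):
        pred = model.predict(X)
        scores[split] = (accuracy_score(y, pred), balanced_accuracy_score(y, pred))
    return scores

rng = np.random.default_rng(2)
X_train, X_valid = rng.normal(size=(300, 10)), rng.normal(size=(100, 10))
y_train, y_valid = rng.integers(0, 12, size=300), rng.integers(0, 12, size=100)

# Chance reference: uniform random guessing over the classes seen in training.
chance = score_model(DummyClassifier(strategy='uniform', random_state=0),
                     X_train, y_train, X_valid, y_valid)
logreg = score_model(LogisticRegression(class_weight='balanced', max_iter=1000),
                     X_train, y_train, X_valid, y_valid)
for name, scores in (('chance', chance), ('logreg', logreg)):
    print(name, {k: tuple(round(v, 3) for v in vals) for k, vals in scores.items()})
```

Reporting the validation split alongside the training split also separates "the model can't fit" from "the model overfits", which a single training-accuracy number cannot do.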

## SVM

In [6]:
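# Make sure the function behaves as I expect.
# Note: max_iter=1000 is a hard cap on the libsvm solver, so the fit may stop before converging.
root_model = SVC(class_weight='balanced', decision_function_shape='ovr', C=1.0, max_iter=1000)
root_model.fit(features_train, labels_train[:,0])
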
SVC(C=1.0, cache_size=200, class_weight='balanced', coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
    kernel='rbf', max_iter=1000, probability=False, random_state=None,
    shrinking=True, tol=0.001, verbose=False)

In [7]:
root_predict_train = root_model.predict(features_train)

In [12]:
sklearn.metrics.accuracy_score(labels_train[:,0],root_predict_train)

0.0361722590297072

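11 minutes to train, 40 minutes just to make a prediction, and 4% training accuracy!  Maybe a kernel SVM is bad for this problem.  Fit time for sklearn's `SVC` scales at least quadratically with the number of samples, prediction requires a kernel evaluation between every input and every support vector, and `max_iter=1000` may have stopped the solver before it converged.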
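One way to check whether the cost is inherent to the kernel SVM rather than to this particular run is to time fit and predict on growing subsamples, and to compare against `LinearSVC`. `LinearSVC` is a swapped-in technique, not something tried above: a linear-kernel SVM backed by liblinear that typically scales much better with sample count. This is a sketch on synthetic data; the subsample sizes, `max_iter=5000`, and the helper `time_fit_predict` are arbitrary assumptions.

```python
import time

import numpy as np
from sklearn.svm import SVC, LinearSVC

def time_fit_predict(model, X, y):
    """Return (fit seconds, predict seconds, training accuracy) for one model."""
    t0 = time.perf_counter()
    model.fit(X, y)
    t1 = time.perf_counter()
    acc = model.score(X, y)  # runs predict, then computes accuracy on the training data
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, acc

rng = np.random.default_rng(3)
X_all = rng.normal(size=(2000, 20))  # synthetic stand-in for features_train
y_all = rng.integers(0, 12, size=2000)

results = {}
for n in (500, 1000, 2000):
    X, y = X_all[:n], y_all[:n]
    rbf = SVC(kernel='rbf', gamma='scale', class_weight='balanced')
    lin = LinearSVC(class_weight='balanced', max_iter=5000)  # one-vs-rest by default
    results[n] = {'rbf': time_fit_predict(rbf, X, y), 'linear': time_fit_predict(lin, X, y)}
    print(n, {k: tuple(round(v, 3) for v in vals) for k, vals in results[n].items()})
```

If the RBF timings grow much faster than `n` while the linear ones stay roughly proportional, extrapolating to the full subset gives a rough estimate of whether a kernel SVM is worth running at all.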

## Gradient Boosting