# Illustration of Multi-Class Classification on the Cars Dataset

### Importing the required libraries and suppressing warnings

In [1]:
%matplotlib inline
# Only the modules used in this notebook; the unused breast-cancer loader
# and the optional scikitplot dependency have been dropped
from sklearn import tree, linear_model, neighbors
from sklearn.model_selection import cross_val_score, train_test_split, GridSearchCV, KFold
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.naive_bayes import MultinomialNB
from sklearn.exceptions import ConvergenceWarning, DataConversionWarning
from sklearn.svm import SVC
from sklearn.preprocessing import OrdinalEncoder
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

In [2]:
# Ignoring warnings for clean output
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", category=ConvergenceWarning)
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=DataConversionWarning)

### Loading the cars dataset and exploring the features and the target variable

In [3]:
cars = pd.read_csv('car.data',header = None)

In [4]:
type(cars)

pandas.core.frame.DataFrame

In [5]:
cars.head(3)

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| 0 | vhigh | vhigh | 2 | 2 | small | low | unacc |
| 1 | vhigh | vhigh | 2 | 2 | small | med | unacc |
| 2 | vhigh | vhigh | 2 | 2 | small | high | unacc |


In [6]:
cars.columns = ['buying','maint','doors',
                     'persons','lug_boot','safety','class']

In [7]:
cars.shape

(1728, 7)

The dataset has 1728 records and 7 variables: the first 6 are features and the 7th, `class`, is the target variable to be classified.

In [8]:
cars.describe()

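|   | buying | maint | doors | persons | lug_boot | safety | class |
|---|---|---|---|---|---|---|---|
| count | 1728 | 1728 | 1728 | 1728 | 1728 | 1728 | 1728 |
| unique | 4 | 4 | 4 | 3 | 3 | 3 | 4 |
| top | high | high | 5more | more | big | high | unacc |
| freq | 432 | 432 | 432 | 576 | 576 | 576 | 1210 |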


`describe()` reports a count of 1728 for every column, so there are no missing values and no missing-data treatment is required. All six features are ordinal: each has 3 or 4 distinct levels with a natural order (for example `low` < `med` < `high` < `vhigh`).

Splitting the target variable from the features so that the features can be preprocessed before training.

In [9]:
features = cars.loc[:,'buying':'safety']
features1 = features
target = cars[['class']]

In [10]:
features.head()

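|   | buying | maint | doors | persons | lug_boot | safety |
|---|---|---|---|---|---|---|
| 0 | vhigh | vhigh | 2 | 2 | small | low |
| 1 | vhigh | vhigh | 2 | 2 | small | med |
| 2 | vhigh | vhigh | 2 | 2 | small | high |
| 3 | vhigh | vhigh | 2 | 2 | med | low |
| 4 | vhigh | vhigh | 2 | 2 | med | med |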


In [11]:
features_one_hot = pd.get_dummies(features1, drop_first=True)

In [12]:
features_one_hot.head()

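|   | buying_low | buying_med | buying_vhigh | maint_low | maint_med | maint_vhigh | doors_3 | doors_4 | doors_5more | persons_4 | persons_more | lug_boot_med | lug_boot_small | safety_low | safety_med |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
| 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| 2 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 3 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 4 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |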


In [13]:
target.head()

|   | class |
|---|---|
| 0 | unacc |
| 1 | unacc |
| 2 | unacc |
| 3 | unacc |
| 4 | unacc |


In [14]:
target['class'].unique()

array(['unacc', 'acc', 'vgood', 'good'], dtype=object)

In [15]:
classes = {'unacc': 0, 'acc': 1, 'good': 2, 'vgood': 3}

# Build a new DataFrame rather than assigning into a slice of `cars`,
# which is what triggered pandas' SettingWithCopyWarning
target = cars['class'].map(classes).to_frame()


In [16]:
target['class'].value_counts()

0    1210
1     384
2      69
3      65
Name: class, dtype: int64

In [17]:
enc = OrdinalEncoder()
features = enc.fit_transform(features)
features = pd.DataFrame(features)

In [18]:
features.head()

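|   | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | 3.0 | 3.0 | 0.0 | 0.0 | 2.0 | 1.0 |
| 1 | 3.0 | 3.0 | 0.0 | 0.0 | 2.0 | 2.0 |
| 2 | 3.0 | 3.0 | 0.0 | 0.0 | 2.0 | 0.0 |
| 3 | 3.0 | 3.0 | 0.0 | 0.0 | 1.0 | 1.0 |
| 4 | 3.0 | 3.0 | 0.0 | 0.0 | 1.0 | 2.0 |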


In [19]:
features.columns = ['buying','maint','doors',
                     'persons','lug_boot','safety']

In [20]:
features.head()

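|   | buying | maint | doors | persons | lug_boot | safety |
|---|---|---|---|---|---|---|
| 0 | 3.0 | 3.0 | 0.0 | 0.0 | 2.0 | 1.0 |
| 1 | 3.0 | 3.0 | 0.0 | 0.0 | 2.0 | 2.0 |
| 2 | 3.0 | 3.0 | 0.0 | 0.0 | 2.0 | 0.0 |
| 3 | 3.0 | 3.0 | 0.0 | 0.0 | 1.0 | 1.0 |
| 4 | 3.0 | 3.0 | 0.0 | 0.0 | 1.0 | 2.0 |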
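The codes above follow alphabetical order rather than the natural order of the levels: for `safety`, `high` became 0, `low` 1 and `med` 2, because `OrdinalEncoder` by default uses the sorted unique values of each column as its categories. A minimal sketch of an order-preserving alternative passes explicit `categories`; the level orderings below are assumptions inferred from the level names, and the three sample rows are copied from `cars.head(3)`.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

cols = ["buying", "maint", "doors", "persons", "lug_boot", "safety"]

# Assumed natural orderings, lowest to highest (inferred from the level names)
orderings = [
    ["low", "med", "high", "vhigh"],  # buying
    ["low", "med", "high", "vhigh"],  # maint
    ["2", "3", "4", "5more"],         # doors
    ["2", "4", "more"],               # persons
    ["small", "med", "big"],          # lug_boot
    ["low", "med", "high"],           # safety
]

# The first three rows of the car data (as shown by cars.head(3))
sample = pd.DataFrame(
    [["vhigh", "vhigh", "2", "2", "small", "low"],
     ["vhigh", "vhigh", "2", "2", "small", "med"],
     ["vhigh", "vhigh", "2", "2", "small", "high"]],
    columns=cols,
)

# Default behaviour: categories are the sorted unique values of each column
default_codes = OrdinalEncoder().fit_transform(sample)

# Explicit categories: code i means "the i-th level in the assumed ordering"
ordered_codes = OrdinalEncoder(categories=orderings).fit_transform(sample)

print(default_codes[:, 5])  # safety low, med, high -> [1. 2. 0.]
print(ordered_codes[:, 5])  # safety low, med, high -> [0. 1. 2.]
```

On the notebook's data the same encoder could be applied as `OrdinalEncoder(categories=orderings).fit_transform(features1)`; the results reported below were produced with the default alphabetical codes.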


Now that all the data is numeric, we can start the modeling process.

### Splitting the data into train and test sets

In [21]:
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.30,random_state=45,stratify = target)
X1_train, X1_test, y1_train, y1_test = train_test_split(features_one_hot, target, test_size=0.30,random_state=44,stratify = target)

In [22]:
unique, counts = np.unique(y_train, return_counts=True)
dict(zip(unique, counts))

{0: 847, 1: 269, 2: 48, 3: 45}

In [23]:
unique, counts = np.unique(y_test, return_counts=True)
dict(zip(unique, counts))

{0: 363, 1: 115, 2: 21, 3: 20}

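The data is split 70/30, and because `stratify=target` is passed to `train_test_split`, the class proportions are preserved in both sets (for example, class 0 makes up about 70% of both the training and the test labels). Without stratification, a purely random split could leave the small `good` and `vgood` classes under- or over-represented in the test set.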
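The effect of `stratify` can be checked without the car file. The sketch below builds a synthetic label vector with the same class counts as `target` (1210, 384, 69, 65), splits it with the same `test_size`, `random_state` and `stratify` settings, and compares the class proportions. The placeholder feature column is an assumption made only so that `train_test_split` has an `X` to split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels with the same class counts as the notebook's target
y = np.repeat([0, 1, 2, 3], [1210, 384, 69, 65])
X = np.arange(len(y)).reshape(-1, 1)  # placeholder feature column

_, _, y_tr, y_te = train_test_split(
    X, y, test_size=0.30, random_state=45, stratify=y
)

def proportions(labels):
    """Share of each of the four classes in `labels`."""
    counts = np.bincount(labels, minlength=4)
    return counts / counts.sum()

# Full data, training part and test part should show (almost) the same shares
for name, labels in [("full", y), ("train", y_tr), ("test", y_te)]:
    print(name, len(labels), np.round(proportions(labels), 3))
```

Dropping `stratify=y` and repeating with a few different seeds shows how far the minority-class shares can drift under a purely random split.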

## 1. Decision Tree

Algorithms like decision trees have many hyperparameters that can be tuned. scikit-learn's `GridSearchCV` cross-validates every combination in a parameter grid and keeps the best one.
Below is the code for decision tree hyperparameter optimization using grid search inside nested cross-validation: the inner loop (`GridSearchCV`) selects the hyperparameters and the outer loop (`cross_val_score`) estimates the accuracy of the whole tuning procedure.

`max_depth` and `min_samples_leaf` are tuned because they stop the tree from growing until every leaf holds only a few samples, which would overfit. `min_impurity_decrease` allows a node to be split only if the split reduces the weighted impurity by at least the given amount: very small values (1e-7) barely constrain the tree, while large values (0.1, 1) stop splitting early and can underfit, so searching over a range lets cross-validation pick the balance. The split `criterion` (Gini or entropy) is searched as well.
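scikit-learn's documentation defines the quantity compared with `min_impurity_decrease` as `N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity)`, where `N` is the number of training samples, `N_t` the number at the node, and `N_t_L`, `N_t_R` the numbers in the left and right children. A small sketch of that formula using Gini impurity follows; `gini` and `weighted_impurity_decrease` are illustrative helpers, not scikit-learn functions, and the `entropy` criterion would use entropy in place of Gini.

```python
import numpy as np

def gini(labels):
    """Gini impurity 1 - sum_k p_k**2 of a set of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def weighted_impurity_decrease(parent, left, right, n_total):
    """Weighted impurity decrease of splitting `parent` into `left` and `right`."""
    n_t = len(parent)
    return (n_t / n_total) * (
        gini(parent)
        - len(right) / n_t * gini(right)
        - len(left) / n_t * gini(left)
    )

# Hypothetical node: 10 of 100 training samples, 6 of class 0 and 4 of class 1,
# split perfectly into two pure children
parent = np.array([0] * 6 + [1] * 4)
left, right = parent[:6], parent[6:]

decrease = weighted_impurity_decrease(parent, left, right, n_total=100)
print(round(decrease, 3))  # 0.048

# The split is only made if decrease >= min_impurity_decrease
for threshold in [1e-2, 1e-1]:
    print(threshold, decrease >= threshold)
```

With `min_impurity_decrease=0.01` this split would be allowed; with `0.1` it would be blocked, which is how large values in the grid prune the tree.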

In [24]:
tuned_parameters = {'max_depth': np.arange(3,7),'min_samples_leaf': np.arange(5,30),"criterion":["gini","entropy"],"min_impurity_decrease":[1e-07,1e-06,1e-05,1e-04,1e-03,1e-02,1e-01,1]}

inner_cv = KFold(n_splits=4, shuffle=True)
outer_cv = KFold(n_splits=4, shuffle=True)

grid_tree = tree.DecisionTreeClassifier(random_state=45)

#Nested CV inner loop
grid = GridSearchCV(grid_tree, tuned_parameters, cv = inner_cv, scoring='accuracy')
grid.fit(X_train,y_train)

#Nested CV outer loop
nested_score = cross_val_score(grid, features, target, cv=outer_cv,scoring ='accuracy')
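In the cell above, `cross_val_score` clones the `GridSearchCV` object and, in each outer fold, reruns the inner grid search on that fold's training part only, so the outer score is not inflated by the tuning. A compact sketch of the same pattern follows, using scikit-learn's bundled iris data as a stand-in for the car features and a much smaller grid (both assumptions for illustration). It also swaps `KFold` for `StratifiedKFold` with a fixed `random_state`: the decision-tree cells do not seed their `KFold` objects, so their scores change between runs, and stratified folds keep every class represented in each fold.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Stand-in data: the bundled iris dataset replaces the car features here
X, y = load_iris(return_X_y=True)

# Seeded, stratified folds: reproducible, and every fold keeps the class mix
inner_cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=45)
outer_cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=45)

# Inner loop: grid search over a deliberately small, illustrative grid
search = GridSearchCV(
    DecisionTreeClassifier(random_state=45),
    {"max_depth": [2, 3, 4], "min_samples_leaf": [1, 5, 10]},
    cv=inner_cv,
    scoring="accuracy",
)

# Outer loop: each fold refits `search` on its own training part only
nested_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="accuracy")
print(f"Nested CV accuracy: {nested_scores.mean():.2%} +/- {2 * nested_scores.std():.2%}")
```

On the notebook's data, `X, y` would be `features` and `target.values.ravel()`.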

In [25]:
# Mean Accuracy with +/- 2 std deviations
print("Using Nested CV with grid search,accuracy: {0:.2%} +/- {1:.2%}".format(nested_score.mean(), nested_score.std() * 2))
print()
print ("The best hyper-parameters to get this accuracy is :-\n", grid.best_params_)
print()
print ("The best decision tree classifier is :-\n", grid.best_estimator_)
y_pred = grid.best_estimator_.predict(X_test)

#Goodness Measures confusion matrix and other measures like accuracy, precision,recall
print("Confusion Matrix: - \n",confusion_matrix(y_test, y_pred))
print()
print("Classification Report: - \n",classification_report(y_test, y_pred))

Using Nested CV with grid search,accuracy: 87.33% +/- 3.30%

The best hyper-parameters to get this accuracy is :-
 {'criterion': 'entropy', 'max_depth': 6, 'min_impurity_decrease': 1e-07, 'min_samples_leaf': 7}

The best decision tree classifier is :-
 DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=6,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=1e-07, min_impurity_split=None,
            min_samples_leaf=7, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=45,
            splitter='best')
Confusion Matrix: - 
 [[327  33   3   0]
 [  0  97  11   7]
 [  0  10   9   2]
 [  0   0   0  20]]

Classification Report: - 
               precision    recall  f1-score   support

           0       1.00      0.90      0.95       363
           1       0.69      0.84      0.76       115
           2       0.39      0.43      0.41        21
           3       0.69      1.00      0.82        20

**Model Goodness**

The class distribution is 1210 unacceptable, 384 acceptable, 69 good and 65 very good cars. The decision tree classifier without one-hot encoding gives a nested-CV **accuracy of 87.33% +/- 3.30%**, and on the test set a weighted-average **precision of 90% and recall of 87%**.

The hyperparameters were selected by `GridSearchCV` on accuracy (`scoring='accuracy'`), not on F1. The weighted-average **F1-score is 88%**; F1 is the harmonic mean of precision and recall, which makes it a useful complement to accuracy when the classes are as imbalanced as they are here.
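As a check on how the report's F1 column is built: per class, F1 = 2PR / (P + R), and the weighted average weights each class's F1 by its support. The sketch reuses the rounded per-class numbers from the first decision-tree report, so it only approximately reproduces scikit-learn, which averages the unrounded values (`f1_score(y_test, y_pred, average='weighted')`).

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Class 3 (vgood) in the first decision-tree report: precision 0.69, recall 1.00
print(round(f1(0.69, 1.00), 2))  # 0.82

# Per-class F1 and support from the same report, classes 0-3
f1_scores = [0.95, 0.76, 0.41, 0.82]
support = [363, 115, 21, 20]
weighted_f1 = sum(f * s for f, s in zip(f1_scores, support)) / sum(support)
print(round(weighted_f1, 2))  # 0.88
```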

In [26]:
tuned_parameters = {'max_depth': np.arange(3,7),'min_samples_leaf': np.arange(5,30),"criterion":["gini","entropy"],"min_impurity_decrease":[1e-07,1e-06,1e-05,1e-04,1e-03,1e-02,1e-01,1]}

inner_cv = KFold(n_splits=4, shuffle=True)
outer_cv = KFold(n_splits=4, shuffle=True)

grid_tree = tree.DecisionTreeClassifier(random_state=44)

#Nested CV inner loop
grid = GridSearchCV(grid_tree, tuned_parameters, cv = inner_cv, scoring='accuracy')
grid.fit(X1_train,y1_train)

#Nested CV outer loop
nested_score = cross_val_score(grid, features_one_hot, target, cv=outer_cv,scoring ='accuracy')

In [27]:
# Mean Accuracy with +/- 2 std deviations
print("Using Nested CV with grid search,accuracy: {0:.2%} +/- {1:.2%}".format(nested_score.mean(), nested_score.std() * 2))
print()
print ("The best hyper-parameters to get this accuracy is :-\n", grid.best_params_)
print()
print ("The best decision tree classifier is :-\n", grid.best_estimator_)
y1_pred = grid.best_estimator_.predict(X1_test)

#Goodness Measures confusion matrix and other measures like accuracy, precision,recall
print("Confusion Matrix: - \n",confusion_matrix(y1_test, y1_pred))
print()
print("Classification Report: - \n",classification_report(y1_test, y1_pred))

Using Nested CV with grid search,accuracy: 83.62% +/- 2.13%

The best hyper-parameters to get this accuracy is :-
 {'criterion': 'gini', 'max_depth': 6, 'min_impurity_decrease': 1e-07, 'min_samples_leaf': 6}

The best decision tree classifier is :-
 DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=6,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=1e-07, min_impurity_split=None,
            min_samples_leaf=6, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=44,
            splitter='best')
Confusion Matrix: - 
 [[328  35   0   0]
 [ 13  91  11   0]
 [  0  12   6   3]
 [  0  13   3   4]]

Classification Report: - 
               precision    recall  f1-score   support

           0       0.96      0.90      0.93       363
           1       0.60      0.79      0.68       115
           2       0.30      0.29      0.29        21
           3       0.57      0.20      0.30        20


**Model Goodness**

The decision tree classifier with one-hot encoding gives a nested-CV **accuracy of 83.62% +/- 2.13%**, with a test-set weighted-average **precision of 84% and recall of 83%**. It struggles on the minority classes: only 4 of the 20 `vgood` test cars are recognised (recall 0.20).

The weighted-average **F1-score is 83%** (hyperparameters were selected on accuracy, not F1).

## 2. K-NN

KNN classifies a point from the distances to its nearest neighbours. With ordinal codes we cannot assume that the difference between levels 1 and 2 is the same as the difference between levels 2 and 3, so KNN is run only on the one-hot encoded version.
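A small illustration uses the `safety` codes shown in the encoded table earlier (`high`=0, `low`=1, `med`=2, alphabetical). Under those codes `high` is twice as far from `med` as from `low`, whereas a full one-hot encoding puts every pair of distinct levels at the same Euclidean distance, sqrt(2). With `drop_first=True`, as in this notebook, the dropped level becomes the all-zero vector, so its distance to the other levels is 1 instead; either way no ordering is implied.

```python
import numpy as np

levels = ["high", "low", "med"]
# Alphabetical codes assigned by the default OrdinalEncoder to `safety`
ordinal = {"high": 0.0, "low": 1.0, "med": 2.0}
# Full one-hot vectors (no level dropped)
one_hot = dict(zip(levels, np.eye(len(levels))))

def euclidean(a, b):
    """Euclidean distance between two scalars or vectors."""
    return float(np.linalg.norm(np.atleast_1d(a) - np.atleast_1d(b)))

for a, b in [("high", "low"), ("low", "med"), ("high", "med")]:
    d_ord = euclidean(ordinal[a], ordinal[b])
    d_hot = euclidean(one_hot[a], one_hot[b])
    print(f"{a:>4} vs {b:<4} ordinal={d_ord:.2f} one-hot={d_hot:.2f}")
```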

In [28]:
param_grid = {'n_neighbors' : np.arange(1,30), 'weights' : ['uniform','distance']}

grid_knn_clf = neighbors.KNeighborsClassifier()

inner_cv = KFold(n_splits=4, shuffle=True, random_state=45)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=45)

#Nested CV inner loop
grid_knn = GridSearchCV(grid_knn_clf, param_grid, cv = inner_cv, scoring='accuracy')
grid_knn.fit(X1_train,y1_train)

#Nested CV outer loop
nested_score = cross_val_score(grid_knn, features_one_hot, target, cv=outer_cv,scoring='accuracy')

In [29]:
# Mean Accuracy with +/- 2 std deviations
print("Using Nested CV with grid search,accuracy: {0:.2%} +/- {1:.2%}".format(nested_score.mean(), nested_score.std() * 2))
print()
print ("The best hyper-parameters to get this accuracy is :-\n", grid_knn.best_params_)
print()
print ("The best KNN classifier is :-\n", grid_knn.best_estimator_)
y1_pred = grid_knn.best_estimator_.predict(X1_test)

#Goodness Measures confusion matrix and other measures like accuracy, precision,recall
print("Confusion Matrix: - \n",confusion_matrix(y1_test, y1_pred))
print()
print("Classification Report: - \n",classification_report(y1_test, y1_pred))

Using Nested CV with grid search,accuracy: 82.47% +/- 4.01%

The best hyper-parameters to get this accuracy is :-
 {'n_neighbors': 10, 'weights': 'distance'}

The best KNN classifier is :-
 KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=None, n_neighbors=10, p=2,
           weights='distance')
Confusion Matrix: - 
 [[355   8   0   0]
 [ 39  74   2   0]
 [  9   8   3   1]
 [  7   5   2   6]]

Classification Report: - 
               precision    recall  f1-score   support

           0       0.87      0.98      0.92       363
           1       0.78      0.64      0.70       115
           2       0.43      0.14      0.21        21
           3       0.86      0.30      0.44        20

   micro avg       0.84      0.84      0.84       519
   macro avg       0.73      0.52      0.57       519
weighted avg       0.83      0.84      0.82       519



**Model Goodness**

The KNN classifier with one-hot encoding gives a nested-CV **accuracy of 82.47% +/- 4.01%**, with a test-set weighted-average **precision of 83% and recall of 84%**.

The weighted-average **F1-score is 82%** (hyperparameters were selected on accuracy, not F1); the macro-average F1 of 0.57 shows that the small `good` and `vgood` classes are poorly recognised.

## 3. Logistic Regression

In [32]:
grid_values = {
               'C':[1e-4,0.001,.009,0.01,.09,1,5,10,25,100,1000,1e4],
               'multi_class' : ['multinomial'],
              'solver': ['lbfgs']}

grid_log_clf = linear_model.LogisticRegression(random_state=45)

inner_cv = KFold(n_splits=4, shuffle=True, random_state=45)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=45)

grid_logit = GridSearchCV(grid_log_clf, grid_values, cv = inner_cv, scoring='accuracy')
grid_logit.fit(X_train,y_train)

# Nested CV with parameter optimization
nested_score = cross_val_score(grid_logit, features, target, cv=outer_cv,scoring = 'accuracy')

In [33]:
# Mean Accuracy with +/- 2 std deviations
print("Using Nested CV with grid search,accuracy: {0:.2%} +/- {1:.2%}".format(nested_score.mean(), nested_score.std() * 2))
print()
print ("The best hyper-parameters to get this accuracy is :-\n", grid_logit.best_params_)
print()
print ("The best logistic regression classifier is :-\n", grid_logit.best_estimator_)
y_pred = grid_logit.best_estimator_.predict(X_test)

#Goodness Measures confusion matrix and other measures like accuracy, precision,recall
print("Confusion Matrix: - \n",confusion_matrix(y_test, y_pred))
print()
print("Classification Report: - \n",classification_report(y_test, y_pred))

Using Nested CV with grid search,accuracy: 70.14% +/- 4.61%

The best hyper-parameters to get this accuracy is :-
 {'C': 0.09, 'multi_class': 'multinomial', 'solver': 'lbfgs'}

The best logistic regression classifier is :-
 LogisticRegression(C=0.09, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='multinomial',
          n_jobs=None, penalty='l2', random_state=45, solver='lbfgs',
          tol=0.0001, verbose=0, warm_start=False)
Confusion Matrix: - 
 [[336  27   0   0]
 [ 91  24   0   0]
 [ 19   2   0   0]
 [ 14   6   0   0]]

Classification Report: - 
               precision    recall  f1-score   support

           0       0.73      0.93      0.82       363
           1       0.41      0.21      0.28       115
           2       0.00      0.00      0.00        21
           3       0.00      0.00      0.00        20

   micro avg       0.69      0.69      0.69       519
   macro avg       0.28      0.28      0.27       519
weighted avg       0.60      0.69      0.63       519



**Model Goodness**

The logistic regression classifier without one-hot encoding gives a nested-CV **accuracy of 70.14% +/- 4.61%**, with a test-set weighted-average **precision of 60% and recall of 69%**. It never predicts `good` or `vgood`, which is why scikit-learn warns that precision is ill-defined for those classes.

The weighted-average **F1-score is 63%** (hyperparameters were selected on accuracy, not F1).

In [34]:
grid_values = {'penalty': ['l1', 'l2'], \
               'C':[1e-4,0.001,.009,0.01,.09,1,5,10,25,100,1000,1e4],
               'multi_class' : ['multinomial'],
              'solver': ['saga']}

grid_log_clf = linear_model.LogisticRegression(random_state=44)

inner_cv = KFold(n_splits=4, shuffle=True, random_state=45)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=45)

grid_logit = GridSearchCV(grid_log_clf, grid_values, cv = inner_cv, scoring='accuracy')
grid_logit.fit(X1_train,y1_train)

# Nested CV with parameter optimization
nested_score = cross_val_score(grid_logit, features_one_hot, target, cv=outer_cv,scoring = 'accuracy')

In [35]:
# Mean Accuracy with +/- 2 std deviations
print("Using Nested CV with grid search,accuracy: {0:.2%} +/- {1:.2%}".format(nested_score.mean(), nested_score.std() * 2))
print()
print ("The best hyper-parameters to get this accuracy is :-\n", grid_logit.best_params_)
print()
print ("The best logistic regression classifier is :-\n", grid_logit.best_estimator_)
y1_pred = grid_logit.best_estimator_.predict(X1_test)

#Goodness Measures confusion matrix and other measures like accuracy, precision,recall
print("Confusion Matrix: - \n",confusion_matrix(y1_test, y1_pred))
print()
print("Classification Report: - \n",classification_report(y1_test, y1_pred))

Using Nested CV with grid search,accuracy: 92.82% +/- 2.20%

The best hyper-parameters to get this accuracy is :-
 {'C': 100, 'multi_class': 'multinomial', 'penalty': 'l1', 'solver': 'saga'}

The best logistic regression classifier is :-
 LogisticRegression(C=100, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='multinomial',
          n_jobs=None, penalty='l1', random_state=44, solver='saga',
          tol=0.0001, verbose=0, warm_start=False)
Confusion Matrix: - 
 [[353   9   1   0]
 [ 21  91   1   2]
 [  0   2  14   5]
 [  0   0   1  19]]

Classification Report: - 
               precision    recall  f1-score   support

           0       0.94      0.97      0.96       363
           1       0.89      0.79      0.84       115
           2       0.82      0.67      0.74        21
           3       0.73      0.95      0.83        20

   micro avg       0.92      0.92      0.92       519
   macro avg       0.85      0.85      0.84       519
weighted avg       0.92      0.92      0.92       519

**Model Goodness**

The logistic regression classifier with one-hot encoding gives a nested-CV **accuracy of 92.82% +/- 2.20%**, with a test-set weighted-average **precision of 92% and recall of 92%**.

The weighted-average **F1-score is 92%** (hyperparameters were selected on accuracy, not F1).

## 4. Naive Bayes

In [38]:
grid_values = {'alpha' : [1,2,3,4,5,6,7,8,9,10]}

grid_NB_clf = MultinomialNB()

inner_cv = KFold(n_splits=4, shuffle=True, random_state=45)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=45)

grid_NB = GridSearchCV(grid_NB_clf, grid_values, cv = inner_cv, scoring='accuracy')
grid_NB.fit(X_train,y_train)

# Nested CV with parameter optimization
nested_score = cross_val_score(grid_NB, features, target, cv=outer_cv,scoring = 'accuracy')

In [39]:
# Mean Accuracy with +/- 2 std deviations
print("Using Nested CV with grid search,accuracy: {0:.2%} +/- {1:.2%}".format(nested_score.mean(), nested_score.std() * 2))
print()
print ("The best hyper-parameters to get this accuracy is :-\n", grid_NB.best_params_)
print()
print ("The best Naive Bayes classifier is :-\n", grid_NB.best_estimator_)
y_pred = grid_NB.best_estimator_.predict(X_test)

#Goodness Measures confusion matrix and other measures like accuracy, precision,recall
print("Confusion Matrix: - \n",confusion_matrix(y_test, y_pred))
print()
print("Classification Report: - \n",classification_report(y_test, y_pred))

Using Nested CV with grid search,accuracy: 70.08% +/- 4.55%

The best hyper-parameters to get this accuracy is :-
 {'alpha': 1}

The best Naive Bayes classifier is :-
 MultinomialNB(alpha=1, class_prior=None, fit_prior=True)
Confusion Matrix: - 
 [[363   0   0   0]
 [113   2   0   0]
 [ 21   0   0   0]
 [ 20   0   0   0]]

Classification Report: - 
               precision    recall  f1-score   support

           0       0.70      1.00      0.82       363
           1       1.00      0.02      0.03       115
           2       0.00      0.00      0.00        21
           3       0.00      0.00      0.00        20

   micro avg       0.70      0.70      0.70       519
   macro avg       0.43      0.25      0.21       519
weighted avg       0.71      0.70      0.58       519





**Model Goodness**

The Multinomial Naive Bayes classifier without one-hot encoding gives a nested-CV **accuracy of 70.08% +/- 4.55%**, with a test-set weighted-average **precision of 71% and recall of 70%**. It predicts almost every car as `unacc`, so its accuracy is close to the 70% share of that class.

The weighted-average **F1-score is only 58%** (hyperparameters were selected on accuracy, not F1).
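`MultinomialNB` models each feature as a count (it was designed for word counts), which fits neither ordinal codes nor one-hot indicators particularly well, and that may contribute to the weak results here. scikit-learn also provides `CategoricalNB`, which models each feature as a categorical distribution over its integer-coded levels, so the order and spacing of the codes do not matter. It was not evaluated in this notebook; the sketch below only shows it on a hypothetical toy feature whose label depends non-monotonically on the level.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Hypothetical toy data: one ordinal-coded feature with levels 0-3;
# levels 0 and 2 belong to class 0, levels 1 and 3 to class 1
X = np.array([[0], [1], [2], [3]] * 10)
y = np.array([0, 1, 0, 1] * 10)

# alpha=1.0 is Laplace smoothing, the same value the grid search chose above
clf = CategoricalNB(alpha=1.0)
clf.fit(X, y)
train_acc = (clf.predict(X) == y).mean()
print(train_acc)  # 1.0

# On the notebook's data it would be fitted on the integer codes, e.g.
# CategoricalNB().fit(X_train.astype(int), y_train.values.ravel())
```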

In [40]:
grid_values = {'alpha' : [1,2,3,4,5,6,7,8,9,10]}

grid_NB_clf = MultinomialNB()

inner_cv = KFold(n_splits=4, shuffle=True, random_state=45)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=45)

grid_NB = GridSearchCV(grid_NB_clf, grid_values, cv = inner_cv, scoring='accuracy')
grid_NB.fit(X1_train,y1_train)

# Nested CV with parameter optimization
nested_score = cross_val_score(grid_NB, features_one_hot, target, cv=outer_cv,scoring = 'accuracy')

In [42]:
# Mean Accuracy with +/- 2 std deviations
print("Using Nested CV with grid search,accuracy: {0:.2%} +/- {1:.2%}".format(nested_score.mean(), nested_score.std() * 2))
print()
print ("The best hyper-parameters to get this accuracy is :-\n", grid_NB.best_params_)
print()
print ("The best Naive Bayes classifier is :-\n", grid_NB.best_estimator_)
y1_pred = grid_NB.best_estimator_.predict(X1_test)

#Goodness Measures confusion matrix and other measures like accuracy, precision,recall
print("Confusion Matrix: - \n",confusion_matrix(y1_test, y1_pred))
print()
print("Classification Report: - \n",classification_report(y1_test, y1_pred))

Using Nested CV with grid search,accuracy: 73.78% +/- 5.51%

The best hyper-parameters to get this accuracy is :-
 {'alpha': 1}

The best Naive Bayes classifier is :-
 MultinomialNB(alpha=1, class_prior=None, fit_prior=True)
Confusion Matrix: - 
 [[360   3   0   0]
 [ 87  28   0   0]
 [ 13   8   0   0]
 [ 16   4   0   0]]

Classification Report: - 
               precision    recall  f1-score   support

           0       0.76      0.99      0.86       363
           1       0.65      0.24      0.35       115
           2       0.00      0.00      0.00        21
           3       0.00      0.00      0.00        20

   micro avg       0.75      0.75      0.75       519
   macro avg       0.35      0.31      0.30       519
weighted avg       0.67      0.75      0.68       519





**Model Goodness**

The Multinomial Naive Bayes classifier with one-hot encoding gives a nested-CV **accuracy of 73.78% +/- 5.51%**, with a test-set weighted-average **precision of 67% and recall of 75%**; it still never predicts `good` or `vgood`.

The weighted-average **F1-score is 68%** (hyperparameters were selected on accuracy, not F1).

## 5. SVM

In [46]:
param_grid = {'kernel':['linear','rbf'],'C': [0.01, 0.1, 1, 10, 100], 'gamma' :[0.001, 0.01, 0.1, 1]}

grid_svc_clf= SVC(random_state = 45)

inner_cv = KFold(n_splits=4, shuffle=True, random_state=45)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=45)

grid_svm = GridSearchCV(grid_svc_clf, param_grid, cv = inner_cv, scoring='accuracy')
grid_svm.fit(X_train,y_train)

# Nested CV with parameter optimization
nested_score = cross_val_score(grid_svm, features, target, cv=outer_cv,scoring='accuracy')

In [47]:
# Mean Accuracy with +/- 2 std deviations
print("Using Nested CV with grid search,accuracy: {0:.2%} +/- {1:.2%}".format(nested_score.mean(), nested_score.std() * 2))
print()
print ("The best hyper-parameters to get this accuracy is :-\n", grid_svm.best_params_)
print()
print ("The best SVM classifier is :-\n", grid_svm.best_estimator_)
y_pred = grid_svm.best_estimator_.predict(X_test)

#Goodness Measures confusion matrix and other measures like accuracy, precision,recall
print("Confusion Matrix: - \n",confusion_matrix(y_test, y_pred))
print()
print("Classification Report: - \n",classification_report(y_test, y_pred))

Using Nested CV with grid search,accuracy: 99.25% +/- 0.76%

The best hyper-parameters to get this accuracy is :-
 {'C': 100, 'gamma': 0.1, 'kernel': 'rbf'}

The best SVM classifier is :-
 SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=0.1, kernel='rbf',
  max_iter=-1, probability=False, random_state=45, shrinking=True,
  tol=0.001, verbose=False)
Confusion Matrix: - 
 [[360   3   0   0]
 [  0 115   0   0]
 [  0   0  21   0]
 [  0   1   0  19]]

Classification Report: - 
               precision    recall  f1-score   support

           0       1.00      0.99      1.00       363
           1       0.97      1.00      0.98       115
           2       1.00      1.00      1.00        21
           3       1.00      0.95      0.97        20

   micro avg       0.99      0.99      0.99       519
   macro avg       0.99      0.99      0.99       519
weighted avg       0.99      0.99      0.99       519



**Model Goodness**

The SVM classifier without one-hot encoding gives a nested-CV **accuracy of 99.25% +/- 0.76%**, with a test-set weighted-average **precision of 99% and recall of 99%**.

The weighted-average **F1-score is 99%** (hyperparameters were selected on accuracy, not F1).

In [48]:
param_grid = {'kernel':['linear','rbf'],'C': [0.01, 0.1, 1, 10, 100], 'gamma' :[0.001, 0.01, 0.1, 1]}

grid_svc_clf= SVC(random_state = 45)

inner_cv = KFold(n_splits=4, shuffle=True, random_state=45)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=45)

grid_svm = GridSearchCV(grid_svc_clf, param_grid, cv = inner_cv, scoring='accuracy')
grid_svm.fit(X1_train,y1_train)

# Nested CV with parameter optimization
nested_score = cross_val_score(grid_svm, features_one_hot, target, cv=outer_cv,scoring='accuracy')

In [49]:
# Mean Accuracy with +/- 2 std deviations
print("Using Nested CV with grid search,accuracy: {0:.2%} +/- {1:.2%}".format(nested_score.mean(), nested_score.std() * 2))
print()
print ("The best hyper-parameters to get this accuracy is :-\n", grid_svm.best_params_)
print()
print ("The best SVM classifier is :-\n", grid_svm.best_estimator_)
y1_pred = grid_svm.best_estimator_.predict(X1_test)

#Goodness Measures confusion matrix and other measures like accuracy, precision,recall
print("Confusion Matrix: - \n",confusion_matrix(y1_test, y1_pred))
print()
print("Classification Report: - \n",classification_report(y1_test, y1_pred))

Using Nested CV with grid search,accuracy: 99.25% +/- 1.15%

The best hyper-parameters to get this accuracy is :-
 {'C': 100, 'gamma': 0.1, 'kernel': 'rbf'}

The best SVM classifier is :-
 SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=0.1, kernel='rbf',
  max_iter=-1, probability=False, random_state=45, shrinking=True,
  tol=0.001, verbose=False)
Confusion Matrix: - 
 [[362   1   0   0]
 [  1 114   0   0]
 [  0   0  21   0]
 [  0   0   0  20]]

Classification Report: - 
               precision    recall  f1-score   support

           0       1.00      1.00      1.00       363
           1       0.99      0.99      0.99       115
           2       1.00      1.00      1.00        21
           3       1.00      1.00      1.00        20

   micro avg       1.00      1.00      1.00       519
   macro avg       1.00      1.00      1.00       519
weighted avg       1.00      1.00      1.00       519



**Model Goodness**

The SVM classifier with one-hot encoding gives a nested-CV **accuracy of 99.25% +/- 1.15%**, with a test-set weighted-average **precision and recall of 100%** (1.00 after rounding; 2 of the 519 test cars are misclassified).

The weighted-average **F1-score is 100%** after rounding (hyperparameters were selected on accuracy, not F1).

Of the 9 classifiers tested above, SVM gives the highest accuracy: both SVM variants reach a nested-CV accuracy of 99.25%, and the version without one-hot encoding has the smaller spread (+/- 0.76%). Its test-set confusion matrix (rows are the true class, columns the predicted class, in the order `unacc`, `acc`, `good`, `vgood`) is:

    [[360   3   0   0]
     [  0 115   0   0]
     [  0   0  21   0]
     [  0   1   0  19]]

All `acc` and `good` cars are classified correctly. The only errors are 3 `unacc` cars predicted as `acc` and 1 `vgood` car predicted as `acc`. With so few errors these may be borderline or atypical cars, although the notebook does not examine them individually. Overall, SVM classifies all four classes accurately and is a very good classifier for this dataset.
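The per-class figures in the report follow directly from this matrix. With scikit-learn's convention (rows are true classes, columns are predicted classes), recall is the diagonal divided by the row sums and precision is the diagonal divided by the column sums. A short sketch using the SVM matrix above:

```python
import numpy as np

labels = ["unacc", "acc", "good", "vgood"]
# SVM (no one-hot) test-set confusion matrix; rows = true, columns = predicted
cm = np.array([[360,   3,  0,  0],
               [  0, 115,  0,  0],
               [  0,   0, 21,  0],
               [  0,   1,  0, 19]])

recall = cm.diagonal() / cm.sum(axis=1)     # correct / all true members of the class
precision = cm.diagonal() / cm.sum(axis=0)  # correct / all predictions of the class
accuracy = cm.trace() / cm.sum()

for name, p, r in zip(labels, precision, recall):
    print(f"{name:>5}: precision={p:.2f} recall={r:.2f}")
print(f"test accuracy: {accuracy:.2%}")  # 99.23%
```

The 99.23% here is the accuracy on the single 30% test split, a different estimate from the 99.25% nested-CV figure.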

For SVM, Naive Bayes, logistic regression and the decision tree, both the one-hot encoded and the numerical (ordinal) versions were run. One-hot encoding gave the same or better nested-CV accuracy for SVM (99.25% in both cases), Naive Bayes (from 70.08% to 73.78%) and especially logistic regression, whose accuracy rises from 70.14% to 92.82% when the data is treated as categorical. The decision tree is the exception: it did better on the ordinal features (87.33%) than on the one-hot features (83.62%).

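**Pros of one-hot encoding:**
Each value of a feature is represented as a separate column, so the model can learn a separate effect for each value, and no artificial order or spacing between values is imposed.

**Cons of one-hot encoding:**
The number of dimensions increases (here from 6 to 15 columns with `drop_first=True`), which increases computation and can hurt models that degrade in high dimensions.

**Pros of numerical (ordinal) encoding:**
Each feature stays a single column, so computation is faster than with one-hot encoding.

**Cons of numerical (ordinal) encoding:**
Even when the data is ordinal, the difference between codes 1 and 2 may not mean the same as the difference between 2 and 3, so distance-based and linear models such as KNN and logistic regression can be misled; the jump in logistic regression accuracy from 70.14% to 92.82% with one-hot encoding is consistent with this.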
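The column counts behind the dimensionality point can be checked directly. Because 4 x 4 x 4 x 3 x 3 x 3 = 1728 equals the number of rows, and `describe()` shows the most frequent level of each feature occurring exactly 1728/4 = 432 or 1728/3 = 576 times (so every level occurs equally often), the feature table appears to be the complete grid of level combinations; this is an inference from the counts above, not something the notebook states. The sketch rebuilds such a grid with `itertools.product` and counts the columns each encoding produces.

```python
import itertools

import pandas as pd

# Levels of each feature, as listed in the describe() and one-hot outputs above
levels = {
    "buying": ["vhigh", "high", "med", "low"],
    "maint": ["vhigh", "high", "med", "low"],
    "doors": ["2", "3", "4", "5more"],
    "persons": ["2", "4", "more"],
    "lug_boot": ["small", "med", "big"],
    "safety": ["low", "med", "high"],
}

# Every combination of levels: 4*4*4*3*3*3 rows
grid = pd.DataFrame(list(itertools.product(*levels.values())), columns=list(levels))

print(grid.shape)                                   # (1728, 6)
print(pd.get_dummies(grid).shape)                   # (1728, 21)
print(pd.get_dummies(grid, drop_first=True).shape)  # (1728, 15)
```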