# ![](https://ga-dash.s3.amazonaws.com/production/assets/logo-9f88ae6c9c3871690e33280fcf557f33.png) Evaluating Model Fit
Week 5 Lesson 2.3

### LEARNING OBJECTIVES
*After this lesson, you will be able to:*
- Understand the fundamentals of evaluating classifiers
- Understand precision, recall, accuracy, and f1-score
- Know how to use sklearn.metrics functions to easily compute these metrics

## Domain & Data

### Domain

Prepared for the Neural Information Processing Systems (NIPS) 2003 Feature Extraction Workshop.

http://clopinet.com/isabelle/Projects/NIPS2003

### Data 

MADELON is an artificial dataset that was part of the NIPS 2003 feature selection challenge. It poses a two-class classification problem with continuous input variables. The difficulty is that the problem is multivariate and highly non-linear.

The dataset contains data points grouped in 32 clusters placed on the vertices of a five-dimensional hypercube and randomly labeled +1 or -1. The five dimensions constitute 5 informative features. 15 linear combinations of those features were added to form a set of 20 (redundant) informative features. Based on those 20 features, one must separate the examples into the 2 classes (corresponding to the ±1 labels). A number of distractor features called 'probes', which have no predictive power, were also added. The order of the features and patterns was randomized.




## Problem Statement

The NIPS 2003 challenge in feature selection is to find feature selection algorithms that significantly outperform methods using all features in performing a binary classification task.

## Solution Statement

We will develop a binary classification model using a K Nearest Neighbors classifier.


## Metric 

Today, we are largely exploring the dataset, so we will use the default metric included with the classifier (for scikit-learn classifiers, the `score` method reports mean accuracy).

## Benchmark

After today's work, we will assess what an appropriate benchmark might be.

In [1]:
from __future__ import print_function  # __future__ imports must come first
from os import chdir; chdir('../lib')
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
%matplotlib inline

from madelon import load_madelon_set_into_df
from sklearn.model_selection import train_test_split

In [2]:
madelon_feature_df, madelon_target_df = load_madelon_set_into_df()

## Introduction: Key metrics (5 mins)

Classification models are evaluated differently from regression models. Whereas a regression model predicts a continuous variable, a classification model predicts which class an observation belongs to (or the probability that it belongs to each class).

Instead of evaluating models based on error like in regression, **we evaluate the models based on the correct and incorrect labeling of classes**.

Most classification metrics are based on four outcome categories during prediction:

- **True Positive:** A positive class observation (1) is correctly classified as positive by the model.
- **False Positive:** A negative class observation (0, or -1 in MADELON) is incorrectly classified as positive.
- **True Negative:** A negative class observation is correctly classified as negative.
- **False Negative:** A positive class observation is incorrectly classified as negative.
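
As a rough illustration of these four categories, the sketch below counts them by hand. The label lists are made-up toy values (not MADELON data), and `count_outcomes` is a hypothetical helper name, not part of any library.

```python
# Toy labels (hypothetical, not MADELON data); +1 is the positive class.
y_true = [1, 1, 1, -1, -1, -1, 1, -1]
y_pred = [1, -1, 1, -1, 1, -1, 1, 1]


def count_outcomes(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, fp, tn, fn


tp, fp, tn, fn = count_outcomes(y_true, y_pred)
print(tp, fp, tn, fn)
```

Every observation lands in exactly one of the four categories, so the four counts always sum to the number of observations.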

In [None]:
def standard_classification(model, X, y, model_args, random_state_split):
    # Hold out a test set; random_state makes the split reproducible.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, random_state=random_state_split)

    y_train = np.ravel(y_train)
    y_test = np.ravel(y_test)
    
    this_model = model(**model_args)
    this_model.fit(X_train, y_train)
    
    training_predictions = this_model.predict(X_train)
    testing_predictions = this_model.predict(X_test)
    
    this_model_class_name = this_model.__class__.__name__
    this_model_args = ' '.join([str(key)+':'+str(val) 
                     for key,val in model_args.items()])
    
    print("{} {}".format(this_model_class_name,
                         this_model_args))
    
    return {'model': this_model,
            'y_train_pred' : training_predictions,
            'y_test_pred' : testing_predictions,
            'X_test' : X_test,
            'X_train' : X_train,
            'y_test' : y_test,
            'y_train' : y_train}

In [None]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

In [None]:
trained_knn = standard_classification(KNeighborsClassifier,
                                      madelon_feature_df,
                                      madelon_target_df,
                                      {'n_neighbors' : 17,
                                       'n_jobs':-1},
                                      random_state_split=42)

trained_logreg = standard_classification(LogisticRegression,
                                         madelon_feature_df,
                                         madelon_target_df,
                                         {'random_state' : 17,
                                          'n_jobs':-1},
                                         random_state_split=42)

### Classification evaluation metric fundamentals

##### Confusion matrix

In [None]:
from sklearn.metrics import confusion_matrix

In [None]:
def labeled_confusion_matrix(trained_model):
    conf_mat = pd.DataFrame(confusion_matrix(trained_model['y_test'], 
                                             trained_model['y_test_pred']))
    conf_mat.index = [str(cls)+'_act' 
                      for cls in trained_model['model'].classes_]

    conf_mat.columns = [str(cls)+'_pred' 
                      for cls in trained_model['model'].classes_]

    return conf_mat

The confusion matrix is very basic and may not seem that useful on its own, but it contains all of the information required to calculate more complex evaluation metrics.

A confusion matrix has the classes modeled by your classifier as both its rows and its columns. For a binary classification problem such as this one, it will be a 2x2 matrix. Rows indicate the actual class, and columns indicate the predicted class.
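
To make the row and column layout concrete, here is a minimal pure-Python sketch that builds the same kind of table by hand. `confusion_counts` is a hypothetical helper and the labels are toy values; the layout (rows are actual classes, columns are predicted classes, classes in sorted order) is meant to mirror the convention described above.

```python
def confusion_counts(y_true, y_pred, labels):
    """Build a confusion matrix as nested lists.

    matrix[i][j] counts observations whose actual class is labels[i]
    and whose predicted class is labels[j].
    """
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


# Toy labels (hypothetical, not MADELON data).
y_true = [1, 1, 1, -1, -1, -1, 1, -1]
y_pred = [1, -1, 1, -1, 1, -1, 1, 1]

matrix = confusion_counts(y_true, y_pred, labels=[-1, 1])
for label, row in zip([-1, 1], matrix):
    print("actual {:>2}: {}".format(label, row))
```

The diagonal holds the correct predictions; everything off the diagonal is a misclassification.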

In [None]:
labeled_confusion_matrix(trained_knn)

In [None]:
labeled_confusion_matrix(trained_logreg)

In the example above:
  
- 141 of the negatives were correctly identified
- 131 of the positives were correctly identified
- 108 negatives were classified as positive
- 120 positives were classified as negative


From the 2x2 confusion matrix of a binary classifier we can read the **true positives**, **false positives**, **true negatives**, and **false negatives** directly from the cells. How you tune your model to adjust these counts depends on your priorities. In healthcare, for example, you may want to minimize false negatives at the expense of more false positives.
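
As a sketch of reading the cells, the snippet below places the counts listed above into a nested list and unpacks them. It assumes rows are actual classes and columns are predicted classes, with the negative class first; the plain-list representation is an illustration, not the `DataFrame` used in this notebook.

```python
# Counts from the example above, laid out as
# [[true negatives, false positives],
#  [false negatives, true positives]]
conf_mat = [[141, 108],
            [120, 131]]

# Unpack row by row: first the actual-negative row, then the actual-positive row.
(tn, fp), (fn, tp) = conf_mat

print("TP={} FP={} TN={} FN={}".format(tp, fp, tn, fn))
print("total observations:", tp + fp + tn + fn)
```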

#### Independent Practice

Update the function below to return true positives, false positives, true negatives, and false negatives.

In [None]:
def labeled_confusion_matrix(trained_model):
    conf_mat = pd.DataFrame(confusion_matrix(trained_model['y_test'], 
                                             trained_model['y_test_pred']))
    conf_mat.index = [str(cls)+'_act' 
                      for cls in trained_model['model'].classes_]

    conf_mat.columns = [str(cls)+'_pred' 
                      for cls in trained_model['model'].classes_]

    try:
        return conf_mat, true_positives, false_positives, true_negatives, false_negatives
    except NameError:
        print("Looks like you have some work to do!")

In [None]:
cm, tp, fp, tn, fn = labeled_confusion_matrix(trained_logreg)

In [None]:
cm

In [None]:
tp, fp, tn, fn

### Accuracy

Accuracy is simply the *proportion of observations whose class is correctly predicted by the model*.


$$\text{Accuracy} = \frac{\text{True Positives} + \text{True Negatives}}{\text{Total Observations}}$$
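
As a worked example of the formula, the sketch below plugs in the counts from the confusion matrix example earlier in this lesson (131 true positives, 141 true negatives, 108 false positives, 120 false negatives). The variable names are illustrative.

```python
# Counts taken from the confusion matrix example earlier in the lesson.
tp, tn, fp, fn = 131, 141, 108, 120

total = tp + tn + fp + fn
# float() keeps the division exact under Python 2 as well as Python 3.
accuracy = (tp + tn) / float(total)

print("accuracy = {:.3f}".format(accuracy))
```

With these counts, 272 of 500 observations are classified correctly, only slightly better than guessing on a roughly balanced two-class problem.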

#### Independent Practice

Complete the following "roll your own" method for calculating accuracy.

In [None]:
from sklearn.metrics import accuracy_score

def my_accuracy(trained_model):
    return None

assert my_accuracy(trained_knn) == accuracy_score(trained_knn['y_test'],
                                                  trained_knn['y_test_pred']), \
         'Those are not the same'

### Precision
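*Precision* is the ability of the classifier to *avoid labeling an observation as the current class when it actually belongs to another class*. In other words, it is the fraction of predicted positives that are truly positive.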


$$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$$


A precision score of 1 indicates that the classifier never mistakenly added observations from another class. A precision score of 0 means that none of the observations the classifier labeled as the current class actually belong to it.
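
A minimal worked example of the precision formula, reusing the counts from the confusion matrix example earlier in this lesson (131 true positives, 108 false positives). The variable names are illustrative.

```python
# Counts taken from the confusion matrix example earlier in the lesson.
tp, fp = 131, 108

# Of everything predicted positive, what fraction really is positive?
predicted_positive = tp + fp
precision = tp / float(predicted_positive)

print("precision = {:.3f}".format(precision))
```

Note that false negatives do not appear anywhere in this calculation: precision only looks at the observations the model labeled positive.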

#### Independent Practice

Complete the following "roll your own" method for calculating precision.

In [None]:
from sklearn.metrics import precision_score

def my_precision(trained_model):
    return None

assert my_precision(trained_knn) == precision_score(trained_knn['y_test'],
                                                    trained_knn['y_test_pred']), \
         'Those are not the same'

### Recall

*Recall* is the ability of the classifier to correctly identify all observations in the current class.

$$\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$$

A recall score of 1 indicates that the classifier correctly predicted (found) all observations of the current class (by implication, there are no false negatives, that is, no misclassifications of the current class). Conversely, a recall score of 0 means that the classifier missed all observations of the current class.
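
A minimal worked example of the recall formula, reusing the counts from the confusion matrix example earlier in this lesson (131 true positives, 120 false negatives). The variable names are illustrative.

```python
# Counts taken from the confusion matrix example earlier in the lesson.
tp, fn = 131, 120

# Of everything that really is positive, what fraction did the model find?
actual_positive = tp + fn
recall = tp / float(actual_positive)

print("recall = {:.3f}".format(recall))
```

Here the denominator is the number of actual positives, so false positives play no role: a model that labels everything positive would reach a recall of 1.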


#### Independent Practice

Complete the following "roll your own" method for calculating recall.

In [None]:
from sklearn.metrics import recall_score

def my_recall(trained_model):
    return None

assert my_recall(trained_knn) == recall_score(trained_knn['y_test'],
                                              trained_knn['y_test_pred']), \
         'Those are not the same'

### F1-Score

**f1-score** is the harmonic mean of the precision and recall. The harmonic mean is used here rather than the more conventional arithmetic mean because it is more appropriate for averaging rates: it is pulled toward the smaller of the two values, so a model cannot score well by maximizing only one of them.

The f1-score's best value is 1 and worst value is 0, like the precision and recall scores. It is a useful metric for taking into account both measures at once.

$$\text{F}1\text{-Score} = 2\cdot\frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
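
The sketch below works through the formula in two steps. First it contrasts the harmonic and arithmetic means for a deliberately lopsided, made-up pair of scores; then it computes the F1-score from the counts in the confusion matrix example earlier in this lesson. `f1_from` is a hypothetical helper name.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Lopsided, made-up scores: high precision, very low recall.
arithmetic = (1.0 + 0.1) / 2.0
harmonic = f1_from(1.0, 0.1)
print("arithmetic mean = {:.3f}, harmonic mean = {:.3f}".format(
    arithmetic, harmonic))

# Counts taken from the confusion matrix example earlier in the lesson.
tp, fp, fn = 131, 108, 120
precision = tp / float(tp + fp)
recall = tp / float(tp + fn)
f1 = f1_from(precision, recall)
print("f1 = {:.3f}".format(f1))
```

The lopsided pair shows why the harmonic mean is used: the arithmetic mean still looks respectable, while the harmonic mean stays close to the weaker of the two scores.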

**support** is simply the number of observations of the labelled class.

#### Independent Practice

Complete the following "roll your own" method for calculating F1-Score.

In [None]:
from sklearn.metrics import f1_score

def my_f1_score(trained_model):
    return None

assert my_f1_score(trained_knn) == f1_score(trained_knn['y_test'],
                                            trained_knn['y_test_pred']), \
         'Those are not the same'