# Pragmatic Evaluation (With Solutions)

In this notebook, you will learn how to run a k-fold test and how to build several evaluation metrics from a confusion matrix.

## Authors
- Xiao Fu xiao.fu.20@ucl.ac.uk

## Learning Outcomes
- **K-Fold Cross Validation:** Measuring accuracy on a single train/test split doesn't account for the variance in the data and can give misleading results. K-fold cross-validation splits the dataset into $k$ parts; each part is used once as the test set while the model is trained on the remaining $k-1$ parts. After all $k$ iterations, the scores are averaged.
- **Confusion matrix:** A confusion matrix is used only for classification tasks. It tabulates predicted labels against actual labels, giving the counts of true positives, false negatives, false positives and true negatives.

## Source
- https://github.com/maykulkarni/Machine-Learning-Notebooks


## Task

1. Run a K-fold test
2. Build evaluation metrics

## Importing Libraries

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

## Importing the Dataset

A simple dataset available from https://www.kaggle.com/datasets/rakeshrau/social-network-ads/

In [None]:
df = pd.read_csv('data/Social_Network_Ads.csv')
X = df.iloc[:, 2:4]
y = df.iloc[:, 4]

df.head(10)

In [None]:
# Checking data X
X.head(10)

In [None]:
# Checking targets y
y.head(10)

# K-Fold Cross Validation
https://machinelearningmastery.com/k-fold-cross-validation/


        
Cross-validation is a method used to assess machine learning models on a limited data sample.

This method involves a key parameter named k, indicating the number of subsets the dataset should be divided into. This is why it's frequently referred to as k-fold cross-validation. When you choose a specific value for k, you can name the procedure accordingly, for instance, if k=10, it's called 10-fold cross-validation.

Here's the step-by-step process:

1. Shuffle the dataset (Recommended but not included in this notebook).

2. Divide the dataset into k subsets.

3. For each distinct subset:

    Use this subset as the testing data.

    Utilize the other subsets as the training data.

    Train a model using the training data and test its performance on the testing data.

    Keep the performance metric and then discard that particular model.
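
The steps above can be sketched with scikit-learn's `KFold`, which this notebook imports but does not otherwise use. This is only an illustration under stated assumptions: the toy data (`X_toy`, `y_toy`) is made up for the example, and shuffling is switched on here even though the notebook's own solution skips it.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Hypothetical toy data: 20 samples, 2 features; the label depends on the first feature.
rng = np.random.default_rng(0)
X_toy = np.column_stack([np.linspace(-1, 1, 20), rng.normal(size=20)])
y_toy = (X_toy[:, 0] > 0).astype(int)

# Steps 1 and 2: shuffle, then divide the data into k=5 folds.
kf = KFold(n_splits=5, shuffle=True, random_state=0)

fold_scores = []
for train_idx, test_idx in kf.split(X_toy):
    # Step 3: train on the other k-1 folds, test on the held-out fold.
    model = SVC(kernel='linear').fit(X_toy[train_idx], y_toy[train_idx])
    fold_scores.append(accuracy_score(y_toy[test_idx], model.predict(X_toy[test_idx])))
    # Only the score is kept; the model is replaced on the next iteration.

mean_score = float(np.mean(fold_scores))
print(fold_scores, mean_score)
```

Each sample lands in the test set exactly once, so the averaged score uses every sample for testing without ever testing a model on data it was trained on.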

## Index split
As a first step, complete `k_flod_split_index` so that it splits the list `indices` into `k` parts.

In [None]:
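def k_flod_split_index(indices, k=10) -> list:
    """Split `indices` into `k` consecutive parts of equal size."""
    splited_indices = []
    
    #Insert Your Code Here
    # Integer fold size; if len(indices) is not a multiple of k,
    # the leftover indices at the end are not assigned to any fold.
    size = len(indices) // k
    for i in range(k):
        splited_indices.append(indices[size*i:size*(i+1)])
    #=====================
    
    return splited_indices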

In [None]:
X.index
k_flod_split_index(X.index,10)
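
The fold size in `k_flod_split_index` is the integer quotient of the length and `k`, so when the number of samples is not a multiple of `k` the last few indices end up in no fold. If that matters for your data, one possible alternative (a sketch, not part of the original solution) is NumPy's `np.array_split`, which accepts uneven parts. The sample count of 23 below is hypothetical.

```python
import numpy as np

indices = list(range(23))  # hypothetical: 23 samples, not divisible by k
k = 5

# np.array_split permits uneven parts: the first 23 % 5 = 3 folds get one extra item.
folds = [part.tolist() for part in np.array_split(indices, k)]
fold_sizes = [len(f) for f in folds]
print(fold_sizes)  # [5, 5, 5, 4, 4]
```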

## Run K-fold

Next, complete `run_eval` to run a K-fold test.

In [None]:
# scaler
X_sca = StandardScaler()
X = X_sca.fit_transform(X)

def run_eval(X,y,k=10,evaluation=accuracy_score):
    scores = []  # one score per fold
    splited_indices = k_flod_split_index(range(len(X)),k)
    for i in range(len(splited_indices)):
        
        #create lists for train and test indices
        #Insert Your Code Here
        train_indices = []
        for j in range(len(splited_indices)):
            if j != i:
                train_indices += list(splited_indices[j])
        test_indices = list(splited_indices[i])
        #=====================

        # X is a NumPy array after scaling; y is a pandas Series, so index it by position.
        X_train, X_test = X[train_indices], X[test_indices]
        y_train, y_test = y.iloc[train_indices], y.iloc[test_indices]
        
        # train a new SVC model.
        clf = SVC(kernel='linear', random_state=0).fit(X_train, y_train)
        
        # test the new model.
        #Insert Your Code Here
        y_test = y_test.values.tolist()
        _score = evaluation(y_test, clf.predict(X_test))
        #=====================
        
        # save the performance.
        scores.append(_score)
    print("Ave. score: {0:.2f}".format(sum(scores)/len(scores)))
    # Plot one bar per fold (DataFrame.append was removed in pandas 2.0, so a list is used instead).
    pd.DataFrame({'score': scores}).plot.bar()

run_eval(X,y)

# Confusion Matrix

A confusion matrix is used only for classification tasks. For a binary classifier, it tabulates the predicted labels against the actual labels as follows:

|            | predicted true | predicted false |
|------------|----------------|-----------------|
|actual true | True Positive  | False Negative  |
|actual false| False Positive | True Negative   |
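
As a small worked example of the table, the four cells can be counted directly from two label arrays. The labels below are made up for illustration, with 1 standing for "true" and 0 for "false". The cross-check uses scikit-learn's `confusion_matrix`; passing `labels=[1, 0]` puts the positive class first so that the layout matches the table.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 = "true" (positive), 0 = "false" (negative).
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_pred = np.array([1, 0, 1, 0, 1, 0, 0, 1])

tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # actual true, predicted true
fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # actual true, predicted false
fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # actual false, predicted true
tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # actual false, predicted false
print(tp, fn, fp, tn)  # 3 1 1 3

# Rows are actual labels, columns are predicted labels, both ordered [1, 0].
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
print(cm)
```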

Complete the class `ConfusionMatrix`: the function `calculate_matrix` should compute the four cells of the matrix, whose counters are created in `__init__`.

In [None]:
class ConfusionMatrix():
    def __init__(self):
        self.tp = 0
        self.tn = 0
        self.fp = 0
        self.fn = 0
        
    def calculate_matrix(self, y_true:[], y_pred:[]):
        # Reset the counts so that repeated calls do not accumulate.
        self.tp = self.tn = self.fp = self.fn = 0
        #Insert Your Code Here
        for i in range(len(y_true)):
            if y_true[i] == 1 and y_pred[i] == 1:
                self.tp += 1
            elif y_true[i] == 1 and y_pred[i] == 0:
                self.fn += 1
            elif y_true[i] == 0 and y_pred[i] == 1:
                self.fp += 1
            elif y_true[i] == 0 and y_pred[i] == 0:
                self.tn += 1
        #=====================

## Accuracy

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
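
To make the formula concrete, here is a short check on made-up labels (1 = positive, 0 = negative). It counts the four cells by hand and compares the result with scikit-learn's `accuracy_score`, the default metric used earlier in this notebook.

```python
from sklearn.metrics import accuracy_score

# Hypothetical labels, 1 = positive, 0 = negative.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

# Count the four cells of the confusion matrix from (actual, predicted) pairs.
pairs = list(zip(y_true, y_pred))
tp = pairs.count((1, 1))
tn = pairs.count((0, 0))
fp = pairs.count((0, 1))
fn = pairs.count((1, 0))

manual_acc = (tp + tn) / (tp + tn + fp + fn)  # (3 + 3) / 8
sklearn_acc = accuracy_score(y_true, y_pred)
print(manual_acc, sklearn_acc)  # 0.75 0.75
```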

Class `Accuracy` inherits from class `ConfusionMatrix`, so it can reuse `calculate_matrix` and the four counts.

Complete the function `accuracy` using the counts from class `ConfusionMatrix`.



In [None]:
class Accuracy(ConfusionMatrix):
    def __init__(self):
        super().__init__()  # initialise tp, tn, fp, fn in the parent class
    def accuracy(self, y_true:[], y_pred:[]):
        self.calculate_matrix(y_true,y_pred)
        acc=-1
        #Insert Your Code Here
        acc = (self.tp+self.tn)/(self.tp+self.tn+self.fp+self.fn)
        #=====================
        return acc

Now we can call your accuracy in the K-fold test. Does it get the same result as the previous test?

In [None]:
_eval = Accuracy()
run_eval(X,y,evaluation=_eval.accuracy)

## Precision (Positive Predictive Value)

$$\text{Precision} = \frac{TP}{TP + FP}$$

Intuitively, precision answers the question: out of all the times the model predicts true, how often is it correct? This metric penalizes false positives heavily, so it is the one to watch when some false negatives are acceptable but false positives are not. Imagine a model that predicts the verdict of a trial: it is better to let a guilty person go free than to punish an innocent one.

## Recall (Sensitivity) 

$$\text{Recall} = \frac{TP}{TP + FN}$$

Intuitively, recall answers the question: out of all the cases that are actually true, how many does the model identify? This metric penalizes false negatives heavily, so it is the one to watch when some false positives are acceptable but false negatives are not.
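
The contrast between the two metrics shows up clearly with two extreme, made-up predictors: an "eager" one that flags everything as positive and a "cautious" one that flags a single case. The sketch uses scikit-learn's `precision_score` and `recall_score`; the labels are hypothetical.

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical ground truth: 3 positives, 5 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]

# "Eager" model: predicts positive for everything (no false negatives, 5 false positives).
eager = [1] * 8
# "Cautious" model: predicts positive once (no false positives, 2 false negatives).
cautious = [1, 0, 0, 0, 0, 0, 0, 0]

eager_p, eager_r = precision_score(y_true, eager), recall_score(y_true, eager)
cautious_p, cautious_r = precision_score(y_true, cautious), recall_score(y_true, cautious)

print(eager_p, eager_r)        # precision 3/8, recall 3/3
print(cautious_p, cautious_r)  # precision 1/1, recall 1/3
```

The eager model has perfect recall but poor precision, and the cautious model the reverse, which is why the two metrics are usually reported together.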


## F1 Score

F1 score is the harmonic mean of precision and recall. 


$$\text{F}_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
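
A quick numeric sketch shows why the harmonic mean is used: it stays low unless both precision and recall are high. The helper name `f1_from` and the input values are made up for this illustration.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

balanced = f1_from(0.5, 0.5)   # both moderate
lopsided = f1_from(1.0, 0.1)   # perfect precision, poor recall
arithmetic = (1.0 + 0.1) / 2   # plain average of the lopsided pair, for comparison

print(balanced)              # 0.5
print(round(lopsided, 3))    # 0.182
print(arithmetic)            # 0.55
```

The lopsided pair has a higher arithmetic mean than the balanced pair, yet a much lower F1, because the harmonic mean is pulled toward the smaller of the two values.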


Complete `PrecisionAndRecall`, where precision, recall and F1 are calculated.

In [None]:
class PrecisionAndRecall(ConfusionMatrix):
    def __init__(self):
        super().__init__()  # initialise tp, tn, fp, fn in the parent class
        
    def precision(self, y_true:[], y_pred:[]):
        self.calculate_matrix(y_true,y_pred)
        p=-1
        #Insert Your Code Here
        if not self.tp+self.fp:
            p=0
        else:
            p = (self.tp)/(self.tp+self.fp)
        #=====================
        return p
    
    def recall(self, y_true:[], y_pred:[]):
        self.calculate_matrix(y_true,y_pred)
        r=-1
        #Insert Your Code Here
        if not self.tp+self.fn:
            r = 0
        else:
            r = (self.tp)/(self.tp+self.fn)
        #=====================
        return r
    
    def f1(self, y_true:[], y_pred:[]):
        self.calculate_matrix(y_true,y_pred)
        f1=-1
        #Insert Your Code Here
        p = self.precision(y_true,y_pred)
        r = self.recall(y_true,y_pred)
        if not p+r:
            f1 = 0
        else:
            f1 = 2*(p*r)/(p+r)
        #=====================
        return f1

In [None]:
_eval = PrecisionAndRecall()
run_eval(X,y,evaluation=_eval.precision)

In [None]:
run_eval(X,y,evaluation=_eval.recall)

In [None]:
run_eval(X,y,evaluation=_eval.f1)

# Have a try
During this lesson, you were introduced to the methodology of executing a K-fold test and the steps to devise a performance metric.

When you revisit the code, you'll see that the evaluation procedure encompasses three segments:

1. Testing structure (in this notebook: `run_eval`)
2. The predictive model (in this notebook: `SVC`)
3. Evaluation metrics (in this notebook: `Accuracy` and `PrecisionAndRecall`)

When you craft a new framework or metric, you're essentially shaping a fresh evaluation procedure. This procedure ought to be **aligned** with your specific projects or professional requirements.
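
As one possible illustration of a new metric, the sketch below defines specificity, TN / (TN + FP), as a plain function with the same `(y_true, y_pred)` signature as the metrics above. The function name and the sample labels are assumptions made for this example, not part of the original notebook.

```python
def specificity(y_true, y_pred) -> float:
    """TN / (TN + FP): the share of actual negatives that the model labels negative."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # Return 0 when there are no actual negatives, mirroring the zero-division guards above.
    return tn / (tn + fp) if (tn + fp) else 0.0

# Hypothetical labels: 4 actual negatives, one of them wrongly flagged as positive.
spec = specificity([1, 1, 0, 0, 0, 0], [1, 0, 0, 1, 0, 0])
print(spec)  # 0.75
```

Because it takes `y_true` and `y_pred` in that order, a function like this could be passed to the testing structure in the same way as the other metrics, for example `run_eval(X,y,evaluation=specificity)`.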

How about a **brand-new** scenario?

Or perhaps a topic **unique** to you?

Maybe you should think about it.