# Module 4 Model Evaluation

---
## Introduction Script
Hello and welcome. 


As I mentioned before, please watch the video to learn the concepts behind the algorithms, and more importantly, go through the lesson notebooks and practice as much as you can.

---
## Lesson 1: Introduction to Regression Evaluation

---
### Slide 1
#### Evaluation Metrics for Regression

- Mean Absolute Error (MAE)
- Mean Squared Error (MSE)
- Root Mean Squared Error (RMSE)
- R-squared ($R^2$)

<img src='images/linear_regression.png' width=400>

---
### Slide 2

#### Mean Absolute Error (MAE)
<img src='images/linear_regression.png' width=400>

$MAE = \frac{1}{n} \sum |y_i - \hat{y}_i|$

or

$MAE = \frac{1}{n} \sum |\epsilon_i|$
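As a rough illustration of the formula above, MAE can be computed by hand in plain Python. The `observed` and `predicted` values are made-up numbers for this sketch, not data from the lesson.

```python
# Hypothetical observed values and model predictions (illustration only).
observed = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]

# Residuals: epsilon_i = y_i - y_hat_i
residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]

# MAE: the average of the absolute residuals.
mae = sum(abs(e) for e in residuals) / len(residuals)
print(mae)
```

The absolute residuals here are 0.5, 0, 1.5 and 1, so the average is 0.75.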


---
### Slide 2 Script


---
### Slide 3

#### Mean Squared Error (MSE)

<img src='images/linear_regression.png' width=400>

$MSE = \frac{1}{n} \sum (y_i - \hat{y}_i)^2$

or

$MSE = \frac{1}{n} \sum \epsilon_i^2$
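Because the residuals are squared, MSE penalizes one large error more heavily than several small ones. The sketch below uses made-up numbers and two hypothetical helper functions to compare MAE and MSE on two sets of predictions.

```python
def mean_absolute_error_manual(observed, predicted):
    """Average of |y_i - y_hat_i| (hypothetical helper for this sketch)."""
    return sum(abs(y - y_hat) for y, y_hat in zip(observed, predicted)) / len(observed)


def mean_squared_error_manual(observed, predicted):
    """Average of (y_i - y_hat_i)^2 (hypothetical helper for this sketch)."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)) / len(observed)


observed = [3.0, 5.0, 2.5, 7.0]
small_errors = [2.0, 6.0, 3.5, 6.0]    # every prediction is off by 1
one_big_error = [3.0, 5.0, 2.5, 11.0]  # a single prediction is off by 4

# Both sets of predictions have the same MAE...
mae_small = mean_absolute_error_manual(observed, small_errors)
mae_big = mean_absolute_error_manual(observed, one_big_error)

# ...but the single large error dominates the MSE.
mse_small = mean_squared_error_manual(observed, small_errors)
mse_big = mean_squared_error_manual(observed, one_big_error)
print(mae_small, mae_big, mse_small, mse_big)
```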


---
### Slide 3 Script


---
### Slide 4
#### Root Mean Squared Error (RMSE)

$RMSE = \sqrt{MSE}$

or

$RMSE = \sqrt{\frac{1}{n} \sum (y_i - \hat{y}_i)^2}$
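Taking the square root brings the error back to the same units as the target. A minimal sketch with made-up values (imagine prices in thousands of dollars):

```python
import math

# Hypothetical observed and predicted prices, in thousands of dollars.
observed = [200.0, 250.0, 300.0]
predicted = [210.0, 240.0, 320.0]

# MSE is in squared units (thousands of dollars, squared).
mse = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)) / len(observed)

# RMSE is back in the original units (thousands of dollars).
rmse = math.sqrt(mse)
print(mse, rmse)
```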


---
### Slide 4 Script


---
### Slide 5
#### R-squared ($R^2$)

$MSE_{base} = \frac{1}{n} \sum (y_i - \bar{y})^2$

Where $\bar{y}$ is the mean of observed outputs, so $MSE_{base}$ is the error of a baseline model that always predicts the mean.

$R^2 = 1- \frac{MSE_{model}}{MSE_{base}}$
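A small worked example may help here. It treats the baseline as a model that always predicts the mean of the observed outputs, and compares the model's MSE against it. All numbers are made up for illustration.

```python
# Hypothetical observed values and model predictions (illustration only).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
n = len(observed)

# Baseline: always predict the mean of the observed outputs.
y_bar = sum(observed) / n

mse_model = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)) / n
mse_base = sum((y - y_bar) ** 2 for y in observed) / n

# R^2 is the share of the baseline error that the model removes.
r2 = 1 - mse_model / mse_base
print(mse_model, mse_base, r2)
```

Here the model's MSE is 0.125 against a baseline MSE of 1.25, so $R^2$ is about 0.9. A model that is no better than predicting the mean would score 0.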


### Slide 5 Script

---
### Slide 6
#### Metrics Calculation
```
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import math

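# ind_train/ind_test and dep_train/dep_test are assumed to be the feature and
# target splits created earlier (e.g. with train_test_split).
dtr = DecisionTreeRegressor(random_state=23)
dtr = dtr.fit(ind_train, dep_train)
pred = dtr.predict(ind_test)

# Compute performance metrics
mae = mean_absolute_error(dep_test, pred)
mse = mean_squared_error(dep_test, pred)
rmse = math.sqrt(mse)  # RMSE is the square root of MSE
mr2 = r2_score(dep_test, pred)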
```

### Slide 6 Script

### Slide 7
#### Residual vs. Observed Plot
<img src='images/residual.png' width=500>

---
## Lesson 2: Introduction to Classification Evaluation I

---
### Slide 1
#### Evaluation Metrics for Classification

- Accuracy Score
- Classification Report
- Confusion Matrix


---
### Slide 1 Script


---
### Slide 2
#### Accuracy Score


Accuracy Score = $\frac{\text{Correct Predictions}}{\text{All Predictions}}$
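As a quick sketch of this ratio, the snippet below counts matching labels by hand. The `actual` and `predicted` lists are made-up labels for illustration.

```python
# Hypothetical actual and predicted class labels (illustration only).
actual =    [0, 0, 1, 1, 0, 1, 0, 0]
predicted = [0, 0, 1, 0, 0, 1, 1, 0]

# Count the predictions that match the actual label.
correct = sum(1 for a, p in zip(actual, predicted) if a == p)

accuracy = correct / len(actual)
print(correct, accuracy)
```

Six of the eight predictions match, giving an accuracy of 0.75.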


---
### Slide 2 Script


---
### Slide 3
#### Classification Report

<img src="./images/classification_report.png" width="600">

#### Precision
The proportion of predictions for a class that are correct.


#### Recall
The proportion of actual members of a class that are predicted correctly.



---
### Slide 3 Script

Precision is the proportion of predictions for a class that are actually correct.

In the classification report above, the precision of class 0 (negative) is 0.8, which means that 80% of the model's negative predictions are correct. Likewise, 71% of its positive predictions are actually positive.

Recall is the proportion of the actual members of a class that are identified correctly.

In the same report, the recall of class 0 (negative) is 0.97, which means that 97% of the observed negatives are predicted as negative by the model. By contrast, only 25% of the observed positives are correctly classified as positive.


---
### Slide 4
#### Confusion Matrix
<img src="images/confusion_matrix_raw.png" width='500'>


---
### Slide 4 Script



---
### Slide 5
#### Confusion Matrix

- **True Positive (TP)**: Total number of predicted positives that are actually positive.

- **True Negative (TN)**:  Total number of predicted negatives that are actually negative.

- **False Positive (FP)** aka **Type 1 Error**: Total number of predicted positives that are actually negative.

- **False Negative (FN)** aka **Type 2 Error**: Total number of predicted negatives that are actually positive.
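The four cells defined above can be counted directly from paired labels. This is a minimal sketch with made-up labels, where 1 is the positive class and 0 the negative class.

```python
# Hypothetical actual and predicted class labels (illustration only).
actual =    [0, 0, 1, 1, 0, 1, 0, 0]
predicted = [0, 0, 1, 0, 0, 1, 1, 0]
pairs = list(zip(actual, predicted))

tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # predicted positive, actually positive
tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # predicted negative, actually negative
fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # Type 1 error
fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # Type 2 error

print(tp, tn, fp, fn)
```

Every data point lands in exactly one cell, so the four counts add up to the size of the dataset.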


---
### Slide 5 Script

When there are two outcomes, 0 and 1, the confusion matrix is a table with 4 different combinations of predicted and actual values.


### Slide 6
#### Metrics Calculation

$\text{Accuracy Score} = \frac{\text{Correct Predictions}}{\text{All Predictions}} = \frac{TP + TN}{TP + TN + FP + FN}$

$\text{Positive Precision} = \frac{\text{True Positive}}{\text{Predicted Positive}} = \frac{TP}{TP + FP}$

$\text{Negative Precision} = \frac{\text{True Negative}}{\text{Predicted Negative}} = \frac{TN}{TN + FN}$

$\text{Positive Recall} = \frac{\text{True Positive}}{\text{Actual Positive}} = \frac{TP}{TP + FN}$

$\text{Negative Recall} = \frac{\text{True Negative}}{\text{Actual Negative}} = \frac{TN}{TN + FP}$
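The formulas above can be checked with a few lines of arithmetic. The confusion-matrix counts below are hypothetical, chosen only to resemble an imbalanced dataset; they are not taken from the lesson's classification report.

```python
# Hypothetical confusion-matrix counts (illustration only).
tp, fn = 5, 15   # 20 actual positives
tn, fp = 78, 2   # 80 actual negatives

accuracy = (tp + tn) / (tp + tn + fp + fn)

positive_precision = tp / (tp + fp)  # correct share of positive predictions
negative_precision = tn / (tn + fn)  # correct share of negative predictions

positive_recall = tp / (tp + fn)     # share of actual positives that were found
negative_recall = tn / (tn + fp)     # share of actual negatives that were found

print(accuracy, positive_precision, negative_precision, positive_recall, negative_recall)
```

With these counts the accuracy is 0.83 even though the positive recall is only 0.25, which is why accuracy alone can be misleading on imbalanced data.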


### Slide 6 Script

### Slide 7
#### When to Avoid Type 1 Error (False Positive)
- Criminal trial

#### When to Avoid Type 2 Error (False Negative)
- Cancer screening test

### Slide 7 Script

### Slide 8
#### Case Study: Direct Mail Marketing
- Case 1: Reach as many target customers as possible (high positive recall)
- Case 2: Reduce mail sent to the wrong customers (high positive precision)
<img src='https://eugeneloj.typepad.com/.a/6a00d834516e6369e20120a977b5af970b-pi' width=350>

### Slide 9
#### Adjust the class_weight Hyperparameter

##### class_weight
- **None** (default): give all classes the same weight
- **balanced**: give weights inversely proportional to class frequencies in the input data
- **custom weight**: `{0:0.8, 1:0.2}`
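The three options above can be sketched with scikit-learn as follows. `DecisionTreeClassifier` is used here only as one example of a classifier that accepts `class_weight`, and the label array is made up. `compute_class_weight` is used to show what `balanced` resolves to for those labels.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced labels: 90 negatives and 10 positives.
labels = np.array([0] * 90 + [1] * 10)

# What "balanced" resolves to: n_samples / (n_classes * class_count).
balanced_weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=labels)
print(dict(zip([0, 1], balanced_weights)))

# The three ways of setting the hyperparameter on a classifier.
default_model = DecisionTreeClassifier(random_state=23)  # class_weight=None
balanced_model = DecisionTreeClassifier(class_weight="balanced", random_state=23)
custom_model = DecisionTreeClassifier(class_weight={0: 0.8, 1: 0.2}, random_state=23)
```

For these labels the rare positive class gets a weight of 5 and the common negative class a weight of about 0.56, so mistakes on positives cost more during training.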

---
## Lesson 3: Introduction to Classification Evaluation II

### Slide 1

#### ROC
Receiver operating characteristic
#### AUC
Area under curve



---
### Slide 2
#### ROC & AUC
<img src='images/roc.png' width=350>

**True positive rate (TPR)**, aka positive recall.  
The ratio of true positives to all actual positives.  
$TPR = \frac{\text{True Positive}}{\text{All Actual Positive}} = \frac{TP}{TP + FN}$

**False positive rate (FPR)**  
The ratio of false positives to all actual negatives.  
$FPR = \frac{\text{False Positive}}{\text{All Actual Negative}} = \frac{FP}{TN + FP}$



---
### Slide 2 Script
ROC stands for receiver operating characteristic. It was originally developed during World War Two to predict the performance of an individual using a radar system. The ROC curve displays the relationship between the false positive rate and the true positive rate.

In this image, the y axis represents the true positive rate, or TPR, which is calculated by dividing true positives by all actual positives. This is the positive recall: the percentage of all actual positives that are predicted correctly.

The x axis in the image represents the false positive rate, or FPR, which is calculated by dividing false positives by all actual negatives.

As we saw in the last lesson's case study, we can set the class_weight hyperparameter of a classifier to adjust how likely the model is to classify a data point as positive.

Now let's assume we configure our classifier so that it classifies all data points as negative. Then both true positives and false positives are 0, because there are no positive predictions at all. In the ROC plot, this is the starting point of an ROC curve, point (0,0).

To keep things simple, let's assume the dataset is balanced, which means about 50% of the data points have the positive class. Now let's adjust the model so that it predicts 10% of all data points as positive. First, let's examine a random model, which predicts by random guess, so it predicts correctly 50% of the time. Of the 10% of data points that are predicted as positive, half (5% of the dataset) are predicted correctly. The true positive rate is that 5% divided by the actual positives, which make up 50% of the dataset, so TPR is 0.1. Similarly, the false positive rate is the other 5% divided by the actual negatives, which also make up 50% of the dataset, so FPR is also 0.1.

If we increase the random model's positive predictions, TPR and FPR increase at the same rate. Eventually, when all data points are predicted as positive, TPR is 1, because all actual positives are predicted as positive, and FPR is also 1, because all actual negatives are predicted as positive, so the false positives equal the actual negatives. So for a random model, the ROC curve is a straight line from (0,0) to (1,1), which is the dashed diagonal line in the plot.
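The arithmetic for the random model can be sketched as a small function. `random_model_rates` is a hypothetical helper written for this illustration; it works with shares of the dataset rather than real predictions.

```python
def random_model_rates(positive_fraction, actual_positive_share=0.5):
    """TPR and FPR of a random guesser that flags `positive_fraction` of all points."""
    # A random guesser's positive predictions are split between actual
    # positives and actual negatives in proportion to the class shares.
    true_positives = positive_fraction * actual_positive_share
    false_positives = positive_fraction * (1 - actual_positive_share)
    tpr = true_positives / actual_positive_share
    fpr = false_positives / (1 - actual_positive_share)
    return tpr, fpr


# Sweep from "predict nothing as positive" to "predict everything as positive".
for fraction in [0.0, 0.1, 0.5, 1.0]:
    print(fraction, random_model_rates(fraction))
```

At every setting TPR equals FPR, which is why the random model traces the diagonal from (0,0) to (1,1).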

Now let's look at a perfect model, one whose predictions are always correct. Its ROC curve still starts from (0,0). When we adjust class_weight so that the model predicts 10% of the dataset as positive, all of those positive predictions are correct, so TPR equals 10% divided by 50%, or 0.2, and FPR is 0 because there are no false positives. As we increase the positive predictions, TPR increases and FPR remains 0, until all actual positives are predicted as positive and the ROC curve reaches the point (0,1). From there, as we increase positive predictions further, TPR remains 1 and FPR starts to increase, since all new positive predictions are actual negatives. When the model predicts all data points as positive, the ROC curve is at point (1,1). So the ROC curve for a perfect model is the vertical and horizontal blue line in the image.
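The same kind of sketch works for the perfect model. `perfect_model_rates` is a hypothetical helper for this illustration: it assumes the model uses its positive predictions on actual positives first, and only flags negatives once every positive has been found.

```python
def perfect_model_rates(positive_fraction, actual_positive_share=0.5):
    """TPR and FPR of a perfect ranker that flags `positive_fraction` of all points."""
    # Positive predictions go to actual positives until none are left.
    true_positives = min(positive_fraction, actual_positive_share)
    # Any further positive predictions must fall on actual negatives.
    false_positives = max(0.0, positive_fraction - actual_positive_share)
    tpr = true_positives / actual_positive_share
    fpr = false_positives / (1 - actual_positive_share)
    return tpr, fpr


for fraction in [0.0, 0.1, 0.5, 0.75, 1.0]:
    print(fraction, perfect_model_rates(fraction))
```

The points climb the left edge of the plot up to (0,1) and then run along the top edge to (1,1).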

The ROC curve of an actual classifier normally falls between the ROC curve of the random model and that of the perfect model. In the image, the green curve represents the ROC curve of a real model.

The area under the ROC curve, or AUC, is a metric that indicates how good a model is. As shown in the image, the AUC of a perfect model is 1 × 1, which is 1, and the AUC of a random model is 0.5. For a real model, the closer the AUC is to 1, the better the model.
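These two areas can be verified with the trapezoidal rule. `area_under_curve` is a hypothetical helper written for this sketch, applied to the corner points of the random and perfect ROC curves.

```python
def area_under_curve(fpr_points, tpr_points):
    """Trapezoidal area under a curve whose points are sorted by increasing FPR."""
    area = 0.0
    for i in range(1, len(fpr_points)):
        width = fpr_points[i] - fpr_points[i - 1]
        mean_height = (tpr_points[i] + tpr_points[i - 1]) / 2
        area += width * mean_height
    return area


# Random model: straight line from (0,0) to (1,1).
random_auc = area_under_curve([0.0, 1.0], [0.0, 1.0])

# Perfect model: (0,0) up to (0,1), then across to (1,1).
perfect_auc = area_under_curve([0.0, 0.0, 1.0], [0.0, 1.0, 1.0])

print(random_auc, perfect_auc)
```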

AUC is generally considered a better metric than accuracy, especially with an imbalanced dataset. For example, assume only 10% of a dataset has the positive class. A model that always predicts the majority class, negative, has an accuracy of 90%, but its AUC is only 0.5, the same as a random model. You may try this out in the lesson notebook by setting class_weight so that the negative class has weight 1 and the positive class has weight 0.
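The imbalanced example can be reproduced without training anything. In this sketch the "model" is simulated by hard-coding an all-negative prediction and a constant score for every data point; the labels are made up.

```python
from sklearn.metrics import accuracy_score, roc_auc_score

# Hypothetical imbalanced dataset: 90 negatives and 10 positives.
actual = [0] * 90 + [1] * 10

# A model that always predicts the majority class gives every point
# the same label and the same score.
predicted_labels = [0] * 100
predicted_scores = [0.0] * 100

accuracy = accuracy_score(actual, predicted_labels)
auc_value = roc_auc_score(actual, predicted_scores)
print(accuracy, auc_value)
```

Accuracy comes out at 0.9 while AUC is 0.5, because a constant score cannot rank any positive above any negative.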

---
### Slide 3
#### Plot ROC
#### Classifiers that have `decision_function`:
- Logistic Regression
- Support Vector Machine

#### Classifiers that have `predict_proba`:
- K-nearest Neighbors
- Decision Tree
- Random Forest
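Either kind of score can be passed to `roc_curve`. The sketch below uses a synthetic dataset from `make_classification` as a stand-in for the lesson data (an assumption), with one classifier from each list above; the variable names follow the earlier `ind_`/`dep_` convention.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for the lesson dataset (an assumption).
features, labels = make_classification(n_samples=500, random_state=23)
ind_train, ind_test, dep_train, dep_test = train_test_split(
    features, labels, test_size=0.3, random_state=23
)

# Logistic regression exposes decision_function: one signed score per sample.
lr = LogisticRegression(max_iter=1000).fit(ind_train, dep_train)
lr_scores = lr.decision_function(ind_test)

# A decision tree exposes predict_proba: column 1 holds P(class == 1).
dtc = DecisionTreeClassifier(max_depth=3, random_state=23).fit(ind_train, dep_train)
dtc_scores = dtc.predict_proba(ind_test)[:, 1]

for name, scores in [("logistic regression", lr_scores), ("decision tree", dtc_scores)]:
    # roc_curve sweeps a threshold over the scores and returns the ROC points.
    fpr, tpr, thresholds = roc_curve(dep_test, scores)
    print(name, "AUC:", auc(fpr, tpr))
```

The returned `fpr` and `tpr` arrays can then be passed to a plotting library to draw the curve.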



### Slide 4
#### Compare Models with ROC & AUC
<img src='images/roc_auc.png' width=500>

---
## Review Script


The first module's assignment is fairly straightforward. Just remember to work on the problems in order.

Good luck.