# Metrics used in Supervised Machine Learning

#### In supervised machine learning, most problem statements fall into two types of use cases: classification and regression

For classification, we evaluate a model with tools such as the confusion matrix and the accuracy score, from which we can read off the true positive rate (also called recall), the precision, and the F1 score.

For regression, we evaluate the goodness of fit of a particular model in a slightly different way. Two common measures are $R^2$ (R-squared) and adjusted $R^2$.
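As a rough illustration of the classification metrics named above, here is a minimal sketch in plain Python. The function name `classification_metrics` and the toy labels are made up for this example; it assumes a binary problem where the positive class is labelled `1`.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute confusion-matrix counts and common metrics for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # recall is the true positive rate
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy labels, chosen only for illustration.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With these toy labels there are 3 true positives, 3 true negatives, 1 false positive and 1 false negative, so accuracy, precision, recall and F1 all come out at 0.75.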

### $R^2$ (R-squared)

$\large R^2 = 1 - \frac{SS_{residual}}{SS_{total}} = 1 - \frac{\sum(y_i - \hat y_i)^2}{\sum(y_i - \bar y)^2}$, where

$\large SS_{residual}$ is the sum of squared residuals (a residual, or error, is the difference between the actual value and the value predicted by the best fit line).

$\large SS_{residual} = \sum(y_i - \hat y_i)^2$, where $\large y_i$ is the actual value and $\large \hat y_i$ is the predicted value. We square the differences so that positive and negative errors do not cancel each other out.

$\large SS_{total}$ is the total sum of squares. We take the average of all the actual values, which gives a horizontal line (parallel to the x axis), treat that line as a baseline model, and calculate the errors (differences) between the actual values and that average.

$\large SS_{total} = \sum(y_i - \bar y)^2$, where $\large y_i$ is an actual value and $\large \bar y$ is the average of all actual values. We square these differences as well.

After we compute the formula $\large R^2 = 1 - \frac{SS_{residual}}{SS_{total}}$, we usually get a value between 0 and 1. The closer the value is to 1, the better the best fit line fits the data.
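The formula can be sketched directly in plain Python. The helper name `r_squared` and the four data points are assumptions made for this illustration, not part of any library.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_residual / SS_total."""
    mean_y = sum(y_true) / len(y_true)
    # Squared differences between actual and predicted values.
    ss_residual = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    # Squared differences between actual values and their average.
    ss_total = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_residual / ss_total


# Hypothetical actual and predicted values.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 6.5, 9.5]
r2 = r_squared(y_true, y_pred)
print(r2)
```

Here $SS_{residual} = 1$ and $SS_{total} = 20$, so $R^2$ is about 0.95, which indicates a close fit.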

#### **Side Note**
*Can you get an $R^2$ value below 0? Is it possible?*

The answer is yes.

$R^2$ is below zero only when the best fit line predicts worse than the horizontal line at the average value.

Here is what happens in the formula:

If $\large SS_{residual}$ is greater than $\large SS_{total}$, the ratio $\large \frac{SS_{residual}}{SS_{total}}$ is greater than 1, so 1 minus that ratio is negative.
A negative value shows that the model is very bad.
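A small sketch of this case, using made-up numbers: the predictions below follow the opposite trend to the actual values, so they are worse than simply predicting the average. The helper `r_squared` is defined here only for the example.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_residual / SS_total."""
    mean_y = sum(y_true) / len(y_true)
    ss_residual = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    ss_total = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_residual / ss_total


y_true = [3.0, 5.0, 7.0, 9.0]

# Baseline: always predict the average of the actual values.
mean_y = sum(y_true) / len(y_true)
r2_mean = r_squared(y_true, [mean_y] * len(y_true))

# Hypothetical bad model: predictions run in the opposite direction.
r2_bad = r_squared(y_true, [9.0, 7.0, 5.0, 3.0])

print(r2_mean, r2_bad)
```

Predicting the average gives $R^2 = 0$, while the reversed predictions give $SS_{residual} = 80$ against $SS_{total} = 20$, so $R^2 = 1 - 4 = -3$.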

$\large R^2$ is used to check the goodness of fit of the best fit line.

### What is the adjusted $R^2$?

Remember, in linear regression we have:

$\large y = m_{0} + m_{1} x_{1}$ (simple linear regression)

$\large y = m_{0} + m_{1} x_{1} + m_{2}x_{2} + m_{3}x_{3} + m_{4}x_{4} $ (multiple linear regression)

As we add new independent features, our $R^2$ value usually increases. When we add a new feature, the fit assigns it a coefficient, and this can only keep $\large SS_{residual}$ the same or decrease it. $\large SS_{total}$ does not depend on the features, so when $\large SS_{residual}$ decreases, the ratio $\large \frac{SS_{residual}}{SS_{total}}$ decreases too, and $\large R^2 = 1 - \frac{SS_{residual}}{SS_{total}}$ moves closer to 1.

For a least-squares fit evaluated on the training data, $R^2$ never decreases when a new independent feature is added. The problem is that even if the added feature is not correlated with our target, $R^2$ will still stay the same or increase slightly. Basically, $R^2$ does not penalize newly added features.

To address that, we use **Adjusted $R^2$**.
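To see this behaviour on data, here is a minimal sketch that fits ordinary least squares in plain Python, first with one real feature and then with an extra feature of pure noise. Everything in it is an assumption made for illustration: the synthetic data, the seed, and the helper names `solve_linear_system`, `fit_ols` and `r_squared`. In practice a numerical library would do the fitting.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so row operations apply to both sides.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(features, y):
    """Least-squares fit with an intercept; returns predictions on the training rows."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    # Normal equations: (X^T X) coef = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    coef = solve_linear_system(xtx, xty)
    return [sum(c * v for c, v in zip(coef, r)) for r in rows]


def r_squared(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_residual = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_total = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_residual / ss_total


random.seed(0)
n = 50
x1 = [random.uniform(0, 10) for _ in range(n)]
noise_feature = [random.gauss(0, 1) for _ in range(n)]  # generated independently of y
y = [2.0 + 3.0 * a + random.gauss(0, 2) for a in x1]

r2_one = r_squared(y, fit_ols([[a] for a in x1], y))
r2_two = r_squared(y, fit_ols([[a, b] for a, b in zip(x1, noise_feature)], y))
print(r2_one, r2_two)
```

The second value should be at least as large as the first, even though `noise_feature` carries no information about `y`.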

### Adjusted $R^2$

$\large R^2_{adjusted} = 1 - \frac{(1-R^2)(N-1)}{N - p - 1}$, where

$R^2$ is the sample R-squared (which was calculated above)

$p$ is the number of predictors (independent features)

$N$ is the total sample size (the number of observations in the dataset)

It penalizes features that are not correlated with the target.

When the number of features $p$ increases, the denominator $N - p - 1$ becomes smaller, so the fraction $\large \frac{(1-R^2)(N-1)}{N - p - 1}$ becomes bigger unless $R^2$ rises enough to shrink the numerator $(1-R^2)(N-1)$. After we subtract the fraction from 1, the adjusted $R^2$ value therefore decreases when the added feature is not correlated with the target and barely changes $R^2$.
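The penalty can be checked with a short sketch. The function name `adjusted_r_squared` and the $R^2$ values below are hypothetical, picked only to show the two possible outcomes when a fourth predictor is added to a model fitted on 50 observations.

```python
def adjusted_r_squared(r2, n_samples, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2)(N - 1) / (N - p - 1)."""
    if n_samples - n_predictors - 1 <= 0:
        raise ValueError("need n_samples > n_predictors + 1")
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Hypothetical model with 3 predictors and R^2 = 0.800.
before = adjusted_r_squared(0.800, n_samples=50, n_predictors=3)

# Adding an uncorrelated feature: R^2 barely moves (0.800 -> 0.801).
after_weak = adjusted_r_squared(0.801, n_samples=50, n_predictors=4)

# Adding a useful feature: R^2 rises clearly (0.800 -> 0.850).
after_strong = adjusted_r_squared(0.850, n_samples=50, n_predictors=4)

print(before, after_weak, after_strong)
```

With these numbers the adjusted value drops from roughly 0.787 to roughly 0.783 for the weak feature, and rises to roughly 0.837 for the useful one.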

#### Conclusion

- Every time you add an independent variable to a model, **R-squared increases or stays the same**, even if the independent variable is insignificant. It never declines. **Adjusted R-squared**, in contrast, increases only when the new independent variable improves the fit enough to outweigh the penalty for the extra predictor.

- The adjusted R-squared value is always less than or equal to the R-squared value.