# **Evaluation Metrics**

Evaluation metrics quantify how well a machine learning model performs. They make it possible to compare different models on the same task, choose the best one, and see where a model needs to improve.

The right metric depends on the type of problem. There are three main types:

1. Classification
   - Supervised Learning.
   - Output is a categorical quantity.
   - Main aim is to predict the category a data point belongs to.
   - Example: classify an email as spam or non-spam.
   - Example algorithm: Logistic Regression

2. Regression
   - Supervised Learning.
   - Output is a continuous quantity.
   - Main aim is to forecast or predict a numeric value.
   - Example: predict stock prices.
   - Example algorithm: Linear Regression

3. Clustering
   - Unsupervised Learning.
   - Assigns data points to clusters.
   - Main aim is to group similar items into clusters.
   - Example: group transactions to surface those that look fraudulent.
   - Example algorithm: K-Means

____

## **Regression Metrics**

1. **Mean Absolute Error (MAE)**
   - MAE measures the **average magnitude** of the errors in a set of predictions, without considering their direction.
   - `Interpretation`
     - Lower values are better. A value of 0 indicates no error.

`MAE = (1/n) * Σ(|y_i - ŷ_i|)`
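
As a minimal sketch of the MAE formula, the snippet below computes it in plain Python. The function name `mean_absolute_error` and the sample values are illustrative assumptions, not taken from any dataset in these notes.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between actual and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(y - y_hat) for y, y_hat in zip(y_true, y_pred)) / len(y_true)


# Made-up values for illustration only.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

# Absolute errors are 0.5, 0.0, 1.5, 1.0, so the mean is 3.0 / 4.
mae = mean_absolute_error(y_true, y_pred)
print(mae)  # 0.75
```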


2. **Mean Squared Error (MSE)**
   - MSE measures the average of the squared errors, that is, the average squared difference between the predicted values and the actual values.
   - `Interpretation`
     - Like MAE, lower values are better. MSE is more sensitive to outliers than MAE.
     - Each error is squared before averaging, so a few large errors (outliers) dominate the result.

`MSE = (1/n) * Σ((y_i - ŷ_i)^2)`
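
The MSE formula can be sketched the same way. The function name `mean_squared_error` and the sample values below are illustrative assumptions.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / len(y_true)


# Made-up values for illustration only.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

# Squared errors are 0.25, 0.0, 2.25, 1.0, so the mean is 3.5 / 4.
mse = mean_squared_error(y_true, y_pred)
print(mse)  # 0.875
```

Note how the single error of 1.5 contributes 2.25 of the 3.5 total: squaring gives larger errors more weight.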


3. **Root Mean Squared Error (RMSE)**
   - RMSE is the square root of the mean of the squared errors. It quantifies the typical size of the model's errors in the same units as the target, which makes it easier to interpret than MSE.
   - `Interpretation`
     - Lower values indicate a better fit. RMSE is more sensitive to outliers than MAE and is often used when large errors are particularly undesirable.

`RMSE = sqrt(MSE)`
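
A minimal sketch of RMSE, built directly on the MSE definition. The function name `root_mean_squared_error` and the sample values are illustrative assumptions.

```python
import math


def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared error, in the same units as the target."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Made-up values for illustration only.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

rmse = root_mean_squared_error(y_true, y_pred)
print(round(rmse, 3))  # 0.935
```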


4. **R-Squared (Co-efficient of Determination)**
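   - R-squared represents the proportion of the variance in the dependent variable that is explained by the independent variables in a regression model.
   - `Interpretation`
     - On the data the model was fit to, values typically range from 0 to 1, and a higher R-squared indicates a better fit between the model and the data. On new data, R-squared can be negative when the model predicts worse than simply using the mean of the actual values.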

```
R² = 1 - (SS_res / SS_tot)

where:

SS_res = Σ((y_i - ŷ_i)^2) is the sum of squared residuals
SS_tot = Σ((y_i - ȳ)^2) is the total sum of squares
ȳ is the mean of the actual values
```
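
The R² formula can be sketched as follows. The function name `r_squared` and the sample values are illustrative assumptions. The example also evaluates two baseline predictions to show where 0 and negative values come from.

```python
def r_squared(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all actual values are identical")
    return 1 - ss_res / ss_tot


# Made-up values for illustration only.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

r2_model = r_squared(y_true, y_pred)
r2_mean = r_squared(y_true, [4.375] * 4)  # always predict the mean of y_true
r2_bad = r_squared(y_true, [10.0] * 4)    # always predict a value far from the data

print(round(r2_model, 4))  # 0.7241
print(r2_mean)             # 0.0
print(r2_bad < 0)          # True
```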

5. **Adjusted R-Squared**
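   - A modified version of R-squared that adjusts for the number of predictors in the model. Adding a predictor never lowers R-squared on the fitted data, but it lowers Adjusted R-squared when the predictor does not improve the fit enough.
   - `Interpretation`
     - Use it to compare the explanatory power of regression models that contain different numbers of predictors.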
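
The notes above do not give a formula for Adjusted R-squared. A commonly used form is `Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - p - 1)`, where `n` is the number of observations and `p` is the number of predictors. The sketch below assumes that form; the function name `adjusted_r_squared` and the numbers are illustrative only.

```python
def adjusted_r_squared(r2, n_samples, n_predictors):
    """Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - p - 1)."""
    if n_samples - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Same hypothetical R² of 0.85 on 50 observations, with 3 vs. 10 predictors.
adj_small = adjusted_r_squared(0.85, n_samples=50, n_predictors=3)
adj_large = adjusted_r_squared(0.85, n_samples=50, n_predictors=10)

print(round(adj_small, 4))  # 0.8402
print(round(adj_large, 4))  # 0.8115
```

With the same R², the model that uses more predictors receives the lower adjusted score.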

## **Choosing the Right Metric**

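1. The right metric depends on the specific requirements of your analysis and the nature of the data.
2. If large errors are especially costly, MSE or RMSE may be more appropriate because they penalize large errors more heavily. If the data contain outliers that should not dominate the score, MAE is more robust.
3. R-squared and Adjusted R-squared are useful for understanding the proportion of variance explained by the model, but a high value does not necessarily mean the model's predictions are accurate.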
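
To illustrate the difference in outlier sensitivity, the sketch below scores two sets of made-up predictions with MAE and RMSE. The helper names `mae` and `rmse` and all values are illustrative assumptions.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(y - y_hat) for y, y_hat in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(
        sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / len(y_true)
    )


# Made-up values for illustration only.
y_true = [10.0, 12.0, 11.0, 13.0, 12.0]
pred_even = [11.0, 11.0, 12.0, 12.0, 13.0]     # every prediction is off by 1
pred_outlier = [11.0, 11.0, 12.0, 12.0, 22.0]  # last prediction is off by 10

mae_even, rmse_even = mae(y_true, pred_even), rmse(y_true, pred_even)
mae_out, rmse_out = mae(y_true, pred_outlier), rmse(y_true, pred_outlier)

print(mae_even, rmse_even)                   # 1.0 1.0
print(round(mae_out, 2), round(rmse_out, 2)) # 2.8 4.56
```

A single large error raises MAE from 1.0 to 2.8 but raises RMSE from 1.0 to about 4.56, which is why RMSE suits cases where large errors must be avoided and MAE suits cases where they should not dominate.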

____

## **Classification Metrics**

1. **Confusion Matrix**

![Confusion Matrix](./images/confusion_matrix.png)

![Classification Metrics](./images/classification_metrics.png)

![Classification Metrics (continued)](./images/classification_metrics_2.png)

![Example](./images/classification_metrics_example.png)