### Key Measures of Dispersion and Association

| **Measure**                | **Definition**                                                                                         | **Formula**                                                                                         | **Interpretation**                                                                           |
|----------------------------|--------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------|
| **Variance**               | Measures the dispersion of data points from the mean.                                                  | $$\text{Var}(X) = \mathbb{E}[(X - \mu)^2]$$                                                         | Indicates how much the data points spread around the mean. Higher variance means more spread. |
| **Standard Deviation**     | The square root of the variance, providing dispersion in the same units as the data.                   | $$\text{SD}(X) = \sqrt{\text{Var}(X)}$$                                                             | A more intuitive measure of spread compared to variance. Higher standard deviation means more variability. |
| **Covariance**             | Measures how two variables change together.                                                             | $$\text{Cov}(X, Y) = \mathbb{E}[(X - \mu_X)(Y - \mu_Y)]$$                                           | Positive covariance indicates that variables increase/decrease together; negative covariance indicates the opposite. |
| **Correlation**            | Standardizes covariance to measure the strength and direction of a linear relationship between two variables. | $$\rho_{XY} = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y}$$                                        | Ranges from -1 to 1. 1 indicates a perfect positive relationship, -1 a perfect negative relationship, and 0 no linear relationship. |
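| **Coefficient of Determination (R²)** | Measures the proportion of variance in the dependent variable explained by the independent variables.  | $$ R^2 = 1 - \frac{\text{SS}_{\text{res}}}{\text{SS}_{\text{tot}}} $$ where $\text{SS}_{\text{res}} = \sum_i (y_i - \hat{y}_i)^2$ and $\text{SS}_{\text{tot}} = \sum_i (y_i - \bar{y})^2$ | Lies between 0 and 1 for a least-squares fit with an intercept, evaluated on its training data; higher values indicate a better fit. It can be negative on held-out data or for any model that does worse than always predicting $\bar{y}$. |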
| **Range**                  | The difference between the maximum and minimum values in the dataset.                                  | $$\text{Range} = \text{Max}(X) - \text{Min}(X)$$                                                  | Provides a measure of the spread of the data. Sensitive to outliers.                          |
| **Interquartile Range (IQR)** | The range of the middle 50% of the data.                                                                | $$\text{IQR} = Q_3 - Q_1$$                                                                          | Less sensitive to outliers compared to the range. Measures the spread of the central portion of the data. |
| **Mean Absolute Deviation (MAD)** | The average of the absolute deviations from the mean.                                                   | $$ \text{MAD} = \frac{1}{n} \sum_{i=1}^n \lvert x_i - \bar{x} \rvert $$                           | Provides a measure of spread that is less sensitive to outliers than variance and standard deviation. |
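
The formulas in the table can be checked numerically. The sketch below uses only Python's standard `statistics` module on a small made-up sample; the data, the variable names, and the choice of population (divide-by-n) rather than sample (divide-by-(n-1)) estimators are assumptions for illustration. The quartiles use `method="inclusive"`, which is one of several interpolation conventions, so other tools may report a slightly different IQR.

```python
import statistics

# Hypothetical paired sample used only for illustration
x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 9.0]

mu_x, mu_y = statistics.fmean(x), statistics.fmean(y)

var_x = statistics.pvariance(x)  # population variance: mean of (x - mu)^2
sd_x = statistics.pstdev(x)      # square root of the population variance
sd_y = statistics.pstdev(y)

# Population covariance: mean of the products of paired deviations
cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / len(x)
rho_xy = cov_xy / (sd_x * sd_y)  # Pearson correlation, as in the table

range_x = max(x) - min(x)
q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
iqr_x = q3 - q1
mad_x = statistics.fmean(abs(a - mu_x) for a in x)  # mean absolute deviation

# R^2 of the least-squares line y ~ slope * x + intercept
slope = cov_xy / var_x
intercept = mu_y - slope * mu_x
ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
ss_tot = sum((b - mu_y) ** 2 for b in y)
r_squared = 1 - ss_res / ss_tot

print(f"Var={var_x}, SD={sd_x}, Cov={cov_xy}, rho={rho_xy:.3f}, R2={r_squared:.3f}")
print(f"Range={range_x}, IQR={iqr_x}, MAD={mad_x}")
```

For a simple least-squares line with an intercept, `r_squared` equals `rho_xy ** 2`, which links the correlation and R² rows of the table.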

### Evaluation Metrics for Classification Models

#### Confusion Matrix

A confusion matrix is a table used to describe the performance of a classification model by comparing the actual class labels with the predicted class labels. It provides a detailed breakdown of the model's performance on each class.

The confusion matrix is typically structured as follows:

|                | **Predicted Positive** | **Predicted Negative** |
|----------------|-------------------------|-------------------------|
| **Actual Positive** | True Positive (TP)      | False Negative (FN)     |
| **Actual Negative** | False Positive (FP)     | True Negative (TN)      |

- **True Positive (TP)**: The number of instances correctly predicted as positive.
- **False Negative (FN)**: The number of instances incorrectly predicted as negative when they are actually positive.
- **False Positive (FP)**: The number of instances incorrectly predicted as positive when they are actually negative.
- **True Negative (TN)**: The number of instances correctly predicted as negative.
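
A minimal sketch of how the four cells are counted, assuming binary labels encoded as `1` (positive) and `0` (negative); `confusion_counts` is a hypothetical helper, not a library function. Note that scikit-learn's `confusion_matrix` orders rows and columns by sorted label, so for 0/1 labels it returns `[[TN, FP], [FN, TP]]`, with the negative class first, unlike the table above.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FN, FP, TN), treating `positive` as the positive label."""
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            if predicted == positive:
                tp += 1  # actual positive, predicted positive
            else:
                fn += 1  # actual positive, predicted negative
        elif predicted == positive:
            fp += 1      # actual negative, predicted positive
        else:
            tn += 1      # actual negative, predicted negative
    return tp, fn, fp, tn


# Hypothetical labels for illustration
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
tp, fn, fp, tn = confusion_counts(y_true, y_pred)
print(f"TP={tp} FN={fn} FP={fp} TN={tn}")  # TP=3 FN=1 FP=1 TN=5
```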

##### Key Metrics Derived from Confusion Matrix

1. **Accuracy**:
   - **Definition**: The proportion of correctly classified instances out of the total instances.
   - **Formula**: 
     $$
     \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
     $$
   - **Use**: Provides a general idea of the model's performance but can be misleading if the class distribution is imbalanced.



2. **Precision (Positive Predictive Value)**:
   - **Definition**: The proportion of true positive predictions out of all positive predictions.
   - **Formula**: 
     $$
     \text{Precision} = \frac{TP}{TP + FP}
     $$
   - **Use**: Indicates the quality of positive predictions. Useful when the cost of false positives is high.



3. **Recall (Sensitivity or True Positive Rate)**:
   - **Definition**: The proportion of true positive predictions out of all actual positives.
   - **Formula**: 
     $$
     \text{Recall} = \frac{TP}{TP + FN}
     $$
   - **Use**: Measures how well the model identifies positive instances. Important when the cost of false negatives is high.



4. **F1 Score**:
   - **Definition**: The harmonic mean of precision and recall, providing a balance between the two metrics.
   - **Formula**: 
     $$
     \text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
     $$
   - **Use**: Useful when you need a single metric that combines both precision and recall, particularly in cases of class imbalance.

5. **Specificity (True Negative Rate)**:
   - **Definition**: The proportion of true negative predictions out of all actual negatives.
   - **Formula**: 
     $$
     \text{Specificity} = \frac{TN}{TN + FP}
     $$
   - **Use**: Measures the model’s ability to identify negative instances. Important when false positives are costly.


6. **False Positive Rate (FPR)**:
   - **Definition**: The proportion of negative instances incorrectly classified as positive.
   - **Formula**: 
     $$
     \text{FPR} = \frac{FP}{FP + TN}
     $$
   - **Use**: Shows how often negative instances are incorrectly flagged as positive. It equals $1 - \text{Specificity}$ and forms the x-axis of the ROC curve.

7. **False Negative Rate (FNR)**:
   - **Definition**: The proportion of positive instances incorrectly classified as negative.
   - **Formula**: 
     $$
     \text{FNR} = \frac{FN}{FN + TP}
     $$
   - **Use**: Shows how often positive instances are missed. It equals $1 - \text{Recall}$, so it matters most when false negatives are costly.

8. **Matthews Correlation Coefficient (MCC)**:
   - **Definition**: A measure of the quality of binary classifications, taking into account all four categories of the confusion matrix.
   - **Formula**: 
     $$
     \text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
     $$
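   - **Use**: Ranges from -1 (complete disagreement between predictions and labels) through 0 (no better than random) to +1 (perfect prediction). Because it uses all four cells, it provides a balanced measure that can be more informative than accuracy, especially with imbalanced classes.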
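
All eight formulas above can be computed directly from the four counts. This sketch plugs in the counts from a small hypothetical example (TP=3, FN=1, FP=1, TN=5); `binary_metrics` and `safe_div` are illustrative helpers, and returning 0.0 when a ratio is undefined (for example, precision when nothing is predicted positive) is an assumption, one convention among several.

```python
from math import sqrt


def safe_div(num, den):
    # Assumption: report 0.0 when a metric's denominator is zero (metric undefined)
    return num / den if den else 0.0


def binary_metrics(tp, fn, fp, tn):
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "specificity": safe_div(tn, tn + fp),
        "fpr": safe_div(fp, fp + tn),
        "fnr": safe_div(fn, fn + tp),
        "mcc": safe_div(tp * tn - fp * fn,
                        sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))),
    }


m = binary_metrics(tp=3, fn=1, fp=1, tn=5)
for name, value in m.items():
    print(f"{name:>11}: {value:.3f}")
```

With these counts the loop prints accuracy 0.800, precision 0.750, recall 0.750, F1 0.750, specificity 0.833, FPR 0.167, FNR 0.250 and MCC 0.583.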

##### Instructions for Picking the Right Metrics

1. **Identify the Class Distribution**:
   - **Balanced**: All classes have roughly equal representation.
   - **Imbalanced**: Some classes are significantly more frequent than others.

2. **Determine the Cost of Errors**:
   - **False Positives**: Incorrectly predicting a negative instance as positive.
   - **False Negatives**: Incorrectly predicting a positive instance as negative.

3. **Consider the Need for a Single Metric**:
   - **Single Summary Metric**: You need a single value that summarizes performance.
   - **Detailed Metrics**: You need a detailed evaluation of model performance.

4. **Evaluate Performance Across Different Thresholds**:
   - **Threshold Sensitivity**: You want to understand how performance changes with different classification thresholds.
   - **Fixed Threshold**: You are using a specific threshold for evaluation.

5. **Address Multi-Class Classification**:
   - **Multi-Class**: Your classification problem involves more than two classes.
   - **Binary**: Your classification problem involves only two classes.


6. **Understand the Application Context**:
   - **Operational Goals**: Focus on metrics that align with the practical goals of the application.
   - **Scientific Analysis**: Use metrics that provide a comprehensive view of performance.



| **Situation**                                   | **Metrics to Use**                                                                                              | **Metrics to Avoid**                                                                                                 |
|-------------------------------------------------|------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------|
| **Balanced Dataset**                           | Accuracy, Precision, Recall, F1 Score, Specificity, ROC AUC                                                        | Avoid focusing solely on Accuracy if you need detailed performance per class                                        |
| **Imbalanced Dataset**                         | Precision, Recall, F1 Score, Balanced Accuracy, AUC-PR, ROC AUC                                                     | Avoid relying only on Accuracy as it may not reflect performance on the minority class                              |
| **High Cost of False Positives**                | Precision, Specificity                                                                                              | Avoid metrics that do not isolate false positives, such as Recall alone or overall Accuracy                         |
| **High Cost of False Negatives**                | Recall, F1 Score                                                                                                    | Avoid metrics that do not capture recall, such as Precision alone                                                   |
| **Need for Single Performance Metric**          | F1 Score, MCC (Matthews Correlation Coefficient)                                                                       | Avoid using multiple metrics if a single summary metric suffices                                                      |
| **Performance Across Different Thresholds**     | ROC AUC, Precision-Recall Curve                                                                                      | Avoid relying only on single-threshold metrics such as Accuracy, which describe performance at one operating point |
| **Multi-Class Classification**                  | Macro-Averaged Precision, Recall, F1 Score, Weighted Metrics, Multi-Class ROC AUC                                    | Avoid binary metrics and ensure that metrics are extended to handle multiple classes correctly                       |
| **Evaluation of Class-Specific Performance**     | Class-Specific Precision, Recall, F1 Score, Balanced Accuracy                                                        | Avoid aggregated metrics if detailed performance per class is needed                                                |
| **Operational Goals**                          | Precision, Recall, F1 Score, Specificity                                                                            | Avoid using only high-level metrics if specific operational aspects need to be assessed                               |
| **Scientific Analysis**                        | ROC AUC, AUC-PR, Detailed Precision-Recall Analysis                                                                  | Avoid metrics that do not provide a comprehensive view of performance across classes and thresholds                   |

**Summary**

- **Balanced Dataset**: Use standard metrics like accuracy and F1 score.
- **Imbalanced Dataset**: Focus on metrics like precision, recall, F1 score, and AUC-PR.
- **Cost Considerations**: Precision for high false positive costs, recall for high false negative costs.
- **Threshold Sensitivity**: ROC AUC and Precision-Recall Curve for understanding performance across thresholds.
- **Multi-Class**: Use macro-averaged or weighted metrics and extend ROC AUC to multi-class scenarios.
- **Operational vs. Scientific Needs**: Choose metrics that align with practical goals or provide a comprehensive performance overview.

##### Practical Considerations for Confusion Matrix and Associated Metrics

1. Class Distribution

- **Balanced Classes**:
  - **Metrics**: Accuracy, Precision, Recall, F1 Score, Specificity, ROC AUC.
  - **Considerations**: Metrics provide a straightforward understanding of model performance as each class contributes equally to the evaluation.

- **Imbalanced Classes**:
  - **Metrics**: Precision, Recall, F1 Score, Balanced Accuracy, AUC-PR, ROC AUC.
  - **Considerations**: Accuracy alone can be misleading; focus on metrics that highlight performance on minority classes.

2. Cost of False Positives vs. False Negatives

- **High Cost of False Positives**:
  - **Metrics**: Precision, Specificity.
  - **Considerations**: Precision measures the accuracy of positive predictions, while specificity evaluates how well the model avoids false positives.

- **High Cost of False Negatives**:
  - **Metrics**: Recall, F1 Score.
  - **Considerations**: Recall measures how well the model detects positive instances, and F1 Score balances precision and recall.

3. Need for a Single Performance Metric

- **Yes**:
  - **Metrics**: F1 Score, MCC (Matthews Correlation Coefficient).
  - **Considerations**: F1 summarizes precision and recall but ignores true negatives, while MCC uses all four cells of the confusion matrix. Both are more informative than accuracy on imbalanced datasets.

- **No**:
  - **Metrics**: Accuracy, Precision, Recall, Specificity.
  - **Considerations**: Use a combination of metrics to get a comprehensive understanding of performance.

4. Performance Across Different Thresholds

- **Yes**:
  - **Metrics**: ROC AUC, Precision-Recall Curve.
  - **Considerations**: Evaluate how performance changes with different classification thresholds to find the best trade-off between sensitivity and specificity (ROC curve) or between precision and recall (Precision-Recall curve).

- **No**:
  - **Metrics**: Fixed-threshold metrics like accuracy, precision, recall.
  - **Considerations**: Report metrics computed at the chosen operating threshold, keeping in mind that their values would change if the threshold changed.

5. Multi-Class Classification

- **Metrics**: Macro-Averaged Precision, Recall, F1 Score, Weighted-Averaged Metrics, Multi-Class ROC AUC.
- **Considerations**: Evaluate performance for each class separately and use averaging methods to account for all classes. For ROC AUC, use one-vs-rest or one-vs-one approaches.
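
To make the averaging step concrete, here is a sketch of one-vs-rest F1 with macro and weighted averaging, written in plain Python on made-up three-class labels. The helper names are hypothetical; scikit-learn exposes the same choices through the `average="macro"` and `average="weighted"` options of `f1_score`.

```python
def per_class_f1(y_true, y_pred, label):
    # One-vs-rest: treat `label` as positive and every other class as negative
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def averaged_f1(y_true, y_pred):
    labels = sorted(set(y_true) | set(y_pred))
    f1s = {c: per_class_f1(y_true, y_pred, c) for c in labels}
    support = {c: sum(1 for t in y_true if t == c) for c in labels}
    macro = sum(f1s.values()) / len(labels)  # every class counts equally
    # Weight each class by how many true instances it has
    weighted = sum(f1s[c] * support[c] for c in labels) / len(y_true)
    return f1s, macro, weighted


# Hypothetical three-class labels
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]
f1s, macro, weighted = averaged_f1(y_true, y_pred)
print({c: round(v, 3) for c, v in f1s.items()}, round(macro, 3), round(weighted, 3))
```

Here the per-class F1 scores are 0.75, 0.80 and about 0.67, giving a macro average of about 0.739 and a weighted average of about 0.742; the weighted figure leans toward class 0 because it has the most true instances.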

6. Threshold Adjustment and Selection

- **Metrics**: ROC AUC, Precision-Recall AUC, Confusion Matrix Analysis.
- **Considerations**: Analyze performance at various thresholds to select the most appropriate one based on the specific needs of the application.
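
One way to turn the "specific needs of the application" into a threshold is to assign a cost to each error type and pick the threshold with the lowest total cost. The sketch below assumes hypothetical costs (a false negative costs five times a false positive) and predicts positive when `score >= threshold`; both the costs and the `pick_threshold` helper are illustrative, not a standard procedure from a particular library.

```python
def pick_threshold(y_true, scores, cost_fp=1.0, cost_fn=5.0):
    """Return (threshold, cost) minimising cost_fp * FP + cost_fn * FN.

    A sample is predicted positive when score >= threshold. Candidate
    thresholds are the observed scores plus +inf (predict everything negative).
    Ties in cost keep the lowest threshold found first.
    """
    candidates = sorted(set(scores)) + [float("inf")]
    best = None
    for t in candidates:
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        fn = sum(1 for y, s in zip(y_true, scores) if s < t and y == 1)
        cost = cost_fp * fp + cost_fn * fn
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


# Hypothetical labels and model scores
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.3, 0.35, 0.6, 0.7, 0.9]
print(pick_threshold(y_true, scores))  # (0.35, 1.0)
```

With these scores the lowest-cost threshold is 0.35 (one false positive, no false negatives). Raising the false-positive cost pushes the chosen threshold upward; with `cost_fp=10.0` it moves to 0.7.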

7. Interpretation of Metrics

- **Metrics**: Precision, Recall, Specificity, Accuracy, F1 Score, MCC.
- **Considerations**: Ensure that the chosen metrics align with the practical goals of the model. For example, in medical diagnosis, high recall (sensitivity) is crucial to ensure that positive cases are detected.

8. Model Comparison

- **Metrics**: ROC AUC, F1 Score, Precision-Recall AUC.
- **Considerations**: Use these metrics to compare different models. Ensure that comparisons are made on the same test set to maintain consistency.

#### ROC AUC (Receiver Operating Characteristic - Area Under the Curve)

![image.png](attachment:457f2282-66d6-4026-abf9-db1748b56116.png)

- **ROC Curve**:
  - **Definition**: A graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied.
  - **Axes**:
    - **X-Axis**: False Positive Rate (FPR), which is the proportion of actual negatives that are incorrectly classified as positives.
    - **Y-Axis**: True Positive Rate (TPR), which is the proportion of actual positives that are correctly classified as positives.

- **AUC (Area Under the Curve)**:
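  - **Definition**: The area under the ROC curve, representing the model’s ability to distinguish between positive and negative classes. Equivalently, it is the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative instance.
  - **Range**: 
    - **0.5**: No discrimination (the model performs no better than random guessing).
    - **1.0**: Perfect discrimination (the model perfectly distinguishes between all positive and negative instances).
    - **Below 0.5**: Worse than random; the scores are systematically inverted, and flipping them would give an AUC of $1 - \text{AUC}$.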

**Formulas**:

1. **True Positive Rate (TPR)**, also known as **Recall** or **Sensitivity**:
   $$ \text{TPR} = \frac{TP}{TP + FN} = \text{Recall} $$

2. **False Positive Rate (FPR)**:
   $$ \text{FPR} = \frac{FP}{FP + TN} $$
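
The two rates above trace the ROC curve as the threshold moves. The following plain-Python sketch (hypothetical helper names and toy scores) lowers the threshold from the highest score to the lowest, integrates the curve with the trapezoidal rule, and checks the result against the equivalent ranking interpretation: the fraction of (positive, negative) pairs in which the positive receives the higher score, with ties counted as one half.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs, predicting positive when score >= threshold."""
    pairs = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Lower the threshold to `score`: every sample tied at this score becomes positive
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    # Sum the trapezoid areas between consecutive (FPR, TPR) points
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_rank(y_true, scores):
    # Fraction of (positive, negative) pairs ranked correctly; ties count as 1/2
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y != 1]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.3, 0.35, 0.6, 0.7, 0.9]
points = roc_points(y_true, scores)
print(f"{auc_trapezoid(points):.3f} {auc_rank(y_true, scores):.3f}")
```

On this toy data both functions give 8/9 (about 0.889): eight of the nine positive-negative pairs are ranked correctly.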

##### Practical Considerations

1. **Balanced Datasets**:
   - **Use**: ROC AUC is generally a reliable metric for evaluating classifier performance when classes are balanced.
   - **Consideration**: Provides a comprehensive measure of performance across all classification thresholds, reflecting how well the model can separate classes.

2. **Imbalanced Datasets**:
   - **Use**: ROC AUC can still be useful, but it might not always reflect the model’s performance on the minority class.
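   - **Consideration**: ROC AUC evaluates how well the model ranks positives above negatives overall, but it can look optimistic in highly imbalanced scenarios: because FPR is divided by the large number of negatives, a model can produce many false positives relative to its true positives while keeping FPR low. Complement it with metrics like Precision-Recall AUC for more insight into minority-class performance.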

3. **Multi-Class Classification**:
   - **Use**: ROC AUC can be extended to multi-class classification by using strategies such as one-vs-rest or one-vs-one.
   - **Consideration**: Compute ROC AUC for each class against all other classes (one-vs-rest), or for each pair of classes (one-vs-one), and then average the results.

4. **Threshold Independence**:
   - **Use**: ROC AUC evaluates the model’s performance independently of the classification threshold.
   - **Consideration**: Provides a view of model performance across all possible thresholds, which helps in understanding how the model performs in various operational conditions.

5. **Comparison of Models**:
   - **Use**: Useful for comparing different models. A model with a higher ROC AUC is generally preferred.
   - **Consideration**: When comparing models, ensure that the ROC AUC is calculated on the same test set to maintain consistency.

6. **Visual Interpretation**:
   - **Use**: The ROC curve visually represents how the true positive rate and false positive rate change with different thresholds.
   - **Consideration**: Analyze the shape of the ROC curve to understand the trade-offs between TPR and FPR for various thresholds.

7. **Limitations**:
   - **Use**: ROC AUC might not always capture the performance nuances in certain applications.
   - **Consideration**: For specific applications where the cost of false positives and false negatives varies significantly, consider complementing ROC AUC with other metrics like Precision, Recall, and F1 Score to get a fuller picture of performance.
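
To illustrate the imbalanced-data caveat, the sketch below compares ROC AUC with average precision (a common estimate of Precision-Recall AUC, defined as $\sum_n (R_n - R_{n-1}) P_n$ over thresholds) on a hypothetical dataset with 2 positives and 18 negatives, where three negatives outscore both positives. The data and helper names are made up for illustration, and labels are assumed to be 0/1.

```python
def roc_auc(y_true, scores):
    # Probability that a random positive outscores a random negative (ties count 1/2)
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    # AP = sum over thresholds of (R_n - R_{n-1}) * P_n, scanning scores from high to low
    pairs = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    n_pos = sum(y_true)
    tp = fp = 0
    prev_recall = ap = 0.0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Treat tied scores as a single threshold
        while i < len(pairs) and pairs[i][0] == score:
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
    return ap


# Hypothetical imbalanced data: 2 positives, 18 negatives (10% positive rate)
pos_scores = [0.80, 0.60]
neg_scores = [0.95, 0.90, 0.85] + [0.01 * k for k in range(1, 16)]
scores = pos_scores + neg_scores
y_true = [1] * len(pos_scores) + [0] * len(neg_scores)
print(f"ROC AUC = {roc_auc(y_true, scores):.3f}")                     # 0.833
print(f"Average precision = {average_precision(y_true, scores):.3f}")  # 0.325
```

ROC AUC comes out at about 0.833 because most negatives are still ranked below both positives, while average precision is only 0.325 because three false alarms are ranked ahead of the first true case. For comparison, a random ranking would have an average precision of roughly the positive rate, 0.1.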

##### Example

Suppose you have a model that predicts whether a patient has a disease (positive) or not (negative). The ROC curve plots the TPR against the FPR at various threshold settings. An ROC AUC of 0.85 means that, for a randomly chosen patient with the disease and a randomly chosen patient without it, the model assigns the higher score to the patient with the disease 85% of the time. This indicates good discrimination across thresholds, but it does not say which threshold to deploy; that choice still depends on the relative costs of missed diagnoses and false alarms.

By considering these points, you can effectively utilize ROC AUC to evaluate and interpret your model’s performance, ensuring a comprehensive understanding of its classification abilities.