## Q1. What is the purpose of grid search CV in machine learning, and how does it work?

Grid Search CV (Cross-Validation) is a hyperparameter tuning technique used in machine learning to find the optimal combination of hyperparameter values for a given model. Hyperparameters are external configuration settings for a model, and their values are not learned from the training data. Grid Search CV systematically explores a predefined set of hyperparameter values and evaluates the model's performance using cross-validation, helping to identify the hyperparameter values that lead to the best model performance.

### Purpose of Grid Search CV:

1. **Hyperparameter Tuning:**
   - Grid Search CV is primarily used for finding the best combination of hyperparameter values that optimize the model's performance. This is crucial for improving the model's predictive accuracy and generalization to unseen data.

2. **Avoiding Manual Tuning:**
   - Instead of manually trying different hyperparameter combinations, Grid Search CV automates the process, saving time and ensuring a more systematic exploration of the hyperparameter space.

3. **Cross-Validation:**
   - Grid Search CV scores each candidate with cross-validation rather than a single train/validation split. Averaging over several folds gives a more robust performance estimate and reduces the risk of picking hyperparameters that merely happen to suit one particular split of the data.

### How Grid Search CV Works:

1. **Define Hyperparameter Grid:**
   - Specify the hyperparameters to be tuned and a set of values or ranges for each hyperparameter. This creates a grid of possible hyperparameter combinations.

2. **Cross-Validation:**
   - Divide the training dataset into \(k\) folds (e.g., \(k = 5\)). For each combination of hyperparameters in the grid, train the model on \(k-1\) folds, evaluate it on the remaining fold, and repeat so that every fold serves once as the evaluation fold.

3. **Performance Metric:**
   - Define a performance metric (e.g., accuracy, F1-score, mean squared error) to measure the model's performance during cross-validation.

4. **Iterative Search:**
   - Systematically iterate through all possible hyperparameter combinations in the grid, training and evaluating the model for each combination.

5. **Select Optimal Hyperparameters:**
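   - Identify the hyperparameter combination with the best mean cross-validated score on the chosen metric. This is the best configuration among the candidates in the grid, not necessarily the global optimum; by default, scikit-learn's `GridSearchCV` then refits the model with these values on the full training set.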
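To make the five steps concrete, here is a minimal hand-rolled sketch of the same loop. It assumes scikit-learn's bundled iris dataset and a decision tree with an illustrative two-parameter grid (neither comes from the original text); `GridSearchCV` performs essentially this search plus extra bookkeeping and an optional final refit.

```python
from itertools import product

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Step 1: the hyperparameter grid (illustrative values, 3 x 3 = 9 combinations)
param_grid = {"max_depth": [1, 2, 3], "min_samples_leaf": [1, 5, 10]}

results = []
# Step 4: iterate over every combination (Cartesian product of the value lists)
for values in product(*param_grid.values()):
    params = dict(zip(param_grid.keys(), values))
    model = DecisionTreeClassifier(random_state=0, **params)
    # Steps 2-3: 5-fold cross-validation scored with accuracy
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    results.append((scores.mean(), params))

# Step 5: keep the combination with the highest mean cross-validated score
best_score, best_params = max(results, key=lambda r: r[0])
print(best_params, round(best_score, 3))
```

The cost grows multiplicatively: every extra value in any list adds a full set of \(k\) cross-validation fits per combination of the other parameters.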

### Example in Python using Scikit-Learn:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Sample data: a synthetic placeholder; replace with your own X and y
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the hyperparameter grid (3 x 3 x 3 x 3 = 81 combinations)
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

# Create the model (fixed random_state for reproducible results)
rf_model = RandomForestClassifier(random_state=42)

# Instantiate GridSearchCV: 81 combinations x 5 folds = 405 fits
grid_search = GridSearchCV(estimator=rf_model, param_grid=param_grid, cv=5, scoring='accuracy', n_jobs=-1)

# Fit the model to the training data only
grid_search.fit(X_train, y_train)

# Get the best hyperparameters and their mean cross-validated accuracy
best_params = grid_search.best_params_
best_cv_score = grid_search.best_score_

# Evaluate on the held-out test set; with refit=True (the default), GridSearchCV
# has already retrained the best configuration on the full training set
test_accuracy = grid_search.score(X_test, y_test)
```

In this example, a Random Forest classifier is tuned over the number of trees (`n_estimators`), the maximum tree depth (`max_depth`), the minimum number of samples required to split an internal node (`min_samples_split`), and the minimum number of samples required at a leaf (`min_samples_leaf`). Each of the 81 combinations is scored by 5-fold cross-validated accuracy on the training set, the best combination is stored in `best_params_`, and the refit best model is then evaluated once on the held-out test set.

## Q2. Describe the difference between grid search CV and randomized search CV, and when might you choose one over the other?

Both Grid Search CV and Randomized Search CV are hyperparameter tuning techniques used in machine learning to find the optimal combination of hyperparameter values for a model. However, they differ in their approach to exploring the hyperparameter space.

### Grid Search CV:

1. **Search Strategy:**
   - Grid Search CV exhaustively searches through all possible combinations of hyperparameter values specified in a predefined grid.

2. **Computational Cost:**
   - Grid Search can be computationally expensive, especially when the hyperparameter space is large, as it evaluates every possible combination.

3. **Usage:**
   - Suitable for a small or moderately sized hyperparameter space where it is feasible to evaluate all combinations.

### Randomized Search CV:

1. **Search Strategy:**
   - Randomized Search CV randomly samples a specified number of hyperparameter combinations from the hyperparameter space.

2. **Computational Cost:**
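   - Randomized Search is usually cheaper than Grid Search because it evaluates only a fixed number (`n_iter`) of sampled combinations, so its cost is set by the budget you choose rather than by the size of the hyperparameter space.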

3. **Usage:**
   - Suitable for large or continuous hyperparameter spaces where evaluating all combinations would be impractical.

### When to Choose Grid Search CV:

1. **Small Hyperparameter Space:**
   - When the hyperparameter space is relatively small, and it's feasible to evaluate all combinations.

2. **Exploration of All Combinations:**
   - If the goal is to perform an exhaustive search and explore the entire hyperparameter space systematically.

3. **Sufficient Computational Resources:**
   - When computational resources are not a constraint and the dataset is not extremely large.

### When to Choose Randomized Search CV:

1. **Large Hyperparameter Space:**
   - When the hyperparameter space is extensive, and evaluating all combinations would be computationally expensive or impractical.

2. **Resource Efficiency:**
   - If computational resources are limited, and there is a need for a more resource-efficient approach to hyperparameter tuning.

3. **Exploration of Diverse Configurations:**
   - When the goal is to explore a diverse set of hyperparameter configurations rather than an exhaustive search.

### Example in Python using Scikit-Learn:

#### Grid Search CV:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Sample data: a synthetic placeholder; replace with your own X and y
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the hyperparameter grid (3 x 3 x 3 x 3 = 81 combinations)
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

# Create the model (fixed random_state for reproducible results)
rf_model = RandomForestClassifier(random_state=42)

# Instantiate GridSearchCV: 81 combinations x 5 folds = 405 fits
grid_search = GridSearchCV(estimator=rf_model, param_grid=param_grid, cv=5, scoring='accuracy', n_jobs=-1)

# Fit the model to the training data only
grid_search.fit(X_train, y_train)

# Get the best hyperparameters
best_params = grid_search.best_params_

# Evaluate the refit best model on the held-out test set
test_accuracy = grid_search.score(X_test, y_test)
```

#### Randomized Search CV:

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Sample data: a synthetic placeholder; replace with your own X and y
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the hyperparameter distributions.
# scipy's randint(low, high) never returns `high`, so these cover the same ranges as the grid
param_dist = {
    'n_estimators': randint(50, 201),
    'max_depth': [None, 10, 20],
    'min_samples_split': randint(2, 11),
    'min_samples_leaf': randint(1, 5)
}

# Create the model (fixed random_state for reproducible results)
rf_model = RandomForestClassifier(random_state=42)

# Instantiate RandomizedSearchCV: 10 sampled combinations x 5 folds = 50 fits
random_search = RandomizedSearchCV(estimator=rf_model, param_distributions=param_dist, n_iter=10, cv=5, scoring='accuracy', random_state=42, n_jobs=-1)

# Fit the model to the training data only
random_search.fit(X_train, y_train)

# Get the best hyperparameters
best_params = random_search.best_params_

# Evaluate the refit best model on the held-out test set
test_accuracy = random_search.score(X_test, y_test)
```

In these examples, a Random Forest classifier is used with Grid Search CV and Randomized Search CV to find the optimal hyperparameters. The key difference lies in how the hyperparameter space is explored: Grid Search systematically evaluates all combinations, while Randomized Search samples a specified number of combinations randomly. The Randomized Search approach is particularly useful when dealing with a large hyperparameter space.
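To put numbers on the cost difference, the sketch below only enumerates candidates and trains nothing. It uses scikit-learn's `ParameterGrid` and `ParameterSampler`, the candidate generators behind grid and randomized search, with a grid like the one above and distributions whose `randint` upper bounds are exclusive; the fold count of 5 matches the examples.

```python
from scipy.stats import randint
from sklearn.model_selection import ParameterGrid, ParameterSampler

n_folds = 5

# Grid search: every combination of the value lists is enumerated
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
}
grid_candidates = list(ParameterGrid(param_grid))

# Randomized search: only n_iter combinations are drawn from the distributions
param_dist = {
    'n_estimators': randint(50, 201),
    'max_depth': [None, 10, 20],
    'min_samples_split': randint(2, 11),
    'min_samples_leaf': randint(1, 5),
}
random_candidates = list(ParameterSampler(param_dist, n_iter=10, random_state=42))

grid_fits = len(grid_candidates) * n_folds      # 81 candidates * 5 folds
random_fits = len(random_candidates) * n_folds  # 10 candidates * 5 folds
print(f"Grid search:       {len(grid_candidates)} candidates -> {grid_fits} fits")
print(f"Randomized search: {len(random_candidates)} candidates -> {random_fits} fits")
```

This prints 405 fits for the grid versus 50 for the randomized search (each search also performs one final refit when `refit=True`). Adding a fifth hyperparameter with three values would triple the grid's cost, while the randomized search stays at `n_iter * n_folds`.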

## Q3. What is data leakage, and why is it a problem in machine learning? Provide an example.

Data leakage in machine learning occurs when information that would not be available at prediction time, such as data from the test set or values observed after the event being predicted, finds its way into model training. It produces inflated performance metrics during development but poor generalization on new, unseen data, so the model's apparent accuracy does not reflect how it will perform on real-world data.

### Causes of Data Leakage:

1. **Including Future Information:**
   - Using information that would not be available in practice at the time of making predictions, such as future data or data from the target variable that occurs after the event being predicted.

2. **Incorporating Test Set Information:**
   - Accidentally using information from the test set during model training. The model should not have access to the test set during training to accurately assess its generalization performance.

3. **Leaking Information Across Samples:**
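   - Letting closely related samples (e.g., duplicated rows, or several records from the same patient or customer) appear in both the training and evaluation sets. The model can then memorize entity-specific patterns, and its evaluation score overstates how well it generalizes to genuinely new data.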

### Why Data Leakage Is a Problem:

1. **Overestimation of Model Performance:**
   - Data leakage can lead to overly optimistic performance metrics during model training, giving a false sense of the model's accuracy. In reality, the model may fail to generalize to new, unseen data.

2. **Unrealistic Expectations:**
   - The model may appear to perform exceptionally well during development, but its performance in production will be disappointing, because the leaked information it relied on is not available when real predictions are made.

3. **Misleading Insights:**
   - Any insights or patterns learned from the training data may not be applicable to new data, as the model may have unintentionally learned specifics of the training set.

### Example of Data Leakage:

Let's consider an example involving time-series data and a predictive model for stock price movements:

#### Scenario:
1. **Training Data:**
   - A machine learning model is trained on historical stock prices, including information on the target variable (e.g., stock price movement) up to a certain date.

2. **Feature Engineering:**
   - Technical indicators or statistical measures (e.g., moving averages, volatility) are computed over windows that include prices from after the prediction date, such as a centered moving average, and are then used as input features for the model.

3. **Model Training:**
   - The model is trained on the training set, including the derived features.

4. **Predictions:**
   - The model is then used to make predictions on new, unseen data.

#### Problem:
   - The model's predictive features, derived from future information, were used during training. In practice, this future information would not be available at the time of making predictions, leading to data leakage.

#### Consequences:
   - The model may appear to have high accuracy during training, but its performance on new data will likely be poor, as it was unintentionally exposed to information not available at prediction time.

#### Solution:
   - Ensure that feature engineering and model training only use information that would realistically be available at the time of prediction. In the stock price example, features derived from future information should be excluded during model training.
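A minimal pandas sketch of the difference, assuming a small made-up price series (not real market data): a centered rolling window leaks tomorrow's price into today's feature, whereas a trailing window shifted by one day uses only past prices. Perturbing a single future price shows which feature depends on it.

```python
import pandas as pd

# Hypothetical daily closing prices (illustrative values only)
prices = pd.Series([100.0, 101.0, 103.0, 102.0, 105.0, 107.0],
                   index=pd.date_range("2024-01-01", periods=6, freq="D"))


def moving_average_features(p: pd.Series) -> pd.DataFrame:
    return pd.DataFrame({
        # LEAKY: a centered 3-day window averages yesterday, today, and tomorrow
        "leaky_ma": p.rolling(window=3, center=True).mean(),
        # SAFE: trailing 3-day window shifted by one day, so the feature for
        # day t uses only prices from days t-3 .. t-1
        "safe_ma": p.rolling(window=3).mean().shift(1),
    })


features = moving_average_features(prices)

# Change only the price on day index 4 (the "future" relative to day 3) and recompute
perturbed = prices.copy()
perturbed.iloc[4] += 50.0
features_perturbed = moving_average_features(perturbed)

day = 3
print("leaky:", features["leaky_ma"].iloc[day], "->", features_perturbed["leaky_ma"].iloc[day])
print("safe: ", features["safe_ma"].iloc[day], "->", features_perturbed["safe_ma"].iloc[day])
```

The leaky feature for day 3 changes when day 4's price changes, which is impossible to reproduce at prediction time; the safe feature does not.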

Avoiding data leakage requires a careful understanding of the dataset, feature engineering process, and the temporal relationships in time-series data. Regular validation and testing on a separate dataset help ensure that the model generalizes well to new, unseen data.

## Q4. How can you prevent data leakage when building a machine learning model?

Preventing data leakage is crucial for building machine learning models that generalize well to new, unseen data. Here are some strategies to prevent data leakage:

### 1. **Separate Training and Test Sets:**
   - **Best Practice:**
     - Clearly separate the training set and the test set. The model should be trained only on the training set, and test set information should not be used during model training.

### 2. **Avoid Future Information:**
   - **Best Practice:**
     - Exclude features derived from information that would not be available at the time of prediction.
     - Be cautious with features derived from the target variable, time-related features, and data transformations that involve future information.

### 3. **Use Time-Based Splits:**
   - **Best Practice:**
     - For time-series data, use time-based splits where the training set includes data up to a certain date, and the test set includes data after that date. This helps mimic the real-world scenario where future information is unknown during training.

### 4. **Cross-Validation Strategies:**
   - **Best Practice:**
     - If cross-validation is used, ensure that each fold represents a time period and that the training set for each fold precedes the test set. This prevents the model from being exposed to future information during cross-validation.
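One way to get such folds, assuming scikit-learn is available, is `TimeSeriesSplit`, which grows the training window forward in time and always tests on the block that follows it. The twelve observations below are a hypothetical placeholder for time-ordered data sorted oldest first.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Twelve time-ordered observations (e.g., monthly records), oldest first
X = np.arange(12).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=3)
splits = list(tscv.split(X))
for fold, (train_idx, test_idx) in enumerate(splits, start=1):
    # Every training index precedes every test index in the same fold
    print(f"fold {fold}: train={train_idx.tolist()} test={test_idx.tolist()}")
```

Unlike a shuffled `KFold`, no fold ever trains on observations that come after its test period. The data must already be sorted by time for this guarantee to hold.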

### 5. **Feature Engineering Awareness:**
   - **Best Practice:**
     - Be mindful of the features used during model training, especially those derived from transformations, aggregations, or statistical measures. These features should only involve information available at the time of prediction.

### 6. **Handle Missing Values Appropriately:**
   - **Best Practice:**
     - Address missing values using techniques that do not use information from the test set. For example, impute missing values based on statistics calculated only from the training set.
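A minimal sketch with scikit-learn's `SimpleImputer`, assuming a hypothetical single-feature dataset: the imputer is fitted on the training rows only, and the same fitted statistic fills gaps in the test rows. Fitting on all rows would let the test outlier pull the fill value upward.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical single-feature data with missing values (np.nan)
X_train = np.array([[1.0], [2.0], [np.nan], [3.0]])
X_test = np.array([[np.nan], [100.0]])

imputer = SimpleImputer(strategy="mean")
imputer.fit(X_train)                          # mean of training rows only: (1 + 2 + 3) / 3 = 2.0
X_train_imputed = imputer.transform(X_train)
X_test_imputed = imputer.transform(X_test)    # test gap is filled with the training mean

# For contrast: the fill value a leaky fit on train + test would have used
leaky_mean = np.nanmean(np.vstack([X_train, X_test]))  # (1 + 2 + 3 + 100) / 4 = 26.5
print(imputer.statistics_, leaky_mean)
```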

### 7. **Encode Categorical Variables Consistently:**
   - **Best Practice:**
     - If categorical variables are encoded, fit the encoder on the training set only and apply the same fitted encoder to the test set. Avoid fitting the encoding on the entire dataset, as this may introduce information from the test set into the training process.
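A short sketch using scikit-learn's `OneHotEncoder`, assuming a hypothetical color column in which one category appears only in the test set. The encoder learns its categories from the training data, and `handle_unknown="ignore"` maps the unseen test category to an all-zero row instead of raising an error.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical column; "green" appears only in the test set
X_train = np.array([["red"], ["blue"], ["red"]])
X_test = np.array([["blue"], ["green"]])

# Fit on the training set only; unseen test categories become all-zero rows
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(X_train)

X_train_enc = encoder.transform(X_train).toarray()  # transform returns a sparse matrix by default
X_test_enc = encoder.transform(X_test).toarray()
print(encoder.categories_)  # categories learned from the training data only
print(X_test_enc)
```

The train and test matrices therefore always have the same columns in the same order, and nothing about the test set's category list influences training.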

### 8. **Avoid Data Leakage in Feature Selection:**
   - **Best Practice:**
     - If feature selection is performed, ensure that it is done based on information available only in the training set. Do not use the test set or future information during the feature selection process.
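A common way to enforce this in scikit-learn is to put preprocessing and feature selection inside a `Pipeline` and cross-validate the whole pipeline. The sketch below assumes synthetic placeholder data and illustrative choices (`StandardScaler`, `SelectKBest` with `k=10`, logistic regression) that are not from the original text.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic placeholder data: 200 samples, 50 features, only 5 informative
X, y = make_classification(n_samples=200, n_features=50, n_informative=5, random_state=0)

# Scaling and feature selection live inside the pipeline, so in every CV fold
# they are fit on that fold's training portion only, never on the held-out part
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif, k=10)),
    ("model", LogisticRegression(max_iter=1000)),
])

scores = cross_val_score(pipe, X, y, cv=5)
print(f"mean CV accuracy: {scores.mean():.3f}")
```

Running `SelectKBest` on all of `X` before cross-validation would let each held-out fold influence which features are kept, which typically inflates the reported score.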

### 9. **Documentation and Validation:**
   - **Best Practice:**
     - Document all data preprocessing steps, transformations, and decisions made during model development.
     - Regularly validate the model on a separate dataset or a held-out portion of the data to ensure it generalizes well to new, unseen information.

### 10. **Understand the Domain:**
   - **Best Practice:**
     - Gain a deep understanding of the domain and the data to identify potential sources of leakage. Collaborate with domain experts to ensure that the model development process aligns with realistic scenarios.

### 11. **Regular Model Evaluation:**
   - **Best Practice:**
     - Regularly evaluate the model's performance on a separate validation set or a test set that the model has never seen. This helps detect any unexpected changes in model behavior.

### 12. **Use Data Leak Detection Tools:**
   - **Best Practice:**
     - Employ data leak detection tools or libraries that can help identify potential sources of data leakage during the model development process.

### Conclusion:
Preventing data leakage requires diligence, a good understanding of the data, and careful documentation of the preprocessing steps. Regular validation, adherence to best practices, and an awareness of potential pitfalls are essential to building robust machine learning models.

## Q5. What is a confusion matrix, and what does it tell you about the performance of a classification model?

A confusion matrix is a table that is used to evaluate the performance of a classification model. It provides a detailed breakdown of the model's predictions, comparing them to the actual classes in the dataset. The matrix is particularly useful for understanding the types and frequency of errors made by the model.

The confusion matrix is structured as follows:

```
                     Actual Positive   Actual Negative
Predicted Positive   True Positive     False Positive
Predicted Negative   False Negative    True Negative
```
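The orientation is a convention, and libraries differ. scikit-learn's `confusion_matrix` puts actual classes on the rows and predicted classes on the columns (the transpose of the table above), and for labels `[0, 1]` its `ravel()` returns the counts in the order TN, FP, FN, TP. A small sketch with made-up labels:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up labels: 1 = positive class, 0 = negative class
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 1, 0, 1, 1, 0, 1, 0])

cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
# scikit-learn layout: rows = actual [0, 1], columns = predicted [0, 1]
#   [[TN, FP],
#    [FN, TP]]
tn, fp, fn, tp = cm.ravel()
print(cm)
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
```

Always check which axis holds the actual classes before reading off TP, FP, FN, and TN.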

Here are the components of the confusion matrix:

1. **True Positive (TP):**
   - Instances where the model correctly predicts the positive class.

2. **True Negative (TN):**
   - Instances where the model correctly predicts the negative class.

3. **False Positive (FP):**
   - Instances where the model incorrectly predicts the positive class (Type I error).

4. **False Negative (FN):**
   - Instances where the model incorrectly predicts the negative class (Type II error).

### Key Metrics Derived from the Confusion Matrix:

1. **Accuracy:**
   - The overall accuracy of the model, calculated as \((TP + TN) / (TP + TN + FP + FN)\). It represents the proportion of correctly classified instances.

2. **Precision (Positive Predictive Value):**
   - Precision measures the accuracy of the positive predictions and is calculated as \(TP / (TP + FP)\). It indicates the ability of the model to avoid false positives.

3. **Recall (Sensitivity, True Positive Rate):**
   - Recall measures the proportion of actual positive instances that are correctly predicted by the model and is calculated as \(TP / (TP + FN)\). It indicates the model's ability to capture all positive instances.

4. **Specificity (True Negative Rate):**
   - Specificity measures the proportion of actual negative instances that are correctly predicted by the model and is calculated as \(TN / (TN + FP)\).

5. **F1-Score:**
   - The harmonic mean of precision and recall, calculated as \(2 \times (Precision \times Recall) / (Precision + Recall)\). It provides a balance between precision and recall.

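### Interpreting the Confusion Matrix:

In the layout shown above (predicted classes as rows, actual classes as columns, positive class first):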

- **Top Left (True Positive):**
   - Instances correctly predicted as positive.

- **Bottom Right (True Negative):**
   - Instances correctly predicted as negative.

- **Top Right (False Positive):**
   - Instances incorrectly predicted as positive (Type I error).

- **Bottom Left (False Negative):**
   - Instances incorrectly predicted as negative (Type II error).

### Example:

Consider a binary classification problem where a model predicts whether emails are spam (positive) or not (negative). The confusion matrix may look like this:

```
                     Actual Not Spam   Actual Spam
Predicted Not Spam   850               20
Predicted Spam       30                100
```

- True Positive (TP): 100 (Spam emails correctly predicted as spam)
- True Negative (TN): 850 (Non-spam emails correctly predicted as non-spam)
- False Positive (FP): 30 (Non-spam emails incorrectly predicted as spam)
- False Negative (FN): 20 (Spam emails incorrectly predicted as non-spam)
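Plugging these counts into the metric formulas listed earlier is simple arithmetic; the helper below is a hypothetical name used only for illustration, not a library function.

```python
def metrics_from_counts(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute common confusion-matrix metrics from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }


# Counts from the spam example
spam = metrics_from_counts(tp=100, tn=850, fp=30, fn=20)
for name, value in spam.items():
    print(f"{name:>11}: {value:.3f}")
```

For these counts, accuracy is 0.95, precision is 100/130 ≈ 0.769, recall is 100/120 ≈ 0.833, specificity is 850/880 ≈ 0.966, and F1 is 0.8 (equivalently \(2TP / (2TP + FP + FN) = 200/250\)).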

By analyzing these values and using the derived metrics, you can gain insights into the strengths and weaknesses of the classification model and make informed decisions about its performance.

## Q6. Explain the difference between precision and recall in the context of a confusion matrix.

Precision and recall are two key metrics derived from a confusion matrix, and they provide insights into different aspects of a classification model's performance.

### Precision:

- **Definition:**
  - Precision, also known as Positive Predictive Value, measures the accuracy of the positive predictions made by the model.

- **Formula:**
  - \(\text{Precision} = \frac{\text{True Positive (TP)}}{\text{True Positive (TP) + False Positive (FP)}}\)

- **Interpretation:**
  - Precision answers the question: "Of all instances predicted as positive, how many were actually positive?"
  - It indicates the model's ability to avoid false positives.

### Recall (Sensitivity, True Positive Rate):

- **Definition:**
  - Recall measures the proportion of actual positive instances that are correctly predicted by the model.

- **Formula:**
  - \(\text{Recall} = \frac{\text{True Positive (TP)}}{\text{True Positive (TP) + False Negative (FN)}}\)

- **Interpretation:**
  - Recall answers the question: "Of all actual positive instances, how many were correctly predicted as positive?"
  - It indicates the model's ability to capture all positive instances.

### Key Differences:

1. **Focus:**
   - **Precision:**
     - Focuses on the accuracy of positive predictions.
     - Concerned with avoiding false positives.
     - Precision is relevant when the cost of false positives is high.

   - **Recall:**
     - Focuses on capturing all positive instances.
     - Concerned with avoiding false negatives.
     - Recall is relevant when missing positive instances is costly.

2. **Formula:**
   - **Precision:**
     - \(\text{Precision} = \frac{\text{True Positive (TP)}}{\text{True Positive (TP) + False Positive (FP)}}\)
     - Precision is calculated with respect to the total number of instances predicted as positive.

   - **Recall:**
     - \(\text{Recall} = \frac{\text{True Positive (TP)}}{\text{True Positive (TP) + False Negative (FN)}}\)
     - Recall is calculated with respect to the total number of actual positive instances.

3. **Trade-off:**
   - Raising the classification threshold tends to increase precision and decrease recall, and lowering it tends to do the opposite, so there is often a trade-off between the two.
   - **Precision:**
     - Sensitive to false positives.
   - **Recall:**
     - Sensitive to false negatives.
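The trade-off becomes visible when one set of model scores is turned into predictions at different thresholds. The sketch assumes made-up labels and scores and uses scikit-learn's `precision_score` and `recall_score`.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Made-up ground truth and model scores (e.g., predicted probabilities)
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])
scores = np.array([0.10, 0.20, 0.30, 0.40, 0.45, 0.60, 0.65, 0.70, 0.80, 0.90])

results = {}
for threshold in (0.3, 0.5, 0.75):
    y_pred = (scores >= threshold).astype(int)  # predict positive at or above the threshold
    p = precision_score(y_true, y_pred)
    r = recall_score(y_true, y_pred)
    results[threshold] = (p, r)
    print(f"threshold={threshold:.2f}  precision={p:.3f}  recall={r:.3f}")
```

Here precision rises from 0.625 to 0.8 to 1.0 as the threshold goes from 0.3 to 0.5 to 0.75, while recall falls from 1.0 to 0.8 to 0.4.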

### Example:

Consider a medical test for a rare disease:

- **Precision:**
  - High precision means that if the test predicts the presence of the disease, it is likely correct.
  - A false positive in this context might lead to unnecessary treatments or interventions.

- **Recall:**
  - High recall means that the test is effective at capturing all instances of the disease.
  - A false negative in this context might result in a person with the disease going undetected.

In summary, precision and recall provide complementary insights into a model's performance, helping to assess its ability to make accurate positive predictions (precision) and capture all positive instances (recall). The choice between precision and recall depends on the specific goals and requirements of the application.

## Q7. How can you interpret a confusion matrix to determine which types of errors your model is making?

Interpreting a confusion matrix allows you to understand the types of errors your model is making and gain insights into its performance. The confusion matrix provides a detailed breakdown of predictions compared to the actual classes in the dataset. Let's use a binary classification example for clarity:

Consider the following confusion matrix:

```
                     Actual Negative   Actual Positive
Predicted Negative   900               50
Predicted Positive   30                120
```

### Key Components:

1. **True Positive (TP):**
   - Instances correctly predicted as positive: 120 (Actual Positive, Predicted Positive)

2. **True Negative (TN):**
   - Instances correctly predicted as negative: 900 (Actual Negative, Predicted Negative)

3. **False Positive (FP):**
   - Instances incorrectly predicted as positive (Type I error): 30 (Actual Negative, Predicted Positive)

4. **False Negative (FN):**
   - Instances incorrectly predicted as negative (Type II error): 50 (Actual Positive, Predicted Negative)

### Interpretation:

1. **Accuracy:**
   - Overall accuracy can be calculated as \((TP + TN) / (TP + TN + FP + FN)\). In this case, it's \((120 + 900) / (120 + 900 + 30 + 50) = 1020 / 1100 \approx 0.93\) or 93%.

2. **Precision (Positive Predictive Value):**
   - Precision is calculated as \(TP / (TP + FP)\). In this case, it's \(120 / (120 + 30) = 0.80\) or 80%. Precision measures the accuracy of positive predictions and tells you how many of the predicted positive instances are actually positive.

3. **Recall (Sensitivity, True Positive Rate):**
   - Recall is calculated as \(TP / (TP + FN)\). In this case, it's \(120 / (120 + 50) \approx 0.71\) or 71%. Recall measures the ability of the model to capture all actual positive instances.

4. **False Positive Rate (FPR):**
   - FPR is calculated as \(FP / (FP + TN)\). In this case, it's \(30 / (30 + 900) \approx 0.03\) or 3%. FPR indicates the proportion of actual negatives incorrectly predicted as positive.

### Error Analysis:

- **Type I Error (False Positive):**
  - The model incorrectly predicted 30 instances as positive when they were actually negative. This might lead to unnecessary actions or interventions for those instances.

- **Type II Error (False Negative):**
  - The model incorrectly predicted 50 instances as negative when they were actually positive. This might result in missing potentially important instances, which could have adverse consequences.
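Raw counts can hide which error is more serious relative to class size, so it helps to normalize each actual class to 1. The sketch rebuilds label arrays that reproduce the counts above (TN = 900, FP = 30, FN = 50, TP = 120) and uses the `normalize="true"` option of scikit-learn's `confusion_matrix`; remember that scikit-learn puts actual classes on the rows.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Label arrays reproducing the counts above: TN=900, FP=30, FN=50, TP=120
y_true = np.array([0] * 930 + [1] * 170)
y_pred = np.array([0] * 900 + [1] * 30 + [0] * 50 + [1] * 120)

# normalize="true" divides each row (actual class) by its total
rates = confusion_matrix(y_true, y_pred, labels=[0, 1], normalize="true")
print(np.round(rates, 3))
# row 0 (actual negatives): [TNR, FPR]
# row 1 (actual positives): [FNR, TPR]
```

About 29% of actual positives are missed (50/170), compared with about 3% of actual negatives wrongly flagged (30/930), so relative to class size the false negatives are the dominant error here.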

### Recommendations:

- **Improving Precision:**
  - If the cost of false positives is high (e.g., a spam filter that would send legitimate emails to the junk folder), you may want to focus on improving precision.

- **Improving Recall:**
  - If missing positive instances is more costly (e.g., in fraud detection), you may want to focus on improving recall.

- **Trade-off Considerations:**
  - There is often a trade-off between precision and recall. Adjusting the classification threshold or exploring model parameters can help find a balance that aligns with the specific goals of your application.

By carefully interpreting the confusion matrix and associated metrics, you can make informed decisions about model performance, identify areas for improvement, and tailor your model to better meet the requirements of the specific task or application.

## Q8. What are some common metrics that can be derived from a confusion matrix, and how are they calculated?

Several common metrics can be derived from a confusion matrix to evaluate the performance of a classification model. Here are some key metrics and their calculations:

### 1. Accuracy:

- **Definition:**
  - Overall accuracy of the model, representing the proportion of correctly classified instances.

- **Formula:**
  - \(\text{Accuracy} = \frac{\text{True Positive (TP) + True Negative (TN)}}{\text{Total Population}}\)

### 2. Precision (Positive Predictive Value):

- **Definition:**
  - Precision measures the accuracy of positive predictions and is relevant when avoiding false positives is crucial.

- **Formula:**
  - \(\text{Precision} = \frac{\text{True Positive (TP)}}{\text{True Positive (TP) + False Positive (FP)}}\)

### 3. Recall (Sensitivity, True Positive Rate):

- **Definition:**
  - Recall measures the ability of the model to capture all actual positive instances and is relevant when avoiding false negatives is crucial.

- **Formula:**
  - \(\text{Recall} = \frac{\text{True Positive (TP)}}{\text{True Positive (TP) + False Negative (FN)}}\)

### 4. Specificity (True Negative Rate):

- **Definition:**
  - Specificity measures the proportion of actual negative instances that are correctly predicted by the model.

- **Formula:**
  - \(\text{Specificity} = \frac{\text{True Negative (TN)}}{\text{True Negative (TN) + False Positive (FP)}}\)

### 5. F1-Score:

- **Definition:**
  - F1-Score is the harmonic mean of precision and recall, providing a balance between the two metrics.

- **Formula:**
  - \(\text{F1-Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}\)

### 6. False Positive Rate (FPR):

- **Definition:**
  - FPR measures the proportion of actual negative instances incorrectly predicted as positive.

- **Formula:**
  - \(\text{FPR} = \frac{\text{False Positive (FP)}}{\text{False Positive (FP) + True Negative (TN)}}\)

### 7. False Negative Rate (FNR):

- **Definition:**
  - FNR measures the proportion of actual positive instances incorrectly predicted as negative.

- **Formula:**
  - \(\text{FNR} = \frac{\text{False Negative (FN)}}{\text{False Negative (FN) + True Positive (TP)}}\)

### 8. Matthews Correlation Coefficient (MCC):

- **Definition:**
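  - MCC is the correlation between actual and predicted labels. It uses all four cells of the confusion matrix, ranges from -1 (total disagreement) through 0 (no better than random) to +1 (perfect prediction), and remains informative when the classes are imbalanced.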

- **Formula:**
  - \(\text{MCC} = \frac{\text{TP} \times \text{TN} - \text{FP} \times \text{FN}}{\sqrt{(\text{TP} + \text{FP})(\text{TP} + \text{FN})(\text{TN} + \text{FP})(\text{TN} + \text{FN})}}\)
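A direct translation of the formula, cross-checked against scikit-learn's `matthews_corrcoef`. The counts (TP = 120, TN = 900, FP = 30, FN = 50) are reused from the Q7 example, and the helper name is hypothetical. Returning 0.0 when the denominator is zero is a common convention rather than part of the formula.

```python
import math

import numpy as np
from sklearn.metrics import matthews_corrcoef


def mcc_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC from confusion-matrix counts; 0.0 if any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


manual = mcc_from_counts(tp=120, tn=900, fp=30, fn=50)

# Cross-check with label arrays that reproduce the same counts
y_true = np.array([0] * 930 + [1] * 170)
y_pred = np.array([0] * 900 + [1] * 30 + [0] * 50 + [1] * 120)
library = matthews_corrcoef(y_true, y_pred)
print(round(manual, 4), round(library, 4))
```

Both values come out at roughly 0.71, noticeably lower than the 93% accuracy for the same matrix, because MCC penalizes the relatively high miss rate on the smaller positive class.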

### 9. Area Under the Receiver Operating Characteristic (ROC) Curve (AUC-ROC):

- **Definition:**
  - AUC-ROC measures the area under the ROC curve, which plots the true positive rate against the false positive rate.

- **Calculation:**
  - AUC-ROC is calculated by integrating the area under the ROC curve.
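A small sketch with made-up labels and scores shows both views: `roc_curve` produces one (FPR, TPR) point per threshold, `auc` integrates those points with the trapezoidal rule, and `roc_auc_score` computes the same value directly.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Made-up labels and scores; AUC needs scores, not just hard 0/1 predictions
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])
scores = np.array([0.10, 0.20, 0.30, 0.40, 0.45, 0.60, 0.65, 0.70, 0.80, 0.90])

# Each threshold yields one confusion matrix, hence one (FPR, TPR) point
fpr, tpr, thresholds = roc_curve(y_true, scores)

# Trapezoidal area under those points vs. scikit-learn's direct computation
area = auc(fpr, tpr)
direct = roc_auc_score(y_true, scores)
print(round(area, 3), round(direct, 3))
```

Both give 0.96: of the 25 positive/negative pairs, 24 have the positive ranked above the negative (only the positive at 0.45 is outranked, by the negative at 0.60).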

These metrics provide different perspectives on a model's performance, and the choice of which to emphasize depends on the specific goals and requirements of the application. It's common to use a combination of these metrics to obtain a comprehensive understanding of a classification model's behavior.