## Model Evaluation and Improvement

### Cross-Validation

This Python code uses the scikit-learn library to perform five-fold cross-validation on the Iris dataset using a Random Forest Classifier. It prints the cross-validation score for each fold and their mean, then repeats the evaluation with three folds for comparison.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris

# Load iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Define the model
clf = RandomForestClassifier(random_state=42)

# Perform 5-fold cross-validation
scores = cross_val_score(clf, X, y, cv=5)

print("Cross-validation scores for each fold: ", scores)

# A common way to summarize the cross-validation accuracy is to compute the mean
print("Mean cross-validation score: ", scores.mean())

# Change the number of folds to 3
scores = cross_val_score(clf, X, y, cv=3)

print("Cross-validation scores for each fold: ", scores)
print("Mean cross-validation score: ", scores.mean())
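
To make it clearer what cross_val_score does with cv=5, the following is a minimal sketch that runs the same folds by hand. It assumes the default behavior for a classifier with an integer cv (stratified folds without shuffling); the variable names manual_scores and library_scores and the smaller n_estimators=20 are illustrative choices, not part of the original example.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=20, random_state=42)

# For a classifier and an integer cv, scikit-learn uses stratified folds without shuffling
cv = StratifiedKFold(n_splits=5)

manual_scores = []
for train_idx, test_idx in cv.split(X, y):
    # Fit a fresh copy of the model on the training part of the fold
    fold_model = clone(clf)
    fold_model.fit(X[train_idx], y[train_idx])
    # Score (accuracy for classifiers) on the held-out part of the fold
    manual_scores.append(fold_model.score(X[test_idx], y[test_idx]))
manual_scores = np.array(manual_scores)

# The library call should report the same per-fold accuracies
library_scores = cross_val_score(clf, X, y, cv=5)

print("Manual fold scores: ", manual_scores)
print("cross_val_score:    ", library_scores)
```

Each fold trains on four fifths of the data and evaluates on the remaining fifth, so every sample is used for testing exactly once.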


### Grid Search

In [None]:
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris

# Load iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Define the model
model = RandomForestClassifier(random_state=42)  # fixed seed so the results are reproducible

# Grid Search
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 5, 10]
}

grid_search = GridSearchCV(model, param_grid=param_grid, cv=3)
grid_search.fit(X, y)

print("Grid Search Best Parameters:", grid_search.best_params_)
print("Grid Search Best Score:", grid_search.best_score_)

# Random Search
param_dist = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 5, 10]
}

random_search = RandomizedSearchCV(model, param_distributions=param_dist, n_iter=3, cv=3)
random_search.fit(X, y)

print("Random Search Best Parameters:", random_search.best_params_)
print("Random Search Best Score:", random_search.best_score_)


For Grid Search, we define a parameter grid param_grid that specifies the different values we want to try for the n_estimators and max_depth hyperparameters. We then create a GridSearchCV object, passing the model and parameter grid, and specifying the number of cross-validation folds with cv=3. We fit the Grid Search object to the data using fit(X, y).

After the Grid Search is performed, we print the best parameters found (grid_search.best_params_) and the best score (grid_search.best_score_).

For Random Search, we define a parameter distribution param_dist. Here it lists the same candidate values as the parameter grid, but instead of trying all nine combinations, RandomizedSearchCV samples only a fixed number of them. We create a RandomizedSearchCV object, passing the model, the parameter distribution, the number of iterations n_iter=3 (the number of random combinations to try), and the number of cross-validation folds cv=3. We fit the Random Search object to the data using fit(X, y).

After the Random Search is performed, we print the best parameters found (random_search.best_params_) and the best score (random_search.best_score_).
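
Random search is most useful when the candidates are drawn from distributions rather than from a short list. The sketch below assumes SciPy is available and uses scipy.stats.randint to sample integer values; the ranges, n_iter=5, and the name search are illustrative assumptions rather than recommended settings.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Each hyperparameter is described by a distribution to sample from.
# randint(low, high) draws integers from low (inclusive) to high (exclusive).
param_dist = {
    "n_estimators": randint(10, 50),
    "max_depth": randint(2, 10),
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_dist,
    n_iter=5,         # number of sampled combinations
    cv=3,
    random_state=42,  # makes the sampled combinations reproducible
)
search.fit(X, y)

print("Sampled combinations:", search.cv_results_["params"])
print("Best parameters:", search.best_params_)
print("Best score:", search.best_score_)
```

With distributions, the cost of the search is controlled by n_iter alone, independent of how wide the ranges are.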

#### Analyzing the result of grid search

Analyzing the result of grid search is an important step in hyperparameter tuning. Grid search exhaustively evaluates every combination in a defined grid of hyperparameters and reports the combination with the best mean cross-validated score. Looking beyond that single winner shows how performance varies across the different hyperparameter settings and supports better-informed decisions.

To analyze the result of grid search, you can follow these steps:

1. Access the grid search results: After performing grid search, you can access the results using the cv_results_ attribute of the grid search object. This attribute is a dictionary of arrays containing information about the search process, including the hyperparameter settings, mean scores, and standard deviations.
2. Extract relevant information: Extract the relevant information from the cv_results_ attribute based on your analysis requirements. Commonly used information includes the hyperparameter values, mean test scores, standard deviations, and any other relevant metrics.
3. Visualize the results: Create visualizations to better understand the impact of different hyperparameters on the model's performance. You can use plots such as bar plots, line plots, or heatmaps to visualize the mean scores or other relevant metrics across different hyperparameter settings.
4. Interpret the results: Analyze the visualizations to identify the hyperparameter settings that result in the best performance. Look for any trends, patterns, or trade-offs among the different hyperparameters. Consider the impact of each hyperparameter on the model's performance and assess whether the chosen parameter values make sense based on the data and domain knowledge.
5. Make decisions and refine the hyperparameter search: Based on the analysis, make decisions about which hyperparameter settings to choose. You may select the combination with the highest mean test score or consider a trade-off between performance and model complexity. If needed, refine the hyperparameter search by adjusting the grid space or adding new hyperparameters to explore.

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

# Load iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Define the model
model = RandomForestClassifier(random_state=42)  # fixed seed so the results are reproducible

# Define the parameter grid for grid search
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 5, 10]
}

# Perform grid search
grid_search = GridSearchCV(model, param_grid=param_grid, cv=3)
grid_search.fit(X, y)

# Access the grid search results
results = grid_search.cv_results_

# Convert results to a Pandas DataFrame
results_pd = pd.DataFrame(results)

# Print the first five rows of the DataFrame
display(results_pd.head())

# Extract relevant information
param_combinations = results['params']
mean_test_scores = results['mean_test_score']
std_test_scores = results['std_test_score']

# Find the index of the highest mean test score
best_index = np.argmax(mean_test_scores)

# Get the highest mean test score and its corresponding parameters
best_score = mean_test_scores[best_index]
best_params = param_combinations[best_index]


# Interpret the results and make decisions based on the analysis
print("Best Mean Test Score:", best_score)
print("Best Parameters:", best_params)


# Reshape the mean test scores into a 2D array.
# GridSearchCV enumerates the grid with parameter names sorted alphabetically,
# so max_depth varies slowest and n_estimators varies fastest. Reshape in that
# order, then transpose so that rows are n_estimators and columns are max_depth.
scores = np.array(mean_test_scores).reshape(len(param_grid['max_depth']), len(param_grid['n_estimators'])).T

# Plot the mean cross-validation scores as a heatmap
plt.figure()
sns.heatmap(scores, annot=True, fmt=".3f", cmap="viridis",
            xticklabels=param_grid['max_depth'], yticklabels=param_grid['n_estimators'])
plt.xlabel('max_depth')
plt.ylabel('n_estimators')
plt.title('Grid Search Results (Mean Test Scores)')
plt.show()

This code performs grid search to find the best hyperparameters for a RandomForestClassifier model using the Iris dataset. Let's go through the code and understand it step by step:

1. Loading the dataset:
    The Iris dataset is loaded using load_iris() function from sklearn.datasets.
    The input features are assigned to X, and the target variable is assigned to y.

2. Defining the model and the parameter grid:
    The RandomForestClassifier model is initialized.
    The parameter grid is defined using the param_grid dictionary. It specifies the values to be searched for n_estimators and max_depth.

3. Performing grid search:
    GridSearchCV is used to perform grid search. The model, parameter grid, and the number of cross-validation folds (cv) are passed as arguments.
    The grid search is executed by calling fit() on the grid search object, passing the input features (X) and target variable (y).

4. Accessing the grid search results:
    The cv_results_ attribute of the grid search object is accessed to obtain the results of the grid search.
    The results are stored in the results variable.

5. Converting results to a DataFrame:
    The results are converted to a Pandas DataFrame for easier analysis and visualization.
    results_pd is a DataFrame created from results.

6. Printing the first five rows of the DataFrame:
    The head() method is used to display the first five rows of the DataFrame.

7. Extracting relevant information:
    The relevant information from the results is extracted, including the parameter combinations, mean test scores, and standard deviations.
    These values are stored in param_combinations, mean_test_scores, and std_test_scores respectively.

8. Finding the best mean test score:
    The index of the highest mean test score is found using np.argmax().
    The best mean test score and its corresponding parameters are extracted using the index.

9. Printing the best mean test score and parameters:
    The best mean test score and its corresponding parameters are printed using print().

10. Reshaping mean test scores:
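    The mean test scores are reshaped into a 2D array, scores, using np.array.reshape().
    The dimensions of the array are based on the number of values for n_estimators and max_depth in the parameter grid.
    The reshape must follow the order in which GridSearchCV enumerates the grid (parameter names sorted alphabetically, with the last name varying fastest); otherwise the cells will not line up with the axis labels.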

11. Plotting the mean cross-validation scores as a heatmap:
    A heatmap is created using sns.heatmap(), which takes scores as the data to be plotted.
    The annot=True argument adds annotations to each cell with the mean test score.
    The fmt=".3f" argument specifies the formatting of the annotations.
    Colormap cmap="viridis" is used to visualize the scores.
    The tick labels are set using xticklabels and yticklabels, and the axis titles using plt.xlabel() and plt.ylabel().
    The heatmap is displayed using plt.show().

The code allows you to analyze the grid search results by printing the DataFrame, identifying the best mean test score and parameters, and visualizing the mean test scores using a heatmap. This information helps you make decisions about which hyperparameters to choose for your model.
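
One way to build the score grid without depending on the row order of cv_results_ is to pivot the results DataFrame on the parameter columns. The sketch below is an illustration under assumptions: it uses a smaller grid with only integer max_depth values, and the name score_table is hypothetical.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# A small illustrative grid (integer values only, to keep the pivot simple)
param_grid = {
    "n_estimators": [10, 20],
    "max_depth": [2, 5, 10],
}

grid_search = GridSearchCV(RandomForestClassifier(random_state=42),
                           param_grid=param_grid, cv=3)
grid_search.fit(X, y)

results_pd = pd.DataFrame(grid_search.cv_results_)

# Rows are n_estimators, columns are max_depth, cells are mean test scores.
# Each cell is matched by parameter value, not by position in cv_results_.
score_table = results_pd.pivot(index="param_n_estimators",
                               columns="param_max_depth",
                               values="mean_test_score")
print(score_table)
```

The resulting table can be passed directly to sns.heatmap, which then takes its tick labels from the index and columns.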

### Learning and Validation Curves

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import learning_curve, validation_curve
from sklearn.ensemble import RandomForestClassifier

# Load iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Define the model
model = RandomForestClassifier(random_state=42)  # fixed seed so the results are reproducible

# Learning Curves
train_sizes, train_scores, val_scores = learning_curve(model, X, y, train_sizes=np.linspace(0.1, 1.0, 5), cv=3)

# Calculate mean and standard deviation of training scores and validation scores
train_scores_mean = np.mean(train_scores, axis=1)
train_scores_std = np.std(train_scores, axis=1)
val_scores_mean = np.mean(val_scores, axis=1)
val_scores_std = np.std(val_scores, axis=1)

# Plot learning curves
plt.figure()
plt.title("Learning Curves")
plt.xlabel("Training Examples")
plt.ylabel("Score")
plt.grid()

plt.fill_between(train_sizes, train_scores_mean - train_scores_std, train_scores_mean + train_scores_std, alpha=0.1, color="r")
plt.fill_between(train_sizes, val_scores_mean - val_scores_std, val_scores_mean + val_scores_std, alpha=0.1, color="g")

plt.plot(train_sizes, train_scores_mean, 'o-', color="r", label="Training Score")
plt.plot(train_sizes, val_scores_mean, 'o-', color="g", label="Validation Score")

plt.legend(loc="best")
plt.show()

# Validation Curves
param_range = np.arange(1, 11)
train_scores, val_scores = validation_curve(model, X, y, param_name="max_depth", param_range=param_range, cv=3)

# Calculate mean and standard deviation of training scores and validation scores
train_scores_mean = np.mean(train_scores, axis=1)
train_scores_std = np.std(train_scores, axis=1)
val_scores_mean = np.mean(val_scores, axis=1)
val_scores_std = np.std(val_scores, axis=1)

# Plot validation curves
plt.figure()
plt.title("Validation Curves")
plt.xlabel("max_depth")
plt.ylabel("Score")
plt.grid()

plt.fill_between(param_range, train_scores_mean - train_scores_std, train_scores_mean + train_scores_std, alpha=0.1, color="r")
plt.fill_between(param_range, val_scores_mean - val_scores_std, val_scores_mean + val_scores_std, alpha=0.1, color="g")

plt.plot(param_range, train_scores_mean, 'o-', color="r", label="Training Score")
plt.plot(param_range, val_scores_mean, 'o-', color="g", label="Validation Score")

plt.legend(loc="best")
plt.show()


For learning curves, we use the learning_curve function from scikit-learn to calculate the training and validation scores for different training set sizes. We specify the model, input data X, target data y, the sizes of the training sets to use with train_sizes, and the number of cross-validation folds with cv=3.

After obtaining the scores, we calculate the mean and standard deviation for the training and validation scores. Then, we plot the learning curves, showing the mean scores as well as the shaded regions indicating the standard deviation.

For validation curves, we use the validation_curve function from scikit-learn to calculate the training and validation scores for different values of a hyperparameter (max_depth in this case). We specify the model, input data X, target data y, the name of the hyperparameter with param_name, the range of values to test with param_range, and the number of cross-validation folds with cv=3.

After obtaining the scores, we calculate the mean and standard deviation for the training and validation scores. Then, we plot the validation curves, showing the mean scores as well as the shaded regions indicating the standard deviation.

**To produce learning curves and validation curves, you can follow these general steps:**

1. Load the dataset: Begin by loading your dataset that you will use for training and evaluation.
2. Define the model: Choose the machine learning model you want to analyze and fine-tune.

3. Learning Curves:
    - Use the learning_curve function to generate learning curves. Specify the model, input features (X), target variable (y), the sizes of the training sets to use (train_sizes), and the number of cross-validation folds (cv).
    - Calculate the mean and standard deviation of the training and validation scores across the different training set sizes.
    - Plot the learning curves using the mean scores. You can also visualize the standard deviation using shaded regions.

4. Validation Curves:
    - Use the validation_curve function to generate validation curves. Specify the model, input features (X), target variable (y), the name of the hyperparameter to tune (param_name), the range of values to test for the hyperparameter (param_range), and the number of cross-validation folds (cv).
    - Calculate the mean and standard deviation of the training and validation scores across the different hyperparameter values.
    - Plot the validation curves using the mean scores. You can also visualize the standard deviation using shaded regions.
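
Besides plotting, the validation-curve scores can be summarized numerically. The following sketch picks the max_depth with the highest mean validation score and reports the gap between training and validation scores; the shortened parameter range, n_estimators=20, and the names val_mean, gap, and best_depth are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import validation_curve

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=20, random_state=42)

param_range = np.arange(1, 6)
train_scores, val_scores = validation_curve(
    model, X, y, param_name="max_depth", param_range=param_range, cv=3
)

# Both score arrays have shape (number of parameter values, number of folds)
train_mean = train_scores.mean(axis=1)
val_mean = val_scores.mean(axis=1)

# A large gap between training and validation scores suggests overfitting
gap = train_mean - val_mean

# Parameter value with the highest mean validation score
best_depth = param_range[np.argmax(val_mean)]

for depth, t, v, g in zip(param_range, train_mean, val_mean, gap):
    print(f"max_depth={depth}: train={t:.3f}, validation={v:.3f}, gap={g:.3f}")
print("Best max_depth by mean validation score:", best_depth)
```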

### Evaluation Metrics and Scoring

#### Metrics for Binary Classification

When working with binary classification problems, various metrics are used to evaluate the performance of machine learning models. These metrics provide insights into the model's ability to distinguish between the two classes and make accurate predictions. Let's explore some commonly used metrics for binary classification:

1. Accuracy: Accuracy is the most straightforward metric, representing the proportion of correct predictions out of the total number of predictions. It is calculated by dividing the number of correct predictions by the total number of samples.
2. Precision: Precision measures the proportion of true positive predictions (correctly predicted positive class) out of all positive predictions. It focuses on the accuracy of positive predictions and helps evaluate the model's ability to avoid false positives.
3. Recall (Sensitivity/True Positive Rate): Recall calculates the proportion of true positive predictions out of all actual positive samples. It quantifies the model's ability to find all positive instances and avoid false negatives.
4. F1 Score: The F1 score is the harmonic mean of precision and recall. It provides a balanced measure of the model's performance by considering both precision and recall. It is useful when there is an imbalance between the classes.
5. Receiver Operating Characteristic (ROC) Curve: The ROC curve is a graphical representation of the trade-off between true positive rate (sensitivity) and false positive rate (1-specificity) at different classification thresholds. It helps visualize the model's performance across various threshold values.
6. Area Under the ROC Curve (AUC-ROC): The AUC-ROC is a scalar value that quantifies the overall performance of a binary classification model. It represents the area under the ROC curve and provides a measure of the model's ability to distinguish between positive and negative classes.
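
The definitions above can be checked by hand on a tiny example. This sketch uses ten made-up labels and predictions (illustrative data, not taken from any dataset) and computes accuracy, precision, recall, and F1 directly from the confusion-matrix counts.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels and predictions for illustration
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# For binary labels 0/1, ravel() returns the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # of the predicted positives, how many are correct
recall = tp / (tp + fn)      # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
print(f"Accuracy:  {accuracy}")
print(f"Precision: {precision}")
print(f"Recall:    {recall}")
print(f"F1 Score:  {f1}")
```

These values should agree with accuracy_score, precision_score, recall_score, and f1_score from sklearn.metrics on the same inputs.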


In [None]:
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import metrics

# Load dataset
cancer = datasets.load_breast_cancer()
X = cancer.data
y = cancer.target

# Split into training and testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Build and fit model
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)

# Predict on test data
y_pred = model.predict(X_test)

# Calculate metrics
accuracy = metrics.accuracy_score(y_test, y_pred)
precision = metrics.precision_score(y_test, y_pred)
recall = metrics.recall_score(y_test, y_pred)
f1 = metrics.f1_score(y_test, y_pred)
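# AUC-ROC is computed from predicted probabilities of the positive class, not from hard labels
y_pred_proba = model.predict_proba(X_test)[:, 1]
auc_roc = metrics.roc_auc_score(y_test, y_pred_proba)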

print(f"Accuracy: {accuracy}")
print(f"Precision: {precision}")
print(f"Recall: {recall}")
print(f"F1 Score: {f1}")
print(f"AUC-ROC: {auc_roc}")

# Confusion matrix
conf_matrix = metrics.confusion_matrix(y_test, y_pred)

# Plot the confusion matrix
plt.imshow(conf_matrix, cmap='Blues', interpolation='nearest')
plt.colorbar()
plt.xticks([0, 1], ['Negative', 'Positive'])
plt.yticks([0, 1], ['Negative', 'Positive'])
plt.xlabel('Predicted label')
plt.ylabel('True label')
plt.title('Confusion Matrix')

# Add labels to the plot
class_names = ['TN', 'FP', 'FN', 'TP']
thresh = conf_matrix.max() / 2.
for i in range(conf_matrix.shape[0]):
    for j in range(conf_matrix.shape[1]):
        plt.text(j, i, f"{class_names[i*2 + j]} = {conf_matrix[i, j]}",
                 horizontalalignment="center",
                 color="white" if conf_matrix[i, j] > thresh else "black")

plt.show()

In [None]:
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score

# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Build and fit a logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict probabilities of positive class
y_pred_proba = model.predict_proba(X_test)[:, 1]

# Calculate the false positive rate (fpr) and true positive rate (tpr)
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)

# Calculate the area under the ROC curve (AUC-ROC)
auc_roc = roc_auc_score(y_test, y_pred_proba)

# Plot the ROC curve
plt.plot(fpr, tpr, label='ROC curve (AUC = %0.2f)' % auc_roc)
plt.plot([0, 1], [0, 1], 'k--')  # Random classifier line
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend(loc='lower right')
plt.show()

# Print the AUC-ROC score
print("AUC-ROC:", auc_roc)


The Receiver Operating Characteristic (ROC) curve is a graphical representation of the performance of a binary classifier as the discrimination threshold is varied. It plots the true positive rate (TPR) against the false positive rate (FPR) at different threshold settings. The TPR is also known as sensitivity or recall, while the FPR is the complement of the specificity.

The ROC curve is useful for evaluating the trade-off between the true positive rate and the false positive rate of a classifier. It provides a visual representation of how well the classifier is able to distinguish between the positive and negative classes at different decision thresholds.

Interpreting the ROC curve:

- The closer the ROC curve is to the top-left corner of the plot, the better the classifier's performance. This indicates high TPR and low FPR across various threshold settings.
- A random classifier would have an ROC curve that is a diagonal line from the bottom-left to the top-right, indicating no ability to separate positive from negative instances.
- A classifier that performs worse than random would have an ROC curve that falls below the diagonal line.
- Each point on the ROC curve corresponds to one decision threshold. By adjusting the threshold, you can achieve different trade-offs between TPR and FPR.

The Area Under the ROC Curve (AUC-ROC) is a numerical measure that summarizes the performance of a classifier across all possible threshold settings. It is the area underneath the ROC curve and ranges from 0 to 1, where:

- A perfect classifier has an AUC-ROC score of 1: it ranks every positive instance above every negative one, so some threshold achieves a TPR of 1 with an FPR of 0.
- A random classifier has an AUC-ROC score of 0.5, indicating no discriminatory power, as it performs as well as flipping a coin.
- A classifier with an AUC-ROC score above 0.5 has some level of discriminatory power, with higher values indicating better performance.

In summary, the ROC curve visually displays the trade-off between true positive rate and false positive rate at various decision thresholds, while the AUC-ROC provides a single metric to assess the overall performance of the classifier.
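
To see how a single threshold maps to one point on the ROC curve, the sketch below computes TPR and FPR by hand for four made-up scores and compares the area under the resulting curve with roc_auc_score. The data and the helper tpr_fpr_at are hypothetical and exist only for this illustration.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Made-up true labels and predicted scores for illustration
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

def tpr_fpr_at(threshold):
    """Return (TPR, FPR) when scores >= threshold are predicted positive."""
    predicted_positive = y_score >= threshold
    tp = np.sum(predicted_positive & (y_true == 1))
    fp = np.sum(predicted_positive & (y_true == 0))
    tpr = tp / np.sum(y_true == 1)
    fpr = fp / np.sum(y_true == 0)
    return float(tpr), float(fpr)

# A stricter threshold gives fewer false positives but misses more positives
print("threshold 0.5 -> (TPR, FPR) =", tpr_fpr_at(0.5))
print("threshold 0.3 -> (TPR, FPR) =", tpr_fpr_at(0.3))

# roc_curve sweeps the thresholds; auc integrates the curve with the trapezoidal rule
fpr, tpr, thresholds = roc_curve(y_true, y_score)
area = auc(fpr, tpr)

print("AUC from the curve:    ", area)
print("AUC from roc_auc_score:", roc_auc_score(y_true, y_score))
```

In this example three of the four positive-negative pairs are ranked correctly, which matches an AUC of 0.75.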

#### Metrics for Multiclass Classification


In [None]:
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import metrics

# Generate a synthetic dataset with four classes
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_classes=4, random_state=42)

# Split into training and testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Build and fit model
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)

# Predict on test data
y_pred = model.predict(X_test)

# Calculate metrics
accuracy = metrics.accuracy_score(y_test, y_pred)
precision = metrics.precision_score(y_test, y_pred, average='macro')
recall = metrics.recall_score(y_test, y_pred, average='macro')
f1 = metrics.f1_score(y_test, y_pred, average='macro')

print(f"Accuracy: {accuracy}")
print(f"Precision: {precision}")
print(f"Recall: {recall}")
print(f"F1 Score: {f1}")

# Confusion matrix
conf_matrix = metrics.confusion_matrix(y_test, y_pred)

# Plot the confusion matrix
plt.imshow(conf_matrix, cmap='Blues', interpolation='nearest')
plt.colorbar()
plt.xticks(range(4))
plt.yticks(range(4))
plt.xlabel('Predicted label')
plt.ylabel('True label')
plt.title('Confusion Matrix')

# Add labels to the plot
for i in range(conf_matrix.shape[0]):
    for j in range(conf_matrix.shape[1]):
        plt.text(j, i, str(conf_matrix[i, j]),
                 horizontalalignment="center",
                 color="white" if conf_matrix[i, j] > conf_matrix.max() / 2. else "black")

plt.show()



#### Regression Metrics

When working with regression problems, various metrics are used to evaluate the performance of machine learning models. These metrics provide insights into the model's ability to make accurate predictions. Let's explore some commonly used metrics for regression:

In [None]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics
import numpy as np

# Load dataset
california = datasets.fetch_california_housing()
X = california.data
y = california.target

# Split into training and testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Build and fit model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on test data
y_pred = model.predict(X_test)

# Calculate metrics
mae = metrics.mean_absolute_error(y_test, y_pred)
mse = metrics.mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = metrics.r2_score(y_test, y_pred)

print(f"Mean Absolute Error: {mae}")
print(f"Mean Squared Error: {mse}")
print(f"Root Mean Squared Error: {rmse}")
print(f"R2 Score: {r2}")
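
The regression metrics computed above follow directly from their definitions. As a minimal sketch, the code below evaluates them with NumPy on four made-up values (illustrative data only), so the formulas are visible.

```python
import numpy as np

# Made-up targets and predictions for illustration
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

errors = y_true - y_pred

# Mean Absolute Error: average magnitude of the errors
mae = np.mean(np.abs(errors))

# Mean Squared Error: average of the squared errors (penalizes large errors more)
mse = np.mean(errors ** 2)

# Root Mean Squared Error: same units as the target
rmse = np.sqrt(mse)

# R2: 1 minus the ratio of residual variation to total variation around the mean
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"Mean Absolute Error: {mae}")
print(f"Mean Squared Error: {mse}")
print(f"Root Mean Squared Error: {rmse}")
print(f"R2 Score: {r2}")
```

These should match mean_absolute_error, mean_squared_error, and r2_score from sklearn.metrics on the same arrays.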


#### Decision Boundaries

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Generate a synthetic dataset
X, y = make_blobs(n_samples=(400, 50), cluster_std=[7.0, 2], random_state=22)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Initialize and fit the SVC model
svc = SVC(gamma=.05)
svc.fit(X_train, y_train)

# Generate grid of points to evaluate the model
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 100),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 100))
Z = svc.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

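# Plot the margin region (decision function between -1 and 1), the decision boundary (level 0), and the data points
plt.contourf(xx, yy, Z, levels=[-1, 0, 1], alpha=0.5, colors='blue')
plt.contour(xx, yy, Z, levels=[0], colors='black')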
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='rainbow')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Decision Boundaries')
plt.show()


The graph displays the decision boundaries generated by the Support Vector Classifier (SVC) model.

- Decision Boundaries: The decision boundary is the curve that separates the regions assigned to each class; for the SVC it is where the decision function equals 0. In the graph, the filled blue contour covers the region where the decision function lies between -1 and 1, that is, the margin on either side of the boundary. The contour levels (-1, 0, 1) are decision function values, not class labels.

- Data Points: The data points from the synthetic dataset are scattered across the plot. Each data point represents a sample with two feature values. The color of each data point represents its true class label. Different classes are indicated by different colors, creating a rainbow-like effect in the scatter plot.

By analyzing the graph, you can gain insights into the performance and behavior of the SVC model:

- Separation of Classes: The decision boundaries should ideally separate the different classes effectively. In other words, each class should be enclosed within its respective region on the plot. The graph allows you to visually assess how well the SVC model distinguishes between the classes.

- Overlapping or Misclassified Points: If some data points of different classes overlap or are misclassified, it indicates that the model may have difficulty separating those particular instances. The decision boundaries may not perfectly capture the underlying patterns in the data, leading to classification errors.

- Decision Thresholds: The contour levels (-1, 0, 1) are values of the SVC decision function. The model predicts one class where the function is positive and the other where it is negative, so 0 is the decision threshold, while -1 and 1 mark the edges of the margin. Points outside the shaded band have decision values of larger magnitude, which means they lie further from the boundary.
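
The relationship between the decision function and the predicted class can be checked directly. This is a small sketch for the binary case, assuming the sign of decision_function determines the prediction (positive values map to the second entry of classes_); the dataset parameters and the names pred_from_sign and inside_margin are illustrative.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two-class synthetic dataset for illustration
X, y = make_blobs(n_samples=200, centers=2, random_state=0)

svc = SVC(gamma=0.05)
svc.fit(X, y)

# Signed score for each sample; 0 corresponds to the decision boundary
scores = svc.decision_function(X)

# Positive scores map to classes_[1], negative scores to classes_[0]
pred_from_sign = svc.classes_[(scores > 0).astype(int)]

# Samples whose score lies strictly between -1 and 1 fall inside the margin band
inside_margin = np.abs(scores) < 1

print("Sign-based predictions match predict():",
      np.array_equal(pred_from_sign, svc.predict(X)))
print("Samples inside the margin:", int(inside_margin.sum()), "of", len(X))
```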