# Supervised Learning: More Classification

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, LinearSVC
from sklearn.metrics import classification_report, confusion_matrix

- Practice implementing classifier algorithms
- Make or find 2 small datasets that are clean (or walk through/prompt how to clean the data)

Please write each of the following problems:
1. For the first dataset, set up outline/skeleton code showing how to assemble a basic `LinearSVC`
2. For the second, have them compare the different boundaries found by using `LinearSVC`, `SVC(kernel='linear')`, and `SVC(kernel='rbf')`

## LinearSVC Problem
- example/guidance up front
- discuss the problem/dataset
- justify the use of a LinearSVC
- discuss results using a confusion matrix and classification report
- need to define precision vs. recall
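
As a starting point for the skeleton described above, here is a minimal sketch of the usual fit/predict/evaluate flow. The dataset is a placeholder: `make_blobs` with two hand-picked centers is an assumption standing in for whichever clean dataset is eventually chosen, and the names `X_demo`, `y_demo`, and `clf` are illustrative only. Scaling is included because linear SVMs are sensitive to feature scale.

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Placeholder data (assumption): two well-separated Gaussian clusters.
X_demo, y_demo = make_blobs(
    n_samples=200, centers=[[-2, -2], [2, 2]], cluster_std=1.0, random_state=0
)

# Hold out 30% of the rows for testing, keeping class proportions equal.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)

# Standardize the features, then fit the linear SVM.
clf = make_pipeline(StandardScaler(), LinearSVC(random_state=0, max_iter=5000))
clf.fit(X_tr, y_tr)
y_hat = clf.predict(X_te)

# Rows of the confusion matrix are true classes, columns are predicted classes.
cm = confusion_matrix(y_te, y_hat)
print("Confusion matrix:\n", cm)
print("\nClassification report:\n", classification_report(y_te, y_hat))
```

A linear model is a reasonable first choice when the classes look roughly separable by a straight line, as these placeholder clusters are; the same skeleton applies to a real dataset once the features and labels are swapped in.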

## Comparing Kernels (20–30 Minute Guided Section)

In this section, we'll explore how different **Support Vector Machine (SVM)** kernels separate data. The kernel determines the shape of the decision boundary an SVM can learn: some kernels produce straight lines, others produce curves.

**Goal:** By the end of this exercise, you should be able to:
- Understand what kernels do and why we might choose one over another.
- Compare how linear and non-linear boundaries look.
- Evaluate each model using confusion matrices and classification reports.

### Background
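A **kernel** is a function that measures how similar two data points are. Using a kernel is equivalent to mapping the data into a higher-dimensional space where the classes are easier to separate, but the mapping is never computed explicitly (this shortcut is known as the kernel trick).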

| Kernel | Description | Typical Shape |
|--|--|--|
| Linear | Draws a straight line (or hyperplane) between classes | Straight boundary |
| Polynomial | Curved boundaries that can adjust for complex patterns | Curved depending on degree |
| RBF (Radial Basis Function) | Measures similarity using distance; good for circular or other non-linear data | Smooth, curved boundaries |
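
To make the "similarity using distance" idea in the table concrete, the sketch below computes the RBF kernel by hand with the standard formula $k(a, b) = \exp(-\gamma \lVert a - b \rVert^2)$ and checks it against scikit-learn's `rbf_kernel`. The helper `rbf_similarity` and the three sample points are illustrative assumptions, not part of the exercise.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel


def rbf_similarity(a, b, gamma):
    """RBF kernel value: exp(-gamma * squared Euclidean distance)."""
    return np.exp(-gamma * np.sum((a - b) ** 2))


# Three illustrative points: two close together, one far away.
points = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 4.0]])
gamma = 0.7

near = rbf_similarity(points[0], points[1], gamma)  # close pair
far = rbf_similarity(points[0], points[2], gamma)   # distant pair

# scikit-learn computes the same quantity for every pair of rows at once.
K = rbf_kernel(points, gamma=gamma)

print("similarity of the close pair:  ", near)
print("similarity of the distant pair:", far)
print("full kernel matrix:\n", K)
```

Nearby points get a similarity close to 1 and distant points get a similarity close to 0, which is why an RBF SVM can build boundaries that bend around local clusters of points.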

We’ll compare **LinearSVC**, **SVC(kernel='linear')**, and **SVC(kernel='rbf')** using the same dataset to see how the choice of kernel changes both the boundary and model accuracy.

In [None]:
# Step 1: Create a simple non-linear dataset
from sklearn.datasets import make_moons
X, y = make_moons(n_samples=300, noise=0.25, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

plt.figure(figsize=(6, 4))
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap='coolwarm', edgecolor='k')
plt.title('Our Nonlinear Dataset: Two Interlocking Moons')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()

### Question 1: LinearSVC

Let’s start simple with a **LinearSVC** model. This classifier tries to draw a straight line that separates the data.

**Task:**
- Fit a LinearSVC on `X_train` and `y_train`.
- Predict on the test set.
- Compute and print a confusion matrix and classification report.
- Discuss what you observe — is a straight line enough here?

In [None]:
# LinearSVC Implementation
linear_model = LinearSVC(random_state=42, max_iter=5000)
linear_model.fit(X_train, y_train)
y_pred_linear = linear_model.predict(X_test)

print("Confusion Matrix (LinearSVC):\n", confusion_matrix(y_test, y_pred_linear))
print("\nClassification Report (LinearSVC):\n", classification_report(y_test, y_pred_linear))

### Precision vs. Recall
Before we interpret the results, let’s define two key metrics:
- **Precision**: Out of all the positive predictions the model made, what fraction were actually correct?
  $$ \text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}} $$
- **Recall**: Out of all the actual positive examples, what fraction did the model correctly identify?
  $$ \text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} $$

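A perfect model has both precision and recall equal to 1.0, but in practice improving one often lowers the other. For example, a model that labels more borderline points as positive catches more true positives (higher recall) but also produces more false positives (lower precision).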
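
The two formulas can be checked by hand on a tiny example. The labels below are made up for illustration (they are not output from any model in this notebook); the sketch reads the four counts from a confusion matrix and compares the hand-computed values with scikit-learn's `precision_score` and `recall_score`.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up labels for illustration: 4 actual positives, 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

# For binary labels 0/1, ravel() returns the counts in this order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)  # of 3 predicted positives, 2 are correct
recall = tp / (tp + fn)     # of 4 actual positives, 2 are found

print(f"TP={tp}, FP={fp}, FN={fn}, TN={tn}")
print(f"precision = {precision:.3f}, recall = {recall:.3f}")

# The library functions agree with the hand calculation.
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred))
```

Here precision is 2/3 and recall is 1/2: the model is right about most of the positives it predicts, yet it misses half of the positives that exist.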

### Question 2: SVC (Linear Kernel)
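Now, let’s use `SVC(kernel='linear')` instead of `LinearSVC`. Both fit a straight-line boundary, but they are not identical: `LinearSVC` is built on liblinear and by default minimizes the squared hinge loss and regularizes the intercept, while `SVC` is built on libsvm and uses the standard hinge loss. The two boundaries are therefore usually close but not the same, and `SVC` also exposes its support vectors.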

**Task:**
- Train an `SVC(kernel='linear')` on the same data.
- Compare its performance and boundary to the `LinearSVC` results.
- Plot the decision boundary (the combined visualization later in this section does this for all three models).
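
One way to compare the two linear models beyond their reports is to look at the line each one learned. The sketch below is self-contained, so it regenerates the moons data with the same settings used in this section; the variable names `lin`, `svc`, and `agreement` are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, LinearSVC

# Same data settings as the rest of this section.
X_m, y_m = make_moons(n_samples=300, noise=0.25, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_m, y_m, test_size=0.3, random_state=42)

lin = LinearSVC(random_state=42, max_iter=5000).fit(X_tr, y_tr)
svc = SVC(kernel='linear', C=1.0).fit(X_tr, y_tr)

# Each boundary is the line w[0]*x1 + w[1]*x2 + b = 0.
for name, model in [('LinearSVC', lin), ("SVC(kernel='linear')", svc)]:
    w = model.coef_.ravel()
    b = model.intercept_[0]
    print(f"{name:22s} w = {np.round(w, 3)}, b = {b:.3f}")

# Fraction of test points on which the two models make the same prediction.
agreement = np.mean(lin.predict(X_te) == svc.predict(X_te))
print(f"prediction agreement on the test set: {agreement:.2%}")

# Only SVC stores the training points that define its margin.
print("number of support vectors in SVC:", svc.support_vectors_.shape[0])
```

If the weights and intercepts are similar and the agreement is high, the two models have found nearly the same line, and any differences in the reports come from a few points near the boundary.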

In [None]:
svc_linear = SVC(kernel='linear', C=1.0, random_state=42)
svc_linear.fit(X_train, y_train)
y_pred_svc_linear = svc_linear.predict(X_test)

print("Confusion Matrix (SVC - Linear Kernel):\n", confusion_matrix(y_test, y_pred_svc_linear))
print("\nClassification Report (SVC - Linear Kernel):\n", classification_report(y_test, y_pred_svc_linear))

### Question 3: SVC (RBF Kernel)
The **Radial Basis Function (RBF)** kernel can draw **curved** boundaries, which makes it powerful for non-linear data.

**Task:**
- Train `SVC(kernel='rbf')`.
- Plot its boundary (see the combined visualization that follows) and evaluate its performance using a confusion matrix and classification report.
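
How curved the RBF boundary becomes depends largely on `gamma`. The sketch below is an optional exploration, not part of the original task: it refits the model with three assumed `gamma` values (0.01, 0.7, and 100) on the same moons data and records training and test accuracy for each.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Same data settings as the rest of this section.
X_m, y_m = make_moons(n_samples=300, noise=0.25, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_m, y_m, test_size=0.3, random_state=42)

scores = {}
for gamma in [0.01, 0.7, 100.0]:
    model = SVC(kernel='rbf', gamma=gamma, C=1.0).fit(X_tr, y_tr)
    # Store (training accuracy, test accuracy) for this gamma.
    scores[gamma] = (model.score(X_tr, y_tr), model.score(X_te, y_te))
    print(f"gamma={gamma:<6} train={scores[gamma][0]:.3f} test={scores[gamma][1]:.3f}")
```

A very small `gamma` makes every point look similar to every other, so the boundary is nearly straight. A very large `gamma` makes each training point influence only its immediate neighborhood, which tends to raise training accuracy while the test accuracy stops improving or falls. A gap between the two numbers is a sign of overfitting.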

In [None]:
svc_rbf = SVC(kernel='rbf', gamma=0.7, C=1.0, random_state=42)
svc_rbf.fit(X_train, y_train)
y_pred_rbf = svc_rbf.predict(X_test)

print("Confusion Matrix (SVC - RBF Kernel):\n", confusion_matrix(y_test, y_pred_rbf))
print("\nClassification Report (SVC - RBF Kernel):\n", classification_report(y_test, y_pred_rbf))

### Visualizing All Kernels Together
Now let’s compare all three classifiers visually.

We’ll use **DecisionBoundaryDisplay** from scikit-learn (available in version 1.1 and later) to draw each SVM’s decision regions side by side.

In [None]:
from sklearn.inspection import DecisionBoundaryDisplay

models = {
    'LinearSVC': linear_model,
    'SVC (Linear Kernel)': svc_linear,
    'SVC (RBF Kernel)': svc_rbf
}

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

for ax, (name, model) in zip(axes, models.items()):
    # Shade each region by the class the fitted model predicts there
    DecisionBoundaryDisplay.from_estimator(
        model, X_train, response_method='predict', cmap='coolwarm', alpha=0.8, ax=ax
    )
    # Overlay the training points, colored by their true labels
    ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap='coolwarm', edgecolor='k')
    ax.set_title(name)
    ax.set_xlabel('Feature 1')
    ax.set_ylabel('Feature 2')

plt.suptitle('Comparing Decision Boundaries of Different Kernels', fontsize=14)
plt.tight_layout()
plt.show()

### Final Question: Interpreting the Results
1. Which model best captured the moon-shaped data?
2. How did the shape of each decision boundary affect accuracy?
3. Why might simpler models (like LinearSVC) still be valuable even when their accuracy is lower?
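
To support the discussion of these questions, it can help to see the three models' accuracies in one table. The sketch below is self-contained and refits the same three configurations used in this section; the names `candidates` and `summary` are illustrative assumptions.

```python
import pandas as pd
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, LinearSVC

# Same data settings as the rest of this section.
X_m, y_m = make_moons(n_samples=300, noise=0.25, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_m, y_m, test_size=0.3, random_state=42)

candidates = {
    'LinearSVC': LinearSVC(random_state=42, max_iter=5000),
    'SVC (Linear Kernel)': SVC(kernel='linear', C=1.0),
    'SVC (RBF Kernel)': SVC(kernel='rbf', gamma=0.7, C=1.0),
}

rows = []
for name, model in candidates.items():
    model.fit(X_tr, y_tr)
    rows.append({
        'model': name,
        'train_accuracy': model.score(X_tr, y_tr),
        'test_accuracy': model.score(X_te, y_te),
    })

# One row per model, so the three can be compared at a glance.
summary = pd.DataFrame(rows).set_index('model')
print(summary.round(3))
```

Accuracy alone does not settle question 3: a linear model trains quickly, and its two weights and intercept can be read directly, which is sometimes worth a small loss in accuracy.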