# Supervised Learning: More Classification

In [None]:
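import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons  # used to generate Dataset 2
from sklearn.inspection import DecisionBoundaryDisplay  # used to plot decision boundaries
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, LinearSVC
from sklearn.metrics import classification_report, confusion_matrix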

- Practice implementing classifier algorithms
- Make or find 2 small datasets that are clean (or walk through/prompt how to clean the data)

Please write each of the following problems:
1. For the first dataset, set up outline/skeleton code showing how to assemble a basic `LinearSVC`
2. For the second, have them compare the different boundaries found by using `LinearSVC`, `SVC(kernel='linear')`, `SVC(kernel='rbf')`, and `SVC(kernel='poly')`


## LinearSVC Problem
- example/guidance up front
- discuss the problem/dataset
- justify the use of a LinearSVC
- discuss results using a confusion matrix and classification report
- need to define precision vs. recall
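
One way the skeleton for this problem might look is sketched below. The dataset is an assumption: `make_blobs` stands in for whatever clean dataset is eventually chosen, and the names `X1`, `y1`, and `linear_clf` are placeholders. The sketch also works out precision and recall by hand from the confusion matrix. Precision is the share of predicted positives that are truly positive, TP / (TP + FP). Recall is the share of actual positives that the model found, TP / (TP + FN).

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import (classification_report, confusion_matrix,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Placeholder dataset (assumption): two clusters that a straight line can mostly split
X1, y1 = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=42)
X1_train, X1_test, y1_train, y1_test = train_test_split(
    X1, y1, test_size=0.3, random_state=42, stratify=y1
)

# Fit a basic linear SVM and predict on the held-out data
linear_clf = LinearSVC(random_state=42, max_iter=5000)
linear_clf.fit(X1_train, y1_train)
y1_pred = linear_clf.predict(X1_test)

# For binary labels, confusion_matrix is laid out as [[TN, FP], [FN, TP]]
cm = confusion_matrix(y1_test, y1_pred)
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)  # of the points predicted as class 1, how many really are?
recall = tp / (tp + fn)     # of the true class 1 points, how many did we find?

print('Confusion matrix:\n', cm)
print('Precision by hand:', precision, '| sklearn:', precision_score(y1_test, y1_pred))
print('Recall by hand:   ', recall, '| sklearn:', recall_score(y1_test, y1_pred))
print(classification_report(y1_test, y1_pred))
```

The hand-computed values should match the class 1 row of the classification report, which is a useful check that the confusion matrix is being read in the right orientation.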

## Comparing Kernels (Dataset 2)
Imagine you’re a **botanist trying to classify flowers** growing in a garden. You measure two features for each flower:
- **Petal Width** (x-axis)
- **Petal Length** (y-axis)

The two flower types grow in **two curved patches** that hook around each other, a bit like interlocking crescent moons. We’ll try to teach a computer to separate them using different SVM kernels.

### What is a Kernel?
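A kernel function measures how similar two data points are, and an SVM uses those similarities to draw boundaries between classes. The cool part is that a kernel behaves as if the data had been mapped into a higher-dimensional space, without ever computing that mapping explicitly, so data that needs a curved boundary in the original space can become separable by a flat one!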

| Kernel | Shape of Boundary | When Useful |
|--|--|--|
| Linear | Straight line | Data looks mostly separable by a flat edge |
| Polynomial | Curved and flexible | Data has smooth bends (like petals curving outward) |
| RBF | Highly flexible | Data has tight circles or complex spirals |
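
To make the "higher-dimensional space" idea concrete, here is a small sketch using a degree-2 polynomial kernel on 2-D points. The helper names `phi` and `poly2_kernel` are made up for this illustration. The explicit map `phi` sends each point into 3-D, and the kernel gets the same inner product directly from the original 2-D points.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D point (illustrative helper)."""
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly2_kernel(a, b):
    """Homogeneous degree-2 polynomial kernel: (a . b)^2."""
    return np.dot(a, b) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

explicit = np.dot(phi(a), phi(b))  # inner product after mapping into 3-D
implicit = poly2_kernel(a, b)      # same quantity, computed in the original 2-D space

print('Explicit mapping:', explicit)
print('Kernel shortcut: ', implicit)
```

Both values agree up to floating-point rounding. For kernels such as RBF the implied space is infinite-dimensional, so the shortcut is the only practical route.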

Let’s generate a dataset that mimics our garden: two types of flowers arranged in a curved pattern.

In [None]:
# Generate non-linear dataset representing our 'flower garden' pattern
X2, y2 = make_moons(n_samples=300, noise=0.25, random_state=42)
X2_train, X2_test, y2_train, y2_test = train_test_split(X2, y2, test_size=0.3, random_state=42)

plt.figure(figsize=(6,4))
plt.scatter(X2_train[:,0], X2_train[:,1], c=y2_train, cmap='coolwarm', edgecolor='k')
plt.title('Flower Garden – Two Curved Patches')
plt.xlabel('Petal Width')
plt.ylabel('Petal Length')
plt.show()

### Question A – `LinearSVC`
Let’s start simple. Train a LinearSVC. What kind of line do you expect it to draw? Would it work well to split the two curved flower patches?

In [None]:
linear_moons = LinearSVC(random_state=42, max_iter=5000)
linear_moons.fit(X2_train, y2_train)
y2_pred_linear = linear_moons.predict(X2_test)

print('Confusion Matrix (LinearSVC):\n', confusion_matrix(y2_test, y2_pred_linear))
print('\nClassification Report (LinearSVC):\n', classification_report(y2_test, y2_pred_linear))

### Question B – `SVC(kernel='linear')`
Now try an SVM with a **linear kernel**. It is built on a different solver than `LinearSVC` (libsvm instead of liblinear) and, by default, minimizes a different loss (hinge instead of squared hinge), but it should draw a similar line. What happens if the data has just a little bit of curve?

In [None]:
svc_lin = SVC(kernel='linear', C=1.0, random_state=42)
svc_lin.fit(X2_train, y2_train)
y2_pred_lin = svc_lin.predict(X2_test)

print('Confusion Matrix (SVC Linear):\n', confusion_matrix(y2_test, y2_pred_lin))
print('\nClassification Report (SVC Linear):\n', classification_report(y2_test, y2_pred_lin))
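
One way to check how similar the two linear models are is to compare their learned weight vectors and their predictions. The sketch below regenerates the moons data so it runs on its own. The names `lin_a`, `lin_b`, `cosine`, and `agreement` are illustrative and not part of the notebook.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, LinearSVC

# Same data settings as the flower-garden dataset
X, y = make_moons(n_samples=300, noise=0.25, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

lin_a = LinearSVC(random_state=42, max_iter=5000).fit(X_train, y_train)
lin_b = SVC(kernel='linear', C=1.0).fit(X_train, y_train)

# Each model learns a weight vector w; the boundary is the line where w . x + b = 0
w_a = lin_a.coef_.ravel()
w_b = lin_b.coef_.ravel()

# Cosine similarity near 1 means the two boundary lines point in almost the same direction
cosine = np.dot(w_a, w_b) / (np.linalg.norm(w_a) * np.linalg.norm(w_b))

# Fraction of test points on which both models predict the same class
agreement = np.mean(lin_a.predict(X_test) == lin_b.predict(X_test))

print('LinearSVC weights:   ', w_a, 'intercept:', lin_a.intercept_)
print('SVC(linear) weights: ', w_b, 'intercept:', lin_b.intercept_)
print('Cosine similarity of weight vectors:', cosine)
print('Prediction agreement on test set:   ', agreement)
```

If the two lines differ slightly, that likely reflects the different loss functions and solvers rather than a bug.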

### Question C – `SVC(kernel='rbf')`
Next, let’s use the **RBF (Radial Basis Function)** kernel. This kernel can handle **curved or circular data**, which makes it great for flower patches that grow in arcs.

How do you expect this model to perform compared to the linear ones?

In [None]:
svc_rbf = SVC(kernel='rbf', gamma=0.7, C=1.0, random_state=42)
svc_rbf.fit(X2_train, y2_train)
y2_pred_rbf = svc_rbf.predict(X2_test)

print('Confusion Matrix (SVC RBF):\n', confusion_matrix(y2_test, y2_pred_rbf))
print('\nClassification Report (SVC RBF):\n', classification_report(y2_test, y2_pred_rbf))
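
To see what the RBF kernel actually computes, the sketch below evaluates exp(-gamma * ||a - b||^2) by hand for a few made-up points and compares the result with scikit-learn's `rbf_kernel`. The helper `rbf_manual` and the sample `points` are illustrative assumptions. `gamma=0.7` matches the value used in the model above.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

gamma = 0.7
# Three made-up points: the first two are close together, the third is far away
points = np.array([[0.0, 0.0], [0.5, 0.0], [2.0, 1.0]])

def rbf_manual(a, b, gamma):
    """RBF similarity: 1 for identical points, decaying toward 0 with distance."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

K_manual = np.array([[rbf_manual(a, b, gamma) for b in points] for a in points])
K_sklearn = rbf_kernel(points, points, gamma=gamma)

print(np.round(K_manual, 3))
print('Matches sklearn:', np.allclose(K_manual, K_sklearn))
```

Nearby points get a similarity close to 1 and distant points close to 0. A larger `gamma` makes the similarity drop off faster, which lets the boundary bend more tightly around individual points.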

### Question D – `SVC(kernel='poly', degree=3)`
Finally, let’s test a **polynomial kernel** with degree 3. This kernel can learn gentle curves — imagine fitting a flexible ribbon around the garden to separate flower types. Does it perform closer to RBF or linear?

In [None]:
svc_poly = SVC(kernel='poly', degree=3, C=1.0, random_state=42)
svc_poly.fit(X2_train, y2_train)
y2_pred_poly = svc_poly.predict(X2_test)

print('Confusion Matrix (SVC Polynomial):\n', confusion_matrix(y2_test, y2_pred_poly))
print('\nClassification Report (SVC Polynomial):\n', classification_report(y2_test, y2_pred_poly))
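
The polynomial kernel in scikit-learn has the form (gamma * a . b + coef0)^degree, and `SVC` defaults to `coef0=0.0`. As a hedged experiment, the sketch below refits the degree-3 model with two `coef0` values to see whether the extra lower-degree terms that a nonzero `coef0` brings in change the test accuracy on this data. The name `poly_scores` is illustrative, and no particular outcome is assumed.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Same data settings as the flower-garden dataset
X, y = make_moons(n_samples=300, noise=0.25, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

poly_scores = {}
for coef0 in (0.0, 1.0):
    # coef0 shifts the kernel so lower-degree terms also contribute
    model = SVC(kernel='poly', degree=3, coef0=coef0, C=1.0)
    model.fit(X_train, y_train)
    poly_scores[coef0] = model.score(X_test, y_test)

for coef0, acc in poly_scores.items():
    print(f'coef0={coef0}: test accuracy = {acc:.3f}')
```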

## Visualizing All Four Kernels
Let’s see how each kernel draws its decision boundary. Think of these as invisible fences separating the two flower species.

In [None]:
models = {
  'LinearSVC': linear_moons,
  'SVC (Linear)': svc_lin,
  'SVC (RBF)': svc_rbf,
  'SVC (Polynomial)': svc_poly
}

fig, axes = plt.subplots(2, 2, figsize=(12, 10))
axes = axes.ravel()

for ax, (name, mdl) in zip(axes, models.items()):
  DecisionBoundaryDisplay.from_estimator(
    mdl, X2_train, response_method='predict', cmap='coolwarm', alpha=0.8, ax=ax
  )
  ax.scatter(X2_train[:,0], X2_train[:,1], c=y2_train, cmap='coolwarm', edgecolor='k')
  ax.set_title(name)
  ax.set_xlabel('Petal Width')
  ax.set_ylabel('Petal Length')

plt.suptitle('Decision Boundaries – Classifying Flower Patches', fontsize=14)
plt.tight_layout()
plt.show()
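
Alongside the plots, a small table of train and test accuracy can help with the reflection questions. This is a sketch that refits the four models on freshly generated moons data so it runs on its own. The names `candidates` and `summary` are illustrative.

```python
import pandas as pd
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, LinearSVC

# Same data settings as the flower-garden dataset
X, y = make_moons(n_samples=300, noise=0.25, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

candidates = {
    'LinearSVC': LinearSVC(random_state=42, max_iter=5000),
    'SVC (Linear)': SVC(kernel='linear', C=1.0),
    'SVC (RBF)': SVC(kernel='rbf', gamma=0.7, C=1.0),
    'SVC (Polynomial)': SVC(kernel='poly', degree=3, C=1.0),
}

rows = []
for name, model in candidates.items():
    model.fit(X_train, y_train)
    rows.append({
        'model': name,
        'train_accuracy': model.score(X_train, y_train),
        'test_accuracy': model.score(X_test, y_test),
    })

summary = pd.DataFrame(rows).set_index('model')
print(summary.round(3))
```

A large gap between train and test accuracy would hint at overfitting, while low scores on both would suggest the boundary shape is too rigid for the data.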

## Reflection Questions
1. Which kernel handled the curved pattern best? How can you tell from the plot?
2. If your garden had more irregular patterns, which kernel would you experiment with next?
3. Did the models with more complex kernels always perform better? Why or why not?

Take a moment to jot down your observations before we review together.