# Supervised Learning: More Classification

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, LinearSVC
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.datasets import make_classification, make_moons
from sklearn.inspection import DecisionBoundaryDisplay

np.random.seed(42)
```

## What We'll Do

- Practice training and evaluating classifiers on two small, clean datasets.
- Learn how to assemble a basic **LinearSVC** pipeline on a roughly linearly separable dataset (Dataset 1).
- Compare decision boundaries from four SVM variants on a non-linear dataset (Dataset 2).

### Roadmap

1) **Dataset 1 (LinearSVC skeleton):** an outline of the steps to train, predict, and evaluate.
2) **Dataset 2 (Comparing kernels):** LinearSVC vs. SVC(linear) vs. SVC(rbf) vs. SVC(polynomial), plus side-by-side boundary plots and a reflection.

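## LinearSVC Problem (Dataset 1)

**Goal:** Practice the structure of a basic classification workflow with `LinearSVC` on a **roughly linearly separable** dataset.

**Why LinearSVC here?** When the classes can be separated by a straight line (or nearly so), a linear SVM is fast, simple, and often effective.

**Evaluation:** We'll use a *confusion matrix* and a *classification report*. Below, TP, FP, and FN count the true positives, false positives, and false negatives for class 1. The report computes the same metrics for every class, treating that class as the positive one.

- **Precision**: Of the points predicted as class 1, what fraction were truly class 1?

  $\text{Precision} = \dfrac{TP}{TP + FP}$

- **Recall**: Of all true class-1 points, what fraction did we correctly find?

  $\text{Recall} = \dfrac{TP}{TP + FN}$

The report also shows the **F1-score** (the harmonic mean of precision and recall) and the **support** (the number of true points in each class).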
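To make the formulas concrete, here is a small sketch that computes precision and recall by hand from a confusion matrix and checks them against scikit-learn. The labels are invented purely for illustration. Note that scikit-learn's `confusion_matrix` puts true labels on the rows and predicted labels on the columns, so for labels `[0, 1]` it is laid out as `[[TN, FP], [FN, TP]]`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical toy labels, invented only to illustrate the formulas
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 1, 0, 0, 0, 0])

# For labels [0, 1], ravel() returns the cells in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

precision = tp / (tp + fp)  # of the predicted 1s, the fraction that are truly 1
recall = tp / (tp + fn)     # of the true 1s, the fraction we found

print(f"TP={tp}, FP={fp}, FN={fn}, TN={tn}")         # TP=3, FP=2, FN=1, TN=4
print(f"precision={precision:.2f}, recall={recall:.2f}")  # precision=0.60, recall=0.75

# scikit-learn's built-in metrics agree with the hand computation
assert np.isclose(precision, precision_score(y_true, y_pred))
assert np.isclose(recall, recall_score(y_true, y_pred))
```

Here the model finds 3 of the 4 true positives (recall 0.75) but also flags 2 negatives as positive (precision 0.60), which shows how the two metrics can move independently.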

### Dataset 1: LinearSVC Skeleton (You Fill In)

Use this outline to train and evaluate a `LinearSVC`. The comments in the code guide you through each step.

**Tips while you work:**

- Check the shapes with `X.shape` and `y.shape`.
- After training, always check performance with both a confusion matrix and a classification report.
- If you see a `ConvergenceWarning` (training didn't converge), increase `max_iter`.

```python
# 1) MAKE A SIMPLE, MOSTLY LINEAR DATASET
X, y = make_classification(
    n_samples=300,
    n_features=2,
    n_informative=2,
    n_redundant=0,
    n_clusters_per_class=1,
    class_sep=1.5,           # larger = more linearly separable
    random_state=42
)

# 2) TRAIN/TEST SPLIT
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# 3) BUILD THE MODEL
model = LinearSVC(random_state=42, max_iter=5000)

# TODO: FIT THE MODEL
# model.fit( ... )

# TODO: PREDICT ON TEST SET
# y_pred = model.predict( ... )

# 4) EVALUATE
# print('Confusion Matrix (LinearSVC):')
# print(confusion_matrix(y_test, y_pred))
# print('\nClassification Report (LinearSVC):')
# print(classification_report(y_test, y_pred))

# 5) QUICK VISUAL CHECK (OPTIONAL)
plt.figure(figsize=(5,4))
plt.scatter(X_train[:,0], X_train[:,1], c=y_train, edgecolor='k')
plt.title('Dataset 1: Training Scatter (roughly linear)')
plt.xlabel('Feature 1'); plt.ylabel('Feature 2')
plt.show()
```
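One possible way to fill in the TODOs is sketched below, as a reference to compare against rather than the only correct answer. It regenerates Dataset 1 with the same settings so the cell runs on its own, checks the shapes, then fits, predicts, and evaluates.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.metrics import classification_report, confusion_matrix

# Same dataset and split as the skeleton
X, y = make_classification(
    n_samples=300, n_features=2, n_informative=2, n_redundant=0,
    n_clusters_per_class=1, class_sep=1.5, random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
print("X:", X.shape, "y:", y.shape)  # shape check from the tips

model = LinearSVC(random_state=42, max_iter=5000)
model.fit(X_train, y_train)      # learn the separating line from training data
y_pred = model.predict(X_test)   # predict labels for unseen test points

print('Confusion Matrix (LinearSVC):')
print(confusion_matrix(y_test, y_pred))
print('\nClassification Report (LinearSVC):')
print(classification_report(y_test, y_pred))
```

With `test_size=0.25`, 75 of the 300 points are held out, so the confusion matrix entries should sum to 75.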

## Transition to Kernel Comparisons (Dataset 2)

Now that you've seen how a linear model behaves on roughly linearly separable data, let's switch to a dataset where a **straight line struggles**. We'll compare four SVM variants and see how their decision boundaries differ.

We'll use the **interlocking moons** dataset, a classic two-class pattern with a curved relationship between the classes.

## Comparing Kernels (Dataset 2)

**What is a kernel?** A kernel function measures how similar two points are, and the choice of kernel determines what kind of boundary an SVM can draw. With a non-linear kernel, the SVM finds a linear boundary in an implicit, higher-dimensional feature space, which appears as a curved boundary in the original two features.

| Kernel | What it tends to learn | When to try it |
|--|--|--|
| Linear | Straight boundary | Data looks linearly separable, or nearly so |
| Polynomial | Curved boundary with adjustable complexity (`degree`) | Data has smooth, medium-complexity patterns |
| RBF | Very flexible, smooth non-linear curves | Data has non-linear shapes (circles, moons) |

We'll compare:

- `LinearSVC` (linear)
- `SVC(kernel='linear')`
- `SVC(kernel='rbf')`
- `SVC(kernel='poly', degree=3)`

For each, you'll get a question prompt and an answer code cell you can run while you share your screen.
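The kernels in the table can each be written as a short formula. The sketch below implements them by hand with NumPy and checks them against `sklearn.metrics.pairwise`. The parameter values (`gamma=0.7`, `degree=3`, `coef0=1.0`) are chosen only for illustration; note that `SVC` defaults to `coef0=0.0` for the polynomial kernel.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 2))  # 4 example points with 2 features
B = rng.normal(size=(3, 2))  # 3 more points to compare against

gamma, degree, coef0 = 0.7, 3, 1.0  # illustrative values only

# Linear: plain dot product, K(x, z) = x . z
K_lin = A @ B.T

# Polynomial: K(x, z) = (gamma * x . z + coef0) ** degree
K_poly = (gamma * (A @ B.T) + coef0) ** degree

# RBF: K(x, z) = exp(-gamma * ||x - z||^2), broadcasting over all pairs
sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
K_rbf = np.exp(-gamma * sq_dists)

# Each hand-written kernel matches scikit-learn's implementation
assert np.allclose(K_lin, linear_kernel(A, B))
assert np.allclose(K_poly, polynomial_kernel(A, B, degree=degree, gamma=gamma, coef0=coef0))
assert np.allclose(K_rbf, rbf_kernel(A, B, gamma=gamma))

print(K_rbf.round(3))  # near 1 = similar points, near 0 = far apart
```

An SVM never builds the expanded feature space explicitly; it only needs these pairwise similarities. That is why the RBF kernel can produce smooth curved boundaries without much extra cost.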

```python
# Build the non-linear dataset (interlocking moons)
X2, y2 = make_moons(n_samples=300, noise=0.25, random_state=42)
X2_train, X2_test, y2_train, y2_test = train_test_split(
    X2, y2, test_size=0.3, random_state=42
)

plt.figure(figsize=(6,4))
plt.scatter(X2_train[:,0], X2_train[:,1], c=y2_train, cmap='coolwarm', edgecolor='k')
plt.title('Dataset 2: Training Data (Interlocking Moons)')
plt.xlabel('Feature 1'); plt.ylabel('Feature 2')
plt.show()
```

### Question A: Train `LinearSVC` on the moons

**Prompt:** Fit a `LinearSVC` and evaluate it. What do you expect a straight line to do on curved data?

```python
linear_moons = LinearSVC(random_state=42, max_iter=5000)
linear_moons.fit(X2_train, y2_train)
y2_pred_linear = linear_moons.predict(X2_test)

print('Confusion Matrix (LinearSVC on moons):\n', confusion_matrix(y2_test, y2_pred_linear))
print('\nClassification Report (LinearSVC on moons):\n', classification_report(y2_test, y2_pred_linear))
```
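On two features, a `LinearSVC` learns a single straight line, $w_1 x_1 + w_2 x_2 + b = 0$. The sketch below refits the Question A model on the same moons split and reads that line from `coef_` and `intercept_`. It also confirms that a point's predicted class is simply which side of the line it falls on, which is why one straight cut cannot follow both curved moons.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Same moons data and split as above
X2, y2 = make_moons(n_samples=300, noise=0.25, random_state=42)
X2_train, X2_test, y2_train, y2_test = train_test_split(
    X2, y2, test_size=0.3, random_state=42
)
linear_moons = LinearSVC(random_state=42, max_iter=5000).fit(X2_train, y2_train)

(w1, w2), b = linear_moons.coef_[0], linear_moons.intercept_[0]
print(f"Boundary: {w1:.3f}*x1 + {w2:.3f}*x2 + {b:.3f} = 0")

# decision_function is just w . x + b for each test point
scores = linear_moons.decision_function(X2_test)
manual = X2_test @ linear_moons.coef_[0] + b
assert np.allclose(scores, manual)

# A positive score means class 1, otherwise class 0
assert np.array_equal((scores > 0).astype(int), linear_moons.predict(X2_test))
```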

### Question B: Train `SVC(kernel='linear')`

**Prompt:** Fit a linear-kernel SVC and compare it with `LinearSVC`. Do the confusion matrices and reports look similar or different? Why might small differences appear?

```python
svc_lin = SVC(kernel='linear', C=1.0, random_state=42)
svc_lin.fit(X2_train, y2_train)
y2_pred_svc_lin = svc_lin.predict(X2_test)

print('Confusion Matrix (SVC linear on moons):\n', confusion_matrix(y2_test, y2_pred_svc_lin))
print('\nClassification Report (SVC linear on moons):\n', classification_report(y2_test, y2_pred_svc_lin))
```
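Small differences between `LinearSVC` and `SVC(kernel='linear')` are expected because the two solve slightly different problems. `LinearSVC` (backed by liblinear) uses the squared hinge loss by default and also regularizes the intercept, while `SVC` (backed by libsvm) uses the standard hinge loss. The sketch below refits both on the same moons split, prints each learned line, and measures how often their test predictions agree.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, LinearSVC

X2, y2 = make_moons(n_samples=300, noise=0.25, random_state=42)
X2_train, X2_test, y2_train, y2_test = train_test_split(
    X2, y2, test_size=0.3, random_state=42
)

lsvc = LinearSVC(random_state=42, max_iter=5000).fit(X2_train, y2_train)
svc_lin = SVC(kernel='linear', C=1.0).fit(X2_train, y2_train)

# Both models expose a weight vector and intercept for their straight boundary
print("LinearSVC   w, b:", lsvc.coef_[0].round(3), round(float(lsvc.intercept_[0]), 3))
print("SVC(linear) w, b:", svc_lin.coef_[0].round(3), round(float(svc_lin.intercept_[0]), 3))

# Fraction of test points on which the two models predict the same class
agreement = np.mean(lsvc.predict(X2_test) == svc_lin.predict(X2_test))
print(f"Test predictions agree on {agreement:.0%} of points")
```

If the weights are close but not identical, that reflects the loss and intercept differences above rather than a bug in either model.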

### Question C: Train `SVC(kernel='rbf')`

**Prompt:** Fit an RBF-kernel SVC. How does allowing curved boundaries change performance on the moon-shaped data? Check the confusion matrix and the per-class recall: is one class harder to classify than the other?

```python
svc_rbf = SVC(kernel='rbf', gamma=0.7, C=1.0, random_state=42)
svc_rbf.fit(X2_train, y2_train)
y2_pred_rbf = svc_rbf.predict(X2_test)

print('Confusion Matrix (SVC RBF on moons):\n', confusion_matrix(y2_test, y2_pred_rbf))
print('\nClassification Report (SVC RBF on moons):\n', classification_report(y2_test, y2_pred_rbf))
```
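The `gamma=0.7` used above controls how far each training point's influence reaches. A small `gamma` gives smoother, broader boundaries; a large `gamma` lets the boundary wrap tightly around individual points. The sketch below refits the RBF model for a few illustrative `gamma` values (not tuned) and prints training and test accuracy. A widening gap between the two is a typical sign of overfitting, though the exact numbers depend on the data split.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X2, y2 = make_moons(n_samples=300, noise=0.25, random_state=42)
X2_train, X2_test, y2_train, y2_test = train_test_split(
    X2, y2, test_size=0.3, random_state=42
)

results = {}
for gamma in [0.01, 0.1, 0.7, 5, 50]:  # illustrative values only
    model = SVC(kernel='rbf', gamma=gamma, C=1.0).fit(X2_train, y2_train)
    # .score() returns mean accuracy on the given data
    train_acc = model.score(X2_train, y2_train)
    test_acc = model.score(X2_test, y2_test)
    results[gamma] = (train_acc, test_acc)
    print(f"gamma={gamma:<5} train={train_acc:.3f} test={test_acc:.3f}")
```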

### Question D: Train `SVC(kernel='poly', degree=3)`

**Prompt:** Fit a polynomial-kernel SVC. Does a degree-3 polynomial capture the curved pattern well? Look at both precision and recall for each class.

```python
svc_poly = SVC(kernel='poly', degree=3, C=1.0, random_state=42)
svc_poly.fit(X2_train, y2_train)
y2_pred_poly = svc_poly.predict(X2_test)

print('Confusion Matrix (SVC poly d=3 on moons):\n', confusion_matrix(y2_test, y2_pred_poly))
print('\nClassification Report (SVC poly d=3 on moons):\n', classification_report(y2_test, y2_pred_poly))
```
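One detail worth knowing: scikit-learn's polynomial kernel is $(\gamma\, x \cdot z + c_0)^d$, where $c_0$ is `coef0` and $d$ is `degree`, and `SVC` defaults to `coef0=0.0`. With $c_0 = 0$ and $d = 3$, the kernel contains only cubic terms (no linear or quadratic ones), which can make a degree-3 model more rigid than expected. The sketch below compares a few illustrative `degree`/`coef0` combinations on the same split; whether `coef0=1.0` helps on this data is something to check, not assume.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X2, y2 = make_moons(n_samples=300, noise=0.25, random_state=42)
X2_train, X2_test, y2_train, y2_test = train_test_split(
    X2, y2, test_size=0.3, random_state=42
)

poly_results = {}
for degree in [2, 3, 5]:
    for coef0 in [0.0, 1.0]:  # 0.0 is SVC's default; 1.0 adds lower-degree terms
        model = SVC(kernel='poly', degree=degree, coef0=coef0, C=1.0)
        model.fit(X2_train, y2_train)
        test_acc = model.score(X2_test, y2_test)
        poly_results[(degree, coef0)] = test_acc
        print(f"degree={degree} coef0={coef0}: test accuracy = {test_acc:.3f}")
```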

## Visual Comparison: Four Decision Boundaries

Below we plot the decision regions for all four models on the same training data. As you discuss, connect what you see in the plots to what you saw in the confusion matrices.

Focus points while you present on screen:

- Do the linear models draw a straight split through the moons?
- How does the RBF boundary bend around the shapes?
- Where does the polynomial kernel help, and where does it overfit or underfit?

```python
models = {
    'LinearSVC': linear_moons,
    'SVC (Linear)': svc_lin,
    'SVC (RBF)': svc_rbf,
    'SVC (Poly d=3)': svc_poly
}

fig, axes = plt.subplots(2, 2, figsize=(12, 10))
axes = axes.ravel()

for ax, (name, mdl) in zip(axes, models.items()):
    DecisionBoundaryDisplay.from_estimator(
        mdl, X2_train, response_method='predict', cmap='coolwarm', alpha=0.8, ax=ax
    )
    ax.scatter(X2_train[:,0], X2_train[:,1], c=y2_train, cmap='coolwarm', edgecolor='k')
    ax.set_title(name)
    ax.set_xlabel('Feature 1'); ax.set_ylabel('Feature 2')

plt.suptitle('Decision Boundaries on Moons: Linear vs. Non-Linear Kernels', fontsize=14)
plt.tight_layout()
plt.show()
```
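To connect the plots to the numbers, it can help to put the four models' test metrics side by side. The sketch below refits the four models with the same settings as Questions A to D and collects accuracy plus per-class precision and recall into a pandas DataFrame. The column names are just illustrative choices.

```python
import pandas as pd
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, LinearSVC
from sklearn.metrics import accuracy_score, precision_score, recall_score

X2, y2 = make_moons(n_samples=300, noise=0.25, random_state=42)
X2_train, X2_test, y2_train, y2_test = train_test_split(
    X2, y2, test_size=0.3, random_state=42
)

# Same four configurations as in Questions A-D
models = {
    'LinearSVC': LinearSVC(random_state=42, max_iter=5000),
    'SVC (Linear)': SVC(kernel='linear', C=1.0),
    'SVC (RBF)': SVC(kernel='rbf', gamma=0.7, C=1.0),
    'SVC (Poly d=3)': SVC(kernel='poly', degree=3, C=1.0),
}

rows = []
for name, mdl in models.items():
    y_hat = mdl.fit(X2_train, y2_train).predict(X2_test)
    # average=None returns one score per class, in label order [0, 1]
    prec = precision_score(y2_test, y_hat, average=None)
    rec = recall_score(y2_test, y_hat, average=None)
    rows.append({
        'model': name,
        'accuracy': accuracy_score(y2_test, y_hat),
        'precision_0': prec[0], 'precision_1': prec[1],
        'recall_0': rec[0], 'recall_1': rec[1],
    })

summary = pd.DataFrame(rows).set_index('model').round(3)
print(summary)
```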

## Final Discussion

With your partner or table, discuss:

1) Which model performed best according to the confusion matrix and classification report? Did the plots support that?
2) Where did the linear models make systematic mistakes? Why?
3) Compare **precision** and **recall** across kernels. Did any model trade one for the other?
4) If you had a new dataset with more complicated curves, which kernel would you try first, and why?

Be ready to share one insight with the class.