# PA1 - Perceptron

## Background: What is a perceptron?
- A perceptron is a simple classifier: it computes a weighted sum of the input
  features and predicts based on the sign (binary) or the largest score (multiclass).
- Perceptrons work best when the classes can be separated by a single straight
  boundary (a line in 2D, a flat plane in 3D, and so on). They are fast and easy
  to understand, but they cannot capture complex curved boundaries on their own.
- If the data are not separable by one straight boundary, the algorithm keeps
  making mistakes and can never reach 100% training accuracy, however many epochs
  it runs. XOR in 2D is a classic example:
  - Class A: (0, 0), (1, 1)
  - Class B: (0, 1), (1, 0)
  - No single line can separate A from B in 2D.
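For intuition, the sketch below brute-forces a coarse grid of lines `w1*x1 + w2*x2 + b` and checks whether any of them classifies all four points correctly (score >= 0 means +1). It is not part of the assignment, and the helpers `separates` and `find_line` are hypothetical. A finite grid search cannot prove non-separability by itself, but the result agrees with the known fact: a line exists for OR, and none exists for XOR.

```python
import itertools
from typing import Optional, Sequence, Tuple


def separates(w1: float, w2: float, b: float,
              points: Sequence[Tuple[int, int]], labels: Sequence[int]) -> bool:
    """True if sign(w1*x1 + w2*x2 + b) matches every label (score >= 0 -> +1)."""
    for (x1, x2), label in zip(points, labels):
        pred = 1 if w1 * x1 + w2 * x2 + b >= 0 else -1
        if pred != label:
            return False
    return True


def find_line(points: Sequence[Tuple[int, int]], labels: Sequence[int],
              grid: Sequence[float]) -> Optional[Tuple[float, float, float]]:
    """Return the first (w1, w2, b) on the grid that separates the points, else None."""
    for w1, w2, b in itertools.product(grid, repeat=3):
        if separates(w1, w2, b, points, labels):
            return (w1, w2, b)
    return None


points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels_or = [-1, 1, 1, 1]    # OR: only (0, 0) is negative
labels_xor = [-1, 1, 1, -1]  # XOR: Class B (0,1),(1,0) -> +1; Class A (0,0),(1,1) -> -1

grid = [v / 2 for v in range(-8, 9)]  # -4.0, -3.5, ..., 4.0
print("OR line: ", find_line(points, labels_or, grid))
print("XOR line:", find_line(points, labels_xor, grid))  # None
```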

## Assignment rules
- Use only Python standard libraries for any training logic that you implement.
- Do **not** use machine learning libraries for training (e.g., no scikit-learn models, no PyTorch, no TensorFlow).
- Libraries to **load** data are allowed (e.g., `sklearn.datasets.load_digits`).
- The provided imports of scikit-learn in this notebook are only for data loading / utilities, not for implementing your perceptron.

References: [Perceptron](https://en.wikipedia.org/wiki/Perceptron), [Linear separability](https://en.wikipedia.org/wiki/Linear_separability)


In [None]:
# Install dependencies (run once and then comment out)
!pip install scikit-learn matplotlib


In [None]:
from __future__ import annotations

from dataclasses import dataclass
import random
from typing import List, Sequence, Tuple

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split as sk_train_test_split
import matplotlib.pyplot as plt


## Question 0: Setup + Utils
High-level goal: build small helper functions for loading and preparing data,
so the rest of the assignment can focus on the perceptron learning algorithm
rather than data plumbing.


Parts (TODO):
- 0.1 Load the dataset (TODO). [5 points]
- 0.2 Normalize features (TODO). [5 points]

Provided utility (not graded):
- Train/test split wrapper (scikit-learn).

Tip: keep these helpers simple and readable; you will reuse them throughout
later questions.


### Part 0.1: Load the dataset. [5 points]
We are doing this to get a standard, small image dataset quickly so
we can focus on the perceptron implementation instead of file I/O.

What to implement (TODO):
- Use the `load_digits` loader, which is already imported in the imports cell at the top (keeping dependencies centralized).
- Load the dataset object, extract `.data` and `.target`, and return them
  as plain Python lists (`List[List[float]]` and `List[int]`).


Dataset context (scikit-learn digits):
- 8x8 grayscale images, flattened to 64 features per example.
- Labels are digits 0–9.
- Pixel values are in the range [0, 16].
- Total samples: 1,797.
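To see what "flattened to 64 features" means, the sketch below splits a 64-value row back into 8 rows of 8 pixels and prints it as ASCII art. It assumes row-major order (the order in which scikit-learn's `digits.images` flattens into `digits.data`), and it uses a made-up row instead of real data so it runs without scikit-learn; `to_grid` and `to_ascii` are hypothetical helpers.

```python
from typing import List, Sequence


def to_grid(row: Sequence[float], width: int = 8) -> List[List[float]]:
    """Split a flattened image back into rows of `width` pixels (row-major order)."""
    return [list(row[i:i + width]) for i in range(0, len(row), width)]


def to_ascii(grid: Sequence[Sequence[float]], max_value: float = 16.0) -> str:
    """Render pixel intensities as characters; denser symbols mean larger values."""
    shades = " .:-=+*#%@"
    lines = []
    for r in grid:
        idx = [min(int(v / max_value * (len(shades) - 1)), len(shades) - 1) for v in r]
        lines.append("".join(shades[k] for k in idx))
    return "\n".join(lines)


# Made-up 64-value row: a vertical bar in columns 3 and 4 (values in [0, 16]).
fake_row = [16.0 if c in (3, 4) else 0.0 for _ in range(8) for c in range(8)]
grid = to_grid(fake_row)
print(to_ascii(grid))
```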

Docs: [digits dataset overview](https://scikit-learn.org/stable/datasets/toy_dataset.html#digits-dataset), [load_digits API](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html).
Example pattern (different dataset):
```python
from sklearn.datasets import load_iris

iris = load_iris()
X_example = iris.data.tolist()
y_example = iris.target.tolist()
```

Your task: do the same pattern with `load_digits()` inside `load_digits_data()`.

<img src="https://upload.wikimedia.org/wikipedia/commons/f/f7/MnistExamplesModified.png" alt="MNIST handwritten digit examples" width="420" style="display:block;margin:0 auto;" />

*Image: ["MnistExamplesModified.png"](https://commons.wikimedia.org/wiki/File:MnistExamplesModified.png) by Suvanjanprasai, [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/).* 


In [None]:
def load_digits_data() -> Tuple[List[List[float]], List[int]]:
    """
    Returns:
        X: list of 64-length feature lists (each is 8x8 flattened).
        y: list of integer labels in {0..9}.
    """
    # Note: load_digits is already imported in the imports cell at the top.
    # TODO: Load the dataset, extract features and labels, and convert them
    # to plain Python lists before returning.
    raise NotImplementedError


In [None]:
# Run Part 0.1 (load digits)
X_demo, y_demo = load_digits_data()
assert len(X_demo) == len(y_demo) and len(X_demo) > 0
assert len(X_demo[0]) == 64
assert min(y_demo) == 0 and max(y_demo) == 9
print("OK: loaded", len(X_demo), "samples")


### Train/test split (given, not graded).
We are doing this to evaluate generalization, not just training accuracy,
which is important because a model can memorize the training data.

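Conceptually, a train/test split should:
- Use a random seed for reproducibility.
- Shuffle X and y in unison so each row stays paired with its label.
- Split according to a test ratio (e.g., 0.2 for an 80/20 split).

The provided wrapper below also stratifies by label when `shuffle=True`, so each digit keeps roughly the same proportion in the training and test sets.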
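For intuition only, the steps above can be sketched with the standard library: shuffle a list of indices with a seeded generator, then slice both X and y with the same indices. This hypothetical `simple_split` does not stratify, so its splits will differ from the scikit-learn wrapper; it uses a private `random.Random(seed)` so it does not disturb the global random state.

```python
import random
from typing import List, Sequence, Tuple


def simple_split(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    test_ratio: float = 0.2,
    seed: int = 0,
) -> Tuple[List[List[float]], List[List[float]], List[int], List[int]]:
    """Shuffle indices with a seeded RNG; the first test_ratio share becomes the test set."""
    indices = list(range(len(X)))
    rng = random.Random(seed)   # private generator: reproducible, leaves global state alone
    rng.shuffle(indices)        # shuffling indices (not X and y separately) keeps pairs aligned
    n_test = int(round(len(X) * test_ratio))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [list(X[i]) for i in train_idx],
        [list(X[i]) for i in test_idx],
        [y[i] for i in train_idx],
        [y[i] for i in test_idx],
    )


X_toy = [[i] for i in range(10)]
y_toy = [i * 10 for i in range(10)]  # label = 10 * feature, so misalignment is easy to spot
Xtr, Xte, ytr, yte = simple_split(X_toy, y_toy, test_ratio=0.2, seed=0)
print(len(Xtr), len(Xte))  # 8 2
```

Because the same index list drives both X and y, every label still equals 10 times its feature after the split.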

Context: X is a list of input rows (a list of lists) and y is the list of labels.
For example, suppose we record, for each of n days, the weather and whether you
brought an umbrella. Then X has n rows (one per day) and m columns (one per weather
condition, here sunny/cloudy and hot/cold), and y has n entries (the umbrella
outcome for each day). With n = 2 days:

Dataset:   
| sunny/cloudy  | hot/cold  | umbrella | 
|---------------|-----------|-------------| 
| sunny         | hot       | not brought | 
| cloudy        | cold      | brought | 

X:   
| sunny/cloudy | hot/cold | 
|--------|---------|  
| sunny | hot |  
| cloudy | cold |   

y:   
| umbrella |   
|----------|  
| not brought |  
| brought |  
    
Reference: [train_test_split](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html)
<img src="https://upload.wikimedia.org/wikipedia/commons/8/88/Machine_learning_nutshell_--_Split_into_train-test_set.svg" alt="Train/test split diagram" width="420" style="display:block;margin:0 auto;" />

*Image: ["Machine learning nutshell -- Split into train-test set.svg"](https://commons.wikimedia.org/wiki/File:Machine_learning_nutshell_--_Split_into_train-test_set.svg) by EpochFail, [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/).* 


In [None]:
def train_test_split(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    test_ratio: float = 0.2,
    seed: int = 0,
    shuffle: bool = True,
) -> Tuple[List[List[float]], List[List[float]], List[int], List[int]]:
    """
    Returns:
        X_train, X_test, y_train, y_test
    """
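    # Stratify by label only when shuffling: scikit-learn raises an error if
    # stratify is set together with shuffle=False.
    stratify_y = list(y) if shuffle else None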
    X_train, X_test, y_train, y_test = sk_train_test_split(
        list(X),
        list(y),
        test_size=test_ratio,
        random_state=seed,
        shuffle=shuffle,
        stratify=stratify_y,
    )
    return (
        [list(row) for row in X_train],
        [list(row) for row in X_test],
        [int(v) for v in y_train],
        [int(v) for v in y_test],
    )


In [None]:
# Example (not graded)
X_tts_example = [[i] for i in range(10)]
y_tts_example = list(range(10))
X_train, X_test, y_train, y_test = train_test_split(
    X_tts_example, y_tts_example, test_ratio=0.2, seed=0, shuffle=False
)
print("X_train:", X_train)
print("X_test:", X_test)
print("y_train:", y_train)
print("y_test:", y_test)


### Part 0.2: Normalize features. [5 points]
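We are doing this to keep feature magnitudes in a small, consistent range.
Dividing by 16 maps every digits pixel from [0, 16] into [0, 1], which puts the
weight updates (±x_i) on the same scale as the bias update (±1), so the learning
rule behaves more consistently.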

What to implement (TODO):
- Return a new list of lists with each feature divided by `scale`.
- Do not mutate the input list.
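The "do not mutate" requirement is easy to break by accident. The sketch below uses a hypothetical halving operation (not the graded division by `scale`) to contrast an in-place version, which changes the caller's data, with one that builds new row lists. It also shows why a shallow copy such as `list(X)` is not enough: the inner row lists are still shared.

```python
from typing import List, Sequence


def halve_in_place(X: List[List[float]]) -> List[List[float]]:
    """Anti-pattern: overwrites the caller's rows."""
    for row in X:
        for j in range(len(row)):
            row[j] = row[j] / 2
    return X


def halve_copy(X: Sequence[Sequence[float]]) -> List[List[float]]:
    """Builds brand-new row lists; the input is left untouched."""
    return [[v / 2 for v in row] for row in X]


data = [[2.0, 4.0], [6.0, 8.0]]
out = halve_copy(data)
print(data, out)              # [[2.0, 4.0], [6.0, 8.0]] [[1.0, 2.0], [3.0, 4.0]]

data2 = [[2.0, 4.0], [6.0, 8.0]]
halve_in_place(data2)
print(data2)                  # [[1.0, 2.0], [3.0, 4.0]]  <- caller's data changed

shallow = list(data)
print(shallow[0] is data[0])  # True: the row objects are shared
```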

Reference: [Feature scaling / normalization](https://en.wikipedia.org/wiki/Feature_scaling)
<img src="https://upload.wikimedia.org/wikipedia/commons/7/77/The_effect_of_z-score_normalization_on_k-means_clustering.svg" alt="Effect of z-score normalization" width="420" style="display:block;margin:0 auto;" />

*Image: ["The effect of z-score normalization on k-means clustering.svg"](https://commons.wikimedia.org/wiki/File:The_effect_of_z-score_normalization_on_k-means_clustering.svg) by Cosmia Nebula, [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/).* 


In [None]:
def normalize_features(
    X: Sequence[Sequence[float]],
    scale: float = 16.0,
) -> List[List[float]]:
    """
    Normalizes each feature by dividing by `scale`.
    For digits, pixel values are in [0, 16].
    """
    # TODO: Return a new list of lists with each feature divided by `scale`.
    raise NotImplementedError


In [None]:
# Run Part 0.2 (normalize)
X_preview = [row[:] for row in X_demo[:3]]
X_norm = normalize_features(X_demo)
assert X_demo[:3] == X_preview  # input should not be mutated
assert 0.0 <= min(X_norm[0]) and max(X_norm[0]) <= 1.0
print("OK: normalized; first row in [0, 1]")


## Question 1: Binary Perceptron
High-level goal: build a perceptron that separates two digits. This gives you
the core learning rule in a simple setting and lets you see how a straight
decision boundary behaves before going multiclass.

Intuition: a perceptron scores an input by multiplying each feature by its
corresponding weight, adding the results, and then adding a bias.
If the data can be separated by one straight boundary, the perceptron
convergence theorem guarantees that, after a finite number of mistake-driven
updates, the boundary classifies every training point correctly.
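In symbols, with m features, the score and prediction described above are written below; the numbers in the worked line are made up for illustration. The tie rule (a score of exactly 0 predicts +1) matches the convention in `binary_predict`.

$$
s(\mathbf{x}) = \sum_{i=1}^{m} w_i x_i + b,
\qquad
\hat{y} = \begin{cases} +1 & \text{if } s(\mathbf{x}) \ge 0 \\ -1 & \text{otherwise} \end{cases}
$$

$$
\mathbf{w} = (2, -1),\; b = -0.5,\; \mathbf{x} = (1, 3):
\quad s = 2 \cdot 1 + (-1) \cdot 3 - 0.5 = -1.5 \;\Rightarrow\; \hat{y} = -1
$$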

Note: The autograder expects small classes below, but you do not need to
understand classes to finish this assignment. Implement the helper functions
(`binary_init`, `binary_score`, `binary_predict`, `binary_update`, `binary_fit`);
the class methods call them for you.

TODOs:
- TODO (1.1) Implement `binary_init`: weights length = num_features, bias = 0.0. [5 points]
- TODO (1.2) Implement `binary_score`: weighted sum + bias. [5 points]
- TODO (1.3) Implement `binary_predict`: score >= 0 -> +1, else -1. [5 points]
- TODO (1.4) Implement `binary_update`: adjust weights and bias on a mistake. [10 points]
- TODO (1.5) Implement `binary_fit`: loop epochs, shuffle if requested, call update. [10 points]
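Why does the update in TODO 1.4 (w_i ← w_i + y·x_i, b ← b + y) help? After one update on x, the new score on that same x changes by exactly y·(‖x‖² + 1), so it always moves toward the correct side; a single update may still not be enough to flip the prediction. The sketch below checks this on made-up numbers. `score` and `apply_update` are hypothetical illustration helpers, not the graded functions.

```python
from typing import List, Sequence, Tuple


def score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Weighted sum of the features plus the bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def apply_update(w: Sequence[float], b: float, x: Sequence[float],
                 y: int) -> Tuple[List[float], float]:
    """Apply w_i <- w_i + y*x_i and b <- b + y; call only after a mistake is detected."""
    return [wi + y * xi for wi, xi in zip(w, x)], b + y


w, b = [0.5, -1.0], 0.0
x, y = [1.0, 2.0], 1        # true label +1, but score = 0.5 - 2.0 = -1.5 < 0: a mistake
before = score(w, b, x)
w2, b2 = apply_update(w, b, x, y)
after = score(w2, b2, x)
sq_norm = sum(xi * xi for xi in x)
print(before, after)                          # -1.5 4.5
print(after - before == y * (sq_norm + 1))    # True
```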

Reference: [Perceptron](https://en.wikipedia.org/wiki/Perceptron)
<img src="https://upload.wikimedia.org/wikipedia/commons/3/31/Perceptron.svg" alt="Perceptron diagram" width="420" style="display:block;margin:0 auto;" />

*Image: ["Perceptron.svg"](https://commons.wikimedia.org/wiki/File:Perceptron.svg) by Matthew, [CC BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/).* 


In [None]:
# Helper functions (function-based approach).

def binary_init(num_features: int) -> Tuple[List[float], float]:
    """
    Returns:
        weights: list of length num_features (all zeros)
        bias: 0.0
    """
    # TODO (1.1): Initialize weights to all zeros and bias to 0.0.
    raise NotImplementedError


def binary_score(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """
    Returns:
        score: weighted sum of inputs plus bias.
    """
    # TODO (1.2): Compute the weighted sum:
    #   multiply each input by its corresponding weight,
    #   add the results, then add the bias.
    #   (This maps directly to a for loop.)
    raise NotImplementedError


def binary_predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """
    Returns:
        +1 if score >= 0 else -1
    """
    # TODO (1.3): Use the sign of the score to map to +1 or -1.
    raise NotImplementedError


def binary_update(
    weights: List[float],
    bias: float,
    x: Sequence[float],
    y: int,
) -> Tuple[List[float], float]:
    """
    Returns:
        updated weights and bias after one example.
    """
    # TODO (1.4): If the example is misclassified, update each weight and bias:
    #   w_i = w_i + y * x_i
    #   bias = bias + y
    raise NotImplementedError


def binary_fit(
    weights: List[float],
    bias: float,
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    epochs: int = 10,
    seed: int = 0,
    shuffle: bool = True,
) -> Tuple[List[float], float]:
    """
    Returns:
        weights and bias after training.
    """
    # TODO (1.5) (pseudocode):
    #   make an index list [0..len(X)-1]
    #   if shuffle: set random.seed(seed) for reproducibility
    #   repeat for each epoch:
    #       if shuffle: random.shuffle(indices)
    #       for idx in indices: update weights/bias on (X[idx], y[idx])
    raise NotImplementedError


### Python Classes
The `BinaryPerceptron` class below uses concepts from object-oriented programming. Each instance stores three fields: the number of features, the weight list, and the bias (the constant offset term, similar to an intercept). `create` is a class method that builds a new, zero-initialized perceptron; `score` and `predict` only read the fields; `update` and `fit` replace the weights and bias with the values returned by the helper functions above.

Reference: [Python OOP](https://pythonnumericalmethods.studentorg.berkeley.edu/notebooks/chapter07.01-Introduction-to-OOP.html)
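If the class syntax is unfamiliar, the toy example below follows the same layout as `BinaryPerceptron`: `@dataclass` fields, a `@classmethod` constructor named `create`, a read-only method, and a method that reassigns the fields. `RunningTotal` is a hypothetical class used only to show the pattern.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunningTotal:
    """Toy class mirroring BinaryPerceptron's layout: fields plus a classmethod constructor."""

    count: int
    values: List[float]

    @classmethod
    def create(cls) -> "RunningTotal":
        # Alternative constructor, like BinaryPerceptron.create(num_features).
        return cls(count=0, values=[])

    def total(self) -> float:
        # Read-only method (like score/predict): uses the fields, changes nothing.
        return sum(self.values)

    def add(self, v: float) -> None:
        # Mutating method (like update/fit): reassigns the object's fields.
        self.values = self.values + [v]
        self.count += 1


rt = RunningTotal.create()
rt.add(2.0)
rt.add(3.5)
print(rt)          # RunningTotal(count=2, values=[2.0, 3.5])
print(rt.total())  # 5.5
```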

In [None]:
# The class below is a thin wrapper used by the autograder.
@dataclass
class BinaryPerceptron:
    """
    Binary perceptron for labels in {+1, -1}.
    """

    num_features: int
    weights: List[float]
    bias: float

    @classmethod
    def create(cls, num_features: int) -> "BinaryPerceptron":
        weights, bias = binary_init(num_features)
        return cls(num_features=num_features, weights=weights, bias=bias)

    def score(self, x: Sequence[float]) -> float:
        return binary_score(self.weights, self.bias, x)

    def predict(self, x: Sequence[float]) -> int:
        return binary_predict(self.weights, self.bias, x)

    def update(self, x: Sequence[float], y: int) -> None:
        self.weights, self.bias = binary_update(self.weights, self.bias, x, y)

    def fit(
        self,
        X: Sequence[Sequence[float]],
        y: Sequence[int],
        epochs: int = 10,
        seed: int = 0,
        shuffle: bool = True,
    ) -> None:
        self.weights, self.bias = binary_fit(
            self.weights,
            self.bias,
            X,
            y,
            epochs=epochs,
            seed=seed,
            shuffle=shuffle,
        )

In [None]:
# Run Question 1 (binary perceptron sanity check)
X_or = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_or = [-1, 1, 1, 1]
model_or = BinaryPerceptron.create(num_features=2)
model_or.fit(X_or, y_or, epochs=10, shuffle=False)
preds_or = [model_or.predict(x) for x in X_or]
assert preds_or == y_or
print("OK: OR learned", preds_or)


## Question 2: Binary Classification Task
High-level goal: use the binary perceptron to separate two digits
(e.g., 0 vs 1). This exercises the full pipeline end-to-end and
gives you a concrete, visual task to evaluate.

Parts (TODO):
- 2.1 Filter the dataset to two digits and relabel (TODO). [8 points]
- 2.2 Evaluate binary accuracy (TODO). [6 points]
- 2.3a Build the XOR dataset (TODO). [3 points]
- 2.3b Map XOR features (TODO). [3 points]

Tip: pick digit pairs that are visually distinct first (e.g., 0 vs 1) and then
try harder pairs (e.g., 3 vs 5) to see how a straight boundary affects accuracy.


### Part 2.1: Filter the dataset to two digits and relabel. [8 points]
We are doing this so the binary perceptron sees labels in {+1, -1}.

What to implement (TODO):
- Keep only examples where the label is either `positive_digit` or `negative_digit`.
- Map `positive_digit` -> +1 and `negative_digit` -> -1.
- Preserve the original order of examples.

Why this matters: the update rule assumes a symmetric label set {+1, -1},
so relabeling makes the math and the code consistent.
<img src="https://upload.wikimedia.org/wikipedia/commons/a/a2/Linearly_separable.JPG" alt="Straight-line separable example" width="420" style="display:block;margin:0 auto;" />

*Image: example from Wikimedia Commons (public domain).* 
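The point about a symmetric label set can be checked directly. Under the mistake update w_i ← w_i + y·x_i, a label of +1 adds the input, -1 subtracts it, and a label of 0 (if the negative digit were relabeled 0 instead of -1) would leave every weight unchanged, so mistakes on negative examples could never be corrected. `weight_change` is a hypothetical helper that returns only the per-weight change.

```python
from typing import List, Sequence


def weight_change(x: Sequence[float], y: int) -> List[float]:
    """Change to each weight from the update w_i <- w_i + y * x_i on a mistake."""
    return [y * xi for xi in x]


x = [0.5, 1.0, 0.25]
print(weight_change(x, +1))  # [0.5, 1.0, 0.25]     pushes the score up
print(weight_change(x, -1))  # [-0.5, -1.0, -0.25]  pushes the score down
print(weight_change(x, 0))   # [0.0, 0.0, 0.0]      a 0 label changes nothing
```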


In [None]:
def make_binary_dataset(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    positive_digit: int,
    negative_digit: int,
) -> Tuple[List[List[float]], List[int]]:
    """
    Returns:
        X_bin, y_bin where y_bin is +1 for positive_digit and -1 for negative_digit
    """
    # TODO: Filter to the two digits and map them to +1/-1 labels.
    # Return the new feature and label lists.
    # if the digit is positive_digit, label +1
    # if the digit is negative_digit, label -1
    raise NotImplementedError


In [None]:
# Run Part 2.1 (make binary dataset)
X_bin, y_bin = make_binary_dataset(X_norm, y_demo, positive_digit=0, negative_digit=1)
expected = sum(1 for v in y_demo if v in (0, 1))
assert len(X_bin) == expected and len(y_bin) == expected
assert set(y_bin).issubset({1, -1})
print("OK: binary dataset size", len(X_bin))


### Part 2.2: Evaluate binary accuracy. [6 points]
Binary accuracy is the fraction of examples whose predicted label matches the true label, in a task with exactly two classes.

What to implement (TODO):
- Use `model.predict` for each example.
- Return correct / total as a float.
- If X is empty, return 0.0.
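As a worked example of the definition, the hypothetical helper below computes accuracy from precomputed prediction and label lists (the graded function instead calls `model.predict` on each example). Three of four predictions match, so the accuracy is 3 / 4 = 0.75; the empty case returns 0.0 instead of dividing by zero.

```python
from typing import Sequence


def accuracy_from_predictions(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of positions where the prediction equals the label; 0.0 for empty input."""
    if not labels:
        return 0.0
    correct = sum(1 for p, t in zip(preds, labels) if p == t)
    return correct / len(labels)


print(accuracy_from_predictions([1, -1, 1, 1], [1, -1, -1, 1]))  # 0.75
print(accuracy_from_predictions([], []))                         # 0.0
```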

Reference: [accuracy_score definition](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html)


In [None]:
def binary_accuracy(
    model: BinaryPerceptron,
    X: Sequence[Sequence[float]],
    y: Sequence[int],
) -> float:
    # TODO: Compute accuracy as correct predictions divided by total.
    raise NotImplementedError


In [None]:
# Run Part 2.2 (binary accuracy)
acc_or = binary_accuracy(model_or, X_or, y_or)
assert acc_or == 1.0
print("OK: OR accuracy", acc_or)


### Part 2.3: XOR feature mapping (2D -> 3D). [6 points]
We are doing this to show how adding a new feature can make the XOR
points separable by a single straight boundary. XOR is not separable
by a line in 2D, but it becomes separable after adding the feature x1 * x2.

What to implement (TODO):
- Part 2.3a: Create the 4 XOR points and labels (+1 for XOR-true, -1 otherwise). [3 points]
- Part 2.3b: Map each [x1, x2] -> [x1, x2, x1 * x2]. [3 points]
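To see concretely that the extra feature makes XOR separable, the sketch below checks one hand-picked plane, x1 + x2 - 2·(x1·x2) - 0.5 = 0, on the lifted points. These weights are an assumption chosen for illustration; many planes work, and a trained perceptron will generally converge to a different one.

```python
from typing import Sequence

xor_points = [[0, 0], [0, 1], [1, 0], [1, 1]]
xor_labels = [-1, 1, 1, -1]          # +1 exactly when x1 != x2

w = [1.0, 1.0, -2.0]                 # hand-picked plane (one of many that work)
b = -0.5


def plane_predict(features: Sequence[float]) -> int:
    """+1 if the point lies on or above the plane, else -1."""
    s = sum(wi * fi for wi, fi in zip(w, features)) + b
    return 1 if s >= 0 else -1


lifted = [[x1, x2, x1 * x2] for x1, x2 in xor_points]
preds = [plane_predict(f) for f in lifted]
print(preds)                         # [-1, 1, 1, -1]
print(preds == xor_labels)           # True
```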

<img src="https://upload.wikimedia.org/wikipedia/commons/7/7f/XOR_truth_table.svg" alt="XOR truth table" width="420" style="display:block;margin:0 auto;" />

*Image: ["XOR truth table.svg"](https://commons.wikimedia.org/wiki/File:XOR_truth_table.svg) by Lionel Allorge, [CC BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/).* 


In [None]:
def xor_dataset() -> Tuple[List[List[float]], List[int]]:
    """
    Returns:
        X: 2D XOR inputs
        y: labels in {+1, -1} where +1 is XOR-true
    """
    # TODO: Return the four XOR points with labels in {+1, -1}.
    # +1 for (0,1) and (1,0); -1 for (0,0) and (1,1).

    raise NotImplementedError


def map_xor_features(X: Sequence[Sequence[float]]) -> List[List[float]]:
    """
    Maps 2D inputs [x1, x2] -> [x1, x2, x1*x2].
    """
    # TODO: Return a new list with the mapped features.
    # For each input [x1, x2], create [x1, x2, x1*x2].
    # return the new list of lists.
    raise NotImplementedError


In [None]:
# Run Part 2.3 (XOR feature mapping)
X_xor, y_xor = xor_dataset()
mapped = map_xor_features(X_xor)
assert len(mapped) == 4 and all(len(row) == 3 for row in mapped)
for (x1, x2), m in zip(X_xor, mapped):
    assert m[2] == x1 * x2
assert sorted(y_xor) == [-1, -1, 1, 1]
print("OK: XOR mapping")


## Question 3: Multiclass Perceptron
High-level goal: extend the perceptron to handle all 10 digits.
This allows us to make a single model that predicts 0-9 directly by
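scoring each class and choosing the highest. The 10 digit classes are not
guaranteed to be linearly separable in the 64-dimensional feature space, and test
accuracy also depends on how well the model generalizes to unseen images, so we
should expect good but not perfect accuracy.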

Intuition: instead of one weight list, you maintain one list per class.
Given an input, compute 10 scores and predict the class with the
largest score (a "winner-take-all" rule).
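The winner-take-all rule needs a tie-breaking convention. With the common idiom below, Python's `max` returns the first index among equal scores, so a freshly initialized model (all weights and biases zero, every score 0) predicts class 0 for every input. `argmax` is a hypothetical helper name; other tie rules are possible and only change which class gets updated first.

```python
from typing import Sequence


def argmax(scores: Sequence[float]) -> int:
    """Index of the largest score; max() keeps the first index among ties."""
    return max(range(len(scores)), key=lambda k: scores[k])


print(argmax([0.2, 1.5, -0.3, 1.1]))  # 1
print(argmax([0.0] * 10))             # 0  (all-zero model: every class ties)
print(argmax([2.0, 3.0, 3.0]))        # 1  (first of the tied maxima)
```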

Note: As in Question 1, you can ignore the class details. Implement the
helper functions (`multiclass_init`, `multiclass_scores`, `multiclass_predict`,
`multiclass_update`, `multiclass_fit`); the class methods call them for you.

TODOs:
- TODO (3.1) Implement `multiclass_init`: one weight list per class, all zeros. [5 points]
- TODO (3.2) Implement `multiclass_scores`: one score per class. [5 points]
- TODO (3.3) Implement `multiclass_predict`: return class with highest score. [5 points]
- TODO (3.4) Implement `multiclass_update`: push true class up, predicted class down. [5 points]
- TODO (3.5) Implement `multiclass_fit`: loop epochs, shuffle if requested, call update. [5 points]
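The "push true class up, predicted class down" update can be traced by hand. On a mistake, the true class's score on x rises by ‖x‖² + 1, the wrongly predicted class's score falls by the same amount, and all other classes are untouched. The weights below are made-up values for 3 classes and 2 features, and `class_scores` is a hypothetical helper rather than the graded `multiclass_scores`.

```python
from typing import List, Sequence


def class_scores(W: Sequence[Sequence[float]], b: Sequence[float],
                 x: Sequence[float]) -> List[float]:
    """One score per class: that class's weighted sum plus its bias."""
    return [sum(wi * xi for wi, xi in zip(w_c, x)) + b_c for w_c, b_c in zip(W, b)]


W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # made-up weights: 3 classes, 2 features
b = [0.0, 0.0, 0.0]
x, y_true = [2.0, 1.0], 1

before = class_scores(W, b, x)              # [2.0, 1.0, 1.5]
y_pred = before.index(max(before))          # 0, so this is a mistake
for i, xi in enumerate(x):
    W[y_true][i] += xi                      # pull the true class toward x
    W[y_pred][i] -= xi                      # push the wrongly predicted class away
b[y_true] += 1.0
b[y_pred] -= 1.0
after = class_scores(W, b, x)
print(before, after)                        # [2.0, 1.0, 1.5] [-4.0, 7.0, 1.5]
```

Here ‖x‖² + 1 = 6, which matches the +6 change for class 1 and the -6 change for class 0.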

Reference: [Multiclass classification overview](https://en.wikipedia.org/wiki/Multiclass_classification)
<img src="https://upload.wikimedia.org/wikipedia/commons/7/71/Multiclass_classification.png" alt="Multiclass classification example" width="420" style="display:block;margin:0 auto;" />

*Image: ["Multiclass classification.png"](https://commons.wikimedia.org/wiki/File:Multiclass_classification.png) by SoroushJahanzad, [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/).* 


In [None]:
# Helper functions (function-based approach).

def multiclass_init(
    num_features: int,
    num_classes: int,
) -> Tuple[List[List[float]], List[float]]:
    """
    Returns:
        weights: list of weight lists (one per class)
        bias: list of class biases
    """
    # TODO (3.1): Initialize all weights to zeros and all biases to 0.0.
    #   weights should be a list of length num_classes, each a list of length num_features.
    #   bias should be a list of length num_classes.
    raise NotImplementedError


def multiclass_scores(weights: Sequence[Sequence[float]], bias: Sequence[float], x: Sequence[float]) -> List[float]:
    """
    Returns:
        list of scores, one per class.
    """
    # TODO (3.2): For each class, compute a weighted sum of inputs plus its bias.
    #   (Multiply each input by its corresponding weight, add them up, then add bias.)
    raise NotImplementedError


def multiclass_predict(weights: Sequence[Sequence[float]], bias: Sequence[float], x: Sequence[float]) -> int:
    """
    Returns:
        index of the class with the largest score.
    """
    # TODO (3.3): Return the class index with the largest score.
    raise NotImplementedError


def multiclass_update(
    weights: List[List[float]],
    bias: List[float],
    x: Sequence[float],
    y_true: int,
) -> Tuple[List[List[float]], List[float]]:
    """
    Returns:
        updated weights and bias after one example.
    """
    # TODO (3.4):
    #   y_pred = multiclass_predict(...)
    #   if y_pred is not equal to y_true:
    #       for each feature i:
    #           add x[i] to true class weights
    #           subtract x[i] from predicted class weights
    #       increase true class bias, decrease predicted class bias
    raise NotImplementedError


def multiclass_fit(
    weights: List[List[float]],
    bias: List[float],
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    epochs: int = 10,
    seed: int = 0,
    shuffle: bool = True,
) -> Tuple[List[List[float]], List[float]]:
    """
    Returns:
        weights and bias after training.
    """
    # TODO (3.5):
    #   make an index list [0..len(X)-1]
    #   if shuffle: set random.seed(seed) for reproducibility
    #   repeat for each epoch:
    #       if shuffle: random.shuffle(indices)
    #       update on (X[idx], y[idx]) for each idx
    raise NotImplementedError


# The class below is a thin wrapper we will use for grading.
@dataclass
class MulticlassPerceptron:
    """
    Multiclass perceptron using one weight list per class.
    """

    num_features: int
    num_classes: int
    weights: List[List[float]]
    bias: List[float]

    @classmethod
    def create(cls, num_features: int, num_classes: int) -> "MulticlassPerceptron":
        weights, bias = multiclass_init(num_features, num_classes)
        return cls(num_features=num_features, num_classes=num_classes, weights=weights, bias=bias)

    def scores(self, x: Sequence[float]) -> List[float]:
        return multiclass_scores(self.weights, self.bias, x)

    def predict(self, x: Sequence[float]) -> int:
        return multiclass_predict(self.weights, self.bias, x)

    def update(self, x: Sequence[float], y_true: int) -> None:
        self.weights, self.bias = multiclass_update(self.weights, self.bias, x, y_true)

    def fit(
        self,
        X: Sequence[Sequence[float]],
        y: Sequence[int],
        epochs: int = 10,
        seed: int = 0,
        shuffle: bool = True,
    ) -> None:
        self.weights, self.bias = multiclass_fit(
            self.weights,
            self.bias,
            X,
            y,
            epochs=epochs,
            seed=seed,
            shuffle=shuffle,
        )


In [None]:
# Run Question 3 (multiclass perceptron quick check)
X_mc = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
y_mc = [0, 1, 2]
mc = MulticlassPerceptron.create(num_features=3, num_classes=3)
mc.fit(X_mc, y_mc, epochs=3, shuffle=False)
preds_mc = [mc.predict(x) for x in X_mc]
assert preds_mc == y_mc
print("OK: multiclass one-hot fit", preds_mc)


## Question 4: Multiclass Classification Task
High-level goal: train and evaluate the multiclass perceptron on all digits.
This shows how well a simple model can do on a real dataset.

Parts (TODO):
- 4.1 Evaluate multiclass accuracy (TODO). [2 points]
- 4.2 Track mistakes per epoch for a binary perceptron (TODO). [3 points]
- 4.3 Plot a learning curve (provided, not graded).

Tip: accuracy summarizes overall performance, while the learning curve tells
you how fast the model improves over training epochs.

References: [accuracy_score definition](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html), [Learning curve](https://en.wikipedia.org/wiki/Learning_curve_%28machine_learning%29)
<img src="https://upload.wikimedia.org/wikipedia/commons/1/11/Confusion_Matrix_Metrics.png" alt="Confusion matrix metrics" width="420" style="display:block;margin:0 auto;" />

*Image: ["Confusion Matrix Metrics.png"](https://commons.wikimedia.org/wiki/File:Confusion_Matrix_Metrics.png) by Hssiqueira, [CC0 1.0](https://creativecommons.org/publicdomain/zero/1.0/).* 


In [None]:
# Part 4.1: Evaluate multiclass accuracy. [2 points]
def multiclass_accuracy(
    model: MulticlassPerceptron,
    X: Sequence[Sequence[float]],
    y: Sequence[int],
) -> float:
    # TODO: Compute accuracy as correct predictions divided by total.
    # If X is empty, return 0.0.
    raise NotImplementedError


In [None]:
# Run Part 4.1 (multiclass accuracy)
Xm_train, Xm_test, ym_train, ym_test = train_test_split(X_norm, y_demo)
mc_digits = MulticlassPerceptron.create(num_features=len(Xm_train[0]), num_classes=10)
mc_digits.fit(Xm_train, ym_train, epochs=3)
acc_mc = multiclass_accuracy(mc_digits, Xm_test, ym_test)
assert 0.0 <= acc_mc <= 1.0
print("Multiclass acc (quick):", acc_mc)


### Part 4.2: Track mistakes per epoch (binary). [3 points]
Counting the training mistakes the perceptron makes in each epoch shows how quickly it learns: on linearly separable data the count eventually drops to zero, while on harder or non-separable digit pairs it tends to level off above zero.
We are doing this to visualize training dynamics and compare digit pairs.

What to implement (TODO):
- For each epoch, count how many times predict(x) != y, checking the prediction before calling update on that example.
- Return a list of mistake counts, one per epoch.

Reference: [Learning curve](https://en.wikipedia.org/wiki/Learning_curve_%28machine_learning%29)
<img src="https://upload.wikimedia.org/wikipedia/commons/2/24/Learning_Curves_%28Naive_Bayes%29.png" alt="Learning curve example" width="420" style="display:block;margin:0 auto;" />

*Image: ["Learning Curves (Naive Bayes).png"](https://commons.wikimedia.org/wiki/File:Learning_Curves_(Naive_Bayes).png) by Justin Ormont, [BSD 3-Clause](https://opensource.org/licenses/BSD-3-Clause).* 


In [None]:
def train_binary_with_mistakes(
    model: BinaryPerceptron,
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    epochs: int = 10,
    seed: int = 0,
    shuffle: bool = True,
) -> List[int]:
    """
    Returns:
        mistakes_per_epoch: list of mistake counts per epoch.
    """
    # TODO: Train the model for `epochs`, counting misclassifications each epoch.
    #   make an index list [0..len(X)-1]
    #   if shuffle: set random.seed(seed) for reproducibility
    #   for each epoch:
    #       if shuffle: random.shuffle(indices)
    #       set mistakes = 0
    #       for each idx in indices:
    #           if predict(X[idx]) is not equal to y[idx], increment mistakes
    #           update(X[idx], y[idx])
    #       append mistakes to the list
    
    raise NotImplementedError


# Part 4.3: Plot a learning curve. [not graded]
def plot_learning_curve(mistakes_per_epoch: Sequence[int], title: str) -> None:
    """
    Plots mistakes per epoch using matplotlib.
    """
    import matplotlib.pyplot as plt

    epochs = list(range(1, len(mistakes_per_epoch) + 1))
    plt.figure(figsize=(6, 4))
    plt.plot(epochs, mistakes_per_epoch, marker="o")
    plt.xlabel("Epoch")
    plt.ylabel("Mistakes")
    plt.title(title)
    plt.grid(True, linestyle="--", alpha=0.4)
    plt.tight_layout()
    plt.show()


In [None]:
# Run Part 4.2 (mistakes per epoch)
model_bin = BinaryPerceptron.create(num_features=len(X_bin[0]))
mistakes = train_binary_with_mistakes(model_bin, X_bin, y_bin, epochs=5)
assert len(mistakes) == 5 and all(m >= 0 for m in mistakes)
print("Mistakes per epoch:", mistakes)


In [None]:
# Run Part 4.3 (plot learning curve)
# Run Part 4.2 first to create `mistakes`.
plot_learning_curve(mistakes, title="Perceptron Mistakes per Epoch")


## Question 5: Main Experiment [5 points]
High-level goal: wire everything together so you can run the assignment
end-to-end from a single notebook.

Parts (not individually graded; this section is an integration check):
- 5.1 Load data.
- 5.2 Normalize features.
- 5.3 Create a binary dataset (e.g., 0 vs 1).
- 5.4 Train and evaluate the binary perceptron.
- 5.5 Train and evaluate the multiclass perceptron.
- 5.6 Track mistakes and plot a learning curve for a digit pair.

Suggested workflow:
1. Start with default settings and verify you can run end-to-end.
2. Try a few digit pairs and compare accuracies.
3. Inspect learning curves to see whether training plateaus or keeps improving.


<img src="https://upload.wikimedia.org/wikipedia/commons/a/a4/Machine_learning_workflow_diagram.png" alt="Machine learning workflow diagram" width="420" style="display:block;margin:0 auto;" />

*Image: ["Machine learning workflow diagram.png"](https://commons.wikimedia.org/wiki/File:Machine_learning_workflow_diagram.png) by Brylie Christopher Oxley, [CC0 1.0](https://creativecommons.org/publicdomain/zero/1.0/).* 


In [None]:
def main() -> None:
    # Part 5.1: Load data.
    X, y = load_digits_data()

    # Part 5.2: Normalize features (optional but recommended).
    X = normalize_features(X)

    # Part 5.3: Create a binary dataset (e.g., 0 vs 1).
    X_bin, y_bin = make_binary_dataset(X, y, positive_digit=0, negative_digit=1)
    Xb_train, Xb_test, yb_train, yb_test = train_test_split(X_bin, y_bin)

    # Part 5.4: Train + evaluate binary perceptron.
    if not Xb_train:
        print("Binary dataset is empty; skipping binary perceptron.")
    else:
        bin_model = BinaryPerceptron.create(num_features=len(Xb_train[0]))
        bin_model.fit(Xb_train, yb_train, epochs=10)
        bin_acc = binary_accuracy(bin_model, Xb_test, yb_test)
        print("Binary accuracy:", bin_acc)

    # Part 5.5: Train + evaluate multiclass perceptron.
    Xm_train, Xm_test, ym_train, ym_test = train_test_split(X, y)
    if not Xm_train:
        print("Multiclass dataset is empty; skipping multiclass perceptron.")
    else:
        multi_model = MulticlassPerceptron.create(
            num_features=len(Xm_train[0]), num_classes=10
        )
        multi_model.fit(Xm_train, ym_train, epochs=10)
        multi_acc = multiclass_accuracy(multi_model, Xm_test, ym_test)
        print("Multiclass accuracy:", multi_acc)

    # Part 5.6: Track mistakes and plot a learning curve for a digit pair.
    # Example pair: 3 vs 5 (often harder than 0 vs 1).
    X_bin_35, y_bin_35 = make_binary_dataset(X, y, positive_digit=3, negative_digit=5)
    if not X_bin_35:
        print("Digit pair 3 vs 5 not present; skipping learning curve.")
        return
    X35_train, X35_test, y35_train, y35_test = train_test_split(X_bin_35, y_bin_35)
    if not X35_train:
        print("Not enough data for learning curve; skipping.")
        return
    model_35 = BinaryPerceptron.create(num_features=len(X35_train[0]))
    mistakes = train_binary_with_mistakes(model_35, X35_train, y35_train, epochs=15)
    print("Mistakes per epoch (3 vs 5):", mistakes)
    # Uncomment to display the plot.
    # plot_learning_curve(mistakes, title="Perceptron Mistakes per Epoch (3 vs 5)")

# Uncomment to run:
# main()
