# CSE 25 – Introduction to Artificial Intelligence
## Worksheet 7: From Fitting to Predicting

**Context (from last class):**  
Last time, we fit a line to data by choosing parameters that reduced error.
Our focus was on *fitting* the model to the data we were given.

In this worksheet, we take the next step: using a *trained model* to make predictions and evaluate how well it *generalizes*.

We will then extend these ideas to a new kind of task: **classification**.


**Today’s guiding questions:**

- What can we do *after* we fit a model?
- How do we know if a model works well on new data?
- How can the same linear structure be used to make decisions between classes?

**Learning Objectives:**

By the end of today’s class, you will be able to:

- Use a fitted linear model to make predictions on new, unseen data  
- Explain why we separate data into **training** and **testing** sets  
- Distinguish between **regression** and **classification** problems  
- Explain how a perceptron uses a weighted sum and bias to make decisions  
- Describe how the perceptron updates its parameters when it makes a mistake  


**Instructions:**

Create a copy of this notebook and complete it during class.  
Work through the cells below **in order**.

You may discuss with your neighbors, but make sure you understand
what each step is doing and why.


**Submission**

When finished, download the notebook as a PDF and upload it to Gradescope under  
`In-Class – Week 4 Thursday`.

To download as a PDF on DataHub:  
`File -> Save and Export Notebook As -> PDF`


### From last time

In [None]:
def get_predictions_v2(input_values, w, b):
    '''
    input_values: list of input values
    w: weight (slope)
    b: bias (intercept)

    Complete the function that calculates the predicted output values using the weight and bias. 
    Return a list of predicted values.
    '''
    # Initialize an empty list to store predicted values
    predicted_values = []

    # Calculate predicted values using the formula: predicted_value = input_value * w + b
    # Append each predicted value to the predicted_values list
    for input_value in input_values:
        predicted_values.append(input_value * w + b)
    # Return the list of predicted values
    return predicted_values


def get_mean_squared_error(actual_values, predicted_values):
    '''
    actual_values: list of actual output values
    predicted_values: list of predicted output values

    Returns the mean squared error (MSE) between the two lists.
    '''
    squared_error_list = []
   
    # Get pointwise squared errors
    for idx, actual_value in enumerate(actual_values):
        squared_error = (actual_value - predicted_values[idx])**2
        squared_error_list.append(squared_error)
    
    # Calculate total error
    total_error = sum(squared_error_list)
    
    # Calculate MSE by dividing total error by number of data points
    mse = total_error / len(actual_values)
    
    # Return the MSE
    return mse

def get_gradients_w_b(error_function, input_values, actual_values, w, b, epsilon=1e-6):
    """
    Approximates the partial derivatives of the error with respect to w and b using finite differences.
    Returns: (dE/dw, dE/db)
    """
    # Get predictions at (w, b)
    predicted = get_predictions_v2(input_values, w, b)
    # Get error at (w, b)
    error = error_function(actual_values, predicted)

    # Get predictions at (w + epsilon, b)
    predicted_w = get_predictions_v2(input_values, w + epsilon, b)
    # Get error at (w + epsilon, b)
    error_w = error_function(actual_values, predicted_w)
    # Partial derivative w.r.t. w
    grad_w = (error_w - error) / epsilon
    
    ##  Partial derivative w.r.t. b
    # Get predictions at (w, b + epsilon)
    predicted_b = get_predictions_v2(input_values, w, b + epsilon)
    # Get error at (w, b + epsilon)
    error_b = error_function(actual_values, predicted_b)
    # Partial derivative w.r.t. b
    grad_b = (error_b - error) / epsilon
    
    return grad_w, grad_b


In [None]:
# Data for Fahrenheit and Celsius
fahrenheit = [32, 50, 68, 86, 104] # x-axis values (INPUT DATA)
celsius = [0, 10, 20, 30, 40] # y-axis values (OUTPUT DATA)

In [None]:
# Gradient Descent for Fahrenheit to Celsius conversion using both w and b

# Initialize parameters
w_fc = 1
b_fc = -60

# Set the learning rate
learning_rate_fc = 0.0001

# Set the number of gradient descent steps
num_steps_fc = 250000

grad_weights_fc = []
grad_biases_fc = []
grad_errors_fc = []

for step in range(num_steps_fc):
    predicted_fc = get_predictions_v2(fahrenheit, w_fc, b_fc)
    error_fc = get_mean_squared_error(celsius, predicted_fc)
    grad_weights_fc.append(w_fc)
    grad_biases_fc.append(b_fc)
    grad_errors_fc.append(error_fc)
    grad_w_fc, grad_b_fc = get_gradients_w_b(get_mean_squared_error, fahrenheit, celsius, w_fc, b_fc)
    w_fc = w_fc - learning_rate_fc * grad_w_fc
    b_fc = b_fc - learning_rate_fc * grad_b_fc
    if step % 25000 == 0:
        print(f"Step {step}: w = {w_fc:.4f}, b = {b_fc:.4f}, error = {error_fc:.4f}")

print(f"Best weight (w) found: {w_fc:.4f}")
print(f"Best bias (b) found: {b_fc:.4f}")
print(f"Mean squared error at best parameters: {grad_errors_fc[-1]:.4f}")

Okay, now we have the updated parameters. What do we do with them?

In [None]:
# Plot the data points and the fitted line
import matplotlib.pyplot as plt

# Original data
plt.scatter(fahrenheit, celsius, label='Data points')

x_line = [min(fahrenheit), max(fahrenheit)]
y_line = [w_fc * x + b_fc for x in x_line]
plt.plot(x_line, y_line, color='red', linestyle='--',  label='Fitted line')

plt.xlabel('Fahrenheit')
plt.ylabel('Celsius')
plt.title('Fahrenheit to Celsius Fit')
plt.grid()
plt.legend()
plt.show()

### What do we do with the model?

Now that we’ve fit a line, we can use it to make predictions for new, unseen data.

For example, if you have a temperature in Fahrenheit, you can use the model to estimate the temperature in Celsius.
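As a sanity check, the model's predictions can be compared against the exact conversion formula, $C = \frac{5}{9}(F - 32)$. Written as a line, that formula has $w = 5/9 \approx 0.5556$ and $b = -160/9 \approx -17.78$, so a well-trained model should end up close to these values. The sketch below is only an illustration; the names `exact_celsius`, `w_exact`, and `b_exact` are not part of the worksheet.

```python
# Hypothetical helper (not part of the worksheet): the exact conversion formula.
def exact_celsius(fahrenheit_value):
    return (fahrenheit_value - 32) * 5 / 9

# The same formula written as a line y = w * x + b
w_exact = 5 / 9          # slope
b_exact = -32 * 5 / 9    # intercept

for f in [40, 77, 100]:
    print(f"Fahrenheit: {f} -> Exact Celsius: {exact_celsius(f):.2f}")
```

If the learned predictions differ noticeably from these exact values, the parameters have probably not finished converging.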

In [None]:
# Predict Celsius for new Fahrenheit values using the learned model
new_fahrenheit = [40, 77, 100]
# Get predictions for the new Fahrenheit values
predicted_celsius = get_predictions_v2(new_fahrenheit, w_fc, b_fc)

for f, c in zip(new_fahrenheit, predicted_celsius):
    print(f"Fahrenheit: {f} -> Predicted Celsius: {c:.2f}")

### Train/Test Split

But how do we know whether our model is making good predictions?

It's not enough to just fit a line and make predictions - we want to know if our model is actually making good predictions on unseen data.

In practice, we split our data into two parts:

- **Training data:** Used to fit (train) the model.
- **Testing data:** Used to evaluate how well the model predicts on new, unseen data.

This way, we can check if our model *generalizes* well, not just *memorizes* the training points.

In [None]:
# Let's take a bigger dataset for demonstration

fahrenheit_all = [32, 40, 50, 59, 68, 77, 86, 95, 104, 113] # Input Values
celsius_all = [0, 4, 10, 15, 20, 25, 30, 35, 40, 45] # Target/Output Values

To evaluate how well our model generalizes to new data, we use a **train/test split**:

1. **Randomize the data:**
    We shuffle the dataset so that the order does not affect which points end up in the training or testing sets. This helps ensure that both sets are representative of the overall data.

2. **Split into training and testing sets:**  
    After shuffling, we divide the data into two parts:
    - The **training set** is used to fit (train) the model.  
    - The **testing set** is used to evaluate the model's performance on unseen data.

3. **Apply the split to both inputs and outputs:**  
    We split both the input values (e.g., `fahrenheit_all`) and the corresponding output values (e.g., `celsius_all`) in the same way, so each input still matches its correct output.

This process helps us measure how well our model is likely to perform on new, real-world data, not just the data it was trained on.
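The three steps above can be sketched as one small function. This is a minimal illustration, not the worksheet's own implementation: the name `split_train_test`, its parameters, and the example lists are assumptions made for this sketch.

```python
import random

# Hypothetical helper (not part of the worksheet): shuffle, then split two parallel lists.
def split_train_test(inputs, outputs, train_fraction=0.7, seed=42):
    # Step 1: shuffle the index positions, not the lists themselves
    indices = list(range(len(inputs)))
    rng = random.Random(seed)   # local random generator, so the global seed is untouched
    rng.shuffle(indices)

    # Step 2: the first part of the shuffled indices is for training, the rest for testing
    train_size = int(train_fraction * len(inputs))
    train_idx = indices[:train_size]
    test_idx = indices[train_size:]

    # Step 3: apply the same indices to inputs and outputs so pairs stay matched
    x_train = [inputs[i] for i in train_idx]
    y_train = [outputs[i] for i in train_idx]
    x_test = [inputs[i] for i in test_idx]
    y_test = [outputs[i] for i in test_idx]
    return x_train, y_train, x_test, y_test

example_inputs = [32, 50, 68, 86, 104]
example_outputs = [0, 10, 20, 30, 40]
x_train, y_train, x_test, y_test = split_train_test(example_inputs, example_outputs)
print("Train:", list(zip(x_train, y_train)))
print("Test:", list(zip(x_test, y_test)))
```

The cells that follow carry out the same steps one at a time, so you can inspect each intermediate result.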

In [None]:
# Print lengths and ranges for verification
print(len(fahrenheit_all))
print(len(celsius_all))

# Print the index ranges
print(list(range(len(fahrenheit_all))))
print(list(range(len(celsius_all))))

indices = list(range(len(fahrenheit_all)))

In [None]:
import random

# Shuffle the data indices - we will use random.shuffle() for this.

indices = list(range(len(fahrenheit_all)))
random.seed(42)  # we use the seed for reproducibility
random.shuffle(indices)


# Shuffle the data according to the shuffled indices
fahrenheit_shuffled = [fahrenheit_all[i] for i in indices]
celsius_shuffled = [celsius_all[i] for i in indices]


print('Original Data:')
print(fahrenheit_all)
print(celsius_all)
print('-'*10)
print('Shuffled Data:')
print(fahrenheit_shuffled)
print(celsius_shuffled)
print('-'*10)

Q. Why did we shuffle the indices first, and not the lists directly?

`YOUR ANSWER HERE`

In [None]:
# Manually split into training and testing sets
# Here, we will use 70% of the data for training and 30% for testing.

train_size = int(0.7 * len(fahrenheit_all))

# We can use python list slicing to split the data
fahrenheit_train = fahrenheit_shuffled[:train_size]     # first train_size points for training
celsius_train = celsius_shuffled[:train_size]           # first train_size points for training
fahrenheit_test = fahrenheit_shuffled[train_size:]      # remaining points for testing
celsius_test = celsius_shuffled[train_size:]            # remaining points for testing


print("Training data (Fahrenheit):", fahrenheit_train)
print("Training data (Celsius):", celsius_train)
print('-'*10)

print("Testing data (Fahrenheit):", fahrenheit_test)
print("Testing data (Celsius):", celsius_test)
print('-'*10)


#### Train the model on the training set

Now that we have our train/test split, we can **train** a model on our `training set` using the gradient descent from before and **evaluate** our model using the `test set`.

In [None]:
# Gradient Descent for Fahrenheit to Celsius conversion using both w and b

# Initialize parameters
w_fc = 1
b_fc = -60

# Set the learning rate
learning_rate_fc = 0.0001

# Set the number of gradient descent steps
num_steps_fc = 250000

for step in range(num_steps_fc):
    predicted_fc = get_predictions_v2(fahrenheit_train, w_fc, b_fc)
    error_fc = get_mean_squared_error(celsius_train, predicted_fc)
    grad_w_fc, grad_b_fc = get_gradients_w_b(get_mean_squared_error, fahrenheit_train, celsius_train, w_fc, b_fc)
    
    w_fc = w_fc - learning_rate_fc * grad_w_fc
    b_fc = b_fc - learning_rate_fc * grad_b_fc
    
    if step % 25000 == 0:
        print(f"Step {step}: w = {w_fc:.4f}, b = {b_fc:.4f}, error = {error_fc:.4f}")

print(f"Best weight (w) found: {w_fc:.4f}")
print(f"Best bias (b) found: {b_fc:.4f}")
print(f"Mean squared error at best parameters for the Train Set: {error_fc:.4f}")


#### Evaluate the model on the test set.

Next, we can *evaluate* on the `test set` using our learned parameters.

In [None]:
# Get predictions on the test set: fahrenheit_test
predicted_test = get_predictions_v2(fahrenheit_test, w_fc, b_fc)

# Print the predicted values rounded to 2 decimal places
print("Predicted Values:", [round(val, 2) for val in predicted_test])

# Print the actual values
print("Actual Values:", celsius_test)

# Using the predicted values, calculate the mean squared error on the test set
test_error = get_mean_squared_error(celsius_test, predicted_test)

# Print the test error
print(f"Mean Squared Error on Test Set: {test_error:.4f}")

Why might the test set error be higher than the training set error?

`YOUR ANSWER HERE`

### Regression vs. Classification

So far, we used our model to **predict a (continuous) number** (Celsius temperature). This type of prediction is called **regression**.

But what if we wanted to predict a **category** instead of a number? That is called **classification**.

**Regression:** Predicting a `continuous value` (for example: someone's weight, the length of a fish, or the amount of rainfall tomorrow).

**Classification:** Predicting a discrete `label` or `class` (for example: whether a review is positive or negative, the species of a plant, or the type of fruit in a photo).

##### Examples: Is this Regression or Classification?

Answer the following questions and then discuss your answers:

1. Predicting the price of a house based on its features.

`YOUR ANSWER HERE`

2. Predicting whether an email is spam or not.

`YOUR ANSWER HERE`

3. Predicting if a tumor is benign or malignant.

`YOUR ANSWER HERE`

4. Predicting the number of hours a student will study next week.

`YOUR ANSWER HERE`

5. Predicting the temperature in San Diego tomorrow.

`YOUR ANSWER HERE`

6. Predicting which digit (0-9) is shown in a handwritten image.

`YOUR ANSWER HERE`



Why is the last option ___________ given that we are predicting a number?

`YOUR ANSWER HERE`

<!-- Even though the last example ("Predicting which digit (0-9) is shown in a handwritten image") involves predicting a number, it is still a classification problem. This is because you are choosing one `label` from a fixed set of `classes` (the digits 0 through 9), not predicting a continuous value. The output is a class label, not a real-valued number. 

In classification, the "number" is just a category name (like 0, 1, ..., 9), not a quantity that can take on any value. -->

### Linear Regression: From a Line to Multiple Variables

Linear regression is a method for modeling the relationship between input variables (features) and an output variable (target) by fitting a straight line.

#### The Equation of a Line (One Variable)

For a single input variable $x$, the equation of a line is:

$$
y = w \cdot x + b
$$

- $y$ = predicted value (output)
- $x$ = input value (feature)
- $w$ = weight (slope of the line)
- $b$ = bias (intercept)

This is the basic form of **simple linear regression**. We have already seen this!

#### Generalizing to Multiple Input Variables

When there are multiple input variables (features), we use **multiple linear regression**. The equation becomes:

$$
y = w_1 x_1 + w_2 x_2 + \ldots + w_n x_n + b
$$

So, the prediction is a **weighted sum** of all the input features plus a bias:

$$
y = \sum_{i=1}^n w_i x_i + b
$$

The model learns the best weights ($w_1, w_2, ..., w_n$) and bias ($b$) to fit the data.

**Take away:**  
- With one $x$, linear regression fits a line.  
- With many $x$'s, it fits a *hyperplane* in higher dimensions.  
- The principle is the same: predict $y$ as a weighted sum of the inputs plus a bias.
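The weighted sum can be sketched in a few lines of Python. The function name `predict_weighted_sum` and the numbers below are made up for illustration; they are not part of the worksheet.

```python
# Hypothetical helper (not part of the worksheet): prediction with any number of features.
def predict_weighted_sum(features, weights, bias):
    total = bias
    # Add w_i * x_i for every feature
    for x_i, w_i in zip(features, weights):
        total += w_i * x_i
    return total

# Made-up numbers, purely for illustration
features = [2.0, 3.0, 1.0]
weights = [0.5, -1.0, 4.0]
bias = 1.5
print(predict_weighted_sum(features, weights, bias))  # 0.5*2 - 1*3 + 4*1 + 1.5 = 3.5
```

With a single feature, this reduces to the familiar `w * x + b` used for the temperature model.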

## Introducing the Perceptron

Now that we've seen how to fit a line to predict numbers (regression), let's look at a simple model for classification: the **perceptron**.

A perceptron is a type of *linear* classifier. It takes a set of input features (numbers), computes a *weighted sum*, and makes a decision about which class the input belongs to.

- If the data is linearly separable, the perceptron can find a line (or hyperplane) that separates the classes.
- The perceptron updates its weights based on mistakes it makes during training.

### What does the perceptron model look like?

The perceptron is actually very similar to the linear model we used for regression.

- In regression, we used the equation:  
  $$
  y = \sum_{i=1}^n w_i x_i + b
  $$

- In the perceptron (for classification), we use:  
  $$
  score = \sum_{i=1}^n w_i x_i + b
  $$

    But instead of predicting a number, the perceptron uses the score to decide the class:

    - If the score is greater than or equal to 0, predict one class (e.g., +1).
    - If the score is less than 0, predict the other class (e.g., -1).

  So, the perceptron is just a linear model with a decision rule on top! 
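
The boundary between the two classes is the set of points where the score is exactly zero. For two features, and assuming $w_2 \neq 0$, that condition can be rearranged into the familiar form of a line:

$$
w_1 x_1 + w_2 x_2 + b = 0 \quad \Longrightarrow \quad x_2 = -\frac{w_1}{w_2} x_1 - \frac{b}{w_2}
$$

If $w_2 = 0$ and $w_1 \neq 0$, the boundary is instead the vertical line $x_1 = -b / w_1$. Points on one side of this line get a score above zero, and points on the other side get a score below zero.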

Let's look at a simple example:

In [None]:
# Toy perceptron example
# Each input is [x1, x2], label is +1 if x2 > x1, else -1

X_toy = [
    [1.5, 4],
    [1, 2],   
    [2, 1],   
    [3, 5],
    [4, 2],   
    [0, 0],
    [1.5, -0.5] 
]
y_toy = [1, 1, -1, 1, -1, -1, -1]


In [None]:
# Separate points by class for coloring
def plot_points(X_points, y_labels):
    X_pos = [x for x, y in zip(X_points, y_labels) if y == 1]
    X_neg = [x for x, y in zip(X_points, y_labels) if y == -1]

    plt.scatter([x[0] for x in X_pos], [x[1] for x in X_pos], color='blue', marker='x', label='Class +1')
    plt.scatter([x[0] for x in X_neg], [x[1] for x in X_neg], color='red', marker='o',  label='Class -1')

    plt.xlabel('x1')
    plt.ylabel('x2')
    plt.title('Data')
    plt.legend()
    plt.grid(True)
    plt.show()

plot_points(X_toy, y_toy)

Complete the `perceptron_predict(x, w, b)` code below. 

In [None]:
# Complete the perceptron prediction function below
def perceptron_predict(x, w, b):
    '''
    x: 2D input list [x1, x2]
    w: 2D weight list [w1, w2]
    b: bias term
    '''
    # You can assume x is a list of two elements [x1, x2]
    # Score = w_1*x_1 + w_2*x_2 + b
    # YOUR CODE HERE

    # Return +1 if score >= 0 else -1
    # YOUR CODE HERE
    return None  # Replace this with actual return value

In [None]:
# Test cases for perceptron_predict with 2D input x

# Case 1: score > 0
assert perceptron_predict([2, 3], [1, 1], -4) == 1  # 2*1 + 3*1 - 4 = 1 >= 0

# Case 2: score == 0
assert perceptron_predict([1, 1], [2, -1], -1) == 1  # 1*2 + 1*(-1) - 1 = 0

# Case 3: score < 0
assert perceptron_predict([0, 1], [1, 2], -3) == -1  # 0*1 + 1*2 - 3 = -1 < 0

# Case 4: negative weights
assert perceptron_predict([2, 2], [-1, -1], 3) == -1  # 2*-1 + 2*-1 + 3 = -2 -2 + 3 = -1 < 0 -> -1

# Case 5: bias only
assert perceptron_predict([0, 0], [0, 0], 2) == 1    # 0 + 0 + 2 = 2 >= 0

print("All perceptron_predict test cases passed.")

In [None]:
w = [1, 1]
b = 1

for x, y_true in zip(X_toy, y_toy):
    y_pred = perceptron_predict(x, w, b)
    print(f"Input: {x}, Actual label: {y_true}, Predicted: {y_pred}")

You don't need to understand the code in the next cell. Run it to see the plot of the current decision boundary. 

In [None]:
import numpy as np

# Plot the toy data and the decision boundary using current w and b
def plot_perceptron_decision_boundary(X_points, y_labels, w, b, current_points=None):
    X_pos = [x for x, y in zip(X_points, y_labels) if y == 1]
    X_neg = [x for x, y in zip(X_points, y_labels) if y == -1]
    # Highlight current points if provided
    if current_points is not None:
        for pt in current_points:
            plt.scatter(pt[0], pt[1], color='gold', edgecolor='black', s=120, marker='*', label='Current point')
    plt.scatter([x[0] for x in X_pos], [x[1] for x in X_pos], color='blue', marker='x', label='Class +1')
    plt.scatter([x[0] for x in X_neg], [x[1] for x in X_neg], color='red', marker='o', label='Class -1')

    x1_vals = np.linspace(-2, 5, 100)
    # Robust handling for zero weights
    if w[0] == 0 and w[1] == 0:
        plt.text(0.5, 0.5, "No decision boundary\n(w1=0, w2=0)",
                 fontsize=14, color='red', ha='center', va='center', transform=plt.gca().transAxes)
    elif w[1] != 0:
        x2_vals = [-(w[0]/w[1])*x1 - b/w[1] for x1 in x1_vals]
        plt.plot(x1_vals, x2_vals, color='green', linestyle='--', label='Decision boundary')
    elif w[0] != 0:
        plt.axvline(x=-b/w[0], color='green', linestyle='--', label='Decision boundary')

    plt.xlabel('x1')
    plt.ylabel('x2')
    plt.title('Perceptron Decision Boundary')
    plt.legend()
    plt.grid(True)
    plt.show()

plot_perceptron_decision_boundary(X_toy, y_toy, w, b)

You don't need to understand the code in the next cell. Run it and you should be able to control the decision boundary by changing the weights and bias values (similar to what we did for the line).

In [None]:
from ipywidgets import interact, FloatSlider

def interactive_perceptron_plot(w1=1.0, w2=1.0, b=1.0):
    w = [w1, w2]
    # Plot data points
    X_pos = [x for x, y in zip(X_toy, y_toy) if y == 1]
    X_neg = [x for x, y in zip(X_toy, y_toy) if y == -1]

    plt.figure(figsize=(6, 5))
    plt.scatter([x[0] for x in X_pos], [x[1] for x in X_pos], color='blue', marker='x', label='Class +1')
    plt.scatter([x[0] for x in X_neg], [x[1] for x in X_neg], color='red', marker='o', label='Class -1')

    # Plot decision boundary: w1*x1 + w2*x2 + b = 0
    x1_vals = np.linspace(-2, 5, 100)
    if w1 == 0 and w2 == 0:
        plt.text(0.5, 0.5, "No decision boundary\n(w1=0, w2=0)", 
                 fontsize=14, color='red', ha='center', va='center', transform=plt.gca().transAxes)
    elif w2 != 0:
        x2_vals = [-(w1/w2)*x1 - b/w2 for x1 in x1_vals]
        plt.plot(x1_vals, x2_vals, color='green', linestyle='--', label='Decision boundary')
    elif w1 != 0:
        plt.axvline(x=-b/w1, color='green', linestyle='--', label='Decision boundary')

    # Show predictions for each point
    for x, y_true in zip(X_toy, y_toy):
        score = w[0] * x[0] + w[1] * x[1] + b
        y_pred = 1 if score >= 0 else -1
        plt.text(x[0]+0.1, x[1], f'Pred: {y_pred}', fontsize=9, color='black')

    plt.xlabel('x1')
    plt.ylabel('x2')
    plt.title(f'Perceptron: w1={w1:.2f}, w2={w2:.2f}, b={b:.2f}')
    plt.legend()
    plt.grid(True)
    plt.show()

interact(
    interactive_perceptron_plot,
    w1=FloatSlider(value=1, min=-5, max=5, step=0.1, description='w1'),
    w2=FloatSlider(value=1, min=-5, max=5, step=0.1, description='w2'),
    b=FloatSlider(value=1, min=-10, max=10, step=0.1, description='b')
)

### How does the perceptron learn?

The perceptron learns by updating its weights and bias whenever it makes a mistake on a training example.

- For each training example, it computes the score: 
    $$
      score = \sum_{i=1}^n w_i x_i + b
    $$
- It predicts the class based on the sign of the score.
- If the prediction is wrong, it *updates* the weights and bias to reduce future mistakes:

  **Update rule:**

  - For each feature $i$:
    - $w_i = w_i + y \cdot x_i$
  - Bias:
    - $b = b + y$

  Here, $y$ is the true label (+1 or -1), and $x_i$ is the value of feature $i$.

This update nudges the model to be more likely to predict the correct class next time for similar examples.
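A single update can be traced by hand. The sketch below applies the update rule once to one misclassified point; the helper names `perceptron_update` and `score_of` are assumptions made for this illustration, not part of the worksheet.

```python
# Hypothetical helpers (not part of the worksheet).
def perceptron_update(x, y_true, w, b):
    # w_i = w_i + y * x_i for each feature, and b = b + y
    new_w = [w_i + y_true * x_i for w_i, x_i in zip(w, x)]
    new_b = b + y_true
    return new_w, new_b

def score_of(x, w, b):
    return sum(w_i * x_i for w_i, x_i in zip(w, x)) + b

x, y_true = [2, 1], -1      # a point whose true label is -1
w, b = [1, 1], 1            # starting parameters

old_score = score_of(x, w, b)             # 1*2 + 1*1 + 1 = 4, so the prediction is +1 (wrong)
new_w, new_b = perceptron_update(x, y_true, w, b)
new_score = score_of(x, new_w, new_b)     # -1*2 + 0*1 + 0 = -2, so the prediction is now -1

print("Old score:", old_score, "New score:", new_score)
print("New w:", new_w, "New b:", new_b)
```

In general, one update changes the score of the current point by $y \cdot (x_1^2 + x_2^2 + 1)$, which always moves it toward the sign of the true label. A single update is not guaranteed to flip the prediction, and it can change the predictions for other points as well.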

In [None]:
# Perceptron learning: one epoch (a single pass) over the data

w_learn = [1, 1]  # start with initial weights
b_learn = 1       # start with initial bias

for x, y_true in zip(X_toy, y_toy):
    score = w_learn[0] * x[0] + w_learn[1] * x[1] + b_learn
    y_pred = 1 if score >= 0 else -1
    if y_pred != y_true:
        # Update weights
        for i in range(len(w_learn)):
            w_learn[i] += y_true * x[i]
        # Update bias
        b_learn += y_true
    print('Old Score:', score)
    print('New Score:', w_learn[0] * x[0] + w_learn[1] * x[1] + b_learn)
    print(f"x={x}, y_true={y_true}, y_pred={y_pred}, w={w_learn}, b={b_learn}")
    plot_perceptron_decision_boundary(X_toy, y_toy, w_learn, b_learn, current_points=[x])


Now, let's do the same thing, with the same data, just ordered differently. 

In [None]:
X_toy_ordered = [
[1.5, 4],
[1, 2],
[3, 5],
[2, 1],
[4, 2],
[0, 0],
[1.5, -0.5]
]

y_toy_ordered = [1, 1, 1, -1, -1, -1, -1]

plot_points(X_toy_ordered, y_toy_ordered)

Now, let's run the same code from earlier, but this time use `X_toy_ordered` and `y_toy_ordered` instead of `X_toy` and `y_toy`. 

In [None]:
# Perceptron learning: one epoch over the data
w_learn = [1, 1]  # start with initial weights
b_learn = 1       # start with initial bias

for x, y_true in zip(X_toy_ordered, y_toy_ordered):
    score = w_learn[0]*x[0] + w_learn[1]*x[1] + b_learn
    y_pred = 1 if score >= 0 else -1
    if y_pred != y_true:
        # Update weights
        for i in range(len(w_learn)):
            w_learn[i] += y_true * x[i]
        # Update bias
        b_learn += y_true
    
    print('Old Score:', score)
    print('New Score:', w_learn[0] * x[0] + w_learn[1] * x[1] + b_learn)
    print(f"x={x}, y_true={y_true}, y_pred={y_pred}, w={w_learn}, b={b_learn}")
    plot_perceptron_decision_boundary(X_toy_ordered, y_toy_ordered, w_learn, b_learn, current_points=[x])
    


### What happened?
Discuss:
- What happened? Why did the model not learn the correct decision boundary?

`YOUR ANSWER HERE`

- What can we do to make it learn the decision boundary?

`YOUR ANSWER HERE`


In [None]:
# Let's shuffle the data and do one epoch of perceptron learning

import random

random.seed(42) # for reproducibility

# This is another way to shuffle data so that X and y stay aligned
# You make a combination of X and y, shuffle that, then unzip back to X and y

# For zip, refer: https://docs.python.org/3/library/functions.html#zip 
combined = list(zip(X_toy_ordered, y_toy_ordered)) 

# Shuffle the combined (x, y) pairs in place
random.shuffle(combined)

# Unzip back to X and y with *combined
X_shuffled, y_shuffled = zip(*combined)


w_learn = [1, 1]  # reset weights
b_learn = 1       # reset bias  

for x, y_true in zip(X_shuffled, y_shuffled):
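    # Compute the score before any update so that 'Old Score' below is for this point
    score = w_learn[0] * x[0] + w_learn[1] * x[1] + b_learn
    y_pred = perceptron_predict(x, w_learn, b_learn)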
    if y_pred != y_true:
        # Update weights
        for i in range(len(w_learn)):
            w_learn[i] += y_true * x[i]
        # Update bias
        b_learn += y_true
    print('Old Score:', score)
    print('New Score:', w_learn[0] * x[0] + w_learn[1] * x[1] + b_learn)
    print(f"x={x}, y_true={y_true}, y_pred={y_pred}, w={w_learn}, b={b_learn}")
    plot_perceptron_decision_boundary(X_shuffled, y_shuffled, w_learn, b_learn)

In [None]:
# We can also run multiple epochs and shuffle the data at each epoch
# Number of epochs
num_epochs = 2

w_learn = [1, 1]  # reset weights
b_learn = 1       # reset bias  
random.seed(42) # for reproducibility

plot_perceptron_decision_boundary(X_toy_ordered, y_toy_ordered, w_learn, b_learn)

# We can run multiple epochs
for epoch in range(num_epochs):
    combined = list(zip(X_toy_ordered, y_toy_ordered))
    random.shuffle(combined)
    X_shuffled, y_shuffled = zip(*combined)

    for x, y_true in zip(X_shuffled, y_shuffled):
        score = w_learn[0]*x[0] + w_learn[1]*x[1] + b_learn
        y_pred = 1 if score >= 0 else -1
        if y_pred != y_true:
            # Update weights
            for i in range(len(w_learn)):
                w_learn[i] += y_true * x[i]
            # Update bias
            b_learn += y_true
        print(f'Epoch: {epoch+1}')
        print(f"x={x}, y_true={y_true}, y_pred={y_pred}, w={w_learn}, b={b_learn}")
        plot_perceptron_decision_boundary(X_shuffled, y_shuffled, w_learn, b_learn)
    print(f"Epoch:{epoch+1} w={w_learn}, b={b_learn}")


##### Shuffling and Multiple Epochs in Perceptron Training

In practice, training a perceptron involves more than just a single pass over the data:

- **Shuffling the Data:**  
    In each epoch (or iteration), we randomly shuffle the order of the training examples. This prevents the perceptron from getting stuck in patterns caused by the order of the data and helps it learn a better decision boundary.

- **Multiple Epochs:**  
    Instead of updating the weights just once for each example, we repeat the process for several epochs. In each epoch, the perceptron sees all the training examples (in a new random order), updating its weights whenever it makes a mistake.  
    Running multiple epochs gives the perceptron more chances to correct its mistakes and converge to a solution that separates the classes (if possible).

This approach improves learning and helps the perceptron find a good decision boundary.
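The two ideas can be combined into one training function that stops as soon as an epoch finishes with no mistakes. This is a minimal sketch under stated assumptions: the name `train_perceptron`, the `max_epochs` cap, and the early-stopping check are choices made for this illustration, and the data is copied from the toy example above.

```python
import random

# Hypothetical helper (not part of the worksheet): train with shuffling and several epochs.
def train_perceptron(X, y, max_epochs=1000, seed=42):
    w = [1, 1]   # same starting weights as the cells above
    b = 1        # same starting bias as the cells above
    rng = random.Random(seed)    # local random generator for reproducibility
    data = list(zip(X, y))       # keep each input paired with its label

    for epoch in range(max_epochs):
        rng.shuffle(data)        # new random order every epoch
        mistakes = 0
        for x, y_true in data:
            score = w[0] * x[0] + w[1] * x[1] + b
            y_pred = 1 if score >= 0 else -1
            if y_pred != y_true:
                w = [w_i + y_true * x_i for w_i, x_i in zip(w, x)]
                b += y_true
                mistakes += 1
        if mistakes == 0:        # a full pass with no mistakes: stop early
            break
    return w, b, epoch + 1

X_demo = [[1.5, 4], [1, 2], [3, 5], [2, 1], [4, 2], [0, 0], [1.5, -0.5]]
y_demo = [1, 1, 1, -1, -1, -1, -1]
w_final, b_final, epochs_used = train_perceptron(X_demo, y_demo)
print(f"w={w_final}, b={b_final}, epochs used={epochs_used}")
```

Because this toy dataset is linearly separable, the loop is expected to stop early with every point classified correctly. On data that is not linearly separable, it would simply run until `max_epochs` is reached.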

### What if we wanted to work with pictures?

So far, we have been working with simple numbers, which made it easy to plug them directly into our models. But what if our data consists of images instead of numbers?

To use images in machine learning models, we need to convert them into a numerical format. This usually means *representing* each image as a list (or array) of numbers—one for each pixel. For example, a grayscale image can be turned into a grid of pixel brightness values, where each value shows how dark or light that pixel is. Once we have this numerical representation, we can use the same modeling techniques as before.

**What is a pixel?**

A *pixel* (short for "picture element") is the smallest unit of a digital image. Each pixel represents a single point in the image and has a value that describes its color or brightness. In grayscale images, a pixel's value typically indicates how light or dark it is. In color images, each pixel usually has three values (red, green, and blue) that together define its color. By arranging many pixels in a grid, we can form complex images that computers can process and analyze.

**Example:**
- A grayscale image can be represented as a grid of numbers, where each number is the brightness of a pixel (e.g., 0 for black, 255 for white).
- For the *digits dataset* (see below), each image is 8x8 pixels, so we have 64 numbers per image.
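
To feed an image into a linear model, the grid is usually flattened row by row into a single list, so that each pixel becomes one input feature. The sketch below uses a made-up 8x8 grid; `example_grid` and `flat_pixels` are names chosen for this illustration.

```python
# A small made-up 8x8 grid (a vertical bar), used only for illustration
example_grid = [[0, 0, 0, 1, 1, 0, 0, 0] for _ in range(8)]

# Flatten the grid row by row into a single list of 64 numbers
flat_pixels = [pixel for row in example_grid for pixel in row]

print(len(flat_pixels))    # 8 rows * 8 columns = 64 features
print(flat_pixels[:8])     # the first row of the image
```

Each of the 64 values would then get its own weight in the weighted sum, exactly as $x_1, x_2, \ldots, x_n$ did earlier.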

In [None]:

# Grid for digit 0 (circle)
zero_grid = [
    [0,0,1,1,1,1,0,0],
    [0,1,1,0,0,1,1,0],
    [1,1,0,0,0,0,1,1],
    [1,1,0,0,0,0,1,1],
    [1,1,0,0,0,0,1,1],
    [1,1,0,0,0,0,1,1],
    [0,1,1,0,0,1,1,0],
    [0,0,1,1,1,1,0,0]
]

plt.figure(figsize=(2, 2))
plt.imshow(zero_grid, cmap='gray_r')
plt.title("Digit 0")
plt.axis('off')
plt.show()

# Grid for digit 1 (vertical bar)
one_grid = [
    [0,0,0,1,1,0,0,0],
    [0,0,1,1,1,0,0,0],
    [0,1,0,1,1,0,0,0],
    [0,0,0,1,1,0,0,0],
    [0,0,0,1,1,0,0,0],
    [0,0,0,1,1,0,0,0],
    [0,0,0,1,1,0,0,0],
    [0,1,1,1,1,1,0,0]
]

plt.figure(figsize=(2, 2))
plt.imshow(one_grid, cmap='gray_r')
plt.title("Digit 1")
plt.axis('off')
plt.show()

Create your own digits (or other shapes) using the grid below!

In [None]:
create_grid = [
    [0,0,0,0,0,0,0,0],
    [0,0,0,0,0,0,0,0],
    [0,0,0,0,0,0,0,0],
    [0,0,0,0,0,0,0,0],
    [0,0,0,0,0,0,0,0],
    [0,0,0,0,0,0,0,0],
    [0,0,0,0,0,0,0,0],
    [0,0,0,0,0,0,0,0]
]

plt.figure(figsize=(2, 2))
plt.imshow(create_grid, cmap='gray_r')
plt.axis('off')
plt.show()

Let's load real-world data:

- Each 8x8 matrix in the digits dataset represents a grayscale image of a handwritten digit.
- Each value in the matrix is a pixel intensity:
    - 0 means the pixel is completely white (background).
    - Higher values (up to 16) mean the pixel is darker (more ink).
- So, the matrix is a grid of pixel intensity values, where each number shows how dark that pixel is. Note that this is the reverse of the 0 = black, 255 = white convention in the earlier example.
- When you plot the matrix as an image, you see the shape of the digit written by hand.

In [None]:
from sklearn.datasets import load_digits

# Load the digits dataset
digits = load_digits()

# Print the first 5 images as arrays and their labels
for i in range(5):
    
    print(f"Image {i} (label: {digits.target[i]}):")
    print(digits.images[i])
    
    # Show each image in a separate figure with its label
    plt.figure(figsize=(2, 2))
    plt.imshow(digits.images[i], cmap='gray_r')
    plt.title(f"Label: {digits.target[i]}")
    plt.axis('off')
    plt.show()

