# MATH2603 Lab — Artificial Neural Networks (Part I)
## Perceptrons · Activation functions · Decision boundaries · Overfitting vs underfitting

This practical accompanies **ANN – Part I**. You will run small computational experiments to understand:
- what a **perceptron** can (and cannot) represent
- why **nonlinear activation functions** matter
- how a small neural network can learn **nonlinear decision boundaries**
- the difference between **underfitting** and **overfitting**

> **How to run:** click a code cell and press **Shift + Enter**.

### Outline
- **Part A:** Perceptron as a linear classifier (and why XOR fails)
- **Part B:** Soft perceptron with sigmoid; probability view
- **Part C:** Small neural network (MLP) learns nonlinear boundaries
- **Part D:** Overfitting vs underfitting with train/test split
- **Wrap-up:** Reflection questions (useful for portfolio)


> **Important:** If you see a `NameError` (a name such as a function or variable is not defined), you probably ran cells out of order. Use **Run All** once, then work top to bottom.

## 0) Setup check (run first)

If you get `ModuleNotFoundError`, install packages from a terminal (Anaconda Prompt / command line):

```bash
pip install numpy matplotlib scikit-learn
```


In [4]:
import sys
print("Python:", sys.version.split()[0])

import numpy as np
import matplotlib.pyplot as plt

try:
    import sklearn
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier
    print("scikit-learn:", sklearn.__version__)
except Exception as e:
    print("scikit-learn not available yet:", repr(e))


Python: 3.12.1
scikit-learn: 1.8.0


## Helper functions (plots + datasets)

We will work with **2D toy datasets** so we can *see* decision boundaries.


In [None]:
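def make_linearly_separable(n=200, noise=0.25, seed=0):
    """Standard-normal points labelled by the side of the line x1 + x2 = 0 they fall on."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)  # labels from the true boundary
    # Jitter inputs after labelling, so the classes overlap near the line
    X = X + noise * rng.normal(size=X.shape)
    return X, y

def make_xor(n=200, noise=0.2, seed=0):
    """XOR pattern: class 1 where exactly one of x1, x2 is positive, then jittered."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1, 1, size=(n, 2))
    y = ((X[:, 0] > 0) ^ (X[:, 1] > 0)).astype(int)  # XOR of the two sign tests
    X = X + noise * rng.normal(size=X.shape)  # jitter inputs after labelling
    return X, y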

def plot_points(X, y, title="Dataset"):
    plt.figure(figsize=(5.5, 5))
    plt.scatter(X[y==0, 0], X[y==0, 1], s=20, label="Class 0")
    plt.scatter(X[y==1, 0], X[y==1, 1], s=20, label="Class 1")
    plt.xlabel("x1")
    plt.ylabel("x2")
    plt.title(title)
    plt.legend()
    plt.show()

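def plot_decision_boundary(predict_fn, X, y, title="Decision boundary", grid_step=0.02):
    """Shade the regions predicted by predict_fn (maps an (m, 2) array to m labels) and overlay the data."""
    # Build a grid that covers the data with a small margin
    x_min, x_max = X[:,0].min()-0.6, X[:,0].max()+0.6
    y_min, y_max = X[:,1].min()-0.6, X[:,1].max()+0.6
    xx, yy = np.meshgrid(np.arange(x_min, x_max, grid_step),
                         np.arange(y_min, y_max, grid_step))
    grid = np.c_[xx.ravel(), yy.ravel()]     # one row per grid point
    zz = predict_fn(grid).reshape(xx.shape)  # predicted class at each grid point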

    plt.figure(figsize=(5.5, 5))
    plt.contourf(xx, yy, zz, alpha=0.25)
    plt.scatter(X[y==0, 0], X[y==0, 1], s=18, label="Class 0")
    plt.scatter(X[y==1, 0], X[y==1, 1], s=18, label="Class 1")
    plt.xlabel("x1")
    plt.ylabel("x2")
    plt.title(title)
    plt.legend()
    plt.show()


# Part A — Perceptron = linear classifier

A (hard) perceptron computes:
- **weighted sum + bias:**  z = w·x + b
- **threshold:** output 1 if z ≥ 0 else 0

In 2D, the decision boundary is a **straight line**.
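
To make the two steps concrete, here is a minimal sketch that evaluates a hard perceptron by hand on the four corners of the unit square. The name `perceptron_output` and the weights `w_and`, `b_and` are illustrative choices, not part of the lab code; with these values the perceptron happens to behave like a logical AND.

```python
import numpy as np

def perceptron_output(X, w, b):
    """Hard perceptron: 1 where w.x + b >= 0, else 0."""
    z = X @ w + b
    return (z >= 0).astype(int)

# Four corner points of the unit square
X_corners = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# Hand-picked (illustrative) weights and bias
w_and = np.array([1.0, 1.0])
b_and = -1.5

z = X_corners @ w_and + b_and                      # weighted sum + bias
and_out = perceptron_output(X_corners, w_and, b_and)  # threshold at 0

for x, zi, yi in zip(X_corners, z, and_out):
    print(f"x={x}, z={zi:+.1f}, output={yi}")
```

Only the point (1, 1) gives z ≥ 0, so it is the only point assigned to class 1.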


In [None]:
def perceptron_predict(X, w, b):
    z = X @ w + b
    return (z >= 0).astype(int)

# A linearly separable dataset
X_lin, y_lin = make_linearly_separable(n=250, noise=0.25, seed=1)
plot_points(X_lin, y_lin, title="Linearly separable dataset")


## A1) Try a perceptron by hand (change w and b)

Goal: choose `w` and `b` so the boundary separates the classes.

**Tip:** Start with `w = [1, 1]` and adjust `b` to shift the line.
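
If it helps to see why changing `b` shifts the line, the sketch below rewrites the boundary w1·x1 + w2·x2 + b = 0 as x2 = slope·x1 + intercept and also reports its distance from the origin. The helper names `boundary_line` and `distance_from_origin` are hypothetical, and the code assumes `w[1]` is not zero.

```python
import numpy as np

def boundary_line(w, b):
    """Slope and intercept of w1*x1 + w2*x2 + b = 0, written as x2 = slope*x1 + intercept.

    Assumes w[1] != 0 (otherwise the line is vertical).
    """
    slope = -w[0] / w[1]
    intercept = -b / w[1]
    return float(slope), float(intercept)

def distance_from_origin(w, b):
    """Perpendicular distance from the origin to the boundary line."""
    return float(abs(b) / np.linalg.norm(w))

w_demo = np.array([1.0, 1.0])
for b_demo in [-1.0, 0.0, 1.0]:
    slope, intercept = boundary_line(w_demo, b_demo)
    dist = distance_from_origin(w_demo, b_demo)
    print(f"b={b_demo:+.1f}: x2 = {slope:.1f}*x1 + {intercept:+.1f}, distance from origin = {dist:.3f}")
```

With `w` fixed, the slope stays the same and only the intercept moves, which is why adjusting `b` slides the line without rotating it.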


In [None]:
# --- TODO: change these ---
w = np.array([1.0, 1.0])
b = 0.0

y_pred = perceptron_predict(X_lin, w, b)
acc = (y_pred == y_lin).mean()

plot_decision_boundary(lambda Z: perceptron_predict(Z, w, b), X_lin, y_lin,
                       title=f"Perceptron boundary (acc={acc:.3f})")

print("Accuracy:", acc)


## A2) Why XOR fails

Now try XOR. No straight line can separate XOR perfectly.
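
As a quick sanity check of this claim, the sketch below tries every perceptron on a coarse grid of weights and biases against the four noise-free XOR corner points. The grid `grid_values` is an arbitrary illustrative choice, and a finite search is evidence rather than a proof; the geometric argument is still yours to give in the short answer.

```python
import itertools
import numpy as np

# Noise-free XOR on the four corners of the unit square
X_corners = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_xor_corners = np.array([0, 1, 1, 0])

grid_values = np.linspace(-2.0, 2.0, 21)  # illustrative search grid for w1, w2, b

best_correct = 0
best_params = None
for w1, w2, b in itertools.product(grid_values, repeat=3):
    pred = (X_corners @ np.array([w1, w2]) + b >= 0).astype(int)
    n_correct = int((pred == y_xor_corners).sum())
    if n_correct > best_correct:
        best_correct, best_params = n_correct, (w1, w2, b)

print(f"Best perceptron on the grid: {best_correct}/4 points correct")
```

No combination on the grid classifies all four points correctly.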


In [None]:
X_xor, y_xor = make_xor(n=260, noise=0.18, seed=2)
plot_points(X_xor, y_xor, title="XOR dataset (not linearly separable)")


In [None]:
# Try a perceptron on XOR (you can still change w,b)
w = np.array([1.0, 1.0])
b = 0.0

y_pred = perceptron_predict(X_xor, w, b)
acc = (y_pred == y_xor).mean()

plot_decision_boundary(lambda Z: perceptron_predict(Z, w, b), X_xor, y_xor,
                       title=f"Perceptron on XOR (acc={acc:.3f})")

print("Accuracy:", acc)


### Short answer (Part A) — write 5–8 sentences
1. In your own words, what geometric object is a perceptron decision boundary in 2D?
2. Why can’t a single perceptron solve XOR?
3. What kind of change is needed to solve XOR?


**Your answer here:**


# Part B — “Soft” perceptron with sigmoid

The sigmoid activation is

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

We interpret $\sigma(z)$ as a probability-like score between 0 and 1.
We also introduce a **steepness** parameter $\beta > 0$ and use

$$\sigma(\beta z)$$
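
One way to quantify "steepness" is the slope of the curve at z = 0. By the chain rule, the derivative of sigmoid(beta * z) with respect to z is beta * s * (1 - s) with s = sigmoid(beta * z), which equals beta / 4 at z = 0. The sketch below checks this numerically with a central difference; `sigmoid_fn`, `betas` and the step `h` are illustrative names and values, not part of the lab code.

```python
import numpy as np

def sigmoid_fn(z):
    """Logistic sigmoid 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

h = 1e-5                                  # small step for the finite difference
betas = np.array([0.5, 1.0, 3.0, 8.0])

# Central-difference estimate of d/dz sigmoid(beta * z) at z = 0
numeric_slopes = np.array([
    (sigmoid_fn(beta_k * h) - sigmoid_fn(-beta_k * h)) / (2 * h)
    for beta_k in betas
])
analytic_slopes = betas / 4.0             # beta * 0.5 * (1 - 0.5)

for beta_k, num, ana in zip(betas, numeric_slopes, analytic_slopes):
    print(f"beta={beta_k}: numeric slope={num:.4f}, beta/4={ana:.4f}")
```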


In [None]:
def sigmoid(z):
    return 1/(1+np.exp(-z))

def soft_predict_proba(X, w, b, beta=1.0):
    z = beta*(X @ w + b)
    return sigmoid(z)

zs = np.linspace(-8, 8, 400)
plt.figure(figsize=(6.5,4))
for beta in [0.5, 1.0, 3.0, 8.0]:
    plt.plot(zs, sigmoid(beta*zs), label=f"beta={beta}")
plt.xlabel("z")
plt.ylabel("sigma(beta z)")
plt.title("Sigmoid activation: changing steepness")
plt.legend()
plt.show()


## B1) Soft boundary (change beta)

Try `beta = 0.5, 1, 3, 10`.


In [None]:
w = np.array([1.0, 1.0])
b = 0.0
beta = 1.0  # TODO

proba = soft_predict_proba(X_lin, w, b, beta=beta)
y_pred = (proba >= 0.5).astype(int)
acc = (y_pred == y_lin).mean()

plot_decision_boundary(lambda Z: (soft_predict_proba(Z, w, b, beta=beta) >= 0.5).astype(int),
                       X_lin, y_lin,
                       title=f"Soft perceptron (beta={beta}) | acc={acc:.3f}")

print("Accuracy:", acc)
print("Example probabilities:", np.round(proba[:10], 3))


### Short answer (Part B) — write 4–7 sentences
1. How does increasing β change the sigmoid?
2. Why is a soft output useful?
3. Does a soft perceptron solve XOR? Why?


**Your answer here:**


# Part C — A small neural network learns nonlinear boundaries (MLP)

With a hidden layer and a nonlinear activation, a network can represent nonlinear decision boundaries.
We use scikit-learn's `MLPClassifier` so that we can focus on the concepts rather than on implementing training ourselves.
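
Before training anything, it may help to see that a hidden layer really is enough. The sketch below wires up a tiny network by hand: two hidden perceptrons (one acting like OR, one like AND) feed a single output perceptron. All weights are hand-picked for illustration, and the hard `step` activation is a simplification; `MLPClassifier` learns its weights from data and uses smooth activations instead.

```python
import numpy as np

def step(z):
    """Hard threshold activation: 1 where z >= 0, else 0."""
    return (z >= 0).astype(int)

X_corners = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# Hidden layer: column 0 holds the weights of unit 0, column 1 those of unit 1
W_hidden = np.array([[1.0, 1.0],
                     [1.0, 1.0]])
b_hidden = np.array([-0.5, -1.5])   # unit 0 fires like OR, unit 1 like AND

# Output unit: fires when OR is on and AND is off
w_out = np.array([1.0, -1.0])
b_out = -0.5

H = step(X_corners @ W_hidden + b_hidden)   # hidden activations, shape (4, 2)
xor_out = step(H @ w_out + b_out)           # network output, shape (4,)

for x, h_row, out in zip(X_corners, H, xor_out):
    print(f"x={x}, hidden={h_row}, output={out}")
```

Each hidden unit still draws a straight line; the output unit combines the two half-planes into a region that no single line could produce.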


## C0) Import scikit-learn (run)

If this cell errors, install scikit-learn (see setup cell).


In [None]:
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier


## C1) Train an MLP on XOR

Try:
- `hidden_layer_sizes`: `(2,)`, `(10,)`, `(50,)`
- `activation`: `'tanh'`, `'relu'`, `'logistic'`


In [None]:
# Ensure XOR data exists (in case you run cells out of order)
try:
    X_xor, y_xor
except NameError:
    X_xor, y_xor = make_xor(n=260, noise=0.18, seed=2)
    print("Note: XOR dataset was (re)created in this cell.")

X_train, X_test, y_train, y_test = train_test_split(
    X_xor, y_xor, test_size=0.3, random_state=0, stratify=y_xor
)

hidden_layer_sizes = (10,)   # TODO: try (2,), (10,), (50,)
activation = "tanh"          # TODO: try "tanh", "relu", "logistic"
alpha = 1e-4                 # L2 regularisation strength (larger = smoother, simpler boundary)
max_iter = 4000

mlp = MLPClassifier(
    hidden_layer_sizes=hidden_layer_sizes,
    activation=activation,
    alpha=alpha,
    random_state=0,
    max_iter=max_iter,
)

mlp.fit(X_train, y_train)

train_acc = mlp.score(X_train, y_train)
test_acc = mlp.score(X_test, y_test)

print("Train accuracy:", train_acc)
print("Test accuracy:", test_acc)

title = (
    f"MLP on XOR | hidden={hidden_layer_sizes}, act={activation}\n"
    f"train={train_acc:.3f}, test={test_acc:.3f}"
)

plot_decision_boundary(lambda Z: mlp.predict(Z), X_xor, y_xor, title=title)


### Task C2
Answer after trying several settings:

1. What happens when the network is too small?
2. What happens when it is larger?
3. How does activation affect the boundary?


**Your answer here:**


# Part D — Underfitting vs Overfitting (train/test)

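We now use a noisier dataset to study generalisation, that is, how well a model performs on data it was not trained on.
You will vary model complexity (the hidden-layer size) and regularisation strength (`alpha`).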
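
A useful reference point for this part: in `make_linearly_separable` the labels are assigned before the inputs are jittered, so even the true rule x1 + x2 > 0 misclassifies some jittered points. The sketch below measures this, using a copy of the same recipe under the illustrative name `make_noisy_linear` so that the block runs on its own. Treat the result as a rough ceiling for accuracy on new data, not as an exact bound.

```python
import numpy as np

def make_noisy_linear(n=400, noise=0.55, seed=7):
    """Same recipe as make_linearly_separable in this notebook (copied here to be self-contained)."""
    rng = np.random.default_rng(seed)
    X_clean = rng.normal(size=(n, 2))
    y = (X_clean[:, 0] + X_clean[:, 1] > 0).astype(int)   # labels from clean inputs
    X_noisy = X_clean + noise * rng.normal(size=X_clean.shape)
    return X_noisy, y

X_noisy, y_noisy = make_noisy_linear()

# Apply the true rule to the jittered inputs
true_rule_pred = (X_noisy[:, 0] + X_noisy[:, 1] > 0).astype(int)
true_rule_acc = (true_rule_pred == y_noisy).mean()

print(f"Accuracy of the true rule on jittered inputs: {true_rule_acc:.3f}")
```

If a model's training accuracy is well above this value, it is probably fitting the jitter rather than the underlying boundary.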


In [None]:
X, y = make_linearly_separable(n=400, noise=0.55, seed=7)
plot_points(X, y, title="Noisier dataset (harder classification)")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)


## D1) Compare a few configurations

Interpretation guide:
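- **Underfitting:** training and test accuracy are both low
- **Good fit:** training and test accuracy are both high and close to each other
- **Overfitting:** training accuracy is very high, but test accuracy is noticeably lower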
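
The guide above can be turned into a rough rule of thumb in code. This is only a sketch: the helper `diagnose_fit` and its thresholds `low` and `gap` are hypothetical values chosen for illustration, and sensible cut-offs depend on the dataset and on how noisy it is.

```python
def diagnose_fit(train_acc, test_acc, low=0.75, gap=0.10):
    """Label a (train, test) accuracy pair using illustrative thresholds.

    low: accuracies below this count as "low" (hypothetical value)
    gap: a train-minus-test difference above this counts as overfitting (hypothetical value)
    """
    if train_acc < low and test_acc < low:
        return "underfitting"
    if train_acc - test_acc > gap:
        return "overfitting"
    return "good fit"

# Made-up accuracy pairs, purely to exercise the three branches
for tr, te in [(0.62, 0.60), (0.86, 0.84), (1.00, 0.80)]:
    print(f"train={tr:.2f}, test={te:.2f} -> {diagnose_fit(tr, te)}")
```

Use it as a prompt for your own judgement rather than as a verdict: borderline cases need a look at the decision boundary plot as well.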


In [None]:
configs = [
    {"hidden": (2,),  "alpha": 1e-2},
    {"hidden": (10,), "alpha": 1e-3},
    {"hidden": (80,), "alpha": 1e-6},
]

for cfg in configs:
    model = MLPClassifier(hidden_layer_sizes=cfg["hidden"],
                          activation="tanh",
                          alpha=cfg["alpha"],
                          random_state=0,
                          max_iter=5000)
    model.fit(X_train, y_train)
    tr = model.score(X_train, y_train)
    te = model.score(X_test, y_test)
    print(f"hidden={cfg['hidden']}, alpha={cfg['alpha']:>8g} | train={tr:.3f}, test={te:.3f}")


## D2) Plot one chosen configuration

Choose values for `hidden_layer_sizes` and `alpha` and plot the boundary.


In [None]:
# Choose values for hidden_layer_sizes and alpha and plot the boundary.

hidden_layer_sizes = (10,)  # TODO: try (2,), (10,), (80,)
alpha = 1e-3                # TODO: try 1e-2, 1e-3, 1e-6

model = MLPClassifier(
    hidden_layer_sizes=hidden_layer_sizes,
    activation="tanh",
    alpha=alpha,
    random_state=0,
    max_iter=5000,
)

model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)

title = (
    f"Boundary | hidden={hidden_layer_sizes}, alpha={alpha}\n"
    f"train={train_acc:.3f}, test={test_acc:.3f}"
)

plot_decision_boundary(lambda Z: model.predict(Z), X, y, title=title)


### Short answer (Part D) — write 6–10 sentences
1. Give one configuration that underfits and explain how you know.
2. Give one configuration that overfits and explain how you know.
3. Give at least two practical strategies to reduce overfitting.


**Your answer here:**


# Wrap-up reflection (10 minutes)

Write 6–10 sentences:
1. Why do neural networks need nonlinear activation functions?
2. What is the most important limitation of a single perceptron?
3. Why do we use a train/test split?
4. (Optional) How does this connect to complex systems (nonlinearity, emergence, etc.)?


**Your answer here:**


---
## Troubleshooting

- If plots do not appear in VS Code, try running the notebook in your browser (Jupyter Notebook).
- If the MLP does not converge (scikit-learn prints a `ConvergenceWarning`), increase `max_iter`, or increase `alpha` slightly.
- If `sklearn` is missing, run: `pip install scikit-learn`.
