# Summer of Code - Artificial Intelligence
## Week 08: Deep Learning

### Day 01: Artificial Neural Networks

In this notebook, we will explore **Artificial Neural Networks (ANNs)** using the PyTorch library.


In [1]:
# Setup
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import make_moons, load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="whitegrid")
torch.manual_seed(42)
np.random.seed(42)

## From a Single Neuron to a Feed-Forward Network
We’ll build up step-by-step:
- Start with a single neuron (linear combination + nonlinearity)
- Use it for binary classification (logistic regression)
- Stack neurons into a small feed-forward network for multiclass classification

We'll keep the runs fast and the code clean.

### 1) A single neuron (NumPy)
A neuron computes:
$z = w^T x + b$, then applies a nonlinearity (here: sigmoid $\sigma(z)$).
We’ll check its predictions on a tiny toy dataset (the AND gate).
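
Because $\sigma(z) \ge 0.5$ exactly when $z \ge 0$, the neuron's decision boundary is the straight line $w^T x + b = 0$. The sketch below checks this numerically in NumPy. The parameter values and the helper `boundary_x2` are illustrative assumptions, not learned quantities.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative, hand-picked parameters (an assumption for this sketch)
w = np.array([3.0, 3.0])
b = -4.0

def boundary_x2(x1):
    """Solve w[0]*x1 + w[1]*x2 + b = 0 for x2 (hypothetical helper)."""
    return -(w[0] * x1 + b) / w[1]

x1 = np.array([0.0, 0.5, 1.0])
x2 = boundary_x2(x1)

# Points that lie exactly on the boundary should get probability 0.5
points = np.column_stack([x1, x2])
p_on_boundary = sigmoid(points @ w + b)
print("Boundary points:\n", np.round(points, 3))
print("Probabilities on the boundary:", p_on_boundary)
```

Points on one side of this line get $p > 0.5$ and points on the other side get $p < 0.5$, which is why a single neuron can only separate classes with a straight line.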

### 2) Logistic regression as a single-neuron classifier (PyTorch)
- Single linear unit with sigmoid gives a binary classifier.
- We train it on a simple dataset that is not linearly separable (`make_moons`) and observe that, without feature engineering, it underfits.
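
To make the point about feature engineering concrete, here is a minimal sketch that compares a linear classifier on the raw moons features with the same classifier on polynomial features. For brevity it swaps in scikit-learn's `LogisticRegression` instead of a PyTorch neuron. The degree of 3 and the variable names are assumptions chosen for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

X_fe, y_fe = make_moons(n_samples=600, noise=0.2, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_fe, y_fe, test_size=0.2, random_state=42
)

# Linear model on raw features: the decision boundary is a straight line
linear_clf = make_pipeline(StandardScaler(), LogisticRegression())

# Same linear model, but on hand-engineered polynomial features
poly_clf = make_pipeline(
    PolynomialFeatures(degree=3, include_bias=False),
    StandardScaler(),
    LogisticRegression(max_iter=1000),
)

acc_linear = linear_clf.fit(X_tr, y_tr).score(X_te, y_te)
acc_poly = poly_clf.fit(X_tr, y_tr).score(X_te, y_te)
print(f"Raw features:        {acc_linear:.3f}")
print(f"Polynomial features: {acc_poly:.3f}")
```

The classifier is still linear in its inputs; only the inputs changed. Hidden layers in a neural network play a similar role, except that the features are learned rather than hand-crafted.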

### 3) From single neuron to a small Feed-Forward Network (Multiclass)
We’ll now stack layers with nonlinear activations to solve a multiclass problem (digits).
- Architecture: Input -> Hidden(64) -> Hidden(32) -> Output(10)
- Activation: ReLU
- Loss: CrossEntropyLoss (applies log-softmax internally, so the network outputs raw logits)
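
To see what the loss computes from raw logits, here is a minimal NumPy sketch of softmax cross-entropy averaged over a batch. It is an illustration of the formula, not PyTorch's implementation; the function name `cross_entropy_from_logits` and the toy values are assumptions.

```python
import numpy as np

def cross_entropy_from_logits(logits, targets):
    """Mean cross-entropy from raw logits (hypothetical helper)."""
    # Subtract the row-wise max for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    # Log-softmax: log of normalized exponentials
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Negative log-probability of the true class, averaged over the batch
    return -log_probs[np.arange(len(targets)), targets].mean()

# Two toy samples with three classes each
logits = np.array([[2.0, 0.5, -1.0],
                   [0.0, 0.0, 0.0]])
targets = np.array([0, 2])

loss = cross_entropy_from_logits(logits, targets)
print(f"Cross-entropy: {loss:.4f}")
```

The second sample has identical logits for all three classes, so it contributes $\log 3$ to the loss, the value for a uniform guess.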

In [None]:
# Multiclass classification with a small Feed-Forward Network (PyTorch)
digits = load_digits()
X_m = digits.data.astype(np.float32)
y_m = digits.target

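# Split first, then standardize with training statistics only (avoids test-set leakage)
X_train_m, X_test_m, y_train_m, y_test_m = train_test_split(
    X_m, y_m, test_size=0.2, random_state=42, stratify=y_m
)

scaler_m = StandardScaler()
X_train_m = scaler_m.fit_transform(X_train_m)
X_test_m = scaler_m.transform(X_test_m)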

X_train_m_t = torch.tensor(X_train_m, dtype=torch.float32)
y_train_m_t = torch.tensor(y_train_m, dtype=torch.long)
X_test_m_t = torch.tensor(X_test_m, dtype=torch.float32)
y_test_m_t = torch.tensor(y_test_m, dtype=torch.long)

mlp = nn.Sequential(
    nn.Linear(X_train_m.shape[1], 64),
    nn.ReLU(),
    nn.Linear(64, 32),
    nn.ReLU(),
    nn.Linear(32, 10),
)

criterion_m = nn.CrossEntropyLoss()
optimizer_m = optim.Adam(mlp.parameters(), lr=1e-3)

for epoch in range(15):  # short training: 15 full-batch gradient steps
    optimizer_m.zero_grad()
    logits = mlp(X_train_m_t)
    loss = criterion_m(logits, y_train_m_t)
    loss.backward()
    optimizer_m.step()

with torch.no_grad():
    preds = mlp(X_test_m_t).argmax(dim=1).numpy()
acc_m = accuracy_score(y_test_m, preds)
print(f"MLP accuracy on digits: {acc_m:.3f}")
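
To get a feel for the size of this network, the sketch below rebuilds the same layer stack and counts its trainable parameters. It assumes 64 input features, which matches the 8×8 images from `load_digits`; the name `mlp_sketch` is used only for this illustration.

```python
import torch.nn as nn

# Same architecture as above: 64 -> 64 -> 32 -> 10
mlp_sketch = nn.Sequential(
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 32),
    nn.ReLU(),
    nn.Linear(32, 10),
)

# Each Linear layer holds a weight matrix and a bias vector
for name, param in mlp_sketch.named_parameters():
    print(f"{name:10s} {str(tuple(param.shape)):10s} {param.numel()}")

total = sum(p.numel() for p in mlp_sketch.parameters())
print("Total trainable parameters:", total)
```

Each `nn.Linear(in, out)` contributes `in * out + out` parameters, so the three layers hold 4160, 2080, and 330 parameters respectively.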

In [None]:
# Binary classification with a single linear neuron (PyTorch)
X, y = make_moons(n_samples=600, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

X_train_t = torch.tensor(X_train, dtype=torch.float32)
y_train_t = torch.tensor(y_train, dtype=torch.float32).unsqueeze(1)
X_test_t = torch.tensor(X_test, dtype=torch.float32)
y_test_t = torch.tensor(y_test, dtype=torch.float32).unsqueeze(1)

model = nn.Linear(2, 1)  # single neuron
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

for epoch in range(200):
    optimizer.zero_grad()
    logits = model(X_train_t)
    loss = criterion(logits, y_train_t)
    loss.backward()
    optimizer.step()

with torch.no_grad():
    pred = (torch.sigmoid(model(X_test_t)) >= 0.5).int().numpy().ravel()
acc = accuracy_score(y_test, pred)
print(f"Logistic regression accuracy on moons: {acc:.3f}")
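
The cell above feeds raw logits to `BCEWithLogitsLoss` and applies the sigmoid only at prediction time. As a small sanity check on toy values (the tensors below are assumptions, not notebook data), this loss should match applying the sigmoid first and then `BCELoss`:

```python
import torch
import torch.nn as nn

# Toy logits and binary targets, chosen for illustration
logits_demo = torch.tensor([[-2.0], [0.0], [3.0]])
targets_demo = torch.tensor([[0.0], [1.0], [1.0]])

# Loss computed directly from logits
loss_from_logits = nn.BCEWithLogitsLoss()(logits_demo, targets_demo)

# Same loss computed from probabilities
loss_from_probs = nn.BCELoss()(torch.sigmoid(logits_demo), targets_demo)

print(loss_from_logits.item(), loss_from_probs.item())
print("Match:", torch.allclose(loss_from_logits, loss_from_probs))

# Thresholding the probability at 0.5 is the same as thresholding the logit at 0
same_decision = torch.equal(torch.sigmoid(logits_demo) >= 0.5, logits_demo >= 0)
print("Same decisions:", same_decision)
```

Working from logits is generally preferred because it avoids taking the log of a probability that has already rounded to 0 or 1.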

In [2]:
# Single neuron forward pass (NumPy)
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Tiny dataset (linearly separable)
X = np.array([[0.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.0],
              [1.0, 1.0]])
y = np.array([0, 0, 0, 1])  # AND gate

# Hand-picked weights and bias (not random, not learned)
w = np.array([3.0, 3.0])
b = -4.0

z = X @ w + b
p = sigmoid(z)  # probability of class 1
pred = (p >= 0.5).astype(int)

print("Probabilities:", np.round(p, 3))
print("Predictions:", pred)
print("Accuracy:", (pred == y).mean())

Probabilities: [0.018 0.269 0.269 0.881]
Predictions: [0 0 0 1]
Accuracy: 1.0
