# Neural Networks (NNs)

A **neural network** is a **parameterized, differentiable function approximator** loosely inspired by biological neural systems. It is composed of layers of **artificial neurons** (also called units or nodes), each of which computes a weighted sum of its inputs plus a bias and then applies a non-linear **activation function**.

Formally, a neural network $f(\mathbf{x}; \boldsymbol{\theta})$ maps an input vector $\mathbf{x} \in \mathbb{R}^n$ to an output vector $\mathbf{y} \in \mathbb{R}^m$, where $\boldsymbol{\theta}$ represents the learnable parameters (weights and biases). A typical feedforward neural network with $L$ layers can be defined recursively as:

$$
\mathbf{h}^{(0)} = \mathbf{x}
$$

$$
\mathbf{h}^{(l)} = \phi^{(l)}\left( \mathbf{W}^{(l)} \mathbf{h}^{(l-1)} + \mathbf{b}^{(l)} \right), \quad \text{for } l = 1, 2, \ldots, L
$$

$$
f(\mathbf{x}; \boldsymbol{\theta}) = \mathbf{h}^{(L)}
$$

where:
- $\mathbf{W}^{(l)} \in \mathbb{R}^{d_l \times d_{l-1}}$ is the weight matrix for layer $l$, with $d_0 = n$ and $d_L = m$,
- $\mathbf{b}^{(l)} \in \mathbb{R}^{d_l}$ is the bias vector,
- $\phi^{(l)}$ is an activation function, non-linear in the hidden layers (e.g., ReLU, sigmoid, tanh); the output layer's $\phi^{(L)}$ is often the identity or a softmax,
- $\mathbf{h}^{(l)}$ is the output (activation) of the $l$-th layer; for $l < L$ it is a hidden representation,
- $\boldsymbol{\theta} = \{ \mathbf{W}^{(l)}, \mathbf{b}^{(l)} \}_{l=1}^L$ is the set of all learnable parameters.
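
The recursion above can be sketched directly in NumPy. This is a minimal illustration, not a reference implementation: the layer sizes, the random weights, and the helper names `relu`, `identity`, and `forward` are assumptions chosen for the example.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def identity(z):
    return z

def forward(x, weights, biases, activations):
    """Evaluate h^(l) = phi^(l)(W^(l) h^(l-1) + b^(l)) for l = 1..L."""
    h = x  # h^(0) = x
    for W, b, phi in zip(weights, biases, activations):
        h = phi(W @ h + b)
    return h  # h^(L) = f(x; theta)

# Hypothetical layer sizes d_0 = 3, d_1 = 4, d_2 = 2 (illustrative only)
rng = np.random.default_rng(0)
dims = [3, 4, 2]
weights = [rng.normal(size=(dims[l], dims[l - 1])) for l in range(1, len(dims))]
biases = [np.zeros(dims[l]) for l in range(1, len(dims))]
activations = [relu, identity]  # non-linear hidden layer, identity output

x = rng.normal(size=dims[0])
y = forward(x, weights, biases, activations)
print(y.shape)  # (2,)
```

Each weight matrix has shape $d_l \times d_{l-1}$, so the product `W @ h` maps a vector of length $d_{l-1}$ to one of length $d_l$, matching the definitions above.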

Neural networks are typically trained using **gradient-based optimization**, most commonly **stochastic gradient descent (SGD)** or its variants, to minimize a **loss function** that quantifies the error between predicted and true outputs. The gradients are computed using **backpropagation**, which applies the chain rule of calculus to efficiently compute derivatives of the loss with respect to each parameter.
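
As a rough sketch of how backpropagation applies the chain rule, the code below computes the gradients by hand for a one-hidden-layer network and compares one of them with a finite-difference estimate. The tanh activation, the squared-error loss, the layer sizes, the learning rate, and the function name `loss_and_grads` are all assumptions made for illustration; the update loop is plain gradient descent on a single example rather than mini-batch SGD.

```python
import numpy as np

def loss_and_grads(params, x, y):
    """Forward pass, then backpropagation, for a one-hidden-layer tanh
    network with squared-error loss 0.5 * ||y_hat - y||^2."""
    W1, b1, W2, b2 = params
    # Forward pass (intermediates are kept for the backward pass)
    z1 = W1 @ x + b1
    h1 = np.tanh(z1)
    y_hat = W2 @ h1 + b2            # identity output activation
    loss = 0.5 * np.sum((y_hat - y) ** 2)

    # Backward pass: chain rule applied from the output back to the input
    d_yhat = y_hat - y              # dL/dy_hat
    dW2 = np.outer(d_yhat, h1)
    db2 = d_yhat
    d_h1 = W2.T @ d_yhat
    d_z1 = d_h1 * (1.0 - h1 ** 2)   # tanh'(z) = 1 - tanh(z)^2
    dW1 = np.outer(d_z1, x)
    db1 = d_z1
    return loss, [dW1, db1, dW2, db2]

rng = np.random.default_rng(0)
params = [rng.normal(size=(4, 3)), np.zeros(4),
          rng.normal(size=(2, 4)), np.zeros(2)]
x, y = rng.normal(size=3), rng.normal(size=2)

# Compare one backprop gradient entry with a central finite difference
loss, grads = loss_and_grads(params, x, y)
eps = 1e-6
params[0][0, 0] += eps
loss_plus, _ = loss_and_grads(params, x, y)
params[0][0, 0] -= 2 * eps
loss_minus, _ = loss_and_grads(params, x, y)
params[0][0, 0] += eps              # restore the original weight
grad_error = abs((loss_plus - loss_minus) / (2 * eps) - grads[0][0, 0])
print(f"gradient check error: {grad_error:.2e}")

# Gradient descent: theta <- theta - lr * grad
lr = 0.01
for _ in range(500):
    _, grads = loss_and_grads(params, x, y)
    params = [p - lr * g for p, g in zip(params, grads)]
final_loss, _ = loss_and_grads(params, x, y)
print(f"loss before: {loss:.4f}, after: {final_loss:.4f}")
```

Automatic differentiation frameworks perform the same backward pass automatically, which is what `loss.backward()` does in PyTorch.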

Variants of neural networks include:
- **Convolutional Neural Networks (CNNs)** for spatial data (e.g., images),
- **Recurrent Neural Networks (RNNs)** for sequential data,
- **Transformer architectures** for attention-based sequence modeling.

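Feedforward neural networks are universal function approximators: with a suitable non-linear activation and enough hidden units, even a single hidden layer can approximate any continuous function on a compact subset of $\mathbb{R}^n$ (and, in a weaker sense, any Borel measurable function) to arbitrary accuracy. This is a statement about representational capacity only. It says that suitable parameters exist, not that gradient-based training on finite data will find them.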
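
One way to build intuition for this in one dimension is to construct, by hand, a single-hidden-layer ReLU network that linearly interpolates a target function; adding hidden units shrinks the error. The sketch below is only an illustration of the idea, not a proof of the theorem, and the target function (`np.sin`), the interval, and the helper name `relu_interpolant` are assumptions chosen for the example.

```python
import numpy as np

def relu_interpolant(f, a, b, width):
    """Build a one-hidden-layer ReLU network with `width` hidden units that
    linearly interpolates f at width + 1 evenly spaced knots on [a, b]."""
    knots = np.linspace(a, b, width + 1)
    values = f(knots)
    slopes = np.diff(values) / np.diff(knots)
    # Output weights: the first slope, then the change in slope at each knot
    out_weights = np.concatenate(([slopes[0]], np.diff(slopes)))
    hidden_biases = -knots[:-1]     # hidden unit k computes ReLU(x - knot_k)
    out_bias = values[0]

    def network(x):
        x = np.asarray(x, dtype=float)
        hidden = np.maximum(x[..., None] + hidden_biases, 0.0)
        return hidden @ out_weights + out_bias

    return network

xs = np.linspace(0.0, 2.0 * np.pi, 2001)
errors = {}
for width in (4, 16, 64):
    net = relu_interpolant(np.sin, 0.0, 2.0 * np.pi, width)
    errors[width] = np.max(np.abs(net(xs) - np.sin(xs)))
    print(f"width={width:3d}  max |error| = {errors[width]:.5f}")
```

The weights here are set analytically rather than learned, which reflects the point that the theorem concerns what a network can represent, not what training finds.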

## Example: a 2D classification problem using `make_moons`

The following script generates a two-class dataset with scikit-learn's `make_moons`, builds a simple MLP (multi-layer perceptron) with two hidden layers, trains it using the Adam optimizer and cross-entropy loss, and evaluates classification accuracy on a held-out test set.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Generate synthetic data
X, y = make_moons(n_samples=1000, noise=0.2, random_state=42)
X = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Convert to PyTorch tensors
X_train = torch.tensor(X_train, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.long)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_test = torch.tensor(y_test, dtype=torch.long)

# Define a simple feedforward neural network
class SimpleNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, 16),
            nn.ReLU(),
            nn.Linear(16, 8),
            nn.ReLU(),
            nn.Linear(8, 2)  # 2 output classes
        )

    def forward(self, x):
        return self.net(x)

# Instantiate model, loss, and optimizer
model = SimpleNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Training loop
epochs = 100
for epoch in range(epochs):
    # Forward pass
    outputs = model(X_train)
    loss = criterion(outputs, y_train)

    # Backward and optimize
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (epoch+1) % 10 == 0:
        print(f"Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}")

# Evaluate on test set
with torch.no_grad():
    test_outputs = model(X_test)
    predictions = torch.argmax(test_outputs, dim=1)
    accuracy = (predictions == y_test).float().mean()
    print(f"Test Accuracy: {accuracy:.2f}")
```


Output from one run (exact values vary between runs because the train/test split and the weight initialization are not seeded):

```text
Epoch [10/100], Loss: 0.5675
Epoch [20/100], Loss: 0.3909
Epoch [30/100], Loss: 0.2990
Epoch [40/100], Loss: 0.2747
Epoch [50/100], Loss: 0.2602
Epoch [60/100], Loss: 0.2413
Epoch [70/100], Loss: 0.2187
Epoch [80/100], Loss: 0.1923
Epoch [90/100], Loss: 0.1631
Epoch [100/100], Loss: 0.1315
Test Accuracy: 0.95
```
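
Note that the final `nn.Linear(8, 2)` layer has no activation, so the model outputs raw logits; `nn.CrossEntropyLoss` applies a log-softmax to them internally. The short sketch below illustrates that relationship and shows how logits could be turned into class probabilities. The logit and target values are made up for illustration and do not come from the trained model above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical logits for 3 samples and 2 classes (illustrative values only)
logits = torch.tensor([[2.0, -1.0], [0.5, 0.7], [-3.0, 1.0]])
targets = torch.tensor([0, 1, 1])

# CrossEntropyLoss on raw logits equals NLL loss on log-softmax outputs
loss_builtin = nn.CrossEntropyLoss()(logits, targets)
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(loss_builtin, loss_manual))  # True

# Softmax converts logits to probabilities; each row sums to 1
probs = F.softmax(logits, dim=1)
print(probs.sum(dim=1))

# Softmax is monotonic, so argmax over logits gives the same predicted class
print(probs.argmax(dim=1).tolist())  # [0, 1, 1]
```

This is why the evaluation step can take `torch.argmax` of the model outputs directly, without applying a softmax first.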
