How we feed data into a neural network affects training speed, stability, and generalization.

🧪 Intuition:

Imagine walking down a hill (the loss surface) toward its lowest point:

🟦 Batch: You analyze the whole mountain before taking a step — stable but slow.

🟥 Stochastic: You take a wild step after seeing just one rock — fast but jittery.

🟩 Mini-Batch: You study a portion of the slope before stepping — fast & more stable.
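
To make the analogy a little more concrete, here is a minimal sketch in plain Python. It assumes a one-feature linear model `y_hat = w*x + b` with squared error, six points on the line y = 2x + 1, and a hypothetical starting point `w = 0, b = 0` chosen only for illustration. It compares the gradient with respect to `w` computed from single samples, from mini-batches of two, and from the full batch.

```python
# Six points on y = 2x + 1 (assumed toy data for this sketch)
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x + 1.0 for x in xs]

# Hypothetical starting parameters, chosen only for illustration
w, b = 0.0, 0.0


def grad_w(x, y):
    # d/dw (w*x + b - y)^2 = 2 * (w*x + b - y) * x
    return 2.0 * (w * x + b - y) * x


def mean(values):
    return sum(values) / len(values)


# One gradient per sample: this is what a batch size of 1 would follow
per_sample = [grad_w(x, y) for x, y in zip(xs, ys)]

# Average over consecutive pairs: a batch size of 2
mini_grads = [mean(per_sample[i:i + 2]) for i in range(0, len(xs), 2)]

# Average over everything: the full-batch gradient
batch_grad = mean(per_sample)

print("single-sample:", per_sample)
print("mini-batch (2):", mini_grads)
print("full batch:", batch_grad)
```

The single-sample gradients spread widely around the full-batch value, and the mini-batch gradients spread less. Averaging the mini-batch gradients recovers the full-batch gradient, which is why smaller batches can be read as noisier estimates of the same direction.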

In [101]:
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

In [102]:
# Sample dataset: six points on the line y = 2x + 1
X = torch.tensor([[0.], [1.], [2.], [3.], [4.], [5.]], dtype=torch.float32)
y = 2 * X + 1

In [103]:
dataset = TensorDataset(X, y)

In [104]:
# Try different batch sizes here: 1 (SGD), 6 (full batch), 2 or 3 (mini-batch)
loader = DataLoader(dataset, batch_size=6, shuffle=True)
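
The batch size also determines how many parameter updates happen per epoch. The following sketch rebuilds the same six-sample dataset so that it runs on its own, and counts the batches a `DataLoader` yields for each size; `steps_per_epoch` is a name introduced here for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Same toy data as above: x = 0..5, y = 2x + 1
X = torch.arange(6, dtype=torch.float32).unsqueeze(1)  # shape (6, 1)
y = 2 * X + 1
dataset = TensorDataset(X, y)

steps_per_epoch = {}
for batch_size in (1, 2, 3, 6):
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    # len(loader) is the number of batches, i.e. updates, in one epoch
    steps_per_epoch[batch_size] = len(loader)

print(steps_per_epoch)  # {1: 6, 2: 3, 3: 2, 6: 1}
```

With six samples, a batch size of 6 gives a single update per epoch, while a batch size of 1 gives six updates per epoch.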

In [105]:
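model = nn.Linear(1, 1)  # one weight and one bias: y_hat = w * x + b
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(10):
    for xb, yb in loader:
        pred = model(xb)           # forward pass on the current batch
        loss = loss_fn(pred, yb)
        optimizer.zero_grad()      # clear gradients from the previous step
        loss.backward()            # compute gradients for this batch
        optimizer.step()           # one parameter update per batch
    # Reports the loss of the last batch seen in this epoch
    print(f'Epoch : {epoch + 1}, Loss : {loss.item():.4f}')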

Epoch : 1, Loss : 25.2862
Epoch : 2, Loss : 23.9892
Epoch : 3, Loss : 22.7593
Epoch : 4, Loss : 21.5932
Epoch : 5, Loss : 20.4874
Epoch : 6, Loss : 19.4386
Epoch : 7, Loss : 18.4440
Epoch : 8, Loss : 17.5007
Epoch : 9, Loss : 16.6059
Epoch : 10, Loss : 15.7572
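
To compare the three regimes side by side, one option is to wrap the loop in a function and run it once per batch size. This is a sketch under assumptions, not the notebook's own experiment. The helper `train` and its defaults (`lr=0.01`, `epochs=200`, `seed=0`) are hypothetical. It assumes a smaller step than the `lr=0.1` used above, because single-sample updates at 0.1 can overshoot on the larger inputs.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

X = torch.arange(6, dtype=torch.float32).unsqueeze(1)  # x = 0..5
y = 2 * X + 1
dataset = TensorDataset(X, y)
loss_fn = nn.MSELoss()


def train(batch_size, lr=0.01, epochs=200, seed=0):
    """Return (initial, final) loss on the full dataset for one batch size."""
    torch.manual_seed(seed)  # same initial weights for every batch size
    model = nn.Linear(1, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

    with torch.no_grad():
        initial = loss_fn(model(X), y).item()

    for _ in range(epochs):
        for xb, yb in loader:
            optimizer.zero_grad()
            loss_fn(model(xb), yb).backward()
            optimizer.step()

    # Evaluate on all samples so the runs are comparable
    with torch.no_grad():
        final = loss_fn(model(X), y).item()
    return initial, final


results = {bs: train(bs) for bs in (1, 2, 6)}
for bs, (initial, final) in results.items():
    print(f"batch_size={bs}: loss {initial:.4f} -> {final:.4f}")
```

Measuring the loss on the whole dataset, rather than on the last batch, keeps the comparison fair. For the same number of epochs, smaller batches perform more updates, so they may reach a lower loss here even though each individual step is noisier.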


Summary (a mental picture):

Batch GD       → 🐢 Steady, but slow and memory-heavy  
SGD            → 🐇 Fast, but noisy and unstable  
Mini-Batch GD  → 🦊 Fast and stable (the usual default in practice)