<a href="https://colab.research.google.com/github/zeeshan-sardar/efficient_ml/blob/main/pytorch_basics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets  # torchvision provides vision datasets, transforms, and model architectures.
# Sibling libraries such as TorchText and TorchAudio cover other domains.
from torchvision.transforms import ToTensor

Torchvision dataset classes accept two optional arguments: `transform`, applied to each image, and `target_transform`, applied to each label.

In [2]:
training_data = datasets.FashionMNIST(root="data", train=True, download=True, transform=ToTensor())

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to data/FashionMNIST/raw/train-images-idx3-ubyte.gz


100%|██████████| 26421880/26421880 [00:02<00:00, 11477234.31it/s]


Extracting data/FashionMNIST/raw/train-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw/train-labels-idx1-ubyte.gz


100%|██████████| 29515/29515 [00:00<00:00, 206134.47it/s]


Extracting data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz


100%|██████████| 4422102/4422102 [00:01<00:00, 3828617.95it/s]


Extracting data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 5148/5148 [00:00<00:00, 15646577.53it/s]


Extracting data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw



In [3]:
test_dataset = datasets.FashionMNIST(root="data", train=False, download=True, transform=ToTensor())

In [4]:
train_dataloader = DataLoader(dataset=training_data, batch_size=64)
test_dataloader = DataLoader(dataset=test_dataset, batch_size=64)

In [5]:
for X, y in train_dataloader:
  print(f"Shape of X [N, C, H, W]: {X.shape}")
  print(f"Shape of y: {y.shape}, {y.dtype}")
  break

Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28])
Shape of y: torch.Size([64]), torch.int64


A neural network is defined by subclassing `nn.Module`. The layers are declared in the `__init__` method, and the way data passes through them is defined in the `forward` method.

In [6]:
# Get cpu, gpu or mps device for training.
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")

class NeuralNetwork(nn.Module):
  def __init__(self):
    super().__init__()
    self.flatten = nn.Flatten()
    self.linear_relu_stack = nn.Sequential(
        nn.Linear(28*28, 512), nn.ReLU(),
        nn.Linear(512, 512), nn.ReLU(),
        nn.Linear(512, 10))
  def forward(self, x):
    x = self.flatten(x)
    logits = self.linear_relu_stack(x)
    return logits

model = NeuralNetwork().to(device)
print(model)



Using cpu device
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


To train the model we need a loss function and an optimizer.  
In each iteration of the training loop, the model makes a prediction, calculates the loss, and backpropagates it to adjust the parameters for the next iteration.

Cross-entropy loss is the common choice for classification tasks.
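To make the loss concrete, here is a small sketch comparing `nn.CrossEntropyLoss` with a manual computation. It assumes raw logits as input (the loss applies log-softmax internally); the toy values and the names `criterion`, `builtin_loss`, and `manual_loss` are made up for illustration.

```python
import torch
from torch import nn

# Two samples, three classes: raw, unnormalized scores (logits).
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])  # correct class index for each sample

criterion = nn.CrossEntropyLoss()
builtin_loss = criterion(logits, targets)

# Manual version: mean negative log-probability of the correct class.
log_probs = torch.log_softmax(logits, dim=1)
manual_loss = -log_probs[torch.arange(len(targets)), targets].mean()

print(builtin_loss.item(), manual_loss.item())
```

Both values should agree, which is why the model above returns logits and has no softmax layer of its own.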

In [7]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(params=model.parameters(), lr=1e-3)

`loss.backward()`: calculates the gradient of the loss with respect to each model parameter. Backpropagation walks the computational graph, which records how each parameter contributed to the loss.

`optimizer.step()`: updates the parameters using the gradients computed during backpropagation.

The optimizer defines the update rule for the model parameters; in this case it is stochastic gradient descent, whose update is `param -= learning_rate * grad`.
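The update rule can be checked by hand on a toy parameter. This is a sketch assuming plain SGD without momentum or weight decay; the tensor `w`, the quadratic loss, and the name `sgd` are illustrative only.

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)
sgd = torch.optim.SGD([w], lr=0.1)

toy_loss = (w ** 2).sum()  # gradient of this loss is 2 * w
toy_loss.backward()

# Apply param -= learning_rate * grad by hand, before the optimizer does.
expected = w.detach() - 0.1 * w.grad

sgd.step()  # the optimizer performs the same update in place
print(w.detach(), expected)
```

After `sgd.step()`, `w` should match the hand-computed `expected` tensor.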

`optimizer.zero_grad()`: resets the gradients after every batch. PyTorch accumulates gradients across `backward()` calls by default, so without the reset each step would mix in gradients from earlier batches.
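The accumulation behavior is easy to observe on a single scalar. The sketch below is illustrative; `w`, `opt`, and the `2 * w` expression are assumptions chosen so the gradient is always 2.

```python
import torch

w = torch.tensor(3.0, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

(2 * w).backward()
first = w.grad.item()        # gradient of 2*w is 2

(2 * w).backward()           # no reset in between
accumulated = w.grad.item()  # gradients were summed: 2 + 2

opt.zero_grad()              # clear the stored gradient
(2 * w).backward()
fresh = w.grad.item()        # back to 2

print(first, accumulated, fresh)
```

The second `backward()` call doubles the stored gradient, and only after `zero_grad()` does the value return to that of a single pass.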

Putting it Together:

Here's the overall flow of a training iteration:

1. Calculate the loss for the current batch.
2. Backpropagate to compute gradients.
3. Update the model's parameters using the optimizer.
4. Reset gradients for the next batch.

**Model Parameters**

Examples:

- In a linear layer: Weights represent connections between input and output neurons, and biases determine the activation thresholds.
- In a CNN: Filters capture spatial features in images, and biases regulate the activations of convolutional layers.
- In an LSTM: Weights govern the flow of information within the memory cell and gates.

You can check which parts of your model are registered as parameters by iterating over `model.parameters()`.
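As a quick sketch, `named_parameters()` yields the same tensors together with their names. The `small_model` below is a hypothetical two-layer stack used only for illustration, so that it does not overwrite the notebook's `model`.

```python
from torch import nn

# Hypothetical small model; Flatten and ReLU have no learnable parameters.
small_model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

param_names = []
for name, param in small_model.named_parameters():
    param_names.append(name)
    print(name, tuple(param.shape), param.requires_grad)

total_params = sum(p.numel() for p in small_model.parameters())
print(f"Total parameters: {total_params}")
```

Only the two `nn.Linear` layers show up, each with a `weight` and a `bias`.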

In [None]:
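def train(dataloader, model, loss_fn, optimizer):
  size = len(dataloader.dataset)
  model.train()  # enable training-mode behavior (e.g. dropout, batch norm)
  for batch, (X, y) in enumerate(dataloader):
    X, y = X.to(device), y.to(device)

    # Forward pass: compute predictions and the loss
    pred = model(X)
    loss = loss_fn(pred, y)

    # Backward pass: compute gradients, update parameters, reset gradients
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    # Report progress every 100 batches
    if batch % 100 == 0:
      loss, current = loss.item(), (batch + 1) * len(X)
      print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")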


In [None]:
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()  # switch to evaluation-mode behavior
    test_loss, correct = 0, 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            # Count predictions whose highest-scoring class matches the label
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches  # average loss per batch
    correct /= size  # fraction of correctly classified samples
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

In [None]:
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.302810  [   64/60000]
loss: 2.298926  [ 6464/60000]
loss: 2.278595  [12864/60000]
loss: 2.271198  [19264/60000]
loss: 2.266478  [25664/60000]
loss: 2.240637  [32064/60000]
loss: 2.238707  [38464/60000]
loss: 2.217944  [44864/60000]
loss: 2.197321  [51264/60000]
loss: 2.173370  [57664/60000]
Test Error: 
 Accuracy: 54.6%, Avg loss: 2.173082 

Epoch 2
-------------------------------
loss: 2.177229  [   64/60000]
loss: 2.178077  [ 6464/60000]
loss: 2.127108  [12864/60000]
loss: 2.137861  [19264/60000]
loss: 2.096600  [25664/60000]
loss: 2.040423  [32064/60000]
loss: 2.061537  [38464/60000]
loss: 2.000973  [44864/60000]
loss: 1.987123  [51264/60000]
loss: 1.914861  [57664/60000]
Test Error: 
 Accuracy: 61.8%, Avg loss: 1.927784 

Epoch 3
-------------------------------
loss: 1.955326  [   64/60000]
loss: 1.936322  [ 6464/60000]
loss: 1.830831  [12864/60000]
loss: 1.860294  [19264/60000]
loss: 1.753417  [25664/60000]
loss: 1.698043  [32064/600

# Tensors
Tensors are similar to NumPy ndarrays, but they are optimized for hardware acceleration and automatic differentiation. NumPy arrays and PyTorch tensors can be converted to one another through the NumPy bridge (`torch.from_numpy` and `Tensor.numpy()`).
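One detail of the bridge worth seeing in action: for CPU tensors, `torch.from_numpy` and `Tensor.numpy()` share the underlying memory, whereas `torch.tensor` copies the data. A small sketch follows; the names `array`, `shared`, `copied`, and `back` are illustrative.

```python
import numpy as np
import torch

array = np.ones(3)
shared = torch.from_numpy(array)  # shares memory with the NumPy array
copied = torch.tensor(array)      # makes an independent copy

array += 1  # modify the NumPy array in place

print(shared)  # reflects the change
print(copied)  # unchanged

back = shared.numpy()  # converting back also shares memory
print(np.shares_memory(array, back))
```

This is why a change made on the NumPy side shows up in the tensor, and it is worth keeping in mind before modifying either one in place.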

In [8]:
import torch
import numpy as np

In [17]:
data = [[1,2],[3,4]]
np_data = np.array(data)
tensor = torch.tensor(data)
tensor

tensor([[1, 2],
        [3, 4]])

In [41]:
np_tensor = torch.from_numpy(np_data)
np_tensor

tensor([[1, 2],
        [3, 4]])

In [42]:
np_tensor.numpy()

array([[1, 2],
       [3, 4]])

In [18]:
data_like = torch.ones_like(np_tensor)
data_like

tensor([[1, 1],
        [1, 1]])

In [20]:
torch.rand_like(np_tensor, dtype=torch.float16)

tensor([[0.2739, 0.5054],
        [0.6294, 0.9458]], dtype=torch.float16)

In [21]:
torch.rand((2,3))

tensor([[0.3782, 0.0664, 0.0159],
        [0.3111, 0.5670, 0.6933]])

In [23]:
tensor = torch.rand(2,3)

By default, tensors are created on the CPU, and we have to move them explicitly to the GPU to benefit from hardware acceleration. Keep in mind that copying large tensors across devices can be expensive in both time and memory.

In [24]:
if torch.cuda.is_available():
  tensor = tensor.to("cuda")

In [26]:
tensor.get_device()  # -1 means the tensor is on the CPU; CUDA tensors return their device index

-1

In [29]:
# A single-element tensor can be converted to a Python number using .item()

tensor.sum().item()

3.4518532752990723

In-place operations are denoted by a trailing underscore (`_` suffix), for example `add_`. They can save memory, but they overwrite values immediately, which loses history and can be problematic when computing derivatives. So, their usage is discouraged.
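The derivative problem can be reproduced with a small sketch. It relies on `exp` keeping its output for the backward pass, so overwriting that output in place invalidates the saved value; the tensors `x`, `y`, `x2`, and `z` are illustrative.

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x.exp()  # autograd saves this output to compute the gradient later
y.add_(1)    # in-place update overwrites the saved values

try:
    y.sum().backward()
    inplace_failed = False
except RuntimeError as err:
    inplace_failed = True
    print("backward failed:", str(err).splitlines()[0])

# The out-of-place version keeps the original values intact.
x2 = torch.ones(3, requires_grad=True)
z = x2.exp() + 1
z.sum().backward()
print(x2.grad)  # gradient of exp(x) + 1 is exp(x)
```

Tensors that are not part of a gradient computation, like the `tensor` used in the next cell, can be modified in place without this issue.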

In [39]:
tensor.add_(3)

tensor([[6.7487, 6.1146, 6.7688],
        [6.3831, 6.4955, 6.9412]])

In [40]:
tensor

tensor([[6.7487, 6.1146, 6.7688],
        [6.3831, 6.4955, 6.9412]])