Big picture

-> **Neural networks as functions** - a neural network is one big function made up of a lot of little functions. The parameters of this function are the weights and biases. The input to this function is usually a vector of all the variables (features) of one training example.
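
To make the "big function made of little functions" idea concrete, here is a minimal from-scratch sketch in plain Python (no PyTorch). The layer sizes and parameter values are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def linear(weights, biases, x):
    """Affine layer: one output per row of `weights`, plus that row's bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(v):
    """Elementwise ReLU: negative entries become 0."""
    return [max(0.0, vi) for vi in v]


def tiny_network(params, x):
    """The 'big function': linear -> relu -> linear, composed in order."""
    W1, b1, W2, b2 = params
    hidden = relu(linear(W1, b1, x))
    return linear(W2, b2, hidden)


# Hypothetical parameters: 2 inputs -> 2 hidden units -> 1 output.
params = (
    [[1.0, -1.0], [0.5, 0.5]],  # W1 (weights of the first layer)
    [0.0, -0.5],                # b1 (biases of the first layer)
    [[2.0, 1.0]],               # W2 (weights of the second layer)
    [0.1],                      # b2 (bias of the second layer)
)
x = [1.0, 0.0]                  # one training example as a vector
output = tiny_network(params, x)
print(output)
```

The `NeuralNetwork` class further down has the same shape: `nn.Sequential` chains the little functions, and the weights and biases live inside each `nn.Linear`.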

-> We can think of training as one big calculus minimization problem: we take the cost function and do our best to minimize it. To do that, we find the derivative of the cost function with respect to every single weight and bias. Each derivative tells us how much that weight or bias contributes to the cost, that is, how the cost changes when we change that parameter. We can then increase or decrease each weight and bias accordingly to move toward a lower cost. The entire goal of backpropagation is to compute the derivative of the cost with respect to every weight and bias, so we can tweak them and see how they impact the final cost.
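
A minimal sketch of that idea, assuming the simplest possible model (one weight, one bias, one training example, squared-error cost). The numbers and the learning rate are hypothetical. The derivatives come from the chain rule, are checked against finite differences, and are then used for one gradient descent step.

```python
def cost(w, b, x, y):
    """Squared error of the model w*x + b on a single example (x, y)."""
    return (w * x + b - y) ** 2


def gradients(w, b, x, y):
    """Chain rule: dC/dw = 2*(w*x + b - y)*x and dC/db = 2*(w*x + b - y)."""
    error = w * x + b - y
    return 2 * error * x, 2 * error


# Hypothetical values, chosen only for illustration.
w, b, x, y = 0.5, 0.0, 2.0, 3.0
dC_dw, dC_db = gradients(w, b, x, y)

# Finite-difference check: nudge one parameter and watch the cost change.
eps = 1e-6
numeric_dw = (cost(w + eps, b, x, y) - cost(w - eps, b, x, y)) / (2 * eps)
numeric_db = (cost(w, b + eps, x, y) - cost(w, b - eps, x, y)) / (2 * eps)

# One gradient descent step: move each parameter against its derivative.
learning_rate = 0.1
new_w = w - learning_rate * dC_dw
new_b = b - learning_rate * dC_db

old_cost = cost(w, b, x, y)
new_cost = cost(new_w, new_b, x, y)
print(f"dC/dw = {dC_dw}, dC/db = {dC_db}")
print(f"cost before: {old_cost}, cost after: {new_cost}")
```

In the PyTorch training loop further down, `loss.backward()` computes these derivatives for every weight and bias, and `optimizer.step()` applies the update.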

Matrix Calc review
- Gradients
- Jacobians
- Jacobian chain rule
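
As a small worked example of the Jacobian chain rule, the sketch below composes a linear map `f(x) = A x` with an elementwise square `g(u) = u^2` and checks that the Jacobian of `g(f(x))` equals the product `J_g(f(x)) · J_f(x)`. The matrix `A`, the input `x`, and all helper names are hypothetical, and it is written in plain Python so every step is visible.

```python
A = [[1.0, 2.0], [3.0, 4.0]]  # hypothetical 2x2 matrix


def f(x):
    """Linear map u = A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def g(u):
    """Elementwise square."""
    return [ui ** 2 for ui in u]


def jacobian_f(x):
    """The Jacobian of a linear map is the matrix itself."""
    return [row[:] for row in A]


def jacobian_g(u):
    """The Jacobian of an elementwise function is diagonal: d(u_i^2)/du_i = 2*u_i."""
    n = len(u)
    return [[2 * u[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(P, Q):
    """Plain matrix product of two nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def numeric_jacobian(func, x, eps=1e-6):
    """Central finite differences, one input coordinate at a time."""
    rows = len(func(x))
    J = [[0.0] * len(x) for _ in range(rows)]
    for j in range(len(x)):
        plus, minus = x[:], x[:]
        plus[j] += eps
        minus[j] -= eps
        f_plus, f_minus = func(plus), func(minus)
        for i in range(rows):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return J


x = [1.0, -1.0]
chain = matmul(jacobian_g(f(x)), jacobian_f(x))   # J_g(f(x)) @ J_f(x)
numeric = numeric_jacobian(lambda v: g(f(v)), x)  # direct numerical check
print(chain)
```

A gradient is the special case where the output is a single number, so the Jacobian has one row. Backpropagation applies this same product layer by layer, starting from the cost.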


In [67]:
import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

In [68]:
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")

Using mps device


In [70]:
from torch import nn

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

In [71]:
model = NeuralNetwork().to(device)
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


In [74]:
X = torch.rand(1, 28, 28, device=device)
logits = model(X)
pred_probab = nn.Softmax(dim=1)(logits)
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")

Predicted class: tensor([6], device='mps:0')


In [65]:
# Testing the trained model
with torch.no_grad():  # No need to track gradients for testing
    test_outputs = model(inputs)
    predicted = (test_outputs > 0.5).float()  # Apply threshold to get binary output
    # Use `sample` rather than `input` to avoid shadowing the built-in input()
    for i, sample in enumerate(inputs):
        print(f'Input: {sample.numpy()}, Predicted: {predicted[i].item()}, Actual: {labels[i].item()}')

Input: [0. 0.], Predicted: 0.0, Actual: 0.0
Input: [0. 1.], Predicted: 1.0, Actual: 1.0
Input: [1. 0.], Predicted: 1.0, Actual: 1.0
Input: [1. 1.], Predicted: 0.0, Actual: 0.0


In [54]:
num_epochs = 300

for epoch in range(num_epochs):
    for inputs, labels in dataloader:
        # Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        
        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    
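    # Every 10 epochs, report the loss of the last batch only (not an epoch average),
    # which is why the printed values can jump around
    if (epoch + 1) % 10 == 0:
        print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}')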

Epoch [10/300], Loss: 0.4370
Epoch [20/300], Loss: 0.3344
Epoch [30/300], Loss: 0.1690
Epoch [40/300], Loss: 0.1056
Epoch [50/300], Loss: 0.1336
Epoch [60/300], Loss: 0.0497
Epoch [70/300], Loss: 0.1166
Epoch [80/300], Loss: 0.0504
Epoch [90/300], Loss: 0.0107
Epoch [100/300], Loss: 0.0693
Epoch [110/300], Loss: 0.0327
Epoch [120/300], Loss: 0.0164
Epoch [130/300], Loss: 0.0454
Epoch [140/300], Loss: 0.1034
Epoch [150/300], Loss: 0.0098
Epoch [160/300], Loss: 0.1274
Epoch [170/300], Loss: 0.0721
Epoch [180/300], Loss: 0.0653
Epoch [190/300], Loss: 0.0232
Epoch [200/300], Loss: 0.0121
Epoch [210/300], Loss: 0.0502
Epoch [220/300], Loss: 0.0270
Epoch [230/300], Loss: 0.0122
Epoch [240/300], Loss: 0.1417
Epoch [250/300], Loss: 0.0003
Epoch [260/300], Loss: 0.0036
Epoch [270/300], Loss: 0.0730
Epoch [280/300], Loss: 0.0197
Epoch [290/300], Loss: 0.0536
Epoch [300/300], Loss: 0.0963


In [55]:
# Evaluation mode
model.eval()
with torch.no_grad():
    predictions = model(X_tensor)
    predictions = predictions.round()
    accuracy = (predictions.eq(y_tensor).sum() / float(y_tensor.shape[0])).item()

print(f'Accuracy: {accuracy * 100:.2f}%')

Accuracy: 98.00%
