<a href="https://colab.research.google.com/github/valkova-k/cactus-repo/blob/br01/FashionMNIST_Dense_improved.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Let's build a simple fully connected neural network to classify images from the FashionMNIST dataset (28×28 grayscale images in 10 clothing classes).

**1. Import Libraries**

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

In [None]:
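import random
import numpy as np

# Fix the seed of every random number generator in use for reproducibility
seed = 42
torch.manual_seed(seed)
random.seed(seed)
np.random.seed(seed)

# Seed the CUDA generators as well when a GPU is present
if torch.cuda.is_available():
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

# Ask cuDNN for deterministic kernels and disable its autotuner
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False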

*Checking for GPU Availability*

This code checks if a CUDA-enabled GPU is available and sets the `device` accordingly. If no GPU is available, it defaults to the CPU.

In [None]:
# Check if GPU is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Using device: {device}')

Using device: cpu


**2. Data Preparation**

In [None]:
# Define a transform that converts images to float tensors with values scaled to [0, 1]
transform = transforms.ToTensor()

# Download and load the training data
train_set = datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=256, shuffle=True)

# Download and load the test data
test_set = datasets.FashionMNIST(root='./data', train=False, download=True, transform=transform)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=256, shuffle=False)
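
With a batch size of 256 and `DataLoader`'s default of keeping the final partial batch, the number of batches per epoch follows directly from the dataset sizes (60,000 training and 10,000 test images). The helper below is a hypothetical illustration in plain Python, not part of the notebook; it sketches the value that `len(train_loader)` should take, which the training loop later uses to average the loss.

```python
import math

def batches_per_epoch(num_samples, batch_size):
    """Return (number of batches, size of the final batch), keeping the last partial batch."""
    n_batches = math.ceil(num_samples / batch_size)
    last_batch = num_samples - (n_batches - 1) * batch_size
    return n_batches, last_batch

# Dataset sizes of FashionMNIST; batch size taken from the loaders above
train_batches = batches_per_epoch(60_000, 256)
test_batches = batches_per_epoch(10_000, 256)
print(train_batches, test_batches)
```

Because the last batch is smaller than the others, averaging per-batch losses weights its samples slightly more heavily; for a rough training log this is usually acceptable.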



**3. Neural Network Model**

In [None]:
class DenseNet(nn.Module):
    def __init__(self):
        super(DenseNet, self).__init__()
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(28 * 28, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.flatten(x)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

model = DenseNet().to(device)
model

DenseNet(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (fc1): Linear(in_features=784, out_features=128, bias=True)
  (fc2): Linear(in_features=128, out_features=10, bias=True)
)
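
To get a feel for the model's size, the parameter count can be worked out by hand from the layer shapes printed above. This is a minimal sketch in plain Python; `linear_params` is a hypothetical helper introduced only for illustration.

```python
def linear_params(in_features, out_features, bias=True):
    """Parameter count of a fully connected layer: weights plus optional biases."""
    return in_features * out_features + (out_features if bias else 0)

fc1_params = linear_params(28 * 28, 128)  # 784 inputs -> 128 hidden units
fc2_params = linear_params(128, 10)       # 128 hidden units -> 10 classes
total_params = fc1_params + fc2_params
print(fc1_params, fc2_params, total_params)
```

In the notebook itself, `sum(p.numel() for p in model.parameters())` should report the same total.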

**4. Loss & Optimizer**

In [None]:
# Cross-entropy with label smoothing for extra regularization
criterion = nn.CrossEntropyLoss(label_smoothing=0.05)
# SGD with Nesterov momentum (speeds up convergence) and weight decay (penalizes large weights to reduce overfitting)
optimizer = optim.SGD(model.parameters(), lr=0.05, momentum=0.9, weight_decay=1e-4, nesterov=True)
# Halve the learning rate after epoch 10 and again after epoch 20
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[10, 20], gamma=0.5)
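
To make the effect of `label_smoothing=0.05` concrete, here is a minimal sketch in plain Python of cross-entropy against a smoothed target, assuming the usual formulation in which the one-hot target is mixed with a uniform distribution over the classes. The function names and the example logits are hypothetical and chosen only for illustration.

```python
import math

def smoothed_targets(label, num_classes, eps):
    """Mix the one-hot target with a uniform distribution over all classes."""
    uniform = eps / num_classes
    return [(1.0 - eps) + uniform if k == label else uniform
            for k in range(num_classes)]

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]

def smoothed_cross_entropy(logits, label, eps):
    """Cross-entropy between the smoothed target and the softmax of the logits."""
    log_p = log_softmax(logits)
    q = smoothed_targets(label, len(logits), eps)
    return -sum(qk * lp for qk, lp in zip(q, log_p))

# Made-up logits for one sample with 10 classes; the true class is 0
logits = [2.0, 0.5, -1.0] + [0.0] * 7
targets = smoothed_targets(0, 10, 0.05)
plain_loss = smoothed_cross_entropy(logits, 0, 0.0)
smooth_loss = smoothed_cross_entropy(logits, 0, 0.05)
print(targets[0], targets[1], plain_loss, smooth_loss)
```

With 10 classes and a smoothing factor of 0.05, the true class keeps a target weight of 0.955 and every other class receives 0.005, so the model is discouraged from pushing its predictions all the way to 0 or 1.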

**5. Training loop**

In [None]:
for epoch in range(30):  # Train for 30 epochs
    running_loss = 0.0
    for images, labels in train_loader:
        # Move images and labels to the device
        images = images.to(device)
        labels = labels.to(device)

        # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward pass and optimize
        loss.backward()
        # Rescale gradients so their global L2 norm is at most 1.0
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()

        running_loss += loss.item()

    # Step the scheduler once per epoch; the LR read afterwards is the one the next epoch will use
    scheduler.step()
    current_lr = scheduler.get_last_lr()[0]
    print(f'Epoch [{epoch+1}/30], Loss: {running_loss / len(train_loader):.4f}, LR: {current_lr:.5f}')

Epoch [1/30], Loss: 0.8654, LR: 0.05000
Epoch [2/30], Loss: 0.6794, LR: 0.05000
Epoch [3/30], Loss: 0.6383, LR: 0.05000
Epoch [4/30], Loss: 0.6157, LR: 0.05000
Epoch [5/30], Loss: 0.5981, LR: 0.05000
Epoch [6/30], Loss: 0.5839, LR: 0.05000
Epoch [7/30], Loss: 0.5738, LR: 0.05000
Epoch [8/30], Loss: 0.5629, LR: 0.05000
Epoch [9/30], Loss: 0.5554, LR: 0.05000
Epoch [10/30], Loss: 0.5474, LR: 0.02500
Epoch [11/30], Loss: 0.5317, LR: 0.02500
Epoch [12/30], Loss: 0.5266, LR: 0.02500
Epoch [13/30], Loss: 0.5229, LR: 0.02500
Epoch [14/30], Loss: 0.5185, LR: 0.02500
Epoch [15/30], Loss: 0.5166, LR: 0.02500
Epoch [16/30], Loss: 0.5137, LR: 0.02500
Epoch [17/30], Loss: 0.5103, LR: 0.02500
Epoch [18/30], Loss: 0.5081, LR: 0.02500
Epoch [19/30], Loss: 0.5045, LR: 0.02500
Epoch [20/30], Loss: 0.5019, LR: 0.01250
Epoch [21/30], Loss: 0.4927, LR: 0.01250
Epoch [22/30], Loss: 0.4912, LR: 0.01250
Epoch [23/30], Loss: 0.4893, LR: 0.01250
Epoch [24/30], Loss: 0.4884, LR: 0.01250
Epoch [25/30], Loss: 0.48
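
The LR column in the log follows from the `MultiStepLR` settings (`milestones=[10, 20]`, `gamma=0.5`). The sketch below reproduces the schedule in plain Python, assuming the closed form in which the base rate is multiplied by `gamma` once for every milestone already reached; `multistep_lr` is a hypothetical helper, not a PyTorch function. Since the loop reads the rate after `scheduler.step()`, the value logged for epoch 10 is already the halved one, even though epoch 10 itself was trained with 0.05.

```python
from bisect import bisect_right

def multistep_lr(base_lr, milestones, gamma, steps_taken):
    """Learning rate after `steps_taken` calls to scheduler.step()."""
    reached = bisect_right(sorted(milestones), steps_taken)
    return base_lr * gamma ** reached

# Rate printed at the end of each epoch (read after scheduler.step())
logged = {epoch: multistep_lr(0.05, [10, 20], 0.5, epoch) for epoch in range(1, 31)}
# Rate actually used while training each epoch (before that epoch's step)
used = {epoch: multistep_lr(0.05, [10, 20], 0.5, epoch - 1) for epoch in range(1, 31)}

for epoch in (9, 10, 11, 20, 21):
    print(epoch, used[epoch], logged[epoch])
```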

**6. Evaluation on the test set**

In [None]:
model.eval()  # Switch to evaluation mode (no effect on this model, but good practice)
correct = 0
total = 0
with torch.no_grad():  # Disable gradient calculation for evaluation
    for images, labels in test_loader:
        # Move images and labels to the device
        images = images.to(device)
        labels = labels.to(device)

        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy: {100 * correct / total:.2f}%')

Accuracy: 88.93%
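
A single overall accuracy can hide classes the model confuses more often than others. As a possible follow-up, the sketch below computes per-class accuracy in plain Python from two lists of integer labels, such as those obtained by collecting `predicted.tolist()` and `labels.tolist()` inside the evaluation loop. The helper name and the toy labels are hypothetical and are not results from this notebook.

```python
from collections import Counter

def per_class_accuracy(predictions, targets):
    """Return {class: fraction of that class's samples predicted correctly}."""
    correct = Counter()
    total = Counter()
    for pred, target in zip(predictions, targets):
        total[target] += 1
        if pred == target:
            correct[target] += 1
    return {cls: correct[cls] / total[cls] for cls in sorted(total)}

# Toy labels for illustration only
toy_targets = [0, 0, 1, 1, 1, 2]
toy_preds = [0, 1, 1, 1, 0, 2]
class_acc = per_class_accuracy(toy_preds, toy_targets)
overall_acc = sum(p == t for p, t in zip(toy_preds, toy_targets)) / len(toy_targets)
print(class_acc, overall_acc)
```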
