Task 1: Implement VGG16 on the Food-101 dataset
Your first task is to implement the VGG16 architecture as a model class and use it to train a classification model on the Food-101 dataset. Details of the dataset are given below.

1.1. Import packages
Some packages are imported below. Import any other package your implementation needs, but do not import a ready-made model: VGG16 must be built from basic convolution layers.

In [1]:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
from torchvision import datasets
from torchvision.transforms import ToTensor
import matplotlib.pyplot as plt
import torchvision.transforms as transforms
import torch.optim as optim
import random
import os
from torch.utils.data import Dataset,DataLoader

1.2. Dataset
Food-101 is a challenging dataset of 101 food categories with 101,000 images. All images were rescaled to have a maximum side length of 512 pixels. Running the cell below downloads the dataset into data/food-101, relative to the working directory (for example, your Colab directory). That directory contains information about the dataset and a ReadMe.txt file.

The images do not share a single size: each was rescaled so that its longest side is at most 512 pixels. The model, however, expects inputs of size (224, 224, 3). Using torchvision.transforms, write a composed transformation that resizes each image and converts it to a tensor, and add basic preprocessing such as per-channel normalization (standardization).
Hint: use the transforms.Compose() method.
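transforms.Normalize standardizes each channel as (value - mean) / std, after ToTensor has already scaled 8-bit pixels to [0, 1]. The sketch below reproduces that arithmetic in plain Python for the ImageNet mean and standard deviation used in the cell below; the helpers normalize_pixel and channel_range are hypothetical names for illustration only. It also shows the range each channel can occupy afterwards: the values printed for image further down (for example 2.1462 in the first channel and 2.4657 in the third) fall inside these ranges.

```python
# Hypothetical helpers illustrating what transforms.Normalize computes per channel:
#   out = (in - mean) / std, where "in" is already scaled to [0, 1] by ToTensor.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Standardize one RGB pixel whose channel values are already in [0, 1]."""
    return tuple((c - m) / s for c, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))


def channel_range(channel):
    """Smallest and largest value a channel can take after normalization."""
    m, s = IMAGENET_MEAN[channel], IMAGENET_STD[channel]
    return (0.0 - m) / s, (1.0 - m) / s


if __name__ == "__main__":
    # An 8-bit pixel (255, 128, 0) becomes (1.0, ~0.502, 0.0) after ToTensor.
    raw = (255, 128, 0)
    scaled = tuple(v / 255 for v in raw)
    print([round(v, 4) for v in normalize_pixel(scaled)])
    for ch, name in enumerate("RGB"):
        lo, hi = channel_range(ch)
        print(f"{name}: [{lo:.4f}, {hi:.4f}]")
```

A pixel equal to the channel mean maps to 0, so after normalization each channel is roughly centred on zero for ImageNet-like images.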

In [2]:
### YOUR CODE STARTS HERE ###
transform = transforms.Compose([
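    transforms.Resize((224, 224)),  # Resize to exactly 224x224 (aspect ratio is not preserved)
    transforms.ToTensor(),  # PIL image (H x W x C, 0-255) -> float tensor (C x H x W, 0-1)
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])  # Standardize each channel with ImageNet mean/std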
])

### YOUR CODE ENDS HERE ###

In [3]:
# Setup training data
train_data = datasets.Food101(
    root="data",
    split="train", # get training data
    download=True,
    transform=transform
)

# Setup testing data
test_data = datasets.Food101(
    root="data",
    split="test", # get test data
    download=True,
    transform=transform
)

Downloading https://data.vision.ee.ethz.ch/cvl/food-101.tar.gz to data/food-101.tar.gz


100%|██████████| 4996278331/4996278331 [03:48<00:00, 21843702.59it/s]


Extracting data/food-101.tar.gz to data


In [5]:
image, label = train_data[0]
image, label

(tensor([[[ 2.1119,  2.0948,  2.1290,  ..., -0.4911, -0.6109, -0.5424],
          [ 2.1290,  2.1119,  2.1290,  ..., -0.4054, -0.4568, -0.4568],
          [ 2.1462,  2.1290,  2.1462,  ..., -0.3883, -0.3712, -0.4397],
          ...,
          [ 0.5193,  0.4851,  0.4679,  ...,  0.2796,  0.2796,  0.3138],
          [ 0.4851,  0.4337,  0.4166,  ...,  0.4166,  0.3994,  0.4508],
          [ 0.4337,  0.3652,  0.3652,  ...,  0.4851,  0.4679,  0.4851]],
 
         [[ 2.2185,  2.2010,  2.2360,  ..., -1.3704, -1.5105, -1.4755],
          [ 2.2360,  2.2185,  2.2360,  ..., -1.3004, -1.3704, -1.4055],
          [ 2.2535,  2.2360,  2.2535,  ..., -1.3004, -1.3354, -1.4230],
          ...,
          [-0.6877, -0.7227, -0.7402,  ...,  0.1702,  0.2227,  0.3102],
          [-0.7052, -0.7577, -0.7752,  ...,  0.3102,  0.3627,  0.4503],
          [-0.7577, -0.8277, -0.8277,  ...,  0.3978,  0.4328,  0.4853]],
 
         [[ 2.4483,  2.4308,  2.4657,  ..., -1.4210, -1.6127, -1.6127],
          [ 2.4657,  2.4483,

In [6]:
image.shape

torch.Size([3, 224, 224])
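ToTensor converts an 8-bit PIL image, stored as height × width × channels with values from 0 to 255, into a float tensor laid out as channels × height × width with values in [0, 1]. That is why image.shape reports [3, 224, 224] rather than (224, 224, 3). A minimal pure-Python sketch of the axis reordering, using a hypothetical helper hwc_to_chw on nested lists (no torch required):

```python
def hwc_to_chw(image_hwc):
    """Hypothetical helper: reorder a nested-list image from [H][W][C] to [C][H][W]."""
    height = len(image_hwc)
    width = len(image_hwc[0])
    channels = len(image_hwc[0][0])
    # Build one H x W plane per channel.
    return [[[image_hwc[h][w][c] for w in range(width)] for h in range(height)]
            for c in range(channels)]


if __name__ == "__main__":
    # A 2x2 RGB image: each innermost list is one pixel (R, G, B).
    img = [[[1, 2, 3], [4, 5, 6]],
           [[7, 8, 9], [10, 11, 12]]]
    chw = hwc_to_chw(img)
    print(len(chw), len(chw[0]), len(chw[0][0]))  # channels, height, width
    print(chw[0])  # the red plane
```

PyTorch convolution layers expect this channels-first layout, so the reordering happens once in the transform rather than inside the model.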

1.3. Prepare Dataloader
In the cell below, create DataLoader objects for the training and test data, then print the number of batches in each.

In [7]:


BATCH_SIZE = 32

### YOUR CODE STARTS HERE ###
# Wrap each dataset in a DataLoader that yields batches of BATCH_SIZE images
train_dataloader = DataLoader(train_data, batch_size=BATCH_SIZE, shuffle=True)   # reshuffle every epoch
test_dataloader = DataLoader(test_data, batch_size=BATCH_SIZE, shuffle=False)    # fixed order for evaluation
### YOUR CODE ENDS HERE ###



print(f"Length of train dataloader: {len(train_dataloader)} batches of {BATCH_SIZE}")
print(f"Length of test dataloader: {len(test_dataloader)} batches of {BATCH_SIZE}")

Length of train dataloader: 2368 batches of 32
Length of test dataloader: 790 batches of 32
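The batch counts follow from the fixed Food-101 split of 750 training and 250 test images per class: a DataLoader with the default drop_last=False yields ceil(N / batch_size) batches, the last one partial. A small sketch with a hypothetical helper num_batches reproduces the 2368 and 790 printed above:

```python
# Food-101 split: 750 training and 250 test images for each of the 101 classes.
NUM_CLASSES = 101
TRAIN_PER_CLASS = 750
TEST_PER_CLASS = 250


def num_batches(num_samples, batch_size, drop_last=False):
    """Hypothetical helper: batches a DataLoader yields over a map-style dataset."""
    if drop_last:
        return num_samples // batch_size
    # Integer ceiling division: a final partial batch still counts as a batch.
    return (num_samples + batch_size - 1) // batch_size


if __name__ == "__main__":
    n_train = NUM_CLASSES * TRAIN_PER_CLASS  # 75,750 images
    n_test = NUM_CLASSES * TEST_PER_CLASS    # 25,250 images
    print(num_batches(n_train, 32), num_batches(n_test, 32))
    # Size of the last, partial batch in each split.
    print(n_train % 32, n_test % 32)
```

The final training batch holds only 6 images (75,750 - 2,367 × 32), which matters little here but is worth knowing when a model uses batch statistics.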


In [8]:

train_features_batch, train_labels_batch = next(iter(train_dataloader))
train_features_batch.shape, train_labels_batch.shape

(torch.Size([32, 3, 224, 224]), torch.Size([32]))

1.4. VGG16 Architecture
Now create a model class that implements the VGG16 architecture. The architecture is as follows:

VGG16 takes an input tensor of size 224 × 224 with 3 RGB channels. It has 13 convolutional layers, 5 max-pooling layers, and 3 dense layers, 21 layers in total. Only the 16 convolutional and dense layers carry weights, which gives VGG16 its name.
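The layer counts above can be checked by walking the standard VGG16 layout, written here as a list in which integers are 3×3 convolution output channels and "M" is a 2×2 max pool with stride 2. The sketch assumes padding=1 on every convolution (so convolutions keep the spatial size) and uses hypothetical names VGG16_CFG and trace_shapes. It shows how 224 × 224 shrinks to 7 × 7 after five pools, which fixes the 512 × 7 × 7 = 25,088 inputs of the first dense layer.

```python
# Standard VGG16 layout: ints are 3x3 conv output channels, "M" is a 2x2 max pool.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]
NUM_DENSE = 3  # 25088 -> 4096 -> 4096 -> num_classes


def trace_shapes(cfg, size=224, channels=3):
    """Hypothetical helper: (layer, channels, spatial size) after each layer."""
    trace = []
    for v in cfg:
        if v == "M":
            size //= 2  # kernel 2, stride 2 halves height and width
            trace.append(("maxpool", channels, size))
        else:
            channels = v  # 3x3 conv with padding=1 keeps height and width
            trace.append(("conv3x3", channels, size))
    return trace


if __name__ == "__main__":
    trace = trace_shapes(VGG16_CFG)
    n_conv = sum(name == "conv3x3" for name, _, _ in trace)
    n_pool = sum(name == "maxpool" for name, _, _ in trace)
    print(n_conv, n_pool, n_conv + n_pool + NUM_DENSE)
    _, c, s = trace[-1]
    print(f"final feature map {c} x {s} x {s} -> {c * s * s} dense inputs")
```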


Implement the model class in the cell below. Do not change the class name, as the next cell depends on it.



In [14]:


class VGG16(nn.Module):
    def __init__(self, num_classes=101):
        super(VGG16, self).__init__()
        # 13 conv layers (3x3, padding 1) in 5 blocks; each block ends in a 2x2 max pool
        # that halves the spatial size: 224 -> 112 -> 56 -> 28 -> 14 -> 7.
        # BatchNorm after each ReLU is an addition to the original VGG16 design.
        self.features = nn.Sequential(
            # Block 1: 3 -> 64 channels
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(64),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(64),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Block 2: 64 -> 128 channels
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(128),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(128),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Block 3: 128 -> 256 channels
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(256),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(256),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(256),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Block 4: 256 -> 512 channels
            nn.Conv2d(256, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(512),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(512),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(512),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Block 5: 512 -> 512 channels
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(512),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(512),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(512),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        # forward() uses this layer; it guarantees a 7x7 map even for non-224 inputs
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        # 3 dense layers; the last one outputs one raw score (logit) per class.
        # No Softmax here: nn.CrossEntropyLoss applies log-softmax internally.
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(inplace=True),
            nn.BatchNorm1d(4096),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.BatchNorm1d(4096),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)        # (N, 512, 7, 7) for 224x224 inputs
        x = self.avgpool(x)
        x = x.view(x.size(0), -1)   # flatten to (N, 25088)
        x = self.classifier(x)      # (N, num_classes) logits
        return x
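A rough size check for the class above: counting weights and biases of a plain VGG16 (no BatchNorm) with the 3 × 3 convolutions and three dense layers. This is a sketch with hypothetical helper names; it deliberately excludes BatchNorm parameters, so it slightly undercounts the BatchNorm variant defined above. With 1000 classes it gives the well-known 138,357,544 parameters, and nearly 120 million of them sit in the dense layers.

```python
# Standard VGG16 layout: ints are 3x3 conv output channels, "M" is a max pool.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def conv_params(in_ch, out_ch, k=3):
    """Weights (in_ch * out_ch * k * k) plus one bias per output channel."""
    return in_ch * out_ch * k * k + out_ch


def linear_params(in_f, out_f):
    """Weight matrix plus bias vector of a fully connected layer."""
    return in_f * out_f + out_f


def vgg16_param_count(num_classes, in_ch=3):
    """Hypothetical helper: parameters of plain VGG16, BatchNorm excluded."""
    conv_total, ch = 0, in_ch
    for v in VGG16_CFG:
        if v != "M":
            conv_total += conv_params(ch, v)
            ch = v
    dense_total = (linear_params(ch * 7 * 7, 4096)
                   + linear_params(4096, 4096)
                   + linear_params(4096, num_classes))
    return conv_total, dense_total


if __name__ == "__main__":
    for classes in (1000, 101):
        conv, dense = vgg16_param_count(classes)
        print(f"{classes} classes: conv={conv:,} dense={dense:,} total={conv + dense:,}")
```

To count the parameters of the model class itself, BatchNorm included, sum(p.numel() for p in model.parameters()) works on any nn.Module.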


1.5. Model Training
Train the model defined above with the following configuration:

Number of epochs = 20
Learning rate = 0.05
Loss = cross-entropy
Optimizer = Adam
After training, save the model as food101_vgg16_model.pt.
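nn.CrossEntropyLoss takes raw logits and applies log-softmax itself, computing -log softmax(logits)[target]; this is why the model should end in a plain Linear layer with no Softmax. The pure-Python sketch below (hypothetical function cross_entropy, using the log-sum-exp trick for numerical stability) also gives a sanity check for reading the loss log: a classifier that assigns equal scores to all C classes has loss ln(C), about 4.615 for the 101 Food-101 classes, so a loss above that level means the model puts less probability on the correct class than uniform guessing would.

```python
import math


def cross_entropy(logits, target):
    """Hypothetical helper: cross-entropy of one example, -log softmax(logits)[target].

    Subtracting the max logit before exponentiating (log-sum-exp trick)
    keeps math.exp from overflowing on large scores.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


if __name__ == "__main__":
    # Equal logits give probability 1/C to every class, so the loss is ln(C).
    print(round(cross_entropy([0.0] * 101, 0), 4), round(math.log(101), 4))
    # A confident, correct prediction has a loss close to zero.
    print(cross_entropy([10.0, 0.0, 0.0], 0))
```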

In [15]:
def train_model(model, train_loader, num_epochs, learning_rate):
    criterion = nn.CrossEntropyLoss()  # expects raw logits and integer class labels
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)
    device = next(model.parameters()).device  # train on whichever device holds the model

    model.train()  # BatchNorm uses per-batch statistics in training mode
    total_step = len(train_loader)
    for epoch in range(num_epochs):
        for i, (images, labels) in enumerate(train_loader):
            images = images.to(device)
            labels = labels.to(device)

            # Forward pass
            outputs = model(images)
            loss = criterion(outputs, labels)

            # Backward pass and parameter update
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            if (i + 1) % 100 == 0:
                print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                      .format(epoch + 1, num_epochs, i + 1, total_step, loss.item()))

    # Save the trained weights
    torch.save(model.state_dict(), 'food101_vgg16_model.pt')
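The loop above delegates parameter updates to optim.Adam. The sketch below writes out one Adam step for a single scalar parameter (hypothetical function adam_step, default betas and eps matching PyTorch's Adam defaults) to show why the learning rate matters so much with this optimizer: because the update divides the gradient's running mean by the square root of its running second moment, the very first step moves a parameter by about lr regardless of the gradient's scale. With lr = 0.05 every weight can move by roughly 0.05 per step, whereas the run below uses 0.0001.

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """Hypothetical helper: one Adam update for a scalar; returns (param, m, v).

    t is the 1-based step count used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                # bias-corrected second moment
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


if __name__ == "__main__":
    # Each first step moves the parameter by about lr, whatever the gradient scale.
    for g in (1e-3, 1.0, 1e3):
        p, _, _ = adam_step(0.0, g, 0.0, 0.0, t=1, lr=0.05)
        print(g, p)
```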


In [18]:
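num_classes = 101  # Food-101 has 101 categories
num_epochs = 5  # the configuration above specifies 20 epochs
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
learning_rate = 0.0001  # the configuration above specifies 0.05
model = VGG16(num_classes).to(device)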


### YOUR CODE STARTS HERE ###

train_loader = DataLoader(train_data, batch_size=BATCH_SIZE, shuffle=True)

# Training the model
train_model(model, train_loader, num_epochs, learning_rate)


### YOUR CODE ENDS HERE ###


Epoch [1/5], Step [100/2368], Loss: 7.7038
Epoch [1/5], Step [200/2368], Loss: 7.3633
Epoch [1/5], Step [300/2368], Loss: 7.4514
Epoch [1/5], Step [400/2368], Loss: 6.9226
Epoch [1/5], Step [500/2368], Loss: 7.1151
Epoch [1/5], Step [600/2368], Loss: 7.2933
Epoch [1/5], Step [700/2368], Loss: 7.2007
Epoch [1/5], Step [800/2368], Loss: 7.3042
Epoch [1/5], Step [900/2368], Loss: 6.8691
Epoch [1/5], Step [1000/2368], Loss: 6.7475
Epoch [1/5], Step [1100/2368], Loss: 6.0768
Epoch [1/5], Step [1200/2368], Loss: 6.5275
Epoch [1/5], Step [1300/2368], Loss: 6.9691
Epoch [1/5], Step [1400/2368], Loss: 6.9699
Epoch [1/5], Step [1500/2368], Loss: 6.0810
Epoch [1/5], Step [1600/2368], Loss: 6.2864
Epoch [1/5], Step [1700/2368], Loss: 6.0112
Epoch [1/5], Step [1800/2368], Loss: 5.8773
Epoch [1/5], Step [1900/2368], Loss: 5.9984
Epoch [1/5], Step [2000/2368], Loss: 6.1921
Epoch [1/5], Step [2100/2368], Loss: 5.3402
Epoch [1/5], Step [2200/2368], Loss: 6.4702
Epoch [1/5], Step [2300/2368], Loss: 6.50

1.6. Evaluate the model
Load the trained model and evaluate it on the test data.

In [19]:
### YOUR CODE STARTS HERE ###
# Define the evaluation function
def evaluate_model(model, test_loader):
    model.eval()  # BatchNorm uses its running statistics in evaluation mode
    correct = 0
    total = 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for images, labels in test_loader:
            images = images.to(device)
            labels = labels.to(device)
            outputs = model(images)
            predicted = outputs.argmax(dim=1)  # index of the highest logit per image
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    accuracy = 100 * correct / total
    print('Accuracy of the model on the test images: {:.2f}%'.format(accuracy))

# Load the weights saved by train_model
model.load_state_dict(torch.load('food101_vgg16_model.pt', map_location=device))

# Test DataLoader without shuffling, so batches come in a fixed order
test_loader = DataLoader(test_data, batch_size=BATCH_SIZE, shuffle=False)

# Evaluate the model
evaluate_model(model, test_loader)

### YOUR CODE ENDS HERE ###

Accuracy of the model on the test images: 42.00%
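evaluate_model counts a prediction as correct when the index of the highest logit equals the label, then reports the percentage over all test images. The same bookkeeping in plain Python, with a hypothetical helper top1_accuracy, is sketched below. A useful baseline when reading the reported figure: random guessing over 101 balanced classes scores about 100 / 101 ≈ 0.99%.

```python
def top1_accuracy(logits_batch, labels):
    """Hypothetical helper: percentage of rows whose argmax equals the label."""
    correct = 0
    for logits, label in zip(logits_batch, labels):
        # argmax: index of the largest score (first one on ties)
        predicted = max(range(len(logits)), key=logits.__getitem__)
        correct += int(predicted == label)
    return 100.0 * correct / len(labels)


if __name__ == "__main__":
    logits = [[0.1, 2.0, -1.0],   # predicts class 1
              [3.0, 0.0, 0.5]]    # predicts class 0
    labels = [1, 2]
    print(top1_accuracy(logits, labels))
    print(f"chance level for 101 classes: {100 / 101:.2f}%")
```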
