# Challenge 1: Classification
In this challenge, you're given a food classification dataset which has 101 classes. You need to analyze and preprocess the dataset as well as build deep learning models for performing food classification. 
<br>
Three models are to be trained for this task, mainly light, medium, and heavy model. <br>
Examples: <br>
Light model - mobilenetv2 <br>
Medium model - Resnet50 <br>
Heavy model - VGG19 <br>
<br>
The above given models are examples. You are free to choose any deep learning model to train. 

**Main Objective**:
You are supposed to use both TensorFlow and PyTorch for this task. You need to train one model for each framework. (You can use one of the frameworks again for the third model)

## Summary 

Create a table for your train and test accuracy as well as speed for each model (mention the framework used for training)

# Analyze the dataset
## Objectives
1. Upload the dataset provided (Google Drive link). 
2. Extract the dataset. 
3. Re-arrange dataset into training and testing folders. 
4. List number of samples in training and testing folders. 
5. Plot sample images from training and testing datasets. 

### Your Response/Notes

You can summarize your work for this section here/give any explanations if required. 


**Re-arranging dataset into folders**

In [1]:
import math
import os
import random
import time
import json
import shutil
import numpy as np
from glob import iglob
from copy import deepcopy

import PIL
import PIL.ImageOps
import PIL.ImageEnhance
import PIL.ImageDraw
from PIL import Image

from audioop import bias

import torch
from torch.cuda import amp
from torch import nn
from torch.nn import functional as F
from torch import optim
from torch.optim.lr_scheduler import LambdaLR
from torch.utils.data import DataLoader, RandomSampler, SequentialSampler

from torchvision import datasets
from torchvision import transforms

In [2]:
given_root = 'data/food/images'

In [3]:
# given_root = 'food-101/images'

shutil.rmtree('Food_data', ignore_errors=True)

os.mkdir('Food_data')
train_root = os.path.join('Food_data', 'train')
os.mkdir(train_root)

with open('data/food/meta/train.json') as f:
    train_json = json.load(f)

for i, item in enumerate(train_json.items()):
    des = os.path.join(train_root, item[0])
    os.mkdir(des)

    for v in item[1]:
        src = os.path.join(given_root, v)
        shutil.copy2(src + '.jpg', des)

    print('Train {} {}/{} completed'.format(item[0], i+1, len(train_json)))

test_root = os.path.join('Food_data', 'test')
os.mkdir(test_root)

with open('data/food/meta/test.json') as f:
    test_json = json.load(f)

for i, item in enumerate(test_json.items()):
    des = os.path.join(test_root, item[0])
    os.mkdir(des)

    for v in item[1]:
        src = os.path.join(given_root, v)
        shutil.copy2(src + '.jpg', des)

    print('Test {} {}/{} completed'.format(item[0], i+1, len(test_json)))

Train churros 1/101 completed
Train hot_and_sour_soup 2/101 completed
Train samosa 3/101 completed
Train sashimi 4/101 completed
Train pork_chop 5/101 completed
Train spring_rolls 6/101 completed
Train panna_cotta 7/101 completed
Train beef_tartare 8/101 completed
Train greek_salad 9/101 completed
Train foie_gras 10/101 completed
Train tacos 11/101 completed
Train pad_thai 12/101 completed
Train poutine 13/101 completed
Train ramen 14/101 completed
Train pulled_pork_sandwich 15/101 completed
Train bibimbap 16/101 completed
Train beignets 17/101 completed
Train apple_pie 18/101 completed
Train crab_cakes 19/101 completed
Train risotto 20/101 completed
Train paella 21/101 completed
Train steak 22/101 completed
Train baby_back_ribs 23/101 completed
Train miso_soup 24/101 completed
Train frozen_yogurt 25/101 completed
Train club_sandwich 26/101 completed
Train carrot_cake 27/101 completed
Train falafel 28/101 completed
Train bread_pudding 29/101 completed
Train chicken_wings 30/101 complet

In [4]:
print(len(list(iglob("Food_data/train/*/*.jpg", recursive=True))))
print(len(list(iglob("Food_data/test/*/*.jpg", recursive=True))))

75750
25250


# Pre-process Images
## Objectives
1. Implement preprocessing codes for each model. 
2. Augment the dataset. 
3. Preview the preprocessed dataset. 

### Preprocessing steps for Medium model (Pytorch)


In [5]:
# config
class Config:
    def __init__(self):

        self._configs = {}
        self._configs["device"] = torch.device('cpu')
        self._configs["seed"] = 11

        self._configs["randaug"] = (2, 10)
        self._configs["resize"] = 128
        self._configs["batch"] = 16
        self._configs["drop_last"] = True

        # model config
        self._configs["num_classes"] = 101
        self._configs["depth"] = 28
        self._configs["widen_factor"] = 8
        self._configs["dense_dropout"] = 0.1

        # training config
        self._configs["name"] = 'exp_1'
        self._configs["save_path"] = '/content/drive/MyDrive/food_weights/WideResNet'
        self._configs["lr"] = 0.01
        self._configs["momentum"] = 0.9
        self._configs["weight_decay"] = 5e-4

        self._configs["label_smoothing"] = 0.15
        self._configs["epochs"] = 30
        self._configs["start_epoch"] = 0
        self._configs["amp"] = False
        self._configs["best_top1"] = 0.0
        self._configs["best_top5"] = 0.0

    @property
    def device(self):
        return self._configs["device"]

    @property
    def seed(self):
        return self._configs["seed"]

    @property
    def randaug(self):
        return self._configs["randaug"]

    @property
    def resize(self):
        return self._configs["resize"]

    @property
    def batch(self):
        return self._configs["batch"]

    @property
    def drop_last(self):
        return self._configs["drop_last"]

    @property
    def num_classes(self):
        return self._configs["num_classes"]

    @property
    def depth(self):
        return self._configs["depth"]

    @property
    def widen_factor(self):
        return self._configs["widen_factor"]

    @property
    def dense_dropout(self):
        return self._configs["dense_dropout"]

    @property
    def name(self):
        return self._configs["name"]

    @property
    def save_path(self):
        return self._configs["save_path"]

    @property
    def lr(self):
        return self._configs["lr"]

    @property
    def momentum(self):
        return self._configs["momentum"]

    @property
    def weight_decay(self):
        return self._configs["weight_decay"]

    @property
    def label_smoothing(self):
        return self._configs["label_smoothing"]

    @property
    def epochs(self):
        return self._configs["epochs"]

    
    @property
    def start_epoch(self):
        return self._configs["start_epoch"]

    @property
    def amp(self):
        return self._configs["amp"]

    def best_top1(self, item=None):
        if item == None:
            return self._configs["best_top1"]
        self._configs["best_top1"] = item   

    def best_top5(self, item=None):
        if item == None:
            return self._configs["best_top5"]
        self._configs["best_top5"] = item            

    def change_optim(self, lr, m, w):
        self._configs["lr"] = lr
        self._configs["momentum"] = m
        self._configs["weight_decay"] = w

    def set_start_epoch(self, epoch):
        self._configs["start_epoch"] = epoch

config = Config()

In [6]:
# Data Augmentation
PARAMETER_MAX = 10
RESAMPLE_MODE = None


def AutoContrast(img, **kwarg):
    return PIL.ImageOps.autocontrast(img)


def Brightness(img, v, max_v, bias=0):
    v = _float_parameter(v, max_v) + bias
    return PIL.ImageEnhance.Brightness(img).enhance(v)


def Color(img, v, max_v, bias=0):
    v = _float_parameter(v, max_v) + bias
    return PIL.ImageEnhance.Color(img).enhance(v)


def Contrast(img, v, max_v, bias=0):
    v = _float_parameter(v, max_v) + bias
    return PIL.ImageEnhance.Contrast(img).enhance(v)


def Cutout(img, v, max_v, **kwarg):
    if v == 0:
        return img
    v = _float_parameter(v, max_v)
    v = int(v * min(img.size))
    w, h = img.size
    x0 = np.random.uniform(0, w)
    y0 = np.random.uniform(0, h)
    x0 = int(max(0, x0 - v / 2.))
    y0 = int(max(0, y0 - v / 2.))
    x1 = int(min(w, x0 + v))
    y1 = int(min(h, y0 + v))
    xy = (x0, y0, x1, y1)
    # gray
    color = (127, 127, 127)
    img = img.copy()
    PIL.ImageDraw.Draw(img).rectangle(xy, color)
    return img


def CutoutConst(img, v, max_v, **kwarg):
    v = _int_parameter(v, max_v)
    w, h = img.size
    x0 = np.random.uniform(0, w)
    y0 = np.random.uniform(0, h)
    x0 = int(max(0, x0 - v / 2.))
    y0 = int(max(0, y0 - v / 2.))
    x1 = int(min(w, x0 + v))
    y1 = int(min(h, y0 + v))
    xy = (x0, y0, x1, y1)
    # gray
    color = (127, 127, 127)
    img = img.copy()
    PIL.ImageDraw.Draw(img).rectangle(xy, color)
    return img


def Equalize(img, **kwarg):
    return PIL.ImageOps.equalize(img)


def Identity(img, **kwarg):
    return img


def Invert(img, **kwarg):
    return PIL.ImageOps.invert(img)


def Posterize(img, v, max_v, bias, **kwarg):
    v = _int_parameter(v, max_v) + bias
    return PIL.ImageOps.posterize(img, v)


def Rotate(img, v, max_v, **kwarg):
    v = _float_parameter(v, max_v)
    if random.random() < 0.5:
        v = -v
    return img.rotate(v)


def Sharpness(img, v, max_v, bias):
    v = _float_parameter(v, max_v) + bias
    return PIL.ImageEnhance.Sharpness(img).enhance(v)


def ShearX(img, v, max_v, **kwarg):
    v = _float_parameter(v, max_v)
    if random.random() < 0.5:
        v = -v
    return img.transform(img.size, PIL.Image.AFFINE, (1, v, 0, 0, 1, 0), RESAMPLE_MODE)


def ShearY(img, v, max_v, **kwarg):
    v = _float_parameter(v, max_v)
    if random.random() < 0.5:
        v = -v
    return img.transform(img.size, PIL.Image.AFFINE, (1, 0, 0, v, 1, 0), RESAMPLE_MODE)


def Solarize(img, v, max_v, **kwarg):
    v = _int_parameter(v, max_v)
    return PIL.ImageOps.solarize(img, 256 - v)


def SolarizeAdd(img, v, max_v, threshold=128, **kwarg):
    v = _int_parameter(v, max_v)
    if random.random() < 0.5:
        v = -v
    img_np = np.array(img).astype(np.int)
    img_np = img_np + v
    img_np = np.clip(img_np, 0, 255)
    img_np = img_np.astype(np.uint8)
    img = Image.fromarray(img_np)
    return PIL.ImageOps.solarize(img, threshold)


def TranslateX(img, v, max_v, **kwarg):
    v = _float_parameter(v, max_v)
    if random.random() < 0.5:
        v = -v
    v = int(v * img.size[0])
    return img.transform(img.size, PIL.Image.AFFINE, (1, 0, v, 0, 1, 0), RESAMPLE_MODE)


def TranslateY(img, v, max_v, **kwarg):
    v = _float_parameter(v, max_v)
    if random.random() < 0.5:
        v = -v
    v = int(v * img.size[1])
    return img.transform(img.size, PIL.Image.AFFINE, (1, 0, 0, 0, 1, v), RESAMPLE_MODE)


def TranslateXConst(img, v, max_v, **kwarg):
    v = _float_parameter(v, max_v)
    if random.random() > 0.5:
        v = -v
    return img.transform(img.size, PIL.Image.AFFINE, (1, 0, v, 0, 1, 0), RESAMPLE_MODE)


def TranslateYConst(img, v, max_v, **kwarg):
    v = _float_parameter(v, max_v)
    if random.random() > 0.5:
        v = -v
    return img.transform(img.size, PIL.Image.AFFINE, (1, 0, 0, 0, 1, v), RESAMPLE_MODE)


def _float_parameter(v, max_v):
    return float(v) * max_v / PARAMETER_MAX


def _int_parameter(v, max_v):
    return int(v * max_v / PARAMETER_MAX)


def rand_augment_pool():
    augs = [(AutoContrast, None, None),
            (Brightness, 1.8, 0.1),
            (Color, 1.8, 0.1),
            (Contrast, 1.8, 0.1),
            (CutoutConst, 40, None),
            (Equalize, None, None),
            (Invert, None, None),
            (Posterize, 4, 0),
            (Rotate, 30, None),
            (Sharpness, 1.8, 0.1),
            (ShearX, 0.3, None),
            (ShearY, 0.3, None),
            (Solarize, 256, None),
            (TranslateXConst, 100, None),
            (TranslateYConst, 100, None),
            ]
    return augs


class RandAugment(object):
    def __init__(self, n, m, resample_mode=PIL.Image.BILINEAR):
        assert n >= 1
        assert m >= 1
        global RESAMPLE_MODE
        RESAMPLE_MODE = resample_mode
        self.n = n
        self.m = m
        self.augment_pool = rand_augment_pool()

    def __call__(self, img):
        ops = random.choices(self.augment_pool, k=self.n)
        for op, max_v, bias in ops:
            prob = np.random.uniform(0.2, 0.8)
            if random.random() + prob >= 1:
                img = op(img, v=self.m, max_v=max_v, bias=bias)
        return img

In [7]:
# define Random Transforms class

class TransformMPL(object):
    def __init__(self, config, mean, std):
        if config.randaug:
            n, m = config.randaug
        else:
            n, m = 2, 10  # default

        self.aug = transforms.Compose([
            transforms.RandomHorizontalFlip(),
            transforms.RandomCrop(size=config.resize,
                                  padding=int(config.resize*0.125),
                                  padding_mode='reflect'),
            RandAugment(n=n, m=m)])
        self.normalize = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize(mean=mean, std=std)])

    def __call__(self, x):
        aug = self.aug(x)
        return self.normalize(aug)

In [8]:
# Image Net mean and std
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

train_set = datasets.ImageFolder('Food_data/train',
                                 transform=TransformMPL(config, mean=mean, std=std),
)

train_loader = DataLoader(train_set,
                          sampler=RandomSampler(train_set),
                          batch_size=config.batch,
                          drop_last=config.drop_last,
                          pin_memory=True)

test_set = datasets.ImageFolder('Food_data/test',
                                 transform=TransformMPL(config, mean=mean, std=std),
)

test_loader = DataLoader(test_set,
                         batch_size=config.batch,
                         sampler=SequentialSampler(test_set))

In [8]:
import matplotlib.pyplot as plt
import torchvision

def imshow(img):
    img = inv_normalize(img)     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

def inv_normalize(img):
    mean = torch.Tensor([0.485, 0.456, 0.406]).unsqueeze(-1)
    std= torch.Tensor([0.229, 0.224, 0.225]).unsqueeze(-1)
    img = (img.view(3, -1) * std + mean).view(img.shape)
    img = img.clamp(0, 1)
    return img

images, labels = next(iter(train_loader))
imshow(torchvision.utils.make_grid(images))
print('   '.join('%5s' % train_set.classes[labels[j]] for j in range(config.batch)))

images, labels = next(iter(test_loader))
imshow(torchvision.utils.make_grid(images))
print('   '.join('%5s' % test_set.classes[labels[j]] for j in range(config.batch)))

NameError: name 'train_loader' is not defined

### Preprocessing steps for light model (TF EfficientNetB0)

In [None]:
import tensorflow as tf

images = []
labels  = []
TRAIN_DIR = 'Food_data/train'
VAL_DIR = 'Food_data/test'
IMG_SIZE = (224,224) # Image resolution
BATCH_SIZE = 32

train_data = tf.keras.preprocessing.image_dataset_from_directory(
    TRAIN_DIR,
    image_size=(224, 224),
    batch_size= BATCH_SIZE,
    shuffle = True,
    seed = 123,
    label_mode = 'categorical',
)

valid_data = tf.keras.preprocessing.image_dataset_from_directory(
    VAL_DIR,
    image_size=(224, 224),
    batch_size= BATCH_SIZE,
    shuffle = False,
    seed = 123,
    label_mode = 'categorical',
)

print(len(train_data))
print(len(valid_data))

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 10))
for images, labels in train_data.take(1):
  for i in range(9):
    ax = plt.subplot(3, 3, i + 1)
    plt.imshow(images[i].numpy().astype("uint8"))
    plt.title(class_names[labels[i]])
    plt.axis("off")

plt.figure(figsize=(10, 10))
for images, labels in valid_data.take(1):
  for i in range(9):
    ax = plt.subplot(3, 3, i + 1)
    plt.imshow(images[i].numpy().astype("uint8"))
    plt.title(class_names[labels[i]])
    plt.axis("off")

### Preprocessing steps for heavier model

In [None]:
# preprocessing step for pytorch models are same and preprocessing 
# steps for tf models are same. 

# Training different models
## Objectives
1. Obtain 90% accuracy in all the models trained. 
2. You're free to use any techniques for traning such as transfer learning, knowledge transfer, etc. 
3. The models should not overfit the training dataset. 
4. Measure the performance in terms of accuracy and speed of each model. 
5. Visualize the training and testing performance using TensorBoard. 

#### Optional:
1. Apply weight quantization to increase the speed of the models. 

## Train wideResNet

In [None]:

class ModelEMA(nn.Module):
    def __init__(self, model, decay=0.9999, device=None):
        super().__init__()
        self.module = deepcopy(model)
        self.module.eval()
        self.decay = decay
        self.device = device
        if self.device is not None:
            self.module.to(device=device)

    def forward(self, input):
        return self.module(input)

    def _update(self, model, update_fn):
        with torch.no_grad():
            for ema_v, model_v in zip(self.module.parameters(), model.parameters()):
                if self.device is not None:
                    model_v = model_v.to(device=self.device)
                ema_v.copy_(update_fn(ema_v, model_v))
            for ema_v, model_v in zip(self.module.buffers(), model.buffers()):
                if self.device is not None:
                    model_v = model_v.to(device=self.device)
                ema_v.copy_(model_v)

    def update_parameters(self, model):
        self._update(model, update_fn=lambda e, m: self.decay * e + (1. - self.decay) * m)

    def state_dict(self):
        return self.module.state_dict()

    def load_state_dict(self, state_dict):
        self.module.load_state_dict(state_dict)

In [None]:
class BasicBlock(nn.Module):
    def __init__(self, in_planes, out_planes, stride, dropout=0.0, activate_before_residual=False):
        super(BasicBlock, self).__init__()
        self.bn1 = nn.BatchNorm2d(in_planes, momentum=0.001)
        self.relu1 = nn.LeakyReLU(negative_slope=0.1, inplace=True)
        self.conv1 = nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_planes, momentum=0.001)
        self.relu2 = nn.LeakyReLU(negative_slope=0.1, inplace=True)
        self.conv2 = nn.Conv2d(out_planes, out_planes, kernel_size=3, stride=1,
                               padding=1, bias=False)
        self.dropout = dropout
        self.equalInOut = (in_planes == out_planes)
        self.convShortcut = (not self.equalInOut) and nn.Conv2d(in_planes, out_planes,
                                                                kernel_size=1, stride=stride,
                                                                padding=0, bias=False) or None
        self.activate_before_residual = activate_before_residual

    def forward(self, x):
        if not self.equalInOut and self.activate_before_residual == True:
            x = self.relu1(self.bn1(x))
        else:
            out = self.relu1(self.bn1(x))
        out = self.relu2(self.bn2(self.conv1(out if self.equalInOut else x)))
        if self.dropout > 0:
            out = F.dropout(out, p=self.dropout, training=self.training)
        out = self.conv2(out)
        return torch.add(x if self.equalInOut else self.convShortcut(x), out)

In [None]:
class NetworkBlock(nn.Module):
    def __init__(self, nb_layers, in_planes, out_planes, block, stride, dropout=0.0,
                 activate_before_residual=False):
        super(NetworkBlock, self).__init__()
        self.layer = self._make_layer(
            block, in_planes, out_planes, nb_layers, stride, dropout, activate_before_residual)

    def _make_layer(self, block, in_planes, out_planes, nb_layers, stride, dropout,
                    activate_before_residual):
        layers = []
        for i in range(int(nb_layers)):
            layers.append(block(i == 0 and in_planes or out_planes, out_planes,
                                i == 0 and stride or 1, dropout, activate_before_residual))
        return nn.Sequential(*layers)

    def forward(self, x):
        return self.layer(x)

In [None]:
class WideResNet(nn.Module):
    def __init__(self, num_classes, depth=28, widen_factor=2, dropout=0.0, dense_dropout=0.0):
        super(WideResNet, self).__init__()
        channels = [16, 16*widen_factor, 32*widen_factor, 64*widen_factor]
        assert((depth - 4) % 6 == 0)
        n = (depth - 4) / 6
        block = BasicBlock
        # 1st conv before any network block
        self.conv1 = nn.Conv2d(3, channels[0], kernel_size=3, stride=1,
                               padding=1, bias=False)
        # 1st block
        self.block1 = NetworkBlock(
            n, channels[0], channels[1], block, 1, dropout, activate_before_residual=True)
        # 2nd block
        self.block2 = NetworkBlock(
            n, channels[1], channels[2], block, 2, dropout)
        # 3rd block
        self.block3 = NetworkBlock(
            n, channels[2], channels[3], block, 2, dropout)
        # global average pooling and classifier
        self.bn1 = nn.BatchNorm2d(channels[3], momentum=0.001)
        self.relu = nn.LeakyReLU(negative_slope=0.1, inplace=True)
        self.drop = nn.Dropout(dense_dropout)
        self.fc = nn.Linear(channels[3], num_classes)
        self.channels = channels[3]

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight,
                                        mode='fan_out',
                                        nonlinearity='leaky_relu')
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1.0)
                nn.init.constant_(m.bias, 0.0)
            elif isinstance(m, nn.Linear):
                nn.init.xavier_normal_(m.weight)
                nn.init.constant_(m.bias, 0.0)

    def forward(self, x):
        out = self.conv1(x)
        out = self.block1(out)
        out = self.block2(out)
        out = self.block3(out)
        out = self.relu(self.bn1(out))
        out = F.adaptive_avg_pool2d(out, 1)
        out = out.view(-1, self.channels)
        return self.fc(self.drop(out))

In [None]:
def build_wideresnet(config):

    model = WideResNet(num_classes=config.num_classes,
                       depth=config.depth,
                       widen_factor=config.widen_factor,
                       dropout=0,
                       dense_dropout=config.dense_dropout)
    return model

In [None]:
model = build_wideresnet(config)
model_params = sum(p.numel() for p in model.parameters())
print('Total parameters {} M'.format(model_params/1e6))

In [None]:
import os
import shutil
from collections import OrderedDict

import torch
from torch import distributed as dist
from torch import nn
from torch.nn import functional as F

In [None]:
def reduce_tensor(tensor, n):
    rt = tensor.clone()
    dist.all_reduce(rt, op=dist.ReduceOp.SUM)
    rt /= n
    return rt


def create_loss_fn(config):
    if config.label_smoothing > 0:
        criterion = SmoothCrossEntropy(alpha=config.label_smoothing)
    else:
        criterion = nn.CrossEntropyLoss()
    return criterion.to(config.device)


def module_load_state_dict(model, state_dict):
    new_state_dict = OrderedDict()
    for k, v in state_dict.items():
        name = k[7:]  # remove `module.`
        new_state_dict[name] = v
    model.load_state_dict(new_state_dict)


def model_load_state_dict(model, state_dict):
    try:
        model.load_state_dict(state_dict)
    except:
        module_load_state_dict(model, state_dict)


def save_checkpoint(config, state, is_best):
    os.makedirs(config.save_path, exist_ok=True)
    
    name = config.name
    filename = f'{config.save_path}/{name}_last.pth.tar'
    torch.save(state, filename, _use_new_zipfile_serialization=False)
    if is_best:
        shutil.copyfile(filename, f'{config.save_path}/{config.name}_best.pth.tar')


def accuracy(output, target, topk=(1,)):
    output = output.to(torch.device('cpu'))
    target = target.to(torch.device('cpu'))
    maxk = max(topk)
    batch_size = target.shape[0]

    _, idx = output.sort(dim=1, descending=True)
    pred = idx.narrow(1, 0, maxk).t()
    correct = pred.eq(target.reshape(1, -1).expand_as(pred))

    res = []
    res = []
    for k in topk:
        correct_k = correct[:k].reshape(-1).float().sum(dim=0, keepdim=True)
        res.append(correct_k.mul_(100.0 / batch_size))
    return res

In [None]:
class SmoothCrossEntropy(nn.Module):
    def __init__(self, alpha=0.1):
        super(SmoothCrossEntropy, self).__init__()
        self.alpha = alpha

    def forward(self, logits, labels):
        num_classes = logits.shape[-1]
        alpha_div_k = self.alpha / num_classes
        target_probs = F.one_hot(labels, num_classes=num_classes).float() * \
            (1. - self.alpha) + alpha_div_k
        loss = -(target_probs * torch.log_softmax(logits, dim=-1)).sum(dim=-1)
        return loss.mean()


class AverageMeter(object):

    def __init__(self):
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

In [None]:
def set_seed(config):
    random.seed(config.seed)
    np.random.seed(config.seed)
    torch.manual_seed(config.seed)
    torch.cuda.manual_seed_all(config.seed)

def get_cosine_schedule_with_warmup(optimizer,
                                    num_warmup_steps,
                                    num_training_steps,
                                    num_cycles=0.5,
                                    last_epoch=-1):
    def lr_lambda(current_step):
        if current_step < num_warmup_steps:
            return float(current_step) / float(max(1, num_warmup_steps))

        progress = float(current_step - num_warmup_steps) / \
            float(max(1, num_training_steps - num_warmup_steps))
        return max(0.0, 0.5 * (1.0 + math.cos(math.pi * float(num_cycles) * 2.0 * progress)))

    return LambdaLR(optimizer, lr_lambda, last_epoch)

def get_lr(optimizer):
    return optimizer.param_groups[0]['lr']

In [None]:
def train(config, train_loader, test_loader,
          model, criterion, optimizer, scaler):

    for epoch in range(config.start_epoch, config.epochs):

        batch_time = AverageMeter()
        data_time = AverageMeter()
        losses = AverageMeter()

        model.train()
        end = time.time()

        for step, (images, targets) in enumerate(train_loader):
            data_time.update(time.time() - end)

            batch_size = targets.shape[0]
            images = images.to(config.device)
            targets = targets.to(config.device)
            with amp.autocast(enabled=config.amp):
                model.zero_grad()
                outputs = model(images)
                loss = criterion(outputs, targets)

            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()

            losses.update(loss.item(), batch_size)
            batch_time.update(time.time() - end)

        message = 'Epoch : {0}/{1} Data : {2:.2f} Batch : {3:.2f} Loss : {4:.4f}'.format(
            epoch+1, config.epochs, data_time.avg, batch_time.avg, losses.avg
        )

        print(message)

        test_loss, top1, top5 = evaluate(config, test_loader, model, criterion)
        is_best = top1 > config.best_top1()

        if is_best:
            config.best_top1(top1)
            config.best_top5(top5)

        save_checkpoint(config, {
            'step': step + 1,
            'best_top1': config.best_top1(),
            'best_top5': config.best_top5(),
            'student_state_dict': model.state_dict(),
            'avg_state_dict': None,
            'student_optimizer': optimizer.state_dict(),
        }, is_best)

def evaluate(config, test_loader, model, criterion):
    batch_time = AverageMeter()
    data_time = AverageMeter()
    losses = AverageMeter()
    top1 = AverageMeter()
    top5 = AverageMeter()
    model.eval()

    with torch.no_grad():
        end = time.time()
        for step, (images, targets) in enumerate(test_loader):
            data_time.update(time.time() - end)
            batch_size = targets.shape[0]
            images = images.to(config.device)
            targets = targets.to(config.device)
            with amp.autocast(enabled=config.amp):
                outputs = model(images)
                loss = criterion(outputs, targets)

            acc1, acc5 = accuracy(outputs, targets, (1, 5))
            losses.update(loss.item(), batch_size)
            top1.update(acc1[0], batch_size)
            top5.update(acc5[0], batch_size)
            batch_time.update(time.time() - end)
            end = time.time()

        message = 'Data : {0:.2f} Batch : {1:.2f} Loss : {2:.4f} top1 : {3:.2f} top5 : {4:.2f}'.format(
            data_time.avg, batch_time.avg,
            losses.avg, top1.avg, top5.avg,
        )

        print(message)

        return losses.avg, top1.avg, top5.avg

In [None]:
set_seed(config)

model = build_wideresnet(config)
model.to(config.device)

state_dict = torch.load(config.save_path + '/' + config.name + '_best.pth.tar',
                        map_location=torch.device('cpu'))

model.load_state_dict(state_dict["student_state_dict"])
config.best_top1(state_dict["best_top1"])
config.best_top5(state_dict["best_top5"])

optimizer = optim.SGD(model.parameters(),
                      lr=config.lr,
                      momentum=config.momentum,
                      weight_decay=config.weight_decay)

optimizer.load_state_dict(state_dict["student_optimizer"])
config.change_optim(lr=optimizer.param_groups[0]['lr'],
                    m=optimizer.param_groups[0]['momentum'],
                    w=optimizer.param_groups[0]['weight_decay'])

config.set_start_epoch(4)

criterion = create_loss_fn(config)

scaler = amp.GradScaler(enabled=config.amp)

In [None]:
optimizer.param_groups[0]['lr']

In [None]:
train(config, train_loader, test_loader,
          model, criterion, optimizer, scaler)

## Train EfficientNetB0

In [None]:
from tensorflow.keras.layers.experimental.preprocessing import *

resize_and_rescale = tf.keras.Sequential([
  Resizing(224, 224),
  Rescaling(1 /255)
])

# Data augmentation to reduce overtraining
data_augmentation = tf.keras.Sequential([
     RandomFlip("horizontal", 
     input_shape=(224, 224,3)),
     RandomRotation(0.1),
     RandomZoom(0.1),
])

model = tf.keras.Sequential([
        #resize_and_rescale,
        data_augmentation,
        tf.keras.applications.EfficientNetB0(
            input_shape=(224, 224, 3),
            weights='imagenet',
            include_top=False,
            drop_connect_rate=0.5
        ),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(101, activation='softmax') 
])

In [None]:
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics=['accuracy']) 

history = model.fit(
    train_data, 
    validation_data=valid_data,
    epochs= 30,
)

model_yaml = model.to_yaml()
with open("tf_efficientNetB0_ft.yaml", "w") as yaml_file:
    yaml_file.write(model_yaml)
model.save_weights("tf_efficientNetB0_ft.h5")

## Train Vision Transformer

In [None]:
import torch
import torch.nn as nn
from functools import partial

from timm.models.vision_transformer import VisionTransformer, _cfg
from timm.models.registry import register_model
from timm.models.layers import trunc_normal_


class DistilledVisionTransformer(VisionTransformer):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.dist_token = nn.Parameter(torch.zeros(1, 1, self.embed_dim))
        num_patches = self.patch_embed.num_patches
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 2, self.embed_dim))
        self.head_dist = nn.Linear(self.embed_dim, self.num_classes) if self.num_classes > 0 else nn.Identity()

        trunc_normal_(self.dist_token, std=.02)
        trunc_normal_(self.pos_embed, std=.02)
        self.head_dist.apply(self._init_weights)

    def forward_features(self, x):
        B = x.shape[0]
        x = self.patch_embed(x)

        cls_tokens = self.cls_token.expand(B, -1, -1)  # stole cls_tokens impl from Phil Wang, thanks
        dist_token = self.dist_token.expand(B, -1, -1)
        x = torch.cat((cls_tokens, dist_token, x), dim=1)

        x = x + self.pos_embed
        x = self.pos_drop(x)

        for blk in self.blocks:
            x = blk(x)

        x = self.norm(x)
        return x[:, 0], x[:, 1]

    def forward(self, x):
        x, x_dist = self.forward_features(x)
        x = self.head(x)
        x_dist = self.head_dist(x_dist)
        if self.training:
            return x, x_dist
        else:
            # during inference, return the average of both classifier predictions
            return (x + x_dist) / 2



In [None]:
from timm.data import Mixup
from timm.models import create_model
from timm.loss import LabelSmoothingCrossEntropy, SoftTargetCrossEntropy
from timm.scheduler import create_scheduler
from timm.optim import create_optimizer
from timm.utils import NativeScaler, get_state_dict, ModelEma

model = create_model(
    'deit_base_patch16_224',
    pretrained=False,
    num_classes=101,
    drop_rate=0.0,
    drop_path_rate=0.1,
    drop_block_rate=None,
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model.to(device)

criterion = LabelSmoothingCrossEntropy()

optimizer = optim.AdamW(model.parameters(), lr=0.001, betas=[0.9, 0.999])



In [None]:
import numpy as np
def train(n_epochs,trainloader,testloader, model, optimizer, criterion):
    """returns trained model"""
    # initialize tracker for minimum validation loss
    valid_loss_min = np.Inf 
    running_loss=0
    
  
    for epoch in range(n_epochs):
        
        model.train()

        for inputs, labels in tqdm(trainloader):
            
        # Move input and label tensors to the default device
            inputs, labels = inputs.cuda(), labels.cuda()
            optimizer.zero_grad()
            start = time.time()
            logps = model(inputs)
            loss = criterion(logps, labels)
            loss.backward()
            optimizer.step()
        
            running_loss += loss.item()
        
        
        model.eval()
        valid_loss=0
        accuracy=0
        with torch.no_grad():
            for inputs, labels in tqdm(testloader):
                inputs, labels = inputs.cuda(), labels.cuda()
                logps = model(inputs)
                batch_loss = criterion(logps, labels)
                valid_loss += batch_loss.item()
                    
                    # Calculate accuracy
                
                top_p, top_class = logps.topk(1, dim=1)
                equals = top_class == labels.view(*top_class.shape)
                accuracy += torch.mean(equals.type(torch.FloatTensor)).item()
           
        
            if valid_loss <= valid_loss_min:
                print("Validation loss decreased  Saving model")
                torch.save(model.state_dict(),'food_classifier_deit.pt')
                valid_loss_min=valid_loss
                
            
            print(f"Device = cuda; Time per batch: {(time.time() - start):.3f} seconds")       
            print(f"Epoch {epoch}/{n_epochs}.. "
                  f"Train loss: {running_loss/len(trainloader):.3f}.. "
                  f"Test loss: {valid_loss/len(testloader):.3f}.. "
                  f"Test accuracy: {accuracy/len(testloader):.3f}")
            running_loss = 0
            model.train()

In [None]:
train(30, train_loader, test_loader, model, optimizer, criterion)

## Train DanseNet201

In [None]:
model_ft = torchvision.models.densenet201(pretrained=True)
model_ft.classifier = nn.Sequential(nn.Linear(1920,1024),nn.LeakyReLU(),nn.Linear(1024,101))

# Use GPU if it's available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_ft.to(device)

criterion = nn.CrossEntropyLoss()

optimizer = optim.Adam(model_ft.parameters(), lr=0.001, betas=[0.9, 0.999])

In [None]:
import numpy as np
def train(n_epochs,trainloader,testloader, model, optimizer, criterion):
    """returns trained model"""
    # initialize tracker for minimum validation loss
    valid_loss_min = np.Inf 
    running_loss=0
    
  
    for epoch in range(n_epochs):
        
        model.train()

        for inputs, labels in tqdm(trainloader):
            
        # Move input and label tensors to the default device
            inputs, labels = inputs.cuda(), labels.cuda()
            optimizer.zero_grad()
            start = time.time()
            logps = model(inputs)
            loss = criterion(logps, labels)
            loss.backward()
            optimizer.step()
        
            running_loss += loss.item()
        
        
        model.eval()
        valid_loss=0
        accuracy=0
        with torch.no_grad():
            for inputs, labels in tqdm(testloader):
                inputs, labels = inputs.cuda(), labels.cuda()
                logps = model(inputs)
                batch_loss = criterion(logps, labels)
                valid_loss += batch_loss.item()
                    
                    # Calculate accuracy
                
                top_p, top_class = logps.topk(1, dim=1)
                equals = top_class == labels.view(*top_class.shape)
                accuracy += torch.mean(equals.type(torch.FloatTensor)).item()
           
        
            if valid_loss <= valid_loss_min:
                print("Validation loss decreased  Saving model")
                torch.save(model.state_dict(),'food_classifier_deit.pt')
                valid_loss_min=valid_loss
                
            
            print(f"Device = cuda; Time per batch: {(time.time() - start):.3f} seconds")       
            print(f"Epoch {epoch}/{n_epochs}.. "
                  f"Train loss: {running_loss/len(trainloader):.3f}.. "
                  f"Test loss: {valid_loss/len(testloader):.3f}.. "
                  f"Test accuracy: {accuracy/len(testloader):.3f}")
            running_loss = 0
            model.train()

In [None]:
train(30, train_dl, val_dl, model_ft, optimizer, criterion)

## Train MobileNetV2

In [None]:
def set_parameter_requires_grad(model, feature_extracting):
    if feature_extracting:
        for param in model.parameters():
            param.requires_grad = False

model_ft = torchvision.models.mobilenet_v2(pretrained=True)
set_parameter_requires_grad(model_ft, True)
num_ftrs = model_ft.classifier[1].in_features
model_ft.classifier[1] = nn.Linear(num_ftrs,101)

def train(start_epoch, best_acc_, model, train_dl, val_dl, criterion, optimizer, num_epochs=25):

    val_acc_history = []

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = best_acc_

    for epoch in range(start_epoch, num_epochs):
        running_loss = 0.0
        running_corrects = 0

        model.train()
        for i,batch in enumerate(train_dl):
            optimizer.zero_grad()

            outputs = model(batch[0].cuda())
            loss = criterion(outputs, batch[1].cuda())
            _, preds = torch.max(outputs, 1)

            loss.backward()
            optimizer.step()

            running_loss += loss.item() * batch[0].size(0)
            running_corrects += torch.sum(preds == batch[1].cuda().data)
            
            if i%100 == 0:
                print('{}/{}'.format(i, len(train_dl)))

        epoch_loss = running_loss / len(train_dl.dataset)
        epoch_acc = running_corrects.double() / len(train_dl.dataset)

        print('Epoch {} Training Loss: {:.4f} Acc: {:.4f}'.format(epoch, epoch_loss, epoch_acc))

        with torch.no_grad():
            running_loss = 0.0
            running_corrects = 0.0

            model.eval()
            for i,batch in enumerate(val_dl):

                outputs = model(batch[0].cuda())
                loss = criterion(outputs, batch[1].cuda())
                _, preds = torch.max(outputs, 1)

                running_loss += loss.item() * batch[0].size(0)
                running_corrects += torch.sum(preds == batch[1].cuda().data)

            epoch_loss = running_loss / len(val_dl.dataset)
            epoch_acc = running_corrects.double() / len(val_dl.dataset)

            print('Evaluation Loss: {:.4f} Acc: {:.4f}'.format(epoch_loss, epoch_acc))

        if epoch_acc > best_acc:
            ckpt = {
                "model_dict": model.state_dict(),
                "best_acc": epoch_acc,
                "optim_dict": optimizer.state_dict(),
                "epoch": epoch,
            }
            save_path = '/content/drive/MyDrive/food_weights/WideResNetV2/epoch_{}.pt'.format(epoch+1)
            torch.save(ckpt, save_path)

            best_acc = epoch_acc
            best_model_wts = copy.deepcopy(model.state_dict())
            val_acc_history.append(epoch_acc)

    return best_model_wts, val_acc_history, best_acc

In [None]:
criterion = nn.CrossEntropyLoss()

params_to_update = model_ft.parameters()
print("Params to learn:")
feature_extract = True

if feature_extract:
    params_to_update = []
    for name,param in model_ft.named_parameters():
        if param.requires_grad == True:
            params_to_update.append(param)
            print("\t",name)
else:
    for name,param in model_ft.named_parameters():
        if param.requires_grad == True:
            print("\t",name)

# Observe that all parameters are being optimized
optimizer_ft = torch.optim.SGD(params_to_update, lr=0.001, momentum=0.9)

In [None]:
ckpt_ = '/content/drive/MyDrive/food_weights/MobileNet/epoch_9.pt'
ckpt = torch.load(ckpt_)
model_ft.load_state_dict(ckpt["model_dict"])
optimizer_ft.load_state_dict(ckpt["optim_dict"])
start_epoch = ckpt["epoch"] + 1
best_acc = ckpt["best_acc"]

In [None]:
t = time.time()
model_ft, hist, best_acc = train(0, 0, model_ft, train_dl, val_dl, criterion, optimizer_ft, num_epochs=30)
time_elapsed = time.time() - t
print('\n\n')
print(time_elapsed)

# Observation

Model | top1 accuracy | top5 accuracy | model parameters (M)
-------|---------------|------------- | ------------
WRS    | 88.72          | 97.92 | 23.40
DenseNet201 | 78.26      | 86.42 | 20.01
MobileNetV2 | 83.45      | 91.07 | 3.50
EfficientNetB0 | 79.34    | 85.47 | 4.13
Transformer | 94.61       | 98.56 | 85.87




Training accuracy can be improved by increasing number of epoch. all the models are trained under epoch 15. and the training time is huge and the discontinuty proble of colab. So I trained it with very small epoch and weight decay. 