<a href="https://colab.research.google.com/github/zubejda/Advanced_DL/blob/main/Few_Shot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![FAU logo](https://www.doc.zuv.fau.de//M/FAU-Logo/01_FAU_Kernmarke/Web/FAU_Kernmarke_Q_RGB_blue.svg)


# Assignment 5: Few-Shot Learning

In the lecture, you learned about optimization-based meta-learning techniques and their application to few-shot image classification and segmentation. In this notebook, you will implement the first-order MAML (FOMAML) and Reptile algorithms for a few-shot image classification task. You will use the CIFAR-FS dataset throughout this exercise.

Note: You need to install [**torchmeta**](https://github.com/tristandeleu/pytorch-meta) for this assignment. Also note that this library supports PyTorch only up to version 1.10.

## 1) Task Generator

In this task, you will implement a task generator for few-shot classification using torchmeta. Subsequently, the generator will be passed as an argument to the DataLoader for training and testing with FOMAML and Reptile. Recall that a few-shot classification task is formulated as an *N*-way, *K*-shot problem, where the *NK* data samples form the support set. Set *N* = 5 and *K* = 5. Also, set the size of the query set to 15. The figure below visualizes a few-shot classification task.

<img src="task.png" width="250" height="300"/>
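
The *N*-way, *K*-shot formulation above can be illustrated without torchmeta. The following is a minimal sketch in plain PyTorch, not the torchmeta API: `sample_episode` is a hypothetical helper, the data is synthetic, and it assumes that the query size of 15 is meant per class.

```python
import torch

def sample_episode(images, labels, n_way=5, k_shot=5, n_query=15, generator=None):
    """Sample one N-way, K-shot episode (hypothetical helper, not part of torchmeta)."""
    classes = torch.unique(labels)
    # Pick N distinct classes for this task.
    chosen = classes[torch.randperm(len(classes), generator=generator)[:n_way]]
    support_x, support_y, query_x, query_y = [], [], [], []
    for new_label, cls in enumerate(chosen):
        # Shuffle the indices of this class, then split into support and query.
        idx = torch.nonzero(labels == cls, as_tuple=True)[0]
        idx = idx[torch.randperm(len(idx), generator=generator)]
        s_idx, q_idx = idx[:k_shot], idx[k_shot:k_shot + n_query]
        support_x.append(images[s_idx])
        query_x.append(images[q_idx])
        # Relabel classes as 0..N-1 inside the episode.
        support_y.append(torch.full((len(s_idx),), new_label, dtype=torch.long))
        query_y.append(torch.full((len(q_idx),), new_label, dtype=torch.long))
    return (torch.cat(support_x), torch.cat(support_y),
            torch.cat(query_x), torch.cat(query_y))

# Synthetic stand-in data: 10 classes with 20 single-channel 28x28 images each.
g = torch.Generator().manual_seed(0)
images = torch.randn(200, 1, 28, 28, generator=g)
labels = torch.arange(10).repeat_interleave(20)

sx, sy, qx, qy = sample_episode(images, labels, generator=g)
print(sx.shape, qx.shape)  # support: N*K = 25 samples, query: N*15 = 75 samples
```

The support set holds *NK* = 25 samples and, under the per-class assumption, the query set holds 75. Labels are remapped to 0..*N*-1 within each episode so that a 5-output classifier can be reused across tasks.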

The Omniglot dataset contains handwritten characters from different alphabets. Visualize some of the dataset samples below. You will use the train set for meta-training and the test set for meta-testing.

In [1]:
!pip install learn2learn

In [1]:
import torch
import torchvision
from torchmeta.datasets.helpers import omniglot, cifar_fs
from torchmeta.utils.data import BatchMetaDataLoader





ModuleNotFoundError: No module named 'torchmeta'

In [None]:
## Visualize data

## 2) FOMAML
FOMAML is less computationally expensive than MAML because it does not require computing second-order derivatives. In this task, you need to implement the FOMAML algorithm; you may use torchmeta for assistance. Set the number of tasks to 16 and use the following CNN model.



In [None]:
import torch.nn as nn
from torchmeta.modules import (MetaModule, MetaSequential, MetaConv2d,
                               MetaBatchNorm2d, MetaLinear)


def conv3x3(in_channels, out_channels, **kwargs):
    return MetaSequential(
        MetaConv2d(in_channels, out_channels, kernel_size=3, padding=1, **kwargs),
        MetaBatchNorm2d(out_channels, momentum=1., track_running_stats=False),
        nn.ReLU(),
        nn.MaxPool2d(2)
    )

class ConvolutionalNeuralNetwork(MetaModule):
    def __init__(self, in_channels, num_classes, hidden_size=64):
        super(ConvolutionalNeuralNetwork, self).__init__()
        self.in_channels = in_channels
        self.hidden_size = hidden_size

        self.features = MetaSequential(
            conv3x3(in_channels, hidden_size),
            conv3x3(hidden_size, hidden_size),
            conv3x3(hidden_size, hidden_size),
            conv3x3(hidden_size, hidden_size)
        )

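        # For 28x28 inputs, four 2x2 max-pools reduce the feature map to 1x1,
        # so the flattened feature size equals hidden_size.
        self.classifier = MetaLinear(hidden_size, num_classes)

    def forward(self, inputs, params=None):  # params: optional parameters adapted on a single task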
        features = self.features(inputs, params=self.get_subdict(params, 'features'))
        features = features.view((features.size(0), -1))
        logits = self.classifier(features, params=self.get_subdict(params, 'classifier'))
        return logits
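

The input width of the classifier depends on the image size, because each of the four blocks halves the spatial resolution. The sketch below checks this with a plain-PyTorch stand-in for the blocks above (`plain_conv3x3` and `flat_feature_size` are hypothetical names, and the 28x28 and 32x32 image sizes are assumptions about how the Omniglot and CIFAR-FS images are preprocessed).

```python
import torch
import torch.nn as nn

def plain_conv3x3(in_channels, out_channels):
    # Plain-PyTorch stand-in for the conv3x3 block above (no torchmeta needed).
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_channels, momentum=1., track_running_stats=False),
        nn.ReLU(),
        nn.MaxPool2d(2),
    )

def flat_feature_size(in_channels, image_size, hidden_size=64):
    # Build the four-block feature extractor and measure its flattened output.
    features = nn.Sequential(
        plain_conv3x3(in_channels, hidden_size),
        plain_conv3x3(hidden_size, hidden_size),
        plain_conv3x3(hidden_size, hidden_size),
        plain_conv3x3(hidden_size, hidden_size),
    )
    x = torch.randn(2, in_channels, image_size, image_size)
    return features(x).view(x.size(0), -1).size(1)

size_28 = flat_feature_size(in_channels=1, image_size=28)  # 28 -> 14 -> 7 -> 3 -> 1
size_32 = flat_feature_size(in_channels=3, image_size=32)  # 32 -> 16 -> 8 -> 4 -> 2
print(size_28, size_32)  # 64 256
```

With single-channel 28x28 inputs the flattened features have 64 entries, which matches the linear layer in the model. With 32x32 RGB inputs the linear layer would need 256 inputs instead.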


### A) Meta-Training

Meta-training consists of two parts: an inner loop, which trains the base model on the support set of a task for a number of epochs and evaluates it on the query set, and an outer loop, which updates the meta-model using the query-set losses accumulated over the generated tasks. For simplicity, set the number of inner epochs to 1. Furthermore, set the number of tasks to 16 and the number of outer epochs to 100. Use an Adam optimizer with a learning rate of 0.001 to update the meta-model, and use the function *gradient_update_parameters* in torchmeta to update the base model parameters. Finally, during meta-training, use each task's train split (the support set) to train the base model and its test split (the query set) to compute the meta-loss and update the meta-model.


**Output**: Plot the accuracy and loss of the meta-model against the number of outer epochs.
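
The inner and outer loops described above can be sketched in plain PyTorch as follows. This is an illustration under stated assumptions rather than the expected solution: it uses a linear classifier and synthetic Gaussian tasks instead of the CNN and torchmeta data, `forward` and `make_task` are hypothetical helpers, the inner step size of 0.1 is an arbitrary choice, and only 3 outer steps are run to keep it short. In torchmeta, `gradient_update_parameters` with `first_order=True` plays the role of the manual inner update shown here.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

def forward(params, x):
    # Functional linear classifier: logits = x W^T + b.
    return x @ params["weight"].t() + params["bias"]

def make_task(n_way=5, k_shot=5, n_query=15, dim=20):
    # Synthetic task (assumption): each class is a Gaussian blob around a random centre.
    centres = 3.0 * torch.randn(n_way, dim)
    def draw(n):
        y = torch.arange(n_way).repeat_interleave(n)
        return centres[y] + torch.randn(len(y), dim), y
    return draw(k_shot), draw(n_query)

meta_params = {"weight": torch.zeros(5, 20, requires_grad=True),
               "bias": torch.zeros(5, requires_grad=True)}
meta_opt = torch.optim.Adam(meta_params.values(), lr=1e-3)
inner_lr, num_tasks = 0.1, 16
history = []  # meta-loss per outer step, e.g. for plotting

for outer_step in range(3):
    meta_opt.zero_grad()
    meta_loss = 0.0
    for _ in range(num_tasks):
        (sx, sy), (qx, qy) = make_task()
        # Inner loop (one step): gradient of the support loss w.r.t. the meta-parameters.
        support_loss = F.cross_entropy(forward(meta_params, sx), sy)
        grads = torch.autograd.grad(support_loss, list(meta_params.values()))
        # First-order approximation: grads carry no graph (create_graph=False),
        # so the adapted parameters depend on the meta-parameters only linearly.
        fast = {k: p - inner_lr * g
                for (k, p), g in zip(meta_params.items(), grads)}
        # Outer loop: accumulate the query-loss gradient over all tasks.
        query_loss = F.cross_entropy(forward(fast, qx), qy)
        (query_loss / num_tasks).backward()
        meta_loss += query_loss.item() / num_tasks
    meta_opt.step()
    history.append(meta_loss)

print(history)
```

Because the inner gradients are treated as constants, the backward pass through `fast` yields the query-loss gradient evaluated at the adapted parameters, which is the FOMAML meta-gradient. Full MAML would instead pass `create_graph=True` to `torch.autograd.grad`.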

In [None]:
import os
import torch
import torch.nn.functional as F
from tqdm import tqdm
import numpy as np

from torchmeta.datasets.helpers import cifar_fs, omniglot
from torchmeta.utils.data import BatchMetaDataLoader
from torchmeta.utils.gradient_based import gradient_update_parameters
from copy import deepcopy


torch.manual_seed(0)
np.random.seed(0)


def train_maml(model):
    pass


### B) Meta-testing

The testing protocol is to generate a random set of tasks from the test set of the dataset. We then iterate over the tasks: for each one, fine-tune the meta-trained model on the support set and evaluate it on the query set. Finally, report the results averaged over all tasks. Use the same 5-way, 5-shot problem and again set the query set size to 15. Set the number of tasks to 16 and repeat for 100 iterations. You can also use the *gradient_update_parameters* function to update the model parameters during meta-testing.

**Note**: Don't forget to re-initialize the model with the meta-trained parameters before each task, so that updates from previous tasks do not carry over.

**Output**: The average test accuracy on Omniglot should be above 50%.
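
The protocol above, including the per-task reset, might look like the following in plain PyTorch. It is a sketch under assumptions: a linear classifier and synthetic tasks stand in for the CNN and the Omniglot test split, `forward`, `make_task` and `evaluate` are hypothetical helpers, and the zero-initialized `meta_params` stand in for meta-trained weights.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

def forward(params, x):
    # Functional linear classifier: logits = x W^T + b.
    return x @ params["weight"].t() + params["bias"]

def make_task(n_way=5, k_shot=5, n_query=15, dim=20):
    # Synthetic task (assumption): each class is a Gaussian blob around a random centre.
    centres = 3.0 * torch.randn(n_way, dim)
    def draw(n):
        y = torch.arange(n_way).repeat_interleave(n)
        return centres[y] + torch.randn(len(y), dim), y
    return draw(k_shot), draw(n_query)

def evaluate(meta_params, num_tasks=16, inner_lr=0.1, inner_steps=1):
    accuracies = []
    for _ in range(num_tasks):
        (sx, sy), (qx, qy) = make_task()
        # Start every task from a fresh copy of the meta-trained parameters,
        # so nothing learned on a previous task leaks into this one.
        params = {k: v.detach().clone().requires_grad_(True)
                  for k, v in meta_params.items()}
        # Fine-tune on the support set.
        for _ in range(inner_steps):
            loss = F.cross_entropy(forward(params, sx), sy)
            grads = torch.autograd.grad(loss, list(params.values()))
            params = {k: (p - inner_lr * g).detach().requires_grad_(True)
                      for (k, p), g in zip(params.items(), grads)}
        # Evaluate on the query set.
        with torch.no_grad():
            pred = forward(params, qx).argmax(dim=1)
            accuracies.append((pred == qy).float().mean().item())
    return sum(accuracies) / len(accuracies)

# Placeholder for meta-trained parameters (assumption: zeros, for illustration only).
meta_params = {"weight": torch.zeros(5, 20), "bias": torch.zeros(5)}
snapshot = {k: v.clone() for k, v in meta_params.items()}

accuracy = evaluate(meta_params)
print(f"average query accuracy over 16 tasks: {accuracy:.3f}")
```

The meta-trained parameters are never modified in place: each task adapts its own clone, which is what the note above asks for.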

In [None]:
def test_maml(model):
    pass

In [None]:
if __name__ == '__main__':
    model = ConvolutionalNeuralNetwork(1, num_classes=5)

    train_maml(model)
    test_maml(model)

## 3) Reptile

Reptile uses a different update rule. You can still rely on the inner and outer loops implemented in the previous task; however, the meta-parameters are now updated by accumulating the differences between the meta-model parameters and the parameters learned on each individual task, as covered in the lecture. Only the update step needs to be modified. Note that, unlike MAML, Reptile does not require a query set to update the model parameters.

Update rule: $$ \theta \leftarrow \theta + \frac{\beta}{n} \sum_{i=1}^{n}{(\theta_{i}^\prime - \theta)}$$

Here $\theta$ denotes the meta-model parameters, $\theta_{i}^\prime$ the parameters learned on task $i$, $n$ the number of tasks, and $\beta$ the meta step size.

Use the same hyperparameters and optimizers as for FOMAML, and set $\beta$ to 1e-1.

**Output**: Use the same meta-testing process and report the test accuracy on the Omniglot test set.
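
As a small worked example of the update rule above, the sketch below applies one Reptile step to a two-dimensional parameter vector. `reptile_update` is a hypothetical helper, and the task-adapted parameters are made-up numbers rather than the result of actual inner-loop training.

```python
import torch

def reptile_update(meta_params, task_params_list, beta):
    """In-place Reptile step: theta <- theta + beta/n * sum_i (theta_i' - theta)."""
    with torch.no_grad():
        for name, theta in meta_params.items():
            # Mean over tasks of (theta_i' - theta) for this parameter tensor.
            mean_diff = torch.stack(
                [task_params[name] - theta for task_params in task_params_list]
            ).mean(dim=0)
            theta.add_(beta * mean_diff)

# Meta-parameters theta and two hypothetical task-adapted parameter sets theta_i'.
meta_params = {"w": torch.tensor([0.0, 0.0])}
task_params_list = [{"w": torch.tensor([1.0, 1.0])},
                    {"w": torch.tensor([3.0, -1.0])}]

reptile_update(meta_params, task_params_list, beta=0.1)
print(meta_params["w"])  # mean difference is [2, 0], so theta moves to about [0.2, 0.0]
```

The step interpolates the meta-parameters towards the average of the task-adapted parameters, and it needs no query-set loss.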

In [None]:
def train_reptile(model):
    pass


After training with Reptile, perform the meta-testing process using the Omniglot test set. As before, set the number of tasks to 16 and repeat for 100 iterations: fine-tune the meta-trained model on the support set and evaluate it on the query set. For fine-tuning, use an Adam optimizer with a learning rate of 1e-3. You should observe an average test accuracy above 50%.

In [None]:
def test_reptile(model):
    pass

In [None]:
if __name__ == '__main__':
    model = ConvolutionalNeuralNetwork(1, num_classes=5)

    train_reptile(model)

    test_reptile(model)


Repeat the same experiments with a 5-way, 1-shot classification task at test time. How do the results differ?