<a href="https://colab.research.google.com/github/varun-bhaseen/Advance-Deep-Learning/blob/master/Assignment_1_Part_A_Multi_Instance_Learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Assignment-1 Part-A (Torch based MNIST MIL)

[GitHub source code link](https://github.com/lsheng23/Practicum/blob/master/MIL_MNIST/end_to_end_mnist_MIL.ipynb)

In [None]:
# Importing Libraries

# Standard library
import copy
import datetime
import inspect
import os
import random
import re
import time
import uuid
import warnings
from functools import partial, reduce
from random import shuffle

# Third-party
import numpy as np
import pandas as pd
import yaml
import tensorflow as tf
from sklearn import metrics as mtx
from sklearn import model_selection as ms
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score
from tqdm.autonotebook import tqdm

# PyTorch and torchvision
import torch
import torch.optim as optim
from torch import nn
from torch.nn import functional as F
from torch.utils.data import DataLoader
from torch.utils.data.dataset import Dataset
from torchvision.datasets import MNIST
from torchvision.models import resnet
from torchvision.models.resnet import ResNet, BasicBlock
from torchvision.transforms import Compose, ToTensor, Normalize, Resize

In [None]:
# Pretrain

def get_data_loaders(train_batch_size, val_batch_size):
    # Load the raw training images once to compute the dataset mean and standard deviation.
    # `.data` replaces the deprecated `.train_data` attribute.
    mnist = MNIST(download=True, train=True, root=".").data.float()

    # Transforms are common image transformations that can be chained together with Compose.
    # Signature: torchvision.transforms.Compose([transforms])
    # transforms (list of Transform objects) – list of transforms to compose.
    # Here Resize, ToTensor and Normalize are chained into a single data_transform.
    # ToTensor scales pixels to [0, 1], so the statistics are divided by 255 to match.

    data_transform = Compose([Resize((224, 224)), ToTensor(),
                              Normalize((mnist.mean().item() / 255,), (mnist.std().item() / 255,))])
    
    # Resize: Resize the input PIL Image to the given size.
    # torchvision.transforms.functional.resize(img, size, interpolation=2)

    """
    Parameters
    img (PIL Image) – Image to be resized.

    size (sequence or int) – Desired output size. If size is a sequence like (h, w), 
    the output size will be matched to this. If size is an int, the smaller edge of 
    the image will be matched to this number maintaining the aspect ratio. i.e, if height > width, 
    then image will be rescaled to (size X Height/width, size)

    interpolation (int, optional) – Desired interpolation. Default is PIL.Image.BILINEAR
    """


    # ToTensor Converts a PIL Image or numpy.ndarray to tensor
    
    """
    Converts a PIL Image or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor 
    of shape (C x H x W) in the range [0.0, 1.0] if the PIL Image belongs to one of the modes 
    (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1) or if the numpy.ndarray has dtype = np.uint8

    In the other cases, tensors are returned without scaling.

    H is height
    W is width
    C is Channel
    """

    # Normalize normalizes a tensor image using the mean and standard deviation
    # Signature is as: torchvision.transforms.functional.normalize(tensor, mean, std, inplace=False)

    """
    Parameters for Normalize: 
    tensor (Tensor) – Tensor image of size (C, H, W) to be normalized.
    mean (sequence) – Sequence of means for each channel.
    std (sequence) – Sequence of standard deviations for each channel.
    inplace (bool,optional) – Bool to make this operation inplace.
    """

    # DataLoader: At the heart of PyTorch data loading utility is the torch.utils.data.DataLoader class. 
    # It represents a Python iterable over a dataset
    
    """
    The signature for DataLoader is:

    DataLoader(dataset, batch_size=1, shuffle=False, sampler=None,
           batch_sampler=None, num_workers=0, collate_fn=None,
           pin_memory=False, drop_last=False, timeout=0,
           worker_init_fn=None)

    where, dataset can be: Map-style datasets or Iterable-style datasets
    dataset (Dataset) – dataset from which to load the data.
    
    batch_size (int, optional) – how many samples per batch to load (default: 1).
    
    shuffle (bool, optional) – set to True to have the data reshuffled at every epoch (default: False).
    
    sampler (Sampler or Iterable, optional) – defines the strategy to draw samples from the dataset. 
    Can be any Iterable with __len__ implemented. If specified, shuffle must not be specified.

    batch_sampler (Sampler or Iterable, optional) – like sampler, but returns a batch of indices at a time. 
    Mutually exclusive with batch_size, shuffle, sampler, and drop_last.

    num_workers (int, optional) – how many subprocesses to use for data loading. 0 means that the data will be 
    loaded in the main process. (default: 0)

    collate_fn (callable, optional) – merges a list of samples to form a mini-batch of Tensor(s). Used when 
    using batched loading from a map-style dataset.

    pin_memory (bool, optional) – If True, the data loader will copy Tensors into CUDA pinned memory before returning 
    them.

    drop_last (bool, optional) – set to True to drop the last incomplete batch, if the dataset size is not divisible by the batch 
    size. If False and the size of dataset is not divisible by the batch size, then the last batch will be smaller. (default: False)

    timeout (numeric, optional) – if positive, the timeout value for collecting a batch from workers. Should always be 
    non-negative. (default: 0)

    worker_init_fn (callable, optional) – If not None, this will be called on each worker subprocess with the worker id 
    (an int in [0, num_workers - 1]) as input, after seeding and before data loading. (default: None)
    """
    
    train_loader = DataLoader(MNIST(download=True, root=".", transform=data_transform, train=True),
                              batch_size=train_batch_size, shuffle=True)
    
    val_loader = DataLoader(MNIST(download=False, root=".", transform=data_transform, train=False),
                            batch_size=val_batch_size, shuffle=False)
    return train_loader, val_loader
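
To make the ToTensor and Normalize steps concrete, the sketch below reproduces their arithmetic with NumPy on a tiny made-up image. The array `pixels` and the variable names are illustrative assumptions, not part of the notebook; the point is only that dividing the raw statistics by 255 puts them on the same scale as the ToTensor output.

```python
import numpy as np

# Hypothetical 2x2 grayscale "image" with uint8 pixel values (not real MNIST data).
pixels = np.array([[0, 51], [102, 255]], dtype=np.uint8)

# ToTensor-style scaling: map [0, 255] to [0.0, 1.0].
scaled = pixels.astype(np.float32) / 255.0

# Statistics computed on the raw 0-255 scale, then divided by 255 so that
# they match the scaled data (the same idea get_data_loaders relies on).
mean = pixels.astype(np.float32).mean() / 255.0
std = pixels.astype(np.float32).std() / 255.0

# Normalize-style step: subtract the mean, divide by the standard deviation.
normalized = (scaled - mean) / std

print(normalized.mean(), normalized.std())
```

After this step the values have approximately zero mean and unit standard deviation.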

In [None]:
train_batch_size = 256
val_batch_size = 256

# Execute the data loader function created earlier.
# It downloads the data and returns loaders that yield resized, normalized batches for training and validation.

train_loader, val_loader = get_data_loaders(train_batch_size, val_batch_size)

In [None]:
# Define Model


class MnistResNet(ResNet):
    """
    A ResNet-18 (BasicBlock, [2, 2, 2, 2]) adapted to MNIST: single-channel input and 10 output classes.
    For background on scripting in PyTorch (TorchScript), see:
    https://pytorch.org/tutorials/beginner/Intro_to_TorchScript_tutorial.html
    """

    def __init__(self):
        # super().__init__ builds the standard ResNet layers, so this class inherits all of them.
        super(MnistResNet, self).__init__(BasicBlock, [2, 2, 2, 2], num_classes=10)
        # Replace the first convolution so that it accepts 1-channel (grayscale) images.
        self.conv1 = torch.nn.Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

    def forward(self, x):
        # Returns class probabilities. Note: nn.CrossEntropyLoss applies log-softmax internally and is
        # normally given raw logits; the softmax is kept here as in the original source.
        return torch.softmax(super(MnistResNet, self).forward(x), dim=-1)

mnist_resnet = MnistResNet()
print(mnist_resnet)
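
The effect of the replaced `conv1` layer on the image size can be checked by hand. The helper below is a hypothetical illustration (it is not part of the notebook) of the standard convolution output-size formula, floor((size + 2·padding − kernel) / stride) + 1, using the kernel, stride and padding values from the cell above.

```python
def conv_output_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution with dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1

# Values taken from conv1 above, applied to a 224x224 input.
after_conv1 = conv_output_size(224, kernel=7, stride=2, padding=3)
print(after_conv1)  # 112
```

So the first layer halves the 224x224 input to 112x112 before the rest of the network runs.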

In [None]:
# Helper function

def calculate_metric(metric_fn, true_y, pred_y):
    """
    Call a scikit-learn metric function, passing average="macro" when the metric supports it.

    Multi-class metrics such as precision, recall and F1 need an averaging method, while
    accuracy_score has no `average` parameter. inspect.signature(func).parameters maps each
    parameter name of `func` to its description, so we can check whether `average` is accepted.
    (inspect.getfullargspec(func).args lists only positional parameters and would miss a
    keyword-only `average`.)
    """
    if "average" in inspect.signature(metric_fn).parameters:
        return metric_fn(true_y, pred_y, average="macro")
    else:
        return metric_fn(true_y, pred_y)

def print_scores(p, r, f1, a, batch_size):
    """
    Print the mean of each per-batch metric list.

    str.rjust(width[, fillchar]) returns the string right-justified in a field of `width`
    characters, padded on the left with `fillchar` (default: a space). The original string
    is returned if `width` is less than or equal to len(s).
    """
    for name, scores in zip(("precision", "recall", "F1", "accuracy"), (p, r, f1, a)):
        # Each name is right-aligned in a 14-character field; :.4f prints 4 decimal places.
        print(f"\t{name.rjust(14, ' ')}: {sum(scores)/batch_size:.4f}")
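
The parameter check used by `calculate_metric` can be tried in isolation. The sketch below uses two hypothetical toy metrics (`toy_accuracy` and `toy_recall` are made up for illustration and are not scikit-learn functions) to show how `inspect.signature` and `inspect.getfullargspec(...).args` differ when `average` is keyword-only.

```python
import inspect

def toy_accuracy(y_true, y_pred):
    """Hypothetical metric without an `average` parameter."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def toy_recall(y_true, y_pred, *, average="binary"):
    """Hypothetical metric with a keyword-only `average` parameter."""
    return average

def accepts_average(fn) -> bool:
    # inspect.signature sees positional and keyword-only parameters alike.
    return "average" in inspect.signature(fn).parameters

print(accepts_average(toy_accuracy))  # False
print(accepts_average(toy_recall))    # True
# Keyword-only names are stored in .kwonlyargs, not in .args:
print("average" in inspect.getfullargspec(toy_recall).args)  # False
```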

In [None]:
# Training
# Using time.time module to record the start time of the code block below

start_ts = time.time()

# torch.device: A torch.device is an object representing the device on which a torch.Tensor is or will be allocated.

"""
The torch.device contains a device type ('cpu' or 'cuda') and optional device ordinal for the device type. If the 
device ordinal is not present, this object will always represent the current device for the device type, even after 
torch.cuda.set_device() is called; e.g., a torch.Tensor constructed with device 'cuda' is equivalent to 'cuda:X' 
where X is the result of torch.cuda.current_device().

A torch.device can be constructed via a string or via a string and device ordinal.

via a string: torch.device('cuda:0');
torch.device('cpu') or
torch.device('cuda')  # current cuda device

via a string and device ordinal:
torch.device('cuda', 0);
 o/p: device(type='cuda', index=0)

torch.device('cpu', 0)
"""

# torch.cuda: This package adds support for CUDA tensor types, that implement the same function as CPU tensors,
# but they utilize GPUs for computation.
"""
It is lazily initialized, we are using is_available() to determine if system supports CUDA else compute will happen at CPU
"""
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")


# model:

"""
The torch.nn.Module class also has "to" and "cuda" methods, which put the entire 
network on a particular device. Unlike with Tensors, calling "to" on an nn.Module 
moves the module in place, so there is no need to assign the returned value.

https://medium.com/ai%C2%B3-theory-practice-business/use-gpu-in-your-pytorch-code-676a67faed09


"""

# Setting model with the neural network Mnistresnet to compute on device (either CPU or Cuda) 

model = MnistResNet().to(device)

# params you need to specify:
epochs = 5

# train_loader, val_loader = get_data_loaders(train_batch_size, val_batch_size)

"""
Setting the loss function:

https://pytorch.org/docs/stable/nn.html#loss-functions

Search the page above for the loss functions relevant to the problem domain.

Below is a quick handout:

* Regression Loss Functions
    * Mean Squared Error Loss
    * Mean Squared Logarithmic Error Loss
    * Mean Absolute Error Loss
* Binary Classification Loss Functions
    * Binary Cross-Entropy
    * Hinge Loss
    * Squared Hinge Loss
* Multi-Class Classification Loss Functions
    * Multi-Class Cross-Entropy Loss
    * Sparse Multiclass Cross-Entropy Loss
    * Kullback Leibler Divergence Loss
"""

loss_function = nn.CrossEntropyLoss()         # loss function, cross entropy works well for multi-class problems

# Created loss function using nn module
"""
What is an optimizer?
During training, the aim is to minimize the loss function by updating the model's parameters (weights)
so that predictions become as accurate as possible. The open questions are which parameters to change
and by how much.

An optimizer answers them: it is the algorithm that updates the model parameters in response to the
gradients of the loss function, with a step size governed by the learning rate. Repeating these updates
moves the weights toward a minimum of the loss (not necessarily the global one).

"""
# optimizer
optimizer = optim.Adadelta(model.parameters()) # created optimizer using optim module


losses = []

# The batch size is 256 taken from second cell above

batches = len(train_loader)
val_batches = len(val_loader)

# loop for every epoch (training + evaluation)

"""
Epoch: an epoch is a single pass through the full training set. One pass is rarely enough;
depending on the problem, it can take many epochs (sometimes thousands) for training with
backpropagation to converge on weights that give an acceptable level of accuracy.

We train for 5 epochs here.
"""

for epoch in range(epochs):
    total_loss = 0

    # Progress bar: tqdm wraps the iterable and shows a progress bar whose description starts as "Loss: ".
    # `total` is the expected number of iterations; if it is None, only basic progress statistics are shown (no ETA).
    # Here the total is `batches`, i.e. len(train_loader) as assigned above.
    """
    The iterable is wrapped in enumerate because we also need a count of the iterations.
    enumerate() takes an iterable (e.g. a tuple) and returns an enumerate object that yields
    (counter, value) pairs.

    enumerate(iterable, start=0)

    Parameters:
    iterable: any object that supports iteration
    start: the value from which the counter starts (default 0)

    ex: x = ('apple', 'banana', 'cherry')
    y = enumerate(x)
    print(list(y))

    o/p : [(0, 'apple'), (1, 'banana'), (2, 'cherry')]

    ex2: # enumerate function in loops
    l1 = ["eat", "sleep", "repeat"]
    # unpacking the counter and the element separately
    for count, ele in enumerate(l1):
        print(count, ele)

    o/p :
    0 eat
    1 sleep
    2 repeat

    In short, we get a running count (0, 1, 2, ...) alongside each value of the iterable (eat, sleep, repeat, ...).
    """
    progress = tqdm(enumerate(train_loader), desc="Loss: ", total=batches)

    # ----------------- TRAINING  -------------------- 
    # set model to training

    """
    model.train() switches the model to training mode, which affects layers such as batch
    normalization and dropout. MnistResNet does not define train() itself; the method is
    inherited through ResNet from torch.nn.Module.
    """

    model.train()
    
    # passing index value i and corresponding data value from enumerate (progress) to i,data respectively
    for i, data in progress:
        X, y = data[0].to(device), data[1].to(device)
        
        # training step for single batch
        model.zero_grad() # to make sure that all the grads are 0 
        
        
        """
        model.zero_grad() and optimizer.zero_grad() are equivalent
        if all of the model's parameters are in that optimizer.

        In PyTorch, gradients must be set to zero before backpropagation because PyTorch
        accumulates gradients on subsequent backward passes. Zeroing them at the start of each
        training step ensures the parameter update uses only the current batch; otherwise the
        gradient would be a mix of old and new values and would not point in the intended
        direction towards the minimum.

        Calling model.zero_grad() is the safer choice when, for example, two or more
        optimizers are used for one model.

        """
        
        outputs = model(X)                     # forward
        loss = loss_function(outputs, y)       # get loss
        
        loss.backward()                        # accumulates the gradient (by addition) for each parameter.
        
        optimizer.step()                       # performs a parameter update based on the current gradient 

        # getting training quality data
        current_loss = loss.item()
        total_loss += current_loss

        # updating progress bar
        progress.set_description("Loss: {:.4f}".format(total_loss/(i+1)))
        
    # releasing unnecessary memory on the GPU
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    
    # ----------------- VALIDATION  ----------------- 
    val_losses = 0
    precision, recall, f1, accuracy = [], [], [], []
    
    # set model to evaluating (testing)
    model.eval()
    with torch.no_grad():
        for i, data in enumerate(val_loader):
            X, y = data[0].to(device), data[1].to(device)

            outputs = model(X)                                    # this gets the prediction from the network

            val_losses += loss_function(outputs, y).item()       # .item() accumulates a Python float, not a tensor

            predicted_classes = torch.max(outputs, 1)[1]          # get class from network's prediction
            
            # calculate P/R/F1/A metrics for batch
            for acc, metric in zip((precision, recall, f1, accuracy), 
                                   (precision_score, recall_score, f1_score, accuracy_score)):
                acc.append(
                    calculate_metric(metric, y.cpu(), predicted_classes.cpu())
                )
          
    print(f"Epoch {epoch+1}/{epochs}, training loss: {total_loss/batches}, validation loss: {val_losses/val_batches}")
    print_scores(precision, recall, f1, accuracy, val_batches)
    losses.append(total_loss/batches) # for plotting learning curve
print(f"Training time: {time.time()-start_ts}s")
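
One detail of the training cell is worth checking numerically: `nn.CrossEntropyLoss` combines log-softmax with the negative log-likelihood, so it is normally fed raw scores, whereas `MnistResNet.forward` already returns softmax probabilities. The NumPy sketch below, using a hypothetical three-class score vector, re-implements the single-sample formula to show what happens when softmax is effectively applied twice. It is an illustration under these assumptions, not a measurement of the notebook's model.

```python
import numpy as np

def log_softmax(z):
    z = z - z.max()                      # shift for numerical stability
    return z - np.log(np.exp(z).sum())

def cross_entropy(scores, target):
    """Single-sample cross-entropy: -log_softmax(scores)[target]."""
    return -log_softmax(scores)[target]

logits = np.array([8.0, 0.0, 0.0])       # hypothetical: confident and correct for class 0
probs = np.exp(log_softmax(logits))      # softmax(logits)

loss_from_logits = cross_entropy(logits, 0)
loss_from_probs = cross_entropy(probs, 0)    # softmax applied twice
print(loss_from_logits, loss_from_probs)
```

Because probabilities lie in [0, 1], the second loss cannot approach zero even for a perfect prediction, which can flatten the reported loss values.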

In [None]:
# Save Model

torch.save(model.state_dict(), 'mnist_state.pt')

In [None]:
# Data Generation

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

In [None]:
x_train.shape

In [None]:
x_train = x_train[:30001]
y_train = y_train[:30001]
x_test = x_test[:9000]
y_test = y_test[:9000]

In [None]:
# Making sure that the values are float so that we can get decimal points after division
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

# Normalizing the grayscale pixel values by dividing by the maximum value (255).
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print('Number of images in x_train', x_train.shape[0])
print('Number of images in x_test', x_test.shape[0])

In [None]:
# Create Tuple (index, label) for train and test

instance_index_label = [(i, y_train[i]) for i in range(x_train.shape[0])]
instance_index_label_test = [(i, y_test[i]) for i in range(x_test.shape[0])]

# indices of the training instances whose label is 1
find_index = [idx for idx, lbl in instance_index_label if lbl == 1]
# indices of the test instances whose label is 1
find_index_test = [idx for idx, lbl in instance_index_label_test if lbl == 1]
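
The same index lookup can also be written directly on a NumPy label array. This is a minimal sketch with a made-up `labels` array standing in for `y_train`; `np.flatnonzero` returns the positions where the condition holds.

```python
import numpy as np

# Hypothetical label array standing in for y_train.
labels = np.array([5, 1, 0, 1, 9, 1])

# Indices of all instances whose label is 1.
ones_index = np.flatnonzero(labels == 1)
print(ones_index.tolist())  # [1, 3, 5]
```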

In [None]:
print('index:', instance_index_label[0][0])         #index
print('label:', instance_index_label[0][1])         #label

In [None]:
# Load Pretrained Model

import torch
from torchvision.models.resnet import ResNet, BasicBlock
class MnistResNet(ResNet):
    def __init__(self):
        super(MnistResNet, self).__init__(BasicBlock, [2, 2, 2, 2], num_classes=10)
        self.conv1 = torch.nn.Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
        
    def forward(self, x):
        return torch.softmax(super(MnistResNet, self).forward(x), dim=-1)

In [None]:
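model = MnistResNet()
# map_location lets a checkpoint saved on a GPU be loaded on a CPU-only machine
model.load_state_dict(torch.load('mnist_state.pt', map_location='cpu'))
body = nn.Sequential(*list(model.children()))
# Drop the final fully connected layer: keep the first 9 children (conv1 through avgpool),
# so the model outputs a 512-dimensional feature vector per image
model = body[:9]
# the model we will use as a feature extractor
model.eval()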
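
The children-and-slice pattern above is easier to see on a small network. The sketch below assumes a hypothetical three-layer `toy` model in place of the ResNet; slicing the `nn.Sequential` removes the last child, so the output is the hidden feature vector instead of class scores.

```python
import torch
from torch import nn

# Hypothetical stand-in for the ResNet: two feature layers followed by a classifier head.
toy = nn.Sequential(
    nn.Linear(4, 8),   # child 0
    nn.ReLU(),         # child 1
    nn.Linear(8, 2),   # child 2: the "classification head"
)

# Same pattern as above: list the children, then slice off the last one.
body = nn.Sequential(*list(toy.children()))
feature_extractor = body[:2]
feature_extractor.eval()

with torch.no_grad():
    features = feature_extractor(torch.zeros(1, 4))
print(features.shape)  # torch.Size([1, 8])
```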

In [None]:
# Get Features

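train_batch_size = 1
val_batch_size = 1
train_loader, val_loader = get_data_loaders(train_batch_size, val_batch_size)
# get_data_loaders shuffles the training set; rebuild the loader without shuffling so that
# feature row i corresponds to image i (and to y_train[i], assuming torchvision and Keras
# store MNIST in the same order).
train_loader = DataLoader(train_loader.dataset, batch_size=train_batch_size, shuffle=False)

# No training happens in the feature-extraction cells, so no loss function or optimizer is needed.
batches = len(train_loader)
val_batches = len(val_loader)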

In [None]:
# Get features for train

meta_table = dict()
feature_result = []

# progress bar
progress = tqdm(enumerate(train_loader), desc="Train features: ", total=batches)

model.eval()

# Inference only: no_grad avoids building the autograd graph
with torch.no_grad():
    for i, data in progress:
        if i == 30001:
            break
        X, y = data[0], data[1]
        outputs = model(X)
        # flatten the (1, 512, 1, 1) output to a 512-element feature vector
        features = outputs.reshape(-1).tolist()
        feature_result.append(features)
        meta_table[i] = features

feature_array = np.array(feature_result)
np.save('feature_array_full', feature_array)

In [None]:
# load
# np.save appends ".npy", so the file written above is 'feature_array_full.npy'
feature_array = np.load('feature_array_full.npy', allow_pickle=True)


In [None]:
# Get Features for test

meta_t_table = dict()
feature_t_result = []

# progress bar (one iteration per validation image)
progress = tqdm(enumerate(val_loader), desc="Test features: ", total=val_batches)

model.eval()

# Inference only: no_grad avoids building the autograd graph
with torch.no_grad():
    for i, data in progress:
        if i == 9000:
            break
        X, y = data[0], data[1]
        outputs_t = model(X)
        features_t = outputs_t.reshape(-1).tolist()
        feature_t_result.append(features_t)
        meta_t_table[i] = features_t

feature_test_array = np.array(feature_t_result)
# save 
np.save('feature_test_array_full', feature_test_array)

In [None]:
#load
# np.save appends ".npy", so the file written above is 'feature_test_array_full.npy'
feature_test_array = np.load('feature_test_array_full.npy', allow_pickle=True)

In [None]:
# Generate Data

# generate data for train

from typing import List, Dict, Tuple
def data_generation(instance_index_label: List[Tuple]) -> Tuple[Dict, Dict]:
    """
    Randomly group instances into bags of 3 to 6 instances and label each bag.

    bags: {key1: [ind1, ind2, ind3],
           key2: [ind1, ind2, ind3, ind4, ind5],
           ... }
    bag_lbls:
        {key1: 0,
         key2: 1,
         ... }
    """
    # np.random.randint excludes the upper bound, so sizes are drawn from {3, 4, 5, 6}
    bag_size = np.random.randint(3,7,size=len(instance_index_label)//5)
    data_cp = copy.copy(instance_index_label)
    np.random.shuffle(data_cp)
    bags = {}
    bags_labels = {}
    for bag_ind, size in enumerate(bag_size):
        if size > len(data_cp):
            # not enough instances left to fill this bag, so stop without creating it
            break
        instance_labels = []
        bags[bag_ind] = []
        for _ in range(size):
            inst_ind, lbl = data_cp.pop()
            bags[bag_ind].append(inst_ind)
            instance_labels.append(lbl)
        bags_labels[bag_ind] = bag_label_from_instance_labels(instance_labels)
    return bags, bags_labels

def bag_label_from_instance_labels(instance_labels):
    # A bag is positive (1) if at least one of its instances is the digit 1, else negative (0)
    return int(any(((x==1) for x in instance_labels)))
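
The bag-labelling rule can be tried on a handful of instances without the feature arrays. The sketch below uses made-up `(index, label)` pairs and a hypothetical `label_bag` helper; it is a simplified illustration of the rule above with fixed bag sizes, not the notebook's generator.

```python
import random

def label_bag(instance_labels, positive_digit=1):
    """A bag is positive (1) if any instance carries the positive digit, else negative (0)."""
    return int(any(lbl == positive_digit for lbl in instance_labels))

# Hypothetical (index, label) pairs standing in for instance_index_label.
instances = [(0, 5), (1, 1), (2, 7), (3, 3), (4, 9), (5, 2)]
rng = random.Random(0)          # seeded so the grouping is reproducible
rng.shuffle(instances)

# Split into two bags of three instances each.
bags = [instances[:3], instances[3:]]
bag_labels = [label_bag([lbl for _, lbl in bag]) for bag in bags]
print(bag_labels)
```

Only one instance has the label 1, so exactly one of the two bags comes out positive, whichever bag the shuffle places it in.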

In [None]:
bag_indices, bag_labels = data_generation(instance_index_label)
bag_features = {kk: torch.Tensor(feature_array[inds]) for kk, inds in bag_indices.items()}

In [None]:
# save
import pickle
pickle.dump(bag_indices, open( "bag_indices", "wb" ) )
pickle.dump(bag_labels, open( "bag_labels", "wb" ) )
pickle.dump(bag_features, open( "bag_features", "wb" ) )

In [None]:
import pickle
bag_indices = pickle.load( open( "bag_indices", "rb" ) )
bag_labels = pickle.load( open( "bag_labels", "rb" ) )
bag_features = pickle.load( open( "bag_features", "rb" ) )

In [None]:
# generate data for test


bag_t_indices, bag_t_labels = data_generation(instance_index_label_test)

bag_t_features = {kk: torch.Tensor(feature_test_array[inds]) for kk, inds in bag_t_indices.items()}

In [None]:
pickle.dump(bag_t_indices, open( "bag_t_indices", "wb" ) )
pickle.dump(bag_t_labels, open( "bag_t_labels", "wb" ) )
pickle.dump(bag_t_features, open( "bag_t_features", "wb" ) )

In [None]:
bag_t_indices = pickle.load( open( "bag_t_indices", "rb" ) )
bag_t_labels = pickle.load( open( "bag_t_labels", "rb" ) )
bag_t_features = pickle.load( open( "bag_t_features", "rb" ) )

In [None]:
# MIL

# Prepare data for model

from torch.utils.data import Dataset
class Transform_data(Dataset):
    """
    We want to 1. pad tensor 2. transform the data to the size that fits in the input size.
    
    """

    def __init__(self, data, transform=None):
        self.transform = transform
        self.data = data
        
    def __getitem__(self, index):
        tensor = self.data[index][0]
        if self.transform is not None:
            tensor = self.transform(tensor)
        return (tensor, self.data[index][1])

    def __len__(self):
        return len(self.data)

In [None]:
train_data = [(bag_features[i],bag_labels[i]) for i in range(len(bag_features))]

In [None]:
bag_features[0]

In [None]:
def pad_tensor(data: list, max_number_instance) -> list:
    """
    Bags have different numbers of instances, so each bag tensor is zero-padded along the
    instance dimension to a common shape (max_number_instance rows; 7 here).
    For a bag with n instances, max_number_instance - n rows of zeros are appended.
    Returns a new list of (padded_tensor, label) tuples.
    """
    new_data = []
    for bag_tensor, label in data:
        pad_size = max_number_instance - len(bag_tensor)
        # F.pad reads pairs starting from the last dimension:
        # (0, 0) leaves the feature dimension alone, (0, pad_size) appends rows at the end
        p2d = (0, 0, 0, pad_size)
        padded = nn.functional.pad(bag_tensor, p2d, 'constant', 0)
        new_data.append((padded, label))
    return new_data
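
The padding tuple is the least obvious part of `pad_tensor`. The sketch below pads a hypothetical bag of 3 instances with 4 features each (the notebook uses 512 features) up to 7 rows, to show which dimension receives the zeros.

```python
import torch
from torch.nn import functional as F

# Hypothetical bag: 3 instances with 4 features each.
bag = torch.ones(3, 4)
max_instances = 7

# Pad pairs are read from the last dimension backwards:
# (0, 0) -> feature dimension untouched, (0, 4) -> four zero rows appended.
padded = F.pad(bag, (0, 0, 0, max_instances - bag.shape[0]), "constant", 0)
print(padded.shape)  # torch.Size([7, 4])
```

The original three rows are unchanged and the four appended rows are all zeros.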

In [None]:
max_number_instance = 7
padded_train = pad_tensor(train_data, max_number_instance)

In [None]:
test_data = [(bag_t_features[i],bag_t_labels[i]) for i in range(len(bag_t_features))]
padded_test = pad_tensor(test_data, max_number_instance)

In [None]:
def get_data_loaders(train_data, test_data, train_batch_size, val_batch_size):
    train_loader = DataLoader(train_data, batch_size=train_batch_size, shuffle=True)
    val_loader = DataLoader(test_data, batch_size=val_batch_size, shuffle=False)
    return train_loader, val_loader

In [None]:
train_loader, val_loader = get_data_loaders(padded_train, padded_test, 1, 1)

In [None]:
train_batch_size = 1
val_batch_size = 1

In [None]:
# Define Model

class linear(torch.nn.Module):

    def __init__(self, n=7*512, n_out=1, dropout=0.2):
        super(linear, self).__init__()
        self.linear1 = torch.nn.Linear(n, n_out)
        
    def forward(self, x):
        z = self.linear1(x)
        y_pred = torch.sigmoid(z)
        return y_pred

In [None]:
# NN Model

class NN(torch.nn.Module):

    def __init__(self, n=7*512, n_mid = 7168, n_out=1, dropout=0.2):
        super(NN, self).__init__()
        self.linear1 = torch.nn.Linear(n, n_mid)
        self.linear2 = torch.nn.Linear(n_mid, n_out)
        self.dropout = torch.nn.Dropout(dropout)
        self.non_linearity = torch.nn.LeakyReLU()
        
    def forward(self, x):
        z = self.linear1(x)
        z = self.non_linearity(z)
        z = self.dropout(z)
        z = self.linear2(z)
        y_pred = torch.sigmoid(z)
        return y_pred

In [None]:
# MIL_NN Model

class NoisyAnd(torch.nn.Module):
    def __init__(self, a=10, dims=[0]):
        super(NoisyAnd, self).__init__()
        self.a = a
        self.b = torch.nn.Parameter(torch.tensor(0.01))
        self.dims =dims
        self.sigmoid = nn.Sigmoid()
    def forward(self, x):
        mean = torch.mean(x, self.dims, False)
        res = (self.sigmoid(self.a * (mean - self.b)) - self.sigmoid(-self.a * self.b)) / (
              self.sigmoid(self.a * (1 - self.b)) - self.sigmoid(-self.a * self.b))
        return res
    

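class MIL_NN(torch.nn.Module):

    def __init__(self, n=7*512,  n_mid=7168, n_out=1, 
                 n_inst=None, dropout=0.1,
                 noisy_a=4,
                 agg=None,
                ):
        super(MIL_NN, self).__init__()
        # Build the default aggregator here rather than in the signature, so that each
        # model gets its own NoisyAnd instance (default arguments are created only once).
        if agg is None:
            agg = NoisyAnd(a=noisy_a, dims=[0])
        if n_inst is None:
            self.mdl_instance = agg
            n_inst = n
        else:
            self.mdl_instance = nn.Sequential(
                            nn.Linear(n, n_inst),
                            nn.LeakyReLU(),
                            agg,
                            )
        if n_mid == 0:
            # logistic regression on the bag features (the `linear` class defined above)
            self.mdl_bag = linear(n_inst, n_out)
        else:
            self.mdl_bag = NN(n_inst, n_mid, n_out, dropout=dropout)
        
    def forward(self, bag_feature):
        # Note: only the bag-level model is applied; mdl_instance is built but not used here.
        y_pred = self.mdl_bag(bag_feature)
        return y_pred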
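
The expression inside `NoisyAnd.forward` can be checked with plain Python floats. The sketch below re-implements the same formula for a list of instance probabilities; `noisy_and` and `sigmoid` are hypothetical helper names, and `a=10`, `b=0.01` mirror the class defaults (in the module `b` is a learnable parameter, here it is a fixed number).

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def noisy_and(instance_probs, a: float = 10.0, b: float = 0.01) -> float:
    """Same expression as NoisyAnd.forward, for a plain list of instance probabilities."""
    mean = sum(instance_probs) / len(instance_probs)
    low = sigmoid(-a * b)                 # value of the sigmoid at mean = 0
    high = sigmoid(a * (1.0 - b))         # value of the sigmoid at mean = 1
    return (sigmoid(a * (mean - b)) - low) / (high - low)

print(noisy_and([0.0, 0.0, 0.0]))
print(noisy_and([1.0, 1.0, 1.0]))
print(noisy_and([0.0, 0.5, 1.0]))
```

The subtraction and division rescale the sigmoid so that a bag of all-zero probabilities maps to 0 and a bag of all-one probabilities maps to 1, with intermediate means falling in between.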

In [None]:
# Helper Function

def calculate_metric(metric_fn, true_y, pred_y):
    # multi-class metrics need an averaging method; inspect.signature also sees keyword-only parameters
    if "average" in inspect.signature(metric_fn).parameters:
        return metric_fn(true_y, pred_y, average="macro")
    else:
        return metric_fn(true_y, pred_y)
    
def print_scores(p, r, f1, a, batch_size):
    # just an utility printing function
    for name, scores in zip(("precision", "recall", "F1", "accuracy"), (p, r, f1, a)):
        print(f"\t{name.rjust(14, ' ')}: {sum(scores)/batch_size:.4f}") #:.4f means float output upto 4 decimal places

In [None]:
# TRAIN and TEST

import numpy as np
start_ts = time.time()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

lr0 = 1e-4

# model:
model = MIL_NN().to(device)

# params you need to specify:
epochs = 10
train_loader, val_loader = get_data_loaders(padded_train, padded_test, 1, 1)
loss_function = torch.nn.BCELoss(reduction='mean')      # loss function: BCELoss works well for binary class problems


#optimizer
optimizer = optim.SGD(model.parameters(), lr=lr0, momentum=0.9)

losses = []
batches = len(train_loader)
val_batches = len(val_loader)

# loop for every epoch (training + evaluation)
for epoch in range(epochs):
    total_loss = 0

    # progress bar
    progress = tqdm(enumerate(train_loader), desc="Loss: ", total=batches)

    # ----------------- TRAINING  -------------------- 
    # set model to training
    model.train()
    for i, data in progress:
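        X, y = data[0].to(device), data[1].to(device)
        X = X.reshape([1,7*512])
        # BCELoss needs float targets with the same shape as the model output, (1, 1);
        # .float() keeps y on its current device, so this also works without CUDA
        y = y.float().reshape(1, 1)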
        # training step for single batch
        model.zero_grad() # to make sure that all the grads are 0 
        """
        model.zero_grad() and optimizer.zero_grad() are the same 
        IF all your model parameters are in that optimizer. 
        I found it is safer to call model.zero_grad() to make sure all grads are zero, 
        e.g. if you have two or more optimizers for one model.

        """
        outputs = model(X)                             # forward
        loss = loss_function(outputs, y)               # get loss
        loss.backward()                                # accumulates the gradient (by addition) for each parameter.
        optimizer.step()                               # performs a parameter update based on the current gradient 

        # getting training quality data
        current_loss = loss.item()
        total_loss += current_loss

        # updating progress bar
        progress.set_description("Loss: {:.4f}".format(total_loss/(i+1)))
        
    # releasing unnecessary memory on the GPU
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    
    # ----------------- VALIDATION  ----------------- 
    val_losses = 0
    precision, recall, f1, accuracy = [], [], [], []
    
    # set model to evaluating (testing)
    model.eval()
    with torch.no_grad():
        for i, data in enumerate(val_loader):
            X, y = data[0].to(device), data[1].to(device)
            X = X.reshape([1,7*512])
            # float targets shaped like the model output; works on CPU and GPU
            y = y.float().reshape(1, 1)
            outputs = model(X)                         # this gets the prediction from the network
            predicted_classes = outputs.detach().round()   # round the probability to 0 or 1
            val_losses += loss_function(outputs, y).item()
            
            # calculate P/R/F1/A metrics for batch
            for acc, metric in zip((precision, recall, f1, accuracy), 
                                   (precision_score, recall_score, f1_score, accuracy_score)):
                acc.append(
                    calculate_metric(metric, y.cpu().reshape(-1), predicted_classes.cpu().reshape(-1))
                )
          
    print(f"Epoch {epoch+1}/{epochs}, training loss: {total_loss/batches}, validation loss: {val_losses/val_batches}")
    print_scores(precision, recall, f1, accuracy, val_batches)
    losses.append(total_loss/batches)                  # for plotting learning curve
print(f"Training time: {time.time()-start_ts}s")

# Assignment-1 Part-A (TensorFlow based MIL)

## A) Multi Instance Learning:

### A.1.) Definition and Understanding:

Suppose you have an album of photos you have taken, and you want to determine how many of them contain a cat. The question is how to determine this.

One approach is to build a CNN model, train it on many cat images, and then pass your photos through it to identify which ones contain a cat. This approach works well provided each image is not too large and fits in memory.

But what can we do if an image exceeds the available memory? We could add more memory, but this is inefficient, because there will always be data that is much larger than the memory available. So how do we solve this problem?

One option is to split each image into smaller chunks and determine whether any one of them contains a cat. If even one small piece contains a cat (or a cat-like structure), the image is classified as positive (an image with a cat); otherwise it is classified as negative.

This approach of weakly identifying the labels is called Multi Instance Learning: a single instance is divided into multiple instances, and classification is then carried out on them.
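The labelling rule described above (an image is positive if at least one of its pieces is positive) can be sketched in a few lines. This is only an illustration: the function name `bag_label`, the threshold of 0.5, and the per-slice scores are assumptions made for the example, not values from this notebook.

```python
import numpy as np

def bag_label(instance_scores, threshold=0.5):
    """Return 1 if any instance score reaches the threshold, else 0."""
    scores = np.asarray(instance_scores, dtype=float)
    # An empty bag has no positive instance, so it is labelled negative.
    return int(scores.size > 0 and scores.max() >= threshold)

# Hypothetical per-slice "cat" probabilities for two photos in the album.
album = {
    "photo_with_cat": [0.05, 0.10, 0.92, 0.20],
    "photo_without_cat": [0.03, 0.08, 0.12, 0.01],
}
labels = {name: bag_label(scores) for name, scores in album.items()}
print(labels)
```

Taking the maximum over the slices is one common way to turn instance scores into a label for the whole image; other pooling choices (mean, attention) are possible.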

### A.2.) Approach:

I will use the COCO (Common Objects in Context) dataset and identify just one object class in every image. Each image is then broken down into multiple image-slices. In short, each image **I** *(X * Y pixels)* is split into smaller chunks called image-slices, denoted by **mi**, where each image-slice is a subset of the primary image *(xi * yi pixels out of X * Y pixels)*. 


The relationship between image ***I*** and image-slice ***mi*** can be defined as below:



---
I = n * mi, *where n is the total number of image-slices which, when combined, give the original zoomed-out image.*


---

If the target object is found in any of the image-slices, the image **I** is classified as positive. We will use a pre-trained VGG16 to identify the target object and determine whether it is present.
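As a minimal sketch of the slicing step, the function below cuts an image array into non-overlapping image-slices of equal size. The name `slice_image` and the requirement that the image size be an exact multiple of the slice size are assumptions made to keep the example short; real COCO images would first need resizing or padding.

```python
import numpy as np

def slice_image(image, slice_h, slice_w):
    """Split an (H, W, C) array into non-overlapping (slice_h, slice_w, C) tiles."""
    height, width = image.shape[:2]
    if height % slice_h or width % slice_w:
        raise ValueError("image size must be a multiple of the slice size")
    tiles = [
        image[top:top + slice_h, left:left + slice_w]
        for top in range(0, height, slice_h)    # rows of tiles, top to bottom
        for left in range(0, width, slice_w)    # tiles within a row, left to right
    ]
    return np.stack(tiles)  # shape: (n, slice_h, slice_w, C)

# A toy 4 x 6 "image" with 3 channels, cut into 2 x 3 slices.
image = np.arange(4 * 6 * 3).reshape(4, 6, 3)
slices = slice_image(image, 2, 3)
print(slices.shape)  # n = (4 / 2) * (6 / 3) = 4 slices
```

Together the slices contain exactly the pixels of the original image, which is the relation I = n * mi stated above.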



In [None]:
# Connecting Google drive to Colab

# Import PyDrive and associated libraries.
# This only needs to be done once per notebook.
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Authenticate and create the PyDrive client.
# This only needs to be done once per notebook.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)


### A.3.) Dataset:

The dataset we have selected is COCO (Common Objects in Context). COCO is a large-scale object detection, segmentation, and captioning dataset. 

The dataset defines 91 classes, but only 80 of them are used in the data. 

We will use just ***one class in our problem statement*** and check whether that object is present in an image, ignoring the other 90 classes (or 79 data classes).
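Reducing COCO to a single class amounts to turning its annotations into one binary label per image. The sketch below does this on a tiny hand-made dictionary that follows the layout of COCO's instance annotation files (`images`, `annotations`, `categories`). The helper `binary_labels` and the toy data are assumptions for illustration; the real annotation JSON is not downloaded by the cells in this notebook and would have to be fetched separately.

```python
def binary_labels(coco, target_name):
    """Map every image id to 1 if it has an annotation of the target class, else 0."""
    target_ids = {c["id"] for c in coco["categories"] if c["name"] == target_name}
    if not target_ids:
        raise ValueError(f"unknown category: {target_name!r}")
    # Images with at least one annotation of the target class are positive.
    positives = {
        a["image_id"] for a in coco["annotations"] if a["category_id"] in target_ids
    }
    return {img["id"]: int(img["id"] in positives) for img in coco["images"]}

# Toy data in COCO annotation layout (not real COCO content).
toy_coco = {
    "images": [{"id": 1}, {"id": 2}, {"id": 3}],
    "annotations": [
        {"image_id": 1, "category_id": 17},
        {"image_id": 1, "category_id": 18},
        {"image_id": 2, "category_id": 18},
    ],
    "categories": [{"id": 17, "name": "cat"}, {"id": 18, "name": "dog"}],
}
labels = binary_labels(toy_coco, "cat")
print(labels)
```

Images without any annotation (image 3 here) are treated as negative, which matches the rule of ignoring every class except the chosen one.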

In [None]:
!mkdir -p data/coco

In [None]:
# `!cd` runs in a temporary subshell; `%cd` changes the notebook's working directory
%cd data/coco

In [None]:
!wget http://images.cocodataset.org/zips/train2017.zip

--2020-08-31 05:18:40--  http://images.cocodataset.org/zips/train2017.zip
Resolving images.cocodataset.org (images.cocodataset.org)... 52.217.88.84
Connecting to images.cocodataset.org (images.cocodataset.org)|52.217.88.84|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19336861798 (18G) [application/zip]
Saving to: ‘train2017.zip’


2020-08-31 05:37:34 (16.3 MB/s) - ‘train2017.zip’ saved [19336861798/19336861798]



In [None]:
!wget http://images.cocodataset.org/zips/val2017.zip

--2020-08-31 05:37:34--  http://images.cocodataset.org/zips/val2017.zip
Resolving images.cocodataset.org (images.cocodataset.org)... 52.216.93.3
Connecting to images.cocodataset.org (images.cocodataset.org)|52.216.93.3|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 815585330 (778M) [application/zip]
Saving to: ‘val2017.zip’


2020-08-31 05:38:23 (16.1 MB/s) - ‘val2017.zip’ saved [815585330/815585330]



In [None]:
!wget http://images.cocodataset.org/zips/test2017.zip

--2020-08-31 05:38:23--  http://images.cocodataset.org/zips/test2017.zip
Resolving images.cocodataset.org (images.cocodataset.org)... 52.216.242.228
Connecting to images.cocodataset.org (images.cocodataset.org)|52.216.242.228|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6646970404 (6.2G) [application/zip]
Saving to: ‘test2017.zip’


2020-08-31 05:44:47 (16.5 MB/s) - ‘test2017.zip’ saved [6646970404/6646970404]



In [None]:
!unzip "/content/data/coco/train2017.zip"

Streaming output truncated to the last 5000 lines.
 extracting: train2017/000000381931.jpg  
 extracting: train2017/000000569592.jpg  
 extracting: train2017/000000229396.jpg  
 extracting: train2017/000000488990.jpg  
 extracting: train2017/000000348684.jpg  
 extracting: train2017/000000234031.jpg  
 extracting: train2017/000000563584.jpg  
 extracting: train2017/000000276069.jpg  
 extracting: train2017/000000401194.jpg  
 extracting: train2017/000000502089.jpg  
 extracting: train2017/000000192183.jpg  
 extracting: train2017/000000425127.jpg  
 extracting: train2017/000000126766.jpg  
 extracting: train2017/000000324161.jpg  
 extracting: train2017/000000452746.jpg  
 extracting: train2017/000000423782.jpg  
 extracting: train2017/000000546343.jpg  
 extracting: train2017/000000249290.jpg  
 extracting: train2017/000000025529.jpg  
 extracting: train2017/000000316928.jpg  
 extracting: train2017/000000337866.jpg  
 extracting: train2017/000000547768.jpg  
 extracting

In [None]:
!unzip /content/data/coco/test2017.zip

Streaming output truncated to the last 5000 lines.
 extracting: test2017/000000145034.jpg  
 extracting: test2017/000000234833.jpg  
 extracting: test2017/000000025927.jpg  
 extracting: test2017/000000393116.jpg  
 extracting: test2017/000000536478.jpg  
 extracting: test2017/000000365460.jpg  
 extracting: test2017/000000089887.jpg  
 extracting: test2017/000000274928.jpg  
 extracting: test2017/000000310028.jpg  
 extracting: test2017/000000229793.jpg  
 extracting: test2017/000000394339.jpg  
 extracting: test2017/000000564538.jpg  
 extracting: test2017/000000440585.jpg  
 extracting: test2017/000000541512.jpg  
 extracting: test2017/000000395525.jpg  
 extracting: test2017/000000235435.jpg  
 extracting: test2017/000000069964.jpg  
 extracting: test2017/000000131962.jpg  
 extracting: test2017/000000281392.jpg  
 extracting: test2017/000000534893.jpg  
 extracting: test2017/000000128676.jpg  
 extracting: test2017/000000187639.jpg  
 extracting: test2017/00000015309

In [None]:
!unzip /content/data/coco/val2017.zip

Streaming output truncated to the last 5000 lines.
 extracting: val2017/000000231527.jpg  
 extracting: val2017/000000578922.jpg  
 extracting: val2017/000000062808.jpg  
 extracting: val2017/000000119038.jpg  
 extracting: val2017/000000114871.jpg  
 extracting: val2017/000000463918.jpg  
 extracting: val2017/000000365745.jpg  
 extracting: val2017/000000320425.jpg  
 extracting: val2017/000000481404.jpg  
 extracting: val2017/000000314294.jpg  
 extracting: val2017/000000335328.jpg  
 extracting: val2017/000000513688.jpg  
 extracting: val2017/000000158548.jpg  
 extracting: val2017/000000132116.jpg  
 extracting: val2017/000000415238.jpg  
 extracting: val2017/000000321333.jpg  
 extracting: val2017/000000081738.jpg  
 extracting: val2017/000000577584.jpg  
 extracting: val2017/000000346905.jpg  
 extracting: val2017/000000433980.jpg  
 extracting: val2017/000000228144.jpg  
 extracting: val2017/000000041872.jpg  
 extracting: val2017/000000117492.jpg  
 extracting: va

In [None]:
!rm /content/data/coco/train2017.zip

In [None]:
!rm /content/data/coco/test2017.zip

In [None]:
!rm /content/data/coco/val2017.zip

### A.4.) State-of-the-art Model (VGG16):

Download the state-of-the-art VGG16 model and its pre-trained weights; these will later be used for transfer learning and object detection.
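One possible way to obtain VGG16 for transfer learning is through `tf.keras.applications`. The sketch below is an assumption about how this step could look, not the notebook's own code: the function name `build_vgg16_extractor`, the 224 x 224 input size, and the decision to freeze the convolutional base are illustrative choices.

```python
import tensorflow as tf

def build_vgg16_extractor(weights="imagenet", input_shape=(224, 224, 3)):
    """Return a frozen VGG16 convolutional base with the classifier head removed."""
    base = tf.keras.applications.VGG16(
        weights=weights,        # "imagenet" downloads the pre-trained weights
        include_top=False,      # drop the 1000-class ImageNet classifier
        input_shape=input_shape,
    )
    base.trainable = False      # keep the pre-trained filters fixed
    return base
```

Calling `build_vgg16_extractor()` downloads the ImageNet weights on first use; passing `weights=None` builds the same architecture with random weights, which is handy for a quick shape check. For a 224 x 224 input, each image-slice is mapped to a 7 x 7 x 512 feature map, on top of which a small classifier can be trained.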

### A.5.) Data Preprocessing and Visualization

### A.6.) Slicing the Image

### A.7.) Model Training and Validation

### A.8.) Tensorboard and Tuning

### A.9.) Conclusion on Multi Instance Learning