# SeedTag Coding Test

## Introduction 
This coding test consists of three tasks:
- **Task 1:** Image classification
- **Task 2:** Transfer learning 
- **Task 3:** Unsupervised Learning / Image clustering


# Task 1

In this task, we are provided with the Street View House Numbers (SVHN) dataset. SVHN consists of 32x32 RGB images of house numbers photographed from the street, taken from different angles, under different lighting conditions, and with different fonts. Each image is accompanied by a numerical label ranging from 1 to 10, where 10 corresponds to the digit 0. For this task, the dataset is divided into a training set and a test set of ~73k and ~6.5k samples respectively. The goal is to build a model that maximizes classification accuracy.
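The label convention described above (1 to 10, with 10 standing for the digit 0) can be illustrated with a small sketch. The helper names `to_class_index` and `to_digit` are hypothetical and are not part of the provided code; the zero-based shift is one common way to make such labels usable as class indices.

```python
import numpy as np

def to_class_index(labels):
    """Shift SVHN labels 1..10 to zero-based class indices 0..9."""
    return np.asarray(labels, dtype=np.int64) - 1

def to_digit(labels):
    """Recover the digit shown in the image: label 10 stands for the digit 0."""
    return np.asarray(labels, dtype=np.int64) % 10

raw = np.array([1, 2, 9, 10])
print(to_class_index(raw))  # [0 1 8 9]
print(to_digit(raw))        # [1 2 9 0]
```

Note that under this shift class index 9 corresponds to the digit 0, which matters when reading a confusion matrix or per-class accuracy.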


First, we import the libraries that we will use across the notebook.

In [None]:
# Standard library
import os
import json
from time import time

# Numerical, plotting and I/O utilities
import numpy as np
import matplotlib.pyplot as plt
import scipy.io as sio
from PIL import Image
from sklearn.metrics import accuracy_score

# PyTorch and related libraries
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune
from torch.utils.data import Dataset, DataLoader
import torchvision
import torchvision.transforms as transforms
import torchvision.transforms.functional as F
import torchmetrics
import pytorch_lightning as pl
from pytorch_lightning import loggers as pl_loggers


Secondly, we leverage the dataset loading function, and the Dataset class provided in the example.

In [1]:
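def load_data(root_dir, split):
    """Load images and labels for the requested SVHN split.

    Args:
        root_dir (string): directory containing the .mat files
        split (string): split name; selects the file and the task-specific subset

    Returns:
        tuple: images of shape (H, W, C, N) scaled to [0, 1], and labels of shape (N,)
    """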
    
    # Default to the test file; the train and unlabelled splits read the training file
    filename = os.path.join(root_dir, 'test_32x32.mat')
    if split.startswith(('train', 'unlabelled')):
        filename = os.path.join(root_dir, 'train_32x32.mat')
    
    # Load matrix
    loaded_mat = sio.loadmat(filename)
    
    # Parse images and normalize
    imgs = (loaded_mat['X']/255).astype(np.float32)
    
    # Parse labels, convert to int and create vector
    labels = loaded_mat['y'].astype(np.int64).squeeze()
    
    
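    if split == 'train_29_task2':
        # Keep digits 2-9: drop labels 1 and 10 (label 10 is the digit 0)
        imgs_idx_01 = np.logical_or(labels == 10, labels == 1)
        # np.where returns a tuple; take its first element to get a 1-D index array
        imgs_idx_29 = np.where(np.logical_not(imgs_idx_01))[0]
        imgs = imgs[:, :, :, imgs_idx_29]
        labels = labels[imgs_idx_29]
    elif split in ('test_01_task2', 'train_01_task2'):
        # Keep digits 0 and 1: train keeps the first 200 matches, test skips the first 200
        imgs_idx_01 = np.where(np.logical_or(labels == 10, labels == 1))[0]
        if split == 'train_01_task2':
            imgs_idx_01 = imgs_idx_01[:200]
        else:
            imgs_idx_01 = imgs_idx_01[200:]
        imgs = imgs[:, :, :, imgs_idx_01]
        labels = labels[imgs_idx_01]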
    if(split=='test_task3'):
        N = 50
        imgs = imgs[:,:,:,0:N]
        labels = labels[0:N]
    print('Loaded SVHN split: {split}'.format(split=split))
    print('-------------------------------------')
    print('Images Size: ' , imgs.shape[0:-1])
    print('Split Number of Images:', imgs.shape[-1])
    print('Split Labels Array Size:', labels.shape)
    print('Possible Labels: ', np.unique(labels))
    return imgs,labels

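class SVHNDataset(Dataset):
    """SVHN dataset that returns (image, target) pairs for a given split.

    Args:
        root_dir (string): directory containing the .mat files
        split (string): split name passed to load_data
        transform (callable, optional): transform applied to each image
    """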

    def __init__(self, 
                 root_dir, 
                 split, 
                 transform=None):
        self.images, self.labels = load_data(root_dir, split)
        self.transform = transform
        
    def __len__(self):
        return len(self.labels)

    def __getitem__(self, index):
        img, target = self.images[:,:,:,index], int(self.labels[index])
        if self.transform:
            img = self.transform(img)
        return img, target - 1  # shift labels 1..10 to class indices 0..9 (no label is 0)
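
To make the array layout and the Task 2 subsetting easier to follow, here is a minimal sketch on synthetic data. It assumes the same (H, W, C, N) layout that the loader indexes with `imgs[:,:,:,index]`; the random arrays and the variable names `num_samples`, `single`, `idx_01`, and `idx_29` are stand-ins for illustration, not the real SVHN files.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the .mat content: images are stored as (H, W, C, N)
num_samples = 500
imgs = rng.random((32, 32, 3, num_samples), dtype=np.float32)
labels = rng.integers(1, 11, size=num_samples).astype(np.int64)  # labels in 1..10

# Indexing the last axis returns one image in height x width x channel order
single = imgs[:, :, :, 0]

# Boolean mask for labels 1 and 10 (the digits 1 and 0)
is_01 = np.logical_or(labels == 10, labels == 1)

# np.where returns a tuple of arrays; [0] extracts the 1-D index array
idx_01 = np.where(is_01)[0]
idx_29 = np.where(np.logical_not(is_01))[0]

imgs_01, labels_01 = imgs[:, :, :, idx_01], labels[idx_01]
imgs_29, labels_29 = imgs[:, :, :, idx_29], labels[idx_29]

print(single.shape)        # (32, 32, 3)
print(imgs_29.shape[:-1])  # (32, 32, 3)
```

Passing the tuple returned by `np.where` directly as the last index would add an extra axis to the image array, which is why the index array is extracted first.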
