### Description
This notebook describes how to load the CIFAR10 or CIFAR100 dataset using the trailmet framework. The example below uses CIFAR10.

#### Installation of trailmet
If you use trailmet from source rather than as an installed package, add the path to its root directory to the system path (`sys.path`) so that `import trailmet` works.

In [1]:
# add the trailmet source root to the system path (adapt this path to your local clone)
import sys
sys.path.append("/Users/deepak.gupta/eff-dl/trailmet/")
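If the cell is re-run, `sys.path.append` adds the same entry again. The following is a minimal sketch of a more defensive variant that only adds a directory if it exists and is not already on the path. `add_to_sys_path` is a hypothetical helper and is not part of trailmet.

```python
import sys
from pathlib import Path


def add_to_sys_path(repo_root: str) -> str:
    """Hypothetical helper: add repo_root to sys.path once, after checking it exists."""
    path = Path(repo_root).expanduser().resolve()
    if not path.is_dir():
        raise FileNotFoundError(f"trailmet source directory not found: {path}")
    path_str = str(path)
    if path_str not in sys.path:  # avoid duplicate entries when the cell is re-run
        sys.path.append(path_str)
    return path_str
```

For example, `add_to_sys_path("/Users/deepak.gupta/eff-dl/trailmet/")` would replace the plain `append` call above.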

In [2]:
# load the required packages
import torch
import matplotlib.pyplot as plt
from torchvision import transforms
from trailmet.datasets.classification import DatasetFactory

#### Specify the root data directory
This directory is used to download (and, if needed, process) the data, and any folders for the train/test splits are created under it. Adapt this path to your system.

In [3]:
root_dir = "/Users/deepak.gupta/eff-dl/data_dir"
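Because the path is system-specific, one option is to let an environment variable override a default and to create the directory up front. This is a minimal sketch: the variable name `TRAILMET_DATA_DIR` and the helper `resolve_root_dir` are hypothetical, and trailmet itself does not read them.

```python
import os
from pathlib import Path


def resolve_root_dir(default: str, env_var: str = "TRAILMET_DATA_DIR") -> str:
    """Hypothetical helper: pick the data directory (env var wins) and make sure it exists."""
    root = Path(os.environ.get(env_var, default)).expanduser()
    root.mkdir(parents=True, exist_ok=True)  # no-op if the directory already exists
    return str(root)
```

Usage would look like `root_dir = resolve_root_dir("~/data_dir")`, where `~/data_dir` is only an example default.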

#### Loading CIFAR10 Dataset

##### Specify the transforms to be applied to the inputs and labels of the train, val and test splits
The transforms for each split of the data can be specified with `torchvision.transforms`. Here every split only converts images to tensors with `ToTensor()`, and no target transforms are used (`None`).

In [4]:
train_transform = transforms.Compose(
[transforms.ToTensor()])

val_transform = transforms.Compose(
[transforms.ToTensor()])

test_transform = transforms.Compose(
[transforms.ToTensor()])

input_transforms = {
    'train': train_transform, 
    'val': val_transform, 
    'test': test_transform}

target_transforms = {
    'train': None, 
    'val': None, 
    'test': None}
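The transforms above only convert images to tensors. If you also append `transforms.Normalize(mean, std)` to a pipeline, the per-channel mean and standard deviation are typically computed on the training images. Here is a minimal sketch, assuming the images are already stacked into a float tensor of shape `(N, C, H, W)`, the layout `ToTensor` produces per image. `channel_stats` is a hypothetical helper.

```python
import torch


def channel_stats(images: torch.Tensor) -> tuple[list[float], list[float]]:
    """Per-channel mean and population std of an (N, C, H, W) image tensor."""
    if images.dim() != 4:
        raise ValueError(f"expected (N, C, H, W), got shape {tuple(images.shape)}")
    # Put channels first and flatten the rest: shape (C, N*H*W)
    flat = images.transpose(0, 1).reshape(images.size(1), -1)
    mean = flat.mean(dim=1)
    # Population std: sqrt(mean((x - mean)^2)) per channel
    std = ((flat - mean[:, None]) ** 2).mean(dim=1).sqrt()
    return mean.tolist(), std.tolist()
```

Stacking all 50000 CIFAR10 training images as float32 takes roughly 600 MB. For larger datasets, accumulating running sums batch by batch is the more memory-friendly design.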

##### Creating the CIFAR10 dataset with the specified control parameters
 - `val_fraction` defines the fraction of the original CIFAR10 training data to be separated out as the validation set, so it should be a value between 0 and 1 (e.g. 0.1). The test split is kept identical to the standard CIFAR10 test set.
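The printed dataset object further down shows that both the train and val splits report 50000 datapoints and come with `SubsetRandomSampler` objects. This suggests the split is made by partitioning the training indices. The sketch below illustrates that idea and is not trailmet's actual code: `split_indices` and its `seed` parameter are hypothetical. It also shows why a value such as 1.1 cannot be a valid fraction.

```python
import math
import random


def split_indices(num_samples: int, val_fraction: float, seed: int = 0):
    """Shuffle range(num_samples) and carve off the first val_fraction of it for validation."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError(f"val_fraction must be in (0, 1), got {val_fraction}")
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    num_val = int(math.floor(val_fraction * num_samples))
    return indices[num_val:], indices[:num_val]  # (train_indices, val_indices)
```

With the CIFAR10 training set, `split_indices(50000, 0.1)` yields 45000 training and 5000 validation indices. Each list could then be wrapped in a `torch.utils.data.SubsetRandomSampler`.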

In [5]:
cifar_dataset = DatasetFactory.create_dataset(name = 'CIFAR10', 
                                        root = root_dir,
                                        split_types = ['train', 'val', 'test'],
                                        val_fraction = 0.1,  # must be a fraction between 0 and 1
                                        transform = input_transforms,
                                        target_transform = target_transforms
                                        )

In [6]:
print(cifar_dataset)

{'train_dataset': Dataset CIFAR10
    Number of datapoints: 50000
    Root location: /Users/deepak.gupta/eff-dl/data_dir
    Split: Train
    StandardTransform
Transform: Compose(
               ToTensor()
           ), 'val_dataset': Dataset CIFAR10
    Number of datapoints: 50000
    Root location: /Users/deepak.gupta/eff-dl/data_dir
    Split: Train
    StandardTransform
Transform: Compose(
               ToTensor()
           ), 'test_dataset': Dataset CIFAR10
    Number of datapoints: 10000
    Root location: /Users/deepak.gupta/eff-dl/data_dir
    Split: Test
    StandardTransform
Transform: Compose(
               ToTensor()
           ), 'train_sampler': <torch.utils.data.sampler.SubsetRandomSampler object at 0x7f9f60a6fa90>, 'val_sampler': <torch.utils.data.sampler.SubsetRandomSampler object at 0x7f9f58622580>, 'test_sampler': None}


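The details of the created dataset object are shown above. `create_dataset` returns a dictionary with the keys `train_dataset`, `val_dataset` and `test_dataset`, together with the matching samplers `train_sampler`, `val_sampler` and `test_sampler`. Both `train_dataset` and `val_dataset` report 50000 datapoints because both wrap the original training set. The train/val split is applied through the two `SubsetRandomSampler` objects, while the test split has no sampler (`None`).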
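Because `train_dataset` and `val_dataset` wrap the same training images, the split only takes effect when each sampler is passed to its `DataLoader`. The following self-contained sketch demonstrates that mechanism with a small synthetic `TensorDataset` standing in for CIFAR10. The tensor sizes and the 80/20 split are arbitrary assumptions.

```python
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler, TensorDataset

# Synthetic stand-in for the CIFAR10 training split: 100 RGB 32x32 images, 10 classes.
images = torch.rand(100, 3, 32, 32)
labels = torch.randint(0, 10, (100,))
shared = TensorDataset(images, labels)  # plays the role of both train_dataset and val_dataset

# Disjoint index subsets; each sampler draws only from its own indices, in random order.
generator = torch.Generator().manual_seed(0)
perm = torch.randperm(len(shared), generator=generator).tolist()
demo_train_sampler = SubsetRandomSampler(perm[20:])
demo_val_sampler = SubsetRandomSampler(perm[:20])

demo_train_loader = DataLoader(shared, batch_size=16, sampler=demo_train_sampler)
demo_val_loader = DataLoader(shared, batch_size=16, sampler=demo_val_sampler)

print(len(shared))                                   # 100: the dataset itself is unchanged
print(sum(x.size(0) for x, _ in demo_train_loader))  # 80 samples per epoch
print(sum(x.size(0) for x, _ in demo_val_loader))    # 20 samples per epoch
```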

In [None]:
# Construct the training dataloader; the sampler restricts it to the training indices
train_loader = torch.utils.data.DataLoader(
        cifar_dataset['train_dataset'], batch_size=64, 
        sampler=cifar_dataset['train_sampler'],
        num_workers=0
    )
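The cell above only builds the training loader. Below is a minimal sketch for building all three loaders from the dictionary returned by `create_dataset`. It assumes the key layout shown in the printout (`<split>_dataset` and `<split>_sampler`, with `test_sampler` set to `None`). `make_loaders` is a hypothetical helper, not part of trailmet.

```python
from torch.utils.data import DataLoader


def make_loaders(split_dict: dict, batch_size: int = 64, num_workers: int = 0) -> dict:
    """Hypothetical helper: one DataLoader per split, using that split's sampler when present."""
    loaders = {}
    for split in ("train", "val", "test"):
        dataset = split_dict.get(f"{split}_dataset")
        if dataset is None:
            continue  # split was not requested
        sampler = split_dict.get(f"{split}_sampler")
        loaders[split] = DataLoader(
            dataset,
            batch_size=batch_size,
            sampler=sampler,  # None -> sequential order, since shuffle defaults to False
            num_workers=num_workers,
        )
    return loaders
```

With the notebook's object, `loaders = make_loaders(cifar_dataset)` would give `loaders['val']` and `loaders['test']` alongside the training loader. Iterating the test set sequentially, without a sampler, is usually what you want for evaluation.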

In [None]:
# Display the first image of the batch and its label.
train_features, train_labels = next(iter(train_loader))
print(f"Feature batch shape: {train_features.size()}")
print(f"Labels batch shape: {train_labels.size()}")
img = train_features[0].permute(1, 2, 0)  # CHW -> HWC, the layout imshow expects for RGB
label = train_labels[0].item()           # same index as the image shown
plt.imshow(img)
plt.show()
print(f"Label: {label}")
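The printed label is an integer class index. The sketch below converts an image tensor into the `(H, W, C)` layout that `plt.imshow` expects and maps the index to a class name. The list follows the standard CIFAR10 class order, the same order torchvision exposes as `CIFAR10.classes`. `describe_sample` is a hypothetical helper.

```python
import torch

# Standard CIFAR10 class order (index -> name).
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]


def describe_sample(image: torch.Tensor, label) -> tuple:
    """Return an (H, W, C) view of a (C, H, W) image and the class name for label."""
    if image.dim() != 3:
        raise ValueError(f"expected a (C, H, W) tensor, got shape {tuple(image.shape)}")
    hwc = image.permute(1, 2, 0)  # CHW -> HWC for matplotlib
    return hwc, CIFAR10_CLASSES[int(label)]  # int() accepts Python ints and 0-dim tensors
```

For example, call `img, name = describe_sample(train_features[0], train_labels[0])`, then `plt.imshow(img)` and `plt.title(name)` to label the plot.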