### Dataset and DataLoader

In [2]:
import torch
import torch.nn as nn
import torchvision
from torch.utils.data import Dataset, DataLoader
import numpy as np
import math

PyTorch provides a base class for datasets in ```torch.utils.data.Dataset```. Custom datasets are created by subclassing ```Dataset``` and implementing three methods:
- ```__init__()```: For initialization
- ```__getitem__(index)```: Loads and returns **a sample** from the dataset at the given index
- ```__len__()```: Returns the number of samples in the dataset

In [8]:
class WineDataset(Dataset):
    
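    def __init__(self):
        # Load the whole CSV into memory; column 0 holds the label, the rest are features
        xy = np.loadtxt("wine.csv", delimiter=",", dtype=np.float32)
        self.x = torch.from_numpy(xy[:, 1:])   # shape: (n_samples, n_features)
        self.y = torch.from_numpy(xy[:, [0]])  # shape: (n_samples, 1)
        self.n_samples = xy.shape[0]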
        
    def __getitem__(self, index):
        return self.x[index], self.y[index]
        
    def __len__(self):
        return self.n_samples

In [9]:
dataset = WineDataset()
first_data = dataset[0]
features, labels = first_data
print(features, labels)

tensor([1.4230e+01, 1.7100e+00, 2.4300e+00, 1.5600e+01, 1.2700e+02, 2.8000e+00,
        3.0600e+00, 2.8000e-01, 2.2900e+00, 5.6400e+00, 1.0400e+00, 3.9200e+00,
        1.0650e+03]) tensor([1.])
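
The same three-method pattern can be tried without ```wine.csv```. The following is a minimal sketch, assuming a random array stands in for the file; the class name ```ArrayDataset``` and the variable ```toy``` are hypothetical and not part of the original notebook.

```python
import numpy as np
import torch
from torch.utils.data import Dataset


class ArrayDataset(Dataset):
    """Hypothetical in-memory dataset: column 0 is the label, the rest are features."""

    def __init__(self, xy):
        self.x = torch.from_numpy(xy[:, 1:])
        self.y = torch.from_numpy(xy[:, [0]])
        self.n_samples = xy.shape[0]

    def __getitem__(self, index):
        # Return one (features, label) pair
        return self.x[index], self.y[index]

    def __len__(self):
        return self.n_samples


# Assumed stand-in data: 10 samples, 1 label column + 3 feature columns
rng = np.random.default_rng(0)
xy = rng.random((10, 4)).astype(np.float32)

toy = ArrayDataset(xy)
features, label = toy[3]
print(len(toy), features.shape, label.shape)
# 10 torch.Size([3]) torch.Size([1])
```

Indexing with ```toy[3]``` calls ```__getitem__(3)```, and ```len(toy)``` calls ```__len__()```; these are the two hooks a ```DataLoader``` relies on for a map-style dataset.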


Indexing a ```Dataset``` returns one sample at a time. Training usually consumes data in batches, so ```torch.utils.data.DataLoader``` wraps a dataset in an iterable that yields batches, optionally shuffling the samples and loading them with worker processes.

In [21]:
dataloader = DataLoader(dataset, batch_size=32, shuffle=True, 
                       num_workers=2)
dataiter = iter(dataloader)
batch_x, batch_y = next(dataiter)
print(batch_x.shape, batch_y.shape)

torch.Size([32, 13]) torch.Size([32, 1])


In [22]:
num_epochs = 2
total_samples = len(dataset)
n_iterations = math.ceil(total_samples/32)
print(total_samples, n_iterations)

178 6


In [24]:
for epoch in range(num_epochs):
    for i, (inputs, labels) in enumerate(dataloader):
        print(epoch, inputs.shape, labels.shape)

0 torch.Size([32, 13]) torch.Size([32, 1])
0 torch.Size([32, 13]) torch.Size([32, 1])
0 torch.Size([32, 13]) torch.Size([32, 1])
0 torch.Size([32, 13]) torch.Size([32, 1])
0 torch.Size([32, 13]) torch.Size([32, 1])
0 torch.Size([18, 13]) torch.Size([18, 1])
1 torch.Size([32, 13]) torch.Size([32, 1])
1 torch.Size([32, 13]) torch.Size([32, 1])
1 torch.Size([32, 13]) torch.Size([32, 1])
1 torch.Size([32, 13]) torch.Size([32, 1])
1 torch.Size([32, 13]) torch.Size([32, 1])
1 torch.Size([18, 13]) torch.Size([18, 1])
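
The batch sizes printed above follow from the arithmetic 178 = 5 × 32 + 18: five full batches and one partial batch of 18. A small sketch of this, assuming a synthetic ```TensorDataset``` of the same shape in place of the wine data, also shows the ```drop_last``` option of ```DataLoader```, which discards the partial batch.

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset

# Assumed stand-in for the wine data: 178 samples, 13 features, 1 label column
features = torch.randn(178, 13)
labels = torch.randint(1, 4, (178, 1)).float()
toy_dataset = TensorDataset(features, labels)

batch_size = 32
loader = DataLoader(toy_dataset, batch_size=batch_size, shuffle=True)
loader_drop = DataLoader(toy_dataset, batch_size=batch_size, shuffle=True,
                         drop_last=True)

# len() of a DataLoader is the number of batches per epoch
print(len(loader), math.ceil(len(toy_dataset) / batch_size))    # 6 6
print(len(loader_drop), len(toy_dataset) // batch_size)         # 5 5

# Collect the size of every batch in one epoch
batch_sizes = [x.shape[0] for x, _ in loader]
print(batch_sizes)  # [32, 32, 32, 32, 32, 18]
```

Dropping the last batch can be useful when a model or loss is sensitive to batch size, at the cost of skipping a few samples each epoch (a different few each time when ```shuffle=True```).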


The ```torchvision``` package provides ready-made transformations under ```torchvision.transforms``` that can be applied to samples as they are loaded from a ```Dataset```.

In [25]:
import torchvision

Writing a custom transformation is also straightforward: define a class whose ```__call__(self, sample)``` method applies the transformation and returns the result. For a custom ```Dataset``` to use it, the class must accept the transformation (conventionally through a ```transform``` argument in ```__init__```) and apply it in ```__getitem__```; the ```WineDataset``` defined above does not do this yet.

In [27]:
class MulTransform:
    def __init__(self, factor):
        self.factor = factor
        
    def __call__(self, sample):
        inputs, targets = sample
        inputs = inputs * self.factor  # out-of-place, so the dataset's stored values are not modified
        return inputs, targets
    
class ToTensor:
    def __init__(self):
        pass
    
    def __call__(self, sample):
        inputs, targets = sample
        return torch.from_numpy(inputs), torch.from_numpy(targets)

Several transformations can be chained with ```torchvision.transforms.Compose```, which applies them in the order listed.

In [31]:
composed = torchvision.transforms.Compose([ToTensor(), MulTransform(factor=10)])