# The `Dataset` class

PyTorch provides tools that separate data processing from data loading. The core abstraction is the `Dataset` class, which is designed to work with the `DataLoader` class; the latter takes care of shuffling the data and batching it at each epoch.

The [data loading tutorial](https://pytorch.org/tutorials/beginner/data_loading_tutorial.html) goes into more detail, but in short the `DataLoader` expects the dataset to be a subclass of `torch.utils.data.Dataset` that implements two methods: `__len__`, which returns the number of samples, and `__getitem__`, which retrieves the sample with index `idx`.
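As a minimal sketch of that interface (not the project's actual `ModelNetDataset`), a map-style dataset only needs `__len__` and `__getitem__`. The class name `ToyVoxelDataset`, the random data, and all sizes below are placeholders chosen for illustration.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToyVoxelDataset(Dataset):
    """Hypothetical stand-in: random voxel grids with integer labels."""

    def __init__(self, n_samples=20, grid_size=8, n_classes=10, seed=0):
        gen = torch.Generator().manual_seed(seed)
        # Each sample is a (1, N, N, N) grid; the leading dimension is the channel.
        self.voxels = torch.rand(
            n_samples, 1, grid_size, grid_size, grid_size, generator=gen
        )
        self.labels = torch.randint(0, n_classes, (n_samples,), generator=gen)

    def __len__(self):
        # Number of samples; the DataLoader uses it to know when an epoch ends.
        return len(self.labels)

    def __getitem__(self, idx):
        # Return the sample with index `idx`.
        return self.voxels[idx], self.labels[idx]


toy_dataset = ToyVoxelDataset()
toy_loader = DataLoader(toy_dataset, shuffle=True, batch_size=4)

# The default collate function stacks individual samples along a new batch dimension.
toy_voxels, toy_labels = next(iter(toy_loader))
print(toy_voxels.shape)  # torch.Size([4, 1, 8, 8, 8])
print(toy_labels.shape)  # torch.Size([4])
```

Anything heavier, such as reading files from disk or voxelizing a mesh, would normally go inside `__getitem__`, so that it runs only for the samples a batch actually needs.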

I implemented this interface in the `./src/dataset.py` module; let's import it and see it in action.

In [2]:
import os

os.chdir('/home/ubuntu/nndl-project/')

from src.dataset import * # import ModelNetDataset class
from torch.utils.data import DataLoader
import pandas as pd
import matplotlib.pyplot as plt

metadata_path = '/home/ubuntu/nndl-project/data/modelnet10/metadata.parquet'

dataset = ModelNetDataset(metadata_path, file_format='npy')
loader = DataLoader(dataset, shuffle=True, batch_size=12)

We have just created both the `dataset` and the `loader` objects. The `ModelNetDataset` class expects the path to a metadata `parquet` file containing the fields we created in the previous notebook.
The loader is an iterable that yields batches until it has swept through the whole dataset. Each sample from the dataset is:

- if working with `npy`, a tuple $(V_i,o_i,c_i)$ where $V_i$ is a $(1,N,N,N)$ voxel grid (the first dimension is the channel), $o_i$ is the one-hot encoded orientation class and $c_i$ is the one-hot encoded object class

- if working with `ply`, a tuple $(V_i,\mathbf{r}_i,c_i)$ where $\mathbf{r}_i = (r_x,r_y,r_z)$ is now a vector with the rotation components along the three axes, in degrees. This mode voxelizes the mesh on the fly, so it's much slower.
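To make the batching behaviour concrete, here is a small sketch that mimics the `npy` sample layout with random tensors. It uses `TensorDataset` as a stand-in for `ModelNetDataset`, and the sample count, grid size, and number of classes are arbitrary assumptions, not values taken from the project.

```python
import math

import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

# Illustrative sizes only (assumptions, not the project's real values).
n_samples, grid, n_orient, n_classes = 30, 16, 40, 10

voxels = torch.rand(n_samples, 1, grid, grid, grid)
# One-hot labels built from random integer classes.
orient = F.one_hot(torch.randint(0, n_orient, (n_samples,)), n_orient)
classes = F.one_hot(torch.randint(0, n_classes, (n_samples,)), n_classes)

stand_in = TensorDataset(voxels, orient, classes)
stand_in_loader = DataLoader(stand_in, shuffle=True, batch_size=12)

# One full sweep over the loader visits every sample exactly once.
batch_sizes = [v.shape[0] for v, o, c in stand_in_loader]
print(batch_sizes)  # [12, 12, 6]
print(len(stand_in_loader) == math.ceil(n_samples / 12))  # True
```

The last batch is smaller because 30 is not a multiple of 12; passing `drop_last=True` to the `DataLoader` would discard it instead.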

In [8]:
v_batch, o_batch, c_batch = next(iter(loader))

print(v_batch.shape)
print(o_batch.shape)
print(c_batch.shape)

torch.Size([12, 1, 30, 30, 30])
torch.Size([12, 40])
torch.Size([12, 10])


In [10]:
# one-hot orientation vector of the first sample in the batch
o_batch[0]

tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0])
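A one-hot vector like this can be turned back into an integer class index with `argmax`. The sketch below rebuilds a vector with the same layout by hand (40 entries, a single 1 at position 36) instead of reading it from the loader. Which physical orientation index 36 corresponds to depends on the encoding chosen in the previous notebook and is not assumed here.

```python
import torch

# Hand-built copy of the vector printed above: 40 entries, a single 1 at index 36.
one_hot = torch.zeros(40, dtype=torch.long)
one_hot[36] = 1

orientation_index = int(one_hot.argmax())
print(orientation_index)  # 36

# For a whole batch of shape (batch, 40), take the argmax along the class dimension.
two_samples = torch.stack([one_hot, one_hot.roll(1)])
print(two_samples.argmax(dim=1))  # tensor([36, 37])
```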