In [1]:
from vit4elm.datasets import IsingData
import os

The PyTorch dataset classes in this repository are meant to automate the experimentation process and make it as easy as possible for you. Pass the path to the `experiments.json` file into the `IsingData` class, along with the name of the split you want (`'train'`, `'test'`, or `'validation'`).

Remember the directory structure:

```console
af_ising
├── csvs
│   └── data.csv
├── data
│   ├── bin0
│   ├── bin1
│   ├── bin2
│   └── bin3
├── data.json
└── experiments.json
```
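Before constructing a dataset, it can help to confirm that this layout is actually in place. The following is a minimal sketch using only the standard library. The helper `missing_entries` and the `EXPECTED_ENTRIES` list are hypothetical names introduced for illustration and are not part of `vit4elm`.

```python
from pathlib import Path

# Entries copied from the directory tree above (hypothetical constant, not part of vit4elm).
EXPECTED_ENTRIES = [
    "csvs/data.csv",
    "data/bin0",
    "data/bin1",
    "data/bin2",
    "data/bin3",
    "data.json",
    "experiments.json",
]


def missing_entries(root, expected=EXPECTED_ENTRIES):
    """Return the expected relative paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_entries("af_ising")
    if missing:
        print("Missing entries:", missing)
    else:
        print("Directory layout looks complete.")
```

An empty list means every file and folder from the tree was found under the given root.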

In [2]:
# Get path to experiments.json
experiments = os.path.join('af_ising', 'experiments.json')

# For train set:
train_set = IsingData(experiments, 'train')

# For test set:
test_set = IsingData(experiments, 'test')

# For validation set:
validation_set = IsingData(experiments, 'validation')

Note that you can also omit the split name:

In [3]:
data = IsingData(experiments)

This returns all of the data, without any split. However, it is better to use the predefined splits, because the `DataProcessor` class in the first notebook uses stratified shuffling to deal with class imbalance.
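To illustrate what stratified shuffling means, here is a minimal sketch in plain Python. It is not the `DataProcessor` implementation. The function `stratified_split` is a hypothetical helper, and the `test_size` of 0.4 simply mirrors the value stored in `experiments.json`. The idea is to shuffle and split each label group separately, so that every class keeps roughly the same share in both parts.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.4, seed=0):
    """Split sample indices so each label keeps roughly the same share in both parts."""
    rng = random.Random(seed)

    # Group sample indices by their label.
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)  # shuffle within the class only
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


# Toy, imbalanced labels: 10 samples of class 0 and 5 samples of class 1.
labels = [0] * 10 + [1] * 5
train_idx, test_idx = stratified_split(labels, test_size=0.4)
print(len(train_idx), len(test_idx))  # 9 6
```

With a plain random split, the minority class could end up almost entirely in one part. Splitting per class avoids that.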

Now that we can load our data as a PyTorch `Dataset`, we can feed it into our trainer. Note that `IsingData` is a subclass of `torch.utils.data.Dataset`, so we can take its length and index it, as demonstrated in the cell further below.

All the parameters of the `IsingData` class are as follows:

```python
experiment_json_path: str,
train_test_validation: str = None,
feature_extractor = ViTFeatureExtractor()
```

As mentioned before, it is best to specify the `train_test_validation` parameter. For now, the `IsingData` class only supports Hugging Face's `ViTFeatureExtractor`. Future updates are planned to make this more flexible.

In [4]:
# Let's play with train_set:
print(f'Length of Train Set: {len(train_set)}')

# Get first image and label:
image, label = train_set[0]
print(f'Image Shape: {image.shape}')
print(f'Label: {label}')

Length of Train Set: 663
Image Shape: torch.Size([1, 3, 224, 224])
Label: 0


You can also view the contents of `experiments.json`:

In [5]:
train_set.experiments

{'data_dir': 'af_ising',
 'num_labels': 4,
 'intervals': [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0], [3.0, 4.0]],
 'test_size': 0.4}
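One plausible reading of this configuration is that each entry in `intervals` corresponds to one label, so there are `num_labels` intervals in total. The sketch below shows how a value could be mapped to a label under that reading. The helper `label_for` is hypothetical, and the half-open convention `[low, high)` is an assumption. Neither is taken from `vit4elm`.

```python
import json

# Same content as the output above, written as JSON so the example is self-contained.
experiments = json.loads("""
{
  "data_dir": "af_ising",
  "num_labels": 4,
  "intervals": [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0], [3.0, 4.0]],
  "test_size": 0.4
}
""")


def label_for(value, intervals):
    """Return the index of the first interval [low, high) containing `value`, else None."""
    for label, (low, high) in enumerate(intervals):
        if low <= value < high:  # assumed half-open interval
            return label
    return None


# The number of intervals matches the number of labels in this configuration.
assert len(experiments["intervals"]) == experiments["num_labels"]
print(label_for(2.5, experiments["intervals"]))  # 2
```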