To process the data, you need a `data.json` file exported from [statmechsims](https://www.statmechsims.com/). This JSON file contains all of your simulation data. Suppose you have the following directory:

```console
af_ising/
└── data.json
```

```python
# Import the DataProcessor class
from vit4elm.data_processor import DataProcessor
import numpy as np

# Set the required parameters.
json_path = 'af_ising/data.json'
data_dir = 'af_ising'
test_size = 0.4

# The shape of custom_intervals must be (n_bins, 2)
custom_intervals = np.array([
    [0, 2],
    [2, 3.33],    # Note the uneven interval widths
    [3.33, 3.55],
    [3.55, 4]
])

# Instantiate DataProcessor
processor = DataProcessor(
    json_path,
    data_dir,
    custom_intervals=custom_intervals,
    test_size=test_size
)

# Let the processor process!
processor.process()
```

The `DataProcessor` class accepts the following parameters:

```python
json_path: str,
data_dir: str,
n_bins: int = None,
custom_intervals: np.ndarray = None,
stratified_shuffle: bool = True,
test_size: float = 0.4
```

If both `n_bins` and `custom_intervals` are `None`, the class raises an error. You can either use equidistant intervals derived from the data in your simulations, or choose your own temperature boundaries.
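As a rough illustration of what equidistant intervals look like, the sketch below builds them from a list of temperatures with NumPy. The helper name `equidistant_intervals` and the sample temperatures are hypothetical, and this is not necessarily how `DataProcessor` computes its bins internally.

```python
import numpy as np

def equidistant_intervals(temperatures, n_bins):
    """Hypothetical helper: split [min, max] of the temperatures into n_bins equal intervals."""
    temperatures = np.asarray(temperatures, dtype=float)
    # n_bins intervals need n_bins + 1 evenly spaced edges.
    edges = np.linspace(temperatures.min(), temperatures.max(), n_bins + 1)
    # Pair each edge with the next one to get an array of shape (n_bins, 2).
    return np.column_stack([edges[:-1], edges[1:]])

# Made-up temperatures spanning 0 to 4, for illustration only.
temps = [0.0, 0.5, 1.7, 2.9, 4.0]
intervals = equidistant_intervals(temps, n_bins=4)
print(intervals)  # four rows: [0, 1], [1, 2], [2, 3], [3, 4]
```

The result has the same `(n_bins, 2)` layout expected for `custom_intervals`.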

If you choose your own intervals, as in the example above, make sure the minimum and maximum temperatures (0 and 4 in this example) match those of your simulation experiments.
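One way to catch mistakes early is to check the intervals before handing them to the processor. The following is a minimal sketch under stated assumptions: `check_intervals` is a hypothetical helper, not part of `vit4elm`, and the requirement that intervals be contiguous is an assumption drawn from the example above.

```python
import numpy as np

def check_intervals(intervals, t_min, t_max):
    """Hypothetical helper: validate custom temperature intervals before use."""
    intervals = np.asarray(intervals, dtype=float)
    if intervals.ndim != 2 or intervals.shape[1] != 2:
        raise ValueError("intervals must have shape (n_bins, 2)")
    if not np.all(intervals[:, 0] < intervals[:, 1]):
        raise ValueError("each interval must satisfy lower < upper")
    # Assumption: each interval starts where the previous one ends.
    if not np.allclose(intervals[1:, 0], intervals[:-1, 1]):
        raise ValueError("intervals must be contiguous (no gaps or overlaps)")
    # The outer boundaries should match the simulated temperature range.
    if not (np.isclose(intervals[0, 0], t_min) and np.isclose(intervals[-1, 1], t_max)):
        raise ValueError("intervals must span exactly [t_min, t_max]")
    return intervals

checked = check_intervals(
    [[0, 2], [2, 3.33], [3.33, 3.55], [3.55, 4]],
    t_min=0,
    t_max=4,
)
```

Here `t_min` and `t_max` stand for the lowest and highest temperatures in your own simulations.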

If you want equidistant bins instead, pass `n_bins` rather than `custom_intervals`.

```python
# Reuses json_path and data_dir from the example above
n_bins = 4

# Instantiate DataProcessor
processor = DataProcessor(
    json_path,
    data_dir,
    n_bins=n_bins
)

# Let the processor process!
processor.process()
```

**Note:** Running both examples back to back will produce a lot of overlapping data. Run only one of them to process your data.

Either call produces a directory with the following structure:

```console
af_ising
├── csvs
│   └── data.csv
├── data
│   ├── bin0
│   ├── bin1
│   ├── bin2
│   └── bin3
├── data.json
└── experiments.json
```

The only difference between the two results is the boundaries that determine which bin each image goes into, based on its temperature. Again, this assumes you want 4 bins. The `experiments.json` file is needed to instantiate a PyTorch dataset, so please do not delete it or alter anything in this directory.
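To make the effect of the boundaries concrete, here is a minimal sketch of how a temperature could be mapped to a bin index. `assign_bin` is a hypothetical helper, not part of `vit4elm`, and the edge convention (half-open intervals, with the last bin also including its upper edge) is an assumption; the library may treat boundary temperatures differently.

```python
import numpy as np

def assign_bin(temperature, intervals):
    """Hypothetical helper: return the index of the interval containing the temperature."""
    intervals = np.asarray(intervals, dtype=float)
    lower, upper = intervals[:, 0], intervals[:, 1]
    # Assumed convention: [lower, upper) for every bin ...
    inside = (temperature >= lower) & (temperature < upper)
    # ... except the last bin, which also includes its upper edge.
    inside[-1] |= temperature == upper[-1]
    hits = np.flatnonzero(inside)
    if hits.size == 0:
        raise ValueError(f"temperature {temperature} is outside all intervals")
    return int(hits[0])

custom = [[0, 2], [2, 3.33], [3.33, 3.55], [3.55, 4]]
equidistant = [[0, 1], [1, 2], [2, 3], [3, 4]]

# The same temperature can land in different bins under the two schemes.
bin_custom = assign_bin(3.4, custom)
bin_equidistant = assign_bin(3.4, equidistant)
```

With these boundaries, a temperature of 3.4 falls into `bin2` under the custom intervals and `bin3` under the equidistant ones.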