# Generating batches

In this example we will explore how to read a simple Lightning Memory-Mapped Database (LMDB) using a batch generator.

In [1]:
import numpy as np

import pyxis

Let's start by creating a small dataset of `10` samples. Each input is an image with shape `(254, 254, 3)`, while the targets are scalar values.

In [2]:
nb_samples = 10

# Cast explicitly so the array dtype matches the writer's `input_dtype` below.
X = np.random.rand(nb_samples, 254, 254, 3).astype(np.float32)
y = np.arange(nb_samples, dtype=np.uint8)

The data is written using the LMDB writer.

In [3]:
db = pyxis.Writer(dirpath='data',
                  input_shape=(254,254,3), target_shape=(),
                  input_dtype=np.float32, target_dtype=np.uint8,
                  map_size_limit=500, ram_gb_limit=1)
db.put_samples(X, y)
db.close()
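
As a rough sanity check on the size limits passed to the writer, the raw footprint of the arrays can be estimated with plain NumPy. This is only a back-of-the-envelope sketch: it ignores any per-record overhead added by LMDB or by the serialisation, and it assumes (not confirmed in this example) that `map_size_limit` is expressed in megabytes. Both `float64` (the default output of `np.random.rand`) and `float32` (the `input_dtype` given to the writer) are shown.

```python
import numpy as np

nb_samples = 10
sample_shape = (254, 254, 3)

# Number of values in one image and in the whole dataset.
values_per_sample = int(np.prod(sample_shape))
total_values = nb_samples * values_per_sample

# Raw array footprint for the two dtypes that appear above.
bytes_float64 = total_values * np.dtype(np.float64).itemsize
bytes_float32 = total_values * np.dtype(np.float32).itemsize

print('float64: %.1f MiB' % (bytes_float64 / 2**20))
print('float32: %.1f MiB' % (bytes_float32 / 2**20))
```

Under those assumptions the inputs take only a few MiB, well below the limit of `500` used here.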

## Using the batch generator

Read back the data using the LMDB reader.

In [4]:
db = pyxis.Reader('data')

### Example 1 - Number of samples is a multiple of the batch size

In this first example we create a batch generator where the number of samples is divisible by the batch size.

In [5]:
batch_gen = db.batch_generator(batch_size=5, shuffle=False)

The artificial dataset has `10` samples, so with a batch size of `5` it takes *two* iterations to go through the whole dataset. The targets of *four* batches are printed to showcase this.

`endless_mode` is on by default, which means that once the generator has gone through the dataset it starts over from the beginning.

In [6]:
for i in range(4):
    xs, ys = next(batch_gen)
    print()
    print('Iteration:', i, '\tTargets:', ys)
    if db.end_of_dataset:
        print('We have reached the end of the dataset')


Iteration: 0 	Targets: [0 1 2 3 4]

Iteration: 1 	Targets: [5 6 7 8 9]
We have reached the end of the dataset

Iteration: 2 	Targets: [0 1 2 3 4]

Iteration: 3 	Targets: [5 6 7 8 9]
We have reached the end of the dataset
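
The behaviour seen above can be imitated with a few lines of plain Python. The sketch below is not how `pyxis` implements `batch_generator`; `endless_batches` is a hypothetical helper that works on an in-memory array and returns an explicit end-of-pass flag instead of setting `db.end_of_dataset`.

```python
import numpy as np

def endless_batches(targets, batch_size):
    """Yield consecutive slices of `targets`, restarting after each full pass."""
    nb_samples = len(targets)
    while True:  # endless: the generator never raises StopIteration
        for start in range(0, nb_samples, batch_size):
            end = min(start + batch_size, nb_samples)
            # The second element flags the last batch of the current pass.
            yield targets[start:end], end == nb_samples

targets = np.arange(10, dtype=np.uint8)
gen = endless_batches(targets, batch_size=5)
for i in range(4):
    ys, end_of_dataset = next(gen)
    print('Iteration:', i, '\tTargets:', ys, '\tEnd of dataset:', end_of_dataset)
```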


### Example 2 - Number of samples is not a multiple of the batch size

In [7]:
batch_gen = db.batch_generator(batch_size=3, shuffle=False)

The artificial dataset has `10` samples, so with a batch size of `3` it takes *four* iterations to go through the whole dataset. The targets of *six* batches are printed to showcase this.

Notice that the final batch of each pass contains only the remaining unseen samples, so it can be smaller than the batch size. Here it holds a single sample, `[9]`.

In [8]:
for i in range(6):
    xs, ys = next(batch_gen)
    print()
    print('Iteration:', i, '\tTargets:', ys)
    if db.end_of_dataset:
        print('We have reached the end of the dataset')


Iteration: 0 	Targets: [0 1 2]

Iteration: 1 	Targets: [3 4 5]

Iteration: 2 	Targets: [6 7 8]

Iteration: 3 	Targets: [9]
We have reached the end of the dataset

Iteration: 4 	Targets: [0 1 2]

Iteration: 5 	Targets: [3 4 5]
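
The number of iterations per pass and the size of the final batch follow directly from the sample count and the batch size. A minimal sketch of that arithmetic, where `batch_layout` is a hypothetical helper and not part of `pyxis`:

```python
import math

def batch_layout(nb_samples, batch_size):
    """Return (batches per pass, size of the final batch in a pass)."""
    batches_per_pass = math.ceil(nb_samples / batch_size)
    # The final batch holds the remainder, or a full batch when it divides evenly.
    last_batch_size = nb_samples % batch_size or batch_size
    return batches_per_pass, last_batch_size

print(batch_layout(10, 5))  # Example 1
print(batch_layout(10, 3))  # Example 2
```

For Example 1 this gives two full batches per pass; for Example 2 it gives four batches, the last of which holds one sample.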


### Example 3 - Shuffling of data

Until now we have created batches by reading samples from the dataset in the order they were written. With shuffling turned on, the samples are reshuffled each time we go through the dataset: every pass still visits each sample exactly once, but in a new order.

In [9]:
batch_gen = db.batch_generator(batch_size=5, shuffle=True)

In [10]:
for i in range(6):
    xs, ys = next(batch_gen)
    print()
    print('Iteration:', i, '\tTargets:', ys)
    if db.end_of_dataset:
        print('We have reached the end of the dataset')


Iteration: 0 	Targets: [0 2 5 9 8]

Iteration: 1 	Targets: [6 3 1 7 4]
We have reached the end of the dataset

Iteration: 2 	Targets: [4 6 1 9 7]

Iteration: 3 	Targets: [2 5 0 3 8]
We have reached the end of the dataset

Iteration: 4 	Targets: [4 5 1 9 2]

Iteration: 5 	Targets: [3 7 0 8 6]
We have reached the end of the dataset
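
One way to obtain this behaviour is to draw a fresh permutation of the sample indices at the start of every pass. The sketch below illustrates the idea on an in-memory array; `shuffled_batches` is a hypothetical helper, and the `pyxis` generator may be implemented differently.

```python
import numpy as np

def shuffled_batches(targets, batch_size, rng):
    """Yield batches endlessly, visiting every sample once per pass in random order."""
    nb_samples = len(targets)
    while True:
        order = rng.permutation(nb_samples)  # new random order for each pass
        for start in range(0, nb_samples, batch_size):
            yield targets[order[start:start + batch_size]]

rng = np.random.default_rng(0)
gen = shuffled_batches(np.arange(10, dtype=np.uint8), batch_size=5, rng=rng)
for i in range(4):
    print('Iteration:', i, '\tTargets:', next(gen))
```

Because each pass is a permutation, no target is repeated within a pass, which matches the output shown above.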


### Example 4 - Stochastic batch generator

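Batches can also be created stochastically. This means that the samples in each batch are drawn uniformly at random from the entire dataset, so the same sample can appear more than once in a batch and there is no notion of reaching the end of the dataset. Here we showcase *ten* different batches with a batch size of *five*.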

In [11]:
sbg = db.stochastic_batch_generator(batch_size=5)

In [12]:
for i in range(10):
    xs, ys = next(sbg)
    print('Iteration:', i, '\tTargets:', ys)

Iteration: 0 	Targets: [1 8 8 1 4]
Iteration: 1 	Targets: [4 0 0 5 2]
Iteration: 2 	Targets: [1 8 7 3 7]
Iteration: 3 	Targets: [7 4 5 9 0]
Iteration: 4 	Targets: [2 3 3 1 3]
Iteration: 5 	Targets: [4 5 0 4 9]
Iteration: 6 	Targets: [3 1 2 5 3]
Iteration: 7 	Targets: [3 9 5 3 2]
Iteration: 8 	Targets: [3 2 1 1 4]
Iteration: 9 	Targets: [5 3 6 4 7]
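
The repeated targets in the output above are what one would expect from sampling indices with replacement. A minimal sketch of that idea follows; `stochastic_batches` is a hypothetical helper and only an approximation of what `stochastic_batch_generator` does internally.

```python
import numpy as np

def stochastic_batches(targets, batch_size, rng):
    """Yield batches whose samples are drawn uniformly, with replacement."""
    nb_samples = len(targets)
    while True:
        # Each index is drawn independently, so duplicates are possible.
        idx = rng.integers(0, nb_samples, size=batch_size)
        yield targets[idx]

rng = np.random.default_rng(0)
gen = stochastic_batches(np.arange(10, dtype=np.uint8), batch_size=5, rng=rng)
for i in range(3):
    print('Iteration:', i, '\tTargets:', next(gen))
```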


## Close everything

We should make sure to close the LMDB environment once we are done reading.

In [13]:
db.close()
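
In longer scripts, a `try`/`finally` block is one way to make sure `close()` runs even if reading fails part-way through. The sketch below uses a hypothetical `StandInReader` class in place of a real reader so that it runs on its own; only the `close()` method matters for the pattern.

```python
class StandInReader:
    """Hypothetical stand-in for a reader object; only `close()` matters here."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

db = StandInReader()
try:
    try:
        # Simulate an error raised while batches are being read.
        raise RuntimeError('simulated failure while reading batches')
    finally:
        db.close()  # runs whether or not the body raised
except RuntimeError as err:
    print('Caught:', err)

print('Closed:', db.closed)
```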