In [1]:
import tensorflow as tf
import itertools as it
tf.enable_eager_execution()

In [2]:
print("TensorFlow version:", tf.__version__)

TensorFlow version: 1.11.0


## Basic Dataset function

Start with a dataset that yields the integers 0 through 6.

In [3]:
def get_initial_dataset(to_=7, from_=0):
    tensor = tf.range(from_, to_)
    ds = tf.data.Dataset.from_tensor_slices(tensor)
    return ds

In [4]:
ds = get_initial_dataset()
ds

<TensorSliceDataset shapes: (), types: tf.int32>

With eager execution enabled, a `tf.data.Dataset` can be iterated like any Python iterable. Without eager execution, create a `tf.data.Iterator` with `.make_one_shot_iterator()` and read elements through iterator APIs such as `.get_next()`.

In [5]:
list(it.islice(ds, 10))

[<tf.Tensor: id=7, shape=(), dtype=int32, numpy=0>,
 <tf.Tensor: id=8, shape=(), dtype=int32, numpy=1>,
 <tf.Tensor: id=9, shape=(), dtype=int32, numpy=2>,
 <tf.Tensor: id=10, shape=(), dtype=int32, numpy=3>,
 <tf.Tensor: id=11, shape=(), dtype=int32, numpy=4>,
 <tf.Tensor: id=12, shape=(), dtype=int32, numpy=5>,
 <tf.Tensor: id=13, shape=(), dtype=int32, numpy=6>]

## Using `Dataset.batch()`

`.batch()` groups consecutive dataset elements into fixed-size chunks.

In [6]:
ds = get_initial_dataset()
ds = ds.batch(2, drop_remainder=True)
ds

<BatchDataset shapes: (2,), types: tf.int32>

In [7]:
list(it.islice(ds, 5))

[<tf.Tensor: id=36, shape=(2,), dtype=int32, numpy=array([0, 1], dtype=int32)>,
 <tf.Tensor: id=37, shape=(2,), dtype=int32, numpy=array([2, 3], dtype=int32)>,
 <tf.Tensor: id=38, shape=(2,), dtype=int32, numpy=array([4, 5], dtype=int32)>]

Note that the last element (6) was dropped because of `drop_remainder=True`.
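To make the `drop_remainder` behavior concrete, here is a minimal pure-Python sketch of the same chunking logic. The helper `batch_list` is a hypothetical name used only for illustration; it operates on plain Python iterables and is not part of TensorFlow.

```python
from itertools import islice


def batch_list(items, batch_size, drop_remainder=False):
    """Group `items` into lists of `batch_size` (hypothetical helper, not a TensorFlow API)."""
    iterator = iter(items)
    batches = []
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            break
        # A short final chunk is kept only when drop_remainder is False.
        if len(chunk) < batch_size and drop_remainder:
            break
        batches.append(chunk)
    return batches


full_only = batch_list(range(7), 2, drop_remainder=True)
with_tail = batch_list(range(7), 2, drop_remainder=False)
print(full_only)  # [[0, 1], [2, 3], [4, 5]]
print(with_tail)  # [[0, 1], [2, 3], [4, 5], [6]]
```

With seven elements and a batch size of two, the only difference between the two settings is whether the trailing `[6]` survives.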

You can apply `.batch()` multiple times to obtain higher-dimensional elements.

In [8]:
ds = get_initial_dataset(20)
ds = ds.batch(2, drop_remainder=True).batch(3, drop_remainder=True)
ds

<BatchDataset shapes: (3, 2), types: tf.int32>

In [9]:
list(ds.take(2))

[<tf.Tensor: id=92, shape=(3, 2), dtype=int32, numpy=
 array([[0, 1],
        [2, 3],
        [4, 5]], dtype=int32)>,
 <tf.Tensor: id=93, shape=(3, 2), dtype=int32, numpy=
 array([[ 6,  7],
        [ 8,  9],
        [10, 11]], dtype=int32)>]
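The shape arithmetic of the nested batching can be checked with plain lists. This is only a sketch: `chunk` is a hypothetical helper that mimics `drop_remainder=True` by discarding a short tail, and it is not a TensorFlow function.

```python
def chunk(items, size):
    """Split `items` into full chunks of `size`, discarding a short tail.

    Hypothetical helper that mimics batch(size, drop_remainder=True).
    """
    full_chunks = len(items) // size
    return [items[i * size:(i + 1) * size] for i in range(full_chunks)]


pairs = chunk(list(range(20)), 2)  # 20 integers -> 10 pairs
blocks = chunk(pairs, 3)           # 10 pairs -> 3 blocks of 3 pairs, 1 pair discarded

print(len(pairs), len(blocks))  # 10 3
print(blocks[0])                # [[0, 1], [2, 3], [4, 5]]
print(blocks[-1])               # [[12, 13], [14, 15], [16, 17]]
```

Under these assumptions, the twenty integers yield three elements of shape `(3, 2)`; the pair `[18, 19]` does not fill a complete block and is discarded by the second batching step.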

## Tuple element

Passing a tuple of tensors to `.from_tensor_slices()` produces a dataset whose elements are tuples.

**[NOTE]** All tensors in the tuple must have the same size in their first dimension.

In [10]:
tuple_of_tensors = (tf.range(10), tf.range(10, 20))
ds = tf.data.Dataset.from_tensor_slices(tuple_of_tensors)
ds

<TensorSliceDataset shapes: ((), ()), types: (tf.int32, tf.int32)>

In [11]:
list(ds.take(1))

[(<tf.Tensor: id=103, shape=(), dtype=int32, numpy=0>,
  <tf.Tensor: id=104, shape=(), dtype=int32, numpy=10>)]

`.batch()` batches each component of the tuple separately, so every element of the batched dataset is still a tuple.

In [12]:
ds = ds.batch(2)
list(ds.take(1))

[(<tf.Tensor: id=123, shape=(2,), dtype=int32, numpy=array([0, 1], dtype=int32)>,
  <tf.Tensor: id=124, shape=(2,), dtype=int32, numpy=array([10, 11], dtype=int32)>)]
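One way to picture component-wise batching is as "group, then transpose". The sketch below reproduces the first batch above with plain lists; `batch_tuples` is a hypothetical helper for illustration and says nothing about how TensorFlow implements batching internally.

```python
xs = list(range(10))
ys = list(range(10, 20))
elements = list(zip(xs, ys))  # [(0, 10), (1, 11), ...], one tuple per element


def batch_tuples(elements, batch_size):
    """Batch a list of tuples component-wise (hypothetical helper, not a TensorFlow API)."""
    batches = []
    for start in range(0, len(elements), batch_size):
        group = elements[start:start + batch_size]
        # Transpose the group so that each tuple component is collected separately.
        batches.append(tuple(list(component) for component in zip(*group)))
    return batches


batched = batch_tuples(elements, 2)
print(batched[0])  # ([0, 1], [10, 11])
```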

`.map()` passes the components of each tuple element to the function as positional arguments.

In [13]:
tuple_of_tensors = (tf.range(10), tf.range(10, 20))
ds = tf.data.Dataset.from_tensor_slices(tuple_of_tensors)
ds = ds.map(lambda x,y: (x + y, x * y))
ds = ds.batch(3)

list(ds.take(1))

[(<tf.Tensor: id=150, shape=(3,), dtype=int32, numpy=array([10, 12, 14], dtype=int32)>,
  <tf.Tensor: id=151, shape=(3,), dtype=int32, numpy=array([ 0, 11, 24], dtype=int32)>)]

Note that the analogous operation on a Python list of tuples fails, because the built-in `map` passes each tuple to the function as a single argument.

In [14]:
tuples = [(1,2), (3,4)]
try:
    list(map(lambda x,y: (x + y, x * y), tuples))
    print("Works in raw Python!")
except TypeError as e:
    print("Fails with error message:", e)

Fails with error message: <lambda>() missing 1 required positional argument: 'y'
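For comparison, plain Python can get the same argument unpacking with `itertools.starmap` or with explicit unpacking in a comprehension. This sketch uses only the standard library and is shown as a point of reference, not as something the dataset API relies on.

```python
from itertools import starmap

tuples = [(1, 2), (3, 4)]

# starmap unpacks each tuple into positional arguments before calling the function.
via_starmap = list(starmap(lambda x, y: (x + y, x * y), tuples))

# The same result with explicit unpacking in a list comprehension.
via_unpacking = [(x + y, x * y) for x, y in tuples]

print(via_starmap)    # [(3, 2), (7, 12)]
print(via_unpacking)  # [(3, 2), (7, 12)]
```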


## Dict element

Similarly to tuples, a dataset handles a dict of tensors: each element is a dict with the same keys.

In [15]:
d = {"x": tf.range(10), "y": tf.range(10, 20)}
ds = tf.data.Dataset.from_tensor_slices(d)
ds = ds.batch(3)

In [16]:
list(ds.take(1))

[{'x': <tf.Tensor: id=171, shape=(3,), dtype=int32, numpy=array([0, 1, 2], dtype=int32)>,
  'y': <tf.Tensor: id=172, shape=(3,), dtype=int32, numpy=array([10, 11, 12], dtype=int32)>}]
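The same idea can be sketched with a dict of plain lists: every batch is a dict with the original keys, and each value is the matching slice of its column. `batch_dict` is a hypothetical helper for illustration, not a TensorFlow function, and it keeps the short final batch because the cell above does not set `drop_remainder=True`.

```python
d = {"x": list(range(10)), "y": list(range(10, 20))}


def batch_dict(columns, batch_size):
    """Slice every column of a dict of equal-length lists (hypothetical helper)."""
    length = len(next(iter(columns.values())))
    return [
        {key: values[start:start + batch_size] for key, values in columns.items()}
        for start in range(0, length, batch_size)
    ]


batches = batch_dict(d, 3)
print(batches[0])   # {'x': [0, 1, 2], 'y': [10, 11, 12]}
print(len(batches)) # 4
print(batches[-1])  # {'x': [9], 'y': [19]}
```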

## Sliding window

In [21]:
ds = get_initial_dataset(100)
ds = ds.apply(tf.contrib.data.sliding_window_batch(4))
ds = ds.batch(3, drop_remainder=True)

In [23]:
list(ds.take(2))

[<tf.Tensor: id=228, shape=(3, 4), dtype=int32, numpy=
 array([[0, 1, 2, 3],
        [1, 2, 3, 4],
        [2, 3, 4, 5]], dtype=int32)>,
 <tf.Tensor: id=229, shape=(3, 4), dtype=int32, numpy=
 array([[3, 4, 5, 6],
        [4, 5, 6, 7],
        [5, 6, 7, 8]], dtype=int32)>]
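The output above shows windows of four consecutive integers, each shifted by one element from the previous window, which are then grouped three at a time. A minimal pure-Python sketch of that behavior follows; it assumes a shift of one element between windows, as the output suggests, and `sliding_windows` is a hypothetical helper rather than a TensorFlow API.

```python
from collections import deque


def sliding_windows(items, window_size):
    """Yield overlapping windows that advance by one element (hypothetical helper)."""
    window = deque(maxlen=window_size)
    for item in items:
        window.append(item)  # the oldest item is evicted once the deque is full
        if len(window) == window_size:
            yield list(window)


windows = list(sliding_windows(range(100), 4))

# Group the windows three at a time, like .batch(3, drop_remainder=True).
first_batch = windows[0:3]
second_batch = windows[3:6]

print(first_batch)   # [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]]
print(second_batch)  # [[3, 4, 5, 6], [4, 5, 6, 7], [5, 6, 7, 8]]
```

Because consecutive windows overlap, the second batch starts at 3 rather than at 12: batching groups whole windows, not the underlying integers.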