In this notebook I'll demonstrate how to use `tf.data.Dataset` to implement Curriculum Learning used in [Language Generation with Recurrent Generative Adversarial Networks without Pre-training](https://arxiv.org/pdf/1706.01399.pdf):

> **Curriculum Learning (CL)**: In this extension,
we start by training on short sequences and then
slowly increase sequence length. In the first training
stage, the generator G generates sequences of
length 1, and the discriminator D receives real and
generated sequences of length 1 as input. Then,
the generator generates sequences of length 2 and
the discriminator receives sequences of length 2.
We increase sequence length in this manner until
the maximum length of 32 characters.
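
To make the schedule concrete, here is a minimal pure-Python sketch of the phase order described in the quote. The name `curriculum_schedule` and the `steps_per_phase` parameter are my own assumptions for illustration; the quoted passage doesn't say how long each phase lasts.

```python
def curriculum_schedule(max_length=32, steps_per_phase=2):
    """Yield (sequence_length, step) pairs, shortest sequences first.

    steps_per_phase is a hypothetical knob: the quoted passage does not
    specify how many training steps each phase takes.
    """
    for sequence_length in range(1, max_length + 1):
        for step in range(steps_per_phase):
            yield sequence_length, step


# A tiny schedule with a maximum length of 3 instead of 32
schedule = list(curriculum_schedule(max_length=3, steps_per_phase=2))
print(schedule)
```

In a real training loop, each `(sequence_length, step)` pair would correspond to one update of G and D on sequences of that length.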

In [1]:
import numpy as np
import tensorflow as tf

tf.set_random_seed(42)

sess = tf.Session()


In this example we're going to use `tf.data.Dataset.from_generator`, which lets us produce the data with a plain Python generator.

In [2]:
dataset_filename = './datasets example.ipynb'
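with open(dataset_filename, 'r') as f:
    # note: readlines() keeps each line's trailing newline, so joining with '\n'
    # doubles every line break (hence the blank lines in the printed sample below)
    dataset_text = '\n'.join(f.readlines())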
    
    
def generate_generator(length):
    '''
    In Curriculum Learning we use multiple phases - each phase generates sequences of different length.
    This function returns a function that when called returns a generator.
    The generator can generate sequences of the given length.
    Note that the returned function is exactly what's expected by tf.data.Dataset.from_generator.
    '''
    def generate_text():
        start_index = 0
        # use <= so the final window (the one ending at the last character) is included
        while start_index <= len(dataset_text) - length:
            yield dataset_text[start_index:start_index + length]
            start_index += 1
    return generate_text

`generate_generator` is vanilla Python code; no TensorFlow is involved.

We can easily see what it gives us:

In [3]:
generator = generate_generator(length=50)()
print(next(generator))

{

 "cells": [

  {

   "cell_type": "markdown",
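
The sliding-window behaviour is easier to see on a tiny string. The sketch below is a standalone illustration (the helper name `sliding_windows` is hypothetical, not part of the notebook); it yields every contiguous window of the requested length, including the one that ends at the last character.

```python
def sliding_windows(text, length):
    """Yield every contiguous substring of `text` that has exactly `length` characters."""
    # a text of n characters has n - length + 1 such windows (none if length > n)
    for start_index in range(len(text) - length + 1):
        yield text[start_index:start_index + length]


windows = list(sliding_windows('abcde', 3))
print(windows)  # ['abc', 'bcd', 'cde']
```

Consecutive windows overlap in all but one character, which is why the datasets are shuffled before batching.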




In [4]:
MAX_LENGTH = 32
BATCH_SIZE = 2
EPOCHS = 20

In [5]:
datasets = [
    # create a dataset out of a generator that generates sequences with `length` characters
    tf.data.Dataset.from_generator(generate_generator(length), tf.string)
    
    # map the characters into numbers (uint8 keeps byte values in 0-255; int8 would turn bytes above 127 negative)
    .map(lambda t: tf.decode_raw(t, tf.uint8))
    
    # shuffle the data (should be done only for the training dataset)
    .shuffle(buffer_size=1000, reshuffle_each_iteration=True)
    
    # repeat the data so we can train for EPOCHS epochs
    .repeat(EPOCHS)
    
    # group the examples into mini-batches of BATCH_SIZE (for mini-batch gradient descent)
    .batch(BATCH_SIZE)
    
    # create a dataset for every phase in the Curriculum Learning
    for length in range(1, MAX_LENGTH + 1)
]

# create a handle which will choose which dataset to use (i.e. the current phase)
handle = tf.placeholder(tf.string, shape=[], name='handle')

# create the iterator which will be used by the model
iterator = tf.data.Iterator.from_string_handle(
    # the iterator will use the underlying iterator referenced by handle.
    # remember that handle is a placeholder, so at runtime we'll decide which actual dataset to use.
    string_handle=handle,
    
    # we must tell the iterator what types to expect.
    # all the datasets have the same types.
    output_types=datasets[0].output_types
)

# create a tensor that when evaluated will return the
# next element of the dataset (whichever dataset the handle chooses)
value = iterator.get_next()

# for every dataset create an iterator
iterators = [dataset.make_one_shot_iterator() for dataset in datasets]

# evaluate each iterator's string handle; any of them can then be fed into the `handle` placeholder created above.
# (the loop variable is named `it` so it doesn't overwrite `iterator`: in Python 2, comprehension variables leak)
handles = sess.run([it.string_handle() for it in iterators])
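
If the handle mechanism feels abstract, a plain-Python analogy may help. This is only a sketch of the idea, not how TensorFlow implements handles, and all the names in it (`make_phase_iterators`, `get_next`, `phase_handle`) are hypothetical: we keep one independent iterator per phase and pick one by index each time we ask for the next element.

```python
def make_phase_iterators(text, max_length):
    """Build one independent iterator per phase; phase index p yields windows of p + 1 characters."""
    def windows(length):
        for start in range(len(text) - length + 1):
            yield text[start:start + length]
    return [windows(length) for length in range(1, max_length + 1)]


def get_next(phase_iterators, phase_handle):
    """Return the next element of whichever iterator `phase_handle` selects."""
    return next(phase_iterators[phase_handle])


phase_iterators = make_phase_iterators('abcdef', max_length=3)
first = get_next(phase_iterators, 2)   # take from the length-3 phase
second = get_next(phase_iterators, 0)  # switch to the length-1 phase
third = get_next(phase_iterators, 2)   # the length-3 phase resumes where it left off
print('{} {} {}'.format(first, second, third))  # abc a bcd
```

The point of the analogy is that every phase keeps its own position, so switching the handle doesn't reset or consume any other phase's data.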

Let's say we've trained for a while and have reached the last phase of the Curriculum Learning:

In [6]:
phase = MAX_LENGTH

Let's peek into `value` to see what's going to be fed to the model:

In [7]:
for i, example in enumerate(sess.run(value, {handle: handles[phase - 1]})):
    print('example #{}:'.format(i))
    print('-----------')
    print(example)
    print(''.join(map(chr, example)))
    print('\n\n')

example #0:
-----------
[100 105 115  99 114 105 109 105 110  97 116 111 114  32 114 101  99 101
 105 118 101 115  32 115 101 113 117 101 110  99 101 115]
discriminator receives sequences



example #1:
-----------
[ 32  67 117 114 114 105  99 117 108 117 109  32  76 101  97 114 110 105
 110 103  32 117 115 101 100  32 105 110  32  91  76  97]
 Curriculum Learning used in [La
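
The integer arrays printed above are just the byte values of the characters. Here is a minimal pure-Python sketch of that round trip, assuming ASCII text; the helper names `encode` and `decode` are mine, not TensorFlow's.

```python
def encode(text):
    """Map an ASCII string to a list of byte values, one per character."""
    return list(bytearray(text.encode('ascii')))


def decode(codes):
    """Map a sequence of byte values back to a string."""
    return ''.join(map(chr, codes))


codes = encode('discriminator')
print(codes)          # [100, 105, 115, 99, 114, 105, 109, 105, 110, 97, 116, 111, 114]
print(decode(codes))  # discriminator
```

These are the same leading values as in example #0 above, which starts with the word "discriminator".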



