# Anna KaRNNa

In this notebook, I'll build a character-wise RNN trained on Anna Karenina, one of my all-time favorite books. It'll be able to generate new text based on the text from the book.

This network is based on Andrej Karpathy's [post on RNNs](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) and [implementation in Torch](https://github.com/karpathy/char-rnn). I also drew on material [at r2rt](http://r2rt.com/recurrent-neural-networks-in-tensorflow-ii.html) and from [Sherjil Ozair](https://github.com/sherjilozair/char-rnn-tensorflow) on GitHub. Below is the general architecture of the character-wise RNN.

<img src="assets/charseq.jpeg" width="500">

In [1]:
import time
from collections import namedtuple

import numpy as np
import tensorflow as tf

First we'll load the text file and convert it into integers for our network to use.

In [2]:
with open('anna.txt', 'r') as f:
    text = f.read()
vocab = set(text)
vocab_to_int = {c: i for i, c in enumerate(vocab)}
int_to_vocab = dict(enumerate(vocab))
chars = np.array([vocab_to_int[c] for c in text], dtype=np.int32)

In [3]:
text[:100]

'Chapter 1\n\n\nHappy families are all alike; every unhappy family is unhappy in its own\nway.\n\nEverythin'

In [4]:
chars[:100]

array([46, 36, 35, 62, 17, 42, 15, 38, 34, 65, 65, 65, 60, 35, 62, 62, 39,
       38, 55, 35, 49, 24, 44, 24, 42, 19, 38, 35, 15, 42, 38, 35, 44, 44,
       38, 35, 44, 24, 51, 42, 74, 38, 42, 70, 42, 15, 39, 38, 47, 23, 36,
       35, 62, 62, 39, 38, 55, 35, 49, 24, 44, 39, 38, 24, 19, 38, 47, 23,
       36, 35, 62, 62, 39, 38, 24, 23, 38, 24, 17, 19, 38, 82, 32, 23, 65,
       32, 35, 39, 64, 65, 65, 25, 70, 42, 15, 39, 17, 36, 24, 23],
      dtype=int32)
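
As a quick sanity check on the encoding above, here is a minimal sketch of a round trip on a short string. The `sorted(...)` call is my own variation, not what the cell above does: iterating over a `set` of characters is not guaranteed to give the same order in a new Python process, so a sorted vocabulary is one way to keep the mapping stable if checkpoints are restored in a later session.

```python
import numpy as np

# A short stand-in for the book text
text = "Happy families are all alike"

# Assumption: sort the characters so the mapping is the same on every run
vocab = sorted(set(text))
vocab_to_int = {c: i for i, c in enumerate(vocab)}
int_to_vocab = dict(enumerate(vocab))

# Encode to integers, then decode back to characters
encoded = np.array([vocab_to_int[c] for c in text], dtype=np.int32)
decoded = ''.join(int_to_vocab[int(i)] for i in encoded)

print(encoded[:10])
print(decoded == text)
```

If the decoded string matches the original, the two dictionaries are consistent inverses of each other.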

Now I need to split up the data into batches, and into training and validation sets. I should be making a test set here, but I'm not going to worry about that. My test will be if the network can generate new text.

Here I'll make both input and target arrays. The targets are the same as the inputs, except shifted one character over. I'll also drop the last bit of data so that I'll only have completely full batches.

The idea here is to make a 2D matrix where the number of rows is equal to the batch size, that is, the number of sequences in each batch. Each row will be one long contiguous slice of the character data. We'll split this data into a training set and a validation set using the `split_frac` keyword. With the default of 0.9, this keeps 90% of the batches in the training set and the other 10% in the validation set.

In [5]:
def split_data(chars, batch_size, num_steps, split_frac=0.9):
    """ 
    Split character data into training and validation sets, inputs and targets for each set.
    
    Arguments
    ---------
    chars: character array
    batch_size: Number of sequences in each batch
    num_steps: Number of sequence steps to keep in the input and pass to the network
    split_frac: Fraction of batches to keep in the training set
    
    
    Returns train_x, train_y, val_x, val_y
    """
    
    
    slice_size = batch_size * num_steps
    n_batches = int(len(chars) / slice_size)
    
    # Drop the last few characters to make only full batches
    x = chars[: n_batches*slice_size]
    y = chars[1: n_batches*slice_size + 1]
    
    # Split the data into batch_size slices, then stack them into a 2D matrix 
    x = np.stack(np.split(x, batch_size))
    y = np.stack(np.split(y, batch_size))
    
    # Now x and y are arrays with dimensions batch_size x n_batches*num_steps
    
    # Split into training and validation sets, keep the first split_frac batches for training
    split_idx = int(n_batches*split_frac)
    train_x, train_y = x[:, :split_idx*num_steps], y[:, :split_idx*num_steps]
    val_x, val_y = x[:, split_idx*num_steps:], y[:, split_idx*num_steps:]
    
    return train_x, train_y, val_x, val_y

In [6]:
train_x, train_y, val_x, val_y = split_data(chars, 10, 200)

In [7]:
train_x.shape

(10, 178400)

In [8]:
train_x[:,:10]

array([[46, 36, 35, 62, 17, 42, 15, 38, 34, 65],
       [ 7, 23, 18, 38, 36, 42, 38, 49, 82, 70],
       [38,  2, 35, 17,  2, 36, 24, 23, 69, 38],
       [82, 17, 36, 42, 15, 38, 32, 82, 47, 44],
       [38, 17, 36, 42, 38, 44, 35, 23, 18, 37],
       [38, 52, 36, 15, 82, 47, 69, 36, 38, 44],
       [17, 38, 17, 82, 65, 18, 82, 64, 65, 65],
       [82, 38, 36, 42, 15, 19, 42, 44, 55, 57],
       [36, 35, 17, 38, 24, 19, 38, 17, 36, 42],
       [42, 15, 19, 42, 44, 55, 38, 35, 23, 18]], dtype=int32)
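
To see the layout more concretely, here is a small sketch that applies the same reshaping steps to a toy array of consecutive integers. `toy_split` is a hypothetical helper that mirrors only the input/target part of `split_data`, with the train/validation split left out.

```python
import numpy as np

def toy_split(chars, batch_size, num_steps):
    """Hypothetical helper: build inputs and shifted targets as 2D arrays."""
    slice_size = batch_size * num_steps
    n_batches = len(chars) // slice_size

    # Keep only full batches; targets are the inputs shifted by one character
    x = chars[: n_batches * slice_size]
    y = chars[1: n_batches * slice_size + 1]

    # One row per sequence in the batch
    x = np.stack(np.split(x, batch_size))
    y = np.stack(np.split(y, batch_size))
    return x, y

toy = np.arange(25)
x, y = toy_split(toy, batch_size=2, num_steps=3)
print(x.shape)
print(x)
print(y)
```

Each row of `x` is a contiguous run of the data, and every entry of `y` is the character that follows the matching entry of `x`. Note that the shift needs one extra character after the last full batch, so this sketch (like the slicing it copies) assumes the data length is not an exact multiple of `batch_size * num_steps`.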

I'll write another function to grab batches out of the arrays made by `split_data`. Here each batch will be a sliding window on these arrays with size `batch_size x num_steps`. For example, if we want our network to train on a sequence of 100 characters, `num_steps = 100`. For the next batch, we'll shift this window to the next sequence of `num_steps` characters. In this way we can feed batches to the network and the cell states will carry over from one batch to the next.

In [9]:
def get_batch(arrs, num_steps):
    batch_size, slice_size = arrs[0].shape
    
    n_batches = slice_size // num_steps
    for b in range(n_batches):
        yield [x[:, b*num_steps: (b+1)*num_steps] for x in arrs]
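
Here's a small sketch of what the generator yields on toy data. The function is repeated so the example runs on its own, and the arrays are made up for illustration.

```python
import numpy as np

def get_batch(arrs, num_steps):
    # Same windowing logic as the cell above, repeated for a standalone example
    batch_size, slice_size = arrs[0].shape
    n_batches = slice_size // num_steps
    for b in range(n_batches):
        yield [x[:, b*num_steps: (b+1)*num_steps] for x in arrs]

# Two sequences of 12 steps each; targets are the inputs shifted by one
x = np.arange(24).reshape(2, 12)
y = x + 1

batches = list(get_batch([x, y], num_steps=3))
print(len(batches))
for bx, by in batches[:2]:
    print(bx)
    print(by)
```

Row 0 of the second batch picks up exactly where row 0 of the first batch stopped, which is what makes it reasonable to carry the cell state from one batch to the next.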

In [13]:
def build_rnn(num_classes, batch_size=50, num_steps=50, lstm_size=128, num_layers=2,
              learning_rate=0.001, grad_clip=5, sampling=False):
        
    if sampling:
        batch_size, num_steps = 1, 1

    tf.reset_default_graph()
    
    # Declare placeholders we'll feed into the graph
    with tf.name_scope('inputs'):
        inputs = tf.placeholder(tf.int32, [batch_size, num_steps], name='inputs')
        x_one_hot = tf.one_hot(inputs, num_classes, name='x_one_hot')
    
    with tf.name_scope('targets'):
        targets = tf.placeholder(tf.int32, [batch_size, num_steps], name='targets')
        y_one_hot = tf.one_hot(targets, num_classes, name='y_one_hot')
        y_reshaped = tf.reshape(y_one_hot, [-1, num_classes])
    
    keep_prob = tf.placeholder(tf.float32, name='keep_prob')
    
    # Build the RNN layers
    with tf.name_scope("RNN_layers"):
        def make_cell():
            lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
            drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
            return drop
        cell = tf.contrib.rnn.MultiRNNCell([make_cell() for _ in range(num_layers)])
    
    with tf.name_scope("RNN_init_state"):
        initial_state = cell.zero_state(batch_size, tf.float32)

    # Run the data through the RNN layers
    with tf.name_scope("RNN_forward"):
        outputs, state = tf.nn.dynamic_rnn(cell, x_one_hot, initial_state=initial_state)
    
    final_state = state
    
    # Reshape output so it's a bunch of rows, one row for each cell output
    with tf.name_scope('sequence_reshape'):
        seq_output = tf.concat(outputs, axis=1, name='seq_output')
        output = tf.reshape(seq_output, [-1, lstm_size], name='graph_output')
    
    # Now connect the RNN outputs to a softmax layer and calculate the cost
    with tf.name_scope('logits'):
        softmax_w = tf.Variable(tf.truncated_normal((lstm_size, num_classes), stddev=0.1),
                               name='softmax_w')
        softmax_b = tf.Variable(tf.zeros(num_classes), name='softmax_b')
        logits = tf.matmul(output, softmax_w) + softmax_b

    with tf.name_scope('predictions'):
        preds = tf.nn.softmax(logits, name='predictions')
    
    
    with tf.name_scope('cost'):
        loss = tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits, labels=y_reshaped, name='loss')
        cost = tf.reduce_mean(loss, name='cost')

    # Optimizer for training, using gradient clipping to control exploding gradients
    with tf.name_scope('train'):
        tvars = tf.trainable_variables()
        grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars), grad_clip)
        train_op = tf.train.AdamOptimizer(learning_rate)
        optimizer = train_op.apply_gradients(zip(grads, tvars))
    
    # Export the nodes 
    export_nodes = ['inputs', 'targets', 'initial_state', 'final_state',
                    'keep_prob', 'cost', 'preds', 'optimizer']
    Graph = namedtuple('Graph', export_nodes)
    local_dict = locals()
    graph = Graph(*[local_dict[each] for each in export_nodes])
    
    return graph
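
The gradient clipping step in the `train` scope rescales all gradients together whenever their combined norm exceeds `grad_clip`. Here is a minimal NumPy sketch of that rule. The helper name `clip_by_global_norm_np` is hypothetical, and this is an illustration of the idea rather than TensorFlow's implementation.

```python
import numpy as np

def clip_by_global_norm_np(grads, clip_norm):
    """Hypothetical helper: scale all gradients by one shared factor."""
    # Norm of all gradients treated as one long vector
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    # The factor is 1 when the norm is already within the limit
    scale = clip_norm / max(global_norm, clip_norm)
    return [g * scale for g in grads], global_norm

grads = [np.array([3.0, 4.0]), np.array([12.0])]
clipped, norm_before = clip_by_global_norm_np(grads, clip_norm=5.0)
norm_after = np.sqrt(sum(np.sum(g ** 2) for g in clipped))

print(norm_before, norm_after)
```

Because every gradient is multiplied by the same factor, the direction of the update is preserved and only its length is capped.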

## Hyperparameters

Here I'm defining the hyperparameters for the network. The two you probably haven't seen before are `lstm_size` and `num_layers`. These set the number of hidden units in the LSTM layers and the number of LSTM layers, respectively. Making these bigger gives the network more capacity, which usually improves how well it fits the training data, but you'll have to watch out for overfitting. If your validation loss is much larger than the training loss, you're probably overfitting. In that case, decrease the size of the network or decrease the dropout keep probability.

In [11]:
batch_size = 100
num_steps = 100
lstm_size = 512
num_layers = 2
learning_rate = 0.001
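
To get a feel for how `lstm_size` and `num_layers` drive the size of the network, here's a rough parameter count. This is a sketch under a few assumptions: the standard LSTM layout with four gates, each with a weight matrix over the concatenated input and hidden state plus a bias, and a vocabulary of 83 characters, which is a placeholder value rather than something measured from `anna.txt`. The helper names are hypothetical.

```python
def lstm_layer_params(input_size, lstm_size):
    """Weights and biases for one LSTM layer with four gates."""
    return 4 * ((input_size + lstm_size) * lstm_size + lstm_size)

def count_params(vocab_size, lstm_size, num_layers):
    """Hypothetical helper: stacked LSTM layers plus the softmax layer."""
    total = 0
    input_size = vocab_size  # the first layer sees one-hot characters
    for _ in range(num_layers):
        total += lstm_layer_params(input_size, lstm_size)
        input_size = lstm_size  # later layers see the previous layer's output
    # Softmax weights and biases
    total += lstm_size * vocab_size + vocab_size
    return total

vocab_size = 83  # assumption, for illustration only
for size in (128, 256, 512):
    print(size, count_params(vocab_size, size, num_layers=2))
```

Doubling `lstm_size` roughly quadruples the count, since the recurrent weights grow with the square of the hidden size.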

## Write out the graph for TensorBoard

In [14]:
model = build_rnn(len(vocab), 
                  batch_size=batch_size,
                  num_steps=num_steps,
                  learning_rate=learning_rate,
                  lstm_size=lstm_size,
                  num_layers=num_layers)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Write the graph definition so TensorBoard can display it
    file_writer = tf.summary.FileWriter('./logs/3', sess.graph)
    file_writer.close()

## Training

Time for training, which is pretty straightforward. Here I pass in some data and get an LSTM state back. Then I pass that state back into the network so the next batch can continue the state from the previous batch. Every so often (set by `save_every_n`) I calculate the validation loss and save a checkpoint.
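
Why carry the state across batches? Because consecutive batches hold consecutive chunks of text, feeding the final state back in is equivalent to running the network over one long sequence. Here's a minimal sketch with a plain tanh RNN in NumPy standing in for the LSTM; the weights and names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, features = 4, 3

# Made-up weights for a toy tanh RNN (not the LSTM used in this notebook)
W_x = rng.normal(size=(features, hidden)) * 0.5
W_h = rng.normal(size=(hidden, hidden)) * 0.5

def run_rnn(inputs, state):
    """Step through the inputs one at a time and return the final state."""
    for x_t in inputs:
        state = np.tanh(x_t @ W_x + state @ W_h)
    return state

sequence = rng.normal(size=(12, features))
zero_state = np.zeros(hidden)

# Process the whole sequence in one go
full = run_rnn(sequence, zero_state)

# Process it in chunks of 3 steps, passing the state along each time
state = zero_state
for start in range(0, 12, 3):
    state = run_rnn(sequence[start:start + 3], state)

print(np.allclose(full, state))
```

If the state were reset to zeros at every chunk instead, the network would only ever see `num_steps` characters of context.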

In [15]:
!mkdir -p checkpoints/anna

In [16]:
epochs = 10
save_every_n = 200
train_x, train_y, val_x, val_y = split_data(chars, batch_size, num_steps)

model = build_rnn(len(vocab), 
                  batch_size=batch_size,
                  num_steps=num_steps,
                  learning_rate=learning_rate,
                  lstm_size=lstm_size,
                  num_layers=num_layers)

saver = tf.train.Saver(max_to_keep=100)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    
    # Use the line below to load a checkpoint and resume training
    #saver.restore(sess, 'checkpoints/anna20.ckpt')
    
    n_batches = int(train_x.shape[1]/num_steps)
    iterations = n_batches * epochs
    for e in range(epochs):
        
        # Train network
        new_state = sess.run(model.initial_state)
        loss = 0
        for b, (x, y) in enumerate(get_batch([train_x, train_y], num_steps), 1):
            iteration = e*n_batches + b
            start = time.time()
            feed = {model.inputs: x,
                    model.targets: y,
                    model.keep_prob: 0.5,
                    model.initial_state: new_state}
            batch_loss, new_state, _ = sess.run([model.cost, model.final_state, model.optimizer], 
                                                 feed_dict=feed)
            loss += batch_loss
            end = time.time()
            print('Epoch {}/{} '.format(e+1, epochs),
                  'Iteration {}/{}'.format(iteration, iterations),
                  'Training loss: {:.4f}'.format(loss/b),
                  '{:.4f} sec/batch'.format((end-start)))
        
            
            if (iteration%save_every_n == 0) or (iteration == iterations):
                # Check performance, notice the dropout keep probability has been set to 1
                # Use a separate state so validation doesn't overwrite the training state
                val_loss = []
                val_state = sess.run(model.initial_state)
                for x, y in get_batch([val_x, val_y], num_steps):
                    feed = {model.inputs: x,
                            model.targets: y,
                            model.keep_prob: 1.,
                            model.initial_state: val_state}
                    batch_loss, val_state = sess.run([model.cost, model.final_state], feed_dict=feed)
                    val_loss.append(batch_loss)

                print('Validation loss:', np.mean(val_loss),
                      'Saving checkpoint!')
                saver.save(sess, "checkpoints/anna/i{}_l{}_{:.3f}.ckpt".format(iteration, lstm_size, np.mean(val_loss)))

Epoch 1/10  Iteration 1/1780 Training loss: 4.4193 6.8299 sec/batch
Epoch 1/10  Iteration 2/1780 Training loss: 4.3753 6.5096 sec/batch
Epoch 1/10  Iteration 3/1780 Training loss: 4.1998 6.2006 sec/batch
Epoch 1/10  Iteration 4/1780 Training loss: 4.5473 6.0518 sec/batch
Epoch 1/10  Iteration 5/1780 Training loss: 4.5133 5.6488 sec/batch
Epoch 1/10  Iteration 6/1780 Training loss: 4.4152 5.3280 sec/batch
Epoch 1/10  Iteration 7/1780 Training loss: 4.3178 5.4348 sec/batch
Epoch 1/10  Iteration 8/1780 Training loss: 4.2294 5.6318 sec/batch
Epoch 1/10  Iteration 9/1780 Training loss: 4.1510 5.5955 sec/batch
Epoch 1/10  Iteration 10/1780 Training loss: 4.0843 6.0905 sec/batch
Epoch 1/10  Iteration 11/1780 Training loss: 4.0236 6.3510 sec/batch
Epoch 1/10  Iteration 12/1780 Training loss: 3.9709 6.5256 sec/batch
Epoch 1/10  Iteration 13/1780 Training loss: 3.9233 5.6860 sec/batch
Epoch 1/10  Iteration 14/1780 Training loss: 3.8825 6.0242 sec/batch
Epoch 1/10  Iteration 15/1780 Training loss

Epoch 1/10  Iteration 120/1780 Training loss: 3.2297 8.2370 sec/batch
Epoch 1/10  Iteration 121/1780 Training loss: 3.2295 8.0131 sec/batch
Epoch 1/10  Iteration 122/1780 Training loss: 3.2296 7.8781 sec/batch
Epoch 1/10  Iteration 123/1780 Training loss: 3.2295 8.2653 sec/batch
Epoch 1/10  Iteration 124/1780 Training loss: 3.2291 8.3982 sec/batch
Epoch 1/10  Iteration 125/1780 Training loss: 3.2279 7.8353 sec/batch
Epoch 1/10  Iteration 126/1780 Training loss: 3.2259 8.1576 sec/batch
Epoch 1/10  Iteration 127/1780 Training loss: 3.2240 7.9695 sec/batch
Epoch 1/10  Iteration 128/1780 Training loss: 3.2221 7.8218 sec/batch
Epoch 1/10  Iteration 129/1780 Training loss: 3.2201 8.0262 sec/batch
Epoch 1/10  Iteration 130/1780 Training loss: 3.2182 7.9855 sec/batch
Epoch 1/10  Iteration 131/1780 Training loss: 3.2162 7.8664 sec/batch
Epoch 1/10  Iteration 132/1780 Training loss: 3.2141 7.9613 sec/batch
Epoch 1/10  Iteration 133/1780 Training loss: 3.2126 8.0132 sec/batch
Epoch 1/10  Iteratio

Epoch 2/10  Iteration 237/1780 Training loss: 2.4981 8.3369 sec/batch
Epoch 2/10  Iteration 238/1780 Training loss: 2.4967 7.8721 sec/batch
Epoch 2/10  Iteration 239/1780 Training loss: 2.4951 8.0669 sec/batch
Epoch 2/10  Iteration 240/1780 Training loss: 2.4937 8.0392 sec/batch
Epoch 2/10  Iteration 241/1780 Training loss: 2.4925 8.0355 sec/batch
Epoch 2/10  Iteration 242/1780 Training loss: 2.4910 8.1469 sec/batch
Epoch 2/10  Iteration 243/1780 Training loss: 2.4892 8.0436 sec/batch
Epoch 2/10  Iteration 244/1780 Training loss: 2.4881 7.9960 sec/batch
Epoch 2/10  Iteration 245/1780 Training loss: 2.4866 7.9778 sec/batch
Epoch 2/10  Iteration 246/1780 Training loss: 2.4846 8.1687 sec/batch
Epoch 2/10  Iteration 247/1780 Training loss: 2.4827 8.7341 sec/batch
Epoch 2/10  Iteration 248/1780 Training loss: 2.4814 8.0562 sec/batch
Epoch 2/10  Iteration 249/1780 Training loss: 2.4801 7.9906 sec/batch
Epoch 2/10  Iteration 250/1780 Training loss: 2.4789 8.1335 sec/batch
Epoch 2/10  Iteratio

Epoch 2/10  Iteration 355/1780 Training loss: 2.3512 8.1330 sec/batch
Epoch 2/10  Iteration 356/1780 Training loss: 2.3502 8.1702 sec/batch
Epoch 3/10  Iteration 357/1780 Training loss: 2.2259 8.0402 sec/batch
Epoch 3/10  Iteration 358/1780 Training loss: 2.1869 8.2197 sec/batch
Epoch 3/10  Iteration 359/1780 Training loss: 2.1715 8.2922 sec/batch
Epoch 3/10  Iteration 360/1780 Training loss: 2.1667 8.1391 sec/batch
Epoch 3/10  Iteration 361/1780 Training loss: 2.1646 8.1368 sec/batch
Epoch 3/10  Iteration 362/1780 Training loss: 2.1598 8.0939 sec/batch
Epoch 3/10  Iteration 363/1780 Training loss: 2.1605 9.2188 sec/batch
Epoch 3/10  Iteration 364/1780 Training loss: 2.1601 8.2241 sec/batch
Epoch 3/10  Iteration 365/1780 Training loss: 2.1619 8.0618 sec/batch
Epoch 3/10  Iteration 366/1780 Training loss: 2.1604 8.2046 sec/batch
Epoch 3/10  Iteration 367/1780 Training loss: 2.1567 8.0853 sec/batch
Epoch 3/10  Iteration 368/1780 Training loss: 2.1547 8.2028 sec/batch
Epoch 3/10  Iteratio

Epoch 3/10  Iteration 472/1780 Training loss: 2.0740 8.8618 sec/batch
Epoch 3/10  Iteration 473/1780 Training loss: 2.0734 8.5196 sec/batch
Epoch 3/10  Iteration 474/1780 Training loss: 2.0726 8.1088 sec/batch
Epoch 3/10  Iteration 475/1780 Training loss: 2.0720 8.1840 sec/batch
Epoch 3/10  Iteration 476/1780 Training loss: 2.0714 8.3683 sec/batch
Epoch 3/10  Iteration 477/1780 Training loss: 2.0709 8.1337 sec/batch
Epoch 3/10  Iteration 478/1780 Training loss: 2.0701 8.1084 sec/batch
Epoch 3/10  Iteration 479/1780 Training loss: 2.0693 8.1054 sec/batch
Epoch 3/10  Iteration 480/1780 Training loss: 2.0688 8.2000 sec/batch
Epoch 3/10  Iteration 481/1780 Training loss: 2.0682 8.0729 sec/batch
Epoch 3/10  Iteration 482/1780 Training loss: 2.0672 8.0509 sec/batch
Epoch 3/10  Iteration 483/1780 Training loss: 2.0667 8.3632 sec/batch
Epoch 3/10  Iteration 484/1780 Training loss: 2.0661 8.2117 sec/batch
Epoch 3/10  Iteration 485/1780 Training loss: 2.0654 8.4260 sec/batch
Epoch 3/10  Iteratio

Epoch 4/10  Iteration 590/1780 Training loss: 1.8981 8.5842 sec/batch
Epoch 4/10  Iteration 591/1780 Training loss: 1.8979 8.9832 sec/batch
Epoch 4/10  Iteration 592/1780 Training loss: 1.8971 8.4949 sec/batch
Epoch 4/10  Iteration 593/1780 Training loss: 1.8963 8.5160 sec/batch
Epoch 4/10  Iteration 594/1780 Training loss: 1.8964 8.5082 sec/batch
Epoch 4/10  Iteration 595/1780 Training loss: 1.8958 8.4953 sec/batch
Epoch 4/10  Iteration 596/1780 Training loss: 1.8962 8.3352 sec/batch
Epoch 4/10  Iteration 597/1780 Training loss: 1.8960 8.3802 sec/batch
Epoch 4/10  Iteration 598/1780 Training loss: 1.8958 8.8629 sec/batch
Epoch 4/10  Iteration 599/1780 Training loss: 1.8954 8.7753 sec/batch
Epoch 4/10  Iteration 600/1780 Training loss: 1.8954 8.3748 sec/batch
Validation loss: 1.7583001 Saving checkpoint!
Epoch 4/10  Iteration 601/1780 Training loss: 1.8957 8.2371 sec/batch
Epoch 4/10  Iteration 602/1780 Training loss: 1.8951 8.0999 sec/batch
Epoch 4/10  Iteration 603/1780 Training loss

Epoch 4/10  Iteration 707/1780 Training loss: 1.8460 8.0903 sec/batch
Epoch 4/10  Iteration 708/1780 Training loss: 1.8457 8.3592 sec/batch
Epoch 4/10  Iteration 709/1780 Training loss: 1.8454 8.2817 sec/batch
Epoch 4/10  Iteration 710/1780 Training loss: 1.8449 8.0113 sec/batch
Epoch 4/10  Iteration 711/1780 Training loss: 1.8444 8.1577 sec/batch
Epoch 4/10  Iteration 712/1780 Training loss: 1.8441 8.1497 sec/batch
Epoch 5/10  Iteration 713/1780 Training loss: 1.8629 8.4539 sec/batch
Epoch 5/10  Iteration 714/1780 Training loss: 1.8120 9.9947 sec/batch
Epoch 5/10  Iteration 715/1780 Training loss: 1.7952 8.5853 sec/batch
Epoch 5/10  Iteration 716/1780 Training loss: 1.7876 8.3310 sec/batch
Epoch 5/10  Iteration 717/1780 Training loss: 1.7846 8.3953 sec/batch
Epoch 5/10  Iteration 718/1780 Training loss: 1.7741 8.6884 sec/batch
Epoch 5/10  Iteration 719/1780 Training loss: 1.7741 8.2981 sec/batch
Epoch 5/10  Iteration 720/1780 Training loss: 1.7711 8.2177 sec/batch
Epoch 5/10  Iteratio

Epoch 5/10  Iteration 824/1780 Training loss: 1.7298 8.0150 sec/batch
Epoch 5/10  Iteration 825/1780 Training loss: 1.7294 8.0655 sec/batch
Epoch 5/10  Iteration 826/1780 Training loss: 1.7290 8.1262 sec/batch
Epoch 5/10  Iteration 827/1780 Training loss: 1.7285 8.1652 sec/batch
Epoch 5/10  Iteration 828/1780 Training loss: 1.7279 8.1143 sec/batch
Epoch 5/10  Iteration 829/1780 Training loss: 1.7276 8.2688 sec/batch
Epoch 5/10  Iteration 830/1780 Training loss: 1.7272 8.2438 sec/batch
Epoch 5/10  Iteration 831/1780 Training loss: 1.7268 8.2095 sec/batch
Epoch 5/10  Iteration 832/1780 Training loss: 1.7265 8.3087 sec/batch
Epoch 5/10  Iteration 833/1780 Training loss: 1.7261 8.2441 sec/batch
Epoch 5/10  Iteration 834/1780 Training loss: 1.7256 8.1590 sec/batch
Epoch 5/10  Iteration 835/1780 Training loss: 1.7250 8.2073 sec/batch
Epoch 5/10  Iteration 836/1780 Training loss: 1.7248 8.3535 sec/batch
Epoch 5/10  Iteration 837/1780 Training loss: 1.7245 7.8650 sec/batch
Epoch 5/10  Iteratio

Epoch 6/10  Iteration 942/1780 Training loss: 1.6418 7.6541 sec/batch
Epoch 6/10  Iteration 943/1780 Training loss: 1.6415 7.6347 sec/batch
Epoch 6/10  Iteration 944/1780 Training loss: 1.6413 7.6155 sec/batch
Epoch 6/10  Iteration 945/1780 Training loss: 1.6409 7.6757 sec/batch
Epoch 6/10  Iteration 946/1780 Training loss: 1.6408 7.5862 sec/batch
Epoch 6/10  Iteration 947/1780 Training loss: 1.6409 7.7459 sec/batch
Epoch 6/10  Iteration 948/1780 Training loss: 1.6404 8.5625 sec/batch
Epoch 6/10  Iteration 949/1780 Training loss: 1.6397 8.3941 sec/batch
Epoch 6/10  Iteration 950/1780 Training loss: 1.6401 8.9605 sec/batch
Epoch 6/10  Iteration 951/1780 Training loss: 1.6398 9.0943 sec/batch
Epoch 6/10  Iteration 952/1780 Training loss: 1.6405 8.4674 sec/batch
Epoch 6/10  Iteration 953/1780 Training loss: 1.6408 7.8843 sec/batch
Epoch 6/10  Iteration 954/1780 Training loss: 1.6409 7.6767 sec/batch
Epoch 6/10  Iteration 955/1780 Training loss: 1.6405 7.6472 sec/batch
Epoch 6/10  Iteratio

Epoch 6/10  Iteration 1058/1780 Training loss: 1.6127 7.6974 sec/batch
Epoch 6/10  Iteration 1059/1780 Training loss: 1.6125 8.1978 sec/batch
Epoch 6/10  Iteration 1060/1780 Training loss: 1.6123 7.8497 sec/batch
Epoch 6/10  Iteration 1061/1780 Training loss: 1.6120 8.0344 sec/batch
Epoch 6/10  Iteration 1062/1780 Training loss: 1.6117 7.9625 sec/batch
Epoch 6/10  Iteration 1063/1780 Training loss: 1.6116 7.7814 sec/batch
Epoch 6/10  Iteration 1064/1780 Training loss: 1.6114 7.9233 sec/batch
Epoch 6/10  Iteration 1065/1780 Training loss: 1.6114 7.6504 sec/batch
Epoch 6/10  Iteration 1066/1780 Training loss: 1.6111 7.7032 sec/batch
Epoch 6/10  Iteration 1067/1780 Training loss: 1.6108 7.6034 sec/batch
Epoch 6/10  Iteration 1068/1780 Training loss: 1.6107 7.6228 sec/batch
Epoch 7/10  Iteration 1069/1780 Training loss: 1.6686 7.8002 sec/batch
Epoch 7/10  Iteration 1070/1780 Training loss: 1.6232 7.7008 sec/batch
Epoch 7/10  Iteration 1071/1780 Training loss: 1.6074 7.6212 sec/batch
Epoch 

Epoch 7/10  Iteration 1174/1780 Training loss: 1.5471 7.7343 sec/batch
Epoch 7/10  Iteration 1175/1780 Training loss: 1.5469 7.6099 sec/batch
Epoch 7/10  Iteration 1176/1780 Training loss: 1.5468 7.5762 sec/batch
Epoch 7/10  Iteration 1177/1780 Training loss: 1.5466 7.7547 sec/batch
Epoch 7/10  Iteration 1178/1780 Training loss: 1.5465 7.6699 sec/batch
Epoch 7/10  Iteration 1179/1780 Training loss: 1.5462 7.7295 sec/batch
Epoch 7/10  Iteration 1180/1780 Training loss: 1.5459 7.7526 sec/batch
Epoch 7/10  Iteration 1181/1780 Training loss: 1.5457 7.5154 sec/batch
Epoch 7/10  Iteration 1182/1780 Training loss: 1.5455 7.4611 sec/batch
Epoch 7/10  Iteration 1183/1780 Training loss: 1.5450 7.7340 sec/batch
Epoch 7/10  Iteration 1184/1780 Training loss: 1.5445 8.0495 sec/batch
Epoch 7/10  Iteration 1185/1780 Training loss: 1.5443 8.2218 sec/batch
Epoch 7/10  Iteration 1186/1780 Training loss: 1.5441 7.8813 sec/batch
Epoch 7/10  Iteration 1187/1780 Training loss: 1.5439 7.7855 sec/batch
Epoch 

Epoch 8/10  Iteration 1289/1780 Training loss: 1.4982 7.4446 sec/batch
Epoch 8/10  Iteration 1290/1780 Training loss: 1.4975 7.6480 sec/batch
Epoch 8/10  Iteration 1291/1780 Training loss: 1.4976 7.7126 sec/batch
Epoch 8/10  Iteration 1292/1780 Training loss: 1.4965 7.5841 sec/batch
Epoch 8/10  Iteration 1293/1780 Training loss: 1.4960 7.5948 sec/batch
Epoch 8/10  Iteration 1294/1780 Training loss: 1.4954 7.6789 sec/batch
Epoch 8/10  Iteration 1295/1780 Training loss: 1.4952 7.5576 sec/batch
Epoch 8/10  Iteration 1296/1780 Training loss: 1.4954 7.6822 sec/batch
Epoch 8/10  Iteration 1297/1780 Training loss: 1.4950 7.5653 sec/batch
Epoch 8/10  Iteration 1298/1780 Training loss: 1.4956 7.5648 sec/batch
Epoch 8/10  Iteration 1299/1780 Training loss: 1.4955 7.5620 sec/batch
Epoch 8/10  Iteration 1300/1780 Training loss: 1.4956 7.7321 sec/batch
Epoch 8/10  Iteration 1301/1780 Training loss: 1.4952 7.5460 sec/batch
Epoch 8/10  Iteration 1302/1780 Training loss: 1.4950 7.4526 sec/batch
Epoch 

Epoch 8/10  Iteration 1404/1780 Training loss: 1.4780 7.6736 sec/batch
Epoch 8/10  Iteration 1405/1780 Training loss: 1.4777 7.4757 sec/batch
Epoch 8/10  Iteration 1406/1780 Training loss: 1.4776 7.5694 sec/batch
Epoch 8/10  Iteration 1407/1780 Training loss: 1.4777 7.7006 sec/batch
Epoch 8/10  Iteration 1408/1780 Training loss: 1.4776 7.7632 sec/batch
Epoch 8/10  Iteration 1409/1780 Training loss: 1.4775 7.5011 sec/batch
Epoch 8/10  Iteration 1410/1780 Training loss: 1.4774 7.5889 sec/batch
Epoch 8/10  Iteration 1411/1780 Training loss: 1.4773 7.7051 sec/batch
Epoch 8/10  Iteration 1412/1780 Training loss: 1.4771 7.6334 sec/batch
Epoch 8/10  Iteration 1413/1780 Training loss: 1.4772 7.7341 sec/batch
Epoch 8/10  Iteration 1414/1780 Training loss: 1.4775 8.5208 sec/batch
Epoch 8/10  Iteration 1415/1780 Training loss: 1.4774 10.1441 sec/batch
Epoch 8/10  Iteration 1416/1780 Training loss: 1.4773 8.0733 sec/batch
Epoch 8/10  Iteration 1417/1780 Training loss: 1.4772 8.6044 sec/batch
Epoch

Epoch 9/10  Iteration 1520/1780 Training loss: 1.4402 7.7255 sec/batch
Epoch 9/10  Iteration 1521/1780 Training loss: 1.4401 7.8594 sec/batch
Epoch 9/10  Iteration 1522/1780 Training loss: 1.4396 7.7309 sec/batch
Epoch 9/10  Iteration 1523/1780 Training loss: 1.4392 8.0588 sec/batch
Epoch 9/10  Iteration 1524/1780 Training loss: 1.4388 8.0497 sec/batch
Epoch 9/10  Iteration 1525/1780 Training loss: 1.4386 7.9417 sec/batch
Epoch 9/10  Iteration 1526/1780 Training loss: 1.4385 7.7600 sec/batch
Epoch 9/10  Iteration 1527/1780 Training loss: 1.4383 7.8961 sec/batch
Epoch 9/10  Iteration 1528/1780 Training loss: 1.4380 7.7107 sec/batch
Epoch 9/10  Iteration 1529/1780 Training loss: 1.4377 8.0067 sec/batch
Epoch 9/10  Iteration 1530/1780 Training loss: 1.4377 7.9204 sec/batch
Epoch 9/10  Iteration 1531/1780 Training loss: 1.4376 7.7717 sec/batch
Epoch 9/10  Iteration 1532/1780 Training loss: 1.4375 7.8207 sec/batch
Epoch 9/10  Iteration 1533/1780 Training loss: 1.4373 7.6690 sec/batch
Epoch 

Epoch 10/10  Iteration 1635/1780 Training loss: 1.4111 8.3954 sec/batch
Epoch 10/10  Iteration 1636/1780 Training loss: 1.4111 8.3847 sec/batch
Epoch 10/10  Iteration 1637/1780 Training loss: 1.4106 8.5011 sec/batch
Epoch 10/10  Iteration 1638/1780 Training loss: 1.4104 8.4096 sec/batch
Epoch 10/10  Iteration 1639/1780 Training loss: 1.4096 8.4418 sec/batch
Epoch 10/10  Iteration 1640/1780 Training loss: 1.4081 8.4180 sec/batch
Epoch 10/10  Iteration 1641/1780 Training loss: 1.4065 8.3927 sec/batch
Epoch 10/10  Iteration 1642/1780 Training loss: 1.4060 8.3205 sec/batch
Epoch 10/10  Iteration 1643/1780 Training loss: 1.4054 8.4958 sec/batch
Epoch 10/10  Iteration 1644/1780 Training loss: 1.4060 8.6090 sec/batch
Epoch 10/10  Iteration 1645/1780 Training loss: 1.4057 8.5596 sec/batch
Epoch 10/10  Iteration 1646/1780 Training loss: 1.4050 8.4453 sec/batch
Epoch 10/10  Iteration 1647/1780 Training loss: 1.4051 8.4557 sec/batch
Epoch 10/10  Iteration 1648/1780 Training loss: 1.4042 8.2326 se

Epoch 10/10  Iteration 1749/1780 Training loss: 1.3875 9.0093 sec/batch
Epoch 10/10  Iteration 1750/1780 Training loss: 1.3876 8.7756 sec/batch
Epoch 10/10  Iteration 1751/1780 Training loss: 1.3876 8.8266 sec/batch
Epoch 10/10  Iteration 1752/1780 Training loss: 1.3873 8.9666 sec/batch
Epoch 10/10  Iteration 1753/1780 Training loss: 1.3869 8.8643 sec/batch
Epoch 10/10  Iteration 1754/1780 Training loss: 1.3867 8.7211 sec/batch
Epoch 10/10  Iteration 1755/1780 Training loss: 1.3866 8.7815 sec/batch
Epoch 10/10  Iteration 1756/1780 Training loss: 1.3865 8.5564 sec/batch
Epoch 10/10  Iteration 1757/1780 Training loss: 1.3864 8.9114 sec/batch
Epoch 10/10  Iteration 1758/1780 Training loss: 1.3862 9.1145 sec/batch
Epoch 10/10  Iteration 1759/1780 Training loss: 1.3861 8.8593 sec/batch
Epoch 10/10  Iteration 1760/1780 Training loss: 1.3859 8.7978 sec/batch
Epoch 10/10  Iteration 1761/1780 Training loss: 1.3855 8.7674 sec/batch
Epoch 10/10  Iteration 1762/1780 Training loss: 1.3856 8.8712 se

In [17]:
tf.train.get_checkpoint_state('checkpoints/anna')

model_checkpoint_path: "checkpoints/anna/i1780_l512_1.256.ckpt"
all_model_checkpoint_paths: "checkpoints/anna/i200_l512_2.436.ckpt"
all_model_checkpoint_paths: "checkpoints/anna/i400_l512_1.992.ckpt"
all_model_checkpoint_paths: "checkpoints/anna/i600_l512_1.758.ckpt"
all_model_checkpoint_paths: "checkpoints/anna/i800_l512_1.604.ckpt"
all_model_checkpoint_paths: "checkpoints/anna/i1000_l512_1.485.ckpt"
all_model_checkpoint_paths: "checkpoints/anna/i1200_l512_1.407.ckpt"
all_model_checkpoint_paths: "checkpoints/anna/i1400_l512_1.348.ckpt"
all_model_checkpoint_paths: "checkpoints/anna/i1600_l512_1.306.ckpt"
all_model_checkpoint_paths: "checkpoints/anna/i1780_l512_1.256.ckpt"
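
The iteration numbers in these checkpoint names follow from the save condition in the training loop. A small sketch, assuming 178 batches per epoch (1780 iterations over 10 epochs, as in the log above):

```python
epochs = 10
n_batches = 178  # assumption: 1780 total iterations / 10 epochs
save_every_n = 200

iterations = n_batches * epochs

# Same condition as in the training loop: every save_every_n steps, plus the last step
checkpoints = [it for it in range(1, iterations + 1)
               if it % save_every_n == 0 or it == iterations]
print(checkpoints)
```

The final entry is there only because of the `iteration == iterations` clause, since 1780 is not a multiple of 200.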

## Sampling

Now that the network is trained, we can use it to generate new text. The idea is that we pass in a character, then the network will predict the next character. We can use the new one to predict the next one, and we keep doing this to generate all new text. I also included some functionality to prime the network with some text by passing in a string and building up a state from that.

The network gives us predictions for each character. To reduce noise and make things a little less random, I'm going to only choose a new character from the top N most likely characters.



In [18]:
def pick_top_n(preds, vocab_size, top_n=5):
    p = np.squeeze(preds)
    p[np.argsort(p)[:-top_n]] = 0
    p = p / np.sum(p)
    c = np.random.choice(vocab_size, 1, p=p)[0]
    return c
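
Here's a small sketch of the top-N idea on a made-up distribution. `pick_top_n_copy` is a hypothetical variant of the function above that works on a copy, so the caller's array is left untouched, and takes an optional NumPy random generator so the draws can be reproduced.

```python
import numpy as np

def pick_top_n_copy(preds, vocab_size, top_n=5, rng=None):
    """Hypothetical variant: sample from the top_n most likely classes."""
    rng = np.random.default_rng() if rng is None else rng
    p = np.squeeze(preds).astype(np.float64).copy()
    # Zero out everything except the top_n largest probabilities
    p[np.argsort(p)[:-top_n]] = 0
    # Renormalize so the remaining probabilities sum to 1
    p /= p.sum()
    return int(rng.choice(vocab_size, p=p))

# Made-up predictions over a 5-character vocabulary
preds = np.array([[0.05, 0.4, 0.1, 0.3, 0.15]])
rng = np.random.default_rng(0)
draws = {pick_top_n_copy(preds, 5, top_n=2, rng=rng) for _ in range(200)}
print(draws)
```

With `top_n=2`, only the two most likely indices can ever be drawn, no matter how many samples are taken.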

In [19]:
def sample(checkpoint, n_samples, lstm_size, vocab_size, prime="The "):
    samples = [c for c in prime]
    model = build_rnn(vocab_size, lstm_size=lstm_size, sampling=True)
    saver = tf.train.Saver()
    with tf.Session() as sess:
        saver.restore(sess, checkpoint)
        new_state = sess.run(model.initial_state)
        for c in prime:
            x = np.zeros((1, 1))
            x[0,0] = vocab_to_int[c]
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            preds, new_state = sess.run([model.preds, model.final_state], 
                                         feed_dict=feed)

        c = pick_top_n(preds, vocab_size)
        samples.append(int_to_vocab[c])

        for i in range(n_samples):
            x[0,0] = c
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            preds, new_state = sess.run([model.preds, model.final_state], 
                                         feed_dict=feed)

            c = pick_top_n(preds, vocab_size)
            samples.append(int_to_vocab[c])
        
    return ''.join(samples)

In [21]:
checkpoint = "checkpoints/anna/i1780_l512_1.256.ckpt"
samp = sample(checkpoint, 2000, lstm_size, len(vocab), prime="Far")
print(samp)

INFO:tensorflow:Restoring parameters from checkpoints/anna/i1780_l512_1.256.ckpt
Farlly, and
to diver on the sigress to and an once the peasa as it was the
class instant it."

"If I can't say."
 "I whon this your horses or a letter," Levin added the sudfeely
and sawing his blonged hands that them that he was not a gone away of
the singer when that it was in tell, who had base altogether to say to a long
the same only of them, and a same of the more. Sergey Ivanovitch who whom the
semptate he had not tell with the rooms women and his still something in
his head on the carry when Levin camped at the paint of him and something, but the
came and horses he was simply, and the same other of her fam her
hands time and went away the contrirt, sin ene, and he had been all, but
was in their change the sender face, and a concroom of the song that the
could not see her at the course. There she shouted the marther, the same or sid
the crail, as though she went on a their strangh without time, and
a

In [22]:
checkpoint = "checkpoints/anna/i1200_l512_1.407.ckpt"
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Far")
print(samp)

INFO:tensorflow:Restoring parameters from checkpoints/anna/i1200_l512_1.407.ckpt
Farnizg, his happen that had anstared with his chaticale head,
but their were not to his, but and all of the
starre of
his way out of a shance
and attered at
interestence, the word to the stall was have back asted at the call, but he was to step, had not bus and then her hat too the contrame of a
prating went, to the since that is a sungant of he could not some the proficulal often her sand.

"Whilave, I've not the stances worrs were to be so to me?" said Levin,
walking an ancerseant, and stepping the highes and was along, as so the plinciess
and sawithan this she was and supponed and saig,
bat was sattened to
be some a some to her, and telling the serming of his back was simple to be the say that in a
chear to the party would have the plane only and that his hat that the patt of the country, and
a sundrest head the samily or the strange of the
mearing that had no heast to a peasant that he wanted his crat

In [23]:
checkpoint = "checkpoints/anna/i600_l512_1.758.ckpt"
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Far")
print(samp)

INFO:tensorflow:Restoring parameters from checkpoints/anna/i600_l512_1.758.ckpt
Farnd.

He was so to buce and
to selt to have of and at a sain the
roof and the prestert of
assend, wother the
sone hearss fall his waren when the strom the corsious an that were the herssed.

That shis a what the pooting, tho care an his sare the castred and the pronce with here the said
the same a mare there her tele the hissed in the hadsed to had heald her she wist shise to him though that thas that the shuth op to the seeting. So he with in the somation in the hard been shat his and seid
of he tones it throwe what in who he wanted and the sormare, as the cantine to bute his tele of the parers, were his was and she hid houred to he wath to have ancerting intarining. Hor see in were ancorseds, whom the pasined, to the she with as the saction in the chure the hos a precance.

"Yus, say, the morky al he since and whot's to
her. The celesed to some astised to the mothist and westering. "It sument his tone, 

In [24]:
checkpoint = "checkpoints/anna/i200_l512_2.436.ckpt"
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Far")
print(samp)

INFO:tensorflow:Restoring parameters from checkpoints/anna/i200_l512_2.436.ckpt
Fargge,, wothe to thon he ate tarering so woth mang at and tho he sal ant as on thes sas tis as astang her an tin ta ser athe ad, bed won thar tos ande hot, wand sorand ant han tim hasd, an herad wall oter hasd tat he wit ho the to the th sanging the
 ool sint the wis tire tislen sad, and the te he ante ha werasd wang tin her thering. tas war an ole the wetha nhe he so the alte sinte ald th ases are sirad an the se was wan he and ale sh te ante won his ha asisid sithe wat hhas se were ton he th teas win ad set ant or homered wong ho so ate he ha the he hher add
ind to themed and.
"
The hese tat he the thas ho sos tas hhre ta the wan het ant it ot anthas ho tou him hirithe wals that onsind the wartin on an himessond, whit an hossrinde simiger thit her he the son serere an her sheras onthe sas thor hors hh sares or at to the asesat an th ar the the sor thes hand wh ond that hared san tit al ares os tir ang th