# Anna KaRNNa

In this notebook, we'll build a character-wise RNN trained on Anna Karenina, one of my all-time favorite books. It'll be able to generate new text based on the text from the book.

This network is based on Andrej Karpathy's [post on RNNs](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) and [implementation in Torch](https://github.com/karpathy/char-rnn). I also drew on material [here at r2rt](http://r2rt.com/recurrent-neural-networks-in-tensorflow-ii.html) and from [Sherjil Ozair](https://github.com/sherjilozair/char-rnn-tensorflow) on GitHub. Below is the general architecture of the character-wise RNN.

<img src="assets/charseq.jpeg" width="500">

In [1]:
import time
from collections import namedtuple

import numpy as np
import tensorflow as tf

First we'll load the text file and convert it into integers for our network to use. Here I'm creating a couple of dictionaries to convert the characters to and from integers. Encoding the characters as integers makes them easier to use as input in the network.

In [2]:
with open('anna_temp.txt', 'r') as f:
    text = f.read()
vocab = set(text)
vocab_to_int = {c: i for i, c in enumerate(vocab)}
int_to_vocab = dict(enumerate(vocab))
encoded = np.array([vocab_to_int[c] for c in text])

Let's check out the first 100 characters to make sure everything is peachy. According to the [American Book Review](http://americanbookreview.org/100bestlines.asp), this is the 6th best first line of a book ever.

In [3]:
text[:100]

'Chapter 1\n\n\nHappy families are all alike; every unhappy family is unhappy in its own\nway.\n\nEverythin'

And we can see the characters encoded as integers.

In [4]:
encoded[:100]

array([ 5, 39, 37,  0, 41, 19, 35, 17, 54, 27, 27, 27, 34, 37,  0,  0, 40,
       17,  1, 37, 14, 44, 10, 44, 19, 29, 17, 37, 35, 19, 17, 37, 10, 10,
       17, 37, 10, 44,  2, 19, 53, 17, 19,  6, 19, 35, 40, 17, 45, 48, 39,
       37,  0,  0, 40, 17,  1, 37, 14, 44, 10, 40, 17, 44, 29, 17, 45, 48,
       39, 37,  0,  0, 40, 17, 44, 48, 17, 44, 41, 29, 17, 31, 22, 48, 27,
       22, 37, 40, 38, 27, 27,  7,  6, 19, 35, 40, 41, 39, 44, 48])
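
As a quick sanity check on the encoding idea, here is a minimal sketch of a round trip on a toy string. The names `sample_text`, `char_to_int`, and `int_to_char` are made up for this illustration and are not part of the notebook. It also sorts the vocabulary, which is an assumption on my part: iterating over a plain `set` of strings can give a different order from one Python session to the next, so sorting keeps the mapping reproducible.

```python
import numpy as np

# Toy stand-in for the book text (an assumption for illustration only).
sample_text = "Happy families are all alike"

# Sorting the vocabulary makes the character-to-integer mapping reproducible.
sample_vocab = sorted(set(sample_text))
char_to_int = {c: i for i, c in enumerate(sample_vocab)}
int_to_char = dict(enumerate(sample_vocab))

# Encode to integers, then decode back to characters.
sample_encoded = np.array([char_to_int[c] for c in sample_text])
decoded = ''.join(int_to_char[i] for i in sample_encoded)

print(sample_encoded[:10])
print(decoded == sample_text)
```

If the decoded string matches the original, the two dictionaries are consistent inverses of each other.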

Since the network is working with individual characters, it's similar to a classification problem in which we are trying to predict the next character from the previous text.  Here's how many 'classes' our network has to pick from.

In [5]:
len(vocab)

56

## Making training mini-batches

Here is where we'll make our mini-batches for training. Remember that we want our batches to be multiple sequences of some desired number of sequence steps. Considering a simple example, our batches would look like this:

<img src="assets/sequence_batching@1x.png" width=500px>


<br>

We start with our text encoded as integers in one long array in `encoded`. Let's create a function that will give us an iterator for our batches. I like using [generator functions](https://jeffknupp.com/blog/2013/04/07/improve-your-python-yield-and-generators-explained/) to do this. Then we can pass `encoded` into this function and get our batch generator.

The first thing we need to do is discard some of the text so we only have completely full batches. Each batch contains $N \times M$ characters, where $N$ is the batch size (the number of sequences) and $M$ is the number of steps. To get the total number of batches $K$ that we can make from the array `arr`, divide the length of `arr` by the number of characters per batch and round down. Once you know the number of batches, you can get the total number of characters to keep from `arr`, $N * M * K$.

After that, we need to split `arr` into $N$ sequences. You can do this using `arr.reshape(size)` where `size` is a tuple containing the dimension sizes of the reshaped array. We know we want $N$ sequences (`batch_size` below), so let's make that the size of the first dimension. For the second dimension, you can use `-1` as a placeholder in the size, and NumPy will work out the right length from the size of the array. After this, you should have an array that is $N \times (M * K)$.

Now that we have this array, we can iterate through it to get our batches. The idea is that each batch is an $N \times M$ window on the $N \times (M * K)$ array. For each subsequent batch, the window moves over by `n_steps`. We also want to create both the input and target arrays. Remember that the targets are the inputs shifted over one character.

The way I like to do this window is to use `range` to take steps of size `n_steps` from $0$ to `arr.shape[1]`, the total number of steps in each sequence. That way, the integers you get from `range` always point to the start of a batch, and each window is `n_steps` wide.
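
The steps above can be sketched as a small generator. This is only one possible reading of the description, not the reference solution: the name `get_batches` is an assumption, and so is the choice to fill the last target column by wrapping around to the first input of the window. Note that the `KerasBatchGenerator` class further down takes a different route (overlapping windows with a single target character each).

```python
import numpy as np

def get_batches(arr, batch_size, n_steps):
    """Yield (x, y) batches of shape (batch_size, n_steps) from a 1-D array."""
    chars_per_batch = batch_size * n_steps           # N * M
    n_batches = len(arr) // chars_per_batch          # K

    # Keep only enough characters for full batches, then split into N rows.
    arr = arr[:n_batches * chars_per_batch].reshape((batch_size, -1))

    # Slide an N x M window across the N x (M * K) array.
    for n in range(0, arr.shape[1], n_steps):
        x = arr[:, n:n + n_steps]
        # Targets are the inputs shifted one step; the final column wraps
        # around to the first input (a simplification assumed here).
        y = np.zeros_like(x)
        y[:, :-1], y[:, -1] = x[:, 1:], x[:, 0]
        yield x, y

toy = np.arange(25)
for x, y in get_batches(toy, batch_size=2, n_steps=3):
    print(x, y, sep='\n', end='\n\n')
```

With 25 toy values, a batch size of 2, and 3 steps, one value is discarded and the remaining 24 are split into two rows of 12, giving four windows.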

> **Exercise:** Write the code for creating batches in the function below. The exercises in this notebook _will not be easy_. I've provided a notebook with solutions alongside this notebook. If you get stuck, check out the solutions. The most important thing is that you don't copy and paste the code into here, **type out the solution code yourself.**

In [6]:
from keras.utils import to_categorical

class KerasBatchGenerator(object):
    '''Build sliding-window training patterns from an encoded array
       and serve them in batches.

       Arguments
       ---------
       arr: Array you want to make batches from
       batch_size: Batch size, the number of sequences per batch
       n_steps: Number of sequence steps per batch
       num_classes: Number of distinct characters (width of the one-hot targets)
    '''

    def __init__(self, arr, batch_size, n_steps, num_classes):
        self.arr = arr
        self.n_steps = n_steps
        self.batch_size = batch_size
        self.num_classes = num_classes
        self.x, self.y = self.gen_sequence()
        self.n_batches = self.x.shape[0] // self.batch_size

    def get_n_batches(self):
        return self.n_batches

    def get_n_patterns(self):
        return self.x.shape[0]

    def gen_sequence(self):
        '''Return inputs of shape (n_patterns, n_steps, 1) and one-hot
           targets of shape (n_patterns, num_classes).'''
        x = []
        y = []

        # Each pattern is n_steps characters; its target is the character that follows
        for i in range(len(self.arr)-self.n_steps-1):
            x.append(self.arr[i:i+self.n_steps])
            y.append(self.arr[i+self.n_steps])

        x = np.reshape(x, (-1, self.n_steps, 1))
        y = np.reshape(y, (-1, 1))
        y = to_categorical(y, self.num_classes)

        return x, y

    def random_generate(self, num):
        '''Return one batch of consecutive patterns starting at index num.'''
        return self.x[num:num+self.batch_size,:,:], self.y[num:num+self.batch_size,:]

    def batch_generate(self):
        '''Create a generator that loops forever over the patterns,
           yielding batch_size patterns at a time.'''
        while True:
            for i in range(0, self.x.shape[0] - self.batch_size, self.batch_size):
                yield self.x[i:i+self.batch_size,:,:], self.y[i:i+self.batch_size,:]

Using TensorFlow backend.


Now I'll make my data sets and we can check out what's going on here. Here I'm going to use a batch size of 10 and 50 sequence steps.

In [7]:
batch_generator = KerasBatchGenerator(encoded, 10, 50, len(vocab))

batches = batch_generator.batch_generate()
x_0, y_0 = next(batches)
x_1, y_1 = next(batches)

In [8]:
print('x\n', x_0[:, :10, :])
print('\ny\n', y_0[:, :])
print('x\n', x_1[:, :10, :])
print('\ny\n', y_1[:, :])

x
 [[[ 5]
  [39]
  [37]
  [ 0]
  [41]
  [19]
  [35]
  [17]
  [54]
  [27]]

 [[39]
  [37]
  [ 0]
  [41]
  [19]
  [35]
  [17]
  [54]
  [27]
  [27]]

 [[37]
  [ 0]
  [41]
  [19]
  [35]
  [17]
  [54]
  [27]
  [27]
  [27]]

 [[ 0]
  [41]
  [19]
  [35]
  [17]
  [54]
  [27]
  [27]
  [27]
  [34]]

 [[41]
  [19]
  [35]
  [17]
  [54]
  [27]
  [27]
  [27]
  [34]
  [37]]

 [[19]
  [35]
  [17]
  [54]
  [27]
  [27]
  [27]
  [34]
  [37]
  [ 0]]

 [[35]
  [17]
  [54]
  [27]
  [27]
  [27]
  [34]
  [37]
  [ 0]
  [ 0]]

 [[17]
  [54]
  [27]
  [27]
  [27]
  [34]
  [37]
  [ 0]
  [ 0]
  [40]]

 [[54]
  [27]
  [27]
  [27]
  [34]
  [37]
  [ 0]
  [ 0]
  [40]
  [17]]

 [[27]
  [27]
  [27]
  [34]
  [37]
  [ 0]
  [ 0]
  [40]
  [17]
  [ 1]]]

y
 [[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0

## Building the model

Below is where you'll build the network. We'll break it up into parts so it's easier to reason about each bit. Then we can connect them up into the whole network.

<img src="assets/charRNN.png" width=500px>


### Inputs

First off we'll create our input placeholders. As usual we need placeholders for the training data and the targets. We'll also create a placeholder for dropout layers called `keep_prob`. This will be a scalar, that is a 0-D tensor. To make a scalar, you create a placeholder without giving it a size.

> **Exercise:** Create the input placeholders in the function below.

In [9]:
from keras.models import Model, Sequential
from keras.layers import Input, LSTM, Dense, Activation, Flatten
from keras.callbacks import ModelCheckpoint
from keras.utils import to_categorical



batch_size = 10         # Sequences per batch
num_steps = 50          # Number of sequence steps per batch
lstm_size = 128         # Size of hidden layers in LSTMs
num_layers = 2          # Number of LSTM layers
learning_rate = 0.01    # Learning rate
keep_prob = 0.5         # Dropout keep probability
epochs = 20

dropout = 1 - keep_prob
num_classes = len(vocab)

# Save every N iterations
save_every_n = 200

# Print progress every N iterations
print_every_n = 50


In [10]:
batch_generator = KerasBatchGenerator(encoded, batch_size, num_steps, num_classes)

input_ = Input(batch_shape=(batch_size, num_steps, 1), name='input')
lstm1 = LSTM(lstm_size, dropout = dropout, batch_size=batch_size, stateful=True, return_sequences=True)(input_)
lstm2 = LSTM(lstm_size, dropout = dropout)(lstm1)
logits = Dense(num_classes)(lstm2)
out = Activation('softmax')(logits)

model = Model(inputs=input_, outputs=out)
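# compile() configures the model in place and returns None.
# The 'adam' string uses Keras's default learning rate, so learning_rate above is not applied.
model.compile(loss='categorical_crossentropy', optimizer='adam')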


#checkpointer = ModelCheckpoint(filepath="checkpoints/model-{epoch:02d}.ckpt", verbose=1)

print(model.summary())
n_batches = batch_generator.get_n_batches()
print("Number of batches:", n_batches)
n_patterns = batch_generator.get_n_patterns()
print("Number of patterns:", n_patterns)


for e in range(epochs):
    counter = 0
    for x, y in batch_generator.batch_generate():
        start = time.time()
        counter += 1
        model.train_on_batch(x, y)
        
        if (counter % save_every_n) == 0:
            model.save("checkpoints/i{}_l{}.ckpt".format(counter, lstm_size))

        if (counter % print_every_n == 0):
            end = time.time()
            print('Epoch: {}/{}... '.format(e+1, epochs),
                    'Training Step: {}... '.format(counter),
                    '{:.4f} sec/batch'.format((end-start)))
        if counter > n_batches:
            end = time.time()
            model.save("checkpoints/i{}_l{}.ckpt".format(counter, lstm_size))
            print('Epoch: {}/{}... '.format(e+1, epochs),
                    'Training Step: {}... '.format(counter),
                    '{:.4f} sec/batch'.format((end-start)))
            break


#model.fit_generator(batch_generator.batch_generate(), steps_per_epoch=batch_generator.get_n_batches(), epochs=epochs,  callbacks=[checkpointer])
##model.fit(x=encoded, y=np.append(encoded[1:], 0), batch_size=batch_size, epochs=epochs, callbacks=[checkpointer])





_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input (InputLayer)           (10, 50, 1)               0         
_________________________________________________________________
lstm_1 (LSTM)                (10, 50, 128)             66560     
_________________________________________________________________
lstm_2 (LSTM)                (10, 128)                 131584    
_________________________________________________________________
dense_1 (Dense)              (10, 56)                  7224      
_________________________________________________________________
activation_1 (Activation)    (10, 56)                  0         
Total params: 205,368
Trainable params: 205,368
Non-trainable params: 0
_________________________________________________________________
None
Number of batches: 536
Number of patterns: 5362
Epoch: 1/20...  Training Step: 50...  0.1194 sec/batch
Epoch: 1/20...  Training Ste

Epoch: 12/20...  Training Step: 400...  0.1569 sec/batch
Epoch: 12/20...  Training Step: 450...  0.1228 sec/batch
Epoch: 12/20...  Training Step: 500...  0.1418 sec/batch
Epoch: 12/20...  Training Step: 537...  0.1191 sec/batch
Epoch: 13/20...  Training Step: 50...  0.1402 sec/batch
Epoch: 13/20...  Training Step: 100...  0.1261 sec/batch
Epoch: 13/20...  Training Step: 150...  0.1248 sec/batch
Epoch: 13/20...  Training Step: 200...  0.1510 sec/batch
Epoch: 13/20...  Training Step: 250...  0.1240 sec/batch
Epoch: 13/20...  Training Step: 300...  0.1569 sec/batch
Epoch: 13/20...  Training Step: 350...  0.1216 sec/batch
Epoch: 13/20...  Training Step: 400...  0.1484 sec/batch
Epoch: 13/20...  Training Step: 450...  0.1154 sec/batch
Epoch: 13/20...  Training Step: 500...  0.1226 sec/batch
Epoch: 13/20...  Training Step: 537...  0.1407 sec/batch
Epoch: 14/20...  Training Step: 50...  0.1259 sec/batch
Epoch: 14/20...  Training Step: 100...  0.1240 sec/batch
Epoch: 14/20...  Training Step: 1

## Sampling

Now that the network is trained, we can use it to generate new text. The idea is that we pass in a character, then the network will predict the next character. We can use the new one to predict the next one, and we keep doing this to generate all new text. The network is primed with a window of text taken from the training data, which gives it some context to build on before it starts generating.

The network gives us predictions for each character. To reduce noise and make things a little less random, I'm going to only choose a new character from the top N most likely characters.
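
To see what restricting to the top N does to a distribution, here is a small worked sketch on a made-up probability vector. The helper name `top_n_distribution` and the numbers are assumptions for illustration; the notebook's own version is `pick_top_n` below.

```python
import numpy as np

def top_n_distribution(probs, top_n):
    """Zero all but the top_n largest probabilities and renormalize."""
    p = np.array(probs, dtype=np.float64)   # copy so the input is left untouched
    p[np.argsort(p)[:-top_n]] = 0           # argsort is ascending: drop the smallest
    return p / p.sum()

# Made-up predictions over a five-character vocabulary.
probs = np.array([0.05, 0.40, 0.10, 0.25, 0.20])
restricted = top_n_distribution(probs, top_n=2)
print(restricted)

# Sampling from the restricted distribution can only return indices 1 or 3.
rng = np.random.default_rng(0)
print(rng.choice(len(probs), size=10, p=restricted))
```

Only the two most likely characters keep any probability mass, so unlikely characters can never be drawn, while there is still some randomness between the remaining candidates.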



In [11]:
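def pick_top_n(preds, vocab_size, top_n=5):
    # First attempt: filter the whole (batch_size, vocab_size) prediction array at once.
    # The next cell redefines this function with a simpler single-row version.
    p = np.array(preds, dtype=np.float64)
    rows = np.arange(p.shape[0])[:, np.newaxis]
    # Zero everything except the top_n most likely characters in each row
    p[rows, np.argsort(p, axis=1)[:, :-top_n]] = 0
    # Sample from the renormalized distribution of the first sequence
    p = p[0] / np.sum(p[0])
    c = np.random.choice(vocab_size, 1, p=p)[0]
    return c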

In [16]:
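def pick_top_n(preds, vocab_size, top_n=5):
    # Use the prediction for the first sequence in the batch
    p = np.squeeze(preds[0])
    # Zero out everything except the top_n most likely characters
    p[np.argsort(p)[:-top_n]] = 0
    # Renormalize so the remaining probabilities sum to 1
    p = p / np.sum(p)
    c = np.random.choice(vocab_size, 1, p=p)[0]
    return c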

In [13]:
# Pick a start index that leaves room for a full batch of patterns
start = np.random.randint(0, batch_generator.get_n_patterns() - batch_size + 1)
x, y = batch_generator.random_generate(start)
print(x.shape)

(10, 50, 1)


In [21]:
def sample(checkpoint, n_samples, lstm_size, vocab_size, batch_generator, start):
    '''Generate n_samples characters, seeded with the training patterns
       that start at index start.

       Note: checkpoint and lstm_size are currently unused; predictions
       come from the global `model` trained above.
    '''
    x, y = batch_generator.random_generate(start)
    # The first sequence in the batch is the seed text
    text = list(np.squeeze(x[0,:,:]))
    for i in range(n_samples):
        pred = pick_top_n(model.predict(x), vocab_size)
        # Slide every window one step and append the new character at the end
        pred_slice_tocat = np.full((x.shape[0], 1, x.shape[2]), pred)
        x = np.concatenate((x[:,1:,:], pred_slice_tocat), axis=1)
        text.append(pred)

    print(text)
    samples = ''.join([int_to_vocab[c] for c in text])
    return samples

Here, pass in the path to a checkpoint and sample from the network.

In [22]:
checkpoint = tf.train.latest_checkpoint('checkpoints')
# Choose the start index so that a full batch of patterns is available
samp = sample(checkpoint, 2000, lstm_size, len(vocab), batch_generator, np.random.randint(0, batch_generator.get_n_patterns() - batch_size + 1))
print(samp)

[45, 29, 19, 27, 1, 19, 10, 41, 17, 41, 39, 37, 41, 17, 41, 39, 19, 35, 19, 17, 22, 37, 29, 17, 48, 31, 17, 29, 19, 48, 29, 19, 17, 44, 48, 17, 41, 39, 19, 44, 35, 17, 10, 44, 6, 44, 48, 3, 17, 41, 35, 41, 39, 1, 40, 37, 17, 19, 19, 19, 19, 19, 40, 17, 44, 41, 17, 31, 35, 17, 37, 35, 12, 19, 37, 41, 44, 35, 1, 37, 37, 17, 19, 19, 17, 44, 35, 29, 19, 19, 44, 48, 41, 17, 31, 35, 44, 19, 17, 31, 48, 17, 40, 17, 10, 31, 23, 19, 44, 48, 41, 41, 37, 37, 39, 31, 44, 44, 44, 29, 37, 37, 41, 17, 44, 35, 35, 39, 37, 41, 17, 19, 19, 26, 44, 17, 10, 19, 44, 17, 19, 19, 44, 48, 41, 44, 44, 37, 44, 31, 37, 39, 35, 44, 19, 10, 19, 19, 17, 44, 48, 41, 17, 31, 41, 41, 44, 19, 10, 44, 48, 29, 39, 17, 44, 35, 44, 17, 19, 44, 23, 19, 44, 35, 29, 39, 37, 41, 19, 19, 31, 17, 19, 19, 17, 44, 35, 41, 31, 37, 44, 31, 44, 17, 17, 19, 19, 19, 31, 48, 17, 44, 35, 44, 39, 37, 29, 44, 44, 35, 41, 37, 37, 41, 19, 31, 35, 44, 31, 41, 31, 1, 19, 19, 26, 35, 44, 44, 35, 39, 31, 37, 37, 37, 35, 41, 31, 44, 37, 41, 35, 3

In [24]:
checkpoint = 'checkpoints/i200_l512.ckpt'
# Choose the start index so that a full batch of patterns is available
samp = sample(checkpoint, 2000, lstm_size, len(vocab), batch_generator, np.random.randint(0, batch_generator.get_n_patterns() - batch_size + 1))
print(samp)

KeyboardInterrupt: 

In [None]:
checkpoint = 'checkpoints/i600_l512.ckpt'
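# sample() takes a batch generator and a start index rather than a prime string
samp = sample(checkpoint, 1000, lstm_size, len(vocab), batch_generator, np.random.randint(0, batch_generator.get_n_patterns() - batch_size + 1))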
print(samp)

In [None]:
checkpoint = 'checkpoints/i1200_l512.ckpt'
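# sample() takes a batch generator and a start index rather than a prime string
samp = sample(checkpoint, 1000, lstm_size, len(vocab), batch_generator, np.random.randint(0, batch_generator.get_n_patterns() - batch_size + 1))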
print(samp)