# Task 2: Char-RNN

Char-RNN implements multi-layer recurrent neural networks (RNN, LSTM, and GRU) for training and sampling from character-level language models. In other words, the model takes a single text file as input and trains a recurrent neural network that learns to predict the next character in a sequence. The trained RNN can then generate text, character by character, that looks like the original training data. The network was first published by Andrej Karpathy; his original code, written in *Lua*, is available at https://github.com/karpathy/char-rnn.

Here we will implement Char-RNN using TensorFlow!

In [1]:
import time
import numpy as np
import tensorflow as tf

# Notebook auto reloads code. (Ref: http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython)
%load_ext autoreload
%autoreload 2

## Part 1: Setup
In this part, we read the input text and preprocess it for training. There are two .txt files in the data folder; to keep computing time down, we use tinyshakespeare.txt here.

In [3]:
with open('data/tinyshakespeare.txt', 'r') as f:
    text = f.read()
# length of text is the number of characters in it
print('Length of text: {} characters'.format(len(text)))
# and take a quick look at the beginning of the text
print(text[:500])

Length of text: 1115394 characters
First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.

All:
We know't, we know't.

First Citizen:
Let us kill him, and we'll have corn at our own price.
Is't a verdict?

All:
No more talking on't; let it be done: away, away!

Second Citizen:
One word, good citizens.

First Citizen:
We are accounted poor


In [4]:
# The unique characters in the file
vocab = sorted(set(text))
print ('{} unique characters'.format(len(vocab)))

65 unique characters


In [5]:
# Creating a mapping from unique characters to indices
vocab_to_ind = {c: i for i, c in enumerate(vocab)}
ind_to_vocab = dict(enumerate(vocab))
text_as_int = np.array([vocab_to_ind[c] for c in text], dtype=np.int32)

# We mapped the characters to indices from 0 to len(vocab) - 1
for char, _ in zip(vocab_to_ind, range(20)):
    print('{:6s} ---> {:4d}'.format(repr(char), vocab_to_ind[char]))
# Show how the first 10 characters from the text are mapped to integers
print('{} --- characters mapped to int --- > {}'.format(text[:10], text_as_int[:10]))

'\n'   --->    0
' '    --->    1
'!'    --->    2
'$'    --->    3
'&'    --->    4
"'"    --->    5
','    --->    6
'-'    --->    7
'.'    --->    8
'3'    --->    9
':'    --->   10
';'    --->   11
'?'    --->   12
'A'    --->   13
'B'    --->   14
'C'    --->   15
'D'    --->   16
'E'    --->   17
'F'    --->   18
'G'    --->   19
First Citi --- characters mapped to int --- > [18 47 56 57 58  1 15 47 58 47]
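The reverse map `ind_to_vocab` is what later turns the network's predicted indices back into text during sampling. A small, self-contained round-trip check on a short snippet (the helpers `build_maps`, `encode` and `decode` are hypothetical, built the same way as the cell above) confirms that the mapping is lossless and that the indices run from 0 to len(vocab) - 1:

```python
def build_maps(corpus):
    """Character <-> index maps, built like vocab_to_ind / ind_to_vocab above."""
    chars = sorted(set(corpus))
    return {c: i for i, c in enumerate(chars)}, dict(enumerate(chars))

def encode(s, char_to_ind):
    """Map every character of `s` to its integer index."""
    return [char_to_ind[c] for c in s]

def decode(ids, ind_to_char):
    """Map integer indices back to a string."""
    return "".join(ind_to_char[i] for i in ids)

snippet = "First Citizen:\nBefore we proceed any further, hear me speak."
c2i, i2c = build_maps(snippet)
ids = encode(snippet, c2i)
assert decode(ids, i2c) == snippet                 # the mapping is lossless
assert sorted(set(ids)) == list(range(len(c2i)))   # indices cover 0 .. len(vocab) - 1
```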


## Part 2: Creating batches
Now that we have preprocessed the input data, we need to partition it. We will train the model with mini-batches, so how should we define a batch?

Let's first clarify the two concepts involved:
1. **batch_size**: Recall batches in a CNN: if we have 100 samples and set batch_size to 10, we send 10 samples to the network at a time. In an RNN, batch_size has the same meaning: it defines how many samples (sequences) we send to the network at a time.
2. **sequence_length**: Unlike a CNN, an RNN stores memory in its cells and passes information from step to step, so we also need a sequence_length (also called the number of 'steps'), which defines how long each sequence is.

Combining these two concepts: let N be the number of sequences in a batch and M the length of each sequence. In an RNN, batch_size **still** represents the number of sequences in a batch (N), but one batch of data is actually an array of size **[N, M]**.
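As a worked example, take the preset parameters used later (N = batch_size = 100 sequences, M = num_steps = 100 steps) and the 1,115,394-character tinyshakespeare file. The hypothetical helper `batch_layout` below only mirrors the trimming and reshaping arithmetic of `get_batches()`:

```python
def batch_layout(n_chars, n_seqs, n_steps):
    """Mirror the trimming/reshaping arithmetic used in get_batches()."""
    chars_per_batch = n_seqs * n_steps         # N * M characters per mini-batch
    n_batches = n_chars // chars_per_batch     # only full batches are kept
    kept = chars_per_batch * n_batches         # characters that survive trimming
    return {
        "chars_per_batch": chars_per_batch,
        "n_batches": n_batches,
        "kept": kept,
        "dropped": n_chars - kept,
        "reshaped": (n_seqs, kept // n_seqs),  # array that [N, M] windows are sliced from
    }

layout = batch_layout(1115394, 100, 100)  # tinyshakespeare with the preset parameters
print(layout["n_batches"], layout["dropped"], layout["reshaped"])  # 111 5394 (100, 11100)
```

So every training step sees a [100, 100] batch, one pass over the data takes 111 batches, and the 6000 training steps used below correspond to roughly 54 passes over the text.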

<span style="color:red">TODO:</span>
finish the get_batches() function below to generate mini-batches.

Hint: this function defines a generator, use *yield*.

In [6]:
def get_batches(array, n_seqs, n_steps):
    '''
    Partition data array into mini-batches
    input:
    array: input data
    n_seqs: number of sequences in a batch
    n_steps: length of each sequence
    output:
    x: inputs
    y: targets, which are x shifted by one position
       see the figure in Part 3 to get a sense of what a target looks like
    '''
    chars_per_batch = n_seqs * n_steps
    n_batches = len(array) // chars_per_batch
    # we only keep the full batches and ignore the rest.
    # .copy() so that shuffling the rows below does not modify the caller's array in place
    array = array[:chars_per_batch * n_batches].reshape((n_seqs, -1)).copy()
    
    # You should now create a loop to generate batches for inputs and targets
    #############################################
    #           TODO: YOUR CODE HERE            #
    #############################################
    while True:
        # shuffle the rows (whole sequences) once per pass over the data
        np.random.shuffle(array)
        for n in range(0, array.shape[1], n_steps):
            x = array[:, n:n + n_steps]
            y = np.zeros_like(x)
            # targets are the inputs shifted left by one; the last target
            # wraps around to the first input of the window (a simplification)
            y[:, :-1], y[:, -1] = x[:, 1:], x[:, 0]
            yield x, y

In [7]:
batches = get_batches(text_as_int, 10, 10)
x, y = next(batches)
print('x\n', x[:10, :10])
print('\ny\n', y[:10, :10])

x
 [[50 58 57  1 51 39 63  1 57 46]
 [57 47 53 52  1 53 44  1 56 43]
 [46 47 51  1 42 53 61 52  1 58]
 [ 1 43 52 43 51 63 11  0 37 43]
 [56 44 53 50 49  6  0 27 52  1]
 [ 1 40 43 43 52  1 57 47 52 41]
 [56 57  6  1 39 52 42  1 57 58]
 [18 47 56 57 58  1 15 47 58 47]
 [47 52  1 57 54 47 58 43  1 53]
 [52 58 43 42  1 60 47 56 58 59]]

y
 [[58 57  1 51 39 63  1 57 46 50]
 [47 53 52  1 53 44  1 56 43 57]
 [47 51  1 42 53 61 52  1 58 46]
 [43 52 43 51 63 11  0 37 43  1]
 [44 53 50 49  6  0 27 52  1 56]
 [40 43 43 52  1 57 47 52 41  1]
 [57  6  1 39 52 42  1 57 58 56]
 [47 56 57 58  1 15 47 58 47 18]
 [52  1 57 54 47 58 43  1 53 47]
 [58 43 42  1 60 47 56 58 59 52]]
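Note the last column of `y`: each row ends with the first input of its own window (row 0 of `x` starts with 50 and row 0 of `y` ends with 50), because `get_batches()` wraps the final target around instead of using the real next character. This is a common simplification. A variant that uses the true next character everywhere is sketched below, assuming we reserve one extra character per row for that purpose; the name `get_batches_exact` and its `shuffle`/`seed` parameters are hypothetical, not part of the assignment's code.

```python
import numpy as np

def get_batches_exact(array, n_seqs, n_steps, shuffle=True, seed=None):
    """Like get_batches(), but every target is the true next character.

    Inputs and targets are cut from the same data offset by one position,
    so even the last column of y holds the character that really follows.
    """
    array = np.asarray(array)
    chars_per_batch = n_seqs * n_steps
    # Keep one spare character so the final window also has a real target.
    n_batches = (len(array) - 1) // chars_per_batch
    if n_batches == 0:
        raise ValueError("array is too short for a single batch")
    inputs = array[:chars_per_batch * n_batches].reshape((n_seqs, -1))
    targets = array[1:chars_per_batch * n_batches + 1].reshape((n_seqs, -1))
    rng = np.random.default_rng(seed)
    while True:
        # Shuffle rows of inputs and targets together; fancy indexing makes
        # copies, so the caller's array is never modified.
        order = rng.permutation(n_seqs) if shuffle else np.arange(n_seqs)
        x_all, y_all = inputs[order], targets[order]
        for n in range(0, x_all.shape[1], n_steps):
            yield x_all[:, n:n + n_steps], y_all[:, n:n + n_steps]

demo_data = np.arange(25)
xb, yb = next(get_batches_exact(demo_data, n_seqs=2, n_steps=3, shuffle=False))
print(xb.tolist(), yb.tolist())  # [[0, 1, 2], [12, 13, 14]] [[1, 2, 3], [13, 14, 15]]
```

With the integer sequence 0..24 as data, every target is exactly its input plus one, which makes the shift easy to check by eye.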


## Part 3: Build Char-RNN model
In this section we build our Char-RNN model. It consists of an input layer, an rnn_cell layer, an output layer, a loss, and an optimizer; we will build them one by one.

The goal is to predict new text given a prime word, so for our training data we have to define inputs and targets. The figure below shows the structure of the Char-RNN network.

![structure](img/charrnn.jpg)
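The layers in the figure (input, recurrent, output, loss) can be illustrated without TensorFlow. The sketch below is a NumPy forward pass of a *single-layer vanilla (tanh) RNN* over one sequence; it is not the `ecbm4040.CharRNN` implementation, which uses stacked LSTM/GRU cells and an optimizer. The function name, the weight names, the one-hot inputs and the initialization are all assumptions made for illustration.

```python
import numpy as np

def char_rnn_forward(x_ids, y_ids, params, vocab_size):
    """Forward pass of a single-layer vanilla RNN over one sequence.

    x_ids: input character indices, shape (M,)
    y_ids: target indices (the inputs shifted by one), shape (M,)
    Returns the mean cross-entropy loss and softmax probabilities, shape (M, V).
    """
    W_xh, W_hh, b_h, W_hy, b_y = params
    h = np.zeros(W_hh.shape[0])                    # initial hidden state
    probs = []
    for idx in x_ids:
        x = np.zeros(vocab_size)
        x[idx] = 1.0                               # input layer: one-hot encoding
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)     # recurrent layer: new hidden state
        logits = W_hy @ h + b_y                    # output layer: one score per character
        p = np.exp(logits - logits.max())          # numerically stable softmax
        probs.append(p / p.sum())
    probs = np.array(probs)
    # loss: cross-entropy between the predictions and the true next characters
    loss = -np.mean(np.log(probs[np.arange(len(y_ids)), y_ids]))
    return loss, probs

rng = np.random.default_rng(0)
V, H = 65, 16                                      # 65 characters, toy hidden size
params = (rng.normal(0, 0.1, (H, V)), rng.normal(0, 0.1, (H, H)), np.zeros(H),
          rng.normal(0, 0.1, (V, H)), np.zeros(V))
x_ids = np.array([18, 47, 56, 57, 58])             # "First" (indices from the cell above)
y_ids = np.array([47, 56, 57, 58, 1])              # "irst " (inputs shifted by one)
loss, probs = char_rnn_forward(x_ids, y_ids, params, V)
print(probs.shape, loss)
```

With small random weights the predictions are close to uniform, so the loss should be near ln(65) ≈ 4.17; training (the optimizer, omitted here) is what pushes it down.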

<span style="color:red">TODO:</span>
finish all TODOs in ecbm4040.CharRNN and the blanks in the following cells.

**Note: Training with the following parameter settings takes about 20 minutes on a GTX 1070 GPU, so we suggest using GCP for this task.**

In [9]:
from ecbm4040.CharRNN import *

### Training
Set sampling to False (the default) to train the network. Checkpoints are saved automatically during training; in this run they were written to checkpoints_LSTM, as listed by tf.train.get_checkpoint_state below.

In [10]:
# these are preset parameters; you can change them to get better results
batch_size = 100         # Sequences per batch
num_steps = 100          # Number of sequence steps per batch
rnn_size = 256           # Size of hidden layers in rnn_cell
num_layers = 2           # Number of hidden layers
learning_rate = 0.005    # Learning rate

In [11]:
model = CharRNN(len(vocab), batch_size, num_steps, 'LSTM', rnn_size,
               num_layers, learning_rate)
batches = get_batches(text_as_int, batch_size, num_steps)
model.train(batches, 6000, 2000)

step: 200  loss: 2.0194  0.1889 sec/batch
step: 400  loss: 1.7171  0.1864 sec/batch
step: 600  loss: 1.5774  0.1907 sec/batch
step: 800  loss: 1.4624  0.1934 sec/batch
step: 1000  loss: 1.5236  0.1900 sec/batch
step: 1200  loss: 1.3919  0.1867 sec/batch
step: 1400  loss: 1.3587  0.1975 sec/batch
step: 1600  loss: 1.3317  0.1892 sec/batch
step: 1800  loss: 1.3074  0.1925 sec/batch
step: 2000  loss: 1.2914  0.1955 sec/batch
step: 2200  loss: 1.2880  0.1961 sec/batch
step: 2400  loss: 1.2556  0.1959 sec/batch
step: 2600  loss: 1.2971  0.1892 sec/batch
step: 2800  loss: 1.2453  0.1946 sec/batch
step: 3000  loss: 1.1979  0.1891 sec/batch
step: 3200  loss: 1.2134  0.1926 sec/batch
step: 3400  loss: 1.2324  0.1902 sec/batch
step: 3600  loss: 1.2242  0.1878 sec/batch
step: 3800  loss: 1.1876  0.1914 sec/batch
step: 4000  loss: 1.1696  0.1923 sec/batch
step: 4200  loss: 1.2042  0.1956 sec/batch
step: 4400  loss: 1.1697  0.1887 sec/batch
step: 4600  loss: 1.1915  0.1883 sec/batch
step: 4800  los

In [12]:
# look up checkpoints
tf.train.get_checkpoint_state('checkpoints_LSTM')

model_checkpoint_path: "checkpoints_LSTM/i6000_l256.ckpt"
all_model_checkpoint_paths: "checkpoints_LSTM/i2000_l256.ckpt"
all_model_checkpoint_paths: "checkpoints_LSTM/i4000_l256.ckpt"
all_model_checkpoint_paths: "checkpoints_LSTM/i6000_l256.ckpt"

### Sampling
Set sampling to True to generate new text one character at a time. We can also restore earlier checkpoints to see how the network learned gradually.
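The internals of `model.sample()` are not shown in this notebook. A common choice in char-RNN implementations is to feed the prime text, then repeatedly sample the next character from the predicted distribution restricted to its `top_n` most likely characters, and feed that character back in. The sketch below assumes that scheme; `pick_top_n`, `sample_text` and the toy stand-in model are hypothetical, with `next_probs` playing the role of running the trained network.

```python
import random

def pick_top_n(probs, top_n=5, rng=random):
    """Sample an index from `probs`, restricted to the top_n most likely entries."""
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_n]
    weights = [probs[i] for i in top]  # random.choices normalizes the weights itself
    return rng.choices(top, weights=weights)[0]

def sample_text(next_probs, prime, length, ind_to_vocab, vocab_to_ind, top_n=5, seed=0):
    """Generate text one character at a time.

    next_probs(history_ids) -> probabilities for the next character; it stands
    in for running the trained network on the sequence generated so far.
    """
    rng = random.Random(seed)
    ids = [vocab_to_ind[c] for c in prime]
    for _ in range(length):
        ids.append(pick_top_n(next_probs(ids), top_n, rng))  # feed the choice back in
    return "".join(ind_to_vocab[i] for i in ids)

# Hypothetical toy "model" over a 3-character vocabulary.
toy_vocab = "abc"
toy_c2i = {c: i for i, c in enumerate(toy_vocab)}
toy_i2c = dict(enumerate(toy_vocab))

def toy_next_probs(ids):
    """Strongly favours the character that follows the last one."""
    probs = [0.05] * len(toy_vocab)
    probs[(ids[-1] + 1) % len(toy_vocab)] = 0.9
    return probs

print(sample_text(toy_next_probs, "a", 5, toy_i2c, toy_c2i, top_n=1))  # abcabc
```

With `top_n=1` sampling becomes greedy and deterministic; larger values trade some coherence for variety, which is one reason samples from the same checkpoint differ between runs.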

In [13]:
model = CharRNN(len(vocab), batch_size, num_steps,'LSTM', rnn_size,
               num_layers, learning_rate, sampling=True)
# choose the last checkpoint and generate new text
checkpoint = tf.train.latest_checkpoint('checkpoints')
samp = model.sample(checkpoint, 1000, len(vocab), vocab_to_ind, ind_to_vocab, prime="LORD ")
print(samp)

INFO:tensorflow:Restoring parameters from checkpoints/i6000_l256.ckpt
LORD TYRANUS:
So; they then, will thine on your head is troth,
And well I have all my call that be made as long.

PARIS:
To the mother, what are your mother?

Secont Servingman:
What swear, I will not so stoop on wife?

PETER:
I have sorrow stain the state to tranch thou art.

LUCIO:
I will not send me for their servant of your silling and they
are an indouching of to trump,
And then I must altogety to thine entingive;
I must be so, then here come forth to thee.

KING RICHARD III:
And would you have the power of him and his
meaning.

CAPULET:
I am a sight of me out of your beauty,
I will be to that strange, tell me that to-day,
Which the south of his cunnor with the world,
The pardon than strive with him to step a supble, which
such little broughts that save and marry wings, thou hast made
an one where when thy son, thou hast but bid, a break
An end in his soul fortune will be pride:
As well as thou and this still be

In [14]:
# choose a checkpoint other than the final one and see the results. It could be nasty, don't worry!
#############################################
#           TODO: YOUR CODE HERE            #
#############################################
checkpoint = 'checkpoints_LSTM/i4000_l256.ckpt'
samp = model.sample(checkpoint, 1000, len(vocab), vocab_to_ind, ind_to_vocab, prime="LORD ")
print(samp)

INFO:tensorflow:Restoring parameters from checkpoints_LSTM/i4000_l256.ckpt
LORD AUMILO:
A cundiss that, a word. A with yonder things
To the corn, for this sons hath to supplod'd.

GLOUCESTER:
I am a greatire of my beauty.

GREMIO:
It is the treason of the case of her
Thou straight, and that I can tell your brief to him.

BUCKINGHAM:
Now, to him for his commonwealth the senators,
Which he having tamed the wars of state warm man,
When you have barrable strive to their horses,
With his assisting battle strokes with women.

KING RICHARD III:
With her shin not, my gallant I would
When I were short with a crown of me as
But my true ladys' seas, should she his son:
Thou hast spoke to this dead, why, then you will be,
And being the point of thine a true our war;
Which with and throne shall have this thought of hang.

GLEUCHER:
A penitent hole, and thou hast my state,
Thou whight not how? and this to my be arm,
That's my the postern of my thoughts.

BUCKINGHAM:
The gates of the shall--this dece

### Change another type of RNN cell
So far we have used LSTM cells, as in the original work, but GRU cells have become increasingly popular. Let's change the cell in the rnn_cell layer to a GRU cell and see how it performs. Keep the number of training steps the same as above.

**Note: Change the name of your saved checkpoints (this run uses checkpoints_GRU), or they will overwrite the LSTM results you have already saved.**
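Swapping the cell type also changes the number of weights. The sketch below counts recurrent-layer parameters for the preset rnn_size = 256 and num_layers = 2, assuming one-hot inputs of size 65 to the first layer (an assumption; an embedding layer would change the first-layer count), one bias vector per gate or candidate block (some implementations add a second bias to the GRU), and excluding the output layer, which is identical for both cells. The helpers `recurrent_params` and `stack_params` are hypothetical.

```python
def recurrent_params(cell, input_size, hidden_size):
    """Weights + biases of one recurrent layer (one bias vector per block)."""
    # LSTM: input, forget, output gates + candidate; GRU: update, reset gates + candidate
    blocks = {"LSTM": 4, "GRU": 3}[cell]
    return blocks * (hidden_size * (input_size + hidden_size) + hidden_size)

def stack_params(cell, vocab_size=65, rnn_size=256, num_layers=2):
    """Total recurrent parameters of a stack of identical layers."""
    total = 0
    input_size = vocab_size  # assumes one-hot inputs to the first layer
    for _ in range(num_layers):
        total += recurrent_params(cell, input_size, rnn_size)
        input_size = rnn_size  # deeper layers see the previous layer's hidden state
    return total

lstm_total, gru_total = stack_params("LSTM"), stack_params("GRU")
print(lstm_total, gru_total, gru_total / lstm_total)  # 855040 641280 0.75
```

Under these assumptions the GRU stack has 25% fewer recurrent weights. The measured speed gap in the logs is smaller than that, probably because the output layer, softmax and data feeding cost the same for both cells.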

In [15]:
# these are preset parameters; you can change them to get better results
batch_size = 100         # Sequences per batch
num_steps = 100          # Number of sequence steps per batch
rnn_size = 256           # Size of hidden layers in rnn_cell
num_layers = 2           # Number of hidden layers
learning_rate = 0.005    # Learning rate

model = CharRNN(len(vocab), batch_size, num_steps, 'GRU', rnn_size,
               num_layers, learning_rate)
batches = get_batches(text_as_int, batch_size, num_steps)
model.train(batches, 6000, 2000)

step: 200  loss: 1.9665  0.1746 sec/batch
step: 400  loss: 1.6980  0.1738 sec/batch
step: 600  loss: 1.5670  0.1748 sec/batch
step: 800  loss: 1.4581  0.1760 sec/batch
step: 1000  loss: 1.5451  0.1734 sec/batch
step: 1200  loss: 1.3833  0.1834 sec/batch
step: 1400  loss: 1.3690  0.1731 sec/batch
step: 1600  loss: 1.3436  0.1729 sec/batch
step: 1800  loss: 1.3182  0.1842 sec/batch
step: 2000  loss: 1.3090  0.1771 sec/batch
step: 2200  loss: 1.3000  0.1788 sec/batch
step: 2400  loss: 1.2858  0.1761 sec/batch
step: 2600  loss: 1.3043  0.1755 sec/batch
step: 2800  loss: 1.2655  0.1789 sec/batch
step: 3000  loss: 1.2446  0.1820 sec/batch
step: 3200  loss: 1.2382  0.1844 sec/batch
step: 3400  loss: 1.2618  0.1759 sec/batch
step: 3600  loss: 1.2537  0.1771 sec/batch
step: 3800  loss: 1.2227  0.1802 sec/batch
step: 4000  loss: 1.2143  0.1762 sec/batch
step: 4200  loss: 1.2385  0.1786 sec/batch
step: 4400  loss: 1.2033  0.1781 sec/batch
step: 4600  loss: 1.2390  0.1747 sec/batch
step: 4800  los

In [16]:
model = CharRNN(len(vocab), batch_size, num_steps, 'GRU', rnn_size,
               num_layers, learning_rate, sampling=True)
# choose the last checkpoint and generate new text
checkpoint = tf.train.latest_checkpoint('checkpoints_GRU')
samp = model.sample(checkpoint, 1000, len(vocab), vocab_to_ind, ind_to_vocab, prime="LORD ")
print(samp)

INFO:tensorflow:Restoring parameters from checkpoints_GRU/i6000_l256.ckpt
LORD ARCHIS:
Whom with my servant doth the king's dear?

BUCKINGHAM:
What is the woes be true to hear the common man!

KING EDWARD IV:
The good, whose foul tongue seess his father.

GLOUCESTER:
I dare not long, and show me well as heavel,
That he dreads such me thanks his shield to thee.
What hath he should be consupt yates?

LADY ANNE:
Why, then, I think there.

GLOUCESTER:
Then the kump' day the duke of the day.

QUEEN ELIZABETH:
But it is so, my larges all, but shadows meet.

LADY ANNE:s true.

KING RICHARD III:
The soldiers of as he which the war
Should bring me to my hand with things that made
When then his hand the horrer than you went.

GLOUCESTER:
Where is to best croom out his father?

KING EDWARD IV:
And therefore I had shade the sea while then.

PRINCE EDWARD:
Ay, boy: it is the day, and there my tongue;
Being another which days warm with them;
And therefore give the sendous provered grave.

KING EDWAR

#### Questions
1. Compare the results of the two networks you built and explain what caused the difference. (This is a qualitative comparison and should be based on the specific models you built.)
2. Discuss the differences between LSTM cells and GRU cells. What are the pros and cons of using GRU cells?

Answer:

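1. The GRU trains slightly faster than the LSTM (about 0.18 sec/batch vs. 0.19 sec/batch). With the same rnn_size, an LSTM layer computes four gate/candidate blocks per step versus three for a GRU, so it needs more tensor operations.  
The LSTM performs a little better. The GRU's loss drops slightly faster during the first ~1200 steps, but from step 1400 on the LSTM's logged loss is lower at every reported step (e.g. 1.1696 vs. 1.2143 at step 4000), and its samples contain somewhat more complete sentences and correctly spelled words. A likely reason is that, at the same rnn_size, the LSTM has more parameters and a separate, gated cell state, which gives it more capacity.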


2. Unlike an LSTM, a GRU has no separate cell state and uses 2 gates instead of 3 (the LSTM has input, forget, and output gates). The GRU's two gates are an update gate and a reset gate. The update gate decides how much of the previous hidden state is carried over into the new state, and the reset gate decides how much of the previous hidden state is used when computing the new candidate state. Like the LSTM, the GRU gates the flow of information, but without a separate memory cell; and because it has no output gate, it exposes its full hidden state at every step.  
*Pros:* A GRU has fewer gates and parameters, so it is computationally cheaper and trains faster than an LSTM of the same size.  
*Cons:* In our experiment the GRU reached a slightly higher loss and produced somewhat less coherent text than the LSTM. More generally, with fewer parameters and no separate cell state it may have less capacity on some tasks, although published comparisons across tasks are mixed.
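The structural differences in answer 2 can be read off a single time step. The NumPy sketch below uses the standard LSTM equations and one common GRU convention, h_new = z * h + (1 - z) * h_tilde with the reset gate applied to h before the candidate's matrix product (some references swap the roles of z and 1 - z). Function names, weight layouts and the toy sizes are assumptions, not taken from `ecbm4040.CharRNN`.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x, h, c, W, b):
    """One LSTM step. W: (4H, X+H), b: (4H,). Returns the new (h, c)."""
    H = h.shape[0]
    a = W @ np.concatenate([x, h]) + b
    i = sigmoid(a[:H])            # input gate: how much new content to write
    f = sigmoid(a[H:2 * H])       # forget gate: how much old cell state to keep
    o = sigmoid(a[2 * H:3 * H])   # output gate: how much of the cell to expose
    g = np.tanh(a[3 * H:])        # candidate content
    c_new = f * c + i * g         # separate cell state (the "memory")
    h_new = o * np.tanh(c_new)    # hidden state is a gated view of the cell
    return h_new, c_new

def gru_step(x, h, W_zr, b_zr, W_h, b_h):
    """One GRU step. W_zr: (2H, X+H), W_h: (H, X+H). Returns the new h only."""
    H = h.shape[0]
    zr = sigmoid(W_zr @ np.concatenate([x, h]) + b_zr)
    z, r = zr[:H], zr[H:]         # update gate, reset gate
    h_tilde = np.tanh(W_h @ np.concatenate([x, r * h]) + b_h)  # reset gate scales the past
    return z * h + (1.0 - z) * h_tilde  # interpolation; no cell state, no output gate

rng = np.random.default_rng(0)
X, H = 65, 8                      # toy sizes, not the notebook's rnn_size
x_t = np.eye(X)[18]               # one-hot input
h_prev, c_prev = np.zeros(H), np.zeros(H)
h_lstm, c_lstm = lstm_step(x_t, h_prev, c_prev, rng.normal(0, 0.1, (4 * H, X + H)), np.zeros(4 * H))
h_gru = gru_step(x_t, h_prev, rng.normal(0, 0.1, (2 * H, X + H)), np.zeros(2 * H),
                 rng.normal(0, 0.1, (H, X + H)), np.zeros(H))
print(h_lstm.shape, c_lstm.shape, h_gru.shape)  # (8,) (8,) (8,)
```

The LSTM step carries two state vectors (h, c) and uses four weight blocks; the GRU step carries one and uses three, which matches the parameter and speed differences discussed above.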