# Character RNN

We use an updated version of the code from Andrej Karpathy's seminal [blog post](http://karpathy.github.io/2015/05/21/rnn-effectiveness), *The Unreasonable Effectiveness of Recurrent Neural Networks* ([gist](https://gist.github.com/karpathy/d4dee566867f8291f086) | [code](https://github.com/karpathy/char-rnn)). The updated code is available [here](https://render.githubusercontent.com/view/sessions/RNN.ipynb).

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from collections import Counter
from random import uniform

In [1]:
!mkdir -p dataset  # -p: do not fail if the directory already exists


In [2]:
!wget https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt -O dataset/input.txt

--2021-03-09 14:57:15--  https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1115394 (1.1M) [text/plain]
Saving to: 'dataset/input.txt'


2021-03-09 14:57:16 (3.92 MB/s) - 'dataset/input.txt' saved [1115394/1115394]



In [4]:
filename = 'dataset/input.txt'
with open(filename, 'rt') as f:
    text = f.read()
print("Number of characters: {}".format(len(text)))
print("Number of unique characters: {}".format(len(set(text))))
print("Number of lines: {}".format(text.count('\n')))
# rough estimate: this counts spaces, not actual word tokens
print("Number of words: {}".format(text.count(' ')))
print()
print("Excerpt:")
print("*" * len("Excerpt:"))
print(text[:500])


Number of characters: 1115394
Number of unique characters: 65
Number of lines: 40000
Number of words: 169892

Excerpt:
********
First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.

All:
We know't, we know't.

First Citizen:
Let us kill him, and we'll have corn at our own price.
Is't a verdict?

All:
No more talking on't; let it be done: away, away!

Second Citizen:
One word, good citizens.

First Citizen:
We are accounted poor


In [6]:
# collect the unique characters; sorted so that indices are reproducible across runs
chars = sorted(set(text))

# total characters and unique character size
data_size, vocab_size = len(text), len(chars) 

# dicts mapping index -> char and char -> index
idx_to_char = dict(enumerate(chars))
char_to_idx = {ch: i for i, ch in idx_to_char.items()}
# the whole text encoded as a numpy array of character indices
data = np.array([char_to_idx[c] for c in text], dtype=int)
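
As a quick sanity check of this kind of mapping, the sketch below builds the two dictionaries on a short stand-in string and verifies that encoding followed by decoding returns the original text. The names `toy_text`, `toy_idx_to_char` and `toy_char_to_idx` are hypothetical and only used for illustration; `sorted` is used here only to make the indices reproducible.

```python
import numpy as np

toy_text = "hello world"  # hypothetical stand-in for the corpus
toy_chars = sorted(set(toy_text))  # unique characters in a fixed order

# index -> char and char -> index
toy_idx_to_char = dict(enumerate(toy_chars))
toy_char_to_idx = {ch: i for i, ch in toy_idx_to_char.items()}

# encode the text as integer indices, then decode it back
encoded = np.array([toy_char_to_idx[c] for c in toy_text], dtype=int)
decoded = ''.join(toy_idx_to_char[i] for i in encoded)
```

If the round trip does not reproduce the input, the two dictionaries are not inverses of each other.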


Defining the softmax and loss functions

In [7]:
def softmax(x):
    x = x.squeeze()
    # subtract the max for numerical stability; softmax is invariant to constant shifts
    expx = np.exp(x - x.max())
    return expx / expx.sum()

def cross_entropy(predictions, targets):
    return sum([-np.log(p[t]) for p, t in zip(predictions, targets)])
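
Softmax does not change when a constant is added to all logits, so a common trick is to subtract the largest logit before exponentiating: every exponent is then at most zero and `np.exp` cannot overflow. The sketch below illustrates this with a hypothetical helper `stable_softmax`, written only for this example, on logits large enough that a naive `np.exp` would overflow.

```python
import numpy as np

def stable_softmax(x):
    """Softmax that shifts by the maximum logit to avoid overflow."""
    x = np.asarray(x, dtype=float).ravel()
    expx = np.exp(x - x.max())  # largest exponent is exp(0) = 1
    return expx / expx.sum()

logits = np.array([1000.0, 1001.0, 1002.0])  # np.exp(1000.0) alone would overflow
probs = stable_softmax(logits)
shifted_probs = stable_softmax(logits - 1000.0)  # same distribution, shifted logits

# cross-entropy of the prediction when the true class is index 2
loss = -np.log(probs[2])
```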

In [8]:
# Optimization code 
def average(prev, curr, β):
    return [
        β * p + (1 - β) * c
        for p, c
        in zip(prev, curr)
    ]
    
class AdamOptimizer:
    def __init__(self, α=0.001, β1=0.9, β2=0.999, ϵ=1e-8):
        self.α = α
        self.β1 = β1
        self.β2 = β2
        self.ϵ = ϵ
        self.m = None
        self.v = None
        self.t = 0

    def send(self, gradients):
        if self.m is None:
            self.m = [0] * len(gradients)
        if self.v is None:
            self.v = [0] * len(gradients)

        self.t += 1
        αt = self.α * np.sqrt(1 - self.β2**self.t) / (1 - self.β1**self.t)
        self.m = average(self.m, gradients, self.β1)        
        self.v = average(self.v, (g*g for g in gradients), self.β2)

        updates = [-αt * mi / (np.sqrt(vi) + self.ϵ) for mi, vi in zip(self.m, self.v)]
        for upd in updates:
            assert np.isfinite(upd).all()
        return updates
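
To see the update rule in isolation, here is a compact, self-contained restatement of the same Adam step applied to a simple quadratic. This is a sketch, not the class above: the function `adam_minimize`, the learning rate of 0.01, and the step count are assumptions chosen for the illustration.

```python
import numpy as np

def adam_minimize(grad_fn, w, steps=2000, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """Run Adam for a fixed number of steps and return the final parameters."""
    m = np.zeros_like(w)  # running average of gradients
    v = np.zeros_like(w)  # running average of squared gradients
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        # bias-corrected step size, same form as αt in the optimizer above
        lr_t = lr * np.sqrt(1 - b2**t) / (1 - b1**t)
        w = w - lr_t * m / (np.sqrt(v) + eps)
    return w

# minimize ||w - target||^2, whose gradient is 2 * (w - target)
target = np.array([3.0, -2.0])
w_final = adam_minimize(lambda w: 2 * (w - target), np.zeros(2))
```

After enough steps `w_final` should sit close to `target`; how close depends on the learning rate and the number of steps.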


In [9]:
def step(params, x_t, h_t_1=None):
    Wxh, Whh, Why, bh, by = params
    if h_t_1 is None:
        h_t_1 = np.zeros(h_size)    
    if h_t_1.ndim == 1:
        h_t_1 = h_t_1.reshape(-1, 1)
    if x_t.ndim == 1:
        x_t = x_t.reshape(-1, 1)

    # update hidden layer
    h_t = np.tanh(Wxh @ x_t + Whh @ h_t_1 + bh)
    # fully connected layer
    z_t = Why @ h_t + by
    z_t = z_t.squeeze()
    h_t = h_t.squeeze()
    # softmax readout layer
    yhat_t = softmax(z_t)
    return h_t, z_t, yhat_t

def feed_forward(params, x, h0=None):
    if h0 is None:
        h0 = np.zeros(h_size)
    h = {-1: h0}
    
    shape = (len(x), vocab_size)
    x_original = x.copy()
    x, z, yhat = np.zeros(shape), np.empty(shape), np.empty(shape)
    
    for t, char_idx in enumerate(x_original):
        x[t, char_idx] = 1.0 # one-hot encoding input into xs  
        h[t], z[t, :], yhat[t, :] = step(params, x[t, :], h[t-1])

    return x, h, z, yhat


In [10]:
def back_propagation(params, x, y, h0=None):
    """Calculates loss and gradients of loss wrt parameters
    
    See http://cs231n.github.io/neural-networks-case-study/#grad
    
    Parameters
    ----------
    params : list of arrays
        model parameters
    x, y : list of integers
        indices of characters for the input and target of the network
    h0 : np.ndarray
        initial hidden state of shape Hx1
    Returns
    -------
    loss : float
        value of loss function
    gradients : tuple of np.ndarray
        (dWxh, dWhh, dWhy, dbh, dby), clipped gradients of the loss function wrt model parameters
    h : np.ndarray
        hidden state after the last input, to be passed as h0 for the next sequence
    """
    n_inputs = len(x)
    # forward pass: compute predictions and loss going forwards
    x, h, z, yhat = feed_forward(params, x, h0=h0)
    loss = cross_entropy(yhat, y)
    
    # backward pass: compute gradients going backwards
    Wxh, Whh, Why, bh, by = params
    dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
    dbh, dby = np.zeros_like(bh), np.zeros_like(by)
    dh_next = np.zeros_like(h[0])
    
    # back propagate through the unrolled network
    for t in reversed(range(len(y))):
        # backprop into the logits: softmax + cross-entropy gives dL/dz = yhat - y
        x_t, h_t, yhat_t, y_t = x[t], h[t], yhat[t], y[t] # h is a dict keyed by time step (including -1), so index it instead of zipping
        dyhat = yhat_t.copy()
        dyhat[y_t] -= 1 # Yhat - Y
        dWhy += np.outer(dyhat, h_t)  # outer product, same as dyhat.reshape(-1, 1) @ h_t.reshape(1, -1)
        dby += dyhat.reshape(-1, 1) # dby is a column vector
        # backprop into h_t
        dh = Why.T @ dyhat + dh_next
        # backprop through tanh
        dh = (1 - h_t * h_t) * dh # tanh'(a) = 1 - tanh(a)^2, and h_t = tanh(a)
        dbh += dh.reshape(-1, 1) # dbh is a column vector
        dWxh += np.outer(dh, x_t)
        dWhh += np.outer(dh, h[t-1]) # try to use h[t] instead of h[t-1] and see effect in grad_check
        dh_next = Whh.T @ dh

    gradients = dWxh, dWhh, dWhy, dbh, dby
    for grad in gradients:
        # clip to mitigate exploding gradients
        np.clip(grad, -5, 5, out=grad) # out=grad makes this run in-place
    return loss, gradients, h[n_inputs-1]
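
Analytic gradients like the ones above are usually verified against a numerical estimate. The sketch below shows the idea on a single tanh layer rather than the full RNN: `numerical_gradient` is a hypothetical helper written for this example, and the toy loss is an assumption chosen so that the analytic gradient is easy to state. When applying the same check to `back_propagation`, keep in mind that gradient clipping changes any entry whose magnitude exceeds 5.

```python
import numpy as np

def numerical_gradient(f, w, eps=1e-5):
    """Central-difference estimate of the gradient of scalar f at w."""
    grad = np.zeros_like(w)
    for i in np.ndindex(*w.shape):
        old = w[i]
        w[i] = old + eps
        f_plus = f(w)
        w[i] = old - eps
        f_minus = f(w)
        w[i] = old  # restore the original value
        grad[i] = (f_plus - f_minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))
x = rng.standard_normal(4)

# toy loss: sum of the activations of one tanh layer
loss_fn = lambda W_: np.tanh(W_ @ x).sum()

# analytic gradient: d/dW sum(tanh(W x)) = outer(1 - h^2, x) with h = tanh(W x)
h = np.tanh(W @ x)
analytic = np.outer(1 - h * h, x)
numeric = numerical_gradient(loss_fn, W)

rel_error = np.abs(analytic - numeric).max() / np.abs(analytic).max()
```

A small `rel_error` indicates that the analytic and numerical gradients agree; a large one usually points to an indexing or sign mistake in the backward pass.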


## Trainer

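The trainer walks through the data in consecutive windows of `seq_length` characters, computes the loss and gradients for each window, and applies an Adam update. The hidden state is carried over from one window to the next and is reset when the position wraps around to the start of the data.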

In [11]:
class Trainer:
    def __init__(self, data, seq_length):
        self.optimizer = AdamOptimizer()
        self.step, self.pos, self.h = 0, 0, None
        self.seq_length = seq_length
        self.data = data

    def train(self, params):
        self.step += 1
        if self.pos + self.seq_length + 1 >= len(self.data):
            # reset data position and hidden state
            self.pos, self.h = 0, None
        x = self.data[self.pos : self.pos + self.seq_length]
        y = self.data[self.pos + 1 : self.pos + self.seq_length + 1]
        
        loss, gradients, self.h = back_propagation(params, x, y, self.h)
        Δs = self.optimizer.send(gradients)
        for par, Δ in zip(params, Δs):
            par += Δ
        self.pos += self.seq_length
        return loss
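
The slicing in `train` can be illustrated on a toy sequence: each target window `y` is the input window `x` shifted by one position, and consecutive windows do not overlap. The names `toy_data` and `toy_seq_length` are hypothetical stand-ins for the encoded corpus and the real sequence length.

```python
import numpy as np

toy_data = np.arange(10)  # hypothetical stand-in for the encoded corpus
toy_seq_length = 4

pos = 0
windows = []
# stop where the trainer would reset its position
while pos + toy_seq_length + 1 < len(toy_data):
    x = toy_data[pos : pos + toy_seq_length]          # input characters
    y = toy_data[pos + 1 : pos + toy_seq_length + 1]  # next-character targets
    windows.append((x, y))
    pos += toy_seq_length
```

With these toy values the loop collects two windows before the trainer would wrap around; the last few elements that do not fill a complete window are skipped.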


In [12]:
def sample(params, seed_idx, num_samples, h0=None):
    x = np.zeros((num_samples + 1, vocab_size), dtype=float)
    x[0, seed_idx] = 1
    idx = np.empty(num_samples, dtype=int)
    h_t = h0
    for t in range(num_samples):
        h_t, _, yhat_t = step(params, x[t, :], h_t)
        # draw from output distribution
        idx[t] = np.random.choice(vocab_size, p=yhat_t.ravel())
        x[t + 1, idx[t]] = 1
    chars = (idx_to_char[i] for i in idx)
    return ''.join(chars)


## Parameter initialization

In [13]:
h_size = 100 # number of units in hidden layer
seq_length = 25 # number of steps to unroll the RNN for
max_train_step = 500000

# initialize model parameters
Wxh = np.random.randn(h_size, vocab_size) * 0.01 
Whh = np.random.randn(h_size, h_size) * 0.01
Why = np.random.randn(vocab_size, h_size) * 0.01 
bh = np.zeros((h_size, 1)) # hidden layer bias
by = np.zeros((vocab_size, 1)) # readout layer bias
params = Wxh, Whh, Why, bh, by

trainer = Trainer(data, seq_length)


In [14]:
while trainer.step < max_train_step:
    loss = trainer.train(params)
    if trainer.step % (max_train_step//10) == 0:
        sample_text = sample(params, 0, 200)
        print(sample_text)
        print()
        print('train step {:d}, loss: {:.2g}'.format(trainer.step, loss))
        print('-'*80)


en:
Now the
gins day my feirs
Then his
ane conspeck you are toy;
I genal that is your panes sornact atill: and tethis your nase
Jalichis.

Sepoth I pear threeds you show, by thee, sir, as lo; the fart

train step 50000, loss: 36
--------------------------------------------------------------------------------
ed furghter thou my sword?

HASTINGS:
Alasion,.

BUCKINGHAM:
Then, when, my ramms, by ent since to him how: my soife! Beery-not is that well the consen
Thy will tween not thee this don''d now by pack 

train step 100000, loss: 19
--------------------------------------------------------------------------------
ence is thus broke me thou all neadsh a our my deave our peod I basping.
Hare the head?

Lake, Bolainster eign our toops made say and sence,
And did the lightroy the radious his laintners of thy may g

train step 150000, loss: 51
--------------------------------------------------------------------------------
ent
Unill the too my leave, and tike on't ther, who none:
Oument me?