In [None]:
## some voo-doo Jupyter magicks
%load_ext autoreload
%autoreload 2

# Do We Really Need English Lit Anymore? ;p

## Mar 29, 2018

We'll be building a 3-layer LSTM (though it can be easily extended) using [TensorFlow's RNN API](https://github.com/tensorflow/tensorflow/blob/r1.6/tensorflow/python/ops/rnn_cell_impl.py#L476).

Ultimately, it's important to understand how LSTMs work, but in practice, I've never seen one hand-coded outside of "deep learning" libraries.
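
For intuition only, here's a minimal NumPy sketch of what a single LSTM step computes. It is not TensorFlow's implementation: the gate ordering, the weight layout, and the names `lstm_step`, `W`, and `b` are assumptions made up for this illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step (illustrative layout, not TensorFlow's).

    x:       [batch, input_size]
    h_prev:  [batch, hidden_size]  previous hidden state
    c_prev:  [batch, hidden_size]  previous cell state
    W:       [input_size + hidden_size, 4 * hidden_size]
    b:       [4 * hidden_size]
    """
    ## one big matmul produces all four gate pre-activations at once
    z = np.dot(np.concatenate([x, h_prev], axis=1), W) + b
    i, g, f, o = np.split(z, 4, axis=1)
    ## forget part of the old memory, write part of the new candidate
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    ## expose a gated view of the memory as the new hidden state
    h = sigmoid(o) * np.tanh(c)
    return h, c

rng = np.random.RandomState(0)
batch, input_size, hidden_size = 2, 5, 3
W = rng.randn(input_size + hidden_size, 4 * hidden_size) * 0.1
b = np.zeros(4 * hidden_size)

## start from zeros, then unroll over a few time steps
h = np.zeros((batch, hidden_size))
c = np.zeros((batch, hidden_size))
for t in range(4):
    x_t = rng.randn(batch, input_size)
    h, c = lstm_step(x_t, h, c, W, b)

print(h.shape, c.shape)
```

The library cells wrap this same idea (plus initialization, scoping, and stacking), which is why we reach for them instead of writing it by hand.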

About today's dataset: it's all of William Shakespeare's gibberish (jk, he's supposedly real dope). We won't be training the network on it in this notebook, but I'll have another notebook running in the background that performs the optimization.
> **NOTE:** It took about 30 minutes to run on my 1080 Ti at home. Unless you use a GPU, you'll be waiting a while &ndash; I don't advise doing anything with this unless you have a GPU, or a desktop you can do without for some time.

---
**What you should leave with:** A practical understanding of how to use the TensorFlow RNN API, which gives us access to more than LSTMs and RNNs (as you'll see below). There's a lot of parameter fiddling we could do to determine the best setup for our network (I'm using some of the parameter settings from [Andrej Karpathy's blog post on RNNs](http://karpathy.github.io/2015/05/21/rnn-effectiveness/)).

You should continue to foster that itch that was (hopefully) sparked in the first workshop &ndash; sadly we'll likely switch to a different library in the fall; but this workshop ought to further showcase how using these can speed up your building of models to test out ideas and hypotheses you may have.

About the only time this isn't the case is when you're building a new architecture, but, uhh... we're not that good, yet. :p

# Contents:
1. [Building Our Model](#1.-Building-Our-Model)
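1. [Training the Network](#3.-Training-the-Network)
1. [Replacing William Shakespeare, Featuring the Machines](#4.-Replacing-William-Shakespeare,-Featuring-the-Machines)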

# 1. Building Our Model

The code below is essentially some initial setup, allowing us to write up parts of our `Model` as separate cells in Jupyter rather than one massive code cell.

In [None]:
## Class initialization
class Model:
    pass

Importing everything we need from TensorFlow to build the LSTM.

```python
from tensorflow.contrib import rnn # www.tensorflow.org/versions/r1.4/api_docs/python/tf/contrib/rnn
from tensorflow.contrib import legacy_seq2seq # www.tensorflow.org/versions/r1.4/api_docs/python/tf/contrib/legacy_seq2seq
```
The actual links: 
- [`from tensorflow.contrib import rnn`](https://www.tensorflow.org/versions/r1.4/api_docs/python/tf/contrib/rnn) This is the RNN library which basically houses everything we could ever want for the lecture.
- [`from tensorflow.contrib import legacy_seq2seq`](https://www.tensorflow.org/versions/r1.4/api_docs/python/tf/contrib/legacy_seq2seq) This is deprecated, and I'm not sure when it'll be taken out &ndash; but for the purposes of teaching, it works. We'll be using it to build a [Sequence to Sequence](https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html) model.

As noted in the blog post (the Seq2Seq link), these models are typically used in machine translation, but they can also be used in text generation. We'll, obviously, be partaking in the latter.

In [None]:
import tensorflow as tf
from tensorflow.contrib import rnn
from tensorflow.contrib import legacy_seq2seq as ls2s

import numpy as np

In [None]:
## This is how Python implements `switch` statements :/ we
##   have to use a dictionary instead, but we'll be fine. ^^
models = {
    "rnn" : rnn.BasicRNNCell,
    "gru" : rnn.GRUCell,
    "lstm": rnn.BasicLSTMCell,
    "nas" : rnn.NASCell,
}
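
As a standalone illustration of the dictionary-as-`switch` idea, here's a small sketch that has nothing TensorFlow-specific in it. The builder functions and the `build` helper are hypothetical stand-ins for the cell classes above.

```python
## hypothetical stand-ins for the real cell constructors
def make_rnn(size):
    return ("rnn", size)

def make_lstm(size):
    return ("lstm", size)

## the "switch": keys are the cases, values are what to call
builders = {
    "rnn" : make_rnn,
    "lstm": make_lstm,
}

def build(kind, size):
    ## an unknown key plays the role of the `default:` branch
    try:
        builder = builders[kind]
    except KeyError:
        raise ValueError("model type unsupported: {}".format(kind))
    return builder(size)

result = build("lstm", 700)
print(result)
```

Storing the callables themselves (not their results) means nothing gets constructed until we actually pick a case.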

In [None]:
def training(args, training):
    ## we want a batch_size and seq_length of 1 for sampling
    args.batch_size = args.batch_size if training else 1
    args.seq_length = args.seq_length if training else 1
    return args

def build_rnn(args):
    ## grabbing the desired model, if it exists
    assert args.model in models, \
        "-> model type unsupported: {}".format(args.model)
    cell_fn = models[args.model]
    
    ## building one cell per layer
    cells = [cell_fn(args.rnn_size) for _ in range(args.n_layers)]
    
    ## stacking the cells into the multi-layer network we desire
    ##  - this is akin to tacking on additional hidden layers to 
    ##    an ANN
    return rnn.MultiRNNCell(cells)
    
def build_reshape_input(args, inp):
    embeds = tf.get_variable("embedding", [args.vocab_size, args.rnn_size])
    inputs = tf.nn.embedding_lookup(embeds, inp)
    
    ## split the input into chunks of `seq_length`
    inputs = tf.split(inputs, args.seq_length, 1)
    ## drop the length-1 time axis from each chunk
    ## - ex: chunk.shape = [batch_size, 1, rnn_size] -> [batch_size, rnn_size];
    ##   only axis 1 is squeezed, so the other dimensions are left alone
    inputs = [tf.squeeze(input_, [1]) for input_ in inputs]
    
    return embeds, inputs
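
To see what the lookup, split, and squeeze do to the shapes, here's a NumPy sketch of the same three steps. The sizes are tiny made-up values, and NumPy indexing stands in for `tf.nn.embedding_lookup`; it mirrors the shape bookkeeping only, not the TensorFlow graph.

```python
import numpy as np

## made-up sizes, just to keep the shapes readable
batch_size, seq_length, vocab_size, rnn_size = 2, 4, 10, 3

rng = np.random.RandomState(0)
embeds = rng.randn(vocab_size, rnn_size)    ## one row per character
inp = rng.randint(0, vocab_size, size=(batch_size, seq_length))

## lookup: every character id becomes its embedding row
looked_up = embeds[inp]                     ## [batch, seq, rnn_size]

## split: one chunk per time step, each [batch, 1, rnn_size]
chunks = np.split(looked_up, seq_length, axis=1)

## squeeze: drop the length-1 time axis -> [batch, rnn_size]
steps = [np.squeeze(chunk, axis=1) for chunk in chunks]

print(looked_up.shape, len(steps), steps[0].shape)
```

The result is a Python list with one `[batch, rnn_size]` array per time step, which is the form a step-by-step decoder consumes.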

In [None]:
def __init__(self, args, training=True):
    ## the `training` flag shadows the `training()` helper in here, so
    ##   the same override is applied inline: we want a batch_size and
    ##   seq_length of 1 for sampling
    if not training:
        args.batch_size = 1
        args.seq_length = 1
    self.args = args
    
    self.cell = cell = build_rnn(args)
    
    self.input_data = tf.placeholder(tf.int32, 
                                     [args.batch_size, args.seq_length])
    self.targets    = tf.placeholder(tf.int32, 
                                     [args.batch_size, args.seq_length])
    ## initial state - this should be 0s b/c we can't possibly have
    ##   recall when we just start learning
    self.initial_state = cell.zero_state(args.batch_size, tf.float32)
    
    ## rnn language model, using softmax as the activation function
    with tf.variable_scope("rnnlm"):
        softmax_w = tf.get_variable("softmax_w", 
                                    [args.rnn_size, args.vocab_size])
        softmax_b = tf.get_variable("softmax_b", 
                                    [args.vocab_size])

    embeds, inputs = build_reshape_input(args, self.input_data)

    ## as the docs spec: https://www.tensorflow.org/versions/r1.4/api_docs/python/tf/contrib/legacy_seq2seq/rnn_decoder#args
    ##   we won't be using the index, so in standard python we spec "don't care" by '_'
    def loop(prev, _):
        ## standard neural net math :joy: (project the output onto the vocabulary)
        prev = tf.matmul(prev, softmax_w) + softmax_b
        ## pick the most likely character and feed its embedding back in
        prev_symbol = tf.stop_gradient(tf.argmax(prev, 1))
        return tf.nn.embedding_lookup(embeds, prev_symbol)

    
    outputs, last_state = ls2s.rnn_decoder(inputs, self.initial_state, cell, 
                                           loop_function=loop if not training else None, 
                                           scope="rnnlm")
    output = tf.reshape(tf.concat(outputs, 1), [-1, args.rnn_size])

    self.logits = tf.matmul(output, softmax_w) + softmax_b
    self.probs  = tf.nn.softmax(self.logits)
    loss = ls2s.sequence_loss_by_example(
            [self.logits],
            [tf.reshape(self.targets, [-1])],
            [tf.ones([args.batch_size * args.seq_length])])
    
    with tf.name_scope("cost"):
        self.cost = tf.reduce_sum(loss) / (args.batch_size * args.seq_length)
        
    self.final_state = last_state
    
    self.lr = tf.Variable(0.0, trainable=False)
    tvars = tf.trainable_variables()
    grads, _ = tf.clip_by_global_norm(tf.gradients(self.cost, tvars), args.grad_clip)
    
    with tf.name_scope("optimizer"):
        optimizer = tf.train.AdamOptimizer(self.lr)
        
    self.train_op = optimizer.apply_gradients(zip(grads, tvars))

    tf.summary.histogram("logits", self.logits)
    tf.summary.histogram("loss", loss)
    tf.summary.scalar("train_loss", self.cost)
    
Model.__init__ = __init__
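
Two pieces of that constructor are worth seeing in isolation: the cost (cross-entropy averaged over every character in the batch) and the gradient clipping by global norm. Below is a NumPy sketch of both under simplifying assumptions: all weights are 1, and the helper names `mean_cross_entropy` and `clip_by_global_norm` are local to this example, not the TensorFlow functions themselves.

```python
import numpy as np

def softmax(logits):
    ## subtract the row max first for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def mean_cross_entropy(logits, targets):
    ## logits: [batch * seq, vocab], targets: [batch * seq] of char ids
    probs = softmax(logits)
    picked = probs[np.arange(len(targets)), targets]
    ## sum of per-character losses divided by the number of characters
    return -np.log(picked).sum() / len(targets)

def clip_by_global_norm(grads, clip_norm):
    ## one norm over *all* gradients, then one shared scale factor
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = clip_norm / max(global_norm, clip_norm)
    return [g * scale for g in grads], global_norm

vocab_size = 5
targets = np.array([0, 1, 2, 3, 4, 0])

## a network that knows nothing spreads probability evenly: loss = ln(vocab)
uniform_loss = mean_cross_entropy(np.zeros((6, vocab_size)), targets)
## a network that is confidently right gets a loss near 0
confident_loss = mean_cross_entropy(np.eye(vocab_size)[targets] * 10.0, targets)

grads = [np.array([3.0, 0.0]), np.array([0.0, 4.0])]
clipped, norm = clip_by_global_norm(grads, 1.0)

print(uniform_loss, confident_loss, norm)
```

Because every gradient is scaled by the same factor, clipping shrinks the size of the update without changing its direction.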

In [None]:
def sample(self, sess, chars, vocab, num=200, prime="The ", sampling_type=1):
    state = sess.run(self.cell.zero_state(1, tf.float32))
    
    ## warm up the state on every primer character except the last
    for char in prime[:-1]:
        x = np.zeros((1, 1))
        x[0, 0] = vocab[char]
        feed = {self.input_data: x, self.initial_state: state}
        [state] = sess.run([self.final_state], feed)
        
    ## draw an index with probability proportional to its weight
    def weighted_pick(weights):
        t = np.cumsum(weights)
        s = np.sum(weights)
        return int(np.searchsorted(t, np.random.rand() * s))
    
    ret = prime
    char = prime[-1]
    for n in range(num):
        x = np.zeros((1, 1))
        x[0, 0] = vocab[char]
        feed = {self.input_data: x, self.initial_state: state}
        [probs, state] = sess.run([self.probs, self.final_state], feed)
        p = probs[0]
        
        ## 0 -> always argmax, 1 -> always sample, 2 -> sample only after a space
        sample_argmax = bool(sampling_type == 0 or (sampling_type == 2 and char != ' '))
        sample = np.argmax(p) if sample_argmax else weighted_pick(p)
        
        pred = chars[sample]
        ret += pred
        char = pred
        
    return ret

Model.sample = sample
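
The difference between taking the `argmax` and doing a weighted pick is easier to see on a toy distribution. This sketch reuses the cumulative-sum trick from `sample`, with a made-up 3-character distribution `p` and a seeded random generator passed in explicitly (an assumption added here so the example is repeatable).

```python
import numpy as np

def weighted_pick(weights, rng):
    ## cumulative sums turn the weights into bucket edges on [0, total)
    cumulative = np.cumsum(weights)
    ## a uniform draw lands in bucket i with probability weights[i] / total
    threshold = rng.rand() * cumulative[-1]
    return int(np.searchsorted(cumulative, threshold))

rng = np.random.RandomState(0)
p = np.array([0.1, 0.7, 0.2])    ## made-up next-character probabilities

## greedy decoding: always the single most likely character
greedy = int(np.argmax(p))

## sampling: each character shows up roughly in proportion to p
draws = [weighted_pick(p, rng) for _ in range(2000)]
freq = np.bincount(draws, minlength=len(p)) / float(len(draws))

print(greedy, freq)
```

Greedy decoding tends to get stuck repeating the same safe phrases, while sampling trades some accuracy for variety, which is why `sampling_type=1` is the default above.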

In [None]:
from utils import Args

args = {
    "data_dir": "/data/shakespeare/",
    "rnn_size": 700,
    "n_layers":   3,
}

args = Args(args=args)

Andrej (he's dope, btw) has a few data sources we can use. You'll find them here: https://cs.stanford.edu/people/karpathy/char-rnn/. However, tonight we'll be using the Shakespearean text &ndash; the initial plans were to use the Linux Kernel, but that came out to ~750MB of text and would have taken 11.5 days to train.

I'll train it over the beginning of the summer and provide the weights file if anyone wants it, though. :D

To snag the Shakespeare corpus, let's run the command below:

In [None]:
!curl https://cs.stanford.edu/people/karpathy/char-rnn/shakespeare_input.txt > /data/shakespeare/input.txt

To avoid giving everyone a lesson on preprocessing, that's been abstracted away &ndash; you can take a gander at the `utils.py` file in this semester's directory for an inside look at how we get it into a usable form.
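
If you're curious what that kind of preprocessing usually looks like, here's a rough sketch for a character-level model. It is not the contents of `utils.py`: the toy `text`, the batching scheme, and every variable name here are assumptions for illustration.

```python
import collections
import numpy as np

text = "to be, or not to be"    ## toy corpus standing in for input.txt

## vocabulary: most frequent characters get the smallest ids
counts = collections.Counter(text)
chars = [c for c, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]
vocab = {c: i for i, c in enumerate(chars)}    ## char -> id
encoded = np.array([vocab[c] for c in text])   ## the corpus as ids

batch_size, seq_length = 2, 4
## keep only as many characters as fill whole batches (plus 1 for targets)
num_batches = (len(encoded) - 1) // (batch_size * seq_length)
usable = num_batches * batch_size * seq_length

## the target for each character is simply the character that follows it
xdata = encoded[:usable]
ydata = encoded[1:usable + 1]

x_batches = np.split(xdata.reshape(batch_size, -1), num_batches, axis=1)
y_batches = np.split(ydata.reshape(batch_size, -1), num_batches, axis=1)

print(len(chars), num_batches, x_batches[0].shape)
```

Here `chars` maps ids back to characters and `vocab` maps characters to ids, the same two lookups that `sample` expects as arguments.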

In [None]:
from utils import TextLoader

data_loader = TextLoader(args.data_dir, args.batch_size, args.seq_length)
args.vocab_size = data_loader.vocab_size

Since none of us have bomb GPUs in our laptops, we'll load the checkpoint file that has all the weights and such from when I trained this network. I also have one training right now, as you saw at the beginning, and we might wanna compare the outputs, since randomness does play a minor role and with even a 2 hour difference, we can get slightly different results.

In [None]:
from six.moves import cPickle

chkpt = tf.train.get_checkpoint_state(args.data_dir)
assert chkpt, "No checkpoint found."
assert chkpt.model_checkpoint_path, "No model path found in `chkpt`."

## Loading the prior configuration, in terms of everything listed in `utils.Args`
##   we can take a gander at the parameters if you'd like.
with open(args.data_dir + "config.pkl", "rb") as f:
    saved_model_args = cPickle.load(f)

## we need the saved vocabulary so the character <-> index mappings
##   match the ones the network was trained with
with open(args.data_dir + "chars_vocab.pkl", "rb") as f:
    saved_chars, saved_vocab = cPickle.load(f)
    
saved_model_args.data_dir = args.data_dir
saved_model_args.n_layers = saved_model_args.num_layers

## building the model from previous parameters
model = Model(saved_model_args, training=False)

Like last time, because TensorFlow is weird, we have to launch `tf.InteractiveSession()`, so let's do that.

In [None]:
sess = tf.InteractiveSession()

Initializing all the variables we've declared, as per TensorFlow's requirements.

In [None]:
tf.global_variables_initializer().run()

This bit is only pertinent to the other notebook I'm running: it sets up the logs for TensorBoard, where we can look at the network's progress.

In [None]:
import time
summaries = tf.summary.merge_all()
writer = tf.summary.FileWriter(args.logs_dir + time.strftime("%Y-%m-%d-%H-%M-%S"))
writer.add_graph(sess.graph)

Fully restoring what we had from the previous checkpoint...

In [None]:
saver = tf.train.Saver(tf.global_variables())
saver.restore(sess, chkpt.model_checkpoint_path)

In [None]:
from utils import pretty_print, final_checkpoint
strlen = {
    "batch": str(len(str(args.num_epochs * data_loader.num_batches))),
    "epoch": str(len(str(args.num_epochs))),
}

# 3. Training the Network

Sike! Okay, that was cruel; but honestly, let's skip training the network. Your laptop begs of you, don't be a cruel human. We'll go to the next section to generate some new Shakespeare, and if you ever sit in on an English Lit class again, you might just see this. ;p

In [None]:
for e in range(args.num_epochs):
    ## decay the learning rate
    upd_lr = args.lr * (args.decay ** e)
    sess.run(tf.assign(model.lr, upd_lr))

    ## reset batches
    data_loader.reset_batch_pointer()

    ## reset s_0
    state = sess.run(model.initial_state)

    for b in range(data_loader.num_batches):
        st = time.time()
        offset = e * data_loader.num_batches + b

        x, y = data_loader.next_batch()
        feed = {model.input_data: x, model.targets: y}
        for i, (c, h) in enumerate(model.initial_state):
            feed[c] = state[i].c
            feed[h] = state[i].h

        run_args = [summaries, model.cost, model.final_state, model.train_op]
        summ, train_loss, state, _ = sess.run(run_args, feed)

        writer.add_summary(summ, offset)

        pretty_print(strlen, e, b, train_loss, st, end=time.time())
        final_checkpoint(sess, saver, save_dir, offset, 
                    args.num_epochs, 
                    data_loader.num_batches, 
                    args.save_every)
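
The learning-rate decay at the top of that loop is plain exponential decay per epoch. Here's the schedule on its own, with placeholder values for the base rate and decay factor (assumptions for illustration, not necessarily what `utils.Args` uses).

```python
## placeholder hyperparameters, picked only for illustration
base_lr, decay, num_epochs = 0.002, 0.97, 5

## same formula as the loop: lr_e = base_lr * decay ** e
schedule = [base_lr * decay ** e for e in range(num_epochs)]

for e, lr in enumerate(schedule):
    print("epoch {}: lr = {:.6f}".format(e, lr))
```

Early epochs take bigger steps to make fast progress, and later epochs take smaller ones so the weights can settle.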

# 4. Replacing William Shakespeare, Featuring the Machines

In [None]:
# model = Model(args, training=False)

sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

saver = tf.train.Saver(tf.global_variables())
ckpt = tf.train.get_checkpoint_state(args.data_dir)

if ckpt and ckpt.model_checkpoint_path:
    saver.restore(sess, ckpt.model_checkpoint_path)
    ## int(chars_to_sample) -> number of characters to sample
    ## str(primer) -> the first bit of text to "nudge" the network
    ## int(how_sample) -> 0 uses max() at each timestep, 1 samples on every timestep, 2 samples on every space
    chars_to_sample = 500
    primer = u' '
    how_sample = 1
    print(model.sample(sess, saved_chars, saved_vocab, chars_to_sample, primer, how_sample).encode('utf-8'))

 HENRY VI:
Far be the thought of this frost; he cannot play me.

ABHORSON:
Bate, you from me.

ESCALUS:
Well, I do so.

POMPEY:
Women are yourself, sir? a dell the heaven, and I
will push upon thee: long it is to go to it.
Thou wouldst have leave to live, I look on thee.

All:
Happily make thee speak: I say upon thee.

CAPULET:
And learn to marry, my lord, to charge thee.

SEBASTIAN:
What answer place?

MENENIUS:
Only he may profane; let us be content:
How could I might have an unwilling love!

V

penish to you,
And catch attorn your grace to do so.

MERCUTIO:
You will not, then?

First Musician:
No.

PETER:
O, I still have none;
You would be deceived; your quoding throne
While she's fallen with victors' son, with one
And make myself have attendantly.
I never look'd for better attendantnron.
Speak not what, within, rouse all, as you say,
To see if all these still--beggare, dull-brait,
'Thus I took me to a figure of us by leave.
When I have seen consequed by no greater,
Seldom with over-blown,