<a href="https://colab.research.google.com/github/TimWhiting/DeepLearning/blob/master/DL_Lab6.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Description:
For this lab, you will code up the [char-rnn model of Karpathy](http://karpathy.github.io/2015/05/21/rnn-effectiveness/). This is a recurrent neural network that is trained probabilistically on sequences of characters, and that can then be used to sample new sequences that are like the original.

This lab will help you develop several new skills, as well as understand some best practices needed for building large models. In addition, we'll be able to create networks that generate neat text!

## There are two parts of this lab:
###  1.   Wiring up a basic sequence-to-sequence computation graph
###  2.   Implementing your own GRU cell.



---

## Part 0: Readings, data loading, and high level training

---

The tutorial linked below will help you build out the scaffolding code and understand how to work with sequences in PyTorch.

**TODO:**

**DONE:**

* Read the following:
  * [PyTorch sequence-to-sequence tutorial](https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html)
  * [Understanding LSTM Networks](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)






In [0]:
! wget -O ./text_files.tar.gz 'https://piazza.com/redirect/s3?bucket=uploads&prefix=attach%2Fjlifkda6h0x5bk%2Fhzosotq4zil49m%2Fjn13x09arfeb%2Ftext_files.tar.gz' 
! tar -xzf text_files.tar.gz
! pip install unidecode
! pip install torch

import unidecode
import string
import random
import re
 
import pdb
 
all_characters = string.printable
n_characters = len(all_characters)
file = unidecode.unidecode(open('./text_files/lotr.txt').read())
file_len = len(file)
print('file_len =', file_len)


In [0]:
chunk_len = 200
 
def random_chunk():
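  # randint is inclusive on both ends; the -1 keeps end_index inside the file,
  # so every chunk really has chunk_len + 1 characters
  start_index = random.randint(0, file_len - chunk_len - 1)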
  end_index = start_index + chunk_len + 1
  return file[start_index:end_index]
  
print(random_chunk())

 had not had 
any burns, luckily. He did not want his folk to hurt themselves in their 
fury, and he did not want Saruman to escape out of some hole in the 
confusion. Many of the Ents were hurling the
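
Each chunk holds `chunk_len + 1` characters so that the inputs and the shifted targets can both have `chunk_len` characters. The sketch below checks the index arithmetic on a short stand-in string (the `text` value and the `chunk_at` helper are made up for illustration): the largest start index that still yields a full chunk is `len(text) - chunk_len - 1`, because slicing past the end of a string silently returns fewer characters.

```python
def chunk_at(text, start_index, chunk_len):
    # Slice chunk_len + 1 characters; slicing past the end silently truncates
    return text[start_index:start_index + chunk_len + 1]

text = "One Ring to rule them all"   # stand-in for the corpus
chunk_len = 5

last_safe_start = len(text) - chunk_len - 1
full = chunk_at(text, last_safe_start, chunk_len)
short = chunk_at(text, last_safe_start + 1, chunk_len)

print(len(full), len(short))  # 6 5
```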


In [0]:
import torch
from torch.autograd import Variable
# Turn a string into a 1-D LongTensor of character indices
def char_tensor(text):
  # Variable has been a no-op wrapper since PyTorch 0.4, so a plain tensor is returned
  return torch.tensor([all_characters.index(ch) for ch in text], dtype=torch.long)

print(char_tensor('abcDEF'))

tensor([10, 11, 12, 39, 40, 41])
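
The indices printed above are positions in `string.printable`, which lists the ten digits first, then the lowercase letters, then the uppercase letters. As a small sketch using only the standard library (the helper names `encode` and `decode` are illustrative, not part of the lab scaffold), the mapping and its inverse look like this:

```python
import string

all_characters = string.printable

def encode(text):
    # Position of each character in string.printable
    return [all_characters.index(ch) for ch in text]

def decode(indices):
    # Inverse mapping: indices back to characters
    return ''.join(all_characters[i] for i in indices)

print(encode('abcDEF'))          # [10, 11, 12, 39, 40, 41]
print(decode(encode('abcDEF')))  # abcDEF
```

The inverse mapping is what the sampling code needs later, when predicted indices are turned back into text.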


---

## Part 4: Creating your own GRU cell 

**(Come back to this later - it's defined here so that the GRU will be defined before it is used)**

---

The cell that you used in Part 1 was a pre-defined PyTorch layer. Now, write your own GRU class that takes the same parameters as the built-in PyTorch class.

Please try not to look at the GRU cell definition. The answer is right there in the code, and in theory, you could just cut-and-paste it. This bit is on your honor!

**TODO:**

**DONE:**

* Create a custom GRU cell

In [0]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable


class GRU(nn.Module):
  def __init__(self, input_size, hidden_size, num_layers):
    super(GRU, self).__init__()
    self.input_size = input_size
    self.hidden_size = hidden_size
    self.num_layers = num_layers
    # Layer 0 reads the raw input; every later layer reads the previous
    # layer's output, which has hidden_size features.
    in_sizes = [input_size] + [hidden_size] * (num_layers - 1)

    def input_layers():
      return nn.ModuleList([nn.Linear(n, hidden_size) for n in in_sizes])

    def hidden_layers():
      return nn.ModuleList([nn.Linear(hidden_size, hidden_size) for _ in in_sizes])

    # nn.ModuleList registers each Linear layer's parameters with the module,
    # so no explicit add_module calls are needed.
    self.reset_input_layers = input_layers()
    self.reset_hidden_layers = hidden_layers()
    self.forget_input_layers = input_layers()   # "forget" here is the update gate z_t
    self.forget_hidden_layers = hidden_layers()
    self.new_input_layers = input_layers()
    self.new_hidden_layers = hidden_layers()
    
  
  def forward(self, inputs, hidden):
    # Each layer does the following, where * is the Hadamard (elementwise)
    # product, not matrix multiplication:
    # r_t = sigmoid(W_ir x_t + b_ir + W_hr h_(t-1) + b_hr)
    # z_t = sigmoid(W_iz x_t + b_iz + W_hz h_(t-1) + b_hz)
    # n_t = tanh(W_in x_t + b_in + r_t * (W_hn h_(t-1) + b_hn))
    # h_t = (1 - z_t) * n_t + z_t * h_(t-1)
    inputs = inputs.view(1, self.input_size)
    new_hiddens = []
    for i in range(self.num_layers):
      h_prev = hidden[i]
      r_t = torch.sigmoid(self.reset_input_layers[i](inputs) + self.reset_hidden_layers[i](h_prev))
      z_t = torch.sigmoid(self.forget_input_layers[i](inputs) + self.forget_hidden_layers[i](h_prev))
      n_t = torch.tanh(self.new_input_layers[i](inputs) + r_t * self.new_hidden_layers[i](h_prev))
      outputs = (1 - z_t) * n_t + z_t * h_prev
      new_hiddens.append(outputs)
      # the output of this layer is the input to the next one
      inputs = outputs
    # outputs: top layer's state, (1, hidden_size)
    # stacked hidden state: (num_layers, 1, hidden_size)
    return outputs, torch.stack(new_hiddens, dim=0)
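
To see what the gates do, the same four equations can be worked through with scalars instead of matrices. This is a sketch for intuition only, using plain Python floats and a hypothetical parameter dictionary `p` rather than the `nn.Linear` layers above:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h_prev, p):
    """One GRU step for a scalar input and a scalar hidden state.

    p is a dict of scalar weights and biases (hypothetical names).
    """
    r = sigmoid(p["w_ir"] * x + p["b_ir"] + p["w_hr"] * h_prev + p["b_hr"])
    z = sigmoid(p["w_iz"] * x + p["b_iz"] + p["w_hz"] * h_prev + p["b_hz"])
    n = math.tanh(p["w_in"] * x + p["b_in"] + r * (p["w_hn"] * h_prev + p["b_hn"]))
    return (1.0 - z) * n + z * h_prev

names = ("w_ir", "b_ir", "w_hr", "b_hr",
         "w_iz", "b_iz", "w_hz", "b_hz",
         "w_in", "b_in", "w_hn", "b_hn")
zeros = {name: 0.0 for name in names}

# With all parameters zero, z_t = 0.5 and n_t = 0, so the state is halved
print(gru_step(1.0, 0.8, zeros))   # 0.4

# A large update-gate bias drives z_t to 1, so the old state is copied through
keep = dict(zeros, b_iz=50.0)
print(gru_step(1.0, 0.8, keep))    # 0.8
```

The second case is the property that lets a GRU carry information across many time steps: when `z_t` is close to 1, the previous state passes through almost unchanged.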
  


---

##  Part 1: Building a sequence to sequence model

---

Great! We have the data in a usable form. We can switch out which text file we read from and try to imitate.

We now want to build out an RNN model. In this section, we will use only built-in PyTorch pieces when building our RNN class.


**TODO:**

**DONE:**

* Create an RNN class that extends from nn.Module.

In [0]:
class RNN(nn.Module):
  def __init__(self, input_size, hidden_size, output_size, n_layers=1):
    super(RNN, self).__init__()
    self.input_size = input_size
    self.hidden_size = hidden_size
    self.output_size = output_size
    self.n_layers = n_layers
    # encode character indices with an embedding layer
    self.encoding = nn.Embedding(input_size, hidden_size)
    # custom GRU from Part 4; nn.GRU accepts the same three arguments
    self.GRU = GRU(input_size=hidden_size, hidden_size=hidden_size, num_layers=n_layers)
    # decode the GRU output into one score per character
    self.out = nn.Linear(hidden_size, output_size)

  def forward(self, input_char, hidden):
    # embed a single character: (1, 1, hidden_size), i.e. (seq_len, batch, features)
    encoded = self.encoding(input_char.view(-1, 1))
    out, hidden = self.GRU(encoded, hidden)
    # flatten to (1, hidden_size), then project to (1, output_size) scores
    out_decoded = self.out(out.view(-1, self.hidden_size))
    # return output and hidden
    return out_decoded, hidden

  def init_hidden(self):
    # one zero vector per layer, batch size 1
    return torch.zeros(self.n_layers, 1, self.hidden_size)

In [0]:
def random_training_set():    
  chunk = random_chunk()
  inp = char_tensor(chunk[:-1])
  target = char_tensor(chunk[1:])
  return inp, target
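
The input and target are the same chunk offset by one character, so at every position the network is asked to predict the character that follows. A plain-string sketch of that alignment (the example word is arbitrary):

```python
chunk = "Frodo"

inp = chunk[:-1]     # every character except the last
target = chunk[1:]   # every character except the first

# Each input character is paired with the character that follows it
for x, y in zip(inp, target):
    print(repr(x), '->', repr(y))
```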

---

## Part 2: Sample text and Training information

---

We now want to be able to train our network, and sample text after training.

This function outlines how training a sequence style network goes. 

**TODO:**

**DONE:**

* Fill in the pieces.



In [0]:
def train(inp, target):
  ## initialize hidden layers, set up gradient and loss
  decoder_optimizer.zero_grad()
  hidden = decoder.init_hidden()
  loss = 0
  for c in range(chunk_len):
    # run the forward pass of the RNN on one input character
    output, hidden = decoder(inp[c], hidden)
    # output is (1, n_characters), so the target needs shape (1,)
    loss += criterion(output, target[c].unsqueeze(0))

  ## backpropagate the summed loss and step the (global) optimizer
  loss.backward()
  decoder_optimizer.step()

  # average loss per character
  return loss.item() / chunk_len
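
`nn.CrossEntropyLoss` applies a log-softmax to the raw scores and returns the negative log-probability of the target class. A useful reference point when reading the training log is the loss of a model that knows nothing: with the 100 characters of `string.printable`, uniform scores give ln(100) ≈ 4.61. A standard-library sketch of the per-character computation (the `cross_entropy` helper is illustrative, not the PyTorch implementation):

```python
import math
import string

def cross_entropy(logits, target):
    # Negative log-softmax of the target entry, computed stably
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target]

n_characters = len(string.printable)

# Equal scores for every character: the model is guessing uniformly
uniform_loss = cross_entropy([0.0] * n_characters, target=0)
print(round(uniform_loss, 4))  # 4.6052
```

Any per-character loss well below this value means the network has learned something about the text.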

---

## Part 3: Sample text and Training information

---

If you choose, you can also write your training-loop boilerplate now: it samples random sequences and trains your RNN. It helps to have this working before you write your own GRU class.

During or after training, you can sample from the network with the following function. If your RNN model is instantiated as `decoder`, it will probabilistically sample a sequence of length `predict_len`.

**TODO:**

**DONE:**

* Fill out the evaluate function to generate text from a primed string

In [0]:
def evaluate(prime_str='A', predict_len=100, temperature=0.8):
  ## initialize hidden state and encode the priming string
  hidden = decoder.init_hidden()
  prime_input = char_tensor(prime_str)

  # Use priming string to "build up" hidden state
  for p in range(len(prime_str) - 1):
    _, hidden = decoder(prime_input[p], hidden)
  inp = prime_input[-1]

  # indices of every character so far, starting with the priming string
  predicted = list(prime_input)

  for p in range(predict_len):
    # run the RNN/decoder forward on the current input character
    output, hidden = decoder(inp, hidden)

    # Sample from the network as a multinomial distribution;
    # torch.multinomial accepts unnormalized weights
    output_dist = output.data.view(-1).div(temperature).exp()
    top_i = torch.multinomial(output_dist, 1)[0]

    # the sampled index is both the next output and the next input
    inp = top_i
    predicted.append(inp)

  # map the indices back to characters
  predicted = [all_characters[i] for i in predicted]
  return ''.join(predicted)
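
Dividing the scores by `temperature` before exponentiating controls how adventurous the sampling is: values below 1 concentrate probability on the most likely characters, and values above 1 flatten the distribution. The sketch below reproduces the idea with the standard library only (the scores and helper names are made up for illustration, and the weights are normalized explicitly here for readability):

```python
import math
import random

def temperature_probs(logits, temperature):
    # Softmax of logits / temperature, computed stably
    scaled = [v / temperature for v in logits]
    m = max(scaled)
    weights = [math.exp(v - m) for v in scaled]
    total = sum(weights)
    return [w / total for w in weights]

def sample_index(logits, temperature, rng=random):
    # Draw one index in proportion to the temperature-scaled probabilities
    probs = temperature_probs(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.0]
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in temperature_probs(logits, t)])
```

With the default `temperature=0.8` used in `evaluate`, sampling is slightly more conservative than drawing from the model's raw distribution.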

---

## Part 4: (Create a GRU cell, requirements above)

---



---

## Part 5: Run it and generate some text!

---

Assuming everything has gone well, you should be able to run the main function in the scaffold code, using either your custom GRU cell or the built-in layer, and see output something like the following. This sample came from training on the “lotr.txt” dataset with chunk_length=200 and hidden_size=100 for 2000 epochs.

**TODO:** 

**DONE:**
* Create some cool output


```
[0m 9s (100 5%) 2.2169]
 Whaiss Mainde 

'

he and the 



'od and roulll and Are say the 
rere. 
'Wor 
'Iow anond wes ou 

'Yi 

[0m 19s (200 10%) 2.0371]
Whimbe. 

'Thhe 
on not of they was thou hit of 
sil ubat thith hy the seare 
as sower and of len beda 

[0m 29s (300 15%) 2.0051]
Whis the cart. Whe courn!' 'Bu't of they aid dou giter of fintard of the not you ous, 
'Thas orntie it 

[0m 38s (400 20%) 1.8617]
Wh win took be to the know the gost bing to kno wide dought, and he as of they thin. 

The Gonhis gura 

[0m 48s (500 25%) 1.9821]
When of they singly call the and thave thing 
they the nowly we'tly by ands, of less be grarmines of t 

[0m 58s (600 30%) 1.8170]
Whinds to mass of I 
not ken we ting and dour 
and they. 


'Wat res swe Ring set shat scmaid. The 
ha 

[1m 7s (700 35%) 2.0367]
Whad ded troud wanty agy. Ve tanle gour the gone veart on hear, as dent far of the Ridgees.' 

'The Ri 

[1m 17s (800 40%) 1.9458]
Whis is brouch Heared this lack and was weself, for on't 
abothom my and go staid it 
they curse arsh  

[1m 27s (900 45%) 1.7522]
Whout bear the 
Evening 
the pace spood, Arright the spaines beren the and Wish was was on the more yo 

[1m 37s (1000 50%) 1.6444]
Whe Swarn. at colk. N(r)rce or they he 
wearing. And the on the he was are he said Pipin. 

'Yes and i 

[1m 47s (1100 55%) 1.8770]
Whing at they and thins the Wil might 
happened you dlack rusting and thousting fy them, there lifted  

[1m 57s (1200 60%) 1.9401]
Wh the said Frodo eary him that the herremans! 

'I the Lager into came and broveener he sanly 
for 
s 

[2m 7s (1300 65%) 1.8095]
When lest 
- in sound fair, and 
the Did dark he in the gose cilling the stand I in the sight. Frodo y 

[2m 16s (1400 70%) 1.9229]
Whing in a shade and Mowarse round and parse could pass not a have partainly. ' for as I come of I 
le 

[2m 26s (1500 75%) 1.8169]
Whese one her of in a lief that, 
but. 'We repagessed, 
wandere in these fair of long one have here my 

[2m 36s (1600 80%) 1.6635]
Where fread in thougraned in woohis, on the the green the 
pohered alked tore becaming was seen what c 

[2m 46s (1700 85%) 1.7868]
Whil neat 
came to 
is laked, 
and fourst on him grey now they as pass away aren have in the border sw 

[2m 56s (1800 90%) 1.6343]
Wh magered. 

Then tell some tame had bear that 
came as it nome in 
to houbbirnen and to heardy. 


' 

[3m 6s (1900 95%) 1.8191]
Who expey to must away be to the master felkly and for, what shours was alons? I had be the long to fo 

[3m 16s (2000 100%) 1.8725]
White, and his of his in before that for brown before can then took on the fainter smass about rifall

```

In [0]:
import time
n_epochs = 5000
print_every = 200
plot_every = 10
hidden_size = 200
n_layers = 3
lr = 0.001
 
decoder = RNN(n_characters, hidden_size, n_characters, n_layers)
decoder_optimizer = torch.optim.Adam(decoder.parameters(), lr=lr)
criterion = nn.CrossEntropyLoss()
 
start = time.time()
all_losses = []
loss_avg = 0

In [0]:
# n_epochs = 2000
for epoch in range(1, n_epochs + 1):
  loss_ = train(*random_training_set())       
  loss_avg += loss_

  if epoch % print_every == 0:
      print('[%s (%d %d%%) %.4f]' % (time.time() - start, epoch, epoch / n_epochs * 100, loss_))
      print(evaluate('Wh', 100), '\n')

  if epoch % plot_every == 0:
      all_losses.append(loss_avg / plot_every)
      loss_avg = 0

[135.32607412338257 (200 4%) 2.2093]
When; 
And He garn I the. 'werat cawn 
He mewann the 
thwanly sheat thind revat we's the mland the't h 

[270.05828642845154 (400 8%) 1.9820]
Whery the over of he hacy there 




fet, lagound the woll us world war itt aratt the 
the one. There  

[408.5662467479706 (600 12%) 1.9200]
White, and there gane houtetle to be that in the string now oof them whe said our bestend now you sill 

[545.3730459213257 (800 16%) 1.6831]
Why him, tall was proad of that his not his with he hopes 
now to canters you came is many vade while  

[683.2958378791809 (1000 20%) 1.6400]
What fould like again us a have day stail a are and soothor and -vuch as battight a grow!' I golding w 

[827.1255896091461 (1200 24%) 1.9501]
Whing. Pitcred their saw hour bit of they would and I gather, be canning out at the going this penting 

[971.8213222026825 (1400 28%) 1.5967]
Whet and from filled 
all the road the pass all the Ent hard 
not barrers. 

'And and east there were, 

[11
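
The sample log in Part 5 prints elapsed time as minutes and seconds, while the loop above prints raw seconds. One way to get the former, sketched here with two hypothetical helpers named `format_elapsed` and `time_since`, is to format the difference from the `start` timestamp:

```python
import math
import time

def format_elapsed(seconds):
    # Split a duration in seconds into whole minutes and leftover seconds
    minutes = math.floor(seconds / 60)
    return '%dm %ds' % (minutes, seconds - minutes * 60)

def time_since(since):
    # Elapsed wall-clock time since a time.time() timestamp
    return format_elapsed(time.time() - since)

print(format_elapsed(135.326))  # 2m 15s
```

In the training loop, `time_since(start)` would then replace `time.time() - start` in the `print` call.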

In [0]:
start_strings = [" Th", " wh", " he", " I ", " ca", " G", " lo", " ra"]
for i in range(10):
  # pick a random priming string; the name `start` is avoided because it
  # already holds the training timer
  prime = random.choice(start_strings)
  print(prime)
  print(evaluate(prime, 200), '\n')

 G
 Gandalf was decrond. 




'All have lord you. Forward the road at least walk this is stuff, and 
went to the long grey housel-winding and kindled side was a sleep pleasuring, I do long 
row hrough. In  

 lo
 lost death it. 

'The last of the gatherings and take you,' said Aragorn, shining out of the Gate. 

'Yes, as you there were remembaused to seen their pass, when? What 
said here, such seven an the sear 

 lo
 low, and frod to keepn 
Came of their most. But here priced doubtless to an Sam up is 
masters; he left hor as they are looked. And he could now the long to stout in the right fro horseless of 
the like 

 I 
 I had been the 
























in his eyes with the perushed to lest, if then only the ring and the legended 
of the less of the long they which as the 
enders of Orcovered and smood, and the p 

 I 
 I they were not the lord of the hoomes. 

Home already well from the Elves. And he sat strength, and we 
housed out of the good of the days to the mountains 

---

## Part 6: Generate output on a different dataset

---

**TODO:**

* Choose a textual dataset. Here are some [text datasets](https://www.kaggle.com/datasets?tags=14104-text+data%2C13205-text+mining) from Kaggle 

* Generate some decent looking results and evaluate your model's performance (say what it did well / not so well)

**DONE:**

