> **DO NOT EDIT IF INSIDE annadl_f19 folder**


# Week 5: Recurrent neural networks

Text, speech, weather, sensor output and video are but a few examples of the many types of data that are inherently sequential. So how does one predict the next word in a sentence, future temperatures or missing video frames? Using **recurrent neural networks** (RNNs)!

In [3]:
%matplotlib inline

import numpy as np
import requests as rq
import random
import sys
import io
from bs4 import BeautifulSoup
from keras.callbacks import LambdaCallback
from keras.models import Sequential
from keras.layers import Dense, LSTM
from keras.optimizers import RMSprop

## Exercises

#### Modeling text

Text prediction is a good place to start when learning about RNNs, because most of us humans have a pretty well
optimized inner model for text prediction ourselves. We can, therefore, easily assess the performance of a neural
network in executing this task.

Below is some code that loads the screenplay for Tarantino's 1994 film 'Pulp Fiction'. I recommend reading through the
first 20 lines or so to get a feeling for the language and style used (and enjoy probably the best written screenplay
in the history of film).

In [17]:
response = rq.get("http://www.dailyscript.com/scripts/pulp_fiction.html")
text = BeautifulSoup(response.content, "html.parser").getText()
print(text[:2000])



"PULP FICTION" -- by Quentin Tarantino & Roger Avary


                                      "PULP FICTION"

                                            By

                             Quentin Tarantino & Roger Avary

                

               PULP [pulp] n.

               1. A soft, moist, shapeless mass or matter.

               2. A magazine or book containing lurid subject matter and 
               being characteristically printed on rough, unfinished paper.

               American Heritage Dictionary: New College Edition

               INT. COFFEE SHOP – MORNING

               A normal Denny's, Spires-like coffee shop in Los Angeles. 
               It's about 9:00 in the morning. While the place isn't jammed, 
               there's a healthy number of people drinking coffee, munching 
               on bacon and eating eggs.

               Two of these people are a YOUNG MAN and a YOUNG WOMAN. The 
               Young Man has a slight 

> **Ex. 5.1.1:** What is the most used symbol in this screenplay and what accuracy would a model constantly predicting this symbol obtain? In other words, what is the "baseline accuracy"?

In [18]:
from collections import Counter

# Count how often each symbol occurs in the screenplay.
char_freqs = Counter(text)

# The most frequent symbol and its number of occurrences.
symbol, max_freq = char_freqs.most_common(1)[0]
print("max freq letter: %s with %d occurrences" % (repr(symbol), max_freq))

# A model that always predicts this symbol is right max_freq / len(text) of the time.
acc = max_freq / len(text)
print("baseline accuracy: %f" % acc)

max freq letter: ' ' with 164787 occurrences
baseline accuracy: 0.541059


I've adapted some code for text generation from [this Keras example](https://keras.io/examples/lstm_text_generation/).
I've inserted some questions in the code (look for `Q:`) for you to answer in the exercise below.

In [None]:
# Q1: What is the purpose of this block? When is `char_indices` used? What about `indices_char`?
# A1: It builds two lookup tables. `char_indices` maps each character to an integer index (used to one-hot encode text);
#     `indices_char` is the inverse map (used to turn a sampled index back into a character).
chars = sorted(list(set(text)))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

# Q2: What is the purpose of this block? What do the `seqlen` and `step` parameters do?
# A2: It cuts the text into chunks of seqlen + 1 characters (seqlen inputs plus one extra so the last input has a target).
#     `step` is how far the start of the window moves between chunks; with step = seqlen the input windows do not overlap.
seqlen = 40
step = seqlen
sentences = []
for i in range(0, len(text) - seqlen - 1, step):
    sentences.append(text[i: i + seqlen + 1])

# Q3: What about this block? What is `x` and what is `y`? Why do they have this dimensionality?
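# A3: `x` holds the one-hot encoded input characters and `y` the one-hot encoded next character at every time step.
#     Both have shape (number of chunks, seqlen, number of distinct characters).
x = np.zeros((len(sentences), seqlen, len(chars)), dtype=bool)  # `np.bool` was removed in recent NumPy versions
y = np.zeros((len(sentences), seqlen, len(chars)), dtype=bool)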
for i, sentence in enumerate(sentences):
    # Q3a: What happens in this loop?
    # A3a: For each position t, it sets the one-hot entry of the input character in `x` and of the character that follows it in `y`.
    for t, (char_in, char_out) in enumerate(zip(sentence[:-1], sentence[1:])):
        x[i, t, char_indices[char_in]] = 1
        y[i, t, char_indices[char_out]] = 1


# Q4: Here we build the model. What does the `return_sequences` argument do? Why the dense layer at the end?
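# A4: `return_sequences=True` makes the LSTM output its hidden state at every time step instead of only the last one.
#     The dense softmax layer maps each 128-dimensional state to a probability distribution over the characters.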
model = Sequential()
model.add(LSTM(128, input_shape=(seqlen, len(chars)), return_sequences=True))
model.add(Dense(len(chars), activation='softmax'))

model.compile(
    loss='categorical_crossentropy',
    optimizer=RMSprop(lr=0.01),
    metrics=['categorical_crossentropy', 'accuracy']
)

def sample(preds, temperature=1.0):
    """Helper function to sample an index from a probability array."""
    preds = np.asarray(preds).astype('float64')
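    preds = np.exp(np.log(preds) / temperature)  # rescale: p ** (1 / temperature)
    preds = preds / np.sum(preds)                # renormalize to a probability vector
    probas = np.random.multinomial(1, preds, 1)  # draw a single one-hot sample
    return np.argmax(probas)                     # index of the sampled symbol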


def on_epoch_end(epoch, _):
    """Function invoked at end of each epoch. Prints generated text."""
    print()
    print('----- Generating text after Epoch: %d' % epoch)

    start_index = random.randint(0, len(text) - seqlen - 1)
    
    # Q5: What does diversity do?
    for diversity in [0.2, 0.5, 1.0]:
        print('----- diversity:', diversity)

        generated = ''
        sentence = text[start_index: start_index + seqlen]
        generated += sentence
        print('----- Generating with seed: "' + sentence + '"')
        sys.stdout.write(generated)

        for i in range(400):
            x_pred = np.zeros((1, seqlen, len(chars)))
            for t, char in enumerate(sentence):
                x_pred[0, t, char_indices[char]] = 1.
            
            # Q6: What is the dimensionality of `preds`? Why do we input `preds[0, -1]` to the `sample` function?
            preds = model.predict(x_pred, verbose=0)
            next_index = sample(preds[0, -1], diversity)
            next_char = indices_char[next_index]

            sentence = sentence[1:] + next_char

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()

print_callback = LambdaCallback(on_epoch_end=on_epoch_end)

model.fit(x, y,
          batch_size=128,
          epochs=50,
          callbacks=[print_callback])



Epoch 1/50

----- Generating text after Epoch: 0
----- diversity: 0.2
----- Generating with seed: " blows out the match.

               "
 blows out the match.

                                                                                                                                                                                                                                                                                                                                                                                                                               
----- diversity: 0.5
----- Generating with seed: " blows out the match.

               "
 blows out the match.

                                         d mend thenther, bint                                                                                                                  a                              the 


               He can a was a bar and she don't a 
                                                                                                                                                                                                                                                                                                                                                                            
----- diversity: 0.5
----- Generating with seed: "           Fuck you.

               H"
           Fuck you.

               He like the sith the sas a cronger and she 
                                                                                                                                                                                                                                                                                                                                                                    
----- diversity: 1.0
----- Generating with seed: "      

                                VINCENT
                                          Whis an ador.

                   twere infiem to the twoutch.  Whear they killace, a petting up bluager out ewat if this 
                         Sugers eal. Talks your just glause 
                   
Epoch 10/50

----- Generating text after Epoch: 9
----- diversity: 0.2
----- Generating with seed: "g Man 
               continues in a lo"
g Man 
               continues in a look of the fingers his hands to him.

                                                                                                                                                                                                                                                                                                                                                                         
----- diversity: 0.5
----- Generating with seed: "g Man 
               continues in a lo"
g Man 
               continues in a losk he

                         that. It's to see it, it's see it a bathroom to 
                         the kit the pool.

                                   want that gonna look die that the fuck the past of the bathroom. Then 
                         pooking the same to say the fight and starts to 
                         for the want to the fuck the bathroo
----- diversity: 1.0
----- Generating with seed: "nd LUCKY LADIES in 
               fanc"
nd LUCKY LADIES in 
               fance is to the Quncake, ther aim?"

                                   mean'thand's doop what the nood wishes.

               Heppens, Zipp. Trung of his fuck up a 
                         are that gonease's of a Quized.

                                     APREACLIAN
                         You knooks from there fromine.

            in iss heak. BREABS bothers not tonuid the Hit-offec
Epoch 15/50

----- Generating text after Epoch: 14
----- diversity: 0.2
----- Generating with seed: " humor about that 

          – I'm talkin'. Now let me ask you 
                         a vicior looks at her head the shot's 
                         looking at her head out of his hand the 
                         from the cornerstand the car and stands 
                         the should be fuckin' dook it, the strag starts the 
                         on the should be country one should 
                         the should be fuckin' dooks 
----- diversity: 0.5
----- Generating with seed: "          – I'm talkin'. Now let me ask "
          – I'm talkin'. Now let me ask you 
                         shit I couldn't shit. I'm not it's the 
                         friend smilling the wall to a bunce.

                                                                                                                                                                                                                                                                       
----- diversity: 1.0
----- Generatin


                                          It's aow: I need, that!

                                       MIA'S TEVA
                         Well you oth this in there I get ma fuckin' 
                         somethin'?

               MAR VINCENT THERSher breaker a nobver sew 
               COUTH a Ming
Epoch 24/50

----- Generating text after Epoch: 23
----- diversity: 0.2
----- Generating with seed: "s.

                                  "
s.

                                                                                                                                                                                                                                                                                                                                                                                                                                                  
----- diversity: 0.5
----- Generating with seed: "s.

                                  "
s.

               

                                                                                                                                                                                                                                                                                                                                                                                                         
----- diversity: 1.0
----- Generating with seed: " 
                         there.  Eati"
 
                         there.  Eatin' OW Good's the bathrequst 
                                        THE WOLF
                                         JULES
                                      Seem you grabs?

                    MOTER (MOVIV)
                                   Pomine driving the watch puts, it.

                 LONCE WOLLY-owide, crass Marsellus DLOILING Madicy-
                              subucte
Epoch 29/50

----- Generating text after Epoch: 28
----- diversity: 0.2
----- Genera

                         mindless, the door.  I don't got 
                                                                                                                                                                                                                                                                                                                                                                                      
----- diversity: 0.5
----- Generating with seed: "ike 
                         mindless,"
ike 
                         mindless, but I say?

                                                                                                                                       JULES
                         I don't mean this normession.

                                                      BUTCH
                         What did 'em that his ass be her dogar 
                         the room, man, I don't got the 

----- diversity: 1.0
----- Generating with 

                                     FABIENNE
                         told you didn't don't remember. And I wanna 
                         wouldn't wanna do not the way tearty menses 
                         kinda methin t ing findet.

                                     A don't belf so there girl, Vincent 
                  
Epoch 38/50

----- Generating text after Epoch: 37
----- diversity: 0.2
----- Generating with seed: "te 
                         how person"
te 
                         how persong way.

                                                                                                                                                                                                                                                                                                                                                                                                      
----- diversity: 0.5
----- Generating with seed: "te 
                         how perso

                                                                                             VINCENT
                         Every his assed the night of the door of the 
               her one it an accest of his pause)
                         I need a masally ard a nam of heroy.

               Then her cener tonter.

                                              BUTCH
                         Which on' know of as if I think
----- diversity: 1.0
----- Generating with seed: "                                        "
                                         MARSELLUS
                         I think had nits car and Lance, acher see redrown once, hand has words off.

                                            VINCENT
                         Then I can me.  'm her head but have as worth through to them higs Checeple?

                                               VINCENT
                         I'm not a hard.

               EXT. BUTKY-
Epoch 43/50

----- Generating text after Ep

                         the first two are the bathroom. We see 
                         on the shot.

                                                                                                                                                                                                                                                                                                                           
----- diversity: 0.5
----- Generating with seed: "ch other, it'll 
                      "
ch other, it'll 
                         through the fucks a look differry, who would 
                                                                                                                                                                                                                                                         VINCENT
                         What's your and a fuckin' Marsellus 
                         the 
----- diversity: 1.0
----- Generating with seed: "ch

> **Ex. 5.1.2**: Add a callback for TensorBoard so you can log the training process. Start training the network (takes ~10 minutes on my computer). While it's running, move on to the next question.

> **Ex. 5.1.3**: Answer the questions in the code above (look for code comments starting with `Q:`).
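
When answering Q1 to Q3, it can help to run the same encoding steps on a string short enough to inspect by hand. The sketch below is only an illustration: `toy_text` and the `decode` helper are made up for this example and are not part of the notebook, and `seqlen` is shortened to 5.

```python
import numpy as np

# Hypothetical toy string standing in for the screenplay.
toy_text = "hello world, hello there"

# Character <-> index lookup tables, built as in the notebook.
chars = sorted(set(toy_text))
char_indices = {c: i for i, c in enumerate(chars)}
indices_char = {i: c for i, c in enumerate(chars)}

seqlen = 5  # much shorter than the notebook's 40, for readability
step = seqlen

# Each chunk holds seqlen + 1 characters: seqlen inputs plus one extra,
# so that every input position has a "next character" target.
sentences = [toy_text[i: i + seqlen + 1]
             for i in range(0, len(toy_text) - seqlen - 1, step)]

x = np.zeros((len(sentences), seqlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), seqlen, len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, (char_in, char_out) in enumerate(zip(sentence[:-1], sentence[1:])):
        x[i, t, char_indices[char_in]] = True
        y[i, t, char_indices[char_out]] = True


def decode(one_hot):
    """Hypothetical helper: turn a (seqlen, n_chars) one-hot array into a string."""
    return "".join(indices_char[int(k)] for k in one_hot.argmax(axis=-1))


print(x.shape, y.shape)
print(repr(decode(x[0])), "->", repr(decode(y[0])))
```

Decoding the first row of `x` and `y` shows that the target is simply the input shifted one character to the left, which is why both arrays share the same three dimensions.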

> **Ex. 5.1.4**: Did the network finish training? Consider the generated text across epochs.
1. In the early epochs (0-10), the generated text looks very bad. Can you explain why the low diversity generated text contains almost only the symbol " " (that is, spaces)?
2. The high diversity generated text is messed up too, but in a different way. Explain how.
3. In later epochs (20-30), what do you notice is off about the low diversity generated text?
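
To build intuition for these questions, it may help to look at what the diversity (temperature) parameter does to a probability vector in isolation. The sketch below repeats the rescaling step from `sample`; the function name `apply_temperature` and the three-symbol distribution `probs` are invented for illustration and do not come from the trained network.

```python
import numpy as np


def apply_temperature(preds, temperature=1.0):
    """Rescale a probability vector the same way the notebook's `sample` does."""
    preds = np.asarray(preds, dtype="float64")
    scaled = np.exp(np.log(preds) / temperature)  # equals preds ** (1 / temperature)
    return scaled / scaled.sum()


# Hypothetical next-symbol distribution in which ' ' is only mildly favoured.
probs = np.array([0.5, 0.3, 0.2])  # e.g. ' ', 'e', 't'

for temperature in [0.2, 0.5, 1.0]:
    print(temperature, np.round(apply_temperature(probs, temperature), 3))
```

At temperature 1.0 the distribution is unchanged, while at 0.2 the most likely symbol ends up with more than 90% of the probability mass, so sampling behaves almost like always picking the top prediction.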

> **Ex. 5.1.5**: For the network trained over all 50 epochs, generate a longer piece of text
(say 5000 symbols long). Use the sentence `text[1486:1526]` as seed (starts with 'YOUNG MAN' ends with 'No, ')
and set diversity to 0.5.
Describe which features of the screenplay, and of language in general, the network learned in only 50 epochs.
Also describe what serious mistakes it makes.
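
One possible way to structure the generation loop for this exercise is sketched below. It mirrors the loop in `on_epoch_end`, but the `generate` function, its `predict_fn` argument and the `uniform_predict` stand-in are assumptions made for this example so that the block runs without a trained network; it is not the only way to do it.

```python
import numpy as np


def sample(preds, temperature=1.0, rng=None):
    """Sample an index from `preds` after temperature rescaling."""
    if rng is None:
        rng = np.random.default_rng()
    preds = np.asarray(preds, dtype="float64")
    preds = np.exp(np.log(preds) / temperature)
    preds = preds / preds.sum()
    return int(rng.choice(len(preds), p=preds))


def generate(predict_fn, seed, length, char_indices, indices_char,
             diversity=0.5, rng=None):
    """Generate `length` characters, feeding each sampled character back in.

    `predict_fn` is assumed to map a (1, seqlen, n_chars) one-hot array to a
    (1, seqlen, n_chars) array of probabilities, like `model.predict` does.
    """
    seqlen, n_chars = len(seed), len(char_indices)
    window, generated = seed, []
    for _ in range(length):
        x_pred = np.zeros((1, seqlen, n_chars))
        for t, char in enumerate(window):
            x_pred[0, t, char_indices[char]] = 1.0
        preds = predict_fn(x_pred)
        # Only the prediction at the last time step is the "next" character.
        next_char = indices_char[sample(preds[0, -1], diversity, rng)]
        generated.append(next_char)
        window = window[1:] + next_char  # slide the window one step forward
    return seed + "".join(generated)


# Hypothetical stand-in for a trained network: always predicts a uniform distribution.
chars = sorted(set("abc "))
char_indices = {c: i for i, c in enumerate(chars)}
indices_char = {i: c for i, c in enumerate(chars)}


def uniform_predict(x_pred):
    return np.full(x_pred.shape, 1.0 / x_pred.shape[-1])


out = generate(uniform_predict, "ab c", 20, char_indices, indices_char,
               rng=np.random.default_rng(0))
print(repr(out))
```

With the notebook's trained model and lookup tables, the same function could be called as `generate(lambda a: model.predict(a, verbose=0), text[1486:1526], 5000, char_indices, indices_char)`.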

> **Ex. 5.1.6**: Do the same as above, but for 40 random letters (e.g. smash away on your keyboard) as seed. What happens? Can you explain why?

> **Challenge** Download [this](https://www.yelp.com/dataset/download) Yelp dataset and train a model that predicts rating given a review text!