## Recurrent Neural Network
A **recurrent neural network** (RNN for short) is a neural network that is much better suited to processing sequential data such as text or characters. We will use one for:
- Sentiment Analysis
- Character Generation 

Since textual data contains many words that follow each other in a very specific and meaningful order, we need to keep track of each word and where it occurs in the data. Simply encoding, say, an entire paragraph of text into one data point wouldn't give us a very meaningful picture of the data and would be very difficult to do anything with. This is why we treat text as a sequence and process one word at a time. We will keep track of where each of these words appears and use that information to try to understand the meaning of pieces of text.

As we know, machine learning models and neural networks don't take raw text data as input. This means we must somehow encode our textual data as numeric values that our models can understand. There are many different ways of doing this, and we will look at a few examples below.

### 1. Bag of Words
The first and simplest way to encode our data is to use something called **bag of words**. This is a pretty easy technique where each word in a sentence is encoded with an integer and thrown into a collection that does not maintain the order of the words but does keep track of the frequency.

In [1]:
vocab = {}  # maps word to integer representing it
word_encoding = 1
def bag_of_words(text):
  global word_encoding  # declared global because we want to update its value inside the function

  words = text.lower().split(" ")  # create a list of all the words in the text; we'll assume there is no punctuation in our text for this example
  bag = {}  # stores all of the encodings and their frequency

  for word in words:
    if word in vocab:
      encoding = vocab[word]  # get encoding from vocab
    else:
      vocab[word] = word_encoding
      encoding = word_encoding
      word_encoding += 1
    
    if encoding in bag:
      bag[encoding] += 1
    else:
      bag[encoding] = 1
  
  return bag

text = "this is a test to see if this test will work is is test a a"
bag = bag_of_words(text)
print(bag)
print(vocab)

{1: 2, 2: 3, 3: 3, 4: 3, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1}
{'this': 1, 'is': 2, 'a': 3, 'test': 4, 'to': 5, 'see': 6, 'if': 7, 'will': 8, 'work': 9}


In [2]:
positive_review = "I thought the movie was going to be bad but it was actually amazing"
negative_review = "I thought the movie was going to be amazing but it was actually bad"

pos_bag = bag_of_words(positive_review)
neg_bag = bag_of_words(negative_review)

print("Positive:", pos_bag)
print("Negative:", neg_bag)

Positive: {10: 1, 11: 1, 12: 1, 13: 1, 14: 2, 15: 1, 5: 1, 16: 1, 17: 1, 18: 1, 19: 1, 20: 1, 21: 1}
Negative: {10: 1, 11: 1, 12: 1, 13: 1, 14: 2, 15: 1, 5: 1, 16: 1, 21: 1, 18: 1, 19: 1, 20: 1, 17: 1}


We can see that even though these sentences have very different meanings, they are encoded exactly the same way. Obviously, this isn't going to fly.

### 2. Integer Encoding
This involves representing each word or character in a sentence as a unique integer and maintaining the order of these words. This should fix the problem we saw before, where the order of the words was lost.

In [3]:
vocab = {}  
word_encoding = 1
# NOTE: despite its name, this function performs integer encoding, not one-hot encoding
def one_hot_encoding(text):
  global word_encoding

  words = text.lower().split(" ") 
  encoding = []  

  for word in words:
    if word in vocab:
      code = vocab[word]  
      encoding.append(code) 
    else:
      vocab[word] = word_encoding
      encoding.append(word_encoding)
      word_encoding += 1
  
  return encoding

text = "this is a test to see if this test will work is is test a a"
encoding = one_hot_encoding(text)
print(encoding)
print(vocab)

[1, 2, 3, 4, 5, 6, 7, 1, 4, 8, 9, 2, 2, 4, 3, 3]
{'this': 1, 'is': 2, 'a': 3, 'test': 4, 'to': 5, 'see': 6, 'if': 7, 'will': 8, 'work': 9}


In [4]:
positive_review = "I thought the movie was going to be bad but it was actually amazing"
negative_review = "I thought the movie was going to be amazing but it was actually bad"

pos_encode = one_hot_encoding(positive_review)
neg_encode = one_hot_encoding(negative_review)

print("Positive:", pos_encode)
print("Negative:", neg_encode)

Positive: [10, 11, 12, 13, 14, 15, 5, 16, 17, 18, 19, 14, 20, 21]
Negative: [10, 11, 12, 13, 14, 15, 5, 16, 21, 18, 19, 14, 20, 17]


Much better. Now we are keeping track of the order of words and we can tell where each occurs. But this still has a few issues. Ideally, when we encode words, we would like similar words to have similar labels and different words to have very different labels. For example, the words happy and joyful should probably have very similar labels so we can determine that they are similar, while words like horrible and amazing should probably have very different labels. The method we looked at above can't do this for us. This could mean that the model will have a very difficult time determining whether two words are similar, which could hurt performance considerably.

### 3. Word Embeddings
This method keeps the order of words intact as well as encodes similar words with very similar labels. It attempts to not only encode the frequency and order of words but the meaning of those words in the sentence. It encodes each word as a dense vector that represents its context in the sentence.

Unlike the previous techniques, word embeddings are learned by looking at many different training examples. You can add what's called an *embedding layer* to the beginning of your model, and while your model trains, your embedding layer will learn the correct embeddings for words. You can also use pretrained embedding layers.
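
To make the idea of "similar words get similar vectors" concrete, here is a minimal sketch in plain Python. The three-dimensional vectors are made-up values chosen purely for illustration, not learned embeddings; a real embedding layer learns its vectors during training and typically uses many more dimensions. Cosine similarity is one common way to compare such vectors.

```python
import math

# Hypothetical, hand-picked 3-dimensional embeddings (illustration only).
embeddings = {
    "happy":    [0.9, 0.8, 0.1],
    "joyful":   [0.8, 0.9, 0.2],
    "horrible": [-0.9, -0.7, 0.1],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 = same direction, -1 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

sim_close = cosine_similarity(embeddings["happy"], embeddings["joyful"])
sim_far = cosine_similarity(embeddings["happy"], embeddings["horrible"])
print(round(sim_close, 3), round(sim_far, 3))
```

With these illustrative values, "happy" and "joyful" point in almost the same direction while "happy" and "horrible" point in nearly opposite directions, which is the kind of structure integer labels cannot express.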

## Recurrent Neural Networks (RNNs)
Up until this point we have been using something called **feed-forward** neural networks. This simply means that all our data is fed forwards (all at once) from left to right through the network. This was fine for the problems we considered before but won't work very well for processing text. After all, even we (humans) don't process text all at once. We read word by word from left to right and keep track of the current meaning of the sentence so we can understand the meaning of the next word. This is exactly what a recurrent neural network is designed to do. When we say recurrent neural network, all we really mean is a network that contains a loop. An RNN will process one word at a time while maintaining an internal memory of what it has already seen. This allows it to treat words differently based on their order in a sentence and to slowly build an understanding of the entire input, one word at a time.

What we have described so far is a **simple RNN layer**. It can be effective at processing shorter sequences of text for simple problems but has several drawbacks. One of them is that as text sequences get longer, it gets increasingly difficult for the network to understand the text properly.
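
The loop inside a simple RNN can be sketched in a few lines of plain Python. This is a toy with a single hidden unit and fixed, hypothetical weights (`w_x`, `w_h`, `b`); a real layer has many units and learns its weights during training. The point is only to show that the state after the last input depends on the order in which the inputs arrived.

```python
import math

def simple_rnn(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Run a one-unit RNN over a sequence of numbers and return the final hidden state.

    w_x, w_h and b are hypothetical fixed weights; a real layer learns them.
    """
    h = 0.0  # internal memory starts empty
    for x in inputs:
        # the new state depends on the current input AND the previous state
        h = math.tanh(w_x * x + w_h * h + b)
    return h

state_a = simple_rnn([1.0, 2.0, 3.0])
state_b = simple_rnn([3.0, 2.0, 1.0])
print(state_a, state_b)
```

Both calls see the same three numbers, yet they end in different states because each step mixes the new input with the memory of everything before it.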

### LSTM vs simpleRNN
The layer we discussed above is called a *simpleRNN*. However, there are other recurrent layers (layers that contain a loop) that work much better than a simple RNN layer. The one we will talk about here is called LSTM (Long Short-Term Memory). This layer works very similarly to the simpleRNN layer but adds a way to carry information forward from timesteps far in the past. In our simple RNN layer, input from previous timesteps gradually faded as we got further through the input. An LSTM adds a long-term memory (the cell state), together with gates that control what is written to it, kept in it and read from it. This lets the network retain information from much earlier in the sequence for as long as it is useful. This adds to the complexity of our network and allows it to discover more useful relationships between inputs and when they appear.
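
As a rough illustration of how an LSTM can hold on to information, here is a single-unit LSTM step in plain Python, following the standard LSTM equations. The gate weights in `params` are hand-set, hypothetical values chosen so that the forget gate stays close to 1 and the input gate close to 0; a trained layer would learn its own weights.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a single-unit LSTM. p maps gate name -> (w_x, w_h, bias)."""
    def gate(name, activation):
        w_x, w_h, b = p[name]
        return activation(w_x * x + w_h * h_prev + b)

    f = gate("forget", sigmoid)        # how much of the old cell state to keep
    i = gate("input", sigmoid)         # how much new information to write
    g = gate("candidate", math.tanh)   # the new information itself
    o = gate("output", sigmoid)        # how much of the cell state to expose
    c = f * c_prev + i * g             # updated long-term memory (cell state)
    h = o * math.tanh(c)               # updated hidden state
    return h, c

# Hypothetical weights: forget gate near 1, input gate near 0.
params = {
    "forget": (0.0, 0.0, 6.0),
    "input": (0.0, 0.0, -6.0),
    "candidate": (1.0, 0.0, 0.0),
    "output": (0.0, 0.0, 6.0),
}

h, c = 0.0, 1.0  # start with a value of 1.0 stored in the cell state
for x in [0.5] * 50:
    h, c = lstm_step(x, h, c, params)
print(c)
```

After 50 steps most of the initial value is still in the cell state, whereas the hidden state of a simple RNN is overwritten at every step.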

## Sentiment Analysis

Sentiment analysis is the process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer's attitude towards a particular topic, product, etc. is positive, negative, or neutral. The example we'll use here is classifying movie reviews as either positive or negative.

Start by loading the IMDB movie review dataset from Keras. The training split contains 25,000 reviews from IMDB (the test split holds another 25,000), each already preprocessed and labeled as either positive or negative. Each review is encoded as a list of integers, where each integer reflects how common a word is in the entire dataset: the lower the integer, the more frequent the word. A few of the lowest values are reserved for special markers; for instance, every review starts with 1.
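
To illustrate how a frequency-ranked encoding of this kind can be built, here is a small sketch on a made-up corpus. It is not how the IMDB dataset was actually produced, and it ignores the low indices that the Keras dataset reserves for special markers.

```python
from collections import Counter

# Tiny made-up corpus (illustration only).
reviews = [
    "the movie was great",
    "the movie was bad",
    "the acting was great",
    "the end",
]

counts = Counter(word for review in reviews for word in review.split())
# rank 1 = most frequent word, rank 2 = second most frequent, ...
rank = {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}

encoded = [rank[word] for word in "the movie was great".split()]
print(rank)
print(encoded)
```

Frequent words such as "the" end up with small integers, which is why capping the vocabulary size keeps the most common words and drops the rare ones.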

In [11]:
from keras.datasets import imdb
from keras.preprocessing import sequence
import keras
import tensorflow as tf
import os
import numpy as np

VOCAB_SIZE = 88584

MAXLEN = 250
BATCH_SIZE = 64

(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words = VOCAB_SIZE)

In [12]:
# Let's look at one review
print(len(train_data[1]))
print(train_data[1])

189
[1, 194, 1153, 194, 8255, 78, 228, 5, 6, 1463, 4369, 5012, 134, 26, 4, 715, 8, 118, 1634, 14, 394, 20, 13, 119, 954, 189, 102, 5, 207, 110, 3103, 21, 14, 69, 188, 8, 30, 23, 7, 4, 249, 126, 93, 4, 114, 9, 2300, 1523, 5, 647, 4, 116, 9, 35, 8163, 4, 229, 9, 340, 1322, 4, 118, 9, 4, 130, 4901, 19, 4, 1002, 5, 89, 29, 952, 46, 37, 4, 455, 9, 45, 43, 38, 1543, 1905, 398, 4, 1649, 26, 6853, 5, 163, 11, 3215, 10156, 4, 1153, 9, 194, 775, 7, 8255, 11596, 349, 2637, 148, 605, 15358, 8003, 15, 123, 125, 68, 23141, 6853, 15, 349, 165, 4362, 98, 5, 4, 228, 9, 43, 36893, 1157, 15, 299, 120, 5, 120, 174, 11, 220, 175, 136, 50, 9, 4373, 228, 8255, 5, 25249, 656, 245, 2350, 5, 4, 9837, 131, 152, 491, 18, 46151, 32, 7464, 1212, 14, 9, 6, 371, 78, 22, 625, 64, 1382, 9, 8, 168, 145, 23, 4, 1690, 15, 16, 4, 1355, 5, 28, 6, 52, 154, 462, 33, 89, 78, 285, 16, 145, 95]


If we look at some of the loaded reviews, we'll notice that they have different lengths. This is an issue, because we cannot pass inputs of different lengths into our neural network. Therefore, we must make each review the same length. To do this we will follow the procedure below:
- if the review is longer than 250 words, trim off the extra words
- if the review is shorter than 250 words, add the necessary number of 0's to make it 250 long
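
The two rules above can be sketched in plain Python. The helper `pad_or_truncate` is a hypothetical name, and it assumes padding and trimming both happen at the front of the sequence, which is the default behaviour of Keras `pad_sequences`.

```python
def pad_or_truncate(tokens, maxlen, pad_value=0):
    """Return a list of exactly `maxlen` items, padding or trimming at the front.

    Assumes maxlen is a positive integer.
    """
    if len(tokens) >= maxlen:
        return tokens[-maxlen:]  # keep only the last `maxlen` tokens
    return [pad_value] * (maxlen - len(tokens)) + tokens

short = pad_or_truncate([5, 6, 7], maxlen=5)
long = pad_or_truncate([1, 2, 3, 4, 5, 6], maxlen=4)
print(short)
print(long)
```

Padding at the front means the real words sit at the end of the sequence, right before the recurrent layer produces its final state.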

In [13]:
train_data = sequence.pad_sequences(train_data, MAXLEN)
test_data = sequence.pad_sequences(test_data, MAXLEN)

In [14]:
print(len(train_data[1]))
print(train_data[1])

250
[    0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     1   194  1153   194  8255    78   228     5     6  1463  4369
  5012   134    26     4   715     8   118  1634    14   394    20    13
   119   954   189   102     5   207   110  3103    21    14    69   188
     8    30    23     7     4   249   126    93     4   114     9  2300
  1523     5   647     4   116     9    35  8163     4   229     9   340
  1322     4   118     9     4   130  4901    19     4  1002     5    89
    29   952    46    37     4   455     9    45    43    38  1543  1905
   398     4  1649    26  6853     5   163    11  3215 10156     4  1153
     9   194   775     7  8255 11596   349  263

Now it's time to create the model. We'll use a **word embedding layer** as the first layer in our model and add an **LSTM layer** afterwards that feeds into a single **dense node** to get our predicted sentiment.

The 32 passed to the embedding layer is the output dimension of the vectors it generates. We can change this value if we'd like!

In [15]:
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 32),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1, activation="sigmoid")
])


In [16]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, None, 32)          2834688   
                                                                 
 lstm (LSTM)                 (None, 32)                8320      
                                                                 
 dense (Dense)               (None, 1)                 33        
                                                                 
Total params: 2,843,041
Trainable params: 2,843,041
Non-trainable params: 0
_________________________________________________________________
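
The parameter counts in the summary above can be reproduced by hand. The sketch below assumes the usual layouts: one vector per vocabulary entry for the embedding, four gates (each with input weights, recurrent weights and a bias) for the LSTM, and weights plus biases for the dense layer. The helper names are hypothetical.

```python
def embedding_params(vocab_size, dim):
    return vocab_size * dim  # one vector of length `dim` per vocabulary entry

def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights and a bias
    return 4 * ((input_dim + units) * units + units)

def dense_params(input_dim, units):
    return input_dim * units + units  # weights plus biases

emb = embedding_params(88584, 32)
lstm = lstm_params(32, 32)
dense = dense_params(32, 1)
total = emb + lstm + dense
print(emb, lstm, dense, total)
```

Almost all of the parameters sit in the embedding layer, so the vocabulary size and embedding dimension dominate the size of this model.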


In [17]:
model.compile(loss="binary_crossentropy",optimizer="rmsprop",metrics=['acc'])

history = model.fit(train_data, train_labels, epochs=10, validation_split=0.2)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [18]:
results = model.evaluate(test_data, test_labels)
print(results)

[0.57110995054245, 0.8387200236320496]


Not bad for a simple recurrent network.

Now let’s use our network to make predictions on our own reviews. 

Since our reviews are encoded we will need to convert any review that we write into that form so the network can understand it. To do that we will load the encodings from the dataset and use them to encode our own data.

In [32]:
word_index = imdb.get_word_index()

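def encode_text(text):
  tokens = keras.preprocessing.text.text_to_word_sequence(text)  # lowercase, strip most punctuation, split into words
  print(tokens)
  # NOTE: imdb.load_data() shifts word indices by 3 by default (index_from=3) to reserve
  # the lowest values for special markers, while get_word_index() returns the unshifted ranks,
  # so this lookup only approximates the encoding used in the training data.
  tokens = [word_index[word] if word in word_index else 0 for word in tokens]  # unknown words map to 0
  print(tokens)
  return sequence.pad_sequences([tokens], MAXLEN)[0]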

text = "that movie was just amazing, so amazing"
encoded = encode_text(text)
print(encoded)

['that', 'movie', 'was', 'just', 'amazing', 'so', 'amazing']
[12, 17, 13, 40, 477, 35, 477]
[  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0

In [37]:
# while we're at it, let's make a decode function

reverse_word_index = {value: key for (key, value) in word_index.items()}

def decode_integers(integers):
  PAD = 0
  # skip padding and join the remaining words with single spaces
  return " ".join(reverse_word_index[num] for num in integers if num != PAD)
  
print(decode_integers(encoded))

that movie was just amazing so amazing 


In [50]:
# now time to make a prediction

def predict(text):
  encoded_text = encode_text(text)
  print(encoded_text.shape)
  pred = np.zeros((1,250))
  pred[0] = encoded_text
  print(pred.shape)
  result = model.predict(pred) # model expects a (1,250) np array while encoded_text is (250,)
  print(result[0])

positive_review = "That movie was! really loved it and would great watch it again because it was amazingly great"
predict(positive_review)

negative_review = "that movie really sucked. I hated it and wouldn't watch it again. Was one of the worst things I've ever watched"
predict(negative_review)

['that', 'movie', 'was', 'really', 'loved', 'it', 'and', 'would', 'great', 'watch', 'it', 'again', 'because', 'it', 'was', 'amazingly', 'great']
[12, 17, 13, 63, 444, 9, 2, 59, 84, 103, 9, 171, 85, 9, 13, 2786, 84]
(250,)
(1, 250)
[0.8735082]
['that', 'movie', 'really', 'sucked', 'i', 'hated', 'it', 'and', "wouldn't", 'watch', 'it', 'again', 'was', 'one', 'of', 'the', 'worst', 'things', "i've", 'ever', 'watched']
[12, 17, 63, 2064, 10, 1797, 9, 2, 583, 103, 9, 171, 13, 28, 4, 1, 246, 180, 204, 123, 293]
(250,)
(1, 250)
[0.24362063]
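
The model ends in a single sigmoid unit, so each prediction is a number between 0 and 1. Assuming the usual IMDB labeling (1 = positive, 0 = negative) and a conventional cut-off of 0.5, a score can be turned into a label as follows; `sentiment_label` is a hypothetical helper, not part of Keras.

```python
def sentiment_label(score, threshold=0.5):
    """Map a sigmoid output in [0, 1] to a label; 0.5 is a conventional cut-off."""
    return "positive" if score >= threshold else "negative"

print(sentiment_label(0.8735082))   # score printed for the positive review above
print(sentiment_label(0.24362063))  # score printed for the negative review above
```

The further a score is from 0.5, the more confident the model is in its label.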


## RNN Play Generator

We'll use an RNN to generate a play. We will simply show the RNN an example of something we want it to recreate, and it will learn how to write a version of it on its own. We'll do this using a character-level predictive model that takes a variable-length sequence as input and predicts the next character. We can call the model many times in a row, feeding the output of the last prediction back in as the input for the next call, to generate a sequence.
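
The feed-the-output-back-in loop can be sketched without any neural network at all. Below, a simple table of character pairs stands in for the trained RNN: it only remembers which character most often follows the current one. This is a deliberately crude substitute that always picks the most frequent follower instead of sampling, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count which character follows which; a stand-in for the trained RNN."""
    table = defaultdict(Counter)
    for current, nxt in zip(text, text[1:]):
        table[current][nxt] += 1
    return table

def generate(table, start, length):
    out = start
    for _ in range(length):
        followers = table.get(out[-1])
        if not followers:
            break  # never saw this character followed by anything
        # feed the prediction back in as the next input
        out += followers.most_common(1)[0][0]
    return out

table = train_bigram("abcabcabc")
print(generate(table, "a", 5))
```

The RNN built in this section follows the same loop, but its prediction at each step depends on everything it has seen so far rather than on the last character alone.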

In [51]:
from keras.preprocessing import sequence
import keras
import tensorflow as tf
import os
import numpy as np

We only need one piece of training data. In fact, we can write our own poem or play and pass that to the network for training if we'd like. However, to make things easy we'll use an extract from a Shakespeare play.

In [53]:
path_to_file = tf.keras.utils.get_file(
               'shakespeare.txt', 
               'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt'
)

#### Loading Your Own Data
To load your own data, you'll need to upload a file from the dialog below. Then you'll need to follow the steps from above but load in this new file instead.

In [17]:
# from google.colab import files
# path_to_file = list(files.upload().keys())[0]

In [54]:
# Read the file as bytes, then decode it as UTF-8
with open(path_to_file, 'rb') as f:
    text = f.read().decode(encoding='utf-8')
# length of text is the number of characters in it
print('Length of text: {} characters'.format(len(text)))

Length of text: 1115394 characters


In [55]:
# Take a look at the first 250 characters in text
print(text[:250])

First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.



#### Encoding
Since this text isn't encoded yet, we'll need to do that ourselves. We are going to encode each unique character as a different integer.

In [56]:
vocab = sorted(set(text)) # set will remove all duplicates
# Creating a mapping from unique characters to indices
char2idx = {u:i for i, u in enumerate(vocab)} # enumerate method adds counter to an iterable and returns it
idx2char = np.array(vocab)

def text_to_int(text):
  return np.array([char2idx[c] for c in text])

text_as_int = text_to_int(text)

In [68]:
# let's look at how part of our text is encoded
print("Text:", text[:13])
print("Encoded:", text_to_int(text[:13]))

Text: First Citizen
Encoded: [18 47 56 57 58  1 15 47 58 47 64 43 52]


In [76]:
# A function that can convert our numeric values to text
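def int_to_text(ints):
  try:
    ints = ints.numpy()  # convert a TensorFlow tensor to a NumPy array
  except AttributeError:
    pass  # already a NumPy array, nothing to convert
  return ''.join(idx2char[ints])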

print(int_to_text(text_as_int[:13]))

First Citizen


#### Creating Training Examples
Remember our task is to feed the model a sequence and have it return to us the next character. This means we need to split our text data from above into many shorter sequences that we can pass to the model as training examples. 

The training examples we will prepare use a sequence of length *seq_length* as input and a sequence of length *seq_length* as output, where the output is the input shifted one character to the right. For example:

```input: Hell | output: ello```
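
The same idea in plain Python, as a minimal sketch: cut the text into chunks of `seq_length + 1` characters, discard a final chunk that is too short, and split each chunk into an input and a target that is shifted by one character. The helper `make_examples` is a hypothetical name used only for this illustration.

```python
def make_examples(text, seq_length):
    """Cut text into chunks of seq_length + 1 and split each into (input, target)."""
    chunk_size = seq_length + 1
    examples = []
    # stop before a final chunk that would be too short
    for start in range(0, len(text) - chunk_size + 1, chunk_size):
        chunk = text[start:start + chunk_size]
        examples.append((chunk[:-1], chunk[1:]))
    return examples

pairs = make_examples("Hello world", seq_length=4)
print(pairs)
```

Each chunk needs one extra character so that the input and the target can both have length `seq_length`.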

Our first step will be to create a stream of characters from our text data.

In [77]:
seq_length = 100  # length of sequence for a training example
examples_per_epoch = len(text)//(seq_length+1)

# generate tensors from data
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)

In [81]:
# Batch method to turn this stream of characters/tensors into batches of desired length
sequences = char_dataset.batch(seq_length+1, drop_remainder=True)

In [83]:
for seq in sequences.take(5):  # only look at the first 5 sequences
    print(seq)

tf.Tensor(
[18 47 56 57 58  1 15 47 58 47 64 43 52 10  0 14 43 44 53 56 43  1 61 43
  1 54 56 53 41 43 43 42  1 39 52 63  1 44 59 56 58 46 43 56  6  1 46 43
 39 56  1 51 43  1 57 54 43 39 49  8  0  0 13 50 50 10  0 31 54 43 39 49
  6  1 57 54 43 39 49  8  0  0 18 47 56 57 58  1 15 47 58 47 64 43 52 10
  0 37 53 59  1], shape=(101,), dtype=int64)
tf.Tensor(
[39 56 43  1 39 50 50  1 56 43 57 53 50 60 43 42  1 56 39 58 46 43 56  1
 58 53  1 42 47 43  1 58 46 39 52  1 58 53  1 44 39 51 47 57 46 12  0  0
 13 50 50 10  0 30 43 57 53 50 60 43 42  8  1 56 43 57 53 50 60 43 42  8
  0  0 18 47 56 57 58  1 15 47 58 47 64 43 52 10  0 18 47 56 57 58  6  1
 63 53 59  1 49], shape=(101,), dtype=int64)
tf.Tensor(
[52 53 61  1 15 39 47 59 57  1 25 39 56 41 47 59 57  1 47 57  1 41 46 47
 43 44  1 43 52 43 51 63  1 58 53  1 58 46 43  1 54 43 53 54 50 43  8  0
  0 13 50 50 10  0 35 43  1 49 52 53 61  5 58  6  1 61 43  1 49 52 53 61
  5 58  8  0  0 18 47 56 57 58  1 15 47 58 47 64 43 52 10  0 24 43 58  1
 

In [84]:
# Use these sequences of length 101 and split them into input and output
def split_input_target(chunk):  # for the example: hello
    input_text = chunk[:-1]  # hell
    target_text = chunk[1:]  # ello
    return input_text, target_text  # hell, ello

dataset = sequences.map(split_input_target)  # we use map to apply the above function to every entry

In [85]:
for x, y in dataset.take(2): #only look at the first 2 elements
  print("\n\nEXAMPLE\n")
  print("INPUT")
  print(int_to_text(x))
  print("\nOUTPUT")
  print(int_to_text(y))



EXAMPLE

INPUT
First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You

OUTPUT
irst Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You 


EXAMPLE

INPUT
are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you 

OUTPUT
re all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you k


In [86]:
# Make training batches
BATCH_SIZE = 64
VOCAB_SIZE = len(vocab)  # vocab is number of unique characters
EMBEDDING_DIM = 256
RNN_UNITS = 1024

# Buffer size to shuffle the dataset
# (TF data is designed to work with possibly infinite sequences,
# so it doesn't attempt to shuffle the entire sequence in memory. Instead,
# it maintains a buffer in which it shuffles elements).
BUFFER_SIZE = 10000

data = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
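
The buffer idea described in the comment above can be pictured with a small generator. This is a simplified sketch of the general technique, not TensorFlow's actual implementation, and `buffered_shuffle` is a hypothetical name.

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in a locally shuffled order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # still filling the buffer
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]        # emit a random buffered element...
        buffer[idx] = item       # ...and put the new element in its slot
    rng.shuffle(buffer)          # drain whatever is left at the end
    yield from buffer

shuffled = list(buffered_shuffle(range(20), buffer_size=5, seed=0))
print(shuffled)
```

Only `buffer_size` elements are held in memory at once, so a larger buffer mixes the data more thoroughly at the cost of more memory; a buffer of size 1 does not shuffle at all.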

In [100]:
for i in data.take(1):
    print(i[0][60])
    print(i[1][60])

tf.Tensor(
[43  8  1 14 59 58  0 58 46 43  1 40 53 58 58 53 51  1 53 44  1 58 46 43
  1 52 43 61 57  1 47 57  1 58 46 39 58  1 53 59 56  1 45 43 52 43 56 39
 50  1 47 57  1 41 59 58  1 47  5  0 58 46 43  1 51 47 42 42 50 43  1 39
 52 42  1 40 59 58  1 53 52 43  1 46 39 50 44  1 53 44  1 61 46 39 58  1
 46 43  1 61], shape=(100,), dtype=int64)
tf.Tensor(
[ 8  1 14 59 58  0 58 46 43  1 40 53 58 58 53 51  1 53 44  1 58 46 43  1
 52 43 61 57  1 47 57  1 58 46 39 58  1 53 59 56  1 45 43 52 43 56 39 50
  1 47 57  1 41 59 58  1 47  5  0 58 46 43  1 51 47 42 42 50 43  1 39 52
 42  1 40 59 58  1 53 52 43  1 46 39 50 44  1 53 44  1 61 46 39 58  1 46
 43  1 61 39], shape=(100,), dtype=int64)


#### Building the Model
Now it is time to build the model. We will use an **embedding layer**, an **LSTM** and one **dense layer** that contains a node for each unique character in our training data. The dense layer outputs one score (logit) per character, which can be turned into a probability distribution over all characters.

In [101]:
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
  model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim,
                              batch_input_shape=[batch_size, None]),
    tf.keras.layers.LSTM(rnn_units,
                        return_sequences=True,
                        stateful=True,
                        recurrent_initializer='glorot_uniform'),
    tf.keras.layers.Dense(vocab_size)
  ])
  return model

model = build_model(VOCAB_SIZE,EMBEDDING_DIM, RNN_UNITS, BATCH_SIZE)
model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_1 (Embedding)     (64, None, 256)           16640     
                                                                 
 lstm_1 (LSTM)               (64, None, 1024)          5246976   
                                                                 
 dense_1 (Dense)             (64, None, 65)            66625     
                                                                 
Total params: 5,330,241
Trainable params: 5,330,241
Non-trainable params: 0
_________________________________________________________________


#### Creating a Loss Function
Now we are going to create our own loss function for this problem. This is because our model will output a tensor of shape (64, sequence_length, 65) that holds a score (logit) for each of the 65 characters at each timestep for every sequence in the batch; these scores define the probability distribution over the next character.

However, before we do that let's have a look at a sample input and the output from our untrained model. This is so we can understand what the model is giving us.

In [115]:
for input_example_batch, target_example_batch in data.take(1):
  print(input_example_batch)
  example_batch_predictions = model(input_example_batch)
  # ask our model for a prediction on our first batch of training data (64 entries)
  print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")  
  # print out the output shape

tf.Tensor(
[[58  1 57 ... 46 43  1]
 [59 56 43 ... 54 39 57]
 [43 56 63 ... 40 43  1]
 ...
 [51 43 56 ... 59 52 42]
 [ 1 46 43 ... 43 41 49]
 [39 56 56 ...  1 53 59]], shape=(64, 100), dtype=int64)
(64, 100, 65) # (batch_size, sequence_length, vocab_size)


In [117]:
# we can see that the prediction is an array of 64 arrays, one for each entry in the batch
print(len(example_batch_predictions))
print(example_batch_predictions)

64
tf.Tensor(
[[[ 8.88737198e-03 -3.89459135e-04 -3.85667197e-04 ...  7.73166539e-04
   -8.22616741e-04  2.18325295e-06]
  [ 4.53541614e-03 -9.30596795e-03  6.52253162e-04 ...  4.88170143e-03
   -5.71037875e-03  4.63583414e-03]
  [ 1.87510764e-03 -8.01964756e-03  3.26865166e-03 ...  6.84304675e-03
    1.24659040e-03 -2.81497370e-03]
  ...
  [ 2.43138405e-04 -3.71936942e-04  1.60229707e-03 ... -6.91157160e-03
    9.17242747e-03  2.63734441e-03]
  [ 1.35667040e-03 -7.57548027e-04 -5.05056512e-03 ... -3.20511311e-03
    9.20652226e-03  1.05718821e-02]
  [-6.59766607e-04 -1.08650029e-02 -4.23905393e-03 ...  1.58766285e-03
    1.91760412e-03  1.26729477e-02]]

 [[-1.82141713e-03 -9.11518093e-03 -8.29156954e-04 ...  4.90280520e-03
    4.19899588e-03  5.80808986e-03]
  [-3.44860344e-03 -8.05037189e-03 -1.68337650e-03 ...  2.17182562e-03
    1.84672419e-03  5.05521288e-03]
  [-8.99372157e-04 -7.32528139e-03 -7.81962276e-03 ...  4.46208799e-03
    5.25792688e-03  1.20163169e-02]
  ...
  [-1.346

In [118]:
# let's examine one prediction
pred = example_batch_predictions[0]
print(len(pred))
print(pred)
# notice this is a 2d array of length 100, where each interior array is the prediction for the next character at each time step

100
tf.Tensor(
[[ 8.8873720e-03 -3.8945914e-04 -3.8566720e-04 ...  7.7316654e-04
  -8.2261674e-04  2.1832529e-06]
 [ 4.5354161e-03 -9.3059679e-03  6.5225316e-04 ...  4.8817014e-03
  -5.7103788e-03  4.6358341e-03]
 [ 1.8751076e-03 -8.0196476e-03  3.2686517e-03 ...  6.8430468e-03
   1.2465904e-03 -2.8149737e-03]
 ...
 [ 2.4313841e-04 -3.7193694e-04  1.6022971e-03 ... -6.9115716e-03
   9.1724275e-03  2.6373444e-03]
 [ 1.3566704e-03 -7.5754803e-04 -5.0505651e-03 ... -3.2051131e-03
   9.2065223e-03  1.0571882e-02]
 [-6.5976661e-04 -1.0865003e-02 -4.2390539e-03 ...  1.5876628e-03
   1.9176041e-03  1.2672948e-02]], shape=(100, 65), dtype=float32)


In [119]:
# and finally we'll look at a prediction at the first timestep
time_pred = pred[0]
print(len(time_pred))
print(time_pred)
# and of course it's 65 values, one score (logit) per character, indicating how likely each is to occur next

65
tf.Tensor(
[ 8.8873720e-03 -3.8945914e-04 -3.8566720e-04 -5.2223948e-04
  7.6188762e-03 -7.2167721e-05  2.3388923e-03  9.7754179e-04
 -5.1537775e-03 -5.5673497e-04 -5.0233463e-03 -5.4896269e-03
  4.3113381e-03 -1.3399334e-04  4.2958423e-03 -6.8725320e-04
 -9.7384956e-03 -2.6062878e-03  9.9222800e-03 -4.9169101e-03
  5.0284276e-03 -2.5855128e-03  7.8635383e-04  1.2487841e-02
 -3.0421102e-03  6.4492095e-03  2.5084266e-03 -4.4517992e-03
 -2.1157041e-04 -1.4921324e-03 -9.7693945e-04  7.5311898e-03
 -2.0633847e-03  7.3526055e-03 -5.3049303e-03 -3.0981614e-03
  8.0278823e-03  5.2639679e-04 -3.5811884e-03  6.3575027e-03
 -3.2834790e-03  6.5011904e-03  7.2189169e-03 -3.9057049e-03
  2.5207065e-03  7.7863880e-03  9.4246548e-03  4.7421083e-04
  3.6018086e-04 -1.0684413e-03  7.0897327e-03 -7.0586856e-03
  5.0668875e-03 -5.4867044e-03 -4.8759575e-03  6.8383412e-03
  6.4094237e-04  2.2931448e-03  4.2047738e-03 -8.5567869e-04
  1.7380645e-04  9.1863738e-04  7.7316654e-04 -8.2261674e-04
  2.183252

In [120]:
# If we want to determine the predicted character we need to sample the output distribution (pick a value based on probability)
# Samples 1 of the 65 characters for each of the 100 timesteps
sampled_indices = tf.random.categorical(pred, num_samples=1) # returns the sampled index for each of the 100 timesteps
print(sampled_indices)

# now we can reshape that array and convert all the integers to characters to see the actual text
sampled_indices = np.reshape(sampled_indices, (1, -1))[0]
print(sampled_indices)
predicted_chars = int_to_text(sampled_indices)

predicted_chars  # and this is what the model predicted for training sequence 1

tf.Tensor(
[[40]
 [21]
 [59]
 [25]
 [10]
 [52]
 [22]
 [ 6]
 [62]
 [14]
 [ 3]
 [57]
 [50]
 [18]
 [33]
 [34]
 [20]
 [58]
 [24]
 [13]
 [25]
 [23]
 [45]
 [64]
 [39]
 [27]
 [12]
 [64]
 [58]
 [18]
 [45]
 [ 9]
 [ 4]
 [ 1]
 [53]
 [26]
 [43]
 [20]
 [40]
 [22]
 [30]
 [17]
 [58]
 [29]
 [56]
 [62]
 [50]
 [ 9]
 [ 7]
 [ 2]
 [ 9]
 [24]
 [23]
 [ 1]
 [49]
 [31]
 [64]
 [13]
 [15]
 [18]
 [44]
 [41]
 [26]
 [ 7]
 [21]
 [29]
 [ 7]
 [51]
 [37]
 [15]
 [28]
 [60]
 [61]
 [47]
 [37]
 [ 1]
 [58]
 [57]
 [58]
 [60]
 [50]
 [ 4]
 [60]
 [59]
 [29]
 [64]
 [35]
 [46]
 [11]
 [22]
 [63]
 [41]
 [21]
 [22]
 [ 5]
 [ 6]
 [10]
 [22]
 [18]
 [62]], shape=(100, 1), dtype=int64)
[40 21 59 25 10 52 22  6 62 14  3 57 50 18 33 34 20 58 24 13 25 23 45 64
 39 27 12 64 58 18 45  9  4  1 53 26 43 20 40 22 30 17 58 29 56 62 50  9
  7  2  9 24 23  1 49 31 64 13 15 18 44 41 26  7 21 29  7 51 37 15 28 60
 61 47 37  1 58 57 58 60 50  4 60 59 29 64 35 46 11 22 63 41 21 22  5  6
 10 22 18 62]


"bIuM:nJ,xB$slFUVHtLAMKgzaO?ztFg3& oNeHbJREtQrxl3-!3LK kSzACFfcN-IQ-mYCPvwiY tstvl&vuQzWh;JycIJ',:JFx"

So now we need to create a loss function that can compare that output to the expected output and give us some numeric value representing how close the two were. 

In [125]:
def loss(labels, logits):
  return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)
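
To see what this loss computes for a single timestep, here is a plain-Python sketch of sparse categorical cross-entropy on logits: the negative log of the softmax probability assigned to the correct character. It is a simplified stand-in for illustration, not the Keras implementation.

```python
import math

def sparse_categorical_crossentropy(label, logits):
    """Cross-entropy for one timestep: -log(softmax(logits)[label])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

# all 65 characters equally likely, as in a freshly initialised model
uniform_loss = sparse_categorical_crossentropy(3, [0.0] * 65)
# a strong score on the correct character (index 3)
confident_loss = sparse_categorical_crossentropy(3, [0.0] * 3 + [10.0] + [0.0] * 61)
print(uniform_loss, confident_loss)
```

When every character is equally likely, the loss equals ln(65), roughly 4.17, which is a useful baseline: training is working if the loss drops well below that value.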

#### Compiling the Model
At this point we can think of our problem as a classification problem where the model predicts the probability of each unique letter coming next.

In [126]:
model.compile(optimizer='adam', loss=loss)

#### Creating Checkpoints
Now we are going to set up and configure our model to save checkpoints as it trains. This will allow us to load our model from a checkpoint and continue training it.

In [127]:
# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)

#### Training
Finally, we will start training the model. 
**If this is taking a while go to Runtime > Change Runtime Type and choose "GPU" under hardware accelerator.**

In [None]:
history = model.fit(data, epochs=50, callbacks=[checkpoint_callback])

Epoch 1/50
Epoch 2/50
 31/172 [====>.........................] - ETA: 13:56 - loss: 1.9947

#### Loading the Model
We'll rebuild the model from a checkpoint using a batch_size of 1 so that we can feed one piece of text to the model and have it make a prediction.

In [None]:
model = build_model(VOCAB_SIZE, EMBEDDING_DIM, RNN_UNITS, batch_size=1)

Once the model has finished training, we can find the **latest checkpoint** that stores the model's weights using the following line.

In [None]:
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))
model.build(tf.TensorShape([1, None]))

We can load **any checkpoint** we want by specifying the exact file to load.

In [None]:
checkpoint_num = 10
# load_weights expects the checkpoint path prefix itself
model.load_weights("./training_checkpoints/ckpt_" + str(checkpoint_num))
model.build(tf.TensorShape([1, None]))

#### Generating Text
Now we can use the lovely function provided by TensorFlow to generate some text using any starting string we'd like.

In [None]:
def generate_text(model, start_string):
  # Evaluation step (generating text using the learned model)

  # Number of characters to generate
  num_generate = 800

  # Converting our start string to numbers (vectorizing)
  input_eval = [char2idx[s] for s in start_string]
  input_eval = tf.expand_dims(input_eval, 0)

  # Empty list to store our results
  text_generated = []

  # Low temperatures result in more predictable text.
  # Higher temperatures result in more surprising text.
  # Experiment to find the best setting.
  temperature = 1.0

  # Here batch size == 1
  model.reset_states()
  for i in range(num_generate):
      predictions = model(input_eval)
      # remove the batch dimension
    
      predictions = tf.squeeze(predictions, 0)

      # using a categorical distribution to predict the character returned by the model
      predictions = predictions / temperature
      predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()

      # We pass the predicted character as the next input to the model
      # along with the previous hidden state
      input_eval = tf.expand_dims([predicted_id], 0)

      text_generated.append(idx2char[predicted_id])

  return (start_string + ''.join(text_generated))
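
The effect of `temperature` is easier to see on a tiny example. The sketch below divides a set of hypothetical logits by the temperature before applying softmax, which is the same scaling used in `generate_text`; the logit values are made up for illustration.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities after dividing them by the temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three characters
cold = softmax_with_temperature(logits, temperature=0.5)
neutral = softmax_with_temperature(logits, temperature=1.0)
hot = softmax_with_temperature(logits, temperature=2.0)
print(cold, neutral, hot)
```

A temperature below 1 sharpens the distribution so the top character is chosen more often, while a temperature above 1 flattens it and makes unlikely characters more probable.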

In [None]:
inp = input("Type a starting string: ")
print(generate_text(model, inp))