### Character-Level Text Generation Using an RNN

* Article : [Andrej_Karpathy](http://karpathy.github.io/2015/05/21/rnn-effectiveness/)

In [0]:
%tensorflow_version 2.x

In [0]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import tensorflow as tf

-------------------------------------------------------------------------------------------------------------------------------

### 1) Data

* We can grab any free text from here : [text](https://www.gutenberg.org/)


* We will choose Shakespeare's works for two main reasons :

    1) It's a large corpus of text. It is usually recommended to have a source of at least 1 million characters in total to get realistic text generation

    2) It has a very distinctive style. Since the text uses old-style English and is formatted as a stage play, it will be very obvious to us if the model is able to reproduce similar results.

In [0]:
path_to_file = 'shakespeare.txt'

In [0]:
with open(path_to_file, 'r') as f:    # read it with mode 'r'; the file is closed automatically
    text = f.read()

In [9]:
print(text[:500])


                     1
  From fairest creatures we desire increase,
  That thereby beauty's rose might never die,
  But as the riper should by time decease,
  His tender heir might bear his memory:
  But thou contracted to thine own bright eyes,
  Feed'st thy light's flame with self-substantial fuel,
  Making a famine where abundance lies,
  Thy self thy foe, to thy sweet self too cruel:
  Thou that art now the world's fresh ornament,
  And only herald to the gaudy spring,
  Within thine own bu


### `Understanding the Unique Characters`

In [0]:
vocab = sorted(set(text))

In [11]:
print(vocab)

['\n', ' ', '!', '"', '&', "'", '(', ')', ',', '-', '.', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '<', '>', '?', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '[', ']', '_', '`', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '|']


In [12]:
len(vocab)            # We need to remember this for the last Dense layer 

83

-------------------------------------------------------------------------------------------------------------------------------

### 2) Text Processing

* `Text Vectorization`

* `Create Encoding Dictionary`

**A neural network can't take in raw string data, so we need to assign a number to each character**

**Let's create two mappings: one from character to numeric index and one from numeric index back to character**

##### We will use `enumerate`, which yields an (index, character) tuple for each character in the vocabulary

In [13]:
for pair in enumerate(vocab):
    
    print(pair)

(0, '\n')
(1, ' ')
(2, '!')
(3, '"')
(4, '&')
(5, "'")
(6, '(')
(7, ')')
(8, ',')
(9, '-')
(10, '.')
(11, '0')
(12, '1')
(13, '2')
(14, '3')
(15, '4')
(16, '5')
(17, '6')
(18, '7')
(19, '8')
(20, '9')
(21, ':')
(22, ';')
(23, '<')
(24, '>')
(25, '?')
(26, 'A')
(27, 'B')
(28, 'C')
(29, 'D')
(30, 'E')
(31, 'F')
(32, 'G')
(33, 'H')
(34, 'I')
(35, 'J')
(36, 'K')
(37, 'L')
(38, 'M')
(39, 'N')
(40, 'O')
(41, 'P')
(42, 'Q')
(43, 'R')
(44, 'S')
(45, 'T')
(46, 'U')
(47, 'V')
(48, 'W')
(49, 'X')
(50, 'Y')
(51, 'Z')
(52, '[')
(53, ']')
(54, '_')
(55, '`')
(56, 'a')
(57, 'b')
(58, 'c')
(59, 'd')
(60, 'e')
(61, 'f')
(62, 'g')
(63, 'h')
(64, 'i')
(65, 'j')
(66, 'k')
(67, 'l')
(68, 'm')
(69, 'n')
(70, 'o')
(71, 'p')
(72, 'q')
(73, 'r')
(74, 's')
(75, 't')
(76, 'u')
(77, 'v')
(78, 'w')
(79, 'x')
(80, 'y')
(81, 'z')
(82, '|')


##### Let's create a dictionary where the keys are the characters with some number assigned to them

* Using `dictionary comprehension`

In [0]:
char_to_ind = {char:ind for ind,char in enumerate(vocab)}

In [15]:
char_to_ind

{'\n': 0,
 ' ': 1,
 '!': 2,
 '"': 3,
 '&': 4,
 "'": 5,
 '(': 6,
 ')': 7,
 ',': 8,
 '-': 9,
 '.': 10,
 '0': 11,
 '1': 12,
 '2': 13,
 '3': 14,
 '4': 15,
 '5': 16,
 '6': 17,
 '7': 18,
 '8': 19,
 '9': 20,
 ':': 21,
 ';': 22,
 '<': 23,
 '>': 24,
 '?': 25,
 'A': 26,
 'B': 27,
 'C': 28,
 'D': 29,
 'E': 30,
 'F': 31,
 'G': 32,
 'H': 33,
 'I': 34,
 'J': 35,
 'K': 36,
 'L': 37,
 'M': 38,
 'N': 39,
 'O': 40,
 'P': 41,
 'Q': 42,
 'R': 43,
 'S': 44,
 'T': 45,
 'U': 46,
 'V': 47,
 'W': 48,
 'X': 49,
 'Y': 50,
 'Z': 51,
 '[': 52,
 ']': 53,
 '_': 54,
 '`': 55,
 'a': 56,
 'b': 57,
 'c': 58,
 'd': 59,
 'e': 60,
 'f': 61,
 'g': 62,
 'h': 63,
 'i': 64,
 'j': 65,
 'k': 66,
 'l': 67,
 'm': 68,
 'n': 69,
 'o': 70,
 'p': 71,
 'q': 72,
 'r': 73,
 's': 74,
 't': 75,
 'u': 76,
 'v': 77,
 'w': 78,
 'x': 79,
 'y': 80,
 'z': 81,
 '|': 82}

In [16]:
char_to_ind['H']

33

In [0]:
ind_to_char = np.array(vocab)

In [18]:
ind_to_char[33]

'H'

##### `Encoding the text`

In [0]:
encoded_text = np.array([char_to_ind[c] for c in text])

In [20]:
encoded_text

array([ 0,  1,  1, ...,  1,  1, 39])

In [21]:
encoded_text.shape

(3145728,)

* So we have a little over 3.1 million characters (3,145,728)


##### We now have a mapping we can use to go back and forth from characters to numerics

In [0]:
sample = text[:40]

In [24]:
sample

'\n                     1\n  From fairest c'

In [25]:
encoded_text[:40]

array([ 0,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1, 12,  0,  1,  1, 31, 73, 70, 68,  1, 61, 56, 64,
       73, 60, 74, 75,  1, 58])
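
As a quick sanity check of the mapping idea, the sketch below rebuilds both lookups on a small sample string and confirms that encoding followed by decoding returns the original text. It uses plain Python lists instead of NumPy, and the names `sample_text`, `encode` and `decode` are illustrative helpers, not part of the notebook.

```python
# Minimal round-trip check for a character <-> index mapping.
sample_text = "From fairest creatures we desire increase,"

sample_vocab = sorted(set(sample_text))
sample_char_to_ind = {char: ind for ind, char in enumerate(sample_vocab)}
sample_ind_to_char = sample_vocab  # a sorted list doubles as an index -> char lookup


def encode(s):
    """Map each character to its integer index."""
    return [sample_char_to_ind[c] for c in s]


def decode(indices):
    """Map each integer index back to its character."""
    return ''.join(sample_ind_to_char[i] for i in indices)


encoded = encode(sample_text)
print(encoded[:10])
print(decode(encoded) == sample_text)
```

Note that a character missing from the vocabulary would raise a `KeyError` in `encode`, so any seed text used later has to be made of characters that occur in the corpus.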

-------------------------------------------------------------------------------------------------------------------------------

### 3) Creating Batches

Overall, what we are trying to achieve is to have the model predict the most probable next character given a historical sequence of characters. It's up to us (the user) to choose how long that historical sequence should be. Too short a sequence and we don't have enough information (e.g. given the letter "a", what is the next character?); too long a sequence and training will take too long and will most likely overfit to characters that are irrelevant to characters farther out. While there is no single correct sequence length, you should consider the text itself, how long normal phrases are in it, and a reasonable idea of which characters/words are relevant to each other.

* `Understand Text Sequences`

        To understand how the sequences are organized and shifted one character forward
        

* `Creating Batches`        


* `Shuffle Batches`

------------------------------------------------------------------------------------------------------------------------------

##### How long should the training sequence be?

##### We must make sure that our training sequences are long enough to capture the general structure of the text

In [26]:
print(text[:500])


                     1
  From fairest creatures we desire increase,
  That thereby beauty's rose might never die,
  But as the riper should by time decease,
  His tender heir might bear his memory:
  But thou contracted to thine own bright eyes,
  Feed'st thy light's flame with self-substantial fuel,
  Making a famine where abundance lies,
  Thy self thy foe, to thy sweet self too cruel:
  Thou that art now the world's fresh ornament,
  And only herald to the gaudy spring,
  Within thine own bu


In [0]:
line = "From fairest creatures we desire increase"

In [28]:
len(line)

41

In [0]:
part_stanza = '''
From fairest creatures we desire increase,
  That thereby beauty's rose might never die,
  But as the riper should by time decease,
'''

In [30]:
len(part_stanza)

133

### `Training Sequences`

* The target text will be the input text sequence shifted one character forward

* For Instance :

    * Sequence In : 'Hello my nam'
    
    * Sequence Out : 'ello my name'

In [0]:
seq_len = 120   # a bit shorter than the three-line excerpt above (133 characters)

In [0]:
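# Total number of full sequences in the text
#
# //  is floor division: leftover characters that don't fill a whole sequence are discarded
#
# +1  because each chunk holds seq_len input characters plus one extra character,
#     so that the target (the input shifted forward by one) is also seq_len long

total_num_seq = len(text) // (seq_len+1)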

In [33]:
total_num_seq

25997
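
The count above can be reproduced by hand from the numbers already shown in this notebook (3,145,728 characters and `seq_len = 120`). The short sketch below only redoes that arithmetic; `chunk_len` and `leftover` are names introduced here for illustration.

```python
num_chars = 3145728      # encoded_text.shape[0] from above
seq_len = 120

chunk_len = seq_len + 1  # 120 input characters + 1 extra for the shifted target
total_num_seq, leftover = divmod(num_chars, chunk_len)

print(total_num_seq)     # number of full chunks
print(leftover)          # trailing characters that do not fill a chunk
```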

##### Now let's create the Training sequences 

* `tf.data.Dataset.from_tensor_slices` function converts a text vector into a stream of character indices

In [0]:
char_dataset = tf.data.Dataset.from_tensor_slices(encoded_text)

In [35]:
type(char_dataset)

tensorflow.python.data.ops.dataset_ops.TensorSliceDataset

In [36]:
for item in char_dataset.take(500):   # take method creates a dataset
    
    print(item)

tf.Tensor(0, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
tf.Tensor(12, shape=(), dtype=int64)
tf.Tensor(0, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
tf.Tensor(31, shape=(), dtype=int64)
tf.Tensor(73, shape=(), dt

In [37]:
for item in char_dataset.take(500):
    
    print(item.numpy())

0
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
12
0
1
1
31
73
70
68
1
61
56
64
73
60
74
75
1
58
73
60
56
75
76
73
60
74
1
78
60
1
59
60
74
64
73
60
1
64
69
58
73
60
56
74
60
8
0
1
1
45
63
56
75
1
75
63
60
73
60
57
80
1
57
60
56
76
75
80
5
74
1
73
70
74
60
1
68
64
62
63
75
1
69
60
77
60
73
1
59
64
60
8
0
1
1
27
76
75
1
56
74
1
75
63
60
1
73
64
71
60
73
1
74
63
70
76
67
59
1
57
80
1
75
64
68
60
1
59
60
58
60
56
74
60
8
0
1
1
33
64
74
1
75
60
69
59
60
73
1
63
60
64
73
1
68
64
62
63
75
1
57
60
56
73
1
63
64
74
1
68
60
68
70
73
80
21
0
1
1
27
76
75
1
75
63
70
76
1
58
70
69
75
73
56
58
75
60
59
1
75
70
1
75
63
64
69
60
1
70
78
69
1
57
73
64
62
63
75
1
60
80
60
74
8
0
1
1
31
60
60
59
5
74
75
1
75
63
80
1
67
64
62
63
75
5
74
1
61
67
56
68
60
1
78
64
75
63
1
74
60
67
61
9
74
76
57
74
75
56
69
75
64
56
67
1
61
76
60
67
8
0
1
1
38
56
66
64
69
62
1
56
1
61
56
68
64
69
60
1
78
63
60
73
60
1
56
57
76
69
59
56
69
58
60
1
67
64
60
74
8
0
1
1
45
63
80
1
74
60
67
61
1
75
63
80
1
61
70
60
8
1
75
70
1
75
63


In [38]:
for item in char_dataset.take(500):
    
    print(ind_to_char[item.numpy()])



 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1


 
 
F
r
o
m
 
f
a
i
r
e
s
t
 
c
r
e
a
t
u
r
e
s
 
w
e
 
d
e
s
i
r
e
 
i
n
c
r
e
a
s
e
,


 
 
T
h
a
t
 
t
h
e
r
e
b
y
 
b
e
a
u
t
y
'
s
 
r
o
s
e
 
m
i
g
h
t
 
n
e
v
e
r
 
d
i
e
,


 
 
B
u
t
 
a
s
 
t
h
e
 
r
i
p
e
r
 
s
h
o
u
l
d
 
b
y
 
t
i
m
e
 
d
e
c
e
a
s
e
,


 
 
H
i
s
 
t
e
n
d
e
r
 
h
e
i
r
 
m
i
g
h
t
 
b
e
a
r
 
h
i
s
 
m
e
m
o
r
y
:


 
 
B
u
t
 
t
h
o
u
 
c
o
n
t
r
a
c
t
e
d
 
t
o
 
t
h
i
n
e
 
o
w
n
 
b
r
i
g
h
t
 
e
y
e
s
,


 
 
F
e
e
d
'
s
t
 
t
h
y
 
l
i
g
h
t
'
s
 
f
l
a
m
e
 
w
i
t
h
 
s
e
l
f
-
s
u
b
s
t
a
n
t
i
a
l
 
f
u
e
l
,


 
 
M
a
k
i
n
g
 
a
 
f
a
m
i
n
e
 
w
h
e
r
e
 
a
b
u
n
d
a
n
c
e
 
l
i
e
s
,


 
 
T
h
y
 
s
e
l
f
 
t
h
y
 
f
o
e
,
 
t
o
 
t
h
y
 
s
w
e
e
t
 
s
e
l
f
 
t
o
o
 
c
r
u
e
l
:


 
 
T
h
o
u
 
t
h
a
t
 
a
r
t
 
n
o
w
 
t
h
e
 
w
o
r
l
d
'
s
 
f
r
e
s
h
 
o
r
n
a
m
e
n
t
,


 
 
A
n
d
 
o
n
l
y
 
h
e
r
a
l
d
 
t
o
 
t
h
e
 
g
a
u
d
y
 
s
p
r
i
n
g
,


 
 
W
i
t
h
i
n
 
t
h
i
n
e
 
o
w
n
 
b
u


##### So now let's create Sequences from it

* `batch` method converts the individual character calls into sequences we can feed in as a batch

* `drop_remainder` represents whether or not the last batch should be dropped in the case it has fewer elements than the actual `batch_size` elements; the default behaviour is not to drop the smaller batch

In [0]:
sequences = char_dataset.batch(seq_len+1, drop_remainder=True)
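
To make the effect of `drop_remainder` concrete, here is a plain-Python sketch of the same grouping on a toy string. It is only an analogy for what `Dataset.batch` does, not the TensorFlow implementation; the `batch` helper and the `toy_*` names are hypothetical.

```python
def batch(items, batch_size, drop_remainder=False):
    """Group consecutive items into lists of length batch_size."""
    batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    if drop_remainder and batches and len(batches[-1]) < batch_size:
        batches.pop()  # discard the short final batch
    return batches


toy_text = "Hello my name"            # 13 characters
toy_seq_len = 5

# Each chunk holds toy_seq_len + 1 characters, mirroring seq_len + 1 above
chunks = batch(list(toy_text), toy_seq_len + 1, drop_remainder=True)

for chunk in chunks:
    print(repr(''.join(chunk)))
```

With 13 characters and chunks of 6, two full chunks are produced and the single leftover character is dropped.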

So now we have our sequences, we will perform the following steps for each one of them to create our target text sequence :

* Grab the input text sequence

* Assign the target text sequence as the input text sequence shifted by one step forward

* Group them together as a tuple

In [0]:
def create_seq_targets(seq):
    
    input_txt = seq[:-1]   # hello my nam
    
    target_txt = seq[1:]   # ello my name
    
    return input_txt, target_txt

##### Let's map the function to all the sequences 

So my final dataset will be :

In [0]:
dataset = sequences.map(create_seq_targets)

In [42]:
for input_txt, target_txt in dataset.take(1):
    
    print(input_txt.numpy())
    
    print(''.join(ind_to_char[input_txt.numpy()]))
    
    print(target_txt.numpy())
    
    print(''.join(ind_to_char[target_txt.numpy()]))

[ 0  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 12  0
  1  1 31 73 70 68  1 61 56 64 73 60 74 75  1 58 73 60 56 75 76 73 60 74
  1 78 60  1 59 60 74 64 73 60  1 64 69 58 73 60 56 74 60  8  0  1  1 45
 63 56 75  1 75 63 60 73 60 57 80  1 57 60 56 76 75 80  5 74  1 73 70 74
 60  1 68 64 62 63 75  1 69 60 77 60 73  1 59 64 60  8  0  1  1 27 76 75]

                     1
  From fairest creatures we desire increase,
  That thereby beauty's rose might never die,
  But
[ 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 12  0  1
  1 31 73 70 68  1 61 56 64 73 60 74 75  1 58 73 60 56 75 76 73 60 74  1
 78 60  1 59 60 74 64 73 60  1 64 69 58 73 60 56 74 60  8  0  1  1 45 63
 56 75  1 75 63 60 73 60 57 80  1 57 60 56 76 75 80  5 74  1 73 70 74 60
  1 68 64 62 63 75  1 69 60 77 60 73  1 59 64 60  8  0  1  1 27 76 75  1]
                     1
  From fairest creatures we desire increase,
  That thereby beauty's rose might never die,
  But 


* `So now we have all the actual sequences`


* `Let's create the training batches`


* `We have to shuffle those sequences into a random order, so the model doesn't learn on a particular ordering of the text. It should be any random snippet of the text and the model should start generating the next sequence from it`


* `By shuffling, the model doesn't overfit to any section of the text, but can instead generate characters given any seed text`

In [0]:
batch_size = 128    # 128 sequences feeding into the network at a time

buffer_size = 10000

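# buffer_size controls how many sequences are held in memory for shuffling:
# the dataset fills a buffer of this size and draws random elements from it,
# so it doesn't attempt to shuffle the entire dataset in memory at once
#
# With a very large dataset, shuffling everything at once could cause a memory error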

dataset = dataset.shuffle(buffer_size).batch(batch_size, drop_remainder=True)
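
The idea behind a shuffle buffer can be sketched in plain Python: keep a fixed number of elements in memory, emit a random one, and refill its slot from the incoming stream. This is a simplified illustration of the concept, not the actual `tf.data` implementation, and `buffered_shuffle` is a hypothetical helper.

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in a randomised order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)          # fill the buffer first
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]                # emit a random buffered element...
        buffer[idx] = item               # ...and replace it with the incoming one
    rng.shuffle(buffer)                  # drain whatever is left at the end
    yield from buffer


shuffled = list(buffered_shuffle(range(10), buffer_size=4, seed=0))
print(shuffled)
```

A larger buffer gives a more thorough shuffle at the cost of memory; with a buffer of size 1 the order is unchanged.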

In [44]:
dataset

<BatchDataset shapes: ((128, 120), (128, 120)), types: (tf.int64, tf.int64)>

----------------------------------------------------------------------------------------------------------------------------
128 is the number of sequences in each batch

each sequence is 120 characters long

the first element of the tuple holds the input sequences

the second element of the tuple holds the target sequences

-----------------------------------------------------------------------------------------------------------------------------------
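
Since `drop_remainder=True` is also used when batching, the number of batches per epoch follows directly from the sequence count computed earlier. The sketch below redoes that arithmetic with the notebook's own numbers; `steps_per_epoch` and `unused_sequences` are illustrative names.

```python
total_num_seq = 25997   # full sequences computed earlier
batch_size = 128

steps_per_epoch, unused_sequences = divmod(total_num_seq, batch_size)

print(steps_per_epoch)    # full batches per epoch
print(unused_sequences)   # sequences dropped from the incomplete final batch
```

This agrees with the `Train for 203 steps` line printed by `model.fit` further down.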

------------------------------------------------------------------------------------------------------------------------------

### 4) Create the Model

* Set up the Loss Function

* Create the Model

    * Embedding layer
    
    * GRU layer
    
    * Dense layer
    
----------------------------------------------------------------------------------------------------------------------------   

* Embeddings are a way to transform a discrete feature into vector form

* Most machine learning models take a numeric vector as input and return a prediction

* Therefore, if you have a categorical feature, you need to encode it as a vector before an ML model can use it

* The simplest kind of embedding is one-hot encoding:

        1 -> (1, 0, 0)
        2 -> (0, 1, 0)
        3 -> (0, 0, 1)

* We can replace the categorical feature with three possible values with the vectors as above without losing any information

* These vectors have as many elements as the number of values of the categorical feature
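
The one-hot mapping listed above can be written in a few lines of plain Python. This is only a sketch; `one_hot` and `category_to_index` are names chosen here for illustration.

```python
def one_hot(index, num_classes):
    """Return a list of zeros with a single 1 at the given position."""
    vec = [0] * num_classes
    vec[index] = 1
    return vec


categories = [1, 2, 3]
category_to_index = {c: i for i, c in enumerate(categories)}

for c in categories:
    print(c, '->', one_hot(category_to_index[c], len(categories)))
```

Because each vector has exactly one non-zero entry, the original category can always be recovered from the position of the 1.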

------------------------------------------------------------------------------------------------------------------------------

* When your categorical feature has a lot of possible values, it is often better to replace it with embeddings with lower dimensionality

* Lower dimensionality gives you two advantages:

        It is more computationally efficient, because smaller embeddings require less memory
        
        It regularizes your model, because the fewer parameters your model has, the less capacity it has to overfit
-------------------------------                

* Embeddings are often used to map words to vectors in NLP systems, words represented as vectors can be used as an input for recurrent neural network

* Refer this Article : [Word Embedding Layers with Keras](https://machinelearningmastery.com/use-word-embedding-layers-deep-learning-keras/)

* Embeddings are also quite often used in recommendation systems to represent high dimensionality categorical variables like user_id or recommendable_item_id

* Refer this Article : [Embeddings for Collaborative Filtering](https://developers.google.com/machine-learning/crash-course/embeddings/motivation-from-collaborative-filtering)

------------------------------------------------------------------------------------------------------------------------------


````python
from keras.models import Sequential
from keras.layers import Embedding

import numpy as np

model = Sequential()

model.add(Embedding(1000, 4))    # 1000 words, 4 dimensions

model.compile(optimizer='adam', loss='mse')


print(model.predict(np.array([[4,8,3]])))
````

Output :

     [[[-0.09090  -0.93339  0.87322  -0.90893]

       [-0.09989   0.89879 -0.97079   0.86853]
  
       [0.468687   0.78346  0.67352   0.42736]]]
  

-----------------------------------------------------------------------------------------------------------------------------

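* Note that the embedding layer in TensorFlow is not the same thing as word-embedding algorithms such as word2vec and GloVe, even though they share a similar name. The layer is a trainable lookup table learned together with the rest of the network, whereas word2vec and GloVe are standalone methods for learning word vectors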


* Embedding refers to mapping a high-dimensional sparse feature vector to a dense vector with a much lower dimension. The embedding layer in TensorFlow is just like a look-up table: a 2D tensor in which the first dimension is indexed by the ID of a word and the second dimension holds the dense vector that is learned during the training phase of the neural network. You can also initialise the layer with pre-trained word embeddings (e.g. from word2vec). Set the `trainable` argument to `False` if you want to use pre-trained embeddings and do not wish to update them during training
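
The look-up-table view can be illustrated without TensorFlow: selecting row `i` of a table gives the same result as multiplying a one-hot vector for `i` by that table. The table values below are made up for the example, and `embed` and `one_hot_matmul` are hypothetical helpers.

```python
# A hypothetical 4-token vocabulary embedded into 2 dimensions.
embedding_table = [
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
    [0.7, 0.8],
]


def embed(token_ids):
    """Look up one row of the table per token id."""
    return [embedding_table[i] for i in token_ids]


def one_hot_matmul(token_id):
    """Multiply a one-hot row vector by the table (the slow, equivalent route)."""
    num_tokens = len(embedding_table)
    dim = len(embedding_table[0])
    one_hot = [1.0 if i == token_id else 0.0 for i in range(num_tokens)]
    return [sum(one_hot[i] * embedding_table[i][d] for i in range(num_tokens))
            for d in range(dim)]


print(embed([2, 0]))
print(one_hot_matmul(2))
```

In a real embedding layer the table entries are weights updated by backpropagation; here they are fixed numbers.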

-------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------

* We based this model architecture on [DeepMoji](https://deepmoji.mit.edu/); the original source code can be found [here](https://github.com/bfelbo/DeepMoji)


* The embedding layer will serve as the input layer, which essentially creates a lookup table that maps the numeric index of each character to a vector with "embedding dim" number of dimensions. As you can imagine, the larger this embedding size, the more complex the training. This is similar to the idea behind word2vec, where words are mapped to some n-dimensional space. Embedding before feeding into the recurrent layer (a GRU in this notebook) usually leads to more realistic results

In [0]:
vocab_size = len(vocab)            # unique characters

embed_dim = 64                     # near to vocab_size (prefer to be lesser than vocab_size)

rnn_neurons = 1026

In [0]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GRU, Dense

##### `Let's create a function that easily adapts to different variables as shown above :`

### Setting up Loss Function :

* For our loss we will use `sparse categorical crossentropy`, which we can import from Keras. We will also set `from_logits=True`

-----------------------------------------------------------------------------------------------------------
###### Sparse Categorical CrossEntropy vs Categorical CrossEntropy

* `If your targets are one-hot encoded, use categorical_crossentropy. Examples of one-hot encodings :`

                [1,0,0]

                [0,1,0] 

                [0,0,1]
                

* `But if your targets are integers, use sparse_categorical_crossentropy. Examples of integer encodings :`

                1
                
                2
                
                3
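
The two losses compute the same quantity and differ only in how the target is written. The sketch below shows this on a single made-up prediction using plain Python; the function names are illustrative re-implementations, not the Keras functions, and class indices here are 0-based.

```python
import math

probs = [0.7, 0.2, 0.1]   # made-up predicted probabilities for 3 classes


def categorical_crossentropy_py(one_hot_target, probs):
    """Cross-entropy with a one-hot target vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probs))


def sparse_categorical_crossentropy_py(int_target, probs):
    """Cross-entropy with an integer class index as target."""
    return -math.log(probs[int_target])


print(categorical_crossentropy_py([0, 1, 0], probs))
print(sparse_categorical_crossentropy_py(1, probs))
```

Integer targets avoid materialising a one-hot vector per character, which is why the sparse variant suits the integer-encoded text in this notebook.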

In [0]:
from tensorflow.keras.losses import sparse_categorical_crossentropy

In [0]:
# help(sparse_categorical_crossentropy)


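* We can't just pass in `sparse_categorical_crossentropy` directly, because we have to set `from_logits=True`. Our final Dense layer has no softmax activation, so the model outputs raw logits rather than probabilities, while the default `from_logits=False` assumes the predictions are already probabilities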


* As we need to add this, we have to create our own custom function as follows :

In [0]:
def sparse_cat_loss(y_true, y_pred):
    
    return sparse_categorical_crossentropy(y_true, y_pred, from_logits=True)
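
What `from_logits=True` implies can be sketched in plain Python: the loss first turns the raw scores into probabilities with a softmax and then takes the negative log of the probability assigned to the true class. The helpers below are illustrative re-implementations for a single prediction, not the Keras code.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_cat_loss_py(target, logits):
    """Cross-entropy for one integer target, starting from logits."""
    return -math.log(softmax(logits)[target])


vocab_size = 83
uniform_logits = [0.0] * vocab_size       # a model with no preference at all

print(sparse_cat_loss_py(5, uniform_logits))
print(math.log(vocab_size))
```

A model that assigns equal scores to all 83 characters has a loss of ln(83), roughly 4.42, so an untrained model with near-zero logits should start close to that value.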

##### `Adaptable Model Function :`

In [0]:
def create_model(vocab_size, embed_dim, rnn_neurons, batch_size) :
    
    model = Sequential()
    
    model.add(Embedding(vocab_size, embed_dim, batch_input_shape=[batch_size, None]))
    
    model.add(GRU(rnn_neurons, return_sequences=True, stateful=True, recurrent_initializer='glorot_uniform'))
    
    model.add(Dense(vocab_size))
    
    model.compile(optimizer='adam', loss=sparse_cat_loss)
    
    return model

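* `return_sequences` - return the output at every time step of the sequence, not only the last one

* `stateful` - the final state for each sample in a batch is reused as the initial state for the sample at the same position in the next batch

* `recurrent_initializer` - how the recurrent weights of the layer are initialised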

In [0]:
model = create_model(vocab_size=vocab_size, embed_dim=embed_dim, rnn_neurons=rnn_neurons, batch_size=batch_size)

In [52]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (128, None, 64)           5312      
_________________________________________________________________
gru (GRU)                    (128, None, 1026)         3361176   
_________________________________________________________________
dense (Dense)                (128, None, 83)           85241     
Total params: 3,451,729
Trainable params: 3,451,729
Non-trainable params: 0
_________________________________________________________________


------------------------------------------------------------------------------------------------------------------------------

### 5) Train the Model

* `Let's make sure that everything is ok with our model before we spend too much time on training`


* `So let's pass in a batch to confirm that the model predicts some random characters without any training`

* `So let's run an input batch`

In [0]:
for input_example_batch, target_example_batch in dataset.take(1):

  # predict off some random batch
  example_batch_predictions = model(input_example_batch)

* `input_example_batch` is the original sequence and `target_example_batch` is the original sequence shifted forward by 1 character

In [54]:
example_batch_predictions.shape

TensorShape([128, 120, 83])

In [56]:
example_batch_predictions[0]              # predictions for the first sequence in the batch

<tf.Tensor: shape=(120, 83), dtype=float32, numpy=
array([[ 3.7978380e-03,  6.3308072e-04,  2.6800812e-03, ...,
         3.4248736e-03, -5.9183580e-03,  1.8865713e-03],
       [-1.8610966e-03,  2.8349934e-03, -9.1890730e-03, ...,
         4.8141559e-03,  2.8258939e-03, -2.5631636e-03],
       [-6.7382334e-03,  6.8786405e-03, -8.7012798e-03, ...,
         1.3493203e-03, -8.8981271e-04, -2.2271080e-03],
       ...,
       [ 3.1102551e-03, -6.8542780e-03,  2.7509965e-04, ...,
        -4.6271598e-04,  3.2786462e-03,  2.0610620e-03],
       [ 3.2731490e-03, -7.8187045e-03,  4.3129027e-03, ...,
        -4.7275750e-03,  6.2239864e-03, -1.4775491e-05],
       [ 8.1839971e-03, -9.8539060e-03,  3.8443143e-03, ...,
        -3.0750241e-03,  3.1683266e-03, -9.2872384e-04]], dtype=float32)>

##### These values are raw logits (unnormalised scores), not probabilities: the model outputs one score per vocabulary character for each position in the sequence

In [0]:
sampled_indices = tf.random.categorical(example_batch_predictions[0],num_samples=1)
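
Sampling from logits, as opposed to always taking the highest score, can be sketched with the standard library. This is an illustration of the idea rather than what `tf.random.categorical` does internally; `sample_from_logits` and `toy_logits` are hypothetical names.

```python
import math
import random


def sample_from_logits(logits, rng):
    """Draw one class index with probability proportional to exp(logit)."""
    m = max(logits)
    weights = [math.exp(x - m) for x in logits]   # unnormalised softmax
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


rng = random.Random(0)
toy_logits = [0.0, 0.0, 10.0]

samples = [sample_from_logits(toy_logits, rng) for _ in range(100)]
print(samples.count(2), "of 100 samples picked index 2")

# argmax would pick index 2 every single time
print(max(range(len(toy_logits)), key=lambda i: toy_logits[i]))
```

Sampling keeps some variety in the output, whereas always taking the argmax tends to make generated text fall into repetitive loops.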

In [58]:
sampled_indices

<tf.Tensor: shape=(120, 1), dtype=int64, numpy=
array([[ 4],
       [ 6],
       [70],
       [27],
       [67],
       [41],
       [10],
       [29],
       [58],
       [51],
       [54],
       [22],
       [74],
       [17],
       [45],
       [78],
       [31],
       [ 5],
       [40],
       [61],
       [47],
       [15],
       [15],
       [56],
       [65],
       [23],
       [75],
       [78],
       [ 3],
       [49],
       [18],
       [28],
       [61],
       [34],
       [33],
       [78],
       [64],
       [24],
       [49],
       [27],
       [ 6],
       [80],
       [33],
       [38],
       [40],
       [47],
       [78],
       [20],
       [30],
       [57],
       [22],
       [62],
       [24],
       [26],
       [52],
       [46],
       [ 0],
       [69],
       [32],
       [27],
       [ 9],
       [28],
       [48],
       [51],
       [14],
       [74],
       [28],
       [16],
       [18],
       [32],
       [21],
       [ 0],
       [69],
   

##### In order to index `ind_to_char` with these values, we need to reshape the array from (120, 1) to (120,)

In [0]:
sampled_indices = tf.squeeze(sampled_indices, axis=-1).numpy()            # drop the trailing axis so we get a flat array instead of a list of lists

In [61]:
sampled_indices             # now it is in a format we can use to index ind_to_char

array([ 4,  6, 70, 27, 67, 41, 10, 29, 58, 51, 54, 22, 74, 17, 45, 78, 31,
        5, 40, 61, 47, 15, 15, 56, 65, 23, 75, 78,  3, 49, 18, 28, 61, 34,
       33, 78, 64, 24, 49, 27,  6, 80, 33, 38, 40, 47, 78, 20, 30, 57, 22,
       62, 24, 26, 52, 46,  0, 69, 32, 27,  9, 28, 48, 51, 14, 74, 28, 16,
       18, 32, 21,  0, 69, 37, 27, 37, 36, 75, 25, 23, 21, 10,  6, 19, 13,
       17, 13, 34, 47, 53,  6, 61, 78,  6, 71, 41, 10, 61, 47, 59, 15, 77,
       48, 44, 13, 10, 58, 43, 35, 49, 57, 10, 72, 33, 39, 51, 54, 60, 33,
       73])

In [62]:
ind_to_char[sampled_indices]

array(['&', '(', 'o', 'B', 'l', 'P', '.', 'D', 'c', 'Z', '_', ';', 's',
       '6', 'T', 'w', 'F', "'", 'O', 'f', 'V', '4', '4', 'a', 'j', '<',
       't', 'w', '"', 'X', '7', 'C', 'f', 'I', 'H', 'w', 'i', '>', 'X',
       'B', '(', 'y', 'H', 'M', 'O', 'V', 'w', '9', 'E', 'b', ';', 'g',
       '>', 'A', '[', 'U', '\n', 'n', 'G', 'B', '-', 'C', 'W', 'Z', '3',
       's', 'C', '5', '7', 'G', ':', '\n', 'n', 'L', 'B', 'L', 'K', 't',
       '?', '<', ':', '.', '(', '8', '2', '6', '2', 'I', 'V', ']', '(',
       'f', 'w', '(', 'p', 'P', '.', 'f', 'V', 'd', '4', 'v', 'W', 'S',
       '2', '.', 'c', 'R', 'J', 'X', 'b', '.', 'q', 'H', 'N', 'Z', '_',
       'e', 'H', 'r'], dtype='<U1')

##### The above values are just a bunch of random characters since our model has not been trained at all

##### `After confirming the dimensions are working, we can now train our model :`

In [0]:
epochs = 30            # at least 30 epochs to get realistic results

In [67]:
model.fit(dataset, epochs=epochs)

Train for 203 steps
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


<tensorflow.python.keras.callbacks.History at 0x7f93de833b00>

----------------------------------------------------------------------------------------------------------------------------------------------------------------

### 6) Generating Text

In [0]:
model.save('shakespeare.h5')

##### `Currently our model only expects 128 sequences at a time. We can create a new model that only expects a batch_size=1`

##### `We can create a new model with this batch_size, then load our saved model's weights. Then call .build() on the model`

In [0]:
from tensorflow.keras.models import load_model

In [0]:
model = create_model(vocab_size, embed_dim, rnn_neurons, batch_size=1)

model.load_weights('shakespeare.h5')

model.build(tf.TensorShape([1, None]))      # we will build the model by passing the input shape

In [72]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (1, None, 64)             5312      
_________________________________________________________________
gru_1 (GRU)                  (1, None, 1026)           3361176   
_________________________________________________________________
dense_1 (Dense)              (1, None, 83)             85241     
Total params: 3,451,729
Trainable params: 3,451,729
Non-trainable params: 0
_________________________________________________________________


##### `Notice that it is the same model summary as earlier but instead of 128 as the batch size now it only expects a batch size of 1`

##### Now we can create our own custom function to generate the text

In [0]:
def generate_text(model, start_seed, gen_size=500, temp=1.0) :

  '''
  model : Trained model to generate text

  start_seed : Initial seed text in string form

  gen_size : No.of characters to generate

  temp : hyper-parameter used to control the randomness of predictions by scaling the logits before applying softmax

  ----------------
  
  logits are the values used as inputs to softmax

  in statistics, sigma^-1(x) is called the logit: the inverse of the logistic sigmoid function

  -----------------

  The basic idea behind this function is to take in some seed text,

  format it so that it is in the correct shape for our network,

  then loop, feeding each predicted character back in as the next input.

  Pretty similar to the work in the RNN time series analysis

  '''

  num_generate = gen_size

  # Vectorizing starting seed text
  input_eval = [char_to_ind[s] for s in start_seed]             # map each character of the seed to its index and collect them in a list

  # Expand this to match batch format shape
  input_eval = tf.expand_dims(input_eval, 0)

  # Empty List to hold the resulting generated text
  text_generated = []


  # Temperature affects the randomness of the resulting text
  # The term is derived from entropy/thermodynamics.
  # The logits are divided by the temperature before the next character is sampled.
  # Lower temperature == less surprising / more expected
  # Higher temperature == more surprising / less expected

  temperature = temp

  # Here batch size == 1
  model.reset_states()

  for i in range(num_generate):

    # Generate Predictions
    predictions = model(input_eval)

    # Remove the batch shape dimension
    predictions = tf.squeeze(predictions, 0)     # just reverse of expand_dims

    # Use a Categorical distribution to select the next character

    predictions = predictions/temperature

    predicted_id = tf.random.categorical(predictions, num_samples=1)[-1, 0].numpy()

    # Pass the predicted character for the next input
    input_eval = tf.expand_dims([predicted_id], 0)

    # Transform back to character letter
    text_generated.append(ind_to_char[predicted_id])

  return (start_seed + ''.join(text_generated))  
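
The effect of dividing the logits by the temperature can be seen on a small made-up example. The sketch below uses plain Python and hypothetical names (`softmax_with_temperature`, `toy_logits`); it only illustrates how the distribution changes, not the sampling itself.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                        # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.0]

for temp in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(toy_logits, temp)
    print(temp, [round(p, 3) for p in probs])
```

A temperature below 1 concentrates probability on the top-scoring character (more predictable text), while a temperature above 1 flattens the distribution (more surprising text).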

In [74]:
print(generate_text(model, 'flower', gen_size=1000))

flowers on all little
    question to emblace,
    Shall be emplixt's soul! I entreat you
    When you are pantled sleak.  So shape a preparation from our pees
    But keep the same-
    As Saughter! how made it die?  
  MACBETH. O thou thunder after 'tis
    sufficient. Get you Cleford; it shall be sufficaing
    So half me an alisporation. I'll not be gainged,
    And his remembrancer!
  ISABELLA. 'dy, Harry, and for an agumblion calls:
    I am the restor of them up
    That all the Thorning of the dark!
     He's dead! Ye will, not fair you.
  JAQUES. And I for Bondowinkbet
    So worthy of thy outward prepared,
    As all the rest, how thou didst met you so?
  LADY MACBETH. But, madame, beggars!
  'My more divide; that some blood fearful:
    'Tis sin to dream! Look how he loses!
  Corn. For me no kindness. It is not worth the most care to keep
    The shadow of a most. Say what thou hadagon's dake?
DROMIO OF SYRACUSE. Besides, they suck overwore to make or pawn upon 's,
    An En