In [213]:
%matplotlib inline

import utils_ted
from utils_ted import *

In [214]:
batch_size = 64

In [215]:
from keras.layers import TimeDistributed, Activation
from numpy.random import choice

[Keras 2.0 release notes](https://github.com/fchollet/keras/wiki/Keras-2.0-release-notes)

```
Recurrent layers
    output_dim -> units
    init -> kernel_initializer
    inner_init -> recurrent_initializer
    added argument bias_initializer
    W_regularizer -> kernel_regularizer
    b_regularizer -> bias_regularizer
    added arguments kernel_constraint, recurrent_constraint, bias_constraint
    dropout_W -> dropout
    dropout_U -> recurrent_dropout
    consume_less -> implementation. String values have been replaced with integers: implementation 0 (default), 1 or 2.
    LSTM only: the argument forget_bias_init has been removed. Instead there is a boolean argument unit_forget_bias, defaulting to True.
```

## Setup

We haven't looked at the details of how this works yet, so this section is provided for self-study for those who are interested. We'll look at it closely next week.

In [216]:
path=get_file('nietzsche.txt', origin="https://s3.amazonaws.com/text-datasets/nietzsche.txt")
text = open(path, encoding='utf8').read().lower()
#text = open(path, encoding='utf8').read()

In [217]:
print('corpus length:', len(text))

corpus length: 600893


In [218]:
!tail {path} -n25

are thinkers who believe in the saints.


144

It stands to reason that this sketch of the saint, made upon the model
of the whole species, can be confronted with many opposing sketches that
would create a more agreeable impression. There are certain exceptions
among the species who distinguish themselves either by especial
gentleness or especial humanity, and perhaps by the strength of their
own personality. Others are in the highest degree fascinating because
certain of their delusions shed a particular glow over their whole
being, as is the case with the founder of christianity who took himself
for the only begotten son of God and hence felt himself sinless; so that
through his imagination--that should not be too harshly judged since the
whole of antiquity swarmed with sons of god--he attained the same goal,
the sense of complete sinlessness, complete irresponsibility, that can
now be attained by every individual through science.--In the same manner
I have viewed t

In [219]:
chars = sorted(list(set(text)))
vocab_size = len(chars) + 1

In [220]:
print("total chars : %s" % vocab_size)

total chars : 58


In [221]:
chars.insert(0, '\0')  # reserve index 0 for a padding character that never occurs in the text

In [222]:
"".join(chars[1:-5])

'\n !"\'(),-.0123456789:;=?[]_abcdefghijklmnopqrstuvwxy'

In [223]:
char_indices = {c:i for i, c in enumerate(chars)}
indices_char = {i:c for i, c in enumerate(chars)}

In [224]:
text_idxs = [char_indices[c] for c in text]

In [225]:
print(text_idxs[:10])

[43, 45, 32, 33, 28, 30, 32, 1, 1, 1]


In [226]:
''.join(indices_char[idx] for idx in text_idxs[:70])

'preface\n\n\nsupposing that truth is a woman--what then? is there not gro'
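The encode/decode round trip above can be checked on a toy string. This sketch mirrors the notebook's cells, with index 0 reserved for a padding character (written here as `'\0'`, an assumption about what the reserved slot is for):

```python
# Toy corpus standing in for the Nietzsche text
text = "preface\n\n\nsupposing that truth is a woman"

chars = sorted(set(text))
chars.insert(0, "\0")  # index 0 reserved for padding; it never appears in the text

char_indices = {c: i for i, c in enumerate(chars)}
indices_char = {i: c for i, c in enumerate(chars)}

encoded = [char_indices[c] for c in text]
decoded = "".join(indices_char[i] for i in encoded)

print(len(chars), encoded[:10])
print(decoded == text)  # True
```

Because the padding slot is inserted before the dictionaries are built, every real character is shifted up by one index, which is why `vocab_size` is `len(chars) + 1` when computed before the insert.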

## 3 char model

### Create inputs

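Create four lists, each taking every 3rd character (a step of cs), starting at the 0th, 1st, 2nd and 3rd characters respectively. The first three are the inputs and the fourth is the character to predict.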

In [227]:
cs = 3
c1_data = [text_idxs[i] for i in range(0, len(text_idxs) - (cs+1), cs)]
c2_data = [text_idxs[i+1] for i in range(0, len(text_idxs) - (cs+1), cs)]
c3_data = [text_idxs[i+2] for i in range(0, len(text_idxs) - (cs+1), cs)]
c4_data = [text_idxs[i+3] for i in range(0, len(text_idxs) - (cs+1), cs)]
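Each list comprehension above is a strided slice of `text_idxs` with step `cs`. A toy sketch (the corpus is replaced by the integers 0 to 19 so positions are readable) makes the offsets visible, and shows that with a step of 3 each target in `c4_data` is also the first input of the next window:

```python
import numpy as np

# Toy stand-in for text_idxs: position i simply holds the value i
text_idxs = list(range(20))
cs = 3

starts = range(0, len(text_idxs) - (cs + 1), cs)  # same range as the notebook
c1_data = [text_idxs[i] for i in starts]
c4_data = [text_idxs[i + 3] for i in starts]

# The same lists written as strided NumPy slices, trimmed to the same length
arr = np.array(text_idxs)
n = len(c1_data)
c1_slice = arr[0::cs][:n]
c4_slice = arr[3::cs][:n]

print(c1_data)  # [0, 3, 6, 9, 12, 15]
print(c4_data)  # [3, 6, 9, 12, 15, 18]
```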

Our inputs

In [228]:
x1 = np.array(c1_data[:-2])
x2 = np.array(c2_data[:-2])
x3 = np.array(c3_data[:-2])

Our output

In [229]:
y = np.array(c4_data[:-2])

The first 4 inputs and outputs

In [230]:
x1[:4], x2[:4], x3[:4], y[:4]

(array([43, 33, 32,  1]),
 array([45, 28,  1, 46]),
 array([32, 30,  1, 48]),
 array([33, 32,  1, 43]))

In [231]:
x1.shape, y.shape

((200295,), (200295,))

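The number of latent factors to create (i.e. the embedding dimension: each character is mapped to a vector of n_fac numbers, so each embedding matrix has shape (vocab_size, n_fac))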

In [232]:
n_fac = 42
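An `Embedding` layer is a trainable lookup table of shape `(vocab_size, n_fac)`: looking up row `i` gives the same vector as multiplying a one-hot vector for `i` by the matrix. A NumPy sketch, with a random matrix standing in for the trained weights:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, n_fac = 58, 42

# Random matrix standing in for the trained Embedding weights
emb = rng.normal(size=(vocab_size, n_fac))

idx = 43  # index of 'p' in this notebook's vocabulary
one_hot = np.zeros(vocab_size)
one_hot[idx] = 1.0

lookup = emb[idx]           # what Embedding does: select one row
via_matmul = one_hot @ emb  # the equivalent, but wasteful, dense computation

print(emb.size, np.allclose(lookup, via_matmul))  # 2436 True
```

The lookup avoids building a 58-wide one-hot vector for every character, which is why embeddings are used instead of a one-hot input to a Dense layer.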

Create inputs and embedding outputs for each of our 3 character inputs

In [233]:
def embedding_input(name, n_in, n_out):
    inp = Input((1, ), dtype='int64', name=name)
    emb = Embedding(n_in, n_out, input_length=1)(inp)
    return inp, Flatten()(emb)

In [234]:
c1_in, c1 = embedding_input('c1', vocab_size, n_fac)
c2_in, c2 = embedding_input('c2', vocab_size, n_fac)
c3_in, c3 = embedding_input('c3', vocab_size, n_fac)

### Create and train model

Pick a size for our hidden state

In [235]:
n_hidden = 256

This is the 'green arrow' from our diagram - the layer operation from input to hidden.

In [236]:
dense_in = Dense(n_hidden, activation='relu')

Our first hidden activation is simply this function applied to the result of the embedding of the first character.

In [237]:
c1_hidden = dense_in(c1)

This is the 'orange arrow' from our diagram - the layer operation from hidden to hidden.

In [238]:
dense_hidden = Dense(n_hidden, activation='tanh')

Our second and third hidden activations add the previous hidden state (after applying dense_hidden) to the embedding of the new character (after applying dense_in).

In [239]:
from keras.layers import Add

In [240]:
c2_dense = dense_in(c2)
hidden_2 = dense_hidden(c1_hidden)
c2_hidden = Add()([c2_dense, hidden_2])

In [241]:
c3_dense = dense_in(c3)
hidden_3 = dense_hidden(c2_hidden)
c3_hidden = Add()([c3_dense, hidden_3])

This is the 'blue arrow' from our diagram - the layer operation from hidden to output.

In [242]:
dense_out = Dense(vocab_size, activation='softmax')

The third hidden state is the input to our output layer.

In [243]:
c4_out = dense_out(c3_hidden)

In [244]:
model = Model([c1_in, c2_in, c3_in], c4_out)
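The functional graph above can be written out in plain NumPy to make the data flow explicit. This is a sketch with random weights (hypothetical stand-ins for the trained layers) that follows the notebook's structure: each character has its own `Embedding` (three separate `embedding_input` calls), while `dense_in` and `dense_hidden` are shared across the steps:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, n_fac, n_hidden = 58, 42, 256

def relu(x):
    return np.maximum(x, 0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical random weights; one embedding matrix per input position
E = [rng.normal(scale=0.1, size=(vocab_size, n_fac)) for _ in range(3)]
W_in, b_in = rng.normal(scale=0.1, size=(n_fac, n_hidden)), np.zeros(n_hidden)
W_h, b_h = rng.normal(scale=0.1, size=(n_hidden, n_hidden)), np.zeros(n_hidden)
W_out, b_out = rng.normal(scale=0.1, size=(n_hidden, vocab_size)), np.zeros(vocab_size)

def forward(i1, i2, i3):
    e1, e2, e3 = E[0][i1], E[1][i2], E[2][i3]              # (batch, n_fac) each
    h1 = relu(e1 @ W_in + b_in)                            # dense_in(c1)
    h2 = relu(e2 @ W_in + b_in) + np.tanh(h1 @ W_h + b_h)  # Add()([dense_in(c2), dense_hidden(h1)])
    h3 = relu(e3 @ W_in + b_in) + np.tanh(h2 @ W_h + b_h)  # Add()([dense_in(c3), dense_hidden(h2)])
    return softmax(h3 @ W_out + b_out)                     # dense_out(h3)

# A batch of two 3-character inputs ("pre" and "fac" from the data above)
probs = forward(np.array([43, 33]), np.array([45, 28]), np.array([32, 30]))
print(probs.shape)  # (2, 58)
```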

In [245]:
model.compile(Adam(), loss='sparse_categorical_crossentropy')

In [246]:
model.optimizer.lr = 1e-7

In [247]:
model.fit([x1, x2, x3], y, batch_size=batch_size, epochs=4, verbose=2)

Epoch 1/4
 - 17s - loss: 4.0529
Epoch 2/4
 - 16s - loss: 4.0465
Epoch 3/4
 - 16s - loss: 4.0400
Epoch 4/4
 - 16s - loss: 4.0332


<keras.callbacks.History at 0x7f2c74f1db00>
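`sparse_categorical_crossentropy` takes integer targets directly (no one-hot encoding) and averages -log of the probability assigned to the true character. A model that spreads probability evenly over all 58 characters therefore scores ln 58 ≈ 4.06, so the first epoch's 4.0529 is barely better than uniform guessing. A NumPy sketch of the loss:

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, probs):
    # y_true: integer character ids, shape (batch,); probs: shape (batch, n_classes)
    picked = probs[np.arange(len(y_true)), y_true]  # probability given to the true class
    return -np.mean(np.log(picked))

vocab_size = 58
probs = np.full((4, vocab_size), 1.0 / vocab_size)  # a model that knows nothing
y = np.array([33, 32, 1, 43])                       # the first four targets from above
loss = sparse_categorical_crossentropy(y, probs)
print(round(float(loss), 4))  # 4.0604
```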

In [248]:
model.optimizer.lr = 0.01

In [249]:
model.fit([x1, x2, x3], y, batch_size=batch_size, epochs=4, verbose=2)

Epoch 1/4
 - 16s - loss: 4.0261
Epoch 2/4
 - 16s - loss: 4.0186
Epoch 3/4
 - 16s - loss: 4.0107
Epoch 4/4
 - 16s - loss: 4.0024


<keras.callbacks.History at 0x7f2c74067978>

In [250]:
model.optimizer.lr = 1e-6

In [251]:
model.fit([x1, x2, x3], y, batch_size=batch_size, epochs=4, verbose=2)

Epoch 1/4
 - 16s - loss: 3.9935
Epoch 2/4
 - 17s - loss: 3.9841
Epoch 3/4
 - 16s - loss: 3.9741
Epoch 4/4
 - 16s - loss: 3.9633


<keras.callbacks.History at 0x7f2c5fb36518>

In [252]:
model.optimizer.lr = 0.01

In [253]:
model.fit([x1, x2, x3], y, batch_size=batch_size, epochs=4, verbose=2)

Epoch 1/4
 - 16s - loss: 3.9519
Epoch 2/4
 - 17s - loss: 3.9397
Epoch 3/4
 - 16s - loss: 3.9268
Epoch 4/4
 - 16s - loss: 3.9129


<keras.callbacks.History at 0x7f2c74e972e8>
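A caveat on the learning-rate cells above, hedged because it depends on the Keras version. In standalone Keras 2.x, `fit()` builds the training function the first time it is called and reuses it afterwards, and the optimizer reads `lr` while that function is built. Assigning a plain float to `model.optimizer.lr` replaces the backend variable that `Adam` created, so the later assignments of 0.01 and 1e-6 may never reach the optimizer. The losses above are consistent with this: they fall by only about 0.14 over 16 epochs, with no jump when the rate is "raised" to 0.01. The usual remedy is `keras.backend.set_value(model.optimizer.lr, 0.01)`, which updates the variable in place, provided `lr` has not already been overwritten by a float. The toy below is plain Python, not Keras (`ToyOptimizer` and `ToyModel` are hypothetical), and only mimics the build-once behaviour:

```python
class ToyOptimizer:
    def __init__(self, lr):
        self.lr = lr

class ToyModel:
    def __init__(self, optimizer):
        self.optimizer = optimizer
        self.train_step = None  # built on the first fit, then reused

    def fit_step(self, w, grad):
        if self.train_step is None:
            lr = self.optimizer.lr  # the learning rate is read once, here
            self.train_step = lambda w, g: w - lr * g
        return self.train_step(w, grad)

model = ToyModel(ToyOptimizer(lr=1e-7))
w1 = model.fit_step(1.0, 1.0)
model.optimizer.lr = 0.01  # reassigned after the step function exists
w2 = model.fit_step(w1, 1.0)
step = w1 - w2
print(step)  # about 1e-7: the new value 0.01 was never used
```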

### Test model

In [254]:
def get_next(inp):
    # Map each input character to its integer index
    idxs = [char_indices[c] for c in inp]
    # The model has one Input((1,)) per character, so pass a list of arrays of shape (1,)
    arrs = [np.array(i)[np.newaxis] for i in idxs]
    preds = model.predict(arrs)
    # Greedy choice: the single most probable next character
    preds_idxs = np.argmax(preds)
    return chars[preds_idxs]

In [255]:
get_next('zzz')

' '

In [256]:
get_next(' th')

' '

In [257]:
get_next(' an')

' '
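`get_next` uses `np.argmax`, so a weak model that slightly favours one character returns that character for every input, which is what 'zzz', ' th' and ' an' all show. The notebook already imports `numpy.random.choice`; a sketch of sampling from the predicted distribution instead, using NumPy's `Generator.choice` with a made-up vocabulary and made-up probabilities:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny vocabulary and model output, for illustration only
chars = ['\0', '\n', ' ', 'a', 'e', 't']
preds = np.array([0.0, 0.05, 0.40, 0.15, 0.25, 0.15])

greedy = chars[int(np.argmax(preds))]  # what get_next does: always ' '
sampled = [chars[rng.choice(len(chars), p=preds)] for _ in range(10)]

print(repr(greedy))
print(sampled)
```

Sampling lets less likely characters appear in proportion to their probability, so generated text does not collapse into the single most frequent character.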

## Our first RNN!

### Create inputs

This is the size of our unrolled RNN.

In [258]:
cs = 8 

For each of 0 through 7, create a list of every 8th character with that starting point. These will be the 8 inputs to our model.

In [259]:
c_in_data = [[text_idxs[i+n] for i in range(0, len(text_idxs) - (cs+1), cs)] for n in range(cs)]

Then create a list of the next character in each of these series. These will be the labels for our model.

In [260]:
c_out_data = [text_idxs[i+cs] for i in range(0, len(text_idxs) - (cs+1), cs)]
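The same windowing for `cs = 8`, on a toy corpus of the integers 0 to 29: stacking the eight per-position lists along `axis=1` gives one row per 8-character sequence, which is the `(n_sequences, cs)` layout a single `Embedding(..., input_length=cs)` input expects:

```python
import numpy as np

text_idxs = list(range(30))  # toy stand-in for the encoded corpus
cs = 8

starts = range(0, len(text_idxs) - (cs + 1), cs)
c_in_data = [[text_idxs[i + n] for i in starts] for n in range(cs)]  # one list per position
c_out_data = [text_idxs[i + cs] for i in starts]                     # next character

# Rows are sequences, columns are positions 0..7 within each sequence
X = np.stack([np.array(c) for c in c_in_data], axis=1)
y = np.array(c_out_data)

print(X)
print(y)
```

Each list in `c_in_data` is one column of `X`, matching the "each column is one series" view printed further down.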

In [261]:
xs = [np.array(c[:-2]) for c in c_in_data]

In [262]:
len(xs), xs[0].shape

(8, (75109,))

In [263]:
y = np.array(c_out_data[:-2])

So each column below is one series of 8 characters from the text.

In [264]:
[xs[n][:cs] for n in range(cs)]

[array([43,  1, 36,  2, 46, 41, 47,  2]),
 array([45,  1, 41, 47,  2,  9, 35, 47]),
 array([32, 46, 34, 45, 28,  9, 32, 35]),
 array([33, 48,  2, 48,  2, 50, 41, 32]),
 array([28, 43, 47, 47, 50, 35, 24, 45]),
 array([30, 43, 35, 35, 42, 28,  2, 32]),
 array([32, 42, 28,  2, 40, 47, 36,  2]),
 array([ 1, 46, 47, 36, 28,  2, 46, 41])]

...and this is the next character after each sequence.

In [265]:
y[:cs]

array([ 1, 36,  2, 46, 41, 47,  2, 42])

In [266]:
n_fac = 42

### Create and train model

In [277]:
def embedding_input(name, n_in, n_out):
    inp = Input((1, ), dtype='int64', name=name+'_in')
    emb = Embedding(n_in, n_out, input_length=1, name=name+'_emb')(inp)
    return inp, Flatten()(emb)

In [278]:
c_ins = [embedding_input('c'+str(n), vocab_size, n_fac) for n in range(cs)]

In [279]:
n_hidden = 256

In [280]:
dense_in = Dense(n_hidden, activation='relu')
dense_hidden = Dense(n_hidden, activation='relu', kernel_initializer='identity')
dense_out = Dense(vocab_size, activation='softmax')

The first character of each sequence goes through dense_in(), to create our first hidden activations.

In [281]:
hidden = dense_in(c_ins[0][1])

Then, for each subsequent character, we add the output of dense_in() on that character to the output of dense_hidden() on the current hidden state, to create the new hidden state.

In [282]:
for i in range(1, cs):
    c_dense = dense_in(c_ins[i][1])
    hidden = dense_hidden(hidden)
    hidden = Add()([c_dense, hidden])
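One consequence of `kernel_initializer='identity'` is worth working through. Keras `Dense` layers start with a zero bias by default, and the hidden state here is a sum of relu outputs, so it is never negative. At initialisation `dense_hidden` therefore returns its input unchanged, and the hidden state is just the running sum of the per-character `dense_in` outputs. A NumPy sketch with random stand-in weights and embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
cs, n_fac, n_hidden = 8, 42, 256

def relu(x):
    return np.maximum(x, 0)

# Random dense_in weights; dense_hidden starts as the identity with zero bias
W_in, b_in = rng.normal(scale=0.1, size=(n_fac, n_hidden)), np.zeros(n_hidden)
W_h, b_h = np.eye(n_hidden), np.zeros(n_hidden)

embs = rng.normal(size=(cs, n_fac))  # stand-ins for the 8 flattened embeddings

hidden = relu(embs[0] @ W_in + b_in)  # hidden = dense_in(c_ins[0][1])
for i in range(1, cs):
    c_dense = relu(embs[i] @ W_in + b_in)         # dense_in(c_ins[i][1])
    hidden = c_dense + relu(hidden @ W_h + b_h)   # Add()([c_dense, dense_hidden(hidden)])

# With identity weights the recurrent step is a no-op on a non-negative state
running_sum = relu(embs @ W_in + b_in).sum(axis=0)
print(np.allclose(hidden, running_sum))  # True
```

Training then moves `dense_hidden` away from the identity, but starting there means early gradients pass through the eight steps without being repeatedly shrunk or amplified.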

Putting the final hidden state through dense_out() gives us our output.

In [283]:
c_out = dense_out(hidden)

So now we can create our model.

In [287]:
model = Model([c[0] for c in c_ins], c_out)

In [288]:
model.compile(optimizer=Adam(), loss='sparse_categorical_crossentropy')

In [289]:
model.fit(xs, y, batch_size=batch_size, epochs=12, verbose=2)

Epoch 1/12
 - 11s - loss: 1.8773
Epoch 2/12
 - 11s - loss: 1.8464
Epoch 3/12
 - 11s - loss: 1.8178
Epoch 4/12
 - 11s - loss: 1.7942
Epoch 5/12
 - 11s - loss: 1.7704
Epoch 6/12
 - 11s - loss: 1.7495
Epoch 7/12
 - 11s - loss: 1.7283
Epoch 8/12
 - 11s - loss: 1.7113
Epoch 9/12
 - 11s - loss: 1.6939
Epoch 10/12
 - 11s - loss: 1.6783
Epoch 11/12
 - 11s - loss: 1.6625
Epoch 12/12
 - 11s - loss: 1.6502


<keras.callbacks.History at 0x7f2c5e566e48>

### Test model

In [290]:
def get_next(inp):
    # Map each input character to its integer index
    idxs = [char_indices[c] for c in inp]
    # One Input((1,)) per character: pass a list of cs arrays, each of shape (1,)
    arrs = [np.array(i)[np.newaxis] for i in idxs]
    preds = model.predict(arrs)
    # Greedy choice: the single most probable next character
    preds_idxs = np.argmax(preds)
    return chars[preds_idxs]

In [291]:
get_next('for thos')

'e'

In [292]:
get_next('part of ')

't'

In [293]:
get_next('queens a')

'n'

## Our first RNN with Keras!

In [295]:
n_hidden, n_fac, cs, vocab_size = (256, 42, 8, 58)

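This is nearly equivalent to the RNN we built ourselves in the previous section. The main difference is that SimpleRNN applies its activation once, to the sum of the input and recurrent terms (with a single bias), whereas our version applied relu to each term separately before adding them.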

In [297]:
from keras.layers import SimpleRNN

In [298]:
model = Sequential([
    Embedding(vocab_size, n_fac, input_length=cs),
    SimpleRNN(n_hidden, activation='relu', recurrent_initializer='identity'),
    Dense(vocab_size, activation='softmax')
])
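`SimpleRNN` computes h_t = relu(x_t W + h_{t-1} U + b), starting from h_0 = 0 and applying the activation once to the sum, whereas the previous section applied relu separately to the input and hidden terms before adding them. The two therefore agree on the first step but can diverge afterwards. A one-dimensional NumPy sketch with hand-picked weights (not the trained models):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def simple_rnn(e_seq, W, U, b):
    # SimpleRNN-style recurrence: activation applied once to the summed terms, h_0 = 0
    h = np.zeros(U.shape[0])
    for e in e_seq:
        h = relu(e @ W + h @ U + b)
    return h

def manual_rnn(e_seq, W_in, b_in, W_h, b_h):
    # The previous section's recurrence: relu on each term, then Add()
    h = relu(e_seq[0] @ W_in + b_in)
    for e in e_seq[1:]:
        h = relu(e @ W_in + b_in) + relu(h @ W_h + b_h)
    return h

# Hand-picked one-dimensional example: identity weights, zero biases
W = U = np.array([[1.0]])
b = np.zeros(1)
e_seq = np.array([[2.0], [-1.0]])

print(simple_rnn(e_seq, W, U, b))     # [1.]  relu(-1 + 2)
print(manual_rnn(e_seq, W, b, U, b))  # [2.]  relu(-1) + relu(2)
```

A negative input term can cancel part of the hidden state in `SimpleRNN`, but in the manual version it is clipped to zero before the addition, so the state can only grow.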

In [300]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_21 (Embedding)     (None, 8, 42)             2436      
_________________________________________________________________
simple_rnn_1 (SimpleRNN)     (None, 256)               76544     
_________________________________________________________________
dense_30 (Dense)             (None, 58)                14906     
Total params: 93,886
Trainable params: 93,886
Non-trainable params: 0
_________________________________________________________________
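The parameter counts in the summary can be checked by hand: the embedding stores one `n_fac` vector per character, `SimpleRNN` has an input kernel, a recurrent kernel and a bias, and the output `Dense` layer has a kernel and a bias:

```python
vocab_size, n_fac, n_hidden = 58, 42, 256

emb_params = vocab_size * n_fac                                  # 58 x 42 lookup table
rnn_params = n_fac * n_hidden + n_hidden * n_hidden + n_hidden   # kernel W + recurrent U + bias
dense_params = n_hidden * vocab_size + vocab_size                # kernel + bias
total = emb_params + rnn_params + dense_params

print(emb_params, rnn_params, dense_params, total)  # 2436 76544 14906 93886
```

The 256 x 256 recurrent kernel accounts for 65,536 of the 93,886 parameters, so the hidden size dominates the model's size.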


In [301]:
model.compile(Adam(), loss='sparse_categorical_crossentropy')

In [306]:
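# Stack the 8 one-dimensional position arrays into shape (n_sequences, 8); np.concatenate(xs, axis=1) fails on 1-D arrays
model.fit(np.stack(xs, axis=1), y, batch_size=batch_size, epochs=8, verbose=2)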

Epoch 1/8
 - 8s - loss: 2.6815
Epoch 2/8
 - 8s - loss: 2.2063
Epoch 3/8
 - 8s - loss: 1.9973
Epoch 4/8
 - 8s - loss: 1.8570
Epoch 5/8
 - 8s - loss: 1.7564
Epoch 6/8
 - 8s - loss: 1.6793
Epoch 7/8
 - 8s - loss: 1.6182
Epoch 8/8
 - 8s - loss: 1.5658


<keras.callbacks.History at 0x7f2c5da8c6a0>

In [308]:
def get_next_keras(inp):
    # Map each input character to its integer index
    idxs = [char_indices[c] for c in inp]
    # The Sequential model takes a single (batch, cs) array, so add a batch axis
    arrs = np.array(idxs)[np.newaxis, :]
    # predict returns shape (1, vocab_size); take the single row
    preds = model.predict(arrs)[0]
    preds_idxs = np.argmax(preds)
    return chars[preds_idxs]

In [309]:
get_next_keras('this is ')

't'

In [310]:
get_next_keras('part of ')

't'

In [311]:
get_next_keras('queens a')

'n'

## Returning sequences

### Create inputs

To use a sequence model, we can leave our input unchanged, but we have to change our output to a sequence (of course!).

Here, c_out_data is identical to c_in_data, but shifted across by 1 character.
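A minimal sketch of that target layout on a toy corpus (the integers 0 to 29), assuming the same non-overlapping windows of `cs` characters as before; `X` and `Y` are illustrative names, and each target row is the input row shifted one character along, so every position has its own next-character label:

```python
import numpy as np

text_idxs = list(range(30))  # toy stand-in for the encoded corpus
cs = 8

starts = range(0, len(text_idxs) - (cs + 1), cs)
X = np.array([[text_idxs[i + n] for n in range(cs)] for i in starts])      # inputs
Y = np.array([[text_idxs[i + n + 1] for n in range(cs)] for i in starts])  # targets, shifted by 1

print(X[0])  # [0 1 2 3 4 5 6 7]
print(Y[0])  # [1 2 3 4 5 6 7 8]
```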