In [1]:
%matplotlib inline

import utils_ted
from utils_ted import *

Using TensorFlow backend.


In [2]:
batch_size = 64

[Keras 2.0 release notes](https://github.com/fchollet/keras/wiki/Keras-2.0-release-notes)

```
Recurrent layers
    output_dim -> units
    init -> kernel_initializer
    inner_init -> recurrent_initializer
    added argument bias_initializer
    W_regularizer -> kernel_regularizer
    b_regularizer -> bias_regularizer
    added arguments kernel_constraint, recurrent_constraint, bias_constraint
    dropout_W -> dropout
    dropout_U -> recurrent_dropout
    consume_less -> implementation. String values have been replaced with integers: implementation 0 (default), 1 or 2.
    LSTM only: the argument forget_bias_init has been removed. Instead there is a boolean argument unit_forget_bias, defaulting to True.
```

## Setup

We haven't really looked into the detail of how this works yet - so this is provided for self-study for those who are interested. We'll look at it closely next week.

In [3]:
path=get_file('nietzsche.txt', origin="https://s3.amazonaws.com/text-datasets/nietzsche.txt")
#text = open(path, encoding='utf8').read().lower()
text = open(path, encoding='utf8').read()

In [4]:
print('corpus length:', len(text))

corpus length: 600893


In [5]:
!tail {path} -n25

are thinkers who believe in the saints.


144

It stands to reason that this sketch of the saint, made upon the model
of the whole species, can be confronted with many opposing sketches that
would create a more agreeable impression. There are certain exceptions
among the species who distinguish themselves either by especial
gentleness or especial humanity, and perhaps by the strength of their
own personality. Others are in the highest degree fascinating because
certain of their delusions shed a particular glow over their whole
being, as is the case with the founder of christianity who took himself
for the only begotten son of God and hence felt himself sinless; so that
through his imagination--that should not be too harshly judged since the
whole of antiquity swarmed with sons of god--he attained the same goal,
the sense of complete sinlessness, complete irresponsibility, that can
now be attained by every individual through science.--In the same manner
I have viewed the saints of India

In [6]:
chars = sorted(list(set(text)))
vocab_size = len(chars) + 1

In [7]:
print("total chars : %s" % vocab_size)

total chars : 85


In [8]:
chars.insert(0, "\0")  # reserve index 0 for a null/padding character

In [9]:
"".join(chars[1:-5])

'\n !"\'(),-.0123456789:;=?ABCDEFGHIJKLMNOPQRSTUVWXYZ[]_abcdefghijklmnopqrstuvwxyz'

In [10]:
char_indices = {c:i for i, c in enumerate(chars)}
indices_char = {i:c for i, c in enumerate(chars)}

In [11]:
text_idxs = [char_indices[c] for c in text]

In [12]:
print(text_idxs[:10])

[40, 42, 29, 30, 25, 27, 29, 1, 1, 1]


In [13]:
''.join(indices_char[idx] for idx in text_idxs[:70])

'PREFACE\n\n\nSUPPOSING that Truth is a woman--what then? Is there not gro'
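As a quick sanity check of the lookup tables, here is a minimal sketch on a toy string. The string and the `toy_` names are illustrative and not part of the notebook. It mirrors the steps above: sort the unique characters, reserve index 0 for a placeholder, then encode and decode.

```python
# Toy corpus; the notebook uses the full Nietzsche text instead.
toy_text = "hello world"

# Sorted unique characters, with a placeholder inserted at index 0.
toy_chars = sorted(set(toy_text))
toy_chars.insert(0, "\0")

toy_char_indices = {c: i for i, c in enumerate(toy_chars)}
toy_indices_char = {i: c for i, c in enumerate(toy_chars)}

# Encode to integer indices, then decode back to text.
toy_idxs = [toy_char_indices[c] for c in toy_text]
decoded = "".join(toy_indices_char[i] for i in toy_idxs)

print(toy_idxs)
print(decoded)
```

Because the placeholder never occurs in the text, index 0 is never produced by the encoding step.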

## 3 char model

### Create inputs

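Create four lists, each taking every 3rd character (a stride of cs = 3), starting at the 0th, 1st, 2nd, and 3rd characters respectively. The first three lists are the inputs and the fourth is the label, so the label of one example is also the first input of the next.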

In [14]:
cs = 3
c1_data = [text_idxs[i] for i in range(0, len(text_idxs) - (cs+1), cs)]
c2_data = [text_idxs[i+1] for i in range(0, len(text_idxs) - (cs+1), cs)]
c3_data = [text_idxs[i+2] for i in range(0, len(text_idxs) - (cs+1), cs)]
c4_data = [text_idxs[i+3] for i in range(0, len(text_idxs) - (cs+1), cs)]

Our inputs

In [15]:
x1 = np.array(c1_data[:-2])
x2 = np.array(c2_data[:-2])
x3 = np.array(c3_data[:-2])

Our output

In [16]:
y = np.array(c4_data[:-2])

The first 4 inputs and outputs

In [17]:
x1[:4], x2[:4], x3[:4], y[:4]

(array([40, 30, 29,  1]),
 array([42, 25,  1, 43]),
 array([29, 27,  1, 45]),
 array([30, 29,  1, 40]))

In [18]:
x1.shape, y.shape

((200295,), (200295,))
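The windowing can be checked on a short toy sequence. This is a minimal sketch with made-up data (`seq` is an illustrative stand-in for `text_idxs`): each example takes three consecutive items as inputs and the fourth as the label, and consecutive examples start 3 positions apart.

```python
cs = 3
seq = list(range(10, 23))  # toy stand-in for text_idxs: 10, 11, ..., 22

starts = range(0, len(seq) - (cs + 1), cs)  # 0, 3, 6
c1 = [seq[i] for i in starts]
c2 = [seq[i + 1] for i in starts]
c3 = [seq[i + 2] for i in starts]
c4 = [seq[i + 3] for i in starts]  # label: the item after the 3 inputs

for example in zip(c1, c2, c3, c4):
    print(example)
```

Each printed tuple is one training example; the last element of a tuple reappears as the first element of the next one.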

The number of latent factors to create (i.e. the length of each character's embedding vector)

In [19]:
n_fac = 42

Create inputs and embedding outputs for each of our 3 character inputs

In [20]:
def embedding_input(name, n_in, n_out):
    inp = Input(shape=(1,), dtype='int64', name=name)
    emb = Embedding(n_in, n_out, input_length=1)(inp)
    return inp, Flatten()(emb)

In [21]:
c1_in, c1_emb = embedding_input('c1', vocab_size, n_fac)
c2_in, c2_emb = embedding_input('c2', vocab_size, n_fac)
c3_in, c3_emb = embedding_input('c3', vocab_size, n_fac)

### Create and train model

Pick a size for our hidden state

In [22]:
n_hidden = 256

This is the 'green arrow' from our diagram - the layer operation from input to hidden.

In [23]:
dense_in = Dense(n_hidden, activation='relu')

Our first hidden activation is simply this function applied to the result of the embedding of the first character.

In [24]:
c1_dense_in = dense_in(c1_emb)

This is the 'orange arrow' from our diagram - the layer operation from hidden to hidden.

In [25]:
dense_hidden = Dense(n_hidden, activation='tanh')

Our second and third hidden activations add the previous hidden state (after applying dense_hidden) to the new input state (after applying dense_in).

In [26]:
from keras.layers import Add

In [27]:
c2_dense_in = dense_in(c2_emb)
hidden_2 = dense_hidden(c1_dense_in)
c2_hidden = Add()([c2_dense_in, hidden_2])

In [28]:
c3_dense_in = dense_in(c3_emb)
hidden_3 = dense_hidden(c2_hidden)
c3_hidden = Add()([c3_dense_in, hidden_3])

This is the 'blue arrow' from our diagram - the layer operation from hidden to output.

In [29]:
dense_out = Dense(vocab_size, activation='softmax')

The third hidden state is the input to our output layer.

In [30]:
c4_out = dense_out(c3_hidden)

In [31]:
model = Model([c1_in, c2_in, c3_in], c4_out)
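To make the data flow concrete, here is a rough NumPy sketch of the forward pass this model computes. It is not the Keras implementation: the weights are random stand-ins, biases are omitted for brevity, and all names (`embs`, `W_in`, `W_hid`, `W_out`, `forward`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, n_fac, n_hidden = 85, 42, 256

# Random stand-ins for the learned weights (biases omitted).
embs = rng.normal(scale=0.05, size=(3, vocab_size, n_fac))   # one table per input
W_in = rng.normal(scale=0.05, size=(n_fac, n_hidden))        # 'green arrow'
W_hid = rng.normal(scale=0.05, size=(n_hidden, n_hidden))    # 'orange arrow'
W_out = rng.normal(scale=0.05, size=(n_hidden, vocab_size))  # 'blue arrow'

def relu(x):
    return np.maximum(x, 0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def forward(c1, c2, c3):
    """Return a probability distribution over the next character."""
    h = relu(embs[0][c1] @ W_in)                        # first hidden state
    h = relu(embs[1][c2] @ W_in) + np.tanh(h @ W_hid)   # add second character
    h = relu(embs[2][c3] @ W_in) + np.tanh(h @ W_hid)   # add third character
    return softmax(h @ W_out)

probs = forward(40, 42, 29)
print(probs.shape, probs.sum())
```

Note that `W_in` and `W_hid` are reused at every step, which is what sharing the `dense_in` and `dense_hidden` layers achieves in the Keras version.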

In [32]:
model.compile(Adam(), loss='sparse_categorical_crossentropy')
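With `sparse_categorical_crossentropy`, the labels are integer class indices and the per-example loss is the negative log-probability the model assigns to the correct class. A model that spreads its probability evenly over all 85 classes scores ln(85) ≈ 4.44, which is a handy baseline when reading the training losses in this section. A small sketch of that calculation (the helper name is illustrative):

```python
import math

def sparse_cross_entropy(probs, target_idx):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[target_idx])

vocab_size = 85
uniform = [1.0 / vocab_size] * vocab_size

baseline = sparse_cross_entropy(uniform, target_idx=40)
print(round(baseline, 4))  # 4.4427
```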

In [33]:
model.optimizer.lr = 1e-6

In [34]:
model.fit([x1, x2, x3], y, batch_size=batch_size, epochs=4, verbose=2)

Epoch 1/4
17s - loss: 4.4047
Epoch 2/4
16s - loss: 4.2783
Epoch 3/4
17s - loss: 4.0158
Epoch 4/4
19s - loss: 3.6262


<keras.callbacks.History at 0x25740836d68>

In [35]:
model.optimizer.lr = 0.01

In [36]:
model.fit([x1, x2, x3], y, batch_size=batch_size, epochs=4, verbose=2)

Epoch 1/4
 - 16s - loss: 3.3020
Epoch 2/4
 - 16s - loss: 3.1850
Epoch 3/4
 - 16s - loss: 3.1410
Epoch 4/4
 - 16s - loss: 3.1195


<keras.callbacks.History at 0x7fc834c9b9b0>

In [37]:
model.optimizer.lr = 1e-6

In [38]:
model.fit([x1, x2, x3], y, batch_size=batch_size, epochs=4, verbose=2)

Epoch 1/4
 - 16s - loss: 3.1060
Epoch 2/4
 - 16s - loss: 3.0961
Epoch 3/4
 - 16s - loss: 3.0880
Epoch 4/4
 - 16s - loss: 3.0807


<keras.callbacks.History at 0x7fc835041ba8>

In [39]:
model.optimizer.lr = 0.01

In [40]:
model.fit([x1, x2, x3], y, batch_size=batch_size, epochs=4, verbose=2)

Epoch 1/4
 - 16s - loss: 3.0740
Epoch 2/4
 - 16s - loss: 3.0674
Epoch 3/4
 - 16s - loss: 3.0608
Epoch 4/4
 - 16s - loss: 3.0542


<keras.callbacks.History at 0x7fc835190d30>

### Test model

In [35]:
def get_next(inp):
    idxs = [char_indices[c] for c in inp]
    #arrs = [np.array(i).reshape(1,) for i in idxs] # to fit in the Input() input shape
    arrs = [np.array(i)[np.newaxis] for i in idxs] 
    preds = model.predict(arrs)
    preds_idxs = np.argmax(preds)
    return chars[preds_idxs]

In [36]:
get_next('zzz')

' '

In [37]:
get_next(' th')

' '

In [38]:
get_next(' an')

' '

## Our first RNN!

### Create inputs

This is the size of our unrolled RNN.

In [63]:
cs = 8 

For each of 0 through 7, create a list of every 8th character with that starting point. These will be the 8 inputs to our model.

In [64]:
c_in_data = [[text_idxs[i+n] for i in range(0, len(text_idxs) - (cs+1), cs)] for n in range(cs)]

Then create a list of the next character in each of these series. These will be the labels for our model.

In [65]:
c_out_data = [text_idxs[i+cs] for i in range(0, len(text_idxs) - (cs+1), cs)]

In [69]:
#xs = [np.array(c[:-2]) for c in c_in_data]
xs = [np.stack(c[:-2]) for c in c_in_data]

In [70]:
len(xs), xs[0].shape

(8, (75109,))

In [72]:
#y = np.array(c_out_data[:-2]) # y = np.stack(c_out_data[:-2])
y = np.stack(c_out_data[:-2])

So each column below is one series of 8 characters from the text.

In [73]:
[xs[n][:cs] for n in range(cs)]

[array([40,  1, 33,  2, 72, 67, 73,  2]),
 array([42,  1, 38, 44,  2,  9, 61, 73]),
 array([29, 43, 31, 71, 54,  9, 58, 61]),
 array([30, 45,  2, 74,  2, 76, 67, 58]),
 array([25, 40, 73, 73, 76, 61, 24, 71]),
 array([27, 40, 61, 61, 68, 54,  2, 58]),
 array([29, 39, 54,  2, 66, 73, 33,  2]),
 array([ 1, 43, 73, 62, 54,  2, 72, 67])]

...and this is the next character after each sequence.

In [74]:
y[:cs]

array([ 1, 33,  2, 72, 67, 73,  2, 68])

In [75]:
n_fac = 42

### Create and train model

In [48]:
def embedding_input(name, n_in, n_out):
    inp = Input((1, ), dtype='int64', name=name+'_in')
    emb = Embedding(n_in, n_out, input_length=1, name=name+'_emb')(inp)
    return inp, Flatten()(emb)

In [49]:
c_ins = [embedding_input('c'+str(n), vocab_size, n_fac) for n in range(cs)]

In [50]:
n_hidden = 256

In [51]:
dense_in = Dense(n_hidden, activation='relu')
dense_hidden = Dense(n_hidden, activation='relu', kernel_initializer='identity')
dense_out = Dense(vocab_size, activation='softmax')

The first character of each sequence goes through dense_in(), to create our first hidden activations.

In [52]:
hidden = dense_in(c_ins[0][1])  # c_ins[0][1]: c0_emb

Then for each successive layer we combine the output of dense_in() on the next character with the output of dense_hidden() on the current hidden state, to create the new hidden state.

In [53]:
for i in range(1, cs):
    c_dense_in = dense_in(c_ins[i][1]) # c_ins[i][1]: c1_emb, c2_emb, ... c7_emb
    hidden = dense_hidden(hidden)
    hidden = Add()([c_dense_in, hidden])

Putting the final hidden state through dense_out() gives us our output.

In [54]:
c_out = dense_out(hidden)

So now we can create our model.

In [55]:
model = Model([c[0] for c in c_ins], c_out)
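One detail worth noting is `kernel_initializer='identity'` on `dense_hidden`. The following is a minimal NumPy sketch, assuming zero initial biases (the Keras default for `Dense`) and random stand-in values for the `dense_in` outputs. Because ReLU outputs are non-negative, an identity-initialised ReLU layer passes them through unchanged, so before any training the hidden state is simply the running sum of the per-character inputs.

```python
import numpy as np

rng = np.random.default_rng(0)
cs, n_hidden = 8, 4  # tiny hidden size for readability

W_hid = np.eye(n_hidden)  # identity-initialised hidden-to-hidden weights

def relu(x):
    return np.maximum(x, 0)

# Stand-ins for dense_in applied to each character embedding (ReLU outputs).
dense_in_outs = relu(rng.normal(size=(cs, n_hidden)))

hidden = dense_in_outs[0]
for i in range(1, cs):
    hidden = relu(hidden @ W_hid)       # dense_hidden at initialisation
    hidden = dense_in_outs[i] + hidden  # Add()([c_dense_in, hidden])

print(np.allclose(hidden, dense_in_outs.sum(axis=0)))
```

Starting from this pass-through behaviour means early characters are not squashed away before training begins.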

In [56]:
model.compile(optimizer=Adam(), loss='sparse_categorical_crossentropy')

In [67]:
model.fit(xs, y, batch_size=batch_size, epochs=12, verbose=2)

Epoch 1/12
 - 11s - loss: 2.5412
Epoch 2/12
 - 11s - loss: 2.2509
Epoch 3/12
 - 11s - loss: 2.1431
Epoch 4/12
 - 11s - loss: 2.0710
Epoch 5/12
 - 11s - loss: 2.0141
Epoch 6/12
 - 11s - loss: 1.9686
Epoch 7/12
 - 11s - loss: 1.9270
Epoch 8/12
 - 11s - loss: 1.8907
Epoch 9/12
 - 11s - loss: 1.8575
Epoch 10/12
 - 11s - loss: 1.8273
Epoch 11/12
 - 11s - loss: 1.7994
Epoch 12/12
 - 11s - loss: 1.7733


<keras.callbacks.History at 0x7fc802c4bfd0>

### Test model

In [68]:
def get_next(inp):
    idxs = [char_indices[c] for c in inp]
    #arrs = [np.array(i).reshape(1,) for i in idxs] # to fit in the Input() input shape
    arrs = [np.array(i)[np.newaxis] for i in idxs] 
    preds = model.predict(arrs)
    preds_idxs = np.argmax(preds)
    return chars[preds_idxs]

In [69]:
get_next('for thos')

'e'

In [70]:
get_next('part of ')

't'

In [71]:
get_next('queens a')

't'

## Our first RNN with keras!

In [123]:
n_hidden, n_fac, cs, vocab_size = (256, 42, 8, 85)

This is almost exactly equivalent to the RNN we built ourselves in the previous section.

In [124]:
from keras.layers import SimpleRNN

In [125]:
model = Sequential([
    Embedding(vocab_size, n_fac, input_length=cs),
    SimpleRNN(n_hidden, activation='relu', recurrent_initializer='identity'),
    Dense(vocab_size, activation='softmax')
])

In [126]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_8 (Embedding)      (None, 8, 42)             3570      
_________________________________________________________________
simple_rnn_5 (SimpleRNN)     (None, 256)               76544     
_________________________________________________________________
dense_11 (Dense)             (None, 85)                21845     
Total params: 101,959
Trainable params: 101,959
Non-trainable params: 0
_________________________________________________________________
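The parameter counts in the summary can be reproduced by hand. A short sketch of the arithmetic, assuming the usual layout of one input kernel, one recurrent kernel, and one bias vector for `SimpleRNN`:

```python
vocab_size, n_fac, n_hidden = 85, 42, 256

# One n_fac-long vector per character.
embedding_params = vocab_size * n_fac
# Input kernel + recurrent kernel + bias.
rnn_params = n_fac * n_hidden + n_hidden * n_hidden + n_hidden
# Output weights + bias.
dense_params = n_hidden * vocab_size + vocab_size

print(embedding_params, rnn_params, dense_params)  # 3570 76544 21845
print(embedding_params + rnn_params + dense_params)  # 101959
```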


In [127]:
model.compile(Adam(), loss='sparse_categorical_crossentropy')

In [128]:
len(xs), xs[0].shape

(8, (75109,))

In [129]:
#np.concatenate(xs,axis=1)
np.stack(xs,axis=1)[:5]
#np.array(xs).shape

array([[40, 42, 29, 30, 25, 27, 29,  1],
       [ 1,  1, 43, 45, 40, 40, 39, 43],
       [33, 38, 31,  2, 73, 61, 54, 73],
       [ 2, 44, 71, 74, 73, 61,  2, 62],
       [72,  2, 54,  2, 76, 68, 66, 54]])
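`np.stack(xs, axis=1)` turns the list of 8 per-position arrays into a single `(n_samples, 8)` array with one row per sequence, which is the layout the `Embedding` layer takes. A toy sketch with made-up values:

```python
import numpy as np

# Three per-position arrays, each holding that position for 4 sequences.
toy_xs = [np.array([1, 2, 3, 4]),
          np.array([5, 6, 7, 8]),
          np.array([9, 10, 11, 12])]

stacked = np.stack(toy_xs, axis=1)
print(stacked.shape)  # (4, 3)
print(stacked[0])     # [1 5 9]: the first sequence

# For equal-length 1-D arrays this matches transposing the 2-D array.
print(np.array_equal(stacked, np.array(toy_xs).T))
```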

In [130]:
y[:5]

array([ 1, 33,  2, 72, 67])

In [131]:
#model.fit(np.concatenate(xs, axis=1), y, batch_size=batch_size, epochs=8, verbose=2)
model.fit(np.stack(xs,axis=1), y, batch_size=batch_size, epochs=8, verbose=2)

Epoch 1/8
17s - loss: 2.7957
Epoch 2/8
17s - loss: 2.2801
Epoch 3/8
17s - loss: 2.0692
Epoch 4/8
17s - loss: 1.9282
Epoch 5/8
18s - loss: 1.8213
Epoch 6/8
17s - loss: 1.7376
Epoch 7/8
18s - loss: 1.6723
Epoch 8/8
17s - loss: 1.6193


<keras.callbacks.History at 0x257462c4d30>

In [132]:
def get_next_keras(inp):
    idxs = [char_indices[c] for c in inp]
    #arrs = [np.array(i).reshape(1,) for i in idxs] # to fit in the Input() input shape
    arrs = np.array(idxs)[np.newaxis,:] 
    preds = model.predict(arrs)[0]
    preds_idxs = np.argmax(preds)
    return chars[preds_idxs]

In [133]:
get_next_keras('this is ')

't'

In [134]:
get_next_keras('part of ')

't'

In [135]:
get_next_keras('queens a')

'n'

## Returning sequences

### Create inputs

To use a sequence model, we can leave our input unchanged - but we have to change our output to a sequence (of course!)

Here, c_out_data is identical to c_in_data, but shifted along by 1 character.

In [136]:
n_hidden, n_fac, cs, vocab_size = (256, 42, 8, 85)

In [137]:
c_in_data = [[text_idxs[i+n] for i in range(0, len(text_idxs) - (cs+1), cs)] for n in range(cs)]

# c_out_data = [text_idxs[i+cs] for i in range(0, len(text_idxs) - (cs+1), cs)]
c_out_data = [[text_idxs[i+n] for i in range(1, len(text_idxs) - cs, cs)] for n in range(cs)] 

In [138]:
xs = [np.array(c[:-2]) for c in c_in_data]

In [139]:
ys = [np.array(c[:-2]) for c in c_out_data]
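The one-character shift is easier to see on toy data. A minimal sketch, using a made-up sequence of consecutive integers in place of `text_idxs` and a shorter window (`cs = 4`) so that the output stays small:

```python
cs = 4
seq = list(range(100, 114))  # toy stand-in for text_idxs (14 items)

# Inputs: windows starting at 0, cs, 2*cs, ...
ins = [[seq[i + n] for i in range(0, len(seq) - (cs + 1), cs)] for n in range(cs)]
# Targets: the same windows shifted one position to the right.
outs = [[seq[i + n] for i in range(1, len(seq) - cs, cs)] for n in range(cs)]

# Transpose so that each row is one example.
x_rows = list(zip(*ins))
y_rows = list(zip(*outs))
for x_row, y_row in zip(x_rows, y_rows):
    print(x_row, "->", y_row)
```

Each target row holds the character that follows the corresponding input position, so every time step has its own label.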

Reading down each column shows one set of inputs and outputs.

In [186]:
[xs[n][:10] for n in range(cs)]

[array([40,  1, 33,  2, 72, 67, 73,  2, 68, 57]),
 array([42,  1, 38, 44,  2,  9, 61, 73, 73,  1]),
 array([29, 43, 31, 71, 54,  9, 58, 61,  2, 59]),
 array([30, 45,  2, 74,  2, 76, 67, 58, 60, 68]),
 array([25, 40, 73, 73, 76, 61, 24, 71, 71, 71]),
 array([27, 40, 61, 61, 68, 54,  2, 58, 68,  2]),
 array([29, 39, 54,  2, 66, 73, 33,  2, 74, 72]),
 array([ 1, 43, 73, 62, 54,  2, 72, 67, 67, 74])]

In [189]:
[ys[n][:10] for n in range(cs)]

[array([[42],
        [ 1],
        [38],
        [44],
        [ 2],
        [ 9],
        [61],
        [73],
        [73],
        [ 1]]), array([[29],
        [43],
        [31],
        [71],
        [54],
        [ 9],
        [58],
        [61],
        [ 2],
        [59]]), array([[30],
        [45],
        [ 2],
        [74],
        [ 2],
        [76],
        [67],
        [58],
        [60],
        [68]]), array([[25],
        [40],
        [73],
        [73],
        [76],
        [61],
        [24],
        [71],
        [71],
        [71]]), array([[27],
        [40],
        [61],
        [61],
        [68],
        [54],
        [ 2],
        [58],
        [68],
        [ 2]]), array([[29],
        [39],
        [54],
        [ 2],
        [66],
        [73],
        [33],
        [ 2],
        [74],
        [72]]), array([[ 1],
        [43],
        [73],
        [62],
        [54],
        [ 2],
        [72],
        [67],
        [67],
        [74]]), array([[ 1],

### Create and train model

In [142]:
def embedding_input(name, n_in, n_out):
    inp = Input((1, ), dtype='int64', name=name+'_in')
    emb = Embedding(n_in, n_out, input_length=1, name=name+'_emb')(inp)
    return inp, Flatten()(emb)

In [143]:
c_ins = [embedding_input('c'+str(n), vocab_size, n_fac) for n in range(cs)]

In [144]:
dense_in = Dense(n_hidden, activation='relu')
dense_hidden = Dense(n_hidden, activation='relu', kernel_initializer='identity')
dense_out = Dense(vocab_size, activation='softmax', name='output')

We're going to pass a vector of all zeros as our starting point - here are our input layers for that:

In [145]:
inp1 = Input(shape=(n_fac,), name='zeros')
hidden = dense_in(inp1)

In [146]:
outs = []

for i in range(cs):
    c_dense = dense_in(c_ins[i][1])
    hidden = dense_hidden(hidden)
    hidden = Add()([c_dense, hidden])
    # every layer now has an output
    outs.append(dense_out(hidden))

In [147]:
model = Model([inp1] + [c[0] for c in c_ins], outs)
model.compile(Adam(), loss='sparse_categorical_crossentropy')

In [148]:
#zeros = np.tile(np.zeros(n_fac), (len(xs[0]), 1))
zeros = np.zeros((len(xs[0]), n_fac))
zeros.shape

(75109, 42)

In [149]:
len(xs), xs[0].shape

(8, (75109,))

In [150]:
model.fit([zeros]+xs, ys, batch_size=batch_size, epochs=12, verbose=2)

Epoch 1/12
21s - loss: 20.1558 - output_loss_1: 2.6977 - output_loss_2: 2.5694 - output_loss_3: 2.5135 - output_loss_4: 2.4893 - output_loss_5: 2.4710 - output_loss_6: 2.4682 - output_loss_7: 2.4771 - output_loss_8: 2.4696
Epoch 2/12
20s - loss: 17.8922 - output_loss_1: 2.5159 - output_loss_2: 2.3579 - output_loss_3: 2.2314 - output_loss_4: 2.1797 - output_loss_5: 2.1543 - output_loss_6: 2.1521 - output_loss_7: 2.1619 - output_loss_8: 2.1389
Epoch 3/12
20s - loss: 17.2951 - output_loss_1: 2.4995 - output_loss_2: 2.3362 - output_loss_3: 2.1723 - output_loss_4: 2.0943 - output_loss_5: 2.0552 - output_loss_6: 2.0496 - output_loss_7: 2.0563 - output_loss_8: 2.0317
Epoch 4/12
21s - loss: 16.9227 - output_loss_1: 2.4912 - output_loss_2: 2.3252 - output_loss_3: 2.1423 - output_loss_4: 2.0420 - output_loss_5: 1.9941 - output_loss_6: 1.9806 - output_loss_7: 1.9852 - output_loss_8: 1.9620
Epoch 5/12
20s - loss: 16.6719 - output_loss_1: 2.4873 - output_loss_2: 2.3203 - output_loss_3: 2.1234 - out

<keras.callbacks.History at 0x25745cd4b38>

### Test model

In [151]:
def get_nexts(inp):
    idxs = [char_indices[c] for c in inp]
    arrs = [np.array(i)[np.newaxis] for i in idxs]
    preds = model.predict([np.zeros(n_fac)[np.newaxis,:]] + arrs)
    print(list(inp))
    return [chars[np.argmax(p)] for p in preds]

In [152]:
get_nexts(' this is')

[' ', 't', 'h', 'i', 's', ' ', 'i', 's']


['a', 'h', 'e', 't', ' ', 'c', 'n', ' ']

In [153]:
get_nexts(' part of')

[' ', 'p', 'a', 'r', 't', ' ', 'o', 'f']


['a', 'o', 'r', 'e', 'i', 'o', 'f', ' ']

### Sequence model with keras

In [154]:
n_hidden, n_fac, cs, vocab_size

(256, 42, 8, 85)

To convert our previous keras model into a sequence model, simply add the 'return_sequences=True' parameter, and add TimeDistributed() around our dense layer.

In [155]:
model = Sequential([
    Embedding(vocab_size, n_fac, input_length=cs),
    SimpleRNN(n_hidden, return_sequences=True, activation='relu', recurrent_initializer='identity'),
    TimeDistributed(Dense(vocab_size, activation='softmax'))
])

In [156]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_9 (Embedding)      (None, 8, 42)             3570      
_________________________________________________________________
simple_rnn_6 (SimpleRNN)     (None, 8, 256)            76544     
_________________________________________________________________
time_distributed_1 (TimeDist (None, 8, 85)             21845     
Total params: 101,959
Trainable params: 101,959
Non-trainable params: 0
_________________________________________________________________


In [157]:
model.compile(Adam(), loss='sparse_categorical_crossentropy')

In [158]:
xs[0].shape, np.squeeze(xs).shape

((75109,), (8, 75109))

In [159]:
x_rnn = np.stack(np.squeeze(xs), axis=1)
y_rnn = np.atleast_3d(np.stack(np.squeeze(ys), axis=1))

In [160]:
x_rnn.shape, y_rnn.shape

((75109, 8), (75109, 8, 1))

In [161]:
model.fit(x_rnn, y_rnn, batch_size=batch_size, epochs=8, verbose=2)

Epoch 1/8
25s - loss: 2.4291
Epoch 2/8
25s - loss: 2.0027
Epoch 3/8
27s - loss: 1.8874
Epoch 4/8
26s - loss: 1.8264
Epoch 5/8
26s - loss: 1.7872
Epoch 6/8
26s - loss: 1.7592
Epoch 7/8
26s - loss: 1.7394
Epoch 8/8
26s - loss: 1.7234


<keras.callbacks.History at 0x2574d66cef0>

In [162]:
def get_nexts_keras(inp):
    idxs = [char_indices[c] for c in inp]
    #arrs = [np.array(i).reshape(1,) for i in idxs] # to fit in the Input() input shape
    arrs = np.array(idxs)[np.newaxis,:] 
    preds = model.predict(arrs)[0]
    print(list(inp))
    return [chars[np.argmax(p)] for p in preds]

In [163]:
get_nexts_keras(' this is')

[' ', 't', 'h', 'i', 's', ' ', 'i', 's']


['t', 'h', 'e', 'n', ' ', 's', 'n', ' ']

## Stateful model with keras

In [165]:
n_hidden, n_fac, cs, vocab_size = (256, 42, 8, 85)

**A stateful model** is easy to create (just **add "stateful=True"**) but **harder to train**. We had to add batchnorm and use LSTM to get reasonable results.

When using stateful in keras, you have to also **add 'batch_input_shape' to the first layer, and fix the batch size there**.

In [166]:
model = Sequential([
    Embedding(vocab_size, n_fac, input_length=cs, batch_input_shape=(batch_size, cs)),
    BatchNormalization(),
    LSTM(n_hidden, return_sequences=True, stateful=True),
    TimeDistributed(Dense(vocab_size, activation='softmax')) 
])

In [167]:
model.compile(Adam(), loss='sparse_categorical_crossentropy')

Since we're using a fixed batch shape, we have to ensure the number of input and output samples is an exact multiple of the batch size.

In [168]:
mx = len(x_rnn) // batch_size * batch_size
my = len(y_rnn) // batch_size * batch_size
assert mx == my
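For the sizes used here, the integer-division trick works out as follows. This is a small arithmetic sketch using the sample count from `x_rnn.shape` above:

```python
batch_size = 64
n_samples = 75109  # number of sequences, from x_rnn.shape

# Floor-divide, then multiply back, to get the largest multiple of batch_size.
usable = n_samples // batch_size * batch_size
dropped = n_samples - usable

print(usable, dropped)  # 75072 37
```

Only the last 37 sequences are discarded, so very little data is lost.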

In [169]:
model.fit(x_rnn[:mx], y_rnn[:my], epochs=4, batch_size=batch_size, verbose=2)

Epoch 1/4
92s - loss: 2.2407
Epoch 2/4
94s - loss: 1.9824
Epoch 3/4
93s - loss: 1.9159
Epoch 4/4
91s - loss: 1.8806


<keras.callbacks.History at 0x2574d6851d0>

In [170]:
model.optimizer.lr = 1e-4

In [171]:
model.fit(x_rnn[:mx], y_rnn[:my], epochs=4, batch_size=batch_size, verbose=2)

Epoch 1/4
90s - loss: 1.8574
Epoch 2/4
91s - loss: 1.8405
Epoch 3/4
85s - loss: 1.8277
Epoch 4/4
88s - loss: 1.8177


<keras.callbacks.History at 0x2574d685898>

In [172]:
model.fit(x_rnn[:mx], y_rnn[:my], epochs=4, batch_size=batch_size, verbose=2)

Epoch 1/4
86s - loss: 1.8078
Epoch 2/4
88s - loss: 1.8014
Epoch 3/4
92s - loss: 1.7958
Epoch 4/4
91s - loss: 1.7904


<keras.callbacks.History at 0x2574d685550>

### One-hot sequence model with keras

This is the keras version of the theano model that we're about to create.

In [177]:
model = Sequential([
    SimpleRNN(n_hidden, return_sequences=True, input_shape=(cs, vocab_size), 
              activation='relu', recurrent_initializer='identity'),
    TimeDistributed(Dense(vocab_size, activation='softmax'))
])

In [178]:
model.compile(Adam(), loss='categorical_crossentropy')

In [196]:
oh_xs = [to_categorical(o, vocab_size) for o in xs]
oh_x_rnn = np.stack(oh_xs, axis=1)

oh_ys = [to_categorical(o, vocab_size) for o in ys]
oh_y_rnn = np.stack(oh_ys, axis=1)

oh_x_rnn.shape, oh_y_rnn.shape

((75109, 8, 85), (75109, 8, 85))
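For reference, one-hot encoding an integer array can be sketched in plain NumPy by indexing an identity matrix. This is an illustrative equivalent for 1-D integer input, not the Keras implementation of `to_categorical`; the helper name and the toy values are made up.

```python
import numpy as np

def one_hot(idxs, n_classes):
    """Return an array of shape (len(idxs), n_classes) with a single 1 per row."""
    return np.eye(n_classes, dtype="float32")[np.asarray(idxs)]

toy_idxs = np.array([2, 0, 3])
encoded = one_hot(toy_idxs, 5)

print(encoded)
print(encoded.argmax(axis=1))  # recovers [2 0 3]
```

Taking `argmax` along the last axis inverts the encoding, which is what `get_nexts_oh` relies on when decoding predictions.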

In [197]:
model.fit(oh_x_rnn, oh_y_rnn, batch_size=batch_size, epochs=2)

Epoch 1/2
Epoch 2/2


<keras.callbacks.History at 0x257551a1198>

In [198]:
model.fit(oh_x_rnn, oh_y_rnn, batch_size=batch_size, epochs=6)

Epoch 1/6
Epoch 2/6
Epoch 3/6
Epoch 4/6
Epoch 5/6
Epoch 6/6


<keras.callbacks.History at 0x257551a1eb8>

In [227]:
def get_nexts_oh(inp):
    print(list(inp))
    idxs = [char_indices[c] for c in inp]
    oh_idxs = to_categorical(idxs, vocab_size)
    pred = np.squeeze(model.predict(oh_idxs[np.newaxis,:]))
    return [indices_char[np.argmax(p)] for p in pred]

In [228]:
get_nexts_oh(' this is')

[' ', 't', 'h', 'i', 's', ' ', 'i', 's']


['t', 'h', 'e', 'n', ' ', 's', 's', ' ']

## Keras GRU

Identical to the last keras rnn, except that the recurrent layer is a GRU with its default activation and initializers!

In [230]:
from keras.layers import GRU

In [231]:
model = Sequential([
    #GRU(n_hidden, return_sequences=True, input_shape=(cs, vocab_size), 
    #    activation='relu', recurrent_initializer='identity'),
    GRU(n_hidden, return_sequences=True, input_shape=(cs, vocab_size)),
    TimeDistributed(Dense(vocab_size, activation='softmax'))
])
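To give a feel for what a GRU layer computes at each time step, here is a rough NumPy sketch of one common formulation of the GRU update. It is not the Keras implementation, whose options can change the details; biases are omitted, the weights are random stand-ins, and all names are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, p):
    """One GRU update; p is a dict of weight matrices (biases omitted)."""
    z = sigmoid(x @ p["Wz"] + h @ p["Uz"])            # update gate
    r = sigmoid(x @ p["Wr"] + h @ p["Ur"])            # reset gate
    h_new = np.tanh(x @ p["Wh"] + (r * h) @ p["Uh"])  # candidate state
    return z * h + (1.0 - z) * h_new                  # blend old and candidate

rng = np.random.default_rng(0)
n_in, n_hidden = 85, 16  # one-hot input size; small hidden size for the demo

# "W*" matrices act on the input, "U*" matrices act on the hidden state.
params = {}
for name in ("Wz", "Wr", "Wh"):
    params[name] = rng.normal(scale=0.1, size=(n_in, n_hidden))
for name in ("Uz", "Ur", "Uh"):
    params[name] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))

h = np.zeros(n_hidden)
for idx in [40, 42, 29]:    # three character indices
    x = np.eye(n_in)[idx]   # one-hot input
    h = gru_step(x, h, params)

print(h.shape)
```

The update gate `z` decides how much of the old state to keep, which is what lets a GRU carry information across more time steps than the plain RNN.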

In [232]:
model.compile(Adam(), loss='categorical_crossentropy')

In [233]:
model.fit(oh_x_rnn, oh_y_rnn, batch_size=batch_size, epochs=2)

Epoch 1/2
Epoch 2/2


<keras.callbacks.History at 0x257467c3eb8>

In [234]:
model.fit(oh_x_rnn, oh_y_rnn, batch_size=batch_size, epochs=6)

Epoch 1/6
Epoch 2/6
Epoch 3/6
Epoch 4/6
Epoch 5/6
Epoch 6/6


<keras.callbacks.History at 0x25730d68828>

In [235]:
get_nexts_oh(' this is')

[' ', 't', 'h', 'i', 's', ' ', 'i', 's']


['t', 'h', 'e', 's', ' ', 'd', 'n', ' ']

In [236]:
get_nexts_oh('are you ')

['a', 'r', 'e', ' ', 'y', 'o', 'u', ' ']


['n', 'e', ' ', 't', 'o', 'u', 'n', 'a']