In [23]:
from theano.sandbox import cuda
cuda.use('gpu2')

 https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29



In [2]:
%matplotlib inline
import utils
from utils import *
from keras.layers import TimeDistributed, Activation
from numpy.random import choice

Using Theano backend.


## Setup

In [15]:
path = get_file('nietzsche.txt', origin="https://s3.amazonaws.com/text-datasets/nietzsche.txt")
text = open(path).read()
print('corpus length:', len(text))

corpus length: 600893


In [16]:
!tail {path} -n10

whole of antiquity swarmed with sons of god--he attained the same goal,
the sense of complete sinlessness, complete irresponsibility, that can
now be attained by every individual through science.--In the same manner
I have viewed the saints of India who occupy an intermediate station
between the christian saints and the Greek philosophers and hence are
not to be regarded as a pure type. Knowledge and science--as far as they
existed--and superiority to the rest of mankind by logical discipline
and training of the intellectual powers were insisted upon by the
Buddhists as essential to sanctity, just as they were denounced by the
christian world as the indications of sinfulness.

In [17]:
chars = sorted(list(set(text)))
vocab_size = len(chars)+1
print('total chars: ', vocab_size)

total chars:  85


Sometimes it's useful to reserve index 0 for a special value, e.g. for padding, so we insert a null character at the start of the vocabulary.

In [18]:
chars.insert(0, "\0")
''.join(chars[:-6])

'\x00\n !"\'(),-.0123456789:;=?ABCDEFGHIJKLMNOPQRSTUVWXYZ[]_abcdefghijklmnopqrstuvwxy'
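As a rough illustration of why a reserved zero index helps, the sketch below pads short sequences of character ids to a common length. The toy vocabulary and the `pad_left` helper are hypothetical and not part of the notebook; left-padding is just one possible choice.

```python
# Toy vocabulary built the same way as in the notebook: "\0" takes index 0.
chars = ["\0"] + sorted(set("hello world"))
char_indices = {c: i for i, c in enumerate(chars)}

def encode(s):
    """Map each character of `s` to its integer id."""
    return [char_indices[c] for c in s]

def pad_left(seq, length, pad_value=0):
    """Hypothetical helper: left-pad `seq` with `pad_value` up to `length`."""
    return [pad_value] * (length - len(seq)) + seq

batch = [pad_left(encode(w), 5) for w in ["hello", "low", "we"]]
for row in batch:
    print(row)
# [4, 3, 5, 5, 6]
# [0, 0, 5, 6, 8]
# [0, 0, 0, 8, 3]
```

Because no real character maps to 0, the padding can never be confused with text.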

In [19]:
char_indices = dict((c, i) for i,c in enumerate(chars))
indices_char = dict((i, c) for i,c in enumerate(chars))
idx = [char_indices[c] for c in text]

In [20]:
idx[:10]

[40, 42, 29, 30, 25, 27, 29, 1, 1, 1]

In [21]:
''.join(indices_char[i] for i in idx[:20])

'PREFACE\n\n\nSUPPOSING '

## 3 Char Model

### Create inputs
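Create four lists, each taking every 3rd character of the text (the step is `cs`), starting at the 0th, 1st, 2nd and 3rd characters respectively. The first three lists are the model inputs and the fourth is the character to predict.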

In [25]:
cs=3
c1_dat = [idx[i] for i in range(0, len(idx)-1-cs, cs)]
c2_dat = [idx[i+1] for i in range(0, len(idx)-1-cs, cs)]
c3_dat = [idx[i+2] for i in range(0, len(idx)-1-cs, cs)]
c4_dat = [idx[i+3] for i in range(0, len(idx)-1-cs, cs)]
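To see what these four lists contain, here is the same slicing applied to a toy sequence. The integers 0..9 stand in for the character ids in `idx`, and the short names `c1`..`c4` are used only for this illustration.

```python
# Toy stand-in for the notebook's `idx`: the integers 0..9 instead of character ids.
idx = list(range(10))
cs = 3

starts = range(0, len(idx) - 1 - cs, cs)   # yields 0, 3
c1 = [idx[i]     for i in starts]
c2 = [idx[i + 1] for i in starts]
c3 = [idx[i + 2] for i in starts]
c4 = [idx[i + 3] for i in starts]          # the target for each row

for row in zip(c1, c2, c3, c4):
    print(row)
# (0, 1, 2, 3)
# (3, 4, 5, 6)
```

Each row holds three consecutive inputs plus the character that follows them, and the target of one row is the first input of the next.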

In [27]:
c1_dat[:5]
?np.stack

Our inputs

In [28]:
# Turn them into numpy arrays
x1 = np.stack(c1_dat[:-2])
x2 = np.stack(c2_dat[:-2])
x3 = np.stack(c3_dat[:-2])

In [36]:
print(x1.shape)
x1[:5]

(200295,)


array([40, 30, 29,  1, 40])

Our output

In [34]:
y = np.stack(c4_dat[:-2])

The number of latent factors to create, i.e. the length of each character's embedding vector

In [32]:
n_fac = 42

Create inputs and embedding outputs for each of our 3 character inputs

In [33]:
def embedding_input(name, n_in, n_out):
    """ Create an embedding by first creating an input layer
    and then applying an embedding layer to it
    """
    inp = Input(shape=(1,), dtype='int64', name=name)
    emb = Embedding(n_in, n_out, input_length=1)(inp)
    return inp, Flatten()(emb)

Of course, you could use one-hot encoding for each character instead. But an embedding can learn similarities between characters, for example between 'A' and 'a'. With one-hot encoding, 'A' and 'a' are exactly as different from each other as 'A' and 'Z'.
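The relationship between the two encodings can be sketched in plain numpy. This is an illustration with toy sizes and a random matrix `E` (both assumptions), not the Keras `Embedding` layer itself.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, n_fac = 6, 4                     # toy sizes; the notebook uses 85 and 42
E = rng.normal(size=(vocab_size, n_fac))     # embedding matrix: one row per character

idx = 3                                      # id of some character
one_hot = np.zeros(vocab_size)
one_hot[idx] = 1.0

# Looking up row `idx` gives the same vector as multiplying the one-hot vector by E.
assert np.allclose(one_hot @ E, E[idx])

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

other = np.zeros(vocab_size)
other[1] = 1.0

# One-hot vectors of two different characters are always orthogonal ...
print(cosine(one_hot, other))                # 0.0
# ... whereas embedding rows are free parameters that training can move closer together.
print(cosine(E[idx], E[1]))
```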

In [37]:
c1_in, c1 = embedding_input('c1', vocab_size, n_fac)
c2_in, c2 = embedding_input('c2', vocab_size, n_fac)
c3_in, c3 = embedding_input('c3', vocab_size, n_fac)

### Create and train model
We choose to have 256 activations in the hidden layer

In [38]:
n_hidden = 256

Now create the 'green arrow' from our diagram - the layer operation from input to hidden

In [39]:
dense_in = Dense(n_hidden, activation='relu')

Our first hidden activation is simply this function applied to the result of the embedding of the first character(s)

In [40]:
c1_hidden = dense_in(c1)

Now create the 'orange arrow' from our diagram - the layer operation from hidden to hidden

In [41]:
dense_hidden = Dense(n_hidden, activation='tanh')

Our 2nd and 3rd hidden activations add the previous hidden state (passed through the hidden-to-hidden layer) to the new input (passed through the input-to-hidden layer)

In [43]:
c2_dense = dense_in(c2)
hidden_2 = dense_hidden(c1_hidden)
c2_hidden = merge([c2_dense, hidden_2])
# merge defaults to mode='sum', so the two tensors are added element-wise

In [44]:
c3_dense = dense_in(c3)
hidden_3 = dense_hidden(c2_hidden)
c3_hidden = merge([c3_dense, hidden_3])

Now create the 'blue arrow' from our diagram - the layer operation from hidden to output

In [45]:
dense_out = Dense(vocab_size, activation='softmax')

In [46]:
c4_out = dense_out(c3_hidden)
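The computation wired up so far can be written out as a plain numpy forward pass. This is only a sketch: the weights are random stand-ins rather than trained values, and the `init` helper is hypothetical. The layer sizes and activations follow the cells above, with one embedding per input and the `dense_in` and `dense_hidden` weights shared across steps.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, n_fac, n_hidden = 85, 42, 256    # sizes taken from the notebook

def init(*shape):
    # Small random values as stand-ins for trained weights (assumption).
    return rng.normal(scale=0.1, size=shape)

embs = [init(vocab_size, n_fac) for _ in range(3)]               # one embedding per input
W_in,  b_in  = init(n_fac, n_hidden),      np.zeros(n_hidden)    # dense_in
W_hid, b_hid = init(n_hidden, n_hidden),   np.zeros(n_hidden)    # dense_hidden
W_out, b_out = init(n_hidden, vocab_size), np.zeros(vocab_size)  # dense_out

def dense_in(x):
    return np.maximum(x @ W_in + b_in, 0)    # relu

def dense_hidden(h):
    return np.tanh(h @ W_hid + b_hid)        # tanh

def dense_out(h):
    z = h @ W_out + b_out
    e = np.exp(z - z.max())                  # subtract the max for numerical stability
    return e / e.sum()                       # softmax

def forward(i1, i2, i3):
    c1_hidden = dense_in(embs[0][i1])
    c2_hidden = dense_in(embs[1][i2]) + dense_hidden(c1_hidden)  # merge = sum
    c3_hidden = dense_in(embs[2][i3]) + dense_hidden(c2_hidden)
    return dense_out(c3_hidden)

p = forward(40, 42, 29)    # ids of 'P', 'R', 'E', the first three entries of idx
print(p.shape)             # (85,)
```

The result is a probability distribution over the 85 characters for the fourth position.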

At this point, `c4_out` is a symbolic tensor that represents the whole computation from the three inputs to the output

In [47]:
c4_out

Softmax.0

In [48]:
model = Model([c1_in, c2_in, c3_in], c4_out)

In [49]:
model.compile(loss='sparse_categorical_crossentropy', optimizer=Adam())
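The sparse variant of the loss is used because `y` holds integer character ids rather than one-hot vectors. A minimal numpy sketch of what this loss computes, assuming it is averaged over the batch, might look like this; the function below is an illustration, not Keras's implementation.

```python
import numpy as np

def sparse_categorical_crossentropy(probs, targets):
    """probs: (n, n_classes) rows summing to 1; targets: (n,) integer class ids."""
    picked = probs[np.arange(len(targets)), targets]   # probability of the true class
    return float(-np.log(picked).mean())

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
targets = np.array([0, 1])
loss = sparse_categorical_crossentropy(probs, targets)

# The same value computed from explicit one-hot targets.
one_hot = np.eye(3)[targets]
loss_one_hot = float(-(one_hot * np.log(probs)).sum(axis=1).mean())
print(np.isclose(loss, loss_one_hot))   # True
```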

In [50]:
model.optimizer.lr=0.001

In [51]:
model.fit([x1, x2, x3], y, batch_size=64, nb_epoch=10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f67f66395f8>

### Test model

In [52]:
def get_next(inp):
    idxs = [char_indices[c] for c in inp]
    arrs = [np.array(i)[np.newaxis] for i in idxs]
    p = model.predict(arrs)
    i = np.argmax(p)
    return chars[i]

In [53]:
get_next('phi')

'l'

In [54]:
get_next(' th')

'e'

In [55]:
get_next(' an')

'd'
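Single predictions like these can be chained to generate longer text by feeding the last three characters back in. The sketch below uses a hypothetical `generate` helper and a toy lookup table in place of the trained model, so it runs without Keras. In the notebook, `get_next` would take the place of `toy_next`.

```python
def generate(seed, next_char, n):
    """Extend `seed` by `n` characters, feeding the last 3 characters back in each time."""
    out = seed
    for _ in range(n):
        out += next_char(out[-3:])
    return out

# Stand-in for the notebook's get_next: a toy lookup table, not a trained model.
toy_table = {' th': 'e', 'the': ' ', 'he ': 'a', 'e a': 'n', ' an': 'd'}

def toy_next(ctx):
    return toy_table.get(ctx, '?')

print(repr(generate(' th', toy_next, 5)))   # ' the and'
```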

## Our First RNN!
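Now we will implement the typical structure of an RNN, i.e. the 'rolled' one, where a loop replaces the layers written out by hand for each character.

That means we can no longer use separate inputs c1, c2, c3 and so on. Instead, we need to supply an array of inputs all at once.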
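The idea can be sketched in numpy as a loop over a sequence of character ids, reusing the same input-to-hidden and hidden-to-hidden weights at every step. The function name, the sequence length of 8, the random weights and the omission of biases are all assumptions made for illustration; this is not the Keras model itself.

```python
import numpy as np

def rolled_forward(xs, embs, W_in, W_hid, W_out):
    """Run the recurrence over a whole sequence of character ids.

    Biases are omitted to keep the sketch short (a simplification).
    """
    h = np.maximum(embs[0][xs[0]] @ W_in, 0)          # first hidden state
    for t in range(1, len(xs)):                       # same two layers at every step
        x_t = np.maximum(embs[t][xs[t]] @ W_in, 0)    # input -> hidden (relu)
        h = x_t + np.tanh(h @ W_hid)                  # hidden -> hidden (tanh), summed
    z = h @ W_out                                     # hidden -> output
    e = np.exp(z - z.max())
    return e / e.sum()                                # softmax

rng = np.random.default_rng(0)
vocab_size, n_fac, n_hidden, seq_len = 85, 42, 256, 8
embs  = [rng.normal(scale=0.1, size=(vocab_size, n_fac)) for _ in range(seq_len)]
W_in  = rng.normal(scale=0.1, size=(n_fac, n_hidden))
W_hid = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
W_out = rng.normal(scale=0.1, size=(n_hidden, vocab_size))

xs = np.array([40, 42, 29, 30, 25, 27, 29, 1])   # 'PREFACE\n', the first 8 entries of idx
p = rolled_forward(xs, embs, W_in, W_hid, W_out)
print(p.shape)                                   # (85,)
```

Changing the sequence length only changes the length of `xs` and the number of embeddings; the loop body stays the same.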