### What am I?

This is a reference notebook showing how to write custom RNN steps in Lasagne.
It follows `char_rnn.ipynb` right until the network composition phase.

You are invited to skip to that phase and read through what's defined there.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Generate names
* Struggling to name a variable? Wait until you have to name a son or daughter. No human can claim real expertise in what makes a good child name, so let's train a neural network instead.
* The dataset contains ~8k human names from different cultures (in Latin transcription).
* Objective (toy problem): learn a generative model over names.

In [2]:
start_token = " "

with open("names") as f:
    names = f.read()[:-1].split('\n')
    names = [start_token+name for name in names]
    

In [3]:
print ('n samples = ',len(names))
for x in names[::1000]:
    print( x)

('n samples = ', 7944)
 Abagael
 Claresta
 Glory
 Liliane
 Prissie
 Geeta
 Giovanne
 Piggy


# Text processing

In [4]:
tokens = list(set(''.join(names)))

print ('n_tokens = ',len(tokens))


('n_tokens = ', 55)


In [5]:
#!token_to_id = <dictionary of symbol -> its identifier (index in tokens list)>
token_to_id = {t:i for i,t in enumerate(tokens) }

#!id_to_token = < dictionary of symbol identifier -> symbol itself>
id_to_token = {i:t for i,t in enumerate(tokens)}

### Cast everything from symbols into identifiers

In [6]:
names_ix = list(map(lambda name: list(map(token_to_id.get,name)),names))

MAX_LEN = 16
#crop long names and pad short ones
for i in range(len(names_ix)):
    names_ix[i] = names_ix[i][:MAX_LEN] #crop too long
    
    if len(names_ix[i]) < MAX_LEN:
        names_ix[i] += [token_to_id[" "]]*(MAX_LEN - len(names_ix[i])) #pad too short
        
assert len(set(map(len,names_ix)))==1

names_ix = np.array(names_ix)
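
The crop-and-pad step above can also be written as a small helper. This is a minimal sketch on a made-up four-symbol vocabulary; `to_matrix`, `toy_tokens` and `toy_names` are hypothetical names used only for illustration and are not part of the notebook.

```python
import numpy as np

def to_matrix(rows, max_len, pad_id):
    """Crop rows longer than max_len and right-pad shorter ones with pad_id."""
    matrix = np.full((len(rows), max_len), pad_id, dtype='int64')
    for i, row in enumerate(rows):
        row = row[:max_len]              # crop too long
        matrix[i, :len(row)] = row       # the rest of the row stays pad_id
    return matrix

# Hypothetical toy vocabulary; the real notebook builds `tokens` from the dataset.
toy_tokens = [' ', 'a', 'b', 'n']
toy_token_to_id = {t: i for i, t in enumerate(toy_tokens)}

toy_names = [' ab', ' banana', ' n']
toy_ix = [[toy_token_to_id[c] for c in name] for name in toy_names]

toy_matrix = to_matrix(toy_ix, max_len=5, pad_id=toy_token_to_id[' '])
print(toy_matrix)
```

As in the cell above, the padding symbol is the same space character that serves as the start token.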

# Input variables

In [7]:
import theano
import theano.tensor as T

import lasagne
from lasagne.layers import *

import agentnet
from agentnet import Recurrence
agentnet.config.shut_up() #disable warnings

In [8]:
sequence = T.matrix('token sequence','int64')

inputs = sequence[:,:-1]
targets = sequence[:,1:]


l_input_sequence = InputLayer(shape=(None, None),input_var=inputs)
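
To see what the `inputs`/`targets` slicing does, here is the same shift on a plain NumPy array. The token ids are arbitrary illustration values.

```python
import numpy as np

# Two toy sequences of token ids, shape [batch, time].
toy_sequence = np.array([[0, 5, 7, 2],
                         [0, 3, 3, 9]])

toy_inputs = toy_sequence[:, :-1]    # everything except the last token
toy_targets = toy_sequence[:, 1:]    # everything except the first token

# At each tick the network sees toy_inputs[:, t] and must predict toy_targets[:, t].
for x, y in zip(toy_inputs[0], toy_targets[0]):
    print(x, '->', y)
```

Both slices have the same shape, so the prediction at tick `t` lines up with the token that actually follows at tick `t + 1` of the original sequence.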


# Build NN

You'll build a model that takes a token sequence and predicts the next token at each tick.

This is essentially the RNN step described in the lecture.

In [9]:
from agentnet.memory import GRUCell,LSTMCell,RNNCell
from agentnet.resolver import ProbabilisticResolver

###One step of rnn
class step:
    
    #inputs
    inp = InputLayer((None,),name='current character')
    h_prev = InputLayer((None,64),name='previous rnn state')
    
    #recurrent part
    emb = EmbeddingLayer(inp,len(tokens),16)
    
    h_new = DenseLayer(concat([h_prev,emb]),64,nonlinearity=T.tanh)
    #same: h_new = RNNCell(h_prev,emb)
    
    next_token_probas = DenseLayer(h_new,len(tokens),nonlinearity=T.nnet.softmax)
    
    #pick next token from predicted probas
    next_token = ProbabilisticResolver(next_token_probas)
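
For intuition, the arithmetic of one such step can be sketched in plain NumPy. This is not how Lasagne or agentnet implement it; the parameter names (`W_emb`, `W_h`, `b_h`, `W_out`, `b_out`) and the random initialisation are assumptions made for illustration. The sizes mirror the class above: 55 tokens, 16-dimensional embeddings, 64 hidden units.

```python
import numpy as np

rng = np.random.RandomState(0)
n_tokens, emb_size, n_hidden = 55, 16, 64

# Hypothetical parameters: one embedding matrix and two dense layers.
params = {
    'W_emb': rng.normal(scale=0.1, size=(n_tokens, emb_size)),
    'W_h': rng.normal(scale=0.1, size=(n_hidden + emb_size, n_hidden)),
    'b_h': np.zeros(n_hidden),
    'W_out': rng.normal(scale=0.1, size=(n_hidden, n_tokens)),
    'b_out': np.zeros(n_tokens),
}

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)    # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def rnn_step(inp, h_prev, p):
    """One tick: token ids [batch] and previous state [batch, n_hidden] -> new state, probas."""
    emb = p['W_emb'][inp]                                   # embedding lookup, [batch, emb_size]
    joined = np.concatenate([h_prev, emb], axis=1)          # same order as concat([h_prev, emb])
    h_new = np.tanh(joined.dot(p['W_h']) + p['b_h'])        # [batch, n_hidden]
    probas = softmax(h_new.dot(p['W_out']) + p['b_out'])    # [batch, n_tokens]
    return h_new, probas

h0 = np.zeros((3, n_hidden))                                # zero initial state for a batch of 3
h1, probas = rnn_step(np.array([0, 4, 7]), h0, params)
print(h1.shape, probas.shape)
```

The five arrays correspond to one embedding matrix plus a weight matrix and bias for each of the two dense layers.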
    


### Recurrence, explained

You can use `Recurrence` to define a custom recurrent layer for lasagne, defined by a single-step graph.

In the example below, we pass three kinds of arguments:
* __state_variables__: a dict of `{layer: becomes_this_on_next_step}`
* __input_sequences__: a dict of `{layer: iterates_over_axis1_of_this_layer}`
* __tracked_outputs__: a list of layers you want to access later (i.e. outputs)

In [10]:
training_loop = Recurrence(
    state_variables={step.h_new:step.h_prev},    # ~ h_new becomes h_prev on next tick
    input_sequences={step.inp:l_input_sequence}, # ~ inp iterates over l_input_sequence axis 1
    tracked_outputs=[step.next_token_probas,],   # ~ i want to access sequence of next_token_probas later
    
    unroll_scan=False,                           # same as in lasagne. if True, compiles longer, but runs faster
)


Note that the recurrence defined above won't compute `next_token` or compute any layer not required to compute `tracked_outputs` or `state_variables`.
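
Conceptually, such a recurrence behaves like the loop below. This is a simplified NumPy sketch, not agentnet's implementation (which builds a symbolic Theano graph); `run_recurrence` and `toy_step` are hypothetical names, and the toy step is a running sum chosen so the result is easy to check by hand.

```python
import numpy as np

def run_recurrence(step_fn, input_sequence, state_init):
    """Apply step_fn along axis 1 of input_sequence, threading the state through."""
    state = state_init
    outputs = []
    for t in range(input_sequence.shape[1]):
        # the new state becomes the previous state on the next tick
        state, out = step_fn(input_sequence[:, t], state)
        outputs.append(out)                      # tracked output at this tick
    return np.stack(outputs, axis=1), state      # outputs have shape [batch, time, ...]

def toy_step(inp, prev_state):
    """Toy step: the state is a running sum, the tracked output is twice the state."""
    new_state = prev_state + inp
    return new_state, 2 * new_state

toy_inputs = np.array([[1, 2, 3],
                       [10, 20, 30]])
toy_outputs, final_state = run_recurrence(toy_step, toy_inputs,
                                          np.zeros(2, dtype='int64'))
print(toy_outputs)
print(final_state)
```

The initial state of zeros matches the default described for `state_init`.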

You can also use three other types (not covered here):
* __input_nonsequences__: a dict of `{layer: is_equal_to_this_layer_on_every_tick}`
* __state_init__: a dict of `{state_variables key: is_equal_to_this_before_first_tick}`, defaults to zeros
* __mask_input__: same as mask_input in lasagne recurrent layers. If equal to 0, skips this turn.
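
One common way to implement "skips this turn" is to keep the previous state wherever the mask is 0. The NumPy sketch below illustrates that idea only; it is an assumption about the semantics, not a copy of agentnet's code, and `masked_update` is a hypothetical helper.

```python
import numpy as np

def masked_update(new_state, prev_state, mask):
    """Keep prev_state for batch rows whose mask is 0, take new_state where it is 1."""
    mask = mask[:, None]                         # broadcast [batch] over the state dimension
    return np.where(mask > 0, new_state, prev_state)

prev_state = np.array([[0.1, 0.2],
                       [0.3, 0.4]])
new_state = np.array([[1.0, 2.0],
                      [3.0, 4.0]])
mask = np.array([1, 0])                          # second sequence is padding at this tick

updated = masked_update(new_state, prev_state, mask)
print(updated)
```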

You can use Recurrence as any other lasagne layer. Here's an example:

In [11]:
# Model weights
weights = lasagne.layers.get_all_params(training_loop,trainable=True)
print (weights)

[W, W, b, W, b]


You can get a sequence of outputs or state values from the recurrence using the simple syntax below.

__l_probs_seq__ is a Lasagne layer that can be fed into another layer or used as network output.

In [12]:
#layer with "next token probas" at each tick, shape = [batch,time,n_tokens]
l_probs_seq = training_loop[step.next_token_probas]

assert isinstance(l_probs_seq,Layer)

#symbolic output
predicted_probabilities = lasagne.layers.get_output(l_probs_seq)

#If you use dropout do not forget to create deterministic version for evaluation

Similarly, one can request `training_loop[step.h_new]` to get RNN states on each tick.

However, if you request a layer this way, make sure it's in either `state_variables` or `tracked_outputs`.

For the sake of this demo, we proceed by defining loss function and updates using lasagne builtins.

In [13]:

#<Loss function - a simple categorical crossentropy will do, maybe add some regularizer>
loss = lasagne.objectives.categorical_crossentropy(predicted_probabilities.reshape((-1,len(tokens))),
                                                   targets.reshape((-1,))
                                                  ).mean()

updates = lasagne.updates.adam(loss,weights)
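
The reshape-then-average pattern in the loss above can be checked numerically. The sketch below computes the same quantity in NumPy on hand-picked probabilities; `mean_crossentropy` is a hypothetical helper, not a Lasagne function.

```python
import numpy as np

def mean_crossentropy(probas, targets):
    """Mean of -log p(target) over all batch and time positions."""
    n_tokens = probas.shape[-1]
    flat_probas = probas.reshape((-1, n_tokens))       # [batch * time, n_tokens]
    flat_targets = targets.reshape((-1,))              # [batch * time]
    picked = flat_probas[np.arange(len(flat_targets)), flat_targets]
    return -np.log(picked).mean()

# One sequence, two ticks, three tokens: shape [batch=1, time=2, n_tokens=3].
toy_probas = np.array([[[0.5, 0.25, 0.25],
                        [0.1, 0.8, 0.1]]])
toy_targets = np.array([[0, 1]])

toy_loss = mean_crossentropy(toy_probas, toy_targets)
print(toy_loss)

# Reference point: uniform predictions over 55 tokens score log(55).
uniform_loss = mean_crossentropy(np.full((1, 2, 55), 1.0 / 55), toy_targets)
print(uniform_loss)
```

Because the loss averages over every position, padded positions count too; the notebook does not mask them out.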

### Compiling

Again, we compile a training function in the same fashion as in the [Lasagne tutorial](https://github.com/Lasagne/Lasagne/blob/master/examples/mnist.py).

The only difference here is that we call `training_loop.get_automatic_updates()`. 

We need them because that's how theano handles randomness (at least <=v0.9.0). Automatic updates are updates for random states and default updates [(example)](https://github.com/Lasagne/Lasagne/blob/b0940946f9c48aa3a7aaaf0df2aafe77cd17af9a/lasagne/layers/normalization.py#L295-L299).

With the default RNN implementation, you can drop this line since __we don't require any auto-updates yet__, but if you choose to add dropout or noise, you will need the automatic updates.

In [14]:

#training
train_step = theano.function([sequence], loss,
                             updates=updates+training_loop.get_automatic_updates())


# Generative mode

Here we re-wire the recurrent network so that its output is fed back to its input. This is useful for generating sequences.

Unlike previous recurrence,
* there's a second recurrent state: `next_token` is fed back into `inp`
* there's no `input_sequences` this time
* we have to explicitly provide n_steps and batch size

In [15]:
n_steps = T.scalar(dtype='int32')
feedback_loop = Recurrence(
    state_variables={step.h_new:step.h_prev,
                     step.next_token:step.inp},
    tracked_outputs=[step.next_token_probas,step.next_token],
    batch_size=theano.shared(1),
    n_steps=n_steps,
    unroll_scan=False,
)
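
The feedback wiring can be pictured as the loop below. It is a NumPy sketch under simplifying assumptions (a single sequence, a hypothetical `toy_step` whose predictions are deterministic so the result is easy to verify), not a reproduction of what agentnet compiles.

```python
import numpy as np

def generate(step_fn, n_steps, state_init, first_token, rng):
    """Sample n_steps tokens, feeding each sampled token back in as the next input."""
    state, token = state_init, first_token
    sampled_tokens = []
    for _ in range(n_steps):
        state, probas = step_fn(token, state)
        token = rng.choice(len(probas), p=probas)    # sample from the predicted probabilities
        sampled_tokens.append(token)                 # this token is the input on the next tick
    return sampled_tokens

def toy_step(token, state):
    """Hypothetical step: after token i, put all probability on token (i + 1) % 3."""
    probas = np.zeros(3)
    probas[(token + 1) % 3] = 1.0
    return state + 1, probas

sampled = generate(toy_step, n_steps=5, state_init=0, first_token=0,
                   rng=np.random.RandomState(0))
print([int(t) for t in sampled])
```

Here `first_token=0` mirrors the zero default for initial states; with a real model, `probas` would come from the softmax layer and sampling would be genuinely random.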


We proceed by compiling a function that generates a sequence of indices.

This time we __require automatic updates__ because the recurrence involves random sampling from token probabilities.

In [16]:
generated_tokens = get_output(feedback_loop[step.next_token])

generate_sample = theano.function([n_steps],generated_tokens,
                                  updates=feedback_loop.get_automatic_updates())

In [17]:
def generate_string(length=MAX_LEN):
    """generate random sequence up to the given length"""
    output_indices = generate_sample(length)[0]
    return ''.join(tokens[i] for i in output_indices)
    

In [18]:
#test it on a non-trained network
print(generate_string())

lE rNJbmxTEkSaAo


# Model training

Our lil'RNN is trained on random minibatches of data using the training function we defined above.


In [19]:
def sample_batch(data, batch_size):
    rows = data[np.random.randint(0,len(data),size=batch_size)]
    return rows

In [20]:
# number of training epochs
n_epochs=10

# how many minibatches are there in the epoch 
batches_per_epoch = 500

for epoch in range(n_epochs):

    avg_cost = 0
    for _ in range(batches_per_epoch):
        avg_cost += train_step(sample_batch(names_ix,batch_size=10))
        
    print("\n\nEpoch {} average loss = {}".format(epoch, avg_cost / batches_per_epoch))

    print ("Generated names:")
    for i in range(10):
        print(generate_string())




Epoch 0 average loss = 1.45295459339
Generated names
Harye           
Kdt ne          
Muctia          
qariis          
Freme           
Jariele         
Goriele         
Jlntitee        
Rnvgkur         
Ganiend         


Epoch 1 average loss = 1.13703199921
Generated names
Mvaynih         
Jonnli          
Beoeeine        
Sumo            
Aviy            
Wury            
Nrfa            
Marelena        
Nasulo          
Begalarer       


Epoch 2 average loss = 1.09981673085
Generated names
Mell            
Bery            
Mal             
Baa             
byynthoore      
Monta           
Swra            
Ary             
Emfig           
Restiegl        


Epoch 3 average loss = 1.07280006848
Generated names
Mayna           
Kellyd          
Zlaneert        
Leltone         
Midbye          
Erfy            
Jerdi           
AlrPie          
Wweste          
Alhane          


Epoch 4 average loss = 1.05519196807
Generated names
Cipias          
Comy            
Alumeen    

KeyboardInterrupt: 

In [None]:
#random sample outputs
for _ in range(50):
    print(generate_string())

# And now,
* add temperature (shared or input)
* try gru/lstm 
  * mind that __lstm__ has two kinds of memory: cell(c) and output(h)
* add several layers
* try your own dataset of any kind
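
As a starting point for the temperature exercise, one possible formulation rescales the log-probabilities before renormalising. The sketch below works on an already-computed probability vector in NumPy; `apply_temperature` is a hypothetical helper, and wiring it into the network (as a shared variable or an input) is left as the exercise describes.

```python
import numpy as np

def apply_temperature(probas, temperature):
    """Sharpen (temperature < 1) or flatten (temperature > 1) a probability vector."""
    logits = np.log(probas + 1e-30) / temperature    # small epsilon guards against log(0)
    logits -= logits.max(axis=-1, keepdims=True)     # numerical stability
    scaled = np.exp(logits)
    return scaled / scaled.sum(axis=-1, keepdims=True)

probas = np.array([0.7, 0.2, 0.1])
sharp = apply_temperature(probas, 0.5)   # more mass on the most likely token
flat = apply_temperature(probas, 2.0)    # closer to uniform
print(sharp)
print(flat)
```

A temperature of 1 leaves the distribution unchanged; lower values make generated names more conservative, higher values make them more varied.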