### Using an LSTM to Count 1s in a Sequence


In the following example we explore a Long Short-Term Memory (LSTM) network's ability to count the ones in a binary sequence. I came across this problem in a blog post about RNNs at http://monik.in, and stretched it a little further to see what kind of learning curve we get for different sequence lengths.

This is the 'many to one' topology from The Unreasonable Effectiveness of Recurrent Neural Networks: the network reads the whole sequence and produces a single output at the end. Within each run, every sequence has the same fixed length. A few other interesting experiments are planned:

* Counting the number of ones in variable-length sequences, using the idea of bucketing
* Using an external-memory approach, https://arxiv.org/abs/1410.5401 (Alex Graves)
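The bucketing idea in the first bullet could look roughly like the sketch below. It is not part of the experiments in this notebook; `bucket_sequences` and the bucket boundaries are hypothetical names and values chosen for illustration. Each sequence is placed in the smallest bucket that fits it and padded with zeros up to that bucket's length, so every bucket can be fed to the network as a fixed-length batch.

```python
from collections import defaultdict

def bucket_sequences(sequences, boundaries, pad_value=0):
    """Group variable-length bit sequences into length buckets.

    Returns {boundary: [(padded_sequence, number_of_ones), ...]}.
    Hypothetical helper, not taken from the notebook.
    """
    buckets = defaultdict(list)
    for seq in sequences:
        # Smallest boundary that can hold the sequence; skip sequences that are too long.
        fit = next((b for b in sorted(boundaries) if len(seq) <= b), None)
        if fit is None:
            continue
        padded = list(seq) + [pad_value] * (fit - len(seq))
        buckets[fit].append((padded, sum(seq)))
    return dict(buckets)

seqs = [[1, 0, 1], [1, 1, 1, 1, 0], [0], [1, 0, 0, 1, 1, 1, 0]]
batches = bucket_sequences(seqs, boundaries=[4, 8])
```

For this particular task zero padding does not change the label, because padded zeros add nothing to the count. In general one would also pass the true lengths to the RNN (for example through the `sequence_length` argument of `tf.nn.dynamic_rnn`) so that padding steps are ignored.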

In [None]:
import os, time
import tensorflow as tf
import numpy as np
from random import shuffle
from tqdm import tqdm
from collections import namedtuple

In [82]:
# Generating Training Data
seq_length=20
train_input = ['{0:0b}'.format(i).zfill(seq_length) for i in range(2**seq_length)]
shuffle(train_input)
train_input = [list(map(int,i)) for i in train_input]

train_input = np.array(train_input)
train_input = np.expand_dims(train_input, axis=2)

In [83]:
# Generating training output
train_output = np.sum(train_input, axis=1)

# Let's make it one hot :) seq_length+1 classes, one per possible count 0..seq_length
train_output=(train_output == np.arange(seq_length+1)).astype(np.int32)
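The one-hot step above relies on NumPy broadcasting: the counts have shape `(N, 1)` and `np.arange(seq_length+1)` has shape `(seq_length+1,)`, so the comparison yields an `(N, seq_length+1)` boolean matrix. A small sketch with a toy `seq_length` of 3 (the example sequences are made up for illustration) shows the shapes involved:

```python
import numpy as np

seq_length = 3
# Four example sequences, shaped (batch, seq_length, 1) like train_input.
x = np.array([[0, 0, 0],
              [1, 0, 1],
              [1, 1, 1],
              [0, 1, 0]])[:, :, None]

counts = np.sum(x, axis=1)  # shape (4, 1): number of ones per sequence
# Broadcasting (4, 1) against (4,) gives a (4, 4) boolean matrix,
# with True in the column whose index equals the count.
one_hot = (counts == np.arange(seq_length + 1)).astype(np.int32)
```

Row `i` of `one_hot` has a single 1 in the column equal to the number of ones in sequence `i`.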

In [84]:
# Train Test Split
NUM_TRAIN= 10000
test_input = train_input[NUM_TRAIN:]
test_output = train_output[NUM_TRAIN:]

train_input = train_input[:NUM_TRAIN]
train_output = train_output[:NUM_TRAIN]
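As context for reading the test accuracy later, it may help to know how the labels are distributed. Because the data enumerates every bit string of length `seq_length`, the number of sequences with `k` ones is the binomial coefficient C(seq_length, k). The sketch below is an aside rather than part of the original experiment; `majority_baseline` is a name introduced here for the accuracy of always predicting the most common count, computed over the full set of sequences.

```python
from math import comb  # available in Python 3.8+

seq_length = 20
num_train = 10000  # same value as NUM_TRAIN in the split

total = 2 ** seq_length
# Number of length-20 bit strings containing exactly k ones, for k = 0..20.
class_counts = [comb(seq_length, k) for k in range(seq_length + 1)]

num_test = total - num_train
# Accuracy of a constant predictor that always answers with the most common count.
majority_baseline = max(class_counts) / total
```

With these values the test set has 1,038,576 sequences, and a constant guess of 10 ones would be right on roughly 17.6% of all sequences. The shuffled test subset should be close to that share, though not exactly equal.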

In [85]:
# Hyperparameters of the model:
# define a namedtuple type, then set the values below
hparams = namedtuple('hyper_parameters',
                     'hidden_size, seq_length, learning_rate, '
                     'batch_size, '
                     'num_epochs')


hps = hparams(hidden_size=25,
              seq_length=seq_length,
              learning_rate=1e-3,
              batch_size=1000,
              num_epochs=1200)

In [86]:
# Define the network

tf.reset_default_graph()
data   = tf.placeholder(tf.float32, [None, seq_length, 1])
target = tf.placeholder(tf.float32, [None, seq_length+1])

num_hidden = hps.hidden_size
with tf.name_scope("RNN"):
    cell=tf.contrib.rnn.BasicLSTMCell(num_hidden, state_is_tuple=True)
    val, state = tf.nn.dynamic_rnn(cell, data, dtype=tf.float32)


# Since we are going to take output from the last unrolled state
# we transpose it to swap the batch dimension with 
# the num_unrolling dimension
val = tf.transpose(val, [1,0,2])
last = tf.gather(val, int(val.get_shape()[0])-1)

initializer = tf.contrib.layers.xavier_initializer()
with tf.name_scope("DenseLayer"):
    logits = tf.layers.dense(last,int(target.get_shape()[1]), kernel_initializer=initializer)


    
prediction = tf.nn.softmax(logits)
cross_entropy = -tf.reduce_sum(
    target * tf.log(tf.clip_by_value(prediction, 1e-10, 1.0)))
tf.summary.scalar("Cross_Entropy", cross_entropy)

optimizer = tf.train.AdamOptimizer(hps.learning_rate)
train_op = optimizer.minimize(cross_entropy)
errors = tf.not_equal(tf.argmax(target, 1), tf.argmax(prediction,1))
accuracy = 1 - tf.reduce_mean(tf.cast(errors, tf.float32))
tf.summary.scalar("Training_Accuracy", accuracy)


<tf.Tensor 'Training_Accuracy:0' shape=() dtype=string>
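The loss defined in the cell above is a cross-entropy summed over the batch, with the softmax output clipped to `[1e-10, 1.0]` so that `log(0)` never occurs. A NumPy sketch of the same arithmetic follows; the function names and the toy logits are illustrative and not part of the TensorFlow graph.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def summed_cross_entropy(logits, target, eps=1e-10):
    # Mirrors -reduce_sum(target * log(clip(prediction, eps, 1.0))).
    prediction = softmax(logits)
    return -np.sum(target * np.log(np.clip(prediction, eps, 1.0)))

logits = np.array([[2.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]])
target = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
loss = summed_cross_entropy(logits, target)

# A confidently wrong prediction: the true class gets probability ~0,
# and clipping keeps the loss finite instead of infinite.
extreme = summed_cross_entropy(np.array([[1000.0, 0.0, 0.0]]),
                               np.array([[0.0, 1.0, 0.0]]))
```

Since the loss is a sum rather than a mean, its magnitude scales with the batch size, which is worth keeping in mind when comparing curves across runs with different `batch_size` values.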

In [87]:
init_op = tf.global_variables_initializer()

# Configuration to ensure that TensorFlow does NOT reserve all available GPU memory unnecessarily
config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.allow_growth = True
sess=tf.Session(config=config)
sess.run(init_op)

# Creating Merged Summary op
summaries=tf.summary.merge_all()

# FileWriter that records summaries and the graph in a timestamped log directory
writer = tf.summary.FileWriter(
            os.path.join('./tf_logs', time.strftime("%Y-%m-%d-%H-%M-%S")))
writer.add_graph(sess.graph)

In [88]:
batch_size=hps.batch_size
n_batches=train_input.shape[0]//batch_size
epoch=hps.num_epochs
for i in tqdm(list(range(epoch)), desc='epoch'):
    ptr=0
    #for j in tqdm(list(range(n_batches)), desc='batches'):
    for j in range(n_batches):
        _, summary = sess.run([train_op, summaries] , feed_dict={
            data:train_input[ptr:ptr+batch_size],
            target: train_output[ptr:ptr+batch_size]
        })
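        # Pass a global step so TensorBoard plots the points in training order
        writer.add_summary(summary, global_step=i*n_batches + j)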
        ptr+=batch_size
    acc=sess.run(accuracy, feed_dict={
        data: test_input,
        target: test_output
    })
    print('Epoch: {:2d} Test Accuracy: {:3.1f}%'.format(i+1, 100*acc))
writer.close()  # flush any pending summaries to disk
sess.close()
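The inner loop above walks through the training set in consecutive slices using a moving pointer. The same slicing can be written as a small generator, sketched here with NumPy only; `iterate_minibatches` is a hypothetical helper and the toy arrays merely stand in for `train_input` and `train_output`.

```python
import numpy as np

def iterate_minibatches(inputs, outputs, batch_size):
    """Yield consecutive (input, output) slices; a trailing partial batch is dropped."""
    n_batches = inputs.shape[0] // batch_size
    for j in range(n_batches):
        ptr = j * batch_size
        yield inputs[ptr:ptr + batch_size], outputs[ptr:ptr + batch_size]

# Toy stand-ins for train_input / train_output (shapes are illustrative only).
toy_input = np.zeros((25, 4, 1))
toy_output = np.zeros((25, 5))
batches = list(iterate_minibatches(toy_input, toy_output, batch_size=10))
```

With 10,000 training examples and a batch size of 1,000, as in this notebook, nothing is dropped and each epoch has 10 batches. Note that the batches are visited in the same order every epoch; reshuffling between epochs is a common variation, but it is not what the loop above does.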

Experiments with sequence lengths 16, 18, and 20. Any guess which cost profile belongs to which? :)

![Cross-entropy loss curves for sequence lengths 16, 18, and 20](crossEntropy1.png "Loss")