# Repsly trial data

In [1]:
from repsly_data import RepslyData

repsly_data = RepslyData()
print('Reading data (this might take a minute or so)...', end='')
repsly_data.read_data('data/trial_users_analysis.csv', mode='FC')
print('done.')

Reading data (this might take a minute or so)...done.



Let's see what the data looks like:

In [2]:
read_batch = repsly_data.read_batch(batch_size=20)

X, y = next(read_batch)
print('X{}: {}'.format(list(X.shape), X))
print('y:', y)

X[20, 241]: [[ 153.    1.    1. ...,    0.    0.    0.]
 [ 224.    0.    0. ...,    0.    0.    0.]
 [  54.    0.    0. ...,    0.    0.    0.]
 ..., 
 [ 185.    0.    0. ...,    0.    0.    0.]
 [  55.    0.    0. ...,    0.    0.    0.]
 [ 131.    0.    0. ...,    1.    5.    0.]]
y: [0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0]


As shown above, each input vector in `X` has `1+15*16=241` values, most of which are zero. The first value is the trial start date, expressed as an offset from `2016-01-01`; the remaining `15*16=240` values are usage parameters for the following `16` days. Batches returned by `read_batch` are randomly shuffled. The output values in `y` indicate whether or not the user purchased the Repsly service after the trial.
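To make this layout concrete, the sketch below splits a batch into the start-date column and a per-day usage array. It assumes the 240 usage values are stored day by day (16 blocks of 15 parameters); the ordering is not stated here, so the reshape dimensions may need to be swapped if the values are grouped by parameter instead. `split_features` is a hypothetical helper, not part of `repsly_data`.

```python
import numpy as np

N_DAYS = 16     # trial days covered by each input vector
N_PARAMS = 15   # usage parameters per day

def split_features(X):
    """Split a [batch, 241] array into start-date offsets and per-day usage.

    Assumes day-major ordering of the usage values (an assumption,
    not confirmed by the data description).
    """
    X = np.asarray(X)
    assert X.shape[1] == 1 + N_PARAMS * N_DAYS
    start_offset = X[:, 0]                          # offset from 2016-01-01
    usage = X[:, 1:].reshape(-1, N_DAYS, N_PARAMS)  # [batch, day, parameter]
    return start_offset, usage

# Dummy batch with the same shape as the one printed above
X_demo = np.zeros((20, 241))
X_demo[:, 0] = 153.0
start_offset, usage = split_features(X_demo)
print(start_offset.shape, usage.shape)
```

With a real batch, `usage[i, d]` would then hold the 15 parameters of user `i` on day `d`, under the ordering assumption above.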

# Training

First, we create a network with two fully connected hidden layers of 250 units each and 50% dropout (`keep_prob` of `0.5`):

In [3]:
from repsly_nn import RepslyFC

repsly_nn = RepslyFC()

arch = [250, 250]
arch_dict = {'keep_prob': 0.5}
learning_rate = 0.001
decay_steps = 10
decay_rate = 0.99

repsly_nn.create_net(arch, arch_dict, learning_rate, decay_steps, decay_rate)
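The names `decay_steps` and `decay_rate` suggest an exponential learning-rate decay. The internals of `create_net` are not shown here, so the following is only a sketch of the usual formula, `lr = learning_rate * decay_rate ** (step / decay_steps)`, under that assumption; `decayed_learning_rate` is a hypothetical helper.

```python
def decayed_learning_rate(learning_rate, step, decay_steps, decay_rate):
    """Exponential decay (continuous, non-staircase variant).

    Assumed schedule; the actual one used by RepslyFC may differ.
    """
    return learning_rate * decay_rate ** (step / decay_steps)

# Evaluate the assumed schedule with the hyperparameters from the cell above
for step in (0, 10, 100, 1500):
    lr = decayed_learning_rate(0.001, step, decay_steps=10, decay_rate=0.99)
    print('step {:>4}: lr = {:.6f}'.format(step, lr))
```

If this is indeed the schedule in use, the learning rate would drop to roughly 22% of its initial value by step 1500.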

Then we train it for a number of epochs. (NB: there is a bug with restoring checkpoints.)

In [4]:
batch_size = 64
epochs_loops = 1
epochs_at_once = 20
skip_steps = 10

for i in range(epochs_loops):
    print('Training for {} epochs...'.format(epochs_at_once))
    repsly_nn.train(data=repsly_data, batch_size=batch_size, epochs=epochs_at_once, skip_steps=skip_steps)

Training for 20 epochs...
Checkpoint directory is: /Users/davor/PycharmProjects/deep_learning/repsly_challenge/checkpoints/RepslyFC-[250,250]/keep_prob-0.5/lr-0.001/dr-0.99/ds-10
Creating tf.train.Saver()...done
self.checkpoint_path: checkpoints/RepslyFC-[250,250]/keep_prob-0.5/lr-0.001/dr-0.99/ds-10
ckpt: None
[00000/0.8 sec]   train/validation loss = 17.18806/9.96532
[00010/1.9 sec]   train/validation loss = 14.93347/9.92426
[00020/3.0 sec]   train/validation loss = 2.39778/2.31172
[00030/3.8 sec]   train/validation loss = 3.73553/2.24757
[00040/4.7 sec]   train/validation loss = 1.40663/2.33551
[00050/5.7 sec]   train/validation loss = 1.50320/0.21643
[00060/6.5 sec]   train/validation loss = 1.21213/2.59093
[00070/7.5 sec]   train/validation loss = 1.80783/2.25574
[00080/8.5 sec]   train/validation loss = 1.62885/1.17066
[00090/9.3 sec]   train/validation loss = 3.21743/0.90493
[00100/10.2 sec]   train/validation loss = 1.14871/1.04004
[00110/11.2 sec]   train/validation loss = 0.5

[01340/124.4 sec]   train/validation loss = 0.13067/0.29594
[01350/125.4 sec]   train/validation loss = 0.30450/0.19573
[01360/126.2 sec]   train/validation loss = 0.28386/0.29211
[01370/127.2 sec]   train/validation loss = 0.30634/0.72007
[01380/128.2 sec]   train/validation loss = 0.22192/0.28948
[01390/129.0 sec]   train/validation loss = 0.19389/0.22231
[01400/130.0 sec]   train/validation loss = 0.16087/0.28322
[01410/131.0 sec]   train/validation loss = 0.19974/0.15466
[01420/131.9 sec]   train/validation loss = 0.18367/0.28817
[01430/132.8 sec]   train/validation loss = 0.29874/0.18214
[01440/133.6 sec]   train/validation loss = 0.20993/0.28749
[01450/134.6 sec]   train/validation loss = 0.29569/0.75456
[01460/135.4 sec]   train/validation loss = 0.21969/0.28833
[01470/136.3 sec]   train/validation loss = 0.23392/0.22658
[01480/137.2 sec]   train/validation loss = 0.17178/0.26609
[01490/138.2 sec]   train/validation loss = 0.17992/0.16214
[01500/139.0 sec]   train/validation los