# Repsly trial data

In [1]:
from repsly_data import RepslyData

repsly_data = RepslyData()
print('Reading data (this might take a minute or so)...', end='')
repsly_data.read_data('data/trial_users_analysis.csv', mode='FC')
print('done.')

Reading data (this might take a minute or so)...done.



Let's see what the data looks like:

In [2]:
read_batch = repsly_data.read_batch(batch_size=20)

X, y = next(read_batch)
print('X{}: {}'.format(list(X.shape), X))
print('y:', y)

X[20, 241]: [[ 153.    1.    1. ...,    0.    0.    0.]
 [ 224.    0.    0. ...,    0.    0.    0.]
 [  54.    0.    0. ...,    0.    0.    0.]
 ..., 
 [ 185.    0.    0. ...,    0.    0.    0.]
 [  55.    0.    0. ...,    0.    0.    0.]
 [ 131.    0.    0. ...,    1.    5.    0.]]
y: [0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0]


As shown above, each input vector in `X` has `1+15*16=241` values, most of which are zeros. The first value is the trial start date, expressed as an offset from `2016-01-01`; the remaining 240 values are usage parameters (15 per day) for the following `16` days. Batches returned by `read_batch` are randomly shuffled. The output values in `y` indicate whether the user purchased the Repsly service after the trial.
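To make the layout concrete, here is a minimal sketch that splits a batch into start dates and a per-day usage array. It rests on two assumptions that the data above does not confirm: that the offset is measured in days, and that the 240 usage values are stored day by day (the `day_major` flag covers the alternative ordering). The helper `split_features` and the synthetic `X_demo` batch are illustrative and not part of `repsly_data`.

```python
from datetime import date, timedelta

import numpy as np

N_DAYS = 16     # trial days covered by each vector
N_PARAMS = 15   # usage parameters per day
START = date(2016, 1, 1)


def split_features(X, day_major=True):
    """Split a batch of shape (n, 241) into start dates and usage of shape (n, 16, 15).

    ASSUMPTIONS: offsets are in days; with day_major=True the usage values
    are grouped by day, otherwise they are grouped by parameter.
    """
    X = np.asarray(X)
    if X.ndim != 2 or X.shape[1] != 1 + N_PARAMS * N_DAYS:
        raise ValueError('expected shape (n, 241), got {}'.format(X.shape))
    start_dates = [START + timedelta(days=int(offset)) for offset in X[:, 0]]
    usage = X[:, 1:]
    if day_major:
        usage = usage.reshape(-1, N_DAYS, N_PARAMS)
    else:
        usage = usage.reshape(-1, N_PARAMS, N_DAYS).transpose(0, 2, 1)
    return start_dates, usage


# Synthetic stand-in for a batch; real values would come from read_batch().
X_demo = np.zeros((2, 241))
X_demo[:, 0] = [153, 224]
dates, usage = split_features(X_demo)
print(dates[0], usage.shape)  # 2016-06-02 (2, 16, 15)
```

With the real data, `usage[i, d]` would then hold the 15 parameters of user `i` on trial day `d`, provided the ordering assumption holds.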

# Training

First, we instantiate the network. It will have two fully connected hidden layers of size 250 with dropout (50% in the first run):

In [11]:
from repsly_nn import RepslyFC

repsly_nn = RepslyFC()
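The internals of `RepslyFC` are not shown in this notebook. As a rough illustration of what such an architecture computes, the NumPy sketch below runs a forward pass through two hidden layers of size 250 with inverted dropout, which rescales the surviving units by `1/keep_prob` as TensorFlow's dropout op does. The ReLU activation, the two output logits, the weight initialisation, and the function names are all assumptions made for this example.

```python
import numpy as np


def dropout(a, keep_prob, rng):
    """Inverted dropout: keep each unit with probability keep_prob, rescale the rest."""
    mask = rng.random(a.shape) < keep_prob
    return a * mask / keep_prob


def forward(X, params, keep_prob, rng, training=True):
    """Hidden layers use ReLU (assumed) plus dropout; the last layer is linear."""
    h = X
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)
        if training:
            h = dropout(h, keep_prob, rng)
    W_out, b_out = params[-1]
    return h @ W_out + b_out


rng = np.random.default_rng(0)
sizes = [241, 250, 250, 2]  # input, two hidden layers, two output classes (assumed)
params = [(rng.normal(0.0, 0.05, (m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

logits = forward(rng.random((20, 241)), params, keep_prob=0.5, rng=rng)
print(logits.shape)  # (20, 2)
```

At evaluation time dropout is switched off (`training=False`), so the same input always produces the same logits.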

Then we train it for 20 epochs at each of five dropout keep probabilities, from 0.5 to 0.9. (NB: there is a bug with restoring checkpoints.)
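The training cell below passes `learning_rate`, `decay_steps`, and `decay_rate` to `create_net`. How `RepslyFC` uses them is not shown here. Assuming they feed a standard exponential decay schedule (the formula used by TensorFlow's `tf.train.exponential_decay`), the effective learning rate would evolve as in this sketch. The function name and the `staircase` flag are illustrative.

```python
def decayed_learning_rate(step, learning_rate=0.001, decay_steps=10,
                          decay_rate=0.99, staircase=False):
    """learning_rate * decay_rate ** (step / decay_steps).

    With staircase=True the exponent is truncated to an integer, so the
    rate drops in discrete jumps every decay_steps steps.
    """
    exponent = step // decay_steps if staircase else step / decay_steps
    return learning_rate * decay_rate ** exponent


for step in (0, 100, 790):
    print(step, decayed_learning_rate(step))
```

Under this assumption the rate shrinks by 1% every 10 steps, so after 100 steps it is a little under 10% below its initial value of 0.001.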

In [12]:
arch = [250, 250]                        # sizes of the two hidden layers
keep_probs = [0.5, 0.6, 0.7, 0.8, 0.9]   # dropout keep probabilities to try
learning_rate = 0.001
decay_steps = 10
decay_rate = 0.99

batch_size = 128
epochs_loops = 1      # number of train() calls per configuration
epochs_at_once = 20   # epochs per train() call
skip_steps = 10       # steps between logged loss values

# Build and train one network per keep probability.
for keep_prob in keep_probs:
    arch_dict = {'keep_prob': keep_prob}

    repsly_nn.create_net(arch, arch_dict, learning_rate, decay_steps, decay_rate)
    for i in range(epochs_loops):
        print('Training for {} epochs...'.format(epochs_at_once))
        repsly_nn.train(data=repsly_data, batch_size=batch_size, epochs=epochs_at_once, skip_steps=skip_steps)

Training for 20 epochs...
Checkpoint directory is: /Users/davor/projects/deep_learning/repsly_challenge/checkpoints/RepslyFC-[250,250]/keep_prob-0.5/lr-0.001/dr-0.99/ds-10
Creating tf.train.Saver()...done
self.checkpoint_path: checkpoints/RepslyFC-[250,250]/keep_prob-0.5/lr-0.001/dr-0.99/ds-10
ckpt: None
[00000/1.0 sec]   train/validation loss = 15.46151/4.83147
[00010/2.8 sec]   train/validation loss = 3.84949/5.72295
[00020/3.9 sec]   train/validation loss = 4.00909/1.41547
[00030/4.8 sec]   train/validation loss = 2.16797/3.04622
[00040/6.0 sec]   train/validation loss = 2.43950/1.30083
[00050/6.9 sec]   train/validation loss = 0.87334/1.12709
[00060/8.0 sec]   train/validation loss = 2.32095/0.66576
[00070/8.9 sec]   train/validation loss = 1.20375/0.88096
[00080/10.0 sec]   train/validation loss = 1.34568/1.42624
[00090/11.1 sec]   train/validation loss = 0.79435/0.41533
[00100/12.3 sec]   train/validation loss = 1.35675/0.28633
[00110/13.3 sec]   train/validation loss = 0.81194/0

[00490/53.8 sec]   train/validation loss = 0.14931/0.29438
[00500/54.9 sec]   train/validation loss = 0.29203/0.32558
[00510/55.8 sec]   train/validation loss = 0.26386/0.27849
[00520/56.9 sec]   train/validation loss = 0.27338/0.44598
[00530/57.8 sec]   train/validation loss = 0.18220/0.28040
[00540/58.8 sec]   train/validation loss = 0.23362/0.30003
[00550/59.9 sec]   train/validation loss = 0.20747/0.25962
[00560/60.9 sec]   train/validation loss = 0.29770/0.48444
[00570/62.0 sec]   train/validation loss = 0.15656/0.27869
[00580/62.8 sec]   train/validation loss = 0.28410/0.27251
[00590/63.9 sec]   train/validation loss = 0.24578/0.26532
[00600/64.8 sec]   train/validation loss = 0.26121/0.48008
[00610/65.9 sec]   train/validation loss = 0.18749/0.26599
[00620/66.9 sec]   train/validation loss = 0.23981/0.31227
[00630/68.0 sec]   train/validation loss = 0.20630/0.24752
[00640/69.0 sec]   train/validation loss = 0.21789/0.47910
[00650/70.2 sec]   train/validation loss = 0.22518/0.253

[00180/20.0 sec]   train/validation loss = 0.30650/0.27195
[00190/20.9 sec]   train/validation loss = 0.23183/0.28080
[00200/22.1 sec]   train/validation loss = 0.31852/0.39104
[00210/23.0 sec]   train/validation loss = 0.21127/0.27765
[00220/24.1 sec]   train/validation loss = 0.29865/0.24159
[00230/25.2 sec]   train/validation loss = 0.23052/0.28234
[00240/26.1 sec]   train/validation loss = 0.26897/0.31182
[00250/27.3 sec]   train/validation loss = 0.16432/0.28031
[00260/28.2 sec]   train/validation loss = 0.26964/0.23288
[00270/29.3 sec]   train/validation loss = 0.22110/0.27867
[00280/30.3 sec]   train/validation loss = 0.31870/0.30226
[00290/31.3 sec]   train/validation loss = 0.17293/0.27924
[00300/32.2 sec]   train/validation loss = 0.21102/0.24452
[00310/33.4 sec]   train/validation loss = 0.23126/0.28214
[00320/34.5 sec]   train/validation loss = 0.24199/0.29705
[00330/35.5 sec]   train/validation loss = 0.20847/0.25519
[00340/36.6 sec]   train/validation loss = 0.30885/0.214

[00720/80.4 sec]   train/validation loss = 0.16324/0.90012
[00730/81.5 sec]   train/validation loss = 0.13735/0.18363
[00740/82.7 sec]   train/validation loss = 0.19308/0.27203
[00750/83.6 sec]   train/validation loss = 0.17320/0.22721
[00760/84.9 sec]   train/validation loss = 0.20289/0.98847
[00770/85.9 sec]   train/validation loss = 0.15324/0.20274
[00780/87.1 sec]   train/validation loss = 0.16484/0.27151
[00790/88.1 sec]   train/validation loss = 0.14651/0.23052
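The loss lines above follow a regular format, so they can be turned into numbers for plotting or for picking the step with the lowest validation loss. The sketch below is one way to do that with a regular expression; `parse_log` and `LOG_LINE` are illustrative names, and the pattern assumes five decimal places per loss, which also skips lines that were cut off in the captured output.

```python
import re

# Matches e.g. "[00770/85.9 sec]   train/validation loss = 0.15324/0.20274".
# Requiring five decimals rejects truncated values such as "0.81194/0".
LOG_LINE = re.compile(
    r"\[(?P<step>\d+)/(?P<secs>\d+\.\d+) sec\]\s+"
    r"train/validation loss = (?P<train>\d+\.\d{5})/(?P<val>\d+\.\d{5})$"
)


def parse_log(text):
    """Return a list of (step, seconds, train_loss, validation_loss) tuples."""
    rows = []
    for line in text.splitlines():
        m = LOG_LINE.match(line.strip())
        if m:
            rows.append((int(m["step"]), float(m["secs"]),
                         float(m["train"]), float(m["val"])))
    return rows


sample = """\
[00770/85.9 sec]   train/validation loss = 0.15324/0.20274
[00780/87.1 sec]   train/validation loss = 0.16484/0.27151
[00790/88.1 sec]   train/validation loss = 0.14651/0.23052
"""
rows = parse_log(sample)
best = min(rows, key=lambda r: r[3])  # row with the lowest validation loss
print(best)  # (770, 85.9, 0.15324, 0.20274)
```

Because the output interleaves several runs (one per `keep_prob`), step numbers restart; split the text per run before comparing rows.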
