Problems creating new datafiles #28

Open
ftamburin opened this issue Nov 10, 2015 · 1 comment
@ftamburin

I have just cloned the pdnn package, verified that the mnist/mnist_rbm examples work, and am now trying to build some new examples to verify the pickle-file creation before working on my real data.
First, I reproduced the example at
https://www.cs.cmu.edu/~ymiao/pdnntk/data.html
by writing this Python script, which creates a sample file:


import cPickle, numpy, gzip
feature = numpy.array([[0.2, 0.3, 0.5, 1.4], [1.3, 2.1, 0.3, 0.1], [0.3, 0.5, 0.5, 1.4]], dtype = 'float32')
label = numpy.array([2, 0, 1])
with gzip.open('filename.pkl.gz', 'wb') as f:
    cPickle.dump((feature, label), f)

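A quick way to sanity-check the file before feeding it to PDNN is to read it straight back and confirm the shapes. This is a hedged sketch, not part of the original report: it uses the stdlib pickle module (the Python 3 counterpart of cPickle), with protocol=2 so the file stays readable from Python 2 as well.

```python
import gzip
import pickle
import numpy

feature = numpy.array([[0.2, 0.3, 0.5, 1.4],
                       [1.3, 2.1, 0.3, 0.1],
                       [0.3, 0.5, 0.5, 1.4]], dtype='float32')
label = numpy.array([2, 0, 1])

# Write the (feature, label) tuple the way the PDNN data page describes
with gzip.open('filename.pkl.gz', 'wb') as f:
    pickle.dump((feature, label), f, protocol=2)

# Read it back to confirm the round trip preserved shapes and values
with gzip.open('filename.pkl.gz', 'rb') as f:
    feature2, label2 = pickle.load(f)

print(feature2.shape, label2.tolist())  # (3, 4) [2, 0, 1]
```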

The creation process went fine, but when I tried to run a simple DNN training with the following script


#!/bin/bash

# two variables you need to set
pdnndir=/home/guest-fac/tamburin/pdnn   # pointer to PDNN
device=cpu   # the device to be used; set it to "cpu" if you don't have GPUs

# export environment variables
export PYTHONPATH=$PYTHONPATH:$pdnndir
export THEANO_FLAGS=mode=FAST_RUN,device=$device,floatX=float32

rm *.tmp

# train DNN
python $pdnndir/cmds/run_DNN.py --train-data "filename.pkl.gz" --valid-data "filename.pkl.gz" \
       --nnet-spec "4:5:3" --wdir ./ --param-output-file dnn.mdl --cfg-output-file dnn.cfg


I get the following output:

[2015-11-10 13:20:47.589817] > ... building the model
[2015-11-10 13:20:47.603441] > ... getting the finetuning functions
[2015-11-10 13:20:48.612798] > ... finetuning the model
/usr/lib/python2.7/dist-packages/numpy/core/_methods.py:55: RuntimeWarning: Mean of empty slice.
warnings.warn("Mean of empty slice.", RuntimeWarning)
[2015-11-10 13:20:48.614276] > epoch 1, training error nan (%)
[2015-11-10 13:20:48.615054] > epoch 1, lrate 0.080000, validation error nan (%)
[2015-11-10 13:20:48.619409] > epoch 2, training error nan (%)
[2015-11-10 13:20:48.619491] > epoch 2, lrate 0.080000, validation error nan (%)
[2015-11-10 13:20:48.622980] > epoch 3, training error nan (%)
[2015-11-10 13:20:48.623059] > epoch 3, lrate 0.080000, validation error nan (%)
[2015-11-10 13:20:48.626443] > epoch 4, training error nan (%)

and nothing ever changes...
I actually get this behavior with many different datasets, but I reproduced it here with this simple example for clarity.
Any idea what the problem is?
I see this on MacOSX 10.10 (Python 2.7.10) and on Linux SMP Debian 3.16.7 (Python 2.7.9), so it should not depend on the local Python installation.
Any help is more than welcome.
Thanks!
Fabio

@intfloat

This issue comes from this line of code: https://github.com/yajiemiao/pdnn/blob/master/learning/sgd.py#L71.

The default batch_size is 256, much larger than the training-set size of 3. Integer division then gives train_sets.cur_frame_num / batch_size = 0, so train_error stays [], and numpy.mean([]) emits the warning you see and returns nan.

In one sentence: the boundary condition is not handled correctly.
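The failure mode can be reproduced in isolation. A minimal sketch (the variable names mirror sgd.py but are illustrative, not PDNN's actual code):

```python
import warnings
import numpy

cur_frame_num = 3    # size of the toy training set
batch_size = 256     # PDNN's default minibatch size

# Integer division yields zero minibatches, so the training loop
# body never executes and no per-batch errors are collected.
n_batches = cur_frame_num // batch_size
train_error = [0.0 for _ in range(n_batches)]   # stays empty

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    mean_error = numpy.mean(train_error)        # mean of an empty list -> nan

print(n_batches, mean_error)  # 0 nan, plus the "Mean of empty slice" RuntimeWarning
```

Guarding the batch count (e.g. treating a dataset smaller than batch_size as a single batch) is enough to avoid the nan.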

I fixed this in my pull request; it only changes a few lines of code.

Below is the output from running your script after the fix (I added one extra option, --lrate "C:0.1:10", to stop it from running indefinitely).

[2015-12-12 10:42:00.854358] > ... building the model
[2015-12-12 10:42:00.864003] > ... getting the finetuning functions
[2015-12-12 10:42:02.142837] > ... finetuning the model
[2015-12-12 10:42:02.145008] > epoch 1, training error 66.666667 (%)
[2015-12-12 10:42:02.146348] > epoch 1, lrate 0.100000, validation error 33.333333 (%)
[2015-12-12 10:42:02.148447] > epoch 2, training error 33.333333 (%)
[2015-12-12 10:42:02.148744] > epoch 2, lrate 0.100000, validation error 33.333333 (%)
[2015-12-12 10:42:02.149959] > epoch 3, training error 33.333333 (%)
[2015-12-12 10:42:02.150215] > epoch 3, lrate 0.100000, validation error 33.333333 (%)
[2015-12-12 10:42:02.151403] > epoch 4, training error 33.333333 (%)
[2015-12-12 10:42:02.151596] > epoch 4, lrate 0.100000, validation error 33.333333 (%)
[2015-12-12 10:42:02.152745] > epoch 5, training error 33.333333 (%)
[2015-12-12 10:42:02.152934] > epoch 5, lrate 0.100000, validation error 33.333333 (%)
[2015-12-12 10:42:02.154048] > epoch 6, training error 33.333333 (%)
[2015-12-12 10:42:02.154237] > epoch 6, lrate 0.100000, validation error 33.333333 (%)
[2015-12-12 10:42:02.155377] > epoch 7, training error 33.333333 (%)
[2015-12-12 10:42:02.155566] > epoch 7, lrate 0.100000, validation error 33.333333 (%)
[2015-12-12 10:42:02.156708] > epoch 8, training error 33.333333 (%)
[2015-12-12 10:42:02.156894] > epoch 8, lrate 0.100000, validation error 0.000000 (%)
[2015-12-12 10:42:02.158023] > epoch 9, training error 0.000000 (%)
[2015-12-12 10:42:02.158214] > epoch 9, lrate 0.100000, validation error 0.000000 (%)
[2015-12-12 10:42:02.159442] > epoch 10, training error 0.000000 (%)
[2015-12-12 10:42:02.159636] > epoch 10, lrate 0.100000, validation error 0.000000 (%)
[2015-12-12 10:42:02.161165] > ... the final PDNN model parameter is dnn.mdl
[2015-12-12 10:42:02.161569] > ... the final PDNN model config is dnn.cfg

Hope it helps.

intfloat added a commit to intfloat/pdnn that referenced this issue Dec 12, 2015
@intfloat intfloat mentioned this issue Dec 12, 2015
intfloat added a commit to intfloat/pdnn that referenced this issue Dec 12, 2015