Problems creating new datafiles #28

Open
ftamburin opened this issue Nov 10, 2015 · 1 comment
@ftamburin

I have just cloned the pdnn package, verified that the mnist/mnist_rbm examples work, and am now trying to build some new examples to verify the pickle-file creation before working on my real data.
First, I reproduced the example at
https://www.cs.cmu.edu/~ymiao/pdnntk/data.html
by writing this Python script, which creates a sample file:


import cPickle, numpy, gzip
feature = numpy.array([[0.2, 0.3, 0.5, 1.4], [1.3, 2.1, 0.3, 0.1], [0.3, 0.5, 0.5, 1.4]], dtype = 'float32')
label = numpy.array([2, 0, 1])
with gzip.open('filename.pkl.gz', 'wb') as f:
    cPickle.dump((feature, label), f)

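A quick way to sanity-check the file before feeding it to PDNN is to read it straight back and confirm the shapes. This is a hedged sketch, not part of the original report: it uses the stdlib pickle module (the Python 3 counterpart of cPickle), with protocol=2 so the file stays readable from Python 2 as well.

```python
import gzip
import pickle
import numpy

feature = numpy.array([[0.2, 0.3, 0.5, 1.4],
                       [1.3, 2.1, 0.3, 0.1],
                       [0.3, 0.5, 0.5, 1.4]], dtype='float32')
label = numpy.array([2, 0, 1])

# Write the (feature, label) tuple the way the PDNN data page describes
with gzip.open('filename.pkl.gz', 'wb') as f:
    pickle.dump((feature, label), f, protocol=2)

# Read it back to confirm the round trip preserved shapes and values
with gzip.open('filename.pkl.gz', 'rb') as f:
    feature2, label2 = pickle.load(f)

print(feature2.shape, label2.tolist())  # (3, 4) [2, 0, 1]
```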

The creation process went fine, but when I tried to run a simple DNN training with the following script


#!/bin/bash

# two variables you need to set
pdnndir=/home/guest-fac/tamburin/pdnn   # pointer to PDNN
device=cpu   # the device to be used; set it to "cpu" if you don't have GPUs

# export environment variables
export PYTHONPATH=$PYTHONPATH:$pdnndir
export THEANO_FLAGS=mode=FAST_RUN,device=$device,floatX=float32

rm *.tmp

# train DNN
python $pdnndir/cmds/run_DNN.py --train-data "filename.pkl.gz" --valid-data "filename.pkl.gz" \
       --nnet-spec "4:5:3" --wdir ./ --param-output-file dnn.mdl --cfg-output-file dnn.cfg


I get the following output:

[2015-11-10 13:20:47.589817] > ... building the model
[2015-11-10 13:20:47.603441] > ... getting the finetuning functions
[2015-11-10 13:20:48.612798] > ... finetuning the model
/usr/lib/python2.7/dist-packages/numpy/core/_methods.py:55: RuntimeWarning: Mean of empty slice.
warnings.warn("Mean of empty slice.", RuntimeWarning)
[2015-11-10 13:20:48.614276] > epoch 1, training error nan (%)
[2015-11-10 13:20:48.615054] > epoch 1, lrate 0.080000, validation error nan (%)
[2015-11-10 13:20:48.619409] > epoch 2, training error nan (%)
[2015-11-10 13:20:48.619491] > epoch 2, lrate 0.080000, validation error nan (%)
[2015-11-10 13:20:48.622980] > epoch 3, training error nan (%)
[2015-11-10 13:20:48.623059] > epoch 3, lrate 0.080000, validation error nan (%)
[2015-11-10 13:20:48.626443] > epoch 4, training error nan (%)

and nothing ever changes...
I actually get this behavior with many different datasets, but I reproduced it here with this simple example for clarity.
Any idea what the problem is?
I see this on MacOSX 10.10 (Python 2.7.10) and on Linux SMP Debian 3.16.7 (Python 2.7.9), so it should not depend on the local Python installation.
Any help is more than welcome.
Thanks!
Fabio

@intfloat

This issue comes from this line of code: https://github.com/yajiemiao/pdnn/blob/master/learning/sgd.py#L71.

The default batch_size is 256, much larger than the training-set size of 3. Integer division then gives train_sets.cur_frame_num / batch_size = 0, so train_error stays [], and numpy.mean([]) emits the warning you see and returns nan.

In one sentence: the boundary condition is not handled correctly.
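The failure mode can be reproduced in isolation. A minimal sketch (the variable names mirror sgd.py but are illustrative, not PDNN's actual code):

```python
import warnings
import numpy

cur_frame_num = 3    # size of the toy training set
batch_size = 256     # PDNN's default minibatch size

# Integer division yields zero minibatches, so the training loop
# body never executes and no per-batch errors are collected.
n_batches = cur_frame_num // batch_size
train_error = [0.0 for _ in range(n_batches)]   # stays empty

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    mean_error = numpy.mean(train_error)        # mean of an empty list -> nan

print(n_batches, mean_error)  # 0 nan, plus the "Mean of empty slice" RuntimeWarning
```

Guarding the batch count (e.g. treating a dataset smaller than batch_size as a single batch) is enough to avoid the nan.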

I fixed this in my pull request; it only changes a few lines of code.

Below is the output from running your script after the fix (I added one extra option, --lrate "C:0.1:10", to stop it from running indefinitely).

[2015-12-12 10:42:00.854358] > ... building the model
[2015-12-12 10:42:00.864003] > ... getting the finetuning functions
[2015-12-12 10:42:02.142837] > ... finetuning the model
[2015-12-12 10:42:02.145008] > epoch 1, training error 66.666667 (%)
[2015-12-12 10:42:02.146348] > epoch 1, lrate 0.100000, validation error 33.333333 (%)
[2015-12-12 10:42:02.148447] > epoch 2, training error 33.333333 (%)
[2015-12-12 10:42:02.148744] > epoch 2, lrate 0.100000, validation error 33.333333 (%)
[2015-12-12 10:42:02.149959] > epoch 3, training error 33.333333 (%)
[2015-12-12 10:42:02.150215] > epoch 3, lrate 0.100000, validation error 33.333333 (%)
[2015-12-12 10:42:02.151403] > epoch 4, training error 33.333333 (%)
[2015-12-12 10:42:02.151596] > epoch 4, lrate 0.100000, validation error 33.333333 (%)
[2015-12-12 10:42:02.152745] > epoch 5, training error 33.333333 (%)
[2015-12-12 10:42:02.152934] > epoch 5, lrate 0.100000, validation error 33.333333 (%)
[2015-12-12 10:42:02.154048] > epoch 6, training error 33.333333 (%)
[2015-12-12 10:42:02.154237] > epoch 6, lrate 0.100000, validation error 33.333333 (%)
[2015-12-12 10:42:02.155377] > epoch 7, training error 33.333333 (%)
[2015-12-12 10:42:02.155566] > epoch 7, lrate 0.100000, validation error 33.333333 (%)
[2015-12-12 10:42:02.156708] > epoch 8, training error 33.333333 (%)
[2015-12-12 10:42:02.156894] > epoch 8, lrate 0.100000, validation error 0.000000 (%)
[2015-12-12 10:42:02.158023] > epoch 9, training error 0.000000 (%)
[2015-12-12 10:42:02.158214] > epoch 9, lrate 0.100000, validation error 0.000000 (%)
[2015-12-12 10:42:02.159442] > epoch 10, training error 0.000000 (%)
[2015-12-12 10:42:02.159636] > epoch 10, lrate 0.100000, validation error 0.000000 (%)
[2015-12-12 10:42:02.161165] > ... the final PDNN model parameter is dnn.mdl
[2015-12-12 10:42:02.161569] > ... the final PDNN model config is dnn.cfg

Hope it helps.

intfloat added a commit to intfloat/pdnn that referenced this issue Dec 12, 2015
@intfloat intfloat mentioned this issue Dec 12, 2015
intfloat added a commit to intfloat/pdnn that referenced this issue Dec 12, 2015