
The error of training an RNNsearch model #30

Closed
Julisa-test opened this issue May 1, 2018 · 6 comments

@Julisa-test

Here is my command and the resulting error. What causes the error, and how can I solve it?
(venv-2.7.14) ubuntu@ubuntu:~/python2.7/tensorflow$ python THUMT/thumt/bin/trainer.py --input corpus.tc.32k.de.shuf corpus.tc.32k.en.shuf --vocabulary vocab.32k.de.txt vocab.32k.en.txt --model rnnsearch --validation newstest2014.tc.32k.de --references newstest2014.tc.32k.en --parameters=batch_size=128,device_list=[0],train_steps=200000
INFO:tensorflow:Restoring hyper parameters from /home/ubuntu/python2.7/tensorflow/train/params.json
Traceback (most recent call last):
  File "THUMT/thumt/bin/trainer.py", line 472, in <module>
    main(parse_args())
  File "THUMT/thumt/bin/trainer.py", line 317, in main
    params = import_params(args.output, args.model, params)
  File "THUMT/thumt/bin/trainer.py", line 122, in import_params
    params.parse_json(json_str)
  File "/home/ubuntu/.pyenv/versions/venv-2.7.14/lib/python2.7/site-packages/tensorflow/contrib/training/python/training/hparam.py", line 587, in parse_json
    return self.override_from_dict(values_map)
  File "/home/ubuntu/.pyenv/versions/venv-2.7.14/lib/python2.7/site-packages/tensorflow/contrib/training/python/training/hparam.py", line 539, in override_from_dict
    self.set_hparam(name, value)
  File "/home/ubuntu/.pyenv/versions/venv-2.7.14/lib/python2.7/site-packages/tensorflow/contrib/training/python/training/hparam.py", line 490, in set_hparam
    param_type, is_list = self._hparam_types[name]
KeyError: u'num_hidden_layers'
Thanks in advance.

@Playinf
Collaborator

Playinf commented May 1, 2018

num_hidden_layers is a hyper-parameter specific to the seq2seq architecture. This error should not happen unless you have renamed seq2seq.json to rnnsearch.json. I suggest deleting the train directory and trying the command again.
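For context, the KeyError in the traceback comes from tf.contrib's HParams rejecting any hyper-parameter name that was not registered when the model's parameters were constructed. A simplified stand-in (not THUMT's or TensorFlow's actual code) illustrating the mechanism:

```python
# Simplified stand-in for tensorflow.contrib.training.HParams: it only
# accepts hyper-parameter names registered at construction time, so a
# stale key in a restored params.json raises KeyError.
class HParams(object):
    def __init__(self, **kwargs):
        # Maps each known hyper-parameter name to its type.
        self._hparam_types = {k: type(v) for k, v in kwargs.items()}
        for k, v in kwargs.items():
            setattr(self, k, v)

    def override_from_dict(self, values_map):
        for name, value in values_map.items():
            self.set_hparam(name, value)

    def set_hparam(self, name, value):
        # Unknown names are missing from _hparam_types, so the lookup
        # raises KeyError, matching the failure in the traceback above.
        param_type = self._hparam_types[name]
        setattr(self, name, param_type(value))

# rnnsearch's parameters define no num_hidden_layers, but a params.json
# left over from a seq2seq run contains it, so restoring it fails:
params = HParams(batch_size=128)
try:
    params.override_from_dict({"num_hidden_layers": 6})
except KeyError as err:
    print("KeyError:", err)
```

This is why deleting the train directory helps: trainer.py then regenerates params.json with only the rnnsearch hyper-parameters.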

@Julisa-test
Author

I just downloaded all of the THUMT packages from GitHub and tried to reproduce your experiments following the user manual. The training files were generated automatically after running the code in section 3.2.1 of the user manual. I did not rename any files; the error appeared right after running the code in 3.2.2. I am using Python 2.7.0 and TensorFlow 1.6.0. Are the versions I am using wrong?

@Playinf
Collaborator

Playinf commented May 1, 2018

That's weird. I have tested the latest commit of THUMT and did not encounter this problem. Have you tried deleting the train directory and re-running the command?
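Concretely, the reset suggested here amounts to removing the output directory (its params.json holds the stale hyper-parameters) and re-running the original training command unchanged; "train" below is the default output directory seen in the log path above:

```shell
# Delete the stale output directory so trainer.py regenerates
# params.json for the rnnsearch model on the next run.
rm -rf train
```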

@Julisa-test
Author

Hi, thanks for your reply!

I tried deleting the train directory and re-running the command, but now I get this error:

2018-05-01 15:30:16.636478: W tensorflow/core/common_runtime/bfc_allocator.cc:279] **************************************************************************************************xx
2018-05-01 15:30:16.647710: E tensorflow/stream_executor/cuda/cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS
2018-05-01 15:30:16.647734: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:203] Unexpected Event status: 1
2018-05-01 15:30:16.647724: E tensorflow/core/common_runtime/bfc_allocator.cc:381] tried to deallocate nullptr
2018-05-01 15:30:16.647755: E tensorflow/core/common_runtime/bfc_allocator.cc:381] tried to deallocate nullptr
Aborted (core dumped)
NVIDIA driver version: 390.30
CUDA version: 9.0.176
cuDNN version: 7.1.3.16
gcc version: 5.4.0

Should I ignore this?

@Playinf
Collaborator

Playinf commented May 1, 2018

It seems that you have run out of GPU memory. Try reducing the batch_size hyper-parameter.
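For example, the original command with batch_size lowered (64 here is only an illustrative value, not a recommendation from the thread; any smaller setting that fits the GPU's memory should work):

```shell
# Same command as before, with batch_size reduced from 128 to 64
# to lower peak GPU memory usage.
python THUMT/thumt/bin/trainer.py \
  --input corpus.tc.32k.de.shuf corpus.tc.32k.en.shuf \
  --vocabulary vocab.32k.de.txt vocab.32k.en.txt \
  --model rnnsearch \
  --validation newstest2014.tc.32k.de \
  --references newstest2014.tc.32k.en \
  --parameters=batch_size=64,device_list=[0],train_steps=200000
```

Note that the train directory must be deleted again before changing hyper-parameters, since trainer.py restores the previously saved params.json.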

@Julisa-test
Author

Thanks very much, it finally works!
