
The error of training an RNNsearch model #30

Closed
Julisa-test opened this issue May 1, 2018 · 6 comments

@Julisa-test

Here is my command and the resulting error. What causes the error, and how can I solve it?
(venv-2.7.14) ubuntu@ubuntu:~/python2.7/tensorflow$ python THUMT/thumt/bin/trainer.py --input corpus.tc.32k.de.shuf corpus.tc.32k.en.shuf --vocabulary vocab.32k.de.txt vocab.32k.en.txt --model rnnsearch --validation newstest2014.tc.32k.de --references newstest2014.tc.32k.en --parameters=batch_size=128,device_list=[0],train_steps=200000
INFO:tensorflow:Restoring hyper parameters from /home/ubuntu/python2.7/tensorflow/train/params.json
Traceback (most recent call last):
  File "THUMT/thumt/bin/trainer.py", line 472, in <module>
    main(parse_args())
  File "THUMT/thumt/bin/trainer.py", line 317, in main
    params = import_params(args.output, args.model, params)
  File "THUMT/thumt/bin/trainer.py", line 122, in import_params
    params.parse_json(json_str)
  File "/home/ubuntu/.pyenv/versions/venv-2.7.14/lib/python2.7/site-packages/tensorflow/contrib/training/python/training/hparam.py", line 587, in parse_json
    return self.override_from_dict(values_map)
  File "/home/ubuntu/.pyenv/versions/venv-2.7.14/lib/python2.7/site-packages/tensorflow/contrib/training/python/training/hparam.py", line 539, in override_from_dict
    self.set_hparam(name, value)
  File "/home/ubuntu/.pyenv/versions/venv-2.7.14/lib/python2.7/site-packages/tensorflow/contrib/training/python/training/hparam.py", line 490, in set_hparam
    param_type, is_list = self._hparam_types[name]
KeyError: u'num_hidden_layers'
Thanks in advance.

@Playinf
Collaborator

Playinf commented May 1, 2018

num_hidden_layers is a hyper-parameter specific to the seq2seq architecture. This error should not happen unless you have renamed seq2seq.json to rnnsearch.json. I suggest deleting the train directory and trying the command again.
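For context, the KeyError in the traceback comes from tf.contrib's HParams rejecting any hyper-parameter name that was not registered when the model's parameters were constructed. A simplified stand-in (not THUMT's or TensorFlow's actual code) illustrating the mechanism:

```python
# Simplified stand-in for tensorflow.contrib.training.HParams: it only
# accepts hyper-parameter names registered at construction time, so a
# stale key in a restored params.json raises KeyError.
class HParams(object):
    def __init__(self, **kwargs):
        # Maps each known hyper-parameter name to its type.
        self._hparam_types = {k: type(v) for k, v in kwargs.items()}
        for k, v in kwargs.items():
            setattr(self, k, v)

    def override_from_dict(self, values_map):
        for name, value in values_map.items():
            self.set_hparam(name, value)

    def set_hparam(self, name, value):
        # Unknown names are missing from _hparam_types, so the lookup
        # raises KeyError, matching the failure in the traceback above.
        param_type = self._hparam_types[name]
        setattr(self, name, param_type(value))

# rnnsearch's parameters define no num_hidden_layers, but a params.json
# left over from a seq2seq run contains it, so restoring it fails:
params = HParams(batch_size=128)
try:
    params.override_from_dict({"num_hidden_layers": 6})
except KeyError as err:
    print("KeyError:", err)
```

This is why deleting the train directory helps: trainer.py then regenerates params.json with only the rnnsearch hyper-parameters.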

@Julisa-test
Author

I just downloaded all of the THUMT packages from GitHub and tried to reproduce your experiments following the user manual. The training files were generated automatically after running the code in section 3.2.1 of the user manual. I did not rename any files; the error appeared right after running the code in 3.2.2. I am using Python 2.7.0 and TensorFlow 1.6.0. Are the versions I am using wrong?

@Playinf
Collaborator

Playinf commented May 1, 2018

That's weird. I have tested the latest commit of THUMT and did not encounter this problem. Have you tried deleting the train directory and re-running the command?
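Concretely, the reset suggested here amounts to removing the output directory (its params.json holds the stale hyper-parameters) and re-running the original training command unchanged; "train" below is the default output directory seen in the log path above:

```shell
# Delete the stale output directory so trainer.py regenerates
# params.json for the rnnsearch model on the next run.
rm -rf train
```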

@Julisa-test
Author

Hi, thanks for your reply!

I tried deleting the train directory and re-running the command, but now I get this error:

2018-05-01 15:30:16.636478: W tensorflow/core/common_runtime/bfc_allocator.cc:279] **************************************************************************************************xx
2018-05-01 15:30:16.647710: E tensorflow/stream_executor/cuda/cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS
2018-05-01 15:30:16.647734: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:203] Unexpected Event status: 1
2018-05-01 15:30:16.647724: E tensorflow/core/common_runtime/bfc_allocator.cc:381] tried to deallocate nullptr
2018-05-01 15:30:16.647755: E tensorflow/core/common_runtime/bfc_allocator.cc:381] tried to deallocate nullptr
Aborted (core dumped)
NVIDIA driver version: 390.30
CUDA version: 9.0.176
cuDNN version: 7.1.3.16
gcc version: 5.4.0

Should I ignore this?

@Playinf
Collaborator

Playinf commented May 1, 2018

It seems that you have run out of GPU memory. Try reducing the batch_size hyper-parameter.
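For example, the original command with batch_size lowered (64 here is only an illustrative value, not a recommendation from the thread; any smaller setting that fits the GPU's memory should work):

```shell
# Same command as before, with batch_size reduced from 128 to 64
# to lower peak GPU memory usage.
python THUMT/thumt/bin/trainer.py \
  --input corpus.tc.32k.de.shuf corpus.tc.32k.en.shuf \
  --vocabulary vocab.32k.de.txt vocab.32k.en.txt \
  --model rnnsearch \
  --validation newstest2014.tc.32k.de \
  --references newstest2014.tc.32k.en \
  --parameters=batch_size=64,device_list=[0],train_steps=200000
```

Note that the train directory must be deleted again before changing hyper-parameters, since trainer.py restores the previously saved params.json.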

@Julisa-test
Author

Thanks very much, it finally works!
