[BUG] for training part of policy gradient #13

Closed
sherlock1987 opened this issue May 27, 2020 · 5 comments
Labels: bug (Something isn't working)

Comments

@sherlock1987

Describe the bug
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [40,0,0], thread: [0,0,0] Assertion val >= zero failed.
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [40,0,0], thread: [1,0,0] Assertion val >= zero failed.
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [9,0,0], thread: [0,0,0] Assertion val >= zero failed.

This happens when running train.py in convlab2/policy/pg. It appears after about 15 epochs, even though I have already loaded the MLE model.
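A note on reading the error: the device-side assert above is raised asynchronously, so the Python stack trace may point at an unrelated op. Setting CUDA_LAUNCH_BLOCKING=1 (or running the policy on CPU) surfaces the failure at the call that actually triggered it; a minimal sketch using the standard PyTorch environment variable:

```python
import os

# Must be set before the first CUDA call, so that kernel-side asserts are
# reported synchronously at the op that actually triggered them.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # imported after setting the flag
```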

@sherlock1987
Author

And it is caused by this:
DEBUG:root:<> epoch 19, iteration 0, policy, loss -52.303607639513515
DEBUG:root:<> epoch 19, iteration 1, policy, loss nan
DEBUG:root:<> epoch 19, iteration 2, policy, loss nan
DEBUG:root:<> epoch 19, iteration 3, policy, loss nan
DEBUG:root:<> epoch 19, iteration 4, policy, loss nan
INFO:root:<> epoch 19: saved network to mdl
Process SpawnProcess-81:
Traceback (most recent call last):
File "/home/raliegh/anaconda3/envs/convlab/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/home/raliegh/anaconda3/envs/convlab/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/home/raliegh/图片/ConvLab-2/convlab2/policy/pg/train.py", line 61, in sampler
a = policy.predict(s)
File "/home/raliegh/视频/ConvLab-2/convlab2/policy/pg/pg.py", line 55, in predict
a = self.policy.select_action(s_vec.to(device=DEVICE), self.is_train).cpu()
File "/home/raliegh/视频/ConvLab-2/convlab2/policy/rlmodule.py", line 92, in select_action
a = a_probs.multinomial(1).squeeze(1) if sample else a_probs.argmax(1)
RuntimeError: invalid multinomial distribution (encountering probability entry < 0)
Process SpawnProcess-82:
Traceback (most recent call last):
File "/home/raliegh/anaconda3/envs/convlab/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/home/raliegh/anaconda3/envs/convlab/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/home/raliegh/图片/ConvLab-2/convlab2/policy/pg/train.py", line 61, in sampler
a = policy.predict(s)
File "/home/raliegh/视频/ConvLab-2/convlab2/policy/pg/pg.py", line 55, in predict
a = self.policy.select_action(s_vec.to(device=DEVICE), self.is_train).cpu()
File "/home/raliegh/视频/ConvLab-2/convlab2/policy/rlmodule.py", line 92, in select_action
a = a_probs.multinomial(1).squeeze(1) if sample else a_probs.argmax(1)
RuntimeError: invalid multinomial distribution (encountering probability entry < 0)
Process SpawnProcess-83:
Traceback (most recent call last):
File "/home/raliegh/anaconda3/envs/convlab/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/home/raliegh/anaconda3/envs/convlab/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/home/raliegh/图片/ConvLab-2/convlab2/policy/pg/train.py", line 61, in sampler
a = policy.predict(s)
File "/home/raliegh/视频/ConvLab-2/convlab2/policy/pg/pg.py", line 55, in predict
a = self.policy.select_action(s_vec.to(device=DEVICE), self.is_train).cpu()
File "/home/raliegh/视频/ConvLab-2/convlab2/policy/rlmodule.py", line 92, in select_action
a = a_probs.multinomial(1).squeeze(1) if sample else a_probs.argmax(1)
RuntimeError: invalid multinomial distribution (encountering probability entry < 0)
Process SpawnProcess-84:
Traceback (most recent call last):
File "/home/raliegh/anaconda3/envs/convlab/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/home/raliegh/anaconda3/envs/convlab/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/home/raliegh/图片/ConvLab-2/convlab2/policy/pg/train.py", line 61, in sampler
a = policy.predict(s)
File "/home/raliegh/视频/ConvLab-2/convlab2/policy/pg/pg.py", line 55, in predict
a = self.policy.select_action(s_vec.to(device=DEVICE), self.is_train).cpu()
File "/home/raliegh/视频/ConvLab-2/convlab2/policy/rlmodule.py", line 92, in select_action
a = a_probs.multinomial(1).squeeze(1) if sample else a_probs.argmax(1)
RuntimeError: invalid multinomial distribution (encountering probability entry < 0)
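
For reference, the RuntimeError in these tracebacks is reproducible in isolation: torch.multinomial rejects any distribution that contains negative or non-finite entries, which is what the policy's softmax output becomes once the loss (and hence the weights) turn NaN. A minimal sketch, not code from the repo:

```python
import torch

# Once NaNs reach the policy network, its output a_probs is no longer a valid
# probability vector, and the same sampling call as in rlmodule.py raises.
for bad_probs in (torch.tensor([[-0.1, 0.4, 0.7]]),           # negative entry
                  torch.tensor([[float("nan"), 0.3, 0.7]])):  # NaN entry
    try:
        a = bad_probs.multinomial(1).squeeze(1)
    except RuntimeError as e:
        print(e)
```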

@zqwerty
Member

zqwerty commented May 27, 2020

Sorry, we will check this as soon as possible.

@sherlock1987
Author

That's fine, thank you so much. I think this is related to the loss: once the loss goes to NaN, the multiprocess sampling fails.
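
One possible stop-gap, until the NaN loss itself is fixed, is to validate the probabilities before sampling. This is only a sketch: select_action and a_probs mirror the names in rlmodule.py, but the guard is an assumption, not the fix the maintainers later shipped. Gradient clipping or a lower learning rate in the PG update would be the more direct way to keep the loss from diverging.

```python
import torch

def select_action_safe(a_probs: torch.Tensor, sample: bool = True) -> torch.Tensor:
    # Fall back to a uniform distribution when the policy output contains
    # NaN/Inf or negative entries, so the sampler processes do not crash.
    if not torch.isfinite(a_probs).all() or (a_probs < 0).any():
        a_probs = torch.full_like(a_probs, 1.0 / a_probs.size(-1))
    return a_probs.multinomial(1).squeeze(1) if sample else a_probs.argmax(1)
```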

@zqwerty
Member

zqwerty commented Jun 10, 2020

We have updated the policy to address this issue. Please give it a try!

@zqwerty
Member

zqwerty commented Jul 16, 2020

Moved to #54.
