
adam optim ERROR: If capturable=False, state_steps should not be CUDA tensors. #681

Closed
jaried opened this issue Jul 3, 2022 · 3 comments
Labels
question Further information is requested

Comments

jaried commented Jul 3, 2022

  • I have marked all applicable categories:
  • I have visited the source website
  • I have searched through the issue tracker for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
import tianshou, gym, torch, numpy, sys
print(tianshou.__version__, gym.__version__, torch.__version__, numpy.__version__, sys.version, sys.platform)
0.4.8 0.21.0 1.12.0+cu113 1.20.1 3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)] win32

SAC training was working normally before. Recently, after loading the saved policy weights and optimizer state, the very first learning update fails with the error below; a standalone sketch that reproduces the assert follows the traceback. How can I solve this?

Epoch #1:   0%|                                                                                                                                                                                 | 23/57600 [00:02<1:37:11,  9.87it/s]
('If capturable=False, state_steps should not be CUDA tensors.',)
===========
Traceback (most recent call last):
  File "D:\Tony\Documents\yunpan\invest\2022\Quant\code\factor\myfactor.py", line 684, in wrapper
    func(*args, **kw)
  File "tianshou_if.py", line 378, in sac_with_il_if
    result = offpolicy_trainer(
  File "D:\Anaconda3\lib\site-packages\tianshou\trainer\offpolicy.py", line 129, in offpolicy_trainer
    return OffpolicyTrainer(*args, **kwargs).run()
  File "D:\Anaconda3\lib\site-packages\tianshou\trainer\base.py", line 425, in run
    deque(self, maxlen=0)  # feed the entire iterator into a zero-length deque
  File "D:\Anaconda3\lib\site-packages\tianshou\trainer\base.py", line 282, in __next__
    self.policy_update_fn(data, result)
  File "D:\Anaconda3\lib\site-packages\tianshou\trainer\offpolicy.py", line 118, in policy_update_fn
    losses = self.policy.update(self.batch_size, self.train_collector.buffer)
  File "D:\Anaconda3\lib\site-packages\tianshou\policy\base.py", line 277, in update
    result = self.learn(batch, **kwargs)
  File "D:\Anaconda3\lib\site-packages\tianshou\policy\modelfree\sac.py", line 149, in learn
    td1, critic1_loss = self._mse_optimizer(
  File "D:\Anaconda3\lib\site-packages\tianshou\policy\modelfree\ddpg.py", line 158, in _mse_optimizer
    optimizer.step()
  File "D:\Anaconda3\lib\site-packages\torch\optim\optimizer.py", line 109, in wrapper
    return func(*args, **kwargs)
  File "D:\Anaconda3\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "D:\Anaconda3\lib\site-packages\torch\optim\adam.py", line 157, in step
    adam(params_with_grad,
  File "D:\Anaconda3\lib\site-packages\torch\optim\adam.py", line 213, in adam
    func(params,
  File "D:\Anaconda3\lib\site-packages\torch\optim\adam.py", line 255, in _single_tensor_adam
    assert not step_t.is_cuda, "If capturable=False, state_steps should not be CUDA tensors."
AssertionError: If capturable=False, state_steps should not be CUDA tensors.
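
For reference, here is a minimal standalone sketch (not my actual training script, just an assumed reproduction on torch 1.12.0) of the load pattern that seems to trigger the assert: load_state_dict() casts the Adam "step" counters to the parameters' device (CUDA), and the capturable=False code path then rejects them:

    import torch
    from torch import nn

    # hypothetical toy model and optimizer, standing in for the SAC networks
    net = nn.Linear(4, 2).cuda()
    optim = torch.optim.Adam(net.parameters(), lr=1e-3)

    # one update so the optimizer has state, then save a checkpoint
    net(torch.randn(8, 4, device="cuda")).sum().backward()
    optim.step()
    torch.save({"model": net.state_dict(), "optim": optim.state_dict()}, "ckpt.pth")

    # resume: load_state_dict() moves the float "step" tensors to the
    # parameters' device, so they end up on CUDA
    ckpt = torch.load("ckpt.pth", map_location="cuda")
    net.load_state_dict(ckpt["model"])
    optim.load_state_dict(ckpt["optim"])

    net(torch.randn(8, 4, device="cuda")).sum().backward()
    optim.step()  # AssertionError: If capturable=False, state_steps should not be CUDA tensors.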

jaried commented Jul 3, 2022

Before executing offpolicy_trainer, I added the following code:

        actor_optim.param_groups[0]['capturable'] = True
        alpha_optim.param_groups[0]['capturable'] = True
        critic1_optim.param_groups[0]['capturable'] = True
        critic2_optim.param_groups[0]['capturable'] = True

With that change it runs. What is the reason?
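
For completeness, a sketch of the same workaround applied to every param group (my snippet above only touches group 0), assuming the four optimizers have already been restored with load_state_dict():

    # set the flag on all param groups of every optimizer, not just group 0
    for optim in (actor_optim, critic1_optim, critic2_optim, alpha_optim):
        for group in optim.param_groups:
            group["capturable"] = True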


jaried commented Jul 3, 2022

This issue says it is caused by pressing Ctrl+C to end training:
babysor/MockingBird#631

But even with an earlier checkpoint file, saved before the problem appeared, the same error is reported. Why is that?


jaried commented Jul 4, 2022

pytorch/pytorch#80809

Someone said this:

Hi, I am also facing the same issue when I try to load the checkpoint and resume model training on the latest pytorch (1.12).

It seems to be related to a newly introduced parameter (capturable) for the Adam and AdamW optimizers. Currently there are two workarounds:

  1. forcing capturable = True after loading the checkpoint (as suggested above) optim.param_groups[0]['capturable'] = True . This seems to slow down the model training by approx. 10% (YMMV depending on the setup).
  2. Reverting pytorch back to previous versions (I have been using 1.11.0).

I'm wondering whether enforcing capturable = True may incur unwanted side effects.

I'm also wondering whether forcing capturable=True would have unwanted side effects. For now I will also go back to torch 1.11.
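
If forcing capturable=True turns out to be a problem, another workaround I have seen suggested (untested here, so only a sketch) is to move the Adam "step" counters back to the CPU after loading, which is what the capturable=False assert expects:

    # move the "step" state tensors back to the CPU after load_state_dict()
    for optim in (actor_optim, critic1_optim, critic2_optim, alpha_optim):
        for state in optim.state.values():
            if "step" in state and torch.is_tensor(state["step"]):
                state["step"] = state["step"].cpu()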

@jaried jaried closed this as completed Jul 4, 2022
@Trinkle23897 Trinkle23897 added the question Further information is requested label Jul 4, 2022