RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [3, 256, 7, 7]], which is output 0 of CudnnConvolutionBackward, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck! #9

Open
dogydev opened this issue Feb 25, 2020 · 0 comments

dogydev commented Feb 25, 2020

I got the following error while running the training code using the command provided in the README:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [3, 256, 7, 7]], which is output 0 of CudnnConvolutionBackward, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
Further research showed that I could debug this error using PyTorch's autograd anomaly detection. With it enabled, the extra traceback shows that the operation whose gradient failed was created during a forward call (nscl/nn/scene_graph/scene_graph.py, line 125).
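For reference, a minimal sketch of how anomaly detection can be switched on. This is not the NSCL training code: the tensor `x` and the function name `run_with_anomaly_detection` are made up for illustration, and the example runs on CPU. It only shows that `torch.autograd.set_detect_anomaly(True)` makes autograd print the forward traceback of the operation whose backward fails.

```python
import torch


def run_with_anomaly_detection():
    """Trigger an in-place autograd error and return its message (illustrative only)."""
    x = torch.randn(3, 4, requires_grad=True)
    # While anomaly mode is on, autograd records where each forward op was
    # created and prints that traceback as a warning if its backward fails.
    with torch.autograd.set_detect_anomaly(True):
        y = x.exp()   # exp saves its output to compute the gradient
        y.add_(1)     # in-place edit bumps the version counter of y
        try:
            y.sum().backward()
        except RuntimeError as err:
            return str(err)
    return None


message = run_with_anomaly_detection()
print(message)
```

Anomaly mode slows training down noticeably, so it is usually enabled only while hunting for an error like this one.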
Full traceback:
/pytorch/torch/csrc/autograd/python_anomaly_mode.cpp:57: UserWarning: Traceback of forward call that caused the error:
File "scripts/trainval.py", line 401, in <module>
main()
File "scripts/trainval.py", line 173, in main
main_train(train_dataset, validation_dataset, extra_dataset)
File "scripts/trainval.py", line 291, in main_train
train_epoch(epoch, trainer, train_dataloader, meters)
File "scripts/trainval.py", line 346, in train_epoch
loss, monitors, output_dict, extra_info = trainer.step(feed_dict, cast_tensor=False)
File "/home/user/Jacinle/jactorch/train/env.py", line 135, in step
loss, monitors, output_dict = self._model(feed_dict)
File "/home/user/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "experiments/clevr/desc_nscl_derender.py", line 40, in forward
f_sng = self.scene_graph(f_scene, feed_dict.objects, feed_dict.objects_length)
File "/home/weichen/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/data2/PycharmProjects/NSCL-PyTorch-Release/nscl/nn/scene_graph/scene_graph.py", line 125, in forward
this_object_features[sub_id], this_object_features[obj_id],

Traceback (most recent call last):
File "scripts/trainval.py", line 401, in <module>
main()
File "scripts/trainval.py", line 173, in main
main_train(train_dataset, validation_dataset, extra_dataset)
File "scripts/trainval.py", line 291, in main_train
train_epoch(epoch, trainer, train_dataloader, meters)
File "scripts/trainval.py", line 346, in train_epoch
loss, monitors, output_dict, extra_info = trainer.step(feed_dict, cast_tensor=False)
File "/home/user/Jacinle/jactorch/train/env.py", line 155, in step
loss.backward()
File "/home/user/.local/lib/python3.7/site-packages/torch/tensor.py", line 118, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/user/.local/lib/python3.7/site-packages/torch/autograd/__init__.py", line 93, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [3, 256, 7, 7]], which is output 0 of CudnnConvolutionBackward, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
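
To illustrate what "is at version 1; expected version 0" means, here is a small sketch that produces the same class of error on CPU. It is an assumption about the general mechanism, not a diagnosis of scene_graph.py: the layers `conv1` and `conv2`, the shapes, and the function `try_backward` are all hypothetical. A convolution output is saved by a later op for its backward pass, then modified in place, so autograd refuses to use it. The out-of-place variant leaves the saved tensor untouched.

```python
import torch
import torch.nn as nn


def try_backward(inplace):
    """Return the error message from backward(), or None if it succeeds."""
    torch.manual_seed(0)
    conv1 = nn.Conv2d(3, 8, kernel_size=3, padding=1)
    conv2 = nn.Conv2d(8, 8, kernel_size=3, padding=1)
    x = torch.randn(3, 3, 7, 7)

    feat = conv1(x)    # output of a convolution, at version 0
    out = conv2(feat)  # conv2 saves `feat` to compute its weight gradient

    if inplace:
        feat.relu_()              # in place: feat is now at version 1
        extra = feat
    else:
        extra = torch.relu(feat)  # new tensor: feat stays at version 0

    loss = out.sum() + extra.sum()
    try:
        loss.backward()
        return None
    except RuntimeError as err:
        return str(err)


print(try_backward(inplace=True))   # message about an inplace modification
print(try_backward(inplace=False))  # None
```

If something similar is happening here, typical suspects would be `inplace=True` activations, `+=` style updates, or slice assignment on a tensor that an earlier layer still needs; replacing them with out-of-place operations or working on a `.clone()` is the usual workaround.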

The error occurred in epoch 6. The loss value seemed to be decreasing normally.
Epoch 6 acc/qa=0.687500 loss=0.708516 loss/qa=0.708516 time/data=0.542045 time/step=1.222588: 0%| | 1/469 [00:01<13:45, 1.76s/i
Epoch 6 acc/qa=0.562500 loss=0.713307 loss/qa=0.713307 time/data=0.009401 time/step=1.012083: 0%| | 1/469 [00:02<13:45, 1.76s/i
Epoch 6 acc/qa=0.562500 loss=0.713307 loss/qa=0.713307 time/data=0.009401 time/step=1.012083: 0%| | 2/469 [00:02<11:59, 1.54s/i
Epoch 6 acc/qa=0.812500 loss=0.566920 loss/qa=0.566920 time/data=0.011552 time/step=1.050238: 0%| | 2/469 [00:03<11:59, 1.54s/i
Epoch 6 acc/qa=0.812500 loss=0.566920 loss/qa=0.566920 time/data=0.011552 time/step=1.050238: 1%| | 3/469 [00:03<10:51, 1.40s/it]/

Any ideas?
Thanks!
