Error when I try to do the inference #15

Closed
JoseMoFi opened this issue May 12, 2022 · 6 comments

JoseMoFi commented May 12, 2022

Hello, I'm replicating this model, but when I run the inference command an unknown error appears, and I don't know why.
My setup is:

  • RTX 3060ti
  • 16GB RAM
  • Ryzen 7 5800X

The complete error is:

Traceback (most recent call last):
  File "main.py", line 209, in <module>
    processor.start()
  File "main.py", line 61, in start
    dev_wer = seq_eval(self.arg, self.data_loader["dev"], self.model, self.device,
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/VAC_CSLR/seq_scripts.py", line 56, in seq_eval
    ret_dict = model(vid, vid_lgt, label=label, label_lgt=label_lgt)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/VAC_CSLR/slr_network.py", line 63, in forward
    framewise = self.masked_bn(inputs, len_x)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/VAC_CSLR/slr_network.py", line 53, in masked_bn
    x = self.conv2d(x)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torchvision/models/resnet.py", line 249, in forward
    return self._forward_impl(x)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torchvision/models/resnet.py", line 233, in _forward_impl
    x = self.bn1(x)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 135, in forward
    return F.batch_norm(
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/functional.py", line 2149, in batch_norm
    return torch.batch_norm(
RuntimeError: CUDA error: unknown error

I have also changed the config file:
-batch_size: 2
+batch_size: 1
-test_batch_size: 8
-num_worker: 10
-device: 0,1,2
+test_batch_size: 1
+num_worker: 1
+device: 0

Also, my torch version is 1.8.1+cu111.

Thank you for the help!

UPDATE

I also found this error:

Traceback (most recent call last):
  File "main.py", line 209, in <module>
    processor.start()
  File "main.py", line 61, in start
    dev_wer = seq_eval(self.arg, self.data_loader["dev"], self.model, self.device,
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/VAC_CSLR/seq_scripts.py", line 56, in seq_eval
    ret_dict = model(vid, vid_lgt, label=label, label_lgt=label_lgt)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/VAC_CSLR/slr_network.py", line 63, in forward
    framewise = self.masked_bn(inputs, len_x)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/VAC_CSLR/slr_network.py", line 53, in masked_bn
    x = self.conv2d(x)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torchvision/models/resnet.py", line 249, in forward
    return self._forward_impl(x)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torchvision/models/resnet.py", line 232, in _forward_impl
    x = self.conv1(x)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 399, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 395, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: CUDA error: unknown error

with the following config:
-batch_size: 2
+batch_size: 1
random_seed: 0
-test_batch_size: 8
-num_worker: 10
-device: 0,1,2
+test_batch_size: 2
+num_worker: 2
+device: 0

@ycmin95
Collaborator

ycmin95 commented May 14, 2022

Hi, @JoseMoFi ,
It seems to be related to your environment settings, because the error occurs in the forward pass of ResNet. Perhaps you can check your environment first; validating the input may also be helpful.
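
A minimal sketch of the kind of input validation suggested above. It assumes (this is not confirmed anywhere in the thread) that the video batch is a 5-D tensor laid out as (batch, time, channel, height, width) and that there is one frame count per sample; `find_input_problems` is a hypothetical helper, not part of the repository.

```python
from typing import List, Sequence


def find_input_problems(shape: Sequence[int], lengths: Sequence[int]) -> List[str]:
    """Return human-readable problems; an empty list means the batch looks sane.

    Assumption: shape is (batch, time, channel, height, width) and
    lengths holds the number of valid frames for each sample.
    """
    problems: List[str] = []
    if len(shape) != 5:
        problems.append(f"expected 5 dimensions, got {len(shape)}")
        return problems
    batch, time = shape[0], shape[1]
    if any(dim <= 0 for dim in shape):
        problems.append(f"empty dimension in shape {tuple(shape)}")
    if len(lengths) != batch:
        problems.append(f"{len(lengths)} lengths for a batch of {batch}")
    for i, n in enumerate(lengths):
        # Every sample needs at least one frame and no more than the padded length.
        if not 0 < n <= time:
            problems.append(f"sample {i}: length {n} outside 1..{time}")
    return problems
```

If `vid` and `vid_lgt` are tensors, this could be called as `find_input_problems(tuple(vid.shape), vid_lgt.tolist())` just before the model call in the evaluation loop. A `torch.isfinite(vid).all()` check would additionally catch NaN or infinite pixel values.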

@JoseMoFi
Author

JoseMoFi commented May 15, 2022

I use WSL 2; could that be the problem? And thank you for the help!
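
When reporting an error like this, it can help to state up front whether the code runs under WSL. The sketch below uses a common heuristic (WSL kernels report a release string containing "microsoft"); `is_wsl` and `describe_host` are hypothetical helper names, and the heuristic is an assumption rather than an official API.

```python
import platform
import shutil
from typing import Optional


def is_wsl(kernel_release: Optional[str] = None) -> bool:
    """Heuristic: WSL kernel release strings contain 'microsoft'."""
    if kernel_release is None:
        kernel_release = platform.uname().release
    return "microsoft" in kernel_release.lower()


def describe_host() -> str:
    """One-line host summary, suitable for pasting into a bug report."""
    kind = "WSL" if is_wsl() else platform.system()
    # shutil.which returns None when nvidia-smi is not on PATH.
    smi = shutil.which("nvidia-smi") or "not found on PATH"
    return f"host: {kind}, kernel: {platform.uname().release}, nvidia-smi: {smi}"


print(describe_host())
```

The optional `kernel_release` argument exists only so the check can be exercised with a known string instead of the current machine's kernel.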

@ycmin95
Collaborator

ycmin95 commented May 15, 2022

I'm not familiar with WSL 2; all experiments were conducted on Ubuntu. Can WSL 2 detect the GPU device?
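
One way to answer this independently of the repository is a small smoke test that checks whether PyTorch sees the GPU and can run the same layer types that fail in the tracebacks above (a convolution followed by batch norm). This is only a sketch; `cuda_smoke_test` is a hypothetical helper, and the tiny network is unrelated to the actual model.

```python
def cuda_smoke_test() -> str:
    """Run one small conv + batch-norm forward pass on the GPU and report the outcome."""
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"

    if not torch.cuda.is_available():
        return f"torch {torch.__version__}: CUDA is not available"

    device = torch.device("cuda:0")
    name = torch.cuda.get_device_name(0)
    try:
        # Same layer types as conv1 and bn1 in the ResNet tracebacks.
        net = torch.nn.Sequential(
            torch.nn.Conv2d(3, 8, kernel_size=3),
            torch.nn.BatchNorm2d(8),
        ).to(device).eval()
        with torch.no_grad():
            out = net(torch.randn(1, 3, 32, 32, device=device))
        # CUDA kernels run asynchronously; synchronize so errors surface here.
        torch.cuda.synchronize()
    except RuntimeError as exc:
        return f"{name}: forward pass failed: {exc}"
    return f"{name} (CUDA {torch.version.cuda}): forward pass OK, output shape {tuple(out.shape)}"


print(cuda_smoke_test())
```

If this fails as well, the problem is more likely in the driver, CUDA, or WSL setup than in the repository code. Setting the environment variable `CUDA_LAUNCH_BLOCKING=1` before running makes CUDA calls synchronous, which can make the line reported in the traceback more reliable.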

@JoseMoFi
Author

Yes, WSL 2 can detect the GPU device. However, I think the problem is WSL 2, because I had a similar error in another repo while training, and when I tested it again on W10 it worked. I'll do more tests, but it is very likely that the problem is WSL 2 or some configuration.
If I find something, I'll post it here. And thank you very much for the help!

@ardasatata

@JoseMoFi I suggest you go straight to installing Ubuntu rather than wasting your time setting this up on W10 (been there myself, and I ended up installing Ubuntu 😢).
This code works well on Ubuntu, even in the NVIDIA DGX-1 environment ✌🏼

@JoseMoFi
Author

OK, I am sure that the problem was WSL 2. However, I don't know whether it's because my CUDA configuration is bad or because WSL can't work with the graphics card. I have other code that also doesn't work in WSL but does work on a server with Ubuntu, so I can say that my problem is caused by WSL.
Thank you for the help!
