Error when I try to do the inference #15

JoseMoFi · 2022-05-12T07:50:22Z

Hello, I'm replicating this model, but when I run the inference command an unknown error appears, and I don't know what causes it.
My setup is:

RTX 3060ti
16GB RAM
Ryzen 7 5800X

The complete error is:

Traceback (most recent call last):
  File "main.py", line 209, in <module>
    processor.start()
  File "main.py", line 61, in start
    dev_wer = seq_eval(self.arg, self.data_loader["dev"], self.model, self.device,
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/VAC_CSLR/seq_scripts.py", line 56, in seq_eval
    ret_dict = model(vid, vid_lgt, label=label, label_lgt=label_lgt)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/VAC_CSLR/slr_network.py", line 63, in forward
    framewise = self.masked_bn(inputs, len_x)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/VAC_CSLR/slr_network.py", line 53, in masked_bn
    x = self.conv2d(x)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torchvision/models/resnet.py", line 249, in forward
    return self._forward_impl(x)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torchvision/models/resnet.py", line 233, in _forward_impl
    x = self.bn1(x)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 135, in forward
    return F.batch_norm(
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/functional.py", line 2149, in batch_norm
    return torch.batch_norm(
RuntimeError: CUDA error: unknown error

I have also changed the config file:
-batch_size: 2
+batch_size: 1
-test_batch_size: 8
-num_worker: 10
-device: 0,1,2
+test_batch_size: 1
+num_worker: 1
+device: 0

My torch version is 1.8.1+cu111.

Thank you for the help!

UPDATE

I also got this error:

Traceback (most recent call last):
  File "main.py", line 209, in <module>
    processor.start()
  File "main.py", line 61, in start
    dev_wer = seq_eval(self.arg, self.data_loader["dev"], self.model, self.device,
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/VAC_CSLR/seq_scripts.py", line 56, in seq_eval
    ret_dict = model(vid, vid_lgt, label=label, label_lgt=label_lgt)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/VAC_CSLR/slr_network.py", line 63, in forward
    framewise = self.masked_bn(inputs, len_x)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/VAC_CSLR/slr_network.py", line 53, in masked_bn
    x = self.conv2d(x)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torchvision/models/resnet.py", line 249, in forward
    return self._forward_impl(x)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torchvision/models/resnet.py", line 232, in _forward_impl
    x = self.conv1(x)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 399, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 395, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: CUDA error: unknown error

with the following config:
-batch_size: 2
+batch_size: 1
random_seed: 0
-test_batch_size: 8
-num_worker: 10
-device: 0,1,2
+test_batch_size: 2
+num_worker: 2
+device: 0


ycmin95 · 2022-05-14T09:04:53Z

Hi @JoseMoFi,
it seems to be related to your environment settings, because the error occurs in the forward pass of ResNet. Perhaps you can check your environment first; validating the inputs may also be helpful.
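
The input validation suggested above could look roughly like this minimal sketch. It assumes (this is not confirmed by the repo) that `vid` has shape (batch, time, channels, height, width) and that `vid_lgt` and `label_lgt` hold per-sample lengths, matching the names in the traceback's call `model(vid, vid_lgt, label=label, label_lgt=label_lgt)`. `validate_batch` and `validate_tensors` are hypothetical helpers, not part of the repo.

```python
def validate_batch(vid_shape, vid_lgt, label_lgt):
    """Return a list of problems found in one batch; an empty list means it looks OK.

    Assumptions (hypothetical, not taken from the repo): vid_shape is
    (batch, time, channels, height, width); vid_lgt is the number of valid
    frames per sample; label_lgt is the gloss-sequence length per sample.
    """
    if len(vid_shape) != 5:
        return [f"expected a 5-D video batch, got shape {tuple(vid_shape)}"]
    problems = []
    batch, time = vid_shape[0], vid_shape[1]
    # Every sample needs exactly one video length and one label length.
    if len(vid_lgt) != batch:
        problems.append(f"{len(vid_lgt)} video lengths for a batch of {batch}")
    if len(label_lgt) != batch:
        problems.append(f"{len(label_lgt)} label lengths for a batch of {batch}")
    # A valid-frame count must be positive and fit in the padded time axis.
    for i, n in enumerate(vid_lgt):
        if not 0 < n <= time:
            problems.append(f"sample {i}: video length {n} outside 1..{time}")
    # Empty target sequences are suspicious for sequence recognition.
    for i, n in enumerate(label_lgt):
        if n <= 0:
            problems.append(f"sample {i}: empty label sequence")
    return problems


def validate_tensors(vid, vid_lgt, label_lgt):
    """PyTorch wrapper (hypothetical): shape checks plus a NaN/Inf check on the frames."""
    import torch  # imported lazily so validate_batch works without PyTorch

    problems = validate_batch(tuple(vid.shape), vid_lgt.tolist(), label_lgt.tolist())
    if not torch.isfinite(vid).all():
        problems.append("video tensor contains NaN or Inf")
    return problems
```

One way to use it would be to call `validate_tensors` in `seq_eval` just before `model(...)`, ideally on the CPU copies of the tensors before they are moved to the GPU, so that a CUDA failure cannot hide a malformed batch.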

JoseMoFi · 2022-05-15T09:47:20Z

I use WSL 2; could that be the problem? And thank you for the help!

ycmin95 · 2022-05-15T13:37:36Z

I'm not familiar with WSL 2; all experiments were conducted on Ubuntu. Can WSL 2 detect the GPU device?
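
A minimal sketch for answering that question from inside the same Python environment that runs `main.py`. `cuda_report` and `running_under_wsl` are hypothetical helpers; the WSL check is a heuristic assumption (WSL kernels usually report "microsoft" in their release string). The smoke test runs `conv2d` followed by `batch_norm`, the two ops where the tracebacks fail, with shapes resembling the first ResNet layers (an illustrative choice). Setting `CUDA_LAUNCH_BLOCKING=1` makes CUDA errors surface at the op that caused them instead of at a later, unrelated call.

```python
import importlib.util
import os
import platform

# Make CUDA kernel launches synchronous so an error is reported at the op that
# caused it. This only takes effect if set before CUDA is initialised.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")


def running_under_wsl():
    """Heuristic (assumption): WSL kernels report 'microsoft' in their release string."""
    return "microsoft" in platform.release().lower()


def cuda_report(run_smoke_test=True):
    """Collect basic facts about the PyTorch/CUDA setup.

    Runtime errors raised by the GPU smoke test are caught and recorded.
    """
    report = {"wsl": running_under_wsl(), "torch": None, "cuda_available": False,
              "devices": [], "smoke_test": "skipped"}
    if importlib.util.find_spec("torch") is None:
        return report  # PyTorch is not installed in this interpreter
    import torch
    import torch.nn.functional as F

    report["torch"] = torch.__version__
    report["cuda_build"] = torch.version.cuda  # CUDA version PyTorch was built with
    report["cuda_available"] = torch.cuda.is_available()
    if not report["cuda_available"]:
        return report
    report["cudnn"] = torch.backends.cudnn.version()
    report["devices"] = [torch.cuda.get_device_name(i)
                         for i in range(torch.cuda.device_count())]
    if run_smoke_test:
        try:
            # Same kinds of ops as the failing frames: a 7x7 stride-2 conv, then batch norm.
            x = torch.randn(2, 3, 224, 224, device="cuda")
            w = torch.randn(64, 3, 7, 7, device="cuda")
            y = F.conv2d(x, w, stride=2, padding=3)
            y = F.batch_norm(y, None, None, training=True)  # uses batch statistics
            torch.cuda.synchronize()  # force any pending kernel error to surface here
            report["smoke_test"] = "ok"
        except RuntimeError as exc:
            report["smoke_test"] = f"failed: {exc}"
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `cuda_available` is true but the smoke test fails, the problem is likely below the repo's code (driver, CUDA runtime, or the WSL GPU bridge) rather than in the model or the config.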

JoseMoFi · 2022-05-16T08:44:14Z

Yes, WSL 2 can detect the GPU device. However, I think the problem is WSL 2: I had a similar error in another repo when I was training, and when I tested it again on Windows 10 it worked. I'll run more tests, but the problem is most likely WSL 2 or some configuration.
If I find something I'll post it here. Thank you very much for the help!

ardasatata · 2022-05-22T18:00:14Z

@JoseMoFi I suggest you install Ubuntu directly rather than wasting your time setting this up on W10 (been there myself, and I ended up installing Ubuntu 😢).
This code works well on Ubuntu, even in the Nvidia DGX-1 environment ✌🏼

JoseMoFi · 2022-05-27T07:41:13Z

OK, I am sure that the problem was WSL 2. However, I don't know whether it's because my CUDA configuration is wrong or because WSL can't work with the graphics card. I also have other code that doesn't work in WSL either but does work on an Ubuntu server, so I can say that my problem is caused by WSL.
Thank you for the help!

JoseMoFi closed this as completed May 27, 2022
