Is there any limit for Batch_size? #4

Closed
axxddzh opened this issue Oct 12, 2021 · 1 comment

axxddzh commented Oct 12, 2021

I am trying to train the model with TSN features, but it only uses 2 GB of GPU memory, so I tried training with batch_size = 8. That produces errors like:

/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [41,0,0], thread: [0,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [41,0,0], thread: [1,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [41,0,0], thread: [2,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
  0%|                                                                                                                                                                                                   | 0/2502 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 317, in <module>
    train(opt)
  File "train.py", line 181, in train
    output, loss = model(dt, criterion, opt.transformer_input_type)
  File "/home/anaconda3/envs/PDVC-main/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/axxddzh/dat/axxddzh/PDVC-main/models/pdvc.py", line 166, in forward
    disable_iterative_refine)
  File "/media/axxddzh/dat/axxddzh/PDVC-main/models/pdvc.py", line 299, in parallel_prediction_matched
    others, self.opt.caption_decoder_type, indices)
  File "/media/axxddzh/dat/axxddzh/PDVC-main/models/pdvc.py", line 387, in caption_prediction
    cap_prob = cap_head(hs[:, feat_bigids], reference[:, feat_bigids], others, seq)
RuntimeError: CUDA error: device-side assert triggered

I hit the same problem whenever batch_size is not 1.

@ttengwang
Owner

Hi, in this code the standard captioning module (PDVC) doesn't support batch size > 1, but PDVC_light does. I tried training PDVC_light with a larger batch size, but performance dropped slightly.
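
For anyone debugging a similar trace: the assertion in the log requires every index to satisfy `-size <= index < size` for the dimension being indexed. Below is a minimal sketch of that check, assuming plain Python lists in place of tensors. The helper name `find_out_of_bounds` and the sample values are hypothetical and not taken from the PDVC code; the idea is only to show how index values such as `feat_bigids` could be validated before an expression like `hs[:, feat_bigids]` runs.

```python
from typing import List, Sequence


def find_out_of_bounds(indices: Sequence[int], size: int) -> List[int]:
    """Return the positions in `indices` whose value violates -size <= index < size.

    This mirrors the condition in the CUDA assertion message; it is an
    illustrative helper, not part of PDVC or PyTorch.
    """
    return [pos for pos, idx in enumerate(indices) if not (-size <= idx < size)]


# Hypothetical values: the indexed dimension has 10 entries, while the
# index list contains values that only make sense for a larger layout.
num_queries = 10
feat_ids = [0, 3, 9, 12, -11]

bad_positions = find_out_of_bounds(feat_ids, num_queries)
print(bad_positions)  # positions of the offending indices (12 and -11)
```

Running the failing step on CPU, or setting the environment variable `CUDA_LAUNCH_BLOCKING=1`, usually gives a more precise error location than the asynchronous "device-side assert triggered" message.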
