
6 GB GPU memory, batch_size=1 with D1 network, still got CUDA out of memory #32

Open

AlexLuya opened this issue Dec 14, 2019 · 11 comments

@AlexLuya

Your default batch size is 32. What GPU did you use for training?

@RayOnFire

Same here. A 2080 Ti (11 GB) with batch_size = 1 still does not work. Here's the traceback:

Traceback (most recent call last):
  File "train.py", line 195, in <module>
    train()
  File "train.py", line 140, in train
    classification, regression, anchors = model(images)
  File "/home/ray/anaconda3/envs/dl/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ray/EfficientDetPytorch/models/efficientdet.py", line 62, in forward
    anchors = self.anchors(inputs)
  File "/home/ray/anaconda3/envs/dl/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ray/EfficientDetPytorch/models/module.py", line 153, in forward
    return torch.from_numpy(all_anchors.astype(np.float32)).cuda()
RuntimeError: CUDA error: out of memory

@RayOnFire

You can try NVIDIA apex with opt_level = 'O2'. I got about 8100 MB of GPU memory usage with batch size 16; you can try a smaller batch size to fit into 6 GB of GPU RAM.
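
For reference, a minimal sketch of how apex mixed precision is typically wired into a training loop. opt_level='O2' is the setting mentioned above; the model, optimizer, criterion, and dataloader names are placeholders rather than the exact code in this repo's train.py:

from apex import amp

# Wrap the model and optimizer once, before the training loop.
# opt_level='O2' keeps batch norm in FP32 and casts most other ops to FP16.
model, optimizer = amp.initialize(model, optimizer, opt_level='O2')

for images, annotations in dataloader:
    images = images.cuda()
    classification, regression, anchors = model(images)
    loss = criterion(classification, regression, anchors, annotations)

    optimizer.zero_grad()
    # Scale the loss so FP16 gradients do not underflow, then backprop.
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()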

@shengyuqing

Same problem. Two 2080 Ti cards (11 GB each) with batch_size = 6. Here's the traceback:
Traceback (most recent call last):
  File "C:/Users/Admin/Desktop/EfficientDet.Pytorch-master/train.py", line 196, in <module>
    train()
  File "C:/Users/Admin/Desktop/EfficientDet.Pytorch-master/train.py", line 141, in train
    classification, regression, anchors = model(images)
  File "C:\Users\Admin\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Admin\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\parallel\data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "C:\Users\Admin\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\parallel\data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "C:\Users\Admin\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\parallel\parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "C:\Users\Admin\Anaconda3\envs\pytorch\lib\site-packages\torch\_utils.py", line 385, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "C:\Users\Admin\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\parallel\parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "C:\Users\Admin\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Admin\Desktop\EfficientDet.Pytorch-master\models\efficientdet.py", line 59, in forward
    features = self.BIFPN(features[-5:])
  File "C:\Users\Admin\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Admin\Desktop\EfficientDet.Pytorch-master\models\bifpn.py", line 109, in forward
    laterals = bifpn_module(laterals)
  File "C:\Users\Admin\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Admin\Desktop\EfficientDet.Pytorch-master\models\bifpn.py", line 196, in forward
    pathtd[i], scale_factor=2, mode='nearest')
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 11.00 GiB total capacity; 5.97 GiB already allocated; 678.40 KiB free; 38.08 MiB cached)

@toandaominh1997 (Owner)

@AlexLuya @RayOnFire @shengyuqing
I used:
OS: Ubuntu 18.04
GPU: 2*2080TI(11GB)
When training, I set batch_size 32 for EfficientDet-D0 (~20000 MB of CUDA memory) and batch_size 16 for EfficientDet-D1 (~20000 MB of CUDA memory).
At commit #36, for multi-GPU use, I have changed .cuda() in the loss function and Anchors to .to(input.device). I think it will fix this issue.
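
For anyone hitting the same line as the first traceback above (models/module.py, line 153), the change amounts to replacing the hard-coded .cuda() with the device of the incoming tensor. A rough sketch of the Anchors forward; everything except the final line is paraphrased, and the parameter name and placeholder anchor array are illustrative, not the repo's actual code:

import numpy as np
import torch
import torch.nn as nn

class Anchors(nn.Module):
    def forward(self, image):
        # ... anchor-box generation elided; assume it yields a NumPy array
        #     all_anchors sized for the current input resolution ...
        all_anchors = np.zeros((1, 1, 4))  # placeholder for the real anchor grid
        # Before: torch.from_numpy(all_anchors.astype(np.float32)).cuda()
        # After:  allocate the anchors on the same device as the input tensor,
        #         which also works under DataParallel (one device per replica).
        return torch.from_numpy(all_anchors.astype(np.float32)).to(image.device)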

@shengyuqing

@AlexLuya @RayOnFire @shengyuqing
I used:
OS: Ubuntu 18.04
GPU: 2*2080TI(11GB)
When training, I set batch_size 32 for EfficientDet-D0 (~20000 MB of CUDA memory) and batch_size 16 for EfficientDet-D1 (~20000 MB of CUDA memory).
At commit #36, for multi-GPU use, I have changed .cuda() in the loss function and Anchors to .to(input.device). I think it will fix this issue.

Thanks! I have updated the code, but still the same problem. Very strange.

@shengyuqing

@toandaominh1997
I used Windows 10.

@foocker

foocker commented Dec 24, 2019

But I want to use D0-D7 on just one 2080 Ti, with batch_size >= 4 for any backbone and an input shape of at least (448, 448) or (640, 640).
It seems that the backbone limits the input shape and needs more CUDA memory, unlike what the paper claims: lighter and more efficient.
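
Not something suggested earlier in this thread, but one common workaround when a single card cannot hold batch_size >= 4 is gradient accumulation, which trades extra iterations for memory. The model, criterion, optimizer, and dataloader names below are placeholders:

accumulation_steps = 4  # effective batch size = dataloader batch_size * 4

optimizer.zero_grad()
for step, (images, annotations) in enumerate(dataloader):
    images = images.cuda()
    classification, regression, anchors = model(images)
    loss = criterion(classification, regression, anchors, annotations)

    # Average over the accumulation window so the gradient scale matches
    # what a single large batch would have produced.
    (loss / accumulation_steps).backward()

    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()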

@qtw1998

qtw1998 commented Dec 29, 2019

@AlexLuya @RayOnFire @shengyuqing
I used:
OS: Ubuntu 18.04
GPU: 2*2080TI(11GB)
When training, I set batch_size 32 for EfficientDet-D0 (~20000 MB of CUDA memory) and batch_size 16 for EfficientDet-D1 (~20000 MB of CUDA memory).
At commit #36, for multi-GPU use, I have changed .cuda() in the loss function and Anchors to .to(input.device). I think it will fix this issue.

I don't understand; could you explain the exact way to do this?

@qtw1998

qtw1998 commented Dec 29, 2019

@toandaominh1997
I used Windows 10.

Have you solved the problem?

@Jasper-Bai

Have you solved the out-of-memory problem?

@yaoliUoA

I got the same problem on my Titan RTX.
