
6 GB GPU memory, batch_size=1 with D1 network, still got CUDA out of memory #32

Open

AlexLuya opened this issue Dec 14, 2019 · 11 comments

@AlexLuya

Your default batch size is 32. What GPU did you use for training?

@RayOnFire

Same here. A 2080 Ti (11 GB) with batch_size = 1 still does not work. Here's the traceback:

Traceback (most recent call last):
  File "train.py", line 195, in <module>
    train()
  File "train.py", line 140, in train
    classification, regression, anchors = model(images)
  File "/home/ray/anaconda3/envs/dl/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ray/EfficientDetPytorch/models/efficientdet.py", line 62, in forward
    anchors = self.anchors(inputs)
  File "/home/ray/anaconda3/envs/dl/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ray/EfficientDetPytorch/models/module.py", line 153, in forward
    return torch.from_numpy(all_anchors.astype(np.float32)).cuda()
RuntimeError: CUDA error: out of memory

@RayOnFire

You can try NVIDIA apex with opt_level = 'O2'. I got about 8100 MB of GPU memory usage with batch size 16; you can try a smaller batch size to fit into 6 GB of GPU RAM.
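
For reference, a minimal sketch of how apex mixed precision is typically wired into a training loop. opt_level='O2' is the setting mentioned above; the model, optimizer, criterion, and dataloader names are placeholders rather than the exact code in this repo's train.py:

from apex import amp

# Wrap the model and optimizer once, before the training loop.
# opt_level='O2' keeps batch norm in FP32 and casts most other ops to FP16.
model, optimizer = amp.initialize(model, optimizer, opt_level='O2')

for images, annotations in dataloader:
    images = images.cuda()
    classification, regression, anchors = model(images)
    loss = criterion(classification, regression, anchors, annotations)

    optimizer.zero_grad()
    # Scale the loss so FP16 gradients do not underflow, then backprop.
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()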

@shengyuqing

Same problem. Two 2080 Ti cards (11 GB each) with batch_size = 6. Here's the traceback:
Traceback (most recent call last):
  File "C:/Users/Admin/Desktop/EfficientDet.Pytorch-master/train.py", line 196, in <module>
    train()
  File "C:/Users/Admin/Desktop/EfficientDet.Pytorch-master/train.py", line 141, in train
    classification, regression, anchors = model(images)
  File "C:\Users\Admin\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Admin\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\parallel\data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "C:\Users\Admin\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\parallel\data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "C:\Users\Admin\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\parallel\parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "C:\Users\Admin\Anaconda3\envs\pytorch\lib\site-packages\torch\_utils.py", line 385, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "C:\Users\Admin\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\parallel\parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "C:\Users\Admin\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Admin\Desktop\EfficientDet.Pytorch-master\models\efficientdet.py", line 59, in forward
    features = self.BIFPN(features[-5:])
  File "C:\Users\Admin\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Admin\Desktop\EfficientDet.Pytorch-master\models\bifpn.py", line 109, in forward
    laterals = bifpn_module(laterals)
  File "C:\Users\Admin\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Admin\Desktop\EfficientDet.Pytorch-master\models\bifpn.py", line 196, in forward
    pathtd[i], scale_factor=2, mode='nearest')
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 11.00 GiB total capacity; 5.97 GiB already allocated; 678.40 KiB free; 38.08 MiB cached)

@toandaominh1997 (Owner)

@AlexLuya @RayOnFire @shengyuqing
I used:
OS: Ubuntu 18.04
GPU: 2*2080TI(11GB)
When training, I set batch_size 32 for EfficientDet-D0 (~20000 MB of CUDA memory) and batch_size 16 for EfficientDet-D1 (~20000 MB of CUDA memory).
At commit #36, for multi-GPU use, I have changed .cuda() in the loss function and Anchors to .to(input.device). I think it will fix this issue.
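
For anyone hitting the same line as the first traceback above (models/module.py, line 153), the change amounts to replacing the hard-coded .cuda() with the device of the incoming tensor. A rough sketch of the Anchors forward; everything except the final line is paraphrased, and the parameter name and placeholder anchor array are illustrative, not the repo's actual code:

import numpy as np
import torch
import torch.nn as nn

class Anchors(nn.Module):
    def forward(self, image):
        # ... anchor-box generation elided; assume it yields a NumPy array
        #     all_anchors sized for the current input resolution ...
        all_anchors = np.zeros((1, 1, 4))  # placeholder for the real anchor grid
        # Before: torch.from_numpy(all_anchors.astype(np.float32)).cuda()
        # After:  allocate the anchors on the same device as the input tensor,
        #         which also works under DataParallel (one device per replica).
        return torch.from_numpy(all_anchors.astype(np.float32)).to(image.device)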

@shengyuqing

@AlexLuya @RayOnFire @shengyuqing
I used:
OS: Ubuntu 18.04
GPU: 2*2080TI(11GB)
When training, I set batch_size 32 for EfficientDet-D0 (~20000 MB of CUDA memory) and batch_size 16 for EfficientDet-D1 (~20000 MB of CUDA memory).
At commit #36, for multi-GPU use, I have changed .cuda() in the loss function and Anchors to .to(input.device). I think it will fix this issue.

Thanks! I have updated the code, but still the same problem. Very strange.

@shengyuqing

@toandaominh1997
I used Windows 10.

@foocker

foocker commented Dec 24, 2019

But I want to use D0-D7 on just one 2080 Ti, with batch_size >= 4 for any backbone and an input shape of at least (448, 448) or (640, 640).
It seems that the backbone limits the input shape and needs more CUDA memory, unlike what the paper claims: lighter and more efficient.
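
Not something suggested earlier in this thread, but one common workaround when a single card cannot hold batch_size >= 4 is gradient accumulation, which trades extra iterations for memory. The model, criterion, optimizer, and dataloader names below are placeholders:

accumulation_steps = 4  # effective batch size = dataloader batch_size * 4

optimizer.zero_grad()
for step, (images, annotations) in enumerate(dataloader):
    images = images.cuda()
    classification, regression, anchors = model(images)
    loss = criterion(classification, regression, anchors, annotations)

    # Average over the accumulation window so the gradient scale matches
    # what a single large batch would have produced.
    (loss / accumulation_steps).backward()

    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()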

@qtw1998

qtw1998 commented Dec 29, 2019

@AlexLuya @RayOnFire @shengyuqing
I used:
OS: Ubuntu 18.04
GPU: 2*2080TI(11GB)
When training, I set batch_size 32 for EfficientDet-D0 (~20000 MB of CUDA memory) and batch_size 16 for EfficientDet-D1 (~20000 MB of CUDA memory).
At commit #36, for multi-GPU use, I have changed .cuda() in the loss function and Anchors to .to(input.device). I think it will fix this issue.

I don't understand; could you explain the exact way to do this?

@qtw1998

qtw1998 commented Dec 29, 2019

@toandaominh1997
I used Windows 10.

Have you solved the problem?

@Jasper-Bai

Have you solved the out-of-memory problem?

@yaoliUoA

I got the same problem on my Titan RTX.
