RuntimeError: CUDA out of memory. #144

Open
prashanth31 opened this issue Mar 20, 2021 · 6 comments

Comments

@prashanth31

Can someone help me solve the "CUDA out of memory" error? I think it has something to do with reducing the batch size, but I am not sure where in the code I can do that. Here is the full error message:

Traceback (most recent call last):
  File "main_train.py", line 29, in <module>
    train(opt, Gs, Zs, reals, NoiseAmp)
  File "c:\Projects\PK\Phd\Paper4_GAN\SinGAN-master\SinGAN\training.py", line 39, in train
    z_curr,in_s,G_curr = train_single_scale(D_curr,G_curr,reals,Gs,Zs,in_s,NoiseAmp,opt)
  File "c:\Projects\PK\Phd\Paper4_GAN\SinGAN-master\SinGAN\training.py", line 162, in train_single_scale
    gradient_penalty.backward()
  File "c:\ProgramData\Anaconda3\envs\torch\lib\site-packages\torch\tensor.py", line 195, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "c:\ProgramData\Anaconda3\envs\torch\lib\site-packages\torch\autograd\__init__.py", line 99, in backward
    allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 30.00 MiB (GPU 0; 2.00 GiB total capacity; 1.16 GiB already allocated; 18.86 MiB free; 1.28 GiB reserved in total by PyTorch)
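
For anyone debugging the same crash: the numbers in the error come straight from PyTorch's allocator, and you can query them yourself before training starts. A minimal sketch, assuming a single CUDA device; nothing here is SinGAN-specific:

import torch

# Minimal sketch: print the same memory figures the error message reports,
# so you can see how close the card is to its limit before training starts.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    gib = 1024 ** 3
    print(f"{props.name}: {props.total_memory / gib:.2f} GiB total")
    print(f"allocated by tensors: {torch.cuda.memory_allocated(0) / gib:.2f} GiB")
    print(f"reserved by PyTorch:  {torch.cuda.memory_reserved(0) / gib:.2f} GiB")
else:
    print("No CUDA device visible; training would fall back to the CPU.")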

@prashanth31
Author

I was able to train my network by using the CPU instead of the GPU. It took a lot longer but at least it got the job done.
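
For anyone searching later, here is a rough sketch of the CPU fallback described above. SinGAN keeps the target device on its opt/config object, so the exact option name in the repo (e.g. a "not_cuda"-style flag) is an assumption here, not a quote from the code:

import torch

# Rough sketch of a CPU fallback: pick the device once, then create every
# model and tensor on it. In the real code you would set SinGAN's equivalent
# config option rather than hard-coding the device like this.
device = torch.device("cpu")        # force CPU even if a (too small) GPU exists
# device = torch.device("cuda:0")   # the default path that ran out of memory

noise = torch.randn(1, 3, 64, 64, device=device)   # illustrative tensor only
print(noise.device)                                 # -> cpu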

@ankuroo

ankuroo commented Apr 7, 2021

Hey @prashanth31, I was wondering: how did you get it to run on the GPU? What's the command I should use?

@prashanth31
Author

prashanth31 commented Apr 7, 2021 via email

@metaphorz

I had a similar issue. I was processing a 1024-pixel image (--max_size 1024), and at about scale 11 it crashed with the CUDA out of memory error. I have gone back to 512; a rough sketch of why that helps is below. The compute node being used is: https://www.nvidia.com/en-gb/geforce/graphics-cards/geforce-gtx-1080-ti/specifications/
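
For context, the training resolution is controlled by the --max_size option passed to main_train.py. A back-of-the-envelope sketch of why halving it helps; the numbers are illustrative, not measured on the 1080 Ti above, and the image name in the comment is a placeholder:

# Back-of-the-envelope sketch: activation memory in the per-scale generators
# grows roughly with the number of pixels, so dropping --max_size from 1024
# to 512 cuts it by about 4x. Example invocation (image name is a placeholder):
#   python main_train.py --input_name my_image.png --max_size 512
for edge in (512, 1024):
    ratio = (edge * edge) / (512 * 512)
    print(f"max_size={edge}: ~{ratio:.0f}x the activation memory of the 512 run")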

@vuhungtvt2018

@metaphorz How did you go back to 512, and where is the code for the fix?
Please, and thank you!

@metaphorz

That was so long ago that I've forgotten. I've been using Stable Diffusion through A1111 for most of my runs lately.
