
How much GPU memory is needed? #2

Open
llstela opened this issue May 11, 2024 · 7 comments

Comments

@llstela

llstela commented May 11, 2024

I tried running your demo code on 32 GB and 24 GB GPUs, but both failed with CUDA out of memory.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 GiB (GPU 0; 31.74 GiB total capacity; 21.30 GiB already allocated; 9.10 GiB free; 21.46 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
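The error message itself points at one mitigation: when reserved memory is much larger than allocated memory, capping `max_split_size_mb` can reduce allocator fragmentation. A minimal sketch of setting it from Python (the value 512 is an illustrative starting point, not a tuned recommendation):

```python
import os

# Must be set before the first CUDA allocation (ideally before importing torch).
# Caps the size of cached blocks the caching allocator is willing to split,
# which can reduce fragmentation at some cost in allocation flexibility.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"
```

The same thing can be done from the shell by prefixing the command with `PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512`.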

@xavihart
Owner

I used ~20 GB.

@xavihart
Owner

Can you share the complete output from the terminal?

@llstela
Author

llstela commented May 13, 2024

Can you share the complete output from the terminal?

This is the output from a 3090 (24 GB):

(base) root@a83b401f11b6:/gdata/cold1/shengxuhan/codes/AIGC/PDM-Pure# python pdm_pure.py --image demo/advdm/original.png --save_path demo/advdm/ --device 1  
FORCE_MEM_EFFICIENT_ATTN= 0 @UNET:QKVATTENTION
/opt/conda/lib/python3.9/site-packages/huggingface_hub/file_download.py:1125: FutureWarning: The `force_filename` parameter is deprecated as a new caching system, which keeps the filenames as they are on the Hub, is now in place.
  warnings.warn(
/opt/conda/lib/python3.9/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Keyword arguments {'token': None} are not expected by StableDiffusionUpscalePipeline and will be ignored.
Begin to purify demo/advdm/original.png----------
100%|██████████| 50/50 [00:18<00:00,  2.65it/s]
100%|██████████| 50/50 [00:15<00:00,  3.30it/s]
Traceback (most recent call last):
  File "/gdata/cold1/shengxuhan/codes/AIGC/PDM-Pure/pdm_pure.py", line 65, in <module>
    main()
  File "/gdata/cold1/shengxuhan/codes/AIGC/PDM-Pure/pdm_pure.py", line 41, in main
    result = style_transfer(
  File "/opt/conda/lib/python3.9/site-packages/deepfloyd_if/pipelines/style_transfer.py", line 123, in style_transfer
    _stageIII_generations, _meta = if_III.embeddings_to_image(**if_III_kwargs)
  File "/opt/conda/lib/python3.9/site-packages/deepfloyd_if/modules/stage_III_sd_x4.py", line 80, in embeddings_to_image
    images = self.model(**metadata).images
  File "/opt/conda/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_upscale.py", line 727, in __call__
    image = self.vae.decode(latents).sample
  File "/opt/conda/lib/python3.9/site-packages/diffusers/models/autoencoder_kl.py", line 191, in decode
    decoded = self._decode(z).sample
  File "/opt/conda/lib/python3.9/site-packages/diffusers/models/autoencoder_kl.py", line 178, in _decode
    dec = self.decoder(z)
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/diffusers/models/vae.py", line 233, in forward
    sample = self.mid_block(sample)
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/diffusers/models/unet_2d_blocks.py", line 463, in forward
    hidden_states = attn(hidden_states)
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/diffusers/models/attention.py", line 168, in forward
    torch.empty(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.00 GiB (GPU 1; 23.69 GiB total capacity; 13.30 GiB already allocated; 7.71 GiB free; 15.27 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

@yuangan

yuangan commented Sep 2, 2024

Hi, thank you for your excellent work. I'm facing a similar issue—running out of memory with the A40 GPU, which has 46GB of memory, when executing pdm_pure.py at a 512x512 resolution. Do you have any suggestions?

Update: Solved by using xformers: FORCE_MEM_EFFICIENT_ATTN=1 python xxx

@llstela
Author

llstela commented Sep 7, 2024

Hi, thank you for your excellent work. I'm facing a similar issue—running out of memory with the A40 GPU, which has 46GB of memory, when executing pdm_pure.py at a 512x512 resolution. Do you have any suggestions?

Update: Solved by using xformers: FORCE_MEM_EFFICIENT_ATTN=1 python xxx

still not solved. I have given up.

@xavihart
Owner

xavihart commented Sep 8, 2024

I think you need to use efficient attention, otherwise it will be too costly.

@yuangan

yuangan commented Sep 8, 2024

Hi, thank you for your excellent work. I'm facing a similar issue—running out of memory with the A40 GPU, which has 46GB of memory, when executing pdm_pure.py at a 512x512 resolution. Do you have any suggestions?
Update: Solved by using xformers: FORCE_MEM_EFFICIENT_ATTN=1 python xxx

still not solved. I have given up.

`pip install xformers==0.0.16` is needed; you can find this requirement in DeepFloyd IF.
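Putting the two parts of the fix together, a sketch of the combined workaround as described in this thread (the version pin and flag come from the comments above and are not independently verified here):

```shell
# xformers version as quoted in this thread; the requirement comes from DeepFloyd IF.
pip install xformers==0.0.16

# The VAR=value prefix enables memory-efficient attention for this one process only.
FORCE_MEM_EFFICIENT_ATTN=1 python pdm_pure.py --image demo/advdm/original.png --save_path demo/advdm/
```

When it takes effect, the script's startup log should report `FORCE_MEM_EFFICIENT_ATTN= 1` instead of `0`.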
