Run on 8GB GPU? #15

Closed
tylersweat opened this issue Feb 7, 2023 · 1 comment


@tylersweat

Hi,

The results in the paper look promising and I'd like to try out your work on my system. However, I can't get the example to run on my 8GB GPU:

CUDA memory: 5630853120
Seed: 42
  0%|                                                                                                                                                                                | 0/51 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "run.py", line 90, in <module>
    main()
  File "/home/cs673/cs673/attend/lib/python3.8/site-packages/pyrallis/argparsing.py", line 158, in wrapper_inner
    response = fn(cfg, *args, **kwargs)
  File "run.py", line 73, in main
    image = run_on_prompt(prompt=config.prompt,
  File "run.py", line 44, in run_on_prompt
    outputs = model(prompt=prompt,
  File "/home/cs673/cs673/attend/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/cs673/cs673/Attend-and-Excite/pipeline_attend_and_excite.py", line 205, in __call__
    noise_pred_text = self.unet(latents, t, encoder_hidden_states=text_embeddings[1].unsqueeze(0)).sample
  File "/home/cs673/cs673/attend/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cs673/cs673/attend/lib/python3.8/site-packages/diffusers/models/unet_2d_condition.py", line 234, in forward
    sample, res_samples = downsample_block(
  File "/home/cs673/cs673/attend/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cs673/cs673/attend/lib/python3.8/site-packages/diffusers/models/unet_blocks.py", line 537, in forward
    hidden_states = attn(hidden_states, context=encoder_hidden_states)
  File "/home/cs673/cs673/attend/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cs673/cs673/attend/lib/python3.8/site-packages/diffusers/models/attention.py", line 148, in forward
    x = block(x, context=context)
  File "/home/cs673/cs673/attend/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cs673/cs673/attend/lib/python3.8/site-packages/diffusers/models/attention.py", line 197, in forward
    x = self.attn1(self.norm1(x)) + x
  File "/home/cs673/cs673/attend/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cs673/cs673/Attend-and-Excite/utils/ptp_utils.py", line 71, in forward
    sim = torch.einsum("b i d, b j d -> b i j", q, k) * self.scale
  File "/home/cs673/cs673/attend/lib/python3.8/site-packages/torch/functional.py", line 360, in einsum
    return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
RuntimeError: CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 7.79 GiB total capacity; 5.89 GiB already allocated; 386.25 MiB free; 6.02 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I've set max_split_size_mb to 128 via PYTORCH_CUDA_ALLOC_CONF, but the setting doesn't seem to be respected.
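In case I'm doing it wrong, this is how I'm setting it. My (unverified) assumption is that the variable has to be in the environment before anything initializes CUDA, since the caching allocator only reads it once at startup; the equivalent shell form would be export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 before launching python run.py.

import os

# Must be set before the first CUDA allocation, or the caching allocator
# will never see it (assumption on my part).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported only after the env var is in place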

I've also tried setting torch_dtype=torch.float16 when loading the model. That avoids the out-of-memory error but instead fails with:

CUDA memory: 2824863744
Seed: 42
  0%|                                                                                                                                                                                | 0/51 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "run.py", line 90, in <module>
    main()
  File "/home/cs673/cs673/attend/lib/python3.8/site-packages/pyrallis/argparsing.py", line 158, in wrapper_inner
    response = fn(cfg, *args, **kwargs)
  File "run.py", line 73, in main
    image = run_on_prompt(prompt=config.prompt,
  File "run.py", line 44, in run_on_prompt
    outputs = model(prompt=prompt,
  File "/home/cs673/cs673/attend/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/cs673/cs673/Attend-and-Excite/pipeline_attend_and_excite.py", line 205, in __call__
    noise_pred_text = self.unet(latents, t, encoder_hidden_states=text_embeddings[1].unsqueeze(0)).sample
  File "/home/cs673/cs673/attend/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cs673/cs673/attend/lib/python3.8/site-packages/diffusers/models/unet_2d_condition.py", line 225, in forward
    emb = self.time_embedding(t_emb)
  File "/home/cs673/cs673/attend/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cs673/cs673/attend/lib/python3.8/site-packages/diffusers/models/embeddings.py", line 73, in forward
    sample = self.linear_1(sample)
  File "/home/cs673/cs673/attend/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cs673/cs673/attend/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: expected scalar type Half but found Float
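
If it helps with debugging: reading the trace, the float32 tensor seems to be the timestep projection — time_proj appears to output float32, which is then fed into an fp16 linear layer inside time_embedding. Below is an unverified sketch (my own guess, not a confirmed fix) of the cast I'd expect to resolve it, inside the forward pass in diffusers' unet_2d_condition.py; newer diffusers releases may already handle this.

t_emb = self.time_proj(timesteps)
# Added cast (my assumption): match the projection output to the model's
# weight dtype so F.linear sees consistent Half inputs.
t_emb = t_emb.to(dtype=self.dtype)
emb = self.time_embedding(t_emb)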

Is it possible to get this running on an 8GB GPU?

@AttendAndExcite (Collaborator)

Hi, currently you need around 15GB of GPU memory to run the code. We have an open PR that will bring this down to about 12GB, but it has not been merged yet. In any case, 8GB will not be enough, even with float16.
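
For reference, a quick generic PyTorch check (not part of this repo) of whether a device clears that bar:

import torch

# Print total device memory in GiB; per the above, anything well under
# ~15 GiB is currently expected to run out of memory with this pipeline.
props = torch.cuda.get_device_properties(0)
print(f"Total GPU memory: {props.total_memory / 1024**3:.2f} GiB")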
