Hi,

The results in the paper look promising and I'd like to try out your work on my system. However, I can't get the example to run on my 8GB GPU:
CUDA memory: 5630853120
Seed: 42
0%| | 0/51 [00:00<?, ?it/s]
Traceback (most recent call last):
File "run.py", line 90, in <module>
main()
File "/home/cs673/cs673/attend/lib/python3.8/site-packages/pyrallis/argparsing.py", line 158, in wrapper_inner
response = fn(cfg, *args, **kwargs)
File "run.py", line 73, in main
image = run_on_prompt(prompt=config.prompt,
File "run.py", line 44, in run_on_prompt
outputs = model(prompt=prompt,
File "/home/cs673/cs673/attend/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/cs673/cs673/Attend-and-Excite/pipeline_attend_and_excite.py", line 205, in __call__
noise_pred_text = self.unet(latents, t, encoder_hidden_states=text_embeddings[1].unsqueeze(0)).sample
File "/home/cs673/cs673/attend/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/cs673/cs673/attend/lib/python3.8/site-packages/diffusers/models/unet_2d_condition.py", line 234, in forward
sample, res_samples = downsample_block(
File "/home/cs673/cs673/attend/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/cs673/cs673/attend/lib/python3.8/site-packages/diffusers/models/unet_blocks.py", line 537, in forward
hidden_states = attn(hidden_states, context=encoder_hidden_states)
File "/home/cs673/cs673/attend/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/cs673/cs673/attend/lib/python3.8/site-packages/diffusers/models/attention.py", line 148, in forward
x = block(x, context=context)
File "/home/cs673/cs673/attend/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/cs673/cs673/attend/lib/python3.8/site-packages/diffusers/models/attention.py", line 197, in forward
x = self.attn1(self.norm1(x)) + x
File "/home/cs673/cs673/attend/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/cs673/cs673/Attend-and-Excite/utils/ptp_utils.py", line 71, in forward
sim = torch.einsum("b i d, b j d -> b i j", q, k) * self.scale
File "/home/cs673/cs673/attend/lib/python3.8/site-packages/torch/functional.py", line 360, in einsum
return _VF.einsum(equation, operands) # type: ignore[attr-defined]
RuntimeError: CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 7.79 GiB total capacity; 5.89 GiB already allocated; 386.25 MiB free; 6.02 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
I've set max_split_size_mb to 128 via PYTORCH_CUDA_ALLOC_CONF, but the setting doesn't seem to be respected.
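For reference, this is how I'm setting it. My understanding is that PYTORCH_CUDA_ALLOC_CONF has to be in the environment before the first CUDA allocation, so the snippet below assumes it runs at the very top of run.py, before torch touches the GPU:

```python
import os

# PYTORCH_CUDA_ALLOC_CONF must be set before the first CUDA allocation,
# so this goes before any torch import / CUDA initialization.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])  # max_split_size_mb:128
```

(Exporting the variable in the shell before launching the script should be equivalent.)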
I've also tried setting torch_dtype=torch.float16 when loading the model. That avoids the out-of-memory error but instead fails with:
CUDA memory: 2824863744
Seed: 42
0%| | 0/51 [00:00<?, ?it/s]
Traceback (most recent call last):
File "run.py", line 90, in <module>
main()
File "/home/cs673/cs673/attend/lib/python3.8/site-packages/pyrallis/argparsing.py", line 158, in wrapper_inner
response = fn(cfg, *args, **kwargs)
File "run.py", line 73, in main
image = run_on_prompt(prompt=config.prompt,
File "run.py", line 44, in run_on_prompt
outputs = model(prompt=prompt,
File "/home/cs673/cs673/attend/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/cs673/cs673/Attend-and-Excite/pipeline_attend_and_excite.py", line 205, in __call__
noise_pred_text = self.unet(latents, t, encoder_hidden_states=text_embeddings[1].unsqueeze(0)).sample
File "/home/cs673/cs673/attend/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/cs673/cs673/attend/lib/python3.8/site-packages/diffusers/models/unet_2d_condition.py", line 225, in forward
emb = self.time_embedding(t_emb)
File "/home/cs673/cs673/attend/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/cs673/cs673/attend/lib/python3.8/site-packages/diffusers/models/embeddings.py", line 73, in forward
sample = self.linear_1(sample)
File "/home/cs673/cs673/attend/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/cs673/cs673/attend/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: expected scalar type Half but found Float
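If I'm reading the error right, the UNet weights become float16 under torch_dtype=torch.float16, but the timestep embedding input is still created as float32 somewhere in the pipeline, so F.linear sees mismatched dtypes. A minimal illustration with plain tensors (the shape is illustrative, not the actual embedding size), plus the cast that I'd guess is needed:

```python
import torch

# Tensors are float32 by default, while the model weights are float16
# after loading with torch_dtype=torch.float16 -- hence
# "expected scalar type Half but found Float" inside F.linear.
t_emb = torch.randn(1, 320)
print(t_emb.dtype)  # torch.float32

# Hypothetical workaround: cast the input to the model's dtype
# before the forward call.
t_emb = t_emb.half()
print(t_emb.dtype)  # torch.float16
```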
Is it possible to get this running on an 8GB GPU?
Hi, you currently need around 15 GB to run. We have an open PR that will bring this down to 12 GB, but it is not merged yet. In any case, 8 GB will not be enough, even with float16.