Reduce peak VRAM by releasing large attention tensors (as soon as they're unnecessary) (huggingface#3463)

Release large tensors in attention as soon as they're no longer required. This reduces peak VRAM by nearly 2 GB at 1024x1024 (even with attention slicing enabled), and the savings grow with image size.
cmdr2 committed May 17, 2023
1 parent 86eed86 commit 691d40d
Showing 1 changed file with 3 additions and 0 deletions.
models/attention_processor.py: 3 additions & 0 deletions
@@ -344,11 +344,14 @@ def get_attention_scores(self, query, key, attention_mask=None):
             beta=beta,
             alpha=self.scale,
         )
+        del baddbmm_input
 
         if self.upcast_softmax:
             attention_scores = attention_scores.float()
 
         attention_probs = attention_scores.softmax(dim=-1)
+        del attention_scores
+
         attention_probs = attention_probs.to(dtype)
 
         return attention_probs
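
Why an early del helps: the sketch below is not part of the commit (the shapes and the peak_mb helper are illustrative, and a CUDA device is assumed). It measures peak allocation with and without dropping a large intermediate before the next big allocation; once the last reference is gone, PyTorch's caching allocator can reuse the block, so the two large tensors never coexist.

import torch

def peak_mb(fn):
    # Run fn and report the peak CUDA memory it allocated, in MB.
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    fn()
    return torch.cuda.max_memory_allocated() / 2**20

def without_del():
    scores = torch.randn(8, 4096, 4096, device="cuda")  # ~512 MB of fp32 "scores"
    probs = scores.softmax(dim=-1)                       # second ~512 MB buffer
    return probs.to(torch.float16)                       # cast allocates while scores is still alive

def with_del():
    scores = torch.randn(8, 4096, 4096, device="cuda")
    probs = scores.softmax(dim=-1)
    del scores                                           # release before the cast allocates
    return probs.to(torch.float16)

print(f"without del: {peak_mb(without_del):.0f} MB peak")
print(f"with del:    {peak_mb(with_del):.0f} MB peak")

Note that del only removes the local Python reference; the underlying storage is freed as soon as no other reference (such as one held by the autograd graph) keeps the tensor alive. That is what makes the commit's dels safe: later code in get_attention_scores never touches baddbmm_input or attention_scores again.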