Reduce peak VRAM by releasing large attention tensors (as soon as they're unnecessary) (huggingface#3463)

Release large tensors in attention as soon as they're no longer required. This reduces peak VRAM by nearly 2 GB at 1024x1024 (even with attention slicing enabled), and the savings grow with image size.
cmdr2 committed May 17, 2023
1 parent 86eed86 commit 691d40d
Showing 1 changed file with 3 additions and 0 deletions.
models/attention_processor.py: 3 additions & 0 deletions
@@ -344,11 +344,14 @@ def get_attention_scores(self, query, key, attention_mask=None):
             beta=beta,
             alpha=self.scale,
         )
+        del baddbmm_input
 
         if self.upcast_softmax:
             attention_scores = attention_scores.float()
 
         attention_probs = attention_scores.softmax(dim=-1)
+        del attention_scores
+
         attention_probs = attention_probs.to(dtype)
 
         return attention_probs
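
Why an early del helps: the sketch below is not part of the commit (the shapes and the peak_mb helper are illustrative, and a CUDA device is assumed). It measures peak allocation with and without dropping a large intermediate before the next big allocation; once the last reference is gone, PyTorch's caching allocator can reuse the block, so the two large tensors never coexist.

import torch

def peak_mb(fn):
    # Run fn and report the peak CUDA memory it allocated, in MB.
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    fn()
    return torch.cuda.max_memory_allocated() / 2**20

def without_del():
    scores = torch.randn(8, 4096, 4096, device="cuda")  # ~512 MB of fp32 "scores"
    probs = scores.softmax(dim=-1)                       # second ~512 MB buffer
    return probs.to(torch.float16)                       # cast allocates while scores is still alive

def with_del():
    scores = torch.randn(8, 4096, 4096, device="cuda")
    probs = scores.softmax(dim=-1)
    del scores                                           # release before the cast allocates
    return probs.to(torch.float16)

print(f"without del: {peak_mb(without_del):.0f} MB peak")
print(f"with del:    {peak_mb(with_del):.0f} MB peak")

Note that del only removes the local Python reference; the underlying storage is freed as soon as no other reference (such as one held by the autograd graph) keeps the tensor alive. That is what makes the commit's dels safe: later code in get_attention_scores never touches baddbmm_input or attention_scores again.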