Add a 1-line docstring to explain why calling context_attention_fwd twice in test_prefix_prefill.py #2553

JasonZhu1313 · 2024-01-22T19:59:40Z

At first it was unclear to me why context_attention_fwd is called twice in a row. It turns out the first call warms up the Triton kernel, so this PR adds a 1-line docstring to explain that. Measured timings:

Timing the first call (includes warmup):

triton Time: 15.10 ms
xformers Time: 0.61 ms

Timing the second call (after warmup):

triton Time: 1.95 ms
xformers Time: 0.62 ms
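The gap between the two Triton timings fits a one-time cost paid on the first call. Below is a minimal sketch of the warm-up-then-time pattern, not the code from test_prefix_prefill.py. The names `time_after_warmup` and `make_fake_kernel` are hypothetical, and the "kernel" is a CPU stand-in that sleeps on its first call to imitate JIT compilation. On a real GPU you would also need to synchronize (for example with `torch.cuda.synchronize()`) before reading the clock.

```python
import time


def time_after_warmup(fn, warmup=1):
    """Call fn `warmup` times untimed, then return one timed call in ms."""
    for _ in range(warmup):
        fn()  # absorbs one-time costs such as JIT compilation
    start = time.perf_counter()
    fn()
    end = time.perf_counter()
    return (end - start) * 1000.0


def make_fake_kernel(compile_seconds=0.05):
    """Hypothetical stand-in for a JIT-compiled kernel: slow on first call only."""
    state = {"compiled": False}

    def kernel():
        if not state["compiled"]:
            time.sleep(compile_seconds)  # simulated one-time compilation
            state["compiled"] = True
        return sum(range(1000))  # cheap steady-state work

    return kernel


if __name__ == "__main__":
    # A fresh kernel for each measurement so the cold run is really cold.
    cold_ms = time_after_warmup(make_fake_kernel(), warmup=0)
    warm_ms = time_after_warmup(make_fake_kernel(), warmup=1)
    print(f"cold Time: {cold_ms:.2f} ms")
    print(f"warm Time: {warm_ms:.2f} ms")
```

With `warmup=0` the timed call includes the simulated compile step. With `warmup=1` it measures only the steady-state work, which is what the second call in the test is meant to capture.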


JasonZhu1313 added 3 commits January 22, 2024 11:51

Remove duplicated call of context_attention_fwd in test_prefix_prefill.py

2dbe28c

add doc string

12ca94e

Merge branch 'vllm-project:main' into JasonZhu1313/prefix_test_fix

ab03f21

simon-mo approved these changes Jan 22, 2024


simon-mo merged commit 7a0b011 into vllm-project:main Jan 22, 2024
16 checks passed

hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024

Add a 1-line docstring to explain why calling context_attention_fwd twice in test_prefix_prefill.py (vllm-project#2553)

4bdf83f
