Fixes the incorrect argument in the prefix-prefill test cases #3246

sighingnow · 2024-03-07T03:01:12Z

See also comment in #3007

Signed-off-by: Tao He <sighingnow@gmail.com>

zhuohan123

LGTM! Thanks for the fix!

zhuohan123 · 2024-03-07T07:20:53Z

tests/kernels/test_prefix_prefill.py

+    # Need this, otherwise when we capture the graph the process for GPU 1 would run on both
+    # GPU0 and GPU1 and things would hang
+    #
+    # see also similar issue: https://github.com/Dao-AILab/flash-attention/issues/523
+    torch.cuda.set_device(device)


I'm confused about why we need this. Can you give a more detailed example?

The test fails when run in an environment with two GPUs: test_contexted_kv_attention[cuda:0-dtype0-128-64-64] passes, but test_contexted_kv_attention[cuda:1-dtype0-128-64-64] (note that it now uses cuda:1) fails with:

            bin = self.cache[device][key]
            if not warmup:
    >           bin.c_wrapper(
                    grid_0,
                    grid_1,
                    grid_2,
                    bin.num_warps,
                    bin.num_ctas,
                    bin.clusterDims[0],
                    bin.clusterDims[1],
                    bin.clusterDims[2],
                    bin.shared,
                    stream,
                    bin.cu_function,
                    CompiledKernel.launch_enter_hook,
                    CompiledKernel.launch_exit_hook,
                    bin,
                    *bin.assemble_tensormap_to_arg(non_constexpr_arg_values),
                )
    E           ValueError: Pointer argument (at 0) cannot be accessed from Triton (cpu tensor?)

    /usr/local/lib/python3.10/dist-packages/triton/runtime/jit.py:550: ValueError

I'm not sure about the root cause, but I found the same issue reported in flash-attention, took the fix from Dao-AILab/flash-attention#523 (comment), and confirmed that it works.
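
For readers who want to try the workaround outside the test suite, here is a minimal sketch of the same pattern: make the target GPU the current CUDA device before allocating tensors or launching kernels on it. The helper names `device_index` and `select_device` are hypothetical and are not part of vLLM or of this patch. The only PyTorch calls used are `torch.cuda.is_available`, `torch.cuda.device_count`, `torch.cuda.set_device`, and `torch.cuda.current_device`. The root cause is not confirmed in this thread; a plausible but unverified reading is that the kernel launch targets the current device rather than the device that holds the tensors.

```python
try:
    import torch
except ImportError:  # keeps the sketch importable on machines without PyTorch
    torch = None


def device_index(device: str) -> int:
    """Return the GPU ordinal from a string such as "cuda:1" (hypothetical helper)."""
    kind, _, index = device.partition(":")
    if kind != "cuda" or not index.isdigit():
        raise ValueError(f"expected a device string like 'cuda:1', got {device!r}")
    return int(index)


def select_device(device: str) -> bool:
    """Make `device` the current CUDA device; return False if it is unavailable."""
    index = device_index(device)
    if torch is None or not torch.cuda.is_available():
        return False
    if index >= torch.cuda.device_count():
        return False
    # Same call as in the patch, made before any tensors are created.
    torch.cuda.set_device(index)
    return True


if __name__ == "__main__":
    for dev in ("cuda:0", "cuda:1"):
        if select_device(dev):
            x = torch.ones(4, device=dev)
            # The current device should now match the device holding the tensor.
            print(dev, torch.cuda.current_device() == x.device.index)
        else:
            print(dev, "skipped (not available)")
```

On a machine with fewer than two GPUs the second iteration is skipped, so the sketch cannot reproduce the failure there; it only illustrates where the `set_device` call sits relative to tensor creation.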

sighingnow · 2024-03-11T02:35:25Z

Hi @zhuohan123 any further comments on this patch?

Thanks!

sighingnow · 2024-03-16T03:52:45Z

Hi @zhuohan123 @simon-mo, could you please take another look at this PR?

Thanks!

Fixes the incorrect argument in the prefix-prefill test cases

bfcf4e0

Signed-off-by: Tao He <sighingnow@gmail.com>

sighingnow mentioned this pull request Mar 7, 2024

Enables GQA support in the prefix prefill kernels #3007

Merged

zhuohan123 approved these changes Mar 7, 2024


simon-mo merged commit 3123f15 into vllm-project:main Mar 16, 2024
23 checks passed
