Skip to content

Conversation

@Barry-Delaney
Copy link
Contributor

This PR added cp_gather_indexer_k_quant_cache for getting quantized k/k_scale from indexer k cache.

@github-actions
Copy link

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors.

You ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new CUDA kernel cp_gather_indexer_k_quant_cache for gathering quantized K-cache data. My review has identified some critical issues. There's a significant bug in the CUDA kernel's grid launch configuration that will lead to incorrect data gathering for longer sequences. Additionally, there are several const correctness issues and incorrect mutability annotations in the C++ code and PyTorch bindings, which misrepresent the function's contract and can cause subtle bugs. I've provided suggestions to fix these issues.

@Barry-Delaney Barry-Delaney force-pushed the indexer_gather_kernel branch 3 times, most recently from d6b44e5 to b3c47a9 Compare September 30, 2025 19:48
@simon-mo
Copy link
Collaborator

Can you integrate it and test the performance? cc @zyongye

@mergify
Copy link

mergify bot commented Oct 3, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Barry-Delaney.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Oct 3, 2025
@zyongye
Copy link
Member

zyongye commented Oct 6, 2025

I am testing this branch.
I replace the line here to

ops.cp_gather_indexer_k_quant_cache(
                kv_cache,
                k_fp8,
                k_scale,
                prefill_metadata.block_table,
                prefill_metadata.cu_seq_lens,
            )

And getting 0.58 on gsm8k 20 shots. I did rebase with the main branch. Am I doing anything wrong?

@Barry-Delaney Barry-Delaney force-pushed the indexer_gather_kernel branch from b3c47a9 to 5f7b7fe Compare October 7, 2025 06:07
@mergify mergify bot removed the needs-rebase label Oct 7, 2025
@Barry-Delaney
Copy link
Contributor Author

I fixed the bug in the latest force push. Could you pls help on verifying again? @zyongye Thanks in advance!

Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
@Barry-Delaney Barry-Delaney force-pushed the indexer_gather_kernel branch from 38b409e to bd88b9b Compare October 7, 2025 08:47
@zyongye
Copy link
Member

zyongye commented Oct 7, 2025

Retest with the new branch. Get a good result in gsm8k and gsm8k 20 shots

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9447|±  |0.0063|
|     |       |strict-match    |     5|exact_match|↑  |0.9469|±  |0.0062|

20 shots

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|    20|exact_match|↑  |0.9431|±  |0.0064|
|     |       |strict-match    |    20|exact_match|↑  |0.9439|±  |0.0063|

@heheda12345
Copy link
Collaborator

@zyongye I think we can merge this PR first, and then you open a new PR for kernel integration.

Copy link
Collaborator

@heheda12345 heheda12345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this important kernel!

@heheda12345 heheda12345 enabled auto-merge (squash) October 7, 2025 15:44
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 7, 2025
Copy link
Member

@zyongye zyongye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe need to add this to pass rocm build

simon-mo and others added 4 commits October 7, 2025 12:45
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Simon Mo <simon.mo@hey.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
@heheda12345 heheda12345 disabled auto-merge October 8, 2025 04:20
@heheda12345 heheda12345 enabled auto-merge (squash) October 8, 2025 04:21
@heheda12345 heheda12345 merged commit 127c8b7 into vllm-project:main Oct 8, 2025
85 checks passed
mrasquinha-g pushed a commit to mrasquinha-g/vllm that referenced this pull request Oct 9, 2025
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Simon Mo <simon.mo@hey.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Simon Mo <simon.mo@hey.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Dhruvilbhatt pushed a commit to Dhruvilbhatt/vllm that referenced this pull request Oct 14, 2025
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Simon Mo <simon.mo@hey.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Dhruvil Bhatt <bhattdbh@amazon.com>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Simon Mo <simon.mo@hey.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Simon Mo <simon.mo@hey.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Simon Mo <simon.mo@hey.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Simon Mo <simon.mo@hey.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready ONLY add when PR is ready to merge/full CI is needed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants