Skip to content

Conversation

@njhill
Copy link
Member

@njhill njhill commented Nov 22, 2025

No description provided.

@mergify mergify bot added the v1 label Nov 22, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds support for logprobs with speculative decoding and asynchronous scheduling. The changes involve refactoring how cumulative token counts are calculated and passed to correctly process logprobs in these scenarios. The modifications in vllm/v1/sample/rejection_sampler.py and vllm/v1/worker/gpu_model_runner.py seem correct and well-structured. New tests are added to cover these cases. My main concern is the significant increase in tolerance in tests/v1/sample/test_logprobs.py for comparing logprobs, which might hide numerical precision issues. Please see the specific comment for details.

@njhill njhill requested a review from benchislett November 22, 2025 05:13
@mergify
Copy link

mergify bot commented Nov 22, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @njhill.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 22, 2025
Signed-off-by: Nick Hill <nhill@redhat.com>
cu_num_tokens = None
if return_cu_num_tokens:
cu_num_tokens = [0] + valid_mask.sum(axis=1).cumsum().tolist()
if len(discard_req_indices) > 0:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this done after computing cu_num_tokens?

Copy link
Member Author

@njhill njhill Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because cu_num_tokens is used to index into the logprobs tensors that don't take the discarded indices into account, so doing it beforehand results in incorrect output.

Originally it was done before which was a bug, fixed by #29216.

Copy link
Collaborator

@benchislett benchislett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall

@njhill njhill merged commit 4e57c65 into vllm-project:main Nov 25, 2025
49 checks passed
@njhill njhill deleted the as-sd-logprobs branch November 25, 2025 20:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready ONLY add when PR is ready to merge/full CI is needed v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants