
Aligning top_p and top_k Sampling #1885

Merged: 5 commits merged into vllm-project:main on Jan 12, 2024

Conversation

chenxu2048 (Contributor)

We noticed a few differences between the top_p and top_k implementation in the vLLM sampler and the Hugging Face one. We have aligned the vLLM sampler with the implementation details of TopPLogitsWarper and TopKLogitsWarper in Hugging Face transformers.

1. Sampling Order

In Hugging Face transformers and FasterTransformer, top_k is applied first, followed by top_p. In vLLM, it is the opposite. Therefore, when both are specified, the probability distribution generated in vLLM may differ; a short sketch after the example below illustrates this.

# top_k = 2
# top_p = 0.5

# top_k first
[ 0.1, 0.2, 0.3, 0.4 ] -> [ -inf, -inf, -inf, 0.4 ] -> [ 0, 0, 0, 1 ]
# top_p first
[ 0.1, 0.2, 0.3, 0.4 ] -> [ -inf, -inf, 0.3, 0.4 ] -> [ 0, 0, 0.475, 0.525 ]
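
To make the ordering difference concrete, here is a minimal PyTorch sketch reproducing this example. It is an illustration of the semantics described above, not the vLLM or transformers source; the helpers top_k_mask and top_p_mask are named here only for the example.

```python
import torch

def top_k_mask(logits: torch.Tensor, k: int) -> torch.Tensor:
    # Mask everything below the k-th largest logit with -inf.
    kth_value = torch.topk(logits, k).values[..., -1, None]
    return logits.masked_fill(logits < kth_value, float("-inf"))

def top_p_mask(logits: torch.Tensor, p: float) -> torch.Tensor:
    # Keep the smallest descending-order prefix whose cumulative probability
    # reaches p; mask everything else with -inf.
    sorted_logits, sorted_idx = logits.sort(descending=True)
    probs = sorted_logits.softmax(dim=-1)
    keep_sorted = probs.cumsum(dim=-1) - probs < p  # mass accumulated before each token
    keep = torch.zeros_like(logits, dtype=torch.bool).scatter(-1, sorted_idx, keep_sorted)
    return logits.masked_fill(~keep, float("-inf"))

logits = torch.tensor([0.1, 0.2, 0.3, 0.4])

# top_k first (HF / FasterTransformer order): top_k leaves {0.3, 0.4}, then
# top_p keeps only 0.4 because it alone already covers p = 0.5 of the mass.
print(top_p_mask(top_k_mask(logits, k=2), p=0.5).softmax(-1))  # ~[0, 0, 0, 1]

# top_p first (previous vLLM order): top_p keeps {0.3, 0.4}, and top_k = 2
# then removes nothing more, so both tokens keep nonzero probability.
print(top_k_mask(top_p_mask(logits, p=0.5), k=2).softmax(-1))  # ~[0, 0, 0.475, 0.525]
```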

2. Sorting Order

Hugging Face transformers' top_p uses an ascending sort, while vLLM uses a descending sort. When several tokens have equal logits, the chosen token may therefore be inconsistent (with torch's stable sort, ties keep their original order).

# top_p = 0.3

# descending
[ 0.2, 0.2, 0.3, 0.3 ] -> [ -inf, -inf, 0.3, -inf ]
# ascending
[ 0.2, 0.2, 0.3, 0.3 ] -> [ -inf, -inf, -inf, 0.3 ]
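
A short sketch of the tie-breaking effect, treating the values above as probabilities (illustration only):

```python
import torch

probs = torch.tensor([0.2, 0.2, 0.3, 0.3])

# A stable sort keeps equal values in their original relative order, so the
# "first" of the tied 0.3 tokens differs between the two sort directions.
print(torch.sort(probs, descending=True, stable=True).indices)   # tensor([2, 3, 0, 1])
print(torch.sort(probs, descending=False, stable=True).indices)  # tensor([0, 1, 2, 3])

# With top_p = 0.3 the nucleus holds exactly one of the tied 0.3 tokens:
# cutting from the front of the descending order keeps index 2, while the
# ascending formulation (drop the low-cumulative-probability tail) keeps
# index 3, matching the two masked results shown above.
```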

3. TopK Selection

In Hugging Face transformers, top_k keeps every logit greater than or equal to the k-th largest value, rather than exactly the top_k items, so tokens tied at the threshold all survive.

# top_k = 1

# huggingface
[ 0.1, 0.3, 0.3, 0.3 ] -> [ -inf, 0.3, 0.3, 0.3 ]
# vllm
[ 0.1, 0.3, 0.3, 0.3 ] -> [ -inf, -inf, 0.3, -inf ]
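
A sketch of the two selection rules on this example (illustration only; the first mirrors TopKLogitsWarper's thresholding, the second the strict top-k behaviour described for the vLLM sampler):

```python
import torch

logits = torch.tensor([0.1, 0.3, 0.3, 0.3])
k = 1

# HF-style: keep every logit >= the k-th largest value, so all tokens tied
# at the threshold survive.
kth_value = torch.topk(logits, k).values[..., -1, None]
print(logits.masked_fill(logits < kth_value, float("-inf")))
# tensor([-inf, 0.3000, 0.3000, 0.3000])

# Strict top-k: keep exactly k sorted positions, so only one of the tied 0.3
# logits survives; which one depends on how the sort breaks ties.
sorted_logits, sorted_idx = logits.sort(descending=True)
strict = torch.full_like(logits, float("-inf"))
strict[sorted_idx[:k]] = sorted_logits[:k]
print(strict)  # exactly one 0.3 remains; the other entries are -inf
```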

Yard1 (Collaborator) commented Dec 1, 2023

@chenxu2048 In order to keep parity moving forward, do you think it would make sense to add a simple unit test comparing the outputs of the vLLM and HF implementations? Also, we set top_k to vocab_size - 1 by default; will that still mean "all possible tokens" with the new implementation?

Yard1 mentioned this pull request on Dec 1, 2023
chenxu2048 (Contributor, Author) commented Dec 4, 2023

> In order to keep parity moving forward, do you think it would make sense to add a simple unit test comparing the outputs of the vLLM and HF implementations?

OK, we will provide a simple unit test comparing the sampler with HF. Should we add the script to the repo or provide it in the PR comments?

> Also, we set top_k to vocab_size - 1 by default; will that still mean "all possible tokens" with the new implementation?

With top_k = vocab_size - 1, we gather the (vocab_size - 1)-th element of logits_sorted, which is the last one, so all of top_k_mask would be true in that case. We can run a unit test to check it.

@Yard1 PTAL

Yard1 (Collaborator) commented Dec 4, 2023

@chenxu2048 please put the test in the repo (tests/samplers would be great), thanks!
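
For reference, a parity test along these lines could look roughly like the sketch below. The TopKLogitsWarper / TopPLogitsWarper imports and their (input_ids, scores) call convention are the real transformers API; apply_top_k_top_p is a local stand-in written to the HF semantics described in this PR, not the actual vLLM helper.

```python
import torch
from transformers import TopKLogitsWarper, TopPLogitsWarper

def apply_top_k_top_p(logits: torch.Tensor, k: int, p: float) -> torch.Tensor:
    """Reference masking in HF order: top_k first, then top_p."""
    # top_k: keep logits >= the k-th largest value.
    kth = torch.topk(logits, k).values[..., -1, None]
    logits = logits.masked_fill(logits < kth, float("-inf"))
    # top_p: ascending sort, drop the tail whose cumulative probability <= 1 - p.
    sorted_logits, sorted_idx = logits.sort(descending=False)
    cum_probs = sorted_logits.softmax(dim=-1).cumsum(dim=-1)
    remove = torch.zeros_like(logits, dtype=torch.bool).scatter(
        -1, sorted_idx, cum_probs <= 1 - p)
    return logits.masked_fill(remove, float("-inf"))

def test_matches_hf_warpers():
    torch.manual_seed(0)
    logits = torch.randn(8, 1024)
    k, p = 40, 0.95
    # The warpers ignore input_ids, so None is fine here.
    hf = TopPLogitsWarper(top_p=p)(None, TopKLogitsWarper(top_k=k)(None, logits.clone()))
    assert torch.allclose(hf, apply_top_k_top_p(logits.clone(), k, p))
```

Once adapted to call the actual sampler, something like this could live under tests/samplers as requested.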

zhuohan123 (Collaborator)

@Yard1 Has the issue in this PR been fixed by the refactor #1889?

Yard1 (Collaborator) commented Dec 18, 2023

No, the refactor did not change the logic, unlike this PR @zhuohan123

chenxu2048 (Contributor, Author)

> No, the refactor did not change the logic, unlike this PR @zhuohan123

Hi @Yard1 @zhuohan123, I'll rebase my work with tests onto #1889 and update the PR this weekend.

chenxu2048 (Contributor, Author)

> Hi @Yard1 @zhuohan123, I'll rebase my work with tests onto #1889 and update the PR this weekend.

This PR is ready for review. PTAL.

chenxu2048 (Contributor, Author) commented Dec 25, 2023

It seems that _get_prompt_and_output_tokens is unused in the sampler, and I think we could remove it.

chenxu2048 (Contributor, Author)

Here are the results of running the test on the main branch and on this PR:
main.test_sampler_top_k_top_p.txt
pr.test_sampler_top_k_top_p.txt

Yard1 merged commit 218dc2c into vllm-project:main on Jan 12, 2024 (2 checks passed).
Yard1 (Collaborator) commented Jan 12, 2024

Thanks!

xjpang pushed a commit to xjpang/vllm that referenced this pull request Jan 14, 2024
* Align top_p and top_k with huggingface

* remove _get_prompt_and_output_tokens

* rename _apply_top_p_top_k

* compare top_p top_k with hf

* fix test errors
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Jan 18, 2024 (same commit messages as above)

hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024 (same commit messages as above)