Aligning top_p and top_k Sampling #1885
Conversation
@chenxu2048 In order to keep parity moving forwards, do you think it would make sense to add a simple unit test comparing the outputs of the vLLM and HF implementations? Also, we set top_k to be
Ok, we will provide a simple unit test to compare the sampler with HF. Should we add the script into the repo or provide it in
@Yard1 PTAL
@chenxu2048 please put the test in the repo (tests/samplers would be great), thanks!
No, the refactor did not change the logic, unlike this PR @zhuohan123
Hi, @Yard1 @zhuohan123 I'll rebase my work with tests on #1889 and open the PR again this weekend.
This PR is ready for review. PTAL.
It seems that
Here are the results of testing on the main branch and this PR.
Thanks!
* Align top_p and top_k with huggingface
* remove _get_prompt_and_output_tokens
* rename _apply_top_p_top_k
* compare top_p top_k with hf
* fix test errors
We noticed that there are a few differences in the implementation of top_p and top_k in the vLLM sampler compared to Huggingface's implementation. We have aligned them with the implementation details of TopPLogitsWarper and TopKLogitsWarper in Huggingface transformers.

1. Sampling Order
In Huggingface transformers and FasterTransformer, top_k is applied first, followed by top_p. In vLLM, it is the opposite. Therefore, when both are specified simultaneously, the probability distribution generated by vLLM may differ.
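A minimal pure-Python sketch of why the application order matters (illustrative only, not the actual vLLM/HF tensor code; the filter helpers below are hypothetical). Because each filter renormalizes the distribution that the next filter sees, top_k-then-top_p can admit a different token set than top_p-then-top_k:

```python
def top_k_filter(probs, k):
    # Keep only entries >= the k-th largest probability; zero out the rest.
    threshold = sorted(probs, reverse=True)[k - 1]
    return [p if p >= threshold else 0.0 for p in probs]

def top_p_filter(probs, p):
    # Keep the smallest set of highest-probability entries with mass >= p.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, mass = set(), 0.0
    for i in order:
        kept.add(i)
        mass += probs[i]
        if mass >= p:
            break
    return [probs[i] if i in kept else 0.0 for i in range(len(probs))]

def renorm(probs):
    total = sum(probs)
    return [p / total for p in probs]

def survivors(probs):
    return [i for i, p in enumerate(probs) if p > 0]

dist = [0.4, 0.3, 0.2, 0.1]
k, p = 2, 0.5

# top_k first (HF order), then top_p on the renormalized survivors:
hf_order = top_p_filter(renorm(top_k_filter(dist, k)), p)
# top_p first (old vLLM order), then top_k:
vllm_order = top_k_filter(renorm(top_p_filter(dist, p)), k)

print(survivors(hf_order))    # [0]
print(survivors(vllm_order))  # [0, 1]
```

With k=2 and p=0.5, applying top_k first shrinks the nucleus to a single token, while the reverse order keeps two.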
2. Sorting Order
Huggingface transformers' top_p uses ascending sort order, while vLLM uses descending order. When the logits of tokens are equal, the chosen token may be inconsistent between the two (torch uses stable sorting).
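A pure-Python sketch of the tie-breaking difference (illustrative only; these helpers are simplifications of the two sort directions, not the real warper code). With a stable sort, tied tokens keep their index order, so sorting ascending vs. descending decides which tied token ends up inside the nucleus:

```python
def nucleus_descending(probs, p):
    # Old vLLM style: sort descending, keep tokens until cumulative mass exceeds p.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])  # stable sort
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass > p:
            break
    return sorted(kept)

def nucleus_ascending(probs, p):
    # HF style: sort ascending, drop tokens whose cumulative mass stays <= 1 - p.
    order = sorted(range(len(probs)), key=lambda i: probs[i])  # stable sort
    dropped, mass = set(), 0.0
    for i in order:
        mass += probs[i]
        if mass <= 1 - p:
            dropped.add(i)
    return [i for i in range(len(probs)) if i not in dropped]

dist = [0.25, 0.5, 0.25]  # tokens 0 and 2 are tied
print(nucleus_descending(dist, 0.6))  # [0, 1] -- the tie goes to token 0
print(nucleus_ascending(dist, 0.6))   # [1, 2] -- the tie goes to token 2
```

Both directions keep a nucleus of mass 0.75 here, but a different one of the two tied tokens survives.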
3. TopK Selection
In Huggingface transformers, top_k selection keeps all logits greater than or equal to the k-th largest, so ties at the threshold can keep more than k tokens, rather than exactly the top k items.
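A short sketch of this threshold-based semantics (illustrative pure Python, assuming this reading of the HF behavior; the function name is hypothetical):

```python
def hf_style_top_k(logits, k):
    # Threshold-based top-k: keep every logit >= the k-th largest.
    # Ties at the threshold can leave more than k surviving tokens.
    threshold = sorted(logits, reverse=True)[k - 1]
    return [i for i, x in enumerate(logits) if x >= threshold]

logits = [2.0, 1.0, 1.0, 0.5]
print(hf_style_top_k(logits, 2))  # [0, 1, 2] -- three survive due to the tie at 1.0
```

With k=2, the tie at the second-largest logit means three tokens pass the filter instead of exactly two.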