Aligning top_p and top_k Sampling #1885
Conversation
@chenxu2048 In order to keep parity moving forwards, do you think it would make sense to add a simple unit test comparing the outputs of the vLLM and HF implementations? Also, we set top_k to be
Ok, we will provide a simple unit test to compare the sampler with HF. Should we add the script into the repo or provide it in
@Yard1 PTAL
@chenxu2048 please put the test in the repo (tests/samplers would be great), thanks!
No, the refactor did not change the logic, unlike this PR @zhuohan123
Hi, @Yard1 @zhuohan123 I'll rebase my work with tests on #1889 and open the PR again this weekend.
This PR is ready for review. PTAL.
It seems that
Here are the results of testing on the main branch and this PR.
Thanks!
* Align top_p and top_k with huggingface
* remove _get_prompt_and_output_tokens
* rename _apply_top_p_top_k
* compare top_p top_k with hf
* fix test errors
We noticed that there are a few differences in the implementation of top_p and top_k in the vLLM sampler compared to Huggingface's implementation. We have aligned them with the implementation details of TopPLogitsWarper and TopKLogitsWarper in Huggingface transformers.

1. Sampling Order
In Huggingface transformers and FasterTransformer, top_k is applied first, followed by top_p. In vLLM, it is the opposite. Therefore, when both are specified simultaneously, the probability distribution generated by vLLM may differ.
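A minimal pure-Python sketch of why the application order matters (illustrative only, not the actual vLLM/HF tensor code; the filter helpers below are hypothetical). Because each filter renormalizes the distribution that the next filter sees, top_k-then-top_p can admit a different token set than top_p-then-top_k:

```python
def top_k_filter(probs, k):
    # Keep only entries >= the k-th largest probability; zero out the rest.
    threshold = sorted(probs, reverse=True)[k - 1]
    return [p if p >= threshold else 0.0 for p in probs]

def top_p_filter(probs, p):
    # Keep the smallest set of highest-probability entries with mass >= p.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, mass = set(), 0.0
    for i in order:
        kept.add(i)
        mass += probs[i]
        if mass >= p:
            break
    return [probs[i] if i in kept else 0.0 for i in range(len(probs))]

def renorm(probs):
    total = sum(probs)
    return [p / total for p in probs]

def survivors(probs):
    return [i for i, p in enumerate(probs) if p > 0]

dist = [0.4, 0.3, 0.2, 0.1]
k, p = 2, 0.5

# top_k first (HF order), then top_p on the renormalized survivors:
hf_order = top_p_filter(renorm(top_k_filter(dist, k)), p)
# top_p first (old vLLM order), then top_k:
vllm_order = top_k_filter(renorm(top_p_filter(dist, p)), k)

print(survivors(hf_order))    # [0]
print(survivors(vllm_order))  # [0, 1]
```

With k=2 and p=0.5, applying top_k first shrinks the nucleus to a single token, while the reverse order keeps two.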
2. Sorting Order
Huggingface transformers' top_p uses ascending sort order, while vLLM uses descending order. When the logits of tokens are equal, the chosen token may be inconsistent between the two (torch uses stable sorting).
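A pure-Python sketch of the tie-breaking difference (illustrative only; these helpers are simplifications of the two sort directions, not the real warper code). With a stable sort, tied tokens keep their index order, so sorting ascending vs. descending decides which tied token ends up inside the nucleus:

```python
def nucleus_descending(probs, p):
    # Old vLLM style: sort descending, keep tokens until cumulative mass exceeds p.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])  # stable sort
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass > p:
            break
    return sorted(kept)

def nucleus_ascending(probs, p):
    # HF style: sort ascending, drop tokens whose cumulative mass stays <= 1 - p.
    order = sorted(range(len(probs)), key=lambda i: probs[i])  # stable sort
    dropped, mass = set(), 0.0
    for i in order:
        mass += probs[i]
        if mass <= 1 - p:
            dropped.add(i)
    return [i for i in range(len(probs)) if i not in dropped]

dist = [0.25, 0.5, 0.25]  # tokens 0 and 2 are tied
print(nucleus_descending(dist, 0.6))  # [0, 1] -- the tie goes to token 0
print(nucleus_ascending(dist, 0.6))   # [1, 2] -- the tie goes to token 2
```

Both directions keep a nucleus of mass 0.75 here, but a different one of the two tied tokens survives.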
3. TopK Selection
In Huggingface transformers, top_k selection keeps all logits greater than or equal to the k-th largest, so ties at the threshold can keep more than k tokens, rather than exactly the top k items.
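A short sketch of this threshold-based semantics (illustrative pure Python, assuming this reading of the HF behavior; the function name is hypothetical):

```python
def hf_style_top_k(logits, k):
    # Threshold-based top-k: keep every logit >= the k-th largest.
    # Ties at the threshold can leave more than k surviving tokens.
    threshold = sorted(logits, reverse=True)[k - 1]
    return [i for i, x in enumerate(logits) if x >= threshold]

logits = [2.0, 1.0, 1.0, 0.5]
print(hf_style_top_k(logits, 2))  # [0, 1, 2] -- three survive due to the tie at 1.0
```

With k=2, the tie at the second-largest logit means three tokens pass the filter instead of exactly two.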