Misaligned beam search result with huggingface #975

Open
Abraham-Xu opened this issue Sep 7, 2023 · 7 comments
Labels
bug, stale

Comments

@Abraham-Xu
Contributor

I am so happy that the beam search PR (#857) has been merged into the main branch. However, I tested two models (llama2-7b and baichuan13b-base) and found that they generate beam search results that differ from huggingface transformers.

huggingface:
beam_outputs = model.generate(inputs.input_ids, do_sample=False, num_beams=4, num_return_sequences=4, max_new_tokens=1024)

vllm:
sampling_params = SamplingParams(temperature=0.0, use_beam_search=True, n=4, max_tokens=1024)

Is there anything wrong with my configuration? I checked vllm/tests/conftest.py and found no difference in sampling params.
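For reference, a minimal side-by-side sketch of the two configurations quoted above (assumptions: MODEL_PATH is a placeholder for a local llama2-7b checkpoint, there is enough GPU memory for both runs, the vLLM version still accepts use_beam_search in SamplingParams, and the prompt is only illustrative):

from transformers import AutoModelForCausalLM, AutoTokenizer
from vllm import LLM, SamplingParams

MODEL_PATH = "meta-llama/Llama-2-7b-hf"  # placeholder, substitute your checkpoint
PROMPT = "The quick brown fox"           # illustrative prompt
MAX_TOKENS = 1024

# Hugging Face beam search (deterministic beams: do_sample=False).
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
hf_model = AutoModelForCausalLM.from_pretrained(MODEL_PATH).to("cuda")
inputs = tokenizer(PROMPT, return_tensors="pt").to("cuda")
hf_outputs = hf_model.generate(
    inputs.input_ids,
    do_sample=False,
    num_beams=4,
    num_return_sequences=4,
    max_new_tokens=MAX_TOKENS,
)
hf_texts = tokenizer.batch_decode(hf_outputs, skip_special_tokens=True)

# vLLM beam search with the settings from this issue.
llm = LLM(model=MODEL_PATH)
sampling_params = SamplingParams(
    temperature=0.0, use_beam_search=True, n=4, max_tokens=MAX_TOKENS
)
vllm_texts = [o.text for o in llm.generate([PROMPT], sampling_params)[0].outputs]

# Print both beam lists side by side; HF texts include the prompt, vLLM's do not.
for i, (hf_text, vllm_text) in enumerate(zip(hf_texts, vllm_texts)):
    print(f"--- beam {i} ---")
    print("hf  :", hf_text)
    print("vllm:", PROMPT + vllm_text)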

@WoosukKwon
Collaborator

Hi @Abraham-Xu, thanks for bringing this up. It may be because the generated outputs are too long. Because beam search uses the cumulative log probabilities of the beam candidates, the accumulated numerical differences due to different kernel implementations can affect the results when the outputs are long. Could you please try again with a smaller max_tokens (say 128)?

cc @zhuohan123
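To see why the output length matters, here is a toy (non-vLLM) calculation. The per-token log-probability discrepancy between kernel implementations (1e-5) and the gap between the top two beams' cumulative scores (5e-3) are both illustrative assumptions:

per_token_eps = 1e-5   # assumed per-token logprob difference between backends
score_gap = 5e-3       # assumed gap between two competing beams' cumulative scores

for seq_len in (128, 1024):
    drift = seq_len * per_token_eps  # worst-case accumulated difference
    print(f"len={seq_len}: worst-case drift {drift:.5f}, can reorder beams: {drift > score_gap}")

Under these assumptions the drift stays below the score gap at 128 tokens but exceeds it at 1024, which is why retrying with a smaller max_tokens is a useful sanity check.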

@Abraham-Xu
Contributor Author

Abraham-Xu commented Sep 12, 2023

Hi @WoosukKwon, thanks for the reply. Indeed, the result is better with a smaller max_tokens. But I doubt that the root cause of the error is the different kernel implementations, because FasterTransformer also uses kernel implementations different from huggingface's, yet it can produce the same result under FP32 precision with beam_width=4 and max_tokens=1024, which vLLM cannot. I recommend double-checking the implementation details of beam search.
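For anyone who wants to repeat that FP32 comparison, both frameworks can be pinned to float32 (sketch only; MODEL_PATH is a placeholder for your checkpoint):

import torch
from transformers import AutoModelForCausalLM
from vllm import LLM

MODEL_PATH = "meta-llama/Llama-2-7b-hf"  # placeholder

hf_model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, torch_dtype=torch.float32)
vllm_model = LLM(model=MODEL_PATH, dtype="float32")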

@lichen914

lichen914 commented Sep 12, 2023

I also have the same problem when using codellama with max_tokens=256 and best_of=4.

@lijianxing123

I also have the same problem when using vicuna with max_tokens=256 and best_of=5.

@lijianxing123

Has anyone solved this problem? In the vLLM llama model, I replaced some functions with their hf transformers counterparts, such as LlamaMLP and qkv_proj (everything except PagedAttentionWithRoPE), and the result is still different.

@KatIsCoding

SQLCoder is also affected by this. defog recommends running the model with beam search >= 4, but it is impossible to run it with that configuration on vLLM because the output is always incomplete.

@DarkLight1337 added the bug label on May 31, 2024

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

@github-actions bot added the stale label on Oct 31, 2024