Misaligned beam search result with huggingface #975
Comments
Hi @Abraham-Xu, thanks for bringing it up. It may be because the generated outputs are too long. Since beam search uses the cumulative log probabilities of the beam candidates, accumulated numerical differences from different kernel implementations can affect the results when the outputs are long. Could you please try it again with a smaller max_tokens? cc @zhuohan123
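To illustrate the point about cumulative scores, here is a small, purely hypothetical Python sketch (not vLLM code; the epsilon and log-probability values are made up) showing how a tiny per-step difference between two kernel implementations can add up over a long generation and reorder beams whose scores are close:

# Hypothetical numbers for illustration only.
eps = 1e-5            # assumed per-step numerical difference between two kernels
steps = 1024          # long generation, as with max_tokens=1024
base_logprob = -2.0   # assumed average per-token log probability

score_a = steps * base_logprob          # cumulative beam score with kernel A
score_b = steps * (base_logprob + eps)  # cumulative beam score with kernel B

# After 1024 steps the cumulative scores differ by ~0.01, which can be enough
# to flip the ranking of two beam candidates whose scores are very close.
print(score_b - score_a)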
Hi @WoosukKwon, thanks for the reply. Sure, the result is better with a smaller max_tokens. But I doubt that the root cause is the use of different kernel implementations: FasterTransformer also uses different kernel implementations than huggingface, yet it produces the same results under FP32 precision with beam_width=4 & max_tokens=1024, which vLLM cannot. I recommend double-checking the implementation details of beam search.
I also have the same problem when using codellama with max_tokens=256 & best_of=4.
I also have the same problem when using vicuna with max_tokens=256 & best_of=5.
Has anyone solved this problem? In the vLLM llama model, I replaced some functions with the HF transformers implementations (e.g. LlamaMLP and qkv_proj), except PagedAttentionWithRoPE, and the result is still different.
SQLCoder is also affected by this. Defog recommends running the model with beam search >= 4, but it is impossible to run it with that configuration on vLLM because the output is always incomplete.
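For reference, the kind of configuration Defog recommends would look roughly like the sketch below in the vLLM API used elsewhere in this thread (the model identifier and token limit are assumptions for illustration, not a verified reproduction):

from vllm import LLM, SamplingParams

# Assumed model identifier and limits, for illustration only.
llm = LLM(model="defog/sqlcoder-7b")
sampling_params = SamplingParams(temperature=0.0, use_beam_search=True, n=4, best_of=4, max_tokens=512)
outputs = llm.generate(["-- your SQL question as the prompt"], sampling_params)
print(outputs[0].outputs[0].text)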
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
I am happy that the beam search PR (#857) has been merged into the main branch. But I tested two models (llama2-7b and baichuan13b-base) and found that they generate different beam search results from huggingface transformers.
huggingface:
beam_outputs = model.generate(inputs.input_ids, do_sample=False, num_beams=4, num_return_sequences=4, max_new_tokens=1024)
vllm:
sampling_params = SamplingParams(temperature=0.0, use_beam_search=True, n=4, max_tokens=1024)
Is there anything wrong with my configuration? I checked vllm/tests/conftest.py and found no difference in sampling params.
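For anyone who wants to reproduce the comparison end to end, a minimal side-by-side script along the lines of the two snippets above might look like this (the model name, prompt, and FP32 dtype are assumptions for illustration; the generate calls mirror the ones quoted in this issue):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "meta-llama/Llama-2-7b-hf"   # assumed model, as in the report
prompt = "The quick brown fox"            # assumed prompt for illustration

# Hugging Face beam search in FP32 to rule out low-precision effects.
tokenizer = AutoTokenizer.from_pretrained(model_name)
hf_model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32)
inputs = tokenizer(prompt, return_tensors="pt")
beam_outputs = hf_model.generate(inputs.input_ids, do_sample=False, num_beams=4,
                                 num_return_sequences=4, max_new_tokens=1024)
hf_texts = tokenizer.batch_decode(beam_outputs, skip_special_tokens=True)

# vLLM beam search with the equivalent sampling parameters.
llm = LLM(model=model_name, dtype="float32")
sampling_params = SamplingParams(temperature=0.0, use_beam_search=True, n=4, max_tokens=1024)
vllm_texts = [o.text for o in llm.generate([prompt], sampling_params)[0].outputs]

for i, (hf_text, vllm_text) in enumerate(zip(hf_texts, vllm_texts)):
    # HF output includes the prompt; vLLM output is only the continuation.
    print(f"--- beam {i} ---")
    print("hf:  ", hf_text[len(prompt):])
    print("vllm:", vllm_text)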