[Bug]: topk=1 and temperature=0 cause different output in vllm #5404
Comments
Hi, I am also seeing different results for the same prompt even though temperature is set to 0.
Version was just updated to v0.4.3.
I'm investigating the issue. Verified the bug by running examples/offline_inference.py with:
However, the bug is present only when adding or removing prompts to/from the input batch (a sketch of this check follows the outputs below). The same behavior is seen across versions (v0.3.3, v0.4.2, v0.4.3). Selected output for reference:
VS
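A minimal sketch of the batch-variation check described above, assuming the standard vllm offline API (LLM, SamplingParams); the model name and prompts are placeholders:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="google/gemma-2b")  # placeholder model
greedy = SamplingParams(temperature=0, max_tokens=64)

probe = "The capital of France is"                            # placeholder prompt
extra = ["Write a haiku about the sea.", "Explain gravity."]  # placeholder prompts

# Generate the probe prompt alone, then again inside a larger batch.
alone = llm.generate([probe], greedy)[0].outputs[0].text
batched = llm.generate([probe] + extra, greedy)[0].outputs[0].text

# With greedy decoding the probe's completion should not depend on what
# else is in the batch; the reported behavior is that it can change.
print("identical:", alone == batched)
```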
All fields looked as expected when I stepped into the
Is there any script that I can use to reproduce this issue? I've been looking into #5607, which appears related, but after digging into it, that bug seems to be related to the presence of
I think #5607 fixed a different issue. After comparing logits before and after temperature scaling, I realized the zero temperature is erroneously reassigned to 1.0. The relevant lines are https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/sampling_metadata.py#L359-L363
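For context, the pattern at those lines appears to be the following (a paraphrased sketch, not a verbatim copy): a near-zero temperature is treated as greedy sampling, and the scaling factor is clamped to 1.0 so that dividing the logits by the temperature is a no-op rather than a division by zero.

```python
_SAMPLING_EPS = 1e-5  # assumed threshold below which temperature counts as zero

def effective_temperature(temperature: float) -> float:
    # Zero (or near-zero) temperature means deterministic sampling
    # (greedy sampling / beam search). Clamp to 1.0 so that
    # logits / temperature is a no-op instead of a division by zero;
    # the sampler is then expected to pick the argmax token.
    if temperature < _SAMPLING_EPS:
        return 1.0
    return temperature
```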
I think these lines of code are likely related to the problem, but whether the temperature should instead be set to _SAMPLING_EPS remains to be sorted out. I quickly tested that modification and, unfortunately, the decoding result turned into nonsense output.
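For what it's worth, one plausible reason the _SAMPLING_EPS experiment produces nonsense, assuming the logits are in half precision at that point (an assumption; the values below are made up): dividing by a near-zero temperature overflows fp16, and the softmax then turns into NaNs.

```python
import torch

logits = torch.tensor([3.2, 1.1, -0.5], dtype=torch.float16)  # made-up logits

# Dividing by ~1e-5 pushes the positive logits far past the fp16 range
# (~65504), so they overflow to inf; a softmax over a row containing inf
# then yields NaNs, and sampling from NaN probabilities gives garbage.
scaled = logits / 1e-5
print(scaled)
print(torch.softmax(scaled.float(), dim=-1))  # cast for CPU softmax; inf is preserved
```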
Hello @rangehow, may I ask which model you are using to reproduce this bug? I am not able to reproduce it with a non-quantized fp16 (bf16) model.
gemma-2b 😃
🐛 Describe the bug
When using different generation configurations such as top_k=1 or temperature=0 (while keeping other settings unchanged), why do the generated results differ? Both should correspond to deterministic greedy decoding.
vllm 0.4.3
Supplement:
The main issue encountered here is that the results generated by setting the temperature to 0 differ from those generated by setting top_k to 1. I understand that, due to kernel optimizations and the non-associativity of floating-point arithmetic, matrix operations carry some nondeterminism. However, sampling happens after the hidden states are produced, at which point no further model computation is involved, so the two sampling configurations should select the same tokens (see the sketch below).
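A minimal sketch of the comparison described above, assuming the standard vllm offline API; the model name and prompt are placeholders:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="google/gemma-2b")          # placeholder model
prompts = ["Explain why the sky is blue."]  # placeholder prompt

# Two configurations that should both amount to greedy decoding.
by_temperature = SamplingParams(temperature=0, max_tokens=64)
by_top_k = SamplingParams(temperature=1.0, top_k=1, max_tokens=64)

out_t = llm.generate(prompts, by_temperature)[0].outputs[0].text
out_k = llm.generate(prompts, by_top_k)[0].outputs[0].text

# The report is that these can differ even though both are nominally deterministic.
print("identical:", out_t == out_k)
```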