Generate nothing from VLLM output #1185
When I run batch inference, the output from vLLM is sometimes empty, i.e. the prediction is an empty string. Could we make it generate at least one token? An empty output is also strange.
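A minimal repro sketch of the batched setup described in this issue; the model name and prompts are placeholders, and the tensor_parallel_size=2 / seed=0 settings come from a comment below rather than a confirmed configuration:

```python
# Minimal sketch of the batched case described above. Model name and
# prompts are placeholders; tensor_parallel_size and seed follow a
# comment below, not a confirmed repro.
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    tensor_parallel_size=2,
    seed=0,
)
prompts = ["Classify the topic of this headline: ...", "Summarize: ..."]
params = SamplingParams(temperature=0.0, max_tokens=128)

# Flag any request whose completion comes back empty.
for out in llm.generate(prompts, params):
    text = out.outputs[0].text
    if not text.strip():
        print(f"Empty completion for prompt: {out.prompt!r}")
```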
@FocusLiwen can you add some more detail, like how you are running inference, your sampling params, and what your request looks like?
Hi, I used tensor_parallel_size=2 with seed=0 and the following parameters: "gold": {"text": "Sports", "supplements": {}},
In the Hugging Face generation API there is a parameter, min_gen_len, which can be set to 1 to avoid empty output. vLLM has no such parameter.
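For reference, newer vLLM releases do expose an equivalent knob: SamplingParams accepts min_tokens, as a later comment in this thread uses. A minimal sketch, assuming such a release (the model name is a placeholder):

```python
# Sketch assuming a vLLM release where SamplingParams supports min_tokens
# (used in a later comment in this thread). min_tokens=1 forces at least
# one token to be generated before EOS can end the sequence.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")  # placeholder model
params = SamplingParams(temperature=0.0, max_tokens=256, min_tokens=1)
print(llm.generate(["Hello, how are you?"], params)[0].outputs[0].text)
```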
Does this happen if you increase the temperature to 1e-3 or 1e-2?
We ran into the same problem (no output). Being able to set a minimum length for the generated text would be very helpful.
The Mixtral AWQ vLLM example gives empty output (with temperature 0, 0.5, 1.0, or the default sampling parameters).
Same, have you solved this?
This is the same for me as well. It reports 1024 completion_tokens, but the content is blank. The dolphin version seems to work: TheBloke/dolphin-2.6-mixtral-8x7b-AWQ.
Seeing the same thing. Could it be a problem with the model itself? I wonder if TheBloke's GPTQ version works.
It is likely your weights are corrupted.
Same problem, has it been resolved?
Same issue here!
Same problem. Some of my generation outputs are empty with Mixtral-8x7B-Instruct.
Same here with Mistral, the output is empty.
I encountered the same issue, an empty string as text output, with TheBloke's Mixtral AWQ on vLLM and with two loaders from Oobabooga's Web UI as well. However, ybelkada/Mixtral-8x7B-Instruct-v0.1-AWQ worked on both vLLM and the Web UI for me. I'm still not 100% sure it's a faulty model, so I'd be happy if one of you could confirm (or deny) this with your setup.
In my case, using the Mistral Instruct model, formatting the input with the proper template and setting max_tokens in SamplingParams helps; see the sketch below.
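A sketch of that fix, with a placeholder prompt and model; Mistral's instruct models expect the [INST] ... [/INST] wrapper:

```python
# Sketch of the fix described above: wrap the prompt in Mistral's instruct
# template and set max_tokens explicitly. Prompt and model are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")
prompt = "[INST] Summarize the following paragraph: ... [/INST]"
params = SamplingParams(temperature=0.7, max_tokens=512)
print(llm.generate([prompt], params)[0].outputs[0].text)
```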
Is there any news on this? I also get empty strings when running in batches; some requests return an empty string for some reason. With a batch size of one this never happens. Any update on this would be great.
Same issue!
I'm facing the same issue with Llama 3 8B on a 48 GB VRAM GPU while using the Outlines library to enforce JSON responses: the fields are empty, even though there is plenty of memory left and the model is loaded on the GPU completely.
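For context, a hedged sketch of an Outlines + vLLM setup like the one described above, assuming the outlines.models.vllm loader and the outlines.generate.json constructor from Outlines 0.0.x; the schema, prompt, and model name are illustrative, not from the original comment:

```python
# Hedged sketch of enforcing JSON output with Outlines on top of vLLM.
# Assumes outlines.models.vllm and outlines.generate.json (Outlines 0.0.x);
# the schema, model name, and prompt are illustrative placeholders.
from pydantic import BaseModel
import outlines

class Headline(BaseModel):
    topic: str
    confidence: float

model = outlines.models.vllm("meta-llama/Meta-Llama-3-8B-Instruct")
generator = outlines.generate.json(model, Headline)
result = generator("Classify this headline: ...")  # parsed Headline instance
print(result)
```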
Same problem: the first time a prompt is run it generates normally, but running the same prompt again returns an empty response. Model: Phi-3 medium. EDIT: SamplingParams(temperature=0.5, min_tokens=1000)