
Generate nothing from VLLM output #1185

Open
FocusLiwen opened this issue Sep 26, 2023 · 20 comments

Comments

@FocusLiwen

When I run batch inference, the output from vLLM is sometimes empty, i.e. the prediction is an empty string. Could we make it generate at least one token? That the output can be empty at all also seems strange.

@AnupKumarJha

@FocusLiwen can you add some more detail, such as how you are running inference, your sampling params, and what your request looks like?

@FocusLiwen
Author

FocusLiwen commented Sep 26, 2023

Hi, I used tensor_parallel_size=2 with seed=0 and the following sampling parameters:
"max_tokens": 128,
"temperature": 0,
"top_p": 1.0,
"top_k": -1
I also extracted the output from the generation call:

"gold": {"text": "Sports", "supplements": {}},
"predictions": [{"text": "", "raw_text": "", "logprob": 0, "tokens": []}]}

@FocusLiwen
Author

In the Hugging Face generation API there is a parameter called min_gen_len, which can be set to 1 to avoid empty output. But with vLLM there is no such parameter.
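
For comparison, a minimal sketch of the Hugging Face side, assuming a causal LM; in recent transformers releases the equivalent knob is named min_new_tokens:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model purely for illustration.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("Classify the topic of this article:", return_tensors="pt")
# min_new_tokens forces at least one new token to be generated before EOS can end the sequence.
out = model.generate(**inputs, max_new_tokens=128, min_new_tokens=1)
print(tok.decode(out[0], skip_special_tokens=True))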

@viktor-ferenczi
Contributor

Does this happen if you increase the temperature to 1e-3 or 1e-2?

@summer66

summer66 commented Oct 4, 2023

In the Hugging Face generation API there is a parameter called min_gen_len, which can be set to 1 to avoid empty output. But with vLLM there is no such parameter.

We ran into the same problem (no output). Being able to set a minimum length for the generated text would be very helpful.

@eaubin

eaubin commented Jan 2, 2024

The Mixtral AWQ vLLM example gives empty output (with temperature 0, 0.5, or 1.0, or with the default sampling parameters).

@raihan0824

The Mixtral AWQ vLLM example gives empty output (with temperature 0, 0.5, or 1.0, or with the default sampling parameters).

Same, have you solved this?

@Andrew-MAQ

Andrew-MAQ commented Jan 8, 2024

The Mixtral AWQ vLLM example gives empty output (with temperature 0, 0.5, or 1.0, or with the default sampling parameters).

Same, have you solved this?

This is the same for me as well. It reports 1024 completion_tokens, but the content is blank. The dolphin version, TheBloke/dolphin-2.6-mixtral-8x7b-AWQ, seems to work.

@hnhlester

The Mixtral AWQ vLLM example gives empty output (with temperature 0, 0.5, or 1.0, or with the default sampling parameters).

Same, have you solved this?

This is the same for me as well. It reports 1024 completion_tokens, but the content is blank. The dolphin version, TheBloke/dolphin-2.6-mixtral-8x7b-AWQ, seems to work.

Seeing the same thing. Could it be a problem with the model itself? I wonder if TheBloke's GPTQ version works.

@sfc-gh-hazhang
Contributor

It is likely your weights are corrupted

@HIT-Owen

HIT-Owen commented Feb 2, 2024

The Mixtral AWQ vLLM example gives empty output (with temperature 0, 0.5, or 1.0, or with the default sampling parameters).

Same problem; has it been resolved?

@imadcap

imadcap commented Feb 6, 2024

Same issue here!
Using vllm==0.3.0+cu118 with TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ.
Something is definitely wrong: it outputs an empty string even though the computation runs and the GPU is being used. Does anyone know why?

@arifcraft

Same problem. Some of my generation outputs are empty with Mixtral-8x7B-Instruct.

@hahmad2008

Same here with Mistral: the output is empty.

@Meersalzeis

I encountered the same issue (empty string output) with TheBloke's Mixtral AWQ on vLLM, and with two loaders from Oobabooga's Web UI as well. However, ybelkada/Mixtral-8x7B-Instruct-v0.1-AWQ worked on both vLLM and the Web UI for me.

I'm still not 100% sure it's a faulty model, so I'd be happy if one of you could confirm (or deny) this with your setup.

@cieske

cieske commented Apr 3, 2024

In my case, using a Mistral Instruct model, formatting the input with the proper chat template and setting max_tokens in SamplingParams helps.
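
Roughly what I mean, as a minimal sketch (the model id and prompt are placeholders for my actual setup):

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # placeholder instruct model
tok = AutoTokenizer.from_pretrained(model_id)

# Wrap the raw prompt in the model's chat template ([INST] ... [/INST] for Mistral Instruct).
messages = [{"role": "user", "content": "Classify the topic: ..."}]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

llm = LLM(model=model_id)
params = SamplingParams(temperature=0, max_tokens=256)
print(llm.generate([prompt], params)[0].outputs[0].text)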

@sAviOr287

Is there any news on this? I also get empty strings for some requests when running in batches, for some reason. With a batch size of one this never happens. Any update would be great.
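
In the meantime, a crude workaround sketch based on that observation (llm, prompts, and params are assumed to be set up already): batch first, then re-run any prompt that came back empty on its own.

outputs = llm.generate(prompts, params)
texts = [o.outputs[0].text for o in outputs]

for i, text in enumerate(texts):
    if not text.strip():
        # Re-run the offending prompt individually, since batch size 1 never produced empties for me.
        retry = llm.generate([prompts[i]], params)
        texts[i] = retry[0].outputs[0].text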

@AmoghM

AmoghM commented Apr 30, 2024

Same issue!

@hugocool

I'm facing the same issue with Llama 3 8B on a 48 GB VRAM GPU while using the outlines library to enforce JSON responses: the fields come back empty, even though there is plenty of memory left and the model is fully loaded on the GPU.

@joaograndotto

joaograndotto commented May 28, 2024

Same problem: when the prompt is run for the first time it generates normally, but if the same prompt is run again it returns an empty response. Model: Phi-3 medium.

EDIT:
I solved it by adding min_tokens in SamplingParams:

SamplingParams(temperature=0.5, min_tokens=1000)
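
For anyone else landing here, a slightly fuller sketch of that fix (min_tokens requires a reasonably recent vLLM release; the model id is a placeholder):

from vllm import LLM, SamplingParams

llm = LLM(model="microsoft/Phi-3-medium-4k-instruct")  # placeholder model id
# min_tokens keeps EOS/stop tokens from ending the output before that many tokens are generated;
# even min_tokens=1 is enough to rule out a completely empty completion.
params = SamplingParams(temperature=0.5, min_tokens=1, max_tokens=512)
print(llm.generate(["Your prompt here"], params)[0].outputs[0].text)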
