
Generate nothing from VLLM output #1185

Open
FocusLiwen opened this issue Sep 26, 2023 · 20 comments

Comments

@FocusLiwen

When I run batch inference, the output from vLLM is sometimes empty, i.e. the prediction is an empty string. Could we make it generate at least one token? That the output can be empty at all also seems strange.

@AnupKumarJha

@FocusLiwen can you add some more detail, such as how you are running inference, your sampling params, and what your request looks like?

@FocusLiwen
Author

FocusLiwen commented Sep 26, 2023

Hi, I used tensor_parallel_size=2 with seed=0 and the following sampling parameters:
"max_tokens": 128,
"temperature": 0,
"top_p": 1.0,
"top_k": -1
I also extracted the output from the generation call:

"gold": {"text": "Sports", "supplements": {}},
"predictions": [{"text": "", "raw_text": "", "logprob": 0, "tokens": []}]}

@FocusLiwen
Author

In the Hugging Face generation API there is a parameter called min_gen_len, which can be set to 1 to avoid empty output. But with vLLM there is no such parameter.
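
For comparison, a minimal sketch of the Hugging Face side, assuming a causal LM; in recent transformers releases the equivalent knob is named min_new_tokens:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model purely for illustration.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("Classify the topic of this article:", return_tensors="pt")
# min_new_tokens forces at least one new token to be generated before EOS can end the sequence.
out = model.generate(**inputs, max_new_tokens=128, min_new_tokens=1)
print(tok.decode(out[0], skip_special_tokens=True))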

@viktor-ferenczi
Contributor

Does this happen if you increase the temperature to 1e-3 or 1e-2?

@summer66

summer66 commented Oct 4, 2023

In the Hugging Face generation API there is a parameter called min_gen_len, which can be set to 1 to avoid empty output. But with vLLM there is no such parameter.

We ran into the same problem (no output). Being able to set a minimum length for the generated text would be very helpful.

@eaubin

eaubin commented Jan 2, 2024

The Mixtral AWQ vLLM example gives empty output (with temperature 0, 0.5, or 1.0, or with the default sampling parameters).

@raihan0824

The Mixtral AWQ vLLM example gives empty output (with temperature 0, 0.5, or 1.0, or with the default sampling parameters).

Same, have you solved this?

@Andrew-MAQ

Andrew-MAQ commented Jan 8, 2024

The Mixtral AWQ vLLM example gives empty output (with temperature 0, 0.5, or 1.0, or with the default sampling parameters).

Same, have you solved this?

This is the same for me as well. It reports 1024 completion_tokens, but the content is blank. The dolphin version, TheBloke/dolphin-2.6-mixtral-8x7b-AWQ, seems to work.

@hnhlester

The Mixtral AWQ vLLM example gives empty output (with temperature 0, 0.5, or 1.0, or with the default sampling parameters).

Same, have you solved this?

This is the same for me as well. It reports 1024 completion_tokens, but the content is blank. The dolphin version, TheBloke/dolphin-2.6-mixtral-8x7b-AWQ, seems to work.

Seeing the same thing. Could it be a problem with the model itself? I wonder if TheBloke's GPTQ version works.

@sfc-gh-hazhang
Contributor

It is likely your weights are corrupted

@HIT-Owen

HIT-Owen commented Feb 2, 2024

The Mixtral AWQ vLLM example gives empty output (with temperature 0, 0.5, or 1.0, or with the default sampling parameters).

Same problem; has it been resolved?

@imadcap

imadcap commented Feb 6, 2024

Same issue here!
Using vllm==0.3.0+cu118 with TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ.
Something is definitely wrong: it outputs an empty string even though the computation runs and the GPU is being used. Does anyone know why?

@arifcraft

Same problem. Some of my generation outputs are empty with Mixtral-8x7B-Instruct.

@hahmad2008

Same here with Mistral: the output is empty.

@Meersalzeis

I encountered the same issue (empty string output) with TheBloke's Mixtral AWQ on vLLM, and with two loaders from Oobabooga's Web UI as well. However, ybelkada/Mixtral-8x7B-Instruct-v0.1-AWQ worked on both vLLM and the Web UI for me.

I'm still not 100% sure it's a faulty model, so I'd be happy if one of you could confirm (or deny) this with your setup.

@cieske

cieske commented Apr 3, 2024

In my case, using a Mistral Instruct model, formatting the input with the proper chat template and setting max_tokens in SamplingParams helps.
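
Roughly what I mean, as a minimal sketch (the model id and prompt are placeholders for my actual setup):

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # placeholder instruct model
tok = AutoTokenizer.from_pretrained(model_id)

# Wrap the raw prompt in the model's chat template ([INST] ... [/INST] for Mistral Instruct).
messages = [{"role": "user", "content": "Classify the topic: ..."}]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

llm = LLM(model=model_id)
params = SamplingParams(temperature=0, max_tokens=256)
print(llm.generate([prompt], params)[0].outputs[0].text)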

@sAviOr287

Is there any news on this? I also get empty strings for some requests when running in batches, for some reason. With a batch size of one this never happens. Any update would be great.
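
In the meantime, a crude workaround sketch based on that observation (llm, prompts, and params are assumed to be set up already): batch first, then re-run any prompt that came back empty on its own.

outputs = llm.generate(prompts, params)
texts = [o.outputs[0].text for o in outputs]

for i, text in enumerate(texts):
    if not text.strip():
        # Re-run the offending prompt individually, since batch size 1 never produced empties for me.
        retry = llm.generate([prompts[i]], params)
        texts[i] = retry[0].outputs[0].text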

@AmoghM

AmoghM commented Apr 30, 2024

Same issue!

@hugocool

I'm facing the same issue with Llama 3 8B on a 48 GB VRAM GPU while using the outlines library to enforce JSON responses: the fields come back empty, even though there is plenty of memory left and the model is fully loaded on the GPU.

@joaograndotto

joaograndotto commented May 28, 2024

Same problem: when the prompt is run for the first time it generates normally, but if the same prompt is run again it returns an empty response. Model: Phi-3 medium.

EDIT:
I solved it by adding min_tokens in SamplingParams:

SamplingParams(temperature=0.5, min_tokens=1000)
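
For anyone else landing here, a slightly fuller sketch of that fix (min_tokens requires a reasonably recent vLLM release; the model id is a placeholder):

from vllm import LLM, SamplingParams

llm = LLM(model="microsoft/Phi-3-medium-4k-instruct")  # placeholder model id
# min_tokens keeps EOS/stop tokens from ending the output before that many tokens are generated;
# even min_tokens=1 is enough to rule out a completely empty completion.
params = SamplingParams(temperature=0.5, min_tokens=1, max_tokens=512)
print(llm.generate(["Your prompt here"], params)[0].outputs[0].text)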
