Truncated response -- repro code #2464
Here's a simpler repro that happens about 90% of the time.
gives:
Until I see otherwise, I'm going to assume the strict model card with a space between
I am experiencing the same thing with OpenHermes2.5-Mistral 7B AWQ. The chat template fix (I was applying ChatML by hand; I switched to tokenizer.apply_chat_template) didn't seem to help. Does anyone have a fix?
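For context on what "applying ChatML by hand" looks like, here is a minimal sketch of a hand-rolled ChatML formatter. The `chatml_format` function name is hypothetical; the marker strings are the standard ChatML ones, and the point is that the exact whitespace around the markers is easy to get subtly wrong by hand, which is why `tokenizer.apply_chat_template` is usually preferred:

```python
def chatml_format(messages):
    """Hand-rolled ChatML formatting (a sketch, not from this repo).

    Each message becomes an <|im_start|>role ... <|im_end|> block, and a
    trailing assistant header prompts the model to respond. Whitespace
    placement here is exactly the kind of detail apply_chat_template
    handles for you.
    """
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)
```

With `tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)` the tokenizer's own template produces this string instead, using whatever whitespace the model was actually trained with.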
@pseudotensor can you please reopen this issue? I too am facing this with Mixtral. Trying to generate JSONs, and they often get truncated, always ending at the character "2", just like in your case (while trying to generate years like 2023 and 2024).
@WoosukKwon very interesting/maddening bug!
Sure, I re-opened it. I agree it's unlikely the prompt change should have mattered so much.
@vibhuagrawal14 I am seeing exactly the same bug. While writing years or dates, it stops at 2. This is for the Mixtral model. @pseudotensor: any fixes or suggestions? Thanks.
@pseudotensor fixing the spacing between the BOS string and [INST] does appear to have fixed the issue. Thanks.
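The workaround reported above can be sketched as a toggle in the prompt builder. This is an illustration only (the `mistral_prompt` name and the flag are hypothetical); it shows the two prompt strings being compared, with and without a space between the BOS token and `[INST]`:

```python
def mistral_prompt(user_msg, space_after_bos=True):
    """Build a Mixtral-style instruct prompt (illustrative sketch).

    The stock chat template concatenates "<s>" directly with "[INST]";
    the workaround from this thread inserts a space between them.
    """
    bos = "<s>"
    sep = " " if space_after_bos else ""
    return f"{bos}{sep}[INST] {user_msg} [/INST]"

# The two variants differ only in that one character:
# "<s> [INST] ... [/INST]"  vs  "<s>[INST] ... [/INST]"
```

The difference matters because the tokenizer can produce different token sequences for the two strings, which evidently changes whether the model emits a premature stop.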
I actually needed to use double spaces between the BOS (
We face this issue as well. Any workaround?
Adding a space between BOS and
This sounds like a fix is needed in the chat template? https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1/blob/1e637f2d7cb0a9d6fb1922f305cb784995190a83/tokenizer_config.json#L42 Here's a fix https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1/discussions/176/files but it's waiting on the Mistral team. Meanwhile you can load the fixed version of the chat template in vLLM: https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#chat-template
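Loading an overridden chat template in vLLM looks roughly like the following. This is a sketch based on the linked docs; the template file name is hypothetical, and the exact entrypoint/flags should be checked against the vLLM version in use:

```shell
# Save the fixed Jinja chat template (e.g. from the HF discussion above)
# to a local file, then point the OpenAI-compatible server at it.
python -m vllm.entrypoints.openai.api_server \
    --model mistralai/Mixtral-8x7B-Instruct-v0.1 \
    --chat-template ./fixed_mixtral_template.jinja
```

With the override in place, requests to the `/v1/chat/completions` endpoint are formatted with the fixed template instead of the one bundled in `tokenizer_config.json`.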
See also this discussion: https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1/discussions/182 |
We noticed Mixtral behaving oddly and narrowed it down to a (maybe) 100% repro on 0.2.7. The script is in the zip file; just replace base_url's FILLIN with your endpoint.
testmixnew1.py.zip
Mixtral was run like:
The output is:
This is a bad output compared to normal, as it is truncated. The server says it was a normal stop, but I don't believe it.
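One way to flag this kind of silent truncation when generating JSON is a small sanity check on the returned choice. This is a hypothetical helper, not part of vLLM or the repro script; it assumes an OpenAI-style response dict and uses unbalanced braces as a cheap heuristic:

```python
def looks_truncated(choice):
    """Heuristic truncation check for an OpenAI-style completion choice.

    finish_reason == "length" means max_tokens was hit. The suspicious
    case from this thread is finish_reason == "stop" with output that is
    visibly cut off, e.g. unbalanced braces in a JSON generation.
    """
    text = choice.get("text") or choice.get("message", {}).get("content", "")
    if choice.get("finish_reason") == "length":
        return True
    return text.count("{") != text.count("}")
```

A check like this at least distinguishes "the model chose to stop" from "the output cannot be the complete answer", which is useful when the server reports a normal stop.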
The prompt we used is a bit odd in order to repro what we see with normal prompts, so ignore that aspect.
There are several \u encodings in the text; I'm worried that these lead to the premature stop.
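On the \u encodings: in a JSON request body, those escapes are decoded by the JSON layer before the model ever sees them, so the model receives plain Unicode text. A minimal check (illustrative only, not from the repro script):

```python
import json

# A \uXXXX escape in the raw JSON body...
raw = '{"prompt": "caf\\u00e9"}'

# ...is decoded to the plain Unicode character by json.loads,
# which is what an OpenAI-compatible server does with the body.
decoded = json.loads(raw)["prompt"]
```

So if the escapes are valid JSON, they shouldn't by themselves cause a stop; a malformed escape would instead fail the request outright with a parse error rather than truncate the generation.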