Truncated response -- repro code #2464

Closed
pseudotensor opened this issue Jan 17, 2024 · 13 comments
Labels
bug Something isn't working

Comments

@pseudotensor

pseudotensor commented Jan 17, 2024

We noticed Mixtral behaving oddly and narrowed it down to a (possibly) 100% repro on vLLM 0.2.7. The script is in the zip file; just replace the FILLIN placeholder in base_url with your endpoint.

testmixnew1.py.zip

Mixtral was run like:

export CUDA_HOME=/usr/local/cuda-12.3
export PIP_EXTRA_INDEX_URL="https://download.pytorch.org/whl/cu123"
pip install git+https://github.com/vllm-project/vllm.git
pip install mosaicml-turbo --upgrade
pip install git+https://github.com/stanford-futuredata/megablocks.git
pip install fschat==0.2.34
export CUDA_VISIBLE_DEVICES=6,7

python -m vllm.entrypoints.openai.api_server --port=5002 --host=0.0.0.0 --model mistralai/Mixtral-8x7B-Instruct-v0.1 --seed 1234 --tensor-parallel-size=2 --max-num-batched-tokens=163840

The output is:

 The Commonwealth Bank of Australia (CBA) reported strong financial results for the first half of fiscal year 2

This output is bad compared to a normal response because it is truncated. The server reports a normal stop, but I don't believe it.
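One way to check a response for this symptom is to look at the `finish_reason` the server returns next to the text. The sketch below is a heuristic only, not part of the original repro: the helper name `classify_completion` and the list of "clean" ending characters are assumptions chosen for illustration, while `"stop"` and `"length"` are the standard finish reasons of the OpenAI-style completions API.

```python
def classify_completion(text: str, finish_reason: str) -> str:
    """Flag completions that report a normal stop but look cut off.

    Hypothetical helper: the set of "clean" ending characters is an
    assumption and will need tuning for other kinds of output.
    """
    stripped = text.rstrip()
    ends_cleanly = stripped.endswith((".", "!", "?", '"', "}", "]"))
    if finish_reason == "length":
        # The server ran out of max_tokens; truncation is expected here.
        return "hit max_tokens"
    if finish_reason == "stop" and not ends_cleanly:
        # The server claims a normal stop, yet the text ends mid-sentence.
        return "suspicious stop"
    return "ok"


if __name__ == "__main__":
    sample = " The Commonwealth Bank of Australia (CBA) reported strong financial results for the first half of fiscal year 2"
    print(classify_completion(sample, "stop"))
```

With the openai client used in this thread, the two arguments would come from `responses.choices[0].text` and `responses.choices[0].finish_reason`.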

The prompt we used is a bit odd in order to repro what we see with normal prompts, so ignore that aspect.

There are several \u escape sequences in the text, and I'm worried they are what leads to the premature stop.

@pseudotensor pseudotensor changed the title Escaped characters messing up response -- repro code Escaped characters leads to truncated response -- repro code Jan 17, 2024
@pseudotensor pseudotensor changed the title Escaped characters leads to truncated response -- repro code Escaped characters or something leads to truncated response -- repro code Jan 17, 2024
@pseudotensor
Author

pseudotensor commented Jan 17, 2024

Here's a simpler repro that happens about 90% of the time.

prompt_llm = """<s>[INST] In order to write a concise single-paragraph summary, pay attention to the following text:

\"\"\"
 The Commonwealth Bank of Australia (CBA) reported strong financial results for the first half of fiscal year 2023, with a statutory net profit after tax of AUD 5.216 billion, up 10% from the same period last year. Cash net profit after tax stood at AUD 5.153 billion, a 9% increase. Operating performance also improved by 18% to AUD 7.820 billion. The bank's home and consumer lending gross lending reached AUD 77 billion, while business and corporate lending gross lending amounted to AUD 18 billion. CBA's net promoter scores (NPS) remained high, with the bank ranking first in the consumer, business, and institutional categories. The bank's liquid assets and deposit funding increased, and its weighted average maturity stood at 5.8 years. CBA's CET1 ratio was 11.4%, and it declared a dividend per share of AUD 2.10 (35 cents). However, the bank warned that forward-looking statements should be treated with caution due to current economic uncertainties and geopolitical risks.
\"\"\"
Using only the text above, write a condensed and concise summary of key results (preferably as one paragraph):
 [/INST]"""

from openai import OpenAI

base_url = 'FILLME'  # replace with your endpoint
base_model = 'mistralai/Mixtral-8x7B-Instruct-v0.1'
api_key = 'EMPTY'
stream_output = False

# Greedy decoding (temperature=0), non-streaming, via the completions endpoint
client_kwargs = dict(model=base_model,
                     prompt=prompt_llm,
                     max_tokens=1024,
                     temperature=0,
                     stream=stream_output)

openai_client = OpenAI(base_url=base_url, api_key=api_key)
responses = openai_client.completions.create(**client_kwargs)
text = responses.choices[0].text
print(text)

gives:

 The Commonwealth Bank of Australia (CBA) announced robust financial results for the first half of fiscal year 2
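Since the truncation is reported as intermittent, repeating the request and counting the bad outputs gives a rough failure rate. The following is a minimal sketch under stated assumptions: `truncation_rate` and its `generate` callback are hypothetical names, and "does not end with sentence-ending punctuation" is only a rough proxy for a truncated summary.

```python
from typing import Callable


def truncation_rate(generate: Callable[[str], str], prompt: str, trials: int = 20) -> float:
    """Return the fraction of trials whose output looks cut off.

    `generate` is any function that sends `prompt` to the model and
    returns the completion text (hypothetical callback).
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    truncated = 0
    for _ in range(trials):
        text = generate(prompt).rstrip()
        # Rough proxy: a finished summary should end a sentence.
        if not text.endswith((".", "!", "?")):
            truncated += 1
    return truncated / trials
```

In the repro above, `generate` could wrap the `completions.create` call and return `responses.choices[0].text`.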

@pseudotensor
Author

pseudotensor commented Jan 17, 2024

Until I see otherwise, I'm going to assume the strict format from the model card, with a space between <s> and [INST], is required as they say, unlike the Mistral models that have no space. With that change, these particular cases no longer have issues. Will re-open if I see others.
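The prompt change described above can be sketched as follows. This is an illustration, not the script from the zip file: the helper name `build_mixtral_prompt` and its `separator` parameter are hypothetical, and the only thing it encodes is the spacing between the BOS string `<s>` and `[INST]`.

```python
def build_mixtral_prompt(user_message: str, separator: str = " ") -> str:
    """Wrap a single user message in the [INST] ... [/INST] format.

    `separator` is the text placed between the BOS string and [INST];
    the default single space follows the workaround in this thread.
    """
    return f"<s>{separator}[INST] {user_message} [/INST]"


if __name__ == "__main__":
    # Original (failing) spacing vs. the spacing with a space after <s>.
    print(repr(build_mixtral_prompt("Summarize the text.", separator="")))
    print(repr(build_mixtral_prompt("Summarize the text.")))
```

Keeping the separator as a parameter makes it easy to compare different spacings against the same endpoint.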

@abdullahpalaz

I am experiencing the same thing with OpenHermes2.5-Mistral 7B AWQ. A chat template fix (I was applying ChatML by hand and switched to tokenizer.apply_chat_template) didn't seem to help. Does anyone have a fix?

@vibhuagrawal14

vibhuagrawal14 commented Feb 9, 2024

@pseudotensor can you please reopen this issue? I too am facing this with Mixtral. I'm trying to generate JSON, and the output often gets truncated, always ending at the character "2", just like in your case (while trying to generate years like 2023 and 2024).
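For JSON output, a truncated response can be detected mechanically by trying to parse it. This is a minimal sketch using only the standard library; the function name `is_complete_json` is hypothetical, and it only checks that the text parses, not that it matches any expected schema.

```python
import json


def is_complete_json(text: str) -> bool:
    """Return True if `text` parses as a complete JSON document."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        # Covers output cut off mid-value, e.g. '{"year": 2'
        return False
    return True


if __name__ == "__main__":
    print(is_complete_json('{"year": 2'))
    print(is_complete_json('{"year": 2023}'))
```

A caller could retry the request when this returns False, which works around the symptom without addressing its cause.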

@vibhuagrawal14

@WoosukKwon very interesting/maddening bug!

@pseudotensor pseudotensor reopened this Feb 9, 2024
@pseudotensor
Author

Sure, I re-opened it. I agree it's unlikely the prompt change should have mattered so much.

@pseudotensor pseudotensor changed the title Escaped characters or something leads to truncated response -- repro code Truncated response -- repro code Feb 9, 2024
@chemrahul82

@vibhuagrawal14 I am seeing exactly the same bug. While writing years or dates, it stops at 2. This is with the Mixtral model. @pseudotensor: any fixes or suggestions? Thanks.

@chemrahul82

chemrahul82 commented Feb 22, 2024

@pseudotensor fixing the spacing between the BOS string and [INST] does appear to have fixed the issue. Thanks.

@zoltan-fedor

zoltan-fedor commented Mar 20, 2024

I actually needed to use two spaces between the BOS (<s>) and [INST] for it to work, although in my case it truncated the response at numbers other than 2.

@abhibisht89

We face this issue as well. Is there any workaround?

@gmittal

gmittal commented Mar 27, 2024

Adding a space between BOS and [INST] fixes this issue for us as well.

@simon-mo
Collaborator

@keskival

keskival commented Mar 28, 2024

See also this discussion: https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1/discussions/182
The spaces almost totally fix the issue, but not completely. It seems to arise from the Mistral training corpus, which likely includes corrupted files.
