Skip to content

[QUESTION] Inquiry about DynamicGenerator: EOS not returned until max_new_tokens reached, despite stop_conditions #809

@keds-rnd

Description

@keds-rnd

OS

Linux

GPU Library

CUDA 12.x

Python version

3.10

Pytorch version

2.8.0+cu128

Model

gemm3 27b exl2

Describe the bug

Hello,

I'm a developer in South Korea using your framework, and I want to start by saying thank you for building such an excellent library like ExLlamaV2.

I am currently using a combination of ExLlamaV2 (ExLlamaV2DynamicJob) with FastAPI and Redis to handle multiple concurrent user requests.

I've observed an issue where, even when the model's response to a user query is shorter than max_new_tokens, the generation process seems to continue internally until it reaches max_new_tokens before finally reporting the End-of-Stream (EOS) status.

This persistence occurs even though I have explicitly set stop_conditions. (Currently, I'm mitigating this by forcibly closing the web connection when no data is received for a set period, to allow the next conversation to begin.)

I'm wondering if there's a recommended way to force the LLM to return the EOS status immediately after the output is logically complete, without having to wait until max_new_tokens has been fully generated.

Here is the code I use to create the job:

job = ExLlamaV2DynamicJob(
    input_ids=input_ids, max_new_tokens=max_new_tokens,
    stop_conditions=get_stop_conditions(PROMPT_FORMAT, tokenizer),
    gen_settings=ExLlamaV2Sampler.Settings(),
    filter_prefer_eos=True, identifier=job_id
)

The model I am using is Gemma 3.

Thank you for your time and assistance!

Best regards.

Reproduction steps

.

Expected behavior

.

Logs

No response

Additional context

No response

Acknowledgements

  • I have looked for similar issues before submitting this one.
  • I understand that the developers have lives and my issue will be answered when possible.
  • I understand the developers of this program are human, and I will ask my questions politely.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions