[Misc]: [V1] prompt logprobs + chunked prefill can result in `EngineCore` partial prefill output

See https://github.com/vllm-project/vllm/blob/4f5b059f146adeecd153fa781cf21863ed6679d8/vllm/v1/engine/output_processor.py#L277

Combining prompt logprobs with chunked prefill can cause the engine core to return an output for a partial prefill, so that the partial prompt logprobs can be sent back. This breaks the invariant that `process_outputs` only operates on engine core outputs associated with non-partial completions. Currently this is handled by having `is_prefilling` in `OutputProcessor` check for newly decoded tokens, whose presence indicates that the completion is not partial.
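To make the current workaround concrete, here is a minimal sketch of the idea, not the actual vLLM code. `FakeEngineCoreOutput`, its fields, and `is_partial_prefill` are hypothetical names invented for illustration, and prompt logprobs are simplified to a flat list of floats.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeEngineCoreOutput:
    """Hypothetical stand-in for an engine core output (not the vLLM class)."""
    request_id: str
    # Tokens decoded in this step; empty while the prompt is still prefilling.
    new_token_ids: list[int] = field(default_factory=list)
    # Prompt logprobs for the chunk just prefilled (simplified to floats).
    new_prompt_logprobs: Optional[list[float]] = None


def is_partial_prefill(output: FakeEngineCoreOutput) -> bool:
    """Assumed rule: no newly decoded tokens means the prefill is unfinished."""
    return len(output.new_token_ids) == 0


# A chunk that only carries partial prompt logprobs.
chunk = FakeEngineCoreOutput("req-0", new_prompt_logprobs=[-0.1, -2.3])
# The step that finishes the prefill and also yields the first decoded token.
final = FakeEngineCoreOutput("req-0", new_token_ids=[42],
                             new_prompt_logprobs=[-0.7])

print(is_partial_prefill(chunk), is_partial_prefill(final))
```

Under these assumptions, an output processor would skip detokenization and completion handling for `chunk`, while still forwarding its prompt logprobs.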

A follow-up PR should aggregate partial prompt logprobs in the EngineCore.
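One possible shape for that aggregation is sketched below, purely as an illustration. `PromptLogprobsAggregator`, `add_chunk`, and the `prefill_done` flag are hypothetical and do not correspond to existing vLLM APIs; prompt logprobs are simplified to lists of floats.

```python
from collections import defaultdict
from typing import Optional


class PromptLogprobsAggregator:
    """Hypothetical helper that buffers per-chunk prompt logprobs per request."""

    def __init__(self) -> None:
        self._buffers: dict[str, list[float]] = defaultdict(list)

    def add_chunk(self, request_id: str, chunk_logprobs: list[float],
                  prefill_done: bool) -> Optional[list[float]]:
        """Buffer a chunk; return the full list only once prefill is done."""
        self._buffers[request_id].extend(chunk_logprobs)
        if not prefill_done:
            # Nothing is emitted for a partial prefill.
            return None
        # Prefill finished: hand back everything and drop the buffer.
        return self._buffers.pop(request_id)


agg = PromptLogprobsAggregator()
first = agg.add_chunk("req-0", [-0.1, -2.3], prefill_done=False)
second = agg.add_chunk("req-0", [-0.7], prefill_done=True)
print(first, second)
```

With buffering of this kind on the engine core side, no output would be emitted for a partial prefill, so the invariant on `process_outputs` would hold without the `is_prefilling` check on decoded tokens.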

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.

[Misc]: [V1] prompt logprobs + chunked prefill can result in `EngineCore` partial prefill output #14239