
[Question] How can I use an ensemble model to get output tokens one at a time before they're sent to the client? #280

@ZihanLiao

Description


My model produces repeated output, so I'm trying to add rules to the postprocessing model that suppress the duplication. I want to stop generation early as soon as duplication occurs, so I think this logic should act as a wrapper around the streaming mode.
So my question is: how can I get the output tokens one at a time inside the ensemble model so that I can apply this early stopping?
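Setting aside where the check runs (for example a client-side streaming callback or a decoupled Python-backend model; both placements are assumptions, not something confirmed in this thread), the early-stop rule itself can be sketched as a small detector that consumes token ids one at a time. `RepetitionDetector` and `generate_with_early_stop` are hypothetical names, not part of Triton or TensorRT-LLM, and the "same block of up to `max_period` tokens repeated `min_repeats` times back to back" rule is just one assumed definition of duplication.

```python
from collections import deque
from typing import Iterable, List


class RepetitionDetector:
    """Hypothetical helper: feed token ids one at a time and report when the
    tail of the sequence is one block of tokens repeated back to back."""

    def __init__(self, max_period: int = 8, min_repeats: int = 3):
        self.max_period = max_period    # longest repeating block to look for
        self.min_repeats = min_repeats  # how many consecutive copies trigger a stop
        # Only the last max_period * min_repeats tokens can matter, so keep a bounded window.
        self.history = deque(maxlen=max_period * min_repeats)

    def push(self, token_id: int) -> bool:
        """Append one token; return True if generation should stop now."""
        self.history.append(token_id)
        tokens = list(self.history)
        n = len(tokens)
        for period in range(1, self.max_period + 1):
            span = period * self.min_repeats
            if span > n:
                break  # not enough tokens yet for this (or any longer) period
            tail = tokens[n - span:]
            block = tail[:period]
            # The tail must be `block` repeated min_repeats times.
            if all(tail[i] == block[i % period] for i in range(span)):
                return True
        return False


def generate_with_early_stop(token_stream: Iterable[int],
                             detector: RepetitionDetector) -> List[int]:
    """Consume a token stream, stopping right after the token that completes a repetition."""
    out: List[int] = []
    for tok in token_stream:
        out.append(tok)
        if detector.push(tok):
            break
    return out
```

For instance, with `max_period=4, min_repeats=3`, the stream `[5, 1, 2, 1, 2, 1, 2, 9, 9]` is cut after the third consecutive `1, 2`. The thresholds are a trade-off: small values stop legitimate repetition (lists, code), while large values let more duplicated text through before stopping.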


Labels: triaged (Issue has been triaged by maintainers)
