Accumulation of tokens while beam_width > 1 #513

### System Info

tensorrt_llm==0.11.0.dev2024061800

### Who can help?

@ncomly-nvidia 

### Information

- [X] The official example scripts
- [ ] My own modified scripts

### Tasks

- [X] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

Deploy a model with `beam_width > 1` on the trtllm backend, then send a request to the BLS model through the `generate_stream` endpoint with `stream: true` and `accumulate_tokens: true`.
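A request of roughly this shape should reproduce the error. This is a minimal sketch that assumes Triton's HTTP endpoint is at `localhost:8000`, the BLS model is deployed as `tensorrt_llm_bls`, and the JSON keys (`text_input`, `max_tokens`, `beam_width`, `stream`, `accumulate_tokens`) match the input names in your BLS `config.pbtxt`. Check all of these against your own deployment. The `generate_stream` endpoint replies with server-sent events, so each `data:` line is parsed as JSON.

```python
import json
import urllib.request

# Assumptions: Triton HTTP endpoint and BLS model name; adjust to your deployment.
TRITON_URL = "http://localhost:8000"
MODEL_NAME = "tensorrt_llm_bls"


def build_request(prompt: str, beam_width: int, accumulate_tokens: bool) -> urllib.request.Request:
    """Build a POST request for the generate_stream endpoint.

    The payload keys are assumed to match the BLS model's input tensor names.
    """
    payload = {
        "text_input": prompt,
        "max_tokens": 32,
        "beam_width": beam_width,
        "stream": True,
        "accumulate_tokens": accumulate_tokens,
    }
    url = f"{TRITON_URL}/v2/models/{MODEL_NAME}/generate_stream"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_sse_line(line: bytes):
    """Return the JSON object carried by a 'data: ...' SSE line, or None otherwise."""
    text = line.decode("utf-8").strip()
    if not text.startswith("data:"):
        return None
    return json.loads(text[len("data:"):].strip())


def stream(prompt: str, beam_width: int = 2, accumulate_tokens: bool = True) -> None:
    """Send the request and print every streamed event as it arrives."""
    req = build_request(prompt, beam_width, accumulate_tokens)
    with urllib.request.urlopen(req) as resp:
        # The response body is read line by line; each event is one 'data:' line.
        for raw in resp:
            event = parse_sse_line(raw)
            if event is not None:
                print(event)
```

Calling `stream("What is machine learning?")` against a deployment whose engine supports `beam_width > 1` exercises the failing path. Passing `beam_width=1` with the same payload should stream normally.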

### Expected behavior

It should be possible to set `accumulate_tokens` to `True` when `beam_width > 1`.

### actual behavior

error thrown: `Accumulation of tokens is only implemented for beam width = 1`

### additional notes

Perhaps all that is needed is an enhancement to the BLS script?
