
Accumulation of tokens while beam_width > 1 #513

@wxsms


System Info

tensorrt_llm==0.11.0.dev2024061800

Who can help?

@ncomly-nvidia

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Deploy a model with beam_width > 1 using the trtllm backend, then send a request to the BLS model through the generate_stream endpoint with stream: true.
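A request along these lines could look like the sketch below. It assumes Triton's HTTP generate extension listening on localhost:8000 and the model name tensorrt_llm_bls used in the backend's example repository; the helper names, prompt, max_tokens, and beam_width values are illustrative and not taken from the reporter's setup.

```python
import json
import urllib.request

# Assumed values: host/port and model name follow the backend's example layout.
BASE_URL = "http://localhost:8000"
MODEL_NAME = "tensorrt_llm_bls"


def build_stream_request(prompt, beam_width=2, max_tokens=32):
    """Build the URL and JSON body for a streaming generate request."""
    url = f"{BASE_URL}/v2/models/{MODEL_NAME}/generate_stream"
    payload = {
        "text_input": prompt,
        "max_tokens": max_tokens,
        "beam_width": beam_width,  # > 1 is the case this issue is about
        "stream": True,
    }
    return url, payload


def stream_generate(prompt, **kwargs):
    """Send the request and yield each server-sent event as a dict."""
    url, payload = build_stream_request(prompt, **kwargs)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        for raw in resp:
            line = raw.decode("utf-8").strip()
            # Streamed responses arrive as "data: {...}" lines.
            if line.startswith("data:"):
                yield json.loads(line[len("data:"):])
```

Iterating over `stream_generate("some prompt", beam_width=2)` against a deployment with accumulate_tokens enabled should surface the error described under the actual behavior.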

Expected behavior

It should be possible to set accumulate_tokens to True when beam_width > 1.

Actual behavior

An error is thrown: Accumulation of tokens is only implemented for beam width = 1

Additional notes

I think the only change needed may be to extend the BLS script to handle beam_width > 1.
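As a rough illustration of what such an extension might involve, the sketch below accumulates streamed tokens separately for each beam. It is not the BLS script's actual code: the function name and the nested-list layout [1][beam_width][n_new] are assumptions made for the example. It also assumes that beam index i refers to the same hypothesis in every chunk, which may not hold if beam search reorders beams between steps.

```python
def accumulate_per_beam(accumulated, new_tokens):
    """Append each beam's new tokens to that beam's running list.

    accumulated: list of per-beam token lists, or None for the first chunk.
    new_tokens: nested list shaped [1][beam_width][n_new] (batch size 1).
    Hypothetical helper; assumes beam order is stable across chunks.
    """
    if len(new_tokens) != 1:
        raise ValueError("Expected batch size of 1")
    beams = new_tokens[0]
    if accumulated is None:
        accumulated = [[] for _ in beams]
    if len(accumulated) != len(beams):
        raise ValueError("Beam width changed between chunks")
    # Build new lists so the caller's previous state is not mutated.
    return [acc + list(new) for acc, new in zip(accumulated, beams)]
```

For example, feeding two chunks in sequence keeps one growing token list per beam, and each list could then be detokenized on its own.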


Labels

bug
