Description
Hello, I built the Triton 23.12 container with the TensorRT-LLM 0.7.1 backend using the third build option in the Triton TensorRT-LLM guide, and deployed two models: Mistral 7B Instruct and Phind CodeLlama v2 34B. When I ask them simple questions that don't need long answers, like "what is 1+1" with max tokens set to 50, the output contains the correct answer, but the model keeps generating: it repeats the question pattern with "2+2 is 4", "3+3 is 6", and so on, until it reaches the max token limit.

I tried sending temperature 0 and stop_words, but it kept happening. I also tried prompt engineering to tell the model to stop after giving the answer, but that didn't work either. This happens with both generate and generate_stream; I send the requests through tensorrt_llm_bls. When I load the same two models in vLLM, they work just fine.

I was wondering if this is a feature or a bug, or if there is a way to stop it in config.pbtxt or somewhere else. Thanks in advance! 🙏
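
For reference, here is roughly what my request looks like. This is a minimal sketch: the host/port and the stop strings are placeholders from my setup, and I'm using the generate-endpoint field names as I understand them for the tensorrt_llm_bls model.

```python
# Sketch of the request I send to the Triton generate endpoint.
# "localhost:8000" and the stop_words values are placeholders from my deployment.
import requests

payload = {
    "text_input": "what is 1+1",
    "max_tokens": 50,        # the model generates up to this limit instead of stopping
    "temperature": 0.0,      # tried greedy decoding; repetition still happens
    "stop_words": ["</s>"],  # also tried plain-text stop strings
    "bad_words": [],
}

resp = requests.post(
    "http://localhost:8000/v2/models/tensorrt_llm_bls/generate",
    json=payload,
)
print(resp.json()["text_output"])
```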