Closed
Labels
bug (Something isn't working)
Description
System Info
gpu: A100
trtllm v0.11.0
trtllm-backend v0.11.0
image: Triton Inference Server: 24.07-trtllm-python-py3
model: llama-7b
lora: Japanese-Alpaca-LoRA-7b-v0
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Reference doc: https://github.com/triton-inference-server/tensorrtllm_backend/tree/v0.11.0/inflight_batcher_llm
python3 inflight_batcher_llm/client/inflight_batcher_llm_client.py \
    --request-output-len 10 \
    --text "hello" \
    --tokenizer-dir /opt/app/ori_models/llama_7b \
    --lora-path /opt/app/TensorRT-LLM/examples/llama/Japanese-Alpaca-LoRA-7b-v0-weights \
    --lora-task-id 1
Expected behavior
Sending the same request with the same parameters multiple times should return the same result every time.
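One way to check this expectation is to run the client command from the Reproduction section several times and count the distinct outputs. The following is only a rough sketch: the helper names (`run_client`, `count_distinct_outputs`, `is_deterministic`) are made up for illustration, and it assumes the client's stdout contains nothing that legitimately varies between runs (timings, request IDs); if it does, the generated text would need to be extracted first.

```python
import subprocess
from collections import Counter

# Command copied from the Reproduction section; the paths are the reporter's.
CLIENT_CMD = [
    "python3", "inflight_batcher_llm/client/inflight_batcher_llm_client.py",
    "--request-output-len", "10",
    "--text", "hello",
    "--tokenizer-dir", "/opt/app/ori_models/llama_7b",
    "--lora-path",
    "/opt/app/TensorRT-LLM/examples/llama/Japanese-Alpaca-LoRA-7b-v0-weights",
    "--lora-task-id", "1",
]


def run_client(cmd=CLIENT_CMD):
    """Run the client once and return its stdout (hypothetical helper)."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def count_distinct_outputs(run_once, attempts=5):
    """Call run_once() `attempts` times and count each distinct output."""
    return Counter(run_once() for _ in range(attempts))


def is_deterministic(run_once, attempts=5):
    """True if every call to run_once() produced the same output."""
    return len(count_distinct_outputs(run_once, attempts)) == 1


# Usage against a running Triton server (not executed here):
#   counts = count_distinct_outputs(run_client, attempts=5)
#   print(len(counts), "distinct outputs in 5 runs")
```

More than one distinct output for identical parameters would reproduce the behavior reported here.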
Actual behavior
Additional notes
tensorrt_llm: config.pbtxt
config.pbtxt.txt

