Closed
Labels: bug (Something isn't working)
Description
System Info
I am trying to deploy the Qwen2.5-0.5B-Instruct model on Triton Inference Server after converting it to a TensorRT-LLM engine. After starting the nvcr.io/nvidia/tritonserver:24.07-trtllm-python-py3 container, I cannot find /app/all_models/inflight_batcher_llm/ inside it.
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
- docker run --rm -it --net host --shm-size=2g --ulimit memlock=-1 --ulimit stack=67108864 --gpus all -v </path/to/engines>:/engines nvcr.io/nvidia/tritonserver:24.07-trtllm-python-py3
- mkdir /triton_model_repo
- cp -r /app/all_models/inflight_batcher_llm/* /triton_model_repo/
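If the copy step above fails because the directory is missing, one way to confirm is to guard the `cp` and search the container filesystem for the templates. This is a diagnostic sketch, not part of the official image layout; the `SRC`/`DST` variables and the search depth are my assumptions:

```shell
#!/bin/sh
# Only copy if the template directory actually exists;
# otherwise report the problem and look for it elsewhere in the image.
SRC=/app/all_models/inflight_batcher_llm
DST=/triton_model_repo

if [ -d "$SRC" ]; then
  mkdir -p "$DST"
  cp -r "$SRC"/* "$DST"/
else
  echo "model templates not found at $SRC" >&2
  # Depth-limited search to keep the scan fast inside the container.
  find / -maxdepth 5 -type d -name inflight_batcher_llm 2>/dev/null
fi
```

Running this inside the container either performs the copy or prints any locations where the templates actually live in that image tag.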
Expected behavior
The model repo should be copied.
Actual behavior
You will get an error message: `cp: cannot stat '/app/all_models/inflight_batcher_llm/*': No such file or directory`
Additional notes
Please make the README more descriptive. A detailed guide on converting a Hugging Face model to a TensorRT-LLM engine and then deploying it on Triton Inference Server, including the exact package versions and Docker images to use, would be very helpful.
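For reference, the conversion flow the note asks for generally looks like the sequence below. Every script location, path, and flag here is an assumption based on the TensorRT-LLM examples layout and may differ between releases; treat it as a sketch, not an official recipe:

```shell
# Sketch only -- script locations and flags vary by TensorRT-LLM release.

# 1. Convert the Hugging Face checkpoint to TensorRT-LLM checkpoint format.
#    (convert_checkpoint.py ships under the model's examples directory.)
python3 convert_checkpoint.py \
    --model_dir ./Qwen2.5-0.5B-Instruct \
    --output_dir ./ckpt \
    --dtype float16

# 2. Build the TensorRT engine from the converted checkpoint.
trtllm-build \
    --checkpoint_dir ./ckpt \
    --output_dir ./engines

# 3. Start the Triton container with ./engines mounted (as in the
#    reproduction steps above), populate the inflight_batcher_llm model
#    repository templates, and launch tritonserver against that repo.
```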