Description
My issue is similar to triton-inference-server/tensorrtllm_backend#481 except it's for vllm.
I have the following config.pbtxt
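A minimal sketch of the relevant part of such a config (illustrative only, not the exact file; the model name is taken from the log below and the instance_group pins the instance to GPU 1):

# Illustrative config.pbtxt sketch: one KIND_GPU instance pinned to GPU 1.
name: "vllm_model_1"
backend: "vllm"
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 1 ]
  }
]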
When I check nvidia-smi, I get:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12             Driver Version: 535.104.12   CUDA Version: 12.4     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-SXM2-32GB           On  | 00000000:86:00.0 Off |                    0 |
| N/A   43C    P0             74W / 300W  | 28465MiB / 32768MiB  |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2-32GB           On  | 00000000:8A:00.0 Off |                    0 |
| N/A   38C    P0             61W / 300W  |   681MiB / 32768MiB  |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                             GPU Memory |
|        ID   ID                                                              Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
Note that I requested the model to run on GPU 1, but nvidia-smi shows that it is running on GPU 0. This is a problem because I have 2 GPUs and want to run one model on each, but vllm/Triton tries to place both on the same GPU (#0), which causes a CUDA OOM error; this seems related to #6855.
I saw that the latest commit on main made changes related to CUDA devices, so I patched my Triton server with that code. The logs show that it is using device 1, but for some reason this does not seem to take effect:
I0613 06:03:52.276264 1 model.py:166] "Detected KIND_GPU model instance, explicitly setting GPU device=1 for vllm_model_1"
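For reference, the mechanism that log line describes can be sketched roughly as follows. This is an illustrative Triton Python-backend snippet, not the actual vllm backend source; it assumes torch is importable in the backend environment and only shows how the instance's device id from config.pbtxt would be applied:

# Illustrative sketch, not the vllm backend's model.py.
import torch

class TritonPythonModel:
    def initialize(self, args):
        # Triton passes the instance kind and device id configured in config.pbtxt.
        if args["model_instance_kind"] == "GPU":
            device_id = int(args["model_instance_device_id"])
            # Make this GPU the default for CUDA allocations in this process;
            # whether the engine actually honors it is what this issue is about.
            torch.cuda.set_device(device_id)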
Triton Information
24.05
Are you using the Triton container or did you build it yourself?
Container from NGC.
Note that I patched the backend code.
To Reproduce
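One way to start the server (illustrative; the image tag, paths, and the mount of the patched model.py are assumptions, not the exact command I used):

docker run --rm --gpus all \
  -v $(pwd)/model_repository:/models \
  -v $(pwd)/model.py:/opt/tritonserver/backends/vllm/model.py \
  nvcr.io/nvidia/tritonserver:24.05-vllm-python-py3 \
  tritonserver --model-repository=/models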
Then check device placement on the host with:
nvidia-smi
Expected behavior
The model should be loaded on GPU 1.