Hi @sfc-gh-zhwang, I don't think this is easily doable in vLLM at the moment within a single Python process. You could possibly construct each model on GPU 0 and move it to GPU X before moving on to the next one.
I would recommend starting a separate process for each LLM and specifying `CUDA_VISIBLE_DEVICES` for each, i.e. `CUDA_VISIBLE_DEVICES=0 python script.py`, `CUDA_VISIBLE_DEVICES=1 python script.py`, etc.
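In script form, that recommendation might look like the launcher below, a minimal sketch assuming two GPUs and a per-model vLLM script named `script.py` (as in the commands above):

```python
import os
import subprocess

# Launch one worker process per GPU, each pinned to a single device
# via CUDA_VISIBLE_DEVICES so vLLM only sees that GPU.
procs = []
for gpu_id in range(2):  # assumes 2 GPUs; adjust to your machine
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    procs.append(subprocess.Popen(["python", "script.py"], env=env))

# Wait for all per-GPU processes to finish.
for p in procs:
    p.wait()
```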
### Your current environment

### How would you like to use vllm
The code below will try to initialize every LLM on the first GPU, causing a GPU OOM.
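The code itself did not survive in this thread; a minimal sketch of the pattern being described, with `facebook/opt-125m` standing in as a placeholder model, would be:

```python
from vllm import LLM

# Both constructors allocate on the same default CUDA device (GPU 0);
# there is no per-instance device argument, so the second model does
# not land on GPU 1 and instead runs GPU 0 out of memory.
llm_a = LLM(model="facebook/opt-125m")  # placeholder model name
llm_b = LLM(model="facebook/opt-125m")  # OOMs on GPU 0
```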