Why use tensor parallelism when the model can easily fit on a single GPU? #294
-
If the model can fit on a single GPU, wouldn't it be better to use something like DDP instead? What are the advantages of using tensor parallelism if the model is small enough to fit on a single GPU?
-
You are right: for small models, you should just use one GPU. You can start multiple vLLM replicas to achieve "data parallelism" for serving, which is why it is not shown in our code. Tensor parallelism is mainly for large models that cannot fit on a single GPU.
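As a rough illustration of the trade-off, here is a minimal sketch using vLLM's offline `LLM` API. The model names are placeholders and exact argument names may vary across vLLM versions:

```python
from vllm import LLM, SamplingParams

# Small model that fits on one GPU: no tensor parallelism needed
# (tensor_parallel_size defaults to 1).
llm = LLM(model="facebook/opt-125m")

# Large model that does not fit on one GPU: shard its weights across
# 4 GPUs with tensor parallelism (illustrative only).
# llm = LLM(model="meta-llama/Llama-2-70b-hf", tensor_parallel_size=4)

prompts = ["Hello, my name is"]
outputs = llm.generate(prompts, SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```

For "data parallelism" with a small model, the idea is simply to launch one independent vLLM server process per GPU (for example by restricting each process to one device via `CUDA_VISIBLE_DEVICES`) and load-balance requests across them; vLLM itself does not need any special configuration for that.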
-
Model parallelism in serving scenarios can indeed be beneficial when serving multiple models at the same time (https://arxiv.org/abs/2302.11665), but if you are serving a single model that fits on one GPU, you should stick to one GPU.