
Why use tensor parallelism when the model can easily fit on a single GPU? #294

Answered by zhuohan123
vikigenius asked this question in Q&A

You are right: for small models, you should just use one GPU. You can achieve "data parallelism" for serving by starting multiple independent vLLM replicas, which is why it is not shown in our code. Tensor parallelism is mainly for large models that cannot fit on a single GPU.
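The two setups can be sketched with vLLM's API server entry point. This is only an illustration: the model names and ports are placeholders, and exact flags may differ between vLLM versions.

```shell
# Data parallelism: one independent vLLM replica per GPU, each serving the
# full (small) model on its own port; put a load balancer in front of them.
CUDA_VISIBLE_DEVICES=0 python -m vllm.entrypoints.api_server --model facebook/opt-1.3b --port 8000 &
CUDA_VISIBLE_DEVICES=1 python -m vllm.entrypoints.api_server --model facebook/opt-1.3b --port 8001 &

# Tensor parallelism: a single server that shards one large model's weights
# across 2 GPUs; useful only when the model cannot fit on one GPU.
python -m vllm.entrypoints.api_server --model facebook/opt-13b --tensor-parallel-size 2 --port 8000
```

Replicas scale throughput without any cross-GPU communication during inference, while tensor parallelism pays a communication cost per layer in exchange for fitting a larger model.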

Answer selected by zhuohan123
Category: Q&A
6 participants
This discussion was converted from issue #293 on June 28, 2023 15:32.