Why use a manual tensor parallelism implementation instead of something like DeepSpeed? #267

Answered by zhuohan123
vikigenius asked this question in Q&A
ZeRO and FSDP communicate weights, whereas tensor parallelism communicates activations. In inference, activations are small but weights are large, so exchanging activations is much cheaper. We therefore chose tensor parallelism for efficiency.
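
To make the size gap concrete, here is a rough back-of-the-envelope sketch, not a measurement. It assumes a standard transformer layer with about `12 * hidden**2` parameters, a ZeRO-3/FSDP-style all-gather of every layer's weights on each forward pass, and a Megatron-style tensor-parallel layer that performs two ring all-reduces over the activations. The function names and the example sizes (`hidden=4096`, 8 tokens in the batch, 4 GPUs) are hypothetical and chosen only for illustration.

```python
def fsdp_elements_per_layer(hidden: int, world_size: int) -> float:
    """Elements each GPU receives to all-gather one layer's weights.

    Assumes ~12 * hidden^2 parameters per layer (attention + MLP).
    """
    params = 12 * hidden * hidden
    return params * (world_size - 1) / world_size


def tp_elements_per_layer(hidden: int, tokens: int, world_size: int) -> float:
    """Elements each GPU sends for one tensor-parallel layer's forward pass.

    Assumes two all-reduces per layer, each a ring all-reduce that moves
    2 * (N - 1) / N times the activation size per GPU.
    """
    activation = tokens * hidden
    return 2 * 2 * activation * (world_size - 1) / world_size


# Hypothetical decoding step: 8 tokens in the batch, 4 GPUs.
hidden, tokens, world_size = 4096, 8, 4
weight_traffic = fsdp_elements_per_layer(hidden, world_size)
activation_traffic = tp_elements_per_layer(hidden, tokens, world_size)
ratio = weight_traffic / activation_traffic

print(f"weights moved per layer:     {weight_traffic:,.0f} elements")
print(f"activations moved per layer: {activation_traffic:,.0f} elements")
print(f"weights / activations:       {ratio:,.0f}x")
```

Under these assumptions the ratio simplifies to `3 * hidden / tokens`, so the advantage of communicating activations is largest during decoding, when only a few tokens are processed per step, and shrinks as the number of tokens per step grows.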

Answer selected by zhuohan123