Why use a manual tensor parallelism implementation instead of something like DeepSpeed? #267

Answered by zhuohan123
vikigenius asked this question in Q&A
ZeRO and FSDP communicate weights rather than activations, whereas tensor parallelism communicates only activations. In inference, activations are small but weights are large. Therefore we choose tensor parallelism for efficiency.
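To put rough numbers on this, here is a back-of-the-envelope sketch of per-GPU communication for one transformer layer during a single decoding step. It is not taken from any library; the function names are hypothetical, and it assumes about 12·h² parameters per layer, a ZeRO-3/FSDP-style all-gather of the full layer weights on every forward pass, Megatron-style tensor parallelism with two all-reduces per layer in the forward pass, and ring collectives.

```python
# Rough per-GPU communication volume (in elements) for one transformer layer
# during one forward pass. All formulas are simplifying assumptions.

def sharded_weight_elements(hidden: int, world_size: int) -> float:
    """ZeRO-3/FSDP-style: all-gather the layer's weights before using them."""
    params = 12 * hidden * hidden  # assumed: 4h^2 attention + 8h^2 MLP
    # A ring all-gather delivers (N-1)/N of the full tensor to each GPU.
    return params * (world_size - 1) / world_size


def tensor_parallel_elements(hidden: int, tokens: int, world_size: int) -> float:
    """Tensor parallelism: all-reduce activations, weights stay in place."""
    activation = tokens * hidden  # tokens processed in this step
    all_reduces = 2               # assumed: one after attention, one after MLP
    # A ring all-reduce moves about 2 * (N-1)/N of the tensor per GPU.
    return all_reduces * 2 * activation * (world_size - 1) / world_size


if __name__ == "__main__":
    h, n, tokens = 4096, 4, 8  # example values: 8 sequences, 1 new token each
    w = sharded_weight_elements(h, n)
    a = tensor_parallel_elements(h, tokens, n)
    print(f"weights: {w:.0f}  activations: {a:.0f}  ratio: {w / a:.0f}x")
```

Under these assumptions the ratio simplifies to 3·h / tokens, so the gap narrows as the number of tokens per step grows (for example during long-prompt prefill) and is largest for small-batch decoding.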

Replies: 1 comment 1 reply

1 reply
@qingquansong
Answer selected by zhuohan123