Why use manual tensor parallelism implementation instead of using something like deepspeed? #267
vikigenius asked this question in Q&A
I know it could simply have been a design decision, but I would love to hear the rationale behind rolling your own implementation of model parallelism rather than taking on something like DeepSpeed as a dependency.
Answered by zhuohan123, Jun 30, 2023
ZeRO and FSDP communicate weights instead of activations. In inference, activations are small (each decoding step processes just one token per sequence) but weights are large, so gathering weights on every forward pass would dominate the communication cost. Therefore we chose tensor parallelism, which keeps the weight shards in place and only communicates activations, for efficiency.
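
For intuition, here is a minimal single-process sketch of this argument (not vLLM's actual code): it simulates Megatron-style tensor parallelism for one transformer MLP block with NumPy, using illustrative LLaMA-7B-like dimensions, and compares the bytes a tensor-parallel all-reduce moves per decode token against the bytes of weights a ZeRO/FSDP-style gather would have to move for the same layer.

```python
import numpy as np

# Illustrative dimensions (roughly LLaMA-7B-like), 2-way tensor parallelism.
hidden, ffn, tp = 4096, 11008, 2
x = np.random.randn(1, hidden).astype(np.float32)  # one decode token

# Each "rank" holds only its shard of the weights:
# W1 is split by columns (column-parallel), W2 by rows (row-parallel).
w1_shards = np.split(np.random.randn(hidden, ffn).astype(np.float32), tp, axis=1)
w2_shards = np.split(np.random.randn(ffn, hidden).astype(np.float32), tp, axis=0)

# Forward pass: every rank computes a partial output from its local shards...
partials = []
for rank in range(tp):
    h = np.maximum(x @ w1_shards[rank], 0)  # column-parallel matmul + ReLU, no comm
    partials.append(h @ w2_shards[rank])    # row-parallel partial output

# ...and the ONLY cross-rank communication is an all-reduce of activations,
# simulated here by summing the partial outputs.
y = sum(partials)

activation_bytes = y.nbytes  # what TP all-reduces per token, per layer
weight_bytes = sum(w.nbytes for w in w1_shards) + sum(w.nbytes for w in w2_shards)
print(f"all-reduced activations: {activation_bytes:,} B per token per layer")
print(f"layer weights (what ZeRO/FSDP-style gathering would move): {weight_bytes:,} B")
```

With these assumed sizes, the all-reduce moves about 16 KB of activations per token per layer, while the layer's weights are hundreds of MB; that gap is exactly what the answer is pointing at.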