Why use manual tensor parallelism implementation instead of using something like deepspeed? #267
vikigenius asked this question in Q&A
I know it could simply have been a design decision, but I would love to hear the rationale behind rolling your own implementation of model parallelism rather than taking on something like DeepSpeed as a dependency.
Answered by zhuohan123, Jun 30, 2023
ZeRO and FSDP communicate weights instead of activations. In inference, activations are small (each decoding step processes just one token per sequence) but weights are large, so gathering weights on every forward pass would dominate the communication cost. Therefore we chose tensor parallelism, which keeps the weight shards in place and only communicates activations, for efficiency.
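
For intuition, here is a minimal single-process sketch of this argument (not vLLM's actual code): it simulates Megatron-style tensor parallelism for one transformer MLP block with NumPy, using illustrative LLaMA-7B-like dimensions, and compares the bytes a tensor-parallel all-reduce moves per decode token against the bytes of weights a ZeRO/FSDP-style gather would have to move for the same layer.

```python
import numpy as np

# Illustrative dimensions (roughly LLaMA-7B-like), 2-way tensor parallelism.
hidden, ffn, tp = 4096, 11008, 2
x = np.random.randn(1, hidden).astype(np.float32)  # one decode token

# Each "rank" holds only its shard of the weights:
# W1 is split by columns (column-parallel), W2 by rows (row-parallel).
w1_shards = np.split(np.random.randn(hidden, ffn).astype(np.float32), tp, axis=1)
w2_shards = np.split(np.random.randn(ffn, hidden).astype(np.float32), tp, axis=0)

# Forward pass: every rank computes a partial output from its local shards...
partials = []
for rank in range(tp):
    h = np.maximum(x @ w1_shards[rank], 0)  # column-parallel matmul + ReLU, no comm
    partials.append(h @ w2_shards[rank])    # row-parallel partial output

# ...and the ONLY cross-rank communication is an all-reduce of activations,
# simulated here by summing the partial outputs.
y = sum(partials)

activation_bytes = y.nbytes  # what TP all-reduces per token, per layer
weight_bytes = sum(w.nbytes for w in w1_shards) + sum(w.nbytes for w in w2_shards)
print(f"all-reduced activations: {activation_bytes:,} B per token per layer")
print(f"layer weights (what ZeRO/FSDP-style gathering would move): {weight_bytes:,} B")
```

With these assumed sizes, the all-reduce moves about 16 KB of activations per token per layer, while the layer's weights are hundreds of MB; that gap is exactly what the answer is pointing at.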