Currently the implementation splits the input sequence into n blocks, e.g. with 4 GPUs the split is:
b0 | b1 | b2 | b3
However, this results in uneven computation: because of the causal attention mask, the GPU holding b3 does roughly 4 times more work than the GPU holding b0.
If we instead split the input sequence into 2n blocks, e.g. with 4 GPUs:
b0,b7 | b1,b6 | b2,b5 | b3,b4
then all GPUs do the same amount of computation, and theoretically the latency should be roughly halved.
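A minimal sketch of the imbalance (hypothetical helper names, not from the implementation): count one unit of work per (query, key) pair attended to under the causal mask, with one token per block for simplicity, and compare the contiguous split against the b0,b7 | b1,b6 | ... pairing.

```python
# Sketch, assuming work per block i is proportional to the i+1 causal
# positions it attends to. Names here are illustrative, not from the repo.

def causal_work(block_ids):
    # block i attends to blocks 0..i under the causal mask
    return sum(i + 1 for i in block_ids)

n_gpus = 4
seq = 2 * n_gpus  # 8 blocks: b0..b7

# naive contiguous split: b0,b1 | b2,b3 | b4,b5 | b6,b7
naive = [[2 * g, 2 * g + 1] for g in range(n_gpus)]
# zigzag split: b0,b7 | b1,b6 | b2,b5 | b3,b4
zigzag = [[g, seq - 1 - g] for g in range(n_gpus)]

print([causal_work(b) for b in naive])   # [3, 7, 11, 15] -> very uneven
print([causal_work(b) for b in zigzag])  # [9, 9, 9, 9]   -> balanced
```

With the contiguous split the slowest GPU carries 15 of the 36 work units; with the zigzag pairing every GPU carries 9, which is where the roughly-halved latency estimate comes from.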
Do you mean Striped Attention when you say zigzag blocking? Or is it something simpler that still gives you much better utilization (lower latency) of the GPUs?
@andreaskoepf Oh... I hadn't read Striped Attention before... (from the name I thought it was some sparse-attention-mask variant of ring attention, like window attention, my bad...)
But from a quick look, it seems Striped Attention does something like:
0,4,8,12 | 1,5,9,13 | 2,6,10,14 | 3,7,11,15
I was thinking about doing something like:
0,1,14,15| 2,3,12,13 | 4,5,10,11 | 6,7,8,9
which may be able to fold the causal mask from:
x
xx
xxx
xxxx
xxxxx
xxxxxx
xxxxxxx
xxxxxxxx
into
x xxxxxxxx
xx xxxxxxx
xxx xxxxxx
xxxx xxxxx
I'm not sure which could give better performance...
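The two layouts can be compared on the same work metric (a sketch with illustrative names; "work" again counts causal (query, key) pairs per block, 16 blocks on 4 GPUs):

```python
# Sketch comparing the striped layout (0,4,8,12 | 1,5,9,13 | ...) with the
# paired layout (0,1,14,15 | 2,3,12,13 | ...) from this thread.

SEQ = 16
N_GPUS = 4

def work(block_ids):
    # block i attends to blocks 0..i under the causal mask
    return sum(i + 1 for i in block_ids)

striped = [list(range(g, SEQ, N_GPUS)) for g in range(N_GPUS)]
# -> [0,4,8,12] | [1,5,9,13] | [2,6,10,14] | [3,7,11,15]
paired = [[2 * g, 2 * g + 1, SEQ - 2 - 2 * g, SEQ - 1 - 2 * g]
          for g in range(N_GPUS)]
# -> [0,1,14,15] | [2,3,12,13] | [4,5,10,11] | [6,7,8,9]

print([work(b) for b in striped])  # [28, 32, 36, 40] -> balanced only approximately
print([work(b) for b in paired])   # [34, 34, 34, 34] -> exactly balanced
```

On this count, the striped layout is off by a constant per GPU (negligible for long sequences), while pairing block i with block 2n-1-i balances the causal work exactly, matching the folded-mask picture above.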