[Feature Request] Balancing computation with zigzag blocking #2

Closed
zhuzilin opened this issue Feb 22, 2024 · 2 comments
Comments

@zhuzilin
Owner

Currently, the implementation splits the input sequence into n contiguous blocks, one per GPU. For example, with 4 GPUs the split is:

b0 | b1 | b2 | b3

However, this results in uneven computation: because of the causal attention mask, the GPU holding b3 does around 4 times more work than the GPU holding b0 (b3 attends to four key/value blocks, while b0 attends to only one).
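The imbalance can be checked by counting unmasked (query, key) pairs per GPU. This is a minimal sketch, not code from this repository: the function name and the 16-token example are assumptions made for illustration. Counting at token granularity gives a somewhat larger gap than the block-level estimate, because the diagonal block is only half filled.

```python
def causal_pairs_contiguous(seq_len: int, world_size: int) -> list[int]:
    """Count unmasked (query, key) pairs per rank for a contiguous split.

    Hypothetical helper: assumes seq_len is divisible by world_size.
    """
    block = seq_len // world_size
    counts = []
    for rank in range(world_size):
        start, end = rank * block, (rank + 1) * block
        # Under a causal mask, the query at position q attends to keys 0..q,
        # which is q + 1 keys.
        counts.append(sum(q + 1 for q in range(start, end)))
    return counts


counts = causal_pairs_contiguous(seq_len=16, world_size=4)
print(counts)  # [10, 26, 42, 58]
```

Since every ring step has to wait for the slowest rank, the last rank's share is what bounds the overall latency.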

If we instead split the input sequence into 2n blocks and give each GPU one block from the front and one from the back, e.g. with 4 GPUs:

b0,b7 | b1,b6 | b2,b5 | b3,b4

then every GPU has the same amount of computation, and theoretically the latency should be cut roughly in half.
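A small sketch of this pairing, under stated assumptions: the helper names and the block length of 2 tokens are hypothetical, and work is measured simply as the number of unmasked (query, key) pairs.

```python
def zigzag_blocks(world_size: int) -> list[tuple[int, int]]:
    """Pair block r with block 2n-1-r, so rank r gets one early and one late block."""
    n_blocks = 2 * world_size
    return [(r, n_blocks - 1 - r) for r in range(world_size)]


def causal_pairs(block_ids: tuple[int, ...], block_len: int) -> int:
    """Count unmasked (query, key) pairs for the queries in the given blocks."""
    total = 0
    for b in block_ids:
        for q in range(b * block_len, (b + 1) * block_len):
            total += q + 1  # query q sees keys 0..q under a causal mask
    return total


assignment = zigzag_blocks(4)
print(assignment)  # [(0, 7), (1, 6), (2, 5), (3, 4)]

work = [causal_pairs(blocks, block_len=2) for blocks in assignment]
print(work)  # [34, 34, 34, 34]
```

The pairing works because the cost of a block grows linearly with its index, so block r and block 2n-1-r always sum to the same amount.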

@andreaskoepf

Do you mean Striped Attention when you say zigzag blocking? Or is it something simpler that still gives much better utilization (lower latency) of the GPUs?

@zhuzilin
Owner Author

@andreaskoepf Oh... I hadn't read the Striped Attention paper before... (from the name, I thought it was a sparse-attention-mask variant of ring attention, like window attention. My bad...)

But from a quick look, it seems that Striped Attention assigns tokens to the 4 GPUs in a round-robin pattern, something like:

0,4,8,12 | 1,5,9,13 | 2,6,10,14 | 3,7,11,15

I was thinking about doing something like:

0,1,14,15 | 2,3,12,13 | 4,5,10,11 | 6,7,8,9

which may be able to fold the causal mask from:

x
xx
xxx
xxxx
xxxxx
xxxxxx
xxxxxxx
xxxxxxxx

into

x xxxxxxxx
xx xxxxxxx
xxx xxxxxx
xxxx xxxxx

I'm not sure which would give better performance...
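As a rough first comparison, the two layouts can be generated and their total causal work per GPU counted. This is only a sketch with hypothetical helper names. It counts unmasked (query, key) pairs over the whole sequence and says nothing about per-ring-step balance or kernel efficiency, so it does not settle which layout is faster in practice.

```python
def striped_layout(seq_len: int, world_size: int) -> list[list[int]]:
    """Round-robin token assignment: rank r gets tokens r, r + n, r + 2n, ..."""
    return [list(range(r, seq_len, world_size)) for r in range(world_size)]


def zigzag_layout(seq_len: int, world_size: int) -> list[list[int]]:
    """Split into 2n blocks; rank r gets block r and block 2n-1-r.

    Assumes seq_len is divisible by 2 * world_size.
    """
    block = seq_len // (2 * world_size)
    layout = []
    for r in range(world_size):
        tail_block = 2 * world_size - 1 - r
        head = list(range(r * block, (r + 1) * block))
        tail = list(range(tail_block * block, (tail_block + 1) * block))
        layout.append(head + tail)
    return layout


def causal_pairs(tokens: list[int]) -> int:
    """Unmasked (query, key) pairs for these query positions under a causal mask."""
    return sum(q + 1 for q in tokens)


striped = striped_layout(16, 4)
zigzag = zigzag_layout(16, 4)
print(striped)  # [[0, 4, 8, 12], [1, 5, 9, 13], [2, 6, 10, 14], [3, 7, 11, 15]]
print(zigzag)   # [[0, 1, 14, 15], [2, 3, 12, 13], [4, 5, 10, 11], [6, 7, 8, 9]]

striped_work = [causal_pairs(t) for t in striped]
zigzag_work = [causal_pairs(t) for t in zigzag]
print(striped_work)  # [28, 32, 36, 40]
print(zigzag_work)   # [34, 34, 34, 34]
```

By this count the zigzag layout is exactly balanced, while the striped layout is off by a small amount that shrinks relative to the total as the sequence grows.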
