
large memory usage #23

Open · LzhinFdu opened this issue Mar 19, 2024 · 5 comments

LzhinFdu commented Mar 19, 2024

[image: screenshot of memory-usage test results]
Thanks for sharing this excellent implementation of ring attention.
Here are my test results on 2×A100 (with NVLink). Judging from the results, the memory usage of ring attention (ring_flash_attn_qkvpacked_func) seems to be very large, which is not what I expected. Are there any possible problems?

GeneZC commented Mar 22, 2024

The flash attention results are surprisingly good... I'll keep an eye on this.

LzhinFdu (Author)

> The flash attention results are surprisingly good... I'll keep an eye on this.

Thanks, I'll check it.

GeneZC commented Mar 24, 2024

After reading the code, I found that ring attention expects already-chunked qkv rather than the whole qkv. That is, qkv should be split into its local chunk before being fed into ring attention. This might be the difference.

I'm not entirely certain this is true, though.
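
To make this concrete, here is a minimal sketch (not the repo's own test script) of sharding a packed qkv tensor along the sequence dimension before calling ring_flash_attn_qkvpacked_func. The packed layout (batch, seqlen, 3, nheads, headdim) and the exact call signature are assumptions based on the flash-attn convention, so please check them against the repo:

```python
# Minimal sketch: shard packed qkv along the sequence dimension per rank
# before calling ring attention. Layout and signature are assumptions.
import torch
import torch.distributed as dist
from ring_flash_attn import ring_flash_attn_qkvpacked_func

dist.init_process_group("nccl")
rank, world_size = dist.get_rank(), dist.get_world_size()
torch.cuda.set_device(rank)
device = torch.device(f"cuda:{rank}")

batch, total_seqlen, nheads, headdim = 1, 8192, 32, 128
# In a real run the full qkv comes from the model; random data is used here.
qkv = torch.randn(batch, total_seqlen, 3, nheads, headdim,
                  dtype=torch.bfloat16, device=device)

# Each rank keeps only its 1/world_size slice of the sequence, so activation
# memory per GPU scales with total_seqlen / world_size, not total_seqlen.
local_qkv = qkv.chunk(world_size, dim=1)[rank].contiguous().requires_grad_()

out = ring_flash_attn_qkvpacked_func(local_qkv, causal=True)
```

Feeding the full-length qkv to every rank instead would make each GPU hold the whole sequence, which would explain the unexpectedly large memory usage in the benchmark above.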

jzhang38 commented Apr 6, 2024

@LzhinFdu @GeneZC Yeah, you need to shard the sequence yourself before feeding it into ring-flash-attention.
I have an implementation here: https://github.com/jzhang38/EasyContext

LzhinFdu (Author) commented Apr 6, 2024

> After reading the code, I found that ring attention expects already-chunked qkv rather than the whole qkv. That is, qkv should be split into its local chunk before being fed into ring attention. This might be the difference.
>
> I'm not entirely certain this is true, though.

That's right. So for a fair memory comparison, the context length used for Flash Attention should be doubled to match the total context covered by ring attention across the two GPUs. Even then, Flash Attention still maintains a significant lead in memory usage.
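
For reference, a small sketch of that fair-comparison arithmetic; the numbers below are illustrative, not taken from the benchmark above:

```python
# Illustrative only: how the baseline sequence length should be chosen.
world_size = 2          # GPUs in the ring
local_seqlen = 4096     # chunk length fed to ring attention on each GPU

# Ring attention over world_size chunks covers this total context:
ring_total_context = world_size * local_seqlen   # 8192 tokens

# So the single-GPU flash-attn baseline should run at the same total context,
# i.e. double the per-GPU chunk length when world_size == 2.
flash_attn_seqlen = ring_total_context           # 8192 tokens
```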
