[BACKEND][AMD] Disable linear layout due to perf regression #4126

Merged
merged 1 commit into triton-lang:main from amd-disable-ll on Jun 12, 2024

Conversation

antiagainst
Collaborator

We have identified a 20% perf regression in our downstream flash attention perf kernel after switching to linear layout. Initial analysis shows that register pressure increases enough to cause spills. Further analysis is still ongoing.

So this commit introduces a minimal way to selectively disable linear layout on the AMD backend only, avoiding any effect on the NVIDIA backend while we continue bringing it up on the AMD side.
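The gate itself can be as small as a per-backend check at the point where the lowering path is selected. A minimal self-contained sketch of that idea (the `TargetInfo` struct and `useLinearLayout` helper below are illustrative stand-ins, not the functions this commit actually touches):

```cpp
#include <string>

// Hypothetical stand-in for Triton's backend target info; the real commit
// threads an equivalent check through the existing conversion plumbing.
struct TargetInfo {
  std::string backend;  // e.g. "cuda" or "hip"
  bool isAMD() const { return backend == "hip"; }
};

// Select the lowering path per backend: keep linear layout on NVIDIA,
// fall back to the legacy layout code on AMD while bring-up continues.
bool useLinearLayout(const TargetInfo &target) {
  // Disabled on AMD for now: the switch raised register pressure enough
  // to spill in a downstream flash-attention kernel (~20% regression).
  return !target.isAMD();
}
```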

@antiagainst antiagainst requested a review from jlebar June 12, 2024 04:52
Contributor

@jlebar jlebar left a comment


SGTM. I also observed something similar on NVIDIA, but never big enough to make a significant performance difference. I was never able to pin it down, though, because once I made smaller test cases the IR simplified to exactly what we got without LLs. That is, something about "big test cases" seemed to prevent some important optimization.

I tried things like increasing the recursion depth in instcombine and compute-known-bits, but it didn't help for my test case.
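For anyone who wants to repeat that experiment: the known-bits recursion limit is a compile-time constant in LLVM, not a command-line flag, so raising it means patching `llvm/include/llvm/Analysis/ValueTracking.h` and rebuilding. The upstream default is shown below.

```cpp
// From llvm/include/llvm/Analysis/ValueTracking.h (upstream LLVM default).
// computeKnownBits and related analyses stop recursing at this depth;
// experiments like the one above require bumping it and rebuilding LLVM.
constexpr unsigned MaxAnalysisRecursionDepth = 6;
```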

@antiagainst
Collaborator Author

Yeah, we are looking into this at the moment. In the downstream kernel (specifically, this one), lots of registers are already in use, so the change in codegen causes spills. The flash attention tutorial is not as big an issue because fewer registers are used there.

@antiagainst antiagainst marked this pull request as ready for review June 12, 2024 05:50
@antiagainst antiagainst merged commit e8bc45d into triton-lang:main Jun 12, 2024
6 checks passed
@antiagainst antiagainst deleted the amd-disable-ll branch June 12, 2024 05:51
jataylo pushed a commit to ROCm/triton that referenced this pull request Jun 19, 2024
…ang#4126)

(cherry picked from commit e8bc45d)
jataylo pushed a commit to jataylo/triton that referenced this pull request Jun 20, 2024
…ang#4126)

(cherry picked from commit e8bc45d)
ptillet pushed a commit that referenced this pull request Jun 20, 2024
Cherry picks for release/3.0.x

General:
- e8bc45d [BACKEND][AMD] Disable linear layout due to perf regression
(#4126)
- 9a0a7c2 [AMD] Add basic verification to MFMA encoding (#4117)
 
For RDNA:
- 100e2aa [AMD][WMMA] Support dot3d (#3674)
- 4a1ea8e [AMD][gfx11] Fix BF16 wmma instr generation (#4135)
 
Proton HIP PRs:
- 328b86d [PROTON] Refactor GPU profilers (#4056)
- 60613fb [PROTON] Roctracer: convert agent id to gpu id for gpu ops
(#4090)
- c1776fa [PROTON][AMD] Add Proton HIP GPU Utilization Metrics (#4119)

---------

Co-authored-by: Lei Zhang <antiagainst@gmail.com>
Co-authored-by: Alexander Efimov <efimov.alexander@gmail.com>
Co-authored-by: Ilya V <152324710+joviliast@users.noreply.github.com>
Co-authored-by: Keren Zhou <kerenzhou@openai.com>
Co-authored-by: mwootton <michael.wootton@amd.com>
Co-authored-by: Corbin Robeck <corbin.robeck@amd.com>