-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RISC-V] Eliminate unnecessary add instruction in array indirection #112978
[RISC-V] Eliminate unnecessary add instruction in array indirection #112978
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
RISC-V Release-CLR-VF2: 9463 / 9539 (99.20%)
Release-CLR-VF2.md, Release-CLR-VF2.xml, testclr_output.tar.gz Build information and commandsGIT: RISC-V Release-CLR-QEMU: 9463 / 9539 (99.20%)
Release-CLR-QEMU.md, Release-CLR-QEMU.xml, testclr_output.tar.gz Build information and commandsGIT: RISC-V Release-FX-VF2: 398096 / 441821 (90.10%)
Build information and commandsGIT: RISC-V Release-FX-QEMU: 655992 / 684121 (95.89%)
Release-FX-QEMU.md, Release-FX-QEMU.xml, testfx_output.tar.gz Build information and commandsGIT: |
Can you please share the jitdump of the IR node that this helps with? It does not look right to me that you need to enable x64-enabled-only paths for RISCV64. That probably means the change should be made elsewhere, given that the cases are handled just fine for ARM64 with more general address modes. |
RISC-V Release-CLR-VF2: 9463 / 9539 (99.20%)
Release-CLR-VF2.md, Release-CLR-VF2.xml, testclr_output.tar.gz Build information and commandsGIT: RISC-V Release-FX-VF2: 422260 / 454849 (92.84%)
Build information and commandsGIT: RISC-V Release-CLR-QEMU: 9463 / 9539 (99.20%)
Release-CLR-QEMU.md, Release-CLR-QEMU.xml, testclr_output.tar.gz Build information and commandsGIT: RISC-V Release-FX-QEMU: 605756 / 650538 (93.12%)
Release-FX-QEMU.md, Release-FX-QEMU.xml, testfx_output.tar.gz Build information and commandsGIT: |
Here is the IR JIT dump:
If I understand correctly, the constant extraction is disabled for ARM because ARM does not support [reg + reg + imm] addressing mode. ARM only uses constant offset when either operand is a constant, leading to the [reg + imm] addressing mode. So, once either of the operand is known to be not a constant, ARM will fallback to [reg + reg]. For RISC-V, [reg + reg] can't be generated, neither do [reg + reg + imm]. However, if we create a From here:
And here:
Let me know what you think. Maybe I shouldn't rely on [reg + reg + imm] being generated properly and instead create a proper GT_LEA that only uses one register? |
Perhaps this code should be enabled for RISCV64 instead? runtime/src/coreclr/jit/morph.cpp Lines 3168 to 3180 in 035d268
|
@jakobbotsch Unfortunately, only enabling ARM preferred form for RISC-V still does not eliminate the unnecessary add instruction. JIT Dump:
Generated assembly: IN0005: 000020 addi a0, a0, 16 ; <- still generated here
; gcrRegs -[a0]
; byrRegs +[a0]
...
IN0008: 00002C slli a1, a1, 2
IN0009: 000030 add t6, a0, a1
; byrRegs +[t6]
IN000a: 000034 lw a0, 0(t6) ; <- instead of here
; byrRegs -[a0]
Although we could introduce a third form for RISC-V:
Which leads to the following IR form:
And an optimized assembly: IN0007: 000028 slli a1, a1, 2
IN0008: 00002C add a0, a0, a1
; gcrRegs -[a0]
; byrRegs +[a0]
IN0009: 000030 lw a0, 16(a0)
; byrRegs -[a0] But I am not so sure how it would affect with CSE/hoist optimization. I think with the limitation of RISC-V addressing, it's not an issue. What do you think? Maybe I should benchmark "sum all elements" like in this PR: #61293 ? |
Thanks for checking. I think what you have here looks reasonable as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you
Can you please follow the instructions in #112978 (comment) to agree with the CLA? |
@dotnet-policy-service agree company="Samsung Electronics" |
RISC-V supports the following addressing mode:
[reg]
&[reg + imm]
.genCreateAddrMode
already supports constant determination, however, it's currently only enabled for XARCH.Additionally, I disabled scale extraction for RISC-V.
Before patch:
After patch applied:
See related discussion here: #112917
Part of #84834, cc @dotnet/samsung