Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RISC-V] Eliminate unnecessary add instruction in array indirection #112978

Merged
merged 2 commits into from
Mar 4, 2025

Conversation

fuad1502
Copy link
Contributor

@fuad1502 fuad1502 commented Feb 27, 2025

RISC-V supports the following addressing mode: [reg] & [reg + imm].

genCreateAddrMode already supports constant determination, however, it's currently only enabled for XARCH.

Additionally, I disabled scale extraction for RISC-V.

Before patch:

slli           a1, a1, 2
addi           a1, a1, 16
add            t6, a0, a1
lw             a0, 0(t6)

After patch applied:

slli           a1, a1, 2
add            a2, a0, a1
lw             a0, 16(a2)

See related discussion here: #112917

Part of #84834, cc @dotnet/samsung

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 27, 2025
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Feb 27, 2025
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@risc-vv
Copy link

risc-vv commented Feb 27, 2025

RISC-V Release-CLR-VF2: 9463 / 9539 (99.20%)
=======================
      passed: 9463
      failed: 59
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9645
 TOTAL tests: 9645
   REAL time: 2h 7min 36s 603ms
=======================

Release-CLR-VF2.md, Release-CLR-VF2.xml, testclr_output.tar.gz

Build information and commands

GIT: d440f302599a871438c1b29761ec39c7b763dd0d
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/index-addr-opt
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-CLR-QEMU: 9463 / 9539 (99.20%)
=======================
      passed: 9463
      failed: 59
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9645
 TOTAL tests: 9645
   REAL time: 2h 46min 10s 569ms
=======================

Release-CLR-QEMU.md, Release-CLR-QEMU.xml, testclr_output.tar.gz

Build information and commands

GIT: d440f302599a871438c1b29761ec39c7b763dd0d
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/index-addr-opt
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-FX-VF2: 398096 / 441821 (90.10%)
=======================
      passed: 398096
      failed: 160
     skipped: 1413
      killed: 43565
------------------------
  TOTAL libs: 258
 TOTAL tests: 443234
   REAL time: 2h 44min 46s 617ms
=======================

Release-FX-VF2.md, Release-FX-VF2.xml, testfx_output.tar.gz

Build information and commands

GIT: d440f302599a871438c1b29761ec39c7b763dd0d
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/index-addr-opt
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-FX-QEMU: 655992 / 684121 (95.89%)
=======================
      passed: 655992
      failed: 320
     skipped: 1453
      killed: 27809
------------------------
  TOTAL libs: 258
 TOTAL tests: 685574
   REAL time: 2h 27min 27s 421ms
=======================

Release-FX-QEMU.md, Release-FX-QEMU.xml, testfx_output.tar.gz

Build information and commands

GIT: d440f302599a871438c1b29761ec39c7b763dd0d
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: fuad1502/runtime
BRANCH: riscv-jit-opt/index-addr-opt
CONFIG: Release
LIB_CONFIG: Release

@fuad1502 fuad1502 marked this pull request as ready for review February 27, 2025 06:58
@filipnavara filipnavara added the arch-riscv Related to the RISC-V architecture label Feb 27, 2025
@jakobbotsch
Copy link
Member

Can you please share the jitdump of the IR node that this helps with? It does not look right to me that you need to enable x64-enabled-only paths for RISCV64. That probably means the change should be made elsewhere, given that the cases are handled just fine for ARM64 with more general address modes.

@risc-vv
Copy link

risc-vv commented Feb 27, 2025

RISC-V Release-CLR-VF2: 9463 / 9539 (99.20%)
=======================
      passed: 9463
      failed: 59
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9645
 TOTAL tests: 9645
   REAL time: 2h 7min 57s 228ms
=======================

Release-CLR-VF2.md, Release-CLR-VF2.xml, testclr_output.tar.gz

Build information and commands

GIT: 219a0180e8b9e2558988b0a044e5fc2e5f8e9c9c
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-FX-VF2: 422260 / 454849 (92.84%)
=======================
      passed: 422260
      failed: 130
     skipped: 1477
      killed: 32459
------------------------
  TOTAL libs: 258
 TOTAL tests: 456326
   REAL time: 2h 58min 46s 358ms
=======================

Release-FX-VF2.md, Release-FX-VF2.xml, testfx_output.tar.gz

Build information and commands

GIT: 219a0180e8b9e2558988b0a044e5fc2e5f8e9c9c
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-CLR-QEMU: 9463 / 9539 (99.20%)
=======================
      passed: 9463
      failed: 59
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9645
 TOTAL tests: 9645
   REAL time: 2h 45min 34s 456ms
=======================

Release-CLR-QEMU.md, Release-CLR-QEMU.xml, testclr_output.tar.gz

Build information and commands

GIT: 219a0180e8b9e2558988b0a044e5fc2e5f8e9c9c
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-FX-QEMU: 605756 / 650538 (93.12%)
=======================
      passed: 605756
      failed: 325
     skipped: 1453
      killed: 44457
------------------------
  TOTAL libs: 258
 TOTAL tests: 651991
   REAL time: 2h 27min 57s 202ms
=======================

Release-FX-QEMU.md, Release-FX-QEMU.xml, testfx_output.tar.gz

Build information and commands

GIT: 219a0180e8b9e2558988b0a044e5fc2e5f8e9c9c
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

@fuad1502
Copy link
Contributor Author

fuad1502 commented Feb 27, 2025

@jakobbotsch

Here is the IR JIT dump:

N016 ( 17, 25) [000004] ---XG+-----                         *  RETURN    int    $145
N015 ( 16, 24) [000016] ---XG+-----                         \--*  COMMA     int    <l:$182, c:$183>
N004 (  8, 11) [000008] ---X-+-----                            +--*  BOUNDS_CHECK_Rng void   $145
N001 (  1,  1) [000001] -----+-----                            |  +--*  LCL_VAR   int    V01 arg1         u:1 $c0
N003 (  3,  3) [000007] ---X-+-----                            |  \--*  ARR_LENGTH int    $180
N002 (  1,  1) [000000] -----+-----                            |     \--*  LCL_VAR   ref    V00 arg0         u:1 $80
N014 (  8, 13) [000019] n---G+-----                            \--*  IND       int    <l:$181, c:$c2>
N013 (  5, 11) [000015] -----+-----                               \--*  ARR_ADDR  byref int[] $2c0
N012 (  5, 11) [000014] -----+-N---                                  \--*  ADD       byref  $240
N005 (  1,  1) [000005] -----+-----                                     +--*  LCL_VAR   ref    V00 arg0         u:1 (last use) $80
N011 (  6, 15) [000013] -----+-----                                     \--*  ADD       long   $1c2
N009 (  4, 10) [000011] -----+-----                                        +--*  LSH       long   $1c1
N007 (  2,  5) [000009] -----+---U-                                        |  +--*  CAST      long <- uint $1c0
N006 (  1,  1) [000006] -----+-----                                        |  |  \--*  LCL_VAR   int    V01 arg1         u:1 (last use) $c0
N008 (  1,  4) [000010] -----+-N---                                        |  \--*  CNS_INT   long   2 $200
N010 (  1,  4) [000012] -----+-----                                        \--*  CNS_INT   long   16 $202

If I understand correctly, the constant extraction is disabled for ARM because ARM does not support [reg + reg + imm] addressing mode. ARM only uses constant offset when either operand is a constant, leading to the [reg + imm] addressing mode.

So, once either of the operand is known to be not a constant, ARM will fallback to [reg + reg].

For RISC-V, [reg + reg] can't be generated, neither do [reg + reg + imm]. However, if we create a GT_LEA with [reg + reg + imm], when it reaches emitInsLoadStoreOp or genLeaInstruction, it will generate them properly.

From here:

// Generate code for a load or store operation with a potentially complex addressing mode
// This method handles the case of a GT_IND with contained GT_LEA op1 of the x86 form [base + index*sccale + offset]

And here:

// So for the case of a LEA node of the form [Base + Index*Scale + Offset] we will generate:
// tmpReg = indexReg << scale;
// destReg = baseReg + tmpReg;
// destReg = destReg + offset;

Let me know what you think. Maybe I shouldn't rely on [reg + reg + imm] being generated properly and instead create a proper GT_LEA that only uses one register?

@jakobbotsch
Copy link
Member

Perhaps this code should be enabled for RISCV64 instead?

bool groupArrayRefWithElemOffset = false;
#ifdef TARGET_ARMARCH
groupArrayRefWithElemOffset = true;
// TODO: in some cases even on ARM we better use 1) shape because if "index" is invariant and "arrRef" is not
// we at least will be able to hoist/CSE "index + elemOffset" in some cases.
// See https://github.com/dotnet/runtime/pull/61293#issuecomment-964146497
// Don't use 2) for structs to reduce number of size regressions
if (varTypeIsStruct(elemTyp))
{
groupArrayRefWithElemOffset = false;
}
#endif

@fuad1502
Copy link
Contributor Author

fuad1502 commented Feb 28, 2025

@jakobbotsch Unfortunately, only enabling ARM preferred form for RISC-V still does not eliminate the unnecessary add instruction.

JIT Dump:

N014 ( 10, 18) [000019] n---G+-----                            \--*  IND       int    <l:$181, c:$c2>
N013 (  7, 16) [000015] -----+-----                               \--*  ARR_ADDR  byref int[] $2c0
N012 (  7, 16) [000014] -----+-N---                                  \--*  ADD       byref  $201
N007 (  3,  6) [000013] -----+-----                                     +--*  ADD       byref  $200
N005 (  1,  1) [000005] -----+-----                                     |  +--*  LCL_VAR   ref    V00 arg0         u:1 (
last use) $80
N006 (  1,  4) [000012] -----+-----                                     |  \--*  CNS_INT   long   16 $1c0
N011 (  4, 10) [000011] -----+-----                                     \--*  LSH       long   $241
N009 (  2,  5) [000009] -----+---U-                                        +--*  CAST      long <- uint $240
N008 (  1,  1) [000006] -----+-----                                        |  \--*  LCL_VAR   int    V01 arg1         u:
1 (last use) $c0
N010 (  1,  4) [000010] -----+-N---                                        \--*  CNS_INT   long   2 $1c1

Generated assembly:

IN0005: 000020      addi           a0, a0, 16 ; <- still generated here
                             ; gcrRegs -[a0]
                             ; byrRegs +[a0]
...
IN0008: 00002C      slli           a1, a1, 2
IN0009: 000030      add            t6, a0, a1
                             ; byrRegs +[t6]
IN000a: 000034      lw             a0, 0(t6) ; <- instead of here
                             ; byrRegs -[a0]

Although we could introduce a third form for RISC-V:

// We can generate two types of trees for "addr":
//
//  1) "arrRef + (index + elemOffset)"
//  2) "(arrRef + elemOffset) + index"
//  3) "elemOffset + (arrRef + index)" <- for RISC-V

Which leads to the following IR form:

N014 (  9, 14) [000019] n---G+-----                            \--*  IND       int    <l:$181, c:$c2>
N013 (  6, 12) [000015] -----+-----                               \--*  ARR_ADDR  byref int[] $2c0
N012 (  6, 12) [000014] -----+-N---                                  \--*  ADD       byref  $241
N010 (  6, 12) [000013] -----+-N---                                     +--*  ADD       byref  $240
N005 (  1,  1) [000005] -----+-----                                     |  +--*  LCL_VAR   ref    V00 arg0         u:1 (
last use) $80
N009 (  4, 10) [000011] -----+-----                                     |  \--*  LSH       long   $1c1
N007 (  2,  5) [000009] -----+---U-                                     |     +--*  CAST      long <- uint $1c0
N006 (  1,  1) [000006] -----+-----                                     |     |  \--*  LCL_VAR   int    V01 arg1        
 u:1 (last use) $c0
N008 (  1,  4) [000010] -----+-N---                                     |     \--*  CNS_INT   long   2 $200
N011 (  1,  4) [000012] -----+-----                                     \--*  CNS_INT   long   16 $202

And an optimized assembly:

IN0007: 000028      slli           a1, a1, 2
IN0008: 00002C      add            a0, a0, a1
                             ; gcrRegs -[a0]
                             ; byrRegs +[a0]
IN0009: 000030      lw             a0, 16(a0)
                             ; byrRegs -[a0]

But I am not so sure how it would affect with CSE/hoist optimization. I think with the limitation of RISC-V addressing, it's not an issue. What do you think? Maybe I should benchmark "sum all elements" like in this PR: #61293 ?

@jakobbotsch
Copy link
Member

Thanks for checking. I think what you have here looks reasonable as well.

Copy link
Contributor

@tomeksowi tomeksowi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you

@jakobbotsch
Copy link
Member

Can you please follow the instructions in #112978 (comment) to agree with the CLA?

@fuad1502
Copy link
Contributor Author

fuad1502 commented Mar 4, 2025

@dotnet-policy-service agree company="Samsung Electronics"

@jakobbotsch jakobbotsch merged commit eb30fcf into dotnet:main Mar 4, 2025
112 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch-riscv Related to the RISC-V architecture area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants