Skip to content

Have CSE correctly track float/simd and mask registers for their own enregistration #117118

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jul 1, 2025

Conversation

tannergooding
Copy link
Member

@tannergooding tannergooding commented Jun 28, 2025

This updates CSE to correctly track float/simd and mask registers for their own enregistration.

Previously the logic attempted to skip the handling of these types or even misclassified simd/mask under the integer enregistration limits. This could lead to weird diffs and register selection in some cases due to suboptimal CSE considerations.

As is typical with these changes, it's "difficult" to assess whether its an improvement or not just by looking at the diffs, because the CSE decisions impact other optimizations, register selection, etc. The diffs are primarily impacting methods that use floating-point or SIMD and generally look positive overall.

In general, we're ending up with fewer things marked as CSE moderate and instead more being marked as CSE aggressive instead. This is because we're no longer eating up the CSE budget by classifying everything as integer. This leads to many cases with more total CSEs:

-;  V94 cse0         [V94,T11] (  3,  1.50)    long  ->   x1         "CSE #05: moderate"
-;  V95 cse1         [V95,T02] (  3, 12   )    long  ->   x5         "CSE #02: moderate"
-;  V96 cse2         [V96,T14] (  4, 15.84)   simd8  ->  d18         "CSE #07: moderate"
-;  V97 cse3         [V97,T15] (  4, 15.84)   simd8  ->  d19         "CSE #08: moderate"
+;  V94 cse0         [V94,T12] (  3,  1.50)    long  ->   x1         "CSE #05: moderate"
+;  V95 cse1         [V95,T02] (  3, 12   )    long  ->   x5         "CSE #02: aggressive"
+;  V96 cse2         [V96,T15] (  4, 15.84)   simd8  ->  d18         "CSE #07: aggressive"
+;  V97 cse3         [V97,T16] (  4, 15.84)   simd8  ->  d19         "CSE #08: aggressive"
+;  V98 cse4         [V98,T03] (  3, 11.88)    long  ->   x4         "CSE #06: aggressive"

This in turn shifts some temp numbers and impacts everything else in the method. Most of which look to be positive changes, such as:

-						;; size=236 bbWeight=1 PerfScore 81.00
+						;; size=220 bbWeight=1 PerfScore 73.00

Which is because we no longer reload some constants and do other computations unnecessarily:

; gcrRegs +[x19]
             ldr     d16, [@RWD00]
             ldr     d17, [@RWD08]
-            fmin    d16, d16, d17
-            str     d16, [fp, #0x30]	// [V08 cse0]
-            str     d16, [x19, #0x08]
-            ldr     d17, [@RWD16]
-            ldr     d18, [@RWD24]
-            fmin    d17, d17, d18
-            str     d17, [fp, #0x28]	// [V09 cse1]
-            str     d17, [x19, #0x10]
-            ldr     d18, [@RWD00]
-            ldr     d19, [@RWD08]
-            fmax    d18, d18, d19
-            str     d18, [fp, #0x20]	// [V10 cse2]
-            str     d18, [x19, #0x18]
+            fmin    d18, d16, d17
+            str     d18, [fp, #0x30]	// [V08 cse0]
+            str     d18, [x19, #0x08]
             ldr     d19, [@RWD16]
             ldr     d20, [@RWD24]
-            fmax    d19, d19, d20
-            str     d19, [fp, #0x18]	// [V11 cse3]
-            str     d19, [x19, #0x20]
+            fmin    d21, d19, d20
+            str     d21, [fp, #0x28]	// [V09 cse1]
+            str     d21, [x19, #0x10]
+            fmax    d16, d16, d17
+            str     d16, [fp, #0x20]	// [V10 cse2]
+            str     d16, [x19, #0x18]
+            fmax    d17, d19, d20
+            str     d17, [fp, #0x18]	// [V11 cse3]
+            str     d17, [x19, #0x20]

There's some tuning to be done, however, because there's some places where things like zero constants are being CSE'd where it'd be better to have morph mark against that so we can still optimize accordingly (which should be a simple fix):

             mov     w3, wzr
 						;; size=16 bbWeight=0.50 PerfScore 1.00
 G_M48160_IG04:        ; bbWeight=3.96, gcrefRegs=0000 {}, byrefRegs=0005 {x0 x2}, byref, isz
-            ldr     d18, [x0, w3, UXTW #3]
-            ldr     d19, [x2, w3, UXTW #3]
+            mov     w4, w3
+            ldr     d18, [x0, x4, LSL #3]
+            ldr     d19, [x2, x4, LSL #3]

There are many similar cases on x64 and with some similar fine tuning that could be done.

@tannergooding tannergooding added the NO-REVIEW Experimental/testing PR, do NOT review it label Jun 28, 2025
@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 28, 2025
@tannergooding tannergooding changed the title Don't have CSE track simd or mask registers as part of the integer enregCount Have CSE correctly track simd and mask registers for their own enregistration Jun 28, 2025
@tannergooding tannergooding changed the title Have CSE correctly track simd and mask registers for their own enregistration Have CSE correctly track float/simd and mask registers for their own enregistration Jun 28, 2025
@tannergooding tannergooding removed the NO-REVIEW Experimental/testing PR, do NOT review it label Jun 28, 2025
@tannergooding
Copy link
Member Author

CC. @dotnet/jit-contrib. This should be ready for review.

There's a minor (+0.02% for most, but up to +0.04% on x64 Windows) throughput hit, but this gets us more into a correct shape for CSE considerations and removes some historical quirks where we were eating the integer CSE budget due to floating-point and mask types being in the method. So I think it's worth taking and then doing some follow-up targeted fixes (likely around not CSE'ing certain containable constants) where possible.

@tannergooding tannergooding marked this pull request as ready for review June 28, 2025 21:34
@risc-vv
Copy link

risc-vv commented Jun 28, 2025

@dotnet/samsung Could you please take a look? These changes may be related to riscv64.

@risc-vv
Copy link

risc-vv commented Jun 28, 2025

RISC-V Release-CLR-QEMU: 9086 / 9116 (99.67%)
=======================
      passed: 9086
      failed: 2
     skipped: 600
      killed: 28
------------------------
 TOTAL tests: 9716
VIRTUAL time: 35h 20min 51s 144ms
   REAL time: 36min 7s 681ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

RISC-V Release-CLR-VF2: 9087 / 9117 (99.67%)
=======================
      passed: 9087
      failed: 2
     skipped: 600
      killed: 28
------------------------
 TOTAL tests: 9717
VIRTUAL time: 10h 43min 39s 782ms
   REAL time: 43min 36s 331ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

RISC-V Release-FX-QEMU: 335773 / 336836 (99.68%)
=======================
      passed: 335773
      failed: 1054
     skipped: 39
      killed: 9
------------------------
 TOTAL tests: 336875
VIRTUAL time: 32h 34min 59s 596ms
   REAL time: 1h 11min 18s 986ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

RISC-V Release-FX-VF2: 499778 / 501520 (99.65%)
=======================
      passed: 499778
      failed: 1734
     skipped: 39
      killed: 8
------------------------
 TOTAL tests: 501559
VIRTUAL time: 20h 55min 7s 21ms
   REAL time: 2h 18min 24s 878ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

Build information and commands

GIT: 8cde53693255d7d0fa117f8c1bae90bfd35974f2
CI: bf2351b4b801509d156b00d7a6d660aed10fd64f
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

@tannergooding
Copy link
Member Author

@jakobbotsch I requested your review since you touch CSE most frequently.

Copy link
Member

@jakobbotsch jakobbotsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me. Pinged @AndyAyersMS on a question related to the CSE ML experiment.

@risc-vv
Copy link

risc-vv commented Jul 1, 2025

bed117e is being scheduled for building and testing

GIT: bed117e4ea9353fa48b66c9ccb88415dfd91583d
REPO: dotnet/runtime
BRANCH: main

@tannergooding tannergooding merged commit d5f9aa5 into dotnet:main Jul 1, 2025
114 of 117 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants