Sve: Preliminary support for agnostic VL for JIT scenarios #115948

kunalspathak · 2025-05-23T18:28:11Z

Overview

In .NET 9, we added SVE support that works only on hardware with a vector length (VL) of 16 bytes (16B). This prevents developers from using SVE on hardware that supports other vector lengths. It also blocks NativeAOT scenarios, where a binary compiled for one VL would need recompilation to run on hardware with a different VL. This PR adds preliminary support for a limited set of vector lengths (32 bytes and 64 bytes) in JIT scenarios. Follow-up PRs will add support for the other vector lengths as well as for NativeAOT.

Vector<T> is .NET's vector-length-agnostic type, and we will leverage it to generate SVE instructions. Currently, the heuristic is set so that Vector<T> continues to generate NEON instructions if the underlying VL is 16B. Only if VL > 16B do we start generating SVE instructions for it.
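To illustrate what "vector-length agnostic" means for user code, here is a minimal C# sketch (not taken from the PR) of a loop written against Vector<T>. Its stride is Vector<int>.Count, which the runtime fixes once per process based on the hardware, so the same IL can serve a 16B NEON vector or a wider SVE vector. The helper name SumVectorized is illustrative.

```csharp
using System;
using System.Numerics;

// Illustrative sketch: a vector-length-agnostic sum written against Vector<T>.
// Vector<int>.Count is fixed once per process from the hardware, so this loop
// handles 4 ints per iteration with 16-byte vectors and 8 with 32-byte vectors.
static int SumVectorized(ReadOnlySpan<int> values)
{
    int lanes = Vector<int>.Count;
    var acc = Vector<int>.Zero;
    int i = 0;

    // Main loop: add one full Vector<int> per iteration.
    for (; i <= values.Length - lanes; i += lanes)
    {
        acc += new Vector<int>(values.Slice(i, lanes));
    }

    // Horizontal add of the accumulated lanes.
    int sum = Vector.Sum(acc);

    // Scalar tail for elements that do not fill a whole vector.
    for (; i < values.Length; i++)
    {
        sum += values[i];
    }

    return sum;
}

int[] data = new int[37];
for (int k = 0; k < data.Length; k++)
{
    data[k] = k + 1;
}

Console.WriteLine($"Vector<int>.Count = {Vector<int>.Count}");
Console.WriteLine($"Sum of 1..37 = {SumVectorized(data)}");
```

The scalar tail keeps the result correct whenever the input length is not a multiple of Vector<int>.Count, which is what lets the code stay unchanged across vector lengths.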

TYP_SIMD*

SVE vector lengths range from 16B to 256B and must be a power of 2, so the applicable vector lengths are 16B, 32B, 64B, 128B and 256B. This PR adds preliminary support for agnostic VL by reusing some of the existing xarch logic around TYP_SIMD32 and TYP_SIMD64, and it can be further expanded to TYP_SIMD128 and TYP_SIMD256. It was easier to port the logic at various places using the existing wider vector types than to create a type whose size is determined at runtime and then handle that new type throughout the code base, especially around value numbering. Once all the issues are ironed out, I will reconsider adding a generalized TYP_SIMD type instead of 32B, 64B, etc.
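The mapping from a power-of-two VL to a TYP_SIMD* name can be sketched as below. This is an illustrative model, not JIT source: SimdTypeForVectorLength and HandledByThisPr are hypothetical helpers, and the TYP_SIMD128/TYP_SIMD256 strings only name the possible future types mentioned above.

```csharp
using System;
using System.Numerics;

// Illustrative sketch (not JIT source): map an SVE vector length in bytes to
// the matching TYP_SIMD* name. Valid SVE lengths here are powers of two from
// 16 to 256 bytes; anything else is rejected.
static string SimdTypeForVectorLength(int vlBytes)
{
    if (vlBytes < 16 || vlBytes > 256 || !BitOperations.IsPow2(vlBytes))
    {
        throw new ArgumentOutOfRangeException(nameof(vlBytes), vlBytes,
            "SVE vector length must be a power of two between 16 and 256 bytes.");
    }

    return $"TYP_SIMD{vlBytes}";
}

// Per the description: 16B keeps the NEON path, 32B and 64B are the new SVE
// cases, and 128B/256B are left for follow-up PRs.
static bool HandledByThisPr(int vlBytes) => vlBytes is 16 or 32 or 64;

foreach (int vl in new[] { 16, 32, 64, 128, 256 })
{
    Console.WriteLine($"{vl,3}B -> {SimdTypeForVectorLength(vl)}, handled by this PR: {HandledByThisPr(vl)}");
}
```

Keeping one fixed-size type per VL is what lets the existing TYP_SIMD32/TYP_SIMD64 paths be reused, at the cost of one case per supported length.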

Vector<T>

Today, the Vector<T> type is mapped to the corresponding Vector128<T> intrinsic methods to generate NEON instructions, because NEON instructions operate on 16B data. We will detect the vector length and, if it is > 16B, use SVE instructions instead. To do that, we stop mapping Vector<T> -> Vector128<T> and instead introduce new intrinsics based on Vector<T>, corresponding to the methods available on Vector<T>, and propagate these intrinsics throughout the code base. During codegen, when we see a Vector<T> intrinsic, we know that we need to generate an SVE instruction instead of a NEON instruction.
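The importer decision described above can be modeled roughly as follows. This is a hypothetical sketch, not JIT code: ImportVectorTMethod is an invented helper, and the returned strings are placeholders for the JIT's intrinsic IDs rather than their actual names.

```csharp
using System;

// Hypothetical model of the importer decision (not JIT source). The returned
// strings are placeholders for the JIT's intrinsic IDs, not their real names.
static (string Intrinsic, string Isa) ImportVectorTMethod(string method, int vlBytes)
{
    // VL == 16B: keep rewriting Vector<T> APIs to Vector128<T>, lowered to NEON.
    // VL  > 16B: keep a Vector<T>-based intrinsic, which codegen lowers to SVE.
    return vlBytes > 16
        ? ($"Vector_{method}", "SVE")
        : ($"Vector128_{method}", "NEON");
}

foreach (int vl in new[] { 16, 32, 64 })
{
    var (intrinsic, isa) = ImportVectorTMethod("Add", vl);
    Console.WriteLine($"VL={vl}B: Vector.Add -> {intrinsic} -> {isa}");
}
```

The point of keeping a distinct Vector<T> intrinsic is that codegen can tell from the intrinsic alone which instruction family to emit.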

Register allocation

In .NET 9, we adopted a custom ABI for SVE registers, and for now we will continue to use it. At a call boundary, only the lower half of v8~v15 is callee-saved, and today we preserve the upper half of live SIMD registers in those registers. Since SVE registers are wider, we might need more than v8~v15 to preserve the upper portion of the killed registers, so I decided to simply spill them to the stack. In the future, when we fine-tune our ABI, we will update this design.
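Some rough arithmetic shows why v8~v15 run out quickly. This sketch assumes the AAPCS64 rule that only the low 8 bytes of v8~v15 survive a call, that the live vector itself sits in one of those registers, and that each clobbered 8-byte piece would be parked in one preserved low half. ClobberedBytes and SlotsNeeded are hypothetical helpers.

```csharp
using System;

// Assumption (AAPCS64): only the low 8 bytes of v8..v15 survive a call.
const int PreservedBytesPerReg = 8;
const int CalleeSavedRegs = 8; // v8..v15

// Bytes of a live vector register (held in v8..v15) that a call clobbers.
int ClobberedBytes(int vlBytes) => vlBytes - PreservedBytesPerReg;

// 8-byte callee-saved slots needed to park that clobbered part.
int SlotsNeeded(int vlBytes) => ClobberedBytes(vlBytes) / PreservedBytesPerReg;

foreach (int vl in new[] { 16, 32, 64 })
{
    int slots = SlotsNeeded(vl);
    Console.WriteLine(
        $"VL={vl}B: clobbered={ClobberedBytes(vl)}B, slots per vector={slots}, " +
        $"upper bound on vectors preserved this way={CalleeSavedRegs / slots}");
}
```

Under these assumptions, a 16B vector needs one slot, a 32B vector needs three and a 64B vector needs seven, so at most one 64B vector could be preserved this way; spilling to the stack avoids that limit.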

Other optimizations

In xarch, there are several other optimizations, like ReadUtf8 or Memmove, that take advantage of a higher VL. I tried to enable them for Arm64 with a higher VL, but for some of them I was not able to find optimal equivalent SVE instructions, and some needed SVE2 instructions. Hence, I decided not to do any optimization around this for now; we will enable them incrementally in the future.

Testing

I have introduced a DEBUG flag, DOTNET_UseSveForVectorT. When it is set, we hardcode the VL to 32B in order to kick off the Vector<T>/SVE path described above. This approach works for superpmi / jitstress testing. I still need to validate that it works during actual execution on Cobalt machines, which only have a 16B VL. I considered introducing a flag like DOTNET_MinVectorTLengthForSve, which would specify the minimum vector length needed to trigger SVE instructions; during testing, we could have set it to 16B. However, I soon realized that many code paths take a dependency on TYP_SIMD16 and generate NEON instructions, so DOTNET_UseSveForVectorT felt like the better approach.
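The effect of the testing switch can be modeled as below. This is a hypothetical sketch, not runtime code: it only mirrors the rule stated above (the switch forces a 32B VL, and Vector<T> uses SVE when the VL exceeds 16B). The helper names are invented, and applying the override regardless of hardware VL is an assumption.

```csharp
using System;

// Hypothetical model of the DOTNET_UseSveForVectorT testing switch: when set,
// the VL is treated as 32 bytes (assumed to apply regardless of hardware VL).
static int EffectiveVectorLength(int hardwareVlBytes, bool useSveForVectorT)
    => useSveForVectorT ? 32 : hardwareVlBytes;

// Heuristic from the Overview: Vector<T> uses SVE only when the VL exceeds 16B.
static bool UsesSveForVectorT(int hardwareVlBytes, bool useSveForVectorT)
    => EffectiveVectorLength(hardwareVlBytes, useSveForVectorT) > 16;

// A 16B-VL machine (for example, Cobalt) stays on NEON unless the switch is set.
Console.WriteLine($"16B, switch off: SVE = {UsesSveForVectorT(16, false)}");
Console.WriteLine($"16B, switch on:  SVE = {UsesSveForVectorT(16, true)}");
```

This shows why a single boolean override was enough for testing: it flips the decision without needing every TYP_SIMD16 code path to learn about SVE.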

TODOs

There are several TODOs that I will address before marking the PR ready for review; others might have to be done incrementally.

Reference: #115037

Examples

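    #pragma warning disable SYSLIB5003 // System.Runtime.Intrinsics.Arm.Sve is [Experimental] in .NET 9

    using System;
    using System.Numerics;
    using System.Runtime.CompilerServices;
    using System.Runtime.Intrinsics.Arm;

    public static class VectorTExamples
    {
        [MethodImpl(MethodImplOptions.NoInlining)]
        private static bool Test2(Vector<int> a, Vector<int> b)
        {
            return Vector.LessThanAll(a, b);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private static void Test()
        {
            var a = GetVector<int>(5);
            var b = GetVector<int>(5);
            Vector<int> c = a + b;
            Consume(c);
        }

        // Test3..Test5 were all named Test(Vector<int>); renamed so they compile.
        [MethodImpl(MethodImplOptions.NoInlining)]
        private static void Test3(Vector<int> a)
        {
            var b = GetVector<int>(5);
            Vector<int> c = a + b;
            Consume(c);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private static Vector<int> Test4(Vector<int> a)
        {
            var b = GetVector<int>(5);
            var c = a << 8;
            Consume(c);
            return Cond() ? c : b;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private static void Test5(Vector<int> a)
        {
            // Sve APIs throw PlatformNotSupportedException on non-SVE hardware.
            if (!Sve.IsSupported)
            {
                return;
            }

            Vector<float> b = GetVector<float>(5.9f);
            Vector<float> c = GetVector<float>(5.9f);
            var result = Sve.CompareGreaterThan(b, c);
            Consume(result);
        }

        // Helpers the examples rely on; these definitions are illustrative
        // stand-ins and are not part of the PR.
        [MethodImpl(MethodImplOptions.NoInlining)]
        private static Vector<T> GetVector<T>(T value) where T : struct => new Vector<T>(value);

        [MethodImpl(MethodImplOptions.NoInlining)]
        private static void Consume<T>(T value) => Console.WriteLine(value);

        private static bool s_cond = true;

        [MethodImpl(MethodImplOptions.NoInlining)]
        private static bool Cond() => s_cond;
    }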

kunalspathak · 2025-06-20T16:00:10Z

/azp run runtime-coreclr outerloop

azure-pipelines · 2025-06-20T16:00:31Z

Azure Pipelines successfully started running 1 pipeline(s).

kunalspathak · 2025-06-20T18:18:07Z

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, runtime-coreclr jitstressregs

azure-pipelines · 2025-06-20T18:18:31Z

Azure Pipelines successfully started running 3 pipeline(s).

kunalspathak · 2025-06-25T13:55:58Z

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, runtime-coreclr jitstressregs

azure-pipelines · 2025-06-25T13:56:24Z

Azure Pipelines successfully started running 3 pipeline(s).

risc-vv · 2025-07-04T08:11:16Z

@dotnet/samsung Could you please take a look? These changes may be related to riscv64.

risc-vv · 2025-07-09T11:40:35Z

RISC-V Release-FX-QEMU: 275972 / 277031 (99.62%)

=======================
      passed: 275972
      failed: 1053
     skipped: 39
      killed: 6
------------------------
 TOTAL tests: 277070
VIRTUAL time: 30h 12min 7s 686ms
   REAL time: 1h 11min 49s 285ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

Build information and commands

GIT: 7f88033009d1b80bdc860e9ead1343b2dae4b7aa
CI: d6c9c1ab3a7411819463edc05ded301e89ba586a
REPO: kunalspathak/runtime
BRANCH: variable-vl-3
CONFIG: Release
LIB_CONFIG: Release

risc-vv · 2025-07-28T09:21:36Z

@dotnet/samsung Could you please take a look? These changes may be related to riscv64.

a74nh · 2025-08-07T10:45:55Z

Checking the status of this PR on a Graviton 3 (which has SVE256)

With this program

using System;
using System.Numerics;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.CompilerServices;

public class Program
{
    public static sbyte s_1;

    public static void Main()
    {
        if (Sve.IsSupported)
        {
            var a = Vector.Create<int>(42);
            var b = Vector.Create<int>(43);
            var c = a + b;
            Consume(c);
        }
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void Consume<T>(T v)
    {
        Console.WriteLine(v);
    }
}

Running as is:

❯ $CORE_ROOT/corerun ./bin/Release/net10.0/vector.dll

Assert failure(PID 228275 [0x00037bb3], Thread: 228275 [0x37bb3]): Assertion failed 'unreached' in 'System.SpanHelpers:SequenceCompareTo(byref,int,byref,int):int' during 'Generate code' (IL size 308; hash 0xbda601aa; Instrumented Tier0)

    File: /home/alahay01/dotnet/runtime_table/src/coreclr/jit/emitarm64sve.cpp:4438
    Image: /home/alahay01/dotnet/runtime_table/artifacts/tests/coreclr/linux.arm64.Checked/Tests/Core_Root/corerun

Breaking just before the error with dumps on:

Generating: N182 (???,???) [000249] -----------                            IL_OFFSET void   INLRT @ 0x066[E-] REG NA
Generating: N184 (???,???) [000141] -----+-----                  t141 =    LCL_VAR   byref  V00 arg0          x0 REG x0
Mapped BB30 to G_M65109_IG07
IN0044:             ldr     x0, [fp, #0x58]	// [V00 arg0]
							Byref regs: 0000 {} => 0001 {x0}
Generating: N186 (???,???) [000142] -----+-----                  t142 =    LCL_VAR   long   V06 loc2          x1 REG x1
IN0045:             ldr     x1, [fp, #0x30]	// [V06 loc2]
Generating: N188 (???,???) [000143] -c---+-----                  t143 =    CNS_INT   long   1 REG NA
                                                                        /--*  t142   long
                                                                        +--*  t143   long
Generating: N190 (???,???) [000144] -----+-----                  t144 = *  LSH       long   REG x1
IN0046:             lsl     x1, x1, #1
                                                                        /--*  t141   byref
                                                                        +--*  t144   long
Generating: N192 (???,???) [000145] -c---+-----                  t145 = *  LEA(b+(i*1)+0) byref  REG NA
                                                                        /--*  t145   byref
Generating: N194 (???,???) [000146] U--XG+-----                  t146 = *  IND       simd32 REG d16
							Byref regs: 0001 {x0} => 0000 {}

Thread 1 "corerun" hit Breakpoint 1, emitter::emitInsSve_R_R_R (this=0xffbef4036260, ins=INS_sve_ldr,
    attr=EA_SCALABLE, reg1=JITREG_V16, reg2=JITREG_R0, reg3=JITREG_R1, opt=INS_OPTS_NONE,
    sopt=INS_SCALABLE_OPTS_NONE) at /home/alahay01/dotnet/runtime_table/src/coreclr/jit/emitarm64sve.cpp:2954

I get the same error regardless of whether DOTNET_UseSveForVectorT is set or not. That'll be due to CoreCLR auto-enabling SVE for Vector<T> when the vector length is > 128 bits.

When in 128-bit SVE mode on Graviton 3:

❯ $CORE_ROOT/corerun ./bin/Release/net10.0/vector.dll
<85, 85, 85, 85>

IN000b: 000000      stp     fp, lr, [sp, #-0x20]!
IN000c: 000004      mov     fp, sp
IN000d: 000008      str     xzr, [fp, #0x10]	// [V00 loc0]
IN000e: 00000C      str     xzr, [fp, #0x18]	// [V00 loc0+0x08]

G_M27646_IG02:        ; offs=0x000010, size=0x0028, bbWeight=1, PerfScore 10.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, BB02 [0001], byref

IN0001: 000010      movi    v16.4s, #0x2B
IN0002: 000014      str     q16, [fp, #0x10]	// [V00 loc0]
IN0003: 000018      ldr     q16, [fp, #0x10]	// [V00 loc0]
IN0004: 00001C      movi    v17.4s, #0x2A
IN0005: 000020      add     v0.4s, v16.4s, v17.4s
IN0006: 000024      movz    x0, #0x4B70      // code for Program:Consume[System.Numerics.Vector`1[int]](System.Numerics.Vector`1[int])
IN0007: 000028      movk    x0, #0x9E01 LSL #16
IN0008: 00002C      movk    x0, #0xF720 LSL #32
IN0009: 000030      ldr     x0, [x0]
IN000a: 000034      blr     x0

G_M27646_IG03:        ; offs=0x000038, size=0x0008, bbWeight=1, PerfScore 2.00, epilog, nogc, extend

IN000f: 000038      ldp     fp, lr, [sp], #0x20
IN0010: 00003C      ret     lr

With DOTNET_UseSveForVectorT=1:

❯ $CORE_ROOT/corerun ./bin/Release/net10.0/vector.dll
<85, 85, 85, 85>

IN000b: 000000      stp     fp, lr, [sp, #-0x20]!
IN000c: 000004      mov     fp, sp
IN000d: 000008      str     xzr, [fp, #0x10]	// [V00 loc0]
IN000e: 00000C      str     xzr, [fp, #0x18]	// [V00 loc0+0x08]

G_M27646_IG02:        ; offs=0x000010, size=0x0028, bbWeight=1, PerfScore 14.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, BB02 [0001], byref

IN0001: 000010      mov     z16.s, #43
IN0002: 000014      str     q16, [fp, #0x10]	// [V00 loc0]
IN0003: 000018      mov     z16.s, #42
IN0004: 00001C      ldr     q17, [fp, #0x10]	// [V00 loc0]
IN0005: 000020      add     z0.s, z16.s, z17.s
IN0006: 000024      movz    x0, #0x4B40      // code for Program:Consume[System.Numerics.Vector`1[int]](System.Numerics.Vector`1[int])
IN0007: 000028      movk    x0, #0x8E03 LSL #16
IN0008: 00002C      movk    x0, #0xE411 LSL #32
IN0009: 000030      ldr     x0, [x0]
IN000a: 000034      blr     x0

G_M27646_IG03:        ; offs=0x000038, size=0x0008, bbWeight=1, PerfScore 2.00, epilog, nogc, extend

IN000f: 000038      ldp     fp, lr, [sp], #0x20
IN0010: 00003C      ret     lr

a74nh · 2025-08-07T11:25:11Z

Assert failure(PID 228275 [0x00037bb3], Thread: 228275 [0x37bb3]): Assertion failed 'unreached' in 'System.SpanHelpers:SequenceCompareTo(byref,int,byref,int):int' during 'Generate code' (IL size 308; hash 0xbda601aa; Instrumented Tier0) - this is due to the WriteLine call.

Removing the WriteLine (so that Consume is empty) gives a segfault.

; Assembly listing for method Program:Main() (Tier0)
; Emitting BLENDED_CODE for generic ARM64 + SVE on Unix
; Tier0 code
; fp based frame
; partially interruptible
; compiling with minopt
; Final local variable assignments
;
;  V00 loc0         [V00    ] (  1,  1   )  simd32  ->  [fp+0x10]   do-not-enreg[S] must-init <System.Numerics.Vector`1[int]>
;# V01 OutArgs      [V01    ] (  1,  1   )  struct ( 0) [sp+0x00]   do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
;
; Lcl frame size = 32

G_M27646_IG01:  ;; offset=0x0000
            nop
            brk     #0
            stp     fp, lr, [sp, #-0x30]!
            mov     fp, sp
            str     xzr, [fp, #0x10]	// [V00 loc0]
            str     xzr, [fp, #0x18]	// [V00 loc0+0x08]
            str     xzr, [fp, #0x20]	// [V00 loc0+0x10]
            str     xzr, [fp, #0x28]	// [V00 loc0+0x18]
						;; size=32 bbWeight=1 PerfScore 7.00
G_M27646_IG02:  ;; offset=0x0020
            mov     z16.s, #43
            str     z16, [fp, #1, mul vl]	// [V00 loc0]
            mov     z16.s, #42
            ldr     z17, [fp, #1, mul vl]	// [V00 loc0]
            add     z0.s, z16.s, z17.s
            movz    x0, #0x4A50      // code for Program:Consume[System.Numerics.Vector`1[int]](System.Numerics.Vector`1[int])
            movk    x0, #0x5E02 LSL #16
            movk    x0, #0xF909 LSL #32
            ldr     x0, [x0]
            blr     x0
						;; size=40 bbWeight=1 PerfScore 18.50
G_M27646_IG03:  ;; offset=0x0048
            ldp     fp, lr, [sp], #0x30
            ret     lr
						;; size=8 bbWeight=1 PerfScore 2.00

; Total bytes of code 80, prolog size 32, PerfScore 27.50, instruction count 20, allocated bytes for code 80 (MethodHash=cb019401) for method Program:Main() (Tier0)

Thread 1 "corerun" received signal SIGTRAP, Trace/breakpoint trap.
0x0000ffffa9bc32ac in ?? ()
(gdb) bt
#0  0x0000ffffa9bc32ac in ?? ()
#1  0x0000fffff769389c in NativeExceptionHolderBase::Push (this=0xffffffffdb20)
    at /home/alahay01/dotnet/runtime_table/src/coreclr/pal/inc/pal.h:4032
#2  CallDescrWorkerWithHandler (pCallDescrData=0xffffffffdd20, fCriticalCall=0)
    at /home/alahay01/dotnet/runtime_table/src/coreclr/vm/callhelpers.cpp:57
#3  0x0000000000000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)

Edit: With DOTNET_TieredCompilation=0 I get the same error. The SVE ldr/str above are optimised away in that case, so it can't be that.

dotnet-policy-service · 2025-09-06T15:00:08Z

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

kunalspathak added 30 commits March 28, 2025 11:12

Capture g_sve_length and compVectorTLength

d22af4f

Add InstructionSet_Vector

41a1d05

Add CORINFO_HFA_ELEM_VECTOR_VL

c7d8ede

Update the type of TYP_SIMD

926eb69

Passing Vector<T> to args and returns

2b39810

Rename TYP_SIMD -> TYP_SIMDVL

cf9ea60

Fix code to save/restore upper registers of VL

21f364b

misc changes

7a513ed

Bring TYP_SIMD32 and TYP_SIMD64 for Arm64

b1c9833

Eliminate TYP_SIMDVL

4f92c23

basic scneario of calling args/returning args

6e63a3c

returning Vectors

1eb159f

fix a bug

df7203f

standalone fix to generate sve mov instead of NEON mov

734aba5

standalone fix to generate ldr/str when emit_RR is called

a71b8de

Support Vector.Create

2e8cfd5

Do not do sve_mov for scalar variant

1d74f82

Support Vector.As

699d2e1

Support Vector.Abs

7f8ff24

Support Vector.Add

3d19d51

Introduce VariableVectorLength env variable

70c09f9

Support Vector.AndNot

53df3d7

Support Vector.As*

b1d4ce9

Support Vector.BitwiseAnd/BitwiseOr

29564cb

Support Vector.ConvertTo*

45ab7b9

Add CreateFalseMaskAll intrinsic

3837693

Temporary fix for scratch register size calculation. Need to revisit

ca1675c

Fix to squash in 9542e9cd047

7774e07

Support Vector.Equals*, GreaterThan*, LessThan*

c170a7e

Support Vector.Max/MaxNative

15f0384

kunalspathak added 5 commits June 19, 2025 00:01

Add entry for VectorMath test in ISA

9074461

Fix CreateSequence for float/double

bcb7bee

MUL with DuplicateScalarToVector

f151c64

Merge remote-tracking branch 'origin/main' into variable-vl-3

838ce58

fix merge conflict errors

39374e3

build-analysis bot mentioned this pull request Jun 20, 2025

browser-wasm Windows build error #116746

Closed

kunalspathak added 3 commits June 19, 2025 18:17

Fix the value numbering

324d241

disable Sve when it is not available

fc24657

jit format

a997047

kunalspathak added 5 commits June 23, 2025 17:42

fix the cmpOpNode return to TYP_MASK

f10bb0b

Merge remote-tracking branch 'origin/main' into variable-vl-3

303d7ce

fix merge conflict errors

49a536a

Merge remote-tracking branch 'origin/main' into variable-vl-3

8368b81

fix merge conflicts

61ed25f

build-analysis bot mentioned this pull request Jun 25, 2025

LibraryImportGenerator.Unit.Tests crashing on linux-x64 mono interpreter #100800

Open

fix parameter ordering because of bad merge conflict resolution

7f88033

JulieLeeMSFT unassigned kunalspathak Jul 30, 2025

dotnet-policy-service bot closed this Sep 6, 2025
