
ARM64 SVE: Duplicating a scalar to a vector is non-optimal #108321

Closed
@a74nh

Description


SVE provides LD1RW, which loads a single 32-bit value from memory and broadcasts it to all active lanes of a vector.

I'm not quite sure why we didn't add this to the SVE API.

However, this can be done via:

Vector<uint> vec = Sve.DuplicateSelectedScalarToVector(Sve.LoadVector(Sve.CreateTrueMaskUInt32(), input), 0);

Which produces:

            ptrue   p0.s
            ld1w    { z17.s }, p0/z, [x7]
            mov     z17.s, s17

This could be optimised to:

            ptrue   p0.s
            ld1rw   { z17.s }, p0/z, [x7]

Regardless of whether an API method is added, the optimisation should be done.
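To make the equivalence (and its precondition) concrete, here is a minimal scalar model of the three instructions in C. It is a sketch, not JIT code: the `model_*` functions and `rewrite_is_equivalent` are hypothetical names, and the fixed vector length of eight 32-bit lanes is an assumption (real SVE vector length is implementation-defined).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed vector length: 8 x 32-bit lanes (a 256-bit SVE implementation).
 * Real hardware chooses VL; the model only needs it to be fixed. */
#define VL 8

/* ld1w { z.s }, p/z, [x]: contiguous load of VL elements; inactive lanes are zeroed. */
static void model_ld1w(uint32_t z[VL], const bool p[VL], const uint32_t *x) {
    for (int i = 0; i < VL; i++)
        z[i] = p[i] ? x[i] : 0;
}

/* mov z.s, s (alias of dup z.s, z.s[0]): copy lane 0 into every lane, unpredicated. */
static void model_dup_lane0(uint32_t z[VL]) {
    uint32_t s = z[0];
    for (int i = 0; i < VL; i++)
        z[i] = s;
}

/* ld1rw { z.s }, p/z, [x]: load one 32-bit element and replicate it to the
 * active lanes; inactive lanes are zeroed. Only x[0] is ever read. */
static void model_ld1rw(uint32_t z[VL], const bool p[VL], const uint32_t *x) {
    for (int i = 0; i < VL; i++)
        z[i] = p[i] ? x[0] : 0;
}

/* Returns true if "ld1w + mov" and "ld1rw" produce the same vector for the
 * governing predicate p and memory x (x must hold at least VL elements). */
static bool rewrite_is_equivalent(const bool p[VL], const uint32_t *x) {
    assert(x != NULL);
    uint32_t before[VL], after[VL];
    model_ld1w(before, p, x);   /* ld1w { z17.s }, p0/z, [x7] */
    model_dup_lane0(before);    /* mov  z17.s, s17            */
    model_ld1rw(after, p, x);   /* ld1rw { z17.s }, p0/z, [x7] */
    return memcmp(before, after, sizeof before) == 0;
}
```

Under an all-true predicate (what `ptrue p0.s` and `Sve.CreateTrueMaskUInt32()` produce) the two sequences agree. With any inactive lane they differ, because `mov z17.s, s17` ignores the predicate while `ld1rw ... p0/z` zeroes inactive lanes, so a peephole would presumably need to confirm the mask is all-true. The rewrite also reads only `x[0]`, a subset of what the full-vector load reads, so it cannot introduce a new fault.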

Labels: Priority:2, area-CodeGen-coreclr, arm-sve