ARM64 SVE: Duplicating a scalar to a vector is non-optimal #108321

SVE provides [LD1RW](https://docsmirror.github.io/A64/2023-09/ld1rw_z_p_bi.html) to load a single 32-bit value from memory and broadcast it to all active lanes of a vector.

I'm not quite sure why we didn't add this to the SVE API.

However, this can be done via:
```
Vector<uint> vec = Sve.DuplicateSelectedScalarToVector(Sve.LoadVector(Sve.CreateTrueMaskUInt32(), input), 0);
```

Which produces:
```
            ptrue   p0.s
            ld1w    { z17.s }, p0/z, [x7]
            mov     z17.s, s17
```

This could be optimised to:
```
            ptrue   p0.s
            ld1rw   { z17.s }, p0/z, [x7]
```
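To make the equivalence concrete, here is a plain C# scalar model of the two sequences. It uses no SVE intrinsics, so it runs anywhere. The class and method names are hypothetical, and the lane count is simply the length of the `mask` array, standing in for the hardware vector length. The model assumes the A64 semantics: `ld1w ... p0/z` zeroes inactive lanes, `mov zd.s, sn` is an unpredicated alias of `dup zd.s, zn.s[0]`, and `ld1rw ... p0/z` writes the loaded word to active lanes and zeroes the rest.

```csharp
using System;
using System.Linq;

// Hypothetical scalar reference model of the instruction sequences in this issue.
// Each bool in `mask` is the predicate bit for one 32-bit lane.
public static class SveBroadcastModel
{
    // Models `ld1w { zt.s }, pg/z, [xn]`: active lanes load consecutive
    // 32-bit elements from memory, inactive lanes are set to zero.
    public static uint[] Ld1w(bool[] mask, uint[] memory)
    {
        var result = new uint[mask.Length];
        for (int i = 0; i < mask.Length; i++)
            result[i] = mask[i] ? memory[i] : 0u;
        return result;
    }

    // Models `mov zd.s, sn` (alias of `dup zd.s, zn.s[0]`): unpredicated,
    // so lane 0 is copied into every lane regardless of any mask.
    public static uint[] DupLane0(uint[] vector) =>
        Enumerable.Repeat(vector[0], vector.Length).ToArray();

    // Models `ld1rw { zt.s }, pg/z, [xn]`: a single 32-bit element is loaded
    // and written to every active lane; inactive lanes are set to zero.
    public static uint[] Ld1rw(bool[] mask, uint[] memory)
    {
        var result = new uint[mask.Length];
        for (int i = 0; i < mask.Length; i++)
            result[i] = mask[i] ? memory[0] : 0u;
        return result;
    }
}
```

With an all-true mask (the `ptrue p0.s` case above), `DupLane0(Ld1w(mask, mem))` and `Ld1rw(mask, mem)` produce identical vectors. With a partial mask they differ, because the `dup` also writes the inactive lanes while `ld1rw ... /z` zeroes them. This suggests the fold is only straightforwardly valid when the load mask is known to be all-true.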

Regardless of whether an API method is added, the optimisation should be done.
