
Superfluous vmovaps for Vector256<T>.GetElement() #433

Open
@damageboy


Repro Repo:

https://github.com/damageboy/coreclr-redundant-vmovaps

Relevant piece of code:

https://github.com/damageboy/coreclr-redundant-vmovaps/blob/749eb3c2e753770cc8087116cb0b3ddf6ef39fdc/Program.cs#L24-L27

            var e0 = P.GetElement(0);
            var e1 = P.GetElement(1);
            var e2 = P.GetElement(2);
            var e3 = P.GetElement(3);

Generated asm:

https://github.com/damageboy/coreclr-redundant-vmovaps/blob/749eb3c2e753770cc8087116cb0b3ddf6ef39fdc/listing.asm#L22-L34

00007F67249407A4 C5FC28C8             vmovaps ymm1,ymm0
00007F67249407A8 C5F97ECB             vmovd   ebx,xmm1


;             var e1 = P.GetElement(1);
00007F67249407AC C5FC28C8             vmovaps ymm1,ymm0
00007F67249407B0 C4C37916CE01         vpextrd r14d,xmm1,1


;             var e2 = P.GetElement(2);
00007F67249407B6 C5FC28C8             vmovaps ymm1,ymm0
00007F67249407BA C4C37916CF02         vpextrd r15d,xmm1,2


;             var e3 = P.GetElement(3);
00007F67249407C0 C4C37916C403         vpextrd r12d,xmm0,3

Issue

In the asm listing, you can see that the first three GetElement() calls each generate a superfluous vmovaps that copies ymm0 to ymm1 before issuing vmovd (element 0) or vpextrd (elements 1 and 2) on xmm1.

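For some reason, only the first three calls generate this extra copy.
The fourth call does the right thing: it extracts directly from xmm0 (the lower 128 bits of ymm0) with no intermediate copy. Since each element is 32 bits wide (as the vmovd/vpextrd encodings show), elements 0-3 all live in those lower 128 bits, so none of the four calls should need ymm1 at all.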
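A minimal sketch for reproducing this outside the linked repo, assuming `P` is a `Vector256<int>` (the 32-bit `vmovd`/`vpextrd` into general-purpose registers suggest so). The method name `SumLowerLane` and the input values are hypothetical and not taken from the repro's Program.cs; `NoInlining` keeps the four `GetElement` calls in their own method so its disassembly is easy to find.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;

static class Program
{
    // Hypothetical stand-in for the linked Program.cs: extracts the four
    // 32-bit elements that live in the lower 128-bit lane of a Vector256<int>.
    [MethodImpl(MethodImplOptions.NoInlining)] // keep this as a separate method in the disassembly
    internal static int SumLowerLane(Vector256<int> p)
    {
        var e0 = p.GetElement(0); // listing above: vmovd
        var e1 = p.GetElement(1); // listing above: vpextrd ..., 1
        var e2 = p.GetElement(2); // listing above: vpextrd ..., 2
        var e3 = p.GetElement(3); // listing above: vpextrd ..., 3
        return e0 + e1 + e2 + e3;
    }

    static void Main()
    {
        Vector256<int> p = Vector256.Create(1, 2, 3, 4, 5, 6, 7, 8);
        Console.WriteLine(SumLowerLane(p)); // prints 10
    }
}
```

On .NET 7 or later, running with the environment variable `DOTNET_JitDisasm=SumLowerLane` prints the JIT's assembly for the method (adding `DOTNET_TieredCompilation=0` skips the unoptimized tier-0 version). Whether the extra `vmovaps` still appears depends on the JIT version being tested.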

category:cq
theme:vector-codegen
skill-level:intermediate
cost:medium
impact:small


Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), optimization
