
[mono][2/2] Add SIMD Support for s390x #116779


Open: wants to merge 2 commits into `main`


saitama951 (Contributor)

This is a follow-up patch to #116669 that adds vector (SIMD) support for s390x.

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jun 18, 2025
@github-actions github-actions bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Jun 18, 2025
saitama951 (Contributor, Author)

@uweigand @nealef @iii-i, can you please review the architecture-specific code?

saitama951 (Contributor, Author)

```
@@ -6713,7 +6873,7 @@ mono_emit_common_intrinsics (MonoCompile *cfg, MonoMethod *cmethod, MonoMethodSi
 * for function arguments. When using SIMD intrinsics arguments optimized into OP_ARG needs to be decomposed
 * into correspondig SIMD LOADX/STOREX instructions.
 */
#if defined(TARGET_WIN32) && defined(TARGET_AMD64)
```
Contributor Author


FYI: this also depends on the fix in #116433.

Contributor

Tagging subscribers to this area: @steveisok, @vitek-karas
See info in area-owners.md if you want to be subscribed.


@jkotas jkotas added arch-s390x Related to s390x architecture (unsupported) and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Jun 18, 2025
```diff
@@ -2190,7 +2533,8 @@ mono_arch_peephole_pass_2 (MonoCompile *cfg, MonoBasicBlock *bb)
 void
 mono_arch_lowering_pass (MonoCompile *cfg, MonoBasicBlock *bb)
 {
-MonoInst *ins, *next;
+MonoInst *ins, *next, *temp_ins;
+int temp;
```
Contributor


Indent

Contributor


And that's the extent of my updates/corrections/deletions! What do the benchmarks look like?

Contributor Author


@nealef

| Faster (benchmark name, truncated)                                               | base/diff (speedup) |
| -------------------------------------------------------------------------------- | ---------:|
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.LessThanOrEqualAnyBenchm |    295.76 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.GreaterThanOrEqualAllBen |    290.34 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.LessThanOrEqualAnyBenchma |    288.97 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.GreaterThanOrEqualAllBenc |    280.58 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.LessThanAnyBenchmark     |    268.06 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.EqualsAnyBenchmark       |    267.94 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.GreaterThanAllBenchmark  |    264.60 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.GreaterThanAllBenchmark   |    255.37 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.LessThanAnyBenchmark      |    255.25 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.EqualsAnyBenchmark        |    254.00 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.DotBenchmark             |    127.60 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.DotBenchmark              |    120.02 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.GreaterThanOrEqualAllBen |    116.31 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.GreaterThanOrEqualAllBe |    115.63 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.LessThanOrEqualAnyBench |    115.57 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.GreaterThanOrEqualAnyBe |    114.82 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.LessThanOrEqualAnyBenchmark  |    114.40 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.GreaterThanOrEqualAnyBenchma |    114.00 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.EqualsAnyBenchmark      |    113.10 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.LessThanAnyBenchmark     |    112.76 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.LessThanAnyBenchmark    |    112.44 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.EqualsAnyBenchmark       |    112.37 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.LessThanAnyBenchmark    |    112.30 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.LessThanAnyBenchmark         |    112.21 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.GreaterThanAnyBenchmark      |    112.09 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.EqualsAnyBenchmark           |    112.06 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.EqualsAnyBenchmark      |    111.88 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.GreaterThanAnyBenchmark |    111.56 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.DotBenchmark            |    111.01 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.DotBenchmark             |    110.70 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.LessThanOrEqualAnyBenchm |    108.94 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.LessThanOrEqualAnyBench |    102.42 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.GreaterThanAllBenchmark  |    101.82 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.SumBenchmark            |     99.50 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.SumBenchmark             |     96.56 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.GreaterThanAllBenchmark |     96.28 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.SumBenchmark            |     95.17 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.SumBenchmark             |     87.98 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.SumBenchmark                   |     86.62 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.DotBenchmark             |     85.62 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.DotBenchmark                   |     84.72 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.DotBenchmark            |     83.73 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.GreaterThanOrEqualAllBenchmark |     72.48 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.GreaterThanOrEqualAllBen |     72.44 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.LessThanOrEqualAnyBench |     71.86 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.LessThanOrEqualAnyBenchm |     71.38 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.LessThanOrEqualAnyBenchmark    |     71.35 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.GreaterThanOrEqualAllBe |     70.56 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.GreaterThanAllBenchmark |     69.77 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.EqualsAnyBenchmark       |     68.04 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.LessThanAnyBenchmark    |     68.01 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.EqualsAnyBenchmark      |     67.95 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.GreaterThanAllBenchmark  |     67.92 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.GreaterThanAllBenchmark        |     67.90 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.LessThanAnyBenchmark           |     67.71 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.LessThanAnyBenchmark     |     67.43 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.SumBenchmark             |     63.52 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.SumBenchmark              |     62.25 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.GreaterThanOrEqualAllBe |     57.81 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.LessThanOrEqualAnyBench |     57.71 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.GreaterThanOrEqualAnyBe |     57.64 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.GreaterThanOrEqualAllBen |     57.61 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.EqualsAnyBenchmark      |     57.39 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.GreaterThanAnyBenchmark |     57.26 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.LessThanAnyBenchmark    |     57.25 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.GreaterThanAllBenchmark  |     56.74 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.LessThanOrEqualAnyBenchm |     56.72 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.LessThanAnyBenchmark    |     56.70 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.LessThanOrEqualAnyBench |     56.64 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.EqualsAnyBenchmark      |     56.62 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.LessThanAnyBenchmark     |     56.62 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.EqualsAnyBenchmark       |     56.61 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.GreaterThanAllBenchmark |     56.56 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.SumBenchmark             |     52.56 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.SumBenchmark            |     52.41 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.EqualsAnyBenchmark             |     48.66 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.EqualsAllBenchmark           |     45.82 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.EqualsAllBenchmark      |     45.38 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.LessThanOrEqualAllBenchmark  |     40.50 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.LessThanOrEqualAllBench |     40.36 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.GreaterThanOrEqualAllBe |     40.24 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.GreaterThanOrEqualAllBenchma |     40.09 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.EqualityOperatorBenchma |     40.00 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.GreaterThanAllBenchmark |     39.89 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.GreaterThanAllBenchmark      |     39.89 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.LessThanAllBenchmark    |     39.83 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.LessThanAllBenchmark         |     39.78 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.EqualityOperatorBenchmark    |     39.44 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.EqualsBenchmark              |     38.77 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.EqualsBenchmark         |     38.66 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.EqualsAllBenchmark       |     36.25 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.EqualsAllBenchmark       |     36.11 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.EqualsAllBenchmark      |     36.02 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.EqualsAllBenchmark       |     35.96 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.EqualsAllBenchmark       |     35.92 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.EqualsAllBenchmark      |     35.80 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.EqualsAllBenchmark        |     35.45 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.EqualsAllBenchmark      |     35.32 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.EqualsAllBenchmark      |     35.27 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.GreaterThanOrEqualAnyBenchmark |     31.49 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.LessThanAllBenchmark     |     31.30 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.LessThanOrEqualAllBenchmark    |     31.30 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.LessThanOrEqualAllBenchm |     31.23 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.GreaterThanAnyBenchmark  |     31.20 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.LessThanAllBenchmark     |     31.14 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.LessThanAllBenchmark    |     31.12 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.LessThanOrEqualAllBenchm |     31.11 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.GreaterThanOrEqualAnyBen |     31.10 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.LessThanAllBenchmark           |     31.07 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.GreaterThanOrEqualAnyBen |     31.07 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.GreaterThanAnyBenchmark        |     31.04 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.GreaterThanOrEqualAnyBe |     30.99 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.GreaterThanOrEqualAnyBen |     30.99 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.GreaterThanAnyBenchmark  |     30.93 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.GreaterThanAllBenchmark |     30.93 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.GreaterThanAnyBenchmark  |     30.93 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.LessThanAllBenchmark     |     30.83 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.GreaterThanOrEqualAnyBe |     30.80 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.GreaterThanOrEqualAnyBe |     30.79 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.LessThanOrEqualAllBenchm |     30.69 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.EqualityOperatorBenchmark      |     30.52 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.EqualityOperatorBenchmark |     30.50 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.GreaterThanAnyBenchmark   |     30.47 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.LessThanOrEqualAllBenchma |     30.45 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.LessThanOrEqualAllBench |     30.44 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.LessThanAllBenchmark      |     30.41 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.GreaterThanOrEqualAnyBen |     30.35 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.LessThanOrEqualAllBenchm |     30.32 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.LessThanOrEqualAllBench |     30.28 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.GreaterThanOrEqualAnyBenc |     30.19 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.LessThanOrEqualAllBench |     30.09 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.GreaterThanAnyBenchmark |     30.00 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.EqualityOperatorBenchma |     29.90 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.GreaterThanAnyBenchmark |     29.89 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.EqualityOperatorBenchmar |     29.89 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.LessThanAllBenchmark    |     29.83 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.LessThanAllBenchmark     |     29.77 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.LessThanAllBenchmark    |     29.76 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.EqualityOperatorBenchmar |     29.76 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.GreaterThanAnyBenchmark |     29.72 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.GreaterThanAnyBenchmark  |     29.70 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.LessThanOrEqualAllBench |     29.68 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.GreaterThanOrEqualAllBe |     29.68 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.EqualityOperatorBenchmar |     29.65 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.LessThanAllBenchmark    |     29.64 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.EqualityOperatorBenchma |     29.64 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.EqualityOperatorBenchma |     29.55 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.EqualityOperatorBenchmar |     29.53 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.EqualityOperatorBenchma |     29.53 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.InequalityOperatorBench |     26.97 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.InequalityOperatorBenchmark  |     26.51 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.EqualsAllBenchmark             |     25.82 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.InequalityOperatorBench |     21.77 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.InequalityOperatorBench |     21.60 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.InequalityOperatorBenchm |     21.60 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.InequalityOperatorBenchm |     21.53 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.InequalityOperatorBench |     21.52 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.InequalityOperatorBenchmark    |     21.51 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.InequalityOperatorBenchm |     21.32 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.InequalityOperatorBenchm |     21.28 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.InequalityOperatorBench |     21.27 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.InequalityOperatorBenchma |     21.11 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.EqualsBenchmark         |     20.06 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.EqualsBenchmark          |     14.59 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.EqualsBenchmark         |     14.44 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.EqualsBenchmark          |     14.27 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.EqualsBenchmark          |     13.84 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.EqualsBenchmark         |     13.39 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.EqualsBenchmark         |     13.26 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.EqualsBenchmark                |     12.80 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.MaxBenchmark                 |     11.10 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.MinBenchmark             |     10.98 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.LessThanOrEqualBenchmark |     10.84 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.GreaterThanOrEqualBenchm |     10.65 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.LessThanOrEqualBenchmark  |     10.49 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.GreaterThanBenchmark     |     10.34 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.MaxBenchmark             |     10.31 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.MaxBenchmark            |     10.29 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.MinBenchmark            |     10.29 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.MinBenchmark                 |     10.27 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.AddBenchmark            |     10.26 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.MinBenchmark              |     10.26 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.SubtractBenchmark            |     10.25 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.EqualsStaticBenchmark    |     10.24 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.AddBenchmark                 |     10.24 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.LessThanBenchmark        |     10.24 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.GreaterThanOrEqualBenchma |     10.20 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.MultiplyBenchmark            |     10.17 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.AddOperatorBenchmark         |     10.16 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.SubtractBenchmark       |     10.14 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.MultiplyBenchmark       |     10.07 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.GreaterThanBenchmark      |     10.00 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.SubtractionOperatorBenc |      9.97 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.EqualsStaticBenchmark     |      9.91 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.LessThanBenchmark         |      9.90 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.MaxBenchmark              |      9.88 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128.CeilingFloatBenchmark             |      9.57 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.EqualsBenchmark           |      9.52 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128.FloorFloatBenchmark               |      9.51 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.GreaterThanBenchmark         |      9.37 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.EqualsStaticBenchmark        |      9.36 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.LessThanBenchmark            |      9.21 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.AbsBenchmark            |      9.17 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.EqualsBenchmark          |      9.12 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.SubtractionOperatorBenchmark |      9.07 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.AddOperatorBenchmark    |      9.02 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.MultiplyOperatorBenchmark    |      9.01 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.LessThanOrEqualBenchmark     |      9.00 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.MultiplyOperatorBenchma |      9.00 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.AbsBenchmark                 |      8.95 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.GreaterThanBenchmark    |      8.95 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.LessThanBenchmark       |      8.94 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.LessThanOrEqualBenchmar |      8.94 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.GreaterThanOrEqualBench |      8.93 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.GreaterThanOrEqualBenchmark  |      8.89 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.NegateBenchmark          |      8.88 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.UnaryNegateOperatorBench |      8.84 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.UnaryNegateOperatorBenchm |      8.48 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.NegateBenchmark           |      8.47 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.EqualsStaticBenchmark   |      8.40 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.ConditionalSelectBenchmark   |      8.24 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.ConditionalSelectBenchm |      8.20 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.ConditionalSelectBenchma |      8.13 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.ConditionalSelectBenchm |      8.12 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.ConditionalSelectBenchm |      8.09 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.ConditionalSelectBenchm |      8.09 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.ConditionalSelectBenchm |      8.07 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.ConditionalSelectBenchma |      8.04 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.ConditionalSelectBenchma |      8.01 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.ConditionalSelectBenchma |      7.98 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.ConditionalSelectBenchmar |      7.94 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.DotBenchmark            |      7.78 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.MultiplyOperatorBenchmar |      7.65 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.AbsBenchmark             |      7.65 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.MultiplyBenchmark        |      7.44 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.MultiplyOperatorBenchmark |      7.08 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.DivideBenchmark         |      7.01 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.MultiplyBenchmark         |      7.01 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.DivisionOperatorBenchmark    |      6.99 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.ConditionalSelectBenchmark     |      6.88 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.SubtractBenchmark        |      6.80 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.SubtractionOperatorBench |      6.79 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.AddOperatorBenchmark     |      6.76 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.AddBenchmark             |      6.75 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.SquareRootBenchmark          |      6.74 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.MultiplyOperatorBenchmar |      6.73 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.UnaryNegateOperatorBench |      6.66 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.SquareRootBenchmark     |      6.66 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.NegateBenchmark          |      6.65 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.DivideBenchmark              |      6.65 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.MinBenchmark             |      6.57 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.UnaryNegateOperatorBenc |      6.44 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.MultiplyBenchmark        |      6.44 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.NegateBenchmark         |      6.44 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.DivisionOperatorBenchma |      6.29 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.LessThanOrEqualBenchmark |      6.13 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.GreaterThanBenchmark     |      6.12 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.GreaterThanBenchmark    |      6.12 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.AddOperatorBenchmark      |      6.12 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.MinBenchmark            |      6.10 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.MultiplyOperatorBenchma |      6.07 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.SubtractionOperatorBenchm |      6.07 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.MultiplyBenchmark       |      6.06 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.SubtractBenchmark         |      5.89 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.MaxBenchmark             |      5.87 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.MaxBenchmark            |      5.84 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.LessThanOrEqualBenchmar |      5.83 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.SubtractBenchmark        |      5.81 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.SubtractionOperatorBench |      5.79 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.AddOperatorBenchmark     |      5.79 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.AddBenchmark             |      5.75 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.AbsBenchmark             |      5.73 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.AddBenchmark              |      5.71 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.LessThanBenchmark        |      5.67 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.GreaterThanOrEqualBenchm |      5.66 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.EqualsStaticBenchmark    |      5.65 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.LessThanBenchmark       |      5.65 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.EqualsStaticBenchmark   |      5.62 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.UnaryNegateOperatorBenchmark |      5.54 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.AddOperatorBenchmark    |      5.49 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.UnaryNegateOperatorBenc |      5.47 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.GreaterThanOrEqualBench |      5.45 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.SubtractionOperatorBenc |      5.42 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.NegateBenchmark              |      5.39 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.NegateBenchmark         |      5.37 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.SubtractBenchmark       |      5.35 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.MinBenchmark             |      5.15 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.UnaryNegateOperatorBenchmark   |      5.13 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.AddBenchmark            |      5.10 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.MinBenchmark                   |      5.07 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.NegateBenchmark          |      5.05 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.UnaryNegateOperatorBench |      5.04 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.NegateBenchmark         |      4.98 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128.FloorDoubleBenchmark              |      4.98 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128.CeilingDoubleBenchmark            |      4.95 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.MultiplyBenchmark              |      4.95 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.UnaryNegateOperatorBenc |      4.95 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.NegateBenchmark                |      4.93 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.MinBenchmark            |      4.85 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.MultiplyOperatorBenchmark      |      4.75 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.MultiplyBenchmark        |      4.71 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.MultiplyOperatorBenchmar |      4.65 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.MultiplyOperatorBenchma |      4.65 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.AddBenchmark            |      4.63 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.AddOperatorBenchmark    |      4.59 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.SubtractBenchmark       |      4.53 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.MultiplyBenchmark       |      4.52 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.SubtractionOperatorBenc |      4.51 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.AbsBenchmark            |      4.50 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.MultiplyBenchmark       |      4.50 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.UnaryNegateOperatorBenc |      4.48 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.NegateBenchmark          |      4.46 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.UnaryNegateOperatorBench |      4.46 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.MaxBenchmark                   |      4.44 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.NegateBenchmark         |      4.43 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.MaxBenchmark             |      4.42 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.AddBenchmark                   |      4.39 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.AbsBenchmark             |      4.38 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.MultiplyOperatorBenchma |      4.36 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.LessThanOrEqualBenchmark       |      4.35 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.AddOperatorBenchmark           |      4.32 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.LessThanOrEqualBenchmark |      4.30 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.AbsBenchmark                   |      4.27 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.MaxBenchmark            |      4.27 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.UnaryNegateOperatorBenc |      4.26 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.LessThanOrEqualBenchmar |      4.23 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.EqualsStaticBenchmark          |      4.19 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.NegateBenchmark         |      4.19 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.GreaterThanBenchmark    |      4.16 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.GreaterThanOrEqualBenchmark    |      4.13 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.GreaterThanBenchmark           |      4.12 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.GreaterThanBenchmark     |      4.12 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.SubtractionOperatorBenchmark   |      4.11 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.SubtractBenchmark        |      4.10 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.DotBenchmark             |      4.07 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.DotBenchmark            |      4.07 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.GreaterThanOrEqualBenchm |      4.06 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.MinBenchmark             |      4.05 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.AddOperatorBenchmark     |      4.05 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.SubtractionOperatorBench |      4.04 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.MinBenchmark            |      4.04 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.AddBenchmark             |      4.04 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.AddOperatorBenchmark    |      4.02 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.SubtractBenchmark              |      4.00 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.GreaterThanOrEqualBench |      3.97 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.EqualsStaticBenchmark    |      3.95 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.SquareRootBenchmark     |      3.95 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.LessThanBenchmark        |      3.94 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.EqualsStaticBenchmark   |      3.94 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.LessThanBenchmark       |      3.93 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.SubtractBenchmark       |      3.91 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.SubtractionOperatorBenc |      3.90 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.LessThanBenchmark              |      3.83 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.AddBenchmark            |      3.81 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.MaxBenchmark            |      3.80 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.AndNotBenchmark          |      3.79 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.AbsBenchmark             |      3.78 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.AndNotBenchmark          |      3.77 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.AndNotBenchmark         |      3.75 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.DivideBenchmark         |      3.74 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.AndNotBenchmark          |      3.73 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.MaxBenchmark             |      3.72 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.AndNotBenchmark              |      3.71 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.DivisionOperatorBenchma |      3.70 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.AndNotBenchmark         |      3.68 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.GreaterThanOrEqualBench |      3.67 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.AndNotBenchmark          |      3.67 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.AndNotBenchmark         |      3.66 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.AndNotBenchmark         |      3.65 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.LessThanOrEqualBenchmar |      3.65 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.AndNotBenchmark         |      3.65 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.LessThanBenchmark       |      3.64 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.EqualsStaticBenchmark   |      3.64 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.GreaterThanBenchmark    |      3.63 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.AndNotBenchmark           |      3.62 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.AllBitsSetBenchmark     |      3.59 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.MinBenchmark            |      3.59 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.AllBitsSetBenchmark          |      3.55 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.DotBenchmark                 |      3.51 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.MaxBenchmark            |      3.44 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.AndNotBenchmark                |      3.43 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.DotBenchmark            |      3.42 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.AllBitsSetBenchmark      |      3.38 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.AllBitsSetBenchmark       |      3.37 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.SubtractBenchmark        |      3.35 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.SubtractBenchmark       |      3.31 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.SubtractionOperatorBenc |      3.28 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.LessThanOrEqualBenchmark |      3.27 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.LessThanOrEqualBenchmar |      3.27 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.GreaterThanBenchmark    |      3.25 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.AddBenchmark            |      3.24 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.AddBenchmark             |      3.23 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.GreaterThanBenchmark     |      3.23 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.SubtractionOperatorBench |      3.23 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.LessThanBenchmark        |      3.21 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.AddOperatorBenchmark    |      3.21 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.LessThanBenchmark       |      3.21 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.EqualsStaticBenchmark   |      3.21 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.EqualsStaticBenchmark    |      3.20 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.AddOperatorBenchmark     |      3.20 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.GreaterThanOrEqualBench |      3.14 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.GreaterThanOrEqualBenchm |      3.12 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.AllBitsSetBenchmark     |      3.06 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.AllBitsSetBenchmark      |      3.05 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.BitwiseOrBenchmark      |      2.58 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.XorBenchmark                 |      2.57 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.SumBenchmark            |      2.57 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.BitwiseAndBenchmark      |      2.50 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.BitwiseOrOperatorBenchm |      2.50 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.BitwiseAndBenchmark      |      2.48 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.BitwiseOrBenchmark       |      2.47 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.BitwiseAndBenchmark     |      2.47 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.BitwiseAndBenchmark     |      2.46 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.XorBenchmark             |      2.45 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.BitwiseOrBenchmark      |      2.44 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.BitwiseOrBenchmark       |      2.44 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.BitwiseAndBenchmark       |      2.42 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.BitwiseOrBenchmark      |      2.42 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.BitwiseAndBenchmark     |      2.42 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.AllBitsSetBenchmark     |      2.42 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.AllBitsSetBenchmark      |      2.42 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.BitwiseAndBenchmark     |      2.42 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.BitwiseAndBenchmark     |      2.41 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.BitwiseOrBenchmark       |      2.41 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.BitwiseAndBenchmark      |      2.41 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.XorBenchmark            |      2.41 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.BitwiseOrBenchmark           |      2.41 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.BitwiseOrBenchmark      |      2.41 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.BitwiseOrBenchmark        |      2.41 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.BitwiseAndBenchmark      |      2.40 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.BitwiseOrBenchmark      |      2.40 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.XorBenchmark             |      2.39 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.BitwiseOrBenchmark       |      2.39 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.XorBenchmark             |      2.38 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.BitwiseAndBenchmark            |      2.37 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.XorBenchmark             |      2.36 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.XorBenchmark            |      2.36 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.BitwiseAndBenchmark          |      2.36 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.XorBenchmark            |      2.36 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.XorBenchmark            |      2.36 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.XorBenchmark            |      2.35 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.ExclusiveOrOperatorBench |      2.35 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.XorBenchmark              |      2.35 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.BitwiseOrBenchmark             |      2.35 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.AllBitsSetBenchmark            |      2.35 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.XorBenchmark                   |      2.33 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.AllBitsSetBenchmark     |      2.32 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.BitwiseAndOperatorBenchmark    |      2.24 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.ExclusiveOrOperatorBenchmark   |      2.21 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.BitwiseAndOperatorBenchmark  |      2.17 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.AllBitsSetBenchmark     |      2.17 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.AllBitsSetBenchmark      |      2.16 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.BitwiseAndOperatorBench |      2.15 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.ExclusiveOrOperatorBenc |      2.15 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.BitwiseOrOperatorBenchmark     |      2.15 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.BitwiseAndOperatorBench |      2.14 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.BitwiseOrOperatorBenchmark   |      2.14 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.ExclusiveOrOperatorBenchmark |      2.14 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.BitwiseOrOperatorBenchma |      2.14 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.ExclusiveOrOperatorBenchm |      2.12 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.BitwiseOrOperatorBenchmar |      2.12 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.BitwiseOrOperatorBenchm |      2.12 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.BitwiseOrOperatorBenchma |      2.12 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.BitwiseAndOperatorBench |      2.11 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.BitwiseAndOperatorBenchm |      2.11 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.BitwiseAndOperatorBenchm |      2.11 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.ExclusiveOrOperatorBench |      2.11 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.BitwiseOrOperatorBenchma |      2.11 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.BitwiseAndOperatorBench |      2.11 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.ExclusiveOrOperatorBenc |      2.11 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.ExclusiveOrOperatorBenc |      2.10 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.BitwiseOrOperatorBenchma |      2.10 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.BitwiseAndOperatorBenchm |      2.10 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.ExclusiveOrOperatorBench |      2.10 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.BitwiseAndOperatorBenchma |      2.10 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.BitwiseOrOperatorBenchm |      2.10 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.ExclusiveOrOperatorBenc |      2.10 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.BitwiseAndOperatorBench |      2.09 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.ExclusiveOrOperatorBenc |      2.09 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.ExclusiveOrOperatorBench |      2.09 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.BitwiseAndOperatorBenchm |      2.08 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.BitwiseOrOperatorBenchm |      2.08 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.BitwiseOrOperatorBenchm |      2.08 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.OnesComplementBenchmark        |      1.78 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.OnesComplementBenchmark   |      1.76 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.OnesComplementBenchmark      |      1.76 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.OnesComplementBenchmark  |      1.76 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.OnesComplementBenchmark  |      1.75 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.OnesComplementBenchmark |      1.75 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.OnesComplementBenchmark |      1.75 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.OnesComplementBenchmark  |      1.75 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.OnesComplementBenchmark  |      1.75 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.OnesComplementBenchmark |      1.74 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.OnesComplementBenchmark |      1.73 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.OnesComplementBenchmark |      1.72 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.OnesComplementOperatorB |      1.60 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.OnesComplementOperatorB |      1.60 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.OnesComplementOperatorBe |      1.60 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.OnesComplementOperatorB |      1.59 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.OnesComplementOperatorB |      1.58 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.OnesComplementOperatorB |      1.58 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.OnesComplementOperatorBenchm |      1.57 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.OnesComplementOperatorBenchmar |      1.57 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.OnesComplementOperatorBen |      1.56 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.OnesComplementOperatorBe |      1.55 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.OnesComplementOperatorBe |      1.55 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.OnesComplementOperatorBe |      1.55 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.SquareRootBenchmark      |      1.40 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.SquareRootBenchmark            |      1.39 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.SquareRootBenchmark     |      1.38 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128.ConvertLongToDoubleBenchmark      |      1.32 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.SumBenchmark                 |      1.30 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.SumBenchmark            |      1.29 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.DivisionOperatorBenchma |      1.29 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.MultiplyOperatorBenchmar |      1.26 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.AbsBenchmark            |      1.25 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.AbsBenchmark            |      1.25 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.AbsBenchmark              |      1.25 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.AbsBenchmark            |      1.24 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128.ConvertULongToDoubleBenchmark     |      1.20 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.MultiplyOperatorBenchma |      1.20 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.SquareRootBenchmark     |      1.18 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.SquareRootBenchmark      |      1.17 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.SquareRootBenchmark      |      1.17 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.DivisionOperatorBenchmar |      1.16 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128.ConvertIntToFloatBenchmark        |      1.14 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128.ConvertDoubleToULongBenchmark     |      1.12 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.SquareRootBenchmark      |      1.10 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.DivisionOperatorBenchmar |      1.09 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.SquareRootBenchmark       |      1.09 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.DivisionOperatorBenchmark      |      1.09 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.DivisionOperatorBenchmar |      1.07 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128.ConvertDoubleToLongBenchmark      |      1.07 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.DivisionOperatorBenchma |      1.05 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128.ConvertUIntToFloatBenchmark       |      1.05 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.SquareRootBenchmark     |      1.05 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128.ConvertFloatToIntBenchmark        |      1.04 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.GetHashCodeBenchmark    |      1.03 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.GetHashCodeBenchmark    |      1.03 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.DivisionOperatorBenchma |      1.02 |
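The ratios in these tables appear to follow the dotnet/performance ResultsComparer convention, where `base/diff` is the baseline time divided by the time measured with this PR, so values above 1 mean the PR build is faster. As a minimal sketch under that assumption, the hypothetical helpers `parse_rows` and `time_saved` below read rows in this markdown format and convert each ratio into the share of execution time saved (for example, a ratio of 5.62 corresponds to roughly 82% less time).

```python
# Hypothetical helpers (not part of dotnet/performance): parse
# ResultsComparer-style markdown rows "| <benchmark> | <base/diff> |"
# and express each ratio as the fraction of execution time removed.

def parse_rows(lines):
    """Return (benchmark name, base/diff ratio) pairs from markdown table rows."""
    rows = []
    for line in lines:
        # Drop outer pipes (the trailing one may be missing) and split the cells.
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 2:
            continue
        name, ratio = cells
        try:
            rows.append((name, float(ratio)))
        except ValueError:
            # Header ("base/diff") and separator ("---:") rows are not numeric.
            continue
    return rows


def time_saved(ratio):
    """Fraction of baseline time saved, assuming ratio = t_base / t_diff."""
    return 1.0 - 1.0 / ratio


sample = [
    "| Faster | base/diff |",
    "| --- | ---------:|",
    "| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.EqualsStaticBenchmark   |      5.62 |",
    "| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.DivisionOperatorBenchma |      1.02 ",
]

for name, ratio in parse_rows(sample):
    print(f"{name}: {ratio:.2f}x faster, {time_saved(ratio):.1%} less time")
```

Tolerating a missing trailing pipe keeps the parser usable on rows pasted without the closing delimiter.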

Contributor Author

@nealef

| Faster                                                                           | base/diff |
| -------------------------------------------------------------------------------- | ---------:|
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.LessThanOrEqualAnyBenchm |    295.76 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.GreaterThanOrEqualAllBen |    290.34 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.LessThanOrEqualAnyBenchma |    288.97 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.GreaterThanOrEqualAllBenc |    280.58 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.LessThanAnyBenchmark     |    268.06 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.EqualsAnyBenchmark       |    267.94 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.GreaterThanAllBenchmark  |    264.60 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.GreaterThanAllBenchmark   |    255.37 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.LessThanAnyBenchmark      |    255.25 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.EqualsAnyBenchmark        |    254.00 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.DotBenchmark             |    127.60 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.DotBenchmark              |    120.02 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.GreaterThanOrEqualAllBen |    116.31 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.GreaterThanOrEqualAllBe |    115.63 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.LessThanOrEqualAnyBench |    115.57 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.GreaterThanOrEqualAnyBe |    114.82 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.LessThanOrEqualAnyBenchmark  |    114.40 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.GreaterThanOrEqualAnyBenchma |    114.00 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.EqualsAnyBenchmark      |    113.10 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.LessThanAnyBenchmark     |    112.76 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.LessThanAnyBenchmark    |    112.44 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.EqualsAnyBenchmark       |    112.37 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.LessThanAnyBenchmark    |    112.30 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.LessThanAnyBenchmark         |    112.21 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.GreaterThanAnyBenchmark      |    112.09 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.EqualsAnyBenchmark           |    112.06 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.EqualsAnyBenchmark      |    111.88 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.GreaterThanAnyBenchmark |    111.56 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.DotBenchmark            |    111.01 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.DotBenchmark             |    110.70 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.LessThanOrEqualAnyBenchm |    108.94 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.LessThanOrEqualAnyBench |    102.42 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.GreaterThanAllBenchmark  |    101.82 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.SumBenchmark            |     99.50 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.SumBenchmark             |     96.56 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.GreaterThanAllBenchmark |     96.28 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.SumBenchmark            |     95.17 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.SumBenchmark             |     87.98 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.SumBenchmark                   |     86.62 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.DotBenchmark             |     85.62 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.DotBenchmark                   |     84.72 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.DotBenchmark            |     83.73 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.GreaterThanOrEqualAllBenchmark |     72.48 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.GreaterThanOrEqualAllBen |     72.44 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.LessThanOrEqualAnyBench |     71.86 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.LessThanOrEqualAnyBenchm |     71.38 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.LessThanOrEqualAnyBenchmark    |     71.35 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.GreaterThanOrEqualAllBe |     70.56 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.GreaterThanAllBenchmark |     69.77 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.EqualsAnyBenchmark       |     68.04 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.LessThanAnyBenchmark    |     68.01 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.EqualsAnyBenchmark      |     67.95 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.GreaterThanAllBenchmark  |     67.92 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.GreaterThanAllBenchmark        |     67.90 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.LessThanAnyBenchmark           |     67.71 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.LessThanAnyBenchmark     |     67.43 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.SumBenchmark             |     63.52 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.SumBenchmark              |     62.25 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.GreaterThanOrEqualAllBe |     57.81 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.LessThanOrEqualAnyBench |     57.71 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.GreaterThanOrEqualAnyBe |     57.64 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.GreaterThanOrEqualAllBen |     57.61 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.EqualsAnyBenchmark      |     57.39 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.GreaterThanAnyBenchmark |     57.26 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.LessThanAnyBenchmark    |     57.25 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.GreaterThanAllBenchmark  |     56.74 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.LessThanOrEqualAnyBenchm |     56.72 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.LessThanAnyBenchmark    |     56.70 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.LessThanOrEqualAnyBench |     56.64 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.EqualsAnyBenchmark      |     56.62 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.LessThanAnyBenchmark     |     56.62 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.EqualsAnyBenchmark       |     56.61 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.GreaterThanAllBenchmark |     56.56 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.SumBenchmark             |     52.56 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.SumBenchmark            |     52.41 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.EqualsAnyBenchmark             |     48.66 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.EqualsAllBenchmark           |     45.82 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.EqualsAllBenchmark      |     45.38 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.LessThanOrEqualAllBenchmark  |     40.50 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.LessThanOrEqualAllBench |     40.36 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.GreaterThanOrEqualAllBe |     40.24 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.GreaterThanOrEqualAllBenchma |     40.09 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.EqualityOperatorBenchma |     40.00 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.GreaterThanAllBenchmark |     39.89 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.GreaterThanAllBenchmark      |     39.89 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.LessThanAllBenchmark    |     39.83 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.LessThanAllBenchmark         |     39.78 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.EqualityOperatorBenchmark    |     39.44 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.EqualsBenchmark              |     38.77 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.EqualsBenchmark         |     38.66 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.EqualsAllBenchmark       |     36.25 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.EqualsAllBenchmark       |     36.11 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.EqualsAllBenchmark      |     36.02 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.EqualsAllBenchmark       |     35.96 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.EqualsAllBenchmark       |     35.92 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.EqualsAllBenchmark      |     35.80 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.EqualsAllBenchmark        |     35.45 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.EqualsAllBenchmark      |     35.32 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.EqualsAllBenchmark      |     35.27 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.GreaterThanOrEqualAnyBenchmark |     31.49 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.LessThanAllBenchmark     |     31.30 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.LessThanOrEqualAllBenchmark    |     31.30 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.LessThanOrEqualAllBenchm |     31.23 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.GreaterThanAnyBenchmark  |     31.20 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.LessThanAllBenchmark     |     31.14 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.LessThanAllBenchmark    |     31.12 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.LessThanOrEqualAllBenchm |     31.11 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.GreaterThanOrEqualAnyBen |     31.10 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.LessThanAllBenchmark           |     31.07 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.GreaterThanOrEqualAnyBen |     31.07 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.GreaterThanAnyBenchmark        |     31.04 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.GreaterThanOrEqualAnyBe |     30.99 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.GreaterThanOrEqualAnyBen |     30.99 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.GreaterThanAnyBenchmark  |     30.93 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.GreaterThanAllBenchmark |     30.93 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.GreaterThanAnyBenchmark  |     30.93 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.LessThanAllBenchmark     |     30.83 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.GreaterThanOrEqualAnyBe |     30.80 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.GreaterThanOrEqualAnyBe |     30.79 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.LessThanOrEqualAllBenchm |     30.69 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.EqualityOperatorBenchmark      |     30.52 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.EqualityOperatorBenchmark |     30.50 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.GreaterThanAnyBenchmark   |     30.47 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.LessThanOrEqualAllBenchma |     30.45 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.LessThanOrEqualAllBench |     30.44 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.LessThanAllBenchmark      |     30.41 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.GreaterThanOrEqualAnyBen |     30.35 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.LessThanOrEqualAllBenchm |     30.32 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.LessThanOrEqualAllBench |     30.28 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.GreaterThanOrEqualAnyBenc |     30.19 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.LessThanOrEqualAllBench |     30.09 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.GreaterThanAnyBenchmark |     30.00 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.EqualityOperatorBenchma |     29.90 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.GreaterThanAnyBenchmark |     29.89 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.EqualityOperatorBenchmar |     29.89 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.LessThanAllBenchmark    |     29.83 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.LessThanAllBenchmark     |     29.77 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.LessThanAllBenchmark    |     29.76 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.EqualityOperatorBenchmar |     29.76 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.GreaterThanAnyBenchmark |     29.72 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.GreaterThanAnyBenchmark  |     29.70 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.LessThanOrEqualAllBench |     29.68 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.GreaterThanOrEqualAllBe |     29.68 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.EqualityOperatorBenchmar |     29.65 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.LessThanAllBenchmark    |     29.64 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.EqualityOperatorBenchma |     29.64 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.EqualityOperatorBenchma |     29.55 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.EqualityOperatorBenchmar |     29.53 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.EqualityOperatorBenchma |     29.53 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.InequalityOperatorBench |     26.97 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.InequalityOperatorBenchmark  |     26.51 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.EqualsAllBenchmark             |     25.82 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.InequalityOperatorBench |     21.77 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.InequalityOperatorBench |     21.60 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.InequalityOperatorBenchm |     21.60 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.InequalityOperatorBenchm |     21.53 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.InequalityOperatorBench |     21.52 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.InequalityOperatorBenchmark    |     21.51 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.InequalityOperatorBenchm |     21.32 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.InequalityOperatorBenchm |     21.28 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.InequalityOperatorBench |     21.27 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.InequalityOperatorBenchma |     21.11 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.EqualsBenchmark         |     20.06 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.EqualsBenchmark          |     14.59 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.EqualsBenchmark         |     14.44 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.EqualsBenchmark          |     14.27 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.EqualsBenchmark          |     13.84 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.EqualsBenchmark         |     13.39 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.EqualsBenchmark         |     13.26 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.EqualsBenchmark                |     12.80 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.MaxBenchmark                 |     11.10 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.MinBenchmark             |     10.98 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.LessThanOrEqualBenchmark |     10.84 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.GreaterThanOrEqualBenchm |     10.65 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.LessThanOrEqualBenchmark  |     10.49 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.GreaterThanBenchmark     |     10.34 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.MaxBenchmark             |     10.31 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.MaxBenchmark            |     10.29 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.MinBenchmark            |     10.29 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.MinBenchmark                 |     10.27 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.AddBenchmark            |     10.26 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.MinBenchmark              |     10.26 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.SubtractBenchmark            |     10.25 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.EqualsStaticBenchmark    |     10.24 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.AddBenchmark                 |     10.24 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.LessThanBenchmark        |     10.24 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.GreaterThanOrEqualBenchma |     10.20 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.MultiplyBenchmark            |     10.17 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.AddOperatorBenchmark         |     10.16 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.SubtractBenchmark       |     10.14 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.MultiplyBenchmark       |     10.07 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.GreaterThanBenchmark      |     10.00 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.SubtractionOperatorBenc |      9.97 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.EqualsStaticBenchmark     |      9.91 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.LessThanBenchmark         |      9.90 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.MaxBenchmark              |      9.88 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128.CeilingFloatBenchmark             |      9.57 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.EqualsBenchmark           |      9.52 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128.FloorFloatBenchmark               |      9.51 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.GreaterThanBenchmark         |      9.37 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.EqualsStaticBenchmark        |      9.36 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.LessThanBenchmark            |      9.21 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.AbsBenchmark            |      9.17 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.EqualsBenchmark          |      9.12 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.SubtractionOperatorBenchmark |      9.07 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.AddOperatorBenchmark    |      9.02 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.MultiplyOperatorBenchmark    |      9.01 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.LessThanOrEqualBenchmark     |      9.00 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.MultiplyOperatorBenchma |      9.00 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.AbsBenchmark                 |      8.95 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.GreaterThanBenchmark    |      8.95 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.LessThanBenchmark       |      8.94 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.LessThanOrEqualBenchmar |      8.94 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.GreaterThanOrEqualBench |      8.93 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.GreaterThanOrEqualBenchmark  |      8.89 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.NegateBenchmark          |      8.88 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.UnaryNegateOperatorBench |      8.84 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.UnaryNegateOperatorBenchm |      8.48 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.NegateBenchmark           |      8.47 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.EqualsStaticBenchmark   |      8.40 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.ConditionalSelectBenchmark   |      8.24 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.ConditionalSelectBenchm |      8.20 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.ConditionalSelectBenchma |      8.13 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.ConditionalSelectBenchm |      8.12 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.ConditionalSelectBenchm |      8.09 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.ConditionalSelectBenchm |      8.09 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.ConditionalSelectBenchm |      8.07 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.ConditionalSelectBenchma |      8.04 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.ConditionalSelectBenchma |      8.01 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.ConditionalSelectBenchma |      7.98 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.ConditionalSelectBenchmar |      7.94 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.DotBenchmark            |      7.78 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.MultiplyOperatorBenchmar |      7.65 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.AbsBenchmark             |      7.65 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.MultiplyBenchmark        |      7.44 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.MultiplyOperatorBenchmark |      7.08 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.DivideBenchmark         |      7.01 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.MultiplyBenchmark         |      7.01 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.DivisionOperatorBenchmark    |      6.99 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.ConditionalSelectBenchmark     |      6.88 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.SubtractBenchmark        |      6.80 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.SubtractionOperatorBench |      6.79 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.AddOperatorBenchmark     |      6.76 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.AddBenchmark             |      6.75 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.SquareRootBenchmark          |      6.74 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.MultiplyOperatorBenchmar |      6.73 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.UnaryNegateOperatorBench |      6.66 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.SquareRootBenchmark     |      6.66 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.NegateBenchmark          |      6.65 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.DivideBenchmark              |      6.65 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.MinBenchmark             |      6.57 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.UnaryNegateOperatorBenc |      6.44 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.MultiplyBenchmark        |      6.44 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.NegateBenchmark         |      6.44 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.DivisionOperatorBenchma |      6.29 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.LessThanOrEqualBenchmark |      6.13 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.GreaterThanBenchmark     |      6.12 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.GreaterThanBenchmark    |      6.12 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.AddOperatorBenchmark      |      6.12 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.MinBenchmark            |      6.10 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.MultiplyOperatorBenchma |      6.07 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.SubtractionOperatorBenchm |      6.07 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.MultiplyBenchmark       |      6.06 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.SubtractBenchmark         |      5.89 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.MaxBenchmark             |      5.87 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.MaxBenchmark            |      5.84 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.LessThanOrEqualBenchmar |      5.83 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.SubtractBenchmark        |      5.81 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.SubtractionOperatorBench |      5.79 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.AddOperatorBenchmark     |      5.79 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.AddBenchmark             |      5.75 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.AbsBenchmark             |      5.73 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.AddBenchmark              |      5.71 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.LessThanBenchmark        |      5.67 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.GreaterThanOrEqualBenchm |      5.66 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.EqualsStaticBenchmark    |      5.65 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.LessThanBenchmark       |      5.65 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.EqualsStaticBenchmark   |      5.62 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.UnaryNegateOperatorBenchmark |      5.54 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.AddOperatorBenchmark    |      5.49 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.UnaryNegateOperatorBenc |      5.47 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.GreaterThanOrEqualBench |      5.45 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.SubtractionOperatorBenc |      5.42 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.NegateBenchmark              |      5.39 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.NegateBenchmark         |      5.37 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.SubtractBenchmark       |      5.35 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.MinBenchmark             |      5.15 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.UnaryNegateOperatorBenchmark   |      5.13 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.AddBenchmark            |      5.10 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.MinBenchmark                   |      5.07 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.NegateBenchmark          |      5.05 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.UnaryNegateOperatorBench |      5.04 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.NegateBenchmark         |      4.98 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128.FloorDoubleBenchmark              |      4.98 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128.CeilingDoubleBenchmark            |      4.95 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.MultiplyBenchmark              |      4.95 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.UnaryNegateOperatorBenc |      4.95 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.NegateBenchmark                |      4.93 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.MinBenchmark            |      4.85 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.MultiplyOperatorBenchmark      |      4.75 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.MultiplyBenchmark        |      4.71 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.MultiplyOperatorBenchmar |      4.65 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.MultiplyOperatorBenchma |      4.65 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.AddBenchmark            |      4.63 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.AddOperatorBenchmark    |      4.59 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.SubtractBenchmark       |      4.53 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.MultiplyBenchmark       |      4.52 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.SubtractionOperatorBenc |      4.51 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.AbsBenchmark            |      4.50 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.MultiplyBenchmark       |      4.50 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.UnaryNegateOperatorBenc |      4.48 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.NegateBenchmark          |      4.46 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.UnaryNegateOperatorBench |      4.46 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.MaxBenchmark                   |      4.44 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.NegateBenchmark         |      4.43 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.MaxBenchmark             |      4.42 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.AddBenchmark                   |      4.39 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.AbsBenchmark             |      4.38 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.MultiplyOperatorBenchma |      4.36 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.LessThanOrEqualBenchmark       |      4.35 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.AddOperatorBenchmark           |      4.32 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.LessThanOrEqualBenchmark |      4.30 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.AbsBenchmark                   |      4.27 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.MaxBenchmark            |      4.27 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.UnaryNegateOperatorBenc |      4.26 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.LessThanOrEqualBenchmar |      4.23 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.EqualsStaticBenchmark          |      4.19 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.NegateBenchmark         |      4.19 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.GreaterThanBenchmark    |      4.16 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.GreaterThanOrEqualBenchmark    |      4.13 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.GreaterThanBenchmark           |      4.12 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.GreaterThanBenchmark     |      4.12 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.SubtractionOperatorBenchmark   |      4.11 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.SubtractBenchmark        |      4.10 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.DotBenchmark             |      4.07 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.DotBenchmark            |      4.07 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.GreaterThanOrEqualBenchm |      4.06 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.MinBenchmark             |      4.05 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.AddOperatorBenchmark     |      4.05 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.SubtractionOperatorBench |      4.04 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.MinBenchmark            |      4.04 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.AddBenchmark             |      4.04 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.AddOperatorBenchmark    |      4.02 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.SubtractBenchmark              |      4.00 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.GreaterThanOrEqualBench |      3.97 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.EqualsStaticBenchmark    |      3.95 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.SquareRootBenchmark     |      3.95 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.LessThanBenchmark        |      3.94 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.EqualsStaticBenchmark   |      3.94 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.LessThanBenchmark       |      3.93 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.SubtractBenchmark       |      3.91 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.SubtractionOperatorBenc |      3.90 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.LessThanBenchmark              |      3.83 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.AddBenchmark            |      3.81 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.MaxBenchmark            |      3.80 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.AndNotBenchmark          |      3.79 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.AbsBenchmark             |      3.78 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.AndNotBenchmark          |      3.77 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.AndNotBenchmark         |      3.75 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.DivideBenchmark         |      3.74 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.AndNotBenchmark          |      3.73 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.MaxBenchmark             |      3.72 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.AndNotBenchmark              |      3.71 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.DivisionOperatorBenchma |      3.70 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.AndNotBenchmark         |      3.68 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.GreaterThanOrEqualBench |      3.67 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.AndNotBenchmark          |      3.67 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.AndNotBenchmark         |      3.66 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.AndNotBenchmark         |      3.65 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.LessThanOrEqualBenchmar |      3.65 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.AndNotBenchmark         |      3.65 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.LessThanBenchmark       |      3.64 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.EqualsStaticBenchmark   |      3.64 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.GreaterThanBenchmark    |      3.63 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.AndNotBenchmark           |      3.62 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.AllBitsSetBenchmark     |      3.59 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.MinBenchmark            |      3.59 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.AllBitsSetBenchmark          |      3.55 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.DotBenchmark                 |      3.51 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.MaxBenchmark            |      3.44 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.AndNotBenchmark                |      3.43 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.DotBenchmark            |      3.42 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.AllBitsSetBenchmark      |      3.38 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.AllBitsSetBenchmark       |      3.37 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.SubtractBenchmark        |      3.35 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.SubtractBenchmark       |      3.31 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.SubtractionOperatorBenc |      3.28 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.LessThanOrEqualBenchmark |      3.27 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.LessThanOrEqualBenchmar |      3.27 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.GreaterThanBenchmark    |      3.25 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.AddBenchmark            |      3.24 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.AddBenchmark             |      3.23 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.GreaterThanBenchmark     |      3.23 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.SubtractionOperatorBench |      3.23 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.LessThanBenchmark        |      3.21 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.AddOperatorBenchmark    |      3.21 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.LessThanBenchmark       |      3.21 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.EqualsStaticBenchmark   |      3.21 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.EqualsStaticBenchmark    |      3.20 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.AddOperatorBenchmark     |      3.20 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.GreaterThanOrEqualBench |      3.14 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.GreaterThanOrEqualBenchm |      3.12 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.AllBitsSetBenchmark     |      3.06 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.AllBitsSetBenchmark      |      3.05 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.BitwiseOrBenchmark      |      2.58 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.XorBenchmark                 |      2.57 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.SumBenchmark            |      2.57 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.BitwiseAndBenchmark      |      2.50 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.BitwiseOrOperatorBenchm |      2.50 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.BitwiseAndBenchmark      |      2.48 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.BitwiseOrBenchmark       |      2.47 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.BitwiseAndBenchmark     |      2.47 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.BitwiseAndBenchmark     |      2.46 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.XorBenchmark             |      2.45 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.BitwiseOrBenchmark      |      2.44 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.BitwiseOrBenchmark       |      2.44 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.BitwiseAndBenchmark       |      2.42 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.BitwiseOrBenchmark      |      2.42 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.BitwiseAndBenchmark     |      2.42 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.AllBitsSetBenchmark     |      2.42 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.AllBitsSetBenchmark      |      2.42 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.BitwiseAndBenchmark     |      2.42 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.BitwiseAndBenchmark     |      2.41 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.BitwiseOrBenchmark       |      2.41 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.BitwiseAndBenchmark      |      2.41 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.XorBenchmark            |      2.41 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.BitwiseOrBenchmark           |      2.41 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.BitwiseOrBenchmark      |      2.41 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.BitwiseOrBenchmark        |      2.41 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.BitwiseAndBenchmark      |      2.40 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.BitwiseOrBenchmark      |      2.40 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.XorBenchmark             |      2.39 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.BitwiseOrBenchmark       |      2.39 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.XorBenchmark             |      2.38 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.BitwiseAndBenchmark            |      2.37 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.XorBenchmark             |      2.36 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.XorBenchmark            |      2.36 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.BitwiseAndBenchmark          |      2.36 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.XorBenchmark            |      2.36 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.XorBenchmark            |      2.36 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.XorBenchmark            |      2.35 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.ExclusiveOrOperatorBench |      2.35 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.XorBenchmark              |      2.35 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.BitwiseOrBenchmark             |      2.35 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.AllBitsSetBenchmark            |      2.35 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.XorBenchmark                   |      2.33 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.AllBitsSetBenchmark     |      2.32 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.BitwiseAndOperatorBenchmark    |      2.24 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.ExclusiveOrOperatorBenchmark   |      2.21 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.BitwiseAndOperatorBenchmark  |      2.17 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.AllBitsSetBenchmark     |      2.17 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.AllBitsSetBenchmark      |      2.16 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.BitwiseAndOperatorBench |      2.15 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.ExclusiveOrOperatorBenc |      2.15 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.BitwiseOrOperatorBenchmark     |      2.15 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.BitwiseAndOperatorBench |      2.14 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.BitwiseOrOperatorBenchmark   |      2.14 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.ExclusiveOrOperatorBenchmark |      2.14 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.BitwiseOrOperatorBenchma |      2.14 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.ExclusiveOrOperatorBenchm |      2.12 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.BitwiseOrOperatorBenchmar |      2.12 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.BitwiseOrOperatorBenchm |      2.12 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.BitwiseOrOperatorBenchma |      2.12 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.BitwiseAndOperatorBench |      2.11 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.BitwiseAndOperatorBenchm |      2.11 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.BitwiseAndOperatorBenchm |      2.11 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.ExclusiveOrOperatorBench |      2.11 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.BitwiseOrOperatorBenchma |      2.11 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.BitwiseAndOperatorBench |      2.11 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.ExclusiveOrOperatorBenc |      2.11 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.ExclusiveOrOperatorBenc |      2.10 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.BitwiseOrOperatorBenchma |      2.10 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.BitwiseAndOperatorBenchm |      2.10 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.ExclusiveOrOperatorBench |      2.10 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.BitwiseAndOperatorBenchma |      2.10 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.BitwiseOrOperatorBenchm |      2.10 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.ExclusiveOrOperatorBenc |      2.10 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.BitwiseAndOperatorBench |      2.09 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.ExclusiveOrOperatorBenc |      2.09 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.ExclusiveOrOperatorBench |      2.09 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.BitwiseAndOperatorBenchm |      2.08 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.BitwiseOrOperatorBenchm |      2.08 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.BitwiseOrOperatorBenchm |      2.08 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.OnesComplementBenchmark        |      1.78 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.OnesComplementBenchmark   |      1.76 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.OnesComplementBenchmark      |      1.76 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.OnesComplementBenchmark  |      1.76 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.OnesComplementBenchmark  |      1.75 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.OnesComplementBenchmark |      1.75 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.OnesComplementBenchmark |      1.75 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.OnesComplementBenchmark  |      1.75 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.OnesComplementBenchmark  |      1.75 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.OnesComplementBenchmark |      1.74 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.OnesComplementBenchmark |      1.73 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.OnesComplementBenchmark |      1.72 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.OnesComplementOperatorB |      1.60 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.OnesComplementOperatorB |      1.60 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.OnesComplementOperatorBe |      1.60 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.OnesComplementOperatorB |      1.59 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.OnesComplementOperatorB |      1.58 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.OnesComplementOperatorB |      1.58 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.OnesComplementOperatorBenchm |      1.57 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.OnesComplementOperatorBenchmar |      1.57 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.OnesComplementOperatorBen |      1.56 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.OnesComplementOperatorBe |      1.55 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.OnesComplementOperatorBe |      1.55 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.OnesComplementOperatorBe |      1.55 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.SquareRootBenchmark      |      1.40 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.SquareRootBenchmark            |      1.39 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.SquareRootBenchmark     |      1.38 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128.ConvertLongToDoubleBenchmark      |      1.32 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.SumBenchmark                 |      1.30 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.SumBenchmark            |      1.29 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.DivisionOperatorBenchma |      1.29 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.MultiplyOperatorBenchmar |      1.26 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.AbsBenchmark            |      1.25 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.AbsBenchmark            |      1.25 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.AbsBenchmark              |      1.25 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.AbsBenchmark            |      1.24 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128.ConvertULongToDoubleBenchmark     |      1.20 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.MultiplyOperatorBenchma |      1.20 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.SquareRootBenchmark     |      1.18 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.SquareRootBenchmark      |      1.17 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.SquareRootBenchmark      |      1.17 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.DivisionOperatorBenchmar |      1.16 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128.ConvertIntToFloatBenchmark        |      1.14 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128.ConvertDoubleToULongBenchmark     |      1.12 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.SquareRootBenchmark      |      1.10 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.DivisionOperatorBenchmar |      1.09 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.SquareRootBenchmark       |      1.09 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.DivisionOperatorBenchmark      |      1.09 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.DivisionOperatorBenchmar |      1.07 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128.ConvertDoubleToLongBenchmark      |      1.07 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.DivisionOperatorBenchma |      1.05 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128.ConvertUIntToFloatBenchmark       |      1.05 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.SquareRootBenchmark     |      1.05 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128.ConvertFloatToIntBenchmark        |      1.04 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.GetHashCodeBenchmark    |      1.03 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.GetHashCodeBenchmark    |      1.03 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.DivisionOperatorBenchma |      1.02 |

Contributor

Looks like you posted the same list twice (only improvements). What are the regressions?

Contributor Author

Apologies, I think the clipboard missed it somehow.

| Slower                                                                           | diff/base |
| -------------------------------------------------------------------------------- | ---------:|
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.MultiplyBenchmark       |      1.27 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.ZeroBenchmark                  |      1.22 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.ZeroBenchmark           |      1.22 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.ZeroBenchmark           |      1.22 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.ZeroBenchmark            |      1.22 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.ZeroBenchmark            |      1.22 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.ZeroBenchmark           |      1.21 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.ZeroBenchmark                |      1.21 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.ZeroBenchmark           |      1.21 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.ZeroBenchmark             |      1.21 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.ZeroBenchmark            |      1.21 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.ZeroBenchmark           |      1.21 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.ZeroBenchmark            |      1.20 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.DivideBenchmark          |      1.20 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.DivideBenchmark          |      1.20 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.MultiplyBenchmark        |      1.19 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.DivideBenchmark           |      1.19 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.DivideBenchmark         |      1.14 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.DivisionOperatorBenchmark |      1.12 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.DivisionOperatorBenchmar |      1.09 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.GetHashCodeBenchmark     |      1.07 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.GetHashCodeBenchmark    |      1.06 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Float.GetHashCodeBenchmark         |      1.06 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.DivideBenchmark          |      1.05 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.GetHashCodeBenchmark     |      1.05 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Int.DivideBenchmark                |      1.04 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.DivideBenchmark         |      1.03 |
| System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.DivideBenchmark         |      1.02 |

Contributor

Given the great improvements elsewhere, these regressions aren't too concerning. However, since they seem to be concentrated in just a few areas, maybe we can investigate what's causing them? E.g. the divide / multiply regressions are interesting. I guess this is because we don't have (before z17) any vector integer divide instructions and only 32-bit vector integer multiply instructions, so these operations are probably scalarized? However, they're already scalarized today, so it's not immediately obvious why this should cause a regression ...
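To make "scalarized" concrete: a lane-by-lane fallback extracts each element, applies an ordinary scalar operation, and inserts the result back. The portable-C sketch below only models what such a fallback computes for 64-bit lanes; `v128_u64` and the two functions are hypothetical names, not code that Mono emits. Because the arithmetic is identical before and after the patch, a plausible (unconfirmed) place to look for the regression is the extract/insert traffic between vector and general registers around it.

```c
#include <stdint.h>

/* Hypothetical 128-bit vector modelled as two 64-bit lanes
 * (not Mono's actual register/IR representation). */
typedef struct {
    uint64_t lane[2];
} v128_u64;

/* Scalarized multiply: each lane is extracted, multiplied with a plain
 * 64-bit scalar multiply (wrapping modulo 2^64), and written back. */
static v128_u64 v128_u64_mul_scalarized (v128_u64 a, v128_u64 b)
{
    v128_u64 r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}

/* Scalarized divide: same extract/operate/insert pattern.
 * The caller must ensure every divisor lane is non-zero
 * (division by zero is undefined behaviour in C). */
static v128_u64 v128_u64_div_scalarized (v128_u64 a, v128_u64 b)
{
    v128_u64 r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = a.lane[i] / b.lane[i];
    return r;
}
```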

Contributor Author

I suspect this could be some sort of noise, but I'll rerun the benchmarks and take a look at the regressions.
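In the "Slower" table, `diff/base` is the patched build's time divided by the baseline's, so 1.22 means the benchmark took 22% longer. A simple way to separate noise from real regressions is to require that a ratio exceed a threshold in both the original run and the rerun. The sketch below is a hypothetical filter; the function names and the 5% threshold used in the example are assumptions, not dotnet/performance defaults.

```c
#include <stdbool.h>

/* "diff/base" as in the tables above: patched time / baseline time,
 * so values above 1.0 mean the patched build is slower. */
static double diff_over_base (double base_time, double diff_time)
{
    return diff_time / base_time;
}

/* Hypothetical noise filter: report a regression only if the ratio exceeds
 * 1 + threshold in both the original run and the rerun. */
static bool is_consistent_regression (double ratio_run1, double ratio_run2,
                                      double threshold)
{
    return ratio_run1 > 1.0 + threshold && ratio_run2 > 1.0 + threshold;
}
```

For example, with a threshold of 0.05, the `ZeroBenchmark` entries at 1.22 would only be treated as real if the rerun also showed a ratio above 1.05.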

@@ -1569,14 +1571,151 @@ typedef struct {
#define s390_tmlh(c, r, m) S390_RI(c, 0xa70, r, m)
#define s390_tmll(c, r, m) S390_RI(c, 0xa71, r, m)
#define s390_tm(c, b, d, v) S390_SI(c, 0x91, b, d, v)
#define s390_trap2(code) S390_E(code, 0x01ff)
Contributor

Is this used for anything?

Contributor Author

I added this for debugging asm routines at runtime. If you want, we can remove it.
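For context, the macro in the hunk above defines `s390_trap2(code)` as `S390_E(code, 0x01ff)`, i.e. an E-format instruction, which is nothing more than a 2-byte opcode. The plain-C sketch below shows what emitting it into a code buffer amounts to; `emit_e_format` and `emit_trap2` are hypothetical helpers, not Mono's actual macros.

```c
#include <stdint.h>

/* Hypothetical E-format emitter: the whole instruction is one 16-bit opcode,
 * stored big-endian (high byte first) as s390x expects. */
static void emit_e_format (uint8_t **code, uint16_t opcode)
{
    (*code)[0] = (uint8_t) (opcode >> 8);
    (*code)[1] = (uint8_t) (opcode & 0xff);
    *code += 2;  /* advance past the 2-byte instruction */
}

/* Mirrors the s390_trap2 definition above: emit opcode 0x01ff. */
static void emit_trap2 (uint8_t **code)
{
    emit_e_format (code, 0x01ff);
}
```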

@@ -2022,6 +2086,8 @@ emit_sri_vector (MonoCompile *cfg, MonoMethod *cmethod, MonoMethodSignature *fsi
} else {
return emit_simd_ins_for_sig (cfg, klass, OP_VECTOR_IABS, -1, arg0_type, fsig, args);
}
#elif defined(TARGET_S390X)
return emit_simd_ins_for_sig (cfg, klass, OP_VEC_ABS, -1, arg0_type, fsig, args);
Contributor

Why can't we use the same OP_VECTOR_IABS opcode as other platforms?

Contributor Author

@saitama951 Jul 3, 2025

#elif defined(TARGET_WASM)
                if (type_enum_is_float(arg0_type)) {
                        return emit_simd_ins_for_sig (cfg, klass, OP_XOP_X_X, arg0_type == MONO_TYPE_R8 ? INTRINS_WASM_FABS_V2 : INTRINS_WASM_FABS_V4, -1, fsig, args);
                } else {
                        return emit_simd_ins_for_sig (cfg, klass, OP_VECTOR_IABS, -1, arg0_type, fsig, args);
                }
#elif defined(TARGET_S390X)
                return emit_simd_ins_for_sig (cfg, klass, OP_VEC_ABS, -1, arg0_type, fsig, args);
#else

The other platforms handle integers and floats using different intrinsics, whereas on s390x we have a single pseudo-opcode (OP_VEC_ABS) that handles both.
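The design point is that one pseudo-opcode can carry the element type and let the backend choose integer or floating-point behaviour. As a hedged illustration in portable C (not the s390x backend), the sketch below models a single abs operation over a 128-bit vector: negative integers are negated in two's complement, so the most negative value wraps to itself, while floats simply have their sign bit cleared. `elem_type`, `v128` and `vec_abs` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical element-type tags standing in for MONO_TYPE_I1 ... MONO_TYPE_R8. */
typedef enum { ELEM_I8, ELEM_I16, ELEM_I32, ELEM_I64, ELEM_R4, ELEM_R8 } elem_type;

/* A 128-bit vector as raw bytes in host byte order. */
typedef struct { uint8_t bytes[16]; } v128;

static unsigned elem_size (elem_type t)
{
    switch (t) {
    case ELEM_I8:  return 1;
    case ELEM_I16: return 2;
    case ELEM_I32:
    case ELEM_R4:  return 4;
    default:       return 8;
    }
}

static bool elem_is_float (elem_type t)
{
    return t == ELEM_R4 || t == ELEM_R8;
}

/* Read one lane of the given byte width, zero-extended to 64 bits. */
static uint64_t load_lane (const uint8_t *p, unsigned size)
{
    uint8_t u8; uint16_t u16; uint32_t u32; uint64_t u64;
    switch (size) {
    case 1:  memcpy (&u8, p, 1);  return u8;
    case 2:  memcpy (&u16, p, 2); return u16;
    case 4:  memcpy (&u32, p, 4); return u32;
    default: memcpy (&u64, p, 8); return u64;
    }
}

/* Write the low `size` bytes of x back as one lane. */
static void store_lane (uint8_t *p, unsigned size, uint64_t x)
{
    uint8_t u8 = (uint8_t) x;
    uint16_t u16 = (uint16_t) x;
    uint32_t u32 = (uint32_t) x;
    switch (size) {
    case 1:  memcpy (p, &u8, 1);  break;
    case 2:  memcpy (p, &u16, 2); break;
    case 4:  memcpy (p, &u32, 4); break;
    default: memcpy (p, &x, 8);   break;
    }
}

/* One abs operation whose behaviour is chosen by the element type. */
static v128 vec_abs (v128 v, elem_type t)
{
    unsigned size = elem_size (t);
    unsigned bits = size * 8;
    uint64_t sign = (uint64_t) 1 << (bits - 1);
    uint64_t mask = bits == 64 ? UINT64_MAX : (((uint64_t) 1 << bits) - 1);
    v128 r;

    for (unsigned off = 0; off < 16; off += size) {
        uint64_t x = load_lane (v.bytes + off, size);
        if (elem_is_float (t))
            x &= ~sign;            /* float: clear the sign bit */
        else if (x & sign)
            x = (0 - x) & mask;    /* int: two's-complement negate; MIN stays MIN */
        store_lane (r.bytes + off, size, x);
    }
    return r;
}
```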

case OP_STOREX_ALIGNED_MEMBASE_REG:
#endif
Contributor

Can't we just define the ALIGNED versions? They could just be aliases for the unaligned versions, but in fact we can be more efficient by setting the alignment hint in the VL and VST instructions ...

Contributor Author

For now I'll introduce the aliases. This can be improved later on, after benchmarking various alignment hint values and choosing the best one based on the results.
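A minimal sketch of the "aliases plus alignment hint" idea: each ALIGNED opcode shares the emission path of its unaligned counterpart and only adds a non-zero hint operand to `vl`/`vst`. The opcode names carry a `SKETCH_` prefix because they are hypothetical, and the hint encoding (0 = none, 3 = 8-byte, 4 = 16-byte) is an assumption that should be checked against the z/Architecture Principles of Operation.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical opcodes modelled on the ones discussed in the review;
 * these are not Mono's real enum values. */
typedef enum {
    SKETCH_LOADX_MEMBASE,
    SKETCH_LOADX_ALIGNED_MEMBASE,
    SKETCH_STOREX_MEMBASE_REG,
    SKETCH_STOREX_ALIGNED_MEMBASE_REG
} sketch_opcode;

/* Assumed M3 alignment-hint encoding (0 = no hint, 3 = 8-byte,
 * 4 = 16-byte aligned); verify before relying on these values. */
static int sketch_alignment_hint (unsigned alignment)
{
    if (alignment >= 16)
        return 4;
    if (alignment >= 8)
        return 3;
    return 0;
}

/* Format the instruction a backend might emit. Each ALIGNED opcode is an
 * alias of its unaligned counterpart that only adds an alignment hint. */
static int sketch_emit_vector_mem (char *buf, size_t len, sketch_opcode op,
                                   int vreg, int basereg, int disp)
{
    const char *mnemonic;
    int hint = 0;

    switch (op) {
    case SKETCH_LOADX_ALIGNED_MEMBASE:
        hint = sketch_alignment_hint (16);  /* aligned vector loads assume 16 bytes */
        mnemonic = "vl";
        break;
    case SKETCH_LOADX_MEMBASE:
        mnemonic = "vl";
        break;
    case SKETCH_STOREX_ALIGNED_MEMBASE_REG:
        hint = sketch_alignment_hint (16);
        mnemonic = "vst";
        break;
    case SKETCH_STOREX_MEMBASE_REG:
        mnemonic = "vst";
        break;
    default:
        return -1;
    }
    return snprintf (buf, len, "%s %%v%d,%d(%%r%d),%d",
                     mnemonic, vreg, disp, basereg, hint);
}
```

Starting with plain aliases (hint 0) and only then switching the hint on keeps the benchmarking step mentioned above a one-line change.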

* remove SIY_1 (duplicate)
* remove locgrnle and locghinle, use locghiho instead
* reformat whole patch
* remove vflc, use vfpso instead
* move common ops to a common ifdef in mini-ops
* remove NEW_INS, it's not used anywhere
* rewrite the whole logic for vector conditional ops for floats
* update ANDN with vnc instruction
* add a couple of comments
* remove some pseudo ops in simd-intrinsics
* add aligned loads and stores
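Two of the items above replace multi-step sequences with single bitwise operations. The portable-C model below shows the semantics involved: ANDN (`a & ~b`, the same as .NET's `AndNot(left, right)`) computed in one step, as an and-with-complement instruction like `vnc` does, and the three sign operations on a double lane. That these are the modes `vfpso` selects between is our understanding rather than something stated in the patch, and the instruction's mask encoding is deliberately not modelled. All names are hypothetical.

```c
#include <stdint.h>

/* A 128-bit vector modelled as two 64-bit halves (hypothetical type). */
typedef struct { uint64_t half[2]; } vec128_bits;

/* ANDN semantics: a & ~b on every bit, computed in one step instead of
 * a separate NOT followed by an AND. */
static vec128_bits vec128_andnot (vec128_bits a, vec128_bits b)
{
    vec128_bits r;
    for (int i = 0; i < 2; i++)
        r.half[i] = a.half[i] & ~b.half[i];
    return r;
}

/* Sign operations on one IEEE-754 double lane's bit pattern
 * (assumed to correspond to the modes of a "perform sign operation"
 * instruction; encoding not modelled). */
typedef enum { FP_SIGN_COMPLEMENT, FP_SIGN_NEGATIVE, FP_SIGN_POSITIVE } fp_sign_op;

static uint64_t fp64_sign_op (uint64_t bits, fp_sign_op op)
{
    const uint64_t sign = (uint64_t) 1 << 63;
    switch (op) {
    case FP_SIGN_COMPLEMENT: return bits ^ sign;   /* -x: flip the sign   */
    case FP_SIGN_NEGATIVE:   return bits | sign;   /* -|x|: set the sign  */
    default:                 return bits & ~sign;  /* |x|: clear the sign */
    }
}
```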
@saitama951
Contributor Author

@uweigand Thank you for the in-depth review of the patch, much appreciated. I have addressed most of your review comments.

@risc-vv

risc-vv commented Jul 4, 2025

@dotnet/samsung Could you please take a look? These changes may be related to riscv64.

@risc-vv

risc-vv commented Jul 4, 2025

bc1d701 is being scheduled for building and testing

GIT: bc1d70105ad01d003319b00eb44214b5bcd8c481
REPO: dotnet/runtime
BRANCH: main

Labels
arch-s390x Related to s390x architecture (unsupported) area-Codegen-JIT-mono community-contribution Indicates that the PR has been added by a community member

5 participants