Enable TensorPrimitive vectorization for Half for lots of methods #116934

stephentoub · 2025-06-23T21:31:57Z

This enables all of the following operators to be vectorized for T == Half:
Abs, Add, AddMultiply, BitwiseAnd, BitwiseOr, Ceiling, Clamp, CopySign, Cos, CosPi, Cosh, Decrement, DegreesToRadians, Divide, Exp, Exp10, Exp10M1, Exp2, Exp2M1, ExpM1, Floor, FusedAddMultiply, Hypot, Increment, Lerp, Log, Log10, Log10P1, Log2, Log2P1, LogP1, Max, MaxMagnitude, MaxMagnitudeNumber, MaxNumber, Min, MinMagnitude, MinMagnitudeNumber, MinNumber, Multiply, MultiplyAdd, MultiplyAddEstimate, Negate, OnesComplement, Reciprocal, Remainder, Round, Sigmoid, Sin, SinPi, Sinh, Sqrt, Subtract, Tan, TanPi, Tanh, Truncate, Xor.

I don't love that it required adding code to each of those methods, but I couldn't come up with anything better and kept the code per method to a minimum (I left the typeof(T) == typeof(Half) check at each call site, primarily for clarity).
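
As a rough illustration of the per-method guard described above, here is a minimal sketch. `TryNegateHalf` and this `Negate<T>` are hypothetical stand-ins written for this example; the PR's real helper names and signatures are not shown in this thread, and its Half path is vectorized rather than the scalar loop used here.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

// Hypothetical stand-in for one of the PR's Try* helpers (assumed shape, scalar body).
static bool TryNegateHalf(ReadOnlySpan<Half> x, Span<Half> destination)
{
    if (destination.Length < x.Length) return false;
    for (int i = 0; i < x.Length; i++)
    {
        // Widen to float, operate, narrow back to Half.
        destination[i] = (Half)(-(float)x[i]);
    }
    return true;
}

static void Negate<T>(ReadOnlySpan<T> x, Span<T> destination)
    where T : struct, IUnaryNegationOperators<T, T>
{
    // For value-type T the JIT specializes the method per type, so this
    // comparison is a constant and the branch costs nothing for other types.
    if (typeof(T) == typeof(Half) &&
        TryNegateHalf(MemoryMarshal.Cast<T, Half>(x), MemoryMarshal.Cast<T, Half>(destination)))
    {
        return;
    }

    // Generic fallback path for every other T.
    for (int i = 0; i < x.Length; i++)
    {
        destination[i] = -x[i];
    }
}

Half[] input = { (Half)1.5f, (Half)(-2f) };
Half[] output = new Half[input.Length];
Negate<Half>(input, output);
Console.WriteLine(string.Join(", ", output));
```

Keeping the `typeof(T) == typeof(Half)` test in the caller, rather than inside the helper, makes it obvious at each call site that the extra code only applies to Half.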

I did not change the reductions: anything that involved more Half <-> float roundtrips than exist today perturbed the results to the point where tests failed, and most of the reductions involve multiple operators.
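
To see why the number of roundtrips matters for a reduction, consider this small sketch. It is illustrative only and not taken from the PR or its tests; the input values and length are arbitrary assumptions. It sums the same Half values twice, once narrowing to Half a single time at the end and once narrowing after every addition.

```csharp
using System;

// Arbitrary illustrative input (assumption, not from the PR).
Half[] values = new Half[1000];
Array.Fill(values, (Half)0.1f);

// Accumulate in float and narrow to Half once at the end.
float wide = 0f;
foreach (Half v in values)
{
    wide += (float)v;
}
Half roundedOnce = (Half)wide;

// Narrow to Half after every addition: one Half <-> float roundtrip per step.
Half roundedEachStep = Half.Zero;
foreach (Half v in values)
{
    roundedEachStep = (Half)((float)roundedEachStep + (float)v);
}

Console.WriteLine($"rounded once:       {roundedOnce}");
Console.WriteLine($"rounded every step: {roundedEachStep}");
```

Because Half keeps only 11 significant bits, each extra narrowing can shift the running value, so two reductions that differ only in where they round can disagree by more than a test tolerance allows.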

Sample benchmarks:

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Numerics.Tensors;

BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);

public class Bench
{
    private Half[] _x, _y, _d;

    [Params(1, 8, 800)]
    public int Length { get; set; }

    [GlobalSetup]
    public void Setup()
    {
        _x = new Half[Length];
        _y = new Half[Length];
        _d = new Half[Length];
        var random = new Random(42);
        for (int i = 0; i < Length; i++)
        {
            _x[i] = (Half)random.NextSingle();
            _y[i] = (Half)random.NextSingle();
        }
    }

    [Benchmark]
    public void Add() => TensorPrimitives.Add(_x, _y, _d);

    [Benchmark]
    public void Exp() => TensorPrimitives.Exp(_x, _d);

    [Benchmark]
    public void Floor() => TensorPrimitives.Floor(_x, _d);

    [Benchmark]
    public void Xor() => TensorPrimitives.Xor(_x, _y, _d);
}
```

On my machine with AVX2 (256-bit vectors, not AVX-512)...

Before:

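| Method | Length | Mean |
| --- | ---: | ---: |
| Add | 1 | 12.173 ns |
| Exp | 1 | 13.374 ns |
| Floor | 1 | 7.554 ns |
| Xor | 1 | 6.763 ns |
| Add | 8 | 55.226 ns |
| Exp | 8 | 86.672 ns |
| Floor | 8 | 45.207 ns |
| Xor | 8 | 7.802 ns |
| Add | 800 | 5,119.294 ns |
| Exp | 800 | 8,246.779 ns |
| Floor | 800 | 4,235.245 ns |
| Xor | 800 | 340.764 ns |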

After:

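| Method | Length | Mean |
| --- | ---: | ---: |
| Add | 1 | 12.286 ns |
| Exp | 1 | 21.634 ns |
| Floor | 1 | 8.812 ns |
| Xor | 1 | 6.544 ns |
| Add | 8 | 23.209 ns |
| Exp | 8 | 29.455 ns |
| Floor | 8 | 17.650 ns |
| Xor | 8 | 6.620 ns |
| Add | 800 | 896.571 ns |
| Exp | 800 | 1,629.144 ns |
| Floor | 800 | 678.427 ns |
| Xor | 800 | 26.408 ns |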

dotnet-policy-service · 2025-06-23T21:32:59Z

Tagging subscribers to this area: @dotnet/area-system-numerics-tensors
See info in area-owners.md if you want to be subscribed.

Copilot

Pull Request Overview

This PR adds Half (16-bit float) vectorization paths for many tensor operations by reinterpreting Half as short and invoking vectorized float operators, falling back to the generic path otherwise.

Key changes:

- Inserted `typeof(T) == typeof(Half)` checks and `Try*HalfAsShort` calls in each public API method to enable vectorized paths for Half.
- Implemented a new TensorPrimitives.Half.cs with a suite of `Try*` helpers and `HalfAsShort` operator wrappers for unary, binary, bitwise, and aggregation scenarios.
- Adjusted Round to use `return` instead of `break` in the AwayFromZero case to ensure consistent control flow.
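
The "reinterpret Half as short, compute in float" idea from the overview can be sketched as follows. This is a scalar illustration under stated assumptions, not the PR's implementation: `AddViaSingle` is a hypothetical name, and the real code widens and narrows whole vectors at a time rather than one element per iteration.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper written for this example (not an API from the PR).
static void AddViaSingle(ReadOnlySpan<Half> x, ReadOnlySpan<Half> y, Span<Half> destination)
{
    if (x.Length != y.Length || destination.Length < x.Length)
    {
        throw new ArgumentException("Span lengths do not match.");
    }

    // Reinterpret the Half data as raw 16-bit integers; no copy is made.
    ReadOnlySpan<short> xBits = MemoryMarshal.Cast<Half, short>(x);
    ReadOnlySpan<short> yBits = MemoryMarshal.Cast<Half, short>(y);
    Span<short> dBits = MemoryMarshal.Cast<Half, short>(destination);

    for (int i = 0; i < xBits.Length; i++)
    {
        // Widen each 16-bit pattern to float, apply the float operator,
        // then narrow the result back to a 16-bit Half pattern.
        float a = (float)BitConverter.Int16BitsToHalf(xBits[i]);
        float b = (float)BitConverter.Int16BitsToHalf(yBits[i]);
        dBits[i] = BitConverter.HalfToInt16Bits((Half)(a + b));
    }
}

Half[] left = { (Half)1.5f, (Half)0.25f };
Half[] right = { (Half)2.25f, (Half)0.5f };
Half[] sum = new Half[2];
AddViaSingle(left, right, sum);
Console.WriteLine(string.Join(", ", sum));
```

Treating the data as `short` lets the existing integer vector machinery load and store it, while the arithmetic itself reuses the float operators. Each element still makes exactly one Half -> float -> Half roundtrip, the same as the scalar Half operator, which would keep element-wise results unchanged.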

Reviewed Changes

Copilot reviewed 53 out of 53 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| TensorPrimitives.Xor.cs, Truncate.cs, Tanh.cs, … Add.cs | Inserted Half-specific vectorization guard and fallback logic |
| TensorPrimitives.Round.cs | Changed `break` to `return` for `MidpointRounding.AwayFromZero` |
| TensorPrimitives.Half.cs | Added `Try*` helpers and `HalfAsShort` operator wrappers |

Comments suppressed due to low confidence (1)

src/libraries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/TensorPrimitives.Abs.cs:29

New Half-vectorized paths are introduced but there are no existing tests validating the vectorized output for lengths ≥ Vector128. Please add unit tests that supply spans of Half with lengths above the SIMD threshold to cover both scalar and vectorized branches.

        public static void Abs<T>(ReadOnlySpan<T> x, Span<T> destination)


stephentoub requested review from tannergooding and Copilot June 23, 2025 21:31

github-actions bot added the area-System.Numerics.Tensors label Jun 23, 2025

dotnet-policy-service bot assigned stephentoub Jun 23, 2025

Copilot AI reviewed Jun 23, 2025

...braries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/TensorPrimitives.Half.cs (outdated)

...ibraries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/TensorPrimitives.Add.cs (outdated)

This was referenced Jun 24, 2025

browser-wasm Windows build error #116746

Open

[iOS/tvOS] System.Runtime.Tests crash with signal 4 #116815

Open

stephentoub force-pushed the vectorizehalf branch from 63fac2b to 979d5f5 Compare June 24, 2025 13:19

tannergooding reviewed Jun 24, 2025

...braries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/TensorPrimitives.Half.cs (outdated)

tannergooding reviewed Jun 24, 2025

...braries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/TensorPrimitives.Half.cs (outdated)

tannergooding approved these changes Jun 24, 2025

stephentoub added 2 commits June 24, 2025 17:52

Enable TensorPrimitive vectorization for Half for lots of methods

98c6c40

Rename AsShort to AsInt16

fe1740e

stephentoub force-pushed the vectorizehalf branch from 979d5f5 to fe1740e Compare June 24, 2025 21:56

stephentoub mentioned this pull request Jun 25, 2025

RegisteredInstallLocation_DotNetInfo_ListOtherArchitectures test failure in CI #117007

Open

stephentoub merged commit f77a7f5 into dotnet:main Jun 25, 2025
79 of 86 checks passed

stephentoub deleted the vectorizehalf branch June 25, 2025 00:59

LoopedBard3 mentioned this pull request Jul 1, 2025

[Perf] Windows/x64: 13 Regressions on 6/25/2025 1:00:18 AM +00:00 #117205

Closed
