Enable TensorPrimitive vectorization for Half for lots of methods #116934

Merged: 2 commits, Jun 25, 2025

@stephentoub stephentoub commented Jun 23, 2025

This enables all of the following operators to be vectorized for T == Half:
Abs, Add, AddMultiply, BitwiseAnd, BitwiseOr, Ceiling, Clamp, CopySign, Cos, CosPi, Cosh, Decrement, DegreesToRadians, Divide, Exp, Exp10, Exp10M1, Exp2, Exp2M1, ExpM1, Floor, FusedAddMultiply, Hypot, Increment, Lerp, Log, Log10, Log10P1, Log2, Log2P1, LogP1, Max, MaxMagnitude, MaxMagnitudeNumber, MaxNumber, Min, MinMagnitude, MinMagnitudeNumber, MinNumber, Multiply, MultiplyAdd, MultiplyAddEstimate, Negate, OnesComplement, Reciprocal, Remainder, Round, Sigmoid, Sin, SinPi, Sinh, Sqrt, Subtract, Tan, TanPi, Tanh, Truncate, Xor.

I don't love that it required adding code to each of those methods, but I couldn't come up with anything better and kept the code per method to a minimum (I left the typeof(T) == typeof(Half) check at each call site, primarily for clarity).

I did not change the reductions, as anything that involved more than the same number of Half <-> float roundtrips as exists today ended up perturbing the results to the point where tests failed, and most of the reductions involve multiple operators.
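To illustrate why the number of Half <-> float roundtrips matters for reductions, here is a minimal sketch (not code from this PR; the class and method names are hypothetical). It sums the same Half values twice: once rounding back to Half after every addition, and once accumulating in float and narrowing only at the end.

```csharp
using System;

// Hypothetical illustration; none of these names come from the PR.
public static class HalfRoundTripSketch
{
    // Rounds back to Half after every addition (one roundtrip per element).
    public static Half SumRoundingEachStep(ReadOnlySpan<Half> values)
    {
        Half acc = (Half)0f;
        foreach (Half v in values)
        {
            acc = (Half)((float)acc + (float)v);
        }
        return acc;
    }

    // Widens each element once, accumulates in float, narrows once at the end.
    public static Half SumRoundingOnce(ReadOnlySpan<Half> values)
    {
        float acc = 0f;
        foreach (Half v in values)
        {
            acc += (float)v;
        }
        return (Half)acc;
    }
}
```

For an input of 4,096 copies of 1.0, the first method returns 2048, because 2049 is not representable in Half and the tie rounds back to 2048, while the second returns 4096. Element-wise operators do not have this problem, since each output element is rounded exactly once either way.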

Sample benchmarks:

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Numerics.Tensors;

BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);

public class Bench
{
    private Half[] _x, _y, _d;

    [Params(1, 8, 800)]
    public int Length { get; set; }

    [GlobalSetup]
    public void Setup()
    {
        _x = new Half[Length];
        _y = new Half[Length];
        _d = new Half[Length];
        var random = new Random(42);
        for (int i = 0; i < Length; i++)
        {
            _x[i] = (Half)random.NextSingle();
            _y[i] = (Half)random.NextSingle();
        }
    }

    [Benchmark]
    public void Add() => TensorPrimitives.Add(_x, _y, _d);

    [Benchmark]
    public void Exp() => TensorPrimitives.Exp(_x, _d);

    [Benchmark]
    public void Floor() => TensorPrimitives.Floor(_x, _d);

    [Benchmark]
    public void Xor() => TensorPrimitives.Xor(_x, _y, _d);
}
```

On my machine with 256-bit AVX (not AVX-512)...

Before:

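| Method | Length | Mean |
|--------|-------:|-------------:|
| Add    | 1      | 12.173 ns    |
| Exp    | 1      | 13.374 ns    |
| Floor  | 1      | 7.554 ns     |
| Xor    | 1      | 6.763 ns     |
| Add    | 8      | 55.226 ns    |
| Exp    | 8      | 86.672 ns    |
| Floor  | 8      | 45.207 ns    |
| Xor    | 8      | 7.802 ns     |
| Add    | 800    | 5,119.294 ns |
| Exp    | 800    | 8,246.779 ns |
| Floor  | 800    | 4,235.245 ns |
| Xor    | 800    | 340.764 ns   |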

After:

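| Method | Length | Mean |
|--------|-------:|-------------:|
| Add    | 1      | 12.286 ns    |
| Exp    | 1      | 21.634 ns    |
| Floor  | 1      | 8.812 ns     |
| Xor    | 1      | 6.544 ns     |
| Add    | 8      | 23.209 ns    |
| Exp    | 8      | 29.455 ns    |
| Floor  | 8      | 17.650 ns    |
| Xor    | 8      | 6.620 ns     |
| Add    | 800    | 896.571 ns   |
| Exp    | 800    | 1,629.144 ns |
| Floor  | 800    | 678.427 ns   |
| Xor    | 800    | 26.408 ns    |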

Tagging subscribers to this area: @dotnet/area-system-numerics-tensors
See info in area-owners.md if you want to be subscribed.

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds Half (16-bit float) vectorization paths for many tensor operations by reinterpreting Half as short and invoking vectorized float operators, falling back to the generic path otherwise.
Key changes:

  • Inserted typeof(T) == typeof(Half) checks and Try*HalfAsShort calls in each public API method to enable vectorized paths for Half.
  • Implemented a new TensorPrimitives.Half.cs with a suite of Try* helpers and HalfAsShort operator wrappers for unary, binary, bitwise, and aggregation scenarios.
  • Adjusted Round to use return instead of break in the AwayFromZero case to ensure consistent control flow.
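The call-site pattern described in the first two bullets can be sketched roughly as follows. This is a simplified, scalar illustration under stated assumptions: `HalfDispatchSketch` and `FloorHalfAsInt16` are hypothetical names, the PR's actual `Try*HalfAsShort` helpers are not reproduced here, and the real implementation processes a vector of elements at a time rather than one element per iteration.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

// Hypothetical sketch; names and structure are assumptions, not the PR's code.
public static class HalfDispatchSketch
{
    public static void Floor<T>(ReadOnlySpan<T> x, Span<T> destination)
        where T : struct, IFloatingPoint<T>
    {
        if (destination.Length < x.Length)
        {
            throw new ArgumentException("Destination is too short.", nameof(destination));
        }

        // The type check sits at the call site; the JIT can resolve it
        // per generic instantiation.
        if (typeof(T) == typeof(Half))
        {
            // Reinterpret the Half data as raw 16-bit integers (no copy).
            FloorHalfAsInt16(
                MemoryMarshal.Cast<T, short>(x),
                MemoryMarshal.Cast<T, short>(destination));
            return;
        }

        // Generic fallback for every other T.
        for (int i = 0; i < x.Length; i++)
        {
            destination[i] = T.Floor(x[i]);
        }
    }

    // Widen each element to float, apply the float operation, narrow back.
    private static void FloorHalfAsInt16(ReadOnlySpan<short> x, Span<short> destination)
    {
        for (int i = 0; i < x.Length; i++)
        {
            float widened = (float)BitConverter.Int16BitsToHalf(x[i]);
            Half narrowed = (Half)MathF.Floor(widened);
            destination[i] = BitConverter.HalfToInt16Bits(narrowed);
        }
    }
}
```

Each element goes through one Half -> float -> Half roundtrip, which matches what the scalar `Half` operators already do, so the results should be unchanged while the float work becomes eligible for SIMD.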

Reviewed Changes

Copilot reviewed 53 out of 53 changed files in this pull request and generated 2 comments.

| File | Description |
|------|-------------|
| TensorPrimitives.Xor.cs, Truncate.cs, Tanh.cs, … Add.cs | Inserted Half-specific vectorization guard and fallback logic |
| TensorPrimitives.Round.cs | Changed break to return for MidpointRounding.AwayFromZero |
| TensorPrimitives.Half.cs | Added Try* helpers and HalfAsShort operator wrappers |

Comments suppressed due to low confidence (1)

src/libraries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/TensorPrimitives.Abs.cs:29

  • New Half-vectorized paths are introduced but there are no existing tests validating the vectorized output for lengths ≥ Vector128. Please add unit tests that supply spans of Half with lengths above the SIMD threshold to cover both scalar and vectorized branches.
        public static void Abs<T>(ReadOnlySpan<T> x, Span<T> destination)
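A check along the lines this suppressed comment asks for might look like the sketch below. It is an assumption-laden illustration rather than a test from the repository: `HalfVectorizationCheck` and `CountFloorMismatches` are hypothetical names, it needs the System.Numerics.Tensors package, and it uses `Floor` as a representative operator. It compares `TensorPrimitives.Floor` against per-element `Half.Floor` bit for bit.

```csharp
using System;
using System.Numerics.Tensors;

// Hypothetical helper; not part of the PR or the dotnet/runtime test suite.
public static class HalfVectorizationCheck
{
    // Returns how many elements differ between the span-based call and
    // the per-element scalar reference for a given input length.
    public static int CountFloorMismatches(int length, int seed = 42)
    {
        var random = new Random(seed);
        Half[] x = new Half[length];
        Half[] actual = new Half[length];

        for (int i = 0; i < length; i++)
        {
            // Values in [-100, 100) so that Floor does real work.
            x[i] = (Half)(random.NextSingle() * 200f - 100f);
        }

        TensorPrimitives.Floor<Half>(x, actual);

        int mismatches = 0;
        for (int i = 0; i < length; i++)
        {
            Half expected = Half.Floor(x[i]);
            if (BitConverter.HalfToInt16Bits(expected) != BitConverter.HalfToInt16Bits(actual[i]))
            {
                mismatches++;
            }
        }
        return mismatches;
    }
}
```

Running it over a range of lengths, for example every length from 0 through 64 plus a few hundred, exercises the scalar path, full vectors, and the remainder handling after the last full vector.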

@stephentoub stephentoub merged commit f77a7f5 into dotnet:main Jun 25, 2025
79 of 86 checks passed
@stephentoub stephentoub deleted the vectorizehalf branch June 25, 2025 00:59