Enable TensorPrimitive vectorization for Half for lots of methods #116934
Tagging subscribers to this area: @dotnet/area-system-numerics-tensors
Pull Request Overview
This PR adds Half (16-bit float) vectorization paths for many tensor operations by reinterpreting Half as short and invoking vectorized float operators, falling back to the generic path otherwise.
Key changes:
- Inserted `typeof(T) == typeof(Half)` checks and `Try*HalfAsShort` calls in each public API method to enable vectorized paths for `Half`.
- Implemented a new `TensorPrimitives.Half.cs` with a suite of `Try*` helpers and `HalfAsShort` operator wrappers for unary, binary, bitwise, and aggregation scenarios.
- Adjusted `Round` to use `return` instead of `break` in the `AwayFromZero` case to ensure consistent control flow.
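The guard-and-fallback shape described in the key changes can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `HalfGuardSketch`, its `Add<T>` method, and the `TryAddHalfAsShort` helper are hypothetical names, and a scalar loop stands in for the vectorized widen-to-float, operate, narrow-back step.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

public static class HalfGuardSketch
{
    // Hypothetical entry point shaped like a TensorPrimitives method.
    public static void Add<T>(ReadOnlySpan<T> x, ReadOnlySpan<T> y, Span<T> destination)
        where T : struct, IAdditionOperators<T, T, T>
    {
        if (x.Length != y.Length || destination.Length < x.Length)
            throw new ArgumentException("Span lengths do not match.");

        // Guard at the call site: only Half attempts the special path.
        if (typeof(T) == typeof(Half) && TryAddHalfAsShort(x, y, destination))
            return;

        // Generic fallback for every other T.
        for (int i = 0; i < x.Length; i++)
            destination[i] = x[i] + y[i];
    }

    // Hypothetical helper: views the Half data as 16-bit integers, then
    // widens each element to float, adds, and narrows back to Half bits.
    private static bool TryAddHalfAsShort<T>(ReadOnlySpan<T> x, ReadOnlySpan<T> y, Span<T> destination)
        where T : struct
    {
        if (typeof(T) != typeof(Half))
            return false;

        ReadOnlySpan<short> xs = MemoryMarshal.Cast<T, short>(x);
        ReadOnlySpan<short> ys = MemoryMarshal.Cast<T, short>(y);
        Span<short> ds = MemoryMarshal.Cast<T, short>(destination);

        // Scalar stand-in for the vectorized loop (assumption for brevity).
        for (int i = 0; i < xs.Length; i++)
        {
            float a = (float)BitConverter.Int16BitsToHalf(xs[i]);
            float b = (float)BitConverter.Int16BitsToHalf(ys[i]);
            ds[i] = BitConverter.HalfToInt16Bits((Half)(a + b));
        }
        return true;
    }

    public static void Main()
    {
        Half[] x = { (Half)1.5f, (Half)2f };
        Half[] y = { (Half)2.25f, (Half)(-0.5f) };
        Half[] sum = new Half[2];
        Add<Half>(x, y, sum);
        Console.WriteLine($"{(float)sum[0]}, {(float)sum[1]}");
    }
}
```

Each Half element makes exactly one trip to float and back, which matches what a scalar `Half` addition already does.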
Reviewed Changes
Copilot reviewed 53 out of 53 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
TensorPrimitives.Xor.cs, Truncate.cs, Tanh.cs, … Add.cs | Inserted Half-specific vectorization guard and fallback logic |
TensorPrimitives.Round.cs | Changed break to return for MidpointRounding.AwayFromZero |
TensorPrimitives.Half.cs | Added Try* helpers and HalfAsShort operator wrappers |
Comments suppressed due to low confidence (1)
src/libraries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/TensorPrimitives.Abs.cs:29
- New Half-vectorized paths are introduced but there are no existing tests validating the vectorized output for lengths ≥ Vector128. Please add unit tests that supply spans of Half with lengths above the SIMD threshold to cover both scalar and vectorized branches.
public static void Abs<T>(ReadOnlySpan<T> x, Span<T> destination)
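A check along the lines the suppressed comment asks for might look like the sketch below. It assumes the System.Numerics.Tensors package is referenced; `HalfAbsLengthCheck` and `CountMismatches` are hypothetical names, and the length range of 1 to 70 is an arbitrary choice meant to straddle the element counts of 128-, 256-, and 512-bit vectors of 16-bit values.

```csharp
using System;
using System.Numerics.Tensors;

public static class HalfAbsLengthCheck
{
    // Compares TensorPrimitives.Abs against scalar Half.Abs for every
    // length from 1 to maxLength and returns the number of mismatches.
    public static int CountMismatches(int maxLength)
    {
        var rng = new Random(42);
        Half[] specials =
        {
            (Half)(-0.0f), Half.NegativeInfinity, Half.PositiveInfinity,
            Half.NaN, Half.MinValue, Half.MaxValue,
        };
        int mismatches = 0;

        for (int length = 1; length <= maxLength; length++)
        {
            Half[] input = new Half[length];
            for (int i = 0; i < length; i++)
                input[i] = (Half)(rng.NextSingle() * 200f - 100f);

            // Put one special value at the tail, where a remainder path would run.
            input[length - 1] = specials[length % specials.Length];

            Half[] actual = new Half[length];
            TensorPrimitives.Abs<Half>(input, actual);

            for (int i = 0; i < length; i++)
            {
                Half expected = Half.Abs(input[i]);
                bool bothNaN = Half.IsNaN(expected) && Half.IsNaN(actual[i]);
                bool sameBits = BitConverter.HalfToInt16Bits(expected)
                             == BitConverter.HalfToInt16Bits(actual[i]);
                if (!bothNaN && !sameBits)
                    mismatches++;
            }
        }
        return mismatches;
    }

    public static void Main()
    {
        Console.WriteLine($"Mismatches: {CountMismatches(70)}");
    }
}
```

Comparing bit patterns rather than values distinguishes `-0` from `+0`; NaN is compared only by category because payload bits may not survive a conversion.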
This enables all of the following operators to be vectorized for T == Half:
Abs, Add, AddMultiply, BitwiseAnd, BitwiseOr, Ceiling, Clamp, CopySign, Cos, CosPi, Cosh, Decrement, DegreesToRadians, Divide, Exp, Exp10, Exp10M1, Exp2, Exp2M1, ExpM1, Floor, FusedAddMultiply, Hypot, Increment, Lerp, Log, Log10, Log10P1, Log2, Log2P1, LogP1, Max, MaxMagnitude, MaxMagnitudeNumber, MaxNumber, Min, MinMagnitude, MinMagnitudeNumber, MinNumber, Multiply, MultiplyAdd, MultiplyAddEstimate, Negate, OnesComplement, Reciprocal, Remainder, Round, Sigmoid, Sin, SinPi, Sinh, Sqrt, Subtract, Tan, TanPi, Tanh, Truncate, Xor.
I don't love that it required adding code to each of those methods, but I couldn't come up with anything better and kept the code per method to a minimum (I left the typeof(T) == typeof(Half) check at each call site, primarily for clarity).
I did not change the reductions: any approach that involved more Half <-> float round trips than exist today perturbed the results enough that tests failed, and most of the reductions involve multiple operators.
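To illustrate why the number of round trips matters for reductions, here is a small sketch. It is not one of the PR's failing tests; `HalfRoundTripSketch` and its two methods are hypothetical, and the input of 4096 ones is chosen only because it makes the difference easy to see.

```csharp
using System;

public static class HalfRoundTripSketch
{
    // Rounds the running total back to Half after every addition.
    public static Half SumRoundingEveryStep(ReadOnlySpan<Half> values)
    {
        Half acc = (Half)0f;
        foreach (Half v in values)
            acc = (Half)((float)acc + (float)v);
        return acc;
    }

    // Keeps the running total in float and rounds to Half once at the end.
    public static Half SumRoundingOnce(ReadOnlySpan<Half> values)
    {
        float acc = 0f;
        foreach (Half v in values)
            acc += (float)v;
        return (Half)acc;
    }

    public static void Main()
    {
        Half[] ones = new Half[4096];
        Array.Fill(ones, (Half)1f);

        Console.WriteLine((float)SumRoundingEveryStep(ones));
        Console.WriteLine((float)SumRoundingOnce(ones));
    }
}
```

Half has an 11-bit significand, so the per-step sum stalls at 2048 (2049 rounds back to 2048 under round-to-nearest-even), while the float accumulator reaches 4096, which Half can represent exactly. Both are defensible answers, but a test written against one will fail against the other.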
Sample benchmarks:
On my machine with 256-bit AVX (not AVX-512)...
Before:
After: