Pre-duplicate kernel values in ResizeKernelMap for faster FMA convolution #1515

Open
grepdemos/ImageSharp
#3
@antonfirsov

Description

@antonfirsov
Member

As @saucecontrol pointed out in his comment, we can get rid of the VPERMPS instructions in the following code:

result256_0 = Fma.MultiplyAdd(
    Unsafe.As<Vector4, Vector256<float>>(ref rowStartRef),
    Avx2.PermuteVar8x32(Vector256.CreateScalarUnsafe(*(double*)bufferStart).AsSingle(), mask),
    result256_0);
result256_1 = Fma.MultiplyAdd(
    Unsafe.As<Vector4, Vector256<float>>(ref Unsafe.Add(ref rowStartRef, 2)),
    Avx2.PermuteVar8x32(Vector256.CreateScalarUnsafe(*(double*)(bufferStart + 2)).AsSingle(), mask),
    result256_1);

If FMA is detected, we should allocate a 4x larger buffer and do the duplication in ResizeKernelMap.Calculate, which should be much cheaper than doing it in every convolution:

public static ResizeKernelMap Calculate<TResampler>(
    in TResampler sampler,
    int destinationSize,
    int sourceSize,
    MemoryAllocator memoryAllocator)
    where TResampler : struct, IResampler
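
As a rough illustration of the duplication step, here is a minimal scalar sketch. It assumes the permute mask in the snippet above broadcasts the first loaded weight to lanes 0-3 and the second to lanes 4-7, so that each weight lines up with the four channels of one Vector4 pixel. `KernelDuplicationSketch` and `DuplicateWeights` are hypothetical names for illustration only, not part of ImageSharp:

```csharp
using System;

public static class KernelDuplicationSketch
{
    // Hypothetical helper (not an ImageSharp API): expands each kernel weight
    // into 4 consecutive copies, one per Vector4 channel, so the result is
    // 4x the size of the input.
    public static float[] DuplicateWeights(ReadOnlySpan<float> weights)
    {
        var duplicated = new float[weights.Length * 4];

        for (int i = 0; i < weights.Length; i++)
        {
            // [w0, w1, ...] becomes [w0, w0, w0, w0, w1, w1, w1, w1, ...]
            duplicated.AsSpan(i * 4, 4).Fill(weights[i]);
        }

        return duplicated;
    }
}
```

With the weights stored this way, 8 consecutive floats already have the layout the permute would otherwise produce for two pixels, so the convolution loop could load them directly instead of permuting on every iteration. The cost is a 4x larger kernel buffer, paid once when the kernel map is calculated.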

Activity

added this to the Future milestone on Jan 21, 2021
changed the title from "Pre-duplicate kernels in ResizeKernelMap for faster FMA convolution" to "Pre-duplicate kernel values in ResizeKernelMap for faster FMA convolution" on Jan 21, 2021
