Pre-duplicate kernel values in ResizeKernelMap for faster FMA convolution #1515

Open
grepdemos/ImageSharp
#3
@antonfirsov

As @saucecontrol pointed out in his comment, we can get rid of the VPERMPS instructions (the Avx2.PermuteVar8x32 calls) in the following code:

    result256_0 = Fma.MultiplyAdd(
        Unsafe.As<Vector4, Vector256<float>>(ref rowStartRef),
        Avx2.PermuteVar8x32(Vector256.CreateScalarUnsafe(*(double*)bufferStart).AsSingle(), mask),
        result256_0);
    result256_1 = Fma.MultiplyAdd(
        Unsafe.As<Vector4, Vector256<float>>(ref Unsafe.Add(ref rowStartRef, 2)),
        Avx2.PermuteVar8x32(Vector256.CreateScalarUnsafe(*(double*)(bufferStart + 2)).AsSingle(), mask),
        result256_1);

If FMA is detected, we should allocate a 4x larger buffer and do the duplication in ResizeKernelMap.Calculate, which should be much cheaper than doing it in every convolution:

    public static ResizeKernelMap Calculate<TResampler>(
        in TResampler sampler,
        int destinationSize,
        int sourceSize,
        MemoryAllocator memoryAllocator)
        where TResampler : struct, IResampler
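
To make the proposal concrete, here is a minimal sketch of the duplication step, assuming each kernel weight is multiplied against one Vector4 pixel (four floats), so two weights repeated four times each fill one Vector256<float>. The class and method names (KernelDuplication, Duplicate4x) are hypothetical and not part of ImageSharp; the real change would presumably write into the buffer that ResizeKernelMap already allocates rather than return a new array.

```csharp
using System;

// Hypothetical helper for illustration only; not an ImageSharp API.
public static class KernelDuplication
{
    // Expands [w0, w1, ...] into [w0, w0, w0, w0, w1, w1, w1, w1, ...].
    // The result is 4x the input length, matching the "4x buffer" above.
    public static float[] Duplicate4x(ReadOnlySpan<float> weights)
    {
        var expanded = new float[weights.Length * 4];

        for (int i = 0; i < weights.Length; i++)
        {
            // Four consecutive copies line up with the X, Y, Z, W lanes of one Vector4.
            expanded.AsSpan(i * 4, 4).Fill(weights[i]);
        }

        return expanded;
    }
}
```

With the weights laid out this way, the convolution loop could in principle load eight consecutive floats from the kernel buffer straight into a Vector256<float> and pass them to Fma.MultiplyAdd, with no permute per iteration. The trade-off is that the kernel map uses four times as much memory when the FMA path is active.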
