Pre-duplicate kernel values in ResizeKernelMap for faster FMA convolution

As @saucecontrol pointed out in his [comment](https://github.com/SixLabors/ImageSharp/pull/1513/files#r561177884), we can get rid of `VPERMPS` in the following code:

https://github.com/SixLabors/ImageSharp/blob/e2211c316daab3ae59eb85fbc189288849eb54d2/src/ImageSharp/Processing/Processors/Transforms/Resize/ResizeKernel.cs#L104-L112

If FMA is detected, we should allocate a 4x larger buffer and do the duplication in `ResizeKernelMap.Calculate`, which should be much cheaper than doing it in every convolution:

https://github.com/SixLabors/ImageSharp/blob/e2211c316daab3ae59eb85fbc189288849eb54d2/src/ImageSharp/Processing/Processors/Transforms/Resize/ResizeKernelMap.cs#L115-L120

	result256_0 = Fma.MultiplyAdd(
		Unsafe.As<Vector4, Vector256<float>>(ref rowStartRef),
		Avx2.PermuteVar8x32(Vector256.CreateScalarUnsafe(*(double*)bufferStart).AsSingle(), mask),
		result256_0);

	result256_1 = Fma.MultiplyAdd(
		Unsafe.As<Vector4, Vector256<float>>(ref Unsafe.Add(ref rowStartRef, 2)),
		Avx2.PermuteVar8x32(Vector256.CreateScalarUnsafe(*(double*)(bufferStart + 2)).AsSingle(), mask),
		result256_1);
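For illustration, here is a rough sketch of what the inner loop could look like once the kernel buffer already holds every weight repeated four times. This is not ImageSharp code: the class `PreDuplicatedConvolveSketch`, the method `Convolve`, and the `duplicatedWeights` layout (4 floats per source pixel) are assumptions made for the example. The point is that the weights can be loaded straight from memory as a `Vector256<float>`, with no permute.

```csharp
using System;
using System.Numerics;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class PreDuplicatedConvolveSketch
{
    // Hypothetical helper. Assumes duplicatedWeights[4 * i .. 4 * i + 3]
    // all hold the kernel weight for row[i].
    public static Vector4 Convolve(ReadOnlySpan<Vector4> row, ReadOnlySpan<float> duplicatedWeights)
    {
        int count = row.Length;
        if (duplicatedWeights.Length < count * 4)
        {
            throw new ArgumentException("Expected 4 floats per pixel.", nameof(duplicatedWeights));
        }

        ref Vector4 rowRef = ref MemoryMarshal.GetReference(row);
        ref float weightRef = ref MemoryMarshal.GetReference(duplicatedWeights);
        Vector4 result = Vector4.Zero;
        int i = 0;

        if (Fma.IsSupported)
        {
            Vector256<float> acc = Vector256<float>.Zero;

            // Two pixels (8 floats) per iteration; the matching 8 weights
            // are read directly from the pre-duplicated buffer.
            for (; i <= count - 2; i += 2)
            {
                Vector256<float> pixels = Unsafe.ReadUnaligned<Vector256<float>>(
                    ref Unsafe.As<Vector4, byte>(ref Unsafe.Add(ref rowRef, i)));
                Vector256<float> weights = Unsafe.ReadUnaligned<Vector256<float>>(
                    ref Unsafe.As<float, byte>(ref Unsafe.Add(ref weightRef, i * 4)));
                acc = Fma.MultiplyAdd(pixels, weights, acc);
            }

            // Fold the two 128-bit halves into a single Vector4.
            result = Sse.Add(acc.GetLower(), acc.GetUpper()).AsVector4();
        }

        // Scalar tail (and full fallback when FMA is unavailable).
        for (; i < count; i++)
        {
            result += row[i] * duplicatedWeights[i * 4];
        }

        return result;
    }
}
```

The trade-off is memory: the kernel buffer becomes four times larger, in exchange for removing one shuffle per FMA in the hot loop.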

	public static ResizeKernelMap Calculate<TResampler>(
		in TResampler sampler,
		int destinationSize,
		int sourceSize,
		MemoryAllocator memoryAllocator)
		where TResampler : struct, IResampler
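The duplication step itself is simple. A minimal sketch, assuming a hypothetical helper named `DuplicateWeights` (not an existing ImageSharp API) that expands each kernel weight `w` into `w, w, w, w` in a buffer four times the original size:

```csharp
using System;

public static class KernelDuplicationSketch
{
    // Hypothetical helper: returns a buffer of length 4 * weights.Length
    // where every weight is repeated four times in a row, so that one
    // Vector256<float> load covers the weights for two RGBA pixels.
    public static float[] DuplicateWeights(ReadOnlySpan<float> weights)
    {
        var result = new float[weights.Length * 4];

        for (int i = 0; i < weights.Length; i++)
        {
            result.AsSpan(i * 4, 4).Fill(weights[i]);
        }

        return result;
    }
}
```

In the real code this would presumably run once per kernel inside `ResizeKernelMap.Calculate`, write into memory obtained from the `MemoryAllocator`, and only be enabled when `Fma.IsSupported` is true.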

Pre-duplicate kernel values in ResizeKernelMap for faster FMA convolution #1515