Heavily Optimized std.mem.eql with SIMD
#18389
Merged
Conversation
Co-authored-by: Protty <45520026+kprotty@users.noreply.github.com>
This allows stage2 to make use of the optimized eql.
squeek502 reviewed on Dec 28, 2023
Co-authored-by: Ryan Liptak <squeek502@hotmail.com>
Instead of just checking for a power of 2.
Vexu reviewed on Dec 28, 2023
karlseguin reviewed on Dec 28, 2023
d24b412 to 3325f52
No point in having it public, as it creates a duplicate way of doing something; this conflicts with the Zen.
sno2 reviewed on Dec 29, 2023
rootbeer reviewed on Dec 29, 2023
Vexu approved these changes on Dec 30, 2023
Member: Great work! I love to see this kind of collaboration.
rwsalie pushed a commit to rwsalie/zig that referenced this pull request on Jan 27, 2024:

* optimized memeql
* add `sched_setaffinity` to `std.os.linux`

Co-authored-by: Protty <45520026+kprotty@users.noreply.github.com>
Co-authored-by: Ryan Liptak <squeek502@hotmail.com>
The current stdlib implementation is not as good as it can be for `u8`, which is by far the most commonly used element type. @kprotty and I have taken on the task of vectorizing and accelerating this function by orders of magnitude. The idea here is to split this API into two functions: the normal `eql`, which works on anything, and the much more optimized `eqlBytes`, which specifically optimizes byte slices. `eql` calls `eqlBytes` if `T` is `u8`. For any other integers, we still use a much faster XOR-accumulator design; see benchmarks below.
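For illustration, here is a minimal sketch of that split, assuming hypothetical names `eqlSketch` and `eqlBytesSketch` and a fixed 16-byte vector width (the real `eqlBytes` picks the width from the target and handles small inputs and tails more carefully):

```zig
const std = @import("std");

// Hypothetical sketch of the split described above; `eqlSketch` and
// `eqlBytesSketch` are illustrative names, not the actual stdlib code.
// This sketch only handles integer element types.
fn eqlSketch(comptime T: type, a: []const T, b: []const T) bool {
    // Byte slices get the dedicated SIMD-friendly path.
    if (T == u8) return eqlBytesSketch(a, b);
    if (a.len != b.len) return false;
    // XOR-accumulator style loop for other integers: OR together the XOR
    // of each pair so the loop body stays branch-free, then compare once.
    var acc: T = 0;
    for (a, b) |x, y| acc |= x ^ y;
    return acc == 0;
}

// Compares one vector-sized chunk at a time; the real `eqlBytes` handles
// small inputs and tails more carefully than this.
fn eqlBytesSketch(a: []const u8, b: []const u8) bool {
    if (a.len != b.len) return false;
    const vec_len = 16; // assumed width for the sketch
    const V = @Vector(vec_len, u8);
    var i: usize = 0;
    while (i + vec_len <= a.len) : (i += vec_len) {
        const va: V = a[i..][0..vec_len].*;
        const vb: V = b[i..][0..vec_len].*;
        if (!@reduce(.And, va == vb)) return false;
    }
    // Scalar tail for the remaining bytes.
    while (i < a.len) : (i += 1) {
        if (a[i] != b[i]) return false;
    }
    return true;
}

test "sketch matches std.mem.eql" {
    try std.testing.expect(eqlSketch(u8, "same bytes!", "same bytes!"));
    try std.testing.expect(!eqlSketch(u8, "same bytes!", "same bytes?"));
    try std.testing.expect(eqlSketch(u32, &[_]u32{ 1, 2, 3 }, &[_]u32{ 1, 2, 3 }));
    try std.testing.expect(!eqlSketch(u32, &[_]u32{ 1, 2, 3 }, &[_]u32{ 1, 2, 4 }));
}
```

The point of the XOR-accumulator loop is that the body contains no per-element branch, so the compiler is free to vectorize it; only the final `acc == 0` check branches.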
Benchmarks
The benchmarks are run with as good a script as I was able to make, including using CPU counters, pinning the process to one core, and clearing caches between different function runs. Warmups happen for each function, as well as at least 1000 runs.
The CPU the benchmarks were run on:
Important to note that the CPU does have `constant_tsc` enabled.

There are 3 benchmarks for the 3 categories: `start`, `middle`, `same`.

- `start` is when the difference is at the first byte. This is meant to show the start-up cost, and how fast it exits.
- `middle` shows a sort of "middle" case. Is it able to find the difference faster in smaller inputs? Does it have an early exit?
- `same` is the worst-case scenario. Here the entire input must be checked, and it is actually the most common input: people will usually use equality expecting it to be true rather than false.

Run with:
```
zig build-exe benchmark.zig -lc -OReleaseFast
```
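As a rough idea of what such a harness looks like, here is a simplified sketch that constructs the three input cases and times `std.mem.eql` with `std.time.Timer`; the actual script additionally uses CPU performance counters, core pinning, and cache clearing, so treat this only as an illustration:

```zig
const std = @import("std");

// Illustrative harness only: builds the `start`, `middle`, and `same`
// inputs described above and times std.mem.eql after a warmup pass. The
// real benchmark pins the process to one core, clears caches between
// runs, and reads hardware cycle counters instead of a wall clock.
pub fn main() !void {
    const len = 4096;
    var a: [len]u8 = undefined;
    var b: [len]u8 = undefined;
    @memset(&a, 0xAA);
    @memset(&b, 0xAA);

    // `same`: identical buffers, so the whole input has to be scanned.
    try bench("same", &a, &b);

    // `middle`: first difference roughly in the middle of the input.
    b[len / 2] = 0x55;
    try bench("middle", &a, &b);

    // `start`: difference at the very first byte, measuring startup cost.
    b[0] = 0x55;
    try bench("start", &a, &b);
}

fn bench(name: []const u8, a: []const u8, b: []const u8) !void {
    const iterations = 1000;
    // Warmup so the timed loop does not pay cold-cache and branch-training costs.
    for (0..iterations) |_| std.mem.doNotOptimizeAway(std.mem.eql(u8, a, b));

    var timer = try std.time.Timer.start();
    for (0..iterations) |_| std.mem.doNotOptimizeAway(std.mem.eql(u8, a, b));
    const ns_per_iter = timer.read() / iterations;
    std.debug.print("{s}: {d} ns/iter\n", .{ name, ns_per_iter });
}
```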
The `start` on both versions is measured at under 100 cycles, which I consider to be getting close to the error margin. At this level there isn't any point in comparing them; with both running this fast there will be virtually no performance difference for the user.

Another benchmark I tried was running the `perf_test.zig` inside of `lib/std/zig`. I felt that the parser and tokenizer would be a good spot with many `mem.eql` usages.

parsing speed: 117.96MiB/s (before)
parsing speed: 124.25MiB/s (after)

That is a solid 6% speed increase on average. Note that I upped the iteration count from 100 to 10000 to get the most precise readings.
I also added `sched_setaffinity`, as it seems `getaffinity` was present but there was no `setaffinity`, and I needed it for the benchmarking.
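Pinning the process for the benchmark then looks roughly like the following; the exact signature of the new `std.os.linux.sched_setaffinity` wrapper is an assumption in this sketch, so check the source before relying on it:

```zig
const std = @import("std");
const linux = std.os.linux;

// Hedged sketch of pinning the benchmark process to CPU 0. The wrapper's
// exact signature is assumed here; see lib/std/os/linux.zig for the real one.
pub fn pinToCpu0() !void {
    var set: linux.cpu_set_t = std.mem.zeroes(linux.cpu_set_t);
    set[0] = 1; // lowest bit selects CPU 0
    try linux.sched_setaffinity(0, &set); // pid 0 means the calling process
}
```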