@jedisct1
Contributor

This is a rewrite of the BLAKE3 implementation, with vectorization.

On Apple Silicon, the new implementation is about twice as fast as the previous one.

With AVX2, it is more than 4 times faster.

With AVX512, it is more than 7.5 times faster than the previous implementation (from 678 MB/s to 5086 MB/s).

@alexrp
Member

alexrp commented Oct 14, 2025

If you don't mind, I suggest also opening a PR on Codeberg just for CI purposes; we have more targets covered there.

(I had to remove the RISC-V runners from GitHub because the runner binary is built and installed by hand, i.e. no auto-updates, and GitHub will just refuse to let a runner connect once its version is old enough, even if nothing is actually wrong with it.)

@rpkak
Contributor

rpkak commented Oct 14, 2025

#15375

@jedisct1 jedisct1 merged commit 6669885 into ziglang:master Oct 15, 2025
9 checks passed
@jedisct1 jedisct1 deleted the blake3 branch November 1, 2025 06:55

3 participants