New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Sha2 functions #687

Merged
merged 3 commits into from Jan 14, 2018

Conversation

Projects
None yet
2 participants
@tiehuis
Member

tiehuis commented Jan 13, 2018

We take the fastest time measurement taken across multiple runs. Tested
across multiple compiler flags and the best chosen.

Cpu: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz
Gcc: 7.2.1 20171224
Clang: 5.0.1
Zig: 0.1.1.304f6f1d

See https://www.nayuki.io/page/fast-sha2-hashes-in-x86-assembly.

Gcc -O2
    219 Mb/s
Clang -O2
    213 Mb/s
Zig --release-fast
    284 Mb/s
Zig --release-safe
    211 Mb/s
Zig
    6 Mb/s
Gcc -O2
    350 Mb/s
Clang -O2
    354 Mb/s
Zig --release-fast
    426 Mb/s
Zig --release-safe
    300 Mb/s
Zig
    11 Mb/s

release-safe is a bit slower here. Haven't delved into the assembly to see why yet.

tiehuis added some commits Jan 13, 2018

Add Sha2 functions
We take the fastest time measurement taken across multiple runs. Tested
across multiple compiler flags and the best chosen.

```
Cpu: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz
Gcc: 7.2.1 20171224
Clang: 5.0.1
Zig: 0.1.1.304f6f1d
```

See https://www.nayuki.io/page/fast-sha2-hashes-in-x86-assembly.

```
Gcc -O2
    219 Mb/s
Clang -O2
    213 Mb/s
Zig --release-fast
    284 Mb/s
Zig --release-safe
    211 Mb/s
Zig
    6 Mb/s
```

```
Gcc -O2
    350 Mb/s
Clang -O2
    354 Mb/s
Zig --release-fast
    426 Mb/s
Zig --release-safe
    300 Mb/s
Zig
    11 Mb/s
```
@tiehuis

This comment has been minimized.

Member

tiehuis commented Jan 13, 2018

Seems like an i386 issue, probably assuming a variable width incorrectly somewhere. Can reproduce the same issue with wine. Getting late here, so will fix this up tomorrow.

@andrewrk

This comment has been minimized.

Member

andrewrk commented Jan 13, 2018

This looks like #537.

Feel free to disable the test for 32 bit windows. There are enough instances of this that we should probably remove 32 bit windows support from the readme and remove it from the test matrix, until we can pass all the tests.

@andrewrk andrewrk merged commit e7e7625 into master Jan 14, 2018

4 checks passed

continuous-integration/appveyor/branch AppVeyor build succeeded
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details

@tiehuis tiehuis deleted the sha2 branch Jan 17, 2018

@andrewrk

This comment has been minimized.

Member

andrewrk commented Mar 19, 2018

@tiehuis
I'm trying to reproduce your findings here. I see that the nayuki page has a test harness for the C implementations. What did you do to come up with the zig numbers?

@andrewrk

This comment has been minimized.

Member

andrewrk commented Mar 19, 2018

Ah, I found std/crypto/throughput_test.zig

@andrewrk

This comment has been minimized.

Member

andrewrk commented Mar 19, 2018

I looked into this a little bit, and I noticed that the throughput test in zig was getting LTO (in the sense that we emit only a single LLVM module/ .o file) while the C benchmark had to make function calls across .o files. So I copied the sha256.c function into the test .c file and made all the functions static. This actually did not change the timings. I found that your zig implementation of sha256 generates 14% faster machine code.

I looked at a comparison of the assembly and it's hard to tell exactly what is different, but it appears that the zig implementation has slightly better instruction selection.

@andrewrk

This comment has been minimized.

Member

andrewrk commented Mar 19, 2018

Your zig sha256 implementation is faster than the hand rolled assembly from the nayuki page.

  • nayuki hand rolled x86_64 assembly sha256: 192.6 MB/s
  • zig stdlib sha256 throughput test: 204 MB/s
@tiehuis

This comment has been minimized.

Member

tiehuis commented Mar 19, 2018

The initial tests here I actually exported the zig functions with c bindings so the call overhead was the same.

@andrewrk

This comment has been minimized.

Member

andrewrk commented Mar 20, 2018

I think I know what's going on. If you look at https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/sha-256-implementations-paper.pdf there's a section called "Optimizations with rorx". I believe that LLVM is able to come up with these optimizations with the zig implementation, but it somehow does not discover them with the clang implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment