
What tools do you use for benchmarking? #326

Closed
bjne opened this issue Mar 8, 2019 · 7 comments

bjne commented Mar 8, 2019

Today I ran some benchmarks with jsnell/zlib-bench, using the following config:

my @versions = (
    { id => 'baseline', repository => 'https://github.com/madler/zlib.git', commit_or_branch => 'cacf7f1' },
    { id => 'cloudflare', repository => 'https://github.com/cloudflare/zlib.git', commit_or_branch => 'e55212b' },
    { id => 'intel', repository => 'https://github.com/jtkukunas/zlib.git', commit_or_branch => '4b0ef4e'},
    { id => 'zlib-ng', repository => 'https://github.com/zlib-ng/zlib-ng.git', commit_or_branch => '3ee7109', CONFIGURE_FLAGS => '--zlib-compat'},
);

None of the results favor zlib-ng, even though, as far as I know, it uses patches from both the Intel and Cloudflare forks.
Am I missing something here? Why does Intel outperform everyone on decompression?
What does Cloudflare do at level 9?

results:

                     baseline               cloudflare             intel                  zlib-ng

 dec exec  time [s]:  1.78 ± 0.00 (100.00%)  1.52 ± 0.00 ( 85.74%)  1.37 ± 0.00 ( 77.07%)  1.56 ± 0.00 ( 87.75%)
 dec html  time [s]:  1.02 ± 0.00 (100.00%)  0.87 ± 0.00 ( 85.42%)  0.68 ± 0.00 ( 67.01%)  0.90 ± 0.00 ( 88.32%)
 dec jpeg  time [s]:  0.24 ± 0.00 (100.00%)  0.13 ± 0.00 ( 52.57%)  0.14 ± 0.00 ( 60.33%)  0.24 ± 0.00 (100.84%)
 dec pngp  time [s]:  1.12 ± 0.00 (100.00%)  0.86 ± 0.00 ( 76.44%)  0.74 ± 0.00 ( 66.07%)  0.89 ± 0.00 ( 79.71%)

compress executable -5 (2 iterations)
 exec 0.33 time [s]:  0.47 ± 0.00 (100.00%)  0.25 ± 0.00 ( 53.22%)  0.26 ± 0.00 ( 55.07%)  0.27 ± 0.00 ( 58.73%)
 html 0.33 time [s]:  0.28 ± 0.00 (100.00%)  0.16 ± 0.00 ( 55.74%)  0.15 ± 0.00 ( 52.17%)  0.18 ± 0.00 ( 64.24%)
 jpeg 1.00 time [s]:  0.20 ± 0.00 (100.00%)  0.14 ± 0.00 ( 71.31%)  0.19 ± 0.00 ( 96.80%)  0.20 ± 0.00 (102.78%)
 pngp 0.14 time [s]:  0.35 ± 0.00 (100.00%)  0.16 ± 0.00 ( 46.29%)  0.19 ± 0.00 ( 55.16%)  0.20 ± 0.00 ( 55.36%)

compress executable -9 (2 iterations)
 exec 0.33 time [s]:  2.55 ± 0.00 (100.00%)  1.09 ± 0.00 ( 42.72%)  2.00 ± 0.00 ( 78.40%)  1.97 ± 0.00 ( 77.23%)
 html 0.33 time [s]:  0.77 ± 0.00 (100.00%)  0.43 ± 0.00 ( 56.55%)  0.68 ± 0.00 ( 88.47%)  0.68 ± 0.00 ( 88.42%)
 jpeg 1.00 time [s]:  0.19 ± 0.00 (100.00%)  0.14 ± 0.00 ( 73.97%)  0.14 ± 0.00 ( 73.81%)  0.16 ± 0.00 ( 83.79%)
 pngp 0.12 time [s]:  7.02 ± 0.00 (100.00%)  3.85 ± 0.00 ( 54.89%)  5.74 ± 0.00 ( 81.77%)  5.15 ± 0.00 ( 73.38%)
Dead2 (Member) commented Mar 8, 2019

Both the Intel and Cloudflare forks make assumptions that we do not dare to make. We have fixed a lot of corner-case bugs and regressions in code from both sources. Also, we do not have all of the latest Cloudflare patches, since they are complicated to port over and in some cases make assumptions we cannot live with.

zlib-ng is compatible with multiple architectures and contains optimizations for multiple instruction-set combinations. Our target, above all, is that the code should be safe, even if that means we cannot use some of the optimizations others do. I see from your benchmarks that we are at least doing pretty well compared to stock zlib.

PR #310 should improve our decompression numbers quite a bit, at least if you test on an x86 platform.

We are currently trying to get to the point where we can release a stable 1.0 version of a quality that lets it be used by default in programs and distros. After that, we can once again start looking at optimization opportunities.

We have been talking about making a couple of our benchmarking utils available on GitHub, so that will probably happen pretty soon, once we get rid of some hard-coded assumptions that might only be valid on our own developer workstations.

I am using a home-made Python script, while @sebpop, I believe, is using the Chromium zlib benchmarking suite.
jsnell's benchmark is pretty good.
The Squash benchmark is not so good, since it is unable to run the configure/cmake scripts and thus compiles the code with all optimizations disabled, so the measured performance is almost identical to stock zlib.

sebpop (Contributor) commented Mar 8, 2019

> Why does Intel outperform everyone on decompression?

The problem with all of those benchmarks (jsnell/zlib-bench, the #261 Chromium zlib benchmark) is that they compare decompression speeds on different representations of the compressed data. Each benchmark takes some raw data as input, compresses it, and then decompresses that representation. The reported decompression numbers cannot be compared when the inputs come from different compress implementations, because the representation of the compressed data differs.

@bmrzycki fixed the Chromium benchmark so that its decompression measurements are immune to this issue: the new decompress benchmark collects files compressed at different compression levels and by different implementations of zlib compress. With identical input files, one can compare the speed of different decompress implementations.
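
A minimal sketch of the methodology described above, assuming only CPython's bundled zlib: the (level, strategy) pairs here are illustrative stand-ins for the different forks, since only one zlib implementation can be linked into a single Python process. The point is the shape of the fix, not the absolute numbers: every compressor contributes to one pooled corpus, and every decompressor is timed on that identical corpus.

```python
import time
import zlib

# Placeholder corpus; a real run would use the exec/html/jpeg/png test data.
raw = bytes(range(256)) * 4096

def deflate(data, level, strategy):
    """Compress with an explicit level/strategy pair (one 'implementation')."""
    c = zlib.compressobj(level, zlib.DEFLATED, 15, 8, strategy)
    return c.compress(data) + c.flush()

# Step 1: every compressor contributes files to a shared corpus.
corpus = [
    deflate(raw, level, strategy)
    for level in (1, 5, 9)
    for strategy in (zlib.Z_DEFAULT_STRATEGY, zlib.Z_FILTERED, zlib.Z_RLE)
]

# Step 2: each decompressor is timed on the *same* pooled corpus, so the
# numbers are comparable across implementations.
start = time.perf_counter()
for blob in corpus:
    zlib.decompress(blob)
print(f"decompressed {len(corpus)} blobs in {time.perf_counter() - start:.3f}s")
```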

bjne (Author) commented Mar 8, 2019

> Why does Intel outperform everyone on decompression?
>
> The problem with all of those benchmarks (jsnell/zlib-bench, the #261 Chromium zlib benchmark) is that they compare decompression speeds on different representations of the compressed data. Each benchmark takes some raw data as input, compresses it, and then decompresses that representation. The reported decompression numbers cannot be compared when the inputs come from different compress implementations, because the representation of the compressed data differs.

I think this sounds like a bad excuse, and I don't really see the relevance. The compressed size is equal for all implementations (the difference is insignificant).

> @bmrzycki fixed the Chromium benchmark so that its decompression measurements are immune to this issue: the new decompress benchmark collects files compressed at different compression levels and by different implementations of zlib compress. With identical input files, one can compare the speed of different decompress implementations.

...which is not always what one would want.

sebpop (Contributor) commented Mar 8, 2019

> I don't really see the relevance. The compressed size is equal for all implementations (the difference is insignificant).

I will give you an example where a change in the compressed format can impact decompression speed:
I can add a filter to compress that discards all longest_match strings of length 3. In the compressed file you will then see more redundant strings of lengths 1, 2, 4, and probably longer strings as well. This change seems innocuous. Now consider that, for strings of length 3, ARM implements memset with a vld3, while x86 has no equivalent, as in 8e7f11c.
Comparing that encoder with one that lacks the filter, you will see no big difference in compressed file size, but you will see a big difference in performance when decompressing the files produced by the two encoders on ARM versus x86.

This is only one of many examples showing that the representation of the compressed file matters when comparing different decompress implementations.
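
To make the "same data, different representation" point concrete, here is a minimal sketch using only CPython's bundled zlib: Z_FILTERED, which biases the encoder toward literals instead of short matches, stands in for the hypothetical length-3 filter. Both streams decode to the same raw data, but the decoder consumes a different token mix in each case, which is exactly why decompression timings are only comparable on identical inputs.

```python
import zlib

# Semi-structured input: ascending decimal counters, full of short repeats.
raw = b"".join(b"%08d" % i for i in range(200_000))

def deflate(data, strategy):
    c = zlib.compressobj(6, zlib.DEFLATED, 15, 8, strategy)
    return c.compress(data) + c.flush()

default_blob = deflate(raw, zlib.Z_DEFAULT_STRATEGY)
filtered_blob = deflate(raw, zlib.Z_FILTERED)

print("default :", len(default_blob), "bytes")
print("filtered:", len(filtered_blob), "bytes")
print("identical streams?", default_blob == filtered_blob)  # False

# Both are valid deflate streams for the same raw data, yet a decoder does
# different work on each one.
assert zlib.decompress(default_blob) == raw
assert zlib.decompress(filtered_blob) == raw
```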

neurolabusc commented
Related to this issue, I have provided a dataset that shows a dramatic difference between CloudFlare and zlib-ng. The GZ format is popular in my field for compressing brain images in the NIfTI format. These are often huge datasets, acquired as 16-bit integers with poor signal-to-noise. It is common to zero the signal outside the brain and to conduct computations on 32-bit floating-point data, so these files have long runs of repeated values (e.g. all voxels outside the brain are zero). These images appear to compress much faster with CloudFlare. I would be very interested if anyone has any idea whether zlib-ng can be updated to be more competitive (I think this would have a real impact in my field).
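
For anyone without NIfTI data at hand, here is a rough sketch of this workload's shape, again using CPython's bundled zlib (the strategies stand in for the forks' different match-finding choices, not for the forks themselves): a buffer of long zero runs punctuated by small blocks of noisy signal, where a run-oriented strategy such as Z_RLE typically compresses much faster than the default match search.

```python
import os
import time
import zlib

# ~17 MB of long zero runs ("outside the brain") punctuated by
# incompressible 512-byte blocks of signal ("in-brain" voxels).
raw = b"".join(b"\x00" * 8192 + os.urandom(512) for _ in range(2048))

for name, strategy in (("default", zlib.Z_DEFAULT_STRATEGY),
                       ("Z_RLE", zlib.Z_RLE)):
    c = zlib.compressobj(6, zlib.DEFLATED, 15, 8, strategy)
    t = time.perf_counter()
    blob = c.compress(raw) + c.flush()
    print(f"{name:8s} {time.perf_counter() - t:.3f}s -> {len(blob):,} bytes")
```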

I am very aware that the zlib-ng team is focused on robustness over all-out performance. The cmake files in zlib-ng provide a lot of clever functions, and the benefits of zlib-ng relative to standard zlib are substantial. I therefore completely expect people to defer investigating this issue until after the 1.0 release.

As a minor aside, I do think the zlib-ng configure script could use a couple of tweaks. First, it specifies -O2, while the baseline and CloudFlare use -O3. I think the general consensus is that -O3 is now very robust and gives the compiler a chance to do some nice optimizations. In my brief testing this had no impact on my benchmarks, but given that the reference zlib uses it, I think it is a safe choice. Second, the configure script builds several tools with "-O2 -O2"; the duplication has no effect, but for clarity I would specify the optimization level only once.

neurolabusc commented Feb 25, 2020

@sebpop my pigz-bench attempts to address your concern regarding decompression tests. Each zlib variant contributes compressed files to the corpus, and each variant then decompresses ALL of these files, so every tool decompresses the same files. At the moment, on x86-64, CloudFlare does have superior compression performance, perhaps reflecting the aggressive optimizations that make the assumptions @Dead2 refers to (though I have not detected any issues). On the other hand, zlib-ng does show superior decompression on x86-64. The benchmark shows how pigz can be compiled against each variant, allowing users to choose the variant that suits their taste.

sebpop (Contributor) commented Feb 25, 2020

Thanks @neurolabusc for addressing the input data issue in your decompression benchmark. I will use it in my experiments!

@Dead2 Dead2 closed this as completed Feb 25, 2021
@zlib-ng zlib-ng locked and limited conversation to collaborators Feb 25, 2021

This issue was moved to a discussion.

You can continue the conversation there.
