
What tools do you use for benchmarking? #326

Closed
bjne opened this issue Mar 8, 2019 · 7 comments

bjne commented Mar 8, 2019

Today I ran some benchmarks with jsnell/zlib-bench, using the following config:

my @versions = (
    { id => 'baseline', repository => 'https://github.com/madler/zlib.git', commit_or_branch => 'cacf7f1' },
    { id => 'cloudflare', repository => 'https://github.com/cloudflare/zlib.git', commit_or_branch => 'e55212b' },
    { id => 'intel', repository => 'https://github.com/jtkukunas/zlib.git', commit_or_branch => '4b0ef4e'},
    { id => 'zlib-ng', repository => 'https://github.com/zlib-ng/zlib-ng.git', commit_or_branch => '3ee7109', CONFIGURE_FLAGS => '--zlib-compat'},
);

None of the results favor zlib-ng, even though, as far as I know, it uses patches from both the Intel and Cloudflare forks.
Am I missing something here? Why does Intel outperform everyone on decompression?
What does Cloudflare do at level 9?

results:

                     baseline               cloudflare             intel                  zlib-ng

 dec exec  time [s]:  1.78 ± 0.00 (100.00%)  1.52 ± 0.00 ( 85.74%)  1.37 ± 0.00 ( 77.07%)  1.56 ± 0.00 ( 87.75%)
 dec html  time [s]:  1.02 ± 0.00 (100.00%)  0.87 ± 0.00 ( 85.42%)  0.68 ± 0.00 ( 67.01%)  0.90 ± 0.00 ( 88.32%)
 dec jpeg  time [s]:  0.24 ± 0.00 (100.00%)  0.13 ± 0.00 ( 52.57%)  0.14 ± 0.00 ( 60.33%)  0.24 ± 0.00 (100.84%)
 dec pngp  time [s]:  1.12 ± 0.00 (100.00%)  0.86 ± 0.00 ( 76.44%)  0.74 ± 0.00 ( 66.07%)  0.89 ± 0.00 ( 79.71%)

compress executable -5 (2 iterations)
 exec 0.33 time [s]:  0.47 ± 0.00 (100.00%)  0.25 ± 0.00 ( 53.22%)  0.26 ± 0.00 ( 55.07%)  0.27 ± 0.00 ( 58.73%)
 html 0.33 time [s]:  0.28 ± 0.00 (100.00%)  0.16 ± 0.00 ( 55.74%)  0.15 ± 0.00 ( 52.17%)  0.18 ± 0.00 ( 64.24%)
 jpeg 1.00 time [s]:  0.20 ± 0.00 (100.00%)  0.14 ± 0.00 ( 71.31%)  0.19 ± 0.00 ( 96.80%)  0.20 ± 0.00 (102.78%)
 pngp 0.14 time [s]:  0.35 ± 0.00 (100.00%)  0.16 ± 0.00 ( 46.29%)  0.19 ± 0.00 ( 55.16%)  0.20 ± 0.00 ( 55.36%)

compress executable -9 (2 iterations)
 exec 0.33 time [s]:  2.55 ± 0.00 (100.00%)  1.09 ± 0.00 ( 42.72%)  2.00 ± 0.00 ( 78.40%)  1.97 ± 0.00 ( 77.23%)
 html 0.33 time [s]:  0.77 ± 0.00 (100.00%)  0.43 ± 0.00 ( 56.55%)  0.68 ± 0.00 ( 88.47%)  0.68 ± 0.00 ( 88.42%)
 jpeg 1.00 time [s]:  0.19 ± 0.00 (100.00%)  0.14 ± 0.00 ( 73.97%)  0.14 ± 0.00 ( 73.81%)  0.16 ± 0.00 ( 83.79%)
 pngp 0.12 time [s]:  7.02 ± 0.00 (100.00%)  3.85 ± 0.00 ( 54.89%)  5.74 ± 0.00 ( 81.77%)  5.15 ± 0.00 ( 73.38%)
Dead2 (Member) commented Mar 8, 2019

Both the Intel and Cloudflare forks make assumptions that we do not dare to make. We have fixed a lot of corner-case bugs and regressions in code from both sources. Also, we do not have all of the latest Cloudflare patches, since they are complicated to port over and in some cases make assumptions we cannot live with.

zlib-ng is compatible with multiple architectures and contains optimizations for multiple instruction-set combinations. Our target, above all, is that the code should be safe, even if that means we cannot use some of the optimizations others do. I see from your benchmarks that we are at least doing pretty well compared to stock zlib.

PR #310 should improve our decompression numbers quite a bit, at least if you test on an x86 platform.

We are currently trying to get to the point where we can release a stable 1.0 version of a quality that lets it be used by default in programs and distros. After that, we can once again start looking at optimization opportunities.

We have been talking about making a couple of our benchmarking utils available on GitHub, so that will probably happen pretty soon, once we get rid of some hard-coded assumptions that might only be valid on our own developer workstations.

I am using a home-made Python script, while @sebpop, I believe, is using the Chromium zlib benchmarking suite.
jsnell's benchmark is pretty good.
The Squash benchmark is not so good, since it is unable to run the configure/cmake scripts and thus compiles the code with all optimizations disabled, so the measured performance is almost identical to stock zlib.

sebpop (Contributor) commented Mar 8, 2019

> Why does Intel outperform everyone on decompression?

The problem with all of those benchmarks (jsnell/zlib-bench, the #261 Chromium zlib benchmark) is that they compare decompression speeds on different representations of the compressed data. Each benchmark takes some raw data as input, compresses it, and then decompresses that representation. The reported decompression numbers cannot be compared when the inputs come from different compress implementations, because the representation of the compressed data differs.

@bmrzycki fixed the Chromium benchmark so that its decompression measurements are immune to this issue: the new decompress benchmark collects files compressed at different compression levels and by different implementations of zlib compress. With identical input files, one can compare the speed of different decompress implementations.
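
A minimal sketch of the methodology described above, assuming only CPython's bundled zlib: the (level, strategy) pairs here are illustrative stand-ins for the different forks, since only one zlib implementation can be linked into a single Python process. The point is the shape of the fix, not the absolute numbers: every compressor contributes to one pooled corpus, and every decompressor is timed on that identical corpus.

```python
import time
import zlib

# Placeholder corpus; a real run would use the exec/html/jpeg/png test data.
raw = bytes(range(256)) * 4096

def deflate(data, level, strategy):
    """Compress with an explicit level/strategy pair (one 'implementation')."""
    c = zlib.compressobj(level, zlib.DEFLATED, 15, 8, strategy)
    return c.compress(data) + c.flush()

# Step 1: every compressor contributes files to a shared corpus.
corpus = [
    deflate(raw, level, strategy)
    for level in (1, 5, 9)
    for strategy in (zlib.Z_DEFAULT_STRATEGY, zlib.Z_FILTERED, zlib.Z_RLE)
]

# Step 2: each decompressor is timed on the *same* pooled corpus, so the
# numbers are comparable across implementations.
start = time.perf_counter()
for blob in corpus:
    zlib.decompress(blob)
print(f"decompressed {len(corpus)} blobs in {time.perf_counter() - start:.3f}s")
```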

bjne (Author) commented Mar 8, 2019

> Why does Intel outperform everyone on decompression?
>
> The problem with all of those benchmarks (jsnell/zlib-bench, the #261 Chromium zlib benchmark) is that they compare decompression speeds on different representations of the compressed data. Each benchmark takes some raw data as input, compresses it, and then decompresses that representation. The reported decompression numbers cannot be compared when the inputs come from different compress implementations, because the representation of the compressed data differs.

I think this sounds like a bad excuse, and I don't really see the relevance. The compressed size is equal for all implementations (the difference is insignificant).

> @bmrzycki fixed the Chromium benchmark so that its decompression measurements are immune to this issue: the new decompress benchmark collects files compressed at different compression levels and by different implementations of zlib compress. With identical input files, one can compare the speed of different decompress implementations.

...which is not always what one would want.

sebpop (Contributor) commented Mar 8, 2019

> I don't really see the relevance. The compressed size is equal for all implementations (the difference is insignificant).

I will give you an example where a change in the compressed format can impact decompression speed:
I can add a filter to compress that discards all longest_match strings of length 3. In the compressed file you will then see more redundant strings of lengths 1, 2, 4, and probably longer strings as well. This change seems innocuous. Now consider that, for strings of length 3, ARM implements memset with a vld3, while x86 has no equivalent, as in 8e7f11c.
Comparing that encoder with one that lacks the filter, you will see no big difference in compressed file size, but you will see a big difference in performance when decompressing the files produced by the two encoders on ARM versus x86.

This is only one of many examples showing that the representation of the compressed file matters when comparing different decompress implementations.
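
To make the "same data, different representation" point concrete, here is a minimal sketch using only CPython's bundled zlib: Z_FILTERED, which biases the encoder toward literals instead of short matches, stands in for the hypothetical length-3 filter. Both streams decode to the same raw data, but the decoder consumes a different token mix in each case, which is exactly why decompression timings are only comparable on identical inputs.

```python
import zlib

# Semi-structured input: ascending decimal counters, full of short repeats.
raw = b"".join(b"%08d" % i for i in range(200_000))

def deflate(data, strategy):
    c = zlib.compressobj(6, zlib.DEFLATED, 15, 8, strategy)
    return c.compress(data) + c.flush()

default_blob = deflate(raw, zlib.Z_DEFAULT_STRATEGY)
filtered_blob = deflate(raw, zlib.Z_FILTERED)

print("default :", len(default_blob), "bytes")
print("filtered:", len(filtered_blob), "bytes")
print("identical streams?", default_blob == filtered_blob)  # False

# Both are valid deflate streams for the same raw data, yet a decoder does
# different work on each one.
assert zlib.decompress(default_blob) == raw
assert zlib.decompress(filtered_blob) == raw
```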

neurolabusc commented
Related to this issue, I have provided a dataset that shows a dramatic difference between CloudFlare and zlib-ng. The GZ format is popular in my field for compressing brain images in the NIfTI format. These are often huge datasets, acquired as 16-bit integers with poor signal-to-noise. It is common to zero the signal outside the brain and to conduct computations on 32-bit floating-point data, so these files have long runs of repeated values (e.g. all voxels outside the brain are zero). These images appear to compress much faster with CloudFlare. I would be very interested if anyone has any idea whether zlib-ng can be updated to be more competitive (I think this would have a real impact in my field).
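
For anyone without NIfTI data at hand, here is a rough sketch of this workload's shape, again using CPython's bundled zlib (the strategies stand in for the forks' different match-finding choices, not for the forks themselves): a buffer of long zero runs punctuated by small blocks of noisy signal, where a run-oriented strategy such as Z_RLE typically compresses much faster than the default match search.

```python
import os
import time
import zlib

# ~17 MB of long zero runs ("outside the brain") punctuated by
# incompressible 512-byte blocks of signal ("in-brain" voxels).
raw = b"".join(b"\x00" * 8192 + os.urandom(512) for _ in range(2048))

for name, strategy in (("default", zlib.Z_DEFAULT_STRATEGY),
                       ("Z_RLE", zlib.Z_RLE)):
    c = zlib.compressobj(6, zlib.DEFLATED, 15, 8, strategy)
    t = time.perf_counter()
    blob = c.compress(raw) + c.flush()
    print(f"{name:8s} {time.perf_counter() - t:.3f}s -> {len(blob):,} bytes")
```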

I am very aware that the zlib-ng team is focused on robustness over all-out performance. The cmake files in zlib-ng provide a lot of clever functions, and the benefits of zlib-ng relative to standard zlib are substantial. I therefore completely expect people to defer investigating this issue until after the 1.0 release.

As a minor aside, I do think the zlib-ng configure script could use a couple of tweaks. First, it specifies -O2, while the baseline and CloudFlare use -O3. I think the general consensus is that -O3 is now very robust and gives the compiler a chance to do some nice optimizations. In my brief testing this had no impact on my benchmarks, but given that the reference zlib uses it, I think it is a safe choice. Second, the configure script builds several tools with "-O2 -O2"; the duplication has no effect, but for clarity I would specify the optimization level only once.

neurolabusc commented Feb 25, 2020

@sebpop my pigz-bench attempts to address your concern regarding decompression tests. Each zlib variant contributes compressed files to the corpus, and each variant then decompresses ALL of these files, so every tool decompresses the same files. At the moment, on x86-64, CloudFlare does have superior compression performance, perhaps reflecting the aggressive optimizations that make the assumptions @Dead2 refers to (though I have not detected any issues). On the other hand, zlib-ng does show superior decompression on x86-64. The benchmark shows how pigz can be compiled against each variant, allowing users to choose the variant that suits their taste.

sebpop (Contributor) commented Feb 25, 2020

Thanks @neurolabusc for addressing the input data issue in your decompression benchmark. I will use it in my experiments!

@Dead2 Dead2 closed this as completed Feb 25, 2021
@zlib-ng zlib-ng locked and limited conversation to collaborators Feb 25, 2021

This issue was moved to a discussion.

You can continue the conversation there.
