Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix deflateBound for HASH_BITS > 15 #1039

Closed
wants to merge 1 commit into from
Closed

Conversation

atdt
Copy link
Contributor

@atdt atdt commented Sep 2, 2021

This check was changed from HASH_BITS != 15 to HASH_BITS < 15 in commit e7bb6db, leading zlib-ng to produce upper bounds that are too small in certain cases.

This check was changed from HASH_BITS != 15 to HASH_BITS < 15 in commit e7bb6db, leading zlib-ng to produce upper bounds that are too small in certain cases.
@nmoinvaz nmoinvaz added the bug label Sep 2, 2021
@Dead2
Copy link
Member

Dead2 commented Sep 3, 2021

The reason why this was changed (in the wrong commit it seems) is related to cbc3962.

So, we default to 16 now and where 15 was the previous default and also used to be the max, 16 is currently always used unless manually changed in deflate.h. (This might change)

16 tends to compress better than 15, and the original check enabled a calculation with more buffer space when it is not 15. That was obviously correct before, but with the new default being 16 this change would never get the "tight" bound activated.

So, perhaps the real problem is that the tight bound calculation is too tight in some cases perhaps?

@atdt How did you trigger the problem? What compression level did you use? Was it level 1 (deflate_quick) by any chance?

@atdt
Copy link
Contributor Author

atdt commented Sep 3, 2021

@Dead2 Sorry, I should have included a reproduction case.

I ran into this in the form of a test failure in Apache Arrow's unit tests (TestGZipInputStream/CompressedInputStreamTest.RandomData). Here's a minimum repro:

#include <stdio.h>
#include <stdlib.h>
#include <sys/random.h>

#include "zlib.h"

int main(int argc, char *argv[]) {
  const int len = 3 << 20; /* 3 MiB */
  unsigned char buffer[len];
  if (getrandom(buffer, len, GRND_RANDOM) != len) {
    perror("getrandom()");
    exit(1);
  }

  z_stream c_stream; /* compression stream */
  int err;
  int estimateLen = 0;
  unsigned char *outBuf = NULL;

  c_stream.zalloc = Z_NULL;
  c_stream.zfree = Z_NULL;
  c_stream.opaque = (voidpf)0;
  c_stream.avail_in = len;
  c_stream.next_in = (z_const unsigned char *)buffer;
  c_stream.avail_out = 0;
  c_stream.next_out = outBuf;

  err = deflateInit2(&c_stream, Z_DEFAULT_COMPRESSION, Z_DEFLATED, -MAX_WBITS,
                     Z_BEST_COMPRESSION, Z_DEFAULT_STRATEGY);
  if (err != Z_OK) {
    fprintf(stderr, "deflateInit2() error: %d\n", err);
    exit(1);
  }

  /* calculate actual output length and update structure */
  estimateLen = deflateBound(&c_stream, len);
  fprintf(stderr, "estimateLen = %d\n", estimateLen);
  outBuf = malloc(estimateLen);

  if (outBuf != NULL) {
    /* update zlib configuration */
    c_stream.avail_out = estimateLen;
    c_stream.next_out = outBuf;

    /* do the compression */
    err = deflate(&c_stream, Z_FINISH);

    if (err == Z_STREAM_END) {
      fprintf(stderr, "deflate(): OK\n");
    } else {
      fprintf(stderr, "deflate() error: got %d; expected Z_STREAM_END (%d)\n",
              err, Z_STREAM_END);
      exit(1);
    }
  }

  err = deflateEnd(&c_stream);
  if (err != Z_OK) {
    fprintf(stderr, "deflateEnd() error: %d\n", err);
    exit(1);
  }

  free(outBuf);
}

With zlib, the output I get is:

estimateLen = 3588101
deflate(): OK

With LD_PRELOAD=/path/to/libz.so.1.2.11.zlib-ng (built with ZLIB_COMPAT), I get:

estimateLen = 3146695
deflate() error: got 0; expected Z_STREAM_END (1)

The Z_OK return value of deflate() with zlib-ng indicates that there was not enough output space.

@Dead2
Copy link
Member

Dead2 commented Sep 22, 2021

@nmoinvaz @mtl1979 Any of you have time and insight to have a look at the cause of this?
A temporary workaround could be to always use the pessimistic estimate, but do we want to go there?

@mtl1979
Copy link
Collaborator

mtl1979 commented Sep 22, 2021

@Dead2 The shift amounts are way too high at the bottom of deflateBound().... I'm getting 768, 192 and 0 for the results of the shifts. I would expect the highest number to be closer to 300000.

@nmoinvaz
Copy link
Member

@Dead2 I have not been feeling well. Maybe I can take a look at another time.

@Dead2
Copy link
Member

Dead2 commented Sep 22, 2021

@nmoinvaz Health and family first, always 😄
Get better 💊

@Dead2
Copy link
Member

Dead2 commented Dec 21, 2021

This was fixed by #1071

@Dead2 Dead2 closed this Dec 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants