
Error encoding block when compressing /dev/urandom #10

Closed

axel-angel opened this issue Jan 5, 2013 · 13 comments

Comments

@axel-angel

When compressing /dev/urandom for testing, I always get this error eventually, something like 15 seconds after starting. It doesn't reproduce with xz or pxz. I don't have time to dig further, but it seems like a bug to me, and I can reproduce it every time.

I took the latest version from the repo and used:
   dd if=/dev/urandom | pixz -7 > /dev/null

I can compress without errors with -6; -8 raises the error too. I could upload a test file here, but it's quite big: 224 MiB. I'm using -7 because it seems like a good compromise between strong compression and multi-threading.

@vasi
Owner

vasi commented Jan 6, 2013

Hmm, that's a pretty weird use case: compressing random data should yield no improvement. But it's still a bug, so I'll see if I can reproduce it. Can I get the details of the computer on which you experience these bugs? OS, OS version, pixz version, number of CPUs, and amount of RAM are the important bits.

@vasi
Owner

vasi commented Jan 6, 2013

Also, does this happen with just /dev/urandom? Other regular files? Other dev files? What if you copy some data from urandom to a regular file, and then compress that?

A quick test on my Mac can't reproduce the problem; I can try on Linux or FreeBSD later.

@axel-angel
Author

I have reproduced the problem while saving the input, with:
   dd if=/dev/urandom | tee out | pixz -7 > /dev/null
I have reduced the input to a 40 MiB file (it was initially 200 MiB), and the problem can be reproduced at will. You can find my computer specs at the end. I have modified the code so it dumps core on the error; I have a core of 323 MiB uncompressed (183 MiB with gzip -9). What should I test or do with that? Are there any variables or anything I can inspect? I can upload it somewhere too.

If I had to guess, I would say there is some nasty multi-threading issue somewhere, a kind of concurrency interaction. My idea is to add some random sleeps in the part where it crashes and see if it happens less often, as in the sketch below. What do you think?
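
For instance, a minimal sketch of that perturbation idea (the helper name and where to call it in write.c are my guesses, not actual pixz code):

    /* Hypothetical scheduling jitter to probe for a race: sleep a random
     * 0-1 ms at the suspect point and watch whether the crash rate changes. */
    #include <stdlib.h>
    #include <unistd.h>

    static void jitter(void)
    {
        usleep(rand() % 1000);  /* 0-999 microseconds */
    }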

It seems it is always thread 0 that crashes, because I see in the stack trace (thnum=0):
   #5 0x000000000040683e in encode_thread (thnum=0) at write.c:305
Is that a constant like the CPU id, always in the same order? Something like the first CPU gets 0, the second gets 1, and so on?

I haven't tested with many files besides regular ones. I would say this problem arises because the entropy of urandom is very high, close to maximal. I could test with video, images, and compressed files later (if that would help).

My specs: https://gist.github.com/4479221

@vasi
Owner

vasi commented Jan 7, 2013

Hmm, the core dump won't help me unless I have the debug binaries. Could you get a backtrace for me? If you don't know how to do that, I can give you instructions :)

Also, if you can upload your 'out' file that causes the crash somewhere, that would be great.

@axel-angel
Author

Here is the backtrace: https://gist.github.com/4479415
Please note that I added an assert in die() so the program dumps core.

Edit: It seems thnum is not always 0; it was 1 a few times.

@axel-angel
Author

In case it helps, here is a dump of some variables in the scope of die(); I hope this gives you some insight into the problem: https://gist.github.com/4479481

@vasi
Owner

vasi commented Jan 7, 2013

Oooh, this is interesting, thanks. It's definitely not a concurrency problem. It looks like somehow we're not allocating enough space for the output, but that really shouldn't happen.
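
To illustrate the diagnosis (a sketch with made-up names, not the actual write.c code): the output buffer is sized with lzma_block_buffer_bound(), but the streaming block encoder has no fallback for incompressible input, so on random data it can run out of room:

    #include <lzma.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* Set up an LZMA2 block encoder roughly the way a pixz -7 thread would. */
        lzma_options_lzma opts;
        if (lzma_lzma_preset(&opts, 7))
            return 1;
        lzma_filter filters[] = {
            { .id = LZMA_FILTER_LZMA2, .options = &opts },
            { .id = LZMA_VLI_UNKNOWN,  .options = NULL },
        };
        lzma_block block = { .version = 0, .check = LZMA_CHECK_CRC32,
                             .filters = filters };
        lzma_stream strm = LZMA_STREAM_INIT;
        if (lzma_block_encoder(&strm, &block) != LZMA_OK)
            return 1;

        /* High-entropy input: compressing it makes it slightly bigger. */
        size_t in_size = 1 << 20;
        uint8_t *in = malloc(in_size);
        FILE *ur = fopen("/dev/urandom", "rb");
        if (!in || !ur || fread(in, 1, in_size, ur) != in_size)
            return 1;
        fclose(ur);

        /* Allocate the supposedly guaranteed bound... */
        size_t bound = lzma_block_buffer_bound(in_size);
        uint8_t *out = malloc(bound);
        strm.next_in = in;   strm.avail_in = in_size;
        strm.next_out = out; strm.avail_out = bound;

        /* ...but without an incompressible-data fallback the encoder can
         * exhaust the buffer before reaching LZMA_STREAM_END, which is
         * where pixz dies with "Error encoding block". */
        lzma_ret r = lzma_code(&strm, LZMA_FINISH);
        printf("result: %d, output space left: %zu\n", (int)r, strm.avail_out);
        lzma_end(&strm);
        return 0;
    }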

I would really appreciate it if you could upload the data that causes the crash.

@axel-angel
Author

I am not sure where I can upload my file. Any ideas? Anyway, you could generate a >50 MiB file from /dev/urandom on your machine and it should cause the problem as well; there doesn't seem to be anything special about my file, it is just high-entropy data.

@axel-angel
Author

My last experiment seems to confirm that high-entropy files cause the problem. I took the Big Buck Bunny movie and compressed it twice, and it crashed. I did:
   wget http://download.blender.org/peach/bigbuckbunny_movies/BigBuckBunny_320x180.mp4
   pixz -7 BigBuckBunny_320x180.mp4
   pixz -7 BigBuckBunny_320x180.mp4.xz   << crashed after writing 12 bytes of output

@vasi
Owner

vasi commented Jan 8, 2013

Yay, I can reproduce the bug now! Thanks :) I'll see what I can figure out.

@vasi
Owner

vasi commented Jan 8, 2013

Whew, that was a doozy of a bug! Thanks so much for your help finding and tracking it down. In my limited testing, incompressible input now works OK, and normal compressible input continues to work. It would be great if you could test a bit too and let me know whether it's working for you.

Here's a detailed explanation of what went wrong. Normally, each pixz compression thread works like this pseudocode:

setup_compression();
while (get_input_block()) {
  allocate_output_space(lzma_block_buffer_bound(input_size()));
  do_compression();
}
cleanup_compression();

Two important notes about this:

  1. We only do setup once, not for every block. This saves time!
  2. We know how much space to allocate using a function lzma_block_buffer_bound() from the liblzma API.

It turns out that lzma_block_buffer_bound() works correctly only in very particular conditions. It assumes that when the output gets too big, compression switches to a special "incompressible data" mode. Unfortunately, this special mode only activates when we do compression with a particular function. More unfortunately, using that function doesn't allow us to do the only-setup-once technique.
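
My guess is that the "particular function" is the one-shot lzma_block_buffer_encode(); under that assumption, here is a minimal sketch of per-block one-shot encoding (names and error handling are illustrative, not pixz's actual code):

    #include <lzma.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* One-shot block encoding: liblzma falls back to uncompressed chunks
     * when the data is incompressible, so lzma_block_buffer_bound() really
     * is an upper bound here. The cost: encoder setup happens on every
     * call, losing the setup-once speedup from the loop above. */
    static lzma_ret encode_block_oneshot(const uint8_t *in, size_t in_size,
                                         uint8_t **out, size_t *out_pos)
    {
        lzma_options_lzma opts;
        if (lzma_lzma_preset(&opts, 7))   /* same level as `pixz -7` */
            return LZMA_OPTIONS_ERROR;
        lzma_filter filters[] = {
            { .id = LZMA_FILTER_LZMA2, .options = &opts },
            { .id = LZMA_VLI_UNKNOWN,  .options = NULL },
        };
        lzma_block block = { .version = 0, .check = LZMA_CHECK_CRC32,
                             .filters = filters };

        size_t out_size = lzma_block_buffer_bound(in_size);
        *out = malloc(out_size);
        *out_pos = 0;
        return lzma_block_buffer_encode(&block, NULL, in, in_size,
                                        *out, out_pos, out_size);
    }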

So I've emailed the liblzma author, asking him to fix this situation. And in the meantime, I've implemented the special mode in pixz. It's ugly to have such low-level code there, but it wasn't too hard to implement.
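
For reference, the special mode boils down to emitting LZMA2 "uncompressed chunks". The chunk layout below comes from the LZMA2 format; the function itself is an illustration, not pixz's actual routine:

    #include <stdint.h>
    #include <string.h>

    /* Each LZMA2 uncompressed chunk: one control byte (0x01 = uncompressed
     * with dictionary reset, 0x02 = uncompressed, no reset), a 2-byte
     * big-endian (size - 1), then up to 64 KiB of raw data. Worst case this
     * adds just 3 bytes per 64 KiB, which is the kind of overhead
     * lzma_block_buffer_bound() accounts for. */
    static size_t emit_uncompressed_chunks(const uint8_t *in, size_t in_size,
                                           uint8_t *out)
    {
        size_t pos = 0;
        int first = 1;
        while (in_size > 0) {
            size_t n = in_size < 65536 ? in_size : 65536;
            out[pos++] = first ? 0x01 : 0x02;
            out[pos++] = (uint8_t)((n - 1) >> 8);
            out[pos++] = (uint8_t)((n - 1) & 0xFF);
            memcpy(out + pos, in, n);
            pos += n;
            in += n;
            in_size -= n;
            first = 0;
        }
        out[pos++] = 0x00;  /* end-of-LZMA2-data marker */
        return pos;
    }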

@axel-angel
Author

Sorry for my wrong guess about multi-threading; I hope it didn't make you lose time. I'm very glad I was of some help to you and pixz. I have tried compressing more than 5 times and the problem seems fixed now. I compressed with pixz -7, then decompressed with unxz and checked the md5sum at the end (correct). I tested with /dev/urandom too and all seems fine now. Good job.

Your code looks involved, so I cannot check its correctness; I would need to study liblzma and your program first, sorry.

@vasi
Owner

vasi commented Jan 8, 2013

No need to be sorry, you did great. Closing :)

@vasi vasi closed this as completed Jan 8, 2013