
Error encoding block when compressing /dev/urandom #10

Closed

axel-angel opened this issue Jan 5, 2013 · 13 comments

Comments

@axel-angel

When compressing /dev/urandom for testing, I always get this error eventually, something like 15 seconds after starting. It doesn't reproduce with xz or pxz. I don't have time to dig further, but it seems like a bug to me, and I can reproduce it every time.

I took the latest version from the repo and used:
   dd if=/dev/urandom | pixz -7 > /dev/null

I can compress without errors with -6; -8 raises the error too. I could upload a test file here, but it's quite big: 224 MiB. I'm using -7 because it seems like a good compromise between strong compression and multi-threading.

@vasi
Owner

vasi commented Jan 6, 2013

Hmm, that's a pretty weird use case: compressing random data should yield no improvement. But it's still a bug, so I'll see if I can reproduce it. Can I get the details of the computer on which you experience these bugs? OS, OS version, pixz version, number of CPUs, and amount of RAM are the important bits.

@vasi
Owner

vasi commented Jan 6, 2013

Also, does this happen with just /dev/urandom? Other regular files? Other dev files? What if you copy some data from urandom to a regular file, and then compress that?

A quick test on my Mac can't reproduce the problem; I can try on Linux or FreeBSD later.

@axel-angel
Author

I have reproduced the problem while saving the input, with:
   dd if=/dev/urandom | tee out | pixz -7 > /dev/null
I have reduced the input to a 40 MiB file (it was initially 200 MiB), and the problem can be reproduced at will. You can find my computer specs at the end. I have modified the code so it dumps core on the error; I have a core of 323 MiB uncompressed (183 MiB with gzip -9). What should I test or do with that? Are there any variables or anything I can inspect? I can upload it somewhere too.

If I had to guess, I would say there is some nasty multi-threading issue somewhere, a kind of concurrency interaction. My idea is to add some random sleeps in the part where it crashes and see if it happens less often, as in the sketch below. What do you think?
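
For instance, a minimal sketch of that perturbation idea (the helper name and where to call it in write.c are my guesses, not actual pixz code):

    /* Hypothetical scheduling jitter to probe for a race: sleep a random
     * 0-1 ms at the suspect point and watch whether the crash rate changes. */
    #include <stdlib.h>
    #include <unistd.h>

    static void jitter(void)
    {
        usleep(rand() % 1000);  /* 0-999 microseconds */
    }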

It seems it is always thread 0 that crashes, because I see in the stack trace (thnum=0):
   #5 0x000000000040683e in encode_thread (thnum=0) at write.c:305
Is that a constant like the CPU id, always in the same order? Something like the first CPU gets 0, the second gets 1, and so on?

I haven't tested with many files besides regular ones. I would say this problem arises because the entropy of urandom is very high, close to maximal. I could test with video, images, and compressed files later (if that would help).

My specs: https://gist.github.com/4479221

@vasi
Owner

vasi commented Jan 7, 2013

Hmm, the core dump won't help me unless I have the debug binaries. Could you get a backtrace for me? If you don't know how to do that, I can give you instructions :)

Also, if you can upload your 'out' file that causes the crash somewhere, that would be great.

@axel-angel
Author

Here is the backtrace: https://gist.github.com/4479415
Please note that I added an assert in die() so the program dumps core.

Edit: It seems thnum is not always 0; it was 1 a few times.

@axel-angel
Author

In case it helps, here is a dump of some variables in the scope of die(); I hope this gives you some insight into the problem: https://gist.github.com/4479481

@vasi
Owner

vasi commented Jan 7, 2013

Oooh, this is interesting, thanks. It's definitely not a concurrency problem. It looks like somehow we're not allocating enough space for the output, but that really shouldn't happen.
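
To illustrate the diagnosis (a sketch with made-up names, not the actual write.c code): the output buffer is sized with lzma_block_buffer_bound(), but the streaming block encoder has no fallback for incompressible input, so on random data it can run out of room:

    #include <lzma.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* Set up an LZMA2 block encoder roughly the way a pixz -7 thread would. */
        lzma_options_lzma opts;
        if (lzma_lzma_preset(&opts, 7))
            return 1;
        lzma_filter filters[] = {
            { .id = LZMA_FILTER_LZMA2, .options = &opts },
            { .id = LZMA_VLI_UNKNOWN,  .options = NULL },
        };
        lzma_block block = { .version = 0, .check = LZMA_CHECK_CRC32,
                             .filters = filters };
        lzma_stream strm = LZMA_STREAM_INIT;
        if (lzma_block_encoder(&strm, &block) != LZMA_OK)
            return 1;

        /* High-entropy input: compressing it makes it slightly bigger. */
        size_t in_size = 1 << 20;
        uint8_t *in = malloc(in_size);
        FILE *ur = fopen("/dev/urandom", "rb");
        if (!in || !ur || fread(in, 1, in_size, ur) != in_size)
            return 1;
        fclose(ur);

        /* Allocate the supposedly guaranteed bound... */
        size_t bound = lzma_block_buffer_bound(in_size);
        uint8_t *out = malloc(bound);
        strm.next_in = in;   strm.avail_in = in_size;
        strm.next_out = out; strm.avail_out = bound;

        /* ...but without an incompressible-data fallback the encoder can
         * exhaust the buffer before reaching LZMA_STREAM_END, which is
         * where pixz dies with "Error encoding block". */
        lzma_ret r = lzma_code(&strm, LZMA_FINISH);
        printf("result: %d, output space left: %zu\n", (int)r, strm.avail_out);
        lzma_end(&strm);
        return 0;
    }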

I would really appreciate it if you could upload the data that causes the crash.

@axel-angel
Author

I am not sure where I can upload my file. Any ideas? Anyway, you could generate a >50 MiB file from /dev/urandom on your machine and it should cause the problem as well; there doesn't seem to be anything special about my file, it is just high-entropy data.

@axel-angel
Author

My last experiment seems to confirm that high-entropy files cause the problem. I took the Big Buck Bunny movie and compressed it twice, and it crashed. I did:
   wget http://download.blender.org/peach/bigbuckbunny_movies/BigBuckBunny_320x180.mp4
   pixz -7 BigBuckBunny_320x180.mp4
   pixz -7 BigBuckBunny_320x180.mp4.xz   << crashed after writing 12 bytes of output

@vasi
Owner

vasi commented Jan 8, 2013

Yay, I can reproduce the bug now! Thanks :) I'll see what I can figure out.

@vasi
Owner

vasi commented Jan 8, 2013

Whew, that was a doozy of a bug! Thanks so much for your help finding and tracking it down. In my limited testing, incompressible input now works OK, and normal compressible input continues to work. It would be great if you could test a bit too and let me know whether it's working for you.

Here's a detailed explanation of what went wrong. Normally, each pixz compression thread works like this pseudocode:

setup_compression();
while (get_input_block()) {
  allocate_output_space(lzma_block_buffer_bound(input_size()));
  do_compression();
}
cleanup_compression();

Two important notes about this:

  1. We only do setup once, not for every block. This saves time!
  2. We know how much space to allocate using a function lzma_block_buffer_bound() from the liblzma API.

It turns out that lzma_block_buffer_bound() works correctly only in very particular conditions. It assumes that when the output gets too big, compression switches to a special "incompressible data" mode. Unfortunately, this special mode only activates when we do compression with a particular function. More unfortunately, using that function doesn't allow us to do the only-setup-once technique.
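
My guess is that the "particular function" is the one-shot lzma_block_buffer_encode(); under that assumption, here is a minimal sketch of per-block one-shot encoding (names and error handling are illustrative, not pixz's actual code):

    #include <lzma.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* One-shot block encoding: liblzma falls back to uncompressed chunks
     * when the data is incompressible, so lzma_block_buffer_bound() really
     * is an upper bound here. The cost: encoder setup happens on every
     * call, losing the setup-once speedup from the loop above. */
    static lzma_ret encode_block_oneshot(const uint8_t *in, size_t in_size,
                                         uint8_t **out, size_t *out_pos)
    {
        lzma_options_lzma opts;
        if (lzma_lzma_preset(&opts, 7))   /* same level as `pixz -7` */
            return LZMA_OPTIONS_ERROR;
        lzma_filter filters[] = {
            { .id = LZMA_FILTER_LZMA2, .options = &opts },
            { .id = LZMA_VLI_UNKNOWN,  .options = NULL },
        };
        lzma_block block = { .version = 0, .check = LZMA_CHECK_CRC32,
                             .filters = filters };

        size_t out_size = lzma_block_buffer_bound(in_size);
        *out = malloc(out_size);
        *out_pos = 0;
        return lzma_block_buffer_encode(&block, NULL, in, in_size,
                                        *out, out_pos, out_size);
    }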

So I've emailed the liblzma author, asking him to fix this situation. And in the meantime, I've implemented the special mode in pixz. It's ugly to have such low-level code there, but it wasn't too hard to implement.
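
For reference, the special mode boils down to emitting LZMA2 "uncompressed chunks". The chunk layout below comes from the LZMA2 format; the function itself is an illustration, not pixz's actual routine:

    #include <stdint.h>
    #include <string.h>

    /* Each LZMA2 uncompressed chunk: one control byte (0x01 = uncompressed
     * with dictionary reset, 0x02 = uncompressed, no reset), a 2-byte
     * big-endian (size - 1), then up to 64 KiB of raw data. Worst case this
     * adds just 3 bytes per 64 KiB, which is the kind of overhead
     * lzma_block_buffer_bound() accounts for. */
    static size_t emit_uncompressed_chunks(const uint8_t *in, size_t in_size,
                                           uint8_t *out)
    {
        size_t pos = 0;
        int first = 1;
        while (in_size > 0) {
            size_t n = in_size < 65536 ? in_size : 65536;
            out[pos++] = first ? 0x01 : 0x02;
            out[pos++] = (uint8_t)((n - 1) >> 8);
            out[pos++] = (uint8_t)((n - 1) & 0xFF);
            memcpy(out + pos, in, n);
            pos += n;
            in += n;
            in_size -= n;
            first = 0;
        }
        out[pos++] = 0x00;  /* end-of-LZMA2-data marker */
        return pos;
    }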

@axel-angel
Author

Sorry for my wrong guess about multi-threading; I hope it didn't make you lose time. I'm very glad I was of some help to you and pixz. I have tried compressing more than 5 times and the problem seems fixed now. I compressed with pixz -7, then decompressed with unxz and checked the md5sum at the end (correct). I tested with /dev/urandom too and all seems fine now. Good job.

Your code looks involved, so I cannot check its correctness; I would need to study liblzma and your program first, sorry.

@vasi
Owner

vasi commented Jan 8, 2013

No need to be sorry, you did great. Closing :)

@vasi vasi closed this as completed Jan 8, 2013