New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid comment length #48

Closed
dtjohnson opened this Issue Dec 12, 2016 · 8 comments

Comments

Projects
None yet
3 participants
@dtjohnson

dtjohnson commented Dec 12, 2016

I'm having trouble unzipping a zip file uploaded by a user. The zip opens fine in any other unzip software I've tried. The error I'm getting is:

Error: invalid comment length. expected: 12298. found: 0
    at /usr/src/app/node_modules/yauzl/index.js:125:25
    at /usr/src/app/node_modules/yauzl/index.js:539:5
    at /usr/src/app/node_modules/fd-slicer/index.js:32:7
    at FSReqWrap.wrapper [as oncomplete] (fs.js:681:17)

If I comment out line 125 of index.js where the error is thrown, the file does seem to unzip properly. Any thoughts?

@thejoshwolfe

This comment has been minimized.

Owner

thejoshwolfe commented Dec 12, 2016

Can you give me the zipfile (or a zipfile) that reproduces the problem?

@dtjohnson

This comment has been minimized.

dtjohnson commented Dec 12, 2016

Sure, but I'd rather not share the zip publicly. Would you mind sending me an email and I'll send it to you? (My email is on my GitHub page.)

@dtjohnson

This comment has been minimized.

dtjohnson commented Dec 12, 2016

Just sent to you via email. It was generated with some .NET library. I can get more details if helpful.

@thejoshwolfe

This comment has been minimized.

Owner

thejoshwolfe commented Dec 12, 2016

The file you sent me looks like it got an html documented concatenated to the end of it. I'm not sure what the html page is, but it looks like it's got an option to download a zipfile in it. I isolated the html document and emailed it to you.

I'm not sure if your zipfile creator did that on purpose or if it's a bug, but including an html page at the end of a zipfile certainly seems strange to me, especially when the document makes img and script references to external sources. I suspect your http server erroneously concatenated some html content at the end of a zipfile download.

So why does yauzl reject the file when others accept it? Here's a tldr: the zipfile is malformed, and yauzl is more strictly standards compliant than most/all other zipfile readers. For this particular problem, yauzl is being very picky in an attempt to avoid a specific problem that arises from a design flaw in the zipfile spec.

The following is a technical description of exactly what's wrong with the zipfile, a justification for yauzl's handling of the situation, and why another zipfile reader (Info-ZIP's unzip command) tolerates the malformed zipflie.

The .zip file specification is flawed

The high-level structure of a .zip file dictates that a reader must first look for the End of Central Directory Record, which is located at the very end of a .zip file. The final field of the End of Central Directory Record is a variable-length comment. The length of the comment is recorded in a field in the End of Central Directory Record before the comment itself.

Here's a diagram to emphasize how flawed this design is:

| Entire .zip file                                             |
... | End of Central Directory Record                          |
... | magic number | some stuff | comment length | comment ... |

^^^   ^^^^^^^^^^^^                                         ^^^
|     |                                                      |
|     we're looking for this                                 |
|                                                            |
could be any size                            could be any size

We can't find the magic number from the beginning, because it could be any distance into the zipfile from the beginning. We can't find the magic number from the end, because it could be any distance into the zipfile from the end (up to 32KB). The defining characteristic of a zipfile is the magic number in the End of Central Directory Record, which is located in the middle of the file.

The only way to find the End of Central Directory Record is to do a linear search backwards from the end of the file, but even that is not guaranteed to find it. This is because the comment itself can be anything; it can be any bytes; it can even contain the magic number we're looking for. This means that literally the only way to find the End of Central Directory Record is to use heuristics to guess where the zipfile creator meant for it to be. The .zip file specification is ambiguous; it is flawed.

A simple fix to the spec would have been to forbid the magic number from appearing in the comment (or to move the comment to before the End of Central Directory Record, or to remove the comment entirely).

yauzl's behavior

yauzl searches backwards for the magic number, and once it is found, yauzl does some additional checking to make sure this magic number is actually part of an End of Central Directory Record. This check involves sanity checking all the fields in the End of Central Directory Record, including verifying that the comment length field is correct. If any of the fields look fishy, the zipfile is rejected.

Info-ZIP's unzip's behavior

The commandline program unzip made by Info-ZIP is more trusting than yauzl. unzip searches backwards for the magic number, just like yauzl. This search correctly assumes that the End of Central Directory Record might include a comment of any length. However, after finding the magic number, unzip does not check if the comment length correctly describes the comment that must follow. Rather, unzip only checks to make sure the comment length is small enough to not cause an out-of-bounds read beyond the end of the zip file. This means that unzip tolerates arbitrary data append to the end of a zipfile without even so much as a warning. The zip file spec does not allow this arbitrary data, so unzip is non-conformantly tolerating this data.

Your zip file

The document at the end of your zip file is less than 32KB in size (actual size is ~16KB), and being put at the end means that zipfile readers will search past it when looking for the magic number. As stated above, unzip will simply ignore that extra data, but yauzl requires that the End of Central Directory Record confirms that that arbitrary data is actually the comment field. Being a comment is the only justification for why a zip file reader should ignore it, so yauzl is being more conformant by rejecting it.

There's something unsettling about this situation, which is that the document just happens to be less than 32KB, which is the maximum length of a zip file comment. If the document were larger than 32KB, the zipfile would be rejected by any zipfile reader that only checks for possible sized comments. I thought that perhaps unzip would be such a program, but actually I determined through experimentation that unzip will only ignore 67749 bytes at the end of a zipfile. I have no idea why that's the threshold, but if your html document had been about 5.5x larger than it is, then even unzip would reject your file.

What to do now

The ideal solution to this problem is obviously for your zipfile creator to create well formed zipfiles. If perchance the html document were intentionally appended to the end of the zipfile, then the zipfile creator should set the comment length field in the End of Central Directory Record to represent this (and check to make sure the document is small enough to fit).

If all of that is beyond your control to change, then it's possible to make a last-minute repair to the zipfile by deleting the bytes from the end of the zipfile. To do this, search backwards from the end for the magic number (see the spec's mention of "central file header signature"), and delete everything that follows the End of Central Directory Record.

@dtjohnson

This comment has been minimized.

dtjohnson commented Dec 13, 2016

Wow. Thanks so much for the very thorough analysis and explanation. I'm sure the extra data at the end of the zip is due to some bug from the zip generation code the creator uses. I'll push on them to fix their bug but I'm not very confident they will fix it quickly...

@JayXon

This comment has been minimized.

JayXon commented Dec 13, 2016

The maximum comment length should be 65535 (64K-1 or 0xffff) because the field is unsigned.

@thejoshwolfe

This comment has been minimized.

Owner

thejoshwolfe commented Dec 13, 2016

The maximum comment length should be 65535 (64K-1 or 0xffff) because the field is unsigned.

Oh right. That's how the code is written; I forgot it was unsigned in the above comment. Thanks for the correction.

EDIT: Actually, I guess my code is off by 1. Oops. So I'm reading 1 more byte than possibly necessary in the first read, and some zipfiles will be rejected with a different error message than you might expect. I'll fix that in the next release.

@dtjohnson

This comment has been minimized.

dtjohnson commented Dec 14, 2016

I confirmed that the creator of the zip does indeed have a bug in their code. It will take them time to fix it so for now I'm just ignoring this comment length error. Thanks for the help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment