Invalid comment length #48
I'm having trouble unzipping a zip file uploaded by a user. The zip opens fine in any other unzip software I've tried. The error I'm getting is:
If I comment out line 125 of index.js where the error is thrown, the file does seem to unzip properly. Any thoughts?
The file you sent me looks like it has an html document concatenated to the end of it. I'm not sure what the html page is, but it looks like it has an option to download a zipfile in it. I isolated the html document and emailed it to you.
I'm not sure if your zipfile creator did that on purpose or if it's a bug, but including an html page at the end of a zipfile certainly seems strange to me, especially when the document makes img and script references to external sources. I suspect your http server erroneously concatenated some html content at the end of a zipfile download.
So why does yauzl reject the file when others accept it? Here's a tldr: the zipfile is malformed, and yauzl is more strictly standards compliant than most/all other zipfile readers. For this particular problem, yauzl is being very picky in an attempt to avoid a specific problem that arises from a design flaw in the zipfile spec.
The following is a technical description of exactly what's wrong with the zipfile, a justification for yauzl's handling of the situation, and why another zipfile reader (Info-ZIP's
The .zip file specification is flawed
The high-level structure of a .zip file dictates that a reader must first look for the End of Central Directory Record, which is located at the very end of a .zip file. The final field of the End of Central Directory Record is a variable-length comment. The length of the comment is recorded in a field in the End of Central Directory Record before the comment itself.
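To make the layout concrete, here is a minimal sketch of decoding the record with Node's `Buffer`, assuming you already know the offset where it starts. The helper names `parseEocdr` and `buildEocdr` are made up for illustration and are not part of yauzl's API. The fixed part of the record is 22 bytes, and the 2-byte comment length sits at offset 20, immediately before the comment.

```javascript
const EOCDR_SIGNATURE = 0x06054b50; // stored on disk as the bytes 50 4b 05 06
const EOCDR_FIXED_SIZE = 22; // everything before the variable-length comment

// Hypothetical helper: decode an EOCDR assumed to start at `offset`.
function parseEocdr(buffer, offset) {
  if (offset < 0 || offset + EOCDR_FIXED_SIZE > buffer.length) {
    throw new Error("not enough bytes for an End of Central Directory Record");
  }
  if (buffer.readUInt32LE(offset) !== EOCDR_SIGNATURE) {
    throw new Error("signature not found at offset " + offset);
  }
  // The comment length is an unsigned 16-bit little-endian field.
  const commentLength = buffer.readUInt16LE(offset + 20);
  const commentStart = offset + EOCDR_FIXED_SIZE;
  return {
    entryCount: buffer.readUInt16LE(offset + 10),
    centralDirectorySize: buffer.readUInt32LE(offset + 12),
    centralDirectoryOffset: buffer.readUInt32LE(offset + 16),
    commentLength: commentLength,
    comment: buffer.subarray(commentStart, commentStart + commentLength),
  };
}

// Hypothetical helper: build the EOCDR of an empty archive with a given comment.
function buildEocdr(comment) {
  const record = Buffer.alloc(EOCDR_FIXED_SIZE + comment.length); // zero-filled
  record.writeUInt32LE(EOCDR_SIGNATURE, 0);
  record.writeUInt16LE(comment.length, 20);
  comment.copy(record, EOCDR_FIXED_SIZE);
  return record;
}

const parsed = parseEocdr(buildEocdr(Buffer.from("hello")), 0);
console.log(parsed.commentLength, parsed.comment.toString()); // 5 hello
```

Parsing is the easy part once the offset is known. The hard part is finding that offset in the first place.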
Here's a diagram to emphasize how flawed this design is:
We can't find the magic number from the beginning, because it could be any distance into the zipfile from the beginning. We can't find the magic number from the end, because it could be any distance into the zipfile from the end (up to 32KB). The defining characteristic of a zipfile is the magic number in the End of Central Directory Record, which is located in the middle of the file.
The only way to find the End of Central Directory Record is to do a linear search backwards from the end of the file, but even that is not guaranteed to find it. This is because the comment itself can be anything; it can be any bytes; it can even contain the magic number we're looking for. This means that literally the only way to find the End of Central Directory Record is to use heuristics to guess where the zipfile creator meant for it to be. The .zip file specification is ambiguous; it is flawed.
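Here is a small sketch of that ambiguity, not taken from any real reader. `findLastMagic` is a hypothetical naive backward search, and the test data is a record at offset 0 whose 22-byte comment happens to look exactly like another record with an empty comment.

```javascript
const MAGIC = 0x06054b50; // End of Central Directory Record signature
const FIXED_SIZE = 22; // size of the record without its comment

// Hypothetical naive search: return the offset of the last magic number
// that leaves room for a full record after it, or -1 if there is none.
function findLastMagic(buffer) {
  for (let i = buffer.length - FIXED_SIZE; i >= 0; i--) {
    if (buffer.readUInt32LE(i) === MAGIC) return i;
  }
  return -1;
}

// A 22-byte comment that itself looks like a record with an empty comment.
const commentThatLooksLikeARecord = Buffer.alloc(FIXED_SIZE);
commentThatLooksLikeARecord.writeUInt32LE(MAGIC, 0);

// The record the creator intended: at offset 0, with the comment above.
const ambiguous = Buffer.alloc(FIXED_SIZE + commentThatLooksLikeARecord.length);
ambiguous.writeUInt32LE(MAGIC, 0);
ambiguous.writeUInt16LE(commentThatLooksLikeARecord.length, 20);
commentThatLooksLikeARecord.copy(ambiguous, FIXED_SIZE);

console.log(findLastMagic(ambiguous)); // 22, not the intended 0
```

Both readings are self-consistent: a record at offset 0 with a 22-byte comment, or a record at offset 22 with an empty comment. Nothing in the bytes says which one the creator meant.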
A simple fix to the spec would have been to forbid the magic number from appearing in the comment (or to move the comment to before the End of Central Directory Record, or to remove the comment entirely).
yauzl searches backwards for the magic number, and once it is found, yauzl does some additional checking to make sure this magic number is actually part of an End of Central Directory Record. This check involves sanity checking all the fields in the End of Central Directory Record, including verifying that the comment length field is correct. If any of the fields look fishy, the zipfile is rejected.
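As a rough illustration of the comment length check, here is a simplified sketch. It is not yauzl's actual code; `locateEocdrStrict` and its error messages are invented for this example. It assumes the whole file is in memory, and it only checks that the comment length field accounts for every byte after the fixed 22-byte record.

```javascript
const EOCDR_MAGIC = 0x06054b50;
const EOCDR_SIZE = 22; // fixed part of the record, before the comment

// Hypothetical strict locator: search backwards for the magic number, then
// require the comment to end exactly at the end of the file.
function locateEocdrStrict(buffer) {
  for (let i = buffer.length - EOCDR_SIZE; i >= 0; i--) {
    if (buffer.readUInt32LE(i) !== EOCDR_MAGIC) continue;
    const commentLength = buffer.readUInt16LE(i + 20);
    const expected = buffer.length - i - EOCDR_SIZE;
    if (commentLength !== expected) {
      throw new Error(
        "unexpected comment length: field says " + commentLength +
        ", but " + expected + " bytes follow the record"
      );
    }
    return i;
  }
  throw new Error("End of Central Directory Record not found");
}

// A well-formed empty archive: just the record, with no comment.
const clean = Buffer.alloc(EOCDR_SIZE);
clean.writeUInt32LE(EOCDR_MAGIC, 0);

// The same archive with an html page appended, as in the file from this issue.
const html = Buffer.from("<html><body>download</body></html>");
const withHtml = Buffer.concat([clean, html]);

console.log(locateEocdrStrict(clean)); // 0
try {
  locateEocdrStrict(withHtml);
} catch (err) {
  console.log("rejected: " + err.message);
}
```

A more lenient reader could ignore the mismatch and carry on, which would explain why other tools open a file like this without complaint.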
Wow. Thanks so much for the very thorough analysis and explanation. I'm sure the extra data at the end of the zip is due to some bug from the zip generation code the creator uses. I'll push on them to fix their bug but I'm not very confident they will fix it quickly...
Oh right. That's how the code is written; I forgot it was unsigned in the above comment. Thanks for the correction.
EDIT: Actually, I guess my code is off by 1. Oops. So I'm reading 1 more byte than possibly necessary in the first read, and some zipfiles will be rejected with a different error message than you might expect. I'll fix that in the next release.