
Error: invalid central directory file header signature when unzipping big files, zipped by mac os #69

Closed
d-mg opened this issue Jan 8, 2018 · 51 comments

Comments

@d-mg

d-mg commented Jan 8, 2018

When I zip files using the terminal, there are no problems unzipping them. But when I use the GUI for zipping, and only with bigger files (around 8 GB and up), I get the following error:

Error: invalid central directory file header signature: 0x2d8c9fa4
at /Users/dmg_/projects/test-zipper/node_modules/yauzl/index.js:253:70
at /Users/dmg_/projects/test-zipper/node_modules/yauzl/index.js:622:5
at /Users/dmg_/projects/test-zipper/node_modules/fd-slicer/index.js:32:7
at FSReqWrap.wrapper [as oncomplete] (fs.js:658:17)
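
For reference, here is roughly how I'm calling yauzl (a minimal sketch; the file name is just a placeholder, and it uses yauzl's documented lazyEntries API):

  const yauzl = require('yauzl');

  yauzl.open('Archive.zip', { lazyEntries: true }, (err, zipfile) => {
    if (err) throw err;
    // With lazyEntries, the "invalid central directory file header signature"
    // error surfaces via the zipfile's 'error' event while reading entries.
    zipfile.on('error', (err) => console.error(err));
    zipfile.on('entry', (entry) => {
      console.log(entry.fileName);
      zipfile.readEntry();
    });
    zipfile.readEntry();
  });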

Also when running zipinfo I get:

Archive: Archive.zip
Zip file size: 5946143505 bytes, number of entries: 140
warning [Archive.zip]: 4294967296 extra bytes at beginning or within zipfile
(attempting to process anyway)

I did some digging through the issues and some searching, and I'm pretty sure these extra bytes are why yauzl can't find the central directory header.

Do you have any clue why GUI Mac OS zipped zipfiles have these extra bytes? I do see it puts some extra '__MACOSX' folders in the zip and suspect their presence has something to do with it, but that's more of a guess.

@thejoshwolfe
Owner

Thanks for the report. Is it possible to provide a download for a zip file that reproduces this problem? You mention it happens with very large files, like 8GB, so perhaps hosting a download would be infeasible.

Alternatively, could you provide minimal steps to reproduce the issue? Perhaps something like:

  1. head -c 8000000000 /dev/urandom > file.bin
  2. right click on file.bin, and choose something.

I don't have a Mac, so I'd be forwarding these instructions to a friend who can transfer me an 8 GB file easily.

@thejoshwolfe
Owner

Do you have any clue why GUI Mac OS zipped zipfiles ...

An easy answer is: yes, I have a clue. The Mac Archive Utility is a piece of trash when it comes to the zip file format. This will be the fourth (I believe) compatibility issue that boils down to Apple getting the zip file format embarrassingly wrong.

I'm not surprised at all that this is an issue. In fact, I previously thought that Archive Utility was simply unable to handle zip files larger than 4GB due to lack of ZIP64 support. Are you sure the zip files that you create are readable by anyone? Like, can the GUI itself read its own files that it creates that are 8GB in size? If so, then there's some strange format at play that isn't ZIP64 format.

@d-mg
Author

d-mg commented Jan 12, 2018

Did some more testing and it seems to fail somewhere between 4.19 and 4.33 GB.

Since I'm trying to zip 4K MP4s, I used ffmpeg to create test files (warning: don't watch them with the volume turned all the way up): https://www.ffmpeg.org/download.html

This creates a 32-second MP4 (4.19 GB):
ffmpeg -f rawvideo -video_size 3840x2160 -pixel_format yuv420p -framerate 25 -i /dev/urandom -ar 48000 -ac 2 -f s16le -i /dev/urandom -c:a aac -t 32 output32secs.mp4

This creates a 33-second MP4 (4.33 GB):
ffmpeg -f rawvideo -video_size 3840x2160 -pixel_format yuv420p -framerate 25 -i /dev/urandom -ar 48000 -ac 2 -f s16le -i /dev/urandom -c:a aac -t 33 output33secs.mp4

The 32-second one unzips fine with yauzl; the 33-second one (or any larger) fails with:
Error: invalid central directory file header signature: XXXXXXXX

I zip them by simply going into Finder on my Mac, right-clicking the file and choosing Compress "output33secs.mp4".
The resulting zip file causes the error when unzipping with yauzl.
Zipping them through the command line with zip somezip output33secs.mp4 creates a zip file that yauzl unzips just fine.

Hope this is enough to reproduce the problem; if not, let me know.

@thejoshwolfe
Owner

Thanks, I've reproduced the issue. Indeed, the Info-ZIP unzip command line program can't read this file correctly either, so this issue is not specific to yauzl.

I'll do some research and report back here. (It may take a few weeks.) I'm curious to see what the hell Archive Utility did such that it can read its own malformed zipfile but no one else can.

@thejoshwolfe
Owner

I've determined what the issue is. It's pretty much the same problem as #60.

When the Mac Archive Utility needs to write 32-bit values in the metadata, it doesn't check for overflow and simply writes the lower 32 bits of the value. There are 3 values that get corrupted like this in the case of archiving a file that's over 4 GB in size: the "offset of start of central directory with respect to the starting disk number" in the end of central directory record and the "compressed size" and "uncompressed size" fields of the local file header and central directory file header (and data descriptor, but that doesn't matter too much).
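
To make the failure mode concrete, here is a small illustration of what keeping only the low 32 bits does to one of these fields (the numbers are just an example, not taken from a real archive):

  // Example: a central directory that actually starts at byte 5,900,000,000.
  const trueOffset = 5900000000n;

  // Archive Utility stores only the low 32 bits of the value...
  const storedOffset = trueOffset & 0xFFFFFFFFn; // 1605032704n

  // ...so a reader that trusts the field lands exactly 2^32 bytes short of the
  // real central directory, which is the "4294967296 extra bytes" zipinfo reports.
  console.log(trueOffset - storedOffset); // 4294967296n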

This means that the metadata is an outright lie. This is absolutely a bug in Archive Utility, and it's a nasty one too. The correct behavior would have been to use ZIP64 format, but Archive Utility has no idea what that is.

There may or may not be any way to correctly and reliably recover from this corruption. I'm still thinking about it.

@d-mg
Author

d-mg commented Jan 22, 2018

Thank you for looking into it. Also, I noticed that sometimes the unzip CLI tries and unzips part of the file, but then you lose half of the video file, so it's completely unreliable. Hope you do find a solution, but in the meantime my workaround is just avoiding Archive Utility. I did send a report to Apple; if they get back to me with any relevant information I will share it here as well.

@thejoshwolfe
Owner

Oh. How did you do that? I was unable to find any bug tracker for either Archive Utility or the ditto command line program, which is allegedly the backend for Archive Utility's zipfile functionality.

If Archive Utility's zipfile implementation were open source, I could even make a pull request fixing their bugs, but I didn't find that either.

@d-mg
Author

d-mg commented Jan 22, 2018

I reported it here: https://www.apple.com/feedback/macos.html
It's pretty much just a feedback form for macOS; they state that they look at everything sent in there but might not get back to you. It's a long shot, but who knows, if enough people start nagging they might fix it...

@overlookmotel
Contributor

@d-mg Out of interest, what version of Mac OS did you create the faulty zip file on?

Absolutely bonkers that Mac OS has such a broken ZIP implementation, especially as it's meant to cater for "creatives" who are the most likely people to be zipping up large (e.g. video) files.

@thejoshwolfe Are you saying that Mac OS Archive Utility does not make use of ZIP64 extra fields at all?

@overlookmotel
Contributor

overlookmotel commented Mar 31, 2018

If the number of files in the ZIP is not enormous, I guess that "offset of start of central directory with respect to the starting disk number" can be deduced from the total length of the ZIP file and the lower 32 bits of the offset recorded by macOS.

Each central directory file header is at most 196,651 bytes (46 + 65535 + 65535 + 65535). So if the ZIP contains fewer than 21,000 files (approx), the central directory cannot be bigger than 4 GiB, and the low 32 bits are enough to determine the location of the start of the central directory.

That doesn't solve the problem of the "compressed size" and "uncompressed size" fields (and also, presumably, the "relative offset of local file header") being truncated to 32 bits. But I guess you could use the info that is available to step through the file in 4 GiB jumps to check all possible locations for matching local file headers.
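
A rough sketch of that stepping idea (readUInt32At is a hypothetical helper, and this assumes the truncation is exactly modulo 2^32):

  // Signatures from the ZIP spec.
  const CENTRAL_DIRECTORY_SIGNATURE = 0x02014b50;
  const LOCAL_FILE_HEADER_SIGNATURE = 0x04034b50;

  // truncatedOffset: the low 32 bits that Archive Utility actually wrote.
  // readUInt32At(pos): hypothetical helper that reads 4 little-endian bytes at pos.
  async function findRealOffset(signature, truncatedOffset, fileSize, readUInt32At) {
    for (let candidate = truncatedOffset; candidate < fileSize; candidate += 0x100000000) {
      if ((await readUInt32At(candidate)) === signature) return candidate;
    }
    throw new Error('no matching header found at any candidate offset');
  }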

Ergh! But do you think this would work?

My work involves receiving a lot of large ZIP files from sources all over the place, and I sadly have no way to prevent them being created with Mac OS Archive Utility (I suspect most are), so I would be motivated to help solve this problem.

I also have a massive collection of large ZIP files from varied sources to test an implementation on.

@d-mg
Author

d-mg commented Mar 31, 2018

@overlookmotel I was using High Sierra for this

@thejoshwolfe
Owner

Ergh! But do you think this would work?

I developed a plan with a lot of the same ideas you presented to make this work, but it's very difficult to implement. The gist is to track a non-deterministic multistate of possibilities depending on what kind of corruption the zipfile has.

The user would need to explicitly request this complexity for three reasons: I don't want everyone to pay the performance cost of Apple's incompetence/hostility if they don't want to; the user needs to use the library differently when this complexity is enabled; and in rare cases the complexity can cause a correct zipfile to be interpreted incorrectly.

The Mac Archive Utility behaves predictably, albeit incorrectly, when laying out data in the zipfile, so I can check whether a zipfile deviates from the Archive Utility style. If we know we aren't dealing with Archive Utility, we can drop the complexity and behave normally. However, we can't be certain that we are dealing with Archive Utility until we verify that one of these overflow corruptions has happened.

Each case of corruption can be detected heuristically by seeking to the locations that would cause the overflow to give us the value we see, and checking the zipfile data structure signatures. This is not a perfect solution, because there may be a valid zipfile that perfectly mimics the Archive Utility corruption. That's the main problem with this Archive Utility bug: it creates "valid" zipfiles with the wrong content. So whether we should assume that it's corrupted or correct is a guess, and that's one of the reasons why the user would have to explicitly ask for this feature to be enabled.

The probability of this "false positive" detection happening randomly is low, so it's not the worst thing about this idea.

The worst part is that this overall sucks. It's really complicated, it's going to negatively impact performance at least a little bit for everyone, and it's all due to flagrant incompetence/hostility at Apple being widely adopted. My motivation is really the limiting factor for implementing a fix for this, but hearing people say that this is important to them really helps. 👍

@thejoshwolfe Are you saying that Mac OS Archive Utility does not make use of ZIP64 extra fields at all?

Yes. I haven't seen the source code, because I believe it's closed source, but all of my experience and experimentation with it suggest that the implementation is oblivious to the ZIP64 format entirely, and apparently also oblivious to the concept of integer overflow and bounds checking.

@overlookmotel
Contributor

overlookmotel commented Apr 1, 2018

@thejoshwolfe Thanks for your lengthy reply.

I agree that it totally sucks! But, at least for me, it would be really useful to resolve.

Complexity: To my mind, this is the biggest downside. It would make the code considerably more complex. I imagine it would likely require a refactor to split different steps of file parsing into separate functions, so functions can call themselves recursively as they try different possibilities. That's a lot of code churn.

Performance: As far as I can see, I doubt that it would impact performance in any significant way for non-Apple ZIP files. The workarounds for the Apple case, I think, only need to kick in if a valid Central Directory Record cannot be found in the location where it should be. So the execution path for non-Apple archives could remain the same.

False positives: I hope the possibility of false positives could be reduced to zero, or at least to an infinitesimal level. I hope (but don't know yet) that Archive Utility ZIP files can be "fingerprinted" in some way. For example, maybe they always set the "version made by" field to a particular value which no one else uses. If so, then the wacky heuristics could be disabled if it's not an Archive Utility file, so anything not made by Archive Utility would behave as before, and the only files which would be parsed differently would be ones which yauzl can't deal with at present anyway.

Explicit enabling: I agree this should have to be explicitly enabled by the user.

I would like to do some digging on this and see what I can find.

In principle, if (1) I can come up with something half-sensible which doesn't affect performance significantly or risk screwing up valid ZIP files and (2) am willing to do a lot of the work on implementation, would you likely be willing to accept the downside of additional code complexity in order to support Archive Utility files? (I totally understand if you don't)

@overlookmotel
Contributor

@d-mg Thanks for letting me know.

@thejoshwolfe
Owner

would you likely be willing to accept ...

To be honest, since you asked, probably not. The effort to thoroughly review a pull request of this magnitude would be on par with authoring the code from scratch, so I'd rather just write it myself, if that makes sense. There's an awful lot of careful design and consideration needed to ensure this change doesn't regress anything already in place, such as performance and API-level corner-case behavior (neither of which is very well tested automatically here).

But thanks for offering. Your enthusiasm is becoming my enthusiasm, which is the currency that funds this project. I'll revisit this issue in a few weeks after I work through some major non-software issues I'm in the middle of at the moment. Stay tuned for updates.

@overlookmotel
Contributor

overlookmotel commented Apr 2, 2018

OK, fair enough. I've got really interested in this and was looking forward to getting my hands dirty, but hey! I really appreciate that you're willing to implement it.

In case it's useful, I did a little digging into how Archive Utility structures ZIP files.

Structure of ZIP files

Structure:

  • Files are included in body of ZIP file in same order as they are listed in Central Directory
  • 1st Local File Header located at byte 0 in the ZIP file
  • File data always followed by Data Descriptor signature 0x08074b50, then CRC32 & compressed size & uncompressed size (all same as in the CDR - i.e. sometimes incorrect)
  • File data, CDR and EoCDR are all contiguous - no gaps in between

Fingerprinting:

  • End of Central Directory Record comment field is length 0
  • "Version made by" for all files = 789
  • "Version needed to extract" for all files = 20
  • Extra fields always contains an entry with ID 22613 (values differ, and no idea what this is)
  • "General purpose bit flag" always = 8
  • "Compression method" always = 8
  • In all Local File Headers: CRC32 & compressed size & uncompressed size all = 0
  • Filename in each Local File Header matches filename in CDR
  • Sometimes (but not always) Mac OS resource forks (path __MACOSX/._<filename>) are included as an extra file, following the file itself

All of the above hold true for ZIP files created by both the graphical Archive Utility and the command-line ditto. I can't find any obvious preferences or other options which change these behaviors.

Proviso: I've not checked that many files, and this is all on one machine running Mac OS Sierra 10.12.6. It's possible behavior may vary on different versions.
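
As an illustration only, a hypothetical fingerprint check based on the observations above might look like this (field names follow yauzl's Entry properties; I haven't battle-tested this):

  // Returns true if a central directory entry matches the Archive Utility
  // fingerprint observed above. A sketch, not a guarantee.
  function looksLikeArchiveUtilityEntry(entry) {
    return entry.versionMadeBy === 789
      && entry.versionNeededToExtract === 20
      && entry.generalPurposeBitFlag === 8
      && entry.compressionMethod === 8
      && entry.extraFields.some((field) => field.id === 22613);
  }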

What does this mean?

The consistencies in structure should make the file easier/more reliable to parse than it might otherwise be:

File Header 1 is at byte 0. File Header 2 is immediately after the end of file 1's Data Descriptor, and so on. So to get the correct locations and compressed lengths of each file, it should only be necessary to seek for the Data Descriptor of each file, first in the location where it should be, and then in 4 GiB steps onward from there until the Data Descriptor is found. The Data Descriptor is 16 bytes, so the risk of false positives is very small.
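
Something like this is what I have in mind (readUInt32At is a hypothetical helper; the 16-byte descriptor is signature + CRC32 + compressed size + uncompressed size):

  const DATA_DESCRIPTOR_SIGNATURE = 0x08074b50;

  // dataStart: offset where this entry's file data begins.
  // storedCompressedSize: the (possibly truncated) 32-bit size from the header.
  // readUInt32At(pos): hypothetical helper that reads 4 little-endian bytes at pos.
  async function findDataDescriptor(dataStart, storedCompressedSize, fileSize, readUInt32At) {
    let candidate = dataStart + storedCompressedSize;
    while (candidate + 16 <= fileSize) {
      if ((await readUInt32At(candidate)) === DATA_DESCRIPTOR_SIGNATURE) {
        return candidate; // real compressed size = candidate - dataStart
      }
      candidate += 0x100000000; // try the next 4 GiB overflow possibility
    }
    throw new Error('Data Descriptor not found');
  }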

Of course, there's no way of knowing the size of uncompressed data without uncompressing it.

Also, hopefully some of these attributes can be used to "fingerprint" Archive Utility's ZIP files.

Overall, this makes me hopeful that implementing this will not be too much of a nightmare, and should be pretty robust in terms of avoiding screwing up valid files.

What I'm going to do next

I have a large collection of ZIP files from a bunch of different sources (by accident - just haven't had time to clear them off my drives). Many of these will no doubt have been created with Archive Utility, and many are 4GB+. I also have the unzipped contents of each (unzipped with various different tools). In many cases the unzipped files are in formats which have built-in checksums so I can be confident they've been unzipped correctly.

I plan to run yauzl on all of these files and see what it fails on. That'll give us a corpus of files to test the new version on once Archive Utility workarounds have been implemented.

I'll let you know where I get to.

NB I have no idea what the "major non-software issues" you are facing at the moment may be. But just to say, please don't feel the need to reply. I appreciate the attention you've given this issue, and it can wait until you're back in action.

@overlookmotel
Contributor

I've done some work on this.

Implementation

I'm afraid I couldn't resist getting into it fully. I've made an implementation that parses faulty Mac Archive Utility ZIP files. It is here: https://github.com/overlookmotel/yauzl-mac

It's not published on npm as I intended it as a proof of concept and testing ground for seeing if this could work. Obviously I hope it'll be useful to you in adding an implementation of this feature to yauzl proper.

My implementation is very hacky as most of yauzl's guts are not exposed on the public API. So, for example, it has to use events-intercept to capture emitted 'entry' events and do some more work (locating the end of the file) before re-emitting the event.

The guts of it are in:

  • lib/open.js - locating the start of the Central Directory
  • lib/readEntry.js - locating the end point of each file for files over 4 GiB in size
  • lib/openReadStream.js - validating the uncompressed size against only the lower 32 bits of what's stated

It depends on a change to yauzl which exposes more details about the Central Directory on ZipFile objects which I've submitted as PR #78.

The modifications don't do anything unless .open() is called with the supportMacArchiveUtility option set.
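
For anyone who wants to try it, usage is meant to mirror yauzl's own open() call, just with the extra option switched on. A minimal sketch (assuming the rest of the API behaves like yauzl's):

  const yauzl = require('yauzl-mac');

  // Without supportMacArchiveUtility the behavior is unchanged from yauzl.
  yauzl.open('archive.zip', { lazyEntries: true, supportMacArchiveUtility: true }, (err, zipfile) => {
    if (err) throw err;
    zipfile.on('entry', (entry) => {
      console.log(entry.fileName, entry.compressedSize);
      zipfile.readEntry();
    });
    zipfile.readEntry();
  });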

Testing

I've tested it out on a collection of 331 random zip files. Most are large (10GB+), and about 150 of them are Mac Archive Utility-created ZIPs which yauzl was not able to unzip.

With the implementation linked above, it can unzip all of the files except 3. I've compared all of these against the results of unzipping with the command-line unzip, ditto and unar (I had to use different programs to successfully unzip different files), and they all come out identical.

Of the 3 failing ZIP files, 2 are seemingly Mac Archive Utility ZIP files, but they have no Central Directory at all. The files end straight after the last Data Descriptor. These are fully corrupt and I don't think yauzl should attempt to unzip them. I don't know how these files were made but I wonder if the zipping process errored out before finishing. Weird!

The last failing file is a strange case. It's not a Mac Archive Utility file, but I haven't figured out what's wrong with it yet.

The vast majority of the files I've tested are ZIPs containing less than 10 files. However, this implementation also successfully extracts all the files from the ZIP file linked to in #60 which has 71,609 files.

Please let me know what you think when you get a chance.

@thejoshwolfe
Owner

@overlookmotel awesome! I took a look at your https://github.com/overlookmotel/yauzl-mac , but I didn't quite get to a point where I understand how it works.

Those test results are pretty impressive.

The files end straight after the last Data Descriptor.

Out of curiosity, how are you examining the corrupt zipfiles to determine things like this? I've been using a tool I made (that's undergoing an overhaul and translation into a native language here: https://github.com/thejoshwolfe/hexdump-zip ). Are you just using a general hexdump/viewer or are you using something specifically for zipfiles?

@overlookmotel
Contributor

Thanks!

As you said, Archive Utility is faulty, but faulty in an entirely predictable way. So actually, it wasn't hard to get it to parse the faulty files.

My implementation is more complex than it should be only because it's doing all kinds of weird things to monkey-patch yauzl from the outside. It'd be much cleaner implementing inside yauzl. If you're having trouble disentangling how my code works, please just shout.

Out of curiosity, how are you examining the corrupt zipfiles?

I was just using a file viewer (HexFiend on Mac OS). That's a pain, obviously, but I've not had to do much of it. It's easy to spot that the Data Descriptor is at the end of the file as it's so short.

hexdump-zip looks good. I'll have a play (once I've got my head around installing zig).

@overlookmotel
Contributor

overlookmotel commented Apr 24, 2018

By the way, if you'd prefer I release the Archive Utility parsing capability as a separate npm module, I'd be happy to do that. Or maybe that could be a temporary stopgap if you're not going to have time for this in near future?

If so, could you possibly merge #78 to make that possible?

@thejoshwolfe
Owner

Sorry for the delay. I'm going to work on this issue next week.

It's easy to spot that the Data Descriptor is at the end of the file as it's so short.

I discovered that Archive Utility does not consistently put Data Descriptors at the end of file contents. I can't see a pattern for when they are there and when they aren't.

once I've got my head around installing zig

A lot of effort has gone into making this painless for Linux and Windows. Unfortunately Mac is the difficult OS to get a Zig compiler on. If you have one of those other OS's, it might be worth downloading the precompiled, dependency-less binary that's auto built from zig's master branch here: https://ziglang.org/download/

@overlookmotel
Contributor

overlookmotel commented Apr 27, 2018

Sorry for the delay. I'm going to work on this issue next week.

No worries!

I discovered that Archive Utility does not consistently put Data Descriptors at the end of file contents. I can't see a pattern for when they are there and when they aren't.

The pattern I've seen is:

  • Files are compressed with deflate and do have Data Descriptor.
  • Folders are flagged as not compressed (but have size 0 anyway) and do not have Data Descriptor. Their paths end with /.
  • Symlinks are not compressed and do not have Data Descriptor.

Symlinks are stored like a file with the file content being the path the link points to.

Local File Headers:

  • Files & folders have Extra Fields length of 16
  • Symlinks have no Extra Fields

Central Directory:

  • Files & folders have Extra Fields length of 12
  • Symlinks have no Extra Fields

Weird that the Extra Fields in the Local File Header and Central Directory don't match!

I haven't found any Archive Utility zip files that don't fit this pattern (apart from the 2 I mentioned which have no Central Directory at all).

Have you seen files which don't fit this pattern?

@overlookmotel
Contributor

A lot of effort has gone into making this painless for Linux and Windows. Unfortunately Mac is the difficult OS to get a Zig compiler on. If you have one of those other OS's, it might be worth downloading the precompiled, dependency-less binary that's auto built from zig's master branch here: https://ziglang.org/download/

Ah bugger. I am Mac!

@andrewrk
Collaborator

@thejoshwolfe you can cross compile a macos hexdump-zip binary like this:

zig build-exe --library c src/hexdump-zip.zig --target-os macosx --target-arch x86_64

I just tried it, it works. Here it is
hexdump-zip.zip

@overlookmotel
Contributor

overlookmotel commented Apr 27, 2018

@andrewrk Unfortunately I can't run that on Mac OS Sierra 10.12.6. It errors. But this isn't the right issue to list the errors etc. Where is?

@thejoshwolfe
Owner

Extra fields always contains an entry with ID 22613 (values differ, and no idea what this is)

This is mentioned in the spec:

0x5855        Info-ZIP UNIX (original, also OS/2, NT, etc)

This is a struct that stores some unix file info, like mtime, atime, and uid.
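
For the curious, the extra field area in a header is just a sequence of (id, size, data) records, so listing the IDs is straightforward. A sketch (buf is assumed to be the raw extra field bytes from a local or central directory header):

  // Walk the raw extra field bytes and collect the 16-bit header IDs.
  function listExtraFieldIds(buf) {
    const ids = [];
    let pos = 0;
    while (pos + 4 <= buf.length) {
      const id = buf.readUInt16LE(pos);        // e.g. 0x5855 (22613), Info-ZIP UNIX
      const size = buf.readUInt16LE(pos + 2);  // length of this field's data
      ids.push(id);
      pos += 4 + size;
    }
    return ids;
  }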

@BenV

BenV commented Oct 1, 2018

Hey all, I was wondering if there is any additional work planned on this problem? We are running into a similar issue with these ZIPs from Archive Utility, and it would be great if there were a way to extract them. Happy to help out in any way.

@overlookmotel
Contributor

I've released this on npm now: yauzl-mac.

Please let me know how you get on.

@overlookmotel
Contributor

@thejoshwolfe Would you please consider merging #78? At present yauzl-mac has to depend on a fork of yauzl to access some details of the central directory that it needs to parse the file. Obviously it'd be ideal if it could depend on yauzl itself instead.

@BenV

BenV commented Oct 4, 2018

@overlookmotel Thanks so much for publishing this to npm. I had been following your efforts and planned on using your module as a stop-gap until this library officially adds support. I plan to test this out, hopefully this week, and will let you know how it goes. I have high hopes, since it sounds like you've done pretty extensive testing already.

@thejoshwolfe
Owner

@overlookmotel I gotta admit https://github.com/overlookmotel/yauzl-mac is really hard to read. I wonder if just forking yauzl would have made more sense. Ah well.

I believe this is the set of fields we have to check for overflow:

  • offset of start of central directory with respect to the starting disk number - when the total compressed size of the files is over 4GiB.
  • size of the central directory - when the central directory entry metadata including file names is over 4 GiB.
  • total number of entries in the central directory - when there's more than 0xffff entries.
  • relative offset of local header - when there's at least 4 GiB of data (file contents plus metadata) before a local file header.

Did I miss any?

The minSize/maxSize calculation you're doing is intriguing, but I'm worried that it's just too complicated for yauzl. If we're going to heuristically support Mac Archive Utility zipfiles, we should enable it by default, and it can't be perfect. There are only so many guards that we can put in to prevent falsely identifying a Mac zipfile, and the min/max heuristic might not be worth the benefit.

One thing I'd really like to get going with this is the ability to test code without lugging around enormous zip files. I've come up with a plan for a test harness that should fit reasonably into git and hopefully be performant enough to run by default with npm test.

I didn't make as much progress on this tonight as I was hoping, but I'll continue work on this shortly.

@overlookmotel
Contributor

overlookmotel commented Oct 5, 2018

Yes, I'm sorry it is hard to read.

Probably I should have just forked yauzl. However, I generally avoid forks as then there's a burden of keeping it up to date, and you'd made it clear that you wouldn't accept a PR as you wanted to implement yourself (fair enough). Really I was just doing it because I got interested and wanted to solve the problem for the sake of solving it, so I did it as code external to yauzl.

I think the fields you list above are the only ones that need checking, apart from a few others if you try to fingerprint Mac OS ZIP files.

My logic with the fingerprinting of Mac OS Archive files is:

  • Mac ZIPs are corrupt
  • Parsing them involves trial-and-error (look for central directory and file headers in several places)
  • That trial-and-error means more IO calls. You want to avoid that when you can see it's not necessary because it's not a Mac ZIP.
  • It's possible (though with vanishingly small probability) that a properly made ZIP file could be incorrectly parsed due to trying to treat it as a Mac ZIP. I wanted to make damn sure that supporting Mac ZIPs doesn't in any way impact other ZIP files which are correctly formed.

The minSize/maxSize logic is perhaps over-complicated. Just so you understand my motivation: I am primarily using yauzl on ZIP files which are not stored locally, so every IO call requires an HTTP request. Therefore trial-and-error searching costs me more than it would the average user, and I wanted to keep it to an absolute minimum.

It'd be completely legitimate for you to decide that the complexity vs performance trade-off isn't worth it for the common use case and to skip this.

Testing: Yes, I've struggled to come up with a good way of writing tests without having to have massive ZIP files to test it on. Very intrigued as to what solution you've come up with!

Just to say, I am more than happy to help with this. I would happily turn my implementation into a PR, but you've said you don't want that. But if you'd like a second pair of eyes on your implementation, or for me to talk you through my spaghetti code, please just say.

@overlookmotel
Contributor

Also, when you do have a working implementation, I'd be happy to run it on my large collection of ZIPs to flush out any bugs.

@overlookmotel
Contributor

overlookmotel commented Oct 5, 2018

@BenV Just so you know, yesterday I rebased PR #78 (which yauzl-mac depends on) onto latest yauzl master, and I haven't tested again since then.

I think it's unlikely, but it's possible that might have introduced a bug. If it has, it'll be a syntax error / total crash kind of bug, not a nasty insidious one that wrecks your ZIP files - so don't panic! But if you find any problems please post an issue and I'll fix asap.

@BenV

BenV commented Oct 5, 2018

@overlookmotel I attempted to use your library, but ran into some problems due to the fact that it is reading entries before I have a chance to install my own error handler.

We have some other invalid zips that have absolute paths and we want to ignore those entries, and unfortunately that no longer works when I use yauzl-mac since the absolute path errors are thrown before I have a chance to install my error handler.

I'll play around with it a bit more and see if I can come up with a workable solution, but if you have any ideas of what I should do to move forward it would be appreciated.

@BenV

BenV commented Oct 5, 2018

@overlookmotel I was able to get things working by adding a hacky errorFilter option into a yauzl fork that allows me to ignore certain errors and automatically call readEntry again. That prevents the error event from being emitted when I don't want it to (for absolute/relative paths), so I now have a yauzl-mac setup with all the fixes I need. I'll let you know if I run into any issues.

@overlookmotel
Contributor

Hi @BenV.

Could you possibly raise this as an issue at https://github.com/overlookmotel/yauzl-mac/issues ? I don't quite understand what you're saying, so I'm keen to ask you questions and fix it. But I don't want to crowd this thread with talk about something not directly related to yauzl.

@thejoshwolfe
Owner

Very intrigued as to what solution you've come up with!

See #92

@overlookmotel
Contributor

For anyone facing the same problems with Mac OS ZIP files, please see yauzl-promise.

v4, just published, is a from-scratch reimplementation of yauzl with a promise-based API, and supports Mac OS ZIP files.

@thejoshwolfe
Owner

Sorry yall. This issue completely killed my enthusiasm for this project. I have finally returned, but the cost was that I'm never going to support Mac Archive Utility's corrupt zip files.

If anyone finds a way to file a bug report against Apple, please let me know, and I'll advise people to upvote it or something.

@thejoshwolfe closed this as not planned on Feb 15, 2024
@overlookmotel
Contributor

overlookmotel commented Feb 15, 2024

Hi @thejoshwolfe. I'm really sorry. I feel responsible.

In the intervening time, I've published a fork, yauzl-promise, which does support faulty Mac OS zips, but it also changes the API to one based on promises and makes some other changes. That's gained a bit of traction, but the download numbers are completely dwarfed by those of your original yauzl. So I'm sure many will prefer the original API and will massively appreciate you being back on the scene.

Again, sorry I pushed this issue and contributed to you losing enthusiasm. Very glad you're back.

@danielweck

danielweck commented Feb 15, 2024

Thanks @thejoshwolfe (and @overlookmotel too) for supporting/maintaining this open source codebase. Best wishes, Dan

@thejoshwolfe
Owner

@overlookmotel you are absolutely not responsible for my feelings, haha. You've done great work here. I really appreciate your enthusiasm. If anyone's responsible for my burnout, it's me for not being more empathetic to myself (therapy is great, yall; I highly recommend therapy.). And the source of the burnout isn't anyone in this community; it's Apple. 🙂

@overlookmotel
Contributor

Thanks. That's kind of you. And +1 on therapy!

@nrutman

nrutman commented Mar 13, 2024

@thejoshwolfe and @overlookmotel, I wanted to follow up on this thread because I'm using yauzl and coming up against the same OSX issues described here. With the increasing adoption of Macs/OSX, I'm concerned that users of my app may try to use OSX-generated zip files and run into the problems this thread describes.

@overlookmotel, I mostly wanted to see if you are continuing to maintain yauzl-promise and whether some of the recent changes to yauzl will make their way there, or whether that library is no longer being maintained. (It looks like nothing was committed in the past 10 months.)

Unfortunately, it seems like Apple's problems are a reality for anyone who wants to provide unzip functionality in the marketplace. I'm just trying to figure out the best way forward for my project.

@overlookmotel
Contributor

It looks like nothing was committed in the past 10 months.

Nothing has been committed because no one has reported any bugs! If there are any new features of yauzl which you feel are important for yauzl-promise to replicate, feel free to raise an issue (or, even better, make a PR).

thejoshwolfe has been very clear that he does not intend to support faulty macOS ZIP files, so I'd suggest that if it's a problem for you (as it was for me), you use yauzl-promise instead.
