Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[wip] decodeFileData and canDecodeFileData #82

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

thejoshwolfe
Copy link
Owner

Here is a documented implementation (but no tests yet) for an alternative to decompress: false called decodeFileData: false. This new option simply ignores all the metadata about an entry's file data encoding including compressionMethod and generalPurposeBitFlags (for encryption), and gives you the raw file data as it is encoded in the zipfile.

@maxogden , Do you have an opinion on this new API? It's a departure from the ideas in #38, however I think this makes the start and end options easier to use.

The decompress and decrypt options become deprecated, and isEncrypted() is obsoleted by canDecodeFileData(), which takes into account encryption and compression method.

@overlookmotel
Copy link
Contributor

Sorry I'm so slow to come back - have been tied up with work. Thanks for finding a solution to my issue #80.

There's one use case which the old options could handle, which wouldn't be covered by this new API. If an entry is encrypted and compressed and the user wants to receive the unencrypted but compressed data, decodeFileData doesn't give this option. An example would be if the user has some means of decompressing Deflate64 so they want the compressed data but need it decrypted first.

In my experience, this is a very niche case. I've never actually come across an encrypted ZIP file myself. But it's part of the standard so could be good to support. Chances are as soon as you rule out something being practically useless, that's the moment when someone pops up and shouts "I need this!"

Personally I like this simpler new API, but I wonder if the API change is necessary. An alternative might be to make the existing options a bit more liberal. e.g.:

  • If entry data is not compressed, decompress option is ignored (no error)
  • If entry data is not encrypted, decrypt option is ignored (no error)
  • decompress and decrypt options default to true if null/undefined

I think this is produces the same default behavior as at present, but makes it simpler to use the API. If someone wants all file data decrypted and uncompressed, they use default options. If they want raw data for everything, they can set decompress and decrypt to false for all calls to openReadStream(), regardless of whether entries are compressed or encrypted or not, and they won't get any errors.

I'm not sure if this would constitute a breaking change or not.

But that's just a few thoughts I'm throwing in in case they're useful. I'm not strongly arguing for this approach.

@thejoshwolfe
Copy link
Owner Author

Just to be clear, yauzl does not support any decryption. The only allowed values for the decrypt option are false and null. I was thinking that perhaps yauzl could one day support decryption with decrypt: true, but at the moment it's a YAGNI violation. The reason the decrypt option was introduced at all was to support ignoring encryption end reading the raw data. So my new idea is to have an option explicitly for reading the raw data without any mention of encryption.

But it's part of the standard so could be good to support.

I disagree with this. Having read most of the details in the zipfile spec, there's a lot of useless junk in there that no one supports except the original author, for example compression methods 1-7. Fully supporting the zipfile spec is a fool's errand (and a waste of runtime resources).

For what it's worth, I really don't want to support decryption specifically due to the terrible performance of pure JavaScript at the kind of tight-loop byte math it would require, and I really don't want to introduce a native code dependency to do it in C either.

I'm always listening for people presenting compelling usecases for supporting more aspects of the spec, but I have yet to be convinced that decryption is worth the cost.

It's currently possible (and still will be after this proposal) to do any kind of decryption outside of yauzl by processing the raw bytes yourself. This is also true of unusual compression methods, as you know. yauzl has always been designed to introduce as little overhead as possible, so there shouldn't be any performance argument for doing these things inside vs outside yauzl. The question is if it's common enough (e.g. deflate encryption) or convenient enough (e.g. #66) to justify the cost of everyone getting the code to handle it.

@overlookmotel
Copy link
Contributor

Ah sorry I totally misunderstood. I thought yauzl did support decryption already.

I agree with you that there's no need to add decryption just because it's in the ZIP spec. I've never seen a ZIP file with encryption, and by the sounds of things the "basic encryption" is pretty insecure anyway, so no-one should be encouraged to use it.

So the use case I presented where user wants to decrypt but not uncompress does not apply - they can't do this anyway. So my criticism has no merit!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants