Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for archive + file comments #44

Merged
merged 3 commits into from Nov 15, 2018

Conversation

mojodna
Copy link
Contributor

@mojodna mojodna commented Nov 12, 2018

This adds support (and tests) for archive and file comments. Archive comments are set by providing comment in .end()'s options. File comments are set by providing fileComment when adding files.

@thejoshwolfe
Copy link
Owner

Thanks for the PR. What's the usecase for adding comments?

This has been proposed and rejected before in #15. I'm open to adding support for this if there's a real usecase.

@mojodna
Copy link
Contributor Author

mojodna commented Nov 12, 2018

Short version: application-specific metadata as part of the central directory.

I’m building out the JavaScript tooling for https://medium.com/@mojodna/tapalcatl-cloud-optimized-tile-archives-1db8d4577d92, particularly the archive generation piece. A preliminary survey of unzip libraries across multiple platforms showed support for comments and they fit the use-case really well, since a single read gets both the central directory and application-specific metadata.

The facilitating library is here: https://github.com/mojodna/tapalcatl-js

Thanks to RandomAccessReader, doing block-aligned remote reads was totally straightforward to implement. Thanks!

@thejoshwolfe
Copy link
Owner

Sounds like a compelling usecase to me!

And thanks for this high-quality PR! I'll work on merging this tonight.

@mojodna
Copy link
Contributor Author

mojodna commented Nov 15, 2018

Awesome, thanks!

@thejoshwolfe
Copy link
Owner

So here's something disappointing about the zipfile spec and ecosystem: technically the zipfile comment is supposed to be encoded in CP437, but in practice I've only seen software that treats it as UTF-8. The overlap between UTF-8 and CP437 is the codepoints 0x20...0x7e (printable ASCII, no whitespace except single space ' '), which means if you restrict your file comment to those characters, everything will work. If you don't, things might work?

Do you need newlines (or hard tabs, or other ascii control characters, or non-ascii unicode characters) in your zipfile comment? For example, the python json.dumps(obj) will meet this restriction, but JavaScript's JSON.stringify(obj) will not (because it allows non-ascii unicode characters), and any pretty-printed json will also not meet this restriction, because of newlines.

Note that the file comments on entries in the zipfile can be UTF-8, so there's no problem there.

@thejoshwolfe thejoshwolfe merged commit 09a58cf into thejoshwolfe:master Nov 15, 2018
@thejoshwolfe
Copy link
Owner

oops. accidentally merged. .... this PR probably won't reopen will it 🤔

@@ -606,7 +608,7 @@ function dateToDosDateTime(jsDate) {
}

function writeUInt64LE(buffer, n, offset) {
// can't use bitshift here, because JavaScript only allows bitshiting on 32-bit integers.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💩💩💩

@thejoshwolfe
Copy link
Owner

Merged in #45. Thanks again @mojodna!

@thejoshwolfe
Copy link
Owner

@mojodna I published yazl version 2.5.0 which includes this feature. Please let me know how it works for you.

The CP437 situation is pretty silly. I'm going to great lengths to adhere to an obscure part of the spec that no one else seems to care about. I'm seriously considering giving up on cp437 entirely both in yazl and yauzl. 🤔

@andrewrk
Copy link
Collaborator

The CP437 situation is pretty silly. I'm going to great lengths to adhere to an obscure part of the spec that no one else seems to care about. I'm seriously considering giving up on cp437 entirely both in yazl and yauzl. 🤔

I have a suggestion:

Create your own zip file specification. Give it a new catchy name something like "Frenzied Zip File Format Specification" and make the tweaks you want, such as clearing up ambiguity, declaring that cp437 is only used in certain circumstances, whatever. Then make it clear that most zip file readers are actually adhering to Frenzied Zip File Format Specification anyway, not the "official" specification. Make a blog post talking about the problems with the official specs, and promote your fork spec, and point out that most people are using your spec anyway, we should just all admit it and agree this is the new standard.

@thejoshwolfe
Copy link
Owner

The Mac Archive Utility can't read self extracting zip files. So not everyone is on the same page in the wild.

I guess I could just have a special section for Archive Utility zipfiles, since they're so different anyway. A blog post is something I've been meaning to do for years though.

@mojodna
Copy link
Contributor Author

mojodna commented Nov 18, 2018

@thejoshwolfe awesome, thanks so much!

I did a bit of experimentation to see how various tools / libs handle UTF-8 encoded comments. The short version is that all the ones I've tested just pass the bytes through (especially Python 3, where it comes back as bytes) and don't worry about their encoding.

Archive creator:

var fs = require("fs");
var yazl = require("yazl");
var zipfile = new yazl.ZipFile();
zipfile.outputStream.pipe(fs.createWriteStream("test.zip"));
zipfile.addFile("package.json", "package.json", {
  fileComment: "abcd⎋⌘👋"
});
zipfile.end({ comment: Buffer.from("abcd⎋⌘👋") })

unzip:

$ unzip -v test.zip
Archive:  test.zip
abcd⎋⌘👋
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
     694  Defl:N      333  52% 11-17-2018 17:26 1dd024d9  package.json
abcd⎋⌘👋
--------          -------  ---                            -------
     694              333  52%                            1 file

Python 2:

>>> import zipfile
>>> z = zipfile.ZipFile("test.zip")
>>> z.comment
'abcd\xe2\x8e\x8b\xe2\x8c\x98\xf0\x9f\x91\x8b'
>>> print z.comment
abcd⎋⌘👋

Python 3:

>>> import zipfile
>>> z = zipfile.ZipFile("test.zip")
>>> z.comment
b'abcd\xe2\x8e\x8b\xe2\x8c\x98\xf0\x9f\x91\x8b'
>>> print(z.comment)
b'abcd\xe2\x8e\x8b\xe2\x8c\x98\xf0\x9f\x91\x8b'
>>> print(z.comment.decode("utf-8"))
abcd⎋⌘👋

JDK 8 (via Scala 2.12.7):

scala> import java.util.zip.ZipFile
import java.util.zip.ZipFile

scala> val z = new ZipFile("test.zip")
z: java.util.zip.ZipFile = java.util.zip.ZipFile@1d23ff23

scala> z.getComment()
res1: String = abcd⎋⌘👋

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants