Comments are CP437? #19

Closed
kirkouimet opened this issue Jul 10, 2015 · 2 comments

@kirkouimet

Hey @thejoshwolfe,

First off, you are awesome. Love your approach with this library.

Going through the code, I was curious why you are decoding comment text using CP437 encoding. I couldn't find a reference to this encoding in the PKWARE spec. Shouldn't UTF-8 just work fine?

@thejoshwolfe
Owner

Here is the zip file spec I followed: https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT

And a quote from Appendix D:

The ZIP format has historically supported only the original IBM PC character
encoding set, commonly referred to as IBM Code Page 437.

The appendix goes on to explain how to specify UTF-8 encoding for the file names and comments of individual files. There is no zipfile-wide charset specification and no way I can find to specify the charset of the zipfile comment. My interpretation of this situation is that the zipfile comment is always the "default" encoding of CP437.
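
To make that interpretation concrete, here is a minimal Python sketch of the decoding rule. It is not this library's actual code. The function names are hypothetical, and it assumes the per-entry UTF-8 marker is bit 11 (0x0800) of the general purpose bit flag, as described in APPNOTE.TXT.

```python
# Bit 11 (0x0800) of an entry's general purpose bit flag marks that
# entry's file name and comment as UTF-8 (per APPNOTE.TXT).
UTF8_FLAG = 0x0800


def decode_entry_text(raw: bytes, general_purpose_flags: int) -> str:
    """Decode a per-entry file name or comment (hypothetical helper)."""
    encoding = "utf-8" if general_purpose_flags & UTF8_FLAG else "cp437"
    return raw.decode(encoding)


def decode_archive_comment(raw: bytes) -> str:
    """Decode the zipfile-wide comment (hypothetical helper).

    There is no flag that covers this field, so this sketch always
    falls back to the historical default, CP437.
    """
    return raw.decode("cp437")
```

Under this rule, a zipfile comment that was actually written as UTF-8 will come out garbled for any non-ASCII characters, while plain ASCII text decodes identically either way because CP437 and UTF-8 agree on the printable ASCII range.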

@kirkouimet
Author

Thanks for this thoughtful and comprehensive response, and thanks for all of your work on this project.
