problems with zip support #586
Comments
Hey, sorry i somehow missed the notification email about this. Will have a look. |
Had quick look at
In this file it seems to be 0x35 🤔 Do you know if there is some other way to detect it? Or could this be some leftover data etc. because of how the zip writer is implemented?
$ fq -o line_bytes=8 '(.end_of_central_directory_record | d), (.gap0 | dd)' zip64.zip
│00 01 02 03 04 05 06 07│01234567│.end_of_central_directory_record{}:
0xb0│50 4b 05 06 │PK.. │ signature: raw bits (valid)
0xb0│ 00 00 │ .. │ disk_nr: 0
0xb0│ 00 00│ ..│ central_directory_start_disk_nr: 0
0xb8│01 00 │.. │ nr_of_central_directory_records_on_disk: 1
0xb8│ 01 00 │ .. │ nr_of_central_directory_records: 1
0xb8│ 2f 00 00 00│ /...│ size_of_central_directory: 47
0xc0│35 00 00 00 │5... │ offset_of_start_of_central_directory: 53
0xc0│ 00 00│ │ ..│ │ comment_length: 0
│ │ │ comment: ""
│00 01 02 03 04 05 06 07│01234567│
0x60│ 50 4b 06 06│ PK..│.gap0: raw bits
0x68│2c 00 00 00 00 00 00 00│,.......│
0x70│1e 03 2d 00 00 00 00 00│..-.....│
0x78│00 00 00 00 01 00 00 00│........│
0x80│00 00 00 00 01 00 00 00│........│
0x88│00 00 00 00 2f 00 00 00│..../...│
0x90│00 00 00 00 35 00 00 00│....5...│
0x98│00 00 00 00 50 4b 06 07│....PK..│
0xa0│00 00 00 00 64 00 00 00│....d...│
0xa8│00 00 00 00 01 00 00 00│........│ |
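For reference, the `.gap0` bytes in the dump above line up with a Zip64 EOCD record (56 bytes) followed by a Zip64 EOCD locator (20 bytes). The following is a minimal sketch that decodes them with Python's `struct` module. The field names and the `decode_gap0` helper are made up for illustration and are not fq's; the layout follows the field order of the ZIP appnote.

```python
import struct

# Bytes 0x64..0xaf of zip64.zip, copied from the .gap0 dump above.
GAP0 = bytes.fromhex(
    "504b0606" "2c00000000000000"          # signature, size of record
    "1e03" "2d00" "00000000"               # version made by / needed, disk nr
    "00000000" "0100000000000000"          # CD start disk, records on disk
    "0100000000000000" "2f00000000000000"  # total records, CD size
    "3500000000000000"                     # CD offset
    "504b0607" "00000000"                  # locator signature, disk nr
    "6400000000000000" "01000000"          # Zip64 EOCD offset, total disks
)

ZIP64_EOCD = struct.Struct("<4sQHHIIQQQQ")   # 56 bytes, little-endian
ZIP64_LOCATOR = struct.Struct("<4sIQI")      # 20 bytes, little-endian


def decode_gap0(data: bytes):
    """Decode a Zip64 EOCD record directly followed by its locator."""
    (sig, size, made_by, needed, disk_nr, cd_disk,
     on_disk, total, cd_size, cd_offset) = ZIP64_EOCD.unpack_from(data, 0)
    if sig != b"PK\x06\x06":
        raise ValueError("not a Zip64 EOCD record")
    loc_sig, loc_disk, eocd64_offset, disks = ZIP64_LOCATOR.unpack_from(
        data, ZIP64_EOCD.size)
    if loc_sig != b"PK\x06\x07":
        raise ValueError("not a Zip64 EOCD locator")
    record = {"size": size, "records": total,
              "cd_size": cd_size, "cd_offset": cd_offset}
    locator = {"zip64_eocd_offset": eocd64_offset, "total_disks": disks}
    return record, locator


record, locator = decode_gap0(GAP0)
print(record, locator)
```

Decoded this way, the 64-bit central directory size and offset match the 32-bit values in the standard EOCD (47 and 0x35), and the locator points back at 0x64, where the gap starts.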
You can identify a zip64 by the existence of the zip64 EOCD, which you can recognize by the 'PK\06\06'. |
Sorry, that was slightly wrong: You look for the Zip64 end of central directory locator (0x07064b50) directly before the standard ZIP EOCD that points you to the Zip64 EOCD and then you can parse from there. |
Yes, that helps, and that is actually how the decoder works now: it heuristically seeks backwards from the end for the "zip32" EOCD signature. So should it look for both and decode both, but prefer the zip64 one? Otherwise there might be a zip32 gap, I guess? |
No, in a zip64 there are both. From the appnote:
So at the end you always have a zip(32) EOCD. If you go backwards from that and find a zip64 EOCD locator, it's a zip64 and you need to parse the zip64 EOCD (which it tells you how to find). If you find a central directory header instead, you know it's not a zip64. |
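The lookup described above could be sketched roughly like this. It is only an illustration, not fq's decoder: `find_zip64_eocd_offset` is a hypothetical helper, and using `rfind` for the EOCD signature is a simplification that can be fooled by a signature inside the archive comment.

```python
import struct

EOCD_SIG = b"PK\x05\x06"            # standard zip(32) EOCD
ZIP64_LOCATOR_SIG = b"PK\x06\x07"   # 0x07064b50, little-endian on disk
ZIP64_LOCATOR_SIZE = 20


def find_zip64_eocd_offset(data: bytes):
    """Return the offset of the Zip64 EOCD record, or None for plain zip(32)."""
    # Every archive ends with a zip(32) EOCD, so start from there.
    eocd = data.rfind(EOCD_SIG)
    if eocd < 0:
        raise ValueError("no EOCD signature found")

    # A Zip64 EOCD locator, if present, sits directly before the EOCD.
    loc = eocd - ZIP64_LOCATOR_SIZE
    if loc < 0 or data[loc:loc + 4] != ZIP64_LOCATOR_SIG:
        return None  # something else (e.g. a central directory header)

    # signature, disk with Zip64 EOCD, offset of Zip64 EOCD, total disks
    _sig, _disk, zip64_eocd_offset, _disks = struct.unpack_from(
        "<4sIQI", data, loc)
    return zip64_eocd_offset
```

For the zip64.zip dump earlier in the thread, the EOCD is at 0xb0, the locator at 0x9c, and the returned offset would be 0x64.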
Aha, I see, thanks. Will look into a fix soon, let me know if you want to have a go at it. The code looks a bit messy now, so it could probably be cleaned up and refactored a bit as well. |
Had a look at I did Will have a look at the toml code (uses github.com/BurntSushi/toml) and try to figure out some way to fail toml decoding faster. |
No hurry. Since I'm already maintaining a zip library, I'll leave this one up to you ;) Interesting analysis about the toml decoder. I had thought files would be decoded more on-demand, not all in advance. Good to know! |
I was a bit quick to judge: toml is slow but finishes. It actually seems to be the xml decoder that eats a lot of memory for some reason, hmm. Currently I'm not doing any on-demand decoding. I have thought about it and it would be interesting to look into, but fq's decode and jq code is quite complex as it is :) so we will see. The focus has mostly been on making things possible rather than on speed and efficiency. But some formats do have options to disable sub-decoding, the zip format should probably have |
encoding/xml and github.com/BurntSushi/toml both read a lot before detecting that they can't decode. Now we instead read one UTF-8 character and make sure it's valid xml or toml. Should speed up probing. Related to #586 bigzero-zip.zip
#594 makes decoding bigzero-zip.zip quite a lot faster, but it will still use some CPU and memory as it uncompresses to memory. This will also speed up probing in many other cases. If you're curious, fq does not have any special probe code; instead it's up to the decoders that are in the "probe" group to fail fast. |
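To illustrate the fail-fast idea, here is a rough sketch of a first-character check for XML. The thread does not show the exact check fq performs, so this is an assumption: it decodes only the first UTF-8 character and tests it against the `Char` production of XML 1.0. The function name is hypothetical.

```python
def first_char_is_valid_xml(buf: bytes) -> bool:
    """Decode only the first UTF-8 character and test it against XML 1.0 Char."""
    if not buf:
        return False
    # A UTF-8 encoded character is 1 to 4 bytes long; try each prefix length.
    for n in range(1, 5):
        try:
            ch = buf[:n].decode("utf-8")
        except UnicodeDecodeError:
            continue  # incomplete or invalid so far, try a longer prefix
        cp = ord(ch)
        # XML 1.0: Char ::= #x9 | #xA | #xD | [#x20-#xD7FF]
        #                 | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
        return (
            cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF
        )
    return False  # no valid UTF-8 character at the start of the buffer
```

With a check like this, input that starts with a NUL byte (as the uncompressed content of bigzero-zip.zip presumably does, going by its name) is rejected after looking at a single byte instead of after reading a large part of the input.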
There will always be a zip(32) EOCD but optionally also a zip64 EOCD. Related to #586
Both issues should be resolved now. Give it a try if you can. I noticed that there were some more (non-broken) files from the libzip regression tests that fq doesn't like, maybe I will have a look at those also. |
Thanks, the zip64 eocd parsing looks good now.
|
Oh yes i messed up, this should fix it #596 |
All good now, thank you! |
🥳 thanks for reporting and nice bug report |
You added fq to NetBSD ports? Thanks for that. Were there no issues? I have no idea how the golang support is on *BSDs. Also, I can add you to the list of people I notify when doing a new release, if you want? |
Yes, I did; and pkgsrc is portable and also used e.g. on Illumos, macOS, Linux and other operating systems :) Golang support in pkgsrc, and on NetBSD in particular, is quite good. Sure, let me know about new releases. Thank you |
Aha didn't know 👍 could possibly add it to the fq README
Good to know. The only thing i'm a bit worried about fq-wise is the readline module which has some os-specific code, but the REPL seems to work fine?
Will do |
Yes, the REPL works fine. (NetBSD is the upstream for editline (a BSD licensed readline).) |
Good, fq uses a fork of https://github.com/chzyer/readline which is similar to libreadline etc. but implemented in Go, so it is very convenient to use and build. Trying to stay away from cgo to not have to deal with C build issues... have enough of those in my life anyway :) |
I tried fq on zip archives, and I had two problems so far:
using https://github.com/nih-at/libzip/blob/main/regress/bigzero-zip.zip