zip: Fix incorrect time/date, add extended timestamp and refactor #793

Merged: 1 commit into master from zip-correct-date-time-fields, Oct 23, 2023

@wader
Owner

wader commented Oct 21, 2023

The MS-DOS time and date were read in the wrong order, and the decoder did not take into account that the bit ranges are within little-endian shorts.

Remodel modification_time/date into one struct with fat_time and fat_date as LE shorts, followed by synthetic values for day, hour, minute etc., plus a unix field with the timestamp as unix time.
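
As a rough illustration of the bit layout described above, here is a minimal Python sketch (not fq's Go decoder) that reads the two little-endian shorts and extracts the bit ranges. The function name `decode_fat_datetime` is hypothetical, the sample bytes `0d b0 55 57` are the ones that appear in the dump posted later in this thread, and scaling the second by 2 and offsetting the year by 1980 is only one of the presentation choices discussed here.

```python
import struct


def decode_fat_datetime(raw: bytes) -> dict:
    """Decode 4 bytes: little-endian fat_time followed by little-endian fat_date."""
    # Read each 16-bit value as little-endian first, then slice bit ranges.
    fat_time, fat_date = struct.unpack("<HH", raw)
    return {
        "fat_time": fat_time,
        "second": (fat_time & 0x1F) * 2,          # bits 0-4, stored in 2-second units
        "minute": (fat_time >> 5) & 0x3F,         # bits 5-10
        "hour": (fat_time >> 11) & 0x1F,          # bits 11-15
        "fat_date": fat_date,
        "day": fat_date & 0x1F,                   # bits 0-4
        "month": (fat_date >> 5) & 0x0F,          # bits 5-8
        "year": ((fat_date >> 9) & 0x7F) + 1980,  # bits 9-15, years since 1980
    }


fields = decode_fat_datetime(bytes.fromhex("0db05557"))
print(fields)
```

Note that month and minute straddle the byte boundary, which is why the shorts have to be byte-swapped before the bit ranges make sense.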

Also refactor and clean up extra fields/extended code a bit.

Fixes #792

@TomiBelan

day, month, year, hour, minute look correct! Thanks!
seconds is divided by 2, as expected.
But, in unix: 1697925626 (2023-10-21T22:00:26Z), should it really be Z (Zulu time a.k.a. UTC)?

Example:
(notice 20:00 vs 22:00)

[machine ~]$ touch foo
[machine ~]$ stat foo
  File: foo
  Size: 0               Blocks: 0          IO Block: 4096   regular empty file
Device: 8,5     Inode: 30503682    Links: 1
Access: (0644/-rw-r--r--)  Uid: (50000/    user)   Gid: (50000/    user)
Access: 2023-10-21 22:00:25.451721086 +0200
Modify: 2023-10-21 22:00:25.451721086 +0200
Change: 2023-10-21 22:00:25.451721086 +0200
 Birth: 2023-10-21 22:00:25.451721086 +0200
[machine ~]$ zip file.zip foo
  adding: foo (stored 0%)
[machine ~]$ go run github.com/wader/fq@zip-correct-date-time-fields d file.zip
    |00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15|0123456789abcdef012345|.{}: file.zip (zip)
    |                                                                 |                      |  local_files[0:1]:
    |                                                                 |                      |    [0]{}: local_file
0x00|50 4b 03 04                                                      |PK..                  |      signature: raw bits (valid)
0x00|            0a 00                                                |    ..                |      version_needed: 10
    |                                                                 |                      |      flags{}:
0x00|                  00                                             |      .               |        unused0: 0
0x00|                  00                                             |      .               |        strong_encryption: false
0x00|                  00                                             |      .               |        compressed_patched_data: false
0x00|                  00                                             |      .               |        enhanced_deflation: false
0x00|                  00                                             |      .               |        data_descriptor: false
0x00|                  00                                             |      .               |        compression0: false
0x00|                  00                                             |      .               |        compression1: false
0x00|                  00                                             |      .               |        encrypted: false
0x00|                     00                                          |       .              |        reserved0: 0
0x00|                     00                                          |       .              |        mask_header_values: false
0x00|                     00                                          |       .              |        reserved1: false
0x00|                     00                                          |       .              |        language_encoding: false
0x00|                     00                                          |       .              |        unused1: 0
0x00|                        00 00                                    |        ..            |      compression_method: "none" (0)
    |                                                                 |                      |      last_modification{}:
0x00|                              0d b0                              |          ..          |        fat_time: 45069
    |                                                                 |                      |        second: 13
    |                                                                 |                      |        minute: 0
    |                                                                 |                      |        hour: 22
0x00|                                    55 57                        |            UW        |        fat_date: 22357
    |                                                                 |                      |        day: 21
    |                                                                 |                      |        month: 10
    |                                                                 |                      |        year: 43
    |                                                                 |                      |        unix: 1697925626 (2023-10-21T22:00:26Z)
0x00|                                          00 00 00 00            |              ....    |      crc32_uncompressed: 0x0
0x00|                                                      00 00 00 00|                  ....|      compressed_size: 0
0x16|00 00 00 00                                                      |....                  |      uncompressed_size: 0
0x16|            03 00                                                |    ..                |      file_name_length: 3
0x16|                  1c 00                                          |      ..              |      extra_field_length: 28
0x16|                        66 6f 6f                                 |        foo           |      file_name: "foo"
    |                                                                 |                      |      extra_fields[0:2]:
    |                                                                 |                      |        [0]{}: extra_field
0x16|                                 55 54                           |           UT         |          tag: 0x5455 (extended timestamp)
0x16|                                       09 00                     |             ..       |          size: 9
    |                                                                 |                      |          flags{}:
0x16|                                             03                  |               .      |            unused: 0
0x16|                                             03                  |               .      |            creation_time_present: false
0x16|                                             03                  |               .      |            access_time_present: true
0x16|                                             03                  |               .      |            modification_time_present: true
0x16|                                                d9 2d 34 65      |                .-4e  |          modification_time: 1697918425 (2023-10-21T20:00:25Z)
0x16|                                                            d9 2d|                    .-|          access_time: 1697918425 (2023-10-21T20:00:25Z)
0x2c|34 65                                                            |4e                    |
    |                                                                 |                      |        [1]{}: extra_field
0x2c|      75 78                                                      |  ux                  |          tag: 0x7875 (UNIX UID/GID)
0x2c|            0b 00                                                |    ..                |          size: 11
0x2c|                  01 04 50 c3 00 00 04 50 c3 00 00               |      ..P....P...     |          data: raw bits
    |                                                                 |                      |      uncompressed: raw bits
    |                                                                 |                      |  central_directories[0:1]:
    |                                                                 |                      |    [0]{}: central_directory
0x2c|                                                   50 4b 01 02   |                 PK.. |      signature: raw bits (valid)
0x2c|                                                               1e|                     .|      version_made_by: 798
0x42|03                                                               |.                     |
0x42|   0a 00                                                         | ..                   |      version_needed: 10
    |                                                                 |                      |      flags{}:
0x42|         00                                                      |   .                  |        unused0: 0
0x42|         00                                                      |   .                  |        strong_encryption: false
0x42|         00                                                      |   .                  |        compressed_patched_data: false
0x42|         00                                                      |   .                  |        enhanced_deflation: false
0x42|         00                                                      |   .                  |        data_descriptor: false
0x42|         00                                                      |   .                  |        compression0: false
0x42|         00                                                      |   .                  |        compression1: false
0x42|         00                                                      |   .                  |        encrypted: false
0x42|            00                                                   |    .                 |        reserved0: 0
0x42|            00                                                   |    .                 |        mask_header_values: false
0x42|            00                                                   |    .                 |        reserved1: false
0x42|            00                                                   |    .                 |        language_encoding: false
0x42|            00                                                   |    .                 |        unused1: 0
0x42|               00 00                                             |     ..               |      compression_method: "none" (0)
    |                                                                 |                      |      last_modification{}:
0x42|                     0d b0                                       |       ..             |        fat_time: 45069
    |                                                                 |                      |        second: 13
    |                                                                 |                      |        minute: 0
    |                                                                 |                      |        hour: 22
0x42|                           55 57                                 |         UW           |        fat_date: 22357
    |                                                                 |                      |        day: 21
    |                                                                 |                      |        month: 10
    |                                                                 |                      |        year: 43
    |                                                                 |                      |        unix: 1697925626 (2023-10-21T22:00:26Z)
0x42|                                 00 00 00 00                     |           ....       |      crc32_uncompressed: 0x0
0x42|                                             00 00 00 00         |               ....   |      compressed_size: 0
0x42|                                                         00 00 00|                   ...|      uncompressed_size: 0
0x58|00                                                               |.                     |
0x58|   03 00                                                         | ..                   |      file_name_length: 3
0x58|         18 00                                                   |   ..                 |      extra_field_length: 24
0x58|               00 00                                             |     ..               |      file_comment_length: 0
0x58|                     00 00                                       |       ..             |      disk_number_where_file_starts: 0
0x58|                           00 00                                 |         ..           |      internal_file_attributes: 0
0x58|                                 00 00 a4 81                     |           ....       |      external_file_attributes: 2175008768
0x58|                                             00 00 00 00         |               ....   |      relative_offset_of_local_file_header: 0
0x58|                                                         66 6f 6f|                   foo|      file_name: "foo"
    |                                                                 |                      |      extra_fields[0:2]:
    |                                                                 |                      |        [0]{}: extra_field
0x6e|55 54                                                            |UT                    |          tag: 0x5455 (extended timestamp)
0x6e|      05 00                                                      |  ..                  |          size: 5
    |                                                                 |                      |          flags{}:
0x6e|            03                                                   |    .                 |            unused: 0
0x6e|            03                                                   |    .                 |            creation_time_present: false
0x6e|            03                                                   |    .                 |            access_time_present: true
0x6e|            03                                                   |    .                 |            modification_time_present: true
0x6e|               d9 2d 34 65                                       |     .-4e             |          modification_time: 1697918425 (2023-10-21T20:00:25Z)
    |                                                                 |                      |        [1]{}: extra_field
0x6e|                           75 78                                 |         ux           |          tag: 0x7875 (UNIX UID/GID)
0x6e|                                 0b 00                           |           ..         |          size: 11
0x6e|                                       01 04 50 c3 00 00 04 50 c3|             ..P....P.|          data: raw bits
0x84|00 00                                                            |..                    |
    |                                                                 |                      |      file_comment: ""
    |                                                                 |                      |  end_of_central_directory_record{}:
0x84|      50 4b 05 06                                                |  PK..                |    signature: raw bits (valid)
0x84|                  00 00                                          |      ..              |    disk_nr: 0
0x84|                        00 00                                    |        ..            |    central_directory_start_disk_nr: 0
0x84|                              01 00                              |          ..          |    nr_of_central_directory_records_on_disk: 1
0x84|                                    01 00                        |            ..        |    nr_of_central_directory_records: 1
0x84|                                          49 00 00 00            |              I...    |    size_of_central_directory: 73
0x84|                                                      3d 00 00 00|                  =...|    offset_of_start_of_central_directory: 61
0x9a|00 00|                                                           |..|                   |    comment_length: 0
    |                                                                 |                      |    comment: ""
[machine ~]$ unzip -lv file.zip
Archive:  file.zip
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
       0  Stored        0   0% 2023-10-21 22:00 00000000  foo
--------          -------  ---                            -------
       0                0   0%                            1 file
[machine ~]$ bsdtar tvvf file.zip
-rw-r--r--  0 50000  50000       0 Oct 21 22:00 foo
Archive Format: ZIP 1.0 (uncompressed),  Compression: none
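
For comparison with the MS-DOS fields, the extended timestamp extra field (tag 0x5455) in the dump above stores unix times as little-endian 32-bit values, which is where the 20:00:25Z reading comes from. A minimal Python sketch of parsing it follows; `parse_extended_timestamp` is a hypothetical name, the input bytes are copied from the dump, and reading the times as unsigned is an assumption.

```python
import struct
from datetime import datetime, timezone


def parse_extended_timestamp(field: bytes) -> dict:
    """Parse one 0x5455 extra field: tag, size, flags, then 32-bit unix times."""
    tag, size = struct.unpack_from("<HH", field, 0)
    if tag != 0x5455:
        raise ValueError("not an extended timestamp field")
    data = field[4:4 + size]
    flags = data[0]
    result = {"flags": flags}
    offset = 1
    # Flag bits 0, 1, 2 announce mtime, atime, ctime in that order. The
    # central directory copy keeps the same flags but may store only mtime,
    # so stop reading when the data runs out.
    names = ("modification_time", "access_time", "creation_time")
    for bit, name in enumerate(names):
        if flags & (1 << bit) and offset + 4 <= len(data):
            (result[name],) = struct.unpack_from("<I", data, offset)
            offset += 4
    return result


# Local file header copy (size 9) and central directory copy (size 5).
local = parse_extended_timestamp(bytes.fromhex("5554090003d92d3465d92d3465"))
central = parse_extended_timestamp(bytes.fromhex("5554050003d92d3465"))
mtime_iso = datetime.fromtimestamp(local["modification_time"], tz=timezone.utc).isoformat()
print(local, central, mtime_iso)
```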

@wader force-pushed the zip-correct-date-time-fields branch from f79afdf to c571ed7 on October 21, 2023 21:46
@wader
Owner Author

wader commented Oct 21, 2023

Good catch. As I understand it, the last modification field that uses the MS-DOS format (not the extended timestamp) has no timezone? Is it local time? So I changed it to remove the "Z".

I also changed it so that the second field is now shown multiplied by 2. Is that less confusing?

@TomiBelan

Removing Z sounds good to me. This page also says MSDOS times don't have a timezone: http://fileformats.archiveteam.org/wiki/MS-DOS_date/time

About raw seconds or *2, what does fq do in other such cases? For what it's worth, year is also year+1980.
IMHO showing raw_second: 13, raw_year: 43, unix: ... (2023-10-21T22:00:26) would be a good solution.

Currently the output is:

    |                                               |                |      last_modification{}:
0x40|                           0d b0               |         ..     |        fat_time: 45069
    |                                               |                |        second: 26
    |                                               |                |        minute: 0
    |                                               |                |        hour: 22
0x40|                                 55 57         |           UW   |        fat_date: 22357
    |                                               |                |        day: 21
    |                                               |                |        month: 10
    |                                               |                |        year: 43
    |                                               |                |        unix: 1697925626 (2023-10-21T22:00:26)
...
    |                                               |                |      extra_fields[0:2]:
...
0x70|         d9 2d 34 65                           |   .-4e         |          modification_time: 1697918425 (2023-10-21T20:00:25Z)

I feel I'm nitpicking really minor details at this point, but 1697925626 is suboptimal. It should probably be unix: 1697918426 (2023-10-21T22:00:26). Here 2023-10-21T22:00:26 is literally from the zip and 1697918426 is a best guess assuming local timezone. Or not show the unix timestamp guesstimate at all, only the (...). But this is just a nitpick, the current version is also good enough for me.
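
The two candidate numbers can be reproduced with a short Python sketch. The +02:00 offset is taken from the stat output earlier in this thread; a decoder reading only the zip would have to guess it.

```python
from datetime import datetime, timedelta, timezone

# Wall-clock fields as decoded from the MS-DOS time/date in the zip.
naive = datetime(2023, 10, 21, 22, 0, 26)

# Interpretation 1: treat the fields as UTC, independent of local settings.
as_utc = int(naive.replace(tzinfo=timezone.utc).timestamp())

# Interpretation 2: assume the zone the archive was created in (+02:00 here).
# This offset is not stored in the MS-DOS fields; it is an assumption.
created_in = timezone(timedelta(hours=2))
as_local = int(naive.replace(tzinfo=created_in).timestamp())

print(as_utc, as_local, as_utc - as_local)
```

The difference is exactly the two-hour offset, and the local interpretation lands one second after the extended timestamp's 1697918425 because MS-DOS seconds have 2-second resolution.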

@wader
Owner Author

wader commented Oct 21, 2023

> Removing Z sounds good to me. This page also says MSDOS times don't have a timezone: http://fileformats.archiveteam.org/wiki/MS-DOS_date/time
>
> About raw seconds or *2, what does fq do in other such cases? For what it's worth, year is also year+1980. IMHO showing raw_second: 13, raw_year: 43, unix: ... (2023-10-21T22:00:26) would be a good solution.

How to "model" things is mostly up to each format decoder. Each field (the named thing in the tree) is tied to a "decode value" that consists of an optional backing bit buffer and range (a field without one is considered "synthetic"), an actual value, an optional symbolic value and a description. If a symbolic value is set it is the default value; otherwise the actual value is used in expressions etc. One can use toactual to access the actual value (there is also tosym, but it is less useful).

So in this case i can see some alternatives: (using year as example)

year: synthetic with actual value 2000
raw_year: synthetic with actual value 20

Or

year: synthetic with actual value 20 and symbolic value 2000

If the MS-DOS timestamp was not little-endian with non-byte-aligned bit ranges (month and minute are not continuous bit ranges, I think?), I would probably have modeled year as actual 20 and symbolic 2000. Maybe I should do that even when it is synthetic, then?

> Currently the output is:
>
>     |                                               |                |      last_modification{}:
> 0x40|                           0d b0               |         ..     |        fat_time: 45069
>     |                                               |                |        second: 26
>     |                                               |                |        minute: 0
>     |                                               |                |        hour: 22
> 0x40|                                 55 57         |           UW   |        fat_date: 22357
>     |                                               |                |        day: 21
>     |                                               |                |        month: 10
>     |                                               |                |        year: 43
>     |                                               |                |        unix: 1697925626 (2023-10-21T22:00:26)
> ...
>     |                                               |                |      extra_fields[0:2]:
> ...
> 0x70|         d9 2d 34 65                           |   .-4e         |          modification_time: 1697918425 (2023-10-21T20:00:25Z)
>
> I feel I'm nitpicking really minor details at this point, but 1697925626 is suboptimal. It should probably be unix: 1697918426 (2023-10-21T22:00:26). Here 2023-10-21T22:00:26 is literally from the zip and 1697918426 is a best guess assuming local timezone. Or not show the unix timestamp guesstimate at all, only the (...). But this is just a nitpick, the current version is also good enough for me.

No worries! Glad there is someone else to discuss this with; quite a lot of the time spent writing decoders is actually debating with oneself about such things :)

Guess here means using the locally configured timezone where fq is running? Yeah, that feels a bit shaky; I have a feeling it's good to keep the output independent of such things. So are the options to not include it at all, or to somehow clearly indicate the assumed timezone (UTC?)? Rename it to unix_utc or unix_guess etc.?

What did you mean by "only the (...)" btw?

@TomiBelan

Hmm, does that mean you can show "year: 23 (2023)"? But having both year and raw_year sounds good too.

Oh sorry, by "only the (...)" I meant remove the timestamp number and only keep the part that is currently shown between ( and ). I.e. (2023-10-21T22:00:26)

@wader
Owner Author

wader commented Oct 21, 2023

> Hmm, does that mean you can show "year: 23 (2023)"? But having both year and raw_year sounds good too.

Yep, but it would probably be the other way around: year: 2003 (23). If a symbolic value is set, it's displayed as <sym> (<actual>). Let's try that.

> Oh sorry, by "only the (...)" I meant remove the timestamp number and only keep the part that is currently shown between ( and ). I.e. (2023-10-21T22:00:26)

Aha i see, problem is the (2023-10-21T22:00:26) is the description so as it works now there has to be some kind of field to tie it to.
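
The value model discussed in this thread (an actual value, an optional symbolic value, and a description that needs a field to attach to) might be sketched roughly as below. This is purely illustrative Python: `DecodeValue`, `value` and `display` are hypothetical names and not fq's actual Go types or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DecodeValue:
    """Hypothetical stand-in for a field's value; not fq's real type."""
    actual: int
    sym: Optional[int] = None
    description: Optional[str] = None

    def value(self) -> int:
        # The symbolic value is the default when set, otherwise the actual.
        return self.actual if self.sym is None else self.sym

    def display(self) -> str:
        # Shown as "<sym> (<actual>)" when a symbolic value is set.
        text = str(self.actual) if self.sym is None else f"{self.sym} ({self.actual})"
        if self.description:
            text += f" ({self.description})"
        return text


year = DecodeValue(actual=43, sym=43 + 1980)
unix = DecodeValue(actual=1697925626, description="2023-10-21T22:00:26")
print("year:", year.display())
print("unix:", unix.display())
```

In this sketch the description cannot be shown on its own, which mirrors the constraint mentioned above: there has to be some field for it to be tied to.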

@wader force-pushed the zip-correct-date-time-fields branch 3 times, most recently from 0403a55 to a148f76, on October 23, 2023 09:06
@wader
Owner Author

wader commented Oct 23, 2023

Added a description to unix_guess and added some docs about time zones. Looks ok?

@wader force-pushed the zip-correct-date-time-fields branch from a148f76 to a83cac6 on October 23, 2023 09:11
@TomiBelan

Looks ok.

My main disagreement is that in the grand question of "use UTC (thus, behave differently than what unzip programs would do) or use local time (thus, fq output depends on user's timezone)?" I'm still not so sure I'd pick UTC... but I see the arguments in favor, and the current output is a good realization of that choice.

@wader
Owner Author

wader commented Oct 23, 2023

Thanks for reviewing. I'm trying to convince myself if it's a good or bad idea for the output to depend on local settings or not, feels somehow safer and less surprising to not?

What would you pick instead of UTC?

@TomiBelan

Either UTC, or local, or nothing.
Disadvantage of UTC: doesn't match what you'd get if you unzip the zip and stat the extracted file.
Disadvantage of local: fq output depends on user's local timezone, this might be unexpected and needs to be handled in tests.
Disadvantage of nothing: "the (2023-10-21T22:00:26) is the description so as it works now there has to be some kind of field to tie it to."

UTC is fine. Ship it.

@wader
Owner Author

wader commented Oct 23, 2023

Agree! Thanks for taking your time.

@wader merged commit a4cfdcf into master on Oct 23, 2023
5 checks passed
@wader deleted the zip-correct-date-time-fields branch on October 23, 2023 22:06
Development

Successfully merging this pull request may close these issues.

zip: last_modification_date and last_modification_time are mislabeled or swapped
2 participants