
Cannot parse zip file containing 65535 files, or with a central directory offset of 0xffffffff, if not in Zip64 format #108

Closed
AxbB36 opened this issue May 2, 2019 · 5 comments


AxbB36 commented May 2, 2019

Create ffff.zip containing 65535 files as follows:

$ seq 1 65535 | while read n; do touch -d '2019-05-01 00:00:00 UTC' $(printf %04x $n); done
$ TZ=UTC zip -X ffff.zip $(seq 1 65535 | while read n; do printf "%04x\n" $n; done)

UnZip 6.0 can parse it:

$ unzip -l ffff.zip | tail -n 3
        0  2019-05-01 00:00   ffff
---------                     -------
        0                     65535 files

But this yauzl program cannot:

let yauzl = require("yauzl");
yauzl.open(process.argv[2], {lazyEntries: true}, (err, zipfile) => {
    if (err)
        throw err;
    zipfile.on("entry", entry => {
        zipfile.openReadStream(entry, (err, r) => {
            if (err)
                throw err;
            let n = 0;
            r.on("data", chunk => n += chunk.length);
            r.on("end", () => {
                console.log(`${n}\t${entry.fileName}`);
                zipfile.readEntry();
            });
        });
    });
    zipfile.readEntry();
});

The error message is:

$ node ziplist.js ffff.zip
ziplist.js:4
        throw err;
        ^

Error: invalid zip64 end of central directory locator signature
    at node_modules/yauzl/index.js:154:27
    at node_modules/yauzl/index.js:631:5
    at node_modules/fd-slicer/index.js:32:7
    at FSReqWrap.wrapper [as oncomplete] (fs.js:658:17)

yauzl interprets an entryCount of 0xffff (or a centralDirectoryOffset of 0xffffffff) to mean that a Zip64 end of central directory locator must be present:

yauzl/index.js, lines 140 to 142 in 02a5ca6:

if (!(entryCount === 0xffff || centralDirectoryOffset === 0xffffffff)) {
  return callback(null, new ZipFile(reader, centralDirectoryOffset, totalSize, entryCount, comment, options.autoClose, options.lazyEntries, decodeStrings, options.validateEntrySizes, options.strictFileNames));
}
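For reference, here is a minimal sketch (not yauzl's actual code) of where those two values come from. It decodes the fixed 22-byte end of central directory record (EOCD) using the offsets from APPNOTE.TXT section 4.3.16 and lists which fields hold the all-ones values. The helper names `parseEocd` and `sentinelFields` are hypothetical, and the size and offset in the sample record are made up.

```javascript
// Minimal sketch, not yauzl's implementation: decode the fixed part of the EOCD.
const EOCD_SIGNATURE = 0x06054b50;

function parseEocd(buf) {
  if (buf.length < 22 || buf.readUInt32LE(0) !== EOCD_SIGNATURE) {
    throw new Error("not an end of central directory record");
  }
  return {
    entryCount: buf.readUInt16LE(10),             // total number of central directory entries
    centralDirectorySize: buf.readUInt32LE(12),   // size of the central directory in bytes
    centralDirectoryOffset: buf.readUInt32LE(16), // offset where the central directory starts
    commentLength: buf.readUInt16LE(20),
  };
}

// Fields set to 0xffff / 0xffffffff are only hints that a Zip64 record may exist:
// 65535 entries or an offset of 4294967295 are also ordinary, valid values.
function sentinelFields(eocd) {
  const hits = [];
  if (eocd.entryCount === 0xffff) hits.push("entryCount");
  if (eocd.centralDirectorySize === 0xffffffff) hits.push("centralDirectorySize");
  if (eocd.centralDirectoryOffset === 0xffffffff) hits.push("centralDirectoryOffset");
  return hits;
}

// Hypothetical EOCD shaped like the one in ffff.zip: 65535 entries, no Zip64.
const record = Buffer.alloc(22);
record.writeUInt32LE(EOCD_SIGNATURE, 0);
record.writeUInt16LE(0xffff, 8);  // entries on this disk
record.writeUInt16LE(0xffff, 10); // total entries
record.writeUInt32LE(0x1000, 12); // central directory size (made up)
record.writeUInt32LE(0x2000, 16); // central directory offset (made up)
console.log(sentinelFields(parseEocd(record))); // [ 'entryCount' ]
```

With the condition above, this record alone is enough to send yauzl looking for a Zip64 locator that ffff.zip never contained.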

APPNOTE.TXT seems to say that the implication goes the other way: instead of 0xffff ⇒ Zip64, it is Zip64 ⇒ 0xffff; i.e., a value of 0xffff does not necessarily imply that Zip64 information must be present.

4.4.1.4 If one of the fields in the end of central directory record is too small to hold required data, the field SHOULD be set to -1 (0xFFFF or 0xFFFFFFFF) and the ZIP64 format record SHOULD be created.
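One way to express the "Zip64 ⇒ 0xffff, but not 0xffff ⇒ Zip64" reading in code is sketched below. It assumes the caller has already looked for a Zip64 end of central directory record and passes `null` when none was found; the function name and the object shapes are hypothetical, not part of yauzl.

```javascript
// Sketch only. `zip64` is a decoded Zip64 end of central directory record,
// or null when no Zip64 locator was found (hypothetical shape).
function effectiveDirectoryInfo(eocd, zip64) {
  if (zip64 !== null) {
    // A Zip64 record exists, so its 64-bit fields replace the 16/32-bit ones.
    return {
      entryCount: zip64.entryCount,
      centralDirectorySize: zip64.centralDirectorySize,
      centralDirectoryOffset: zip64.centralDirectoryOffset,
    };
  }
  // No Zip64 record: take the EOCD values literally, even when they happen
  // to be 0xffff or 0xffffffff. This is not an error.
  return {
    entryCount: eocd.entryCount,
    centralDirectorySize: eocd.centralDirectorySize,
    centralDirectoryOffset: eocd.centralDirectoryOffset,
  };
}

// ffff.zip-like case: 65535 entries and no Zip64 record (size/offset made up).
const info = effectiveDirectoryInfo(
  { entryCount: 0xffff, centralDirectorySize: 100, centralDirectoryOffset: 200 },
  null,
);
console.log(info.entryCount); // 65535
```

The sentinel then only decides whether it is worth looking for Zip64 data, never whether its absence is fatal.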

How some other implementations handle it

UnZip searches for a zip64 end of central directory locator unconditionally (whether or not there is a 0xffff or 0xffffffff), and does not error if the locator is not found. process.c:find_ecrec:

    /* Next: Check for existence of Zip64 end-of-cent-dir locator
       ECLOC64. This structure must reside on the same volume as the
       classic ECREC, at exactly (ECLOC64_SIZE+4) bytes in front
       of the ECREC.
       The ECLOC64 structure directs to the longer ECREC64 structure
       A ECREC64 will ALWAYS exist for a proper Zip64 archive, as
       the "Version Needed To Extract" field is required to be set
       to 4.5 or higher whenever any Zip64 features are used anywhere
       in the archive, so just check for that to see if this is a
       Zip64 archive.
     */
    result = find_ecrec64(__G__ searchlen+76);
        /* 76 bytes for zip64ec & zip64 locator */
    if (result != PK_COOL) {
        if (error_in_archive < result)
            error_in_archive = result;
        return error_in_archive;
    }

process.c:find_ecrec64:

    if (memcmp((char *)byterecL, end_centloc64_sig, 4) ) {
      /* not found */
      return PK_COOL;
    }
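The UnZip approach can be sketched in JavaScript as follows. It assumes `buf` holds the archive bytes (or at least its tail) and `eocdPos` is the index where the EOCD signature was found. The 20-byte locator layout is from APPNOTE.TXT section 4.3.15; the function name is hypothetical, and `readBigUInt64LE` needs Node.js 12 or later.

```javascript
// Sketch of an UnZip-style check: look for the Zip64 EOCD locator exactly
// 20 bytes before the EOCD, regardless of any sentinel values.
const ZIP64_LOCATOR_SIGNATURE = 0x07064b50;
const ZIP64_LOCATOR_SIZE = 20;

function findZip64Locator(buf, eocdPos) {
  const start = eocdPos - ZIP64_LOCATOR_SIZE;
  if (start < 0) return null; // too little data before the EOCD to hold a locator
  if (buf.readUInt32LE(start) !== ZIP64_LOCATOR_SIGNATURE) {
    return null; // not found: treat the archive as non-Zip64, not as an error
  }
  return {
    diskWithZip64Eocd: buf.readUInt32LE(start + 4),
    zip64EocdOffset: buf.readBigUInt64LE(start + 8), // offset of the Zip64 EOCD record
    totalDisks: buf.readUInt32LE(start + 16),
  };
}

// Usage: a buffer whose EOCD starts at byte 20, with and without a locator.
const buf = Buffer.alloc(42);
buf.writeUInt32LE(0x06054b50, 20);
console.log(findZip64Locator(buf, 20)); // null
```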

Python zipfile also searches for a zip64 end of central directory locator unconditionally, and does not error if it does not find the expected signature:
https://github.com/python/cpython/blob/v3.7.0/Lib/zipfile.py#L258-L259
https://github.com/python/cpython/blob/v3.7.0/Lib/zipfile.py#L282-L284
https://github.com/python/cpython/blob/v3.7.0/Lib/zipfile.py#L197-L202

    data = fpin.read(sizeEndCentDir64Locator)
    if len(data) != sizeEndCentDir64Locator:
        return endrec
    sig, diskno, reloff, disks = struct.unpack(structEndArchive64Locator, data)
    if sig != stringEndArchive64Locator:
        return endrec

Go archive/zip searches for a zip64 end of central directory locator only if entryCount is 0xffff, centralDirectoryOffset is 0xffffffff, or the central directory size is 0xffff (the quoted code compares directorySize against 0xffff, although 0xffffffff is presumably intended). It doesn't error if the locator is not found.
https://github.com/golang/go/blob/go1.12.4/src/archive/zip/reader.go#L502-L511

	// These values mean that the file can be a zip64 file
	if d.directoryRecords == 0xffff || d.directorySize == 0xffff || d.directoryOffset == 0xffffffff {
		p, err := findDirectory64End(r, directoryEndOffset)
		if err == nil && p >= 0 {
			err = readDirectory64End(r, p, d)
		}
		if err != nil {
			return nil, err
		}
	}
thejoshwolfe (Owner) commented

Thanks for the detailed report! I'll take a look.

AxbB36 commented May 13, 2019

This issue also affects zip files that have a central directory offset of 0xffffffff. Here is a recipe to make a test case for that.

ffffffff-centralDirectoryOffset.zip.gz.gz (remove 2 layers of gzip)

# 216186 * 19867 = 0xffffffff - len("pad") - 30
dd if=/dev/zero bs=216186 count=19867 of=pad
touch -d '2019-05-01 00:00:00 UTC' pad
rm -f ffffffff-centralDirectoryOffset.zip
TZ=UTC zip -0 -X ffffffff-centralDirectoryOffset.zip pad
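The arithmetic in the recipe's comment can be checked directly. This sketch assumes, as the recipe does, that `zip -0 -X` writes a 30-byte local file header with no extra field and no data descriptor, so the central directory starts immediately after the 3-byte name and the stored data.

```javascript
// Where the central directory should start for the recipe above, assuming a
// 30-byte local file header, no extra field, and no data descriptor.
const LOCAL_HEADER_FIXED_SIZE = 30;
const fileName = "pad";
const dataSize = 216186 * 19867; // bytes written by dd (bs * count)

const centralDirectoryOffset =
  LOCAL_HEADER_FIXED_SIZE + Buffer.byteLength(fileName) + dataSize;
console.log(centralDirectoryOffset.toString(16)); // ffffffff
```

This agrees with the offset of 4294967295 reported by zipinfo.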

zipinfo -v says:

  The central directory is 49 (0000000000000031h) bytes long,
  and its (expected) offset in bytes from the beginning of the zipfile
  is 4294967295 (00000000FFFFFFFFh).

AxbB36 changed the title from "Cannot parse zip file containing 65535 files, if not in Zip64 format" to "Cannot parse zip file containing 65535 files, or with a central directory offset of 0xffffffff, if not in Zip64 format" on May 13, 2019
thejoshwolfe (Owner) commented

@AxbB36 This is fixed in yauzl version 3.1.1. I didn't make an automated test for this, because creating performant tests for large numbers is pretty difficult (see test/zip64.js), but I manually verified that the test case you outlined in the OP works with the examples/dump.js example.

thejoshwolfe (Owner) commented

> This issue also affects zip files that have a central directory offset of 0xffffffff

Oh, I may not have fixed this issue. Are you getting the error "expected zip64 extended information extra field"?

thejoshwolfe (Owner) commented

> Oh, I may not have fixed this issue.

Ok, I fixed the entry handling as well in version 3.1.2. I think this issue is fully fixed now.
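The same leniency can apply at the entry level, where a central directory entry may legitimately contain 0xffffffff without a Zip64 extended information extra field (header ID 0x0001). The sketch below is one possible approach and not necessarily what yauzl 3.1.2 does. It assumes `entry` holds the 32-bit values from a central directory file header and `extraFields` is an already-parsed array of `{ id, data }` objects; both shapes and the function name are hypothetical. The field order follows APPNOTE.TXT section 4.5.3.

```javascript
// Hypothetical helper, not yauzl's implementation.
const ZIP64_EXTRA_FIELD_ID = 0x0001;
const SENTINEL_32 = 0xffffffff;

function resolveEntryFields(entry, extraFields) {
  const result = {
    uncompressedSize: entry.uncompressedSize,
    compressedSize: entry.compressedSize,
    relativeOffsetOfLocalHeader: entry.relativeOffsetOfLocalHeader,
  };
  const zip64 = extraFields.find((f) => f.id === ZIP64_EXTRA_FIELD_ID);
  // Lenient choice: with no Zip64 extra field, 0xffffffff is taken literally.
  if (zip64 === undefined) return result;

  // The extra field holds only the overflowed fields, in this fixed order.
  let pos = 0;
  for (const key of ["uncompressedSize", "compressedSize", "relativeOffsetOfLocalHeader"]) {
    if (entry[key] !== SENTINEL_32) continue;
    if (pos + 8 > zip64.data.length) {
      throw new Error(`zip64 extended information extra field too short for ${key}`);
    }
    // Number() is exact only up to 2^53 - 1.
    result[key] = Number(zip64.data.readBigUInt64LE(pos));
    pos += 8;
  }
  return result;
}

const entry = { uncompressedSize: 0, compressedSize: 0, relativeOffsetOfLocalHeader: SENTINEL_32 };
console.log(resolveEntryFields(entry, []).relativeOffsetOfLocalHeader); // 4294967295
```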
