Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ability to fetch from zip files #17408

Closed
andrewrk opened this issue Oct 5, 2023 · 16 comments · Fixed by #19729
Closed

ability to fetch from zip files #17408

andrewrk opened this issue Oct 5, 2023 · 16 comments · Fixed by #19729
Labels
contributor friendly This issue is limited in scope and/or knowledge of Zig internals. enhancement Solving this issue will likely involve adding new logic or components to the codebase. zig build system
Milestone

Comments

@andrewrk
Copy link
Member

andrewrk commented Oct 5, 2023

zig fetch foo.zip should work.

Relevant logic is here:

zig/src/Package/Fetch.zig

Lines 946 to 1024 in f7bc55c

fn unpackResource(
f: *Fetch,
resource: *Resource,
uri_path: []const u8,
tmp_directory: Cache.Directory,
) RunError!void {
const eb = &f.error_bundle;
const file_type = switch (resource.*) {
.file => FileType.fromPath(uri_path) orelse
return f.fail(f.location_tok, try eb.printString("unknown file type: '{s}'", .{uri_path})),
.http_request => |req| ft: {
// Content-Type takes first precedence.
const content_type = req.response.headers.getFirstValue("Content-Type") orelse
return f.fail(f.location_tok, try eb.addString("missing 'Content-Type' header"));
if (ascii.eqlIgnoreCase(content_type, "application/x-tar"))
break :ft .tar;
if (ascii.eqlIgnoreCase(content_type, "application/gzip") or
ascii.eqlIgnoreCase(content_type, "application/x-gzip") or
ascii.eqlIgnoreCase(content_type, "application/tar+gzip"))
{
break :ft .@"tar.gz";
}
if (ascii.eqlIgnoreCase(content_type, "application/x-xz"))
break :ft .@"tar.xz";
if (!ascii.eqlIgnoreCase(content_type, "application/octet-stream")) {
return f.fail(f.location_tok, try eb.printString(
"unrecognized 'Content-Type' header: '{s}'",
.{content_type},
));
}
// Next, the filename from 'content-disposition: attachment' takes precedence.
if (req.response.headers.getFirstValue("Content-Disposition")) |cd_header| {
break :ft FileType.fromContentDisposition(cd_header) orelse {
return f.fail(f.location_tok, try eb.printString(
"unsupported Content-Disposition header value: '{s}' for Content-Type=application/octet-stream",
.{cd_header},
));
};
}
// Finally, the path from the URI is used.
break :ft FileType.fromPath(uri_path) orelse {
return f.fail(f.location_tok, try eb.printString(
"unknown file type: '{s}'",
.{uri_path},
));
};
},
.git => .git_pack,
.dir => |dir| return f.recursiveDirectoryCopy(dir, tmp_directory.handle) catch |err| {
return f.fail(f.location_tok, try eb.printString(
"unable to copy directory '{s}': {s}",
.{ uri_path, @errorName(err) },
));
},
};
switch (file_type) {
.tar => try unpackTarball(f, tmp_directory.handle, resource.reader()),
.@"tar.gz" => try unpackTarballCompressed(f, tmp_directory.handle, resource, std.compress.gzip),
.@"tar.xz" => try unpackTarballCompressed(f, tmp_directory.handle, resource, std.compress.xz),
.git_pack => unpackGitPack(f, tmp_directory.handle, resource) catch |err| switch (err) {
error.FetchFailed => return error.FetchFailed,
error.OutOfMemory => return error.OutOfMemory,
else => |e| return f.fail(f.location_tok, try eb.printString(
"unable to unpack git files: {s}",
.{@errorName(e)},
)),
},
}
}

@andrewrk andrewrk added enhancement Solving this issue will likely involve adding new logic or components to the codebase. contributor friendly This issue is limited in scope and/or knowledge of Zig internals. zig build system labels Oct 5, 2023
@andrewrk andrewrk added this to the 0.13.0 milestone Oct 5, 2023
@MFAshby
Copy link

MFAshby commented Oct 6, 2023

I happen to have been working with ZIP files in zig anyway, so I might take a shot at this if it's ok. I have a branch to start with:

master...MFAshby:build_zip_support

I'll figure out how to build & test it before submitting a PR.

ZIP files are a little different from TAR archives in that they might have data at the start of the file which is not part of the ZIP at all, (a fact taken advantage of by things like self-extracting ZIPs, and the https://redbean.dev/ web server). They also store the metadata preceding the file data, and repeated in a 'central directory' at the end of the file.

Probably the more reliable way to extract the ZIP is to use the central directory, which would need a little code restructuring so that the unpackZip function can seek to the end of the file first. This branch doesn't do that because it only gets a reader for the ZIP contents, which it can't necessarily seek back and forth.

@andrewrk
Copy link
Member Author

andrewrk commented Oct 6, 2023

I know enough about zip files to confidently say that the End of Central Directory Record should certainly be used as the source of truth, rather than trying to scan a stream for local headers which may actually just be file contents. I suggest to wait until #17392 lands because the logic is much more straightforward and will not need any restructuring. The function is passed a temporary directory and a reader stream of a remote zip file. So it just needs to pipe the stream into a file in the temporary directory, then unzip all the files into the temporary directory.

@The-King-of-Toasters
Copy link
Contributor

A while back I rewrote @jorangreef's pure in Zig, and was about to refactor it to use a Reader before I got tied up with work. I think this would be a better base for a ZIP reader, as pure is a a static-analysis tool to protect against Zip bombs more than a generic Zip library. Of course I would need some helping hands if this were ever to land in the stdlib.

@andrewrk
Copy link
Member Author

andrewrk commented Oct 9, 2023

But one might want to wait until #17392 is done to start working on this.

The package manager rework landed in f7bc55c. I updated the "relevant logic" snippet above.

@joeltg
Copy link
Sponsor Contributor

joeltg commented Apr 5, 2024

Alright I started on this and have a working implementation of a ZIP reader in 400 lines of pure Zig with an Iterator / pipeToFileSystem API just like std.tar :)

Code is at https://github.com/nDimensional/zig-zip for now but I'd like to PR this into lib/std/zip.zig and get package fetching working.

I have a couple questions for you @andrewrk if you have input

  • I'm using std.posix.mmap to read the ZIP file since extraction involves jumping around a lot. Is this ok / sufficiently portable for the package manager?
  • What's the best way to test ZIP extraction? I see that there's a suite of tar files in lib/std/tar/testdata, but given recent events I would feel uncomfortable checking more compressed binary blobs into the repo 🙃. Plus, as far as I can tell, ZIP doesn't have an authoritative RFC or test suite like tar.

@MFAshby
Copy link

MFAshby commented Apr 6, 2024

What's the best way to test ZIP extraction? I see that there's a suite of tar files in lib/std/tar/testdata, but given recent events I would feel uncomfortable checking more compressed binary blobs into the repo 🙃. Plus, as far as I can tell, ZIP doesn't have an authoritative RFC or test suite like tar.

libzip has a regression testing suite. I think APPNOTE.txt is the spec from the original developer of the zip format, it does not come with a test suite though.

@joeltg
Copy link
Sponsor Contributor

joeltg commented Apr 18, 2024

Okay here's my current idea. My implementation has a small test suite that uses the system zip command (via std.ChildProcess) to create archives, extract them using zip.pipeToFileSystem, and then verify the result.

My understanding is that this type of testing is appropriate for https://github.com/ziglang/contrib-testing - so maybe I open a PR to create lib/std/zip.zig without tests, and then move zip_test.zig into its own repo and add it to ziglang/contrib-testing?

@nektro
Copy link
Contributor

nektro commented Apr 18, 2024

no, a std.zip implementation would need tests in the main repo that do not rely on any system tools to run. there's multiple existing suites with files and known contents that can be adapted for our use.

eg test blocks that reference files, read them in, and use std.testing to compare the resulting structures

@joeltg
Copy link
Sponsor Contributor

joeltg commented Apr 18, 2024

Okay happy to do that too, just wondered if there was a way to avoid checking more binary blobs into the repo :)

@marler8997
Copy link
Contributor

I'm using std.posix.mmap to read the ZIP file since extraction involves jumping around a lot. Is this ok / sufficiently portable for the package manager?

Here's an mmap that should work on posix and windows: https://github.com/marler8997/zigx/blob/master/MappedFile.zig

@andrewrk
Copy link
Member Author

I'm using std.posix.mmap to read the ZIP file since extraction involves jumping around a lot. Is this ok / sufficiently portable for the package manager?

No, mmap is problematic because instead of convenient error codes from I/O you get signals which are nearly impossible to handle correctly.

Mmap should be avoided for this use case.

@marler8997
Copy link
Contributor

I was thinking of taking a stab at this and wanted to clarify my understanding of the solutions. Does this breakdown look correct?

Solution 1: Process zip as a stream (front to back)

No. zip files cannot be interpreted correctly front-to-back, they must be done back to front starting from the central directory header/record (https://games.greggman.com/game/zip-rant/).

Solution 2: Read entire zip contents from reader into memory

No. A zip file could be too large to fit into memory. Let's write it to a file first instead, then we can load only the parts we need into memory as we go.

Solution 3: Use MMAP after writing it to disk

No. Nearly impossible to handle IO errors correctly

Solution 4: Read/Process zip file in chunks after writing it to disk using normal reads/writes

Yes

@marler8997
Copy link
Contributor

marler8997 commented Apr 21, 2024

Here's my version that avoids mmap based on Joel's implementation:

https://github.com/marler8997/zig-zip/blob/3bcdd558410ae0c78068e9d6635e177fcce4f206/src/zip.zig

I'll see if I can get a PR together for it.

@joeltg
Copy link
Sponsor Contributor

joeltg commented Apr 21, 2024

Nice! Although there's still more I need to work into my impl, at least Zig64 support and executable bit preservation

@marler8997
Copy link
Contributor

marler8997 commented Apr 21, 2024

FYI, I added a bufferedReader when passing the file stream to the inflate decompressor...and performance went from about 3 to 4 times slower than the mmap version to about the same performance. (i.e. decompressing zig-windows-x86_64-0.12.0.zip went from 29 seconds to 8 seconds on Windows and from 4.5 to 1.5 seconds on linux).

@marler8997
Copy link
Contributor

Here's the branch I'm currently working on: master...marler8997:zig:zip

marler8997 added a commit to marler8997/zig that referenced this issue Apr 22, 2024
fixes ziglang#17408

Co-authored-by: Joel Gustafson <joelg@mit.edu>
marler8997 added a commit to marler8997/zig that referenced this issue Apr 22, 2024
fixes ziglang#17408

Co-authored-by: Joel Gustafson <joelg@mit.edu>
marler8997 added a commit to marler8997/zig that referenced this issue Apr 22, 2024
fixes ziglang#17408

Co-authored-by: Joel Gustafson <joelg@mit.edu>
marler8997 added a commit to marler8997/zig that referenced this issue Apr 22, 2024
fixes ziglang#17408

Co-authored-by: Joel Gustafson <joelg@mit.edu>
marler8997 added a commit to marler8997/zig that referenced this issue Apr 22, 2024
fixes ziglang#17408

Co-authored-by: Joel Gustafson <joelg@mit.edu>
marler8997 added a commit to marler8997/zig that referenced this issue Apr 22, 2024
fixes ziglang#17408

Co-authored-by: Joel Gustafson <joelg@mit.edu>
marler8997 added a commit to marler8997/zig that referenced this issue Apr 22, 2024
fixes ziglang#17408

Co-authored-by: Joel Gustafson <joelg@mit.edu>
marler8997 added a commit to marler8997/zig that referenced this issue Apr 22, 2024
fixes ziglang#17408

Improved by helpful reviews from Josh Wolfe and Auguste Rame.

Co-authored-by: Joel Gustafson <joelg@mit.edu>
marler8997 added a commit to marler8997/zig that referenced this issue Apr 22, 2024
fixes ziglang#17408

Improved by helpful reviews from Josh Wolfe and Auguste Rame.

Co-authored-by: Joel Gustafson <joelg@mit.edu>
marler8997 added a commit to marler8997/zig that referenced this issue Apr 23, 2024
fixes ziglang#17408

Improved by helpful reviews from Josh Wolfe and Auguste Rame.

Co-authored-by: Joel Gustafson <joelg@mit.edu>
marler8997 added a commit to marler8997/zig that referenced this issue Apr 23, 2024
fixes ziglang#17408

Improved by helpful reviews from Josh Wolfe and Auguste Rame.

Co-authored-by: Joel Gustafson <joelg@mit.edu>
marler8997 added a commit to marler8997/zig that referenced this issue Apr 23, 2024
fixes ziglang#17408

Improved by helpful reviews from Josh Wolfe and Auguste Rame and Andrew
Kelley.

Co-authored-by: Joel Gustafson <joelg@mit.edu>
marler8997 added a commit to marler8997/zig that referenced this issue Apr 23, 2024
fixes ziglang#17408

Improved by helpful reviews from Josh Wolfe and Auguste Rame and Andrew
Kelley.

Co-authored-by: Joel Gustafson <joelg@mit.edu>
marler8997 added a commit to marler8997/zig that referenced this issue Apr 23, 2024
fixes ziglang#17408

Improved by helpful reviews from Josh Wolfe and Auguste Rame and Andrew
Kelley.

Co-authored-by: Joel Gustafson <joelg@mit.edu>
marler8997 added a commit to marler8997/zig that referenced this issue Apr 23, 2024
fixes ziglang#17408

Improved by helpful reviews from Josh Wolfe and Auguste Rame and Andrew
Kelley.

Co-authored-by: Joel Gustafson <joelg@mit.edu>
marler8997 added a commit to marler8997/zig that referenced this issue Apr 27, 2024
fixes ziglang#17408

Improved by helpful reviews from Josh Wolfe and Auguste Rame and Andrew
Kelley.

Co-authored-by: Joel Gustafson <joelg@mit.edu>
marler8997 added a commit to marler8997/zig that referenced this issue Apr 28, 2024
fixes ziglang#17408

Improved by helpful reviews from Josh Wolfe and Auguste Rame and Andrew
Kelley.

Co-authored-by: Joel Gustafson <joelg@mit.edu>
marler8997 added a commit to marler8997/zig that referenced this issue Apr 28, 2024
fixes ziglang#17408

Improved by helpful reviews from Josh Wolfe and Auguste Rame and Andrew
Kelley.

Co-authored-by: Joel Gustafson <joelg@mit.edu>
marler8997 added a commit to marler8997/zig that referenced this issue Apr 28, 2024
fixes ziglang#17408

Improved by helpful reviews from Josh Wolfe and Auguste Rame and Andrew
Kelley.

Co-authored-by: Joel Gustafson <joelg@mit.edu>
marler8997 added a commit to marler8997/zig that referenced this issue Apr 29, 2024
fixes ziglang#17408

Improved by helpful reviews from Josh Wolfe and Auguste Rame and Andrew
Kelley.

Co-authored-by: Joel Gustafson <joelg@mit.edu>
marler8997 added a commit to marler8997/zig that referenced this issue Apr 29, 2024
fixes ziglang#17408

Improved by helpful reviews from Josh Wolfe and Auguste Rame and Andrew
Kelley.

Co-authored-by: Joel Gustafson <joelg@mit.edu>
marler8997 added a commit to marler8997/zig that referenced this issue Apr 29, 2024
fixes ziglang#17408

Improved by helpful reviews from Josh Wolfe and Auguste Rame and Andrew
Kelley.

Co-authored-by: Joel Gustafson <joelg@mit.edu>
marler8997 added a commit to marler8997/zig that referenced this issue Apr 30, 2024
fixes ziglang#17408

Improved by helpful reviews from Josh Wolfe and Auguste Rame and Andrew
Kelley.

Co-authored-by: Joel Gustafson <joelg@mit.edu>
jacobly0 pushed a commit to jacobly0/zig that referenced this issue May 1, 2024
fixes ziglang#17408

Improved by helpful reviews from Josh Wolfe and Auguste Rame and Andrew
Kelley.

Co-authored-by: Joel Gustafson <joelg@mit.edu>
jacobly0 pushed a commit that referenced this issue May 1, 2024
fixes #17408

Improved by helpful reviews from Josh Wolfe and Auguste Rame and Andrew
Kelley.

Co-authored-by: Joel Gustafson <joelg@mit.edu>
marler8997 added a commit to marler8997/zig that referenced this issue May 2, 2024
fixes ziglang#17408

Helpful reviewers/testers include Joshe Wolfe, Auguste Rame, Andrew
Kelley and Jacob Young.

Co-authored-by: Joel Gustafson <joelg@mit.edu>
jacobly0 pushed a commit to jacobly0/zig that referenced this issue May 2, 2024
fixes ziglang#17408

Helpful reviewers/testers include Joshe Wolfe, Auguste Rame, Andrew
Kelley and Jacob Young.

Co-authored-by: Joel Gustafson <joelg@mit.edu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
contributor friendly This issue is limited in scope and/or knowledge of Zig internals. enhancement Solving this issue will likely involve adding new logic or components to the codebase. zig build system
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants