Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

package manager: filter unpack errors on paths excluded by manifest #19500

Merged
merged 18 commits into from
Apr 9, 2024

Conversation

ianic
Copy link
Sponsor Contributor

@ianic ianic commented Mar 31, 2024

Fixes #18089, checks the remaining 2 items, and also fixes #17460.

The first item was to create diagnostic errors in git unpack, instead of overwriting files with the same name on case insensitive file system. Implemented here.

The second one was filtering tar/git unpack errors if they are on path not included by manifest (build.zig.zon). For that, I created UnpackResult which collects errors from tar\git diagnostic, allows filtering them by included paths and emits remaining errors to an ErrorBundle.

Added tar tests where I create a tarball containing two files with the same name, simulating case insensitive file system, and then call Fetch.run with and without manifest which excludes those files. Can be run with zig test src/Package.zig.

I used Ian's collection of pathological packages and his nice trick to simulate case insensitive, no symlink file system by mounting FAT32 file. I made a test around that. It depends on access to github repository, and local FAT32 mount, but it covers a lot of cases both for tar and git packages. Should I remove it because of all these dependencies? (it will be skipped if FAT32 file system is not found).

I also tested on Windows and macOS with different packages and the results are the same as described in previous try.

@jacobly0 jacobly0 changed the title pacakge manager: filter unpack errors on paths excluded by manifest package manager: filter unpack errors on paths excluded by manifest Apr 1, 2024
@andrewrk
Copy link
Member

andrewrk commented Apr 2, 2024

Thanks for doing this other approach. I think it's a good sign that @ianprime0509's feedback from #19324 no longer applies.

I used Ian's collection of pathological packages and his nice trick to simulate case insensitive, no symlink file system by mounting FAT32 file. I made a test around that. It depends on access to github repository, and local FAT32 mount, but it covers a lot of cases both for tar and git packages. Should I remove it because of all these dependencies? (it will be skipped if FAT32 file system is not found).

Nice work creating these test cases. The dependencies are indeed a problem for upstreaming them. I think it would be best to maintain this test suite separately. I have done something similar with https://github.com/ziglang/contrib-testing - perhaps that could be a place for these extra tests to live?

@ianic
Copy link
Sponsor Contributor Author

ianic commented Apr 3, 2024

I removed that complicated test from Fetch.zig, moving it to the zig-fetch-test.
There I symlinked zig repo so I can run that test on each change in the source without building zig binary. So I still get fast response during development.
There I also compare package manager output of locally build zig binary and release build on a few real word packages.

@ianic
Copy link
Sponsor Contributor Author

ianic commented Apr 4, 2024

Rebasing to the latest changes in package manager.
Add unit test to cover this PR and the previous one #19111.
Update 'integration tests' project.

unpackResource now returns UnpackResult structure ( root_dir is now part of that structure) instead of cryptic ?[]const u8.

ianic added a commit to ianic/zig that referenced this pull request Apr 5, 2024
Based on file content. Detects elf magic header or shebang line.

Executable bit is ignored in hash calculation, as it was before this. So
packages hashes are not changed.

Reference:
ziglang#17463 (comment)

Fixes: 17463

Test is here:
https://github.com/ianic/zig-fetch-test/blob/7c4600d7bb263f9b72fe3d0b70071f42be89e25c/src/main.zig#L307
(if ziglang#19500 got accepted I'll move this test to the Fetch.zig)
andrewrk pushed a commit that referenced this pull request Apr 6, 2024
Based on file content. Detects elf magic header or shebang line.

Executable bit is ignored in hash calculation, as it was before this. So
packages hashes are not changed.

Reference:
#17463 (comment)

Fixes: 17463

Test is here:
https://github.com/ianic/zig-fetch-test/blob/7c4600d7bb263f9b72fe3d0b70071f42be89e25c/src/main.zig#L307
(if #19500 got accepted I'll move this test to the Fetch.zig)
@ianic
Copy link
Sponsor Contributor Author

ianic commented Apr 8, 2024

To summarize changes in this PR:

lib/std/tar.zig

Type of root_dir in Diagnostic changed from nullable to non-nullable with default empty string value.
The idea was to avoid null checking on each place where this is used and make consistent with Cache.Path.sub_path.
Change is covered by test.

src/Package/Fetch/git.zig

Add unable_to_create_file to the diagnostic, like it is in tar diagnostic.

src/Fetch.zig

unpackTarball and unpackGitPack returns now UnpackResult. They fill UnpackResult with diagnostic errors instead of raising them. Those errors are then filtered for excluded paths in runResource when we have manifest. If there are remaining errors after filtering they are then raised with UnpackResult.bundleErrors.

The rest of changes in Fetch.zig are added UnpackResult structure and tests.

Test case files are in Fetch/testdata folder. Test embeds one test file, creates temporary directory, saves embedded file to the temporary directory, initializes Fetch with TestFetchBuilder, runs Fetch on the saved file (cache directory is also positioned in the temporary directory), analyzes cache directory output or fetch errors.

src/Package.zig

Expose Fetch test so we can test it with zig test src/Package.zig. Fetch imports "../Package.zig" making it not testable with zig test Fetch.zig.

Copy link
Member

@andrewrk andrewrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do my suggestions make sense?

const pkg_dir = try unpackResource(f, resource, uri_path, tmp_directory);
// Fetch and unpack a resource into a temporary directory.
var unpack_result = try unpackResource(f, resource, uri_path, tmp_directory);
defer unpack_result.deinit();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the arena allocator for this data so that this deinit() is not needed. The general-purpose allocator causes contention between threads, whereas each thread has its own arena allocator. It also simplifies memory management to use the arena allocator.

Comment on lines 490 to 494
try unpack_result.filterErrors(filter);
if (unpack_result.hasErrors()) {
try unpack_result.bundleErrors(eb, try f.srcLoc(f.location_tok));
return error.FetchFailed;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
try unpack_result.filterErrors(filter);
if (unpack_result.hasErrors()) {
try unpack_result.bundleErrors(eb, try f.srcLoc(f.location_tok));
return error.FetchFailed;
}
try unpack_result.validate(eb, f.location_tok, filter);

Request to combine all that logic into a validate function.

Comment on lines 1744 to 1745
allocator: std.mem.Allocator,
errors: std.ArrayListUnmanaged(Error) = .{},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
allocator: std.mem.Allocator,
errors: std.ArrayListUnmanaged(Error) = .{},
errors: []Error,

Comment on lines 1776 to 1789
fn free(self: Error, allocator: std.mem.Allocator) void {
switch (self) {
.unable_to_create_sym_link => |info| {
allocator.free(info.file_name);
allocator.free(info.link_name);
},
.unable_to_create_file => |info| {
allocator.free(info.file_name);
},
.unsupported_file_type => |info| {
allocator.free(info.file_name);
},
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn free(self: Error, allocator: std.mem.Allocator) void {
switch (self) {
.unable_to_create_sym_link => |info| {
allocator.free(info.file_name);
allocator.free(info.link_name);
},
.unable_to_create_file => |info| {
allocator.free(info.file_name);
},
.unsupported_file_type => |info| {
allocator.free(info.file_name);
},
}
}

Comment on lines 1796 to 1771
fn deinit(self: *UnpackResult) void {
for (self.errors.items) |item| {
item.free(self.allocator);
}
self.errors.deinit(self.allocator);
self.allocator.free(self.root_error_message);
self.allocator.free(self.root_dir);
self.* = undefined;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn deinit(self: *UnpackResult) void {
for (self.errors.items) |item| {
item.free(self.allocator);
}
self.errors.deinit(self.allocator);
self.allocator.free(self.root_error_message);
self.allocator.free(self.root_dir);
self.* = undefined;
}

ianic added 18 commits April 9, 2024 15:00
Test that UnpackResult prints same error output as diagnostic.
Test that we are still outputing same errors.
Report only errors which are not filtered by paths in build.zig.zon.
On case insensitive file systems, don't overwrite files with same name
in different casing. Add diagnostic error so caller could decide what to do.
Using test cases from:
https://github.com/ianprime0509/pathological-packages repository.
Depends on existence of the FAT32 file system. Folder is in FAT32 file
system because it is case insensitive and and does not support symlinks.

It is complicated test case requires internet connection, depends on
existence of FAT32 in the specific location. But it is so valuable for
development. Running `zig test Package.zig` is so much faster than
building zig binary and running `zig fetch URL`. Committing it here
although it should probably be removed.
Should include folder structure, at least root folder so it can be found
in pipeToFileSystem.
To be consistent with paths in manifest.
Make it consistent with Cache.Path sub_path.
Remove null check in many locations.
Prepare test cases, store them in Fetch/testdata.
They cover changes in this PR as well from previous one ziglang#19111.
Use stripRoot in less places. Strip it while copying error from
diagnostic to unpack result so other palaces can be free of this logic.
Reference:
ziglang#19500 (comment)

Arena is now used for Diagnostic (tar and git). `deinit` is not called on Diagnostic
allowing us to reference strings from Diagnostic in UnpackResult without
dupe.

That seamed reasonable to me. Instead of using gpa for Diagnostic, and
then dupe to arena. Or using arena for both and making dupe so we can deinit
Diagnostic.
@ianic
Copy link
Sponsor Contributor Author

ianic commented Apr 9, 2024

I switch to arena both UnpackResult and Diagnostic (tar and git). deinit is not called on Diagnostic
allowing us to reference strings from Diagnostic in UnpackResult without dupe.

That seemed reasonable to me. Instead of using gpa for Diagnostic, and then dupe to arena. Or using arena for both and making dupe so we can deinit Diagnostic.

@ianic ianic requested a review from andrewrk April 9, 2024 19:02
Copy link
Member

@andrewrk andrewrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. Thanks for all the follow-ups.

Comment on lines +1802 to +1866
fn validate(self: *UnpackResult, f: *Fetch, filter: Filter) !void {
self.filterErrors(filter);
if (self.hasErrors()) {
const eb = &f.error_bundle;
try self.bundleErrors(eb, try f.srcLoc(f.location_tok));
return error.FetchFailed;
}
}

// Filter errors by manifest inclusion rules.
fn filterErrors(self: *UnpackResult, filter: Filter) void {
var i = self.errors_count;
while (i > 0) {
i -= 1;
if (self.errors[i].excluded(filter)) {
self.errors_count -= 1;
const tmp = self.errors[i];
self.errors[i] = self.errors[self.errors_count];
self.errors[self.errors_count] = tmp;
}
}
}

// Emmit errors to an `ErrorBundle`.
fn bundleErrors(
self: *UnpackResult,
eb: *ErrorBundle.Wip,
src_loc: ErrorBundle.SourceLocationIndex,
) !void {
if (self.errors_count == 0 and self.root_error_message.len == 0)
return;

const notes_len: u32 = @intCast(self.errors_count);
try eb.addRootErrorMessage(.{
.msg = try eb.addString(self.root_error_message),
.src_loc = src_loc,
.notes_len = notes_len,
});
const notes_start = try eb.reserveNotes(notes_len);
for (self.errors, notes_start..) |item, note_i| {
switch (item) {
.unable_to_create_sym_link => |info| {
eb.extra.items[note_i] = @intFromEnum(try eb.addErrorMessage(.{
.msg = try eb.printString("unable to create symlink from '{s}' to '{s}': {s}", .{
info.file_name, info.link_name, @errorName(info.code),
}),
}));
},
.unable_to_create_file => |info| {
eb.extra.items[note_i] = @intFromEnum(try eb.addErrorMessage(.{
.msg = try eb.printString("unable to create file '{s}': {s}", .{
info.file_name, @errorName(info.code),
}),
}));
},
.unsupported_file_type => |info| {
eb.extra.items[note_i] = @intFromEnum(try eb.addErrorMessage(.{
.msg = try eb.printString("file '{s}' has unsupported type '{c}'", .{
info.file_name, info.file_type,
}),
}));
},
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that all of this would then be combined into one function now, eliminating the need to mutate the errors array in filterErrors. However, I think this is good enough to merge, so I'll leave you with this idea to implement if you like it.

@andrewrk andrewrk merged commit 215de3e into ziglang:master Apr 9, 2024
10 checks passed
ianic added a commit to ianic/zig that referenced this pull request Apr 11, 2024
andrewrk pushed a commit that referenced this pull request Apr 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants