Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

inter-process race condition when writing builtin.zig file #14978

Closed
andrewrk opened this issue Mar 17, 2023 · 5 comments
Closed

inter-process race condition when writing builtin.zig file #14978

andrewrk opened this issue Mar 17, 2023 · 5 comments
Labels
bug Observed behavior contradicts documented or intended behavior contributor friendly This issue is limited in scope and/or knowledge of Zig internals. frontend Tokenization, parsing, AstGen, Sema, and Liveness.
Milestone

Comments

@andrewrk
Copy link
Member

Zig Version

0.11.0-dev.2150+68c7261e1

Steps to Reproduce and Observed Behavior

zig build test
2023-03-17T02:35:33.7612383Z run-translated-c division of floating literals build-exe: error: the following command failed with 1 compilation errors:
2023-03-17T02:35:33.7616740Z C:\actions-runner\_work\zig\zig\build-release\stage3-release\bin\zig.exe build-exe C:\actions-runner\_work\zig\zig\build-release\zig-local-cache\o\c00ed4aa0da85b3d6f53e1b3a4de5d8e\source.zig -lc --cache-dir C:\actions-runner\_work\zig\zig\build-release\zig-local-cache --global-cache-dir C:\actions-runner\_work\zig\zig\build-release\zig-global-cache --name translated_c -L C:\actions-runner\_work\zig\zig\..\zig+llvm+lld+clang-aarch64-windows-gnu-0.11.0-dev.670+f7fea080b\lib -I C:\actions-runner\_work\zig\zig\..\zig+llvm+lld+clang-aarch64-windows-gnu-0.11.0-dev.670+f7fea080b\include --zig-lib-dir C:\actions-runner\_work\zig\zig\lib --enable-cache --listen=- 
2023-03-17T02:41:34.5577258Z Build Summary: 4011/4162 steps succeeded; 147 skipped; 1 failed; 47994/50636 tests passed; 2642 skipped (disable with -fno-summary)
2023-03-17T02:41:34.6550307Z |  +- run-translated-c division of floating literals run transitive failure
2023-03-17T02:41:34.6550500Z |  |  +- run-translated-c division of floating literals build-exe 1 errors
2023-03-17T02:41:34.6550999Z |  |     +- run-translated-c division of floating literals translate-c success 69ms
2023-03-17T02:41:34.6551249Z |  |        +- WriteFile source.c success
2023-03-17T02:41:34.6653922Z error: unable to write builtin.zig to C:\actions-runner\_work\zig\zig\build-release\zig-local-cache\tmp\da8554bcb2aed65d: AccessDenied

Expected Behavior

Expected the tests to pass.

The relevant code is here:

zig/src/Module.zig

Lines 3985 to 4015 in 68c7261

if (builtin_pkg.root_src_directory.handle.statFile(builtin_pkg.root_src_path)) |stat| {
if (stat.size != file.source.len) {
const full_path = try builtin_pkg.root_src_directory.join(gpa, &.{
builtin_pkg.root_src_path,
});
defer gpa.free(full_path);
log.warn(
"the cached file '{s}' had the wrong size. Expected {d}, found {d}. " ++
"Overwriting with correct file contents now",
.{ full_path, file.source.len, stat.size },
);
try writeBuiltinFile(file, builtin_pkg);
} else {
file.stat = .{
.size = stat.size,
.inode = stat.inode,
.mtime = stat.mtime,
};
}
} else |err| switch (err) {
error.BadPathName => unreachable, // it's always "builtin.zig"
error.NameTooLong => unreachable, // it's always "builtin.zig"
error.PipeBusy => unreachable, // it's not a pipe
error.WouldBlock => unreachable, // not asking for non-blocking I/O
error.FileNotFound => try writeBuiltinFile(file, builtin_pkg),
else => |e| return e,
}

@andrewrk andrewrk added bug Observed behavior contradicts documented or intended behavior contributor friendly This issue is limited in scope and/or knowledge of Zig internals. frontend Tokenization, parsing, AstGen, Sema, and Liveness. labels Mar 17, 2023
@andrewrk andrewrk added this to the 0.11.0 milestone Mar 17, 2023
@jacobly0
Copy link
Member

jacobly0 commented May 14, 2023

This happens on windows if std.fs.AtomicFile.finish is called simultaneously to overwrite the same file. The error should reproduce by adding a sleep after NtSetInformationFile in std.os.renameatW and launching two processes using the std.fs.AtomicFile API.

diff --git a/lib/std/os.zig b/lib/std/os.zig
index 5554e6e31..634c53f6d 100644
--- a/lib/std/os.zig
+++ b/lib/std/os.zig
@@ -2697,6 +2697,8 @@ pub fn renameatW(
         .FileRenameInformation,
     );
 
+    std.time.sleep(std.time.ns_per_s);
+
     switch (rc) {
         .SUCCESS => return,
         .INVALID_HANDLE => unreachable,
const std = @import("std");
pub fn main() !void {
    var af = try std.fs.cwd().atomicFile("builtin.zig", .{});
    defer af.deinit();
    try af.finish();
}

@matu3ba
Copy link
Contributor

matu3ba commented May 17, 2023

I think the correct fix is to 1. use FILE_RENAME_POSIX_SEMANTICS as described in https://stackoverflow.com/a/51737582/9306292 and 2. as fallback before the Win10 update (like for deleting files with posix semantics) https://stackoverflow.com/a/57358387/9306292 SetFileInformationByHandle with FILE_RENAME_INFO.ReplaceIfExists.

Without supporting early Win10 versions, we can get away with the saner option 1.

Appendum: Alternative options in the comment section of 2.
Funny part: Accepted answer is wrong and was never updated.

@jacobly0
Copy link
Member

jacobly0 commented May 17, 2023

  1. is already status quo and is what causes the spurious error.

This specific issue is not caused by a lack of atomicity, but rather by an unanticipated (and badly documented) error condition.

@matu3ba
Copy link
Contributor

matu3ba commented May 17, 2023

Mhm, that's true. MS docs mentions "If FILE_RENAME_REPLACE_IF_EXISTS is also specified, allow replacing a file even if there are existing handles to it." https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/ns-ntifs-_file_rename_information

Not really sure, but the problem could be the same as in DeleteFile without posix semantics, where other process handles must be closed first or worse only in some (error) cases.

@jacobly0
Copy link
Member

jacobly0 commented May 17, 2023

Yes, if you look carefully at where I placed the sleep, it is in between NtSetInformationFile and CloseHandle. It has already been colloquially shown using my repro that adding FILE_RENAME_POSIX_SEMANTICS avoids the error. The error happens in the second process after a successful rename in the first process, but before the first process closes the original handle.

matu3ba added a commit to matu3ba/zig that referenced this issue May 19, 2023
matu3ba added a commit to matu3ba/zig that referenced this issue May 19, 2023
matu3ba added a commit to matu3ba/zig that referenced this issue May 19, 2023
@andrewrk andrewrk modified the milestones: 0.11.0, 0.11.1 Jul 22, 2023
matu3ba added a commit to matu3ba/zig that referenced this issue Aug 6, 2023
matu3ba added a commit to matu3ba/zig that referenced this issue Aug 21, 2023
matu3ba added a commit to matu3ba/zig that referenced this issue Aug 21, 2023
andrewrk pushed a commit that referenced this issue Aug 24, 2023
Mitigates #14978.

Uses the same strategy as in #16499 suggested by @squeek502
andrewrk added a commit that referenced this issue Dec 29, 2023
This issue already existed in master branch, however, the more
aggressive caching of builtin.zig in this branch made it happen more
often. I added doc comments to AtomicFile to explain when this problem
can occur.

For the compiler's use case, error.AccessDenied can be simply swallowed
because it means the destination file already exists and there is
nothing else to do besides proceed with the AtomicFile cleanup.

I never solved the mystery of why the log statements weren't printing
but those are temporary debugging instruments anyway, and I am already
too many yaks deep to whip out another razor.

closes #14978
andrewrk added a commit that referenced this issue Jan 1, 2024
This issue already existed in master branch, however, the more
aggressive caching of builtin.zig in this branch made it happen more
often. I added doc comments to AtomicFile to explain when this problem
can occur.

For the compiler's use case, error.AccessDenied can be simply swallowed
because it means the destination file already exists and there is
nothing else to do besides proceed with the AtomicFile cleanup.

I never solved the mystery of why the log statements weren't printing
but those are temporary debugging instruments anyway, and I am already
too many yaks deep to whip out another razor.

closes #14978
andrewrk added a commit that referenced this issue Jan 2, 2024
This issue already existed in master branch, however, the more
aggressive caching of builtin.zig in this branch made it happen more
often. I added doc comments to AtomicFile to explain when this problem
can occur.

For the compiler's use case, error.AccessDenied can be simply swallowed
because it means the destination file already exists and there is
nothing else to do besides proceed with the AtomicFile cleanup.

I never solved the mystery of why the log statements weren't printing
but those are temporary debugging instruments anyway, and I am already
too many yaks deep to whip out another razor.

closes #14978
andrewrk added a commit that referenced this issue Jan 2, 2024
This issue already existed in master branch, however, the more
aggressive caching of builtin.zig in this branch made it happen more
often. I added doc comments to AtomicFile to explain when this problem
can occur.

For the compiler's use case, error.AccessDenied can be simply swallowed
because it means the destination file already exists and there is
nothing else to do besides proceed with the AtomicFile cleanup.

I never solved the mystery of why the log statements weren't printing
but those are temporary debugging instruments anyway, and I am already
too many yaks deep to whip out another razor.

closes #14978
andrewrk added a commit that referenced this issue Jan 2, 2024
This issue already existed in master branch, however, the more
aggressive caching of builtin.zig in this branch made it happen more
often. I added doc comments to AtomicFile to explain when this problem
can occur.

For the compiler's use case, error.AccessDenied can be simply swallowed
because it means the destination file already exists and there is
nothing else to do besides proceed with the AtomicFile cleanup.

I never solved the mystery of why the log statements weren't printing
but those are temporary debugging instruments anyway, and I am already
too many yaks deep to whip out another razor.

closes #14978
@andrewrk andrewrk removed this from the 0.11.1 milestone Jan 2, 2024
@andrewrk andrewrk added this to the 0.12.0 milestone Jan 2, 2024
scheibo added a commit to pkmn/engine that referenced this issue Jan 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Observed behavior contradicts documented or intended behavior contributor friendly This issue is limited in scope and/or knowledge of Zig internals. frontend Tokenization, parsing, AstGen, Sema, and Liveness.
Projects
None yet
Development

No branches or pull requests

3 participants