Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linux system calls may return unreachable or unexpected error codes #10776

Open
Mathnerd314 opened this issue Feb 3, 2022 · 15 comments
Open
Labels
breaking Implementing this issue could cause existing code to no longer compile or have different behavior. os-linux proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. standard library This issue involves writing Zig code for the standard library.
Milestone

Comments

@Mathnerd314
Copy link

Zig Version

0.9.0

Steps to Reproduce

For this to work you must have a Linux kernel with CONFIG_BPF_KPROBE_OVERRIDE=y. Apparently this is the default on Arch Linux, but for NixOS I had to add:

  boot.kernelPatches = [ {
    name = "kprobe-bpf-override-config";
    patch = null;
    extraConfig = ''
      FUNCTION_ERROR_INJECTION y
      BPF_KPROBE_OVERRIDE y
      '';
    } ];

and build a kernel.
Once that is done you can run:

git clone --recursive https://github.com/trailofbits/ebpfault
cd ebpfault/libraries/ebpf-common/src
git fetch https://github.com/trailofbits/ebpf-common.git
git cherry-pick b20da60eec9c5bd051c5570950e72a34b3244c30 f75a900af068a2751c85c1332dade3167114400f
cd ../../..
nix-shell -E 'with import <nixpkgs> { }; llvmPackages_8.stdenv.mkDerivation { name = "ebpfault"; buildInputs = [ cmake llvmPackages_8.llvm zig ]; cmakeFlags = ["-DEBPFAULT_ENABLE_INSTALL:BOOL=true" "-DEBPF_COMMON_ENABLE_TESTS:BOOL=true" "-DEBPF_COMMON_ENABLE_SANITIZERS:BOOL=false" "-DCMAKE_BUILD_TYPE:STRING=RelWithDebInfo"]; }'

Then in the shell:

cmakeConfigurePhase
buildPhase
cat > efault.json <<EOF
{"fault_injectors": [
    {
      "syscall_name": "chdir",
      "error_list": [
        {
          "exit_code": "-EFAULT",
          "probability": 100
        }
      ]
    }
  ]
}
EOF
cat > einval.json <<EOF
{"fault_injectors": [
    {
      "syscall_name": "chdir",
      "error_list": [
        {
          "exit_code": "-EINVAL",
          "probability": 100
        }
      ]
    }
  ]
}
EOF
cat > chdir.zig <<EOF
const std = @import("std");

pub fn main() !void {
    try std.os.chdir(".");
}
EOF
zig build-exe chdir.zig
sudo ./ebpfault --config efault.json --exec ./chdir
sudo ./ebpfault --config einval.json --exec ./chdir

This is only the easiest way to get rare kernel errors to show up. As Matthew Wilcox writes, there are 70+ filesystems and also hooks such as Linux Security Modules. These can all return their own error codes under the correct conditions. Simply do git grep -ho '\-E[A-Z]*' fs or git grep -ho '\-E[A-Z]*' security in a kernel git repo to see the wide variety of error codes - nearly all error codes are in use. Furthermore a new release of the kernel may return new error codes. Figuring out the actual set of error codes each system call may return is a "Sisyphean" task that is best left unattempted.

Expected Behavior

I expect Zig to allow writing robust programs, so crashing with a panic is definitely not what I expected. I should be able to see the exact error code such as EINVAL or EFAULT when it comes up in practice, in the stack trace - the message for Unexpected doesn't count because it only shows up in debug mode. I should be able to handle the error code in a way appropriate to the application, without modifying Zig's standard library.

Actual Behavior

On running sudo ./ebpfault --config efault.json --exec ./chdir:

timestamp: 4844456200127 syscall: chdir process_id: 11026 thread_id: 11026 injected_error: -EFAULT
       r15 0000000000000000        r14 0000000000000000        r13 0000000000000000 
       r12 ffffa821812e7f58        rbp ffffa821812e7f48        rbx 0000000000000000 
       r11 0000000000000000        r10 0000000000000000         r9 0000000000000000 
        r8 0000000000000000        rax ffffffffb62d3040        rcx 0000000000000000 
       rdx ffffffffffffffff        rsi 0000000000000050        rdi ffffa821812e7f58 
  orig_rax 0000000000000000        rip ffffffffb62d3041         cs 0000000000000010 
    eflags 0000000000000206        rsp ffffa821812e7f38         ss 0000000000000018 
thread 11026 panic: reached unreachable code

/nix/store/xnvnk2dpcig6xcx55xd8bvwdznsd0h9h-zig-0.9.0/lib/zig/std/os.zig:2625:19: 0x2349ee in std.os.chdirZ (chdir)
        .FAULT => unreachable,
                  ^
/nix/store/xnvnk2dpcig6xcx55xd8bvwdznsd0h9h-zig-0.9.0/lib/zig/std/os.zig:2610:22: 0x234916 in std.os.chdir (chdir)
        return chdirZ(&dir_path_c);
                     ^
ebpfault/build/chdir.zig:4:21: 0x22d19a in main (chdir)
    try std.os.chdir(".");
                    ^
/nix/store/xnvnk2dpcig6xcx55xd8bvwdznsd0h9h-zig-0.9.0/lib/zig/std/start.zig:553:37: 0x225e4a in std.start.callMain (chdir)
            const result = root.main() catch |err| {
                                    ^
/nix/store/xnvnk2dpcig6xcx55xd8bvwdznsd0h9h-zig-0.9.0/lib/zig/std/start.zig:495:12: 0x2071fe in std.start.callMainWithArgs (chdir)
    return @call(.{ .modifier = .always_inline }, callMain, .{});
           ^
/nix/store/xnvnk2dpcig6xcx55xd8bvwdznsd0h9h-zig-0.9.0/lib/zig/std/start.zig:409:17: 0x206296 in std.start.posixCallMainAndExit (chdir)
    std.os.exit(@call(.{ .modifier = .always_inline }, callMainWithArgs, .{ argc, argv, envp }));
                ^
/nix/store/xnvnk2dpcig6xcx55xd8bvwdznsd0h9h-zig-0.9.0/lib/zig/std/start.zig:322:5: 0x2060a2 in std.start._start (chdir)
    @call(.{ .modifier = .never_inline }, posixCallMainAndExit, .{});
    ^

On running sudo ./ebpfault --config einval.json --exec ./chdir:

timestamp: 4918261533081 syscall: chdir process_id: 11058 thread_id: 11058 injected_error: -EINVAL
       r15 0000000000000000        r14 0000000000000000        r13 0000000000000000 
       r12 ffffa8218127ff58        rbp ffffa8218127ff48        rbx 0000000000000000 
       r11 0000000000000000        r10 0000000000000000         r9 0000000000000000 
        r8 0000000000000000        rax ffffffffb62d3040        rcx 0000000000000000 
       rdx ffffffffffffffff        rsi 0000000000000050        rdi ffffa8218127ff58 
  orig_rax 0000000000000000        rip ffffffffb62d3041         cs 0000000000000010 
    eflags 0000000000000206        rsp ffffa8218127ff38         ss 0000000000000018 

unexpected errno: 22
/nix/store/xnvnk2dpcig6xcx55xd8bvwdznsd0h9h-zig-0.9.0/lib/zig/std/debug.zig:458:19: 0x208048 in std.debug.writeCurrentStackTrace (chdir)
    while (it.next()) |return_address| {
                  ^
/nix/store/xnvnk2dpcig6xcx55xd8bvwdznsd0h9h-zig-0.9.0/lib/zig/std/debug.zig:115:31: 0x206aca in std.debug.dumpCurrentStackTrace (chdir)
        writeCurrentStackTrace(stderr, debug_info, detectTTYConfig(), start_addr) catch |err| {
                              ^
/nix/store/xnvnk2dpcig6xcx55xd8bvwdznsd0h9h-zig-0.9.0/lib/zig/std/os.zig:4985:40: 0x209682 in std.os.unexpectedErrno (chdir)
        std.debug.dumpCurrentStackTrace(null);
                                       ^
/nix/store/xnvnk2dpcig6xcx55xd8bvwdznsd0h9h-zig-0.9.0/lib/zig/std/os.zig:2632:45: 0x23499b in std.os.chdirZ (chdir)
        else => |err| return unexpectedErrno(err),
                                            ^
/nix/store/xnvnk2dpcig6xcx55xd8bvwdznsd0h9h-zig-0.9.0/lib/zig/std/os.zig:2610:22: 0x234916 in std.os.chdir (chdir)
        return chdirZ(&dir_path_c);
                     ^
ebpfault/build/chdir.zig:4:21: 0x22d19a in main (chdir)
    try std.os.chdir(".");
                    ^
/nix/store/xnvnk2dpcig6xcx55xd8bvwdznsd0h9h-zig-0.9.0/lib/zig/std/start.zig:553:37: 0x225e4a in std.start.callMain (chdir)
            const result = root.main() catch |err| {
                                    ^
/nix/store/xnvnk2dpcig6xcx55xd8bvwdznsd0h9h-zig-0.9.0/lib/zig/std/start.zig:495:12: 0x2071fe in std.start.callMainWithArgs (chdir)
    return @call(.{ .modifier = .always_inline }, callMain, .{});
           ^
/nix/store/xnvnk2dpcig6xcx55xd8bvwdznsd0h9h-zig-0.9.0/lib/zig/std/start.zig:409:17: 0x206296 in std.start.posixCallMainAndExit (chdir)
    std.os.exit(@call(.{ .modifier = .always_inline }, callMainWithArgs, .{ argc, argv, envp }));
                ^
/nix/store/xnvnk2dpcig6xcx55xd8bvwdznsd0h9h-zig-0.9.0/lib/zig/std/start.zig:322:5: 0x2060a2 in std.start._start (chdir)
    @call(.{ .modifier = .never_inline }, posixCallMainAndExit, .{});
    ^
error: Unexpected
/nix/store/xnvnk2dpcig6xcx55xd8bvwdznsd0h9h-zig-0.9.0/lib/zig/std/os.zig:4987:5: 0x209691 in std.os.unexpectedErrno (chdir)
    return error.Unexpected;
    ^
/nix/store/xnvnk2dpcig6xcx55xd8bvwdznsd0h9h-zig-0.9.0/lib/zig/std/os.zig:2632:23: 0x2349a8 in std.os.chdirZ (chdir)
        else => |err| return unexpectedErrno(err),
                      ^
/nix/store/xnvnk2dpcig6xcx55xd8bvwdznsd0h9h-zig-0.9.0/lib/zig/std/os.zig:2610:9: 0x23492c in std.os.chdir (chdir)
        return chdirZ(&dir_path_c);
        ^
ebpfault/build/chdir.zig:4:5: 0x22d1b9 in main (chdir)
    try std.os.chdir(".");
    ^
@Mathnerd314 Mathnerd314 added the bug Observed behavior contradicts documented or intended behavior label Feb 3, 2022
@matu3ba
Copy link
Contributor

matu3ba commented Feb 9, 2022

Can you attach the used Zig code as well? https://github.com/trailofbits/ebpfault does not have Zig code and chdir is probably based on some Zig code.

@Mathnerd314
Copy link
Author

Not sure what you mean. I cooked up chdir.zig based on Hello World. Using Hello World directly and hooking the write call didn't work well because the error message writes were hooked too and the panics didn't output anything. I think there was an infinite loop, panic during a panic during a panic etc. It's easier to see what's going on using a different syscall.

@matu3ba
Copy link
Contributor

matu3ba commented Feb 11, 2022

I think there was an infinite loop, panic during a panic during a panic etc

Can you run the program with strace and show the output behind <details> ``` strace output ``` </details> ?
strace should be able to capture at least the initial signal and might give a hint.

@Mathnerd314
Copy link
Author

Mathnerd314 commented Feb 11, 2022

It's a bit tricky to strace simultaneously with ebpfault - you have to start the process and then attach to it from both of them. But it's possible. Here is the panic-during-panic behavior with strace:

$ cat hello.zig
const std = @import("std");

pub fn main() !void {
    const stdin = std.io.getStdIn();
    var line_buf: [20]u8 = undefined;
    _ = try stdin.read(&line_buf);
    const stdout = std.io.getStdOut().writer();
    try stdout.print("Hello, {s}!\n", .{"world"});
}
$ zig build-exe hello.zig
$ ./hello &
[1] 83419
[1]+  Stopped                 ./hello
$ sudo ./ebpfault --config write.json -p 83419 &
[2] 83427
Generating fault injectors...
 > write
   Error list:
   - 100% => -EINVAL
$ strace -p 83419 &
[3] 83443
strace: Process 83419 attached
--- stopped by SIGTTIN ---
$ fg 1
./hello
--- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=82295, si_uid=1000} ---
read(0, 1
"1\n", 20)                      = 2
write(1, "Hello, ", 7timestamp: 288916424653781 syscall: write process_id: 83419 thread_id: 83419 injected_error: -EINVAL
       r15 0000000000000000        r14 0000000000000000        r13 0000000000000000 
       r12 ffffbf2fc294bf58        rbp ffffbf2fc294bf48        rbx 0000000000000000 
       r11 0000000000000000        r10 0000000000000000         r9 0000000000000000 
        r8 0000000000000001        rax ffffffffa98d8b70        rcx 0000000000000000 
       rdx ffffffffffffffff        rsi ffffffffaa695bd9        rdi ffffbf2fc294bf58 
  orig_rax 0000000000000000        rip ffffffffa98d8b71         cs 0000000000000010 
    eflags 0000000000000202        rsp ffffbf2fc294bf38         ss 0000000000000018 

)                  = -1 EINVAL (Invalid argument)
rt_sigaction(SIGSEGV, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x205900}, NULL, 8) = 0
rt_sigaction(SIGILL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x205900}, NULL, 8) = 0
rt_sigaction(SIGBUS, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x205900}, NULL, 8) = 0
gettid()                                = 83419
write(2, "thread ", 7)                  = -1 EINVAL (Invalid argument)
timestamp: 288916425190934 syscall: write process_id: 83419 thread_id: 83419 injected_error: -EINVAL
       r15 0000000000000000        r14 0000000000000000        r13 0000000000000000 
rt_sigaction(SIGSEGV, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x205900},        r12 ffffbf2fc294bf58        rbp ffffbf2fc294bf48        rbx 0000000000000000 
       r11 0000000000000003        r10 0000000000000000         r9 0000000000000000 
        r8 0000000000000001        rax ffffffffa98d8b70        rcx 0000000000000000 
       rdx ffffffffffffffff        rsi ffffffffaa695bd9        rdi ffffbf2fc294bf58 
  orig_rax 0000000000000000        rip ffffffffa98d8b71         cs 0000000000000010 
    eflags 0000000000000202        rsp ffffbf2fc294bf38         ss 0000000000000018 

NULL, 8) = 0
rt_sigaction(SIGILL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x205900}, NULL, 8) = 0
rt_sigaction(SIGBUS, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x205900}, NULL, 8) = 0
write(2, "Panicked during a panic. Abortin"..., 35) = -1 EINVAL (Invalid argument)
timestamp: 288916425740521 syscall: write process_id: 83419 thread_id: 83419 injected_error: -EINVAL
       r15 0000000000000000        r14 0000000000000000        r13 0000000000000000 
       r12 ffffbf2fc294bf58        rbp ffffbf2fc294bf48        rbx 0000000000000000 
rt_sigaction(SIGSEGV, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x205900},        r11 0000000000000000        r10 0000000000000000         r9 0000000000000000 
        r8 0000000000000001        rax ffffffffa98d8b70        rcx 0000000000000000 
       rdx ffffffffffffffff        rsi ffffffffaa695bd9        rdi ffffbf2fc294bf58 
  orig_rax 0000000000000000        rip ffffffffa98d8b71         cs 0000000000000010 
    eflags 0000000000000202        rsp ffffbf2fc294bf38         ss 0000000000000018 

NULL, 8) = 0
rt_sigaction(SIGILL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x205900}, NULL, 8) = 0
rt_sigaction(SIGBUS, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x205900}, NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[HUP INT RT_32], [], 8) = 0
gettid()                                = 83419
tkill(83419, SIGABRT)                   = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=83419, si_uid=1000} ---
+++ killed by SIGABRT (core dumped) +++
Aborted (core dumped)
[3]+  Done                    strace -p 83419
All processes have been terminated
Exiting...
[2]+  Done                    sudo ./ebpfault --config write.json -p 83419

So here with strace the behavior is consistent with the chdir one. But when you try to run hello directly under ebpfault with sudo ./ebpfault --config write.json --exec ./hello (without strace) it just hangs. Not sure why. This is the infinite loop I referred to (although it doesn't give 100% CPU usage or anything like that).

@Vexu Vexu added standard library This issue involves writing Zig code for the standard library. os-linux labels Apr 28, 2022
@Vexu Vexu added this to the 0.12.0 milestone Apr 28, 2022
@Mathnerd314
Copy link
Author

#11178 is an example of how this happens "in the wild" as opposed to with fault injection.

@matu3ba
Copy link
Contributor

matu3ba commented Jun 9, 2022

it just hangs. Not sure why.

I had a similar experience with stage1, when it was unable to come up with a stack trace and panicked in between.
Though this should show up in strace log somewhere with a reoccurring pattern, if the log works as expected.

I dont understand how the tool handles file stream redirects though (especially the standard streams stdin,stdout and stderr).

@matu3ba
Copy link
Contributor

matu3ba commented Jun 9, 2022

#11178 is an example of how this happens "in the wild" as opposed to with fault injection.

Maybe call it use case? Ideally this would be fixed upstream by the Kernel having a proper machine readable interface.

@Mathnerd314
Copy link
Author

Mathnerd314 commented Jun 10, 2022

Ideally this would be fixed upstream by the Kernel having a proper machine readable interface.

The interface is in fact really simple: any syscall can return any error code. This can be proven by modifying my steps-to-reproduce to the syscall and error code of your choice. Even if you ignore this case, Matthew Wilcox's message confirms that the kernel reserves the right to use whatever error codes they want, and they will never maintain such an interface. The only way to produce a more specific interface without playing error code whack-a-mole would be an automated analysis of the kernel code. But such an analysis would still give 50+ error codes for each syscall, so it is arguable if it would help.

@andrewrk
Copy link
Member

andrewrk commented Jun 10, 2022

Most calls look something like this:

        switch (system.getErrno(res)) {
            .SUCCESS => return,
            .FAULT => unreachable,
            .INVAL => unreachable,
            .ACCES => return error.AccessDenied,
            .NOMEM => return error.SystemResources,
            .NOTDIR => return error.FileNotFound,
            .PERM => return error.AccessDenied,
            else => |err| return unexpectedErrno(err),
        }

The only two codes that are handled with unreachable are:

  • Invalid pointer passed to the kernel
  • Syscall API used incorrectly

Unexpected error codes are handled by returning error.Unexpected, so that applications can handle even unexpected errors. Note that even in release builds, error.Unexpected is returned for unexpected error codes. The message in debug builds is only a hint that we should add a new switch case item to the standard library. And yes I am aware that for some syscalls, EINVAL is used to indicate a problem executing the syscall rather than incorrect API usage - we evaluate this on a case-by-case basis.

You could patch the Linux kernel to do anything, such as returning EINVAL for exit. However this would understandably cause undefined behavior in applications.

So, yes, the Linux kernel is assumed by the standard library to have certain kinds of behavior. If you want to reach into a lower level of abstraction, you can use std.os.linux directly and handle the usize that the kernel returns.

I expect Zig to allow writing robust programs, so crashing with a panic is definitely not what I expected. I should be able to see the exact error code such as EINVAL or EFAULT when it comes up in practice, in the stack trace - the message for Unexpected doesn't count because it only shows up in debug mode. I should be able to handle the error code in a way appropriate to the application, without modifying Zig's standard library.

Generally, zig does allow writing robust programs, and robust programs should expect the kernel to never return EFAULT from chdir. Crashing in this case could be argued to be more robust, in fact, since proceeding with an unpredictable kernel may have unintended consequences. Or presumably, we have discovered that our pointer is dangling, which is dangerous and panicking is the right move here.

If you have issues with how a specific syscall is wrapped, then you are welcome to file a bug report for that function and we can discuss it.

@andrewrk andrewrk closed this as not planned Won't fix, can't repro, duplicate, stale Jun 10, 2022
@andrewrk andrewrk removed the bug Observed behavior contradicts documented or intended behavior label Jun 10, 2022
@andrewrk andrewrk modified the milestones: 0.12.0, 0.10.0 Jun 10, 2022
@Mathnerd314
Copy link
Author

Unexpected error codes are handled by returning error.Unexpected, so that applications can handle even unexpected errors.

With this method there is no way to get the original error code. Meanwhile C's errno and _doserrno provide a cross-platform way to get the full error code.

You could patch the Linux kernel to do anything, such as returning EINVAL for exit. However this would understandably cause undefined behavior in applications.

One practical case this would happen is where a new kernel is released but a stable branch of Zig is in use. The current std.os design does not allow handling a new error code in the application, so the standard library would have to be patched to add the error code. This ad-hoc patching is fragile and tedious.

I'd like to be able to handle unexpected error codes specifically in the application. The behavior isn't undefined if you're expecting it to happen. But you can't handle the errors with std.os, because the error code is not accessible.

If you want to reach into a lower level of abstraction, you can use std.os.linux directly and handle the usize that the kernel returns.

It seems that given the current design using std.os.linux is the only choice. But using std.os.linux you lose all the advantages of std.os (slices, cross-platform, retrying on EINTR, etc.), so I don't think it's really an option. Rather than a lower level of abstraction, I would prefer a high-level abstraction that provides decent error codes.

If you have issues with how a specific syscall is wrapped, then you are welcome to file a bug report for that function and we can discuss it.

The reason I filed this issue was to get ahead of the game and avoid the next dozen issues about error codes. Also because enumerating the codes returned is a "Sisyphean task". If you really want some cases Zig is missing, how about EBADF, EDQUOT, ENXIO, EOPNOTSUPP, EROFS, ETXTBSY, EWOULDBLOCK in openZ() and EINTR in fsync().

@andrewrk
Copy link
Member

One practical case this would happen is where a new kernel is released but a stable branch of Zig is in use. The current std.os design does not allow handling a new error code in the application, so the standard library would have to be patched to add the error code. This ad-hoc patching is fragile and tedious.

Fair point. I will re-open this as a proposal.

@andrewrk andrewrk reopened this Jun 10, 2022
@andrewrk andrewrk added breaking Implementing this issue could cause existing code to no longer compile or have different behavior. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. labels Jun 10, 2022
@andrewrk andrewrk modified the milestones: 0.10.0, 0.12.0 Jun 10, 2022
@Mathnerd314
Copy link
Author

Boiling it down, I think all you need is something like this:

threadlocal var lastErrNo: E = SUCCESS;

pub fn errno(r: usize) E {
  const err = system.getErrno(r);
  lastErrNo = err; // store errno during conversion to Error
  return err;
}

Then you can do:

os.openZ("/dev/null", os.O.RDWR, 0) catch |err| switch (err) {
  error.Unexpected => switch(std.os.lastErrNo) { 
    .EBADF => ...
  }
}

lastErrNo could even be OS-specific so you would have to write std.os.linux.lastErrNo.

It doesn't cover unreachable, but as you say I think the kernel can be assumed to have some amount of sanity.

@andrewrk
Copy link
Member

Zig std lib will not be repeating the crime against humanity that is thread-local errno.

@Mathnerd314
Copy link
Author

Well, with #2647, the full error code can be added to the Unexpected error:

pub const UnexpectedError = error{
    /// The Operating System returned an error code not handled by Zig's standard library.
    Unexpected: usize,
};
...
os.openZ("/dev/null", os.O.RDWR, 0) catch |err| switch (err) {
  error.Unexpected => |errCode| switch(errCode) { 
    .EBADF => ...
  }
}

You said that you preferred passing the data as an out parameter. I guess that would work too:

var errCode: usize = 0; 
os.openZ("/dev/null", os.O.RDWR, 0, &errCode) catch |err| switch (err) {
  error.Unexpected => switch(errCode) { 
    .EBADF => ...
  }
}

But it would mean every function in std.os would have to take an errCode parameter.

@mnemnion
Copy link

mnemnion commented Jun 7, 2024

Generally, zig does allow writing robust programs, and robust programs should expect the kernel to never return EFAULT from chdir. Crashing in this case could be argued to be more robust, in fact, since proceeding with an unpredictable kernel may have unintended consequences. Or presumably, we have discovered that our pointer is dangling, which is dangerous and panicking is the right move here.

I wanted a note on this issue that, even in the case that the only thing to do is crash, unreachable is not a reliable way to do that. It only panics under certain build conditions, so having it as an errno prong will just corrupt the program if it's hit during Fast/Small builds.

Ideally, std.posix doesn't induce crashes. It seems to me that an errno result bad enough to crash is about to get the process, if not the kernel, killed; no need to rush things. But if a crash is the most correct thing, it should be induced with @panic(), which will do so reliably, rather than a => unreachable, which will not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Implementing this issue could cause existing code to no longer compile or have different behavior. os-linux proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. standard library This issue involves writing Zig code for the standard library.
Projects
None yet
Development

No branches or pull requests

5 participants