New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*WIP* error return trace proof of concept #684

Merged
merged 9 commits into from Jan 15, 2018

Conversation

Projects
None yet
2 participants
@andrewrk
Member

andrewrk commented Jan 12, 2018

There is some cleanup needed before this is ready to merge. But this solves issue #651.

As a demonstration, watch what happens when I mischeviously insert a subtle bug deep in the standard library:

--- a/std/os/child_process.zig
+++ b/std/os/child_process.zig
@@ -369,6 +369,10 @@ pub const ChildProcess = struct {
         const err_pipe = try makePipe();
         %defer destroyPipe(err_pipe);
 
+        if (self.argv[0][1] == 'n') {
+            return error.ProcessNotFound;
+        }
+
         block_SIGCHLD();
         const pid_result = posix.fork();
         const pid_err = posix.getErrno(pid_result);

If the 2nd character of the program we want to execute is n then return ProcessNotFound. Yikes, imagine trying to find this bug.

As it so happens, when we run zig build it will run /nix-store/.../llvm-config on my computer, so we can observe this getting triggered:

./zig build --build-file ../build.zig

I get this beautiful output:
screenshot_2018-01-12_02-07-21

BOOM! We caught the culprit red-handed. Nice try!!

(here is the output in text form)

attempt to unwrap error: ProcessNotFound
/home/andy/dev/zig/build/lib/zig/std/os/child_process.zig:373:29: 0x0000000000270e7e in ??? (build)
        if (self.argv[0][1] == 'n') {
                            ^
/home/andy/dev/zig/build/lib/zig/std/os/child_process.zig:194:23: 0x0000000000252c45 in ??? (build)
        const child = try ChildProcess.init(argv, allocator);
                      ^
/home/andy/dev/zig/build/lib/zig/std/build.zig:735:17: 0x000000000024972d in ??? (build)
                return result.stdout;
                ^
/home/andy/dev/zig/build.zig:202:5: 0x000000000024a83a in ??? (build)
    return result;
    ^

/home/andy/dev/zig/build/lib/zig/std/special/panic.zig:19:43: 0x000000000021e892 in ??? (build)
                @import("std").debug.panic("");
                                          ^
???:?:?: 0x0000000000241f49 in ??? (???)
    ???

/home/andy/dev/zig/build.zig:50:47: 0x00000000002457f6 in ??? (build)
    const llvm = findLLVM(b, llvm_config_exe) catch unreachable;
                                              ^
/home/andy/dev/zig/build/lib/zig/std/special/build_runner.zig:115:15: 0x00000000002413cd in ??? (build)
    root.build(&builder) catch unreachable;
              ^
/home/andy/dev/zig/build/lib/zig/std/special/bootstrap.zig:65:21: 0x000000000023fa61 in ??? (build)
    return root.main();
                    ^
/home/andy/dev/zig/build/lib/zig/std/special/bootstrap.zig:54:13: 0x000000000023f8ef in ??? (build)
    callMain(argc, argv, envp) catch std.os.posix.exit(1);
            ^

Build failed. Use the following command to reproduce the failure:
./../zig-cache/build ./zig ./.. ./../zig-cache

Notice that if we were able to strip out these 2 stack frames (which I know how we can accomplish)...

/home/andy/dev/zig/build/lib/zig/std/special/panic.zig:19:43: 0x000000000021e892 in ??? (build)
                @import("std").debug.panic("");
                                          ^
???:?:?: 0x0000000000241f49 in ??? (???)
    ???

...then the error return trace flows directly into the stack trace. That is, the return result at build.zig:202 returns exactly to the beginning of the stack trace - the catch unreachable at build.zig:50 which is what triggered the "attempt to unwrap error".

@andrewrk

This comment has been minimized.

Member

andrewrk commented Jan 13, 2018

checklist

  • typecheck the panic function
  • fix duplicate stack trace code
  • fix getting debug info twice in default panic handler
  • remove extra call stack frames
  • stack trace type variable number of frames
  • call graph analysis
  • trace pointer as the first parameter
  • build without it in release-fast by default
  • ability to explicitly enable in the above modes
  • generate as tail call
  • catch have optional trace value

andrewrk added some commits Jan 14, 2018

error return trace pointer prefixes other params
instead of being last. This increases the chances that it can
remain in the same register between calls.
@orodley

This comment has been minimized.

orodley commented Jan 14, 2018

Just out of curiosity... why does the deepest entry in the stack trace have a source location pointing at the == in self.argv[0][1] == 'n'? Shouldn't it point at the next line, return error.ProcessNotFound;?

@andrewrk

This comment has been minimized.

Member

andrewrk commented Jan 15, 2018

Yes it should point at the next line. There's either an issue with zig's dwarf code, LLVM's debug info code, or the way we specify debug info for the return statement. I'll look into it.

andrewrk added some commits Jan 15, 2018

clean up error return tracing
 * error return tracing is disabled in release-fast mode
 * add @errorReturnTrace
 * zig build API changes build return type from `void` to `%void`
 * allow `void`, `noreturn`, and `u8` from main. closes #535

@andrewrk andrewrk merged commit c9ac607 into master Jan 15, 2018

4 checks passed

continuous-integration/appveyor/branch AppVeyor build succeeded
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details

@andrewrk andrewrk deleted the wip-err-ret-trace branch Jan 15, 2018

@andrewrk

This comment has been minimized.

Member

andrewrk commented Jan 17, 2018

@orodley as of 4bdfc8a the location is correct.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment