stack traces for errors #651

Closed

andrewrk opened this issue Dec 7, 2017 · 6 comments
@andrewrk
Member

andrewrk commented Dec 7, 2017

This proposal depends on having failable functions instead of error union types (see #632).

We would like to get rid of the %%x prefix operator in favor of %return x or x %% unreachable, but one reason that is problematic is that, at least in Debug and ReleaseSafe modes, if you hit an error with %%x (rather than propagating it with %return x), you get a nice stack trace.

Well, kind of. You get the stack trace of where you tried to unwrap the error, but not the stack trace of where the error was created and how it bubbled up the call stack.

This proposal describes a way to get access to this stack trace.

Consider this code. I will use the new syntax from proposal #632, where %return x is replaced with try x and a %% b is replaced with a !! b.

const std = @import("std");
const warn = std.debug.warn;

fn main() -> noreturn {
    // Consider calling main2 with values 3, 9, 11
    main2(3) !! |err| {
        warn("{}\n", err);
        std.debug.printStackTrace();
        std.os.exit(1);
    };
    std.os.exit(0);
}

fn main2(value: i32) error {
    const x = try foo(value);
    const y = try bar(x);
    try baz(y);
}

fn foo(param: i32) error -> i32 {
    if (param == 0)
        return error.ItBroke;
    if (param > 10)
        return error.ItBroke;
    return param;
}

fn bar(param: i32) error -> i32 {
    if (param == 4)
        return error.ItBroke;
    if (param == 5)
        return error.ItBroke;
    return param;
}

fn baz(param: i32) error {
    if (param == 6)
        return error.ItBroke;
    if (param == 7)
        return error.ItBroke;
}

Here, when we run the program, we're going to see either a successful exit code, or "ItBroke" and a failure exit code.

The problem is that if you get "ItBroke" that doesn't tell you where exactly the error originated from.
The printed stack trace will have only the main() function in it and nothing else.

Here's what I propose:

A function that can fail has a secret parameter passed to it which is a pointer to a stack trace node, which looks something like this:

const StackTraceNode = struct {
    instruction_pointer_address: usize,
    next: ?&StackTraceNode,
};

When a failable function returns, it sets the instruction_pointer_address to the current instruction pointer.

A function that calls a failable function has a StackTraceNode in its stack frame and passes a pointer to it to the failable function that it calls. If the caller is itself failable, it sets the next field of the node it received from its own caller to point at the node it passes to its callee.
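
To make this threading concrete, here is a minimal C sketch of the bookkeeping the compiler might generate. All names are hypothetical, the GCC/Clang __builtin_return_address is only a stand-in for capturing the instruction pointer, and -1 stands in for returning an error:

#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct StackTraceNode {
    uintptr_t instruction_pointer_address;
    struct StackTraceNode *next;
};

// A failable function: the trailing parameter is the secret one.
// Just before returning (only the error path is shown here), it records
// where it returned from.
int foo(int param, struct StackTraceNode *trace_node) {
    if (param == 0) {
        trace_node->instruction_pointer_address =
            (uintptr_t)__builtin_return_address(0); // approximation of the IP
        return -1; // stand-in for `return error.ItBroke`
    }
    return param;
}

// A failable caller: it owns the node it passes to its callee, and links
// the node it received from its own caller to that new node.
int main2(int value, struct StackTraceNode *trace_node) {
    struct StackTraceNode callee_node = {0, NULL};
    trace_node->next = &callee_node;
    int x = foo(value, &callee_node);
    if (x < 0) {
        trace_node->instruction_pointer_address =
            (uintptr_t)__builtin_return_address(0);
        return -1; // propagate, like `try`
    }
    return x;
}

int main(void) {
    struct StackTraceNode root = {0, NULL}; // in the first non-failable caller's frame
    if (main2(0, &root) < 0) {
        // Only root is still safely readable here; root.next points into
        // main2's dead stack frame, which is exactly the flaw discussed below.
        printf("last return point: 0x%" PRIxPTR "\n",
               root.instruction_pointer_address);
    }
    return 0;
}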

As a result, at every callsite of a failable function, we have access to a full callstack of where the
error originated from and how it was passed up the stack. The data remains valid until a function call, at which point the data is obliterated. This strategy is incompatible with tail calls, which would have to be disabled for failable functions. This is a pretty big flaw.

We need language support to enable capturing the stack trace data in a way that can't accidentally lead to calling a function and accessing invalid memory.

Consider the example above again, but with a different main implementation with this proposal:

fn main() -> noreturn {
    // Consider calling main2 with values 3, 9, 11
    main2(3) !! |err, stack_trace_object| {
        // OK wait a minute.... this proposal has a big problem
    };
    std.os.exit(0);
}

Alright, this proposal can't work as is. Because what are we going to do with a linked list of instruction pointers that becomes invalid memory if we do so much as call memcpy? The only thing we would be able to do is iterate over it and copy it to a buffer whose size was known before the function call was made. At which point, why bother with using dead stack memory? We could just create this data structure beforehand.

So here's how I'll alter the proposal to make it still work. We pick a fixed number of stack frames, such as 31. In a non-failable function that calls a failable function, we allocate this structure on the stack and initialize it to all zeroes:

const StackTrace = struct {
    index: usize,
    instruction_pointer_addresses: [31]usize,
};

Once again we pass a pointer to this struct as a secret parameter to failable functions, and just before returning, we do:

instruction_pointer_addresses[index] = @currentInstructionPointer();
index = (index + 1) % 31;

This assumes that the deepest stack entries are the most significant. It is compatible with tail calls. Now our example code can look like:

fn main() -> noreturn {
    // Consider calling main2 with values 3, 9, 11
    main2(3) !! |err, stack_trace_object| {
        warn("{}\n", err);
        std.debug.printStackTrace(stack_trace_object);
        std.os.exit(1);
    };
    std.os.exit(0);
}

The printStackTrace function can iterate over the StackTrace object, and use DWARF or PDB info to turn instruction pointer addresses into a human readable stack trace.

31 was chosen because 31 * 8 + 8 = 256 bytes on a 64-bit system (or 31 * 4 + 4 = 128 bytes on a 32-bit system) is a reasonable amount of data to put on the stack. Note that this cost is not paid for every call; the 256 bytes live in the stack frame of the first non-failable function that calls a failable function.
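
To make the bookkeeping concrete, here is a minimal C sketch of the StackTrace ring buffer, the append a compiler would emit before an error return, and a printStackTrace-style loop. The names and addresses are hypothetical; a real implementation would resolve the addresses through DWARF or PDB info rather than printing raw values:

#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define TRACE_FRAMES 31

struct StackTrace {
    size_t index; // next slot to write; wraps around like a ring buffer
    uintptr_t instruction_pointer_addresses[TRACE_FRAMES];
};

// What the compiler would insert just before a failable function returns an error.
static void trace_append(struct StackTrace *t, uintptr_t return_address) {
    t->instruction_pointer_addresses[t->index] = return_address;
    t->index = (t->index + 1) % TRACE_FRAMES;
}

// Walk the ring buffer oldest-to-newest; zeroed slots were never written.
static void print_stack_trace(const struct StackTrace *t) {
    for (size_t i = 0; i < TRACE_FRAMES; i++) {
        uintptr_t addr =
            t->instruction_pointer_addresses[(t->index + i) % TRACE_FRAMES];
        if (addr != 0)
            printf("return point: 0x%" PRIxPTR "\n", addr);
    }
}

int main(void) {
    struct StackTrace trace = {0}; // allocated and zeroed by the first non-failable caller
    trace_append(&trace, 0x1000); // e.g. the `return error.ItBroke` in bar()
    trace_append(&trace, 0x2000); // e.g. the `try bar(x)` that propagated it
    print_stack_trace(&trace);
    return 0;
}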

We could potentially analyze the call graph, in a similar way that we want to do for #157, to find out the maximum number of function calls - tail calls included - that contain consecutive failable functions. This would give us our global value to use for the instruction_pointer_addresses array.
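
As a rough illustration, here is a C sketch of that analysis over a small, hard-coded, acyclic call graph (the graph and names are hypothetical; recursion would make the bound unbounded and would need separate handling):

#include <stdio.h>

#define N 4 // number of failable functions: main2, foo, bar, baz

// calls[a][b] != 0 means failable function a calls failable function b
static const int calls[N][N] = {
    {0, 1, 1, 1}, // main2 calls foo, bar, baz
    {0, 0, 0, 0}, // foo calls no failable functions
    {0, 0, 0, 0}, // bar calls no failable functions
    {0, 0, 0, 0}, // baz calls no failable functions
};

// Longest chain of consecutive failable calls starting at function f.
static int chain_length(int f) {
    int best = 0;
    for (int g = 0; g < N; g++) {
        if (calls[f][g]) {
            int len = chain_length(g);
            if (len > best)
                best = len;
        }
    }
    return best + 1;
}

int main(void) {
    int bound = 0;
    for (int f = 0; f < N; f++) {
        int len = chain_length(f);
        if (len > bound)
            bound = len;
    }
    // This bound would replace the fixed 31 used for instruction_pointer_addresses.
    printf("max consecutive failable calls: %d\n", bound);
    return 0;
}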

Another possible modification to this proposal is that we could enable this for every function. Then at any given point, you can use @getReturnTrace() to obtain a StackTrace explaining not the stack of how you got to the current instruction, but the "return trace" of how you got back to the current instruction.

This would prevent a lot of optimizations and should probably only be allowed in Debug mode. Functions with safety disabled should probably not contain the code to add themselves to this return trace. They would be mysteriously missing.

Doing this for failable functions, however, should probably be allowed in at least Debug and ReleaseSafe mode. For ReleaseFast mode, maybe this is still OK. But in any mode, we should omit the code if the stack trace object is never captured.


Examples

I realized that all of these have only a maximum stack depth of 2, but hopefully they still illustrate the point.

1 fn main2(value: i32) error {
2     const x = try foo(value);
3     const y = try bar(x);
4     try baz(y);
5 }
6 
7 fn foo(param: i32) error -> i32 {
8     if (param == 0)
9         return error.ItBroke;
10     if (param > 10)
11         return error.ItBroke;
12     return param;
13 }
14 
15 fn bar(param: i32) error -> i32 {
16     if (param == 4)
17         return error.ItBroke;
18     if (param == 5)
19         return error.ItBroke;
20     return param;
21 }
22 
23 fn baz(param: i32) error {
24     if (param == 6)
25         return error.ItBroke;
26     if (param == 7)
27         return error.ItBroke;
28 }
  • Example: main2(11)
/path/to/filename.zig:line:column:
11         return error.ItBroke;
/path/to/filename.zig:line:column:
2     const x = try foo(value);
  • Example: main2(6)
/path/to/filename.zig:line:column:
25         return error.ItBroke;
/path/to/filename.zig:line:column:
4     try baz(y);
  • Example: main2(5)
/path/to/filename.zig:line:column:
19         return error.ItBroke;
/path/to/filename.zig:line:column:
3     const y = try bar(x);
andrewrk added the "proposal" label Dec 7, 2017
andrewrk added this to the 0.3.0 milestone Dec 7, 2017
@PavelVozenilek

I had once implemented stack trace support for Win32 C applications.

  1. In most cases stack traces are not that useful (and Visual C++ already presents a handy call stack in the debugger). Often the error is clear and the stack trace only annoys. I implemented the error dialog so that it shows the stack trace only after a "Details" button is clicked.

    In the proposal above, the user should be able to ask to show the current stack trace if they think it could be useful. Always printing it out does not feel practical.

  2. Collecting stack traces via PDB inspection is not the fastest (it requires I/O and needs mutex protection). I eventually dropped collecting a stack trace for every allocated block - the cost was too high compared with the few instructions needed to alloc() via an incrementing allocator.

    Since returning an error could happen quite often, performance should be a concern.

  3. There's a relatively simple way to collect stack traces portably, without the need for special tooling:

    1. Stack traces are kept as pointers inside a table that behaves like a stack.

    2. Every function would append a __FILE__ : __LINE__ : __function__ equivalent (kept in memory as a read-only string) to the table. Whenever the function returns normally, it would just revert the top of the stack table to the previous value. Every thread could have its own stack table, eliminating the need for locks.

    3. The compiler would place all this code into each function, completely invisibly. Leaf calls without errors would be omitted.

    4. If an error is returned, the accumulated traces would be kept (they will always be valid) and the caller which handles the error would have all the needed information.

    5. Recursive calls would eventually exhaust the table, and subsequent traces would be lost. After the recursion finishes, the table would be usable again, with the important parts (older stack traces) intact.

C pseudocode for how it could be implemented:

static const char* stack_traces[1024];
static unsigned stack_traces_top = 0;

void foo(void)
{
  // ==== this is generated automatically ====
  unsigned old_top = stack_traces_top;
  if (stack_traces_top < 1024) {
    stack_traces[stack_traces_top++] = __FILE__LINE__FUNCTION__; /* stands for a combined file/line/function string */
  }
  // ==== end of autogenerated code ====

  ...

  // normal return
  if (...) {
    // ==== this is generated automatically ====
    stack_traces_top = old_top;
    // ==== end of autogenerated code ====
    return;
  } 

  // returns an error - nothing is automatically generated
  if (...) {
     return error.SomeError;
   }

  // when error from some call is handled here
  print_stack_table(old_top, stack_traces_top); // will show where error happened

  // ==== this is generated automatically ====
  stack_traces_top = old_top;
  // ==== end of autogenerated code ====
}


Disadvantages of this mechanism:

  1. It would collect the file and line where every function starts plus the function name, not the place where the function is called (so it wouldn't be able to distinguish between successive calls to the same function).

  2. The technique is unfit for release mode (unlike PDB digging).


This stack trace mechanism would allow implementing the builtin @getStackTrace() proposed in #648 (comment). Whenever the compiler notices this builtin, it should insert stack trace support into that function (and into all possible callers, all the way up), even if they do not use errors.

@thejoshwolfe
Sponsor Contributor

Consider this example:

fn onMessage() -> %void {
    handleRequest() %% |err1| switch (err1) {
        error.NoMem => return err1, // example 1
        error.InvalidRequest => {
            logError(err1) %% |err2| switch (err2) {
                error.IOError => return err2, // example 2
            };
            return err1; // example 3
        },
    };
}

Say handleRequest() returns a NoMem error. It gets effectively rethrown at example 1. What happens to the stack trace in that case? Does it get truncated to omit the call to handleRequest() and instead say that example 1 is the origin of the error? Or perhaps example 1 is omitted from the stack trace? Or perhaps example 1 is appended to the stack trace, which makes the least sense.

Say handleRequest() returns an InvalidRequest error, then logError() returns an IOError. We were previously returning an InvalidRequest error, but then we change to returning an IOError at example 2. What happens to the original stack trace? Do we get a completely different stack trace for the IOError coming out of logError()? And if so, does it include the call to onMessage()?

Say we get to example 3. This is mostly the same situation as example 1, but we need to make sure the logError() call didn't clobber anything with our err1 stack trace.

@andrewrk
Member Author

The return-point stack trace, when we come back from handleRequest(), has some addresses in it already, the first one being the point where the error.NoMem was first returned from a function, the last one being the return statement inside handleRequest which returned back to onMessage.

If we return at example 1, then the return-point stack trace gets a new address appended, which is the address of the return instruction at example 1. It gets truncated in the sense that a ring buffer truncates data, if and only if there are not enough pre-allocated slots in the stack trace to hold the new entry. Why do you say this makes the least amount of sense? This is exactly what a return-point trace is. It answers the question, "how did I get back to this location?"

When we call logError, the stack trace is untouched until the return error.IOError is encountered. Here, we append the address of this return instruction to the trace, and then the addresses of all the following return (or try) statements, until we get back to example 2. Then we append the address of the return statement at example 2. So if we had enough room for everything, the final stack trace would contain the return-point trace from both errors. This trace is extremely useful; even though there is an odd jump in the middle of the trace, from the return/try in handleRequest to the return error.IOError in the call stack of logError, the trace tells you exactly what error conditions happened and how the application handled them.

Example 3 is exactly the same as example 1, because modifications to the stack trace data only happen in these two places:

  • when the first non-failable function calls a failable function (initial creation)
  • when a failable function returns a possible error (append address of return instruction)

@thejoshwolfe
Sponsor Contributor

thejoshwolfe commented Dec 18, 2017

Ok, I was thinking about "appending" backwards. This makes a little more sense.

Here's my interpretation of what we'd get from example 1:

file.zig:123:5: 0x0000000000000000 in allocate()
    return error.NoMem;
    ^
file.zig:123:5: 0x0000000000000000 in handleRequest()
    %return allocate();
    ^
file.zig:123:5: 0x0000000000000000 in onMessage()
        error.NoMem => return err1, // example 1
                       ^
file.zig:123:5: 0x0000000000000000 in main()
    %return onMessage();
    ^

It's a little confusing that we get a trace of something inside handleRequest() but we don't see where handleRequest() is called. I would have expected something like this:

file.zig:123:5: 0x0000000000000000 in onMessage()
    handleRequest() %% |err1| switch (err1) {
    ^

I suppose that's not in the trace, because it's not a stack trace but rather a return trace.

Ok, let's do example 2:

file.zig:123:5: 0x0000000000000000 in handleRequest()
        return error.InvalidRequest;
        ^
std/os/index.zig:224:13: 0x0000000000000000 in posixWrite()
            return switch (write_err) {
            ^
file.zig:123:5: 0x0000000000000000 in logError()
        error.AccessDenied => return error.IOError;
                              ^
file.zig:123:5: 0x0000000000000000 in onMessage()
                error.IOError => return err2, // example 2
                                 ^
file.zig:123:5: 0x0000000000000000 in main()
    %return onMessage();
    ^

Here we can see the trace jumping all over the place.

Let's say in example 3 that logError() caught and ignored some error:

file.zig:123:5: 0x0000000000000000 in handleRequest()
        return error.InvalidRequest;
        ^
std/os/index.zig:224:13: 0x0000000000000000 in posixWrite()
            return switch (write_err) {
            ^
file.zig:123:5: 0x0000000000000000 in onMessage()
            return err1; // example 3
            ^
file.zig:123:5: 0x0000000000000000 in main()
    %return onMessage();
    ^

It's pretty mysterious how posixWrite() got called at all. We could look near example 3 and determine that it was probably related to the logError() call, but in a more convoluted example, there'd be no way to know how posixWrite() was called.

Maybe this is ok? It's more like a log of where in the source code errors happened rather than any kind of stack trace. Whenever a function returns an error, it adds an entry to the log. The log won't necessarily be a coherent history of control flow, but you get some clues about errors that happen. Is that the idea?

@andrewrk
Member Author

All of these examples accurately describe the proposal, and yes I agree with the conclusion that this is the idea.

In a lot of cases you have a single return error.Something followed by a chain of %returns. For this case it will be perfectly clear how the error originated. This will make it more attractive to use %return (or the proposed try) since the error case causes a useful return trace to be printed.

andrewrk added a commit that referenced this issue Jan 9, 2018
See #632
closes #545
closes #510

this makes #651 higher priority
andrewrk modified the milestones: 0.3.0, 0.2.0 Jan 9, 2018
andrewrk added the "accepted" label Jan 9, 2018
@andrewrk
Member Author

fixed with #684
