Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add a builtin for comptime source information #2029

Open
Sahnvour opened this Issue Mar 3, 2019 · 12 comments

Comments

Projects
None yet
4 participants
@Sahnvour
Copy link
Member

Sahnvour commented Mar 3, 2019

I want to propose a way to query the compiler in order to get information on the context of the compilation, at call site. In particular, this is mostly used to get source filename, line, function name.

Classical uses of this feature are logging, asserts, scoped variables (instrumentation, debugging...) and probably others I can't think of.

In C and C++ this is done through the __FILE__ and __LINE__ macros which are expanded by the preprocessor.
In Rust, there are file! and line! macros.
D also makes use of __FILE__ and such, though I don't know if it's actually a preprocessor.
And so on.

At the moment, something similar could be achieved by capturing a stack trace and looking at debug info, but this would not be very efficient. It would also be easily disturbed by the optimizations performed.

Zig could provide a @sourceInfo builtin function, returning an instance along the lines of

pub const SourceInfo = struct {
    file: []const u8,
    line: comptime_int,
    column: comptime_int,
    module: type,
    function: type,
    function_name: []const u8,
}

Example usage:

pub fn main() void {
    log(@sourceInfo(), "something something");
}

Example output:

[21:18:04] in 'main' (main.zig:2) | something something

Now, this leaves a usability concern: the other languages cited as examples all provide a way to the library writer to make get this information from client code. This means they can provide a log(msg: string) function or macro that transparently uses the source information.
In current Zig, the only way is to explicitly call @sourceInfo() and pass it to the function expecting it.

One solution is to provide a @callerSourceInfo to get the source information from the call site into the callee function. But it doesn't play nice with function pointers or extern functions.

@andrewrk

This comment has been minimized.

Copy link
Member

andrewrk commented Mar 3, 2019

This was previously rejected in #577, but you have made these new arguments:

capturing a stack trace and looking at debug info would not be very efficient.

It would also be easily disturbed by the optimizations performed.

I want to point out that a full stack trace does not need to be captured - in fact only a single usize returned by @returnAddress would need to be captured. A logging library could emit the addresses, which could be processed later with debug info to replace the addresses with source info, or it could resolve the source information immediately. This solves the problem where the caller would have to call @sourceInfo(). (The log function would want to use noinline #640 )

The problem of debug info not being efficient is a problem with the status quo implementation in userland, but that's not a fundamental problem. Applications which want to resolve debug information during runtime can pay an upfront cost to prepare the debug info structures, and then during the main program execution, lookups by address can be fast. However I still think that a more ideal solution would be simply printing addresses, and having the log processing (which can happen on a different machine, with debug info present that is not necessarily present on the main machine) perform the address to source translations.

As for optimizations, it's true that @sourceInfo() would be completely immune to optimizations, but I also think that debug info is sufficient even in optimized builds to map a machine code instruction back to the source information of where it came from, down to the line and column.

I'd like to push back on this. I think that a strategy with @returnAddress() and debug info is ultimately more optimal, for performance, code size, and convention (by convention I mean that a usize which corresponds to a machine code instruction address can be considered by applications and libraries to fully describe a particular source code expression, and becomes the standard way to store "here's how xyz corresponds to the source code") I'd rather improve the facilities for dealing with instruction addresses, rather than provide an additional solution to solve the same problem.

@andrewrk andrewrk added this to the 0.5.0 milestone Mar 3, 2019

@andrewrk andrewrk added the proposal label Mar 3, 2019

@Sahnvour

This comment has been minimized.

Copy link
Member Author

Sahnvour commented Mar 3, 2019

Ha, I failed to find this previous issue.

I forgot about noinline but that only addresses half of the point: what if the function calling log is small (and log being noinline it does not contribute much to its size via a single call instr.) and gets inlined into its caller ? Can we guarantee that its instructions will carry the "right" debug info, referring to their original source ?
Maybe Zig does this better, but in my experience debug info are not reliable in optimized builds. As in: pairing inlined function code only with its calling source line, sometimes apparently multiple unrelated lines resolving to the same instructions, ...

It could be that one's writing a program where they would rather pay for a constant string of every file in their executable, rather than forcing noinline calls and post-processing of the addresses. This has been proved a faster approach for logging, but there are other usecases (such as real-time profiling/instrumenting that usually use __FILE__ etc.) that maybe cannot afford to have a lengthy post-processing. I've worked on (windows) projects where we record a big number of stack traces, and the impact of reading debug info was huge, with DIA.

If we can guarantee this though, I agree with you on the rest. I'm just not 100% positive that will be sufficiently accurate and fast.

@andrewrk

This comment has been minimized.

Copy link
Member

andrewrk commented Mar 3, 2019

what if the function calling log is small and gets inlined into its caller ? Can we guarantee that its instructions will carry the "right" debug info, referring to their original source ?

This would be guaranteed to be correct. The call to log would always be a call instruction, and the return address from within log implementation would be the next machine code instruction after call. So no matter how much the call to log gets inlined, still the return address would point to the correct thing.

sometimes apparently multiple unrelated lines resolving to the same instructions, ...

Indeed it is possible for the optimizer to merge multiple source lines into the same machine code, so that it becomes ambiguous which "parent" source line would correspond to a given instruction. I had forgotten, but here's an example of when that happened to me at OkCupid.

This is not just a problem with the technique described in this issue; it is a problem with the DWARF info emitted by gcc/clang. This problem exists in general for debugging using these debug symbols. Arguably the debug info should be able to represent when a single machine code address corresponds to multiple source locations. But even if that improvement were made, we're left facing the reality that @sourceInfo() would correctly disambiguate, while using debug info would not. However, in this case, the disambiguation done by @sourceInfo() would come at a cost. The very reason that there is an ambiguity in the other case is that it was more friendly to optimization, and the optimizer was able to produce efficient machine code that mapped to multiple source locations. So I could argue that the appropriate way to disambiguate would be to use @optimizeFor(.Debug) (#978) in a given scope. Otherwise, presumably, what was being optimized for was Release, and the optimization that causes multiple source lines to map to a single instruction is desired.

Here is how I would like to move forward with this issue:

  1. Try to make a proof of concept that the logging use case can be solved in userland, even with optimizations on, and that the performance is on-par with @sourceInfo(). This proof of concept will have to work with PDB as well as DWARF.
  2. Additionally make a proof of concept for the real-time profiling/instrumenting use case.
  3. If both of these proofs of concepts can be accomplished, then close this issue. Otherwise accept this issue.
@andrewrk

This comment has been minimized.

Copy link
Member

andrewrk commented Mar 4, 2019

@Sahnvour here's your gist:

const warn = @import("std").debug.warn;

fn log(msg: []const u8) void {
    warn("{} {}\n", @returnAddress(), msg);
    unreachable;
}

fn foo(n: u32) u32 {
    @noInlineCall(log, "log from foo");
    return n * 2;
}

fn bar(n: u32) u32 {
    return @inlineCall(foo, n);
}

pub fn main() void {
    warn("{}\n", bar(13));
}

output:

$ zig run returnaddress.zig
u8@7ff6c887f03f log from foo
reached unreachable code
D:\DEV\Zig\test\returnaddress.zig:5:5: 0x7ff6c887f250 in log (run.obj)
    unreachable;
    ^
D:\DEV\Zig\test\returnaddress.zig:15:12: 0x7ff6c887f03f in bar (run.obj)
    return @inlineCall(foo, n);
           ^
D:\DEV\Zig\test\returnaddress.zig:19:21: 0x7ff6c887f004 in main (run.obj)
    warn("{}\n", bar(13));
                    ^
D:\DEV\Install\zig\lib\zig\std\special\bootstrap.zig:51:40: 0x7ff6c887efe3 in ??? (run.obj)
    std.os.windows.ExitProcess(callMain());
                                       ^
???:?:?: 0x7ffb24fa3dc4 in ??? (???)


???:?:?: 0x7ffb27223691 in ??? (???)

You're pointing out that according to the stack trace, foo got inlined. But this does not show the source info for the @returnAddress() in log. Here's printing just the source info for the @returnAddress():

const std = @import("std");
const warn = std.debug.warn;

fn log(msg: []const u8) void {
    const outstream = &(std.io.getStdErr() catch unreachable).outStream().stream;
    std.debug.printSourceAtAddress(std.debug.getSelfDebugInfo() catch unreachable, outstream, @returnAddress(), true) catch unreachable;
}

fn foo(n: u32) u32 {
    @noInlineCall(log, "log from foo");
    return n * 2;
}

fn bar(n: u32) u32 {
    return @inlineCall(foo, n);
}

pub fn main() void {
    warn("{}\n", bar(13));
}
/home/andy/downloads/zig/build/test.zig:10:5: 0x22371d in ??? (test)
    @noInlineCall(log, "log from foo");
    ^
26

It says in ??? because I didn't implement the DWARF code for finding the symbol yet (it will show something on macos and windows), but it would probably be bar as you're pointing out, because it gets inlined. But my point is that it still points to the correct source expression. Even though the @noInlineCall(log, "log from foo"); got inlined, the return address still gets resolved to test.zig, line 10, column 5.

@Sahnvour

This comment has been minimized.

Copy link
Member Author

Sahnvour commented Mar 4, 2019

Well on my machine, here's the output with your modification:

$ zig run returnaddress.zig
D:\DEV\Zig\test\returnaddress.zig:15:12: 0x7ff6c691f03f in bar (run.obj)
    return @inlineCall(foo, n);

If I force inline the call to bar, the program outputs the line from main.
So the real source line is lost. I assume the compiler does not care about the platform it is emitting debug info for, so that must be the fault of LLVM or the PDB format, I guess ?

@andrewrk

This comment has been minimized.

Copy link
Member

andrewrk commented Mar 4, 2019

I see, that's good to know. Regardless of the outcome of this issue, I think it will be worth having a chat with some LLVM folks and see if this is a fundamental limitation of the PDB format, or if LLVM's debug info code needs to be improved.

If it is a fundamental limitation, or if improving LLVM's debug info is unlikely to happen, then that's pretty good reasoning for accepting this issue.

@Sahnvour

This comment has been minimized.

Copy link
Member Author

Sahnvour commented Mar 4, 2019

This is on par with my experience debugging on windows, WinDBG/VS' debugger are great but sometimes they just can't tell you where the inlined instructions are coming from.
I just stepped through the executable of our test case with WinDBG, and indeed all the instructions resulting from inlining are mapped to the calling source line. Tested with .inline 1 (see https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/debugging-optimized-code-and-inline-functions-external) for what it's worth.

@andrewrk andrewrk added the accepted label Mar 4, 2019

@dbandstra

This comment has been minimized.

Copy link
Contributor

dbandstra commented Mar 18, 2019

A potential use for FILE/__dirname that I came across: If you have a comptime utility function that calls @embedFile, and you want to call that util function from another file, with a relative path.

@trishume

This comment has been minimized.

Copy link

trishume commented Apr 18, 2019

Here's a potentially significant downside to offering this: Any use of it makes function-level incremental compilation impossible.

Consider this scenario: You have a 1000 line file with a bunch of log or panic calls that include line number information. You add a print call for debugging to a small function near the top of the file. But uh oh, now every single function in the rest of the file needs to be re-compiled instead of just the function you modified, because the source information they used changed.

If Zig wants to eventually have function-level incremental recompilation, which IMO is the best promise for <100ms common case debugging recompiles on arbitrarily sized projects, it should try to avoid this problem.

There's a separate but related problem where you do this within your compiler by entangling span information into your data structures. But that can be fixed with heavy refactoring, the actual behavioural changes can't be fixed as easily.

See Rust encountering this problem: rust-lang/rust#47389

@andrewrk

This comment has been minimized.

Copy link
Member

andrewrk commented Apr 18, 2019

Thanks for bringing this up. That's a really good point and one that we hadn't considered before in this proposal. Ability to do function-level incremental compilation is definitely something that is planned and so this is a serious consideration.

@andrewrk andrewrk removed the accepted label Apr 18, 2019

@trishume

This comment has been minimized.

Copy link

trishume commented Apr 18, 2019

I should note that the proposed use case for accessing the current file for path for relative compile time file lookup should be fine from this perspective. Another thing that should be fine is accessing the current function name. The file+function+message should be sufficient to locate the source of for example log messages fairly easily.

I like your idea of using the return address and debug info instead is also a great idea. I think a log library that output like [myfile.zig:0xRETADDR myFunction] log message if you passed sourceInfo would be quite serviceable. Then if you wanted lines you could pipe it through a formatter that replaced all return addresses with lines and columns inline and then it would follow the standard file:line:col format that terminals like iTerm2 allow you to command click to jump to.

@daurnimator daurnimator referenced this issue Apr 18, 2019

Open

external and compressed debug info #2300

0 of 3 tasks complete
@Sahnvour

This comment has been minimized.

Copy link
Member Author

Sahnvour commented Apr 18, 2019

That's serious concern, because I think source information is an important feature, but fast incremental compilation is even more so. Note that the proposal currently is, in my opinion, not final, regarding the usability issue described in my first comment. I thought before of doing something similar to what's described in the Rust issue, and maybe this can help achieve a better proposal.

Semi-serious workaround: only count line numbers inside functions, 0 being associated to the line of the first token of the declaration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.