add .debug_line support and integrate it with incremental compilation #5963

Closed
andrewrk opened this issue Jul 31, 2020 · 4 comments
Labels
accepted: This proposal is planned.
frontend: Tokenization, parsing, AstGen, Sema, and Liveness.
proposal: This issue suggests modifications. If it also has the "accepted" label then it is planned.

andrewrk commented Jul 31, 2020

Zig self-hosted .debug_line layout (DWARF v5)

header
  unit_length - (relocatable, size of all line number info for entire compilation unit)
  version u16 = 5
  address_size u8 = 4 or 8
  segment_selector_size u8 = 0
  header_length usize (relocatable, use to pad the header)
  minimum_instruction_length u8 = 1
  maximum_operations_per_instruction u8 = 1
  default_is_stmt u8 = 1
  line_base i8 = 1
  line_range u8 = 1
  opcode_base u8 = 0x0d
  standard_opcode_lengths = [opcode_base - 1]u8 (one entry per standard opcode, 1...opcode_base - 1)
  directory_entry_format_count u8 = 1
  directory_entry_format [2]u8 = {DW_LNCT_path, DW_FORM_string}
  directories_count uleb128 = (number of directories)
  directories (encoded according to directory_entry_format)
    first one is cwd of compilation unit
    remaining are absolute or relative to that cwd
  file_name_entry_format_count u8 = 3
  file_name_entry_format [6]u8 = {
    DW_LNCT_path, DW_FORM_string,
    DW_LNCT_directory_index, DW_FORM_data1,
    DW_LNCT_size, DW_FORM_udata,
  }
  file_names_count uleb128 = (number of files)
  file_names (encoded according to file_name_entry_format)
    first one is root source file

padding (controlled by header_length)

for each file: (starts out at line 1 column 0)
  DW_LNS_set_file

  for each function:
    DW_LNE_set_address  xx xx xx xx (relocatable address, move with the function)
    DW_LNS_set_prologue_end
    DW_LNS_advance_line (relocatable, advance to the opening `{`)
    DW_LNS_copy
    process the IR debug info instructions using DW_LNS_advance_line, DW_LNS_advance_pc, DW_LNS_copy
    DW_LNS_set_epilogue_begin
    DW_LNS_advance_line (advance to the terminating `}`)
    DW_LNS_copy
    NOPs used as padding so the function can grow

  DW_LNE_end_sequence
  NOPs used as padding so the file can grow

Strategy

The header is treated as a separate component that can be written without modifying the line number
program for the files. It becomes "dirty" and is rewritten when the size of the global line number program
for the compilation unit changes, or when any directories or files are modified. If it grows so
much that it fills all the padding, then any overlapped files are moved to the end.

A two-byte NOP can be represented as {DW_LNS_negate_stmt, DW_LNS_negate_stmt}.
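
As a rough illustration of why a pair of DW_LNS_negate_stmt opcodes works as a NOP, here is a minimal Python sketch. The helper names `nop_padding` and `apply_negate_stmt` are hypothetical and not part of the Zig code base; the opcode value 0x06 is the standard DWARF value for DW_LNS_negate_stmt. Each opcode toggles the `is_stmt` register, so an even number of them leaves the state machine unchanged.

```python
DW_LNS_negate_stmt = 0x06  # standard opcode value from the DWARF spec


def nop_padding(size: int) -> bytes:
    """Return `size` bytes of padding built from negate_stmt pairs (hypothetical helper)."""
    if size < 0 or size % 2 != 0:
        raise ValueError("padding must be a non-negative even number of bytes")
    return bytes([DW_LNS_negate_stmt, DW_LNS_negate_stmt]) * (size // 2)


def apply_negate_stmt(is_stmt: bool, program: bytes) -> bool:
    """Run a program made only of negate_stmt opcodes and return the final is_stmt."""
    for op in program:
        if op != DW_LNS_negate_stmt:
            raise ValueError("this toy interpreter only understands negate_stmt")
        is_stmt = not is_stmt
    return is_stmt
```

Because the padding unit is two bytes, a gap of odd size cannot be filled this way, which is why the sketch rejects odd sizes.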

Each file is treated as a separate component that can be moved within the global line number program.
When a file is moved, no modifications need to be made to the line number program. A file becomes
"dirty" when its file index is changed. There are no plans to support changing file indexes, so
a file is never expected to become dirty.

Each function is treated as a separate component that can be moved within the file.
However, the order of functions within a file must match the order of functions within the source
file, because the line offsets are intentionally relative to the previous function, so that when
incrementally compiling a function, only one line number offset needs to be modified.

During incremental compilation, functions are sometimes relocated to a different virtual address.
When this happens, the DW_LNE_set_address instruction is updated with the new
virtual address of the function. The rest of the program is unmodified.
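
The in-place address update can be sketched as follows. This is an illustration only, assuming a little-endian target and the standard extended-opcode encoding (a 0x00 byte, a length, the sub-opcode 0x02 for DW_LNE_set_address, then the address); `encode_set_address` and `patch_set_address` are hypothetical names, not functions from the compiler.

```python
import struct

DW_LNE_set_address = 0x02  # extended opcode value from the DWARF spec
_ADDR_FORMATS = {4: "<I", 8: "<Q"}  # assumption: little-endian target


def encode_set_address(addr: int, address_size: int = 8) -> bytes:
    """Encode DW_LNE_set_address; the length byte counts the sub-opcode plus the operand."""
    header = bytes([0x00, 1 + address_size, DW_LNE_set_address])
    return header + struct.pack(_ADDR_FORMATS[address_size], addr)


def patch_set_address(program: bytearray, insn_offset: int, new_addr: int,
                      address_size: int = 8) -> None:
    """Overwrite only the address operand of an existing DW_LNE_set_address."""
    header = bytes([0x00, 1 + address_size, DW_LNE_set_address])
    if bytes(program[insn_offset:insn_offset + 3]) != header:
        raise ValueError("no DW_LNE_set_address at this offset")
    struct.pack_into(_ADDR_FORMATS[address_size], program, insn_offset + 3, new_addr)
```

Since the operand has a fixed width, the patch never changes the size of the program, so nothing after it has to move.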

During incremental compilation, new lines of code are sometimes added to a function.
When this happens, the machine code and debug info for the function will be re-generated,
and the DW_LNS_advance_line in the prologue, which is relative to the previous function's line end,
will be updated to be correct. This will cause the following functions in the file to have correct
line number information without being updated.

During incremental compilation, blank lines are sometimes added between functions, without any modifications to the function bodies. When this happens, the function following the added whitespace has its prologue DW_LNS_advance_line value updated to make up the difference. This is a single-byte write to the ELF file. The rest of the line number program remains unchanged.
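
To see why a single operand update is enough in this relative scheme, consider a toy model in Python. It is a simplification under stated assumptions: each function is reduced to a pair `(advance, body)`, where `advance` is the DW_LNS_advance_line operand from the previous function's closing line to this function's opening `{`, and `body` is the number of lines from `{` to `}`. The function name `function_lines` is hypothetical.

```python
def function_lines(funcs, start_line=1):
    """Return (open_line, close_line) for each function.

    funcs: list of (advance, body) pairs, in source order.
    The line register starts at `start_line` and is never reset between
    functions, so every function's position depends on the ones before it.
    """
    line = start_line
    result = []
    for advance, body in funcs:
        line += advance      # prologue DW_LNS_advance_line
        open_line = line
        line += body         # net effect of the body's line advances
        result.append((open_line, line))
    return result


before = [(2, 3), (2, 5), (3, 1)]
# Four blank lines are inserted between the first and second function:
# only the second function's prologue operand changes, from 2 to 6.
after = [(2, 3), (6, 5), (3, 1)]
```

In this model the second and third functions both shift by four lines, although only one operand was rewritten.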

andrewrk added the proposal, accepted, and frontend labels Jul 31, 2020
andrewrk added this to the 0.7.0 milestone Jul 31, 2020
andrewrk added this to In progress in self hosted compiler Jul 31, 2020
andrewrk added a commit that referenced this issue Aug 2, 2020
 * the .debug_line header is written properly
 * link.File.Elf gains:
   - SrcFn, which is now a field in Module.Fn
   - SrcFile, which is now a field in Module.Scope.File
 * link.File.Elf gets a whole *Package field rather than only
   root_src_dir_path.
 * the fields first_dbg_line_file and last_dbg_line_file tell where the
   Line Number Program begins and ends, which allows moving files when
   the header gets too big, and allows appending files to the end.
 * codegen is passed a buffer for emitting .debug_line
   Line Number Program opcodes for functions.

See #5963

There is some work-in-progress code here, but I need to go make some
experimental changes to how source locations are represented, and I
want to do that in a separate commit.
andrewrk added a commit that referenced this issue Aug 3, 2020

andrewrk commented Aug 3, 2020

We're going to have to adjust this, because it's a common bug in debugging tooling to not properly handle extended opcodes in the vendor-specific range. Thanks to Tom de Vries this is now fixed in GDB; however, it will take several years before the fix is widely available, and the bug is apparently also present in lldb. That leaves the 2-byte NOPs as the only way to pad the program, which carries a serious performance cost, especially when building a compilation for the first time. Every time a function is compiled, all those NOPs and the other functions in the file would have to be moved to make room for the new function to be inserted in sorted order.

So here's the new plan forward. It's simpler.

for each function: (assume starts out at line 1 column 0)
  DW_LNE_set_address  xx xx xx xx (relocatable address, move with the function)
  DW_LNS_advance_line (relocatable, advance to the opening `{`)
  DW_LNS_set_file
  DW_LNS_copy
  process the IR debug info instructions using DW_LNS_advance_line, DW_LNS_advance_pc, DW_LNS_copy
  DW_LNE_end_sequence
  NOPs used as padding so the function can grow
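
The per-function layout above can be sketched as a small Python emitter. This is not the compiler's code: `emit_function`, `uleb128`, and `sleb128` are hypothetical helper names, the target is assumed to be little-endian, and the operands use plain variable-length LEB128, whereas a real implementation that patches operands in place would presumably need a fixed-width encoding. The opcode values are the standard DWARF ones.

```python
DW_LNS_copy = 0x01
DW_LNS_advance_pc = 0x02
DW_LNS_advance_line = 0x03
DW_LNS_set_file = 0x04
DW_LNE_end_sequence = 0x01
DW_LNE_set_address = 0x02


def uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("uleb128 needs a non-negative value")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def sleb128(value: int) -> bytes:
    """Encode a signed integer as signed LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7  # Python's >> is an arithmetic shift
        sign_bit = bool(byte & 0x40)
        done = (value == 0 and not sign_bit) or (value == -1 and sign_bit)
        out.append(byte if done else byte | 0x80)
        if done:
            return bytes(out)


def emit_function(addr, file_index, first_line, rows, address_size=8):
    """Emit one function's sequence; rows is a list of (pc_delta, line_delta)."""
    out = bytearray([0x00, 1 + address_size, DW_LNE_set_address])
    out += addr.to_bytes(address_size, "little")
    # Each sequence starts with the line register at 1.
    out += bytes([DW_LNS_advance_line]) + sleb128(first_line - 1)
    out += bytes([DW_LNS_set_file]) + uleb128(file_index)
    out.append(DW_LNS_copy)
    for pc_delta, line_delta in rows:
        out += bytes([DW_LNS_advance_line]) + sleb128(line_delta)
        out += bytes([DW_LNS_advance_pc]) + uleb128(pc_delta)
        out.append(DW_LNS_copy)
    out += bytes([0x00, 0x01, DW_LNE_end_sequence])
    return bytes(out)
```

Because every function ends with DW_LNE_end_sequence, the state machine is reset before the next function, which is what lets the sequences appear in any order.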

Now it's only functions we have to care about, not files, and they can be in any order. The tradeoff here is that when a file is modified, every function in the file whose line numbers shifted must have its DW_LNS_advance_line relocation updated. I expect this to be significantly cheaper than moving each function's entire line number program body (and its padding) for every function being compiled.
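
The cost side of this tradeoff can be illustrated with a short Python sketch. It assumes each function's DW_LNS_advance_line operand is its opening line minus 1 (since every sequence starts at line 1), and that an edit inserts or removes whole lines before a given line. The name `relocations_after_edit` and the dictionary shape are hypothetical.

```python
def relocations_after_edit(open_lines, edit_line, lines_added):
    """Return {function name: new advance_line operand} for functions that moved.

    open_lines: {name: 1-based line of the function's opening `{`} before the edit.
    edit_line: the first line that shifts as a result of the edit.
    lines_added: number of lines inserted (negative if lines were removed).
    """
    patches = {}
    for name, line in open_lines.items():
        if line >= edit_line:
            # The operand is relative to the initial line register value of 1.
            patches[name] = (line + lines_added) - 1
    return patches
```

For example, with functions opening at lines 3, 10, and 20, adding two lines before line 8 yields patches for the second and third functions only; the first is untouched.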

I filed a DWARF specification proposal to add DW_LNS_jmp. If that proposal is accepted, then this can be improved. Unless that happens, I now believe the above scheme is the best way to do incremental compilation with line numbers.

andrewrk commented Aug 4, 2020

Landed in 952a397

andrewrk closed this as completed Aug 4, 2020
andrewrk moved this from In progress to Done in self hosted compiler Aug 5, 2020

abidh commented Jul 12, 2021

@andrewrk I am handling the DWARF proposal that you submitted. Would it be possible for you to provide some examples of how this new opcode will be used and what problem it will solve?

andrewrk commented

@abidh ah, thank you for reminding me. I just remembered that I haven't replied to your email yet; I will do that this week.
