Incorrect wasm DWARF addresses in LLDB #5274

@MaartenS11

I am currently working on supporting LLDB debugging of WebAssembly programs in our wasm VM (WARDuino). While doing so I noticed that sometimes LLDB would set breakpoints on addresses that would never be hit.

Example:

Process 1 stopped
* thread #1, name = 'warduino', stop reason = step in
    frame #0: 0x000007ed main.wasm`memset(dest=0x00000000, c=65216, n=0) at memset.c:17:6
   14            * conditional ensures that all the subsequently used
   15            * offsets are well-defined and in the dest region. */
   16
-> 17           if (!n) return dest;
   18           s[0] = c;
   19           s[n-1] = c;
   20           if (n <= 2) return dest;
(lldb) s

The step results in LLDB placing a breakpoint with the packet Z0,7f3,1, which I received in our debugger. This message tries to add a breakpoint at address 0x7f3, but that address is invalid: in the disassembly (from wasm-objdump -d) there are instructions at 0x7f2 and 0x7f4, but none at 0x7f3.

0007e6 func[27] <memset>:
 0007e7: 01 7f                      | local[5] type=i32
 0007e9: 01 7f                      | local[6] type=i32
 0007eb: 01 7e                      | local[7] type=i64
 0007ed: 02 40                      | block
 0007ef: 20 02                      |   local.get 2
 0007f1: 45                         |   i32.eqz
 0007f2: 0d 00                      |   br_if 0
 0007f4: 20 00                      |   local.get 0
 0007f6: 20 01                      |   local.get 1
 0007f8: 3a 00 00                   |   i32.store8 0 0
 0007fb: 20 00                      |   local.get 0
 0007fd: 20 02                      |   local.get 2
 0007ff: 6a                         |   i32.add
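
To make this check concrete, here is a minimal Python sketch (not part of WARDuino or LLDB) that parses the Z0 packet and tests its address against the instruction offsets in the wasm-objdump listing. The helper names `instruction_starts` and `parse_z0` are hypothetical, and the regular expression assumes the listing layout shown above.

```python
import re

# Excerpt of the wasm-objdump -d listing for memset shown above.
LISTING = """\
0007e6 func[27] <memset>:
 0007e7: 01 7f                      | local[5] type=i32
 0007e9: 01 7f                      | local[6] type=i32
 0007eb: 01 7e                      | local[7] type=i64
 0007ed: 02 40                      | block
 0007ef: 20 02                      |   local.get 2
 0007f1: 45                         |   i32.eqz
 0007f2: 0d 00                      |   br_if 0
 0007f4: 20 00                      |   local.get 0
 0007f6: 20 01                      |   local.get 1
"""

# "<offset>: <hex bytes> | <text>"; the function header line has no
# colon right after the offset, so it does not match.
LINE = re.compile(r"^\s*([0-9a-f]+):\s+(?:[0-9a-f]{2}\s)+\s*\|\s*(\S.*)$")


def instruction_starts(listing):
    """Collect offsets that begin an instruction, skipping local declarations."""
    starts = set()
    for line in listing.splitlines():
        m = LINE.match(line)
        if m and not m.group(2).startswith("local["):
            starts.add(int(m.group(1), 16))
    return starts


def parse_z0(packet):
    """Parse a GDB-remote software breakpoint packet 'Z0,addr,kind' (hex fields)."""
    m = re.fullmatch(r"Z0,([0-9a-fA-F]+),([0-9a-fA-F]+)", packet)
    if m is None:
        raise ValueError(f"not a Z0 packet: {packet!r}")
    return int(m.group(1), 16), int(m.group(2), 16)


if __name__ == "__main__":
    starts = instruction_starts(LISTING)
    addr, kind = parse_z0("Z0,7f3,1")
    # 0x7f3 lies inside the two-byte br_if at 0x7f2, so it is not a start.
    print(hex(addr), "valid" if addr in starts else "not an instruction start")
```

A debugger stub could use a check like this to reject or log breakpoints that do not land on an instruction boundary.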

What appears to be happening is that LLDB decodes the function starting from 0x7e6 (the very beginning of the function body, which includes the declarations of the locals) as if those bytes were instructions. Decoded that way, 0x7f3 is a valid instruction address:

(lldb) disassemble --count 10
main.wasm`memset:
    0x7e6 <+0>:  throw_ref
    0x7e7 <+1>:  i32.lt_u
    0x7e8 <+2>:  if
    0x7ea <+4>:  local.get 1
    0x7ec <+6>:  local.set 4
    0x7ee <+8>:  br     1                         ; Invalid depth argument!
    0x7f0 <+10>: end
    0x7f1 <+11>: block
    0x7f3 <+13>: loop                             ; label2:
    0x7f5 <+15>: local.get 3

If we put a breakpoint on memset, LLDB sets it at 0x7e6, which is not an instruction, so a wasm VM will never hit it.

(lldb) b memset
Breakpoint 1: where = main.wasm`memset, address = 0x000007e6
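
For reference, the first real instruction can be found by skipping the local declarations at the start of the function body. The sketch below assumes that the byte at the function's start offset is the LEB128 count of local groups (inferred here as 0x03 from the three local entries in the wasm-objdump listing; that byte itself is not shown in the dump) and that every value type is encoded in a single byte. `first_instruction_offset` is a hypothetical helper, not an LLDB or TinyGo API.

```python
def read_uleb128(buf, pos):
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def first_instruction_offset(body, base):
    """Return the offset of the first instruction of a function body.

    `body` starts at the local-group count and `base` is the offset of that
    byte. Assumes single-byte value types (i32, i64, f32, f64, ...).
    """
    pos = 0
    group_count, pos = read_uleb128(body, pos)
    for _ in range(group_count):
        _, pos = read_uleb128(body, pos)  # number of locals in this group
        pos += 1                          # value type byte
    return base + pos


if __name__ == "__main__":
    # memset: assumed count byte 0x03, then the three groups from the listing
    # (01 7f, 01 7f, 01 7e), then the first instruction bytes (02 40 = block).
    memset_body = bytes([0x03, 0x01, 0x7F, 0x01, 0x7F, 0x01, 0x7E, 0x02, 0x40])
    print(hex(first_instruction_offset(memset_body, 0x7E6)))
```

For the memset bytes above this yields 0x7ed, the offset of the block instruction in the wasm-objdump listing and the pc LLDB reported in frame #0.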

I was unsure whether this was an issue with LLDB or something on my end or the compiler, so I tried to use clang instead. In a basic C program compiled with clang I get the following:

Adding breakpoints on functions in LLDB:

(lldb) b fib
Breakpoint 2: where = test-dbg.wasm`fib + 21 at test-dbg.c:5:9, address = 0x0000007c
(lldb) b _main
Breakpoint 1: where = test-dbg.wasm`_main + 22 at test-dbg.c:17:14, address = 0x000000f9

Disassembly:

000067 func[1] <fib>:
 000068: 01 7f                      | local[1] type=i32
 00006a: 23 80 80 80 80 00          | global.get 0 <__stack_pointer>
 000070: 41 20                      | i32.const 32
 000072: 6b                         | i32.sub
 000073: 21 01                      | local.set 1
 000075: 20 01                      | local.get 1
 000077: 20 00                      | local.get 0
 000079: 36 02 1c                   | i32.store 2 28
 00007c: 20 01                      | local.get 1                       <- Function breakpoint
 00007e: 41 00                      | i32.const 0
 000080: 36 02 18                   | i32.store 2 24
 000083: 20 01                      | local.get 1
 000085: 41 01                      | i32.const 1
 000087: 36 02 14                   | i32.store 2 20
 00008a: 20 01                      | local.get 1
 00008c: 41 00                      | i32.const 0
 00008e: 36 02 10                   | i32.store 2 16
...
0000e3 func[2] <_main>:
 0000e4: 01 7f                      | local[0] type=i32
 0000e6: 23 80 80 80 80 00          | global.get 0 <__stack_pointer>
 0000ec: 41 10                      | i32.const 16
 0000ee: 6b                         | i32.sub
 0000ef: 21 00                      | local.set 0
 0000f1: 20 00                      | local.get 0
 0000f3: 24 80 80 80 80 00          | global.set 0 <__stack_pointer>
 0000f9: 20 00                      | local.get 0                       <- Function breakpoint
 0000fb: 41 00                      | i32.const 0
 0000fd: 36 02 0c                   | i32.store 2 12
 000100: 02 40                      | block
 000102: 03 40                      |   loop
 000104: 20 00                      |     local.get 0

These are valid instruction addresses, on the first source line of each function. TinyGo's debug information, by contrast, seems to make LLDB put breakpoints at the very beginning of the function definition rather than at the first instructions (so with TinyGo-style information the breakpoints here would be at 0x67 and 0xe3).

Dwarf info in LLDB

TinyGo program

(lldb) image lookup -vn memset
1 match found in .../main.wasm:
        Address: main.wasm[0x00000465] (main.wasm.code + 1125)
        Summary: main.wasm`memset
         Module: file = ".../main.wasm", arch = "wasm32"
    CompileUnit: id = {0x00000002}, file = "/opt/homebrew/Cellar/tinygo/0.40.1/lib/wasi-libc/libc-top-half/musl/src/string/memset.c", language = "c11"
       Function: id = {0x0000026a}, name = "memset", range = [0x000007e6-0x00000963)
       FuncType: id = {0x0000026a}, byte-size = 0, decl = memset.c:4, compiler_type = "void *(void *, int, size_t)"
         Blocks: id = {0x0000026a}, range = [0x000007e6-0x00000963)
         Symbol: id = {0x0000001b}, range = [0x000007e6-0x00000963), name="memset"
       Variable: id = {0x0000029b}, name = "dest", type = "void *", valid ranges = <block>, location = DW_OP_WASM_location 0x0 0x0, DW_OP_stack_value, decl = memset.c:4
       Variable: id = {0x000002ab}, name = "c", type = "int", valid ranges = <block>, location = [0x00000465, 0x0000050f) -> DW_OP_WASM_location 0x0 0x1, DW_OP_stack_value, decl = memset.c:4
       Variable: id = {0x000002ba}, name = "n", type = "size_t", valid ranges = <block>, location = [0x00000465, 0x000004f8) -> DW_OP_WASM_location 0x0 0x2, DW_OP_stack_value, decl = memset.c:4
       Variable: id = {0x000002c9}, name = "s", type = "unsigned char *", valid ranges = <block>, location = [0x00000465, 0x000004de) -> DW_OP_WASM_location 0x0 0x0, DW_OP_stack_value, decl = memset.c:10

C program

(lldb) image lookup -vn fib
1 match found in .../test-dbg.wasm:
        Address: test-dbg.wasm[0x00000002] (test-dbg.wasm.code + 2)
        Summary: test-dbg.wasm`fib at test-dbg.c:4
         Module: file = ".../test-dbg.wasm", arch = "wasm32"
    CompileUnit: id = {0x00000000}, file = ".../test-dbg.c", language = "c11"
       Function: id = {0x00000026}, name = "fib", range = [0x00000067-0x000000e2)
       FuncType: id = {0x00000026}, byte-size = 0, decl = test-dbg.c:4, compiler_type = "int (int)"
         Blocks: id = {0x00000026}, range = [0x00000067-0x000000e2)
      LineEntry: [0x00000067-0x0000007c): .../test-dbg.c:4
         Symbol: id = {0x00000001}, range = [0x00000067-0x000000e2), name="fib"
       Variable: id = {0x0000003e}, name = "n", type = "int", valid ranges = <block>, location = DW_OP_fbreg +28, decl = test-dbg.c:4
       Variable: id = {0x0000004c}, name = "a", type = "int", valid ranges = <block>, location = DW_OP_fbreg +24, decl = test-dbg.c:5
       Variable: id = {0x0000005a}, name = "b", type = "int", valid ranges = <block>, location = DW_OP_fbreg +20, decl = test-dbg.c:6
(lldb) image lookup -vn _main
1 match found in .../test-dbg.wasm:
        Address: test-dbg.wasm[0x0000007e] (test-dbg.wasm.code + 126)
        Summary: test-dbg.wasm`_main at test-dbg.c:16
         Module: file = ".../test-dbg.wasm", arch = "wasm32"
    CompileUnit: id = {0x00000000}, file = ".../test-dbg.c", language = "c11"
       Function: id = {0x00000099}, name = "_main", range = [0x000000e3-0x0000014c)
       FuncType: id = {0x00000099}, byte-size = 0, decl = test-dbg.c:16, compiler_type = "void (void)"
         Blocks: id = {0x00000099}, range = [0x000000e3-0x0000014c)
      LineEntry: [0x000000e3-0x000000f9): .../test-dbg.c:16
         Symbol: id = {0x00000002}, range = [0x000000e3-0x0000014c), name="_main"

LLDB appears to place the function breakpoint at the end address of the first LineEntry range (0x7c for fib, 0xf9 for _main). No LineEntry shows up in the lookup output for the binary I got from TinyGo.
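
The behavior observed in these outputs can be summarized in a small Python model. This is only a sketch of what the lookups above suggest, not LLDB's actual breakpoint-resolution code; `FunctionInfo` and `function_breakpoint_address` are hypothetical names, and the values are copied from the image lookup output.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FunctionInfo:
    name: str
    low_pc: int                                   # start of the function range
    high_pc: int                                  # end of the function range
    first_line_entry: Optional[Tuple[int, int]]   # [start, end) or None


def function_breakpoint_address(fn: FunctionInfo) -> int:
    """Model: use the end of the first line entry, else fall back to low_pc."""
    if fn.first_line_entry is not None:
        _, end = fn.first_line_entry
        return end
    return fn.low_pc


FUNCTIONS = [
    FunctionInfo("fib", 0x67, 0xE2, (0x67, 0x7C)),     # clang
    FunctionInfo("_main", 0xE3, 0x14C, (0xE3, 0xF9)),  # clang
    FunctionInfo("memset", 0x7E6, 0x963, None),        # TinyGo, no LineEntry
]

if __name__ == "__main__":
    for fn in FUNCTIONS:
        print(fn.name, hex(function_breakpoint_address(fn)))
```

Under this model the two clang functions resolve to 0x7c and 0xf9, while memset falls back to 0x7e6, matching the addresses LLDB reported.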

Summary

The DWARF information produced by the TinyGo compiler seems to point LLDB to the very start of a function definition, whereas clang's points to one of the first few instructions. Because LLDB then does not know the address of the first valid instruction, it sometimes places breakpoints (when using TinyGo) on impossible program counters, some of which fall between two instructions.
