I am currently working on supporting LLDB debugging of WebAssembly programs in our wasm VM (WARDuino). While doing so I noticed that sometimes LLDB would set breakpoints on addresses that would never be hit.
Example:
Process 1 stopped
* thread #1, name = 'warduino', stop reason = step in
frame #0: 0x000007ed main.wasm`memset(dest=0x00000000, c=65216, n=0) at memset.c:17:6
14 * conditional ensures that all the subsequently used
15 * offsets are well-defined and in the dest region. */
16
-> 17 if (!n) return dest;
18 s[0] = c;
19 s[n-1] = c;
20 if (n <= 2) return dest;
(lldb) s
The step made LLDB place a breakpoint by sending the packet Z0,7f3,1, which I received in our debugger. This packet tries to add a breakpoint at address 0x7f3, but that address is invalid. Looking at the disassembly (from wasm-objdump -d), there are instructions at 0x7f2 and 0x7f4 but not at 0x7f3, which falls in the middle of the two-byte br_if at 0x7f2.
0007e6 func[27] <memset>:
0007e7: 01 7f | local[5] type=i32
0007e9: 01 7f | local[6] type=i32
0007eb: 01 7e | local[7] type=i64
0007ed: 02 40 | block
0007ef: 20 02 | local.get 2
0007f1: 45 | i32.eqz
0007f2: 0d 00 | br_if 0
0007f4: 20 00 | local.get 0
0007f6: 20 01 | local.get 1
0007f8: 3a 00 00 | i32.store8 0 0
0007fb: 20 00 | local.get 0
0007fd: 20 02 | local.get 2
0007ff: 6a | i32.add
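As a debugger-side illustration, here is a minimal sketch of how a stub could check an incoming Z0 packet against known instruction boundaries. The set of addresses is copied from the listing above. The helper names (parse_z0, is_valid_breakpoint) are hypothetical and are not part of WARDuino or LLDB, and both packet fields are assumed to be hexadecimal.

```python
from typing import Tuple

# Instruction start addresses copied from the wasm-objdump listing above.
INSTRUCTION_STARTS = {
    0x7ED, 0x7EF, 0x7F1, 0x7F2, 0x7F4,
    0x7F6, 0x7F8, 0x7FB, 0x7FD, 0x7FF,
}


def parse_z0(packet: str) -> Tuple[int, int]:
    """Parse a 'Z0,addr,kind' packet into (addr, kind); both fields are read as hex."""
    head, addr, kind = packet.split(",")
    if head != "Z0":
        raise ValueError(f"not a Z0 packet: {packet!r}")
    return int(addr, 16), int(kind, 16)


def is_valid_breakpoint(packet: str, starts=INSTRUCTION_STARTS) -> bool:
    """Return True only if the requested address is the start of an instruction."""
    addr, _kind = parse_z0(packet)
    return addr in starts
```

With this check, Z0,7f3,1 is rejected while a request for 0x7f4 would be accepted. It only detects the bad address; it does not explain why LLDB computed it.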
What appears to be happening is that LLDB starts decoding at 0x7e6 (the very beginning of the function, which includes the declarations of the locals) and continues from there, in which case 0x7f3 looks like a valid instruction boundary.
(lldb) disassemble --count 10
main.wasm`memset:
0x7e6 <+0>: throw_ref
0x7e7 <+1>: i32.lt_u
0x7e8 <+2>: if
0x7ea <+4>: local.get 1
0x7ec <+6>: local.set 4
0x7ee <+8>: br 1 ; Invalid depth argument!
0x7f0 <+10>: end
0x7f1 <+11>: block
0x7f3 <+13>: loop ; label2:
0x7f5 <+15>: local.get 3
And if we try to put a breakpoint on memset, LLDB actually sets it at 0x7e6, which is not an instruction, so a wasm VM will never hit it.
(lldb) b memset
Breakpoint 1: where = main.wasm`memset, address = 0x000007e6
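To make the offset between the function start and the first real instruction concrete, here is a rough sketch that skips the local declarations of a function body, following the WebAssembly binary format (a count of declaration groups, then a count and a value type per group). It assumes one-byte value types, and it assumes the byte at 0x7e6 is 0x03 (three groups); that byte is not shown in the listing and is only inferred from the three local lines. The function names are hypothetical.

```python
def read_uleb128(data: bytes, pos: int):
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return value, pos
        shift += 7


def first_instruction_address(body: bytes, base: int) -> int:
    """Skip the local declarations of a function body that starts at `base`."""
    group_count, pos = read_uleb128(body, 0)
    for _ in range(group_count):
        _n, pos = read_uleb128(body, pos)  # number of locals in this group
        pos += 1                           # one-byte value type (0x7f = i32, 0x7e = i64)
    return base + pos


# Assumed group count 0x03, then the bytes shown at 0x7e7..0x7ee in the listing.
MEMSET_PREFIX = bytes([0x03, 0x01, 0x7F, 0x01, 0x7F, 0x01, 0x7E, 0x02, 0x40])
print(hex(first_instruction_address(MEMSET_PREFIX, 0x7E6)))  # prints 0x7ed
```

Under these assumptions the first instruction is at 0x7ed, which matches the block instruction in the wasm-objdump output and the address LLDB reported for frame #0.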
I was unsure whether this was an issue with LLDB, with something on my end, or with the compiler (the program above was built with TinyGo), so I tried clang instead. In a basic C program compiled with clang I get the following:
Adding breakpoints on function in LLDB:
(lldb) b fib
Breakpoint 2: where = test-dbg.wasm`fib + 21 at test-dbg.c:5:9, address = 0x0000007c
(lldb) b _main
Breakpoint 1: where = test-dbg.wasm`_main + 22 at test-dbg.c:17:14, address = 0x000000f9
Disassembly:
000067 func[1] <fib>:
000068: 01 7f | local[1] type=i32
00006a: 23 80 80 80 80 00 | global.get 0 <__stack_pointer>
000070: 41 20 | i32.const 32
000072: 6b | i32.sub
000073: 21 01 | local.set 1
000075: 20 01 | local.get 1
000077: 20 00 | local.get 0
000079: 36 02 1c | i32.store 2 28
00007c: 20 01 | local.get 1 <- Function breakpoint
00007e: 41 00 | i32.const 0
000080: 36 02 18 | i32.store 2 24
000083: 20 01 | local.get 1
000085: 41 01 | i32.const 1
000087: 36 02 14 | i32.store 2 20
00008a: 20 01 | local.get 1
00008c: 41 00 | i32.const 0
00008e: 36 02 10 | i32.store 2 16
...
0000e3 func[2] <_main>:
0000e4: 01 7f | local[0] type=i32
0000e6: 23 80 80 80 80 00 | global.get 0 <__stack_pointer>
0000ec: 41 10 | i32.const 16
0000ee: 6b | i32.sub
0000ef: 21 00 | local.set 0
0000f1: 20 00 | local.get 0
0000f3: 24 80 80 80 80 00 | global.set 0 <__stack_pointer>
0000f9: 20 00 | local.get 0 <- Function breakpoint
0000fb: 41 00 | i32.const 0
0000fd: 36 02 0c | i32.store 2 12
000100: 02 40 | block
000102: 03 40 | loop
000104: 20 00 | local.get 0
These are valid instruction addresses, and they are on the first line of each of these functions. TinyGo's debug information, in contrast, seems to make LLDB put breakpoints at the very beginning of the function definition rather than at the first instructions (applied to this C program, that would mean 0x67 and 0xe3).
DWARF info in LLDB
TinyGo program
(lldb) image lookup -vn memset
1 match found in .../main.wasm:
Address: main.wasm[0x00000465] (main.wasm.code + 1125)
Summary: main.wasm`memset
Module: file = ".../main.wasm", arch = "wasm32"
CompileUnit: id = {0x00000002}, file = "/opt/homebrew/Cellar/tinygo/0.40.1/lib/wasi-libc/libc-top-half/musl/src/string/memset.c", language = "c11"
Function: id = {0x0000026a}, name = "memset", range = [0x000007e6-0x00000963)
FuncType: id = {0x0000026a}, byte-size = 0, decl = memset.c:4, compiler_type = "void *(void *, int, size_t)"
Blocks: id = {0x0000026a}, range = [0x000007e6-0x00000963)
Symbol: id = {0x0000001b}, range = [0x000007e6-0x00000963), name="memset"
Variable: id = {0x0000029b}, name = "dest", type = "void *", valid ranges = <block>, location = DW_OP_WASM_location 0x0 0x0, DW_OP_stack_value, decl = memset.c:4
Variable: id = {0x000002ab}, name = "c", type = "int", valid ranges = <block>, location = [0x00000465, 0x0000050f) -> DW_OP_WASM_location 0x0 0x1, DW_OP_stack_value, decl = memset.c:4
Variable: id = {0x000002ba}, name = "n", type = "size_t", valid ranges = <block>, location = [0x00000465, 0x000004f8) -> DW_OP_WASM_location 0x0 0x2, DW_OP_stack_value, decl = memset.c:4
Variable: id = {0x000002c9}, name = "s", type = "unsigned char *", valid ranges = <block>, location = [0x00000465, 0x000004de) -> DW_OP_WASM_location 0x0 0x0, DW_OP_stack_value, decl = memset.c:10
C program
(lldb) image lookup -vn fib
1 match found in .../test-dbg.wasm:
Address: test-dbg.wasm[0x00000002] (test-dbg.wasm.code + 2)
Summary: test-dbg.wasm`fib at test-dbg.c:4
Module: file = ".../test-dbg.wasm", arch = "wasm32"
CompileUnit: id = {0x00000000}, file = ".../test-dbg.c", language = "c11"
Function: id = {0x00000026}, name = "fib", range = [0x00000067-0x000000e2)
FuncType: id = {0x00000026}, byte-size = 0, decl = test-dbg.c:4, compiler_type = "int (int)"
Blocks: id = {0x00000026}, range = [0x00000067-0x000000e2)
LineEntry: [0x00000067-0x0000007c): .../test-dbg.c:4
Symbol: id = {0x00000001}, range = [0x00000067-0x000000e2), name="fib"
Variable: id = {0x0000003e}, name = "n", type = "int", valid ranges = <block>, location = DW_OP_fbreg +28, decl = test-dbg.c:4
Variable: id = {0x0000004c}, name = "a", type = "int", valid ranges = <block>, location = DW_OP_fbreg +24, decl = test-dbg.c:5
Variable: id = {0x0000005a}, name = "b", type = "int", valid ranges = <block>, location = DW_OP_fbreg +20, decl = test-dbg.c:6
(lldb) image lookup -vn _main
1 match found in .../test-dbg.wasm:
Address: test-dbg.wasm[0x0000007e] (test-dbg.wasm.code + 126)
Summary: test-dbg.wasm`_main at test-dbg.c:16
Module: file = ".../test-dbg.wasm", arch = "wasm32"
CompileUnit: id = {0x00000000}, file = ".../test-dbg.c", language = "c11"
Function: id = {0x00000099}, name = "_main", range = [0x000000e3-0x0000014c)
FuncType: id = {0x00000099}, byte-size = 0, decl = test-dbg.c:16, compiler_type = "void (void)"
Blocks: id = {0x00000099}, range = [0x000000e3-0x0000014c)
LineEntry: [0x000000e3-0x000000f9): .../test-dbg.c:16
Symbol: id = {0x00000002}, range = [0x000000e3-0x0000014c), name="_main"
It appears that LLDB uses the end of the function's first LineEntry here (0x7c for fib, 0xf9 for _main), and that this element is absent from the binary I got from TinyGo.
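The behaviour I am guessing at can be written down as a small model. This is only a sketch of what the output above suggests, not LLDB's actual implementation; the function name and the (start, end) tuple representation of line entries are made up for illustration.

```python
def function_breakpoint_address(func_start, line_entries):
    """Model: break at the end of the line entry that begins at the function start.

    line_entries is a list of half-open (start, end) address ranges.
    If no entry begins at func_start, fall back to func_start itself.
    """
    for start, end in line_entries:
        if start == func_start:
            return end
    return func_start


# Values taken from the image lookup output above.
print(hex(function_breakpoint_address(0x67, [(0x67, 0x7C)])))  # fib
print(hex(function_breakpoint_address(0xE3, [(0xE3, 0xF9)])))  # _main
print(hex(function_breakpoint_address(0x7E6, [])))             # memset, no LineEntry
```

In this model the two clang functions get 0x7c and 0xf9, the same addresses LLDB chose, while memset falls back to 0x7e6 because there is no LineEntry to skip past.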
Summary
The DWARF information provided by the TinyGo compiler seems to point LLDB to the very start of a function definition, whereas clang points to one of the first few instructions. With TinyGo, LLDB does not know where the first valid instruction is, so it sometimes puts breakpoints on impossible program counters, some of which fall in between two instructions.