Zig Version
0.12.0-dev.3161+377ecc6af
Steps to Reproduce and Observed Behavior
I'm attaching a small C++ example of the issue I'm facing when building on macOS with AArch64.
In this example, a thread-local variable is defined in a static member function of a class template, and I create two differently typed instantiations of that class.
When I build this example with the zig toolchain, the resulting executable segfaults when it accesses the thread-local variable.
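For readers without the tarball, the shape of the code is roughly as follows. This is a single-file sketch rather than the actual contents of bar.cpp, baz.cpp and main.cpp; the element types and the bodies of bar and baz are assumptions made for illustration.

```cpp
// Hypothetical stand-in for the class template in the repro.
template <typename T>
struct Foo {
    // Function-local thread_local inside a static member function:
    // each instantiation (Foo<int>, Foo<long>, ...) gets its own TLV.
    static T &getVar() {
        thread_local T var{};
        return var;
    }
};

// In the real repro these would live in separate translation units
// (bar.cpp and baz.cpp); the types int and long are assumptions.
int bar() { return ++Foo<int>::getVar(); }
long baz() { return ++Foo<long>::getVar(); }
```

Calling bar() and baz() from main is enough to exercise both TLV accesses. In the real repro the three pieces are compiled to separate object files, which is what makes the link order matter.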
The problem is really only with the zig linker:
- If I build everything with zig: the program crashes
- If I build with zig but link with clang: it works
- If I build with clang but link with zig: it crashes
As far as I can tell, the assembly produced by zig is fine, but the symbol resolution is not.
In essence, an access to the TLV looks like this:
adrp x0, cst0 # 1. Get the page address for the symbol
ldr x0, [x0, #cst1] # 2. load the symbol address
ldr x8, [x0] # 3. get the address of tlv_get_addr for that symbol
blr x8 # 4. call tlv_get_addr
(Forget about the constants, they are irrelevant to the problem)
In short: with the clang linker, the final blr x8 is a call to tlv_get_addr, as expected. With the zig linker, the symbol resolution yields the address of tlv_get_addr one load too early, i.e., it is already in x0 after step #2 instead of in x8 after step #3. Step #3 then loads from the address in x0 and, instead of getting the address of tlv_get_addr, reads the encoding of its first few instructions and puts that into x8.
At this point the call through x8 (step #4) jumps to a random location and the program segfaults.
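To make the expected double indirection explicit, here is a small C++ model of steps #3 and #4. The struct layout mirrors struct tlv_descriptor from <mach-o/loader.h> (a thunk pointer followed by a key and an offset); everything else (TlvDescriptor, fakeGetAddr, accessTlv, storage) is a hypothetical stand-in, not linker or runtime code.

```cpp
#include <cstdint>

// Modelled on struct tlv_descriptor from <mach-o/loader.h>.
struct TlvDescriptor {
    void *(*thunk)(TlvDescriptor *); // first field: pointer to the get-address routine
    std::uintptr_t key;
    std::uintptr_t offset;
};

// Stand-in for the per-thread storage the runtime would manage.
static int storage = 42;

// Stand-in for tlv_get_addr: maps a descriptor to the variable's address.
static void *fakeGetAddr(TlvDescriptor *d) {
    return reinterpret_cast<char *>(&storage) + d->offset;
}

// After steps #1 and #2, x0 is expected to hold the descriptor's address.
int *accessTlv(TlvDescriptor *desc) {
    auto fn = desc->thunk;              // step #3: ldr x8, [x0]
    return static_cast<int *>(fn(desc)); // step #4: blr x8
}
```

In the failing case described above, the value in x0 after step #2 is the address of tlv_get_addr itself rather than the address of a descriptor, so the load in step #3 reads instruction bytes where the thunk pointer should be.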
To reproduce:
- unpack the provided tarball
- run the following commands:
zig c++ -c bar.cpp -O1 -o bar.o
zig c++ -c baz.cpp -O1 -o baz.o
zig c++ -c main.cpp -O1 -o main.o
zig build-exe bar.o baz.o main.o
Result:
Now, if you replace the linking step with clang:
xcrun clang++ -o a.out bar.o baz.o main.o
and execute the result (./a.out), it works fine.
If you run the zig-linked program under a debugger, you will see that the Foo<XXX>::getVar code doesn't load the right address for tlv_get_addr.
Also for the record, if you change the link order from what we mentioned to:
zig build-exe baz.o bar.o main.o
i.e., you swap baz.o and bar.o, the crash happens in the path that calls bar instead of the path that calls baz.
In other words, whichever of the two object files is linked last gets the wrong symbol resolution.
My guess is that the linker thinks it has already resolved that TLV symbol and reuses some sort of value cached in a table.
repro_files.tgz
Expected Behavior
Expected result:
I should not have to use the clang linker to produce a usable executable.