weak memcmp symbol from compiler_rt is not overridden when building Python 3.11.0 on Linux and MacOS #13303
Comments
I just checked (it's very easy), and building Python is still broken in exactly the same way with 0.10.0-dev.4587+710e2e7f1. I then ran an experiment with that same version: I took the exact ld line (output by zig -v) and replaced the path to libcompiler_rt.a with the one from zig version 4476+0f0076666. Everything works fine. This shows that a change to libcompiler_rt.a between 4476+0f0076666 and 4562+b3cd38ea4 introduced a bug that significantly changes the behavior of (and breaks) Python. |
This bug is caused by @Luukdegram's change here: cdf7e7d. I reverted exactly that one-line change to lib/compiler_rt.zig. I'm going to patch my local version(s) of Zig so I can continue using and testing the latest version. In the meantime, @Luukdegram, can you please reconsider your commit cdf7e7d? |
💯 fan mail -- the fact that I can just change compiler_rt.zig and immediately work with the modified version is mind blowingly awesome. Zig rocks. |
While I don't mind reconsidering the commit, it has to be for the right reason. This commit simply exposes our version of |
Thanks for looking into this and reproducing the bug!! You guys of course have a vastly deeper understanding of zig than I do.
Understood. In my project I'm patching the zig source code to work around this bug, and that's working 100% fine for me for now, so at this point there is no urgency from me. Of course, in the long term I hope zig can be used to build Python for other people, since Python is a very popular language, and zig's packaging of clang is simply amazing. |
There are two things I want to investigate here:
zig/lib/compiler_rt/memcmp.zig Lines 8 to 20 in f4f4e33
The following implementation is nicer and has different machine code than the above version, which happens to match musl libc's machine code for x86_64 when compiled with clang:
pub fn memcmp(vl: [*]const u8, vr: [*]const u8, n: usize) callconv(.C) c_int {
    var i: usize = 0;
    while (i < n) : (i += 1) {
        const l_big: c_int = vl[i];
        const r_big: c_int = vr[i];
        if (l_big != r_big) return l_big -% r_big;
    }
    return 0;
}
So I suspect it will solve the problem in this issue. However, we should come up with a test case that triggers the faulty behavior before considering the case closed.
|
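For readers more at home in C, the byte-wise loop quoted above can be sketched roughly as follows. The names `bytewise_memcmp` and `narrow_memcmp` are hypothetical and exist only for this illustration. The second function is not a claim about what the earlier compiler_rt code did; it only shows one way a byte-wise comparison can return the wrong sign, namely by narrowing the difference to a signed byte before returning it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name: a C rendering of the loop quoted above.
 * Each byte is widened to int before subtracting, so the sign of the
 * result always reflects the unsigned comparison of the two bytes. */
int bytewise_memcmp(const void *vl, const void *vr, size_t n)
{
    const unsigned char *l = vl;
    const unsigned char *r = vr;
    for (size_t i = 0; i < n; i++) {
        if (l[i] != r[i])
            return (int)l[i] - (int)r[i];
    }
    return 0;
}

/* Illustration only (an assumption, not the code from any Zig release):
 * the difference is truncated to 8 bits and reinterpreted as signed.
 * The unsigned-to-signed conversion is implementation-defined in older
 * C standards; GCC and Clang wrap, so 0xff - 0x01 = 254 becomes -2. */
int narrow_memcmp(const void *vl, const void *vr, size_t n)
{
    const unsigned char *l = vl;
    const unsigned char *r = vr;
    for (size_t i = 0; i < n; i++) {
        signed char d = (signed char)(unsigned char)(l[i] - r[i]);
        if (d != 0)
            return d;
    }
    return 0;
}

/* A small self-check showing where the two variants disagree. */
void memcmp_sketch_selfcheck(void)
{
    const unsigned char hi[] = { 0xff };
    const unsigned char lo[] = { 0x01 };
    assert(bytewise_memcmp(hi, lo, 1) > 0); /* 0xff sorts after 0x01 */
    assert(narrow_memcmp(hi, lo, 1) < 0);   /* wrong sign after narrowing */
}
```

A test case with bytes that differ by more than 127 is the kind of input that separates the two variants, since both agree on small differences such as "abc" versus "abd".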
See the new test case - this fails in the previous implementation. See #13303
Alright I verified that this did indeed solve the original problem of building Python with In order to verify that this issue can be closed, one should be able to edit compiler_rt, replacing zig's |
yeesh. It looks like libc (at least on archlinux, x86_64) doesn't define strong symbols. (output trimmed)
nm -D /usr/lib/libc.so.6 | egrep 'memcpy|sleep'
00000000000a0d30 T memcpy@GLIBC_2.2.5
0000000000099c70 i memcpy@@GLIBC_2.14
00000000000d21d0 W sleep@@GLIBC_2.2.5
From
and from my tests, it seems that during build-time linking the symbol resolution for
*tidbit: apparently, if a symbol has no strong definition but multiple weak ones, the tie breaker is the SIZE, and presumably, if the SIZEs are all the same, the first one found on the command line wins. I'd hazard a guess that the system libc took this approach to allow certain overrides using strong symbols regardless of object ordering, and this precludes zig from using weak linkage in the absence of libc. As an aside, I did verify that |
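As background for the weak versus strong discussion, here is a minimal sketch of a weak definition in C, assuming GCC or Clang and their `__attribute__((weak))` extension. The symbol name `demo_impl_id` and the ID values are hypothetical. With only this file in the link, the weak definition is the one that gets used; if another object file on the link line defined `demo_impl_id` without the attribute, the linker would normally bind calls to that strong definition instead.

```c
#include <assert.h>

/* Hypothetical IDs used to tell the two definitions apart. */
#define DEMO_WEAK_ID   1
#define DEMO_STRONG_ID 2  /* what a strong override in another file might return */

/* Weak definition: used only if no strong definition of the same
 * symbol is present among the objects being linked. */
__attribute__((weak)) int demo_impl_id(void)
{
    return DEMO_WEAK_ID;
}

/* With this file linked on its own, the weak fallback is selected. */
void weak_demo_selfcheck(void)
{
    assert(demo_impl_id() == DEMO_WEAK_ID);
}
```

This only covers object files passed directly to the linker. Archives such as libcompiler_rt.a and shared libraries such as libc.so follow additional lookup rules, which is where the behavior discussed in this thread gets subtle.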
Tiny note: I only reported this due to aarch64-linux binaries of master being at https://ziglang.org/download/, but they seem to be discontinued now. In case somebody is curious, there's a lot of value in those binaries, and thanks to whoever was making them before. |
The aarch64 CI makes those tarballs automatically but it mysteriously died three days ago a65ba6c |
Thanks for the update. I tried signing up for their service to see if I could understand what is going on and got this very discouraging message: I'll post a message here if I end up setting up something on my own hardware. According to https://docs.drone.io/server/overview/, it's possible to run that same drone.io service on your own hardware, but you have to put everything on the public internet, which complicates things substantially, for me at least. GitHub also still doesn't have aarch64 Linux support: actions/runner-images#5631 |
Zig Version
0.10.0-dev.4562+b3cd38ea4
Steps to Reproduce
Building Python from source worked well using the dev versions of zig until a few days ago.
This bug is very easy to reproduce in about 1 minute. I've tested this on x86_64 Linux, aarch64 Linux, and aarch64 macOS, and the behavior is identical in all cases.
This fails to build with zig 4562+b3cd38ea4.tar.xz (also with 4560) but builds fine with zig 4476+0f0076666 and many previous versions.
Expected Behavior
Build Python successfully, like I did with 4476+0f0076666 and many previous versions.
Actual Behavior
The step that behaves differently in the build is creating _bootstrap_python. You can see what happens as follows:
It is a big linking of all the objects together to make a basic python interpreter. _bootstrap_python is then run to create Python/deepfreeze/deepfreeze.c, and that file is autogenerated incorrectly with the _bootstrap_python coming from the latest version of zig. It's like some C library functions changed to be broken, which is breaking the behavior of the _bootstrap_python program badly.
If I install zig 4476+0f0076666, then do
rm -f _bootstrap_python && make _bootstrap_python
and then use this _bootstrap_python to create Python/deepfreeze/deepfreeze.c, the file is correct and the build even finishes. Thus my best guess is that something changed in the compiler-rt that zig links in, but I don't know. The problem definitely doesn't involve the compilation stage (which just involves clang anyway), only what is linked in.
Also, I want to emphasize again that this problem is exactly the same on a whole bunch of different OS's and CPU architectures, which is probably helpful for debugging this.