Source compilation fails on Musl system on multiple places #45446
Comments
@PureTryOut, |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you. |
Well, I get different errors that way.
|
It looks like some C/C++ system dependencies are not installed or cannot be found. The issues seem to come from LLVM. Are you able to compile LLVM on the system? |
Well, right now it doesn't even want to compile that as it keeps trying to exec
I have |
That again is a Bazel bug. |
Found a workaround, luckily. Installing from source following the instructions at https://www.tensorflow.org/install/source#install_python_and_the_tensorflow_package_dependencies, I now get a different error with TF 2.4.0, but that's fixed by #46138. Now I'm back to the mallinfo error. LLVM is actually available on this distribution; is it not possible to use the system variant for TF? |
Unfortunately not really. TF does a lot of JIT, and since LLVM does not provide a truly stable ABI, using the system variant might result in broken code. |
Well, with hacks I get further and further. To work around the Python interpreter problem as mentioned in #15618 (comment):
It seems I got rid of the LLVM problem by adding deps as used in the LLVM packages from the distribution. However I'm now back to one of the original problems:
It seems those failing files are part of S3 support, but I actually tried to disable that:
It still tries to compile those files however. |
After setting those exports you have to run |
That is exactly what I'm doing and what I have always done, yes. Doesn't make a difference. Please check my build script, should be nothing wrong. https://gitlab.alpinelinux.org/PureTryOut/aports/-/raw/mycroft-precise/testing/tensorflow/APKBUILD |
I took a change from Gentoo, seems So just stuck on the LLVM issue now. I really don't understand why: the failing code is guarded by the same conditional as the one used to include the required header file. See https://github.com/llvm/llvm-project/blob/main/llvm/lib/Support/Unix/Process.inc#L92 and https://github.com/llvm/llvm-project/blob/main/llvm/lib/Support/Unix/Process.inc#L34. There really is no reason this should fail... EDIT: So musl does not support mallinfo, see https://www.openwall.com/lists/musl/2018/01/17/2. That shouldn't be a problem, though, as support for it is checked in https://github.com/llvm/llvm-project/blob/main/llvm/cmake/config-ix.cmake#L234, which should just return false. Yet for some reason, in the case of TensorFlow, this check reports that the system does support it? |
IREE uses LLVM and Bazel in the same way as TF (almost). Can you try compiling that? This should give us an indication on whether the issue is from TF or from LLVM. Regarding having to manually write to |
Following these instructions https://google.github.io/iree/get-started/getting-started-linux-bazel, LLVM fails the same way yes. Although interestingly enough it also fails on |
Does Bazel/Tensorflow call CMake for LLVM differently somehow? The thing it's failing on currently is guarded properly by CMake, and it works for the distribution packages in Alpine Linux. So what is different in Tensorflow that it somehow passes the condition while it shouldn't? |
Bazel doesn't use cmake to configure the build. It seems the issue comes from the BUILD files that LLVM uses. A copy of them is at https://github.com/google/llvm-bazel/ |
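To make the difference concrete: a CMake-style configure step decides whether mallinfo is usable by actually trying to build a tiny program on the target system, whereas a Bazel BUILD file has to state the answer up front. Below is a minimal sketch of such a try-compile probe that can be run by hand on the machine in question. The helper names (`find_compiler`, `try_compile`, `MALLINFO_PROBE`) are made up for illustration and are not part of LLVM, CMake, Bazel, or TensorFlow; the probe only approximates what the CMake check does.

```python
import os
import shlex
import shutil
import subprocess
import tempfile

# Hypothetical probe program: it builds only if <malloc.h> declares
# mallinfo() and the C library exports it.
MALLINFO_PROBE = """\
#include <malloc.h>
int main(void) {
    struct mallinfo info = mallinfo();
    (void)info;
    return 0;
}
"""


def find_compiler():
    """Return the compiler command as a list, or None if none is found."""
    from_env = os.environ.get("CC")
    if from_env:
        return shlex.split(from_env)
    for name in ("cc", "gcc", "clang"):
        path = shutil.which(name)
        if path:
            return [path]
    return None


def try_compile(source, compiler):
    """Return True if `compiler` exits with status 0 when building `source`."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "probe.c")
        exe = os.path.join(tmp, "probe")
        with open(src, "w") as handle:
            handle.write(source)
        try:
            result = subprocess.run(
                list(compiler) + [src, "-o", exe],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except OSError:
            # The compiler command could not be started at all.
            return False
        return result.returncode == 0


if __name__ == "__main__":
    cc = find_compiler()
    if cc is None:
        print("no C compiler found, cannot probe")
    else:
        print("HAVE_MALLINFO:", try_compile(MALLINFO_PROBE, cc))
```

If the musl mailing list post linked above still applies, this should print `HAVE_MALLINFO: False` on Alpine, which is the answer the Bazel build files would need to encode statically.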
Thanks @GMNGeoffrey for the additional context and the help. @PureTryOut I think the new error comes from an invalid BUILD file edit. Do you have more lines of context around that error? |
Well there is some stuff, but it doesn't seem related.
|
There should be some lines that print the path to a malformed BUILD target. Alternatively, you can use |
It seems I found the cause of that particular issue. I did the following:
However, it seems Bazel doesn't like new lines and the whitespace. I put it all on one line instead, and the error was gone. Annoying 🤷 New error:
|
Hmm, now this is a grpc issue. They also use Bazel, can you file an issue at https://github.com/grpc/grpc please? |
Sorry it took a while. I filed an issue, grpc/grpc#25188 |
Can you please elaborate on that?
The only problem I had with TF itself was that I had to disable stacktrace using the method from this patch: LLVM refused to compile for me even with all the build dependencies. I took a similar approach for mallinfo by adding a GLIBC condition in
and had to remove backtrace in
You can actually keep ENABLE_BACKTRACE defined to 1... but just setting HAVE_BACKTRACE to 0 doesn't work because the troublesome header uses an ifdef... which seems like a mistake on llvm's part. There's probably a more elegant solution - it's a matter of musl lacking execinfo.h discussed here: |
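Since the header is described as testing the macro with an ifdef, defining HAVE_BACKTRACE as 0 still counts as "defined"; the macro has to be absent altogether. Here is a minimal sketch of a patch step that could run before the build. The function name `undefine_macro` and the sample header text are hypothetical, and the actual file to patch is not named in this thread, so treat the sample as a placeholder.

```python
import re


def undefine_macro(header_text, macro):
    """Replace every `#define MACRO ...` line with a commented-out #undef.

    After this, `#ifdef MACRO` is false, which `#define MACRO 0` would
    not achieve.
    """
    pattern = re.compile(
        r"^[ \t]*#[ \t]*define[ \t]+" + re.escape(macro) + r"\b.*$",
        re.MULTILINE,
    )
    return pattern.sub("/* #undef " + macro + " */", header_text)


# Hypothetical excerpt of a generated config header.
SAMPLE_HEADER = """\
#define ENABLE_BACKTRACE 1
#define HAVE_BACKTRACE 1
"""

if __name__ == "__main__":
    print(undefine_macro(SAMPLE_HEADER, "HAVE_BACKTRACE"))
```

ENABLE_BACKTRACE is left untouched on purpose, matching the observation above that it can stay defined to 1.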
@PureTryOut Could you please let us know if this issue still persists? Thanks! |
Oof I haven't tried it in a while, I kinda lost interest because of all the issues. I'll give it another shot early in the new year. |
Well the good news is that there are fewer copies of these build files now. They've been upstreamed at https://github.com/llvm/llvm-project/tree/main/utils/bazel. Additionally, we've started defining things based on C preprocessor macros in config.h, which are generally way easier to use than Bazel platform selects (with the limitation that we can't execute arbitrary code like try-compile), so if you want to move |
Hey there. I successfully built TF 2.8.0 on Alpine Linux with these changes: grebaza@39381a1. I included the backtrace library ( |
That is awesome! Any chance you could upstream that to the main Tensorflow repo (this one)? |
Sure thing. I will also upload changes into tensorflow-io's repo (as tensorflow requires io during its installation). |
@grebaza, could you please confirm whether you have had time to raise a PR for the changes mentioned in the comment above? |
Note that since then libexecinfo has been removed from Alpine Linux. They had technical reasons but I don't recall them exactly, it's been a while since I looked at this. |
Hello there, I can delete the dependencies on libexecinfo thus enabling compilation on Alpine Linux >= 3.17 (in the same manner done here). After that I will raise a PR. |
Hi @PureTryOut, @grebaza, the TensorFlow team officially maintains build instructions only for Ubuntu, possibly due to limited resources. If you still believe this is useful for the larger community and are willing to contribute, please feel free to raise a PR. Thanks! |
This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you. |
This issue was closed because it has been inactive for 7 days since being marked as stale. Please reopen if you'd like to work on this further. |
System information
Describe the problem
I'm trying to compile TensorFlow from source on Alpine Linux, which is a musl-based system. However, it fails at several points:
Provide the exact sequence of commands / steps that you executed before running into the problem
The build script can be found at https://gitlab.alpinelinux.org/alpine/aports/-/raw/4cf626b10d2f4700cc5e5e9e7536061137c8c6a1/testing/tensorflow/APKBUILD. Note that prepare() there is called before build(), and the environment variables it sets carry over.