{llvm,triton-llvm}: fix nondeterministic hang #392651
base: staging
I haven't been able to reproduce this issue. Tests pass after a few minutes. Limiting the lit jobs for me would severely slow LLVM builds down.
What degree of parallelism are you using? It hangs for me with reasonably high probability (around one in two/three) at 8 cores, and I suspect the more cores the more likely. I haven't tested > 8 since memory becomes the bottleneck.
I noticed this as well. Would disabling the test instead be preferred ( |
My system's default of 64 cores. I've been packaging LLVM since 17 was released and I've never experienced this issue.
That is the correct approach here. It also might not be a bad idea to investigate what exactly triggers this. I've never experienced this natively on aarch64, x86_64, or riscv64.
pkgs/development/compilers/llvm/common/llvm/llvm-exegesis-timeout.patch
Thanks for the tip. It turns out the lit test was a red herring as the
# REQUIRES: less-than-4-cpu-cores-in-parallel
The real cause is a random hang in loading the configuration
I have a simpler reproduction, just using
llvm-exegesis -mode latency -opcode-name=ADD64rr -x86-lbr-sample-period 123 -repetition-mode loop
On the problematic desktop, it gives
or hangs. On my laptop (with basically the same nixos configuration) it gives
without hanging. It seems that a hardware difference determines whether LBR is available or not.
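Since the hang is nondeterministic, one way to put a number on it is to run the reproduction repeatedly under a time limit and count the timeouts. The following is only a rough sketch, not something from this PR: `count_hangs` is a hypothetical helper name, the run count and time limit are arbitrary, and it assumes GNU coreutils `timeout` is on the PATH (it exits with status 124 when the limit is hit).

```sh
# Hypothetical helper: run a command RUNS times, each with a LIMIT-second
# time limit, and print how many runs were killed by the time limit.
# Assumes GNU coreutils `timeout`, which exits with 124 on a timeout.
count_hangs() {
  runs=$1
  limit=$2
  shift 2
  hangs=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    status=0
    timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
    if [ "$status" -eq 124 ]; then
      hangs=$((hangs + 1))
    fi
    i=$((i + 1))
  done
  echo "$hangs"
}

# Example use with the reproduction above (20 runs, 60 seconds each;
# both numbers are arbitrary choices):
# count_hangs 20 60 llvm-exegesis -mode latency -opcode-name=ADD64rr \
#   -x86-lbr-sample-period 123 -repetition-mode loop
```

Runs that fail quickly with an error are not counted, only runs that exceed the limit, so the result separates "errors out" from "hangs" on a given machine.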
I'm experiencing a nondeterministic hang while running LLVM's tests (it gets stuck on a lit test and doesn't finish, even after waiting for a long time). Other times it finishes fine. It seems to be more stable with 4 cores or fewer.
Possibly caused by llvm/llvm-project#56336 (see llvm/llvm-project@61708ec and the following comment)
Not sure if this is the only flaky test, as the hang is nondeterministic and LLVM takes a long time to build.
Of course this could be fixed by
--cores 4
but that would slow down the build as well (and not just the tests).
Happy to change the PR to just disable this test (
rm llvm/utils/lit/tests/max-failures.py
) if that would be cleaner.