Segfault in multiple tests on OS X, Python 3.9 #376
Comments
Great, thanks @hyanwong. Can you run this through gdb please?
This should run until it segfaults, and then we want to print out the backtrace. (bt) Or whatever equivalent debugger is present in the clang toolchain. |
OS X needs to codesign gdb, or run under sudo. Once I do that, it's not super helpful with the default install of Python 3.9 (below). I guess I need to recompile Python 3.9 for this to work?
|
Can you run
before the pytest? Although if your stack is only 9 deep then we are probably still in Python-land, which is weird. |
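If gdb is awkward to get going, one way to check whether the crash is still in "Python-land" is the standard-library `faulthandler` module, which prints the Python-level traceback when the interpreter receives SIGSEGV. This is only a minimal sketch and not one of the commands used in this thread; the helper name `run_with_faulthandler` is made up for illustration.

```python
import subprocess
import sys


def run_with_faulthandler(py_args):
    """Run a child interpreter with faulthandler switched on.

    py_args are the arguments that would normally follow "python3".
    Returns (returncode, stdout, stderr). If the child dies from a
    segfault, stderr should contain the Python traceback that
    faulthandler dumps before the process exits.
    """
    # "-X faulthandler" is the documented CPython switch that enables
    # the handler at interpreter startup.
    cmd = [sys.executable, "-X", "faulthandler", *py_args]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr
```

For example, `run_with_faulthandler(["-m", "pytest", "-vs", "-n0"])` would rerun the pytest invocation from the issue description; on POSIX a negative return code means the child was killed by a signal. This gives a Python traceback only, so it complements a native backtrace rather than replacing it.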
Oddly, that hangs when running. I'll try to recompile Python 3.9 with the debug flags. |
Hmm https://unconj.ca/blog/setting-up-gdb-for-debugging-python-on-os-x.html: "Note that for some reason, Python always sends a SIGTRAP before it begins executing the script. Don’t worry, just continue. If you want to find out why your program segfaults, you can run bt:" |
Ah didn't know that! Just do |
Can't get the damn thing to do anything other than hang now. Argh. |
? |
Yep. Tried all that |
Where did you get your Python from @hyanwong? Is there a debug version? We'll need the debug symbols everywhere to get to the bottom of this. |
Slightly weirdly, after make; make clean, I can't now run the tests normally:
|
I'm trying it on 2 different python 3.9 versions, one from conda, another from the OS X installation repo "Homebrew". Neither seems to have debug symbols compiled in, so I guess I'll need to compile my own. |
Yes, looks like compiling your own might be the simplest way to get debug symbols, as there are mac-specific technical issues with the conda distributed binaries. |
I guess this is CPython, right? |
From reading your link https://unconj.ca/blog/setting-up-gdb-for-debugging-python-on-os-x.html it seems they have symbols without recompiling python. |
Well now. If I compile my own version of python3.9.1rc1, the tests all now pass. |
Interesting/annoying. What version of Python are we using in CI, and where does it come from, @benjeffery? Is this some issue with Apple breaking things again? |
I don't think this is Apple, as they don't provide Python 3.9, as far as I know. I suspect it's conda? Or maybe there's been a bug fix in Python 3.9.1? |
You hit the problem via Homebrew as well though, right? |
Both on 3.9.0. I've just tried 3.9.1 compiled via the
Works. So perhaps we should simply not support 3.9, but only 3.9.1 on OS X? Apparently 3.9.1 will be released in December. |
I'd very much like to know what happened here - segfaults shouldn't happen, ever. |
I can recompile cpython at 3.9.0 and see if it segfaults. |
Yes please - including debug symbols if you can. |
The CI python is from miniconda: Here are the flags it was built with:
Obtained with the awesome |
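For anyone wanting to compare interpreters, build flags can also be read from inside Python with the standard-library `sysconfig` and `platform` modules. This is a rough sketch and not necessarily the tool referred to above; the function name `build_info` and the choice of variables are assumptions made for illustration.

```python
import platform
import sysconfig


def build_info():
    """Collect a few build-time settings of the running interpreter.

    Variables that are not defined on a given platform come back as None.
    """
    keys = ("CC", "CFLAGS", "LDFLAGS", "CONFIG_ARGS")
    info = {key: sysconfig.get_config_var(key) for key in keys}
    # Compiler that built the interpreter itself, as a free-form string.
    info["python_compiler"] = platform.python_compiler()
    info["python_version"] = platform.python_version()
    return info
```

Running `build_info()` under each Python (conda, Homebrew, pyenv, self-compiled) and diffing the results is one way to look for toolchain differences between the builds that crash and the ones that pass.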
I'm running |
Proving tricky!
|
Yes, that's why I was having problems! |
You can run as sudo. Or code-sign the gdb binary. |
More oddities. This version of python, compiled from source by Pyenv, crashes:
Whereas this one, compiled by me from the GitHub repo, passes:
loading with |
Could it be the same issue as pybind11 had, here? |
All very strange. |
I'd be happy this isn't our fault if I could get a backtrace from gdb - did you have any success building these with debug symbols @hyanwong? Avoiding the CLI tests will get rid of a level of indirection, and hopefully make things more straightforward. |
I'm still digging. Can't get gdb working on CI as can't sign the binary so bisecting to find the cause. |
Unfortunately the version with gdb is the one that passed. I'm getting intermittent segfault failures using 3.9.0rc1, all when calling
|
Yay! Got one (although this is using 3.9.0rc1, not the actual release):
|
Ah, no, I get the same error with python 3.9.0 (the actual release, not the rc1 version), but it's not 100% repeatable. I'll see if it goes away with 3.9.1rc1
|
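Since the failure is not 100% repeatable, it may help to rerun the same command several times and count how each run ended. The following is only a sketch under that assumption; `classify` and `repeat` are hypothetical helper names, not part of the project or its test suite.

```python
import signal
import subprocess
from collections import Counter


def classify(returncode):
    """Turn a subprocess return code into a short label."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative return code is minus the number of the
        # signal that killed the child.
        try:
            return signal.Signals(-returncode).name
        except ValueError:
            return f"signal {-returncode}"
    return f"exit {returncode}"


def repeat(cmd, runs=10):
    """Run cmd `runs` times and count how each run ended."""
    counts = Counter()
    for _ in range(runs):
        proc = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        counts[classify(proc.returncode)] += 1
    return counts
```

Calling `repeat` with the failing pytest command as a list of arguments gives a rough failure rate per Python build, which makes "does it go away with 3.9.1rc1" easier to judge than a single run.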
Can you get the full |
It seems to disappear with python 3.9.1rc1. Stupidly I forgot to backtrace. Doh. Need to recompile 3.9.0 and do it again. Damn! |
Here's the full
|
I think that confirms this is the same kind of issue as the pybind one you linked above. Will look into the reasoning there and how they fixed it. |
I'd be surprised if this turned out to be a problem with our code now - seems like a bug in Python like the pybind one? I guess we just disable the tests for now until 3.9.1 is released? |
I don't think this is worth much more digging for now, so I agree we wait for 3.9.1. |
I think we're satisfied that this isn't our bug, so I'm going to take it out of the next release and remove the "bug" label. We can keep it open until we've got CI running on 3.9.1, though. |
The semi-standard OS X package manager "homebrew" has released python 3.9.1 and all the tests pass on my laptop with that installation. So hopefully when GitHub Actions & Conda have a 3.9.1 release this will all start working again. I'll run with 3.9 from now on, so I should spot any problems I hope. |
I had a go at reproducing this using fastmac. I wasn't able to get a working environment in about an hour - I tried with homebrew/pip initially, and this failed because numcodecs/blosc seems to have some issue on recent llvm compilers. I tried with conda, but borked up the whole thing and had to give up. I'll have another go at some point when I get a chance. Macs are such a horrid unix environment... |
I doubt this will be the last time, so it's certainly worth keeping whatever incantations you use to get the debugger working. |
I've gotten a working environment using fastmac. Using python 3.9 from conda I can reproduce the segfault if I use the built-in clang 12 compiler. However, if I install clang11 from conda-forge ( So, it does seem to be some connection with clang12, and maybe something to do with using different compilers for building Python and the extension module (which shouldn't matter, but hey). @hyanwong, did you get a version of this working where both Python and the extension module were built with clang12? I failed at this because of problems with numcodecs under clang12. |
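To check which compiler built a given Python, `platform.python_compiler()` returns a free-form string that can be parsed for the clang major version. This is a small sketch; `clang_major` is a hypothetical helper and the example strings are illustrative, not taken from the machines in this thread. It only describes the interpreter, not the compiler used for the extension module.

```python
import platform
import re


def clang_major(compiler_string):
    """Return the clang major version named in compiler_string, or None."""
    match = re.search(r"clang\s+(\d+)", compiler_string, re.IGNORECASE)
    return int(match.group(1)) if match else None


# Major clang version that built the running interpreter, or None if it
# was built with something else (e.g. GCC).
interpreter_clang = clang_major(platform.python_compiler())
```

Comparing `interpreter_clang` with the major version of the clang used to build the extension would show whether the two were built with different compilers.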
Huh, and now I can't reproduce even if I revert to using clang12 as the compiler. There must be something else in the toolchain that's causing the problem, which conda-forge c-compiler solves. So, I'm inclined to say this is a weird mac problem caused by it being a fragmented crappily executed unix development environment. I think this is a wider problem (see the nodes here e.g.), so there's not much we can do. If people hit the problem we can advise them to I'll see if I can get CI working by changing the compilers. |
I found that compiling wasn't very reproducible, but seemed to depend on number of threads. OTOH I didn't test extensively, so could be accidental correlation rather than causation. |
Just to note for posterity that compiling with the default Mac compilers seems to work OK now on Apple silicon in the most recent macOS 14.2 (Sonoma). See #909. |
Tests all pass with OS X on python 3.8, but on 3.9 there are multiple failures. The first I get, from running
python3 -m pytest -vs -n0
is in tests/test_cli.py::TestCommandsDefaults::test_augment_ancestors.
Running in parallel identifies at least these others as problems