Description
Affects: PythonCall
Describe the bug
I get segmentation faults when using PythonCall to call Python functions that create Python objects many times. It does not happen deterministically, but it happens reliably (on a couple of systems I have tested) for the MWE below:
using PythonCall
N = 100_000_000
for i in 1:N
    args = pytuple((2.0, 1.0, 0.0))
    @assert !pyeq(Bool, args[0], args[2]) # SEGFAULT
    if i % 100_000 == 0
        println("step $i ($(i/N*100) %)")
    end
end
Sometimes I get the segmentation fault immediately, before the first print statement; sometimes 30% of the loop succeeds before the segfault.
If I wrap the doubles in Py objects explicitly beforehand, I do not get segmentation faults:
using PythonCall
N = 100_000_000
for i in 1:N
    args = pytuple((Py(2.0), Py(1.0), Py(0.0)))
    @assert !pyeq(Bool, args[0], args[2]) # NO SEGFAULT
    if i % 100_000 == 0
        println("step $i ($(i/N*100) %)")
    end
end
My understanding is that it shouldn't be necessary to wrap values explicitly like this, but somehow it is in this example. And since this is only an MWE, it is not a solution to my actual problem, because I do not know which calls / arguments need this explicit handling and which do not.
Here is the segfault output (from the first code snippet):
[872756] signal 11 (1): Segmentation fault
in expression starting at /home/fischer/dev/jl/PythonCallSegFault/pythoncall_segfault_mwe.jl:5
do_richcompare at /usr/local/src/conda/python-3.12.10/Objects/object.c:815 [inlined]
PyObject_RichCompare at /usr/local/src/conda/python-3.12.10/Objects/object.c:865 [inlined]
PyObject_RichCompareBool at /usr/local/src/conda/python-3.12.10/Objects/object.c:887
PyObject_RichCompareBool at /home/fischer/.julia/packages/PythonCall/L4cjh/src/C/pointers.jl:303
unknown function (ip: 0x7efe9e9fca05)
unknown function (ip: 0x7efe9e9fc699)
unknown function (ip: 0x7efe9e9fc623)
macro expansion at /home/fischer/.julia/packages/PythonCall/L4cjh/src/Core/Py.jl:132 [inlined]
pyeq at /home/fischer/.julia/packages/PythonCall/L4cjh/src/Core/builtins.jl:304 [inlined]
top-level scope at /home/fischer/dev/jl/PythonCallSegFault/pythoncall_segfault_mwe.jl:12
jl_toplevel_eval_flex at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/toplevel.c:934
jl_toplevel_eval_flex at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/toplevel.c:886
ijl_toplevel_eval_in at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/toplevel.c:994
eval at ./boot.jl:430 [inlined]
include_string at ./loading.jl:2734
_include at ./loading.jl:2794
include at ./Base.jl:557
jfptr_include_46879.1 at /home/fischer/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
exec_options at ./client.jl:323
_start at ./client.jl:531
jfptr__start_73430.1 at /home/fischer/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
jl_apply at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
true_main at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/jlapi.c:900
jl_repl_entrypoint at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/jlapi.c:1059
main at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/cli/loader_exe.c:58
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 312994585 (Pool: 312994469; Big: 116); GC: 13
[1] 872756 segmentation fault julia pythoncall_segfault_mwe.jl
Your system
Please provide detailed information about your system:
- OS: Ubuntu 20.04
julia> versioninfo()
Julia Version 1.11.5
Commit 760b2e5b739 (2025-04-14 06:53 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 16 × 12th Gen Intel(R) Core(TM) i7-1260P
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, alderlake)
Threads: 1 default, 0 interactive, 1 GC (on 16 virtual cores)
julia> Pkg.status()
Status `~/dev/jl/PythonCallSegFault/Project.toml`
[992eb4ea] CondaPkg v0.2.29
[6099a3de] PythonCall v0.9.25
julia> CondaPkg.status()
CondaPkg Status /home/fischer/dev/jl/PythonCallSegFault/CondaPkg.toml (empty)
Environment
/home/fischer/dev/jl/PythonCallSegFault/.CondaPkg/.pixi/envs/default
julia> pyimport("sys").version
Python: '3.12.10 | packaged by conda-forge | (main, Apr 10 2025, 22:41:16) [GCC 13.3.0]'
Additional context
The actual context where this happened to me is that I am using PythonCall.jl to use the Python interface of the lanelet2 library, which in turn is a Boost.Python interface to C++ code. I am getting segmentation faults from a couple of calls to this library.
Activity
cjdoris commented on May 28, 2025
Thanks for the detailed report and the reproducer. I believe I have fixed it in #618 - please give it a try with your MWE and your actual problem.
The specific offending line appears to be
PythonCall.jl/src/Core/builtins.jl
Line 897 in 0a7d49f
incref(getptr(Py(x)))
It's possible that Julia frees Py(x) before incref(getptr(...)) is called, and so the underlying Python object would be decref'd and freed, invalidating the pointer from getptr(...).
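The hazard described above can be sketched without PythonCall. This is only an illustration under stated assumptions: FakeHandle and safe_read are hypothetical names, and malloc/free stand in for the Python incref/decref that a real Py wrapper would perform. It shows the general Julia pattern of keeping a wrapper rooted with GC.@preserve while its raw pointer is in use, and is not the actual change made in #618.

```julia
# Hypothetical stand-in for a wrapper such as Py: it owns a raw pointer and
# releases it in a finalizer (a real wrapper would decref the Python object).
mutable struct FakeHandle
    ptr::Ptr{Float64}
    function FakeHandle(value::Float64)
        ptr = Ptr{Float64}(Libc.malloc(sizeof(Float64)))
        unsafe_store!(ptr, value)
        h = new(ptr)
        finalizer(x -> Libc.free(x.ptr), h)
        return h
    end
end

# Risky pattern (left commented out): the temporary wrapper becomes unreachable
# as soon as `.ptr` has been read, so the GC may run its finalizer before the
# pointer is dereferenced.
# risky_read(value) = unsafe_load(FakeHandle(value).ptr)

# Safer pattern: keep the wrapper rooted for as long as the raw pointer is used.
function safe_read(value::Float64)
    h = FakeHandle(value)
    result = GC.@preserve h begin
        unsafe_load(h.ptr)
    end
    return result
end

result = safe_read(2.0)
```

Because the collector can run at any allocation, the risky variant fails only occasionally, which would be consistent with the non-deterministic crashes in the report.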
cjdoris commented on May 28, 2025
PS the reproducer made my laptop very warm thank you 😄
dpinol commented on May 30, 2025
We have a Python application calling multithreaded Julia code. Our stress test takes 7h, and before this MR it crashed with all Julia versions after 1.10.7. With it, it's not crashing anymore. You made my day 🙏
johannes-fischer commented on May 30, 2025
Thank you very much for this prompt fix!! I have tested my MWE and some real examples and so far it seems to fix the issue!
tmistele commented on May 31, 2025
Fwiw, #618 also seems to fix #563 for me (or at least what seemed like a very similar issue to me)
johannes-fischer commented on Jun 26, 2025
I came across this issue again and after some digging I think the reason is one of the additional changes you made to this branch after we tested your fix.
It still works fine in my original project folder, since I didn't update the project. But after recreating a fresh project (also tested fresh MWE folder) and
]add PythonCall#less-getptr
it pulled the newest commit and that seems to lead to segfaults again.
For reference, this is the commit where it works for me: 96bd228
And the commit comparison with the most recent commit on less-getptr
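To compare the two states in a fresh project, one option is to install a specific revision instead of the branch head. The following is a minimal sketch, assuming the short hash 96bd228 quoted above resolves in the PythonCall repository; install_known_good is a hypothetical helper name, and calling it requires network access.

```julia
using Pkg

# Package spec pointing at the commit reported to work in this thread.
# Using rev = "less-getptr" instead would track the branch head.
known_good = Pkg.PackageSpec(name = "PythonCall", rev = "96bd228")

# Hypothetical helper: installs the revision above into the active project.
# It is defined but not called here, since it downloads the package.
install_known_good() = Pkg.add(known_good)
```

The Pkg REPL equivalent is ]add PythonCall#96bd228, where # selects a branch or commit and @ selects a registered version.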