Assertion failed: (src/mailbox.cpp:99) #3313
Comments
Are you using a socket from multiple threads? |
Yes and no. zmq_socket() is called in one thread, followed by a call to zmq_threadstart() passing the socket as an argument; after that, only the new thread uses the socket. The issue occurs the first time that thread tries to read from the socket. I will correct the issue, test everything and report back. Thanks, Paolo |
OK, I'll close this in the meantime. Reopen it if it still happens when the socket is opened and used from a single thread only. |
Hi @bluca, unfortunately the change did not help. FYI, this is the patch I made (pmacct/pmacct@b1b61ce), but it still bails out on the same call as before, that is, the first use of the socket inside p_zmq_zap_handler(). Is it possible that it does not like the socket being part of a structure passed as an argument to zmq_threadstart(), even if the socket is never used, initialised or touched outside the thread itself? Paolo |
We have been hitting this issue on multiple clusters and debugging it for quite some time. As @paololucente mentioned:
Everything works fine with GCC v6.1.0 but not with the newer GCC v7.3.0 or GCC v8.3.0, so there is certainly an issue. Thank you very much @paololucente for the hint! cc: @jorblancoa @matz-e |
Just to add: we tested different compilers:
From the above list, GCC 7 and GCC 8 don't work! By the way, we don't have any software component of our own; we are just using ipython/ipyparallel. |
@bluca: I think it would be better to reopen this issue. Also, let me know if and how I can help here, for example by running tests. |
…545) * Add deployment scripts * Update packages.yaml and modules.yaml * Update py-sonata-network-reduction package and dependencies * Add latest py-jupyter-core and py-jupyter-client from upstream * Install zeromq with %intel to avoid issue when running py-bluepyopt (see zeromq/libzmq#3313) * flex & bison dependency missing after mod2c was made submodule * Update bluepyopt and efel * Add deployment installation scripts in daint and jureca Co-authored-by: Jorge Blanco Alonso <blancoalonso1@jrl06.jureca> Co-authored-by: Sanin Aleksei <aleksei.sanin@efpl.ch>
I'm seeing
I compiled libzmq 4.3.2 from source using gcc 9, hoping this might resolve the problem, but it did not. |
@aolney: just FYI, as mentioned in the previous comments, the only working configuration we have found is to use an old compiler (or the Intel compiler, but that comes from our scientific computing use case!) |
@pramodk Many thanks for the helpful comment. I was hoping that using gcc 9 would solve it. I just recompiled libzmq 4.3.2 using icc 19.1.2.254, and the error remains. ldconfig lists my compiled library before the distribution's version, so I'm not sure why using icc 19 worked for you and not for me. Any suggestions appreciated. |
Could you elaborate on this bit? On our system we have the following (you can see
|
Sure, here's my
/usr/local/lib/libzmq.so.5 is the one I compiled with Your question got me thinking though about what's going on in virtual environments, where I'm doing my testing. Here's the output in one of those, which is clearly not using the
As before, any suggestions are appreciated. |
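Since ldconfig only describes the system loader cache, it may not reflect what a process inside a virtual environment actually loads. A minimal sketch for checking this on Linux follows; the helper name `mapped_libzmq` is made up for illustration, and it assumes the usual `/proc/<pid>/maps` layout where the pathname is the last field and contains no spaces.

```shell
# Hypothetical helper: print the distinct libzmq files that appear in a
# /proc/<pid>/maps style listing read from stdin.
mapped_libzmq() {
    # The mapped pathname is the last whitespace-separated field.
    awk '/libzmq/ { print $NF }' | sort -u
}

# Usage (assumes Linux and the id of the running process in $pid):
#   mapped_libzmq < "/proc/$pid/maps"
```

If the printed path points inside the virtual environment rather than at /usr/local/lib, the process is most likely using a libzmq bundled with the Python package and not the one compiled from source.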
Hi, I am running uwsgi and a python3 app with zmq connecting to a backend. Requests to the backend mostly work the first time, and then the app crashes on the next one. It works when I run uwsgi from the command line, but it hangs when running in "emperor mode". I am using zmq.REQ - zmq.REP on a file socket. On Debian testing: |
@martininsulander for me it was so specific to a particular kernel and a particular library call with that kernel that I just did a workaround and left it unsolved. |
For anyone still watching this ticket: zeromq's configure script is broken for use with gcc v7+ because gcc introduced a new warning (builtin-declaration-mismatch) which is turned on by default. Since zeromq's configure script turns all warnings into errors for its own feature existence tests, the feature tests for fork() and memset() fail. |
Thank you very much @caballist-accsys for sharing this information! In the coming weeks we will give it a try with our toolchain and see if this fixes the issue! |
FWIW, I think I wrote at least part of the iterations about these warnings'
support in configure and/or ci_build.sh scripts. The code and ideas evolved
further in NUT configure.ac script and m4 files it includes, as it also has
to deal with as much portability as it can grab. So feel free to
cross-pollinate back :)
I think the code responsible was in zproject however. Memory is faded,
maybe libzmq or czmq did deviate from templates far enough to require
backports rather than regeneration ...
|
@caballist-accsys I haven't been able to replicate your solution on Ubuntu 20.04. Steps to repro:
sudo apt-get install libtool pkg-config build-essential autoconf automake libsodium-dev uuid-dev checkinstall
# latest libzmq version as of 3/21 is 4.3.4
wget https://github.com/zeromq/libzmq/releases/download/v4.3.4/zeromq-4.3.4.tar.gz
tar -xvf zeromq-4.3.4.tar.gz
cd zeromq-4.3.4
# ADD "-Wno-builtin-declaration-mismatch" to the configure script lines where "-Werror" is added to the "libzmq_cv_cxx_werror_flag"
# If previous builds, do make uninstall and make distclean
./configure
make
sudo checkinstall #makes and installs a .deb; useful for replacing any install package version
sudo ldconfig
# Check installed
ldconfig -p | grep zmq
dpkg -L packagename libzmq5
Where ldconfig shows the source-built package has priority over the distro package:
Yet still getting the assertion failed:
Did I implement your fix incorrectly? |
@aolney, I cannot say whether the assert you are getting is a result of the fork configuration issue specifically; there may be another dozen reasons for it.
To check whether the fork configuration issue applies to you at all, I suggest you start from a fresh zmq distribution directory with no changes, run configure again and check that you see something like the following in the config.log file:
configure:24809: checking for fork
This will confirm whether you have the fork configuration issue in the first place. If you have, re-implement my suggested fix (it looks to me like you got it right), then re-run configure and check config.log again to make sure the error above is no longer present. Good luck |
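The config.log check described above can be scripted. The following is only a sketch: the function name `check_fork_detection` is made up for illustration, and it assumes the usual autoconf log layout in which a "checking for fork" line is later followed by a "result:" line for the same test.

```shell
# Hypothetical helper: report how configure resolved the fork() test.
# Usage: check_fork_detection [path/to/config.log]
check_fork_detection() {
    log="${1:-config.log}"
    if [ ! -f "$log" ]; then
        echo "missing: $log" >&2
        return 2
    fi
    # Take the first "result:" line that follows "checking for fork".
    result=$(awk '
        /checking for fork$/ { seen = 1; next }
        seen && /result:/    { print $NF; exit }
    ' "$log")
    echo "fork detected: ${result:-unknown}"
    # Exit status 0 only when configure found fork().
    [ "$result" = "yes" ]
}
```

Run it once in a pristine source tree after ./configure and once more after applying the change; the lines between the two matched log entries show the compiler error, if there is one.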
Hi all, or has the version of libzmq changed? I saw that there are two versions: I made the same changes as @aolney suggested, but no luck. I then changed pyZMQ from the newest version (22.0.3) to 20.0.0 and the error is gone. I haven't tested it thoroughly yet, but it's promising. |
It seems to me that the binary wheel distributed by pypi is broken this way. When I force a recompile ( |
@andre-merzky I tried |
@aolney: yes, I am probably lucky that my local compiler works - just wanted to point out that the distributed wheel is broken... |
So.. I still have a problem with it. Any idea how to override it? I have tried with different types of builds for pyZMQ but with no luck... |
@MenosGrandes for me it fails only on specific library calls in R. My workaround is the following
This avoids ZMQ for the problematic call and isn't too ugly from my point of view |
@aolney thanks for the response. But pupilaLabs uses ZMQ for communication between processes, so I cannot use a workaround for it. |
I hit this bug while working on code that Using ZeroMQ 4.3.4 on Debian testing. Any updates? |
@caballist-accsys wrote:
Thanks, your workaround does indeed work, but it appears that this issue has been closed without being fixed. Is there a chance the detection of fork() will be fixed for an upcoming release? |
If someone sends a PR to fix it, yes |
When compiling with gcc 7 and newer, the program produced by AC_CHECK_FUNCS(fork) produces a warning, which results in configure incorrectly disabling fork support. Fix the issue by using an AC_COMPILE_IFELSE which correctly detects fork availability. Tested by running configure and make check on a system with gcc 7 installed, and verifying that HAVE_FORK was defined correctly. See issue zeromq#3313.
I got fed up with this issue after running into it a couple of times, so I submitted a PR to fix it. Hopefully that does the trick! |
Issue description
I'm from a project called pmacct ( https://github.com/pmacct/pmacct ) which can optionally make use of ZeroMQ for passing messages internally. I got three reports in the last three weeks of users compiling pmacct against ZeroMQ 4.2.5 and getting a failed assertion message back:
shell> nfacctd -f ./nfacctd.detailed.conf
Assertion failed: ok (src/mailbox.cpp:99)
Aborted (core dumped)
Only a hint: one of the users did some testing, and apparently everything works with compilers up to gcc6, but compiling with gcc7 or gcc8 shows the above issue.
Environment
Minimal test code / Steps to reproduce the issue
plugin_pipe_zmq: true
What's the actual result? (include assertion message & call stack if applicable)
gdb /usr/local/sbin/nfacctd core
GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
< .. cut .. >
Reading symbols from /usr/local/sbin/nfacctd...done.
[New LWP 6120]
[New LWP 6118]
[New LWP 6117]
[New LWP 6119]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `nfacctd: Core Process [detailed] '.
Program terminated with signal SIGABRT, Aborted.
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f209000d700 (LWP 6120))]
(gdb) where
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007f2092961801 in __GI_abort () at abort.c:79
#2 0x00007f20936b1529 in ?? () from /usr/lib/x86_64-linux-gnu/libzmq.so.5
#3 0x00007f20936b60b4 in ?? () from /usr/lib/x86_64-linux-gnu/libzmq.so.5
#4 0x00007f20936da472 in ?? () from /usr/lib/x86_64-linux-gnu/libzmq.so.5
#5 0x00007f20936daf36 in ?? () from /usr/lib/x86_64-linux-gnu/libzmq.so.5
#6 0x00007f20936fb5e9 in ?? () from /usr/lib/x86_64-linux-gnu/libzmq.so.5
#7 0x00007f20936fb669 in zmq_recv () from /usr/lib/x86_64-linux-gnu/libzmq.so.5
#8 0x000055ec462a574c in p_zmq_recv_str(sock=sock@entry=0x55ec4657fec8 <channels_list+7432>) at zmq_common.c:460
#9 0x000055ec462a58bb in p_zmq_zap_handler (zh=0x55ec4657fec0 <channels_list+7424>) at zmq_common.c:518
#10 0x00007f20936ef554 in ?? () from /usr/lib/x86_64-linux-gnu/libzmq.so.5
#11 0x00007f2092d196db in start_thread (arg=0x7f209000d700) at pthread_create.c:463
#12 0x00007f2092a4288f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb)