
Assert in libzmq in a multi-threaded application with optimized binary #3742

Closed
mk4-github opened this issue Nov 20, 2019 · 6 comments

@mk4-github

Issue description

We have a fairly complex multi-threaded application that uses a number of ZeroMQ sockets. This application has generated a core dump in the field.
On the application side, I am checking whether more than one thread uses the same ZeroMQ socket. Since the codebase is large, it will take time to go through every socket's creation and its uses in the send/receive/poll functions. So far I have found only one instance, but that socket is used under a mutex lock, which I am not treating as an issue.

Apart from this, is there anything else I can check? What else could cause this?

Environment

  • libzmq version (commit hash if unreleased):
    zeromq3-3.2.5-1.el7.x86_64
    zeromq-4.1.4-5.el7.x86_64
    czmq-3.0.2-3.el7.x86_64

  • OS: CentOS Linux release 7.4.1708 (Core)

Minimal test code / Steps to reproduce the issue

Reproduced once in the field; no specific reproduction steps are known.

What's the actual result? (include assertion message & call stack if applicable)

#0 0x00007f44a00801f7 in raise () from /lib64/libc.so.6
#1 0x00007f44a00818e8 in abort () from /lib64/libc.so.6
#2 0x00007f44a1f74759 in zmq::zmq_abort(char const*) () from /lib64/libzmq.so.5
#3 0x00007f44a1fa410d in zmq::tcp_write(int, void const*, unsigned long) () from /lib64/libzmq.so.5
#4 0x00007f44a1f9f417 in zmq::stream_engine_t::out_event() () from /lib64/libzmq.so.5
#5 0x00007f44a1f7437a in zmq::epoll_t::loop() () from /lib64/libzmq.so.5
#6 0x00007f44a1fa83a6 in thread_routine () from /lib64/libzmq.so.5
#7 0x00007f44a1b2ce25 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f44a014334d in clone () from /lib64/libc.so.6

errno is 14 (EFAULT, "Bad address").

What's the expected result?

bluca (Member) commented Nov 20, 2019

But, it is used under mutex lock which I am not considering as an issue.

This is an issue. Sockets cannot be used from multiple threads.

@mk4-github (Author)

Thanks Bluca for the response.

But in the ZeroMQ FAQ I read (I can't find the link right now):
"For those situations where a dedicated socket per thread is infeasible, a socket may be shared if and only if each thread executes a full memory barrier before accessing the socket. Most languages support a Mutex or Spinlock which will execute the full memory barrier on your behalf."

Is that advice invalid?

bluca (Member) commented Nov 20, 2019

The I/O thread does not use a mutex/barrier, so that advice is not valid. It was on a very, very old FAQ here: http://wiki.zeromq.org/area:faq. I have fixed it now and removed that reference.

@mk4-github (Author)

Okay, in that case the link was misleading.
Thanks Bluca.
I will fix that part.
Any other pointers to what might cause this issue, specifically errno=14 (Bad address)?

@mk4-github (Author)

Hi, can anyone else share more information on this error code in zmq::tcp_write?
Since this code has been running for a long time without any core dump in zmq, I can't be sure that the mutex-guarded ipc socket is the cause. Could there be any other probable cause?


stale bot commented Dec 25, 2020

This issue has been automatically marked as stale because it has not had activity for 365 days. It will be closed if no further activity occurs within 56 days. Thank you for your contributions.

@stale stale bot added the stale label Dec 25, 2020
@stale stale bot closed this as completed Jun 26, 2021