
io_context destructor hangs in zmq_ctx_term() if async socket operation is still pending #207

Open
dkl opened this issue Jul 20, 2023 · 1 comment


dkl commented Jul 20, 2023

Hi,

It seems that the io_context destructor hangs in its internal shutdown() function, in the call to zmq_ctx_term(), if there are still pending azmq::socket operations/completion handlers and the azmq::socket object itself still exists. This can happen if the program extends the lifetime of the azmq::socket into its completion handler via shared_ptr/shared_from_this(), etc., and then exits io_context.run() by calling io_context.stop() (for example, in reaction to receiving SIGINT/SIGTERM).

Is this normal/expected? One "obvious" solution is to call socket.cancel() to abort and destroy all the pending completion handlers and then also destroy all the azmq::socket objects, instead of using io_context.stop(). This comment in a cppzmq issue suggests that this is even necessary: zeromq/cppzmq#139 (comment)

However, other boost::asio objects do not seem to have such a requirement (though I don't know whether that is intentional or just coincidence). Should there perhaps be some sort of auto-close mechanism to avoid blocking the io_context destructor?

Small example:

// Build: g++ -Wall -g azmq_shutdown_hang.cpp -lzmq -lboost_filesystem -o azmq_shutdown_hang

#include <azmq/socket.hpp>
#include <zmq.hpp>

#include <array>
#include <memory>
#include <stdio.h>

int main()
{
	boost::asio::io_context ioctx;
	auto socket = std::make_shared<azmq::socket>(ioctx, ZMQ_PULL);

	socket->set_option(azmq::socket::linger(0));
	socket->connect("tcp://127.0.0.1:0");
	std::array<uint8_t, 1> buffer;

	// Capturing the shared_ptr<socket> into the completion handler lambda extends the socket's life-time beyond that of the io_context.
	// Usually the socket would be destroyed first (if it or the shared_ptr is declared after the io_context), but in this case it is not.
	// The io_context destructor should destroy the pending operation and its completion handler (without calling it),
	// which would also finally destroy the socket, but apparently the io_context hangs instead.
	socket->async_receive(boost::asio::buffer(buffer),
		[socket](boost::system::error_code const& ec, size_t)
		{
			printf("async_receive completion handler, ec = %s\n", ec.message().c_str());
		}
	);

	// Calling cancel() removes the pending async operation, so the socket is destroyed before the io_context again; then it does not hang.
	//socket->cancel();

	printf("destroying io_context, does it hang?...\n");
	return 0;
}

Backtrace of the hang:

#0  0x00007ffff7b3fd7f in __GI___poll (fds=0x7fffffffd7a0, nfds=1, timeout=-1)
    at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007ffff7f34dde in zmq::signaler_t::wait(int) const () from /usr/local/lib/libzmq.so.5
#2  0x00007ffff7f11d72 in zmq::mailbox_t::recv(zmq::command_t*, int) () from /usr/local/lib/libzmq.so.5
#3  0x00007ffff7f0321f in zmq::ctx_t::terminate() () from /usr/local/lib/libzmq.so.5
#4  0x00007ffff7f5575e in zmq_ctx_term () from /usr/local/lib/libzmq.so.5
#5  0x00005555555829a6 in std::_Sp_counted_deleter<void*, int (*)(void*), std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x5555555d0ae0) at /usr/include/c++/11/bits/shared_ptr_base.h:442
#6  0x000055555556e7d7 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x5555555d0ae0)
    at /usr/include/c++/11/bits/shared_ptr_base.h:168
#7  0x000055555556bdbd in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7fffffffda28, 
    __in_chrg=<optimized out>) at /usr/include/c++/11/bits/shared_ptr_base.h:705
#8  0x000055555556566c in std::__shared_ptr<void, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7fffffffda20, 
    __in_chrg=<optimized out>) at /usr/include/c++/11/bits/shared_ptr_base.h:1154
#9  0x000055555556cb72 in std::__shared_ptr<void, (__gnu_cxx::_Lock_policy)2>::reset (this=0x5555555d0338)
    at /usr/include/c++/11/bits/shared_ptr_base.h:1272
#10 0x0000555555567bbe in azmq::detail::socket_service::shutdown_service (this=0x5555555d0310)
    at /usr/local/include/azmq/detail/socket_service.hpp:206
#11 0x0000555555565171 in boost::asio::io_context::service::shutdown (this=0x5555555d0310)
    at /usr/local/include/boost/asio/impl/io_context.ipp:148
#12 0x0000555555560637 in boost::asio::detail::service_registry::shutdown_services (this=0x5555555d0180)
    at /usr/local/include/boost/asio/detail/impl/service_registry.ipp:44
#13 0x0000555555560b9b in boost::asio::execution_context::shutdown (this=0x7fffffffdb00)
    at /usr/local/include/boost/asio/impl/execution_context.ipp:41
#14 0x00005555555650a4 in boost::asio::io_context::~io_context (this=0x7fffffffdb00, __in_chrg=<optimized out>)
    at /usr/local/include/boost/asio/impl/io_context.ipp:58
#15 0x000055555555c1be in main () at azmq_shutdown_hang.cpp:35

Tested with boost 1.82.0, libzmq master, azmq master.


Degoah commented Oct 17, 2023

Any plans to work on this finding? I have also been caught by this one.
