
Need an explicit way to signal when it is safe to destroy an endpoint #501

Open

Thalhammer opened this issue Jan 2, 2016 · 4 comments

@Thalhammer

I have an application that allows creating and removing servers at runtime.
To do this I have a wrapper class that contains an instance of

websocketpp::server<websocketpp::config::asio>

In the constructor I initialize the server instance using this code:

// Bind the server to an externally owned, shared io_service
_server.init_asio(&service);

// Route the access and error logs to our own output stream
auto os = Logger::GetOutputStream();
_log_out = os;
_server.get_alog().set_ostream(_log_out.get());
_server.get_elog().set_ostream(_log_out.get());

_server.set_message_handler(boost::bind(&ws_on_message, this, ::_1, ::_2));
_server.set_close_handler(boost::bind(&ws_on_close, this, ::_1));
_server.set_open_handler(boost::bind(&ws_on_open, this, ::_1));
_server.set_http_handler(boost::bind(&ws_on_http, this, ::_1));

...
_server.set_reuse_addr(reuse_addr);
_server.listen(config_port);
_server.start_accept();

Inside my destructor I stop listening and iterate over all connections, calling close on each.
Afterwards I call stop on the server:

_server.stop_listening();
{
    std::unique_lock<std::mutex> lck(_connections_mutex);
    for (auto& c : _connections)
    {
        _server.close(c.first, websocketpp::close::status::going_away, "Server shutdown");
    }
}
_server.stop();

However, I have a problem: some time later, when the server tries to write anything to the log, it crashes while locking the mutex in logger/basic.hpp:

void write(level channel, std::string const & msg) {
    scoped_lock_type lock(m_lock);            // <= crashes here
    if (!this->dynamic_test(channel)) { return; }
    *m_out << "[" << timestamp << "] "
           << "[" << names::channel_name(channel) << "] "
           << msg << "\n";
    m_out->flush();
}

I suppose that some operations are still running in the asio pool and try to access the already-closed log file.
Is this the correct way to shut down the server?
How can I force the server to wait for all of its asynchronous tasks to exit?
I cannot shut down the asio pool because it is shared by several tasks.
I'm not sure whether my code is wrong or not.

@zaphoyd
Owner

zaphoyd commented Jan 31, 2016

Firstly, you'll want to get rid of the call to _server.stop(). That forcibly kills all Asio requests immediately (and may also kill everything else on your Asio pool) and does not allow the stop_listening and close calls to finish naturally.

To rephrase the issue then: when you are using a shared Asio io_service pool, you can't simply wait until the pool runs out of work, so when is it safe to let the server endpoint go out of scope?

I don't have a great answer immediately. A stopgap would be to simply wait at least as long as the close handshake timeout (default is 5 seconds, but you can configure that). Longer term, I will look at two changes:

  1. Pull the loggers out into a separate heap object managed by a smart pointer so that they stick around as long as needed rather than being deallocated when the server goes away.
  2. Add an endpoint method that blocks until endpoint shutdown tasks are complete.

@zaphoyd zaphoyd added the Bug label Jan 31, 2016
@zaphoyd zaphoyd changed the title Crash on server shutdown Need an explicit way to signal when it is safe to destroy an endpoint Jan 31, 2016
@jupp0r
Contributor

jupp0r commented Mar 30, 2016

change number 1 is in pull #537

zaphoyd pushed a commit that referenced this issue Apr 6, 2016
adds related changelog entry and fix one overzealous find and replace
operation.
@zaphoyd
Owner

zaphoyd commented Apr 6, 2016

@jupp0r's patch has been committed to develop. This should fix the crash when log entries are printed after an endpoint is deallocated. Any further testing from folks who were experiencing this problem would be appreciated.

@harrishancock

Seems to work for me. I had an accept loop that would occasionally die for no discernible reason -- analyzing the problem led me to discover that my code was affected by this issue. I upgraded from websocketpp 0.7.0 to develop a month ago, and my accept loop doesn't die anymore.
