I didn't want to make a new branch and pull request for each of these, so I'll just put them all together (unless you want separate branches):
468d37c adds an optional timeout for keep-alive connections in the HTTP server, set via the connection_timeout keyword argument of the HTTPServer constructor. Without it, connections were sometimes never closed after the client's connection dropped, so our server accumulated open sockets and eventually ran out of file descriptors.
96f057c fixes the IOStream close callback sometimes never being called when it was delayed because of pending callbacks.
690081f fixes a bug in the draft-10 WebSocket implementation: The size of messages of length 126 was not encoded properly.
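For context on why 126 is a special value: in the draft-10 framing, payload lengths up to 125 fit in the 7-bit length field, while the values 126 and 127 in that field are markers for a 16-bit or 64-bit extended length. A payload of exactly 126 bytes therefore sits right on the boundary. The following is only a sketch of that encoding rule, not the code from 690081f; the function name encode_payload_length and the mask_bit parameter are made up for illustration.

```python
import struct


def encode_payload_length(length, mask_bit=0):
    """Encode the length part of a WebSocket frame header.

    Hypothetical helper for illustration. `mask_bit` is 0x80 when the
    payload is masked and 0 otherwise.
    """
    if length < 126:
        # Small payloads: the length itself fits in the 7-bit field.
        return struct.pack("B", mask_bit | length)
    elif length <= 0xFFFF:
        # 126 is a marker: the real length follows as 16 bits, big-endian.
        return struct.pack("!BH", mask_bit | 126, length)
    else:
        # 127 is a marker: the real length follows as 64 bits, big-endian.
        return struct.pack("!BQ", mask_bit | 127, length)
```

Note that the first comparison has to be strictly less than 126; with `<=`, a 126-byte message would be written as a bare 126, which the receiver reads as the 16-bit marker.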
Add an optional timeout for keep-alive HTTP connections.
Fix the IOStream close callback not getting called if there are pendi…
_maybe_add_error_listener only does anything if _state is None.
Fix the handling of messages of length 126 in the draft-10 Websocket …
Cool, thanks for the fixes. The iostream and websocket fixes look good, so I'll go ahead and cherry-pick them.
The keepalive timeout change needs a little work. First, a nit: use None instead of -1 for unset timeouts. Second, I think it might be best to have separate timeouts for separate phases of the connection: idle, uploading requests, application processing (this one should maybe be handled in Application rather than HTTPServer), downloading responses. It doesn't necessarily make sense to use the same timeout value for all of these (although maybe a simple end-to-end timeout is worth it for simplicity). The most important of these is probably for the idle phase, since otherwise we let clients easily and accidentally keep connections open forever.
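To make the two suggestions concrete, here is a rough sketch of per-phase timeouts that use None for "unset". It is not Tornado code: the class name PhaseTimeouts, its field names, and the deadline method are all hypothetical, and the application-processing phase is left out since it may belong in Application instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PhaseTimeouts:
    """Hypothetical container: one optional timeout (seconds) per phase."""

    idle: Optional[float] = None      # waiting for the next request
    request: Optional[float] = None   # client uploading a request
    response: Optional[float] = None  # client downloading a response

    def deadline(self, phase: str, now: float) -> Optional[float]:
        """Return the absolute deadline for `phase`, or None if unset."""
        timeout = getattr(self, phase)
        # None means "no timeout"; 0 stays a valid (immediate) timeout,
        # which a -1 sentinel convention makes easier to get wrong.
        return None if timeout is None else now + timeout
```

With this shape, setting only the idle timeout leaves the other phases unlimited, e.g. PhaseTimeouts(idle=30.0).deadline("request", now) is None.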
Are you sure you've seen the HTTPServer hold on to an idle connection after the client has closed it (as opposed to clients just keeping connections open for a long time)? The IOStream should detect the close and shut itself down; if it doesn't, that's a bug (is that how you found the IOStream bug you fixed here?). The keepalive portion of HTTPServer actually hasn't been tested very much in practice, since most large-scale uses of tornado are behind nginx, which doesn't reuse its client connections.
Yes. Our server has actually run out of file descriptors several times after running for about two weeks, and it doesn't get much traffic. With the timeout, the number of open file descriptors is now stable. This may well be related to some problem in our server configuration. Still, if TCP keepalive is disabled and the server has nothing to send over the connection, it has no way to notice that the client's connection suddenly dropped, or am I missing something here?
This problem already existed before the IOStream bug was introduced.
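For reference, TCP keepalive is a per-socket option that is off by default; once enabled, the kernel probes an idle peer and reports an error on the socket if the probes go unanswered. A minimal sketch of turning it on with the standard socket module follows. The helper name enable_tcp_keepalive and its default values are illustrative assumptions, and the TCP_KEEP* tuning options are platform-specific, so they are applied only where the constants exist.

```python
import socket


def enable_tcp_keepalive(sock, idle=60, interval=10, probes=5):
    """Hypothetical helper: turn on TCP keepalive for `sock`.

    idle/interval/probes are example values, not recommendations.
    """
    # Ask the kernel to probe the peer when the connection is idle.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Optional tuning; these constants are not available on every platform.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock
```

Keepalive only detects dead peers; it does nothing about clients that are alive but simply keep an idle connection open, which is the case an idle timeout covers.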
I'll look into having separate timeouts.
The master branch now has header_timeout and body_timeout arguments; the body_timeout can be overridden in prepare() for handlers in streaming mode.