Refactor TCPServer and HTTPServer to support TLS NPN #533

Closed
wants to merge 1 commit into
from

Conversation

Projects
None yet
2 participants
Contributor

alekstorm commented Jun 8, 2012

  • TLS NPN means that one of many protocols can be selected after a TCP connection is established. A layer of indirection was added to TCPServer to allow it to delegate handling of a TCP connection to whichever protocol handler was negotiated. If the npn_protocols parameter (a list of (name, handler) tuples in order of preference) was passed to the constructor, the connection is over TLS, and NPN succeeded, the handler for the chosen name will be called. Otherwise, the protocol constructor parameter will be called. For example, SPDYServer is essentially:
  class SPDYServer(TCPServer):
      def __init__(self, request_callback):
          http_protocol = HTTPServerProtocol(request_callback)
          TCPServer.__init__(self, http_protocol,
              npn_protocols=[
                  ('spdy/2', SPDYServerProtocol(request_callback)),
                  ('http/1.1', http_protocol)])
  • TCPServer was moved from netutil to its own module, tcpserver.
  • Since utilizing NPN support in Python 3.3 requires the ssl.SSLContext class, which isn't available in Python 2.x, the wrap_socket() top-level function was added to netutil to abstract away these details. In addition, the SUPPORTS_NPN constant was added as a convenience for determining if the system supported NPN.
  • Previously, web.RequestHandler formatted the HTTP response itself and wrote it directly to the IOStream. This responsibility has been moved to the HTTPRequest.connection object, which must provide the write_preamble() and write() methods - the former writes the response status line and headers, while the latter writes a chunk of the response body.
  • Although IOStream.connect() already takes a callback parameter, in SSLIOStream it's not called until the SSL handshake is completed (which contains TLS NPN) - and TCPServer, which doesn't call connect(), won't know which protocol handler to execute until that happens. To fix this, a set_connect_callback method was added to IOStream.
  • Snippets that conditionally imported BytesIO and ssl were moved into util and netutil, respectively. These symbols are now imported from there.

From the SPDY fork

@alekstorm alekstorm Refactor TCPServer and HTTPServer to support TLS NPN
* TLS NPN means that one of many protocols can be selected after a TCP connection is established. A
  layer of indirection was added to TCPServer to allow it to delegate handling of a TCP connection
  to whichever protocol handler was negotiated. If the `npn_protocols` parameter (a list of
  (name, handler) tuples in order of preference) was passed to the constructor, the connection is
  over TLS, and NPN succeeded, the handler for the chosen name will be called. Otherwise, the
  `protocol` constructor parameter will be called. For example, SPDYServer is essentially:

  class SPDYServer(TCPServer):
      def __init__(self, request_callback):
          http_protocol = HTTPServerProtocol(request_callback)
          TCPServer.__init__(self, http_protocol,
              npn_protocols=[
                  ('spdy/2', SPDYServerProtocol(request_callback)),
                  ('http/1.1', http_protocol)])

* TCPServer was moved from netutil to its own module, tcpserver.

* Since utilizing NPN support in Python 3.3 requires the `ssl.SSLContext` class, which isn't
  available in Python 2.x, the wrap_socket() top-level function was added to `netutil` to abstract
  away these details. In addition, the `SUPPORTS_NPN` constant was added as a convenience for
  determining if the system supported NPN.

* Previously, `web.RequestHandler` formatted the HTTP response itself and wrote it directly to the
  IOStream. This responsibility has been moved to the HTTPRequest.connection object, which must
  provide the write_preamble() and write() methods - the former writes the response status line and
  headers, while the latter writes a chunk of the response body.

* Although IOStream.connect() already takes a callback parameter, in SSLIOStream it's not called
  until the SSL handshake is completed (which contains TLS NPN) - and TCPServer, which doesn't call
  connect(), won't know which protocol handler to execute until that happens. To fix this, a
  set_connect_callback method was added to IOStream.

* Snippets that conditionally imported BytesIO and ssl were moved into util and netutil,
  respectively. These symbols are now imported from there.
37e5bff
Owner

bdarnell commented Jun 13, 2012

I don't think TCPServer is the right layer for NPN stuff to happen.
We'll eventually need to refactor the way IOStream and SSLIOStream
interact to support protocols that use the STARTTLS pattern; by that
point I'd like to deprecate all the ssl-related stuff in TCPServer
(i.e. HTTPServer.handle_stream would get a regular IOStream that it
would immediately switch to SSL mode if so configured).

In the meantime, my thinking for NPN support is that things that take
ssl_options now should also take an ssl_context, but the only
npn-specific functionality in any current tornado code would be an
accessor to selected_npn_protocol on SSLIOStream. SPDYServer would
then pass an SSLContext initialized with the appropriate npn options
to TCPServer.__init__, and in handle_stream it would look at
selected_npn_protocol to decide whether to go into HTTP mode or SPDY
mode.

As for the other changes in this diff, I'll look more closely at them
if they either get their own pull request or we resolve things with this one,
but here are my initial comments:

  • I'm still -1 on giving TCPServer its own file.
  • I'm not sure IOStream.set_connect_callback is necessary with the npn
    changes outlined above
  • A new wrap_socket abstraction seems reasonable for bridging the gap
    between the old-style ssl_options and SSLContexts, but again I'd
    let apps that need NPN pass in the whole SSLContext.
  • Moving the HTTP response formatting to the connection makes sense.
    We use the term headers elsewhere in tornado to refer to the status
    line+header block, and I think I'd prefer to keep it that way
    instead of introducing the term preamble.
  • Consolidating the conditional import of BytesIO and ssl is a good change.
Contributor

alekstorm commented Jun 15, 2012

On Tue, Jun 12, 2012 at 11:38 PM, bdarnell <
reply@reply.github.com

wrote:

I don't think TCPServer is the right layer for NPN stuff to happen.
We'll eventually need to refactor the way IOStream and SSLIOStream
interact to support protocols that use the STARTTLS pattern; by that
point I'd like to deprecate all the ssl-related stuff in TCPServer
(i.e. HTTPServer.handle_stream would get a regular IOStream that it
would immediately switch to SSL mode if so configured).

I think that's unnecessary. Neither Chrome nor Firefox implement STARTTLS
for HTTP, nor do they plan to, judging by their corresponding feature
requests (http://code.google.com/p/chromium/issues/detail?id=4527,
https://bugzilla.mozilla.org/show_bug.cgi?id=276813). In any case, I'm not
sure how STARTTLS and NPN would interact, but I think keeping HTTPServer
blissfully SSL-unaware for now is an advantage.

In the meantime, my thinking for NPN support is that things that take

ssl_options now should also take an ssl_context, but the only
npn-specific functionality in any current tornado code would be an
accessor to selected_npn_protocol on SSLIOStream.

All the keys in ssl_options overlap with properties of SSLContext - if
they conflict, which takes precedence? The IOStream.socket property
already seems sufficient for exposing things like selected_npn_protocol;
promoting it to an IOStream method seems redundant, and we'd have to do the
same for cipher(), compression(), etc.

SPDYServer would

then pass an SSLContext initialized with the appropriate npn options
to TCPServer.__init__, and in handle_stream it would look at
selected_npn_protocol to decide whether to go into HTTP mode or SPDY
mode.

I am strongly opposed to this. The decision of which application-layer
protocol to use for the connection should be made at the transport layer's
corresponding handler, TCPServer. Shunting it down into each
application-layer *Server's handle_stream will result in virtually
identical blocks of logic like:

def handle_stream(conn, address):
npn_protocol = None if not SUPPORTS_NPN else
conn.selected_npn_protocol()
if npn_protocol == 'spdy/2':
# handle SPDY
elif npn_protocol == 'http/1.1':
# handle HTTP
else:
# handle HTTP

And without factoring out the protocol logic into *ServerProtocol classes,
how will SPDYServer delegate to HTTPServer if 'http/1.1' is negotiated?
Will it create a new instance of HTTPServer (and thereby another
TCPServer), and call handle_stream on it?

Pedagogically speaking, HTTPServer should not subclass TCPServer, because
TCP encapsulates HTTP; HTTP isn't a "type of" TCP. In my patch,
HTTPServer and SPDYServer do inherit from TCPServer, but only as a
convenience, to provide sensible default values for protocol and
npn_protocols. The protocols' implementation should not be placed in
these classes.

The API I've designed is an elegant, flexible solution to this. It allows
users to specify exactly which protocols they'd like to support, in order
of preference. For example, if only spdy/3 is desired (and not spdy/2),
they can do:

http_protocol = HTTPServerProtocol(callback)
server = TCPServer(
http_protocol,
npn_protocols=[
('spdy/3', SPDYServerProtocol(callback)),
('http/1.1', http_protocol)])

Under the system you've proposed, unless I've missed something, this would
require either duplicating large portions of SPDYServer or a lot of hacking.

As for the other changes in this diff, I'll look more closely at them

if they either get their own pull request or we resolve things with this
one,
but here are my initial comments:

  • I'm still -1 on giving TCPServer its own file.

That's fine. Should it go back in netutil? It seems a bit heavyweight for
a utility module. I did have a plan to unflatten the module namespace for
3.0...

  • I'm not sure IOStream.set_connect_callback is necessary with the npn

    changes outlined above

I'm pretty sure it still will be - we need a way for server-side
connections to know when the SSL handshake has completed. Can you clarify
why it's unnecessary?

  • A new wrap_socket abstraction seems reasonable for bridging the gap

    between the old-style ssl_options and SSLContexts, but again I'd
    let apps that need NPN pass in the whole SSLContext.
    * Moving the HTTP response formatting to the connection makes sense.
    We use the term headers elsewhere in tornado to refer to the status
    line+header block, and I think I'd prefer to keep it that way
    instead of introducing the term preamble.

That seems like asking for confusion. You're saying the term "headers"
should apply to both the actual headers and status_line+headers? I didn't
set out to coin a new term - I searched high and low for a preexisting one

  • but it seemed necessary. As an ugly, but unambiguous alternative, how
    about write_status_headers()?

  • Consolidating the conditional import of BytesIO and ssl is a good change.


    Reply to this email directly or view it on GitHub:
    facebook#533 (comment)

Owner

bdarnell commented Jun 17, 2012

STARTTLS for HTTP is a non-starter, but it's necessary for other protocols (notably IMAP and SMTP). To be clear, the change I'm referring to would be to consolidate IOStream and SSLIOStream in one class, with a method to start an SSL handshake (and maybe the reverse, like SSLSocket.unwrap). In this world TCPServer would be ignorant of SSL, but an HTTPServer configured for ssl would unconditionally call start_tls() as soon as it was handed a new IOStream.

I think it should be an error to specify both ssl_options and ssl_context. I'm indifferent as to whether we create accessors on SSLIOStream for things like selected_npn_protocol (I was thinking we already had a precedent for this in get_ssl_certificate, but now that I look that's on HTTPRequest, and it just reaches around the IOStream and goes straight to the socket)

I'm strongly opposed to TCPServer processing the NPN information - in my mind there is only ever one application-level protocol which may switch between different modes depending on the content of the communication. It's just a historical accident that the switch between SPDY and HTTP is done out-of-band by NPN while the switch between HTTP 1.0 and 1.1 is done by a version number in the first line. (and websockets use a third switching mechanism).

I hadn't grokked that you wanted set_connect_callback to set a "connect" callback for server-side connections. Aside from the slightly-confusing name it seems reasonable, although this concern vanishes if we merge IOStream and SSLIOStream and start the handshake with start_tls(callback) instead of doing it in the constructor.

The headers vs first-line confusion that you're afraid of already exists in both HTTPServer and SimpleAsyncHTTPClient (on_headers callbacks) and RequestHandler (_generate_headers()), and it hasn't been a problem. That said, I don't feel too strongly about this.

Contributor

alekstorm commented Jun 18, 2012

STARTTLS for HTTP is a non-starter, but it's necessary for other protocols
(notably IMAP and SMTP). To be clear, the change I'm referring to would be
to consolidate IOStream and SSLIOStream in one class, with a method to
start an SSL handshake (and maybe the reverse, like SSLSocket.unwrap). In
this world TCPServer would be ignorant of SSL, but an HTTPServer configured
for ssl would unconditionally call start_tls() as soon as it was handed a
new IOStream.

Sure, I understand what you're envisioning for IOStream. In fact, since
either side of the connection can initiate a new handshake at any time to
renegotiate security parameters, you might want to call it something more
general than start_tls(). It's important to note that this pattern can
seamlessly coexist with my TCPServer/NPN proposal (NPN would obviously be
disabled when TLS is being negotiated mid-session).

I think it should be an error to specify both ssl_options and ssl_context.
I'm indifferent as to whether we create accessors on SSLIOStream for
things like selected_npn_protocol (I was thinking we already had a
precedent for this in get_ssl_certificate, but now that I look that's on
HTTPRequest, and it just reaches around the IOStream and goes straight to
the socket)

Okay, that'll work - I thought you meant ssl_context would be a member of
ssl_options. Maybe that is a better strategy than expanding ssl_options to
be the union of the 2.x and 3.x wrap_socket options.

I'm strongly opposed to TCPServer processing the NPN information - in my
mind there is only ever one application-level protocol which may switch
between different modes depending on the content of the communication.
It's just a historical accident that the switch between SPDY and HTTP is
done out-of-band by NPN while the switch between HTTP 1.0 and 1.1 is done
by a version number in the first line. (and websockets use a third
switching mechanism).

It's not just a historical accident - I think NPN will become a major
feature of the Internet within the next few years. First of all,
negotiating OOB, without involving the application protocol at all, allows
the protocol to freely evolve without worrying about keeping its versioning
information backwards-compatible. Second, it eliminates cross-protocol
attacks. And it's not just for negotiating different versions of the same
protocol - I could advertise ['websocket', 'http/1.1'] to establish
bidirectional client-server communication through WebSocket or long
polling. There is simply no way to do that without NPN.

I'm trying to understand how an application-level server like SPDYServer
would work under your proposal. Could you sketch it out in pseudocode? How
would it delegate to HTTPServer if 'http/1.1' is negotiated through NPN?

I hadn't grokked that you wanted set_connect_callback to set a "connect"
callback for server-side connections. Aside from the slightly-confusing
name it seems reasonable, although this concern vanishes if we merge
IOStream and SSLIOStream and start the handshake with start_tls(callback)
instead of doing it in the constructor.

I agree that it's confusing, but I was just using preexisting terminology:
SSLIOStream, perhaps unwisely, redefined "connect" to mean "when the SSL
handshake has completed". On non-SSL server-side connections,
set_connect_callback() makes no sense - maybe it should be moved to
SSLIOStream (which, I know, may be merged with IOStream)?

The headers vs first-line confusion that you're afraid of already exists in
both HTTPServer and SimpleAsyncHTTPClient (on_headers callbacks) and
RequestHandler (_generate_headers()), and it hasn't been a problem. That
said, I don't feel too strongly about this.

Okay, write_headers() it is.

Contributor

alekstorm commented Jun 30, 2012

Just tried moving TCPServer back into netutil, but that creates a circular dependency between netutil and iostream. What would you like me to do instead? This is one of the reasons that utility modules are generally kept as lightweight and dependency-free as possible.

bdarnell added the tcpserver label Jul 16, 2014

Owner

bdarnell commented Jul 5, 2015

Most of the changes from this PR have been incorporated in some form or another over the years.

bdarnell closed this Jul 5, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment