Implement SPDY support #525

Open
wants to merge 3 commits into from
+3,783 −1,000

2 participants

@alekstorm

Introduction

To my knowledge, this patch makes Tornado the first Python web framework to support SPDY (v2), and the second web framework in any language to support pushed streams (Node.js is the other). Websocket-over-SPDY support is coming soon; see below.

If you're not excited about SPDY yet, I recommend reading Google's whitepaper and the specification.

Do not use this in production systems: it has only been tested against its own test suite.

This introduces minor but fundamental incompatibilities with current Tornado applications, so merging it should be postponed until Tornado 3.0 gives us carte blanche to break backward compatibility. Until then, I'll maintain it as a separate branch; please submit bug reports and other issues there.

In most cases, users will be able to upgrade their applications to support SPDY simply by swapping out HTTPServer for SPDYServer, since connection framing is transparent to the application layer. AsyncSPDYClient, a client that uses SPDY when available, and HTTP otherwise, is provided as well.

Implementation

See the commit message, which covers many changes not mentioned here.

Dependencies

Unfortunately, SPDY will probably not be immediately deployable in production systems, since most load balancers don't support it. Nginx, whose support is due out in the next few weeks, will only support it on the client-facing side of the reverse proxy in the first release, not the backend.

In addition, SPDY framing on the open Internet is negotiated through the NPN extension to TLS, which is only supported in OpenSSL 1.0.1 and Python 3.3. SPDYServer with NPN is tested and confirmed working against both Chrome and Firefox (latest), and AsyncSPDYClient works with both Google's and Twitter's servers.

Still, I highly recommend doing internal testing in anticipation of wider SPDY support. Both SPDYServer and AsyncSPDYClient can be configured to speak SPDY without NPN, and Chrome can be started with the --use-spdy=no-ssl flag.
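The fallback between NPN-negotiated and default protocol handlers can be illustrated without Tornado. The following is only a sketch under stated assumptions: `choose_handler` and the string "handlers" are hypothetical illustration names, not part of the patch, and `SUPPORTS_NPN` here merely mirrors the idea of the constant described in the commit message by reading `ssl.HAS_NPN` when the running interpreter defines it.

```python
import ssl

# Mirrors the idea of netutil.SUPPORTS_NPN; getattr guards interpreters
# whose ssl module does not define HAS_NPN.
SUPPORTS_NPN = bool(getattr(ssl, "HAS_NPN", False))


def choose_handler(default, npn_protocols=None, negotiated=None):
    """Return the handler for the negotiated protocol name.

    `npn_protocols` is a list of (name, handler) tuples in order of
    preference; `negotiated` is the name NPN settled on, or None when the
    connection is not over TLS or NPN did not succeed.
    """
    if npn_protocols and negotiated is not None:
        for name, handler in npn_protocols:
            if name == negotiated:
                return handler
    # No NPN, or an unknown name: fall back to the default protocol.
    return default


# Hypothetical handlers, represented as plain strings for illustration.
protocols = [("spdy/2", "spdy-handler"), ("http/1.1", "http-handler")]
print(choose_handler("http-handler", protocols, "spdy/2"))
print(choose_handler("http-handler", protocols, None))
```

The first call selects the SPDY handler; the second falls back to the default, which is roughly what happens when a client connects without NPN.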

Server push

UIModules can now push their JavaScript and CSS files, and the static_url function available to templates can push the returned URL. Since the default behavior for both of these is easily overridden and users are aware that this patch introduces incompatibilities, I am confident enough to have both push by default, but could be convinced otherwise.

Resources can also be pushed manually with the new RequestHandler.push method, which takes the relative URI of the resource as a parameter. If the request doesn't use SPDY framing, this is a no-op.
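The push rules (a no-op without SPDY framing, streams opened on the next flush, and an exception once the body has started, as the commit message describes) can be sketched with a toy class. `PushSketch`, the `"spdy"`/`"http"` framing labels, and the choice of `RuntimeError` are assumptions made for illustration; this is not the patch's `RequestHandler`.

```python
class PushSketch:
    """Toy stand-in for a request handler; not the patch's RequestHandler."""

    def __init__(self, framing):
        self.framing = framing  # "spdy" or "http" (assumed labels)
        self.pending = []       # URIs to push on the next flush
        self.flushed = False

    def push(self, uri):
        if self.framing != "spdy":
            return  # no-op when the request does not use SPDY framing
        if self.flushed:
            # Pushes must be announced before any body bytes are written.
            raise RuntimeError("cannot push after flush()")
        self.pending.append(uri)

    def flush(self):
        # Hand back the push streams that would be opened by this flush.
        opened, self.pending = self.pending, []
        self.flushed = True
        return opened


handler = PushSketch("spdy")
handler.push("/static/app.js")
print(handler.flush())
```

Queuing pushes until `flush()` keeps the announcement of pushed resources ahead of the response body, which is the ordering the race-condition rule requires.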

Miscellaneous

Previously, TCP connection timeout, premature connection termination, and any HTTP parsing errors were handled in the client by executing the request callback with a fake 599 response; users would then call response.rethrow() to raise an exception. Now, rather than returning a fake response, an exception will be automatically raised in the stack context of the callback. For example:

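    import contextlib
    import logging
    import sys

    from tornado.stack_context import StackContext

    @contextlib.contextmanager
    def die_on_error():
        # Exceptions raised in the callback's stack context end up here.
        try:
            yield
        except Exception:
            logging.error("exception in asynchronous operation", exc_info=True)
            sys.exit(1)

    # http_client, url, and callback are assumed to be defined by the caller.
    with StackContext(die_on_error):
        http_client.fetch(url, callback)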

HTTPResponse.rethrow has been removed, but the error attribute has been converted to a boolean representing whether the status code isn't 2xx or 3xx.
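As a small illustration of the boolean's meaning, the check can be written as a standalone function. `is_error` is a hypothetical helper name used only here; the patch exposes the value as the `error` attribute.

```python
def is_error(status_code):
    """True when the status code is outside the 2xx and 3xx ranges."""
    return not (200 <= status_code < 400)


for code in (200, 304, 404, 599):
    print(code, is_error(code))
```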

Since the websocket module has been updated to use the new HTTPConnection interface, it should, in theory, support SPDY framing. However, Chrome 19 crashes whenever I start it with --enable-websocket-over-spdy, so we'll need to work with the Chromium team to get this fixed. If there were unit tests for the websocket module, I would have updated them.

I borrowed (stole) Mark Nottingham's c_zlib module from his old nbhttp project, since the Python standard library's zlib module lacks pre-set dictionary support, which SPDY requires. It works fine on my Ubuntu machine; reports of systems where it fails are more than welcome.
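For readers unfamiliar with pre-set dictionaries, here is a minimal sketch of the technique. It assumes Python 3.3 or later, where `zlib.compressobj` and `zlib.decompressobj` accept a `zdict` argument, so it does not use c_zlib. `HEADER_DICT` is a hypothetical, shortened dictionary; the real SPDY dictionary is a fixed byte string defined by the specification.

```python
import zlib

# Hypothetical, shortened dictionary for illustration only.
HEADER_DICT = b"getpostacceptcontent-typecontent-lengthhostuser-agenttext/html"


def compress_block(compressor, block):
    # Z_SYNC_FLUSH emits a complete chunk while keeping the stream open,
    # so later header blocks can reuse the compression state.
    return compressor.compress(block) + compressor.flush(zlib.Z_SYNC_FLUSH)


# Both sides must be primed with the same dictionary.
compressor = zlib.compressobj(zdict=HEADER_DICT)
decompressor = zlib.decompressobj(zdict=HEADER_DICT)

block = b"host: example.com\r\ncontent-type: text/html\r\n"
wire = compress_block(compressor, block)
restored = decompressor.decompress(wire)
print(restored == block)
```

Priming the stream with strings that commonly appear in headers lets even the first header block of a session compress well.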

Future work

Next up are implementing a prioritization algorithm and v3 support. The most significant additions to v3 are flow control and the CREDENTIALS frame; time will tell how difficult these are to implement. The code is modularized in such a way that supporting new versions should be transparent to the user.

@alekstorm alekstorm Implement SPDY (v2) support
This is incompatible with Tornado 2.x applications, but users should be able to update their code
without too much trouble.

Changes, in no particular order:

* TLS NPN means that one of many protocols can be selected after a TCP connection is established. A
  layer of indirection was added to TCPServer to allow it to delegate handling of a TCP connection
  to whichever protocol handler was negotiated. If the `npn_protocols` parameter (a list of
  (name, handler) tuples in order of preference) was passed to the constructor, the connection is
  over TLS, and NPN succeeded, the handler for the chosen name will be called. Otherwise, the
  `protocol` constructor parameter will be called. For example, SPDYServer is essentially:

  class SPDYServer(TCPServer):
      def __init__(self, request_callback):
          http_protocol = HTTPServerProtocol(request_callback)
          TCPServer.__init__(self, http_protocol,
              npn_protocols=[
                  ('spdy/2', SPDYServerProtocol(request_callback)),
                  ('http/1.1', http_protocol)])

* TCPServer was moved from netutil to its own module, tcpserver.

* Since utilizing NPN support in Python 3.3 requires the `ssl.SSLContext` class, which isn't
  available in Python 2.x, the wrap_socket() top-level function was added to `netutil` to abstract
  away these details. In addition, the `SUPPORTS_NPN` constant was added as a convenience for
  determining if the system supported NPN.

* Previously, TCP connection timeout, premature connection termination, and any HTTP parsing errors
  were handled in the client by executing the request callback with a fake 599 response. The user
  would then (if they remembered) call response.rethrow() to raise an exception. Separating the TCP
  connection from response handling made this scheme impossible (aside from being un-Pythonic in the
  first place). Instead, stack_context.wrap() was refactored to return a callable with a restore()
  method that returned a context manager that restores the saved stack context within its block.
  The above-mentioned errors are raised within this context. Top-level save() and switch_contexts()
  functions were also added, in case someone finds them useful. This is the primary incompatibility
  with current Tornado applications. `HTTPResponse.rethrow` has been removed, but the `error`
  attribute has been converted to a boolean representing whether the status code isn't 2xx or 3xx.

* The SPDY standard is rapidly evolving. v3 has already been completed, and we're currently working
  on v4. To shield the user from this, the spdy* modules are actually directories that place their
  public interfaces in __init__.py, and their version-specific implementation in v2.py. This allows
  users to `import spdyserver`, and spdy* modules to `import spdyserver.v2`. Once v3 support is
  implemented, it will go in v3.py, and users won't notice the difference.

* Much of the functionality of SimpleAsyncHTTPClient was factored out into QueuedAsyncHTTPClient,
  to allow SPDYClient to take advantage of its queuing logic as well. Subclasses pass a `handler`
  argument to the `initialize` pseudo-constructor, which gets called when a request is ready to be
  processed. If the handler determines that a new TCP connection is needed, it can call the
  `_http_connect` method to open one and return a (conn, address) pair.

* Added push_callback, priority, force_connection, ssl_version attributes to httpclient.HTTPRequest.

* Added framing, associated_urls, associated_to_url attributes to httpclient.HTTPResponse. The
  latter is necessary for notifying the client which URLs the server has promised to push, since
  pushed streams can be finished after the original response stream has.

* Added framing, priority attributes to httpserver.HTTPRequest.

* Since SPDY sessions are long-lived and streams are bidirectional, logic common to both clients and
  servers (stream management) has been factored out into the spdysession module. Each frame type,
  stream finishing events, and stream adding events all provide hooks for subclasses to override
  with client/server-specific functionality.

* Previously, `web.RequestHandler` formatted the HTTP response itself and wrote it directly to the
  IOStream. To allow for SPDY framing, this responsibility has been moved to the
  HTTPRequest.connection object, which must provide the write_preamble() and write() methods - the
  former writes the response status line and headers, while the latter writes a chunk of the
  response body. In addition, the SPDYConnection object provides the push() method, which returns
  a new `SPDYConnection` that will push a resource to the client.

* Body argument-parsing code (application/x-www-form-urlencoded or multipart/form-data) needed by
  spdyserver, httpserver, and wsgi (and which was duplicated in the latter two) has been factored
  out into httputil.parse_body_arguments.

* Although IOStream.connect() already takes a callback parameter, in SSLIOStream it's not called
  until the SSL handshake is completed (which contains TLS NPN) - and TCPServer, which doesn't call
  connect(), won't know which protocol handler to execute until that happens. To fix this, a
  set_connect_callback method was added to IOStream.

* Since both SPDYSession and SimpleAsyncHTTPClient find a gzip decompressor useful, a
  GzipDecompressor class was added to the util module.

* A new RequestHandler.push() method has been added that takes a relative URL as a parameter and
  opens a new push stream for it the next time flush() is called. A new HTTPRequest is constructed
  and passed to the RequestHandler's Application instance, just like normal - the only difference
  is that the response will be written to the pushed stream. Since SPDY forbids servers from
  pushing streams after any of the response body has been written to the network to avoid race
  conditions, calling this method after flush() has been called results in an exception.

* The UIModule constructor now takes a `push_files` parameter that determines whether its
  JavaScript and CSS files are automatically pushed when the UIModule is rendered. Also, the
  static_url() function provided to templates now takes an optional `push` parameter, which
  likewise causes the resulting URL to be pushed. Since the default for both of these parameters is
  easily overridden and users are aware that this patch introduces incompatibilities, I am
  confident enough to have both of these parameters default to `True`, but could be convinced
  otherwise.

* Gzip body encoding is automatically chosen if the `gzip` Application setting is `True` and the
  connection framing is SPDY, regardless of the request's Accept-Encoding header. SPDY specifically
  allows this.

* The boolean `spdy` Application parameter was added to cause RequestHandler to add an
  "Alternate-Protocol: 443:npn-spdy" header to all responses if the connection framing is not
  already SPDY.

* AsyncSSLTestCase has been added to the testing module as a base class for all tests that require
  SSL connections.

* Since the websocket module has been updated to use the new HTTPConnection interface, it should,
  in theory, support SPDY framing. However, Chrome 19 crashes whenever I start it with
  --enable-websocket-over-spdy, so we'll need to work with the Chromium team to get this fixed.
  If there were unit tests for the websocket module, I would have updated them.

* The spdyutil module, the bulk of which is in its v2 submodule, provides data structures
  representing SPDY frames (which can serialize themselves), as well as a non-blocking parse_frame
  function.

* Mark Nottingham's `c_zlib` module has been borrowed (stolen) from his old nbhttp
  (https://github.com/mnot/nbhttp/tree/spdy/src) project, since the Python standard library's
  `zlib` module lacks pre-set dictionary support, which SPDY uses for header compression.
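
Two of the smaller rules in the list above, gzip selection under SPDY and the Alternate-Protocol advertisement, can be sketched as a single function. `extra_response_headers` and the `"spdy"`/`"http"` framing labels are hypothetical names for illustration, and ordinary Accept-Encoding negotiation for HTTP framing is deliberately left out.

```python
def extra_response_headers(framing, settings):
    """Return headers implied by the `gzip` and `spdy` settings.

    `framing` is assumed to be the string "spdy" or "http".
    """
    headers = {}
    if framing == "spdy":
        if settings.get("gzip"):
            # Chosen regardless of the request's Accept-Encoding header.
            headers["Content-Encoding"] = "gzip"
    elif settings.get("spdy"):
        # Advertise SPDY to clients that connected with HTTP framing.
        headers["Alternate-Protocol"] = "443:npn-spdy"
    return headers


print(extra_response_headers("spdy", {"gzip": True, "spdy": True}))
print(extra_response_headers("http", {"gzip": True, "spdy": True}))
```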
d29d366
@alekstorm alekstorm referenced this pull request
Closed

Implement SPDY protocol #486

@bdarnell
Owner

Cool! I'm probably not going to look too closely at this yet since, as you say, it's not really mergeable unless and until we decide to take the backwards-compatibility hit, but it's great that it exists for people who are ready to experiment with it. I'm also initially iffy on this change to error handling. I recognize that simple_httpclient currently has a habit of throwing exceptions into the StackContext, but I consider that a flaw since it's so hard to do any real recovery in that case. I have been exploring alternative error handling patterns, though, so maybe things in this area will change in 3.0.

Even without the backwards-compatibility concerns, I'm inclined to be very conservative about merging this. We added websocket support prematurely and it ended up being a big headache. Since this is going to be a long-lived fork, feel free to submit separate pull requests for the non-SPDY-specific parts; that may make it easier to keep your fork up to date. (It would be great if this could eventually become a module installed alongside a regular Tornado installation rather than a fork, but I suspect that some parts, like the push support, would require changes to the core classes that don't make sense to merge before full SPDY support goes in.)

The Python 3.3 requirement for NPN is unfortunate (and is another reason not to even think about merging this for a few months). I wonder if it's worth looking at third-party OpenSSL wrappers.

@bdarnell
Owner

For an example of what I was saying about error recovery being difficult when exceptions just get thrown to the StackContext, see https://github.com/alekstorm/tornado/blob/spdy/tornado/test/process_test.py#L90. This doesn't work (at least not in my environment; there could be subtle differences in when/how this exception gets raised) because the exception is thrown directly into the test's StackContext rather than being raised at the appropriate location. This could be fixed by defining an appropriate StackContext in the synchronous HTTPClient, but in general that seems a lot more complicated than the old way.

@alekstorm

You're right that it doesn't work - I'm not sure why I thought it would when changing it. But sticking the fetch() call inside an ExceptionStackContext context manager should do the trick, shouldn't it?

@bdarnell
Owner
@alekstorm
@bdarnell bdarnell added the multiple label
Commits on Jun 3, 2012
  1. @alekstorm

    Implement SPDY (v2) support

    alekstorm authored
Commits on Jun 10, 2012
  1. @alekstorm
  2. @alekstorm