Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Factor HTTP-specific code into HTTPServer subclass of TCPServer #353

Merged
merged 2 commits into from Sep 15, 2011

Conversation

alekstorm
Copy link
Contributor

Since almost no logic in TCPServer was HTTP-specific, this was a surprisingly small change that provides a convenient server base class for application-layer protocols (I've already subclassed this for an SMTP framework). In particular, this makes SSL automatically available to any subclass. If this is accepted, however, TCPServer may need to be moved to a different module altogether, or the httpserver module renamed. I'm also unsure about the name "TCPServer", since using other transport-layer protocols such as UDP may be possible.

@jehiah
Copy link
Contributor

jehiah commented Sep 9, 2011

perhaps StreamServer would be a good name if it's agnostic to TCP/UDP

@alekstorm
Copy link
Contributor Author

I agree. Ben?

@bdarnell
Copy link
Member

I think this is mostly a good idea, although it's unfortunate that it further fragments the documentation of stuff involved in starting a typical tornado server. Maybe the summary of the three configuration patterns should stay in the HTTPServer class docstring?

Should the new class have all three configuration methods for consistency with HTTPServer, or is this a chance to clean up/narrow the interface? I suppose it's best to keep it consistent, since there are no plans to actually deprecate anything on the HTTPServer side.

As for the name, I could go with either TCPServer or StreamServer. The listen() and bind() interfaces are TCP-specific, but add_sockets can work with non-TCP streaming sockets (not UDP, since it's not a connection/stream-oriented protocol, but it works with e.g. unix domain sockets). If we call it StreamServer, the hack to remap non-IP addresses to 0.0.0.0 should probably be moved down to HTTPServer (but it can stay if we call it TCPServer and pretend unix sockets are just a weird degenerate case of TCP).

tornado.netutil is probably a better home for this class than tornado.httpserver.

Rather than taking a handle_stream method in its constructor, TCPServer should just call self.handle_stream and let that method be defined/overridden in the subclass. Otherwise it's confusing to have TCPServer.handle_stream and HTTPServer._handle_stream which are very subtly different.

@alekstorm
Copy link
Contributor Author

I agree; I wasn't sure how you wanted the documentation formatted (and I hope you don't mind the noise from me correcting the Sphinx backtick-escaping). If the process for starting a server were repaired, I don't think this multi-pattern problem would arise. Since (AFAIK) the development version isn't liable for breaking compatibility with itself in future versions, perhaps this question is best saved for the near future.

I think you've actually made a pretty good case for a three-level hierarchy of StreamServer -> TCPServer -> HTTPServer, where TCPServer adds the listen() and bind() methods, as well as the 0.0.0.0 IP address hack.

I've never felt particularly comfortable with callbacks to object methods for this very reason: they conflict with inheritance. I'll make the changes, of course.

@bdarnell
Copy link
Member

StreamServer->TCPServer->HTTPServer makes sense conceptually (although it's a little funny that TCPServer.add_sockets will work with non-TCP sockets; from another perspective you could have StreamServer multiply-inheriting from TCPServer and UnixSocketServer), but an extra layer in the hierarchy seems pretty heavyweight for this. I think it's OK if StreamServer has some optional TCP-specific methods, and the 0.0.0.0 hack can move down into HTTPServer (or even HTTPConnection) closer to the point where it assumes that the remote address has an IP in it.

And thanks for fixing up the sphinx markup. One nit - you've got an extra backquote around the "1" in start's docstring. (I actually think numeric literals may look better as normal text than styled as code, especially next to comparison operators that aren't styled).

@alekstorm
Copy link
Contributor Author

I just remembered why I had TCPServer take a callback - HTTPServer does as well, and I thought both should have similar interfaces. Theoretically, I should be able to build a TCP-level framework, just as I could an HTTP-level one. What do you think?

@bdarnell
Copy link
Member

I think it's awkward to mix composition and inheritance in the same pair of classes. The handle_stream constructor argument is a composition-style interface, which would make more sense if HTTPServer did not subclass TCPServer but instead had a TCPServer as a member variable (but then you'd have to manually delegate all the other methods). I think it's reasonable to assume that most other uses of TCPServer will be subclasses that look a lot like HTTPServer, rather than instantiations of a plain TCPServer with a callback argument.

Also, the single-callback interface between HTTPServer and e.g. Application has proven to be somewhat limiting, and encourages adding unrelated methods and attributes to httpserver.HTTPRequest. It's not quite the same thing since I think Application probably shouldn't be a subclass of HTTPServer, but in the case of TCPServer requiring subclassing gives us more flexibility to evolve the interface in the future.

@alekstorm
Copy link
Contributor Author

I had originally solved this problem by defining the callback function creating an HTTPConnection within the HTTPServer constructor itself, refactoring TCPServer in the same way (moving _handle_connection() to a local variable within the constructor). It was slightly less pretty due to the indentation, but allowed subclasses of arbitrary depth to pass callbacks up the inheritance chain without worrying about overwriting each others' handle_stream attribute, or whatever.

Another issue I have with your proposed solution is that it abuses inheritance to override non-behavior (private) methods. IMO, overriding should affect only the public interface, not provide internal hooks for subclasses to plug into - this forces subclasses to depend on the internal implementation details of their parent classes, effectively creating two interfaces: one for consumers of the class, and another for subclasses. The idea of "hooks" in general is most elegantly implemented with callbacks.

@alekstorm
Copy link
Contributor Author

I forgot to ask - what about Unix domain sockets is HTTP-specific? It's reasonable for everything above the IP layer to assume the remote host has an IP address, isn't it? Moving the 0.0.0.0 hack down to HTTPServer would be awkward, since the whole connection object would have to be passed down through the callback or overridden method hook.

Also, though I disagree with the direction you'd like to go, I agree that the hybrid design of TCPServer/HTTPServer as it currently exists is untenable, and only one of subclassing or callbacks should be used. And just to be explicit, you'd like a two-level class hierarchy, with the superclass called StreamServer, right?

@bdarnell
Copy link
Member

Nothing about unix domain sockets is HTTP-specific, but the assumption that the underlying network layer is IP-based is. I don't think unix domain sockets have a meaningful remote address, but other non-IP network layers might, so it's best to pass the socket address around in its original form until something wants to do something specific with it (here, it's when HTTPRequest wants a remote_ip attribute). Unless, of course, we just want to pretend the whole world is IP and make the base class TCPServer instead of StreamServer (which is not unreasonable). HTTPConnection could peek into stream.socket.family, which is probably safer than passing the connected socket around separately since actually using the socket for anything else could confuse the IOStream.

Yes, I want a two-level hierarchy. I would be OK with either name for the base class. TCPServer actually sounds a little better to me, even though it's slightly inaccurate in the case of unix sockets (TCP is the case we care about; we only support unix sockets because it's basically free to do so with the socket abstraction, so the occasional TCPism like the 0.0.0.0 hack don't bother me much).

You're right that designing for hooks by inheritance is effectively creating two interfaces. I don't think that's necessarily a bad thing - splitting the two apart is cleaner, but for small classes like this I find that it increases the cognitive weight of the system. In such cases I prefer to treat the subclass interface as if it were public (i.e. handle_stream instead of _handle_stream) and just call out in the docs that it's intended for subclasses.

@alekstorm
Copy link
Contributor Author

I may be misreading the specification, but I don't see how HTTP assumes anything about the layer encapsulating it - AFAIK, network layers are completely orthogonal, and for very good reasons. Granted, it is aware of DNS resolution, but only to identify the requested domain name in the Host header, not to address the client. The hack to handle unix domain sockets seems useful to other protocols that are also "normally" delivered over TCP, not just HTTP. I don't think I've ever had this much discussion over two lines of code :)

@bdarnell
Copy link
Member

Yeah, HTTP doesn't care (except in that URIs have hostnames and ports in them and I don't think you can construct an http URI that refers to anything but TCP). What I meant to say is that tornado.httpserver.HTTPRequest (and not any of the intermediate layers) assumes that the other end of the socket has an IP.

The two lines have always been a hack and will be a hack wherever we put them, it's just a question of whether we want to say that the lower-level stream/tcp server works with all SOCK_STREAM sockets (and exposes the most general/lowest-common-denominator interface possible) or whether it is designed for TCP and may incidentally support other socket types where they can be hammered into a TCP-like shape. I don't actually care that strongly one way or the other, I just think that supporting both separately with a three-level hierarchy would be overengineering.

…rridden method hook, move IP address hack for unix domain sockets to `HTTPConnection`
@alekstorm
Copy link
Contributor Author

Done. Let me know if I forgot anything.

@bdarnell
Copy link
Member

Looks good. I still think it's worth having the overview in HTTPServer's docs, even if it's kind of redundant, so I've put it back in. You also missed the tornado.process import in netutil.py.

@bdarnell bdarnell merged commit 437c477 into tornadoweb:master Sep 15, 2011
@kachayev
Copy link
Contributor

Ben, according to comments in this thread and some refactoring, which have already made here, I want to ask: why do not to improve this functionality up to full supporting of TCPServer? I mean, that we can implement some TCPConnection, TCPServer and TCPClient (the same stack as for HTTP) in those way, that it will be possible to write TCP applications.

I implemented similar functionality in the last project in order to give possibility to connect to running HTTPServer via telnet protocol. You can review Torpeda package and some demos here:

https://github.com/kachayev/torpeda

I can try to explain in short words why I think that TCP server/client applications supporting matters:

  1. As I mentioned above, we can implement telnet connection for flexible application management (on-running).
  2. We can implement some functionality which will run (execute) in separated processes (with TCP "conversations"). For ex, multi-player game. Scheme:

Browser -> [websoket] -> serveral Tornado HTTPServer (websocket handling) -> [tcp client] -> Tornado TCPServer (always running game engine, same for all HTTPServers).

I can quickly port my implementation to current Tornado state and send pull request, even with additional interesting application in demos, which will show how to use new facilities. Of course, if it correlates with Tornado framework plans/vision.

@alekstorm
Copy link
Contributor Author

I think you're taking the wrong approach. TCP is a transport layer - there's no such thing as a "TCP application". Rather, TCP provides connection management services like in-order and guaranteed delivery of packets. The TCPConnection class already exists in the standard library as socket, which is converted to an IOStream and passed to the handle_stream callback in TCPServer. If you've written an application implementing the Telnet protocol (which is a layer on top of TCP, just like HTTP), I would refactor it into a TelnetServer subclass of TCPServer - see the HTTPServer constructor for guidance.

On a related note, I have begun to question the wisdom of HTTPServer inheriting from TCPServer at all, given the Russian doll-like enveloping of the protocol stack within a packet. This seems to conflate inheritance with composition - perhaps they can be decoupled entirely, and only communicate via the handle_stream callback.

@bdarnell
Copy link
Member

Yeah, IOStream is essentially what a TCPConnection would be, so I'm not sure what specifically you're suggesting adding to TCPServer. There's not much that would go into a generic TCP client base class (but once AsyncHTTPClient supports connection reuse the connection pooling would be a candidate).

As for inheritance vs composition for HTTPServer and TCPServer, I'm sympathetic to the pro-composition argument, but I think in this case inheritance makes more sense. TCPServer doesn't have a lot of functionality, but what it does have is methods that we want to have available on HTTPServer (this is necessary for backwards compatibility, but even without that consideration I think it still makes sense to reduce the number of layers that are exposed to the end user in the common case). As long as there's a one-to-one mapping between HTTPServer and TCPServer instances, I don't think it really matters much how they interact (on the other hand, if each TCPServer were limited to a single port like HTTPServer used to be, then composition would make more sense so the HTTPServer part could be shared).

@kachayev
Copy link
Contributor

Oh, as far as I understood, my story was not clear enough :) I'll just try to explain more.

@alekstorm

I meant under TCP Application some logic about what to do when we receive content from stream or send something into it.

@bdarnell

According to

IOStream is essentially what a TCPConnection would be

my suggestion was only for using consistent approach for TCP/HTTP. In this case, TCPConnection can be simply

 class TCPConnection(IOStream):
      pass

If it will help framework users to work not only with HTTP protocols with using same approach.

There's not much that would go into a generic TCP client base class

Yeah, that's true. HTTPClient doesn't support open connection, so possible all method from base class will be overwritten. But I think that TCPClient class in architecture will be good "start point" for implementing clients for different other protocols.

@alekstorm
Copy link
Contributor Author

Again, TCP is lower-level than application logic. Follow the HTTPServer/HTTPConnection/Application pattern. I'm posting an implementation of SMTP on the mailing list later; perhaps you can model telnet on that, since they're both stateful, unlike HTTP.

Renaming standard library classes to fit our naming scheme would obfuscate more than it would help.

@kachayev
Copy link
Contributor

TCP is lower-level than application logic

I understand this, and I described above situation, when we don't have nothing else, only TCP connection and some application logic over it (#353 (comment), point 2 - browser game backend). How pattern HTTPServer/HTTPConnection/Application will help me in this case?

In Twisted framework we should use more complex pattern (objects chain) *Server/Factory/*Protocol/*Transport/Application, but it can be used for TCP (transport layer), and for other protocols (application layer) like HTTP, SMTP, SSH etc.

Renaming standard library classes to fit..

I don't suggest to rename standard library class, I just talk about adding tcpserver.TCPServer, tcpserver.TCPConnection (like httpserver), tcpclient.TCPClient (like httpclient). This is what I propose. I'll try to prepare code example/pull request with non-trivial usage demo in couple of days.

@kachayev
Copy link
Contributor

I created separated code branch, just to show example of my suggestion:

https://github.com/kachayev/tornado/tree/tcp

This code is not full ready, TCP client is just a draft and so on and so far. But. There is one TCPServer usage example with simple pinger:

https://github.com/kachayev/tornado/blob/tcp/demos/tcpping/ping-server.py

This ping server can be used as part of cross-servers backend implementation without overheads for HTTP protocol usage. Also it can be used from telnet terminal etc. I have already described above some other situation when established live connection is very useful for our application.

There are many questions and discrepancies here, but all of them can be resolve if you agree with general approach. Most important question could be found here:

https://github.com/kachayev/tornado/blob/tcp/tornado/tcpserver.py#L81
https://github.com/kachayev/tornado/blob/tcp/demos/tcpping/ping-server.py#L50

In any case, if you agree with general approach, I will develop this code up to ready for framework state ASAP (with comments, test changes, and more examples of how to use TCPServer/TCPClient implementations).

@bdarnell
Copy link
Member

I still don't think there's a need for a TCPConnection class in addition to IOStream. If there are things that are difficult or tedious to do with IOStream then maybe IOStream should be modified, but it should just be one class.

@kachayev
Copy link
Contributor

I absolutely agree, that name TCPConnection doesn't describe what this class do and even can obfuscate to some extent. But I tried to use objects model from httpserver/client modules.

In any case, I think base class for application layer protocol will be good choice. And this base class should contain a) IOStream object, b) connection address, c) reference to "factory" object (for ex., for HTTPConnection this will be reference to HTTPServer) and this class should give some facilities for customizing behavior on streams' connect/disconnect events. In this architecture model, TCPServer (or inherited one) will receive as param Protocol implementation class (inherited from this "base class") or set it directly in init method.

This will give to us flexible way for implementing over-TCP protocols (application layer) without problems of declaration new type of _Server/_Client (in many cases, TCPServer will be enough).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants