New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Factor HTTP-specific code into HTTPServer subclass of TCPServer #353
Conversation
perhaps |
I agree. Ben? |
I think this is mostly a good idea, although it's unfortunate that it further fragments the documentation of stuff involved in starting a typical tornado server. Maybe the summary of the three configuration patterns should stay in the HTTPServer class docstring? Should the new class have all three configuration methods for consistency with HTTPServer, or is this a chance to clean up/narrow the interface? I suppose it's best to keep it consistent, since there are no plans to actually deprecate anything on the HTTPServer side. As for the name, I could go with either TCPServer or StreamServer. The listen() and bind() interfaces are TCP-specific, but add_sockets can work with non-TCP streaming sockets (not UDP, since it's not a connection/stream-oriented protocol, but it works with e.g. unix domain sockets). If we call it StreamServer, the hack to remap non-IP addresses to 0.0.0.0 should probably be moved down to HTTPServer (but it can stay if we call it TCPServer and pretend unix sockets are just a weird degenerate case of TCP). tornado.netutil is probably a better home for this class than tornado.httpserver. Rather than taking a handle_stream method in its constructor, TCPServer should just call self.handle_stream and let that method be defined/overridden in the subclass. Otherwise it's confusing to have TCPServer.handle_stream and HTTPServer._handle_stream which are very subtly different. |
I agree; I wasn't sure how you wanted the documentation formatted (and I hope you don't mind the noise from me correcting the Sphinx backtick-escaping). If the process for starting a server were repaired, I don't think this multi-pattern problem would arise. Since (AFAIK) the development version isn't liable for breaking compatibility with itself in future versions, perhaps this question is best saved for the near future. I think you've actually made a pretty good case for a three-level hierarchy of I've never felt particularly comfortable with callbacks to object methods for this very reason: they conflict with inheritance. I'll make the changes, of course. |
StreamServer->TCPServer->HTTPServer makes sense conceptually (although it's a little funny that TCPServer.add_sockets will work with non-TCP sockets; from another perspective you could have StreamServer multiply-inheriting from TCPServer and UnixSocketServer), but an extra layer in the hierarchy seems pretty heavyweight for this. I think it's OK if StreamServer has some optional TCP-specific methods, and the 0.0.0.0 hack can move down into HTTPServer (or even HTTPConnection) closer to the point where it assumes that the remote address has an IP in it. And thanks for fixing up the sphinx markup. One nit - you've got an extra backquote around the "1" in start's docstring. (I actually think numeric literals may look better as normal text than styled as code, especially next to comparison operators that aren't styled). |
I just remembered why I had |
I think it's awkward to mix composition and inheritance in the same pair of classes. The handle_stream constructor argument is a composition-style interface, which would make more sense if HTTPServer did not subclass TCPServer but instead had a TCPServer as a member variable (but then you'd have to manually delegate all the other methods). I think it's reasonable to assume that most other uses of TCPServer will be subclasses that look a lot like HTTPServer, rather than instantiations of a plain TCPServer with a callback argument. Also, the single-callback interface between HTTPServer and e.g. Application has proven to be somewhat limiting, and encourages adding unrelated methods and attributes to httpserver.HTTPRequest. It's not quite the same thing since I think Application probably shouldn't be a subclass of HTTPServer, but in the case of TCPServer requiring subclassing gives us more flexibility to evolve the interface in the future. |
I had originally solved this problem by defining the callback function creating an Another issue I have with your proposed solution is that it abuses inheritance to override non-behavior (private) methods. IMO, overriding should affect only the public interface, not provide internal hooks for subclasses to plug into - this forces subclasses to depend on the internal implementation details of their parent classes, effectively creating two interfaces: one for consumers of the class, and another for subclasses. The idea of "hooks" in general is most elegantly implemented with callbacks. |
I forgot to ask - what about Unix domain sockets is HTTP-specific? It's reasonable for everything above the IP layer to assume the remote host has an IP address, isn't it? Moving the Also, though I disagree with the direction you'd like to go, I agree that the hybrid design of |
Nothing about unix domain sockets is HTTP-specific, but the assumption that the underlying network layer is IP-based is. I don't think unix domain sockets have a meaningful remote address, but other non-IP network layers might, so it's best to pass the socket address around in its original form until something wants to do something specific with it (here, it's when HTTPRequest wants a remote_ip attribute). Unless, of course, we just want to pretend the whole world is IP and make the base class TCPServer instead of StreamServer (which is not unreasonable). HTTPConnection could peek into stream.socket.family, which is probably safer than passing the connected socket around separately since actually using the socket for anything else could confuse the IOStream. Yes, I want a two-level hierarchy. I would be OK with either name for the base class. TCPServer actually sounds a little better to me, even though it's slightly inaccurate in the case of unix sockets (TCP is the case we care about; we only support unix sockets because it's basically free to do so with the socket abstraction, so the occasional TCPism like the 0.0.0.0 hack don't bother me much). You're right that designing for hooks by inheritance is effectively creating two interfaces. I don't think that's necessarily a bad thing - splitting the two apart is cleaner, but for small classes like this I find that it increases the cognitive weight of the system. In such cases I prefer to treat the subclass interface as if it were public (i.e. |
I may be misreading the specification, but I don't see how HTTP assumes anything about the layer encapsulating it - AFAIK, network layers are completely orthogonal, and for very good reasons. Granted, it is aware of DNS resolution, but only to identify the requested domain name in the |
Yeah, HTTP doesn't care (except in that URIs have hostnames and ports in them and I don't think you can construct an http URI that refers to anything but TCP). What I meant to say is that tornado.httpserver.HTTPRequest (and not any of the intermediate layers) assumes that the other end of the socket has an IP. The two lines have always been a hack and will be a hack wherever we put them, it's just a question of whether we want to say that the lower-level stream/tcp server works with all SOCK_STREAM sockets (and exposes the most general/lowest-common-denominator interface possible) or whether it is designed for TCP and may incidentally support other socket types where they can be hammered into a TCP-like shape. I don't actually care that strongly one way or the other, I just think that supporting both separately with a three-level hierarchy would be overengineering. |
…rridden method hook, move IP address hack for unix domain sockets to `HTTPConnection`
Done. Let me know if I forgot anything. |
Looks good. I still think it's worth having the overview in HTTPServer's docs, even if it's kind of redundant, so I've put it back in. You also missed the tornado.process import in netutil.py. |
Ben, according to comments in this thread and some refactoring, which have already made here, I want to ask: why do not to improve this functionality up to full supporting of TCPServer? I mean, that we can implement some TCPConnection, TCPServer and TCPClient (the same stack as for HTTP) in those way, that it will be possible to write TCP applications. I implemented similar functionality in the last project in order to give possibility to connect to running HTTPServer via telnet protocol. You can review https://github.com/kachayev/torpeda I can try to explain in short words why I think that TCP server/client applications supporting matters:
Browser -> [websoket] -> serveral Tornado I can quickly port my implementation to current |
I think you're taking the wrong approach. TCP is a transport layer - there's no such thing as a "TCP application". Rather, TCP provides connection management services like in-order and guaranteed delivery of packets. The On a related note, I have begun to question the wisdom of |
Yeah, IOStream is essentially what a TCPConnection would be, so I'm not sure what specifically you're suggesting adding to TCPServer. There's not much that would go into a generic TCP client base class (but once AsyncHTTPClient supports connection reuse the connection pooling would be a candidate). As for inheritance vs composition for HTTPServer and TCPServer, I'm sympathetic to the pro-composition argument, but I think in this case inheritance makes more sense. TCPServer doesn't have a lot of functionality, but what it does have is methods that we want to have available on HTTPServer (this is necessary for backwards compatibility, but even without that consideration I think it still makes sense to reduce the number of layers that are exposed to the end user in the common case). As long as there's a one-to-one mapping between HTTPServer and TCPServer instances, I don't think it really matters much how they interact (on the other hand, if each TCPServer were limited to a single port like HTTPServer used to be, then composition would make more sense so the HTTPServer part could be shared). |
Oh, as far as I understood, my story was not clear enough :) I'll just try to explain more. I meant under According to
my suggestion was only for using consistent approach for TCP/HTTP. In this case, TCPConnection can be simply
If it will help framework users to work not only with HTTP protocols with using same approach.
Yeah, that's true. HTTPClient doesn't support open connection, so possible all method from base class will be overwritten. But I think that TCPClient class in architecture will be good "start point" for implementing clients for different other protocols. |
Again, TCP is lower-level than application logic. Follow the Renaming standard library classes to fit our naming scheme would obfuscate more than it would help. |
I understand this, and I described above situation, when we don't have nothing else, only TCP connection and some application logic over it (#353 (comment), point 2 - browser game backend). How pattern In Twisted framework we should use more complex pattern (objects chain)
I don't suggest to rename standard library class, I just talk about adding |
I created separated code branch, just to show example of my suggestion: https://github.com/kachayev/tornado/tree/tcp This code is not full ready, TCP client is just a draft and so on and so far. But. There is one TCPServer usage example with simple pinger: https://github.com/kachayev/tornado/blob/tcp/demos/tcpping/ping-server.py This ping server can be used as part of cross-servers backend implementation without overheads for HTTP protocol usage. Also it can be used from telnet terminal etc. I have already described above some other situation when established live connection is very useful for our application. There are many questions and discrepancies here, but all of them can be resolve if you agree with general approach. Most important question could be found here: https://github.com/kachayev/tornado/blob/tcp/tornado/tcpserver.py#L81 In any case, if you agree with general approach, I will develop this code up to ready for framework state ASAP (with comments, test changes, and more examples of how to use TCPServer/TCPClient implementations). |
I still don't think there's a need for a TCPConnection class in addition to IOStream. If there are things that are difficult or tedious to do with IOStream then maybe IOStream should be modified, but it should just be one class. |
I absolutely agree, that name In any case, I think base class for application layer protocol will be good choice. And this base class should contain a) IOStream object, b) connection address, c) reference to "factory" object (for ex., for HTTPConnection this will be reference to HTTPServer) and this class should give some facilities for customizing behavior on streams' connect/disconnect events. In this architecture model, TCPServer (or inherited one) will receive as param Protocol implementation class (inherited from this "base class") or set it directly in init method. This will give to us flexible way for implementing over-TCP protocols (application layer) without problems of declaration new type of _Server/_Client (in many cases, TCPServer will be enough). |
Since almost no logic in
TCPServer
was HTTP-specific, this was a surprisingly small change that provides a convenient server base class for application-layer protocols (I've already subclassed this for an SMTP framework). In particular, this makes SSL automatically available to any subclass. If this is accepted, however,TCPServer
may need to be moved to a different module altogether, or thehttpserver
module renamed. I'm also unsure about the name "TCPServer", since using other transport-layer protocols such as UDP may be possible.