Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Any interest in an UnixHTTPConnectionPool? #1465

Closed
mattrobenolt opened this issue Oct 25, 2018 · 22 comments
Closed

Any interest in an UnixHTTPConnectionPool? #1465

mattrobenolt opened this issue Oct 25, 2018 · 22 comments

Comments

@mattrobenolt
Copy link
Contributor

mattrobenolt commented Oct 25, 2018

I'm happy to create a PR, but it seems like a pretty common use case to want to establish an HTTP connection over a unix socket.

I see this abstraction happening more commonly through requests, but was a little surprised that this doesn't just exist as a connection within urllib3.

The code I've thrown together (not yet battle tested) is pretty trivial to do when you get rid of the requests bits, and it's boiled down to just:

class UnixHTTPConnection(HTTPConnection):
    def __init__(self, host, **kwargs):
        # We're using the `host` as the socket path, but
        # urllib3 uses this host as the Host header by default.
        # If we send along the socket path as a Host header, this is
        # never what you want and would typically be malformed value.
        # So we fake this by sending along `localhost` by default as
        # other libraries do.
        self.socket_path = host
        super(UnixHTTPConnection, self).__init__(host='localhost', **kwargs)

    def _new_conn(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        if self.timeout is not socket._GLOBAL_DEFAULT_TIMEOUT:
            sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        return sock


class UnixHTTPConnectionPool(HTTPConnectionPool):
    ConnectionCls = UnixHTTPConnection

If this is something you'd be interested in pulling into urllib3 as a core connection pool, I think this would be great to have, and I'll convert this into a PR. If not, would be curious if this seems outside the scope of urllib3 or some other reason.

The sources of this off the top of my head are within docker-py, which again, uses this sorta abstraction except a bit heavy handed since it also needs the requests adapter https://github.com/docker/docker-py/blob/master/docker/transport/unixconn.py. Then requests-unixsocket project https://github.com/msabramo/requests-unixsocket.

And again, nothing explicitly for those of us that like just directly using urllib3.

I plan on adding this into Sentry very soon, and will cut over to using this for some of our critical internal services to get it battle tested.

I also noticed that this is sorta being used within test suite:

class HTTPUnixConnection(HTTPConnection):
def __init__(self, host, timeout=60, **kwargs):
super(HTTPUnixConnection, self).__init__('localhost')
self.unix_socket = host
self.timeout = timeout
self.sock = None
class HTTPUnixConnectionPool(HTTPConnectionPool):
scheme = 'http+unix'
ConnectionCls = HTTPUnixConnection
I'm not entirely sure of the context here, but it's definitely just used for some mock class. Seems relevant though. :)

@sigmavirus24
Copy link
Contributor

I'm a -0 on this. This project isn't very actively maintained. I'd rather see something that has been shown to work well and already handle cases before adopting it into the core and even then, I'd like to see the original author maintain it which I didn't understand that as being the suggestion here.

The tests there were to ensure that people can write other connections and connection pools. This was just a convenient test case as I recall. People have FTP connections and connection pools. We could have just as easily used that as an example.

@mattrobenolt
Copy link
Contributor Author

That's fine with me. Like I said, was just curious if there was interest in having support for this in core. :)

Thanks for the attention @sigmavirus24! <3

@sigmavirus24
Copy link
Contributor

I'm but one person :) I'd be interested to hear what @SethMichaelLarson and @theacodes think.

@sethmlarson
Copy link
Member

I'm actually okay with having this added to the library due to it's small surface. Not sure who'd use it but I'm sure urllib3 purists may have some interest.

@mattrobenolt
Copy link
Contributor Author

From my perspective, people are more frequently doing this by writing an adapter for requests, which, is obviously using urllib3. I personally prefer to just use urllib3 when I can.

Within Sentry, we try to leverage Unix sockets whenever we can for various reasons. This includes internal HTTP services. We directly use urllib3 for these situations, and I wanted to see how much work it’s be to just write a connection pool for Unix sockets. And turns out, it’s a trivial number of lines of code.

With that said, I think there’s a reasonable demand for http over Unix sockets in generally I think mostly due to popularity of Docker since by default, it’s bound to a Unix socket.

This leads us to projects such as docker-py as referenced, and other adapters for requests. And in my opinion, they are each duplicating efforts on the urllib3 side too since they aren’t just the requests adapter, they also include the connection pool needed for the adapter to use.

So in my hypothetical vision, we’d drop a unixconn package into urllib3.contrib. Then from there, docker-py and the other adapters people have, can just utilize this core bit that’s, theoretically then shared between everyone using this with requests.

This is at least my theory on why the demand might appear lower since others have tackled it at a different level.

(Also reopening since it seems we might have interest.)

@mattrobenolt mattrobenolt reopened this Oct 27, 2018
@mattrobenolt
Copy link
Contributor Author

And lastly for reference, we are going to be shipping this and adopting it internally for a while in production.

getsentry/sentry#10295

@sigmavirus24
Copy link
Contributor

Also worth noting that unix socket URIs have different semantics. Requests has had several issues over URL handling, much of which is rooted in urllib3's Url class so we probably want to switch to a different way of parsing URIs (there are a few options in the python-hyper org)

@mattrobenolt
Copy link
Contributor Author

Maybe I’m missing some context here. But if I initialize the connection pool explicitly, where does URI parsing come into play?

The usage here is something like:

UnixHTTPConnectionPool("/var/run/something.sock").urlopen("GET", "/")

@sigmavirus24
Copy link
Contributor

Ah, if you're doing that instead of using a PoolManager (which is how I think most people use urllib3) then you're probably in the clear.

@sethmlarson
Copy link
Member

+1 to only implementing Connection and ConnectionPool, allows us to skip the URI handling for paths until later (if ever).

@mattrobenolt
Copy link
Contributor Author

mattrobenolt commented Oct 29, 2018

Oh, I see. Yeah, if we wanted this, we'd have to invent some URI pattern like: unix:/var/run/docker.sock/foo or something. I have no real interest in doing this, and I don't think it makes a ton of sense here.

This being, working with PoolManager.

@sigmavirus24
Copy link
Contributor

The reason I was assuming the URI parsing problems was because I think main-stream support of a UnixHTTPConnection will mean that people assume they can use it from PoolManager. I suspect as soon as we release this that we'll get bug reports that PoolManager doesn't Just Work™ with it.

@mattrobenolt
Copy link
Contributor Author

PoolManager only supports http and https. There are a few other connectionpool types under urllib3.contrib that all exist without going through PoolManager.

I don’t think this would be a barrier for adoption, since as I said, most are using requests anyways, and these underlying requests adapters would leverage the connectionpool directly and don’t use PoolManager.

@mattrobenolt
Copy link
Contributor Author

Either way, I think working with PoolManager would be nice, but agree that opens up a can of worms for URIs.

@sigmavirus24
Copy link
Contributor

Yeah. I think we just want to be explicit in the documentation (like a .. warning at the top) that this does not work with PoolManager. That should help deflect most issues.

@shazow
Copy link
Member

shazow commented Oct 29, 2018

If/when the SessionManager layer is complete (not sure whether that's on the roadmap anymore), which might someday supersede the PoolManager as the default high-level entry point, it would be a better fit for it anyways. :)

I don't think it semantically makes sense to pool unix sockets anyways, so PoolManager is probably the wrong place for it (except that redirect logic lives here for now--though it probably ultimately belongs at the Session layer). Even ConnectionPool is a weird layer for it but right now the RequestMethods are mixed in at that layer (perhaps that is the wrong layer though--or maybe it's just a misnamed layer).

Though if it's just a contrib thing, it could be a UnixHTTPConnection that happens to inherit from RequestMethods too.

@mattrobenolt
Copy link
Contributor Author

Re: PoolManager

If we did want to support some URI scheme, nginx seems to have a solution that works: http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_pass

http://unix:/foo.sock:/some/path

It's not pretty, but it works. I tried to look up other examples, and things like curl don't accept a full URI, there's an explicit --unix-socket flag probably to also not deal with this.

I don't think it semantically makes sense to pool unix sockets anyways

Why do you say that? Would it make sense to reuse the same socket when possible? And also when dealing with threads, you'd need more than 1, so the pool abstraction still makes sense to me. But I'll admit, I'm not expert urllib3 user and maybe my understanding is wrong.

Though if it's just a contrib thing, it could be a UnixHTTPConnection that happens to inherit from RequestMethods too.

TIL, I can adapt that in a PR.

@sigmavirus24
Copy link
Contributor

Interesting that nginx does it that way. Requests seems to support http+unix as the scheme iirc (in so much as we support it so third-party adapters can support it). Maybe we should just pick a third syntax just because this entire concept isn't specified in any reliable way. (just a bit of snark)

@mattrobenolt
Copy link
Contributor Author

+1 to inventing a new method. :)

@theacodes
Copy link
Member

since requests is our primary downstream user, let's match their scheme. It's also a scheme that makes sense, nginx's just looks.. weird.

I'm fine with UnixHTTPConnection being in contrib.

@mattrobenolt
Copy link
Contributor Author

Cool. I’ll take the time after we ship this into production to put together a PR in here and try to figure out some tests.

I’ll focus on the connectionpool bits as we have it, then we can see about addressing the generic PoolManager and URI stuff.

Thanks everyone!

@sethmlarson
Copy link
Member

Since the PR was closed and this thread seems to have stalled I'm going to close it. If anyone has interest in adding support for Unix domain sockets and maintaining that functionality long-term maybe it makes sense as a separate package now that we define BaseHTTPConnection protocols?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants