Addressing HTTP servers over Unix domain sockets #577

Closed
rkjnsn opened this issue Feb 6, 2021 · 55 comments
Comments

@rkjnsn

rkjnsn commented Feb 6, 2021

It is often desirable to run various HTTP servers that are only locally connectable. These could be local daemons that expose an HTTP API and/or web GUI, a local dev instance of a web server, et cetera.

For these use cases, using Unix domain sockets provides two major advantages over TCP on localhost:

  1. Namespacing. If two users on a system are running the same service, TCP requires them both to pick, configure, and remember different port numbers. With Unix domain sockets, each socket can live in the respective user's runtime directory and be named after the service.
  2. Access control. Even if the service is diligent only to bind to localhost, TCP still allows any (non-sandboxed) process or user on the machine to connect. Any access control has to be implemented by the service itself, which often involves implementing (hopefully with sufficient security) its own password authentication mechanism. Unix domain sockets, on the other hand, can take advantage of the access control functionality provided by the filesystem, and thus can easily be restricted to a single user or set of users. In the event that a service wants to allow multiple users to connect and discriminate between them, several operating systems provide a means of querying the UID of the connecting process, again without requiring its own authentication scheme.
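
As a rough illustration of the UID query mentioned in point 2, here is a minimal Python sketch. It assumes Linux, where the `SO_PEERCRED` socket option returns a `struct ucred` (pid, uid, gid); other systems expose similar information through different calls. The helper name `peer_credentials` is made up for this example.

```python
import os
import socket
import struct


def peer_credentials(conn: socket.socket) -> tuple:
    """Return (pid, uid, gid) of the process at the other end of a Unix socket.

    Assumption: Linux, where SO_PEERCRED yields a struct ucred made of a
    signed pid followed by unsigned uid and gid.
    """
    ucred_format = "iII"
    raw = conn.getsockopt(
        socket.SOL_SOCKET, socket.SO_PEERCRED, struct.calcsize(ucred_format)
    )
    return struct.unpack(ucred_format, raw)


if __name__ == "__main__":
    # A connected socket pair stands in for a real client/server connection.
    server_side, client_side = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        pid, uid, gid = peer_credentials(server_side)
        print(f"peer pid={pid} uid={uid} gid={gid}")
    finally:
        server_side.close()
        client_side.close()
```

A server could compare the returned UID against an allow-list instead of asking for a password.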

Indeed, due to these advantages, many servers/services already provide options for listening via a Unix domain socket rather than a local TCP port. Unfortunately, there is not currently an agreed-upon way to address such a service in a URL. As a result, clients who choose to support it end up creating their own bespoke approach (e.g., a special command-line flag, or a custom URL format), while others choose not to support it so as not to bring their URL parsing out-of-spec (among other potential concerns).

Here are some of the various URL formats I've seen used or suggested:

  • Transport only: unix:/path/to/socket.sock. This lacks both the protocol and resource path, so it can only be used for clients that already know they'll be speaking to a specific HTTP API, and is not generally usable.
  • HTTP with socket path as the port: http://localhost:[/path/to/socket.sock]/resource. Only allowed when host is localhost. Paths containing ] could either be disallowed or URL encoded.
  • Composite scheme with socket path as URL-encoded host: http+unix://%2Fpath%2Fto%2Fsocket.sock/resource. Distinct scheme allows existing http URL parsing to stay the same. URL encoding reduces read- and type-ability.
  • Combining ideas from the previous two: http+unix://[/path/to/socket.sock]/resource or just http://[/path/to/socket.sock]/resource. (The latter would require using the leading / of the socket path to disambiguate from an IPv6 address.)
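
To make the composite-scheme variant concrete, here is a small Python sketch of how a client might take such a URL apart. It uses only `urllib.parse`; the function name `split_http_unix` is an assumption for illustration, and nothing here is standardized behaviour.

```python
from urllib.parse import unquote, urlsplit


def split_http_unix(url: str) -> tuple:
    """Split an http+unix URL into (socket_path, request_target)."""
    parts = urlsplit(url)
    if parts.scheme != "http+unix":
        raise ValueError(f"not an http+unix URL: {url!r}")
    # Use netloc rather than hostname: hostname is lowercased, which would
    # corrupt a case-sensitive filesystem path.
    socket_path = unquote(parts.netloc)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return socket_path, target


if __name__ == "__main__":
    print(split_http_unix("http+unix://%2Fpath%2Fto%2Fsocket.sock/resource"))
```

The percent-decoding step is the price of putting a filesystem path where a host normally goes.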

References:
Archived Google+ post suggesting the socket-as-port approach:
https://web.archive.org/web/20190321081447/https://plus.google.com/110699958808389605834/posts/DyoJ6W6ufET
My request for this functionality in Firefox, which sent me here:
https://bugzilla.mozilla.org/show_bug.cgi?id=1688774
Some previous discussion that was linked in the Firefox bug:
https://daniel.haxx.se/blog/2008/04/14/http-over-unix-domain-sockets/
https://bugs.chromium.org/p/chromium/issues/detail?id=451721

@annevk
Member

annevk commented Feb 6, 2021

It seems you don't need just addressing for this, but some kind of protocol as well. I recommend using https://wicg.io/ to see if there's interest to turn this into something more concrete.

@rkjnsn
Author

rkjnsn commented Feb 6, 2021

I'm not sure I understand why any additional protocol would be necessary. It's just HTTP over a stream socket. The server accepts connections and speaks HTTP just like it would for a TCP socket. Indeed, I can set up such a server today, and it works fine provided that the client provides a way to specify the socket, e.g., curl --unix-socket /path/to/socket.sock http://localhost/resource.
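
For comparison with the curl invocation, a minimal Python sketch of the same idea: the only change from a TCP client is how the stream socket is opened. The class name `UnixHTTPConnection` and the helper `get` are assumptions for this example, built on the standard `http.client` module.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that connects to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str, host: str = "localhost", timeout: float = 10.0):
        # `host` only ends up in the Host header; no name resolution happens.
        super().__init__(host, timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def get(socket_path: str, target: str) -> tuple:
    """Issue a GET for `target` over the socket and return (status, body)."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", target)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

Calling `get("/path/to/socket.sock", "/resource")` would correspond to the curl command above; everything after `connect()` is ordinary HTTP.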

@avakar

avakar commented Jun 3, 2021

I don't even understand how this is not a thing yet. Especially now that Windows started supporting AF_UNIX sockets natively, it seems to be the best, cross-platform way to connect web and native apps without consuming a TCP port.

@annevk
Member

annevk commented Oct 20, 2021

Let me take a step back, what exactly is the ask from the URL Standard here?

@rkjnsn
Author

rkjnsn commented Oct 23, 2021

The ask is for the URL standard to specify a syntax for referring to a page served via HTTP over a UNIX domain socket. Currently, applications that want to support connecting to an HTTP service over a UNIX domain socket have to pick one of the following three options:

  1. Provide a bespoke mechanism for specifying the server's socket outside of the URL, such as curl's --unix-socket command-line argument.
  2. Accept a custom URL format outside of the URL standard for addressing resources served via HTTP over UNIX domain socket.
  3. Forgo the functionality altogether if 1 is impractical and 2 is undesired.

None of these are ideal. Deciding on a standardized URL syntax allows different implementations to implement the functionality in a common, standards-compliant way.

@annevk
Member

annevk commented Oct 25, 2021

I see, https://wicg.io/ is the place for that. The URL standard defines the generic syntax. If you want to define the syntax for a particular URL scheme as well as behavior, you would do that in something that builds upon the URL standard. E.g., https://fetch.spec.whatwg.org/#data-urls for data: URLs.

@annevk annevk closed this as completed Oct 25, 2021
@rkjnsn
Author

rkjnsn commented Oct 25, 2021

Let me rephrase: the specific ask for the URL standard is to provide an allowance in the URL syntax for specifying a UNIX domain socket, either in lieu of the port (e.g., http://localhost:[/path/to/socket.sock]/resource) or in lieu of the hostname (e.g., http://[/path/to/socket.sock]/resource), both of which are currently invalid according to the URL standard.
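
A quick way to see how an existing generic parser treats these two forms is sketched below. It uses Python's `urllib.parse`, which is not an implementation of the URL standard, so it only illustrates the general problem: the authority ends at the first `/`, leaving an unmatched `[`. The helper name `try_parse` is made up for this example.

```python
from urllib.parse import urlsplit

CANDIDATES = [
    "http://localhost:[/path/to/socket.sock]/resource",
    "http://[/path/to/socket.sock]/resource",
]


def try_parse(url: str) -> str:
    """Describe how urlsplit handles the URL, or why it rejects it."""
    try:
        parts = urlsplit(url)
        # Accessing .port forces validation of the port component as well.
        return f"netloc={parts.netloc!r} port={parts.port!r} path={parts.path!r}"
    except ValueError as exc:
        return f"rejected: {exc}"


if __name__ == "__main__":
    for candidate in CANDIDATES:
        print(candidate, "->", try_parse(candidate))
```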

@annevk
Member

annevk commented Oct 26, 2021

I recommend using something like unix:/path/to/socket.sock?url=http://localhost/resource. We can't change the URL syntax for each new protocol that comes along.

@cyanogilvie

It's the same protocol over a stream socket, just a different address (ie. authority part). Ok, so it's a different protocol in the sense of IP, but so are IPPROTO_IP and IPPROTO_IPV6, and the URL standard doesn't treat those as different. The relevant comparison I think are address families for stream sockets, like AF_INET, AF_INET6 and AF_UNIX. Once the stream socket has been established (as specified by the authority part of the URL), HTTP software shouldn't care or even know how the stream is transported.

Most invented, non-standard approaches for HTTP-over-unix-sockets seem to gravitate to something like a different scheme (since the authority part can't really be disambiguated from a hostname if relative socket paths are allowed from what I can see), like http+unix or https+unix, and then percent-encoding the socket into the authority part, and then everything works naturally from there from what I can see.

I've also seen (and used) enclosing the socket path in [] in the authority part and keeping the scheme as http or https, but I think that namespace clashes with IPv6 style numeric addresses like [::1]:80. RFC 3986 (in section 3.2.2) kind of leaves space for this by anticipating future formats within the [], and providing a version prefix to disambiguate them. Overall I like this approach the best: it extends into the error space so it doesn't change the interpretation of any valid existing URL, lives in an extension space envisioned by the standard, minimally extends just the appropriate part of the standard (authority part), keeps the schemes http and https to mean "this is a resource we talk to this authority using the http(s) protocol for", and so preserves compatibility for software that uses the scheme to know what protocol to speak with the authority over the socket.

@annevk
Member

annevk commented Nov 4, 2021

Changing the syntax of URLs is not really something we're willing to do. That has a substantive cost on the overall ecosystem. The benefits would have to be tremendous.

@michael-o

michael-o commented Nov 4, 2021

Syntax in mod_proxy:

In 2.4.7 and later, support for using a Unix Domain Socket is available by using a target which prepends unix:/path/lis.sock|. For example, to proxy HTTP and target the UDS at /home/www.socket, you would use unix:/home/www.socket|http://localhost/whatever/.
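
A client that wanted to accept the same target notation could split it roughly as follows. This is a sketch of the notation only, not of how mod_proxy itself parses it; the function name `split_uds_proxy_target` is an assumption.

```python
from typing import Optional, Tuple


def split_uds_proxy_target(target: str) -> Tuple[Optional[str], str]:
    """Split 'unix:/path/to.sock|http://host/path' into (socket_path, url).

    Targets without the 'unix:...|' prefix are returned unchanged with a
    socket path of None.
    """
    if target.startswith("unix:") and "|" in target:
        prefix, url = target.split("|", 1)
        return prefix[len("unix:"):], url
    return None, target


if __name__ == "__main__":
    print(split_uds_proxy_target("unix:/home/www.socket|http://localhost/whatever/"))
```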

@karwa
Contributor

karwa commented Nov 12, 2021

The strongest argument I can think of for this is: http(s) URLs have special parsing quirks which don't apply if the scheme is http+unix. So for a perfect 1:1 behaviour match, UDSs would need to use an actual http URL, not a custom scheme (similar to IP addresses).

That said, I'm also not a fan of adding yet another kind of host (file paths). My preference would be to use a combination of:

  • Fake hostname (localhost, example, test and invalid are all reserved and will never be allowed as a TLD, so something like uds.localhost should work), and
  • Socket address in the fragment (HTTP clients should strip it before sending the request anyway)
http://uds.localhost/some/path?some=query#/path/to/socket.sock

This is a perfectly valid HTTP URL, and should be capable of representing any HTTP request target.
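
A sketch of how a client could consume such a URL, using Python's `urllib.parse`; the function name `split_fragment_socket` is an assumption for this example:

```python
from urllib.parse import urlsplit


def split_fragment_socket(url: str) -> tuple:
    """Return (socket_path, host, request_target) for the fragment convention."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # The fragment is never sent to the server, so it is free to carry the
    # socket path; the host and request target stay ordinary HTTP.
    return parts.fragment, parts.hostname, target


if __name__ == "__main__":
    print(split_fragment_socket(
        "http://uds.localhost/some/path?some=query#/path/to/socket.sock"
    ))
```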

Alternatively, you could try to get uds or socket as reserved TLDs, but I'm not sure how you'd go about doing that.

(Note: this would also mean that all UDS URLs have the same origin, although that could be remedied by adding a discriminator to the fake hostname to make your own zones of trust, e.g. 123.uds.localhost)

@rkjnsn
Author

rkjnsn commented Nov 12, 2021

I'm not sure using the fragment is really tenable for these use cases (and local web dev, especially). Many web applications use the fragment for their own purposes in JavaScript, whereas the host (at least in my experience) tends to be handled more opaquely.

What would be the main drawback for allowing additional characters within [] for the host portion of an HTTP URL?

@karwa
Contributor

karwa commented Nov 12, 2021

Ah yes, you're right, it wouldn't work for local web development. I was thinking more about generic HTTP servers.

The main drawbacks IMO are:

  • Complexity. This standard has a hard-enough time trying to document existing browser behaviour without inventing new things. Then again, there is a reasonable counter-argument that URLs shouldn't have to stay frozen while other aspects of technology and the web evolve to meet new use-cases. There is a counter-counter argument that URLs are in a particularly sorry state compared to most other web technologies. Perhaps this is something for the future, once all browsers conform to this standard and things have stabilised a bit?

  • Possible loss of validation for IPv6 addresses. Unless we want to get in to the business of validating local system paths (and I'm quite sure nobody is thrilled by that idea) we would basically have to accept any non-empty string within [] in the host portion of HTTP URLs. How do we know http://[::::foo]/some/path doesn't refer to a valid path on some system somewhere?

@cyanogilvie

cyanogilvie commented Nov 13, 2021

Yes, I think the place for the UDS socket is in the authority portion - that's the bit that has the responsibility for describing the endpoint of the stream socket to talk to for this resource. Putting it elsewhere feels like an abuse and likely to cause unforeseen problems (HTTP client software will certainly have the host portion of the URL available in the portion of the code that establishes the stream socket, but may not have the fragment).

I think the namespace collision with IPv6 literals and syntax validation for UDS paths can be solved by:

  • Reusing the syntax for the path portion of the URI: "/" is a separator, path elements must be percent encoded.
  • Socket paths must be absolute (start with "/" or "~"). This distinguishes them from IPv6 literals, and should be the case anyway (what would a relative path be relative to? No similar relative resolution for hostnames exists in the standard).
  • Possibly using a version prefix as envisioned by RFC 3986, putting it within the syntax anticipated in that standard, something like: http://[v1.uds:/tmp/mysock]/foo/bar.

It's up to the host to decode and translate the path into whatever native scheme that OS uses (just as it is for the path portion of the URI).

For me the motivation for supporting HTTP over UDS goes way beyond web browsers (and I would see that as a minor use case for this) - for better or worse HTTP has become a lingua franca protocol for anything that wants to communicate on the Internet (consider websockets for some of the forces that drive this), and that is increasingly machine to machine. For example: we run an online marketplace that serves about 10 million requests a day over HTTP (excluding static resources offloaded to a CDN), but each of those involves several HTTP interactions with other services to construct the response: Elasticsearch queries, S3 to fetch image sources that are resized, and a whole host of REST services for shipping estimates, geocoding, ratings and reviews, federated authentication providers, etc. So, by volume, the overwhelming majority of HTTP requests our webservers are party to are between them and other servers, and aren't transporting web pages.

As the trend toward microservices and containerization continues this will only increase, and it's particularly there that I see HTTP-over-UDS being useful:

  • Communication over UDS is materially faster and lower latency than over the loopback interface because a lot of the complexities in the network stack can be skipped - packet filtering and transformation, TCP, etc. The loopback interface doesn't have network latency but it still has all these other things. Local sockets (UDS) are more or less just buffers managed by the kernel. This starts to really matter to page response times when generating the page involves many interactions with microservices.
  • The namespace for sockets is hierarchical for UDS rather than flat for ports on localhost, so there is a natural way to scope the namespace for each microservice, and which is self-describing. Compare http://localhost:1234/ with http://[/sockets/session/addrs]/ for the address of a microservice providing the address management service for the current session user.

The other trend is for UIs to be implemented in HTML rather than some OS-native widget set (Android, iOS, GTK, QT, MacOS native controls, Windows native controls, etc), even when the application is entirely local on the user's device. There are very good reasons for this:

  • HTML+Javascript is portable, greatly reducing the cost to develop the application if it has to run across platforms.
  • HTML+Javascript is much richer and more capable than those native widget sets in the types of UIs they can implement.
  • Essentially every developer these days already knows HTML and Javascript.
  • Gone are the days when users expect native OS controls. These days they expect web application style interfaces, since that's the majority of what they're exposed to (gmail, various cloud based office applications, twitter, etc.)

In this use case the hierarchical namespace issue is important and addresses a major downside to this pattern - choosing a port from the flat, system-wide shared namespace (ok, so the listening socket can specify 0 and have the OS pick a random unused port on some systems, but that's a bit ugly). Much nicer to use ~/.sockets/<app>/<pid>, and more discoverable. Another reason to use UDS in this case is that the user for the client side of the socket can be obtained from the OS in a way that only trusts the OS, solving the other issue with this pattern - knowing which user we're interacting with. If these issues were solved by HTTP-over-UDS, do you think something like Prusaslicer would use that (HTML, Javascript, webGL) rather than wxWidgets for its UI portability requirements? That would make porting to mobile devices like tablets much easier too.

Finally, consider things like headless Chrome in an automated CI/CD pipeline - the software managing the tests being run on the deployment candidate version could start a number of headless chrome instances and run tests in parallel, easily addressing the websocket each provides with a UDS path like /tmp/chrome/<pid> rather than somehow managing port assignments.

The tech already exists to make these obvious next steps in application provisioning and inter-service communication happen (even Windows supports Local sockets aka UDS), and the scope of the change for existing HTTP client software should be small and of limited scope (URL parsing, name resolution and stream socket establishment steps) but it can't happen unless there is a standardised way to address these sockets.

@annevk
Member

annevk commented Nov 13, 2021

What exactly is wrong with #577 (comment)? @karwa uds.localhost can resolve locally.

@mnot
Member

mnot commented Nov 15, 2021

Alternatively, you could try to get uds or socket as reserved TLDs, but I'm not sure how you'd go about doing that.

You ask the IETF, just like .onion did. Admittedly, there are some politics involved, but it's possible, and this is a pretty clearly technical use case. The backstop would be to use a subdomain of .arpa.

Personally, I'd go with something like:

http://%2Ftmp%2Fmysock.uds/foo/bar

Yes, the escaping is ugly, but it's much cleaner than overloading IPV6 in URLs. Alternatively, you might be able to get away with:

http://tmp.mysock.uds/foo/bar

@agowa

agowa commented Apr 23, 2022

@mnot any update on this? Was it implemented? Should this ticket be reopened? I'm also interested in this.

@mnot
Member

mnot commented Apr 23, 2022

I just left a comment with some context; I don't know that anything else has happened.

@thx1111

thx1111 commented Jul 13, 2022

I haven't read anything here that seems to justify breaking with the familiar pattern, "<protocol>://<domain>/<filepath>" or injecting a lot of special characters into the URL, or mimicking an IPv6 address. The protocol is simply "http". The domain is right there in the name, "Unix Domain Socket". Like any other top level domain - net, com, org - the domain is simply "unix". I don't know any reason that a web browser application cannot parse the domain from a URL, recognize a nonstandard domain name, and invoke a special handler for a non-network socket. The difficulty seems to be in distinguishing the path to the socket from the path to the resource file.

The "HTTP with socket path as the port" option, above, makes the most sense. And since a special handler must already be invoked for this "unix domain", I expect that colons - ":" - can continue to be used as the "port" separator for the socket path.

Altogether, that suggests a straightforward URL, as in: "http://unix:/var/run/server/ht.socket:/path/to/resource.html".

Is there any reason that those repeating ":/" character sequences would pose a problem in a URL?

This approach would not impose any limitation on the use of ":" in the resource path name, since a "unix domain" must be followed by a socket path, and that path will always be delimited by ":/". Any subsequent colons must then be part of the resource path name.
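
The delimiter rule described here can be sketched in a few lines of Python. The sketch is restricted to the `http` scheme for brevity, and the function name `split_unix_colon_url` is an assumption for this example.

```python
PREFIX = "http://unix:"


def split_unix_colon_url(url: str) -> tuple:
    """Split 'http://unix:/socket/path:/resource/path' into its two paths."""
    if not url.startswith(PREFIX):
        raise ValueError(f"not a unix-domain URL: {url!r}")
    rest = url[len(PREFIX):]
    # The first ':/' after the socket path ends it; any later colons belong
    # to the resource path.
    socket_path, sep, resource = rest.partition(":/")
    if not sep or not socket_path.startswith("/"):
        raise ValueError(f"missing socket path or ':/' delimiter: {url!r}")
    return socket_path, "/" + resource


if __name__ == "__main__":
    print(split_unix_colon_url(
        "http://unix:/var/run/server/ht.socket:/path/to/resource.html"
    ))
```

One consequence visible in the sketch: a socket path that itself contains `:/` would be cut short at that point, so such paths would need escaping or would have to be disallowed.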

And, of course, this URL format still supports specifying any arbitrary protocol, served through a unix domain socket. And there is nothing redundant or misleading in the URL, as would be the case with any format requiring the name "localhost" or involving special parameter passing.

@michael-o

http+uds:///path/to/socket?

@rkjnsn
Author

rkjnsn commented Jul 13, 2022

@michael-o, that doesn't provide any means to specify the resource path, as it is putting the path to the socket where the resource path should go.

@hathiphant

Isn't it simpler to treat a UNIX socket as a connection detail and handle it in the proxy configuration of the connection? This would leave the HTTP URI totally unchanged.

It would probably need a short note detailing the rules for HTTP over UNIX sockets, plus some modifications to Proxy auto-config: a new return type "UNIX" with the path to the UNIX socket as the host value.

A modification like that would be far less intrusive than changing the URL, but would still put the functionality in a specification, which could improve compatibility.

My two cents,

@lcampbel

lcampbel commented Jul 7, 2023

What about

    http://host.example.com.uds.localhost/path/to/socket//path/to/resource
    https://host.example.com.uds.localhost/path/to/socket//path/to/resource

An HTTP client library would strip .uds.localhost from the host portion and pass the remainder in the host header (and SNI, if using TLS). I think most URL parsers would be happy with this. localhost is a reserved toplevel domain (RFC 2606) so this won't ever conflict with a real hostname. It doesn't require introducing new schemes, or any new syntax (such as hijacking the port number field). And using localhost is kind of a nice hint that this actually refers to something on the local host.

@agowa

agowa commented Jul 10, 2023

I still think we need to extend the PEG with a way to specify the lower-layer protocols (I.E., chain multiple schemas together). Especially since HTTP can now also be via UDP and more and more stuff uses HTTP as a transport/tunneling protocol...

Edit: moved proposal to update parsing into separate ticket

@lcampbel, your examples would have compatibility issues in the real world, as some servers have (not quite RFC compliant) usage of double slashes in the URL. I already had the unpleasant opportunity to debug such an issue in an API. Requests just failed without the additional slash. Also, some people use ".localhost" for their localhost development environment. I've seen that with some k8s developers with a clone of the environment running locally and ".localhost" they used for the parts of the web app that would normally have been public (in the prod deployment). Everything below it represented the different subdomains of it (mainly because *.localhost. resolves to 127.0.0.1 and ::1 on almost all systems, regardless of how many subdomains one provides, and without the need for editing the hosts file or deploying a locally running additional DNS resolver with a special zone file)...

@lcampbel

I've never seen subdomains of localhost resolving to anything. It certainly doesn't happen on vanilla macOS or Ubuntu. Sure, you could put an entry foo.localhost in your /etc/hosts, but if you're doing that you could just avoid putting uds.localhost. The problem I have with adding new syntax is that it requires all URL parsers to be updated, which is impractical.

@agowa

agowa commented Jul 10, 2023

@lcampbel, just try it. This is using systemd-resolved (what RedHat and Fedora use, for example):
image

No host file entry for it, it "just works" as long as the tld is "localhost"...

(And btw, I moved the rest of my comment into a new ticket, 778)

@lcampbel

Well, that's what Redhat does, I get it, but on Ubuntu (focal) and macOS (Monterey) I get "Name or service not known" or "Unknown host" respectively. But it really doesn't matter; an HTTP client that supported this proposal would not even bother trying to resolve the name if it ends in '.uds.localhost'; it would just be connecting directly to a local Unix domain socket.

@agowa

agowa commented Jul 10, 2023

Introducing the ".local" issue all over again. But even if we exclude that part, it still has the issue with the //, not to mention that you completely forgot about Windows systems and their paths (yes, windows has AF_UNIX too).

And your syntax wouldn't work for any of these:

  • D:/project/foo.socket
  • \\wsl\home\user\project.socket

And even using the UNIX style syntax Windows supports of /foo/bar.socket (that Windows interprets as CurrentDrive:\foo\bar.socket) won't reliably work, as it will not necessarily use the same "CurrentDrive" for all applications on the same computer; it depends entirely on which drive your current working directory is set to. If it's D:\something then it evaluates to D:\foo\bar.socket, but if it is C:\something then you get C:\foo\bar.socket.

@randomstuff

randomstuff commented Nov 27, 2023

http://host.example.com.uds.localhost/path/to/socket//path/to/resource

What you absolutely don't want is the ability for any web server in the wild to use your browser to issue arbitrary HTTP requests to arbitrary Unix sockets.

It is already quite difficult for people to grasp the notion that LAN-only services and localhost-only services can be attacked by remote web servers (CSRF, DNS rebinding attacks to LAN services or localhost-services). If a web browser were to allow arbitrary websites to issue HTTP requests to arbitrary UNIX sockets, this would open up a wide range of attack opportunities (e.g., using DNS rebinding attacks to attack UNIX-socket-bound Docker servers), including attacks based on protocol confusion.

If you wanted such a feature to be mostly safe, you would have to actively opt-in:

  • either by having the user actively map a Unix-socket into a HTTP domain;
  • or by having a default location for UNIX sockets that are meant to be mapped to a domain name / URI (eg. /run/user/{pid}/published/80/XXX → http://XXX).

Firefox currently allows using a SOCKS proxy over a UNIX socket (including multiple such proxies when using FoxyProxy). It would be possible to have a Unix-bound SOCKS proxy which would resolve some domain names to Unix sockets.

@agowa

agowa commented Dec 2, 2023

@randomstuff just because it is addressable doesn't mean it is reachable. And after all, websites can already contain "file:///" URLs or similar.

@karwa
Contributor

karwa commented Dec 2, 2023

You don't really want to put the UDS path in the URL's path, because somebody could write:

<a href="/help">...</a>

And that would overwrite the path to the UDS, meaning a broken link.
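
The effect can be reproduced with any standard relative-reference resolver. The sketch below uses Python's `urllib.parse.urljoin` with the example URL from the earlier proposal as the base; the helper name `resolve` is an assumption.

```python
from urllib.parse import urljoin

# Base URL in the proposed form: socket path, then '//', then resource path.
BASE = "http://host.example.com.uds.localhost/path/to/socket//path/to/resource"


def resolve(href: str) -> str:
    """Resolve a link found on the page against the page's own URL."""
    return urljoin(BASE, href)


if __name__ == "__main__":
    # The absolute-path reference replaces the whole path, socket included.
    print(resolve("/help"))
```

The resolved URL keeps the host but no longer contains `/path/to/socket`, so a client following the link has no socket to connect to.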

Instead, you really want this to be part of the hostname. Hostnames are intrinsically abstract already, so there is no fundamental reason they can't resolve to a local socket. In other words, @randomstuff 's project is doing the conceptually correct thing by providing a mapping from hostnames to sockets.

And perhaps most importantly, it shows that this need can be met without changing the URL standard.

@thx1111

thx1111 commented Dec 10, 2023

Reading back through this discussion, it has not at all been established that there is a consensus as to "where" the underlying issue should lie, and so, any "solution" offered can appear to simply "miss the point", depending upon your point of view. I find myself back-and-forth about the various approaches suggested, including my own.

I can summarize at least four alternatives proposed here to the issue of, to generalize, "Addressing Unix Domain Sockets".

  1. RFC 3986 "Uniform Resource Identifier (URI): Generic Syntax" must be modified to allow addressing unix domain sockets.

  2. The URI Schemes in BCP 35/RFC 7595 "Guidelines and Registration Procedures for URI Schemes" must define a new URI Scheme and Owner which specifically supports unix domain socket addressing.
    Review here: https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml

  3. The existing http/https schemes defined in RFC 8615 "Well-Known Uniform Resource Identifiers (URIs)" must be expanded to explicitly support addressing unix domain sockets.

  4. Ignore the URI standard RFCs and just write or modify an html display client to support unix domain socket addressing.

Without first saying which approach we are thinking about, the conversation can become kind of silly, since any solution which "works", works. Otherwise, it may be that I both enjoy, and cringe at, "bike shedding" as much as anyone else.

@randomstuff

randomstuff commented Dec 10, 2023

For context about the pitfalls of stuffing/smuggling a Unix socket path in a HTTP URI, the Node.js Requests and got libraries would allow stuffing a Unix domain socket path in a HTTP URI like so: http://unix:/var/run/docker.sock:/containers/json. It turned out this could be exploited by a remote web server to target a local Unix domain socket through a HTTP redirect. In got, this feature is now disabled by default and HTTP redirects to Unix sockets are now disabled.

I would think that the ability to address arbitrary Unix domain sockets in HTTP(S) URIs is fraught with peril. If this were part of the URI standards, client applications and libraries would be expected to implement this feature and this would certainly end up generating a lot of vulnerabilities such as CVE-2022-33987: attacks on arbitrary Unix domain socket application through malicious redirects or more generally through malicious URIs.
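
To illustrate the kind of check a client library would need, here is a hypothetical redirect policy in Python. It is not the fix that got shipped, and both function names are assumptions; it only recognizes the `http://unix:` convention quoted above plus `unix` and `+unix` schemes.

```python
from urllib.parse import urlsplit


def addresses_unix_socket(url: str) -> bool:
    """Heuristic: does this URL use a known Unix-socket addressing convention?"""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme == "unix" or scheme.endswith("+unix"):
        return True
    # 'http://unix:/socket:/path' parses with an authority of exactly 'unix:'.
    return parts.netloc.lower() == "unix:"


def redirect_allowed(origin_url: str, location: str) -> bool:
    """Refuse redirects that move from a network origin to a local socket."""
    if addresses_unix_socket(location) and not addresses_unix_socket(origin_url):
        return False
    return True


if __name__ == "__main__":
    print(redirect_allowed(
        "http://example.com/",
        "http://unix:/var/run/docker.sock:/containers/json",
    ))
```

A real client would need to apply the same scrutiny to every URL that comes from an untrusted source, not just to redirects.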

What might be useful is:

  • the ability for the user to map domain names to Unix domain sockets in client applications (not super user friendly);
  • maybe associating some domain name suffix to system-local bound services which are explicitly designed to be used this way (eg. *.user.alt for user services and *.system.alt for system services),
    • with some way for applications to expose themselves this way,
    • these domain names could be considered as secure contexts.

but this is really outside of the scope of the URL standard.
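
The first option, an explicit user-maintained mapping, could look roughly like this in a client. The table contents, the socket path, and the function name are all hypothetical; the point is that only hosts the user has opted in are ever routed to a socket.

```python
from typing import Dict, Optional

# Hypothetical opt-in table maintained by the user: hostname -> socket path.
SOCKET_MAP: Dict[str, str] = {
    "docs.user.alt": "/run/user/1000/published/docs.sock",
}


def socket_for_host(hostname: str, socket_map: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Return the mapped socket path, or None to fall back to normal resolution."""
    table = SOCKET_MAP if socket_map is None else socket_map
    # Hostnames are case-insensitive and may carry a trailing dot.
    return table.get(hostname.lower().rstrip("."))


if __name__ == "__main__":
    print(socket_for_host("docs.user.alt"))
    print(socket_for_host("example.com"))
```

Because the URL itself stays an ordinary `http://` URL, a remote page cannot name an arbitrary socket path; it can only reach what the table exposes.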

@kevincox

kevincox commented Dec 11, 2023

While you have a good point it is sort of a shame to block UNIX sockets due to this. The same problems exist for local services, LAN servers (like routers) and even cloud VM metadata servers are open to vulnerabilities due to this. Really every redirect target should be carefully considered, and every DNS lookup should have the resulting IP treated with scrutiny. Unfortunately that isn't the world that we live in, developers are careless and many (most?) popular HTTP libraries don't even expose the primitives to do this. I am not aware of even a single library that prevents this by default. In practice things like Origin headers and CORS are used to ensure that requests are coming from the right place and not tricked redirections. These hacks have worked OK, and particularly vulnerable services like browsers are more strict (such as preventing public sites from accessing your router's web UI in most cases).

However while this vulnerability is not specific to UNIX sockets it is maybe wise to avoid adding more surfaces that can be accessed via this common issue.

@kevincox

the ability for the user to map domain names to Unix domain sockets in client applications

Isn't this just security through obscurity? Or is the idea that the service hosting the domain socket needs to opt-in. Presumably because it has some sort of heuristics to block misdirected requests.

@randomstuff

randomstuff commented Dec 11, 2023

Or is the idea that the service hosting the domain socket needs to opt-in.

Yes.

One motivation of OP was access control:

Access control. Even if the service is diligent only to bind to localhost, TCP still allows any (non-sandboxed) process or user on the machine to connect. Any access control has to be implemented by the service itself, which often involves implementing (hopefully with sufficient security) its own password authentication mechanism.

However, in order to increase the security of some local application (reduction of the attack surface, reliance on implicit authentication through UID and filesystem access control), this might end up:

  • increasing the attack surface of already existing services;
  • undermining the implicit authentication through UID and filesystem access control of already existing services (confused deputy problem).

Some opt-in mechanism could mitigate these issues to some extent.
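For readers unfamiliar with the "implicit authentication through UID" mentioned above, here is a minimal Linux-only sketch of how a service can ask the kernel who is on the other end of a Unix domain socket. It relies on `SO_PEERCRED`; the helper names `peer_credentials` and `demo` are illustrative, and other operating systems use different mechanisms that are not shown.

```python
import os
import socket
import struct


def peer_credentials(sock):
    """Return (pid, uid, gid) of the peer of an AF_UNIX socket, or None.

    Linux-specific: uses SO_PEERCRED, which returns a struct of three ints.
    """
    if not hasattr(socket, "SO_PEERCRED"):
        return None
    size = struct.calcsize("3i")
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, size)
    return struct.unpack("3i", raw)


def demo():
    # A connected pair inside one process: the peer is this process itself.
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        return peer_credentials(left)
    finally:
        left.close()
        right.close()
```

This is precisely the property a confused deputy undermines: the kernel reports the UID of the client process that was tricked into connecting, not of whoever supplied the URL.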

@kevincox

While this may increase the attack surface of some services, it will also decrease the attack surface of others, as the original message explains. So it is important to weigh the benefits as well as consider possible mitigations that can make the tradeoffs more favourable.

@thx1111

thx1111 commented Dec 11, 2023

Given the ambiguity in addressing unix domain sockets, I am still inclined to fault the basic RFC 3986. So, here is a brief review, several rants, and another suggestion for unix domain socket addressing, simply using the square bracket "hack".

Assuming the general concept of "Uniform Resource Identifier" from Section 1.1.3., the basic structure is defined in Section 3 as having 5 components: scheme, authority, path, query, and fragment. First off, then, what type of URI component is a unix domain socket (UDS) address?

The original context here is "HTTP servers", and "http" is, itself, a type of "scheme". So, UDS as "scheme" is not my first choice.

Now, RFC 3986 uses the term "resource" without much constraint, saying 'This specification does not limit the scope of what might be a resource; rather, the term "resource" is used in a general sense for whatever might be identified by a URI.' Effectively, a "resource" is whatever the user wants it to be. Is a UDS a "resource" itself? For the purpose here, "no". The "resource" implied by an HTTP server is some other specific data delivered using HTTP.

Then, is a UDS a type of "path", "query", or "fragment"?

From Section 3.3, "The path component contains data, usually organized in hierarchical form, that, along with data in the non-hierarchical query component (Section 3.4), serves to identify a resource within the scope of the URI's scheme and naming authority (if any)." Since the UDS is not the "resource", and, since the "path" identifies a "resource", then the UDS cannot be a "path".

Similarly, from Sections 3.4. Query and 3.5 Fragment, both of these components are also references to the "resource". So the UDS is also not either a "query" or a "fragment".

And that leads to the inference that the UDS must be a kind of "authority". RFC 3986 actually subdivides the "authority" component itself into three parts, in Section 3.2.:

 authority   = [ userinfo "@" ] host [ ":" port ]

And here, the same analysis can be applied. Is the UDS a type of "userinfo"? Section 3.2.1. says, "The userinfo subcomponent may consist of a user name and, optionally, scheme-specific information about how to gain authorization to access the resource." Hmm - "scheme-specific information about how to gain authorization to access the resource" - "how to gain authorization". Does the UDS tell "how to gain authorization"? Sort of - maybe - not really - I'd say "no".

Is the UDS a type of "host"? From Section 3.2.2., "The host subcomponent of authority is identified by an IP literal encapsulated within square brackets, an IPv4 address in dotted- decimal form, or a registered name." Is, then, the UDS a type of "IP literal", "IPv4 address", or a "registered name"? Hmm - what is an "IP literal"? Again, from Section 3.2.2.:

 IP-literal = "[" ( IPv6address / IPvFuture  ) "]"

Since a UDS is not any of an "IPv6address / IPvFuture", an "IPv4 address", or a "registered name", then "no", a UDS is also not any type of "host".

And then, using RFC 3986, there is only one interpretation remaining. Is the UDS a type of "port"? From Section 3.2.3. Port:

 The port subcomponent of authority is designated by an optional port number in decimal following the
 host and delimited from it by a single colon (":") character.

  port        = *DIGIT

Well, clearly, and as has been mentioned previously in this discussion, the UDS is not a "DIGIT". And here is where I find fault with RFC 3986, in its limited scope when defining "port". Except that, Section 3.2.3. goes on to say, "The type of port designated by the port number (e.g., TCP, UDP, SCTP) is defined by the URI scheme." And that statement suggests asking "What sort of Communication Protocol is UDS?" Of course a UDS is not itself a kind of communication protocol, but the relationship should become apparent. It may be more illuminating to ask the converse, "What sort of Sockets are TCP, UDP, and SCTP?" And then, the Unix - in this case Linux - man pages offer some guidance.

 man 7 tcp:     tcp_socket = socket(AF_INET, SOCK_STREAM, 0);
 man 7 udp:     udp_socket = socket(AF_INET, SOCK_DGRAM, 0);
 man 7 sctp:    sctp_socket = socket(PF_INET, SOCK_STREAM, IPPROTO_SCTP);
                sctp_socket = socket(PF_INET, SOCK_SEQPACKET, IPPROTO_SCTP);

And generally, "What is a 'socket'"? In part:

 man 2 socket:
        Name            Purpose                         Man page
        AF_UNIX         Local communication             unix(7)
        AF_LOCAL        Synonym for AF_UNIX
        AF_INET         IPv4 Internet protocols         ip(7)

 HISTORY
        The  manifest  constants  used under 4.x BSD for protocol families are PF_UNIX, PF_INET, and so
        on, while AF_UNIX, AF_INET, and so on are used for address families.  However, already the BSD
        man page promises: "The protocol family generally is the same as the address family", and
        subsequent standards use  AF_*  everywhere.

and then:

 man 7 unix:    unix_socket = socket(AF_UNIX, type, 0);

Here is my first rant about RFC 3986. The "port" component of the defined URI has presumed an Address Family, here implying AF_INET exclusively, along with what is a merely incidental association with a port "number". There is no explanation or justification given for this presumption.

Alternatively, it might be supposed that this presumption of an Address Family is an erroneous interpretation by the reader of RFC 3986. It may instead be supposed that the "port" component of the URI is simply a general concept to be associated with any Address Family which might be included from the list given from man(2)socket.

And so, I believe that this is the interpretation, while not "official", yet, that must be taken with RFC 3986.

Then, "What is the 'port' subcomponent of authority of an Address Family AF_UNIX socket?"

Here, man(7)unix tells us, "Traditionally, UNIX domain sockets can be either unnamed, or bound to a filesystem pathname (marked as being of type socket)." In our case, we are looking for a URI, so "unnamed" is not useful. Instead, the man page offers "a filesystem pathname". That seems clear enough.

Therefore, an RFC 3986 URI "port" for an AF_UNIX socket might also be interpreted as simply "a filesystem pathname", instead of exclusively as a number.
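To make the "filesystem pathname in place of a port number" idea concrete, here is a minimal sketch of HTTP spoken over an AF_UNIX socket using only the Python standard library. The class names `UnixHTTPConnection` and `HelloHandler`, the function `fetch_over_unix_socket`, and the socket file name `demo.sock` are all illustrative assumptions; the point is only that the socket path takes the role that host and port play for TCP.

```python
import http.client
import http.server
import os
import socket
import socketserver
import tempfile
import threading


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP client connection that dials an AF_UNIX socket instead of TCP."""

    def __init__(self, socket_path, timeout=5.0):
        # "localhost" only fills the Host header; it is never resolved.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


class HelloHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = f"hello from {self.path}".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # AF_UNIX peers have no (host, port) tuple, so skip the access log.
        pass


def fetch_over_unix_socket(resource_path):
    """Serve one GET over a temporary Unix socket and return (status, body)."""
    with tempfile.TemporaryDirectory() as tmp:
        socket_path = os.path.join(tmp, "demo.sock")
        server = socketserver.UnixStreamServer(socket_path, HelloHandler)
        thread = threading.Thread(target=server.serve_forever, daemon=True)
        thread.start()
        try:
            conn = UnixHTTPConnection(socket_path)
            conn.request("GET", resource_path)
            response = conn.getresponse()
            result = (response.status, response.read().decode())
            conn.close()
            return result
        finally:
            server.shutdown()
            server.server_close()
```

Note that the client has to be handed the socket path out of band (here as a constructor argument), which is the gap that the URL syntax proposals in this thread try to close.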

Allowing that, then the remaining problem only involves appropriate delimiters, to allow correctly parsing the resulting URI for the AF_UNIX "port".

Referring again to Section 2.2.:

      reserved    = gen-delims / sub-delims

      gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

      sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="

Incidentally, it may be noted that this RFC 3986 list of delimiters is missing the percent "%", from Section 2.1 Percent-Encoding, and the set of White Space characters generally. The reader is now well into the realm of "inferring", "guessing", and "interpreting", instead of specifically "defining".

Here is my second rant about RFC 3986, related to the use of delimiters. The Section 3. URI syntax explicitly defines the ":" as separating the "scheme" from the "authority". Subsequently, in Section 3.2., it says 'The authority component is preceded by a double slash ("//") and is terminated by the next slash ("/"), question mark ("?"), or number sign ("#") character, or by the end of the URI.' Taken together, this double slash actually provides no information whatsoever in the URI and only serves to "poison" the parsing of the URI, by requiring the parser to distinguish potentially between ":///...", "://...", and ":/...". For instance, the "file" scheme, RFC 8089, supports optionally leaving out this useless "//" altogether. RFC 3986 offers no explanation or justification for this use of the double slash "//". The delimiter might as well have been defined explicitly as "://". This makes any use of the slash "/" as a delimiter in the URI potentially problematic, where it is also used as an essential component of any unix "filesystem pathname", when referring to the proposed UDS AF_UNIX "port", as well as, already, referring to an actual "resource" by pathname.

A third rant regards Section 3.2.2 Host, which says:

 A host identified by an Internet Protocol literal address, version 6 [RFC3513] or later, is
 distinguished by enclosing the IP literal within square brackets ("[" and "]").  This is the only place
 where square bracket characters are allowed in the URI syntax.

The only reason that these square brackets are needed is because of the repeated and overloaded use of the colon ":" as a delimiter in the "authority", in Section 3.2 preceding the "port", and in Section 3.2.1, potentially subdividing the "userinfo". Considering that RFC 3513 defines the use of colon ":" as the field delimiter in an IPv6 address, this should have glaringly suggested that the same ":" would be a bad choice for a delimiter in the RFC 3986 "authority" component and subcomponents of the URI. And there are plenty of alternative characters to choose, from the small ASCII character set, for use as delimiters in the "authority".

The use of the square brackets, then, is a "hack", consequent of a bad choice for delimiter in the "authority" component of the URI. Be that as it may, suppose that the prohibition "This is the only place where square bracket characters are allowed in the URI syntax", is ignored. Then, this same "hack" can be applied equally to the unfortunate choice of the slash "/" as a delimiter within the URI syntax with respect to the "port" subcomponent of the "authority", as with the "host" subcomponent.

I propose now another alternative to addressing unix domain sockets. By example, using the square bracket "hack", the result would allow, for instance, all of:

http://:[/path/to/socket]/path/to/resource.html?...#...
http://localhost:[/path/to/socket]/path/to/resource.html...
http://[::1]:[/path/to/socket]/path/to/resource.html...
http://user:password@[::1]:[/path/to/socket]/path/to/resource.html...
http://unix:[/path/to/socket]/path/to/resource.html...

All of these examples otherwise strictly follow the RFC 3986 URI syntax.
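As a rough sketch of what a client would need in order to recognize the bracketed form above, here is a small hand-written parser. The regular expression, the `UdsUrl` tuple, and the function `parse_bracket_uds_url` are hypothetical and exist only for illustration; no standard or library defines this syntax.

```python
import re
from typing import NamedTuple, Optional


class UdsUrl(NamedTuple):
    scheme: str
    userinfo: Optional[str]
    host: str
    socket_path: str
    resource: str


# Hypothetical grammar: scheme "://" [userinfo "@"] host ":[" socket-path "]" resource
_BRACKET_UDS = re.compile(
    r"^(?P<scheme>[A-Za-z][A-Za-z0-9+.-]*)://"
    r"(?:(?P<userinfo>[^@/\[\]]*)@)?"
    r"(?P<host>\[[0-9A-Fa-f:.]+\]|[^:/\[\]@]*)"
    r":\[(?P<socket>/[^\]]*)\]"
    r"(?P<resource>/\S*)?$"
)


def parse_bracket_uds_url(url: str) -> Optional[UdsUrl]:
    """Parse the proposed bracketed socket syntax, or return None if it is absent."""
    match = _BRACKET_UDS.match(url)
    if match is None:
        return None
    return UdsUrl(
        scheme=match.group("scheme"),
        userinfo=match.group("userinfo"),
        host=match.group("host"),
        socket_path=match.group("socket"),
        resource=match.group("resource") or "",
    )
```

Ordinary URLs with a numeric port do not match and return `None`, so a client could try this parser first and fall back to its normal URL handling.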

That is the least intrusive "hack" to UDS addressing and merely extends an existing URI "hack". A "cleaner" revision to RFC 3986 would be to eliminate the use of either the colon ":" or the slash "/" as delimiters in the URI syntax delineating its components and subcomponents, except for the initial ":" separating the "scheme" and "authority". There are 11 other "sub-delims" defined in RFC 3986 that seem perfectly usable as delimiters in the URI "authority", which would obviate the need for using these square bracket "hacks" completely.

With reference to previous remarks about security issues, it may be noted that man(7)unix describes AF_UNIX as supporting communication "between processes on the same machine", so there would be no "remote access" possible, despite the http/https "scheme", if that constraint were followed. And, since the UDS "port" is just a Unix "filesystem pathname", there are many existing security measures available.

On the other hand, this suggested UDS AF_UNIX "port" addressing clearly does lend itself to replacing "localhost" with "some-remote-host", to access some UDS on, literally, a remote host. But then, any http/https "server" will be providing its own security measures, should it allow UDS addressing at all, so that's a different issue and not really a problem here. This does introduce another concept, access to a UDS by a local http/https server, as opposed to UDS access only by a local html display client.

There is still the question of whether the http/https schemes would need to be formally updated to acknowledge any kind of UDS AF_UNIX "port" addressing. Reading RFC 9110, Sections 4.2.1. http URI Scheme and 4.2.2. https URI Scheme:

        The origin server for an "http[/https]" URI is identified by the authority component, which
        includes a host identifier ([URI], Section 3.2.2) and optional port number ([URI], Section
        3.2.3).

By my reading, "no". The http/https schemes simply refer to the RFC 3986 URI "optional port number" definition, and would therefore follow any update to RFC 3986 itself.

The much more difficult issue remains with any html display client, which must be taught to recognize any kind of UDS AF_UNIX "port" addressing. Again, strictly, that is a separate issue. But this does point out that the proposal here implies that there are two distinct "solution" arenas to confront: first, RFC 3986 itself, and second, the various de facto standard html display clients extant.

The Node.js security issue mentioned by @randomstuff is - well - a Node.js security issue, as was mentioned. It's not a server security issue and has nothing to do with UDS AF_UNIX "port" addressing per se. Of course, that also doesn't mean that html display client security issues go away. It's just a separate problem - though, it's still a problem. It is interesting that this raises the question of security in the "reverse" direction, from a remote "server" potentially accessing a local "client resource", through a UDS.

That is not something inherent in the original concept of http client/server communication, but a consequence of allowing the "client" to potentially act, itself, as a kind of "server", using some client facility, as with javascript, to access a local resource. The security model, then, requires simply that the client be smart enough not to do "anything stupid" at the behest of the server. Ha!

@mnot
Member

mnot commented Dec 11, 2023

Lots of different proposals have been made above:

  • Changing the URL syntax
  • Adding a new DNS TLD
  • Appending a suffix to the URL scheme
  • Defining a new URL scheme

Changing the URL syntax requires coming up with a solution for all URLs, not just HTTP. Backwards compatibility needs to be considered for a very large ecosystem, and incremental deployment needs to be considered. As Anne said above, these factors raise the bar considerably for any proposal, and so should be a last resort (there's currently an effort by IPv6 people to do a similar thing, and it's not going well for these reasons).

Creating a new TLD for one protocol isn't good architecture, and a lot of people are going to push back on it. Again, a proposal in this area is likely to hit friction from other, unrelated communities (in this case, DNS).

Appending a suffix to the URL scheme implies that the suffix makes sense for other URL schemes. This means that wider review and discussion will need to take place to get it adopted.

That makes defining a new URL scheme the approach that's most likely to succeed. Such a scheme could define itself to use an authority that is not grounded in DNS, so it could be something like:

httpu://tmp.mysock/path/to/resource?query&string

Defining it as a new scheme would also provide an opportunity to answer a lot of questions like "is HTTP/1 or HTTP/2 used"? "does it use TLS"? and so on.
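One practical consequence of the new-scheme route is that the example URL already fits the generic syntax, so existing generic parsers can split it without changes. A minimal sketch with Python's `urllib.parse` follows; the function name `split_example` is illustrative, and how an authority such as `tmp.mysock` would map to an actual socket is left open here, as it is in the comment above.

```python
from urllib.parse import urlsplit


def split_example(url: str):
    """Split a URL with the generic parser and return its main components."""
    parts = urlsplit(url)
    return parts.scheme, parts.netloc, parts.path, parts.query
```

Calling `split_example("httpu://tmp.mysock/path/to/resource?query&string")` yields the scheme `httpu`, the authority `tmp.mysock`, the path `/path/to/resource`, and the query `query&string`.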

But that's just my opinion.

If there's interest in solving this problem, I'd suggest that someone write a document outlining a proposal and bring it to the IETF HTTP WG - there is a larger diversity of HTTP implementers represented there that can provide feedback.

@thx1111

thx1111 commented Dec 12, 2023

@mnot:

Creating a new TLD for one protocol isn't good architecture, and a lot of people are going to push back on it.

On reflection, I'm going to totally agree with that.

Changing the URL syntax requires coming up with a solution for all URLs, not just HTTP.
...
That makes defining a new URL scheme the approach that's most likely to succeed.

There is nothing in any of my, or several other, proposals that is specific to only the http/https "schemes", as the term is defined in RFC 3986. Again, RFC 8820, Section 2.1, "URI Schemes", strongly discourages the introduction of new "schemes".

I have suggested three alternatives for - to put it generally - Address Family "port" addressing.

Extending the overloaded use of the colon ":" delimiter:

 http://:/path/to/socket:/path/to/resource.html...
 http://user@[::1]:/path/to/socket:/path/to/resource.html...

Extending the square bracket hack:

 http://:[/path/to/socket]/path/to/resource.html...
 http://user@[::1]:[/path/to/socket]/path/to/resource.html...

Using alternate delimiters, eliminating the double slash "//", the square bracket hack "["..."]", and the overloaded use of the colon ":" delimiter, as for instance:

 http:&/path/to/socket+/path/to/resource.html...
 http:user@::1&/path/to/socket+/path/to/resource.html...

More generally, any specific delimiter between RFC 3986 "authority" and "path" would solve the URI issue raised here. To illustrate, where RFC 3986 has defined:

      URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

      hier-part   = "//" authority path-abempty
                  / path-absolute
                  / path-rootless
                  / path-empty

      authority   = [ userinfo "@" ] host [ ":" port ]

This would instead become:

      URI = scheme ":" [ userinfo "@" ] host [ ":" port ] "your-favorite-delimiter-here" path-something [ "?" query ] [ "#" fragment ]

The essential problem for Address Family "port" addressing comes down to RFC 3986 failing to just define a specific delimiter between its "authority" and "path" components, or, stating this another way, failing to define a specific delimiter which precedes its "path" component. And then, RFC 3986 struggles desperately to overcome this failure in Section 3.3. Path, explaining "The ABNF requires five separate rules to disambiguate these cases, only one of which will match the path substring within a given URI reference."
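To see where a generic parser currently draws the authority/path boundary, here is a small sketch built on the reference regular expression given in RFC 3986, Appendix B. The function name `split_uri` and the returned dictionary layout are illustrative choices.

```python
import re

# Reference regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(uri: str) -> dict:
    """Split a URI reference into the five generic components."""
    match = URI_RE.match(uri)  # every group is optional, so this always matches
    return {
        "scheme": match.group(2),
        "authority": match.group(4),
        "path": match.group(5),
        "query": match.group(7),
        "fragment": match.group(9),
    }
```

Applied to the colon form `http://:/path/to/socket:/path/to/resource.html`, this yields an authority of just `:` and a path of `/path/to/socket:/path/to/resource.html`, because the first slash after `//` ends the authority. The socket path and the resource path land in the same component, which is the ambiguity at issue.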

Section 3.3. even provides an unconvincing example of "path" while trying to "paper-over" this failure:

   A path consists of a sequence of path segments separated by a slash
   ("/") character.  A path is always defined for a URI, though the
   defined path may be empty (zero length).  Use of the slash character
   to indicate hierarchy is only required when a URI will be used as the
   context for relative references.  For example, the URI
   <mailto:fred@example.com> has a path of "fred@example.com", whereas
   the URI <foo://info.example.com?fred> has an empty path.

Why try to "shoehorn" mailto:fred@example.com into an example of "path"? "fred@example.com" looks like a perfectly good example of 'userinfo "@" host' to me. There is no need to call it something else, attempting to justify the missing useless double slash "//", which otherwise requires "mailto://fred@example.com".

@mnot
Member

mnot commented Dec 12, 2023

Again, RFC 8820, Section 2.1, "URI Schemes", strongly discourages the introduction of new "schemes".

I wrote that RFC. That is not what Section 2.1 says.

@thx1111

thx1111 commented Dec 12, 2023

Hmm - copying the text:
https://www.rfc-editor.org/rfc/rfc8820

Abstract
...  While it is common for schemes to further
delegate their substructure to the URI's owner, publishing independent standards that mandate particular
forms of substructure in URIs is often problematic.
...
2.1. URI Schemes
...
A Specification that defines substructure for URI schemes overall (e.g., a prefix or suffix for URI scheme
names) MUST do so by modifying [BCP35] (an exceptional circumstance).

and, https://www.rfc-editor.org/info/bcp35

Abstract
This document updates the guidelines and recommendations, as well as
the IANA registration processes, for the definition of Uniform
Resource Identifier (URI) schemes. It obsoletes RFC 4395.

Then, by "exceptional circumstance", you meant modifying, literally, the document BCP35 itself, and not the resulting list of registered "schemes" referencing BCP35? I stand corrected.

Still, there is the problem of modifying existing, or creating new, applications able to utilize any particular scheme. I don't expect that my web browser actually supports the currently 374 different registered schemes available. In fact, the trend has been for, for instance, web browsers to drop support for less commonly used schemes - no more gopher, ftp, or mailto - with some functionality being replaced by specialized scheme applications or by "groupware" suites.

I still don't agree that defining and registering a new scheme, exclusively to support html rendering from a local unix domain socket, is a good idea. Rather, that use case does serve to illuminate a deeper systemic fault in RFC 3986.

I did rather like gopher, though, ...

@agowa

agowa commented Dec 14, 2023

@mnot: A while ago I already wrote to the IETF mailing lists about such a change, but they just forwarded me here. I don't remember all the details, as the whole thing started ages ago (ok, probably more like about one year), but I could try to look for these related mails.

You already have seen my initial suggestion in another ticket?
#778 (comment)

It would be backwards compatible by allowing for default values to be omitted. It would work with everything that currently uses the URL Schema (in a standards compliant way, at least). And it would also allow for the very verbose way of specifying all the protocols down to the wire....

@mnot
Member

mnot commented Dec 20, 2023

@thx1111:

Then, by "exceptional circumstance", you meant modifying, literally, the document BCP35 itself, and not the resulting list of registered "schemes" referencing BCP35? I stand corrected.

Understand that RFC 8820 is best current practice for applications that use HTTP (what some people call "HTTP APIs" or "REST APIs") -- it's saying that it's exceptional that one of them would require a new scheme.

Still, there is the problem of modifying existing, or creating new, applications able to utilize any particular scheme. I don't expect that my web browser actually supports the currently 374 different registered schemes available. In fact, the trend has been for, for instance, web browsers to drop support for less commonly used schemes - no more gopher, ftp, or mailto - with some functionality being replaced by specialized scheme applications or by "groupware" suites.

Browsers are going to have to change if they want to support anything that happens here, so that isn't a decisive factor regarding syntax.

I still don't agree that defining and registering a new scheme, exclusively to support html rendering from a local unix domain socket, is a good idea.

HTTP isn't just for HTML.

To be clear, I don't think a new scheme is the only way to do this; it's just more straightforward than other suggestions so far.

@agowa:

You already have seen my initial suggestion in another ticket?
#778 (comment)

I hadn't, but that seems like a lot of work (and abstraction) to get to the goals here.

Normally, protocols can negotiate transitions like this (see eg the evolution from HTTP 1-3). What's different here is that unix domain sockets have a completely different authority, and a subtly different transport (as opposed to TCP).
