Support Unix domain sockets as -a listen addresses and backend addresses #2371

slimhazard · 2017-07-16T20:22:21Z

This is a partial implementation of the ideas proposed in VIP17.

In VIP17 but not implemented yet:

UDS addresses for varnishadm and the CLI (-T and -M args).
param uds_path as a search path for relative paths in backend definitions.
- At the moment, all paths for a UDS are required to begin with a /.
Obtaining credentials of the peer over the UDS (uid, gid) and making them available via VMOD std.
Named listen addresses (@Dridi will be working on that separately).

Update: At the end of VIP17 (section "Implementation"), I've written down some proposals for how these features could be implemented -- -T and -M, uds_path, and additions to VMOD std.

The big picture: -a addresses can be specified as:

-a /path/to/uds[,[PROXY,][user=<user>,][group=<group>,][mode=<mode>]]

If you specify any of user, group and/or mode (mode in octal), then the path will be created with those permissions. (The sub-args can appear in any order.)

A backend definition may have:

backend foo { .path = "/path/to/uds"; }

.path XOR .host must be specified for a backend.

You can also use -b /path/to/uds to define a backend on the command line.

This code has been tested on:

Linux 3.16 (Debian jessie 3.16.43-2+deb8u2) and 4.9 (Debian jessie 8.9, 4.9.30-2+deb9u2~bpo8+1)
FreeBSD 11.0-RELEASE-p9
SunOS 5.11 (joyent_20160317T000621Z)

There are quite a few details and issues worth considering for a review. So as to break it up into small pieces, I'll add more comments about those issues (and point out some interesting quirks about UDSen that I discovered along the way).

Update: Branches with the fork rebased to Varnish 5.2.0 and 5.2.1 live here:

RPMs for el7 for the two forks live at: https://packagecloud.io/uplex/varnish-uds

slimhazard · 2017-07-16T20:48:58Z

sockaddr_un storage management, VSAs and session workspace

A VSA now looks like this:

struct suckaddr {
        unsigned                        magic;
#define SUCKADDR_MAGIC                  0x4b1e9335
        sa_family_t                     sa_family;
        union {
                struct sockaddr_in              sa4;
                struct sockaddr_in6             sa6;
                const struct sockaddr_un        *sau;
        }                               sa;
};

So the UDS member is a pointer to sockaddr_un and the creator of the VSA must pass in the pointer. This has the consequence that the typeless sockaddr, which had been used to get the sa_family, cannot be a member of the union, so sa_family is pulled out as a separate field. VSAs don't get any larger -- on my machine they're 40 bytes.

VSA_Build and VSA_Malloc now have an optional argument that may be NULL or a pointer to a sockaddr_un, which must be passed in by the caller for a UDS. Neither of these allocate space for sockaddr_un, but VSA_Clone clones a whole VSA, including space for a sockaddr_un if it's PF_UNIX.

When Varnish starts, mgt_acceptor.c allocates space for sockaddr_un for any UDS listen addresses, and the VSAs in the pool of struct listen_sock point to them. As before, these are handed off to the child via heritage.socks, which lives for the lifetime of Varnish and never has to be destroyed. So it just dies, along with the sockaddr_un storage, when the management process exits.

When a session is created for a connection accepted via UDS, its VSA is duplicated as the LOCAL session attribute, but it points back to the sockaddr_un in the heritage.socks pool. The other three session attributes all point to LOCAL. A UDS will never have different addresses for LOCAL and REMOTE (it's the same path), and this implementation is not supporting UDS addresses via PROXY (for reasons I'll explain below), so CLIENT and SERVER are also the same.

So we actually use less session workspace for UDSen -- one VSA for LOCAL, pointing to the sockaddr_un in heritage.socks is all we need. The other three session attributes for addresses point to it, and we use no session workspace for the IP and port strings.

slimhazard · 2017-07-16T21:05:23Z

Backends and TCP pools

struct backend and struct tcp_pool now each have these fields, in addition to the VSA fields for IPv4 and IPv6 VSAs:

        struct suckaddr         *uds;
        void                    *uds_sockaddr;

The second pointer points to suckaddr_un, which the backend and TCP pool each allocate, and later destroy. IMO this does not significantly increase the storage burden, because each backend and TCP pool has up to two sockaddrs, as was the case before with IPv4 and IPv6 (other than the fact that UDS sockaddrs are larger).

struct vrt_backend now has a field char *path. VCC does not have to do the dance of resolving IP addresses and the binary VSA, so it defers the creation of the VSA for a UDS, and of the storage for sockaddr_un, to VRT_new_backend. The VCC compiler checks the path -- it must exist and be a socket, otherwise Varnish won't be able to connect to it.

VMOD debug has been extended to support dynamic creation of backends that listen at UDSen via VRT_new_backend, and VTC tests for that have been added.

Edit: uds_sockaddr had been const void * in a previous version, but --enable-developer-warnings and flexelint pointed out that the pointer cannot be const, since for backends it must be possible to free the pointer. Later commits in the PR fix that.

slimhazard · 2017-07-16T21:21:43Z

VTCP_*name

These are called about 30 times in the code, and are conceived for IPs/ports, so it's a chore to get them right for UDS. The solution here is:

VTCP_ADDRBUFSIZE is large enough to hold a UDS path.
If it's a UDS, the path gets placed into the addr field, and "-" goes into port.

But that by itself doesn't solve all of the problems, because getsockname and getpeername may return an unnamed UDS, meaning that there's garbage in the sockaddr_un.sun_path field, depending on whether the socket is bound or connected. If it's bound, then getpeername (used by VTCP_hisname) returns an unnamed UDS, and if connected, then getsockname (used by VTCP_myname) returns an unnamed UDS.

So the code has to decide, on a case-by-case basis, whether we're dealing with a UDS, and whether it's bound or connected, and do the right thing. That in turn requires knowing that it's a UDS, which isn't always possible (the VTCP_*names take a socket fd, not a VSA).

slimhazard · 2017-07-16T21:38:20Z

VSL outputs

The output formats are consistent with formats used for IP addresses -- there is always the same number of non-whitespace positions in the output. But some of those positions may be - when the listen/backend address is UDS:

ReqStart /path/to/uds -
- For IPs: ReqStart <ipaddr> <port>
SessOpen - - </path/to/uds or named -a arg> - - <timestamp> <fd>
- IP: SessOpen <local ip> <local port> <-a addr> <remote ip> <remote port> <timestamp> <fd>
BackendStart /path/to/uds -
- IP: BackendStart <backend addr> <backend port>
BackendOpen <fd> <backend display name> /path/to/uds - - - -
- IP: BackendOpen <fd> <backend display name> <remote ip> <remote port> <local ip> <local port>

ReqStart and BackendStart are used by varnishncsa for the %h and %r formatters. There's a test (u00008.vtc) to check that these come out right -- %h produces the path, and %r comes out right for reasons I'll explain below.

slimhazard · 2017-07-16T21:49:33Z

-a path creation and deletion

bind creates the UDS path, and fails with EADDRINUSE if it already exists. This implementation calls unlink before bind, ignores ENOENT and leaves the path in the file system on exit.

Precedents:

haproxy does the same thing.
nginx does the opposite -- it unlinks UDSen that it created on exit, and if the path is there at startup, then bind fails and nginx fails to start.

I'm a bit of wary of deleting an existing path at startup -- someone might think they're using it for something, and Varnish comes along with root permissions and deletes it. But if we were to do it the nginx way, then:

We'd have to install handlers for SIGTERM, SIGINT and atexit in mgt to clean up the paths on exit.
We might still fail to clean up the paths if Varnish is stopped irregularly.
systemd love: when systemctl stop or restart are called, systemd sends SIGQUIT to stop the process. So we'd have to consider a handler for SIGQUIT as well.

That last is the reason why nginx fails to restart when you use systemctl -- it doesn't have a SIGQUIT handler, doesn't clean up the UDSen, and then on systemctl start or restart nginx fails to start if it has to bind to a UDS that it didn't remove on exit.

slimhazard · 2017-07-16T22:05:24Z

X-Forwarded-For and backend Host headers

This implementation leaves X-F-F unchanged if the client connection came over a UDS. Maybe we should add something like localhost. I wouldn't add the path, we don't need to give out information about the file system layout.

Precedents: with haproxy and nginx, you get whatever you get when you have a configuration like set_header X-Forwarded-For $x-f-f-variable. I haven't checked yet what that looks like exactly for a UDS.

In backend definitions thus far, if .hosthdr is not set, then Host in backend requests gets set to whatever was in .host for the backend. But for UDS backends, you must have .path and you MAY NOT set .host in the definition. So in this implementation, Host gets set to localhost in requests to a UDS backend if the .hosthdr was not set.

Maybe we should require .hosthdr for UDS backend definitions. If not, we should stongly recommend setting it in the documentation.

Because of the fact that Host gets set to localhost in this situation, the varnishncsa %r formatter renders a request line with http://localhost/url, so it turns out reasonably.

slimhazard · 2017-07-16T22:29:06Z

vtc

To test Varnish with a UDS listen address, call it with -arg "-a /path/to/uds". The implicit -a argument with port 0 (bind to any open port) is not opened in this case.

The -listen and -connect commands have been adapted to work with UDSen -- if the string begins with /, then a UDS address is used. For server -listen "/path/to/uds", the umask is temporarily set to 0 before bind, to avoid permission issues accessing the socket. -vcl+backend detects which servers are listening at UDS, and constructs appropriate backend definitions.

The easy way to do this is to use ${tmpdir} as the location of the socket, so the new tests look something like this:

server s1 -listen "${tmpdir}/s1.sock" { ... }

varnish v1 -arg "-a ${tmpdir}/v1.sock" -vcl+backend { ... }

client c1 -connect "${tmpdir}/v1.sock" { ... }

If a server is listening at UDS, then both ${s1_addr} and ${s1_sock} have the path, and ${s1_port} is "-". When a Varnish instance listens at UDS, ${v1_addr} has the UDS path, ${v1_port} is "-", and ${v1_sock} is /path/to/uds -.

If a client or server is using a UDS address, then the vtc variables remote.ip and remote.port are NULL (match <undef>). A new variable remote.path has been added, which holds a UDS path or NULL (matching <undef>) if an IP address is in use.

However, I noticed while doing that that remote.ip and remote.port are not used in any of our VTC tests. Maybe we should remove them, and not introduce remote.path.

slimhazard · 2017-07-16T22:33:34Z

Socket options

The table of socket options in cache_acceptor.c now has a flag iponly. When this flag is true, the option is not set for a UDS socket.

Socket options not set for UDS are currently exactly the same as those with level is IPPROTO_TCP, meaning that TCP_NODELAY and the TCP_KEEP* options are not set.

slimhazard · 2017-07-16T22:39:17Z

PROXY protocol

The PR has code to send UDS addresses to a backend for PROXYv2, but there are no tests for that yet.

At this point I don't think we should support receiving UDS addresses via PROXY:

No one seems to have missed it so far.
UDS is not supported in PROXYv1.

We could theoretically support receiving UDS addresses via PROXYv2, but I can't think of any place they could go besides session workspace, with all of the sockaddr_un storage requirements that we wanted to avoid.

IMO this should be something for a future iteration.

slimhazard · 2017-07-16T22:43:53Z

HTTP/2

There is one test for HTTP/2 over UDS addresses -- t02010.vtc, which does exactly what t020000.vtc does, except with UDS.

It seems that the socket address shouldn't matter to the transport protocol. But I'm afraid that I don't really grok HTTP/2 or the vtc commands for it, so I am unaware of any surprises that may turn up after all. Someone who knows HTTP/2 better than I do should put some thought to that.

slimhazard · 2017-07-16T22:45:16Z

Backend_health

May now emit a U, where 4 or 6 appears for IP, for backend probe over UDS.

slimhazard · 2017-07-16T23:02:14Z

*.ip, std.ip() and std.port()

All of the *.ip variables, including beresp.backend.ip, are NULL when the listen or backend address is a UDS. There are now VCL variables local.path (for a listen address) and beresp.backend.path, which hold the path if it's UDS, NULL otherwise.

I didn't see the point in remote., client. or server.path. For a UDS, the remote address is exactly the same as the local address (the same path). If we're not supporting UDSen over the PROXY protocol, there's also no point in client.path or server.path, and introducing them as new VCL variables might create the false impression that we are supporting UDSen over PROXY.

Because of the fact that the *.ip variables might be NULL, the assumption that a VCL_IP variable is never NULL, for example in VMOD functions, is no longer true. For example, the fallback in std.ip() could be NULL, so the idiom std.ip("<string>", client.ip) is no longer guaranteed to return an IP address. That could become a problem for third-party VMODs, if they always assert not-NULL for VCL_IP.

I updated std.ip() to return NULL if called with a string that begins with /, and not to throw assertions if the fallback if NULL. We should consider invoking VCL failure in one or both of these situations.

std.port() returns 0 if the IP is NULL (that didn't have to be changed). If std.port() is passed in a string that begins with a /, then it is not implicitly converted to an IP address (see below about getaddrinfo).

The potential NULL valiue for the *.ip variables also has the strange consequence that:

x.ip == x.ip

is always false for every x. That is, all of the *.ip variables are unequal to themselves if set to NULL due to a UDS (because IP comparison always evaluates to false if one or both of the comparands is NULL).

slimhazard · 2017-07-16T23:07:17Z

ACLs

UDS addresses don't work in any way with ACLs. They can't be listed in an ACL, and matching a *.ip variable against an ACL, which is always NULL for a UDS, always fails for a UDS.

slimhazard · 2017-07-16T23:17:10Z

getaddrinfo

On Linux, AF_UNIX addresses are not supported for getaddrinfo, but they are on FreeBSD. That is, if the string provided happens to match a path that is a socket, then FreeBSD getaddrinfo will return a sockaddr-as-sockaddr_un for it.

In a few places in the code, getaddrinfo is called with the intent of converting a string to an IP address, for example in the IP resolver called by VCC or in VRT_IP_string. Since the intent is to create or return an IP type (a VCL_IP variable or a VSA for IPv4 or IPv6), the code needs to make sure it doesn't get to getaddrinfo if the string might be interpreted as a UDS path. I believe that I've done this; at any rate we'll have to keep an eye on it in the future.

slimhazard · 2017-07-16T23:37:23Z

Weirdnesses

std.set_ip_tos() invokes VCL failure if the listen address is a UDS (socket option IP_TOS is level IPPROTO_IP and is not compatible with UDS).
Test b00046.vtc uses send_urgent to send TCP OOB data, to verify that it "doesn't cause ill effects". OOB data is not supported for UDS (at least on Linux). There's no way to write a test about it, since send_urgent just calls send(2) with MSG_OOB, which fails for a UDS and causes the test to fail. I guess this would be an "ill effect". Is that a problem? (What exactly is the test trying to prove?)

bsdphk · 2017-08-08T08:37:26Z

I tried this on FreeBSD and sent geoff email with the results.

nigoroll · 2017-09-04T11:52:05Z

Consensus from bugwash: merge after next release, re-think the VCL interface

bsdphk · 2017-09-06T08:02:50Z

Just to detail the discussion yesterday: The main worries is if the VCL side of this is where we want to go, and the plan is to start working on this right after 5.2 is out.

… path. On FreeBSD, getaddrinfo(3) may convert a path to an AF_UNIX sockaddr.

Note that as a consequence of the *.ip variables returning NULL for UDS, the fallback argument to std.ip() may now be NULL.

Otherwise, since client.ip is NULL for a UDS, unset client.identity would always return NULL, and hence would be the same for every client. That would undermine the purposes it's meant for, such as the hash director.

These are NULL (match <undef>) for UDSen.

NULL for an IP address (matches <undef>), the path for a UDS.

…heck

There's currently only one left, e826 in cache_acceptor.c, due to assigning void * from VSA_Get_Sockaddr to struct sockaddr_un *.

This affects the log output for BackendStart.

Dridi · 2018-02-27T17:21:56Z

Shouldn't we close this one?

slimhazard · 2018-02-27T17:25:30Z

I was keeping it open for reference, since it's not entirely OBE yet (the backend part is still pending), but yes it was going to be closed eventually. We can of course look back at a closed PR, same difference.

Dridi · 2018-02-27T17:29:19Z

As I see it, the backend side is happening in #2579. I just went through some triage frenzy so if anything should be kept open, don't mind me and reopen it.

slimhazard added a=need bugwash a=feedback please c=varnishd r=6.0 labels Jul 17, 2017

slimhazard force-pushed the master branch from e7e28b6 to 65f930a Compare July 18, 2017 13:55

bsdphk removed the a=need bugwash label Aug 8, 2017

slimhazard force-pushed the master branch from 72e702c to 7b20043 Compare August 15, 2017 10:23

slimhazard force-pushed the master branch 2 times, most recently from 8adecfe to 334055e Compare August 30, 2017 09:40

bsdphk added the a=need bugwash label Sep 4, 2017

bsdphk removed the a=need bugwash label Sep 6, 2017

slimhazard force-pushed the master branch from 334055e to 2a17145 Compare October 26, 2017 13:33

slimhazard added 21 commits February 15, 2018 10:48

VCL *.ip variables are NULL when the address is a Unix domain socket.

2dc4f45

Don't attempt to resolve a string in an IP context if it looks like a…

e993782

… path. On FreeBSD, getaddrinfo(3) may convert a path to an AF_UNIX sockaddr.

std.ip() and std.port() don't work with UDS addresses.

86fb7fc

Note that as a consequence of the *.ip variables returning NULL for UDS, the fallback argument to std.ip() may now be NULL.

std.set_ip_tos() invokes VCL failure for a Unix domain socket.

382378b

Add the VCL variable local.path.

3203262

Add the VCL variable beresp.backend.path.

4d242cc

Add a test to verify that Unix domain sockets don't work with ACLs.

3b6b244

Update vtc remote.ip and remote.port for Unix domain sockets

7141a94

These are NULL (match <undef>) for UDSen.

Add the vtc variable remote.path.

ebdbf5e

NULL for an IP address (matches <undef>), the path for a UDS.

Test dynamic creation of backends listening at Unix domain sockets.

f65cbaa

Add a test for HTTP/2 over Unix domain sockets.

97c1f90

Verify outputs of varnishncsa when UDS addresses are in use.

2bec271

Fix some errors related to UDS revealed by autogen.des and make distc…

6b88cad

…heck

Silence flexelint warnings due to the UDS-related code.

78ca854

There's currently only one left, e826 in cache_acceptor.c, due to assigning void * from VSA_Get_Sockaddr to struct sockaddr_un *.

Use '-' at the port position of a UDS address name.

27c84ef

This affects the log output for BackendStart.

The parser for -a addresses rejects relative paths as UDS addresses.

0375032

Add VSA_Malloc_UDS(), and use it for backends and TCP pools.

08ae847

Add VSA_Build_UDS(), and use it in cache_acceptor.c.

3595898

Revert VSA_Malloc() to the form it had before the introduction of UDS.

74adde0

Revert VSA_Build() to the form it had before the introduction of UDS.

2aeb1d3

slimhazard force-pushed the master branch from 398cae4 to 2aeb1d3 Compare February 15, 2018 09:56

This was referenced Feb 15, 2018

Support Unix domain sockets #2564

Merged

Support Unix domain sockets for -a listen addresses #2571

Closed

Dridi closed this Feb 27, 2018

vssync added the a=backport-review-4.1 label Feb 27, 2018

hermunn removed the a=backport-review-4.1 label Apr 12, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support Unix domain sockets as -a listen addresses and backend addresses #2371

Support Unix domain sockets as -a listen addresses and backend addresses #2371

slimhazard commented Jul 16, 2017 •

edited

slimhazard commented Jul 16, 2017

slimhazard commented Jul 16, 2017 •

edited

slimhazard commented Jul 16, 2017 •

edited

slimhazard commented Jul 16, 2017 •

edited

slimhazard commented Jul 16, 2017 •

edited

slimhazard commented Jul 16, 2017

slimhazard commented Jul 16, 2017 •

edited

slimhazard commented Jul 16, 2017 •

edited

slimhazard commented Jul 16, 2017

slimhazard commented Jul 16, 2017

slimhazard commented Jul 16, 2017

slimhazard commented Jul 16, 2017 •

edited

slimhazard commented Jul 16, 2017

slimhazard commented Jul 16, 2017

slimhazard commented Jul 16, 2017

bsdphk commented Aug 8, 2017

nigoroll commented Sep 4, 2017

bsdphk commented Sep 6, 2017

Dridi commented Feb 27, 2018

slimhazard commented Feb 27, 2018

Dridi commented Feb 27, 2018

Support Unix domain sockets as -a listen addresses and backend addresses #2371

Support Unix domain sockets as -a listen addresses and backend addresses #2371

Conversation

slimhazard commented Jul 16, 2017 • edited

slimhazard commented Jul 16, 2017

sockaddr_un storage management, VSAs and session workspace

slimhazard commented Jul 16, 2017 • edited

Backends and TCP pools

slimhazard commented Jul 16, 2017 • edited

VTCP_*name

slimhazard commented Jul 16, 2017 • edited

VSL outputs

slimhazard commented Jul 16, 2017 • edited

-a path creation and deletion

slimhazard commented Jul 16, 2017

X-Forwarded-For and backend Host headers

slimhazard commented Jul 16, 2017 • edited

vtc

slimhazard commented Jul 16, 2017 • edited

Socket options

slimhazard commented Jul 16, 2017

PROXY protocol

slimhazard commented Jul 16, 2017

HTTP/2

slimhazard commented Jul 16, 2017

Backend_health

slimhazard commented Jul 16, 2017 • edited

*.ip, std.ip() and std.port()

slimhazard commented Jul 16, 2017

ACLs

slimhazard commented Jul 16, 2017

getaddrinfo

slimhazard commented Jul 16, 2017

Weirdnesses

bsdphk commented Aug 8, 2017

nigoroll commented Sep 4, 2017

bsdphk commented Sep 6, 2017

Dridi commented Feb 27, 2018

slimhazard commented Feb 27, 2018

Dridi commented Feb 27, 2018

slimhazard commented Jul 16, 2017 •

edited

slimhazard commented Jul 16, 2017 •

edited

slimhazard commented Jul 16, 2017 •

edited

slimhazard commented Jul 16, 2017 •

edited

slimhazard commented Jul 16, 2017 •

edited

slimhazard commented Jul 16, 2017 •

edited

slimhazard commented Jul 16, 2017 •

edited

slimhazard commented Jul 16, 2017 •

edited