Add support for unprivileged namespaces on linux #104

one-d-wide · 2024-04-03T16:22:56Z

Motivation

Hi. I was trying to come up with a way to build a network environment that is isolated from the global system setup, with all outbound traffic routed through a socks proxy or vpn instance, in an unprivileged (rootless) manner.

Usually this would require something like veth pairs (virtual network interfaces) to bridge namespaces. But the required modifications affect the global network namespace and therefore require root-like capabilities, which aren't normally available to user processes.

This is why I came up with a solution that uses a TUN interface. And since it is similar to what this project does, I decided to integrate that functionality here.

Here are use-cases I wanted to add support for:

Create an isolated and proxified environment for applications that don't understand proxies by themselves.
Separate vpn instances from global network namespace.
Support flatpak applications (this is tricky because some containerization solutions mess up capabilities such that flatpak fails to run).
Do everything above without requiring root-like capabilities.

Implemented features

Unprivileged mode which creates an isolated network environment (from the global network configuration) and bridges all the traffic reaching outside of it through the proxy. See --unshare.
No-proxy mode which routes traffic directly to the requested address. See --proxy none.

Usage

Create a new network namespace where all traffic reaching outside is routed through the socks5 proxy.

$ tun2proxy --unshare --setup --proxy "socks5://..."
...

Create a new network namespace without additional redirection of traffic with openvpn running in it (with root-like capabilities provided to allow network configuration).

$ tun2proxy --unshare --setup --proxy "none" -- openvpn [...]
...

Start /bin/sh in the namespace created by the most recently started tun2proxy. A persistent command is suggested in the tun2proxy output.

$ nsenter --target $(pgrep -n tun2proxy) --preserve-credentials --user --net --mount /bin/sh

Architecture

There are three core ideas in play:

When a process creates a namespace, it gains root-like capabilities, i.e. the ability to modify the filesystem and network devices in that namespace. These capabilities can be passed to descendant processes, allowing them to create TUN devices in an unprivileged environment.
A pair of interconnected unix(7) sockets can exchange not only data but also copies of valid file descriptors.
This mechanism is useful for transferring sockets bound to the global network to another network namespace.
File descriptors without the CLOEXEC flag (it is usually set by the std implementation) are inherited by descendant processes.
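
A minimal sketch of the socket pair and the CLOEXEC handling described above, using only the Rust standard library plus a hand-declared fcntl(2) binding. The names inheritable_pair, clear_cloexec and is_cloexec are made up for illustration and are not taken from the PR, and the constant values are the Linux ones. Passing descriptors over the socket itself (SCM_RIGHTS via sendmsg(2)) usually needs a crate such as libc or nix and is not shown here.

```rust
use std::io;
use std::os::raw::c_int;
use std::os::unix::io::{AsRawFd, IntoRawFd, RawFd};
use std::os::unix::net::UnixStream;

// Hand-written binding; std already links the C library on Linux.
// (`unsafe extern` needs Rust 1.82+; older compilers use plain `extern "C"`.)
unsafe extern "C" {
    fn fcntl(fd: c_int, cmd: c_int, ...) -> c_int;
}

// Linux values of the fcntl(2) constants used below.
const F_GETFD: c_int = 1;
const F_SETFD: c_int = 2;
const FD_CLOEXEC: c_int = 1;

/// Returns true if the descriptor would be closed on exec.
fn is_cloexec(fd: RawFd) -> io::Result<bool> {
    let flags = unsafe { fcntl(fd, F_GETFD) };
    if flags < 0 {
        return Err(io::Error::last_os_error());
    }
    Ok(flags & FD_CLOEXEC != 0)
}

/// Clears FD_CLOEXEC so the descriptor survives exec in a child process.
fn clear_cloexec(fd: RawFd) -> io::Result<()> {
    let flags = unsafe { fcntl(fd, F_GETFD) };
    if flags < 0 {
        return Err(io::Error::last_os_error());
    }
    if unsafe { fcntl(fd, F_SETFD, flags & !FD_CLOEXEC) } < 0 {
        return Err(io::Error::last_os_error());
    }
    Ok(())
}

/// Creates a connected pair: the parent keeps a `UnixStream`, the other end
/// is returned as a raw, inheritable descriptor number for the child.
fn inheritable_pair() -> io::Result<(UnixStream, RawFd)> {
    let (parent, child) = UnixStream::pair()?;
    clear_cloexec(child.as_raw_fd())?;
    // `into_raw_fd` gives up ownership, so the descriptor is not closed here.
    Ok((parent, child.into_raw_fd()))
}
```

The parent end keeps CLOEXEC set, so only the descriptor meant for the child leaks across exec.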

Using those ideas execution flow can be described as follows:

First, the tun2proxy process started with --unshare creates a pair of interconnected unix sockets, unsets the CLOEXEC flag on one of them, and converts it to a raw file descriptor value.
Then it starts an unshare(1) process that creates a new namespace.
The unshare process is instructed to start the tun2proxy binary again, this time with the --socket-transfer-fd <fd> argument, where <fd> is the raw file descriptor mentioned above.
The tun2proxy process inside the new namespace initializes the TUN interface and routing table as normal.
But since the network namespace it is running in is isolated, it can't connect to the proxy server directly. This is what the unix sockets are required for.
After initialization is done, it starts requesting network sockets from the first tun2proxy process, which is still in the global network namespace; sockets created there remain bound to the global namespace even after they are transferred to the descendant process.
The network admin process (see the [admin command] cli argument) is started after the network is ready and keeps the root-like capabilities; it could be, for example, a vpn daemon.
Then tun2proxy operates as usual.
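
The second and third steps of this flow could look roughly like the sketch below. It only builds the command line and does not spawn anything. The function name build_unshare_command and the exact set of unshare(1) flags are illustrative assumptions, not necessarily what the PR uses; only --socket-transfer-fd comes from the description above.

```rust
use std::os::unix::io::RawFd;
use std::process::Command;

/// Builds (but does not spawn) an `unshare` invocation that re-runs `exe`
/// inside fresh user, network and mount namespaces.
fn build_unshare_command(exe: &str, transfer_fd: RawFd, forwarded: &[&str]) -> Command {
    let mut cmd = Command::new("unshare");
    // Assumed flag set: new user/net/mount namespaces, keep the caller's
    // uid/gid mapping and capabilities, and tie the child's lifetime to unshare.
    cmd.args([
        "--user",
        "--map-current-user",
        "--net",
        "--mount",
        "--keep-caps",
        "--kill-child",
        "--fork",
    ]);
    cmd.arg("--");
    cmd.arg(exe);
    // The inheritable end of the unix socket pair, passed by number.
    cmd.arg("--socket-transfer-fd").arg(transfer_fd.to_string());
    // Remaining user-supplied arguments are forwarded unchanged.
    cmd.args(forwarded);
    cmd
}
```

Because the descriptor is inherited rather than reopened, the child only needs its number; the kernel object behind it is shared with the parent.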

Code changes

bin/main.rs - added a startup routine triggered if --unshare is specified.
args.rs - added new cli arguments: --proxy none, --unshare, --socket-transfer-fd <fd>, [admin_command...], udp_timeout.
desktop_api.rs - added code to spawn an admin_command and a fix to allow running another proxy in a new namespace.
lib.rs - added functionality to fetch network sockets in case --socket-transfer-fd was specified. Also a small fix to batch a series of Mutex::lock calls into one call where applicable.
no_proxy.rs (created) - implementation of no-proxy mode, basically a no-op since it signals that the connection is always established.
proxy_handle.rs (and implementations of proxy protocols) - added a get_server_addr function to the ProxyHandler trait; previously the server address was supplied directly from cli arguments.
socket_transfer.rs (created) - functionality to safely transfer raw sockets between processes.
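
To illustrate the no_proxy.rs and proxy_handle.rs items, here is a heavily simplified sketch. The trait below is a hypothetical stand-in with only two methods; the method name connection_established and the struct layout are assumptions for illustration and do not mirror the project's actual ProxyHandler trait.

```rust
use std::net::SocketAddr;

/// Hypothetical, reduced version of a proxy handler interface.
trait ProxyHandler {
    /// Address the outgoing socket should connect to.
    fn get_server_addr(&self) -> SocketAddr;
    /// Whether the handshake is finished and payload may flow.
    fn connection_established(&self) -> bool;
}

/// No-proxy mode: connect straight to the requested destination.
struct NoProxyHandler {
    destination: SocketAddr,
}

impl ProxyHandler for NoProxyHandler {
    fn get_server_addr(&self) -> SocketAddr {
        // There is no proxy server, so the "server" is the destination itself.
        self.destination
    }

    fn connection_established(&self) -> bool {
        // No handshake exists, so the connection counts as ready immediately.
        true
    }
}
```

Asking the handler for the address, rather than reading it from the cli arguments, is what lets a handler like this one substitute the destination.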

Notes

tproxy-config lacks an option to skip setting ipv6 routing. This confuses some user applications if the proxy doesn't support ipv6 connectivity, e.g. it causes resolution timeouts and subsequent failure of dns requests (this could also be attributed to ipstack not reporting unreachable destinations).
tproxy-config always sets more specific (compared to 0.0.0.0/0) routes even if no default route is present in the network namespace. This causes a conflict in the routing table over the 0.0.0.0/1 and 128.0.0.0/1 address spaces with other proxy applications (notably openvpn).
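
The second note rests on longest-prefix matching: two /1 routes together cover the whole IPv4 space and win over a /0 default route. A small sketch of that lookup follows; the Route struct and the device names eth0 and tun0 are made up for illustration.

```rust
use std::net::Ipv4Addr;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Route {
    network: Ipv4Addr,
    prefix_len: u8,
    dev: &'static str,
}

/// True if `addr` falls inside the route's network.
fn matches(route: &Route, addr: Ipv4Addr) -> bool {
    // A /0 mask is all zeros; shifting a u32 by 32 would overflow.
    let mask = if route.prefix_len == 0 {
        0
    } else {
        u32::MAX << (32 - route.prefix_len as u32)
    };
    (u32::from(addr) & mask) == (u32::from(route.network) & mask)
}

/// Longest-prefix match: the most specific matching route wins.
fn lookup(table: &[Route], addr: Ipv4Addr) -> Option<&Route> {
    table
        .iter()
        .filter(|r| matches(r, addr))
        .max_by_key(|r| r.prefix_len)
}
```

Since a routing table holds one entry per prefix, a second application that wants the same two /1 prefixes for its own device has nothing more specific left to claim, which is the clash with openvpn described in the note.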

blechschmidt · 2024-04-03T20:36:35Z

Hi,

thanks a lot. This is a great feature and the PR is well-documented. It's rare to see PRs of such quality and I really appreciate the effort. I probably won't be able to finish the review today, but I will likely get this merged next weekend.

Regarding your notes:

tproxy-config lacks an option to skip setting ipv6 routing. This confuses some user applications if the proxy doesn't support ipv6 connectivity, e.g. it causes resolution timeouts and subsequent failure of dns requests (this could also be attributed to ipstack not reporting unreachable destinations).

This is totally true. There is at least one easily feasible approach to solve this:

By adapting tproxy-config to take an argument indicating whether the default route for IPv6 should be created. Currently, the implementations for the different operating systems are not aligned but fixing this is not technically complicated.
It would be better if IPv6 support could be detected automatically. But this is not easy. For SOCKS5 one could probably react upon Network unreachable response codes from the server, but I doubt this solution would be transferable to HTTP proxies. Probing a particular well-known address could also be considered, but this would in turn come with privacy drawbacks.

tproxy-config always sets more specific (compared to 0.0.0.0/0) routes even if no default route is present in the network namespace. This causes a conflict in the routing table over the 0.0.0.0/1 and 128.0.0.0/1 address spaces with other proxy applications (notably openvpn).

I will address this when merging your PR.

Thanks again.

PS: Depending on your use case, you may also be interested in pallium, which solves this by chaining network namespaces, with the first one making use of slirp4netns (https://github.com/rootless-containers/slirp4netns) or slirpnetstack. (It does not yet use tun2proxy though.)

one-d-wide · 2024-04-03T21:24:27Z

Hi. Thanks for the suggestions, will definitely give them a try.

There are some issues with cross compilation, I will address them over the weekend.

By adapting tproxy-config to take an argument indicating whether the default route for IPv6 should be created. Currently, the implementations for the different operating systems are not aligned but fixing this is not technically complicated.

Good to hear. You may also consider moving the tproxy_remove functionality into a Drop implementation for TproxyState to guarantee it runs, since it also modifies /etc/resolv.conf.
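
A minimal sketch of that suggestion, assuming a guard that snapshots a file and restores it when dropped. RestoreOnDrop is a hypothetical type for illustration, not the actual TproxyState.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical guard: remembers a file's original contents and writes
/// them back when the guard goes out of scope.
struct RestoreOnDrop {
    path: PathBuf,
    original: Vec<u8>,
}

impl RestoreOnDrop {
    /// Saves the current contents, then overwrites the file.
    fn overwrite(path: &Path, new_contents: &[u8]) -> io::Result<Self> {
        let original = fs::read(path)?;
        fs::write(path, new_contents)?;
        Ok(Self {
            path: path.to_path_buf(),
            original,
        })
    }
}

impl Drop for RestoreOnDrop {
    fn drop(&mut self) {
        // Errors cannot be returned from drop, so this is best effort.
        let _ = fs::write(&self.path, &self.original);
    }
}
```

Drop runs on normal scope exit and on panic unwinding, but not after std::process::exit, an abort, or SIGKILL, so it narrows the window for a stale file rather than closing it entirely.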

blechschmidt · 2024-04-03T21:26:37Z

Good to hear. You may also consider moving the tproxy_remove functionality into a Drop implementation for TproxyState to guarantee it runs, since it also modifies /etc/resolv.conf.

I am considering this, I have already seen the TODO in the code.

one-d-wide · 2024-04-04T14:02:05Z

Fixed, checks are all green.

I just tested target/release/tun2proxy --unshare --setup --proxy none under the perf tool with an iperf3 session running across the namespace boundary. About 82.34% of the cpu time is spent in memmove. That seems to leave a lot of room for zero-copy optimizations, since the proxy doesn't really need to touch the data that much. I wonder whether we could just use an mtu-sized buffer with a sliding window in it to mark the available data for each consecutive network layer, and then cycle those buffers around via an unbounded channel, instead of allocating and copying to a new one at each step.

Edit: The large memmove spike in the flamegraph was mainly caused by a really fat but unused piece of internal metadata being passed around in memory in ipstack. Optimizing it resulted in a 300% increase in throughput in tun2proxy for the same test load.
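
A rough sketch of the buffer-cycling idea, using the unbounded std::sync::mpsc channel as the free list. PacketBuf, BufPool and the 1500-byte MTU are assumptions for illustration; nothing here reflects how tun2proxy or ipstack actually manage buffers.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

const MTU: usize = 1500; // assumed link MTU

/// Fixed-size buffer with a sliding window marking the bytes still in use.
struct PacketBuf {
    data: Box<[u8; MTU]>,
    start: usize,
    end: usize,
}

impl PacketBuf {
    fn new() -> Self {
        Self { data: Box::new([0u8; MTU]), start: 0, end: 0 }
    }

    /// Copies a packet in once (truncated to MTU) and resets the window.
    fn fill(&mut self, bytes: &[u8]) {
        let n = bytes.len().min(MTU);
        self.data[..n].copy_from_slice(&bytes[..n]);
        self.start = 0;
        self.end = n;
    }

    /// "Removes" a header by advancing the window; no bytes are moved.
    fn strip_front(&mut self, n: usize) {
        self.start = (self.start + n).min(self.end);
    }

    fn payload(&self) -> &[u8] {
        &self.data[self.start..self.end]
    }
}

/// Recycles buffers through an unbounded channel instead of reallocating.
struct BufPool {
    free_tx: Sender<PacketBuf>,
    free_rx: Receiver<PacketBuf>,
}

impl BufPool {
    fn new() -> Self {
        let (free_tx, free_rx) = channel();
        Self { free_tx, free_rx }
    }

    /// Reuses a returned buffer if one is waiting, otherwise allocates.
    fn take(&self) -> PacketBuf {
        self.free_rx.try_recv().unwrap_or_else(|_| PacketBuf::new())
    }

    fn give_back(&self, buf: PacketBuf) {
        let _ = self.free_tx.send(buf);
    }
}
```

Each layer would call strip_front for its own header and hand the same buffer on, so the only copy is the initial fill.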

FYI, there is a flamegraph (js is baked in)

ssrlive · 2024-04-07T07:14:32Z

Hi @one-d-wide, I have changed the code to make UDP work again. Please check your part to make sure it works correctly.
Thanks.

one-d-wide · 2024-04-07T10:50:42Z

Hi.

The issue with the socks5 udp association was that the primary tcp connection must be maintained for the whole duration of the udp session; when it closes, the socks5 server assumes the udp association is no longer in use.
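
One way to express this lifetime rule in code is to store the control connection inside the session object, so the two cannot be dropped separately. The sketch below is an illustration only; UdpAssociation is a hypothetical type, and in real use the type parameter C would be something like a TcpStream.

```rust
use std::net::SocketAddr;

/// Hypothetical session object: the relay address is only valid while
/// `control` stays open, so both are owned (and dropped) together.
struct UdpAssociation<C> {
    control: C,
    relay_addr: SocketAddr,
}

impl<C> UdpAssociation<C> {
    fn new(control: C, relay_addr: SocketAddr) -> Self {
        Self { control, relay_addr }
    }

    /// Address of the proxy's UDP relay for this association.
    fn relay_addr(&self) -> SocketAddr {
        self.relay_addr
    }

    /// Access to the control connection, e.g. to detect that it was closed.
    fn control(&self) -> &C {
        &self.control
    }
}
```

Dropping the association closes the control connection, and nothing else can close it earlier by accident because the field is owned by the session.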

I reverted your last commit since it is mostly obsolete. Should I remove it from the history entirely?

And could you elaborate about the change to the cli parameters there making admin_command an option (previously it was gathered from trailing fields)? I don't see trailing fields used anywhere else in the cli of tun2proxy.

     /// Specify a command to run with root-like capabilities in the new namespace.
     /// This could be useful to start additional daemons, e.g. `openvpn` instance.
-    #[arg(requires = "unshare")]
+    #[arg(long, value_name = "command", requires = "unshare")]
     pub admin_command: Vec<OsString>,

For example that change would require turning the following line:

$ tun2proxy --unshare --setup --proxy none -- openvpn --config config.ovpn

To this:

$ tun2proxy --unshare --setup --proxy none --admin-command=openvpn --admin-command=--config --admin-command=config.ovpn
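
For comparison, the trailing-fields form boils down to splitting the argument list at the first "--". A small sketch with the standard library only; split_trailing is a hypothetical helper and not how the project's argument parser is implemented.

```rust
use std::ffi::OsString;

/// Splits arguments at the first `--`: everything before it belongs to the
/// program itself, everything after it is the trailing command.
fn split_trailing(args: Vec<OsString>) -> (Vec<OsString>, Vec<OsString>) {
    let separator = args.iter().position(|a| a.to_str() == Some("--"));
    match separator {
        Some(i) => {
            let mut own = args;
            let trailing = own.split_off(i + 1);
            own.pop(); // drop the `--` separator itself
            (own, trailing)
        }
        None => (args, Vec::new()),
    }
}
```

With this form, the command after the separator is passed through word for word, including its own options such as --config.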

ssrlive · 2024-04-07T11:04:57Z

Since the first call to get_udp_associate() will always return None, there were only minor changes.

ssrlive · 2024-04-07T11:10:06Z

OK, change it back as you wish.

And could you elaborate about the change to the cli parameters there making admin_command an option (previously i

ssrlive · 2024-04-07T11:13:34Z

Since the merge will be a "Squash and merge", any dirty code in the PR is allowed.

one-d-wide · 2024-04-07T11:17:02Z

Since the first call to get_udp_associate() will always return None, there were only minor changes.

It isn't. If the proxy type is NoProxy and the udp_associate argument is true, then get_udp_associate() of NoProxyHandler will always return a destination address.
And since server_addr may not even be open for tcp, a failed connection to it could falsely abort the udp connection.

ssrlive · 2024-04-07T11:19:15Z

Oops.

ssrlive · 2024-04-07T11:19:56Z

Please change it back.

ssrlive · 2024-04-07T11:24:39Z

I'm not familiar with namespaces, so please modify it according to your ideas.

one-d-wide · 2024-04-07T11:26:54Z

Since the merge will be a "Squash and merge", any dirty code in the PR is allowed.

Why would you do so? This would destroy the logical separation of changes provided by commits. I have contributed to some larger projects, and they were pretty insistent on keeping the commit history clean.

Please change it back.

Ok, give me a moment.

ssrlive · 2024-04-07T12:03:23Z

Hi @blechschmidt, I have no further comments here. Please review, test, and then merge it.

blechschmidt · 2024-04-07T20:26:41Z

Thank you again for implementing this feature.

I have merged this implementation now using a merge commit, making it clear that all commits have been merged into master in one go.

Apart from a few cosmetic changes, I have mainly made the following adaptions:

40368dd makes use of /proc/self/exe. This is more portable (cf. rust-lang/rust#69343, "current_exe() returns invalid path on linux when exe has been deleted") and more secure than using current_exe(), cf. https://doc.rust-lang.org/std/env/fn.current_exe.html. I don't think it has any security implications in this case, but better safe than sorry.
e8469f0 restricts the namespace-related arguments to Linux only and hides the fd argument, which should never be displayed to the end user.
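
As an illustration of the first change, re-executing through /proc/self/exe can be as simple as using that path as the program name. The helper name reexec_command is hypothetical, and this is a sketch of the general approach rather than the code from 40368dd.

```rust
use std::process::Command;

/// Magic symlink that the Linux kernel resolves to the running executable.
const SELF_EXE: &str = "/proc/self/exe";

/// Builds (but does not spawn) a command that runs the current binary again.
fn reexec_command(extra_args: &[&str]) -> Command {
    // The kernel resolves the link at exec time, so no path string has to
    // be computed (or trusted) in user space.
    let mut cmd = Command::new(SELF_EXE);
    cmd.args(extra_args);
    cmd
}
```

The path is Linux-specific, which fits with the namespace-related arguments being restricted to Linux in e8469f0.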

That seems to leave a lot of room for zero-copy optimizations, since the proxy doesn't really need to touch the data that much. I wonder whether we could just use an mtu-sized buffer with a sliding window in it to mark the available data for each consecutive network layer, and then cycle those buffers around via an unbounded channel, instead of allocating and copying to a new one at each step.

There is certainly a lot of room for optimization. One just has to look at the push/pull implementation (push_data, consume_data) to know that the current implementation is not ideal in terms of performance. sendfile would also come to mind here. I myself currently don't have the resources for exploring and implementing these optimizations, but you are welcome to help by contributing.

blechschmidt · 2024-04-13T15:24:26Z

tproxy-config lacks an option to skip setting ipv6 routing. This confuses some user applications if the proxy doesn't support ipv6 connectivity, e.g. it causes resolution timeouts and subsequent failure of dns requests (this could also be attributed to ipstack not reporting unreachable destinations).

This has been addressed in 09994d4 and v4.0.0 of the tproxy-config implementation. The --setup argument now also deletes the IPv6 default route if IPv6 is not enabled.

blechschmidt self-assigned this Apr 3, 2024

one-d-wide added 3 commits April 3, 2024 20:58

ci: don't abort checks immediately if error is encountered

74e5220

add udp timeout option

361cf95

add no-proxy mode

5e99c9f

one-d-wide force-pushed the namespaces branch from 978762b to a188f91 Compare April 3, 2024 21:24

one-d-wide force-pushed the namespaces branch 2 times, most recently from 90be936 to a1a708a Compare April 4, 2024 13:03

ssrlive force-pushed the namespaces branch 2 times, most recently from 589da6a to d6bd7a5 Compare April 6, 2024 15:39

blechschmidt mentioned this pull request Apr 6, 2024

Implement destructor tun2proxy/tproxy-config#4

Merged

ssrlive force-pushed the namespaces branch from d6bd7a5 to 6da8a98 Compare April 7, 2024 02:43

one-d-wide force-pushed the namespaces branch from 80fc942 to b4371b3 Compare April 7, 2024 11:48

ssrlive force-pushed the namespaces branch from b4371b3 to 80675ef Compare April 7, 2024 12:45

blechschmidt force-pushed the namespaces branch 2 times, most recently from 7d3c48a to a5e9195 Compare April 7, 2024 19:06

blechschmidt force-pushed the namespaces branch from a5e9195 to aafbc95 Compare April 7, 2024 19:13

one-d-wide and others added 8 commits April 7, 2024 21:32

add support for unprivileged namespaces

d351b50

Apply clippy suggestion

a08b333

remove useless get_server_addr

181497e

Args class

56be614

fix socks5 udp connectivity

f9f5401

minor changes

af6a8a3

Restrict namespace arguments to Linux

e8469f0

Update README

4f5a128

blechschmidt force-pushed the namespaces branch from aafbc95 to 4f5a128 Compare April 7, 2024 19:33

Increase security and portability through the use of /proc/self/exe

40368dd

blechschmidt force-pushed the namespaces branch from 9a74d90 to 40368dd Compare April 7, 2024 19:47

blechschmidt merged commit 0239a22 into tun2proxy:master Apr 7, 2024
3 checks passed

blechschmidt added a commit that referenced this pull request Apr 9, 2024

Fix routing issues described in #104

92bd8d6

blechschmidt added a commit that referenced this pull request Apr 9, 2024

Fix routing issues described in #104

3eb2a34

blechschmidt added a commit that referenced this pull request Apr 13, 2024

Fix routing issues described in #104

09994d4
