Add support for unprivileged namespaces on linux #104

Merged
merged 12 commits into from Apr 7, 2024

Conversation

one-d-wide
Contributor

@one-d-wide one-d-wide commented Apr 3, 2024

Motivation

Hi. I was trying to come up with a way to isolate a network environment from the global system setup, with all outbound traffic routed through a SOCKS proxy or VPN instance, in an unprivileged (rootless) manner.

Usually this would require something like veth pairs (virtual network interfaces) to bridge namespaces. But the required modifications affect the global network namespace and therefore require root-like capabilities, which aren't normally available to user processes.

This is why I came up with a solution that uses a TUN interface. Since it is similar to what this project does, I decided to integrate that functionality here.

Here are use-cases I wanted to add support for:

  • Create an isolated and proxified environment for applications that don't understand proxies by themselves.
  • Separate vpn instances from global network namespace.
  • Support flatpak applications (this is tricky because some containerization solutions alter capabilities in a way that makes flatpak fail to run).
  • Do everything above without requiring root-like capabilities.

Implemented features

  • Unprivileged mode which creates an isolated network environment (from the global network configuration) and bridges all the traffic reaching outside of it through the proxy. See --unshare.
  • No-proxy mode which routes traffic directly to the requested address. See --proxy none.

Usage

Create a new network namespace where all traffic reaching outside is routed through the socks5 proxy.

$ tun2proxy --unshare --setup --proxy "socks5://..."
...

Create a new network namespace without additional traffic redirection and run openvpn in it (with root-like capabilities provided to allow network configuration).

$ tun2proxy --unshare --setup --proxy "none" -- openvpn [...]
...

Start /bin/sh in the namespace created by the most recently started tun2proxy instance. A persistent command is suggested in the tun2proxy output.

$ nsenter --target $(pgrep -n tun2proxy) --preserve-credentials --user --net --mount /bin/sh

Architecture

There are three core ideas in play:

  • When a process creates a namespace, it gains root-like capabilities inside it, i.e. the ability to modify the filesystem and network devices in that namespace. These capabilities can be passed to descendant processes, allowing them to create TUN devices in an unprivileged environment.
  • A pair of interconnected unix(7) sockets can exchange not only data but also copies of valid file descriptors.
    This mechanism is useful for transferring sockets bound to the global network namespace into another network namespace.
  • File descriptors without the CLOEXEC flag (the std implementation usually sets it) are inherited by descendant processes.
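
As a minimal sketch of the third idea (not the code from this PR), the snippet below creates a socket pair and clears the close-on-exec flag on one end so that a spawned child would inherit it. The helper names `is_cloexec` and `clear_cloexec` are hypothetical, and the `fcntl` constants are the Linux values declared by hand to avoid external crates.

```rust
use std::io;
use std::os::unix::io::{AsRawFd, RawFd};
use std::os::unix::net::UnixStream;

// Values from <fcntl.h> on Linux, declared by hand to avoid external crates.
const F_GETFD: i32 = 1;
const F_SETFD: i32 = 2;
const FD_CLOEXEC: i32 = 1;

unsafe extern "C" {
    // fcntl(2) from the C library that std already links against.
    fn fcntl(fd: i32, cmd: i32, ...) -> i32;
}

/// Hypothetical helper: reports whether close-on-exec is set on `fd`.
fn is_cloexec(fd: RawFd) -> io::Result<bool> {
    let flags = unsafe { fcntl(fd, F_GETFD) };
    if flags < 0 {
        return Err(io::Error::last_os_error());
    }
    Ok(flags & FD_CLOEXEC != 0)
}

/// Hypothetical helper: clears close-on-exec so `fd` stays open across exec.
fn clear_cloexec(fd: RawFd) -> io::Result<()> {
    let flags = unsafe { fcntl(fd, F_GETFD) };
    if flags < 0 {
        return Err(io::Error::last_os_error());
    }
    if unsafe { fcntl(fd, F_SETFD, flags & !FD_CLOEXEC) } < 0 {
        return Err(io::Error::last_os_error());
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // std creates both ends with close-on-exec set.
    let (_local, remote) = UnixStream::pair()?;
    clear_cloexec(remote.as_raw_fd())?;
    // The raw number is what would be handed to the child on its command line.
    println!("inheritable fd: {}", remote.as_raw_fd());
    assert!(!is_cloexec(remote.as_raw_fd())?);
    Ok(())
}
```

Passing the descriptors themselves over the socket pair (the second idea) relies on SCM_RIGHTS ancillary messages, which this sketch leaves out.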

Using those ideas, the execution flow can be described as follows:

  • The first tun2proxy process, started with --unshare, creates a pair of interconnected unix sockets, clears the CLOEXEC flag on one of them, and converts it to a raw file descriptor value.
  • It then starts an unshare(1) process, which creates a new namespace.
  • The unshare process is instructed to start the tun2proxy binary again, this time with the --socket-transfer-fd <fd> argument, where <fd> is the raw file descriptor mentioned above.
  • The tun2proxy process inside the new namespace initializes the TUN interface and routing table as normal.
  • But since the network namespace it is running in is isolated, it can't connect to the proxy server directly. This is what the unix sockets are required for.
  • After initialization is done, it starts requesting network sockets from the first tun2proxy process, which is still in the global network namespace. Sockets created by that process remain bound to the global namespace even after they are transferred to the descendant process.
  • The network admin process (see the [admin command] cli argument) is started after the network is ready and keeps the root-like capabilities. That process could be, for example, a VPN daemon.
  • Then tun2proxy operates as usual.
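
The re-exec step in the flow above could be sketched roughly as follows. This only builds the command line and does not run it. The function name `build_unshare_command` is hypothetical, the set of unshare(1) flags is an assumption rather than what the PR necessarily passes, and how the path to the tun2proxy executable is obtained is left to the caller.

```rust
use std::os::unix::io::RawFd;
use std::path::Path;
use std::process::Command;

/// Hypothetical helper: builds (but does not spawn) the command that re-runs
/// the binary at `exe` inside fresh user, network and mount namespaces.
fn build_unshare_command(exe: &Path, transfer_fd: RawFd, forwarded_args: &[&str]) -> Command {
    let mut cmd = Command::new("unshare");
    // Assumed flag set: new user namespace with the caller mapped to root,
    // plus new network and mount namespaces.
    cmd.args(["--user", "--map-root-user", "--net", "--mount", "--"]);
    cmd.arg(exe);
    // The child learns which inherited descriptor to use from this argument.
    cmd.arg("--socket-transfer-fd").arg(transfer_fd.to_string());
    // The remaining arguments are passed through unchanged.
    cmd.args(forwarded_args);
    cmd
}

fn main() {
    let cmd = build_unshare_command(Path::new("/usr/bin/tun2proxy"), 5, &["--proxy", "none"]);
    println!("{:?}", cmd);
}
```

For the child to actually see descriptor 5 here, the parent would need to have cleared CLOEXEC on it before spawning.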

Code changes

  • bin/main.rs - added a startup routine triggered if --unshare is specified.
  • args.rs - added new cli arguments: --proxy none, --unshare, --socket-transfer-fd <fd>, [admin_command...], udp_timeout.
  • desktop_api.rs - added code to spawn an admin_command and a fix to allow running another proxy in a new namespace.
  • lib.rs - added functionality to fetch network sockets in case --socket-transfer-fd was specified. Also a small fix to batch a series of Mutex::lock calls into one where applicable.
  • no_proxy.rs (created) - implementation of no-proxy mode, basically a no-op since it signals that the connection is always established.
  • proxy_handle.rs (and implementations of proxy protocols) - added a get_server_addr function to the ProxyHandler trait; previously the server address was supplied directly from cli arguments.
  • socket_transfer.rs (created) - functionality to safely transfer raw sockets between processes.

Notes

  • tproxy-config lacks an option to skip setting up IPv6 routing. This confuses some user applications if the proxy doesn't support IPv6 connectivity, e.g. it causes timeouts in resolution and subsequent failure of DNS requests (this could also be attributed to ipstack not reporting unreachable destinations).

  • tproxy-config always sets more specific (compared to 0.0.0.0/0) routes even if no default route is present in the network namespace. This causes the 0.0.0.0/1 and 128.0.0.0/1 routes to conflict in the routing table with those of other proxy applications (notably openvpn).
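
To illustrate why the two /1 routes from the second note take precedence over a default route, here is a small longest-prefix-match sketch. The `Route` type, the `lookup` function and the device names are hypothetical and only model the selection rule, not the kernel's routing code.

```rust
use std::net::Ipv4Addr;

/// Hypothetical routing table entry.
struct Route {
    network: Ipv4Addr,
    prefix_len: u8,
    device: &'static str,
}

/// True if `dst` falls inside the route's network.
fn matches(route: &Route, dst: Ipv4Addr) -> bool {
    // A /0 prefix matches everything; shifting a u32 by 32 bits would overflow.
    let mask = if route.prefix_len == 0 {
        0
    } else {
        u32::MAX << (32 - u32::from(route.prefix_len))
    };
    (u32::from(dst) & mask) == (u32::from(route.network) & mask)
}

/// Picks the matching route with the longest prefix.
fn lookup(table: &[Route], dst: Ipv4Addr) -> Option<&Route> {
    table
        .iter()
        .filter(|r| matches(r, dst))
        .max_by_key(|r| r.prefix_len)
}

fn main() {
    let table = [
        Route { network: Ipv4Addr::new(0, 0, 0, 0), prefix_len: 0, device: "eth0" },
        Route { network: Ipv4Addr::new(0, 0, 0, 0), prefix_len: 1, device: "tun0" },
        Route { network: Ipv4Addr::new(128, 0, 0, 0), prefix_len: 1, device: "tun0" },
    ];
    for dst in [Ipv4Addr::new(8, 8, 8, 8), Ipv4Addr::new(200, 1, 1, 1)] {
        let route = lookup(&table, dst).expect("the default route always matches");
        println!("{dst} -> {}/{} dev {}", route.network, route.prefix_len, route.device);
    }
}
```

Because the two halves together cover the whole IPv4 space, they override 0.0.0.0/0 without replacing it. A second program that installs the same two prefixes in the same namespace would be competing for identical table entries.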

@blechschmidt blechschmidt self-assigned this Apr 3, 2024
@blechschmidt
Member

Hi,

thanks a lot. This is a great feature and the PR is well-documented. It's rare to see PRs of such quality and I really appreciate the effort. I probably won't be able to finish the review today, but I will likely get this merged next weekend.

Regarding your notes:

  • tproxy-config lacks an option to skip setting up IPv6 routing. This confuses some user applications if the proxy doesn't support IPv6 connectivity, e.g. it causes timeouts in resolution and subsequent failure of DNS requests (this could also be attributed to ipstack not reporting unreachable destinations).

This is totally true. There is at least one easily feasible approach to solve this:

  1. By adapting tproxy-config to take an argument indicating whether the default route for IPv6 should be created. Currently, the implementations for the different operating systems are not aligned but fixing this is not technically complicated.
  2. It would be better if IPv6 support could be detected automatically. But this is not easy. For SOCKS5 one could probably react upon Network unreachable response codes from the server, but I doubt this solution would be transferable to HTTP proxies. Probing a particular well-known address could also be considered, but this would in turn come with privacy drawbacks.
  • tproxy-config always sets more specific (compared to 0.0.0.0/0) routes even if no default route is present in the network namespace. This causes the 0.0.0.0/1 and 128.0.0.0/1 routes to conflict in the routing table with those of other proxy applications (notably openvpn).

I will address this when merging your PR.

Thanks again.

PS: Depending on your use case, you may also be interested in pallium, which solves this by chaining network namespaces, with the first one making use of slirp4netns (https://github.com/rootless-containers/slirp4netns) or slirpnetstack. (It does not yet use tun2proxy, though.)

@one-d-wide
Contributor Author

Hi. Thanks for the suggestions, will definitely give them a try.

There are some issues with cross-compilation; I will address them over the weekend.

By adapting tproxy-config to take an argument indicating whether the default route for IPv6 should be created. Currently, the implementations for the different operating systems are not aligned but fixing this is not technically complicated.

Good to hear. You may also consider moving the tproxy_remove functionality into a Drop implementation for TproxyState to guarantee it runs, since it also modifies /etc/resolv.conf.

@blechschmidt
Member

Good to hear. You may also consider moving the tproxy_remove functionality into a Drop implementation for TproxyState to guarantee it runs, since it also modifies /etc/resolv.conf.

I am considering this, I have already seen the TODO in the code.

@one-d-wide one-d-wide force-pushed the namespaces branch 2 times, most recently from 90be936 to a1a708a Compare April 4, 2024 13:03
@one-d-wide
Contributor Author

one-d-wide commented Apr 4, 2024

Fixed, checks are all green.

I just tested target/release/tun2proxy --unshare --setup --proxy none under the perf tool with an iperf3 session running across the namespace boundary. About 82.34% of the CPU time is spent in memmove. There seems to be a lot of room for zero-copy optimizations, since the proxy doesn't really need to touch the data that much. I wonder whether we could just use an MTU-sized buffer with a sliding window in it to mark the available data for each consecutive network layer, and then cycle those buffers around via an unbounded channel instead of allocating and copying to a new one at each step.

Edit: The large memmove spike in the flamegraph was mainly caused by a really fat but unused piece of internal metadata being passed around in memory in ipstack. Optimizing it resulted in a 300% increase in throughput in tun2proxy for the same test load.
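
The buffer-recycling idea floated above could look roughly like the sketch below. It is not code from tun2proxy or ipstack. `Packet`, `Pool` and the 1500-byte `MTU` are hypothetical names and values, and std's `mpsc::channel` stands in for the unbounded channel.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

const MTU: usize = 1500;

/// An MTU-sized buffer plus a window marking the bytes that are currently valid.
struct Packet {
    buf: Box<[u8; MTU]>,
    start: usize,
    end: usize,
}

impl Packet {
    fn new() -> Self {
        Packet { buf: Box::new([0u8; MTU]), start: 0, end: 0 }
    }

    /// Stores a received frame (truncated to the MTU) and resets the window.
    fn fill(&mut self, data: &[u8]) {
        let n = data.len().min(MTU);
        self.buf[..n].copy_from_slice(&data[..n]);
        self.start = 0;
        self.end = n;
    }

    /// A layer "consumes" its header by moving the window, not by copying.
    fn strip_header(&mut self, len: usize) {
        self.start = (self.start + len).min(self.end);
    }

    fn payload(&self) -> &[u8] {
        &self.buf[self.start..self.end]
    }
}

/// Free list backed by an unbounded channel.
struct Pool {
    free_tx: Sender<Packet>,
    free_rx: Receiver<Packet>,
}

impl Pool {
    fn new() -> Self {
        let (free_tx, free_rx) = channel();
        Pool { free_tx, free_rx }
    }

    /// Reuses a returned buffer if one is queued, otherwise allocates.
    fn take(&self) -> Packet {
        self.free_rx.try_recv().unwrap_or_else(|_| Packet::new())
    }

    /// Returns a buffer to the pool with an empty window.
    fn give_back(&self, mut packet: Packet) {
        packet.start = 0;
        packet.end = 0;
        let _ = self.free_tx.send(packet);
    }
}

fn main() {
    let pool = Pool::new();
    let mut packet = pool.take();
    let mut frame = vec![0u8; 20]; // stand-in for a 20-byte header
    frame.extend_from_slice(b"hello");
    packet.fill(&frame);
    packet.strip_header(20);
    println!("payload: {:?}", packet.payload());
    pool.give_back(packet);
}
```

In a real pipeline the sender half would be cloned into whichever task finishes with the packet, so buffers flow back to the reader without any new allocation.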

FYI, there is a flamegraph (js is baked in)

flamegraph

@ssrlive
Member

ssrlive commented Apr 7, 2024

Hi @one-d-wide, I have changed the code to make UDP work again. Please check that your part still works correctly.
Thanks.

@one-d-wide
Contributor Author

Hi.

The issue with the SOCKS5 UDP association was that the primary TCP connection must be maintained for the whole duration of the UDP session; when it closes, the SOCKS5 server assumes the UDP association is no longer in use.
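
One way to express that lifetime rule, shown here only as a sketch and not as the fix applied in this PR, is to keep the control TCP stream in the same struct as the UDP socket so that neither can outlive the other. `UdpAssociation` and `open_association` are hypothetical names, and the SOCKS5 handshake itself is omitted.

```rust
use std::io;
use std::net::{SocketAddr, TcpStream, UdpSocket};

/// Ties the SOCKS5 control connection to the UDP socket that depends on it.
/// Dropping the struct closes both at the same time.
struct UdpAssociation {
    control: TcpStream,
    relay: UdpSocket,
}

/// Hypothetical helper: connects the control channel and binds a local UDP socket.
fn open_association(proxy: SocketAddr) -> io::Result<UdpAssociation> {
    let control = TcpStream::connect(proxy)?;
    // A real client would perform the SOCKS5 greeting and the
    // UDP ASSOCIATE request on `control` at this point.
    let relay = UdpSocket::bind(("127.0.0.1", 0))?;
    Ok(UdpAssociation { control, relay })
}

fn main() -> io::Result<()> {
    // Stand-in for a proxy: a local listener that just accepts the connection.
    let listener = std::net::TcpListener::bind("127.0.0.1:0")?;
    let assoc = open_association(listener.local_addr()?)?;
    println!(
        "control {} kept open for relay {}",
        assoc.control.local_addr()?,
        assoc.relay.local_addr()?
    );
    Ok(())
}
```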

I reverted your last commit since it is mostly obsolete. Should I remove it from the history entirely?

And could you elaborate on the change to the cli parameters there, making admin_command an option (previously it was gathered from trailing fields)? I don't see trailing fields used anywhere else in the cli of tun2proxy.

     /// Specify a command to run with root-like capabilities in the new namespace.
     /// This could be useful to start additional daemons, e.g. `openvpn` instance.
-    #[arg(requires = "unshare")]
+    #[arg(long, value_name = "command", requires = "unshare")]
     pub admin_command: Vec<OsString>,

For example that change would require turning the following line:

$ tun2proxy --unshare --setup --proxy none -- openvpn --config config.ovpn

To this:

$ tun2proxy --unshare --setup --proxy none --admin-command=openvpn --admin-command=--config --admin-command=config.ovpn
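
For comparison, the trailing-fields form boils down to splitting the argument list at the first `--`. The sketch below does this by hand; `split_trailing` is a hypothetical helper and not how the argument parser used by tun2proxy implements it.

```rust
/// Splits an argument list at the first `--`.
/// Returns (arguments for the program itself, trailing command and its arguments).
fn split_trailing(args: &[String]) -> (&[String], &[String]) {
    match args.iter().position(|a| a == "--") {
        Some(i) => (&args[..i], &args[i + 1..]),
        // No separator: everything belongs to the program, nothing trails.
        None => (args, &args[args.len()..]),
    }
}

fn main() {
    let args: Vec<String> = std::env::args().skip(1).collect();
    let (own, admin_command) = split_trailing(&args);
    println!("own arguments: {:?}", own);
    println!("admin command: {:?}", admin_command);
}
```

With this shape the admin command keeps its own flags intact, whereas the option form has to repeat `--admin-command=` for every word.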

@ssrlive
Member

ssrlive commented Apr 7, 2024

Since the first call to get_udp_associate() will always return None, there are only minor changes.

@ssrlive
Member

ssrlive commented Apr 7, 2024

OK, change it back as you wish.

And could you elaborate about the change to the cli parameters there making admin_command an option (previously i

@ssrlive
Member

ssrlive commented Apr 7, 2024

Since the merge will be a squash and merge, any dirty code in the PR is allowed.

@one-d-wide
Contributor Author

one-d-wide commented Apr 7, 2024

Since the first call to get_udp_associate() will always return None, there are only minor changes.

It isn't. If the proxy type is NoProxy and the udp_associate argument is true, then get_udp_associate() of NoProxyHandler will always return a destination address.
And since server_addr may not even be open for TCP, a failed connection to it could falsely abort the UDP connection.

@ssrlive
Member

ssrlive commented Apr 7, 2024

Oops.

@ssrlive
Member

ssrlive commented Apr 7, 2024

Please change it back.

@ssrlive
Member

ssrlive commented Apr 7, 2024

I'm not familiar with namespaces, so please modify it according to your ideas.

@one-d-wide
Contributor Author

one-d-wide commented Apr 7, 2024

Since the merge will be a squash and merge, any dirty code in the PR is allowed.

Why would you do so? This would destroy the logical separation of changes that the commits provide. I have contributed to some larger projects, and they were pretty insistent on keeping the commit history clean.

Please change it back.

Ok, give me a moment.

@ssrlive
Member

ssrlive commented Apr 7, 2024

Hi @blechschmidt, I have no further comments here. Please review, test, and then merge it.

@blechschmidt blechschmidt merged commit 0239a22 into tun2proxy:master Apr 7, 2024
3 checks passed
@blechschmidt
Member

blechschmidt commented Apr 7, 2024

Thank you again for implementing this feature.

I have merged this implementation now using a merge commit, making it clear that all commits have been merged into master in one go.

Apart from a few cosmetic changes, I have mainly made the following adaptations:

  1. 40368dd makes use of /proc/self/exe. This is more portable (cf. rust-lang/rust#69343, "current_exe() returns invalid path on linux when exe has been deleted") and more secure than using current_exe(), cf. https://doc.rust-lang.org/std/env/fn.current_exe.html. I don't think it has any security implications in this case, but better safe than sorry.
  2. e8469f0 restricts the namespace-related arguments to Linux only and hides the fd argument, which should never be displayed to the end user.

There seems to be a lot of room for zero-copy optimizations, since the proxy doesn't really need to touch the data that much. I wonder whether we could just use an MTU-sized buffer with a sliding window in it to mark the available data for each consecutive network layer, and then cycle those buffers around via an unbounded channel instead of allocating and copying to a new one at each step.

There is certainly a lot of room for optimization. One just has to look at the push/pull implementation (push_data, consume_data) to know that the current implementation is not ideal in terms of performance. sendfile would also come to mind here. I myself currently don't have the resources for exploring and implementing these optimizations, but you are welcome to help by contributing.

blechschmidt added a commit that referenced this pull request Apr 9, 2024
blechschmidt added a commit that referenced this pull request Apr 9, 2024
blechschmidt added a commit that referenced this pull request Apr 13, 2024
@blechschmidt
Member

  • tproxy-config lacks an option to skip setting up IPv6 routing. This confuses some user applications if the proxy doesn't support IPv6 connectivity, e.g. it causes timeouts in resolution and subsequent failure of DNS requests (this could also be attributed to ipstack not reporting unreachable destinations).

This has been addressed in 09994d4 and v4.0.0 of the tproxy-config implementation. The --setup argument now also deletes the IPv6 default route if IPv6 is not enabled.
