
Get it working with IPv6 #75

Open
Adorfer opened this issue Dec 14, 2017 · 24 comments

@Adorfer Adorfer commented Dec 14, 2017

It would be really great to be able to use native IPv6 on the WAN side, since more and more broadband consumers get "IPv6 + DS-Lite", meaning that IPv4 is only available via Carrier-Grade NAT (CGN), where we very often observe strange effects like dying connections and/or packet loss, while IPv6 works fine.

As far as I understand, it's not as simple a job as for IPv4, since packet header sizes are not as static in v6, but it would be really helpful, even if it ended up as a total rewrite.

@mitar mitar (Member) commented Dec 14, 2017

This issue was also described here: https://dev.wlan-si.net/ticket/1187

We have had this listed as a Google Summer of Code project for a few years now, but it is pretty tricky to get right. I think it will require a kernel modification.

The issue is that we currently use NAT to get all connections onto the same port. You probably do not want to do this with IPv6; instead you would want the Linux kernel to finally identify connections based on the 4-tuple of (src IP, src port, dst IP, dst port). Implementing this would even make the IPv4 implementation simpler.
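To illustrate the idea, here is a minimal Python sketch (not tunneldigger code) of what per-4-tuple demultiplexing looks like when done in userspace: a single UDP socket bound to one well-known port, with every client flow told apart purely by its source (IP, port) pair.

```python
import socket

# One UDP socket bound to a single port; the local (IP, port) side of the
# 4-tuple is fixed, so flows are distinguished by the remote (IP, port).
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 0))  # port 0 -> ephemeral here; a broker would use a fixed port

flows = {}  # (src_ip, src_port) -> per-tunnel state

def handle_datagram():
    """Read one datagram and route it to per-flow state by sender address."""
    data, peer = sock.recvfrom(2048)
    state = flows.setdefault(peer, {"rx": 0})
    state["rx"] += 1
    return peer, state["rx"]
```

This is exactly the lookup the kernel L2TP driver does not do on its own, which is why the NAT workaround exists.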

@Adorfer Adorfer (Author) commented Dec 14, 2017

I am not familiar with the status/outcome of GSoC 2017.

But assuming that it is not "as simple as expected":
perhaps a working, "less aesthetic" but feasible approach would be sufficient, at least to improve the current situation.

@mitar mitar (Member) commented Dec 14, 2017

There was no working outcome; the students didn't manage to do it.

Can you propose what that approach would be?

@Adorfer Adorfer (Author) commented Dec 15, 2017

The nastiest approach would be: set up a 4-to-6 gateway locally on the client and a 6-to-4 gateway on the server, link them statically, and route the tunneldigger/L2TP traffic explicitly through this routing table.
(Yes, that would be definitely horrible.)

@kaechele kaechele (Contributor) commented Dec 17, 2017

What we could try is porting tunneldigger to (ab)use VXLAN instead of L2TPv3 for tunneling.
VXLAN in the kernel does work with multiple tunnels using the same src and dst port and separates traffic using the tunnel ID (so exactly the behaviour we want).
We currently use VXLAN to cross-connect our gateways in one layer-2 environment.
Unfortunately, IPv6 support in the in-kernel vxlan module is rather recent (patches are still flowing in as of 4.15). But it is already happening, which is more than what is going on with regard to having L2TPv3 changed in our favour in the kernel.
Apparently the implementation as of 3.12 should already be usable for our use case, though (using unicast links to create the tunnel).

@NeoRaider NeoRaider commented Feb 13, 2018

@kaechele VXLAN does not work over NAT (and stateful firewalls) at all, as it uses random source ports (based on flow hashes), so packets in opposite directions aren't associated with each other. Generally, I consider VXLAN unfit for use over WAN networks.

I don't really know enough about tunneldigger to make an informed suggestion, but maybe someone can answer my question: Why does tunneldigger use a fixed port for the tunnel at all, thus requiring the NAT hack? Wouldn't it be much simpler and saner to use a distinct port for each tunnel like L2TP is meant to be used (only using a fixed port for the broker connection to coordinate the setup of the L2TP tunnel)?

It would also make connections more robust over bad NAT routers. (In fastd development, we've observed the following two issues: 1) broken NAT routers not allowing UDP flows from the same host/port to multiple peers; 2) broken NAT routers not allowing packets to flow at all, or only with a tiny MTU, on previously known UDP flows after the uplink connection was reestablished. Both issues have been mitigated by using a separate UDP socket for each peer, and creating new sockets with new ports whenever the connection is lost.)
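The fastd mitigation described above can be sketched in a few lines of Python (the function name is hypothetical, not fastd or tunneldigger API): one socket per peer, bound to a fresh ephemeral port, and recreated, and thus re-randomized, whenever the session is considered dead.

```python
import socket

def fresh_peer_socket(peer_addr):
    """Open a new UDP socket with a fresh ephemeral source port for one peer.

    Sketch only: by letting the kernel pick an unused port (bind to 0) and
    connecting to a single peer, each peer gets its own 4-tuple, and closing
    and recreating the socket sidesteps stale NAT state on the path.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("0.0.0.0", 0))   # port 0: kernel assigns an unused ephemeral port
    s.connect(peer_addr)     # fix the remote side of the 4-tuple for this peer
    return s

# On keepalive timeout: s.close(); s = fresh_peer_socket(peer_addr)
# -> the replacement socket gets a new random source port.
```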

@mitar mitar (Member) commented Feb 14, 2018

Just to be clear: L2TP is meant to be used over the same port; that is why every packet even has a session ID. (And even if there were no session ID, you could still have unique tunnels, because no two tunnels would share the tuple of (client IP, client port, server IP, server port). But that would break the abstraction between tunneling and transport.) The Linux implementation has this crazy limitation; I am not sure why.

Why does tunneldigger use a fixed port for the tunnel at all

So that it is easy to open the necessary ports on firewalls. Also, in our network we use port 53 (DNS), which is generally always open; other ports tend to be closed much more often. UDP is not connection-based, so you need ports to be open in both directions, and many cheap home routers do not support UDP-based stateful firewalling.

Wouldn't it be much simpler and saner to use a distinct port for each tunnel like L2TP is meant to be used

We could add this to Tunneldigger; it can probably be a simple configuration switch. But then do not complain about closed ports and connections not being established. Maybe on IPv6 there will be less of that; I have not seen so much port blocking on IPv6, maybe because it is cheap to have one IP per tunnel. So instead of NAT, on IPv6 we could have one IPv6 address per tunnel, with the same port on all IPs.

It would also make connections more robust over bad NAT routers

Just to be clear: the NAT happens on the server, not on the client. It has nothing to do with the client.

Broken NAT routers not allowing UDP flows from the same host/port to multiple peers.

Your client host/port can be anything; it does not have to be fixed, and it probably should not be fixed.

And hopefully you are not running a VPN server behind (broken) NAT?

Broken NAT routers not allowing packets to flow at all or only with tiny MTU for previously known UDP flows after the uplink connection was reestablished.

What do you mean by "connection"? UDP flows do not have connections.

BTW, Freifunk was accepted to GSoC again, so this issue is once more open to be resolved this year. If any of you is a student, I would encourage you to apply and get paid to fix this issue. :-)

@RalfJung RalfJung (Member) commented Feb 14, 2018

FWIW, we tell users that they need to allow communication on a certain set of UDP ports (10000-10010) to enable Freifunk. This is relevant e.g. when people run Freifunk routers inside a "guest network", as some modern home routers provide, that has very restricted internet access. We also have some Freifunk routers in a network that heavily traffic-shapes UDP on all ports except 53.

So, we do rely on all tunneldigger+l2tp communication going over a single UDP port on the server side. This was also the case when we used fastd, so @NeoRaider are you describing changes to fastd to no longer just have a single server-side port?

@rotanid rotanid commented Feb 14, 2018

@RalfJung the discussion was/is mainly about outgoing ports, not server-side incoming ports, as far as I understand.

@RalfJung RalfJung (Member) commented Feb 14, 2018

Now that you say this, I realize that I assumed that on the server side, the outgoing port is the same as the incoming port. However, I assume that for the guest-network firewalls, incoming packets also have to come from a whitelisted port, so the outgoing server-side port would also have to be fixed.

@NeoRaider NeoRaider commented Feb 14, 2018

@mitar

Just to be clear: L2TP is meant to be used over the same port; that is why every packet even has a session ID. (And even if there were no session ID, you could still have unique tunnels, because no two tunnels would share the tuple of (client IP, client port, server IP, server port). But that would break the abstraction between tunneling and transport.) The Linux implementation has this crazy limitation; I am not sure why.

Thanks for the explanation. I'd say, let's fix the kernel, so eventually no workarounds are necessary for this limitation anymore.

What do you mean by "connection"? UDP flows do not have connections?

Maybe the term "session" would be more appropriate: what a fastd handshake establishes on top of a UDP flow. Not in any way relevant to the tunneldigger discussion, though...

@RalfJung

Now that you say this, I realize that I assumed that on the server side, the outgoing port is the same as the incoming port. However, I assume that for the guest-network firewalls, incoming packets also have to come from a whitelisted port, so the outgoing server-side port would also have to be fixed.

Random source ports are the usual case with most software. Firewalls are stateful, so as long as the outgoing port to the server is open, the opposite direction will just work (no need to change anything here). If your firewall requires a fixed source port, replace the firewall, IMO it is really not worth the effort to deal with such systems.

@RalfJung RalfJung (Member) commented Feb 14, 2018

Random source ports are the usual case with most software. Firewalls are stateful, so as long as the outgoing port to the server is open, the opposite direction will just work (no need to change anything here). If your firewall requires a fixed source port, replace the firewall, IMO it is really not worth the effort to deal with such systems.

I am confused; I feel we are not talking about the same thing. I said "server-side source port"; of course the client-side source port is random. But the source port of the server->client packets will be the same as the destination port of the client->server packets (and hence be fixed, e.g. 10000 in our case), right? Even UDP is usually used bidirectionally. I don't even see a good reason for doing anything else.
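The symmetry described here is easy to demonstrate: a UDP server that replies from the same bound socket automatically uses its bound port as the source port of the server->client packets. A toy echo sketch (not tunneldigger code):

```python
import socket

# Server socket bound to one port; replies sent through this socket carry
# that port as their source port, so client->server destport equals
# server->client sourceport without any extra configuration.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))  # port 0 -> ephemeral here; e.g. 10000 in production
server_port = server.getsockname()[1]

def serve_one():
    """Receive one datagram and echo it back to its sender."""
    data, client = server.recvfrom(1500)
    server.sendto(data, client)  # reply's source port == server_port
```

A client inspecting the sender address of the reply will see `server_port`, which is why a stateful firewall that allowed the outgoing flow lets the reply through.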

@NeoRaider NeoRaider commented Feb 14, 2018

@RalfJung After understanding that using different ports for different tunnels is a limitation of the Linux kernel and not of the L2TP standard, I don't think different server ports should be used. (fastd doesn't work any differently, and no changes are planned; I had only suggested such a change for tunneldigger because of my insufficient knowledge of L2TP.)

@mitar mitar (Member) commented Feb 14, 2018

Here is a relevant standard section: https://tools.ietf.org/html/rfc3931#section-4.1.2.2

@NeoRaider Just to be clear: this is my understanding of the protocol and standard. I might be mistaken, and there may exist some good reason why it is done like this in Linux. But to my understanding this is an unnecessary limitation, and the standard has a session ID one could use to differentiate packets coming over the same port.

About the discussion above about ports: to my understanding, the main improvement for preventing blocking is that traffic from a client (a device in the home network of a user, behind a cheap router) to a server has a common and fixed destination port (like UDP port 53). The source port (the port used by the client) can be random. Even more, for secure DNS queries it has to be random; this is one of the main mitigation techniques used to increase entropy in DNS queries. I cannot believe that even a cheap router would block random source ports for DNS queries.

But the destination port (the port on the server) should be fixed, so that the traffic looks like DNS. This is what NAT is doing on the server (an ugly workaround): NAT uses the above-mentioned tuple to know how to map ports; it does not use the session ID. But the Linux implementation could just be using the session ID.
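The session-ID-based lookup argued for here can be sketched in userspace, assuming the RFC 3931 section 4.1.2.2 framing for L2TPv3 over UDP: 16 bits of flags/Ver (with the T bit marking control messages), 16 reserved bits, then a 32-bit Session ID. This is an illustration of the idea, not kernel code:

```python
import struct

T_BIT = 0x8000  # control-message flag in the first 16-bit word of the header

def demux(datagram, sessions):
    """Classify one L2TPv3-over-UDP datagram by its header alone.

    Control messages (T bit set) go to the userspace broker; data messages
    are resolved directly via the 32-bit Session ID, with no per-tunnel
    socket lookup needed -- the point mitar makes above.
    """
    flags_ver, _reserved, session_id = struct.unpack_from("!HHI", datagram)
    if flags_ver & T_BIT:
        return ("control", None)
    return ("data", sessions.get(session_id))
```

With this scheme, one bound socket per server port suffices no matter how many tunnels share it, which is exactly what the NAT hack currently emulates.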

But for IPv6 we might not need this restriction. The server could have a /64 of IPv6 addresses, and each tunnel could go to a different address, with the same fixed UDP port 53. So maybe this is a workaround, but it would still be ugly: it means the server would have to listen on a /64 of IP addresses, and I am not even sure this would be performant.
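The address arithmetic for this one-address-per-tunnel idea is trivial; a short sketch (the function name and the mapping from tunnel ID to address are assumptions for illustration):

```python
import ipaddress

def addr_for_tunnel(prefix, tunnel_id):
    """Derive a per-tunnel server address from a delegated prefix.

    With a /64 there is room for 2**64 tunnels, each reachable on the same
    fixed UDP port; the tunnel is identified by the destination address
    instead of by NAT port mapping.
    """
    net = ipaddress.IPv6Network(prefix)
    return str(net.network_address + tunnel_id)

# e.g. addr_for_tunnel("2001:db8::/64", 7) -> "2001:db8::7"
```

In practice the server would presumably not configure 2**64 addresses individually but route the whole prefix to itself and accept on any address in it; how performant that is remains the open question raised above.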

@RalfJung RalfJung (Member) commented Feb 15, 2018

Server could have /64 of IPv6 addresses and each tunnel could go to a different address, same fixed 53 UDP port.

I'm pretty sure that not all the servers we run that have IPv6 have more than a single (/128) address assigned to them. If you rent a VPS or a similar system, you don't typically get an entire subnet.

@mitar mitar (Member) commented Feb 15, 2018

You don't? I found it very easy and cheap (free) to ask them to give you more; they currently do that readily. See here. So DigitalOcean allocates a range to you, but it does not configure it for you by default.

@RalfJung RalfJung (Member) commented Feb 15, 2018

To be fair, we never asked.

@kaechele kaechele (Contributor) commented Oct 13, 2018

This is my understanding of the protocol and standard. I might be mistaken and that there exist some reasonable way why it is done like this in Linux. But to my understanding this is unnecessary limitation and standard has a session ID one could use to differentiate packets coming over the same port.

I don't know if I'm totally misunderstanding how the kernel and sockets work here, but it seems that the kernel does differentiate based on tunnel and session ID (see l2tp_core.c at 868-888) when reading packets.
My understanding (and I could be totally wrong) is that the issue is with the creation of the socket. If we just created one socket for all tunnels and referenced its fd in all the tunnels we set up from there on, the kernel code would either push any data into the correct skb for each tunnel and session, or pass any unhandled data to userspace to handle if desired (L 880-884).
The limitation you are discussing occurs when using unmanaged tunnels, because the kernel driver opens a new and fresh (and therefore unique) socket for each new tunnel, as it doesn't have a way of tracking and reusing existing sockets.

@kaechele kaechele (Contributor) commented Nov 4, 2018

Okay, I did some more analysis.
I, too, now see that the limitation is in the kernel L2TPv3 implementation.
The problem becomes apparent when you read the documentation:

The driver keeps a struct l2tp_tunnel context per L2TP tunnel and a
struct l2tp_session context for each session. The l2tp_tunnel is
always associated with a UDP or L2TP/IP socket and keeps a list of
sessions in the tunnel.

The problem here is that the current code uses the socket to infer context as to which tunnel the data messages belong to, and then goes on to infer the session context (the actually relevant context) from the session ID read from the data messages.
In L2TPv3 there is no tunnel ID; everything is either just a control message or a data message. So the only way the current code can infer which tunnel traffic belongs to is by looking at the socket from which it came. In the code you can see that the tunnel is part of the socket and vice versa; the code uses this fact to pull up the tunnel context when it only has the socket to infer from.
And because of that, each tunnel currently uses its own socket, which in turn means that each tunnel needs its own port.

Technically the driver already enforces uniqueness of session IDs not only for L2TPv3 but also for L2TPv2, which doesn't specifically require this.
This implies that the session ID could be used to infer context. In fact, RFC 3931 states exactly that:

The Session ID alone provides the necessary context for all further packet processing, including the presence, size, and value of the Cookie, the type of L2-Specific Sublayer, and the type of payload being tunneled.

Technically the driver does that already: https://github.com/torvalds/linux/blob/master/net/l2tp/l2tp_core.c#L874

But the l2tp_tunnel struct is still used as a kind of glue between the kernel and all sessions. Specifically for all the cleanup that takes place when a tunnel gets torn down.

So from my understanding, the kernel driver needs to be modified to basically just look at the data messages and infer context from the session ID alone. Control messages are directly passed to userspace anyway, so there's not really a point in dealing with them in kernelspace.
In l2tp_ip it seems to have been implemented that way already, and maybe it could be used as an inspiration for this.

I have also locally built a test version of the broker based on socketserver and threading (rather than raw sockets and the epoll event loop). But it's super ugly and was just for my personal testing, to see if I could at least get the broker protocol to work on IPv6 (the answer is yes, but only to then hit the kernel brick wall). It changes so much code (since I didn't really know what's going on with this whole epoll stuff) that it's really more of a rewrite than an evolution, so I'm somewhat unwilling to share it. But boy, the code is a lot lighter without that NAT foo baked in.

@MPW1412 MPW1412 commented Nov 4, 2018

It would be awesome if you shared your code. As R. C. Martin says in "Clean Code": the road to clean code leads through bad code ;)

@kaechele kaechele (Contributor) commented Apr 21, 2019

I verified my understanding of the situation with James Chapman, the original author of the kernel L2TP implementation, who was kind enough to take the time and provide me some insight.
We are indeed hitting a limitation in the implementation, not in the actual specification. Apparently the current design of the kernel implementation was chosen for efficiency reasons when supporting both L2TPv3 and L2TPv2.
His suggestion was to try the L2TPv3-over-IP implementation, which I understand this project deliberately doesn't use because it doesn't work as well with NAT and routing.
So there you have it: we need to fix the kernel to get IPv6 support in Tunneldigger in its current form.

@mitar mitar (Member) commented Apr 22, 2019

Yes, this was also my understanding: we have to fix the kernel to get this working properly without the NAT hack.

But given that we mostly run this on devices where we do control the kernel, I think this could be fixed with a patch even if it does not get into upstream. Moreover, the Batman people have, I think, quite some experience getting things into the upstream kernel, so we could work with them. Or at least get this as a patch into OpenWrt kernels.

@kaechele kaechele (Contributor) commented Apr 22, 2019

Let me break out what needs to be done in order to get this done:

  • Fixing the kernel L2TP implementation (this is the hardest part)
    • Change the way the implementation refers back to the tunnel from the UDP recv context. That basically means changing the mechanism the kernel implementation uses to keep track of tunnels. I have some ideas here, but I'm still learning to code in C, so it seems to be a little much for my capabilities right now.
    • Just grep linux/net/l2tp for sk_user_data or l2tp_tunnel\(. These are all the (few) instances where the current implementation reaches for the tunnel struct through the socket. This works right now because socket <=> tunnel is currently a 1:1 relationship, but with our required changes it will be 1:n, so we can no longer reach the individual tunnel through the socket struct. From my analysis this is the culprit.
    • Anything of interest to us in the UDP implementation seems to happen exclusively in linux/net/l2tp/l2tp_core.c, but the other parts will need adjustments as well when we change the behaviour.
  • Making the broker v6 capable
    • Remove all parts of the NAT Hack (attempted this here: kaechele@6280182, seems to work and makes the code so much lighter ❤️)
    • Make the broker listen on v6 sockets (attempted this in an extremely dirty way here: kaechele@6d26913; seems to work. I removed the custom event loop in favor of Python's ThreadedServer and also converted everything to Python 3 compatibility in that one commit. Note: this probably breaks stuff and creates wacky timing issues, or equally obscure issues I didn't account for)
  • Making the client v6 capable
    • Make the client use v6 sockets (this is also covered by above mentioned dirty hack)
    • Make the client handle v4 and v6 responses accordingly, cycling through them to ensure we get a working connection. Since this is UDP, we'd have to rely on timeouts to figure out whether a connection is working or not.
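The last client-side step above can be sketched as follows. With UDP there is no connect error to observe, so the client sends a probe and falls through to the next candidate on timeout; the candidate order (typically v6 before v4) would come from getaddrinfo(). Names here are hypothetical, not tunneldigger API:

```python
import socket

def first_responding(candidates, probe=b"\x00", timeout=0.5):
    """Try each (family, address) candidate in turn; return the first that answers.

    candidates: list of (address_family, sockaddr) tuples, e.g. built from
    socket.getaddrinfo(host, port, type=socket.SOCK_DGRAM). Any reply within
    the timeout counts as "working"; timeouts and ICMP errors move on.
    """
    for family, addr in candidates:
        s = socket.socket(family, socket.SOCK_DGRAM)
        s.settimeout(timeout)
        try:
            s.sendto(probe, addr)
            s.recvfrom(1500)        # any reply counts as a working path
            return s, addr
        except OSError:             # timeout, port unreachable, no route, ...
            s.close()
    return None, None
```

A real client would send a proper broker handshake as the probe and would likely retry the whole list a few times before giving up, but the timeout-driven fallthrough is the essential mechanism.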

@mitar mitar (Member) commented Apr 22, 2019

Yes, this fix in the kernel would then also remove the need for NAT for IPv4.
