-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Recent kernels enforce system-wide unique L2TPv3 session IDs #57
Comments
Such padding is used in get cookie (and usage?) protocol messages in order to ensure a specific message length. This is so that the protocol does not enable traffic amplification attacks (e.g. where sending a small (with possibly spoofed source) message would generate a larger message in return). At least in get cookie messages this was the reasoning. I did not implement the "usage" functionality, so I am not sure how it is used there, but it should be similar. |
I see, thanks. However, there also is a "minimum package length" thing going on in |
Yes, I am not sure what the reasoning is with CONTROL_TYPE_USAGE having 8 bytes of padding (which is not even checked), as the response is only two additional bytes. For CONTROL_TYPE_COOKIE the padding length is the same as the cookie length. This probably needs to be fixed. |
Actually, while the "clients don't retry a broken broker" may still be a good idea, it doesn't help here. The problem is old clients, and obviously patching this now won't help for them. So, I am still inclined to have a client indicate, in |
So we should probably add some kind of generic "supported features" field (e.g. a 32-bit bitmap field), which would indicate to the broker what kind of features are supported.
I don't like how this "usage" part was added to the protocol. It would be much better for the "usage" to be indicated as a feature in Thinking about it, going forward, we should consider updating the protocol (there is already an 8-bit version field provided for that in all messages) so that the messages are based on a sequence of TLVs instead of fixed fields. This would make extending the protocol much easier in the future. |
Right, ever since 88df1c5.
I don't disagree. But I wouldn't want to do this as part of fixing this problem.
What about the following: For now, we make it an 8-bit field. It is sent both in the COOKIE (after the 8-byte padding) and the PREPARE message (so the server does not need state). Absence of the field means all-0. The first bit ( I mean, we could use 4 bytes, but that seems wasteful? |
Yes I agree that we should fix this first.
Ok, sounds good. |
Aren't these bytes send only in the first message? This is not too much. Maybe making it future proof to have more "supported features" if needed might be a good thing. |
I don'r really care, so sure. Seems like I will have some fun with endianess then :/ |
I have a fix for this at https://github.com/freifunk-saar/tunneldigger/tree/wlanslovenija. The server side of this is currently already being tested in our network; for the client side I will hopefully soon have an experimental firmware but there are some other issues I want to look at. Also, it is based on #61, so I am waiting for that to get merged. |
Seems like links to github commit with fix to linux-stable is broken. Here's another reference to the patch https://patchwork.kernel.org/patch/9843481/ . |
When running tunneldigger on the current Debian stable kernel (4.9.51), only one client can connect. The second client fails because the l2tp tunnel interface does not appear. After fixing a bug in the netlink interface (#50), one can see that the kernel sends an EEXIST in reply to the session_create.
A lot of digging through the linux kernel sources uncovered the source of the issue: L2TPv3 session IDs have to be unique system-wide. Tunneldigger hard-codes a session ID of 1 for every connection. That used to work due to a bug in the kernel, which meant that the kernel failed to actually ensure uniqueness of the session ID. That bug got fixed by https://github.com/linux-stable/linux-stable/commit/dbdbc73b44782e22b3b4b6e8b51e7a3d245f3086, which was backported to a few stable series, in particular, to 4.9.36.
Proposed fix
Fixing this in a compatible way will require protocol changes: Both ends of the tunnel have to know each others session ID, so they have to negotiate whether they use 1 or something more unique. I started working on a fix at https://github.com/freifunk-saar/tunneldigger/tree/wlanslovenija. The approach is summarized in the commit message over there, copied here for reference:
I am running two of our four servers with this fix, so compatibility with old clients is tested already. However, due to #55, I can't say anything about long-time stability yet. I also couldn't yet test new clients as I am still fighting my firmware build system. (The client uses such an ancient version of libnl that I can't build it on the host.)
Open problems
As the last paragraph in the commit description says, there still is a potential problem: Once we upgrade one of our servers to a kernel including the problematic bugfix, only one old client will be able to talk to it at a time. There is nothing we can do about this, but what want to avoid is a client trying to connect to a new server and failing, while there are old servers (with higher usage) that could still support this client. I first tried to (ab)use
CONTROL_TYPE_USAGE
to let the client indicate whether it supports unique session IDs, so that the server could report "I am full" to old clients and steer them elsewhere. However, clients actually seem to send some rather arbitrary data alongside that message (UUUUUUUU
, to be precise -- wtf?!?), so I am worried that attaching meaningful bytes here will not work very well. We could introduce aCONTROL_TYPE_USAGE2
, but I think I have a better idea.Clients already have a retry loop to connect again if the connection to the broker failed. I think clients should remember which broker failed, and exclude that one in the next round. Only once all brokers got excluded that way, they will be enabled again. This will, I think, improve client behavior in general, not just for this particular issue. It will also solve this issue as (after #50), brokers will send an error to clients when the session ID is already used, making the client try some other server. So, as long as one of the available brokers still has an old kernel, old clients will reliably be able to connect. Furthermore, even if all servers are on a new kernel, there can still be N old clients connected at the same time (and hopefully, they will fetch an auto-update and then become new clients).
I started implementing this, but got stuck yesterday due to the aforementioned build system issues.
The text was updated successfully, but these errors were encountered: