Incoming messages get incorrectly dispatched to broker #129
When using the backports kernel, the issue goes away. However, requiring a 5.4 kernel seems a bit excessive...
I also tried some awful hacks that only use one socket per broker. However, then I get "OSError: Netlink error: Device or resource busy (16)" when the second tunnel is created. Maybe the kernel does not support more than one l2tp tunnel per socket? Note that just processing messages with the right tunnel fixed the timeouts, but user traffic did not work -- likely because the l2tp traffic still arrived at the wrong socket and thus was never processed by the kernel.
I'll make a guess and say that this commit could be the one fixing the bug here. It first appeared in 5.0, and unfortunately does not seem to have been backported.
Yes. This is actually the relevant commit. So that'd bump the minimum kernel requirement up to at least 5.2.17, which would exclude quite a number of popular distros running their default non-backported kernels.
Yeah, 5.2.17, 5.3.1, and then 5.4 and later.
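For reference, a startup check along these lines could tell admins up front whether their kernel is affected. This is only a rough sketch, not something the broker currently does; the version list mirrors the stable releases mentioned above, and the policy and warning text are assumptions:

```python
# Rough sketch of a startup check for the kernel versions discussed above.
# Not existing Tunneldigger code; policy and message are assumptions.
import os
import re

# Oldest release in each stable series carrying the UDP socket lookup fix.
FIXED_IN = [(5, 2, 17), (5, 3, 1), (5, 4, 0)]

def kernel_has_udp_fix():
    release = os.uname().release                      # e.g. "4.19.0-16-amd64"
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not m:
        return None                                   # unparseable, do not guess
    major, minor, patch = (int(x or 0) for x in m.groups())
    if (major, minor) >= (5, 4):
        return True
    if (major, minor) < (5, 2):
        return False
    # 5.2.x and 5.3.x need the respective stable backport.
    return any((major, minor) == (a, b) and patch >= c for a, b, c in FIXED_IN)

if kernel_has_udp_fix() is False:
    print("Warning: this kernel likely misdelivers UDP packets to the broker "
          "socket instead of connected tunnel sockets; upgrade to at least "
          "5.2.17 / 5.3.1 / 5.4, or use the legacy (NAT-based) branch.")
```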
So, how do we proceed? I have to admit that I'd really like to get rid of this NAT stuff from our gateways.^^ And using Debian stable with backports kernels is not an entirely uncommon setup, I hope. I see two options: either we require a kernel recent enough to contain the fix (which excludes a lot of current distro kernels), or we keep the old NAT-based code around in a separate branch for older kernels.
I think the second option is a good one, provided that we document it well and also make it easy for people to install one or the other version. (Which reminds me, we do not really offer any packages of Tunneldigger, do we?)
I agree. A secondary legacy branch should not be too hard to maintain, and it allows us to move on and start developing new features on the lean codebase.
All right, let's go for that then. I created a legacy branch. I also updated #126 to describe the new situation.
So what else should we do for that? We should probably call this out in the broker docs, describing the symptoms and linking to #126. We could also detect this problem, similar to how we already detect a packet arriving at the wrong 4-tuple socket, so that we can print a very targeted error message. However, doing that requires a loop over every tunnel for each packet that arrives at the broker. Do you think that will be a performance problem?
Of the broker? Not that I know of.
I think the README would be a good place, explaining that there are two versions and two branches.
How often? Could we throttle this somehow? In a way, this message only has to appear once in the logs to point you in the right direction.
The issue is rather the case where the error has not happened yet (e.g. because we are on a good kernel). I am not worried about performance on broken kernels; stuff won't work anyway. However, we only have to do this for packets arriving at the broker socket, not for packets arriving on an already established connection.
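To make the idea above concrete, here is a very rough sketch of what such a check could look like. All names (`warn_if_misdirected`, `tunnels`, `endpoint`) are invented for illustration and do not correspond to the actual broker code; it also warns only once per endpoint, which addresses the throttling question:

```python
# Hypothetical sketch of the proposed check; not actual Tunneldigger broker
# code, and all names are made up. It would only run for packets that arrive
# on the 2-tuple broker socket, as discussed above.
import logging

logger = logging.getLogger("tunneldigger.broker")
_warned_endpoints = set()   # throttle: warn only once per misdirected endpoint

def warn_if_misdirected(source, tunnels):
    """`source` is the packet's (ip, port); `tunnels` maps tunnel ids to
    objects with an `endpoint` (ip, port) attribute."""
    for tunnel_id, tunnel in tunnels.items():
        if tunnel.endpoint == source:
            if source not in _warned_endpoints:
                _warned_endpoints.add(source)
                logger.error(
                    "Packet from %s:%s arrived on the broker socket although "
                    "tunnel %s has a connected socket for that peer. This "
                    "usually means the kernel is missing the UDP socket "
                    "lookup fix (>= 5.2.17 / 5.3.1 / 5.4); see #126.",
                    source[0], source[1], tunnel_id,
                )
            return True
    return False
```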
I maintain Fedora packages here: https://copr.fedorainfracloud.org/coprs/heffer/tunneldigger/
Submitted PR: #135
We re-landed @kaechele's NAT removal, but things are still going very wrong: once I have more than just 1 or 2 clients (I am not sure if there is a fixed limit, but with 2 clients things still seemed to work fine), connections start to drop. The server thinks that the client timed out.
I added some extra logging, and from what I can see, incoming messages from the client often arrive at the (2-tuple) broker socket, not at the (4-tuple) tunnel socket. So despite what @kaechele said here, there does not seem to be a guarantee that messages with a matching 4-tuple socket do indeed get delivered to that socket.
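For reference, the dispatch behavior being questioned can be reproduced outside of Tunneldigger with nothing but the standard library. This is a simplified illustration, not the broker's actual socket setup (the ports and socket options here are assumptions): an unconnected "broker" socket and a connected "tunnel" socket share the same local address and port, and on a fixed kernel the client's packet should always land on the connected socket.

```python
# Standalone sketch (not Tunneldigger code): does a packet from an established
# peer reach the connected 4-tuple socket, or fall through to the unconnected
# 2-tuple "broker" socket sharing the same local address?
import socket

BROKER_ADDR = ("127.0.0.1", 8942)   # arbitrary ports, for illustration only
CLIENT_ADDR = ("127.0.0.1", 40000)

def shared_socket():
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(BROKER_ADDR)
    s.settimeout(1)
    return s

broker = shared_socket()            # 2-tuple: bound, unconnected
tunnel = shared_socket()
tunnel.connect(CLIENT_ADDR)         # 4-tuple: bound and connected to the peer

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.bind(CLIENT_ADDR)
client.sendto(b"ping", BROKER_ADDR)

for name, sock in (("tunnel", tunnel), ("broker", broker)):
    try:
        print(name, "socket received:", sock.recv(16))
    except socket.timeout:
        print(name, "socket received nothing")
# On a fixed kernel only the tunnel socket should report the packet; seeing it
# on the broker socket matches the misdispatch described above.
```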
This is on Debian buster: