
Can't establish direct connection to a Kubernetes pod despite having a direct connection to its node (Flannel CNI) #11427

Open
LiquidPL opened this issue Mar 15, 2024 · 9 comments

Comments

@LiquidPL

LiquidPL commented Mar 15, 2024

What is the issue?

I have configured the Tailscale Kubernetes operator on my cluster so that I can expose my services to the tailnet. It works and I can reach the services, however I am unable to establish a direct connection to the tailnet node that represents my exposed K8s pod. The Kubernetes node running said pod, however, can establish a direct connection without any issues.

tailscale status output (with irrelevant nodes and public IP addresses omitted):

[liquid@liquid-desktop ~]$ tailscale status
# control plane node
100.64.193.4    control01            tagged-devices linux   active; direct <home connection ip>:14309, tx 652432 rx 6276240
# subnet router I use for direct access
100.116.6.41    homeassistant        <username>@ linux   active; offers exit node; direct <home connection ip>:51069, tx 1644892 rx 24845572
# exposed Traefik pod, connects over the waw relay despite its k8s node having a direct connection
100.83.180.32   network-rp           tagged-devices linux   active; relay "waw", tx 430720 rx 10129968
# worker node
100.76.232.98   node01               tagged-devices linux   active; direct <home connection ip>:49077, tx 26756 rx 43212
100.127.193.3   tailscale-operator   tagged-devices linux   -

Steps to reproduce

  1. Install the Kubernetes operator on a new K3s cluster.
  2. Expose a pod to the tailnet (see the sketch below).
  3. Connect to said pod over Tailscale.
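
For reference, step 2 boils down to something like the sketch below (the annotation and loadBalancerClass are the ones documented for the Tailscale Kubernetes operator; the service name and namespace are just placeholders):

# expose an existing Service to the tailnet via the operator annotation
kubectl annotate service traefik -n kube-system tailscale.com/expose=true

# or, equivalently, have the operator front the Service as a Tailscale LoadBalancer
kubectl patch service traefik -n kube-system \
  -p '{"spec": {"type": "LoadBalancer", "loadBalancerClass": "tailscale"}}'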

Are there any recent changes that introduced the issue?

No response

OS

Linux

OS version

Latest Fedora CoreOS on the k8s nodes, Arch Linux on the desktop

Tailscale version

1.62.0 on both desktop and the k8s node

Other software

Standard K3s installation with (default) Flannel networking on the cluster

Bug report

BUG-451fb6ee26ceb7c75ed1267cd48d26d4a43de5259c2e6db1ff0cb06353ea0d8e-20240315170435Z-3c58dcacadf8a4b7

@irbekrm
Contributor

irbekrm commented Mar 15, 2024

Hi, thanks for creating the issue.

A known reason why it is not possible to get direct connections on some Kubernetes configurations is that the CNI enforces source port randomization.
I am not very familiar with Flannel, but it appears that they added support for randomization a while ago: flannel-io/flannel#1004

You could verify that it is on by grepping for random-fully in your iptables rules on the nodes. If there is a SNAT rule with --random-fully that applies to traffic originating from Pods and going out to the internet, that would likely be the reason why tailscale running in that Pod cannot get direct connections.

We do need to document this behaviour and/or whether it can be turned off for different CNIs.
It might be possible to turn this off for Flannel.

See also #3822

@LiquidPL
Author

Yeah, I have found rules like that in my iptables:

root@control01:~# iptables-save | grep random-fully
-A FLANNEL-POSTRTG -s 10.42.0.0/16 ! -d 224.0.0.0/4 -m comment --comment "flanneld masq" -j MASQUERADE --random-fully
-A FLANNEL-POSTRTG ! -s 10.42.0.0/16 -d 10.42.0.0/16 -m comment --comment "flanneld masq" -j MASQUERADE --random-fully
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -j MASQUERADE --random-fully

I'll try removing the --random-fully from these rules and see what happens.
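
Something along these lines (a rough sketch; it may well be that flannel just rewrites its rules on its next sync):

# rewrite the nat table with the --random-fully flag stripped from every rule
iptables-save -t nat | sed 's/ --random-fully//g' | iptables-restore -T nat

# confirm nothing matches anymore
iptables-save -t nat | grep -c -- --random-fully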

Additionally, I've looked into how Flannel builds its iptables rules, and it appears that it will automatically enable port randomization if it finds that iptables supports it. So I'm not sure if there is a way to easily disable it, and I'm also not sure if it's a good thing to completely disable it either (although I am not a network person so I dunno).

@irbekrm
Contributor

irbekrm commented Mar 15, 2024

Additionally, I've looked into how Flannel builds its iptables rules, and it appears that it will automatically enable port randomization if it finds that iptables supports it. So I'm not sure if there is a way to easily disable it, and I'm also not sure if it's a good thing to completely disable it either (although I am not a network person so I dunno).

Thank you for taking a look - it does indeed seem like they have it hardcoded.

Is the client that you are connecting from also behind hard NAT?

@LiquidPL
Author

Yeah, most likely - it's a student dorm network, and while I have no idea how to properly quantify how hard its NAT is, I remember I've always had issues connecting to P2P games on a Nintendo Switch, so there's certainly some stuff going on there.

@irbekrm
Contributor

irbekrm commented Mar 16, 2024

how to properly quantify how hard its NAT is

If you take a look at your client in the Tailscale Machines panel, there is a field named varies; if that's set to true, you're behind what we consider hard NAT.
However, looking at your debug logs, that actually doesn't seem to be the case.
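
You can also check from the client itself with tailscale netcheck; the MappingVariesByDestIP line corresponds to that field (output trimmed, exact wording may differ between versions):

$ tailscale netcheck

Report:
    * UDP: true
    * MappingVariesByDestIP: false
    ...

false means the NAT mapping stays the same regardless of destination, i.e. not hard NAT.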

@LiquidPL
Author

I have tried manually editing the relevant iptables rules but either I'm doing something wrong and these are not applied, or flannel just keeps reapplying them, because I am still unable to get a direct connection. I guess I will have to manually patch flannel itself and try again.

Though I suppose this should be reported upstream as well.

@LiquidPL
Author

LiquidPL commented Mar 26, 2024

Small update: I have switched to Calico for networking on my cluster, since it has a way to disable source port randomization: you can configure Felix (Calico's node agent) to override iptables feature detection and make it think randomization is not supported.

The relevant bits of documentation can be found here: https://docs.tigera.io/calico/latest/reference/felix/configuration. The config key is some variation of featureDetectOverride, depending on how Felix is configured (env vars, config file, the Calico k8s operator, etc.), and the specific feature flag is MASQFullyRandom.
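
For anyone else doing this, it roughly comes down to something like the following (a sketch; it assumes Calico is using the Kubernetes datastore so the FelixConfiguration CRD can be patched with kubectl, otherwise the same patch can be applied with calicoctl):

# tell Felix to treat MASQ --random-fully as unsupported
kubectl patch felixconfiguration default --type merge \
  -p '{"spec":{"featureDetectOverride":"MASQFullyRandom=false"}}'

# verify the override is in place
kubectl get felixconfiguration default -o jsonpath='{.spec.featureDetectOverride}'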

With this set, I can now directly connect to my pods, and iptables only reports a single rule with --random-fully, presumably added by Kubernetes itself:

root@control01:~# iptables-save | grep random-fully
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -j MASQUERADE --random-fully

I'm guessing if someone removed the rules created by Flannel while also preventing it from adding them back, it would have the same effect.

Should I close this issue now? My issue is fixed but I suppose it could also track any changes to the docs.

@irbekrm
Contributor

irbekrm commented Mar 27, 2024

Hi @LiquidPL, thank you very much for the confirmation that you now get direct connections with Calico.

Should I close this issue now? My issue is fixed but I suppose it could also track any changes to the docs.

I think we could leave this open - as you say, it would be great to update the docs, and I would also like to reach out to the Flannel folks and see if they might be willing to make the port randomization for external connections configurable.

@irbekrm irbekrm changed the title Can't establish direct connection to a Kubernetes pod despite having a direct connection to its node Can't establish direct connection to a Kubernetes pod despite having a direct connection to its node (Flannel CNI) Mar 27, 2024
@kevinvalk

Hi @LiquidPL, thank you very much for the confirmation that you now get direct connections with Calico.

Should I close this issue now? My issue is fixed but I suppose it could also track any changes to the docs.

I think we could leave this open - as you say, it would be great to update the docs, and I would also like to reach out to the Flannel folks and see if they might be willing to make the port randomization for external connections configurable.

Did you reach out to Flannel folks about this? I am definitely interested as well!
