
Weird issue with Cilium + KubeSpan + BPF Masquerade #11235

Open
@stevefan1999-personal

Description


I am currently operating a Talos Linux-based Kubernetes cluster deployed across multiple public Virtual Private Servers (VPSes). The cluster architecture consists of nodes distributed across different network environments with varying connectivity requirements.

Network Architecture

The majority of my cluster nodes are hosted on VPSes with publicly accessible IP addresses, which allows me to leverage GENEVE (Generic Network Virtualization Encapsulation) tunneling for overlay networking between these nodes. GENEVE provides efficient encapsulation for the cluster's pod-to-pod communication across the public internet.

However, I have a specific requirement to extend my cluster to include nodes within my home network environment, which sits behind NAT and lacks direct public IP accessibility. For this purpose, I need to utilize KubeSpan, Talos's built-in WireGuard-based mesh networking solution, to establish secure connectivity between my home network nodes and the public VPS nodes.
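For reference, KubeSpan is enabled through the Talos machine configuration. A minimal sketch, assuming talosctl is already configured for the home node (the patch file name and node address are placeholders, not taken from this issue):

```bash
# Enable KubeSpan (and the discovery service it relies on) for a node behind NAT.
cat > kubespan-patch.yaml <<'EOF'
machine:
  network:
    kubespan:
      enabled: true
cluster:
  discovery:
    enabled: true
EOF
talosctl --nodes <home-node-ip> patch machineconfig --patch @kubespan-patch.yaml
```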

Cilium CNI Configuration

For the cluster's Container Network Interface (CNI), I have deployed Cilium version 1.17.4, configured to maximize eBPF (extended Berkeley Packet Filter) performance. My Cilium configuration includes the following (a Helm sketch follows the list):

  • Native eBPF datapath for high-performance packet processing
  • eBPF-based kube-proxy replacement for more efficient service load balancing
  • Direct routing mode where applicable for reduced overhead
  • BPF-based masquerading for SNAT operations
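A hedged sketch of a Helm invocation matching the settings above (the chart version is the one from this issue; the API server address is a placeholder, and Talos-specific securityContext/cgroup values are omitted for brevity):

```bash
# routingMode/tunnelProtocol reflect the GENEVE overlay described above;
# kubeProxyReplacement and bpf.masquerade enable the eBPF features in question.
helm repo add cilium https://helm.cilium.io/
helm upgrade --install cilium cilium/cilium --version 1.17.4 \
  --namespace kube-system \
  --set kubeProxyReplacement=true \
  --set bpf.masquerade=true \
  --set routingMode=tunnel \
  --set tunnelProtocol=geneve \
  --set ipam.mode=kubernetes \
  --set k8sServiceHost=<api-server-ip> \
  --set k8sServicePort=6443
```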

The Critical Issue: BPF Masquerade and KubeSpan Incompatibility

I have encountered a significant compatibility issue between Cilium's BPF Masquerade feature and KubeSpan networking. The symptoms are particularly perplexing (reproduction commands follow the list):

  1. Address Resolution Works: I can successfully resolve and look up addresses through Cilium's internal mechanisms
  2. ICMP Connectivity Functions: Ping tests to the affected endpoints complete successfully, indicating basic L3 connectivity
  3. TCP Connections Fail: Despite working ICMP, I cannot establish any TCP socket connections to these endpoints
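To make the symptom concrete, it reproduces roughly like this from a node or pod on the KubeSpan side (addresses are placeholders):

```bash
# L3 works: ICMP to the remote endpoint succeeds
ping -c 3 <remote-endpoint-ip>
# L4 fails: TCP connections to the same endpoint hang and time out
nc -vz -w 5 <remote-endpoint-ip> 6443
curl -vk --max-time 10 https://<remote-endpoint-ip>:6443/healthz
```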

Troubleshooting and Impact Assessment

Through systematic debugging, I have narrowed the issue down to the transport layer (L4); example drop-tracing commands follow the list below. The most critical impact of this problem manifests in:

  • Kubernetes API server communication failures: All requests to the Kubernetes API endpoints time out. This is not limited to the Service IP; curling the endpoint's underlying public IP directly also hangs and eventually fails.
  • Complete cluster management breakdown: Unable to perform kubectl operations or any API-dependent tasks
  • Service mesh disruption: Inter-service TCP communications fail when traversing KubeSpan links
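Drops along the path can be traced with commands along these lines (a sketch; assumes Hubble is enabled, and that cilium-dbg is the in-pod debug CLI, as in recent Cilium releases):

```bash
# Watch for datapath drops while reproducing the failing TCP connection
kubectl -n kube-system exec ds/cilium -- cilium-dbg monitor --type drop
# Inspect the BPF NAT table that BPF Masquerade populates
kubectl -n kube-system exec ds/cilium -- cilium-dbg bpf nat list | head -n 20
# Flow-level view of dropped TCP packets (requires Hubble and a relay port-forward)
hubble observe --verdict DROPPED --protocol TCP --last 50
```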

The only workaround I've discovered is to completely disable BPF Masquerade in the Cilium configuration, which allows traffic to flow normally but sacrifices the performance benefits of eBPF-based NAT. Worse, I need the Egress Gateway feature later on, which depends on BPF Masquerade, so this workaround leaves me at an impasse.
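For completeness, the workaround amounts to a single Helm value change, sketched below. (Cilium's documentation lists bpf.masquerade=true as a requirement for Egress Gateway, which is what creates the impasse.)

```bash
# Disable BPF Masquerade, falling back to iptables-based masquerading
helm upgrade cilium cilium/cilium -n kube-system --reuse-values \
  --set bpf.masquerade=false
```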

Current Remediation Efforts

To address this issue, I am pursuing the following strategy (a command sketch follows these steps):

  1. Clean Cilium State: Performing a complete purge of all Cilium-related state data, including:

    • eBPF maps and programs
    • Cilium etcd/CRD state
    • CNI configuration remnants
  2. Version Downgrade: Planning to downgrade Cilium to an earlier version (considering 1.16.x or 1.15.x series) to determine if this is a regression introduced in newer releases

  3. Configuration Validation: Reviewing all Cilium and KubeSpan configuration parameters to identify any potential conflicts or misconfigurations
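A sketch of how steps 1 and 2 could be carried out (resource names and the target 1.16.x release are illustrative, not taken from this issue):

```bash
# 1. Purge Cilium and its leftover state
helm uninstall cilium -n kube-system
kubectl delete ciliumnodes,ciliumidentities --all
kubectl delete ciliumendpoints --all --all-namespaces
# Rebooting each node clears pinned eBPF maps/programs under /sys/fs/bpf
talosctl --nodes <node-ip> reboot

# 2. Reinstall at an earlier minor release to test for a regression
helm install cilium cilium/cilium -n kube-system --version <1.16.x> -f values.yaml
```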

This issue appears to be a complex interaction between Cilium's eBPF-based networking stack and Talos's WireGuard-based KubeSpan overlay, specifically affecting TCP state tracking or NAT translation when packets traverse both networking layers. In addition, enabling bpf.hostLegacyRouting seems to work around the problem when running over KubeSpan.
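That last observation can be expressed as a Helm value as well; a sketch that keeps BPF Masquerade enabled but sends host traffic through the legacy (non-BPF) routing path:

```bash
helm upgrade cilium cilium/cilium -n kube-system --reuse-values \
  --set bpf.hostLegacyRouting=true
```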
