Confd problems in Windows #10596

Open
@manuelbuil

Description

I deployed Kubernetes with Calico v3.30 in a cluster that includes a Windows node, using BGP. On the Linux nodes, I can see the /var/run/calico/ directory being created automatically and populated with Calico's internal state.

On Windows, that directory is not created automatically, yet confd complains that it cannot find it.

Expected Behavior

Everything works, with no errors in the logs.

Current Behavior

The confd log prints the following lines every 30s:

2025-06-25 08:59:13.306 [ERROR][3008] confd/status_file_watcher.go 118: Error adding directory to fsnotify. error=GetFileAttributes: The system cannot find the path specified.
2025-06-25 08:59:13.306 [INFO][3008] confd/status_file_watcher.go 185: Error initializing fsnotify. Falling back to polling. error=GetFileAttributes: The system cannot find the path specified.
2025-06-25 08:59:13.306 [ERROR][3008] confd/status_file_watcher.go 222: Error reading directory error=open \var\run\calico\endpoint-status: The system cannot find the path specified.

If the missing directory is never created, confd eventually crashes with:

runtime: program exceeds 10000-thread limit
fatal error: thread exhaustion

Workaround

Manually create the directory \var\run\calico\endpoint-status. That stops the errors, and confd logs this instead:

2025-06-25 10:21:34.508 [INFO][4112] confd/status_file_watcher.go 123: Started watching directory via fsnotify. dir="\\var\\run\\calico\\endpoint-status"

However, there are two caveats:
1 - That directory always stays empty, even after creating pods on the Windows node.
2 - There is a related Felix configuration parameter, endpointStatusPathPrefix, which defaults to /var/run/calico. Creating only that directory is not enough: confd keeps complaining until the endpoint-status subdirectory exists too.

Possible Solution

These three facts:
1 - the directory /var/run/calico/endpoint-status is not created automatically on Windows nodes
2 - when /var/run/calico/endpoint-status is created manually, it remains empty
3 - when deploying with VXLAN encapsulation, no component complains (confd is not deployed in that scenario)

make me suspect that /var/run/calico/ is not really needed or supported on Windows. If so, this might be a bug in confd, which should not require the directory to be present on Windows.
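If that suspicion is right, one possible shape for a fix is to skip the endpoint-status watcher on Windows altogether. The sketch below only illustrates that idea: `shouldWatchEndpointStatus` is a hypothetical function, not an existing confd API, and whether confd can safely run without the watcher on Windows is an assumption that the maintainers would need to confirm.

```go
package main

import (
	"fmt"
	"runtime"
)

// shouldWatchEndpointStatus is a hypothetical guard (not confd code).
// It reports whether the endpoint-status directory watcher should be
// started for the given operating system name, under the assumption
// that the directory is never populated on Windows.
func shouldWatchEndpointStatus(goos string) bool {
	return goos != "windows"
}

func main() {
	// runtime.GOOS is the operating system the binary was built for.
	if shouldWatchEndpointStatus(runtime.GOOS) {
		fmt.Println("starting endpoint-status watcher")
	} else {
		fmt.Println("skipping endpoint-status watcher on Windows")
	}
}
```

Passing the OS name as a parameter instead of reading runtime.GOOS inside the function keeps the guard testable on any platform.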

Steps to Reproduce (for bugs)

1. Deploy Calico in a Kubernetes cluster with Windows nodes using BGP
2. Check the confd logs

Context

If not fixed manually, confd eventually crashes.

Your Environment

  • Calico version 3.30
  • Calico dataplane (iptables, windows etc.)
  • Orchestrator version (e.g. kubernetes, mesos, rkt): RKE2 (Kubernetes)
  • Operating System and version: Windows Server 2022
  • Link to your project (optional):
