Description
I deployed Kubernetes with Calico v3.30 in a cluster that includes a Windows node, using BGP. On the Linux nodes, the /var/run/calico/ directory is created automatically and populated with Calico's internal state.
On the Windows node, that directory is not created automatically, yet confd complains that it cannot find it.
Expected Behavior
Everything works well, no errors in the logs
Current Behavior
The confd log prints the following lines every 30s:
2025-06-25 08:59:13.306 [ERROR][3008] confd/status_file_watcher.go 118: Error adding directory to fsnotify. error=GetFileAttributes: The system cannot find the path specified.
2025-06-25 08:59:13.306 [INFO][3008] confd/status_file_watcher.go 185: Error initializing fsnotify. Falling back to polling. error=GetFileAttributes: The system cannot find the path specified.
2025-06-25 08:59:13.306 [ERROR][3008] confd/status_file_watcher.go 222: Error reading directory error=open \var\run\calico\endpoint-status: The system cannot find the path specified.
If the directory is never created, confd eventually crashes with:
runtime: program exceeds 10000-thread limit
fatal error: thread exhaustion
Workaround
Manually create the directory \var\run\calico\endpoint-status. That stops the error and instead we get:
2025-06-25 10:21:34.508 [INFO][4112] confd/status_file_watcher.go 123: Started watching directory via fsnotify. dir="\\var\\run\\calico\\endpoint-status"
However, two caveats:
1 - That directory is always empty, even after creating pods on the Windows node.
2 - There is a related Felix config parameter, endpointStatusPathPrefix, which defaults to /var/run/calico. If I manually create only that directory, confd keeps complaining; it needs the endpoint-status subdirectory too.
Possible Solution
These three facts:
1 - the directory /var/run/calico/endpoint-status is not automatically created on Windows nodes
2 - when we manually create /var/run/calico/endpoint-status, it remains empty
3 - when deploying with VXLAN encapsulation, no component complains (confd is not deployed in this scenario)
make me suspect that /var/run/calico/ is not really needed/supported on Windows. Therefore, this might be a bug in the confd code, which should not require the directory to be present on Windows.
Steps to Reproduce (for bugs)
1. Deploy Calico in a Kubernetes cluster with Windows nodes using BGP
2. Check the confd logs
Context
If the directory is not created manually, confd eventually crashes.
Your Environment
- Calico version 3.30
- Calico dataplane (iptables, windows etc.)
- Orchestrator version (e.g. kubernetes, mesos, rkt): rke2 (kubernetes)
- Operating System and version: windows server 2022