
[Bug]: VirtualServer/VirtualServerRoute flapping status #7491

Open
sass1997 opened this issue Mar 11, 2025 · 2 comments
Labels
bug (An issue reporting a potential bug) · waiting for response (Waiting for author's response)

Comments

@sass1997

Version

3.7.0

What Kubernetes platforms are you running on?

Kind

Steps to reproduce

  1. Deploy some VirtualServers and VirtualServerRoutes -> the exact combination doesn't really matter
  2. Do a rollout restart of the NGINX Ingress Controller -> deployed with 3 replicas (rough commands are sketched after this list)
  3. Instead of a rollout restart, the status sometimes also flaps if you just kill 1 of the 3 pods
  4. After a restart we intermittently observe that some VirtualServers and VirtualServerRoutes have their status change to Warning or lose their status information entirely. It doesn't happen every time, but the probability is high, and there is no guarantee which VirtualServer or VirtualServerRoute statuses will change.
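
For reference, this is roughly how I reproduce it; the deployment name and namespace are placeholders based on a typical install, so adjust as needed:

```
# Rollout restart of the controller (deployment/namespace names are assumptions)
kubectl -n nginx-ingress rollout restart deployment nginx-ingress

# Or, instead of a full rollout restart, kill just one of the three pods
kubectl -n nginx-ingress delete pod <one-of-the-controller-pods>

# Watch the VirtualServer / VirtualServerRoute statuses flap afterwards
kubectl get virtualservers,virtualserverroutes -A -w
```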

The only way I've found to fix it:

  • Each VirtualServer and VirtualServerRoute that ends up with this new "wrong" status has to be deleted and re-applied (commands sketched below). Sometimes I don't even have to delete all of them before the rest update correctly again, e.g. after deleting 3 of 8 the remaining 5 also got updated. Here too I can't see any schema or pattern for how to force it.
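
The clean-up I do today looks like this; the file and resource names are placeholders for the affected manifests:

```
# Delete and re-apply the affected resources so the controller re-processes them
kubectl delete -f affected-virtual-server.yaml
kubectl apply -f affected-virtual-server.yaml

# The status is written again once the resource has been re-admitted
kubectl get virtualserver <name> -n <namespace>
```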

It's important to note that the VirtualServers and VirtualServerRoutes keep working as they should, even when the status is Warning or empty.

In my opinion this status handling is really flaky, and deleting and re-applying means a little downtime each time, which doesn't make sense.

Is there maybe an endpoint where I can trigger the Ingress Controller to update the VirtualServer status, the same way it does when I delete and re-apply the manifests?
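
Something lightweight like the following is what I have in mind; this is untested and the annotation key is arbitrary, the idea is just to make the controller see an update event without deleting the resource:

```
# Untested idea: bump an arbitrary annotation so the controller processes an update
kubectl annotate virtualserver <name> -n <namespace> example.com/force-resync="$(date +%s)" --overwrite
```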

sass1997 added the bug and needs triage labels on Mar 11, 2025

Hi @sass1997 thanks for reporting!

Be sure to check out the docs and the Contributing Guidelines while you wait for a human to take a look at this 🙂

Cheers!

@vepatel
Contributor

vepatel commented Mar 11, 2025

Hi @sass1997, the behaviour you're seeing is due to the batch reloading implemented in NIC, see https://docs.nginx.com/nginx-ingress-controller/overview/design/#when-nginx-ingress-controller-reloads-nginx
Resources that aren't included in the first batch after a restart show the wrong status; once they're included in a subsequent batch (e.g. after deleting 3 of 8, the remaining 5 also got updated), the correct status is reinstated.

It's important to note that the VirtualServers and VirtualServerRoutes keep working as they should, even when the status is Warning or empty.

This is because the nginx config is still correct. You can minimise the window by tuning the nginx-reload-timeout (see the sketch below). Hope this helps!
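
If it helps, the timeout is set via the controller's -nginx-reload-timeout command-line argument (value in milliseconds); the patch below is only a sketch, and the deployment name, namespace, container index and value are assumptions:

```
# Sketch: append -nginx-reload-timeout to the controller args
# (deployment/namespace names, container index and the 120000 value are assumptions)
kubectl -n nginx-ingress patch deployment nginx-ingress --type=json \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"-nginx-reload-timeout=120000"}]'
```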

vepatel added the waiting for response label and removed the needs triage label on Mar 11, 2025