Description
What happened:
I've discovered that between the moment ingress-nginx receives a SIGTERM and the moment it stops proxying traffic, it does not update any backends (upstream IPs), even if they change in Kubernetes. This causes errors such as
[error] 28#28: *144 upstream timed out (110: Operation timed out) while connecting to upstream
because ingress-nginx can, in some cases, route to a pod IP which is no longer in use.
This is an issue when a connection to ingress-nginx is kept open and reused for multiple HTTP requests: the connection stays pinned to the ingress-nginx pod even after termination starts. If an upstream pod is replaced while the terminating ingress-nginx pod is still serving requests, the upstream IP it routes to will be stale, because it stopped updating upstream (backend) IPs when termination began.
This is more noticeable if --shutdown-grace-period is used.
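For illustration, the client side of this is ordinary HTTP keep-alive: one TCP connection to the ingress-nginx pod is reused for several requests, so it stays pinned to that pod even after the pod starts terminating. Below is a minimal Go sketch of such a client; the hostname and timings are placeholders for my test setup (in the actual reproduction further down I use hurl with --connect-to to point the hostname at the controller).

package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

func main() {
	// The default http.Client keeps idle connections alive, so every request
	// below reuses the same TCP connection to the ingress-nginx pod.
	client := &http.Client{}

	for i := 0; i < 8; i++ {
		resp, err := client.Get("http://one.wildcard.example.com/")
		if err != nil {
			fmt.Println("request failed:", err)
			return
		}
		io.Copy(io.Discard, resp.Body) // drain the body so the connection can be reused
		resp.Body.Close()
		fmt.Println("request", i, "->", resp.Status)

		// Space the requests out so that, mid-run, ingress-nginx can receive
		// SIGTERM and the backend pod can be replaced.
		time.Sleep(30 * time.Second)
	}
}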
What you expected to happen:
ingress-nginx should update backends until it stops proxying traffic.
I've discovered a fix to the problem and will have a PR. Here are details:
The root cause of the issue is a bug in how ingress-nginx handles graceful shutdown. When an ingress-nginx pod is being terminated there are several things that happen:
- The controller gets the SIGTERM and calls Stop() on the NGINXController struct. src
- Inside of Stop() the isShuttingDown bool is set to true. src
- Inside of Stop() the controller sleeps for whatever was passed to --shutdown-grace-period (350 seconds in my case). src
- Next, Stop() sends a SIGQUIT to the NGINX process and waits for it to stop before returning. src
- ingress-nginx waits one last time via --post-shutdown-grace-period and then exits.

At a cursory glance that does not present a problem, but there's one important note: as soon as isShuttingDown in step 2 is set to true, the goroutine which processes Kubernetes events to update backends stops processing events. src
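To make that interaction concrete, here is a heavily simplified sketch of the flow described above. It is not the actual controller source; the names (controller, eventLoop, the toy main) are illustrative only and real synchronization is omitted.

package main

import (
	"fmt"
	"syscall"
	"time"
)

// event stands in for a Kubernetes update (e.g. an Endpoints/EndpointSlice change).
type event struct{ backendIP string }

// controller is a toy stand-in for NGINXController.
type controller struct {
	isShuttingDown bool
	updateCh       chan event
	currentBackend string
}

// eventLoop mirrors the goroutine that turns Kubernetes events into backend
// updates: once isShuttingDown is true, events are dropped and the upstream
// IPs are frozen.
func (c *controller) eventLoop() {
	for ev := range c.updateCh {
		if c.isShuttingDown {
			continue // from step 2 onwards: updates stop, but NGINX keeps proxying
		}
		c.currentBackend = ev.backendIP
		fmt.Println("synced backends, upstream is now", ev.backendIP)
	}
}

// Stop mirrors the graceful-shutdown sequence from the list above.
func (c *controller) Stop(shutdownGracePeriod time.Duration) {
	c.isShuttingDown = true         // step 2: backend updates effectively stop here
	time.Sleep(shutdownGracePeriod) // step 3: NGINX still serves traffic with stale upstreams
	fmt.Println("sending", syscall.SIGQUIT, "to NGINX; last synced upstream was", c.currentBackend)
}

func main() {
	c := &controller{updateCh: make(chan event, 8)}
	go c.eventLoop()

	c.updateCh <- event{backendIP: "10.244.0.12"} // old httpd pod
	time.Sleep(100 * time.Millisecond)

	go c.Stop(2 * time.Second) // stand-in for --shutdown-grace-period=350

	time.Sleep(500 * time.Millisecond)
	c.updateCh <- event{backendIP: "10.244.0.15"} // new httpd pod: ignored during shutdown
	time.Sleep(3 * time.Second)
}

In other words, the event loop stops at step 2 even though NGINX only stops accepting and proxying traffic at step 4.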
NGINX Ingress controller version (exec into the pod and run /nginx-ingress-controller --version):
I'm testing in kind using the latest commit from main. The commit is 8f2593b.
-------------------------------------------------------------------------------
NGINX Ingress controller
Release: 1.0.0-dev
Build: git-8f2593bb8
Repository: https://github.com/kubernetes/ingress-nginx.git
nginx version: nginx/1.27.1
-------------------------------------------------------------------------------
Kubernetes version (use kubectl version):
Client Version: v1.32.2
Kustomize Version: v5.5.0
Server Version: v1.32.3
Environment: kind via make dev-env
- Cloud provider or hardware configuration: kind via make dev-env
- OS (e.g. from /etc/os-release): kind via make dev-env
- Kernel (e.g. uname -a): kind via make dev-env
- Install tools: kind via make dev-env
- Basic cluster related info:
kubectl version: see the "Kubernetes version" section above
kubectl get nodes -o wide:
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
ingress-nginx-dev-control-plane Ready control-plane 7h25m v1.32.3 172.18.0.2 <none> Debian GNU/Linux 12 (bookworm) 6.10.14-linuxkit containerd://2.0.3
- How was the ingress-nginx-controller installed:
make dev-env, which runs helm template. I used the following setup:
controller:
  extraArgs:
    shutdown-grace-period: 350
  image:
    repository: ${REGISTRY}/controller
    tag: ${TAG}
    digest:
  config:
    worker-processes: "1"
  podLabels:
    deploy-date: "$(date +%s)"
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  hostPort:
    enabled: true
  terminationGracePeriodSeconds: 360
  service:
    type: NodePort
- Current State of the controller:
kubectl describe ingressclasses
Name: nginx
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=ingress-nginx
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=ingress-nginx
app.kubernetes.io/part-of=ingress-nginx
app.kubernetes.io/version=1.12.1
helm.sh/chart=ingress-nginx-4.12.1
Annotations: <none>
Controller: k8s.io/ingress-nginx
Events: <none>
kubectl -n <ingresscontrollernamespace> get all -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
httpd pod/httpd-777868ddb6-m4gfb 1/1 Running 0 42m 10.244.0.12 ingress-nginx-dev-control-plane <none> <none>
ingress-nginx pod/ingress-nginx-admission-create-xrhg8 0/1 Completed 0 7h2m 10.244.0.4 ingress-nginx-dev-control-plane <none> <none>
ingress-nginx pod/ingress-nginx-admission-patch-d9ktn 0/1 Completed 1 7h2m 10.244.0.6 ingress-nginx-dev-control-plane <none> <none>
ingress-nginx pod/ingress-nginx-controller-cd664468-x5mk9 1/1 Running 0 37m 10.244.0.13 ingress-nginx-dev-control-plane <none> <none>
kube-system pod/coredns-668d6bf9bc-pxkbw 1/1 Running 0 7h2m 10.244.0.5 ingress-nginx-dev-control-plane <none> <none>
kube-system pod/coredns-668d6bf9bc-tjc6c 1/1 Running 0 7h2m 10.244.0.2 ingress-nginx-dev-control-plane <none> <none>
kube-system pod/etcd-ingress-nginx-dev-control-plane 1/1 Running 0 7h2m 172.18.0.2 ingress-nginx-dev-control-plane <none> <none>
kube-system pod/kindnet-86z7x 1/1 Running 0 7h2m 172.18.0.2 ingress-nginx-dev-control-plane <none> <none>
kube-system pod/kube-apiserver-ingress-nginx-dev-control-plane 1/1 Running 0 7h2m 172.18.0.2 ingress-nginx-dev-control-plane <none> <none>
kube-system pod/kube-controller-manager-ingress-nginx-dev-control-plane 1/1 Running 0 7h2m 172.18.0.2 ingress-nginx-dev-control-plane <none> <none>
kube-system pod/kube-proxy-97sjv 1/1 Running 0 7h2m 172.18.0.2 ingress-nginx-dev-control-plane <none> <none>
kube-system pod/kube-scheduler-ingress-nginx-dev-control-plane 1/1 Running 0 7h2m 172.18.0.2 ingress-nginx-dev-control-plane <none> <none>
local-path-storage pod/local-path-provisioner-7dc846544d-z7nb8 1/1 Running 0 7h2m 10.244.0.3 ingress-nginx-dev-control-plane <none> <none>
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 7h2m <none>
httpd service/httpd ClusterIP 10.96.198.184 <none> 80/TCP 6h54m app=httpd
ingress-nginx service/ingress-nginx-controller NodePort 10.96.120.145 <none> 80:31510/TCP,443:30212/TCP 7h2m app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
ingress-nginx service/ingress-nginx-controller-admission ClusterIP 10.96.240.122 <none> 443/TCP 7h2m app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 7h2m k8s-app=kube-dns
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
kube-system daemonset.apps/kindnet 1 1 1 1 1 kubernetes.io/os=linux 7h2m kindnet-cni docker.io/kindest/kindnetd:v20250214-acbabc1a app=kindnet
kube-system daemonset.apps/kube-proxy 1 1 1 1 1 kubernetes.io/os=linux 7h2m kube-proxy registry.k8s.io/kube-proxy:v1.32.3 k8s-app=kube-proxy
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
httpd deployment.apps/httpd 1/1 1 1 6h54m httpd httpd:alpine app=httpd
ingress-nginx deployment.apps/ingress-nginx-controller 1/1 1 1 7h2m controller us-central1-docker.pkg.dev/k8s-staging-images/ingress-nginx/controller:1.0.0-dev app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
kube-system deployment.apps/coredns 2/2 2 2 7h2m coredns registry.k8s.io/coredns/coredns:v1.11.3 k8s-app=kube-dns
local-path-storage deployment.apps/local-path-provisioner 1/1 1 1 7h2m local-path-provisioner docker.io/kindest/local-path-provisioner:v20250214-acbabc1a app=local-path-provisioner
NAMESPACE NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
httpd replicaset.apps/httpd-777868ddb6 1 1 1 42m httpd httpd:alpine app=httpd,pod-template-hash=777868ddb6
httpd replicaset.apps/httpd-798d447958 0 0 0 6h54m httpd httpd:alpine app=httpd,pod-template-hash=798d447958
ingress-nginx replicaset.apps/ingress-nginx-controller-69df88cb89 0 0 0 7h2m controller us-central1-docker.pkg.dev/k8s-staging-images/ingress-nginx/controller:1.0.0-dev app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx,pod-template-hash=69df88cb89
ingress-nginx replicaset.apps/ingress-nginx-controller-778bbb8bf5 0 0 0 46m controller us-central1-docker.pkg.dev/k8s-staging-images/ingress-nginx/controller:1.0.0-dev app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx,pod-template-hash=778bbb8bf5
ingress-nginx replicaset.apps/ingress-nginx-controller-867f4dc7b8 0 0 0 6h50m controller us-central1-docker.pkg.dev/k8s-staging-images/ingress-nginx/controller:1.0.0-dev app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx,pod-template-hash=867f4dc7b8
ingress-nginx replicaset.apps/ingress-nginx-controller-cd664468 1 1 1 44m controller us-central1-docker.pkg.dev/k8s-staging-images/ingress-nginx/controller:1.0.0-dev app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx,pod-template-hash=cd664468
kube-system replicaset.apps/coredns-668d6bf9bc 2 2 2 7h2m coredns registry.k8s.io/coredns/coredns:v1.11.3 k8s-app=kube-dns,pod-template-hash=668d6bf9bc
local-path-storage replicaset.apps/local-path-provisioner-7dc846544d 1 1 1 7h2m local-path-provisioner docker.io/kindest/local-path-provisioner:v20250214-acbabc1a app=local-path-provisioner,pod-template-hash=7dc846544d
NAMESPACE NAME STATUS COMPLETIONS DURATION AGE CONTAINERS IMAGES SELECTOR
ingress-nginx job.batch/ingress-nginx-admission-create Complete 1/1 23s 7h2m create registry.k8s.io/ingress-nginx/kube-webhook-certgen:v1.5.2@sha256:e8825994b7a2c7497375a9b945f386506ca6a3eda80b89b74ef2db743f66a5ea batch.kubernetes.io/controller-uid=ef0717de-b89d-4a52-ad10-24d10bedb23e
ingress-nginx job.batch/ingress-nginx-admission-patch Complete 1/1 24s 7h2m patch registry.k8s.io/ingress-nginx/kube-webhook-certgen:v1.5.2@sha256:e8825994b7a2c7497375a9b945f386506ca6a3eda80b89b74ef2db743f66a5ea batch.kubernetes.io/controller-uid=e57c790d-19a9-43eb-a9c0-3f603e893660
kubectl -n <ingresscontrollernamespace> describe po <ingresscontrollerpodname>
Name: ingress-nginx-controller-cd664468-x5mk9
Namespace: ingress-nginx
Priority: 0
Service Account: ingress-nginx
Node: ingress-nginx-dev-control-plane/172.18.0.2
Start Time: Mon, 14 Apr 2025 14:44:25 -0600
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=ingress-nginx
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=ingress-nginx
app.kubernetes.io/part-of=ingress-nginx
app.kubernetes.io/version=1.12.1
deploy-date=1744663026
helm.sh/chart=ingress-nginx-4.12.1
pod-template-hash=cd664468
Annotations: kubectl.kubernetes.io/restartedAt: 2025-04-14T14:35:13-06:00
Status: Running
IP: 10.244.0.13
IPs:
IP: 10.244.0.13
Controlled By: ReplicaSet/ingress-nginx-controller-cd664468
Containers:
controller:
Container ID: containerd://6d5572d8452e0054fddc759096c727017a937974423e42798e19f502b5be38fc
Image: us-central1-docker.pkg.dev/k8s-staging-images/ingress-nginx/controller:1.0.0-dev
Image ID: sha256:9c6eeb4f00fee2013f0aea8cfcefae86094763728f1a2d0b2555fc319aea2183
Ports: 80/TCP, 443/TCP, 8443/TCP
Host Ports: 80/TCP, 443/TCP, 0/TCP
SeccompProfile: RuntimeDefault
Args:
/nginx-ingress-controller
--publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
--election-id=ingress-nginx-leader
--controller-class=k8s.io/ingress-nginx
--ingress-class=nginx
--configmap=$(POD_NAMESPACE)/ingress-nginx-controller
--validating-webhook=:8443
--validating-webhook-certificate=/usr/local/certificates/cert
--validating-webhook-key=/usr/local/certificates/key
--shutdown-grace-period=350
State: Running
Started: Mon, 14 Apr 2025 14:44:26 -0600
Ready: True
Restart Count: 0
Requests:
cpu: 100m
memory: 90Mi
Liveness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
Readiness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
Environment:
POD_NAME: ingress-nginx-controller-cd664468-x5mk9 (v1:metadata.name)
POD_NAMESPACE: ingress-nginx (v1:metadata.namespace)
LD_PRELOAD: /usr/local/lib/libmimalloc.so
Mounts:
/usr/local/certificates/ from webhook-cert (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-k2ltk (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
webhook-cert:
Type: Secret (a volume populated by a Secret)
SecretName: ingress-nginx-admission
Optional: false
kube-api-access-k2ltk:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 39m default-scheduler 0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
Normal Scheduled 38m default-scheduler Successfully assigned ingress-nginx/ingress-nginx-controller-cd664468-x5mk9 to ingress-nginx-dev-control-plane
Normal Pulled 38m kubelet Container image "us-central1-docker.pkg.dev/k8s-staging-images/ingress-nginx/controller:1.0.0-dev" already present on machine
Normal Created 38m kubelet Created container: controller
Normal Started 38m kubelet Started container controller
Normal RELOAD 38m nginx-ingress-controller NGINX reload triggered due to a change in configuration
kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>
Name: ingress-nginx-controller
Namespace: ingress-nginx
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=ingress-nginx
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=ingress-nginx
app.kubernetes.io/part-of=ingress-nginx
app.kubernetes.io/version=1.12.1
helm.sh/chart=ingress-nginx-4.12.1
Annotations: <none>
Selector: app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
Type: NodePort
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.96.120.145
IPs: 10.96.120.145
Port: http 80/TCP
TargetPort: http/TCP
NodePort: http 31510/TCP
Endpoints: 10.244.0.13:80
Port: https 443/TCP
TargetPort: https/TCP
NodePort: https 30212/TCP
Endpoints: 10.244.0.13:443
Session Affinity: None
External Traffic Policy: Cluster
Internal Traffic Policy: Cluster
Events: <none>
- Current state of ingress object, if applicable:
kubectl -n <appnamespace> get all,ing -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/httpd-777868ddb6-m4gfb 1/1 Running 0 44m 10.244.0.12 ingress-nginx-dev-control-plane <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/httpd ClusterIP 10.96.198.184 <none> 80/TCP 6h57m app=httpd
NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
deployment.apps/httpd 1/1 1 1 6h57m httpd httpd:alpine app=httpd
NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
replicaset.apps/httpd-777868ddb6 1 1 1 44m httpd httpd:alpine app=httpd,pod-template-hash=777868ddb6
replicaset.apps/httpd-798d447958 0 0 0 6h57m httpd httpd:alpine app=httpd,pod-template-hash=798d447958
NAME CLASS HOSTS ADDRESS PORTS AGE
ingress.networking.k8s.io/wildcard nginx *.wildcard.example.com 10.96.120.145 80 6h57m
kubectl -n <appnamespace> describe ing <ingressname>
Name: wildcard
Labels: <none>
Namespace: httpd
Address: 10.96.120.145
Ingress Class: nginx
Default backend: <default>
Rules:
Host Path Backends
---- ---- --------
*.wildcard.example.com
/ httpd:80 (10.244.0.12:80)
Annotations: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Sync 49m nginx-ingress-controller Scheduled for sync
Normal Sync 46m (x2 over 47m) nginx-ingress-controller Scheduled for sync
Normal Sync 40m nginx-ingress-controller Scheduled for sync
- If applicable, your complete and exact curl/grpcurl command (redacted if required) and the response to the curl/grpcurl command with the -v flag:
I used hurl to simulate the re-use of a connection to send multiple requests. This is the config file:
GET http://one.wildcard.example.com
HTTP 200
GET http://one.wildcard.example.com
[Options]
delay: 10s
HTTP 200
GET http://one.wildcard.example.com
[Options]
delay: 20s
HTTP 200
GET http://one.wildcard.example.com
[Options]
delay: 30s
HTTP 200
GET http://one.wildcard.example.com
[Options]
delay: 40s
HTTP 200
GET http://one.wildcard.example.com
[Options]
delay: 60s
HTTP 200
GET http://one.wildcard.example.com
[Options]
delay: 90s
HTTP 200
GET http://one.wildcard.example.com
[Options]
delay: 120s
HTTP 200
How to reproduce this issue:
Setup
- Set up the cluster with make dev-env. Make sure controller.extraArgs.shutdown-grace-period is set to 350 and controller.terminationGracePeriodSeconds is set to 360. Any time period will work; you just need the pod to stay around long enough to show the issue.
- kubectl create namespace httpd
- kubectl create deployment httpd -n httpd --image=httpd:alpine
- kubectl expose deployment -n httpd httpd --port 80
- kubectl -n httpd create ingress wildcard --class nginx --rule "*.wildcard.example.com/*"=httpd:80

You now have a working cluster. Hit it with curl --connect-to ::127.0.0.1: "http://one.wildcard.example.com" to see that it works.
Set up a file graceful_shutdown.hurl with the following contents:
GET http://one.wildcard.example.com
HTTP 200
GET http://one.wildcard.example.com
[Options]
delay: 10s
HTTP 200
GET http://one.wildcard.example.com
[Options]
delay: 20s
HTTP 200
GET http://one.wildcard.example.com
[Options]
delay: 30s
HTTP 200
GET http://one.wildcard.example.com
[Options]
delay: 40s
HTTP 200
GET http://one.wildcard.example.com
[Options]
delay: 60s
HTTP 200
GET http://one.wildcard.example.com
[Options]
delay: 90s
HTTP 200
GET http://one.wildcard.example.com
[Options]
delay: 120s
HTTP 200
Reproduce the Issue
- In one terminal, run hurl --verbose --connect-to ::127.0.0.1: graceful_shutdown.hurl to start the connection.
- In a new terminal, follow the logs of the current ingress-nginx pod: kubectl logs ingress-nginx-controller-cd664468-x5mk9 -f
- In a new terminal, rollout restart ingress-nginx: kubectl rollout restart deploy/ingress-nginx-controller -n ingress-nginx
- Observe in the logs of the controller pod: Received SIGTERM, shutting down
- Wait for a request from hurl to go through and observe the log line in the terminating controller pod:
172.18.0.1 - - [14/Apr/2025:21:32:50 +0000] "GET / HTTP/1.1" 200 45 "-" "hurl/6.0.0" 87 0.002 [httpd-httpd-80] [] 10.244.0.12:80 45 0.002 200 33b4e103ac3ac7cf0955ba9a47f138bb
- Get the IP of the current httpd pod: kubectl get pod -n httpd -o wide (10.244.0.12 in my case)
- Rollout restart httpd in order to get a new pod IP: kubectl rollout restart deploy/httpd -n httpd
- Get the IP of the new httpd pod: kubectl get pod -n httpd -o wide (10.244.0.15 in my case)
- Observe errors in the controller pod:
2025/04/14 21:33:25 [error] 81#81: *9182 upstream timed out (110: Operation timed out) while connecting to upstream, client: 172.18.0.1, server: ~^(?<subdomain>[\w-]+)\.wildcard\.example\.com$, request: "GET / HTTP/1.1", upstream: "http://10.244.0.12:80/", host: "one.wildcard.example.com"
2025/04/14 21:33:30 [error] 81#81: *9182 upstream timed out (110: Operation timed out) while connecting to upstream, client: 172.18.0.1, server: ~^(?<subdomain>[\w-]+)\.wildcard\.example\.com$, request: "GET / HTTP/1.1", upstream: "http://10.244.0.12:80/", host: "one.wildcard.example.com"
2025/04/14 21:33:35 [error] 81#81: *9182 upstream timed out (110: Operation timed out) while connecting to upstream, client: 172.18.0.1, server: ~^(?<subdomain>[\w-]+)\.wildcard\.example\.com$, request: "GET / HTTP/1.1", upstream: "http://10.244.0.12:80/", host: "one.wildcard.example.com"
- Note that the IP is that of the old httpd pod (10.244.0.12) and not of the new one (10.244.0.15).
Activity
k8s-ci-robot commented on Apr 14, 2025
This issue is currently awaiting triage.
If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
grounded042 commented on Apr 14, 2025
Draft PR of the fix: #13216
longwuyuan commented on Apr 17, 2025
@grounded042 thank you very much for the detailed issue description. It really helps a reader to look at possible action items coming out of this issue.
On a tangent, please help explain a few aspects of your test.
In addition to the above factors, and in the context of that pod IP address being fetched from the EndpointSlice, do you expect no disruption?
grounded042 commented on Apr 18, 2025
Hey @longwuyuan, thanks for following up.
When I've encountered this issue it has been due either to scaling in, where some pods will be terminated, or to upgrading the ingress-nginx version, where all pods must be replaced. To simulate these scenarios I did a rolling restart.
Similar to the above, this is to simulate the app getting a new version and having all pods replaced.
I did not use hostPort for any particular reason; it is what was in the make dev-env script.
Correct, I expect that as Pod IPs change, ingress-nginx will handle those changes until it sends SIGQUIT to NGINX.
longwuyuan commented on Apr 20, 2025
Concurrent restarts of the controller as well as the backend pods causing disruption is expected behaviour, and I don't think the project will make changes to support this test case.
longwuyuan commented on Apr 20, 2025
But wait for other comments
longwuyuan commented on Apr 20, 2025
/remove-kind bug
/kind feature
grounded042 commented on Apr 22, 2025
The controller and backend pods could restart concurrently at any point due to scheduling / scaling / etc. in Kubernetes. Expecting the controller to keep pod IPs up to date until NGINX starts rejecting new connections after SIGQUIT seems like it would be table stakes.
github-actions commented on Jun 10, 2025
This is stale, but we won't close it automatically, just bear in mind the maintainers may be busy with other tasks and will reach your issue ASAP. If you have any question or request to prioritize this, please reach
#ingress-nginx-dev
on Kubernetes Slack.