Skip to content

Add option for node-cache to reload in-process CoreDNS via a signal. #689

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Michcioperz
Copy link
Contributor

When the nodelocaldns addon is applied during a cluster upgrade, there is a risk of deadlock. Consider the following run:

  1. node-local-dns ConfigMap is updated.
  2. node-cache binary receives the update, writes a new Corefile.
  3. CoreDNS's reload plugin picks up the new Corefile and begins to set up a new server instance.
  4. node-local-dns DaemonSet is updated.
  5. node-local-dns Pod receives SIGTERM with 30s grace period. It begins executing shutdown callbacks, locking instancesMu. One of the callbacks is sending to reload plugin's unbuffered exit channel to make its goroutine exit.
  6. reload plugin finishes setting up a new server instance, and tries to Stop() the original instance. This requires the instancesMu mutex. instancesMu is locked and to unlock it, reload plugin would have to return to its main loop.
  7. Grace period is exceeded and Pod is terminated without tearing down iptables rules.

This seems like a plausible root cause of #453.

Reload plugin is still supported after this change. The addon continues to work as-is, but it provides a flag-gated way to reload via SIGUSR1.

In order to enable this new behavior, remove reload directives from the Corefile template and add -reloadwithsignal to container args.

Reload with SIGUSR1 doesn't cause the deadlock, because signal processing in coredns/caddy is sequential. If SIGTERM comes during SIGUSR1 processing, it will just wait (and hopefully reload and termination will fit in 30 seconds). IF SIGUSR1 comes during SIGTERM processing, well, that's too late for it.

When the nodelocaldns addon is applied during a cluster upgrade, there is a risk of deadlock. Consider the following run:

1. node-local-dns ConfigMap is updated.
2. node-cache binary receives the update, writes a new Corefile.
3. CoreDNS's reload plugin picks up the new Corefile and begins to set up a new server instance.
4. node-local-dns DaemonSet is updated.
5. node-local-dns Pod receives SIGTERM with 30s grace period. It begins executing shutdown callbacks, locking instancesMu. One of the callbacks is sending to reload plugin's unbuffered `exit` channel to make its goroutine exit.
6. reload plugin finishes setting up a new server instance, and tries to Stop() the original instance. This requires the instancesMu mutex. instancesMu is locked and to unlock it, reload plugin would have to return to its main loop.
7. Grace period is exceeded and Pod is terminated without tearing down iptables rules.

This seems like a plausible root cause of kubernetes#453.

Reload plugin is still supported after this change.
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 19, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @Michcioperz. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Michcioperz
Once this PR has been reviewed and has the lgtm label, please assign damiansawicki for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label May 19, 2025
@johnbelamaric
Copy link
Member

/ok-to-test

Would it make any sense for reload to always work via a signal, rather than what we do today in upstream CoreDNS? If so, maybe introduce that concept in the CoreDNS repo.

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 20, 2025
@Michcioperz
Copy link
Contributor Author

@johnbelamaric Thanks for the ok-to-test.

I was hoping to raise this issue in CoreDNS, but I didn't have time to verify that this repro works on CoreDNS directly (though I don't see why it shouldn't). I'll try to find a moment tomorrow.

Personally I think reloading with signals is fine here, as we treat CoreDNS as a kind of opaque-box (we invoke its main() function in a goroutine). But I would be surprised if CoreDNS had to resort to using signals within itself, surely this can be made to make sense with mutexes? I will include your suggestion in the bug when I file it though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants