Feature request: withdraw advertisements on shutdown #27

Open
and0x000 opened this issue Oct 19, 2021 · 4 comments · May be fixed by #28

Comments

@and0x000

Is there a way to tell anycast-healthchecker to withdraw all announcements on a clean shutdown? Similar to purge_ip_prefixes but on exit?

My scenario is that I want to be able to perform some maintenance without interrupting any service too much.
Route announcements may take some time to converge on the routers. So if I shut down a health-checked service, it takes a few seconds before the healthchecker notices that the service is unavailable, and then some more time until traffic no longer hits the system.
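
To make the timing concern concrete, here is a rough back-of-the-envelope sketch. The parameter names mirror the healthchecker's check_interval, check_fail and check_timeout settings; the formula and the sample numbers are illustrative assumptions, not measurements and not the daemon's actual scheduling logic.

```python
def worst_case_window(check_interval, check_fail, check_timeout, convergence):
    """Rough upper bound (seconds) on how long traffic may hit a stopped service.

    Assumption: the check must fail `check_fail` times in a row, and each
    attempt costs at most `check_timeout` plus `check_interval` of waiting.
    """
    detection = check_fail * (check_interval + check_timeout)
    # After detection the withdrawal still has to propagate to the routers.
    return detection + convergence


# Illustrative values only: 2 * (1 + 1) + 3
print(worst_case_window(check_interval=1, check_fail=2, check_timeout=1, convergence=3))
```

Withdrawing the routes before stopping the service removes the detection term, leaving only the convergence time to wait out.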

For a smooth transition my approach is to first withdraw all the routes on a system before shutting down any service.

Doing so by shutting down anycast-healthchecker looks the cleanest to me. Everything else I can think of would mean messing with the healthchecker's state and would probably result in it trying to fix the configuration.

@unixsurfer
Owner

For your use-case the easiest and fastest way is to stop the bird daemon. It will yield what you want. Bird daemon is stopped during the shutdown process, so you don't need to do much with anycast-healthchecker.

@and0x000
Author

My NOC people get a little twitchy when BGP sessions are down, so I'd avoid taking them down for most of my use scenarios.

> Bird daemon is stopped during the shutdown process, so you don't need to do much with anycast-healthchecker.

Yes, but that may cause service interruption as described above. Routes may not have converged into the routers' ASICs and traffic may still hit the machine while no service is up to respond.

From my point of view, an additional parameter on the checks would do the trick. Probably on_exit, similar to on_disabled.

on_exit => "withdraw" -> disable ip_prefix on exit. This requires iterating over all checks in the shutdown method.
If you don't see any problem with this, I'll try to put it into code (albeit Python not really being my native language) and start a PR.
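
A minimal sketch of that idea, assuming the checks are available as a mapping of service name to settings. The on_exit option is the one proposed here and does not exist yet; the function names and the data layout are hypothetical and are not taken from the anycast-healthchecker code base.

```python
def prefixes_to_withdraw(checks):
    """Collect the ip_prefix of every check whose on_exit is 'withdraw'."""
    return {
        settings["ip_prefix"]
        for settings in checks.values()
        # A missing on_exit is treated as 'none' (do nothing on exit).
        if settings.get("on_exit", "none") == "withdraw"
    }


def shutdown(checks, advertised):
    """Return the prefixes that stay advertised after a clean exit."""
    remaining = set(advertised) - prefixes_to_withdraw(checks)
    # A real daemon would now have to push `remaining` to Bird;
    # that step is deliberately left out of this sketch.
    return remaining


# Hypothetical checks, for illustration only
checks = {
    "foo.example.com": {"ip_prefix": "10.52.12.1/32", "on_exit": "withdraw"},
    "bar.example.com": {"ip_prefix": "10.52.12.2/32"},
}
print(shutdown(checks, {"10.52.12.1/32", "10.52.12.2/32"}))
```

Only the prefix of the check that opted in is dropped; the other one stays advertised, which keeps the current behaviour for anyone who does not set the option.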

@unixsurfer
Owner

> My NOC people get a little twitchy when BGP sessions are down, so I'd avoid taking them down for most of my use scenarios.

I never had a problem with this approach. If the NOC has issues when a BGP session is terminated, then something is wrong: terminating a BGP session is a normal operational task and it shouldn't cause trouble, only an alert.

> Bird daemon is stopped during the shutdown process, so you don't need to do much with anycast-healthchecker.

> Yes, but that may cause service interruption as described above. Routes may not have converged into the routers' ASICs and traffic may still hit the machine while no service is up to respond.

You can avoid this scenario with correct systemd ordering for the Bird service. I have had Bird configured to start last on boot and to stop first on shutdown to avoid the scenario you describe.

> From my point of view, an additional parameter on the checks would do the trick. Probably on_exit, similar to on_disabled.

> on_exit => "withdraw" -> disable ip_prefix on exit. This requires iterating over all checks in the shutdown method. If you don't see any problem with this, I'll try to put it into code (albeit Python not really being my native language) and start a PR.

Having an on_exit parameter per service check makes sense. It should have a default value of none, which does nothing.
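
As a rough illustration of that default, the sketch below reads a hypothetical on_exit option with Python's configparser and falls back to none when a check does not set it. The section names, the prefixes and the load_on_exit helper are made up for the example; only the on_disabled and ip_prefix option names come from this discussion.

```python
import configparser

# Hypothetical service checks; on_exit is the proposed, not yet existing, option
SAMPLE = """
[foo.example.com]
ip_prefix = 10.52.12.1/32
on_disabled = withdraw
on_exit = withdraw

[bar.example.com]
ip_prefix = 10.52.12.2/32
on_disabled = withdraw
"""

VALID_ON_EXIT = {"none", "withdraw"}


def load_on_exit(text):
    """Map each check name to its on_exit value, defaulting to 'none'."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    result = {}
    for name in parser.sections():
        value = parser.get(name, "on_exit", fallback="none")
        if value not in VALID_ON_EXIT:
            raise ValueError(f"{name}: invalid on_exit value {value!r}")
        result[name] = value
    return result


print(load_on_exit(SAMPLE))
```

Defaulting to none means existing configurations behave exactly as before unless a check explicitly opts in.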

I will try to cook something this weekend, let's see if I manage to find time for it.

and0x000 linked a pull request on Oct 21, 2021 that will close this issue
@and0x000
Author

and0x000 commented Oct 21, 2021

> I never had a problem with this approach. If the NOC has issues when a BGP session is terminated, then something is wrong: terminating a BGP session is a normal operational task and it shouldn't cause trouble, only an alert.

You are right, but the alert is something I'd like to avoid for most use cases.

> Having an on_exit parameter per service check makes sense. It should have a default value of none, which does nothing.

> I will try to cook something this weekend, let's see if I manage to find time for it.

I cobbled together a pull request, but my Python is far from good. It's mostly copy/paste from your existing code with some Stack Overflow sprinkled over it. It works, but it's probably not very clean Python. Feel free to adjust my code where there are more elegant ways.
