Add support for exit callback/cleanup script #129

troglobit · 2020-07-01T08:46:22Z

Similar to the systemd ExecStopPost=. Sometimes useful for services not cleaning up after themselves, or crashing.

The text was updated successfully, but these errors were encountered:

magnusmalm · 2020-08-12T14:48:35Z

I currently have a PoC that might be a candidate for this.

A new keyword 'on_term' can be added to a service config line with the following syntax:

on_term:</absolut/path/to/script>

If on_term is set, and the path points to an existing file, that file is assumed to be an executable script. This script is then run, being put in the background with a system() call with & appended to the command line to ensure finit itself is not locked up in case the script behaves poorly.

Currently, this call is made in the following functions:

service_kill() and service_stop()
directly after the kill() call

service_step
in the SVC_RUNNING_STATE case when a service has crashed.

My experience with finit thus far is quite limited so there are certainly quite a few improvements to be made.

Limited testing has been done with a dummy script on an embedded target which verifies that the PoC in its current form works as expected in the simple cases, namely

normal stop of the service (initctl stop )
forced stop of the service (sending SIGKILL via kill)
service crash (sending SIGSEGV via kill)

The PoC is currently based on finit 3.1 with additional in-house patches but I will create a proper branch ASAP based on master.

magnusmalm · 2020-08-12T21:29:04Z

I have created a branch here for the PoC.

troglobit · 2020-08-13T15:01:26Z

Looks promising. There are a few things I dislike, from cosmetic to design:

I don't like the on_term keyword. Mostly because of the underscore, we'll have to think more about it. When do we want to run it, always or only on SIGTERM? post-script:/path/to/script may be better, dunno
The policy in Finit is to supervise everything, never background stuff we can monitor. We don't want a pile of processes lingering due to badly written scripts that never terminate. I'd suggest having a look at the service_timeout_after() mechanism, either that can be reused for this, or we need another such to provide timeout for external scripts.

magnusmalm · 2020-08-25T21:41:34Z

Sorry for the delayed response. Thanks for the excellent feedback.

Agreed on the name here. Regarding when to run it, the original idea was to have a way to clean up after a crashed/force stopped service (SIGSEGV, SIGKILL, etc). Well Implemented Services (tm) should of course either prep before "going to work", or clean up after itself on a clean exit (SIGTERM). Keeping this in mind, post-script is totally fine IMHO. 😄
Considering the potential number of times a post-script can be executed if 1. above is assumed, the supervise everything policy is a must, I realize. 😄 Will need to consider the options here. An initial approach could be to simply have a timeout (1s? configurable?) that, when running out, terminates the script.

Lastly, two things I am pondering:

a) Should a service be prevented from restarting until the post-script has finished (or been terminated due to timeout)?

b) If post-script is run on SIGKILL, SIGSEGV, SIGPIPE, etc, would it be of interest to have the option to execute a clean exit script? For instance, exit-script:/path/to/script would be run on SIGTERM. Or is this putting too much responsibility on finit?

Will contemplate over all this for a bit and see if I arrive at anything codewise. 😄

troglobit · 2020-08-27T06:27:18Z

I see, well https://www.freedesktop.org/software/systemd/man/systemd.service.html#ExecStopPost= defines its action as always run when the process is stopped. Meaning the commands/script executed must be able to handle cases when the process didn't even start properly. So I'm not sure there is any added value to only running cleanup scripts on crashes.

All processes that exit land in service_monitor(), the service_step() state machine is then called to transition the run/task/service to it's next state. We could probably add another state here for services with cleanup script. I.e., from SVC_RUNNING_STATE to a new SVC_CLEANUP_STATE, or similar. Notice how service_timeout_after(), which I mentioned previously, is called here. So it's infrastructure can likely be reused in the CLEANUP state transitions.

So to answer your questions:

Yes, a service must definitely be prevented from restarting during cleanup. Use the svc state machine.
It's better to do it like systemd, they set the $SERVICE_RESULT, $EXIT_CODE and $EXIT_STATUS environment variables to that of the exiting PID. Making it up to the cleanup script to handle the different cases.

troglobit added this to the v3.3 milestone Jul 1, 2020

troglobit removed this from the v3.3 milestone Feb 10, 2021

troglobit self-assigned this Apr 18, 2021

troglobit changed the title ~~Add support for exit callback/clanup script~~ Add support for exit callback/cleanup script Apr 18, 2021

troglobit added this to the v4.0 milestone Apr 18, 2021

troglobit closed this as completed in ba85874 Apr 18, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for exit callback/cleanup script #129

Add support for exit callback/cleanup script #129

troglobit commented Jul 1, 2020

magnusmalm commented Aug 12, 2020 •

edited

Loading

magnusmalm commented Aug 12, 2020

troglobit commented Aug 13, 2020 •

edited

Loading

magnusmalm commented Aug 25, 2020

troglobit commented Aug 27, 2020

Add support for exit callback/cleanup script #129

Add support for exit callback/cleanup script #129

Comments

troglobit commented Jul 1, 2020

magnusmalm commented Aug 12, 2020 • edited Loading

magnusmalm commented Aug 12, 2020

troglobit commented Aug 13, 2020 • edited Loading

magnusmalm commented Aug 25, 2020

troglobit commented Aug 27, 2020

magnusmalm commented Aug 12, 2020 •

edited

Loading

troglobit commented Aug 13, 2020 •

edited

Loading