Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for exit callback/cleanup script #129

Closed
troglobit opened this issue Jul 1, 2020 · 5 comments
Closed

Add support for exit callback/cleanup script #129

troglobit opened this issue Jul 1, 2020 · 5 comments
Assignees
Milestone

Comments

@troglobit
Copy link
Owner

Similar to the systemd ExecStopPost=. Sometimes useful for services not cleaning up after themselves, or crashing.

@troglobit troglobit added this to the v3.3 milestone Jul 1, 2020
@magnusmalm
Copy link
Contributor

magnusmalm commented Aug 12, 2020

I currently have a PoC that might be a candidate for this.

A new keyword 'on_term' can be added to a service config line with the following syntax:

on_term:</absolut/path/to/script>

If on_term is set, and the path points to an existing file, that file is assumed to be an executable script. This script is then run, being put in the background with a system() call with & appended to the command line to ensure finit itself is not locked up in case the script behaves poorly.

Currently, this call is made in the following functions:

service_kill() and service_stop()
directly after the kill() call

service_step
in the SVC_RUNNING_STATE case when a service has crashed.

My experience with finit thus far is quite limited so there are certainly quite a few improvements to be made.

Limited testing has been done with a dummy script on an embedded target which verifies that the PoC in its current form works as expected in the simple cases, namely

  • normal stop of the service (initctl stop )
  • forced stop of the service (sending SIGKILL via kill)
  • service crash (sending SIGSEGV via kill)

The PoC is currently based on finit 3.1 with additional in-house patches but I will create a proper branch ASAP based on master.

@magnusmalm
Copy link
Contributor

I have created a branch here for the PoC.

@troglobit
Copy link
Owner Author

troglobit commented Aug 13, 2020

Looks promising. There are a few things I dislike, from cosmetic to design:

  1. I don't like the on_term keyword. Mostly because of the underscore, we'll have to think more about it. When do we want to run it, always or only on SIGTERM? post-script:/path/to/script may be better, dunno
  2. The policy in Finit is to supervise everything, never background stuff we can monitor. We don't want a pile of processes lingering due to badly written scripts that never terminate. I'd suggest having a look at the service_timeout_after() mechanism, either that can be reused for this, or we need another such to provide timeout for external scripts.

@magnusmalm
Copy link
Contributor

Sorry for the delayed response. Thanks for the excellent feedback.

  1. Agreed on the name here. Regarding when to run it, the original idea was to have a way to clean up after a crashed/force stopped service (SIGSEGV, SIGKILL, etc). Well Implemented Services (tm) should of course either prep before "going to work", or clean up after itself on a clean exit (SIGTERM). Keeping this in mind, post-script is totally fine IMHO. 😄

  2. Considering the potential number of times a post-script can be executed if 1. above is assumed, the supervise everything policy is a must, I realize. 😄 Will need to consider the options here. An initial approach could be to simply have a timeout (1s? configurable?) that, when running out, terminates the script.

Lastly, two things I am pondering:

a) Should a service be prevented from restarting until the post-script has finished (or been terminated due to timeout)?

b) If post-script is run on SIGKILL, SIGSEGV, SIGPIPE, etc, would it be of interest to have the option to execute a clean exit script? For instance, exit-script:/path/to/script would be run on SIGTERM. Or is this putting too much responsibility on finit?

Will contemplate over all this for a bit and see if I arrive at anything codewise. 😄

@troglobit
Copy link
Owner Author

I see, well https://www.freedesktop.org/software/systemd/man/systemd.service.html#ExecStopPost= defines its action as always run when the process is stopped. Meaning the commands/script executed must be able to handle cases when the process didn't even start properly. So I'm not sure there is any added value to only running cleanup scripts on crashes.

All processes that exit land in service_monitor(), the service_step() state machine is then called to transition the run/task/service to it's next state. We could probably add another state here for services with cleanup script. I.e., from SVC_RUNNING_STATE to a new SVC_CLEANUP_STATE, or similar. Notice how service_timeout_after(), which I mentioned previously, is called here. So it's infrastructure can likely be reused in the CLEANUP state transitions.

So to answer your questions:

  1. Yes, a service must definitely be prevented from restarting during cleanup. Use the svc state machine.
  2. It's better to do it like systemd, they set the $SERVICE_RESULT, $EXIT_CODE and $EXIT_STATUS environment variables to that of the exiting PID. Making it up to the cleanup script to handle the different cases.

@troglobit troglobit removed this from the v3.3 milestone Feb 10, 2021
@troglobit troglobit self-assigned this Apr 18, 2021
@troglobit troglobit changed the title Add support for exit callback/clanup script Add support for exit callback/cleanup script Apr 18, 2021
@troglobit troglobit added this to the v4.0 milestone Apr 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants