Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When used in targets, crew backend R instances do not close automatically after user cancelled #141

Closed
11 tasks done
psychelzh opened this issue Dec 7, 2023 · 6 comments
Closed
11 tasks done

Comments

@psychelzh
Copy link

Prework

  • Read and agree to the Contributor Code of Conduct and contributing guidelines.
  • Confirm that your issue is a genuine bug in the crew package itself and not a user error, known limitation, or issue from another package that crew depends on. For example, if you get errors running tar_make_clustermq(), try isolating the problem in a reproducible example that runs clustermq and not crew. And for miscellaneous troubleshooting, please post to discussions instead of issues.
  • If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
  • Post a minimal reproducible example like this one so the maintainer can troubleshoot the problems you identify. A reproducible example is:
    • Runnable: post enough R code and data so any onlooker can create the error on their own computer.
    • Minimal: reduce runtime wherever possible and remove complicated details that are irrelevant to the issue at hand.
    • Readable: format your code according to the tidyverse style guide.

Description

I am using the lastest 0.7 version of crew package. After I run targets::tar_make(), the front-end R terminal will invoke back-end R instances. However, if I canceled the pipeline manually, these instances won't close and continue to run until they finished.

Reproducible example

  • Post a minimal reproducible example so the maintainer can troubleshoot the problems you identify. A reproducible example is:
    • Runnable: post enough R code and data so any onlooker can create the error on their own computer.
    • Minimal: reduce runtime wherever possible and remove complicated details that are irrelevant to the issue at hand.
    • Readable: format your code according to the tidyverse style guide.

This pipeline is very useful to debug this problem.

# _targets.R
library(targets)
tar_option_set(
  controller = crew::crew_controller_local(workers = 8)
)

list(
  tarchetypes::tar_rep(
    test,
    Sys.sleep(100),
    batches = 30,
    reps = 1
  )
)

Expected result

After I canceled the pipeline, all backend R instances should terminate automatically.

@wlandau
Copy link
Owner

wlandau commented Dec 7, 2023

crew uses mirai, and the behavior you see is due to shikokuchuo/mirai#87 (also c.f. shikokuchuo/mirai#86 (comment)). Unfortunately there is nothing I can do from the perspective of crew or targets, but those crew workers will exit on their own once they finish the tasks that were in progress during the interrupt.

targets does have a callback to manually terminate its crew workers on error:

https://github.com/ropensci/targets/blob/cc9a6a23b4c25c7048a303535da3ed05c0682d9a/R/class_crew.R#L262-L277

Unfortunately, because the pipeline runs in a background callr process, the on.exit() callbacks are skipped when the inner process is abruptly terminated. I think counteracting this behavior would require installing non-default operating system signal handlers, which seems risky.

@wlandau wlandau closed this as not planned Won't fix, can't repro, duplicate, stale Dec 7, 2023
@wlandau
Copy link
Owner

wlandau commented Dec 8, 2023

That said, I may reopen this issue, depending on the result of shikokuchuo/mirai#87

@wlandau
Copy link
Owner

wlandau commented Dec 12, 2023

From shikokuchuo/mirai#87 and shikokuchuo/mirai#88, it looks like there will eventually be support from mirai via nanonext. Reopening.

@wlandau
Copy link
Owner

wlandau commented Dec 12, 2023

Fixed in aeeef8f. Requires development nanonext and mirai for now.

If the pipeline is interrupted, worker processes now send themselves SIGKILL to force quit abruptly. If you want a softer exit which allows cleanup/writing to complete before exiting, you can supply tools::SIGINT to the signal argument of crew_controller_local(). It will take some time before I can propagate the signal argument to launcher plugins in other packages.

@wlandau
Copy link
Owner

wlandau commented Dec 12, 2023

If the pipeline is interrupted, worker processes now send themselves SIGKILL to force quit abruptly. If you want a softer exit which allows cleanup/writing to complete before exiting, you can supply tools::SIGINT to the signal argument of crew_controller_local(). It will take some time before I can propagate the signal argument to launcher plugins in other packages.

Rethinking that choice, c.f. shikokuchuo/mirai#87 (comment):

An update: as I was testing just now, I found out that tools::SIGKILL is actually NA on Windows. That led me to do some more digging on signals, and I learned SIGKILL on a parent process can lead to zombie child processes. Zombies would defeat the whole purpose of my original preference for SIGKILL, so I am moving away from it. Instead, crew now prefers SIGTERM because the intention to kill the process is more explicit than SIGINT. If for some reason that signal is not defined on the user's platform, then it uses SIGQUIT. And if SIGQUIT is not defined, it uses SIGINT as a last resort. I don't think I will expose the choice of signal to crew users.

@psychelzh
Copy link
Author

Thanks for all the kind considerations into this. Here confirmed that the development version works as expected perfectly on Windows, too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants