When used in targets, crew backend R instances do not close automatically after user cancelled #141

psychelzh · 2023-12-07T16:41:32Z

Prework

Read and agree to the Contributor Code of Conduct and contributing guidelines.
Confirm that your issue is a genuine bug in the crew package itself and not a user error, known limitation, or issue from another package that crew depends on. For example, if you get errors running tar_make_clustermq(), try isolating the problem in a reproducible example that runs clustermq and not crew. And for miscellaneous troubleshooting, please post to discussions instead of issues.
If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
Post a minimal reproducible example like this one so the maintainer can troubleshoot the problems you identify. A reproducible example is:
- Runnable: post enough R code and data so any onlooker can create the error on their own computer.
- Minimal: reduce runtime wherever possible and remove complicated details that are irrelevant to the issue at hand.
- Readable: format your code according to the tidyverse style guide.

Description

I am using the lastest 0.7 version of crew package. After I run targets::tar_make(), the front-end R terminal will invoke back-end R instances. However, if I canceled the pipeline manually, these instances won't close and continue to run until they finished.

Reproducible example

Post a minimal reproducible example so the maintainer can troubleshoot the problems you identify. A reproducible example is:
- Runnable: post enough R code and data so any onlooker can create the error on their own computer.
- Minimal: reduce runtime wherever possible and remove complicated details that are irrelevant to the issue at hand.
- Readable: format your code according to the tidyverse style guide.

This pipeline is very useful to debug this problem.

# _targets.R
library(targets)
tar_option_set(
  controller = crew::crew_controller_local(workers = 8)
)

list(
  tarchetypes::tar_rep(
    test,
    Sys.sleep(100),
    batches = 30,
    reps = 1
  )
)

Expected result

After I canceled the pipeline, all backend R instances should terminate automatically.

The text was updated successfully, but these errors were encountered:

wlandau · 2023-12-07T21:41:24Z

crew uses mirai, and the behavior you see is due to shikokuchuo/mirai#87 (also c.f. shikokuchuo/mirai#86 (comment)). Unfortunately there is nothing I can do from the perspective of crew or targets, but those crew workers will exit on their own once they finish the tasks that were in progress during the interrupt.

targets does have a callback to manually terminate its crew workers on error:

https://github.com/ropensci/targets/blob/cc9a6a23b4c25c7048a303535da3ed05c0682d9a/R/class_crew.R#L262-L277

Unfortunately, because the pipeline runs in a background callr process, the on.exit() callbacks are skipped when the inner process is abruptly terminated. I think counteracting this behavior would require installing non-default operating system signal handlers, which seems risky.

wlandau · 2023-12-08T10:15:58Z

That said, I may reopen this issue, depending on the result of shikokuchuo/mirai#87

wlandau · 2023-12-12T13:25:51Z

From shikokuchuo/mirai#87 and shikokuchuo/mirai#88, it looks like there will eventually be support from mirai via nanonext. Reopening.

wlandau · 2023-12-12T15:56:18Z

Fixed in aeeef8f. Requires development nanonext and mirai for now.

If the pipeline is interrupted, worker processes now send themselves SIGKILL to force quit abruptly. If you want a softer exit which allows cleanup/writing to complete before exiting, you can supply tools::SIGINT to the signal argument of crew_controller_local(). It will take some time before I can propagate the signal argument to launcher plugins in other packages.

wlandau · 2023-12-12T16:52:14Z

If the pipeline is interrupted, worker processes now send themselves SIGKILL to force quit abruptly. If you want a softer exit which allows cleanup/writing to complete before exiting, you can supply tools::SIGINT to the signal argument of crew_controller_local(). It will take some time before I can propagate the signal argument to launcher plugins in other packages.

Rethinking that choice, c.f. shikokuchuo/mirai#87 (comment):

An update: as I was testing just now, I found out that tools::SIGKILL is actually NA on Windows. That led me to do some more digging on signals, and I learned SIGKILL on a parent process can lead to zombie child processes. Zombies would defeat the whole purpose of my original preference for SIGKILL, so I am moving away from it. Instead, crew now prefers SIGTERM because the intention to kill the process is more explicit than SIGINT. If for some reason that signal is not defined on the user's platform, then it uses SIGQUIT. And if SIGQUIT is not defined, it uses SIGINT as a last resort. I don't think I will expose the choice of signal to crew users.

psychelzh · 2023-12-13T02:12:01Z

Thanks for all the kind considerations into this. Here confirmed that the development version works as expected perfectly on Windows, too.

psychelzh added the type: bug label Dec 7, 2023

psychelzh assigned wlandau Dec 7, 2023

This was referenced Dec 7, 2023

Dispatcher stays running after daemons(n = 0L) when not all tasks are dispatched yet shikokuchuo/mirai#86

Closed

Allow daemons to exit immediately when the connection terminates? shikokuchuo/mirai#87

Closed

wlandau added depends: another issue depends: different tool type: edge case and removed type: bug labels Dec 7, 2023

wlandau closed this as not planned Won't fix, can't repro, duplicate, stale Dec 7, 2023

wlandau reopened this Dec 12, 2023

wlandau-lilly added a commit that referenced this issue Dec 12, 2023

Add deliberately failing test of #141

0449197

wlandau-lilly closed this as completed in aeeef8f Dec 12, 2023

wlandau mentioned this issue Dec 12, 2023

Expose the signal argument of the launcher constructor wlandau/crew.aws.batch#6

Closed

wlandau-lilly added a commit that referenced this issue Dec 12, 2023

Rethink #141

4fed2cd

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

When used in targets, crew backend R instances do not close automatically after user cancelled #141

When used in targets, crew backend R instances do not close automatically after user cancelled #141

psychelzh commented Dec 7, 2023

wlandau commented Dec 7, 2023 •

edited

wlandau commented Dec 8, 2023

wlandau commented Dec 12, 2023

wlandau commented Dec 12, 2023

wlandau commented Dec 12, 2023 •

edited

psychelzh commented Dec 13, 2023

When used in targets, crew backend R instances do not close automatically after user cancelled #141

When used in targets, crew backend R instances do not close automatically after user cancelled #141

Comments

psychelzh commented Dec 7, 2023

Prework

Description

Reproducible example

Expected result

wlandau commented Dec 7, 2023 • edited

wlandau commented Dec 8, 2023

wlandau commented Dec 12, 2023

wlandau commented Dec 12, 2023

wlandau commented Dec 12, 2023 • edited

psychelzh commented Dec 13, 2023

wlandau commented Dec 7, 2023 •

edited

wlandau commented Dec 12, 2023 •

edited