Graph of dependent jobs? #4
Is this still on the radar? I'm currently using the Python tool Snakemake for job submission and dependency management. There are many such workflow systems available now, but none in R that I know of.
I second this. I am working on bringing parallelism to the […]. The cluster engines I've investigated (TORQUE, SGE, SLURM) all appear to allow the user to specify dependencies based on completion of previous jobs (specified by the scheduler's job ID). My hope is for […]. If dependency management is considered to fit into the overall goals of […]
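To make the scheduler-level approach concrete, here is a minimal sketch of how such a dependency could be expressed for SLURM, whose `sbatch --dependency=afterok:<id>` flag starts a job only after the listed jobs exit successfully. The helper names (`dep_flag`, `submit_with_deps`) are hypothetical, not part of BatchJobs, and the sketch assumes `sbatch` is on the `PATH`:

```r
# Build the sbatch dependency flag from a vector of prerequisite job IDs.
# Returns character(0) when there are no prerequisites.
dep_flag <- function(dep_ids) {
  if (length(dep_ids) == 0) return(character())
  sprintf("--dependency=afterok:%s", paste(dep_ids, collapse = ":"))
}

# Submit a job script, optionally dependent on earlier jobs, and return its ID.
submit_with_deps <- function(script, dep_ids = character()) {
  out <- system2("sbatch", c(dep_flag(dep_ids), script), stdout = TRUE)
  sub(".*job ", "", out)  # sbatch prints "Submitted batch job <id>"
}

# id1 <- submit_with_deps("stage1.sh")
# id2 <- submit_with_deps("stage2.sh", dep_ids = id1)
```

Note that SLURM also offers `afternotok` and `afterany` variants, which is relevant to the question of reacting to failed prerequisites.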
Leaving dependency management to the scheduler has some disadvantages, including the inability to test for error conditions on exit of dependent jobs.
Interesting point. Is there a good alternative?
Managing the dependencies in R is much more flexible. The first pass is to […]

Sean (replying to Raman Shah, Fri, May 22, 2015)
👍 @seandavi: At least LSF (which I'm primarily interested in) can define the dependency conditional on exit status. Isn't this true for other schedulers? If the scheduler knows about dependencies, this is by far the easiest approach. The workflow you suggested (reimplementing this in R) sounds a bit like reimplementing […].

Checking if a job needs to be re-run can be done as part of the job itself: `if (digest::digest(input) == digest::digest(last_good_input)) quit(status = 0)`

@ramanshah: Do you have any updates?
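The one-liner above can be expanded into a small wrapper that each job runs first. This is only a sketch of the idea, not BatchJobs API: `compute` stands in for the stage body, and `hash_file` is a hypothetical file recording the hash of the last successfully processed input. It assumes the `digest` package (referenced above) is installed:

```r
library(digest)  # provides digest(); assumed available

# Skip recomputation when the input hash matches the last successful run.
run_stage <- function(input, compute, hash_file) {
  h <- digest(input)
  if (file.exists(hash_file) && identical(readLines(hash_file, n = 1), h)) {
    return(invisible(NULL))  # input unchanged since last successful run
  }
  result <- compute(input)
  writeLines(h, hash_file)   # record the hash only after compute() succeeds
  result
}
```

Writing the hash only after `compute()` returns means a crashed job is retried on the next run, which addresses the error-condition concern raised earlier in the thread.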
@mllg: My use case is a web of data pipelines: each stage processes data and creates artifacts, some of which are processed in subsequent stages. Currently I'm using […].

@seandavi: Of course, for the "multicore" schedulers we'll need our own dependency handling. Which, again, could happen with an autogenerated Makefile.
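The autogenerated-Makefile idea could be sketched like this: take a named list mapping each target to its prerequisites and emit the corresponding Makefile rules. The `Rscript run_job.R <target>` recipe is a hypothetical placeholder for whatever command actually runs a stage:

```r
# Generate a Makefile from a dependency list (target -> prerequisites).
write_makefile <- function(deps, path = "Makefile") {
  lines <- unlist(lapply(names(deps), function(target) {
    c(paste(c(paste0(target, ":"), deps[[target]]), collapse = " "),
      sprintf("\tRscript run_job.R %s", target),  # hypothetical runner
      "")
  }))
  writeLines(lines, path)
  invisible(path)
}

# write_makefile(list(clean = character(), fit = "clean",
#                     report = c("fit", "clean")))
```

Delegating to `make` would then give ordering, re-run-on-change, and `make -j` parallelism for free on a single machine.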
@krlmlr I left the position where I was working on this problem as part of my day job, so there won't likely be substantial news from me anymore. The group seems to have interest in building the benchmarking framework on top of a different foundation, possibly |
@ramanshah, what I am interested in is what you describe. There are many frameworks for doing this kind of thing: https://github.com/pditommaso/awesome-pipeline. It would be great to do something in R related to common-workflow-language. I'd definitely be interested in working with you and @road2stat on this.
SRC: https://code.google.com/p/batchjobs/issues/detail?id=19
For some experiments it MIGHT be useful to be able to specify a graph of dependent jobs, similar to how targets are defined in a Makefile.
This means that for some jobs to start, the results of others have to be fully completed. The solution is probably simple topological sorting with respect to these preconditions.
But I want to collect more use cases before we look into this again.
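The topological sorting mentioned above can be sketched in a few lines of R (this is Kahn-style level ordering, not anything from the BatchJobs codebase; `deps` maps each job to the jobs whose results it needs):

```r
# Order jobs so that every job appears after all of its prerequisites.
# Stops with an error if the dependency graph contains a cycle.
topo_sort <- function(deps) {
  remaining <- deps
  order <- character()
  while (length(remaining) > 0) {
    # A job is ready once all of its prerequisites are already ordered.
    ready <- names(remaining)[vapply(remaining, function(d)
      all(d %in% order), logical(1))]
    if (length(ready) == 0) stop("cycle detected in job dependencies")
    order <- c(order, ready)
    remaining <- remaining[setdiff(names(remaining), ready)]
  }
  order
}

# topo_sort(list(report = "fit", fit = "clean", clean = character()))
```

Each pass of the loop yields the set of jobs that are simultaneously ready, which is also the natural unit to submit to the scheduler in parallel.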