
Dagobah's features compared to chronos or azkaban? #23

Closed
utsengar opened this issue Aug 17, 2013 · 7 comments
@utsengar

Dagobah is exactly what I was looking for. It's much simpler than Chronos or Azkaban, and also quite different from them.

So some questions/suggestions:

  1. Remote task execution? I believe this could be implemented easily via Fabric. What do you think?
  2. Logging all task runs. Right now only the most recent run's logs can be seen. Keeping logs for every run would be a useful feature (at least for our use case).
  3. A fairly easy change would be to add an option to disable auth when it's not needed.

I am playing around with Dagobah and trying to work on 1 and 2 now, though I'm not sure about the timeline :)

@thieman
Owner

thieman commented Aug 17, 2013

Hey there! Unfortunately, I'm not familiar with Chronos or Azkaban, but from a quick glance at their product pages they seem to be good examples of why other scheduling projects weren't great fits for me. Namely:

  • Azkaban, like a bunch of other schedulers, was created with a specific task in mind (managing Hadoop jobs). Spotify's Luigi, and a bunch of others, also seem to target map-reduce workflows. Dagobah is meant to be pretty much a cron replacement.
  • Chronos is a cron replacement! That's awesome, but it seems like total overkill for what I needed to do. I wanted something simple that I could run on one machine to manage my nightly analytics updates.

Now, to your questions:

  1. Could you go into a bit more detail here? Dagobah is currently made for running on one machine, so if you want to influence other machines you'll need to do that in a different framework. Fabric, as you suggest, or something like Celery would be good for this. If I'm misunderstanding you, please let me know.
  2. The task logs do get permanently stored in your backend. There's no way to examine them currently in the web app, though.
  3. Yes! This is a good idea. I'll make an issue for it.
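Item 3 could be as small as a config flag that the login check reads on every request. A framework-agnostic sketch, assuming a hypothetical `require_auth` setting and plain dicts standing in for request users; `login_required`, `list_jobs`, and `CONFIG` are illustrative names, not Dagobah's actual auth code or config keys.

```python
import functools

# Hypothetical setting; Dagobah's real config key and auth layer may differ.
CONFIG = {"require_auth": True}


class Unauthorized(Exception):
    """Raised when auth is enabled and the caller is not logged in."""


def login_required(view):
    """Enforce login only when CONFIG["require_auth"] is truthy.

    The flag is read on every call, so flipping it takes effect without
    re-decorating the views.
    """
    @functools.wraps(view)
    def wrapper(user, *args, **kwargs):
        if CONFIG["require_auth"] and not user.get("authenticated", False):
            raise Unauthorized("login required")
        return view(user, *args, **kwargs)
    return wrapper


@login_required
def list_jobs(user):
    # Stand-in for a web view that lists scheduled jobs.
    return ["nightly-analytics"]
```

With `CONFIG["require_auth"] = False`, `list_jobs({})` succeeds for an anonymous user; with it set to `True`, the same call raises `Unauthorized`.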

@utsengar
Author

> Azkaban, like a bunch of other schedulers, was created with a specific task in mind (managing Hadoop jobs). Spotify's Luigi, and a bunch of others, also seem to target map-reduce workflows. Dagobah is meant to be pretty much a cron replacement.
> Chronos is a cron replacement! That's awesome, but it seems like total overkill for what I needed to do. I wanted something simple that I could run on one machine to manage my nightly analytics updates.

This is exactly why I liked Dagobah. It's simple and flexible enough to be used for other use cases. I am also looking for a cron replacement. Azkaban is meant for Hadoop job automation, but it can also act as a cron replacement (with retries, DAGs, etc.). Chronos is awful and complete overkill.

> Could you go into a bit more detail here? Dagobah is currently made for running on one machine, so if you want to influence other machines you'll need to do that in a different framework. Fabric, as you suggest, or something like Celery would be good for this. If I'm misunderstanding you, please let me know.

You got it right. Dagobah currently runs on one machine, but I am trying to add Fabric to Dagobah so that it can execute commands on a remote machine. Celery is good too, but the management overhead is lower if I just use Fabric to execute remote commands.

> The task logs do get permanently stored in your backend. There's no way to examine them currently in the web app, though.

Good to know, I will try to expose this data in the web app.
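Since the backend already keeps every run's logs, exposing them mostly means treating each task's runs as an append-only list and querying it newest first. A minimal in-memory sketch; `RunLogStore` and its methods are hypothetical names, not Dagobah's backend API, and a real backend would persist the records instead of holding them in a dict.

```python
from datetime import datetime, timezone


class RunLogStore:
    """Keep the log of every task run instead of only the most recent one.

    In-memory for illustration; a real backend would persist these records.
    """

    def __init__(self):
        # (job_name, task_name) -> list of run records, oldest first
        self._runs = {}

    def record_run(self, job, task, stdout, stderr, return_code):
        """Append one run's output; earlier runs are never overwritten."""
        record = {
            "recorded_at": datetime.now(timezone.utc),
            "stdout": stdout,
            "stderr": stderr,
            "return_code": return_code,
        }
        self._runs.setdefault((job, task), []).append(record)
        return record

    def history(self, job, task, limit=None):
        """Return this task's runs newest first, optionally capped at `limit`."""
        runs = self._runs.get((job, task), [])[::-1]
        return runs if limit is None else runs[:limit]

    def latest(self, job, task):
        """Return only the most recent run, or None if the task never ran."""
        runs = self._runs.get((job, task))
        return runs[-1] if runs else None
```

A web view could then page through `history(job, task, limit=20)` rather than showing only `latest(job, task)`.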

@utsengar
Author

I have added remote task execution here: https://github.com/utkarsh2012/dagobah/blob/master/dagobah/core/core.py#L634. It works nicely with the existing features, though the UI needs some work (e.g. updating the remote machine endpoint) and it still needs tests. It uses paramiko and spawns a process for every remote request.

What do you think?
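The branch above uses paramiko. As a rough, dependency-free illustration of the same idea, here is a sketch that swaps in the system `ssh` client via `subprocess`, spawning one process per task run and falling back to a local shell when no host is given. It assumes key-based SSH login is already configured; `build_command` and `run_task` are hypothetical names, not functions from Dagobah or the linked branch.

```python
import subprocess


def build_command(command, host=None, user=None):
    """Return the argv that runs `command` locally, or over ssh when `host` is set."""
    if host is None:
        return ["/bin/sh", "-c", command]
    target = f"{user}@{host}" if user else host
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # which matters for unattended scheduled tasks.
    return ["ssh", "-o", "BatchMode=yes", target, command]


def run_task(command, host=None, user=None, timeout=None):
    """Run one task in its own process and capture (return code, stdout, stderr)."""
    proc = subprocess.run(
        build_command(command, host, user),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout, proc.stderr
```

For example, `run_task("echo hello")` runs locally, while `run_task("df -h", host="db1", user="etl")` would run on a hypothetical host `db1`. Shelling out keeps each run isolated in its own process, at the cost of less control over connections than paramiko provides.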

@thieman
Owner

thieman commented Aug 24, 2013

Hey @utkarsh2012, this looks awesome! I will review your branch when I get some time and get back to you.

@thieman thieman reopened this Aug 24, 2013
@utsengar
Author

Looks like the Travis CI build is broken; I forgot to add the paramiko dependency to setup.py and requirements.txt.

Also, the UI might need some work; I hacked up the solution in a day to see how it would work. I might submit more fixes if I find bugs.

@thieman thieman closed this as completed Aug 30, 2013
@rclough
Collaborator

rclough commented Apr 7, 2014

Did this ever get included? I can't seem to find remote options in the web UI.

@levonk

levonk commented Dec 1, 2017

Just a minor correction: Azkaban can be used for anything, not just Hadoop-based MapReduce. It can be used purely as a superior cron (dependencies, transparency, partial workflows, etc.). You don't need Hadoop at all.
