Skip to content
This repository has been archived by the owner on Apr 24, 2023. It is now read-only.

Add Fenzo for scheduling #84

Closed
wants to merge 29 commits into from
Closed

Add Fenzo for scheduling #84

wants to merge 29 commits into from

Conversation

dgrnbrg
Copy link
Contributor

@dgrnbrg dgrnbrg commented Nov 18, 2015

This adds Fenzo as the task placer. To take advantage of Fenzo, we want to provide it with many tasks to place every cycle. In order to guarantee that big tasks get scheduled and Fenzo sees a lot of tasks, we exponentially reduce the # of tasks that Fenzo sees every scheduling cycle as long as it doesn't schedule the head of the queue. Once the head is scheduled, we reset the # of jobs that Fenzo gets to consider.

When Fenzo starts a task out of order, it is flagged as "backfilled". A backfilled task is bumped to the highest priority for the preemption system. Once a backfilled task moves to the head of the queue, however, we upgrade it to be non-backfilled. The backfilled tasks don't actually make a job register as "running", since they could be killed at any time.

@@ -21,4 +21,6 @@
:levels {"datomic.db" :warn
"datomic.peer" :warn
"datomic.kv-cluster" :warn
"cook.mesos.scheduler" :debug
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these levels should be removed before merge

@@ -218,10 +218,22 @@
[{:keys [task->scored-task host->spare-resources] :as state}
{:keys [min-dru-diff safe-dru-threshold] :as params}
pending-job-ent]
;;We need to rely on this being a priority map TODO check that this is true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is true. It has to be a priority map

result (.scheduleOnce fenzo requests leases)
failure-results (.. result getFailures values)
assignments (.. result getResultMap values)]
(log/info "Found this assigment:" result)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dgrnbrg dgrnbrg added this to the Fenzo Integration milestone Dec 7, 2015
[clj-time "0.9.0"]
[org.clojure/core.async "0.1.346.0-17112a-alpha"]
[org.clojure/core.async "0.2.374"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the newer core.async seems to require clojure 1.7, is this upgrade safe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, moved to 1.7

@icexelloss
Copy link
Member

In general, it would be great if the effect of backfilled-instances are well documented in a single place to be better maintainable. Now it's kind of all over the place, things like:
(1) Backfilled instances are of the lowerest priority?
(2) Having backfilled intance doesn't change the job state to be running?
...

@dgrnbrg
Copy link
Contributor Author

dgrnbrg commented Dec 9, 2015

Where do you think that backfilled instances could be documented?

@dgrnbrg
Copy link
Contributor Author

dgrnbrg commented Dec 9, 2015

@icexelloss @wyegelwel additional comments?

offer-size-mem
(get-in offer [:resources :mem] 0)))
(log/debug "invoked handle-resource-offers!")
(let [offer-stash (atom nil)] ;; This is a way to ensure we never lose offers fenzo assigned if an errors occures in the middle of processing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe put a comment:

If we fail to launch some of the tasks, we will put all the offers - used and unused - onto offer-stash. Those used offers will be feed back to frenzo again for scheduling, which will probably cause task lost next time we use it.

@icexelloss
Copy link
Member

Put docs about backfilled instance and the state machine of jobs

@@ -55,7 +55,7 @@
:max-retries (s/both s/Int (s/pred pos? 'pos?))
:max-runtime (s/both s/Int (s/pred pos? 'pos?))
(s/optional-key :uris) [Uri]
(s/optional-key :ports) [(s/pred zero? 'zero)] ;;TODO add to docs the limited uri/port support
(s/optional-key :ports) (s/pred #(not (neg? %)) 'nonnegative?)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is ports allowed to be a list?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see it is the number of ports.

@wyegelwel
Copy link
Contributor

I'm closing this in favor of #145 which is just about ready to merge.

@wyegelwel wyegelwel closed this Jul 25, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants