-
Notifications
You must be signed in to change notification settings - Fork 64
Conversation
@@ -21,4 +21,6 @@ | |||
:levels {"datomic.db" :warn | |||
"datomic.peer" :warn | |||
"datomic.kv-cluster" :warn | |||
"cook.mesos.scheduler" :debug |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these levels should be removed before merge
@@ -218,10 +218,22 @@ | |||
[{:keys [task->scored-task host->spare-resources] :as state} | |||
{:keys [min-dru-diff safe-dru-threshold] :as params} | |||
pending-job-ent] | |||
;;We need to rely on this being a priority map TODO check that this is true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is true. It has to be a priority map
result (.scheduleOnce fenzo requests leases) | ||
failure-results (.. result getFailures values) | ||
assignments (.. result getResultMap values)] | ||
(log/info "Found this assigment:" result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also look at and log the exceptions from result.getExceptions(): http://netflix.github.io/Fenzo/fenzo-core/com/netflix/fenzo/TaskScheduler.html#scheduleOnce-java.util.List-java.util.List-
[clj-time "0.9.0"] | ||
[org.clojure/core.async "0.1.346.0-17112a-alpha"] | ||
[org.clojure/core.async "0.2.374"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of the newer core.async seems to require clojure 1.7, is this upgrade safe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, moved to 1.7
In general, it would be great if the effect of backfilled-instances are well documented in a single place to be better maintainable. Now it's kind of all over the place, things like: |
Where do you think that backfilled instances could be documented? |
@icexelloss @wyegelwel additional comments? |
offer-size-mem | ||
(get-in offer [:resources :mem] 0))) | ||
(log/debug "invoked handle-resource-offers!") | ||
(let [offer-stash (atom nil)] ;; This is a way to ensure we never lose offers fenzo assigned if an errors occures in the middle of processing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe put a comment:
If we fail to launch some of the tasks, we will put all the offers - used and unused - onto offer-stash. Those used offers will be feed back to frenzo again for scheduling, which will probably cause task lost next time we use it.
Put docs about backfilled instance and the state machine of jobs |
@@ -55,7 +55,7 @@ | |||
:max-retries (s/both s/Int (s/pred pos? 'pos?)) | |||
:max-runtime (s/both s/Int (s/pred pos? 'pos?)) | |||
(s/optional-key :uris) [Uri] | |||
(s/optional-key :ports) [(s/pred zero? 'zero)] ;;TODO add to docs the limited uri/port support | |||
(s/optional-key :ports) (s/pred #(not (neg? %)) 'nonnegative?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is ports allowed to be a list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see it is the number of ports.
I'm closing this in favor of #145 which is just about ready to merge. |
This adds Fenzo as the task placer. To take advantage of Fenzo, we want to provide it with many tasks to place every cycle. In order to guarantee that big tasks get scheduled and Fenzo sees a lot of tasks, we exponentially reduce the # of tasks that Fenzo sees every scheduling cycle as long as it doesn't schedule the head of the queue. Once the head is scheduled, we reset the # of jobs that Fenzo gets to consider.
When Fenzo starts a task out of order, it is flagged as "backfilled". A backfilled task is bumped to the highest priority for the preemption system. Once a backfilled task moves to the head of the queue, however, we upgrade it to be non-backfilled. The backfilled tasks don't actually make a job register as "running", since they could be killed at any time.