Skip to content
This repository has been archived by the owner on Apr 24, 2023. It is now read-only.

Implemented host-placement group constraints #284

Merged
merged 1 commit into from
Apr 12, 2017
Merged

Conversation

DiegoAlbertoTorres
Copy link
Contributor

Multiple code review changes, including better docstrings, many variable name changes, and filtering host->scored-tasks.

Created a new namespace for constraints

Generalized the balanced constraint to any attribute, exposed expected count parameter

Multiple code review changes: made unconstrained filtering in rebalancer simpler and more efficient, improved naming of cotask functions, added host placement explanation to groups.md and other small changes

Made offer cache maximum size and ttl configurable.

Added rebalancer tests for balanced constraint

CR changes: clarifications in group.md, extracted common code from fenzo constraints into shared functions, cotasks are now both unknown and running instances, host attributes now extracted from offers, renamed cotask-getter functions, added test for make-guuid->juuids

Added many docstrings. Made sure that order of considerable jobs is preserved before handing to scheduleOnce. Wrote a test to check that considerable jobs are passed in right order.

compute-preemption-decision now considers preemptions done in the same cycle to validate constraints.

Refactored constraints.clj. Now all logic for constraints is in the evaluation function, which is actually a multi-method. This multi-method can then be wrapped to make it fenzo-compatible (in make-fenzo-hp-constraint) or rebalancer-compatible (in make-rebalacer-hp-constaint).

Added unique-host constraint to the rebalancer. Refactored constraints.clj to define JobConstraint and GroupConstraint protocols, which make the code more modular, easier to modify and harder to unintentionally break.

Now synthetic instances in state have slave-id, which is used to find attributes of constraints. Fixed lack of nil check when placement constraint is 'all'.

Multiple minor improvements to address CR by shams

Small improvements as suggested in CR by Wil

Removed gpu-host-constraint-helper

Multiple small CR improvements as suggested by Wil.

More CR fixes as suggested by Wil, mostly on constraints.clj, msotly small refactoring and renaming

Signed-off-by: Diego Torres Quintanilla diego.torresquintanilla@twosigma.com

@DiegoAlbertoTorres DiegoAlbertoTorres force-pushed the host-placement branch 16 times, most recently from 1b06a13 to c1d49bf Compare April 5, 2017 23:33
@DiegoAlbertoTorres DiegoAlbertoTorres changed the title WIP: Implemented host-placement group constraints Implemented host-placement group constraints Apr 6, 2017
Copy link
Contributor

@dposada dposada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't made it all the way through this yet, but here are some initial comments

Example:
```json
{"groups": [{"uuid": "410198dd-c171-470e-aeb8-6ffa4e6a2ada",
"host-placement": {"type": "unique"}],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing one more } before the ]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

"host-placement": {"type": "unique"}],
"jobs": [{"uuid": "831836ee-d5c3-47f7-a052-f4c3ce502495",
"group": "410198dd-c171-470e-aeb8-6ffa4e6a2ada". ...},
...]}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the 2 lines above, why not have only one set of ellipses? Also, there's an extra trailing } it seems

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the first one indicates there are more fields inside each job, and the second shows there can be more jobs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

[clojure.walk :as walk]
[datomic.api :as d :refer (q)]
[plumbing.core :refer (map-vals map-keys)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we combine this with the existing plumbing.core require below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍


(defn init-offer-cache
[& init]
(-> (if (nil? init) {} init)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe (or init {})?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

@@ -67,7 +67,7 @@
;;Networking
[clj-http "2.0.0"]
[io.netty/netty "3.10.1.Final"]
[cc.qbits/jet "0.5.7" :exclusions [org.eclipse.jetty/jetty-io
[cc.qbits/jet "0.6.4" :exclusions [org.eclipse.jetty/jetty-io
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the jet upgrade needed for the host placement feature?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was needed for testing

```json
{"groups": [{"uuid": "410198dd-c171-470e-aeb8-6ffa4e6a2ada",
"host-placement": {"type": "attribute-equals",
"parameters": {"attribute": "HOSTNAME"}}],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use availability zone since that's the example in the paragraph above? Also missing one more } before the ]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

"parameters": {"attribute": "HOSTNAME"}}],
"jobs": [{"uuid": "831836ee-d5c3-47f7-a052-f4c3ce502495",
"group": "410198dd-c171-470e-aeb8-6ffa4e6a2ada". ...},
...]}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same ellipses comment, and extra trailing }

@@ -19,6 +19,7 @@
[clj-time.core :as t]
[clojure.java.io :as io]
[clojure.pprint :refer (pprint)]
[clojure.core.cache :as cache]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we keep these sorted?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

(cache/fifo-cache-factory :threshold max-size)
(cache/ttl-cache-factory :ttl ttl-ms)
atom)]
cache))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: the let is not really needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember this giving me trouble without the let... Let me try again

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got rid of let 👍

@@ -346,6 +354,11 @@
task-constraints))
:offer-incubate-time-ms (fnk [[:config [:scheduler {offer-incubate-ms 15000}]]]
offer-incubate-ms)
:offer-cache (fnk [[:config [:scheduler {offer-cache nil}]]]
(merge
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be indented too far in

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

@@ -250,12 +249,20 @@
"A schema for the parameters of a host placement with type attribute-equals"
{:attribute s/Str})

(def Balanced-Parameters
"A schema for the parameters of a host placement with type balanced"
{:attribute s/Str
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the attribute really be any string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty much: any string that is a valid mesos attribute name

(ns cook.mesos.constraints
(:require [clojure.tools.logging :as log]
[cook.mesos.util :as util]
[cook.mesos.group :as group]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we sort these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

[cook.mesos.util :as util]
[cook.mesos.group :as group]
[swiss.arrows :refer :all])
(import com.netflix.fenzo.VirtualMachineLease))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:import instead of import

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

(defn get-class-name
"Returns the name of a class without package information."
[object]
(last (clojure.string/split (str (type object)) #"\.")))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we do (.getSimpleName (type object))?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh, knew there must have been a better way for this. Now using .getSimpleName 👍

@@ -539,5 +568,5 @@
@(d/transact conn [{:db/id :rebalancer/config
:rebalancer.config/min-utilization-threshold 0.0
:rebalancer.config/safe-dru-threshold 0.0
:rebalancer.config/min-dru-diff 0.0000000001
:rebalancer.config/min-dru-diff 0.000000001
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the change to min-dru-diff?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is inside a comment block, so it does not matter. But will change it back to avoid confusion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

(:value %))))
(into {}))
cook-attributes {"HOSTNAME" (:hostname offer)
"COOK_GPU?" (> (or (offer-resource-scalar offer "gpus") 0.0) 0)}]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe:

(-> offer
  (offer-resource-scalar "gpus")
  (or 0.0)
  (> 0))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, this is way better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

Copy link
Contributor

@dposada dposada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with my pass


(ns cook.test.mesos.constraints
(:use [clojure.test])
(:require [mesomatic.types :as mtypes]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we sort these?

[cook.mesos.constraints :as constraints]
[cook.test.testutil :refer (restore-fresh-database! create-dummy-group create-dummy-job create-dummy-instance)]
[datomic.api :as d :refer (db)])
(import org.mockito.Mockito
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:import instead of import

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

scored->expected (zipmap scored expected)]
(doall (for [[scored expected] scored->expected]
(is (= scored expected)))))
)))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: close parens can go on previous line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

(:require [clojure.core.async :as async]
[clojure.core.cache :as cache]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indenting off by a little

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

@dposada dposada requested a review from pschorf April 6, 2017 22:22
@DiegoAlbertoTorres DiegoAlbertoTorres force-pushed the host-placement branch 5 times, most recently from 079ce5d to 4b47f93 Compare April 10, 2017 20:40
Multiple code review changes, including better docstrings, many variable name changes, and filtering host->scored-tasks.

Created a new namespace for constraints

Generalized the balanced constraint to any attribute, exposed expected count parameter

Multiple code review changes: made unconstrained filtering in rebalancer simpler and more efficient, improved naming of cotask functions, added host placement explanation to groups.md and other small changes

Made offer cache maximum size and ttl configurable.

Added rebalancer tests for balanced constraint

CR changes: clarifications in group.md, extracted common code from fenzo constraints into shared functions, cotasks are now both unknown and running instances, host attributes now extracted from offers, renamed cotask-getter functions, added test for make-guuid->juuids

Added many docstrings. Made sure that order of considerable jobs is preserved before handing to scheduleOnce. Wrote a test to check that considerable jobs are passed in right order.

compute-preemption-decision now considers preemptions done in the same cycle to validate constraints.

Refactored constraints.clj. Now all logic for constraints is in the evaluation function, which is actually a multi-method. This multi-method can then be wrapped to make it fenzo-compatible (in make-fenzo-hp-constraint) or rebalancer-compatible (in make-rebalacer-hp-constaint).

Added unique-host constraint to the rebalancer. Refactored constraints.clj to define JobConstraint and GroupConstraint protocols, which make the code more modular, easier to modify and harder to unintentionally break.

Now synthetic instances in state have slave-id, which is used to find attributes of constraints. Fixed lack of nil check when placement constraint is 'all'.

Multiple minor improvements to address CR by shams

Small improvements as suggested in CR by Wil

Removed gpu-host-constraint-helper

Multiple small CR improvements as suggested by Wil.

More CR fixes as suggested by Wil, mostly on constraints.clj, msotly small refactoring and renaming

Signed-off-by: Diego Torres Quintanilla <diego.torresquintanilla@twosigma.com>
@dposada
Copy link
Contributor

dposada commented Apr 11, 2017

@pschorf Are you planning on taking a look at this?

@@ -19,6 +19,7 @@
[clj-time.core :as t]
[clojure.java.io :as io]
[clojure.pprint :refer (pprint)]
[clojure.core.cache :as cache]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

[this target-vm-resources target-vm-attrs target-vm-tasks-assigned]
"Evaluates whether a vm with resources 'target-vm-resources', attributes 'target-vm-attrs'
and TaskRequests (optional) 'target-vm-tasks-assigned' passes the job's constraint. Must return a
2-element vector, where the first element is a boolean (whether the constraint was passsed) and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about just returning the message if it failed, an nil otherwise?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think returning passes? explicitly is convenient, as it encourages constraint code to name a variable called passes?. Also, this is the format that Fenzo expects, so eventually both reason string and boolean will be in memory.

(concat running-cotasks assigned-cotasks)))

(defn get-fenzo-cotasks
"Given a group, a map of guuids to task-ids, a task-id and a Fenzo task-tracker-state, returns a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

guuids->guids

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

guuid = Group UUID (I use it in many other places)

(group-constraint-name [self] (get-class-name self))
(group-constraint-type [self] (get-group-constraint-type (:group self)))
(group-constraint-evaluate [this target-attr-map cohost-attr-maps]
(let [group (:group this)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this line necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well group is reused in the next 2 lines and the last let definition.

@wyegelwel wyegelwel merged commit 083b255 into master Apr 12, 2017
@scrosby scrosby deleted the host-placement branch November 20, 2018 18:16
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants