
Adds per-pool job scheduling #870

Merged
merged 80 commits into twosigma:master from dposada:pool-scheduling on Aug 28, 2018

@dposada
Member

dposada commented May 30, 2018

Changes proposed in this PR

The Main Point

This PR turns on per-pool task scheduling: offers are split by pool, and jobs are scheduled only within the pool they belong to. Rebalancing also happens per pool.
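
Below is a minimal sketch of that idea, assuming offers are plain maps with an :attributes map and that the pool-naming attribute is called "cook-pool" (both are illustrative, not the PR's actual code; the attribute name is left to be made configurable in a follow-up):

;; Illustrative only: split offers by the pool attribute on each agent;
;; agents with no pool attribute fall into a catch-all "no-pool" bucket.
(defn offers->pool->offers
  [offers]
  (group-by #(get-in % [:attributes "cook-pool"] "no-pool") offers))

;; Example:
(offers->pool->offers
  [{:id "o1" :attributes {"cook-pool" "gpu"}}
   {:id "o2" :attributes {"cook-pool" "default"}}
   {:id "o3" :attributes {}}])
;; => {"gpu" [{:id "o1" ...}], "default" [{:id "o2" ...}], "no-pool" [{:id "o3" ...}]}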

Details

Code

  • changing the scheduler to schedule within each pool
  • changing the rebalancer to preempt within each pool
  • introducing pool "DRU mode", which can be either "default" or "gpu" and determines which DRU-ranking function is used (see the sketch after this list)
  • changing docstrings and function names from category to pool
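
As a rough, self-contained illustration of the "DRU mode" item above (the function names, pool names, and job shapes here are stand-ins, not Cook's actual identifiers):

;; Stand-ins for the two ranking functions; Cook's real DRU sorters take
;; pending tasks, running tasks, and DRU divisors, but simple sort-bys are
;; enough to show the per-pool dispatch.
(defn sort-jobs-by-dru-default [jobs] (sort-by :dru jobs))
(defn sort-jobs-by-dru-gpu [jobs] (sort-by :gpu-dru jobs))

(defn dru-mode->sort-fn
  "A pool's DRU mode (:default or :gpu) selects the ranking function used
   to order its pending jobs."
  [dru-mode]
  (case dru-mode
    :default sort-jobs-by-dru-default
    :gpu sort-jobs-by-dru-gpu))

;; Each pool is tagged with a DRU mode, yielding a per-pool sorter.
(def pool->dru-mode {"default" :default, "gpu-pool" :gpu})

(def pool->sort-fn
  (into {} (map (fn [[pool mode]] [pool (dru-mode->sort-fn mode)]) pool->dru-mode)))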

Testing

  • setting --maxfail=5 for our pytest runs
  • fixing test_queue_endpoint and test_get_queue to account for pools instead of normal/gpu
  • making test_attribute_equals_hostname_constraint more reliable by accounting for the possibility that not enough cpus are available
  • making test_pool_specific_quota_check_on_submit more reliable by bailing out in pools where the quota exceeds the configured task constraint for a resource
  • turning on some more namespaces for debug-level logging
  • upgrading minimesos to the version that supports agent attributes
  • tagging the minimesos agents with pools
  • adding an integration test, test_pool_scheduling, specifically for per-pool scheduling

Why are we making these changes?

We want to support scheduling jobs on heterogeneous clusters (e.g. some preemptible machines, some non-preemptible machines) with a single Cook instance while preserving the independence of scheduling provided by separate ranking and matching cycles for each of these types of machines.

For more context, see issue #755.

Things left to do in subsequent PRs

  • make the name of the attribute that marks the pool on each offer configurable
  • make all relevant metrics pool-specific

@dposada dposada self-assigned this May 30, 2018

@dposada dposada added this to To do in Pools via automation May 30, 2018

dposada added some commits Jun 1, 2018

@dposada dposada moved this from To do to In progress in Pools Jun 14, 2018

dposada added some commits Jun 28, 2018

@shamsimam

Contributor

shamsimam commented Aug 24, 2018

@dposada Can you please rebase this PR?

@shamsimam
Contributor

shamsimam left a comment

This PR would have been simpler to review as two separate PRs, one that modified the scheduler and a subsequent one that modified the rebalancer.

"Host had a different attribute than other jobs in the group.",
# Or, if there are no other offers, we simply don't have enough cpus:
"Not enough cpus available."
]


@shamsimam

shamsimam Aug 24, 2018

Contributor

Nitpick: should reasons be a set?

return len(active_pools(cook_url)[0]) > 1


def default_pool_num_hosts(cook_url, mesos_url):


@shamsimam

shamsimam Aug 24, 2018

Contributor

Nitpick: this name is confusing because it tries to handle the case where no pools are configured, which is different from a default pool.


@dposada

dposada Aug 24, 2018

Member

I agree, renamed to num_hosts_to_consider

self.assertEqual(422, resp.status_code, msg=resp.content)
cpus_over_quota = quota['cpus'] + 0.1
if cpus_over_quota > task_constraint_cpus:
self.logger.info(f'Unable to check CPU quota on {pool_name} because the quota ({quota["cpus"]}) is '


@shamsimam

shamsimam Aug 24, 2018

Contributor

Should this be a warn?


@dposada

dposada Aug 24, 2018

Member

I don't think so. We have environments where this is expected to happen.


@shamsimam

shamsimam Aug 27, 2018

Contributor

Actually, should it log as an error then if this is unexpected?


@dposada

dposada Aug 28, 2018

Member

We have environments where it's valid and expected for the quota to be higher than the task constraint

self.assertEqual(422, resp.status_code, msg=resp.content)
mem_over_quota = quota['mem'] + 1
if mem_over_quota > task_constraint_mem:
self.logger.info(f'Unable to check mem quota on {pool_name} because the quota ({quota["mem"]}) is '


@shamsimam

shamsimam Aug 24, 2018

Contributor

Should this be a warn?


@dposada

dposada Aug 24, 2018

Member

I don't think so. We have environments where this is expected to happen.


@shamsimam

shamsimam Aug 27, 2018

Contributor

Actually, should it log as an error then if this is unexpected?


@dposada

dposada Aug 28, 2018

Member

We have environments where it's valid and expected for the quota to be higher than the task constraint

@unittest.skipUnless(util.multi_user_tests_enabled(), 'Requires using multi-user configuration '
'(e.g., BasicAuth) for Cook Scheduler')
@pytest.mark.timeout(util.DEFAULT_TEST_TIMEOUT_SECS) # individual test timeout
class PoolsCookTest(util.CookTest):


@shamsimam

shamsimam Aug 24, 2018

Contributor

Should we have an integration test that verifies jobs get scheduled on hosts which have the appropriate pool attributes?


@dposada

dposada Aug 24, 2018

Member

Good call, I added an assertion for this to the same test

(timers/time!
generate-user-usage-map-duration
(->> (util/get-running-task-ents unfiltered-db)
(map :job/_instance)
(remove #(not= pool (util/job->pool %)))


@shamsimam

shamsimam Aug 24, 2018

Contributor

Is it better to include the pool filter into the get-running-task-ents query?


@dposada

dposada Aug 24, 2018

Member

I don't think so. This is also called by the optimizer, which (as discussed in a previous comment) may want to have the full picture.

running-tasks (pool->running-task-ents pool)
user->dru-divisors (pool->user->dru-divisors pool)]
[pool (sort-jobs-by-dru pending-tasks running-tasks user->dru-divisors)]))]
(into {} (map sort-jobs-by-dru-pool-helper) pool->sort-jobs-by-dru-fn))))


@shamsimam

shamsimam Aug 24, 2018

Contributor

This is a good fit for pc/map-vals


@dposada

dposada Aug 24, 2018

Member

I would vote to leave this as is - I'm not against using the pc functions in general, but neither map-vals nor map-from-keys is a home run here in my opinion

"Given a collection of pools, and a function val-fn that takes a pool,
returns a map from keyword-ized pool name to (val-fn pool)"
[pools val-fn]
(into {} (map (fn [p] [(pool->keyword p) (val-fn p)]) pools)))


@shamsimam

shamsimam Aug 24, 2018

Contributor

This is a good fit for pc/map-from-keys and if we decide to keep pool->keyword, pc/map-keys


@dposada

dposada Aug 24, 2018

Member

Good call, done
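
For reference, a hedged sketch of what the pc/map-from-keys plus pc/map-keys version could look like (the exact code in the commit may differ; the defn name pool-map and the pool->keyword stand-in below are assumptions, not confirmed by this PR):

(require '[plumbing.core :as pc])

;; Stand-in implementation of the pool->keyword helper referenced above.
(defn pool->keyword [pool] (keyword (:pool/name pool)))

(defn pool-map
  "Given a collection of pools and a function val-fn that takes a pool,
   returns a map from keyword-ized pool name to (val-fn pool)."
  [pools val-fn]
  (->> (pc/map-from-keys val-fn pools)   ; {pool (val-fn pool)}
       (pc/map-keys pool->keyword)))     ; {(pool->keyword pool) (val-fn pool)}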

(decline-offers driver (map :id offers))
(catch Exception e
(log/error e "Unable to decline offers!")))))))
(decline-offers-safe driver offers)))))


@shamsimam

shamsimam Aug 24, 2018

Contributor

Nitpick: Can we move the definition of decline-offers-safe closer to receive-offers?

@@ -64,7 +65,7 @@
(lookup-cache! cache :db/id miss-fn entity))

(defonce ^Cache job-ent->resources-cache (new-cache))
(defonce ^Cache categorize-job-cache (new-cache))
(defonce ^Cache job-ent->pool-cache (new-cache))


@shamsimam

shamsimam Aug 24, 2018

Contributor

Is it better to set up this cache as job-uuid->pool?


@dposada

dposada Aug 24, 2018

Member

I'm not sure. My goal here was to keep this as close to the previous job to category cache as possible.


@shamsimam

shamsimam Aug 24, 2018

Contributor

okay.

dposada added some commits Aug 24, 2018

all_job_uuids.append(filling_job_uuid)
instance = util.wait_for_running_instance(self.cook_url, filling_job_uuid)
slave_pool = util.slave_pool(self.mesos_url, instance['hostname'])
self.assertEqual(pool_name, slave_pool)


@dposada

dposada Aug 24, 2018

Member

@shamsimam Here's the assertion I added to ensure the host where the instance is running has the matching agent attribute

@dposada


Member

dposada commented Aug 24, 2018

@shamsimam I addressed your feedback

@@ -1473,14 +1473,18 @@ def test_queue_endpoint(self):
uuids, resp = util.submit_jobs(self.cook_url, job_spec, clones=100, groups=[group])
self.assertEqual(201, resp.status_code, resp.content)
try:
default_pool = util.default_pool(self.cook_url)
pool = default_pool if default_pool is not None else 'no-pool'


@shamsimam

shamsimam Aug 27, 2018

Contributor

Nitpick: default_pool or 'no-pool'

self.assertEqual(422, resp.status_code, msg=resp.content)
cpus_over_quota = quota['cpus'] + 0.1
if cpus_over_quota > task_constraint_cpus:
self.logger.info(f'Unable to check CPU quota on {pool_name} because the quota ({quota["cpus"]}) is '


@shamsimam

shamsimam Aug 27, 2018

Contributor

Actually, should it log as an error then if this is unexpected?

@@ -31,12 +31,15 @@ def test_get_queue(self):
try:
slave_queue = util.session.get('%s/queue' % self.slave_url, allow_redirects=False)
self.assertEqual(307, slave_queue.status_code)
default_pool = util.default_pool(self.master_url)
pool = default_pool if default_pool is not None else 'no-pool'


@shamsimam

shamsimam Aug 27, 2018

Contributor

Nitpick: default_pool or 'no-pool'

@@ -1185,3 +1194,26 @@ def is_preemption_enabled():
def current_milli_time():
"""Returns the current epoch time in milliseconds"""
return int(round(time.time() * 1000))


@functools.lru_cache()


@shamsimam

shamsimam Aug 27, 2018

Contributor

Nitpick: should we configure a maxsize for the cache?


@dposada

dposada Aug 28, 2018

Member

The function being decorated has no arguments, so we're really just saying "don't call this more than once"

@@ -101,7 +101,7 @@
Parameters:
get-queue -- fn, no args fn that returns ordered list of jobs to run
get-running -- fn, no args fn that returns a set of tasks running
get-offers -- fn, no args fn that returns a set of offers
get-offers -- fn, 1 arg fn that takes a pool keyword and returns a set of offers in that pool


@shamsimam

shamsimam Aug 27, 2018

Contributor

Should it be pool name instead of pool keyword in the docstring?

[:cpus :mem :gpus])]))
(into {}))
pool-name (name pool)
pool-ent (if (= :no-pool pool)


@shamsimam

shamsimam Aug 27, 2018

Contributor

Is pool intentionally left as a keyword here?


@dposada

dposada Aug 28, 2018

Member

Thanks, this was fixed in a subsequent commit

(log/info "Rebalancing for pool" pool-name)
(rebalance! db conn driver agent-attributes-cache rebalancer-reservation-atom
params init-state jobs-to-make-room-for)))
@pending-jobs-atom)


@shamsimam

shamsimam Aug 27, 2018

Contributor

Nitpick: can we rename pending-jobs-atom to pool->pending-jobs-atom?

@dposada


Member

dposada commented Aug 28, 2018

@shamsimam I addressed your feedback

@dposada


Member

dposada commented Aug 28, 2018

@pschorf Do you have any other feedback?

@pschorf pschorf merged commit d8e4827 into twosigma:master Aug 28, 2018

1 check passed: continuous-integration/travis-ci/pr (The Travis CI build passed)

Pools automation moved this from In progress to Done Aug 28, 2018

@dposada dposada deleted the dposada:pool-scheduling branch Aug 28, 2018

dposada added a commit to dposada/Cook that referenced this pull request Nov 1, 2018
