Skip to content
This repository has been archived by the owner on Apr 24, 2023. It is now read-only.

Latest commit

 

History

History
62 lines (38 loc) · 5.09 KB

rebalancer-config.adoc

File metadata and controls

62 lines (38 loc) · 5.09 KB

Rebalancer Configuration

Cook’s preemption is performed by the Rebalancer. The Rebalancer is distinct from the Scheduler, which of course is responsible for actually scheduling jobs.

The Rebalancer periodically examines all running Cook tasks on the cluster, together with all of the pending jobs. Combining these two sets of jobs, Cook imagines alternate permutations of tasks that could be running on the cluster. If it finds a permutation that represents a significant improvement (in terms of resource usage balance between users) over the set of tasks that are currently running, the Rebalancer will kill currently running tasks that are not a part of the desirable set of running tasks.

Importantly, the Rebalancer does not actually schedule the pending jobs that are a part of the desirable permuation; the idea is that the Scheduler can be trusted to do a better job to improve balance, given a second chance with the new room that is Rebalancer is creating by pre-empting jobs.

For more detailed information about the design of the Rebalancer, see the source code here:

How the Configuration is stored

While you can optionally set the rebalancer configuration parameters via the scheduler’s main configuration file, the Rebalancer configuration is also stored in Datomic. Among other advantages, this allows interested parties to use Datomic’s history API to discover what configuration was in effect at any point in the past, in order to facilitate analysis and troubleshooting.

In Cook’s database, Rebalancer configuration options are facts about the entity with id :rebalancer/config. Thus, to configure the systemwide behavior of the Rebalancer, you will need to run code like this:

(let [conn (d/connect "(your-cook-database-url)")
        db (d/db conn)]
    @(d/transact conn [{:db/id :rebalancer/config
                        :rebalancer.config/safe-dru-threshold 0.0
                        :rebalancer.config/min-dru-diff 0.0000000001
                        :rebalancer.config/max-preemption 64.0}]))

Conversely, you can read the current configuration like this:

(d/pull db ["*"] :rebalancer/config)

Significance of the parameters

  • safe-dru-threshold: Task with a DRU lower than safe-dru-threshold will not be preempted. If safe-dru-threshold is set to 1.0, then tasks that consume resources in aggregate less than the user resource share will not be preempted.

  • min-dru-diff: The minimal DRU difference required to make a preemption action. This is also the maximal "unfairness" Rebalancer is willing to tolerate.

  • max-preemption: The maximum number of preemptions Rebalancer can make in one cycle.

Configuring user shares

In addition to the system-wide rebalancer parameters described above, you may want to set specific resources shares for individual users, or set the default shares that are in effect when a user hasn’t been assigned an individual share.

User shares are per-resource, per-user. For example, a user "researcher1" could be allocated 150 CPU’s and 1 Terabyte of memory. The resource usage of a set of jobs is divided by a user’s share values in order to arrive at the DRU for that set of jobs.

User shares are also stored only in Datomic. For examples of how to read and set shares, see share.clj

If you don’t set any shares in Datomic, the default share value in effect for any resource and user will be Double/MAX_VALUE. This will mean that the Rebalancer will be working with extremely low DRU values.

Practical advice

Note that there is no default configuration for the Rebalancer. Thus, no preemption will ever occur unless you configure it (as shown above).

If you set your parameters to be encourage a lot of pre-emption, as in the example above, it is possible that you will harm your cluster’s performance, as jobs will rarely be allowed to finish. On the other hand, if you don’t configure pre-emption, or use parameters that don’t allow preemption to happen very often, the Fairness of Cook may not be effectively enforced as a few users consume too many resources via long-running tasks. Unfortunately, it’s impossible to suggest specific parameter values, because the point at which each of these problems will start to occur will vary depending on your cluster configuration, and on what types of jobs are being run on your cluster. Some experimentation will likely be required in order to find the optimal settings for your own installation.

In order to intelligently tweak safe-dru-threshold and min-dur-diff, you will need to know specifics about the DRU values that are in play during the Rebalancer’s calculation on your cluster. Fortunately, Cook records these values in the forms of histogram metrics, which you can access in a variety of ways (see main configuration documentation. Look for these histograms:

  • pending-job-drus

  • nearest-task-drus

  • positive-dru-diffs (this one in particular will help you to speculate about the ramifications of various values of min-dru-diff).