Unified Resource Scheduler to co-schedule mixed types of workloads such as batch, stateless and stateful jobs in a single cluster for better resource utilization.
Languages: Go, Python, Thrift, Shell, Makefile, Ruby
zhixinwen Add MesosManager for plugin and separate HostSummary for different clusters

Summary:
k8s and mesos hosts behave differently on host/pod events. For example,
mesos is offer-based (offers carry the available resources), while k8s calculates
available resources from node capacity and the pods running on the node.

The common logic shared by k8s and mesos hosts is abstracted into `baseHostSummary`;
`kubeletHostSummary` and `mesosHostSummary` override the methods that behave
differently.

`MesosManager` is added to receive Mesos offer events, and `hostCache` is updated to
handle the new events.
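
A minimal Go sketch of this structure, using the type names from this summary but with illustrative method and field names that do not correspond to the actual code under `pkg/`:

```go
package hostsummary

// Resources is a simplified stand-in for the real per-host resource type.
type Resources struct {
	CPU float64
	Mem float64
}

// baseHostSummary holds the logic shared by k8s and mesos hosts
// (identity, capacity, status, and so on).
type baseHostSummary struct {
	hostname string
	capacity Resources
}

// kubeletHostSummary derives availability from node capacity minus the
// pods currently running on the node.
type kubeletHostSummary struct {
	baseHostSummary
	podUsage Resources
}

func (s *kubeletHostSummary) Available() Resources {
	return Resources{
		CPU: s.capacity.CPU - s.podUsage.CPU,
		Mem: s.capacity.Mem - s.podUsage.Mem,
	}
}

// mesosHostSummary derives availability from the outstanding offers
// delivered to hostCache by MesosManager.
type mesosHostSummary struct {
	baseHostSummary
	offered Resources
}

func (s *mesosHostSummary) Available() Resources {
	return s.offered
}
```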

Test Plan:
unit test

Integration tests verify that `hostCache` gets updated on Mesos offers.

Reviewers: O521 Project Peloton: Add reviewers, #p2k, adityacb

Reviewed By: O521 Project Peloton: Add reviewers, #p2k, adityacb

Subscribers: jenkins

Maniphest Tasks: T3567131

Differential Revision: https://code.uberinternal.com/D3161281
Latest commit d2b1cdf Aug 15, 2019
Name Latest commit message Commit time
.jenkins Integration Tests: First K8S integration test Aug 8, 2019
cmd Add api proxy to minicluster Aug 15, 2019
config Add api proxy to minicluster Aug 15, 2019
docker Remove unused 'default-config' directory Aug 13, 2019
docs Add a page on Peloton UI Errrors in Peloton documentation Jun 10, 2019
example Fix the way we convert v1alpha entrypoint to k8s podspec Aug 1, 2019
pkg Add MesosManager for plugin and separate HostSummary for different clusters Aug 16, 2019
protobuf Add host labels into host infos table Aug 13, 2019
scripts Added cQoS API protobuf definitions Jul 18, 2019
tests Fix broken performance test client Aug 14, 2019
tools Add 60 secs wait for mesos agent to register with mesos master Aug 15, 2019
.dockerignore Initial commit to sync Peloton repo to github. This commit squashes Feb 28, 2019
.gitignore Removed .arclint from repo, and added it to .gitignore Apr 30, 2019
.gitmodules Initial commit to sync Peloton repo to github. This commit squashes Feb 28, 2019
.pep8 Added python lint checking May 1, 2019
.travis.yml Removed .arclint from repo, and added it to .gitignore Apr 30, 2019
CHANGELOG.md Update changelog for `0.8.9.6` Aug 5, 2019
Dockerfile Add api proxy to minicluster Aug 15, 2019
Dockerfile.deb.jessie Fix debian package make Jun 19, 2019
Dockerfile.deb.trusty Fix debian package make Jun 19, 2019
LICENSE Initial commit to sync Peloton repo to github. This commit squashes Feb 28, 2019
Makefile Add api proxy to minicluster Aug 15, 2019
README.md Clean up markdown files in /docs for readthedocs.io Mar 8, 2019
glide.lock Upgrade grpc to version v1.22.0 Jul 8, 2019
glide.yaml Upgrade grpc to version v1.22.0 Jul 8, 2019
mkdocs.yml Add a page on Peloton UI Errrors in Peloton documentation Jun 10, 2019
requirements.txt Add hostmgr_api_version flag and configure it through fixture params Jul 29, 2019

README.md

Peloton

As compute clusters scale, making efficient use of cluster resources becomes very important. Peloton is a Unified Resource Scheduler that co-schedules mixed types of workloads such as batch, stateless, and stateful jobs in a single cluster for better resource utilization. Peloton is designed for web-scale companies like Uber, with millions of containers and tens of thousands of nodes. Peloton features advanced resource management capabilities such as elastic resource sharing, hierarchical max-min fairness, resource overcommit, and workload preemption. Peloton is also cloud agnostic and can run in on-premises datacenters or in the cloud.

For more details, please see the Peloton Blog Post and Documentation.

Features

  • Elastic Resource Sharing: Support hierarchical resource pools to elastically share resources among different teams (a simplified fair-share sketch follows this feature list).

  • Resource Overcommit and Task Preemption: Improve cluster utilization by scheduling workloads using slack resources and preempting best effort workloads.

  • Optimized for Big Data and Machine Learning: Support GPU and gang scheduling for TensorFlow. Also support advanced Spark features such as dynamic resource allocation.

  • High Scalability: Scale to millions of containers and tens of thousands of nodes.

  • Protobuf/gRPC based API: Support most language bindings, such as Golang, Java, Python, Node.js, etc.

  • Co-scheduling Mixed Workloads: Support mixed workloads such as batch, stateless and stateful jobs in a single cluster.
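
To make the elastic-sharing idea concrete, the sketch below (referenced in the first feature above) divides a parent pool's capacity among child pools using single-level, single-resource max-min fairness. Peloton's real entitlement calculation is hierarchical and multi-dimensional, so the names and structure here are purely illustrative:

```go
package fairshare

import "sort"

// demandShare pairs a child resource pool with its demand for a single
// resource dimension (for example, CPU cores).
type demandShare struct {
	pool   string
	demand float64
}

// maxMinFairShare splits capacity among child pools so that no pool gets
// more than it demands and leftover capacity is shared evenly among the
// pools whose demands are not yet satisfied.
func maxMinFairShare(capacity float64, demands []demandShare) map[string]float64 {
	alloc := make(map[string]float64, len(demands))

	// Satisfy the smallest demands first (sorts the caller's slice in place).
	sort.Slice(demands, func(i, j int) bool { return demands[i].demand < demands[j].demand })

	remaining := capacity
	for i, d := range demands {
		// Equal share of what is left among the pools not yet served.
		share := remaining / float64(len(demands)-i)
		if d.demand < share {
			share = d.demand
		}
		alloc[d.pool] = share
		remaining -= share
	}
	return alloc
}
```

For example, with a capacity of 12 CPUs and child demands of 2, 5, and 10, the allocations are 2, 5, and 5: the pool with the largest demand absorbs only what remains after the smaller demands are met.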

Getting Started

See the Tutorial for step-by-step instructions to start a local minicluster and submit a HelloWorld job to Peloton.

Architecture

To achieve high-availability and scalability, Peloton uses an active-active architecture with four separate daemon types: job manager, resource manager, placement engine, and host manager. The interactions among those daemons are designed so that the dependencies are minimized and only occur in one direction. All four daemons depend on Zookeeper for service discovery and leader election.

The figure below shows the high-level architecture of Peloton built on top of Mesos, Zookeeper, and Cassandra.

Components:

Peloton consists of the following components:

  • Peloton UI: is a web-based UI for managing jobs, tasks, volumes, and resource pools in Peloton.
  • Peloton CLI: is a command-line interface for Peloton with functionality similar to the web-based interface.
  • Peloton API: uses Protocol Buffers as the interface definition language and YARPC as its RPC runtime. Peloton UI, Peloton CLI, and other Peloton extensions are all built on top of the same Peloton API.
  • Host Manager: abstracts away Mesos details from other Peloton components. It registers with Mesos via Mesos HTTP API.
  • Resource Manager: maintains the resource pool hierarchy and periodically calculates the resource entitlement of each resource pool, which is then used to schedule or preempt tasks correspondingly.
  • Placement Engine: finds the placement (i.e., the task-to-host mapping) by taking into account job and task constraints as well as host attributes. Placement engines are pluggable for different job types such as stateful services and batch jobs (a simplified placement sketch follows this component list).
  • Job Manager: handles the lifecycle of jobs, tasks, and volumes. It also supports rolling upgrades of tasks in a job for long-running services.
  • Storage Gateway: provides an abstraction layer on top of different storage backends so that we can migrate from one storage backend to another without significant changes to Peloton itself. A Cassandra backend is built in by default, and the gateway can be extended to other backends.
  • Group Membership: manages the set of Peloton master instances and elects a leader to both register to Mesos as a framework and instantiate a resource manager.
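
As a simplified illustration of the placement problem described under Placement Engine above, the following first-fit sketch matches tasks to hosts on free resources and required host attributes. Peloton's actual placement strategies and constraint model are far richer; every name below is illustrative:

```go
package placement

// Host is a simplified view of what a placement engine sees per host:
// free resources plus attributes (for example, rack or GPU model).
type Host struct {
	Name       string
	FreeCPU    float64
	FreeMemMB  float64
	Attributes map[string]string
}

// Task carries the resource requirements and attribute constraints of a
// single task; real Peloton constraints support richer operators.
type Task struct {
	Name        string
	CPU         float64
	MemMB       float64
	Constraints map[string]string // required host attribute values
}

// place returns a first-fit task-to-host mapping; tasks that fit nowhere
// are simply left out of the result.
func place(tasks []Task, hosts []Host) map[string]string {
	placement := make(map[string]string)
	for _, t := range tasks {
		for i := range hosts {
			h := &hosts[i]
			if h.FreeCPU < t.CPU || h.FreeMemMB < t.MemMB || !matches(h, t) {
				continue
			}
			placement[t.Name] = h.Name
			h.FreeCPU -= t.CPU
			h.FreeMemMB -= t.MemMB
			break
		}
	}
	return placement
}

// matches reports whether a host carries all attribute values a task requires.
func matches(h *Host, t Task) bool {
	for k, v := range t.Constraints {
		if h.Attributes[k] != v {
			return false
		}
	}
	return true
}
```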

References

User Guide

See the User Guide for more detailed information on how to use Peloton.

Peloton CLI

Peloton CLI is a command-line interface for interacting with Peloton clusters, for example to create jobs or check job status. For detailed Peloton CLI commands and arguments, see the CLI Reference.

Peloton API

Peloton defines its APIs using Protobuf as the IDL, and clients access the Peloton API via gRPC. Peloton supports three client bindings by default: Python, Golang, and Java. Any other language binding supported by gRPC should work as well.

See the API Guide for examples of how to use Peloton clients to access the APIs. For detailed Peloton API definition, see the API Reference.
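
As a rough sketch of what a Golang client might look like, the snippet below dials a Peloton endpoint over gRPC and issues a job query. The import path, service and message names, address, and fields are placeholders rather than the actual generated Peloton stubs; see the API Reference for the real definitions:

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"

	// Placeholder import: the real client stubs are generated from the
	// protobuf definitions under protobuf/ in this repository.
	jobsvc "example.com/peloton/gen/job/svc"
)

func main() {
	// The address and port of the Peloton endpoint are deployment specific.
	conn, err := grpc.Dial("localhost:5392", grpc.WithInsecure())
	if err != nil {
		log.Fatalf("dial: %v", err)
	}
	defer conn.Close()

	client := jobsvc.NewJobServiceClient(conn)
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// GetJob, GetJobRequest, and the JobId field are illustrative names;
	// consult the generated stubs for the actual request/response types.
	resp, err := client.GetJob(ctx, &jobsvc.GetJobRequest{JobId: "my-job-id"})
	if err != nil {
		log.Fatalf("GetJob: %v", err)
	}
	log.Printf("job: %v", resp)
}
```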

Contributing

See the Developer Guide on how to build Peloton from source code.

Resources

Documentation

Blogs

Tech Talks

Contact

To contact us, please join our Slack channel.

License

Peloton is under the Apache 2.0 license. See the LICENSE file for details.
