P2P Docker registry capable of distributing TBs of data in seconds

Kraken 🐙


Kraken is a P2P-powered Docker registry that focuses on scalability and availability. It is designed for Docker image management, replication and distribution in a hybrid cloud environment. With pluggable backend support, Kraken can easily integrate into existing Docker registry setups as the distribution layer.

Kraken has been in production at Uber since early 2018. In our busiest cluster, Kraken distributes more than 1 million blobs per day, including 100k 1G+ blobs. At its peak production load, Kraken distributes 20K 100MB-1G blobs in under 30 sec.

Below is the visualization of a small Kraken cluster at work:


Features

The following are some highlights of Kraken:

  • Highly scalable. Kraken is capable of distributing Docker images at more than 50% of the maximum download speed limit on every host. Cluster size and image size do not have a significant impact on download speed.
    • Supports at least 15k hosts per cluster.
    • Supports arbitrarily large blobs/layers. We normally limit max size to 20G for best performance.
  • Highly available. No component is a single point of failure.
  • Secure. Supports uploader authentication and data integrity protection through TLS.
  • Pluggable storage options. Instead of managing data, Kraken plugs into reliable blob storage options, like S3, HDFS or another registry. The storage interface is simple and new options are easy to add.
  • Lossless cross cluster replication. Kraken supports rule-based async replication between clusters.
  • Minimal dependencies. Other than pluggable storage, Kraken only has an optional dependency on DNS.
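
To illustrate what a simple, pluggable storage interface could look like, here is a minimal sketch. The `Backend` interface, `MemoryBackend` type, and `roundTrip` helper are hypothetical names invented for this example; they are not Kraken's actual storage API, which may differ in method names and signatures.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"strings"
	"sync"
)

// ErrBlobNotFound is returned when a backend has no blob under the given name.
var ErrBlobNotFound = errors.New("blob not found")

// Backend is a hypothetical minimal storage interface (not Kraken's real one).
// Anything that can upload and download named blobs can be plugged in.
type Backend interface {
	Upload(name string, src io.Reader) error
	Download(name string, dst io.Writer) error
}

// MemoryBackend keeps blobs in a map. It stands in for S3, HDFS, or a registry.
type MemoryBackend struct {
	mu    sync.RWMutex
	blobs map[string][]byte
}

// NewMemoryBackend returns an empty in-memory backend.
func NewMemoryBackend() *MemoryBackend {
	return &MemoryBackend{blobs: make(map[string][]byte)}
}

// Upload reads src fully and stores it under name, replacing any old value.
func (m *MemoryBackend) Upload(name string, src io.Reader) error {
	data, err := io.ReadAll(src)
	if err != nil {
		return err
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	m.blobs[name] = data
	return nil
}

// Download writes the blob stored under name to dst.
func (m *MemoryBackend) Download(name string, dst io.Writer) error {
	m.mu.RLock()
	data, ok := m.blobs[name]
	m.mu.RUnlock()
	if !ok {
		return ErrBlobNotFound
	}
	_, err := dst.Write(data)
	return err
}

// roundTrip uploads content and reads it back through the interface only.
func roundTrip(b Backend, name, content string) (string, error) {
	if err := b.Upload(name, strings.NewReader(content)); err != nil {
		return "", err
	}
	var out bytes.Buffer
	if err := b.Download(name, &out); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	got, err := roundTrip(NewMemoryBackend(), "sha256:abc", "layer bytes")
	if err != nil {
		panic(err)
	}
	fmt.Println(got)
}
```

Because callers depend only on the two-method interface, swapping the in-memory map for a remote store would not change any calling code in this sketch.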

Design

The high level idea of Kraken is to have a small number of dedicated hosts seed content to a network of agents running on each host in the cluster. A central component, the tracker, will orchestrate all participants in the network to form a pseudo-random regular graph. Such a graph has high connectivity and small diameter, so all participants in a reasonably sized cluster can reach > 80% of max upload/download speed in theory, and performance doesn't degrade much as the blob size and cluster size increase.
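
The claim about connectivity and diameter can be explored with a small simulation. The sketch below is not Kraken's tracker logic: it assumes one well-known way of building a roughly regular random graph, namely overlaying several random Hamiltonian cycles, and then measures the diameter with breadth-first search. The function names and the peer and cycle counts are chosen for illustration only.

```go
package main

import (
	"fmt"
	"math/rand"
)

// buildGraph overlays `cycles` random Hamiltonian cycles over n peers.
// Each peer ends up with at most 2*cycles distinct neighbors, and the
// graph is always connected because every cycle visits every peer.
func buildGraph(n, cycles int, seed int64) []map[int]bool {
	rng := rand.New(rand.NewSource(seed))
	adj := make([]map[int]bool, n)
	for i := range adj {
		adj[i] = make(map[int]bool)
	}
	for c := 0; c < cycles; c++ {
		order := rng.Perm(n)
		for i := 0; i < n; i++ {
			a, b := order[i], order[(i+1)%n]
			if a != b {
				adj[a][b] = true
				adj[b][a] = true
			}
		}
	}
	return adj
}

// eccentricity returns the largest hop count from src to any peer,
// or -1 if some peer is unreachable.
func eccentricity(adj []map[int]bool, src int) int {
	dist := make([]int, len(adj))
	for i := range dist {
		dist[i] = -1
	}
	dist[src] = 0
	queue := []int{src}
	far, visited := 0, 1
	for len(queue) > 0 {
		u := queue[0]
		queue = queue[1:]
		for v := range adj[u] {
			if dist[v] == -1 {
				dist[v] = dist[u] + 1
				if dist[v] > far {
					far = dist[v]
				}
				visited++
				queue = append(queue, v)
			}
		}
	}
	if visited != len(adj) {
		return -1
	}
	return far
}

// diameter returns the largest eccentricity, or -1 if disconnected.
func diameter(adj []map[int]bool) int {
	d := 0
	for s := range adj {
		e := eccentricity(adj, s)
		if e < 0 {
			return -1
		}
		if e > d {
			d = e
		}
	}
	return d
}

// degreeRange returns the smallest and largest neighbor counts.
func degreeRange(adj []map[int]bool) (lo, hi int) {
	lo = len(adj)
	for _, nbrs := range adj {
		if len(nbrs) < lo {
			lo = len(nbrs)
		}
		if len(nbrs) > hi {
			hi = len(nbrs)
		}
	}
	return lo, hi
}

func main() {
	adj := buildGraph(1000, 5, 42) // 1000 peers, up to 10 neighbors each
	lo, hi := degreeRange(adj)
	fmt.Printf("degree range: %d..%d, diameter: %d hops\n", lo, hi, diameter(adj))
}
```

For comparison, a plain ring of 1000 peers has a diameter of 500 hops; running the sketch shows how far a handful of random connections per peer shortens the longest path.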

Architecture

  • Agent
    • Deployed on every host
    • Implements Docker registry interface
    • Announces available content to tracker
    • Connects to peers returned by tracker to download content
  • Origin
    • Dedicated seeders
    • Stores blobs as files on disk backed by pluggable storage (e.g. S3)
    • Forms a self-healing hash ring to distribute load
  • Tracker
    • Tracks which peers have what content (both in-progress and completed)
    • Provides ordered lists of peers to connect to for any given blob
  • Proxy
    • Implements Docker registry interface
    • Uploads each image layer to the responsible origin (remember, origins form a hash ring)
    • Uploads tags to build-index
  • Build-Index
    • Mapping of human-readable tag to blob digest
    • No consistency guarantees: client should use unique tags
    • Powers image replication between clusters (simple duplicated queues with retry)
    • Stores tags as files on disk backed by pluggable storage (e.g. S3)
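
The way a proxy could pick the responsible origin for a layer can be sketched with a generic consistent-hash ring. This is an illustration under assumptions, not Kraken's implementation: the `Ring` type, the use of SHA-256, the virtual-node count, and the origin names are all invented for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"sort"
)

// point is one virtual node on the ring.
type point struct {
	hash   uint64
	origin string
}

// Ring is a hypothetical consistent-hash ring over origin names.
type Ring struct {
	points []point
}

// hashKey maps a string to a position on the ring.
func hashKey(s string) uint64 {
	sum := sha256.Sum256([]byte(s))
	return binary.BigEndian.Uint64(sum[:8])
}

// NewRing places `replicas` virtual nodes per origin on the ring.
func NewRing(origins []string, replicas int) *Ring {
	r := &Ring{}
	for _, o := range origins {
		for i := 0; i < replicas; i++ {
			r.points = append(r.points, point{hashKey(fmt.Sprintf("%s#%d", o, i)), o})
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i].hash < r.points[j].hash })
	return r
}

// Owner returns the origin whose virtual node is first at or after the
// digest's position, wrapping around. It returns "" for an empty ring.
func (r *Ring) Owner(digest string) string {
	if len(r.points) == 0 {
		return ""
	}
	h := hashKey(digest)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i].hash >= h })
	if i == len(r.points) {
		i = 0
	}
	return r.points[i].origin
}

// sampleDigests generates n placeholder digest strings for the demo.
func sampleDigests(n int) []string {
	out := make([]string, n)
	for i := range out {
		out[i] = fmt.Sprintf("sha256:blob-%d", i)
	}
	return out
}

// movedKeys counts digests that changed owner even though their old
// owner is still present, i.e. movement not caused by the removal.
func movedKeys(before, after *Ring, digests []string, removed string) int {
	moved := 0
	for _, d := range digests {
		old := before.Owner(d)
		if old != removed && after.Owner(d) != old {
			moved++
		}
	}
	return moved
}

func main() {
	origins := []string{"origin1", "origin2", "origin3"}
	full := NewRing(origins, 64)
	reduced := NewRing(origins[:2], 64) // origin3 has failed
	digests := sampleDigests(1000)
	fmt.Println("owner of first blob:", full.Owner(digests[0]))
	fmt.Println("unnecessary moves after failure:", movedKeys(full, reduced, digests, "origin3"))
}
```

The property worth noting is that when one origin leaves, only the blobs it owned are reassigned; every other blob keeps its owner, which is what keeps rebalancing cheap when the ring heals.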

Benchmark

The following data is from a test where a 3G Docker image with 2 layers is downloaded by 2600 hosts concurrently (5200 blob downloads), with 300MB/s speed limit on all agents (using 5 trackers and 5 origins):

  • p50 = 10s (at speed limit)
  • p99 = 18s
  • p99.9 = 22s
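
The "at speed limit" note on p50 follows from simple arithmetic, sketched below. The calculation assumes that 3G means 3000 MB; with 3072 MB the lower bound would be slightly above 10 seconds. The function names are made up for this example.

```go
package main

import "fmt"

// minTransferSeconds is the shortest possible time to move sizeMB of data
// through a link capped at limitMBps.
func minTransferSeconds(sizeMB, limitMBps float64) float64 {
	return sizeMB / limitMBps
}

// blobDownloads is the number of individual layer downloads in the test.
func blobDownloads(hosts, layersPerImage int) int {
	return hosts * layersPerImage
}

// totalVolumeGB is the total amount of data delivered across all hosts.
func totalVolumeGB(hosts int, imageGB float64) float64 {
	return float64(hosts) * imageGB
}

func main() {
	const (
		hosts     = 2600
		layers    = 2
		imageMB   = 3000.0 // assumption: 3G taken as 3000 MB
		limitMBps = 300.0
	)
	fmt.Printf("lower bound per host: %.1f s\n", minTransferSeconds(imageMB, limitMBps))
	fmt.Printf("blob downloads: %d\n", blobDownloads(hosts, layers))
	fmt.Printf("total volume: %.0f GB\n", totalVolumeGB(hosts, imageMB/1000))
}
```

Under that assumption, no host can finish in less than 10 seconds, so a p50 of 10s means at least half of the hosts downloaded at essentially the configured limit.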

Usage

All Kraken components can be deployed as Docker containers. To build the Docker images:

$ make images

For information about how to configure and use Kraken, please refer to the documentation.

Kraken on Kubernetes

You can use our example Helm chart to deploy Kraken (with an example HTTP file server backend) on your k8s cluster:

$ helm install --name=kraken-demo ./helm

Once deployed, every node will have a Docker registry API exposed on localhost:30081. For an example pod spec that pulls images from the Kraken agent, see example.

For more information on k8s setup, see README.
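
Pulling through the node-local agent means image references need to point at localhost:30081 rather than at the original registry. The helper below is a hypothetical sketch of that rewrite, not part of Kraken. It relies on the usual Docker convention that the first path component is a registry host only if it contains "." or ":" or equals "localhost", and it assumes the agent serves repositories under the same names as the backing registry.

```go
package main

import (
	"fmt"
	"strings"
)

// stripRegistry removes a leading registry host from an image reference.
// The first path component counts as a host only if it contains "." or ":"
// or is exactly "localhost" (the common Docker reference convention).
func stripRegistry(image string) string {
	i := strings.Index(image, "/")
	if i < 0 {
		return image // no slash, so no registry host is present
	}
	first := image[:i]
	if strings.ContainsAny(first, ".:") || first == "localhost" {
		return image[i+1:]
	}
	return image
}

// viaAgent rewrites an image reference so that it is pulled through the
// agent listening on agentAddr (hypothetical helper for illustration).
func viaAgent(agentAddr, image string) string {
	return agentAddr + "/" + stripRegistry(image)
}

func main() {
	const agent = "localhost:30081" // address from the Helm example above
	for _, img := range []string{
		"library/busybox:latest",
		"registry.example.com:5000/team/app:v1", // placeholder registry host
	} {
		fmt.Println(viaAgent(agent, img))
	}
}
```

A rewrite like this could be applied to the `image` fields of a pod spec before submitting it, so that every node pulls from its own agent.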

Devcluster

To start a herd container (which contains origin, tracker, build-index and proxy) and two agent containers with development configuration:

$ make devcluster

Docker for Mac is required to run devcluster on your laptop. For more information on devcluster, please check out devcluster README.

Comparison With Other Projects

Dragonfly from Alibaba

A Dragonfly cluster has one or a few "supernodes" that coordinate the transfer of every 4MB chunk of data in the cluster. While the supernode would be able to make optimal decisions, the throughput of the whole cluster is limited by the processing power of one or a few hosts, and the performance would degrade linearly as either blob size or cluster size increases.

Kraken's tracker only helps orchestrate the connection graph, and leaves negotiation of actual data transfer to individual peers, so Kraken scales better with large blobs. On top of that, Kraken is HA and supports cross cluster replication, both of which are required for a reliable hybrid cloud setup.

BitTorrent

Kraken was initially built with a BitTorrent driver; however, we ended up implementing our own P2P driver based on the BitTorrent protocol to allow for tighter integration with storage solutions and more control over performance optimizations.

Kraken's problem space is slightly different from what BitTorrent was designed for. Kraken's goal is to reduce global max download time and communication overhead in a stable environment, while BitTorrent was designed for an unpredictable and adversarial environment, so it needs to preserve more copies of scarce data and defend against malicious or badly behaving peers.

Despite the differences, we re-examine Kraken's protocol from time to time, and if it's feasible, we hope to make it compatible with BitTorrent again.

Limitations

  • If Docker registry throughput is not the bottleneck in your deployment workflow, switching to Kraken will not magically speed up your docker pull. To actually speed up docker pull, consider switching to Makisu to improve layer reusability at build time, or tweak compression ratios, as docker pull spends most of the time on data decompression.
  • Mutating tags (e.g. updating a latest tag) is allowed; however, a few things will not work: tag lookups immediately afterwards will still return the old value due to Nginx caching, and replication probably won't trigger. We are working on supporting this functionality better. If you need tag mutation support right now, please reduce the cache interval of the build-index component. If you also need replication in a multi-cluster setup, please consider setting up another Docker registry as Kraken's backend.
  • Theoretically, Kraken should distribute blobs of any size without significant performance degradation, but at Uber we enforce a 20G limit and cannot endorse the production use of ultra-large blobs (i.e. 100G+). Peers enforce connection limits on a per-blob basis, and new peers might be starved for connections if no peers become seeders relatively soon. If you have ultra-large blobs you'd like to distribute, we recommend breaking them into <10G chunks first.
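
The recommendation to break ultra-large blobs into smaller chunks can be sketched as a streaming splitter. This is a generic illustration, not a Kraken tool: `splitChunks`, `bufferChunk`, and `splitString` are hypothetical names, and the in-memory chunk writer stands in for whatever files or uploads a real workflow would use.

```go
package main

import (
	"bufio"
	"bytes"
	"errors"
	"fmt"
	"io"
	"strings"
)

// splitChunks streams src into consecutive chunks of at most chunkSize
// bytes. newChunk supplies a writer for each chunk, so nothing larger than
// the copy buffer is held in memory. It returns the number of chunks.
func splitChunks(src io.Reader, chunkSize int64, newChunk func(index int) (io.WriteCloser, error)) (int, error) {
	if chunkSize <= 0 {
		return 0, errors.New("chunk size must be positive")
	}
	br := bufio.NewReader(src)
	count := 0
	for {
		// Peek so that no empty trailing chunk is created at end of input.
		if _, err := br.Peek(1); err == io.EOF {
			return count, nil
		} else if err != nil {
			return count, err
		}
		w, err := newChunk(count)
		if err != nil {
			return count, err
		}
		_, copyErr := io.CopyN(w, br, chunkSize)
		closeErr := w.Close()
		if copyErr != nil && copyErr != io.EOF {
			return count, copyErr
		}
		if closeErr != nil {
			return count, closeErr
		}
		count++
	}
}

// bufferChunk is an in-memory chunk writer used only for the demo.
type bufferChunk struct{ bytes.Buffer }

// Close satisfies io.WriteCloser; there is nothing to release.
func (b *bufferChunk) Close() error { return nil }

// splitString splits s in memory, which makes the behavior easy to inspect.
func splitString(s string, chunkSize int64) ([]string, error) {
	var chunks []*bufferChunk
	_, err := splitChunks(strings.NewReader(s), chunkSize, func(int) (io.WriteCloser, error) {
		c := &bufferChunk{}
		chunks = append(chunks, c)
		return c, nil
	})
	if err != nil {
		return nil, err
	}
	out := make([]string, len(chunks))
	for i, c := range chunks {
		out[i] = c.String()
	}
	return out, nil
}

func main() {
	// A tiny chunk size keeps the demo readable; a real split would use a
	// size below the 10G recommendation and write each chunk to a file.
	parts, err := splitString("abcdefghij", 4)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(parts), parts)
}
```

Each chunk can then be distributed as its own blob and reassembled in index order on the receiving side.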

Contributing

Please check out our guide.

Contact

To contact us, please join our Slack channel.
