
Improve performance of multi-node scenarios #26

Open

dhiltgen opened this issue Nov 7, 2020 · 0 comments · May be fixed by #106
Labels: help wanted

Comments

@dhiltgen
Contributor

dhiltgen commented Nov 7, 2020

Describe the problem/challenge you have
When the client detects multiple nodes in a cluster (via multiple builder pods), it switches modes: at the end of the build it downloads the image from the builder that built it, then uploads the image to each of the other builder pods and injects it into the runtime on each node. If the link between the client and the cluster is slow, or the cluster has many nodes, this can easily saturate the network link.
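To make the cost concrete with purely hypothetical numbers: a 500 MB image built on a cluster with 10 builder pods means roughly 500 MB downloaded to the client plus about 4.5 GB uploaded back through the same link (9 remaining pods × 500 MB), all before the image lands in the node runtimes.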

Description of the solution you'd like
We should find a way to get the image transferred directly between the builders so it does not have to be ping-ponged through the client, over a potentially slow link.

Design/Architecture Details

Straw-man proposal (prototyping may yield better/refined options...)

  • While https://github.com/moby/buildkit/blob/master/cmd/buildctl/dialstdio.go is a nice building block, it isn't quite sufficient to originate a transfer from the primary builder to the secondary builder instances.
  • Consider a "thin" Dockerfile wrapper in this project that takes the upstream builder (docker.io/moby/buildkit) and injects an additional binary that facilitates image transfer
  • This transfer assistant would implement a gRPC API on stdin/stdout and would have two modes of operation (see the sketch after this list):
    • Sender
    • Receiver
  • At startup, both modes require the local runtime path and runtime type (containerd or dockerd)
  • The Receiver would generate a random key and bind to a port. A gRPC API would be used by the CLI to gather these details from each receiver
  • Progress reporting from the receivers would be a nice added touch over the gRPC API
  • The sender would implement a gRPC API to transfer an image. It would take as input:
    • The local image to transfer
    • The list of remote IPs, port numbers, and secrets to transfer to
  • The transfer should be "smart" and skip layers that are already present on the receivers
  • Upon completion of the transfer, the CLI would terminate all of the transfer assistants
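
As a rough sketch of the pieces above, here is what the receiver startup and the data the CLI would collect might look like in Go. Everything here (`ReceiverInfo`, `Sender`, `startReceiver`) is a hypothetical name: a minimal sketch assuming standard-library primitives rather than the eventual gRPC/protobuf definitions.

```go
// Hypothetical sketch only: names and types here are illustrative, not the
// actual API. The real assistant would expose these surfaces as gRPC services
// (generated from protobufs) over stdin/stdout and over the bound port.
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"net"
)

// ReceiverInfo is what the CLI would gather from each receiver before
// starting the transfer: where it listens and the key the sender must present.
type ReceiverInfo struct {
	Addr   string // IP:port the receiver is listening on inside the cluster
	Secret string // random key authorizing the sender
}

// Sender is the shape of the sender-side API: push one local image to a set
// of receivers, skipping layers they already have.
type Sender interface {
	Transfer(image string, receivers []ReceiverInfo) error
}

// startReceiver shows the receiver-side startup: generate a random key and
// bind to an ephemeral port, then hand both back to the CLI.
func startReceiver() (ReceiverInfo, net.Listener, error) {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		return ReceiverInfo{}, nil, err
	}
	ln, err := net.Listen("tcp", ":0") // ":0" lets the kernel pick a free port
	if err != nil {
		return ReceiverInfo{}, nil, err
	}
	return ReceiverInfo{
		Addr:   ln.Addr().String(),
		Secret: hex.EncodeToString(key),
	}, ln, nil
}

func main() {
	info, ln, err := startReceiver()
	if err != nil {
		panic(err)
	}
	defer ln.Close()
	fmt.Printf("receiver listening on %s\n", info.Addr)
	// A real assistant would now serve the layer-transfer API on ln,
	// authenticate senders with info.Secret, report progress back to the CLI,
	// and import completed images into the local containerd or dockerd runtime.
}
```

Binding to ":0" lets each receiver pick a free port on its node, and the random key gives the sender a simple shared secret so arbitrary pods cannot push layers into a node's runtime.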

If this works well, we can explore upstreaming it to the BuildKit project so it can be included in the base image and we no longer have to maintain our own specialized image.

Environment Details:

  • kubectl buildkit version (use kubectl buildkit version)

v0.1.0

  • Kubernetes version (use kubectl version)

NA

  • Where are you running Kubernetes (e.g., bare metal, vSphere Tanzu, Cloud Provider xKS, etc.)

Largely applicable to remote clusters (e.g., *KS, Tanzu, etc.); local performance is more than adequate with the current implementation.

  • Container Runtime and version (e.g. containerd sudo ctr version or dockerd docker version on one of your kubernetes worker nodes)

Both

Vote on this request

This is an invitation to the community to vote on issues. Use the "smiley face" up to the right of this comment to vote.

  • 👍 "This project will be more useful if this feature were added"
  • 👎 "This feature will not enhance the project in a meaningful way"
@dhiltgen dhiltgen added the help wanted Extra attention is needed label Nov 10, 2020
@dhiltgen dhiltgen self-assigned this Sep 29, 2021
@dhiltgen dhiltgen linked a pull request Sep 29, 2021 that will close this issue