Kubernetes Driver: switch to 'StatefulSet' from 'Deployment' #2938

Open · ArielLahiany wants to merge 5 commits into master
Conversation

ArielLahiany

I've switched the Kubernetes driver object from 'Deployment' to 'StatefulSet' in order to support long-living persistent data directories between restarts, and to meet the needs of the following issue: #2056
Please let me know if there are any other changes that are required, or if a different PR format is needed.
Thank you.
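
For context, a minimal sketch (not the PR's actual driver code; the "buildkitd" names, the image tag, and the mount path are illustrative assumptions) of why a StatefulSet fits this goal: volumeClaimTemplates give each replica its own PersistentVolumeClaim, so BuildKit's state directory can survive pod restarts.

```go
// Sketch only: a StatefulSetSpec whose volumeClaimTemplates keep BuildKit's
// state directory on a per-replica PersistentVolumeClaim across restarts.
package main

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func buildkitStatefulSetSpec(labels map[string]string) appsv1.StatefulSetSpec {
	return appsv1.StatefulSetSpec{
		ServiceName: "buildkitd", // assumed headless service name
		Selector:    &metav1.LabelSelector{MatchLabels: labels},
		Template: corev1.PodTemplateSpec{
			ObjectMeta: metav1.ObjectMeta{Labels: labels},
			Spec: corev1.PodSpec{
				Containers: []corev1.Container{{
					Name:  "buildkitd",
					Image: "docker.io/moby/buildkit:v0.20.0",
					VolumeMounts: []corev1.VolumeMount{{
						Name:      "buildkitd-state",
						MountPath: "/var/lib/buildkit", // BuildKit's default state directory
					}},
				}},
			},
		},
		// One PVC per replica; the claim (and the data) outlives pod restarts.
		VolumeClaimTemplates: []corev1.PersistentVolumeClaim{{
			ObjectMeta: metav1.ObjectMeta{Name: "buildkitd-state"},
			Spec: corev1.PersistentVolumeClaimSpec{
				AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
				// Storage size request omitted here; the driver would derive it
				// from options such as requests.persistent-storage.
			},
		}},
	}
}

func main() {
	_ = buildkitStatefulSetSpec(map[string]string{"app": "buildkitd"})
}
```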

Signed-off-by: ArielLahiany <ariel.lahiany@gmail.com>
@AkihiroSuda
Collaborator

I avoided using StatefulSet because creating StatefulSet pods was quite slow at that time due to non-parallelizability.
Has the situation changed?

```diff
 		},
 		ObjectMeta: metav1.ObjectMeta{
 			Namespace:   opt.Namespace,
 			Name:        opt.Name,
 			Labels:      labels,
 			Annotations: annotations,
 		},
-		Spec: appsv1.DeploymentSpec{
+		Spec: appsv1.StatefulSetSpec{
```
Collaborator

podManagementPolicy should be set to Parallel
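
For reference, a minimal sketch (not the buildx driver code itself; the service name is an assumption) of where that field sits on the spec: the default OrderedReady policy starts pods one at a time, while Parallel lets them all come up at once.

```go
// Sketch: setting podManagementPolicy to Parallel on a StatefulSetSpec so
// replicas are created concurrently instead of one by one.
package main

import (
	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func parallelStatefulSetSpec(replicas int32, labels map[string]string) appsv1.StatefulSetSpec {
	return appsv1.StatefulSetSpec{
		Replicas:            &replicas,
		ServiceName:         "buildkitd", // assumed headless service name
		PodManagementPolicy: appsv1.ParallelPodManagement,
		Selector:            &metav1.LabelSelector{MatchLabels: labels},
		// Pod template and volumeClaimTemplates omitted for brevity.
	}
}

func main() {
	_ = parallelStatefulSetSpec(10, map[string]string{"app": "buildkitd"})
}
```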

Collaborator

Also, please make sure to measure the startup time of the pods with a good number of replicas (around 5-10?), and confirm that there is no significant delay compared to Deployment.

Signed-off-by: ArielLahiany <ariel.lahiany@gmail.com>
@ArielLahiany
Author

@AkihiroSuda
I've tested this logic on a freshly installed vanilla K3s cluster and everything seems to work as required.
I believe the PR is failing due to the way the GitHub Action deploys the cluster.
How can we proceed and move this PR along?

@AkihiroSuda
Collaborator

ERROR: expected 1 replicas to be ready, got 0

Needs `kubectl get pod -o yaml` and `kubectl logs` for further analysis

@ArielLahiany
Author

Good afternoon, @AkihiroSuda.
I've run some tests on a freshly installed K3s cluster:

  1. For a Deployment with the buildx-v0.20.0.linux-amd64 binary:
function buildx_create_deployment() {
    buildx-v0.20.0.linux-amd64 create \
        --bootstrap \
        --buildkitd-config="/home/debian/github/buildx/bin/build/buildkitd.toml" \
        --driver-opt=image="docker.io/moby/buildkit:v0.20.0" \
        --driver-opt=limits.ephemeral-storage="10Gi" \
        --driver-opt=loadbalance="sticky" \
        --driver-opt=namespace="buildkitd" \
        --driver-opt=replicas=10 \
        --driver-opt=requests.ephemeral-storage="2Gi" \
        --driver="kubernetes" \
        --name="buildkitd"
}

This resulted in

#1 [internal] booting buildkit
#1 waiting for 10 pods to be ready, timeout: 2 minutes
#1 waiting for 10 pods to be ready, timeout: 2 minutes 7.9s done
#1 DONE 8.2s
buildkitd

real    0m8.660s
user    0m0.103s
sys     0m0.126s
  2. For a StatefulSet with the local-path provisioner that ships with K3s by default:
function buildx_create_statefulset() {
    $BUILDX_BINARY_PATH create \
        --bootstrap \
        --buildkitd-config="/home/debian/github/buildx/bin/build/buildkitd.toml" \
        --driver-opt=image="docker.io/moby/buildkit:v0.20.0" \
        --driver-opt=limits.persistent-storage="10Gi" \
        --driver-opt=loadbalance="sticky" \
        --driver-opt=namespace="buildkitd" \
        --driver-opt=replicas=10 \
        --driver-opt=requests.persistent-storage="2Gi" \
        --driver="kubernetes" \
        --name="buildkitd"
}

This resulted in

#1 [internal] booting buildkit
#1 waiting for 10 pods to be ready, timeout: 2 minutes
#1 waiting for 10 pods to be ready, timeout: 2 minutes 50.7s done
#1 DONE 51.0s
buildkitd

real    0m51.463s
user    0m0.518s
sys     0m0.286s

As you can see, provisioning 10 StatefulSet pods took longer, but it is still within the 2-minute timeout defined by the team.
I do believe the benefits outweigh the downsides of this solution, as there are multiple cases in which a user would want greater resilience for the BuildKit cluster through long-living pods.
Please let me know if there is anything else required on my end.

@AkihiroSuda
Collaborator

8.2s vs 51.0s is an extreme difference, even though it is within your team's threshold.
The default mode should remain Deployment, then.
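
A hypothetical sketch of that outcome (nothing here is from the PR; the boolean option name is made up): the driver keeps emitting a Deployment unless the user explicitly opts into a StatefulSet.

```go
// Hypothetical sketch: Deployment stays the default workload object; a
// made-up opt-in flag selects the StatefulSet path instead.
package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/runtime"
)

// useStatefulSet is an illustrative, made-up option, not an actual driver-opt.
func workloadObject(useStatefulSet bool) runtime.Object {
	if useStatefulSet {
		return &appsv1.StatefulSet{}
	}
	return &appsv1.Deployment{}
}

func main() {
	fmt.Printf("%T\n", workloadObject(false)) // *v1.Deployment by default
}
```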
