Step using Kaniko build does not wait for PVC to be bound #403
Comments
Service account:
|
Add Pipeline and PipelineRun definition.

Since TaskRuns do not work because of tektoncd/pipeline#403, I added a PVC yaml to preprovision the PVC. As long as it has the right name, the TaskRun picks it up, which is a decent workaround. When the TaskRun is deleted, the PVC stays there too.

NOTE: This does not work as it is! Main open issues:
- kaniko is missing the credentials to push images
- the pipeline run does nothing
Pre-provisioning the PVC with the same name knative would set up is a workaround for this issue. |
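For readers who want to try the pre-provisioning workaround, here is a minimal sketch of what such a claim could look like, built as a Python dict and printed as JSON (which `kubectl apply -f` accepts). The helper `pvc_manifest`, the claim name `example-taskrun-pvc`, the `1Gi` size and the `ReadWriteOnce` access mode are all assumptions for illustration; the thread does not state the name the controller actually expects, so it has to be substituted.

```python
import json


def pvc_manifest(name: str, storage: str = "1Gi",
                 access_mode: str = "ReadWriteOnce") -> dict:
    """Build a PersistentVolumeClaim manifest as a plain dict.

    `name` must match whatever name the TaskRun would use for its PVC;
    that name is not given in the thread, so callers must supply it.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": [access_mode],
            "resources": {"requests": {"storage": storage}},
        },
    }


if __name__ == "__main__":
    # "example-taskrun-pvc" is a hypothetical placeholder name.
    print(json.dumps(pvc_manifest("example-taskrun-pvc"), indent=2))
```

Saving the printed JSON to a file and applying it before the TaskRun is one way to reproduce the workaround described above.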
I could reproduce the same issue using a PipelineRun instead of a TaskRun. |
Thanks for all the detail @afrittoli !! I wonder if we are starting to see differences in how the different cloud providers are treating PVCs 🤔 afaik we don't see this behavior with GKE
I wonder if we could change the controller logic to wait for PVCs to be up and available before attempting to schedule the TaskRun pod. (In the long run, I'm thinking that in one of our next sprints we need to get serious about testing on other cloud providers...) |
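The "wait for the PVC before scheduling the pod" idea can be sketched roughly as a bounded polling loop. This is only an illustration under stated assumptions, not the controller's actual logic: `wait_for_pvc_bound` and its parameters are hypothetical names, and `get_phase` stands in for whatever Kubernetes API call reads the claim's `status.phase` (which is `Pending`, `Bound` or `Lost`).

```python
import time
from typing import Callable


def wait_for_pvc_bound(get_phase: Callable[[], str],
                       timeout_s: float = 120.0,
                       interval_s: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep,
                       clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll until the claim reports phase "Bound" or the timeout expires.

    Returns True if the claim became Bound, False if the limit was hit.
    `get_phase` is a hypothetical callable supplied by the caller.
    """
    deadline = clock() + timeout_s
    while True:
        if get_phase() == "Bound":
            return True
        if clock() >= deadline:
            return False  # give up: the wait is bounded, not infinite
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated claim that becomes Bound on the third check.
    phases = iter(["Pending", "Pending", "Bound"])
    ready = wait_for_pvc_bound(lambda: next(phases), timeout_s=5, interval_s=0)
    print("create pod" if ready else "fail the TaskRun")
```

Injecting `sleep` and `clock` keeps the loop testable without real delays; a real controller would more likely react to watch events than poll.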
Even though this doesn't seem to be the kubernetes way 🤔 @afrittoli a couple of follow up questions:
|
The PVC usually takes a little longer than 60s to be ready; I can get some better numbers if you need them. I wonder if this might be an issue on the k8s side. I took the latest version available in IKS; I could try with a cluster on an older version. |
I tried with a cluster running k8s v1.10.11_1536 - and I didn't hit this issue there, so it might be an issue on k8s side, or some behaviour that changed on k8s side. |
Looking at the change logs, there definitely have been changes in the PVC attach/detach logic in 1.11 and 1.12: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.12.md https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.11.md For instance: kubernetes/kubernetes#66863 might have an impact on knative. |
Whoa, interesting! Thanks for tracking this down @afrittoli - I'm a bit confused by kubernetes/kubernetes#66863 , it's not clear to me why we'd want to start running pods that need resources before those resources are actually able to be used 🤔 but that's neither here nor there, looks like we're going to need to update our logic like you described regardless!
It looks like it's 1.11.6 right now, from what I can tell in the logs:
|
@afrittoli @bobcatfish is this still an issue ? |
@vdemeester I'm not sure, I stopped using the PVC in favour of the bucket a long time ago :P |
I'm really surprised this isn't happening more often :O |
I haven't heard of this bothering anyone since it was originally opened, so I'm going to close it for now. |
Expected Behavior
The container associated with the step should be retried until the PVC is bound (with a limit) and succeed once the PVC is ready.
Actual Behavior
Kubernetes attempted to schedule the kaniko container several times, but it always failed, even after the PVC became available.
Running describe on the pod:
Describe on the PVC
Steps to Reproduce the Problem
Define a Task and TaskRun following the tutorial in https://github.com/knative/build-pipeline/blob/master/docs/tutorial.md
kubectl apply -f taskrun.yaml
Task:
TaskRun:
Additional Info
Kubernetes 1.12.3_1531
Knative build pipeline installed from master using ko: HEAD 41d513b
Using IKS and IBM Cloud Registry (the service account kaniko-build-controller is extended with the imagePullSecret to use the registry)