Minor statefulset issue #1962

Closed
chuckha opened this issue Aug 1, 2019 · 4 comments

chuckha commented Aug 1, 2019

One of my workloads is a StatefulSet whose YAML is generated by kustomize. At that point the image it references does not exist yet, so when Tilt creates the StatefulSet, the StatefulSet creates a pod with a nonexistent image.

I expect Tilt to update the workload image to something that exists, which it does. But, probably because of how StatefulSets work, Tilt gets stuck showing the resource as Pending / not running (though not as an error), because the StatefulSet's pod is in ImagePullBackOff. If I look at the image on the pod, it is the original image generated by kustomize. If I look at the StatefulSet itself, it has the image I expect, the one Tilt generated for me.

The solution is to delete the ImagePullBackOff pod so that the next statefulset pod will reference the correct image.

Contributor

maiamcc commented Aug 1, 2019

@chuckha let me see if I understand: is the issue here that the statefulset yaml is getting applied before the appropriate image has been docker build'd, and thus errors? (Is the statefulset yaml getting applied as part of uncategorized or is it attached to the same resource as the image you need?)

Another question: does the same error behavior happen if you make sure that the statefulset in question doesn't exist, and THEN run tilt up?

If you could post a minimal repro case, that would be amazing!

maiamcc added the bug label Aug 1, 2019

Author

chuckha commented Aug 6, 2019

workspace.tar.gz

To reproduce:

  1. Untar
  2. Change the Docker repository in the Tiltfile to your own user, and the matching image in ss.yaml
  3. Set up a kind cluster (I used v0.4.0; v0.3.0 is likely fine too)
  4. Run the Tiltfile in this tarball
  5. Observe an error spinning up the pod
  6. Ah, Dockerfile mess-up: in the Dockerfile, replace ARG with CMD
  7. Watch Tilt get stuck in Pending

Run

kubectl get po -o yaml | grep image
kubectl get statefulset web -o yaml | grep image

to observe that the StatefulSet has been updated, but the pod must be deleted manually before Kubernetes will create a new pod with the correct image.
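
The two commands above compare the image on the running pod with the image in the StatefulSet's pod template. As a rough sketch of the same check in code, assuming both objects were fetched with `-o json` and parsed into dicts; the function name `stale_images` and the image names in the sample data are hypothetical, not taken from the tarball:

```python
def stale_images(statefulset: dict, pod: dict) -> dict:
    """Map container name -> (image on the pod, image in the template)
    for every container whose pod image differs from the template."""
    template_images = {
        c["name"]: c["image"]
        for c in statefulset["spec"]["template"]["spec"]["containers"]
    }
    stale = {}
    for c in pod["spec"]["containers"]:
        wanted = template_images.get(c["name"])
        if wanted is not None and c["image"] != wanted:
            stale[c["name"]] = (c["image"], wanted)
    return stale


# Hypothetical objects, trimmed to the fields the check reads.
statefulset = {
    "kind": "StatefulSet",
    "metadata": {"name": "web"},
    "spec": {"template": {"spec": {"containers": [
        {"name": "web", "image": "example/web:tilt-build"},
    ]}}},
}
pod = {
    "kind": "Pod",
    "metadata": {"name": "web-0"},
    "spec": {"containers": [
        {"name": "web", "image": "example/web:from-kustomize"},
    ]},
}

for name, (have, want) in stale_images(statefulset, pod).items():
    print(f"{name}: pod has {have}, statefulset wants {want}")
```

A non-empty result means the pod predates the StatefulSet update and, in the situation described here, has to be deleted before a pod with the new image appears.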

Member

nicks commented Aug 6, 2019

Thanks for the repro steps! They helped me a lot to poke around at this.

For now, you can work around this manually by changing the podManagementPolicy on the StatefulSet to "Parallel".

Background:

In my testing, sometimes the pod eventually gets replaced. But it's hard to predict when that will happen. My educated hypothesis is that the StatefulSetController won't touch pods in CrashLoopBackOff. Unless the StatefulSetController shows up for work at exactly the right moment, the Pod will cycle indefinitely.

There are some people complaining about this upstream, for example, here:
kubernetes/kubernetes#60164

A good fix for this on the Tilt side would be to automatically rewrite the podManagementPolicy to Parallel. For dev, that's probably what you always want.
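
To make the proposed rewrite concrete, here is a minimal sketch of what forcing the policy could look like, assuming the manifests are already parsed into a list of dicts. This is an illustration only, not the change made in #1998; `force_parallel` and the sample manifests are hypothetical.

```python
import copy


def force_parallel(manifests: list) -> list:
    """Return a copy of the manifests in which every StatefulSet has
    spec.podManagementPolicy set to "Parallel". Other kinds are untouched."""
    rewritten = copy.deepcopy(manifests)  # leave the caller's objects alone
    for obj in rewritten:
        if obj.get("kind") == "StatefulSet":
            obj.setdefault("spec", {})["podManagementPolicy"] = "Parallel"
    return rewritten


# Hypothetical input: one StatefulSet (no policy set) and one Service.
manifests = [
    {"apiVersion": "apps/v1", "kind": "StatefulSet",
     "metadata": {"name": "web"}, "spec": {"replicas": 1}},
    {"apiVersion": "v1", "kind": "Service",
     "metadata": {"name": "web"}, "spec": {}},
]

result = force_parallel(manifests)
print(result[0]["spec"]["podManagementPolicy"])
```

Kubernetes rejects changes to `podManagementPolicy` on an existing StatefulSet, so a rewrite like this would have to run before the object is first applied, or the StatefulSet would need to be deleted and recreated.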

Member

nicks commented Aug 12, 2019

fixed in #1998
