Job never becomes ready when re-attaching to existing env when pods for the job are deleted #6272
Labels: bug (Something isn't working)
nicks added a commit to nicks/tilt that referenced this issue on Dec 5, 2023:

fixes tilt-dev#6272. History of this change:
- we used to have a very naive definition of "Ready"; callers of the runtime status API had hacks to look at Job pods to determine completeness (tilt-dev#4367)
- later, we changed it to a more sophisticated definition, so that Jobs only count as 'ready' when they're complete (tilt-dev#5013)
- we forgot to remove the old caller hacks :grimace:

In any case, I added a bunch of tests, since we need more job integration tests anyway.
Signed-off-by: Nick Santos <nick.santos@docker.com>
nicks added a commit that referenced this issue on Dec 8, 2023:

* Handle jobs with deleted pods (Signed-off-by: Daniel Nephin <dnephin@gmail.com>)
* session: ensure 'tilt ci' exits properly when reattaching to jobs (fixes #6272)
Signed-off-by: Nick Santos <nick.santos@docker.com>
Co-authored-by: Daniel Nephin <dnephin@gmail.com>
Background

I believe this is the same issue as described in #5822. That issue was closed over a year ago, so I'm creating a new one.

We have a system that pre-creates Tilt environments for `tilt ci`, so when `tilt ci` runs it resumes from an existing environment. We've noticed that occasionally those resumes fail with a timeout. Using the new output from #6262, we noticed that when this happens it is always a job that never goes ready. Looking at the logs, we noticed that these jobs had deleted pods.

#5859 added special handling for "completed jobs without pods" to `kubernetesapply/reconciler.go`, but that handling does not appear to carry over to `processExitCondition`.
It appears that `session/conv.go:Reconciler.k8sRuntimeTarget` sees a job with `status == v1alpha1.RuntimeStatusOK`, but there are no pods, so `v1.PodSucceeded == phase` is false, and the resource is marked as `Active.Ready = true`. In `processExitCondition` (`tilt/internal/controllers/core/session/status.go`, line 101 at commit 38238ec), the job-specific branch means that `Ready == true` is ignored for jobs, so they never appear ready when the pods are gone.
Expected Behavior

Re-attaching to an environment with `tilt ci` recognizes that a job is complete, even when the pods for that job have been deleted.

Current Behavior

Today it will time out with something like:
Steps to Reproduce

1. Use the Tiltfile from "job never becomes ready when re-attaching to existing env" (#5822)
2. Run `tilt ci` and wait for it to complete
3. `kubectl delete` the `pod/db-init-` pod
4. Run `tilt ci --timeout 1m`; this will time out instead of going ready

Implementation Questions
I think we understand the problem well enough, but I'm not entirely sure how to fix it yet.

I think in `k8sRuntimeTarget` we can handle the case where no pod is found from `krs.MostRecentPod`, and produce a state of `Terminated`. It seems like the condition is available at that time.