Better handling of terraform validation failure in digger #1286
I'm having the same issue. If for instance a …
I'd like to vote this up; I encountered this today as well.
I can take a look at this. Are there any steps that can help me reproduce this scenario? It's probably a case of Digger bailing early but not updating the reporter.
@motatoes, one scenario was the recent issue where the locks were broken.
Here is another one where the status does not get updated:
Hey @evanstachowiak, thanks for chiming in! To be honest, there is already a PR pending that addresses general plan and apply failures: #1579. Regarding things like failure to report to the backend, or the job crashing or never starting, I was thinking we should have a catch-all timeout on the orchestrator side: if it doesn't hear from the job within X minutes, it considers it timed out. This would require a heartbeat mechanism where the CLI reports heartbeat pings to the orchestrator in the background every few seconds. If the orchestrator doesn't hear an initial heartbeat within 1 minute of firing the job, or subsequently doesn't get a heartbeat every 10 seconds or so, it can fail the job with a timeout, post a comment, etc. That's not the only solution, but I think it's a catch-all for cases where the job crashes, for whatever reason. Thoughts?
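A minimal sketch of what that orchestrator-side watchdog could look like (Go, since Digger is written in Go; the `watchJob` function, channel wiring, and durations are illustrative assumptions, not Digger's actual code):

```go
package main

import (
	"fmt"
	"time"
)

// watchJob waits for an initial heartbeat within initialWait, then requires
// a heartbeat at least every interval. It returns nil when the job reports
// completion on done, or an error if a heartbeat deadline is missed.
func watchJob(heartbeats <-chan struct{}, done <-chan struct{}, initialWait, interval time.Duration) error {
	// Wait for the first heartbeat (e.g. within 1 minute of dispatch).
	select {
	case <-heartbeats:
	case <-done:
		return nil
	case <-time.After(initialWait):
		return fmt.Errorf("job never reported an initial heartbeat within %s", initialWait)
	}

	// After that, expect a heartbeat every interval (e.g. every 10 seconds,
	// with some slack for network jitter).
	timer := time.NewTimer(interval)
	defer timer.Stop()
	for {
		select {
		case <-heartbeats:
			// Reset the deadline each time a heartbeat arrives.
			if !timer.Stop() {
				<-timer.C
			}
			timer.Reset(interval)
		case <-done:
			return nil
		case <-timer.C:
			return fmt.Errorf("no heartbeat received for %s; marking job as timed out", interval)
		}
	}
}
```

On an error return the orchestrator would mark the job failed and post the PR comment, which is what makes this a catch-all: it fires regardless of why the job crashed or never started.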
@motatoes, shouldn't the orchestrator have the ID of the running job when it first marks it as pending? Then you don't necessarily need a heartbeat; the orchestrator can check the job status every so often, and if the job has finished with no status update it can mark it as failed.
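A sketch of that polling alternative, assuming the orchestrator already knows the job ID (the `fetchCIStatus` and `hasReportedResult` lookups are hypothetical stand-ins for the CI provider API and the orchestrator's own database):

```go
package main

import (
	"context"
	"errors"
	"time"
)

// JobStatus as the orchestrator might track it; names are illustrative.
type JobStatus int

const (
	StatusPending JobStatus = iota
	StatusRunning
	StatusFinished
)

// pollJob checks the CI job's state every pollEvery until deadline. If the
// job finishes (or the deadline passes) without the CLI having reported a
// result back, the orchestrator marks the job failed.
func pollJob(ctx context.Context, jobID string,
	fetchCIStatus func(ctx context.Context, jobID string) (JobStatus, error),
	hasReportedResult func(jobID string) bool,
	pollEvery time.Duration, deadline time.Time) error {

	ticker := time.NewTicker(pollEvery)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
			if time.Now().After(deadline) {
				return errors.New("job exceeded deadline without a status update; marking as failed")
			}
			status, err := fetchCIStatus(ctx, jobID)
			if err != nil {
				continue // transient API error; try again next tick
			}
			if status == StatusFinished && !hasReportedResult(jobID) {
				return errors.New("CI job finished but never reported back; marking as failed")
			}
			if status == StatusFinished {
				return nil // job finished and reported normally
			}
		}
	}
}
```

The catch, as the next comment explains, is that the orchestrator does not actually have the job ID at dispatch time.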
@evanstachowiak, it actually does not know the ID of the job when it first fires it (the GitHub API does not return the run ID on workflow dispatch); we have to query the list of running jobs and cross-match on the ID that we sent as input. We could do that in the background after firing the job, but right now we only learn the ID when the CLI reports back for the first time.
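For illustration, here is one way the background cross-match could work against GitHub's real `GET /repos/{owner}/{repo}/actions/runs` endpoint. This sketch assumes the workflow surfaces a correlation marker in its run name (e.g. via GitHub's `run-name` field); Digger's actual matching logic may well differ, and `findRunByMarker` is a hypothetical helper:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

// findRunByMarker lists recent workflow_dispatch runs and returns the ID of
// the run whose name contains the correlation marker that was sent as an
// input when the workflow was dispatched.
func findRunByMarker(owner, repo, token, marker string) (int64, error) {
	url := fmt.Sprintf(
		"https://api.github.com/repos/%s/%s/actions/runs?event=workflow_dispatch&per_page=30",
		owner, repo)
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return 0, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Accept", "application/vnd.github+json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()

	var body struct {
		WorkflowRuns []struct {
			ID   int64  `json:"id"`
			Name string `json:"name"`
		} `json:"workflow_runs"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return 0, err
	}
	for _, run := range body.WorkflowRuns {
		if strings.Contains(run.Name, marker) {
			return run.ID, nil
		}
	}
	return 0, fmt.Errorf("no run found matching marker %q yet", marker)
}
```

The caller would retry this for a short window after dispatch, since the run may take a few seconds to appear in the listing.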
Reported by user A.T.