Step freezes when container image can't be pulled (ImagePullBackOff) #3555

Closed
3 tasks done
fernandrone opened this issue Mar 25, 2024 · 0 comments · Fixed by #3580
Labels
backend/kubernetes bug Something isn't working
fernandrone commented Mar 25, 2024

Component

server

Describe the bug

On a Kubernetes backend, if any container that is part of a step fails to pull an image and gets stuck in an ImagePullBackOff error, the step will just keep running indefinitely, with no feedback for the user.

I think the expected behavior here would be something along these lines:

  • Woodpecker to try to pull the image for a while
  • If it fails (after a timeout), it displays an error telling the user that pulling the specific image timed out or failed
  • It fails the step
  • It terminates the pod on the cluster

I'd assume that similar errors can happen if other issues cause a Pod to be in a pending state (for example, there are no nodes available in the cluster). Maybe a similar "timeout" strategy could be implemented to deal with all these similar scenarios?

Note: canceling the pipeline terminates the pipeline and terminates the pod, but marks the pipeline as successful, which is another issue.

System Info

{"source":"https://github.com/woodpecker-ci/woodpecker","version":"2.3.0"}

Additional context

Here's a sample I did to showcase the issue (it's running in an internal Woodpecker cluster based on Woodpecker 2.3 so I can't share an open link).

I have built a pipeline where I have referenced an image that does not exist, image: broken-image-ref.

Here's the result. The pipeline just stays stuck on the broken step indefinitely (or at least possibly until the pipeline timeout; I didn't wait that long) without logging anything.

If I go look at this pod in my cluster, I can see that it is stuck with the ImagePullBackOff error:

...
Events:
  Type     Reason                  Age                    From                     Message
  ----     ------                  ----                   ----                     -------
  Normal   Scheduled               4m55s                  default-scheduler        Successfully assigned woodpecker-pipelines/wp-01hsvnwbdgge7msffe0qn6zz68 to < redacted >
  Normal   SuccessfulAttachVolume  4m45s                  attachdetach-controller  AttachVolume.Attach succeeded for volume  < redacted >
  Warning  Failed                  3m25s (x6 over 4m43s)  kubelet                  Error: ImagePullBackOff
  Normal   Pulling                 3m10s (x4 over 4m44s)  kubelet                  Pulling image "broken-image-ref"
  Warning  Failed                  3m10s (x4 over 4m43s)  kubelet                  Failed to pull image "broken-image-ref": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/broken-image-ref:latest": failed to resolve reference "docker.io/library/broken-image-ref:latest": pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
  Warning  Failed                  3m10s (x4 over 4m43s)  kubelet                  Error: ErrImagePull
  Normal   BackOff                 2m58s (x7 over 4m43s)  kubelet                  Back-off pulling image "broken-image-ref"

Validations

  • Read the docs.
  • Check that there isn't already an issue that reports the same bug to avoid creating a duplicate.
  • Checked that the bug isn't fixed in the next version already [https://woodpecker-ci.org/faq#which-version-of-woodpecker-should-i-use]
@fernandrone fernandrone added the bug Something isn't working label Mar 25, 2024
@qwerty287 qwerty287 added this to the 2.x.x milestone Mar 26, 2024
anbraten added a commit that referenced this issue Apr 15, 2024
close: #3555

Apply the same logic used in `waitStep` by calling the
`isImagePullBackOffState` function in the `tailStep` function.

---------

Co-authored-by: elias.souza <elias.souza@quintoandar.com.br>
Co-authored-by: Anbraten <6918444+anbraten@users.noreply.github.com>
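The body of `isImagePullBackOffState` is not shown in this issue, so the following is only a sketch of the kind of check the commit message describes, not the Woodpecker implementation. It assumes the check looks at the waiting reason of each container status. The structs are simplified local stand-ins that mirror the shape of the Kubernetes pod status types (`Status.ContainerStatuses[].State.Waiting.Reason`), and `imagePullFailed` is a hypothetical name.

```go
package main

// Simplified stand-ins for the Kubernetes pod status types; only the
// fields needed for this sketch are modeled.
type ContainerStateWaiting struct {
	Reason  string
	Message string
}

type ContainerState struct {
	Waiting *ContainerStateWaiting // nil unless the container is waiting
}

type ContainerStatus struct {
	Name  string
	State ContainerState
}

type PodStatus struct {
	ContainerStatuses []ContainerStatus
}

// imagePullFailed is a hypothetical helper. It returns the name of the
// first container that is waiting because its image could not be pulled,
// and true if such a container exists.
func imagePullFailed(status PodStatus) (string, bool) {
	for _, cs := range status.ContainerStatuses {
		w := cs.State.Waiting
		if w == nil {
			continue // container is running or terminated
		}
		// Both reasons appear in the events quoted in the issue.
		if w.Reason == "ImagePullBackOff" || w.Reason == "ErrImagePull" {
			return cs.Name, true
		}
	}
	return "", false
}
```

A log-tailing function could run a check like this before waiting for output and return an error naming the container, instead of blocking on a pod that will never start.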
@qwerty287 qwerty287 modified the milestones: 2.x.x, 2.5.0 Apr 16, 2024