
fix: add build failed condition when failure state reached#337

Merged
shreddedbacon merged 1 commit into main from failed-build-container-state on Dec 15, 2025


@shreddedbacon (Member) commented Dec 15, 2025

Checklist

  • Affected Issues have been mentioned in the Closing issues section
  • Documentation has been written/updated
  • PR title is ready for changelog and subsystem label(s) applied

If a build pod reaches an ImagePullBackOff or CrashLoopBackOff condition, the build's status is not updated in the API to reflect the failure, and the LagoonBuild CR remains in a pending/running state, blocking subsequent builds in that namespace from starting.

Right now, the controller captures the failure, but it does not update the LagoonBuild CR in a way the controller can later detect. As a result, the build pod is cleaned up and the controller assumes everything is fine when it isn't.
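For context, both conditions surface as the Reason on a container's Waiting state in the pod's container statuses. The Go sketch below shows one way a controller might detect them. It uses hypothetical, trimmed-down stand-ins for the corev1.ContainerStatus and corev1.ContainerStateWaiting types so it runs without Kubernetes dependencies; buildPodFailureReason is an illustrative name, not the remote-controller's actual code.

```go
package main

import "fmt"

// ContainerStateWaiting is a hypothetical stand-in for corev1.ContainerStateWaiting;
// only the Reason field is modelled.
type ContainerStateWaiting struct {
	Reason string
}

// ContainerStatus mirrors just enough of corev1.ContainerStatus for this sketch.
type ContainerStatus struct {
	Name    string
	Waiting *ContainerStateWaiting // nil when the container is not waiting
}

// failureReasons holds the waiting reasons named in this PR, which this
// sketch treats as a failed build.
var failureReasons = map[string]bool{
	"ImagePullBackOff": true,
	"CrashLoopBackOff": true,
}

// buildPodFailureReason returns the first failure reason found across the
// container statuses, or "" if no container is stuck in one of them.
func buildPodFailureReason(statuses []ContainerStatus) string {
	for _, s := range statuses {
		if s.Waiting != nil && failureReasons[s.Waiting.Reason] {
			return s.Waiting.Reason
		}
	}
	return ""
}

func main() {
	statuses := []ContainerStatus{
		{Name: "build", Waiting: &ContainerStateWaiting{Reason: "ImagePullBackOff"}},
	}
	fmt.Println(buildPodFailureReason(statuses)) // ImagePullBackOff
}
```

Detecting the condition is the part that already worked; the bug was in what happened to the LagoonBuild CR afterwards.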

To observe this, spin up a local-stack, deploy one of the demo environments, and watch it complete successfully. Next, log into the registry in the local-stack and delete library/build-deploy-image; this forces the ImagePullBackOff condition on the next build. Now trigger another build: the build pod starts and hits the condition, and the controller cleans it up and marks the build as "failed". However, the LagoonBuild CR for that build still shows Pending with a running state. If you trigger another deployment, it will not progress, because the previous build is still considered to be running.

Now, if you delete the stuck build CR, the next build will start but then get stuck in the same way.
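Why one stale CR blocks the whole namespace can be modelled with a simple rule: a new build only starts when no existing build is still Pending or Running. The sketch below is a hypothetical simplification, assuming the real queueing logic behaves roughly this way; the Build type, the phase constants, and canStartBuild are illustrative names, not the controller's API.

```go
package main

import "fmt"

// BuildPhase is a hypothetical string type for a build's status phase.
type BuildPhase string

const (
	PhasePending  BuildPhase = "Pending"
	PhaseRunning  BuildPhase = "Running"
	PhaseComplete BuildPhase = "Complete"
	PhaseFailed   BuildPhase = "Failed"
)

// Build is a minimal, hypothetical view of a LagoonBuild CR.
type Build struct {
	Name  string
	Phase BuildPhase
}

// canStartBuild models "one active build per namespace": it returns false
// and the name of the blocking build if any build is still Pending or Running.
func canStartBuild(existing []Build) (bool, string) {
	for _, b := range existing {
		if b.Phase == PhasePending || b.Phase == PhaseRunning {
			return false, b.Name
		}
	}
	return true, ""
}

func main() {
	// A build whose pod was cleaned up but whose CR still says Pending.
	builds := []Build{{Name: "lagoon-build-abc123", Phase: PhasePending}}
	ok, blocker := canStartBuild(builds)
	fmt.Println(ok, blocker) // false lagoon-build-abc123

	// Once the CR's phase reflects the failure, the namespace is unblocked.
	builds[0].Phase = PhaseFailed
	ok, _ = canStartBuild(builds)
	fmt.Println(ok) // true
}
```

Under this model, deleting the stale CR removes the blocker, but without the fix the next failed build simply leaves a new stale CR behind.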

To verify the fix, delete all the LagoonBuild CRs from the namespace, replace the controller image with the one from this PR, and repeat the steps. Builds no longer get stuck in Pending, and they correctly report back as failed.

@shreddedbacon shreddedbacon marked this pull request as ready for review December 15, 2025 01:16
@shreddedbacon shreddedbacon requested a review from bomoko December 15, 2025 01:16
@bomoko bomoko requested a review from Copilot December 15, 2025 04:00

Copilot AI left a comment


Pull request overview

This PR fixes a critical bug where LagoonBuild CRs were not properly updating their status when build pods entered failure states like ImagePullBackOff or CrashLoopBackOff. Previously, the controller would clean up the failed pod and only update the build label, but not the status conditions and phase, leaving the build stuck in a pending/running state and blocking subsequent builds in the namespace.

Key Changes:

  • Replaced r.Update() with r.Patch() for updating the LagoonBuild resource to ensure atomic status updates
  • Added status condition setting using BuildStepToStatusConditions helper to properly track build step failures
  • Added phase field to the status update to correctly reflect the failed state


@shreddedbacon shreddedbacon force-pushed the failed-build-container-state branch from 7bc2c7d to 7744412 December 15, 2025 04:55
@bomoko (Contributor) left a comment


Nice, all makes sense.

@shreddedbacon shreddedbacon merged commit e92b034 into main Dec 15, 2025
13 checks passed
@shreddedbacon shreddedbacon deleted the failed-build-container-state branch December 15, 2025 05:15

Labels

None yet

Projects

None yet


3 participants