Drop sync-wave from CRBs #103
Merged
mbaldessari merged 1 commit on Mar 19, 2026
In commit 688ddb6 (Force rolebindings as early as possible) we put the RBAC objects at sync-wave -100 so that they would be created as early as possible. Without that we'd end up with the following error on the spokes:

Failed sync attempt to : one or more objects failed to apply, reason: serviceaccounts is forbidden: User "system:serviceaccount:openshift-gitops:openshift-gitops-argocd-application-controller" cannot create resource "serviceaccounts" in API group "" in the namespace "imperative" due to application controller sync timeout. Retrying attempt #1 at 6:38PM. 20 minutes ago (Wed Mar 18 2026 19:39:04 GMT+0100)

The problem is that using these sync-waves seems to trigger an ArgoCD bug where selfHeal simply stops retrying: https://www.github.com/argoproj/argo-cd/issues/18442
It does not always happen, but it happens frequently enough to be noticed.

To avoid this bug (and potentially others), we drop the sync-waves from the CRBs entirely. To avoid the original problem of the openshift-gitops-argocd-application-controller service account being unable to create service accounts, we also precreate that CRB via ACM. This way we avoid the ArgoCD issue and everything still works.

Tested on 6 separate MCG hub/spoke installations without any issues. Previously I would hit the issue at least 4 times.

Closes: validatedpatterns#63
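The actual change lives in the chart templates, but its effect on the rendered manifests can be sketched in Python. This is a minimal illustration, assuming manifests are plain dicts as parsed from YAML; the helper `strip_crb_sync_wave`, the CRB name, and the `example-clusterrole` ClusterRole are hypothetical. It removes the Argo CD `argocd.argoproj.io/sync-wave` annotation from ClusterRoleBinding objects only and leaves every other kind untouched.

```python
import copy
from typing import Any

SYNC_WAVE_KEY = "argocd.argoproj.io/sync-wave"


def strip_crb_sync_wave(manifest: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `manifest`; if it is a ClusterRoleBinding, drop its
    sync-wave annotation. Other kinds are returned as an unmodified copy."""
    result = copy.deepcopy(manifest)
    if result.get("kind") != "ClusterRoleBinding":
        return result
    annotations = result.get("metadata", {}).get("annotations")
    if annotations is not None:
        annotations.pop(SYNC_WAVE_KEY, None)
        if not annotations:
            # Remove an empty map rather than rendering `annotations: {}`.
            del result["metadata"]["annotations"]
    return result


# Hypothetical CRB as it looked with the old sync-wave (-100).
crb_before = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "ClusterRoleBinding",
    "metadata": {
        "name": "example-crb",  # hypothetical name
        "annotations": {SYNC_WAVE_KEY: "-100"},
    },
    "roleRef": {
        "apiGroup": "rbac.authorization.k8s.io",
        "kind": "ClusterRole",
        "name": "example-clusterrole",  # hypothetical ClusterRole
    },
    "subjects": [
        {
            "kind": "ServiceAccount",
            "name": "openshift-gitops-argocd-application-controller",
            "namespace": "openshift-gitops",
        }
    ],
}

crb_after = strip_crb_sync_wave(crb_before)
print("annotations" in crb_after["metadata"])  # False
```

Without the annotation the CRB falls into Argo CD's default wave, so it is applied alongside the other objects instead of being ordered ahead of them.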
mbaldessari added a commit to mbaldessari/acm-chart that referenced this pull request on Mar 19, 2026
This is to fix an issue on spokes. See validatedpatterns/clustergroup-chart#103 for the full reasoning.
TL;DR: we need to drop the sync-waves from the CRBs in clustergroup to avoid an Argo CD bug, but without them the SA never gets the permissions it needs to create other service accounts, so we precreate the CRB via the acm-chart.
Closes: validatedpatterns/clustergroup-chart#63
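One way ACM can precreate an object on managed clusters is a governance Policy wrapping a ConfigurationPolicy whose object template uses `complianceType: musthave`. The sketch below builds such a Policy around a ClusterRoleBinding for the application controller SA, as Python dicts that can be dumped to JSON or YAML. It is an assumption that acm-chart uses this exact mechanism; the Policy name, its namespace, the CRB naming scheme, and the `example-clusterrole` ClusterRole are hypothetical, and the Placement and PlacementBinding needed to target the spokes are omitted.

```python
import json
from typing import Any

APP_CONTROLLER_SA = "openshift-gitops-argocd-application-controller"
GITOPS_NAMESPACE = "openshift-gitops"


def app_controller_crb(cluster_role: str) -> dict[str, Any]:
    """ClusterRoleBinding granting `cluster_role` to the Argo CD application controller SA.
    Note: no sync-wave annotation is set."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRoleBinding",
        # Hypothetical naming scheme.
        "metadata": {"name": f"{APP_CONTROLLER_SA}-{cluster_role}"},
        "roleRef": {"apiGroup": "rbac.authorization.k8s.io", "kind": "ClusterRole", "name": cluster_role},
        "subjects": [{"kind": "ServiceAccount", "name": APP_CONTROLLER_SA, "namespace": GITOPS_NAMESPACE}],
    }


def wrap_in_acm_policy(name: str, namespace: str, obj: dict[str, Any]) -> dict[str, Any]:
    """Wrap `obj` in an ACM Policy whose ConfigurationPolicy enforces that it exists."""
    return {
        "apiVersion": "policy.open-cluster-management.io/v1",
        "kind": "Policy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "remediationAction": "enforce",
            "disabled": False,
            "policy-templates": [
                {
                    "objectDefinition": {
                        "apiVersion": "policy.open-cluster-management.io/v1",
                        "kind": "ConfigurationPolicy",
                        "metadata": {"name": name},
                        "spec": {
                            "remediationAction": "enforce",
                            "severity": "medium",
                            # musthave: create the object if missing, fix it if it drifts.
                            "object-templates": [{"complianceType": "musthave", "objectDefinition": obj}],
                        },
                    }
                }
            ],
        },
    }


# Hypothetical Policy name and namespace.
policy = wrap_in_acm_policy("precreate-app-controller-crb", "policies", app_controller_crb("example-clusterrole"))
print(json.dumps(policy, indent=2))
```

Because ACM enforces the binding independently of Argo CD, the application controller already has the permission when clustergroup syncs, which is why the CRB no longer needs an early sync-wave.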