JIRA:VZ-5023 handle pending-upgrade helm component status and unblock stuck upgrades #2674

Merged
merged 4 commits into from
Mar 3, 2022

@jmaron99 jmaron99 commented Mar 3, 2022

Description

  • Testing revealed that, after a VPO failure and restart, a Helm component upgrade may leave behind a release secret with a status of "pending-upgrade", which prevents another upgrade from starting or succeeding. Searches for similar issues indicate that the only viable solution is to remove the secret and retry the upgrade; helm rollback doesn't seem to recognize a release in the pending-upgrade state. See Helm release stuck with status "pending-upgrade" (helm/helm#7476) for one of many discussions of this issue. The added method attempts to remove such secrets, should they exist, for the given component.
  • Added a post-upgrade Todo example test to ensure that the app, weblogic, and coherence operators are restarted successfully.
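
The cleanup described in the first bullet can be sketched with the standard library alone. This is a minimal illustration, not the merged code: the `secret` type, the in-memory `store`, and the `matches` helper are hypothetical stand-ins for the Kubernetes client and label selector used in the PR. Only the two label keys (`name`, `status`) and the "pending-upgrade" value come from the change itself.

```go
package main

import "fmt"

// secret is a hypothetical, simplified stand-in for a Kubernetes Secret;
// only the fields needed for this illustration are modeled.
type secret struct {
	Namespace string
	Name      string
	Labels    map[string]string
}

// matches reports whether every key/value pair in selector is present in labels.
func matches(labels, selector map[string]string) bool {
	for k, want := range selector {
		if got, ok := labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// resolvePendingUpgrades removes every secret labeled as a "pending-upgrade"
// release record for compName. It returns the secrets that remain and the
// names of the secrets that were removed.
func resolvePendingUpgrades(store []secret, compName string) (kept []secret, removed []string) {
	selector := map[string]string{"name": compName, "status": "pending-upgrade"}
	for _, s := range store {
		if matches(s.Labels, selector) {
			removed = append(removed, s.Name)
			continue
		}
		kept = append(kept, s)
	}
	return kept, removed
}

func main() {
	// Sample data; the secret names follow Helm 3's release-record naming.
	store := []secret{
		{
			Namespace: "verrazzano-system",
			Name:      "sh.helm.release.v1.weblogic-operator.v1",
			Labels:    map[string]string{"name": "weblogic-operator", "status": "deployed"},
		},
		{
			Namespace: "verrazzano-system",
			Name:      "sh.helm.release.v1.weblogic-operator.v2",
			Labels:    map[string]string{"name": "weblogic-operator", "status": "pending-upgrade"},
		},
	}
	kept, removed := resolvePendingUpgrades(store, "weblogic-operator")
	fmt.Println(len(kept), removed)
}
```

Only the stuck revision is removed; the earlier "deployed" record stays, so a retried upgrade still has a previous release to build on.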

Fixes VZ-5023

Checklist

As the author of this PR, I have:

  • Checked that I included or updated copyright and license notices in all files that I altered
  • Added or updated unit tests for any new functions I added
  • Added or updated integration tests if appropriate
  • Added or updated acceptance tests if appropriate

Code reviewer, please confirm this PR:

  • Addressed the requirement and meets the acceptance criteria
  • Does not introduce unrelated or spurious changes
  • Does not introduce any unapproved dependency
  • Makes sense and is easy to understand, and/or difficult areas of code are clearly documented so that they can be understood

@jmaron99 jmaron99 changed the title from "Jmaron/vz 5023" to "JIRA:VZ-5023 handle pending-upgrade helm component status and unblock stuck upgrades" Mar 3, 2022
func (r *Reconciler) resolvePendingUpgrades(compName string, compLog vzlog.VerrazzanoLogger) {
	labelSelector := kblabels.Set{"name": compName, "status": "pending-upgrade"}.AsSelector()
	helmSecrets := v1.SecretList{}
	err := r.Client.List(context.TODO(), &helmSecrets, &clipkg.ListOptions{LabelSelector: labelSelector})
Contributor

I don't see a namespace, unless I am missing something.

Contributor Author

No, there is no namespace; it's not easily accessible here. It also isn't required, since the search specifies both the name of the component and the "pending-upgrade" status, so even a cross-namespace search should yield only the results for the given component. I don't believe we currently expect to create VZ components of the same name in different namespaces.
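
To make the trade-off in this exchange concrete, here is a small standard-library sketch of a label match with and without a namespace filter. Everything in it is illustrative: the `releaseSecret` type, `selectSecrets`, `sampleSecrets`, and the `other-ns` namespace are hypothetical and not part of the PR. It shows that a cross-namespace search returns every release record sharing the same `name` and `status` labels, which is only a concern if two components with the same name exist in different namespaces.

```go
package main

import "fmt"

// releaseSecret is a hypothetical, simplified stand-in for a Helm release Secret.
type releaseSecret struct {
	Namespace string
	Name      string
	Labels    map[string]string
}

// selectSecrets returns "namespace/name" for each secret whose labels contain
// every selector pair. An empty namespace means "search all namespaces".
func selectSecrets(all []releaseSecret, namespace string, selector map[string]string) []string {
	var out []string
	for _, s := range all {
		if namespace != "" && s.Namespace != namespace {
			continue
		}
		ok := true
		for k, want := range selector {
			if got, found := s.Labels[k]; !found || got != want {
				ok = false
				break
			}
		}
		if ok {
			out = append(out, s.Namespace+"/"+s.Name)
		}
	}
	return out
}

// sampleSecrets builds illustrative data; "other-ns" is an invented namespace.
func sampleSecrets() []releaseSecret {
	pending := func(ns, release string) releaseSecret {
		return releaseSecret{
			Namespace: ns,
			Name:      "sh.helm.release.v1." + release + ".v2",
			Labels:    map[string]string{"name": release, "status": "pending-upgrade"},
		}
	}
	return []releaseSecret{
		pending("verrazzano-system", "weblogic-operator"),
		pending("other-ns", "weblogic-operator"),
		pending("verrazzano-system", "coherence-operator"),
	}
}

func main() {
	selector := map[string]string{"name": "weblogic-operator", "status": "pending-upgrade"}
	// Cross-namespace search: both same-named releases match.
	fmt.Println(selectSecrets(sampleSecrets(), "", selector))
	// Namespace-scoped search: only the release in verrazzano-system matches.
	fmt.Println(selectSecrets(sampleSecrets(), "verrazzano-system", selector))
}
```

As long as component names are unique across namespaces, as the author expects, the two searches return the same result.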

@@ -138,6 +144,29 @@ func (r *Reconciler) reconcileUpgrade(log vzlog.VerrazzanoLogger, cr *installv1a
	return ctrl.Result{}, nil
}

// resolvePendingUpgrades will delete any helm secrets with a "pending-upgrade" status for the given component
func (r *Reconciler) resolvePendingUpgrades(compName string, compLog vzlog.VerrazzanoLogger) {
Contributor

@pfmackin pfmackin Mar 3, 2022

Wondering if this should be in the helm_component.go Upgrade method, after this line:

_, _, err = upgradeFunc(context.Log(), h.ReleaseName, namespace, h.ChartDir, true, context.IsDryRun(), overrides)
return err

Currently, this function would also be called for Istio, which is not Helm-based, but I don't think that is a problem since it will just return an empty list.

Contributor Author

I don't feel strongly. If you want me to move the code there I can do that. Let me know.

Example of the way it currently works:

{"level":"error","@timestamp":"2022-03-03T20:03:31.331Z","caller":"helm/helmcli.go:205","message":"Failed running Helm command for release weblogic-operator: stderr Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress\n","resource_namespace":"default","resource_name":"my-verrazzano","controller":"verrazzano","component":"weblogic-operator","operation":"upgrade","stacktrace":"github.com/verrazzano/verrazzano/pkg/helm.runHelm\n\t/home/opc/go/src/github.com/verrazzano/verrazzano/pkg/helm/helmcli.go:205\ngithub.com/verrazzano/verrazzano/pkg/helm.Upgrade\n\t/home/opc/go/src/github.com/verrazzano/verrazzano/pkg/helm/helmcli.go:142\ngithub.com/verrazzano/verrazzano/platform-operator/controllers/verrazzano/component/helm.HelmComponent.Upgrade\n\t/home/opc/go/src/github.com/verrazzano/verrazzano/platform-operator/controllers/verrazzano/component/helm/helm_component.go:315\ngithub.com/verrazzano/verrazzano/platform-operator/controllers/verrazzano.(*Reconciler).reconcileUpgrade\n\t/home/opc/go/src/github.com/verrazzano/verrazzano/platform-operator/controllers/verrazzano/upgrade.go:106\ngithub.com/verrazzano/verrazzano/platform-operator/controllers/verrazzano.(*Reconciler).ProcUpgradingState\n\t/home/opc/go/src/github.com/verrazzano/verrazzano/platform-operator/controllers/verrazzano/controller.go:301\ngithub.com/verrazzano/verrazzano/platform-operator/controllers/verrazzano.(*Reconciler).doReconcile\n\t/home/opc/go/src/github.com/verrazzano/verrazzano/platform-operator/controllers/verrazzano/controller.go:158\ngithub.com/verrazzano/verrazzano/platform-operator/controllers/verrazzano.(*Reconciler).Reconcile\n\t/home/opc/go/src/github.com/verrazzano/verrazzano/platform-operator/controllers/verrazzano/controller.go:96\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/opc/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.2/pkg/internal/controller/controller.go:235\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/opc/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.2/pkg/internal/controller/controller.go:209\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/home/opc/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.2/pkg/internal/controller/controller.go:188\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/home/opc/go/pkg/mod/k8s.io/apimachinery@v0.19.2/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/home/opc/go/pkg/mod/k8s.io/apimachinery@v0.19.2/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/home/opc/go/pkg/mod/k8s.io/apimachinery@v0.19.2/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/home/opc/go/pkg/mod/k8s.io/apimachinery@v0.19.2/pkg/util/wait/wait.go:90"}
{"level":"error","@timestamp":"2022-03-03T20:03:31.331Z","caller":"verrazzano/upgrade.go:107","message":"Error upgrading component weblogic-operator: failed to run '/usr/bin/helm upgrade weblogic-operator /verrazzano/platform-operator/thirdparty/charts/weblogic-operator --wait --namespace verrazzano-system --install -f /tmp/values-936994409.yaml -f /verrazzano/platform-operator/helm_config/overrides/weblogic-values.yaml --set image=ghcr.io/oracle/weblogic-kubernetes-operator:3.3.8,serviceAccount=weblogic-operator-sa,domainNamespaceSelectionStrategy=LabelSelector,domainNamespaceLabelSelector=verrazzano-managed,enableClusterRoleBinding=true,istioLocalhostBindingsEnabled=false : Error exit status 1","resource_namespace":"default","resource_name":"my-verrazzano","controller":"verrazzano","component":"weblogic-operator","operation":"upgrade","stacktrace":"github.com/verrazzano/verrazzano/platform-operator/controllers/verrazzano.(*Reconciler).reconcileUpgrade\n\t/home/opc/go/src/github.com/verrazzano/verrazzano/platform-operator/controllers/verrazzano/upgrade.go:107\ngithub.com/verrazzano/verrazzano/platform-operator/controllers/verrazzano.(*Reconciler).ProcUpgradingState\n\t/home/opc/go/src/github.com/verrazzano/verrazzano/platform-operator/controllers/verrazzano/controller.go:301\ngithub.com/verrazzano/verrazzano/platform-operator/controllers/verrazzano.(*Reconciler).doReconcile\n\t/home/opc/go/src/github.com/verrazzano/verrazzano/platform-operator/controllers/verrazzano/controller.go:158\ngithub.com/verrazzano/verrazzano/platform-operator/controllers/verrazzano.(*Reconciler).Reconcile\n\t/home/opc/go/src/github.com/verrazzano/verrazzano/platform-operator/controllers/verrazzano/controller.go:96\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/opc/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.2/pkg/internal/controller/controller.go:235\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/opc/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.2/pkg/internal/controller/controller.go:209\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/home/opc/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.2/pkg/internal/controller/controller.go:188\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/home/opc/go/pkg/mod/k8s.io/apimachinery@v0.19.2/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/home/opc/go/pkg/mod/k8s.io/apimachinery@v0.19.2/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/home/opc/go/pkg/mod/k8s.io/apimachinery@v0.19.2/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/home/opc/go/pkg/mod/k8s.io/apimachinery@v0.19.2/pkg/util/wait/wait.go:90"}
{"level":"info","@timestamp":"2022-03-03T20:03:31.541Z","caller":"verrazzano/upgrade.go:165","message":"Resolved pending upgrade for component weblogic-operator","resource_namespace":"default","resource_name":"my-verrazzano","controller":"verrazzano","component":"weblogic-operator","operation":"upgrade"}
{"level":"info","@timestamp":"2022-03-03T20:03:31.938Z","caller":"helm/helmcli.go:194","message":"Running Helm command /usr/bin/helm upgrade weblogic-operator /verrazzano/platform-operator/thirdparty/charts/weblogic-operator --wait --namespace verrazzano-system --install -f /tmp/values-187013812.yaml -f /verrazzano/platform-operator/helm_config/overrides/weblogic-values.yaml --set image=ghcr.io/oracle/weblogic-kubernetes-operator:3.3.8,serviceAccount=weblogic-operator-sa,domainNamespaceSelectionStrategy=LabelSelector,domainNamespaceLabelSelector=verrazzano-managed,enableClusterRoleBinding=true,istioLocalhostBindingsEnabled=false for release weblogic-operator","resource_namespace":"default","resource_name":"my-verrazzano","controller":"verrazzano","component":"weblogic-operator","operation":"upgrade"}
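
The sequence in these log entries (the upgrade fails because another operation is in progress, the pending upgrade is resolved, and Helm runs again) can be modeled as a small retry loop. The sketch below is a hypothetical simulation: `fakeRelease`, `reconcile`, and `isPendingOperation` are invented names, and the check on the error text is an assumption made for illustration. The logs do not show whether the merged code inspects the error or simply runs the cleanup after any failed upgrade.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// pendingMsg is the fragment of Helm's error output that appears in the log above.
const pendingMsg = "another operation (install/upgrade/rollback) is in progress"

// fakeRelease is a hypothetical in-memory model of a Helm release that may be
// stuck in the pending-upgrade state.
type fakeRelease struct {
	pending  bool
	attempts int
}

// upgrade simulates "helm upgrade": it fails while the release is pending.
func (f *fakeRelease) upgrade() error {
	f.attempts++
	if f.pending {
		return errors.New("UPGRADE FAILED: " + pendingMsg)
	}
	return nil
}

// resolvePending simulates deleting the pending-upgrade release secret.
func (f *fakeRelease) resolvePending() { f.pending = false }

// isPendingOperation reports whether err looks like Helm's "in progress" failure.
func isPendingOperation(err error) bool {
	return err != nil && strings.Contains(err.Error(), pendingMsg)
}

// reconcile retries the upgrade for up to maxPasses passes, clearing the
// pending state after a matching failure so that the next pass can succeed.
// It returns the number of passes used and the last error, if any.
func reconcile(f *fakeRelease, maxPasses int) (int, error) {
	var err error
	for pass := 1; pass <= maxPasses; pass++ {
		if err = f.upgrade(); err == nil {
			return pass, nil
		}
		if isPendingOperation(err) {
			f.resolvePending()
		}
	}
	return maxPasses, err
}

func main() {
	rel := &fakeRelease{pending: true}
	passes, err := reconcile(rel, 3)
	fmt.Println(passes, err) // prints: 2 <nil>
}
```

The first pass fails and clears the stuck state, and the second pass succeeds, matching the order of the error and info entries above.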

Contributor

This is ok for now. This might get moved in the future to helm_component.go if it makes more sense.

@jmaron99 jmaron99 merged commit 3b50f83 into master Mar 3, 2022
@jmaron99 jmaron99 deleted the jmaron/VZ-5023 branch March 3, 2022 21:08