
Allow to set apply-only per cluster via config item #654

Draft: wants to merge 1 commit into master
@linki (Member) commented Mar 14, 2023

Sometimes you want CLM to apply manifest updates as fast as possible and defer any node rolls. CLM has a global --apply-only flag for this, but it's rarely used because changing it requires redeploying CLM, and it affects every cluster at once.

This adds a config item, update_apply_only, similar to update_strategy, that lets you configure the apply-only behaviour at runtime on a per-cluster basis.
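To make the intended per-cluster semantics concrete, here is a minimal shell sketch of how the effective setting might be resolved for one cluster. The function name effective_apply_only is hypothetical, and the precedence rule (the config item, when set, overrides the global flag; deleting it restores the global behaviour) is an assumption, not taken from the PR's code.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not CLM code: resolve the effective apply-only setting
# for a single cluster.
# Assumption: a per-cluster update_apply_only config item, when present,
# overrides the global --apply-only flag; when it is absent (for example after
# "config-items delete"), the global flag applies.
effective_apply_only() {
  local global_flag="$1"   # value of the global --apply-only flag: "true" or "false"
  local config_item="$2"   # value of update_apply_only; empty string if unset
  if [ -n "$config_item" ]; then
    # The config item is set, so it decides; only "true" enables apply-only.
    if [ "$config_item" = "true" ]; then echo true; else echo false; fi
  else
    # The config item is unset, so fall back to the global flag.
    if [ "$global_flag" = "true" ]; then echo true; else echo false; fi
  fi
}

effective_apply_only false true   # item set on a stuck cluster: prints "true"
effective_apply_only false ""     # item deleted again: prints "false"
```

Under this assumption, deleting the item (rather than setting it to false) is what returns a cluster to whatever the global flag says.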

When a cluster is stuck in a long node roll, the intended usage is:

# enable apply-only for this cluster
$ zregistry clusters config-items set -k update_apply_only -v true --cluster-alias <cluster>
# briefly block cluster updates to retrigger CLM
$ echo "retrigger CLM" | zkubectl --context <cluster> cluster-update block
# wait a couple of seconds, then unblock cluster updates again
$ echo y | zkubectl --context <cluster> cluster-update unblock
# once all manifests have been applied, delete the config item to restore the original behaviour and resume node rolls
$ zregistry clusters config-items delete -k update_apply_only --cluster-alias <cluster>

Note: don't merge, untested

@AlexanderYastrebov (Member)

> briefly block cluster update to retrigger CLM

Why is this needed? Won't changing the update_apply_only config item trigger CLM on its own?

@linki (Member, Author) commented Jun 5, 2023

@AlexanderYastrebov Yes, if no update is in progress. The example specifically covers the case where a long-running cluster update with node rotations is already underway. In that case, changing the config item does not retrigger CLM until the node rotations finish, which can take a long time. Restarting CLM, or blocking and then unblocking cluster updates, works around this.
