[HealthController] Perform rolling update of scheduler versions #618

Merged
hspedro merged 3 commits into main from refactor/switchversion-not-delete
Jul 3, 2024

Conversation

hspedro
Collaborator

@hspedro hspedro commented May 8, 2024

  • refactor(healthcontroller): perform rolling update

It is now the responsibility of the health controller to perform the rolling update. When the scheduler's active version is switched, pods must be rolled out with the new version. The health controller does either autoscaling or a rolling update.

During a rolling update, the following happens:
1. Check whether any rooms are still on a previous scheduler version. If so, start the update.
2. Compute how many rooms can be surged (above the desired count from autoscaling).
3. Enqueue a priority operation to create these rooms.
4. Check how many Ready rooms exist above the desired count from autoscaling.
5. Enqueue an operation to delete those rooms. They are a buffer, and deleting them should never violate readyTarget.

  • refactor(switchversion): do not replace pods

    The switch version operation now only updates the scheduler version.
    The health controller performs the rolling update.

@hspedro hspedro self-assigned this May 8, 2024
@hspedro hspedro force-pushed the refactor/switchversion-not-delete branch 2 times, most recently from a9bc814 to 1ca2dd7 Compare May 22, 2024 00:18
@hspedro hspedro force-pushed the refactor/switchversion-not-delete branch 2 times, most recently from d8c366e to bcfce98 Compare June 5, 2024 17:51
@hspedro hspedro force-pushed the refactor/switchversion-not-delete branch from 21372cd to 9388674 Compare June 6, 2024 17:14
@codecov-commenter

codecov-commenter commented Jun 6, 2024

Codecov Report

Attention: Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.

Project coverage is 64.27%. Comparing base (bb118c8) to head (a71052f).
Report is 5 commits behind head on main.

Files                                               Patch %   Lines
internal/adapters/runtime/kubernetes/scheduler.go   0.00%     3 Missing ⚠️


Additional details and impacted files
@@            Coverage Diff             @@
##             main     #618      +/-   ##
==========================================
- Coverage   64.30%   64.27%   -0.03%     
==========================================
  Files          39       39              
  Lines        2905     2911       +6     
==========================================
+ Hits         1868     1871       +3     
- Misses        909      913       +4     
+ Partials      128      127       -1     


@hspedro hspedro force-pushed the refactor/switchversion-not-delete branch 4 times, most recently from dfbc13c to c050115 Compare June 7, 2024 19:28
@hspedro hspedro force-pushed the refactor/switchversion-not-delete branch from c050115 to 83d132d Compare June 10, 2024 13:53
@hspedro hspedro force-pushed the refactor/switchversion-not-delete branch 3 times, most recently from b829339 to 1abf37d Compare June 14, 2024 18:09
@hspedro hspedro force-pushed the refactor/switchversion-not-delete branch 5 times, most recently from a71052f to 3f61ae3 Compare June 26, 2024 19:15
Add reference to the new Rolling Update mechanism performed by the
health_controller operation
@hspedro hspedro force-pushed the refactor/switchversion-not-delete branch from 3f61ae3 to aef5850 Compare July 3, 2024 17:29
@hspedro hspedro merged commit 1a615ba into main Jul 3, 2024
6 checks passed
@hspedro hspedro deleted the refactor/switchversion-not-delete branch July 3, 2024 17:43
4 participants