
Feature/rolling update #66

Merged · lftakakura merged 8 commits from feature/rolling-update into master on Aug 9, 2019
Conversation

lftakakura (Contributor)

Deploy pods before deleting them in a rolling update, respecting maxSurge.
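To make the flow concrete, here is a minimal, hedged sketch of a create-before-delete rolling update bounded by maxSurge. The helpers (`createPod`, `waitRunning`, `deletePod`) are hypothetical stand-ins, not Maestro's actual API:

```go
// rollingUpdate replaces oldPods in chunks of at most maxSurge:
// replacements are created and confirmed running before the old pods
// in the same chunk are deleted.
func rollingUpdate(oldPods []string, maxSurge int,
	createPod func() (string, error),
	waitRunning func(names []string) error,
	deletePod func(name string) error,
) error {
	for start := 0; start < len(oldPods); start += maxSurge {
		end := start + maxSurge
		if end > len(oldPods) {
			end = len(oldPods)
		}
		chunk := oldPods[start:end]

		// Surge phase: create the replacements first.
		created := make([]string, 0, len(chunk))
		for range chunk {
			name, err := createPod()
			if err != nil {
				return err
			}
			created = append(created, name)
		}
		if err := waitRunning(created); err != nil {
			return err
		}

		// Only now remove the old pods of this chunk.
		for _, name := range chunk {
			if err := deletePod(name); err != nil {
				return err
			}
		}
	}
	return nil
}
```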

coveralls commented on Aug 7, 2019

Coverage Status: coverage decreased (-0.06%) to 81.972% when pulling 8cce3e0 on feature/rolling-update into 3f5eb82 on master.

andrehp (Contributor) commented on Aug 7, 2019

Would it add too much complexity to allow maxSurge and maxUnavailable on the same config, like deployments do?

cscatolini (Contributor)

> Would it add too much complexity to allow maxSurge and maxUnavailable on the same config, like deployments do?

I'd rather not change any behaviour and/or scheduler config at this moment, @andrehp. Do you think this is a must-have?

henrod (Contributor) left a comment

I agree, @cscatolini. Maestro already has the maxsurge query string, which suffices for many cases.

```go
	db, clientset, configYAML, scheduler)
if err != nil {
	if err == nil || !strings.Contains(err.Error(), "not found") {
		logger.WithField("pod", pod.GetName()).Debugf("pod still exists, deleting again")
```
Review comment (Contributor):

Log the error here if not nil.

lftakakura (Contributor, author):

Updated.
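For reference, a hedged sketch of the kind of logging the reviewer asked for, assuming a logrus-style logger as the surrounding snippet suggests (the helper name is illustrative, not Maestro's code):

```go
import (
	"strings"

	"github.com/sirupsen/logrus"
)

// logIfUnexpected logs any error from the pod lookup that is not a
// plain "not found", so the retry loop does not swallow it silently.
func logIfUnexpected(logger logrus.FieldLogger, podName string, err error) {
	if err != nil && !strings.Contains(err.Error(), "not found") {
		logger.WithField("pod", podName).
			WithError(err).
			Error("unexpected error while checking terminating pod")
	}
}
```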

andrehp (Contributor) commented on Aug 7, 2019

> Would it add too much complexity to allow maxSurge and maxUnavailable on the same config, like deployments do?

> I'd rather not change any behaviour and/or scheduler config at this moment, @andrehp. Do you think this is a must-have?

I wouldn't say it's a must-have, but it might be nice for speeding up deploys. I do agree it might not be the best idea to change this behavior right now.

> I agree, @cscatolini. Maestro already has the maxsurge query string, which suffices for many cases.

I did a quick search in the docs for maxsurge but didn't find anything. But I think Maestro's maxsurge is different from a k8s deployment's, isn't it?
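For context on what is being compared here: a Kubernetes Deployment carries both knobs per resource in its rolling-update strategy. A sketch using the client-go API types (illustrative only; unrelated to Maestro's own config):

```go
import (
	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// deploymentStrategy shows how a k8s Deployment expresses both limits:
// maxSurge caps how many extra pods may exist during the update, while
// maxUnavailable caps how many desired pods may be down at once.
func deploymentStrategy() appsv1.DeploymentStrategy {
	maxSurge := intstr.FromString("25%")
	maxUnavailable := intstr.FromString("25%")
	return appsv1.DeploymentStrategy{
		Type: appsv1.RollingUpdateDeploymentStrategyType,
		RollingUpdate: &appsv1.RollingUpdateDeployment{
			MaxSurge:       &maxSurge,
			MaxUnavailable: &maxUnavailable,
		},
	}
}
```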

cscatolini (Contributor) left a comment

A few comments. I'll review this again once you've made the changes we've discussed.

controller/utils.go (resolved review thread)
```diff
@@ -276,51 +271,83 @@ func waitTerminatingPods(
 	exit := true
 	select {
 	case <-ticker.C:
 		for _, pod := range deletedPods {
 			if operationManager.WasCanceled() {
```
Review comment (Contributor):

Shouldn't you have the same check here? `if operationManager != nil && operationManager.WasCanceled() {`

lftakakura (Contributor, author):

Yes, it should.

lftakakura (Contributor, author):

Updated.
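A generic, self-contained sketch of the wait-loop shape with the suggested nil guard folded in; the names here are illustrative and not Maestro's:

```go
import (
	"errors"
	"time"
)

// waitFor polls until check returns true, the operation is canceled,
// or the timeout expires. A nil canceled func plays the role of a nil
// operationManager: it simply means "never canceled".
func waitFor(check func() bool, canceled func() bool, timeout time.Duration) error {
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	deadline := time.After(timeout)

	for {
		select {
		case <-ticker.C:
			if canceled != nil && canceled() {
				return errors.New("operation was canceled")
			}
			if check() {
				return nil
			}
		case <-deadline:
			return errors.New("timed out waiting for pods")
		}
	}
}
```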

```go
}

if len(createdPod.Status.Phase) == 0 {
	// HACK! Trying to detect if we are running unit tests
```
Review comment (Contributor):

Can't this have any unintended side effects? Maybe add a logger here?
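A hedged sketch of the logging the reviewer suggests, so the empty-phase hack is visible if it ever fires outside of tests (assuming a logrus-style logger; the helper name is illustrative):

```go
import (
	"github.com/sirupsen/logrus"
	corev1 "k8s.io/api/core/v1"
)

// isEmptyPhase reports whether the pod has no status.phase yet, which
// the code above treats as "running under unit tests", and logs a
// warning so an unexpected empty phase in production is not silent.
func isEmptyPhase(logger logrus.FieldLogger, pod *corev1.Pod) bool {
	if len(pod.Status.Phase) != 0 {
		return false
	}
	logger.WithField("pod", pod.GetName()).
		Warn("pod has empty status.phase; assuming unit-test environment")
	return true
}
```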

cscatolini (Contributor)

> Would it add too much complexity to allow maxSurge and maxUnavailable on the same config, like deployments do?

> I'd rather not change any behaviour and/or scheduler config at this moment, @andrehp. Do you think this is a must-have?

> I wouldn't say it's a must-have, but it might be nice for speeding up deploys. I do agree it might not be the best idea to change this behavior right now.

You can talk to @lftakakura. We don't want major changes, but we believe that creating 25% new pods before deleting the previous versions may be a big burden on our infrastructure. He is thinking about a way to speed up the process and will update this PR soon.

> I agree, @cscatolini. Maestro already has the maxsurge query string, which suffices for many cases.

> I did a quick search in the docs for maxsurge but didn't find anything. But I think Maestro's maxsurge is different from a k8s deployment's, isn't it?

Yes. It is a global Maestro config rather than a per-scheduler one. The default we're using in prod is 25%.
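To make that concrete: a global percentage just bounds how many replacement pods are created per chunk. A bit of illustrative arithmetic (not Maestro's actual code):

```go
// surgeCount converts a global maxsurge percentage into the number of
// replacement pods created per chunk for a scheduler with total rooms,
// never returning fewer than one.
func surgeCount(total, maxSurgePercent int) int {
	n := total * maxSurgePercent / 100
	if n < 1 {
		n = 1
	}
	return n
}
```

With the 25% default mentioned above, a scheduler with 100 rooms would be updated in chunks of 25 new pods at a time.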

cscatolini (Contributor) left a comment

LGTM

lftakakura merged commit ff9a53c into master on Aug 9, 2019.
lftakakura added a commit that referenced this pull request on Oct 11, 2019:

* Create rooms before deleting old ones
* Remove unnecessary comments
* Fix rolling-update flow
* Create pod and delete old one after in each chunk
* Remove waitCreatingAndDeleteOldPods
* Log error if not able to get pod in waitTerminatingPods and waitCreatingPods
* WIP: Use goroutines to perform rolling update
* Use goroutines to perform rolling update
lftakakura added a commit that referenced this pull request on Oct 15, 2019 (same commit message as above).

lftakakura added a commit that referenced this pull request on Oct 15, 2019 (same commit message as above).
caiordjesus deleted the feature/rolling-update branch on February 14, 2022.