30s down-moratorium before allowing suspension #14455

hakonhall · 2020-09-18T15:00:11Z

If all services of a node are down, we used to allow suspension. If those services are monitored with /state/v1/health, we have the timestamp at which each service became unhealthy: the "since" timestamp. Now we will require the services to have been down for at least 30s before allowing suspension based on unhealthiness.
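A minimal sketch of the rule above, for illustration only. The names `DownMoratorium`, `Status`, `ServiceHealth` and `allowSuspensionOnDown` are hypothetical and are not Vespa's orchestrator API. The sketch assumes that a service without a "since" timestamp (i.e. not monitored via /state/v1/health) keeps the old behaviour, where DOWN alone is enough. It also assumes that a node with no services is not suspended on this basis.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the 30s down-moratorium; not Vespa's orchestrator code.
class DownMoratorium {

    // Hypothetical health states; Vespa's own status enum may differ.
    enum Status { UP, DOWN, UNKNOWN }

    // Health of one service. `since` is when it entered `status`, if known
    // (e.g. from /state/v1/health); empty when no timestamp is available.
    record ServiceHealth(Status status, Optional<Instant> since) {}

    static final Duration MORATORIUM = Duration.ofSeconds(30);

    // True if every service is DOWN and, where a "since" timestamp exists,
    // has been DOWN for at least MORATORIUM as of `now`.
    static boolean allowSuspensionOnDown(List<ServiceHealth> services, Instant now) {
        if (services.isEmpty()) {
            return false; // assumption: a node with no services is not suspended on this basis
        }
        for (ServiceHealth service : services) {
            if (service.status() != Status.DOWN) {
                return false; // at least one service is not down
            }
            if (service.since().isPresent()) {
                Duration downFor = Duration.between(service.since().get(), now);
                if (downFor.compareTo(MORATORIUM) < 0) {
                    return false; // down, but not for long enough yet
                }
            }
            // assumption: without a "since" timestamp, DOWN alone suffices (the old behaviour)
        }
        return true;
    }
}
```

Passing `now` in explicitly, rather than reading the clock inside the method, keeps the check deterministic and easy to test at the 30s boundary.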

Also, for config servers only, we will log all healthiness transitions to track down some orchestrator issues.
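A minimal sketch of transition-only logging restricted to config servers, assuming health is observed repeatedly per service and compared with the last observed value. `HealthTransitionLogger`, `observe` and its `Status` enum are hypothetical. `java.util.logging` is used only to keep the example self-contained, and Vespa's actual logging setup may differ.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical sketch: log a service's health only when it changes,
// and only for config-server services.
class HealthTransitionLogger {
    private static final Logger logger = Logger.getLogger(HealthTransitionLogger.class.getName());

    enum Status { UP, DOWN, UNKNOWN }

    // Last observed status per service id, tracked for all services.
    private final Map<String, Status> lastStatus = new HashMap<>();

    // Records an observation; returns true if a transition was logged.
    boolean observe(String serviceId, boolean isConfigServer, Status newStatus) {
        Status previous = lastStatus.put(serviceId, newStatus);
        if (!isConfigServer || previous == newStatus) {
            return false; // not a config server, or no change
        }
        logger.info(serviceId + " changed health from "
                + (previous == null ? "<none>" : previous) + " to " + newStatus);
        return true;
    }
}
```

Logging only on change keeps the log volume proportional to the number of transitions rather than the polling frequency.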

application-model/src/main/java/com/yahoo/vespa/applicationmodel/ClusterId.java

hmusum

One suggested change, otherwise LGTM

hakonhall · 2020-09-18T21:40:59Z

I'm unable to reproduce the Travis failure (ControllerTest, testDevDeployment), and I no longer have the ability to trigger another run on Travis. I'll merge and make a revert just in case.

hakonhall added 2 commits September 18, 2020 17:00

30s down-moratorium before allowing suspension

8ef2939

Log when config server changes health

d300efc

hakonhall requested a review from hmusum September 18, 2020 17:58

hmusum reviewed Sep 18, 2020

application-model/src/main/java/com/yahoo/vespa/applicationmodel/ClusterId.java (outdated)

hmusum requested changes Sep 18, 2020

Update application-model/src/main/java/com/yahoo/vespa/applicationmod…

2416431

…el/ClusterId.java Co-authored-by: Harald Musum <musum@verizonmedia.com>

hmusum approved these changes Sep 18, 2020

hakonhall merged commit 3a6bcde into master Sep 18, 2020

hakonhall deleted the hakonhall/30s-down-moratorium-before-allowing-suspension branch September 18, 2020 21:41

hakonhall mentioned this pull request Sep 18, 2020

Revert "30s down-moratorium before allowing suspension" #14456

Closed
