Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

discovered backends are added live before healthcheck runs #251

Closed
KnicKnic opened this issue Jan 14, 2020 · 3 comments
Closed

discovered backends are added live before healthcheck runs #251

KnicKnic opened this issue Jan 14, 2020 · 3 comments

Comments

@KnicKnic
Copy link
Contributor

KnicKnic commented Jan 14, 2020

Here we compare if a health transitions from between healthy and not healthy, and only if it transitions do we notify server of the state change.

func (this *Worker) process(checkResult CheckResult) {
log := logging.For("healthcheck/worker")
if this.LastResult.Live && !checkResult.Live {
this.passes = 0
this.fails++
} else if !this.LastResult.Live && checkResult.Live {
this.fails = 0
this.passes++
} else {
// check status not changed
return
}

However we see initial state is assumed to be healthy

if keep == nil {
keep = &Worker{
target: t,
stop: make(chan bool),
out: this.Out,
cfg: this.cfg,
check: this.check,
LastResult: CheckResult{
Live: true,
},
}
keep.Start()
}

This was odd to me. I would have assumed that liveness would have been false(if health check is configured), and kicked of an immediate health check. However Health check is not kicked off immediately, and instead delays for wait period seconds.

ticker := time.NewTicker(interval)
c := make(chan CheckResult, 1)
go func() {
for {
select {
/* new check interval has reached */
case <-ticker.C:
log.Debug("Next check ", this.cfg.Kind, " for ", this.target)
go this.check(this.target, this.cfg, c)

There is a related issue to this, that asked for a delay before backends go live.
related to #50

@nickdoikov
Copy link
Collaborator

From my perspective - you should implement such behavioural changes with reflection in the config file.
it needs to allow user to choose what will be server behaviour - immediate traffic ingesting based on discovery, also check is optional and it needs to leave previous behaviour as-is and add optional config value in server health check section to define your behaviour.

something like :

[servers.default.healthcheck]
check_initial_status=unhealthy  #mark initially discovered backend unhealthy until health check verification will be passed  

Also we have a failpolicy , so such changes can affect it also.

One more remark : Pease remember that fixing your own issue related to specific use case you can dramatically affect other functionality.

@yyyar @illarion could you please review #253 taking to account side effects that it can cause .

@KnicKnic
Copy link
Contributor Author

From my perspective - you should implement such behavioural changes with reflection in the config file.
it needs to allow user to choose what will be server behaviour - immediate traffic ingesting based on discovery, also check is optional and it needs to leave previous behaviour as-is and add optional config value in server health check section to define your behaviour.

something like :

[servers.default.healthcheck]
check_initial_status=unhealthy  #mark initially discovered backend unhealthy until health check verification will be passed  

Also we have a failpolicy , so such changes can affect it also.

One more remark : Pease remember that fixing your own issue related to specific use case you can dramatically affect other functionality.

I do understand, but to me this seems like what should be done (also why I am seeking feedback :-) ). If you set a health check policy your nodes should not receive traffic till they are deemed "Healthy". One thing I have done to mitigate this change was to ensure that Healthcheck is immediately kicked off. This means the only people that are functionally affected are those that set HealthcheckConfig.Passes to greater than 1 (as health check will immediately declare the discovered node as up).

There are 2 questions,

  1. the impact to existing users of this change, and does it warrant a workaround to get them the old behavior.
    1. I prepared the change without a workaround to get old behavior, due to it complicating code, complicating user documentation.
    2. @nickdoikov if you still think this is necessary even with the change to immedately schedule a health check (maybe the user configured passes to 3) then I can prepare the change
  2. What is the behavior that a user would expect.
    1. Currently I am wondering what is the scenario to have healthcheck but not use it after initial discovery to mark a backend as live, but rather to mark it as up and wait for a health check to fail.

@KnicKnic
Copy link
Contributor Author

Spoke to @nickdoikov offline, so I have prepared change that makes it initial backend status as live, user has option to make it unhealthy.

@illarion illarion closed this as completed Aug 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants