discovered backends are added live before healthcheck runs #251

KnicKnic · 2020-01-14T07:26:23Z

Here we check whether a backend's health transitions between healthy and not healthy, and notify the server of the state change only when it does.

gobetween/src/healthcheck/worker.go

Lines 95 to 108 in 7d8a736

    
    func (this *Worker) process(checkResult CheckResult) {
    	log := logging.For("healthcheck/worker")
    	if this.LastResult.Live && !checkResult.Live {
    		this.passes = 0
    		this.fails++
    	} else if !this.LastResult.Live && checkResult.Live {
    		this.fails = 0
    		this.passes++
    	} else {
    		// check status not changed
    		return
    	}

However, we see that the initial state is assumed to be healthy:

gobetween/src/healthcheck/healthcheck.go

Lines 142 to 154 in 7d8a736

    
    if keep == nil {
    	keep = &Worker{
    		target: t,
    		stop:   make(chan bool),
    		out:    this.Out,
    		cfg:    this.cfg,
    		check:  this.check,
    		LastResult: CheckResult{
    			Live: true,
    		},
    	}
    	keep.Start()
    }

This was odd to me. I would have assumed that liveness would start as false (if a health check is configured) and that an immediate health check would be kicked off. However, the health check is not kicked off immediately; it first waits for the check interval to elapse.
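The effect of the optimistic initial state can be sketched in isolation. This is a simplified model, not gobetween's actual code: the `worker` type and its `changed` method are hypothetical stand-ins, and the pass/fail thresholds are assumed to be 1. It shows that a worker starting as live never reports a first passing check, while a failing check is the first thing the server hears about.

```go
package main

import "fmt"

// CheckResult mirrors the field used in the excerpts above; everything else
// in this file is a simplified, hypothetical stand-in.
type CheckResult struct {
	Live bool
}

// worker is a stripped-down model of the healthcheck worker.
type worker struct {
	lastResult CheckResult
}

// changed reports whether a check result differs from the last known state.
// Only a change would lead to a notification being sent to the server.
func (w *worker) changed(r CheckResult) bool {
	if w.lastResult.Live == r.Live {
		// check status not changed
		return false
	}
	w.lastResult = r
	return true
}

func main() {
	// Backend assumed live on discovery: a passing check changes nothing.
	optimistic := &worker{lastResult: CheckResult{Live: true}}
	fmt.Println(optimistic.changed(CheckResult{Live: true}))
	// A failing check is the first event the server would be told about.
	fmt.Println(optimistic.changed(CheckResult{Live: false}))

	// Backend assumed not live on discovery: the first passing check is reported.
	pessimistic := &worker{lastResult: CheckResult{Live: false}}
	fmt.Println(pessimistic.changed(CheckResult{Live: true}))
}
```

With the optimistic default, a backend that is down from the start keeps receiving traffic until its first check has run and failed.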

gobetween/src/healthcheck/worker.go

Lines 63 to 73 in 7d8a736

    
    ticker := time.NewTicker(interval)
    c := make(chan CheckResult, 1)
    go func() {
    	for {
    		select {
    		/* new check interval has reached */
    		case <-ticker.C:
    			log.Debug("Next check ", this.cfg.Kind, " for ", this.target)
    			go this.check(this.target, this.cfg, c)
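One way to avoid the initial delay is to run a check once before entering the ticker loop. The sketch below is a minimal, self-contained illustration under assumptions: `startChecks`, its signature, and the boolean result are hypothetical and do not reflect how gobetween structures its worker.

```go
package main

import (
	"fmt"
	"time"
)

// startChecks runs check once right away and then once per interval until
// stop is closed. Results are delivered on the returned channel.
// The name and signature are hypothetical, chosen for this sketch only.
func startChecks(interval time.Duration, check func() bool, stop <-chan struct{}) <-chan bool {
	results := make(chan bool, 1)
	go func() {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()

		// Immediate first check: do not wait for the first tick.
		select {
		case results <- check():
		case <-stop:
			return
		}

		for {
			select {
			case <-ticker.C:
				// Regular check on every tick.
				select {
				case results <- check():
				case <-stop:
					return
				}
			case <-stop:
				return
			}
		}
	}()
	return results
}

func main() {
	stop := make(chan struct{})
	defer close(stop)

	// With a one-hour interval, a result can only arrive this quickly if the
	// first check ran without waiting for a tick.
	results := startChecks(time.Hour, func() bool { return true }, stop)
	select {
	case live := <-results:
		fmt.Println("first result:", live)
	case <-time.After(time.Second):
		fmt.Println("no result within one second")
	}
}
```

The ticker still drives every later check, so the steady-state cadence is unchanged; only the wait before the first result goes away.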

There is a related issue, #50, that asked for a delay before backends go live.

nickdoikov · 2020-01-14T18:04:44Z

From my perspective, you should implement such behavioural changes with a corresponding option in the config file.
It needs to let the user choose the server behaviour, for example immediate traffic ingestion based on discovery. The check is also optional, so the change needs to leave the previous behaviour as-is and add an optional config value in the server healthcheck section to select the new behaviour.

something like :

[servers.default.healthcheck]
check_initial_status=unhealthy  #mark initially discovered backend unhealthy until health check verification will be passed

We also have a failpolicy, so such changes can affect it too.

One more remark: please remember that by fixing your own issue for a specific use case you can dramatically affect other functionality.

@yyyar @illarion could you please review #253, taking into account the side effects it can cause.

KnicKnic · 2020-01-14T21:01:08Z

> From my perspective, you should implement such behavioural changes with a corresponding option in the config file.
> It needs to let the user choose the server behaviour, for example immediate traffic ingestion based on discovery. The check is also optional, so the change needs to leave the previous behaviour as-is and add an optional config value in the server healthcheck section to select the new behaviour.
>
> something like :
> [servers.default.healthcheck]
> check_initial_status=unhealthy  #mark initially discovered backend unhealthy until health check verification will be passed
> We also have a failpolicy, so such changes can affect it too.
>
> One more remark: please remember that by fixing your own issue for a specific use case you can dramatically affect other functionality.

I do understand, but to me this seems like what should be done (which is also why I am seeking feedback :-) ). If you set a health check policy, your nodes should not receive traffic until they are deemed "Healthy". One thing I have done to mitigate the impact of this change is to ensure that the health check is kicked off immediately. This means the only users who are functionally affected are those who set HealthcheckConfig.Passes to greater than 1 (with a value of 1, the immediate health check will declare the discovered node as up right away).
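The claim about Passes can be checked with a small simulation. This is a sketch under assumptions: the `worker` struct, its threshold handling, and `checksUntilLive` are hypothetical and only model the counting logic from the excerpt above, with a backend that starts as not live and always passes its checks.

```go
package main

import "fmt"

// worker is a hypothetical model of the pass/fail counting shown above.
type worker struct {
	live         bool // last reported state
	passes       int
	fails        int
	passesNeeded int // stands in for HealthcheckConfig.Passes
	failsNeeded  int // assumed counterpart threshold for failures
}

// process feeds one check result into the worker and returns true when the
// reported state flips.
func (w *worker) process(checkLive bool) bool {
	switch {
	case w.live && !checkLive:
		w.passes = 0
		w.fails++
	case !w.live && checkLive:
		w.fails = 0
		w.passes++
	default:
		// check status not changed
		return false
	}
	if w.passes > 0 && w.passes >= w.passesNeeded {
		w.live, w.passes = true, 0
		return true
	}
	if w.fails > 0 && w.fails >= w.failsNeeded {
		w.live, w.fails = false, 0
		return true
	}
	return false
}

// checksUntilLive counts how many passing checks a backend that starts as
// not live needs before it is reported live.
func checksUntilLive(passesNeeded int) int {
	w := &worker{live: false, passesNeeded: passesNeeded, failsNeeded: 1}
	n := 0
	for {
		n++
		if w.process(true) {
			return n
		}
	}
}

func main() {
	fmt.Println(checksUntilLive(1)) // 1: the immediate check brings the backend up
	fmt.Println(checksUntilLive(3)) // 3: two further check intervals must elapse
}
```

In this model, a Passes value of 1 combined with an immediate first check leaves the backend out of rotation only for the duration of one check, while larger values add one check interval per extra pass.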

There are 2 questions:

1. What is the impact of this change on existing users, and does it warrant a workaround to give them the old behavior?
   1. I prepared the change without a workaround for the old behavior, because it complicates both the code and the user documentation.
   2. @nickdoikov if you still think this is necessary even with the change to immediately schedule a health check (maybe the user configured passes to 3), then I can prepare the change.
2. What is the behavior that a user would expect?
   1. Currently I am wondering what the scenario is for configuring a healthcheck but not using it after initial discovery to mark a backend as live, and instead marking the backend as up and waiting for a health check to fail.

KnicKnic · 2020-01-18T04:33:55Z

Spoke to @nickdoikov offline, so I have prepared a change that keeps the initial backend status as live and gives the user an option to make it unhealthy.
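An opt-in setting of this kind could be mapped to the worker's starting state roughly as follows. This is a sketch only: the `healthcheckConfig` struct, the `CheckInitialStatus` field, and `initialLive` are hypothetical, with the option name borrowed from the `check_initial_status` suggestion earlier in the thread. The name used by the actual change may differ.

```go
package main

import "fmt"

// healthcheckConfig is a hypothetical config struct for this sketch.
type healthcheckConfig struct {
	// CheckInitialStatus follows the check_initial_status name suggested in
	// this thread. An empty value means the option was not set.
	CheckInitialStatus string
}

// initialLive decides the state a newly discovered backend starts in.
// Leaving the option unset keeps the previous behaviour (backend starts live).
func initialLive(cfg healthcheckConfig) (bool, error) {
	switch cfg.CheckInitialStatus {
	case "", "healthy":
		return true, nil
	case "unhealthy":
		return false, nil
	default:
		return false, fmt.Errorf("unknown check_initial_status %q", cfg.CheckInitialStatus)
	}
}

func main() {
	for _, status := range []string{"", "healthy", "unhealthy", "maybe"} {
		live, err := initialLive(healthcheckConfig{CheckInitialStatus: status})
		fmt.Printf("status=%q live=%v err=%v\n", status, live, err)
	}
}
```

Defaulting to live when the option is absent means existing configurations keep working unchanged, which is the compatibility concern raised above.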

This was referenced Jan 14, 2020

Do healthcheck upon worker startup without delay #252

Merged

Initial backends dead #253

Closed

illarion closed this as completed Aug 3, 2020
