Healthcheck: add support at the load-balancers of services level #8057
Conversation
Thanks, I was really eager to try this one ;) Here is the base dynamic config used:

```yaml
http:
  services:
    whoami-public:
      weighted:
        #healthCheck: {}
        services:
          - name: whoami-dc1
            weight: 1
          - name: whoami-dc2
            weight: 1
    whoami-dc1:
      loadBalancer:
        servers:
          - url: "http://127.0.0.1:8090"
          - url: "http://127.0.0.1:8091"
        healthCheck:
          scheme: http
          interval: 2s
          timeout: 1s
          path: /ping
    whoami-dc2:
      loadBalancer:
        servers:
          - url: "http://127.0.0.1:8092"
          - url: "http://127.0.0.1:8093"
        healthCheck:
          scheme: http
          interval: 2s
          timeout: 1s
          path: /ping
  routers:
    dashboard:
      rule: Host(`traefik.docker.localhost`)
      service: api@internal
    whoami-router:
      rule: Host(`whoami.docker.localhost`)
      service: whoami-public
```

Step / Status:
Even after enabling the healthCheck option on the weighted service, the status change was not propagated. I'm also attaching logs starting from the config reload as behavior confirmation. Other than that, everything works as expected.
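For reference, per the PR description, propagation is enabled simply by adding a (possibly empty) healthCheck block to the weighted service itself; any fields inside it are ignored at that level. Uncommenting the line in the config above would thus give:

```yaml
whoami-public:
  weighted:
    healthCheck: {}
    services:
      - name: whoami-dc1
        weight: 1
      - name: whoami-dc2
        weight: 1
```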
Thanks @ddtmachado, I will investigate.
Force-pushed from dae94e6 to 0a1126d.
@ddtmachado, I've updated the PR with a new, simpler design. From what I've tested, it seems it should also fix the problem you reported. But could you please test it out and confirm?
@mpl just tested again and I confirm it does work for that edge case as well, nice work!
pkg/healthcheck/healthcheck.go (outdated):

```go
if !upBefore {
	// we were already down, and we still are, no need to propagate.
	log.WithoutContext().Debugf("child %s now DOWN, but we were already DOWN, so no need to propagate.", u.String())
```
Maybe we could add context to the log messages? IMO it's easier to follow when there are other log messages in between.
Yes, I already had the same dilemma with SetStatus (in wrr.go), where the only reason I pass balancerName as an argument is for better logging. Julien and I considered passing a context instead (with the balancerName inside it), but it didn't seem much better.
But yeah, we probably should. Adding a TODO for now.
Ah, and in the case of RemoveServer (and UpsertServer) maybe we can't even do that, because the signatures have to match the ones from oxy. I have to double-check.
Yeah, as I suspected, at the moment we're kind of blocked by the fixed signatures of RemoveServer and UpsertServer, because they come from oxy. It might be doable to work around that, but I didn't want to delay the PR further with it, especially since we hope to rework the part depending on oxy soon.
So for now I've half-way addressed your suggestion: the code within the method already uses a (useless for now) context, and we'll pass the actual context around as an argument later.
Force-pushed from eab0332 to 3e72b00.
Also added logic to break if healthcheck is enabled for some parts of the tree but not everywhere. Still a WIP; needs more docs and more testing.
Also finished mirroring support and improved logging.
What does this PR do?
This change adds support for automatic self-healthcheck for the
WeightedRoundRobin type of service, in order to make the whole load-balancing
tree aware of status changes at the "leaf" level. That is, it lets the
load-balancing algorithm adjust globally when a server's status changes (goes
down or up).
Motivation
So far, healthcheck was supported only at the "leaf" level, i.e. a load-balancer
of servers (at the bottom of the load-balancing tree) was able to run active
health checks on its own servers to adjust its load-balancing algorithm.
However, when e.g. all of its servers went down, this status change was not
propagated upwards, which means requests would still reach this load-balancer
even though it was effectively down and should have been ignored by its
parent(s).
Therefore, this change adds a mechanism so that all status changes can be
propagated upwards, letting all parents (and by extension, the whole tree)
know when a service is effectively down.
The corresponding configuration change is the introduction of the HealthCheck
option to the WeightedRoundRobin element. When the HealthCheck option is present
in a WeightedRoundRobin, automatic propagation of a status change of its
children is enabled. None of the fields of HealthCheck are relevant in this
context, so they are ignored if present.
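The propagation mechanism described above can be sketched roughly as follows. This is an illustrative model, not Traefik's actual types: each balancer counts healthy children and notifies its parent only when it flips between UP and DOWN as a whole, mirroring the "already DOWN, no need to propagate" guard quoted in the review:

```go
package main

import "fmt"

// balancer is a minimal node in a load-balancing tree (illustrative names).
type balancer struct {
	name     string
	upCount  int           // number of children currently UP
	onChange func(up bool) // callback to the parent; nil at the root
}

// setChildStatus is invoked when a direct child goes up or down. It only
// propagates upwards when the balancer's overall status flips.
func (b *balancer) setChildStatus(up bool) {
	upBefore := b.upCount > 0
	if up {
		b.upCount++
	} else {
		b.upCount--
	}
	upNow := b.upCount > 0
	if upNow == upBefore {
		// Overall status unchanged (e.g. already DOWN): nothing to propagate.
		return
	}
	fmt.Printf("%s is now up=%v\n", b.name, upNow)
	if b.onChange != nil {
		b.onChange(upNow)
	}
}

func main() {
	root := &balancer{name: "whoami-public"}
	dc1 := &balancer{name: "whoami-dc1", onChange: root.setChildStatus}
	// Start with two healthy servers in dc1, and dc1 itself healthy under root.
	dc1.upCount = 2
	root.upCount = 1

	dc1.setChildStatus(false) // one server down: dc1 still UP, nothing propagated
	dc1.setChildStatus(false) // last server down: dc1 DOWN, propagated to root
}
```

Only the second event reaches the root, so the parent stops routing to the effectively-down child exactly when its last server disappears.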
Additional Notes
Fixes #7693
Co-authored-by: Dmitry Sharshakov d3dx12.xx@gmail.com
Co-authored-by: Julien Salleyron julien.salleyron@gmail.com
Co-authored-by: Jean-Baptiste Doumenjou 925513+jbdoumenjou@users.noreply.github.com
Co-authored-by: Romain rtribotte@users.noreply.github.com
Co-authored-by: Tom Moulard tom.moulard@traefik.io