Sometimes service health is checked only once for docker services #6096
Comments
I grabbed the deployment script logs from our CI:
The delay between checks is 10 seconds.
I'm encountering exactly the same behavior with the latest Traefik release. It's clearly a healthcheck issue: the service is correctly registered with Traefik and it responds to HTTP requests from the Traefik container, but no healthchecks are being sent. Exactly as described above, once the old container is shut down the healthchecks resume and the container is correctly reported as healthy again.
I added some extra debug logging and reproduced the behavior to pinpoint where the problem might be. Here's what I observed: If an event triggers https://github.com/containous/traefik/blob/0c90f6afa24ef390fec43ca654f806915e821daa/pkg/server/service/service.go#L201 then
To me it would make more sense to store the disabled URLs on the LB so that they cannot get lost when the healthcheck configuration is changed or when there are multiple events in short succession. I'm not a Go developer and have no insight into the rest of the code, though. Please find the extra debug logs prepended with
Docker compose file for traefik:
Docker compose file for application:
Debug log:
Please feel free to reach out if there's some more info I might be able to provide.
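The idea of keeping the disabled URLs on the LB can be illustrated with a small sketch. This is not Traefik's code (which is written in Go); the class and method names below are hypothetical and chosen only for illustration. The assumption is that the health checker may be thrown away and rebuilt on every provider event, so the up/down state lives on the balancer instead.

```python
class LoadBalancer:
    """Toy balancer (hypothetical) that owns the server list and the disabled set."""

    def __init__(self, urls):
        self.urls = list(urls)
        self.disabled = set()

    def active(self):
        # Servers that may currently receive traffic, in configured order.
        return [u for u in self.urls if u not in self.disabled]

    def set_status(self, url, up):
        if up:
            self.disabled.discard(url)
        else:
            self.disabled.add(url)


class HealthChecker:
    """Stateless checker (hypothetical): safe to recreate on every event."""

    def __init__(self, balancer, probe):
        self.balancer = balancer
        self.probe = probe  # callable: url -> bool

    def run_once(self):
        # Probe every configured server, including disabled ones,
        # so that a recovered server is re-enabled.
        for url in self.balancer.urls:
            self.balancer.set_status(url, self.probe(url))
```

Because the checker holds no state of its own, replacing it in response to an unrelated container event cannot lose track of which servers were marked down.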
Workaround for traefik#6096
I seem to be seeing the same, or at least a very similar, issue when containers are restarted when Docker comes back up after a host reboot - in this case, a vanilla Traefik v2.1.4 container and a single application container that registers itself with Traefik. The symptoms appear to be identical to those posted by others - two initial health checks are started very close to one another, one fails, then all subsequent health checks (including the second initial health check) run but do nothing and the server continues to appear down to Traefik despite actually being up.
I also seem to be seeing this happen on another host I manage that has two applications behind Traefik. In this case, application B's health check starts failing when application A is deployed (application B is not deployed, so its container remains unchanged throughout). The root cause would appear to be the same, although in this case it's very interesting that the health check for an existing and unchanged container is affected. It seems that the docker provider restarts health checks for all containers in response to any single container event?
Yes, this seems to be the case.
Closed by #6372.
Should #3834 also be closed?
@Himura2la no, because the change is only on the 2.1 codebase.
Do you want to request a feature or report a bug?
Bug
What did you do?
What did you expect to see?
traefik.http.services.service.loadbalancer.healthcheck.interval label.
What did you see instead?
As a workaround, I changed the deployment script so that it checks the endpoint directly and stops the old instance once the new one is OK. When the new instance is the only one left in the service, it is healthchecked and fortunately becomes UP.
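A minimal sketch of such a deployment-side check, assuming the new instance exposes an HTTP health endpoint that answers with a 2xx status when ready. The function names, the default of 30 attempts, and the URL in the usage note are assumptions for illustration and are not taken from the actual deployment script; the 10 second delay mirrors the check interval mentioned in this thread.

```python
import time
import urllib.error
import urllib.request


def is_healthy(url, timeout=2.0):
    """Return True if a GET to url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a 4xx/5xx response.
        return False


def wait_until_healthy(url, attempts=30, delay=10.0,
                       probe=is_healthy, sleep=time.sleep):
    """Poll url until probe succeeds.

    Returns True on the first success, False if all attempts fail.
    """
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)  # wait before the next check
    return False
```

A deployment script could call wait_until_healthy("http://new-instance:8080/health") (a made-up address) and stop the old instance only when it returns True.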
This does not happen every time, and I did not manage to determine the conditions that reproduce it.
The only clue I noticed is possibly related to this #3834 (comment) issue:
(according to logs)
Output of traefik version:
What is your environment & configuration (arguments, toml, provider, platform, ...)?
If applicable, please paste the log output in DEBUG level (--log.level=DEBUG switch)