Sometimes service health is checked only once for docker services #6096

Closed
Himura2la opened this issue Dec 27, 2019 · 9 comments
Labels: area/healthcheck, kind/bug/possible (a possible bug that needs analysis before it is confirmed or fixed), status/5-frozen-due-to-age

Himura2la commented Dec 27, 2019

Do you want to request a feature or report a bug?

Bug

What did you do?

What did you expect to see?

  1. The new container starts with the same labels, so it registers as a second instance of the same service.
  2. The first healthcheck on it fails, and Traefik does not route requests to it.
  3. The healthcheck repeats at the interval configured by the traefik.http.services.service.loadbalancer.healthcheck.interval label.
  4. Once the healthcheck passes, the deployment script stops the old instance.

What did you see instead?

  1. The new container starts with the same labels, so it registers as a second instance of the same service.
  2. The first healthcheck on the new instance fails, and Traefik "removes the instance from server list" (wording from the logs).
  3. The instance becomes available and replies 200 to a direct request on the healthcheck endpoint.
  4. Traefik never checks it again, and the instance remains DOWN forever.

As a workaround, I changed the deployment script so that it checks the endpoint directly and stops the old instance once the new one is OK. Once the new instance is the only one left in the service, it is healthchecked again and fortunately becomes UP.

This does not happen every time, and I have not managed to determine the conditions needed to reproduce it.

The only clue I noticed is possibly related to this comment on issue #3834 (comment):

  • It works as expected if the first healthcheck runs twice.
  • It gets stuck if the first healthcheck runs only once
    (according to the logs).

Output of traefik version:

Version:      2.1.1
Codename:     cantal
Go version:   go1.13.5
Built:        2019-12-12T19:01:37Z
OS/Arch:      linux/amd64

What is your environment & configuration (arguments, toml, provider, platform, ...)?

entryPoints:
  myservices:
    address: ":80"
providers:
  docker:
    endpoint: "unix:///var/run/docker.sock"
    exposedByDefault: false
    network: webgateway
api:
  dashboard: true
  insecure: true
log:
  level: info
  level: info (see above)

Docker Compose service definition:

  service:
    build:
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.service.rule=Host(`foo.bar.com`)"
        - "traefik.http.routers.service.entryPoints=service-entrypoint"
        - "traefik.http.services.service.loadbalancer.healthcheck.path=/"
        - "traefik.http.services.service.loadbalancer.healthcheck.interval=10s"
        - "traefik.http.services.service.loadbalancer.healthcheck.timeout=9s"
    networks:
      - webgateway

If applicable, please paste the log output in DEBUG level (--log.level=DEBUG switch)

**deploy new instance of service1**
time="2019-12-24T17:08:59Z" level=warning msg="Health check failed, removing from server list. Backend: \"service1@docker\" URL: \"http://172.18.0.6:80\" Weight: 1 Reason: HTTP request failed: Get http://172.18.0.6:80/: dial tcp 172.18.0.6:80: connect: connection refused"
time="2019-12-24T17:08:59Z" level=warning msg="Health check failed, removing from server list. Backend: \"service1@docker\" URL: \"http://172.18.0.6:80\" Weight: 1 Reason: HTTP request failed: Get http://172.18.0.6:80/: dial tcp 172.18.0.6:80: connect: connection refused"
time="2019-12-24T17:08:59Z" level=error msg="server not found"
time="2019-12-24T17:09:10Z" level=warning msg="Health check up: Returning to server list. Backend: \"service1@docker\" URL: \"http://172.18.0.6:80\" Weight: 1"
**remove old instance of service1, everything is fine**

**deploy new instance of service2**
time="2019-12-24T17:12:29Z" level=warning msg="Health check failed, removing from server list. Backend: \"service2@docker\" URL: \"http://172.18.0.3:80\" Weight: 1 Reason: HTTP request failed: Get http://172.18.0.3:80/: dial tcp 172.18.0.3:80: connect: connection refused"
**172.18.0.3 stuck in unhealthy status forever**
@juliens juliens added area/healthcheck kind/bug/possible a possible bug that needs analysis before it is confirmed or fixed. and removed status/0-needs-triage labels Dec 27, 2019
Himura2la (Author) commented:

I grabbed the deployment script logs from our CI:

Deploying 'foo_BLUE' in place of 'foo_GREEN'...
Using default tag: latest
latest: Pulling from foo
804555ee0376: Already exists
970251047358: Already exists
f3d4c41a4fd1: Already exists
32afd03f1854: Already exists
e014e07c0b51: Pulling fs layer
ccc5b75dfcb4: Pulling fs layer
e014e07c0b51: Verifying Checksum
e014e07c0b51: Download complete
ccc5b75dfcb4: Download complete
e014e07c0b51: Pull complete
ccc5b75dfcb4: Pull complete
Digest: sha256:f124b9d3gcf14be126f68274dcc5c8e36d9e41768997251eee8a8cfbe671f730
Status: Downloaded newer image for our.internal.registry/foo:latest
our.internal.registry/foo:latest
8a3f2f662b2a0f1731b75ec4453b63a5ce987256c092a7d3bf0f42544c429f42
Container started. Starting health check...
Using Traefik health check:
    Traefik API response: {"loadBalancer":{"servers":[{"url":"http://172.18.0.4:80"},{"url":"http://172.18.0.3:80"}],"healthCheck":{"path":"/bar","interval":"10s","timeout":"9s"},"passHostHeader":true},"status":"enabled","usedBy":["foo@docker"],"serverStatus":{"http://172.18.0.3:80":"UP","http://172.18.0.4:80":"DOWN"},"name":"foo@docker","provider":"docker","type":"loadbalancer"}
New 'foo_BLUE' service (172.18.0.4) seems unhealthy. Waiting (1)...
Using Traefik health check:
    Traefik API response: {"loadBalancer":{"servers":[{"url":"http://172.18.0.4:80"},{"url":"http://172.18.0.3:80"}],"healthCheck":{"path":"/bar","interval":"10s","timeout":"9s"},"passHostHeader":true},"status":"enabled","usedBy":["foo@docker"],"serverStatus":{"http://172.18.0.3:80":"UP","http://172.18.0.4:80":"DOWN"},"name":"foo@docker","provider":"docker","type":"loadbalancer"}
New 'foo_BLUE' service (172.18.0.4) seems unhealthy. Waiting (2)...
[... the same Traefik API response and "seems unhealthy" message repeat identically for Waiting (3) through Waiting (13) ...]
Using Traefik health check:
    Traefik API response: {"loadBalancer":{"servers":[{"url":"http://172.18.0.4:80"},{"url":"http://172.18.0.3:80"}],"healthCheck":{"path":"/bar","interval":"10s","timeout":"9s"},"passHostHeader":true},"status":"enabled","usedBy":["foo@docker"],"serverStatus":{"http://172.18.0.3:80":"UP","http://172.18.0.4:80":"DOWN"},"name":"foo@docker","provider":"docker","type":"loadbalancer"}
New 'foo_BLUE' service (172.18.0.4) seems unhealthy. Waiting (14)...
Using manual health check, because Traefik seems hung (HACK):
    Requesting http://172.18.0.4/bar in the docker network:

<div>
    page content
</div>

New 'foo_BLUE' service seems operational. Stopping 'foo_GREEN' project...
foo_GREEN
Deployment successful!

The delay between checks is 10 seconds.
If it fails 14 times, the last 5 attempts are checked manually, like this: docker run --rm --network webgateway busybox wget -qO- "$health_check_address"
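
For reference, here is a minimal Go sketch of the polling loop this script implements. It is illustrative only: the Traefik API address, service name, and backend URL are hypothetical placeholders, while the /api/http/services/{name} endpoint and its serverStatus field match the API responses quoted above.

// Polls the Traefik API until the new backend reports UP, then signals that
// the old instance can be stopped. Hypothetical addresses throughout.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// serviceInfo models only the field we need from /api/http/services/{name}.
type serviceInfo struct {
	ServerStatus map[string]string `json:"serverStatus"`
}

func waitUntilUp(apiBase, service, serverURL string, attempts int, delay time.Duration) bool {
	for i := 1; i <= attempts; i++ {
		resp, err := http.Get(apiBase + "/api/http/services/" + service)
		if err == nil {
			var info serviceInfo
			up := json.NewDecoder(resp.Body).Decode(&info) == nil &&
				info.ServerStatus[serverURL] == "UP"
			resp.Body.Close()
			if up {
				return true
			}
		}
		fmt.Printf("New service seems unhealthy. Waiting (%d)...\n", i)
		time.Sleep(delay)
	}
	return false
}

func main() {
	// Matches the 14 attempts with a 10-second delay described above.
	if waitUntilUp("http://localhost:8080", "foo@docker", "http://172.18.0.4:80", 14, 10*time.Second) {
		fmt.Println("New instance is UP; stopping the old one.")
	} else {
		fmt.Println("Traefik never re-checked the server; falling back to a direct endpoint check.")
	}
}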


zvymazal commented Feb 4, 2020

I'm encountering exactly the same behavior with the latest traefik release. It's clearly a healthcheck issue: the service is correctly registered with traefik and responds to HTTP requests from the traefik container, but no healthchecks are being sent. Exactly as described above: once the old container is shut down, the healthchecks resume and the container is correctly reported as healthy again.

Version:      2.1.3
Codename:     cantal
Go version:   go1.13.6
Built:        2020-01-21T17:30:29Z
OS/Arch:      linux/amd64

@ldez ldez added this to issues in v2 via automation Feb 4, 2020

zvymazal commented Feb 13, 2020

I have tried to add some extra debug logging and to reproduce the behavior in order to pinpoint where the problem might be. Here's what I observed:

  1. If an event triggers https://github.com/containous/traefik/blob/0c90f6afa24ef390fec43ca654f806915e821daa/pkg/server/service/service.go#L201, the BackendConfig configurations get re-created and the healthchecks are eventually updated at https://github.com/containous/traefik/blob/0c90f6afa24ef390fec43ca654f806915e821daa/pkg/healthcheck/healthcheck.go#L115.
  2. The problem seems to be that disabledURLs is a property of BackendConfig, so its value can get lost when the configuration is updated. This happens when two events trigger this behavior at nearly the same time, which is exactly what happens in https://github.com/containous/traefik/blob/0c90f6afa24ef390fec43ca654f806915e821daa/pkg/server/routerfactory.go#L74.
  3. Whether the server's URL is preserved in the healthcheck configuration then comes down to timing.

To me it would make more sense to store the disabled URLs on the load balancer, so that they cannot get lost when the healthcheck configuration changes or when multiple events arrive in short succession. I'm not a Go developer and have no insight into the rest of the code, though.
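
To make the suspected mechanism concrete, below is a deliberately simplified, hypothetical Go sketch. The names (BackendConfig, disabledURLs, SetBackendsConfiguration) follow the source lines referenced above, but this is not Traefik's actual implementation, just an illustration of how replacing the configuration wholesale can drop a disabled URL.

// Sketch of the state loss: disabledURLs lives on BackendConfig, and a new
// BackendConfig created by a second, near-simultaneous event arrives with an
// empty list, erasing the record of the server that failed its first check.
package main

import (
	"fmt"
	"sync"
)

type BackendConfig struct {
	name         string
	disabledURLs []string // per-config state; lost when the config is replaced
}

type HealthCheck struct {
	mu       sync.Mutex
	backends map[string]*BackendConfig
}

// SetBackendsConfiguration replaces the stored configs wholesale, which is
// roughly what happens on every provider event.
func (hc *HealthCheck) SetBackendsConfiguration(configs map[string]*BackendConfig) {
	hc.mu.Lock()
	defer hc.mu.Unlock()
	hc.backends = configs // any previously disabled URLs are dropped here
}

func main() {
	hc := &HealthCheck{backends: map[string]*BackendConfig{}}

	// Event 1: initial config; the first health check fails and disables the URL.
	first := &BackendConfig{name: "web@docker"}
	hc.SetBackendsConfiguration(map[string]*BackendConfig{"web@docker": first})
	first.disabledURLs = append(first.disabledURLs, "http://172.19.0.3:80")

	// Event 2, arriving almost immediately (e.g. the TLS router build):
	// a fresh BackendConfig with an empty disabledURLs replaces the old one.
	second := &BackendConfig{name: "web@docker"}
	hc.SetBackendsConfiguration(map[string]*BackendConfig{"web@docker": second})

	// The failed server is now in neither the enabled nor the disabled list,
	// so it is never re-checked; compare the "Enabled URLs: []" and
	// "Disabled URLs: []" lines in the debug log below.
	fmt.Println(hc.backends["web@docker"].disabledURLs) // prints []
}

Keeping the disabled URLs keyed by backend name on the health checker (or on the load balancer, as suggested above) would let them survive a configuration replacement.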

Please find the extra debug logs below, prepended with >>> and referencing the particular lines in the source code (HEAD 0c90f6afa24ef390fec43ca654f806915e821daa).

Docker compose file for traefik:

version: "3.7"

services:
  traefik:
    image: "containous/traefik:latest"
    container_name: "traefik"
    command:
      - "--accesslog=true"
      - "--accesslog.bufferingsize=100"
      - "--accesslog.filepath=/var/log/traefik/access.log"
      - "--accesslog.format=json"
      - "--api=false"
      - "--api.dashboard=false"
      - "--entrypoints.web.address=:80"
      - "--entryPoints.web.forwardedHeaders.insecure"
      - "--log.format=common"
      - "--log.level=DEBUG"
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
    networks:
      - "proxy"
    ports:
      - "80:80"
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock:ro"

networks:
  proxy:
    driver: bridge
    name: proxy

Docker compose file for application:

version: "3.7"

services:
  web:
    image: "xxx/xxx"
    networks:
      - "proxy"
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.web.rule=hostregexp(`{host:.+}`)"
      - "traefik.http.routers.web.entrypoints=web"
      - "traefik.http.routers.web.middlewares=retry@docker,header_https@docker"
      - "traefik.http.services.web.loadbalancer.healthcheck.headers.X-Forwarded-Proto=https"
      - "traefik.http.services.web.loadbalancer.healthcheck.interval=5s"
      - "traefik.http.services.web.loadbalancer.healthcheck.path=/_ping"
      - "traefik.http.services.web.loadbalancer.healthcheck.port=80"
      - "traefik.http.services.web.loadbalancer.healthcheck.timeout=4s"
      - "traefik.http.middlewares.retry.retry.attempts=3"
      - "traefik.http.middlewares.header_https.headers.customrequestheaders.X-Forwarded-Proto=https"

networks:
  proxy:
    external: true
    name: proxy

Debug log:

traefik    | time="2020-02-13T15:08:05Z" level=debug msg="Configuration received from provider docker: {\"http\":{\"routers\":{\"web\":{\"entryPoints\":[\"web\"],\"middlewares\":[\"retry@docker\",\"header_https@docker\"],\"service\":\"web\",\"rule\":\"hostregexp(`{host:.+}`)\"}},\"middlewares\":{\"header_https\":{\"headers\":{\"customRequestHeaders\":{\"X-Forwarded-Proto\":\"https\"}}},\"retry\":{\"retry\":{\"attempts\":3}}},\"services\":{\"web\":{\"loadBalancer\":{\"servers\":[{\"url\":\"http://172.19.0.3:80\"}],\"healthCheck\":{\"path\":\"/_ping\",\"port\":80,\"interval\":\"5s\",\"timeout\":\"4s\",\"headers\":{\"X-Forwarded-Proto\":\"https\"}},\"passHostHeader\":true}}}},\"tcp\":{},\"udp\":{}}" providerName=docker
traefik    | time="2020-02-13T15:08:05Z" level=debug msg=">>>: routerfactory.go:L61: Entering CreateRouters"
traefik    | time="2020-02-13T15:08:05Z" level=debug msg=">>>: router.go:L73: Entering BuildHandlers - tls: false"
traefik    | time="2020-02-13T15:08:05Z" level=debug msg="Creating Middleware (ResponseModifier)" middlewareName=header_https@docker middlewareType=Headers entryPointName=web routerName=web@docker
traefik    | time="2020-02-13T15:08:05Z" level=debug msg="Creating middleware" routerName=web@docker serviceName=web middlewareName=pipelining middlewareType=Pipelining entryPointName=web
traefik    | time="2020-02-13T15:08:05Z" level=debug msg="Creating load-balancer" entryPointName=web routerName=web@docker serviceName=web
traefik    | time="2020-02-13T15:08:05Z" level=debug msg="Creating server 0 http://172.19.0.3:80" serviceName=web serverName=0 entryPointName=web routerName=web@docker
traefik    | time="2020-02-13T15:08:05Z" level=debug msg="Added outgoing tracing middleware web" middlewareType=TracingForwarder middlewareName=tracing entryPointName=web routerName=web@docker
traefik    | time="2020-02-13T15:08:05Z" level=debug msg="Creating middleware" middlewareType=Headers entryPointName=web routerName=web@docker middlewareName=header_https@docker
traefik    | time="2020-02-13T15:08:05Z" level=debug msg="Setting up customHeaders/Cors from %v{map[X-Forwarded-Proto:https] map[] false [] []  [] 0 false [] [] false false  map[] false 0 false false false false  false false      false}" middlewareName=header_https@docker middlewareType=Headers entryPointName=web routerName=web@docker
traefik    | time="2020-02-13T15:08:05Z" level=debug msg="Adding tracing to middleware" entryPointName=web routerName=web@docker middlewareName=header_https@docker
traefik    | time="2020-02-13T15:08:05Z" level=debug msg="Creating middleware" middlewareName=retry@docker middlewareType=Retry entryPointName=web routerName=web@docker
traefik    | time="2020-02-13T15:08:05Z" level=debug msg="Adding tracing to middleware" entryPointName=web routerName=web@docker middlewareName=retry@docker
traefik    | time="2020-02-13T15:08:05Z" level=debug msg="Creating middleware" entryPointName=web middlewareName=traefik-internal-recovery middlewareType=Recovery
traefik    | time="2020-02-13T15:08:05Z" level=debug msg=">>>: service.go:L202 : Entering LaunchHealthCheck"
traefik    | time="2020-02-13T15:08:05Z" level=debug msg="Setting up healthcheck for service web@docker with [Hostname:  Headers: map[X-Forwarded-Proto:https] Scheme:  Path: /_ping Port: 80 Interval: 5s Timeout: 4s]" serviceName=web@docker
traefik    | time="2020-02-13T15:08:05Z" level=warning msg=">>>: healthcheck.go:L115 : new backend: [name: web@docker Options: [Hostname:  Headers: map[X-Forwarded-Proto:https] Scheme:  Path: /_ping Port: 80 Interval: 5s Timeout: 4s] disabledURLs: []]"
traefik    | time="2020-02-13T15:08:05Z" level=debug msg="Initial health check for backend: \"web@docker\""
traefik    | time="2020-02-13T15:08:05Z" level=debug msg=">>>: healthcheck.go:L152 : Enabled URLs: [http://172.19.0.3:80]"
traefik    | time="2020-02-13T15:08:05Z" level=debug msg=">>>: healthcheck.go:L152 : Disabled URLs: []"
traefik    | time="2020-02-13T15:08:05Z" level=warning msg="Health check failed, removing from server list. Backend: \"web@docker\" URL: \"http://172.19.0.3:80\" Weight: 1 Reason: HTTP request failed: Get http://172.19.0.3:80/_ping: dial tcp 172.19.0.3:80: connect: connection refused"
traefik    | time="2020-02-13T15:08:05Z" level=debug msg=">>>: healthcheck.go:L190 : backend status: [name: web@docker Options: [Hostname:  Headers: map[X-Forwarded-Proto:https] Scheme:  Path: /_ping Port: 80 Interval: 5s Timeout: 4s] disabledURLs: [[url: http://172.19.0.3:80 weight: 1]]]"
traefik    | time="2020-02-13T15:08:05Z" level=debug msg=">>>: router.go:L73: Entering BuildHandlers - tls: true"
traefik    | time="2020-02-13T15:08:05Z" level=debug msg=">>>: service.go:L202 : Entering LaunchHealthCheck"
traefik    | time="2020-02-13T15:08:05Z" level=debug msg="Setting up healthcheck for service web@docker with [Hostname:  Headers: map[X-Forwarded-Proto:https] Scheme:  Path: /_ping Port: 80 Interval: 5s Timeout: 4s]" serviceName=web@docker
traefik    | time="2020-02-13T15:08:05Z" level=warning msg=">>>: healthcheck.go:L115 : old backend: [name: web@docker Options: [Hostname:  Headers: map[X-Forwarded-Proto:https] Scheme:  Path: /_ping Port: 80 Interval: 5s Timeout: 4s] disabledURLs: [[url: http://172.19.0.3:80 weight: 1]]]"
traefik    | time="2020-02-13T15:08:05Z" level=warning msg=">>>: healthcheck.go:L115 : new backend: [name: web@docker Options: [Hostname:  Headers: map[X-Forwarded-Proto:https] Scheme:  Path: /_ping Port: 80 Interval: 5s Timeout: 4s] disabledURLs: []]"
traefik    | time="2020-02-13T15:08:05Z" level=debug msg="Stopping current health check goroutines of backend: web@docker"
traefik    | time="2020-02-13T15:08:05Z" level=debug msg="No default certificate, generating one"
traefik    | time="2020-02-13T15:08:05Z" level=debug msg="Initial health check for backend: \"web@docker\""
traefik    | time="2020-02-13T15:08:05Z" level=debug msg=">>>: healthcheck.go:L152 : Enabled URLs: []"
traefik    | time="2020-02-13T15:08:05Z" level=debug msg=">>>: healthcheck.go:L152 : Disabled URLs: []"
traefik    | time="2020-02-13T15:08:10Z" level=debug msg="Refreshing health check for backend: web@docker"
traefik    | time="2020-02-13T15:08:10Z" level=debug msg=">>>: healthcheck.go:L152 : Enabled URLs: []"
traefik    | time="2020-02-13T15:08:10Z" level=debug msg=">>>: healthcheck.go:L152 : Disabled URLs: []"
traefik    | time="2020-02-13T15:08:15Z" level=debug msg="Refreshing health check for backend: web@docker"
traefik    | time="2020-02-13T15:08:15Z" level=debug msg=">>>: healthcheck.go:L152 : Enabled URLs: []"
traefik    | time="2020-02-13T15:08:15Z" level=debug msg=">>>: healthcheck.go:L152 : Disabled URLs: []"
traefik    | time="2020-02-13T15:08:20Z" level=debug msg="Refreshing health check for backend: web@docker"
traefik    | time="2020-02-13T15:08:20Z" level=debug msg=">>>: healthcheck.go:L152 : Enabled URLs: []"
traefik    | time="2020-02-13T15:08:20Z" level=debug msg=">>>: healthcheck.go:L152 : Disabled URLs: []"

Please feel free to reach out if there's some more info I might be able to provide.

zvymazal added a commit to zvymazal/traefik that referenced this issue Feb 14, 2020
clownba0t commented:

I seem to be seeing the same, or at least a very similar, issue when containers are restarted after docker comes back up following a host reboot; in this case, a vanilla Traefik v2.1.4 container and a single application container that registers itself with Traefik. The symptoms appear identical to those posted by others: two initial health checks start very close to one another, one fails, and then all subsequent health checks (including the second initial check) run but do nothing, so the server continues to appear down to Traefik despite actually being up.


clownba0t commented Feb 19, 2020

I also seem to be seeing this happen on another host I manage that has two applications behind Traefik. In this case, application B's health check starts failing when application A is deployed (application B is not deployed, so its container remains unchanged throughout).

The root cause would appear to be the same, although in this case it's very interesting that the health check for an existing and unchanged container is affected. It seems that the docker provider restarts health checks for all containers in response to any single container event?

zvymazal commented:

> It seems that the docker provider restarts health checks for all containers in response to any single container event?

Yes, this seems to be the case.

traefiker (Contributor) commented:

Closed by #6372.

v2 automation moved this from issues to Done Feb 25, 2020
Himura2la (Author) commented:

Should #3834 also be closed?

juliens (Member) commented Feb 26, 2020

@Himura2la no, because the change is only on the 2.1 codebase.

@traefik traefik locked and limited conversation to collaborators Apr 8, 2020