Traefik shuts down when SystemD watchdog run #1353

Closed
jskarpe opened this issue Mar 28, 2017 · 13 comments

@jskarpe

jskarpe commented Mar 28, 2017

What version of Traefik are you using (traefik version)?

1.2.1

What did you do?

Install service file from: https://github.com/containous/traefik/blob/master/contrib/systemd/traefik.service

What did you expect to see?

Running service

What did you see instead?

The service crashes every 2s and systemd starts it again. Changing the watchdog timing changes the time between crashes accordingly.

@timoreimann
Contributor

@yuav can you paste the log output, please?

@q7r

q7r commented Apr 19, 2017

@timoreimann I have the very same problem with v1.2.3. Which log do you need?
Here is the systemctl status output:

[root@traefik system]# systemctl status traefik.service
● traefik.service - Traefik
   Loaded: loaded (/etc/systemd/system/traefik.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2017-04-19 23:40:28 EEST; 418ms ago
 Main PID: 9890 (traefik)
   CGroup: /system.slice/traefik.service
           └─9890 /usr/bin/traefik --configFile=/etc/traefik.toml

Apr 19 23:40:28 traefik systemd[1]: traefik.service holdoff time over, scheduling restart.
Apr 19 23:40:28 traefik systemd[1]: Starting Traefik...
Apr 19 23:40:28 traefik systemd[1]: Started Traefik.
Apr 19 23:40:28 traefik traefik[9890]: 2017/04/19 23:40:28 structs.go:21: Connected to 10.10.10.10:2181
Apr 19 23:40:28 traefik traefik[9890]: 2017/04/19 23:40:28 structs.go:21: Authenticated: id=97820254626581384, timeout=40000
[root@traefik system]# systemctl status traefik.service
● traefik.service - Traefik
   Loaded: loaded (/etc/systemd/system/traefik.service; enabled; vendor preset: disabled)
   Active: deactivating (stop-sigabrt) (Result: watchdog) since Wed 2017-04-19 23:40:29 EEST; 3ms ago
 Main PID: 9890 (traefik)
   CGroup: /system.slice/traefik.service
           └─9890 /usr/bin/traefik --configFile=/etc/traefik.toml

Apr 19 23:40:28 traefik systemd[1]: traefik.service holdoff time over, scheduling restart.
Apr 19 23:40:28 traefik systemd[1]: Starting Traefik...
Apr 19 23:40:28 traefik systemd[1]: Started Traefik.
Apr 19 23:40:28 traefik traefik[9890]: 2017/04/19 23:40:28 structs.go:21: Connected to 10.10.10.10:2181
Apr 19 23:40:28 traefik traefik[9890]: 2017/04/19 23:40:28 structs.go:21: Authenticated: id=97820254626581384, timeout=40000
Apr 19 23:40:29 traefik systemd[1]: traefik.service watchdog timeout (limit 1s)!
[root@traefik system]# systemctl status traefik.service
● traefik.service - Traefik
   Loaded: loaded (/etc/systemd/system/traefik.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Wed 2017-04-19 23:40:31 EEST; 5ms ago
  Process: 9899 ExecStart=/usr/bin/traefik --configFile=/etc/traefik.toml (code=exited, status=2)
 Main PID: 9899 (code=exited, status=2)

Apr 19 23:40:31 traefik traefik[9899]: rip    0x477223
Apr 19 23:40:31 traefik traefik[9899]: rflags 0x246
Apr 19 23:40:31 traefik traefik[9899]: cs     0x33
Apr 19 23:40:31 traefik traefik[9899]: fs     0x0
Apr 19 23:40:31 traefik traefik[9899]: gs     0x0
Apr 19 23:40:31 traefik systemd[1]: traefik.service holdoff time over, scheduling restart.
Apr 19 23:40:31 traefik systemd[1]: start request repeated too quickly for traefik.service
Apr 19 23:40:31 traefik systemd[1]: Failed to start Traefik.
Apr 19 23:40:31 traefik systemd[1]: Unit traefik.service entered failed state.
Apr 19 23:40:31 traefik systemd[1]: traefik.service failed.

It works without the WatchdogSec=1s line in the systemd unit file.

@emilevauge
Member

@guilhem any idea what's going on here? 🤔

@philiplb

It seems to work for me if I set WatchdogSec to 10s.

@Yggdrasil
Contributor

Same issue for me with 1.2.3. @philiplb's suggestion doesn't work for me. Only disabling the Watchdog timer completely (i.e. removing the line WatchdogSec=1s) keeps Traefik running.

@guilhem
Contributor

guilhem commented Apr 30, 2017

I will look at it this weekend.

ldez closed this as completed in #1525 on May 3, 2017
m3co-code pushed a commit to m3co-code/traefik that referenced this issue Aug 22, 2017
Commit coreos/go-systemd@0c088e introduced environment cleaning.
The first call to sdnotify (for Type=notify) cleared the NOTIFY_SOCKET environment variable,
so the later sdnotify calls made by the watchdog were unable to ping back.

Fix traefik#1353
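To make the failure mode described in this commit message concrete, here is a minimal, stdlib-only Go sketch of the sd_notify datagram protocol (a state string such as READY=1 or WATCHDOG=1 sent to the Unix datagram socket named by NOTIFY_SOCKET). It is not Traefik's or go-systemd's code: the helpers notify, listen and simulate are hypothetical, notify only mimics the "unset the environment" flag described above, and listen binds a socket in a temporary directory to stand in for systemd's.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"path/filepath"
)

// notify (hypothetical) sends one sd_notify-style datagram such as "READY=1"
// or "WATCHDOG=1" to the Unix socket named by $NOTIFY_SOCKET. When unsetEnv is
// true, NOTIFY_SOCKET is removed from the environment after this call, which
// is the behaviour the commit message blames. With no socket configured it
// reports (false, nil), so a later ping fails silently.
func notify(unsetEnv bool, state string) (bool, error) {
	addr := os.Getenv("NOTIFY_SOCKET")
	if unsetEnv {
		defer os.Unsetenv("NOTIFY_SOCKET")
	}
	if addr == "" {
		return false, nil
	}
	raddr := &net.UnixAddr{Name: addr, Net: "unixgram"}
	conn, err := net.DialUnix("unixgram", nil, raddr)
	if err != nil {
		return false, err
	}
	defer conn.Close()
	if _, err := conn.Write([]byte(state)); err != nil {
		return false, err
	}
	return true, nil
}

// listen (hypothetical) binds a datagram socket in a temporary directory as a
// stand-in for systemd's notification socket and points NOTIFY_SOCKET at it.
func listen() (cleanup func(), err error) {
	dir, err := os.MkdirTemp("", "sdnotify")
	if err != nil {
		return nil, err
	}
	path := filepath.Join(dir, "notify.sock")
	conn, err := net.ListenUnixgram("unixgram", &net.UnixAddr{Name: path, Net: "unixgram"})
	if err != nil {
		os.RemoveAll(dir)
		return nil, err
	}
	os.Setenv("NOTIFY_SOCKET", path)
	return func() {
		conn.Close()
		os.RemoveAll(dir)
		os.Unsetenv("NOTIFY_SOCKET")
	}, nil
}

// simulate sends READY=1 (as a Type=notify service does once at startup)
// followed by one WATCHDOG=1 ping, clearing the environment after the first
// call only when unsetOnReady is true.
func simulate(unsetOnReady bool) (readySent, pingSent bool, err error) {
	cleanup, err := listen()
	if err != nil {
		return false, false, err
	}
	defer cleanup()
	if readySent, err = notify(unsetOnReady, "READY=1"); err != nil {
		return readySent, false, err
	}
	pingSent, err = notify(false, "WATCHDOG=1")
	return readySent, pingSent, err
}

func main() {
	for _, unset := range []bool{true, false} {
		ready, ping, err := simulate(unset)
		fmt.Printf("unset after READY=%v: READY delivered=%v, WATCHDOG delivered=%v, err=%v\n",
			unset, ready, ping, err)
	}
}
```

When the variable is cleared after READY=1, the watchdog ping is dropped without any error, so systemd simply sees no pings and kills the service after WatchdogSec, matching the "watchdog timeout" lines in the logs; keeping NOTIFY_SOCKET set lets both messages through.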
@nsteinmetz

nsteinmetz commented Feb 13, 2018

Hi there,

I just had the issue with 1.5.2, whereas I didn't have it before. What is new in my context is that I use Ubuntu 16.04 LTS at my client, whereas I used to use Debian on servers or HypriotOS on ARM boards.

I had to remove the WatchdogSec directive from the systemd service file to make it work again.

cat /etc/systemd/system/traefik.service
[Unit]
Description=Traefik

[Service]
Type=notify
ExecStart=/usr/bin/traefik --configFile=/etc/traefik/conf/traefik.toml
Restart=always

[Install]
WantedBy=multi-user.target

@guilhem
Contributor

guilhem commented Feb 13, 2018

@nsteinmetz do you have a log (journald) to provide?

@nsteinmetz

Oh yes, I posted them on Slack:

Feb 13 10:44:34  systemd[1]: traefik.service: Service hold-off time over, scheduling restart.
Feb 13 10:44:34  systemd[1]: Stopped Traefik.
Feb 13 10:44:34  systemd[1]: Starting Traefik...
Feb 13 10:44:35  systemd[1]: Started Traefik.
Feb 13 10:44:38  traefik[10341]: goroutine 71 [running]:
Feb 13 10:44:38  traefik[10341]: runtime/debug.Stack(0x74f40f, 0xc42000e370, 0x2128009)
Feb 13 10:44:38  traefik[10341]:         /usr/local/go/src/runtime/debug/stack.go:24 +0xa7
Feb 13 10:44:38  traefik[10341]: runtime/debug.PrintStack()
Feb 13 10:44:38  traefik[10341]:         /usr/local/go/src/runtime/debug/stack.go:16 +0x22
Feb 13 10:44:38  traefik[10341]: github.com/containous/traefik/safe.defaultRecoverGoroutine(0x1dd7b00, 0x32e5af0)
Feb 13 10:44:38  traefik[10341]:         /go/src/github.com/containous/traefik/safe/routine.go:148 +0x7d
Feb 13 10:44:38  traefik[10341]: github.com/containous/traefik/safe.GoWithRecover.func1.1(0x21aa728)
Feb 13 10:44:38  traefik[10341]:         /go/src/github.com/containous/traefik/safe/routine.go:139 +0x57
Feb 13 10:44:38  traefik[10341]: panic(0x1dd7b00, 0x32e5af0)
Feb 13 10:44:38  traefik[10341]:         /usr/local/go/src/runtime/panic.go:491 +0x283
Feb 13 10:44:38  traefik[10341]: main.healthCheck(0xc4205316a0, 0x0, 0x100, 0x0, 0x0, 0xc4202ee5c0, 0x0, 0x0, 0xc4202ee600, 0xc42035ad28, ...)
Feb 13 10:44:38  traefik[10341]:         /go/src/github.com/containous/traefik/cmd/traefik/healthcheck.go:53 +0x37
Feb 13 10:44:38  traefik[10341]: main.run.func1()
Feb 13 10:44:38  traefik[10341]:         /go/src/github.com/containous/traefik/cmd/traefik/traefik.go:180 +0xfd
Feb 13 10:44:38  traefik[10341]: github.com/containous/traefik/safe.GoWithRecover.func1(0x21aa728, 0xc4205a3360)
Feb 13 10:44:38  traefik[10341]:         /go/src/github.com/containous/traefik/safe/routine.go:142 +0x4d
Feb 13 10:44:38  traefik[10341]: created by github.com/containous/traefik/safe.GoWithRecover
Feb 13 10:44:38  traefik[10341]:         /go/src/github.com/containous/traefik/safe/routine.go:136 +0x49
...skipping...
Feb 13 10:44:34  traefik[10325]:         /usr/local/go/src/bufio/bufio.go:129 +0x3a
Feb 13 10:44:34  traefik[10325]: net/http.(*conn).serve(0xc420260d20, 0x30b9f40, 0xc42065e300)
Feb 13 10:44:34  traefik[10325]:         /usr/local/go/src/net/http/server.go:1826 +0x88f
Feb 13 10:44:34  traefik[10325]: created by net/http.(*Server).Serve
Feb 13 10:44:34  traefik[10325]:         /usr/local/go/src/net/http/server.go:2720 +0x288
Feb 13 10:44:34  traefik[10325]: rax    0xca
Feb 13 10:44:34  traefik[10325]: rbx    0x32ff8c0
Feb 13 10:44:34  traefik[10325]: rcx    0x45eb23
Feb 13 10:44:34  traefik[10325]: rdx    0x0
Feb 13 10:44:34  traefik[10325]: rdi    0x32ff9f8
Feb 13 10:44:34  traefik[10325]: rsi    0x0
Feb 13 10:44:34  traefik[10325]: rbp    0x7ffe5bb89010
Feb 13 10:44:34  traefik[10325]: rsp    0x7ffe5bb88fc8
Feb 13 10:44:34  traefik[10325]: r8     0x0
Feb 13 10:44:34  traefik[10325]: r9     0x0
Feb 13 10:44:34  traefik[10325]: r10    0x0
Feb 13 10:44:34  traefik[10325]: r11    0x286
Feb 13 10:44:34  traefik[10325]: r12    0x0
Feb 13 10:44:34  traefik[10325]: r13    0x0
Feb 13 10:44:34  traefik[10325]: r14    0x45c360
Feb 13 10:44:34  traefik[10325]: r15    0x0
Feb 13 10:44:34  traefik[10325]: rip    0x45eb21
Feb 13 10:44:34  traefik[10325]: rflags 0x286
Feb 13 10:44:34  traefik[10325]: cs     0x33
Feb 13 10:44:34  traefik[10325]: fs     0x0
Feb 13 10:44:34  traefik[10325]: gs     0x0

@guilhem
Contributor

guilhem commented Feb 13, 2018

@nsteinmetz the watchdog shares code with the traefik healthcheck command.
Can you try running it and see what happens?
There seems to be a problem with it.

@ldez
Member

ldez commented Feb 13, 2018

@nsteinmetz could you open a new issue?

@nsteinmetz

@ldez ok I will.

ldez added this to the 1.3 milestone on Feb 13, 2018
ldez added the kind/bug/confirmed label (a confirmed, reproducible bug) on Feb 13, 2018
@nsteinmetz

The follow-up issue is now #2851.

traefik locked and limited conversation to collaborators on Sep 1, 2019
10 participants