[Thanos Rule] Rules Files not reloading after SIGHUP signal #4432

Closed
jumakasy opened this issue Jul 9, 2021 · 4 comments

jumakasy commented Jul 9, 2021

Thanos, Prometheus and Golang version used:

thanos, version 0.21.1 (branch: HEAD, revision: 3558f4a)
build user: root@744cf7ef4576
build date: 20210604-12:11:58
go version: go1.16.5
platform: linux/amd64

What happened:
After updating the rule files, a SIGHUP signal is sent to Thanos to reload the rules at runtime. Thanos Rule does not apply the update and keeps evaluating the rules that were loaded when it started.

Restarting the Thanos process does pick up the updated rules.

What you expected to happen:
Thanos should reload the rules at runtime after receiving SIGHUP signal, with no need to stop and start the Thanos process.

How to reproduce it (as minimally and precisely as possible):

Run Thanos Rule with minimal configuration and a basic and valid rule_test.yaml file:

thanos rule --log.level debug --log.format logfmt --http-address 0.0.0.0:10902 --http-grace-period 2m --grpc-address 0.0.0.0:10901 --grpc-grace-period 2m --data-dir ./data --rule-file './*.yml' --resend-delay 1m --eval-interval 30s --tsdb.block-duration 2h --tsdb.retention 2d --query thanos-query.domain:20902

Access the UI and check that the rules defined in the rule_test.yaml file are being applied.
Add a new rules file (rule__test_2.yaml) to the same folder.

Get the Thanos process ID: thanos_pid=$(pgrep thanos)
Reload the Thanos process: kill -1 $thanos_pid

Access the UI and check the rules.
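
For anyone repeating the signal step, here is a minimal sketch of a slightly more defensive version of the `kill -1` command above. The function name `send_sighup` is made up for illustration and is not part of Thanos; it only checks that the target process exists before sending the signal.

```shell
#!/bin/sh
# Hypothetical helper (not part of Thanos): send SIGHUP to a PID,
# failing early if no PID is given or the process does not exist.
# Usage: send_sighup <pid>
send_sighup() {
    pid="$1"
    if [ -z "$pid" ]; then
        echo "no pid given" >&2
        return 2
    fi
    # kill -0 sends no signal; it only checks that the process can be signalled.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "process $pid not found" >&2
        return 1
    fi
    # SIGHUP is signal 1, so this is equivalent to: kill -1 "$pid"
    kill -HUP "$pid"
}
```

It could be called as `send_sighup "$(pgrep -o thanos)"`. Note that a plain `pgrep thanos` may return several PIDs if more than one Thanos component runs on the same host; `-o` picks the oldest match, which is an assumption about which process is wanted.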

Full logs to relevant components:

After sending the SIGHUP, this is the log:

level=info ts=2021-07-09T15:43:27.658638469Z caller=main.go:180 msg="caught signal. Reloading." signal=hangup
level=info ts=2021-07-09T15:41:04.904147954Z caller=main.go:183 msg="reload dispatched."
level=debug ts=2021-07-09T15:43:49.605131566Z caller=promclient.go:398 component=rules msg="querying instant" url="http://10.103.69.158:30902/api/v1/query?......

Anything else we need to know:

I tested other Thanos versions; the last one where this worked properly was v0.19.0.

Following the same steps with v0.19.0, the rule files are reloaded. The log is different and reports that the rule files are being loaded:
level=info ts=2021-07-09T15:41:04.904087152Z caller=main.go:180 msg="caught signal. Reloading." signal=hangup
level=info ts=2021-07-09T15:41:04.904147954Z caller=main.go:183 msg="reload dispatched."
level=debug ts=2021-07-09T15:41:04.904165954Z caller=rule.go:820 component=rules msg="configured rule files" files=./*.yaml
level=info ts=2021-07-09T15:41:04.904271701Z caller=rule.go:843 component=rules msg="reload rule files" numFiles=2
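
The difference between the two logs suggests a quick check: after a reload, look for the `reload rule files` message. This is a rough sketch that assumes the Thanos output has been redirected to a file and that the message text matches the v0.19.0 log above; the function name `reload_confirmed` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the log file contains the
# "reload rule files" message seen in the v0.19.0 output.
# Usage: reload_confirmed <logfile>
reload_confirmed() {
    grep -q 'msg="reload rule files"' "$1"
}
```

For example, `reload_confirmed thanos.log && echo "rules reloaded"`, where `thanos.log` is an assumed file name. Other Thanos versions may word the message differently.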

@jmichalek132
Contributor

jmichalek132 commented Jul 10, 2021

A possible workaround until it's fixed is to trigger the reload by calling the HTTP endpoint: curl localhost:10902/-/reload -X POST.
I managed to reproduce the issue locally, and reloading through the HTTP endpoint still works.
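
A small sketch of that workaround as a reusable function, in case it helps with scripting. The name `reload_via_http` is made up for illustration; the default address comes from the `--http-address` port used in the reproduction command, and the curl flags are only a suggestion for making failures visible.

```shell
#!/bin/sh
# Hypothetical helper: trigger a rule reload over HTTP instead of SIGHUP.
# Usage: reload_via_http [host:port]   (defaults to localhost:10902)
reload_via_http() {
    addr="${1:-localhost:10902}"
    # -f: return non-zero on HTTP errors; -sS: quiet, but still print errors.
    curl -fsS -X POST "http://${addr}/-/reload"
}
```

Called with no arguments, it sends the same request as the curl command above; pass a different `host:port` if the Thanos Rule HTTP address is not the default used here.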

@jmichalek132
Contributor

I was able to track down this PR, which introduced the bug. I was also able to fix it in my branch. With this change, reloading via SIGHUP works for me again.

@jumakasy
Author

> A possible workaround until it's fixed is to trigger the reload by calling the HTTP endpoint: curl localhost:10902/-/reload -X POST.
> I managed to reproduce the issue locally, and reloading through the HTTP endpoint still works.

Yes, the workaround works.
Thanks 👍

@GiedriusS
Member

Fixed by #4442. It's now covered by tests so it's working 100% 💪
