Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat store: added readiness and livenes prober #1460

Merged
merged 1 commit into from
Sep 23, 2019

Conversation

FUSAKLA
Copy link
Member

@FUSAKLA FUSAKLA commented Aug 26, 2019

Added the prober to yet another component. Store this time.

The store does the bucket initialization and sync synchronously, so the default http server will start up only once it is finished. Then it gets healthy(live) and once the gRPC server starts up the store gets also ready.

Personally I'd much rather see the liveness probe to answer right from the start during the bucket init since this way we leave the bucket sync duration to some initial back-off of an orchestrator.

  • CHANGELOG entry if change is relevant to the end user.

Changes

  • added prober to the thanos store

Verification

Started it and verified it gets ready end healthy as expected.

@FUSAKLA FUSAKLA force-pushed the fus-store-prober branch 2 times, most recently from 0027f18 to e44d882 Compare August 26, 2019 21:05
@FUSAKLA FUSAKLA force-pushed the fus-store-prober branch 2 times, most recently from 8457aec to f15ba68 Compare August 29, 2019 19:37
Copy link
Member

@kakkoyun kakkoyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @FUSAKLA, thanks a lot for this work, we desperately need it. Let's try to push it to finish line.
I have left a small comment, could you please check it?

And maybe you can also rebase it.

cmd/thanos/store.go Show resolved Hide resolved
@kakkoyun
Copy link
Member

Hey @FUSAKLA, I realized that we miss liveness and healthiness checks for various components when I was working on https://github.com/metalmatze/kube-thanos/pull/42 and then I have seen your work #656, #1395 and #1297.

We need probes in place for all other components as well. I'd like to push this effort further as fast as possible. I'm happy to help. I could create PRs for components which you haven't touch yet if it is also ok for you? I'd be happy to hear your opinion on those PRs as well.

What do you think?

@FUSAKLA
Copy link
Member Author

FUSAKLA commented Sep 17, 2019

Hi @kakkoyun ! Yeah I didn't want to make more PRs at once since even this way it takes it's time to get these reviewed. Every one of them raises big discussion on when to report as ready and when as healthy. Thats also the reason it got eventually splitted up but help is always welcomed so feel free to add those PRS! I'd love to see this done finally :D
Thanks for the initiative!

@kakkoyun
Copy link
Member

kakkoyun commented Sep 17, 2019

@FUSAKLA Alright, great! Let's do this then. I've already sent the first one and pinged you on #1534, PTAL.

It'd be great if we can push this one too.

CHANGELOG.md Outdated Show resolved Hide resolved
cmd/thanos/store.go Outdated Show resolved Hide resolved
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, some minor nits only (:

cmd/thanos/store.go Show resolved Hide resolved
cmd/thanos/store.go Outdated Show resolved Hide resolved
@FUSAKLA
Copy link
Member Author

FUSAKLA commented Sep 18, 2019

Thanks @bwplotka ! Should be resolved and rebased on master.

if err != nil {
return errors.Wrap(err, "create bucket client")
runutil.CloseWithLogOnErr(logger, bkt, "bucket client")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should remove if err !=nil, as it is run in defer, err will be overwritten by non related code which happens after this line.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say it was meant to close the client if the runStore ended up with an error.
Otherwise it should not close the client since others are still using it.
Maybe that's why the if err != nil is there and the fact err will be overwritten by the following code was the intention?

But I did not touch this code actually, just removed the unneeded code block wrapping it.

cancel()
ctx, cancel := context.WithCancel(context.Background())
g.Add(func() error {
defer runutil.CloseWithLogOnErr(logger, bkt, "bucket client")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit confusing, so we were closing bkt earlier, why are we closing here again?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commented here #1460 (comment)

cmd/thanos/store.go Outdated Show resolved Hide resolved
Signed-off-by: Martin Chodur <m.chodur@seznam.cz>
@FUSAKLA
Copy link
Member Author

FUSAKLA commented Sep 22, 2019

@kakkoyun PTAL I removed the metricHTTPListenGroup since it is not used anymore and unified the changelog

Copy link
Member

@kakkoyun kakkoyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @FUSAKLA. LGTM. I believe, we've covered all components with this PR.

@brancz brancz merged commit 7a2f54e into thanos-io:master Sep 23, 2019
ivan-kiselev pushed a commit to ivan-kiselev/thanos that referenced this pull request Sep 26, 2019
Signed-off-by: Martin Chodur <m.chodur@seznam.cz>
ivan-kiselev pushed a commit to ivan-kiselev/thanos that referenced this pull request Sep 26, 2019
Signed-off-by: Martin Chodur <m.chodur@seznam.cz>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>
brancz pushed a commit that referenced this pull request Sep 26, 2019
* Some updates to compact docs

Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* some formatting

Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Update docs/components/compact.md

accept PR suggestions

Co-Authored-By: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Add metalmatze to list of maintainers (#1547)

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* resolve comments

Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* resolve last comment

Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* receive: Add liveness and readiness probe (#1537)

* Add prober to receive

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Add changelog entries

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Update README

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Remove default

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Wait hashring to be ready

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* downsample: Add liveness and readiness probe (#1540)

* Add readiness and liveness probes for downsampler

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Add changelog entry

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Remove default

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Set ready

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Update CHANGELOG

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Clean CHANGELOG

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Document the dnssrvnoa option (#1551)

Signed-off-by: Antonio Santos <antonio@santosvelasco.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* feat store: added readiness and livenes prober (#1460)

Signed-off-by: Martin Chodur <m.chodur@seznam.cz>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Add Hotstar to adopters. (#1553)

It's the largest streaming service in India that does cricket and GoT
for India. They have insane scale and are using Thanos to scale their
Prometheus.

Spoke to them offline about adding the logo and will get a signoff here
too.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Fix hotstar logo in the adoptor's list (#1558)

Signed-off-by: Karthik Vijayaraju <karthik@hotstar.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Fix typos, including 'fomrat' -> 'format' in tracing.config-file help text. (#1552)

Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Compactor: Fix for #844 - Ignore object if it is the current directory (#1544)

* Ignore object if it is the current directory

Signed-off-by: Jamie Poole <jimbobby5@yahoo.com>

* Add full-stop

Signed-off-by: Jamie Poole <jimbobby5@yahoo.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Adding doc explaining the importance of groups for compactor (#1555)

Signed-off-by: Leo Meira Vital <leo.vital@nubank.com.br>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Add blank line for list (#1566)

The format of these files is wrong in the web.

Signed-off-by: dongwenjuan <dong.wenjuan@zte.com.cn>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Refactor compactor constants, fix bucket column (#1561)

* compact: unify different time constants

Use downsample.* constants where possible. Move the downsampling time
ranges into constants and use them as well.

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* bucket: refactor column calculation into compact

Fix the column's name and name it UNTIL-DOWN because that is what it
actually shows - time until the next downsampling.

Move out the calculation into a separate function into the compact
package. Ideally we could use the retention policies in this calculation
as well but the `bucket` subcommand knows nothing about them :-(

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* compact: fix issues with naming

Reorder the constants and fix mistakes.

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* remove duplicate

Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>
GiedriusS pushed a commit that referenced this pull request Oct 28, 2019
Signed-off-by: Martin Chodur <m.chodur@seznam.cz>
Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>
GiedriusS pushed a commit that referenced this pull request Oct 28, 2019
* Some updates to compact docs

Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* some formatting

Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Update docs/components/compact.md

accept PR suggestions

Co-Authored-By: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Add metalmatze to list of maintainers (#1547)

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* resolve comments

Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* resolve last comment

Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* receive: Add liveness and readiness probe (#1537)

* Add prober to receive

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Add changelog entries

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Update README

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Remove default

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Wait hashring to be ready

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* downsample: Add liveness and readiness probe (#1540)

* Add readiness and liveness probes for downsampler

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Add changelog entry

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Remove default

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Set ready

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Update CHANGELOG

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Clean CHANGELOG

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Document the dnssrvnoa option (#1551)

Signed-off-by: Antonio Santos <antonio@santosvelasco.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* feat store: added readiness and livenes prober (#1460)

Signed-off-by: Martin Chodur <m.chodur@seznam.cz>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Add Hotstar to adopters. (#1553)

It's the largest streaming service in India that does cricket and GoT
for India. They have insane scale and are using Thanos to scale their
Prometheus.

Spoke to them offline about adding the logo and will get a signoff here
too.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Fix hotstar logo in the adoptor's list (#1558)

Signed-off-by: Karthik Vijayaraju <karthik@hotstar.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Fix typos, including 'fomrat' -> 'format' in tracing.config-file help text. (#1552)

Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Compactor: Fix for #844 - Ignore object if it is the current directory (#1544)

* Ignore object if it is the current directory

Signed-off-by: Jamie Poole <jimbobby5@yahoo.com>

* Add full-stop

Signed-off-by: Jamie Poole <jimbobby5@yahoo.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Adding doc explaining the importance of groups for compactor (#1555)

Signed-off-by: Leo Meira Vital <leo.vital@nubank.com.br>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Add blank line for list (#1566)

The format of these files is wrong in the web.

Signed-off-by: dongwenjuan <dong.wenjuan@zte.com.cn>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Refactor compactor constants, fix bucket column (#1561)

* compact: unify different time constants

Use downsample.* constants where possible. Move the downsampling time
ranges into constants and use them as well.

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* bucket: refactor column calculation into compact

Fix the column's name and name it UNTIL-DOWN because that is what it
actually shows - time until the next downsampling.

Move out the calculation into a separate function into the compact
package. Ideally we could use the retention policies in this calculation
as well but the `bucket` subcommand knows nothing about them :-(

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* compact: fix issues with naming

Reorder the constants and fix mistakes.

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* remove duplicate

Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>
Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants