
Proposal - config component within Thanos #387

Merged: 6 commits into thanos-io:master on Aug 2, 2018

Conversation

@domgreen (Contributor, author) commented Jun 26, 2018

Changes

Added a proposal document on how we can do configuration management within Thanos; this should allow an improved and more consistent experience when scaling.

cc @krasi-georgiev @brancz

@brancz (Member) commented Jun 26, 2018

To be honest, I don't think it's a good idea to re-invent configuration management within Thanos. Everything in terms of correctness, from consistency to reproducibility, is going to be a huge pain. I recommend sticking to the sidecar approach and building a wrapper around the sidecar for more specific or unique uses.

Regarding your tenancy distribution: sure, it's a little overhead, but how bad is it really to have separate sets of Prometheus servers per tenant? (From experience I can tell you that some high-profile customers have strict compliance requirements that will force you to have physically separated tenants.) In our case we are really only looking to shard the targets of a single tenant across multiple Prometheus servers, so if we can handle cutting TSDB chunks directly at shutdown and uploading them to object storage, we would probably be fine with hashmod as well.
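For readers unfamiliar with the hashmod approach discussed throughout this thread, here is a minimal Go sketch (illustrative only, not from this PR) of how hashmod-style sharding assigns targets to scrapers, and why growing the scrape pool reshuffles most assignments. The target names and scraper counts are made up; Prometheus's actual hashmod relabel action lives in its relabel package, this only approximates the idea.

```go
package main

import (
	"crypto/md5"
	"encoding/binary"
	"fmt"
)

// shard mimics the idea behind Prometheus's hashmod relabel action:
// hash a target's address and take it modulo the number of scrapers.
func shard(addr string, scrapers uint64) uint64 {
	sum := md5.Sum([]byte(addr))
	return binary.BigEndian.Uint64(sum[8:]) % scrapers
}

func main() {
	targets := []string{"app-0:9090", "app-1:9090", "app-2:9090", "app-3:9090"}
	for _, t := range targets {
		// Growing the pool from 3 to 4 scrapers changes the shard for most
		// targets, which is exactly the churn described in this thread.
		fmt.Printf("%-12s 3 scrapers -> shard %d, 4 scrapers -> shard %d\n",
			t, shard(t, 3), shard(t, 4))
	}
}
```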

@bwplotka (Member) left a comment

Some suggestions for the motivation section.

@domgreen I think the main thing here is to clearly separate the TWO goals we want to solve:

  • enable auto-scaling of scrapers to meet dynamic target demands (including scaling down if they are under-utilized)
  • keep a tenant's targets together if possible (I don't think we are sure this is a must-have)


Currently, each scraper manages their own configuration via [Prometheus Configuration](https://prometheus.io/docs/prometheus/latest/configuration/configuration/) which contains information about the [scrape_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#%3Cscrape_config%3E) and targets that the scraper will be collecting metrics from.

As we start to dynamically scale the collection of metrics (new Prometheus instances) or increase the targets for a current tenant we wish to keep the collection of metrics to a consistent node and not re-allocate shards to other scrapers.
@bwplotka (Member) commented Jun 26, 2018:

I would add a mention that this is not always the case and not always needed, for example:

"Most of the time you have a static number of targets, which allows you to use Prometheus hashmod to shard targets among the available Prometheus instances. Even if they change from time to time, you can quite easily add another instance and reshard, which increases time series churn but should be fine if not done too often.
However, this does not really work well for:

  • very dynamic targets (a couple of hundred targets going up and down every couple of hours)
  • multi-tenancy awareness: we might want to limit which scrapers we ask for a certain user's metrics"

I would love to boil our use case down to a very simple form that is easy to imagine. What do you think?


As we start to dynamically scale the collection of metrics (new Prometheus instances) or increase the targets for a current tenant we wish to keep the collection of metrics to a consistent node and not re-allocate shards to other scrapers.

One key use case would be keeping metrics from a given tenant (customer / team etc.) on a consistent and minimal number of nodes.
Review comment (Member):

I wonder if we should attack this goal from the very beginning. Basically, we don't really care whether every query asks 10 scrapers or 40 for just 1d of data... Or in other words, we don't know whether that is the issue or not. (:

Can we start with the first goal, supporting a massive number of dynamic targets going up and down in quite a short time?


An alternative to this is to use the existing [hashmod](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config) functionality within Prometheus to enable [horizontal scaling](https://www.robustperception.io/scaling-and-federating-prometheus/); in this scenario a user would supply their entire configuration to every Prometheus node and use hashing to scale out.
The main downside of this is that as a Prometheus instance is added to or removed from the cluster, targets will be moved to a new scrape instance. This is bad if you want to keep tenants on a minimal number of Prometheus instances.
We have also seen this approach have a problem with hot shards in the past, whereby a given Prometheus instance may end up being overloaded and it is difficult to redistribute the load onto other Prometheus instances in the scrape pool.
Review comment (Member):

This is a very strong argument and IMO should be placed right at the top of the motivation section.

@bwplotka (Member) commented:

@brancz:

To be honest, I don't think it's a good idea to re-invent configuration management within Thanos.

I can agree with that, but what if we don't have any other options?

Regarding your tenancy distribution, sure it's a little overhead, but how bad is it really to have separate sets of Prometheus servers per tenant?

Sure, let's assume we do so. That will meet our goal of keeping each tenant in a single, known place. But I don't think it helps with our main problem here: we still want dynamic scaling. A tenant's targets can go up and down quite often and in large numbers. Ideally, we would like to balance the targets evenly and create scrapers in advance automatically, tearing them down when a shard is underutilized. The current hashmod is definitely not ready for this. This proposal allows for more flexible target distribution. Any other ideas on how to solve it?

@joshpmcghee (Contributor) commented Jun 27, 2018

from experience I can tell you some high profile customers have strict compliance requirements that will force you to have physically separated tenants

We (and I suspect many who don't run multi-tenant systems) don't have this requirement for our current Thanos use case, which does mean it'd be a significant management overhead and a source of underutilisation. It's worth noting that enabling multi-tenant target distribution on the scrapers doesn't preclude the model of single-tenant target distribution.

@povilasv (Member) commented:

What about alerts? How will alerts move to scrapers? Or will everyone need to move everything onto Thanos rule?

@bwplotka (Member) commented Jun 27, 2018

@povilasv I'm not sure how alerts would be affected here, and even then you should not be worried. Our proposal is about adding extra logic that helps with highly dynamic target changes. It will not affect the current flow, especially alerting; it will only be an optional thing.

EDIT: @povilasv you are right about alert handling. This will mean the same issue with alerts as with hashmod + alerts (: You need to alert on federated data (Thanos rule) or be careful with your local alerts.

@domgreen (Contributor, author) commented:

@povilasv @Bplotka correct, it currently only covers allocating targets to scrapers for a dynamic environment.

An alternative to this is to use the existing [hashmod](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config) functionality within Prometheus to enable [horizontal scaling](https://www.robustperception.io/scaling-and-federating-prometheus/), in this scenario a user would supply their entire configuaration to every Prometheus node and use hashing to scale out.
The main downside of this is that as a Prometheus instance is added or removed from the cluster the targets will be moved to a new scrape instance. This is bad if you are wanting to keep tenants to a minimal number of Prometheus instances.

### Prometheus & Consistant Hashing
Review comment (Contributor):

Typo: 'consistent' instead of 'consistant'


### Prometheus & Consistant Hashing

[Issue 4309](https://github.com/prometheus/prometheus/issues/4309) looks at adding a consistant hashing algorith to Promethus which would allow us to minimise the re-allocation of targets as the scrape pool is scaled.
Review comment (Contributor):

Typo: algorith -> algorithm

### Prometheus & Consistant Hashing

[Issue 4309](https://github.com/prometheus/prometheus/issues/4309) looks at adding a consistant hashing algorith to Promethus which would allow us to minimise the re-allocation of targets as the scrape pool is scaled.
Unfortunatley, based on discussions this does not look like it will be accepted and added to the Prometheus codebase.
Review comment (Contributor):

Typo: Unfortunatley -> Unfortunately


### Prometheus & Consistant Hashing

[Issue 4309](https://github.com/prometheus/prometheus/issues/4309) looks at adding a consistant hashing algorith to Promethus which would allow us to minimise the re-allocation of targets as the scrape pool is scaled.
Review comment (Contributor):

Typo: Promethus -> Prometheus
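As background for this section, here is a minimal, illustrative Go sketch of a consistent-hash ring (not Prometheus code, and not necessarily what issue 4309 proposes verbatim): scrapers and targets are hashed onto the same ring, and each target is owned by the first scraper clockwise from it, so adding a scraper only moves the targets that fall between the new scraper and its predecessor. Scraper and target names are made up; production rings usually also add virtual nodes for better balance.

```go
package main

import (
	"crypto/md5"
	"encoding/binary"
	"fmt"
	"sort"
)

func hashKey(s string) uint64 {
	sum := md5.Sum([]byte(s))
	return binary.BigEndian.Uint64(sum[:8])
}

// ring maps each hash point back to the scraper that owns it.
type ring struct {
	points   []uint64
	scrapers map[uint64]string
}

func newRing(scrapers ...string) *ring {
	r := &ring{scrapers: map[uint64]string{}}
	for _, s := range scrapers {
		h := hashKey(s)
		r.points = append(r.points, h)
		r.scrapers[h] = s
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// owner returns the first scraper at or after the target's hash, wrapping around.
func (r *ring) owner(target string) string {
	h := hashKey(target)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0
	}
	return r.scrapers[r.points[i]]
}

func main() {
	before := newRing("scraper-0", "scraper-1", "scraper-2")
	after := newRing("scraper-0", "scraper-1", "scraper-2", "scraper-3")
	for _, t := range []string{"app-0:9090", "app-1:9090", "app-2:9090", "app-3:9090"} {
		// Only targets whose owner becomes scraper-3 move; everything else stays put.
		fmt.Printf("%-12s before: %s  after: %s\n", t, before.owner(t), after.owner(t))
	}
}
```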

@mattbostock (Contributor) commented Jul 2, 2018

It's not clear to me from the proposal or discussions why this functionality would need to be an integral part of Thanos, rather than using external orchestration/configuration management as @brancz suggested.

For example, if you're deploying to Kubernetes, would an operator be a suitable option?

@brancz (Member) commented Jul 2, 2018

The way I understand this proposal is that you want to solve horizontal scalability of ingestion within Thanos. That in itself seems reasonable (although, as you already described, it should be opt-in), but it shouldn't be mixed up with configuration management itself, I think.

I think you would be better off leaving configuration management completely up to the user and just solving the distribution of targets across multiple Prometheus servers via file discovery. That way you have a clear separation where the Thanos sidecar performs the consistent hashing (or any other sharding mechanism) to distribute the targets onto the ingesting Prometheus servers, and configuration stays unopinionated.

(fwiw we're building integrated support for Thanos into the Prometheus Operator for Kubernetes)

@bwplotka (Member) commented Jul 2, 2018

What exactly do you mean by an integral part of Thanos? Maybe we were not clear somewhere, but we did not want to change, nor take away from the user, the freedom of choosing the targets for Prometheus instances. We just want to enable dynamic scalability of scrapers if someone requires it, leveraging the fact that scrapers are super lightweight for Thanos. Or maybe you just mean not adding this logic to the Thanos project, and creating another project for it?

I think we are totally fine with that; however, the essential part where we want to reuse the Thanos sidecar is the scale-down logic, where we trigger tear-down for a Prometheus instance by (see the sketch below):

  • (some Thanos config, or any other method of assigning): removing all target assignments for that Prometheus
  • Sidecar: taking a snapshot including the head block
  • Sidecar: stopping Prometheus
  • Sidecar: removing the WAL
  • Sidecar: ensuring all blocks are uploaded to the bucket

So we thought it might be reasonable to expose this logic as some Thanos gRPC service / HTTP endpoint. What do you think @brancz @mattbostock?
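A minimal sketch of what that tear-down sequence could look like from the sidecar's side (an editor's illustration under assumptions, not code from this PR). Only the snapshot call is a real Prometheus endpoint (POST /api/v1/admin/tsdb/snapshot, available when Prometheus runs with --web.enable-admin-api); stopPrometheus, removeWAL and uploadRemainingBlocks are hypothetical stand-ins for the remaining bullet points.

```go
package drainer

import (
	"context"
	"fmt"
	"net/http"
)

// drainScraper sketches the tear-down sequence from the list above.
func drainScraper(ctx context.Context, promURL, dataDir string) error {
	// 1. Snapshot the TSDB (including the head block) so in-memory samples
	//    end up as a block on disk. Real endpoint, needs --web.enable-admin-api.
	req, err := http.NewRequestWithContext(ctx, http.MethodPost,
		promURL+"/api/v1/admin/tsdb/snapshot", nil)
	if err != nil {
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("snapshot failed: %s", resp.Status)
	}

	// 2. Stop Prometheus, 3. drop the WAL, 4. make sure every block
	//    (including the fresh snapshot) has reached the object storage bucket.
	if err := stopPrometheus(ctx); err != nil {
		return err
	}
	if err := removeWAL(dataDir); err != nil {
		return err
	}
	return uploadRemainingBlocks(ctx, dataDir)
}

// Hypothetical placeholders; a real sidecar would implement these.
func stopPrometheus(ctx context.Context) error                    { return nil }
func removeWAL(dataDir string) error                              { return nil }
func uploadRemainingBlocks(ctx context.Context, dir string) error { return nil }
```

Whether this lives behind a Thanos gRPC service or a plain HTTP endpoint, as asked above, is orthogonal to the sequence itself.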

@domgreen do you think it makes sense to add the scale down logic description to this proposal?

(fwiw we're building integrated support for Thanos into the Prometheus Operator for Kubernetes)

Do you mean any particular feature for horizontal scalability for ingestion? (Prom sharding?)

@domgreen (Contributor, author) commented Jul 2, 2018

As @Bplotka mentions, the main reason we were going to add it as an optional component of Thanos was that a sidecar (either part of Thanos or separate) would be used to reload the targets on a scraper and then manage the scale-down of that scraper as outlined above. So we would have to update Thanos regardless, to ensure it knew to flush / snapshot the current WAL and upload once all targets had been removed and the scraper was being drained.

It is also worth noting that in the scale-down scenario the "drained" scraper would still serve requests from Thanos query whilst it has no targets and is waiting to be terminated (the time would depend on the WAL shipping).

The integration with the Prometheus scraper will always be file_sd_config when adding / updating targets for an existing job. This is something that we do not propose to change.

@krasi-georgiev - from the other issue ... my main reservation with the DIBS approach is the pod annotations: although our scrapers (Prom/Thanos) are within k8s, our targets do not have to be within a k8s cluster, so they would not all be within pods.
You are correct that you could work the bin packing out from each node individually, but IMO having this centralised (at least per cluster) would allow us to test, debug and maintain the system with less overhead.

@Bplotka will update with considerations around scaling down and ensure it is explicit that we want minimal targets to move between scrapers when scaling up.
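For context on the file_sd_config integration mentioned above: file-based service discovery has Prometheus watch a JSON (or YAML) file of target groups, so a config component only needs to rewrite that file to add or reassign targets, and Prometheus picks the change up without a restart. A minimal Go sketch follows (editor's illustration; paths and names are made up):

```go
package filesd

import (
	"encoding/json"
	"os"
)

// targetGroup matches the JSON shape that Prometheus file_sd_config expects:
// a list of {"targets": [...], "labels": {...}} objects.
type targetGroup struct {
	Targets []string          `json:"targets"`
	Labels  map[string]string `json:"labels,omitempty"`
}

// writeFileSD writes the targets assigned to one scraper to the file that the
// scraper's file_sd_config watches, using a temp file + rename so Prometheus
// never observes a half-written file.
func writeFileSD(path string, groups []targetGroup) error {
	b, err := json.MarshalIndent(groups, "", "  ")
	if err != nil {
		return err
	}
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, b, 0o644); err != nil {
		return err
	}
	return os.Rename(tmp, path)
}
```

The scraper's own Prometheus configuration would then point the relevant job at that file (for example files: ['/etc/prometheus/file_sd/tenant-a.json'], a made-up path), which is exactly the unchanged integration point described in the comment above.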

mattbostock added a commit to mattbostock/thanos that referenced this pull request Jul 3, 2018
The format of this proposal mirrors an existing draft proposal:
thanos-io#387
@mattbostock (Contributor) commented:

@Bplotka: By 'integral part of Thanos', I mean that you're adding new functionality to the Thanos codebase (regardless of how the functionality is invoked) that assumes a new responsibility - orchestrating Prometheus' scrape configuration.

maybe you just mean not to add this logic to Thanos project, and create another project for it?

Potentially. If you're using Prometheus for scraping and persistence (and not alerting), I think it might be cleaner to build a component that does just that, giving you maximum flexibility in your implementation. If you are using Prometheus for alerting as well as scraping, then I would expect to solve this problem at the layer above using configuration management or an operator.

I think Thanos is an unsuitable place for this logic because Thanos currently has no knowledge of how Prometheus scrapes the metrics - it just deals with the output from that scrape (the TSDB chunks on disk).

@domgreen (Contributor, author) commented Jul 9, 2018

Looking at the discussion, it seems that the proposal covers requirements fairly specific to Improbable.
Will close the issue and move the implementation to a better-suited repo. Thank you all for the feedback and surrounding discussions 👍

domgreen closed this Jul 9, 2018
@mattbostock (Contributor) commented Jul 9, 2018 via email

@brancz (Member) commented Jul 9, 2018

I like @mattbostock’s idea. Kind of like a “non-goals of Thanos”.

@domgreen (Contributor, author) commented Jul 9, 2018

Yeah, that sounds good. Will update the status and add a short summary.

domgreen reopened this Jul 9, 2018
@bwplotka (Member) commented Jul 9, 2018

Yup, your opinion was really helpful and shaped our approach. I definitely agree that it is not the best idea to put this inside Thanos and make the project even more complex.

I think when we have an MVP working we will try to open-source it in a separate repo; maybe someone will find it useful. (: Cheers!

@bwplotka (Member) commented Jul 9, 2018

Awesome idea with the non-goals section.

domgreen requested a review from bwplotka July 11, 2018 16:22
@bwplotka (Member) left a comment

Let's not commit the .idea file and we are good to land this (:

@@ -0,0 +1,626 @@
<?xml version="1.0" encoding="UTF-8"?>
Review comment (Member):

do not commit this file (:

domgreen merged commit 42c8778 into thanos-io:master Aug 2, 2018
domgreen deleted the proposal_config branch August 2, 2018 12:33
bwplotka pushed a commit that referenced this pull request Oct 26, 2018
* Add proposal for store instance high availability

The format of this proposal mirrors an existing draft proposal:
#387

* Update HA store proposal based on feedback