
Proposal - config component within Thanos #387

Merged: 6 commits into thanos-io:master on Aug 2, 2018

Conversation

@domgreen (Contributor, author) commented Jun 26, 2018

Changes

Added a proposal document on how we can do configuration management within Thanos; this should allow an improved and more consistent experience when scaling.

cc @krasi-georgiev @brancz

@brancz (Member) commented Jun 26, 2018

To be honest, I don't think it's a good idea to re-invent configuration management within Thanos. Everything in terms of correctness, from consistency to reproducibility, is going to be a huge pain. I recommend sticking to the sidecar approach and building a wrapper around the sidecar for more specific or unique uses.

Regarding your tenancy distribution: sure, it's a little overhead, but how bad is it really to have separate sets of Prometheus servers per tenant? (From experience I can tell you that some high-profile customers have strict compliance requirements that will force you to have physically separated tenants.) In our case we are really only looking to shard the targets of a single tenant across multiple Prometheus servers, so if we can handle cutting TSDB chunks directly at shutdown and uploading them to object storage, we would probably be fine with hashmod as well.
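For readers unfamiliar with the hashmod approach discussed throughout this thread, here is a minimal Go sketch (illustrative only, not from this PR) of how hashmod-style sharding assigns targets to scrapers, and why growing the scrape pool reshuffles most assignments. The target names and scraper counts are made up; Prometheus's actual hashmod relabel action lives in its relabel package, this only approximates the idea.

```go
package main

import (
	"crypto/md5"
	"encoding/binary"
	"fmt"
)

// shard mimics the idea behind Prometheus's hashmod relabel action:
// hash a target's address and take it modulo the number of scrapers.
func shard(addr string, scrapers uint64) uint64 {
	sum := md5.Sum([]byte(addr))
	return binary.BigEndian.Uint64(sum[8:]) % scrapers
}

func main() {
	targets := []string{"app-0:9090", "app-1:9090", "app-2:9090", "app-3:9090"}
	for _, t := range targets {
		// Growing the pool from 3 to 4 scrapers changes the shard for most
		// targets, which is exactly the churn described in this thread.
		fmt.Printf("%-12s 3 scrapers -> shard %d, 4 scrapers -> shard %d\n",
			t, shard(t, 3), shard(t, 4))
	}
}
```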

@bwplotka (Member) left a comment

Some suggestions for the motivation section.

@domgreen I think the main thing here is to clearly separate the TWO goals we want to solve:

  • enable auto-scaling of scrapers to meet dynamic target demands (including scaling down if they are under-utilized)
  • keep a tenant's targets together if possible (I don't think we are sure this is a must-have)


Currently, each scraper manages their own configuration via [Prometheus Configuration](https://prometheus.io/docs/prometheus/latest/configuration/configuration/) which contains information about the [scrape_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#%3Cscrape_config%3E) and targets that the scraper will be collecting metrics from.

As we start to dynamically scale the collection of metrics (new Prometheus instances) or increase the targets for a current tenant we wish to keep the collection of metrics to a consistent node and not re-allocate shards to other scrapers.
@bwplotka (Member) commented Jun 26, 2018:

I would add a mention that this is not always the case and not always needed, for example:

"Most of the time you have a static number of targets, which allows you to use Prometheus hashmod to shard targets among the available Prometheus instances. Even if they change from time to time, you can quite easily add another instance and reshard, which increases time series churn but should be fine if not done too often.
However, this does not really work well for:

  • very dynamic targets (a couple of hundred targets going up and down every couple of hours)
  • multi-tenancy awareness: we might want to limit which scrapers we ask for a certain user's metrics"

I would love to boil our use case down to a very simple form that is easy to imagine. What do you think?


As we start to dynamically scale the collection of metrics (new Prometheus instances) or increase the targets for a current tenant we wish to keep the collection of metrics to a consistent node and not re-allocate shards to other scrapers.

One key use case would be keeping metrics from a given tenant (customer / team etc.) on a consistent and minimal number of nodes.
Review comment (Member):

I wonder if we should attack this goal from the very beginning. Basically, we don't really care whether every query asks 10 scrapers or 40 for just 1d of data... Or in other words, we don't know whether that is the issue or not. (:

Can we start with the first goal, supporting a massive number of dynamic targets going up and down in quite a short time?


An alternative to this is to use the existing [hashmod](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config) functionality within Prometheus to enable [horizontal scaling](https://www.robustperception.io/scaling-and-federating-prometheus/); in this scenario a user would supply their entire configuration to every Prometheus node and use hashing to scale out.
The main downside of this is that as a Prometheus instance is added to or removed from the cluster, targets will be moved to a new scrape instance. This is bad if you want to keep tenants on a minimal number of Prometheus instances.
We have also seen this approach have a problem with hot shards in the past, whereby a given Prometheus instance may end up being overloaded and it is difficult to redistribute the load onto other Prometheus instances in the scrape pool.
Review comment (Member):

This is a very strong argument and IMO should be placed right at the top of the motivation section.

@bwplotka (Member) commented:

@brancz:

To be honest, I don't think it's a good idea to re-invent configuration management within Thanos.

I can agree with that, but what if we don't have any other options?

Regarding your tenancy distribution, sure it's a little overhead, but how bad is it really to have separate sets of Prometheus servers per tenant?

Sure, let's assume we do so. That will meet our goal of keeping each tenant in a single, known place. But I don't think it helps with our main problem here: we still want dynamic scaling. A tenant's targets can go up and down quite often and in large numbers. Ideally, we would like to balance the targets evenly and create scrapers in advance automatically, tearing them down when a shard is underutilized. The current hashmod is definitely not ready for this. This proposal allows for more flexible target distribution. Any other ideas on how to solve it?

@joshpmcghee (Contributor) commented Jun 27, 2018

from experience I can tell you some high profile customers have strict compliance requirements that will force you to have physically separated tenants

We (and I suspect many who don't run multi-tenant systems) don't have this requirement for our current Thanos use case, which does mean it'd be a significant management overhead and a source of underutilisation. It's worth noting that enabling multi-tenant target distribution on the scrapers doesn't preclude the model of single-tenant target distribution.

@povilasv (Member) commented:

What about alerts? How will alerts move to scrapers? Or will everyone need to move everything onto Thanos rule?

@bwplotka (Member) commented Jun 27, 2018

@povilasv I'm not sure how alerts would be affected here, and even then you should not be worried. Our proposal is about adding extra logic that helps with highly dynamic target changes. It will not affect the current flow, especially alerting; it will only be an optional thing.

EDIT: @povilasv you are right about alert handling. This will mean the same issue with alerts as with hashmod + alerts (: You need to alert on federated data (Thanos rule) or be careful with your local alerts.

@domgreen (Contributor, author) commented:

@povilasv @Bplotka correct, it currently only covers allocating targets to scrapers for a dynamic environment.

An alternative to this is to use the existing [hashmod](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config) functionality within Prometheus to enable [horizontal scaling](https://www.robustperception.io/scaling-and-federating-prometheus/), in this scenario a user would supply their entire configuaration to every Prometheus node and use hashing to scale out.
The main downside of this is that as a Prometheus instance is added or removed from the cluster the targets will be moved to a new scrape instance. This is bad if you are wanting to keep tenants to a minimal number of Prometheus instances.

### Prometheus & Consistant Hashing
Review comment (Contributor):

Typo: 'consistent' instead of 'consistant'


### Prometheus & Consistant Hashing

[Issue 4309](https://github.com/prometheus/prometheus/issues/4309) looks at adding a consistant hashing algorith to Promethus which would allow us to minimise the re-allocation of targets as the scrape pool is scaled.
Review comment (Contributor):

Typo: algorith -> algorithm

### Prometheus & Consistant Hashing

[Issue 4309](https://github.com/prometheus/prometheus/issues/4309) looks at adding a consistant hashing algorith to Promethus which would allow us to minimise the re-allocation of targets as the scrape pool is scaled.
Unfortunatley, based on discussions this does not look like it will be accepted and added to the Prometheus codebase.
Review comment (Contributor):

Typo: Unfortunatley -> Unfortunately


### Prometheus & Consistant Hashing

[Issue 4309](https://github.com/prometheus/prometheus/issues/4309) looks at adding a consistant hashing algorith to Promethus which would allow us to minimise the re-allocation of targets as the scrape pool is scaled.
Review comment (Contributor):

Typo: Promethus -> Prometheus
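As background for this section, here is a minimal, illustrative Go sketch of a consistent-hash ring (not Prometheus code, and not necessarily what issue 4309 proposes verbatim): scrapers and targets are hashed onto the same ring, and each target is owned by the first scraper clockwise from it, so adding a scraper only moves the targets that fall between the new scraper and its predecessor. Scraper and target names are made up; production rings usually also add virtual nodes for better balance.

```go
package main

import (
	"crypto/md5"
	"encoding/binary"
	"fmt"
	"sort"
)

func hashKey(s string) uint64 {
	sum := md5.Sum([]byte(s))
	return binary.BigEndian.Uint64(sum[:8])
}

// ring maps each hash point back to the scraper that owns it.
type ring struct {
	points   []uint64
	scrapers map[uint64]string
}

func newRing(scrapers ...string) *ring {
	r := &ring{scrapers: map[uint64]string{}}
	for _, s := range scrapers {
		h := hashKey(s)
		r.points = append(r.points, h)
		r.scrapers[h] = s
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// owner returns the first scraper at or after the target's hash, wrapping around.
func (r *ring) owner(target string) string {
	h := hashKey(target)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0
	}
	return r.scrapers[r.points[i]]
}

func main() {
	before := newRing("scraper-0", "scraper-1", "scraper-2")
	after := newRing("scraper-0", "scraper-1", "scraper-2", "scraper-3")
	for _, t := range []string{"app-0:9090", "app-1:9090", "app-2:9090", "app-3:9090"} {
		// Only targets whose owner becomes scraper-3 move; everything else stays put.
		fmt.Printf("%-12s before: %s  after: %s\n", t, before.owner(t), after.owner(t))
	}
}
```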

@mattbostock (Contributor) commented Jul 2, 2018

It's not clear to me from the proposal or discussions why this functionality would need to be an integral part of Thanos, rather than using external orchestration/configuration management as @brancz suggested.

For example, if you're deploying to Kubernetes, would an operator be a suitable option?

@brancz (Member) commented Jul 2, 2018

The way I understand this proposal is that you want to solve horizontal scalability of ingestion within Thanos. That in itself seems reasonable (although, as you already described, it should be opt-in), but it shouldn't be mixed up with configuration management itself, I think.

I think you would be better off leaving configuration management completely up to the user and just solving the distribution of targets across multiple Prometheus servers via file discovery. That way you have a clear separation where the Thanos sidecar performs the consistent hashing (or any other sharding mechanism) to distribute the targets onto the ingesting Prometheus servers, and configuration stays unopinionated.

(fwiw we're building integrated support for Thanos into the Prometheus Operator for Kubernetes)

@bwplotka (Member) commented Jul 2, 2018

What exactly do you mean by an integral part of Thanos? Maybe we were not clear somewhere, but we did not want to change, nor take away from the user, the freedom of choosing the targets for Prometheus instances. We just want to enable dynamic scalability of scrapers if someone requires it, leveraging the fact that scrapers are super lightweight for Thanos. Or maybe you just mean not adding this logic to the Thanos project, and creating another project for it?

I think we are totally fine with that; however, the essential part where we want to reuse the Thanos sidecar is the scale-down logic, where we trigger tear-down for a Prometheus instance by (see the sketch below):

  • (some Thanos config, or any other method of assigning): removing all target assignments for that Prometheus
  • Sidecar: taking a snapshot including the head block
  • Sidecar: stopping Prometheus
  • Sidecar: removing the WAL
  • Sidecar: ensuring all blocks are uploaded to the bucket

So we thought it might be reasonable to expose this logic as some Thanos gRPC service / HTTP endpoint. What do you think @brancz @mattbostock?
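A minimal sketch of what that tear-down sequence could look like from the sidecar's side (an editor's illustration under assumptions, not code from this PR). Only the snapshot call is a real Prometheus endpoint (POST /api/v1/admin/tsdb/snapshot, available when Prometheus runs with --web.enable-admin-api); stopPrometheus, removeWAL and uploadRemainingBlocks are hypothetical stand-ins for the remaining bullet points.

```go
package drainer

import (
	"context"
	"fmt"
	"net/http"
)

// drainScraper sketches the tear-down sequence from the list above.
func drainScraper(ctx context.Context, promURL, dataDir string) error {
	// 1. Snapshot the TSDB (including the head block) so in-memory samples
	//    end up as a block on disk. Real endpoint, needs --web.enable-admin-api.
	req, err := http.NewRequestWithContext(ctx, http.MethodPost,
		promURL+"/api/v1/admin/tsdb/snapshot", nil)
	if err != nil {
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("snapshot failed: %s", resp.Status)
	}

	// 2. Stop Prometheus, 3. drop the WAL, 4. make sure every block
	//    (including the fresh snapshot) has reached the object storage bucket.
	if err := stopPrometheus(ctx); err != nil {
		return err
	}
	if err := removeWAL(dataDir); err != nil {
		return err
	}
	return uploadRemainingBlocks(ctx, dataDir)
}

// Hypothetical placeholders; a real sidecar would implement these.
func stopPrometheus(ctx context.Context) error                    { return nil }
func removeWAL(dataDir string) error                              { return nil }
func uploadRemainingBlocks(ctx context.Context, dir string) error { return nil }
```

Whether this lives behind a Thanos gRPC service or a plain HTTP endpoint, as asked above, is orthogonal to the sequence itself.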

@domgreen do you think it makes sense to add the scale down logic description to this proposal?

(fwiw we're building integrated support for Thanos into the Prometheus Operator for Kubernetes)

Do you mean any particular feature for horizontal scalability for ingestion? (Prom sharding?)

@domgreen (Contributor, author) commented Jul 2, 2018

As @Bplotka mentions, the main reason we were going to add it as an optional component of Thanos was that a sidecar (either part of Thanos or separate) would be used to reload the targets on a scraper and then manage the scale-down of that scraper as outlined above. So we would have to update Thanos regardless, to ensure it knew to flush / snapshot the current WAL and upload once all targets had been removed and the scraper was being drained.

It is also worth noting that in the scale-down scenario the "drained" scraper would still serve requests from Thanos query whilst it has no targets and is waiting to be terminated (the time would depend on the WAL shipping).

The integration with the Prometheus scraper will always be file_sd_config when adding / updating targets for an existing job. This is something that we do not propose to change.

@krasi-georgiev - from the other issue ... my main reservation with the DIBS approach is the pod annotations: although our scrapers (Prom/Thanos) are within k8s, our targets do not have to be within a k8s cluster, so they would not all be within pods.
You are correct that you could work the bin packing out from each node individually, but IMO having this centralised (at least per cluster) would allow us to test, debug and maintain the system with less overhead.

@Bplotka will update with considerations around scaling down and ensure it is explicit that we want minimal targets to move between scrapers when scaling up.
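For context on the file_sd_config integration mentioned above: file-based service discovery has Prometheus watch a JSON (or YAML) file of target groups, so a config component only needs to rewrite that file to add or reassign targets, and Prometheus picks the change up without a restart. A minimal Go sketch follows (editor's illustration; paths and names are made up):

```go
package filesd

import (
	"encoding/json"
	"os"
)

// targetGroup matches the JSON shape that Prometheus file_sd_config expects:
// a list of {"targets": [...], "labels": {...}} objects.
type targetGroup struct {
	Targets []string          `json:"targets"`
	Labels  map[string]string `json:"labels,omitempty"`
}

// writeFileSD writes the targets assigned to one scraper to the file that the
// scraper's file_sd_config watches, using a temp file + rename so Prometheus
// never observes a half-written file.
func writeFileSD(path string, groups []targetGroup) error {
	b, err := json.MarshalIndent(groups, "", "  ")
	if err != nil {
		return err
	}
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, b, 0o644); err != nil {
		return err
	}
	return os.Rename(tmp, path)
}
```

The scraper's own Prometheus configuration would then point the relevant job at that file (for example files: ['/etc/prometheus/file_sd/tenant-a.json'], a made-up path), which is exactly the unchanged integration point described in the comment above.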

mattbostock added a commit to mattbostock/thanos that referenced this pull request Jul 3, 2018
The format of this proposal mirrors an existing draft proposal:
thanos-io#387
@mattbostock (Contributor) commented:

@Bplotka: By 'integral part of Thanos', I mean that you're adding new functionality to the Thanos codebase (regardless of how the functionality is invoked) that assumes a new responsibility - orchestrating Prometheus' scrape configuration.

maybe you just mean not to add this logic to Thanos project, and create another project for it?

Potentially. If you're using Prometheus for scraping and persistence (and not alerting), I think it might be cleaner to build a component that does just that, giving you maximum flexibility in your implementation. If you are using Prometheus for alerting as well as scraping, then I would expect to solve this problem at the layer above using configuration management or an operator.

I think Thanos is an unsuitable place for this logic because Thanos currently has no knowledge of how Prometheus scrapes the metrics - it just deals with the output from that scrape (the TSDB chunks on disk).

@domgreen (Contributor, author) commented Jul 9, 2018

Looking at the discussion, it seems that the proposal covers requirements fairly specific to Improbable.
Will close the issue and move the implementation to a better-suited repo. Thank you all for the feedback and surrounding discussions 👍

domgreen closed this Jul 9, 2018
@mattbostock (Contributor) commented Jul 9, 2018 via email

@brancz (Member) commented Jul 9, 2018

I like @mattbostock’s idea. Kind of like a “non-goals of Thanos”.

@domgreen (Contributor, author) commented Jul 9, 2018

Yeah, that sounds good. Will update the status and add a short summary.

domgreen reopened this Jul 9, 2018
@bwplotka (Member) commented Jul 9, 2018

Yup, your opinion was really helpful and shaped our approach. I definitely agree that it is not the best idea to put this inside Thanos and make the project even more complex.

I think when we have an MVP working we will try to open-source it in a separate repo; maybe someone will find it useful. (: Cheers!

@bwplotka (Member) commented Jul 9, 2018

Awesome idea with the non-goals section.

domgreen requested a review from bwplotka July 11, 2018 16:22
@bwplotka (Member) left a comment

Let's not commit the .idea file and we are good to land this (:

@@ -0,0 +1,626 @@
<?xml version="1.0" encoding="UTF-8"?>
Review comment (Member):

do not commit this file (:

domgreen merged commit 42c8778 into thanos-io:master Aug 2, 2018
domgreen deleted the proposal_config branch August 2, 2018 12:33
bwplotka pushed a commit that referenced this pull request Oct 26, 2018
* Add proposal for store instance high availability

The format of this proposal mirrors an existing draft proposal:
#387

* Update HA store proposal based on feedback