Proposal - config component within Thanos #387

Merged: 6 commits, Aug 2, 2018. Changes shown from 1 commit.

docs/proposals/config.md: 109 additions, 0 deletions

# Thanos Cluster Configuration

Status: **draft** | in-review | rejected | accepted | complete

Implementation Owner: [@domgreen](https://github.com/domgreen)

## Motivation

Currently, each scraper manages its own configuration via [Prometheus Configuration](https://prometheus.io/docs/prometheus/latest/configuration/configuration/), which contains the [scrape_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#%3Cscrape_config%3E) and the targets that the scraper will collect metrics from.

As we start to dynamically scale the collection of metrics (adding new Prometheus instances) or increase the targets for a given tenant, we wish to keep the collection of metrics on a consistent node rather than re-allocating shards to other scrapers.

> **Review comment (@bwplotka, Member, Jun 26, 2018):** I would add a mention that this is not always the case and not always needed, for example:
>
> "Most of the time you have a static number of targets, which allows you to use Prometheus hashmod to shard targets among the available Prometheus instances. Even if they rarely change, you can quite easily add another instance and reshard; this increases time series churn, but should be fine if not done too often. However, this does not really work well for:
>
> - very dynamic targets (a couple of hundred targets going up and down every couple of hours)
> - multi-tenancy awareness: we might want to limit which scrapers we ask for a certain user's metrics"
>
> I would love to bring our use case down to a very simple form that is easy to imagine. What do you think?


One key use case would be keeping metrics from a given tenant (customer, team, etc.) on a consistent and minimal number of nodes.

> **Review comment (Member):** I wonder if we should attack this goal from the very beginning. Basically, we don't really care whether every query asks 10 scrapers or 40 for just 1d of data... Or, in other words, we don't know whether that is the issue or not. (:
>
> Can we start with the first goal: supporting a massive number of dynamic targets going up and down in quite a short time?


## Proposal

As we scale out horizontally, we wish to manage configuration such as target assignment to scrapers more intelligently: firstly by keeping targets on the same scrape instance as much as possible as the scrape pool scales, and in the future by more intelligent bin packing of targets onto scrapers.

We would look to add a new component to the Thanos system, `thanos config`, that would be the central point for loading configuration for the entire cluster and for advertising the configuration for each sidecar via APIs.

The `thanos config` component will have a configuration endpoint that each `thanos sidecar` will call into to get its own scrape_config jobs along with their targets. Once the sidecar has its jobs, it will be able to update the targets / scrape_config for the Prometheus instance it is running alongside. This instance can then be reloaded, or can automatically pick up new targets via file_sd_config.
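
For illustration, this hand-off could look roughly as follows on the Prometheus side, assuming the sidecar writes its assigned targets into a JSON file that a file_sd_config job watches (the job name, label and file path below are purely illustrative):

```
# prometheus.yml (fragment)
scrape_configs:
  - job_name: 'tenant-a'
    file_sd_configs:
      - files:
          - '/etc/prometheus/targets/tenant-a.json'
```

The sidecar would then rewrite the referenced file with whatever targets `thanos config` assigns to it, for example:

```
[
  {
    "targets": ["10.0.0.12:9100", "10.0.0.13:9100"],
    "labels": {"tenant": "a"}
  }
]
```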

The config component will keep track of which sidecars are in the cluster at any time, so it will have a central view of where targets / labels are being scraped from. As a new scrape instance comes online it joins the gossip cluster, and config will therefore know that it can start assigning configuration to this new node.

```
  ┌─ reload ─┐                ┌─ reload ─┐
  v          │                v          │
┌──────────────────────┐    ┌────────────┬─────────┐
│ Prometheus │ Sidecar │    │ Prometheus │ Sidecar │
└─────────────────┬────┘    └────────────┴────┬────┘
                  │                           │
              GetConfig                   GetConfig
                  │                           │
                  v                           v
                 ┌─────────────────────────────┐
                 │           Config            │
                 └──────────────┬──────────────┘
                            Read files
                                v
                 ┌─────────────────────────────┐
                 │          prom.yml           │
                 └─────────────────────────────┘
```


## User Experience

### Use Cases

- We would wish to dynamically configure which targets / labels are assigned to each Prometheus instance within the Thanos cluster.
- Allocate targets to a Prometheus instance based on data such as the CPU utilization of a node.
- Ensure that, as we scale the number of Prometheus instances in the scrape pool, we do not move scrape targets to other instances.

## Implementation

### Thanos Config

The config component will read one or more Prometheus configuration files and dynamically allocate configuration to sidecars within the cluster. It initially joins the Thanos cluster mesh and can therefore find the sidecars to which it wishes to assign configuration.

```
$ thanos config \
--http-address "0.0.0.0:9090" \
--grpc-address "0.0.0.0:9091" \
--cluster.peers "thanos-cluster.example.org" \
--config "promA.yaml;promB.yaml"
```

The configuration supplied to the config component will then be distributed to the peers that config is aware of and exposed via a gRPC API so that each sidecar can get its configuration.
The gRPC endpoint will take a request from a sidecar and, based on the sidecar making the request, give back the computed configuration.
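
As an illustration of the request / response shape, a hypothetical sketch in Go (these names are not an actual Thanos API):

```
// Illustrative sketch only; the names below are hypothetical.
package configapi

// GetConfigRequest identifies the sidecar asking for its share of the
// configuration, e.g. by its gossip peer name and the external labels of
// the Prometheus instance it runs alongside.
type GetConfigRequest struct {
	PeerName       string
	ExternalLabels map[string]string
}

// GetConfigResponse carries the scrape_config jobs (with their targets)
// assigned to that sidecar, serialized as Prometheus-compatible YAML.
type GetConfigResponse struct {
	ScrapeConfigsYAML []byte
}

// ConfigServer is the surface `thanos config` would expose over gRPC; each
// sidecar periodically calls GetConfig to receive its assignment.
type ConfigServer interface {
	GetConfig(req *GetConfigRequest) (*GetConfigResponse, error)
}
```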

`thanos config` will initially only support a subset of the Prometheus configuration: a subset of global and scrape_config, along with static_config and file_sd_config.

The initial configuration strategy will be aimed at a multi-tenant use case and will therefore keep jobs with a given label together on a single Prometheus instance.
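
As a sketch of what such a strategy could look like (a hypothetical helper, not the proposed implementation): a tenant stays pinned to the scraper it was first assigned to, and only previously unseen tenants are placed on the least-loaded scraper, so scaling the pool never moves existing tenants.

```
// Hypothetical sketch of a sticky, tenant-aware assignment strategy:
// a tenant keeps its scraper once assigned, even as the pool grows.
package assign

// Assigner pins each tenant label to a single scraper.
type Assigner struct {
	byTenant map[string]string // tenant label -> scraper peer name
	load     map[string]int    // scraper peer name -> number of tenants held
}

func NewAssigner() *Assigner {
	return &Assigner{byTenant: map[string]string{}, load: map[string]int{}}
}

// AddScraper registers a newly joined scraper; existing tenants do not move.
func (a *Assigner) AddScraper(peer string) {
	if _, ok := a.load[peer]; !ok {
		a.load[peer] = 0
	}
}

// ScraperFor returns the scraper already holding this tenant, or places a
// new tenant on the least-loaded scraper and remembers that choice.
func (a *Assigner) ScraperFor(tenant string) (string, bool) {
	if peer, ok := a.byTenant[tenant]; ok {
		return peer, true
	}
	best, found := "", false
	for peer, n := range a.load {
		if !found || n < a.load[best] {
			best, found = peer, true
		}
	}
	if found {
		a.byTenant[tenant] = best
		a.load[best]++
	}
	return best, found
}
```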

### Thanos Sidecar

With the addition of `thanos config` we will look to update the `thanos sidecar` to periodically get its configuration from the config component. Each time it does so, it will update the Prometheus instance by either updating the targets via file_sd_config or adding a new configuration job and forcing a reload of the Prometheus configuration.

To do this the sidecar will need to know the location of the config component to get its configuration.

```
$ thanos sidecar \
--tsdb.path "/path/to/prometheus/data/dir" \
--prometheus.url "http://localhost:9090" \
--gcs.bucket "example-bucket" \
--cluster.peers "thanos-cluster.example.org" \
--config.url "thanos-cluster.config.org"
```
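
A minimal sketch of that refresh loop, assuming the assigned configuration arrives as file_sd-style target groups and that Prometheus runs with `--web.enable-lifecycle` so it can be reloaded over HTTP (all function names, paths and URLs below are illustrative):

```
// Illustrative sketch of the sidecar's refresh loop; the call to the
// config component is stubbed out.
package main

import (
	"log"
	"net/http"
	"os"
	"time"
)

// fetchAssignedTargets stands in for the GetConfig call against the
// `thanos config` component and returns file_sd-style JSON target groups.
func fetchAssignedTargets() ([]byte, error) {
	return []byte(`[{"targets": ["10.0.0.12:9100"], "labels": {"tenant": "a"}}]`), nil
}

func main() {
	for {
		targets, err := fetchAssignedTargets()
		if err != nil {
			log.Printf("get config: %v", err)
		} else {
			// Rewrite the target file referenced by a file_sd_config job ...
			if err := os.WriteFile("/etc/prometheus/targets/assigned.json", targets, 0o644); err != nil {
				log.Printf("write targets: %v", err)
			}
			// ... and ask Prometheus to reload its configuration
			// (requires Prometheus to run with --web.enable-lifecycle).
			if resp, err := http.Post("http://localhost:9090/-/reload", "", nil); err != nil {
				log.Printf("reload prometheus: %v", err)
			} else {
				resp.Body.Close()
			}
		}
		time.Sleep(30 * time.Second)
	}
}
```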

### Client/Server Backwards/Forwards compatibility

This change would be fully backwards compatible: you do not have to use the config component within your Thanos setup and can still choose to manually set up configuration and service discovery for each node.

The alternatives below might be a good starting point for users who do not need to worry about re-allocation of targets as the number of Prometheus instances in a cluster scales.

## Alternatives considered

### Prometheus & Hashmod

An alternative to this is to use the existing [hashmod](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config) functionality within Prometheus to enable [horizontal scaling](https://www.robustperception.io/scaling-and-federating-prometheus/); in this scenario a user would supply their entire configuration to every Prometheus node and use hashing to scale out.
The main downside of this is that as a Prometheus instance is added to or removed from the cluster, targets will be moved to a new scrape instance. This is bad if you want to keep tenants on a minimal number of Prometheus instances.
We have also seen this approach suffer from hot shards in the past, whereby a given Prometheus instance may end up overloaded and it is difficult to redistribute the load onto other Prometheus instances in the scrape pool.
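
For reference, the hashmod sharding described in the linked article looks roughly like this in each Prometheus instance's scrape_config (this would be instance 1 of a pool of 4; the job name is illustrative):

```
scrape_configs:
  - job_name: 'node'
    relabel_configs:
      - source_labels: [__address__]
        modulus:       4
        target_label:  __tmp_hash
        action:        hashmod
      - source_labels: [__tmp_hash]
        regex:         ^1$
        action:        keep
```
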
> **Review comment (Member):** The hot-shard point is a very strong argument and IMO should be placed right up in the motivation section.

### Prometheus & Consistent Hashing

[Issue 4309](https://github.com/prometheus/prometheus/issues/4309) looks at adding a consistent hashing algorithm to Prometheus, which would allow us to minimise the re-allocation of targets as the scrape pool is scaled.
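
To illustrate why consistent hashing minimises re-allocation, a toy sketch (not the algorithm proposed in that issue, and without the virtual nodes a real implementation would use): each scraper owns a point on a hash ring, a target belongs to the next point clockwise, and adding a scraper therefore only moves the targets that fall just before its point.

```
// Toy consistent-hashing sketch, illustrative only.
package ring

import (
	"hash/fnv"
	"sort"
)

type Ring struct {
	points []uint32          // sorted positions of scrapers on the ring
	owner  map[uint32]string // ring position -> scraper name
}

func hashOf(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// AddScraper places a scraper on the ring; only targets that now hash to a
// position served by the new scraper change owner.
func (r *Ring) AddScraper(name string) {
	if r.owner == nil {
		r.owner = map[uint32]string{}
	}
	p := hashOf(name)
	r.points = append(r.points, p)
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	r.owner[p] = name
}

// ScraperFor returns the scraper owning the first ring position at or after
// the target's hash, wrapping around at the end of the ring.
func (r *Ring) ScraperFor(target string) string {
	if len(r.points) == 0 {
		return ""
	}
	p := hashOf(target)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= p })
	if i == len(r.points) {
		i = 0
	}
	return r.owner[r.points[i]]
}
```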

Unfortunately, based on discussions, this does not look like it will be accepted and added to the Prometheus codebase.