
Commit

Compact resolution/retention docs update. (#1548)
* Some updates to compact docs

Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* some formatting

Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Update docs/components/compact.md

accept PR suggestions

Co-Authored-By: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Add metalmatze to list of maintainers (#1547)

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* resolve comments

Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* resolve last comment

Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* receive: Add liveness and readiness probe (#1537)

* Add prober to receive

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Add changelog entries

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Update README

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Remove default

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Wait hashring to be ready

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* downsample: Add liveness and readiness probe (#1540)

* Add readiness and liveness probes for downsampler

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Add changelog entry

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Remove default

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Set ready

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Update CHANGELOG

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Clean CHANGELOG

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Document the dnssrvnoa option (#1551)

Signed-off-by: Antonio Santos <antonio@santosvelasco.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* feat store: added readiness and livenes prober (#1460)

Signed-off-by: Martin Chodur <m.chodur@seznam.cz>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Add Hotstar to adopters. (#1553)

It's the largest streaming service in India that does cricket and GoT
for India. They have insane scale and are using Thanos to scale their
Prometheus.

Spoke to them offline about adding the logo and will get a signoff here
too.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Fix hotstar logo in the adoptor's list (#1558)

Signed-off-by: Karthik Vijayaraju <karthik@hotstar.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Fix typos, including 'fomrat' -> 'format' in tracing.config-file help text. (#1552)

Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Compactor: Fix for #844 - Ignore object if it is the current directory (#1544)

* Ignore object if it is the current directory

Signed-off-by: Jamie Poole <jimbobby5@yahoo.com>

* Add full-stop

Signed-off-by: Jamie Poole <jimbobby5@yahoo.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Adding doc explaining the importance of groups for compactor (#1555)

Signed-off-by: Leo Meira Vital <leo.vital@nubank.com.br>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Add blank line for list (#1566)

The format of these files is wrong in the web.

Signed-off-by: dongwenjuan <dong.wenjuan@zte.com.cn>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Refactor compactor constants, fix bucket column (#1561)

* compact: unify different time constants

Use downsample.* constants where possible. Move the downsampling time
ranges into constants and use them as well.

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* bucket: refactor column calculation into compact

Fix the column's name and name it UNTIL-DOWN because that is what it
actually shows - time until the next downsampling.

Move out the calculation into a separate function into the compact
package. Ideally we could use the retention policies in this calculation
as well but the `bucket` subcommand knows nothing about them :-(

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* compact: fix issues with naming

Reorder the constants and fix mistakes.

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* remove duplicate

Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>
Ivan Kiselev authored and brancz committed Sep 26, 2019
1 parent 99bc7d2 commit ae45e27
Showing 1 changed file with 24 additions and 4 deletions.
docs/components/compact.md
@@ -28,13 +28,33 @@ config:
The compactor needs local disk space to store intermediate data for its processing. Generally, about 100GB is recommended for it to keep working as the compacted time ranges grow over time.
On-disk data is safe to delete between restarts; deleting it should be the first attempt to get a crash-looping compactor unstuck.
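As a sketch, a compactor invocation pointing at local scratch disk and an object storage configuration might look like this (the paths and the bucket config file name are examples, not prescriptions):

```shell
# Hypothetical paths; --data-dir should point at a disk with ~100GB free.
thanos compact \
  --data-dir=/var/thanos/compact \
  --objstore.config-file=bucket.yml \
  --wait
```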

## Downsampling, Resolution and Retention

Resolution is the distance between data points on your graphs, e.g.:

* raw - the same as the scrape interval at the moment of data ingestion
* 5m - one data point every 5 minutes
* 1h - one data point every hour

Keep in mind that the initial goal of downsampling is not saving disk space (read further for elaboration on storage space consumption). The goal of downsampling is to provide fast results for range queries over big time intervals like months or years. In other words, if you set `--retention.resolution-raw` lower than `--retention.resolution-5m` and `--retention.resolution-1h`, you might run into a problem of not being able to "zoom in" to your historical data.

To avoid confusion, think of `raw` data as your "zoom in" capability. When choosing values for the options mentioned above, always ask: "Will I need to zoom in to a day one year ago?" If the answer is "yes", you most likely want to keep raw data for as long as the 5m and 1h resolutions; otherwise you will only be able to see a downsampled representation of what your raw data looked like.

There is also a case where you might want to disable downsampling altogether with `--debug.disable-downsampling`: when you know for sure that you are not going to request long ranges of data (without downsampling, such requests are much more expensive). A valid example is when you only care about the last couple of weeks of your data, or use it only for alerting. But if that is your case, you should also ask yourself whether you want to introduce Thanos at all instead of vanilla Prometheus.
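If you do decide to disable downsampling, it is a single flag on the compactor; a minimal sketch (other flags and paths are example values):

```shell
# Disables creation of downsampled blocks entirely - use with care, see above.
thanos compact \
  --data-dir=/var/thanos/compact \
  --objstore.config-file=bucket.yml \
  --debug.disable-downsampling
```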

Ideally, you would set equal retention (or no retention at all) for all resolutions, which allows both "zoom in" capability and performant long-range queries. Since object storage is usually quite cheap, storage size might not matter that much, unless your goal with Thanos is very specific and you know exactly what you are doing.
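An equal-retention setup, using the retention flags mentioned above, could look like the following sketch (the two-year value is an arbitrary example):

```shell
# Keep all three resolutions for the same period, preserving "zoom in"
# capability across the whole retention window.
thanos compact \
  --data-dir=/var/thanos/compact \
  --objstore.config-file=bucket.yml \
  --retention.resolution-raw=730d \
  --retention.resolution-5m=730d \
  --retention.resolution-1h=730d
```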

## Storage space consumption

In fact, downsampling doesn't save you any space. Instead, it adds two more blocks for each raw block, each only slightly smaller than or of similar size to the raw block. This is required by the internal downsampling implementation, which, to be mathematically correct, holds various aggregations. This means that downsampling can increase the size of your storage a bit (~3x), but it gives a massive advantage when querying long ranges.

## Groups

The compactor groups blocks using the [external_labels](https://thanos.io/getting-started.md/#external-labels) added by the
Prometheus instance that produced the block. The labels must be both _unique_ and _persistent_ across different Prometheus instances.

By _unique_, we mean that the set of labels in a Prometheus instance must be different from all other sets of labels of
your Prometheus instances, so that the compactor is able to group blocks by Prometheus instance.

By _persistent_, we mean that one Prometheus instance must keep the same labels if it restarts, so that the compactor keeps
compacting blocks from the same instance even when that Prometheus instance goes down for some time.
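As an illustration of unique, persistent external labels (the label names and values below are hypothetical), here is a shell snippet generating a Prometheus config fragment for one instance:

```shell
# Write a Prometheus config fragment whose external_labels are unique to
# this instance (cluster=eu1, replica=0) and stable across restarts.
cat > prometheus-eu1.yml <<'EOF'
global:
  external_labels:
    cluster: eu1
    replica: "0"
EOF
```

A second instance would get a different label set (e.g. `cluster: eu2`), so the compactor can group their blocks separately.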
