Activate automatic compression using the helm chart #26

Closed
jpigree opened this issue May 7, 2020 · 12 comments
Labels
bug (Something isn't working), question (Further information is requested)

Comments

@jpigree

jpigree commented May 7, 2020

Hi. I installed a setup with timescaledb + timescale-prometheus + prometheus using this helm chart. However, I store a lot of metrics (multiple cAdvisors and ~20 node exporters) and my TimescaleDB fills up at a rate of ~1 GB/hour.

I saw that TimescaleDB has a compression mechanism which should greatly help keep the disk usage low, but I can't see a way to set it up easily with this helm chart and its subcharts.

Is there a way to do so, or any alternatives that would help? I saw that the cronjob doing data cleanups runs a CALL prom.drop_chunks() command based on the retention period. However, it outright deletes data, so this isn't what I want.

Ideally, I would want to automatically compress data older than 6 hours.

Thanks for your help!

@jpigree
Author

jpigree commented May 7, 2020

I also have trouble finding the size of each metric's data in the database, which would really help to filter out the unused metrics taking up the most disk space.

I am new to TimescaleDB, but I tried to look around the DB with pgAdmin 4 without success. I also tried this, but I got an empty result:

postgres=# \d cadvisor_version_info
                    View "prom_metric.cadvisor_version_info"
       Column        |           Type           | Collation | Nullable | Default
---------------------+--------------------------+-----------+----------+---------
 time                | timestamp with time zone |           |          |
 value               | double precision         |           |          |
 series_id           | integer                  |           |          |
 labels              | label_array              |           |          |
 cadvisorRevision_id | integer                  |           |          |
 cadvisorVersion_id  | integer                  |           |          |
 cluster_id          | integer                  |           |          |
 dockerVersion_id    | integer                  |           |          |
 instance_id         | integer                  |           |          |
 job_id              | integer                  |           |          |
 kernelVersion_id    | integer                  |           |          |
 osVersion_id        | integer                  |           |          |

postgres=# SELECT table_bytes, index_bytes, toast_bytes, total_bytes
FROM hypertable_relation_size('cadvisor_version_info');
 table_bytes | index_bytes | toast_bytes | total_bytes
-------------+-------------+-------------+-------------
             |             |             |
(1 row)

@atanasovskib
Contributor

Hello @jpigree, thank you for trying out timescale-prometheus.
I'll reply to the compression-related question first:

Compression is enabled by default for each metric when timescale-prometheus creates the schema. The reason you might not be seeing your data get compressed is that the default chunk interval for each metric is 8h. Compression only works on chunks whose end time is in the past, so only once a new chunk is created for a metric do the older chunks become eligible for compression.
There are several things you can do here (I'll be referencing functions from our public API: https://github.com/timescale/timescale-prometheus/blob/master/docs/sql_api.md):

  1. Modify the default chunk interval for every metric (set_default_chunk_interval). Reducing the chunk interval will make TimescaleDB create smaller chunks, and older chunks will become eligible for compression sooner. But setting too small a chunk interval will mean a lot of chunks, which has adverse effects on query performance.
  2. Modify the chunk interval for a specific metric (set_metric_chunk_interval). Not all metrics receive data at the same rate: you can reduce the chunk interval for metrics with high ingest rates and increase it for sparser metrics. Depending on your scrape intervals, you should have some hints about which metrics should be adjusted.

For example: take a clean database with a default metric chunk interval of 4 hours where you're seeing a 1 GB/h ingest rate. Data starts coming in at t=0 and the first chunks are created for each metric. By t=4h the DB would grow to ~4 GB. At t=4h+1s new chunks get created for each metric, and the old chunks become available for compression.
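As a minimal sketch of those two calls, assuming the functions are exposed exactly as documented in sql_api.md (depending on the timescale-prometheus version a schema prefix such as prom_api may be needed, and the metric name below is just an example):

-- Sketch only: function names taken from docs/sql_api.md
-- Shrink the default chunk interval for all metrics from 8h to 4h
SELECT set_default_chunk_interval(INTERVAL '4 hours');
-- Use a smaller interval only for a high-ingest metric (example metric name)
SELECT set_metric_chunk_interval('container_cpu_usage_seconds_total', INTERVAL '2 hours');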

@atanasovskib
Contributor

Regarding the size of each metric: cadvisor_version_info is a view that we create for easier querying, but the metric data is stored in prom_data.cadvisor_version_info, a normalized table. You can see the size of all metrics with:

SELECT * 
FROM timescaledb_information.hypertable
WHERE table_name IN (SELECT table_name FROM _prom_catalog.metric);
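If the goal is to find which metrics take up the most disk, a rough per-metric breakdown can be built the same way; a sketch, assuming the per-metric hypertables live in the prom_data schema as mentioned above and that _prom_catalog.metric exposes the table_name column used in the query (column names may differ between versions):

-- Approximate on-disk size per metric, largest first (sketch)
SELECT m.table_name,
       pg_size_pretty(h.total_bytes) AS total_size
FROM _prom_catalog.metric m,
     LATERAL hypertable_relation_size(format('prom_data.%I', m.table_name)::regclass) h
ORDER BY h.total_bytes DESC NULLS LAST;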

I also forgot to mention that you can set the retention period for all metrics or for specific ones; check out the API doc I linked in the previous comment.
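A sketch of those retention calls, again assuming the function signatures documented in sql_api.md (a schema prefix may be required depending on the installed version; the intervals and metric name are placeholders):

-- Default data retention for all metrics
SELECT set_default_retention_period(INTERVAL '30 days');
-- Keep a particularly chatty metric for a shorter time
SELECT set_metric_retention_period('container_cpu_usage_seconds_total', INTERVAL '7 days');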

@atanasovskib added the "question" label on May 7, 2020
@jpigree
Author

jpigree commented May 7, 2020

Thank you so much for the in-depth answers @blagojts. I now understand better how compression works, and I am able to check the metric sizes.

However, there isn't a mechanism to easily set the metric chunk intervals from the Helm chart. Is there any plan to make this configurable from the values.yaml?

Otherwise, I will have to do it with a separate script. Thank you again!

@atanasovskib
Contributor

There is no mechanism in place for setting this from the helm chart, or via a flag of Timescale-Prometheus, since it's handled in the migration scripts. The idea is that you would use the SQL API for this, but we'll keep it under consideration.
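One way to do it with a separate script in the meantime, sketched under the assumption that the SQL API functions are reachable as documented (service name, credentials, and metric name below are placeholders):

-- tuning.sql: applied once against the TimescaleDB service, e.g.
--   psql -h <timescaledb-service> -U postgres -f tuning.sql
-- run from a Kubernetes Job or a chart post-install hook
SELECT set_default_chunk_interval(INTERVAL '4 hours');
SELECT set_metric_chunk_interval('container_cpu_usage_seconds_total', INTERVAL '2 hours');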

@jpigree
Author

jpigree commented May 9, 2020

Okay. If I find something myself I will share. Thanks again for your help!

@jpigree closed this as completed on May 9, 2020
@jpigree
Author

jpigree commented May 12, 2020

Hi @blagojts.

I tested the compression a bit, and I definitely felt the performance hit on queries when I decreased the chunk interval. However, when I look at the storage usage, I don't see the reduction I expected at each chunk-interval boundary.

Here is a screenshot of the persistent volume usage of my timescaledb instance:
[screenshot: timescaledb-pvc-usage]
During this time range, I reduced the chunk interval to 2 hours. I tried to zoom in to see the compression effect, without success. The growth rate looks constant.

What I expected to see was a graph like this:
[screenshot: prometheusdb-pvc-usage]
This is the persistent volume usage of one of my prometheus instances, which uses local storage.
Looking at that graph, the compression effect is much more evident.

Is this normal behavior? My scrape interval is 1 minute, so I know I can miss fluctuations. Another possibility is that the storage gained is not reclaimed.

Is there a way to improve the compression further?

Thanks for your help!

@atanasovskib
Contributor

Can you paste the PromQL for your disk usage graph here? I want to run some tests.

@atanasovskib reopened this on May 12, 2020
@atanasovskib
Contributor

We discovered an issue where the compress_chunk policies were not being activated, leaving your data uncompressed even after it becomes eligible. Working on a fix.
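For anyone checking whether their chunks are actually being compressed, a sketch using the informational views shipped with TimescaleDB 1.x (view names changed in later major versions):

-- Per-chunk compression status and before/after sizes
SELECT * FROM timescaledb_information.compressed_chunk_stats;
-- Per-hypertable totals
SELECT * FROM timescaledb_information.compressed_hypertable_stats;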

@atanasovskib added the "bug" label on May 12, 2020
@jpigree
Author

jpigree commented May 12, 2020

Hi @blagojts. I use the prometheus-operator, so the PromQL comes from the kubernetes-mixins repo.

I managed to extract it:

(
  sum without(instance, node) (kubelet_volume_stats_capacity_bytes{job="kubelet", metrics_path="/metrics", namespace="$namespace", persistentvolumeclaim="$volume"})
  -
  sum without(instance, node) (kubelet_volume_stats_available_bytes{job="kubelet", metrics_path="/metrics", namespace="$namespace", persistentvolumeclaim="$volume"})
)

@jpigree
Author

jpigree commented Jul 7, 2020

Hi @blagojts.

I recreated my whole setup a while ago with the new version of the chart, "timescale-observability-0.1.0-alpha.4.1", and compression magically started to work. The only parameter I changed recently was the dropChunk interval, which I reduced to every hour.

[screenshot: compression]

Is this a fix from the team?

@atanasovskib
Contributor

@jpigree yes, this is related to several small changes we made throughout the codebase, but also to a bugfix in the latest TimescaleDB release regarding the background worker framework. I'll mark the issue as closed.
