Feature request: add ability to apply delta or delta-of-delta encoding to numeric columns before compression #838
ClickHouse is a good fit for a time-series DB from a performance point of view, but the compression ratio for the data stored in such databases could be improved further.
Tables in time-series DBs usually contain numeric values of two types: gauges (cpu usage, memory usage, network bandwidth usage, etc.) and counters (bytes transferred, requests processed, cpu cycles used, etc.).
It would be great if ClickHouse supported delta and delta-of-delta encoding for numeric columns. This could be done via special type hints in the table definition:
CREATE TABLE t (
    EventDate Date,
    EventTime DateTime,
    -- CPU usage in percent [0..100]. Use delta encoding before compression.
    CPUUsage DeltaCoding(UInt8),
    -- The number of requests processed. Use delta-of-delta encoding before compression.
    RequestsProcessed DeltaDeltaCoding(UInt64),
    -- Random number uniformly distributed in the range [0...2^32).
    -- Do not use any encoding before compression, since it makes no sense.
    RandomNum UInt32
) ENGINE = MergeTree(...)
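To make the two transforms concrete, here is a minimal Python sketch. It is not ClickHouse code: the function names and the sample values are made up for illustration, and it works on plain Python integers, so it ignores the fixed-width wraparound that a real UInt8 or UInt64 column would need.

```python
from itertools import accumulate


def delta_encode(values):
    """Keep the first value, then store each value minus its predecessor."""
    if not values:
        return []
    return [values[0]] + [cur - prev for prev, cur in zip(values, values[1:])]


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    return list(accumulate(deltas))


def delta_delta_encode(values):
    """Apply delta encoding twice (delta-of-delta)."""
    return delta_encode(delta_encode(values))


def delta_delta_decode(encoded):
    """Invert delta_delta_encode with two running sums."""
    return delta_decode(delta_decode(encoded))


# Hypothetical counter samples growing at a nearly constant rate.
requests_processed = [1000, 1100, 1200, 1300, 1405, 1505]

deltas = delta_encode(requests_processed)
# [1000, 100, 100, 100, 105, 100]

delta_deltas = delta_delta_encode(requests_processed)
# [1000, -900, 0, 0, 5, -5]
```

After the first two entries, the delta-of-delta output of a steadily growing counter is mostly zeros and small numbers, which a general-purpose compressor can usually pack more tightly than the raw values.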
Our compression format is quite simple.
In short, it is the following:
Compressed data is formed of blocks, one after the other. Blocks are completely independent.
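The idea of independent blocks can be sketched as follows. This is only an illustration under stated assumptions: the header layout (two little-endian 32-bit sizes) is invented for the example and is not claimed to be ClickHouse's actual block layout, and `zlib` stands in for whatever compression method is really used.

```python
import struct
import zlib

# Hypothetical block header: compressed size, uncompressed size (little-endian uint32 each).
HEADER = struct.Struct("<II")


def write_blocks(data, block_size):
    """Split data into chunks and compress each chunk on its own."""
    out = bytearray()
    for start in range(0, len(data), block_size):
        chunk = data[start:start + block_size]
        compressed = zlib.compress(chunk)
        out += HEADER.pack(len(compressed), len(chunk))
        out += compressed
    return bytes(out)


def read_blocks(stream):
    """Yield decompressed chunks; no state is carried over between blocks."""
    offset = 0
    while offset < len(stream):
        compressed_size, uncompressed_size = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        chunk = zlib.decompress(stream[offset:offset + compressed_size])
        if len(chunk) != uncompressed_size:
            raise ValueError("block size mismatch")
        offset += compressed_size
        yield chunk


payload = bytes(range(256)) * 64
stream = write_blocks(payload, block_size=4096)
restored = b"".join(read_blocks(stream))
```

Because every block carries its own sizes and is compressed separately, a reader can decode any block without having seen the ones before it.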
What we want to change:
After we add this support to the block format, we can extend the CREATE syntax to allow specifying preferred algorithms. Example:
But explicitly specifying the compression method for each column is not the only option.
It is unclear how we should configure these possible options.
We decided the following:
For every column in a table it will be possible to specify a list of compression codecs stacked together.
If a codec is specified for a column, it takes priority over other (global) compression settings.
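A rough sketch of how stacked codecs and the priority rule could fit together is shown below. Everything here is an assumption made for illustration: the codec names `Delta` and `Zlib`, the registry, and the helper functions are hypothetical and do not describe the actual ClickHouse implementation or syntax.

```python
import zlib


def delta_bytes_encode(data):
    """Byte-wise delta: store each byte minus the previous one, modulo 256."""
    prev = 0
    out = bytearray()
    for b in data:
        out.append((b - prev) % 256)
        prev = b
    return bytes(out)


def delta_bytes_decode(data):
    """Invert delta_bytes_encode with a running sum modulo 256."""
    prev = 0
    out = bytearray()
    for d in data:
        prev = (prev + d) % 256
        out.append(prev)
    return bytes(out)


# Hypothetical codec registry: name -> (encode, decode).
CODECS = {
    "Delta": (delta_bytes_encode, delta_bytes_decode),
    "Zlib": (zlib.compress, zlib.decompress),
}


def resolve_codecs(column_codecs, global_codecs):
    """A column-level codec list, when present, overrides the global setting."""
    return column_codecs if column_codecs else global_codecs


def encode(data, codec_names):
    """Apply the codecs in the listed order."""
    for name in codec_names:
        data = CODECS[name][0](data)
    return data


def decode(data, codec_names):
    """Undo the codecs in reverse order."""
    for name in reversed(codec_names):
        data = CODECS[name][1](data)
    return data


global_codecs = ["Zlib"]
column_codecs = {"CPUUsage": ["Delta", "Zlib"], "RandomNum": None}

cpu_usage = bytes([50, 51, 51, 52, 53, 53, 54, 55])
chain = resolve_codecs(column_codecs["CPUUsage"], global_codecs)
stored = encode(cpu_usage, chain)
roundtrip = decode(stored, chain)
```

Here `CPUUsage` uses its own two-stage chain, while `RandomNum` has no column-level codec and falls back to the global setting.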
On https://clickhouse.yandex/docs/en/roadmap/ I see compression is scheduled for Q2 2018. Does that mean somebody from the CH team will start working on better compression then, or can external contributors provide PRs earlier than that? For example, it looks like @Sindbag is saying he is up for doing it?
@Sindbag Is there any new progress on this feature that you can share with us?
This feature could make ClickHouse perform much better in time-series monitoring scenarios. We are very eager to see it.