Window functions #1469
Window functions are not currently supported, but they may appear in the future. In some cases you can find workarounds. Possible directions: parametric aggregate functions, higher-order functions, subselects, and LIMIT BY statements. For a running total, there is a related function
The latest release (v1.1.54310-stable) adds support for a function runningIncome (not documented yet). |
With the following versions:
It seems that both runningIncome and runningDifference implement the difference operation.
:) select number, runningDifference(number), runningIncome(number) from system.numbers limit 10;
SELECT
number,
runningDifference(number),
runningIncome(number)
FROM system.numbers
LIMIT 10
┌─number─┬─runningDifference(number)─┬─runningIncome(number)─┐
│ 0 │ 0 │ 0 │
│ 1 │ 1 │ 1 │
│ 2 │ 1 │ 1 │
│ 3 │ 1 │ 1 │
│ 4 │ 1 │ 1 │
│ 5 │ 1 │ 1 │
│ 6 │ 1 │ 1 │
│ 7 │ 1 │ 1 │
│ 8 │ 1 │ 1 │
│ 9 │ 1 │ 1 │
└────────┴───────────────────────────┴───────────────────────┘
10 rows in set. Elapsed: 0.009 sec.
cc @filimonov |
@kszucs You are right.
You can use the
|
Hey @ztlpn! Thanks for the clarification! I'm currently implementing a ClickHouse backend for https://github.com/ibis-project/ibis, which is a match made in heaven except for the missing window functionality in CH, which I'd really like to use. Meanwhile, could you suggest an alternative way (without a join) to compute e.g. a z-score? With window support it would look something like this:
select column - avg(column) over (partition by key) from t
AFAIK this currently requires a join:
select column - avg_column
from t left join (
    select key, avg(column) as avg_column
    from t
    group by key
) _
using (key) |
Cool! Of course, window function support is a must-have feature, but it is a pretty big task and there is currently no definite timeline. You can get pretty far with arrays and ARRAY JOINs though. Your query becomes:
select arrayJoin(values) - avg_value from (
    select avg(value) as avg_value, groupArray(value) as values
    from t group by key) |
@ztlpn is it somehow possible to use runningDifference in combination with a group by? For example to calculate some lag-value for each unit. |
I am also looking for a solution to this very problem: I would like to get running accumulates grouped by a certain key. |
In my experience, the xxxState functions (sumState, countState) only work across one dimension: if you group by more than one column, runningAccumulate gets messed up (the rows are not in sequence). |
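For readers who have not seen the one-dimensional case mentioned above, here is a minimal sketch of runningAccumulate over a sumState, modelled on the system.numbers example earlier in this thread. The aliases k, sum_k and running_total are made up for illustration. It assumes the whole result fits in a single data block, since runningAccumulate works within a block.

```sql
-- Sketch only: running total over one grouping column.
-- k, sum_k and running_total are illustrative aliases.
SELECT
    k,
    runningAccumulate(sum_k) AS running_total
FROM
(
    SELECT
        number AS k,
        sumState(number) AS sum_k   -- partial aggregate state per k
    FROM (SELECT number FROM system.numbers LIMIT 10)
    GROUP BY k
    ORDER BY k                      -- the accumulation follows this order
)
```

The inner ORDER BY matters here: the states are merged in the order the rows arrive, so an unsorted inner result would give a different accumulation.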
Just as a follow-up: I have seen in recent changelogs that there is the
I am not sure whether this performs well enough at large scale, but for testing, the following query seems to work:
|
Wow! thanks for the heads-up! I will try this out. |
I have noticed this too. Is there any way to make runningAccumulate work with more than one group by column? |
Is there a tentative plan for when window functions will be part of a stable release? |
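One possible direction for a running total over more than one grouping column is to avoid runningAccumulate and build the totals with arrays instead, in the spirit of the groupArray suggestion above. The sketch below is an assumption-laden illustration, not a maintainer-endorsed answer: the table t and the columns key1, key2, time and value are hypothetical, and it assumes each group's rows fit comfortably in memory as arrays.

```sql
-- Sketch only: per-(key1, key2) running total using arrays.
-- t, key1, key2, time and value are hypothetical names.
SELECT key1, key2, time, value, running_total
FROM
(
    SELECT
        key1,
        key2,
        -- times in ascending order
        arraySort(groupArray(time)) AS times,
        -- values reordered by their matching time
        arraySort((v, ts) -> ts, groupArray(value), groupArray(time)) AS vals,
        -- cumulative sum over the time-ordered values
        arrayCumSum(vals) AS totals
    FROM t
    GROUP BY key1, key2
)
ARRAY JOIN times AS time, vals AS value, totals AS running_total
ORDER BY key1, key2, time
```

Sorting inside the arrays, rather than relying on the row order of a subquery, keeps the result independent of how rows happen to reach the aggregation.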
Another way to make cumulative sum:
|
This is a pretty long issue. Let me post my findings and how we made this work. For us the problem turned out to be the
After many experiments, it turns out that sorting by a non-partition key first pushes CH into providing us with a single set of data. Here is a rather complex example with a group by on a key:
select bucket, sum(aggValue) as aggValue from (
    select
        key,
        toDateTime(
            toStartOfInterval(
                time,
                INTERVAL 9999999999 SECOND
            )
        ) as bucket,
        sumIf(runningDifference(value) as diff, diff > 0) as aggValue
    from (select key, time, value from table
        where key in (....)
        order by key, time) t
    group by key, bucket
)
group by bucket
order by bucket
If someone can spot any limitation here, we'd be glad to hear your feedback. And I hope this saves someone else a day or two. |
Those functions are hard to understand; I prefer the standard window functions (#5132). |
When will window functions be supported in ClickHouse? |
It's supported under an experimental flag in recent releases. |
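As a rough sketch of what that looks like in practice: the comment above does not name the flag, so the setting allow_experimental_window_functions used below is an assumption about which release you are running (later releases may not need it at all). The table sales_table and its columns product and sales are hypothetical, chosen to match the running-total example from the original question.

```sql
-- Assumption: this setting name applies to your ClickHouse release.
SET allow_experimental_window_functions = 1;

-- Sketch only: running total of sales per product.
-- sales_table, product and sales are hypothetical names.
SELECT
    product,
    sales,
    sum(sales) OVER (
        PARTITION BY product
        ORDER BY sales
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total
FROM sales_table
ORDER BY product, sales
```

The explicit ROWS frame makes each row add only itself to the total; without it, rows with equal sales values would be treated as peers and share the same running total.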
We have been very encouraged by ClickHouse. However, as we try to port all of our existing scripts to ClickHouse, we are running into a few roadblocks, for example CUMULATIVE SUM or RUNNING TOTAL. We are trying to find an equivalent of window functions, e.g. SUM(SALES) OVER (PARTITION BY PRODUCT ORDER BY SALES)
Is there a way to get a cumulative sum or running total? Any input or guidance is much appreciated. Thanks!