Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add AggregateFunction windowFunnel #2352

Merged
merged 5 commits into from
May 13, 2018

Conversation

sundy-li
Copy link
Contributor

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

@sundy-li
Copy link
Contributor Author

sundy-li commented May 13, 2018

In 2017, there was a wonderful share by Mariya Mansurova who taught us how to calculate complex funnel metrics in ClickHouse by sequenceMatch.

But it was less efficient (pattern matching) and it lacks the sliding timestamp window support, which is very important on web analysis, especially on the E-commerce website analysis. So I do think it is very necessary to add this windowFunnel AggregateFunction.

Related issue #2096 #1836 #1241

alexey-milovidov added a commit that referenced this pull request May 13, 2018
alexey-milovidov added a commit that referenced this pull request May 13, 2018
@alexey-milovidov alexey-milovidov merged commit b096f95 into ClickHouse:master May 13, 2018
@alexey-milovidov
Copy link
Member

Good feature. Thank you!

@Jingyi244
Copy link

cheer 4 this.

@hongyuzhou
Copy link

In 2017, there was a wonderful share by Mariya Mansurova who taught us how to calculate complex funnel metrics in ClickHouse by sequenceMatch.

But it was less efficient (pattern matching) and it lacks the sliding timestamp window support, which is very important on web analysis, especially on the E-commerce website analysis. So I do think it is very necessary to add this windowFunnel AggregateFunction.

Related issue #2096 #1836 #1241

HI, Sunday... Here is a question about windowFunnel, the ComparePairFirst only compare event_time, what if two different things happened on the same event_time, and it did happen in the real world, it may cause random sort.

My suggestion is do a little change

struct ComparePairFirst final
{
    template <typename T1, typename T2>
    bool operator()(const std::pair<T1, T2> & lhs, const std::pair<T1, T2> & rhs) const
    {
        if (lhs.first < rhs.first) return true;
        else if (lhs.first = rhs.first) return lhs.second < rhs.second;
        else return false;
    }
};

Hope you can respond to me, thanks a lot.

@sundy-li
Copy link
Contributor Author

sundy-li commented May 22, 2020

My suggestion is do a little change

In this case, your advice may make the versatility lost. I don't think we should modify the ComparePairFirst function, yet we can make a new column, such as a1 * c1 + a2 * c2 + a3 * c3.. to be another kind of timestamp in source data.

@hongyuzhou
Copy link

column

Thanks for your answer... Make a new timestamp column sound a good idea, but could you please explain a1 * c1 + a2 * c2 + a3 * c3.. specifically more? What do these a1, c1, a2, c2... parameters represent?

For example, here is a demo:

pid event event_time
123 A 10s
123 B 10s

@sundy-li
Copy link
Contributor Author

What do these a1, c1, a2, c2... parameters represent

It's just a simple demo equation. In your case, if you want to sort the events by sequential comparison by (event_time, pid), you can create an equation based on your data.

@hongyuzhou
Copy link

hongyuzhou commented May 23, 2020

What do these a1, c1, a2, c2... parameters represent

It's just a simple demo equation. In your case, if you want to sort the events by sequential comparison by (event_time, pid), you can create an equation based on your data.

Thanks a lot... I try to make the description of my confusion clearer again. Actually, in my case, if I try to use windowFunnel, it will get two different results usually. For example:

select t.uid,
       windowFunnel(20) (event_time, event='A', event='B') as funnel
from (
select 123 as uid,
	   'A' as event,
	   10 as event_time
union all
select 123 as uid,
	   'B' as event,
	   10 as event_time
) t group by t.uid

clickhouse

How could I fix the problem use by SQL without modify source code? Or does it works if I modify ComparePairFirst function like before?

@ZhangzheBJUT
Copy link

ZhangzheBJUT commented Sep 28, 2020

What do these a1, c1, a2, c2... parameters represent

It's just a simple demo equation. In your case, if you want to sort the events by sequential comparison by (event_time, pid), you can create an equation based on your data.

it can not use equation. the equation will make the window size not work!

i think you shoud change the ComparePairFirst function as :

struct ComparePairFirst final
{
    template <typename T1, typename T2>
    bool operator()(const std::pair<T1, T2> & lhs, const std::pair<T1, T2> & rhs) const
    {
        if (lhs.first < rhs.first) return true;
        else if (lhs.first = rhs.first) return lhs.second < rhs.second;
        else return false;
    }
};

Copy link
Contributor

@HeenaBansal2009 HeenaBansal2009 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we check the max_events limit while deserializing the Buffer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants