Just to verify my test result #8

ywu-stats · 2019-10-24T23:22:17Z

Hi,

I was finally able to run it through with the sample data.
I just wanted to verify that my result looks as expected (see the attached visualization). Very cool visualization though!

xychang · 2019-10-24T23:24:31Z

Yes, the results look the way they are supposed to! Glad you figured it out by yourself! Nice work!

ywu-stats · 2019-10-28T19:55:42Z

Now you can tell how badly I wanted to make use of this.
So now I'm trying to figure out, for each cluster, which userids are in it and what the top ngrams shared within that cluster are. Is that something I can easily get from result.json, or do I have to print it out in earlier steps?

xychang · 2019-10-29T06:06:32Z

It's actually fairly easy to get the userids for each cluster. If you look at visulization.py, you can see a function called allUser, which takes in a tree/sub-tree and returns all the users in it. For ways to traverse the tree structure, you can look at lines 59-79 in visulization.py.

As to the ngrams, you can look at lines 90-91 in visulization.py.

You might need a bit of Python knowledge to modify the code to suit your own needs, but it should be fairly straightforward.
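
As a rough illustration of the kind of recursive traversal described above, here is a minimal sketch. The node layout (a dict with `users` on leaves and `children` on internal nodes) and the name `collect_users` are hypothetical and chosen only for illustration; they are not the actual structure of result.json or the signature of allUser, so check visulization.py for the real layout.

```python
def collect_users(node):
    """Return every user id found in `node` and its sub-trees.

    Hypothetical layout (an assumption, not the repo's format):
    a leaf is {"users": [...]}, an internal node is {"children": [...]}.
    """
    if "children" not in node:
        return list(node.get("users", []))
    users = []
    for child in node["children"]:
        users.extend(collect_users(child))
    return users


# Toy tree with two top-level clusters, one of which is split further.
tree = {
    "children": [
        {"users": ["u1", "u2"]},
        {"children": [{"users": ["u3"]}, {"users": ["u4", "u5"]}]},
    ]
}

# Print the users under each top-level cluster.
for index, cluster in enumerate(tree["children"]):
    print(index, collect_users(cluster))
```

The same function works on any sub-tree, so calling it on a node deeper in the tree lists only the users in that sub-cluster.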

ywu-stats · 2019-10-29T06:22:54Z

Thank you for the information! I am indeed learning Python at the moment :) Another question: where can I change the length of the ngrams? I remember that your publication mentions 5, but I only see one action per feature in the different test cases I have.

xychang · 2019-10-29T06:26:12Z

Yes, you can indeed do any length you want. It's just a part of preprocessing which is not included in this code, which means you need to be able to write some preprocessing code.
If you check out the link below, you can see a more detailed description.
https://github.com/xychang/RecursiveHierarchicalClustering#frequently-asked-questions

ywu-stats · 2019-10-29T17:58:45Z

Hmm, so it seems the ngrams are part of the predefined input structure? Then I do want to clarify something about the methodology in the publication. My understanding was that the whole feature space is the union of all possible ngrams, and the values are the counts of each ngram in the whole path, at the userid level.

For example, from the path ABCDEFG, if I set N=3, I should look at features={ABC,BCD,CDE,DEF,EFG}, right?
So you are saying that building the ngrams is part of the data preprocessing step, and ABC etc. are predefined in the input data. I'm confused about how I should format my input. Is it ABC()BCD()CDE()...? Or do I only need to modify the way the line is split for sid_seq?

xychang · 2019-10-29T21:28:30Z

Yes, the input format should be ABC()BCD()CDE(). This is because this GitHub repo is intended to be more general-purpose than what is described in the paper. Hope this answers your question!
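
The preprocessing step discussed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names `ngram_features` and `format_line` are made up, the number in parentheses is assumed to be the count of that ngram for the user (matching the description of feature values as counts earlier in this thread), and the tab between the user id and the features is also an assumption. Check the repo's README and sample input for the exact format.

```python
from collections import Counter


def ngram_features(path, n=3):
    """Count every contiguous n-gram in a sequence of actions."""
    grams = ("".join(path[i:i + n]) for i in range(len(path) - n + 1))
    return Counter(grams)


def format_line(user_id, counts):
    """Render one input line as: user id, a tab, then ngram(count) pairs.

    Assumptions: the parentheses hold the ngram's count for this user,
    and a tab separates the user id from the features.
    """
    body = "".join(f"{gram}({count})" for gram, count in counts.items())
    return f"{user_id}\t{body}"


# The path ABCDEFG with N=3 yields ABC, BCD, CDE, DEF and EFG.
counts = ngram_features(list("ABCDEFG"), n=3)
print(format_line("user1", counts))
```

Because each action is passed as a list element, the same code also handles multi-character action names, at the cost of ngrams that are just the names concatenated.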

ywu-stats · 2019-10-29T21:45:44Z

I see, thanks!

akhildevelops · 2019-11-20T07:00:52Z

Hi @xychang
In the paper, 5 is mentioned as the optimal k value for creating k-grams. If I understand correctly, below is how actions are defined from the point of view of the repo.

Ex: A(5)B(7)
A => S1(g3)S2(g1)S3(g2)S4(g1)S2(g2)
B => S3(g1)S3(g1)S3(g2)S8(g1)S6(g1)

xychang · 2019-11-20T07:03:15Z

Hi @Enforcer007, when we say 5-gram, the count actually includes the time gaps.
Following your example,
A => S1(g3)S2(g1)S3
B => S3(g1)S3(g1)S3

akhildevelops · 2019-11-20T07:22:38Z

Hi @xychang

Thanks for responding. I have 2 questions:

Q1:
Consider that we go for 3-grams, and below is the click stream:
Sequence = S1g1S2g2S1g1S3g1S4g2S2g3S4g1S1

Then what would T3(Sequence) be?

T3(Sequence) = {(S1,g1,S2),(g1,S2,g2),(S2,g2,S1),...}
OR
T3(Sequence) = {(S1,g1,S2),(S2,g2,S1),(S1,g1,S3),...}

Q2:
You say it's a 5-gram, but in the visualisation I see a 3-gram pattern. Can you please explain?

Thanks

xychang · 2019-11-20T07:27:57Z

So, in our implementation, we actually included both 3 grams and 5 grams. We found it to be helpful in practice.

akhildevelops · 2019-11-20T07:29:13Z

OK, that's great. Can you please confirm Q1?

Thanks

xychang · 2019-11-20T07:33:17Z

For Q1, the answer would be the latter.
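
To make the answers above concrete, here is a small sketch of k-gram extraction in which every k-gram starts on an action and k counts actions and time gaps together. It is an illustration, not the authors' code: the function names are made up, and the tokenizer assumes actions look like `S<number>` and gaps like `g<number>`, as in the example sequence in this thread.

```python
import re


def tokenize(sequence):
    """Split e.g. 'S1g1S2' into ['S1', 'g1', 'S2'] (actions and gaps).

    Assumes the S<number> / g<number> naming used in this thread.
    """
    return re.findall(r"[Sg]\d+", sequence)


def kgrams(tokens, k):
    """Return the k-token windows that start on an action (even index).

    k counts actions and gaps together, so k=5 means three actions plus
    the two gaps between them. k must be odd so a window ends on an action.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd")
    return [tuple(tokens[start:start + k])
            for start in range(0, len(tokens) - k + 1, 2)]


tokens = tokenize("S1g1S2g2S1g1S3g1S4g2S2g3S4g1S1")
three = kgrams(tokens, 3)
five = kgrams(tokens, 5)
print(three[:3])  # the first three 3-grams match the second option in Q1
print(five[:2])
```

Stepping the window by two tokens is what rules out windows such as (g1,S2,g2) that would start on a gap. Collecting both `three` and `five` as features mirrors the remark that the implementation uses 3-grams and 5-grams together.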

ywu-stats closed this as completed Oct 28, 2019

ywu-stats reopened this Oct 28, 2019

AnthonyruihChen mentioned this issue Aug 18, 2021

Failed to reproduce the results with sample data 'input.txt' provided #17

Open
