Just to verify my test result #8

Open
ywu-stats opened this issue Oct 24, 2019 · 14 comments

Comments

@ywu-stats

Hi,

I was finally able to run it end to end with the sample data.
I just wanted to verify that my results look as expected (visualization attached). Very cool visualization, by the way!
[Attached image: test]

@xychang
Owner

xychang commented Oct 24, 2019 via email

@ywu-stats ywu-stats reopened this Oct 28, 2019
@ywu-stats
Author

Now you can tell how badly I wanted to make use of this.
So now I'm trying to figure out, for each cluster, which userids are in it and which top n-grams are shared within it. Is that something I can easily get from result.json, or do I have to print it out in an earlier step?

@xychang
Owner

xychang commented Oct 29, 2019

It's actually fairly easy to get the userids for each cluster. If you look at visulization.py, you will see a function called allUser, which takes a tree or sub-tree and returns all the users in it. For ways to traverse the tree structure, see lines 59-79 in visulization.py.

As for the n-grams, see lines 90-91 in visulization.py.

You might need a bit of Python knowledge to modify the code to suit your own needs, but it should be fairly straightforward.
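
To illustrate the kind of recursive traversal described above, here is a minimal sketch. It does not reproduce allUser or the real layout of result.json: the node shape (a dict with optional "children" and "users" keys) and the function names `all_users` and `users_per_leaf` are assumptions made purely for illustration, so the key lookups would need to be adapted to the actual structure.

```python
def all_users(node):
    """Collect the user ids stored in a node and in all of its descendants.

    Assumes a hypothetical node shape: {"users": [...], "children": [...]}.
    """
    users = list(node.get("users", []))
    for child in node.get("children", []):
        users.extend(all_users(child))
    return users


def users_per_leaf(node, path="root"):
    """Yield (cluster path, user ids) for every leaf cluster under node."""
    children = node.get("children", [])
    if not children:
        yield path, list(node.get("users", []))
        return
    for index, child in enumerate(children):
        yield from users_per_leaf(child, f"{path}/{index}")


# Hypothetical tree: one leaf cluster plus one sub-tree with two leaf clusters.
tree = {
    "children": [
        {"users": ["u1", "u2"]},
        {"children": [{"users": ["u3"]}, {"users": ["u4", "u5"]}]},
    ]
}

for cluster_path, user_ids in users_per_leaf(tree):
    print(cluster_path, user_ids)
```

Calling `all_users` on any sub-tree gives the flat list of users below it, which is the behaviour the reply attributes to allUser.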

@ywu-stats
Author

ywu-stats commented Oct 29, 2019 via email

@xychang
Owner

xychang commented Oct 29, 2019

Yes, you can indeed use any length you want. That is part of the preprocessing, which is not included in this code, so you will need to write some preprocessing code yourself.
The link below has a more detailed description.
https://github.com/xychang/RecursiveHierarchicalClustering#frequently-asked-questions

@ywu-stats
Author

ywu-stats commented Oct 29, 2019

Hmmm... so it seems the input structure is predefined? Then I would like to clarify something about the methodology in the publication. My understanding was that the whole feature space is the union of all possible n-grams, and the values are the counts of how often each n-gram appears in the whole path, at the userid level.

For example, for the path ABCDEFG with N=3, I should look at features={ABC,BCD,CDE,DEF,EFG}, right?
So you are saying that building the n-grams is part of the data processing step and that ABC etc. are predefined in the input data. I'm confused about how I should format my input. Is it ABC()BCD()CDE()...? Or do I only need to modify the way the line is split for sid_seq?
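
The feature construction described in this question can be sketched as a sliding window over the path, counted per user. This is only an illustration of the questioner's understanding, not the repository's preprocessing; the helper names `ngrams` and `ngram_counts` are made up for the example.

```python
from collections import Counter


def ngrams(path, n):
    """Return every contiguous window of n events in path, in order."""
    return [tuple(path[i:i + n]) for i in range(len(path) - n + 1)]


def ngram_counts(path, n):
    """Count how often each n-gram occurs in one user's path."""
    return Counter(ngrams(path, n))


path = list("ABCDEFG")
features = ["".join(gram) for gram in ngrams(path, 3)]
print(features)  # ['ABC', 'BCD', 'CDE', 'DEF', 'EFG']

# A path with a repeated pattern shows why counts, not just presence, matter.
print(ngram_counts(list("ABCABC"), 3)[("A", "B", "C")])  # 2
```

A path of length 7 yields 7 - 3 + 1 = 5 windows, which is why the list stops at EFG.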

@xychang
Owner

xychang commented Oct 29, 2019 via email

@ywu-stats
Author

I see, thanks!

@akhildevelops
Contributor

Hi @xychang
The paper mentions 5 as the optimal k value for creating k-grams. If I understand correctly, this is how actions are defined from the repo's point of view:

Ex: A(5)B(7)
A => S1(g3)S2(g1)S3(g2)S4(g1)S2(g2)
B => S3(g1)S3(g1)S3(g2)S8(g1)S6(g1)

@xychang
Owner

xychang commented Nov 20, 2019

Hi @Enforcer007, when we say 5-gram, the count actually includes the time gaps.
Following your example:
A => S1(g3)S2(g1)S3
B => S3(g1)S3(g1)S3
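
The counting convention in this reply can be checked with a short sketch: treat events and time gaps as one interleaved token stream and take five tokens. The tokenizer regex and the helper names `tokenize`, `take_tokens`, and `render` are assumptions for illustration, based only on the notation used in this thread.

```python
import re


def tokenize(seq):
    """Split 'S1(g3)S2(g1)S3' into ['S1', 'g3', 'S2', 'g1', 'S3']."""
    return re.findall(r"[A-Za-z]+\d+", seq)


def take_tokens(seq, k=5):
    """Return the first k tokens, counting events and time gaps alike."""
    return tokenize(seq)[:k]


def render(tokens):
    """Rebuild the thread's notation: gaps (odd positions) go in parentheses."""
    return "".join(tok if i % 2 == 0 else f"({tok})" for i, tok in enumerate(tokens))


a = "S1(g3)S2(g1)S3(g2)S4(g1)S2(g2)"
b = "S3(g1)S3(g1)S3(g2)S8(g1)S6(g1)"
print(render(take_tokens(a)))  # S1(g3)S2(g1)S3
print(render(take_tokens(b)))  # S3(g1)S3(g1)S3
```

Five tokens therefore cover three events and the two gaps between them, matching the strings given in the reply.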

@akhildevelops
Contributor

Hi @xychang

Thanks for responding. I have 2 questions:

Q1:
Suppose we use 3-grams and the clickstream is:
Sequence = S1g1S2g2S1g1S3g1S4g2S2g3S4g1S1

Then what would T3(Sequence) be?

T3(Sequence) = {(S1,g1,S2),(g1,S2,g2),(S2,g2,S1),......}
OR
T3(Sequence) = {(S1,g1,S2),(S2,g2,S1),(S1,g1,S3),......}

Q2:
You say it's a 5-gram, but in the visualisation I see a 3-gram pattern. Can you please explain?

[Attached image: doubt]

Thanks
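
The two candidate readings in Q1 differ only in the window step, which the sketch below makes explicit. It does not settle which reading the repository uses; the tokenizer regex and the helper names `tokenize` and `windows` are assumptions for illustration.

```python
import re


def tokenize(seq):
    """Split 'S1g1S2g2S1' into ['S1', 'g1', 'S2', 'g2', 'S1']."""
    return re.findall(r"[A-Za-z]+\d+", seq)


def windows(tokens, k, step):
    """Return every window of k tokens, advancing the start by step tokens."""
    return [tuple(tokens[i:i + k]) for i in range(0, len(tokens) - k + 1, step)]


seq = "S1g1S2g2S1g1S3g1S4g2S2g3S4g1S1"
tokens = tokenize(seq)

# Reading 1: a window may start at any token, event or gap (step of 1).
start_anywhere = windows(tokens, 3, step=1)
# Reading 2: a window always starts at an event (step of 2).
start_at_event = windows(tokens, 3, step=2)

print(start_anywhere[:3])
print(start_at_event[:3])
```

With 15 tokens, the first reading yields 13 windows and the second yields 7, so the choice noticeably changes the size of the feature space.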

@xychang
Owner

xychang commented Nov 20, 2019 via email

@akhildevelops
Contributor

OK, that's great. Can you please confirm Q1?

Thanks

@xychang
Owner

xychang commented Nov 20, 2019 via email
