getCoresetFromManager of BucketManager #70

Closed
pcandido opened this issue Mar 22, 2017 · 2 comments


pcandido commented Mar 22, 2017

The function getCoresetFromManager() of class BucketManager is responsible for retrieving the coreset summarized in the buckets.

Why does the function return only the last bucket when it is full? What about new objects? The last bucket holds the oldest objects of the stream, and new objects can take a long time to reach it.

Note that once the last bucket is full, the next (2^(L-1))*m objects make no difference to the clustering, since only the last bucket is returned.
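To give a sense of scale, the count (2^(L-1))*m above can be evaluated for sample values. This is only an illustrative sketch: it assumes L is the number of buckets and m is the bucket size, and the class and method names are hypothetical, not part of MOA.

```java
public class IgnoredObjects {

    // Evaluates the expression from the issue text: (2^(L-1)) * m.
    // Assumption: numBuckets plays the role of L, bucketSize the role of m.
    public static long ignoredObjects(int numBuckets, int bucketSize) {
        return (1L << (numBuckets - 1)) * bucketSize;
    }

    public static void main(String[] args) {
        // Example values chosen for illustration only.
        System.out.println(ignoredObjects(5, 100)); // prints 1600
    }
}
```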

@richard-moulton
Contributor

This part of the getCoresetFromManager() function does not make sense to me either. Having looked a bit deeper, I am not sure that this behaviour is consistent with the original paper. It seems to me that the coreset should be computed from all of the non-empty buckets whenever it is needed so that the clustering produced makes use of the most recent instances.

This is one of the modifications I have made to MOA's StreamKM++ algorithm and made available on GitHub here.
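The difference between the two behaviours discussed in this thread can be sketched with a simplified model. This is not MOA's actual BucketManager code: buckets are modelled as plain lists of points, the class and method names are hypothetical, and the reduction step a real implementation would presumably apply to the merged points is omitted.

```java
import java.util.ArrayList;
import java.util.List;

public class CoresetSketch {

    // Behaviour described in the issue: once the last bucket holds m points,
    // it alone is returned and every other bucket is ignored.
    public static List<double[]> lastBucketOnly(List<List<double[]>> buckets, int m) {
        List<double[]> last = buckets.get(buckets.size() - 1);
        if (last.size() == m) {
            return new ArrayList<>(last);
        }
        // Simplification: otherwise fall back to the union of all buckets.
        return allNonEmptyBuckets(buckets);
    }

    // Suggested alternative: gather the points of every non-empty bucket,
    // so the most recent instances always contribute.
    public static List<double[]> allNonEmptyBuckets(List<List<double[]>> buckets) {
        List<double[]> union = new ArrayList<>();
        for (List<double[]> bucket : buckets) {
            if (!bucket.isEmpty()) {
                union.addAll(bucket);
            }
        }
        return union;
    }

    public static void main(String[] args) {
        int m = 2; // assumed bucket capacity for this example
        List<List<double[]>> buckets = new ArrayList<>();
        buckets.add(List.of(new double[] {9.0}));                     // bucket 0: newest point
        buckets.add(new ArrayList<>());                               // bucket 1: empty
        buckets.add(List.of(new double[] {1.0}, new double[] {2.0})); // last bucket: full, oldest points

        System.out.println(lastBucketOnly(buckets, m).size());     // prints 2
        System.out.println(allNonEmptyBuckets(buckets).size());    // prints 3
    }
}
```

In this toy setup the newest point (9.0) is missing from the first result and present in the second, which is the concern raised above.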

@richard-moulton
Contributor

This modification was part of pull request #100, which has been merged into MOA's master branch.

Recommend closing this issue.
