getCoresetFromManager of BucketManager #70

pcandido · 2017-03-22T20:58:09Z

The function getCoresetFromManager() of class BucketManager is responsible for retrieving the coreset summarized in the buckets.

Why does the function return only the last bucket when it is full? What about new objects? The last bucket holds the oldest objects of the stream, and new objects can take a long time to reach it.

Note that once the last bucket is full, the next (2^(L-1))*m objects make no difference to the clustering, since only the last bucket is returned.
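To put a number on this, the sketch below simply evaluates the expression above. The values of m (bucket size) and L (number of buckets) are hypothetical, picked for illustration; they are not taken from MOA's defaults.

```java
public class StaleWindow {
    // Evaluates 2^(L-1) * m, the count of incoming objects that (per the
    // expression in this issue) cannot affect the returned coreset.
    static long staleObjects(int numBuckets, int bucketSize) {
        return (1L << (numBuckets - 1)) * bucketSize;
    }

    public static void main(String[] args) {
        int m = 1000; // hypothetical bucket size
        int L = 8;    // hypothetical number of buckets
        System.out.println(staleObjects(L, m)); // 2^7 * 1000 = 128000
    }
}
```

With these sample values, 128000 objects would pass through the stream without changing the clustering result.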

richard-moulton · 2017-07-31T15:37:55Z

This part of the getCoresetFromManager() function does not make sense to me either. Having looked a bit deeper, I am not sure that this behaviour is consistent with the original paper. It seems to me that the coreset should be computed from all of the non-empty buckets whenever it is needed so that the clustering produced makes use of the most recent instances.
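A minimal sketch of that idea, assuming a simplified model rather than MOA's actual classes: buckets are plain lists of points ordered from newest to oldest, and `reduce` is a hypothetical stand-in for the merge-and-reduce step. The names `CoresetSketch` and `coresetFromAllBuckets` are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

public class CoresetSketch {
    // Folds every non-empty bucket into one point set, newest bucket first,
    // so recent instances always contribute to the result.
    // `reduce` is a placeholder for the real merge-and-reduce operation.
    static List<double[]> coresetFromAllBuckets(
            List<List<double[]>> buckets,
            BinaryOperator<List<double[]>> reduce) {
        List<double[]> coreset = new ArrayList<>();
        for (List<double[]> bucket : buckets) {
            if (bucket.isEmpty()) {
                continue; // skip empty buckets
            }
            coreset = coreset.isEmpty()
                    ? new ArrayList<>(bucket)
                    : reduce.apply(coreset, bucket);
        }
        return coreset;
    }

    public static void main(String[] args) {
        List<List<double[]>> buckets = new ArrayList<>();
        buckets.add(Arrays.asList(new double[]{9.0}));                     // newest
        buckets.add(new ArrayList<>());                                    // empty
        buckets.add(Arrays.asList(new double[]{1.0}, new double[]{2.0})); // oldest

        // Placeholder reducer: plain union, with no size reduction.
        BinaryOperator<List<double[]>> union = (a, b) -> {
            List<double[]> out = new ArrayList<>(a);
            out.addAll(b);
            return out;
        };
        System.out.println(coresetFromAllBuckets(buckets, union).size()); // 3
    }
}
```

In this toy run the newest point (9.0) is part of the result even though the oldest bucket is non-empty. A real implementation would pass a reducer that shrinks each merged set back to m points.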

This is one of the modifications I have made to MOA's StreamKM++ algorithm and made available on GitHub here.

richard-moulton · 2017-08-22T12:21:51Z

This modification was part of pull request #100, which has been merged into MOA's master branch.

Recommend closing this issue.

richard-moulton mentioned this issue Aug 1, 2017

Modify StreamKM. #100

Merged

pcandido closed this as completed Aug 22, 2017
