Refactor bisecting k-means #4

mengxr · 2015-11-09T08:19:52Z

@yu-iskw This PR contains the refactoring code based on our offline discussion.

yu-iskw · 2015-11-09T18:57:58Z

LGTM, merging this. Thank you for this!

I think we still have a few points to discuss. As you suggested, these should be supported in Spark 1.7. The important thing is to design the first commit so that it leaves room for improvement, so that we can add these later without changing the top-level API.

- Support other distance metrics
  - e.g., some NLP users have asked me to support cosine distance
- Support other costs for evaluating the clusters, such as entropy, instead of average costs
- Export the dendrogram
  - e.g., as an adjacency list or a linkage matrix
- Calculate the dendrogram distance between two leaf clusters
- Find the best k by cutting the cluster tree, without recomputation
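
To make the export and tree-cutting points concrete, here is a minimal Python sketch. It is not code from this PR or from Spark: the `Node` class, its `index`/`cost` fields, and the "split the highest-cost cluster first" rule are all assumptions made for illustration. It shows how an already-built cluster tree could be exported as an adjacency list and cut at different values of k without re-running the clustering.

```python
from dataclasses import dataclass, field
import heapq


@dataclass
class Node:
    """Hypothetical cluster-tree node; not a Spark class."""
    index: int                      # assumed unique id of the cluster
    cost: float                     # assumed cost of the cluster (e.g. sum of squared distances)
    children: list = field(default_factory=list)


def adjacency_list(root):
    """Return sorted (parent index, child index) pairs for every edge in the tree."""
    edges = []
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            edges.append((node.index, child.index))
            stack.append(child)
    return sorted(edges)


def cut_tree(root, k):
    """Return the indices of at most k clusters obtained by cutting the tree.

    Assumption: the divisible cluster with the highest cost is split first.
    Only the stored tree is traversed; no clustering is recomputed.
    """
    # heapq is a min-heap, so costs are negated; the unique index breaks ties.
    heap = [(-root.cost, root.index, root)]
    leaves = []  # clusters that cannot be split any further
    while heap and len(heap) + len(leaves) < k:
        _, _, node = heapq.heappop(heap)
        if node.children:
            for child in node.children:
                heapq.heappush(heap, (-child.cost, child.index, child))
        else:
            leaves.append(node)
    return sorted([n.index for _, _, n in heap] + [n.index for n in leaves])


# Example tree (made-up costs): cluster 1 splits into 2 and 3, cluster 2 into 4 and 5.
tree = Node(1, 10.0, [
    Node(2, 6.0, [Node(4, 2.0), Node(5, 3.0)]),
    Node(3, 3.0),
])
edges = adjacency_list(tree)
clusters_k2 = cut_tree(tree, 2)
clusters_k3 = cut_tree(tree, 3)
```

Because the tree is kept after training, trying several values of k only costs a traversal, which is the point of the last item above.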

mengxr force-pushed the SPARK-6517 branch from a2a065c to d422be7 Compare November 9, 2015 08:27

refactor

d422be7

yu-iskw added a commit that referenced this pull request Nov 9, 2015

Merge pull request #4 from mengxr/SPARK-6517

75ca2a0

Refactor bisecting k-means

yu-iskw merged commit 75ca2a0 into yu-iskw:new-hierarchical-clustering Nov 9, 2015
