JelmerBot
Collaborator

This PR simplifies and optimises the CondensedTree._select_clusters and simplify_hierarchy functions. It turns out that the simplify_hierarchy operation reduces to iterating over the cluster_tree in reverse order (so children are processed before their parents).
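To illustrate why reverse iteration is enough, here is a minimal sketch, not the code from this PR. It assumes a toy `cluster_tree` stored as a list of edges in which every parent appears before its children; the `Edge` tuple, the `leaf_counts` helper, and the example values are hypothetical. Walking the list backwards means every child has been fully processed by the time its parent is reached, so a bottom-up quantity can be computed in a single pass.

```python
from collections import namedtuple

# Hypothetical edge record for illustration; field names mirror a condensed tree.
Edge = namedtuple("Edge", ["parent", "child", "lambda_val", "child_size"])


def leaf_counts(cluster_tree):
    """Count the leaf clusters below each cluster in one reverse pass.

    Assumes edges are ordered so that a cluster's own children are listed
    after the edge in which that cluster appears as a child.
    """
    counts = {}
    for edge in reversed(cluster_tree):
        # A child with no recorded count has no children of its own: a leaf.
        child_count = counts.get(edge.child, 1)
        counts[edge.child] = child_count
        # The parent accumulates the leaves found under each of its children.
        counts[edge.parent] = counts.get(edge.parent, 0) + child_count
    return counts


# Toy tree: cluster 5 splits into 6 and 7, then 7 splits into 8 and 9.
example_tree = [
    Edge(parent=5, child=6, lambda_val=0.5, child_size=40),
    Edge(parent=5, child=7, lambda_val=0.5, child_size=60),
    Edge(parent=7, child=8, lambda_val=1.0, child_size=25),
    Edge(parent=7, child=9, lambda_val=1.0, child_size=30),
]

example_counts = leaf_counts(example_tree)
```

The same traversal order would let a simplification step decide whether to merge a child before its parent is examined, which is the property the description relies on.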

The PR introduces one behavioural change to simplify_hierarchy: density values are no longer updated when two leaves combine into one, so the combined leaf retains its full density range. Consequently, the simplified tree more accurately reflects the persistence threshold. The leaf's density values are updated only when a leaf merges into a non-leaf, to avoid breaking the condensed tree plot. Without that update, the leaf's points can have densities higher than the non-leaf's birth density, which results in child clusters breaking off from a parent cluster before its icicle ends.
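The two cases can be sketched as follows. This is one possible reading, not the rule implemented in the PR: it assumes the update amounts to capping each merged point's density (lambda) at the non-leaf's birth density. The `merged_lambdas` function, its parameters, and the capping rule itself are all hypothetical.

```python
def merged_lambdas(point_lambdas, merges_into_non_leaf, birth_lambda):
    """Return the density values of a leaf's points after the leaf is merged.

    Assumption for illustration: updating means capping at `birth_lambda`.
    """
    if not merges_into_non_leaf:
        # Two leaves combine: keep the full density range untouched.
        return list(point_lambdas)
    # Leaf merges into a non-leaf: cap values so no point outlives the
    # density at which the non-leaf's children break off.
    return [min(lam, birth_lambda) for lam in point_lambdas]
```

For example, under this assumption `merged_lambdas([0.4, 1.2, 2.0], True, 1.0)` caps the last two values at `1.0`, while passing `False` returns the values unchanged.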

@JelmerBot JelmerBot merged commit 8ecf239 into scikit-learn-contrib:master May 13, 2025
1 check passed