- Added the ability to specify the nearest neighbour algorithm in the `HyperParamBuilder` and also implemented a brute force nearest neighbour algorithm. Internally, HDBSCAN calculates a density measure called the core distance, which is defined as the distance from a data point to its kth nearest neighbour. It is now possible to choose the nearest neighbour algorithm using the `NnAlgorithm` enum, being any of `Auto`, `KdTree` or `BruteForce`. `Auto` will choose the algorithm internally based on the nature of the data.
- Performance gain: fewer operations are now required to calculate the minimum spanning tree of the data points in a data set, which allows the algorithm to scale better to larger datasets. An implication of this change is that the order in which the labels are applied to data points will change from run to run.
- Critical fix for a bug where clusters more than one node down in the tree were not being deselected in cases where the "grandparent" cluster was the most stable and should have been selected.
- Further bug fix for a panic that occurred when `allow_single_cluster == true`.
- Added support for calculating cluster centroids once clustering is done, with the `Hdbscan::calc_centers` method.
- Added `max_cluster_size` hyperparameter, with support in the hyperparameter builder.
- Improved README documentation on the current state of the algorithm.
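
For readers unfamiliar with the core distance mentioned in the first entry, the following is a minimal, standalone sketch of how it could be computed by brute force. It is not the crate's implementation: the function names `euclidean` and `core_distances` are hypothetical, Euclidean distance is assumed, and the point itself is assumed not to count as one of its own neighbours.

```rust
/// Euclidean distance between two points of equal dimension.
/// (Assumed metric for this sketch.)
fn euclidean(a: &[f64], b: &[f64]) -> f64 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).powi(2))
        .sum::<f64>()
        .sqrt()
}

/// Brute-force core distances: for each point, the distance to its
/// k-th nearest neighbour. Assumption: a point is not its own neighbour.
fn core_distances(data: &[Vec<f64>], k: usize) -> Vec<f64> {
    assert!(k >= 1 && k < data.len(), "k must be in 1..data.len()");
    data.iter()
        .enumerate()
        .map(|(i, p)| {
            // Distances from point i to every other point.
            let mut dists: Vec<f64> = data
                .iter()
                .enumerate()
                .filter(|(j, _)| *j != i)
                .map(|(_, q)| euclidean(p, q))
                .collect();
            // Sort ascending and pick the k-th smallest (1-indexed).
            dists.sort_by(|a, b| a.total_cmp(b));
            dists[k - 1]
        })
        .collect()
}
```

This compares every pair of points, so it costs O(n²) distance evaluations plus a sort per point. A tree-based search avoids much of that work on low-dimensional data, which is one reason a choice between the two is useful.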