Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Confused by DBSCAN min points #6

Closed
awenger opened this issue May 26, 2015 · 4 comments
Closed

Confused by DBSCAN min points #6

awenger opened this issue May 26, 2015 · 4 comments

Comments

@awenger
Copy link

awenger commented May 26, 2015

I modified the sample and all the points are marked as noise if you increase the min points value to 3:

var dataset = [
    [1,1],[0,1],[1,0],
    [10,10],[10,13],[13,13],
    [54,54],[55,55],[89,89],[57,55]
];

var clustering = require('density-clustering');
var dbscan = new clustering.DBSCAN();
var clusters = dbscan.run(dataset, 5, 3);
console.log(clusters);
console.log(dbscan.noise);

I'm not sure whether this is the intended behavior for DBSCAN. I think this should be fixed or documented. What do you think?

I guess this could be fixed arround here: https://github.com/LukaszKrawczyk/clustering/blob/master/lib/DBSCAN.js#L68

@uhho
Copy link
Owner

uhho commented Jun 1, 2015

@awenger Thanks for your comment!

As far as I understand the algorithm, this is correct behaviour.
According to algorithm description, minPts is a minimum number of points in ε-neighbourhood around the core point to form a cluster.
In your case, ε is was too small to form a cluster containing at least 3 points, so all points were marked as a noise.

Yes, I agree with you it may be confusing and all parameters needs brief explanation.
I've got two ideas:

  • write more detailed explaination it in JSDoc comments included in the code
  • add API.md containing list of exposed functions and detailed explaination

What do you think?
If you want to help, feel free to submit pull request!

@awenger
Copy link
Author

awenger commented Jun 1, 2015

@lukaszkrawczyk I'm also not 100% sure. But if I look at wikipedia it describes the parameters as: DBSCAN requires two parameters: ε (eps) and the minimum number of points required to form a dense region [1]

The pseudocode for regionQuery also includes the point that was used to query the neighborhood [1]:

regionQuery(P, eps)
   return all points within P's eps-neighborhood (including P)

I guess this is different in this DBSCAN implementation: https://github.com/LukaszKrawczyk/clustering/blob/master/lib/DBSCAN.js#L184

What do you think?
[1] http://en.wikipedia.org/wiki/DBSCAN

@uhho
Copy link
Owner

uhho commented Jun 1, 2015

Yes... there was a similar discussion on including core point in OPTICS algorithm some time ago:
#4.

I checked how it's implemented in other libraries (such as e.g. SciKit), and I'm more and more confident we should change the algorithm to include core points, both in DBSCAN and OPTICS.

I will modify code and update package on NPM.

@uhho uhho closed this as completed Jun 3, 2015
@awenger
Copy link
Author

awenger commented Jun 3, 2015

Thanks a lot for the update

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants