ww6623/dbscan

Introduction

DBSCAN: Density-Based Spatial Clustering of Applications with Noise

Reference:

Ester, M., H. P. Kriegel, J. Sander, and X. Xu, "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise". Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, Portland, OR, AAAI Press, pp. 226-231, 1996.

Take a look at the test in the source code for a working example.

Code Example

import dbscan

# myData is a list of (<string id>, <float x>, <float y>) tuples
dbs = dbscan.DBSCAN(myData, eps, minPts)
results = dbs.run()

To run the included example:
>>> import dbscan
>>> dbscan.test("sample/iris.data")

Motivation

DBSCAN is a popular clustering algorithm. scikit-learn already includes a very good implementation that is much faster than this one, but this simple implementation may be useful for anyone studying the algorithm.

Usage

        Parameters
        ----------

        D: list of tuples
            stores points as a list of tuples
            of the form (<string id>, <float x>, <float y>)

            E.g. D = [('001', 0.5, 2.1), ('002', 1.0, 2.4)]

            Point ids don't have to be unique.

        eps: float
            Maximum distance between two points for them to
            be considered part of the same neighborhood

            E.g. 0.001

        minPts: int
            Minimum number of points in a neighborhood for
            a point to be considered a core point. This
            includes the point itself.

            E.g. 4


        Returns
        -------

        A tuple of a list of Cluster objects and a list of
        noise points, i.e. (<list clusters>, <list noise pts>)


        Methods
        -------

        printClusters() - handy method for printing results
        run()           - run DBSCAN

        Example Usage
        -------------

        import dbscan

        dbs = dbscan.DBSCAN(D, 0.001, 4)
        clusters, noise = dbs.run()

        # Print with printClusters
        dbs.printClusters()

        # Print with iteration
        for cluster in clusters:
            print(cluster.cid, cluster.pts)
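
To make the roles of eps and minPts concrete, here is a minimal, self-contained sketch of the DBSCAN procedure on the same (<string id>, <float x>, <float y>) input format. It is not this repository's code: the names region_query and dbscan_sketch are hypothetical, Euclidean distance and the "<= eps" comparison are assumptions, and clusters are returned as plain lists rather than Cluster objects.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    _, xi, yi = points[i]
    return [j for j, (_, xj, yj) in enumerate(points)
            if math.hypot(xi - xj, yi - yj) <= eps]


def dbscan_sketch(points, eps, min_pts):
    """Return (clusters, noise) for a list of (id, x, y) tuples."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster_id = 0

    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue

        # points[i] is a core point: start a new cluster and grow it.
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but is not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point
        cluster_id += 1

    clusters = [[p for p, lab in zip(points, labels) if lab == c]
                for c in range(cluster_id)]
    noise = [p for p, lab in zip(points, labels) if lab == NOISE]
    return clusters, noise


if __name__ == "__main__":
    D = [("a", 0.0, 0.0), ("b", 0.1, 0.0), ("c", 0.0, 0.1), ("d", 5.0, 5.0)]
    clusters, noise = dbscan_sketch(D, eps=0.2, min_pts=3)
    print(clusters)  # one cluster holding a, b and c
    print(noise)     # d is too far from everything else
```

The sketch scans every point for each neighborhood query, so it runs in quadratic time. That is acceptable for studying the algorithm but not for large datasets.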

Test

Try running DBSCAN on the included iris.data file, which comes from the popular Iris flower dataset.

You should get something that looks like this:

[Figure: Iris flower DBSCAN clusters]
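
If you want to build the input list yourself instead of calling the included test, the sketch below shows one way to turn comma-separated rows in the Iris format (four measurements followed by a class label) into (<string id>, <float x>, <float y>) tuples. The helper name load_points, the choice of the first two columns as x and y, and the generated ids are all assumptions for illustration; the repository's own test may read the file differently.

```python
import csv


def load_points(lines, x_col=0, y_col=1):
    """Build (id, x, y) tuples from an iterable of comma-separated lines."""
    points = []
    for row in csv.reader(lines):
        if not row:  # skip blank lines, e.g. at the end of the file
            continue
        point_id = "%03d" % (len(points) + 1)  # hypothetical ids: '001', '002', ...
        points.append((point_id, float(row[x_col]), float(row[y_col])))
    return points


if __name__ == "__main__":
    sample = [
        "5.1,3.5,1.4,0.2,Iris-setosa",
        "4.9,3.0,1.4,0.2,Iris-setosa",
    ]
    print(load_points(sample))

    # Reading from a file works the same way, since a file is an iterable of lines:
    # with open("sample/iris.data", newline="") as f:
    #     D = load_points(f)
```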

About

Simple implementation of the popular DBSCAN clustering algorithm in Python.
