GitHub - ww6623/dbscan: Simple implementation of the popular DBSCAN clustering algorithm in Python.

Introduction

DBSCAN: Density-Based Spatial Clustering of Applications with Noise

Reference:

Ester, M., H. P. Kriegel, J. Sander, and X. Xu, "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise". Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, Portland, OR, AAAI Press, pp. 226-231, 1996.

Take a look at the test in the source code for a working example.

Code Example

import dbscan
dbs = dbscan.DBSCAN(myData, eps, minPts)
results = dbs.run()

To run the included example:
>>> import dbscan
>>> dbscan.test("sample/iris.data")

Motivation

DBSCAN is a popular clustering algorithm. scikit-learn already ships a very good implementation that is much faster than this one, but this simple implementation may be useful for anyone studying how the algorithm works.

Usage

        Parameters
        ----------

        D: list of tuples
            stores points as a list of tuples
            of the form (<string id>, <float x>, <float y>)

            E.g. D = [('001', 0.5, 2.1), ('002', 1.0, 2.4)]

            Point ids don't have to be unique.

        eps: float
            Maximum distance between two points for them to be
            considered part of the same neighborhood.

            E.g. 0.001

        minPts: int
            Minimum number of points in a neighborhood for
            a point to be considered a core point. This
            includes the point itself.

            E.g. 4


        Returns
        -------

        A tuple of a list of Cluster objects and a list of
        noise points, i.e. (<list clusters>, <list noise pts>)


        Methods
        -------

        printClusters() - handy method for printing results
        run()           - run DBSCAN

        Example Usage
        -------------

        import dbscan

        dbs = dbscan.DBSCAN(D, 0.001, 4)
        clusters, noise = dbs.run()

        # Print with printClusters
        dbs.printClusters()

        # Print with iteration
        for cluster in clusters:
            print(cluster.cid, cluster.pts)
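
To make the roles of `eps` and `minPts` concrete, here is a minimal, self-contained sketch of the DBSCAN procedure on points in the `(<string id>, <float x>, <float y>)` format described above. It is not the code from this repository: the function names `region_query` and `dbscan_sketch` are hypothetical, Euclidean distance is assumed, and clusters are returned as plain lists rather than Cluster objects.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    _, xi, yi = points[i]
    return [j for j, (_, xj, yj) in enumerate(points)
            if math.hypot(xi - xj, yi - yj) <= eps]

def dbscan_sketch(points, eps, min_pts):
    """Return (clusters, noise); each cluster is a list of point tuples."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            # Not a core point; it may still become a border point later.
            labels[i] = NOISE
            continue
        # points[i] is a core point: start a new cluster and expand it.
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                # Only core points extend the cluster further.
                queue.extend(j_neighbors)
    clusters = [[p for p, lab in zip(points, labels) if lab == c]
                for c in range(cluster_id + 1)]
    noise = [p for p, lab in zip(points, labels) if lab == NOISE]
    return clusters, noise

# Illustrative data: four points on a unit square plus one far-away point.
D = [('a', 0.0, 0.0), ('b', 0.0, 1.0), ('c', 1.0, 0.0),
     ('d', 1.0, 1.0), ('e', 10.0, 10.0)]
clusters, noise = dbscan_sketch(D, eps=1.0, min_pts=3)
```

With `eps=1.0` and `min_pts=3`, each corner of the square has itself and two neighbors within range, so all four are core points in a single cluster, while `'e'` has no neighbors and ends up in the noise list. The neighborhood count includes the point itself, matching the `minPts` description above.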

Test

Try running DBSCAN on the included iris.data file which comes from the popular Iris Flower dataset.
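
If you want to feed comma-separated data to DBSCAN yourself rather than through the included test, the rows first have to be turned into `(<string id>, <float x>, <float y>)` tuples. The sketch below shows one way to do that. The helper `load_points` is hypothetical and not part of this repository, and the choice of the first two numeric columns as x and y, with the last column as the id, is an assumption; the included test may select different columns.

```python
import csv

def load_points(lines, x_col=0, y_col=1, id_col=4):
    """Turn comma-separated rows into (<string id>, <float x>, <float y>) tuples."""
    points = []
    for row in csv.reader(lines):
        if not row:  # skip blank lines, e.g. at the end of the file
            continue
        points.append((row[id_col], float(row[x_col]), float(row[y_col])))
    return points

# Two rows in the usual Iris layout (four measurements, then the species name).
sample = [
    "5.1,3.5,1.4,0.2,Iris-setosa",
    "7.0,3.2,4.7,1.4,Iris-versicolor",
    "",
]
D = load_points(sample)
```

Because `load_points` accepts any iterable of lines, an open file object can be passed in place of `sample`. Using the species name as the id works because point ids don't have to be unique.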

You should get something that looks like figure_1.png, which is included in the repository.
