# Introduction
Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a data clustering algorithm that identifies clusters as regions of high point density separated by regions of low density. Unlike K-Means or GMMs, which fit a predefined number of centres or components, DBSCAN groups points by how densely they are packed.

### Density based clustering
DBSCAN does not require pre-defining the number of clusters or specifying initial centroids. It discovers clusters based on how closely packed the data points are in a specific area.

### Outlier handling
DBSCAN excels at identifying outliers: points with too few neighbors are classified as noise instead of being forced into a cluster, which makes it robust to data with outliers.

### Flexible clustering
DBSCAN isn't limited to spherical clusters like K-Means. It can effectively handle clusters of arbitrary shapes, such as elongated or crescent-shaped ones.

### Intuition behind DBSCAN
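Unlike K-Means and GMMs, which assign each point to the nearest centre or the most likely component, DBSCAN looks at the local density of points to identify clusters. It still measures distances, but only to count how many neighbors a point has within a fixed radius. This makes it adept at handling datasets with outliers or clusters of irregular shapes.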

Imagine a dataset visualized as points on a scatter plot. Dense regions with many neighboring points likely represent clusters, while isolated points could be outliers. DBSCAN leverages this intuition to group points.

### Advantages
- Robust to outliers: DBSCAN can effectively identify outliers due to its density-based approach.
- Flexible cluster shapes: It can handle clusters of arbitrary shapes, unlike K-Means which assumes spherical clusters.

### Disadvantages
- Parameter sensitivity: The performance of DBSCAN is highly dependent on choosing the appropriate values for epsilon (`eps`) and minimum data points (`min_samples`).
- High time complexity: Without a spatial index, DBSCAN checks every point against every other point, giving a worst-case time complexity of O(n²). This can make it slower than K-Means, especially on large datasets.

# Terminology in DBSCAN
DBSCAN operates on a unique principle: identifying clusters based on the density of data points.

### Core points
- Imagine a data point, p.
- Epsilon (`eps`): This parameter defines a radius around p. Think of it as a circle or sphere (a hypersphere in higher dimensions) centered at p.
- Density at p: This refers to the number of points within the epsilon-neighborhood of p. In simpler terms, it captures how many other data points are close to p.
- A point p is crowned a core point if the density at p (the number of points within its epsilon-neighborhood) is greater than or equal to the minimum points parameter (`min_samples`). Essentially, a core point has a high concentration of neighbors around it, suggesting it resides within a dense region.
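
The core point test above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the helper names `eps_neighborhood` and `is_core_point` are hypothetical, distance is assumed to be Euclidean, and a point is assumed to count itself as part of its own neighborhood (the convention scikit-learn documents for `min_samples`; the text above does not fix this choice).

```python
import math

def eps_neighborhood(points, p, eps):
    """Return every point in `points` whose distance to `p` is at most `eps`.

    Assumption: if `p` is itself in `points`, it is part of its own neighborhood.
    """
    return [q for q in points if math.dist(p, q) <= eps]

def is_core_point(points, p, eps, min_samples):
    """A point is a core point if its eps-neighborhood holds at least `min_samples` points."""
    return len(eps_neighborhood(points, p, eps)) >= min_samples

# Four points packed near the origin and one far away.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.4, 0.4), (5.0, 5.0)]

print(is_core_point(points, (0.0, 0.0), eps=1.0, min_samples=4))  # True
print(is_core_point(points, (5.0, 5.0), eps=1.0, min_samples=4))  # False
```

The point at the origin has four points (itself included) within a radius of 1.0, so it meets `min_samples=4`, while the isolated point only finds itself.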

### Border points
Not all points fall neatly into the core category. Some exist on the outskirts of clusters. A border point is a point that:
- Is not a core point itself. It does not have enough points in its own epsilon-neighborhood to meet the `min_samples` criterion and be classified as a core point.
- Lies within the epsilon-neighborhood of a core point. It benefits from the density of a nearby core point, even though it does not have a high density of neighbors itself.

Think of border points as people living in the suburbs of a city: they're close to a densely populated area (the core point), but their own neighborhood might be less populated.

### Outliers (or noise)
- Data can have isolated points that do not seem to belong to any well-defined cluster.
- Points that are neither core points nor border points are classified as outliers or noise.
- These points typically have low density around them (fewer than `min_samples` neighbors within `eps`). They are the loners of the dataset, far away from the hustle and bustle of the clusters.
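
The three categories (core, border, noise) can be assigned in one pass once each point's neighborhood is known. The sketch below is illustrative only: `classify_points` is a hypothetical helper, distance is assumed to be Euclidean, and each point is assumed to count itself as a neighbor.

```python
import math

def classify_points(points, eps, min_samples):
    """Label each point as 'core', 'border', or 'noise'."""
    # Indices of all points within eps of each point (the point itself included).
    neighbors = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    is_core = [len(nb) >= min_samples for nb in neighbors]

    labels = []
    for i in range(len(points)):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in neighbors[i]):
            # Not dense enough itself, but inside a core point's neighborhood.
            labels.append("border")
        else:
            labels.append("noise")
    return labels

points = [
    (0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.4, 0.4),  # dense group
    (1.2, 0.4),                                      # on the edge of the group
    (5.0, 5.0),                                      # isolated
]
print(classify_points(points, eps=1.0, min_samples=4))
# ['core', 'core', 'core', 'core', 'border', 'noise']
```

The point `(1.2, 0.4)` has only three points in its own neighborhood, but two of them are core points, so it becomes a border point instead of noise.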

### Hyperparameters
DBSCAN's performance relies heavily on two crucial hyperparameters:
- `min_samples`: This defines the minimum number of neighbors a point needs to be considered a core point. Setting `min_samples` too high might break up the clusters, while setting it too low might include noise points as core points.
- `eps`: This defines the radius of the neighborhood around a point. A small `eps` might miss points that truly belong to the same cluster, while a large `eps` might merge distinct clusters together. Choosing the right `eps` is like picking the perfect zoom level on a map - the details of the cluster should be clearly visible without getting lost in the bigger picture.
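
One common heuristic for picking `eps`, which goes beyond what is described above, is to look at each point's distance to its k-th nearest neighbor and sort those distances. Points inside clusters have small values, outliers have large ones, and a sharp jump in the sorted list suggests a reasonable `eps`. The sketch below assumes Euclidean distance; `k_distances` is a hypothetical helper, and the choice of `k` (often tied to `min_samples`) is a judgment call.

```python
import math

def k_distances(points, k):
    """Distance from each point to its k-th nearest other point, sorted ascending."""
    result = []
    for i, p in enumerate(points):
        # Distances to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)

points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.4, 0.4), (5.0, 5.0)]
kd = k_distances(points, k=2)
print([round(d, 3) for d in kd])
```

For this toy dataset the first four values are at most 0.5 and the last one is above 6, so any `eps` slightly above 0.5 keeps the dense group together while leaving the isolated point out. On real data the jump is rarely this clean, so the plot is a starting point and not a rule.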

### Density edges
A density edge is a connection between two data points (vertices) where the distance between them is less than or equal to `eps`. It is essentially a line segment connecting 2 points that lie within each other's epsilon-neighborhood. Imagine 2 friends living close enough to walk to each other's house - that is a density edge.
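
As a small illustration, the density edges of a dataset can be listed by checking every pair of points. This is a sketch assuming Euclidean distance; `density_edges` is a hypothetical helper name.

```python
import math
from itertools import combinations

def density_edges(points, eps):
    """Return index pairs (i, j), with i < j, whose points are at most `eps` apart."""
    return [
        (i, j)
        for (i, p), (j, q) in combinations(enumerate(points), 2)
        if math.dist(p, q) <= eps
    ]

points = [(0.0, 0.0), (0.6, 0.0), (1.2, 0.0), (5.0, 5.0)]
print(density_edges(points, eps=0.7))  # [(0, 1), (1, 2)]
```

Points 0 and 2 are 1.2 apart, so there is no direct edge between them, and the isolated point 3 has no edges at all.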

### Density connected points
This concept extends the idea of density edges. Imagine 2 points, p and q, that might be far apart directly. However, if there's a chain of points between them, with each consecutive pair joined by a density edge (at most `eps` apart), then p and q are considered to be density connected. In DBSCAN, the intermediate points of such a chain must be core points, so a cluster can only grow through dense regions. This captures the notion of clusters that might be elongated or winding. Think of a long winding road connecting 2 towns - even though the towns themselves might be far apart, they are connected by the road (the chain of density connected points).
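
Putting the pieces together, clusters can be formed by starting at a core point and following density edges outward. The sketch below is a minimal, unoptimized illustration under stated assumptions and not a production implementation: `dbscan` and `NOISE` are hypothetical names, distance is Euclidean, a point counts itself as a neighbor, and only core points are allowed to extend the chain (border points join a cluster but do not grow it).

```python
import math
from collections import deque

NOISE = -1  # label used here for points that end up in no cluster

def dbscan(points, eps, min_samples):
    """Return one cluster label per point; NOISE marks outliers."""
    n = len(points)
    # Precompute eps-neighborhoods as index lists (O(n^2) pairwise checks).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_samples for nb in neighbors]

    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None or not is_core[i]:
            continue
        # Start a new cluster at an unlabeled core point and follow the chain.
        labels[i] = cluster
        queue = deque([i])
        while queue:
            current = queue.popleft()
            for j in neighbors[current]:
                if labels[j] is None:
                    labels[j] = cluster
                    # Only core points extend the chain further.
                    if is_core[j]:
                        queue.append(j)
        cluster += 1
    return [NOISE if label is None else label for label in labels]

points = [
    (0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.5, 0.0), (2.0, 0.0),  # elongated chain
    (10.0, 10.0), (10.5, 10.0), (10.0, 10.5),                    # small compact group
    (5.0, 5.0),                                                  # isolated point
]
print(dbscan(points, eps=0.6, min_samples=3))
# [0, 0, 0, 0, 0, 1, 1, 1, -1]
```

The two ends of the chain are 2.0 apart, far more than `eps`, yet they share a label because the core points between them link them step by step.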