### DBSCAN

DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise. It can find arbitrarily shaped clusters and cope with noisy data by explicitly labeling outliers.

The main idea behind DBSCAN is that a point belongs to a cluster if it is close to many points from that cluster.

#### In which cases might it be more useful to apply?

DBSCAN is a density-based clustering technique. For arbitrarily shaped clusters or data containing outliers, density-based techniques are often more effective than partition-based (e.g. k-means) and hierarchical clustering.

The following datasets are examples where techniques like DBSCAN are useful:

![image.png](attachment:image.png)

![image.png](attachment:image.png)

The data points in these figures are grouped in arbitrary shapes or include outliers. Density-based clustering algorithms are well suited to finding high-density regions and separating them from outliers. Detecting outliers is important for some tasks, e.g. anomaly detection.

Examples of applications where DBSCAN can be useful include:

- Identifying clusters of customer locations for targeted marketing campaigns.
- Clustering spatial data such as GPS locations, weather data, and satellite imagery.
- Identifying clusters of genes with similar expression patterns in genomic data.
- Clustering social network data to identify communities of users with similar interests or behaviors.
- Clustering text data to identify groups of documents with similar topics.

#### What are the mathematical fundamentals of it?

The DBSCAN algorithm is based on two key concepts: density and neighborhood.

Density: In DBSCAN, density is defined as the number of data points within a given radius (epsilon, ε) of a particular point. The density of a point is important because it determines whether the point is a core point, a border point, or a noise point.

Neighborhood: In DBSCAN, the neighborhood of a point is defined as the set of points within a distance of ε from that point. The neighborhood of a point is important because it is used to determine whether a point is a core point or a border point.
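
The two definitions above can be illustrated with a minimal sketch. It assumes Euclidean distance and uses a hypothetical helper name, `region_query`, and toy data that are not part of the original text; other distance metrics would work the same way.

```python
import math

def region_query(points, i, eps):
    """Return the indices of all points within distance eps of points[i].

    The point itself is always included, since its distance to itself is 0.
    """
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

# Toy data (assumed for illustration): three nearby points and one far away.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (5.0, 5.0)]
eps = 1.0

neighborhood = region_query(points, 0, eps)  # the eps-neighborhood of point 0
density = len(neighborhood)                  # the density of point 0
print(neighborhood, density)
```

Here the density of a point is simply the size of its ε-neighborhood, so both concepts come from the same distance computation.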

There are two key parameters of DBSCAN:

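- eps (ε): The distance that specifies the neighborhoods. Two points are considered neighbors if the distance between them is less than or equal to eps.
- minPts: The minimum number of data points (including the point itself) that must lie within eps of a point for its neighborhood to count as dense, i.e. for the point to be a core point.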


Based on these two parameters, each point is classified as a core point, a border point, or an outlier:

- Core point: A point is a core point if there are at least minPts points (including the point itself) in its surrounding area with radius eps.
- Border point: A point is a border point if it is reachable from a core point but has fewer than minPts points within its surrounding area.
- Outlier: A point is an outlier if it is not a core point and is not reachable from any core point.
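
The three categories can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: it assumes Euclidean distance, takes "reachable" to mean lying within eps of a core point, and uses a hypothetical function name `classify_points` with made-up toy data.

```python
import math

def classify_points(points, eps, min_pts):
    """Label each point as 'core', 'border', or 'outlier'."""
    # eps-neighborhood of every point (each point is its own neighbor).
    neighbors = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    is_core = [len(nb) >= min_pts for nb in neighbors]

    labels = []
    for i, nb in enumerate(neighbors):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in nb):
            # Not dense itself, but inside the neighborhood of a core point.
            labels.append("border")
        else:
            labels.append("outlier")
    return labels

# Toy data: a tight square of four points, one point on its edge, one far away.
points = [(0.0, 0.0), (0.4, 0.0), (0.0, 0.4), (0.4, 0.4), (1.1, 0.4), (5.0, 5.0)]
labels = classify_points(points, eps=0.8, min_pts=4)
print(labels)
```

With these values the four square corners are core points, the point at (1.1, 0.4) is a border point because it lies within eps of only one corner, and the point at (5.0, 5.0) is an outlier.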


![image.png](attachment:image.png)

In this case, minPts is 4. The red points are core points because there are at least 4 points within their surrounding area with radius eps. This area is shown by the circles in the figure. The yellow points are border points because they are reachable from a core point but have fewer than 4 points within their neighborhood. Reachable means being in the surrounding area of a core point. The points B and C each have two points (including the point itself) within their neighborhood (i.e. the surrounding area with a radius of eps). Finally, N is an outlier because it is not a core point and cannot be reached from a core point.

#### Algorithm 

1. Choose minPts and eps (ε).

2. For each point in the dataset, count the number of points within a distance ε of that point (including the point itself). If this number is greater than or equal to minPts, the point is a core point. Otherwise, it is provisionally a border point or a noise point.

3. For each core point not yet assigned to a cluster, form a cluster by connecting it to all other core points within a distance ε, and then to their core neighbors in turn. Core points connected in this way belong to the same cluster.

4. Assign each border point to the cluster of a core point within a distance ε of it. Any non-core point that is not within ε of a core point is labeled as noise.

5. Repeat steps 3 and 4 until every point has been assigned to a cluster or labeled as noise.
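
The procedure above can be sketched as a short pure-Python function. This is a minimal sketch under stated assumptions, not an optimized or library implementation: it assumes Euclidean distance, precomputes all pairwise neighborhoods (quadratic time and memory), and uses hypothetical names (`dbscan`, `NOISE`) and toy data chosen for illustration. A border point within eps of core points from two clusters is assigned to whichever cluster reaches it first.

```python
import math
from collections import deque

NOISE = -1  # label used here for points that belong to no cluster

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    n = len(points)
    # eps-neighborhood of every point (each point is its own neighbor).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbors]

    labels = [None] * n
    cluster_id = 0
    for i in range(n):
        # Only an unassigned core point can start a new cluster.
        if labels[i] is not None or not is_core[i]:
            continue
        labels[i] = cluster_id
        queue = deque([i])  # the queue only ever holds core points
        while queue:
            current = queue.popleft()
            for j in neighbors[current]:
                if labels[j] is None:
                    labels[j] = cluster_id  # core or border point joins the cluster
                    if is_core[j]:
                        queue.append(j)     # only core points are expanded further
        cluster_id += 1

    # Anything never reached from a core point is noise.
    return [NOISE if lab is None else lab for lab in labels]

# Toy data: two square clusters, one border point, and one isolated point.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),          # cluster A
    (2.2, 1.0),                                               # border point of A
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),  # cluster B
    (5.0, 5.0),                                               # isolated point
]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

In this toy run the point at (2.2, 1.0) has only two points in its neighborhood, so it is not a core point, but it joins the first cluster because it lies within eps of the core point (1.0, 1.0).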

#### Is there any relation between DBSCAN and Spectral Clustering? If so, what is it?

DBSCAN and Spectral Clustering are two different clustering algorithms, but they do share some similarities.

Spectral Clustering is a graph-based clustering algorithm that works by representing the data points as nodes in a graph and clustering them based on the graph structure. The algorithm first constructs an affinity matrix that measures the similarity between pairs of data points, and then uses spectral decomposition to cluster the data points based on the eigenvectors of a graph Laplacian derived from the affinity matrix, typically by running k-means on the resulting low-dimensional embedding.
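
As an illustration, one common construction is shown here. It is an assumption, since the text above does not fix a particular affinity: it uses a Gaussian kernel with bandwidth $\sigma$ and the unnormalized graph Laplacian, with $\sigma$ and the number of clusters $k$ chosen by the user.

$$
W_{ij} = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right), \qquad
D_{ii} = \sum_{j} W_{ij}, \qquad
L = D - W
$$

In this formulation, the eigenvectors of $L$ belonging to the $k$ smallest eigenvalues are stacked as columns, and the rows of that matrix serve as the new coordinates of the data points, which are then clustered.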

DBSCAN, on the other hand, is a density-based clustering algorithm that works by identifying regions of high density in the data and grouping the data points in those regions into clusters.

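One way in which DBSCAN and Spectral Clustering are related is that both can cluster data that is not linearly separable. Spectral Clustering does so by embedding the data in a lower-dimensional space using eigenvectors derived from the affinity matrix, where the clusters become easier to separate, while DBSCAN identifies clusters of arbitrary shape by grouping data points based on their density. Both can also be viewed as operating on a neighborhood graph: DBSCAN's clusters correspond to the connected components of the graph that links core points lying within ε of each other, with border points attached to a neighboring core point.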