# Density-Based Clustering with the USArrests Data Set

## 1. Data Collection

**Description:**

This data set contains statistics, in arrests per 100,000 residents, for assault, murder, and rape in each of the 50 US states in 1973. Also given is the percent of the population living in urban areas.

The data set has 50 records, one for each US state, and 4 variables:
- **Murder**: Murder arrests (per 100,000)
- **Assault**: Assault arrests (per 100,000)
- **Rape**: Rape arrests (per 100,000)
- **UrbanPop**: Percent urban population


## 2. Data Presentation

In [None]:
dat_raw = USArrests
head(dat_raw)

In [None]:
summary(dat_raw)
options(repr.plot.width = 6, repr.plot.height = 6, repr.plot.res = 200)
require(graphics)
pairs(dat_raw, panel = panel.smooth, main = "USArrests data")

## 3. Analysis and Interpretation

### 3.1. Analysis with K-means Methods

In [None]:
#install.packages("factoextra")
#install.packages("NbClust")
require(factoextra)
require(NbClust)
dat = scale(dat_raw)
options(repr.plot.width = 6, repr.plot.height = 4, repr.plot.res = 200)
fviz_nbclust(dat, kmeans, method = "wss") + geom_line()

#### Choose k = 4: the number of clusters stays small, and the differences between individuals within each cluster are also small.
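
The elbow reading above can also be checked numerically. The following is a minimal sketch, not the notebook's own method: it recomputes the total within-cluster sum of squares for k = 1 to 10 with base R's `kmeans`. The seed value and `nstart = 25` are assumptions added here for stability.

```r
# Sketch: total within-cluster sum of squares (WSS) for k = 1..10
# on the scaled USArrests data.
dat <- scale(USArrests)

set.seed(123)  # assumed seed; kmeans starts from random centers
wss <- sapply(1:10, function(k) {
  kmeans(dat, centers = k, nstart = 25)$tot.withinss
})

# Relative drop in WSS when moving from k - 1 to k clusters
rel_drop <- c(NA, -diff(wss) / wss[-length(wss)])

print(round(data.frame(k = 1:10, wss = wss, rel_drop = rel_drop), 3))
```

A value of k after which `rel_drop` becomes small is a candidate elbow. Reading it off remains a judgment call, as it is with the plot.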

In [None]:
set.seed(123)  # kmeans starts from random centers; a fixed seed makes the run repeatable
km = kmeans(dat, centers = 4)
options(repr.plot.width = 7, repr.plot.height = 4.5, repr.plot.res = 200)
fviz_cluster(km, dat, ellipse.type = "norm")
summary(km)
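
For a `kmeans` object, `summary(km)` only lists the length and type of each component. To interpret the clusters, one option is to look at cluster sizes and per-cluster means in the original units. This is a sketch under assumptions: the seed and `nstart = 25` are not part of the run above, so the cluster labels may differ from it.

```r
# Sketch: profile the k-means clusters in the original (unscaled) units.
dat <- scale(USArrests)

set.seed(123)                          # assumed seed
km_fit <- kmeans(dat, centers = 4, nstart = 25)

# Number of states per cluster
print(km_fit$size)

# Mean of each variable within each cluster, on the original scale
profile <- aggregate(USArrests,
                     by = list(cluster = km_fit$cluster),
                     FUN = mean)
print(profile)
```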

In [None]:
#options(repr.plot.width = 6, repr.plot.height = 6, repr.plot.res = 200)
#plot(km,dat_raw)

### 3.2. Analysis with Density-Based Methods

In [None]:
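#install.packages("fpc")
#install.packages("dbscan")
require(fpc)
require(dbscan)
# Obtain a candidate eps value from the k-nearest-neighbour distance plot
dens_dat = dat_raw[-3]  # drops the third column (UrbanPop)
head(dens_dat)
options(repr.plot.width = 6, repr.plot.height = 4, repr.plot.res = 200)
kNNdistplot(dens_dat, k = 3)
abline(h = 22, lty = 2)  # dashed reference line at distance 22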
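
To make the plot less of a black box, the sketch below computes the distance from each state to its 3rd nearest neighbour using only base R, which is the quantity a k-NN distance plot is built on. It is an illustration, not the `dbscan` package's implementation, and the exact set of distances the package draws may vary by version.

```r
# Sketch: k-th nearest-neighbour distances with base R only.
x <- as.matrix(USArrests[-3])  # same columns as dens_dat: UrbanPop dropped
k <- 3

d <- as.matrix(dist(x))        # pairwise Euclidean distances

# In each sorted row, position 1 is the point itself (distance 0),
# so the k-th neighbour sits at position k + 1.
knn_k <- apply(d, 1, function(r) sort(r)[k + 1])

sorted_knn <- sort(knn_k)      # the curve, in increasing order
print(round(tail(sorted_knn, 5), 2))

# States whose 3rd neighbour lies farther away than the reference line at 22
print(names(knn_k)[knn_k > 22])
```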

In [None]:
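# Density-based clustering with DBSCAN
# Note: this runs on all four unscaled columns of dat_raw,
# whereas the k-NN distance plot above used dens_dat (UrbanPop dropped).
set.seed(123)
res <- fpc::dbscan(dat_raw,
                   eps = 25,
                   MinPts = 3)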
res
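
The roles of `eps` and `MinPts` can be illustrated without any clustering package. The sketch below counts, for each state, how many states lie within `eps` and flags candidate core points. It assumes the point itself counts toward `MinPts`, which is one common convention and may not match every implementation; `min_pts`, `n_within`, and `is_core` are names introduced here for illustration.

```r
# Sketch: neighbourhood counts behind DBSCAN's core-point rule (base R only).
x <- as.matrix(USArrests)      # all four unscaled columns, as in the call above
eps <- 25
min_pts <- 3

d <- as.matrix(dist(x))        # pairwise Euclidean distances

# Number of states within eps of each state (the state itself included)
n_within <- rowSums(d <= eps)

# Assumed convention: a core point has at least min_pts points in its
# eps-neighbourhood, counting itself.
is_core <- n_within >= min_pts
print(table(is_core))

# A state with no other state within eps cannot join any cluster,
# so DBSCAN would label it as noise.
print(rownames(x)[n_within == 1])
```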

In [None]:
options(repr.plot.width = 9, repr.plot.height = 4.5, repr.plot.res = 200)
res.plot = fviz_cluster(res,dat_raw,geom = "point")
res.plot

In [None]:
options(repr.plot.width = 6, repr.plot.height = 6, repr.plot.res = 200)
plot(res,dat_raw)