# Lab Assignment 07
The objective of this lab assignment is to cluster the customers of a telephone company (`data_lab_07.csv`) using different clustering techniques and to evaluate the clusters found.

#### Instructions:
Complete each task and question by filling in the blanks (`...`) with one or more lines of code or text. Tasks 1-11 and questions 1-5 are worth **0.5 points** each and questions 6-7 are worth **1 point** each (out of **10 points**).

#### Submission:
This assignment is due **Monday, November 18, at 11:59PM (Central Time)**.

This assignment must be submitted on Gradescope as a **PDF file** containing the completed code for each task and the corresponding output. Late submissions will be accepted within **0-12** hours after the deadline with a **0.5-point (5%) penalty** and within **12-24** hours after the deadline with a **2-point (20%) penalty**. No late submissions will be accepted more than 24 hours after the deadline.

**This assignment is individual**. Offering or receiving any kind of unauthorized or unacknowledged assistance is a violation of the University’s academic integrity policies, will result in a grade of zero for the assignment, and will be subject to disciplinary action.

### Part 1: Hierarchical Clustering

In [None]:
# Load libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler  
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans, DBSCAN
from sklearn import metrics

In [None]:
# Load dataset and display the first five rows
data = pd.read_csv('data_lab_07.csv')
data.head()

**Task 01 (of 11): Create a new numerical attribute named 'Total charge' that contains the sum of the attributes 'Total day charge', 'Total eve charge', and 'Total night charge'.**

In [None]:
data['Total charge'] = ...
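For reference, a row-wise sum over several columns can be sketched on a small DataFrame with made-up values (not taken from `data_lab_07.csv`): select the columns, then call `sum(axis=1)` to add them within each row.

```python
import pandas as pd

# Made-up toy data; the column names mirror the lab, the values are illustrative only
toy = pd.DataFrame({
    'Total day charge': [45.07, 27.47, 41.38],
    'Total eve charge': [16.78, 16.62, 10.30],
    'Total night charge': [11.01, 11.45, 7.32],
})

charge_cols = ['Total day charge', 'Total eve charge', 'Total night charge']
# axis=1 sums across the selected columns within each row
toy['Total charge'] = toy[charge_cols].sum(axis=1)
print(toy['Total charge'].round(2).tolist())
```

Adding the three columns with `+` gives the same result here. Note that `sum(axis=1)` skips missing values by default, whereas `+` propagates them.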

In [None]:
# Partition the dataset into attributes and true clusters (churned/non-churned)
# Consider only the following attributes: 'International plan', 'Total charge', and 'Customer service calls'
X = data[['International plan', 'Total charge', 'Customer service calls']]
Y = data['Churn']

**Task 02 (of 11): Standardize the attributes.**

In [None]:
scaler = StandardScaler()
scaler.fit(...)
X_scaled = scaler.transform(...)
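Standardization matters here because the attributes are on very different scales ('Total charge' spans tens of dollars, while 'International plan' is presumably a 0/1 flag), so unscaled Euclidean distances would be dominated by 'Total charge'. A minimal sketch on a made-up matrix, checking that `StandardScaler` maps each column to mean 0 and standard deviation 1 (population standard deviation):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up matrix: columns stand in for a 0/1 flag, a charge, and a call count
X_toy = np.array([
    [0, 72.86, 1],
    [0, 55.54, 1],
    [1, 59.00, 0],
    [1, 65.20, 2],
    [0, 49.10, 3],
], dtype=float)

scaler_toy = StandardScaler()
scaler_toy.fit(X_toy)                       # learns the per-column mean and std
X_toy_scaled = scaler_toy.transform(X_toy)  # applies (x - mean) / std column by column

# Each standardized column now has mean ~0 and std ~1
print(X_toy_scaled.mean(axis=0).round(6))
print(X_toy_scaled.std(axis=0).round(6))
```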

**Task 03 (of 11): Cluster the dataset using hierarchical clustering with the single linkage method.**
_Hint:_ Use single linkage as the method and Euclidean distance as the distance metric.

In [None]:
clustering = linkage(...)
clusters = fcluster(clustering, 2, criterion = 'maxclust')
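A minimal sketch of the same two SciPy calls on six made-up 2-D points: `linkage` with `method='single'` measures the distance between two clusters by their closest pair of points, and `fcluster(..., criterion='maxclust')` cuts the resulting tree into at most 2 flat clusters. Note that `fcluster` numbers clusters starting at 1.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Made-up 2-D points: two well-separated groups of three points each
pts = np.array([
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
    [5.0, 5.0], [5.1, 5.2], [5.2, 5.1],
])

# Single linkage: cluster distance = distance of the closest pair of points
Z = linkage(pts, method='single', metric='euclidean')

# Cut the tree so that at most 2 flat clusters remain
labels = fcluster(Z, 2, criterion='maxclust')
print(labels)
```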

**Task 04 (of 11): Plot the contingency matrix and compute the evaluation metrics for hierarchical clustering with the single linkage method.**

In [None]:
cont_matrix = metrics.cluster.contingency_matrix(...)
sns.heatmap(cont_matrix, annot = True, fmt = "d", square = True, cmap = plt.cm.Blues)  # counts are integers
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.title('Contingency matrix')
plt.tight_layout()

In [None]:
adjusted_rand_index = ...
silhouette_coefficient = ...
print([adjusted_rand_index, silhouette_coefficient])
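The three evaluation tools can be tried on a tiny hand-made example (the feature values and both labelings are made up). The contingency matrix counts how many points of each true class fall into each predicted cluster, the adjusted Rand index (ARI) compares the two labelings, and the silhouette coefficient uses only the features and the predicted labels:

```python
import numpy as np
from sklearn import metrics

# Made-up data: one feature, true classes, and a clustering with one misassigned point
X_toy = np.array([[0.0], [0.2], [0.4], [5.0], [5.2], [5.4]])
y_true = np.array([0, 0, 0, 1, 1, 1])
y_pred = np.array([1, 1, 2, 2, 2, 2])

# Rows = true classes, columns = predicted clusters (argument order matters)
cm = metrics.cluster.contingency_matrix(y_true, y_pred)
print(cm)

# ARI compares two labelings and ignores the actual label values
ari = metrics.adjusted_rand_score(y_true, y_pred)
# Silhouette uses the features and the predicted labels, not the ground truth
sil = metrics.silhouette_score(X_toy, y_pred, metric='euclidean')
print(ari, sil)
```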

In [None]:
# Plot clusters found using hierarchical clustering with the single linkage method
data['clusters'] = clusters
ax = data.plot(kind = 'scatter', x = 'Total charge', y = 'International plan', c = 'clusters', colormap = plt.cm.brg)
ax = data.plot(kind = 'scatter', x = 'Total charge', y = 'Customer service calls', c = 'clusters', colormap = plt.cm.brg)

**Question 01 (of 07): What can you conclude about the clusters found using hierarchical clustering with the single linkage method from the plot and the evaluation metrics?**

**Answer:** . . .

**Task 05 (of 11): Cluster the dataset using hierarchical clustering with the complete linkage method.**
_Hint:_ Use complete linkage as the method and Euclidean distance as the distance metric.

In [None]:
clustering = linkage(...)
clusters = fcluster(clustering, 2, criterion = 'maxclust')
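The difference between the two linkage methods shows up on a made-up 1-D "chain" of evenly spaced points plus one slightly isolated point. Single linkage measures cluster distance by the closest pair, so the chain merges into one cluster and a 2-cluster cut splits off the isolated point. Complete linkage uses the farthest pair, which penalizes long, stretched clusters, so both of its clusters contain more than one point:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Made-up 1-D data: a chain 0, 1, ..., 9 plus one point at 12
pts = np.array([[float(v)] for v in list(range(10)) + [12]])

def cluster_sizes(method):
    """Cut the tree into at most 2 clusters and return the sorted cluster sizes."""
    Z = linkage(pts, method=method, metric='euclidean')
    labels = fcluster(Z, 2, criterion='maxclust')
    return sorted(np.bincount(labels)[1:].tolist())  # fcluster labels start at 1

single_sizes = cluster_sizes('single')
complete_sizes = cluster_sizes('complete')
print('single:', single_sizes)      # single: [1, 10]
print('complete:', complete_sizes)
```

The exact complete-linkage split depends on how ties between equal distances are broken, but neither of its two clusters is a singleton on this data.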

**Task 06 (of 11): Plot the contingency matrix and compute the evaluation metrics for hierarchical clustering with the complete linkage method.**

In [None]:
cont_matrix = metrics.cluster.contingency_matrix(...)
sns.heatmap(cont_matrix, annot = True, fmt = "d", square = True, cmap = plt.cm.Blues)  # counts are integers
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.title('Contingency matrix')
plt.tight_layout()

In [None]:
adjusted_rand_index = ...
silhouette_coefficient = ...
print([adjusted_rand_index, silhouette_coefficient])
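When reading ARI values, it helps to keep its reference points in mind: it equals 1 for identical partitions even if the label values differ, it is about 0 for agreement no better than chance, and it can be negative. A small check with made-up labelings:

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

# Made-up ground truth for 8 points
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Same partition with the label values swapped
ari_renamed = adjusted_rand_score(y_true, np.array([1, 1, 1, 1, 0, 0, 0, 0]))
# Every point in a single cluster
ari_one = adjusted_rand_score(y_true, np.zeros(8, dtype=int))
# Clusters that cut across both true classes
ari_cross = adjusted_rand_score(y_true, np.array([0, 1, 0, 1, 0, 1, 0, 1]))

print(ari_renamed)  # 1.0
print(ari_one)      # 0.0
print(ari_cross)    # about -0.167
```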

In [None]:
# Plot clusters found using hierarchical clustering with the complete linkage method
data['clusters'] = clusters
ax = data.plot(kind = 'scatter', x = 'Total charge', y = 'International plan', c = 'clusters', colormap = plt.cm.brg)
ax = data.plot(kind = 'scatter', x = 'Total charge', y = 'Customer service calls', c = 'clusters', colormap = plt.cm.brg)

**Question 02 (of 07): What can you conclude about the clusters found using hierarchical clustering with the complete linkage method from the plot and the evaluation metrics?**

**Answer:** . . .

### Part 2: K-Means Clustering

**Task 07 (of 11): Cluster the dataset using K-Means clustering.**
_Hint:_ Use random initialization of centroids, 10 iterations, and set the parameter `random_state` to 0.

In [None]:
clustering = KMeans(...).fit(...)
clusters = clustering.labels_
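A sketch of the same estimator on made-up 2-D data, with two assumptions flagged: the hint's "10 iterations" is read here as `n_init=10` (ten runs from different random centroids, keeping the best one; if a per-run iteration cap is meant, the parameter is `max_iter`), and `n_clusters=2` is chosen to match the two true classes and the two flat clusters used with `fcluster`.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up data: two tight groups in 2-D
X_toy = np.array([
    [0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
    [4.0, 4.0], [4.2, 4.1], [4.1, 4.3],
])

# Assumption: "10 iterations" read as n_init=10 restarts; max_iter left at its default
km = KMeans(n_clusters=2, init='random', n_init=10, random_state=0).fit(X_toy)

print(km.labels_)           # cluster index (0 or 1) for each point
print(km.cluster_centers_)  # one centroid per cluster
```

Unlike `fcluster`, `KMeans` numbers clusters from 0; this does not affect the ARI or the silhouette coefficient, which depend only on the grouping.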

**Task 08 (of 11): Plot the contingency matrix and compute the evaluation metrics for K-Means clustering.**

In [None]:
cont_matrix = metrics.cluster.contingency_matrix(...)
sns.heatmap(cont_matrix, annot = True, fmt = "d", square = True, cmap = plt.cm.Blues)  # counts are integers
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.title('Contingency matrix')
plt.tight_layout()

In [None]:
adjusted_rand_index = ...
silhouette_coefficient = ...
print([adjusted_rand_index, silhouette_coefficient])
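To see which points pull the silhouette coefficient down, scikit-learn also provides `silhouette_samples`, which returns one value per point, `s(i) = (b(i) - a(i)) / max(a(i), b(i))`, where `a(i)` is the mean distance to the other points of the same cluster and `b(i)` the mean distance to the points of the nearest other cluster. In the made-up example below one point sits in the wrong cluster; its value is negative, and the mean of all values equals `silhouette_score`:

```python
import numpy as np
from sklearn import metrics

# Made-up data and clustering: the point 0.4 is grouped with the far-away points
X_toy = np.array([[0.0], [0.2], [0.4], [5.0], [5.2], [5.4]])
labels = np.array([0, 0, 1, 1, 1, 1])

# One silhouette value per point
s = metrics.silhouette_samples(X_toy, labels, metric='euclidean')
print(s.round(3))

# silhouette_score is the mean of the per-point values
print(np.isclose(s.mean(), metrics.silhouette_score(X_toy, labels)))  # True
```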

In [None]:
# Plot clusters found using K-Means clustering
data['clusters'] = clusters
ax = data.plot(kind = 'scatter', x = 'Total charge', y = 'International plan', c = 'clusters', colormap = plt.cm.brg)
ax = data.plot(kind = 'scatter', x = 'Total charge', y = 'Customer service calls', c = 'clusters', colormap = plt.cm.brg)

**Question 03 (of 07): What can you conclude about the clusters found using K-Means clustering from the plot and the evaluation metrics?**

**Answer:** . . .

### Part 3: DBSCAN

**Task 09 (of 11): Cluster the dataset using DBSCAN.**
_Hint:_ Use parameters `Eps=2` and `MinPts=5` (named `eps` and `min_samples` in scikit-learn's `DBSCAN`) and Euclidean distance as the distance metric.

In [None]:
clustering = DBSCAN(...).fit(...)
clusters = clustering.labels_
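A sketch of DBSCAN on made-up 2-D data with two dense groups and one isolated point. In scikit-learn, `eps` plays the role of `Eps` and `min_samples` the role of `MinPts` (the count includes the point itself). The `eps=0.5` used here suits the toy data only and is not the lab's value. Because DBSCAN labels noise points as `-1`, that label has to be excluded when counting clusters:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Made-up data: two dense groups of 5 points each plus one isolated point
X_toy = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05],
    [3.0, 3.0], [3.1, 3.0], [3.0, 3.1], [3.1, 3.1], [3.05, 3.05],
    [10.0, 10.0],
])

# eps ~ Eps (toy value), min_samples ~ MinPts
db = DBSCAN(eps=0.5, min_samples=5, metric='euclidean').fit(X_toy)
labels = db.labels_

# Noise is labeled -1, so it is not counted as a cluster
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print(labels)
print(n_clusters, n_noise)  # 2 1
```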

**Task 10 (of 11): Plot the contingency matrix and compute the evaluation metrics for DBSCAN.**

In [None]:
cont_matrix = metrics.cluster.contingency_matrix(...)
sns.heatmap(cont_matrix, annot = True, fmt = "d", square = True, cmap = plt.cm.Blues)  # counts are integers
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.title('Contingency matrix')
plt.tight_layout()

In [None]:
adjusted_rand_index = ...
silhouette_coefficient = ...
print([adjusted_rand_index, silhouette_coefficient])
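One pitfall when evaluating DBSCAN: `silhouette_score` raises a `ValueError` unless the number of distinct labels is between 2 and `n_samples - 1`, so it fails when DBSCAN assigns every point to one cluster or marks every point as noise. It also treats the noise label `-1` as an ordinary cluster. A hypothetical helper, `safe_silhouette` (not part of scikit-learn), that returns `None` instead of failing:

```python
import numpy as np
from sklearn import metrics

def safe_silhouette(X, labels):
    """Hypothetical helper: silhouette coefficient, or None when it is undefined.

    silhouette_score requires 2 <= number of distinct labels <= n_samples - 1.
    DBSCAN noise points (label -1) are counted as one more cluster here.
    """
    n_labels = len(np.unique(labels))
    if n_labels < 2 or n_labels > len(X) - 1:
        return None
    return metrics.silhouette_score(X, labels, metric='euclidean')

# Made-up data
X_toy = np.array([[0.0], [0.1], [0.2], [5.0], [5.1]])
print(safe_silhouette(X_toy, np.zeros(5, dtype=int)))    # None (a single cluster)
print(safe_silhouette(X_toy, np.array([0, 0, 0, 1, 1])))
```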

In [None]:
# Plot clusters found using DBSCAN
data['clusters'] = clusters
ax = data.plot(kind = 'scatter', x = 'Total charge', y = 'International plan', c = 'clusters', colormap = plt.cm.brg)
ax = data.plot(kind = 'scatter', x = 'Total charge', y = 'Customer service calls', c = 'clusters', colormap = plt.cm.brg)

**Question 04 (of 07): How many clusters were found using DBSCAN?**

**Answer:** . . .

**Question 05 (of 07): What can you conclude about the clusters found using DBSCAN from the plot and the evaluation metrics?**

**Answer:** . . .

**Question 06 (of 07): Which of the clustering techniques had the best performance?**

**Answer:** . . .

**Task 11 (of 11): Compute evaluation metrics for the true clusters of the data (churned/non-churned).**

In [None]:
silhouette_coefficient = ...
print(silhouette_coefficient)
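The silhouette coefficient can score any labeling, including the ground truth. In the made-up comparison below, two classes drawn from the same distribution give a score near 0, while well-separated classes score much higher. A low value for the true labels therefore suggests that the classes do not form compact, separated groups in the chosen feature space, which tends to limit what any clustering technique can recover:

```python
import numpy as np
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Made-up data: 200 points from one standard normal distribution, split into two "classes"
X_toy = rng.normal(size=(200, 2))
y_toy = np.repeat([0, 1], 100)

# No geometric separation between the classes: expected to be near 0
score_same = silhouette_score(X_toy, y_toy)

# Same points, with class 1 shifted by +5 in both coordinates: well-separated classes
X_sep = X_toy.copy()
X_sep[100:] += 5.0
score_sep = silhouette_score(X_sep, y_toy)

print(score_same, score_sep)
```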

In [None]:
# Plot true clusters (churned/non-churned)
ax = data.plot(kind = 'scatter', x = 'Total charge', y = 'International plan', c = 'Churn', colormap = plt.cm.brg)
ax = data.plot(kind = 'scatter', x = 'Total charge', y = 'Customer service calls', c = 'Churn', colormap = plt.cm.brg)

**Question 07 (of 07): What can you conclude about the data from the plot and the evaluation metrics?**

**Answer:** . . .