# A Simple Example of Clustering 

This time you are given data on many more countries. Using the same methodology as in the lecture, group all the countries into 2 clusters. 

Try with other numbers of clusters and see if they match your expectations. Maybe 7 is going to be a cool one!

Plot the data using the <i> c </i> parameter to separate the data by the clusters we defined.  

<i> Note: c stands for color </i>

## Import the relevant libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
from sklearn.cluster import KMeans

## Load the data

Load data from the csv file: <i> 'Countries-exercise.csv'</i>.


In [None]:
data=pd.read_csv('Countries-exercise.csv')
data

## Plot the data

Plot the <i>'Longitude'</i> and <i>'Latitude'</i> columns. 

In [None]:
plt.scatter(data['Longitude'],data['Latitude'])
plt.xlim(-180,180)
plt.ylim(-90,90)
plt.show()

## Select the features

Create a copy of the data and remove all columns apart from <i>Longitude</i> and <i>Latitude</i>.

In [None]:
x=data.iloc[:,1:3]
x
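
Selecting by position with `iloc[:,1:3]` depends on the column order in the file. As a minimal sketch, the same selection can be made by column name. The small DataFrame below is made up for illustration (the `name` column and all values are assumptions, not the exercise data):

```python
import pandas as pd

# Hypothetical stand-in for the exercise data; the 'name' column and
# the coordinate values are assumed purely for illustration.
toy = pd.DataFrame({
    'name': ['A', 'B', 'C'],
    'Longitude': [-70.0, 10.0, 135.0],
    'Latitude': [12.5, 51.0, -25.0],
})

# Select the two features by label; .copy() returns an independent DataFrame
x_toy = toy[['Longitude', 'Latitude']].copy()
print(x_toy.columns.tolist())
```

Selecting by name keeps working even if extra columns are added to the file later.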

## Clustering

Assume there are only two clusters. 

In [None]:
# Start with 2 clusters, as stated above; change this number to experiment
kmeans=KMeans(2)

In [None]:
kmeans.fit(x)

### Clustering Results

In [None]:
identified_clusters=kmeans.fit_predict(x)
identified_clusters
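
Note that `fit_predict` fits the model and returns the cluster label of each observation in one step, so the labels it returns are the same ones stored in `labels_` after fitting. A small sketch on made-up points (the data and the `random_state` value are assumptions for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up 2-D points forming two obvious groups (illustrative only)
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])

# Fixing random_state makes both runs start from the same initialization
labels_from_fit = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points).labels_
labels_from_fit_predict = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)

print(labels_from_fit)
print(labels_from_fit_predict)
```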

In [None]:
data_with_clusters=data.copy()
data_with_clusters['Cluster']=identified_clusters
data_with_clusters

Did you remember to use the <i> c </i> parameter to separate the data by the clusters we defined?

In [None]:
plt.scatter(data_with_clusters['Longitude'],data_with_clusters['Latitude'],c=data_with_clusters['Cluster'],cmap='rainbow')
plt.xlim(-180,180)
plt.ylim(-90,90)
plt.show()

If you haven't, go back and play around with the number of clusters. 

Try 3, 7 and 8 and see if the results match your expectations!

### WCSS

In [None]:
kmeans.inertia_
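
The `inertia_` attribute is the within-cluster sum of squares (WCSS): the sum of squared distances from each observation to the centroid of its assigned cluster. The sketch below recomputes it by hand on made-up points to show what the number means (the data and the `random_state` value are assumptions for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up 2-D points (illustrative only, not the country data)
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])

# n_init and random_state are set explicitly so the sketch is reproducible
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)

# Look up the centroid of the cluster each point was assigned to
assigned_centroids = model.cluster_centers_[model.labels_]

# Sum the squared distances between points and their centroids
manual_wcss = ((points - assigned_centroids) ** 2).sum()

print(manual_wcss, model.inertia_)
```

The two printed values should agree up to floating-point rounding.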

In [None]:
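wcss=[]
# cl_num is the exclusive upper bound, so this tries 1 through 10 clusters
cl_num = 11
for i in range(1,cl_num):
    kmeans=KMeans(i)
    kmeans.fit(x)
    # inertia_ is the within-cluster sum of squares for this solution
    wcss_iter=kmeans.inertia_
    wcss.append(wcss_iter)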

In [None]:
wcss

## Elbow Method

In [None]:
number_clusters=range(1,cl_num)
plt.plot(number_clusters,wcss)
plt.title('The Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('Within-cluster sum of squares (WCSS)')

In [None]:
# Judging by where the curve flattens out, a reasonable number of clusters would be 2 or 3
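
Reading the elbow off a plot is a judgment call. One rough way to support it is to look at how much of the previous WCSS each extra cluster removes: the elbow is roughly where that fraction drops sharply. The sketch below uses synthetic blobs rather than the country data, and the names `blobs`, `wcss_demo` and `drops` are made up for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data: three well-separated blobs (illustrative only)
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [20, 0], [0, 20]])
blobs = np.vstack([c + rng.normal(size=(30, 2)) for c in centers])

# WCSS for 1 through 6 clusters
wcss_demo = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(blobs).inertia_
             for k in range(1, 7)]

# Fraction of the previous WCSS removed by adding one more cluster
drops = [1 - wcss_demo[i] / wcss_demo[i - 1] for i in range(1, len(wcss_demo))]

for k, d in zip(range(2, 7), drops):
    print(k, round(d, 2))
```

For data like this, the improvement should be large up to 3 clusters and much smaller afterwards, which is the pattern the elbow plot shows visually. This is a heuristic aid, not a replacement for inspecting the plot.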