<h1>Segmenting and Clustering Neighborhoods in Toronto</h1>

## Week-3 Notebook-3:

For this assignment, you will be required to explore and cluster the neighborhoods in Toronto.

Start by creating a new Notebook for this assignment.
Use the Notebook to build the code that scrapes the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, obtains the data in the table of postal codes, and transforms it into a pandas dataframe.

## 1. Install & Import

In [1]:
import requests 
import pandas as pd 
import numpy as np 
import random 

!pip install beautifulsoup4
from bs4 import BeautifulSoup
print("\rbeautifulsoup4 OK")

!pip install lxml
print("\rlxml OK")

beautifulsoup4 OK
lxml OK


## 2. Request to Wikipedia to get HTML Table

In [2]:
# Download the page and parse it with the lxml parser.
source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(source, 'lxml')

# soup.table is the first <table> element on the page.
table = str(soup.table)
print(table)

<table class="wikitable sortable">
<tbody><tr>
<th>Postal Code
</th>
<th>Borough
</th>
<th>Neighbourhood
</th></tr>
<tr>
<td>M1A
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M2A
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3A
</td>
<td>North York
</td>
<td>Parkwoods
</td></tr>
<tr>
<td>M4A
</td>
<td>North York
</td>
<td>Victoria Village
</td></tr>
<tr>
<td>M5A
</td>
<td>Downtown Toronto
</td>
<td>Regent Park, Harbourfront
</td></tr>
<tr>
<td>M6A
</td>
<td>North York
</td>
<td>Lawrence Manor, Lawrence Heights
</td></tr>
<tr>
<td>M7A
</td>
<td>Downtown Toronto
</td>
<td>Queen's Park, Ontario Provincial Government
</td></tr>
<tr>
<td>M8A
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M9A
</td>
<td>Etobicoke
</td>
<td>Islington Avenue, Humber Valley Village
</td></tr>
<tr>
<td>M1B
</td>
<td>Scarborough
</td>
<td>Malvern, Rouge
</td></tr>
<tr>
<td>M2B
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3B
</td>
<td
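
The printed markup shows that the text of every cell ends with a newline, so any manual extraction has to strip whitespace. The following is a minimal sketch of pulling the rows out of such a table with BeautifulSoup. The `sample_html` string and the `table_rows` helper are illustrative assumptions, not part of the notebook, and the built-in `html.parser` is used instead of `lxml` only to keep the example free of extra dependencies.

```python
from bs4 import BeautifulSoup

# Hypothetical inline sample that mimics the structure printed above.
sample_html = """
<table class="wikitable sortable">
<tbody><tr>
<th>Postal Code
</th>
<th>Borough
</th>
<th>Neighbourhood
</th></tr>
<tr>
<td>M1A
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3A
</td>
<td>North York
</td>
<td>Parkwoods
</td></tr>
</tbody></table>
"""


def table_rows(html):
    """Return (header, rows) with whitespace stripped from every cell."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ matches any one of the classes in a multi-valued class attribute.
    table = soup.find("table", class_="wikitable")
    header = [th.get_text(strip=True) for th in table.find_all("th")]
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # the header row contains no <td> cells
            rows.append(cells)
    return header, rows


header, rows = table_rows(sample_html)
print(header)
print(rows)
```

Selecting the table by its `wikitable` class rather than by position may be a little more robust if the page layout changes, though that is a design preference rather than a requirement.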

## 3. HTML Table to Pandas DataFrame

In [3]:
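# Read the HTML table; read_html returns a list of DataFrames.
# Recent pandas versions expect literal HTML to be wrapped in StringIO.
from io import StringIO
tab = pd.read_html(StringIO(table))
df = tab[0]
df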

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
...,...,...,...
175,M5Z,Not assigned,Not assigned
176,M6Z,Not assigned,Not assigned
177,M7Z,Not assigned,Not assigned
178,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,..."


## 4. Delete 'Not assigned' rows.

In [4]:
# Drop the rows whose borough is 'Not assigned'.
dfc = df[df['Borough'] != 'Not assigned']
dfc

Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
160,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
165,M4Y,Downtown Toronto,Church and Wellesley
168,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
169,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."
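
The filtered frame above keeps the original row labels (2, 3, 4, ...). If a contiguous index is preferred, the same boolean-mask filter can be followed by `reset_index`. The sketch below runs on a hypothetical four-row miniature of the scraped table; the names `sample`, `assigned` and `clean` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical miniature of the scraped table.
sample = pd.DataFrame({
    "Postal Code": ["M1A", "M3A", "M4A", "M8A"],
    "Borough": ["Not assigned", "North York", "North York", "Not assigned"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Victoria Village", "Not assigned"],
})

# Boolean mask: True for rows whose borough has been assigned.
assigned = sample["Borough"] != "Not assigned"

# Keep those rows, then renumber the index from 0 and discard the old labels.
clean = sample[assigned].reset_index(drop=True)
print(clean)
```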


## 5. Shape of clean DataFrame

In [5]:
print(dfc.shape)

(103, 3)


### End of Notebook-1

<h2 style="color:red"> Notebook-2 is starting from this line.</h2>

## 6. Getting the geographical coordinates of each postal code

In [6]:
# Geographical Coordinates
geo_coor = pd.read_csv('https://cocl.us/Geospatial_data')
geo_coor

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
...,...,...,...
98,M9N,43.706876,-79.518188
99,M9P,43.696319,-79.532242
100,M9R,43.688905,-79.554724
101,M9V,43.739416,-79.588437


## 7. Adding the coordinate columns to a new DataFrame (df_new)

In [7]:

df_new = pd.merge(dfc, geo_coor, on='Postal Code')
df_new.rename(columns={'Postal Code':'Postcode'}, inplace=True)
df_new.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
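
`pd.merge` performs an inner join by default, so a postal code without a match in the coordinates file would be dropped silently. One way to check for that, sketched here on hypothetical sample frames (`boroughs` and `coords`, with `M5A` deliberately left without coordinates), is a left join with `indicator=True`.

```python
import pandas as pd

# Hypothetical sample frames; M5A has no entry in `coords` on purpose.
boroughs = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M5A"],
    "Borough": ["North York", "North York", "Downtown Toronto"],
})
coords = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# A left join keeps every borough row; the _merge column records the match status.
merged = pd.merge(boroughs, coords, on="Postal Code", how="left", indicator=True)

# Postal codes that found no coordinates.
missing = merged.loc[merged["_merge"] == "left_only", "Postal Code"].tolist()
print(missing)
```

An empty `missing` list would suggest that the inner join used in the notebook loses no rows.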


### End of Notebook-2

<h2 style="color:red"> Notebook-3 is starting from this line.</h2>

## 8. Exploring the neighborhoods in Toronto. 

In [8]:
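# Keep only the boroughs whose name contains 'Toronto'.
# .copy() makes df_last an independent frame, so a column can be added to it later.
df_last = df_new[df_new['Borough'].str.contains('Toronto', regex=False)].copy()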
df_last

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
15,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
19,M4E,East Toronto,The Beaches,43.676357,-79.293031
20,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
24,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
25,M6G,Downtown Toronto,Christie,43.669542,-79.422564
30,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568
31,M6H,West Toronto,"Dufferin, Dovercourt Village",43.669005,-79.442259
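
The filter above is a plain substring match on the borough name. A small sketch on hypothetical data shows which boroughs it keeps and how many rows each contributes; the `na=False` argument is an added assumption that treats missing borough names as non-matching.

```python
import pandas as pd

# Hypothetical borough column.
sample = pd.DataFrame({
    "Borough": ["North York", "Downtown Toronto", "East Toronto",
                "Etobicoke", "West Toronto", "Downtown Toronto"],
})

# regex=False matches the literal substring 'Toronto'.
mask = sample["Borough"].str.contains("Toronto", regex=False, na=False)
toronto_only = sample[mask].copy()

# Number of rows per remaining borough.
counts = toronto_only["Borough"].value_counts()
print(counts)
```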


## 9. Install Folium

In [9]:
!pip install folium
import folium 

print("\rFolium OK")

Folium OK


## 10. Show Toronto on a Map

In [10]:
map_toronto = folium.Map(location=[43.651070,-79.347015],zoom_start=12)

# Add one circle marker per postal code, labelled with its neighbourhood and borough.
for lat, lng, borough, neighbourhood in zip(df_last['Latitude'], df_last['Longitude'], df_last['Borough'], df_last['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='black',
        fill=True,
        fill_color='yellow',
        fill_opacity=0.7).add_to(map_toronto)

map_toronto


<h2><a href="https://github.com/zekiegitimcom/Coursera_Capstone/blob/main/week3-assignment-image-1.png" target="_blank">IMAGE-1</a></h2>
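
The map above is centred on a hard-coded coordinate pair. As an alternative sketch, the centre can be derived from the data itself by averaging the coordinates; the `points` frame below is a hypothetical stand-in that reuses four of the coordinate pairs listed earlier.

```python
import pandas as pd

# Hypothetical stand-in for the Latitude/Longitude columns of df_last.
points = pd.DataFrame({
    "Latitude": [43.65426, 43.662301, 43.657162, 43.651494],
    "Longitude": [-79.360636, -79.389494, -79.378937, -79.375418],
})

# The mean of each column gives a simple centre for the map view.
center = [points["Latitude"].mean(), points["Longitude"].mean()]
print(center)
```

The resulting list could then be passed as the `location` argument of `folium.Map` in place of the fixed values.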

## 11. Clustering with K-Means Algorithm

In [11]:
# Drop the non-numeric columns and create a new DataFrame for clustering.
df_cluster = df_last.drop(columns=['Postcode', 'Borough', 'Neighbourhood'])

from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters = 5, random_state=0).fit(df_cluster)
df_last.insert(0, 'Label', kmeans.labels_)

kmeans.labels_

array([0, 0, 0, 0, 1, 0, 0, 3, 0, 2, 0, 3, 1, 0, 3, 1, 0, 1, 4, 4, 4, 4,
       2, 4, 3, 2, 4, 3, 2, 4, 3, 4, 0, 0, 0, 0, 0, 0, 1])
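
The notebook fixes the number of clusters at 5. One common way to sanity-check such a choice is to compare the within-cluster sum of squares (`inertia_`) for several values of k. The sketch below does this on synthetic points that only imitate latitude/longitude pairs; the data, the range of k and `n_init=10` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic stand-in for (Latitude, Longitude) pairs: three loose groups.
centers = np.array([[43.65, -79.38], [43.68, -79.30], [43.67, -79.44]])
points = np.vstack([c + rng.normal(scale=0.005, size=(20, 2)) for c in centers])

# Fit K-Means for k = 1..6 and record the inertia of each fit.
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(points)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 5))
```

On data like this the inertia drops sharply up to the true number of groups and flattens afterwards; on the real coordinates the bend may be much less clear.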

## 12. Show the new DataFrame with Cluster Labels

In [12]:
# Show the first 10 rows.
df_last.head(10)

Unnamed: 0,Label,Postcode,Borough,Neighbourhood,Latitude,Longitude
2,0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
4,0,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
9,0,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
15,0,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
19,1,M4E,East Toronto,The Beaches,43.676357,-79.293031
20,0,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
24,0,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
25,3,M6G,Downtown Toronto,Christie,43.669542,-79.422564
30,0,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568
31,2,M6H,West Toronto,"Dufferin, Dovercourt Village",43.669005,-79.442259
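
A quick way to inspect the labelled frame is to summarise each cluster by its size and mean position. The sketch below rebuilds only the ten rows displayed above into a hypothetical `head` frame, so its counts describe this excerpt and not the full data set.

```python
import pandas as pd

# The ten rows shown above, reduced to the columns needed for the summary.
head = pd.DataFrame({
    "Label": [0, 0, 0, 0, 1, 0, 0, 3, 0, 2],
    "Postcode": ["M5A", "M7A", "M5B", "M5C", "M4E",
                 "M5E", "M5G", "M6G", "M5H", "M6H"],
    "Latitude": [43.65426, 43.662301, 43.657162, 43.651494, 43.676357,
                 43.644771, 43.657952, 43.669542, 43.650571, 43.669005],
    "Longitude": [-79.360636, -79.389494, -79.378937, -79.375418, -79.293031,
                  -79.373306, -79.387383, -79.422564, -79.384568, -79.442259],
})

# One row per cluster: member count and mean coordinates.
summary = head.groupby("Label").agg(
    size=("Postcode", "count"),
    mean_lat=("Latitude", "mean"),
    mean_lng=("Longitude", "mean"),
)
print(summary)
```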


In [13]:
map_clusterz = folium.Map(location=[43.651070,-79.347015], zoom_start=12)

# colors are based on labels (clusters).
colors = ["black","red","blue","orange","green"]
for la, lo, neighbourhood, cluster in zip(df_last['Latitude'], df_last['Longitude'], df_last['Neighbourhood'], df_last['Label']):
    label = folium.Popup(' Cluster: ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [la, lo],
        radius=7,
        popup=label,
        color=colors[cluster],
        fill=True,
        fill_color=colors[cluster],
        fill_opacity=0.5).add_to(map_clusterz)

map_clusterz

<h2><a href="https://github.com/zekiegitimcom/Coursera_Capstone/blob/main/week3-assignment-image-2.png" target="_blank">IMAGE-2</a></h2>
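
The colour list in the cell above has exactly five entries, so indexing it by cluster label would fail if the number of clusters were raised. A small hypothetical helper, `color_for`, sketches one way to guard against that by wrapping around the palette.

```python
def color_for(cluster, palette=("black", "red", "blue", "orange", "green")):
    """Map a cluster label to a colour, cycling when labels exceed the palette."""
    # int() also accepts NumPy integer labels such as those in kmeans.labels_.
    return palette[int(cluster) % len(palette)]


for label in (0, 4, 5):
    print(label, color_for(label))
```

With more clusters than colours, two clusters would share a colour, so extending the palette is preferable whenever the clusters need to stay visually distinct.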

End of the story...