# Neighborhoods in Toronto

This notebook explores Toronto neighborhoods in preparation for clustering.

## 1. Preparing Neighborhood Data

In [1]:
#!conda install -c conda-forge beautifulsoup4
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import requests

The Toronto neighborhood data comes from a Wikipedia page. You can explore the data <a href="http://www.wikizero.biz/index.php?q=aHR0cHM6Ly9lbi53aWtpcGVkaWEub3JnL3dpa2kvTGlzdF9vZl9wb3N0YWxfY29kZXNfb2ZfQ2FuYWRhOl9N" target=_blank>here</a>.

In [2]:
#source page for the Toronto neighborhoods
url="http://www.wikizero.biz/index.php?q=aHR0cHM6Ly9lbi53aWtpcGVkaWEub3JnL3dpa2kvTGlzdF9vZl9wb3N0YWxfY29kZXNfb2ZfQ2FuYWRhOl9N"
html_doc=requests.get(url).text
soup=BeautifulSoup(html_doc,"html.parser") #name the parser explicitly so bs4 does not have to guess one
soup.prettify()

'<!DOCTYPE html>\n<html class="client-nojs" dir="ltr" lang="en">\n <head>\n  <script src="https://www.powr.io/powr.js?platform=html">\n  </script>\n </head>\n <body>\n  <div class="powr-popup" id="c11ada13_1555965620">\n  </div>\n  <style type="text/css">\n   #footer{display:none}#p-personal{display:none}#ca-viewsource{display:none}#ca-history{display:none}#ca-ve-edit{display:none}#ca-edit{display:none}.mw-editsection{display:none}#userloginForm{display:none}.new{pointer-events:none;cursor:default;color:#425869}\n  </style>\n  <style type="text/css">\n   html body{margin-top:50px!important}#top_form{position:fixed;bottom:0;left:0;width:100%;margin:0;z-index:2100000000;-moz-user-select:none;-khtml-user-select:none;-webkit-user-select:none;-o-user-select:none;height:105px;line-height:45px}\n  </style>\n  <meta content="index, follow" name="robots"/>\n  <meta content="index" name="Mediapartners-Google"/>\n  <script async="" src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js">\n

The Toronto neighborhood data is in a table: "th" tags are the column headers, "tr" tags are the rows, and "td" tags are the cells.

In [3]:
#let's find the column names, which are inside <th> tags
column_names=[]
column_heads=soup.table.find_all('th')
for ch in column_heads:
    column_names.append(ch.get_text())
column_names


['Postcode', 'Borough', 'Neighbourhood\n']
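
The same tag structure can be tried out on a small inline table before touching the live page. The snippet below is only a sketch: `sample_html` is a made-up miniature of the postal code table, and `get_text(strip=True)` is used here to trim the trailing newline that shows up in the header output above.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the postal code table, for illustration only.
sample_html = """
<table>
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood
</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned
</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods
</td></tr>
</table>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# <th> cells hold the column names; strip=True removes surrounding whitespace.
headers = [th.get_text(strip=True) for th in sample_soup.table.find_all("th")]

# Each <tr> is one row; rows without <td> cells (the header row) are skipped.
records = []
for tr in sample_soup.table.find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:
        records.append(cells)

print(headers)
print(records)
```

Skipping rows that have no "td" cells removes the need for a separate row counter to detect the header row.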

In [4]:
#select all the rows and initialize the np array with the column headers
rows=soup.table.find_all("tr")
a=np.array([column_names])
i=0
for r in rows:
    cells=r.find_all('td') #select all cells in the row, i.e. postcode, borough and neighbourhood
    row=[]
    for c in cells:
        row.append(c.get_text().replace("\n","")) #get the cell text and remove the "\n" at the end
    if(i>0): #the first row holds the headers, not data
        if(row[1]!='Not assigned'): #skip rows whose borough is not assigned
            a=np.append(a,[row],axis=0)
    i=i+1

#create the data frame and assign the first row as headers
df=pd.DataFrame(a)
df.columns = df.iloc[0]
df.drop(0,inplace=True)
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
1,M3A,North York,Parkwoods
2,M4A,North York,Victoria Village
3,M5A,Downtown Toronto,Harbourfront
4,M5A,Downtown Toronto,Regent Park
5,M6A,North York,Lawrence Heights
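
As an alternative to growing a NumPy array row by row, the scraped rows could be collected in a plain list and filtered once inside pandas. This is a sketch under assumptions, not the method used in this notebook: `scraped_rows` and `frame` are hypothetical names, and the sample values are a tiny hand-written subset.

```python
import pandas as pd

# Hypothetical subset of scraped rows (postcode, borough, neighbourhood).
scraped_rows = [
    ["M1A", "Not assigned", "Not assigned"],
    ["M3A", "North York", "Parkwoods"],
    ["M5A", "Downtown Toronto", "Harbourfront"],
]
columns = ["Postcode", "Borough", "Neighbourhood"]

# Build the frame in one step, then drop rows whose borough is not assigned.
frame = pd.DataFrame(scraped_rows, columns=columns)
frame = frame[frame["Borough"] != "Not assigned"].reset_index(drop=True)

print(frame)
```

Note that this version numbers the rows from 0, whereas the cell above leaves the index starting at 1 because the header row is dropped afterwards.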


In [61]:
df.columns

Index(['Postcode', 'Borough', 'Neighbourhood\n'], dtype='object', name=0)

In [5]:
df.rename(columns={'Neighbourhood\n':'Neighborhood'},inplace=True) #change the column name
df.head()

Unnamed: 0,Postcode,Borough,Neighborhood
1,M3A,North York,Parkwoods
2,M4A,North York,Victoria Village
3,M5A,Downtown Toronto,Harbourfront
4,M5A,Downtown Toronto,Regent Park
5,M6A,North York,Lawrence Heights


In [6]:
df.shape

(211, 3)

The postcode column should be unique. If more than one neighborhood shares the same postcode, the neighborhoods should be combined into a single row, using "," as the separator.

In [7]:
test_postcode=df["Postcode"].is_unique
print(test_postcode)

False


In [8]:
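#since the postcode column is not unique, sort the dataframe and combine neighborhoods that share a postcode and borough
#iterate from the last row upwards so that dropping a row does not shift the positions still to be visited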
df.sort_values(by=['Postcode','Borough','Neighborhood'],inplace=True)
for i in range(df.shape[0]-1,0,-1):
    if df.iloc[i,0]==df.iloc[i-1,0]:
        if df.iloc[i,1]==df.iloc[i-1,1]:
            df.iloc[i-1,2]=df.iloc[i-1,2]+", "+df.iloc[i,2]
            df.drop(df.index[i],inplace=True)
df.head(10)

Unnamed: 0,Postcode,Borough,Neighborhood
10,M1B,Scarborough,"Malvern, Rouge"
22,M1C,Scarborough,"Highland Creek, Port Union, Rouge Hill"
34,M1E,Scarborough,"Guildwood, Morningside, West Hill"
40,M1G,Scarborough,Woburn
44,M1H,Scarborough,Cedarbrae
55,M1J,Scarborough,Scarborough Village
67,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park"
80,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge"
94,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West"
109,M1N,Scarborough,"Birch Cliff, Cliffside West"
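
The same combination can also be expressed with a `groupby` instead of an explicit loop. The snippet below is a minimal sketch on hypothetical sample rows (`sample` and `combined` are illustrative names); it groups on both postcode and borough, matching the two conditions checked in the loop above.

```python
import pandas as pd

# Hypothetical sample rows mirroring the scraped table's columns.
sample = pd.DataFrame(
    {
        "Postcode": ["M5A", "M5A", "M3A", "M1B", "M1B"],
        "Borough": [
            "Downtown Toronto",
            "Downtown Toronto",
            "North York",
            "Scarborough",
            "Scarborough",
        ],
        "Neighborhood": ["Regent Park", "Harbourfront", "Parkwoods", "Rouge", "Malvern"],
    }
)

# Sort first so names are joined alphabetically, then join the
# neighborhoods of each (postcode, borough) group with ", ".
combined = (
    sample.sort_values(["Postcode", "Borough", "Neighborhood"])
    .groupby(["Postcode", "Borough"], as_index=False)["Neighborhood"]
    .agg(", ".join)
)

print(combined)
```

Unlike the loop, this returns a new frame with a fresh 0-based index rather than modifying `df` in place.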


In [9]:
test_postcode=df["Postcode"].is_unique
print(test_postcode)

True


The neighborhoods are combined and we now have a unique postcode list. <br>
Finally, we can check the shape of the dataframe.

In [10]:
df.shape

(103, 3)

## 2. Adding Coordinates to the Table

We will get the coordinates of the Toronto neighborhoods by postal code and add them to our neighborhood table.

In [11]:
df_coor=pd.read_csv("https://cocl.us/Geospatial_data")
df_coor.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [12]:
df_coor.shape

(103, 3)

In [13]:
df_toronto=pd.merge(df,df_coor,left_on="Postcode",right_on="Postal Code")
df_toronto.head()

Unnamed: 0,Postcode,Borough,Neighborhood,Postal Code,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",M1B,43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Port Union, Rouge Hill",M1C,43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",M1E,43.763573,-79.188711
3,M1G,Scarborough,Woburn,M1G,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,M1H,43.773136,-79.239476


In [14]:
df_toronto.drop("Postal Code", axis=1, inplace=True)
df_toronto.head()

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Port Union, Rouge Hill",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
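
`pd.merge` defaults to an inner join, so a postcode missing from the coordinates file would drop out without any warning. Here both tables have 103 rows, but a quick check is cheap. The sketch below uses made-up frames (`left`, `right`, and the postcode `M9Z` are hypothetical) to show how `how="left"`, `validate`, and `indicator` can surface unmatched rows.

```python
import pandas as pd

# Hypothetical neighborhood table; "M9Z" has no coordinates on purpose.
left = pd.DataFrame(
    {
        "Postcode": ["M1B", "M1C", "M9Z"],
        "Borough": ["Scarborough", "Scarborough", "Etobicoke"],
    }
)
# Hypothetical coordinates table with the same column names as the CSV above.
right = pd.DataFrame(
    {
        "Postal Code": ["M1B", "M1C"],
        "Latitude": [43.806686, 43.784535],
        "Longitude": [-79.194353, -79.160497],
    }
)

# how="left" keeps every neighborhood row, validate raises if either key
# column contains duplicates, and indicator adds a "_merge" column.
checked = pd.merge(
    left,
    right,
    left_on="Postcode",
    right_on="Postal Code",
    how="left",
    validate="one_to_one",
    indicator=True,
)

# Postcodes that found no matching coordinates.
unmatched = checked.loc[checked["_merge"] == "left_only", "Postcode"].tolist()
print(unmatched)
```

An empty `unmatched` list would confirm that the inner join used above loses no rows.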
