<h2>Neighborhoods in Toronto</h2>

In this notebook, I explore and cluster the neighborhoods of Toronto.

<strong>Import required libraries</strong>

In [1]:
import pandas as pd  # library for data analysis
import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

<strong>Scrape data using the BeautifulSoup library</strong>

In [2]:
# request the Wikipedia page and keep the response body as text
data = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
# parse the HTML into a BeautifulSoup object
soup = BeautifulSoup(data, 'html.parser')
# collect every row of the first table on the page
result = soup.find('table').find_all('tr')

<strong>Extract the required variables</strong>

In [4]:
postal_code = []
borough = []
neighborhood = []
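for row in result:
    cells = row.find_all('td')
    if cells:  # rows without <td> cells (such as the header row) are skipped
        # strip surrounding whitespace, including the trailing newline of a cell
        postal_code.append(cells[0].text.strip())
        borough.append(cells[1].text.strip())
        neighborhood.append(cells[2].text.strip())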
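The row-by-row extraction can be tried offline on a small inline table. This is a minimal sketch: `sample_html` is a hypothetical snippet that only mimics a three-column layout, and `get_text(strip=True)` is used here as one way to trim whitespace from each cell.

```python
from bs4 import BeautifulSoup

# Hypothetical sample that mimics a three-column postal code table
sample_html = """
<table>
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned
</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods
</td></tr>
</table>
"""

rows = BeautifulSoup(sample_html, "html.parser").find("table").find_all("tr")

records = []
for row in rows:
    cells = row.find_all("td")
    if cells:  # the header row has only <th> cells, so it is skipped
        # get_text(strip=True) removes the trailing newline inside each cell
        records.append(tuple(cell.get_text(strip=True) for cell in cells))

print(records)
```

Collecting one tuple per row keeps the three values of a record together, which can be easier to inspect than three parallel lists.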

<strong>Create a dataframe from the extracted data</strong>

In [30]:
# create a new DataFrame from the three lists
df_toronto = pd.DataFrame({"Postal_code": postal_code,
                           "Borough": borough,
                           "Neighborhood": neighborhood})

df_toronto.head()

Unnamed: 0,Postal_code,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


<strong>Data cleaning</strong>

In [31]:
# Data cleaning: drop rows whose borough is "Not assigned"
df_toronto_dropna = df_toronto[df_toronto.Borough != "Not assigned"].reset_index(drop=True)
df_toronto_dropna.head()

Unnamed: 0,Postal_code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,Lawrence Heights
4,M6A,North York,Lawrence Manor
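The filtering step can be checked on a toy frame before trusting it on the scraped data. The sketch below uses made-up rows in the same three-column shape and asserts that no unassigned borough survives; the name `toy` is illustrative only.

```python
import pandas as pd

# Toy data (made up for illustration) with one unassigned borough
toy = pd.DataFrame({
    "Postal_code": ["M1A", "M3A", "M4A"],
    "Borough": ["Not assigned", "North York", "North York"],
    "Neighborhood": ["Not assigned", "Parkwoods", "Victoria Village"],
})

# Keep only rows whose borough is assigned, then renumber the index from 0
cleaned = toy[toy["Borough"] != "Not assigned"].reset_index(drop=True)

# Sanity checks: no unassigned boroughs remain and the index is contiguous
assert (cleaned["Borough"] != "Not assigned").all()
assert list(cleaned.index) == list(range(len(cleaned)))
print(f"dropped {len(toy) - len(cleaned)} of {len(toy)} rows")
```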


<strong>Group neighborhoods</strong>

In [32]:
# combine neighborhoods that share a postal code and borough into one comma-separated row
df_toronto_grouped = df_toronto_dropna.groupby(["Postal_code", "Borough"], as_index=False).agg(lambda x: ", ".join(x))
df_toronto_grouped.head()

Unnamed: 0,Postal_code,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
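To see what the grouping does, here is a minimal sketch on toy rows in which one postal code appears twice. It selects the `Neighborhood` column explicitly before aggregating, which is a small variation on the cell above rather than the same call.

```python
import pandas as pd

# Toy rows: M6A appears twice, once per neighborhood
toy = pd.DataFrame({
    "Postal_code": ["M6A", "M6A", "M3A"],
    "Borough": ["North York", "North York", "North York"],
    "Neighborhood": ["Lawrence Heights", "Lawrence Manor", "Parkwoods"],
})

# Join the neighborhood names within each (postal code, borough) group
grouped = (
    toy.groupby(["Postal_code", "Borough"])["Neighborhood"]
       .agg(", ".join)
       .reset_index()
)
print(grouped)
```

Because `groupby` sorts the group keys by default, the result is ordered by postal code rather than by first appearance.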


In [33]:
# if a neighborhood is "Not assigned", use the borough name instead
mask = df_toronto_grouped["Neighborhood"] == "Not assigned"
df_toronto_grouped.loc[mask, "Neighborhood"] = df_toronto_grouped.loc[mask, "Borough"]

df_toronto_grouped.head()

Unnamed: 0,Postal_code,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
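The fallback rule in this cell (use the borough name when a neighborhood is "Not assigned") can be tried on a toy frame. This sketch uses a boolean mask with `.loc` so that the assignment is written to the frame itself; the rows are illustrative, not taken from the scraped table.

```python
import pandas as pd

# Toy rows: the first one has a borough but no assigned neighborhood
toy = pd.DataFrame({
    "Postal_code": ["M7A", "M3A"],
    "Borough": ["Queen's Park", "North York"],
    "Neighborhood": ["Not assigned", "Parkwoods"],
})

# Select the rows to fix, then copy the borough into the neighborhood column
mask = toy["Neighborhood"] == "Not assigned"
toy.loc[mask, "Neighborhood"] = toy.loc[mask, "Borough"]

print(toy)
```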


In [38]:
# build a test dataframe with the rows for a chosen list of postal codes
new_list = ["M5G", "M2H", "M4B", "M1J", "M4G", "M4M", "M1R", "M9V", "M9L", "M5V", "M1B", "M5A"]

# collect the matching rows in list order and concatenate them once
matches = [df_toronto_grouped[df_toronto_grouped["Postal_code"] == postcode] for postcode in new_list]
df_new = pd.concat(matches, ignore_index=True)

df_new

Unnamed: 0,Postal_code,Borough,Neighborhood
0,M5G,Downtown Toronto,Central Bay Street
1,M2H,North York,Hillcrest Village
2,M4B,East York,"Woodbine Gardens, Parkview Hill"
3,M1J,Scarborough,Scarborough Village
4,M4G,East York,Leaside
5,M4M,East Toronto,Studio District
6,M1R,Scarborough,"Maryvale, Wexford"
7,M9V,Etobicoke,"Albion Gardens, Beaumond Heights, Humbergate, ..."
8,M9L,North York,Humber Summit
9,M5V,Downtown Toronto,"CN Tower, Bathurst Quay, Island airport, Harbo..."
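An alternative way to pull rows for a list of postal codes, in the order of that list, is to index by postal code and select with `.loc`. This is a sketch on toy data; `wanted` is a hypothetical list, and note that `.loc` raises a `KeyError` if any requested code is missing from the frame.

```python
import pandas as pd

# Toy frame in the same three-column shape as the grouped data
toy = pd.DataFrame({
    "Postal_code": ["M1B", "M4G", "M5G"],
    "Borough": ["Scarborough", "East York", "Downtown Toronto"],
    "Neighborhood": ["Rouge, Malvern", "Leaside", "Central Bay Street"],
})
wanted = ["M5G", "M1B"]  # hypothetical lookup list

# Index by postal code, select in the requested order, then restore the column
subset = toy.set_index("Postal_code").loc[wanted].reset_index()
print(subset)
```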


In [39]:
# show the shape (rows, columns) of the cleaned dataframe
df_toronto_grouped.shape

(103, 3)

In [40]:
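# load the latitude and longitude of each postal code
df_geospatial = pd.read_csv('http://cocl.us/Geospatial_data')
# rename the columns so the key matches "Postal_code" in df_toronto_grouped
df_geospatial.columns = ['Postal_code', 'Latitude', 'Longitude']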

In [44]:
# attach coordinates to each postal code; an inner join keeps only codes present in both frames
df_post = pd.merge(df_toronto_grouped, df_geospatial, on=['Postal_code'], how='inner')

In [45]:
df_post.head(10)

Unnamed: 0,Postal_code,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",43.727929,-79.262029
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848
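Because an inner join silently drops postal codes that have no coordinates, it can help to check for unmatched keys. The sketch below uses toy frames (the `M9Z` row is made up) with a left join, `indicator=True` to label where each row came from, and `validate="one_to_one"` to fail fast if a postal code is duplicated on either side.

```python
import pandas as pd

# Toy frames: "M9Z" is a made-up code with no coordinates
left = pd.DataFrame({
    "Postal_code": ["M1B", "M1C", "M9Z"],
    "Borough": ["Scarborough", "Scarborough", "Hypothetical"],
})
right = pd.DataFrame({
    "Postal_code": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# A left join keeps every row of `left`; the _merge column records the match status
checked = pd.merge(left, right, on="Postal_code", how="left",
                   validate="one_to_one", indicator=True)

# Postal codes that found no coordinates
unmatched = checked.loc[checked["_merge"] == "left_only", "Postal_code"].tolist()
print(unmatched)
```

If `unmatched` is empty, an inner join and a left join on the same key return the same rows.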
