## Usha Manoharan
### Segmenting and Clustering Neighborhoods in Toronto

In this notebook we will build code to:
* scrape the web to obtain data about the neighborhoods of Toronto
* use postal codes to obtain the latitude and longitude of each neighborhood
* analyze the neighborhoods

### Scrape the Web to obtain neighborhood data of Toronto

In [12]:
import pandas as pd
import numpy as np


In [13]:
# Read the html table into a dataframe
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
dfs = pd.read_html(url, header=0)

# The first table on the page holds the Toronto neighborhood data
toronto_data = dfs[0]

# Remove any row which does not have a value assigned for Borough
toronto_data.drop(toronto_data[toronto_data['Borough'] == 'Not assigned'].index, inplace=True)
# reset the index
toronto_data.reset_index(inplace=True, drop=True)


# If a neighborhood is "Not assigned", replace it with the borough of that row
toronto_data.loc[toronto_data['Neighborhood'] == 'Not assigned', 'Neighborhood'] = toronto_data['Borough']

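# Group by postal code and join the neighborhood names with commas
# (sort=False keeps the original row order)
toronto_data = (toronto_data.groupby(['Postal Code', 'Borough'], sort=False)['Neighborhood']
                .agg(', '.join)
                .reset_index())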

# check the number of rows in the table
toronto_data.shape


(103, 3)
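
The cleaning steps in the cell above can be checked offline on a few rows. The following is a minimal sketch, not the notebook's actual run: the `sample` rows and the helper name `clean_postal_table` are hypothetical and only mimic the shape of the scraped table.

```python
import pandas as pd

# Hypothetical sample rows that mimic the scraped table (not real scraped output)
sample = pd.DataFrame({
    "Postal Code": ["M1A", "M3A", "M5A", "M5A", "M7A"],
    "Borough": ["Not assigned", "North York", "Downtown Toronto",
                "Downtown Toronto", "Queen's Park"],
    "Neighborhood": ["Not assigned", "Parkwoods", "Regent Park",
                     "Harbourfront", "Not assigned"],
})


def clean_postal_table(df):
    """Apply the three cleaning steps and return a new DataFrame."""
    # 1. Drop rows that have no borough assigned
    out = df[df["Borough"] != "Not assigned"].copy()
    # 2. Where the neighborhood is missing, fall back to the borough name
    missing = out["Neighborhood"] == "Not assigned"
    out.loc[missing, "Neighborhood"] = out.loc[missing, "Borough"]
    # 3. Collapse to one row per postal code, joining neighborhood names
    out = (out.groupby(["Postal Code", "Borough"], sort=False)["Neighborhood"]
              .agg(", ".join)
              .reset_index())
    return out


cleaned = clean_postal_table(sample)
print(cleaned)
```

Returning a new DataFrame instead of modifying the input in place makes it easier to rerun the cell without scraping the page again.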

In [14]:
# Examine the first few rows of data
toronto_data.head()

|   | Postal Code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |


### Get the latitude and longitude for each neighborhood

In [17]:
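# Load the coordinates for each postal code from a local CSV file
lat_lng_df = pd.read_csv(r'~/Downloads/Geospatial_Coordinates.csv')

# Merge on postal code and keep the result, since pd.merge returns a new dataframe
toronto_data = pd.merge(toronto_data, lat_lng_df, on='Postal Code')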
toronto_data.head()

|   | Postal Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
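
An inner merge silently drops any postal code that has no match in the coordinates file. One possible way to spot such rows is a left merge with pandas' `indicator` and `validate` options. This is a sketch under assumptions: the two small frames are hypothetical, and `M9Z` is a made-up code included only to show an unmatched row.

```python
import pandas as pd

# Hypothetical neighborhood rows; "M9Z" is a made-up code with no coordinates
neighborhoods = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M9Z"],
    "Borough": ["North York", "North York", "Hypothetical Borough"],
})

# Coordinates for the two real codes, taken from the table above
coords = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# how="left" keeps every neighborhood row, validate raises if a postal code
# is duplicated on either side, and indicator adds a "_merge" column
merged = pd.merge(neighborhoods, coords, on="Postal Code", how="left",
                  validate="one_to_one", indicator=True)

# Postal codes that found no coordinates
unmatched = merged.loc[merged["_merge"] == "left_only", "Postal Code"].tolist()
print(unmatched)
```

If `unmatched` is empty, the inner merge used in the notebook and this left merge return the same rows.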
