In [21]:
import pandas as pd
import numpy as np

# Part 1: Getting the Neighborhoods from Wikipedia

## Scrape the table from the webpage

In [115]:
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
postal_codes=pd.read_html(url)

In [116]:
pc=postal_codes[0]
pc

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
...,...,...,...
175,M5Z,Not assigned,Not assigned
176,M6Z,Not assigned,Not assigned
177,M7Z,Not assigned,Not assigned
178,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,..."
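
`pd.read_html` returns a list with one dataframe per table found on the page, so taking element `0` relies on the page layout staying the same. Below is a minimal sketch of a more defensive selection. `pick_table` is a hypothetical helper, and the hand-built list stands in for the result of `pd.read_html(url)`.

```python
import pandas as pd

def pick_table(tables, required_columns):
    """Hypothetical helper: return the first dataframe that has every required column."""
    for table in tables:
        if set(required_columns).issubset(table.columns):
            return table
    raise ValueError(f"no table has columns {sorted(required_columns)}")

# Stand-in for the list returned by pd.read_html(url): an unrelated table
# followed by the table of interest.
tables = [
    pd.DataFrame({"Unrelated": [1, 2]}),
    pd.DataFrame({
        "Postal Code": ["M1A", "M3A"],
        "Borough": ["Not assigned", "North York"],
        "Neighbourhood": ["Not assigned", "Parkwoods"],
    }),
]

picked = pick_table(tables, ["Postal Code", "Borough", "Neighbourhood"])
print(picked)
```

`pd.read_html` also accepts a `match` argument that keeps only tables containing the given text, which may be enough on its own.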


## Clean the table

First, let's see how many postal codes are missing a borough.

In [117]:
na=pc[pc['Borough']=='Not assigned']
na

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
7,M8A,Not assigned,Not assigned
10,M2B,Not assigned,Not assigned
15,M7B,Not assigned,Not assigned
...,...,...,...
174,M4Z,Not assigned,Not assigned
175,M5Z,Not assigned,Not assigned
176,M6Z,Not assigned,Not assigned
177,M7Z,Not assigned,Not assigned
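
To get the count directly rather than reading it off the displayed rows, the boolean mask can be summed. This is a small sketch on a made-up four-row sample with the same column names, not the full table.

```python
import pandas as pd

# Made-up sample with the same column names as the scraped table.
sample = pd.DataFrame({
    "Postal Code": ["M1A", "M2A", "M3A", "M4A"],
    "Borough": ["Not assigned", "Not assigned", "North York", "North York"],
})

# Boolean mask: True where the borough is the placeholder string.
is_unassigned = sample["Borough"] == "Not assigned"

# Summing a boolean Series counts its True values.
n_unassigned = int(is_unassigned.sum())
print(n_unassigned, "of", len(sample), "postal codes have no borough")
```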


Now let's convert 'Not assigned' to NumPy's NaN so that those rows can be dropped, and then drop them.

In [118]:
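# Assign the result back: an inplace replace on pc.Borough may act on a temporary copy and leave pc unchanged.
pc['Borough']=pc['Borough'].replace('Not assigned', np.nan)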

In [119]:
pc.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,,Not assigned
1,M2A,,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


In [120]:
pc.dropna(axis=0, inplace=True)
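
The two cleaning steps can also be written as one chain that returns a new dataframe instead of modifying `pc` in place. This is a sketch on a three-row sample. Restricting `dropna` to the `Borough` column is my assumption about the intent, so that a NaN in another column would not remove a row.

```python
import numpy as np
import pandas as pd

# Three-row sample with the same columns as the scraped table.
sample = pd.DataFrame({
    "Postal Code": ["M1A", "M3A", "M5A"],
    "Borough": ["Not assigned", "North York", "Downtown Toronto"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Regent Park, Harbourfront"],
})

cleaned = (
    sample
    # Replace the placeholder with NaN in the Borough column only.
    .assign(Borough=sample["Borough"].replace("Not assigned", np.nan))
    # Drop rows whose Borough is NaN; other columns are not considered.
    .dropna(subset=["Borough"])
)
print(cleaned)
```

A plain boolean filter, `sample[sample["Borough"] != "Not assigned"]`, gives the same rows without going through NaN.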

## Let's view the cleaned dataframe

In [121]:
pc

Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
160,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
165,M4Y,Downtown Toronto,Church and Wellesley
168,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
169,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


... and its size:

In [122]:
pc.shape

(103, 3)

# Part 2: Latitude and Longitude

I tried the suggested approach with no luck. I also tried the Nominatim geocoder from geopy.geocoders, as used in the New York clustering lab, but it gave poor results for Canadian postal codes, so I decided to use the CSV file provided.

In [123]:
coord=pd.read_csv('/Users/matthewyoung/Documents/Education - Personal/IBM Data Science Coursera Course/Data Downloads/Geospatial_Coordinates.csv')

In [124]:
coord=coord.sort_values('Postal Code')
print('Coordinates:')
coord.head()

Coordinates:


Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
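
Since everything downstream depends on this file, a quick sanity check before joining may be worthwhile. The sketch below uses a hypothetical helper, `check_coordinates`, and an inline CSV built from the first three rows shown above in place of the real file.

```python
import io
import pandas as pd

# Inline stand-in for Geospatial_Coordinates.csv.
csv_text = """Postal Code,Latitude,Longitude
M1B,43.806686,-79.194353
M1C,43.784535,-79.160497
M1E,43.763573,-79.188711
"""
coords = pd.read_csv(io.StringIO(csv_text))

def check_coordinates(df):
    """Hypothetical helper: raise ValueError if the coordinate table looks malformed."""
    missing = {"Postal Code", "Latitude", "Longitude"} - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    if df["Postal Code"].duplicated().any():
        raise ValueError("duplicate postal codes")
    if not df["Latitude"].between(-90, 90).all():
        raise ValueError("latitude out of range")
    if not df["Longitude"].between(-180, 180).all():
        raise ValueError("longitude out of range")
    return df

coords = check_coordinates(coords)
print(coords)
```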


Sort the neighborhoods dataframe by postal code and reset the index so that it will match up with the lat/long dataframe.

In [125]:
pc=pc.sort_values('Postal Code')
print('Neighborhoods:')
pc.reset_index(inplace=True)
pc.head()

Neighborhoods:


Unnamed: 0,index,Postal Code,Borough,Neighbourhood
0,9,M1B,Scarborough,"Malvern, Rouge"
1,18,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,27,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,36,M1G,Scarborough,Woburn
4,45,M1H,Scarborough,Cedarbrae
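
The reset matters because pandas aligns on index labels, not on row position, when a column from one dataframe is assigned to another. Here is a small sketch with made-up frames that reuse the index labels 9, 18, 27 from the output above.

```python
import pandas as pd

# Left frame keeps its old index labels; right frame has the default 0..2 index.
left = pd.DataFrame({"Postal Code": ["M1B", "M1C", "M1E"]}, index=[9, 18, 27])
right = pd.DataFrame({"Latitude": [43.806686, 43.784535, 43.763573]})

# Without a reset, no labels match, so every assigned value becomes NaN.
misaligned = left.copy()
misaligned["Latitude"] = right["Latitude"]

# After resetting to 0..2, the labels match and the values line up row by row.
aligned = left.reset_index(drop=True)
aligned["Latitude"] = right["Latitude"]

print(misaligned)
print(aligned)
```

Passing `drop=True` to `reset_index` discards the old index instead of keeping it as an extra `index` column.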


Create a new dataframe, pcll, from pc and add the "Latitude" and "Longitude" columns from the coordinates dataframe. The assignment matches rows by index, which is why both dataframes were sorted by postal code and pc was given a fresh index. I reset the index a couple of times, which generated extra columns, so I dropped those. Now we have a dataframe showing postal code, borough, neighborhood, and coordinates!

In [132]:
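# Copy pc so that pcll is a separate dataframe rather than a second name for pc.
pcll=pc.copy()
# Column assignment aligns rows on the index, not on 'Postal Code'.
pcll[['Latitude','Longitude']]=coord[['Latitude','Longitude']]
# Drop columns left over from reset_index; errors='ignore' skips any that are absent.
pcll.drop(['level_0','index'], axis=1, inplace=True, errors='ignore')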

In [133]:
pcll

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
...,...,...,...,...,...
98,M9N,York,Weston,43.706876,-79.518188
99,M9P,Etobicoke,Westmount,43.696319,-79.532242
100,M9R,Etobicoke,"Kingsview Village, St. Phillips, Martin Grove ...",43.688905,-79.554724
101,M9V,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest...",43.739416,-79.588437
