# Capstone Clustering Lab



### Part 1: Wrangling the Data

First, I import pandas:

In [1]:
import pandas as pd


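Next, I create the dataframe by reading the postal code table from the Wikipedia page. `pd.read_html` returns a list of every table on the page, so I take the first one. I then use a loop to remove the postal codes that are not assigned to a borough.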

In [2]:
#reading in the first table on the page to a dataframe
codes = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]
#checking each postal code for assignment to a Borough, removing it if no Borough is listed
for i in list(codes.index):  #loop over the actual row labels instead of a hard-coded row count
    if codes.loc[i,'Borough']=='Not assigned':
        codes.drop(i, axis=0, inplace=True)
codes

| | Postal Code | Borough | Neighbourhood |
|---|---|---|---|
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 5 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 6 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
| ... | ... | ... | ... |
| 160 | M8X | Etobicoke | The Kingsway, Montgomery Road, Old Mill North |
| 165 | M4Y | Downtown Toronto | Church and Wellesley |
| 168 | M7Y | East Toronto | Business reply mail Processing Centre, South C... |
| 169 | M8Y | Etobicoke | Old Mill South, King's Mill Park, Sunnylea, Hu... |
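
The same filtering can also be written without an explicit loop by using a boolean mask, which works for any number of rows. The sketch below runs on a small hand-made sample shaped like the Wikipedia table (the rows are illustrative, not the real download); on the real data only the mask line would be needed, with `codes` in place of `sample`.

```python
import pandas as pd

# Small illustrative sample shaped like the Wikipedia table (not the real data)
sample = pd.DataFrame({
    'Postal Code': ['M1A', 'M2A', 'M3A', 'M4A'],
    'Borough': ['Not assigned', 'Not assigned', 'North York', 'North York'],
    'Neighbourhood': ['Not assigned', 'Not assigned', 'Parkwoods', 'Victoria Village'],
})

# Keep only the rows whose Borough is assigned; the original index labels are preserved
assigned = sample[sample['Borough'] != 'Not assigned']
print(assigned)
```

Like the loop, the mask keeps the original index labels (here 2 and 3), so the index still has gaps afterwards.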


Lastly, I reset the index, since dropping rows left gaps in the index numbering. The resulting dataframe has 103 entries in total.

In [5]:
codes=codes.reset_index(drop=True) #'drop=True' prevents the previous index from becoming an additional column
codes

| | Postal Code | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
| ... | ... | ... | ... |
| 98 | M8X | Etobicoke | The Kingsway, Montgomery Road, Old Mill North |
| 99 | M4Y | Downtown Toronto | Church and Wellesley |
| 100 | M7Y | East Toronto | Business reply mail Processing Centre, South C... |
| 101 | M8Y | Etobicoke | Old Mill South, King's Mill Park, Sunnylea, Hu... |
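
To see what `drop=True` changes, here is a minimal comparison on a tiny illustrative frame whose index has gaps, as it would after dropping rows. Without `drop=True`, pandas keeps the old index as a new column named `index`.

```python
import pandas as pd

# Illustrative frame whose index has gaps, as after dropping rows
df = pd.DataFrame({'Postal Code': ['M3A', 'M4A', 'M5A']}, index=[2, 3, 4])

# Default: the old index is kept as a new column called 'index'
with_old = df.reset_index()
# drop=True: the old index is discarded entirely
clean = df.reset_index(drop=True)

print(with_old.columns.tolist())  # ['index', 'Postal Code']
print(clean.columns.tolist())     # ['Postal Code']
print(clean.index.tolist())       # [0, 1, 2]
```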


### Part 2: Incorporating the Coordinates

Next, we need the latitude and longitude of each postal code. First, we read the provided CSV file of coordinates into a dataframe:

In [6]:
all_coords=pd.read_csv('http://cocl.us/Geospatial_data')
all_coords

| | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |
| ... | ... | ... | ... |
| 98 | M9N | 43.706876 | -79.518188 |
| 99 | M9P | 43.696319 | -79.532242 |
| 100 | M9R | 43.688905 | -79.554724 |
| 101 | M9V | 43.739416 | -79.588437 |


Again, there are 103 entries, but this time they are sorted by the digit of the postal code (M1B, M1C, M1E, ...) rather than grouped by the final letter (M3A, M4A, M5A, ...). Because the two dataframes list the codes in different orders, we cannot simply copy the coordinate columns across; we must match each coordinate pair to its postal code in `codes`. Here I do this with nested loops. First, we add new Latitude and Longitude columns to `codes` that the loops will fill in.

In [7]:
import numpy as np
lat=np.zeros(len(codes)).tolist()   #one placeholder per row, rather than a hard-coded 103
long=np.zeros(len(codes)).tolist()
type(lat) #this confirms that we have lists of zeros to add into our dataframe.

list

In [9]:
codes['Latitude']=lat
codes['Longitude']=long
codes # The zeros will be replaced in this dataframe shortly.

| | Postal Code | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 0.0 | 0.0 |
| 1 | M4A | North York | Victoria Village | 0.0 | 0.0 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 0.0 | 0.0 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 0.0 | 0.0 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... |
| 98 | M8X | Etobicoke | The Kingsway, Montgomery Road, Old Mill North | 0.0 | 0.0 |
| 99 | M4Y | Downtown Toronto | Church and Wellesley | 0.0 | 0.0 |
| 100 | M7Y | East Toronto | Business reply mail Processing Centre, South C... | 0.0 | 0.0 |
| 101 | M8Y | Etobicoke | Old Mill South, King's Mill Park, Sunnylea, Hu... | 0.0 | 0.0 |
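
Pre-building lists of zeros is not strictly necessary: assigning a scalar to a new column broadcasts it to every row. The sketch below uses a small illustrative frame in place of `codes`, and uses NaN rather than 0.0 as the placeholder. That choice is a suggestion rather than the approach above: a code that never gets matched keeps NaN, which `isna()` can count, whereas a leftover 0.0 looks like a real coordinate.

```python
import pandas as pd

# Illustrative frame standing in for `codes`
sample = pd.DataFrame({'Postal Code': ['M3A', 'M4A', 'M5A']})

# A scalar is broadcast to every row, so no pre-built list is needed.
# NaN is used as the placeholder so unmatched rows are easy to detect later.
sample['Latitude'] = float('nan')
sample['Longitude'] = float('nan')

# Simulate filling in one row, then count the rows that are still unfilled
sample.loc[0, ['Latitude', 'Longitude']] = [43.753259, -79.329656]
print(sample['Latitude'].isna().sum())  # 2
```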


In [15]:
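#for each coordinate row, find the matching postal code in codes and copy its coordinates across
for i in range(len(all_coords)):
    code=all_coords.loc[i,'Postal Code']
    la=all_coords.loc[i,'Latitude']
    lg=all_coords.loc[i,'Longitude']

    for j in range(len(codes)):
        if codes.loc[j,'Postal Code']==code:
            codes.loc[j,'Latitude']=la
            codes.loc[j,'Longitude']=lg
            break  #each postal code appears only once, so stop searching after a match

codes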

| | Postal Code | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.654260 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
| ... | ... | ... | ... | ... | ... |
| 98 | M8X | Etobicoke | The Kingsway, Montgomery Road, Old Mill North | 43.653654 | -79.506944 |
| 99 | M4Y | Downtown Toronto | Church and Wellesley | 43.665860 | -79.383160 |
| 100 | M7Y | East Toronto | Business reply mail Processing Centre, South C... | 43.662744 | -79.321558 |
| 101 | M8Y | Etobicoke | Old Mill South, King's Mill Park, Sunnylea, Hu... | 43.636258 | -79.498509 |
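
As an alternative to the nested loops above, pandas can match rows on a shared key column in a single `merge` call. The sketch below uses small illustrative stand-ins for `codes` and `all_coords` (three rows taken from the outputs above, deliberately in different orders). `validate='one_to_one'` raises an error if a postal code appears more than once on either side, and `indicator=True` adds a `_merge` column recording whether each row found a match.

```python
import pandas as pd

# Illustrative stand-ins for `codes` and `all_coords`, listed in different orders
codes_demo = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M5A'],
    'Borough': ['North York', 'North York', 'Downtown Toronto'],
})
coords_demo = pd.DataFrame({
    'Postal Code': ['M5A', 'M3A', 'M4A'],
    'Latitude': [43.654260, 43.753259, 43.725882],
    'Longitude': [-79.360636, -79.329656, -79.315572],
})

# Left join on the postal code: keeps the row order of codes_demo
merged = codes_demo.merge(
    coords_demo, on='Postal Code', how='left',
    validate='one_to_one', indicator=True,
)

# Every row should have found its coordinates in the other frame
assert (merged['_merge'] == 'both').all()
merged = merged.drop(columns='_merge')
print(merged)
```

On the real data, this would be applied to `codes` before the zero-filled Latitude and Longitude columns are added; otherwise pandas would keep both sets of columns and add `_x`/`_y` suffixes.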


### Part 3: Analysis

In [16]:
import folium

In [36]:
#create a base map centred on Toronto
toronto = folium.Map(location=[43.70,-79.40],zoom_start=10.5)


In [37]:
#add a circle marker for each postal code, with a popup showing the code and its neighbourhoods
for lat, lng, postal_code, neighbourhood in zip(codes['Latitude'], codes['Longitude'], codes['Postal Code'], codes['Neighbourhood']):
    popup = folium.Popup(f'{postal_code}: {neighbourhood}', parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=popup,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(toronto)

In [38]:
toronto