# Segmenting and Clustering Neighbourhoods in Toronto II
## Applied Data Science Capstone by IBM on Coursera
**Fernanda Oliveira**  
Data Analyst

## Introduction

In this lab, I will use a CSV file that contains the latitude and longitude of each postal code (http://cocl.us/Geospatial_data). The coordinates could also be obtained with a geocoding package such as Geocoder. I will then load Toronto_cleandata.csv, the dataset I created in the previous lab (Part I), and merge the two datasets into one.
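The merge described above can be sketched with a few rows copied from the outputs further down. This is a minimal illustration, assuming the three-character postal code is the only key the two files share (it is called `Postal Code` in the coordinates file and `Postcode` in Toronto_cleandata.csv). The toy tables deliberately list the codes in different orders, as the real files do.

```python
import pandas as pd

# A few rows from Geospatial_Coordinates.csv (sorted by postal code)
coords = pd.DataFrame({"Postal Code": ["M1B", "M1C"],
                       "Latitude": [43.806686, 43.784535],
                       "Longitude": [-79.194353, -79.160497]})

# A few rows from Toronto_cleandata.csv (in a different order)
boroughs = pd.DataFrame({"Postcode": ["M3A", "M1C", "M1B"],
                         "Borough": ["North York", "Scarborough", "Scarborough"],
                         "Neighbourhood": ["Parkwoods",
                                           "Highland Creek, Rouge Hill, Port Union",
                                           "Rouge, Malvern"]})

# A left join keeps every row of the coordinates table and attaches the
# borough and neighbourhood whose postal code matches; the duplicate key is dropped
merged = (coords.merge(boroughs, left_on="Postal Code", right_on="Postcode", how="left")
                .drop(columns="Postcode"))
print(merged)
```

M3A is not in the toy coordinates table, so the left join leaves it out.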

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # install geopy; only needed if it is not already available
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas DataFrame (pandas >= 1.0; pandas.io.json.json_normalize was removed in pandas 2.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Libraries imported.
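The introduction notes that coordinates could also come from a geocoder instead of the CSV file. Below is a minimal sketch built around the geopy interface imported above. `get_coordinates` is a hypothetical helper, the address string format is an assumption, and a stub geocoder stands in for Nominatim so the example runs offline.

```python
from typing import Optional, Tuple


def get_coordinates(geolocator, address: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) for `address`, or None if the geocoder finds nothing.

    `geolocator` is any object with a geopy-style `geocode(query)` method, for example
    Nominatim(user_agent="toronto_explorer") from geopy.geocoders (requires network access).
    """
    location = geolocator.geocode(address)
    if location is None:
        return None
    return location.latitude, location.longitude


# Offline stand-ins used only to exercise the helper; they are NOT part of geopy.
class StubLocation:
    def __init__(self, latitude: float, longitude: float):
        self.latitude = latitude
        self.longitude = longitude


class StubGeocoder:
    def geocode(self, query: str):
        # Hypothetical address string mapped to the M1B coordinates from the CSV
        known = {"M1B, Toronto, Ontario": StubLocation(43.806686, -79.194353)}
        return known.get(query)


print(get_coordinates(StubGeocoder(), "M1B, Toronto, Ontario"))  # (43.806686, -79.194353)
print(get_coordinates(StubGeocoder(), "unknown place"))          # None
```

With network access, passing `Nominatim(user_agent="toronto_explorer")` instead of the stub would query the live service. The `user_agent` value is a placeholder. Nominatim's usage policy limits how often it can be queried, so looking up all 103 postal codes this way should be throttled. This is one reason a ready-made CSV of coordinates is convenient.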


In [2]:
df = pd.read_csv('Geospatial_Coordinates.csv')

In [3]:
df.head()

|   | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


In [4]:
df.shape

(103, 3)
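Because the postal code will be used as the join key, it may be worth confirming that the 103 rows contain 103 distinct codes in the expected form before merging. This is a minimal sketch; `check_postal_codes` is a hypothetical helper, and the pattern assumes Toronto forward sortation areas look like `M` + digit + letter, as in the rows shown above.

```python
import pandas as pd


def check_postal_codes(frame: pd.DataFrame, column: str = "Postal Code") -> None:
    """Raise ValueError if postal codes repeat or do not look like 'M' + digit + letter."""
    codes = frame[column]
    # Every postal code should appear exactly once before it is used as a join key
    repeated = codes[codes.duplicated()].tolist()
    if repeated:
        raise ValueError(f"repeated postal codes: {repeated}")
    # Toronto codes in this dataset look like 'M1B': 'M', one digit, one capital letter
    malformed = codes[~codes.str.match(r"M\d[A-Z]$")].tolist()
    if malformed:
        raise ValueError(f"unexpected postal code format: {malformed}")


# Codes copied from the df.head() output above; passes silently
check_postal_codes(pd.DataFrame({"Postal Code": ["M1B", "M1C", "M1E", "M1G", "M1H"]}))
```

In the notebook, the same check would be `check_postal_codes(df)`, and `check_postal_codes(df2, column="Postcode")` for the second file.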

In [5]:
df2 = pd.read_csv('Toronto_cleandata.csv')

In [6]:
df2.head()

|   | Unnamed: 0 | Postcode | Borough | Neighbourhood |
|---|---|---|---|---|
| 0 | 0 | M3A | North York | Parkwoods |
| 1 | 1 | M4A | North York | Victoria Village |
| 2 | 2 | M5A | Downtown Toronto | Harbourfront |
| 3 | 3 | M6A | North York | Lawrence Heights, Lawrence Manor |
| 4 | 4 | M7A | Queen's Park | Queen's Park |


In [7]:
df2.keys()

Index([u'Unnamed: 0', u'Postcode', u'Borough', u'Neighbourhood'], dtype='object')

In [8]:
df2.drop(['Unnamed: 0'], axis=1, inplace=True);
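The `Unnamed: 0` column dropped here is most likely the row index that Part I saved into Toronto_cleandata.csv. By default, `DataFrame.to_csv` writes the index as an extra first column with a blank header, and `read_csv` names a blank header `Unnamed: 0`. The small in-memory round trip below, using toy rows from the output above, shows the effect and two ways to avoid it. The final `df.to_csv(r'Neighbourhood.csv')` cell also writes the index, so reading that file back will show the same extra column.

```python
import io

import pandas as pd

frame = pd.DataFrame({"Postcode": ["M3A", "M4A"],
                      "Borough": ["North York", "North York"]})

# By default to_csv() also writes the row index as an unnamed first column...
csv_with_index = frame.to_csv()
# ...which read_csv() then loads back as a regular column called 'Unnamed: 0'
with_index_cols = pd.read_csv(io.StringIO(csv_with_index)).columns.tolist()

# Option 1: tell read_csv that the first column is the index
index_col_cols = pd.read_csv(io.StringIO(csv_with_index), index_col=0).columns.tolist()

# Option 2: leave the index out when writing
no_index_cols = pd.read_csv(io.StringIO(frame.to_csv(index=False))).columns.tolist()

print(with_index_cols)  # ['Unnamed: 0', 'Postcode', 'Borough']
print(index_col_cols)   # ['Postcode', 'Borough']
print(no_index_cols)    # ['Postcode', 'Borough']
```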

In [9]:
df2.head()

|   | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Harbourfront |
| 3 | M6A | North York | Lawrence Heights, Lawrence Manor |
| 4 | M7A | Queen's Park | Queen's Park |


In [10]:
df2.shape

(103, 3)

In [11]:
df2.sort_values(by=['Postcode']).head().reset_index()

|   | index | Postcode | Borough | Neighbourhood |
|---|---|---|---|---|
| 0 | 6 | M1B | Scarborough | Rouge, Malvern |
| 1 | 12 | M1C | Scarborough | Highland Creek, Rouge Hill, Port Union |
| 2 | 18 | M1E | Scarborough | Guildwood, Morningside, West Hill |
| 3 | 22 | M1G | Scarborough | Woburn |
| 4 | 26 | M1H | Scarborough | Cedarbrae |


In [12]:
df2.shape

(103, 3)
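Assigning a column such as `df2['Borough']` straight into `df` pairs rows by their integer index, not by postal code. That is only correct if both tables are in the same order. The sorted view above shows they are not: M1B is row 0 of `df` but row 6 of `df2`. The toy example below, with rows taken from the outputs above, shows the misalignment and one key-based alternative, `Series.map` with a lookup Series indexed by postal code. `DataFrame.merge` on the key is another option.

```python
import pandas as pd

coords = pd.DataFrame({"Postal Code": ["M1B", "M1C"],
                       "Latitude": [43.806686, 43.784535]})
boroughs = pd.DataFrame({"Postcode": ["M3A", "M1C", "M1B"],
                         "Borough": ["North York", "Scarborough", "Scarborough"]})

# Assigning a column aligns on the row index (0, 1, ...), not on the postal code
positional = coords.copy()
positional["Borough"] = boroughs["Borough"]

# Looking the borough up by postal code pairs the rows correctly
lookup = boroughs.set_index("Postcode")["Borough"]
keyed = coords.copy()
keyed["Borough"] = keyed["Postal Code"].map(lookup)

print(positional)  # M1B is labelled North York, which belongs to M3A
print(keyed)       # M1B is labelled Scarborough
```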

In [13]:
# df is sorted by postal code but df2 is not, so join on the postal code instead of on row position
df = df.merge(df2, left_on='Postal Code', right_on='Postcode', how='left')

In [14]:
df.drop(['Postcode'], axis=1, inplace=True) # the key is already kept in 'Postal Code'

In [15]:
df.head()

|   | Postal Code | Latitude | Longitude | Borough | Neighbourhood |
|---|---|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 | Scarborough | Rouge, Malvern |
| 1 | M1C | 43.784535 | -79.160497 | Scarborough | Highland Creek, Rouge Hill, Port Union |
| 2 | M1E | 43.763573 | -79.188711 | Scarborough | Guildwood, Morningside, West Hill |
| 3 | M1G | 43.770992 | -79.216917 | Scarborough | Woburn |
| 4 | M1H | 43.773136 | -79.239476 | Scarborough | Cedarbrae |


In [16]:
cols = df.columns.tolist()

In [17]:
cols = ['Postal Code', 'Borough', 'Neighbourhood', 'Latitude', 'Longitude']

In [18]:
df = df[cols]

In [19]:
df.head()

|   | Postal Code | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Rouge, Malvern | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Highland Creek, Rouge Hill, Port Union | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |


In [20]:
df.to_csv(r'Neighbourhood.csv')
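Neighbourhood.csv feeds the clustering step, so a stricter version of the join can confirm that it paired every postal code exactly once. This is a minimal sketch, assuming both input files should cover exactly the same postal codes. `merge_checked` is a hypothetical helper, and it uses the `validate` and `indicator` options of `DataFrame.merge`.

```python
import pandas as pd


def merge_checked(coords: pd.DataFrame, boroughs: pd.DataFrame) -> pd.DataFrame:
    """Join coordinates and boroughs on the postal code, failing loudly on any mismatch."""
    merged = coords.merge(
        boroughs,
        left_on="Postal Code",
        right_on="Postcode",
        how="outer",
        validate="one_to_one",  # raises MergeError if a postal code repeats on either side
        indicator=True,         # adds '_merge': 'both', 'left_only' or 'right_only'
    )
    unmatched = merged.loc[merged["_merge"] != "both", ["Postal Code", "Postcode"]]
    if not unmatched.empty:
        raise ValueError(f"postal codes present in only one table:\n{unmatched}")
    return merged.drop(columns=["Postcode", "_merge"])


# Toy rows from the outputs above, listed in different orders in the two tables
coords = pd.DataFrame({"Postal Code": ["M1B", "M1C"],
                       "Latitude": [43.806686, 43.784535]})
boroughs = pd.DataFrame({"Postcode": ["M1C", "M1B"],
                         "Borough": ["Scarborough", "Scarborough"],
                         "Neighbourhood": ["Highland Creek, Rouge Hill, Port Union",
                                           "Rouge, Malvern"]})
checked = merge_checked(coords, boroughs)
print(checked)
```

An outer join is used here instead of the notebook's left join so that postal codes missing from either file show up in `_merge` rather than being silently dropped.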