# Segmenting and Clustering Neighbourhoods in Toronto

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#item1">Part 1</a>

2. <a href="#item2">Part 2</a>

3. <a href="#item3">Part 3</a>
 
</font>
</div>

###    1. PART 1

####    Scrape data table

In [2]:
pip install beautifulsoup4

Collecting beautifulsoup4
  Downloading https://files.pythonhosted.org/packages/66/25/ff030e2437265616a1e9b25ccc864e0371a0bc3adb7c5a404fd661c6f4f6/beautifulsoup4-4.9.1-py3-none-any.whl (115kB)
Collecting soupsieve>1.2 (from beautifulsoup4)
  Downloading https://files.pythonhosted.org/packages/6f/8f/457f4a5390eeae1cc3aeab89deb7724c965be841ffca6cfca9197482e470/soupsieve-2.0.1-py3-none-any.whl
Installing collected packages: soupsieve, beautifulsoup4
Successfully installed beautifulsoup4-4.9.1 soupsieve-2.0.1
Note: you may need to restart the kernel to use updated packages.


In [18]:
from bs4 import BeautifulSoup as bs       # import Beautiful Soup
import urllib.request                     # import library for opening URLs  
import pandas as pd                       # import Pandas library

print("libraries imported")

libraries imported


In [19]:
# grab the webpage source code

sauce = urllib.request.urlopen('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').read() 

In [20]:
pip install lxml

Note: you may need to restart the kernel to use updated packages.


In [28]:
# Convert to Beautiful Soup object

soup = bs(sauce, 'lxml')  # name the parser explicitly (lxml was installed above)

In [29]:
#  extract the whole table.
table = soup.find('table')

In [30]:
#  extract rows
table_rows = table.find_all('tr')

In [31]:
# Initiate list for df rows

list=[]  
list

[]

In [32]:
# Populate the list of df rows, skipping rows whose Borough is "Not assigned" and using the borough name where the neighbourhood is missing.

for tr in table_rows:
    td = tr.find_all('td')
    row = [i.text.replace('\n','') for i in td]
    if row == [] or row[1] == 'Not assigned':   # header row (no <td> cells) or Borough 'Not assigned' ...
        continue                                # ... skip the row
    if row[2] in ('', 'Not assigned'):          # neighbourhood missing ...
        row[2] = row[1]                         # ... use the borough name instead
    list.append(row)                            # keep the cleaned row
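
The cleaning rule can be tried without fetching the page. The following is a minimal sketch, assuming each table row has already been reduced to a list of three cell strings. `clean_rows` and the `M9A` sample row are hypothetical, introduced only for illustration.

```python
def clean_rows(rows):
    """Drop unassigned boroughs and back-fill missing neighbourhood names."""
    cleaned = []
    for row in rows:
        if len(row) != 3:                      # header rows yield no <td> cells
            continue
        code, borough, neighbourhood = (cell.strip() for cell in row)
        if borough == 'Not assigned':          # no borough: skip this postal code
            continue
        if neighbourhood in ('', 'Not assigned'):
            neighbourhood = borough            # fall back to the borough name
        cleaned.append([code, borough, neighbourhood])
    return cleaned


sample = [
    [],                                        # header row (only <th> cells)
    ['M1A', 'Not assigned', 'Not assigned'],   # dropped
    ['M3A', 'North York', 'Parkwoods'],        # kept unchanged
    ['M9A', 'Etobicoke', 'Not assigned'],      # hypothetical row, back-filled
]
print(clean_rows(sample))
```

Building a new list here, rather than appending to a variable named `list`, also avoids shadowing the Python built-in.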

In [33]:
df = pd.DataFrame(list)  # construct the dataframe, df
df.head()

Unnamed: 0,0,1,2
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [34]:
df.columns = ['PostalCode','Borough','Neighborhood']  # rename df's columns
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [36]:
df.shape

(103, 3)

###    2. PART 2

####    Add Lat Long

In [151]:
# Geocoder not working, so reverting to lats and longs provided in .csv

In [37]:
# Preview df indexed by PostalCode; set_index returns a new frame, so df itself is unchanged

df.set_index('PostalCode')

PostalCode,Borough,Neighborhood
M3A,North York,Parkwoods
M4A,North York,Victoria Village
M5A,Downtown Toronto,"Regent Park, Harbourfront"
M6A,North York,"Lawrence Manor, Lawrence Heights"
M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...
M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
M4Y,Downtown Toronto,Church and Wellesley
M7Y,East Toronto,Business reply mail Processing Centre
M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


In [38]:
# sort_values returns a sorted copy; the result is not assigned, so df keeps its original order
df.sort_values(by='PostalCode')
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
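
Both `set_index` and `sort_values` return a new dataframe by default, which is why `df.head()` above still shows the original order and the `PostalCode` column. A small sketch of that behaviour, using a two-row frame named `demo` (a hypothetical name) built from rows shown earlier:

```python
import pandas as pd

demo = pd.DataFrame({'PostalCode': ['M5A', 'M3A'],
                     'Borough': ['Downtown Toronto', 'North York']})

demo.sort_values(by='PostalCode')            # result discarded, demo is unchanged
unchanged_first = demo.iloc[0]['PostalCode']

# Assign the result to keep the sorted or re-indexed frame.
sorted_demo = demo.sort_values(by='PostalCode').reset_index(drop=True)
indexed_demo = demo.set_index('PostalCode')

print(unchanged_first)
print(sorted_demo)
print(indexed_demo)
```

Later cells look up `df['PostalCode']` as a column and use `df.loc[0, ...]`, so leaving `df` itself unsorted and un-indexed is consistent with the rest of the notebook.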


In [39]:
# Lats and Longs from .csv

latlongs=[
['M1B',43.8066863,-79.1943534],
['M1C',43.7845351,-79.1604971],
['M1E',43.7635726,-79.1887115],
['M1G',43.7709921,-79.2169174],
['M1H',43.773136,-79.2394761],
['M1J',43.7447342,-79.2394761],
['M1K',43.7279292,-79.2620294],
['M1L',43.7111117,-79.2845772],
['M1M',43.716316,-79.2394761],
['M1N',43.692657,-79.2648481],
['M1P',43.7574096,-79.273304],
['M1R',43.7500715,-79.2958491],
['M1S',43.7942003,-79.2620294],
['M1T',43.7816375,-79.3043021],
['M1V',43.8152522,-79.2845772],
['M1W',43.7995252,-79.3183887],
['M1X',43.8361247,-79.2056361],
['M2H',43.8037622,-79.3634517],
['M2J',43.7785175,-79.3465557],
['M2K',43.7869473,-79.385975],
['M2L',43.7574902,-79.3747141],
['M2M',43.789053,-79.4084928],
['M2N',43.7701199,-79.4084928],
['M2P',43.7527583,-79.4000493],
['M2R',43.7827364,-79.4422593],
['M3A',43.7532586,-79.3296565],
['M3B',43.7459058,-79.352188],
['M3C',43.7258997,-79.340923],
['M3H',43.7543283,-79.4422593],
['M3J',43.7679803,-79.4872619],
['M3K',43.7374732,-79.4647633],
['M3L',43.7390146,-79.5069436],
['M3M',43.7284964,-79.4956974],
['M3N',43.7616313,-79.5209994],
['M4A',43.7258823,-79.3155716],
['M4B',43.7063972,-79.309937],
['M4C',43.6953439,-79.3183887],
['M4E',43.6763574,-79.2930312],
['M4G',43.7090604,-79.3634517],
['M4H',43.7053689,-79.3493719],
['M4J',43.685347,-79.3381065],
['M4K',43.6795571,-79.352188],
['M4L',43.6689985,-79.3155716],
['M4M',43.6595255,-79.340923],
['M4N',43.7280205,-79.3887901],
['M4P',43.7127511,-79.3901975],
['M4R',43.7153834,-79.4056784],
['M4S',43.7043244,-79.3887901],
['M4T',43.6895743,-79.3831599],
['M4V',43.6864123,-79.4000493],
['M4W',43.6795626,-79.3775294],
['M4X',43.667967,-79.3676753],
['M4Y',43.6658599,-79.3831599],
['M5A',43.6542599,-79.3606359],
['M5B',43.6571618,-79.3789371],
['M5C',43.6514939,-79.3754179],
['M5E',43.6447708,-79.3733064],
['M5G',43.6579524,-79.3873826],
['M5H',43.6505712,-79.3845675],
['M5J',43.6408157,-79.3817523],
['M5K',43.6471768,-79.3815764],
['M5L',43.6481985,-79.3798169],
['M5M',43.7332825,-79.4197497],
['M5N',43.7116948,-79.4169356],
['M5P',43.6969476,-79.4113072],
['M5R',43.6727097,-79.4056784],
['M5S',43.6626956,-79.4000493],
['M5T',43.6532057,-79.4000493],
['M5V',43.6289467,-79.3944199],
['M5W',43.6464352,-79.374846],
['M5X',43.6484292,-79.3822802],
['M6A',43.718518,-79.4647633],
['M6B',43.709577,-79.4450726],
['M6C',43.6937813,-79.4281914],
['M6E',43.6890256,-79.453512],
['M6G',43.669542,-79.4225637],
['M6H',43.6690051,-79.4422593],
['M6J',43.6479267,-79.4197497],
['M6K',43.6368472,-79.4281914],
['M6L',43.7137562,-79.4900738],
['M6M',43.6911158,-79.4760133],
['M6N',43.6731853,-79.4872619],
['M6P',43.6616083,-79.4647633],
['M6R',43.6489597,-79.456325],
['M6S',43.6515706,-79.4844499],
['M7A',43.6623015,-79.3894938],
['M7R',43.6369656,-79.615819],
['M7Y',43.6627439,-79.321558],
['M8V',43.6056466,-79.5013207],
['M8W',43.6024137,-79.5434841],
['M8X',43.6536536,-79.5069436],
['M8Y',43.6362579,-79.4985091],
['M8Z',43.6288408,-79.5209994],
['M9A',43.6678556,-79.5322424],
['M9B',43.6509432,-79.5547244],
['M9C',43.6435152,-79.5772008],
['M9L',43.7563033,-79.5659633],
['M9M',43.7247659,-79.5322424],
['M9N',43.706876,-79.5181884],
['M9P',43.696319,-79.5322424],
['M9R',43.6889054,-79.5547244],
['M9V',43.7394164,-79.5884369],
['M9W',43.7067483,-79.5940544]]

In [42]:
# Add lat and long columns to dataframe
df.insert(3,'lat','')
df.insert(4,'long','')

In [43]:
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,lat,long
0,M3A,North York,Parkwoods,,
1,M4A,North York,Victoria Village,,
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",,
3,M6A,North York,"Lawrence Manor, Lawrence Heights",,
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",,


In [44]:
# Enter the lats and longs into the dataframe

for code, lat, lng in latlongs:
    match = df['PostalCode'] == code   # rows of df with this postal code
    df.loc[match, 'lat'] = lat
    df.loc[match, 'long'] = lng
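
A join on the postal code is another way to attach the coordinates. The following is a sketch only, assuming the coordinate list is first turned into its own dataframe. `areas`, `coords` and `merged` are hypothetical names, and the three rows are taken from the data above.

```python
import pandas as pd

areas = pd.DataFrame({
    'PostalCode': ['M3A', 'M4A', 'M5A'],
    'Borough': ['North York', 'North York', 'Downtown Toronto'],
})
coords = pd.DataFrame(
    [['M3A', 43.7532586, -79.3296565],
     ['M4A', 43.7258823, -79.3155716],
     ['M5A', 43.6542599, -79.3606359]],
    columns=['PostalCode', 'lat', 'long'],
)

# Left join keeps every row of `areas`; validate raises if a postal code repeats.
merged = areas.merge(coords, on='PostalCode', how='left', validate='one_to_one')
print(merged)
```

With this approach the `lat` and `long` columns come out as floats rather than as object columns seeded with empty strings.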

In [45]:
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,lat,long
0,M3A,North York,Parkwoods,43.7533,-79.3297
1,M4A,North York,Victoria Village,43.7259,-79.3156
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.6543,-79.3606
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.7185,-79.4648
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.6623,-79.3895


###    3. PART 3

####    Explore and Cluster Toronto Neighborhoods

In [49]:
# import additional required libraries

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # install geopy if it is not already available
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Libraries imported.


In [50]:
#Use geopy library to get lat and long for Toronto

address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))


The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [51]:
# Create a map of Toronto using those coordinates.

In [59]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=9)

# add markers to map
for lat, lng, borough, neighborhood in zip(df['lat'], df['long'], df['Borough'], df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

In [60]:
# Define Foursquare Credentials and Version

CLIENT_ID = 'TTKRIJNA4KXOKTJLNKR24GUM5EJBCAPIIG4HDY51S2T3T2OU' # your Foursquare ID
CLIENT_SECRET = 'EZC0YYVZXAIQGEZ0DYCFXM1XEQ4JUNQHY4IBKMKLA3HBJYIL' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: TTKRIJNA4KXOKTJLNKR24GUM5EJBCAPIIG4HDY51S2T3T2OU
CLIENT_SECRET:EZC0YYVZXAIQGEZ0DYCFXM1XEQ4JUNQHY4IBKMKLA3HBJYIL


In [62]:
# First neighborhood in df
df.loc[0, 'Neighborhood']

'Parkwoods'

In [64]:
# get Parkwoods' latitude and longitude
neighborhood_latitude = df.loc[0, 'lat'] # neighborhood latitude value
neighborhood_longitude = df.loc[0, 'long'] # neighborhood longitude value

neighborhood_name = df.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Parkwoods are 43.7532586, -79.3296565.


#### Find top 100 venues within 500m

In [74]:
# construct URL

limit = 100   # to get top 100
radius = 500 # specify search radius

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    limit)
              
url

'https://api.foursquare.com/v2/venues/explore?&client_id=TTKRIJNA4KXOKTJLNKR24GUM5EJBCAPIIG4HDY51S2T3T2OU&client_secret=EZC0YYVZXAIQGEZ0DYCFXM1XEQ4JUNQHY4IBKMKLA3HBJYIL&v=20180605&ll=43.7532586,-79.3296565&radius=500&limit=100'
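
The same query string can be assembled from a dictionary, which keeps each parameter next to its name. This is a sketch under assumptions: `build_explore_url` is a hypothetical helper, and the environment variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are illustrative, not something the notebook defines.

```python
import os
from urllib.parse import urlencode


def build_explore_url(lat, lng, radius=500, limit=100, version='20180605'):
    """Build a venues/explore URL with the same parameters as the cell above."""
    params = {
        # Hypothetical environment variables; placeholders are used if unset.
        'client_id': os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID'),
        'client_secret': os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET'),
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in the ll value unescaped
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params, safe=',')


example_url = build_explore_url(43.7532586, -79.3296565)
print(example_url)
```

Reading the credentials from the environment keeps them out of the notebook and its printed output.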

In [75]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ec91d501835dd001bd5a723'},
  'headerLocation': 'Parkwoods - Donalda',
  'headerFullLocation': 'Parkwoods - Donalda, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 2,
  'suggestedBounds': {'ne': {'lat': 43.757758604500005,
    'lng': -79.32343823984928},
   'sw': {'lat': 43.7487585955, 'lng': -79.33587476015072}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4e8d9dcdd5fbbbb6b3003c7b',
       'name': 'Brookbanks Park',
       'location': {'address': 'Toronto',
        'lat': 43.751976046055574,
        'lng': -79.33214044722958,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.751976046055574,
          'lng': -79.33214044722958}],
        'distance': 245,
        'cc': 'CA',
        'c

In [76]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:  # flattened frames use the 'venue.categories' column instead
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
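
To see how the two key names are handled, here is a small stand-alone sketch that applies the same lookup to plain dictionaries. `category_name` is a hypothetical variant written for this illustration, and `empty_row` is made-up sample data.

```python
def category_name(row):
    """Return the first category name, or None when the venue has no category."""
    # Try the bare key first, then the flattened 'venue.categories' key.
    categories = row.get('categories', row.get('venue.categories', []))
    return categories[0]['name'] if categories else None


bare_row = {'name': 'Brookbanks Park', 'categories': [{'name': 'Park'}]}
flat_row = {'venue.name': 'Brookbanks Park', 'venue.categories': [{'name': 'Park'}]}
empty_row = {'venue.name': 'Unnamed venue', 'venue.categories': []}  # hypothetical

for row in (bare_row, flat_row, empty_row):
    print(category_name(row))
```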

In [77]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()


Unnamed: 0,name,categories,lat,lng
0,Brookbanks Park,Park,43.751976,-79.33214
1,Variety Store,Food & Drink Shop,43.751974,-79.333114


In [78]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))


2 venues were returned by Foursquare.


In [83]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, limit=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
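
The list comprehension in `getNearbyVenues` indexes `categories[0]`, which would raise an `IndexError` for a venue that has no category. One possible way to guard against that is sketched below. `flatten_items` is a hypothetical helper, and the second sample item is made up to show the empty-category case.

```python
def flatten_items(neighborhood, lat, lng, items):
    """Turn Foursquare 'items' into row tuples, tolerating missing categories."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((neighborhood, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows


items = [
    {'venue': {'name': 'Brookbanks Park',
               'location': {'lat': 43.751976046055574, 'lng': -79.33214044722958},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'Uncategorised venue',          # hypothetical item
               'location': {'lat': 43.75, 'lng': -79.33},
               'categories': []}},
]
rows = flatten_items('Parkwoods', 43.7532586, -79.3296565, items)
print(rows)
```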

In [84]:
toronto_venues = getNearbyVenues(names=df['Neighborhood'],
                                   latitudes=df['lat'],
                                   longitudes=df['long'])

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmount Park
Bayview Village
Downsview
The Danforth West, Ri

In [85]:
print(toronto_venues.shape)
toronto_venues.head()

(2128, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
3,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop
4,Victoria Village,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant


In [86]:
toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Agincourt,4,4,4,4,4,4
"Alderwood, Long Branch",8,8,8,8,8,8
"Bathurst Manor, Wilson Heights, Downsview North",21,21,21,21,21,21
Bayview Village,4,4,4,4,4,4
"Bedford Park, Lawrence Manor East",24,24,24,24,24,24
Berczy Park,54,54,54,54,54,54
"Birch Cliff, Cliffside West",5,5,5,5,5,5
"Brockton, Parkdale Village, Exhibition Place",23,23,23,23,23,23
Business reply mail Processing Centre,17,17,17,17,17,17
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",15,15,15,15,15,15


In [88]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 273 unique categories.
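
For the per-neighbourhood counts, `groupby(...).size()` gives a single count per group rather than repeating it in every column, and `nunique()` counts distinct categories directly. A minimal sketch on the five venue rows shown earlier (`sample` is a hypothetical name):

```python
import pandas as pd

sample = pd.DataFrame({
    'Neighborhood': ['Parkwoods', 'Parkwoods', 'Victoria Village',
                     'Victoria Village', 'Victoria Village'],
    'Venue': ['Brookbanks Park', 'Variety Store', 'Victoria Village Arena',
              'Tim Hortons', 'Portugril'],
    'Venue Category': ['Park', 'Food & Drink Shop', 'Hockey Arena',
                       'Coffee Shop', 'Portuguese Restaurant'],
})

venues_per_neighborhood = sample.groupby('Neighborhood').size()  # one count per group
n_categories = sample['Venue Category'].nunique()                # distinct categories

print(venues_per_neighborhood)
print(n_categories)
```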
