Capstone: Find the best neighborhood in Toronto to open a Restaurant Supply Store
1. Load all the Data from all the various sources.
1.1 Toronto neighborhoods broken down by postal code.
In [1]:

# Load the required libraries
import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Locate the table with BeautifulSoup and parse it with pandas.
res = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
soup = BeautifulSoup(res.content,'lxml')
table = soup.find_all('table')[0] 
df = pd.read_html(str(table))


# Wrangle/transform the data
# read_html returns a list of DataFrames; take the first one
data = pd.DataFrame(df[0])

# Rename the columns as instructed
data = data.rename(columns={0:'PostalCode', 1:'Bourough', 2:'Neighborhood'})

# Get rid of the first row which contained the table headers from the webpage
data = data.iloc[1:]


# Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
data = data[~data['Bourough'].str.contains('Not assigned')]


# More than one neighborhood can exist in one postal code area.
# For example, in the table on the Wikipedia page, M5A is listed twice
# and has two neighborhoods: Harbourfront and Regent Park. These two rows
# are combined into one row with the neighborhoods separated by a comma.
df2=data.groupby(['PostalCode', 'Bourough']).apply(lambda group: ', '.join(group['Neighborhood']))


# Convert the Series back into a DataFrame and restore the 'Neighborhood' column label
df2=df2.to_frame().reset_index()
df2 = df2.rename(columns={0:'Neighborhood'})

# If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.
df2.loc[df2.Neighborhood == 'Not assigned', 'Neighborhood' ] = df2.Bourough

# Display the DataFrame
df2.head()

Out[1]:
	PostalCode 	Bourough 	Neighborhood
0 	M1B 	Scarborough 	Rouge, Malvern
1 	M1C 	Scarborough 	Highland Creek, Rouge Hill, Port Union
2 	M1E 	Scarborough 	Guildwood, Morningside, West Hill
3 	M1G 	Scarborough 	Woburn
4 	M1H 	Scarborough 	Cedarbrae
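The grouping step above can also be written as a column-wise aggregation, which avoids calling `apply` on whole groups. The following is a minimal sketch on a toy table; the rows are illustrative stand-ins, not the scraped data, and the `Bourough` spelling simply mirrors the column name used in this notebook.

```python
import pandas as pd

# Toy rows mimicking the scraped table; values are illustrative only.
toy = pd.DataFrame({
    'PostalCode': ['M5A', 'M5A', 'M7A'],
    'Bourough': ['Downtown Toronto', 'Downtown Toronto', "Queen's Park"],
    'Neighborhood': ['Harbourfront', 'Regent Park', 'Not assigned'],
})

# Join all neighborhoods that share a postal code and borough into one string.
combined = (toy.groupby(['PostalCode', 'Bourough'], as_index=False)['Neighborhood']
               .agg(', '.join))

# Where no neighborhood is assigned, fall back to the borough name.
mask = combined['Neighborhood'] == 'Not assigned'
combined.loc[mask, 'Neighborhood'] = combined.loc[mask, 'Bourough']
print(combined)
```

Restricting the fallback assignment to the masked rows makes it explicit which cells are being overwritten.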
1.1.1 Load Toronto geospatial coordinates and merge with Toronto postal code data
In [2]:

# Load Toronto geospatial coordinates
!wget -O to_geo_space.csv http://cocl.us/Geospatial_data

# Read into a DataFrame
gf = pd.read_csv('to_geo_space.csv')

# Rename the column so the key names match
gf = gf.rename(columns={'Postal Code':'PostalCode'})

# Merge the Toronto data with the geo coordinate data
gf_new = pd.merge(df2, gf, on='PostalCode', how='inner')

# display the new dataframe
gf_new.head()

2019-01-19 16:06:49 (62.1 MB/s) - ‘to_geo_space.csv’ saved [2891/2891]

Out[2]:
	PostalCode 	Bourough 	Neighborhood 	Latitude 	Longitude
0 	M1B 	Scarborough 	Rouge, Malvern 	43.806686 	-79.194353
1 	M1C 	Scarborough 	Highland Creek, Rouge Hill, Port Union 	43.784535 	-79.160497
2 	M1E 	Scarborough 	Guildwood, Morningside, West Hill 	43.763573 	-79.188711
3 	M1G 	Scarborough 	Woburn 	43.770992 	-79.216917
4 	M1H 	Scarborough 	Cedarbrae 	43.773136 	-79.239476
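An inner merge silently drops any postal code that has no coordinate row. One way to check for that, sketched below on toy data, is a left merge with `indicator=True`. The `M9Z` row and the frame names `codes` and `coords` are hypothetical and only stand in for `df2` and `gf`.

```python
import pandas as pd

# Toy stand-ins for df2 and gf; the M9Z row is hypothetical.
codes = pd.DataFrame({'PostalCode': ['M1B', 'M1C', 'M9Z'],
                      'Neighborhood': ['Rouge, Malvern', 'Highland Creek', 'Hypothetical']})
coords = pd.DataFrame({'Postal Code': ['M1B', 'M1C'],
                       'Latitude': [43.806686, 43.784535],
                       'Longitude': [-79.194353, -79.160497]})
coords = coords.rename(columns={'Postal Code': 'PostalCode'})

# A left merge with indicator=True keeps every postal code and records
# whether a matching coordinate row was found.
checked = pd.merge(codes, coords, on='PostalCode', how='left', indicator=True)
unmatched = checked.loc[checked['_merge'] == 'left_only', 'PostalCode'].tolist()
print(unmatched)
```

If `unmatched` is empty, the inner merge used above loses nothing.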
1.2 Toronto neighborhood populations broken down by postal code
In [3]:

# Load this data from Stats Canada
df_pop = pd.read_csv('https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/hlt-fst/pd-pl/Tables/File.cfm?T=1201&SR=1&RPP=9999&PR=0&CMA=0&CSD=0&S=22&O=A&Lang=Eng&OFT=CSV',encoding = 'unicode_escape')
# Rename the columns appropriately
df_pop = df_pop.rename(columns={'Geographic code':'PostalCode', 'Geographic name':'PostalCod2', 'Province or territory':'Province', 'Incompletely enumerated Indian reserves and Indian settlements, 2016':'Incomplete', 'Population, 2016':'Population_2016', 'Total private dwellings, 2016':'TotalPrivDwellings', 'Private dwellings occupied by usual residents, 2016':'PrivDwellingsOccupied'})
df_pop= df_pop.drop(columns=['PostalCod2', 'Province', 'Incomplete', 'TotalPrivDwellings', 'PrivDwellingsOccupied'])

# Get rid of the first row 
df_pop = df_pop.iloc[1:]
df_pop.head()

Out[3]:
	PostalCode 	Population_2016
1 	A0A 	46587.0
2 	A0B 	19792.0
3 	A0C 	12587.0
4 	A0E 	22294.0
5 	A0G 	35266.0
1.2.1 Merge Toronto Neighbourhood populations data with Toronto Postal Code data
In [4]:

# Merge the Toronto population data with the geo/postal code data
gf_new = pd.merge(df_pop, gf_new, on='PostalCode', how='right')
# Sort on population
gf_new = gf_new.sort_values(by=['Population_2016'], ascending=False)

# display the new dataframe
gf_new.head()

Out[4]:
	PostalCode 	Population_2016 	Bourough 	Neighborhood 	Latitude 	Longitude
22 	M2N 	75897.0 	North York 	Willowdale South 	43.770120 	-79.408493
0 	M1B 	66108.0 	Scarborough 	Rouge, Malvern 	43.806686 	-79.194353
18 	M2J 	58293.0 	North York 	Fairview, Henry Farm, Oriole 	43.778517 	-79.346556
100 	M9V 	55959.0 	Etobicoke 	Albion Gardens, Beaumond Heights, Humbergate, ... 	43.739416 	-79.588437
14 	M1V 	54680.0 	Scarborough 	Agincourt North, L'Amoreaux East, Milliken, St... 	43.815252 	-79.284577

Key observation: the result is a list of Toronto neighborhood populations, sorted from largest to smallest.
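The population file covers postal code prefixes across Canada, so the right merge is what restricts the result to Toronto. A small sketch of that behavior follows. The three population values are taken from the outputs above, while the `M9Z` row is hypothetical and included only to show what happens when a Toronto code has no population entry.

```python
import pandas as pd

# National population table (subset) and a Toronto table with one
# hypothetical code (M9Z) that has no population entry.
pop = pd.DataFrame({'PostalCode': ['A0A', 'M1B', 'M2N'],
                    'Population_2016': [46587.0, 66108.0, 75897.0]})
toronto = pd.DataFrame({'PostalCode': ['M1B', 'M2N', 'M9Z'],
                        'Bourough': ['Scarborough', 'North York', 'Hypothetical']})

# how='right' keeps every Toronto row and drops non-Toronto codes such as A0A.
merged = pd.merge(pop, toronto, on='PostalCode', how='right')

# Rows without a population get NaN, which sort_values places last by default.
merged = merged.sort_values(by='Population_2016', ascending=False)
print(merged)
```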
1.3 Toronto neighborhood average after-tax income broken down by postal code
In [5]:

# It was easier to extract this data manually from Stats Canada and load it than it was to scrape it.
# It was only accessible through individual queries per postal code on the Stats Canada website.
df_income = pd.read_csv('TorontoAvgIncomeByPC.csv',encoding = 'unicode_escape')
# Rename the after-tax income column to a more manageable name
df_income = df_income.rename(columns={"after-tax income of households in 2015":"AfterTaxIncome2015"})
df_income.head()

Out[5]:
	PostalCode 	AfterTaxIncome2015
0 	M1A 	Null
1 	M2A 	Null
2 	M3A 	57698
3 	M4A 	49453
4 	M5A 	46938
1.3.1 Merge Toronto Neighbourhood income data with Toronto Postal Code data
In [6]:

# Merge the Toronto income data with the geo/postal code data

gf_new = pd.merge(df_income, gf_new, on='PostalCode', how='right')
# Replace the 'Null' placeholders with 0
gf_new = gf_new.replace('Null', 0)
# Cast the income column to float
gf_new['AfterTaxIncome2015'] = gf_new['AfterTaxIncome2015'].astype('float64')
# Sort on income
gf_new = gf_new.sort_values(by=['AfterTaxIncome2015'], ascending=False)

# Save to CSV and display the top 10 rows
gf_new.to_csv('TO_Affluence.csv')
gf_new.head(10)

Out[6]:
	PostalCode 	AfterTaxIncome2015 	Population_2016 	Bourough 	Neighborhood 	Latitude 	Longitude
66 	M2P 	115237.0 	7843.0 	North York 	York Mills West 	43.752758 	-79.400049
55 	M5M 	111821.0 	25975.0 	North York 	Bedford Park, Lawrence Manor East 	43.733283 	-79.419750
61 	M4N 	109841.0 	15330.0 	Central Toronto 	Lawrence Park 	43.728020 	-79.388790
74 	M5R 	108271.0 	26496.0 	Central Toronto 	The Annex, North Midtown, Yorkville 	43.672710 	-79.405678
98 	M8X 	97210.0 	10787.0 	Etobicoke 	The Kingsway, Montgomery Road, Old Mill North 	43.653654 	-79.506944
45 	M2L 	96512.0 	11717.0 	North York 	Silver Hills, York Mills 	43.757490 	-79.374714
23 	M4G 	94853.0 	19076.0 	East York 	Leaside 	43.709060 	-79.363452
12 	M1C 	93943.0 	35626.0 	Scarborough 	Highland Creek, Rouge Hill, Port Union 	43.784535 	-79.160497
11 	M9B 	91110.0 	32400.0 	Etobicoke 	Cloverdale, Islington, Martin Grove, Princess ... 	43.650943 	-79.554724
7 	M3B 	90841.0 	13324.0 	North York 	Don Mills North 	43.745906 	-79.352188

Key observation: Toronto affluence by neighborhood, ranked by after-tax household income.
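Replacing 'Null' with 0 pushes the missing postal codes to the bottom of the ranking, but a zero would also pull down any average computed later. An alternative, sketched below on three rows taken from the income table above, is to convert the column with `pd.to_numeric` so that missing values become NaN. This is a suggestion, not the approach used in this notebook.

```python
import pandas as pd

# Three rows from the income table above; 'Null' marks a missing value.
income = pd.DataFrame({'PostalCode': ['M1A', 'M3A', 'M4A'],
                       'AfterTaxIncome2015': ['Null', '57698', '49453']})

# errors='coerce' turns anything non-numeric (here 'Null') into NaN.
income['AfterTaxIncome2015'] = pd.to_numeric(income['AfterTaxIncome2015'],
                                             errors='coerce')

# NaN rows sort last and are skipped by mean().
ranked = income.sort_values(by='AfterTaxIncome2015', ascending=False)
mean_income = income['AfterTaxIncome2015'].mean()
print(ranked)
print(mean_income)
```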
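1.4 What is the Canadian national average after-tax income?

Again obtained from the Stats Canada website: Canadian families and unattached individuals had a median after-tax income of $57,000 in 2016. Note that this is a 2016 median, whereas the neighborhood figures above are 2015 household figures, so any comparison between the two is approximate.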
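As a rough illustration of how the national figure could be used, the sketch below flags postal codes whose after-tax income exceeds $57,000. The three income values come from the tables above; the names `NATIONAL_MEDIAN_2016` and `IncomeRatio` are assumptions introduced only for this example.

```python
import pandas as pd

NATIONAL_MEDIAN_2016 = 57000  # national figure quoted above

# Three postal codes with incomes taken from the tables above.
sample = pd.DataFrame({'PostalCode': ['M2P', 'M5M', 'M4A'],
                       'AfterTaxIncome2015': [115237.0, 111821.0, 49453.0]})

# Ratio of local income to the national figure (hypothetical helper column).
sample['IncomeRatio'] = sample['AfterTaxIncome2015'] / NATIONAL_MEDIAN_2016

# Keep only postal codes above the national figure.
above = sample[sample['AfterTaxIncome2015'] > NATIONAL_MEDIAN_2016]
print(above)
```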
1.5 List of Toronto restaurants or venues that could potentially use restaurant equipment
In [7]:

#FourSquare Credentials

CLIENT_ID = 'APO00QTF2Y3WAWZUT2YTZXSPZGDGHOYNY5FSI1ARNPVQ2WQU' # your Foursquare ID


CLIENT_SECRET = 'RWUKTJGS3Y1GOCBSRX1TMUUFMYPFL2BVV03GVUVIHH3G25UC' # your Foursquare Secret


VERSION = '20180605' # Foursquare API version

In [8]:

# Let's explore the neighborhoods in our dataframe.
import requests # library to handle requests

LIMIT = 200 # maximum number of venues requested from the Foursquare API

radius = 500 # default search radius

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
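`getNearbyVenues` indexes straight into the response, so a reply without groups or a venue without categories would raise an exception. The sketch below shows one defensive way to flatten a response into rows. The helper `extract_venue_rows` and the hand-written `fake_payload` are hypothetical: the payload only mirrors the keys accessed in the function above and is not real API output.

```python
def extract_venue_rows(payload, name, lat, lng):
    """Flatten one explore-style response into row tuples (hypothetical helper)."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories', [])
        # Use None when a venue has no category instead of raising IndexError.
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue.get('name'),
                     location.get('lat'), location.get('lng'), category))
    return rows

# Hand-written payload shaped like the keys used above; not real API output.
fake_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Example Cafe',
               'location': {'lat': 43.75, 'lng': -79.40},
               'categories': [{'name': 'Café'}]}},
    {'venue': {'name': 'Uncategorized Spot',
               'location': {'lat': 43.76, 'lng': -79.41},
               'categories': []}},
]}]}}

rows = extract_venue_rows(fake_payload, 'York Mills West', 43.752758, -79.400049)
empty = extract_venue_rows({}, 'Nowhere', 0.0, 0.0)
print(rows)
```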

In [9]:

# Toronto boroughs
TO_data = gf_new
TO_data.head()

Out[9]:
	PostalCode 	AfterTaxIncome2015 	Population_2016 	Bourough 	Neighborhood 	Latitude 	Longitude
66 	M2P 	115237.0 	7843.0 	North York 	York Mills West 	43.752758 	-79.400049
55 	M5M 	111821.0 	25975.0 	North York 	Bedford Park, Lawrence Manor East 	43.733283 	-79.419750
61 	M4N 	109841.0 	15330.0 	Central Toronto 	Lawrence Park 	43.728020 	-79.388790
74 	M5R 	108271.0 	26496.0 	Central Toronto 	The Annex, North Midtown, Yorkville 	43.672710 	-79.405678
98 	M8X 	97210.0 	10787.0 	Etobicoke 	The Kingsway, Montgomery Road, Old Mill North 	43.653654 	-79.506944
1.5.1 Get all the Venues in Toronto.
In [11]:

# Get all of the Venues
TO_venues = getNearbyVenues(names=TO_data['Neighborhood'],
                                   latitudes=TO_data['Latitude'],
                                   longitudes=TO_data['Longitude']
                                  )

York Mills West
Bedford Park, Lawrence Manor East
Lawrence Park
The Annex, North Midtown, Yorkville
The Kingsway, Montgomery Road, Old Mill North
Silver Hills, York Mills
Leaside
Highland Creek, Rouge Hill, Port Union
Cloverdale, Islington, Martin Grove, Princess Gardens, West Deane Park
Don Mills North
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara
East Toronto
Upper Rouge
Birch Cliff, Cliffside West
Little Portugal, Trinity
Rosedale
Moore Park, Summerhill East
Roselawn
The Beaches
Kingsway Park South West, Mimico NW, The Queensway West, Royal York South West, South of Bloor
North Toronto West
Runnymede, Swansea
Berczy Park
Forest Hill North, Forest Hill West
Woodbine Heights
Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West
Harbourfront East, Toronto Islands, Union Station
Adelaide, King, Richmond
St. James Town
The Beaches West, India Bazaar
Bloordale Gardens, Eringate, Markland Wood, Old Burnhamthorpe
Kingsview Village, Martin Grove Gardens, Richview Gardens, St. Phillips
Parkdale, Roncesvalles
Islington Avenue
Fairview, Henry Farm, Oriole
Studio District
Hillcrest Village
Westmount
Davisville
Christie
Alderwood, Long Branch
Bathurst Manor, Downsview North, Wilson Heights
Humber Bay, King's Mill Park, Kingsway Park South East, Mimico NE, Old Mill South, The Queensway East, Royal York South East, Sunnylea
Rouge, Malvern
Clairlea, Golden Mile, Oakridge
Humewood-Cedarvale
Cliffcrest, Cliffside, Scarborough Village West
Agincourt North, L'Amoreaux East, Milliken, Steeles East
Northwest
Bayview Village
High Park, The Junction South
L'Amoreaux West, Steeles West
Glencairn
Humber Summit
Agincourt
Parkwoods
The Danforth West, Riverdale
Emery, Humberlea
CFB Toronto, Downsview East
Dovercourt Village, Dufferin
Guildwood, Morningside, West Hill
Downsview West
Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown
Willowdale South
Caledonia-Fairbanks
Maryvale, Wexford
Newtonbrook, Willowdale
Woodbine Gardens, Parkview Hill
Cedarbrae
Humber Bay Shores, Mimico South, New Toronto
Davisville North
The Junction North, Runnymede
Dorset Park, Scarborough Town Centre, Wexford Heights
Flemingdon Park, Don Mills South
Willowdale West
Woburn
Maple Leaf Park, North Park, Upwood Park
Scarborough Village
Downsview Central
Northwood Park, York University
Victoria Village
Clarks Corners, Sullivan, Tam O'Shanter
Lawrence Heights, Lawrence Manor
East Birchmount Park, Ionview, Kennedy Park
Del Ray, Keelesdale, Mount Dennis, Silverthorn
Brockton, Exhibition Place, Parkdale Village
Harbord, University of Toronto
Harbourfront, Regent Park
Church and Wellesley
Weston
Ryerson, Garden District
Downsview Northwest
Central Bay Street
Thorncliffe Park
Chinatown, Grange Park, Kensington Market
Cabbagetown, St. James Town
Commerce Court, Victoria Hotel
Canada Post Gateway Processing Centre
Queen's Park
First Canadian Place, Underground city
Design Exchange, Toronto Dominion Centre
Business Reply Mail Processing Centre 969 Eastern
Stn A PO Boxes 25 The Esplanade

Let's count the number of venues per neighborhood
In [12]:

TO_venues.groupby('Neighborhood').count()

Out[12]:
	Neighborhood Latitude 	Neighborhood Longitude 	Venue 	Venue Latitude 	Venue Longitude 	Venue Category
Neighborhood 						
Adelaide, King, Richmond 	100 	100 	100 	100 	100 	100
Agincourt 	4 	4 	4 	4 	4 	4
Agincourt North, L'Amoreaux East, Milliken, Steeles East 	2 	2 	2 	2 	2 	2
Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown 	10 	10 	10 	10 	10 	10
Alderwood, Long Branch 	10 	10 	10 	10 	10 	10
Bathurst Manor, Downsview North, Wilson Heights 	17 	17 	17 	17 	17 	17
Bayview Village 	4 	4 	4 	4 	4 	4
Bedford Park, Lawrence Manor East 	25 	25 	25 	25 	25 	25
Berczy Park 	54 	54 	54 	54 	54 	54
Birch Cliff, Cliffside West 	4 	4 	4 	4 	4 	4
Bloordale Gardens, Eringate, Markland Wood, Old Burnhamthorpe 	6 	6 	6 	6 	6 	6
Brockton, Exhibition Place, Parkdale Village 	19 	19 	19 	19 	19 	19
Business Reply Mail Processing Centre 969 Eastern 	17 	17 	17 	17 	17 	17
CFB Toronto, Downsview East 	4 	4 	4 	4 	4 	4
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara 	14 	14 	14 	14 	14 	14
Cabbagetown, St. James Town 	47 	47 	47 	47 	47 	47
Caledonia-Fairbanks 	6 	6 	6 	6 	6 	6
Canada Post Gateway Processing Centre 	11 	11 	11 	11 	11 	11
Cedarbrae 	8 	8 	8 	8 	8 	8
Central Bay Street 	82 	82 	82 	82 	82 	82
Chinatown, Grange Park, Kensington Market 	100 	100 	100 	100 	100 	100
Christie 	15 	15 	15 	15 	15 	15
Church and Wellesley 	87 	87 	87 	87 	87 	87
Clairlea, Golden Mile, Oakridge 	10 	10 	10 	10 	10 	10
Clarks Corners, Sullivan, Tam O'Shanter 	9 	9 	9 	9 	9 	9
Cliffcrest, Cliffside, Scarborough Village West 	3 	3 	3 	3 	3 	3
Cloverdale, Islington, Martin Grove, Princess Gardens, West Deane Park 	2 	2 	2 	2 	2 	2
Commerce Court, Victoria Hotel 	100 	100 	100 	100 	100 	100
Davisville 	36 	36 	36 	36 	36 	36
Davisville North 	10 	10 	10 	10 	10 	10
... 	... 	... 	... 	... 	... 	...
Northwest 	2 	2 	2 	2 	2 	2
Northwood Park, York University 	5 	5 	5 	5 	5 	5
Parkdale, Roncesvalles 	16 	16 	16 	16 	16 	16
Parkwoods 	3 	3 	3 	3 	3 	3
Queen's Park 	44 	44 	44 	44 	44 	44
Rosedale 	4 	4 	4 	4 	4 	4
Roselawn 	3 	3 	3 	3 	3 	3
Rouge, Malvern 	1 	1 	1 	1 	1 	1
Runnymede, Swansea 	40 	40 	40 	40 	40 	40
Ryerson, Garden District 	100 	100 	100 	100 	100 	100
Scarborough Village 	2 	2 	2 	2 	2 	2
St. James Town 	100 	100 	100 	100 	100 	100
Stn A PO Boxes 25 The Esplanade 	95 	95 	95 	95 	95 	95
Studio District 	39 	39 	39 	39 	39 	39
The Annex, North Midtown, Yorkville 	24 	24 	24 	24 	24 	24
The Beaches 	5 	5 	5 	5 	5 	5
The Beaches West, India Bazaar 	20 	20 	20 	20 	20 	20
The Danforth West, Riverdale 	42 	42 	42 	42 	42 	42
The Junction North, Runnymede 	4 	4 	4 	4 	4 	4
The Kingsway, Montgomery Road, Old Mill North 	3 	3 	3 	3 	3 	3
Thorncliffe Park 	16 	16 	16 	16 	16 	16
Victoria Village 	4 	4 	4 	4 	4 	4
Westmount 	7 	7 	7 	7 	7 	7
Weston 	3 	3 	3 	3 	3 	3
Willowdale South 	34 	34 	34 	34 	34 	34
Willowdale West 	5 	5 	5 	5 	5 	5
Woburn 	4 	4 	4 	4 	4 	4
Woodbine Gardens, Parkview Hill 	13 	13 	13 	13 	13 	13
Woodbine Heights 	10 	10 	10 	10 	10 	10
York Mills West 	3 	3 	3 	3 	3 	3

99 rows × 6 columns
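Because `count()` repeats the same number in every column, a single count per neighborhood may be easier to read. A minimal sketch on toy data follows; the venue names and the `VenueCount` label are illustrative assumptions.

```python
import pandas as pd

# Toy venue table; names are illustrative only.
venues = pd.DataFrame({'Neighborhood': ['Agincourt', 'Agincourt', 'Woburn'],
                       'Venue': ['Venue A', 'Venue B', 'Venue C']})

# size() returns one count per group; sort so the busiest areas come first.
counts = (venues.groupby('Neighborhood').size()
                .sort_values(ascending=False)
                .rename('VenueCount'))
print(counts)
```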
In [13]:

# Let's pick out restaurants from Venue Categories

print('Unique Venue Categories:')
list(TO_venues['Venue Category'].unique())

Unique Venue Categories:

Out[13]:

['Bank',
 'Park',
 'Electronics Store',
 'Café',
 'Hardware Store',
 'Sushi Restaurant',
 'Indian Restaurant',
 'Restaurant',
 'Coffee Shop',
 'Pub',
 'Italian Restaurant',
 'Liquor Store',
 'Comfort Food Restaurant',
 'Juice Bar',
 'Grocery Store',
 'Butcher',
 'Greek Restaurant',
 'Thai Restaurant',
 'Sandwich Place',
 'American Restaurant',
 'Pharmacy',
 'Fast Food Restaurant',
 'Pizza Place',
 'Dim Sum Restaurant',
 'Swim School',
 'Bus Line',
 'Vegetarian / Vegan Restaurant',
 'Burger Joint',
 'BBQ Joint',
 'History Museum',
 'Jewish Restaurant',
 'Cosmetics Shop',
 'Furniture / Home Store',
 'Pool',
 'River',
 'Sports Bar',
 'Fish & Chips Shop',
 'Sporting Goods Shop',
 'Gym',
 'Bike Shop',
 'Supermarket',
 'Pet Store',
 'Smoothie Shop',
 'Shopping Mall',
 'Clothing Store',
 'Dessert Shop',
 'Breakfast Spot',
 'Brewery',
 'Mexican Restaurant',
 'Beer Store',
 'Bagel Shop',
 'Convenience Store',
 'Bar',
 'Moving Target',
 'Golf Course',
 'Gym / Fitness Center',
 'Caribbean Restaurant',
 'Japanese Restaurant',
 'Airport',
 'Airport Lounge',
 'Harbor / Marina',
 'Airport Food Court',
 'Airport Terminal',
 'Airport Gate',
 'Plane',
 'Boutique',
 'Airport Service',
 'Sculpture Garden',
 'Boat or Ferry',
 'General Entertainment',
 'Skating Rink',
 'College Stadium',
 'Korean Restaurant',
 'Asian Restaurant',
 'Ice Cream Shop',
 'Cuban Restaurant',
 'New American Restaurant',
 'Vietnamese Restaurant',
 'Art Gallery',
 'Yoga Studio',
 'Record Shop',
 'Wine Bar',
 'Theater',
 'Cocktail Bar',
 'French Restaurant',
 'Salon / Barbershop',
 'Bakery',
 'Diner',
 "Men's Store",
 'Malay Restaurant',
 'Music Venue',
 'Mac & Cheese Joint',
 'Bistro',
 'Cupcake Shop',
 'Miscellaneous Shop',
 'Gift Shop',
 'Southern / Soul Food Restaurant',
 'Tapas Restaurant',
 'Dog Run',
 'Polish Restaurant',
 'Playground',
 'Trail',
 'Tennis Court',
 'Garden',
 'Neighborhood',
 'Discount Store',
 'Wings Joint',
 'Supplement Shop',
 'Social Club',
 'Spa',
 'Chinese Restaurant',
 'Rental Car Location',
 'Food',
 'Burrito Place',
 'Tea Room',
 'Bookstore',
 'Falafel Restaurant',
 'Gourmet Shop',
 'Latin American Restaurant',
 'Indie Movie Theater',
 'Gastropub',
 'Food & Drink Shop',
 'South American Restaurant',
 'Steakhouse',
 'Concert Hall',
 'Museum',
 'Farmers Market',
 'Beer Bar',
 'Belgian Restaurant',
 'Seafood Restaurant',
 'Basketball Stadium',
 'Jazz Club',
 'Fountain',
 'Fish Market',
 'Creperie',
 'Cheese Shop',
 'Beach',
 'Tailor Shop',
 'Hotel',
 'Irish Pub',
 'Jewelry Store',
 'Curling Ice',
 'Athletics & Sports',
 'Video Store',
 'Fried Chicken Joint',
 'Light Rail Station',
 'Lake',
 'Salad Place',
 'Performing Arts Venue',
 'Bubble Tea Shop',
 'Plaza',
 'Deli / Bodega',
 'Office',
 'Aquarium',
 'Lounge',
 'Train Station',
 'Event Space',
 'Dance Studio',
 'Scenic Lookout',
 'Monument / Landmark',
 'Baseball Stadium',
 'Hotel Bar',
 'Speakeasy',
 'Noodle House',
 'Food Court',
 'Opera House',
 'General Travel',
 'Smoke Shop',
 'Brazilian Restaurant',
 'Department Store',
 'Mediterranean Restaurant',
 'Gluten-free Restaurant',
 'Art Museum',
 'Building',
 "Women's Store",
 'Modern European Restaurant',
 'Poutine Place',
 'Food Truck',
 'Middle Eastern Restaurant',
 'Church',
 'Poke Place',
 'Hostel',
 'Molecular Gastronomy Restaurant',
 'Movie Theater',
 'German Restaurant',
 'Board Shop',
 'Intersection',
 'Mobile Phone Shop',
 'Eastern European Restaurant',
 'Toy / Game Store',
 'Candy Store',
 'Video Game Store',
 'Kids Store',
 'Health Food Store',
 'Luggage Store',
 'Bus Station',
 'Baseball Field',
 'Stationery Store',
 'Coworking Space',
 'Music Store',
 'Flower Shop',
 'Baby Store',
 'Nightclub',
 'Bridal Shop',
 'Frozen Yogurt Shop',
 'Metro Station',
 'Soccer Field',
 'Field',
 'Hockey Arena',
 'Motel',
 'Drugstore',
 'Arts & Crafts Store',
 'Flea Market',
 'Antique Shop',
 'Cajun / Creole Restaurant',
 'Thrift / Vintage Store',
 'Empanada Restaurant',
 'Fruit & Vegetable Store',
 'Bus Stop',
 'Medical Center',
 'Ramen Restaurant',
 'Indonesian Restaurant',
 'Market',
 'Auto Garage',
 'Rock Climbing Spot',
 'Hakka Restaurant',
 'Basketball Court',
 'Construction & Landscaping',
 'Massage Studio',
 'Portuguese Restaurant',
 'Accessories Store',
 'Check Cashing Service',
 'Climbing Gym',
 'Stadium',
 'College Gym',
 'College Arts Building',
 'Historic Site',
 'Chocolate Shop',
 'Shoe Store',
 'Piano Bar',
 'Ethiopian Restaurant',
 'Hobby Shop',
 'Adult Boutique',
 'Gay Bar',
 'Sake Bar',
 'Persian Restaurant',
 'Afghan Restaurant',
 'Health & Beauty Service',
 'Strip Club',
 'Comic Shop',
 'Taco Place',
 'College Rec Center',
 'Tanning Salon',
 'Hookah Bar',
 'Other Great Outdoors',
 'Lingerie Store',
 'Donut Shop',
 'Warehouse Store',
 'Organic Grocery',
 'Dumpling Restaurant',
 'Snack Place',
 'Gaming Cafe',
 'Martial Arts Dojo',
 'Arepa Restaurant',
 'Doner Restaurant',
 'Filipino Restaurant',
 'Hotpot Restaurant',
 'Hospital',
 'Taiwanese Restaurant',
 'Soup Place',
 'College Auditorium',
 'Skate Park',
 'Garden Center',
 'Auto Workshop',
 'Recording Studio',
 'Optical Shop']

1.5.2 Only add Restaurants as Venue Categories
In [14]:

# Here we manually pick out the restaurant categories ('features') from the unique venue list that we want to examine for similarity during clustering
rest_list = ['Steakhouse', 'Coffee Shop', 'Café', 'Ramen Restaurant', 'Indonesian Restaurant', 'Restaurant', 'Japanese Restaurant', 
             'Fast Food Restaurant', 'Sushi Restaurant', 'Vietnamese Restaurant', 'Pizza Place', 'Sandwich Place', 'Middle Eastern Restaurant', 
             'Burger Joint', 'American Restaurant', 'Food Court', 'Wings Joint', 'Burrito Place', 'Asian Restaurant', 'Deli / Bodega', 
             'Greek Restaurant', 'Fried Chicken Joint', 'Airport Food Court', 'Chinese Restaurant', 'Breakfast Spot', 'Mexican Restaurant',
             'Indian Restaurant', 'Latin American Restaurant', 'Bar', 'Pub', 'Italian Restaurant', 'French Restaurant', 'Ice Cream Shop', 
             'Caribbean Restaurant', 'Gastropub', 'Thai Restaurant', 'Cajun / Creole Restaurant', 'Diner', 'Dim Sum Restaurant', 'Seafood Restaurant', 
             'Food & Drink Shop', 'Noodle House', 'Food', 'Fish & Chips Shop', 'Falafel Restaurant', 'Gourmet Shop', 'Vegetarian / Vegan Restaurant', 
             'South American Restaurant', 'Korean Restaurant', 'Cuban Restaurant', 'New American Restaurant', 'Malay Restaurant', 'Mac & Cheese Joint',
             'Bistro', 'Southern / Soul Food Restaurant', 'Tapas Restaurant',  'Sports Bar', 'Polish Restaurant', 'Ethiopian Restaurant', 
             'Creperie', 'Sake Bar', 'Persian Restaurant', 'Afghan Restaurant','Mediterranean Restaurant', 'BBQ Joint', 'Jewish Restaurant', 
             'Comfort Food Restaurant',  'Hakka Restaurant', 'Food Truck', 'Taiwanese Restaurant',  'Snack Place', 'Eastern European Restaurant', 
             'Dumpling Restaurant', 'Belgian Restaurant', 'Arepa Restaurant', 'Taco Place', 'Doner Restaurant', 'Filipino Restaurant', 
             'Hotpot Restaurant', 'Poutine Place', 'Salad Place',  'Portuguese Restaurant', 'Modern European Restaurant', 'Empanada Restaurant', 
             'Irish Pub', 'Molecular Gastronomy Restaurant', 'German Restaurant', 'Brazilian Restaurant', 'Gluten-free Restaurant', 'Soup Place']

rest_pd = pd.DataFrame(rest_list)
#rest_pd
# Rename the column so the key names match
rest_pd = rest_pd.rename(columns={0:'Venue Category'})

# Join the two dataframes, keeping only venues whose category is in the restaurant list
TO_new = pd.merge(TO_venues, rest_pd, on='Venue Category', how='right')

# display the new dataframe
#TO_new

TO_new.groupby('Neighborhood').count()

Out[14]:
	Neighborhood Latitude 	Neighborhood Longitude 	Venue 	Venue Latitude 	Venue Longitude 	Venue Category
Neighborhood 						
Adelaide, King, Richmond 	60 	60 	60 	60 	60 	60
Agincourt 	2 	2 	2 	2 	2 	2
Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown 	5 	5 	5 	5 	5 	5
Alderwood, Long Branch 	5 	5 	5 	5 	5 	5
Bathurst Manor, Downsview North, Wilson Heights 	10 	10 	10 	10 	10 	10
Bayview Village 	3 	3 	3 	3 	3 	3
Bedford Park, Lawrence Manor East 	18 	18 	18 	18 	18 	18
Berczy Park 	26 	26 	26 	26 	26 	26
Birch Cliff, Cliffside West 	1 	1 	1 	1 	1 	1
Bloordale Gardens, Eringate, Markland Wood, Old Burnhamthorpe 	2 	2 	2 	2 	2 	2
Brockton, Exhibition Place, Parkdale Village 	10 	10 	10 	10 	10 	10
Business Reply Mail Processing Centre 969 Eastern 	4 	4 	4 	4 	4 	4
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara 	1 	1 	1 	1 	1 	1
Cabbagetown, St. James Town 	28 	28 	28 	28 	28 	28
Caledonia-Fairbanks 	1 	1 	1 	1 	1 	1
Canada Post Gateway Processing Centre 	8 	8 	8 	8 	8 	8
Cedarbrae 	4 	4 	4 	4 	4 	4
Central Bay Street 	60 	60 	60 	60 	60 	60
Chinatown, Grange Park, Kensington Market 	65 	65 	65 	65 	65 	65
Christie 	7 	7 	7 	7 	7 	7
Church and Wellesley 	55 	55 	55 	55 	55 	55
Clairlea, Golden Mile, Oakridge 	1 	1 	1 	1 	1 	1
Clarks Corners, Sullivan, Tam O'Shanter 	8 	8 	8 	8 	8 	8
Cliffcrest, Cliffside, Scarborough Village West 	1 	1 	1 	1 	1 	1
Commerce Court, Victoria Hotel 	67 	67 	67 	67 	67 	67
Davisville 	25 	25 	25 	25 	25 	25
Davisville North 	4 	4 	4 	4 	4 	4
Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West 	10 	10 	10 	10 	10 	10
Del Ray, Keelesdale, Mount Dennis, Silverthorn 	3 	3 	3 	3 	3 	3
Design Exchange, Toronto Dominion Centre 	68 	68 	68 	68 	68 	68
... 	... 	... 	... 	... 	... 	...
Kingsway Park South West, Mimico NW, The Queensway West, Royal York South West, South of Bloor 	4 	4 	4 	4 	4 	4
L'Amoreaux West, Steeles West 	9 	9 	9 	9 	9 	9
Lawrence Heights, Lawrence Manor 	2 	2 	2 	2 	2 	2
Lawrence Park 	1 	1 	1 	1 	1 	1
Leaside 	12 	12 	12 	12 	12 	12
Little Portugal, Trinity 	42 	42 	42 	42 	42 	42
Maryvale, Wexford 	2 	2 	2 	2 	2 	2
North Toronto West 	7 	7 	7 	7 	7 	7
Northwood Park, York University 	2 	2 	2 	2 	2 	2
Parkdale, Roncesvalles 	9 	9 	9 	9 	9 	9
Parkwoods 	2 	2 	2 	2 	2 	2
Queen's Park 	31 	31 	31 	31 	31 	31
Rouge, Malvern 	1 	1 	1 	1 	1 	1
Runnymede, Swansea 	30 	30 	30 	30 	30 	30
Ryerson, Garden District 	50 	50 	50 	50 	50 	50
St. James Town 	53 	53 	53 	53 	53 	53
Stn A PO Boxes 25 The Esplanade 	52 	52 	52 	52 	52 	52
Studio District 	22 	22 	22 	22 	22 	22
The Annex, North Midtown, Yorkville 	18 	18 	18 	18 	18 	18
The Beaches 	3 	3 	3 	3 	3 	3
The Beaches West, India Bazaar 	11 	11 	11 	11 	11 	11
The Danforth West, Riverdale 	26 	26 	26 	26 	26 	26
The Junction North, Runnymede 	1 	1 	1 	1 	1 	1
Thorncliffe Park 	6 	6 	6 	6 	6 	6
Victoria Village 	2 	2 	2 	2 	2 	2
Westmount 	6 	6 	6 	6 	6 	6
Willowdale South 	22 	22 	22 	22 	22 	22
Willowdale West 	2 	2 	2 	2 	2 	2
Woburn 	3 	3 	3 	3 	3 	3
Woodbine Gardens, Parkview Hill 	6 	6 	6 	6 	6 	6

80 rows × 6 columns
1.5.3 OneHot encode and count restaurants
In [15]:

# one hot encoding
TO_new_onehot = pd.get_dummies(TO_new[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
TO_new_onehot['Neighborhood'] = TO_new['Neighborhood'] 


# move neighborhood column to the first column
fixed_columns = [TO_new_onehot.columns[-1]] + list(TO_new_onehot.columns[:-1])
TO_new_onehot = TO_new_onehot[fixed_columns]

TO_new_onehot.head()

Out[15]:
	Neighborhood 	Afghan Restaurant 	Airport Food Court 	American Restaurant 	Arepa Restaurant 	Asian Restaurant 	BBQ Joint 	Bar 	Belgian Restaurant 	Bistro 	... 	Sports Bar 	Steakhouse 	Sushi Restaurant 	Taco Place 	Taiwanese Restaurant 	Tapas Restaurant 	Thai Restaurant 	Vegetarian / Vegan Restaurant 	Vietnamese Restaurant 	Wings Joint
0 	Bedford Park, Lawrence Manor East 	0 	0 	0 	0 	0 	0 	0 	0 	0 	... 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0
1 	The Annex, North Midtown, Yorkville 	0 	0 	0 	0 	0 	0 	0 	0 	0 	... 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0
2 	The Annex, North Midtown, Yorkville 	0 	0 	0 	0 	0 	0 	0 	0 	0 	... 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0
3 	The Annex, North Midtown, Yorkville 	0 	0 	0 	0 	0 	0 	0 	0 	0 	... 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0
4 	Don Mills North 	0 	0 	0 	0 	0 	0 	0 	0 	0 	... 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0

5 rows × 91 columns
In [16]:

#Analyze each neighbourhood



TO_grouped = TO_new_onehot.groupby('Neighborhood').mean().reset_index()
TO_grouped.shape


TO_grouped.head()

Out[16]:
	Neighborhood 	Afghan Restaurant 	Airport Food Court 	American Restaurant 	Arepa Restaurant 	Asian Restaurant 	BBQ Joint 	Bar 	Belgian Restaurant 	Bistro 	... 	Sports Bar 	Steakhouse 	Sushi Restaurant 	Taco Place 	Taiwanese Restaurant 	Tapas Restaurant 	Thai Restaurant 	Vegetarian / Vegan Restaurant 	Vietnamese Restaurant 	Wings Joint
0 	Adelaide, King, Richmond 	0.0 	0.0 	0.066667 	0.0 	0.05 	0.0 	0.05 	0.0 	0.0 	... 	0.0 	0.066667 	0.033333 	0.0 	0.0 	0.0 	0.066667 	0.016667 	0.0 	0.0
1 	Agincourt 	0.0 	0.0 	0.000000 	0.0 	0.00 	0.0 	0.00 	0.0 	0.0 	... 	0.0 	0.000000 	0.000000 	0.0 	0.0 	0.0 	0.000000 	0.000000 	0.0 	0.0
2 	Albion Gardens, Beaumond Heights, Humbergate, ... 	0.0 	0.0 	0.000000 	0.0 	0.00 	0.0 	0.00 	0.0 	0.0 	... 	0.0 	0.000000 	0.000000 	0.0 	0.0 	0.0 	0.000000 	0.000000 	0.0 	0.0
3 	Alderwood, Long Branch 	0.0 	0.0 	0.000000 	0.0 	0.00 	0.0 	0.00 	0.0 	0.0 	... 	0.0 	0.000000 	0.000000 	0.0 	0.0 	0.0 	0.000000 	0.000000 	0.0 	0.0
4 	Bathurst Manor, Downsview North, Wilson Heights 	0.0 	0.0 	0.000000 	0.0 	0.00 	0.0 	0.00 	0.0 	0.0 	... 	0.0 	0.000000 	0.100000 	0.0 	0.0 	0.0 	0.000000 	0.000000 	0.0 	0.0

5 rows × 91 columns
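To make the grouped table easier to read: averaging the one-hot rows per neighborhood gives each category's share of that neighborhood's venues, so every row sums to 1. A minimal sketch on made-up data (the `toy` frame and its values are illustrative only, not Foursquare results):

```python
import pandas as pd

# Hypothetical toy data: three venues in neighborhood A, one in B
toy = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Café', 'Café', 'Pub', 'Pub'],
})

# One-hot encode the category, then re-attach the neighborhood column
toy_onehot = pd.get_dummies(toy[['Venue Category']], prefix="", prefix_sep="").astype(float)
toy_onehot.insert(0, 'Neighborhood', toy['Neighborhood'])

# The mean of 0/1 indicators is the share of venues in each category
toy_grouped = toy_onehot.groupby('Neighborhood').mean().reset_index()
print(toy_grouped)
```

Neighborhood A ends up with two thirds Café and one third Pub. This is the same kind of fraction that appears in the TO_grouped output above.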
2. Begin to Cluster
Use silhouette score to find optimal number of clusters to segment the data
In [17]:

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
import numpy as np


TO_grouped_clustering = TO_grouped.drop(columns='Neighborhood')

# Use silhouette score to find optimal number of clusters to segment the data
kclusters = np.arange(2,10)
results = {}
for size in kclusters:
    # fix the seed so the scores are reproducible between runs
    model = KMeans(n_clusters=size, random_state=0).fit(TO_grouped_clustering)
    predictions = model.predict(TO_grouped_clustering)
    results[size] = silhouette_score(TO_grouped_clustering, predictions)

best_size = max(results, key=results.get)
best_size

Out[17]:

2
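It can help to look at the whole table of scores rather than only the winning size, because a best size of 2 can simply mean a few outliers were split off from one large group. The sketch below shows the idea on synthetic data; the array `X` is a made-up stand-in for TO_grouped_clustering, and the range of sizes is an arbitrary choice.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for TO_grouped_clustering: three tight, well-separated blobs
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

scores = {}
for k in range(2, 7):
    # Fixed random_state and explicit n_init keep repeated runs comparable
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Print every score so near-ties between sizes are visible
for k, score in scores.items():
    print(k, round(score, 3))

best_k = max(scores, key=scores.get)
```

On the real data, the same loop would show how far the score for 2 clusters is ahead of the next-best size.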

2.1 Run K means and segment data into clusters and generate labels
In [18]:

#import k-means from clustering stage
from sklearn.cluster import KMeans

# set number of clusters
kclusters = best_size


# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(TO_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

Out[18]:

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)

In [19]:

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond 'st', 'nd', 'rd' all take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = TO_grouped['Neighborhood']

for ind in np.arange(TO_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(TO_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Out[19]:
	Neighborhood 	1st Most Common Venue 	2nd Most Common Venue 	3rd Most Common Venue 	4th Most Common Venue 	5th Most Common Venue 	6th Most Common Venue 	7th Most Common Venue 	8th Most Common Venue 	9th Most Common Venue 	10th Most Common Venue
0 	Adelaide, King, Richmond 	Coffee Shop 	Café 	Steakhouse 	Thai Restaurant 	American Restaurant 	Restaurant 	Asian Restaurant 	Bar 	Sushi Restaurant 	Pizza Place
1 	Agincourt 	Breakfast Spot 	Sandwich Place 	Fish & Chips Shop 	Doner Restaurant 	Dumpling Restaurant 	Eastern European Restaurant 	Empanada Restaurant 	Ethiopian Restaurant 	Falafel Restaurant 	Fast Food Restaurant
2 	Albion Gardens, Beaumond Heights, Humbergate, ... 	Fast Food Restaurant 	Fried Chicken Joint 	Coffee Shop 	Sandwich Place 	Pizza Place 	Wings Joint 	Diner 	Doner Restaurant 	Dumpling Restaurant 	Eastern European Restaurant
3 	Alderwood, Long Branch 	Pizza Place 	Pub 	Coffee Shop 	Sandwich Place 	Falafel Restaurant 	Dim Sum Restaurant 	Diner 	Doner Restaurant 	Dumpling Restaurant 	Eastern European Restaurant
4 	Bathurst Manor, Downsview North, Wilson Heights 	Coffee Shop 	Fast Food Restaurant 	Sandwich Place 	Pizza Place 	Diner 	Fried Chicken Joint 	Sushi Restaurant 	Restaurant 	Deli / Bodega 	Doner Restaurant
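In rows such as Agincourt, the later "most common" columns are filled with categories whose frequency is zero, because sorting a row of mostly zeros still returns ten names. One possible refinement, sketched here with a hypothetical helper `top_nonzero_venues` and a made-up sample row, is to keep only categories that actually occur:

```python
import pandas as pd

def top_nonzero_venues(row, num_top_venues):
    """Return up to num_top_venues category names, skipping zero-frequency categories."""
    nonzero = row[row > 0].sort_values(ascending=False)
    return list(nonzero.index[:num_top_venues])

# Illustrative frequencies only; not taken from the notebook's data
sample_row = pd.Series({'Breakfast Spot': 0.6, 'Sandwich Place': 0.4,
                        'Doner Restaurant': 0.0, 'Dumpling Restaurant': 0.0})
print(top_nonzero_venues(sample_row, 10))
```

The result can be shorter than num_top_venues, so it would need padding (for example with None) before being written into the fixed ten-column table.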
2.2 Merge the Toronto data with geo coordinates data and make sure it's the right shape
In [20]:

# Merge the Toronto data with the geo coordinate data and make sure it's the right shape
TO_labels = pd.merge(TO_data,TO_grouped, on='Neighborhood', how='right')
TO_labels.shape


TO_labels = TO_labels.drop(columns=['Steakhouse', 'Coffee Shop', 'Café', 'Ramen Restaurant', 'Indonesian Restaurant', 'Restaurant', 'Japanese Restaurant', 
             'Fast Food Restaurant', 'Sushi Restaurant', 'Vietnamese Restaurant', 'Pizza Place', 'Sandwich Place', 'Middle Eastern Restaurant', 
             'Burger Joint', 'American Restaurant', 'Food Court', 'Wings Joint', 'Burrito Place', 'Asian Restaurant', 'Deli / Bodega', 
             'Greek Restaurant', 'Fried Chicken Joint', 'Airport Food Court', 'Chinese Restaurant', 'Breakfast Spot', 'Mexican Restaurant',
             'Indian Restaurant', 'Latin American Restaurant', 'Bar', 'Pub', 'Italian Restaurant', 'French Restaurant', 'Ice Cream Shop', 
             'Caribbean Restaurant', 'Gastropub', 'Thai Restaurant', 'Cajun / Creole Restaurant', 'Diner', 'Dim Sum Restaurant', 'Seafood Restaurant', 
             'Food & Drink Shop', 'Noodle House', 'Food', 'Fish & Chips Shop', 'Falafel Restaurant', 'Gourmet Shop', 'Vegetarian / Vegan Restaurant', 
             'South American Restaurant', 'Korean Restaurant', 'Cuban Restaurant', 'New American Restaurant', 'Malay Restaurant', 'Mac & Cheese Joint',
             'Bistro', 'Southern / Soul Food Restaurant', 'Tapas Restaurant',  'Sports Bar', 'Polish Restaurant', 'Ethiopian Restaurant', 
             'Creperie', 'Sake Bar', 'Persian Restaurant', 'Afghan Restaurant','Mediterranean Restaurant', 'BBQ Joint', 'Jewish Restaurant', 
             'Comfort Food Restaurant',  'Hakka Restaurant', 'Food Truck', 'Taiwanese Restaurant',  'Snack Place', 'Eastern European Restaurant', 
             'Dumpling Restaurant', 'Belgian Restaurant', 'Arepa Restaurant', 'Taco Place', 'Doner Restaurant', 'Filipino Restaurant', 
             'Hotpot Restaurant', 'Poutine Place', 'Salad Place',  'Portuguese Restaurant', 'Modern European Restaurant', 'Empanada Restaurant', 
             'Irish Pub', 'Molecular Gastronomy Restaurant', 'German Restaurant', 'Brazilian Restaurant', 'Gluten-free Restaurant', 'Soup Place'])
TO_labels.head()

Out[20]:
	PostalCode 	AfterTaxIncome2015 	Population_2016 	Bourough 	Neighborhood 	Latitude 	Longitude
0 	M5M 	111821.0 	25975.0 	North York 	Bedford Park, Lawrence Manor East 	43.733283 	-79.419750
1 	M4N 	109841.0 	15330.0 	Central Toronto 	Lawrence Park 	43.728020 	-79.388790
2 	M5R 	108271.0 	26496.0 	Central Toronto 	The Annex, North Midtown, Yorkville 	43.672710 	-79.405678
3 	M4G 	94853.0 	19076.0 	East York 	Leaside 	43.709060 	-79.363452
4 	M1C 	93943.0 	35626.0 	Scarborough 	Highland Creek, Rouge Hill, Port Union 	43.784535 	-79.160497
2.3 Add the KMeans Labels
In [21]:

# work on a copy so TO_labels itself is left unchanged
TO_merged = TO_labels.copy()

# add clustering labels (kmeans.labels_ follows the row order of TO_grouped)
TO_merged['Cluster Labels'] = kmeans.labels_

# join the ten most common venue columns for each neighborhood
TO_merged = TO_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

TO_merged.head() # check the last columns!

Out[21]:
	PostalCode 	AfterTaxIncome2015 	Population_2016 	Bourough 	Neighborhood 	Latitude 	Longitude 	Cluster Labels 	1st Most Common Venue 	2nd Most Common Venue 	3rd Most Common Venue 	4th Most Common Venue 	5th Most Common Venue 	6th Most Common Venue 	7th Most Common Venue 	8th Most Common Venue 	9th Most Common Venue 	10th Most Common Venue
0 	M5M 	111821.0 	25975.0 	North York 	Bedford Park, Lawrence Manor East 	43.733283 	-79.419750 	1 	Fast Food Restaurant 	Italian Restaurant 	Coffee Shop 	Sushi Restaurant 	Pizza Place 	Pub 	Café 	Sandwich Place 	Indian Restaurant 	Restaurant
1 	M4N 	109841.0 	15330.0 	Central Toronto 	Lawrence Park 	43.728020 	-79.388790 	1 	Dim Sum Restaurant 	Wings Joint 	Doner Restaurant 	Dumpling Restaurant 	Eastern European Restaurant 	Empanada Restaurant 	Ethiopian Restaurant 	Falafel Restaurant 	Fast Food Restaurant 	Filipino Restaurant
2 	M5R 	108271.0 	26496.0 	Central Toronto 	The Annex, North Midtown, Yorkville 	43.672710 	-79.405678 	1 	Café 	Sandwich Place 	Coffee Shop 	Pizza Place 	BBQ Joint 	Burger Joint 	Pub 	Indian Restaurant 	American Restaurant 	Vegetarian / Vegan Restaurant
3 	M4G 	94853.0 	19076.0 	East York 	Leaside 	43.709060 	-79.363452 	1 	Coffee Shop 	Burger Joint 	Breakfast Spot 	Mexican Restaurant 	Sushi Restaurant 	Sports Bar 	Fish & Chips Shop 	Restaurant 	Sandwich Place 	Fast Food Restaurant
4 	M1C 	93943.0 	35626.0 	Scarborough 	Highland Creek, Rouge Hill, Port Union 	43.784535 	-79.160497 	1 	Bar 	Wings Joint 	Food 	Doner Restaurant 	Dumpling Restaurant 	Eastern European Restaurant 	Empanada Restaurant 	Ethiopian Restaurant 	Falafel Restaurant 	Fast Food Restaurant
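Since kmeans.labels_ is ordered like the rows of TO_grouped, assigning it by position is only safe when the merged frame has exactly the same row order. A more defensive pattern is to pair each label with its neighborhood name first and then merge on that name. The frames below are small hypothetical stand-ins, and the label values are illustrative:

```python
import pandas as pd

# Hypothetical stand-ins: `grouped` is in clustering order, `info` is in a different order
grouped = pd.DataFrame({'Neighborhood': ['Agincourt', 'Leaside', 'Woburn']})
labels = [1, 0, 1]  # illustrative values, in the same order as `grouped`
info = pd.DataFrame({'Neighborhood': ['Woburn', 'Agincourt', 'Leaside'],
                     'PostalCode': ['M1G', 'M1S', 'M4G']})

# Pair each label with its neighborhood name, then merge on the name
label_table = grouped[['Neighborhood']].assign(**{'Cluster Labels': labels})
aligned = info.merge(label_table, on='Neighborhood', how='left')
print(aligned)
```

With this approach the label for Leaside stays attached to Leaside regardless of how either frame is sorted.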
In [22]:

TO_merged_new1 = TO_merged.loc[TO_merged['Cluster Labels'] == 0, TO_merged.columns[[3, 4] + list(range(5, TO_merged.shape[1]))]]
TO_merged_new1.shape

Out[22]:

(4, 15)

In [23]:

TO_merged_new2 = TO_merged.loc[TO_merged['Cluster Labels'] == 1, TO_merged.columns[[3, 4] + list(range(5, TO_merged.shape[1]))]]
TO_merged_new2.shape

Out[23]:

(76, 15)

3. Cluster 2 (label 1) contains the most neighborhoods (76 of 80). We need to find the geographic centroid of this cluster, which we take as the optimum location for a new Restaurant Supply Store.
In [24]:

# Find the geographic center of the largest cluster (label 1).
cluster_coords = TO_merged_new2[['Latitude', 'Longitude']]
lat = list(cluster_coords['Latitude'])
long = list(cluster_coords['Longitude'])

# Arithmetic mean of the coordinates
Blatitude = sum(lat)/len(lat)
Blongitude = sum(long)/len(long)
print(Blatitude)
print(Blongitude)

43.69907216578946
-79.38788743289474
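Averaging latitudes and longitudes directly is an approximation that works well over an area the size of a city. For comparison, here is a sketch of a centroid computed on the sphere by averaging unit vectors. The function name `spherical_centroid` is hypothetical, and the sample points are the first three neighborhoods from the merged table, used only as input data.

```python
import math

def spherical_centroid(points):
    """Return the (lat, lon) in degrees of the mean direction of the given points."""
    x = y = z = 0.0
    for lat_deg, lon_deg in points:
        lat, lon = math.radians(lat_deg), math.radians(lon_deg)
        x += math.cos(lat) * math.cos(lon)
        y += math.cos(lat) * math.sin(lon)
        z += math.sin(lat)
    n = len(points)
    x, y, z = x / n, y / n, z / n
    lon_c = math.atan2(y, x)
    lat_c = math.atan2(z, math.hypot(x, y))
    return math.degrees(lat_c), math.degrees(lon_c)

# Sample input only: three coordinate pairs copied from the merged table
sample = [(43.733283, -79.419750), (43.728020, -79.388790), (43.672710, -79.405678)]
print(spherical_centroid(sample))
```

For points this close together the two methods agree to well within a thousandth of a degree, so the simple mean used in the notebook is a reasonable choice.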

3.1 Install opencage to reverse lookup the coordinates
In [25]:

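# Install opencage to reverse-lookup the coordinates
!pip install opencage
import os
from opencage.geocoder import OpenCageGeocode
from pprint import pprint

# Read the API key from the environment instead of hard-coding it in the notebook
# (the variable name OPENCAGE_API_KEY is an assumption; use whatever name you export)
key = os.environ['OPENCAGE_API_KEY']
geocoder = OpenCageGeocode(key)

results = geocoder.reverse_geocode(Blatitude, Blongitude)
pprint(results)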

[{'annotations': {'DMS': {'lat': "43° 41' 56.97528'' N",
                          'lng': "79° 23' 16.39356'' W"},
                  'MGRS': '17TPJ2990139722',
                  'Maidenhead': 'FN03hq37kt',
                  'Mercator': {'x': -8837419.167, 'y': 5389473.786},
                  'OSM': {'edit_url': 'https://www.openstreetmap.org/edit?way=401259844#map=17/43.69916/-79.38789',
                          'url': 'https://www.openstreetmap.org/?mlat=43.69916&mlon=-79.38789#map=17/43.69916/-79.38789'},
                  'callingcode': 1,
                  'currency': {'alternate_symbols': ['C$', 'CAD$'],
                               'decimal_mark': '.',
                               'disambiguate_symbol': 'C$',
                               'html_entity': '$',
                               'iso_code': 'CAD',
                               'iso_numeric': 124,
                               'name': 'Canadian Dollar',
                               'smallest_denomination': 5,
                               'subunit': 'Cent',
                               'subunit_to_unit': 100,
                               'symbol': '$',
                               'symbol_first': 1,
                               'thousands_separator': ','},
                  'flag': '🇨🇦',
                  'geohash': 'dpz897pw01ucsykh4j5t',
                  'qibla': 54.57,
                  'sun': {'rise': {'apparent': 1547901900,
                                   'astronomical': 1547895840,
                                   'civil': 1547899980,
                                   'nautical': 1547897880},
                          'set': {'apparent': 1547935860,
                                  'astronomical': 1547941920,
                                  'civil': 1547937780,
                                  'nautical': 1547939880}},
                  'timezone': {'name': 'America/Toronto',
                               'now_in_dst': 0,
                               'offset_sec': -18000,
                               'offset_string': -500,
                               'short_name': 'EST'},
                  'what3words': {'words': 'farmland.divides.charging'}},
  'bounds': {'northeast': {'lat': 43.6992598, 'lng': -79.3877871},
             'southwest': {'lat': 43.6990598, 'lng': -79.3879871}},
  'components': {'ISO_3166-1_alpha-2': 'CA',
                 'ISO_3166-1_alpha-3': 'CAN',
                 '_type': 'building',
                 'city_district': 'Old Toronto',
                 'country': 'Canada',
                 'country_code': 'ca',
                 'house_number': '268',
                 'neighbourhood': 'Davisville',
                 'postcode': 'M4S 1C2',
                 'road': 'Balliol Street',
                 'state': 'Ontario',
                 'state_code': 'ON'},
  'confidence': 10,
  'formatted': '268 Balliol Street, ON M4S 1C2, Canada',
  'geometry': {'lat': 43.6991598, 'lng': -79.3878871}}]
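Only a few fields of this response are needed for the recommendation. The sketch below pulls them out with a hypothetical helper, `summarize_result`. It runs on a trimmed copy of the response printed above, so no API call is needed, and the key names are the ones visible in that output.

```python
def summarize_result(results):
    """Pull the formatted address and three-character postal prefix from the first hit."""
    if not results:
        return None
    best = results[0]
    postcode = best['components'].get('postcode', '')
    return {
        'address': best['formatted'],
        'postal_prefix': postcode[:3],
        'lat': best['geometry']['lat'],
        'lng': best['geometry']['lng'],
    }

# Trimmed copy of the response shown above
sample_results = [{
    'components': {'neighbourhood': 'Davisville', 'postcode': 'M4S 1C2'},
    'formatted': '268 Balliol Street, ON M4S 1C2, Canada',
    'geometry': {'lat': 43.6991598, 'lng': -79.3878871},
}]
print(summarize_result(sample_results))
```

The postal prefix can then be used to look up the neighborhood row instead of typing the code by hand.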

In [26]:

# Build the popup string for the best location
popstring = TO_data[TO_data['PostalCode'].str.contains('M4S')]

def str_join(*args):
    return ''.join(map(str, args))

# Take the first match so the names print without array brackets
popstring_new = str_join('The Best Neighbourhood to locate a Restaurant Supply Store is in: ', popstring['Neighborhood'].values[0],  ' in ' ,  popstring['Bourough'].values[0])


print(popstring_new)

The Best Neighbourhood to locate a Restaurant Supply Store is in: Davisville in Central Toronto

In [27]:

# Let's get the coordinates for Toronto

from geopy.geocoders import Nominatim
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.

4. Results
4.1 Plot the clusters on a map of Toronto and superimpose the best location for a store
In [28]:

# folium for map rendering
import folium 
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(TO_merged['Latitude'], TO_merged['Longitude'], TO_merged['Neighborhood'], TO_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
    
folium.CircleMarker([Blatitude, Blongitude],
                    radius=50,
                    popup='Toronto',
                    color='red',
                    ).add_to(map_clusters)

# Interactive marker
map_clusters.add_child(folium.ClickForMarker(popup=popstring_new))
       
#map_clusters
map_clusters.save('map_clusters.html')

4.2 Exact Address of desired Location
In [29]:

print('The exact Address to locate would be: 268 Balliol Street, ON M4S 1C2, Canada or lat: 43.6991598, lng: -79.3878871')

The exact Address to locate would be: 268 Balliol Street, ON M4S 1C2, Canada or lat: 43.6991598, lng: -79.3878871

5. Discussion
5.1 Explaining the results

Because we built our list of neighborhoods from restaurant venues exclusively, we found that most neighborhoods look similar and that the greatest concentration of restaurants is in Central Toronto and Downtown Toronto. This might seem obvious, but these also appear to be some of the most affluent neighborhoods in Toronto, so restaurant density and income seem to be correlated. By locating in the general vicinity of the exact address given in 4.2, my friend would be geographically centered in this cluster and positioned to serve his restaurant customer base efficiently.

When we built our K-Means dataset, silhouette analysis indicated that neighborhoods are very similar in terms of the most common restaurants within them. The best score came from just 2 clusters, and the vast majority of neighborhoods (76 of 80) fell into one of them. So Toronto has many restaurants, but the mix is quite homogeneous and they are concentrated near the center of the city.

Of the 103 Toronto neighborhoods gathered, 57 (55.3%) are above the median after-tax income and 39 (37.9%) are below it. The remaining 7 (6.8%) did not register, apparently because their populations are too low. The greatest concentration of affluence appears to be near central Toronto. We decided to keep all neighborhoods in the dataset regardless of income or population, as the majority were close enough.
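The shares in the paragraph above can be rechecked from the counts with a few lines of arithmetic. This is only a sanity check on the quoted numbers, rounded to one decimal place. The dictionary keys are labels chosen for this sketch.

```python
# Counts quoted in the paragraph above
total = 103
counts = {'above median': 57, 'below median': 39, 'no income data': 7}

# The three groups should account for every neighborhood
assert sum(counts.values()) == total

shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
for name, pct in shares.items():
    print(f'{name}: {pct}%')
```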
6. Conclusion

I feel confident in the recommendation I have given my friend, as it is backed by the data analysis above. While nothing can ever be 100% certain, he will be better informed than he was before asking for my help.

Much more could be inferred with further work. A potential side business for my friend might be advising new restaurant owners on where to locate a new restaurant, who their competition is, and who their clientele might be.
In [30]:

map_clusters

