# 1. Introduction to the Business Problem

## Business Problem


In this project we look for an optimal location for a new Fried Chicken Restaurant in Toronto, Canada. Toronto already has many restaurants, so we want areas that are not crowded with venues, especially restaurants, but that are not too secluded either. We are particularly interested in neighborhoods with no Fried Chicken Restaurant in the vicinity. Provided those two conditions are met, we also prefer locations as close to the city center as possible, to attract more customers.

We will use data science and machine learning techniques to shortlist the most promising neighborhoods based on these criteria. The advantages of each area will then be stated clearly so that the client can choose the best final location.


## About Toronto
Toronto is Canada’s largest city and a world leader in business, finance, technology, entertainment and culture. Its large population of immigrants from all over the globe has also made it one of the most multicultural cities in the world. Toronto therefore has great potential, but the high level of competition also makes it a challenging place to open a business.

# 2. Data Acquisition
To segment and explore the neighborhoods, we need a dataset that contains the boroughs, the neighborhoods in each borough, and the latitude and longitude of each neighborhood. We scrape the neighborhood names and their postal codes from the following Wikipedia page: 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
Then we merge them with the geographical coordinates of the neighborhoods, taken from the CSV file 'canada_geo.csv' (downloaded from http://cocl.us/Geospatial_data).
Finally, to get the locations (latitude and longitude) and other information about venues in Toronto, we use Foursquare’s API.

In [68]:
pip install geopy

Note: you may need to restart the kernel to use updated packages.


# Import Libraries

In [6]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
#import folium # map rendering library

print('Libraries imported.')

Libraries imported.


# Read the Wikipedia page and parse it with BeautifulSoup

In [7]:
from bs4 import BeautifulSoup
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
html_content = requests.get(url).text
soup=BeautifulSoup(html_content,'lxml')
print(soup.prettify())
print(soup.title)


<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   List of postal codes of Canada: M - Wikipedia
  </title>
  <script>
   document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"404a1aeb-13fc-4974-a508-de451d8a53a0","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"List_of_postal_codes_of_Canada:_M","wgTitle":"List of postal codes of Canada: M","wgCurRevisionId":1032600019,"wgRevisionId":1032600019,"wgArticleId":539066,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Articles with short description","Short description is different from Wikidata","Communica

In [8]:
print(soup.title.text)

List of postal codes of Canada: M - Wikipedia


# Extracting the table to get Columns

In [9]:
table_contents=[]
table=soup.find('table')
for row in table.findAll('td'):
    cell = {}
    if row.span.text=='Not assigned':
        pass
    else:
        cell['PostalCode'] = row.p.text[:3]
        cell['Borough'] = (row.span.text).split('(')[0]
        cell['Neighborhood'] = (((((row.span.text).split('(')[1]).strip(')')).replace(' /',',')).replace(')',' ')).strip(' ')
        table_contents.append(cell)
print(table_contents)


[{'PostalCode': 'M3A', 'Borough': 'North York', 'Neighborhood': 'Parkwoods'}, {'PostalCode': 'M4A', 'Borough': 'North York', 'Neighborhood': 'Victoria Village'}, {'PostalCode': 'M5A', 'Borough': 'Downtown Toronto', 'Neighborhood': 'Regent Park, Harbourfront'}, {'PostalCode': 'M6A', 'Borough': 'North York', 'Neighborhood': 'Lawrence Manor, Lawrence Heights'}, {'PostalCode': 'M7A', 'Borough': "Queen's Park", 'Neighborhood': 'Ontario Provincial Government'}, {'PostalCode': 'M9A', 'Borough': 'Etobicoke', 'Neighborhood': 'Islington Avenue'}, {'PostalCode': 'M1B', 'Borough': 'Scarborough', 'Neighborhood': 'Malvern, Rouge'}, {'PostalCode': 'M3B', 'Borough': 'North York', 'Neighborhood': 'Don Mills North'}, {'PostalCode': 'M4B', 'Borough': 'East York', 'Neighborhood': 'Parkview Hill, Woodbine Gardens'}, {'PostalCode': 'M5B', 'Borough': 'Downtown Toronto', 'Neighborhood': 'Garden District, Ryerson'}, {'PostalCode': 'M6B', 'Borough': 'North York', 'Neighborhood': 'Glencairn'}, {'PostalCode': 'M9
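The string handling in the loop above is dense, so here is a minimal sketch of the same idea as a standalone helper. It assumes each cell's text looks like `Borough(Neighborhood / Neighborhood)`, as the scraped output suggests. `split_cell_text` is a hypothetical name, not part of the notebook.

```python
def split_cell_text(text):
    """Split 'Borough(Hood A / Hood B)' into (borough, 'Hood A, Hood B').

    Hypothetical helper: assumes the first '(' separates the borough from
    the neighborhood list and that '/' separates the neighborhoods.
    """
    borough, _, rest = text.partition('(')
    # drop the closing parenthesis, split on '/', and trim each name
    hoods = [h.strip() for h in rest.rstrip(')').split('/') if h.strip()]
    return borough.strip(), ', '.join(hoods)


print(split_cell_text('North York(Parkwoods)'))
print(split_cell_text('Downtown Toronto(Regent Park / Harbourfront)'))
```

Unlike the chained `split('(')[1]` call, this version returns an empty neighborhood string instead of raising an `IndexError` when a cell has no parenthesis.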

# Construct the dataframe from html
## Get the three Columns (PostalCode,Borough,Neighborhood)

In [10]:
df=pd.DataFrame(table_contents)
df['Borough']=df['Borough'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
                                             'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
                                             'EtobicokeNorthwest':'Etobicoke Northwest','East YorkEast Toronto':'East York/East Toronto',
                                             'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})
df

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills North
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


In [11]:
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government


In [12]:
df.shape

(103, 3)

In [13]:
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 
column_names

['Borough', 'Neighborhood', 'Latitude', 'Longitude']

In [14]:
! pip install folium==0.5.0
import folium


Collecting folium==0.5.0
  Downloading folium-0.5.0.tar.gz (79 kB)
     |████████████████████████████████| 79 kB 500 kB/s eta 0:00:01
Collecting branca
  Downloading branca-0.4.2-py3-none-any.whl (24 kB)
Building wheels for collected packages: folium
  Building wheel for folium (setup.py) ... done
  Created wheel for folium: filename=folium-0.5.0-py3-none-any.whl size=76240 sha256=1d1c375a133970236796862ed9d5bc131b91805d322278f34a903b7824305cb9
  Stored in directory: /home/aarav/.cache/pip/wheels/ef/4c/4a/17fd3d7fb7b6243d5a7a8d165870cd5c6ad2ec4c0582f039e4
Successfully built folium
Installing collected packages: branca, folium
Successfully installed branca-0.4.2 folium-0.5.0


In [15]:
! pip install geocoder



In [16]:
import geocoder
g = geocoder.google('Mountain View, CA', key='AIzaSyCDVAxypPtrMx0JJR9pKDpuMv3Y0iFSUnI')
g

<[REQUEST_DENIED] Google - Geocode [empty]>

In [17]:
!pip install pgeocode
import pgeocode
geolocator = pgeocode.Nominatim('ca')
postal_codes = df['PostalCode'].tolist()
latitudes = []
longitudes = []
for i, postal_code in enumerate(postal_codes):
    # look up the record (a pandas Series) for this postal code
    #print(f'--Getting Postal Code: {postal_code}')
    g = geolocator.query_postal_code(postal_code)
    
    if not g.empty:
        #print(f'Postal Code {postal_code} has been retrieved. {len(postal_codes) - (i + 1)} codes left')
        latitudes.append(g.latitude)
        longitudes.append(g.longitude)

Collecting pgeocode
  Downloading pgeocode-0.3.0-py3-none-any.whl (8.5 kB)
Installing collected packages: pgeocode
Successfully installed pgeocode-0.3.0


In [18]:
g.head()

postal_code                                                   M8Z
country_code                                                   CA
place_name      Etobicoke (Mimico NW / The Queensway West / So...
state_name                                                Ontario
state_code                                                     ON
Name: 0, dtype: object

In [19]:

!wget -q -O 'canada_geo.csv' http://cocl.us/Geospatial_data
print('Data downloaded!')

Data downloaded!


In [20]:
geocode = pd.read_csv('canada_geo.csv')
geocode

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
5,M1J,43.744734,-79.239476
6,M1K,43.727929,-79.262029
7,M1L,43.711112,-79.284577
8,M1M,43.716316,-79.239476
9,M1N,43.692657,-79.264848


# More than one neighborhood can exist in one postal code area. Such rows are combined into one row with the neighborhoods separated by commas

In [21]:
geocode_geo = df.set_index('PostalCode').join(geocode.set_index('Postal Code')).reset_index()
geocode_geo.shape
geocode_geo.columns = ['PostalCode', 'Borough', 'Neighborhood', 'Latitude', 'Longitude']
geocode_geo

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills North,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937


## Get the geographical coordinates of Toronto

In [22]:
address = 'Toronto, CN' # note: Canada's ISO country code is 'CA'; 'CN' is kept because the output below was produced with it

geolocator = Nominatim(user_agent="Canada-explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6425637, -79.38708718320467.
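The business problem prefers locations close to the city center, which needs a distance measure between a neighborhood and the reference point printed above. The following is a minimal sketch using the haversine formula. `haversine_km` and `EARTH_RADIUS_KM` are hypothetical names, and the Earth radius is an approximation.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


center = (43.6425637, -79.38708718320467)   # coordinates printed above
parkwoods = (43.753259, -79.329656)         # first row of the merged table
print(round(haversine_km(*center, *parkwoods), 1))
```

Applied row by row to the merged dataframe, a function like this could give a distance column for ranking candidate neighborhoods.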


## Create a map of Toronto with neighborhoods superimposed on top

In [23]:
map_Toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(geocode_geo['Latitude'], geocode_geo['Longitude'], geocode_geo['Borough'], geocode_geo['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Toronto)
    
map_Toronto
    

## Define Foursquare Credentials and Version

In [24]:
CLIENT_ID = '35TAEFVXHFDQ5YIZ0LPLJKPNIJRBVH1E1D0F2BREAC2ROFDA' # your Foursquare ID
CLIENT_SECRET = 'DBFDAS4LUC5BGYQTHWLHQMI0K1SCJ5VVVZULIHMUVARSDWY3' # your Foursquare Secret
ACCESS_TOKEN = 'VAMU3O4XYK23VMORSM4REVYOY5EIB5TX2S5M330D5JVKZQ11' # your FourSquare Access Token
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 35TAEFVXHFDQ5YIZ0LPLJKPNIJRBVH1E1D0F2BREAC2ROFDA
CLIENT_SECRET:DBFDAS4LUC5BGYQTHWLHQMI0K1SCJ5VVVZULIHMUVARSDWY3


In [25]:
neighborhood_latitude = geocode_geo.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = geocode_geo.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = geocode_geo.loc[0, 'Neighborhood'] # neighborhood name
borough_name = geocode_geo.loc[0, 'Borough'] # borough name (this is what gets printed)

print('Latitude and longitude values of {} are {}, {}.'.format(borough_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of North York are 43.7532586, -79.3296565.


In [26]:
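# NOTE: the three lookups below use different rows (94, 97 and 101), so the
# latitude, longitude and name printed here do not describe the same place
neighborhood_latitude = geocode_geo.loc[94, 'Latitude'] # latitude taken from row 94
neighborhood_longitude = geocode_geo.loc[97, 'Longitude'] # longitude taken from row 97

neighborhood_name = geocode_geo.loc[101, 'Neighborhood'] # neighborhood name (overwritten on the next line)
neighborhood_name = geocode_geo.loc[101, 'Borough'] # borough name, which is what gets printed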

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Etobicoke are 43.706748299999994, -79.3822802.


In [27]:
LIMIT = 100
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '61016b544a220248a86c7d0e'},
 'response': {'headerLocation': 'Davisville',
  'headerFullLocation': 'Davisville, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 5,
  'suggestedBounds': {'ne': {'lat': 43.711248304499996,
    'lng': -79.37606676674264},
   'sw': {'lat': 43.70224829549999, 'lng': -79.38849363325735}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4dfb4c20d22d56d1ebae930c',
       'name': 'Starbucks',
       'location': {'crossStreet': 'Yorkmills',
        'lat': 43.707485,
        'lng': -79.381479,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.707485,
          'lng': -79.381479}],
        'distance': 104,
        'cc': 'CA',
        'city': 'Toronto',
        'state': 'O
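The response above nests the venues under `response -> groups -> items`, next to a `meta.code` status field. Here is a hedged sketch of pulling the items out defensively. `extract_items` is a hypothetical helper, and the sample payload is a hand-made stand-in that only mirrors the structure shown above.

```python
def extract_items(payload):
    """Return the venue items from an 'explore'-style response dict.

    Hypothetical helper: raises ValueError when meta.code is not 200 and
    returns an empty list when no group is present.
    """
    code = payload.get('meta', {}).get('code')
    if code != 200:
        raise ValueError('unexpected response code: {}'.format(code))
    groups = payload.get('response', {}).get('groups', [])
    return groups[0].get('items', []) if groups else []


# Hand-made payload mirroring the structure of the output above
sample = {
    'meta': {'code': 200},
    'response': {'groups': [{'items': [{'venue': {'name': 'Starbucks'}}]}]},
}
print([item['venue']['name'] for item in extract_items(sample)])
```

Checking the status code first makes a failed request fail loudly, rather than surfacing later as a `KeyError` on `'groups'`.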

# We will need the get_category_type function to extract category of venue.



In [28]:

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [29]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()


Unnamed: 0,name,categories,lat,lng
0,Starbucks,Coffee Shop,43.707485,-79.381479
1,McMurphy's,Bar,43.709324,-79.385076
2,Gabby's,American Restaurant,43.709399,-79.384987
3,Happy Convenience,Convenience Store,43.710155,-79.381671
4,Don Valley Offroad Bike Trails,Trail,43.703826,-79.377673


In [30]:
def getNearbyVenues(names, latitudes, longitudes, radius=500): # radius is 500m so as not to leave the center of neighborhood 
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        # print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
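The function above indexes `v['venue']['categories'][0]`, which raises an `IndexError` for a venue that comes back with an empty category list. A minimal sketch of a more tolerant row builder follows. `venue_row` is a hypothetical helper, and the sample items, including the venue with no category, are made up for illustration.

```python
def venue_row(name, lat, lng, item):
    """Flatten one response item into a tuple, tolerating missing categories."""
    venue = item['venue']
    categories = venue.get('categories') or []
    # use None when the venue has no category instead of failing on [0]
    category = categories[0]['name'] if categories else None
    return (name, lat, lng, venue['name'],
            venue['location']['lat'], venue['location']['lng'], category)


items = [
    {'venue': {'name': 'Starbucks',
               'location': {'lat': 43.707485, 'lng': -79.381479},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Unnamed Spot',          # made-up venue without a category
               'location': {'lat': 43.7, 'lng': -79.38},
               'categories': []}},
]
rows = [venue_row('Example', 43.7067, -79.3822, item) for item in items]
for row in rows:
    print(row)
```

Rows with a `None` category can then be dropped or kept explicitly before the one-hot encoding step.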

In [31]:
Scarborough_data = geocode_geo[geocode_geo['Borough'] == 'Scarborough'].reset_index(drop=True)

Scarborough_venues =  getNearbyVenues(names = Scarborough_data['Neighborhood'],
                                     latitudes=Scarborough_data['Latitude'],
                                   longitudes=Scarborough_data['Longitude'])
Scarborough_venues

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Malvern, Rouge",43.806686,-79.194353,Wendy’s,43.807448,-79.199056,Fast Food Restaurant
1,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
2,"Guildwood, Morningside, West Hill",43.763573,-79.188711,RBC Royal Bank,43.76679,-79.191151,Bank
3,"Guildwood, Morningside, West Hill",43.763573,-79.188711,G & G Electronics,43.765309,-79.191537,Electronics Store
4,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Sail Sushi,43.765951,-79.191275,Restaurant
5,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Big Bite Burrito,43.766299,-79.19072,Mexican Restaurant
6,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Enterprise Rent-A-Car,43.764076,-79.193406,Rental Car Location
7,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Krispy Kreme Doughnuts,43.767169,-79.18966,Donut Shop
8,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Woburn Medical Centre,43.766631,-79.192286,Medical Center
9,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Lawrence Ave E & Kingston Rd,43.767704,-79.18949,Intersection


In [32]:
Scarborough_venues.groupby('Neighborhood').count()


Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Agincourt,5,5,5,5,5,5
"Birch Cliff, Cliffside West",4,4,4,4,4,4
Cedarbrae,8,8,8,8,8,8
"Clarks Corners, Tam O'Shanter, Sullivan",13,13,13,13,13,13
"Cliffside, Cliffcrest, Scarborough Village West",2,2,2,2,2,2
"Dorset Park, Wexford Heights, Scarborough Town Centre",7,7,7,7,7,7
"Golden Mile, Clairlea, Oakridge",9,9,9,9,9,9
"Guildwood, Morningside, West Hill",9,9,9,9,9,9
"Kennedy Park, Ionview, East Birchmount Park",6,6,6,6,6,6
"Malvern, Rouge",1,1,1,1,1,1


# Let's find out how many unique categories can be curated from all the returned venues



In [33]:
print('There are {} unique categories.'.format(len(Scarborough_venues['Venue Category'].unique())))


There are 53 unique categories.


# Analyze each neighborhood


In [34]:

Scarborough_onehot = pd.get_dummies(Scarborough_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Scarborough_onehot['Neighborhood'] = Scarborough_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [Scarborough_onehot.columns[-1]] + list(Scarborough_onehot.columns[:-1])
Scarborough_onehot = Scarborough_onehot[fixed_columns]

Scarborough_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Athletics & Sports,Auto Garage,Bakery,Bank,Bar,Breakfast Spot,Bus Line,Bus Station,Café,Caribbean Restaurant,Chinese Restaurant,Clothing Store,Coffee Shop,College Stadium,Department Store,Discount Store,Donut Shop,Electronics Store,Fast Food Restaurant,Fried Chicken Joint,Gas Station,General Entertainment,Gym,Hakka Restaurant,Hobby Shop,Ice Cream Shop,Indian Restaurant,Intersection,Italian Restaurant,Korean BBQ Restaurant,Latin American Restaurant,Light Rail Station,Lounge,Medical Center,Mexican Restaurant,Middle Eastern Restaurant,Motel,Noodle House,Park,Pet Store,Pharmacy,Pizza Place,Playground,Rental Car Location,Restaurant,Sandwich Place,Skating Rink,Smoke Shop,Soccer Field,Thai Restaurant,Thrift / Vintage Store,Vietnamese Restaurant
0,"Malvern, Rouge",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,"Rouge Hill, Port Union, Highland Creek",0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,"Guildwood, Morningside, West Hill",0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0


# Now we will search for 'Fried Chicken Joint'


In [38]:
df_Chicken = Scarborough_venues[Scarborough_venues['Venue Category'] == 'Fried Chicken Joint'].reset_index(drop=True)
print(df_Chicken.shape)
df_Chicken.head()

(2, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Cedarbrae,43.773136,-79.239476,Popeyes Louisiana Kitchen,43.776059,-79.235265,Fried Chicken Joint
1,"Clarks Corners, Tam O'Shanter, Sullivan",43.781638,-79.304302,Popeyes Louisiana Kitchen,43.780335,-79.298683,Fried Chicken Joint


We look for locations with a moderate density of restaurants and other venues that do not already have a Fried Chicken Joint.

In the first step, we collected the required data: the neighborhoods and their locations, and the venues in each neighborhood, which let us gauge density.

In the second step, we look at venue and restaurant density across different areas of Toronto, using maps to identify a few promising areas close to the center with a moderate density of restaurants, neither too high nor too low.

In the third and final step, we focus on the most promising areas and, within those, create clusters of locations (using k-means clustering) that meet the basic requirements established in discussion with the entrepreneur.
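The selection rule described above (moderate venue density and no Fried Chicken Joint) can be sketched on toy data as follows. The neighborhood names `A`, `B`, `C` and the thresholds `MIN_VENUES` and `MAX_VENUES` are illustrative assumptions, not values from the analysis.

```python
import pandas as pd

# Toy venue table with the same two columns the analysis relies on
venues = pd.DataFrame({
    'Neighborhood': ['A'] * 3 + ['B'] * 2 + ['C'] * 1,
    'Venue Category': ['Bank', 'Fried Chicken Joint', 'Pizza Place',
                       'Bank', 'Coffee Shop', 'Park'],
})

MIN_VENUES, MAX_VENUES = 2, 50   # illustrative thresholds for this toy data

# number of venues per neighborhood
counts = venues.groupby('Neighborhood').size()

# neighborhoods that already contain a fried chicken venue
chicken_hoods = set(
    venues.loc[venues['Venue Category'] == 'Fried Chicken Joint', 'Neighborhood']
)

# keep neighborhoods in the density range that have no fried chicken venue
candidates = [hood for hood, n in counts.items()
              if MIN_VENUES <= n < MAX_VENUES and hood not in chicken_hoods]
print(candidates)
```

Here `A` is dropped because it already has a Fried Chicken Joint and `C` because it has too few venues, which leaves `B`.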

In [39]:
Scarborough_venues.Neighborhood.value_counts()[0:10]


Clarks Corners, Tam O'Shanter, Sullivan                  13
Steeles West, L'Amoreaux West                            12
Guildwood, Morningside, West Hill                         9
Golden Mile, Clairlea, Oakridge                           9
Cedarbrae                                                 8
Dorset Park, Wexford Heights, Scarborough Town Centre     7
Kennedy Park, Ionview, East Birchmount Park               6
Wexford, Maryvale                                         5
Agincourt                                                 5
Birch Cliff, Cliffside West                               4
Name: Neighborhood, dtype: int64

In [45]:
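# count venues per neighborhood; naming the column keeps this independent of the pandas version
most_venues = Scarborough_venues.Neighborhood.value_counts().to_frame(name='venue_count')
# keep neighborhoods with at least 10 and fewer than 50 venues
optimal_venues = most_venues[(most_venues.venue_count < 50) & (most_venues.venue_count >= 10)]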
optimal_neigs = optimal_venues.index.tolist()
optimal_neigs

["Clarks Corners, Tam O'Shanter, Sullivan", "Steeles West, L'Amoreaux West"]

In [46]:
# DataFrame.append was removed in pandas 2.0; pd.concat keeps the same row order
df_10_50 = pd.concat(
    [Scarborough_venues[Scarborough_venues['Neighborhood'] == neig] for neig in optimal_neigs],
    ignore_index=True)
print(df_10_50.shape)
df_10_50.head()

(25, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Clarks Corners, Tam O'Shanter, Sullivan",43.781638,-79.304302,Remezzo Italian Bistro,43.778649,-79.308264,Italian Restaurant
1,"Clarks Corners, Tam O'Shanter, Sullivan",43.781638,-79.304302,Eight Noodles,43.778234,-79.308299,Noodle House
2,"Clarks Corners, Tam O'Shanter, Sullivan",43.781638,-79.304302,The Royal Chinese Restaurant 避風塘小炒,43.780505,-79.298844,Chinese Restaurant
3,"Clarks Corners, Tam O'Shanter, Sullivan",43.781638,-79.304302,Kub Khao,43.780438,-79.299837,Thai Restaurant
4,"Clarks Corners, Tam O'Shanter, Sullivan",43.781638,-79.304302,TD Canada Trust,43.779169,-79.303617,Bank


# Visualize the Fried Chicken Restaurants and other venues


In [48]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map centred on the Toronto coordinates found earlier

# add popular spots as small circle markers (blue outline, red fill)   
for lat, lng, label in zip(df_10_50['Venue Latitude'], df_10_50['Venue Longitude'], df_10_50['Venue Category']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=1,
        popup=label,
        fill=True,
        color='blue',
        fill_color='red',
        fill_opacity=0.1,
        parse_html=False).add_to(venues_map) 


# add the Fried Chicken Joints as red circle markers
for lat, lng, label in zip(df_Chicken['Venue Latitude'], df_Chicken['Venue Longitude'], df_Chicken['Venue']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        color='red',
        popup=label,
        fill = True,
        fill_color='red',
        fill_opacity=0.4,
        parse_html=False).add_to(venues_map)
    

# display map
venues_map

# Analyze each neighborhood

In [50]:
# one hot encoding
Scarborough_onehot = pd.get_dummies(df_10_50[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Scarborough_onehot['Neighborhood_1'] = df_10_50['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [Scarborough_onehot.columns[-1]] + list(Scarborough_onehot.columns[:-1])
Scarborough_onehot = Scarborough_onehot[fixed_columns]

Scarborough_onehot.head()

Unnamed: 0,Neighborhood_1,Bank,Breakfast Spot,Chinese Restaurant,Coffee Shop,Fast Food Restaurant,Fried Chicken Joint,Gas Station,Gym,Intersection,Italian Restaurant,Noodle House,Pharmacy,Pizza Place,Sandwich Place,Thai Restaurant
0,"Clarks Corners, Tam O'Shanter, Sullivan",0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
1,"Clarks Corners, Tam O'Shanter, Sullivan",0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
2,"Clarks Corners, Tam O'Shanter, Sullivan",0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
3,"Clarks Corners, Tam O'Shanter, Sullivan",0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
4,"Clarks Corners, Tam O'Shanter, Sullivan",1,0,0,0,0,0,0,0,0,0,0,0,0,0,0


# Let's group rows by neighborhood, taking the mean of the frequency of occurrence of each category

In [52]:
Scarborough_grouped = Scarborough_onehot.groupby('Neighborhood_1').mean().reset_index()
Scarborough_grouped.head()

Unnamed: 0,Neighborhood_1,Bank,Breakfast Spot,Chinese Restaurant,Coffee Shop,Fast Food Restaurant,Fried Chicken Joint,Gas Station,Gym,Intersection,Italian Restaurant,Noodle House,Pharmacy,Pizza Place,Sandwich Place,Thai Restaurant
0,"Clarks Corners, Tam O'Shanter, Sullivan",0.076923,0.0,0.076923,0.0,0.153846,0.076923,0.076923,0.0,0.076923,0.076923,0.076923,0.076923,0.153846,0.0,0.076923
1,"Steeles West, L'Amoreaux West",0.083333,0.083333,0.083333,0.083333,0.166667,0.0,0.0,0.083333,0.083333,0.0,0.083333,0.083333,0.083333,0.083333,0.0
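Each value in the grouped table is the share of a neighborhood's venues that fall in a category. For example, 0.076923 is 1/13 for a neighborhood with 13 venues. A toy sketch of the same one-hot and mean computation, with made-up neighborhoods `A` and `B`:

```python
import pandas as pd

toy = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Bank', 'Pizza Place', 'Pizza Place', 'Gym', 'Bank', 'Gym'],
})

# one column per category, with 1 where the venue belongs to it
onehot = pd.get_dummies(toy['Venue Category'], dtype=int)
onehot.insert(0, 'Neighborhood', toy['Neighborhood'])

# the mean of a 0/1 column is the fraction of venues in that category
freq = onehot.groupby('Neighborhood').mean()
print(freq)
```

Every row of `freq` sums to 1, so the values can be compared across neighborhoods with different venue counts.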


# Let's print each neighborhood along with the top 5 most common venues


In [54]:
num_top_venues = 5

for hood in Scarborough_grouped['Neighborhood_1']:
    print("----"+hood+"----")
    temp = Scarborough_grouped[Scarborough_grouped['Neighborhood_1'] == hood].T.reset_index()
    temp.columns = ['venue_category ','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Clarks Corners, Tam O'Shanter, Sullivan----
        venue_category   freq
0  Fast Food Restaurant  0.15
1           Pizza Place  0.15
2                  Bank  0.08
3    Chinese Restaurant  0.08
4   Fried Chicken Joint  0.08


----Steeles West, L'Amoreaux West----
        venue_category   freq
0  Fast Food Restaurant  0.17
1                  Bank  0.08
2        Breakfast Spot  0.08
3    Chinese Restaurant  0.08
4           Coffee Shop  0.08




In [55]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]


Now, create the new dataframe and display the top 5 venues for each neighborhood.



In [56]:
num_top_venues = 5  # the table below lists five venues per neighborhood

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood_1']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood_1'] = Scarborough_grouped['Neighborhood_1']

for ind in np.arange(Scarborough_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Scarborough_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.rename(columns={'Neighborhood_1': 'Neighborhood'}, inplace=True)
neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,"Clarks Corners, Tam O'Shanter, Sullivan",Pizza Place,Fast Food Restaurant,Thai Restaurant,Pharmacy,Noodle House
1,"Steeles West, L'Amoreaux West",Fast Food Restaurant,Sandwich Place,Pizza Place,Pharmacy,Noodle House


# Cluster Neighborhoods

In [None]:
import matplotlib.pyplot as plt  

cost = [] 
Scarborough_grouped_clustering = Scarborough_grouped.drop(columns='Neighborhood_1')

# k-means cannot use more clusters than there are rows, so cap K accordingly
k_values = range(1, min(9, Scarborough_grouped_clustering.shape[0] + 1))

for k in k_values: 
    KM = KMeans(n_clusters=k, max_iter=5) 
    KM.fit(Scarborough_grouped_clustering)  
      
    # inertia_ is the sum of squared distances 
    # of the points to their closest cluster center 
    cost.append(KM.inertia_)      

# plot the cost against the K values that were actually fitted   
plt.plot(k_values, cost, color='g', linewidth=3) 
plt.xlabel("Value of K") 
plt.ylabel("Squared Error (Cost)") 
plt.show() # display the plot 
  
# the elbow of the curve suggests 
# a reasonable value for k 

## Conclusion
The purpose of this project was to identify areas of Toronto close to the center with a moderate number of restaurants and venues, in order to narrow down the search for an optimal location for a new Fried Chicken Restaurant. Using the density of restaurants and venues in the Foursquare data, we identified areas that have no Fried Chicken Restaurant and a moderate density of venues and restaurants. In the Scarborough data shown above, two neighborhoods fall in the range of 10 to 49 venues, and of these only 'Steeles West, L'Amoreaux West' has no Fried Chicken Joint.