# Where to open a new restaurant in Toronto?


## Background
Toronto is a large city, and a look at its food-industry businesses suggests that restaurant density has not yet reached its limit. There is therefore room for a new restaurant. That leaves the most important question: where?


## Business Problem
Opening a restaurant is a challenging task, and choosing the location wisely is one of the first important decisions. In practice, entrepreneurs often rely only on common sense and domain knowledge to pick a restaurant type and a spot. Too often, a poorly considered decision leads to low income and eventually bankruptcy. So where should the restaurant open?

I do not know the business regulations in Toronto. In the Netherlands, you are not allowed to open a restaurant where there are no other restaurants nearby. In other words, the government tries to group restaurants together in a city center or a shopping area. You cannot buy a house near a school and start a fast food restaurant there.

For a newcomer, an easy approach is to open where the restaurant is surrounded by other restaurants, because the area already attracts diners and the new restaurant can share customers with neighboring businesses. The restaurant can then build its reputation and grow into a mature business. Once it is well known, it may move, because customers will follow.

To sum up, a new restaurant should start where most restaurants already are.

Let us analyze restaurant data for Toronto and find the area with the highest concentration of restaurants as a starting location for a new restaurant.


## Target audience  
Investors, entrepreneurs, and chefs interested in opening a restaurant in Toronto, who may need a piece of objective advice on where exactly it should be opened.

## Data and methods
1. Using the table at https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, collect information about Toronto boroughs, neighborhoods, and postal codes.
2. Use the Geopy and Folium libraries to get the coordinates of every location and map the geospatial data on a Toronto map.
3. Use the Foursquare API to collect restaurants in Toronto and their locations via an explore query.
4. Group the collected restaurants by location using the K-means algorithm and find the cluster with the most points. This cluster represents the busiest restaurant area.
5. Calculate the Euclidean distance from each borough to the center of the biggest cluster and select the few boroughs closest to it. These boroughs will be my recommendations.
6. Visualize the results on the map and see whether my recommendations are convincing.


### 1. Using the table at https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, collect information about Toronto boroughs, neighborhoods, and postal codes.

In [1]:
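from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Load the postal code table from Wikipedia (the 'lxml' parser must be installed)
res = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
soup = BeautifulSoup(res.content, 'lxml')
table = soup.find_all('table')[0]
# Wrap the HTML in StringIO: recent pandas versions deprecate passing literal HTML strings
df = pd.read_html(StringIO(str(table)))[0]

# Clean the data
df.rename(columns={'Postal Code': 'PostalCode'}, inplace=True)
# Drop postal codes that have no borough assigned
df.drop(df[df['Borough'] == 'Not assigned'].index, inplace=True)
df.reset_index(drop=True, inplace=True)
# Where no neighborhood is assigned, fall back to the borough name
df.loc[df['Neighborhood'] == 'Not assigned', 'Neighborhood'] = df['Borough']
# Combine neighborhoods that share a postal code (kept for reference; not used below)
result = df.groupby(['PostalCode', 'Borough'], sort=False).agg(', '.join)
df.head(5)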

| | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
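
The cleaning rules in this step can be tried on a small hand-made table before running them against the live Wikipedia page. The sketch below is only an illustration: the rows in `raw` are invented for the example rather than taken from the real table, and it assumes a layout with one row per neighborhood so that the grouping step has something to combine.

```python
import pandas as pd

# Hypothetical sample rows, invented for this illustration
raw = pd.DataFrame({
    "PostalCode": ["M1A", "M5A", "M5A", "M7A"],
    "Borough": ["Not assigned", "Downtown Toronto", "Downtown Toronto", "Downtown Toronto"],
    "Neighborhood": ["Not assigned", "Regent Park", "Harbourfront", "Not assigned"],
})

# Drop postal codes without a borough
clean = raw[raw["Borough"] != "Not assigned"].copy()

# A neighborhood that is "Not assigned" falls back to the borough name
missing = clean["Neighborhood"] == "Not assigned"
clean.loc[missing, "Neighborhood"] = clean.loc[missing, "Borough"]

# Combine neighborhoods that share a postal code into one comma-separated row
clean = (
    clean.groupby(["PostalCode", "Borough"], sort=False)["Neighborhood"]
    .agg(", ".join)
    .reset_index()
)
print(clean)
```

Assigning the grouped result back to a variable that is used afterwards is what makes the combination take effect; a grouped result that is never used leaves the original rows unchanged.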


### 2. Use the Geopy and Folium libraries to get the coordinates of every location and map the geospatial data on a Toronto map.

I limit my exploration to boroughs whose names contain the word 'Toronto': 'Downtown Toronto', 'East Toronto', 'West Toronto' and 'Central Toronto'. The same method can be applied to other boroughs.

In [2]:
# Load the latitude and longitude of each postal code
df_location = pd.read_csv('http://cocl.us/Geospatial_data')
df_location.set_index('Postal Code', inplace=True)

import numpy as np

# Look up the coordinates of every postal code in df
latitude_list = []
longitude_list = []
for postal_code in df['PostalCode']:
    latitude_list.append(df_location.loc[postal_code, 'Latitude'])
    longitude_list.append(df_location.loc[postal_code, 'Longitude'])
df['Latitude'] = latitude_list
df['Longitude'] = longitude_list

import folium
import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans
from geopy.geocoders import Nominatim

address = 'Toronto, Ontario Canada'
# Current geopy releases require a user_agent; the name used here is arbitrary
geolocator = Nominatim(user_agent='toronto_restaurant_explorer')
location = geolocator.geocode(address)
Toronto_latitude = location.latitude
Toronto_longitude = location.longitude
print('The geographical coordinates of Toronto, Canada are {}, {}.'.format(Toronto_latitude, Toronto_longitude))

df.loc[df['Borough'].str.contains('Toronto'),'Borough'].unique()

# Keep only the boroughs whose name contains 'Toronto'
toronto_borough = df[df['Borough'].str.contains("Toronto")].reset_index(drop=True)
print(toronto_borough.shape)
toronto_borough.head(5)

map_toronto = folium.Map(location=[Toronto_latitude, Toronto_longitude], zoom_start=10)

# Add one marker per neighborhood
for lat, lng, borough, neighborhood in zip(toronto_borough['Latitude'], toronto_borough['Longitude'], toronto_borough['Borough'], toronto_borough['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=4,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#87cefa',
        fill_opacity=0.5).add_to(map_toronto)

map_toronto

The geographical coordinates of Toronto, Canada are 43.6534817, -79.3839347.
(39, 5)


### 3. Use the Foursquare API to collect restaurants in Toronto and their locations via an explore query.

In [3]:
# Foursquare API credentials (replace the placeholders with your own values)
CLIENT_ID = 'YOUR_CLIENT_ID'
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'
VERSION = '20180615'  # API version date
LIMIT = 30            # maximum number of venues returned per query
radius = 500          # search radius in meters
print('Your credentials:')
print('CLIENT_ID: Hidden')
print('CLIENT_SECRET: Hidden')

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Query Foursquare for venues around each location and return them as one DataFrame."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        # Build the explore request for this location
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'\
            .format(CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius, LIMIT)

        # Keep only the list of recommended items from the response
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # One row per venue: neighborhood info plus venue name, coordinates and category
        venues_list.append([(name, lat, lng, v['venue']['name'], v['venue']['location']['lat'],
                             v['venue']['location']['lng'], v['venue']['categories'][0]['name']) for v in results])

    # Flatten the per-location lists into a single DataFrame
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                             'Neighborhood Latitude',
                             'Neighborhood Longitude',
                             'Venue', 'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']

    return nearby_venues

toronto_venues = getNearbyVenues(names=toronto_borough['Neighborhood'],
                                 latitudes=toronto_borough['Latitude'],
                                 longitudes=toronto_borough['Longitude'])

Your credentials:
CLIENT_ID: Hidden
CLIENT_SECRET: Hidden
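
The request code indexes straight into `["response"]['groups'][0]['items']`, which raises an exception when a query returns no groups or a venue has no category. A minimal sketch of a more defensive parser is shown below. The function name `extract_venues` and the `sample` payload are hypothetical, made up for this illustration; the payload only mirrors the fields that the notebook code already reads and is not a real API response.

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style payload."""
    groups = payload.get("response", {}).get("groups", [])
    if not groups:
        return []  # nothing found around this location
    rows = []
    for item in groups[0].get("items", []):
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Use None when a venue has no category instead of failing
        category = categories[0]["name"] if categories else None
        rows.append((venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     category))
    return rows


# Hypothetical payload, invented for this example
sample = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Example Diner",
                           "location": {"lat": 43.65, "lng": -79.38},
                           "categories": [{"name": "Restaurant"}]}},
                {"venue": {"name": "Example Kiosk",
                           "location": {"lat": 43.66, "lng": -79.39},
                           "categories": []}},
            ]
        }]
    }
}

print(extract_venues(sample))
print(extract_venues({"response": {}}))
```

With a helper like this, a location with no venues contributes an empty list instead of stopping the whole loop.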


In [5]:
toronto_venues.head()

toronto_restaurant = toronto_venues[toronto_venues['Venue Category'].str.contains("Restaurant")]
toronto_restaurant.groupby('Neighborhood').count().sort_values(['Venue Category'], ascending=False)

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Little Portugal, Trinity | 13 | 13 | 13 | 13 | 13 | 13 |
| The Danforth West, Riverdale | 12 | 12 | 12 | 12 | 12 | 12 |
| University of Toronto, Harbord | 10 | 10 | 10 | 10 | 10 | 10 |
| Central Bay Street | 9 | 9 | 9 | 9 | 9 | 9 |
| Kensington Market, Chinatown, Grange Park | 9 | 9 | 9 | 9 | 9 | 9 |
| Stn A PO Boxes | 9 | 9 | 9 | 9 | 9 | 9 |
| Davisville | 9 | 9 | 9 | 9 | 9 | 9 |
| St. James Town, Cabbagetown | 9 | 9 | 9 | 9 | 9 | 9 |
| Richmond, Adelaide, King | 9 | 9 | 9 | 9 | 9 | 9 |
| First Canadian Place, Underground city | 8 | 8 | 8 | 8 | 8 | 8 |


In [6]:
toronto_restaurant.shape

(196, 7)
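
The per-neighborhood table above repeats the same number in every column because `count()` is computed for each column separately. As a small sketch of an alternative, `groupby(...).size()` returns a single count per neighborhood. The `venues` frame below is hypothetical data invented for the example; only the column names follow the notebook.

```python
import pandas as pd

# Hypothetical venues, invented for this illustration
venues = pd.DataFrame({
    "Neighborhood": ["Area A", "Area A", "Area A", "Area B", "Area B", "Area C"],
    "Venue Category": ["Italian Restaurant", "Thai Restaurant", "Cafe",
                       "Sushi Restaurant", "Bar", "Park"],
})

# Keep only venues whose category mentions "Restaurant"
restaurants = venues[venues["Venue Category"].str.contains("Restaurant")]

# One count per neighborhood, largest first
counts = (
    restaurants.groupby("Neighborhood")
    .size()
    .sort_values(ascending=False)
    .rename("Restaurants")
)
print(counts)
```

Neighborhoods without any restaurant, such as the made-up "Area C" here, do not appear in the result at all.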

### 4. Group the collected restaurants by location using the K-means algorithm and find the cluster with the most points. This cluster represents the busiest restaurant area.

In [7]:
num_clusters = 5
X = toronto_restaurant.loc[:, ['Venue Latitude', 'Venue Longitude']]

# Scale the coordinates before clustering
from sklearn.preprocessing import StandardScaler
cluster_dataset = StandardScaler().fit_transform(X)
k_means = KMeans(init="k-means++", n_clusters=num_clusters, n_init=12)
k_means.fit(cluster_dataset)
labels = k_means.labels_

map_toronto_restaurant = folium.Map(location=[Toronto_latitude, Toronto_longitude], zoom_start=12)

# One color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, num_clusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Plot every restaurant, colored by its cluster
for lat, lon, cluster in zip(X['Venue Latitude'], X['Venue Longitude'], labels):
    label = folium.Popup(' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker([lat, lon], radius=5, popup=label, color=rainbow[cluster], fill=True, fill_color=rainbow[cluster], fill_opacity=0.7).add_to(map_toronto_restaurant)
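
To see how the largest cluster and its center can be read off a K-means result, here is a minimal sketch on synthetic points. Everything in it is an assumption made for illustration: the coordinates are randomly generated around three made-up centers rather than taken from Foursquare, the points are not scaled, and `random_state` is fixed so the run is repeatable, which the notebook cell does not do.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic coordinates: three well-separated groups of different sizes
centers = np.array([[43.65, -79.38], [43.70, -79.30], [43.60, -79.45]])
sizes = [60, 25, 15]
points = np.vstack([
    center + rng.normal(scale=0.003, size=(n, 2))
    for center, n in zip(centers, sizes)
])

# Cluster the points; random_state makes the result repeatable
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(points)

# Count the members of each cluster and pick the largest one
counts = np.bincount(model.labels_, minlength=3)
largest = int(counts.argmax())

# Center of the largest cluster in the original coordinates
largest_center = points[model.labels_ == largest].mean(axis=0)
print(counts, largest, largest_center)
```

Selecting the cluster with `argmax` on the counts picks the largest cluster regardless of the order in which labels happen to appear.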

### 5-6. Calculate the Euclidean distance from each borough to the center of the biggest cluster, select the few boroughs closest to it, and visualize them on the map. These boroughs will be my recommendations.

In [8]:
from collections import Counter

# Count the restaurants in each cluster and pick the largest one
cluster_counts = Counter(labels)
print(cluster_counts)
key, count = cluster_counts.most_common(1)[0]
print(count)
print('Cluster {} has {} restaurants, which is the most'.format(key, count))

# Center of the largest cluster: mean latitude and longitude of its restaurants
biggest_cluster_center = X[labels == key].mean()
# Mark the center in black on the map
folium.CircleMarker(location=biggest_cluster_center.tolist(), radius=5, popup='Center of the biggest cluster', color='black', fill=True, fill_color='black', fill_opacity=0.7).add_to(map_toronto_restaurant)

map_toronto_restaurant

Counter({1: 71, 4: 51, 3: 35, 0: 20, 2: 19})
71
Cluster 1 has 71 restaurants, which is the most


In [10]:
# Find the neighborhoods closest to the center of the biggest cluster
biggest_cluster_center = np.array(biggest_cluster_center)

# Euclidean distance (in degrees) from each neighborhood to the cluster center
distance = np.linalg.norm(toronto_borough[['Latitude','Longitude']].to_numpy() - biggest_cluster_center, axis=1)
df_distance = pd.DataFrame(distance, columns=['distance'])
df_distance.index.name = 'index'

# Keep the five nearest neighborhoods as recommendations
recom_borough_ind = df_distance.nsmallest(5, 'distance').index
recom_borough = toronto_borough.iloc[recom_borough_ind][['Latitude','Longitude','Neighborhood']]
print('After data analysis these are great locations to start a restaurant', recom_borough['Neighborhood'])

# Highlight the recommended neighborhoods on the map
for lat, lng, label in zip(recom_borough['Latitude'], recom_borough['Longitude'], recom_borough['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lng],
                        radius=5,
                        popup=label,
                        color='#FA5B3D',
                        fill=True,
                        fill_color='#FA5B3D',
                        fill_opacity=0.7).add_to(map_toronto_restaurant)

map_toronto_restaurant

After data analysis these are great locations to start a restaurant 16              Commerce Court, Victoria Hotel
36      First Canadian Place, Underground city
13    Toronto Dominion Centre, Design Exchange
3                               St. James Town
8                     Richmond, Adelaide, King
Name: Neighborhood, dtype: object


In [11]:
recom_borough.head()


| | Latitude | Longitude | Neighborhood |
|---|---|---|---|
| 16 | 43.648198 | -79.379817 | Commerce Court, Victoria Hotel |
| 36 | 43.648429 | -79.38228 | First Canadian Place, Underground city |
| 13 | 43.647177 | -79.381576 | Toronto Dominion Centre, Design Exchange |
| 3 | 43.651494 | -79.375418 | St. James Town |
| 8 | 43.650571 | -79.384568 | Richmond, Adelaide, King |
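
The ranking above uses Euclidean distance on raw latitude and longitude, which treats one degree of longitude as equal to one degree of latitude. At Toronto's latitude a degree of longitude covers roughly 80 km while a degree of latitude covers about 111 km, so the two are not quite comparable. As an alternative that the notebook does not use, the sketch below computes great-circle distance in kilometers with the haversine formula. The function name `haversine_km` is a hypothetical helper, the Earth radius is the usual mean-radius approximation, and `center` is an assumed example value for the cluster center; the two neighborhood coordinates are taken from the table above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Assumed example value for the center of the biggest cluster
center = (43.649928, -79.379402)

# Coordinates of two recommended neighborhoods from the table above
areas = {
    "Commerce Court, Victoria Hotel": (43.648198, -79.379817),
    "St. James Town": (43.651494, -79.375418),
}

for name, (lat, lon) in areas.items():
    print(name, round(haversine_km(center[0], center[1], lat, lon), 3), "km")
```

For neighborhoods this close together the ranking is unlikely to change much, but distances in kilometers are easier to interpret than distances in degrees.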
