# Capstone Project - The Battle of the Neighborhoods 

## Introduction: Business Problem <a name="introduction"></a>

In this project we look for a location in Toronto to open a restaurant. This report is aimed at stakeholders interested in opening a **Chinese restaurant** in **Toronto**, Canada.

Toronto is a highly diverse city where you can find cuisines from all over the world. It is home to hundreds of thousands of Chinese people, and there is plenty of authentic Chinese food across the Greater Toronto Area. We are looking for locations that have few Chinese restaurants and are close to the city centre.

We will use data science to identify neighborhoods that meet our criteria, and we will describe the advantages of each location so that stakeholders can make the best possible decision.

## Data

We are going to investigate the following points to make decisions:
* the number of restaurants in each neighborhood
* number of and distance to Chinese restaurants
* the distance from Chinese restaurants to the city centre

We will use Foursquare location data (via the Foursquare API) to explore the number of restaurants and their cuisines in every neighborhood.
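As a rough illustration of what such a Foursquare query looks like, the sketch below assembles a venue-search URL from its parameters. The endpoint and parameter names are the ones used in the request URLs later in this notebook. The helper name `build_search_url` and the credentials `MY_ID` and `MY_SECRET` are placeholders introduced only for this example.

```python
from urllib.parse import urlencode

# Endpoint taken from the request URLs used later in this notebook.
SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"


def build_search_url(client_id, client_secret, lat, lng, category_id,
                     version="20180604", radius=1000, limit=50):
    """Return a venue-search URL for one point and one category (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": "{},{}".format(lat, lng),  # latitude,longitude of the search centre
        "v": version,                    # API version date
        "categoryId": category_id,       # restrict results to one venue category
        "radius": radius,                # search radius in metres
        "limit": limit,                  # maximum number of venues returned
    }
    return SEARCH_ENDPOINT + "?" + urlencode(params)


# Placeholder credentials, for illustration only.
example_url = build_search_url("MY_ID", "MY_SECRET", 43.6534817, -79.3839347,
                               "4bf58dd8d48988d145941735")
print(example_url)
```

Building the query string with `urlencode` rather than string formatting takes care of escaping, which matters once a parameter contains characters such as commas or spaces.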

#### Use geopy library to get the latitude and longitude values of Toronto.

In [1]:
# first import all modules needed
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


#### Define Foursquare Credentials and Version

In [2]:
CLIENT_ID = 'BWAMIB5QU5TOZW1YPF5GSVWVCZ5HUDSNU5COQSKSJYO2QKT2' # your Foursquare ID
CLIENT_SECRET = '3VI5IPMCC4A2UN4EDOQSW0C4MNLOWX4PIKMRNK4FL2KLTFP2' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: BWAMIB5QU5TOZW1YPF5GSVWVCZ5HUDSNU5COQSKSJYO2QKT2
CLIENT_SECRET:3VI5IPMCC4A2UN4EDOQSW0C4MNLOWX4PIKMRNK4FL2KLTFP2


In [3]:
address = 'Toronto, CA'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


#### Convert between latitude/longitude and X/Y coordinates

In [4]:
import shapely.geometry

import pyproj

import math

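def lonlat_to_xy(lon, lat):
    # Project WGS84 longitude/latitude to planar UTM coordinates (metres).
    # Note: UTM zone 33 covers central Europe, while Toronto (about 79.4 W)
    # lies in zone 17, so the X/Y values and distances printed in this
    # notebook, which were produced with zone 33, are distorted.
    # pyproj.transform is deprecated in newer pyproj releases in favour of
    # pyproj.Transformer.
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    # Inverse of lonlat_to_xy: planar UTM coordinates back to longitude/latitude.
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]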

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)
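As an independent cross-check of the projected distances, the following sketch computes great-circle distances directly from longitude and latitude with the haversine formula, and derives the standard UTM zone for a given longitude. It is not part of the original analysis. The function names `haversine_m` and `utm_zone` and the spherical Earth radius are assumptions of this example. For the Toronto centre and Parkwoods coordinates used in this notebook, the haversine distance comes out at roughly 12 km.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres (spherical approximation)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def utm_zone(lon):
    """Standard 6-degree UTM zone number for a longitude in degrees."""
    return int((lon + 180) // 6) + 1


# Coordinates taken from this notebook (longitude, latitude).
centre = (-79.3839347, 43.6534817)   # Toronto centre from the geocoder
parkwoods = (-79.329656, 43.753259)  # first row of the neighborhood table

print("UTM zone for Toronto:", utm_zone(centre[0]))
print("Centre to Parkwoods: {:.2f} km".format(
    haversine_m(centre[0], centre[1], parkwoods[0], parkwoods[1]) / 1000))
```

A large gap between this value and a projected distance is a hint that the projection zone does not match the region being analysed.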

In [6]:
print('Coordinate transformation check')
print('-------------------------------')
print('Toronto center longitude={}, latitude={}'.format(longitude, latitude))
x, y = lonlat_to_xy(longitude, latitude)
print('Toronto center X/Y co-ordinates X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Toronto center longitude={}, latitude={}'.format(lo, la))

Coordinate transformation check
-------------------------------
Toronto center longitude=-79.3839347, latitude=43.6534817
Toronto center X/Y co-ordinates X=-5310527.241020994, Y=10507538.454385541
Toronto center longitude=-79.3839347000005, latitude=43.653481699999766



#### Get neighborhoods in Toronto

In [5]:
import urllib.request
url="https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
page=urllib.request.urlopen(url)
from bs4 import BeautifulSoup
soup = BeautifulSoup(page, "html.parser")
all_tables=soup.find_all("table")
right_table=soup.find('table', class_='wikitable sortable')

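# Collect the three columns of the Wikipedia table
A=[]  # postal codes
B=[]  # boroughs
C=[]  # neighborhoods
for row in right_table.findAll("tr"):
    cells=row.findAll("td")
    # keep only data rows that have exactly three cells
    if len(cells)==3:
        A.append(cells[0].find(text=True))
        B.append(cells[1].find(text=True))
        C.append(cells[2].find(text=True))

def remove_useless(lst):
    # drop the last character of every entry (presumably a trailing newline)
    return [i[:-1] for i in lst]

A=remove_useless(A)
B=remove_useless(B)
C=remove_useless(C)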

import pandas as pd
df=pd.DataFrame(A,columns=["PostalCode"])
df["Borough"]=B
df["Neighborhood"]=C
df

df_valid=df[df["Borough"]!="Not assigned"]
df_valid=df_valid.reset_index().drop(["index"],axis=1)
df_valid

for i in range(len(df_valid)):
    if df_valid.loc[i,"Neighborhood"]=="Not assigned":
        df_valid.loc[i,"Neighborhood"]=df_valid.loc[i,"Borough"]
        
geo_data=pd.read_csv("E:/Coursea/capstone/geo.csv")
geo=pd.DataFrame(columns=geo_data.columns)
for i in A:
    part=geo_data[geo_data["Postal Code"]==i]
    geo=pd.concat([geo,part])

lat=geo["Latitude"].values.tolist()
long=geo["Longitude"].values.tolist()

df_valid["Longitude"]=long
df_valid["Latitude"]=lat

df_valid

| | PostalCode | Borough | Neighborhood | Longitude | Latitude |
| --- | --- | --- | --- | --- | --- |
| 0 | M3A | North York | Parkwoods | -79.329656 | 43.753259 |
| 1 | M4A | North York | Victoria Village | -79.315572 | 43.725882 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | -79.360636 | 43.65426 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | -79.464763 | 43.718518 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | -79.389494 | 43.662301 |
| 5 | M9A | Etobicoke | Islington Avenue, Humber Valley Village | -79.532242 | 43.667856 |
| 6 | M1B | Scarborough | Malvern, Rouge | -79.194353 | 43.806686 |
| 7 | M3B | North York | Don Mills | -79.352188 | 43.745906 |
| 8 | M4B | East York | Parkview Hill, Woodbine Gardens | -79.309937 | 43.706397 |
| 9 | M5B | Downtown Toronto | Garden District, Ryerson | -79.378937 | 43.657162 |


In [6]:
import folium

map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_valid['Latitude'], df_valid['Longitude'], df_valid['Borough'], df_valid['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)
    
map_toronto

In [7]:
long=df_valid["Longitude"].tolist()
lat=df_valid["Latitude"].tolist()
geo_to_xy=[]
for i in range(len(long)):
    x,y=lonlat_to_xy(long[i],lat[i])
    geo_to_xy.append((x,y))

In [8]:
X_coord=[]
Y_coord=[]
for i in range(len(geo_to_xy)):
    x=geo_to_xy[i][0]
    y=geo_to_xy[i][1]
    X_coord.append(x)
    Y_coord.append(y)

In [9]:
x_toronto,y_toronto=lonlat_to_xy(longitude, latitude)
distance_to_centre=[]
for i in range(len(geo_to_xy)):
    distance=calc_xy_distance(x_toronto, y_toronto, geo_to_xy[i][0], geo_to_xy[i][1])
    distance_to_centre.append(distance)


In [10]:
df_valid["X"]=X_coord
df_valid["Y"]=Y_coord
df_valid["distance to centre"]=distance_to_centre
df_valid.head()

| | PostalCode | Borough | Neighborhood | Longitude | Latitude | X | Y | distance to centre |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | M3A | North York | Parkwoods | -79.329656 | 43.753259 | -5295352.0 | 10499540.0 | 17156.471845 |
| 1 | M4A | North York | Victoria Village | -79.315572 | 43.725882 | -5299879.0 | 10498390.0 | 14040.948724 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | -79.360636 | 43.65426 | -5310700.0 | 10504830.0 | 2710.884442 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | -79.464763 | 43.718518 | -5299146.0 | 10515710.0 | 14010.776289 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | -79.389494 | 43.662301 | -5309053.0 | 10508030.0 | 1552.537639 |


In [11]:
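# Category ID used in the searches below. Despite the variable name, the venues
# returned for it carry this ID with a category name starting with 'C',
# which points to Foursquare's "Chinese Restaurant" category rather than food in general.
food_cat="4bf58dd8d48988d145941735"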
chinese_restaurant_cate=["52af3a5e3cf9994f4e043bea","52af3a723cf9994f4e043bec","52af3a7c3cf9994f4e043bed","58daa1558bbb0b01f18ec1d3",
                         "52af3a673cf9994f4e043beb","52af3a903cf9994f4e043bee","4bf58dd8d48988d1f5931735","52af3a9f3cf9994f4e043bef",
                         "52af3aaa3cf9994f4e043bf0","52af3ab53cf9994f4e043bf1","52af3abe3cf9994f4e043bf2","52af3ac83cf9994f4e043bf3",
                         "52af3ad23cf9994f4e043bf4","52af3add3cf9994f4e043bf5","52af3af23cf9994f4e043bf7","52af3ae63cf9994f4e043bf6",
                         "52af3afc3cf9994f4e043bf8","52af3b053cf9994f4e043bf9","52af3b213cf9994f4e043bfa","52af3b213cf9994f4e043bfa",
                         "52af3b293cf9994f4e043bfb","52af3b343cf9994f4e043bfc","52af3b3b3cf9994f4e043bfd","52af3b463cf9994f4e043bfe",
                         "52af3b633cf9994f4e043c01","52af3b513cf9994f4e043bff","52af3b593cf9994f4e043c00","52af3b6e3cf9994f4e043c02",
                         "52af3b773cf9994f4e043c03","52af3b813cf9994f4e043c04","52af3b893cf9994f4e043c05","52af3b913cf9994f4e043c06",
                         "52af3b9a3cf9994f4e043c07","52af3ba23cf9994f4e043c08"]

In [14]:
radius=1000
LIMIT=50
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&categoryId={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION,food_cat,radius, LIMIT)
results = requests.get(url).json()

In [15]:
venues=results['response']['venues']
from pandas import json_normalize
dataframe = json_normalize(venues)
dataframe


Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.lat,location.lng,location.labeledLatLngs,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.formattedAddress,location.crossStreet,venuePage.id,location.neighborhood
0,5cb7b58a9d74680039974a4a,Artisan Plus,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1595306647,False,122 Dundas Street W,43.655685,-79.38433,"[{'label': 'display', 'lat': 43.655685, 'lng':...",247,M5G 1C3,CA,Toronto,ON,Canada,"[122 Dundas Street W, Toronto ON M5G 1C3, Canada]",,,
1,5bf765b2c5b11c002c1c8fc6,ZenQ,"[{'id': '52e81612bcbc57f1066b7a0c', 'name': 'B...",v-1595306647,False,171 Dundas Street W,43.654911,-79.387266,"[{'label': 'display', 'lat': 43.654911, 'lng':...",311,M5G 1C8,CA,Toronto,ON,Canada,"[171 Dundas Street W (Dundas & Centre), Toront...",Dundas & Centre,,
2,4b2027b5f964a520f82d24e3,Hong Shing Chinese Restaurant,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1595306647,False,195 Dundas St W,43.654925,-79.387089,"[{'label': 'display', 'lat': 43.65492521335936...",300,M5G 1C7,CA,Toronto,ON,Canada,"[195 Dundas St W (at University Ave), Toronto ...",at University Ave,60327598.0,
3,5bd0acba911fc4002cb6ac94,Yang Teashop,"[{'id': '4bf58dd8d48988d1dc931735', 'name': 'T...",v-1595306647,False,183 Dundas St W,43.655061,-79.386637,"[{'label': 'display', 'lat': 43.655061, 'lng':...",279,M5G 1C7,CA,Toronto,ON,Canada,"[183 Dundas St W, Toronto ON M5G 1C7, Canada]",,,
4,59ebff82d0a1496688eb92b0,DAGU RICE NOODLE Toronto 大鼓米线,"[{'id': '4bf58dd8d48988d1d1941735', 'name': 'N...",v-1595306647,False,111 Dundas St W,43.655632,-79.38428,"[{'label': 'display', 'lat': 43.65563207103592...",240,M5G 1C4,CA,Toronto,ON,Canada,"[111 Dundas St W (btwn Bay & Elizabeth St), To...",btwn Bay & Elizabeth St,,
5,4c1a5be68b3aa5932f7a955f,Asian Gourmet,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1595306647,False,,43.649288,-79.378183,"[{'label': 'display', 'lat': 43.649288, 'lng':...",657,,CA,,,Canada,[Canada],,,
6,4c69740b8d22c9284d42b745,Wah Too Seafood Restaurant,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1595306647,False,56 Centre Ave.,43.654833,-79.387206,"[{'label': 'display', 'lat': 43.65483285234745...",303,M5G 1R5,CA,Toronto,ON,Canada,"[56 Centre Ave., Toronto ON M5G 1R5, Canada]",,,
7,4e95cb3930f82cbde9ed2407,Shanghai 360,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1595306647,False,220 Yonge St.,43.654506,-79.380894,"[{'label': 'display', 'lat': 43.65450604545589...",270,,CA,Toronto,ON,Canada,"[220 Yonge St. (in Urban Eatery, Toronto Eaton...","in Urban Eatery, Toronto Eaton Centre",,
8,4ba573a5f964a5207c0839e3,Rol Jui Seafood Restaurant,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1595306647,False,472 Dundas St,43.653153,-79.396877,"[{'label': 'display', 'lat': 43.65315284886056...",1043,,CA,Toronto,ON,Canada,"[472 Dundas St (Spadina), Toronto ON, Canada]",Spadina,,
9,5cadd69f3092be0039a9ba17,Chi Chop,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1595306647,False,372 Yonge St,43.65873,-79.382057,"[{'label': 'display', 'lat': 43.65873, 'lng': ...",603,M5B 1S6,CA,Toronto,ON,Canada,"[372 Yonge St, Toronto ON M5B 1S6, Canada]",,,


In [16]:
df_valid.head()

| | PostalCode | Borough | Neighborhood | Longitude | Latitude | X | Y | distance to centre |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | M3A | North York | Parkwoods | -79.329656 | 43.753259 | -5295352.0 | 10499540.0 | 17156.471845 |
| 1 | M4A | North York | Victoria Village | -79.315572 | 43.725882 | -5299879.0 | 10498390.0 | 14040.948724 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | -79.360636 | 43.65426 | -5310700.0 | 10504830.0 | 2710.884442 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | -79.464763 | 43.718518 | -5299146.0 | 10515710.0 | 14010.776289 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | -79.389494 | 43.662301 | -5309053.0 | 10508030.0 | 1552.537639 |


In [16]:
toronto_lat=df_valid["Latitude"].tolist()
toronto_long=df_valid["Longitude"].tolist()
toronto_borough=df_valid["Borough"].tolist()
toronto_distance=df_valid["distance to centre"].tolist()
toronto_Neighborhood=df_valid["Neighborhood"].tolist()
restaurants_in_neighborhood=pd.DataFrame(columns=dataframe.columns)

In [17]:
radius=1000
LIMIT=50
for i in range(len(toronto_lat)):
    url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&categoryId={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, toronto_lat[i], toronto_long[i], VERSION,food_cat,radius, LIMIT)
    results = requests.get(url).json()
    venues=results['response']['venues']
    dataframe = json_normalize(venues)
    dataframe["Neighborhood"]=[toronto_Neighborhood[i]]*len(dataframe)
    dataframe["Borough"]=[toronto_borough[i]]*len(dataframe)
    dataframe["Latitude"]=[toronto_lat[i]]*len(dataframe)
    dataframe["Longitude"]=[toronto_long[i]]*len(dataframe)
    restaurants_in_neighborhood=pd.concat([restaurants_in_neighborhood,dataframe])


In [18]:
restaurants_in_neighborhood.reset_index(inplace=True)
restaurants_in_neighborhood=restaurants_in_neighborhood.drop(["index"],axis=1)

In [20]:
restaurants_in_neighborhood.head(20)

Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.crossStreet,location.lat,location.lng,location.labeledLatLngs,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.formattedAddress,venuePage.id,location.neighborhood,Neighborhood,Borough,Latitude,Longitude
0,4c0150f4716bc9b65b9dbb55,Spicy Chicken House,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1595289144,False,1277 York Mills Rd.,,43.760639,-79.325671,"[{'label': 'display', 'lat': 43.76063939666398...",881,M3A 1Z5,CA,North York,ON,Canada,"[1277 York Mills Rd., North York ON M3A 1Z5, C...",,,Parkwoods,North York,43.753259,-79.329656
1,4be21e1921d5a59390ec1511,Peking Express,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1595289144,False,,,43.656692,-79.365126,"[{'label': 'display', 'lat': 43.6566919000341,...",451,,CA,,,Canada,[Canada],,,"Regent Park, Harbourfront",Downtown Toronto,43.65426,-79.360636
2,4bace084f964a520ca143be3,Oriental Taste,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1595289144,False,329 Queen st,Parliment,43.655304,-79.365312,"[{'label': 'display', 'lat': 43.655304, 'lng':...",394,,CA,Toronto,ON,Canada,"[329 Queen st (Parliment), Toronto ON, Canada]",,,"Regent Park, Harbourfront",Downtown Toronto,43.65426,-79.360636
3,4bca992068f976b017d35f83,China Gourmet,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1595289144,False,235 Carlton St,at Parliament St,43.66418,-79.368359,"[{'label': 'display', 'lat': 43.6641802410051,...",1267,M5A 2L2,CA,Toronto,ON,Canada,"[235 Carlton St (at Parliament St), Toronto ON...",,,"Regent Park, Harbourfront",Downtown Toronto,43.65426,-79.360636
4,4c17c23a6a21c9b6b901c897,Ying Ying Soy Food,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1595289144,False,93 Front St. E.,in St. Lawrence Market,43.648994,-79.371494,"[{'label': 'display', 'lat': 43.64899366575782...",1052,M5E 1C3,CA,Toronto,ON,Canada,"[93 Front St. E. (in St. Lawrence Market), Tor...",,,"Regent Park, Harbourfront",Downtown Toronto,43.65426,-79.360636
5,5b3aa3ec029a55002c3f0338,Bamboo Kitchen,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1595289144,False,292 Parliament st,,43.65862,-79.365881,"[{'label': 'display', 'lat': 43.65862, 'lng': ...",643,M5A 3A4,CA,Toronto,ON,Canada,"[292 Parliament st, Toronto ON M5A 3A4, Canada]",,,"Regent Park, Harbourfront",Downtown Toronto,43.65426,-79.360636
6,4f73a473e4b0c1f445d21c78,Huayu Kitchen,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1595289144,False,,,43.654148,-79.357826,"[{'label': 'display', 'lat': 43.65414810180664...",226,,CA,,,Canada,[Canada],,,"Regent Park, Harbourfront",Downtown Toronto,43.65426,-79.360636
7,4ef0e4ece5e89bf2782272f4,Ho Mei Kitchen,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1595289144,False,236 Sherbourne St,,43.65803,-79.371028,"[{'label': 'display', 'lat': 43.65803017091301...",936,M5A,CA,Toronto,ON,Canada,"[236 Sherbourne St, Toronto ON M5A, Canada]",,,"Regent Park, Harbourfront",Downtown Toronto,43.65426,-79.360636
8,553c2ae7498e53f7c3086919,Kanpai Snack Bar,"[{'id': '52af3b813cf9994f4e043c04', 'name': 'T...",v-1595289144,False,252 Carlton St,at Parliament St.,43.664331,-79.368065,"[{'label': 'display', 'lat': 43.66433093594863...",1270,,CA,Toronto,ON,Canada,"[252 Carlton St (at Parliament St.), Toronto O...",,Cabbagetown,"Regent Park, Harbourfront",Downtown Toronto,43.65426,-79.360636
9,4b64d531f964a52090d32ae3,On The Rocks,"[{'id': '4bf58dd8d48988d120941735', 'name': 'K...",v-1595289144,False,169 Front Street East,at Sherbourne St,43.650408,-79.368354,"[{'label': 'display', 'lat': 43.65040844699981...",755,,CA,Toronto,ON,Canada,"[169 Front Street East (at Sherbourne St), Tor...",,,"Regent Park, Harbourfront",Downtown Toronto,43.65426,-79.360636


## Methodology

In this project we apply two selection processes and combine their results.

The first process counts the Chinese restaurants in each neighborhood and measures the distance from each neighborhood to the Toronto city centre (Yonge-Dundas Square). We look for neighborhoods that have few Chinese restaurants and lie at a moderate distance from the centre.

The second process measures the distance from each restaurant to the Toronto centre (Yonge-Dundas Square), groups the restaurants by neighborhood, and takes the mean distance within each neighborhood.

Finally, we take the intersection of the two results.
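The final intersection step can be sketched with plain Python sets. The two shortlists below are one reading of the bolded neighborhood lists in the Analysis section. How the individual names group into postal-code areas is an assumption of this example, so the output should be treated as illustrative rather than as the report's result.

```python
# Shortlist from the first process (few Chinese restaurants, moderate distance).
# The grouping of names into postal-code areas is an assumption.
shortlist_by_count = {
    "Berczy Park",
    "The Annex, North Midtown, Yorkville",
    "Harbourfront East, Union Station, Toronto Islands",
    "East Toronto, Broadview North (Old East York)",
}

# Shortlist from the second process (mean restaurant distance to the centre).
shortlist_by_mean_distance = {
    "Berczy Park",
    "Harbourfront East, Union Station, Toronto Islands",
    "Regent Park, Harbourfront",
    "St. James Town, Cabbagetown",
    "The Annex, North Midtown, Yorkville",
}

# Neighborhoods that pass both selection processes.
final_candidates = sorted(shortlist_by_count & shortlist_by_mean_distance)
for name in final_candidates:
    print(name)
```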

## Analysis

Let us explore the data with some basic analysis. First, we look at the number of Chinese restaurants in each neighborhood.

In [19]:
grouped_rest_by_neigh=restaurants_in_neighborhood.groupby(by="Neighborhood").count().sort_values(by="id",ascending=False)
grouped_rest_by_neigh.reset_index(inplace=True)
grouped_rest_by_neigh[["Neighborhood","id"]]

| | Neighborhood | id |
| --- | --- | --- |
| 0 | Central Bay Street | 50 |
| 1 | University of Toronto, Harbord | 50 |
| 2 | Queen's Park, Ontario Provincial Government | 50 |
| 3 | Richmond, Adelaide, King | 49 |
| 4 | Kensington Market, Chinatown, Grange Park | 49 |
| 5 | First Canadian Place, Underground city | 49 |
| 6 | Toronto Dominion Centre, Design Exchange | 49 |
| 7 | Garden District, Ryerson | 47 |
| 8 | St. James Town | 46 |
| 9 | Commerce Court, Victoria Hotel | 46 |


We can observe that the top five neighborhoods are in downtown Toronto, near Chinatown and the University of Toronto.
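One caveat when reading these counts: every Foursquare query in this notebook uses `limit=50`, so a neighborhood that shows 50 results may actually contain more. The sketch below counts venues per neighborhood and flags counts that hit the cap. The rows are made-up stand-ins for the query results, not real data.

```python
from collections import Counter

LIMIT = 50  # same cap as the Foursquare queries in this notebook

# Hypothetical (neighborhood, venue id) pairs standing in for the query results.
rows = [("Central Bay Street", "v%d" % i) for i in range(50)]
rows += [("Berczy Park", "b%d" % i) for i in range(21)]

# Count venues per neighborhood, most crowded first.
counts = Counter(name for name, _ in rows)
for name, n in counts.most_common():
    flag = " (capped, true count may be higher)" if n >= LIMIT else ""
    print("{}: {}{}".format(name, n, flag))

# Neighborhoods whose count reached the query limit.
saturated = [name for name, n in counts.items() if n >= LIMIT]
```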

In [20]:
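# Look up each neighborhood's distance to the city centre (in metres) from df_valid
distance_to_neigh=[]
for i in grouped_rest_by_neigh["Neighborhood"].tolist():
    dist=df_valid[df_valid["Neighborhood"]==i]["distance to centre"].iloc[0]
    distance_to_neigh.append(dist)
distance_to_neigh=distance_to_neigh[:88]
# Despite its name, this column holds the neighborhood's distance to the city centre
grouped_rest_by_neigh["distance to neighborhood"]=distance_to_neigh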

In [21]:
grouped_rest_by_neigh["distance to neighborhood/KM"]=grouped_rest_by_neigh["distance to neighborhood"]/1000
grouped_rest_by_neigh.drop(columns=["distance to neighborhood"],inplace=True)

In [22]:
grouped_rest_by_neigh[["Neighborhood","id","distance to neighborhood/KM"]]

| | Neighborhood | id | distance to neighborhood/KM |
| --- | --- | --- | --- |
| 0 | Central Bay Street | 50 | 0.820184 |
| 1 | University of Toronto, Harbord | 50 | 2.383689 |
| 2 | Queen's Park, Ontario Provincial Government | 50 | 1.552538 |
| 3 | Richmond, Adelaide, King | 49 | 0.47169 |
| 4 | Kensington Market, Chinatown, Grange Park | 49 | 1.873503 |
| 5 | First Canadian Place, Underground city | 49 | 0.831382 |
| 6 | Toronto Dominion Centre, Design Exchange | 49 | 1.0459 |
| 7 | Garden District, Ryerson | 47 | 0.82728 |
| 8 | St. James Town | 46 | 1.039831 |
| 9 | Commerce Court, Victoria Hotel | 46 | 0.97182 |


This completes the first selection process. The dataframe above shows the number of Chinese restaurants in each neighborhood and the neighborhood's distance to the city centre. Most neighborhoods with a high number of Chinese restaurants are either in Downtown Toronto or quite far from the Toronto centre, which makes sense. We rule these neighborhoods out because they could be too competitive. However, a few communities are near the Toronto centre and have fewer Chinese restaurants: **Berczy Park, The Annex, North Midtown, Yorkville and East Toronto, Harbourfront East, Union Station, Toronto Islands, Broadview North (Old East York)**. These neighborhoods have around 20 Chinese restaurants each, which suggests that Chinese food is still popular there but the market is not too competitive.

In [23]:
restaurants_in_neighborhood.head()

Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.lat,location.lng,location.labeledLatLngs,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.formattedAddress,location.crossStreet,venuePage.id,location.neighborhood,Neighborhood,Borough,Latitude,Longitude
0,4c0150f4716bc9b65b9dbb55,Spicy Chicken House,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1595306675,False,1277 York Mills Rd.,43.760639,-79.325671,"[{'label': 'display', 'lat': 43.76063939666398...",881,M3A 1Z5,CA,North York,ON,Canada,"[1277 York Mills Rd., North York ON M3A 1Z5, C...",,,,Parkwoods,North York,43.753259,-79.329656
1,4be21e1921d5a59390ec1511,Peking Express,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1595306676,False,,43.656692,-79.365126,"[{'label': 'display', 'lat': 43.6566919000341,...",451,,CA,,,Canada,[Canada],,,,"Regent Park, Harbourfront",Downtown Toronto,43.65426,-79.360636
2,4bace084f964a520ca143be3,Oriental Taste,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1595306676,False,329 Queen st,43.655304,-79.365312,"[{'label': 'display', 'lat': 43.655304, 'lng':...",394,,CA,Toronto,ON,Canada,"[329 Queen st (Parliment), Toronto ON, Canada]",Parliment,,,"Regent Park, Harbourfront",Downtown Toronto,43.65426,-79.360636
3,4bca992068f976b017d35f83,China Gourmet,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1595306676,False,235 Carlton St,43.66418,-79.368359,"[{'label': 'display', 'lat': 43.6641802410051,...",1267,M5A 2L2,CA,Toronto,ON,Canada,"[235 Carlton St (at Parliament St), Toronto ON...",at Parliament St,,,"Regent Park, Harbourfront",Downtown Toronto,43.65426,-79.360636
4,5b3aa3ec029a55002c3f0338,Bamboo Kitchen,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1595306676,False,292 Parliament st,43.65862,-79.365881,"[{'label': 'display', 'lat': 43.65862, 'lng': ...",643,M5A 3A4,CA,Toronto,ON,Canada,"[292 Parliament st, Toronto ON M5A 3A4, Canada]",,,,"Regent Park, Harbourfront",Downtown Toronto,43.65426,-79.360636


Next, we calculate the distance from each restaurant to the city centre.

In [36]:
distance_of_restaurants=[]
rest_lat=restaurants_in_neighborhood["location.lat"].tolist()
rest_long=restaurants_in_neighborhood["location.lng"].tolist()
rest_geo_to_xy=[]
for i in range(len(rest_long)):
    x,y=lonlat_to_xy(rest_long[i],rest_lat[i])
    rest_geo_to_xy.append((x,y))

In [37]:
len(rest_geo_to_xy)

1012

In [38]:
rest_X_coord=[]
rest_Y_coord=[]
for i in range(len(rest_geo_to_xy)):
    x=rest_geo_to_xy[i][0]
    y=rest_geo_to_xy[i][1]
    rest_X_coord.append(x)
    rest_Y_coord.append(y)

In [39]:
x_toronto,y_toronto=lonlat_to_xy(longitude, latitude)
rest_distance_to_centre=[]
for i in range(len(rest_geo_to_xy)):
    distance=calc_xy_distance(x_toronto, y_toronto, rest_geo_to_xy[i][0], rest_geo_to_xy[i][1])
    rest_distance_to_centre.append(distance)


In [40]:
len(rest_distance_to_centre)

1012

In [41]:
rest_distance_to_centre_km=[i/1000 for i in rest_distance_to_centre]

In [42]:
restaurants_in_neighborhood["distance to Toronto centre/KM"]=rest_distance_to_centre_km
restaurants_in_neighborhood.head(5)

Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.lat,location.lng,location.labeledLatLngs,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.formattedAddress,location.crossStreet,venuePage.id,location.neighborhood,Neighborhood,Borough,Latitude,Longitude,distance to Toronto centre/KM
0,4c0150f4716bc9b65b9dbb55,Spicy Chicken House,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1595306675,False,1277 York Mills Rd.,43.760639,-79.325671,"[{'label': 'display', 'lat': 43.76063939666398...",881,M3A 1Z5,CA,North York,ON,Canada,"[1277 York Mills Rd., North York ON M3A 1Z5, C...",,,,Parkwoods,North York,43.753259,-79.329656,18.423039
1,4be21e1921d5a59390ec1511,Peking Express,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1595306676,False,,43.656692,-79.365126,"[{'label': 'display', 'lat': 43.6566919000341,...",451,,CA,,,Canada,[Canada],,,,"Regent Park, Harbourfront",Downtown Toronto,43.65426,-79.360636,2.245587
2,4bace084f964a520ca143be3,Oriental Taste,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1595306676,False,329 Queen st,43.655304,-79.365312,"[{'label': 'display', 'lat': 43.655304, 'lng':...",394,,CA,Toronto,ON,Canada,"[329 Queen st (Parliment), Toronto ON, Canada]",Parliment,,,"Regent Park, Harbourfront",Downtown Toronto,43.65426,-79.360636,2.18404
3,4bca992068f976b017d35f83,China Gourmet,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1595306676,False,235 Carlton St,43.66418,-79.368359,"[{'label': 'display', 'lat': 43.6641802410051,...",1267,M5A 2L2,CA,Toronto,ON,Canada,"[235 Carlton St (at Parliament St), Toronto ON...",at Parliament St,,,"Regent Park, Harbourfront",Downtown Toronto,43.65426,-79.360636,2.491721
4,5b3aa3ec029a55002c3f0338,Bamboo Kitchen,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1595306676,False,292 Parliament st,43.65862,-79.365881,"[{'label': 'display', 'lat': 43.65862, 'lng': ...",643,M5A 3A4,CA,Toronto,ON,Canada,"[292 Parliament st, Toronto ON M5A 3A4, Canada]",,,,"Regent Park, Harbourfront",Downtown Toronto,43.65426,-79.360636,2.253669


In [43]:
grouped_2_neighborhood=restaurants_in_neighborhood.groupby(by="Neighborhood").mean().sort_values(by="distance to Toronto centre/KM")
grouped_2_neighborhood.reset_index(inplace=True)
grouped_2_neighborhood

| | Neighborhood | location.lat | location.lng | Latitude | Longitude | distance to Toronto centre/KM |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | Garden District, Ryerson | 43.655793 | -79.382815 | 43.657162 | -79.378937 | 0.891494 |
| 1 | Commerce Court, Victoria Hotel | 43.65077 | -79.382495 | 43.648199 | -79.379817 | 0.915078 |
| 2 | St. James Town | 43.653109 | -79.380153 | 43.651494 | -79.375418 | 0.931716 |
| 3 | Richmond, Adelaide, King | 43.653658 | -79.387705 | 43.650571 | -79.384568 | 0.968876 |
| 4 | Toronto Dominion Centre, Design Exchange | 43.651893 | -79.386897 | 43.647177 | -79.381576 | 1.0362 |
| 5 | First Canadian Place, Underground city | 43.651893 | -79.386897 | 43.648429 | -79.38228 | 1.0362 |
| 6 | Stn A PO Boxes | 43.649058 | -79.379251 | 43.646435 | -79.374846 | 1.184906 |
| 7 | Berczy Park | 43.647321 | -79.379142 | 43.644771 | -79.373306 | 1.21202 |
| 8 | Central Bay Street | 43.656292 | -79.392754 | 43.657952 | -79.387383 | 1.384786 |
| 9 | Harbourfront East, Union Station, Toronto Islands | 43.645527 | -79.383315 | 43.640816 | -79.381752 | 1.408815 |


In [44]:
grouped_count=restaurants_in_neighborhood.groupby(by="Neighborhood").count()
grouped_count.reset_index(inplace=True)
grouped_count=grouped_count[["Neighborhood","id"]]

In [45]:
num_of_chinese_rest=[]
for i in grouped_2_neighborhood["Neighborhood"].tolist():
    if i not in grouped_count["Neighborhood"].tolist():
        num=0
    else:
        num=grouped_count[grouped_count["Neighborhood"]==i].iloc[0,1]
    num_of_chinese_rest.append(num)

In [46]:
grouped_2_neighborhood["num of Chinese restaurant"]=num_of_chinese_rest
grouped_2_neighborhood

| | Neighborhood | location.lat | location.lng | Latitude | Longitude | distance to Toronto centre/KM | num of Chinese restaurant |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Garden District, Ryerson | 43.655793 | -79.382815 | 43.657162 | -79.378937 | 0.891494 | 47 |
| 1 | Commerce Court, Victoria Hotel | 43.65077 | -79.382495 | 43.648199 | -79.379817 | 0.915078 | 46 |
| 2 | St. James Town | 43.653109 | -79.380153 | 43.651494 | -79.375418 | 0.931716 | 46 |
| 3 | Richmond, Adelaide, King | 43.653658 | -79.387705 | 43.650571 | -79.384568 | 0.968876 | 49 |
| 4 | Toronto Dominion Centre, Design Exchange | 43.651893 | -79.386897 | 43.647177 | -79.381576 | 1.0362 | 49 |
| 5 | First Canadian Place, Underground city | 43.651893 | -79.386897 | 43.648429 | -79.38228 | 1.0362 | 49 |
| 6 | Stn A PO Boxes | 43.649058 | -79.379251 | 43.646435 | -79.374846 | 1.184906 | 31 |
| 7 | Berczy Park | 43.647321 | -79.379142 | 43.644771 | -79.373306 | 1.21202 | 21 |
| 8 | Central Bay Street | 43.656292 | -79.392754 | 43.657952 | -79.387383 | 1.384786 | 50 |
| 9 | Harbourfront East, Union Station, Toronto Islands | 43.645527 | -79.383315 | 43.640816 | -79.381752 | 1.408815 | 26 |


We have now completed the second selection process; the dataframe above shows the result. Most neighborhoods near the Toronto centre already have many Chinese restaurants (around 50). We observe that **Berczy Park; Harbourfront East, Union Station, Toronto Islands; Regent Park, Harbourfront; St. James Town, Cabbagetown; The Annex, North Midtown, Yorkville** are good places to open a Chinese restaurant.
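
One way to make this kind of selection explicit is to filter the table by a cutoff. The sketch below copies the ten rows shown above and keeps neighborhoods under a hypothetical threshold (`MAX_RESTAURANTS = 35` is an assumption chosen for illustration, not a value used in the notebook), so it is not meant to reproduce the list above exactly.

```python
import pandas as pd

# Values copied from the dataframe above: (neighborhood, distance in km, count).
rows = [
    ("Garden District, Ryerson", 0.891494, 47),
    ("Commerce Court, Victoria Hotel", 0.915078, 46),
    ("St. James Town", 0.931716, 46),
    ("Richmond, Adelaide, King", 0.968876, 49),
    ("Toronto Dominion Centre, Design Exchange", 1.0362, 49),
    ("First Canadian Place, Underground city", 1.0362, 49),
    ("Stn A PO Boxes", 1.184906, 31),
    ("Berczy Park", 1.21202, 21),
    ("Central Bay Street", 1.384786, 50),
    ("Harbourfront East, Union Station, Toronto Islands", 1.408815, 26),
]
df = pd.DataFrame(
    rows,
    columns=["Neighborhood", "distance to Toronto centre/KM", "num of Chinese restaurant"],
)

MAX_RESTAURANTS = 35  # hypothetical cutoff, not taken from the notebook

# Keep the less crowded neighborhoods and list the least crowded first.
less_crowded = (
    df[df["num of Chinese restaurant"] < MAX_RESTAURANTS]
    .sort_values("num of Chinese restaurant")
    .reset_index(drop=True)
)
print(less_crowded)
```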

In [48]:
import folium
from folium.plugins import HeatMap

map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)
folium.TileLayer('cartodbpositron').add_to(map_toronto) #cartodbpositron cartodbdark_matter
HeatMap(restaurants_in_neighborhood[["location.lat","location.lng"]]).add_to(map_toronto)
folium.Marker([latitude, longitude]).add_to(map_toronto)
folium.Circle([latitude, longitude], radius=1000, fill=False, color='white').add_to(map_toronto)
folium.Circle([latitude, longitude], radius=2000, fill=False, color='white').add_to(map_toronto)
folium.Circle([latitude, longitude], radius=3000, fill=False, color='white').add_to(map_toronto)
map_toronto

The heat map shows the density of Chinese restaurants in Toronto. Red areas represent high density, yellow and green areas medium density, and blue areas low density. There are many Chinese restaurants in **downtown Toronto** and around **North York Centre Station** (the centre of North York).
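
The white circles on the map mark 1, 2 and 3 km from the centre. A rough way to put numbers on what the heat map shows is to count points per ring. The sketch below is only illustrative: the centre coordinates are an assumed approximation of downtown Toronto, the haversine formula is a stand-in that may differ from the distance method used earlier in the notebook, and the sample points are the Bamboo Kitchen venue plus the mean restaurant locations of three neighborhoods from the tables in this notebook.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0
CENTRE = (43.6534817, -79.3839347)  # assumed approximation of the Toronto centre


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two (lat, lng) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def ring_label(distance_km):
    """Assign a distance to one of the rings drawn on the map."""
    if distance_km < 1:
        return "0-1 km"
    if distance_km < 2:
        return "1-2 km"
    if distance_km < 3:
        return "2-3 km"
    return "beyond 3 km"


# Sample points only; a full count would iterate over every restaurant row.
points = [
    (43.65862, -79.365881),   # Bamboo Kitchen
    (43.647321, -79.379142),  # Berczy Park (mean restaurant location)
    (43.666104, -79.40167),   # The Annex, North Midtown, Yorkville (mean)
    (43.645527, -79.383315),  # Harbourfront East, Union Station, Toronto Islands (mean)
]

ring_counts = Counter(ring_label(haversine_km(*CENTRE, lat, lng)) for lat, lng in points)
print(ring_counts)
```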

Recall the results of the first selection process: **Berczy Park; The Annex, North Midtown, Yorkville; Harbourfront East, Union Station, Toronto Islands; East Toronto, Broadview North (Old East York)**, and the results of the second process: **Berczy Park; Harbourfront East, Union Station, Toronto Islands; Regent Park, Harbourfront; St. James Town, Cabbagetown; The Annex, North Midtown, Yorkville**.

We take the intersection of those two results: **Berczy Park; The Annex, North Midtown, Yorkville; Harbourfront East, Union Station, Toronto Islands.**
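
The intersection step can be sketched with Python sets. The grouping of names into neighborhoods below is an assumption about how the two comma-separated lists above should be read (each string is treated as one neighborhood entry).

```python
# Results of the first selection process (grouping of names is assumed).
first_pass = {
    "Berczy Park",
    "The Annex, North Midtown, Yorkville",
    "Harbourfront East, Union Station, Toronto Islands",
    "East Toronto, Broadview North (Old East York)",
}

# Results of the second selection process (grouping of names is assumed).
second_pass = {
    "Berczy Park",
    "Harbourfront East, Union Station, Toronto Islands",
    "Regent Park, Harbourfront",
    "St. James Town, Cabbagetown",
    "The Annex, North Midtown, Yorkville",
}

# Keep only the neighborhoods that passed both selections.
shortlist = sorted(first_pass & second_pass)
for name in shortlist:
    print(name)
```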

The result still covers three neighborhood groups (seven named areas in total), which is too many, so we compare them in more detail to narrow the choice down.

In [58]:
my_neighborhood=["Berczy Park","The Annex, North Midtown, Yorkville","Harbourfront East, Union Station, Toronto Islands"]
# Select the shortlisted rows, keeping the order given in my_neighborhood
my_restaurant_df=pd.concat([grouped_2_neighborhood[grouped_2_neighborhood["Neighborhood"]==i] for i in my_neighborhood])
my_restaurant_df

Unnamed: 0,Neighborhood,location.lat,location.lng,Latitude,Longitude,distance to Toronto centre/KM,num of Chinese restaurant
7,Berczy Park,43.647321,-79.379142,43.644771,-79.373306,1.21202,21
17,"The Annex, North Midtown, Yorkville",43.666104,-79.40167,43.67271,-79.405678,2.969228,15
9,"Harbourfront East, Union Station, Toronto Islands",43.645527,-79.383315,43.640816,-79.381752,1.408815,26


In [61]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)
folium.Marker([latitude, longitude], popup='Toronto Centre').add_to(map_toronto)
folium.Marker([43.644771,-79.373306], popup='Berczy Park').add_to(map_toronto)
folium.Marker([43.672710,-79.405678], popup='The Annex, North Midtown, Yorkville').add_to(map_toronto)
folium.Marker([43.640816,-79.381752], popup='Harbourfront East, Union Station, Toronto Islands').add_to(map_toronto)
map_toronto

From the map and the dataframe we observe that Berczy Park and Harbourfront East, Union Station, Toronto Islands are broadly similar in both the number of Chinese restaurants and the distance to the city centre. Of the two, Berczy Park is the better option: it is closer (1.21 km versus 1.41 km) and has fewer Chinese restaurants (21 versus 26). The Annex, North Midtown, Yorkville is more than twice as far from the centre (2.97 km), but it faces less competition (15 Chinese restaurants).
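
The comparison above can be checked with a few lines of arithmetic. This is a small sketch using the distances and counts from the dataframe; the derived field names (`distance_ratio`, `extra_restaurants`) are illustrative and not part of the notebook.

```python
# (distance to centre in km, number of Chinese restaurants) from the dataframe above.
candidates = {
    "Berczy Park": (1.21202, 21),
    "Harbourfront East, Union Station, Toronto Islands": (1.408815, 26),
    "The Annex, North Midtown, Yorkville": (2.969228, 15),
}

closest = min(distance for distance, _ in candidates.values())
fewest = min(count for _, count in candidates.values())

# Express each candidate relative to the closest and the least crowded one.
comparison = {
    name: {
        "distance_ratio": distance / closest,
        "extra_restaurants": count - fewest,
    }
    for name, (distance, count) in candidates.items()
}

for name, stats in comparison.items():
    print(f"{name}: {stats['distance_ratio']:.2f}x the closest distance, "
          f"{stats['extra_restaurants']} more restaurants than the least crowded")
```

Keeping the two measures separate, rather than combining them into one score, avoids having to choose a weighting that the notebook does not define.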

## Results and Discussion

Our analysis shows that there are many Chinese restaurants in the city of Toronto (around one thousand, not including cities in York Region and Peel Region), and most of them are located downtown. To meet our criteria, we have to find a neighborhood in downtown Toronto or somewhere near it. After combining the two selection methods we are left with three neighborhoods: **Berczy Park; The Annex, North Midtown, Yorkville; Harbourfront East, Union Station, Toronto Islands**. To narrow the result down further, we compared the three neighborhoods and marked them on the map. We finally selected The Annex, North Midtown, Yorkville because it is fairly close to the Toronto centre (about 3 km) and its market is less competitive than that of the nearby neighborhoods.

The neighborhood we found is optimal under our selection criteria, but that does not mean it is the best place overall. Choosing a community in which to start a business involves more than these criteria: rent, population density and target customers should also be considered. For example, Yorkville is one of the areas with the highest rents and expenses. These factors are not included in this project, so we can only say that The Annex, North Midtown, Yorkville is the best choice under the requirements we set at the start.

## Conclusion

The purpose of this project is to find the best neighborhood in which to start a business, based on the number of Chinese restaurants nearby and the distance to the city centre. By calculating the distance with two methods, counting the Chinese restaurants in each neighborhood and narrowing down the candidates, we chose The Annex, North Midtown, Yorkville. Other factors still have to be taken into account, such as rent, population density, target customers, and proximity to bus or subway stations.