## The Battle of Neighborhoods

## 1. Introduction
This notebook addresses the question of where to open a restaurant in Toronto, Canada.  
The target audience is restaurateurs and investors.  
Their question can be summarized as: in which neighbourhood should the restaurant be located to attract as many customers as possible?

## 2. Data Source
1) Neighbourhood information, grouped by postal code, is scraped from a Wikipedia page and then cleaned.  
2) The geographical coordinates of each neighbourhood are read from the .csv file provided in last week's assignment.  
3) Trending venues around each neighbourhood are retrieved from Foursquare.

## 3. Methodology
1) To address the problem of attracting as many customers as possible, we define a parameter "Score" that measures how geographically convenient a restaurant's location is for its target customers.
<center>**Score = sum (D × QD)**</center>
<center>**D--distance**</center>  
<center>**QD--quantity demanded**</center>  
2) For a candidate neighbourhood, Score is the sum, over all other neighbourhoods, of the distance to that neighbourhood multiplied by the quantity demanded there for this type of restaurant. The lower the value, the easier the restaurant is for its customers to reach.  
3) The distance is calculated from the geographical coordinates of each neighbourhood.  
4) The quantity demanded is represented by trending venues: the more frequently a type of restaurant appears in a neighbourhood, the greater its demand there.  
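
As a minimal sketch of the definition above, the Score of every candidate neighbourhood can be computed as a distance-weighted sum of demand. The three-neighbourhood distance matrix and the demand vector below are made-up values for illustration only, not data from this notebook.

```python
import numpy as np

# Hypothetical pairwise distances between three neighbourhoods A, B, C
# (row i, column j = distance from i to j); illustrative values only.
D = np.array([
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.0],
    [2.0, 1.0, 0.0],
])

# Hypothetical quantity demanded (QD) in each neighbourhood.
QD = np.array([0.2, 0.1, 0.4])

# Score_i = sum_j D[i, j] * QD[j]; one matrix-vector product covers every i.
score = D @ QD

# The candidate with the smallest Score is the easiest to reach for the demand.
best = int(np.argmin(score))
print(score, best)
```

In this toy case the third neighbourhood wins because most of the demand sits there, so its distance-weighted sum is the smallest.
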
#### Now, data analysis starts!

### 3.1 geographical data of each neighbourhood

In [2]:
# import libraries
import requests
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup

#### 3.1.1 Scrape page

In [3]:
# connect to page and get html content
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
doc=requests.get(url)

#### 3.1.2 Convert to BeautifulSoup Object and get table
Use BeautifulSoup to locate the table on the Wikipedia page, then load it into a pandas dataframe.

In [4]:
# convert to BeautifulSoup Object
html_content=BeautifulSoup(doc.content,'lxml')
#print(html_content.prettify())

In [5]:
# get table and transform it into pandas dataframe
table=html_content.find_all('table')[0]
df=pd.read_html(str(table))[0]
#df

#### 3.1.3 Data wrangling

In [6]:
# The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood
df.columns=['PostalCode','Borough','Neighborhood']
df.drop(0,inplace=True)
#df

In [7]:
# Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned
# Replace "Not assigned" with NaN in column "Borough"
# (work on a copy and assign the result back instead of replacing in place on a column)
df_clean=df.copy()
df_clean['Borough']=df_clean['Borough'].replace("Not assigned", np.nan)
#df_clean.head()

In [8]:
# Drop rows with value "NaN"
df_clean.dropna(inplace=True)
#df_clean.head()

In [9]:
# If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.
not_assigned=df_clean['Neighborhood']=="Not assigned"
df_clean['Neighborhood']=df_clean['Neighborhood'].mask(not_assigned, df_clean['Borough'])
#df_clean

In [10]:
# merge the Neighbourhood with the same Postcode
df_group=df_clean.groupby(['PostalCode','Borough']).aggregate(lambda x:', '.join(x))
df_group=df_group.reset_index()
#df_group
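
To make the grouping step concrete, here is a small sketch on a toy table. The three rows are illustrative stand-ins for the scraped data, and `toy` / `toy_group` are names introduced only for this example.

```python
import pandas as pd

# Toy rows standing in for the scraped table (illustrative values).
toy = pd.DataFrame({
    'PostalCode': ['M5A', 'M5A', 'M6A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighborhood': ['Harbourfront', 'Regent Park', 'Lawrence Heights'],
})

# Join the neighbourhood names that share a postal code and borough
# into one comma-separated string per group.
toy_group = (
    toy.groupby(['PostalCode', 'Borough'])['Neighborhood']
       .agg(', '.join)
       .reset_index()
)
print(toy_group)
```

The two `M5A` rows collapse into a single row whose `Neighborhood` value is `Harbourfront, Regent Park`.
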

In [11]:
#  the number of rows of df_group
df_group.shape[0]

103

#### 3.1.4 get the latitude and the longitude coordinates of each neighborhood

In [12]:
lat_lng=pd.read_csv('http://cocl.us/Geospatial_data')
#lat_lng

In [13]:
# merge neighbourhood with geospatial data by the key Postal Code
df_lat_lng=df_group.copy()

# inner join two tables
df_lat_lng=pd.merge(df_lat_lng, lat_lng, how='inner', on=None, left_on='PostalCode', right_on='Postal Code')
# delete duplicate column 
df_lat_lng.drop(['Postal Code'],axis=1,inplace=True)
#df_lat_lng

#### 3.1.5 refine the scope to Toronto

In [14]:
# segment and cluster only the neighborhoods in Toronto
# drop rows that are irrelevant to Toronto
Toronto_data=df_lat_lng.copy()
Toronto_data = Toronto_data[Toronto_data['Borough'].str.contains('Toronto')].reset_index(drop=True)
Toronto_data

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
2,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879
5,M4P,Central Toronto,Davisville North,43.712751,-79.390197
6,M4R,Central Toronto,North Toronto West,43.715383,-79.405678
7,M4S,Central Toronto,Davisville,43.704324,-79.38879
8,M4T,Central Toronto,"Moore Park, Summerhill East",43.689574,-79.38316
9,M4V,Central Toronto,"Deer Park, Forest Hill SE, Rathnelly, South Hi...",43.686412,-79.400049


### 3.2 get trending venues of each neighbourhood

Get up to 15 nearby venues within 500 meters of each neighbourhood.

In [19]:
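import os

# Foursquare credentials are read from environment variables instead of being
# hard-coded; the variable names below are placeholders chosen for this notebook.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
VERSION = '20180605'  # Foursquare API version
LIMIT = 15            # maximum number of venues returned per neighbourhood

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        # print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)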
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [21]:
Toronto_venues = getNearbyVenues(names=Toronto_data['Neighborhood'],
                                   latitudes=Toronto_data['Latitude'],
                                   longitudes=Toronto_data['Longitude']
                                  )
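
The list comprehension in `getNearbyVenues` relies on the nested layout `response` → `groups` → `items` → `venue` of the parsed JSON. The sketch below walks a hand-written dictionary with that same nesting; the venues are invented for illustration and no request is sent, so it only shows how the fields are pulled out, not what Foursquare actually returns.

```python
import pandas as pd

# Hand-written stand-in for the parsed JSON. It mirrors the keys accessed in
# getNearbyVenues, but the venues themselves are made up for illustration.
mock_json = {
    'response': {'groups': [{'items': [
        {'venue': {'name': 'Example Cafe',
                   'location': {'lat': 43.66, 'lng': -79.40},
                   'categories': [{'name': 'Café'}]}},
        {'venue': {'name': 'Example Park',
                   'location': {'lat': 43.67, 'lng': -79.39},
                   'categories': [{'name': 'Park'}]}},
    ]}]}
}

# Same access path as in getNearbyVenues.
items = mock_json['response']['groups'][0]['items']

# One tuple per venue: name, coordinates, and the first listed category.
rows = [(v['venue']['name'],
         v['venue']['location']['lat'],
         v['venue']['location']['lng'],
         v['venue']['categories'][0]['name']) for v in items]

venues = pd.DataFrame(rows, columns=['Venue', 'Venue Latitude',
                                     'Venue Longitude', 'Venue Category'])
print(venues)
```
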

In [22]:
print(Toronto_venues.shape)
Toronto_venues.head()
Toronto_venues.to_csv('Toronto_venues1.csv')

(492, 7)


In [23]:
Toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Adelaide, King, Richmond",15,15,15,15,15,15
Berczy Park,15,15,15,15,15,15
"Brockton, Exhibition Place, Parkdale Village",15,15,15,15,15,15
Business reply mail Processing Centre969 Eastern,15,15,15,15,15,15
"CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara",14,14,14,14,14,14
"Cabbagetown, St. James Town",15,15,15,15,15,15
Central Bay Street,15,15,15,15,15,15
"Chinatown, Grange Park, Kensington Market",15,15,15,15,15,15
Christie,15,15,15,15,15,15
Church and Wellesley,15,15,15,15,15,15


In [24]:
print('There are {} unique categories.'.format(len(Toronto_venues['Venue Category'].unique())))

There are 148 unique categories.


In [25]:
# one hot encoding
Toronto_onehot = pd.get_dummies(Toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Toronto_onehot['Neighborhood'] = Toronto_venues['Neighborhood'] 

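# move neighborhood column to the first column
# (select it by name: it is not guaranteed to be the last column, e.g. when a venue category is itself called 'Neighborhood')
fixed_columns = ['Neighborhood'] + [col for col in Toronto_onehot.columns if col != 'Neighborhood']
Toronto_onehot = Toronto_onehot[fixed_columns]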

Toronto_onehot.head()

Unnamed: 0,Yoga Studio,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Art Gallery,...,Swim School,Taco Place,Tea Room,Tennis Court,Thai Restaurant,Theater,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [26]:
Toronto_onehot.shape

(492, 148)

In [27]:
# get appearance frequency of each type of venues in each neighbourhood, and use it to represent the QD
Toronto_grouped = Toronto_onehot.groupby('Neighborhood').mean().reset_index()
Toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Swim School,Taco Place,Tea Room,Tennis Court,Thai Restaurant,Theater,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.066667,0.0,0.066667,0.0,0.0,0.0,0.0,0.0
2,"Brockton, Exhibition Place, Parkdale Village",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Business reply mail Processing Centre969 Eastern,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",0.0,0.071429,0.071429,0.071429,0.142857,0.142857,0.142857,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Cabbagetown, St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"Chinatown, Grange Park, Kensington Market",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667
8,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Church and Wellesley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0
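
The one-hot encoding followed by a per-neighbourhood mean turns raw venue rows into category frequencies. A minimal sketch on toy data shows the effect; `toy_venues`, `onehot` and `freq` are names introduced only for this example.

```python
import pandas as pd

# Six made-up venues in two neighbourhoods.
toy_venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Coffee Shop', 'Coffee Shop', 'Park', 'Pub',
                       'Park', 'Park'],
})

# One indicator column per category (1.0 if the venue has that category).
onehot = pd.get_dummies(toy_venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhood', toy_venues['Neighborhood'])

# The mean of an indicator column is the share of venues in that category.
freq = onehot.groupby('Neighborhood').mean().reset_index()
print(freq)
```

Neighbourhood `A` has two coffee shops among four venues, so its `Coffee Shop` frequency is 0.5. Each row of `freq` sums to 1 across the category columns.
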


In [158]:
# find the type that appears most frequently
# (sum the per-neighbourhood frequencies of each venue type)
Toronto_grouped_count=Toronto_grouped.copy()
Toronto_grouped_count=Toronto_grouped_count.drop(columns='Neighborhood').sum(axis=0)
Toronto_grouped_count.sort_values(ascending=False)

Park                             2.308333
Coffee Shop                      2.204762
Café                             2.066667
Restaurant                       1.188095
Pub                              1.066667
Italian Restaurant               0.933333
Ice Cream Shop                   0.833333
Bakery                           0.800000
Pizza Place                      0.791667
Gastropub                        0.733333
Sushi Restaurant                 0.716667
Breakfast Spot                   0.667857
Garden                           0.566667
Bar                              0.538095
Sandwich Place                   0.525000
Playground                       0.500000
Trail                            0.500000
Steakhouse                       0.466667
Japanese Restaurant              0.466667
French Restaurant                0.466667
Burger Joint                     0.463095
Greek Restaurant                 0.400000
Farmers Market                   0.400000
Brewery                          0

In [36]:
Toronto_grouped.shape

(38, 148)

According to the data above, Park, Coffee Shop and Café are the top 3 trending venue categories. Park is not a food venue, which makes Coffee Shop the most frequent food-related category.  
**So, opening a Coffee Shop may be a good choice.**

In [41]:
#num_top_venues = 15

#for hood in Toronto_grouped['Neighborhood']:
#    print("----"+hood+"----")
#    temp = Toronto_grouped[Toronto_grouped['Neighborhood'] == hood].T.reset_index()
#    temp.columns = ['venue','freq']
#    temp = temp.iloc[1:]
#    temp['freq'] = temp['freq'].astype(float)
#    temp = temp.round({'freq': 2})
#    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
#    print('\n')

In [39]:
#def return_most_common_venues(row, num_top_venues):
#    row_categories = row.iloc[1:]
#    row_categories_sorted = row_categories.sort_values(ascending=False)
    
#    return row_categories_sorted.index.values[0:num_top_venues]

### 3.3 calculate Score of each neighbourhood

#### 3.3.1 calculate the distance between each pair of neighbourhoods and normalize it

In [66]:
# get coordinates
coordinates=Toronto_data[['Latitude','Longitude']]
coordinates=coordinates.values
coordinates

array([[ 43.6763574, -79.2930312],
       [ 43.6795571, -79.352188 ],
       [ 43.6689985, -79.3155716],
       [ 43.6595255, -79.340923 ],
       [ 43.7280205, -79.3887901],
       [ 43.7127511, -79.3901975],
       [ 43.7153834, -79.4056784],
       [ 43.7043244, -79.3887901],
       [ 43.6895743, -79.3831599],
       [ 43.6864123, -79.4000493],
       [ 43.6795626, -79.3775294],
       [ 43.667967 , -79.3676753],
       [ 43.6658599, -79.3831599],
       [ 43.6542599, -79.3606359],
       [ 43.6571618, -79.3789371],
       [ 43.6514939, -79.3754179],
       [ 43.6447708, -79.3733064],
       [ 43.6579524, -79.3873826],
       [ 43.6505712, -79.3845675],
       [ 43.6408157, -79.3817523],
       [ 43.6471768, -79.3815764],
       [ 43.6481985, -79.3798169],
       [ 43.7116948, -79.4169356],
       [ 43.6969476, -79.4113072],
       [ 43.6727097, -79.4056784],
       [ 43.6626956, -79.4000493],
       [ 43.6532057, -79.4000493],
       [ 43.6289467, -79.3944199],
       [ 43.6464352,

In [73]:
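# calculate the squared Euclidean distance between each pair of neighbourhoods
# (in degrees of latitude/longitude), using |a-b|^2 = |a|^2 + |b|^2 - 2 a.b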
dist = np.reshape(np.sum(coordinates**2,axis=1),(coordinates.shape[0],1))+ np.sum(coordinates**2,axis=1)-2*coordinates.dot(coordinates.T)
dist

array([[ 0.00000000e+00,  3.50976507e-03,  5.62223046e-04, ...,
         2.74154991e-02,  3.72555042e-02,  9.99105698e-04],
       [ 3.50976507e-03,  0.00000000e+00,  1.45224479e-03, ...,
         1.17807157e-02,  1.82764544e-02,  1.22088059e-03],
       [ 5.62223046e-04,  1.45224479e-03,  0.00000000e+00, ...,
         2.02130731e-02,  2.88236119e-02,  7.49570063e-05],
       ...,
       [ 2.74154991e-02,  1.17807157e-02,  2.02130731e-02, ...,
        -3.63797881e-12,  7.97826797e-04,  1.83521485e-02],
       [ 3.72555042e-02,  1.82764544e-02,  2.88236119e-02, ...,
         7.97826797e-04,  0.00000000e+00,  2.66586137e-02],
       [ 9.99105698e-04,  1.22088059e-03,  7.49570063e-05, ...,
         1.83521485e-02,  2.66586137e-02,  0.00000000e+00]])
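
The expression above evaluates the identity |a-b|² = |a|² + |b|² - 2 a·b for all pairs at once, so `dist` holds squared Euclidean distances in coordinate units rather than distances in metres. The sketch below checks the identity against a brute-force loop on three made-up points. It also shows one way to clip the tiny negative values that floating-point rounding can leave on the diagonal (such as the `-3.63797881e-12` in the output) before taking a square root. The names `pts`, `sq_dist` and `dist_true` are introduced only for this example.

```python
import numpy as np

# Three made-up 2-D points (not neighbourhood coordinates).
pts = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])

# Vectorized squared distances via |a-b|^2 = |a|^2 + |b|^2 - 2 a.b
sq_norms = np.sum(pts**2, axis=1)
sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2 * pts @ pts.T

# Brute-force reference for comparison.
ref = np.array([[np.sum((a - b)**2) for b in pts] for a in pts])

# Rounding can leave tiny negative entries; clip them before the square root.
dist_true = np.sqrt(np.maximum(sq_dist, 0.0))
print(sq_dist)
print(dist_true)
```

Because squaring is monotonic, ranking by squared distance gives the same nearest and farthest pairs as ranking by distance, but squared values weight far-away neighbourhoods more heavily inside a sum.
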

In [94]:
distance=pd.DataFrame(dist)
distance

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,28,29,30,31,32,33,34,35,36,37
0,0.0,0.00351,0.000562,0.002577,0.01183884,0.010766,0.014212,0.009952,0.008298,0.011554,...,0.007589,0.008745,0.016825,0.022323,0.016866,0.019829,0.029709,0.0274155,0.037256,0.000999
1,0.00351,0.0,0.001452,0.000528,0.003688415,0.002547,0.004145,0.001953,0.00106,0.002338,...,0.001610445,0.001874,0.005053,0.008224,0.005565,0.007601,0.012995,0.01178072,0.018276,0.001221
2,0.000562,0.001452,0.0,0.000732,0.008844545,0.007483,0.010271,0.006609,0.004992,0.00744,...,0.004022557,0.004873,0.011448,0.01605,0.011297,0.013717,0.022313,0.02021307,0.028824,7.5e-05
3,0.002577,0.000528,0.000732,0.0,0.006982824,0.005261,0.007313,0.004298,0.002687,0.004219,...,0.001322126,0.001834,0.006766,0.010359,0.006348,0.00813,0.015341,0.01342926,0.020663,0.000385
4,0.011839,0.003688,0.008845,0.006983,3.637979e-12,0.000235,0.000445,0.000562,0.00151,0.001858,...,0.006850599,0.006377,0.00456,0.006342,0.007374,0.009865,0.010183,0.01081157,0.014995,0.008781
5,0.010766,0.002547,0.007483,0.005261,0.0002351354,0.0,0.000247,7.3e-05,0.000587,0.000791,...,0.004633467,0.0042,0.002915,0.004624,0.005076,0.007205,0.008176,0.008442189,0.012627,0.007212
6,0.014212,0.004145,0.010271,0.007313,0.000444911,0.000247,0.0,0.000408,0.001173,0.000871,...,0.005704491,0.00503,0.002387,0.003489,0.004748,0.006675,0.006383,0.006977186,0.010277,0.009847
7,0.009952,0.001953,0.006609,0.004298,0.0005615052,7.3e-05,0.000408,0.0,0.000249,0.000448,...,0.003545597,0.003167,0.00235,0.004106,0.004139,0.006106,0.007597,0.007626213,0.011934,0.006249
8,0.008298,0.00106,0.004992,0.002687,0.001509809,0.000587,0.001173,0.000249,0.0,0.000295,...,0.001930103,0.001694,0.001954,0.003916,0.003073,0.004808,0.007441,0.007002678,0.011704,0.004515
9,0.011554,0.002338,0.00744,0.004219,0.001858012,0.000791,0.000871,0.000448,0.000295,0.0,...,0.002233375,0.001758,0.000792,0.002085,0.001869,0.003249,0.004803,0.004569652,0.008337,0.006721


In [95]:
distance.index=Toronto_data['Neighborhood']
distance.columns=Toronto_data['Neighborhood']
#distance.reset_index(drop=True,inplace=True)
#distance.index=Toronto_data['Neighborhood']
distance

Neighborhood,The Beaches,"The Danforth West, Riverdale","The Beaches West, India Bazaar",Studio District,Lawrence Park,Davisville North,North Toronto West,Davisville,"Moore Park, Summerhill East","Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West",...,Stn A PO Boxes 25 The Esplanade,"First Canadian Place, Underground city",Christie,"Dovercourt Village, Dufferin","Little Portugal, Trinity","Brockton, Exhibition Place, Parkdale Village","High Park, The Junction South","Parkdale, Roncesvalles","Runnymede, Swansea",Business reply mail Processing Centre969 Eastern
The Beaches,0.0,0.00351,0.000562,0.002577,0.01183884,0.010766,0.014212,0.009952,0.008298,0.011554,...,0.007589,0.008745,0.016825,0.022323,0.016866,0.019829,0.029709,0.0274155,0.037256,0.000999
"The Danforth West, Riverdale",0.00351,0.0,0.001452,0.000528,0.003688415,0.002547,0.004145,0.001953,0.00106,0.002338,...,0.001610445,0.001874,0.005053,0.008224,0.005565,0.007601,0.012995,0.01178072,0.018276,0.001221
"The Beaches West, India Bazaar",0.000562,0.001452,0.0,0.000732,0.008844545,0.007483,0.010271,0.006609,0.004992,0.00744,...,0.004022557,0.004873,0.011448,0.01605,0.011297,0.013717,0.022313,0.02021307,0.028824,7.5e-05
Studio District,0.002577,0.000528,0.000732,0.0,0.006982824,0.005261,0.007313,0.004298,0.002687,0.004219,...,0.001322126,0.001834,0.006766,0.010359,0.006348,0.00813,0.015341,0.01342926,0.020663,0.000385
Lawrence Park,0.011839,0.003688,0.008845,0.006983,3.637979e-12,0.000235,0.000445,0.000562,0.00151,0.001858,...,0.006850599,0.006377,0.00456,0.006342,0.007374,0.009865,0.010183,0.01081157,0.014995,0.008781
Davisville North,0.010766,0.002547,0.007483,0.005261,0.0002351354,0.0,0.000247,7.3e-05,0.000587,0.000791,...,0.004633467,0.0042,0.002915,0.004624,0.005076,0.007205,0.008176,0.008442189,0.012627,0.007212
North Toronto West,0.014212,0.004145,0.010271,0.007313,0.000444911,0.000247,0.0,0.000408,0.001173,0.000871,...,0.005704491,0.00503,0.002387,0.003489,0.004748,0.006675,0.006383,0.006977186,0.010277,0.009847
Davisville,0.009952,0.001953,0.006609,0.004298,0.0005615052,7.3e-05,0.000408,0.0,0.000249,0.000448,...,0.003545597,0.003167,0.00235,0.004106,0.004139,0.006106,0.007597,0.007626213,0.011934,0.006249
"Moore Park, Summerhill East",0.008298,0.00106,0.004992,0.002687,0.001509809,0.000587,0.001173,0.000249,0.0,0.000295,...,0.001930103,0.001694,0.001954,0.003916,0.003073,0.004808,0.007441,0.007002678,0.011704,0.004515
"Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West",0.011554,0.002338,0.00744,0.004219,0.001858012,0.000791,0.000871,0.000448,0.000295,0.0,...,0.002233375,0.001758,0.000792,0.002085,0.001869,0.003249,0.004803,0.004569652,0.008337,0.006721


In [126]:
# get maximal distance
dmax=distance.max()
dmax=dmax.max()
dmax

0.037255504161294084

In [127]:
# use maximum to normalize the distance
distance=distance/dmax
distance

Neighborhood,The Beaches,"The Danforth West, Riverdale","The Beaches West, India Bazaar",Studio District,Lawrence Park,Davisville North,North Toronto West,Davisville,"Moore Park, Summerhill East","Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West",...,Stn A PO Boxes 25 The Esplanade,"First Canadian Place, Underground city",Christie,"Dovercourt Village, Dufferin","Little Portugal, Trinity","Brockton, Exhibition Place, Parkdale Village","High Park, The Junction South","Parkdale, Roncesvalles","Runnymede, Swansea",Business reply mail Processing Centre969 Eastern
The Beaches,0.0,0.094208,0.015091,0.069169,0.3177743,0.288972,0.381485,0.267126,0.222729,0.310128,...,0.2037014,0.23474,0.451614,0.599189,0.452708,0.532253,0.797451,0.7358778,1.0,0.026818
"The Danforth West, Riverdale",0.094208,0.0,0.038981,0.014177,0.09900322,0.068354,0.111252,0.052425,0.028442,0.062748,...,0.04322704,0.050314,0.135632,0.220751,0.149376,0.204014,0.348817,0.3162141,0.490571,0.03277
"The Beaches West, India Bazaar",0.015091,0.038981,0.0,0.01966,0.2374024,0.200865,0.275685,0.177393,0.133981,0.199695,...,0.1079722,0.130803,0.307273,0.430803,0.303233,0.368185,0.598912,0.5425527,0.773674,0.002012
Studio District,0.069169,0.014177,0.01966,0.0,0.1874307,0.141212,0.196303,0.115371,0.072121,0.11324,...,0.03548807,0.049215,0.181598,0.27805,0.170396,0.218225,0.411772,0.3604637,0.554636,0.010344
Lawrence Park,0.317774,0.099003,0.237402,0.187431,9.764943e-11,0.006311,0.011942,0.015072,0.040526,0.049872,...,0.1838815,0.171173,0.122409,0.170224,0.197917,0.264794,0.273316,0.2902007,0.402501,0.235702
Davisville North,0.288972,0.068354,0.200865,0.141212,0.006311426,0.0,0.006619,0.001959,0.015748,0.021226,...,0.12437,0.112735,0.078233,0.12412,0.136236,0.193393,0.219448,0.2266025,0.338918,0.193585
North Toronto West,0.381485,0.111252,0.275685,0.196303,0.01194215,0.006619,0.0,0.010938,0.03149,0.023379,...,0.1531181,0.135023,0.064059,0.093654,0.127455,0.179162,0.171325,0.1872793,0.275852,0.264314
Davisville,0.267126,0.052425,0.177393,0.115371,0.01507174,0.001959,0.010938,0.0,0.006691,0.012015,...,0.09516976,0.084998,0.063091,0.110223,0.111103,0.163885,0.203905,0.2047003,0.320322,0.167736
"Moore Park, Summerhill East",0.222729,0.028442,0.133981,0.072121,0.04052581,0.015748,0.03149,0.006691,0.0,0.007925,...,0.05180719,0.045462,0.052447,0.105107,0.082493,0.129054,0.199735,0.1879636,0.314153,0.121181
"Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West",0.310128,0.062748,0.199695,0.11324,0.04987214,0.021226,0.023379,0.012015,0.007925,0.0,...,0.05994751,0.0472,0.021245,0.055957,0.050174,0.0872,0.128924,0.1226571,0.22379,0.180405


In [148]:
#distance.iloc[0,1]
#distance.loc['The Beaches','The Danforth West, Riverdale']

In [132]:
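# calculate Score
# NOTE: Toronto_grouped is sorted by neighbourhood name (groupby output), while
# distance follows the row order of Toronto_data, so the positional lookup
# below assumes that row j of Toronto_grouped is the same neighbourhood as column j of distance
Toronto_data['Score']=None
for i in range(0,distance.shape[0]):
    score=0
    for j in range(0,distance.shape[1]):
        # Score = sum ( D * QD )
        score+=distance.iloc[i,j]*Toronto_grouped.loc[j,'Coffee Shop']
    Toronto_data.loc[i,'Score']=score
Toronto_data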

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Score
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,0.755768
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,0.259955
2,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572,0.510239
3,M4M,East Toronto,Studio District,43.659526,-79.340923,0.3144
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,0.375779
5,M4P,Central Toronto,Davisville North,43.712751,-79.390197,0.276324
6,M4R,Central Toronto,North Toronto West,43.715383,-79.405678,0.298436
7,M4S,Central Toronto,Davisville,43.704324,-79.38879,0.234442
8,M4T,Central Toronto,"Moore Park, Summerhill East",43.689574,-79.38316,0.185425
9,M4V,Central Toronto,"Deer Park, Forest Hill SE, Rathnelly, South Hi...",43.686412,-79.400049,0.172161
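
The loop above pairs column `j` of `distance` with row `j` of `Toronto_grouped` by position. Since a `groupby` result is ordered by neighbourhood name, a label-based pairing is a safer variant. The following is a sketch of that idea on toy data, not a rerun of the notebook: the names `A`, `B`, `C` and all values are made up, and neighbourhoods missing from the demand table are assumed to have zero demand.

```python
import pandas as pd

names = ['A', 'B', 'C']  # order used by the distance matrix

# Toy normalized distance matrix, labelled on both axes.
distance = pd.DataFrame(
    [[0.0, 0.5, 1.0],
     [0.5, 0.0, 0.5],
     [1.0, 0.5, 0.0]],
    index=names, columns=names)

# Toy demand table listed in a different order than the distance matrix.
grouped = pd.DataFrame({
    'Neighborhood': ['C', 'A', 'B'],
    'Coffee Shop': [0.4, 0.2, 0.0],
})

# Align demand to the distance columns by neighbourhood name.
qd = (grouped.set_index('Neighborhood')['Coffee Shop']
             .reindex(distance.columns)
             .fillna(0.0))

# Score_i = sum_j distance[i, j] * qd[j], matched on labels.
score = distance.dot(qd)
print(score.sort_values())
```

Here `C` gets the lowest Score because most of the toy demand is located in `C` itself.
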


In [147]:
# sort the neighbourhood by Score
Score=Toronto_data.copy()
Score.sort_values(by =['Score'],axis = 0,ascending = True)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Score
25,M5S,Downtown Toronto,"Harbord, University of Toronto",43.662696,-79.400049,0.147527
17,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,0.151137
12,M4Y,Downtown Toronto,Church and Wellesley,43.66586,-79.38316,0.151916
24,M5R,Central Toronto,"The Annex, North Midtown, Yorkville",43.67271,-79.405678,0.155659
26,M5T,Downtown Toronto,"Chinatown, Grange Park, Kensington Market",43.653206,-79.400049,0.15632
14,M5B,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937,0.162824
18,M5H,Downtown Toronto,"Adelaide, King, Richmond",43.650571,-79.384568,0.1639
29,M5X,Downtown Toronto,"First Canadian Place, Underground city",43.648429,-79.38228,0.170912
9,M4V,Central Toronto,"Deer Park, Forest Hill SE, Rathnelly, South Hi...",43.686412,-79.400049,0.172161
10,M4W,Downtown Toronto,Rosedale,43.679563,-79.377529,0.172314


The neighbourhood **'Harbord, University of Toronto'** has the smallest Score!

### 3.4 display the location in map

In [153]:
# import libraries
import folium

In [154]:
from geopy.geocoders import Nominatim
# get the geographical coordinates of Toronto
address = 'Toronto, CA'

# recent geopy versions require an explicit user_agent; the name used here is arbitrary
geolocator = Nominatim(user_agent='toronto_explorer')
location = geolocator.geocode(address)
Toronto_lat = location.latitude
Toronto_lng= location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(Toronto_lat, Toronto_lng))

The geographical coordinates of Toronto are 43.653963, -79.387207.


In [156]:
# visualize Toronto and the neighborhoods in it

# create map of Toronto using latitude and longitude values
map_Toronto = folium.Map(location=[Toronto_lat, Toronto_lng], zoom_start=12)

# add markers to map
for lat, lng, label in zip(Toronto_data['Latitude'], Toronto_data['Longitude'], Toronto_data['Neighborhood']):
    if label=='Harbord, University of Toronto':
        cc='red'
    else:
        cc='blue'
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=cc,
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Toronto)  
    
map_Toronto

In [157]:
map_Toronto.save('map_Toronto.html')

## 4. Results

1) Coffee Shop is among the most frequent venue categories.  
2) The neighbourhood 'Harbord, University of Toronto' has the lowest value (0.147527) of the 'Score' parameter we defined.   

## 5. Discussion

1) Coffee Shop is the most frequent food-related venue category, which we take as a sign of higher quantity demanded.   
2) For a coffee shop to be convenient for as many of its potential customers as possible, the value of 'Score' should be as low as possible. We therefore choose the neighbourhood 'Harbord, University of Toronto', which has the lowest value (0.147527).  
3) The map also shows that 'Harbord, University of Toronto' lies near the center of Toronto, which makes it easy to reach from the other neighbourhoods.

## 6. Conclusion

Based on this analysis, we recommend opening a coffee shop in the neighbourhood 'Harbord, University of Toronto'.