# Capstone Project - The Battle of Neighborhoods

## Table of Contents

* Introduction & Business Problem
* Data
* Analysis
* Results and Discussion
* Conclusion

### Introduction & Business Problem

#### Introduction
Hong Kong is one of the most densely populated places in the world, with an overall density of some 6,300 people per square kilometre. The territory has become one of the world's most significant financial centres and commercial ports. Ethnically, Hong Kong consists mainly of Han Chinese, who constitute approximately 92% of the population. Of these, many originate from various regions in Canton. There are also a number of descendants of immigrants who arrived from elsewhere in southern China and around the world after the end of World War II.
#### Business Problem
In this project we will try to find an optimal location for a restaurant. Specifically, this report is aimed at stakeholders interested in opening a **Cantonese restaurant in Hong Kong**.  
Although Hong Kong has a great many restaurants, we found that some areas are not crowded with them, and with Cantonese restaurants in particular. We are looking for a **location as close to the city center as possible** that also has **no similar restaurant nearby, or only similar restaurants with a bad reputation**.

### Data

Based on the definition of our problem, the factors that will influence our decision are:  
* number of existing restaurants in the neighborhood (any type of restaurant)
* number of and distance to Cantonese restaurants in the neighborhood, if any
* distance of neighborhood from city center  

The following data sources will be needed to extract/generate the required information:  
* the number of restaurants in every neighborhood, with their type and location, will be obtained using the Foursquare API
* the coordinates of the Hong Kong center will be obtained with *geopy.geocoders*, using a well-known Hong Kong location (Central)

#### Neighborhood Candidates

To begin, we have to find the location of the city center. We will then use the Foursquare API to extract the data we need for the area surrounding Central.  
First we get the latitude and longitude of Central.

#### Import necessary libraries

In [3]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# transforming json file into a pandas dataframe library
from pandas.io.json import json_normalize

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
print('Folium installed')
print('Libraries imported.')

#### Get the coordinates of Central, Hong Kong

In [4]:
address = 'Central,Hong Kong'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

22.2850394 114.1583819


#### Search for a specific venue category
1. Make sure that you have created a Foursquare developer account and have your credentials handy

In [5]:
CLIENT_ID = '5ESZXISFGD2ZO5NQC0DUKHXTYIT4PQT0U01WQYLD1PLEITLW' # your Foursquare ID
CLIENT_SECRET = 'EJFLXNIT4T52K22ZWDJBV0CSMMZCFC2VHKC2XWXAWZXUJAKN' # your Foursquare Secret
VERSION = '20200210'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 5ESZXISFGD2ZO5NQC0DUKHXTYIT4PQT0U01WQYLD1PLEITLW
CLIENT_SECRET:EJFLXNIT4T52K22ZWDJBV0CSMMZCFC2VHKC2XWXAWZXUJAKN


2. Get the relevant part of the JSON and transform it into a *pandas* dataframe

In [6]:
search_query = 'Food'
LIMIT = 100
radius = 1000
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
results = requests.get(url).json()
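
The request URL above is assembled with plain string formatting, which works for a one-word query such as `Food` but would not escape spaces or other special characters. As a minimal sketch of an alternative, the standard library can encode the parameters for us. The helper name `build_search_url`, the placeholder credentials and the `Dim Sum` query are assumptions made for illustration, not part of the original notebook.

```python
from urllib.parse import urlencode


def build_search_url(client_id, client_secret, lat, lng, version, query, radius, limit):
    """Assemble a Foursquare v2 venue-search URL with URL-encoded parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{lat},{lng}",
        "v": version,
        "query": query,
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/search?" + urlencode(params)


# Placeholder credentials and a hypothetical two-word query
url = build_search_url("YOUR_ID", "YOUR_SECRET", 22.2850394, 114.1583819,
                       "20200210", "Dim Sum", 1000, 100)
print(url)
```

The resulting string can be passed to `requests.get` exactly as in the cell above.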

In [7]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()

Unnamed: 0,categories,hasPerk,id,location.address,location.cc,location.city,location.country,location.crossStreet,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.neighborhood,location.postalCode,location.state,name,referralId,venuePage.id
0,"[{'id': '4bf58dd8d48988d1f5941735', 'name': 'G...",False,537dedb3498e9239c5267673,"Kinwick Centre, Shelley St",HK,中区,香港,,623,"[Kinwick Centre, Shelley St, 香港]","[{'label': 'display', 'lat': 22.28200948348554...",22.282009,114.153296,,,,Marks & Spencer Food,v-1581429216,
1,"[{'id': '4bf58dd8d48988d1c4941735', 'name': 'R...",False,4ccfa2326449a09306f9c5cf,Man Cheung St.,HK,中環,香港,"Hong Kong Station, Airport Express",0,"[Man Cheung St. (Hong Kong Station, Airport Ex...","[{'label': 'display', 'lat': 22.28504661513501...",22.285047,114.15838,,,中西區,Food Expression (現代煮意),v-1581429216,
2,"[{'id': '52f2ab2ebcbc57f1066b8b46', 'name': 'S...",False,4bc541bf5935c9b6bcc9a5d2,"B/F, Pacific Place, 88 Queensway",HK,中環,香港,,1026,"[B/F, Pacific Place, 88 Queensway, 中西區, 香港]","[{'label': 'display', 'lat': 22.27792296623498...",22.277923,114.164712,,,中西區,Great Food Hall,v-1581429216,
3,"[{'id': '52f2ab2ebcbc57f1066b8b46', 'name': 'S...",False,533fff61498e424c5874d29e,"Shop 206, Shun Tak Centre",HK,Sheung Wan,香港,,633,"[Shop 206, Shun Tak Centre, Hong Kong, 香港]","[{'label': 'display', 'lat': 22.28779568111136...",22.287796,114.152997,,,Hong Kong,Marks & Spencer Food,v-1581429216,
4,"[{'id': '4bf58dd8d48988d1d3941735', 'name': 'V...",False,4f56e921e4b01d8a92a28b70,92 Wellington St,HK,中環,香港,btwn Cochrain & Pottinger St,450,[92 Wellington St (btwn Cochrain & Pottinger S...,"[{'label': 'display', 'lat': 22.28292072899061...",22.282921,114.154651,,,中西區,Mana! Fast Slow Food,v-1581429216,67771409.0


#### Define information of interest and filter dataframe

In [8]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()  # work on a copy so the column assignment below is safe

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id
0,Marks & Spencer Food,Gourmet Shop,"Kinwick Centre, Shelley St",HK,中区,香港,,623,"[Kinwick Centre, Shelley St, 香港]","[{'label': 'display', 'lat': 22.28200948348554...",22.282009,114.153296,,,,537dedb3498e9239c5267673
1,Food Expression (現代煮意),Restaurant,Man Cheung St.,HK,中環,香港,"Hong Kong Station, Airport Express",0,"[Man Cheung St. (Hong Kong Station, Airport Ex...","[{'label': 'display', 'lat': 22.28504661513501...",22.285047,114.15838,,,中西區,4ccfa2326449a09306f9c5cf
2,Great Food Hall,Supermarket,"B/F, Pacific Place, 88 Queensway",HK,中環,香港,,1026,"[B/F, Pacific Place, 88 Queensway, 中西區, 香港]","[{'label': 'display', 'lat': 22.27792296623498...",22.277923,114.164712,,,中西區,4bc541bf5935c9b6bcc9a5d2
3,Marks & Spencer Food,Supermarket,"Shop 206, Shun Tak Centre",HK,Sheung Wan,香港,,633,"[Shop 206, Shun Tak Centre, Hong Kong, 香港]","[{'label': 'display', 'lat': 22.28779568111136...",22.287796,114.152997,,,Hong Kong,533fff61498e424c5874d29e
4,Mana! Fast Slow Food,Vegetarian / Vegan Restaurant,92 Wellington St,HK,中環,香港,btwn Cochrain & Pottinger St,450,[92 Wellington St (btwn Cochrain & Pottinger S...,"[{'label': 'display', 'lat': 22.28292072899061...",22.282921,114.154651,,,中西區,4f56e921e4b01d8a92a28b70
5,Chiu Lung Fast Food (昭隆美食),Chinese Restaurant,21-23 Chiu Lung St,HK,中環,香港,,313,"[21-23 Chiu Lung St, 中西區, 香港]","[{'label': 'display', 'lat': 22.28265943293749...",22.282659,114.156753,Central,,中西區,4ca6a13cb0b8236a4b96a1e6
6,nood food,Juice Bar,"2/F Asia Standard Tower, 59-65 Queen's Road C...",HK,中環,香港,59-65 Queen's Road Central,363,"[2/F Asia Standard Tower, 59-65 Queen's Road ...","[{'label': 'display', 'lat': 22.28308755592456...",22.283088,114.155551,,,中西區,522b194711d2d5f6266289c5
7,Soul Food,Thai Restaurant,28 Elgin Street,HK,香港,香港,,713,"[28 Elgin Street, 香港]","[{'label': 'display', 'lat': 22.28166764962636...",22.281668,114.152495,,,,5791f50f38fab55334b8cb05
8,Sun Hing Fast Food (新興美食),Asian Restaurant,中環昭隆街9號B舖地下,HK,,香港,,328,"[中環昭隆街9號B舖地下, 香港]","[{'label': 'display', 'lat': 22.28252120232423...",22.282521,114.156717,,,,4d58ca847e2237040d59a973
9,Observation Wheel Food & Drink,Creperie,,HK,,香港,,359,[香港],"[{'label': 'display', 'lat': 22.285526, 'lng':...",22.285526,114.161833,,,,5683d3a2498ec2f0d370de68
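
To make the category extraction step concrete, here is a minimal sketch of the same idea on plain Python dictionaries: each venue carries a list of category records, and we keep only the name of the first one (or `None` when the list is empty). The helper name `first_category_name` and the two sample venues are made up for illustration.

```python
def first_category_name(categories):
    """Return the name of the first category, or None for an empty list."""
    return categories[0]["name"] if categories else None


# Hypothetical venues shaped like the Foursquare response
raw = [
    {"name": "Venue A", "categories": [{"id": "x1", "name": "Chinese Restaurant"}]},
    {"name": "Venue B", "categories": []},
]

for venue in raw:
    print(venue["name"], "->", first_category_name(venue["categories"]))
```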


#### Visualize the results

In [11]:
dataframe_filtered.name

0                          Marks & Spencer Food
1                        Food Expression (現代煮意)
2                               Great Food Hall
3                          Marks & Spencer Food
4                    Chiu Lung Fast Food (昭隆美食)
5                          Mana! Fast Slow Food
6                                     nood food
7                     Sun Hing Fast Food (新興美食)
8                        Vietnam Fast Food (越僑)
9                Observation Wheel Food & Drink
10       Sheung Wan Cooked Food Centre (上環熟食中心)
11                                    Soul Food
12                     Master Food Company 匯業餐廳
13    Queen Street Cooked Food Market (皇后街熟食市場)
14                 Good Luck Thai Food (鴻運泰國美食)
15              Greenixs Korea Food Court (綠味亭)
16                         Marks & Spencer Food
17                     Yuk Kin Fast Food (郁健快餐)
18                    Delicious Food Shop 東方小食店
19                                    Nood Food
20                                    no

In [34]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map centred around Central

# add a red circle marker to represent Central
folium.features.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='Central',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the food venues as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

Convert to a *pd.DataFrame* and keep only the columns of interest.

In [10]:
dataframe.shape

(50, 19)

In [11]:
hk_df = pd.DataFrame(dataframe_filtered)
hk_df.head()

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id
0,Marks & Spencer Food,Gourmet Shop,"Kinwick Centre, Shelley St",HK,中区,香港,,623,"[Kinwick Centre, Shelley St, 香港]","[{'label': 'display', 'lat': 22.28200948348554...",22.282009,114.153296,,,,537dedb3498e9239c5267673
1,Food Expression (現代煮意),Restaurant,Man Cheung St.,HK,中環,香港,"Hong Kong Station, Airport Express",0,"[Man Cheung St. (Hong Kong Station, Airport Ex...","[{'label': 'display', 'lat': 22.28504661513501...",22.285047,114.15838,,,中西區,4ccfa2326449a09306f9c5cf
2,Great Food Hall,Supermarket,"B/F, Pacific Place, 88 Queensway",HK,中環,香港,,1026,"[B/F, Pacific Place, 88 Queensway, 中西區, 香港]","[{'label': 'display', 'lat': 22.27792296623498...",22.277923,114.164712,,,中西區,4bc541bf5935c9b6bcc9a5d2
3,Marks & Spencer Food,Supermarket,"Shop 206, Shun Tak Centre",HK,Sheung Wan,香港,,633,"[Shop 206, Shun Tak Centre, Hong Kong, 香港]","[{'label': 'display', 'lat': 22.28779568111136...",22.287796,114.152997,,,Hong Kong,533fff61498e424c5874d29e
4,Mana! Fast Slow Food,Vegetarian / Vegan Restaurant,92 Wellington St,HK,中環,香港,btwn Cochrain & Pottinger St,450,[92 Wellington St (btwn Cochrain & Pottinger S...,"[{'label': 'display', 'lat': 22.28292072899061...",22.282921,114.154651,,,中西區,4f56e921e4b01d8a92a28b70


Finally, drop the columns we do not need for the next step.

In [24]:
hk_df_new = hk_df.drop(['formattedAddress','labeledLatLngs','crossStreet','city','country','neighborhood','postalCode'],axis=1)
hk_df_new.head()

Unnamed: 0,name,categories,address,cc,distance,lat,lng,state,id
0,Marks & Spencer Food,Gourmet Shop,"Kinwick Centre, Shelley St",HK,623,22.282009,114.153296,,537dedb3498e9239c5267673
1,Food Expression (現代煮意),Restaurant,Man Cheung St.,HK,0,22.285047,114.15838,中西區,4ccfa2326449a09306f9c5cf
2,Great Food Hall,Supermarket,"B/F, Pacific Place, 88 Queensway",HK,1026,22.277923,114.164712,中西區,4bc541bf5935c9b6bcc9a5d2
3,Marks & Spencer Food,Supermarket,"Shop 206, Shun Tak Centre",HK,633,22.287796,114.152997,Hong Kong,533fff61498e424c5874d29e
4,Mana! Fast Slow Food,Vegetarian / Vegan Restaurant,92 Wellington St,HK,450,22.282921,114.154651,中西區,4f56e921e4b01d8a92a28b70
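
The `distance` column is reported by Foursquare, but distance from the center is one of our decision factors, so it is worth being able to reproduce it ourselves. The sketch below uses the haversine (great-circle) formula with a mean Earth radius of 6,371 km, which is an assumption of this example; the function name `haversine_m` is also ours. The two coordinate pairs are Central (from the geocoder output) and "Mana! Fast Slow Food" (from the table), for which Foursquare reports 450 m.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (assumed value)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


central = (22.2850394, 114.1583819)   # geocoded location of Central
venue = (22.282921, 114.154651)       # "Mana! Fast Slow Food"

d = haversine_m(*central, *venue)
print(round(d), "m")  # expected to land close to the 450 m reported by Foursquare
```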


#### Methodology
In this project we will direct our efforts toward **detecting areas that have low restaurant density**, particularly those with a low number of Chinese restaurants, which we treat as similar restaurants. We will limit our analysis to the area within ~600 m of the center.

In the first step we collected the required data: **the location and type (category) of every restaurant within 600 m of Central.** We also identified Chinese restaurants (according to Foursquare categorization).

The second step in our analysis will be to **visualize the map** and explore 'restaurant density' in order to identify a few promising areas close to the center with a low number of restaurants in general, and to focus our attention on those areas.

In the third and final step we will focus on the most promising areas and, within those, create clusters of locations that meet some basic requirements established in discussion with stakeholders: we will consider locations with no more than two restaurants within a radius of 30 meters, and we want locations without Chinese restaurants within a radius of 100 meters. We will present a map of all such locations and also create clusters (using **k-means clustering**) of those locations to identify general zones / neighborhoods / addresses, which should be a starting point for the final 'street level' exploration and search for an optimal venue location by stakeholders.
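
The two distance thresholds in the final step can be expressed as a simple filter over candidate points. The following is only a sketch under stated assumptions: the helper names `distance_m` and `is_candidate` are ours, distances use an equirectangular approximation (adequate at this scale), and the category to avoid defaults to Foursquare's `Chinese Restaurant`. The two restaurants are rows taken from the venue table; the two candidate points are hypothetical.

```python
import math


def distance_m(p, q):
    """Approximate distance in metres between two (lat, lng) points in degrees."""
    mean_lat = math.radians((p[0] + q[0]) / 2)
    dx = math.radians(q[1] - p[1]) * math.cos(mean_lat)
    dy = math.radians(q[0] - p[0])
    return 6_371_000 * math.hypot(dx, dy)


def is_candidate(point, restaurants, max_nearby=2, near_m=30, similar_m=100,
                 similar_category="Chinese Restaurant"):
    """Check one candidate point against the two thresholds from the methodology."""
    nearby = 0
    for lat, lng, category in restaurants:
        d = distance_m(point, (lat, lng))
        if d <= near_m:
            nearby += 1
        if category == similar_category and d <= similar_m:
            return False  # a similar restaurant is too close
    return nearby <= max_nearby


# (lat, lng, category) for two venues from the table
restaurants = [
    (22.282659, 114.156753, "Chinese Restaurant"),
    (22.283088, 114.155551, "Juice Bar"),
]

close_point = (22.282700, 114.156800)  # hypothetical, a few metres from the Chinese restaurant
far_point = (22.284500, 114.160000)    # hypothetical, several hundred metres from both

print(is_candidate(close_point, restaurants))
print(is_candidate(far_point, restaurants))
```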

### Analysis
Let's perform some basic exploratory data analysis and derive some additional information from our raw data. First let's **count the number of restaurants** in the area **within 600 m** of Central:

In [25]:
nearby_hk_df = hk_df_new[hk_df_new.distance <= 600]

In [21]:
nearby_hk_df.groupby("categories").distance.agg(['count','mean','max','min'])

categories,count,mean,max,min
Asian Restaurant,1,328.0,328,328
Bar,1,470.0,470,470
Café,1,598.0,598,598
Chinese Restaurant,2,390.5,468,313
Creperie,1,359.0,359,359
Department Store,1,426.0,426,426
Diner,1,206.0,206,206
Grocery Store,2,530.0,596,464
Juice Bar,3,339.333333,575,80
Korean Restaurant,1,586.0,586,586


From the data, we can see that there are **12 restaurants**, including the Chinese restaurants, which we regard as similar to a Cantonese restaurant. We should identify similar restaurants, such as those categorized as Chinese Restaurant or as the generic Restaurant, and avoid them as far as possible. We can therefore pick the range between **~100 m and ~200 m** or between **~500 m and ~600 m** from Central.
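
One way to back up the choice of a distance range is to count venues per distance band. The sketch below does this with the standard library only; the helper name `band_counts` and the 100 m band width are assumptions, and the list holds just a subset of the distances that appear in the outputs above, so it illustrates the method rather than reproducing the full result.

```python
from collections import Counter


def band_counts(distances_m, width=100):
    """Count venues per distance band, keyed by the band's lower edge in metres."""
    return Counter((d // width) * width for d in distances_m)


# A subset of the venue distances (in metres) shown in the outputs above
distances = [0, 80, 206, 313, 328, 359, 363, 450, 468, 575, 586, 598]

counts = band_counts(distances)
for lower in range(0, 600, 100):
    print(f"{lower:>3}-{lower + 100} m: {counts.get(lower, 0)}")
```

Bands with a low count are the ones worth a closer look on the map.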

#### Visualize nearby_hk_df

In [35]:
map_nearby_hk_df = folium.Map(location=[latitude, longitude], zoom_start=15)
hk_center = [latitude, longitude]

for lat, lng, label in zip(nearby_hk_df.lat, nearby_hk_df.lng, nearby_hk_df.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(map_nearby_hk_df)
folium.Marker(hk_center).add_to(map_nearby_hk_df)
folium.Circle(hk_center, radius=200, fill=False, color='white').add_to(map_nearby_hk_df)
folium.Circle(hk_center, radius=300, fill=False, color='white').add_to(map_nearby_hk_df)
folium.Circle(hk_center, radius=600, fill=False, color='white').add_to(map_nearby_hk_df)
folium.Circle(hk_center, radius=700, fill=False, color='white').add_to(map_nearby_hk_df)
# display map
map_nearby_hk_df

#### Cluster

In [27]:
from sklearn.cluster import KMeans

kclusters = 3

hk_df_new_clustering = nearby_hk_df.drop(["name", "categories", "address", "state", "id", "cc"], axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(hk_df_new_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:5]

array([2, 1, 0, 0, 0], dtype=int32)
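
Note that the clustering above runs on the raw `distance`, `lat` and `lng` columns. Because `distance` is in metres while the coordinates vary only in the third decimal place, the distance column is likely to dominate the Euclidean distances that k-means uses. A common remedy, shown here as an optional sketch and not as what this notebook does, is to standardize each column first. The helper name `standardize` is ours, and the three rows are venues from the tables earlier in the notebook.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Rescale each column to zero mean and unit variance (z-scores)."""
    columns = list(zip(*rows))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) or 1.0 for c in columns]  # fall back to 1.0 for constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sigmas)] for row in rows]


# (distance in m, lat, lng) for three venues from the earlier tables
rows = [
    (0, 22.285047, 114.158380),
    (450, 22.282921, 114.154651),
    (313, 22.282659, 114.156753),
]

scaled = standardize(rows)
for row in scaled:
    print([round(v, 2) for v in row])
```

After scaling, all three columns contribute on a comparable scale, and the scaled rows could be passed to `KMeans.fit` in place of the raw ones.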

In [28]:
hk_merged = nearby_hk_df.copy()

# add clustering labels
hk_merged["Cluster Labels"] = kmeans.labels_


print(hk_merged.shape)
hk_merged.head() # check the last columns!

(22, 10)


Unnamed: 0,name,categories,address,cc,distance,lat,lng,state,id,Cluster Labels
1,Food Expression (現代煮意),Restaurant,Man Cheung St.,HK,0,22.285047,114.15838,中西區,4ccfa2326449a09306f9c5cf,2
4,Mana! Fast Slow Food,Vegetarian / Vegan Restaurant,92 Wellington St,HK,450,22.282921,114.154651,中西區,4f56e921e4b01d8a92a28b70,1
5,Chiu Lung Fast Food (昭隆美食),Chinese Restaurant,21-23 Chiu Lung St,HK,313,22.282659,114.156753,中西區,4ca6a13cb0b8236a4b96a1e6,0
6,nood food,Juice Bar,"2/F Asia Standard Tower, 59-65 Queen's Road C...",HK,363,22.283088,114.155551,中西區,522b194711d2d5f6266289c5,0
8,Sun Hing Fast Food (新興美食),Asian Restaurant,中環昭隆街9號B舖地下,HK,328,22.282521,114.156717,,4d58ca847e2237040d59a973,0


In [None]:
print(hk_merged.shape)
hk_merged.sort_values(["Cluster Labels"], inplace=True)

#### Finally, let's visualize the resulting clusters

In [36]:
# create map
import matplotlib.cm as cm
import matplotlib.colors as colors

map_clusters = folium.Map(location= hk_center, zoom_start=16)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, name, categories, cluster in zip(hk_merged['lat'], hk_merged['lng'], hk_merged['name'], hk_merged['categories'],hk_merged['Cluster Labels']):
    label = folium.Popup('{} - Cluster {}'.format(name, cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
folium.Circle(hk_center, radius=200, fill=False, color='white').add_to(map_clusters)
folium.Circle(hk_center, radius=300, fill=False, color='white').add_to(map_clusters)
folium.Circle(hk_center, radius=600, fill=False, color='white').add_to(map_clusters)
folium.Circle(hk_center, radius=700, fill=False, color='white').add_to(map_clusters)      
map_clusters

#### Examine Clusters

#### Cluster 1

In [105]:
hk_merged.loc[hk_merged['Cluster Labels'] == 0, hk_merged.columns[[1] + list(range(5, hk_merged.shape[1]))]]

Unnamed: 0,categories,lat,lng,state,id,Cluster Labels
1,Restaurant,22.285047,114.15838,中西區,4ccfa2326449a09306f9c5cf,0
40,Restaurant,22.284725,114.157844,中西區,5976d56d002f4c4336fed1ba,0
20,Juice Bar,22.284978,114.159157,,5ab4c8891987ec33c08ba3c2,0


#### Cluster 2

In [33]:
hk_merged.loc[hk_merged['Cluster Labels'] == 1, hk_merged.columns[[1] + list(range(5, hk_merged.shape[1]))]]

Unnamed: 0,categories,lat,lng,state,id,Cluster Labels
45,Department Store,22.28144,114.156964,中西區,4b0588ddf964a520a4dd22e3,1
28,Market,22.283528,114.153953,,5129df90e4b0fd966410da3b,1
27,Bar,22.283289,114.154229,,4b0588cdf964a520b4da22e3,1
25,Wine Shop,22.280999,114.159417,,4c09e137009a0f47fe9ee8bf,1
22,Juice Bar,22.280751,114.155257,,560cb17f498ebf43db58949a,1
16,Korean Restaurant,22.285231,114.152688,,533a41d0498ef2793daea5ed,1
34,Café,22.285315,114.152576,Hong Kong,5a72b03f8ad62e6caa9f206f,1
29,Grocery Store,22.281053,114.159719,,55669c49498e2f10228b956a,1
13,Thai Restaurant,22.281165,114.155296,,4b0588d6f964a5207adc22e3,1
12,Chinese Restaurant,22.283723,114.154065,中西區,4e7c1ea3b8f75b2042a7c7e3,1


#### Cluster 3

In [32]:
hk_merged.loc[hk_merged['Cluster Labels'] == 2, hk_merged.columns[[1] + list(range(5, hk_merged.shape[1]))]]

Unnamed: 0,categories,lat,lng,state,id,Cluster Labels
1,Restaurant,22.285047,114.15838,中西區,4ccfa2326449a09306f9c5cf,2
42,Restaurant,22.284725,114.157844,中西區,5976d56d002f4c4336fed1ba,2
20,Juice Bar,22.284978,114.159157,,5ab4c8891987ec33c08ba3c2,2


### Results and Discussion 

Our analysis shows that although there is a great number of restaurants in this area, there are pockets of low restaurant density fairly close to the city center. The highest concentration of restaurants was detected south and south-west of Central, so we focused our attention on the areas to the south-east and east. Another area was identified as potentially interesting (the north and north-east), but our attention was focused on the **south-east and east of the Cluster 2 area**, which offers a combination of popularity among tourists, closeness to Central, strong socio-economic dynamics and a number of pockets of low restaurant density.


Those location candidates were then clustered to create zones of interest that contain the greatest number of location candidates. The addresses of the centers of those zones were also generated using reverse geocoding, to be used as markers/starting points for more detailed local analysis based on other factors.
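
As a minimal sketch of how a zone center can be derived, the snippet below averages the coordinates of the points in each cluster. The helper name `cluster_centroids` is an assumption, and the three points are only a few rows from the Cluster 2 table, so the printed center is illustrative. The resulting coordinates could then be handed to a reverse geocoder (for example geopy's `Nominatim.reverse`) to obtain an address; that step needs network access and is left out here.

```python
from collections import defaultdict


def cluster_centroids(points):
    """Average the (lat, lng) of the points that share a cluster label."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for lat, lng, label in points:
        acc = sums[label]
        acc[0] += lat
        acc[1] += lng
        acc[2] += 1
    return {label: (s[0] / s[2], s[1] / s[2]) for label, s in sums.items()}


# (lat, lng, cluster label) for a few rows of the Cluster 2 table
points = [
    (22.283528, 114.153953, 1),
    (22.283289, 114.154229, 1),
    (22.283723, 114.154065, 1),
]

centroids = cluster_centroids(points)
print(centroids)
```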

The result of all this is the Cluster 2 area, which contains the largest number of potential new restaurant locations based on the number of and distance to existing venues, both restaurants in general and Chinese restaurants in particular. This, of course, does not imply that those zones are actually optimal locations for a new restaurant! The purpose of this analysis was only to provide information on areas close to Central that are not crowded with existing restaurants (particularly Cantonese ones). It is entirely possible that there is a very good reason for the small number of restaurants in any of those areas, a reason that would make them unsuitable for a new restaurant regardless of the lack of competition. The recommended zones should therefore be considered only as a starting point for a more detailed analysis, which could eventually yield a location that not only has no nearby competition but also meets all other relevant conditions.

### Conclusion

The purpose of this project was to identify areas of Hong Kong close to the center with a low number of restaurants (particularly Cantonese restaurants) in order to help stakeholders narrow down the search for an optimal location for a new Cantonese restaurant. By calculating the restaurant density distribution from Foursquare data, we first identified the general boroughs that justify further analysis (as described above), and then generated an extensive collection of locations that satisfy some basic requirements regarding existing nearby restaurants. Those locations were then clustered to create major zones of interest (containing the greatest number of potential locations), and the addresses of those zone centers were created to be used as starting points for final exploration by stakeholders.  

The final decision on the optimal restaurant location will be made by stakeholders based on the specific characteristics of the neighborhoods and locations in every recommended zone, taking into consideration additional factors such as the attractiveness of each location (proximity to a park or water), noise levels / proximity to major roads, real estate availability, prices, and the social and economic dynamics of every neighborhood.