## Capstone Project - Finding the best area for opening a Thai restaurant in Sydney

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

The audience for this report is stakeholders in the Thai restaurant business who are looking to open a new Thai restaurant in Sydney and would like help finding the locations that best match their criteria.

This problem matters to them because Sydney is a big city with thousands of restaurants spread across it. It would not be practical for a stakeholder to drive around each area and collect information on the restaurants. Using data science with venue data from Foursquare, a major location-data provider, combined with the Google Maps Geocoding API to identify the coordinates of each location, lets us carry out this task sensibly and efficiently.

Therefore, we are conducting this business research in order to discover the best location for opening a Thai restaurant in Sydney.

The evaluation criteria used in this analysis are based on the findings of our previous market survey. The three findings are:

1. There is a strong relationship between population density and restaurant density in any given area: the higher the population density, the higher the restaurant density. This means we can use the restaurant density of an area as a proxy for its population density.

2. In the survey about food preferences, Thai food was found to be the most popular cuisine in Sydney; respondents chose Thai food over other cuisines 95% of the time.

3. Survey respondents consider Malaysian food so similar to Thai food that each can substitute for the other. 90% of respondents said they were indifferent when choosing between Thai and Malaysian food.

Given the above findings, the criteria for finding the best area for opening a Thai restaurant in Sydney will be:

1. The area must not already have a Thai or Malaysian restaurant nearby.

2. Provided the first condition is met, an area with more restaurants is more promising than an area with fewer restaurants.

We will use Python code to extract the information necessary for our analysis. We will then normalise and visualise the data to show all the candidate areas that are suitable for opening a Thai restaurant.

## Data <a name="data"></a>

To solve the business problem, we need to know:
* the number of existing restaurants in the neighborhood (any type of restaurant)
* the number of existing Thai and Malaysian restaurants in the neighborhood, if any
* the distance from a given location to the nearest Thai or Malaysian restaurant, if any

We use the Google Maps API to obtain the latitude, longitude and addresses of the areas we are evaluating. Restaurant data such as location, name and type come from Foursquare.

### Coming up with the center point of the whole area for evaluation

To define the whole area we first have to identify its center by pinpointing its latitude and longitude. We take Sydney Town Hall as the center, as it has long been commonly regarded as the center of Sydney.

Starting from the center point, we create a grid of cells covering our area of interest. Sydney is a large, sprawling city, and we want to make sure that the North Shore, the area near the airport in the south and the areas to the west are covered, so we take an area of roughly 24x24 kilometers centered on Sydney Town Hall.


In [152]:
google_api_key = 'YOUR_GOOGLE_API_KEY'  # Set your own key here; never publish real credentials

In [153]:
import requests

def get_coordinates(api_key, address, verbose=False):
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&address={}'.format(api_key, address)
        response = requests.get(url).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        geographical_data = results[0]['geometry']['location'] # get geographical coordinates
        lat = geographical_data['lat']
        lon = geographical_data['lng']
        return [lat, lon]
    except Exception:  # Network failure, malformed response or empty result list
        return [None, None]
    
address = '483 George St, Sydney NSW 2000, Australia'
sydney_center = get_coordinates(google_api_key, address)
print('Coordinate of {}: {}'.format(address, sydney_center))

Coordinate of 483 George St, Sydney NSW 2000, Australia: [-33.873183, 151.2061462]
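The request and response handling in `get_coordinates` can be separated from the network call and checked offline. The sketch below is only an illustration: `build_geocode_url` and `parse_geocode_response` are hypothetical helper names, the sample response is hand-written in the shape that `get_coordinates` reads, and the `status` check assumes the JSON carries a top-level status field.

```python
from urllib.parse import urlencode

GEOCODE_ENDPOINT = 'https://maps.googleapis.com/maps/api/geocode/json'

def build_geocode_url(api_key, address):
    # urlencode escapes spaces, commas and other reserved characters in the address
    query = urlencode({'key': api_key, 'address': address})
    return '{}?{}'.format(GEOCODE_ENDPOINT, query)

def parse_geocode_response(response):
    # Return [lat, lon], or [None, None] when there is no usable result
    if response.get('status') != 'OK' or not response.get('results'):
        return [None, None]
    location = response['results'][0]['geometry']['location']
    return [location['lat'], location['lng']]

# Hand-written sample in the shape read by get_coordinates (not a real API reply)
sample_response = {
    'status': 'OK',
    'results': [{'geometry': {'location': {'lat': -33.873183, 'lng': 151.2061462}}}],
}
empty_response = {'status': 'ZERO_RESULTS', 'results': []}

url = build_geocode_url('YOUR_GOOGLE_API_KEY', '483 George St, Sydney NSW 2000, Australia')
print(url)
print(parse_geocode_response(sample_response))
print(parse_geocode_response(empty_response))
```

Keeping the parsing separate makes it possible to tell "no result" apart from a network error, which the bare `try`/`except` in the cell above cannot do.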


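Having pinpointed the center, which is the address of Sydney Town Hall, we can now build a grid of equally sized cells that divides Sydney into separate areas. To work with distances in meters, we first convert latitude and longitude into Cartesian X/Y coordinates using a UTM projection.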

In [154]:
!pip install shapely
import shapely.geometry

!pip install pyproj
import pyproj

import math

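def lonlat_to_xy(lon, lat):
    # Project WGS84 longitude/latitude to planar UTM X/Y coordinates.
    # NOTE: zone=33 covers Central Europe; Sydney (about 151.2 E) lies in UTM zone 56, southern
    # hemisphere, so X/Y distances derived here are distorted. The outputs shown in this
    # notebook were produced with zone=33 exactly as written.
    # pyproj.transform is deprecated in newer pyproj releases in favour of pyproj.Transformer.
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    # Inverse of lonlat_to_xy; must use the same UTM zone
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]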

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

print('Coordinate transformation check')
print('-------------------------------')
print('Sydney center longitude={}, latitude={}'.format(sydney_center[1], sydney_center[0]))
x, y = lonlat_to_xy(sydney_center[1], sydney_center[0])
print('Sydney center UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Sydney center longitude={}, latitude={}'.format(lo, la))

Coordinate transformation check
-------------------------------
Sydney center longitude=151.2061462, latitude=-33.873183
Sydney center UTM X=4677047.158513354, Y=-15238473.809824437
Sydney center longitude=151.20614619999998, latitude=-33.87318300000005
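
The round trip above only shows that the forward and inverse projections are consistent with each other; it does not show that the chosen UTM zone suits Sydney. As a quick sanity check, the standard 6-degree zone can be computed from the longitude. This is a minimal sketch: `utm_zone` and `is_southern` are hypothetical helper names, and the special-case zones around Norway and Svalbard are ignored.

```python
import math

def utm_zone(longitude):
    # Standard 6-degree zones, numbered 1..60 eastwards starting at 180 degrees west
    return int(math.floor((longitude + 180) / 6)) % 60 + 1

def is_southern(latitude):
    # UTM uses a different false northing south of the equator
    return latitude < 0

sydney_lat, sydney_lon = -33.873183, 151.2061462
zone = utm_zone(sydney_lon)
south = is_southern(sydney_lat)
print('UTM zone:', zone, '| southern hemisphere:', south)
```

For Sydney this suggests zone 56 in the southern hemisphere, which corresponds to EPSG:32756 (WGS 84 / UTM zone 56S), rather than the `zone=33` used in the cell above.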


In [155]:
sydney_center_x, sydney_center_y = lonlat_to_xy(sydney_center[1], sydney_center[0]) # City center in Cartesian coordinates

k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_min = sydney_center_x - 12000
x_step = 1200
y_min = sydney_center_y - 12000 - (int(21/k)*k*1200 - 24000)/2
y_step = 1200 * k 

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = 600 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(sydney_center_x, sydney_center_y, x, y)
        if (distance_from_center <= 12001):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(latitudes), 'candidate neighborhood centers generated.')

364 candidate neighborhood centers generated.
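
The grid logic is easier to follow in local coordinates with the center at (0, 0). The sketch below mirrors the loop above under that assumption; `hex_grid` is a hypothetical helper name, and `radius` and `spacing` correspond to the 12000 and 1200 constants in the cell. Every other row is shifted by half a step and rows are `sqrt(3)/2` of a step apart, so each point is equally far from all six of its neighbours.

```python
import math

def hex_grid(radius=12000.0, spacing=1200.0):
    k = math.sqrt(3) / 2                      # Row pitch relative to the column pitch
    cols = int(2 * radius / spacing) + 1      # 21 columns for the defaults
    rows = int(cols / k)                      # Enough rows to span the same height
    y_min = -radius - (rows * k * spacing - 2 * radius) / 2
    points = []
    for i in range(rows):
        y = y_min + i * spacing * k
        x_offset = spacing / 2 if i % 2 == 0 else 0.0   # Shift every other row
        for j in range(cols):
            x = -radius + j * spacing + x_offset
            if math.hypot(x, y) <= radius + 1:          # Keep points inside the circle
                points.append((x, y))
    return points

points = hex_grid()
x0, y0 = points[0]
nearest = min(math.hypot(x - x0, y - y0) for x, y in points[1:])
print(len(points), 'grid points; nearest-neighbour spacing:', round(nearest, 3))
```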


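Let's visualise the grid. Each circle with a radius of 600 meters marks one candidate area, and the marker shows the Town Hall.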

In [156]:
!pip install folium

import folium



In [157]:
map_sydney = folium.Map(location=sydney_center, zoom_start=13)
folium.Marker(sydney_center, popup='Townhall').add_to(map_sydney)
for lat, lon in zip(latitudes, longitudes):
    #folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_sydney) 
    folium.Circle([lat, lon], radius=600, color='blue', fill=False).add_to(map_sydney)
    #folium.Marker([lat, lon]).add_to(map_sydney)
map_sydney

We then use the Google Maps API to get the approximate address of each of those locations and add the addresses to our dataframe of area information.

In [158]:
def get_address(api_key, latitude, longitude, verbose=False):
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&latlng={},{}'.format(api_key, latitude, longitude)
        response = requests.get(url).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        address = results[0]['formatted_address']
        return address
    except Exception:  # Network failure, malformed response or empty result list
        return None

addr = get_address(google_api_key, sydney_center[0], sydney_center[1])
print('Reverse geocoding check')
print('-----------------------')
print('Address of [{}, {}] is: {}'.format(sydney_center[0], sydney_center[1], addr))

Reverse geocoding check
-----------------------
Address of [-33.873183, 151.2061462] is: 483 George St, Sydney NSW 2000, Australia


In [159]:
print('Obtaining location addresses: ', end='')
addresses = []
for lat, lon in zip(latitudes, longitudes):
    address = get_address(google_api_key, lat, lon)
    if address is None:
        address = 'NO ADDRESS'
    address = address.replace(', Australia', '') # We don't need country part of address
    addresses.append(address)
    print(' .', end='')
print(' done.')

Obtaining location addresses:  . . . . . (one dot per location, output shortened) . . . . . done.


In [160]:
import pandas as pd

df_locations = pd.DataFrame({'Address': addresses,
                             'Latitude': latitudes,
                             'Longitude': longitudes,
                             'X': xs,
                             'Y': ys,
                             'Distance from center': distances_from_center})

df_locations.head(10)

| | Address | Latitude | Longitude | X | Y | Distance from center |
|---|---|---|---|---|---|---|
| 0 | 1 Addison Rd, Manly NSW 2095 | -33.811318 | 151.281864 | 4673447.0 | -15249910.0 | 11984.990613 |
| 1 | Reef Beach Track, Balgowlah Heights NSW 2093 | -33.80715 | 151.272504 | 4674647.0 | -15249910.0 | 11680.7534 |
| 2 | 24 Condamine St, Balgowlah Heights NSW 2093 | -33.802983 | 151.263147 | 4675847.0 | -15249910.0 | 11494.346436 |
| 3 | 5 Plant St, Balgowlah NSW 2093 | -33.798815 | 151.253791 | 4677047.0 | -15249910.0 | 11431.53533 |
| 4 | 65 Peacock St, Seaforth NSW 2092 | -33.794647 | 151.244438 | 4678247.0 | -15249910.0 | 11494.346436 |
| 5 | 79A Gurney Cres, Seaforth NSW 2092 | -33.790479 | 151.235086 | 4679447.0 | -15249910.0 | 11680.7534 |
| 6 | 7 Bampi Pl, Castle Cove NSW 2069 | -33.78631 | 151.225736 | 4680647.0 | -15249910.0 | 11984.990613 |
| 7 | Hole in Wall Track, Manly NSW 2095 | -33.824336 | 151.291582 | 4671647.0 | -15248870.0 | 11711.532778 |
| 8 | Hospital Rd, Manly NSW 2095 | -33.820168 | 151.28222 | 4672847.0 | -15248870.0 | 11208.925015 |
| 9 | Balgowlah NSW 2093 | -33.816 | 151.27286 | 4674047.0 | -15248870.0 | 10816.653826 |
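
Rows 0 and 1 of this table are neighbouring grid points, exactly 1200 units apart in X. An independent way to check whether those units are true meters is to compute the great-circle distance between the two latitude/longitude pairs. The sketch below uses the haversine formula with a mean Earth radius of 6371 km; `haversine_m` is a hypothetical helper name and the inputs are the rounded coordinates printed in the table.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, earth_radius_m=6371000.0):
    # Great-circle distance in meters between two WGS84 points
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))

# Rows 0 and 1 of the table: neighbouring grid points, 1200 units apart in X
d = haversine_m(-33.811318, 151.281864, -33.80715, 151.272504)
print('Ground distance between neighbouring grid points:', round(d), 'm')
```

If the result differs noticeably from 1200, the projected X/Y units are not true meters, which would point back to the UTM zone chosen in `lonlat_to_xy`. In that case the 600 m and 1 km thresholds used later apply to projected units rather than ground distance.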


In [161]:
df_locations.to_pickle('./locations.pkl')

### Foursquare

Now that we have the location of each candidate neighborhood, let's use the Foursquare API to get information on the restaurants in each one.

We're interested in venues in the 'food' category. However, the food category in Foursquare also includes cafes, fast food outlets, pizza places and bakeries, which are not our direct competitors. When retrieving the restaurant list we want to exclude these, so the code filters venues by keywords in their category names.

Specifically, we include in our list only venues that have 'restaurant' or a similar keyword in the category name. We also flag venues in the Thai and Malaysian restaurant categories. We treat a Malaysian restaurant exactly the same as a Thai restaurant because our prior research showed that the two cuisines are similar enough to be complete substitutes for each other.

Let's state the Foursquare credentials first, then move on to retrieving the restaurant venues.

In [162]:
client_id = 'YOUR_FOURSQUARE_CLIENT_ID'  # Set your own credentials here; never publish real ones
client_secret = 'YOUR_FOURSQUARE_CLIENT_SECRET'

In [163]:
# Category IDs corresponding to Thai and Malaysian restaurants were taken from Foursquare web site (https://developer.foursquare.com/docs/resources/categories):

food_category = '4d4b7105d754a06374d81259' # 'Root' category for all food-related venues

thai_restaurant_categories = ['4bf58dd8d48988d149941735','56aa371be4b08b9a8d573502','4bf58dd8d48988d156941735','5ae9595eb77c77002c2f9f26']

def is_restaurant(categories, specific_filter=None):
    restaurant_words = ['restaurant', 'diner', 'taverna', 'steakhouse','cuisine']
    restaurant = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in restaurant_words:
            if r in category_name:
                restaurant = True
        if 'fast food' in category_name:
            restaurant = False
        if specific_filter is not None and category_id in specific_filter:
            specific = True
            restaurant = True
    return restaurant, specific


def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    address = ', '.join(location['formattedAddress'])
    address = address.replace(', NSW', '')
    address = address.replace(', Australia', '')
    return address

def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=600, limit=100):
    version = '20200303'
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, client_secret, version, lat, lon, category, radius, limit)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]        
    except Exception:  # Request failed or the response had no venue groups
        venues = []
    return venues
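
To make the filtering rules concrete, the sketch below applies a condensed copy of `is_restaurant` to a few `(name, id)` category tuples of the kind returned by `get_categories`. The venue category names and the `hypothetical-id-*` values are made up for illustration; the two real IDs are taken from `thai_restaurant_categories` above.

```python
# Two of the IDs listed in thai_restaurant_categories above
THAI_OR_MALAYSIAN_IDS = ['4bf58dd8d48988d149941735', '4bf58dd8d48988d156941735']

def classify(categories, specific_filter=None):
    """Condensed copy of is_restaurant(): returns (is_restaurant, is_specific)."""
    keywords = ['restaurant', 'diner', 'taverna', 'steakhouse', 'cuisine']
    restaurant = False
    specific = False
    for name, category_id in categories:
        lowered = name.lower()
        if any(word in lowered for word in keywords):
            restaurant = True          # Keyword in the category name
        if 'fast food' in lowered:
            restaurant = False         # Fast food is explicitly excluded
        if specific_filter is not None and category_id in specific_filter:
            specific = True            # An ID match always counts as a restaurant
            restaurant = True
    return restaurant, specific

# Illustrative inputs only; names and 'hypothetical-id-*' values are invented
samples = {
    'keyword match': [('Italian Restaurant', 'hypothetical-id-1')],
    'fast food': [('Fast Food Restaurant', 'hypothetical-id-2')],
    'no keyword': [('Bakery', 'hypothetical-id-3')],
    'ID match': [('Noodle House', THAI_OR_MALAYSIAN_IDS[0])],
}
results = {label: classify(cats, THAI_OR_MALAYSIAN_IDS) for label, cats in samples.items()}
for label, outcome in results.items():
    print(label, '->', outcome)
```

Note that cafes and bakeries are dropped simply because none of the keywords appears in their category names; only fast food needs an explicit exclusion.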

In [164]:
# Let's now go over our neighborhood locations and get nearby restaurants; we'll also maintain a dictionary of all found restaurants and all found thai restaurants

import pickle

def get_restaurants(lats, lons):
    restaurants = {}
    thai_restaurants = {}
    location_restaurants = []

    print('Obtaining venues around candidate locations:', end='')
    for lat, lon in zip(lats, lons):
        venues = get_venues_near_location(lat, lon, food_category, client_id, client_secret, radius=600, limit=100)
        area_restaurants = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            is_res, is_thai = is_restaurant(venue_categories, specific_filter=thai_restaurant_categories)
            if is_res:
                x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])
                restaurant = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, is_thai, x, y)
                if venue_distance<=600:
                    area_restaurants.append(restaurant)
                restaurants[venue_id] = restaurant
                if is_thai:
                    thai_restaurants[venue_id] = restaurant
        location_restaurants.append(area_restaurants)
        print(' .', end='')
    print(' done.')
    return restaurants, thai_restaurants, location_restaurants

# Try to load from local file system in case we did this before
restaurants = {}
thai_restaurants = {}
location_restaurants = []
loaded = False
try:
    with open('restaurants_350.pkl', 'rb') as f:
        restaurants = pickle.load(f)
    with open('thai_restaurants_350.pkl', 'rb') as f:
        thai_restaurants = pickle.load(f)
    with open('location_restaurants_350.pkl', 'rb') as f:
        location_restaurants = pickle.load(f)
    print('Restaurant data loaded.')
    loaded = True
except Exception:
    pass  # No usable cache on disk; fall through to the Foursquare API call below

# If load failed use the Foursquare API to get the data
if not loaded:
    restaurants, thai_restaurants, location_restaurants = get_restaurants(latitudes, longitudes)
    
    # Let's persist this in the local file system
    with open('restaurants_350.pkl', 'wb') as f:
        pickle.dump(restaurants, f)
    with open('thai_restaurants_350.pkl', 'wb') as f:
        pickle.dump(thai_restaurants, f)
    with open('location_restaurants_350.pkl', 'wb') as f:
        pickle.dump(location_restaurants, f)
        

Restaurant data loaded.
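
The load-or-fetch pattern used in this cell can be wrapped in a small reusable helper. The following is a minimal sketch rather than the notebook's actual code: `load_or_compute` and `fake_fetch` are hypothetical names, and a temporary directory stands in for the working directory so the example leaves no files behind.

```python
import os
import pickle
import tempfile

def load_or_compute(path, compute):
    # Return (value, loaded_from_cache); call compute() only when no cache file exists
    if os.path.exists(path):
        with open(path, 'rb') as f:
            return pickle.load(f), True
    value = compute()
    with open(path, 'wb') as f:
        pickle.dump(value, f)
    return value, False

calls = []

def fake_fetch():
    # Stand-in for the expensive API call; records how often it runs
    calls.append(1)
    return {'venue-1': ('venue-1', 'Example Restaurant')}

with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, 'restaurants_demo.pkl')
    first, first_from_cache = load_or_compute(cache_path, fake_fetch)
    second, second_from_cache = load_or_compute(cache_path, fake_fetch)

print('API calls made:', len(calls))
print('Second result came from cache:', second_from_cache)
```

A cached file reflects the grid and filters that were in effect when it was written, so it should be deleted whenever those inputs change.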


In [165]:
import numpy as np

print('Total number of restaurants:', len(restaurants))
print('Total number of Thai plus Malay restaurants:', len(thai_restaurants))
print('Percentage of Thai plus Malay restaurants: {:.2f}%'.format(len(thai_restaurants) / len(restaurants) * 100))
print('Average number of restaurants in neighborhood:', np.array([len(r) for r in location_restaurants]).mean())


Total number of restaurants: 1838
Total number of Thai plus Malay restaurants: 256
Percentage of Thai plus Malay restaurants: 13.93%
Average number of restaurants in neighborhood: 1.3763736263736264


In [166]:
print('List of all restaurants')
print('-----------------------')
for r in list(restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(restaurants))


List of all restaurants
-----------------------
('4fe1abbae4b0fccda1268008', 'The Perky Pickle', -33.805127249031216, 151.26486287808711, 'Australia', 286, False, 4675539.023732124, -15249740.329566466)
('4c08e28effb8c9b60de26861', 'Bijolias', -33.79666549467954, 151.25245681605196, '5/538-540 Manly Rd., Balgowlah NSW 2093', 269, False, 4677317.609162321, -15250091.332525447)
('4b1e0d0ef964a520171724e3', 'Kohinoor Indian', -33.797266, 151.251201, '555 Sydney Rd., Seaforth NSW 2092', 295, False, 4677404.672134857, -15249952.594936864)
('4e424444b61ca5ba3b23d584', 'Balance Thai Restaurant', -33.79749411753701, 151.25073565755122, '7/50 Ethel St., Seaforth NSW 2092', 318, True, 4677436.574912272, -15249900.515259352)
('4c2c2899d1a10f471b26f964', "ck's bites", -33.80060532468794, 151.25716318925308, '73 New Street West (Wanganella Street), Balgowlah NSW 2093', 370, False, 4676596.259829709, -15249870.921735363)
('4c2fcc0ded37a59316f36703', 'Harrys Fish Cafe', -33.79299617587857, 151.246587

In [167]:
print('List of Thai plus Malay restaurants')
print('---------------------------')
for r in list(thai_restaurants.values())[:100]:
    print(r)
print('...')
print('Total:', len(thai_restaurants))


List of Thai plus Malay restaurants
---------------------------
('4e424444b61ca5ba3b23d584', 'Balance Thai Restaurant', -33.79749411753701, 151.25073565755122, '7/50 Ethel St., Seaforth NSW 2092', 318, True, 4677436.574912272, -15249900.515259352)
('4b936772f964a520d44234e3', 'Bai Yok Thai', -33.80238, 151.21315, 'Shop 2A, 122 Edinburgh Road, Castlecrag NSW 2068', 524, True, 4680877.18733583, -15247311.872232182)
('4b28b2a2f964a520b49424e3', 'Express Thai Noodle Hut', -33.79270189675285, 151.19563042281055, '339 Penshurst St (at Victoria Ave), City of Sydney NSW 2067', 289, True, 4683245.357876835, -15247534.20447998)
('51d3e556498ea03f125c72af', 'Khao Pla', -33.796495679067434, 151.1833691534041, 'Shop 7, 370-374 Victoria Ave. (at Anderson St.), Chatswood NSW 2067', 555, True, 4684227.101863276, -15246425.586656997)
('51f8cf99498ed18e08d929da', 'Tanoonmai', -33.79660704459571, 151.18419240724356, 'Australia', 586, True, 4684137.7680593515, -15246456.244357489)
('5171fc17e4b0bb055dc1ce

Let's now see all the collected restaurants on a map, with Thai and Malaysian restaurants shown as red dots and all other restaurants as blue dots.

In [168]:
map_sydney = folium.Map(location=sydney_center, zoom_start=13)
folium.Marker(sydney_center, popup='Townhall').add_to(map_sydney)
for res in restaurants.values():
    lat = res[2]; lon = res[3]
    is_thai = res[6]
    color = 'red' if is_thai else 'blue'
    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_color=color, fill_opacity=1).add_to(map_sydney)
map_sydney

So now we have all the restaurants in the area under evaluation, and we know which ones are Thai and Malaysian restaurants.

The data collection process is done. We're now ready to use this data to find the best location to open a Thai restaurant given our criteria.


## Methodology <a name="methodology"></a>

As mentioned earlier, our criteria for selecting the optimal location, listed by priority, are as follows:

1. Areas with no Thai or Malaysian restaurant nearby.
2. Areas with higher population density, as represented by restaurant density (based on the strong correlation between population and number of restaurants found in our earlier survey).



## Analysis <a name="analysis"></a>
Let's perform some basic exploratory data analysis and derive some additional information from our raw data. First, let's count the number of restaurants in every candidate area:

In [169]:
location_restaurants_count = [len(res) for res in location_restaurants]

df_locations['Restaurants in area'] = location_restaurants_count

print('Average number of restaurants in every area with radius=600m:', np.array(location_restaurants_count).mean())

df_locations.head(10)


Average number of restaurants in every area with radius=600m: 1.3763736263736264


| | Address | Latitude | Longitude | X | Y | Distance from center | Restaurants in area |
|---|---|---|---|---|---|---|---|
| 0 | 1 Addison Rd, Manly NSW 2095 | -33.811318 | 151.281864 | 4673447.0 | -15249910.0 | 11984.990613 | 0 |
| 1 | Reef Beach Track, Balgowlah Heights NSW 2093 | -33.80715 | 151.272504 | 4674647.0 | -15249910.0 | 11680.7534 | 0 |
| 2 | 24 Condamine St, Balgowlah Heights NSW 2093 | -33.802983 | 151.263147 | 4675847.0 | -15249910.0 | 11494.346436 | 1 |
| 3 | 5 Plant St, Balgowlah NSW 2093 | -33.798815 | 151.253791 | 4677047.0 | -15249910.0 | 11431.53533 | 2 |
| 4 | 65 Peacock St, Seaforth NSW 2092 | -33.794647 | 151.244438 | 4678247.0 | -15249910.0 | 11494.346436 | 1 |
| 5 | 79A Gurney Cres, Seaforth NSW 2092 | -33.790479 | 151.235086 | 4679447.0 | -15249910.0 | 11680.7534 | 0 |
| 6 | 7 Bampi Pl, Castle Cove NSW 2069 | -33.78631 | 151.225736 | 4680647.0 | -15249910.0 | 11984.990613 | 0 |
| 7 | Hole in Wall Track, Manly NSW 2095 | -33.824336 | 151.291582 | 4671647.0 | -15248870.0 | 11711.532778 | 0 |
| 8 | Hospital Rd, Manly NSW 2095 | -33.820168 | 151.28222 | 4672847.0 | -15248870.0 | 11208.925015 | 0 |
| 9 | Balgowlah NSW 2093 | -33.816 | 151.27286 | 4674047.0 | -15248870.0 | 10816.653826 | 0 |


Now let's generate a heat map showing the density of restaurants across the areas.

In [170]:
restaurant_latlons = [[res[2], res[3]] for res in restaurants.values()]

thai_latlons = [[res[2], res[3]] for res in thai_restaurants.values()]

In [171]:
from folium import plugins
from folium.plugins import HeatMap

map_sydney = folium.Map(location=sydney_center, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_sydney) #cartodbpositron cartodbdark_matter
HeatMap(restaurant_latlons).add_to(map_sydney)
folium.Marker(sydney_center).add_to(map_sydney)
map_sydney

Next, let's calculate the two most important quantities for each candidate location: the number of restaurants in the vicinity (we'll use a radius of 600 meters) and the distance to the closest Thai or Malaysian restaurant.

In [172]:
def count_restaurants_nearby(x, y, restaurants, radius=600):    
    count = 0
    for res in restaurants.values():
        res_x = res[7]; res_y = res[8]
        d = calc_xy_distance(x, y, res_x, res_y)
        if d<=radius:
            count += 1
    return count

def find_nearest_restaurant(x, y, restaurants):
    d_min = 100000
    for res in restaurants.values():
        res_x = res[7]; res_y = res[8]
        d = calc_xy_distance(x, y, res_x, res_y)
        if d<=d_min:
            d_min = d
    return d_min

restaurant_counts = []
thai_distances = []

print('Generating data on location candidates... ', end='')
for x, y in zip(xs, ys):
    count = count_restaurants_nearby(x, y, restaurants, radius=600)
    restaurant_counts.append(count)
    distance = find_nearest_restaurant(x, y, thai_restaurants)
    thai_distances.append(distance)
print('done.')

Generating data on location candidates... done.
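
The two helpers above can be exercised on a handful of made-up points to confirm what they compute. The sketch below works on plain `(x, y)` tuples instead of the restaurant tuples used in the notebook; `count_within` and `nearest_distance` are hypothetical names, all coordinates are invented, and `float('inf')` replaces the hard-coded 100000 as the "no competitor found" value.

```python
import math

def count_within(x, y, points, radius=600):
    # Number of points no farther than radius from (x, y)
    return sum(1 for px, py in points if math.hypot(px - x, py - y) <= radius)

def nearest_distance(x, y, points):
    # Distance to the closest point; infinity when the list is empty
    return min((math.hypot(px - x, py - y) for px, py in points), default=float('inf'))

# Invented projected coordinates, for illustration only
all_restaurants = [(0, 100), (300, 400), (500, 0), (900, 900), (2000, 0)]
competitors = [(900, 900), (2000, 0)]      # The Thai/Malaysian subset
candidates = [(0, 0), (1500, 0)]

counts = [count_within(x, y, all_restaurants) for x, y in candidates]
distances = [nearest_distance(x, y, competitors) for x, y in candidates]
average_distance = sum(distances) / len(distances)

print('Restaurants within 600:', counts)
print('Distance to nearest competitor:', [round(d, 1) for d in distances])
print('Average distance to nearest competitor:', round(average_distance, 1))
```

Averaging the per-candidate nearest distances in this way is how a summary figure for the whole grid can be obtained from `thai_distances`.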


The average distance from each area center to the closest Thai or Malaysian restaurant is approximately 1464 meters (1463.9976627014726).

From the heat map generated above, it looks like restaurant density decreases with distance from the city center.




In [173]:
df_locations = pd.DataFrame({'Latitude':latitudes,
                                 'Longitude':longitudes,
                                 'X':xs,
                                 'Y':ys,
                                 'Restaurants nearby':restaurant_counts,
                                 'Distance to Thai or Malaysian restaurant':thai_distances})

df_locations.head(10)

| | Latitude | Longitude | X | Y | Restaurants nearby | Distance to Thai or Malaysian restaurant |
|---|---|---|---|---|---|---|
| 0 | -33.811318 | 151.281864 | 4673447.0 | -15249910.0 | 0 | 3989.419323 |
| 1 | -33.80715 | 151.272504 | 4674647.0 | -15249910.0 | 0 | 2789.42058 |
| 2 | -33.802983 | 151.263147 | 4675847.0 | -15249910.0 | 1 | 1589.423737 |
| 3 | -33.798815 | 151.253791 | 4677047.0 | -15249910.0 | 4 | 389.44635 |
| 4 | -33.794647 | 151.244438 | 4678247.0 | -15249910.0 | 1 | 810.597991 |
| 5 | -33.790479 | 151.235086 | 4679447.0 | -15249910.0 | 0 | 2010.589402 |
| 6 | -33.78631 | 151.225736 | 4680647.0 | -15249910.0 | 0 | 2603.654174 |
| 7 | -33.824336 | 151.291582 | 4671647.0 | -15248870.0 | 0 | 2896.030453 |
| 8 | -33.820168 | 151.28222 | 4672847.0 | -15248870.0 | 0 | 3336.711019 |
| 9 | -33.816 | 151.27286 | 4674047.0 | -15248870.0 | 0 | 3543.744927 |


Let us now filter those locations: we're interested only in locations with no Thai or Malaysian restaurant within a radius of 1 km that also have at least 5 restaurants within 600 meters.

In [174]:
good_res_count = np.array((df_locations['Restaurants nearby']>=5))
print('Locations with at least five restaurants nearby:', good_res_count.sum())

good_th_distance = np.array(df_locations['Distance to Thai or Malaysian restaurant']>=1000)
print('Locations with no Thai and Malaysian restaurants within 1km:', good_th_distance.sum())

good_locations = np.logical_and(good_res_count, good_th_distance)
print('Locations with both conditions met:', good_locations.sum())

df_good_locations = df_locations[good_locations]

Locations with at least five restaurants nearby: 93
Locations with no Thai and Malaysian restaurants within 1km: 163
Locations with both conditions met: 8
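
The same two-condition filter can be written without any dependencies, which makes the thresholds and the inclusive comparisons explicit. This is a sketch on invented rows, not the notebook's data; `meets_criteria` and the dictionary keys are hypothetical names.

```python
# Invented candidate rows, for illustration only
candidates = [
    {'name': 'A', 'restaurants_nearby': 7, 'distance_to_competitor': 1350.0},
    {'name': 'B', 'restaurants_nearby': 4, 'distance_to_competitor': 2200.0},
    {'name': 'C', 'restaurants_nearby': 12, 'distance_to_competitor': 310.0},
    {'name': 'D', 'restaurants_nearby': 5, 'distance_to_competitor': 1000.0},
]

def meets_criteria(candidate, min_restaurants=5, min_distance=1000):
    # Both comparisons are inclusive, matching the >= used in the cell above
    enough_restaurants = candidate['restaurants_nearby'] >= min_restaurants
    no_competitor_close = candidate['distance_to_competitor'] >= min_distance
    return enough_restaurants and no_competitor_close

good = [c['name'] for c in candidates if meets_criteria(c)]
print('Candidates meeting both conditions:', good)
```

Candidate B fails on restaurant count and C on competitor distance, while D passes because both thresholds are inclusive. Individually many locations pass each test, but few pass both, because busy areas tend to already have a competitor.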


Let's see how this looks on a map. Each blue dot marks the center of an area that meets all the criteria mentioned above, while the heat map shows the density of restaurants.

In [175]:
good_latitudes = df_good_locations['Latitude'].values
good_longitudes = df_good_locations['Longitude'].values

good_locations = [[lat, lon] for lat, lon in zip(good_latitudes, good_longitudes)]

map_sydney = folium.Map(location=sydney_center, zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(map_sydney)
HeatMap(restaurant_latlons).add_to(map_sydney)
folium.Marker(sydney_center).add_to(map_sydney)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=4, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_sydney) 
map_sydney

We now have a few locations with at least 5 restaurants within a radius of 600 meters and no Thai or Malaysian restaurant within 1 km. These locations are potential candidates for a new Thai restaurant.

## Results and Discussion <a name="results"></a>

We have identified 8 potential areas that meet our location criteria. Which of them is best, however, depends on other factors and on the stakeholders' preferences. For example, if the stakeholders want to target high-income customers, a wealthy area like Mosman would be the best option. If the focus is instead on working customers aged around 25-40, a suburb like Wolli Creek in the southwest would be preferable.

The blue dots of the 8 candidates were presented along with the heat map representing restaurant density in the area. Given the criteria and everything else being equal, a blue dot situated in a more 'heated' area would be considered preferable.
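
One way to turn "more heated is better" into an explicit ordering is to sort the shortlisted candidates by restaurant count, breaking ties by distance to the nearest competitor. This is only a sketch of one possible ranking rule, with invented rows and hypothetical key names; the tie-break is an assumption, not something the criteria above prescribe.

```python
# Invented shortlist, for illustration only
shortlist = [
    {'name': 'A', 'restaurants_nearby': 7, 'distance_to_competitor': 1350.0},
    {'name': 'D', 'restaurants_nearby': 5, 'distance_to_competitor': 1000.0},
    {'name': 'E', 'restaurants_nearby': 7, 'distance_to_competitor': 1800.0},
]

# More restaurants first; among equals, the candidate farther from a competitor first
ranked = sorted(
    shortlist,
    key=lambda c: (-c['restaurants_nearby'], -c['distance_to_competitor']),
)
ranked_names = [c['name'] for c in ranked]
print('Ranking:', ranked_names)
```

Any such ranking is only a starting point, since the final choice also depends on the stakeholder preferences discussed above.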


## Conclusion <a name="conclusion"></a>

In this project, we aimed to identify the best location to open a Thai restaurant given our criteria, which are based on the survey and study conducted prior to this project. That is to say, a good location for a Thai restaurant is an area that has a high density of restaurants (representing high population density) but no Thai or Malaysian restaurant yet.

By using the Google Maps API to pinpoint the coordinates of different areas in Sydney and data from Foursquare to identify the restaurants and their types, we were able to build the dataframe and generate the map that depicts the features we need for our analysis: restaurant density, shown by the heat map, and the absence of direct competitors (Thai and Malaysian restaurants) within a certain proximity, shown by the blue dots.

Ultimately, the findings of this analysis have to be used along with other factors that depend on the final preferences of the stakeholders, e.g. whether the restaurant will be high-end or mid-range.