# Capstone Project - The Battle of the Neighborhoods

Applied Data Science Capstone by IBM/Coursera

Table of contents
- Introduction: Business Problem
- Data
- Methodology
- Analysis
- Results and Discussion
- Conclusion

# Introduction/Business Problem

My client is a global chain of Asian restaurants. They would like to expand their business by opening a new branch in Dublin, Ireland. Before investing, they want an overview of the restaurants in Dublin, especially the Asian restaurants, and of the Asian markets that can supply specific ingredients for their menu. They also want insight into competitors, such as which Asian restaurants are highly recommended by customers and which areas have a high density of restaurants.

In this project, we will try to find an ideal location for their restaurant. Since there are many restaurants in Dublin, we will prioritize locations that are not crowded with restaurants. We are also particularly interested in areas with few Asian restaurants. Provided the first two conditions are met, we prefer locations close to an Asian market.

We will apply data science methods to identify a few promising neighborhoods that fit these criteria. The areas with the most advantages will be presented to the client so that the best location can be chosen.

# Data Section

Given the criteria above, the work breaks down into the following tasks:
- Investigating the existing restaurants in the neighborhood
- Investigating the existing Asian restaurants in the neighborhood and the distances between them
- Investigating the Asian markets in the neighborhood

The following data sources will be used to obtain the required information:
- The candidate locations and their approximate addresses in Dublin will be generated using Google Maps API geocoding and reverse geocoding.
- The number of restaurants, and their type, rating and location, in every neighborhood will be obtained from the Foursquare API.
- The number of Asian markets will also be obtained from the Foursquare API.


# Dublin City Center with neighborhoods

Let's create latitude & longitude coordinates for the centroids of our candidate neighborhoods. We will create a grid of cells covering our area of interest, which is approximately 12x12 kilometers around the Dublin city center (the extent produced by the grid code further down).

Let's first find the latitude & longitude of the Dublin city center, using a specific, well-known address and the Google Maps geocoding API.

In [1]:
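# API credentials for Foursquare and Google are read from environment variables
# rather than stored in the notebook. The variable names are an assumption of this
# write-up; use whatever names you export in your own environment.
import os

CLIENT_ID = os.environ.get("FOURSQUARE_CLIENT_ID", "")
CLIENT_SECRET = os.environ.get("FOURSQUARE_CLIENT_SECRET", "")
GOOGLE_API_KEY = os.environ.get("GOOGLE_API_KEY", "")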


# Others imports:
from IPython.display import Image
import pickle
import json
import requests
import folium
import pandas as pd

!pip3 install shapely
!pip3 install geos
import shapely.geometry


!pip3 install pyproj
import pyproj

import math
import warnings
warnings.simplefilter("ignore")



In [2]:
import requests

def get_coordinates(api_key, address, verbose=False):
    """Geocode an address with the Google Maps Geocoding API; returns [lat, lon] or [None, None]."""
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json'
        # Passing params lets requests URL-encode the address (spaces, commas)
        response = requests.get(url, params={'key': api_key, 'address': address}, timeout=10).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        geographical_data = results[0]['geometry']['location'] # get geographical coordinates
        lat = geographical_data['lat']
        lon = geographical_data['lng']
        return [lat, lon]
    except Exception:  # network error, bad JSON, or no results
        return [None, None]

address = 'Dublin, Ireland'
Dublin_center = get_coordinates(GOOGLE_API_KEY, address)
print('Coordinate of {}: {}'.format(address, Dublin_center))

Coordinate of Dublin, Ireland: [53.3498053, -6.2603097]
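
Query values such as `'Dublin, Ireland'` contain a space and a comma, which have to be percent-encoded when they travel in a URL. As a minimal sketch using only the standard library, this is what the encoded query string looks like; `build_geocode_url` and the placeholder key `YOUR_API_KEY` are hypothetical names used for illustration, not part of the notebook.

```python
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"

def build_geocode_url(api_key, address):
    """Return a geocoding URL whose query string is percent-encoded."""
    query = urlencode({"key": api_key, "address": address})
    return "{}?{}".format(GEOCODE_ENDPOINT, query)

# 'YOUR_API_KEY' is a placeholder, not a real credential
url = build_geocode_url("YOUR_API_KEY", "Dublin, Ireland")
print(url)
```

The comma becomes `%2C` and the space becomes `+`, so the address reaches the service as a single parameter.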


Next we create a grid of equally spaced candidate areas around the city center, extending up to roughly 7 km from it along each axis.
Our neighborhoods will be defined as circular areas with a radius of 300 meters, so our neighborhood centers will be 600 meters apart.
To calculate distances accurately, we need to create our grid of locations in a 2D Cartesian coordinate system, which allows us to calculate distances in meters (not in latitude/longitude degrees). Then we'll project those coordinates back to latitude/longitude degrees to show them on a Folium map. So let's create functions to convert between the WGS84 spherical coordinate system (latitude/longitude degrees) and the UTM Cartesian coordinate system (X/Y coordinates in meters).

In [3]:
import shapely.geometry

#!pip install pyproj
import pyproj

import math

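# NOTE: Dublin (longitude about -6.26) lies in UTM zone 29, not zone 33. Zone 33 is kept
# here because every output below was produced with it; this far from that zone's central
# meridian the grid comes out rotated and distances are off by a couple of percent.
# pyproj.transform is deprecated since pyproj 2.6.1 in favor of pyproj.Transformer.
def lonlat_to_xy(lon, lat):
    """Convert WGS84 longitude/latitude (degrees) to UTM X/Y (meters)."""
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    """Convert UTM X/Y (meters) back to WGS84 longitude/latitude (degrees)."""
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]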

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

print('Coordinate transformation check')
print('-------------------------------')
print('Dublin center longitude={}, latitude={}'.format(Dublin_center[1], Dublin_center[0]))
x, y = lonlat_to_xy(Dublin_center[1], Dublin_center[0])
print('Dublin center UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Dublin center longitude={}, latitude={}'.format(lo, la))

Coordinate transformation check
-------------------------------
Dublin center longitude=-6.2603097, latitude=53.3498053
Dublin center UTM X=-905222.6802309377, Y=6124550.318313507
Dublin center longitude=-6.260309699999997, latitude=53.3498053
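
The conversion cell above hard-codes UTM zone 33. UTM zones are 6 degrees of longitude wide and numbered from 180°W, so the zone for a given longitude can be computed directly. The sketch below uses only the standard library and ignores the special zones around Norway and Svalbard; `utm_zone` and `utm_central_meridian` are hypothetical helper names, not functions used elsewhere in this notebook.

```python
def utm_zone(longitude_deg):
    """Standard UTM zone number (1-60) for a longitude in degrees."""
    return int((longitude_deg + 180) // 6) % 60 + 1

def utm_central_meridian(zone):
    """Central meridian of a UTM zone, in degrees of longitude."""
    return zone * 6 - 183

dublin_lon = -6.2603097
dublin_zone = utm_zone(dublin_lon)
print('UTM zone for Dublin:', dublin_zone)
print('Central meridian of that zone:', utm_central_meridian(dublin_zone))
print('Central meridian of zone 33:', utm_central_meridian(33))
```

For Dublin the formula gives zone 29, whose central meridian is 9°W, while zone 33 is centered on 15°E. Projecting in a zone that far away still round-trips correctly, as the check above shows, but distances measured in the projected plane are less accurate.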


Now let's create a hexagonal grid of cells: we offset every other row and adjust the vertical row spacing so that every cell center is equally distant from all its neighbors.

In [4]:
Dublin_center_x, Dublin_center_y = lonlat_to_xy(Dublin_center[1], Dublin_center[0]) # City center in Cartesian coordinates

k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_min = Dublin_center_x - 7000
x_step = 600
y_min = Dublin_center_y - 7000 - (int(21/k)*k*600 - 14000)/2
y_step = 600 * k 

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = 300 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(Dublin_center_x, Dublin_center_y, x, y)
        if (distance_from_center <= 10001):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(latitudes), 'candidate neighborhood centers generated.')

504 candidate neighborhood centers generated.
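
The row spacing of `600 * sqrt(3) / 2` together with the half-step offset on alternating rows is what makes every interior point equidistant from its six neighbors. As a small self-contained check (a sketch on a tiny 5x5 grid at the origin, not the Dublin grid itself; `hex_grid` and `neighbor_distances` are hypothetical helper names):

```python
import math

def hex_grid(x_min, y_min, step, rows, cols):
    """Generate hexagonal grid points: rows are step*sqrt(3)/2 apart, even rows shifted by step/2."""
    k = math.sqrt(3) / 2
    points = []
    for i in range(rows):
        y = y_min + i * step * k
        x_offset = step / 2 if i % 2 == 0 else 0.0
        for j in range(cols):
            points.append((x_min + j * step + x_offset, y))
    return points

def neighbor_distances(points, center, step):
    """Distances from center to every other point closer than 1.5 * step."""
    cx, cy = center
    result = []
    for x, y in points:
        d = math.hypot(x - cx, y - cy)
        if 0 < d < 1.5 * step:
            result.append(d)
    return result

points = hex_grid(0.0, 0.0, 600.0, rows=5, cols=5)
center = points[2 * 5 + 2]  # an interior point: row 2, column 2
dists = neighbor_distances(points, center, 600.0)
print(len(dists), 'neighbors at distances', [round(d, 3) for d in dists])
```

An interior point ends up with six neighbors, all one grid step (600 m) away.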


Visualizing the data we have so far: city center location and candidate neighborhood centers.

In [5]:
!pip install folium

import folium



In [6]:
map_Dublin = folium.Map(location=Dublin_center, zoom_start=13)
folium.Marker(Dublin_center, popup='Dublin').add_to(map_Dublin)
for lat, lon in zip(latitudes, longitudes):
    #folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_Dublin) 
    folium.Circle([lat, lon], radius=300, color='blue', fill=False).add_to(map_Dublin)
    #folium.Marker([lat, lon]).add_to(map_Dublin)
map_Dublin

Now we have the coordinates of the centers of the neighborhoods/areas to be evaluated, equally spaced (the distance from every point to its neighbors is exactly the same) and extending up to roughly 7 km from the Dublin center along each axis.

Let's now use the Google Maps API to get approximate addresses of those locations.

In [7]:
def get_address(api_key, latitude, longitude, verbose=False):
    """Reverse-geocode a latitude/longitude pair; returns the formatted address or None."""
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&latlng={},{}'.format(api_key, latitude, longitude)
        response = requests.get(url, timeout=10).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        address = results[0]['formatted_address']
        return address
    except Exception:  # network error, bad JSON, or no results
        return None

addr = get_address(GOOGLE_API_KEY, Dublin_center[0], Dublin_center[1])
print('Reverse geocoding check')
print('-----------------------')
print('Address of [{}, {}] is: {}'.format(Dublin_center[0], Dublin_center[1], addr))

Reverse geocoding check
-----------------------
Address of [53.3498053, -6.2603097] is: 11 Henry St, North City, Dublin, Ireland


In [8]:
print('Obtaining location addresses: ', end='')
addresses = []
for lat, lon in zip(latitudes, longitudes):
    address = get_address(GOOGLE_API_KEY, lat, lon)
    if address is None:
        address = 'NO ADDRESS'
    address = address.replace(', Ireland', '') # We don't need country part of address
    addresses.append(address)
    print(' .', end='')
print(' done.')

Obtaining location addresses:  . . . . . . . . . . ...

In [9]:
addresses[150:170]

["Saint Damian's National School, Cherryfield Dr, Walkinstown, Dublin 12",
 '95 Kimmage Rd W, Royston, Crumlin, Dublin 12, D12 F603',
 '163 Stannaway Rd, Crumlin, Dublin 12, D12 X0K6',
 "Our Lady's Hope, Bangor Rd, Crumlin, Dublin 12",
 "9 Mount Argus View, Harold's Cross, Dublin 6W, Co. Dublin, D6W V259",
 '14 Shamrock Villas, Terenure, Dublin, Co. Dublin',
 '38 Grosvenor Ln, Rathmines, Dublin',
 '29 Observatory Ln, Rathmines, Dublin',
 '18B Northbrook Ave, Park View, Ranelagh, Dublin 6, D06 YP99',
 'Leeson Village, stop 2795, Dublin, Co. Dublin',
 '8 Raglan Rd, Ballsbridge, Dublin, D04 EA36',
 'Ballsbridge Gardens, 2 Shelbourne Rd, Ballsbridge, Dublin',
 '1 Tritonville Rd, Dublin 4, D04 Y0A8',
 '25A Beach Rd, Dublin 4, D04 NF88',
 'Unnamed Road, Pembroke - Rathmines, Dublin',
 'Waste Water Treatment Plant, 4, Pigeon House Rd, Dublin',
 'Poolbeg Power Station, Pigeon House Rd, Dublin 4',
 'S Wall, Co. Dublin',
 '12 Ballymount Dr, Wilkinstown, Dublin',
 '7 Ballymount Rd Lower, Wilkinst

Now let's put everything into a pandas DataFrame.

In [10]:
import pandas as pd

df_locations = pd.DataFrame({'Address': addresses,
                             'Latitude': latitudes,
                             'Longitude': longitudes,
                             'X': xs,
                             'Y': ys,
                             'Distance from center': distances_from_center})

df_locations.head(10)

| | Address | Latitude | Longitude | X | Y | Distance from center |
|---|---|---|---|---|---|---|
| 0 | 6 Castlefield Grove, Castlefield Manor, Dublin... | 53.280045 | -6.326749 | -911922.680231 | 6118315.0 | 9152.59526 |
| 1 | 2 Beverly Dr, Bóthar Bhaile Scallart, Schollar... | 53.281617 | -6.318367 | -911322.680231 | 6118315.0 | 8722.958214 |
| 2 | 20 Templeroan Cres, Ballyroan, Dublin 16, D16 ... | 53.283189 | -6.309983 | -910722.680231 | 6118315.0 | 8314.445261 |
| 3 | St. Marys Convent, Ballyroan, Dublin, Co. Dublin | 53.28476 | -6.301599 | -910122.680231 | 6118315.0 | 7930.321557 |
| 4 | 121 R115, Willbrook, Dublin | 53.28633 | -6.293215 | -909522.680231 | 6118315.0 | 7574.298647 |
| 5 | 17 St Endas Park, Rathfarnham, Dublin, D14 TP97 | 53.2879 | -6.284829 | -908922.680231 | 6118315.0 | 7250.517223 |
| 6 | 38 Aranleigh Ct, Rathfarnham, Dublin 14, D14 N6C6 | 53.289469 | -6.276442 | -908322.680231 | 6118315.0 | 6963.476143 |
| 7 | Unnamed Road, Whitehall, Dublin | 53.291038 | -6.268055 | -907722.680231 | 6118315.0 | 6717.886572 |
| 8 | 376 Nutgrove Ave, Rathfarnham, Dublin, D14 YE39 | 53.292606 | -6.259667 | -907122.680231 | 6118315.0 | 6518.435395 |
| 9 | 3 Woodlawn Terrace, Churchtown Lower, Dublin | 53.294174 | -6.251278 | -906522.680231 | 6118315.0 | 6369.458376 |


In [105]:
df_locations.shape

(504, 8)

Now let's save this data to a local file.

In [11]:
df_locations.to_pickle('./locations.pkl')

### Foursquare
Now that we have our location candidates, let's use Foursquare API to get info on restaurants in each neighborhood.

We're interested in venues in the 'food' category, but only those that are proper restaurants: coffee shops, pizza places, bakeries etc. are not direct competitors, so we don't care about those. We will therefore include in our list only venues that have 'restaurant' in the category name, and we'll make sure to detect and include all the subcategories of the specific 'Asian restaurant' category, as we need info on Asian restaurants (Chinese, Vietnamese, Thai, Indian, Japanese, Korean, ...) in the neighborhood.

Foursquare credentials are defined in the cell below.

In [12]:
# Reuse the Foursquare credentials defined in the first cell
foursquare_client_id = CLIENT_ID
foursquare_client_secret = CLIENT_SECRET

In [13]:
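# Category IDs corresponding to Asian restaurants were taken from Foursquare web site (https://developer.foursquare.com/docs/resources/categories):

food_category = '4d4b7105d754a06374d81259' # 'Root' category for all food-related venues

# Only a single category ID is listed here; a venue whose category ID differs
# (for example a subcategory ID) is not flagged as Asian unless that ID is added.
asian_restaurant_categories = ['4bf58dd8d48988d142941735']

def is_restaurant(categories, specific_filter=None):
    """Return (is_restaurant, is_specific) for a list of (category_name, category_id) tuples."""
    # Keywords are lowercase because category names are lowercased before matching
    restaurant_words = ['restaurant', 'diner', 'noodle', 'satay', 'hotpot', 'chinese', 'vietnamese', 'japanese', 'korean', 'thai']
    restaurant = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in restaurant_words:
            if r in category_name:
                restaurant = True
        # Fast food venues are not treated as direct competitors
        if 'fast food' in category_name:
            restaurant = False
        # A venue in one of the specific categories always counts as a restaurant
        if specific_filter is not None and category_id in specific_filter:
            specific = True
            restaurant = True
    return restaurant, specific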

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    address = ', '.join(location['formattedAddress'])
    address = address.replace(', Dublin', '')
    address = address.replace(', Ireland', '')
    return address

def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=500, limit=100):
    version = '20180724'
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, client_secret, version, lat, lon, category, radius, limit)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]        
    except:
        venues = []
    return venues
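
To make the category filter easier to follow, here is a simplified, order-independent variant of the same idea run on a few hand-written category tuples. It is only a sketch: `classify` is a hypothetical name, the venue IDs marked `hypothetical-id-*` are invented for illustration, and the keyword list is shortened.

```python
RESTAURANT_WORDS = ("restaurant", "diner", "noodle")
# The single category ID used in the cell above
ASIAN_CATEGORY_IDS = {"4bf58dd8d48988d142941735"}

def classify(categories, asian_ids=ASIAN_CATEGORY_IDS):
    """categories is a list of (name, id) tuples; returns (is_restaurant, is_asian)."""
    names = [name.lower() for name, _ in categories]
    ids = {category_id for _, category_id in categories}
    is_asian = bool(ids & asian_ids)
    is_fast_food = any("fast food" in n for n in names)
    matches_word = any(w in n for n in names for w in RESTAURANT_WORDS)
    return (is_asian or (matches_word and not is_fast_food)), is_asian

samples = {
    "listed Asian ID": [("Asian Restaurant", "4bf58dd8d48988d142941735")],
    "fast food": [("Fast Food Restaurant", "hypothetical-id-1")],
    "coffee shop": [("Coffee Shop", "hypothetical-id-2")],
    "unlisted subcategory": [("Chinese Restaurant", "hypothetical-id-3")],
}
for label, cats in samples.items():
    print(label, '->', classify(cats))
```

The last sample counts as a restaurant but not as an Asian one, because its ID is not in the filter list. That is one possible reason why some venues with Asian-sounding names appear with `False` in the Asian flag later on.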

In [14]:
# Let's now go over our neighborhood locations and get nearby restaurants; we'll also maintain a dictionary of all found restaurants and all found Asian restaurants

import pickle

def get_restaurants(lats, lons):
    restaurants = {}
    asian_restaurants = {}
    location_restaurants = []

    print('Obtaining venues around candidate locations:', end='')
    for lat, lon in zip(lats, lons):
        # Using radius=350 to make sure we have overlaps/full coverage so we don't miss any restaurant (we're using dictionaries to remove any duplicates resulting from area overlaps)
        venues = get_venues_near_location(lat, lon, food_category, foursquare_client_id, foursquare_client_secret, radius=350, limit=100)
        area_restaurants = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            is_res, is_asian = is_restaurant(venue_categories, specific_filter=asian_restaurant_categories)
            if is_res:
                x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])
                restaurant = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, is_asian, x, y)
                if venue_distance<=300:
                    area_restaurants.append(restaurant)
                restaurants[venue_id] = restaurant
                if is_asian:
                    asian_restaurants[venue_id] = restaurant
        location_restaurants.append(area_restaurants)
        print(' .', end='')
    print(' done.')
    return restaurants, asian_restaurants, location_restaurants

# Try to load from local file system in case we did this before
restaurants = {}
asian_restaurants = {}
location_restaurants = []
loaded = False
try:
    with open('restaurants_350.pkl', 'rb') as f:
        restaurants = pickle.load(f)
    with open('asian_restaurants_350.pkl', 'rb') as f:
        asian_restaurants = pickle.load(f)
    with open('location_restaurants_350.pkl', 'rb') as f:
        location_restaurants = pickle.load(f)
    print('Restaurant data loaded.')
    loaded = True
except:
    pass

# If load failed use the Foursquare API to get the data
if not loaded:
    restaurants, asian_restaurants, location_restaurants = get_restaurants(latitudes, longitudes)
    
    # Let's persist this in the local file system
    with open('restaurants_350.pkl', 'wb') as f:
        pickle.dump(restaurants, f)
    with open('asian_restaurants_350.pkl', 'wb') as f:
        pickle.dump(asian_restaurants, f)
    with open('location_restaurants_350.pkl', 'wb') as f:
        pickle.dump(location_restaurants, f)

Restaurant data loaded.
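
The cell above follows a load-or-compute pattern: try the local pickle first and call the API only when no cached file is available. The same idea can be wrapped in a small helper. This is a sketch under assumptions: `load_or_compute` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for the real Foursquare calls.

```python
import os
import pickle
import tempfile

def load_or_compute(path, compute):
    """Return the pickled object at path, or call compute(), pickle the result there and return it."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except (FileNotFoundError, pickle.UnpicklingError, EOFError):
        result = compute()
        with open(path, "wb") as f:
            pickle.dump(result, f)
        return result

calls = []

def fake_fetch():
    """Stand-in for an expensive API call; records how often it runs."""
    calls.append(1)
    return {"venue-1": ("venue-1", "Example Restaurant")}

with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "restaurants_demo.pkl")
    first = load_or_compute(cache_path, fake_fetch)   # computes and writes the cache
    second = load_or_compute(cache_path, fake_fetch)  # reads the cache

print('fetch calls:', len(calls))
```

Catching only the specific exceptions, rather than everything, keeps unrelated errors visible instead of silently triggering a new round of API requests.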


In [15]:
import numpy as np

print('Total number of restaurants:', len(restaurants))
print('Total number of Asian restaurants:', len(asian_restaurants))
print('Percentage of Asian restaurants: {:.2f}%'.format(len(asian_restaurants) / len(restaurants) * 100))
print('Average number of restaurants in neighborhood:', np.array([len(r) for r in location_restaurants]).mean())

Total number of restaurants: 535
Total number of Asian restaurants: 34
Percentage of Asian restaurants: 6.36%
Average number of restaurants in neighborhood: 0.9801587301587301


In [16]:
print('List of all restaurants')
print('-----------------------')
for r in list(restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(restaurants))

List of all restaurants
-----------------------
('4c0009c837850f470d9e973f', 'Excelsior Take Away', 53.292611, -6.272054, 'Nutgrove Ave City', 276, False, -907929.9297510192, 6118567.389696925)
('4fa8097ee4b0fc2afce3e2ba', 'China Dynasty', 53.297354996115054, -6.233573858472655, 'Goatstown Rd City', 61, False, -905260.8041661368, 6118301.310391948)
('5a6f0a506a595012d8344f9a', 'Hx46', 53.297257, -6.233606, '4 Willowfield Park, Friarland 14, Clonskeagh', 59, True, -905266.2208366359, 6118291.301474564)
('54133227498eae39dc82bb96', 'Farmhill', 53.298543, -6.236507, '9 Farmhill Roaf', 191, False, -905411.7109359733, 6118490.12792218)
('4b2d271ef964a5209bcf24e3', 'Indian Gate', 53.303206036839065, -6.207979541688265, 'Nutgrove Shopping Centre, Nutgrove Way', 160, False, -903394.4330420226, 6118418.497126048)
('4c59dba16407d13ad23eb328', 'China House', 53.28390360063163, -6.337994669141318, 'Firhouse Shopping Centre City', 319, False, -912524.3771279724, 6118964.143572723)
('51f3088d498e0e1

In [17]:
print('List of Asian restaurants')
print('---------------------------')
for r in list(asian_restaurants.values()):
    print(r)
print('...')
print('Total:', len(asian_restaurants))

List of Asian restaurants
---------------------------
('5a6f0a506a595012d8344f9a', 'Hx46', 53.297257, -6.233606, '4 Willowfield Park, Friarland 14, Clonskeagh', 59, True, -905266.2208366359, 6118291.301474564)
('556b0eba498ef35cb1672122', 'Mao Asian Kitchen', 53.29107666015625, -6.297719955444336, 'Marion Rd, Dún Loaghaire', 184, True, -909655.0068834506, 6118923.036612354)
('5159e573e4b04e4afa989025', 'Diep', 53.29836705942237, -6.303728434471745, 'Templeogue Village', 41, True, -909798.6759581459, 6119838.507494776)
('4c8fa366daa93704767d56b1', 'Sunkist', 53.31402774354646, -6.2371204818998285, 'Clonskeagh Rd City', 283, True, -904926.681586121, 6120187.225275636)
('4ade0f33f964a520757121e3', 'Kites', 53.3292498537681, -6.231632591824455, '15-17 Ballsbridge Terrace 4', 311, True, -904053.0784568074, 6121731.832827633)
('4b277606f964a520a88624e3', 'Orchid Szechuan Restaurant', 53.33246293496805, -6.236250720015116, '120 Pembroke Rd. City', 278, True, -904244.8766007707, 6122175.121974

In [18]:
print('Restaurants around location')
print('---------------------------')
for i in range(50, 110):
    rs = location_restaurants[i][:8]
    names = ', '.join([r[1] for r in rs])
    print('Restaurants around location {}: {}'.format(i+1, names))

Restaurants around location
---------------------------
Restaurants around location 51: The Glenside, Taiping Chinese, The Elephant's Ear
Restaurants around location 52: 
Restaurants around location 53: 
Restaurants around location 54: Yumi
Restaurants around location 55: 
Restaurants around location 56: 
Restaurants around location 57: 
Restaurants around location 58: 
Restaurants around location 59: 
Restaurants around location 60: 
Restaurants around location 61: 
Restaurants around location 62: 
Restaurants around location 63: 
Restaurants around location 64: 
Restaurants around location 65: 
Restaurants around location 66: 
Restaurants around location 67: 
Restaurants around location 68: Diep, Reeves Restaurant
Restaurants around location 69: 
Restaurants around location 70: 
Restaurants around location 71: Kongs Chinese Takeaway
Restaurants around location 72: 
Restaurants around location 73: 
Restaurants around location 74: 
Restaurants around location 75: 
Restaurants around lo

Let's now show all the collected restaurants in blue and the Asian restaurants in red.

In [19]:
map_dublin = folium.Map(location=Dublin_center, zoom_start=13)
folium.Marker(Dublin_center, popup='Dublin').add_to(map_dublin)
for res in restaurants.values():
    lat = res[2]; lon = res[3]
    is_asian = res[6]
    color = 'red' if is_asian else 'blue'
    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_color=color, fill_opacity=1).add_to(map_dublin)
map_dublin

The map shows the distribution of all restaurants across the sampled area of Dublin, with the Asian restaurants highlighted.
We are now ready to analyze the gathered data and report on promising locations for a new Asian restaurant.


# Methodology 

In this project we look for areas of Dublin with a low density of restaurants, particularly those with few Asian restaurants.
We limit the analysis to the area within roughly 7 km of the city center.

In the first step, we collected the required data:

- The location and type of every restaurant within roughly 7 km of the Dublin center
- The identification of Asian restaurants

In the second step, we will calculate and explore the restaurant density across different areas of Dublin. Heatmaps will be used to identify promising areas close to the center with a low number of restaurants, and we will then focus on these areas in the next step.

Finally, we will create clusters of locations that meet the client's initial requirements:

- We keep locations with **no more than three restaurants within a radius of 400 meters** and **no Asian restaurant within a radius of 500 meters**.
- We will present a map of all such locations and also cluster them (using **k-means clustering**) to identify general zones / neighborhoods / addresses that can serve as a starting point for the client's investment.
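
The two selection criteria can be expressed as a small predicate on projected (X/Y, in meters) coordinates. The sketch below uses made-up points rather than the Dublin data; `count_within` and `is_good_location` are hypothetical names, and treating a venue exactly on the radius as "inside" is an assumption.

```python
import math

def count_within(point, venues, radius):
    """Number of venues whose distance from point is at most radius (meters)."""
    px, py = point
    return sum(1 for vx, vy in venues if math.hypot(vx - px, vy - py) <= radius)

def is_good_location(point, restaurants_xy, asian_xy,
                     max_restaurants=3, restaurant_radius=400, asian_radius=500):
    """True if the point has few restaurants nearby and no Asian restaurant nearby."""
    few_restaurants = count_within(point, restaurants_xy, restaurant_radius) <= max_restaurants
    no_asian_nearby = count_within(point, asian_xy, asian_radius) == 0
    return few_restaurants and no_asian_nearby

# Made-up coordinates in meters; the Asian restaurant is also part of the full list
restaurants_xy = [(100, 0), (0, 200), (300, 300), (1000, 1000)]
asian_xy = [(1000, 1000)]

print(is_good_location((0, 0), restaurants_xy, asian_xy))      # two restaurants nearby, no Asian one
print(is_good_location((900, 900), restaurants_xy, asian_xy))  # an Asian restaurant about 141 m away
```

The thresholds are keyword arguments so that the 400 m and 500 m radii can be adjusted without touching the logic.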


# Analysis 

Let's perform some basic exploratory data analysis and derive some additional info from our raw data. First, let's count the number of restaurants in every candidate area:

In [20]:
location_restaurants_count = [len(res) for res in location_restaurants]

df_locations['Restaurants in area'] = location_restaurants_count

print('Average number of restaurants in every area with radius=300m:', np.array(location_restaurants_count).mean())

df_locations

Average number of restaurants in every area with radius=300m: 0.9801587301587301


| | Address | Latitude | Longitude | X | Y | Distance from center | Restaurants in area |
|---|---|---|---|---|---|---|---|
| 0 | 6 Castlefield Grove, Castlefield Manor, Dublin... | 53.280045 | -6.326749 | -911922.680231 | 6.118315e+06 | 9152.595260 | 0 |
| 1 | 2 Beverly Dr, Bóthar Bhaile Scallart, Schollar... | 53.281617 | -6.318367 | -911322.680231 | 6.118315e+06 | 8722.958214 | 0 |
| 2 | 20 Templeroan Cres, Ballyroan, Dublin 16, D16 ... | 53.283189 | -6.309983 | -910722.680231 | 6.118315e+06 | 8314.445261 | 0 |
| 3 | St. Marys Convent, Ballyroan, Dublin, Co. Dublin | 53.284760 | -6.301599 | -910122.680231 | 6.118315e+06 | 7930.321557 | 0 |
| 4 | 121 R115, Willbrook, Dublin | 53.286330 | -6.293215 | -909522.680231 | 6.118315e+06 | 7574.298647 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 499 | Northwood Ave, Santry Demesne, Dublin | 53.404486 | -6.248886 | -902622.680231 | 6.130266e+06 | 6279.331175 | 1 |
| 500 | The Airport Hub, Unit 01 Furry Park Ind. Est, ... | 53.406055 | -6.240475 | -902022.680231 | 6.130266e+06 | 6550.572494 | 3 |
| 501 | 36A Turnapin Cottages, Turnapin Little, Dublin... | 53.407624 | -6.232063 | -901422.680231 | 6.130266e+06 | 6863.672486 | 0 |
| 502 | 8139 R139, Co. Dublin | 53.409192 | -6.223650 | -900822.680231 | 6.130266e+06 | 7213.182377 | 0 |


Now let's calculate the distance from every candidate area center to the nearest Asian restaurant (not only those within 300 m: we want the distance to the closest one, regardless of how far away it is).

In [21]:
distances_to_asian_restaurant = []

for area_x, area_y in zip(xs, ys):
    min_distance = 10000
    for res in asian_restaurants.values():
        res_x = res[7]
        res_y = res[8]
        d = calc_xy_distance(area_x, area_y, res_x, res_y)
        if d<min_distance:
            min_distance = d
    distances_to_asian_restaurant.append(min_distance)

df_locations['Distance to Asian restaurant'] = distances_to_asian_restaurant
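
The nested loop above is a nearest-neighbor search with a 10000 m starting value, so any area farther than 10 km from every Asian restaurant would simply be reported as 10000. A compact alternative sketch with `min` and `math.hypot` is shown below on made-up coordinates; `nearest_distance` is a hypothetical name, and returning `math.inf` for an empty list is a design choice of this sketch, not of the notebook.

```python
import math

def nearest_distance(point, targets):
    """Distance from point to the closest target, or math.inf when there are no targets."""
    px, py = point
    return min((math.hypot(tx - px, ty - py) for tx, ty in targets), default=math.inf)

# Made-up projected coordinates in meters
area_centers = [(0, 0), (600, 0)]
asian_xy = [(300, 400), (2000, 0)]

distances = [nearest_distance(center, asian_xy) for center in area_centers]
print(distances)
```

Both sample centers are 500 m from the restaurant at (300, 400), which is the classic 3-4-5 triangle scaled by 100.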

In [22]:
df_locations

| | Address | Latitude | Longitude | X | Y | Distance from center | Restaurants in area | Distance to Asian restaurant |
|---|---|---|---|---|---|---|---|---|
| 0 | 6 Castlefield Grove, Castlefield Manor, Dublin... | 53.280045 | -6.326749 | -911922.680231 | 6.118315e+06 | 9152.595260 | 0 | 2347.792471 |
| 1 | 2 Beverly Dr, Bóthar Bhaile Scallart, Schollar... | 53.281617 | -6.318367 | -911322.680231 | 6.118315e+06 | 8722.958214 | 0 | 1775.083511 |
| 2 | 20 Templeroan Cres, Ballyroan, Dublin 16, D16 ... | 53.283189 | -6.309983 | -910722.680231 | 6.118315e+06 | 8314.445261 | 0 | 1228.703973 |
| 3 | St. Marys Convent, Ballyroan, Dublin, Co. Dublin | 53.284760 | -6.301599 | -910122.680231 | 6.118315e+06 | 7930.321557 | 0 | 767.141080 |
| 4 | 121 R115, Willbrook, Dublin | 53.286330 | -6.293215 | -909522.680231 | 6.118315e+06 | 7574.298647 | 0 | 622.332242 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 499 | Northwood Ave, Santry Demesne, Dublin | 53.404486 | -6.248886 | -902622.680231 | 6.130266e+06 | 6279.331175 | 1 | 232.198592 |
| 500 | The Airport Hub, Unit 01 Furry Park Ind. Est, ... | 53.406055 | -6.240475 | -902022.680231 | 6.130266e+06 | 6550.572494 | 3 | 389.887964 |
| 501 | 36A Turnapin Cottages, Turnapin Little, Dublin... | 53.407624 | -6.232063 | -901422.680231 | 6.130266e+06 | 6863.672486 | 0 | 430.287364 |
| 502 | 8139 R139, Co. Dublin | 53.409192 | -6.223650 | -900822.680231 | 6.130266e+06 | 7213.182377 | 0 | 920.063010 |


In [23]:
df_locations.sort_values(by='Restaurants in area', ascending=False).head(10)

| | Address | Latitude | Longitude | X | Y | Distance from center | Restaurants in area | Distance to Asian restaurant |
|---|---|---|---|---|---|---|---|---|
| 242 | 49, 2 Dame St, Temple Bar, Dublin, Co. Dublin,... | 53.344407 | -6.263638 | -905622.680231 | 6124031.0 | 655.743852 | 44 | 111.423264 |
| 201 | 125 R816, Dublin | 53.33727 | -6.2507 | -905022.680231 | 6122991.0 | 1571.623365 | 22 | 132.027139 |
| 199 | 6 Grantham St, Saint Kevin's, Dublin, D08 FF80 | 53.334132 | -6.267494 | -906222.680231 | 6122991.0 | 1852.025918 | 18 | 350.09374 |
| 220 | Sherborne, Aungier St, Dublin 2 | 53.339269 | -6.265566 | -905922.680231 | 6123511.0 | 1252.996409 | 14 | 258.636029 |
| 180 | Baggot St Upper, stop 752, Dublin, Co. Dublin | 53.333701 | -6.244233 | -904722.680231 | 6122472.0 | 2137.755833 | 14 | 107.545367 |
| 285 | 36 Hill St, Dublin Northside, Dublin 1, D01 RW14 | 53.354681 | -6.259781 | -905022.680231 | 6125070.0 | 556.776436 | 13 | 118.559511 |
| 262 | 32 Arran St E, Smithfield, Dublin 7, D07 EF9P | 53.347974 | -6.270109 | -905922.680231 | 6124550.0 | 700.0 | 11 | 543.262797 |
| 264 | Colvill House 24-, 26 Talbot St, Mountjoy, Dub... | 53.351113 | -6.25331 | -904722.680231 | 6124550.0 | 500.0 | 10 | 476.051206 |
| 198 | 124 an Cuarbhóthar Theas, Dublin 8, D08 VYN5 | 53.332563 | -6.27589 | -906822.680231 | 6122991.0 | 2233.83079 | 9 | 577.649269 |
| 306 | 11 Sherrard Street Lower, Dublin, D01 KX08 | 53.359818 | -6.257852 | -904722.680231 | 6125590.0 | 1153.256259 | 9 | 191.436915 |


In [24]:
print('Average distance to closest Asian restaurant from each area center:', df_locations['Distance to Asian restaurant'].mean())

Average distance to closest Asian restaurant from each area center: 1450.4025389253254


Let's create a map showing a heatmap (density) of restaurants, with a few circles indicating distances of 1 km, 2 km and 3 km from the city center.

In [25]:
restaurant_latlons = [[res[2], res[3]] for res in restaurants.values()]

asian_latlons = [[res[2], res[3]] for res in asian_restaurants.values()]

In [26]:
from folium import plugins
from folium.plugins import HeatMap

map_dublin = folium.Map(location=Dublin_center, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_dublin) #cartodbpositron cartodbdark_matter
HeatMap(restaurant_latlons).add_to(map_dublin)
folium.Marker(Dublin_center).add_to(map_dublin)
folium.Circle(Dublin_center, radius=1000, fill=False, color='white').add_to(map_dublin)
folium.Circle(Dublin_center, radius=2000, fill=False, color='white').add_to(map_dublin)
folium.Circle(Dublin_center, radius=3000, fill=False, color='white').add_to(map_dublin)
map_dublin

It looks like a few pockets of low restaurant density close to the city center can be found **north, north-east and east of the center (Henry Street)**.

Let's create another map showing the **heatmap/density of Asian restaurants** only.

In [27]:
map_dublin = folium.Map(location=Dublin_center, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_dublin) #cartodbpositron cartodbdark_matter
HeatMap(asian_latlons).add_to(map_dublin)
folium.Marker(Dublin_center).add_to(map_dublin)
folium.Circle(Dublin_center, radius=1000, fill=False, color='white').add_to(map_dublin)
folium.Circle(Dublin_center, radius=2000, fill=False, color='white').add_to(map_dublin)
folium.Circle(Dublin_center, radius=3000, fill=False, color='white').add_to(map_dublin)
map_dublin

The heatmap of Asian restaurants is much less intense, which is expected because Asian restaurants account for only about 6% of all restaurants found in Dublin.
It also indicates that the highest density of Asian restaurants lies north-east of Henry Street (the center of Dublin).
Based on this, we will focus the analysis on the areas north and east of the Dublin center.
In particular, we will move the center of the area of interest and shrink it to a circle with a radius of about 1.2 km (the radius used in the code below), so our candidates lie mostly in Phibsborough and North Wall.

# Phibsborough and North Wall

Let's define a new, narrower region of interest, which will include the low-restaurant-count parts of Phibsborough and North Wall closest to the center.

In [276]:
roi_x_min = Dublin_center_x + 2200
roi_y_max = Dublin_center_y - 1400
roi_width = 2500
roi_height = 2500
roi_center_x = roi_x_min - 2000
roi_center_y = roi_y_max + 2000
roi_center_lon, roi_center_lat = xy_to_lonlat(roi_center_x, roi_center_y)
roi_center = [roi_center_lat, roi_center_lon]

map_dublin = folium.Map(location=roi_center, zoom_start=14)
HeatMap(restaurant_latlons).add_to(map_dublin)
folium.Marker(Dublin_center).add_to(map_dublin)
folium.Circle(roi_center, radius=1200, color='white', fill=True, fill_opacity=0.4).add_to(map_dublin)

map_dublin

The circle covers all the regions of low restaurant density in Phibsborough and North Wall closest to Dublin city center.

Let's also create a new, denser grid of location candidates restricted to our new region of interest (with location candidates 100m apart).

In [284]:
k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_step = 100
roi_x_min = roi_center_x - 2000
y_step = 100 * k 
roi_y_min = roi_center_y - 2000

roi_latitudes = []
roi_longitudes = []
roi_xs = []
roi_ys = []
for i in range(0, int(51/k)):
    y = roi_y_min + i * y_step
    x_offset = (50-1) if i%2==0 else 0
    for j in range(0, 51):
        x = roi_x_min + j * x_step + x_offset
        d = calc_xy_distance(roi_center_x, roi_center_y, x, y)
        if (d <= 1201):
            lon, lat = xy_to_lonlat(x, y)
            roi_latitudes.append(lat)
            roi_longitudes.append(lon)
            roi_xs.append(x)
            roi_ys.append(y)

print(len(roi_latitudes), 'candidate neighborhood centers generated.')


515 candidate neighborhood centers generated.
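
To see why the row spacing is multiplied by `sqrt(3) / 2`, here is a minimal, self-contained sketch of a hexagonal lattice. It is an illustration rather than the notebook's exact grid: the helper name `hex_grid` is hypothetical, and it assumes an ideal half-step row offset (the cell above uses an offset of 49, i.e. `50-1`).

```python
import math

def hex_grid(x_min, y_min, n_cols, n_rows, spacing=100.0):
    """Return (x, y) points of a hexagonal lattice (hypothetical helper).

    Rows are spacing * sqrt(3)/2 apart and every other row is shifted by
    half a step, so each point's nearest neighbours are `spacing` away.
    """
    k = math.sqrt(3) / 2
    points = []
    for i in range(n_rows):
        y = y_min + i * spacing * k
        x_offset = spacing / 2 if i % 2 == 0 else 0.0
        for j in range(n_cols):
            points.append((x_min + j * spacing + x_offset, y))
    return points

grid = hex_grid(0.0, 0.0, n_cols=3, n_rows=2)

# Distance between the first point of row 0 and the first point of row 1
(x0, y0), (x1, y1) = grid[0], grid[3]
diagonal = math.hypot(x1 - x0, y1 - y0)
print(round(diagonal, 6))
```

With this offset the diagonal neighbours are the same 100 m apart as the horizontal ones, which is what makes the candidate locations evenly spaced in every direction.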


We now calculate the two most important values for each location candidate: the number of restaurants in the vicinity (within a radius of 400 meters) and the distance to the closest Asian restaurant.

In [285]:
def count_restaurants_nearby(x, y, restaurants, radius=400):    
    count = 0
    for res in restaurants.values():
        res_x = res[7]; res_y = res[8]
        d = calc_xy_distance(x, y, res_x, res_y)
        if d<=radius:
            count += 1
    return count

def find_nearest_restaurant(x, y, restaurants):
    d_min = 10000000
    for res in restaurants.values():
        res_x = res[7]; res_y = res[8]
        d = calc_xy_distance(x, y, res_x, res_y)
        if d<=d_min:
            d_min = d
    return d_min

roi_restaurant_counts = []
roi_asian_distances = []

print('Generating data on location candidates... ', end='')
for x, y in zip(roi_xs, roi_ys):
    count = count_restaurants_nearby(x, y, restaurants, radius=400)
    roi_restaurant_counts.append(count)
    distance = find_nearest_restaurant(x, y, asian_restaurants)
    roi_asian_distances.append(distance)
print('done.')


Generating data on location candidates... done.
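
The two measures can be checked on toy data. The sketch below assumes that `calc_xy_distance` (defined earlier in the notebook, outside this section) is a plain Euclidean distance in projected meters; the function names and coordinates here are hypothetical. It also returns `math.inf` for an empty set of restaurants instead of relying on a large sentinel value.

```python
import math

def count_within(x, y, points, radius=400.0):
    """Count points whose Euclidean distance from (x, y) is at most radius."""
    return sum(1 for px, py in points if math.hypot(px - x, py - y) <= radius)

def nearest_distance(x, y, points):
    """Distance to the closest point, or math.inf if there are no points."""
    return min((math.hypot(px - x, py - y) for px, py in points),
               default=math.inf)

# Toy coordinates in meters (illustrative only, not real Dublin data)
toy_restaurants = [(0.0, 300.0), (300.0, 400.0), (1000.0, 0.0)]
toy_asian = [(300.0, 400.0)]

n_nearby = count_within(0.0, 0.0, toy_restaurants)  # only the 300 m one counts
d_asian = nearest_distance(0.0, 0.0, toy_asian)     # a 3-4-5 triangle, scaled
print(n_nearby, d_asian)
```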


In [286]:
# Let's put this into dataframe
df_roi_locations = pd.DataFrame({'Latitude':roi_latitudes,
                                 'Longitude':roi_longitudes,
                                 'X':roi_xs,
                                 'Y':roi_ys,
                                 'Restaurants nearby':roi_restaurant_counts,
                                 'Distance to Asian restaurant':roi_asian_distances})

df_roi_locations

| | Latitude | Longitude | X | Y | Restaurants nearby | Distance to Asian restaurant |
|---|---|---|---|---|---|---|
| 0 | 53.344938 | -6.260090 | -905373.680231 | 6.124016e+06 | 51 | 360.789653 |
| 1 | 53.345199 | -6.258690 | -905273.680231 | 6.124016e+06 | 39 | 460.606384 |
| 2 | 53.345460 | -6.257290 | -905173.680231 | 6.124016e+06 | 30 | 560.488454 |
| 3 | 53.345722 | -6.255891 | -905073.680231 | 6.124016e+06 | 15 | 660.406219 |
| 4 | 53.345983 | -6.254491 | -904973.680231 | 6.124016e+06 | 7 | 760.345606 |
| ... | ... | ... | ... | ... | ... | ... |
| 510 | 53.364583 | -6.265734 | -905073.680231 | 6.126268e+06 | 0 | 602.840312 |
| 511 | 53.364845 | -6.264333 | -904973.680231 | 6.126268e+06 | 0 | 578.746434 |
| 512 | 53.365106 | -6.262933 | -904873.680231 | 6.126268e+06 | 2 | 571.382907 |
| 513 | 53.365368 | -6.261533 | -904773.680231 | 6.126268e+06 | 3 | 581.385774 |


In [287]:

map_dublin = folium.Map(location=roi_center, zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(map_dublin)
HeatMap(restaurant_latlons).add_to(map_dublin)
folium.Circle(roi_center, radius=1200, color='white', fill=True, fill_opacity=0.6).add_to(map_dublin)
folium.Marker(Dublin_center).add_to(map_dublin)
for lat, lon in zip(df_roi_locations['Latitude'], df_roi_locations['Longitude']):
    folium.CircleMarker([lat, lon], radius=5, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_dublin) 

map_dublin

In [288]:
map_Dublin = folium.Map(location=roi_center, zoom_start=13)
folium.Marker(Dublin_center, popup='Dublin').add_to(map_Dublin)
for lat, lon in zip(df_roi_locations['Latitude'], df_roi_locations['Longitude']):
    #folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_Dublin) 
    folium.Circle([lat, lon], radius=2, color='blue', fill=False).add_to(map_Dublin)
    #folium.Marker([lat, lon]).add_to(map_Dublin)
map_Dublin

#### We are interested only in locations with no more than three restaurants within a radius of 400 meters and no Asian restaurants within a radius of 500 meters.

In [293]:
good_res_count = np.array((df_roi_locations['Restaurants nearby']<=3))
print('Locations with no more than three restaurants nearby:', good_res_count.sum())

good_asian_distance = np.array(df_roi_locations['Distance to Asian restaurant']>=500)
print('Locations with no Asian restaurants within 500m:', good_asian_distance.sum())

good_locations = np.logical_and(good_res_count, good_asian_distance)
print('Locations with both conditions met:', good_locations.sum())

df_good_locations = df_roi_locations[good_locations]

Locations with no more than three restaurants nearby: 152
Locations with no Asian restaurants within 500m: 200
Locations with both conditions met: 95
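
The counts above are not additive: a location is kept only if it passes both tests. A minimal plain-Python sketch of the same logic, using made-up candidates (the ids, counts and distances are illustrative, not taken from the dataframe):

```python
# Hypothetical candidates; values chosen to exercise the boundary cases
candidates = [
    {"id": "A", "restaurants_nearby": 0, "dist_to_asian_m": 620.0},
    {"id": "B", "restaurants_nearby": 3, "dist_to_asian_m": 500.0},  # on both limits
    {"id": "C", "restaurants_nearby": 2, "dist_to_asian_m": 350.0},  # Asian too close
    {"id": "D", "restaurants_nearby": 7, "dist_to_asian_m": 900.0},  # too crowded
]

MAX_RESTAURANTS = 3           # "no more than three" is inclusive
MIN_ASIAN_DISTANCE_M = 500.0  # ">= 500" is inclusive as well

few_restaurants = [c["restaurants_nearby"] <= MAX_RESTAURANTS for c in candidates]
far_from_asian = [c["dist_to_asian_m"] >= MIN_ASIAN_DISTANCE_M for c in candidates]

# Element-wise AND of the two masks, as np.logical_and does above
good_ids = [c["id"]
            for c, few, far in zip(candidates, few_restaurants, far_from_asian)
            if few and far]
print(sum(few_restaurants), sum(far_from_asian), good_ids)
```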


In [294]:
good_latitudes = df_good_locations['Latitude'].values
good_longitudes = df_good_locations['Longitude'].values

good_locations = [[lat, lon] for lat, lon in zip(good_latitudes, good_longitudes)]

map_dublin = folium.Map(location=Dublin_center, zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(map_dublin)
HeatMap(restaurant_latlons).add_to(map_dublin)
folium.Circle(roi_center, radius=1200, color='white', fill=True, fill_opacity=0.6).add_to(map_dublin)
folium.Marker(Dublin_center).add_to(map_dublin)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_dublin) 

map_dublin

We now have a set of locations fairly close to Dublin center (mostly in Phibsborough and Ballybough), and we know that each of those locations has no more than three restaurants within a radius of 400m and no Asian restaurant closer than 500m. Any of those locations is a potential candidate for a new Asian restaurant, at least based on nearby competition.

Let's now show those good locations in a form of heatmap:

In [295]:
map_dublin = folium.Map(location=roi_center, zoom_start=14)
HeatMap(good_locations, radius=25).add_to(map_dublin)
folium.Marker(Dublin_center).add_to(map_dublin)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_dublin)
map_dublin

What we have now is a clear indication of zones with a low number of restaurants in the vicinity and no Asian restaurants nearby.

Let us now cluster those locations to create centers of zones containing good locations. Those zones, their centers and addresses will be the final result of our analysis.

In [298]:
from sklearn.cluster import KMeans

number_of_clusters = 3

good_xys = df_good_locations[['X', 'Y']].values
kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(good_xys)

cluster_centers = [xy_to_lonlat(cc[0], cc[1]) for cc in kmeans.cluster_centers_]

map_dublin = folium.Map(location=roi_center, zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(map_dublin)
HeatMap(restaurant_latlons).add_to(map_dublin)
folium.Circle(roi_center, radius=1200, color='white', fill=True, fill_opacity=0.4).add_to(map_dublin)
folium.Marker(Dublin_center).add_to(map_dublin)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=300, color='green', fill=True, fill_opacity=0.25).add_to(map_dublin) 
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_dublin)
map_dublin

In [303]:
cluster_centers

[(-6.274027606640501, 53.35564870799142),
 (-6.25181360748393, 53.36274725709786),
 (-6.269441097018172, 53.36188531988884)]
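
Note the coordinate order: the pairs printed above are (longitude, latitude), which is why the plotting loops unpack them as `for lon, lat in cluster_centers`, while folium expects `[latitude, longitude]`. A small sketch of the swap, using the values printed above (the variable name `cluster_centers_latlon` is only for illustration):

```python
# (lon, lat) pairs copied from the output above
cluster_centers = [
    (-6.274027606640501, 53.35564870799142),
    (-6.25181360748393, 53.36274725709786),
    (-6.269441097018172, 53.36188531988884),
]

# Reorder into [lat, lon] lists, the order used for folium locations
cluster_centers_latlon = [[lat, lon] for lon, lat in cluster_centers]

# Sanity check: Dublin lies at roughly 53 degrees north, 6 degrees west
for lat, lon in cluster_centers_latlon:
    assert 53.0 < lat < 54.0 and -7.0 < lon < -6.0

print(cluster_centers_latlon[0])
```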

Our clusters represent groupings of most of the candidate locations and cluster centers are placed nicely in the middle of the zones 'rich' with location candidates.

Addresses of those cluster centers will be a good starting point for exploring the neighborhoods to find the best possible location based on neighborhood specifics.
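
To make the clustering step less of a black box, here is a plain-Python sketch of the basic k-means (Lloyd's) iteration on toy points. It is not scikit-learn's implementation: `KMeans` uses a smarter initialization, whereas this hypothetical `kmeans` helper takes the initial centers explicitly so that the result is easy to follow.

```python
import math

def kmeans(points, initial_centers, n_iter=20):
    """Basic Lloyd's algorithm on 2D points (illustrative sketch)."""
    centers = [tuple(c) for c in initial_centers]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center
        labels = [min(range(len(centers)),
                      key=lambda k: math.dist(p, centers[k]))
                  for p in points]
        # Update step: each center moves to the mean of its members
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(old)  # keep an empty cluster's center
        if new_centers == centers:       # converged
            break
        centers = new_centers
    return centers, labels

# Two well-separated toy groups, coordinates in meters
points = [(0, 0), (0, 100), (100, 0),
          (1000, 1000), (1000, 1100), (1100, 1000)]
centers, labels = kmeans(points, initial_centers=[points[0], points[3]])
print(labels)
```

Clustering on the projected `X`/`Y` columns, as the cell above does, keeps distances in meters; clustering directly on latitude and longitude would distort east-west distances at Dublin's latitude.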

Let's see those zones on a city map without heatmap, using shaded areas to indicate our clusters:

In [302]:
map_dublin = folium.Map(location=roi_center, zoom_start=14)
folium.Marker(Dublin_center).add_to(map_dublin)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=250, color='#00000000', fill=True, fill_color='#0066ff', fill_opacity=0.07).add_to(map_dublin)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_dublin)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=350, color='green', fill=False).add_to(map_dublin) 

map_dublin

Let's zoom in on candidate areas in Grangegorman:

In [307]:
map_dublin = folium.Map(location=[ 53.35564870799142,-6.274027606640501], zoom_start=15)
folium.Marker(Dublin_center).add_to(map_dublin)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=350, color='green', fill=False).add_to(map_dublin) 
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=250, color='#0000ff00', fill=True, fill_color='#0066ff', fill_opacity=0.07).add_to(map_dublin)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_dublin)
map_dublin

and on candidate areas in Phibsborough:

In [309]:
map_dublin = folium.Map(location=[53.36188531988884,-6.269441097018172], zoom_start=15)
folium.Marker(Dublin_center).add_to(map_dublin)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=350, color='green', fill=False).add_to(map_dublin) 
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=250, color='#0000ff00', fill=True, fill_color='#0066ff', fill_opacity=0.07).add_to(map_dublin)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_dublin)
map_dublin

Finally, let's reverse geocode those candidate area centers to get addresses that can be presented to the client.

In [311]:
candidate_area_addresses = []
print('==============================================================')
print('Addresses of centers of areas recommended for further analysis')
print('==============================================================\n')
for lon, lat in cluster_centers:
    addr = get_address(GOOGLE_API_KEY, lat, lon).replace(', Ireland', '')
    candidate_area_addresses.append(addr)    
    x, y = lonlat_to_xy(lon, lat)
    d = calc_xy_distance(x, y, Dublin_center_x, Dublin_center_y)
    print('{}{} => {:.1f}km from Dublin city center'.format(addr, ' '*(50-len(addr)), d/1000))
    

Addresses of centers of areas recommended for further analysis

6 Phibsborough Rd, Dublin 7, D07 C2W7              => 1.1km from Dublin city center
164 Clonliffe Rd, Ballybough, Dublin               => 1.6km from Dublin city center
13 Arranmore Ave, Phibsborough, Dublin, D07 PP26   => 1.5km from Dublin city center
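
The column alignment in the listing above comes from manual padding with `' '*(50-len(addr))`. A format specification such as `{:<50}` gives the same result for addresses of up to 50 characters. The sketch below uses the addresses printed above with rounded placeholder distances in meters (the exact distances are not shown in the output):

```python
# Addresses from the output above; distances are rounded placeholders
rows = [
    ("6 Phibsborough Rd, Dublin 7, D07 C2W7", 1100.0),
    ("164 Clonliffe Rd, Ballybough, Dublin", 1600.0),
    ("13 Arranmore Ave, Phibsborough, Dublin, D07 PP26", 1500.0),
]

# '<50' left-aligns the address and pads it with spaces to 50 characters
lines = ['{:<50} => {:.1f}km from Dublin city center'.format(addr, d / 1000)
         for addr, d in rows]

for line in lines:
    print(line)
```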


This concludes our analysis. We have generated 3 addresses representing the centers of zones that contain locations with a low number of restaurants and no Asian restaurants nearby, all fairly close to the city center (less than 2km from Dublin city center). Although the zones are shown on the map with a radius of ~300 meters (green circles), their shape is actually very irregular, and their centers/addresses should be considered only as a starting point for exploring the neighborhoods in search of potential restaurant locations. Most of the zones are located in Phibsborough and Grangegorman, which we have identified as interesting because they are popular with tourists, fairly close to the city center and well connected by public transport.

In [312]:
map_dublin = folium.Map(location=roi_center, zoom_start=14)
folium.Circle(Dublin_center, radius=50, color='red', fill=True, fill_color='red', fill_opacity=1).add_to(map_dublin)
for lonlat, addr in zip(cluster_centers, candidate_area_addresses):
    folium.Marker([lonlat[1], lonlat[0]], popup=addr).add_to(map_dublin) 
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=250, color='#0000ff00', fill=True, fill_color='#0066ff', fill_opacity=0.05).add_to(map_dublin)
map_dublin

# Results and Discussion 

Our analysis shows that there are many places in Dublin with a low density of restaurants. The highest concentration of restaurants was detected south and south-west of Henry Street in Dublin center, so we focused our attention on the northern and north-eastern areas, corresponding to Phibsborough, North Wall, Ballybough and Grangegorman. In particular, we focused on Phibsborough and North Wall because these places offer a high population and tourist attractions, and especially because they contain a number of pockets of low restaurant density.

After directing our attention to this narrower area of interest (covering approx. 2x2km north-east of Henry Street), we first created a dense grid of location candidates (spaced 100m apart); those locations were then filtered so that those with more than three restaurants within a radius of 400m and those with an Asian restaurant closer than 500m were removed.

Those location candidates were then clustered to create zones of interest that contain the greatest number of location candidates. The addresses of the centers of those zones were also generated using reverse geocoding, to be used as markers/starting points for a more detailed local analysis based on other factors.

The result of all this is 3 zones (Phibsborough, Grangegorman and Ballybough) containing the largest number of potential new restaurant locations based on the number of and distance to existing venues, both restaurants in general and Asian restaurants in particular. The final decision on an optimal location for a new Asian restaurant will depend on many different factors, such as the size of the Asian population, competitors, market size and government policy. This work therefore only provides information on areas that are close to Dublin center but not crowded with restaurants, especially Asian ones.

# Conclusion

With the aim of finding a good location for an Asian restaurant, this project identifies the areas near Dublin city center that have a low density of restaurants (particularly Asian restaurants) so that the client can narrow down the targets before investing. Using data from Foursquare and Google Maps to measure the restaurant distribution, we obtained a set of candidate locations meeting the initial requirements and then clustered those locations into three promising areas: Phibsborough, Grangegorman and Ballybough. These areas were visualized on a map, and their addresses were generated to be used as starting points for the client's investment decision.
