In [1]:
# @hidden_cell
# The project token is an authorization token that is used to access project resources like data sources, connections, and used by platform APIs.
from project_lib import Project
project = Project(project_id='1a3d1608-cef6-4e55-892f-e694350999b9', project_access_token='p-b1e906d1e9ca88d56cfd31d28d427663e50beb64')
pc = project.project_context


# The Battle of the Jakarta Neighborhoods 
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Business Problem](#introduction)
* [Data Definition](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)



## 1. Business Problem <a name="introduction"></a>

Jakarta is the capital of Indonesia, the largest city in the country, and one of the most populous urban agglomerations on earth. The Jakarta metropolitan area, which covers 6,343 km² (2,449 sq mi), had a population of 31.6 million according to the Indonesia 2015 Inter-Census, making it the most populous region in Indonesia and the second most populous urban area in the world after Tokyo. The metropolitan area's share of the national population increased from 6.1% in 1961 to 11.26% in 2010. Today about 20% of Indonesia's urban population is concentrated in the Jakarta metropolitan area.

Jakarta is now considered a global city and one of the fastest-growing economies in the world. Notably, in 2014 Jakarta reported the highest return on investment for luxury real estate of any city in the world. Given these facts, many investors are eager to open businesses in Jakarta, which makes choosing the right location for their line of business especially important. However, the necessary information is not easy to obtain, and interpreting it requires analytical thinking.

To address this problem, this project focuses on determining where to open new hotels in the Jakarta metropolitan area, considering the distance to the city center, the number of existing hotels nearby, and the total population of the area. These factors all bear on how good a location decision is likely to be.

The result will be a list of recommended candidate locations derived from the data analysis.


## 2. Data Definition <a name="data"></a>

The following data are needed to solve the business problem:
1. The center of Jakarta is defined as the ‘Setiabudi’ area; its coordinates, along with those of the surrounding neighborhoods, will be obtained using the **Google Maps API**.
2. The number of existing hotels in each neighborhood. This data will be collected using the **Foursquare API**.
3. The population of each neighborhood's district. This data will be gathered from **Jakarta Government public data repositories** and converted to CSV.

All the required data will be merged into a single dataset for further analysis. The data will be presented on a map, which lets us see the locations at a glance.


## 3. Methodology <a name="methodology"></a>

The analysis follows the methodology below.

**1. Data Collection**

We define the Jakarta city center coordinates using the Google Maps API, collect the locations of hotels within 6 km using Foursquare, and gather the population of each neighborhood's district from Jakarta public datasets.

**2. Exploratory Data Analysis**

Use heatmaps to examine hotel density across neighborhoods and identify recommended areas that meet the following criteria:
- Close to the city center
- Few existing hotels
- Large population

**3. Clustering**

The results will be presented on a Folium map using k-means clustering. The recommendation focuses on a region of interest with a 2.5 km radius close to the city center, on locations with no hotels within a 1 km radius, and on districts with a total population above 150,000.

## Data Collection

### Get Neighborhood Candidates using Google Maps API

Find the latitude & longitude of Jakarta city center using Google Maps API.

In [2]:
google_api_key = 'AIzaSyD_6MiroMbMYQvM39r1P9VEFIALuQi6Jeg'

In [3]:
#Function to generate coordinate using Google Maps API
import requests

def get_coordinates(api_key, address, verbose=False):
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&address={}'.format(api_key, address)
        response = requests.get(url).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        geographical_data = results[0]['geometry']['location'] # get geographical coordinates
        lat = geographical_data['lat']
        lon = geographical_data['lng']
        return [lat, lon]
    except Exception:  # any request or parsing failure yields no coordinates
        return [None, None]
    
address = 'Setiabudi St, Jakarta, Indonesia'
jakarta_center = get_coordinates(google_api_key, address)
print('Coordinate of {}: {}'.format(address, jakarta_center))

Coordinate of Setiabudi St, Jakarta, Indonesia: [-6.2160764, 106.8300133]
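
The request URL in `get_coordinates` is built with plain string formatting, so the spaces and commas in the address are passed along unescaped. As a minimal sketch, the same query string can be built with the standard library's `urllib.parse.urlencode`. The helper name `build_geocode_url` and the placeholder key are illustrative assumptions, not part of the notebook.

```python
from urllib.parse import urlencode

GEOCODE_ENDPOINT = 'https://maps.googleapis.com/maps/api/geocode/json'

def build_geocode_url(api_key, address):
    """Return the geocoding URL with the key and address percent-encoded."""
    query = urlencode({'key': api_key, 'address': address})
    return '{}?{}'.format(GEOCODE_ENDPOINT, query)

# 'YOUR_API_KEY' is a placeholder, not a real credential
url = build_geocode_url('YOUR_API_KEY', 'Setiabudi St, Jakarta, Indonesia')
print(url)
```

Passing the same dictionary as `params=` to `requests.get` should have the same effect, since `requests` encodes query parameters itself.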


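Now that we have the latitude and longitude of the Jakarta center in WGS84 format, let's convert them into UTM X/Y coordinates (in meters) so we can lay out a grid of surrounding neighborhood areas. The code below starts the grid 3 km west of the center and keeps cell centers up to about 6 km away from it. Once we have the neighborhood X/Y data, we need to convert it back to latitude/longitude to show it on a Folium map.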

Our neighborhoods will be defined as circular areas with a radius of 300 meters, so our neighborhood centers will be 600 meters apart.

In [4]:
# Function to convert WGS84 to UTM (Cartesian)

!pip install shapely
import shapely.geometry

!pip install pyproj
import pyproj

import math

# Build the projections and transformers once.
# pyproj.transform is deprecated in favor of Transformer objects.
_proj_latlon = pyproj.Proj(proj='latlong', datum='WGS84')
_proj_xy = pyproj.Proj(proj='utm', zone=48, datum='WGS84')
_to_xy = pyproj.Transformer.from_proj(_proj_latlon, _proj_xy, always_xy=True)
_to_lonlat = pyproj.Transformer.from_proj(_proj_xy, _proj_latlon, always_xy=True)

def lonlat_to_xy(lon, lat):
    x, y = _to_xy.transform(lon, lat)
    return x, y

def xy_to_lonlat(x, y):
    lon, lat = _to_lonlat.transform(x, y)
    return lon, lat

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)





In [5]:
print('Jakarta center longitude={}, latitude={}'.format(jakarta_center[1], jakarta_center[0]))
x, y = lonlat_to_xy(jakarta_center[1], jakarta_center[0])
print('Jakarta center UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)

Jakarta center longitude=106.8300133, latitude=-6.2160764
Jakarta center UTM X=702479.2103012843, Y=-687440.4328693151


Let's create a **hexagonal grid of cells**: we offset every other row and adjust the vertical row spacing so that **every cell center is equally distant from all its neighbors**.
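
In a hexagonal layout, alternate rows are shifted by half a step horizontally and rows are spaced by the step times √3/2 vertically, which makes the distance to a diagonal neighbor equal to the distance to a horizontal one. The small sketch below checks this for a 600 m step. The function name `hex_grid` and the 3×3 grid size are illustrative assumptions, not taken from the notebook.

```python
import math

def hex_grid(x_min, y_min, cols, rows, spacing):
    """Return (x, y) cell centers of a hexagonal grid, row by row."""
    k = math.sqrt(3) / 2                      # row height as a fraction of the spacing
    points = []
    for i in range(rows):
        y = y_min + i * spacing * k
        x_offset = spacing / 2 if i % 2 == 0 else 0.0   # shift every other row
        for j in range(cols):
            points.append((x_min + j * spacing + x_offset, y))
    return points

pts = hex_grid(0.0, 0.0, cols=3, rows=3, spacing=600.0)

# Distance to the neighbor in the same row and to a neighbor in the next row
same_row = math.hypot(pts[1][0] - pts[0][0], pts[1][1] - pts[0][1])
next_row = math.hypot(pts[3][0] - pts[0][0], pts[3][1] - pts[0][1])
print(round(same_row, 6), round(next_row, 6))
```

Both distances come out at 600 m, which is why neighborhoods with a 300 m radius tile the area without large gaps.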

In [6]:
jakarta_center_x, jakarta_center_y = lonlat_to_xy(jakarta_center[1], jakarta_center[0]) # City center in Cartesian coordinates

k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_min = jakarta_center_x - 3000 # offset 3 km
x_step = 600
y_min = jakarta_center_y - 3000 - (int(21/k)*k*600 - 12000)/2
y_step = 600 * k 

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = 300 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(jakarta_center_x, jakarta_center_y, x, y)
        if (distance_from_center <= 6001):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(latitudes), 'candidate neighborhood centers generated.')

244 candidate neighborhood centers generated.


### Visualize Neighborhoods using Folium Map

In [7]:
!pip install folium

import folium



In [8]:
map_jakarta = folium.Map(location=jakarta_center, zoom_start=12)
folium.Marker(jakarta_center, popup='Setiabudi (City Center)').add_to(map_jakarta)
for lat, lon in zip(latitudes, longitudes):
    folium.Marker([lat, lon]).add_to(map_jakarta)

map_jakarta

Use Google Maps API to get approximate addresses of those neighborhoods.

In [9]:
#Function to get address
def get_address(api_key, latitude, longitude, verbose=False):
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&latlng={},{}'.format(api_key, latitude, longitude)
        response = requests.get(url).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        address = results[0]['formatted_address']
        return address
    except Exception:  # any request or parsing failure yields no address
        return None

addr = get_address(google_api_key, jakarta_center[0], jakarta_center[1])
print('Reverse geocoding check')
print('-----------------------')
print('Address of [{}, {}] is: {}'.format(jakarta_center[0], jakarta_center[1], addr))

Reverse geocoding check
-----------------------
Address of [-6.2160764, 106.8300133] is: Setiabudi 2, Jl. H. R. Rasuna Said No.62, RT.6/RW.7, Kuningan, Karet Kuningan, Kecamatan Setiabudi, Kota Jakarta Selatan, Daerah Khusus Ibukota Jakarta 12920, Indonesia


In [10]:
print('Obtaining location addresses: ', end='')
addresses = []
for lat, lon in zip(latitudes, longitudes):
    address = get_address(google_api_key, lat, lon)
    if address is None:
        address = 'NO ADDRESS'
    address = address.replace(', Indonesia', '') # We don't need country part of address
    addresses.append(address)
    print(' .', end='')
print(' done.')

Obtaining location addresses:  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.


Put the data into a Pandas dataframe.

In [11]:
import pandas as pd

df_locations = pd.DataFrame({'Address': addresses,
                             'Latitude': latitudes,
                             'Longitude': longitudes,
                             'X': xs,
                             'Y': ys,
                             'Distance from center': distances_from_center})

df_locations.head()

Unnamed: 0,Address,Latitude,Longitude,X,Y,Distance from center
0,"5, Jl. Iskandarsyah I No.18, RT.5/RW.4, Melawa...",-6.245414,106.805719,699779.210301,-690675.815777,4213.988913
1,"Jl. Pulo Raya III No.27, RT.1/RW.3, Petogogan,...",-6.245395,106.81114,700379.210301,-690675.815777,3857.162501
2,"Jl. Bangka II Gg. V No.21, RT.13/RW.2, Pela Ma...",-6.245376,106.816562,700979.210301,-690675.815777,3566.1888
3,"Jl. Pondok Karya VIII Blok I No.72, RT.11/RW.4...",-6.245358,106.821983,701579.210301,-690675.815777,3358.22908
4,"Jl. Mampang Prapatan IV No.19, RT.2/RW.5, Mamp...",-6.245339,106.827404,702179.210301,-690675.815777,3249.261848
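
As a rough sanity check on the `Distance from center` column, the projected (UTM) distance can be compared with a great-circle distance computed directly from latitude and longitude. The sketch below uses the haversine formula on a sphere, which is a different technique from the UTM projection used in the notebook, so the two values should agree only approximately. The function name `haversine_m` and the mean Earth radius of 6,371,000 m are assumptions of this sketch.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in meters on a sphere of the given radius."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))

center = (-6.2160764, 106.8300133)      # Jakarta center found above
first_cell = (-6.245414, 106.805719)    # row 0 of df_locations

d = haversine_m(center[0], center[1], first_cell[0], first_cell[1])
print('Haversine distance: {:.0f} m (UTM-based value in the table: 4213.99 m)'.format(d))
```

A difference of a few meters over roughly 4 km is expected, because the sphere only approximates the WGS84 ellipsoid and the UTM projection has its own small scale distortion.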


### Get Venue in Neighborhood using Foursquare API

Let's get the hotel data for each neighborhood.

In [12]:
# @hidden_cell
#Connect to Foursquare API
CLIENT_ID = 'PFHHUBSQYQWBTTRZAADIDOA0N3BX2IEBOZFL33OIYJO2RNSI' # your Foursquare ID
CLIENT_SECRET = '3ZC25DGLLW5I11LZZQHKPHAF0AILLNPOJUOFQVNQTPPQ11JE' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

In [13]:
#Function to explore the top 100 venues within a 500-meter radius of each provided location
LIMIT = 100

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
        # make the GET request
        results = requests.get(url).json()["response"]["groups"][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood_Latitude', 
                  'Neighborhood_Longitude', 
                  'Venue', 
                  'Venue_Latitude', 
                  'Venue_Longitude', 
                  'Venue_Category']
    
    return(nearby_venues)
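
The function above indexes straight into `["response"]["groups"][0]['items']` and `['categories'][0]`, so an empty response or a venue without a category would raise an exception. The sketch below shows one defensive way to flatten such a payload. The helper name `extract_venues` is hypothetical, and the sample dictionary is hand-written to mirror only the fields the notebook reads; it is not a real API response.

```python
def extract_venues(response_json, neighborhood):
    """Flatten an explore-style payload into (neighborhood, name, lat, lng, category) rows."""
    groups = response_json.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        category = categories[0]['name'] if categories else None   # tolerate missing category
        rows.append((neighborhood,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

# Hand-written sample shaped like the fields used above (illustrative only)
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Hotel Ambhara',
               'location': {'lat': -6.243266, 'lng': 106.803678},
               'categories': [{'name': 'Hotel'}]}},
    {'venue': {'name': 'Example Venue',
               'location': {'lat': -6.2440, 'lng': 106.8040},
               'categories': []}},
]}]}}

rows = extract_venues(sample, 'Melawai')
print(rows)
print(extract_venues({}, 'Melawai'))   # an empty payload yields no rows
```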

In [14]:
jakarta_venues = getNearbyVenues(names=df_locations['Address'],
                                   latitudes=df_locations['Latitude'],
                                   longitudes=df_locations['Longitude']
                                  )

5, Jl. Iskandarsyah I No.18, RT.5/RW.4, Melawai, Kec. Kby. Baru, Kota Jakarta Selatan, Daerah Khusus Ibukota Jakarta 12160
Jl. Pulo Raya III No.27, RT.1/RW.3, Petogogan, Kec. Kby. Baru, Kota Jakarta Selatan, Daerah Khusus Ibukota Jakarta 12170
Jl. Bangka II Gg. V No.21, RT.13/RW.2, Pela Mampang, Kec. Mampang Prpt., Kota Jakarta Selatan, Daerah Khusus Ibukota Jakarta 12720
Jl. Pondok Karya VIII Blok I No.72, RT.11/RW.4, Pela Mampang, Kec. Mampang Prpt., Kota Jakarta Selatan, Daerah Khusus Ibukota Jakarta 12720
Jl. Mampang Prapatan IV No.19, RT.2/RW.5, Mampang Prpt., Kec. Mampang Prpt., Kota Jakarta Selatan, Daerah Khusus Ibukota Jakarta 12790
2, Jl. Tegal Parang Utara I No.94, RT.9/RW.5, Tegal Parang, Kec. Mampang Prpt., Kota Jakarta Selatan, Daerah Khusus Ibukota Jakarta 12790
Jl. Triloka VI Blok K No.5, RT.3/RW.4, Pancoran, Kec. Pancoran, Kota Jakarta Selatan, Daerah Khusus Ibukota Jakarta 12780
Jl. Raya Pasar Minggu No.2 B, RT.2/RW.2, Pancoran, Kec. Pancoran, Kota Jakarta Selatan, Da

In [15]:
print('{} venues were returned by Foursquare.'.format(jakarta_venues.shape[0]))
#jakarta_venues.head()

6025 venues were returned by Foursquare.


In [16]:
#Filter datasets to only include hotels
jakarta_hotels=jakarta_venues.Venue_Category.str.contains('hotel', case=False)
jakarta_venues=jakarta_venues[jakarta_hotels].copy() # copy so the column assignments below do not act on a view

#Rename column values to be standard term
#(literal match: as a regex, 'Kec.' would also match 'Keca' inside 'Kecamatan')
jakarta_venues['Neighborhood'] = jakarta_venues['Neighborhood'].str.replace('Kec.','Kecamatan', regex=False)
jakarta_venues['Kecamatan'] = jakarta_venues['Neighborhood'].str.extract(r'Kecamatan\s+(.*?),')

#Convert venue lat long to x y
xx, yy = lonlat_to_xy(jakarta_venues['Venue_Longitude'].values, jakarta_venues['Venue_Latitude'].values)

jakarta_venues['Venue_x'] = xx
jakarta_venues['Venue_y'] = yy

jakarta_venues.head(144)
#print('Total number of hotels nearby :', len(jakarta_venues))

Unnamed: 0,Neighborhood,Neighborhood_Latitude,Neighborhood_Longitude,Venue,Venue_Latitude,Venue_Longitude,Venue_Category,Kecamatan,Venue_x,Venue_y
0,"5, Jl. Iskandarsyah I No.18, RT.5/RW.4, Melawa...",-6.245414,106.805719,Hotel GranDhika Iskandarsyah Jakarta,-6.244813,106.803989,Hotel,Kby. Baru,699587.931649,-690608.756764
35,"5, Jl. Iskandarsyah I No.18, RT.5/RW.4, Melawa...",-6.245414,106.805719,Hotel Ambhara,-6.243266,106.803678,Hotel,Kby. Baru,699554.124192,-690437.517698
46,"5, Jl. Iskandarsyah I No.18, RT.5/RW.4, Melawa...",-6.245414,106.805719,favehotel Melawai,-6.244321,106.801337,Hotel,Kby. Baru,699294.688423,-690553.296449
111,"Jl. Raya Pasar Minggu No.2 B, RT.2/RW.2, Panco...",-6.245282,106.843668,Cipta Hotel Pancoran,-6.247249,106.843335,Hotel,Pancoran,703941.588910,-690893.251751
123,"Jl. Raya Pasar Minggu No.2 B, RT.2/RW.2, Panco...",-6.245282,106.843668,Amaris Hotel Pancoran,-6.247507,106.842834,Hotel,Pancoran,703886.021918,-690921.551012
131,"Jl. Raya Pasar Minggu No.2 B, RT.2/RW.2, Panco...",-6.245282,106.843668,V Hotel Jakarta,-6.241865,106.844744,Hotel,Pancoran,704099.618926,-690298.267303
184,"Jl. Madrasah X No.6, RT.1/RW.10, Cipinang Cemp...",-6.245186,106.870774,Harper MT Haryono,-6.245173,106.872283,Hotel,Kramat jati,707146.307796,-690674.878332
188,"Jl. Madrasah X No.6, RT.1/RW.10, Cipinang Cemp...",-6.245186,106.870774,Hotel Ibis Cawang,-6.246892,106.872668,Hotel,Kramat jati,707188.216735,-690865.198457
202,"Jl. Trunojoyo No.135, RT.6/RW.2, Melawai, Keca...",-6.240725,106.802992,Hotel GranDhika Iskandarsyah Jakarta,-6.244813,106.803989,Hotel,Kby. Baru,699587.931649,-690608.756764
303,"Jl. Kapt. Tendean No.11, RT.1/RW.1, Pela Mampa...",-6.240669,106.819256,Diradja Hotel,-6.240012,106.820329,Hotel,Mampang Prpt.,701398.277136,-690083.978935
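
The `Kecamatan` (district) column is derived by expanding the `Kec.` abbreviation and then capturing the text up to the next comma. The standard-library sketch below reproduces that step on two shortened example addresses and also shows what happens if `Kec.` is interpreted as a regular expression, where the dot matches any character. The helper name `extract_kecamatan` and the two sample strings are illustrative assumptions.

```python
import re

def extract_kecamatan(address):
    """Expand the 'Kec.' abbreviation literally, then capture the district name."""
    normalized = address.replace('Kec.', 'Kecamatan')     # plain string replace, no regex
    match = re.search(r'Kecamatan\s+(.*?),', normalized)
    return match.group(1) if match else None

short_form = 'Melawai, Kec. Kby. Baru, Kota Jakarta Selatan'
long_form = 'Karet Kuningan, Kecamatan Setiabudi, Kota Jakarta Selatan'

print(extract_kecamatan(short_form))
print(extract_kecamatan(long_form))

# Treated as a regex, 'Kec.' also matches 'Keca' and corrupts the long form
corrupted = re.sub('Kec.', 'Kecamatan', long_form)
print(corrupted)
```

With the regex interpretation, the already spelled-out form becomes `Kecamatanmatan Setiabudi`, which the extraction pattern no longer matches, so the district would be lost for those rows.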


#### Get population data

In [17]:
#Import population data
import types
import pandas as pd
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0

# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage. It includes your credentials.
# You might want to remove those credentials before you share the notebook.
client_7e0a6b8ed76149af9b3f842747a5c013 = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='JU5bdlJKhBqRKKyuZehXw5HxrUMEgZxmFn1R9Flxvbks',
    ibm_auth_endpoint="https://iam.ng.bluemix.net/oidc/token",
    config=Config(signature_version='oauth'),
    endpoint_url='https://s3-api.us-geo.objectstorage.service.networklayer.com')

body = client_7e0a6b8ed76149af9b3f842747a5c013.get_object(Bucket='datasciencecapstoneproject-donotdelete-pr-xkcgnmdxqvslim',Key='Population_Data.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

# If you are reading an Excel file into a pandas DataFrame, replace `read_csv` by `read_excel` in the next statement.
df_data_0 = pd.read_csv(body)
df_data_0.head()


Unnamed: 0,Nama_Kecamatan,Pria,Wanita,Total_Population
0,Menteng,44475,43628,88103
1,Cemp. Putih,47964,47310,95274
2,Senen,63212,60421,123633
3,Taman Sari,63567,62323,125890
4,Johar Baru,69901,66604,136505


In [18]:
#Merge population data into datasets
jakarta_venues = pd.merge(jakarta_venues,
                 df_data_0,
                 left_on='Kecamatan',
                 right_on='Nama_Kecamatan',
                 how='left'
                 )
jakarta_venues.head()

Unnamed: 0,Neighborhood,Neighborhood_Latitude,Neighborhood_Longitude,Venue,Venue_Latitude,Venue_Longitude,Venue_Category,Kecamatan,Venue_x,Venue_y,Nama_Kecamatan,Pria,Wanita,Total_Population
0,"5, Jl. Iskandarsyah I No.18, RT.5/RW.4, Melawa...",-6.245414,106.805719,Hotel GranDhika Iskandarsyah Jakarta,-6.244813,106.803989,Hotel,Kby. Baru,699587.931649,-690608.756764,Kby. Baru,74982.0,73362.0,148344.0
1,"5, Jl. Iskandarsyah I No.18, RT.5/RW.4, Melawa...",-6.245414,106.805719,Hotel Ambhara,-6.243266,106.803678,Hotel,Kby. Baru,699554.124192,-690437.517698,Kby. Baru,74982.0,73362.0,148344.0
2,"5, Jl. Iskandarsyah I No.18, RT.5/RW.4, Melawa...",-6.245414,106.805719,favehotel Melawai,-6.244321,106.801337,Hotel,Kby. Baru,699294.688423,-690553.296449,Kby. Baru,74982.0,73362.0,148344.0
3,"Jl. Raya Pasar Minggu No.2 B, RT.2/RW.2, Panco...",-6.245282,106.843668,Cipta Hotel Pancoran,-6.247249,106.843335,Hotel,Pancoran,703941.58891,-690893.251751,Pancoran,78803.0,76393.0,155196.0
4,"Jl. Raya Pasar Minggu No.2 B, RT.2/RW.2, Panco...",-6.245282,106.843668,Amaris Hotel Pancoran,-6.247507,106.842834,Hotel,Pancoran,703886.021918,-690921.551012,Pancoran,78803.0,76393.0,155196.0


In [19]:
#Data preprocessing
import numpy as np

#Create column to indicate population status
jakarta_venues['Is_PopulationHigh'] = np.where(jakarta_venues['Total_Population']>150000, True, False)

jakarta_venues.head()

Unnamed: 0,Neighborhood,Neighborhood_Latitude,Neighborhood_Longitude,Venue,Venue_Latitude,Venue_Longitude,Venue_Category,Kecamatan,Venue_x,Venue_y,Nama_Kecamatan,Pria,Wanita,Total_Population,Is_PopulationHigh
0,"5, Jl. Iskandarsyah I No.18, RT.5/RW.4, Melawa...",-6.245414,106.805719,Hotel GranDhika Iskandarsyah Jakarta,-6.244813,106.803989,Hotel,Kby. Baru,699587.931649,-690608.756764,Kby. Baru,74982.0,73362.0,148344.0,False
1,"5, Jl. Iskandarsyah I No.18, RT.5/RW.4, Melawa...",-6.245414,106.805719,Hotel Ambhara,-6.243266,106.803678,Hotel,Kby. Baru,699554.124192,-690437.517698,Kby. Baru,74982.0,73362.0,148344.0,False
2,"5, Jl. Iskandarsyah I No.18, RT.5/RW.4, Melawa...",-6.245414,106.805719,favehotel Melawai,-6.244321,106.801337,Hotel,Kby. Baru,699294.688423,-690553.296449,Kby. Baru,74982.0,73362.0,148344.0,False
3,"Jl. Raya Pasar Minggu No.2 B, RT.2/RW.2, Panco...",-6.245282,106.843668,Cipta Hotel Pancoran,-6.247249,106.843335,Hotel,Pancoran,703941.58891,-690893.251751,Pancoran,78803.0,76393.0,155196.0,True
4,"Jl. Raya Pasar Minggu No.2 B, RT.2/RW.2, Panco...",-6.245282,106.843668,Amaris Hotel Pancoran,-6.247507,106.842834,Hotel,Pancoran,703886.021918,-690921.551012,Pancoran,78803.0,76393.0,155196.0,True
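
Because the population table is joined with a left merge, a hotel whose district name has no match keeps a missing population, and a missing value compared with `> 150000` evaluates to false. The plain-Python sketch below mirrors that behavior with a dictionary lookup. The two population figures are taken from the merged table above, while the helper name `is_population_high` and the unmatched district name are hypothetical.

```python
# District totals copied from the merged table above
population = {'Kby. Baru': 148344, 'Pancoran': 155196}
THRESHOLD = 150000

def is_population_high(kecamatan):
    """True only when the district is known and its total exceeds the threshold."""
    total = population.get(kecamatan)          # None plays the role of NaN after a left merge
    return total is not None and total > THRESHOLD

for name in ['Kby. Baru', 'Pancoran', 'Unmatched District']:
    print(name, is_population_high(name))
```

Districts that fail to match are therefore shown as "not high" on the maps, so it is worth checking for missing `Total_Population` values before interpreting the marker colors.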


In [20]:
#Let's visualize nearby hotels in neighborhood using Folium map
map_jakarta = folium.Map(location=jakarta_center, zoom_start=13)
folium.Marker(jakarta_center, popup='Setiabudi (City Center)').add_to(map_jakarta)

# add markers to map
for lat, lng, venue, neighborhood in zip(jakarta_venues['Venue_Latitude'], jakarta_venues['Venue_Longitude'], jakarta_venues['Venue'], jakarta_venues['Neighborhood']):
    label = '{}'.format(venue)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='green',
        fill=True,
        #fill_color='#green',
        fill_opacity=0.7,
        parse_html=False).add_to(map_jakarta)

map_jakarta

We now know the locations of hotels in the neighborhoods surrounding the Jakarta city center. With the required data gathered, we can move on to the analysis phase.

## Analysis <a name="analysis"></a>

#### Exploratory Data Analysis

In [21]:
#Get hotel lat long (selected by column name rather than by position)
hotel_latlon = jakarta_venues[['Venue_Latitude', 'Venue_Longitude']].values.tolist()
#hotel_latlon

In [22]:
from folium.plugins import HeatMap

map_jakarta = folium.Map(location=jakarta_center, zoom_start=13)
#folium.TileLayer('cartodbpositron').add_to(map_jakarta) #cartodbpositron cartodbdark_matter
folium.Marker(jakarta_center).add_to(map_jakarta)
HeatMap(hotel_latlon).add_to(map_jakarta)
 
for lat, lng, venue, neighborhood, is_high in zip(jakarta_venues['Venue_Latitude'], jakarta_venues['Venue_Longitude'], jakarta_venues['Venue'], jakarta_venues['Neighborhood'], jakarta_venues['Is_PopulationHigh']):
    label = '{}'.format(venue)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='black' if is_high else 'grey',
        fill=True,
        #fill_color='#green',
        fill_opacity=0.7,
        parse_html=False).add_to(map_jakarta)

folium.Circle(jakarta_center, radius=3000, fill=False, color='white').add_to(map_jakarta)
map_jakarta

We can see that the areas of low hotel density closest to the city center lie **south of the Setiabudi area**. The high-population areas are also in the south, which looks promising. We can therefore narrow our area of interest to the southern part. Let's move our center of interest and visualize it with a heatmap of hotel density, using black markers for hotels in higher-population districts.

In [23]:
roi_x_min = jakarta_center_x - 2000
roi_y_max = jakarta_center_y + 1000

roi_center_x = roi_x_min + 2500
roi_center_y = roi_y_max - 2500
roi_center_lon, roi_center_lat = xy_to_lonlat(roi_center_x, roi_center_y)
roi_center = [roi_center_lat, roi_center_lon]

map_jakarta = folium.Map(location=roi_center, zoom_start=13)
HeatMap(hotel_latlon).add_to(map_jakarta)
folium.Marker(jakarta_center).add_to(map_jakarta)

for lat, lng, venue, neighborhood, is_high in zip(jakarta_venues['Venue_Latitude'], jakarta_venues['Venue_Longitude'], jakarta_venues['Venue'], jakarta_venues['Neighborhood'], jakarta_venues['Is_PopulationHigh']):
    label = '{}'.format(venue)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        #is_population_high = 
        color='black' if is_high else 'grey',
        fill=True,
        #fill_color='#green',
        fill_opacity=0.7,
        parse_html=False).add_to(map_jakarta)
    
folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.4).add_to(map_jakarta)
map_jakarta

The new center of interest looks good: it covers an area with a low density of hotels and a high population while staying close enough to the city center. Let's cover this area with a finer grid of candidate locations (100 m apart) that we can put into a dataframe.
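
The region of interest is defined by offsets from the city center, so its own center does not coincide with Setiabudi. The short worked calculation below repeats the offset arithmetic from the previous cell, using the UTM coordinates of the center printed earlier, to show how far the region is shifted and how far from the city center a point inside it can lie. The variable names `shift` and `farthest` are introduced here for illustration.

```python
import math

# UTM coordinates of the Jakarta center, as printed earlier in the notebook
jakarta_center_x, jakarta_center_y = 702479.2103012843, -687440.4328693151

roi_x_min = jakarta_center_x - 2000
roi_y_max = jakarta_center_y + 1000
roi_center_x = roi_x_min + 2500
roi_center_y = roi_y_max - 2500

offset_x = roi_center_x - jakarta_center_x    # positive = east of the city center
offset_y = roi_center_y - jakarta_center_y    # negative = south of the city center
shift = math.hypot(offset_x, offset_y)        # distance between the two centers
farthest = shift + 2500                       # farthest point of a 2.5 km circle

print('ROI center offset: {:.0f} m east, {:.0f} m north'.format(offset_x, offset_y))
print('Shift: {:.0f} m, farthest ROI point from city center: {:.0f} m'.format(shift, farthest))
```

The region's center sits about 1.6 km south-southeast of Setiabudi, so candidate locations inside it can be up to roughly 4.1 km from the city center.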

In [24]:
k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_step = 100
y_step = 100 * k 
roi_y_min = roi_center_y - 2500

roi_latitudes = []
roi_longitudes = []
roi_xs = []
roi_ys = []
for i in range(0, int(51/k)):
    y = roi_y_min + i * y_step
    x_offset = 50 if i%2==0 else 0
    for j in range(0, 51):
        x = roi_x_min + j * x_step + x_offset
        d = calc_xy_distance(roi_center_x, roi_center_y, x, y)
        if (d <= 2501):
            lon, lat = xy_to_lonlat(x, y)
            roi_latitudes.append(lat)
            roi_longitudes.append(lon)
            roi_xs.append(x)
            roi_ys.append(y)

print(len(roi_latitudes), 'candidate neighborhood centers generated.')

2261 candidate neighborhood centers generated.


In [25]:
roi_hotels_counts = []

def count_hotels_nearby(x, y, hotel, radius):
    # Count hotels whose projected position lies within `radius` meters of (x, y)
    count = 0
    for h_x, h_y in zip(hotel['Venue_x'], hotel['Venue_y']):
        e = calc_xy_distance(x, y, h_x, h_y)
        if (e <= radius) :
            count+=1
    return count

for x, y in zip(roi_xs, roi_ys):
    count = count_hotels_nearby(x, y, jakarta_venues, radius=1000)
    roi_hotels_counts.append(count)

print('done')

done
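
The counting step is a simple radius query in projected coordinates: for each candidate point, count the hotels whose straight-line distance is at most the given radius. A compact standalone sketch follows. The function name `count_within` and the query point are assumptions, and the three hotel coordinates are rounded `Venue_x`/`Venue_y` values from the hotel table shown earlier.

```python
import math

def count_within(x, y, points, radius):
    """Count points whose Euclidean distance from (x, y) is at most `radius`."""
    return sum(1 for px, py in points if math.hypot(px - x, py - y) <= radius)

# Rounded UTM positions of three hotels from the table above
hotels = [(699587.93, -690608.76),
          (699554.12, -690437.52),
          (703941.59, -690893.25)]

# Hypothetical query point near the first two hotels
nearby = count_within(699600.0, -690500.0, hotels, radius=1000)
very_close = count_within(699600.0, -690500.0, hotels, radius=50)
print(nearby, very_close)
```

Since the same hotel can appear in several neighborhood searches, duplicate rows in the hotel table would be counted more than once, which does not matter for the "zero hotels nearby" filter but would inflate non-zero counts.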


#### Put candidate locations into dataframe

In [26]:
df_good_location = pd.DataFrame({'Latitude':roi_latitudes,
                                 'Longitude':roi_longitudes,
                                 'X':roi_xs,
                                 'Y':roi_ys,
                                 'Hotels_nearby':roi_hotels_counts,
                                 })

df_good_location.head()

Unnamed: 0,Latitude,Longitude,X,Y,Hotels_nearby
0,-6.252229,106.834205,702929.210301,-691440.432869,0
1,-6.252225,106.835108,703029.210301,-691440.432869,0
2,-6.251463,106.829232,702379.210301,-691353.830329,0
3,-6.25146,106.830136,702479.210301,-691353.830329,0
4,-6.251457,106.83104,702579.210301,-691353.830329,0


#### We want to focus on areas with no hotels nearby

In [27]:
df_good_candidates = df_good_location[df_good_location['Hotels_nearby']==0]
len(df_good_candidates)

199

#### Let's visualize the good candidate areas on the map

In [28]:
good_latitudes = df_good_candidates['Latitude'].values
good_longitudes = df_good_candidates['Longitude'].values

good_locations = [[lat, lon] for lat, lon in zip(good_latitudes, good_longitudes)]

map_jakarta = folium.Map(location=roi_center, zoom_start=13)
HeatMap(hotel_latlon).add_to(map_jakarta)
folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.6).add_to(map_jakarta)
folium.Marker(jakarta_center).add_to(map_jakarta)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_jakarta) 
map_jakarta

We now have areas with no hotels nearby, which should be good candidates. Let's cluster these locations and find the cluster centers. We will use k-means clustering with 5 clusters, whose centers will be presented to investors.
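
To make the clustering step concrete, here is a deliberately simplified k-means (Lloyd's algorithm) in plain Python: assign each point to its nearest center, move each center to the mean of its points, and repeat until nothing changes. This is only an illustration of the idea, not the scikit-learn implementation used in the notebook. The deterministic starting centers and the toy points are assumptions of the sketch.

```python
import math

def kmeans(points, k, iterations=20):
    """Simplified Lloyd's algorithm; returns a list of k (x, y) centers."""
    # Assumption: start from k points spread evenly through the input list
    centers = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: group points by nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: math.hypot(p[0] - centers[c][0], p[1] - centers[c][1]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members
        new_centers = []
        for c, members in enumerate(clusters):
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(centers[c])   # keep an empty cluster's center in place
        if new_centers == centers:               # converged
            break
        centers = new_centers
    return centers

# Two well-separated toy groups of points (coordinates in meters)
points = [(0, 0), (0, 100), (100, 0), (5000, 5000), (5000, 5100), (5100, 5000)]
centers = kmeans(points, k=2)
print(centers)
```

Each returned center is the average position of its group, which is why the cluster centers in the notebook can be read as the "middle" of each patch of hotel-free candidate locations.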

## Clustering

In [29]:
from sklearn.cluster import KMeans

number_of_clusters = 5

good_xys = df_good_candidates[['X', 'Y']].values
kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(good_xys)

cluster_centers = [xy_to_lonlat(cc[0], cc[1]) for cc in kmeans.cluster_centers_]

map_jakarta = folium.Map(location=roi_center, zoom_start=13)
folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.6).add_to(map_jakarta)
folium.Marker(jakarta_center).add_to(map_jakarta)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_jakarta) 
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=True, fill_opacity=0.25).add_to(map_jakarta) 

map_jakarta

The clusters look good. Their centers represent 5 candidate locations. Finally, let's **reverse geocode those candidate area centers to get their addresses**.

In [30]:
candidate_area = []

for lon, lat in cluster_centers:
    # Fall back to a placeholder if reverse geocoding fails and returns None
    addr = (get_address(google_api_key, lat, lon) or 'NO ADDRESS').replace(', Indonesia', '').replace(', Daerah Khusus Ibukota Jakarta ', '').replace('Kota Jakarta Selatan', '')
    candidate_area.append(addr)    
    x, y = lonlat_to_xy(lon, lat)
    d = calc_xy_distance(x, y, jakarta_center_x, jakarta_center_y)
    print('{}{} => {:.1f}km from Setiabudi'.format(addr, ' '*(50-len(addr)), d/1000))
    

6, Jl. Tebet Utara III B No.15, RT.7/RW.2, Tebet Tim., Kec. Tebet, 12820 => 2.9km from Setiabudi
Jl. Tegal Parang Selatan V No.47, RT.4/RW.7, Tegal Parang, Kec. Mampang Prpt., 12790 => 3.8km from Setiabudi
Jl. Tebet Timur II G No.2, RT.9/RW.5, Tebet Tim., Kec. Tebet, 12820 => 3.5km from Setiabudi
Jl. Mampang Prpt. Raya No.3a, RT.7/RW.1, Tegal Parang, Kec. Mampang Prpt., 12790 => 3.7km from Setiabudi
14, Jl. Barkah III No.29, RT.14/RW.2, Manggarai Sel., Kec. Tebet, 12860 => 2.4km from Setiabudi


#### The final visualization 

In [31]:
map_jakarta = folium.Map(location=roi_center, zoom_start=13)
folium.Circle(jakarta_center, radius=50, color='red', fill=True, fill_color='red', fill_opacity=1).add_to(map_jakarta)
for lonlat, addr in zip(cluster_centers, candidate_area):
    folium.Marker([lonlat[1], lonlat[0]], popup=addr).add_to(map_jakarta) 
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=250, color='#0000ff00', fill=True, fill_color='#0066ff', fill_opacity=0.05).add_to(map_jakarta)
map_jakarta

## Results and Discussion <a name="results"></a>

From the analysis, we can see that there are about 144 hotels within 3 km of the Jakarta city center (Setiabudi area), with the low-density areas found to the south. High-population areas are also mainly located in the south, so we focused our attention there.

After shifting our focus, we filtered the candidate locations to include only those with no hotels within 1 km. These locations were then clustered into 5 zones of interest that contain the greatest number of candidate locations. These zones are the recommended locations for investors looking to open a new hotel. Finally, we obtained the addresses of the zone centers using reverse geocoding, to be used as starting points for more detailed analysis based on other factors.

Although we have arrived at a recommendation, other factors need further consideration, such as tourist attractions, socioeconomic conditions, and government regulations.

## Conclusion <a name="conclusion"></a>

The problem addressed in this project is how to gather the required information and analyze it to determine the best candidate locations for stakeholders/investors opening a new hotel.

By generating datasets and clustering the locations, we have identified 5 areas to be considered for further analysis. Investors can use this recommendation in their decision-making process.
