# Bangkok Neighborhood Analysis along BTS Sukhumvit Line

## Background
Bangkok is a megacity that is home to 10 million people. The city covers 1,568.7 square kilometres and consists of 50 districts, each with its own lifestyle and neighborhoods. Most travel in Bangkok is by road, yet the road network can accommodate only 1.5 million vehicles while the city has 9.7 million cars and motorcycles combined. This overcapacity has led to problems such as traffic congestion, high fuel consumption and air pollution. The average driver in Bangkok spends an extra 207 hours per year in rush-hour traffic.

The Bangkok Mass Transit System, commonly known as the BTS, is an elevated rapid transit system. Its first line, the “Green Line”, opened in late 1999 to ease road congestion and consists of two branches: the Silom Line and the Sukhumvit Line. The Green Line now carries around 700,000 passengers daily and has become Bangkok's main mass transit system.

This study focuses on the neighborhoods along the “BTS Sukhumvit Line”, which consists of 32 stations, as depicted below:

<img src=https://upload.wikimedia.org/wikipedia/commons/thumb/0/0d/BTS_Sukhumvit.jpg/300px-BTS_Sukhumvit.jpg>

The Sukhumvit Line was chosen because it serves more passengers per day than the Silom Line. It also passes through the “Asok Intersection”, one of Bangkok's central business districts.

## Business Problem
This study aims to identify neighborhoods along the BTS Sukhumvit Line that offer lower living expenses and a short commute to work. The results could benefit several stakeholders, such as expatriates and multinational companies that need to find suitable places for their employees to live. Real estate developers could also apply the same approach to explore promising areas for new projects.

For this study, I assume that the target work location is “Asok Station” in the central business district and that residential neighborhoods lie around the stations of the BTS Sukhumvit Line. The preferred commute to and from work is within 30 minutes, which I approximate as a distance of at most 12 stations from Asok.
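The 30-minute budget and the 12-station cut-off together imply roughly 30 / 12 = 2.5 minutes per station. A minimal sketch of that conversion, assuming a constant 2.5 minutes per station (a figure derived from the two numbers above rather than from a BTS timetable, and ignoring walking and waiting time):

```python
def max_stations_within(commute_minutes: float, minutes_per_station: float = 2.5) -> int:
    """Largest whole number of stations reachable within the commute budget.

    Assumes a constant travel time per station; the 2.5-minute default is
    simply 30 minutes / 12 stations, not an official BTS figure.
    """
    if minutes_per_station <= 0:
        raise ValueError("minutes_per_station must be positive")
    return int(commute_minutes // minutes_per_station)


print(max_stations_within(30))  # 12
```

Changing `minutes_per_station` shows how sensitive the 12-station cut-off is to that assumption.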


## Data used to solve problem
1. Location of BTS Sukhumvit Line stations: Because no preformatted data source was available and the line has only a limited number of stations, I extracted the location of each station from Wikipedia and saved it in CSV format:

```python
import pandas as pd
Sukhumvit_Stations = pd.read_csv('Sukhumvit_Line.csv')
Sukhumvit_Stations.head()
```

| | Sequence | Station Name | Latitude | Longitude |
|---|---|---|---|---|
| 0 | 1 | Ha Yaek Lat Phrao | 13.816552 | 100.562012 |
| 1 | 2 | Mo Chit | 13.802583 | 100.553833 |
| 2 | 3 | Saphan Khwai | 13.7938 | 100.549731 |
| 3 | 4 | Ari | 13.779703 | 100.544642 |
| 4 | 5 | Sanam Pao | 13.772622 | 100.542092 |


- Sequence: Position of the station along the Sukhumvit Line, starting at 1
- Station Name: Name of the station
- Latitude and Longitude: Coordinates of the station

2. Foursquare Venues: Foursquare’s Places API is used to retrieve venue details, categories and price tiers.
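The venue search used later is a single HTTP GET against the v2 `venues/explore` endpoint, with the parameters the notebook sends (`client_id`, `client_secret`, `v`, `ll`, `radius`, `limit`). A minimal sketch that assembles such a URL with proper query-string encoding; the helper name is hypothetical, and Foursquare has since introduced a newer Places API, so the v2 endpoint may no longer accept requests from new accounts:

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius=1000, limit=100):
    """Assemble a v2 venues/explore URL using the same parameters as this notebook."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,             # API version date, e.g. '20180605'
        "ll": f"{lat},{lng}",     # search centre as 'latitude,longitude'
        "radius": radius,         # metres
        "limit": limit,           # maximum number of venues returned
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


# Placeholder credentials; coordinates of Ha Yaek Lat Phrao station from the table above
url = build_explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20180605", 13.816552, 100.562012)
print(url)
```

With the `requests` library, passing the same dictionary as `requests.get(EXPLORE_ENDPOINT, params=params)` performs this encoding automatically.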

# Methodology
## Explore the location of stations along BTS Sukhumvit Line

```python
!conda install -c conda-forge geopy --yes

# Import used libraries
from sklearn.cluster import KMeans
from geopy.geocoders import Nominatim
import folium
import numpy as np
import requests
import matplotlib.cm as cm
import matplotlib.colors as colors
```


```python
# Get the coordinates of Bangkok
address = 'Bangkok, Thailand'

# Nominatim asks clients to identify their application in the user agent
geolocator = Nominatim(user_agent="bts_sukhumvit_analysis")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Bangkok are {}, {}.'.format(latitude, longitude))
```

```
The geographical coordinates of Bangkok are 13.7542529, 100.493087.
```


```python
# Create map
map_stations = folium.Map(location=[latitude, longitude], zoom_start=11)

# Add markers to the map
for lat, lon, poi in zip(Sukhumvit_Stations['Latitude'], Sukhumvit_Stations['Longitude'], Sukhumvit_Stations['Station Name']):
    label = folium.Popup(str(poi), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='red',
        fill_opacity=0.7).add_to(map_stations)

map_stations
```

## Filter out stations more than 12 stations away from “Asok Station”

```python
# Find the sequence number of Asok Station
Seq_Asok = Sukhumvit_Stations.loc[Sukhumvit_Stations['Station Name']=='Asok']['Sequence'].values[0]
# Distance from Asok Station, measured in number of stations
Sukhumvit_Stations['Distance'] = abs(Sukhumvit_Stations['Sequence'] - Seq_Asok)
# Keep only stations within 12 stations of Asok Station
Selected_Stations = Sukhumvit_Stations.loc[Sukhumvit_Stations['Distance']<=12]

Selected_Stations
```

| | Sequence | Station Name | Latitude | Longitude | Distance |
|---|---|---|---|---|---|
| 0 | 1 | Ha Yaek Lat Phrao | 13.816552 | 100.562012 | 12 |
| 1 | 2 | Mo Chit | 13.802583 | 100.553833 | 11 |
| 2 | 3 | Saphan Khwai | 13.7938 | 100.549731 | 10 |
| 3 | 4 | Ari | 13.779703 | 100.544642 | 9 |
| 4 | 5 | Sanam Pao | 13.772622 | 100.542092 | 8 |
| 5 | 6 | Victory Monument | 13.762744 | 100.537086 | 7 |
| 6 | 7 | Phaya Thai | 13.756942 | 100.533844 | 6 |
| 7 | 8 | Ratchathewi | 13.751875 | 100.531575 | 5 |
| 8 | 9 | Siam | 13.745619 | 100.534228 | 4 |
| 9 | 10 | Chit Lom | 13.744108 | 100.543097 | 3 |


```python
# Show these stations on the map
# Create map
map_stations = folium.Map(location=[latitude, longitude], zoom_start=11)

# Add markers to the map
for lat, lon, poi in zip(Selected_Stations['Latitude'], Selected_Stations['Longitude'], Selected_Stations['Station Name']):
    label = folium.Popup(str(poi), parse_html=False)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='green',
        fill=True,
        fill_color='green',
        fill_opacity=0.7).add_to(map_stations)

map_stations
```

## Explore venues within 1,000 meters of each station via the Foursquare API

```python
# Define Foursquare credentials and API version
# (replace the placeholders with your own keys; do not publish real credentials)
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
```

```python
# Request parameters
LIMIT = 100    # maximum number of venues returned per station
radius = 1000  # search radius in metres

# Get venues around each station
def getNearbyVenues(names, latitudes, longitudes, radius=radius):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng,
            v['venue']['id'],
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Station Name', 
                  'Station Latitude', 
                  'Station Longitude',
                  'Venue ID',
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

# Get venues for all selected stations
station_venues = getNearbyVenues(names=Selected_Stations['Station Name'],
                                   latitudes=Selected_Stations['Latitude'],
                                   longitudes=Selected_Stations['Longitude']
                                  )
```

```
Ha Yaek Lat Phrao
Mo Chit
Saphan Khwai
Ari
Sanam Pao
Victory Monument
Phaya Thai
Ratchathewi
Siam
Chit Lom
Phloen Chit
Nana
Asok
Phrom Phong
Thong Lo
Ekkamai
Phra Khanong
On Nut
Bang Chak
Punnawithi
Udom Suk
Bang Na
Bearing
Samrong
Pu Chao
```
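To check how far each returned venue actually lies from its station (for example, to flag any result outside the 1,000 m radius), one option is the haversine great-circle distance. This helper is not part of the original notebook; it assumes a spherical Earth with a mean radius of 6,371 km, and the sample coordinates are taken from the `venues_merged` output later in this notebook.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Ha Yaek Lat Phrao station vs. the first venue returned for it
d_near = haversine_m(13.816552, 100.562012, 13.817346, 100.561466)
# Pu Chao station vs. Pak Nam 2 Toll Plaza, one of its farthest-looking results
d_far = haversine_m(13.637280, 100.592000, 13.628549, 100.592792)
print(round(d_near), round(d_far))
```

Applied row-wise to `station_venues`, the same function could mark venues whose distance exceeds `radius`.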


### Explore the venue results returned by Foursquare

```python
# Check the number of venues and the venue count per station
print('Total records:', station_venues.shape[0])
print('Unique categories:', len(station_venues['Venue Category'].unique()))
station_venues.groupby('Station Name').count()
```

```
Total records: 2155
Unique categories: 200
```

| Station Name | Station Latitude | Station Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Ari | 100 | 100 | 100 | 100 | 100 | 100 |
| Asok | 100 | 100 | 100 | 100 | 100 | 100 |
| Bang Chak | 44 | 44 | 44 | 44 | 44 | 44 |
| Bang Na | 52 | 52 | 52 | 52 | 52 | 52 |
| Bearing | 38 | 38 | 38 | 38 | 38 | 38 |
| Chit Lom | 100 | 100 | 100 | 100 | 100 | 100 |
| Ekkamai | 100 | 100 | 100 | 100 | 100 | 100 |
| Ha Yaek Lat Phrao | 100 | 100 | 100 | 100 | 100 | 100 |
| Mo Chit | 100 | 100 | 100 | 100 | 100 | 100 |
| Nana | 100 | 100 | 100 | 100 | 100 | 100 |
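Seven of the ten stations shown return exactly 100 venues, the value of `LIMIT`, so their results have probably hit the request cap and may be truncated rather than complete. A small check that flags such stations; the helper is hypothetical and the counts are copied from the table above:

```python
LIMIT = 100  # same value as the request limit used above

# Venue counts per station, copied from the groupby output above (partial list)
venue_counts = {
    "Ari": 100, "Asok": 100, "Bang Chak": 44, "Bang Na": 52, "Bearing": 38,
    "Chit Lom": 100, "Ekkamai": 100, "Ha Yaek Lat Phrao": 100, "Mo Chit": 100, "Nana": 100,
}


def capped_stations(counts, limit=LIMIT):
    """Return station names whose venue count reached the request limit."""
    return sorted(name for name, n in counts.items() if n >= limit)


print(capped_stations(venue_counts))
```

Stations at the cap are represented by only their first 100 venues, which is worth keeping in mind when comparing category frequencies across stations.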


## Analyze the neighborhood around each station

```python
# One-hot encode the venue categories
station_onehot = pd.get_dummies(station_venues[['Venue Category']], prefix="", prefix_sep="")

# Add the station name column back to the dataframe
station_onehot['Station Name'] = station_venues['Station Name'] 

# Move the Station Name column to the first position
fixed_columns = [station_onehot.columns[-1]] + list(station_onehot.columns[:-1])
station_onehot = station_onehot[fixed_columns]
# Average the one-hot rows per station: each value is that category's share of the station's venues
station_grouped = station_onehot.groupby('Station Name').mean().reset_index()
station_grouped
```

| | Station Name | Accessories Store | American Restaurant | Aquarium | Art Gallery | Art Museum | Arts & Crafts Store | Asian Restaurant | Athletics & Sports | Australian Restaurant | ... | Train Station | Vegetarian / Vegan Restaurant | Vietnamese Restaurant | Whisky Bar | Wine Bar | Women's Store | Yakitori Restaurant | Yoga Studio | Yoshoku Restaurant | Zoo Exhibit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Ari | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 | ... | 0.01 | 0.01 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Asok | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.01 | 0.0 | 0.01 | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 | 0.0 |
| 2 | Bang Chak | 0.0 | 0.0 | 0.0 | 0.022727 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.045455 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Bang Na | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.019231 | 0.019231 | ... | 0.038462 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Bearing | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.052632 | 0.0 | 0.0 | ... | 0.026316 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5 | Chit Lom | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | 0.01 | 0.0 | 0.0 |
| 6 | Ekkamai | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 | 0.01 | 0.01 | 0.0 | ... | 0.01 | 0.01 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 |
| 7 | Ha Yaek Lat Phrao | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 |
| 8 | Mo Chit | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | 0.01 | 0.01 | 0.0 | 0.0 | ... | 0.01 | 0.02 | 0.01 | 0.01 | 0.0 | 0.01 | 0.0 | 0.0 | 0.0 | 0.01 |
| 9 | Nana | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
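One-hot encoding followed by `groupby(...).mean()` yields, for each station, the share of its venues that fall in each category. The same computation on a few hypothetical venues, using only the standard library, makes the idea concrete:

```python
from collections import Counter, defaultdict

# Hypothetical (station, venue category) pairs standing in for station_venues
venues = [
    ("Ari", "Thai Restaurant"), ("Ari", "Café"), ("Ari", "Thai Restaurant"), ("Ari", "Bar"),
    ("Asok", "Hotel"), ("Asok", "Hotel"), ("Asok", "Spa"), ("Asok", "Coffee Shop"),
]


def category_profiles(pairs):
    """Map each station to {category: fraction of that station's venues}.

    Equivalent to averaging one-hot rows per station: each venue contributes 1
    to exactly one category column, so a column's mean is that category's share.
    """
    by_station = defaultdict(Counter)
    for station, category in pairs:
        by_station[station][category] += 1
    profiles = {}
    for station, counts in by_station.items():
        total = sum(counts.values())
        profiles[station] = {cat: n / total for cat, n in counts.items()}
    return profiles


profiles = category_profiles(venues)
print(profiles["Ari"]["Thai Restaurant"])  # 0.5
```

Unlike `get_dummies`, this sketch omits categories a station has no venues in, where the DataFrame version stores an explicit 0.0.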


## Explore top 10 categories of each station

```python
num_top_venues = 10

for station in station_grouped['Station Name']:
    print("----"+station+"----")
    temp = station_grouped[station_grouped['Station Name'] == station].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')
```

```
----Ari----
                 venue  freq
0      Thai Restaurant  0.11
1                 Café  0.08
2          Coffee Shop  0.08
3         Noodle House  0.06
4                  Bar  0.05
5     Sushi Restaurant  0.05
6   Som Tum Restaurant  0.04
7  Japanese Restaurant  0.04
8           Restaurant  0.04
9            BBQ Joint  0.03


----Asok----
               venue  freq
0              Hotel  0.15
1  Korean Restaurant  0.09
2        Coffee Shop  0.07
3                Spa  0.06
4    Thai Restaurant  0.06
5       Burger Joint  0.03
6         Restaurant  0.03
7          Hotel Bar  0.03
8               Café  0.02
9        Pizza Place  0.02


----Bang Chak----
                 venue  freq
0    Convenience Store  0.16
1         Noodle House  0.14
2          Coffee Shop  0.09
3                Hotel  0.07
4      Thai Restaurant  0.07
5   Chinese Restaurant  0.07
6   Italian Restaurant  0.05
7        Train Station  0.05
8  Japanese Restaurant  0.02
9             Boutique  0.02


----Bang Na----
```


```python
# Arrange the top 10 categories of each station into a DataFrame
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]  # skip the Station Name column
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]

# Create columns according to the number of top venues
indicators = ['st', 'nd', 'rd']
columns = ['Station Name']
for ind in np.arange(num_top_venues):
    if ind < len(indicators):
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    else:
        columns.append('{}th Most Common Venue'.format(ind+1))

# Create a new dataframe
station_sorted = pd.DataFrame(columns=columns)
station_sorted['Station Name'] = station_grouped['Station Name']

for ind in np.arange(station_grouped.shape[0]):
    station_sorted.iloc[ind,1:] = return_most_common_venues(station_grouped.iloc[ind,:], num_top_venues)

station_sorted.head()
```

| | Station Name | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Ari | Thai Restaurant | Coffee Shop | Café | Noodle House | Bar | Sushi Restaurant | Som Tum Restaurant | Restaurant | Japanese Restaurant | BBQ Joint |
| 1 | Asok | Hotel | Korean Restaurant | Coffee Shop | Thai Restaurant | Spa | Restaurant | Hotel Bar | Burger Joint | Café | BBQ Joint |
| 2 | Bang Chak | Convenience Store | Noodle House | Coffee Shop | Chinese Restaurant | Hotel | Thai Restaurant | Italian Restaurant | Train Station | Spa | Market |
| 3 | Bang Na | Coffee Shop | Convenience Store | Fast Food Restaurant | Noodle House | Train Station | Thai Restaurant | Hotpot Restaurant | Music Venue | Stadium | Golf Driving Range |
| 4 | Bearing | Convenience Store | Noodle House | Coffee Shop | Thai Restaurant | Café | Steakhouse | Market | Bus Stop | Asian Restaurant | Bakery |
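The `indicators` list covers only the 1st to 3rd positions and falls back to "th" otherwise. That is correct up to 20, but would produce labels such as "21th" if `num_top_venues` were raised above 20. A general helper (hypothetical, not part of the original notebook) that handles the English suffix rules, including the 11th to 13th exceptions:

```python
def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th', 22 -> '22nd'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 10th to 20th (and 110th to 120th, ...) always take 'th'
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


# Rebuild the same column names as above for num_top_venues = 10
columns = ["Station Name"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, 11)]
print(columns[1], "|", columns[-1])
```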


## Cluster the neighborhoods around each station

```python
# Set the number of clusters
kclusters = 5

# Drop the label column so only the category frequencies are clustered
station_grouped_clustering = station_grouped.drop(columns='Station Name')

# Run k-means clustering (n_init=10 keeps the pre-1.4 scikit-learn default)
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(station_grouped_clustering)

# Add clustering labels
station_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# Merge Dataframes
station_merged = Selected_Stations.join(station_sorted.set_index('Station Name'), on='Station Name')

station_merged
```

| | Sequence | Station Name | Latitude | Longitude | Distance | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Ha Yaek Lat Phrao | 13.816552 | 100.562012 | 12 | 1 | Coffee Shop | Bar | Thai Restaurant | Dessert Shop | Ice Cream Shop | Japanese Restaurant | Korean Restaurant | Clothing Store | Bakery | Bookstore |
| 1 | 2 | Mo Chit | 13.802583 | 100.553833 | 11 | 1 | Coffee Shop | Thai Restaurant | Convenience Store | Café | Noodle House | Shopping Mall | Som Tum Restaurant | Seafood Restaurant | Park | Fast Food Restaurant |
| 2 | 3 | Saphan Khwai | 13.7938 | 100.549731 | 10 | 1 | Thai Restaurant | Coffee Shop | Noodle House | Som Tum Restaurant | Café | Convenience Store | Hotel | Seafood Restaurant | Bar | BBQ Joint |
| 3 | 4 | Ari | 13.779703 | 100.544642 | 9 | 4 | Thai Restaurant | Coffee Shop | Café | Noodle House | Bar | Sushi Restaurant | Som Tum Restaurant | Restaurant | Japanese Restaurant | BBQ Joint |
| 4 | 5 | Sanam Pao | 13.772622 | 100.542092 | 8 | 1 | Noodle House | Coffee Shop | Thai Restaurant | Sushi Restaurant | Bar | Restaurant | Bakery | Hotel | BBQ Joint | Café |
| 5 | 6 | Victory Monument | 13.762744 | 100.537086 | 7 | 0 | Hotel | Noodle House | Coffee Shop | Café | Steakhouse | Thai Restaurant | Hostel | Convenience Store | Bakery | Food Court |
| 6 | 7 | Phaya Thai | 13.756942 | 100.533844 | 6 | 0 | Hotel | Noodle House | Coffee Shop | Hostel | Steakhouse | Café | Restaurant | Som Tum Restaurant | Convenience Store | Bar |
| 7 | 8 | Ratchathewi | 13.751875 | 100.531575 | 5 | 0 | Hotel | Coffee Shop | Hostel | Dessert Shop | Bakery | Café | Restaurant | Shopping Mall | Korean Restaurant | Bar |
| 8 | 9 | Siam | 13.745619 | 100.534228 | 4 | 1 | Coffee Shop | Dessert Shop | Bakery | Café | Thai Restaurant | Shopping Mall | Korean Restaurant | Movie Theater | Cosmetics Shop | Boutique |
| 9 | 10 | Chit Lom | 13.744108 | 100.543097 | 3 | 0 | Hotel | Coffee Shop | Thai Restaurant | Spa | Chinese Restaurant | Ice Cream Shop | Italian Restaurant | Department Store | Restaurant | Cosmetics Shop |
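`KMeans` groups stations whose category-frequency vectors lie close together in Euclidean distance. A bare-bones version of Lloyd's algorithm on hypothetical two-dimensional profiles (share of hotels, share of convenience stores) shows the mechanics. It uses a fixed initialisation for reproducibility, whereas scikit-learn uses k-means++ seeding with several restarts, so this is an illustration rather than a drop-in replacement:

```python
import math


def kmeans(points, k, iters=100):
    """Plain Lloyd's algorithm: assign each point to its nearest centroid, recompute means, repeat.

    Centroids start at the first k points (deterministic, for illustration only).
    Returns (labels, centroids).
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its assigned points
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
    return labels, centroids


# Hypothetical (hotel share, convenience-store share) for four stations:
# two hotel-heavy, two convenience-store-heavy
points = [(0.15, 0.01), (0.14, 0.02), (0.01, 0.16), (0.02, 0.14)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 1, 1]
```

The real data has around 200 dimensions (one per category), but the assignment and update steps are the same.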


## Display Clusters on the Map

```python
# Create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# Set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Add markers to the map; labels run from 0 to kclusters-1, so they index rainbow directly
for lat, lon, poi, cluster in zip(station_merged['Latitude'], station_merged['Longitude'], station_merged['Station Name'], station_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
```

## Examine each cluster

### Cluster Labels = 0

```python
station_merged.loc[station_merged['Cluster Labels'] == 0, station_merged.columns[[1] + list(range(5, station_merged.shape[1]))]]
```

| | Station Name | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | Victory Monument | 0 | Hotel | Noodle House | Coffee Shop | Café | Steakhouse | Thai Restaurant | Hostel | Convenience Store | Bakery | Food Court |
| 6 | Phaya Thai | 0 | Hotel | Noodle House | Coffee Shop | Hostel | Steakhouse | Café | Restaurant | Som Tum Restaurant | Convenience Store | Bar |
| 7 | Ratchathewi | 0 | Hotel | Coffee Shop | Hostel | Dessert Shop | Bakery | Café | Restaurant | Shopping Mall | Korean Restaurant | Bar |
| 9 | Chit Lom | 0 | Hotel | Coffee Shop | Thai Restaurant | Spa | Chinese Restaurant | Ice Cream Shop | Italian Restaurant | Department Store | Restaurant | Cosmetics Shop |
| 10 | Phloen Chit | 0 | Hotel | Coffee Shop | Café | Thai Restaurant | Restaurant | Hotel Bar | French Restaurant | Japanese Restaurant | Italian Restaurant | Buffet |
| 11 | Nana | 0 | Hotel | Korean Restaurant | Hotel Bar | Indian Restaurant | Japanese Restaurant | Coffee Shop | Pizza Place | Spa | Café | Bar |
| 12 | Asok | 0 | Hotel | Korean Restaurant | Coffee Shop | Thai Restaurant | Spa | Restaurant | Hotel Bar | Burger Joint | Café | BBQ Joint |


### Cluster Labels = 1

```python
station_merged.loc[station_merged['Cluster Labels'] == 1, station_merged.columns[[1] + list(range(5, station_merged.shape[1]))]]
```

| | Station Name | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Ha Yaek Lat Phrao | 1 | Coffee Shop | Bar | Thai Restaurant | Dessert Shop | Ice Cream Shop | Japanese Restaurant | Korean Restaurant | Clothing Store | Bakery | Bookstore |
| 1 | Mo Chit | 1 | Coffee Shop | Thai Restaurant | Convenience Store | Café | Noodle House | Shopping Mall | Som Tum Restaurant | Seafood Restaurant | Park | Fast Food Restaurant |
| 2 | Saphan Khwai | 1 | Thai Restaurant | Coffee Shop | Noodle House | Som Tum Restaurant | Café | Convenience Store | Hotel | Seafood Restaurant | Bar | BBQ Joint |
| 4 | Sanam Pao | 1 | Noodle House | Coffee Shop | Thai Restaurant | Sushi Restaurant | Bar | Restaurant | Bakery | Hotel | BBQ Joint | Café |
| 8 | Siam | 1 | Coffee Shop | Dessert Shop | Bakery | Café | Thai Restaurant | Shopping Mall | Korean Restaurant | Movie Theater | Cosmetics Shop | Boutique |
| 21 | Bang Na | 1 | Coffee Shop | Convenience Store | Fast Food Restaurant | Noodle House | Train Station | Thai Restaurant | Hotpot Restaurant | Music Venue | Stadium | Golf Driving Range |


### Cluster Labels = 2

```python
station_merged.loc[station_merged['Cluster Labels'] == 2, station_merged.columns[[1] + list(range(5, station_merged.shape[1]))]]
```

| | Station Name | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 17 | On Nut | 2 | Convenience Store | Hotel | Coffee Shop | Italian Restaurant | Fast Food Restaurant | Residential Building (Apartment / Condo) | Noodle House | Ice Cream Shop | Chinese Restaurant | Café |
| 18 | Bang Chak | 2 | Convenience Store | Noodle House | Coffee Shop | Chinese Restaurant | Hotel | Thai Restaurant | Italian Restaurant | Train Station | Spa | Market |
| 19 | Punnawithi | 2 | Convenience Store | Thai Restaurant | Fast Food Restaurant | Coffee Shop | Noodle House | Ice Cream Shop | Bakery | Café | Hotel | Gas Station |
| 20 | Udom Suk | 2 | Thai Restaurant | Convenience Store | Noodle House | Fast Food Restaurant | Café | Coffee Shop | Bar | Bakery | Dim Sum Restaurant | BBQ Joint |
| 22 | Bearing | 2 | Convenience Store | Noodle House | Coffee Shop | Thai Restaurant | Café | Steakhouse | Market | Bus Stop | Asian Restaurant | Bakery |
| 23 | Samrong | 2 | Noodle House | Convenience Store | Fast Food Restaurant | Market | Hotpot Restaurant | Ramen Restaurant | Bakery | Shabu-Shabu Restaurant | Buffet | Medical Center |


### Cluster Labels = 3

```python
station_merged.loc[station_merged['Cluster Labels'] == 3, station_merged.columns[[1] + list(range(5, station_merged.shape[1]))]]
```

| | Station Name | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 24 | Pu Chao | 3 | Convenience Store | Noodle House | Food Truck | Intersection | BBQ Joint | Bus Station | Bus Stop | Soccer Field | Café | Museum |


### Cluster Labels = 4

```python
station_merged.loc[station_merged['Cluster Labels'] == 4, station_merged.columns[[1] + list(range(5, station_merged.shape[1]))]]
```

| | Station Name | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | Ari | 4 | Thai Restaurant | Coffee Shop | Café | Noodle House | Bar | Sushi Restaurant | Som Tum Restaurant | Restaurant | Japanese Restaurant | BBQ Joint |
| 13 | Phrom Phong | 4 | Japanese Restaurant | Café | Hotel | Coffee Shop | Thai Restaurant | Supermarket | Shopping Mall | Spa | Massage Studio | Restaurant |
| 14 | Thong Lo | 4 | Thai Restaurant | Coffee Shop | Japanese Restaurant | Café | Hotel | BBQ Joint | Bar | Noodle House | Spa | Cocktail Bar |
| 15 | Ekkamai | 4 | Japanese Restaurant | Coffee Shop | Thai Restaurant | Café | Dessert Shop | Noodle House | Gym / Fitness Center | Hotel | Ramen Restaurant | Theme Park |
| 16 | Phra Khanong | 4 | Coffee Shop | Japanese Restaurant | Convenience Store | Gym / Fitness Center | Noodle House | Bakery | Café | Ramen Restaurant | Art Gallery | Thai Restaurant |


## Examine living expenses based on the price tier of sample venues
The venue details returned by Foursquare include a price tier, but not every venue has one.

```python
# Merge Dataframes
venues_merged = station_venues.join(station_sorted[['Station Name','Cluster Labels']].set_index('Station Name'), on='Station Name')
venues_merged
```

| | Station Name | Station Latitude | Station Longitude | Venue ID | Venue | Venue Latitude | Venue Longitude | Venue Category | Cluster Labels |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Ha Yaek Lat Phrao | 13.816552 | 100.562012 | 4c26d6e05c5ca593ebfa46fe | Starbucks (สตาร์บัคส์) | 13.817346 | 100.561466 | Coffee Shop | 1 |
| 1 | Ha Yaek Lat Phrao | 13.816552 | 100.562012 | 4b9a45d6f964a520d2a835e3 | Starbucks (สตาร์บัคส์) | 13.816479 | 100.560947 | Coffee Shop | 1 |
| 2 | Ha Yaek Lat Phrao | 13.816552 | 100.562012 | 4f338c37e4b0662777593023 | After You (อาฟเตอร์ ยู) | 13.816114 | 100.560331 | Dessert Shop | 1 |
| 3 | Ha Yaek Lat Phrao | 13.816552 | 100.562012 | 4e59ed36d164da2b2769fcc2 | Cold Stone Creamery (โคล สโตน ครีมเมอรี่) | 13.816303 | 100.560715 | Ice Cream Shop | 1 |
| 4 | Ha Yaek Lat Phrao | 13.816552 | 100.562012 | 4b0bc026f964a520663323e3 | CentralPlaza Lardprao (เซ็นทรัลพลาซา ลาดพร้าว) | 13.816376 | 100.561057 | Shopping Mall | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2150 | Pu Chao | 13.637280 | 100.592000 | 50416ff7e4b0b5223da49e85 | ร้านข้าวมันไก่ตอน (ปอ)เจ้าเก่า | 13.643967 | 100.588624 | Asian Restaurant | 3 |
| 2151 | Pu Chao | 13.637280 | 100.592000 | 4d99ca67647d8cfa148a0f3e | Club 148 สนามฟุตบอลหญ้าเทียม | 13.643647 | 100.597727 | Soccer Field | 3 |
| 2152 | Pu Chao | 13.637280 | 100.592000 | 4ce25a2c7e2e236ab820951b | The Pizza Company (เดอะ พิซซ่า คอมปะนี) | 13.644400 | 100.587137 | Pizza Place | 3 |
| 2153 | Pu Chao | 13.637280 | 100.592000 | 4eccb7030aafc14745e69c13 | Pak Nam 2 Toll Plaza (ด่านฯ ปากน้ำ 2) | 13.628549 | 100.592792 | Toll Plaza | 3 |


```python
# Get the mean price tier of a random sample of n venues from one cluster
def get_price_tier(df, cluster, n):
    sample = df.loc[df['Cluster Labels'] == cluster].sample(n)
    price_tier_array = []
    for venue_id in sample['Venue ID']:
        url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
        result = requests.get(url).json()
        try:
            price_tier_array.append(result['response']['venue']['price']['tier'])
        except (KeyError, TypeError):
            # Not every venue has price information; skip those without it
            pass
    return np.mean(price_tier_array)
```
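Because many venues lack a price tier, each cluster's average rests on fewer than `n` values, and `np.mean` of an empty list returns `nan` with a runtime warning. A small standard-library sketch that averages only the available tiers and also reports how many contributed; the helper name and sample responses are hypothetical, shaped like the `response.venue.price.tier` path used above:

```python
from statistics import mean

# Hypothetical venue-detail responses; only some include a price tier
responses = [
    {"response": {"venue": {"price": {"tier": 1}}}},
    {"response": {"venue": {}}},                      # no price information
    {"response": {"venue": {"price": {"tier": 3}}}},
    {"response": {"venue": {"price": {"tier": 2}}}},
]


def summarize_tiers(results):
    """Return (mean tier or None, number of venues that had a tier)."""
    tiers = []
    for r in results:
        try:
            tiers.append(r["response"]["venue"]["price"]["tier"])
        except (KeyError, TypeError):
            continue  # venue has no price tier; skip it
    return (mean(tiers) if tiers else None), len(tiers)


avg_tier, n_with_tier = summarize_tiers(responses)
print(avg_tier, n_with_tier)
```

Reporting the count alongside the mean shows how many of the 30 sampled venues each cluster's figure actually rests on.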

```python
# Analyze the price tier of a sample of venues in each cluster
data = []
n = 30
for cluster in range(kclusters):
    price_tier = get_price_tier(venues_merged, cluster, n)
    data.append([cluster, price_tier])
```

```python
# Create a DataFrame of the price tier for each cluster
dfPriceTier = pd.DataFrame(data, columns = ['Cluster Labels','Price Tier'])
dfPriceTier
```

| | Cluster Labels | Price Tier |
|---|---|---|
| 0 | 0 | 2.733333 |
| 1 | 1 | 2.105263 |
| 2 | 2 | 2.0 |
| 3 | 3 | 1.8 |
| 4 | 4 | 2.05 |


## Results
Based on the results above, Cluster 3 has the lowest living expenses and Cluster 2 the second lowest (depicted on the map below).
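The ranking can be read directly off the price-tier table. A tiny sketch that sorts the clusters by their sampled mean tier, with the values copied from the `dfPriceTier` output above:

```python
# Mean price tier per cluster, copied from dfPriceTier above
price_tier = {0: 2.733333, 1: 2.105263, 2: 2.0, 3: 1.8, 4: 2.05}

ranked = sorted(price_tier, key=price_tier.get)  # cheapest cluster first
print(ranked)  # [3, 2, 4, 1, 0]
```

Clusters 2, 4 and 1 lie within about 0.1 of a tier of each other, and each figure comes from a random sample of 30 venues drawn without a fixed seed, so the order among those three may change between runs.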

```python
# Create map
map_results = folium.Map(location=[latitude, longitude], zoom_start=11)
c2 = station_merged.loc[station_merged['Cluster Labels'] == 2]
c3 = station_merged.loc[station_merged['Cluster Labels'] == 3]
results = pd.concat([c3,c2])

# Set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Add markers to the map; labels run from 0 to kclusters-1, so they index rainbow directly
for lat, lon, poi, cluster in zip(results['Latitude'], results['Longitude'], results['Station Name'], results['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_results)

map_results
```

```python
results
```

| | Sequence | Station Name | Latitude | Longitude | Distance | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 24 | 25 | Pu Chao | 13.63728 | 100.592 | 12 | 3 | Convenience Store | Noodle House | Food Truck | Intersection | BBQ Joint | Bus Station | Bus Stop | Soccer Field | Café | Museum |
| 17 | 18 | On Nut | 13.705611 | 100.601083 | 5 | 2 | Convenience Store | Hotel | Coffee Shop | Italian Restaurant | Fast Food Restaurant | Residential Building (Apartment / Condo) | Noodle House | Ice Cream Shop | Chinese Restaurant | Café |
| 18 | 19 | Bang Chak | 13.696751 | 100.605349 | 6 | 2 | Convenience Store | Noodle House | Coffee Shop | Chinese Restaurant | Hotel | Thai Restaurant | Italian Restaurant | Train Station | Spa | Market |
| 19 | 20 | Punnawithi | 13.688919 | 100.609211 | 7 | 2 | Convenience Store | Thai Restaurant | Fast Food Restaurant | Coffee Shop | Noodle House | Ice Cream Shop | Bakery | Café | Hotel | Gas Station |
| 20 | 21 | Udom Suk | 13.680317 | 100.609658 | 8 | 2 | Thai Restaurant | Convenience Store | Noodle House | Fast Food Restaurant | Café | Coffee Shop | Bar | Bakery | Dim Sum Restaurant | BBQ Joint |
| 22 | 23 | Bearing | 13.659336 | 100.60105 | 10 | 2 | Convenience Store | Noodle House | Coffee Shop | Thai Restaurant | Café | Steakhouse | Market | Bus Stop | Asian Restaurant | Bakery |
| 23 | 24 | Samrong | 13.647363 | 100.596153 | 11 | 2 | Noodle House | Convenience Store | Fast Food Restaurant | Market | Hotpot Restaurant | Ramen Restaurant | Bakery | Shabu-Shabu Restaurant | Buffet | Medical Center |
