# Capstone Project - The Battle of the Neighborhoods (Week 5)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)



## Introduction: Business Problem <a name="introduction"></a>

**AT20** is a fitness studio group in mainland China specializing in Electrical Muscle Stimulation (EMS) training. AT20 claims that this training method delivers impressive results in a short time: people can reap the benefits of a comparable hour-long workout in 20 minutes or less.
AT20's target customers are busy office workers who work long hours and cannot spare the time for long workouts. Experience suggests that an AT20 studio is ideally located **near a subway station**.

AT20's management plans to open studios in Hong Kong that meet the following criteria:
1. **within 1.5 km of a major MTR station** (the Hong Kong subway)
2. in a neighborhood where the **number of fitness centers is not saturated** (less competition)

We would like to apply data science techniques to help AT20 decide the best locations for its new training studios.

People who would be interested in the findings are:
1. AT20's management, who have to decide the locations of the new studios in Hong Kong
2. anyone interested in setting up a fitness center/gym business in Hong Kong


## Data <a name="data"></a>

As AT20's new studios need to be located near major MTR stations where the number of fitness centers/gyms is not saturated, the following data is required:
1. List of MTR stations in Hong Kong, obtained from Wikipedia (https://en.wikipedia.org/wiki/List_of_MTR_stations)
2. Number of fitness centers/gyms within a 1.5 km radius of each MTR station, obtained via Foursquare API queries
3. Population density of the district where each MTR station is located, obtained from Wikipedia (https://en.wikipedia.org/wiki/Districts_of_Hong_Kong)

**Obtain the list of MTR stations from Wikipedia**

First, we obtain the list of Hong Kong's MTR stations from Wikipedia (https://en.wikipedia.org/wiki/List_of_MTR_stations).
The station names were captured and saved in a CSV file (MTR_stations.csv).

In [87]:
import numpy as np
import pandas as pd
import requests # library to handle requests
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import folium # map rendering library
from folium.plugins import HeatMap
import json
import math

In [5]:
MTR_hk = pd.read_csv("MTR_stations.csv")
MTR_hk.head()

| | MTRStation | longitude | latitude |
|---|---|---|---|
| 0 | MTR Admiralty station | 0 | 0 |
| 1 | MTR Causeway Bay station | 0 | 0 |
| 2 | MTR Central station | 0 | 0 |
| 3 | MTR Chai Wan station | 0 | 0 |
| 4 | MTR Che Kung Temple station | 0 | 0 |


#### Obtain the longitude and latitude of each MTR station

In [8]:
# API keys for Google (geocoding) and Foursquare (venue queries); fill in your own credentials
google_api_key= ' '
foursquare_client_id=' '
foursquare_client_secret=' '
VERSION = '20180605'

In [9]:
def get_coordinates(api_key, address, verbose=False):
    """Return [lat, lon] for an address via the Google Geocoding API, or [None, None] on failure."""
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json'
        # let requests URL-encode the query (station names contain spaces)
        response = requests.get(url, params={'key': api_key, 'address': address}).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        geographical_data = results[0]['geometry']['location']  # get geographical coordinates
        lat = geographical_data['lat']
        lon = geographical_data['lng']
        return [lat, lon]
    except (requests.RequestException, ValueError, KeyError, IndexError):
        return [None, None]
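The function above relies on a `try/except` to catch failed lookups. As a minimal sketch, assuming the documented shape of a Google Geocoding API JSON reply (a top-level `status` string, `"OK"` on success, and a `results` list whose entries carry `geometry.location.lat`/`lng`), the parsing step can be separated out and checked against `status` instead. `parse_geocode_response` is a hypothetical helper, and the sample replies are hand-made, not real API output.

```python
def parse_geocode_response(response):
    """Return (lat, lng) from a Geocoding API JSON reply, or (None, None) if nothing matched.

    Hypothetical helper: assumes the reply has a 'status' field ("OK" on success)
    and a 'results' list whose first entry holds geometry.location.lat/lng.
    """
    if response.get('status') != 'OK' or not response.get('results'):
        return (None, None)
    location = response['results'][0]['geometry']['location']
    return (location['lat'], location['lng'])


# hand-made, trimmed replies; the coordinates are the Admiralty values found in this notebook
sample_ok = {
    'status': 'OK',
    'results': [{'geometry': {'location': {'lat': 22.278285, 'lng': 114.164536}}}],
}
sample_empty = {'status': 'ZERO_RESULTS', 'results': []}

print(parse_geocode_response(sample_ok))     # (22.278285, 114.164536)
print(parse_geocode_response(sample_empty))  # (None, None)
```

Inside `get_coordinates`, the result of `requests.get(...).json()` could be passed straight to this helper, so an unmatched address shows up as `(None, None)` without hiding unrelated programming errors.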

In [10]:
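# geocode every station name and store the coordinates (cast to float first, as the CSV holds 0 placeholders)
MTR_hk[['longitude', 'latitude']] = MTR_hk[['longitude', 'latitude']].astype(float)
for i in range(MTR_hk.shape[0]):
    lat, lon = get_coordinates(google_api_key, MTR_hk.loc[i, 'MTRStation'])
    MTR_hk.loc[i, 'longitude'] = lon
    MTR_hk.loc[i, 'latitude'] = lat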

In [11]:
MTR_hk.head()

| | MTRStation | longitude | latitude |
|---|---|---|---|
| 0 | MTR Admiralty station | 114.164536 | 22.278285 |
| 1 | MTR Causeway Bay station | 114.185042 | 22.280375 |
| 2 | MTR Central station | 114.15746 | 22.281895 |
| 3 | MTR Chai Wan station | 114.23635 | 22.264498 |
| 4 | MTR Che Kung Temple station | 114.185953 | 22.37476 |


**Let's take a look at the locations of MTR stations in Hong Kong**

In [12]:
hk_center = [22.3193039, 114.1693611]

# create map of Hong Kong using latitude and longitude values
map_hongkong = folium.Map(location=hk_center, zoom_start=11)

# add one marker per MTR station, with the station name as popup
for lat, lng, label in zip(MTR_hk['latitude'], MTR_hk['longitude'], MTR_hk['MTRStation']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_hongkong)

map_hongkong

#### Obtain the population density (people per sq. km) of the district where each MTR station is located

The information is obtained from Wikipedia (https://en.wikipedia.org/wiki/Districts_of_Hong_Kong) and saved in a CSV file (MTR_Pdensity.csv).

In [13]:
MTR_Pdensity = pd.read_csv('MTR_Pdensity.csv')
MTR_Pdensity.head()

| | MTRStation | District | Pop_density |
|---|---|---|---|
| 0 | MTR Admiralty station | Central Western | 19983.92 |
| 1 | MTR Central station | Central Western | 19983.92 |
| 2 | MTR HKU station | Central Western | 19983.92 |
| 3 | MTR Kennedy Town station | Central Western | 19983.92 |
| 4 | MTR Sai Ying Pun station | Central Western | 19983.92 |


#### Use Foursquare API queries to obtain the number of gyms/fitness centers within a 1.5 km radius of each MTR station

In [14]:
# This is the Foursquare's venue category id for Gym / Fitness Center
cat='4bf58dd8d48988d175941735' 

In [17]:
# function to get nearby gym/fitness center venues based on a location's latitude and longitude
LIMIT = 100 # limit of number of venues returned by Foursquare API
def getNearbyVenues(names, latitudes, longitudes, radius=1500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
            foursquare_client_id, 
            foursquare_client_secret, 
            VERSION, 
            lat, 
            lng,
            cat,
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['MTRStation', 
                  'latitude', 
                  'longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
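`getNearbyVenues` indexes `v['venue']['categories'][0]` directly, which raises an `IndexError` for any venue with an empty category list, and `['groups'][0]` fails for a payload without a `groups` list. A minimal sketch of a more forgiving parser, assuming the same `venues/explore` payload fields the function above already reads; `venues_from_explore` and the sample payload (including the uncategorized second venue) are hypothetical.

```python
def venues_from_explore(station, lat, lng, payload):
    """Flatten a Foursquare venues/explore JSON payload into row tuples.

    Hypothetical helper: reads the same fields as getNearbyVenues, returns []
    when the payload has no 'groups', and uses None for uncategorized venues.
    """
    groups = payload.get('response', {}).get('groups') or []
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        # fall back to a None category when Foursquare lists no category
        categories = venue.get('categories') or [{'name': None}]
        rows.append((station, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     categories[0]['name']))
    return rows


# trimmed, hand-made payload; the second venue is made up to show the fallback
payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Pure Fitness',
               'location': {'lat': 22.278475, 'lng': 114.161363},
               'categories': [{'name': 'Gym / Fitness Center'}]}},
    {'venue': {'name': 'Unnamed studio',
               'location': {'lat': 22.2790, 'lng': 114.1650},
               'categories': []}},
]}]}}

rows = venues_from_explore('MTR Admiralty station', 22.278285, 114.164536, payload)
print(len(rows), rows[1][-1])  # 2 None
```

Inside the loop of `getNearbyVenues`, `venues_list.append(venues_from_explore(name, lat, lng, requests.get(url).json()))` would replace the list comprehension.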

In [18]:
# get all the nearby gym/fitness center venues for all MTR stations
gym_hk = MTR_hk
gym_venues = getNearbyVenues(names=gym_hk['MTRStation'],
                                   latitudes=gym_hk['latitude'],
                                   longitudes=gym_hk['longitude']
                                  )

MTR Admiralty station
MTR Causeway Bay station
MTR Central station
MTR Chai Wan station
MTR Che Kung Temple station
MTR Cheung Sha Wan station
MTR Choi Hung station
MTR City One station
MTR Diamond Hill station
MTR Fanling station
MTR Fo Tan station
MTR Fortress Hill station
MTR Hang Hau station
MTR Heng Fa Chuen station
MTR Heng On station
MTR HKU station
MTR Ho Man Tin station
MTR Hung Hom station
MTR Jordan station
MTR Kam Sheung Road station
MTR Kennedy Town station
MTR Kowloon Bay station
MTR Kowloon station
MTR Kowloon Tong station
MTR Kwai Fong station
MTR Kwai Hing station
MTR Kwun Tong station
MTR Lai Chi Kok station
MTR Lai King station
MTR Lam Tin station
MTR Lei Tung station
MTR LOHAS Park station
MTR Lok Fu station
MTR Long Ping station
MTR Ma On Shan station
MTR Mei Foo station
MTR Mong Kok station
MTR Nam Cheong station
MTR Ngau Tau Kok station
MTR North Point station
MTR Olympic station
MTR Po Lam station
MTR Prince Edward station
MTR Quarry Bay station
MTR Sai Wan Ho s

In [19]:
print(gym_venues.shape)
gym_venues.head()

(969, 7)


| | MTRStation | latitude | longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | MTR Admiralty station | 22.278285 | 114.164536 | Pure Fitness | 22.278475 | 114.161363 | Gym / Fitness Center |
| 1 | MTR Admiralty station | 22.278285 | 114.164536 | Pure Yoga | 22.278106 | 114.164754 | Yoga Studio |
| 2 | MTR Admiralty station | 22.278285 | 114.164536 | Pure Fitness | 22.279925 | 114.163022 | Gym / Fitness Center |
| 3 | MTR Admiralty station | 22.278285 | 114.164536 | Pure Yoga | 22.276904 | 114.168365 | Yoga Studio |
| 4 | MTR Admiralty station | 22.278285 | 114.164536 | Pure Fitness | 22.285137 | 114.159455 | Gym / Fitness Center |
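The Foursquare query passes `radius=1500`, but the notebook does not check how far each returned venue actually is from its station. A minimal sketch for verifying this, using the standard haversine great-circle formula with an assumed mean Earth radius of 6371 km; `haversine_km` is a hypothetical helper, and the sample coordinates are the Admiralty station and first Pure Fitness row from the table above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Admiralty station vs. the first Pure Fitness venue returned for it
d = haversine_km(22.278285, 114.164536, 22.278475, 114.161363)
print(round(d, 2))  # 0.33
```

To flag venues outside the 1.5 km circle, one could compute `[haversine_km(*r) for r in gym_venues[['latitude', 'longitude', 'Venue Latitude', 'Venue Longitude']].itertuples(index=False)]` and keep only rows whose distance is at most 1.5.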


In [20]:
# get a table of MTR stations and the number of gym/fitness centers within 1.5Km
df1 = gym_venues['MTRStation'].value_counts()
df1 = df1.to_frame().reset_index()
df1.columns = ["MTRStation", "count"]
gym_count = df1
gym_count.head()

| | MTRStation | count |
|---|---|---|
| 0 | MTR Admiralty station | 84 |
| 1 | MTR Central station | 82 |
| 2 | MTR Sai Ying Pun station | 77 |
| 3 | MTR Causeway Bay station | 58 |
| 4 | MTR Jordan station | 46 |
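`value_counts()` only lists stations that appear in `gym_venues`, so a station with no gym within 1.5 km is missing from `gym_count`, and the later left merges give it a `count` of NaN. A minimal sketch, on hypothetical toy data, of keeping such stations with an explicit zero by reindexing against the full station list (in the notebook this list would be `MTR_hk['MTRStation'].tolist()`):

```python
import pandas as pd

# hypothetical toy data: station C has no gym within range
stations = ['A', 'B', 'C']
venues = pd.DataFrame({'MTRStation': ['A', 'A', 'B']})

counts = (venues['MTRStation']
          .value_counts()
          .reindex(stations, fill_value=0)  # add missing stations with a count of 0
          .rename('count')
          .rename_axis('MTRStation')
          .reset_index())
print(counts)
```

With a zero count, the later `PeoplePerGym` division yields `inf`, which would rank that station first; whether that is the desired reading of "least saturated" is a judgment call.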


#### Let's draw a heatmap to depict the density of gyms/fitness centers around each MTR station

In [21]:
# attach station coordinates to the gym counts; stations with no gym nearby get a count of 0
df2 = df1.merge(MTR_hk, on='MTRStation', how='outer')
df2['count'] = df2['count'].fillna(0)
df2 = df2[['latitude', 'longitude', 'count']]

In [26]:
df3 = df2.values.tolist()
address = 'Hong Kong'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

# create map of Hong Kong using latitude and longitude values
map_hk = folium.Map(location=[latitude, longitude], zoom_start=10)

HeatMap(df3).add_to(map_hk)

map_hk

#### Now we have all the required data

## Methodology <a name="methodology"></a>

To recommend locations for new AT20 studios, we need to:
1. determine whether there are already "too many" gyms/fitness centers within a 1.5 km radius of each MTR station;
2. determine whether all MTR stations should be treated as one group or differentiated into several groups.

Here is our methodology:

A. Grouping of MTR stations:

step 1: Find the types of venues in the vicinity of each MTR station via Foursquare API queries.

step 2: Use the findings from step 1 to categorize all the MTR stations into 5 groups with k-means clustering.

step 3: Examine each of the 5 MTR station groups from step 2 and assign each a meaningful label.

B. Determine how "crowded" the gym/fitness center market is around each MTR station:
- The area of a circle with a 1.5 km radius is π × 1.5 × 1.5 ≈ 7.069 sq. km. The population within this circle, centered on an MTR station, is estimated as 7.069 × population density (people per sq. km).
- The number of people per gym/fitness center is then (7.069 × population density) / number of gyms/fitness centers.

step 1: Calculate the number of people per gym/fitness center using the formula above.

step 2: List the 3 MTR stations with the highest number of people per gym/fitness center in each group.

These stations are our recommended locations for new AT20 studios.
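The formula in part B can be written as a single per-station calculation. A minimal sketch, where `people_per_gym` is a hypothetical helper (not a function from this notebook) and the sample inputs are the Yau Tong and Chai Wan values reported in the analysis below:

```python
import math

RADIUS_KM = 1.5  # search radius used for the Foursquare queries


def people_per_gym(pop_density, gym_count, radius_km=RADIUS_KM):
    """Estimated residents per gym/fitness center within radius_km of a station.

    pop_density: district population density (people per sq. km)
    gym_count:   number of gyms/fitness centers found within radius_km
    Returns None when no gym was found, since the ratio is undefined.
    """
    if gym_count == 0:
        return None
    area_km2 = math.pi * radius_km ** 2   # about 7.069 sq. km for a 1.5 km radius
    population = pop_density * area_km2   # estimated residents inside the circle
    return population / gym_count


# Yau Tong (Kwun Tong district, 1 gym) and Chai Wan (Eastern district, 3 gyms)
print(round(people_per_gym(56779.05, 1)))  # 401347
print(round(people_per_gym(31217.67, 3)))  # 73555
```

Note that the district-level density is used as a proxy for the density inside each 1.5 km circle, so stations in the same district share the same numerator.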

## Analysis <a name="analysis"></a>

### A. Grouping of MTR stations

In [46]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

def getNearbyVenues(names, latitudes, longitudes, radius=1500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            foursquare_client_id, 
            foursquare_client_secret, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['MTRStation', 
                  'MTRStation Latitude', 
                  'MTRStation Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [47]:
HK_venues = getNearbyVenues(names=MTR_hk['MTRStation'],
                                   latitudes=MTR_hk['latitude'],
                                   longitudes=MTR_hk['longitude']
                                  )

MTR Admiralty station
MTR Causeway Bay station
MTR Central station
MTR Chai Wan station
MTR Che Kung Temple station
MTR Cheung Sha Wan station
MTR Choi Hung station
MTR City One station
MTR Diamond Hill station
MTR Fanling station
MTR Fo Tan station
MTR Fortress Hill station
MTR Hang Hau station
MTR Heng Fa Chuen station
MTR Heng On station
MTR HKU station
MTR Ho Man Tin station
MTR Hung Hom station
MTR Jordan station
MTR Kam Sheung Road station
MTR Kennedy Town station
MTR Kowloon Bay station
MTR Kowloon station
MTR Kowloon Tong station
MTR Kwai Fong station
MTR Kwai Hing station
MTR Kwun Tong station
MTR Lai Chi Kok station
MTR Lai King station
MTR Lam Tin station
MTR Lei Tung station
MTR LOHAS Park station
MTR Lok Fu station
MTR Long Ping station
MTR Ma On Shan station
MTR Mei Foo station
MTR Mong Kok station
MTR Nam Cheong station
MTR Ngau Tau Kok station
MTR North Point station
MTR Olympic station
MTR Po Lam station
MTR Prince Edward station
MTR Quarry Bay station
MTR Sai Wan Ho s

In [48]:
# one hot encoding
HK_onehot = pd.get_dummies(HK_venues[['Venue Category']], prefix="", prefix_sep="")

# add the station column back to the dataframe
HK_onehot['MTRStation'] = HK_venues['MTRStation'] 

# move the station column to the first column
fixed_columns = [HK_onehot.columns[-1]] + list(HK_onehot.columns[:-1])
HK_onehot = HK_onehot[fixed_columns]

HK_grouped = HK_onehot.groupby('MTRStation').mean().reset_index()
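Each row of `HK_grouped` holds the share of a station's returned venues that fall into each Foursquare category, and these frequency vectors are what k-means compares. The same two steps (one-hot encoding, then a per-station mean) on hypothetical toy data:

```python
import pandas as pd

# hypothetical toy venues for two stations
venues = pd.DataFrame({
    'MTRStation': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Coffee Shop', 'Coffee Shop', 'Gym', 'Park', 'Park', 'Bakery'],
})

# one indicator column per category (1 if the venue is in that category)
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'MTRStation', venues['MTRStation'])

# per-station mean = fraction of that station's venues in each category
profile = onehot.groupby('MTRStation').mean()
print(profile)
```

Because every row sums to 1, stations with many venues and stations with few are compared on the same scale; busy stations are still represented by at most `LIMIT` returned venues.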

In [49]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [50]:
num_top_venues = 30

def ordinal(n):
    # English ordinal suffix: 1st, 2nd, 3rd, 4th, ..., 11th-13th, ..., 21st, 22nd, 23rd, ...
    if 11 <= n % 100 <= 13:
        return '{}th'.format(n)
    return '{}{}'.format(n, {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th'))

# create columns according to number of top venues
columns = ['MTRStation'] + ['{} Most Common Venue'.format(ordinal(ind + 1)) for ind in range(num_top_venues)]

# create a new dataframe
HK_venues_sorted = pd.DataFrame(columns=columns)
HK_venues_sorted['MTRStation'] = HK_grouped['MTRStation']

for ind in np.arange(HK_grouped.shape[0]):
    HK_venues_sorted.iloc[ind, 1:] = return_most_common_venues(HK_grouped.iloc[ind, :], num_top_venues)

HK_venues_sorted.head()

| | MTRStation | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | ... | 21st Most Common Venue | 22nd Most Common Venue | 23rd Most Common Venue | 24th Most Common Venue | 25th Most Common Venue | 26th Most Common Venue | 27th Most Common Venue | 28th Most Common Venue | 29th Most Common Venue | 30th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | MTR Admiralty station | Hotel | Italian Restaurant | Café | Yoga Studio | Hotel Bar | Lounge | Bar | Japanese Restaurant | Steakhouse | ... | Hong Kong Restaurant | Park | Chinese Restaurant | Middle Eastern Restaurant | Snack Place | Social Club | Cocktail Bar | Multiplex | Electronics Store | Non-Profit |
| 1 | MTR Causeway Bay station | Japanese Restaurant | Chinese Restaurant | Coffee Shop | Dessert Shop | Sushi Restaurant | Hotel | Café | Szechuan Restaurant | Sporting Goods Shop | ... | Park | Pizza Place | Buffet | French Restaurant | Snack Place | Shopping Mall | Shanghai Restaurant | Electronics Store | Dim Sum Restaurant | Chocolate Shop |
| 2 | MTR Central station | Cocktail Bar | Hotel | Gym / Fitness Center | French Restaurant | Japanese Restaurant | Italian Restaurant | Art Gallery | Bar | Coffee Shop | ... | Middle Eastern Restaurant | BBQ Joint | Shanghai Restaurant | Sandwich Place | Cantonese Restaurant | Cupcake Shop | Shopping Mall | Café | Cultural Center | Electronics Store |
| 3 | MTR Chai Wan station | Fast Food Restaurant | Bakery | Chinese Restaurant | Coffee Shop | Convenience Store | Café | Cantonese Restaurant | Noodle House | Supermarket | ... | Bus Stop | Hostel | Korean Restaurant | Paper / Office Supplies Store | Park | Athletics & Sports | Hong Kong Restaurant | Asian Restaurant | Dumpling Restaurant | Vietnamese Restaurant |
| 4 | MTR Che Kung Temple station | Fast Food Restaurant | Café | Chinese Restaurant | Cantonese Restaurant | Dessert Shop | Hong Kong Restaurant | Shopping Mall | Train Station | Restaurant | ... | Sandwich Place | Cha Chaan Teng | Campground | Seafood Restaurant | Shanghai Restaurant | Bus Stop | Italian Restaurant | Park | Spanish Restaurant | Department Store |


In [51]:
# set number of clusters
kclusters = 5

HK_grouped_clustering = HK_grouped.drop(columns='MTRStation')  # keep only the category frequency columns

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(HK_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 3, 3, 0, 0, 2, 2, 0, 2, 2])

In [52]:
# add clustering labels
HK_venues_sorted.insert(0, 'ClusterLabels', kmeans.labels_)

HK_merged = MTR_hk

# join the cluster labels and top venues onto MTR_hk, which holds the latitude/longitude of each station
HK_merged = HK_merged.join(HK_venues_sorted.set_index('MTRStation'), on='MTRStation')

In [53]:
# create map
hk_center = [22.3193039, 114.1693611]
map_clusters = folium.Map(location=hk_center, zoom_start=11)

# set color scheme for the clusters: one distinct color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(HK_merged['latitude'], HK_merged['longitude'], HK_merged['MTRStation'], HK_merged['ClusterLabels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

In [54]:
HK_Cat = HK_merged[['MTRStation', 'ClusterLabels']]

In [60]:
HK_Cat.to_csv(r'HK_Cat1.csv')

#### Let's look at the members of the group with label = 0

In [55]:
group_0=HK_merged.loc[HK_merged['ClusterLabels'] == 0]
group_0['MTRStation']

3            MTR Chai Wan station
4     MTR Che Kung Temple station
7            MTR City One station
10             MTR Fo Tan station
12           MTR Hang Hau station
14            MTR Heng On station
19    MTR Kam Sheung Road station
25          MTR Kwai Hing station
28           MTR Lai King station
29            MTR Lam Tin station
30           MTR Lei Tung station
34         MTR Ma On Shan station
41             MTR Po Lam station
46            MTR Sha Tin station
47        MTR Sha Tin Wai station
51           MTR Shek Mun station
52        MTR Sheung Shui station
53           MTR Siu Hong station
57      MTR Tai Shui Hang station
58            MTR Tai Wai station
62       MTR Tin Shui Wai station
63      MTR Tiu Keng Leng station
64      MTR Tseung Kwan O station
68           MTR Tuen Mun station
71     MTR Wong Chuk Hang station
73         MTR Wu Kai Sha station
75           MTR Yau Tong station
Name: MTRStation, dtype: object

#### Let's look at the members of the group with label = 1

In [56]:
group_1=HK_merged.loc[HK_merged['ClusterLabels'] == 1]
group_1['MTRStation']

54    MTR Sunny Bay station
Name: MTRStation, dtype: object

#### Let's look at the members of the group with label = 2

In [57]:
group_2=HK_merged.loc[HK_merged['ClusterLabels'] == 2]
group_2['MTRStation']

5     MTR Cheung Sha Wan station
6          MTR Choi Hung station
8       MTR Diamond Hill station
9            MTR Fanling station
13     MTR Heng Fa Chuen station
21       MTR Kowloon Bay station
23      MTR Kowloon Tong station
24         MTR Kwai Fong station
26         MTR Kwun Tong station
27       MTR Lai Chi Kok station
32            MTR Lok Fu station
33         MTR Long Ping station
35           MTR Mei Foo station
37        MTR Nam Cheong station
38      MTR Ngau Tau Kok station
48      MTR Sham Shui Po station
49      MTR Shau Kei Wan station
50      MTR Shek Kip Mei station
56     MTR Tai Po Market station
59        MTR Tai Wo Hau station
60            MTR Tai Wo station
66          MTR Tsing Yi station
67         MTR Tsuen Wan station
72      MTR Wong Tai Sin station
76         MTR Yuen Long station
Name: MTRStation, dtype: object

#### Let's look at the members of the group with label = 3

In [58]:
group_3=HK_merged.loc[HK_merged['ClusterLabels'] == 3]
group_3['MTRStation']

0         MTR Admiralty station
1      MTR Causeway Bay station
2           MTR Central station
11    MTR Fortress Hill station
15              MTR HKU station
16       MTR Ho Man Tin station
17         MTR Hung Hom station
18           MTR Jordan station
20     MTR Kennedy Town station
22          MTR Kowloon station
36         MTR Mong Kok station
39      MTR North Point station
40          MTR Olympic station
42    MTR Prince Edward station
43       MTR Quarry Bay station
44       MTR Sai Wan Ho station
45     MTR Sai Ying Pun station
55          MTR Tai Koo station
61          MTR Tin Hau station
65    MTR Tsim Sha Tsui station
69       MTR Tung Chung station
70          MTR Whampoa station
74       MTR Yau Ma Tei station
Name: MTRStation, dtype: object

#### Let's look at the members of the group with label = 4

In [59]:
group_4=HK_merged.loc[HK_merged['ClusterLabels'] == 4]
group_4['MTRStation']

31    MTR LOHAS Park station
Name: MTRStation, dtype: object

As groups 1 and 4 each contain only one station (Sunny Bay and LOHAS Park respectively), we treat them as outliers and discard them.
After examining the members of groups 0, 2 and 3, we relabel them as follows:

Group 0: New Development Area 

Group 2: Traditional Residential Area

Group 3: High Value Central Area
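Groups 1 and 4 were identified as single-station clusters by reading the listings above. A minimal sketch of flagging such clusters programmatically, where `small_clusters` and its `min_size` threshold are hypothetical; the example labels reproduce the group sizes listed above (27, 1, 25, 23 and 1 stations). In the notebook it would be called as `small_clusters(kmeans.labels_)`.

```python
from collections import Counter


def small_clusters(labels, min_size=2):
    """Return the sorted cluster labels that have fewer than min_size members."""
    sizes = Counter(int(label) for label in labels)
    return sorted(label for label, size in sizes.items() if size < min_size)


# labels with the same group sizes as the k-means result above (order does not matter)
labels = [0] * 27 + [1] * 1 + [2] * 25 + [3] * 23 + [4] * 1
print(small_clusters(labels))  # [1, 4]
```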

#### Split the MTR grouping into 3 separate tables

In [70]:
# Creating table for High Value Central Area
HighValue = group_3[['MTRStation']]
HighValue = HighValue.reset_index(drop=True)
HighValue.head()

| | MTRStation |
|---|---|
| 0 | MTR Admiralty station |
| 1 | MTR Causeway Bay station |
| 2 | MTR Central station |
| 3 | MTR Fortress Hill station |
| 4 | MTR HKU station |


In [72]:
# Creating table for Traditional Residential Area
TradRes = group_2[['MTRStation']]
TradRes = TradRes.reset_index(drop=True)
TradRes.head()

| | MTRStation |
|---|---|
| 0 | MTR Cheung Sha Wan station |
| 1 | MTR Choi Hung station |
| 2 | MTR Diamond Hill station |
| 3 | MTR Fanling station |
| 4 | MTR Heng Fa Chuen station |


In [78]:
# Creating table for New Development Area
NewDev = group_0[['MTRStation']]
NewDev = NewDev.reset_index(drop=True)
NewDev.shape

(27, 1)

In [82]:
# Import Population density (people/sq.Km)
MTR_Pdensity = pd.read_csv('MTR_Pdensity.csv') # data obtained from (https://en.wikipedia.org/wiki/Districts_of_Hong_Kong)

In [97]:
HK_cluster2 = NewDev.merge(gym_count, on='MTRStation', how='left')
HK_cluster2 = HK_cluster2.merge(MTR_Pdensity, on='MTRStation', how='left')
HK_cluster2['PeoplePerGym'] = HK_cluster2['Pop_density']*math.pi*1.5*1.5/HK_cluster2['count']
HK_cluster2 = HK_cluster2.sort_values('PeoplePerGym', ascending=False)
HK_cluster2.to_csv(r'HK_cluster2.csv')
HK_cluster2.head(3)

| | MTRStation | count | District | Pop_density | PeoplePerGym |
|---|---|---|---|---|---|
| 26 | MTR Yau Tong station | 1 | Kwun Tong | 56779.05 | 401347.454305 |
| 0 | MTR Chai Wan station | 3 | Eastern | 31217.67 | 73554.902051 |
| 9 | MTR Lam Tin station | 6 | Kwun Tong | 56779.05 | 66891.242384 |


#### HK_cluster1 is the dataframe that contains name of MTR stations and its corresponding labels

### B. Determine how "crowded" the gym/fitness center market is around each MTR station

We define "PeoplePerGym" as the estimated number of people served by one gym/fitness center around a particular MTR station.
It is calculated as "Pop_density" (population density, people per sq. km) × (π × 1.5 km × 1.5 km) / "count" (number of gyms/fitness centers within 1.5 km).


In [116]:
# Import Population density (people/sq.Km)
MTR_Pdensity = pd.read_csv('MTR_Pdensity.csv') # data obtained from (https://en.wikipedia.org/wiki/Districts_of_Hong_Kong)

#### Add "PeoplePerGym"  to group "New Development Area" and sort it

In [117]:
NewDev1 = NewDev.merge(gym_count, on='MTRStation', how='left')
NewDev1 = NewDev1.merge(MTR_Pdensity, on='MTRStation', how='left')
NewDev1['PeoplePerGym'] = NewDev1['Pop_density']*math.pi*1.5*1.5/NewDev1['count']
NewDev1 = NewDev1.sort_values('PeoplePerGym', ascending=False) # sort the dataframe according to PeoplePerGym value in descending order
NewDev1 = NewDev1.reset_index(drop=True)

#### Add "PeoplePerGym"  to group "Traditional Residential Area" and sort it

In [118]:
TradRes1 = TradRes.merge(gym_count, on='MTRStation', how='left')
TradRes1 = TradRes1.merge(MTR_Pdensity, on='MTRStation', how='left')
TradRes1['PeoplePerGym'] = TradRes1['Pop_density']*math.pi*1.5*1.5/TradRes1['count']
TradRes1 = TradRes1.sort_values('PeoplePerGym', ascending=False) # sort the dataframe according to PeoplePerGym value in descending order
TradRes1 = TradRes1.reset_index(drop=True)

#### Add "PeoplePerGym"  to group "High Value Central Area" and sort it

In [119]:
HighValue1 = HighValue.merge(gym_count, on='MTRStation', how='left')
HighValue1 = HighValue1.merge(MTR_Pdensity, on='MTRStation', how='left')
HighValue1['PeoplePerGym'] = HighValue1['Pop_density']*math.pi*1.5*1.5/HighValue1['count']
HighValue1 = HighValue1.sort_values('PeoplePerGym', ascending=False) # sort the dataframe according to PeoplePerGym value in descending order
HighValue1 = HighValue1.reset_index(drop=True)
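The three cells above repeat the same merge, divide and sort steps for each group. A minimal sketch of factoring them into one function, where `rank_by_people_per_gym` is a hypothetical helper; the first three stations below use the Whampoa, Mong Kok and Prince Edward inputs reported in the results, and the fourth is made up to show the top-3 cut.

```python
import math

import pandas as pd


def rank_by_people_per_gym(group, gym_count, density, radius_km=1.5, top=3):
    """Return the `top` stations of one group ranked by PeoplePerGym (descending).

    group:     DataFrame with an 'MTRStation' column
    gym_count: DataFrame with 'MTRStation' and 'count' columns
    density:   DataFrame with 'MTRStation' and 'Pop_density' (people per sq. km)
    """
    df = (group.merge(gym_count, on='MTRStation', how='left')
               .merge(density, on='MTRStation', how='left'))
    # estimated residents in the circle around the station, divided by its gym count
    df['PeoplePerGym'] = df['Pop_density'] * math.pi * radius_km ** 2 / df['count']
    return (df.sort_values('PeoplePerGym', ascending=False)
              .head(top)
              .reset_index(drop=True))


names = ['MTR Mong Kok station', 'MTR Whampoa station',
         'MTR Prince Edward station', 'Hypothetical station']
group = pd.DataFrame({'MTRStation': names})
gyms = pd.DataFrame({'MTRStation': names, 'count': [18, 11, 18, 100]})
density = pd.DataFrame({'MTRStation': names,
                        'Pop_density': [44864.09, 40194.7, 44864.09, 1000.0]})

top = rank_by_people_per_gym(group, gyms, density)
print(top[['MTRStation', 'PeoplePerGym']])
```

Calling `rank_by_people_per_gym(NewDev, gym_count, MTR_Pdensity)` should then give the same top three stations as `NewDev1.loc[0:2]`, and likewise for `TradRes` and `HighValue`.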

## Conclusion: Recommended locations for new AT20 studios <a name="conclusion"></a>

#### MTR Stations in Group 'New Development Area'

In [128]:
NewDev1.loc[0:2,'MTRStation'] #top 3 MTR stations with the highest 'PeoplePerGym' value

0    MTR Yau Tong station
1    MTR Chai Wan station
2     MTR Lam Tin station
Name: MTRStation, dtype: object

#### MTR Stations in Group 'Traditional Residential Area'

In [129]:
TradRes1.loc[0:2,'MTRStation'] #top 3 MTR stations with the highest 'PeoplePerGym' value

0    MTR Cheung Sha Wan station
1      MTR Shau Kei Wan station
2      MTR Diamond Hill station
Name: MTRStation, dtype: object

#### MTR Stations in Group 'High Value Central Area'

In [130]:
HighValue1.loc[0:2,'MTRStation'] #top 3 MTR stations with the highest 'PeoplePerGym' value

0          MTR Whampoa station
1         MTR Mong Kok station
2    MTR Prince Edward station
Name: MTRStation, dtype: object

In [134]:
HighValue1.head(3)

| | MTRStation | count | District | Pop_density | PeoplePerGym |
|---|---|---|---|---|---|
| 0 | MTR Whampoa station | 11 | Kowloon City | 40194.7 | 25829.05382 |
| 1 | MTR Mong Kok station | 18 | Yau Tsim Mong | 44864.09 | 17618.086944 |
| 2 | MTR Prince Edward station | 18 | Yau Tsim Mong | 44864.09 | 17618.086944 |
