<b> Battle of the Neighbourhoods - Week 1 </b>

<b>Introduction</b>

<b>Clearly define a problem or an idea of your choice, where you would need to leverage the Foursquare location data to solve or execute. Remember that data science problems always target an audience and are meant to help a group of stakeholders solve a problem, so make sure that you explicitly describe your audience and why they would care about your problem.</b>

Problem: Choosing schools often is a huge decision for parents, especially for weary-eyed parents hoping to have their kids stay in school and study as far away from 'attractions' as possible. There will be a couple of assumptions for the below problems
1) These parents are in Hong Kong and are placing kids into International Secondary Schools (kids have ability to roam free either during lunch or immediately after school). Hence not considering primary schools or kindergartens.
2) Some stereotypical attraction-joints that parents are keen to avoid include arcades, gaming-areas, shopping malls, bars, clubs etc. Let this list not be exhaustive, but also be known as "the list".
3) The consideration would be "walking distance" from schools, definted as less than 200 meters.

By no means does this study aim to define what is the best school just by highlighting what attractions there are, but it aims to highlight the different or similarity on shops are around schools. If there are any venues that are mentioned in "the list" it was also be flagged.

<b> Data Section </b>


<b>Describe the data that you will be using to solve the problem or execute your idea. Remember that you will need to use the Foursquare location data to solve the problem or execute your idea. You can absolutely use other datasets in combination with the Foursquare location data. So make sure that you provide adequate explanation and discussion, with examples, of the data that you will be using, even if it is only Foursquare location data.</b>

The data that I would be using includes the list of International Schools listed by the HK Government as of 2017. The file is 'schools.csv'. Data from fourquare of some exploratory attractions around the school. 

Methodology:

The longitude and latitude data are provided by a CSV by the government. However it uses a Degree, Minutes and Seconds for Latitude and Longitude, so there needs to be a conversion to Decimal Degree Latitude and Longitude. So this would rquire some data cleaning. 

Then after isolating and getting the correct lat long data form the office source, this would then be used to query foursquare API for the nearby venues within a limit of 200 meters.

After the result is stored and queries, we will run a kmeans to seperate the different clusters and see if there are different / similarities of these cluster-venues around the schools. 



In [508]:
import requests
import pandas as pd
from pandas.io.json import json_normalize
import numpy as np
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter
import folium

In [509]:
df_schools = pd.read_csv('schools.csv', header=0, index_col=False)

In [510]:
df_schools.head()

Unnamed: 0,ENGLISH CATEGORY,ENGLISH NAME,ENGLISH ADDRESS,LONGITUDE,LATITUDE,EASTING,NORTHING,SCHOOL NO.,SCHOOL NO..1,STUDENTS GENDER,SESSION,DISTRICT,FINANCE TYPE,SCHOOL LEVEL,OPENING HOURS,TELEPHONE,FAX NUMBER,EMAIL ADDRESS
0,International Schools (Secondary),DELIA SCHOOL OF CANADA,TAIKOO SHING QUARRY BAY HONG KONG,114-13-10,22-17-7,840649.6,816269.8,216000000000.0,216000000000.0,CO-ED,WHOLE DAY,EASTERN,PRIVATE,SECONDARY,,36580388,28860813,
1,International Schools (Secondary),HARROW INTERNATIONAL SCHOOL HONG KONG,38 TSING YING ROAD TUEN MUN NEW TERRITORIES,113-59-29,22-22-35,817163.3,826354.0,591000000000.0,591000000000.0,CO-ED,WHOLE DAY,TUEN MUN,PRIVATE,SECONDARY,,28249099,28249928,
2,International Schools (Secondary),KOREAN INTERNATIONAL SCHOOL,55 LEI KING ROAD SAI WAN HO (IL 8802) HONG KONG,114-13-22,22-17-7,841000.4,816260.2,216000000000.0,216000000000.0,CO-ED,WHOLE DAY,EASTERN,PRIVATE,SECONDARY,,25695500,25605699,
3,International Schools (Secondary),KELLETT SCHOOL,7 LAM HING STREET KOWLOON BAY KOWLOON (EXCLUDI...,114-12-23,22-19-28,839317.4,820615.1,215000000000.0,215000000000.0,CO-ED,WHOLE DAY,KWUN TONG,PRIVATE,SECONDARY,,31200700,23052292,
4,International Schools (Secondary),AMERICAN SCHOOL HONG KONG,"6 MA CHUNG ROAD, TAI PO, NEW TERRITORIES (EXCE...",114-9-53,22-26-33,835003.6,833676.8,604000000000.0,604000000000.0,CO-ED,WHOLE DAY,TAI PO,PRIVATE,SECONDARY,,39194100,39194112,


In [511]:
df_schools = df_schools[['ENGLISH NAME', 'ENGLISH ADDRESS', 'DISTRICT', 'LATITUDE', 'LONGITUDE','SCHOOL LEVEL']]

In [512]:
# Regular expression to remove all parenthesis + contents in parenthesis in column English Address
df_schools['ENGLISH ADDRESS'] = df_schools['ENGLISH ADDRESS'].str.replace(r'\(.*\)','')

In [513]:
df_schools

Unnamed: 0,ENGLISH NAME,ENGLISH ADDRESS,DISTRICT,LATITUDE,LONGITUDE,SCHOOL LEVEL
0,DELIA SCHOOL OF CANADA,TAIKOO SHING QUARRY BAY HONG KONG,EASTERN,22-17-7,114-13-10,SECONDARY
1,HARROW INTERNATIONAL SCHOOL HONG KONG,38 TSING YING ROAD TUEN MUN NEW TERRITORIES,TUEN MUN,22-22-35,113-59-29,SECONDARY
2,KOREAN INTERNATIONAL SCHOOL,55 LEI KING ROAD SAI WAN HO HONG KONG,EASTERN,22-17-7,114-13-22,SECONDARY
3,KELLETT SCHOOL,7 LAM HING STREET KOWLOON BAY KOWLOON,KWUN TONG,22-19-28,114-12-23,SECONDARY
4,AMERICAN SCHOOL HONG KONG,"6 MA CHUNG ROAD, TAI PO, NEW TERRITORIES",TAI PO,22-26-33,114-9-53,SECONDARY
5,CHRISTIAN ALLIANCE INTERNATIONAL SCHOOL,"33 KING LAM STREET, CHEUNG SHA WAN, KOWLOON",SHAM SHUI PO,22-20-20,114-8-47,SECONDARY
6,AMERICAN INTERNATIONAL SCHOOL,129 WATERLOO ROAD KOWLOON TONG KOWLOON,KOWLOON CITY,22-19-53,114-10-42,SECONDARY
7,"NORD ANGLIA INTERNATIONAL SCHOOL, HONG KONG",11 ON TIN STREET LAM TIN KOWLOON,KWUN TONG,22-18-33,114-14-16,SECONDARY
8,CARMEL SCHOOL,460 SHAU KEI WAN ROAD HONG KONG,EASTERN,22-16-35,114-13-47,SECONDARY
9,CONCORDIA INTERNATIONAL SCHOOL,YAU YAT CHUEN 68 BEGONIA ROAD SHAMSHUIPO KOWLOON,SHAM SHUI PO,22-19-53,114-10-20,SECONDARY


In [514]:
df_schools[['long_degree','long_minute','long_second']] = df_schools['LONGITUDE'].str.split("-", expand=True) 

In [515]:
df_schools[['lat_degree','lat_minute','lat_second']] = df_schools['LATITUDE'].str.split("-", expand=True) 

In [516]:
df_schools.drop(['LATITUDE','LONGITUDE'], axis=1,inplace=True)

In [517]:
df_schools.head()

Unnamed: 0,ENGLISH NAME,ENGLISH ADDRESS,DISTRICT,SCHOOL LEVEL,long_degree,long_minute,long_second,lat_degree,lat_minute,lat_second
0,DELIA SCHOOL OF CANADA,TAIKOO SHING QUARRY BAY HONG KONG,EASTERN,SECONDARY,114,13,10,22,17,7
1,HARROW INTERNATIONAL SCHOOL HONG KONG,38 TSING YING ROAD TUEN MUN NEW TERRITORIES,TUEN MUN,SECONDARY,113,59,29,22,22,35
2,KOREAN INTERNATIONAL SCHOOL,55 LEI KING ROAD SAI WAN HO HONG KONG,EASTERN,SECONDARY,114,13,22,22,17,7
3,KELLETT SCHOOL,7 LAM HING STREET KOWLOON BAY KOWLOON,KWUN TONG,SECONDARY,114,12,23,22,19,28
4,AMERICAN SCHOOL HONG KONG,"6 MA CHUNG ROAD, TAI PO, NEW TERRITORIES",TAI PO,SECONDARY,114,9,53,22,26,33


In [518]:
# change dtype as float
df_schools[['long_degree','long_minute','long_second','lat_degree','lat_minute','lat_second']] = df_schools[['long_degree','long_minute','long_second','lat_degree','lat_minute','lat_second']].astype('float')

In [519]:
df_schools[['long_minute','lat_minute']] = df_schools[['long_minute','lat_minute']]/60

In [520]:
df_schools[['long_second', 'lat_second']] = df_schools[['long_minute','lat_minute']]/3600

In [521]:
df_schools['latitude_dd'] = df_schools['lat_degree'] + df_schools['lat_minute'] + df_schools['lat_second'] 

In [522]:
df_schools['longitude_dd'] = df_schools['long_degree'] + df_schools['long_minute'] + df_schools['long_second'] 

In [523]:
df_schools = df_schools.drop(['long_degree','long_minute','long_second','lat_degree','lat_minute','lat_second'], axis=1)

In [524]:
df_schools.rename(columns={'ENGLISH NAME': 'Schools'}, inplace=True)

In [525]:
df_schools.head()

Unnamed: 0,Schools,ENGLISH ADDRESS,DISTRICT,SCHOOL LEVEL,latitude_dd,longitude_dd
0,DELIA SCHOOL OF CANADA,TAIKOO SHING QUARRY BAY HONG KONG,EASTERN,SECONDARY,22.283412,114.216727
1,HARROW INTERNATIONAL SCHOOL HONG KONG,38 TSING YING ROAD TUEN MUN NEW TERRITORIES,TUEN MUN,SECONDARY,22.366769,113.983606
2,KOREAN INTERNATIONAL SCHOOL,55 LEI KING ROAD SAI WAN HO HONG KONG,EASTERN,SECONDARY,22.283412,114.216727
3,KELLETT SCHOOL,7 LAM HING STREET KOWLOON BAY KOWLOON,KWUN TONG,SECONDARY,22.316755,114.200056
4,AMERICAN SCHOOL HONG KONG,"6 MA CHUNG ROAD, TAI PO, NEW TERRITORIES",TAI PO,SECONDARY,22.433454,114.150042


In [526]:
address = 'Central, HK'

geolocator = Nominatim(user_agent="coursera_assignment")
location = geolocator.geocode(address)
hk_latitude = location.latitude
hk_longitude = location.longitude
print('The geograpical coordinate of Hong Kong are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Hong Kong are 22.350627, 114.1849161.


In [527]:
# create map of Hong Kong using latitude and longitude values
map_hongkong = folium.Map(location=[hk_latitude, hk_longitude], zoom_start=11)

# add markers to map. Zipping the columns of the DF, lat long and labels
for lat, lng, label in zip(df_schools['latitude_dd'], df_schools['longitude_dd'], df_schools['Schools']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_hongkong)  
    
map_hongkong

In [568]:
# Search Query for Exploration
CLIENT_ID = '----'
CLIENT_SECRET = '----' 
VERSION = '20180604'
radius = 200 # in meters
LIMIT=100

In [569]:
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    hk_latitude, 
    hk_longitude, 
    radius, 
    LIMIT)

In [530]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ba88d3e4434b92747e390b0'},
  'headerLocation': 'Sha Tin District',
  'headerFullLocation': 'Sha Tin District, Hong Kong',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 2,
  'suggestedBounds': {'ne': {'lat': 22.352427001800002,
    'lng': 114.18685867905498},
   'sw': {'lat': 22.348826998199996, 'lng': 114.18297352094501}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4f33b6d6e4b0cd90f6c955f5',
       'name': 'Masamura Sushi Restaurant 正村壽司',
       'location': {'address': 'Shop U108, Lok Fu Plaza, 198 Junction Rd',
        'lat': 22.35,
        'lng': 114.18333000000001,
        'labeledLatLngs': [{'label': 'display',
          'lat': 22.35,
          'lng': 114.18333000000001}],
        'distance': 177,
       

In [531]:
# function from Coursera module that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [532]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON
print(nearby_venues.head())
# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']

# sub-selecting or slicing the dataframe
nearby_venues =nearby_venues.loc[:, filtered_columns]


   reasons.count                                      reasons.items  \
0              0  [{'summary': 'This spot is popular', 'type': '...   
1              0  [{'summary': 'This spot is popular', 'type': '...   

                       referralId  \
0  e-0-4f33b6d6e4b0cd90f6c955f5-0   
1  e-0-5ba5d4be029a5500398f51c3-1   

                                    venue.categories  \
0  [{'id': '4bf58dd8d48988d1d2941735', 'name': 'S...   
1  [{'id': '4eb1d4d54b900d56c88a45fc', 'name': 'M...   

                   venue.id                    venue.location.address  \
0  4f33b6d6e4b0cd90f6c955f5  Shop U108, Lok Fu Plaza, 198 Junction Rd   
1  5ba5d4be029a5500398f51c3                                       NaN   

  venue.location.cc venue.location.city venue.location.country  \
0                HK        Wang Tau Hom                     香港   
1                HK                 NaN                     香港   

   venue.location.distance                 venue.location.formattedAddress  \
0       

In [533]:

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Masamura Sushi Restaurant 正村壽司,Sushi Restaurant,22.35,114.18333
1,ライオンの頭,Mountain,22.35218,114.184864


In [536]:
def getNearbyVenuesToSchool(schools, latitudes, longitudes, radius=200):
    
    venues_list=[]
    for name, lat, lng in zip(schools, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Schools', 
                  'School Latitude', 
                  'School Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [537]:
venues_around_schools = getNearbyVenuesToSchool(df_schools['Schools'], df_schools['latitude_dd'], df_schools['longitude_dd'])


DELIA SCHOOL OF CANADA
HARROW INTERNATIONAL SCHOOL HONG KONG
KOREAN INTERNATIONAL SCHOOL
KELLETT SCHOOL
AMERICAN SCHOOL HONG KONG
CHRISTIAN ALLIANCE INTERNATIONAL SCHOOL
AMERICAN INTERNATIONAL SCHOOL
NORD ANGLIA INTERNATIONAL SCHOOL, HONG KONG
CARMEL SCHOOL
CONCORDIA INTERNATIONAL SCHOOL
STAMFORD AMERICAN SCHOOL HONG KONG
CANADIAN INTERNATIONAL SCHOOL
DISCOVERY BAY INTERNATIONAL SCHOOL
KIANGSU-CHEKIANG COLLEGE
SAINT TOO SEAR ROGERS INTERNATIONAL SCHOOL
HONGKONG JAPANESE SCHOOL
AUSTRALIAN INTERNATIONAL SCH HK
INTERNATIONAL COLLEGE HONG KONG (NEW TERRITORIES)
CHINESE INTERNATIONAL SCHOOL
SINGAPORE INTERNATIONAL SCH (HONG KONG)
LYC'EE FRANCAIS INTL (FRENCH INTL SCH)
GERMAN SWISS INTERNATIONAL SCHOOL
HONG KONG ACADEMY
HONG KONG INTERNATIONAL SCHOOL


In [538]:
venues_around_schools.shape

(64, 7)

In [539]:
venues_around_schools.head()

Unnamed: 0,Schools,School Latitude,School Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,DELIA SCHOOL OF CANADA,22.283412,114.216727,Ootoya (大戶屋),22.284785,114.216599,Japanese Restaurant
1,DELIA SCHOOL OF CANADA,22.283412,114.216727,Genki Sushi 元気寿司,22.284595,114.215765,Sushi Restaurant
2,DELIA SCHOOL OF CANADA,22.283412,114.216727,Aeon Style (永旺),22.284673,114.216222,Department Store
3,DELIA SCHOOL OF CANADA,22.283412,114.216727,Waterfall Sports & Wellness,22.284252,114.216249,Gym / Fitness Center
4,DELIA SCHOOL OF CANADA,22.283412,114.216727,Starbucks Coffee (星巴克),22.284953,114.216131,Coffee Shop


In [540]:
venues_around_schools.groupby('Schools').count()

Unnamed: 0_level_0,School Latitude,School Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Schools,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
AMERICAN INTERNATIONAL SCHOOL,3,3,3,3,3,3
AUSTRALIAN INTERNATIONAL SCH HK,7,7,7,7,7,7
CANADIAN INTERNATIONAL SCHOOL,2,2,2,2,2,2
CHINESE INTERNATIONAL SCHOOL,7,7,7,7,7,7
CONCORDIA INTERNATIONAL SCHOOL,3,3,3,3,3,3
DELIA SCHOOL OF CANADA,12,12,12,12,12,12
GERMAN SWISS INTERNATIONAL SCHOOL,2,2,2,2,2,2
HONG KONG ACADEMY,1,1,1,1,1,1
KIANGSU-CHEKIANG COLLEGE,6,6,6,6,6,6
KOREAN INTERNATIONAL SCHOOL,12,12,12,12,12,12


In [541]:
schools_onehot = pd.get_dummies(venues_around_schools[['Venue Category']], prefix="", prefix_sep="")


In [542]:
schools_onehot.head()

Unnamed: 0,Bus Station,Bus Stop,Café,Cantonese Restaurant,Caribbean Restaurant,Cha Chaan Teng,Chinese Restaurant,Club House,Coffee Shop,Convenience Store,...,Pizza Place,Public Art,Shabu-Shabu Restaurant,Shanghai Restaurant,Shopping Mall,Supermarket,Sushi Restaurant,Tea Room,Theme Park Ride / Attraction,Udon Restaurant
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0


In [543]:
schools_onehot['Schools'] = df_schools['Schools']

In [544]:
column_order = [schools_onehot.columns[-1]] + list(schools_onehot.columns[:-1])
print(column_order)

['Schools', 'Bus Station', 'Bus Stop', 'Café', 'Cantonese Restaurant', 'Caribbean Restaurant', 'Cha Chaan Teng', 'Chinese Restaurant', 'Club House', 'Coffee Shop', 'Convenience Store', 'Deli / Bodega', 'Department Store', 'Dim Sum Restaurant', 'Fast Food Restaurant', 'Grocery Store', 'Gym / Fitness Center', 'Hotel', 'Japanese Restaurant', 'Lingerie Store', 'Market', 'Pier', 'Pizza Place', 'Public Art', 'Shabu-Shabu Restaurant', 'Shanghai Restaurant', 'Shopping Mall', 'Supermarket', 'Sushi Restaurant', 'Tea Room', 'Theme Park Ride / Attraction', 'Udon Restaurant']


In [545]:
schools_onehot = schools_onehot[column_order]

In [546]:
schools_onehot.head()

Unnamed: 0,Schools,Bus Station,Bus Stop,Café,Cantonese Restaurant,Caribbean Restaurant,Cha Chaan Teng,Chinese Restaurant,Club House,Coffee Shop,...,Pizza Place,Public Art,Shabu-Shabu Restaurant,Shanghai Restaurant,Shopping Mall,Supermarket,Sushi Restaurant,Tea Room,Theme Park Ride / Attraction,Udon Restaurant
0,DELIA SCHOOL OF CANADA,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,HARROW INTERNATIONAL SCHOOL HONG KONG,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
2,KOREAN INTERNATIONAL SCHOOL,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,KELLETT SCHOOL,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,AMERICAN SCHOOL HONG KONG,0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0


In [547]:
schools_onehot.shape

(64, 32)

In [548]:
schools_venue_grouped = schools_onehot.groupby('Schools').mean().reset_index()

In [549]:
schools_venue_grouped

Unnamed: 0,Schools,Bus Station,Bus Stop,Café,Cantonese Restaurant,Caribbean Restaurant,Cha Chaan Teng,Chinese Restaurant,Club House,Coffee Shop,...,Pizza Place,Public Art,Shabu-Shabu Restaurant,Shanghai Restaurant,Shopping Mall,Supermarket,Sushi Restaurant,Tea Room,Theme Park Ride / Attraction,Udon Restaurant
0,AMERICAN INTERNATIONAL SCHOOL,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,AMERICAN SCHOOL HONG KONG,0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
2,AUSTRALIAN INTERNATIONAL SCH HK,0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
3,CANADIAN INTERNATIONAL SCHOOL,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0
4,CARMEL SCHOOL,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,CHINESE INTERNATIONAL SCHOOL,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,CHRISTIAN ALLIANCE INTERNATIONAL SCHOOL,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,CONCORDIA INTERNATIONAL SCHOOL,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
8,DELIA SCHOOL OF CANADA,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,DISCOVERY BAY INTERNATIONAL SCHOOL,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [550]:
schools_venue_grouped.shape

(24, 32)

In [551]:
top_10_venues = 10

for school in schools_venue_grouped['Schools']:
    print(school + "--------")
    temp_df = schools_venue_grouped[schools_venue_grouped['Schools'] == school].T.reset_index()
    temp_df.columns = ['Venue','Rank'] #renames columns, reset index pushed the column in.
    temp_df = temp_df.iloc[1:] #slices rows to ignore the
    temp['Rank'] = temp_df['Rank'].astype(float)
    temp_df = temp_df.round(decimals=2)
    print(temp_df.sort_values('Rank', ascending=False).reset_index(drop=True).head(top_10_venues))
    print('\n')


AMERICAN INTERNATIONAL SCHOOL--------
                          Venue Rank
0                          Café    1
1                   Bus Station    0
2                         Hotel    0
3  Theme Park Ride / Attraction    0
4                      Tea Room    0
5              Sushi Restaurant    0
6                   Supermarket    0
7                 Shopping Mall    0
8           Shanghai Restaurant    0
9        Shabu-Shabu Restaurant    0


AMERICAN SCHOOL HONG KONG--------
                          Venue Rank
0                   Coffee Shop    1
1                   Bus Station    0
2                         Hotel    0
3  Theme Park Ride / Attraction    0
4                      Tea Room    0
5              Sushi Restaurant    0
6                   Supermarket    0
7                 Shopping Mall    0
8           Shanghai Restaurant    0
9        Shabu-Shabu Restaurant    0


AUSTRALIAN INTERNATIONAL SCH HK--------
                          Venue Rank
0                   Coffee Shop  

In [552]:
def return_most_common_venues(row, top_10_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:top_10_venues]

In [553]:
top_10_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Schools']
for index in np.arange(top_10_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(index+1, indicators[index]))
    except:
        columns.append('{}th Most Common Venue'.format(index+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Schools'] = schools_venue_grouped['Schools']

for index in np.arange(schools_venue_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[index, 1:] = return_most_common_venues(schools_venue_grouped.iloc[index, :], top_10_venues)

neighborhoods_venues_sorted

Unnamed: 0,Schools,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,AMERICAN INTERNATIONAL SCHOOL,Café,Udon Restaurant,Grocery Store,Bus Stop,Cantonese Restaurant,Caribbean Restaurant,Cha Chaan Teng,Chinese Restaurant,Club House,Coffee Shop
1,AMERICAN SCHOOL HONG KONG,Coffee Shop,Udon Restaurant,Grocery Store,Bus Stop,Café,Cantonese Restaurant,Caribbean Restaurant,Cha Chaan Teng,Chinese Restaurant,Club House
2,AUSTRALIAN INTERNATIONAL SCH HK,Coffee Shop,Udon Restaurant,Grocery Store,Bus Stop,Café,Cantonese Restaurant,Caribbean Restaurant,Cha Chaan Teng,Chinese Restaurant,Club House
3,CANADIAN INTERNATIONAL SCHOOL,Supermarket,Udon Restaurant,Grocery Store,Bus Stop,Café,Cantonese Restaurant,Caribbean Restaurant,Cha Chaan Teng,Chinese Restaurant,Club House
4,CARMEL SCHOOL,Deli / Bodega,Udon Restaurant,Grocery Store,Bus Stop,Café,Cantonese Restaurant,Caribbean Restaurant,Cha Chaan Teng,Chinese Restaurant,Club House
5,CHINESE INTERNATIONAL SCHOOL,Café,Udon Restaurant,Grocery Store,Bus Stop,Cantonese Restaurant,Caribbean Restaurant,Cha Chaan Teng,Chinese Restaurant,Club House,Coffee Shop
6,CHRISTIAN ALLIANCE INTERNATIONAL SCHOOL,Cantonese Restaurant,Udon Restaurant,Grocery Store,Bus Stop,Café,Caribbean Restaurant,Cha Chaan Teng,Chinese Restaurant,Club House,Coffee Shop
7,CONCORDIA INTERNATIONAL SCHOOL,Udon Restaurant,Grocery Store,Bus Stop,Café,Cantonese Restaurant,Caribbean Restaurant,Cha Chaan Teng,Chinese Restaurant,Club House,Coffee Shop
8,DELIA SCHOOL OF CANADA,Japanese Restaurant,Udon Restaurant,Grocery Store,Bus Stop,Café,Cantonese Restaurant,Caribbean Restaurant,Cha Chaan Teng,Chinese Restaurant,Club House
9,DISCOVERY BAY INTERNATIONAL SCHOOL,Japanese Restaurant,Udon Restaurant,Grocery Store,Bus Stop,Café,Cantonese Restaurant,Caribbean Restaurant,Cha Chaan Teng,Chinese Restaurant,Club House


In [554]:
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

In [555]:
# Cluster Schools
kclusters = 5

schools_venue_grouped_cluster  = schools_venue_grouped.drop(['Schools'],1)

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(schools_venue_grouped_cluster)

kmeans.labels_[0:10]


array([0, 0, 0, 0, 2, 0, 3, 0, 0, 0], dtype=int32)

In [556]:
schools_merged = df_schools

In [557]:
schools_merged['Cluster_Labels'] = kmeans.labels_

schools_merged = schools_merged.join(neighborhoods_venues_sorted.set_index('Schools'), on='Schools')

schools_merged.head()

Unnamed: 0,Schools,ENGLISH ADDRESS,DISTRICT,SCHOOL LEVEL,latitude_dd,longitude_dd,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,DELIA SCHOOL OF CANADA,TAIKOO SHING QUARRY BAY HONG KONG,EASTERN,SECONDARY,22.283412,114.216727,0,Japanese Restaurant,Udon Restaurant,Grocery Store,Bus Stop,Café,Cantonese Restaurant,Caribbean Restaurant,Cha Chaan Teng,Chinese Restaurant,Club House
1,HARROW INTERNATIONAL SCHOOL HONG KONG,38 TSING YING ROAD TUEN MUN NEW TERRITORIES,TUEN MUN,SECONDARY,22.366769,113.983606,0,Sushi Restaurant,Udon Restaurant,Grocery Store,Bus Stop,Café,Cantonese Restaurant,Caribbean Restaurant,Cha Chaan Teng,Chinese Restaurant,Club House
2,KOREAN INTERNATIONAL SCHOOL,55 LEI KING ROAD SAI WAN HO HONG KONG,EASTERN,SECONDARY,22.283412,114.216727,0,Department Store,Udon Restaurant,Grocery Store,Bus Stop,Café,Cantonese Restaurant,Caribbean Restaurant,Cha Chaan Teng,Chinese Restaurant,Club House
3,KELLETT SCHOOL,7 LAM HING STREET KOWLOON BAY KOWLOON,KWUN TONG,SECONDARY,22.316755,114.200056,0,Gym / Fitness Center,Grocery Store,Bus Stop,Café,Cantonese Restaurant,Caribbean Restaurant,Cha Chaan Teng,Chinese Restaurant,Club House,Coffee Shop
4,AMERICAN SCHOOL HONG KONG,"6 MA CHUNG ROAD, TAI PO, NEW TERRITORIES",TAI PO,SECONDARY,22.433454,114.150042,2,Coffee Shop,Udon Restaurant,Grocery Store,Bus Stop,Café,Cantonese Restaurant,Caribbean Restaurant,Cha Chaan Teng,Chinese Restaurant,Club House


In [558]:
# create map with Folium
map_clusters = folium.Map(location=[hk_latitude, hk_longitude], zoom_start=10)

x = np.arange(kclusters)# add markers to the map
ys = [i+x+(i*x)**2 for i in range(kclusters)]
markers_colors = []
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

for lat, lon, poi, cluster in zip(schools_merged['latitude_dd'], schools_merged['longitude_dd'], schools_merged['Schools'], schools_merged['Cluster_Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
# add markers to the map
map_clusters


In [559]:
# Cluster 0
schools_merged.loc[schools_merged['Cluster_Labels'] == 0, schools_merged.columns[[1] + list(range(5, schools_merged.shape[1]))]]

Unnamed: 0,ENGLISH ADDRESS,longitude_dd,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,TAIKOO SHING QUARRY BAY HONG KONG,114.216727,0,Japanese Restaurant,Udon Restaurant,Grocery Store,Bus Stop,Café,Cantonese Restaurant,Caribbean Restaurant,Cha Chaan Teng,Chinese Restaurant,Club House
1,38 TSING YING ROAD TUEN MUN NEW TERRITORIES,113.983606,0,Sushi Restaurant,Udon Restaurant,Grocery Store,Bus Stop,Café,Cantonese Restaurant,Caribbean Restaurant,Cha Chaan Teng,Chinese Restaurant,Club House
2,55 LEI KING ROAD SAI WAN HO HONG KONG,114.216727,0,Department Store,Udon Restaurant,Grocery Store,Bus Stop,Café,Cantonese Restaurant,Caribbean Restaurant,Cha Chaan Teng,Chinese Restaurant,Club House
3,7 LAM HING STREET KOWLOON BAY KOWLOON,114.200056,0,Gym / Fitness Center,Grocery Store,Bus Stop,Café,Cantonese Restaurant,Caribbean Restaurant,Cha Chaan Teng,Chinese Restaurant,Club House,Coffee Shop
5,"33 KING LAM STREET, CHEUNG SHA WAN, KOWLOON",114.13337,0,Cantonese Restaurant,Udon Restaurant,Grocery Store,Bus Stop,Café,Caribbean Restaurant,Cha Chaan Teng,Chinese Restaurant,Club House,Coffee Shop
7,11 ON TIN STREET LAM TIN KOWLOON,114.233398,0,Convenience Store,Udon Restaurant,Grocery Store,Bus Stop,Café,Cantonese Restaurant,Caribbean Restaurant,Cha Chaan Teng,Chinese Restaurant,Club House
8,460 SHAU KEI WAN ROAD HONG KONG,114.216727,0,Deli / Bodega,Udon Restaurant,Grocery Store,Bus Stop,Café,Cantonese Restaurant,Caribbean Restaurant,Cha Chaan Teng,Chinese Restaurant,Club House
9,YAU YAT CHUEN 68 BEGONIA ROAD SHAMSHUIPO KOWLOON,114.166713,0,Udon Restaurant,Grocery Store,Bus Stop,Café,Cantonese Restaurant,Caribbean Restaurant,Cha Chaan Teng,Chinese Restaurant,Club House,Coffee Shop
10,"25 MAN FUK ROAD, HO MAN TIN, KOWLOON",114.166713,0,Lingerie Store,Udon Restaurant,Grocery Store,Bus Stop,Café,Cantonese Restaurant,Caribbean Restaurant,Cha Chaan Teng,Chinese Restaurant,Club House
11,36 NAM LONG SHAN ROAD ABERDEEN HONG KONG,114.166713,0,Supermarket,Udon Restaurant,Grocery Store,Bus Stop,Café,Cantonese Restaurant,Caribbean Restaurant,Cha Chaan Teng,Chinese Restaurant,Club House


In [560]:
# Cluster 1
schools_merged.loc[schools_merged['Cluster_Labels'] == 1, schools_merged.columns[[1] + list(range(5, schools_merged.shape[1]))]]

Unnamed: 0,ENGLISH ADDRESS,longitude_dd,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,15 TONG YAM STREET TAI HANG TUNG KOWLOON,114.166713,1,Department Store,Udon Restaurant,Grocery Store,Bus Stop,Café,Cantonese Restaurant,Caribbean Restaurant,Cha Chaan Teng,Chinese Restaurant,Club House
16,"3A, NORFOLK ROAD KOWLOON TONG KOWLOON",114.166713,1,Coffee Shop,Udon Restaurant,Grocery Store,Bus Stop,Café,Cantonese Restaurant,Caribbean Restaurant,Cha Chaan Teng,Chinese Restaurant,Club House


In [561]:
# Cluster 2
schools_merged.loc[schools_merged['Cluster_Labels'] == 2, schools_merged.columns[[1] + list(range(5, schools_merged.shape[1]))]]

Unnamed: 0,ENGLISH ADDRESS,longitude_dd,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,"6 MA CHUNG ROAD, TAI PO, NEW TERRITORIES",114.150042,2,Coffee Shop,Udon Restaurant,Grocery Store,Bus Stop,Café,Cantonese Restaurant,Caribbean Restaurant,Cha Chaan Teng,Chinese Restaurant,Club House
19,2 POLICE SCHOOL ROAD WONG CHUK HANG HONG KONG,114.166713,2,Convenience Store,Udon Restaurant,Grocery Store,Bus Stop,Café,Cantonese Restaurant,Caribbean Restaurant,Cha Chaan Teng,Chinese Restaurant,Club House


In [562]:
# Cluster 3
schools_merged.loc[schools_merged['Cluster_Labels'] == 3, schools_merged.columns[[1] + list(range(5, schools_merged.shape[1]))]]

Unnamed: 0,ENGLISH ADDRESS,longitude_dd,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,129 WATERLOO ROAD KOWLOON TONG KOWLOON,114.166713,3,Café,Udon Restaurant,Grocery Store,Bus Stop,Cantonese Restaurant,Caribbean Restaurant,Cha Chaan Teng,Chinese Restaurant,Club House,Coffee Shop
15,"157 BLUE POOL ROAD, HONG KONG",114.183384,3,Gym / Fitness Center,Grocery Store,Bus Stop,Café,Cantonese Restaurant,Caribbean Restaurant,Cha Chaan Teng,Chinese Restaurant,Club House,Coffee Shop


In [563]:
# import 
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics
from sklearn.metrics import jaccard_similarity_score
from sklearn.metrics import f1_score


In [566]:
cols =['Algorithm', 'Jaccard', 'F1-Score']
accu_matrix=pd.DataFrame([['kNN', 0, 'NA'], 
                          ['Decision Tree',0,'NA'],
                          ['SVM',0,'NA']],
                          columns=cols)

<b> Conclusion </b>

