Purpose: This is my detailed final peer reviewed assignment for the IBM Data Science Professional Certificate program – Coursera Capstone.


 INTRODUCTION: Singapore is a city-state and one of the most visited places in Asia. A number of travellers seek information about Singapore
while planning a visit to the country: places to visit, modes of travel, shopping avenues and places to stay. This project
provides data-centric recommendations, aiming to make them more accurate by grounding them in available data.


## First we will import libraries required for the task 
Note: We can import additional libraries wherever required 

In [1]:
!conda install -c conda-forge folium=0.5.0 --yes # comment/uncomment if not yet installed.
!conda install -c conda-forge geopy --yes        # comment/uncomment if not yet installed

import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis

# Display every column and row when printing dataframes.
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0 exposes this as pd.json_normalize)
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
# import k-means from clustering stage
from sklearn.cluster import KMeans
import folium # map rendering library

import requests # library to handle requests
import lxml.html as lh
import bs4 as bs
import urllib.request
print('Libraries imported.')

Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
folium                    0.5.0                      py_0    conda-forge
Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
geopy                     1.19.0                     py_0    conda-forge
Libraries imported.


In [2]:
from IPython.display import HTML
import base64

# Extra Helper scripts to generate download links for saved dataframes in csv format.
def create_download_link( df, title = "Download CSV file", filename = "data.csv"):  
    csv = df.to_csv()
    b64 = base64.b64encode(csv.encode())
    payload = b64.decode()
    html = '<a download="{filename}" href="data:text/csv;base64,{payload}" target="_blank">{title}</a>'
    html = html.format(payload=payload,title=title,filename=filename)
    return HTML(html)
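
To illustrate what the helper above produces, here is a minimal sketch using only the standard library. The function name `csv_to_data_uri` and the sample CSV text are illustrative assumptions, not part of the original notebook. The sketch embeds CSV text as Base64 inside a `data:text/csv` URI, as the helper does, and decodes it again to confirm the round trip is lossless.

```python
import base64

def csv_to_data_uri(csv_text):
    """Return a data URI embedding csv_text as Base64 (hypothetical helper)."""
    payload = base64.b64encode(csv_text.encode()).decode()
    return "data:text/csv;base64," + payload

sample_csv = "Town,median_rent\nBEDOK,2087.5\n"   # sample rows for illustration
uri = csv_to_data_uri(sample_csv)

# Decode the payload again to check the round trip.
decoded = base64.b64decode(uri.split(",", 1)[1]).decode()
print(decoded == sample_csv)
```

Because the whole file is carried inside the link itself, this approach suits small dataframes; large ones would produce very long URIs.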

1. Downloading the list of Singapore towns with median residential rental prices

In [3]:
import zipfile
import os
!wget -q -O 'median-rent-by-town-and-flat-type.zip' "https://data.gov.sg/dataset/b35046dc-7428-4cff-968d-ef4c3e9e6c99/download"
zf = zipfile.ZipFile('./median-rent-by-town-and-flat-type.zip')
sgp_median_rent_by_town_data = pd.read_csv(zf.open("median-rent-by-town-and-flat-type.csv"))
sgp_median_rent_by_town_data.rename(columns = {'town':'Town'}, inplace = True)
sgp_median_rent_by_town_data.head()

Unnamed: 0,quarter,Town,flat_type,median_rent
0,2005-Q2,ANG MO KIO,1-RM,na
1,2005-Q2,ANG MO KIO,2-RM,na
2,2005-Q2,ANG MO KIO,3-RM,800
3,2005-Q2,ANG MO KIO,4-RM,950
4,2005-Q2,ANG MO KIO,5-RM,-


Data Cleanup and re-grouping.

The retrieved table contains some unwanted entries and needs some cleanup. The following tasks will be performed:

    Drop/ignore cells with missing data.
    Use most recent data record.
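
The two cleanup tasks above can be sketched on a toy table first. This is a minimal illustration under stated assumptions: the rows are made up, and the latest quarter is computed with `max()` rather than written out by hand, which is a variation on the notebook's approach.

```python
import pandas as pd

# Toy data in the same shape as the downloaded table (values are made up).
toy = pd.DataFrame({
    "quarter": ["2018-Q1", "2018-Q2", "2018-Q2", "2018-Q2"],
    "Town": ["A", "A", "A", "B"],
    "flat_type": ["3-RM", "3-RM", "4-RM", "3-RM"],
    "median_rent": ["1500", "1600", "na", "-"],
})

# 1. Drop rows whose rent is a missing-value marker.
clean = toy[~toy["median_rent"].isin(["-", "na"])]
# 2. Keep only the most recent quarter; "YYYY-Qn" strings sort chronologically.
latest = clean["quarter"].max()
clean = clean[clean["quarter"] == latest].copy()
# 3. Convert the rent column from text to numbers.
clean["median_rent"] = clean["median_rent"].astype(float)
print(clean)
```

Taking an explicit `.copy()` before assigning the converted column avoids pandas' chained-assignment warning on a filtered frame.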
   

In [4]:
# Drop rows whose rental price is missing (recorded as 'na' or '-').
sgp_median_rent_by_town_data_filter=sgp_median_rent_by_town_data[~sgp_median_rent_by_town_data['median_rent'].isin(['-','na'])]

# Take the most recent report which is "2018-Q2"
sgp_median_rent_by_town_data_filter=sgp_median_rent_by_town_data_filter[sgp_median_rent_by_town_data_filter['quarter'] == "2018-Q2"]

# Now that all rows are from "2018-Q2", we don't need this column anymore.
sgp_median_rent_by_town_data_filter=sgp_median_rent_by_town_data_filter.drop(['quarter'], axis=1)

# Ensure that median_rent column is float64.
sgp_median_rent_by_town_data_filter['median_rent']=sgp_median_rent_by_town_data_filter['median_rent'].astype(np.float64)


In [5]:
singapore_average_rental_prices_by_town = sgp_median_rent_by_town_data_filter.groupby(['Town'])['median_rent'].mean().reset_index()
singapore_average_rental_prices_by_town

Unnamed: 0,Town,median_rent
0,ANG MO KIO,2033.333333
1,BEDOK,2087.5
2,BISHAN,2233.333333
3,BUKIT BATOK,1962.5
4,BUKIT MERAH,2162.5
5,BUKIT PANJANG,1737.5
6,CENTRAL,2450.0
7,CHOA CHU KANG,1933.333333
8,CLEMENTI,2263.333333
9,GEYLANG,2166.666667


Adding geographical coordinates of each town location.

In [65]:
geo = Nominatim(user_agent='Mypythonapi')
for idx, town in singapore_average_rental_prices_by_town['Town'].items():
    # Append "Singapore" so the geocoder resolves the town within the country.
    coord = geo.geocode(town + ' ' + "Singapore", timeout = 10)
    if coord:
        singapore_average_rental_prices_by_town.loc[idx,'Latitude'] = coord.latitude
        singapore_average_rental_prices_by_town.loc[idx,'Longitude'] = coord.longitude
    else:
        # No match found: record the missing coordinates as NaN.
        singapore_average_rental_prices_by_town.loc[idx,'Latitude'] = np.nan
        singapore_average_rental_prices_by_town.loc[idx,'Longitude'] = np.nan
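
The lookup-with-fallback pattern used in the loop above can be tried offline. The sketch below is a hedged illustration: `lookup_coordinates`, `fake_geocode` and the `Location` tuple are hypothetical stand-ins and are not part of geopy. The geocoding function is passed in as an argument, so a real geocoder's `geocode` method could be supplied instead of the fake one.

```python
import math
from collections import namedtuple

# Stand-in for the object a geocoder returns; only the two attributes used here.
Location = namedtuple("Location", ["latitude", "longitude"])

def lookup_coordinates(geocode, town, country="Singapore"):
    """Return (lat, lng) for town, or (nan, nan) when the geocoder finds nothing."""
    result = geocode("{} {}".format(town, country))
    if result is None:
        return (math.nan, math.nan)
    return (result.latitude, result.longitude)

def fake_geocode(query):
    """Hypothetical offline geocoder used only for this illustration."""
    known = {"BEDOK Singapore": Location(1.323976, 103.930216)}
    return known.get(query)

print(lookup_coordinates(fake_geocode, "BEDOK"))
print(lookup_coordinates(fake_geocode, "NOWHERE"))
```

Returning NaN for unknown towns keeps the coordinate columns numeric, so rows with missing coordinates can be dropped or inspected later.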


In [66]:
singapore_average_rental_prices_by_town.set_index("Town")

Unnamed: 0_level_0,median_rent,Latitude,Longitude
Town,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
ANG MO KIO,2033.333333,1.369842,103.846609
BEDOK,2087.5,1.323976,103.930216
BISHAN,2233.333333,1.351455,103.848263
BUKIT BATOK,1962.5,1.349057,103.749591
BUKIT MERAH,2162.5,1.280628,103.830591
BUKIT PANJANG,1737.5,1.377921,103.771866
CENTRAL,2450.0,1.290475,103.852036
CHOA CHU KANG,1933.333333,1.38926,103.743728
CLEMENTI,2263.333333,1.314026,103.76241
GEYLANG,2166.666667,1.318186,103.887056


Now that we have the latitude and longitude of each town, we can generate a basic map of Singapore with the towns marked.

In [8]:
geo = Nominatim(user_agent='My-IBMNotebook')
address = 'Singapore'
location = geo.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Singapore are {}, {}.'.format(latitude, longitude))

# create map of Singapore using latitude and longitude values
map_singapore = folium.Map(location=[latitude, longitude],tiles="OpenStreetMap", zoom_start=10)

# add markers to map
for lat, lng, town in zip(
    singapore_average_rental_prices_by_town['Latitude'],
    singapore_average_rental_prices_by_town['Longitude'],
    singapore_average_rental_prices_by_town['Town']):
    label = town
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=4,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#87cefa',
        fill_opacity=0.5,
        parse_html=False).add_to(map_singapore)
map_singapore


The geographical coordinates of Singapore are 1.2904753, 103.8520359.


In [9]:

fileName = "singapore_average_rpbt.csv"
linkName = "Singapore Average Rental Prices"
create_download_link(singapore_average_rental_prices_by_town,linkName,fileName)


 ### Segmenting and Clustering Towns in Singapore
Retrieving FourSquare Places of interest.

Storing my Foursquare credentials in variables 

In [10]:
CLIENT_ID = '23NES53FFNN3HHE3UFXX2IJYTH3JGQU3YMHPEVKXVEBAEWB3' # your Foursquare ID
CLIENT_SECRET = '4A1Z0ENHPF4ZF5MG2KXZ4J3HELYTZNJ0SP42CNEROGZPJLHF' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)



Your credentials:
CLIENT_ID: 23NES53FFNN3HHE3UFXX2IJYTH3JGQU3YMHPEVKXVEBAEWB3
CLIENT_SECRET:4A1Z0ENHPF4ZF5MG2KXZ4J3HELYTZNJ0SP42CNEROGZPJLHF
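
Credentials written into a notebook and printed in its output are easy to leak when the notebook is shared. One common alternative is to read them from environment variables. The following is a minimal sketch of that idea: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the function `load_credentials` are assumptions made for illustration, not names defined by Foursquare.

```python
import os

def load_credentials(env=os.environ):
    """Read Foursquare credentials from environment variables (names are assumptions)."""
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Foursquare credentials are not set in the environment")
    return client_id, client_secret

# Demonstration with a plain dict standing in for os.environ.
demo_env = {"FOURSQUARE_CLIENT_ID": "demo-id", "FOURSQUARE_CLIENT_SECRET": "demo-secret"}
client_id, client_secret = load_credentials(demo_env)
print("CLIENT_ID:", client_id)   # the secret is deliberately not printed
```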


Getting coordinates of Singapore City 

In [11]:
address = 'Singapore'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Singapore are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Singapore are 1.2904753, 103.8520359.


Creating the Foursquare URL to get venues within a radius of 500 meters

In [118]:
LIMIT = 100

radius = 500 # define radius

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    location.latitude, 
    location.longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=23NES53FFNN3HHE3UFXX2IJYTH3JGQU3YMHPEVKXVEBAEWB3&client_secret=4A1Z0ENHPF4ZF5MG2KXZ4J3HELYTZNJ0SP42CNEROGZPJLHF&v=20180605&ll=1.2904753,103.8520359&radius=500&limit=100'
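
As an alternative to assembling the query string with `format`, the same request URL can be built from a dictionary of parameters. This is a sketch under stated assumptions: `build_explore_url` is a hypothetical helper and the credentials are placeholders. `urllib.parse.urlencode` inserts the separators and escapes special characters, so the comma in `ll` becomes `%2C`, which decodes back to a comma.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build a venues/explore URL; urlencode handles separators and escaping."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

# Placeholder credentials; coordinates are those geocoded for Singapore in this notebook.
demo_url = build_explore_url("ID", "SECRET", "20180605", 1.2904753, 103.8520359, 500, 100)
print(demo_url)

# Parse the query string back to confirm the parameters survive encoding.
print(parse_qs(urlsplit(demo_url).query)["ll"])
```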

### The following function retrieves the venues for the given names and coordinates and stores them in a dataframe.

In [119]:

import time

FOURSQUARE_EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore?'
FOURSQUARE_SEARCH_URL = 'https://api.foursquare.com/v2/venues/search?'

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    # CLIENT_ID, CLIENT_SECRET, VERSION, LIMIT and the URL constants are only read here,
    # so they do not need `global` declarations.
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print('getNearbyVenues', name)
        # cyclefsk2() is not defined in this notebook; it is assumed to be a helper defined elsewhere.
        cyclefsk2()

        # create the API request URL (FOURSQUARE_EXPLORE_URL already ends with '?')
        url = '{}client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            FOURSQUARE_EXPLORE_URL,CLIENT_ID,CLIENT_SECRET,VERSION,
            lat,lng,radius,LIMIT)

        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # return only relevant information for each nearby venue
        venues_list.append([(
            name,lat,lng,
            v['venue']['id'],v['venue']['name'],
            v['venue']['location']['lat'],v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])
        time.sleep(2) # pause between requests (presumably to respect API rate limits)

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    # eight column names, matching the eight fields collected per venue above
    nearby_venues.columns = ['Town','Town Latitude','Town Longitude','VenueID','Venue','Venue Latitude','Venue Longitude','Venue Category']

    return(nearby_venues)
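
The row-extraction step inside the function above can be exercised without calling the API. The sketch below is an illustration under stated assumptions: `extract_venue_rows` is a hypothetical helper and `sample_payload` is hand-written to mimic the response structure, not real API output. Unlike the function above, it tolerates venues with an empty category list.

```python
def extract_venue_rows(town, payload):
    """Turn one explore-style response into (town, id, name, lat, lng, category) tuples."""
    rows = []
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        category = categories[0]["name"] if categories else None
        rows.append((town, venue["id"], venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows

# Hand-written payload mimicking the response structure; not real API output.
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"id": "v1", "name": "Gallery", "location": {"lat": 1.29, "lng": 103.85},
               "categories": [{"name": "Art Gallery"}]}},
    {"venue": {"id": "v2", "name": "Unlabelled", "location": {"lat": 1.30, "lng": 103.86},
               "categories": []}},
]}]}}

rows = extract_venue_rows("CENTRAL", sample_payload)
print(rows)
```

Separating parsing from the HTTP request makes the parsing logic testable on saved responses.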

In [14]:
FOURSQUARE_SEARCH_URL = 'https://api.foursquare.com/v2/venues/search?'
# SEARCH VENUES BY CATEGORY

# Dataframe: venue_id_recover
# - stores venue IDs so that failed score retrievals can be retried later if the Foursquare limit is exceeded.
venue_id_rcols = ['VenueID']
venue_id_recover = pd.DataFrame(columns=venue_id_rcols)
def getVenuesByCategory(names, latitudes, longitudes, categoryID, radius=500):
    global CLIENT_ID
    global CLIENT_SECRET
    global FOURSQUARE_EXPLORE_URL
    global FOURSQUARE_SEARCH_URL
    global VERSION
    global LIMIT
    venue_columns = ['Town','Town Latitude','Town Longitude','VenueID','VenueName','score','category','catID','latitude','longitude']
    venue_DF = pd.DataFrame(columns=venue_columns)
    print("[#Start getVenuesByCategory]")
    for name, lat, lng in zip(names, latitudes, longitudes):
        cyclefsk2()
        print(name,",",end='')
        #print('getVenuesByCategory',categoryID,name) ; # DEBUG: be quiet
        # create the API request URL
        url = '{}client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
            FOURSQUARE_SEARCH_URL,CLIENT_ID,CLIENT_SECRET,VERSION,lat,lng,radius,LIMIT,categoryID)
        # make the GET request
        results = requests.get(url).json()
        # Populate dataframe with the category venue results
        # Extracting JSON  data values
        
        for jsonSub in results['response']['venues']:
            #print(jsonSub)
            # JSON Results may not be in expected format or incomplete data, in that case, skip!
            ven_id = 0
            try:
                # If there are any issue with a restaurant, retry or ignore and continue
                # Get location details
                ven_id   = jsonSub['id']
                ven_cat  = jsonSub['categories'][0]['pluralName']
                ven_CID  = jsonSub['categories'][0]['id']
                ven_name = jsonSub['name']
                ven_lat  = jsonSub['location']['lat']
                ven_lng  = jsonSub['location']['lng']
                venue_DF = venue_DF.append({
                    'Town'      : name,
                    'Town Latitude' : lat,
                    'Town Longitude': lng,
                    'VenueID'   : ven_id,
                    'VenueName' : ven_name,
                    'score'     : 'nan',
                    'category'  : ven_cat,
                    'catID'     : ven_CID,
                    'latitude'  : ven_lat,
                    'longitude' : ven_lng}, ignore_index=True)
            except (KeyError, IndexError):
                # Skip venues whose JSON is incomplete (e.g. no category or no location).
                continue
    # END OF LOOP, return.
    print("\n[#Done getVenuesByCategory]")
    return(venue_DF)


Store venue IDs so that failed score retrievals can be retried later if the Foursquare rate limit is exceeded while fetching scores.

In [120]:
FOURSQUARE_SEARCH_URL = 'https://api.foursquare.com/v2/venues/search?'

venue_id_rcols = ['VenueID','Score']
venue_id_recover = pd.DataFrame(columns=venue_id_rcols)

def getVenuesIDScore(venueID):
    global CLIENT_ID
    global CLIENT_SECRET
    global FOURSQUARE_EXPLORE_URL
    global FOURSQUARE_SEARCH_URL
    global VERSION
    global LIMIT
    global venue_id_recover
    print("[#getVenuesIDScore]")
    venID_URL = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venueID,CLIENT_ID,CLIENT_SECRET,VERSION)
    print(venID_URL)
    venID_score = 0.00
    # Process results
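    try:
        venID_result = requests.get(venID_URL).json()
        venID_score  = venID_result['response']['venue']['rating']
    except (requests.RequestException, KeyError, ValueError):
        # Record the failed ID for a later retry; ignore_index=True is required when appending a dict.
        venue_id_recover = venue_id_recover.append({'VenueID' : venueID, 'Score' : 0.0}, ignore_index=True)
        cyclefsk2()
        return ["error",0.0]
    return ["success",venID_score]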
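
The recovery idea described above (remember the IDs whose lookup failed so they can be retried) can be sketched independently of the API. In this hedged illustration, `collect_scores` and `fake_fetch` are hypothetical names, and the fake fetcher fails on purpose for one ID.

```python
def collect_scores(venue_ids, fetch_score):
    """Return (scores, failed_ids); fetch_score may raise for any venue."""
    scores, failed_ids = {}, []
    for venue_id in venue_ids:
        try:
            scores[venue_id] = fetch_score(venue_id)
        except Exception:
            # Remember the ID so the lookup can be retried later.
            failed_ids.append(venue_id)
    return scores, failed_ids

def fake_fetch(venue_id):
    """Hypothetical stand-in for an API call; fails for unknown IDs on purpose."""
    ratings = {"a": 8.1, "b": 7.4}
    return ratings[venue_id]          # raises KeyError for unknown IDs

scores, failed_ids = collect_scores(["a", "x", "b"], fake_fetch)
print(scores, failed_ids)
```

A later pass can call `collect_scores(failed_ids, fetch_score)` again once the rate limit has reset.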



In [121]:
singapore_average_rental_prices_by_town.dtypes

Town            object
median_rent    float64
Latitude       float64
Longitude      float64
dtype: object

In [122]:
venue_columns = ['Town','Town Latitude','Town Longitude','VenueID','VenueName','score','category','catID','latitude','longitude']
singapore_town_venues = pd.DataFrame(columns=venue_columns)

Search venues with recommendations on: Food Venues (restaurants, fast food, etc.)

To demonstrate user selection of places of interest, we will use this Food Venues category in our further analysis.

    This Foursquare search is expected to collect venues in the following categories:
        category
        Food Courts
        Coffee Shops
        Restaurants
        Cafés
        Other food venues

In [123]:
# Food Venues: restaurants, fast food, etc.
# For testing only; change the condition to True to run a single-town query.
if False:
    categoryID = "4d4b7105d754a06377d81259"
    town_names = ['ANG MO KIO']
    lat_list   = [1.3699718]
    lng_list   = [103.8495876]
    tmp = getVenuesByCategory(names=town_names,latitudes=lat_list,longitudes=lng_list,categoryID=categoryID)
    singapore_town_venues = pd.concat([singapore_town_venues,tmp], ignore_index=True)
    

In [124]:
results = requests.get(url).json()
results


{'meta': {'code': 200, 'requestId': '5ca107de4434b961752ca80d'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-4d438c6514aa8cfa743d5c3d-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/arts_entertainment/artgallery_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d1e2931735',
         'name': 'Art Gallery',
         'pluralName': 'Art Galleries',
         'primary': True,
         'shortName': 'Art Gallery'}],
       'id': '4d438c6514aa8cfa743d5c3d',
       'location': {'address': "1 St. Andrew's Road",
        'cc': 'SG',
        'city': 'Singapore',
        'country': 'Singapore',
        'distance': 61,
        'formattedAddress': ["1 St. Andrew's Road", '178957', 'Singapore'],
        'labeledLatLngs': [{'label': 'display',
          'lat': 1.2907395913

In [125]:


# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # json_normalize prefixes nested fields, so fall back to the dotted column name
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']



Fetch the venue details into a dataframe

In [127]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,National Gal­lery Singa­pore,Art Gallery,1.29074,103.851548
1,Esplanade Park,Park,1.288968,103.85358
2,The Oval @ Singapore Cricket Club Pavilion,Restaurant,1.289006,103.852438
3,Odette Restaurant,French Restaurant,1.289679,103.851691
4,Singapore F1 Padang Grandstand,Event Space,1.290656,103.852773
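
The dotted column names such as `venue.location.lat` come from flattening nested JSON. The sketch below shows the general idea in plain Python; it is not pandas' implementation. `flatten` is a hypothetical helper and `item` is an abbreviated, hand-written record in the shape of one result.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into a single dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value    # lists and scalars are kept as-is
    return flat

# Abbreviated, hand-written item in the shape of one explore result.
item = {"venue": {"name": "Esplanade Park",
                  "categories": [{"name": "Park"}],
                  "location": {"lat": 1.288968, "lng": 103.85358}}}
flat_item = flatten(item)
print(sorted(flat_item))
```

The `categories` value stays a list after flattening, which is why a separate step is needed to pull the category name out of its first element.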


In [23]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

79 venues were returned by Foursquare.


For each retrieved venue ID, retrieve the venue's category rating.

The data frame generated by the second function contains the following columns:

In [24]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [25]:


sgp_venues = getNearbyVenues(names=nearby_venues['name'],
                                   latitudes=nearby_venues['lat'],
                                   longitudes=nearby_venues['lng']
                                  )



National Gal­lery Singa­pore
Esplanade Park
The Oval @ Singapore Cricket Club Pavilion
Odette Restaurant
Singapore F1 Padang Grandstand
Singapore F1 GP: Padang Stage
Aura
Esplanade Theatre
Esplanade Concourse
Victoria Theatre & Victoria Concert Hall
Smoke & Mirrors
Esplanade - Theatres On The Bay
Singapore F1 Circuit Gate 3
JAAN
Esplanade Concert Hall
Swissôtel The Stamford
The National Kitchen by Violet Oon Singapore
Victoria Concert Hall - Home of the SSO
Asian Civilisations Museum
Esplanade Riverside
Tokyo Milk Cheese Factory
Starbucks Reserve Store
Raffles City Shopping Centre
Sky Lounge @ Peninsula Excelsior
Duke Bakery
Royce
Hoshino Coffee
TAP Craft Beer Bar (One Raffles Link)
Din Tai Fung 鼎泰豐 (Din Tai Fung)
Esplanade Outdoor Theatre
Capitol Piazza
Cavenagh Bridge
Capitol Theatre
Southbridge
Barbershop By Timbre
Headquarters
The Fullerton Hotel
The Merlion
Esplanade Recital Studio
Singapore Cricket Club
Jumbo Seafood Gallery 珍宝海鮮樓
Sabaai Sabaai Traditional Thai Massage
Fairmont S

In [26]:
print(sgp_venues.shape)
sgp_venues.head()



(6975, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,National Gal­lery Singa­pore,1.29074,103.851548,National Gal­lery Singa­pore,1.29074,103.851548,Art Gallery
1,National Gal­lery Singa­pore,1.29074,103.851548,The Oval @ Singapore Cricket Club Pavilion,1.289006,103.852438,Restaurant
2,National Gal­lery Singa­pore,1.29074,103.851548,Odette Restaurant,1.289679,103.851691,French Restaurant
3,National Gal­lery Singa­pore,1.29074,103.851548,Singapore F1 Padang Grandstand,1.290656,103.852773,Event Space
4,National Gal­lery Singa­pore,1.29074,103.851548,Esplanade Park,1.288968,103.85358,Park


In [27]:
sgp_venues.groupby('Neighborhood').count()



#sgp_grouped = sgp_onehot.groupby('Neighborhood').mean().reset_index()
#sgp_grouped



Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
4Fingers Crispy Chicken,100,100,100,100,100,100
Ah Sam Cold Drink Stall,91,91,91,91,91,91
Anti:dote,100,100,100,100,100,100
Asian Civilisations Museum,81,81,81,81,81,81
Aura,77,77,77,77,77,77
Barbershop By Timbre,85,85,85,85,85,85
Braci,94,94,94,94,94,94
Capitol Piazza,71,71,71,71,71,71
Capitol Theatre,68,68,68,68,68,68
Cavenagh Bridge,81,81,81,81,81,81


## One-hot encoding

In [28]:
# one hot encoding
sgp_onehot = pd.get_dummies(sgp_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
sgp_onehot['Neighborhood'] = sgp_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [sgp_onehot.columns[-1]] + list(sgp_onehot.columns[:-1])
sgp_onehot = sgp_onehot[fixed_columns]

sgp_onehot.head()



Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bakery,Bar,Bay,Beer Bar,Beer Garden,Bike Rental / Bike Share,Bistro,Bookstore,Boutique,Breakfast Spot,Brewery,Bridge,Buffet,Building,Burger Joint,Burmese Restaurant,Burrito Place,Café,Camera Store,Canal,Cantonese Restaurant,Chinese Restaurant,Clothing Store,Club House,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall,Creperie,Cuban Restaurant,Cupcake Shop,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Dumpling Restaurant,Electronics Store,Event Space,Exhibit,Festival,Filipino Restaurant,Food Court,French Restaurant,Frozen Yogurt Shop,Furniture / Home Store,Gastropub,General Entertainment,German Restaurant,Gift Shop,Gym,Gym / Fitness Center,Hainan Restaurant,History Museum,Hotel,Hotel Bar,Hotel Pool,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Juice Bar,Kids Store,Korean Restaurant,Lighthouse,Lounge,Market,Martial Arts Dojo,Massage Studio,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Movie Theater,Museum,Music Store,Music Venue,Nightclub,Noodle House,Outdoor Sculpture,Paella Restaurant,Paper / Office Supplies Store,Park,Performing Arts Venue,Pharmacy,Pizza Place,Playground,Plaza,Poke Place,Pool,Pub,Racetrack,Ramen Restaurant,Restaurant,River,Salad Place,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shabu-Shabu Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Spa,Spanish Restaurant,Sporting Goods Shop,Steakhouse,Supermarket,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Waterfront,Wine Bar,Wine Shop,Wings Joint,Yoga Studio
0,National Gal­lery Singa­pore,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,National Gal­lery Singa­pore,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,National Gal­lery Singa­pore,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,National Gal­lery Singa­pore,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,National Gal­lery Singa­pore,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
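
To make the encoding easier to read than the wide table above, here is a minimal sketch on made-up data. The names `toy_venues`, `onehot` and `frequencies` are illustrative. It shows that averaging the 0/1 indicator columns per neighborhood yields the share of venues in each category.

```python
import pandas as pd

# Made-up venues: two neighborhoods and three categories.
toy_venues = pd.DataFrame({
    "Neighborhood": ["N1", "N1", "N1", "N1", "N2", "N2"],
    "Venue Category": ["Bar", "Bar", "Park", "Hotel", "Park", "Park"],
})

# One indicator column per category (cast to float so the mean is well defined).
onehot = pd.get_dummies(toy_venues["Venue Category"]).astype(float)
onehot.insert(0, "Neighborhood", toy_venues["Neighborhood"])

# The mean of 0/1 indicators is the share of venues in each category.
frequencies = onehot.groupby("Neighborhood").mean()
print(frequencies)
```

Each row of `frequencies` sums to 1, so neighborhoods with different venue counts become directly comparable.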


In [29]:
sgp_grouped = sgp_onehot.groupby('Neighborhood').mean().reset_index()
sgp_grouped




Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bakery,Bar,Bay,Beer Bar,Beer Garden,Bike Rental / Bike Share,Bistro,Bookstore,Boutique,Breakfast Spot,Brewery,Bridge,Buffet,Building,Burger Joint,Burmese Restaurant,Burrito Place,Café,Camera Store,Canal,Cantonese Restaurant,Chinese Restaurant,Clothing Store,Club House,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall,Creperie,Cuban Restaurant,Cupcake Shop,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Dumpling Restaurant,Electronics Store,Event Space,Exhibit,Festival,Filipino Restaurant,Food Court,French Restaurant,Frozen Yogurt Shop,Furniture / Home Store,Gastropub,General Entertainment,German Restaurant,Gift Shop,Gym,Gym / Fitness Center,Hainan Restaurant,History Museum,Hotel,Hotel Bar,Hotel Pool,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Juice Bar,Kids Store,Korean Restaurant,Lighthouse,Lounge,Market,Martial Arts Dojo,Massage Studio,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Movie Theater,Museum,Music Store,Music Venue,Nightclub,Noodle House,Outdoor Sculpture,Paella Restaurant,Paper / Office Supplies Store,Park,Performing Arts Venue,Pharmacy,Pizza Place,Playground,Plaza,Poke Place,Pool,Pub,Racetrack,Ramen Restaurant,Restaurant,River,Salad Place,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shabu-Shabu Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Spa,Spanish Restaurant,Sporting Goods Shop,Steakhouse,Supermarket,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Waterfront,Wine Bar,Wine Shop,Wings Joint,Yoga Studio
0,4Fingers Crispy Chicken,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.01,0.03,0.0,0.03,0.03,0.0,0.02,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.02,0.01,0.04,0.0,0.0,0.01,0.0,0.01,0.01,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.1,0.05,0.01,0.0,0.01,0.02,0.0,0.02,0.03,0.0,0.01,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.03,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.01,0.01,0.01,0.0,0.02,0.01,0.0,0.0,0.0,0.01,0.0,0.05,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0
1,Ah Sam Cold Drink Stall,0.0,0.0,0.010989,0.0,0.0,0.021978,0.010989,0.010989,0.043956,0.0,0.0,0.0,0.010989,0.021978,0.0,0.0,0.0,0.0,0.021978,0.010989,0.010989,0.0,0.0,0.0,0.054945,0.0,0.010989,0.0,0.021978,0.0,0.0,0.043956,0.010989,0.010989,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.043956,0.0,0.021978,0.043956,0.021978,0.0,0.021978,0.010989,0.010989,0.010989,0.043956,0.065934,0.0,0.0,0.010989,0.0,0.032967,0.0,0.0,0.010989,0.0,0.010989,0.0,0.0,0.010989,0.0,0.0,0.010989,0.010989,0.010989,0.021978,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.010989,0.010989,0.010989,0.032967,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.010989,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.010989,0.010989,0.010989,0.0,0.032967
2,Anti:dote,0.0,0.0,0.01,0.01,0.01,0.03,0.0,0.03,0.0,0.0,0.01,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.01,0.05,0.0,0.0,0.0,0.04,0.03,0.0,0.04,0.04,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.02,0.01,0.02,0.0,0.0,0.0,0.0,0.04,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.08,0.0,0.0,0.0,0.01,0.01,0.0,0.02,0.04,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.01,0.01,0.04,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0
3,Asian Civilisations Museum,0.0,0.0,0.012346,0.0,0.0,0.024691,0.012346,0.0,0.049383,0.0,0.0,0.0,0.012346,0.012346,0.0,0.0,0.0,0.0,0.012346,0.0,0.012346,0.0,0.0,0.0,0.012346,0.0,0.012346,0.0,0.037037,0.0,0.0,0.049383,0.024691,0.012346,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012346,0.0,0.0,0.0,0.0,0.024691,0.0,0.0,0.0,0.012346,0.0,0.0,0.012346,0.049383,0.0,0.012346,0.024691,0.012346,0.0,0.012346,0.0,0.012346,0.012346,0.049383,0.037037,0.0,0.0,0.0,0.0,0.024691,0.0,0.0,0.012346,0.0,0.0,0.0,0.012346,0.012346,0.0,0.0,0.012346,0.024691,0.012346,0.0,0.012346,0.0,0.0,0.012346,0.037037,0.0,0.0,0.0,0.012346,0.0,0.0,0.0,0.0,0.0,0.012346,0.0,0.037037,0.012346,0.012346,0.012346,0.024691,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012346,0.0,0.0,0.0,0.024691,0.0,0.012346,0.0,0.037037
4,Aura,0.0,0.0,0.038961,0.0,0.0,0.025974,0.0,0.025974,0.012987,0.0,0.0,0.012987,0.0,0.012987,0.0,0.0,0.012987,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.012987,0.025974,0.012987,0.012987,0.051948,0.038961,0.0,0.038961,0.0,0.0,0.0,0.0,0.025974,0.0,0.0,0.012987,0.0,0.025974,0.0,0.012987,0.0,0.0,0.038961,0.0,0.0,0.0,0.012987,0.012987,0.0,0.0,0.012987,0.0,0.025974,0.038961,0.012987,0.0,0.0,0.0,0.012987,0.0,0.025974,0.012987,0.0,0.0,0.0,0.012987,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.025974,0.012987,0.0,0.0,0.025974,0.012987,0.0,0.0,0.012987,0.0,0.012987,0.025974,0.0,0.012987,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.012987,0.012987,0.0,0.0,0.012987,0.0,0.012987,0.038961,0.0,0.0,0.0,0.0,0.012987,0.012987,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.012987,0.0,0.0,0.012987,0.0
5,Barbershop By Timbre,0.0,0.0,0.011765,0.0,0.0,0.023529,0.011765,0.011765,0.047059,0.0,0.0,0.0,0.011765,0.023529,0.0,0.0,0.0,0.0,0.023529,0.0,0.011765,0.0,0.011765,0.0,0.023529,0.011765,0.011765,0.0,0.023529,0.0,0.0,0.047059,0.023529,0.011765,0.035294,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011765,0.0,0.0,0.0,0.0,0.011765,0.0,0.0,0.0,0.011765,0.0,0.0,0.011765,0.047059,0.0,0.023529,0.011765,0.011765,0.0,0.011765,0.0,0.023529,0.011765,0.047059,0.047059,0.0,0.0,0.0,0.0,0.011765,0.0,0.0,0.011765,0.0,0.0,0.0,0.0,0.011765,0.0,0.0,0.023529,0.011765,0.011765,0.0,0.011765,0.0,0.0,0.011765,0.011765,0.0,0.0,0.0,0.011765,0.0,0.0,0.0,0.0,0.0,0.011765,0.011765,0.035294,0.011765,0.0,0.011765,0.023529,0.0,0.0,0.0,0.0,0.0,0.011765,0.0,0.0,0.0,0.0,0.0,0.0,0.011765,0.0,0.011765,0.0,0.023529,0.0,0.011765,0.0,0.035294
6,Braci,0.0,0.0,0.010638,0.0,0.0,0.021277,0.010638,0.010638,0.042553,0.0,0.0,0.0,0.010638,0.021277,0.0,0.0,0.0,0.0,0.021277,0.010638,0.010638,0.0,0.0,0.0,0.053191,0.0,0.010638,0.0,0.031915,0.0,0.0,0.042553,0.010638,0.010638,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.042553,0.0,0.010638,0.042553,0.010638,0.0,0.021277,0.010638,0.021277,0.010638,0.031915,0.06383,0.0,0.0,0.021277,0.0,0.031915,0.0,0.010638,0.010638,0.0,0.010638,0.0,0.0,0.010638,0.0,0.0,0.0,0.010638,0.010638,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.010638,0.010638,0.010638,0.031915,0.010638,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.010638,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.010638,0.010638,0.010638,0.010638,0.0,0.031915
7,Capitol Piazza,0.0,0.0,0.042254,0.014085,0.014085,0.042254,0.0,0.028169,0.0,0.0,0.0,0.014085,0.0,0.0,0.028169,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028169,0.0,0.0,0.0,0.042254,0.014085,0.014085,0.056338,0.028169,0.0,0.014085,0.0,0.0,0.0,0.0,0.028169,0.0,0.0,0.014085,0.0,0.028169,0.0,0.0,0.0,0.0,0.056338,0.0,0.0,0.014085,0.0,0.014085,0.014085,0.0,0.0,0.0,0.014085,0.056338,0.014085,0.0,0.0,0.0,0.014085,0.0,0.028169,0.028169,0.0,0.0,0.0,0.014085,0.014085,0.014085,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014085,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014085,0.0,0.0,0.014085,0.0,0.014085,0.0,0.0,0.014085,0.0,0.014085,0.014085,0.0,0.014085,0.014085,0.0,0.0,0.0,0.014085,0.014085,0.042254,0.014085,0.0,0.0,0.0,0.014085,0.014085,0.0,0.014085,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Capitol Theatre,0.0,0.0,0.029412,0.014706,0.014706,0.044118,0.0,0.029412,0.0,0.0,0.0,0.014706,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.044118,0.014706,0.0,0.058824,0.029412,0.0,0.014706,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.029412,0.0,0.029412,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.014706,0.0,0.014706,0.014706,0.0,0.0,0.0,0.029412,0.058824,0.014706,0.0,0.0,0.0,0.014706,0.0,0.014706,0.029412,0.0,0.0,0.0,0.014706,0.014706,0.014706,0.0,0.0,0.0,0.0,0.0,0.014706,0.0,0.014706,0.014706,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014706,0.0,0.0,0.014706,0.0,0.014706,0.0,0.0,0.014706,0.0,0.0,0.0,0.0,0.014706,0.014706,0.044118,0.014706,0.0,0.0,0.0,0.014706,0.014706,0.0,0.014706,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Cavenagh Bridge,0.0,0.0,0.012346,0.0,0.0,0.024691,0.012346,0.0,0.049383,0.0,0.0,0.0,0.012346,0.012346,0.0,0.0,0.0,0.0,0.012346,0.0,0.012346,0.012346,0.0,0.0,0.012346,0.0,0.012346,0.0,0.037037,0.0,0.0,0.061728,0.024691,0.012346,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012346,0.0,0.0,0.0,0.0,0.024691,0.0,0.0,0.0,0.012346,0.0,0.0,0.012346,0.061728,0.0,0.012346,0.024691,0.012346,0.0,0.012346,0.0,0.012346,0.012346,0.049383,0.049383,0.0,0.0,0.0,0.0,0.024691,0.0,0.0,0.012346,0.012346,0.0,0.0,0.012346,0.012346,0.0,0.0,0.0,0.024691,0.012346,0.0,0.012346,0.0,0.0,0.012346,0.024691,0.0,0.0,0.0,0.012346,0.0,0.0,0.0,0.0,0.0,0.012346,0.0,0.037037,0.012346,0.0,0.012346,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012346,0.0,0.0,0.0,0.024691,0.0,0.0,0.0,0.037037


Let's print each neighborhood along with its top 5 most common venues.

In [30]:
num_top_venues = 5

for hood in sgp_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = sgp_grouped[sgp_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----4Fingers Crispy Chicken----
           venue  freq
0          Hotel  0.10
1      Hotel Bar  0.05
2  Shopping Mall  0.05
3         Buffet  0.04
4    Event Space  0.04


----Ah Sam Cold Drink Stall----
                  venue  freq
0   Japanese Restaurant  0.07
1                  Café  0.05
2  Gym / Fitness Center  0.04
3    Italian Restaurant  0.04
4                 Hotel  0.04


----Anti:dote----
               venue  freq
0              Hotel  0.08
1               Café  0.05
2        Coffee Shop  0.04
3      Shopping Mall  0.04
4  French Restaurant  0.04


----Asian Civilisations Museum----
                  venue  freq
0  Gym / Fitness Center  0.05
1          Cocktail Bar  0.05
2                   Bar  0.05
3    Italian Restaurant  0.05
4           Yoga Studio  0.04


----Aura----
               venue  freq
0       Cocktail Bar  0.05
1      Shopping Mall  0.04
2  French Restaurant  0.04
3        Art Gallery  0.04
4       Concert Hall  0.04


----Barbershop By Timbre----
         


Let's put that into a pandas dataframe

First, let's write a function to sort the venues in descending order.


In [31]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
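
To make the function's behavior concrete, here is a minimal sketch on a made-up row. The venue frequencies are illustrative and not taken from the Foursquare data; the first entry is skipped because it holds the neighborhood name.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]  # skip the 'Neighborhood' entry
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# hypothetical row: neighborhood name followed by venue-category frequencies
toy_row = pd.Series({'Neighborhood': 'Toy Town', 'Bar': 0.1, 'Café': 0.5,
                     'Hotel': 0.3, 'Park': 0.05})

top_two = return_most_common_venues(toy_row, 2)
print(list(top_two))
```

The function returns category names only, ordered from most to least frequent; the frequencies themselves are dropped.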

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [32]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions past the 3rd use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = sgp_grouped['Neighborhood']

for ind in np.arange(sgp_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(sgp_grouped.iloc[ind, :], num_top_venues)
neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,4Fingers Crispy Chicken,Hotel,Shopping Mall,Hotel Bar,Buffet,Event Space,Performing Arts Venue,Steakhouse,Japanese Restaurant,Dessert Shop,Coffee Shop
1,Ah Sam Cold Drink Stall,Japanese Restaurant,Café,Gym / Fitness Center,Italian Restaurant,Cocktail Bar,Bar,Hotel,Yoga Studio,Lounge,Salad Place
2,Anti:dote,Hotel,Café,French Restaurant,Chinese Restaurant,Cocktail Bar,Coffee Shop,Japanese Restaurant,Shopping Mall,Bakery,Dessert Shop
3,Asian Civilisations Museum,Gym / Fitness Center,Italian Restaurant,Cocktail Bar,Bar,Yoga Studio,Japanese Restaurant,Concert Hall,Chinese Restaurant,Performing Arts Venue,Salad Place
4,Aura,Cocktail Bar,French Restaurant,Hotel,Art Gallery,Coffee Shop,Concert Hall,Shopping Mall,Music Venue,Chinese Restaurant,Monument / Landmark
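
The column names above rely on a three-item suffix list plus a fallback, which is fine for 10 columns but would label a 21st column "21th". A small helper, offered only as a sketch (the name `ordinal` is hypothetical and not part of the notebook), handles the general case.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:  # 11th, 12th, 13th ... are irregular
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 10
columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i))
    for i in range(1, num_top_venues + 1)
]
print(columns[1], '...', columns[-1])
```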


Adding Latitude and Longitude to each Neighborhood in the Dataframe

In [72]:
geo = Nominatim(user_agent='Mypythonapi')
for idx, town in neighborhoods_venues_sorted['Neighborhood'].items():
    # append the country name so that Nominatim searches within Singapore
    coord = geo.geocode(town + ' ' + "Singapore", timeout=10)
    if coord:
        neighborhoods_venues_sorted.loc[idx, 'Latitude'] = coord.latitude
        neighborhoods_venues_sorted.loc[idx, 'Longitude'] = coord.longitude
    # neighborhoods that Nominatim cannot resolve are skipped here
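
Some names, such as "Ah Sam Cold Drink Stall", show 0.0 coordinates in the table that follows, which suggests the lookup did not resolve them. One way to make such failures explicit is to record NaN for unresolved names. The sketch below takes the geocoding function as a parameter so it can be tried offline; `add_coordinates` and `fake_geocode` are hypothetical names, and the stub stands in for a real geocoder such as `Nominatim(...).geocode`.

```python
from collections import namedtuple

import numpy as np
import pandas as pd

Location = namedtuple('Location', ['latitude', 'longitude'])

def add_coordinates(df, geocode, suffix=' Singapore'):
    """Return a copy of df with Latitude/Longitude columns; NaN where lookup fails."""
    out = df.copy()
    out['Latitude'] = np.nan
    out['Longitude'] = np.nan
    for idx, name in out['Neighborhood'].items():
        coord = geocode(name + suffix)
        if coord is not None:
            out.loc[idx, 'Latitude'] = coord.latitude
            out.loc[idx, 'Longitude'] = coord.longitude
    return out

# offline stub standing in for a real geocoder (assumption for illustration);
# the coordinates are the ones shown for 'Aura' in this notebook
known = {'Aura Singapore': Location(1.290746, 103.851993)}

def fake_geocode(query):
    return known.get(query)

toy = pd.DataFrame({'Neighborhood': ['Aura', 'Braci']})
located = add_coordinates(toy, fake_geocode)
print(located)
```

With geopy, the same function could be called with `lambda q: geo.geocode(q, timeout=10)`. geopy also ships a `RateLimiter` helper in `geopy.extra.rate_limiter` that can space out requests to the public Nominatim service.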

In [73]:
neighborhoods_venues_sorted.set_index("Neighborhood")

Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Latitude,Longitude
4Fingers Crispy Chicken,Hotel,Shopping Mall,Hotel Bar,Buffet,Event Space,Performing Arts Venue,Steakhouse,Japanese Restaurant,Dessert Shop,Coffee Shop,1.350203,103.848276
Ah Sam Cold Drink Stall,Japanese Restaurant,Café,Gym / Fitness Center,Italian Restaurant,Cocktail Bar,Bar,Hotel,Yoga Studio,Lounge,Salad Place,0.0,0.0
Anti:dote,Hotel,Café,French Restaurant,Chinese Restaurant,Cocktail Bar,Coffee Shop,Japanese Restaurant,Shopping Mall,Bakery,Dessert Shop,0.0,0.0
Asian Civilisations Museum,Gym / Fitness Center,Italian Restaurant,Cocktail Bar,Bar,Yoga Studio,Japanese Restaurant,Concert Hall,Chinese Restaurant,Performing Arts Venue,Salad Place,1.287446,103.851486
Aura,Cocktail Bar,French Restaurant,Hotel,Art Gallery,Coffee Shop,Concert Hall,Shopping Mall,Music Venue,Chinese Restaurant,Monument / Landmark,1.290746,103.851993
Barbershop By Timbre,Bar,Cocktail Bar,Japanese Restaurant,Italian Restaurant,Gym / Fitness Center,Yoga Studio,Salad Place,Concert Hall,Waterfront,Bridge,0.0,0.0
Braci,Japanese Restaurant,Café,Gym / Fitness Center,Cocktail Bar,Bar,Hotel,Yoga Studio,Lounge,Italian Restaurant,Chinese Restaurant,0.0,0.0
Capitol Piazza,French Restaurant,Hotel,Cocktail Bar,Chinese Restaurant,Art Gallery,Asian Restaurant,Shopping Mall,Event Space,Bookstore,Italian Restaurant,1.29302,103.850981
Capitol Theatre,French Restaurant,Hotel,Cocktail Bar,Shopping Mall,Chinese Restaurant,Asian Restaurant,Dumpling Restaurant,Japanese Restaurant,Event Space,Coffee Shop,1.293506,103.851208
Cavenagh Bridge,Gym / Fitness Center,Cocktail Bar,Bar,Italian Restaurant,Japanese Restaurant,Yoga Studio,Concert Hall,Salad Place,Chinese Restaurant,Music Venue,1.28681,103.852284


## Cluster Neighborhoods

In [74]:
# set number of clusters
kclusters = 5

sgp_grouped_clustering = sgp_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(sgp_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 1, 0, 3, 4, 3, 1, 0, 0, 3], dtype=int32)
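
The notebook fixes `kclusters = 5` without showing how that value was chosen. One common heuristic is to compare silhouette scores across candidate values of k. The sketch below uses synthetic data (three artificial blobs, not the Singapore venue frequencies), so the k it picks says nothing about the real dataset; it only illustrates the procedure.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.RandomState(0)
# synthetic stand-in for the feature matrix: three tight, well-separated blobs
X = np.vstack([rng.normal(loc=c, scale=0.05, size=(20, 4)) for c in (0.0, 1.0, 2.0)])

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # closer to 1 means tighter, better-separated clusters

best_k = max(scores, key=scores.get)
print(best_k)
```

On the real data, `X` would be `sgp_grouped_clustering`, and the scores would be expected to vary far less sharply than on these artificial blobs.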

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.


In [91]:
# add clustering labels
#neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# sgp_merged refers to the same object as neighborhoods_venues_sorted (not a copy)
sgp_merged = neighborhoods_venues_sorted
#neighborhoods_venues_sorted.head()
sgp_merged.shape # check the last columns!


(78, 13)

In [81]:
town_venues_sorted = pd.DataFrame(columns=columns)
town_venues_sorted['Neighborhood'] = sgp_grouped['Neighborhood']

for ind in np.arange(sgp_grouped.shape[0]):
    town_venues_sorted.iloc[ind, 1:] = return_most_common_venues(sgp_grouped.iloc[ind, :], num_top_venues)

print(town_venues_sorted.shape)
town_venues_sorted.head()

(78, 11)


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,4Fingers Crispy Chicken,Hotel,Shopping Mall,Hotel Bar,Buffet,Event Space,Performing Arts Venue,Steakhouse,Japanese Restaurant,Dessert Shop,Coffee Shop
1,Ah Sam Cold Drink Stall,Japanese Restaurant,Café,Gym / Fitness Center,Italian Restaurant,Cocktail Bar,Bar,Hotel,Yoga Studio,Lounge,Salad Place
2,Anti:dote,Hotel,Café,French Restaurant,Chinese Restaurant,Cocktail Bar,Coffee Shop,Japanese Restaurant,Shopping Mall,Bakery,Dessert Shop
3,Asian Civilisations Museum,Gym / Fitness Center,Italian Restaurant,Cocktail Bar,Bar,Yoga Studio,Japanese Restaurant,Concert Hall,Chinese Restaurant,Performing Arts Venue,Salad Place
4,Aura,Cocktail Bar,French Restaurant,Hotel,Art Gallery,Coffee Shop,Concert Hall,Shopping Mall,Music Venue,Chinese Restaurant,Monument / Landmark
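
The row-by-row loop that fills this table can also be expressed with a single `argsort` over the frequency matrix. This is a sketch on a made-up frame (the neighborhoods and frequencies are illustrative), and ties may be ordered differently than with `sort_values`.

```python
import numpy as np
import pandas as pd

# toy stand-in for sgp_grouped: one row per neighborhood, one column per venue category
toy = pd.DataFrame({
    'Neighborhood': ['A', 'B'],
    'Bar': [0.10, 0.40],
    'Café': [0.50, 0.10],
    'Hotel': [0.30, 0.20],
    'Park': [0.10, 0.30],
})
top_n = 2

freqs = toy.drop(columns='Neighborhood')
# negate so that ascending argsort yields descending frequencies
order = np.argsort(-freqs.values, axis=1, kind='stable')[:, :top_n]
top = pd.DataFrame(
    freqs.columns.values[order],
    index=toy['Neighborhood'],
    columns=['1st Most Common Venue', '2nd Most Common Venue'],
)
print(top)
```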


Run k-means to cluster the towns into 5 clusters.

In [104]:
# set number of clusters
kclusters = 5
sgp_grouped_clustering = sgp_grouped.drop('Neighborhood', axis=1)
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=1).fit(sgp_grouped_clustering)

# check cluster labels generated for each row in the dataframe
print(kmeans.labels_[0:10])
print(len(kmeans.labels_))

[0 2 3 1 4 1 2 3 3 1]
78
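
The first ten labels printed here differ from those of the earlier run with `random_state=0`, but k-means label numbers are arbitrary. A quick cross-tabulation, sketched below using only the ten labels shown in each output, suggests that the two runs group these rows identically and differ only in how the clusters are numbered.

```python
import pandas as pd

labels_seed0 = [2, 1, 0, 3, 4, 3, 1, 0, 0, 3]  # first ten labels, random_state=0 run
labels_seed1 = [0, 2, 3, 1, 4, 1, 2, 3, 3, 1]  # first ten labels, random_state=1 run

table = pd.crosstab(pd.Series(labels_seed0, name='seed0'),
                    pd.Series(labels_seed1, name='seed1'))

# identical partitions leave exactly one non-zero cell in every row and column
nonzero = table > 0
same_partition = bool((nonzero.sum(axis=0) == 1).all() and (nonzero.sum(axis=1) == 1).all())

# mapping from old label to new label, read off the non-zero cells
mapping = {int(old): int(nonzero.loc[old].idxmax()) for old in table.index}
print(same_partition, mapping)
```

This only covers the ten rows that were printed; checking all 78 would require the full label arrays from both runs.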


In [99]:
town_venues_sorted.head()

Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4Fingers Crispy Chicken,Hotel,Shopping Mall,Hotel Bar,Buffet,Event Space,Performing Arts Venue,Steakhouse,Japanese Restaurant,Dessert Shop,Coffee Shop
Ah Sam Cold Drink Stall,Japanese Restaurant,Café,Gym / Fitness Center,Italian Restaurant,Cocktail Bar,Bar,Hotel,Yoga Studio,Lounge,Salad Place
Anti:dote,Hotel,Café,French Restaurant,Chinese Restaurant,Cocktail Bar,Coffee Shop,Japanese Restaurant,Shopping Mall,Bakery,Dessert Shop
Asian Civilisations Museum,Gym / Fitness Center,Italian Restaurant,Cocktail Bar,Bar,Yoga Studio,Japanese Restaurant,Concert Hall,Chinese Restaurant,Performing Arts Venue,Salad Place
Aura,Cocktail Bar,French Restaurant,Hotel,Art Gallery,Coffee Shop,Concert Hall,Shopping Mall,Music Venue,Chinese Restaurant,Monument / Landmark
Barbershop By Timbre,Bar,Cocktail Bar,Japanese Restaurant,Italian Restaurant,Gym / Fitness Center,Yoga Studio,Salad Place,Concert Hall,Waterfront,Bridge
Braci,Japanese Restaurant,Café,Gym / Fitness Center,Cocktail Bar,Bar,Hotel,Yoga Studio,Lounge,Italian Restaurant,Chinese Restaurant
Capitol Piazza,French Restaurant,Hotel,Cocktail Bar,Chinese Restaurant,Art Gallery,Asian Restaurant,Shopping Mall,Event Space,Bookstore,Italian Restaurant
Capitol Theatre,French Restaurant,Hotel,Cocktail Bar,Shopping Mall,Chinese Restaurant,Asian Restaurant,Dumpling Restaurant,Japanese Restaurant,Event Space,Coffee Shop
Cavenagh Bridge,Gym / Fitness Center,Cocktail Bar,Bar,Italian Restaurant,Japanese Restaurant,Yoga Studio,Concert Hall,Salad Place,Chinese Restaurant,Music Venue


In [113]:
#town_venues_sorted = town_venues_sorted.set_index('Neighborhood')
#sgp_merged = sgp_merged.set_index('Neighborhood')
# add clustering labels
sgp_merged['Cluster Labels'] = kmeans.labels_
# (disabled) join with town_venues_sorted; sgp_merged already carries latitude/longitude for each neighborhood
#sgp_merged = sgp_merged.join(town_venues_sorted)
sgp_merged


Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Latitude,Longitude,Cluster Labels
4Fingers Crispy Chicken,Hotel,Shopping Mall,Hotel Bar,Buffet,Event Space,Performing Arts Venue,Steakhouse,Japanese Restaurant,Dessert Shop,Coffee Shop,1.350203,103.848276,0
Ah Sam Cold Drink Stall,Japanese Restaurant,Café,Gym / Fitness Center,Italian Restaurant,Cocktail Bar,Bar,Hotel,Yoga Studio,Lounge,Salad Place,0.0,0.0,2
Anti:dote,Hotel,Café,French Restaurant,Chinese Restaurant,Cocktail Bar,Coffee Shop,Japanese Restaurant,Shopping Mall,Bakery,Dessert Shop,0.0,0.0,3
Asian Civilisations Museum,Gym / Fitness Center,Italian Restaurant,Cocktail Bar,Bar,Yoga Studio,Japanese Restaurant,Concert Hall,Chinese Restaurant,Performing Arts Venue,Salad Place,1.287446,103.851486,1
Aura,Cocktail Bar,French Restaurant,Hotel,Art Gallery,Coffee Shop,Concert Hall,Shopping Mall,Music Venue,Chinese Restaurant,Monument / Landmark,1.290746,103.851993,4
Barbershop By Timbre,Bar,Cocktail Bar,Japanese Restaurant,Italian Restaurant,Gym / Fitness Center,Yoga Studio,Salad Place,Concert Hall,Waterfront,Bridge,0.0,0.0,1
Braci,Japanese Restaurant,Café,Gym / Fitness Center,Cocktail Bar,Bar,Hotel,Yoga Studio,Lounge,Italian Restaurant,Chinese Restaurant,0.0,0.0,2
Capitol Piazza,French Restaurant,Hotel,Cocktail Bar,Chinese Restaurant,Art Gallery,Asian Restaurant,Shopping Mall,Event Space,Bookstore,Italian Restaurant,1.29302,103.850981,3
Capitol Theatre,French Restaurant,Hotel,Cocktail Bar,Shopping Mall,Chinese Restaurant,Asian Restaurant,Dumpling Restaurant,Japanese Restaurant,Event Space,Coffee Shop,1.293506,103.851208,3
Cavenagh Bridge,Gym / Fitness Center,Cocktail Bar,Bar,Italian Restaurant,Japanese Restaurant,Yoga Studio,Concert Hall,Salad Place,Chinese Restaurant,Music Venue,1.28681,103.852284,1
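
To get a feel for what each cluster represents, the labelled table can be summarised per cluster. The sketch below uses only the ten rows displayed above (not all 78), so the counts are partial; the column names `n_neighborhoods` and `top_venue` are chosen here for illustration.

```python
import pandas as pd

# the ten rows displayed above: (neighborhood, 1st most common venue, cluster label)
rows = [
    ('4Fingers Crispy Chicken', 'Hotel', 0),
    ('Ah Sam Cold Drink Stall', 'Japanese Restaurant', 2),
    ('Anti:dote', 'Hotel', 3),
    ('Asian Civilisations Museum', 'Gym / Fitness Center', 1),
    ('Aura', 'Cocktail Bar', 4),
    ('Barbershop By Timbre', 'Bar', 1),
    ('Braci', 'Japanese Restaurant', 2),
    ('Capitol Piazza', 'French Restaurant', 3),
    ('Capitol Theatre', 'French Restaurant', 3),
    ('Cavenagh Bridge', 'Gym / Fitness Center', 1),
]
sample = pd.DataFrame(rows, columns=['Neighborhood', '1st Most Common Venue', 'Cluster Labels'])

# per cluster: how many neighborhoods, and which venue most often ranks first
summary = sample.groupby('Cluster Labels').agg(
    n_neighborhoods=('Neighborhood', 'count'),
    top_venue=('1st Most Common Venue', lambda s: s.mode().iat[0]),
)
print(summary)
```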


## Visualising the findings / clusters on the Map

In [114]:
map_clusters = folium.Map(location=[latitude, longitude], tiles="Openstreetmap", zoom_start=11)

# set color scheme for the clusters: one color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(sgp_merged['Latitude'], sgp_merged['Longitude'], sgp_merged.index.values, kmeans.labels_):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=10,
        popup=label,
        color=rainbow[cluster],  # labels run from 0 to kclusters-1, so index directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=1).add_to(map_clusters)
       
map_clusters
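
Rows whose geocoding failed appear with 0.0 coordinates in the merged table, and a marker at (0, 0) lies far from Singapore. A minimal sketch of filtering such rows before plotting follows, assuming missing coordinates are stored either as NaN or as 0.0; `plottable` is a hypothetical name, and the last sample row is invented to show the NaN case.

```python
import numpy as np
import pandas as pd

# three rows copied from the merged table, plus one invented row with NaN coordinates
sample = pd.DataFrame(
    {'Latitude': [1.350203, 0.0, 1.290746, np.nan],
     'Longitude': [103.848276, 0.0, 103.851993, np.nan],
     'Cluster Labels': [0, 2, 4, 1]},
    index=['4Fingers Crispy Chicken', 'Ah Sam Cold Drink Stall', 'Aura', 'Hypothetical NaN row'],
)

has_coords = sample[['Latitude', 'Longitude']].notna().all(axis=1)
not_origin = ~((sample['Latitude'] == 0) & (sample['Longitude'] == 0))
plottable = sample[has_coords & not_origin]
print(plottable.index.tolist())
```

The marker loop could then iterate over `plottable` and its `'Cluster Labels'` column instead of the full frame, which keeps coordinates and labels aligned after filtering.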