# CAPSTONE PROJECT: BATTLE OF THE NEIGHBORHOODS

Singapore Visitors Venue Recommendation

I. PURPOSE

This document provides the details of my final peer-reviewed assignment for the IBM Data Science Professional Certificate program – Coursera Capstone.

II. INTRODUCTION

Singapore is a small country, but it has become one of the most popular travel destinations in Southeast Asia. Many websites, such as Airbnb and Booking, let travelers check and retrieve recommendations of places to stay or visit. However, most of these websites base their recommendations on the usual tourist attractions or key residential areas, which are mostly expensive or already known to travelers, and on keyword searches such as "Hotel" or "Backpackers".

The intention of this project is to collect data from Singapore open data sources and from FourSquare API venue recommendations, and to provide data-driven recommendations that supplement existing ones with statistical data.

The sample recommender system addresses the following use case: a person is planning to visit Singapore as a tourist and is looking for reasonably priced accommodation. The person wants to receive venue recommendations for where he/she can stay or rent an HDB apartment in close proximity to places of interest or to a chosen search category. The recommendation should present not only the most viable option, but also a comparison table of all possible town venues.

The data used will include:
1. Singapore median rental prices by town
2. Popular food venues in the vicinity
3. Food venue categories
4. Outdoors and Recreation
5. Nightlife
6. Nearby schools

III. DATA

This demonstration will make use of the following data sources:

1. Singapore towns and median residential rental prices. Data will be retrieved from the Singapore open dataset of median rent by town and flat type on the https://data.gov.sg website. The original data source contains median rental prices of Singapore HDB units from 2005 up to the second quarter of 2018. For this demonstration, I will simplify the analysis by using the average rental price across all available flat types.
2. Singapore town location data, retrieved using the Google Maps API. The coordinates of the town venues will be retrieved using the Google API, with the MRT station coordinates used as the center point of each town included in the venue recommendations.
3. Singapore top venue recommendations from the FourSquare API (FourSquare website: www.foursquare.com), used to explore neighborhoods in selected towns in Singapore. The Foursquare explore function will be used to get the most common venue categories in each neighborhood, and this feature will then be used to group the neighborhoods into clusters. The following information is retrieved by the first query:
    1. Venue ID
    2. Venue Name
    3. Coordinates: Latitude and Longitude
    4. Category Name
4. Venue ratings for each location, retrieved with a second venue query.

IV. METHODOLOGY

1. Singapore Towns List with median residential rental prices.

The source data contains median rental prices of Singapore HDB units from 2005 up to the second quarter of 2018. I will retrieve the most recent recorded rental prices from this data source (Q2 2018), as these are the most relevant prices available at this time. For this demonstration, I will simplify the analysis by using the average rental price across all available flat types.

Data cleanup and regrouping. The retrieved table contains some unwanted entries and needs some cleanup. The following tasks will be performed:
1. Drop/ignore cells with missing data.
2. Use the most current data record.
3. Fix data types.
4. Add the geographical coordinates of each town location.

Post-processed Singapore towns list with median residential rental prices
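Cleanup tasks 1 to 3 can be sketched in plain Python. This is a minimal sketch, not the notebook's code. It assumes the data.gov.sg CSV has the columns `quarter`, `town`, `flat_type` and `median_rent`, that quarters are written like `2018-Q2`, and that missing rents appear as `na` or `-`. These column names and markers are assumptions, so check them against the downloaded file. Task 4, adding coordinates, needs a separate API lookup and is left out here.

```python
import csv
import io
from collections import defaultdict

# Assumed spellings of "no data" in the downloaded CSV; check the real file.
MISSING = {"", "na", "-"}


def average_rent_by_town(rows):
    """Return {town: mean of the median rents} for the most recent quarter.

    `rows` is an iterable of dicts, e.g. what csv.DictReader yields.
    Column names quarter/town/flat_type/median_rent are assumptions.
    """
    # Task 1: drop rows whose rent is missing.
    valid = [r for r in rows if r["median_rent"].strip().lower() not in MISSING]
    if not valid:
        return {}
    # Task 2: keep the most recent quarter ("YYYY-Qn" strings sort chronologically).
    latest = max(r["quarter"] for r in valid)
    rents = defaultdict(list)
    for r in valid:
        if r["quarter"] == latest:
            # Task 3: fix data types, since CSV values arrive as strings.
            rents[r["town"]].append(float(r["median_rent"]))
    # Average across all available flat types in each town.
    return {town: sum(v) / len(v) for town, v in rents.items()}


# Made-up rows for illustration only, not real HDB figures.
SAMPLE = """quarter,town,flat_type,median_rent
2018-Q1,TOWN A,4-RM,2100
2018-Q2,TOWN A,3-RM,1800
2018-Q2,TOWN A,4-RM,2200
2018-Q2,TOWN B,3-RM,na
2018-Q2,TOWN B,4-RM,2000
"""
result = average_rent_by_town(csv.DictReader(io.StringIO(SAMPLE)))
print(result)  # {'TOWN A': 2000.0, 'TOWN B': 2000.0}
```

The Q1 row is ignored because only the latest quarter is kept. The `na` row for TOWN B is dropped before averaging.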

2. Retrieve town coordinates.

The Google API was used to retrieve the coordinates (latitude and longitude) of each town center. For this exercise, I used the MRT stations as the center points of the evaluated towns. The town coordinates are then used to retrieve location data from the Foursquare API.
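The section does not say which Google API endpoint or query text was used. The sketch below assumes the Geocoding API's JSON endpoint and a query such as "Bishan MRT Station, Singapore". The helper names are hypothetical. Only the URL builder and the response parser are exercised offline, because `geocode` needs network access and a valid API key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def geocode_url(place, api_key):
    """Build a Geocoding API request URL for a place name."""
    return GEOCODE_URL + "?" + urlencode({"address": place, "key": api_key})


def parse_location(payload):
    """Return (lat, lng) of the first result, or None if nothing was found."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    loc = payload["results"][0]["geometry"]["location"]
    return loc["lat"], loc["lng"]


def geocode(place, api_key):
    """Fetch and parse coordinates (makes a network call; needs a valid key)."""
    with urlopen(geocode_url(place, api_key)) as resp:
        return parse_location(json.load(resp))


# Offline check with a trimmed, hand-written response of the documented shape.
fake = {"status": "OK",
        "results": [{"geometry": {"location": {"lat": 1.35, "lng": 103.85}}}]}
print(parse_location(fake))  # (1.35, 103.85)
print(geocode_url("Bishan MRT Station, Singapore", "YOUR_KEY"))
```

Returning `None` on a non-`OK` status keeps a missing town from breaking the loop over all towns. Those towns can then be dropped or fixed by hand.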

V. SEGMENTING AND CLUSTERING TOWNS IN SINGAPORE

Retrieving FourSquare Places of interest.

Using the Foursquare API, the explore function was used to get the most common venue categories in each neighborhood, and this feature was then used to group the neighborhoods into clusters. The k-means clustering algorithm was used for the analysis. Finally, the Folium library was used to visualize the recommended neighborhoods and their emerging clusters.

In the ipynb notebook, the function getNearbyVenues extracts the following information for the data frame it generates:

1. Venue ID
2. Venue Name
3. Coordinates: Latitude and Longitude
4. Category Name
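The extraction above can be sketched as a function that flattens one explore response into rows. This is not the notebook's getNearbyVenues. It assumes the legacy Foursquare v2 `venues/explore` response shape (`response` → `groups` → `items` → `venue`). The radius, limit and version date are assumed values. Passing the resulting rows to `pandas.DataFrame` would give a data frame with these columns.

```python
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def explore_url(lat, lng, client_id, client_secret,
                radius=500, limit=100, version="20180605"):
    """Build a legacy v2 explore request around one town centre.

    radius, limit and the version date are assumed values, not the notebook's.
    """
    params = {"client_id": client_id, "client_secret": client_secret,
              "v": version, "ll": f"{lat},{lng}",
              "radius": radius, "limit": limit}
    return f"{EXPLORE_URL}?{urlencode(params)}"


def flatten_explore(town, lat, lng, payload):
    """Flatten one explore response into rows: venue ID, name, coordinates, category."""
    rows = []
    for group in payload["response"].get("groups", []):
        for item in group.get("items", []):
            venue = item["venue"]
            categories = venue.get("categories") or []
            rows.append({
                "Town": town,
                "Town Latitude": lat,
                "Town Longitude": lng,
                "Venue ID": venue["id"],
                "Venue Name": venue["name"],
                "Venue Latitude": venue["location"]["lat"],
                "Venue Longitude": venue["location"]["lng"],
                "Venue Category": categories[0]["name"] if categories else None,
            })
    return rows


# Hand-written response with the nesting the v2 explore endpoint used.
sample = {"response": {"groups": [{"items": [{"venue": {
    "id": "v1", "name": "Sample Food Court",
    "location": {"lat": 1.30, "lng": 103.90},
    "categories": [{"name": "Food Court"}]}}]}]}}
rows = flatten_explore("TOWN A", 1.30, 103.90, sample)
print(rows[0]["Venue Category"])  # Food Court
```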

The function getVenuesByCategory performs the following:
1. A category-based venue search that simulates user venue searches for certain places of interest. This search extracts the following information:
    1. Venue ID
    2. Venue Name
    3. Coordinates: Latitude and Longitude
    4. Category Name
2. For each retrieved venue ID, a query that retrieves the venue's rating.
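The two steps of getVenuesByCategory can be sketched as a keyword search followed by one venue-details lookup per venue ID. This is a hypothetical sketch assuming the legacy Foursquare v2 `venues/search` and venue-details endpoints. It assumes the search uses the `query` parameter, and the notebook may have filtered by category differently. Foursquare has since moved new development to its newer Places API, so check which endpoints an account can still call.

```python
from urllib.parse import urlencode

API = "https://api.foursquare.com/v2/venues"


def search_url(lat, lng, query, creds, radius=1000, limit=50):
    """Legacy v2 venue search near a town centre (radius/limit are assumptions)."""
    params = dict(creds, ll=f"{lat},{lng}", query=query,
                  radius=radius, limit=limit)
    return f"{API}/search?{urlencode(params)}"


def details_url(venue_id, creds):
    """Legacy v2 venue-details request, which is where a rating appears, if any."""
    return f"{API}/{venue_id}?{urlencode(creds)}"


def parse_search(payload):
    """(venue ID, name, lat, lng, category) tuples from a search response."""
    rows = []
    for v in payload["response"].get("venues", []):
        cats = v.get("categories") or []
        rows.append((v["id"], v["name"], v["location"]["lat"],
                     v["location"]["lng"], cats[0]["name"] if cats else None))
    return rows


def parse_rating(payload):
    """The venue's rating as a float, or None when no rating is stored."""
    rating = payload["response"].get("venue", {}).get("rating")
    return None if rating is None else float(rating)


# Placeholder credentials; "v" is an assumed API version date.
creds = {"client_id": "YOUR_ID", "client_secret": "YOUR_SECRET", "v": "20180605"}
print(search_url(1.30, 103.90, "food", creds))
```

Keeping `parse_rating` separate from the network call makes it easy to record `None` for unrated venues. Those venues are then removed during cleanup.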

The data frame generated by the second function contains the following columns:

Search venues with recommendations on: Food Venues (restaurants, fast food, etc.)
To demonstrate user selection of places of interest, we will use this Food Venues category in the rest of the analysis.
This Foursquare search is expected to collect venues in the following categories:
1. Food Courts
2. Coffee Shops
3. Restaurants
4. Cafés
5. Other food venues
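To summarize results by these groups, each Foursquare category name has to be mapped onto one of them. The rules below are hypothetical, since the section does not say how the grouping was done. They assume exact names such as "Food Court" and "Coffee Shop", and treat any category ending in "Restaurant" as a restaurant.

```python
def food_bucket(category):
    """Map a Foursquare category name onto the food groups (hypothetical rules)."""
    name = (category or "").strip().lower()
    if name == "food court":
        return "Food Courts"
    if name == "coffee shop":
        return "Coffee Shops"
    if name in {"café", "cafe"}:
        return "Cafés"
    if name.endswith("restaurant"):  # e.g. "Fast Food Restaurant"
        return "Restaurants"
    return "Other food venues"


print(food_bucket("Fast Food Restaurant"))  # Restaurants
```

Under these rules, fast food outlets fall into "Restaurants". A separate fast food group would need its own rule placed before the `endswith` check.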

I used the FourSquare API to retrieve the venue scores (ratings) of locations. Note that the free FourSquare API subscription has a maximum query limit of 50, so use your queries carefully.
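One way to stay within a small query allowance is to cache every response by venue ID and stop before the limit is reached. The class below is a hypothetical helper. The limit of 50 mirrors the figure above, and the period the limit applies to is not stated here. The `fetch` callable stands in for whatever function performs the real request.

```python
class QueryBudget:
    """Cache responses by key and refuse to exceed a fixed number of API calls.

    `fetch` is any callable that performs the real request (hypothetical here).
    """

    def __init__(self, fetch, limit=50):
        self.fetch = fetch
        self.limit = limit
        self.used = 0
        self.cache = {}

    def get(self, key):
        if key in self.cache:  # repeated venue IDs cost nothing
            return self.cache[key]
        if self.used >= self.limit:
            raise RuntimeError(f"query budget of {self.limit} exhausted")
        self.used += 1
        self.cache[key] = self.fetch(key)
        return self.cache[key]


# Stand-in fetcher so the sketch runs offline.
budget = QueryBudget(lambda venue_id: {"id": venue_id}, limit=2)
for venue_id in ["a", "a", "b"]:
    budget.get(venue_id)
print(budget.used)  # 2
```

Saving `budget.cache` to disk between notebook sessions (for example with `json.dump`) would also avoid spending queries again after a kernel restart.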

Data cleanup: removing unneeded entries
1. Eliminate possible venue duplicates.
2. Improve the quality of the venue selection by removing venues with no rating or a rating of 0.0.
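Both cleanup steps can be sketched with plain dictionaries. When two town centers are close together, the same venue can come back for both. This sketch keeps the first occurrence of each venue ID, which is one possible policy and not necessarily the notebook's. It then drops venues whose rating is missing or zero. The `id` and `rating` keys are assumed field names.

```python
def clean_venues(venues):
    """Drop duplicate venue IDs (first one wins) and venues rated None or 0.0."""
    seen = set()
    kept = []
    for v in venues:
        if v["id"] in seen:
            continue  # duplicate of a venue already seen
        seen.add(v["id"])
        rating = v.get("rating")
        if rating is None or rating == 0:
            continue  # no usable rating
        kept.append(v)
    return kept


# Made-up records for illustration only.
venues = [
    {"id": "v1", "rating": 8.0},
    {"id": "v1", "rating": 8.0},
    {"id": "v2", "rating": None},
    {"id": "v3", "rating": 0.0},
    {"id": "v4", "rating": 7.2},
]
print([v["id"] for v in clean_venues(venues)])  # ['v1', 'v4']
```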

Analyzing the recommended venues near each Singapore town

Technique: One-Hot Encoding
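One-hot encoding turns each venue's category into a row of 0/1 columns, one per category. Averaging those rows per town then gives the share of each category among that town's venues. In pandas this is usually `pd.get_dummies` followed by `groupby(...).mean()`. The plain-Python sketch below spells out the same two steps, with `town` and `category` as assumed field names.

```python
from collections import defaultdict


def one_hot(rows):
    """One-hot encode the 'category' field; returns (column names, encoded rows)."""
    columns = sorted({r["category"] for r in rows})
    encoded = [[1 if r["category"] == c else 0 for c in columns] for r in rows]
    return columns, encoded


def town_frequencies(rows):
    """Average the one-hot rows per town: share of each category among its venues."""
    columns, encoded = one_hot(rows)
    sums = defaultdict(lambda: [0] * len(columns))
    counts = defaultdict(int)
    for r, vec in zip(rows, encoded):
        counts[r["town"]] += 1
        sums[r["town"]] = [a + b for a, b in zip(sums[r["town"]], vec)]
    return columns, {t: [s / counts[t] for s in sums[t]] for t in sums}


# Made-up venues for illustration only.
rows = [
    {"town": "TOWN A", "category": "Café"},
    {"town": "TOWN A", "category": "Food Court"},
    {"town": "TOWN B", "category": "Food Court"},
]
columns, freqs = town_frequencies(rows)
print(freqs)  # {'TOWN A': [0.5, 0.5], 'TOWN B': [0.0, 1.0]}
```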

Analysis of the most visited venues in each Singapore town
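Given per-town category frequencies, each town's most common venues are the categories with the highest shares. This sketch assumes frequencies shaped like the output of a one-hot step (a column list plus one vector per town). It breaks ties alphabetically and skips categories with a zero share. The number of categories kept, `n`, is an arbitrary choice.

```python
def top_categories(columns, freqs, n=3):
    """Return {town: the n most frequent categories}; ties broken alphabetically."""
    result = {}
    for town, vec in freqs.items():
        ranked = sorted(zip(columns, vec), key=lambda cv: (-cv[1], cv[0]))
        result[town] = [c for c, f in ranked[:n] if f > 0]
    return result


# Made-up frequencies for illustration only.
columns = ["Café", "Coffee Shop", "Food Court"]
freqs = {"TOWN A": [0.25, 0.25, 0.5], "TOWN B": [0.0, 0.4, 0.6]}
print(top_categories(columns, freqs))
# {'TOWN A': ['Food Court', 'Café', 'Coffee Shop'], 'TOWN B': ['Food Court', 'Coffee Shop']}
```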

RESULTS: Categorized Results

RESULTS: k-means Cluster Results
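The clustering code is not shown in this section. The sketch below is a from-scratch version of k-means (Lloyd's algorithm) run on per-town category frequency vectors. Here `k`, the iteration cap and the random seed are arbitrary assumptions. In a notebook, scikit-learn's `KMeans` (with `n_clusters` and `random_state`) is the usual library route.

```python
import math
import random


def nearest(p, centres):
    """Index of the centre closest to point p (Euclidean distance)."""
    return min(range(len(centres)), key=lambda j: math.dist(p, centres[j]))


def kmeans(points, k, iters=100, seed=0):
    """Plain k-means on lists of floats; returns (labels, centres)."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]  # random initial centres
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centre.
        new_labels = [nearest(p, centres) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties out
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


# Made-up frequency vectors (share of Food Court, share of Coffee Shop).
points = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels, centres = kmeans(points, k=2)
print(labels)
```

Cluster numbers depend on the random initialization, so only the grouping is meaningful, not the label values. Different seeds can also yield different groupings on less well-separated data.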

RESULTS: Merged Cluster Table with Rental Prices
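Merging the cluster labels with the rental table is an inner join on the town name. The pandas equivalent is `DataFrame.merge` on the town column. In this sketch, towns without a rent are skipped, and rows are sorted by cluster and then by rent so the cheapest town in each cluster comes first. Field names and sample values are made up for illustration.

```python
def merge_clusters_with_rent(labels_by_town, rent_by_town):
    """Join cluster labels with average rents; towns missing a rent are skipped.

    Returns rows sorted by cluster, then by rent (cheapest first).
    """
    rows = [
        {"town": t, "cluster": c, "avg_rent": rent_by_town[t]}
        for t, c in labels_by_town.items()
        if t in rent_by_town
    ]
    return sorted(rows, key=lambda r: (r["cluster"], r["avg_rent"]))


# Made-up labels and rents for illustration only.
labels_by_town = {"TOWN A": 1, "TOWN B": 0, "TOWN C": 0, "TOWN D": 1}
rent_by_town = {"TOWN A": 2000.0, "TOWN B": 1900.0, "TOWN C": 2500.0}
merged = merge_clusters_with_rent(labels_by_town, rent_by_town)
print([r["town"] for r in merged])  # ['TOWN B', 'TOWN C', 'TOWN A']
```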

VI. DISCUSSION AND CONCLUSION

In this notebook, an analysis of the best town venue recommendations based on the Food venue category has been presented. Recommendations based on other user searches, such as available outdoor and recreation areas, are also available. As Singapore is a small country with a whole host of interesting venues scattered around its towns, the information extracted in this notebook on the town areas can be a good supplement to web-based recommendations, helping visitors find nearby venues of interest and decide where to stay or where to go during their visits.

Using the Foursquare API, we collected a good number of venue recommendations in Singapore towns. Sourcing venue recommendations from FourSquare has its limitations: the list of venues is not an exhaustive list of all the available venues in the area. Furthermore, not all the venues found in the area have a stored rating. For this reason, the number of analyzed venues is only about 50% of all the venues initially collected. The results may therefore change significantly when more information is collected on the venues with missing data.
The generated clusters show that there are very good and interesting places located in areas where the median rents are cheaper. This kind of result may be very interesting for travelers on a budget. Our results also yielded some interesting findings. For instance, the initial assumption among websites providing recommendations is that the Central Area, which has the highest median rent, also has the better food venues. The results, however, show that Marine Parade, a cheaper location, has better-rated food courts. The results also show that the most popular food venues among Singaporeans, residents, and visitors are Food Courts, Coffee Shops, and Fast Food Restaurants. The highest-rated Food Courts are located in Marine Parade and in the Central Area.
In the future, I will provide supplementary inferential statistics on the collected data, as well as an updated notebook that uses other categories. For now, this completes the requirements for this task.