# MIPANSUSUSU 

## Contents
- [Packages](#section2)
- [Datasets Cleaning](#section1)
    - [HDB Resale 2020](#subsection2.1)
    - [MRT Coordinates](#subsection2.2)
    - [Import All Other Relevant Datasets](#subsection2.3)
- [General Functions](#section3)

## Packages<a id="section2"></a>

In [1]:
import json
import requests
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import urllib  

## Datasets Cleaning<a id="section1"></a>

We begin by cleaning the raw datasets: consolidating related data and preparing it for later visualisation and processing.

<div class="alert alert-block alert-success">
<b>List of Key Datasets:</b> 
The key datasets we will be using are summarised below (with links to the online data files where applicable):
<ul>
  <li><b><a href="https://data.gov.sg/dataset/resale-flat-prices">HDB Resale Data</a></b>: Resale price, town, blk, street_name, postal code, longitude and latitude of HDB blocks</li>
  <li><b>Malls</b>: years of education</li>
  <li><b>MRT</b>: station name, station number, latitude, longitude and line colour of MRT and LRT stations</li>
  <li><b>Buses</b>: genders of these working individuals</li>
</ul>
</div> 

The <a href="https://docs.onemap.sg/#search">OneMap API</a> was used to map each location to its latitude and longitude.

Querying the OneMap API for geolocation data is slow, so we have pre-run the queries and exported the results to CSV files for later processing and visualisation. The variable below is set to `False` so that those queries are not rerun when the kernel is restarted.

In [2]:
rerun = False  # set to True to re-query the OneMap API
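
A minimal sketch of how a flag like `rerun` can gate a slow query step is shown below. The helper `load_or_query`, its parameters and the `query_fn` callback are hypothetical names introduced for illustration; they are not part of the notebook or of the OneMap API. The sketch uses the standard `csv` module to stay dependency-free, and the same pattern works with `pd.read_csv` and `DataFrame.to_csv`.

```python
import csv


def load_or_query(csv_path, rerun, query_fn, fieldnames):
    """Return rows as a list of dicts, running the slow query only when rerun is True.

    query_fn is a hypothetical callback that performs the slow step
    (for example, geocoding requests) and returns a list of dicts.
    """
    if rerun:
        rows = query_fn()
        # Cache the fresh results so later runs can skip the query
        with open(csv_path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        return rows
    # rerun is False: read the previously exported file instead of querying
    # (values read back from CSV are strings)
    with open(csv_path, newline="") as f:
        return list(csv.DictReader(f))
```

With `rerun = False`, the function only reads the exported file, so restarting the kernel does not trigger any network requests.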

### HDB Resale 2020<a id="subsection2.1"></a>

In [11]:
HDB_coordinates = pd.read_csv("raw_HDB_coordinates.csv")
HDB_resale_2020_raw = pd.read_csv("raw_resale_flat_prices_2017_2020.csv")

# Inner merge on town and block: resale rows without matching coordinates are dropped
HDB_resale_2020 = HDB_resale_2020_raw.merge(HDB_coordinates, how='inner', on=['town','block'])

# Export the full HDB dataframe, with latitude and longitude, to CSV
HDB_resale_2020.to_csv('clean_HDBresale2020.csv')

HDB_resale_2020

| Unnamed: 0 | month | town | flat_type | block | STREETreet_name | STREETorey_range | floor_area_sqm | flat_model | lease_commence_date | remaining_lease | resale_price | street_name | lat | long |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017-01 | ANG MO KIO | 2 ROOM | 406 | ANG MO KIO AVENUENUE 10 | 10 TO 12 | 44.0 | Improved | 1979 | 61 years 04 months | 232000.0 | ANG MO KIO AVE 10 | 1.362005 | 103.853880 |
| 1 | 2017-05 | ANG MO KIO | 2 ROOM | 406 | ANG MO KIO AVENUE 10 | 10 TO 12 | 44.0 | Improved | 1979 | 61 years 01 month | 235000.0 | ANG MO KIO AVE 10 | 1.362005 | 103.853880 |
| 2 | 2018-03 | ANG MO KIO | 2 ROOM | 406 | ANG MO KIO AVENUE 10 | 01 TO 03 | 44.0 | Improved | 1979 | 60 years 02 months | 202000.0 | ANG MO KIO AVE 10 | 1.362005 | 103.853880 |
| 3 | 2018-03 | ANG MO KIO | 2 ROOM | 406 | ANG MO KIO AVENUE 10 | 01 TO 03 | 44.0 | Improved | 1979 | 60 years 02 months | 210000.0 | ANG MO KIO AVE 10 | 1.362005 | 103.853880 |
| 4 | 2018-05 | ANG MO KIO | 2 ROOM | 406 | ANG MO KIO AVENUE 10 | 07 TO 09 | 44.0 | Improved | 1979 | 60 years 01 month | 220000.0 | ANG MO KIO AVE 10 | 1.362005 | 103.853880 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 74059 | 2020-08 | TOA PAYOH | EXECUTIVE | 240 | LORONG 1 TOA PAYOH | 01 TO 03 | 146.0 | Maisonette | 1986 | 65 years 03 months | 760000.0 | LOR 1 TOA PAYOH | 1.340876 | 103.850830 |
| 74060 | 2020-08 | TOA PAYOH | EXECUTIVE | 101B | LORONG 2 TOA PAYOH | 04 TO 06 | 144.0 | Apartment | 1993 | 71 years 08 months | 901000.0 | LOR 2 TOA PAYOH | 1.339599 | 103.847605 |
| 74061 | 2020-08 | WOODLANDS | EXECUTIVE | 176 | WOODLANDS STREET 13 | 04 TO 06 | 184.0 | Apartment | 1994 | 72 years 05 months | 670888.0 | WOODLANDS ST 13 | 1.433579 | 103.778353 |
| 74062 | 2020-08 | YISHUN | EXECUTIVE | 361 | YISHUN RING ROAD | 01 TO 03 | 145.0 | Maisonette | 1988 | 66 years 10 months | 610000.0 | YISHUN RING RD | 1.428325 | 103.845908 |
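
Because the merge above uses `how='inner'`, any resale row whose `town` and `block` pair is missing from the coordinates table is silently dropped. One way to check for this, sketched here on toy data, is a left merge with `indicator=True`. The two small dataframes are illustrative stand-ins, and block `999` is a made-up entry added only to produce an unmatched row.

```python
import pandas as pd

# Toy stand-ins for the resale and coordinate tables (block 999 is made up)
resale = pd.DataFrame({
    "town": ["ANG MO KIO", "ANG MO KIO", "YISHUN"],
    "block": ["406", "999", "361"],
    "resale_price": [232000.0, 300000.0, 610000.0],
})
coords = pd.DataFrame({
    "town": ["ANG MO KIO", "YISHUN"],
    "block": ["406", "361"],
    "lat": [1.362005, 1.428325],
    "long": [103.853880, 103.845908],
})

# A left merge keeps every resale row; the _merge column records its match status
checked = resale.merge(coords, how="left", on=["town", "block"], indicator=True)

# Rows marked left_only have no coordinates and would vanish in an inner merge
unmatched = checked[checked["_merge"] == "left_only"]
print(f"{len(unmatched)} of {len(resale)} resale rows have no coordinates")
```

If the unmatched count is non-zero on the real data, the affected blocks can be inspected before deciding whether to geocode them or drop them.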


### MRT Coordinates<a id="subsection2.2"></a>

In [8]:
MRT_coords = pd.read_csv("clean_MRT.csv")

MRT_coords

| Unnamed: 0 | OBJECTID | STN_NAME | STN_NO | X | Y | Latitude | Longitude | COLOR |
|---|---|---|---|---|---|---|---|---|
| 0 | 12 | ADMIRALTY MRT STATION | NS10 | 24402.1063 | 46918.1131 | 1.440585 | 103.800998 | RED |
| 1 | 16 | ALJUNIED MRT STATION | EW9 | 33518.6049 | 33190.0020 | 1.316433 | 103.882893 | GREEN |
| 2 | 33 | ANG MO KIO MRT STATION | NS16 | 29807.2655 | 39105.7720 | 1.369933 | 103.849553 | RED |
| 3 | 81 | BAKAU LRT STATION | SE3 | 36026.0821 | 41113.8766 | 1.388093 | 103.905418 | OTHERS |
| 4 | 80 | BANGKIT LRT STATION | BP9 | 21248.2460 | 40220.9693 | 1.380018 | 103.772667 | OTHERS |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 182 | 175 | WOODLANDS SOUTH MRT STATION | TE3 | 23607.8309 | 45444.7113 | 1.427260 | 103.793863 | OTHERS |
| 183 | 146 | WOODLEIGH MRT STATION | NE11 | 32173.3186 | 35706.3794 | 1.339190 | 103.870808 | PURPLE |
| 184 | 6 | YEW TEE MRT STATION | NS5 | 18438.9791 | 42158.0124 | 1.397535 | 103.747431 | RED |
| 185 | 41 | YIO CHU KANG MRT STATION | NS15 | 29294.1283 | 40413.0820 | 1.381756 | 103.844944 | RED |


### Import All Other Relevant Datasets<a id="subsection2.3"></a>

## General Functions<a id="section3"></a>

In [None]:
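from math import radians, sin, cos, atan2, sqrt

# Calculate the haversine (great-circle) distance in km between two (lat, long) points in degrees
def distance(x1, y1, x2, y2):
    R = 6373.0  # approximate radius of the Earth in km
    # Convert all coordinates from degrees to radians
    lat1 = radians(x1)
    lon1 = radians(y1)
    lat2 = radians(x2)
    lon2 = radians(y2)
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    # Haversine formula
    a = (sin(dlat/2))**2 + cos(lat1)*cos(lat2)*(sin(dlon/2))**2
    c = 2 * atan2(sqrt(a), sqrt(1-a))
    dist = R * c
    return dist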
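
As a usage sketch, the distance function can be used to find the MRT station closest to an HDB block. The block below repeats the function so that it runs on its own. The helper `nearest_station` and the small `stations` dictionary are hypothetical additions for illustration, with coordinates copied from the HDB and MRT tables above.

```python
from math import radians, sin, cos, atan2, sqrt


def distance(x1, y1, x2, y2):
    """Haversine distance in km between (lat, long) points given in degrees."""
    R = 6373.0  # approximate radius of the Earth in km
    lat1, lon1, lat2, lon2 = map(radians, (x1, y1, x2, y2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return R * 2 * atan2(sqrt(a), sqrt(1 - a))


def nearest_station(lat, lon, stations):
    """Return (name, km) for the station closest to the given point."""
    name = min(stations, key=lambda n: distance(lat, lon, *stations[n]))
    return name, distance(lat, lon, *stations[name])


# A few stations from the MRT table, as name -> (Latitude, Longitude)
stations = {
    "ANG MO KIO MRT STATION": (1.369933, 103.849553),
    "ADMIRALTY MRT STATION": (1.440585, 103.800998),
    "YIO CHU KANG MRT STATION": (1.381756, 103.844944),
}

# Block 406 in ANG MO KIO, from the HDB resale table
hdb_lat, hdb_long = 1.362005, 103.853880

nearest_name, nearest_km = nearest_station(hdb_lat, hdb_long, stations)
print(nearest_name, round(nearest_km, 2))
```

For this block the closest of the three sample stations is Ang Mo Kio, at roughly 1 km. Applied to the full dataframes, the same comparison would run once per HDB block against all stations in `MRT_coords`.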