# Singapore Public Housing (HDB) Resale Price Prediction Model (Part 4)
### Feature Engineering - Highway and Ramps Data

## 1. About this Notebook

Noise can be a concern for home buyers when choosing a property to call home. While we cannot control what kind of neighbours we get before moving in, we can choose not to live close to a major highway. Vehicle noise and emissions may be a turn-off for some people, so it is worth finding out whether proximity to a highway has a significant effect on the resale price of an HDB unit.

This notebook is relatively short. It focuses only on extracting information from GeoJSON data and converting it to CSV format for the pandas transformations in the next part. The raw highway data comes from the Singapore national data archive (data.gov.sg), where the national map lines for major expressways and trunk roads are provided as line vectors in GeoJSON format. The dataset can be found at [this link](https://data.gov.sg/dataset/national-map-line?resource_id=de5f4fc2-e04f-4dcf-a02a-2ca6468b1b54).

Part of this notebook was run on Google Colab due to known dependency issues with GeoPandas. The code that relies on the GeoPandas GeoJSON driver has been commented out so that the notebook runs without errors. You may uncomment it if you would like to test it.

## 2. Initialization

In [1]:
# Import JSON-related Libraries
import json
from bs4 import BeautifulSoup

# The commented-out Colab cells call gpd.read_file, which is the GeoPandas API
# import geopandas as gpd

In [2]:
# Load file into dictionary for easier wrangling
with open("./Dataset/Raw/national_map_line.geojson", 'r') as f:
    data = json.load(f)

In [3]:
# Extract all possible categories of road in the geojson
road_types = set()

for polygon in data['features']:
    string = polygon['properties']['Description']
    road_type = BeautifulSoup(string, 'lxml').find_all("td")[1].text
    road_types.add(road_type)
    
road_types

{'Layers/Contour_250K',
 'Layers/Expressway',
 'Layers/Expressway_Sliproad',
 'Layers/International_bdy',
 'Layers/Major_Road'}
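
The lookup above reads the layer name from the second `<td>` cell of each feature's HTML `Description`. A minimal sketch of that lookup, together with a per-layer count, is shown below. The sample features and their row labels are made up for illustration (the real table may contain more rows), and a regular expression stands in for `BeautifulSoup` only so that the snippet runs without extra packages.

```python
import re
from collections import Counter

# Captures the text inside each <td>...</td> cell (regex stand-in for BeautifulSoup)
TD_CELL = re.compile(r"<td[^>]*>(.*?)</td>", re.IGNORECASE | re.DOTALL)


def road_type_of(feature):
    """Return the text of the second <td> cell in a feature's Description."""
    cells = TD_CELL.findall(feature["properties"]["Description"])
    return cells[1].strip()


def make_feature(name, layer):
    """Build a hypothetical feature; the row labels are placeholders."""
    html = (
        "<table>"
        f"<tr><th>NAME</th><td>{name}</td></tr>"
        f"<tr><th>FOLDERPATH</th><td>{layer}</td></tr>"
        "</table>"
    )
    return {"type": "Feature", "properties": {"Description": html}}


# Toy stand-in for data['features']
sample_features = [
    make_feature("Road A", "Layers/Expressway"),
    make_feature("Road B", "Layers/Expressway"),
    make_feature("Road C", "Layers/Expressway_Sliproad"),
    make_feature("Road D", "Layers/Major_Road"),
]

# Count how many features fall into each layer
layer_counts = Counter(road_type_of(f) for f in sample_features)
for layer, count in sorted(layer_counts.items()):
    print(layer, count)
```

Counting features per layer in this way is a quick check that the filters in the following sections select the expected number of features.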

## 3. Highways

First, we extract all the line-vector features that are marked as expressways and convert them to a CSV format that we can use in pandas. However, due to the aforementioned dependency issue, the features first had to be written to a GeoJSON file so that the conversion could be run on Google Colab, where they were then converted to a CSV file. This intermediate step can be omitted if you face no dependency issues with the GeoPandas library or its GeoJSON driver.

In [4]:
# Extract all expressway polygon into a list
highways = []
for polygon in data['features']:
    string = polygon['properties']['Description']
    road_type = BeautifulSoup(string, 'lxml').find_all("td")[1].text
    if road_type == "Layers/Expressway":
        highways.append(polygon)

In [5]:
# Form the basic GeoJSON structure with the shortlisted expressways
# (built as a new dict so that `data` keeps all features for the later cells)
highway_json = {**data, 'features': highways}
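
Because later cells loop over `data['features']` again, it matters whether the filtered collection is a separate dictionary or just another name for `data`. The difference can be illustrated with toy values (not the real dataset):

```python
# Toy stand-in for the loaded GeoJSON dictionary
data = {"type": "FeatureCollection",
        "features": ["expressway", "sliproad", "major road"]}

# Plain assignment: both names refer to the same dictionary object
alias = data
print(alias is data)  # True

# Shallow copy with one key replaced: the original keeps all its features
filtered = {**data, "features": ["expressway"]}
print(len(data["features"]))      # 3
print(len(filtered["features"]))  # 1

# Writing through the alias also changes what a later loop over data sees
alias["features"] = ["expressway"]
print(len(data["features"]))      # 1
```

With plain assignment, a later filter for `Layers/Expressway_Sliproad` would only see the features that survived the expressway filter, so the ramp list would come out empty.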

In [6]:
# Export to geojson format for further wrangling in Google Colab
with open("./Dataset/Spatial/highway_geojson.geojson", 'w') as f:
    json.dump(highway_json, f)

In [7]:
# ### ---      KNOWN DEPENDENCIES ISSUE      --- ###
# ### ---  CODES TO EXPORT FOR GOOGLE COLAB  --- ###
# ### ---     UNCOMMENT TO TEST THE CODE     --- ###

# highways = gpd.read_file('./Dataset/Spatial/highway_geojson.geojson')
# highways['Description'] = [BeautifulSoup(name, 'lxml').find('td').text for name in list(highways['Description'])]
# highways.to_csv('./Dataset/Spatial/Highways.csv', index=False)
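
If GeoPandas cannot be installed locally, a rough standard-library equivalent of this conversion can be sketched as follows. This is not the method used in the notebook: it assumes the features are `LineString` or `MultiLineString` geometries, keeps only the first two coordinates of each point, and writes a hypothetical two-column layout (`Description`, `geometry` as WKT). The sample feature is invented for illustration.

```python
import csv
import io


def to_wkt(geometry):
    """Convert a GeoJSON LineString or MultiLineString to a 2D WKT string."""
    def line(coords):
        # Keep x and y only; any extra value (e.g. elevation) is dropped
        return "(" + ", ".join(f"{x} {y}" for x, y, *_ in coords) + ")"

    if geometry["type"] == "LineString":
        return "LINESTRING " + line(geometry["coordinates"])
    if geometry["type"] == "MultiLineString":
        parts = ", ".join(line(part) for part in geometry["coordinates"])
        return "MULTILINESTRING (" + parts + ")"
    raise ValueError(f"Unsupported geometry type: {geometry['type']}")


# Hypothetical feature whose Description has already been reduced to a name
sample_features = [
    {
        "properties": {"Description": "Road A"},
        "geometry": {
            "type": "LineString",
            "coordinates": [[103.8, 1.3, 0.0], [103.9, 1.35, 0.0]],
        },
    },
]

# Write to an in-memory buffer; swap in open(path, "w", newline="") for a file
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Description", "geometry"])
for feature in sample_features:
    writer.writerow([feature["properties"]["Description"],
                     to_wkt(feature["geometry"])])

print(buffer.getvalue())
```

The WKT strings can later be parsed back into geometries by a library such as Shapely once the dependency issue is resolved.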

## 4. Highway Exits/Ramps

The same steps are repeated to extract the highway exit/ramp data.
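
Since the steps differ only in the layer name and the output path, they could be wrapped in one helper. The sketch below is one possible way to do so, not the code used in this notebook: `export_layer` and `make_feature` are hypothetical names, the features are toy data written to a temporary folder, and a regular expression stands in for the `BeautifulSoup` lookup so that the snippet has no third-party dependencies.

```python
import json
import re
import tempfile
from pathlib import Path

# Captures the text inside each <td>...</td> cell (regex stand-in for BeautifulSoup)
TD_CELL = re.compile(r"<td[^>]*>(.*?)</td>", re.IGNORECASE | re.DOTALL)


def road_type_of(feature):
    """Return the text of the second <td> cell in a feature's Description."""
    return TD_CELL.findall(feature["properties"]["Description"])[1].strip()


def export_layer(collection, layer, out_path):
    """Write the features of one layer to out_path; return how many were kept."""
    selected = [f for f in collection["features"] if road_type_of(f) == layer]
    # New dict, so the input collection keeps all of its features
    subset = {**collection, "features": selected}
    with open(out_path, "w") as f:
        json.dump(subset, f)
    return len(selected)


def make_feature(layer):
    """Build a hypothetical feature with a two-cell Description table."""
    html = f"<table><tr><td>Road</td></tr><tr><td>{layer}</td></tr></table>"
    return {"type": "Feature", "properties": {"Description": html}}


collection = {
    "type": "FeatureCollection",
    "features": [
        make_feature("Layers/Expressway"),
        make_feature("Layers/Expressway"),
        make_feature("Layers/Expressway_Sliproad"),
        make_feature("Layers/Major_Road"),
    ],
}

out_dir = Path(tempfile.mkdtemp())
n_highways = export_layer(collection, "Layers/Expressway",
                          out_dir / "highway_geojson.geojson")
n_ramps = export_layer(collection, "Layers/Expressway_Sliproad",
                       out_dir / "ramp_geojson.geojson")
print(n_highways, n_ramps)
```

Because each call builds its own dictionary, the order in which the layers are exported does not matter.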

In [8]:
# Extract all expressway ramps polygon into a list
ramps = []
for polygon in data['features']:
    string = polygon['properties']['Description']
    road_type = BeautifulSoup(string, 'lxml').find_all("td")[1].text
    if road_type == "Layers/Expressway_Sliproad":
        ramps.append(polygon)

In [9]:
# Form the basic GeoJSON structure with the shortlisted ramps
# (built as a new dict so that `data` is not modified)
ramp_json = {**data, 'features': ramps}

In [10]:
# Export to geojson format for further wrangling in Google Colab
with open("./Dataset/Spatial/ramp_geojson.geojson", 'w') as f:
    json.dump(ramp_json, f)

In [11]:
# ### ---      KNOWN DEPENDENCIES ISSUE      --- ###
# ### ---  CODES TO EXPORT FOR GOOGLE COLAB  --- ###
# ### ---     UNCOMMENT TO TEST THE CODE     --- ###

# ramps = gpd.read_file('./Dataset/Spatial/ramp_geojson.geojson')
# ramps['Description'] = [BeautifulSoup(name, 'lxml').find('td').text for name in list(ramps['Description'])]
# ramps.to_csv('./Dataset/Spatial/Ramps.csv', index=False)

We have now collected all the information needed for the analysis. In the next notebook (Part 5), we will perform feature engineering on the core HDB dataset to derive the geospatial information for each HDB transaction record.