# Data.gov.sg
https://data.gov.sg/

## Infectious disease data
Download the Singapore weekly infectious disease bulletin file from:  
https://data.gov.sg/dataset/weekly-infectious-disease-bulletin-cases

In [16]:
import os
import urllib.request as r
import pandas as pd
import requests

In [2]:
URL = "https://data.gov.sg/dataset/e51da589-b2d7-486b-adfc-4505d47e1206/download"

Send a request to retrieve the file:

In [3]:
dir_path = "./../../data/1_raw/"
FILE_NAME = dir_path+"weekly-infectious-disease-bulletin-cases.zip"

In [4]:
# add headers by building an opener
opener = r.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]

In [5]:
# Download the file from `URL` and save it locally under `FILE_NAME`:
with opener.open(URL) as response, open(FILE_NAME, 'wb') as out_file:
    data = response.read() # a `bytes` object
    out_file.write(data)
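
Since the file is saved with a `.zip` extension, a natural next step is to unpack it. The following is a minimal sketch using the standard-library `zipfile` module. The helper name `extract_archive` and the destination directory are illustrative assumptions, and the contents of the archive are not documented in this notebook.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Extract every member of `zip_path` into `dest_dir`; return the member names.

    Hypothetical helper, not part of the original notebook.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Fail early with a clear message if the download is not a valid zip
    # (for example, an HTML error page saved under a .zip name).
    if not zipfile.is_zipfile(zip_path):
        raise ValueError(f"{zip_path} is not a valid zip archive")
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()
```

Calling `extract_archive(FILE_NAME, dir_path)` would unpack the bulletin next to the downloaded archive and return the names of the extracted files.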

# Weather Data
http://www.weather.gov.sg/climate-historical-daily/

We want only the weather stations that have data from 2012 onwards and that record mean temperatures.

http://www.weather.gov.sg/wp-content/uploads/2016/12/Station_Records.pdf

Tengah - S23
Changi - S24
Seletar - S25
Tai Seng - S43
Jurong West - S44
Ang Mo Kio - S109
Clementi - S50
Admiralty - S104
Sentosa Island - S60
Sembawang - S80
Boon Lay (East) - S86
Semakau Island - S102
Pulau Ubin - S106
East Coast Parkway - S107
Marina Barrage - S108
Newton - S111
Tuas South - S115

## Download
Set the weather station IDs, months, and years to download:

In [11]:
weather_station_ids = [23, 24, 25, 43, 44, 50, 60, 80, 86, 102, 104, 106, 107, 108, 109, 111, 115]
months = list(range(1,13))
years = list(range(2012,2019))

Set the base URL and create the output directory:

In [12]:
base_url = "http://www.weather.gov.sg/files/dailydata/DAILYDATA_"
out_path = dir_path + "weather/"
# create the output directory if it does not exist yet
os.makedirs(out_path, exist_ok=True)
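
The file names on the server appear to follow the pattern `DAILYDATA_S<station>_<YYYYMM>.csv`, which is what the download loop in this notebook assembles by string concatenation. A small helper makes that pattern explicit. `build_station_url` and `BASE_URL` are illustrative names, not part of the original notebook.

```python
BASE_URL = "http://www.weather.gov.sg/files/dailydata/DAILYDATA_"


def build_station_url(station_id, year, month, base_url=BASE_URL):
    """Return the monthly CSV URL for one station (hypothetical helper)."""
    # Zero-pad the month so that March 2018 becomes "201803".
    return f"{base_url}S{station_id}_{year:04d}{month:02d}.csv"


print(build_station_url(111, 2018, 3))
# prints http://www.weather.gov.sg/files/dailydata/DAILYDATA_S111_201803.csv
```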

Set today's year-month (as `YYYYMM`) and day of the month:

In [13]:
from datetime import datetime
today_ym = int(datetime.today().strftime("%Y%m"))
today_d = int(datetime.today().strftime("%d"))

Loop through the years, months, and stations to download the data:

In [17]:
# Previous month as YYYYMM; handles the January -> December rollover
prev_ym = today_ym - 89 if today_ym % 100 == 1 else today_ym - 1
for year in years:
    y = str(year)
    for month in months:
        m = "%02d" % month
        file_ym = int(y + m)
        # Stop at the current month; during the first 10 days of a month,
        # also stop at the previous month
        if file_ym >= today_ym or (file_ym == prev_ym and today_d <= 10):
            break
        for station_id in weather_station_ids:
            ws = 'S' + str(station_id)
            # set URL
            url = base_url + ws + "_" + y + m + ".csv"
            # set out file name
            filename = out_path + ws + "_" + y + m + ".csv"
            try:
                # retrieve file
                response = requests.get(url)
            except requests.exceptions.RequestException:
                # network failure: skip this file and carry on
                continue
            # Not every station has data for every month; a missing file
            # comes back as a non-200 status (e.g. 404) and is skipped.
            if response.status_code == 200:
                with open(filename, 'wb') as f:
                    f.write(response.content)
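
The cutoff rule in the loop (skip the current month, and also the previous month during the first 10 days of a month) can be isolated in a pure function so that it is testable without network access. This is only a sketch: `months_to_fetch` and `lag_days` are hypothetical names, and treating the 10-day threshold as a publication lag is an assumption.

```python
from datetime import date


def months_to_fetch(start_year, today, lag_days=10):
    """Return (year, month) pairs from January of `start_year` up to the
    last month assumed to be available as of `today` (hypothetical helper)."""
    # Last complete month before `today`.
    last_y, last_m = today.year, today.month - 1
    # Early in the month, assume the previous month is not yet available.
    if today.day <= lag_days:
        last_m -= 1
    # Roll back across the year boundary (month 0 -> December, -1 -> November).
    while last_m < 1:
        last_m += 12
        last_y -= 1
    pairs = []
    for year in range(start_year, last_y + 1):
        for month in range(1, 13):
            if (year, month) > (last_y, last_m):
                break
            pairs.append((year, month))
    return pairs


print(months_to_fetch(2018, date(2018, 4, 15)))
# prints [(2018, 1), (2018, 2), (2018, 3)]
```

Working with `(year, month)` tuples instead of `YYYYMM` integers avoids the January edge case, where subtracting 1 from `201901` yields `201900` rather than `201812`.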

Explore the data collected:

In [18]:
dfWeather = pd.read_csv(out_path+"S111_201803.csv", encoding='latin_1')

In [19]:
dfWeather.head()

| | Station | Year | Month | Day | Daily Rainfall Total (mm) | Highest 30 Min Rainfall (mm) | Highest 60 Min Rainfall (mm) | Highest 120 Min Rainfall (mm) | Mean Temperature (°C) | Maximum Temperature (°C) | Minimum Temperature (°C) | Mean Wind Speed (km/h) | Max Wind Speed (km/h) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Newton | 2018 | 3 | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 27.6 | 32.6 | 25.1 | 5.8 | 28.4 |
| 1 | Newton | 2018 | 3 | 2 | 28.0 | 22.4 | 27.6 | 28.0 | 27.2 | 31.9 | 23.5 | 5.4 | 23.8 |
| 2 | Newton | 2018 | 3 | 3 | 0.0 | 0.0 | 0.0 | 0.0 | 27.6 | 32.7 | 25.6 | 6.1 | 28.4 |
| 3 | Newton | 2018 | 3 | 4 | 32.8 | 31.0 | 32.4 | 32.8 | 27.1 | 32.2 | 24.3 | 5.0 | 27.0 |
| 4 | Newton | 2018 | 3 | 5 | 0.0 | 0.0 | 0.0 | 0.0 | 28.3 | 33.1 | 25.8 | 6.5 | 31.0 |
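
Because each file stores the date in separate `Year`, `Month`, and `Day` columns, a single datetime column is often more convenient for later joins or resampling. The sketch below builds one with `pd.to_datetime` on a small hand-made frame that copies two rows (and a subset of columns) from the output above. The column name `Date` is an assumption.

```python
import pandas as pd

# Two rows copied from the S111_201803 output above (subset of columns).
sample = pd.DataFrame({
    "Station": ["Newton", "Newton"],
    "Year": [2018, 2018],
    "Month": [3, 3],
    "Day": [1, 2],
    "Mean Temperature (°C)": [27.6, 27.2],
})

# pd.to_datetime can assemble dates from a frame with year/month/day columns;
# the column names are lower-cased explicitly to match the expected names.
parts = sample[["Year", "Month", "Day"]].rename(columns=str.lower)
sample["Date"] = pd.to_datetime(parts)

print(sample[["Station", "Date", "Mean Temperature (°C)"]])
```

The same two lines should apply unchanged to `dfWeather`, provided its column names match the ones shown in the output.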
