Preparing `taipower` dataset:  
+ electricity consumption in northern Taiwan
+ 273 days (2016/10/01–2017/06/30)
+ 10 hourly readings per day (9:00–18:00)
+ Note: each DATETIME is rounded to its nearest hour (e.g. 2017/10/16 14:37:52 -> 2017/10/16 15:00:00)

```sql
SELECT
  DATE_FORMAT(
    DATE_ADD(Power.updateTime, INTERVAL 30 MINUTE),
    '%Y-%m-%d %H:00:00'
  ) AS time,
  Power.northUsage
FROM Power
WHERE (DATE(Power.updateTime) BETWEEN '2016-10-01' AND '2017-06-30')
  AND (TIME(Power.updateTime) BETWEEN '08:30:00' AND '18:29:59')
```
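
The rounding in the query works by adding 30 minutes and then truncating to the hour, which is also why the `WHERE` clause keeps timestamps from 08:30:00 to 18:29:59: these are the ones that land on 09:00 through 18:00. A minimal Python sketch of the same idea follows; the helper name `round_to_nearest_hour` is made up for illustration and is not part of the original notebook.

```python
from datetime import datetime, timedelta

def round_to_nearest_hour(ts: datetime) -> datetime:
    """Add 30 minutes, then drop minutes and seconds (mirrors the SQL above)."""
    shifted = ts + timedelta(minutes=30)
    return shifted.replace(minute=0, second=0, microsecond=0)

# 14:37:52 is past the half hour, so it rounds up to 15:00:00.
print(round_to_nearest_hour(datetime(2017, 10, 16, 14, 37, 52)))  # 2017-10-16 15:00:00
# 14:29:59 is before the half hour, so it rounds down to 14:00:00.
print(round_to_nearest_hour(datetime(2017, 10, 16, 14, 29, 59)))  # 2017-10-16 14:00:00
```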

In [1]:
from datetime import date, timedelta
start = date(2016, 10, 1)
end = date(2017, 6, 30)
delta = end - start
hours = ['09', '10', '11', '12', '13', '14', '15', '16', '17', '18']
delta.days + 1

273

In [2]:
taipower_dict = {}
for i in range(delta.days + 1):
    for j in hours:
        key = str(start + timedelta(days=i)) + ' ' + j + ':00:00'
        taipower_dict[key] = -1

In [3]:
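with open('taipower.csv') as file:
    for line in file:
        # Each row is "<rounded timestamp>,<northUsage>".
        fields = line.split(',')
        key = fields[0]
        if key in taipower_dict:
            if taipower_dict[key] != -1:
                # Slot already filled: keep the first reading and report the clash.
                print('DUPLICATE', key)
            else:
                taipower_dict[key] = float(fields[1])
        else:
            # Timestamp falls outside the expected 273 days x 10 hours grid.
            print('ERROR', key)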

DUPLICATE 2017-05-20 13:00:00


In [4]:
missing = [key for key in taipower_dict if taipower_dict[key] == -1]
len(missing)

867

In [5]:
taipower = []
for i in range(delta.days + 1):
    record = []
    for j in hours:
        key = str(start + timedelta(days=i)) + ' ' + j + ':00:00'
        record.append(taipower_dict[key])
    taipower.append(record)

Use [scikit-learn](http://scikit-learn.org/) to replace each missing value (marked `-1`) with the mean of its column, i.e. the mean of the observed readings for that hour across all days.

In [7]:
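# Imputer was removed in scikit-learn 0.22; SimpleImputer is its replacement.
from sklearn.impute import SimpleImputer
# -1 marks a missing reading; fill it with the mean of its column (hour of day).
taipower = SimpleImputer(missing_values=-1, strategy='mean').fit_transform(taipower)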
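
To make the column-mean imputation concrete, here is a small pure-Python sketch of the same idea on a toy matrix. The function name `impute_column_mean` and the toy values are invented for illustration; this is not how scikit-learn implements it internally, only an equivalent calculation for this simple case.

```python
def impute_column_mean(rows, missing=-1):
    """Replace each `missing` marker with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for c in range(n_cols):
        observed = [row[c] for row in rows if row[c] != missing]
        # Assumes every column has at least one observed value;
        # an all-missing column would raise ZeroDivisionError here.
        means.append(sum(observed) / len(observed))
    return [
        [means[c] if row[c] == missing else row[c] for c in range(n_cols)]
        for row in rows
    ]

# Toy data: 3 days x 2 hours, with -1 marking a missing reading.
toy = [[10.0, -1], [20.0, 4.0], [-1, 6.0]]
print(impute_column_mean(toy))  # [[10.0, 5.0], [20.0, 4.0], [15.0, 6.0]]
```

With 867 of the 273 × 10 = 2730 slots missing in the notebook's data, roughly a third of the final `taipower` values are column means rather than measurements, which is worth keeping in mind when interpreting later results.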

In [9]:
len(taipower)

273

Preparing `temperature` dataset:  
+ temperature in Banqiao, Taiwan
+ 273 days (2016/10/01–2017/06/30)
+ 10 hourly readings per day (9:00–18:00)
+ Note: each DATETIME is rounded to its nearest hour (e.g. 2017/10/16 14:37:52 -> 2017/10/16 15:00:00)

```sql
-- Identifiers: 逐時觀測 = hourly observations, 時間 = time, 溫度 = temperature, 測站 = station
SELECT
  DATE_FORMAT(
    DATE_ADD(逐時觀測.時間, INTERVAL 30 MINUTE),
    '%Y-%m-%d %H:00:00'
  ) AS time,
  逐時觀測.溫度
FROM 逐時觀測
WHERE (DATE(逐時觀測.時間) BETWEEN '2016-10-01' AND '2017-06-30')
  AND (TIME(逐時觀測.時間) BETWEEN '08:30:00' AND '18:29:59')
  AND (逐時觀測.測站 = 'BANQIAO,板橋')
```