# Task 2

### Transaction Definition: (temperature at Banqiao station, humidity at Banqiao station, wind speed at Banqiao station, northern-region power usage) for each hour

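Temperature, humidity, and wind speed all affect how comfortable people feel, and comfort in turn affects power usage (for example, people turn on the air conditioner when it feels muggy). This analysis is designed around that idea.  
Run the following query and store the results in TSV format.  
Note: Round `DATETIME` to the nearest hour (e.g. 2017/10/16 14:37:52 -> 2017/10/16 15:00:00).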

```sql
SELECT DATE_FORMAT(DATE_ADD(Power.updateTime, INTERVAL 30 MINUTE), '%Y-%m-%d %H:00:00') AS time,
       逐時觀測.溫度,
       逐時觀測.相對濕度,
       逐時觀測.風速,
       Power.northUsage
FROM Power
INNER JOIN 逐時觀測
ON DATE_FORMAT(DATE_ADD(Power.updateTime, INTERVAL 30 MINUTE), '%Y-%m-%d %H:00:00')
   = DATE_FORMAT(DATE_ADD(逐時觀測.時間, INTERVAL 30 MINUTE), '%Y-%m-%d %H:00:00')
WHERE 逐時觀測.測站 = 'BANQIAO,板橋'
```
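
The rounding rule in the note can also be checked outside the database. The following is a minimal Python sketch of the same add-30-minutes-then-truncate trick that the query uses; `round_to_nearest_hour` is a hypothetical helper name and is not part of the original workflow.

```python
from datetime import datetime, timedelta


def round_to_nearest_hour(ts: datetime) -> datetime:
    """Add 30 minutes, then drop minutes and seconds (mirrors the SQL above)."""
    shifted = ts + timedelta(minutes=30)
    return shifted.replace(minute=0, second=0, microsecond=0)


# The example from the note: 14:37:52 is closer to 15:00 than to 14:00.
example = round_to_nearest_hour(datetime(2017, 10, 16, 14, 37, 52))
print(example)
```

Under this rule a timestamp at exactly 30 minutes past the hour rounds up, and a timestamp late in the last hour of a day rolls over to 00:00 of the next day.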

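### Discretization: temperature in bins of 5 degrees, humidity in bins of 5 percentage points, wind speed rounded to an integer, power usage in bins of 50 萬瓩 (500 MW)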

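Add 10000 to the power-usage bin, subtract 10000 from the wind speed, and negate the humidity bin, so that the four kinds of items can be told apart.  
Note: The rounding is not exactly "round half up", because floating-point numbers are stored in binary; this does not affect the results.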
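
To illustrate the note, here is a small sketch of how Python's built-in `round` differs from schoolbook rounding. Besides the binary-representation effect, Python 3 rounds exact ties to the nearest even number. `round_half_up` is a hypothetical helper added only for comparison; the notebook itself uses the built-in `round`.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(x: float) -> int:
    """Hypothetical helper: schoolbook rounding done in decimal arithmetic."""
    return int(Decimal(str(x)).quantize(Decimal('1'), rounding=ROUND_HALF_UP))


# Built-in round() sends exact ties to the nearest even integer.
builtin_ties = [round(0.5), round(1.5), round(2.5)]
half_up_ties = [round_half_up(0.5), round_half_up(1.5), round_half_up(2.5)]

# 2.675 is stored as a binary fraction slightly below 2.675, so it rounds down.
binary_case = round(2.675, 2)

print(builtin_ties, half_up_ties, binary_case)
```

Because the bins are 5 or 50 units wide, a one-unit difference at a tie only matters for values that sit exactly on a bin boundary.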

In [10]:
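dataset = []
with open('task2.tsv') as file:
    for line in file:
        # Columns: time, temperature, relative humidity, wind speed, northUsage
        fields = line.split('\t')
        assert len(fields) == 5
        t = int(round(float(fields[1]))) // 5               # temperature bin, 5 degrees wide
        h = int(round(float(fields[2]))) // -5              # humidity bin, 5 points wide, negated
        s = int(round(float(fields[3]))) - 10000            # integer wind speed, offset by -10000
        p = int(round(float(fields[4]), -1)) // 50 + 10000  # power bin, 50 wide, offset by +10000
        dataset.append([t, h, s, p])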

In [11]:
len(dataset)

4636

In [12]:
print(dataset[0:10])

[[5, -16, -9988, 10016], [5, -17, -9986, 10016], [5, -17, -9983, 10015], [5, -16, -9988, 10015], [5, -17, -9990, 10015], [5, -17, -9989, 10015], [5, -16, -9992, 10016], [5, -17, -9993, 10016], [5, -15, -9993, 10016], [5, -14, -9993, 10016]]


### Algorithm: FP-Growth

Use Orange3-Associate by Bioinformatics Laboratory, FRI UL.  
Ref: http://orange3-associate.readthedocs.org/

In [13]:
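from orangecontrib.associate.fpgrowth import frequent_itemsets, association_rules
# 0.01 is the minimum support; the result maps each frequent itemset to its count
itemsets = dict(frequent_itemsets(dataset, 0.01))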

In [14]:
len(itemsets)

290

In [15]:
rules = association_rules(itemsets, 0.7)
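
For reference, support and confidence can be computed by hand on a handful of transactions. The sketch below does not use Orange3-Associate: `support_count`, `confidence`, and the four `toy` transactions are made up for illustration and only reuse the item encoding from this notebook.

```python
def support_count(transactions, itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= set(t))


def confidence(transactions, antecedent, consequent):
    """Share of transactions with `antecedent` that also contain `consequent`."""
    both = support_count(transactions, antecedent | consequent)
    return both / support_count(transactions, antecedent)


# Made-up transactions: [temperature bin, humidity bin, power bin]
toy = [
    [5, -16, 10016],
    [5, -17, 10016],
    [5, -17, 10015],
    [4, -17, 10016],
]

supp = support_count(toy, {5, 10016})      # absolute support of the itemset
rel_supp = supp / len(toy)                 # relative support, comparable to 0.01
conf = confidence(toy, {10016}, {5})       # confidence of the rule 10016 -> 5

print(supp, rel_supp, conf)
```

The `supp` values printed with the rules in this notebook are absolute counts like `supp` here, while the minimum support of 0.01 is a fraction of all transactions like `rel_supp`.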

### Rules Discovered: 8 in total

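minimum support = 0.01, minimum confidence = 0.7  
Eight rules pass these thresholds, but every one of them has a temperature bin as its consequent; none predicts power usage. Coarser granularity may be needed to find such rules.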

In [16]:
for antecedent, consequent, support, confidence in rules:
    print(antecedent, '->', consequent, 'supp =', support, 'conf =', confidence)

frozenset({-10000, 10014}) -> frozenset({4}) supp = 71 conf = 0.8068181818181818
frozenset({10014, -17}) -> frozenset({4}) supp = 58 conf = 0.7160493827160493
frozenset({-15, 10020}) -> frozenset({4}) supp = 79 conf = 0.7053571428571429
frozenset({-15, 10022}) -> frozenset({5}) supp = 49 conf = 0.9074074074074074
frozenset({10022}) -> frozenset({5}) supp = 168 conf = 0.7148936170212766
frozenset({10025}) -> frozenset({6}) supp = 96 conf = 0.8888888888888888
frozenset({-19}) -> frozenset({4}) supp = 72 conf = 0.7272727272727273
frozenset({10026}) -> frozenset({6}) supp = 53 conf = 0.828125
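
The encoded items in the rules above are hard to read directly. The following sketch maps an item back to its attribute and value range by inverting the discretization used in this notebook. `decode_item` is a hypothetical helper, and the `-5000` cut-off between humidity bins and wind-speed items is an assumption that relies on humidity bins staying near zero and wind speeds staying far below 5000.

```python
def decode_item(item: int):
    """Return (attribute, low, high) for an encoded item.

    Bounds are inclusive and refer to the rounded value before binning.
    """
    if item >= 10000:                  # power bin, offset by +10000
        low = (item - 10000) * 50
        return ('power', low, low + 49)
    if item <= -5000:                  # wind speed, offset by -10000 (assumed cut-off)
        speed = item + 10000
        return ('wind', speed, speed)
    if item < 0:                       # humidity bin produced by x // -5
        return ('humidity', -5 * item - 4, -5 * item)
    low = item * 5                     # temperature bin produced by x // 5
    return ('temperature', low, low + 4)


for item in (10025, 6, -15, -10000):
    print(item, '->', decode_item(item))
```

Read this way, the rule `frozenset({10025}) -> frozenset({6})` says that a power usage of 1250 to 1299 萬瓩 goes with a temperature of 30 to 34 degrees.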


### Discretization: temperature rounded to the nearest integer, in bins of 5 degrees; power usage rounded to the nearest ten, in bins of 50 萬瓩 (500 MW)

In [8]:
with open('task1.apriori', 'w') as dataset:
    with open('task1.tsv') as file:
        for line in file:
            fields = line.split('\t')
            assert len(fields) == 3
            t = int(round(float(fields[1])))               # temperature, nearest integer
            p = int(round(float(fields[2]), -1)) + 10000   # power usage, nearest ten, offset by +10000
            # One transaction per line: temperature bin, power bin
            dataset.write(str(t // 5) + ',' + str(p // 50) + '\n')

### Algorithm: Apriori

Use the Python Implementation of Apriori Algorithm by Abhinav Saini.  
Ref: https://github.com/asaini/Apriori

In [10]:
!python2 apriori.py -f task1.apriori -s 0.01 -c 0.6

item: ('212', '4') , 0.011
item: ('3', '218') , 0.011
item: ('3', '212') , 0.011
item: ('226', '6') , 0.011
item: ('217', '3') , 0.012
item: ('216', '3') , 0.012
item: ('214', '5') , 0.012
item: ('224', '5') , 0.013
item: ('226',) , 0.014
item: ('3', '215') , 0.014
item: ('223', '6') , 0.014
item: ('4', '221') , 0.018
item: ('3', '214') , 0.019
item: ('225', '6') , 0.021
item: ('223', '5') , 0.022
item: ('224', '6') , 0.023
item: ('225',) , 0.023
item: ('212',) , 0.026
item: ('3', '219') , 0.027
item: ('5', '220') , 0.027
item: ('3', '213') , 0.029
item: ('4', '218') , 0.029
item: ('5', '219') , 0.030
item: ('5', '218') , 0.031
item: ('2',) , 0.034
item: ('215', '5') , 0.034
item: ('224',) , 0.036
item: ('5', '222') , 0.036
item: ('217', '5') , 0.036
item: ('5', '221') , 0.037
item: ('223',) , 0.038
item: ('216', '4') , 0.038
item: ('217', '4') , 0.039
item: ('4', '219') , 0.042
item: ('216', '5') , 0.042
item: ('215', '4') , 0.045
item: ('213', '4')

### Rules Discovered: 5 in total

minimum support = 0.01, minimum confidence = 0.6  
Finding: when power usage is 1250 to 1350 萬瓩 (12,500 to 13,500 MW), the temperature is 30 to 35 degrees. The confidence of this rule exceeds 80%.

In [12]:
# Each rule is (power bin, temperature bin, confidence)
rules = []
rules.append((214, 4, 0.600))
rules.append((224, 6, 0.636))
rules.append((222, 5, 0.715))
rules.append((226, 6, 0.828))
rules.append((225, 6, 0.889))
for antecedent, consequent, confidence in rules:
    # Undo the encoding: power bin -> 萬瓩, temperature bin -> degrees
    print(antecedent * 50 - 10000, '->', consequent * 5, 'conf =', confidence)

700 -> 20 conf = 0.6
1200 -> 30 conf = 0.636
1100 -> 25 conf = 0.715
1300 -> 30 conf = 0.828
1250 -> 30 conf = 0.889


### What I Have Learned

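When transforming continuous values into discrete values, fine granularity may not be a good idea: the finer the bins, the fewer transactions fall into each one, so fewer itemsets reach the minimum support.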
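
A minimal sketch of this effect, using synthetic values rather than the measured data: the same 100 numbers are binned with two different widths, and the share of values in each bin is compared. `bin_supports` and `values` are hypothetical names introduced only for this illustration.

```python
from collections import Counter


def bin_supports(values, width):
    """Map each bin index to the fraction of values that fall into it."""
    counts = Counter(int(v // width) for v in values)
    total = len(values)
    return {b: c / total for b, c in counts.items()}


# Synthetic data: 20.0, 20.1, ..., 29.9 (not taken from the weather station)
values = [i / 10 for i in range(200, 300)]

fine = bin_supports(values, 1)    # 1-unit bins
coarse = bin_supports(values, 5)  # 5-unit bins

print(len(fine), max(fine.values()))
print(len(coarse), min(coarse.values()))
```

With 1-unit bins no single bin holds more than a tenth of the values, while with 5-unit bins each bin holds half of them, so a fixed minimum support is much easier to reach.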