# Task 2

### Transaction Definition: (temperature at Banqiao station, humidity at Banqiao station, wind speed at Banqiao station, northern-region power usage) for each hour

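Temperature, humidity, and wind speed all affect how comfortable people feel, and comfort in turn affects power usage (for example, people turn on the air conditioner when it feels muggy). This analysis looks for that relationship.  
Run the following query and store the results in TSV format (`task2.tsv`).  
Note: Round `DATETIME` to the nearest hour (e.g. 2017/10/16 14:37:52 -> 2017/10/16 15:00:00).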

```sql
SELECT DATE_FORMAT(DATE_ADD(Power.updateTime, INTERVAL 30 MINUTE), '%Y-%m-%d %H:00:00') AS time,
       逐時觀測.溫度,
       逐時觀測.相對濕度,
       逐時觀測.風速,
       Power.northUsage
FROM Power
INNER JOIN 逐時觀測
ON DATE_FORMAT(DATE_ADD(Power.updateTime, INTERVAL 30 MINUTE), '%Y-%m-%d %H:00:00')
   = DATE_FORMAT(DATE_ADD(逐時觀測.時間, INTERVAL 30 MINUTE), '%Y-%m-%d %H:00:00')
WHERE 逐時觀測.測站 = 'BANQIAO,板橋'
```
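The query rounds by adding 30 minutes and then truncating to the hour. The same idea can be sketched in Python, for instance to check timestamps outside the database. The function name `round_to_nearest_hour` is a hypothetical helper and not part of the original notebook.

```python
from datetime import datetime, timedelta

def round_to_nearest_hour(ts):
    """Round a datetime to the nearest hour (hypothetical helper).

    Mirrors the SQL above: shift by 30 minutes, then drop the
    minutes and seconds.
    """
    shifted = ts + timedelta(minutes=30)
    return shifted.replace(minute=0, second=0, microsecond=0)

print(round_to_nearest_hour(datetime(2017, 10, 16, 14, 37, 52)))
# 2017-10-16 15:00:00
```

A timestamp at exactly half past the hour is rounded up under this scheme.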

### Discretization: temperature in 5-degree bins, humidity in 5-percentage-point bins, wind speed rounded to an integer, power usage in bins of 50 萬瓩 (500 MW)

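To keep the four kinds of items distinguishable, add 10000 to power usage, subtract 10000 from wind speed, and negate humidity.  
Note: The rounding is not always exact round-half-up, because floating-point numbers are stored in binary (and Python 3's `round` rounds ties to even). This does not affect the results.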
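As a worked example of this encoding, the sketch below applies the same arithmetic to a single reading. The function name `encode_transaction` and the input values are made up for illustration; they are not taken from `task2.tsv`.

```python
def encode_transaction(temperature, humidity, wind_speed, power_usage):
    """Encode one hourly reading as four integer item IDs (illustrative)."""
    t = int(round(temperature)) // 5               # 5-degree temperature bin
    h = int(round(humidity)) // -5                 # negated 5-point humidity bin
    s = int(round(wind_speed)) - 10000             # integer wind speed, shifted down
    p = int(round(power_usage, -1)) // 50 + 10000  # power bin of 50, shifted up
    return [t, h, s, p]

# Hypothetical reading: 27.3 degrees, 83% humidity, wind 12.4, usage 823
print(encode_transaction(27.3, 83.0, 12.4, 823.0))
# [5, -17, -9988, 10016]
```

Because `//` floors toward negative infinity, dividing by `-5` puts a humidity of 83 into bin `-17`, which decodes to 85 rather than 80. Humidity is therefore labelled by the upper edge of its bin, while temperature is labelled by the lower edge.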

In [1]:
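dataset = []
with open('task2.tsv') as file:
    for line in file:
        # Columns: time, temperature, relative humidity, wind speed, northUsage
        fields = line.split('\t')
        assert len(fields) == 5
        t = int(round(float(fields[1]))) // 5               # temperature: 5-degree bins
        h = int(round(float(fields[2]))) // -5              # humidity: 5-point bins, negated
        s = int(round(float(fields[3]))) - 10000            # wind speed: integer, offset by -10000
        p = int(round(float(fields[4]), -1)) // 50 + 10000  # power: bins of 50, offset by +10000
        dataset.append([t, h, s, p])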

In [2]:
len(dataset)

4636

In [3]:
print(dataset[0:10])

[[5, -16, -9988, 10016], [5, -17, -9986, 10016], [5, -17, -9983, 10015], [5, -16, -9988, 10015], [5, -17, -9990, 10015], [5, -17, -9989, 10015], [5, -16, -9992, 10016], [5, -17, -9993, 10016], [5, -15, -9993, 10016], [5, -14, -9993, 10016]]


### Algorithm: FP-Growth

Use Orange3-Associate by Bioinformatics Laboratory, FRI UL.  
Ref: http://orange3-associate.readthedocs.org/

In [4]:
from orangecontrib.associate.fpgrowth import frequent_itemsets, association_rules
itemsets = dict(frequent_itemsets(dataset, 0.01))

In [5]:
len(itemsets)

290

In [6]:
rules = list(association_rules(itemsets, 0.7))

### Rules Discovered: 8 in total

minimum support = 0.01, minimum confidence = 0.7  
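Eight rules meet these thresholds. For example, when power usage is 1250 to 1350 萬瓩 (12.5 to 13.5 GW), the temperature is 30 to 35 °C, with confidence above 80%.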
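To make the two thresholds concrete: with 4636 transactions, a minimum support of 0.01 corresponds to roughly 46 transactions, and the confidence of a rule A -> C is the support of A and C together divided by the support of A alone. The sketch below computes both by hand on a made-up toy dataset; the helper names are hypothetical and this is not how Orange3-Associate is implemented.

```python
def support_count(transactions, itemset):
    """Number of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= frozenset(t))

def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support_count(transactions, both) / support_count(transactions, antecedent)

# Toy transactions (illustrative only): [temperature item, power item]
toy = [[6, 10025], [6, 10025], [6, 10026], [5, 10025]]

print(support_count(toy, [10025, 6]))  # 2
print(confidence(toy, [10025], [6]))   # 2 of the 3 transactions with 10025 also contain 6
```

The rule listing in this section reports support as an absolute count, so a rule with supp = 96 and conf = 0.888... implies that its antecedent appears in 108 transactions (96 / 108).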

In [7]:
for antecedent, consequent, support, confidence in rules:
    print(antecedent, '->', consequent, 'supp =', support, 'conf =', confidence)

frozenset({-10000, 10014}) -> frozenset({4}) supp = 71 conf = 0.8068181818181818
frozenset({10014, -17}) -> frozenset({4}) supp = 58 conf = 0.7160493827160493
frozenset({-15, 10020}) -> frozenset({4}) supp = 79 conf = 0.7053571428571429
frozenset({-15, 10022}) -> frozenset({5}) supp = 49 conf = 0.9074074074074074
frozenset({10022}) -> frozenset({5}) supp = 168 conf = 0.7148936170212766
frozenset({10025}) -> frozenset({6}) supp = 96 conf = 0.8888888888888888
frozenset({-19}) -> frozenset({4}) supp = 72 conf = 0.7272727272727273
frozenset({10026}) -> frozenset({6}) supp = 53 conf = 0.828125


In [8]:
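def whatisthis(i):
    # Decode an item ID back into a readable label.
    # Negative IDs: wind speed (below -1000, suffix 's') or humidity ('h').
    # Non-negative IDs: temperature (below 1000, suffix 't') or power usage ('p').
    if i < 0:
        if i < -1000:
            return str(i + 10000) + 's'
        else:
            return str(i * -5) + 'h'
    else:
        if i < 1000:
            return str(i * 5) + 't'
        else:
            return str((i - 10000) * 50) + 'p'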

In [9]:
for antecedent, consequent, support, confidence in rules:
    ant = tuple(whatisthis(x) for x in antecedent)
    con = tuple(whatisthis(x) for x in consequent)
    print(ant, '->', con, 'conf =', confidence)

('0s', '700p') -> ('20t',) conf = 0.8068181818181818
('700p', '85h') -> ('20t',) conf = 0.7160493827160493
('75h', '1000p') -> ('20t',) conf = 0.7053571428571429
('75h', '1100p') -> ('25t',) conf = 0.9074074074074074
('1100p',) -> ('25t',) conf = 0.7148936170212766
('1250p',) -> ('30t',) conf = 0.8888888888888888
('95h',) -> ('20t',) conf = 0.7272727272727273
('1300p',) -> ('30t',) conf = 0.828125


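### Discretization: temperature in 2-degree bins, humidity in 5-percentage-point bins, wind speed rounded to an integer, power usage in bins of 50 萬瓩 (500 MW)

Based on the results above, I wanted to divide temperature into finer bins.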

In [10]:
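# Use `out` for the output file so the `dataset` list from In [1] is not shadowed.
with open('task2.apriori', 'w') as out:
    with open('task2.tsv') as file:
        for line in file:
            fields = line.split('\t')
            assert len(fields) == 5
            t = int(round(float(fields[1]))) // 2               # temperature: 2-degree bins
            h = int(round(float(fields[2]))) // -5              # humidity: 5-point bins, negated
            s = int(round(float(fields[3]))) - 10000            # wind speed: integer, offset by -10000
            p = int(round(float(fields[4]), -1)) // 50 + 10000  # power: bins of 50, offset by +10000
            out.write(str(t) + ',' + str(h) + ',' + str(s) + ',' + str(p) + '\n')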

### Algorithm: Apriori

Use the Python implementation of the Apriori algorithm by Abhinav Saini.  
Ref: https://github.com/asaini/Apriori
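For readers unfamiliar with Apriori, here is a minimal level-wise sketch of the frequent-itemset step. It is written for illustration only: it is not the code from the repository referenced above, and the function name `apriori_sketch` and the toy data are assumptions.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {itemset: relative support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item that occurs anywhere.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

toy = [[1, 2, 3], [1, 2], [1, 3], [2, 3], [1, 2, 3]]
result = apriori_sketch(toy, 0.5)
print(len(result))  # 6: three single items and three pairs
```

In this toy run the triple `{1, 2, 3}` appears in only 2 of 5 transactions, so it falls below the 0.5 threshold and is dropped.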

In [11]:
!python2 apriori.py -f task2.apriori -s 0.01 -c 0.6

item: ('-10000', '10021') , 0.010
item: ('-10000', '10017') , 0.010
item: ('-9996', '14') , 0.010
item: ('-10000', '10013') , 0.010
item: ('8', '-15') , 0.010
item: ('10021', '-16') , 0.010
item: ('-9995', '-14') , 0.010
item: ('10024', '-9999') , 0.010
item: ('10018', '-17') , 0.010
item: ('10017', '13', '-9999') , 0.010
item: ('10', '-9997', '-15') , 0.010
item: ('13', '10020') , 0.011
item: ('-18', '10016') , 0.011
item: ('-13', '10018') , 0.011
item: ('13', '-9999', '10016') , 0.011
item: ('9', '10019') , 0.011
item: ('-18', '10015') , 0.011
item: ('-9996', '10016') , 0.011
item: ('-9996', '10015') , 0.011
item: ('-9996', '10017') , 0.011
item: ('8', '-9997') , 0.011
item: ('10025', '16') , 0.011
item: ('12', '-9999', '10015') , 0.011
item: ('10', '-9997', '-14') , 0.011
item: ('15', '-15') , 0.011
item: ('15', '10023') , 0.011
item: ('-10000', '10016') , 0.011
item: ('-18', '10014') , 0.011
item: ('-14', '10014') , 0.011
item: ('-10000', '10018') , 0.011
item: ('-19', '12') , 0.01

### Rules Discovered: 1 in total

minimum support = 0.01, minimum confidence = 0.6  
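Only one rule meets these thresholds: when humidity is about 90% and temperature is 26 to 28 °C, the wind speed is 1 (confidence 0.615). With the finer temperature bins, hardly any rules are found, so coarser granularity is needed.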

In [12]:
def whatisthis(i):
    # Same decoder as In [8], but temperature now uses 2-degree bins.
    if i < 0:
        if i < -1000:
            return str(i + 10000) + 's'
        else:
            return str(i * -5) + 'h'
    else:
        if i < 1000:
            return str(i * 2) + 't'
        else:
            return str((i - 10000) * 50) + 'p'

In [13]:
rules = []
rules.append(((-18, 13), (-9999,), 0.615))
for antecedent, consequent, confidence in rules:
    ant = tuple(whatisthis(x) for x in antecedent)
    con = tuple(whatisthis(x) for x in consequent)
    print(ant, '->', con, 'conf =', confidence)

('90h', '26t') -> ('1s',) conf = 0.615


### What I Have Learned

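When transforming continuous values into discrete values, fine granularity may not be a good idea. Narrower bins spread the transactions over more distinct items, so fewer itemsets reach the minimum support and fewer rules pass the confidence threshold: the 5-degree temperature bins yielded eight rules, while the 2-degree bins yielded only one.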
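The effect of bin width on support can be sketched with a few made-up values. The helper `largest_bin_share` and the temperature list below are illustrative assumptions, not data from this task.

```python
from collections import Counter

def largest_bin_share(values, width):
    """Fraction of values that fall into the most populated bin."""
    bins = Counter(int(round(v)) // width for v in values)
    return max(bins.values()) / len(values)

# Made-up temperatures, for illustration only (not taken from task2.tsv).
temps = [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]

for width in (5, 2):
    print(width, largest_bin_share(temps, width))
# 5 0.5
# 2 0.2
```

With 5-degree bins the most common temperature item covers half of the readings; with 2-degree bins no item covers more than a fifth, so any itemset containing a temperature item has correspondingly lower support.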