# Task 3

### Transaction Definition: (time, northern-region power deficit) for each hour

Next, I want to find out when power shortages are most severe, using the northern region as an example.  
Power deficit = usage - supply  
Run the following query and store the results in TSV format.  
Note: Round `DATETIME` to the nearest hour (e.g. 2017/10/16 14:37:52 -> 2017/10/16 15:00:00).

```sql
SELECT DATE_FORMAT(DATE_ADD(Power.updateTime, INTERVAL 30 MINUTE), '%Y-%m-%d %H:00:00'),
       Power.northUsage - Power.northSupply
FROM Power
```
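
As a cross-check on the rounding rule, the same "add 30 minutes, then truncate to the hour" logic can be sketched in Python. The helper name `round_to_nearest_hour` is illustrative and not part of the original query.

```python
from datetime import datetime, timedelta


def round_to_nearest_hour(dt: datetime) -> datetime:
    """Round a timestamp to the nearest hour (hypothetical helper).

    Mirrors the SQL query: shift forward by 30 minutes, then drop
    everything below the hour.
    """
    shifted = dt + timedelta(minutes=30)
    return shifted.replace(minute=0, second=0, microsecond=0)


print(round_to_nearest_hour(datetime(2017, 10, 16, 14, 37, 52)))
# 2017-10-16 15:00:00
```

With this rule, a timestamp exactly on the half hour rounds up, and a late-evening timestamp can roll over to the next day.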

### Discretization: month and hour only for time; power deficit in bins of 50 (units of 10,000 kW)

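I treat the month and the hour of each timestamp as two independent features.  
To keep items from different features distinct, I add 666 to the hour and 66666 to the deficit bin index.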

In [1]:
import datetime
dataset = []
with open('task3.tsv') as file:
    for line in file:
        fields = line.split('\t')
        assert len(fields) == 2
        dt = datetime.datetime.strptime(fields[0], "%Y-%m-%d %H:%M:%S")
        m = dt.month        # month item: 1..12
        h = dt.hour + 666   # hour item: 666..689
        # Round the deficit to the nearest 10, then floor-divide into bins of 50.
        p = int(round(float(fields[1]), -1)) // 50 + 66666
        dataset.append([m, h, p])

In [2]:
len(dataset)

6529

In [3]:
print(dataset[0:10])

[[9, 678, 66669], [9, 679, 66670], [9, 680, 66671], [9, 681, 66671], [9, 682, 66671], [9, 683, 66670], [9, 684, 66670], [9, 685, 66670], [9, 686, 66669], [9, 687, 66669]]
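
To make the encoded transactions easier to read back, each one can be decoded by subtracting the same offsets. This is a minimal sketch: `decode_transaction` and the constant names are hypothetical, while the offsets 666 and 66666 and the bin width of 50 are taken from the encoding cell.

```python
HOUR_OFFSET = 666       # added to the hour in the encoding step
DEFICIT_OFFSET = 66666  # added to the deficit bin index
BIN_WIDTH = 50          # one bin spans 50 units of 10,000 kW


def decode_transaction(transaction):
    """Turn [month, hour_item, deficit_item] back into readable values."""
    month, hour_item, deficit_item = transaction
    hour = hour_item - HOUR_OFFSET
    bin_start = (deficit_item - DEFICIT_OFFSET) * BIN_WIDTH
    return month, hour, (bin_start, bin_start + BIN_WIDTH)


print(decode_transaction([9, 678, 66669]))
# (9, 12, (150, 200))
```

Because the encoding rounds to the nearest ten before the floor division, the raw values that land in a bin are shifted by about five units relative to the nominal range printed here.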


### Algorithm: FP-Growth

Use Orange3-Associate by Bioinformatics Laboratory, FRI UL.  
Ref: http://orange3-associate.readthedocs.org/

In [10]:
from orangecontrib.associate.fpgrowth import frequent_itemsets, association_rules
itemsets = dict(frequent_itemsets(dataset, 0.005))  # minimum support = 0.5% of transactions

In [11]:
len(itemsets)

187

In [12]:
rules = list(association_rules(itemsets, 0.7))

### Rules Discovered: 1 in total

minimum support = 0.005, minimum confidence = 0.7  
When the power deficit is 300 to 350 (in units of 10,000 kW), the month is August; the confidence of this rule is about 78%.

In [13]:
for antecedent, consequent, support, confidence in rules:
    print(antecedent, '->', consequent, 'supp =', support, 'conf =', confidence)

frozenset({66672}) -> frozenset({8}) supp = 61 conf = 0.782051282051282


In [16]:
for antecedent, consequent, support, confidence in rules:
    ant, = antecedent
    con, = consequent
    print((ant - 66666) * 50, '->', con, 'conf =', confidence)

300 -> 8 conf = 0.782051282051282
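
For reference, the figures reported here follow the standard definitions: the support of an itemset is the number (or fraction) of transactions containing it, and the confidence of a rule A -> B is support(A ∪ B) / support(A). The sketch below computes both on a small made-up dataset; the transactions and the helper names are illustrative assumptions, not data from `task3.tsv`.

```python
def support_count(transactions, itemset):
    """Number of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= frozenset(t))


def confidence(transactions, antecedent, consequent):
    """confidence(A -> B) = support(A ∪ B) / support(A)."""
    joint = support_count(transactions, set(antecedent) | set(consequent))
    return joint / support_count(transactions, antecedent)


# Toy transactions in the same [month, hour + 666, bin + 66666] encoding
# (made up for illustration).
toy = [
    [8, 680, 66672],
    [8, 681, 66672],
    [8, 682, 66671],
    [7, 680, 66672],
    [9, 678, 66669],
]

print(support_count(toy, {66672}))    # 3
print(confidence(toy, {66672}, {8}))  # 0.6666666666666666
```

Read the same way, the discovered rule's support of 61 and confidence of 0.782051282051282 are consistent with item 66672 appearing in 78 transactions, since 61 / 78 ≈ 0.782.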


### Discretization: season and day/night only for time; power deficit in bins of 50 (units of 10,000 kW)

Building on the experience above, I want to bin the time more coarsely.  
Note: 1 = spring, 2 = summer, 3 = autumn, 4 = winter, 666 = day, 667 = night

In [17]:
def get_season(month):
    return {
         1: 4,
         2: 4,
         3: 1,
         4: 1,
         5: 1,
         6: 2,
         7: 2,
         8: 2,
         9: 3,
        10: 3,
        11: 3,
        12: 4,
    }.get(month, -1)

In [18]:
import datetime
# Write one comma-separated transaction per line: season,day/night,deficit bin.
# The output handle is named `out` so it does not overwrite the `dataset` list.
with open('task3.apriori', 'w') as out, open('task3.tsv') as file:
    for line in file:
        fields = line.split('\t')
        assert len(fields) == 2
        dt = datetime.datetime.strptime(fields[0], "%Y-%m-%d %H:%M:%S")
        m = get_season(dt.month)
        h = 666 if (6 <= dt.hour < 18) else 667  # day = 06:00 to 17:59
        p = int(round(float(fields[1]), -1)) // 50 + 66666
        out.write(str(m) + ',' + str(h) + ',' + str(p) + '\n')

### Algorithm: Apriori

Use the Python Implementation of Apriori Algorithm by Abhinav Saini.  
Ref: https://github.com/asaini/Apriori

In [19]:
!python2 apriori.py -f task3.apriori -s 0.01 -c 0.7

item: ('2', '66672') , 0.010
item: ('4', '66671') , 0.010
item: ('66665', '4') , 0.011
item: ('666', '66667', '4') , 0.011
item: ('1', '666', '66669') , 0.011
item: ('666', '3', '66670') , 0.011
item: ('66665', '666', '3') , 0.012
item: ('66672',) , 0.012
item: ('1', '66669') , 0.012
item: ('667', '66669', '4') , 0.013
item: ('1', '666', '66666') , 0.014
item: ('667', '3', '66669') , 0.014
item: ('3', '66670') , 0.014
item: ('66665', '667', '2') , 0.014
item: ('666', '2', '66671') , 0.015
item: ('666', '66667', '2') , 0.016
item: ('667', '66664', '3') , 0.017
item: ('66665', '2') , 0.018
item: ('2', '66671') , 0.018
item: ('1', '666', '66667') , 0.018
item: ('66665', '667', '1') , 0.018
item: ('1', '667', '66667') , 0.019
item: ('666', '4', '66670') , 0.020
item: ('667', '66666', '4') , 0.020
item: ('667', '3', '66668') , 0.021
item: ('667', '66670') , 0.021
item: ('667', '1', '66666') , 0.022
item: ('1', '666', '66668') , 0.022
item: ('66664', '3') , 0.024
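
The listing above reports each frequent itemset with its relative support. As a rough illustration of how a level-wise Apriori search arrives at such a list, here is a minimal sketch. It is not the code of the referenced repository; the function name `apriori_itemsets` and the toy transactions are assumptions made for this example.

```python
from itertools import combinations


def apriori_itemsets(transactions, min_support):
    """Return {itemset: relative support} for all frequent itemsets.

    Level-wise search: frequent k-itemsets are joined into (k+1)-candidates,
    and a candidate is kept only if all of its k-subsets are frequent.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        # Join step: unions of two frequent k-itemsets that have size k + 1.
        candidates = {a | b for a in current for b in current if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k))
        }
        current = {c for c in candidates if support(c) >= min_support}
        k += 1
    return frequent


# Toy transactions in the season,day/night,deficit encoding (made up).
toy = [
    ['2', '666', '66672'],
    ['2', '666', '66672'],
    ['2', '667', '66665'],
    ['3', '666', '66670'],
]
result = apriori_itemsets(toy, min_support=0.5)
for itemset, supp in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), supp)
```

The prune step is what distinguishes Apriori from brute-force enumeration: an itemset can only be frequent if every subset of it is frequent, so most candidates are discarded before their support is ever counted.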

### Rules Discovered: 16 in total

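minimum support = 0.01, minimum confidence = 0.7  
Among the rules found (deficits in units of 10,000 kW):  
A deficit of 250 to 300 implies daytime, with confidence about 85%;  
a deficit of 300 to 350 implies summer, with confidence about 85%.  
Spring with a deficit of 150 to 200 implies daytime, with confidence about 90%;  
summer with a deficit of 250 to 300 implies daytime, with confidence about 82%;  
autumn with a deficit of 200 to 250 implies daytime, with confidence about 81%.  
Summer with a deficit of -50 to 0 implies nighttime, with confidence about 80%.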

In [20]:
def whatisthis(i):
    if i == '1':
        return 'SPRING'
    elif i == '2':
        return 'SUMMER'
    elif i == '3':
        return 'AUTUMN'
    elif i == '4':
        return 'WINTER'
    elif i == '666':
        return 'DAY'
    elif i == '667':
        return 'NIGHT'
    else:
        return str((int(i) - 66666) * 50)

In [21]:
rules = []
rules.append((('4', '66670'), ('666',), 0.701))
rules.append((('66665', '3'), ('667',), 0.702))
rules.append((('66667', '2'), ('667',), 0.717))
rules.append((('66665',), ('667',), 0.719))
rules.append((('66664', '3'), ('667',), 0.721))
rules.append((('66664',), ('667',), 0.756))
rules.append((('66666', '4'), ('667',), 0.767))
rules.append((('2', '66666'), ('667',), 0.770))
rules.append((('66670',), ('666',), 0.771))
rules.append((('2', '66670'), ('666',), 0.789))
rules.append((('66665', '2'), ('667',), 0.802))
rules.append((('3', '66670'), ('666',), 0.806))
rules.append((('2', '66671'), ('666',), 0.821))
rules.append((('66672',), ('2',), 0.846))
rules.append((('66671',), ('666',), 0.850))
rules.append((('1', '66669'), ('666',), 0.901))
for antecedent, consequent, confidence in rules:
    ant = tuple(whatisthis(x) for x in antecedent)
    con = tuple(whatisthis(x) for x in consequent)
    print(ant, '->', con, 'conf =', confidence)

('WINTER', '200') -> ('DAY',) conf = 0.701
('-50', 'AUTUMN') -> ('NIGHT',) conf = 0.702
('50', 'SUMMER') -> ('NIGHT',) conf = 0.717
('-50',) -> ('NIGHT',) conf = 0.719
('-100', 'AUTUMN') -> ('NIGHT',) conf = 0.721
('-100',) -> ('NIGHT',) conf = 0.756
('0', 'WINTER') -> ('NIGHT',) conf = 0.767
('SUMMER', '0') -> ('NIGHT',) conf = 0.77
('200',) -> ('DAY',) conf = 0.771
('SUMMER', '200') -> ('DAY',) conf = 0.789
('-50', 'SUMMER') -> ('NIGHT',) conf = 0.802
('AUTUMN', '200') -> ('DAY',) conf = 0.806
('SUMMER', '250') -> ('DAY',) conf = 0.821
('300',) -> ('SUMMER',) conf = 0.846
('250',) -> ('DAY',) conf = 0.85
('SPRING', '150') -> ('DAY',) conf = 0.901


### What I Have Learned

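By choosing a coarser granularity (season and day/night instead of month and hour), I obtained more association rules: 16 instead of 1, even with a higher minimum support (0.01 instead of 0.005).  
Finding association rules may be easy, but finding meaningful ones is difficult.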
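
One common way to judge whether a rule is informative (it was not part of the analysis above) is lift: the rule's confidence divided by the overall frequency of its consequent. A lift close to 1 means the antecedent adds little. The sketch below is illustrative only; the baseline of 0.5 for DAY is an assumption based on DAY covering 12 of the 24 hours (06:00 to 17:59) and was not measured from `task3.tsv`.

```python
def lift(confidence, consequent_support):
    """lift(A -> B) = confidence(A -> B) / support(B), support as a fraction."""
    return confidence / consequent_support


# 0.901 and 0.701 are the highest and lowest confidences among the
# "... -> DAY" rules listed earlier.
# ASSUMPTION: DAY makes up about half of all transactions (12 of 24 hours).
assumed_day_support = 0.5
print(round(lift(0.901, assumed_day_support), 3))  # 1.802
print(round(lift(0.701, assumed_day_support), 3))  # 1.402
```

Under that assumed baseline, both rules do better than chance, but a rule that only just clears the 0.7 confidence threshold says less than its confidence alone suggests.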