# Ruleminer tutorial

Welcome to the tutorial on ruleminer! 

This tutorial explains the most important features of the ruleminer package.

The ruleminer package works with Pandas DataFrames.

In [None]:
import pandas as pd
import numpy as np
import ruleminer

Let's construct a simple dataframe to do some rule mining.

In [None]:
col = ['Name', 'Type', 'Assets', 'TP-life', 'TP-nonlife', 'Own funds', 'Diversification', 'Excess']
insurers = [['Insurer  1', 'life insurer',     1000,  800,    0,  200,   12,  200], 
            ['Insurer  2', 'non-life insurer',   40,    0,   32,    8,    9,    8], 
            ['Insurer  3', 'non-life insurer',  800,    0,  700,  100,   -1,  100],
            ['Insurer  4', 'life insurer',       25,   18,    0,    7,    8,    7], 
            ['Insurer  5', 'non-life insurer', 2100,    0, 2200,  200,   12,  200], 
            ['Insurer  6', 'life insurer',      907,  887,    0,   20,    7,   20],
            ['Insurer  7', 'life insurer',     7123,    0, 6800,  323,    5,  323],
            ['Insurer  8', 'life insurer',     6100, 5920,    0,  180,   14,  180],
            ['Insurer  9', 'non-life insurer', 9011,    0, 8800,  211,   19,  211],
            ['Insurer 10', 'non-life insurer', 1034,    0,  901,  133,    1,  134]]
df = pd.DataFrame(columns = col, data = insurers)
df.set_index('Name', inplace = True)
df

Can we find the errors in this report?


### Rule with equal values

In [None]:
templates = [{'expression': '({".*"}=={".*"})'}]
r = ruleminer.RuleMiner(templates=templates, data=df)
r.rules

With the equality pattern you can control how precisely two values must match. Use the `decimal` parameter for this.

In [None]:
params = {'decimal': 0}
templates = [{'expression': '({".*"}=={".*"})'}]
r = ruleminer.RuleMiner(templates=templates, data=df, params=params)
r.rules
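
To make the effect of a decimal precision concrete, here is a minimal sketch of one common convention, the one used by `numpy.testing.assert_almost_equal`: two values count as equal when they differ by less than `1.5 * 10**(-decimal)`. That ruleminer applies exactly this tolerance is an assumption, and `almost_equal` is a hypothetical helper, not part of the package.

```python
def almost_equal(a, b, decimal=0):
    """Return True when a and b differ by less than 1.5 * 10**(-decimal).

    Assumption: this mirrors numpy.testing.assert_almost_equal; the
    tolerance that ruleminer applies internally may differ.
    """
    return abs(a - b) < 1.5 * 10 ** (-decimal)


# 'Own funds' and 'Excess' of Insurer 10 in the example data differ by 1.
own_funds, excess = 133, 134

print(almost_equal(own_funds, excess, decimal=0))  # tolerance 1.5  -> True
print(almost_equal(own_funds, excess, decimal=1))  # tolerance 0.15 -> False
```

Under this convention a difference of 1 is tolerated with `decimal=0` but becomes an exception at higher precision.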

### Patterns with a constant value

In [None]:
params = {'filter': {'confidence': 0.5, 'abs support': 2}, 
          'decimal': 0}
templates = [{'expression': '({".*"}>=0)'}]
r = ruleminer.RuleMiner(templates=templates, data=df, params=params)
r.rules

So we have six patterns, one for each numeric column. One of them has an exception: the column 'Diversification' contains one negative value.
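
The counts behind such a pattern can be checked by hand. The sketch below uses plain pandas, not ruleminer, and assumes that confidence is the number of rows satisfying the rule divided by the number of rows where it could apply. The standalone `diversification` series is a copy of the column from the example data.

```python
import pandas as pd

# Copy of the 'Diversification' column from the example data.
diversification = pd.Series([12, 9, -1, 8, 12, 7, 5, 14, 19, 1],
                            name="Diversification")

holds = diversification >= 0        # True where the rule is satisfied
support = int(holds.sum())          # rows that satisfy the rule
exceptions = int((~holds).sum())    # rows that violate the rule
confidence = support / (support + exceptions)

print(support, exceptions, confidence)  # 9 1 0.9
```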

### Sum-patterns

A sum pattern checks whether two columns add up to a third column. As an expression, this looks as follows:

In [None]:
params = {'filter': {'confidence': 0.5, 'abs support': 2}, 
          'decimal': 0}
templates = [{'expression': 'if ({"TP."}>0) then (({"TP."}+{".*"})=={".*"})'}]
r = ruleminer.RuleMiner(templates=templates, data=df, params=params)
r.rules

In [None]:
r.rules.values
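
As an independent check, one instantiation of this template can be evaluated directly with pandas. This is a sketch, not ruleminer output: it hard-codes the column combination `TP-nonlife + Own funds == Assets` and rebuilds only the columns needed from the example data.

```python
import pandas as pd

# Subset of the example data needed for the sum check.
df_check = pd.DataFrame(
    {
        "Assets":     [1000, 40, 800, 25, 2100, 907, 7123, 6100, 9011, 1034],
        "TP-nonlife": [0, 32, 700, 0, 2200, 0, 6800, 0, 8800, 901],
        "Own funds":  [200, 8, 100, 7, 200, 20, 323, 180, 211, 133],
    },
    index=[f"Insurer {i:2d}" for i in range(1, 11)],
)

# Condition of the rule: the insurer has non-life technical provisions.
condition = df_check["TP-nonlife"] > 0
# Consequent of the rule: the two columns add up to Assets.
sum_holds = (df_check["TP-nonlife"] + df_check["Own funds"]) == df_check["Assets"]

# Rows where the condition applies but the sum does not add up.
sum_exceptions = df_check[condition & ~sum_holds]
print(sum_exceptions)
```

The only row left is Insurer 5, where 2200 + 200 does not equal the reported assets of 2100.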

### Conditional patterns


With the conditional pattern you can find conditional statements between columns, such as IF TP-life = 0 THEN TP-nonlife > 0:

In [None]:
params = {'filter': {'confidence': 0.5, 'abs support': 2}, 
          'decimal': 0}
templates = [{'expression': 'if ({"TP-life"}==0) then ({"TP-nonlife"} > 0)'}]
r = ruleminer.RuleMiner(templates=templates, data=df, params=params)
r.rules
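
The support and confidence of this conditional rule can also be reproduced by hand. The sketch below uses plain pandas and assumes that the `filter` values act as minimum thresholds on confidence and absolute support. The variable names are illustrative only.

```python
import pandas as pd

# The two technical-provision columns from the example data.
tp = pd.DataFrame({
    "TP-life":    [800, 0, 0, 18, 0, 887, 0, 5920, 0, 0],
    "TP-nonlife": [0, 32, 700, 0, 2200, 0, 6800, 0, 8800, 901],
})

antecedent = tp["TP-life"] == 0      # the "if" part
consequent = tp["TP-nonlife"] > 0    # the "then" part

# Only rows where the antecedent holds count towards the rule.
cond_support = int((antecedent & consequent).sum())
cond_exceptions = int((antecedent & ~consequent).sum())
cond_confidence = cond_support / (cond_support + cond_exceptions)

# Assumed meaning of the filter: keep rules that reach both minimums.
passes_filter = cond_confidence >= 0.5 and cond_support >= 2
print(cond_support, cond_exceptions, cond_confidence, passes_filter)
```

Rows where TP-life is not zero are ignored: they neither support nor contradict the rule.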


The following metrics are currently available:

* added value
* conviction
* casual confidence
* casual support
* lift
* relative support
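
For orientation, the sketch below computes these metrics from row counts, using the textbook definitions from the association rule literature. It assumes that 'casual confidence' and 'casual support' in the list correspond to causal confidence and causal support. The formulas ruleminer uses internally may differ, and `rule_metrics` is a hypothetical helper.

```python
import math


def rule_metrics(n, n_x, n_y, n_xy):
    """Textbook metrics for a rule 'if X then Y', computed from row counts.

    n: total rows, n_x: rows where X holds, n_y: rows where Y holds,
    n_xy: rows where both hold. Assumes 0 < n_x and 0 < n_y < n.
    """
    p_y = n_y / n
    confidence = n_xy / n_x
    n_neither = n - n_x - n_y + n_xy   # rows where neither X nor Y holds
    n_not_y = n - n_y                  # rows where Y does not hold
    return {
        "relative support": n_xy / n,
        "added value": confidence - p_y,
        "lift": confidence / p_y,
        # Conviction is unbounded when the rule has no exceptions.
        "conviction": math.inf if confidence == 1 else (1 - p_y) / (1 - confidence),
        "causal support": (n_xy + n_neither) / n,
        "causal confidence": 0.5 * (confidence + n_neither / n_not_y),
    }


# Counts for 'if TP-life == 0 then TP-nonlife > 0' in the example data.
metrics = rule_metrics(n=10, n_x=6, n_y=6, n_xy=6)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```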


## Background

Our approach to pattern mining differs somewhat from traditional association rule mining. Association rules work on a set of items (binary attributes), and in the original definition the items in the set are not linked to column names. Often, however, we want to find associations between the values of specific columns in a dataset. The pattern mining applied here finds patterns between the values of different columns in a dataset, while using the basic measures of association rule mining such as support and confidence.
