# Pattern mining tutorial

Welcome to the tutorial on pattern mining! 

This tutorial explains the most important features of the data-patterns package.

The data-patterns package works with Pandas DataFrames.

In [None]:
import pandas as pd
import data_patterns
GLOBALS = {'data_patterns': data_patterns}

Let's construct a simple dataframe to do some pattern mining.

In [None]:
col = ['Name', 'Type', 'Assets', 'TV-life', 'TV-nonlife', 'Own funds', 'Diversification','Excess']
insurers = [['Insurer  1', 'life insurer',     1000,  800,    0,  200,   12,  200], 
            ['Insurer  2', 'non-life insurer',   40,    0,   32,    8,    9,    8], 
            ['Insurer  3', 'non-life insurer',  800,    0,  700,  100,   -1,  100],
            ['Insurer  4', 'life insurer',       25,   18,    0,    7,    8,    7], 
            ['Insurer  5', 'non-life insurer', 2100,    0, 2200,  200,   12,  200], 
            ['Insurer  6', 'life insurer',      907,  887,    0,   20,    7,   20],
            ['Insurer  7', 'life insurer',     7123,    0, 6800,  323,    5,  323],
            ['Insurer  8', 'life insurer',     6100, 5920,    0,  180,   14,  180],
            ['Insurer  9', 'non-life insurer', 9011,    0, 8800,  211,   19,  211],
            ['Insurer 10', 'non-life insurer', 1034,    0,  901,  133,    1,  134]]
df = pd.DataFrame(columns = col, data = insurers)
df.set_index('Name', inplace = True)
df

Can we find the errors in this report?


### Patterns with a constant value

To find patterns, construct a PatternMiner object and pass it a pattern definition. Then call the find function. The result is a Pandas DataFrame with the patterns that were found.

First, let's find the columns whose values are greater than or equal to zero.

In [None]:
p1 = {'name'      : 'positive values', 
      'pattern'   : '>=',
      'value'     : 0,
      'parameters': {'min_confidence': 0.5,
                     'min_support'   : 2}}
miner = data_patterns.PatternMiner(p1)
miner.find(df)

So we have six patterns, one for each numeric column, with a single exception: the column 'Diversification' contains one negative value.
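To make the `min_confidence` and `min_support` thresholds concrete, the sketch below recomputes the statistics for the `>= 0` pattern on the 'Diversification' column in plain Python. It assumes that support is the number of rows that confirm the pattern and that confidence is the share of confirming rows; the package's exact definitions may differ.

```python
# 'Diversification' values copied from the DataFrame above (Insurer 1 to 10).
diversification = [12, 9, -1, 8, 12, 7, 5, 14, 19, 1]

# Assumed definitions (not taken from the data_patterns source):
# support = number of confirmations,
# confidence = confirmations / (confirmations + exceptions).
confirmations = sum(1 for v in diversification if v >= 0)
exceptions = len(diversification) - confirmations
confidence = confirmations / (confirmations + exceptions)

print(confirmations, exceptions, confidence)  # 9 1 0.9
```

Under those assumptions, the pattern clears both thresholds used above (`min_confidence` of 0.5 and `min_support` of 2).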

### Patterns with equal values

Now, let's find patterns with equal columns.

In [None]:
parameters = {'min_confidence': 0.5,'min_support'   : 2}
p2 = {'name'      : 'equal values', 
      'pattern'   : '=',
      'parameters': parameters}
miner = data_patterns.PatternMiner(p2)
miner.find(df)

When using the equal pattern, you can set the precision of the comparison with the decimal parameter.

In [None]:
parameters = {'min_confidence': 0.5, 'min_support': 2, 'decimal': -1}

We now run the miner with the alternative parameters.

In [None]:
p2_alt = {'name'      : 'equal values', 
          'pattern'   : '=',
          'parameters': parameters}
miner = data_patterns.PatternMiner(p2_alt)
miner.find(df)
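One way to read `decimal: -1` is that values are compared after rounding to the nearest ten. That reading is an assumption, not something this tutorial confirms; the package may apply a tolerance instead. The sketch uses Python's built-in `round` to show the effect on 'Own funds' and 'Excess' for Insurer 10, the only row where the two columns differ. The helper `equal_at` is hypothetical.

```python
own_funds, excess = 133, 134  # Insurer 10 in the DataFrame above

def equal_at(a, b, decimal):
    """Hypothetical helper: compare two numbers after rounding.

    Assumes 'decimal' behaves like the second argument of round().
    """
    return round(a, decimal) == round(b, decimal)

print(equal_at(own_funds, excess, 0))   # False: 133 != 134
print(equal_at(own_funds, excess, -1))  # True: both round to 130
```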

### Sum-patterns

To find sum patterns you can use

In [None]:
p3 = {'name'   : 'sum pattern',
      'pattern': 'sum',
      'parameters': {"min_confidence": 0.5,
                     "min_support"   : 1}}
miner = data_patterns.PatternMiner(p3)
miner.find(df)
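A sum pattern states that some columns add up to another column. As a hand check in plain Python, the sketch below tests one candidate, 'TV-life' + 'TV-nonlife' + 'Own funds' = 'Assets', against the rows of the DataFrame above. This candidate is chosen for illustration; which sums the miner actually reports is not shown here.

```python
# (Name, Assets, TV-life, TV-nonlife, Own funds), copied from the DataFrame above.
rows = [
    ('Insurer  1', 1000,  800,    0, 200),
    ('Insurer  2',   40,    0,   32,   8),
    ('Insurer  3',  800,    0,  700, 100),
    ('Insurer  4',   25,   18,    0,   7),
    ('Insurer  5', 2100,    0, 2200, 200),
    ('Insurer  6',  907,  887,    0,  20),
    ('Insurer  7', 7123,    0, 6800, 323),
    ('Insurer  8', 6100, 5920,    0, 180),
    ('Insurer  9', 9011,    0, 8800, 211),
    ('Insurer 10', 1034,    0,  901, 133),
]

# Rows where the three components do not add up to 'Assets'.
exceptions = [name for name, assets, tv_life, tv_nonlife, own_funds in rows
              if tv_life + tv_nonlife + own_funds != assets]

print(exceptions)  # ['Insurer  5']
```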

### Patterns in whether cells are reported or not

Suppose we expect a relation between the type of insurer and whether certain columns are reported. For this, we can define a pattern with P_columns and Q_columns and initialize a PatternMiner object with it.

In [None]:
p4 = {'name'     : 'type pattern',
      'P_columns': ['Type'],
      'Q_columns': ['Assets', 'TV-life', 'TV-nonlife', 'Own funds'],
      'encode'   : {'Assets'    : data_patterns.reported,
                    'TV-life'   : data_patterns.reported,
                    'TV-nonlife': data_patterns.reported,
                    'Own funds' : data_patterns.reported}}
p = data_patterns.PatternMiner(p4)
p.find(df)
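The idea behind this pattern can be checked by hand. The sketch assumes that `data_patterns.reported` labels a zero as 'not reported' and any other value as 'reported'; the helper `reported_label` is a hypothetical stand-in, not the package's implementation. It then counts how often life insurers report 'TV-life'.

```python
# (Name, Type, TV-life), copied from the DataFrame above.
rows = [
    ('Insurer  1', 'life insurer',      800),
    ('Insurer  2', 'non-life insurer',    0),
    ('Insurer  3', 'non-life insurer',    0),
    ('Insurer  4', 'life insurer',       18),
    ('Insurer  5', 'non-life insurer',    0),
    ('Insurer  6', 'life insurer',      887),
    ('Insurer  7', 'life insurer',        0),
    ('Insurer  8', 'life insurer',     5920),
    ('Insurer  9', 'non-life insurer',    0),
    ('Insurer 10', 'non-life insurer',    0),
]

def reported_label(value):
    # Hypothetical stand-in; assumes 0 means "not reported".
    return 'not reported' if value == 0 else 'reported'

# Candidate rule: if Type is 'life insurer' then 'TV-life' is reported.
life = [(name, reported_label(tv_life))
        for name, kind, tv_life in rows if kind == 'life insurer']
exceptions = [name for name, label in life if label != 'reported']
confidence = (len(life) - len(exceptions)) / len(life)

print(exceptions, confidence)  # ['Insurer  7'] 0.8
```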

### Combining patterns 

You can run the miner with a list of pattern definitions.

In [None]:
miner = data_patterns.PatternMiner([p1, p2, p3, p4])
df_patterns = miner.find(df)

In [None]:
df_patterns

### Getting different codings of patterns

Now that we have the patterns, we can transform them to different codings.

The Pandas code for the confirmations of the pattern with index 12 is

In [None]:
pattern_text = df_patterns.loc[12, 'pandas co']
print(pattern_text)

You can evaluate the Pandas code directly with the eval-function inside Python.

In [None]:
eval(pattern_text, GLOBALS, {'df': df})
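`eval` takes the expression string plus two dictionaries that supply the names used in it: the globals and the locals. The sketch below shows the mechanism with a made-up expression and a plain dict standing in for the DataFrame, so it runs without any packages; the actual string stored in `df_patterns` will look different. Since `eval` executes whatever code the string contains, use it only on patterns from a source you trust.

```python
# Stand-in data: 'Diversification' for three insurers from the DataFrame above.
data = {'Insurer  2': 9, 'Insurer  3': -1, 'Insurer  4': 8}

# Hypothetical expression string; not the format produced by data_patterns.
expression = "sorted(name for name, value in data.items() if value < 0)"

# Names in the expression are looked up in the locals dict first,
# then in the globals dict, then among the built-ins.
result = eval(expression, {}, {'data': data})

print(result)  # ['Insurer  3']
```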

The code for the XBRL-validation of the confirmation of this pattern is

In [None]:
df_patterns.loc[12, 'xbrl co']

### Analyzing results

If you want to know the results of the patterns per insurer then you can use the analyze-function.

In [None]:
df_results = miner.analyze(df)

df_results is a proper Pandas DataFrame, so you can use the usual DataFrame operations on it. For example, you can select all exceptions to the patterns.

In [None]:
df_results[df_results['result_type']==False]

### Export to and import from Excel

You can export the DataFrame with the patterns with the to_excel-function. This produces an Excel file in a human-readable format.

In [None]:
df_patterns.to_excel(filename = "patterns.xlsx")

You can read the Excel file with the patterns back into a PatternMiner object in the following way.

In [None]:
# Read back the file that was written with to_excel
p = data_patterns.PatternMiner(df_patterns = data_patterns.read_excel(filename = "patterns.xlsx"))

In [None]:
df_patterns = p.update_statistics(df)
df_patterns

## Background

Our approach to pattern mining is somewhat different from traditional association rule mining. Association rules work on a set of items (binary attributes). In the original definition, the items in the set are not linked to column names. However, we often want to find associations between the values of specific columns in a dataset. The pattern mining applied here finds patterns between the values of different columns in a dataset, while using the basic measures of association rule mining such as support and confidence.
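
For reference, the two measures are usually defined as follows in the association rule literature, for a rule $X \Rightarrow Y$ over a set of transactions $T$. These are the textbook definitions, not necessarily the ones the package implements: the `min_support` values used in this tutorial are whole numbers, which suggests that the package counts confirming rows rather than using a fraction, but that is an inference.

$$
\mathrm{supp}(X) = \frac{|\{t \in T : X \subseteq t\}|}{|T|},
\qquad
\mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}
$$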
