# Frequent Itemset Mining

## Mlxtend package

`mlxtend` (Machine Learning Extensions) is a Python library that provides additional utility functions for machine learning tasks. It can be installed with the following command.

    pip install mlxtend

### Apriori Algorithm

The `mlxtend` package provides an implementation of the Apriori algorithm: the `apriori` function in the `frequent_patterns` module.

In [1]:
from mlxtend.frequent_patterns import apriori

# Consider a transaction database: each inner list is one transaction
dataset = [['Milk', 'Bread', 'Butter', 'Cheese', 'Paneer', 'Curd'],
           ['Barfi', 'Bread', 'Butter', 'Cheese', 'Paneer', 'Curd'],
           ['Milk', 'Peda', 'Cheese', 'Paneer'],
           ['Milk', 'Barfi', 'Peda', 'Cheese', 'Curd'],
           ['Curd', 'Bread', 'Cheese', 'Ice cream', 'Paneer']]

The `apriori` function expects the transactions as a one-hot encoded pandas DataFrame, so the dataset must first be transformed into that format.

This can be done with the `TransactionEncoder` class defined in the `preprocessing` module of `mlxtend`.

In [2]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder

te = TransactionEncoder()  # instantiate the encoder
te = te.fit(dataset)  # learn the unique item labels in the dataset
coded_dataset = te.transform(dataset)  # boolean array, one row per transaction
df = pd.DataFrame(coded_dataset, columns=te.columns_)  # wrap as a DataFrame with item names as columns
df

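|   | Barfi | Bread | Butter | Cheese | Curd | Ice cream | Milk | Paneer | Peda |
|---|---|---|---|---|---|---|---|---|---|
| 0 | False | True | True | True | True | False | True | True | False |
| 1 | True | True | True | True | True | False | False | True | False |
| 2 | False | False | False | True | False | False | True | True | True |
| 3 | True | False | False | True | True | False | True | False | True |
| 4 | False | True | False | True | True | True | False | True | False |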


Observe that the columns of the DataFrame have been arranged in alphabetical order of item names.
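
To make the encoding step concrete, here is a minimal hand-rolled sketch of one-hot encoding the same transactions in plain Python. It only illustrates the idea and is not how `TransactionEncoder` is implemented; the names `columns` and `rows` are chosen here for the example.

```python
# Hand-rolled one-hot encoding of a transaction list (illustrative only;
# this is not mlxtend's implementation of TransactionEncoder).
dataset = [['Milk', 'Bread', 'Butter', 'Cheese', 'Paneer', 'Curd'],
           ['Barfi', 'Bread', 'Butter', 'Cheese', 'Paneer', 'Curd'],
           ['Milk', 'Peda', 'Cheese', 'Paneer'],
           ['Milk', 'Barfi', 'Peda', 'Cheese', 'Curd'],
           ['Curd', 'Bread', 'Cheese', 'Ice cream', 'Paneer']]

# Collect the distinct items and sort them; these become the columns.
columns = sorted({item for transaction in dataset for item in transaction})

# One boolean row per transaction: True where the item occurs in it.
rows = [[item in transaction for item in columns] for transaction in dataset]

print(columns)
for row in rows:
    print(row)
```

Sorting the distinct items is what produces the alphabetical column order in this sketch.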

Next, use the `apriori` function to obtain the frequent itemsets.

In [3]:
FreqSets = apriori(df, min_support=0.6, use_colnames=True)
FreqSets

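|   | support | itemsets |
|---|---|---|
| 0 | 0.6 | (Bread) |
| 1 | 1.0 | (Cheese) |
| 2 | 0.8 | (Curd) |
| 3 | 0.6 | (Milk) |
| 4 | 0.8 | (Paneer) |
| 5 | 0.6 | (Cheese, Bread) |
| 6 | 0.6 | (Curd, Bread) |
| 7 | 0.6 | (Paneer, Bread) |
| 8 | 0.8 | (Curd, Cheese) |
| 9 | 0.6 | (Milk, Cheese) |

Only the first ten rows of the output are shown.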


Note the `use_colnames=True` argument passed to the `apriori` function. By default `use_colnames` is `False`, and the items are then reported by their column index rather than by name, as shown below.

In [4]:
apriori(df, min_support=0.6)

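|   | support | itemsets |
|---|---|---|
| 0 | 0.6 | (1) |
| 1 | 1.0 | (3) |
| 2 | 0.8 | (4) |
| 3 | 0.6 | (6) |
| 4 | 0.8 | (7) |
| 5 | 0.6 | (1, 3) |
| 6 | 0.6 | (1, 4) |
| 7 | 0.6 | (1, 7) |
| 8 | 0.8 | (3, 4) |
| 9 | 0.6 | (3, 6) |

Only the first ten rows of the output are shown.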
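
As a cross-check on the `apriori` output, the level-wise idea behind the Apriori algorithm can be sketched in plain Python. This is a minimal illustration on the same five transactions, not the `mlxtend` implementation; the helper names `support` and `apriori_sketch` are hypothetical and exist only for this example.

```python
from itertools import combinations

dataset = [['Milk', 'Bread', 'Butter', 'Cheese', 'Paneer', 'Curd'],
           ['Barfi', 'Bread', 'Butter', 'Cheese', 'Paneer', 'Curd'],
           ['Milk', 'Peda', 'Cheese', 'Paneer'],
           ['Milk', 'Barfi', 'Peda', 'Cheese', 'Curd'],
           ['Curd', 'Bread', 'Cheese', 'Ice cream', 'Paneer']]
transactions = [frozenset(t) for t in dataset]


def support(itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def apriori_sketch(min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    items = {item for t in transactions for item in t}
    # Level 1: frequent single items.
    level = {frozenset([i]) for i in items
             if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while level:
        frequent.update({s: support(s) for s in level})
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets with exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))}
        # Keep only the candidates that meet the support threshold.
        level = {c for c in candidates if support(c) >= min_support}
    return frequent


freq = apriori_sketch(0.6)
for itemset, sup in sorted(freq.items(),
                           key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

The prune step relies on the fact that an itemset cannot be frequent unless all of its subsets are frequent, which is what lets the search skip most candidates.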


### Mining Strong Association Rules

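Association rules can be derived from the frequent itemsets mined above, using the `association_rules` function in the `frequent_patterns` module. By default the function evaluates rules by confidence, so `min_threshold=0.8` keeps only the rules whose confidence is at least 0.8.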

In [5]:
from mlxtend.frequent_patterns import association_rules
association_rules(FreqSets, min_threshold=0.8)

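|   | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 0 | (Bread) | (Cheese) | 0.6 | 1.0 | 0.6 | 1.0 | 1.0 | 0.0 | inf |
| 1 | (Bread) | (Curd) | 0.6 | 0.8 | 0.6 | 1.0 | 1.25 | 0.12 | inf |
| 2 | (Bread) | (Paneer) | 0.6 | 0.8 | 0.6 | 1.0 | 1.25 | 0.12 | inf |
| 3 | (Curd) | (Cheese) | 0.8 | 1.0 | 0.8 | 1.0 | 1.0 | 0.0 | inf |
| 4 | (Cheese) | (Curd) | 1.0 | 0.8 | 0.8 | 0.8 | 1.0 | 0.0 | 1.0 |
| 5 | (Milk) | (Cheese) | 0.6 | 1.0 | 0.6 | 1.0 | 1.0 | 0.0 | inf |
| 6 | (Cheese) | (Paneer) | 1.0 | 0.8 | 0.8 | 0.8 | 1.0 | 0.0 | 1.0 |
| 7 | (Paneer) | (Cheese) | 0.8 | 1.0 | 0.8 | 1.0 | 1.0 | 0.0 | inf |
| 8 | (Curd, Bread) | (Cheese) | 0.6 | 1.0 | 0.6 | 1.0 | 1.0 | 0.0 | inf |
| 9 | (Cheese, Bread) | (Curd) | 0.6 | 0.8 | 0.6 | 1.0 | 1.25 | 0.12 | inf |

Only the first ten rows of the output are shown.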
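
The confidence values in such a rules table can be reproduced by hand. The sketch below, in plain Python on the same five transactions, computes confidence as the support of the antecedent and consequent together divided by the support of the antecedent, then applies a 0.8 threshold. The helper names `support` and `confidence` are hypothetical and are not part of `mlxtend`.

```python
dataset = [['Milk', 'Bread', 'Butter', 'Cheese', 'Paneer', 'Curd'],
           ['Barfi', 'Bread', 'Butter', 'Cheese', 'Paneer', 'Curd'],
           ['Milk', 'Peda', 'Cheese', 'Paneer'],
           ['Milk', 'Barfi', 'Peda', 'Cheese', 'Curd'],
           ['Curd', 'Bread', 'Cheese', 'Ice cream', 'Paneer']]
transactions = [set(t) for t in dataset]


def support(items):
    """Fraction of transactions containing every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Support of antecedent and consequent together, over support of antecedent."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)


min_confidence = 0.8  # same threshold as in the call above
rules = [(['Bread'], ['Cheese']),
         (['Cheese'], ['Bread']),
         (['Cheese'], ['Curd'])]
for antecedent, consequent in rules:
    conf = confidence(antecedent, consequent)
    verdict = 'kept' if conf >= min_confidence else 'dropped'
    print(antecedent, '->', consequent, round(conf, 2), verdict)
```

Confidence is not symmetric: every transaction with Bread also has Cheese, but only three of the five transactions with Cheese have Bread, so the reverse rule falls below the threshold.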


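To retain only the rules whose lift exceeds 1, filter the result with `query`. The call to `reset_index()` keeps the original row labels in a new `index` column.

In [6]: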
int_rules = association_rules(FreqSets, min_threshold=0.8).query('lift > 1').reset_index()
int_rules

|   | index | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | (Bread) | (Curd) | 0.6 | 0.8 | 0.6 | 1.0 | 1.25 | 0.12 | inf |
| 1 | 2 | (Bread) | (Paneer) | 0.6 | 0.8 | 0.6 | 1.0 | 1.25 | 0.12 | inf |
| 2 | 9 | (Cheese, Bread) | (Curd) | 0.6 | 0.8 | 0.6 | 1.0 | 1.25 | 0.12 | inf |
| 3 | 10 | (Bread) | (Curd, Cheese) | 0.6 | 0.8 | 0.6 | 1.0 | 1.25 | 0.12 | inf |
| 4 | 12 | (Cheese, Bread) | (Paneer) | 0.6 | 0.8 | 0.6 | 1.0 | 1.25 | 0.12 | inf |
| 5 | 13 | (Bread) | (Cheese, Paneer) | 0.6 | 0.8 | 0.6 | 1.0 | 1.25 | 0.12 | inf |
| 6 | 14 | (Curd, Paneer) | (Bread) | 0.6 | 0.6 | 0.6 | 1.0 | 1.666667 | 0.24 | inf |
| 7 | 15 | (Bread, Paneer) | (Curd) | 0.6 | 0.8 | 0.6 | 1.0 | 1.25 | 0.12 | inf |
| 8 | 16 | (Curd, Bread) | (Paneer) | 0.6 | 0.8 | 0.6 | 1.0 | 1.25 | 0.12 | inf |
| 9 | 17 | (Bread) | (Curd, Paneer) | 0.6 | 0.6 | 0.6 | 1.0 | 1.666667 | 0.24 | inf |

Only the first ten rows of the output are shown.
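
The `lift > 1` filter can be understood by computing lift by hand. In the sketch below, lift is taken as the support of the antecedent and consequent together divided by the product of their individual supports; a value of 1 means the two sides co-occur exactly as often as they would if they were independent. The helper names `support` and `lift` are hypothetical and are used only for this illustration.

```python
dataset = [['Milk', 'Bread', 'Butter', 'Cheese', 'Paneer', 'Curd'],
           ['Barfi', 'Bread', 'Butter', 'Cheese', 'Paneer', 'Curd'],
           ['Milk', 'Peda', 'Cheese', 'Paneer'],
           ['Milk', 'Barfi', 'Peda', 'Cheese', 'Curd'],
           ['Curd', 'Bread', 'Cheese', 'Ice cream', 'Paneer']]
transactions = [set(t) for t in dataset]


def support(items):
    """Fraction of transactions containing every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def lift(antecedent, consequent):
    """Joint support divided by the product of the two individual supports."""
    joint = support(set(antecedent) | set(consequent))
    return joint / (support(antecedent) * support(consequent))


rules = [(['Bread'], ['Cheese']),
         (['Bread'], ['Curd']),
         (['Curd', 'Paneer'], ['Bread'])]
for antecedent, consequent in rules:
    value = lift(antecedent, consequent)
    verdict = 'kept' if value > 1 else 'dropped'
    print(antecedent, '->', consequent, round(value, 6), verdict)
```

Because Cheese occurs in every transaction, knowing that a basket contains Bread says nothing new about Cheese, so that rule has a lift of exactly 1 and is filtered out despite its confidence of 1.0.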


### Home Work
Find out more about the measures **leverage** and **conviction**.