# Apriori: Frequent itemsets via the Apriori algorithm

The apriori function extracts frequent itemsets for association rule mining.

```py
from mlxtend.frequent_patterns import apriori
```

### Example 1: Generating Frequent Itemsets

Source: https://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/

The apriori function expects data in a one-hot encoded pandas DataFrame. Suppose we have the following JDM Cars data:

In [1]:
dataset = [
  ['Nissan Skyline', 'Toyota Supra', 'Mazda RX-7', 'Honda NSX', 'Subaru Impreza', 'Mitsubishi Lancer'],
  ['Toyota Supra', 'Mazda RX-7', 'Honda S2000', 'Subaru Impreza', 'Mitsubishi Lancer'],
  ['Nissan Skyline', 'Honda Civic', 'Subaru Impreza', 'Mitsubishi Lancer'],
  ['Nissan Skyline', 'Toyota AE86', 'Mazda MX-5', 'Subaru Impreza', 'Honda NSX'],
  ['Mazda MX-5', 'Toyota Supra', 'Toyota Supra', 'Subaru Impreza', 'Nissan Silvia', 'Honda S2000']
]

We can transform it into the right format via the TransactionEncoder as follows:

In [2]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder

te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)
df

Unnamed: 0,Honda Civic,Honda NSX,Honda S2000,Mazda MX-5,Mazda RX-7,Mitsubishi Lancer,Nissan Silvia,Nissan Skyline,Subaru Impreza,Toyota AE86,Toyota Supra
0,False,True,False,False,True,True,False,True,True,False,True
1,False,False,True,False,True,True,False,False,True,False,True
2,True,False,False,False,False,True,False,True,True,False,False
3,False,True,False,True,False,False,False,True,True,True,False
4,False,False,True,True,False,False,True,False,True,False,True
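
To make the encoding step concrete, here is a minimal sketch in plain Python of what a one-hot transaction encoding amounts to. It is not how TransactionEncoder is implemented; the helper name `one_hot_encode` is hypothetical, and the two short transactions are made up for illustration.

```py
def one_hot_encode(transactions):
    """Return (sorted item names, boolean rows) for a list of transactions."""
    # Collect the distinct items; sorting gives a stable, alphabetical column order.
    columns = sorted({item for transaction in transactions for item in transaction})
    # One row per transaction: True where the transaction contains the item.
    rows = [[item in set(transaction) for item in columns]
            for transaction in transactions]
    return columns, rows


# Illustrative transactions (not the full dataset from above).
transactions = [
    ['Nissan Skyline', 'Subaru Impreza', 'Mitsubishi Lancer'],
    ['Toyota Supra', 'Subaru Impreza', 'Toyota Supra'],  # the duplicate yields a single True
]
columns, rows = one_hot_encode(transactions)
print(columns)
for row in rows:
    print(row)
```

A repeated item inside one transaction, such as the doubled 'Toyota Supra' in the last transaction of the dataset, does not change the encoding, because each cell only records presence or absence.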


Now, let us return the items and itemsets with at least 55% support:

In [3]:
from mlxtend.frequent_patterns import apriori

apriori(df, min_support=0.55)

Unnamed: 0,support,itemsets
0,0.6,(5)
1,0.6,(7)
2,1.0,(8)
3,0.6,(10)
4,0.6,"(8, 5)"
5,0.6,"(8, 7)"
6,0.6,"(8, 10)"
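
The support of an itemset is the fraction of transactions that contain every item in it. The values in the table can be checked by hand with a short plain-Python sketch; the helper name `support` is hypothetical and not part of mlxtend.

```py
dataset = [
    ['Nissan Skyline', 'Toyota Supra', 'Mazda RX-7', 'Honda NSX', 'Subaru Impreza', 'Mitsubishi Lancer'],
    ['Toyota Supra', 'Mazda RX-7', 'Honda S2000', 'Subaru Impreza', 'Mitsubishi Lancer'],
    ['Nissan Skyline', 'Honda Civic', 'Subaru Impreza', 'Mitsubishi Lancer'],
    ['Nissan Skyline', 'Toyota AE86', 'Mazda MX-5', 'Subaru Impreza', 'Honda NSX'],
    ['Mazda MX-5', 'Toyota Supra', 'Toyota Supra', 'Subaru Impreza', 'Nissan Silvia', 'Honda S2000'],
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for transaction in transactions if itemset <= set(transaction))
    return hits / len(transactions)


print(support({'Subaru Impreza'}, dataset))                  # 1.0
print(support({'Mitsubishi Lancer'}, dataset))               # 0.6
print(support({'Subaru Impreza', 'Toyota Supra'}, dataset))  # 0.6
print(support({'Mazda RX-7'}, dataset))                      # 0.4, below the 0.55 threshold
```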


By default, apriori returns the column indices of the items. For better readability, set use_colnames=True to return the item names instead:

In [4]:
apriori(df, min_support=0.55, use_colnames=True)

Unnamed: 0,support,itemsets
0,0.6,(Mitsubishi Lancer)
1,0.6,(Nissan Skyline)
2,1.0,(Subaru Impreza)
3,0.6,(Toyota Supra)
4,0.6,"(Subaru Impreza, Mitsubishi Lancer)"
5,0.6,"(Subaru Impreza, Nissan Skyline)"
6,0.6,"(Subaru Impreza, Toyota Supra)"
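
The index-based and name-based results are related by column position: each integer in an index-based itemset is a position in the one-hot DataFrame's columns. The following sketch maps a few index-based itemsets back to names by hand. The `columns` list is copied from the table header above, and the variable names are illustrative only.

```py
# Column order of the one-hot encoded DataFrame shown earlier.
columns = [
    'Honda Civic', 'Honda NSX', 'Honda S2000', 'Mazda MX-5', 'Mazda RX-7',
    'Mitsubishi Lancer', 'Nissan Silvia', 'Nissan Skyline', 'Subaru Impreza',
    'Toyota AE86', 'Toyota Supra',
]

# A few itemsets as they appear in the index-based output.
index_itemsets = [frozenset({5}), frozenset({8, 7}), frozenset({8, 10})]

# Look up each position in the column list to recover the item names.
named_itemsets = [frozenset(columns[i] for i in itemset) for itemset in index_itemsets]

for itemset in named_itemsets:
    print(sorted(itemset))
```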


### Example 2: Selecting and Filtering Results

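Because the results are a pandas DataFrame, we can filter them with the usual pandas tools. For instance, suppose we are only interested in itemsets of length 2 that have a support of at least 55 percent.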

First, we create the frequent itemsets via apriori and add a new column that stores the length of each itemset:

In [5]:
frequent_itemsets = apriori(df, min_support=0.55, use_colnames=True)
frequent_itemsets['length'] = frequent_itemsets['itemsets'].apply(len)
frequent_itemsets

Unnamed: 0,support,itemsets,length
0,0.6,(Mitsubishi Lancer),1
1,0.6,(Nissan Skyline),1
2,1.0,(Subaru Impreza),1
3,0.6,(Toyota Supra),1
4,0.6,"(Subaru Impreza, Mitsubishi Lancer)",2
5,0.6,"(Subaru Impreza, Nissan Skyline)",2
6,0.6,"(Subaru Impreza, Toyota Supra)",2


Then, we select the itemsets of length 2 that have a support of at least 55 percent:

In [6]:
frequent_itemsets[ (frequent_itemsets['length'] == 2) &
                   (frequent_itemsets['support'] >= 0.55) ]

Unnamed: 0,support,itemsets,length
4,0.6,"(Subaru Impreza, Mitsubishi Lancer)",2
5,0.6,"(Subaru Impreza, Nissan Skyline)",2
6,0.6,"(Subaru Impreza, Toyota Supra)",2
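
If the helper column is not needed afterwards, the same selection can be written with masks computed on the fly. This is a sketch that runs without mlxtend: the DataFrame below is a hand-built stand-in that copies the rows from the output above, which is an assumption made for illustration.

```py
import pandas as pd

# Hand-built stand-in for the apriori result shown above.
frequent_itemsets = pd.DataFrame({
    'support': [0.6, 0.6, 1.0, 0.6, 0.6, 0.6, 0.6],
    'itemsets': pd.Series([
        frozenset({'Mitsubishi Lancer'}),
        frozenset({'Nissan Skyline'}),
        frozenset({'Subaru Impreza'}),
        frozenset({'Toyota Supra'}),
        frozenset({'Subaru Impreza', 'Mitsubishi Lancer'}),
        frozenset({'Subaru Impreza', 'Nissan Skyline'}),
        frozenset({'Subaru Impreza', 'Toyota Supra'}),
    ], dtype='object'),
})

# Named boolean masks keep the combined condition readable.
is_pair = frequent_itemsets['itemsets'].map(len) == 2
is_frequent = frequent_itemsets['support'] >= 0.55

pairs = frequent_itemsets[is_pair & is_frequent]
print(pairs)
```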


Similarly, using the pandas API, we can select entries based on the "itemsets" column:

In [7]:
frequent_itemsets[ frequent_itemsets['itemsets'] == {'Nissan Skyline', 'Subaru Impreza'} ]

Unnamed: 0,support,itemsets,length
5,0.6,"(Subaru Impreza, Nissan Skyline)",2
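
An equality comparison only matches the exact itemset. To select every itemset that contains a given item, a subset test can be applied row by row. The sketch below again uses a hand-built stand-in for the apriori result (an assumption for illustration), and the variable name `wanted` is hypothetical.

```py
import pandas as pd

# Hand-built stand-in for the apriori result shown above.
frequent_itemsets = pd.DataFrame({
    'support': [0.6, 0.6, 1.0, 0.6, 0.6, 0.6, 0.6],
    'itemsets': pd.Series([
        frozenset({'Mitsubishi Lancer'}),
        frozenset({'Nissan Skyline'}),
        frozenset({'Subaru Impreza'}),
        frozenset({'Toyota Supra'}),
        frozenset({'Subaru Impreza', 'Mitsubishi Lancer'}),
        frozenset({'Subaru Impreza', 'Nissan Skyline'}),
        frozenset({'Subaru Impreza', 'Toyota Supra'}),
    ], dtype='object'),
})

# Keep every itemset that includes all items in `wanted` (subset test).
wanted = {'Nissan Skyline'}
contains_wanted = frequent_itemsets['itemsets'].apply(lambda itemset: wanted <= itemset)

with_skyline = frequent_itemsets[contains_wanted]
print(with_skyline)
```

Here the single-item set and the pair that includes 'Nissan Skyline' are both returned, whereas the equality query matches one row at most.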


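### Frozensets

The entries in the "itemsets" column are of type frozenset, a built-in Python type that behaves like a set but is immutable. Because frozensets are sets, the order of the items does not matter, so the query
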
In [8]:
frequent_itemsets[ frequent_itemsets['itemsets'] == {'Nissan Skyline', 'Subaru Impreza'} ]

Unnamed: 0,support,itemsets,length
5,0.6,"(Subaru Impreza, Nissan Skyline)",2


is equivalent to any of the following three queries:

In [9]:
frequent_itemsets[ frequent_itemsets['itemsets'] == {'Subaru Impreza', 'Nissan Skyline'} ]

Unnamed: 0,support,itemsets,length
5,0.6,"(Subaru Impreza, Nissan Skyline)",2


In [10]:
frequent_itemsets[ frequent_itemsets['itemsets'] == frozenset(('Subaru Impreza', 'Nissan Skyline')) ]

Unnamed: 0,support,itemsets,length
5,0.6,"(Subaru Impreza, Nissan Skyline)",2


In [11]:
frequent_itemsets[ frequent_itemsets['itemsets'] == frozenset(('Nissan Skyline', 'Subaru Impreza')) ]

Unnamed: 0,support,itemsets,length
5,0.6,"(Subaru Impreza, Nissan Skyline)",2
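
The equivalence of these queries follows from how Python compares sets. A short standard-library sketch, independent of pandas and mlxtend, shows the relevant behavior; the variable names are illustrative.

```py
a = frozenset(('Subaru Impreza', 'Nissan Skyline'))
b = frozenset(('Nissan Skyline', 'Subaru Impreza'))

# Item order is irrelevant, and a frozenset compares equal to a plain set
# with the same members.
print(a == b)                                     # True
print(a == {'Nissan Skyline', 'Subaru Impreza'})  # True

# Unlike a plain set, a frozenset is hashable, so it can serve as a dict key.
support_by_itemset = {a: 0.6}
print(support_by_itemset[b])                      # 0.6

# Being immutable, a frozenset has no in-place mutators such as add().
try:
    a.add('Toyota Supra')
except AttributeError:
    print('frozensets cannot be modified in place')
```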


### Example 3: Working with Sparse Representations

To save memory, you may want to represent your transaction data in the sparse format. This is especially useful if you have lots of products and small transactions.

In [12]:
oht_ary = te.fit(dataset).transform(dataset, sparse=True)
sparse_df = pd.DataFrame.sparse.from_spmatrix(oht_ary, columns=te.columns_)
sparse_df


Unnamed: 0,Honda Civic,Honda NSX,Honda S2000,Mazda MX-5,Mazda RX-7,Mitsubishi Lancer,Nissan Silvia,Nissan Skyline,Subaru Impreza,Toyota AE86,Toyota Supra
0,0,True,0,0,True,True,0,True,True,0,True
1,0,0,True,0,True,True,0,0,True,0,True
2,True,0,0,0,0,True,0,True,True,0,0
3,0,True,0,True,0,0,0,True,True,True,0
4,0,0,True,True,0,0,True,0,True,0,True
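
How much a sparse representation helps depends on how many cells are actually True. As a rough sketch, the density of the one-hot data can be inspected with pandas alone. Note the assumption: this converts a dense boolean DataFrame with `astype` and `pd.SparseDtype` rather than going through the SciPy sparse matrix used above, so it is an alternative route and not the same code path.

```py
import pandas as pd

dataset = [
    ['Nissan Skyline', 'Toyota Supra', 'Mazda RX-7', 'Honda NSX', 'Subaru Impreza', 'Mitsubishi Lancer'],
    ['Toyota Supra', 'Mazda RX-7', 'Honda S2000', 'Subaru Impreza', 'Mitsubishi Lancer'],
    ['Nissan Skyline', 'Honda Civic', 'Subaru Impreza', 'Mitsubishi Lancer'],
    ['Nissan Skyline', 'Toyota AE86', 'Mazda MX-5', 'Subaru Impreza', 'Honda NSX'],
    ['Mazda MX-5', 'Toyota Supra', 'Toyota Supra', 'Subaru Impreza', 'Nissan Silvia', 'Honda S2000'],
]

# Build the dense one-hot frame by hand so the sketch needs only pandas.
columns = sorted({item for transaction in dataset for item in transaction})
dense_df = pd.DataFrame(
    [[item in transaction for item in columns] for transaction in dataset],
    columns=columns,
)

# Store only the True cells; False is the implicit fill value.
sparse_df = dense_df.astype(pd.SparseDtype(bool, fill_value=False))

# Fraction of cells that are explicitly stored.
density = sparse_df.sparse.density
print(f"{density:.3f}")
```

In this toy dataset 25 of the 55 cells are True, so the density is 25/55 and little is gained. The savings grow as the number of distinct items increases while transactions stay short.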


In [13]:
apriori(sparse_df, min_support=0.55, use_colnames=True, verbose=1)

Processing 6 combinations | Sampling itemset size 3


Unnamed: 0,support,itemsets
0,0.6,(Mitsubishi Lancer)
1,0.6,(Nissan Skyline)
2,1.0,(Subaru Impreza)
3,0.6,(Toyota Supra)
4,0.6,"(Subaru Impreza, Mitsubishi Lancer)"
5,0.6,"(Subaru Impreza, Nissan Skyline)"
6,0.6,"(Subaru Impreza, Toyota Supra)"
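
The verbose output reports candidate combinations per itemset size, which reflects the level-wise nature of the search: frequent itemsets of size k are built only from frequent itemsets of size k-1. Below is a minimal plain-Python sketch of that idea. It is not mlxtend's implementation, the function name `apriori_sketch` is hypothetical, and its candidate counts need not match the numbers mlxtend prints.

```py
from itertools import combinations

dataset = [
    ['Nissan Skyline', 'Toyota Supra', 'Mazda RX-7', 'Honda NSX', 'Subaru Impreza', 'Mitsubishi Lancer'],
    ['Toyota Supra', 'Mazda RX-7', 'Honda S2000', 'Subaru Impreza', 'Mitsubishi Lancer'],
    ['Nissan Skyline', 'Honda Civic', 'Subaru Impreza', 'Mitsubishi Lancer'],
    ['Nissan Skyline', 'Toyota AE86', 'Mazda MX-5', 'Subaru Impreza', 'Honda NSX'],
    ['Mazda MX-5', 'Toyota Supra', 'Toyota Supra', 'Subaru Impreza', 'Nissan Silvia', 'Honda S2000'],
]


def apriori_sketch(transactions, min_support):
    """Level-wise frequent-itemset search; returns {frozenset: support}."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain the whole itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}

    frequent = {}
    size = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        size += 1
        # Join step: merge pairs of frequent itemsets into candidates of the next size.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Prune step: a candidate survives only if all its (size-1)-subsets are frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, size - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


result = apriori_sketch(dataset, min_support=0.55)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

On this dataset the sketch finds the same seven itemsets as the tables above: four single items and the three pairs that include 'Subaru Impreza'. No triple survives the prune step, because pairs such as 'Mitsubishi Lancer' with 'Nissan Skyline' are not frequent.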
