<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Example-1----Generating-Association-Rules-from-Frequent-Itemsets" data-toc-modified-id="Example-1----Generating-Association-Rules-from-Frequent-Itemsets-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Example 1 -- Generating Association Rules from Frequent Itemsets</a></span></li><li><span><a href="#Example-2----Rule-Generation-and-Selection-Criteria" data-toc-modified-id="Example-2----Rule-Generation-and-Selection-Criteria-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Example 2 -- Rule Generation and Selection Criteria</a></span></li></ul></div>

**Source:**

* http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/

**Overview:**

Apriori is a popular algorithm [1] for extracting frequent itemsets with applications in association rule learning. The apriori algorithm has been designed to operate on databases containing transactions, such as purchases by customers of a store. An itemset is considered "frequent" if it meets a user-specified support threshold. For instance, if the support threshold is set to 0.5 (50%), a frequent itemset is defined as a set of items that occur together in at least 50% of all transactions in the database.
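To make the support threshold concrete, here is a minimal sketch in plain Python. The transactions and the `support` helper are invented for illustration and are not part of mlxtend; the 0.5 threshold mirrors the example in the paragraph above.

```python
# Toy transactions, invented for illustration (not from the original text).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer"},
    {"Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


min_support = 0.5  # the 50% threshold used as an example above

for candidate in [{"Bread"}, {"Diapers", "Beer"}, {"Bread", "Beer"}]:
    s = support(candidate, transactions)
    label = "frequent" if s >= min_support else "not frequent"
    print(sorted(candidate), s, label)
```

With these four transactions, {Bread} and {Diapers, Beer} meet the threshold, while {Bread, Beer} occurs in only one transaction and does not.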
    


```
# Association Rules Generation from Frequent Itemsets
# Function to generate association rules from frequent itemsets

from mlxtend.frequent_patterns import association_rules
```



**Overview:**
    
    

1. Rule generation is a common task in the mining of frequent patterns.

1. An association rule is an implication expression of the form X→Y, where X and Y are disjoint itemsets [1]. 

1. A more concrete example based on consumer behaviour would be {Diapers}→{Beer} suggesting that people who buy diapers are also likely to buy beer.

1. To evaluate the "interest" of such an association rule, different metrics have been developed. 

1. The current implementation makes use of the confidence and lift metrics.




```
Metrics

The currently supported metrics for evaluating association rules and setting selection thresholds are listed below. Given a rule "A -> C", A stands for antecedent and C stands for consequent.

'support':
support(A→C) = support(A ∪ C), range: [0, 1]
introduced in [3]
```

The support metric is defined for itemsets, not association rules. The table produced by the association rule mining algorithm contains three different support metrics: 'antecedent support', 'consequent support', and 'support'. Here, 'antecedent support' computes the proportion of transactions that contain the antecedent A, and 'consequent support' computes the support for the itemset of the consequent C. The 'support' metric then computes the support of the combined itemset A ∪ C. Note that 'support' is bounded above by min('antecedent support', 'consequent support'), because every transaction that contains A ∪ C contains both A and C.

Typically, support is used to measure the abundance or frequency (often interpreted as significance or importance) of an itemset in a database. We refer to an itemset as a "frequent itemset" if its support is greater than or equal to a specified minimum-support threshold. Note that in general, due to the downward closure property, all subsets of a frequent itemset are also frequent.
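Both properties mentioned above, the relationship between 'support' and min('antecedent support', 'consequent support') and the downward closure property, can be checked on a small example. The following is a brute-force sketch in plain Python, not the Apriori algorithm itself; the transactions and the `support` helper are invented for illustration.

```python
from itertools import combinations

# Toy transactions, invented for illustration.
transactions = [
    {"Eggs", "Kidney Beans", "Onion"},
    {"Eggs", "Kidney Beans", "Onion", "Yogurt"},
    {"Eggs", "Kidney Beans", "Milk"},
    {"Kidney Beans", "Milk", "Yogurt"},
    {"Eggs", "Kidney Beans", "Onion"},
]
min_support = 0.6


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


# Brute force: test every possible itemset (fine for a handful of items).
items = sorted(set().union(*transactions))
frequent = {}
for size in range(1, len(items) + 1):
    for combo in combinations(items, size):
        itemset = frozenset(combo)
        if support(itemset) >= min_support:
            frequent[itemset] = support(itemset)

# Downward closure: every non-empty proper subset of a frequent itemset
# must itself be frequent.
closure_holds = all(
    frozenset(subset) in frequent
    for itemset in frequent
    for size in range(1, len(itemset))
    for subset in combinations(itemset, size)
)

# Upper bound: support(A ∪ C) can never exceed the smaller of the two supports.
A, C = frozenset({"Onion"}), frozenset({"Eggs"})
bound_holds = support(A | C) <= min(support(A), support(C))

print(len(frequent), closure_holds, bound_holds)
```

Enumerating every candidate itemset grows exponentially with the number of items, which is exactly the cost that Apriori avoids by exploiting downward closure to prune candidates early.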


### Example 1 -- Generating Association Rules from Frequent Itemsets


The association_rules function takes DataFrames of frequent itemsets as produced by the apriori function in mlxtend.frequent_patterns. To demonstrate the usage of the association_rules function, we first create a pandas DataFrame of frequent itemsets as generated by the apriori function:

In [1]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori


dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]

te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)
frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)

frequent_itemsets

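|    | support | itemsets |
|----|---------|----------|
| 0  | 0.8 | (Eggs) |
| 1  | 1.0 | (Kidney Beans) |
| 2  | 0.6 | (Milk) |
| 3  | 0.6 | (Onion) |
| 4  | 0.6 | (Yogurt) |
| 5  | 0.8 | (Kidney Beans, Eggs) |
| 6  | 0.6 | (Onion, Eggs) |
| 7  | 0.6 | (Kidney Beans, Milk) |
| 8  | 0.6 | (Kidney Beans, Onion) |
| 9  | 0.6 | (Kidney Beans, Yogurt) |
| 10 | 0.6 | (Kidney Beans, Onion, Eggs) |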
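The encoding step in the cell above turns a list of transactions into a boolean table with one column per item. The sketch below gives a rough pure-Python picture of that layout; it is not mlxtend's implementation, the small dataset is invented, and the sorted column order is an assumption about how `TransactionEncoder` arranges its columns.

```python
# Small transaction list, invented for illustration.
dataset = [
    ["Milk", "Onion", "Eggs"],
    ["Onion", "Onion", "Eggs"],  # a repeated item counts only once
    ["Milk", "Apple"],
]

# One column per distinct item (sorted here; the ordering is an assumption).
columns = sorted({item for transaction in dataset for item in transaction})

# One boolean row per transaction: True where the transaction contains the item.
rows = [[item in transaction for item in columns] for transaction in dataset]

# The mean of a boolean column is the support of that single item.
item_support = {
    column: sum(row[i] for row in rows) / len(rows)
    for i, column in enumerate(columns)
}

print(columns)
for row in rows:
    print(row)
print(item_support)
```

Because each cell only records presence or absence, listing 'Onion' twice in one transaction does not change its support.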


The association_rules() function allows you to specify (1) your metric of interest and (2) the corresponding threshold.

Currently implemented measures are confidence and lift. 

Let's say you are interested in rules derived from the frequent itemsets only if the level of confidence is at least 70 percent (min_threshold=0.7):

In [2]:
from mlxtend.frequent_patterns import association_rules

association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)

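|   | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|-------------|-------------|--------------------|--------------------|---------|------------|------|----------|------------|
| 0 | (Onion) | (Kidney Beans) | 0.6 | 1.0 | 0.6 | 1.0 | 1.0 | 0.0 | inf |
| 1 | (Kidney Beans, Onion) | (Eggs) | 0.6 | 0.8 | 0.6 | 1.0 | 1.25 | 0.12 | inf |
| 2 | (Kidney Beans, Eggs) | (Onion) | 0.8 | 0.6 | 0.6 | 0.75 | 1.25 | 0.12 | 1.6 |
| 3 | (Onion, Eggs) | (Kidney Beans) | 0.6 | 1.0 | 0.6 | 1.0 | 1.0 | 0.0 | inf |
| 4 | (Onion) | (Kidney Beans, Eggs) | 0.6 | 0.8 | 0.6 | 1.0 | 1.25 | 0.12 | inf |
| 5 | (Eggs) | (Kidney Beans, Onion) | 0.8 | 0.6 | 0.6 | 0.75 | 1.25 | 0.12 | 1.6 |
| 6 | (Kidney Beans) | (Eggs) | 1.0 | 0.8 | 0.8 | 0.8 | 1.0 | 0.0 | 1.0 |
| 7 | (Eggs) | (Kidney Beans) | 0.8 | 1.0 | 0.8 | 1.0 | 1.0 | 0.0 | inf |
| 8 | (Onion) | (Eggs) | 0.6 | 0.8 | 0.6 | 1.0 | 1.25 | 0.12 | inf |
| 9 | (Eggs) | (Onion) | 0.8 | 0.6 | 0.6 | 0.75 | 1.25 | 0.12 | 1.6 |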
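The metric columns in the table above can be reproduced by hand from the three support values. The sketch below assumes the standard textbook definitions of confidence, lift, leverage, and conviction; the helper `rule_metrics` is hypothetical and not part of mlxtend.

```python
import math


def rule_metrics(antecedent_support, consequent_support, support):
    """Derive rule metrics for A -> C from the three support values."""
    confidence = support / antecedent_support
    lift = confidence / consequent_support
    leverage = support - antecedent_support * consequent_support
    # Conviction divides by (1 - confidence), so a rule that always holds
    # (confidence == 1) is reported as infinite.
    if math.isclose(confidence, 1.0):
        conviction = math.inf
    else:
        conviction = (1 - consequent_support) / (1 - confidence)
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# Support values copied from the row (Kidney Beans, Eggs) -> (Onion).
metrics = rule_metrics(antecedent_support=0.8, consequent_support=0.6, support=0.6)
print({name: round(value, 2) for name, value in metrics.items()})

# Support values copied from the row (Onion) -> (Eggs), where confidence is 1.
always_true = rule_metrics(antecedent_support=0.6, consequent_support=0.8, support=0.6)
print(always_true["conviction"])
```

After rounding to two decimals, the first call agrees with the table row (confidence 0.75, lift 1.25, leverage 0.12, conviction 1.6); the rounding is needed because of floating-point error in the divisions.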


### Example 2 -- Rule Generation and Selection Criteria


If you are interested in rules according to a different metric of interest, you can simply adjust the metric and min_threshold arguments. E.g., if you are only interested in rules that have a lift score of >= 1.2, you would do the following:

In [3]:
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.2)
rules

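|   | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|-------------|-------------|--------------------|--------------------|---------|------------|------|----------|------------|
| 0 | (Kidney Beans, Onion) | (Eggs) | 0.6 | 0.8 | 0.6 | 1.0 | 1.25 | 0.12 | inf |
| 1 | (Kidney Beans, Eggs) | (Onion) | 0.8 | 0.6 | 0.6 | 0.75 | 1.25 | 0.12 | 1.6 |
| 2 | (Onion) | (Kidney Beans, Eggs) | 0.6 | 0.8 | 0.6 | 1.0 | 1.25 | 0.12 | inf |
| 3 | (Eggs) | (Kidney Beans, Onion) | 0.8 | 0.6 | 0.6 | 0.75 | 1.25 | 0.12 | 1.6 |
| 4 | (Onion) | (Eggs) | 0.6 | 0.8 | 0.6 | 1.0 | 1.25 | 0.12 | inf |
| 5 | (Eggs) | (Onion) | 0.8 | 0.6 | 0.6 | 0.75 | 1.25 | 0.12 | 1.6 |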
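The effect of the metric and min_threshold arguments can be pictured as a simple filter over candidate rules. The sketch below is plain Python and not mlxtend's implementation; `select_rules` is a hypothetical helper, the rows are copied from the confidence-filtered table in Example 1, and the inclusive comparison follows the ">= 1.2" wording above.

```python
# (antecedents, consequents, confidence, lift), copied from the Example 1 table.
rows = [
    ({"Onion"}, {"Kidney Beans"}, 1.0, 1.0),
    ({"Kidney Beans", "Onion"}, {"Eggs"}, 1.0, 1.25),
    ({"Kidney Beans", "Eggs"}, {"Onion"}, 0.75, 1.25),
    ({"Onion", "Eggs"}, {"Kidney Beans"}, 1.0, 1.0),
    ({"Onion"}, {"Kidney Beans", "Eggs"}, 1.0, 1.25),
    ({"Eggs"}, {"Kidney Beans", "Onion"}, 0.75, 1.25),
    ({"Kidney Beans"}, {"Eggs"}, 0.8, 1.0),
    ({"Eggs"}, {"Kidney Beans"}, 1.0, 1.0),
    ({"Onion"}, {"Eggs"}, 1.0, 1.25),
    ({"Eggs"}, {"Onion"}, 0.75, 1.25),
]
rules = [
    {
        "antecedents": frozenset(a),
        "consequents": frozenset(c),
        "confidence": confidence,
        "lift": lift,
    }
    for a, c, confidence, lift in rows
]


def select_rules(rules, metric, min_threshold):
    """Keep the rules whose `metric` value is at least `min_threshold`."""
    return [rule for rule in rules if rule[metric] >= min_threshold]


by_lift = select_rules(rules, metric="lift", min_threshold=1.2)
for rule in by_lift:
    print(sorted(rule["antecedents"]), "->", sorted(rule["consequents"]), rule["lift"])
```

Filtering the ten Example 1 rules by lift leaves the same six rules that appear in the table above, since the four dropped rules all have a lift of exactly 1.0.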


Pandas DataFrames make it easy to filter the results further. Let's say we are only interested in rules that satisfy the following criteria:

1. at least 2 antecedents

1. a confidence > 0.75

1. a lift score > 1.2

We could compute the antecedent length as follows:
    
    

In [4]:
rules["antecedent_len"] = rules["antecedents"].apply(lambda x: len(x))
rules

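|   | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | antecedent_len |
|---|-------------|-------------|--------------------|--------------------|---------|------------|------|----------|------------|----------------|
| 0 | (Kidney Beans, Onion) | (Eggs) | 0.6 | 0.8 | 0.6 | 1.0 | 1.25 | 0.12 | inf | 2 |
| 1 | (Kidney Beans, Eggs) | (Onion) | 0.8 | 0.6 | 0.6 | 0.75 | 1.25 | 0.12 | 1.6 | 2 |
| 2 | (Onion) | (Kidney Beans, Eggs) | 0.6 | 0.8 | 0.6 | 1.0 | 1.25 | 0.12 | inf | 1 |
| 3 | (Eggs) | (Kidney Beans, Onion) | 0.8 | 0.6 | 0.6 | 0.75 | 1.25 | 0.12 | 1.6 | 1 |
| 4 | (Onion) | (Eggs) | 0.6 | 0.8 | 0.6 | 1.0 | 1.25 | 0.12 | inf | 1 |
| 5 | (Eggs) | (Onion) | 0.8 | 0.6 | 0.6 | 0.75 | 1.25 | 0.12 | 1.6 | 1 |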


In [5]:

# Then, we can use pandas' selection syntax as shown below:

rules[ (rules['antecedent_len'] >= 2) &
       (rules['confidence'] > 0.75) &
       (rules['lift'] > 1.2) ]


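|   | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | antecedent_len |
|---|-------------|-------------|--------------------|--------------------|---------|------------|------|----------|------------|----------------|
| 0 | (Kidney Beans, Onion) | (Eggs) | 0.6 | 0.8 | 0.6 | 1.0 | 1.25 | 0.12 | inf | 2 |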


In [6]:
# Similarly, using the Pandas API, we can select entries based on the "antecedents" or "consequents" columns:

rules[rules['antecedents'] == {'Eggs', 'Kidney Beans'}]

    
    

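|   | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | antecedent_len |
|---|-------------|-------------|--------------------|--------------------|---------|------------|------|----------|------------|----------------|
| 1 | (Kidney Beans, Eggs) | (Onion) | 0.8 | 0.6 | 0.6 | 0.75 | 1.25 | 0.12 | 1.6 | 2 |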


```

Frozensets

Note that the entries in the "itemsets" column are of type frozenset, a built-in Python type that is similar to a Python set but immutable, which makes it more efficient for certain query or comparison operations (https://docs.python.org/3.6/library/stdtypes.html#frozenset). Since frozensets are sets, the item order does not matter. I.e., the query

rules[rules['antecedents'] == {'Eggs', 'Kidney Beans'}]

is equivalent to any of the following three

rules[rules['antecedents'] == {'Kidney Beans', 'Eggs'}]
rules[rules['antecedents'] == frozenset(('Eggs', 'Kidney Beans'))]
rules[rules['antecedents'] == frozenset(('Kidney Beans', 'Eggs'))]

```
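The frozenset behaviour described above can be checked in isolation, without pandas or mlxtend. This is a minimal sketch using only built-in types; the dictionary `support_by_itemset` is a hypothetical name introduced for illustration.

```python
a = frozenset(("Eggs", "Kidney Beans"))
b = frozenset(("Kidney Beans", "Eggs"))

# The order in which items are supplied does not matter.
print(a == b)

# A frozenset compares equal to a plain set with the same items.
print(a == {"Kidney Beans", "Eggs"})

# Frozensets are hashable, so they can serve as dictionary keys.
support_by_itemset = {a: 0.8}
print(support_by_itemset[b])

# Plain sets are mutable and therefore not hashable.
try:
    hash({"Eggs", "Kidney Beans"})
    set_is_hashable = True
except TypeError:
    set_is_hashable = False
print(set_is_hashable)

# Frozensets expose no mutating methods such as add().
print(hasattr(a, "add"))
```

Hashability is what allows itemsets to be used as lookup keys, for example when mapping each itemset to its support.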
