# Rules and Patterns

Using the plants dataset processed in Part A, in this notebook we implement the Apriori algorithm. We then assess the "interestingness" of the generated rules and patterns using several measures.

In [35]:
import pandas as pd
import numpy as np

from apyori import apriori


In [11]:
dataset = pd.read_csv('plants1.csv', header=None)
dataset.head()

Unnamed: 0,0,1,2,3
0,0,0,0,0
1,0,2,1,0
2,0,2,2,0
3,0,2,3,0
4,0,2,3,0


**Association rules** are rules of the form X -> Y, where X (the antecedent, or LHS) and Y (the consequent, or RHS) are disjoint itemsets and Y is likely to appear in transactions that also contain X. How likely is governed by measures of interest, which can be used as constraints when mining association rules.

Reference: https://michael.hahsler.net/research/recommender/associationrules.html


#### Support

sup(X -> Y) = sup(Y -> X) = P(X and Y)

Its main feature is downward closure: every subset of a frequent itemset (support > minimum support threshold) is also frequent.

The disadvantage of support is the rare item problem. Items that occur very infrequently in the data set are pruned, although they could still produce interesting and potentially valuable rules.
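
To make downward closure concrete, here is a minimal sketch on a hypothetical toy data set (the items `a`, `b`, `c` and the helper `support` are illustrative assumptions, not part of the plants data). It checks that every proper subset of an itemset is at least as frequent as the itemset itself.

```python
from itertools import combinations

# Hypothetical toy transactions, not taken from the plants dataset.
toy_transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

full = {"a", "b", "c"}
sup_full = support(full, toy_transactions)

# Support of every non-empty proper subset of the full itemset.
subset_supports = {
    subset: support(subset, toy_transactions)
    for size in range(1, len(full))
    for subset in combinations(sorted(full), size)
}

print("sup(full) =", sup_full)
for subset, sup in subset_supports.items():
    print(subset, sup)
```

Apriori relies on the contrapositive: once an itemset falls below the minimum support, none of its supersets need to be counted.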

#### Confidence

conf(X -> Y) = P(Y | X) = P(X and Y)/P(X) 
             = sup(X -> Y)/sup(X)

Confidence is not downward closed. It was developed together with support (the so-called support-confidence framework): support is used to prune the search space and leave only potentially interesting rules, and confidence is then used in a second step to keep the rules that exceed a minimum confidence threshold. A problem with confidence is that it is sensitive to the frequency of the consequent (Y) in the data set. Because of the way confidence is calculated, consequents with higher support automatically produce higher confidence values, even if there is no association between the items.
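
The sensitivity to a frequent consequent can be illustrated with a small sketch. The toy transactions below are hypothetical and constructed so that `x` and `y` are statistically independent; the helpers `support` and `confidence` are illustrative names, not part of `apyori`.

```python
# Hypothetical toy data: x appears in 10 of 20 transactions, y in 18 of 20,
# and both together in 9 of 20, so P(x and y) = P(x) * P(y) (independence).
toy_transactions = (
    [{"x", "y"}] * 9   # x and y together
    + [{"x"}] * 1      # x without y
    + [{"y"}] * 9      # y without x
    + [{"z"}] * 1      # neither x nor y
)

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """conf(X -> Y) = sup(X and Y) / sup(X)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

conf_xy = confidence({"x"}, {"y"}, toy_transactions)
sup_y = support({"y"}, toy_transactions)

print("conf(x -> y) =", conf_xy)
print("sup(y)       =", sup_y)
```

Here the rule x -> y would pass a minimum confidence of 0.8 even though its confidence merely equals sup(y), which is what independence predicts.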

#### Lift
lift(X -> Y) = lift(Y -> X) = P(X and Y)/(P(X)P(Y))
                            = conf(X -> Y)/sup(Y) = conf(Y -> X)/sup(X)

Lift measures how many times more often X and Y occur together than would be expected if they were statistically independent. Lift is not downward closed and does not suffer from the rare item problem.
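
As a quick sanity check of the identity lift = conf / sup(Y), the sketch below rearranges it to recover sup(Y) from the confidence and lift printed for two rules in this notebook that share the consequent {19}. The helper name `consequent_support` is an illustrative assumption; the numbers are copied from the rule listing.

```python
# Values copied from the rule listing printed in this notebook.
conf_285, lift_285 = 1.0, 7.367980884109916   # {285} => {19}
conf_516, lift_516 = 0.95, 6.99958183990442   # {516} => {19}

def consequent_support(confidence, lift):
    """Rearrange lift(X -> Y) = conf(X -> Y) / sup(Y) to recover sup(Y)."""
    return confidence / lift

sup_19_from_285 = consequent_support(conf_285, lift_285)
sup_19_from_516 = consequent_support(conf_516, lift_516)

print("sup({19}) from {285} => {19}:", sup_19_from_285)
print("sup({19}) from {516} => {19}:", sup_19_from_516)
```

Both rules should give the same sup({19}), roughly 0.136, so lifts of about 7 mean that these antecedents co-occur with item 19 about seven times more often than independence would predict.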

#### Coverage

coverage(X) = P(X) = sup(X)

A simple measure of how often an itemset appears in the data set. For a rule X -> Y, coverage is the support of the antecedent X.

#### Conviction

conviction(X -> Y) = P(X)P(not Y)/P(X and not Y)
                   =(1-sup(Y))/(1-conf(X -> Y))

Conviction compares the probability that X appears without Y if they were independent with the actual frequency of X appearing without Y. In that respect it is similar to lift; however, in contrast to lift, it is a directed measure. Furthermore, conviction is monotone in confidence and lift.
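
A minimal sketch of the second form of the formula, (1 - sup(Y)) / (1 - conf(X -> Y)), follows. The function name `conviction` is an illustrative assumption; the inputs are the confidence and lift printed for {516} => {19} in this notebook, with sup(Y) recovered as conf / lift. Treating a rule with confidence 1 as having infinite conviction is one common convention, adopted here because such a rule has no counterexamples and the denominator is zero.

```python
import math

def conviction(sup_consequent, confidence):
    """conviction(X -> Y) = (1 - sup(Y)) / (1 - conf(X -> Y)).

    Returns math.inf when confidence is 1 (no counterexamples); this is a
    convention chosen for this sketch.
    """
    if confidence >= 1.0:
        return math.inf
    return (1.0 - sup_consequent) / (1.0 - confidence)

# Values copied from the rule listing printed in this notebook.
conf_516, lift_516 = 0.95, 6.99958183990442   # {516} => {19}
sup_19 = conf_516 / lift_516                  # sup(Y) = conf / lift

conviction_516 = conviction(sup_19, conf_516)
conviction_perfect = conviction(sup_19, 1.0)  # e.g. a rule with confidence 1.0

print("conviction({516} => {19}) =", conviction_516)
print("conviction at confidence 1 =", conviction_perfect)
```

The first value should be approximately 17.29.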

#### Leverage

leverage(X -> Y) = P(X and Y) - (P(X)P(Y))

Leverage measures the difference between how often X and Y appear together in the data set and how often they would be expected to appear together if X and Y were statistically independent.
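
The sketch below evaluates the leverage formula twice: once with the support and lift printed for {285} => {19} in this notebook (its confidence is 1.0, so sup(X) equals the rule's support), and once with hypothetical independent probabilities. The function name `leverage` is an illustrative assumption.

```python
def leverage(sup_xy, sup_x, sup_y):
    """leverage(X -> Y) = P(X and Y) - P(X) * P(Y)."""
    return sup_xy - sup_x * sup_y

# Values copied from the rule listing printed in this notebook.
sup_rule = 0.005026755310523755   # sup({285} => {19})
lift_rule = 7.367980884109916     # lift({285} => {19})
conf_rule = 1.0                   # conf({285} => {19})

sup_x = sup_rule / conf_rule      # sup(X) = sup(X and Y) / conf
sup_y = conf_rule / lift_rule     # sup(Y) = conf / lift

leverage_285 = leverage(sup_rule, sup_x, sup_y)

# Hypothetical independent case: P(X and Y) = 0.45 = 0.5 * 0.9.
leverage_independent = leverage(0.45, 0.5, 0.9)

print("leverage({285} => {19}) =", leverage_285)
print("leverage (independent)  =", leverage_independent)
```

The first value should be about 0.0043 and the second should be zero up to floating-point rounding. Leverage is bounded by the support of the rule, so rare rules can only reach small values.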


In [12]:
# Build one transaction (a set of items) per row, dropping the null (NaN)
# values that pad rows with fewer items.
transactions = []
for index, data in dataset.iterrows():
    transaction = data.dropna().tolist()
    transactions.append(set(transaction))

In [28]:
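# Assumption: apyori's apriori() only reads min_support, min_confidence,
# min_lift and max_length, so min_length probably has no effect here.
_association_rules = apriori(transactions, min_support=0.0025, min_confidence=0.8, min_lift=0, min_length=2)

association_rules = []  # (antecedent, consequent) pairs, reused in later cells

# Print each association rule with its support, confidence and lift
print("Association Rules: Min Support = 0.25%, Min Confidence = 80%", end='\n\n')
for association_rule in _association_rules:

    itemset = set(association_rule[0])  # all items of the rule (X and Y together)
    support = association_rule[1]

    # Only the first ordered statistic of each itemset is reported;
    # any further rules for the same itemset are in association_rule[2][1:].
    antecedent = set(association_rule[2][0][0])  # LHS (X)
    consequent = set(association_rule[2][0][1])  # RHS (Y)

    confidence = association_rule[2][0][2]
    lift = association_rule[2][0][3]

    association_rules.append((antecedent, consequent))

    print("{} => {}".format(antecedent, consequent))
    print("Support = {},\nConfidence = {},\nLift = {}\n".format(support, confidence, lift), end='\n\n')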

Association Rules: Min Support = 0.25%, Min Confidence = 80%

{285} => {19}
Support = 0.005026755310523755,
Confidence = 1.0,
Lift = 7.367980884109916


{286} => {19}
Support = 0.002756607750932382,
Confidence = 1.0,
Lift = 7.367980884109916


{516} => {19}
Support = 0.003080914545159721,
Confidence = 0.95,
Lift = 6.99958183990442


{2986} => {120}
Support = 0.0037295281336143992,
Confidence = 0.8679245283018868,
Lift = 7.602969554030875


{3000} => {120}
Support = 0.003161991243716556,
Confidence = 0.9750000000000001,
Lift = 8.540944602272727


{999} => {135}
Support = 0.004134911626398573,
Confidence = 0.8360655737704918,
Lift = 9.223642922079827


{1625} => {135}
Support = 0.004621371817739582,
Confidence = 0.8636363636363638,
Lift = 9.527809399902425


{2841} => {135}
Support = 0.0037295281336143992,
Confidence = 0.9583333333333334,
Lift = 10.572525342874181


{2888} => {135}
Support = 0.0028376844494892167,
Confidence = 0.9459459459459459,
Lift = 10.435865203307065


{2964} => {13

In [41]:
class ARMInterestMeasures:

    def __init__(self, transactions, antecedent, consequent):
        ''' Count the transactions needed to evaluate the rule antecedent => consequent '''

        self.transactions = transactions
        self.antecedent = antecedent
        self.consequent = consequent

        self.n_transactions = len(transactions) # Number of transactions in the database

        self.n_antecedent_present_trans = 0 # Number of transactions that contain the antecedent
        self.n_consequent_present_trans = 0 # Number of transactions that contain the consequent
        self.n_consequent_absent_trans = 0 # Number of transactions that do not contain the consequent
        self.n_support_trans = 0 # Number of transactions that support the rule (A ^ B)
        self.n_oppose_trans = 0 # Number of transactions that oppose the rule (A ^ !B)

        for transaction in transactions:

            antecedent_present = self.antecedent <= transaction # Check if antecedent is a subset of the transaction
            consequent_present = self.consequent <= transaction # Check if consequent is a subset of the transaction

            if antecedent_present:
                self.n_antecedent_present_trans += 1

            if consequent_present:
                self.n_consequent_present_trans += 1
            else:
                self.n_consequent_absent_trans += 1

            if antecedent_present and consequent_present:
                self.n_support_trans += 1

            if antecedent_present and not consequent_present:
                self.n_oppose_trans += 1

    def computeConviction(self):
        ''' conviction = P(A) * P(!B) / P(A ^ !B) '''

        div = (self.n_oppose_trans/self.n_transactions)

        # With no opposing transactions (confidence = 1) the ratio is undefined
        # (conviction tends to infinity); 0 is returned as a placeholder here.
        if div == 0:
            result = 0
        else:
            result = ((self.n_antecedent_present_trans/self.n_transactions)*(self.n_consequent_absent_trans/self.n_transactions))/ div
        return result

    def computeLeverage(self):
        ''' leverage = P(A ^ B) - P(A) * P(B) '''

        return (self.n_support_trans/self.n_transactions) - (self.n_antecedent_present_trans/self.n_transactions)*(self.n_consequent_present_trans/self.n_transactions)

    def computeCoverage(self):
        ''' coverage = P(A), the support of the antecedent '''

        return self.n_antecedent_present_trans/self.n_transactions
    

In [44]:
# Each stored pair is (antecedent, consequent), i.e. (LHS, RHS) of a rule
for antecedent, consequent in association_rules:

    print("{} => {}".format(antecedent, consequent))

    arm_interest_measures = ARMInterestMeasures(transactions, antecedent, consequent)
    print("Conviction = {}".format(arm_interest_measures.computeConviction()))
    print("Leverage = {}".format(arm_interest_measures.computeLeverage()))
    print("Coverage = {}".format(arm_interest_measures.computeCoverage()))
    print('\n')

{285} => {19}
Conviction = 0
Leverage = 0.004344512048823028
Coverage = 0.005026755310523755


{286} => {19}
Conviction = 0
Leverage = 0.0023824743493545643
Coverage = 0.002756607750932382


{516} => {19}
Conviction = 17.28555213231717
Leverage = 0.0026407576021269938
Coverage = 0.0032430679422733905


{2986} => {120}
Conviction = 6.707104635270681
Leverage = 0.0032389924150230875
Coverage = 0.004297065023512243


{3000} => {120}
Conviction = 35.43376033727906
Leverage = 0.002791775607043868
Coverage = 0.0032430679422733905


{999} => {135}
Conviction = 5.547073131182099
Leverage = 0.003686616775727338
Coverage = 0.004945678611966921


{1625} => {135}
Conviction = 6.668612507432029
Leverage = 0.00413633148750513
Coverage = 0.005351062104751094


{2841} => {135}
Conviction = 21.82455002432301
Leverage = 0.0033767715298075257
Coverage = 0.003891681530728069


{2888} => {135}
Conviction = 16.823090643748984
Leverage = 0.002565767900721418
Coverage = 0.0029998378466028863


{2964} => {135}