# Phone Faceplates

A store that sells accessories for cellular phones runs a promotion on faceplates. Customers who purchase multiple faceplates from a choice of six different colors get a discount. The store managers would like to know which colors of faceplates customers are likely to purchase together, so they collected a database of transactions. The data is stored in Faceplate.csv.

In [16]:
import pandas as pd
import numpy as np
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

In [23]:
# Load the data set, reading each color column as boolean
fp_df = pd.read_csv('data/Faceplate.csv',
                    dtype={"Red": bool, "White": bool, "Blue": bool,
                           "Orange": bool, "Green": bool, "Yellow": bool})
fp_df.set_index('Transaction', inplace=True)
print(fp_df)

               Red  White   Blue  Orange  Green  Yellow
Transaction                                            
1             True   True  False   False   True   False
2            False   True  False    True  False   False
3            False   True   True   False  False   False
4             True   True  False    True  False   False
5             True  False   True   False  False   False
6            False   True   True   False  False   False
7             True  False   True   False  False   False
8             True   True   True   False   True   False
9             True   True   True   False  False   False
10           False  False  False   False  False    True


In [24]:
fp_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 1 to 10
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Red     10 non-null     bool 
 1   White   10 non-null     bool 
 2   Blue    10 non-null     bool 
 3   Orange  10 non-null     bool 
 4   Green   10 non-null     bool 
 5   Yellow  10 non-null     bool 
dtypes: bool(6)
memory usage: 140.0 bytes


### ```apriori()```
Apriori function to extract frequent itemsets for association rule mining

http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/

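The support of an itemset is the fraction of transactions that contain it: ```support = transactions_where_item(s)_occur / total_transactions```. The ```min_support``` argument is the smallest support an itemset can have and still be returned.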
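
To make the definition concrete, here is a minimal sketch that computes support by hand, without mlxtend. The ten transactions are copied from the printed data frame above; the `support` helper is a hypothetical name introduced only for this illustration.

```python
# The ten transactions from Faceplate.csv, written out as sets of colors
transactions = [set(row.split()) for row in [
    "Red White Green", "White Orange", "White Blue", "Red White Orange",
    "Red Blue", "White Blue", "Red Blue", "Red White Blue Green",
    "Red White Blue", "Yellow",
]]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if set(itemset) <= t)
    return hits / len(transactions)

print(support({"Red"}))           # 0.6
print(support({"Red", "White"}))  # 0.4
print(support({"Yellow"}))        # 0.1, below a min_support of 0.2
```

Yellow appears in only one of the ten transactions, which is why it is missing from any result computed with `min_support=0.2`.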

In [25]:
# Find all itemsets that appear in at least 20% of the transactions
itemsets = apriori(fp_df, min_support=0.2, use_colnames=True)
# Record the number of items in each itemset
itemsets['length'] = itemsets['itemsets'].apply(len)
print(itemsets)
#print(itemsets.sort_values(by='support', ascending=False))
print(itemsets.shape)

    support             itemsets  length
0       0.6                (Red)       1
1       0.7              (White)       1
2       0.6               (Blue)       1
3       0.2             (Orange)       1
4       0.2              (Green)       1
5       0.4         (Red, White)       2
6       0.4          (Red, Blue)       2
7       0.2         (Red, Green)       2
8       0.4        (White, Blue)       2
9       0.2      (Orange, White)       2
10      0.2       (White, Green)       2
11      0.2   (Red, Blue, White)       3
12      0.2  (Red, Green, White)       3
(13, 3)


In [26]:
# Keep only the pairs that occur in at least 30% of the transactions
itemsets[(itemsets['length'] == 2) &
         (itemsets['support'] >= 0.3)]

   support       itemsets  length
5      0.4   (Red, White)       2
6      0.4    (Red, Blue)       2
8      0.4  (White, Blue)       2


### ```association_rules```
Function to generate association rules from frequent itemsets


https://rasbt.github.io/mlxtend/user_guide/frequent_patterns/association_rules/


#### Supported Metrics 

The **support** metric is defined for itemsets, not association rules. The table produced by the association rule mining algorithm contains three different support metrics: 'antecedent support', 'consequent support', and 'support'. Here, 'antecedent support' is the proportion of transactions that contain the antecedent A, and 'consequent support' is the proportion that contain the consequent C. The 'support' metric is the support of the combined itemset A ∪ C. Because every transaction that contains A ∪ C also contains both A and C, 'support' can never exceed min('antecedent support', 'consequent support').

Typically, support is used to measure the abundance or frequency (often interpreted as significance or importance) of an itemset in a database. We refer to an itemset as a "frequent itemset" if its support is at least a specified minimum-support threshold. Note that, due to the downward closure property, all subsets of a frequent itemset are also frequent.
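
The downward closure property can be checked directly on this small data set. The sketch below is a brute-force enumeration of every candidate itemset, not the Apriori algorithm itself, and is only practical because there are six colors. The transactions are copied from the printed data frame; `support`, `frequent`, and `closed` are names introduced for this illustration.

```python
from itertools import combinations

transactions = [set(row.split()) for row in [
    "Red White Green", "White Orange", "White Blue", "Red White Orange",
    "Red Blue", "White Blue", "Red Blue", "Red White Blue Green",
    "Red White Blue", "Yellow",
]]
MIN_SUPPORT = 0.2

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t) / len(transactions)

# Enumerate every non-empty combination of colors and keep the frequent ones
items = sorted(set().union(*transactions))
frequent = {
    frozenset(candidate)
    for k in range(1, len(items) + 1)
    for candidate in combinations(items, k)
    if support(candidate) >= MIN_SUPPORT
}
print(len(frequent))  # 13, matching the shape reported by apriori()

# Downward closure: every proper non-empty subset of a frequent itemset is frequent
closed = all(
    frozenset(subset) in frequent
    for itemset in frequent
    for k in range(1, len(itemset))
    for subset in combinations(itemset, k)
)
print(closed)  # True
```

Apriori relies on this property to prune the search: once an itemset falls below the threshold, none of its supersets need to be counted.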

The **confidence** of a rule A->C is the probability of seeing the consequent in a transaction given that it also contains the antecedent. Note that the metric is not symmetric; it is directed. For instance, the confidence for A->C generally differs from the confidence for C->A. The confidence is 1 (maximal) for a rule A->C if every transaction that contains the antecedent also contains the consequent.
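
As a small illustration of this asymmetry, the following sketch computes confidence as support(A ∪ C) / support(A) from raw counts. The transactions are copied from the printed data frame; `count` and `confidence` are hypothetical helper names, not part of mlxtend.

```python
transactions = [set(row.split()) for row in [
    "Red White Green", "White Orange", "White Blue", "Red White Orange",
    "Red Blue", "White Blue", "Red Blue", "Red White Blue Green",
    "Red White Blue", "Yellow",
]]

def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)

def confidence(antecedent, consequent):
    """Estimate P(consequent | antecedent) from transaction counts."""
    both = set(antecedent) | set(consequent)
    return count(both) / count(antecedent)

print(confidence({"Green"}, {"Red"}))            # 1.0
print(round(confidence({"Red"}, {"Green"}), 3))  # 0.333
```

Every transaction with a green faceplate also has a red one, but only two of the six transactions with red also contain green.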

The **lift** metric is commonly used to measure how much more often the antecedent and consequent of a rule A->C occur together than we would expect if they were statistically independent. If A and C are independent, the Lift score will be exactly 1.
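
Lift is commonly written as support(A ∪ C) / (support(A) × support(C)). The sketch below applies that formula by hand to the faceplate transactions (copied from the printed data frame); the `support` and `lift` helpers are hypothetical names used only here.

```python
transactions = [set(row.split()) for row in [
    "Red White Green", "White Orange", "White Blue", "Red White Orange",
    "Red Blue", "White Blue", "Red Blue", "Red White Blue Green",
    "Red White Blue", "Yellow",
]]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t) / len(transactions)

def lift(antecedent, consequent):
    """Observed joint support divided by the support expected under independence."""
    both = set(antecedent) | set(consequent)
    return support(both) / (support(antecedent) * support(consequent))

print(round(lift({"Green"}, {"Red", "White"}), 3))  # 2.5
print(round(lift({"Red", "White"}, {"Green"}), 3))  # 2.5
print(round(lift({"Green"}, {"Red"}), 3))           # 1.667
```

Unlike confidence, lift is symmetric: swapping antecedent and consequent gives the same value.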

**Leverage** computes the difference between the observed frequency of A and C appearing together and the frequency that would be expected if A and C were independent. A leverage value of 0 indicates independence.
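
Written out, leverage is support(A ∪ C) − support(A) × support(C). A minimal sketch on the faceplate transactions (copied from the printed data frame), with `support` and `leverage` as hypothetical helper names:

```python
transactions = [set(row.split()) for row in [
    "Red White Green", "White Orange", "White Blue", "Red White Orange",
    "Red Blue", "White Blue", "Red Blue", "Red White Blue Green",
    "Red White Blue", "Yellow",
]]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t) / len(transactions)

def leverage(antecedent, consequent):
    """Observed joint support minus the support expected under independence."""
    both = set(antecedent) | set(consequent)
    return support(both) - support(antecedent) * support(consequent)

print(round(leverage({"Green"}, {"Red", "White"}), 3))  # 0.12
print(round(leverage({"Green"}, {"Red"}), 3))           # 0.08
print(round(leverage({"White"}, {"Blue"}), 3))          # -0.02
```

A negative value, as for white and blue, means the pair occurs slightly less often than independence would predict.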

**Conviction** is defined as (1 − support(C)) / (1 − confidence(A->C)). A high conviction value means that the consequent is highly dependent on the antecedent. For instance, in the case of a perfect confidence score, the denominator becomes 0 (due to 1 - 1), for which the conviction score is defined as 'inf'. Similar to lift, if items are independent, the conviction is 1.
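
The usual formula for conviction is (1 − support(C)) / (1 − confidence(A->C)). The sketch below evaluates it on the faceplate transactions (copied from the printed data frame) and returns infinity when the confidence is exactly 1. The `count` and `conviction` helpers are hypothetical names for this illustration.

```python
import math

transactions = [set(row.split()) for row in [
    "Red White Green", "White Orange", "White Blue", "Red White Orange",
    "Red Blue", "White Blue", "Red Blue", "Red White Blue Green",
    "Red White Blue", "Yellow",
]]

def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)

def conviction(antecedent, consequent):
    """(1 - support(C)) / (1 - confidence(A -> C)), or inf at perfect confidence."""
    both = set(antecedent) | set(consequent)
    if count(both) == count(antecedent):
        return math.inf  # confidence is 1, so the denominator would be 0
    conf = count(both) / count(antecedent)
    return (1 - count(consequent) / len(transactions)) / (1 - conf)

print(conviction({"Green"}, {"Red"}))            # inf
print(round(conviction({"Red"}, {"Green"}), 3))  # 1.2
```

Comparing counts rather than floating-point confidences avoids a division by zero when the rule always holds.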

In [27]:
# Generate rules from the frequent itemsets, keeping those with lift >= 0.5
rules = association_rules(itemsets, metric='lift', min_threshold=0.5)
# Show the six rules with the highest lift, hiding some of the metric columns
print(rules.sort_values(by=['lift'], ascending=False)
      .drop(columns=['antecedent support',
                     'consequent support',
                     'conviction'])
      .head(6))

       antecedents     consequents  support  confidence      lift  leverage  \
22         (Green)    (Red, White)      0.2    1.000000  2.500000      0.12   
19    (Red, White)         (Green)      0.2    0.500000  2.500000      0.12   
4            (Red)         (Green)      0.2    0.333333  1.666667      0.08   
5          (Green)           (Red)      0.2    1.000000  1.666667      0.08   
21           (Red)  (Green, White)      0.2    0.333333  1.666667      0.08   
20  (Green, White)           (Red)      0.2    1.000000  1.666667      0.08   

    zhangs_metric  
22           0.75  
19           1.00  
4            1.00  
5            0.50  
21           1.00  
20           0.50  


**Rule #22:** "If green, then red and white," meaning that if a green faceplate is purchased, a red and a white one are purchased as well. Here the antecedent is green and the consequent is red and white. The rule has a confidence of 100% and a lift of 2.5.

**Rule #5:** If green is purchased, then with confidence 100% red will also be purchased. This rule has a lift ratio of 1.67.