# Topic: Association Rules

# ---------------------------------------------------------------------------

## 5. A retail store in India has transaction data and would like to understand the buying patterns of consumers in its locality. You have been assigned to provide the manager with rules on how products should be placed on the shelves so as to improve consumers' buying patterns and increase customer footfall.


## -----------------------------------------------------------------------------------

### Business objective: To help the retail store understand the buying patterns of consumers in its locality by applying an association rule algorithm, and to explain how products should be placed on the shelves so as to improve consumers' buying patterns and increase customer footfall.

## -------------------------------------------------------------------------------------------

Import libraries:

In [6]:
import pandas as pd
import numpy as np
from mlxtend.frequent_patterns import apriori, association_rules
import matplotlib.pyplot as plt

Load Dataset:

In [7]:
df = pd.read_csv('G:/transactions_retail.csv', sep=',') 

Print top 5 rows: 

In [8]:
df.head(5)

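| Unnamed: 0 | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | 'HANGING' | 'HEART' | 'HOLDER' | 'T-LIGHT' | 'WHITE' | NaN |
| 1 | 'LANTERN' | 'METAL' | 'WHITE' | NaN | NaN | NaN |
| 2 | 'COAT' | 'CREAM' | 'CUPID' | 'HANGER' | 'HEARTS' | NaN |
| 3 | 'BOTTLE' | 'FLAG' | 'HOT' | 'KNITTED' | 'UNION' | 'WATER' |
| 4 | 'HEART.' | 'HOTTIE' | 'RED' | 'WHITE' | 'WOOLLY' | NaN |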


###### Each row of the dataset represents items that were purchased together on the same day at the same store. The dataset is sparse, as a relatively high percentage of its values are NA, NaN, or equivalent.

###### These NaNs make the table hard to read. Let’s find out how many unique items appear in column `0`, the first item column.

In [9]:
items = (df['0'].unique())
items

array(["'HANGING'", "'LANTERN'", "'COAT'", "'BOTTLE'", "'HEART.'", "'7'",
       "'FROSTED'", "'HAND'", "'DOT'", "'ASSORTED'", "'BEDROOM'",
       "'KITCHEN'", "'CHARLOTTE'", "'COSY'", "'6'", "'BLOCKS'",
       "'ALPHABET'", "'BLOCK'", "'BOX'", "'DOORMAT'", "'JAM'", "'BLUE'",
       "'BATH'", "'ALARM'", "'AND'", "'GIFT'", "'GLOBE'", "'RED'",
       "'BOXES'", "'BAG'", "'LED'", "'2'", "'TOWELS'", "'JIGSAW'",
       "'CIRCUS'", "'MINI'", "'POSTAGE'", "'50''S'", "'EDWARDIAN'",
       "'MUG'", "'BILLBOARD'", "'ANT'", "'FINISH'", "'ANTIQUE'", "'3'",
       "'PAPER'", "'72'", "'60'", "'PINK'", "'CHARLIE+LOLA'", "'&'",
       "'11'", "'GIRLY'", "'JUMBO'", "'AIRLINE'", "'CERAMIC'",
       "'ACRYLIC'", "'CLIP'", "'CHICKEN'", "'BANK'", "'CONFUSING'",
       "'COOK'", "'+'", "'CHAIN'", "'FLOWERS'", "'CANDLE'", "'JUG'",
       "'BLACK'", "'ART'", "'CLOCHE'", "'Discount'", "'CALCULATOR'",
       "'GARDEN'", "'TIN'", "'WICKER'", "'COLOUR'", "'CHALKBOARD'",
       "'HEART'", "'BIRDCAGE'", "'CARD'", "

In [13]:
len(items)

949

###### Column `0` contains 949 unique values. Because `items` is built from this column alone, items that appear only in the other columns are not counted.
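To count unique items across every item column rather than just column `0`, the columns can be stacked into one long Series first. The following is a minimal sketch on a hypothetical toy frame (`toy` and its values are made up for illustration; only the column layout mirrors the table above):

```python
import pandas as pd

# Hypothetical toy frame with the same layout as the retail file.
toy = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "0": ["'HANGING'", "'LANTERN'", "'COAT'"],
    "1": ["'HEART'", "'METAL'", "'CREAM'"],
    "2": ["'WHITE'", "'WHITE'", None],
})

# Drop the row-number column so only item columns remain.
item_cols = toy.drop(columns="Unnamed: 0")

# Stack all item columns into one Series, discard missing cells,
# and collect the distinct item names.
all_items = item_cols.stack().dropna().unique()

# For comparison: distinct values in the first item column only.
first_col_items = toy["0"].unique()

print(len(first_col_items), len(all_items))
```

On the toy data the first column alone misses items such as `'WHITE'`, which only ever appear in later columns.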

### Data Preprocessing: 

To use the `apriori` function from the mlxtend library, we need to convert the dataset into the format it expects: a DataFrame whose values are either 0/1 or True/False. Our data consists entirely of strings (item names), so we need to one-hot encode it.

##### Custom One Hot Encoding: 

In [None]:
itemset = set(items)
encoded_vals = []
for index, row in df.iterrows():
    rowset = set(row)  # values present in this transaction (NaN included)
    labels = {}
    uncommons = list(itemset - rowset)  # items absent from this transaction
    commons = list(itemset.intersection(rowset))  # items present in it
    for uc in uncommons:
        labels[uc] = 0
    for com in commons:
        labels[com] = 1
    encoded_vals.append(labels)
# One row per transaction, one 0/1 column per item
ohe_df = pd.DataFrame(encoded_vals)

###### Because of limited RAM, I couldn't run this cell to completion; apologies. The rest of the code is included below to show my approach to this problem.
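The row-by-row loop above builds a full dictionary for every transaction, which is slow and memory-hungry. One possible lighter alternative is to let pandas do the encoding in a vectorised way. This is only a sketch on hypothetical toy data (`toy` and `ohe_toy` are illustrative names, and the row-number column is assumed to have been dropped already); it has not been run on the retail file:

```python
import pandas as pd

# Hypothetical toy transactions: one row per basket, item columns only.
toy = pd.DataFrame({
    "0": ["'HANGING'", "'LANTERN'", "'COAT'"],
    "1": ["'HEART'", "'METAL'", "'CREAM'"],
    "2": ["'WHITE'", "'WHITE'", None],
})

# One (row, column) entry per purchased item; missing cells are removed.
long_items = toy.stack().dropna()

# One indicator column per item, then collapse back to one row per
# transaction: True if the item appears anywhere in that row.
ohe_toy = pd.get_dummies(long_items).groupby(level=0).any()

print(ohe_toy)
```

The result is a boolean frame with one column per distinct item, which is the shape `apriori` expects. Note that a row with no items at all would drop out of the result in this sketch.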

In [None]:
# Keep only columns whose item name is longer than two characters
columns_to_keep = [col for col in ohe_df.columns if len(str(col)) > 2]

### Applying Apriori:

In [1]:
freq_items = apriori(ohe_df, min_support = 0.0075, max_len = 4, use_colnames=True, verbose=1)
freq_items.head(10)

NameError: name 'apriori' is not defined

#### Mining Association Rules:

#### Most Frequent item sets based on support:

In [42]:
freq_items.sort_values('support', ascending = False, inplace = True)

In [None]:
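plt.rcParams["figure.figsize"] = 20, 10  # set the size before drawing so it applies to this figure
# Pass the colours as a list; they are cycled across the 11 bars
plt.bar(x = list(range(0, 11)), height = freq_items.support[0:11], color = list('rgmyk'))
plt.xticks(list(range(0, 11)), freq_items.itemsets[0:11], rotation=20)
plt.xlabel('item-sets')
plt.ylabel('support')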

#### Rules:

In [None]:
rules = association_rules(freq_items, metric = "lift", min_threshold = 1)
rules.head(20)
rules.sort_values('lift', ascending = False).head(10)

The resulting association rules show which items are frequently purchased together with other items.
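To make the rule metrics concrete, here is a small worked example of support, confidence, and lift computed by hand. The four transactions are hypothetical and are not taken from the retail file; the `support` helper is an illustrative name, not part of mlxtend:

```python
# Hypothetical toy transactions, one set of items per basket.
transactions = [
    {"HEART", "WHITE", "HOLDER"},
    {"HEART", "WHITE"},
    {"LANTERN", "WHITE"},
    {"HEART", "HOLDER"},
]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


antecedent, consequent = {"HEART"}, {"HOLDER"}

# Share of baskets containing both sides of the rule HEART -> HOLDER.
rule_support = support(antecedent | consequent)

# Of the baskets with HEART, the share that also contain HOLDER.
confidence = rule_support / support(antecedent)

# Confidence relative to how common HOLDER is overall.
lift = confidence / support(consequent)

print(rule_support, confidence, lift)
```

A lift above 1 means the two sides occur together more often than they would if they were independent, which is why the rules above are filtered with `metric = "lift"` and `min_threshold = 1`.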

## Analysis of Rules:

### --------------------EXTRA PART-------------------------

In [31]:
def to_list(i):
    return (sorted(list(i)))

In [32]:
ma_X = rules.antecedents.apply(to_list) + rules.consequents.apply(to_list)
ma_X = ma_X.apply(sorted)
rules_sets = list(ma_X)

In [33]:
unique_rules_sets = [list(m) for m in set(tuple(i) for i in rules_sets)]

In [34]:
index_rules = []
for i in unique_rules_sets:
    index_rules.append(rules_sets.index(i))

#### Getting rules without any redundancy:

In [35]:
rules_no_redudancy = rules.iloc[index_rules, :]
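The cells above treat two rules as redundant when they involve the same items, for example A -> B and B -> A, and keep the first occurrence of each. A more compact way to express the same idea is sketched below, assuming the `antecedents` and `consequents` columns hold frozensets as in the output of `association_rules`. The `toy_rules` frame and its values are hypothetical:

```python
import pandas as pd

# Hypothetical rules table with the same column layout as `rules`.
toy_rules = pd.DataFrame({
    "antecedents": [frozenset({"A"}), frozenset({"B"}), frozenset({"A"})],
    "consequents": [frozenset({"B"}), frozenset({"A"}), frozenset({"C"})],
    "lift": [1.5, 1.5, 1.2],
})

# Union of both sides: A -> B and B -> A map to the same frozenset {A, B}.
item_union = pd.Series(
    [a | c for a, c in zip(toy_rules["antecedents"], toy_rules["consequents"])],
    index=toy_rules.index,
)

# Keep only the first rule seen for each distinct set of items.
toy_no_redundancy = toy_rules[~item_union.duplicated()]

print(toy_no_redundancy)
```

Because frozensets compare equal regardless of item order, no explicit sorting of the item lists is needed in this variant.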

#### Sorting them with respect to lift and getting top 10 rules:

In [None]:
rules_no_redudancy.sort_values('lift', ascending = False).head(10)

##### Recommendations: