# Finding relationships between product groups

## 1. Business Understanding
### Objective
The primary objective of this analysis is to identify interesting relationships between product groups in the sales dataset `drone_prod_groups.csv`. Using association rule mining, we aim to uncover patterns in customer purchasing behavior that can help the company increase revenue.

## 2. Data Understanding 
### Dataset Overview
The dataset `drone_prod_groups.csv` consists of transaction-level data with the following structure:

- `ID`: The transaction ID (unique identifier for each purchase).
- `Prod1` to `Prod20`: Binary variables (0 or 1) indicating whether at least one product from the corresponding product group was purchased in the transaction:
  - `1`: At least one product from the group was purchased.
  - `0`: No products from the group were purchased.
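To make this layout concrete, here is a minimal sketch that turns a few baskets into rows of the same 0/1 shape as `Prod1` to `Prod20`. The baskets and the helper `one_hot` are hypothetical illustrations, not taken from the dataset; the point is that a group is marked 1 once, however many of its products were bought.

```python
# Hypothetical example: encode baskets as 0/1 indicator rows like Prod1..Prod20.
GROUPS = [f"Prod{i}" for i in range(1, 21)]  # the 20 product-group columns


def one_hot(basket, groups=GROUPS):
    """Return {group: 0 or 1}; 1 if at least one product of that group is in the basket."""
    present = set(basket)
    return {g: int(g in present) for g in groups}


# Illustrative baskets (lists of product-group names); duplicates collapse to a single 1.
baskets = [
    ["Prod9", "Prod15", "Prod20"],
    ["Prod19", "Prod20", "Prod20"],
    [],
]
rows = [one_hot(b) for b in baskets]
for row in rows:
    print([g for g, v in row.items() if v == 1])
```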


## 3. Data Preparation 
### Data cleaning and Transformation
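- Dropping the `ID` column: the ID does not contribute to the analysis of product associations, so it is removed.
- Binary conversion: replace 1 with `True` and 0 (and any missing values) with `False`. mlxtend's `apriori` accepts 0/1 or `True`/`False` columns, but works most efficiently with booleans.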
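The two cleaning steps can be tried on a tiny frame first. This sketch uses hypothetical values chosen to exercise the edge cases (a missing value and a column name padded with a space, which are assumptions about what a raw CSV might contain); it shows that `DataFrame.eq(1)` alone maps 1 to `True` and both 0 and `NaN` to `False`.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values) mimicking the raw CSV: an ID column,
# 0/1 group indicators, one missing value, and a column name with a stray space.
raw = pd.DataFrame({
    "ID": [1, 2, 3],
    "Prod1": [1, 0, np.nan],
    " Prod2": [0, 1, 1],
})

clean = raw.copy()
clean.columns = clean.columns.str.strip()  # " Prod2" -> "Prod2"
clean = clean.drop(columns="ID")           # ID carries no product information
clean = clean.eq(1)                        # 1 -> True; 0 and NaN -> False

print(clean)
print(clean.dtypes)                        # both columns are bool
```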

In [3]:
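import pandas as pd

# load the transaction data: one row per transaction
df = pd.read_csv('drone_prod_groups.csv')

# remove leading/trailing whitespace from column names so items are labelled consistently
df.columns = df.columns.str.strip()
df.head(10)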

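| | ID | Prod1 | Prod2 | Prod3 | Prod4 | Prod5 | Prod6 | Prod7 | Prod8 | Prod9 | ... | Prod11 | Prod12 | Prod13 | Prod14 | Prod15 | Prod16 | Prod17 | Prod18 | Prod19 | Prod20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 1 | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
| 2 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| 3 | 4 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| 4 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 |
| 5 | 6 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
| 7 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 8 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 9 | 10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |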


In [4]:
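# drop the ID column: it identifies a transaction but says nothing about its contents
df = df.drop(columns='ID')

# convert to booleans: 1 -> True; 0 and missing values (NaN) -> False
# (this replaces the separate fillna step, since NaN == 1 is False anyway)
df = df.eq(1)
df.head(10)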

| | Prod1 | Prod2 | Prod3 | Prod4 | Prod5 | Prod6 | Prod7 | Prod8 | Prod9 | Prod10 | Prod11 | Prod12 | Prod13 | Prod14 | Prod15 | Prod16 | Prod17 | Prod18 | Prod19 | Prod20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | False | False | False | False | False | True | False | False | False | False | False | True | False | False | False | False | True |
| 1 | False | True | False | False | False | False | False | False | True | False | False | False | False | False | True | True | True | True | True | True |
| 2 | False | False | False | False | False | False | True | False | False | False | False | False | False | False | False | False | False | False | True | True |
| 3 | True | False | False | True | False | False | False | False | False | False | True | False | False | False | False | False | False | False | True | True |
| 4 | False | False | False | False | False | False | False | False | True | False | False | False | False | False | True | False | False | False | True | True |
| 5 | False | True | False | False | False | False | True | False | False | False | True | False | False | False | False | False | False | False | False | False |
| 6 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | True | True |
| 7 | False | False | False | False | False | False | False | True | False | False | False | False | False | False | False | True | False | False | False | False |
| 8 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 9 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |


## 4. Modelling
### Association Rule Mining 
Using the Apriori algorithm, we first derive frequent itemsets from the boolean data and then generate association rules from them.

Key Steps:

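- Set a minimum support threshold (0.1 here, i.e. an itemset must appear in at least 10% of transactions) to find frequent itemsets.
- Generate association rules from the frequent itemsets and keep those with a confidence of at least 0.5; as a second view, keep the rules with a lift of at least 2.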
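mlxtend's `apriori` performs the first step for us. To show the level-wise idea behind it (join, prune, count), here is a minimal pure-Python sketch; the function name `apriori_sketch` and the toy transactions are hypothetical, and this is not mlxtend's actual implementation, which works directly on the boolean DataFrame. The threshold is raised to 0.4 only because the toy data has five transactions.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Level-wise Apriori sketch.

    transactions: list of sets of item names, one set per transaction.
    Returns {frozenset(itemset): support} for every itemset whose support
    (fraction of transactions containing it) is at least min_support.
    """
    n = len(transactions)

    def support(itemset):
        # fraction of transactions that contain every item of the itemset
        return sum(itemset <= t for t in transactions) / n

    def keep_frequent(candidates):
        # count each candidate and drop those below the threshold
        counted = {c: support(c) for c in candidates}
        return {c: s for c, s in counted.items() if s >= min_support}

    # level 1: frequent single items
    items = {item for t in transactions for item in t}
    level = keep_frequent(frozenset([i]) for i in items)
    frequent = dict(level)

    k = 2
    while level:
        # join step: combine frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # prune step: every (k-1)-subset of a candidate must itself be frequent
        # (Apriori property: a superset of an infrequent itemset is infrequent)
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        level = keep_frequent(candidates)
        frequent.update(level)
        k += 1
    return frequent


# Hypothetical transactions, for illustration only
transactions = [
    {"Prod9", "Prod15"},
    {"Prod9", "Prod15", "Prod20"},
    {"Prod19", "Prod20"},
    {"Prod9"},
    {"Prod9", "Prod19", "Prod20"},
]
freq = apriori_sketch(transactions, min_support=0.4)
for itemset, s in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

Here {Prod9, Prod15, Prod20} is never counted: it is pruned because its subset {Prod15, Prod20} appears in only one of five transactions.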

In [5]:
from mlxtend.frequent_patterns import apriori
# find frequent itemsets
frequent_itemsets = apriori(df, min_support=0.1, use_colnames=True)
frequent_itemsets

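| | support | itemsets |
|---|---|---|
| 0 | 0.10998 | (Prod1) |
| 1 | 0.13098 | (Prod2) |
| 2 | 0.10459 | (Prod5) |
| 3 | 0.13499 | (Prod7) |
| 4 | 0.16179 | (Prod8) |
| 5 | 0.19853 | (Prod9) |
| 6 | 0.10848 | (Prod11) |
| 7 | 0.15971 | (Prod12) |
| 8 | 0.14557 | (Prod14) |
| 9 | 0.1188 | (Prod15) |

This listing shows only the first 10 rows. The full `frequent_itemsets` also contains at least (Prod19), (Prod20), (Prod9, Prod15) and (Prod19, Prod20), because the association rules below are generated from these itemsets.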


In [6]:
from mlxtend.frequent_patterns import association_rules

# generate association rules

rules = association_rules(frequent_itemsets, metric='confidence', min_threshold=0.5)

# sort in descending order of confidence
rules = rules.sort_values(by='confidence', ascending=False)

rules

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (Prod15) | (Prod9) | 0.1188 | 0.19853 | 0.11145 | 0.938131 | 4.725388 | 0.087865 | 12.954372 | 0.894663 |
| 2 | (Prod20) | (Prod19) | 0.14798 | 0.20626 | 0.13476 | 0.910664 | 4.415125 | 0.104238 | 8.884845 | 0.907849 |
| 3 | (Prod19) | (Prod20) | 0.20626 | 0.14798 | 0.13476 | 0.65335 | 4.415125 | 0.104238 | 2.457869 | 0.974508 |
| 1 | (Prod9) | (Prod15) | 0.19853 | 0.1188 | 0.11145 | 0.561376 | 4.725388 | 0.087865 | 2.009011 | 0.983664 |
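Conceptually, rule generation takes every frequent itemset with at least two items, tries each non-empty proper subset as the antecedent, and keeps the rule if support(itemset) / support(antecedent) reaches the confidence threshold. The sketch below illustrates this idea; it is not mlxtend's code, and the helper `generate_rules` is hypothetical. The supports are copied from the rules table above, and with a threshold of 0.5 it recovers the same four rules in the same order.

```python
from itertools import combinations

# Supports of the relevant frequent itemsets, copied from the rules table above.
supports = {
    frozenset({"Prod9"}): 0.19853,
    frozenset({"Prod15"}): 0.1188,
    frozenset({"Prod19"}): 0.20626,
    frozenset({"Prod20"}): 0.14798,
    frozenset({"Prod9", "Prod15"}): 0.11145,
    frozenset({"Prod19", "Prod20"}): 0.13476,
}


def generate_rules(supports, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples for every rule
    A -> C where A | C is a frequent itemset and confidence >= min_confidence."""
    rules = []
    for itemset, sup in supports.items():
        if len(itemset) < 2:
            continue
        # every non-empty proper subset can serve as the antecedent
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(itemset, r)):
                cons = itemset - ante
                conf = sup / supports[ante]
                if conf >= min_confidence:
                    rules.append((set(ante), set(cons), sup, conf))
    # highest confidence first, like the sorted table above
    return sorted(rules, key=lambda rule: rule[3], reverse=True)


for ante, cons, sup, conf in generate_rules(supports, min_confidence=0.5):
    print(sorted(ante), "->", sorted(cons), f"support={sup:.5f}", f"confidence={conf:.6f}")
```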


In [7]:
# generate the rules again, this time keeping those with lift >= 2
rules = association_rules(frequent_itemsets, metric='lift', min_threshold=2)

# sort in descending order of lift
rules = rules.sort_values(by='lift', ascending=False)
rules

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (Prod15) | (Prod9) | 0.1188 | 0.19853 | 0.11145 | 0.938131 | 4.725388 | 0.087865 | 12.954372 | 0.894663 |
| 1 | (Prod9) | (Prod15) | 0.19853 | 0.1188 | 0.11145 | 0.561376 | 4.725388 | 0.087865 | 2.009011 | 0.983664 |
| 2 | (Prod20) | (Prod19) | 0.14798 | 0.20626 | 0.13476 | 0.910664 | 4.415125 | 0.104238 | 8.884845 | 0.907849 |
| 3 | (Prod19) | (Prod20) | 0.20626 | 0.14798 | 0.13476 | 0.65335 | 4.415125 | 0.104238 | 2.457869 | 0.974508 |


## 5. Evaluation
### Output breakdown 
1. `antecedents`: The "if" part of the rule, i.e. the product group(s) already in the basket (e.g., (Prod15) in the first row).
2. `consequents`: The "then" part of the rule, i.e. the product group(s) likely to be purchased as well (e.g., (Prod9) in the first row).
3. `antecedent support` and `consequent support`: The proportion of all transactions that contain the antecedent or the consequent on its own (e.g., 0.1188 for Prod15 and 0.19853 for Prod9).
4. `support`: The proportion of all transactions that contain both the antecedent and the consequent (e.g., 0.11145 for Prod15 and Prod9). Higher values mean the rule applies to more transactions. Support is the same for both directions of a pair.
5. `confidence`: Support divided by antecedent support, i.e. how often the consequent is purchased when the antecedent is present (e.g., 0.938131 for Prod15 → Prod9). Confidence depends on the direction: the reverse rule Prod9 → Prod15 has a confidence of only 0.561376.
6. `lift`: Confidence divided by consequent support, equivalently the observed support divided by the support expected if the two groups were purchased independently (e.g., 4.725388 for Prod15 and Prod9, in both directions). A lift greater than 1 indicates a positive association (the groups are bought together more often than chance would predict), a lift less than 1 a negative association, and a lift of 1 independence.
7. Other metrics:
   - `leverage`: Support minus (antecedent support × consequent support), i.e. the difference between the observed co-occurrence frequency and the frequency expected under independence (0 means independence; e.g., 0.087865).
   - `conviction`: (1 - consequent support) / (1 - confidence). It compares how often the antecedent would occur without the consequent if the two were independent with how often that actually happens; values well above 1 (e.g., 12.954372 for Prod15 → Prod9) mean the rule is rarely violated.
   - `zhangs_metric`: Zhang's metric, a measure of association that ranges from -1 to 1; positive values indicate association and negative values dissociation.
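Every metric in the table can be recomputed from just three numbers: antecedent support, consequent support and joint support. The sketch below does this for the first row of the rules tables (Prod15 → Prod9). The helper `rule_metrics` is hypothetical; the Zhang's metric line follows the formula as we understand mlxtend to use it, and it reproduces the tabulated value up to rounding of the inputs.

```python
def rule_metrics(s_a, s_c, s_ac):
    """Metrics for a rule A -> C from antecedent support s_a, consequent support s_c
    and joint support s_ac (all given as fractions of transactions)."""
    confidence = s_ac / s_a
    lift = confidence / s_c                      # = s_ac / (s_a * s_c)
    leverage = s_ac - s_a * s_c                  # observed minus expected co-occurrence
    conviction = float("inf") if confidence == 1 else (1 - s_c) / (1 - confidence)
    zhang = leverage / max(s_ac * (1 - s_a), s_a * (s_c - s_ac))
    return {"confidence": confidence, "lift": lift, "leverage": leverage,
            "conviction": conviction, "zhangs_metric": zhang}


# First row of the rules tables: Prod15 -> Prod9
m = rule_metrics(s_a=0.1188, s_c=0.19853, s_ac=0.11145)
for name, value in m.items():
    print(f"{name}: {value:.6f}")

# The reverse rule Prod9 -> Prod15 swaps s_a and s_c: lift stays the same, confidence drops.
m_rev = rule_metrics(s_a=0.19853, s_c=0.1188, s_ac=0.11145)
```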

### Interpretation of the rules 
Rule numbers refer to the row index in the rules tables above.

1. Rule 0: If Prod15 is purchased, then Prod9 is likely to be purchased.
   - Support: 11.1% of all transactions include both products.
   - Confidence: 93.8% of transactions that include Prod15 also include Prod9.
   - Lift: 4.73 (Prod9 is about 4.7 times as likely to be bought when Prod15 is in the basket as in transactions overall, a strong positive association).

2. Rule 1: If Prod9 is purchased, then Prod15 is likely to be purchased.
   - Support: 11.1% of all transactions include both products (the same pair as Rule 0).
   - Confidence: 56.1% of transactions that include Prod9 also include Prod15.
   - Lift: 4.73 (lift is symmetric, so it equals that of Rule 0).

3. Rule 2: If Prod20 is purchased, then Prod19 is likely to be purchased.
   - Support: 13.5% of all transactions include both products.
   - Confidence: 91.1% of transactions that include Prod20 also include Prod19.
   - Lift: 4.42 (strong positive association).

4. Rule 3: If Prod19 is purchased, then Prod20 is likely to be purchased.
   - Support: 13.5% of all transactions include both products.
   - Confidence: 65.3% of transactions that include Prod19 also include Prod20.
   - Lift: 4.42 (strong positive association).
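Each pair has one strong and one weaker direction, and a little arithmetic explains why: both directions share the same joint support, so confidence(A → C) / confidence(C → A) = support(C) / support(A). The stronger rule therefore always runs from the less frequent group to the more frequent one. The sketch below checks this with the supports copied from the rules tables; the helper `confidences` is hypothetical.

```python
# Supports copied from the rules tables above.
support = {"Prod9": 0.19853, "Prod15": 0.1188, "Prod19": 0.20626, "Prod20": 0.14798}
joint = {("Prod9", "Prod15"): 0.11145, ("Prod19", "Prod20"): 0.13476}


def confidences(x, y):
    """Return (conf(x -> y), conf(y -> x)) for a pair with joint support joint[(x, y)]."""
    s_xy = joint[(x, y)]
    return s_xy / support[x], s_xy / support[y]


for x, y in joint:
    c_xy, c_yx = confidences(x, y)
    # both rules share the numerator s_xy, so their ratio is support[y] / support[x]
    print(f"{x}->{y}: {c_xy:.3f}   {y}->{x}: {c_yx:.3f}   "
          f"ratio {c_xy / c_yx:.3f} vs support({y})/support({x}) {support[y] / support[x]:.3f}")
```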



## 6. Deployment

Based on the rules, the following recommendations can be made:

1. Cross-promotion strategies: For pairs such as Prod9 and Prod15 (or Prod19 and Prod20), suggest the other product group as a complementary purchase when a customer buys one of them.
2. Bundling products: Create bundled offers that encourage customers to purchase both product groups together.
3. Inventory management: Adjust inventory levels to ensure that associated products are stocked together, reducing the risk of stockouts for high-demand pairs.
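The cross-promotion idea can be prototyped with a few lines of code. The helper `recommend` below is hypothetical: it hard-codes the four rules from the tables above (in practice they would be read from the `rules` DataFrame), fires a rule when its whole antecedent is in the basket, skips groups the customer already has, and ranks suggestions by confidence.

```python
# Rules as (antecedent, consequent, confidence, lift), copied from the rules tables above.
RULES = [
    ({"Prod15"}, {"Prod9"}, 0.938131, 4.725388),
    ({"Prod20"}, {"Prod19"}, 0.910664, 4.415125),
    ({"Prod19"}, {"Prod20"}, 0.65335, 4.415125),
    ({"Prod9"}, {"Prod15"}, 0.561376, 4.725388),
]


def recommend(basket, rules=RULES, min_confidence=0.5):
    """Hypothetical cross-sell helper: suggest consequent groups for a basket.

    A rule fires when its whole antecedent is in the basket; groups already in
    the basket are skipped. Each suggestion keeps the highest confidence of any
    rule proposing it, and suggestions are returned best-first.
    """
    basket = set(basket)
    best = {}
    for ante, cons, conf, _lift in rules:
        if conf < min_confidence or not ante <= basket:
            continue
        for group in cons - basket:
            best[group] = max(conf, best.get(group, 0.0))
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


print(recommend({"Prod15"}))           # suggests Prod9
print(recommend({"Prod19", "Prod9"}))  # suggests Prod20, then Prod15
print(recommend({"Prod9", "Prod15"}))  # nothing new to suggest
```

Raising `min_confidence` (for example to 0.9) restricts suggestions to the stronger directions, Prod15 → Prod9 and Prod20 → Prod19.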