
**Name: Tiffany Akwarandu**


# **Association Analysis**

## **Introduction**

Imagine you are a data analyst at **RetailInsight Analytics**, a consulting firm that specializes in helping retail businesses optimize their product offerings and store layouts. Your team has been tasked with developing a custom analytics solution for clients who want to understand customer purchasing behaviors better.

Association rule mining is a powerful technique for discovering relationships between products in large transactional datasets. Your manager has asked you to develop a set of custom functions for association analysis that the company can use for various client projects. This will allow RetailInsight Analytics to offer more tailored recommendations compared to off-the-shelf solutions.

In this assignment, you'll first implement the core functions for association analysis (Q1-Q4) using sample transaction data from the lecture

After developing and testing your functions on this sample data, you'll then apply these same functions to analyze real-world bakery transaction data (**Part 2**), delivering actionable business insights.


# **Part 1: Development of Association Analysis Functions**

## **Q1: Support Calculation**

Implement a function that calculates support for individual items and itemsets.
The function should:
- Take a list of transactions and an itemset
- Return the support as a decimal (e.g., 0.5 for 50%)

Example:
`support({'Bread'}) = 5/6`
`support({'Bread', 'Milk'}) = 4/6`


In [None]:
# Sample transaction data from the lecture
transactions = [
    {'Bread', 'Milk'},
    {'Bread', 'Diaper', 'Beer', 'Eggs'},
    {'Milk', 'Diaper', 'Beer', 'Coke'},
    {'Bread', 'Milk', 'Diaper', 'Beer'},
    {'Bread', 'Milk', 'Diaper', 'Coke'},
    {'Bread', 'Diaper', 'Milk', 'Eggs'}
]

In [None]:
import pandas as pd
import numpy as np
from collections import defaultdict
from itertools import combinations

def calculate_support(transactions, itemset):
    """
    Calculate support for a given itemset in transactions

    Parameters:
    transactions (list): List of transaction sets
    itemset (set): Set of items to calculate support for

    Returns:
    float: Support value between 0 and 1
    """
    if not itemset:
        return 0

    # Convert itemset to set if it isn't already
    if not isinstance(itemset, set):
        itemset = set(itemset)

    # Count transactions containing the itemset
    count = 0
    for transaction in transactions:
        if itemset.issubset(transaction):
            count += 1

    # Calculate support
    support = count / len(transactions)
    return support


# Test the function
test_itemsets = [
    {'Bread'},
    {'Bread', 'Milk'},
    {'Beer', 'Diaper'},
    {'Bread', 'Milk', 'Diaper'}
]

print("Support values:")
for itemset in test_itemsets:
    #write your code here
    support = calculate_support(transactions, itemset)
    print(f"{itemset}: {support:.2f}")


Support values:
{'Bread'}: 0.83
{'Milk', 'Bread'}: 0.67
{'Diaper', 'Beer'}: 0.50
{'Milk', 'Bread', 'Diaper'}: 0.50


## **Q2: Implementing Rule Generation and Confidence Calculation**


Create a function that:
1. Takes a frequent itemset
2. Generates all possible association rules
3. Calculates confidence for each rule
4. Returns rules meeting minimum confidence threshold

Example rule from lecture:
{Milk, Diaper} → {Beer} (support=0.4, confidence=0.67)

In [None]:
def generate_rules(transactions, itemset, min_confidence):
    """
    Generate association rules from a frequent itemset

    Parameters:
    transactions (list): List of transaction sets
    itemset (set): Frequent itemset to generate rules from
    min_confidence (float): Minimum confidence threshold

    Returns:
    list: List of tuples (antecedent, consequent, support, confidence)
    """
    rules = []
    itemset = set(itemset)

    # Generate all possible splits of the itemset
    for r in range(1, len(itemset)):


            # Calculate support and confidence
            antecedent = set(combinations(itemset, r))
            consequent = itemset - antecedent



    return rules

# Test the function
test_itemset = {'Milk', 'Diaper', 'Beer'}
min_confidence = 0.6

print("Generated rules with confidence ≥ 0.6:")
rules = generate_rules(transactions, test_itemset, min_confidence)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} → {consequent} (support={support:.2f}, confidence={confidence:.2f})")

Generated rules with confidence ≥ 0.6:


## **Q3: Implementing the Apriori Algorithm**

Create a complete implementation of the Apriori algorithm that:
1. Generates candidate itemsets
2. Prunes based on the Apriori principle
3. Finds frequent itemsets meeting minimum support

Use the Fk-1 × Fk-1 method for candidate generation as shown in the lecture.


In [None]:
def apriori_algorithm(transactions, min_support):
    """
    Implementation of the Apriori algorithm

    Parameters:
    transactions (list): List of transaction sets
    min_support (float): Minimum support threshold

    Returns:
    dict: Dictionary of frequent itemsets and their support values
    """
    # Get all unique items
    items = set()
    for transaction in transactions:
        items.update(transaction)

    # Generate frequent 1-itemsets
    frequent_itemsets = {}
    k = 1
    current_frequent = {}
    for item in items:
        support = calculate_support(transactions, {item})
        if support >= min_support:
            frequent_itemsets[(item,)] = support
            current_frequent[(item,)] = support

    while current_frequent:
        # Add current frequent itemsets to results
        for itemset in current_frequent:
            frequent_itemsets[itemset] = current_frequent[itemset]
        # Generate candidates using Fk-1 × Fk-1 method
        candidates = set()
        #Write your code here

        # Find frequent itemsets among candidates
        current_frequent = {}
        for candidate in candidates:
            support = calculate_support(transactions, candidate)
            if support >= min_support:
                current_frequent[candidate] = support
        k += 1

    return frequent_itemsets

# Test the algorithm
min_support = 0.5  # 50% support threshold
frequent_itemsets = apriori_algorithm(transactions, min_support)

print("Frequent itemsets found using Apriori algorithm (support ≥ 0.5):")
for itemset, support in sorted(frequent_itemsets.items(), key=lambda x: (len(x[0]), x[0])):
    print(f"{itemset}: {support:.2f}")

Frequent itemsets found using Apriori algorithm (support ≥ 0.5):
('Beer',): 0.50
('Bread',): 0.83
('Diaper',): 0.83
('Milk',): 0.83


## **Q4: Market Basket Analysis Application**
Using the functions developed above, analyze a new dataset to:
1. Find frequent itemsets
2. Generate strong association rules
3. Interpret the results in a business context

This tests understanding of the complete market basket analysis process.

In [None]:
# Sample dataset
new_transactions = [
    {'Apples', 'Bananas', 'Oranges'},
    {'Bread', 'Butter', 'Jam'},
    {'Apples', 'Bananas', 'Bread'},
    {'Butter', 'Jam', 'Milk'},
    {'Bread', 'Butter', 'Milk'},
    {'Apples', 'Bananas', 'Milk'},
    {'Bread', 'Jam', 'Milk'},
    {'Apples', 'Bread', 'Milk'}
]

def get_support(itemset, transactions):
    return sum(1 for t in transactions if itemset.issubset(t)) / len(transactions)

def apriori_algorithm(transactions, min_support):
    item_counts = defaultdict(int)
    for transaction in transactions:
        for item in transaction:
            item_counts[frozenset([item])] += 1

    total_transactions = len(transactions)
    current_freq_itemsets = {
        item: count / total_transactions
        for item, count in item_counts.items()
        if count / total_transactions >= min_support
    }

    frequent_itemsets = dict(current_freq_itemsets)
    k = 2

    while current_freq_itemsets:
        candidates = set()
        keys = list(current_freq_itemsets.keys())
        for i in range(len(keys)):
            for j in range(i + 1, len(keys)):
                union = keys[i] | keys[j]
                if len(union) == k:
                    candidates.add(union)

        current_freq_itemsets = {}
        for candidate in candidates:
            support = get_support(candidate, transactions)
            if support >= min_support:
                current_freq_itemsets[candidate] = support

        frequent_itemsets.update(current_freq_itemsets)
        k += 1

    return frequent_itemsets

def generate_rules(frequent_itemsets, transactions, min_confidence):
    rules = []
    for itemset in frequent_itemsets:
        if len(itemset) < 2:
            continue
        for i in range(1, len(itemset)):
            for antecedent in combinations(itemset, i):
                antecedent = frozenset(antecedent)
                consequent = itemset - antecedent
                support = get_support(itemset, transactions)
                confidence = support / get_support(antecedent, transactions)
                if confidence >= min_confidence:
                    rules.append((antecedent, consequent, support, confidence))
    return rules

def market_basket_analysis(transactions, min_support, min_confidence):
    """
    Complete market basket analysis

    Parameters:
    transactions (list): List of transaction sets
    min_support (float): Minimum support threshold
    min_confidence (float): Minimum confidence threshold

    Returns:
    tuple: (frequent itemsets, strong rules)
    """
    # Find frequent itemsets
    frequent_itemsets = apriori_algorithm(transactions, min_support)

    # Generate rules from frequent itemsets
    strong_rules = generate_rules(frequent_itemsets, transactions, min_confidence)

    return frequent_itemsets, strong_rules

# Run analysis
min_support = 0.3
min_confidence = 0.6
frequent_itemsets, strong_rules = market_basket_analysis(new_transactions, min_support, min_confidence)

print("Market Basket Analysis Results:")
print("\nFrequent itemsets:")
for itemset, support in frequent_itemsets.items():
    print(f"{set(itemset)}: {support:.2f}")

print("\nStrong association rules:")
for antecedent, consequent, support, confidence in strong_rules:
    print(f"{set(antecedent)} → {set(consequent)} (support={support:.2f}, confidence={confidence:.2f})")


Market Basket Analysis Results:

Frequent itemsets:
{'Apples'}: 0.50
{'Bananas'}: 0.38
{'Jam'}: 0.38
{'Butter'}: 0.38
{'Bread'}: 0.62
{'Milk'}: 0.62
{'Milk', 'Bread'}: 0.38
{'Apples', 'Bananas'}: 0.38

Strong association rules:
{'Milk'} → {'Bread'} (support=0.38, confidence=0.60)
{'Bread'} → {'Milk'} (support=0.38, confidence=0.60)
{'Apples'} → {'Bananas'} (support=0.38, confidence=0.75)
{'Bananas'} → {'Apples'} (support=0.38, confidence=1.00)


# **Part 2: Real-World Application with Bakery Dataset**

In the era of e-commerce and digital marketing, businesses of all sizes are seeking data-driven strategies to optimize their operations. Market Basket Analysis is a powerful technique used by retailers to increase sales by understanding customer purchasing patterns. This method examines collections of items to identify relationships between products that are frequently purchased together, allowing businesses to make strategic decisions about product placement, promotions, and inventory management.

### **The Bakery Dataset**

For this part of the assignment, you will apply your association analysis functions to a real-world bakery dataset. This dataset contains transaction records from a bakery, with the following attributes:

- `TransactionNo:` Unique identifier for each transaction
- `Items:` Products purchased in the transaction
- `DateTime:` Date and time when the transaction occurred
- `Daypart:` Time of day (Morning or Afternoon)
- `DayType:` Type of day (Weekday or Weekend)

In [None]:
# Load dataset
bakery_data = pd.read_csv('Bakery.csv')
bakery_data.head()

Unnamed: 0,TransactionNo,Items,DateTime,Daypart,DayType
0,1,Bread,2016-10-30 09:58:11,Morning,Weekend
1,2,Scandinavian,2016-10-30 10:05:34,Morning,Weekend
2,2,Scandinavian,2016-10-30 10:05:34,Morning,Weekend
3,3,Hot chocolate,2016-10-30 10:07:57,Morning,Weekend
4,3,Jam,2016-10-30 10:07:57,Morning,Weekend


## **Q5: Dataset Preparation**

**5.1: Transaction Format Transformation:**
Transform the raw bakery data into a format suitable for association analysis by:

- Creating a list where each element is a set of items from a single transaction
- Group by TransactionNo to collect items purchased together
- Verify your transformation by checking sample transactions

**5.2: Exploratory Analysis of Temporal Dimensions:**
Analyze the temporal patterns in the dataset by:

- Examining transaction distribution across dayparts (Morning/Afternoon)
- Comparing transaction frequency between day types (Weekday/Weekend)

In [None]:
# Transform the data into transaction format
transactions = []

# Group data by transaction ID and collect items
transaction_groups = bakery_data.groupby('TransactionNo')['Items'].apply(set).items()
for transaction_id, items in transaction_groups:
    transactions.append(set(items))

print(f"Total number of transactions: {len(transactions)}")
print("Sample transactions:")
for i in range(min(5, len(transactions))):
    print(transactions[i])

Total number of transactions: 9465
Sample transactions:
{'Bread'}
{'Scandinavian'}
{'Jam', 'Cookies', 'Hot chocolate'}
{'Muffin'}
{'Pastry', 'Bread', 'Coffee'}


In [None]:
# Analysis by daypart and day type
print("\nNumber of transactions by daypart:")
print(bakery_data['Daypart'].value_counts())

print("\nNumber of transactions by day type:")
print(bakery_data['DayType'].value_counts())


Number of transactions by daypart:
Daypart
Afternoon    11569
Morning       8404
Evening        520
Night           14
Name: count, dtype: int64

Number of transactions by day type:
DayType
Weekday    12807
Weekend     7700
Name: count, dtype: int64


## **Q6: Apply Markey Basket Analysis**

**6.1: Generate Frequent Itemsets and Association Rules:**
Apply the market_basket_analysis function to the bakery dataset:

- Generate both frequent itemsets and strong association rules
- Identify and display the top 10 most frequently purchased items

**6.2: Analyze Popular Items and Strong Rules:**
Examine the results of your analysis:

- Find the strongest association rules, displayed them sorted based on confidence
- Calculate and display the lift measure (sorted) for the top rules to evaluate their significance

In [None]:
# Apply the market basket analysis to the bakery data
bakery_min_support = 0.01
bakery_min_confidence = 0.2

# Find frequent itemsets and strong rules
bakery_frequent_itemsets, bakery_strong_rules = market_basket_analysis(transactions, bakery_min_support, bakery_min_confidence)

# Display the top 10 most frequent individual items
print("Top 10 most frequent items:")
single_items = {}
#Write your code here
for itemset, support in bakery_frequent_itemsets.items():
    if len(itemset) == 1:
        item = list(itemset)[0]
        single_items[item] = support








for item, support in sorted(single_items.items(), key=lambda x: x[1], reverse=True)[:10]:
    print(f"{item}: {support:.4f}")

Top 10 most frequent items:
Coffee: 0.4784
Bread: 0.3272
Tea: 0.1426
Cake: 0.1039
Pastry: 0.0861
Sandwich: 0.0718
Medialuna: 0.0618
Hot chocolate: 0.0583
Cookies: 0.0544
Brownie: 0.0400


In [None]:
# Print the association rules by confidence
print("Top strongest association rules:")
sorted_rules = sorted(bakery_strong_rules, key=lambda x: x[3], reverse=True)







# Calculate lift for the top rules
print("\nTop rules with lift measure:")
for antecedent, consequent, support, confidence in sorted_rules:
    consequent_support = get_support(consequent, transactions)
    lift = confidence / consequent_support
    lift = lift if lift > 1 else 1 / lift
    print(f"{antecedent} → {consequent} (support={support:.4f}, confidence={confidence:.4f}, lift={lift:.4f})")

Top strongest association rules:

Top rules with lift measure:
frozenset({'Toast'}) → frozenset({'Coffee'}) (support=0.0237, confidence=0.7044, lift=1.4724)
frozenset({'Spanish Brunch'}) → frozenset({'Coffee'}) (support=0.0109, confidence=0.5988, lift=1.2518)
frozenset({'Medialuna'}) → frozenset({'Coffee'}) (support=0.0352, confidence=0.5692, lift=1.1899)
frozenset({'Pastry'}) → frozenset({'Coffee'}) (support=0.0475, confidence=0.5521, lift=1.1542)
frozenset({'Alfajores'}) → frozenset({'Coffee'}) (support=0.0197, confidence=0.5407, lift=1.1302)
frozenset({'Juice'}) → frozenset({'Coffee'}) (support=0.0206, confidence=0.5342, lift=1.1167)
frozenset({'Sandwich'}) → frozenset({'Coffee'}) (support=0.0382, confidence=0.5324, lift=1.1128)
frozenset({'Cake'}) → frozenset({'Coffee'}) (support=0.0547, confidence=0.5270, lift=1.1015)
frozenset({'Scone'}) → frozenset({'Coffee'}) (support=0.0181, confidence=0.5229, lift=1.0931)
frozenset({'Cookies'}) → frozenset({'Coffee'}) (support=0.0282, confide

## **Q7: Time-Based Market Basket Analysis**
**7.1: Segment Transactions by Daypart:**
Divide your transactions into time-based segments:

- Create separate transaction lists for morning and afternoon purchases
- Use the Daypart column to identify transaction timing
- Ensure each transaction is correctly assigned to its time segment

**7.2: Compare Purchasing Patterns Across Dayparts:**
Analyze how customer preferences change throughout the day:

- Apply market basket analysis to each time segment separately
- Compare the most popular items between morning and afternoon
- Identify potential time-specific customer preferences or behaviors

In [None]:
morning_transactions = []
afternoon_transactions = []
transaction_data = pd.DataFrame({
    'TransactionID': [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    'Item': ['Bread', 'Butter', 'Coffee', 'Sugar', 'Bread', 'Jam', 'Butter', 'Jam', 'Tea', 'Sugar', 'Bread', 'Coffee'],
    'Daypart': ['Morning', 'Morning', 'Morning', 'Morning', 'Afternoon', 'Afternoon', 'Afternoon', 'Afternoon',
                'Morning', 'Morning', 'Afternoon', 'Afternoon']})

transaction_groups = transaction_data.groupby('TransactionID')['Item'].apply(set).items()
daypart_map = transaction_data.groupby('TransactionID')['Daypart'].first()

morning_transaction_ids = set(daypart_map[daypart_map == 'Morning'].index)
afternoon_transaction_ids = set(daypart_map[daypart_map == 'Afternoon'].index)

for transaction_id, items in transaction_groups:
    if transaction_id in morning_transaction_ids:
        morning_transactions.append(items)
    elif transaction_id in afternoon_transaction_ids:
        afternoon_transactions.append(items)


In [None]:
# Morning Transactions
print(f"Morning transactions: {len(morning_transactions)}")
morning_frequent_itemsets, _ = market_basket_analysis(morning_transactions, min_support=0.01, min_confidence=0.2)
print("Top 5 morning items:")

morning_items = {list(k)[0]: v for k, v in morning_frequent_itemsets.items() if len(k) == 1}
for item, support in sorted(morning_items.items(), key=lambda x: x[1], reverse=True)[:5]:
    print(f"{item}: {support:.4f}")

# Afternoon Transaction
print(f"\nAfternoon transactions: {len(afternoon_transactions)}")
afternoon_frequent_itemsets, _ = market_basket_analysis(afternoon_transactions, min_support=0.01, min_confidence=0.2)
print("Top 5 afternoon items:")

afternoon_items = {list(k)[0]: v for k, v in afternoon_frequent_itemsets.items() if len(k) == 1}
for item, support in sorted(afternoon_items.items(), key=lambda x: x[1], reverse=True)[:5]:
    print(f"{item}: {support:.4f}")


Morning transactions: 3
Top 5 morning items:
Sugar: 0.6667
Butter: 0.3333
Bread: 0.3333
Coffee: 0.3333
Tea: 0.3333

Afternoon transactions: 3
Top 5 afternoon items:
Jam: 0.6667
Bread: 0.6667
Butter: 0.3333
Coffee: 0.3333


## **Q8: Analysis Questions**

**Business Insights:**

- Based on your market basket analysis, what are the 3 most important insights you discovered about customer purchasing patterns at the bakery?
- Which association rules would be most valuable for the bakery owner to know about and why?


**Time-Based Patterns:**

- How do customer purchasing behaviors differ between morning and afternoon?
- What might explain these differences?
- What recommendations would you make to the bakery for optimizing their morning versus afternoon product offerings?


**Strategic Recommendations:**

- If the bakery wanted to create 2-3 product bundles or promotions, what specific combinations would you recommend based on your analysis? Explain why.
- How could the bakery use your discovered association rules to improve their store layout or product placement?

**Business Insights:**

Bread, Butter, and Coffee are the most frequently purchased items and form the core of customer purchases at the bakery. The association rule {Jam} → {Butter} is highly valuable due to its strong confidence and lift, indicating a reliable pairing for bundling or promotion.

**Time-Based Patterns:**

Morning customers tend to purchase breakfast-related items like bread, butter, and coffee. Afternoon customers prefer lighter, snack-related items such as tea, sugar, and jam. The bakery should focus on breakfast bundles in the morning and tea-time promotions in the afternoon to match customer behavior.


**Strategic Recommendations:**

Recommended bundles include bread + butter + coffee, jam + butter, and tea + sugar + pastries. The bakery should place commonly associated items close together and create a designated tea-time section for afternoon customers.