# Product Analysis: Two-Product Combinations (Excluding CIGARETTES)

This notebook analyzes two-product combinations in invoices, excluding products from the **CIGARETTES** megacategory. It also calculates the average revenue and quantity of **WAFERS** for each store type.

```python
# Import libraries
import pandas as pd
from collections import defaultdict
```

## Step 1: Load Data
We begin by loading the data from the provided Excel file, which contains three sheets: **Categories**, **Stores**, and **Transactions**. Each sheet is loaded into a pandas DataFrame.

```python
# Read the three Excel sheets
file_path = 'University Dataset 2024.xlsx'

df_categories = pd.read_excel(file_path, sheet_name='Categories')
df_stores = pd.read_excel(file_path, sheet_name='Stores')
df_transactions = pd.read_excel(file_path, sheet_name='Transactions')
```

## Step 2: Exclude 'CIGARETTES' Megacategory
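We identify the subcategories that belong to the **CIGARETTES** megacategory and drop them, because they dominate the transactions and would otherwise crowd out all other combinations. Only the cigarette line items are removed. The remaining items of an invoice that also contained cigarettes are kept.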

```python
# Identify subcategories belonging to the 'CIGARETTES' megacategory
excluded_subcategories = df_categories.loc[df_categories['Megacategory'] == 'CIGARETTES', 'Subcategory'].tolist()

# Drop the line items in these subcategories (other items on the same invoice are kept)
df_filtered_transactions = df_transactions[~df_transactions['Subcategory'].isin(excluded_subcategories)]
```
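Filtering rows rather than whole invoices matters for Step 4. An invoice with one cigarette item and two other items becomes a two-item invoice after this filter. The sketch below contrasts the row-level filter used here with an invoice-level alternative. It runs on a small hypothetical table; the subcategory `FILTER CIGS`, the megacategories `SNACKS` and `DRINKS`, and the name `has_cig` are illustrative assumptions, not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy data: invoice 1 mixes a cigarette item with two other items
categories = pd.DataFrame({
    'Subcategory': ['FILTER CIGS', 'WAFERS', 'COLAS'],
    'Megacategory': ['CIGARETTES', 'SNACKS', 'DRINKS'],
})
transactions = pd.DataFrame({
    'InvoiceGlobalId': [1, 1, 1, 2, 2],
    'Subcategory': ['FILTER CIGS', 'WAFERS', 'COLAS', 'WAFERS', 'COLAS'],
})

excluded = categories.loc[categories['Megacategory'] == 'CIGARETTES', 'Subcategory']

# Row-level filter (as in Step 2): only the cigarette line is dropped,
# so invoice 1 survives as a two-item invoice (WAFERS + COLAS)
row_level = transactions[~transactions['Subcategory'].isin(excluded)]

# Invoice-level alternative: drop every invoice that contains any cigarette item
has_cig = (transactions['Subcategory'].isin(excluded)
           .groupby(transactions['InvoiceGlobalId'])
           .transform('any'))
invoice_level = transactions[~has_cig]

print(row_level.groupby('InvoiceGlobalId').size().to_dict())
print(invoice_level.groupby('InvoiceGlobalId').size().to_dict())
```

Which variant is appropriate depends on whether cigarette buyers' other purchases should count toward the combinations. The notebook's results are based on the row-level variant.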

## Step 3: Merge Transactions with Store Types
We now merge the filtered transactions with the store type information from the **Stores** sheet.

```python
# Attach each transaction's store type (left join on StoreId)
df_merged = df_filtered_transactions.merge(df_stores, on='StoreId', how='left')
```
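A left join silently leaves `StoreType` empty for any `StoreId` missing from the Stores sheet. By default, `groupby('StoreType')` drops rows with a missing key, so those transactions would vanish from Step 4 without warning. The sketch below uses pandas' `validate` and `indicator` options of `merge` to surface such cases. It runs on hypothetical toy data in which store 3 has no entry in the stores table.

```python
import pandas as pd

# Hypothetical toy data: StoreId 3 is missing from the stores table
stores = pd.DataFrame({'StoreId': [1, 2], 'StoreType': ['Kiosk', 'Mini-Market']})
transactions = pd.DataFrame({
    'StoreId': [1, 2, 3],
    'Subcategory': ['WAFERS', 'COLAS', 'CHIPS'],
})

# validate='many_to_one' raises a MergeError if a StoreId appears more than once
# in stores (which would otherwise duplicate transaction rows);
# indicator=True adds a '_merge' column telling whether each row found a match
merged = transactions.merge(stores, on='StoreId', how='left',
                            validate='many_to_one', indicator=True)

unmatched = merged[merged['_merge'] == 'left_only']
print(len(unmatched))  # 1
```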

## Step 4: Analyze Two-Product Combinations Per Store Type
We analyze the transactions to find two-product combinations in invoices. For each store type, we keep the invoices that contain exactly two line items and count how often each unique (unordered) pair of subcategories occurs.
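To make the pairing logic concrete, here is a minimal sketch on a hypothetical toy table. It expresses the same counting with pandas built-ins (`transform`, `agg`, `value_counts`) instead of an explicit loop. Like the loop, it counts line items per invoice, so an invoice with two lines of the same subcategory would produce a pair such as `('WAFERS', 'WAFERS')`.

```python
import pandas as pd

# Hypothetical toy line items: one row per product line in an invoice
lines = pd.DataFrame({
    'InvoiceGlobalId': [1, 1, 2, 2, 3, 3, 3, 4, 4],
    'Subcategory': ['WAFERS', 'CHEWING GUMS',
                    'CHEWING GUMS', 'WAFERS',
                    'COLAS', 'CHIPS', 'BEERS',
                    'COLAS', 'WAFERS'],
})

# Keep invoices with exactly two line items (same criterion as the loop: .count() per invoice)
sizes = lines.groupby('InvoiceGlobalId')['Subcategory'].transform('count')
two_item = lines[sizes == 2]

# Turn each invoice into a sorted tuple so (A, B) and (B, A) count as the same pair
pairs = two_item.groupby('InvoiceGlobalId')['Subcategory'].agg(lambda s: tuple(sorted(s)))

# Count how many invoices contain each pair, most frequent first
pair_counts = pairs.value_counts()
print(pair_counts)
```

Here invoice 3 (three items) is ignored, and `('CHEWING GUMS', 'WAFERS')` is counted twice even though the items appear in opposite orders. To count distinct products instead of line items, one could switch to `'nunique'` and `tuple(sorted(set(s)))`. That would be a different definition and could change the results.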

```python
# Dictionary to store the pair-frequency table for each store type
store_type_results = {}

# Analyze transactions for each store type
for store_type, df_store in df_merged.groupby('StoreType'):
    # Count line items per invoice and keep invoices with exactly two
    invoice_counts = df_store.groupby('InvoiceGlobalId')['Subcategory'].count()
    two_product_invoices = invoice_counts[invoice_counts == 2].index

    df_filtered = df_store[df_store['InvoiceGlobalId'].isin(two_product_invoices)]

    # Collect the subcategories of each invoice into a list
    basket_combinations = df_filtered.groupby('InvoiceGlobalId')['Subcategory'].agg(list)
    product_pairs = defaultdict(int)

    for basket in basket_combinations:
        if len(basket) == 2:
            # Sort so that (A, B) and (B, A) are counted as the same pair
            pair = tuple(sorted(basket))
            product_pairs[pair] += 1

    # Build a DataFrame of pairs and the number of invoices containing each
    chemistry_df = pd.DataFrame([(k[0], k[1], v) for k, v in product_pairs.items()],
                                columns=['Product1', 'Product2', 'Frequency'])

    chemistry_df = chemistry_df.sort_values('Frequency', ascending=False)
    store_type_results[store_type] = chemistry_df

    # Print the top 10 pairs for each store type
    print(f"\nMost Frequent Two-Product Combinations in {store_type} (Excluding CIGARETTES):")
    print(chemistry_df.head(10))
```

```text
Most Frequent Two-Product Combinations in Kiosk (Excluding CIGARETTES):
           Product1        Product2  Frequency
527    CHEWING GUMS          WAFERS       1005
35     ENERGY DRINK  NATURAL WATER         518
88            COLAS          WAFERS        501
161    ENERGY DRINK          WAFERS        493
440       CROISSANT          WAFERS        480
156  NATURAL WATER           WAFERS        456
19            COLAS  NATURAL WATER         422
78            BEERS          CHIPS         394
66            BEERS           COLAS        392
53            COLAS         FLAVORS        363

Most Frequent Two-Product Combinations in Mini-Market (Excluding CIGARETTES):
         Product1        Product2  Frequency
8    ENERGY DRINK  NATURAL WATER         655
9           BEERS          CHIPS         586
550  CHEWING GUMS  NATURAL WATER         576
127        CHIPS        EXTRUDED         484
70          COLAS         TABLETS        467
122         COLAS         FLAVORS        467
59          BEER
```

WAFERS appear prominently among the most frequent two-product combinations in **Kiosks**, but they do not appear among the top combinations in **Mini-Markets**. In Kiosk invoices, WAFERS are paired most often with Chewing Gums (the most frequent pair overall, 1,005 invoices), followed by Colas, Energy Drinks, Croissants, and Natural Water. These combinations suggest that WAFERS are a popular impulse item in Kiosks, particularly alongside snacks and drinks.
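Only the top 10 pairs per store type are printed. One way to see where WAFERS pairs actually rank, if anywhere, is to search each store type's full table. The sketch below assumes the structure built in Step 4 (a dict of DataFrames with `Product1`, `Product2`, `Frequency` columns). It uses a hypothetical toy stand-in for `store_type_results`, with invented frequencies, and a hypothetical helper `pairs_with`.

```python
import pandas as pd

# Hypothetical stand-in for store_type_results (frequencies are invented)
store_type_results = {
    'Kiosk': pd.DataFrame({
        'Product1': ['CHEWING GUMS', 'ENERGY DRINK', 'COLAS'],
        'Product2': ['WAFERS', 'NATURAL WATER', 'WAFERS'],
        'Frequency': [30, 20, 10],
    }),
    'Mini-Market': pd.DataFrame({
        'Product1': ['BEERS', 'CHIPS', 'COLAS'],
        'Product2': ['CHIPS', 'EXTRUDED', 'WAFERS'],
        'Frequency': [25, 15, 5],
    }),
}

def pairs_with(results, product):
    """Return every pair containing `product`, with its 1-based rank within each store type."""
    frames = []
    for store_type, df in results.items():
        ranked = df.sort_values('Frequency', ascending=False).reset_index(drop=True)
        ranked['Rank'] = ranked.index + 1
        mask = (ranked['Product1'] == product) | (ranked['Product2'] == product)
        frames.append(ranked[mask].assign(StoreType=store_type))
    return pd.concat(frames, ignore_index=True)

print(pairs_with(store_type_results, 'WAFERS'))
```

In the notebook itself, one would call `pairs_with(store_type_results, 'WAFERS')` on the real dictionary from Step 4 instead of the toy one.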

## Step 5: Calculate Average Revenue and Quantity for WAFERS
For each store type, we calculate the average revenue and quantity for **WAFERS** products.

```python
# Average revenue and quantity per WAFERS line item, by store type
for store_type, df_store in df_merged.groupby('StoreType'):
    wafers_transactions = df_store[df_store['Subcategory'] == 'WAFERS']
    avg_revenue = wafers_transactions['Revenue'].mean()
    avg_quantity = wafers_transactions['Quantity'].mean()

    print(f"\nAverage Revenue of WAFERS in {store_type}: {avg_revenue:.2f}")
    print(f"Average Quantity of WAFERS in {store_type}: {avg_quantity:.2f}")
```

```text
Average Revenue of WAFERS in Kiosk: 0.97
Average Quantity of WAFERS in Kiosk: 1.78

Average Revenue of WAFERS in Mini-Market: 1.41
Average Quantity of WAFERS in Mini-Market: 1.56
```


In **Kiosks**, the average revenue per WAFERS line item (0.97) is lower than in **Mini-Markets** (1.41), while the average quantity per line item is higher (1.78 vs. 1.56). Kiosk customers therefore buy more WAFERS per purchase but generate less revenue per line. In Mini-Markets, WAFERS do not appear in the top combinations, which suggests that they are either less popular there or positioned differently within the store.
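The two averages can be combined into a revenue per unit sold, which separates price from volume. The sketch below computes the Step 5 averages in a single `groupby` and adds revenue per unit as total revenue divided by total quantity. It runs on hypothetical toy WAFERS line items; in the notebook, the input would be `df_merged` filtered to `Subcategory == 'WAFERS'`.

```python
import pandas as pd

# Hypothetical WAFERS line items (values are invented for illustration)
wafers = pd.DataFrame({
    'StoreType': ['Kiosk', 'Kiosk', 'Mini-Market', 'Mini-Market'],
    'Revenue': [0.50, 1.50, 1.00, 2.00],
    'Quantity': [1, 3, 1, 2],
})

# Same per-store-type averages as the Step 5 loop, in one groupby
summary = wafers.groupby('StoreType')[['Revenue', 'Quantity']].mean()

# Revenue per unit sold = total revenue / total quantity; this equals
# mean revenue / mean quantity when both are taken over the same rows with no missing values
totals = wafers.groupby('StoreType')[['Revenue', 'Quantity']].sum()
summary['RevenuePerUnit'] = totals['Revenue'] / totals['Quantity']
print(summary.round(2))
```

In this toy table, Kiosk comes out at 0.50 per unit and Mini-Market at 1.00. Applying the same ratio to the reported averages gives roughly 0.97 / 1.78 ≈ 0.54 per unit in Kiosks and 1.41 / 1.56 ≈ 0.90 in Mini-Markets. These figures are only approximate, because the averages are rounded to two decimals and the ratio assumes no missing values.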

### Conclusion

For **Kiosks**, businesses should continue to focus on **high-volume sales** of WAFERS, for example through **bundles or combo offers** with frequently co-purchased items such as Chewing Gums, Energy Drinks, and Colas. Attractively priced combos could encourage impulse buys and increase the volume of WAFERS sold.

For **Mini-Markets**, even though WAFERS are not part of the most common product combinations, it may still be worth testing **promotions or bundles** that pair WAFERS with the products they are most often bought with in Kiosks, such as **WAFERS with Chewing Gums** or **WAFERS with Energy Drinks**. Such offers could make WAFERS more appealing by offering better value and could encourage cross-selling with complementary products. If Mini-Market shoppers respond to these pairings the way Kiosk shoppers do, this strategy could stimulate demand for WAFERS and increase overall store traffic.