# CS634 – Data Mining Midterm Project
### Author: Taymar Walters

This notebook walks through the code execution for my data mining project, which uses:
- A **Brute Force implementation (from scratch)**
- **Apriori** (mlxtend)
- **FP-Growth** (mlxtend)

It shows:
1. Dataset loading
2. Algorithm execution
3. Frequent itemsets and rules



## Import Packages
This part of the code imports the required packages, installing any that are missing.

In [None]:
import itertools
import os
import subprocess
import sys
import time

def install_if_missing(package):
    """Install a package with pip if it cannot be imported."""
    try:
        __import__(package)
    except ImportError:
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])

for pkg in ["pandas", "mlxtend"]:
    install_if_missing(pkg)

# Import third-party packages only after they are guaranteed to be installed
import pandas as pd
from mlxtend.frequent_patterns import apriori, fpgrowth, association_rules
from mlxtend.preprocessing import TransactionEncoder
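
The `install_if_missing` helper detects a missing package by trying to import it. As a hedged alternative sketch, the standard library's `importlib.util.find_spec` can report whether a top-level module is importable without actually importing it. A pip distribution name can differ from its import name (for example, `scikit-learn` is imported as `sklearn`), so the sketch keeps a small mapping; `is_installed` and `PACKAGES` are illustrative names, not part of the original project.

```python
import importlib.util

def is_installed(module_name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None

# Illustrative mapping of import name -> pip name (identical for these two packages)
PACKAGES = {"pandas": "pandas", "mlxtend": "mlxtend"}

missing = [pip_name for module, pip_name in PACKAGES.items() if not is_installed(module)]
print("Packages to install:", missing)
```

Each name in `missing` could then be passed to the same `pip install` call that `install_if_missing` uses.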

## Load and Preview Dataset
Here the program asks the user to select a transactional dataset, then loads it along with the list of unique items corresponding to that dataset. Both are stored as CSV files. For this execution, we use `generic_transactions.csv` and its corresponding `generic_items.csv` file for demonstration.

In [None]:
# Prompt the user to select a dataset
print("Here are the following transactional databases\n"
      " (1) Generic\n (2) Nike\n (3) Best Buy\n (4) Coffee Shop\n (5) K-mart\n")

def selectfile():
    """Prompt until a valid menu number is entered; return (transactions_file, items_file)."""
    while True:
        try:
            fileNumber = int(input("Enter number to select a database: \n"))
        except ValueError:
            print("Please enter a valid number between 1 and 5.")
            continue
        # match/case requires Python 3.10 or newer
        match fileNumber:
            case 1:
                return "generic_transactions.csv", "generic_items.csv"
            case 2:
                return "nike_product_transactions.csv", "nike_products.csv"
            case 3:
                return "bestbuy_transactions.csv", "bestbuy_products.csv"
            case 4:
                return "coffee_transactions.csv", "coffee_items.csv"
            case 5:
                return "k-mart_transactions.csv", "k-mart_items.csv"
            case _:
                print("Invalid input. Please try again.")

transactions, items = selectfile()

# __file__ is undefined inside a Jupyter notebook, so fall back to the working directory
if "__file__" in globals():
    base_path = os.path.dirname(os.path.abspath(__file__))
else:
    base_path = os.getcwd()

# Print the list of unique items
file_path = os.path.join(base_path, items)
print("=============================================================")
print("Here are the unique items corresponding to the transactions:")
print("=============================================================")
df = pd.read_csv(file_path)
df = df.dropna(how='all')
df.columns = df.columns.str.strip()
df["Item #"] = df["Item #"].astype(int)
print(df.to_string(index=False))
print("================================================")

# Get the file path for the transaction dataset
file_path = os.path.join(base_path, transactions)
# Columns that might represent transactions
possible_cols = ["transaction", "transactions", "items", "basket"]

try:
    # Try to read the CSV normally, then fallback to alternate delimiter
    try:
        data = pd.read_csv(file_path)
    except Exception:
        data = pd.read_csv(file_path, delimiter=';')

    data.columns = data.columns.str.strip().str.lower()

    # Detect transaction column dynamically
    target_col = None
    for col in possible_cols:
        if col in data.columns:
            target_col = col
            break

    if target_col is None:
        raise KeyError("❌ No valid transaction column found in this file.")

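    # Split each basket on commas and trim whitespace, keeping multi-word item names intact
    transactions = [
        [item.strip() for item in str(t).split(",") if item.strip()]
        for t in data[target_col]
        if pd.notna(t)
    ]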

except FileNotFoundError:
    raise FileNotFoundError(f"❌ File not found: {file_path}")
except Exception as e:
    raise RuntimeError(f"⚠️ Error loading data: {e}")

all_items = sorted(set(item for sublist in transactions for item in sublist))
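
The loading cell turns each comma-separated basket into a list of items, and mlxtend's `apriori` and `fpgrowth` expect those lists as a one-hot table (one boolean column per item), which `TransactionEncoder` builds. The plain-Python sketch below shows both steps on made-up rows so the data shape is easy to see; `parse_baskets` and `one_hot_encode` are hypothetical helper names, and `None` stands in for an empty CSV cell.

```python
def parse_baskets(rows):
    """Split comma-separated basket strings into item lists, trimming whitespace."""
    baskets = []
    for row in rows:
        if row is None:  # stand-in for a missing CSV cell
            continue
        items = [item.strip() for item in str(row).split(",") if item.strip()]
        if items:
            baskets.append(items)
    return baskets

def one_hot_encode(baskets):
    """Return (columns, rows): sorted item names and one boolean per item per basket."""
    columns = sorted({item for basket in baskets for item in basket})
    rows = []
    for basket in baskets:
        present = set(basket)
        rows.append([col in present for col in columns])
    return columns, rows

baskets = parse_baskets(["A, B, C", "A,C", "B , D", None])
columns, rows = one_hot_encode(baskets)
print(columns)
for row in rows:
    print(row)
```

Each printed row corresponds to one row of the `one_hot` DataFrame that the mlxtend algorithms consume.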

## User Input for Support & Confidence
The program now asks the user to enter the minimum support and minimum confidence, checking each input and falling back to the default value when an entry is invalid.

**Default Parameters:**

*Minimum Support* = 0.3

*Minimum Confidence* = 0.6

In [None]:
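try:
    min_support = float(input("Enter minimum support (e.g., 0.3 for 30%): "))
    # Support is a fraction of transactions, so it must lie in (0, 1]
    if not 0 < min_support <= 1:
        raise ValueError
except ValueError:
    print("Invalid input. Using default min_support = 0.3")
    min_support = 0.3

try:
    min_confidence = float(input("Enter minimum confidence (e.g., 0.6 for 60%): "))
    # Confidence is a conditional probability, so it must lie in (0, 1]
    if not 0 < min_confidence <= 1:
        raise ValueError
except ValueError:
    print("Invalid input. Using default min_confidence = 0.6")
    min_confidence = 0.6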

print(f"\nUsing min_support = {min_support} and min_confidence = {min_confidence}\n")
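
To make the two thresholds concrete: the support of an itemset X is the fraction of transactions that contain all of X, and the confidence of a rule X → Y is support(X ∪ Y) / support(X). The sketch below computes both on a small made-up list of transactions; the function names are illustrative. A rule is kept only if it clears both `min_support` and `min_confidence`.

```python
def support_count(itemset, transactions):
    """Number of transactions containing every item in itemset."""
    target = set(itemset)
    return sum(1 for t in transactions if target.issubset(t))

def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    return support_count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(X ∪ Y) / support(X); the transaction count cancels, so use raw counts."""
    both = support_count(set(antecedent) | set(consequent), transactions)
    return both / support_count(antecedent, transactions)

# Made-up sample of 5 transactions
sample = [["A", "C"], ["A", "C", "D"], ["A", "B"], ["A", "C", "E"], ["B", "D"]]

sup_ac = support(["A", "C"], sample)          # 3 of 5 transactions -> 0.6
conf_a_c = confidence(["A"], ["C"], sample)   # 3 of the 4 transactions with A -> 0.75
print(sup_ac, conf_a_c)
print("keep rule A -> C:", sup_ac >= 0.3 and conf_a_c >= 0.6)
```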

## Running Brute Force for frequent itemset mining

In [None]:
def get_support(itemset, transactions):
    """Return the support count: the number of transactions containing every item in itemset."""
    target = set(itemset)
    return sum(1 for t in transactions if target.issubset(t))

def brute_force_mining(transactions, min_support):
    """Enumerate all k-item combinations level by level and keep those meeting min_support."""
    num_transactions = len(transactions)
    # Derive the item universe from the transactions instead of relying on a global
    all_items = sorted({item for t in transactions for item in t})
    frequent_itemsets = []
    k = 1

    while True:
        level_frequent = []
        for c in itertools.combinations(all_items, k):
            support = get_support(c, transactions) / num_transactions
            if support >= min_support:
                level_frequent.append((c, support))

        # If no k-itemset is frequent, no larger itemset can be (downward closure), so stop
        if not level_frequent:
            break

        frequent_itemsets.extend(level_frequent)
        print(f"Found {len(level_frequent)} frequent {k}-itemsets")
        k += 1

    return frequent_itemsets
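
The brute-force miner enumerates every combination of items at each level. Apriori, which mlxtend implements, instead builds level-(k+1) candidates only from frequent k-itemsets. The sketch below illustrates that textbook join-and-prune idea in plain Python; it is an illustration, not mlxtend's internal code, and `apriori_gen` is a hypothetical name. The input is the six frequent 2-itemsets from the generic dataset's sample output.

```python
from itertools import combinations

def apriori_gen(frequent_k):
    """Build (k+1)-item candidates from frequent k-itemsets (tuples of sorted items)."""
    frequent = set(frequent_k)
    ordered = sorted(frequent)
    candidates = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            # Join step: merge two k-itemsets that agree on their first k-1 items
            if a[:-1] != b[:-1]:
                continue
            candidate = a + (b[-1],)
            # Prune step: every k-item subset of the candidate must itself be frequent
            if all(sub in frequent for sub in combinations(candidate, len(a))):
                candidates.append(candidate)
    return candidates

frequent_2 = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("C", "D"), ("C", "E")]
print(apriori_gen(frequent_2))
```

Only two candidates survive pruning here, whereas the brute-force loop tests all 3-item combinations of the items; on this dataset both happen to turn out frequent, matching the two 3-itemsets in the sample output.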



## Generating Association Rules with Brute Force

In [None]:
print("\nRunning Brute-Force Algorithm...")

def generate_rules(frequent_itemsets, min_confidence, transactions):
    """Generate rules X -> Y from every frequent itemset containing at least 2 items."""
    num_transactions = len(transactions)
    rules = []
    for itemset, sup_itemset in frequent_itemsets:
        if len(itemset) < 2:
            continue
        # Every non-empty proper subset of the itemset is a candidate antecedent
        for i in range(1, len(itemset)):
            for antecedent in itertools.combinations(itemset, i):
                consequent = tuple(set(itemset) - set(antecedent))
                sup_antecedent = get_support(antecedent, transactions) / num_transactions
                # confidence(X -> Y) = support(X ∪ Y) / support(X)
                confidence = sup_itemset / sup_antecedent if sup_antecedent > 0 else 0
                if confidence >= min_confidence:
                    rules.append((antecedent, consequent, sup_itemset, confidence))
    return rules

# Time itemset mining plus rule generation, as the Apriori and FP-Growth timings do
start_brute = time.perf_counter()
frequent_itemsets_brute = brute_force_mining(transactions, min_support)
rules_brute = generate_rules(frequent_itemsets_brute, min_confidence, transactions)
brute_force_time = time.perf_counter() - start_brute
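
`generate_rules` rescans the transactions to get each antecedent's support. Because every subset of a frequent itemset is itself frequent, those supports are already among the mined itemsets, so a lookup table could replace the rescans. The sketch below is a hypothetical variant built that way (`rules_for_itemset` is an illustrative name); the support values are copied from the generic dataset's sample output, and for the itemset ('A', 'C', 'D') at `min_confidence = 0.6` it finds the same three rules listed there.

```python
from itertools import combinations

def rules_for_itemset(itemset, support, min_confidence):
    """Return (antecedent, consequent, confidence) for each rule split passing the threshold."""
    itemset = tuple(sorted(itemset))
    whole = support[frozenset(itemset)]
    rules = []
    for size in range(1, len(itemset)):
        for antecedent in combinations(itemset, size):
            consequent = tuple(x for x in itemset if x not in antecedent)
            # Antecedent support is looked up, not recounted from the transactions
            confidence = whole / support[frozenset(antecedent)]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, round(confidence, 2)))
    return rules

# Supports copied from the sample output for the generic dataset
support = {frozenset(k): v for k, v in {
    ("A",): 1.00, ("C",): 0.60, ("D",): 0.45,
    ("A", "C"): 0.60, ("A", "D"): 0.45, ("C", "D"): 0.30,
    ("A", "C", "D"): 0.30,
}.items()}

print(rules_for_itemset(("A", "C", "D"), support, 0.6))
```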


## Apriori and FP-Growth Execution


In [None]:
import warnings
warnings.filterwarnings("ignore", category=RuntimeWarning)

# mlxtend expects a one-hot encoded DataFrame: one boolean column per item
te = TransactionEncoder()
one_hot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# perf_counter is a high-resolution clock intended for timing short intervals
print("\nRunning Apriori Algorithm...")
start_apriori = time.perf_counter()
frequent_itemsets_ap = apriori(one_hot, min_support=min_support, use_colnames=True)
rules_ap = association_rules(frequent_itemsets_ap, metric="confidence", min_threshold=min_confidence)
rules_ap = rules_ap.dropna()
rules_ap = rules_ap[(rules_ap['support'] > 0) & (rules_ap['confidence'] > 0)]
apriori_time = time.perf_counter() - start_apriori

print("Running FP-Growth Algorithm...")
start_fp = time.perf_counter()
frequent_itemsets_fp = fpgrowth(one_hot, min_support=min_support, use_colnames=True)
rules_fp = association_rules(frequent_itemsets_fp, metric="confidence", min_threshold=min_confidence)
rules_fp = rules_fp.dropna()
rules_fp = rules_fp[(rules_fp['support'] > 0) & (rules_fp['confidence'] > 0)]
fp_growth_time = time.perf_counter() - start_fp
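
mlxtend's `fpgrowth` avoids candidate generation by first compressing the transactions into an FP-tree, a prefix tree in which transactions that share their most frequent items share a path. The sketch below builds only that tree in plain Python, under simplifying assumptions: it omits the header table of node links and the recursive mining of conditional trees that complete FP-Growth, and it is an illustration, not mlxtend's implementation. `FPNode` and `build_fp_tree` are hypothetical names, and the transactions are made up.

```python
from collections import Counter

class FPNode:
    """One FP-tree node: an item, how many transactions pass through it, and its links."""
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_count):
    """Insert each transaction's frequent items into a shared prefix tree."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item: c for item, c in counts.items() if c >= min_count}
    root = FPNode(None, None)
    for t in transactions:
        # Keep frequent items only, most frequent first (ties broken alphabetically)
        path = sorted((i for i in set(t) if i in frequent), key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            # Reuse an existing child when the prefix is shared, otherwise create one
            if item not in node.children:
                node.children[item] = FPNode(item, node)
            node = node.children[item]
            node.count += 1
    return root, frequent

tree, frequent = build_fp_tree([["A", "B"], ["A", "C"], ["A", "B", "C"], ["D"]], min_count=2)
a = tree.children["A"]
print(a.count, a.children["B"].count, a.children["C"].count)
```

Sorting each transaction by descending item frequency is what makes shared prefixes, and therefore compression, as likely as possible.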

## Displaying Frequent Itemsets and Association Rules from All Three Algorithms

In [None]:
print("\n\n================================================")
print("🔹 FREQUENT ITEMS FOUND BY BRUTE FORCE:")
print("================================================")
for items, sup in frequent_itemsets_brute:
    print(f"{items} | support: {sup:.2f}")

print("\n\n================================================")
print("🔸 ASSOCIATION RULES — BRUTE FORCE")
print("================================================")
for ant, cons, sup, conf in rules_brute:
    print(f"{ant} → {cons} (support: {sup:.2f}, confidence: {conf:.2f})")

print("\n\n================================================")
print("🔸 ASSOCIATION RULES — APRIORI")
print("================================================")
for _, row in rules_ap.iterrows():
    print(f"{tuple(row['antecedents'])} → {tuple(row['consequents'])} "
          f"(support: {row['support']:.2f}, confidence: {row['confidence']:.2f})")

print("\n\n================================================")
print("🔸 ASSOCIATION RULES — FP-GROWTH")
print("================================================")
for _, row in rules_fp.iterrows():
    print(f"{tuple(row['antecedents'])} → {tuple(row['consequents'])} "
          f"(support: {row['support']:.2f}, confidence: {row['confidence']:.2f})")
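
The three rule listings print the same rules in different orders, and the items inside a rule can also appear in a different order (for example `('C', 'D')` versus `('D', 'C')`). A hedged way to compare them programmatically is to normalize each rule to a pair of frozensets; `rule_set` is a hypothetical helper, and the two sample rules are copied from the output section.

```python
def rule_set(pairs):
    """Turn (antecedent, consequent) pairs into an order-insensitive set of rules."""
    return {(frozenset(ant), frozenset(cons)) for ant, cons in pairs}

# Two rules as printed by the brute-force and Apriori sections of the sample output
brute = [(("C", "D"), ("A",)), (("D",), ("C", "A"))]
apriori_rules = [(("D", "C"), ("A",)), (("D",), ("C", "A"))]

print(rule_set(brute) == rule_set(apriori_rules))
# Symmetric difference: rules found by only one of the two algorithms
print(rule_set(brute) ^ rule_set(apriori_rules))
```

In the notebook itself, the brute-force and Apriori results could be compared with `rule_set((a, c) for a, c, _, _ in rules_brute) == rule_set(zip(rules_ap['antecedents'], rules_ap['consequents']))`.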


## Displaying Timing Summary

In [None]:
print("\n\n================================================")
print("⏱️ EXECUTION TIME SUMMARY (seconds)")
print("================================================")
print(f"Brute-Force Algorithm: {brute_force_time:.4f} sec")
print(f"Apriori Algorithm:     {apriori_time:.4f} sec")
print(f"FP-Growth Algorithm:   {fp_growth_time:.4f} sec")
print("================================================")

print("\n✅ All algorithms executed successfully!")
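
The sample output reports 0.0000 sec for Apriori, which may reflect the coarse resolution of `time.time()` on some platforms for such a short run. `time.perf_counter()` is the standard-library clock intended for measuring short intervals, and keeping the minimum of several repeats reduces noise from other processes. The sketch below assumes nothing beyond the standard library; `timed` and `repeat_min` are hypothetical helper names.

```python
import time

def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed seconds) measured with perf_counter."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start

def repeat_min(func, *args, repeats=5, **kwargs):
    """Run func several times and keep the fastest elapsed time."""
    return min(timed(func, *args, **kwargs)[1] for _ in range(repeats))

result, elapsed = timed(sum, range(1_000_000))
print(result, f"{elapsed:.6f} sec")
```

For the notebook, a call such as `repeat_min(apriori, one_hot, min_support=min_support, use_colnames=True)` would time the Apriori step alone.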

## Output
This is the complete output of the program when `generic_transactions.csv` is selected.

In [None]:
Here are the following transactional databases
 (1) Generic
 (2) Nike
 (3) Best Buy
 (4) Coffee Shop
 (5) K-mart

Enter number to select a database:
1
=============================================================
Here are the unique items corresponding to the transactions:
=============================================================
 Item # Item
      1    A
      2    B
      3    C
      4    D
      5    E
      6    F
================================================
Enter minimum support (e.g., 0.3 for 30%): 0.3
Enter minimum confidence (e.g., 0.6 for 60%): 0.6

Using min_support = 0.3 and min_confidence = 0.6


Running Brute-Force Algorithm...
Found 5 frequent 1-itemsets
Found 6 frequent 2-itemsets
Found 2 frequent 3-itemsets

Running Apriori Algorithm...
Running FP-Growth Algorithm...


================================================
🔹 FREQUENT ITEMS FOUND BY BRUTE FORCE:
================================================
('A',) | support: 1.00
('B',) | support: 0.40
('C',) | support: 0.60
('D',) | support: 0.45
('E',) | support: 0.70
('A', 'B') | support: 0.40
('A', 'C') | support: 0.60
('A', 'D') | support: 0.45
('A', 'E') | support: 0.70
('C', 'D') | support: 0.30
('C', 'E') | support: 0.35
('A', 'C', 'D') | support: 0.30
('A', 'C', 'E') | support: 0.35


================================================
🔸 ASSOCIATION RULES — BRUTE FORCE
================================================
('B',) → ('A',) (support: 0.40, confidence: 1.00)
('A',) → ('C',) (support: 0.60, confidence: 0.60)
('C',) → ('A',) (support: 0.60, confidence: 1.00)
('D',) → ('A',) (support: 0.45, confidence: 1.00)
('A',) → ('E',) (support: 0.70, confidence: 0.70)
('E',) → ('A',) (support: 0.70, confidence: 1.00)
('D',) → ('C',) (support: 0.30, confidence: 0.67)
('D',) → ('C', 'A') (support: 0.30, confidence: 0.67)
('A', 'D') → ('C',) (support: 0.30, confidence: 0.67)
('C', 'D') → ('A',) (support: 0.30, confidence: 1.00)
('C', 'E') → ('A',) (support: 0.35, confidence: 1.00)


================================================
🔸 ASSOCIATION RULES — APRIORI
================================================
('B',) → ('A',) (support: 0.40, confidence: 1.00)
('C',) → ('A',) (support: 0.60, confidence: 1.00)
('A',) → ('C',) (support: 0.60, confidence: 0.60)
('D',) → ('A',) (support: 0.45, confidence: 1.00)
('E',) → ('A',) (support: 0.70, confidence: 1.00)
('A',) → ('E',) (support: 0.70, confidence: 0.70)
('D',) → ('C',) (support: 0.30, confidence: 0.67)
('D', 'C') → ('A',) (support: 0.30, confidence: 1.00)
('D', 'A') → ('C',) (support: 0.30, confidence: 0.67)
('D',) → ('C', 'A') (support: 0.30, confidence: 0.67)
('E', 'C') → ('A',) (support: 0.35, confidence: 1.00)


================================================
🔸 ASSOCIATION RULES — FP-GROWTH
================================================
('C',) → ('A',) (support: 0.60, confidence: 1.00)
('A',) → ('C',) (support: 0.60, confidence: 0.60)
('E', 'C') → ('A',) (support: 0.35, confidence: 1.00)
('B',) → ('A',) (support: 0.40, confidence: 1.00)
('D',) → ('A',) (support: 0.45, confidence: 1.00)
('D',) → ('C',) (support: 0.30, confidence: 0.67)
('D', 'C') → ('A',) (support: 0.30, confidence: 1.00)
('D', 'A') → ('C',) (support: 0.30, confidence: 0.67)
('D',) → ('C', 'A') (support: 0.30, confidence: 0.67)
('E',) → ('A',) (support: 0.70, confidence: 1.00)
('A',) → ('E',) (support: 0.70, confidence: 0.70)


================================================
⏱️ EXECUTION TIME SUMMARY (seconds)
================================================
Brute-Force Algorithm: 0.0130 sec
Apriori Algorithm:     0.0000 sec
FP-Growth Algorithm:   0.0202 sec
================================================

✅ All algorithms executed successfully!

## Conclusion
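All three algorithms produced the same association rules (11 rules for the generic dataset, listed in different orders), which confirms that the from-scratch implementation agrees with the mlxtend libraries.
- **Brute Force** served as a reference for verifying the library results.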
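- **Apriori** was the fastest in the run shown above, finishing too quickly to register at four decimal places.
- **FP-Growth** took the longest in that run (0.0202 sec); at this scale the differences are within timer noise, and its advantage of avoiding candidate generation is generally expected to show on larger datasets.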

