# Study Design: Customer Segmentation

This notebook defines the **2-Track Study Design** for customer segmentation research.

## Table of Contents

1. [Overview & Framework](#1.-Overview-&-Framework)
2. [Track Relationships & Practical Usage](#2.-Track-Relationships-&-Practical-Usage)
3. [Track 1: Customer Understanding](#3.-Track-1:-Customer-Understanding)
   - 3.1 Methods
   - 3.2 Operational Definitions
   - 3.3 Features
   - 3.4 Evaluation
4. [Track 2: Causal Targeting](#4.-Track-2:-Causal-Targeting)
   - 4.1 Methods
   - 4.2 Operational Definitions (2 Scenarios)
   - 4.3 Causal Graph
   - 4.4 Features
   - 4.5 Evaluation
5. [Period & Validation Design](#5.-Period-&-Validation-Design)
6. [Implementation Plan](#6.-Implementation-Plan)
7. [Cohort Definition & Comparison](#7.-Cohort-Definition-&-Comparison)

---

## 1. Overview & Framework

### 1.1 Framework Structure

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                         2-TRACK STUDY DESIGN                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  TRACK 1: CUSTOMER UNDERSTANDING (Descriptive)                              │
│  ═══════════════════════════════════════════════                            │
│                                                                             │
│  Step 1.1: Customer Profiling                                               │
│    - Factor Analysis (NMF) → Discover latent dimensions                     │
│    - Clustering → Derive base segments                                      │
│                                                                             │
│  Step 1.2: Value × Need Integration                                         │
│    - Value layer: CLV, Engagement                                           │
│    - Need layer: Behavior, Category preference                              │
│    - Integration: Value-Need Matrix or 2D Segmentation                      │
│                                                                             │
│  Output: "To whom (Value) should we offer what (Need)?"                     │
│                                                                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  TRACK 2: CAUSAL TARGETING                                                  │
│  ═════════════════════════                                                  │
│                                                                             │
│  Step 2.1: HTE Analysis                                                     │
│    - Campaign effect heterogeneity                                          │
│    - Covariates: Track 1 segments + raw features                            │
│                                                                             │
│  Step 2.2: Optimal Policy                                                   │
│    - Targeting rules                                                        │
│    - Policy value estimation                                                │
│                                                                             │
│  Output: "How do we optimize targeting?"                                    │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

### 1.2 Design Rationale

The 2-Track approach separates:
- **Descriptive analysis** (understanding customers) from
- **Causal analysis** (optimizing interventions)

This separation enables:
1. Different teams to work on appropriate tracks based on expertise
2. Track 1 to provide immediate marketing value
3. Track 2 to build on Track 1 segments for causal insights

---

## 2. Track Relationships & Practical Usage

### 2.1 Track Relationships

```
Step 1.1 (Customer Profiling) ───────────────────┐
    │                                            │
    │ (use if useful, otherwise independent)     │ (selective factor usage)
    ↓                                            ↓
Step 1.2 (Value × Need)                   Track 2: Steps 2.1-2.2
    │                                            ↑
    │ (sequential: Track 1 first)                │
    └─────── segments as moderators ─────────────┘
```

**Key Dependencies:**
- **Step 1.1 → Step 1.2**: Use NMF factors if useful, otherwise independent analysis
- **Track 1 → Track 2**: Sequential execution (complete Track 1 before Track 2)
- **Cross-Track**: Track 1 segments serve as HTE moderators in Track 2

### 2.2 Practical Usage Comparison

| Aspect | Track 1 (Descriptive) | Track 2 (Causal) |
|--------|----------------------|------------------|
| **Core Question** | "Who is this customer?" | "Will this intervention work for them?" |
| **Primary Users** | Marketing, CRM, Strategy | Data Science, Optimization |
| **Usage Timing** | Campaign planning, Strategy | Campaign execution, A/B optimization |
| **Difficulty** | Low | High |
| **Explainability** | "Premium Fresh Lover segment" | "CATE = 0.15 for this customer" |
| **Usage Frequency** | High (daily operations) | Medium (specific scenarios) |
| **Org Requirements** | Basic analytics capability | Causal thinking culture required |

### 2.3 Marketer's Perspective

**Track 1 (Almost certainly utilized):**
- Intuitive: "High-Value + Deal Seeker" → immediately understandable
- Easy to act on: Select segment → Design offer
- Easy to communicate: Explainable to executives and other departments
- Stable: Segments don't change frequently

**Track 2 (Depends on organizational maturity):**
- Requires conceptual understanding: "Treatment effect", "Uplift"
- Requires experimentation infrastructure: A/B testing systems
- Interpretation gap: Numbers need translation to actionable insights
- High value: Enables ROI optimization when properly implemented

---

## 3. Track 1: Customer Understanding

### 3.1 Methods

#### Step 1.1: Customer Profiling Analysis

**Objective:** Discover latent customer dimensions and derive base segments.

**Method:**
1. **Non-negative Matrix Factorization (NMF)**
   - Input: Customer features (the reduced 19-feature subset of the 33 base features; see Section 3.3.1)
   - Output: Latent factor scores and loadings
   - Interpretation: Name factors based on loading patterns

2. **Clustering**
   - Algorithm: K-Means or GMM
   - Input: NMF factor scores (or raw features)
   - Output: Base customer segments
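
The factorization step can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes the features have already been rescaled to be non-negative (for example with min-max scaling), uses the classic multiplicative-update rules in plain Python so that it runs without dependencies, and works on a made-up toy matrix. The helper names (`nmf`, `frobenius_error`) are hypothetical; in practice a library implementation would be the more likely choice, with clustering applied to the resulting factor scores afterwards.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    """Reconstruction error ||V - WH||_F."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5

def nmf(v, k, n_iter=200, seed=0, eps=1e-9):
    """Factorize V (customers x features) into W (scores) and H (loadings)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)] for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(n)]
    return w, h

# Toy matrix: 5 customers x 4 non-negative features (illustrative values only)
v = [
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [0.0, 1.0, 5.0, 4.0],
    [1.0, 0.0, 4.0, 5.0],
    [5.0, 5.0, 0.0, 0.0],
]
w, h = nmf(v, k=2)                                    # factor scores, loadings
err_init = frobenius_error(v, *nmf(v, k=2, n_iter=0))  # error at the random start
err_final = frobenius_error(v, w, h)
```

The rows of `h` play the role of the loadings used to name factors, and the rows of `w` are the per-customer factor scores that would feed the clustering step.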

**Research Questions:**
- What are the latent dimensions explaining customer behavior?
- What customer types exist in the data?

#### Step 1.2: Value × Need Integration (Optional)

**Objective:** Create actionable segments combining customer value and needs.

**Why This Step May Be Optional After Step 1.1:**

Step 1.1 (Customer Profiling) already captures both Value and Need dimensions through NMF:
- **Value factors**: F2 (Loyal Regular), F3 (Big Basket) capture customer worth
- **Need factors**: F4 (Fresh Focused), F5 (Health & Beauty) capture preferences
- **Bubble charts**: Enable 2D comparison of any factor pair

**Limitation of Single-Space NMF:**
```
Current (1.1):                    Alternative (1.2):
Customer → [All Features]         Customer → [Value Features] → Value Tier
         → NMF                             → [Need Features]  → Need Type
         → Segment                         → (Value, Need) Matrix
```

Within Seg1 (VIP), we cannot easily distinguish:
- VIP + Fresh preference
- VIP + H&B preference

Cross-sell targeting for "same value tier, different needs" is limited.

**What Value × Need Would Provide:**
1. **Explicit 2-axis separation**: Direct answer to "To whom (Value)" × "What (Need)"
2. **Cross-sell targeting**: Identify need expansion opportunities within same Value Tier
3. **Hierarchical strategy**: Value-based prioritization + Need-based offer design

**Value Layer (RFM + Engagement):**
- Core features: recency, frequency, monetary_sales
- Extended: monetary_avg_basket_sales, purchase_regularity, week_coverage
- CLV estimation: Historical, RFM-based, or Predictive (BG/NBD, Gamma-Gamma)

**Need Layer (Behavioral + Category):**
- Behavioral: discount_rate, discount_usage_pct, private_label_ratio, n_departments
- Category: share_grocery, share_fresh, share_bakery, share_health_beauty, share_alcohol

**Integration Options:**
- Option A: Value-Need Matrix (Value tier × Need cluster)
- Option B: Sequential segmentation (Value tier → Need cluster within tier)
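
Option A can be illustrated with a small sketch. It is only one possible reading of the Value-Need Matrix: it assumes value tiers are tertiles of `monetary_sales` and that the need type is simply the category with the largest sales share. The customer records and the helper names are made up for illustration.

```python
from collections import Counter

# Illustrative customers (values are invented)
customers = [
    {"household_key": 1, "monetary_sales": 5200.0, "share_fresh": 0.45, "share_grocery": 0.40, "share_health_beauty": 0.15},
    {"household_key": 2, "monetary_sales": 4100.0, "share_fresh": 0.20, "share_grocery": 0.30, "share_health_beauty": 0.50},
    {"household_key": 3, "monetary_sales": 2500.0, "share_fresh": 0.30, "share_grocery": 0.60, "share_health_beauty": 0.10},
    {"household_key": 4, "monetary_sales": 1800.0, "share_fresh": 0.50, "share_grocery": 0.35, "share_health_beauty": 0.15},
    {"household_key": 5, "monetary_sales": 900.0, "share_fresh": 0.20, "share_grocery": 0.70, "share_health_beauty": 0.10},
    {"household_key": 6, "monetary_sales": 400.0, "share_fresh": 0.35, "share_grocery": 0.55, "share_health_beauty": 0.10},
]

def tertile_cutoffs(values):
    """Return the (low, high) cutoffs that split values into three tiers."""
    ordered = sorted(values)
    n = len(ordered)
    return ordered[n // 3], ordered[(2 * n) // 3]

def value_tier(sales, cutoffs):
    low, high = cutoffs
    if sales >= high:
        return "High"
    return "Mid" if sales >= low else "Low"

def need_type(customer):
    """Assumption: the need type is the category with the largest sales share."""
    shares = {k[len("share_"):]: v for k, v in customer.items() if k.startswith("share_")}
    return max(shares, key=shares.get)

cutoffs = tertile_cutoffs([c["monetary_sales"] for c in customers])
# Cell counts of the Value tier x Need type matrix
matrix = Counter((value_tier(c["monetary_sales"], cutoffs), need_type(c)) for c in customers)
```

In this toy data the two High-value customers land in different cells (`fresh` versus `health_beauty`), which is the "same value tier, different needs" distinction that a single NMF space does not expose directly.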

**Recommendation:** Proceed to Track 2 (Causal Targeting). Revisit Value × Need if cross-sell optimization becomes a priority.

### 3.2 Operational Definitions

| Item | Definition |
|------|------------|
| **Analysis Unit** | `household_key` (customer) |
| **Cohort** | Customers with at least 1 purchase in Week 1-102 (~2,500) |
| **Observation Period** | Week 1-102 (Full 102 weeks) |
| **Features** | 33 base features (RFM, Behavioral, Category, Time) |
| **Output** | Customer segments, Factor scores |

### 3.3 Features

#### Base Features (33)

| Group | Count | Features |
|-------|-------|----------|
| **Recency** | 6 | recency, recency_weeks, active_last_4w, active_last_12w, days_between_purchases_avg, days_between_purchases_std |
| **Frequency** | 6 | frequency, frequency_per_week, frequency_per_month, transaction_count, weeks_with_purchase, purchase_regularity |
| **Monetary** | 7 | monetary_sales, monetary_actual, monetary_avg_basket_sales, monetary_avg_basket_actual, monetary_std, monetary_per_week, coupon_savings_ratio |
| **Behavioral** | 7 | discount_rate, discount_usage_pct, private_label_ratio, n_departments, n_products, avg_items_per_basket, avg_products_per_basket |
| **Category** | 6 | share_grocery, share_fresh, share_bakery, share_health_beauty, share_alcohol, share_other |
| **Time** | 1 | week_coverage |

#### Feature Allocation

| Step | Feature Scope | Count |
|------|--------------|-------|
| **Step 1.1 Customer Profiling** | Reduced subset of the 33 base features → NMF | 19 |
| **Step 1.2a Value** | RFM + Engagement | 20 |
| **Step 1.2b Need** | Behavioral + Category | 13 |

### 3.3.1 Feature Definitions (Detailed)

#### Track 1: Base Features (33)

| Category | Subcategory | Variable | Description | Formula |
|----------|-------------|----------|-------------|---------|
| **RFM** | Recency | `recency` | Days since last purchase | max(DAY) - last_purchase_day |
| RFM | Recency | `recency_weeks` | Weeks since last purchase | max(WEEK) - last_purchase_week |
| RFM | Recency | `active_last_4w` | Active in last 4 weeks (binary) | 1 if last_week >= max_week - 4 else 0 |
| RFM | Recency | `active_last_12w` | Active in last 12 weeks (binary) | 1 if last_week >= max_week - 12 else 0 |
| RFM | Recency | `days_between_purchases_avg` | Average days between purchases | mean(diff(purchase_days)) |
| RFM | Recency | `days_between_purchases_std` | Std dev of days between purchases | std(diff(purchase_days)) |
| **RFM** | Frequency | `frequency` | Total number of visits (baskets) | nunique(BASKET_ID) |
| RFM | Frequency | `frequency_per_week` | Average visits per week | frequency / tenure_weeks |
| RFM | Frequency | `frequency_per_month` | Average visits per month | frequency_per_week × 4.33 |
| RFM | Frequency | `transaction_count` | Total number of transactions | count(transactions) |
| RFM | Frequency | `weeks_with_purchase` | Number of weeks with purchases | nunique(WEEK_NO) |
| RFM | Frequency | `purchase_regularity` | Purchase regularity (0-1) | weeks_with_purchase / tenure_weeks |
| **RFM** | Monetary | `monetary_sales` | Total sales (after discount) | sum(SALES_VALUE) |
| RFM | Monetary | `monetary_actual` | Total actual payment | sum(ACTUAL_SPENT) |
| RFM | Monetary | `monetary_avg_basket_sales` | Average basket sales | mean(basket_sales) |
| RFM | Monetary | `monetary_avg_basket_actual` | Average basket payment | mean(basket_actual) |
| RFM | Monetary | `monetary_std` | Basket sales standard deviation | std(basket_sales) |
| RFM | Monetary | `monetary_per_week` | Average sales per week | monetary_sales / tenure_weeks |
| RFM | Monetary | `coupon_savings_ratio` | Coupon savings ratio | total_coupon_disc / monetary_sales |
| **Behavioral** | Price Sensitivity | `discount_rate` | Overall discount rate | total_discount / (sales + total_discount) |
| Behavioral | Price Sensitivity | `discount_usage_pct` | Proportion of discounted transactions | discount_transactions / total_transactions |
| Behavioral | Brand | `private_label_ratio` | Private label purchase ratio | PL_count / (PL_count + NB_count) |
| Behavioral | Basket | `n_departments` | Number of departments visited | nunique(DEPARTMENT) |
| Behavioral | Basket | `n_products` | Number of unique products purchased | nunique(PRODUCT_ID) |
| Behavioral | Basket | `avg_items_per_basket` | Average items per basket | total_quantity / n_baskets |
| Behavioral | Basket | `avg_products_per_basket` | Average unique products per basket | n_products / n_baskets |
| **Category** | Grocery | `share_grocery` | Grocery category sales share | grocery_sales / total_sales |
| Category | Fresh | `share_fresh` | Fresh category sales share | fresh_sales / total_sales |
| Category | Bakery | `share_bakery` | Bakery category sales share | bakery_sales / total_sales |
| Category | Health&Beauty | `share_health_beauty` | Health & Beauty category sales share | hb_sales / total_sales |
| Category | Alcohol | `share_alcohol` | Alcohol category sales share | alcohol_sales / total_sales |
| Category | Other | `share_other` | Other categories sales share | other_sales / total_sales |
| **Time** | Coverage | `week_coverage` | Purchase week coverage ratio | n_weeks_active / (week_range + 1) |
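
A few of the formulas in the table can be sketched in plain Python on toy transactions. The rows below are invented, and `tenure_weeks` is not defined in this document: the sketch assumes it is the number of weeks from a customer's first purchase week to the last week in the data, which should be checked against the actual feature code.

```python
from statistics import mean

# Toy transaction lines (invented values; column names follow the formulas above)
transactions = [
    {"household_key": 1, "BASKET_ID": 101, "DAY": 3, "WEEK_NO": 1, "SALES_VALUE": 12.5},
    {"household_key": 1, "BASKET_ID": 101, "DAY": 3, "WEEK_NO": 1, "SALES_VALUE": 7.5},
    {"household_key": 1, "BASKET_ID": 102, "DAY": 17, "WEEK_NO": 3, "SALES_VALUE": 30.0},
    {"household_key": 2, "BASKET_ID": 201, "DAY": 10, "WEEK_NO": 2, "SALES_VALUE": 5.0},
    {"household_key": 2, "BASKET_ID": 202, "DAY": 28, "WEEK_NO": 4, "SALES_VALUE": 15.0},
]

def rfm_features(rows, max_day, max_week):
    """Compute a subset of the RFM features for one customer's rows."""
    baskets = {}
    for r in rows:
        baskets[r["BASKET_ID"]] = baskets.get(r["BASKET_ID"], 0.0) + r["SALES_VALUE"]
    weeks = {r["WEEK_NO"] for r in rows}
    # Assumption: tenure runs from the first purchase week to the last week in the data
    tenure_weeks = max_week - min(weeks) + 1
    return {
        "recency": max_day - max(r["DAY"] for r in rows),
        "frequency": len(baskets),
        "monetary_sales": sum(baskets.values()),
        "monetary_avg_basket_sales": mean(baskets.values()),
        "weeks_with_purchase": len(weeks),
        "purchase_regularity": len(weeks) / tenure_weeks,
    }

max_day = max(r["DAY"] for r in transactions)
max_week = max(r["WEEK_NO"] for r in transactions)

# Group rows by customer, then compute features per customer
by_household = {}
for r in transactions:
    by_household.setdefault(r["household_key"], []).append(r)
features = {hh: rfm_features(rows, max_day, max_week) for hh, rows in by_household.items()}
```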

**Macro Category Mapping:**
- `grocery`: GROCERY, FROZEN GROCERY, GRO BAKERY
- `fresh`: PRODUCE, MEAT, MEAT-PCKGD, SEAFOOD, DELI, DAIRY DELI, SALAD BAR, ...
- `bakery`: PASTRY
- `health_beauty`: DRUG GM, NUTRITION, COSMETICS, RX, HBC, ...
- `alcohol`: SPIRITS

---

#### Track 1: Reduced Features (19)

Features used for NMF in Step 1.1 (reduced set to remove redundancy):

| Category | Variables | Selection Rationale |
|----------|-----------|---------------------|
| Recency (2) | `recency`, `days_between_purchases_avg` | Core metrics, removed redundant variants |
| Frequency (3) | `frequency`, `frequency_per_week`, `purchase_regularity` | Absolute, relative, and regularity measures |
| Monetary (4) | `monetary_sales`, `monetary_avg_basket_sales`, `monetary_std`, `coupon_savings_ratio` | Total, average, variability, and savings |
| Behavioral (5) | `discount_usage_pct`, `private_label_ratio`, `n_departments`, `n_products`, `avg_items_per_basket` | Price sensitivity, brand preference, variety, basket size |
| Category (5) | `share_grocery`, `share_fresh`, `share_bakery`, `share_health_beauty`, `share_alcohol` | Excluded `share_other`: the six shares sum to 1, so it is redundant |

---

#### Track 2: Additional Features

| Category | Subcategory | Variable | Description | Formula |
|----------|-------------|----------|-------------|---------|
| **Exposure** | Display | `display_exposure_rate` | Rate of purchasing display-exposed products | display_exposed_count / n_transactions |
| Exposure | Display | `display_intensity_avg` | Average display intensity when exposed (1-10) | mean(display_level) when exposed |
| Exposure | Mailer | `mailer_exposure_rate` | Rate of purchasing mailer-exposed products | mailer_exposed_count / n_transactions |
| **Campaign** | Targeting | `targeted_typeA` | Targeted by TypeA campaign (binary) | 1 if targeted by TypeA else 0 |
| Campaign | Targeting | `targeted_typeB` | Targeted by TypeB campaign (binary) | 1 if targeted by TypeB else 0 |
| Campaign | Targeting | `targeted_typeC` | Targeted by TypeC campaign (binary) | 1 if targeted by TypeC else 0 |
| Campaign | Count | `n_campaigns_targeted` | Total number of campaigns targeted | sum(TypeA + TypeB + TypeC) |
| Campaign | Count | `n_typeA_campaigns` | Number of TypeA campaigns targeted | count(TypeA campaigns) |
| **Outcome** | Redemption | `redemption_count` | Number of coupon redemptions | count(coupon redemptions) |
| Outcome | Redemption | `redemption_any` | Any coupon redemption (binary) | 1 if redemption_count > 0 else 0 |
| Outcome | Purchase | `purchase_amount` | Purchase amount in campaign period | sum(SALES_VALUE) in campaign period |
| Outcome | Purchase | `purchase_count` | Number of visits in campaign period | nunique(BASKET_ID) in campaign period |
| **Demographic** | Age | `AGE_DESC` | Age group | 19-24, 25-34, 35-44, 45-54, 55-64, 65+ |
| Demographic | Marital | `MARITAL_STATUS_CODE` | Marital status | A(Single), B(Married), U(Unknown) |
| Demographic | Income | `INCOME_DESC` | Income level | Under 15K ~ 250K+ |
| Demographic | Home | `HOMEOWNER_DESC` | Home ownership | Homeowner, Renter, Unknown |
| Demographic | Composition | `HH_COMP_DESC` | Household composition | 1 Adult, 2 Adults, ... |
| Demographic | Size | `HOUSEHOLD_SIZE_DESC` | Household size | 1, 2, 3, 4, 5+ |
| Demographic | Kids | `KID_CATEGORY_DESC` | Presence of children | None, 1, 2, 3+ |

---

#### Value × Need Feature Allocation

| Layer | Category | Feature Count | Variables |
|-------|----------|---------------|-----------|
| **Value** | Recency | 6 | recency, recency_weeks, active_4w/12w, days_between_* |
| Value | Frequency | 6 | frequency, frequency_per_*, transaction_count, weeks_with_purchase, purchase_regularity |
| Value | Monetary | 7 | monetary_*, coupon_savings_ratio |
| Value | Time | 1 | week_coverage |
| | **Value Total** | **20** | |
| **Need** | Behavioral | 7 | discount_*, private_label_ratio, n_*, avg_*_per_basket |
| Need | Category | 6 | share_* |
| | **Need Total** | **13** | |

### 3.4 Evaluation

**Technical Metrics:**
- NMF: Reconstruction error, Sparsity
- Clustering: Silhouette Score, Calinski-Harabasz Index, Davies-Bouldin Index

**Interpretability:**
- Factor loadings interpretability
- Cluster separation (ANOVA, Chi-square tests)

**Business:**
- CLV difference significance across segments
- Actionability of segment definitions

**Stability:**
- Segment consistency across periods (Adjusted Rand Index)
- Customer segment migration rate
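
The two stability measures can be sketched in plain Python. The label vectors are invented; the Adjusted Rand Index follows the standard pair-counting definition, and the migration rate shown here additionally assumes that segment labels have already been aligned between the two periods.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Pair-counting ARI between two segment assignments of the same customers."""
    n = len(labels_a)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster in both
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

def migration_rate(labels_a, labels_b):
    """Share of customers whose (aligned) segment label changed."""
    return sum(a != b for a, b in zip(labels_a, labels_b)) / len(labels_a)

first_half = [0, 0, 1, 1, 2, 2]
relabeled = [1, 1, 0, 0, 2, 2]    # same partition, different label names
second_half = [0, 0, 1, 2, 2, 2]  # one customer moved from segment 1 to 2

ari_same = adjusted_rand_index(first_half, relabeled)
ari_moved = adjusted_rand_index(first_half, second_half)
moved = migration_rate(first_half, second_half)
```

ARI is invariant to label names, which is why `relabeled` still scores 1.0, whereas the migration rate is only meaningful once labels are matched across periods.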

---

## 4. Track 2: Causal Targeting

### 4.1 Methods

#### Step 2.1: HTE Analysis

**Objective:** Estimate heterogeneous treatment effects of campaigns.

**Methods:**
- Meta-learners: S-Learner, T-Learner, X-Learner, R-Learner
- Tree-based: Causal Forest (GRF) - enables rule extraction

**Treatment Definition:**
- Campaign targeting (TypeA: targeted, TypeB/C: uniform)
- Coupon issuance

**Outcome Variables:**
- Coupon redemption
- Purchase amount increase
- Category purchase

**Covariates:**
- Track 1 features (RFM, Behavioral, Category)
- Track 1 segments (as moderators)
- Demographics (hh_demographic)
- Marketing exposure (display, mailer)

**Analysis Questions:**
1. Does treatment effect heterogeneity exist? (BLP test)
2. Which customer characteristics drive effect differences?
3. What is the CATE distribution?
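
The idea behind a T-Learner can be shown with a deliberately simple sketch: fit one outcome model on treated customers and one on controls, then take the difference of their predictions. Here each "model" is just a per-segment mean, the data are invented, and the result is only a valid effect estimate under randomized assignment or after adjusting for confounders, which this sketch does not do.

```python
from collections import defaultdict
from statistics import mean

# (segment, treated, outcome) with invented values
rows = [
    ("Deal Seeker", 1, 60.0), ("Deal Seeker", 1, 64.0),
    ("Deal Seeker", 0, 50.0), ("Deal Seeker", 0, 52.0),
    ("Loyal Regular", 1, 81.0), ("Loyal Regular", 1, 83.0),
    ("Loyal Regular", 0, 80.0), ("Loyal Regular", 0, 82.0),
]

def t_learner_cate(rows):
    """CATE per segment = treated-arm mean minus control-arm mean."""
    outcomes = defaultdict(lambda: {0: [], 1: []})
    for segment, treated, y in rows:
        outcomes[segment][treated].append(y)
    return {seg: mean(arms[1]) - mean(arms[0]) for seg, arms in outcomes.items()}

cate = t_learner_cate(rows)
```

In this toy example the estimated effect is much larger for one segment than the other, which is the kind of heterogeneity the analysis questions above are meant to detect.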

#### Step 2.2: Optimal Policy

**Objective:** Derive optimal targeting rules and estimate policy value.

**Methods:**
- Subgroup discovery: Causal Tree rules
- Policy learning: Optimal treatment assignment based on CATE
- Policy value estimation: Expected outcome under policy

**Output:**
- Targeting rules (interpretable decision rules)
- Policy value (expected improvement vs. random targeting)
- ROI estimates under optimal policy
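
Policy value estimation can be sketched with an inverse-propensity-weighted (IPW) estimator. The sketch assumes the treatment propensity of each customer is known (0.5 for everyone in this invented data) and that the outcome is already net of coupon cost; both are assumptions for illustration, and the policy functions are hypothetical.

```python
# (segment, treated, net_outcome, propensity) with invented values
rows = [
    ("Deal Seeker", 1, 62.0, 0.5), ("Deal Seeker", 0, 50.0, 0.5),
    ("Deal Seeker", 1, 60.0, 0.5), ("Deal Seeker", 0, 52.0, 0.5),
    ("Loyal Regular", 1, 80.0, 0.5), ("Loyal Regular", 0, 81.0, 0.5),
    ("Loyal Regular", 1, 82.0, 0.5), ("Loyal Regular", 0, 83.0, 0.5),
]

def ipw_policy_value(rows, policy):
    """Estimate the mean outcome if everyone were treated according to `policy`."""
    total = 0.0
    for segment, treated, y, propensity in rows:
        if treated == policy(segment):
            # Probability of the arm that was actually observed
            p_observed = propensity if treated == 1 else 1.0 - propensity
            total += y / p_observed
    return total / len(rows)

def target_deal_seekers(segment):
    return 1 if segment == "Deal Seeker" else 0

def treat_all(segment):
    return 1

value_targeted = ipw_policy_value(rows, target_deal_seekers)
value_treat_all = ipw_policy_value(rows, treat_all)
# With 50/50 random assignment, the observed mean is the value of random targeting
value_random = sum(y for _, _, y, _ in rows) / len(rows)
```

Comparing `value_targeted` with `value_random` gives the "expected improvement vs. random targeting" listed above, for this toy data only.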

### 4.2 Operational Definitions (2 Scenarios)

**Campaign Data Overview:**

| Type | # Campaigns | # Targeting Records | Avg per Campaign |
|------|-------------|---------------------|------------------|
| TypeA | 5 | 3,979 | ~796 |
| TypeB | 19 | 2,655 | ~140 |
| TypeC | 6 | 574 | ~96 |

---

#### Scenario 1: TypeA Only (Primary) - **First Campaign Only**

**Design Rationale:** For clean causal identification, each customer appears exactly once using their **first TypeA campaign** only. This prevents pre-treatment contamination from previous campaigns.

| Item | Definition |
|------|------------|
| **Analysis Unit** | `household_key` (customer) |
| **Treatment** | Customer's **first** TypeA campaign targeting (1,513 customers) |
| **Control** | Never targeted by any TypeA campaign (987 customers) |
| **Total Sample** | 2,500 observations (1 per customer) |

**First Campaign Distribution:**

| Campaign | Week | Treatment Count |
|----------|------|-----------------|
| 26 | 32-38 | 332 |
| 30 | 47-53 | 126 |
| 8 | 59-66 | 768 |
| 13 | 72-79 | 153 |
| 18 | 84-92 | 134 |

**Period Design (Campaign-specific):**

| Campaign | Pre-treatment | Outcome Period |
|----------|---------------|----------------|
| 26 | Week 1-31 | Week 32-42 |
| 30 | Week 1-46 | Week 47-53 |
| 8 | Week 1-58 | Week 59-70 |
| 13 | Week 1-71 | Week 72-83 |
| 18 | Week 1-83 | Week 84-96 |

**Why First Campaign Only?**
- Avoids pre-treatment contamination: Campaign 30's pre-treatment (Week 1-46) would include Campaign 26 period (Week 32-38) if we used all campaigns
- Each customer appears exactly once → independent observations
- Clean identification with no carry-over effects
- 62% sample reduction (3,979 → 1,513 treatment) but cleaner estimates
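
The first-campaign rule can be sketched as follows. The targeting records are invented, and `start_week` is a hypothetical, pre-computed column; the campaign start weeks used (26 → 32, 8 → 59, 13 → 72, 18 → 84) come from the tables above. The pre-treatment window of control customers is left undefined here because this section does not specify it.

```python
# Invented TypeA targeting records; start_week is an assumed, pre-joined column
targeting = [
    {"household_key": 1, "CAMPAIGN": 26, "start_week": 32},
    {"household_key": 1, "CAMPAIGN": 8, "start_week": 59},
    {"household_key": 2, "CAMPAIGN": 8, "start_week": 59},
    {"household_key": 3, "CAMPAIGN": 18, "start_week": 84},
    {"household_key": 3, "CAMPAIGN": 13, "start_week": 72},
]
all_customers = [1, 2, 3, 4]

def first_campaign_cohort(targeting, all_customers):
    """One row per customer: the earliest campaign if targeted, else control."""
    first = {}
    for rec in targeting:
        hh = rec["household_key"]
        if hh not in first or rec["start_week"] < first[hh]["start_week"]:
            first[hh] = rec
    cohort = []
    for hh in all_customers:
        rec = first.get(hh)
        cohort.append({
            "household_key": hh,
            "targeted": int(rec is not None),
            "campaign": rec["CAMPAIGN"] if rec else None,
            # Pre-treatment features use Week 1 .. (campaign start - 1)
            "pre_treatment_end_week": rec["start_week"] - 1 if rec else None,
        })
    return cohort

cohort = first_campaign_cohort(targeting, all_customers)
```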

---

#### Scenario 2: First Campaign Overall with Type as Attribute

**Design Rationale:** For clean causal identification with campaign **type comparison**, each customer appears exactly once using their **first campaign** (any type). The `campaign_type` is recorded as an attribute for HTE analysis.

| Item | Definition |
|------|------------|
| **Analysis Unit** | `household_key` (customer) |
| **Treatment** | Customer's **first** campaign targeting (any type) |
| **Control** | Never targeted by ANY campaign |
| **Moderator** | `campaign_type` (TypeA/TypeB/TypeC as attribute) |
| **Total Sample** | ~2,484 observations (1 per customer) |

**First Campaign Type Distribution (estimated):**

| First Campaign Type | Treatment Count |
|--------------------|-----------------|
| TypeA | ~500-600 |
| TypeB | ~700-800 |
| TypeC | ~200-300 |
| Control (never targeted) | ~900 |

**Period Design (Campaign-specific):**
- Pre-treatment: Week 1 to (first campaign start - 1)
- Outcome: First campaign start to end + 4 weeks (capped by next campaign)
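
The period rule above can be written as two small helper functions. This is a sketch under assumptions: "capped by next campaign" is read as "ends the week before the customer's next campaign starts", and the window is also capped at Week 102, the last week of transaction data. The function names are hypothetical.

```python
def pre_treatment_window(start_week):
    """Week 1 up to the week before the first campaign starts."""
    return 1, start_week - 1

def outcome_window(start_week, end_week, other_campaign_starts,
                   buffer_weeks=4, last_week=102):
    """Campaign start to end + buffer, capped by the next campaign and the data end."""
    window_end = min(end_week + buffer_weeks, last_week)
    later_starts = [s for s in other_campaign_starts if s > start_week]
    if later_starts:
        # Assumption: stop the week before the customer's next campaign begins
        window_end = min(window_end, min(later_starts) - 1)
    return start_week, window_end

# A campaign running Week 32-38 with no later campaign for this customer
window_uncapped = outcome_window(32, 38, [])
# Same campaign, but the customer is targeted again from Week 40
window_capped = outcome_window(32, 38, [40])
```

For a campaign running Week 32-38 with no later campaign, this reproduces the Week 32-42 outcome period listed for Campaign 26 in Scenario 1.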

**Research Questions:**
1. Do treatment effects differ across campaign types?
2. Which campaign type is most effective for which customer segment?
3. Is there an interaction between customer characteristics and campaign type?

**Why First Campaign Overall?**
- All campaign types can overlap in time (TypeA, TypeB, TypeC run simultaneously)
- Using "first campaign per type" would still have contamination (e.g., TypeB after TypeA)
- Single observation per customer with campaign_type as attribute enables clean type comparison

---

#### Common Definitions (All Scenarios)

| Item | Definition |
|------|------------|
| **Outcome** | `redemption` (coupon usage), `purchase_amount` (outcome period) |
| **Confounders** | Pre-treatment Track 1 features, Marketing exposure tendency, Demographics |

#### Scenario Selection Guide

| Research Question | Recommended Scenario |
|-------------------|---------------------|
| "What is the causal effect of TypeA campaigns?" | **Scenario 1 (First TypeA Only)** |
| "How do effects differ across campaign types?" | **Scenario 2 (First Campaign Overall)** |
| "Which customers respond to TypeA?" (HTE) | Scenario 1 |
| "Which campaign type works best?" (Type HTE) | Scenario 2 |

### 4.3 Causal Graph

#### DAG Structure

```
                    Marketing Exposure (display, mailer)
                           │
                           ↓
Campaign Targeting ──────→ Purchase ←────── Customer Characteristics
         │                                     (confounders)
         │                                          │
         └──────→ Coupon Redemption ←───────────────┘
```

#### Variable Roles

| Role | Variables |
|------|----------|
| **Treatment** | Campaign targeting (TypeA), Coupon issuance |
| **Confounders** | Marketing exposure (display, mailer), Customer characteristics (demographics, historical behavior) |
| **Outcome** | Coupon redemption, Purchase amount, Category purchase |
| **Effect Modifiers** | All covariates (for HTE analysis) |

#### Identification Strategy

**Assumptions:**
1. **Unconfoundedness**: Conditional on observed covariates, treatment assignment is independent of potential outcomes
2. **Overlap**: All customers have positive probability of treatment

**Debiasing Approach:**
- Include marketing exposure (display, mailer) as confounders
- Control for pre-treatment customer characteristics
- Use pre-campaign period (Week 1-31) features to avoid endogeneity

### 4.4 Features

**Treatment Variables:**
- `campaign_targeted`: Campaign targeting indicator (binary)
- `campaign_type`: Campaign type (A/B/C)

**Confounders - Marketing Exposure (from causal_data, Pre-treatment Period Week 1-31; causal_data is available from Week 9):**
- `display_exposure_rate`: Proportion of purchases with display exposure
- `mailer_exposure_rate`: Proportion of purchases with mailer exposure
- `mailer_type_A/B/C`: Exposure to each mailer type
- **Note**: These features measure the customer's *tendency* to purchase display/mailer-exposed products, not the treatment itself. Measured from pre-treatment period to avoid endogeneity.

**Confounders - Demographics (from hh_demographic):**
- AGE_GRP, INCOME_GRP, HOUSEHOLD_SIZE
- KID_CATEGORY_DESC, MARITAL_STATUS
- CLASSIFICATION_1~7

**Outcome Variables (Campaign Period Week 32-102):**
- `redemption`: Coupon redemption indicator
- `purchase_amount`: Purchase amount in campaign period
- `category_purchase`: Target category purchase indicator

### 4.5 Evaluation

**Causal Metrics:**
- ATE significance (bootstrap CI, p-value)
- HTE existence test (BLP test, calibration)
- CATE distribution analysis

**Prediction Metrics:**
- Uplift curves (AUUC, Qini coefficient)
- Calibration plots
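
A Qini curve can be sketched in a few lines. Customers are ranked by predicted uplift, and at each cutoff the cumulative treated outcome is compared with the cumulative control outcome rescaled to the treated group size. Definitions of the Qini coefficient vary; the version below (mean gap between the curve and the straight line to its endpoint) is one simple, unnormalized variant, and the data are invented.

```python
def qini_curve(scores, treated, outcomes):
    """Cumulative incremental outcome when targeting by descending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_t = n_c = 0
    y_t = y_c = 0.0
    curve = [0.0]
    for i in order:
        if treated[i]:
            n_t += 1
            y_t += outcomes[i]
        else:
            n_c += 1
            y_c += outcomes[i]
        # Rescale control outcomes to the number of treated customers seen so far
        curve.append(y_t - y_c * n_t / n_c if n_c else y_t)
    return curve

def qini_coefficient(curve):
    """Mean gap between the curve and the random-targeting line (one variant)."""
    n = len(curve) - 1
    return sum(curve[k] - curve[-1] * k / n for k in range(n + 1)) / n

# Invented example: predicted uplift scores, treatment flags, binary outcomes
scores = [0.9, 0.8, 0.7, 0.6, 0.2, 0.1, 0.05, 0.0]
treated = [1, 0, 1, 0, 1, 0, 1, 0]
outcomes = [1, 0, 1, 0, 0, 0, 0, 1]

curve = qini_curve(scores, treated, outcomes)
coefficient = qini_coefficient(curve)
```

A curve that rises early and flattens or falls later, as in this toy case, indicates that the highest-scored customers carry most of the incremental effect.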

**Policy Metrics:**
- Policy value (expected outcome under policy)
- Targeting efficiency (precision@k)
- ROI under optimal policy vs. random targeting

**Robustness & Validation:**
- Bootstrap variance
- Cross-validation stability
- Model comparison (CATE correlation across different models)

**Causal Validation (Ground truth unavailable):**
- **Refutation tests:**
  - Placebo treatment: Random treatment → effect ≈ 0
  - Random confounder: Add random variable → effect unchanged
  - Subset validation: Similar results on data subsets
- **Sensitivity analysis:**
  - E-value: How strong would unmeasured confounding need to be?
  - Rosenbaum bounds: Sensitivity to hidden bias
- **Calibration:**
  - RATE (Rank-Weighted Average Treatment Effect) and sorted-group ATEs: Do high-CATE groups have higher actual effects?
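
The placebo-treatment refutation test can be sketched with a simple permutation scheme. The data are simulated with a known effect of 2.0, and a plain difference in means stands in for whichever effect estimator is actually used; both are assumptions for illustration.

```python
import random
from statistics import mean

def diff_in_means(treated, outcomes):
    """Naive effect estimate: treated mean minus control mean."""
    y1 = [y for t, y in zip(treated, outcomes) if t]
    y0 = [y for t, y in zip(treated, outcomes) if not t]
    return mean(y1) - mean(y0)

def placebo_effects(treated, outcomes, n_permutations=200, seed=0):
    """Re-estimate the effect after randomly shuffling the treatment labels."""
    rng = random.Random(seed)
    shuffled = list(treated)
    effects = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        effects.append(diff_in_means(shuffled, outcomes))
    return effects

# Simulated data: 200 customers, half treated, true effect = 2.0
rng = random.Random(42)
treated = [i % 2 for i in range(200)]
outcomes = [10.0 + 2.0 * t + rng.gauss(0.0, 1.0) for t in treated]

observed_effect = diff_in_means(treated, outcomes)
placebo_mean = mean(placebo_effects(treated, outcomes))
```

If the estimator is behaving sensibly, `placebo_mean` should sit close to zero while `observed_effect` stays near the simulated effect; a placebo effect far from zero would point to a problem in the estimation pipeline.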

---

## 5. Period & Validation Design

### 5.1 Data Availability

| Data | Period | Notes |
|------|--------|-------|
| Transactions | Day 1-711 (Week 1-102) | Full 2 years |
| Campaigns | Day 224-719 (Week 32-102) | Campaigns only in later period |
| causal_data | Week 9-101 | Marketing exposure data |

**Key Periods:**
- Week 1-31: Pre-campaign period (no campaigns)
- Week 32-102: Campaign period

### 5.2 Track 1 Period Design

| Setting | Value | Rationale |
|---------|-------|----------|
| **Analysis Period** | Week 1-102 (Full) | Maximum data, complete customer behavior |
| **Stability Check** | Compare half-periods | Validate segment consistency over time |

### 5.3 Track 2 Period Design

| Setting | Value | Rationale |
|---------|-------|----------|
| **Analysis Period** | Week 32-102 | Campaign period only (treatment data available) |
| **Pre-treatment Features** | Week 1-31 | Use pre-campaign behavior as confounders (avoid information leakage) |
| **Validation** | TBD | Options: Time-based split, Campaign-based split, Cross-validation |

### 5.4 Customer Coverage

| Period | Weeks | Customers | Coverage |
|--------|-------|-----------|----------|
| Full (1-102) | 102 | 2,500 | 100% |
| Campaign (32-102) | 71 | 2,490 | 99.6% |
| Last 52w (51-102) | 52 | 2,479 | 99.2% |

---

## 6. Implementation Plan

### 6.1 Notebooks

| Notebook | Track | Content |
|----------|-------|--------|
| `00_study_design.ipynb` | - | Study design documentation (this notebook) |
| `01_feature_engineering.ipynb` | 1 & 2 | Build all customer features |
| `02_customer_profiling.ipynb` | 1.1 | NMF + Clustering |
| `03_clv_behavior_segments.ipynb` | 1.2 | Value × Need (CLV × Behavior) segmentation |
| `04_hte.ipynb` | 2.1 | HTE analysis |
| `05_optimal_policy.ipynb` | 2.2 | Optimal policy learning |

### 6.2 Implementation Order

```
Track 1 (Priority: High)
────────────────────────
1. Feature Engineering (01_feature_engineering.ipynb)
   - Build all 33 base features
   - Implement Value and Need feature groups

2. Step 1.1 Customer Profiling (02_customer_profiling.ipynb)
   - NMF factor analysis
   - Base clustering

3. Step 1.2 CLV × Behavior (03_clv_behavior_segments.ipynb)
   - CLV comparison
   - Value-Need integration

Track 2 (Priority: Medium)
────────────────────────
4. Step 2.1 HTE Analysis (04_hte.ipynb)
   - Build campaign features
   - CATE estimation
   - Heterogeneity testing

5. Step 2.2 Optimal Policy (05_optimal_policy.ipynb)
   - Subgroup discovery
   - Policy evaluation
```

---

## 7. Cohort Definition & Comparison

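This section loads the raw data, builds the cohorts with `src/cohorts.py`, and compares them. Note that the Scenario 1 and Scenario 2 cohorts built here are at the customer × campaign grain (12,420 and 74,520 rows for 2,484 customers), so they are not the one-row-per-customer samples defined in Section 4.2.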

```python
import sys
from pathlib import Path

import numpy as np
import pandas as pd

from projects.segmentation_dunnhumby.src.cohorts import (
    define_track1_cohort,
    define_track2_base_cohort,
    build_scenario1_cohort,
    build_scenario2_cohort,
    compare_cohorts
)
```

```python
# Load data
DATA_PATH = Path('.').absolute().parents[2] / 'data' / 'dunnhumby' / 'raw'

df_trans = pd.read_csv(DATA_PATH / 'transaction_data.csv')
df_campaign_table = pd.read_csv(DATA_PATH / 'campaign_table.csv')
df_campaign_desc = pd.read_csv(DATA_PATH / 'campaign_desc.csv')

print(f"Transactions: {len(df_trans):,} rows")
print(f"Campaign targeting: {len(df_campaign_table):,} rows")
print(f"Campaigns: {len(df_campaign_desc)} campaigns")
```

```
Transactions: 2,595,732 rows
Campaign targeting: 7,208 rows
Campaigns: 30 campaigns
```


```python
# Build Track 1 Cohort
track1_cohort = define_track1_cohort(df_trans)
print(f"Track 1 Cohort: {len(track1_cohort):,} customers")
```

```
Track 1 Cohort: 2,500 customers
```


```python
# Build Track 2 Base Cohort
track2_base = define_track2_base_cohort(df_trans)
print(f"Track 2 Base Cohort: {len(track2_base):,} customers")
print(f"  (requires purchase in Week 1-31 AND Week 32-102)")
```

```
Track 2 Base Cohort: 2,484 customers
  (requires purchase in Week 1-31 AND Week 32-102)
```


```python
# Build Scenario 1: TypeA Only
scenario1 = build_scenario1_cohort(df_trans, df_campaign_table, df_campaign_desc)
print(f"Scenario 1 (TypeA Only):")
print(f"  Total rows: {len(scenario1):,}")
print(f"  Customers: {scenario1['household_key'].nunique():,}")
print(f"  Treatment: {scenario1['targeted'].sum():,} ({scenario1['targeted'].mean()*100:.1f}%)")
print(f"  Control: {(1-scenario1['targeted']).sum():,}")
```

```
Scenario 1 (TypeA Only):
  Total rows: 12,420
  Customers: 2,484
  Treatment: 3,978 (32.0%)
  Control: 8,442
```


```python
# Build Scenario 2: All Types with Moderator
scenario2 = build_scenario2_cohort(df_trans, df_campaign_table, df_campaign_desc)
print(f"Scenario 2 (All Types):")
print(f"  Total rows: {len(scenario2):,}")
print(f"  Customers: {scenario2['household_key'].nunique():,}")
print(f"  Treatment: {scenario2['targeted'].sum():,} ({scenario2['targeted'].mean()*100:.1f}%)")
print(f"\n  By campaign type:")
print(scenario2.groupby('campaign_type')['targeted'].agg(['sum', 'mean']).round(3))
```

```
Scenario 2 (All Types):
  Total rows: 74,520
  Customers: 2,484
  Treatment: 7,206 (9.7%)

  By campaign type:
                sum   mean
campaign_type             
TypeA          3978  0.320
TypeB          2654  0.056
TypeC           574  0.039
```


```python
# Cohort Comparison Summary
print("=" * 60)
print("COHORT COMPARISON SUMMARY")
print("=" * 60)

comparison = pd.DataFrame([
    {
        'Cohort': 'Track 1',
        'Unit': 'Customer',
        'N_Rows': len(track1_cohort),
        'N_Customers': len(track1_cohort),
        'N_Treatment': '-',
        'Treatment_%': '-'
    },
    {
        'Cohort': 'Track 2 Base',
        'Unit': 'Customer',
        'N_Rows': len(track2_base),
        'N_Customers': len(track2_base),
        'N_Treatment': '-',
        'Treatment_%': '-'
    },
    {
        'Cohort': 'Scenario 1 (TypeA)',
        'Unit': 'Customer × Campaign',
        'N_Rows': len(scenario1),
        'N_Customers': scenario1['household_key'].nunique(),
        'N_Treatment': scenario1['targeted'].sum(),
        'Treatment_%': f"{scenario1['targeted'].mean()*100:.1f}%"
    },
    {
        'Cohort': 'Scenario 2 (All)',
        'Unit': 'Customer × Campaign',
        'N_Rows': len(scenario2),
        'N_Customers': scenario2['household_key'].nunique(),
        'N_Treatment': scenario2['targeted'].sum(),
        'Treatment_%': f"{scenario2['targeted'].mean()*100:.1f}%"
    }
])

comparison
```

```
COHORT COMPARISON SUMMARY
```

| | Cohort | Unit | N_Rows | N_Customers | N_Treatment | Treatment_% |
|---|--------|------|--------|-------------|-------------|-------------|
| 0 | Track 1 | Customer | 2500 | 2500 | - | - |
| 1 | Track 2 Base | Customer | 2484 | 2484 | - | - |
| 2 | Scenario 1 (TypeA) | Customer × Campaign | 12420 | 2484 | 3978 | 32.0% |
| 3 | Scenario 2 (All) | Customer × Campaign | 74520 | 2484 | 7206 | 9.7% |
