<a href="https://colab.research.google.com/github/tursunait/IDS_701_causal_final_report/blob/main/Untitled_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# 🔧 Step 1: Install linearmodels (only needed once per Colab session)
!pip install linearmodels


Collecting linearmodels
  Downloading linearmodels-6.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.9 kB)
Collecting mypy-extensions>=0.4 (from linearmodels)
  Downloading mypy_extensions-1.0.0-py3-none-any.whl.metadata (1.1 kB)
Collecting pyhdfe>=0.1 (from linearmodels)
  Downloading pyhdfe-0.2.0-py3-none-any.whl.metadata (4.0 kB)
Collecting formulaic>=1.0.0 (from linearmodels)
  Downloading formulaic-1.1.1-py3-none-any.whl.metadata (6.9 kB)
Collecting setuptools-scm<9.0.0,>=8.0.0 (from setuptools-scm[toml]<9.0.0,>=8.0.0->linearmodels)
  Downloading setuptools_scm-8.2.0-py3-none-any.whl.metadata (6.8 kB)
Collecting interface-meta>=1.2.0 (from formulaic>=1.0.0->linearmodels)
  Downloading interface_meta-1.3.0-py3-none-any.whl.metadata (6.7 kB)
Downloading linearmodels-6.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.7 MB)

In [2]:
import pandas as pd
from linearmodels.panel import PanelOLS

# Step 2: Load JD order data from GitHub
url = "https://raw.githubusercontent.com/tursunait/IDS_701_causal_final_report/main/JD_order_data.csv"
orders = pd.read_csv(url)

# Step 3: Keep purchases only; .copy() avoids pandas' SettingWithCopyWarning below
orders = orders[orders["quantity"] > 0].copy()

# Step 4: Optimize memory
orders["user_ID"] = orders["user_ID"].astype("category")
orders["sku_ID"] = orders["sku_ID"].astype("category")

# Step 5: Set panel index: entity = user_ID, "time" = original row number
# (PanelOLS needs a two-level index; the row number only distinguishes orders)
orders = orders.set_index(["user_ID", orders.index])

# Step 6: Run PanelOLS with absorbed fixed effects (EntityEffects = user_ID)
model = PanelOLS.from_formula(
    "quantity ~ direct_discount_per_unit + coupon_discount_per_unit + "
    "quantity_discount_per_unit + bundle_discount_per_unit + EntityEffects",
    data=orders
)

result = model.fit()
print(result.summary)




                          PanelOLS Estimation Summary                           
Dep. Variable:               quantity   R-squared:                        0.0008
Estimator:                   PanelOLS   R-squared (Between):             -0.0095
No. Observations:              549989   R-squared (Within):               0.0008
Date:                Tue, Apr 22 2025   R-squared (Overall):             -0.0057
Time:                        04:19:56   Log-likelihood                -6.359e+05
Cov. Estimator:            Unadjusted                                           
                                        F-statistic:                      18.028
Entities:                      454897   P-value                           0.0000
Avg Obs:                       1.2090   Distribution:                 F(4,95088)
Min Obs:                       1.0000                                           
Max Obs:                       605.00   F-statistic (robust):             18.028
                            

# Summary Report: Causal Impact of Promotion Types on Order Quantity

### Research Question
> What is the **causal effect** of different promotion types (*direct discounts, coupon discounts, quantity discounts, and bundle discounts*) on the **number of units purchased** per order, conditional on a purchase?

---

### Data & Methodology

- **Dataset**: JD.com e-commerce transactions (March 2018); the estimation sample contains 549,989 purchase records across 454,897 users.
- **Outcome variable**: `quantity` (number of units purchased per order).
- **Promotion types**:
  - `direct_discount_per_unit`
  - `coupon_discount_per_unit`
  - `quantity_discount_per_unit`
  - `bundle_discount_per_unit`
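- **Model**: PanelOLS with user fixed effects (`EntityEffects`)
  - **User-level fixed effects** control for time-invariant unobserved user characteristics such as loyalty, preference, or income level.
  - Each user serves as their own control across different promotional exposures. Because users average only 1.21 orders each, the estimates are identified from the minority of users with two or more orders.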
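
To illustrate why user fixed effects matter, the sketch below applies the within transformation (subtracting each user's own mean) to a small synthetic panel. It is not the JD.com data: the panel size, the variable names `user`, `discount`, and `quantity`, the value of `true_beta`, and the helper `ols_slope` are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic panel (illustrative only, not the JD.com data):
# 200 users with 5 orders each, and a user effect that also drives the discount.
n_users, n_orders = 200, 5
user = np.repeat(np.arange(n_users), n_orders)
user_effect = rng.normal(size=n_users)[user]
discount = user_effect + rng.normal(size=user.size)  # confounded by the user effect
true_beta = -0.5  # assumed "true" effect of the discount on quantity
quantity = (2.0 + true_beta * discount + 3.0 * user_effect
            + rng.normal(scale=0.1, size=user.size))

df = pd.DataFrame({"user": user, "discount": discount, "quantity": quantity})


def ols_slope(x, y):
    """Slope from a one-regressor OLS with an intercept (hypothetical helper)."""
    x_centered = x - x.mean()
    return float((x_centered * (y - y.mean())).sum() / (x_centered ** 2).sum())


# Pooled OLS ignores the user effect, so the slope absorbs the confounding.
pooled_beta = ols_slope(df["discount"], df["quantity"])

# Within transformation: subtract each user's own mean from every variable.
cols = ["discount", "quantity"]
demeaned = df[cols] - df.groupby("user")[cols].transform("mean")
within_beta = ols_slope(demeaned["discount"], demeaned["quantity"])

print(f"pooled: {pooled_beta:.3f}, within: {within_beta:.3f}, true: {true_beta}")
```

In this toy setup the pooled slope has the wrong sign, while the within estimate lands close to `true_beta`. The same logic underlies `EntityEffects`, with the caveat that users with a single order are demeaned to zero and contribute nothing to the estimate.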

---

### Key Results

| Promotion Type    | Coefficient | P-Value  | Interpretation                                 |
|-------------------|-------------|----------|------------------------------------------------|
| Direct Discount   | -0.0010     | < 0.0001 | Slightly reduces the number of items per order |
| Coupon Discount   | -0.0026     | < 0.0001 | Larger reduction than the direct discount      |
| Quantity Discount | +0.0019     | 0.0005   | Increases number of units purchased            |
| Bundle Discount   | -0.0009     | 0.1740   | No statistically significant effect            |

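- **R-squared (within)**: 0.0008. The discounts explain less than 0.1% of within-user variation in quantity, which is expected when most users contribute a single order and therefore no within-user variation.
- **F-test**: F(4, 95088) = 18.03, p < 0.001, so the four discount coefficients are jointly significant. The covariance estimator is unadjusted, which is why the "robust" F-statistic in the output is identical to the standard one.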
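
The panel dimensions reported in the estimation summary can be checked with a few lines of arithmetic. The sketch below uses only the figures printed in the PanelOLS output; the variable names are illustrative.

```python
n_obs = 549_989       # "No. Observations" in the summary output
n_entities = 454_897  # "Entities" in the summary output
n_regressors = 4      # the four per-unit discount variables

# Average number of orders per user ("Avg Obs" in the summary).
avg_obs = n_obs / n_entities

# Residual degrees of freedom after absorbing one effect per user
# and estimating four slopes; this is the denominator of F(4, 95088).
resid_df = n_obs - n_entities - n_regressors

print(f"average orders per user: {avg_obs:.4f}")
print(f"residual degrees of freedom: {resid_df}")
```

Only about 95,000 of the roughly 550,000 observations remain as effective degrees of freedom, which helps explain the very small within R-squared.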

---

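### 🔍 Interpretation

- ✅ **Quantity Discounts** are the only promotion type with a positive, statistically significant coefficient (+0.0019), making them the most effective tool here for increasing bulk purchases.
- ⚠️ **Direct and Coupon Discounts** are associated with slightly smaller orders (-0.0010 and -0.0026); they may drive conversions but not larger baskets.
- ❌ **Bundle Discounts** show no statistically significant effect (p = 0.174); they may be underutilized or unclear to users.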

---

### Business Implications

- **To increase order volume**, JD.com should focus on **quantity-based promotions** (e.g., “Buy 3, get 1 free”).
- **Direct and coupon discounts** are helpful for encouraging purchases but **do not encourage multi-unit orders**.
- **Bundle offers** may need to be redesigned or made more attractive to drive impact.

---

### Next Steps

To further strengthen causal interpretation:

- Add **SKU fixed effects** to control for product-specific demand.
- Apply **propensity score matching** or **instrumental variables** to correct for selection bias.
- Explore **time fixed effects** and **channel-based heterogeneity** (e.g., app vs. WeChat).
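
As a rough illustration of the first item, the sketch below adds product fixed effects alongside user fixed effects by including one dummy per user and per SKU in an ordinary least-squares fit. It runs on synthetic data rather than the JD.com orders, and the sizes, column names, and `true_beta` are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Synthetic orders (illustrative only): 50 users, 20 SKUs, 2000 orders.
n_users, n_skus, n_orders = 50, 20, 2000
user = rng.integers(0, n_users, size=n_orders)
sku = rng.integers(0, n_skus, size=n_orders)
user_fx = rng.normal(size=n_users)[user]
sku_fx = rng.normal(size=n_skus)[sku]

# The discount depends on both the user and the product, so both confound it.
discount = user_fx + sku_fx + rng.normal(size=n_orders)
true_beta = 0.3  # assumed "true" effect of the discount on quantity
quantity = (true_beta * discount + 2.0 * user_fx - 1.5 * sku_fx
            + rng.normal(scale=0.1, size=n_orders))

df = pd.DataFrame({"user": user, "sku": sku,
                   "discount": discount, "quantity": quantity})

# Design matrix: discount, one dummy per user, one dummy per SKU.
# One SKU dummy is dropped to avoid perfect collinearity with the user dummies.
user_dummies = pd.get_dummies(df["user"], prefix="u", dtype=float)
sku_dummies = pd.get_dummies(df["sku"], prefix="s", drop_first=True, dtype=float)
X = np.column_stack([df["discount"].to_numpy(),
                     user_dummies.to_numpy(),
                     sku_dummies.to_numpy()])

coef, *_ = np.linalg.lstsq(X, df["quantity"].to_numpy(), rcond=None)
two_way_beta = float(coef[0])  # first column is the discount
print(f"two-way FE estimate: {two_way_beta:.3f} (true value {true_beta})")
```

Explicit dummies are only practical at this toy scale. With roughly 450,000 users, the effects would need to be absorbed rather than materialized, for example through the `other_effects` argument of `PanelOLS`, which has not been tried in this notebook.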

---
