# A/B Testing Report: Evaluating Discount Impact on GMV and ROI

by Lau Wen Jun

## Disclaimer:
The dataset used in this analysis is a **synthetic dataset generated solely for demonstration purposes and does not reflect real-world user data**. You can download the dataset [here](https://www.dropbox.com/scl/fi/khi17myq2m3rosz5eawx7/abtestgmvroi.xlsx?rlkey=ikwv4htvaryfx9ehepqvdd7uw&st=a6381wai&dl=1).

## Table of Contents

1. [Business Context](#1.-Business-Context)
2. [Key Variables](#2.-Key-Variables)
3. [Objective](#3.-Objective)
4. [Experimental Design](#4.-Experimental-Design)
5. [Statistical Tests](#5.-Statistical-Tests)
6. [Results](#6.-Results)
7. [Conclusions](#7.-Conclusions)

## [1. Business Context](#Table-of-Contents)

An e-commerce platform explored offering a **10% discount** to boost sales. While discounts can increase order volume, they may also reduce profit margins. To evaluate the trade-off, the team launched an A/B test to determine whether offering a 10% discount leads to a meaningful increase in **Gross Merchandise Value (GMV)** and delivers a positive **Return on Investment (ROI)**.

## [2. Key Variables](#Table-of-Contents)


**GMV (Gross Merchandise Value)**: Total dollar value of items purchased per user

**Discount Given**: Amount of promotional discount applied

**ROI (Return on Investment)**:
$$
ROI = \frac{\text{Total GMV} - \text{Total Discount}}{\text{Total Discount}}
$$
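The ROI definition above can be sketched as a small helper. This is a minimal illustration, not the report's own code: the function name `compute_roi` and the example figures are hypothetical, and returning `None` when no discount was given is an assumption chosen to mirror how a no-discount control group has no meaningful ROI.

```python
from typing import Optional


def compute_roi(total_gmv: float, total_discount: float) -> Optional[float]:
    """ROI = (total GMV - total discount) / total discount.

    Returns None when no discount was given, since the ratio is undefined.
    """
    if total_discount == 0:
        return None
    return (total_gmv - total_discount) / total_discount


# Hypothetical figures for illustration only (not taken from the dataset)
example_roi = compute_roi(total_gmv=1000.0, total_discount=100.0)
print(example_roi)  # 9.0
```

With these made-up numbers, 100 dollars of discount is associated with 1,000 dollars of GMV, so the net gain is 900 dollars and the ROI is 9.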
 

 


## [3. Objective](#Table-of-Contents)

To determine if the 10% discount campaign increases GMV significantly and generates a positive ROI, justifying its use in future marketing efforts.

## [4. Experimental Design](#Table-of-Contents)

**Control Group**: Received no discount

**Test Group**: Received a 10% discount at checkout

Users were randomly assigned at the user ID level to avoid contamination. The discount was delivered privately through personalized notifications and app banners, preventing confusion or perceived unfairness.
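The report does not say how the user-level randomization was implemented. One common approach, sketched here purely as an assumption, is to hash each user ID with an experiment-specific salt so that a given user always lands in the same group. The function name `assign_group`, the salt string, and the 50/50 split are all hypothetical.

```python
import hashlib


def assign_group(user_id: str, salt: str = "discount-10pct-test") -> str:
    """Deterministically map a user ID to 'control' or 'test' (assumed 50/50 split)."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return "test" if bucket < 50 else "control"


# Hypothetical user IDs, used only to show that the split is roughly even
user_ids = [f"user_{i}" for i in range(10_000)]
groups = [assign_group(uid) for uid in user_ids]
test_share = groups.count("test") / len(groups)
print(f"Share assigned to test: {test_share:.3f}")
```

Because the assignment depends only on the user ID and the salt, the same user sees the same experience on every visit, which is what keeps the two groups from contaminating each other.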

All other site conditions were held constant. The campaign ran for a fixed period, and user-level data was collected for analysis.

## [5. Statistical Tests](#Table-of-Contents)

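A two-sample t-test was used to assess whether average GMV differed significantly between the control and test groups. The analysis uses Welch's variant of the test (`equal_var=False` in the code in Section 6), which does not assume that the two groups have equal variances.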

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}
$$

$\bar{x}_1, \bar{x}_2$: Mean GMV per user in test/control

$s_1^2, s_2^2$: Variances

$n_1, n_2$: Sample sizes
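To make the t-statistic concrete, here is a minimal standard-library sketch of Welch's unequal-variance t-test. The function name `welch_t` and the sample values are made up for illustration. The degrees-of-freedom line uses the Welch-Satterthwaite approximation, which the report does not state explicitly. The analysis itself relies on `scipy.stats.ttest_ind` with `equal_var=False`, which performs the same test.

```python
import math
import statistics


def welch_t(sample_1, sample_2):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    n1, n2 = len(sample_1), len(sample_2)
    mean1, mean2 = statistics.fmean(sample_1), statistics.fmean(sample_2)
    # statistics.variance is the sample variance (divides by n - 1)
    var1, var2 = statistics.variance(sample_1), statistics.variance(sample_2)
    se1, se2 = var1 / n1, var2 / n2  # squared standard errors of each mean
    t = (mean1 - mean2) / math.sqrt(se1 + se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


# Made-up GMV values, not drawn from the report's dataset
test_gmv = [58.0, 61.5, 55.2, 60.1, 57.3]
control_gmv = [49.0, 51.2, 48.7, 50.5, 50.1]
t_stat, df = welch_t(test_gmv, control_gmv)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

A positive `t` here means the first sample (test) has the higher mean, matching the argument order used in the analysis code.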

## [6. Results](#Table-of-Contents)

The following results were computed using a Python script analyzing a synthetic dataset:



In [None]:
import pandas as pd
import requests
from io import BytesIO
from scipy.stats import ttest_ind

In [3]:
# Download the Excel dataset from Dropbox
url = "https://www.dropbox.com/scl/fi/khi17myq2m3rosz5eawx7/abtestgmvroi.xlsx?rlkey=ts1ujc7whmrezf8dkmq67zhne&st=aijoy80p&dl=1"
response = requests.get(url)
response.raise_for_status()  # stop early if the download failed
df = pd.read_excel(BytesIO(response.content))

In [17]:
# Group summary
summary = df.groupby("group").agg(
    avg_gmv=("gmv", "mean"),
    total_gmv=("gmv", "sum"),
    total_discount=("discount_given", "sum"),
    user_count=("user_id", "count")
).reset_index()

# ROI calculation
summary["roi"] = (summary["total_gmv"] - summary["total_discount"]) / summary["total_discount"]
summary.loc[summary["group"] == "control", "roi"] = None  # no discount given

# t-test
control_gmv = df[df["group"] == "control"]["gmv"]
test_gmv = df[df["group"] == "test"]["gmv"]
t_stat, p_val = ttest_ind(test_gmv, control_gmv, equal_var=False)

# GMV uplift %
control_avg = summary.loc[summary["group"] == "control", "avg_gmv"].values[0]
test_avg = summary.loc[summary["group"] == "test", "avg_gmv"].values[0]
uplift_pct = ((test_avg - control_avg) / control_avg) * 100

# Round the summary for clean display
summary["avg_gmv"] = summary["avg_gmv"].round(2)
summary["total_gmv"] = summary["total_gmv"].round(2)
summary["total_discount"] = summary["total_discount"].round(2)
summary["roi"] = summary["roi"].round(2)

# Display group summary
print("🔍 Group Summary:\n")
print(summary[["group", "avg_gmv", "total_gmv", "total_discount", "user_count", "roi"]])

# Display test results separately
print("\n📈 Test Results:\n")
print(f"GMV Uplift: {uplift_pct:.2f}%")
print(f"t-Statistic: {t_stat:.2f}")
print(f"p-Value: {p_val:.4e}")

🔍 Group Summary:

     group  avg_gmv  total_gmv  total_discount  user_count  roi
0  control    49.89  250109.66            0.00        5013  NaN
1     test    58.03  289420.46        28942.05        4987  9.0

📈 Test Results:

GMV Uplift: 16.32%
t-Statistic: 38.16
p-Value: 2.8631e-297


The test group had a **statistically significant 16.3% increase in average GMV per user (p < 0.001)**. ROI was **9.0**, meaning that for every 1 dollar spent on discounts, **10 dollars in GMV was generated**: 1 dollar covered the cost of the discount and 9 dollars was the net gain, which is why the ROI is 9 rather than 10.
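As a quick arithmetic check, the headline figures can be reproduced from the rounded values in the group summary output:

$$
\text{GMV uplift} = \frac{58.03 - 49.89}{49.89} \times 100\% \approx 16.32\%
$$

$$
ROI = \frac{289420.46 - 28942.05}{28942.05} \approx 9.00
$$

The total discount (28,942.05) is 10% of the test group's total GMV (289,420.46), which is consistent with a flat 10% discount. Under the ROI definition in Section 2, that ratio alone gives (1 - 0.1) / 0.1 = 9.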

## [7. Conclusions](#Table-of-Contents)
    

The A/B test confirmed that the 10% discount campaign increased GMV and delivered a strong ROI of 9.0, meaning every 1 dollar spent on discounts generated 10 dollars in GMV. Given the statistically significant uplift and high return, the discount strategy is both effective and profitable. The recommendation is to roll out the campaign to a broader user base and monitor long-term performance. Further A/B tests could explore different discount levels or target specific user segments for even greater impact.

