## A/B Testing Case For an E-commerce Website

Ufuk Taner CEYHANLI <br>
[*tanerceyhanli.github.io*](https://tanerceyhanli.github.io)

### Business Problem

In this study, I design an A/B test for a planned change to an e-commerce website and give recommendations based on the results. The company plans to deploy a recommendation engine that is intended to offer the right products to website visitors after they click the "Buy a product" button. However, this change may either harm or improve the company's business, so an A/B test is needed before launch.

### Introduction

A successful A/B test must include all the steps below.

- **Step 1** Set the goal, hypothesis and choose metrics for sanity check and evaluation
- **Step 2** Choose significance level, statistical power, and practical significance level
- **Step 3** Calculate the required sample size
- **Step 4** Design and Run the Experiment
- **Step 5** Analyze the results and draw conclusions: Sanity checks & Effect size analysis

In [297]:
#Importing necessary packages
import pandas as pd
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt
import seaborn as sns

### Step 1) Set the goal, hypothesis and choose metrics for sanity check and evaluation

Our hypothesis is that the change will increase the site's sign-up and payment rates.
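For each evaluation metric, let $p_{cont}$ and $p_{exp}$ be its value in the control and experiment groups and $d = p_{exp} - p_{cont}$. Because the sample-size calculation in Step 3 uses $z_{1-\alpha/2}$, the hypotheses can be sketched as a two-sided test (an assumption; the directional wording above would otherwise suggest a one-sided test):

$$
H_0:\ d = p_{exp} - p_{cont} = 0 \qquad H_1:\ d \neq 0
$$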

<center><h4><strong> Customer Funnel </strong></h4> <img src="customer_funnel.png" width="600"> </center>

#### Metrics

Invariant metrics:
- Pageviews
- Clicks
- CTP (click-through probability on "Buy a product": Clicks/Pageviews)

Evaluation metrics:
- Gross Conversion: Probability of signing up, given click (Number of Sign-ups/Number of Clicks)
- Retention:         Probability of payment, given sign up   (Number of Payments/Number of Sign-ups)
- Net Conversion:    Probability of payment, given click    (Number of Payments/Number of Clicks)
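Algebraically, Net Conversion = Gross Conversion × Retention, because the sign-up counts cancel; this is why the baseline Net Conversion in Step 3 (0.109313) equals 0.20625 × 0.53. A minimal sketch with a hypothetical helper `funnel_metrics`, using the baseline counts from `Baseline.csv` (payments are not listed there, so they are derived here from the 0.53 baseline retention):

```python
# Minimal sketch: the three evaluation metrics from funnel counts.
# funnel_metrics is a hypothetical helper, not part of the original notebook.

def funnel_metrics(clicks: float, signups: float, payments: float) -> dict:
    """Return gross conversion, retention and net conversion."""
    gross = signups / clicks        # P(sign-up | click)
    retention = payments / signups  # P(payment | sign-up)
    net = payments / clicks         # P(payment | click) = gross * retention
    return {"gross_conversion": gross, "retention": retention, "net_conversion": net}

# Baseline: 5600 clicks and 1155 sign-ups per day; payments are derived
# from the 0.53 baseline retention because Baseline.csv does not list them.
clicks, signups = 5600, 1155
payments = 0.53 * signups
metrics = funnel_metrics(clicks, signups, payments)
print(metrics)
```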

### Step 2) Choose significance level, statistical power, and practical significance level

| Item  | Gross Conversion | Retention | Net Conversion  |
|:-:|:-:|:-:|:-:|
| Significance Level $\alpha$ | 5%  | 5%  | 5%  |
| Statistical Power 1-$\beta$ | 80%  | 80%  | 80%  |
| Practical Significance Level | 1%  | 1%  | 0.75%  |
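The significance level and power in the table translate into the two z-values used by the sample-size formula in Step 3. A minimal sketch with `scipy.stats.norm.ppf`, assuming a two-sided test at $\alpha$ = 5% and 80% power:

```python
from scipy.stats import norm

alpha, beta = 0.05, 0.20  # significance level and (1 - power) from the table above

z_alpha = norm.ppf(1 - alpha / 2)  # two-sided critical value, about 1.96
z_beta = norm.ppf(1 - beta)        # z-value for 80% power, about 0.84

print(round(z_alpha, 4), round(z_beta, 4))  # 1.96 0.8416
```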

### Step 3) Calculate the required sample size

The company already has the baseline values shown below.

In [298]:
df_baseline=pd.read_csv("Baseline.csv", header=None)
df_baseline

|   | 0 | 1 |
|:-|:-|-:|
| 0 | Unique cookies to view page per day | 70000.0 |
| 1 | Unique cookies to click "Buy a product" per day | 5600.0 |
| 2 | Sign-ups per day | 1155.0 |
| 3 | Click-through-probability on "Buy a product" | 0.08 |
| 4 | Gross Conversion | 0.20625 |
| 5 | Retention | 0.53 |
| 6 | Net conversion | 0.109313 |


In [299]:
# Input: a cumulative probability (e.g. 1 - alpha/2 for a two-sided test, 1 - beta for power)
# Returns: the corresponding standard normal z-score
def get_z_score(alpha):
    return norm.ppf(alpha)

# Inputs: p - baseline conversion rate (our estimate of p), d - minimum detectable change (d_min)
# Returns: [sd1, sd2], the standard deviations under the null and alternative hypotheses
def get_sds(p,d):
    sd1 = np.sqrt(p * (1-p) + p * (1-p))
    sd2 = np.sqrt(p *(1-p) + (p + d)* (1-(p+d)))
    sds = [sd1, sd2]
    return sds

# Inputs: sds - [sd1, sd2] from get_sds, alpha, beta, d - minimum detectable change (d_min)
# Returns: the minimum sample size required per group, in units of the metric's denominator
def get_sampSize(sds,alpha,beta,d):
    n = pow ((get_z_score(1 - alpha / 2) * sds[0] + get_z_score (1 - beta) * sds[1]),2) / pow(d,2)
    return n
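For reference, `get_sampSize(get_sds(p, d), alpha, beta, d)` evaluates the expression below, where $p$ is the baseline rate, $d = d_{min}$, and $n$ is the per-group sample size in units of the metric's denominator (clicks or sign-ups):

$$
n = \frac{\left(z_{1-\alpha/2}\sqrt{2p(1-p)} + z_{1-\beta}\sqrt{p(1-p) + (p+d)\,(1-(p+d))}\right)^2}{d^2}
$$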

#### Gross Conversion

In [300]:
GC={}
GC["d_min"]=0.01
GC["p"]=df_baseline.iloc[4,1]
GC["SampSize"]=round(get_sampSize(get_sds(GC["p"],GC["d_min"]),0.05,0.2,GC["d_min"]))
GC["SampSize_total"] = GC["SampSize"] / 0.08*2
GC["SampSize_total"]

645875.0

Baseline Conversion: 20.625% <br>
Minimum Detectable Effect: 1% <br>
Alpha: 5% <br>
Beta: 20% <br>
Sensitivity (1 - Beta): 80% <br>
Sample Size = 25835 clicks/group <br>
Number of groups = 2 (experiment and control)<br>
Total sample size = 51670 clicks <br>
Clicks/Pageview: 5600/70000 = 0.08 <br>
Pageviews Required = 51670 / 0.08 = 645875

#### Retention

In [301]:
RT={}
RT["d_min"]=0.01
RT["p"]=df_baseline.iloc[5,1]
RT["SampSize"]=round(get_sampSize(get_sds(RT["p"],RT["d_min"]),0.05,0.2,RT["d_min"]))
RT["SampSize_total"] = round(RT["SampSize"] / 0.20625 / 0.08 * 2)
RT["SampSize_total"]

4737818

Baseline Conversion: 53% <br>
Minimum Detectable Effect: 1% <br>
Alpha: 5% <br>
Beta: 20% <br>
Sensitivity (1 - Beta): 80% <br>
Sample size = 39087 sign-ups/group <br>
Number of groups = 2 (experiment and control) <br>
Total sample size = 78174 sign-ups <br>
Sign-ups/pageview: 1155/70000 = 0.0165  <br>
Pageviews = 78174/0.0165 = 4737818

#### Net Conversion

In [302]:
NC={}
NC["d_min"]=0.0075
# Baseline Net conversion is row 6 of Baseline.csv
NC["p"]=df_baseline.iloc[6,1]
NC["SampSize"]=round(get_sampSize(get_sds(NC["p"],NC["d_min"]),0.05,0.2,NC["d_min"]))
NC["SampSize_total"] = NC["SampSize"] * 2 / 0.08
NC["SampSize_total"]

685325.0

Baseline Conversion: 10.9313% <br>
Minimum Detectable Effect (MDE): 0.75% <br>
Alpha: 5% <br>
Beta: 20% <br>
Sensitivity (1 - Beta): 80% <br>
Sample size = 27413 clicks/group <br>
Number of groups = 2 (experiment and control) <br>
Total sample size = 54826 clicks <br>
Clicks/pageview: 5600/70000 = 0.08  <br>
Pageviews = 54826/0.08 = 685325
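To compare the three metrics side by side, the sketch below recomputes each per-group sample size with the same formula as `get_sampSize` and converts it to pageviews using each metric's denominator: clicks per pageview (0.08) for Gross and Net Conversion, and sign-ups per pageview (0.08 × 0.20625 = 0.0165) for Retention. The baseline values are hard-coded from `Baseline.csv`, with Net Conversion taken as 0.20625 × 0.53; the names `n_per_group`, `CTP`, `GROSS` and so on are illustrative.

```python
import math
from scipy.stats import norm

CTP = 5600 / 70000               # clicks per pageview
GROSS, RETENTION = 0.20625, 0.53
NET = GROSS * RETENTION          # baseline net conversion, about 0.109313

def n_per_group(p, d, alpha=0.05, beta=0.20):
    """Per-group sample size; same formula as get_sampSize(get_sds(p, d), alpha, beta, d)."""
    sd1 = math.sqrt(2 * p * (1 - p))
    sd2 = math.sqrt(p * (1 - p) + (p + d) * (1 - (p + d)))
    z_a = float(norm.ppf(1 - alpha / 2))
    z_b = float(norm.ppf(1 - beta))
    return (z_a * sd1 + z_b * sd2) ** 2 / d ** 2

# metric -> (baseline p, d_min, denominator units per pageview)
metrics = {
    "Gross Conversion": (GROSS, 0.01, CTP),          # clicks per pageview
    "Retention": (RETENTION, 0.01, CTP * GROSS),     # sign-ups per pageview
    "Net Conversion": (NET, 0.0075, CTP),            # clicks per pageview
}

per_group, pageviews = {}, {}
for name, (p, d, units_per_pageview) in metrics.items():
    per_group[name] = round(n_per_group(p, d))
    # two groups (control + experiment), converted from denominator units to pageviews
    pageviews[name] = round(2 * per_group[name] / units_per_pageview)
    print(f"{name}: {per_group[name]} per group, {pageviews[name]} pageviews")

print("Largest requirement:", max(pageviews, key=pageviews.get))
```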

If **100%** of traffic is diverted to the experiment, the company needs at least **68 days** because of the large sample size the Retention metric requires. <br>
4737818 / 70000 = 67.68

Considering the experiment duration, it is safer for the company's business to evaluate the change using only Gross Conversion and Net Conversion, with a lower traffic diversion. Net Conversion then sets the requirement (685325 pageviews), so with **50%** of traffic diverted the experiment needs at least **20 days**. <br>
685325 / 70000 = 9.79 days at 100% diversion, so 9.79 / 0.5 = 19.58 ≈ 20 days at 50% diversion
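The duration arithmetic above can be captured in a small helper. This is a sketch assuming 70000 pageviews per day (the baseline) and that `diversion` is the fraction of daily traffic entering the experiment; `days_needed` is a hypothetical name.

```python
import math

DAILY_PAGEVIEWS = 70000  # baseline unique cookies viewing the page per day

def days_needed(pageviews_required: int, diversion: float) -> int:
    """Whole days needed when a `diversion` fraction of daily traffic enters the experiment."""
    return math.ceil(pageviews_required / (DAILY_PAGEVIEWS * diversion))

print(days_needed(4737818, 1.0))  # Retention at 100% diversion -> 68
print(days_needed(685325, 0.5))   # Net Conversion at 50% diversion -> 20
```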


### Step 4) Design and Run the Experiment

Data are collected from users for at least 20 days, with 50% of traffic diverted to the experiment.

### Step 5) Analyze the results and draw conclusions

Before starting the effect size analysis, I need to make sure that pageviews and clicks are evenly distributed between the control and experiment groups. I also need to check that there is no significant difference in click-through probability between the two groups.

In [317]:
df_cntrl=pd.read_csv("control_group.csv")
df_exprmnt=pd.read_csv("experiment_group.csv")
print(df_cntrl.head())
print(df_exprmnt.head())

       Date  Pageviews  Clicks  Sign-ups  Payments
0  6/5/2020       3862     344        67        20
1  6/6/2020       4551     390        74        25
2  6/7/2020       5256     455        84        30
3  6/8/2020       4936     418        78        33
4  6/9/2020       5007     419       110        51
       Date  Pageviews  Clicks  Sign-ups  Payments
0  6/5/2020       3858     343        53        17
1  6/6/2020       4644     393        58        46
2  6/7/2020       5240     442        73        40
3  6/8/2020       4934     414        69        46
4  6/9/2020       4897     416        70        47


In [318]:
df_cntrl.shape

(23, 5)

The experiment ran for **23 days**.

#### Sanity Check for Invariant Metrics (Binomial Test)

In [319]:
results ={'Control':pd.Series([df_cntrl.Pageviews.sum(), df_cntrl.Clicks.sum()], index=["Pageviews","Clicks"]),
          'Experiment':pd.Series([df_exprmnt.Pageviews.sum(), df_exprmnt.Clicks.sum()], index=["Pageviews","Clicks"])}


In [320]:
df_results = pd.DataFrame(results)
df_results

|   | Control | Experiment |
|:-|-:|-:|
| Pageviews | 106087 | 105685 |
| Clicks | 8654 | 8635 |


In [326]:
# Each unit should be assigned to control with probability 0.5
df_results["Total"] = df_results["Control"] + df_results["Experiment"]
df_results['Prob'] = 0.5
# Standard error and 95% margin of error of a binomial proportion with p = 0.5
df_results['StdErr'] = np.sqrt((df_results.Prob * (1- df_results.Prob))/df_results.Total)
df_results["MargErr"] = 1.96 * df_results.StdErr
df_results["CI_lower"] = df_results.Prob - df_results.MargErr
df_results["CI_upper"] = df_results.Prob + df_results.MargErr
# Observed share of units in the control group; it should fall inside the CI
df_results["Obs_val"] = df_results.Control/df_results.Total
df_results["Pass_Sanity"] = df_results.apply(lambda x: (x.Obs_val > x.CI_lower) and (x.Obs_val < x.CI_upper),axis=1)
df_results['Diff'] = abs((df_results.Experiment - df_results.Control)/df_results.Total)

df_results

|   | Control | Experiment | Total | Prob | StdErr | MargErr | CI_lower | CI_upper | Obs_val | Pass_Sanity | Diff |
|:-|-:|-:|-:|-:|-:|-:|-:|-:|-:|:-:|-:|
| Pageviews | 106087 | 105685 | 211772 | 0.5 | 0.001087 | 0.00213 | 0.49787 | 0.50213 | 0.500949 | True | 0.001898 |
| Clicks | 8654 | 8635 | 17289 | 0.5 | 0.003803 | 0.007453 | 0.492547 | 0.507453 | 0.500549 | True | 0.001099 |


Pageviews and clicks are evenly distributed between the two groups, as expected: for both, the observed control share lies inside the 95% confidence interval around 0.5.
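As a cross-check on the normal-approximation interval, an exact two-sided binomial test gives a p-value directly. This sketch uses `scipy.stats.binomtest` (available in SciPy 1.7 and later) on the totals from the table above; a p-value above 0.05 is consistent with a 50/50 split.

```python
from scipy.stats import binomtest

# Control/experiment totals from the sanity-check table
counts = {"Pageviews": (106087, 105685), "Clicks": (8654, 8635)}

pvalues = {}
for metric, (control, experiment) in counts.items():
    # H0: each unit lands in the control group with probability 0.5
    result = binomtest(control, n=control + experiment, p=0.5)
    pvalues[metric] = result.pvalue
    print(f"{metric}: p-value = {result.pvalue:.3f}")
```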

In [327]:
ctp ={'Control':pd.Series([df_cntrl.Clicks.sum()/df_cntrl.Pageviews.sum()], index=["CTP"]),
          'Experiment':pd.Series([df_exprmnt.Clicks.sum()/df_exprmnt.Pageviews.sum()], index=["CTP"])}

In [328]:
df_ctp = pd.DataFrame(ctp)
# Standard error of the control-group CTP; it uses control pageviews only, not a pooled estimate
df_ctp['SE_pooled']=np.sqrt((df_ctp.Control * (1- df_ctp.Control))/df_cntrl.Pageviews.sum())
df_ctp['CI_lower']= df_ctp.Control-1.96*df_ctp.SE_pooled
df_ctp['CI_upper']= df_ctp.Control+1.96*df_ctp.SE_pooled
df_ctp['Obs_val']=df_ctp.Experiment
df_ctp["Pass_Sanity"] = df_ctp.apply(lambda x: (x.Obs_val > x.CI_lower) and (x.Obs_val < x.CI_upper),axis=1)

In [329]:
df_ctp

|   | Control | Experiment | SE_pooled | CI_lower | CI_upper | Obs_val | Pass_Sanity |
|:-|-:|-:|-:|-:|-:|-:|:-:|
| CTP | 0.081575 | 0.081705 | 0.00084 | 0.079927 | 0.083222 | 0.081705 | True |


The experiment CTP falls inside the 95% confidence interval around the control CTP, so there is no significant difference in CTP between the two groups, as expected.
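The CTP check above builds its interval from the control group alone. A common alternative, sketched here as an assumption rather than the author's method, is a confidence interval for the CTP difference using the pooled standard error, the same construction the Effect Size Analysis applies to the evaluation metrics. `diff_ci` is a hypothetical helper.

```python
import math

def diff_ci(x_c, n_c, x_e, n_e, z=1.96):
    """Difference p_exp - p_cont and its 95% CI, using the pooled standard error."""
    p_c, p_e = x_c / n_c, x_e / n_e
    p_pool = (x_c + x_e) / (n_c + n_e)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_e))
    d = p_e - p_c
    return d, d - z * se, d + z * se

# CTP = Clicks / Pageviews, using the totals from the sanity-check table
d, lo, hi = diff_ci(8654, 106087, 8635, 105685)
print(f"d = {d:.6f}, 95% CI = ({lo:.6f}, {hi:.6f})")
```

If 0 lies inside the interval, the CTP difference is not statistically significant.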

#### Effect Size Analysis

In [341]:
esa ={'Control':pd.Series([df_cntrl['Sign-ups'].sum()/df_cntrl.Clicks.sum(), df_cntrl.Payments.sum()/df_cntrl.Clicks.sum()], index=["Gross_Conversion","Net_Conversion"]),
          'Experiment':pd.Series([df_exprmnt['Sign-ups'].sum()/df_exprmnt.Clicks.sum(),df_exprmnt.Payments.sum()/df_exprmnt.Clicks.sum()], index=["Gross_Conversion","Net_Conversion"])}

In [342]:
df_esa = pd.DataFrame(esa)
df_esa['D']=df_esa['Experiment']-df_esa['Control']
df_esa['Dmin']=0.01,0.0075
df_esa['P_pooled']=(df_cntrl['Sign-ups'].sum()+df_exprmnt['Sign-ups'].sum())/(df_cntrl.Clicks.sum()+df_exprmnt.Clicks.sum()),(df_cntrl.Payments.sum()+df_exprmnt.Payments.sum())/(df_cntrl.Clicks.sum()+df_exprmnt.Clicks.sum())
df_esa['SE_pooled']=np.sqrt(df_esa.P_pooled* (1- df_esa.P_pooled)*((1/df_cntrl.Clicks.sum())+(1/df_exprmnt.Clicks.sum())))
df_esa['CI_lower']= df_esa.D-1.96*df_esa.SE_pooled
df_esa['CI_upper']= df_esa.D+1.96*df_esa.SE_pooled
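df_esa['Practically Significant (|D|>dmin)']=np.where(df_esa.D.abs() > df_esa.Dmin, "Yes", "No")
# If 0 is inside the confidence interval, the difference is not statistically significant.
df_esa['Statistically Significant']=np.where((df_esa.CI_lower > 0) | (df_esa.CI_upper < 0), "Yes", "No")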
df_esa

|   | Control | Experiment | D | Dmin | P_pooled | SE_pooled | CI_lower | CI_upper | Practically Significant (\|D\|>dmin) | Statistically Significant |
|:-|-:|-:|-:|-:|-:|-:|-:|-:|:-:|:-:|
| Gross_Conversion | 0.219205 | 0.198842 | -0.020363 | 0.01 | 0.209035 | 0.006185 | -0.032485 | -0.008241 | Yes | Yes |
| Net_Conversion | 0.1107 | 0.11326 | 0.00256 | 0.0075 | 0.111979 | 0.004797 | -0.006841 | 0.011961 | No | No |


These results show a statistically and practically significant decrease in Gross Conversion: fewer visitors signed up after clicking "Buy a product". Net Conversion, the share of clicks that led to a payment, did not change significantly in either direction. In other words, the change deterred some visitors from signing up but did not affect how many ultimately paid for products.
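A complementary check, not part of the original analysis, is a day-by-day sign test: count the days on which the experiment rate exceeds the control rate and test that count against a fair coin with `scipy.stats.binomtest` (SciPy 1.7 and later). The sketch is illustrative only: it runs on the five days printed by `head()` in Step 5, so its p-value says nothing about the full 23-day experiment. `daily_sign_test` is a hypothetical helper.

```python
from scipy.stats import binomtest

def daily_sign_test(control_rates, experiment_rates):
    """Two-sided sign test on paired daily rates; tied days are dropped."""
    pairs = [(c, e) for c, e in zip(control_rates, experiment_rates) if c != e]
    wins = sum(e > c for c, e in pairs)  # days where the experiment rate is higher
    return wins, len(pairs), binomtest(wins, n=len(pairs), p=0.5).pvalue

# Illustration only: daily gross conversion (Sign-ups / Clicks) for the first five days
ctrl = [67/344, 74/390, 84/455, 78/418, 110/419]
expt = [53/343, 58/393, 73/442, 69/414, 70/416]
wins, days, p_value = daily_sign_test(ctrl, expt)
print(wins, days, round(p_value, 4))
```

For the actual test, pass the full daily series, e.g. `df_cntrl['Sign-ups'] / df_cntrl['Clicks']` and the same ratio for `df_exprmnt`.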

### Recommendation

I recommend not launching this change: the recommendation engine reduces the number of people who sign up to the website, and it has no significant positive effect on the number of payments.