# A/B Testing (with Python)

## By: Tahsin Jahin Khalid

### Set up Notebook

In [6]:
!pip install -q kaggle

In [3]:
from google.colab import files
files.upload()

Saving kaggle_new.json to kaggle_new.json


{'kaggle_new.json': b'{"username":"jahin1997","key":"<redacted>"}'}

In [7]:
!mkdir ~/.kaggle
!cp kaggle_new.json ~/.kaggle/kaggle.json
!chmod 600 ~/.kaggle/kaggle.json

mkdir: cannot create directory ‘/root/.kaggle’: File exists


In [8]:
!kaggle datasets list

ref                                                        title                                           size  lastUpdated          downloadCount  voteCount  usabilityRating  
---------------------------------------------------------  ---------------------------------------------  -----  -------------------  -------------  ---------  ---------------  
nelgiriyewithana/top-spotify-songs-2023                    Most Streamed Spotify Songs 2023                47KB  2023-08-26 11:04:57           5996        194  1.0              
nelgiriyewithana/global-youtube-statistics-2023            Global YouTube Statistics 2023                  60KB  2023-07-28 15:36:38          16612        549  1.0              
joebeachcapital/students-performance                       Students Performance                             2KB  2023-08-31 00:50:11           1740         40  1.0              
iamsouravbanerjee/airline-dataset                          Airline Dataset                                  4M

In [11]:
!kaggle datasets download 'zhangluyuan/ab-testing'

Downloading ab-testing.zip to /content
 74% 3.00M/4.04M [00:00<00:00, 5.57MB/s]
100% 4.04M/4.04M [00:00<00:00, 6.03MB/s]


In [12]:
!unzip /content/ab-testing.zip

Archive:  /content/ab-testing.zip
  inflating: ab_data.csv             


### Problem Statement

You work on the product team at a medium-sized e-commerce business. The UX designer worked really hard on a new version of the product page, hoping that it will lead to a higher conversion rate. The product manager (PM) told you that the current conversion rate is about 13% on average throughout the year, and that the team would be happy with an increase of 2 percentage points, meaning that the new design will be considered a success if it raises the conversion rate to 15%.
Before rolling out the change, the team would be more comfortable testing it on a small number of users to see how it performs, so you suggest running an A/B test on a subset of your user base.

### Assumptions Made
<p>H₀: p = p₀</p>
<p>Hₐ: p ≠ p₀</p>
<p>where p and p₀ stand for the conversion rate of the new and old design, respectively. Because Hₐ is two-sided, the test is two-tailed.</p>
<p>The confidence level has been set to 95%, so α = 1 − 0.95 = 0.05.</p>

### Import Python Modules

In [52]:
import numpy as np
import pandas as pd
import scipy.stats as stats
import statsmodels.stats.api as sms
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
from math import ceil
from statsmodels.stats.proportion import proportions_ztest, proportion_confint

### Setup Notebook Configurations

In [15]:
%matplotlib inline

In [16]:
# 'seaborn-whitegrid' is deprecated since Matplotlib 3.6; use the renamed style
plt.style.use('seaborn-v0_8-whitegrid')
font = {'family': 'Helvetica',
        'weight': 'bold',
        'size': 14}


In [18]:
mpl.rc('font', **font)
# calculate effect size (Cohen's h) from the baseline (13%) and target (15%) rates
effect_size = sms.proportion_effectsize(0.13, 0.15)

In [19]:
# calculate sample size needed
required_n = sms.NormalIndPower().solve_power(
    effect_size,
    power=0.8,
    alpha=0.05,
    ratio=1
)

In [20]:
required_n = ceil(required_n) # round up to the next whole number

In [21]:
print(required_n)

4720
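
As a cross-check on the figure above, the same sample size can be approximated by hand. The sketch below assumes the usual normal-approximation formula n = 2 · ((z₁₋α/₂ + z_power) / h)², with Cohen's h as the effect size. The function names are illustrative and are not part of statsmodels, and the formula ignores the negligible far-tail term of a two-sided power calculation.

```python
from math import asin, ceil, sqrt
from statistics import NormalDist


def cohens_h(p1, p2):
    """Effect size for two proportions via the arcsine transformation."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))


def sample_size_per_group(p1, p2, alpha=0.05, power=0.8):
    """Approximate per-group n for a two-sided test with equal group sizes."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    h = cohens_h(p1, p2)
    return ceil(2 * ((z_alpha + z_power) / h) ** 2)


n_by_hand = sample_size_per_group(0.13, 0.15)
print(n_by_hand)
```

With the 13% baseline and 15% target this should agree with the `required_n` printed above.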


### Read Dataset

In [62]:
df = pd.read_csv("ab_data.csv")

In [63]:
df.head()

|   | user_id | timestamp | group | landing_page | converted |
|---|---------|-----------|-------|--------------|-----------|
| 0 | 851104 | 2017-01-21 22:11:48.556739 | control | old_page | 0 |
| 1 | 804228 | 2017-01-12 08:01:45.159739 | control | old_page | 0 |
| 2 | 661590 | 2017-01-11 16:55:06.154213 | treatment | new_page | 0 |
| 3 | 853541 | 2017-01-08 18:28:03.143765 | treatment | new_page | 0 |
| 4 | 864975 | 2017-01-21 01:52:26.210827 | control | old_page | 1 |


In [64]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 294478 entries, 0 to 294477
Data columns (total 5 columns):
 #   Column        Non-Null Count   Dtype 
---  ------        --------------   ----- 
 0   user_id       294478 non-null  int64 
 1   timestamp     294478 non-null  object
 2   group         294478 non-null  object
 3   landing_page  294478 non-null  object
 4   converted     294478 non-null  int64 
dtypes: int64(2), object(3)
memory usage: 11.2+ MB


In [65]:
pd.crosstab(df['group'], df['landing_page'])

| group | landing_page = new_page | landing_page = old_page |
|-------|-------------------------|-------------------------|
| control | 1928 | 145274 |
| treatment | 145311 | 1965 |


In [66]:
# do not include users with multiple participations
session_counts = df["user_id"].value_counts(ascending=False)

In [67]:
multi_users = session_counts[session_counts > 1].count()

In [68]:
print(f"There are {multi_users} users that appear multiple times in the dataset")

There are 3894 users that appear multiple times in the dataset


In [69]:
filtered_users = session_counts[session_counts > 1].index
df = df[~df['user_id'].isin(filtered_users)]

In [70]:
print(f'The updated dataset now has {df.shape[0]} entries')

The updated dataset now has 286690 entries


### Sampling the Dataset

In [71]:
control_sample = df[df['group'] == 'control'].sample(
    n=required_n,
    random_state=3007)
treatment_sample = df[df['group'] == 'treatment'].sample(
    n=required_n,
    random_state=3007
)

In [72]:
ab_test = pd.concat(
    [control_sample, treatment_sample],
    axis=0)
ab_test.reset_index(drop=True, inplace=True)

In [73]:
ab_test

|   | user_id | timestamp | group | landing_page | converted |
|---|---------|-----------|-------|--------------|-----------|
| 0 | 931824 | 2017-01-14 01:20:45.918813 | control | old_page | 0 |
| 1 | 935065 | 2017-01-21 04:29:17.208341 | control | old_page | 0 |
| 2 | 661998 | 2017-01-04 21:57:00.451469 | control | old_page | 0 |
| 3 | 916638 | 2017-01-04 06:08:56.637967 | control | old_page | 0 |
| 4 | 739549 | 2017-01-03 09:07:59.110951 | control | old_page | 0 |
| ... | ... | ... | ... | ... | ... |
| 9435 | 942306 | 2017-01-05 08:46:26.581952 | treatment | new_page | 0 |
| 9436 | 847515 | 2017-01-02 23:55:23.738402 | treatment | new_page | 0 |
| 9437 | 716617 | 2017-01-20 21:04:33.714827 | treatment | new_page | 1 |
| 9438 | 786476 | 2017-01-08 21:08:28.088596 | treatment | new_page | 0 |


In [74]:
ab_test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9440 entries, 0 to 9439
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   user_id       9440 non-null   int64 
 1   timestamp     9440 non-null   object
 2   group         9440 non-null   object
 3   landing_page  9440 non-null   object
 4   converted     9440 non-null   int64 
dtypes: int64(2), object(3)
memory usage: 368.9+ KB


In [75]:
ab_test['group'].value_counts()

control      4720
treatment    4720
Name: group, dtype: int64

### Visualise Data

In [76]:
conversion_rates = ab_test.groupby('group')['converted']

In [77]:
# standard deviation of the proportion
std_p = lambda x: np.std(x, ddof=0)
# standard error of the proportion
# (std/sqrt(n))
se_p = lambda x: stats.sem(x, ddof=0)

In [78]:
conversion_rates = conversion_rates.agg([np.mean, std_p, se_p])
conversion_rates.columns = ['conversion_rates',
                            'standard_deviation',
                            'std_error']

In [79]:
conversion_rates.style.format('{:.3f}')

| group | conversion_rates | standard_deviation | std_error |
|-------|------------------|--------------------|-----------|
| control | 0.115 | 0.319 | 0.005 |
| treatment | 0.110 | 0.312 | 0.005 |


Judging by the stats above, our two designs performed very similarly, with the old design performing slightly better: approx. 11.5% vs. 11.0% conversion rate for the control and treatment groups, respectively.
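
The standard deviation and standard error columns follow directly from the conversion rate: for a 0/1 outcome with mean p, the population standard deviation is √(p(1 − p)) and the standard error is that value divided by √n. The sketch below recomputes both for the control group from the rounded rate shown in the table; the helper names are illustrative only.

```python
from math import sqrt


def proportion_std(p):
    """Population standard deviation of a 0/1 outcome with mean p."""
    return sqrt(p * (1 - p))


def proportion_se(p, n):
    """Standard error of a sample proportion: std / sqrt(n)."""
    return proportion_std(p) / sqrt(n)


# Control group figures as displayed above (rate rounded to 3 decimals).
p_control, n_control = 0.115, 4720
std_control = proportion_std(p_control)
se_control = proportion_se(p_control, n_control)
print(f"std = {std_control:.3f}, se = {se_control:.3f}")
```

Because the input rate is rounded, the last digit can differ slightly from what `np.std` and `stats.sem` report on the raw data.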

### Testing the Hypothesis

In [80]:
control_results = ab_test[
    ab_test['group'] == "control"]['converted']

In [81]:
treatment_results = ab_test[ab_test['group'] == 'treatment']['converted']

In [82]:
n_con = control_results.count()

In [83]:
n_treat = treatment_results.count()

In [84]:
successes = [control_results.sum(), treatment_results.sum()]

In [85]:
nobs = [n_con, n_treat]

In [86]:
z_stat, pval = proportions_ztest(successes,
                                 nobs=nobs)

In [87]:
(lower_con, lower_treat), (upper_con, upper_treat) = proportion_confint(successes,
                                                                        nobs=nobs,
                                                                        alpha=0.05)

In [89]:
print(f"z statistic: {z_stat:.2f}")
print(f"p-value: {pval:.3f}")
print(f"CI 95% for control group: [{lower_con:.3f}, {upper_con:.3f}]")
print(f"CI 95% for treatment group: [{lower_treat:.3f}, {upper_treat:.3f}]")

z statistic: 0.85
p-value: 0.397
CI 95% for control group: [0.106, 0.124]
CI 95% for treatment group: [0.101, 0.118]
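
To make the test less of a black box, here is a minimal sketch of a pooled two-proportion z-test written with the standard library only. It assumes the pooled-variance form, which is what `proportions_ztest` is understood to use by default for two samples. The function name and the counts are made up for illustration and are not taken from `ab_data.csv`.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(x1, n1, x2, n2):
    """Pooled two-sided z-test for the difference of two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common rate under H0
    # Standard error of (p1 - p2) when both groups share the pooled rate.
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed
    return z, p_value


# Made-up counts for illustration only: 50/400 vs. 60/400 conversions.
z_toy, p_toy = two_proportion_ztest(50, 400, 60, 400)
print(f"z = {z_toy:.2f}, p = {p_toy:.3f}")
```

The sign of z only reflects which group is listed first; the two-tailed p-value is unaffected by the ordering.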


### Results Interpretation

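Since the p-value of `0.397` is well above the threshold α of `0.05`, the null hypothesis cannot be rejected. This means we found no statistically significant difference between the conversion rates of the new design and the old one.

Additionally, the 95% confidence interval for the treatment group, `[0.101, 0.118]`, does not include the targeted 15% and overlaps almost entirely with the control interval, `[0.106, 0.124]`. Both intervals also sit below the 13% baseline quoted by the PM.

This suggests that the true conversion rate of the new design is close to that of the old design and falls short of the 15% the team was hoping for.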
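
The confidence intervals reported earlier can be approximated by hand. The sketch below assumes the normal-approximation (Wald) interval, p̂ ± z · √(p̂(1 − p̂)/n), which is understood to be the default method of `proportion_confint`. The function name is illustrative, and the input is the rounded control rate rather than the exact count.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(p_hat, n, alpha=0.05):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Control rate as displayed earlier (rounded), with 4720 users per group.
lower, upper = wald_interval(0.115, 4720)
print(f"[{lower:.3f}, {upper:.3f}]")
```

For the control group this reproduces the interval printed by the notebook to three decimals.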