Data Set Story

This data set contains a company's website data: the number of ads users saw and clicked, and the earnings those ads generated. It consists of two separate data sets, one for the control group and one for the test group, stored on separate sheets of the Excel file ab_testing.xlsx. Maximum Bidding was applied to the control group, and Average Bidding was applied to the test group.

Impression: Number of ad views
Click: Number of clicks on the viewed ads
Purchase: Number of products purchased after clicking an ad
Earning: Earnings from the purchased products

In [1]:
import pandas as pd
from scipy.stats import shapiro, levene, ttest_ind
import warnings
warnings.filterwarnings("ignore")

In [2]:
pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', False)
pd.set_option('display.float_format', lambda x: '%.5f' % x)

In [3]:
dataframe_control = pd.read_excel("ab_testing.xlsx", sheet_name="Control Group")
dataframe_test = pd.read_excel("ab_testing.xlsx", sheet_name="Test Group")

In [4]:
df_control = dataframe_control.copy()
df_test = dataframe_test.copy()

In [6]:
def check_df(dataframe, head=5):
    """Print a quick overview: shape, dtypes, first/last rows, NA counts, quantiles."""
    print("##################### Shape #####################")
    print(dataframe.shape)
    print("##################### Types #####################")
    print(dataframe.dtypes)
    print("##################### Head #####################")
    print(dataframe.head(head))  # use the `head` argument instead of the default
    print("##################### Tail #####################")
    print(dataframe.tail(head))
    print("##################### NA #####################")
    print(dataframe.isnull().sum())
    print("##################### Describe #####################")
    print(dataframe.describe([0, 0.25, 0.50, 0.75, 0.95, 0.99, 1]).T)


In [7]:
check_df(df_control)

##################### Shape #####################
(40, 4)
##################### Types #####################
Impression    float64
Click         float64
Purchase      float64
Earning       float64
dtype: object
##################### Head #####################
    Impression      Click  Purchase    Earning
0  82529.45927 6090.07732 665.21125 2311.27714
1  98050.45193 3382.86179 315.08489 1742.80686
2  82696.02355 4167.96575 458.08374 1797.82745
3 109914.40040 4910.88224 487.09077 1696.22918
4 108457.76263 5987.65581 441.03405 1543.72018
##################### Tail #####################
     Impression      Click  Purchase    Earning
35 132064.21900 3747.15754 551.07241 2256.97559
36  86409.94180 4608.25621 345.04603 1781.35769
37 123678.93423 3649.07379 476.16813 2187.72122
38 101997.49410 4736.35337 474.61354 2254.56383
39 121085.88122 4285.17861 590.40602 1289.30895
##################### NA #####################
Impression    0
Click         0
Purchase      0
Earning       0
dtype: int64

In [8]:
check_df(df_test)

##################### Shape #####################
(40, 4)
##################### Types #####################
Impression    float64
Click         float64
Purchase      float64
Earning       float64
dtype: object
##################### Head #####################
    Impression      Click  Purchase    Earning
0 120103.50380 3216.54796 702.16035 1939.61124
1 134775.94336 3635.08242 834.05429 2929.40582
2 107806.62079 3057.14356 422.93426 2526.24488
3 116445.27553 4650.47391 429.03353 2281.42857
4 145082.51684 5201.38772 749.86044 2781.69752
##################### Tail #####################
     Impression      Click  Purchase    Earning
35  79234.91193 6002.21358 382.04712 2277.86398
36 130702.23941 3626.32007 449.82459 2530.84133
37 116481.87337 4702.78247 472.45373 2597.91763
38  79033.83492 4495.42818 425.35910 2595.85788
39 102257.45409 4800.06832 521.31073 2967.51839
##################### NA #####################
Impression    0
Click         0
Purchase      0
Earning       0
dtype: int64

In [9]:
df_control["group"] = "control"
df_test["group"] = "test"

In [10]:
df = pd.concat([df_control, df_test], axis=0, ignore_index=False)

In [11]:
df.head()

    Impression      Click  Purchase    Earning    group
0  82529.45927 6090.07732 665.21125 2311.27714  control
1  98050.45193 3382.86179 315.08489 1742.80686  control
2  82696.02355 4167.96575 458.08374 1797.82745  control
3 109914.40040 4910.88224 487.09077 1696.22918  control
4 108457.76263 5987.65581 441.03405 1543.72018  control


In [12]:
df.tail()

     Impression      Click  Purchase    Earning group
35  79234.91193 6002.21358 382.04712 2277.86398  test
36 130702.23941 3626.32007 449.82459 2530.84133  test
37 116481.87337 4702.78247 472.45373 2597.91763  test
38  79033.83492 4495.42818 425.35910 2595.85788  test
39 102257.45409 4800.06832 521.31073 2967.51839  test


In [13]:
df.shape

(80, 5)
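Because `ignore_index=False` keeps each group's original row labels, the combined index runs from 0 to 39 twice, which is why `df.tail()` shows labels 35 to 39 rather than 75 to 79. The sketch below illustrates the difference on two toy frames; the names `toy_control` and `toy_test` and their values are made up for illustration and are not taken from ab_testing.xlsx.

```python
import pandas as pd

# Toy stand-ins for the two sheets (hypothetical values, for illustration only).
toy_control = pd.DataFrame({"Purchase": [10.0, 20.0, 30.0]})
toy_test = pd.DataFrame({"Purchase": [15.0, 25.0, 35.0]})
toy_control["group"] = "control"
toy_test["group"] = "test"

# ignore_index=False keeps the original labels, so they repeat.
kept = pd.concat([toy_control, toy_test], axis=0, ignore_index=False)
# ignore_index=True builds a fresh 0..n-1 index.
reset = pd.concat([toy_control, toy_test], axis=0, ignore_index=True)

print(kept.index.tolist())   # [0, 1, 2, 0, 1, 2]
print(reset.index.tolist())  # [0, 1, 2, 3, 4, 5]
print(kept.index.is_unique)  # False
```

Repeated labels do not affect the analysis here, since every later step selects rows by the `group` column rather than by index label.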

Defining the Hypothesis of A/B Testing

Define the hypotheses.

H0: M1 = M2 (There is no difference between the purchase means of the control group and the test group.)

H1: M1 != M2 (There is a difference between the purchase means of the control group and the test group.)

Analyze the purchase means for the control and test groups.

In [14]:
df.groupby("group").agg({"Purchase": "mean"})

         Purchase
group
control 550.89406
test    582.10610


Performing Hypothesis Testing

Before performing the hypothesis test, check its assumptions: normality and homogeneity of variance.

Test separately whether the control and test groups satisfy the normality assumption, using the Purchase variable.
Normality Assumption:

H0: The data are normally distributed.
H1: The data are not normally distributed.

p < 0.05 H0 REJECTED
p > 0.05 H0 CANNOT BE REJECTED

According to the test results, is the normality assumption satisfied for the control and test groups?

Interpret the p-values obtained.
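The decision rule above can be wrapped in a small helper that runs the Shapiro-Wilk test once per group. This is only a sketch: the function name `check_normality`, the `alpha` parameter, and the synthetic `demo` frame are assumptions introduced for illustration, and the data are random draws, not the ab_testing.xlsx values.

```python
import numpy as np
import pandas as pd
from scipy.stats import shapiro


def check_normality(dataframe, group_col, value_col, alpha=0.05):
    """Run Shapiro-Wilk per group; H0 (normality) is rejected when p < alpha."""
    results = {}
    for name, values in dataframe.groupby(group_col)[value_col]:
        stat, pvalue = shapiro(values)
        results[name] = {
            "stat": float(stat),
            "pvalue": float(pvalue),
            "reject_h0": bool(pvalue < alpha),
        }
    return results


# Synthetic stand-in data (hypothetical), one bell-shaped and one skewed group.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "group": ["bell"] * 200 + ["skewed"] * 200,
    "Purchase": np.concatenate([
        rng.normal(550, 100, 200),   # drawn from a normal distribution
        rng.exponential(550, 200),   # strongly right-skewed
    ]),
})

normality = check_normality(demo, "group", "Purchase")
for name, res in normality.items():
    print(name, "p-value = %.4f" % res["pvalue"], "reject H0:", res["reject_h0"])
```

With the notebook's data, the same helper could be called as `check_normality(df, "group", "Purchase")`.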

In [15]:
test_stat, pvalue = shapiro(df.loc[df["group"] == "control", "Purchase"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 0.9773, p-value = 0.5891


p-value = 0.5891 > 0.05
H0 cannot be rejected. The Purchase values of the control group are consistent with the normality assumption.

In [16]:
test_stat, pvalue = shapiro(df.loc[df["group"] == "test", "Purchase"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 0.9589, p-value = 0.1541


p-value = 0.1541 > 0.05

H0 cannot be rejected. The Purchase values of the test group are consistent with the normality assumption.

Variance Homogeneity:
H0: Variances are homogeneous.
H1: Variances are not homogeneous.

p < 0.05 H0 REJECTED
p > 0.05 H0 CANNOT BE REJECTED

Test whether the variance homogeneity assumption is satisfied for the control and test groups, using the Purchase variable.

According to the test result, is the variance homogeneity assumption satisfied?

Interpret the p-value obtained.
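As a rough illustration of how the Levene test reacts when variances clearly differ, the sketch below compares two synthetic samples with the same mean but very different spread. The samples `narrow` and `wide` are made up for illustration. Note that `scipy.stats.levene` centers on the median by default (`center="median"`); `center="mean"` is shown alongside for comparison.

```python
import numpy as np
from scipy.stats import levene

# Synthetic samples (hypothetical): same mean, very different spread.
rng = np.random.default_rng(1)
narrow = rng.normal(loc=550, scale=10, size=200)
wide = rng.normal(loc=550, scale=100, size=200)

# Default: deviations are measured from each group's median.
stat_median, p_median = levene(narrow, wide)
# Alternative: deviations are measured from each group's mean.
stat_mean, p_mean = levene(narrow, wide, center="mean")

for label, pvalue in [("median", p_median), ("mean", p_mean)]:
    decision = "H0 REJECTED" if pvalue < 0.05 else "H0 CANNOT BE REJECTED"
    print("center=%s: p-value = %.4f -> %s" % (label, pvalue, decision))
```

The median-centered default is generally the more robust choice when the data may be skewed.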

In [17]:
test_stat, pvalue = levene(df.loc[df["group"] == "control", "Purchase"],
                           df.loc[df["group"] == "test", "Purchase"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 2.6393, p-value = 0.1083


p-value = 0.1083 > 0.05
H0 cannot be rejected. The Purchase values of the control and test groups satisfy the variance homogeneity assumption.
Variances are homogeneous.

Step 2: Select the appropriate test according to the results of the Normality Assumption and Variance Homogeneity

Since both assumptions are met, an independent two-sample t-test (a parametric test) is performed.
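The selection logic of this step can be sketched as a single function. The function name `compare_groups` is hypothetical, and the fallback choices are assumptions that go beyond what this notebook needed: Welch's t-test (`equal_var=False`) when only variance homogeneity fails, and the Mann-Whitney U test when normality fails. The sample data are synthetic.

```python
import numpy as np
from scipy.stats import shapiro, levene, ttest_ind, mannwhitneyu


def compare_groups(a, b, alpha=0.05):
    """Pick a two-sample test from the assumption checks and run it."""
    _, p_norm_a = shapiro(a)
    _, p_norm_b = shapiro(b)
    if p_norm_a < alpha or p_norm_b < alpha:
        # Normality rejected for at least one group: non-parametric test.
        stat, pvalue = mannwhitneyu(a, b, alternative="two-sided")
        return {"test": "mannwhitneyu", "stat": float(stat), "pvalue": float(pvalue)}
    _, p_var = levene(a, b)
    equal_var = bool(p_var >= alpha)
    # equal_var=True is the pooled t-test; equal_var=False is Welch's t-test.
    stat, pvalue = ttest_ind(a, b, equal_var=equal_var)
    name = "ttest_ind (pooled)" if equal_var else "ttest_ind (Welch)"
    return {"test": name, "stat": float(stat), "pvalue": float(pvalue)}


# Synthetic, strongly skewed samples (hypothetical) to exercise the fallback.
rng = np.random.default_rng(2)
skewed_a = rng.exponential(550, 200)
skewed_b = rng.exponential(580, 200)

outcome = compare_groups(skewed_a, skewed_b)
print(outcome["test"], "p-value = %.4f" % outcome["pvalue"])
```

For the Purchase data in this notebook, all three assumption checks fail to reject H0, so this logic would take the pooled t-test branch, matching the cell that follows.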

Initial hypotheses
H0: M1 = M2 (There is no statistically significant difference between the control group and the test group in terms of purchase means.)
H1: M1 != M2 (There is a statistically significant difference between the control group and the test group in terms of purchase means.)
p < 0.05 H0 REJECTED, p > 0.05 H0 CANNOT BE REJECTED

In [18]:
test_stat, pvalue = ttest_ind(df.loc[df["group"] == "control", "Purchase"],
                              df.loc[df["group"] == "test", "Purchase"],
                              equal_var=True)
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = -0.9416, p-value = 0.3493


p-value = 0.3493 > 0.05
H0 CANNOT BE REJECTED

Considering the p-value obtained from the test, comment on whether there is a statistically significant difference between the control and test group purchase means.

p-value = 0.3493

H0 cannot be rejected. There is no statistically significant difference between the control and test group purchase means.
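To show where the test statistic and p-value of a pooled two-sample t-test come from, the sketch below computes them by hand and compares them with `ttest_ind(..., equal_var=True)`. The samples `a` and `b` are synthetic draws chosen only for illustration, so the numbers will not match the notebook's output.

```python
import numpy as np
from scipy.stats import ttest_ind, t as t_dist

# Synthetic samples (hypothetical), 40 observations each like the two groups.
rng = np.random.default_rng(42)
a = rng.normal(550, 130, 40)
b = rng.normal(580, 160, 40)

n1, n2 = len(a), len(b)
var1, var2 = a.var(ddof=1), b.var(ddof=1)  # unbiased sample variances

# Pooled variance: weighted average of the two sample variances.
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

# t = difference in means divided by its pooled standard error.
manual_t = (a.mean() - b.mean()) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Two-sided p-value from the t distribution with n1 + n2 - 2 degrees of freedom.
dof = n1 + n2 - 2
manual_p = 2 * t_dist.sf(abs(manual_t), dof)

scipy_t, scipy_p = ttest_ind(a, b, equal_var=True)
print("manual: t = %.4f, p = %.4f" % (manual_t, manual_p))
print("scipy:  t = %.4f, p = %.4f" % (scipy_t, scipy_p))
```

A negative statistic, as in the notebook's result, simply means the first sample's mean is lower than the second's; the two-sided p-value depends only on its magnitude.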