Taylor Manivanh
8/23/2022
AB Testing

# A/B Testing

[A/B Testing Dataset from Kaggle](https://www.kaggle.com/datasets/zhangluyuan/ab-testing?select=ab_data.csv)

Dataset Information:
- user_id: A unique user ID number.
- timestamp: The date and time at which the user interacted with the website.
- group: `control` if the user was shown the old page, `treatment` if the user was shown the new page.
- landing_page: The page the user landed on, either `new_page` or `old_page`.
- converted: Conversion status of the user (1 if the user converted, 0 otherwise).
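
Since `group` and `landing_page` are expected to agree, and `user_id` is expected to be unique, a quick consistency check can be useful before testing. The following is a minimal sketch on a small hypothetical DataFrame (`toy` and its rows are made up for illustration; only the column names come from the dataset):

```python
import pandas as pd

# Hypothetical stand-in for ab_data.csv with the same column names.
toy = pd.DataFrame({
    "user_id": [1, 2, 3, 3, 4],
    "group": ["control", "treatment", "treatment", "treatment", "control"],
    "landing_page": ["old_page", "new_page", "new_page", "new_page", "new_page"],
    "converted": [0, 1, 0, 0, 1],
})

# Page each group is expected to see, per the column descriptions.
expected_page = toy["group"].map({"control": "old_page", "treatment": "new_page"})

# Rows where the recorded landing page contradicts the assigned group.
mismatched = toy[toy["landing_page"] != expected_page]

# Rows belonging to users that appear more than once.
duplicated_users = toy[toy["user_id"].duplicated(keep=False)]

print(f"Mismatched rows: {len(mismatched)}")
print(f"Users appearing more than once: {duplicated_users['user_id'].nunique()}")
```

The same expressions could be applied to `df` once it is loaded. How to handle any rows they flag (drop, keep, or deduplicate) is a separate decision that this sketch does not make.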

Null Hypothesis: There is no significant difference in conversion between the old and new webpages.

Alternate Hypothesis: There is a significant difference in conversion between the old and new webpages.

In [1]:
# Import Statements
import pandas as pd
from scipy.stats import ttest_1samp, shapiro, levene, ttest_ind, mannwhitneyu
from statsmodels.stats.proportion import proportions_ztest

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 10)
pd.set_option('display.float_format', lambda x: '%.5f' % x)
df = pd.read_csv("ab_data.csv")

In [3]:
df.head()

| | user_id | timestamp | group | landing_page | converted |
|---|---|---|---|---|---|
| 0 | 851104 | 2017-01-21 22:11:48.556739 | control | old_page | 0 |
| 1 | 804228 | 2017-01-12 08:01:45.159739 | control | old_page | 0 |
| 2 | 661590 | 2017-01-11 16:55:06.154213 | treatment | new_page | 0 |
| 3 | 853541 | 2017-01-08 18:28:03.143765 | treatment | new_page | 0 |
| 4 | 864975 | 2017-01-21 01:52:26.210827 | control | old_page | 1 |


## 1. Assumption of Normality & Homogeneity of Variance

#### Assumption of Normality: 
Asserts that the distribution of sample means (across independent samples) is normal.

**The Shapiro-Wilk Test**: The null hypothesis of this test is that the population is normally distributed. If the p-value is less than the chosen alpha level, the null hypothesis is rejected and there is evidence that the data tested are not normally distributed. If the p-value is greater than the chosen alpha level, the null hypothesis (that the data came from a normally distributed population) cannot be rejected.
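
To illustrate how the decision rule plays out, here is a minimal sketch on synthetic data. The sample names, sizes, seed, and the 12% rate are assumptions chosen for illustration and are not taken from the dataset:

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(0)

# Hypothetical samples: one drawn from a normal distribution,
# one made of 0/1 outcomes like a conversion flag.
normal_sample = rng.normal(loc=0.0, scale=1.0, size=500)
binary_sample = rng.binomial(n=1, p=0.12, size=500)

w_normal, p_normal = shapiro(normal_sample)
w_binary, p_binary = shapiro(binary_sample)

alpha = 0.05
for name, p in [("normal_sample", p_normal), ("binary_sample", p_binary)]:
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"{name}: p-value = {p:.5f} -> {decision}")
```

A 0/1 variable can take only two values, so a rejection for `binary_sample` is expected regardless of the underlying conversion rate.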

#### Homogeneity of Variance: 
The assumption that the population variances (the spread of scores around the mean) of two or more samples are equal. All of the random variables should have the same finite variance.

**The Levene Test**: An inferential statistic used to assess the equality of variances for a variable calculated for two or more groups. The null hypothesis of this test is that the variances are homogeneous, and the alternate hypothesis is that the variances are not homogeneous.
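
As a rough illustration of what the Levene test reacts to, the sketch below compares synthetic groups with equal and unequal spread. The group names, sizes, seed, and standard deviations are assumptions made for the example:

```python
import numpy as np
from scipy.stats import levene

rng = np.random.default_rng(1)

# Hypothetical groups: two with standard deviation 1, one with standard deviation 3.
narrow = rng.normal(loc=0.0, scale=1.0, size=500)
also_narrow = rng.normal(loc=0.0, scale=1.0, size=500)
wide = rng.normal(loc=0.0, scale=3.0, size=500)

# Same spread: H0 (equal variances) is usually not rejected.
stat_same, p_same = levene(narrow, also_narrow)

# Very different spread: H0 is expected to be rejected.
stat_diff, p_diff = levene(narrow, wide)

print(f"Equal spread:   statistic = {stat_same:.5f}, p-value = {p_same:.5f}")
print(f"Unequal spread: statistic = {stat_diff:.5f}, p-value = {p_diff:.5f}")
```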

In [8]:
# Assumption of Normality (checked on the treatment group only)
# Note: for samples larger than 5000, SciPy warns that the Shapiro-Wilk p-value may not be accurate
test_stat, pvalue = shapiro(df.loc[df["group"] == "treatment", "converted"])
print(f'Test Statistic: {round(test_stat,5)}, p-value: {round(pvalue, 5)}')

# Homogeneity of Variance between the treatment and control groups
test_stat2, pvalue2 = levene(df.loc[df["group"] == "treatment", "converted"],
                             df.loc[df["group"] == "control", "converted"])
print(f'Test Statistic: {round(test_stat2,5)}, p-value: {round(pvalue2, 5)}')

Test Statistic: 0.37702, p-value: 0.0
Test Statistic: 1.52997, p-value: 0.21612


A p-value below 0.05 means the null hypothesis (H0) is rejected.

- Normality: p-value is below 0.05, so the null hypothesis is rejected. The data is not normally distributed.
- Homogeneity: p-value (0.21612) is above 0.05, so the null hypothesis is not rejected. The variances are homogeneous.


## 1a. Mann-Whitney U Test
When the assumption of normality is rejected, we use a non-parametric method such as the Mann-Whitney U Test.

**Mann-Whitney U Test**: A nonparametric test of the null hypothesis that, for randomly selected values X and Y from two populations, the probability of X being greater than Y is equal to the probability of Y being greater than X. 
- Null Hypothesis: There is not a statistically significant difference between the two groups.
- Alternate Hypothesis: There is a statistically significant difference between the two groups.
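
The hypotheses above describe a two-sided test. The following minimal sketch, using synthetic 0/1 outcomes, shows how the two-sided alternative can be requested explicitly through the `alternative` parameter of `mannwhitneyu`. The group sizes, seed, and conversion rates are assumptions chosen so that the groups clearly differ:

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(2)

# Hypothetical conversion flags with clearly different conversion rates.
control = rng.binomial(n=1, p=0.10, size=2000)
treatment = rng.binomial(n=1, p=0.30, size=2000)

# Stating the alternative explicitly avoids relying on the library default,
# which has differed between SciPy versions.
u_stat, p_two_sided = mannwhitneyu(treatment, control, alternative="two-sided")

alpha = 0.05
decision = "reject H0" if p_two_sided < alpha else "fail to reject H0"
print(f"U statistic: {u_stat}, p-value: {p_two_sided:.5f} -> {decision}")
```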


In [9]:
# Normality was rejected, so the non-parametric mannwhitneyu test is used
test_stat3, pvalue3 = mannwhitneyu(df.loc[df["group"] == "treatment", "converted"],
                                    df.loc[df["group"] == "control", "converted"])
print(f'Test Statistic: {round(test_stat3,5)}, p-value: {round(pvalue3, 5)}')

Test Statistic: 10823622516.0, p-value: 0.10806


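The p-value is greater than 0.05, so the null hypothesis cannot be rejected.

Thus there is not a statistically significant difference in conversion between the old and new web pages.

Note that `mannwhitneyu` was called without an `alternative` argument. In older SciPy releases that default returned half of the two-sided p-value, which would explain why 0.10806 is exactly half of the 0.21612 reported by the two-proportion z-test in section 2. A two-sided p-value of about 0.216 leads to the same conclusion.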

## 2. Another Way to Test - Two Proportions Z-Test
**Two Proportion Z-Test**: A statistical hypothesis test used to determine whether two proportions are different from each other. A Z-statistic is computed from two independent samples, and the null hypothesis is that the two proportions are equal.

- Null Hypothesis: There is not a statistically significant difference between the two proportions.
- Alternate Hypothesis: There is a statistically significant difference between the two proportions.
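
To make the Z-statistic concrete, here is a minimal hand-rolled sketch of the pooled two-proportion z-test using only the standard library. The function name `two_proportion_ztest` and the counts in the usage lines are hypothetical and serve only to show the arithmetic:

```python
import math

def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided pooled z-test for H0: p1 == p2.

    x1, x2 are the numbers of successes; n1, n2 are the sample sizes.
    """
    p1, p2 = x1 / n1, x2 / n2
    # Under H0 both groups share one proportion, estimated from the pooled data.
    pooled = (x1 + x2) / (n1 + n2)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_error
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 120 of 1000 conversions versus 100 of 1000.
z_stat, p_val = two_proportion_ztest(120, 1000, 100, 1000)
print(f"z = {z_stat:.5f}, p-value = {p_val:.5f}")
```

Swapping the two groups flips the sign of `z` but leaves the two-sided p-value unchanged, so the order of the groups does not affect the conclusion.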

In [10]:
# Number of conversions in each group
old_page_convert = df.loc[df["group"] == "control", "converted"].sum()
new_page_convert = df.loc[df["group"] == "treatment", "converted"].sum()

# Number of users (observations) in each group
old_page_n = (df["group"] == "control").sum()
new_page_n = (df["group"] == "treatment").sum()

test_stat4, pvalue4 = proportions_ztest(count=[old_page_convert, new_page_convert],
                                        nobs=[old_page_n, new_page_n])
print(f'Test Statistic: {round(test_stat4,5)}, p-value: {round(pvalue4, 5)}')

Test Statistic: 1.23692, p-value: 0.21612


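The p-value (0.21612) is greater than 0.05, so the null hypothesis cannot be rejected, which agrees with the Mann-Whitney U Test result.

The z-test applies when the two samples are independent, as they are in this scenario. Keep in mind that it also relies on a normality assumption, but that assumption concerns the sampling distribution of each proportion, which is approximately normal when the samples are large. It does not require the raw 0/1 `converted` values to be normal, so the Shapiro-Wilk rejection above does not by itself rule out the z-test.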