# A/B Testing 
*Notes summarized and condensed from https://medium.com/@RenatoFillinich/ab-testing-with-python-e5964dd66143*
   
1. Designing our experiment
2. Collecting and preparing the data
3. Visualizing the results
4. Testing the hypothesis 
5. Drawing conclusions 

**Scenario**
- online ecommerce business 
- current conversion rate of product page: 13% on average throughout the year
- team would be happy with a 2 percentage point increase (new design considered successful if it raises conversion rate from 13% to 15%) 

### 1. Designing our experiment
   
#### Formulating hypothesis   
Two-tailed test, because we don't know whether the new design will perform better, worse, or the same:   
$H_{0}: p = p_{0}$   
$H_{a}: p \neq p_{0}$   
where $p$ is the conversion rate of the new design and $p_{0}$ is the conversion rate of the old design    

$\alpha$ = 0.05    
- if the p-value (the probability of observing a result at least as extreme as ours, assuming the null is true) is lower than $\alpha$, then reject the null.    
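
The decision rule can be sketched in a few lines. This is only an illustration, assuming the test statistic is a z-score that follows a standard normal distribution under the null; the helper names `two_sided_p_value` and `reject_null` and the example z-scores are made up for this note.

```python
from statistics import NormalDist

ALPHA = 0.05  # significance level chosen above


def two_sided_p_value(z):
    """Probability of a z-score at least as extreme as `z` in either tail."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def reject_null(z, alpha=ALPHA):
    """Apply the rule: reject the null when the p-value is below alpha."""
    return two_sided_p_value(z) < alpha


# Hypothetical z-scores, not results from any experiment
for z in (0.5, 2.5, -2.5):
    print(f"z = {z:+.1f}  p-value = {two_sided_p_value(z):.4f}  reject: {reject_null(z)}")
```

Because the test is two-tailed, a large negative z-score (new design clearly worse) leads to rejection just like a large positive one.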

(our independent variable)   
Control: old design   
Test: new design    
   
(dependent variable - what we are trying to measure)   
conversion rate:   
- 0: user did not buy the product during this user session   
- 1: user bought the product during this user session    
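
With this 0/1 coding, a group's conversion rate is just the mean of its outcomes. A tiny sketch, using a made-up list of ten sessions (the name `sessions` and its values are hypothetical):

```python
# Hypothetical outcomes for ten user sessions: 1 = bought, 0 = did not buy
sessions = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]

# The mean of a 0/1 variable is the share of sessions that converted
conversion_rate = sum(sessions) / len(sessions)

print(f"conversion rate: {conversion_rate:.0%}")
```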

#### Choosing sample size   
The number of people/user sessions we include in each group affects the precision of our estimated conversion rates: the larger the sample size, the more precise our estimates (i.e. the narrower the confidence intervals) and the higher the chance of detecting a difference between the two groups, if one is present.   
But a larger sample is also more expensive to collect.   
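
To make the precision trade-off concrete, here is a rough sketch of how the confidence interval around a 13% conversion rate narrows as the group grows. It assumes the simple normal-approximation (Wald) interval, which is one of several possible choices; `ci_half_width` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(p, n, confidence=0.95):
    """Half-width of the normal-approximation (Wald) interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p * (1 - p) / n)


# Baseline rate from the scenario; the group sizes are arbitrary examples
for n in (100, 1_000, 10_000):
    print(f"n = {n:>6}: 13% +/- {ci_half_width(0.13, n):.2%}")
```

The half-width shrinks with the square root of n, so halving it takes roughly four times as many sessions.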
   
#### Power analysis    
- Power of the test (1 - $\beta$): probability of finding a statistically significant difference between the groups in our test when a difference is actually present (0.80 by convention)   
    - i.e. if the designs really differ, we have an 80% chance of detecting the difference as statistically significant with the sample size we will calculate 
- Alpha value ($\alpha$): the significance level we set earlier (0.05)   
- Effect size: how big of a difference we expect there to be between the conversion rates (here, 13% vs. 15%)  

In [1]:
import numpy as np
import pandas as pd
import scipy.stats as stats
import statsmodels.stats.api as sms
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
from math import ceil

# Render plots inline so they appear and are stored within the notebook
# (kept on its own line: text after a line magic is passed to it as an argument)
%matplotlib inline

In [4]:
effect_size = sms.proportion_effectsize(0.13, 0.15)
effect_size

-0.0576728617308947
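
The value above is Cohen's h, the difference between the arcsine-transformed proportions. A minimal stdlib sketch of that formula, assuming it matches what `proportion_effectsize` computes (`cohens_h` is a hypothetical helper, not part of statsmodels):

```python
from math import asin, sqrt


def cohens_h(prop1, prop2):
    """Cohen's h: difference of the arcsine-transformed proportions."""
    return 2 * (asin(sqrt(prop1)) - asin(sqrt(prop2)))


h = cohens_h(0.13, 0.15)
print(h)
```

The sign is negative only because the smaller proportion is passed first; the two-sided power calculation depends on the magnitude, not the sign.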

In [5]:
# Sample size for two independent samples (control vs. test) of equal size
required_n = sms.NormalIndPower().solve_power(
    effect_size, 
    power=0.8, 
    alpha=0.05, 
    ratio=1
    ) # Calculating sample size needed

required_n = ceil(required_n) # Rounding up to next whole number                          

print(required_n) # required_n is the number needed for each group 

4720
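
As a sanity check, the same number can be approximated by hand with the textbook formula for two equal groups, $n = 2\left(\frac{z_{1-\alpha/2} + z_{1-\beta}}{h}\right)^2$, where $h$ is the effect size. This is a sketch under assumptions: it ignores the tiny probability of a significant result in the wrong direction, and it is not necessarily how `solve_power` gets its answer internally. `approx_n_per_group` is a hypothetical helper name.

```python
from math import ceil
from statistics import NormalDist

EFFECT_SIZE = -0.0576728617308947  # value printed by the effect-size cell above


def approx_n_per_group(effect_size, alpha=0.05, power=0.8):
    """Closed-form normal approximation of the per-group sample size."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = std_normal.inv_cdf(power)          # quantile for the target power
    return 2 * ((z_alpha + z_power) / effect_size) ** 2


n = approx_n_per_group(EFFECT_SIZE)
print(ceil(n))  # rounds up to the same per-group size as above
```

Because the effect size is squared in the denominator, halving the difference we want to detect roughly quadruples the required sample.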
