## Background
This case study follows a fictional productivity software company that is looking for ways to increase the number of people who pay for its software. As the software is currently set up, **users can download and use it free of charge for a 7-day trial. After the trial ends, users must pay for a license to continue using the software.**
<br>
<br>One idea that the company wants to try is to **change the layout of the homepage** to emphasize more prominently and higher up on the page that there is a 7-day trial available for the company's software. The current fear is that some potential users are missing out on using the software because of a lack of awareness of the trial period. If more people download the software and use it in the trial period, the hope is that this entices more people to make a purchase after seeing what the software can do.

## 1. Metrics we want to increase in this A/B test:
* Download rate (the proportion of visitors who download the software for the 7-day trial)
* Purchase rate (the proportion of visitors who purchase a license)

## 2. Desired result:
* result1: Download rate increases from 0.16 to 0.175
* result2: Purchase rate increases from 0.02 to 0.023

## 3. Experiment Sizing

**We want to detect the 2 results at an overall 5% Type I error rate and at 80% power**
<br>**Assumption: On average there are 3250 unique visitors every day**

Because we are tracking 2 metrics, each ideally tested at a 5% significance level, measuring both at the same time increases the probability of making at least one Type I error. Assuming the two tests are independent:

In [1]:
print("Actual type 1 error: {:.4}".format(1-(1-0.05)**2))

Actual type 1 error: 0.0975


To resolve that, we apply a Bonferroni correction to the significance levels: the overall level is divided by the number of metrics, so the new per-metric significance level is 0.05 / 2 = 0.025.

In [2]:
print("New actual type 1 error: {:.4}".format(1-(1-0.025)**2))

New actual type 1 error: 0.04938
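
The same arithmetic extends to any number of metrics. The following is a minimal sketch, assuming the tests are independent; the helper names `familywise_error` and `bonferroni_alpha` are illustrative and not part of the original notebook.

```python
def familywise_error(alpha_per_test, n_tests):
    """Probability of at least one Type I error across independent tests."""
    return 1 - (1 - alpha_per_test) ** n_tests


def bonferroni_alpha(alpha_overall, n_tests):
    """Per-test significance level under a Bonferroni correction."""
    return alpha_overall / n_tests


# Two metrics (download rate and purchase rate) at an overall 5% level
alpha_each = bonferroni_alpha(0.05, 2)
print("Per-metric alpha: {}".format(alpha_each))
print("Family-wise error: {:.4}".format(familywise_error(alpha_each, 2)))
```

Bonferroni is slightly conservative here: the resulting family-wise error rate lands just under the 5% target rather than exactly on it.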


In [3]:
import numpy as np
import scipy.stats as stats

def experiment_size(p_null, p_alt, alpha = .05, beta = .20):
    """
    Compute the minimum number of samples needed to achieve a desired power
    level for a given effect size.
    
    Input parameters:
        p_null: base success rate under null hypothesis
        p_alt : desired success rate to be detected
        alpha : Type-I error rate
        beta  : Type-II error rate
    
    Output value:
        n : Number of samples required for each group to obtain desired power
    """
    
    # Get necessary z-scores and standard deviations (@ 1 obs per group)
    z_null = stats.norm.ppf(1 - alpha)
    z_alt  = stats.norm.ppf(beta)
    sd_null = np.sqrt(p_null * (1-p_null) + p_null * (1-p_null))
    sd_alt  = np.sqrt(p_null * (1-p_null) + p_alt  * (1-p_alt) )
    
    # Compute and return minimum sample size
    p_diff = p_alt - p_null
    n = ((z_null*sd_null - z_alt*sd_alt) / p_diff) ** 2
    return np.ceil(n)

In [6]:
size1 = experiment_size(0.16, 0.175, alpha = 0.025)

print("Experimental size to detect result 1 : {}".format(size1))
print("Days needed for result 1: {}".format(round(size1 * 2 / 3250)))

size2 = experiment_size(0.02, 0.023, alpha = 0.025)

print("Experimental size to detect result 2 : {}".format(size2))
print("Days needed for result 2: {}".format(round(size2 * 2 / 3250)))

Experimental size to detect result 1 : 9481.0
Days needed for result 1: 6.0
Experimental size to detect result 2 : 34930.0
Days needed for result 2: 21.0
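
As a sanity check on the sizing, the calculation can be run in reverse: given a per-group sample size, compute the power it achieves. This is a sketch under the same assumptions as `experiment_size` (one-sided z-test, normal approximation); the function name `achieved_power` is an addition for illustration.

```python
import numpy as np
import scipy.stats as stats


def achieved_power(p_null, p_alt, n, alpha=.05):
    """Power of a one-sided z-test for p_alt > p_null with n samples per group."""
    # Standard errors of the difference in proportions under each hypothesis
    se_null = np.sqrt((p_null * (1 - p_null) + p_null * (1 - p_null)) / n)
    se_alt = np.sqrt((p_null * (1 - p_null) + p_alt * (1 - p_alt)) / n)
    # Smallest observed difference that would be declared significant
    diff_crit = stats.norm.ppf(1 - alpha) * se_null
    # Probability of exceeding that threshold when the true difference is p_alt - p_null
    return 1 - stats.norm.cdf(diff_crit, loc=p_alt - p_null, scale=se_alt)


print("Power for result 1: {:.3f}".format(achieved_power(0.16, 0.175, 9481, alpha=0.025)))
print("Power for result 2: {:.3f}".format(achieved_power(0.02, 0.023, 34930, alpha=0.025)))
```

Both values should come out at roughly 0.80, since the sample sizes were chosen as the smallest that reach the 80% power target.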


We also need to keep in mind that about seven days can pass before a user associated with a cookie comes back to make a purchase, because the trial lasts 7 days. To compensate for that, we will run the experiment for one more week.

We will run our experiment for **28** days: the 21 days needed to detect result 2, plus 7 days for trial users to make their purchase.

## 4. Analyze the result

In [7]:
import pandas as pd
data = pd.read_csv("homepage-experiment-data.csv")

In [8]:
data.head()

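|   | Day | Control Cookies | Control Downloads | Control Licenses | Experiment Cookies | Experiment Downloads | Experiment Licenses |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1764 | 246 | 1 | 1850 | 339 | 3 |
| 1 | 2 | 1541 | 234 | 2 | 1590 | 281 | 2 |
| 2 | 3 | 1457 | 240 | 1 | 1515 | 274 | 1 |
| 3 | 4 | 1587 | 224 | 1 | 1541 | 284 | 2 |
| 4 | 5 | 1606 | 253 | 2 | 1643 | 292 | 3 |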
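
The rates used later are always "total events divided by total cookies" for one group. A small helper makes that explicit; this is a sketch, with the hypothetical name `group_rate` and the five rows above re-entered by hand so the example runs without the CSV file.

```python
import pandas as pd

# The five rows shown above, re-entered by hand for a self-contained example
sample = pd.DataFrame({
    "Day": [1, 2, 3, 4, 5],
    "Control Cookies": [1764, 1541, 1457, 1587, 1606],
    "Control Downloads": [246, 234, 240, 224, 253],
    "Experiment Cookies": [1850, 1590, 1515, 1541, 1643],
    "Experiment Downloads": [339, 281, 274, 284, 292],
})


def group_rate(df, group, event):
    """Total events divided by total cookies for one group ('Control' or 'Experiment')."""
    events = df["{} {}".format(group, event)].sum()
    cookies = df["{} Cookies".format(group)].sum()
    return events / cookies


for group in ("Control", "Experiment"):
    print("{} download rate (first 5 days): {:.4f}".format(
        group, group_rate(sample, group, "Downloads")))
```

Five days is far too little data to draw conclusions from; the snippet only illustrates how the column names map to rates.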


### 4.1 Check the Invariant Metric

Check that the number of visitors assigned to each group is similar.

In [11]:
def p_invariant_metric(n_control, n_obs):
    """
    Compute the two-sided p value for observing the actual control group count
    under a binomial distribution B(n_obs, 0.5), using a normal approximation.
    
    Null hypothesis: each visitor is assigned to the control group with probability 0.5
    
    Input:
        n_control: size of control group
        n_obs: total number of visitors from both groups
    """
    p = 0.5 # if the assignment is random, we expect half of the visitors in the control group
    sd = np.sqrt(p * (1-p) * n_obs)
    diff = n_control - p * n_obs
    # continuity correction: shrink the deviation toward the mean by 0.5, whichever side it falls on
    z = np.sign(diff) * max(abs(diff) - 0.5, 0) / sd
    p_result = round(2 * stats.norm.cdf(-abs(z)), 4)
    print("p value for observing the actual control group proportion and more extreme proportions in a binomial distribution B(n, 0.5): {}".format(p_result))

In [12]:
p_invariant_metric(data['Control Cookies'].sum(), data['Control Cookies'].sum()+data['Experiment Cookies'].sum())

p value for observing the actual control group proportion and more extreme proportions in a binomial distribution B(n, 0.5): 0.1075


The p value is not small enough to reject the null hypothesis. We will treat the number of visitors assigned to each group as similar.
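
The normal approximation can be cross-checked against the exact binomial distribution. The following is a minimal sketch using `scipy.stats.binom`; the function name `exact_invariant_p` and the counts in the usage line are illustrative and not taken from the experiment data.

```python
import scipy.stats as stats


def exact_invariant_p(n_control, n_obs):
    """Two-sided exact binomial p value for the control count under B(n_obs, 0.5)."""
    lower_tail = stats.binom.cdf(n_control, n_obs, 0.5)      # P(X <= n_control)
    upper_tail = stats.binom.sf(n_control - 1, n_obs, 0.5)   # P(X >= n_control)
    # B(n, 0.5) is symmetric, so doubling the smaller tail gives the two-sided p value
    return min(1.0, 2 * min(lower_tail, upper_tail))


# Hypothetical counts, for illustration only
print("Exact p value: {:.4f}".format(exact_invariant_p(480, 1000)))
```

With sample sizes in the tens of thousands, the exact and approximate p values should agree closely, so this is mainly useful for small experiments or as a check on the approximation.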

### 4.2 Check the Evaluation Metric

In [13]:
def p_evaluation_metrics(n_control,
                         n_exper,
                         p_null,
                         p_click_control,
                         p_click_exper,
                         alt="larger"):
    """
    Statistical test to determine if we reached our goal (z-test on two proportions).
    
    Input:
        n_control: size of control group
        n_exper: size of experiment group
        p_null: pooled success rate across both groups, used under the null hypothesis
        p_click_control: observed success rate for control group
        p_click_exper: observed success rate for experiment group
        alt: the relationship between p_click_control and p_click_exper in the alternative hypothesis;
             larger: p_click_exper > p_click_control
             smaller: p_click_exper < p_click_control
             different: p_click_exper is not equal to p_click_control
    """
    
    # compute standard error, z-score, and p-value
    se_p = np.sqrt(p_null * (1-p_null) * (1/n_control + 1/n_exper))
    z = (p_click_exper - p_click_control) / se_p
    if alt == "larger":
        p_result = 1-stats.norm.cdf(z)
    elif alt == "smaller":
        p_result = stats.norm.cdf(z)
    elif alt == "different":
        p_result = 2 * stats.norm.cdf(-abs(z))
    else:
        # fail loudly instead of hitting an undefined p_result below
        raise ValueError("alt must be 'larger', 'smaller' or 'different'")
    print("Success rate for control group: {}; Success rate for experiment group: {}".format(round(p_click_control, 4), round(p_click_exper, 4)))
    print("p value: {}".format(round(p_result, 4)))

In [14]:
#Download rate
p_evaluation_metrics(data['Control Cookies'].sum(),
                              data['Experiment Cookies'].sum(),
                              (data['Control Downloads'].sum() + data['Experiment Downloads'].sum()) / (data['Control Cookies'].sum() + data['Experiment Cookies'].sum()),
                              data['Control Downloads'].sum()  / data['Control Cookies'].sum(),
                              data['Experiment Downloads'].sum() / data['Experiment Cookies'].sum())

Success rate for control group: 0.1612; Success rate for experiment group: 0.1805
p value: 0.0


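The download rate increases from 0.1612 in the control group to 0.1805 in the experiment group, and the p value rounds to 0.0 at four decimal places, well below the corrected significance level of 0.025. This is convincing evidence that the new homepage layout increased the download rate.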

In [15]:
#Purchase rate
p_evaluation_metrics(data['Control Cookies'][:-8].sum(),
                              data['Experiment Cookies'][:-8].sum(),
                              (data['Control Licenses'].sum() + data['Experiment Licenses'].sum()) / (data['Control Cookies'][:-8].sum() + data['Experiment Cookies'][:-8].sum()),
                              data['Control Licenses'].sum()  / data['Control Cookies'][:-8].sum(),
                              data['Experiment Licenses'].sum() / data['Experiment Cookies'][:-8].sum())

Success rate for control group: 0.021; Success rate for experiment group: 0.0213
p value: 0.3979


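Only the first 21 days of cookies are used as the denominator (the `[:-8]` slice drops the last 8 rows of cookie counts), because visitors who arrive later may not yet have had time to finish the trial and purchase; all recorded purchases are attributed to those cookies. The license purchase rate shows only a small increase, from 0.0210 to 0.0213, and with a p value of 0.3979 it is not significant at the 0.025 level, so we cannot say that the purchase rate increased.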
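
A confidence interval for the difference in rates can make a non-significant result easier to interpret, since it shows the range of effects that remain plausible. The following is a sketch of a Wald interval with an unpooled standard error; the function name `diff_confidence_interval` and the counts in the usage line are hypothetical, not the experiment's totals.

```python
import numpy as np
import scipy.stats as stats


def diff_confidence_interval(x_control, n_control, x_exper, n_exper, level=0.95):
    """Wald confidence interval for p_exper - p_control (unpooled standard error)."""
    p_c = x_control / n_control
    p_e = x_exper / n_exper
    se = np.sqrt(p_c * (1 - p_c) / n_control + p_e * (1 - p_e) / n_exper)
    # Two-sided critical value for the requested confidence level
    z = stats.norm.ppf(1 - (1 - level) / 2)
    diff = p_e - p_c
    return diff - z * se, diff + z * se


# Hypothetical counts, for illustration only
low, high = diff_confidence_interval(210, 10000, 213, 10000)
print("95% CI for the difference: [{:.4f}, {:.4f}]".format(low, high))
```

An interval that contains 0 is consistent with no effect. Note that this interval is two-sided, while the tests above are one-sided, so the two are complementary views rather than exact equivalents.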

## 5. Conclusion

Despite the fact that statistical significance wasn't obtained for the number of licenses purchased, the new homepage appeared to have a strong effect on the number of downloads made. Based on our goals, this seems enough to suggest replacing the old homepage with the new homepage. Establishing whether there was a significant increase in the number of license purchases, either through the rate or the increase in the number of homepage visits, will need to wait for further experiments or data collection.

One inference we might like to make is that the new homepage attracted new users who would not normally try out the program, but that these new users didn't convert to purchases at the same rate as the existing user base. This is a nice story to tell, but we can't actually say that with the data as given. In order to make this inference, we would need more detailed information about individual visitors that isn't available. However, if the software did have the capability of reporting usage statistics, that might be a way of seeing if certain profiles are more likely to purchase a license. This might then open additional ideas for improving revenue.