## A/B Testing to Determine Effect of Free Trial Screener on Conversions

### Experiment Description

At the time of this experiment, Udacity courses had two options on the course overview page: "start free trial" and "access course materials".

If the student clicks "start free trial", they will be asked to enter their credit card information, and then they will be enrolled in a free trial for the paid version of the course. After 14 days, they will automatically be charged unless they cancel first.

If the student clicks "access course materials", they will be able to view the videos and take the quizzes for free, but they will not receive coaching support or a verified certificate, and they will not submit their final project for feedback.

In the experiment, Udacity tested a change where if the student clicked "start free trial", they were asked how much time they had available to devote to the course. If the student indicated 5 or more hours per week, they would be taken through the checkout process as usual. If they indicated fewer than 5 hours per week, a message would appear indicating that Udacity courses usually require a greater time commitment for successful completion, and suggesting that the student might like to access the course materials for free. At this point, the student would have the option to continue enrolling in the free trial, or access the course materials for free instead.

### Hypothesis

We hypothesized that the free trial screener would set clearer expectations for students upfront, and thus reduce the number of frustrated students who left the free trial because they didn't have enough time, without significantly reducing the number of students who continue past the free trial and eventually complete the course.

### A/B Test Design

#### Unit of diversion

The ***unit of diversion*** is a cookie, although if the student enrolls in the free trial, they are tracked by user-id from that point forward. 

The same user-id cannot enroll in the free trial twice. For users that do not enroll, their user-id is not tracked in the experiment, even if they were signed in when they visited the course overview page.


### Metrics
<a class="anchor" id="metric"></a>
Two types of metrics are selected for a successful experiment: invariant metrics and evaluation metrics.

***Invariant metrics*** are used for sanity checks (or for an A/A experiment before running the experiment), such as checking that the distributions are the same in the control and experiment groups, to make sure the experiment is not inherently flawed. Invariant metrics usually have a larger unit of diversion, are randomly assigned, or are measured before the experimental change takes effect.

***Evaluation metrics*** are the metrics in which we expect to see a change, and they are relevant to the business goals we aim to achieve. For each metric we state a $Dmin$, the minimum change that is practically significant to the business. **For instance, we may state that any increase in net conversion rate under 1%, even if statistically significant, is not practically significant to the business.**

### Invariant Metric

Invariant Metrics - Sanity Checks <a class="anchor" id="invariate"></a>

| Metric Name  | Metric Formula  | $Dmin$  | Notation |
|:-:|:-:|:-:|:-:|
| Number of Cookies in Course Overview Page  | # unique daily cookies on page | 3000 cookies  | $C_k$ |
| Number of Clicks on Free Trial Button  | # unique daily cookies who clicked  | 240 clicks | $C_l$ |


In this case, the screener measures how many students can devote 5 or more hours a week to a Udacity course, and it is shown only after a student clicks "start free trial". Pageviews and clicks both happen before the screener appears, so the cookie- and click-based metrics serve as the invariant metrics. User-id, however, is only tracked once a student enrolls, which happens after the screener, so it is not suitable as an invariant metric.

Because the screener is triggered by a click on "start free trial", the number of cookies and the number of clicks should be split evenly between the control and experiment groups. If they are not, a difference in the evaluation metrics cannot be attributed to the screener question.

### Evaluation Metrics
Evaluation Metrics - Performance Indicators <a class="anchor" id="evaluation"></a>

| Metric Name  | Metric Formula  | $Dmin$  | Notation |
|:-:|:-:|:-:|:-:|
| Gross Conversion   |  $\frac{enrolled}{C_l}$  | 0.01  | $Conversion_{Gross}$ |
| Net Conversion  |  $\frac{paid}{C_l}$  | 0.0075 | $Conversion_{Net}$ |


Gross conversion is the number of user-ids that enroll in the free trial divided by the number of unique cookies that click the "Start free trial" button. It is a good evaluation metric because we hypothesized that enrollments in the experiment group would decrease after students answer the screener question, given that those who select fewer than 5 hours are nudged toward the free course materials instead of enrolling.

Net conversion is the number of user-ids that remain enrolled past the 14-day boundary (and thus make at least one payment) divided by the number of unique cookies that click the "Start free trial" button. If the hypothesis holds true, the number of students who remain enrolled past 14 days should not change much, so this metric should not decrease significantly.
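As a quick illustration of the two definitions, the sketch below computes both conversions from raw counts. The function and variable names are hypothetical; the inputs are the rough daily baseline figures Udacity provides (3,200 clicks, 660 enrollments, and a 0.53 probability of payment given enrollment).

```python
def gross_conversion(enrollments, clicks):
    """Enrollments divided by unique cookies that clicked "Start free trial"."""
    return enrollments / clicks


def net_conversion(payments, clicks):
    """Payments (enrolled past 14 days) divided by unique cookies that clicked."""
    return payments / clicks


# Baseline daily figures from Udacity's rough estimates
baseline_clicks = 3200
baseline_enrollments = 660
# Expected payments = enrollments * P(payment | enroll)
baseline_payments = baseline_enrollments * 0.53

gross = gross_conversion(baseline_enrollments, baseline_clicks)
net = net_conversion(baseline_payments, baseline_clicks)
print("gross:", gross, "net:", net)
```

Both results should agree with the "given click" probabilities listed in the baseline table.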

In [111]:
import pandas as pd
import numpy as np
import math
import matplotlib.pyplot as plt
%matplotlib inline

### Baseline Data

Udacity offers the following rough estimates for these metrics (presumably collected from aggregates on daily traffic)

In [112]:
df_baseline=pd.read_excel(r"C:\Users\vikraror\Google Drive\Analytics\Github & Other projects\AB_testing_final_project_udacity-master/final-project-baseline-values.xlsx", 
                 header=None, names=['metrics','value'])
df_baseline

Unnamed: 0,metrics,value
0,Unique cookies to view page per day:,40000.0
1,"Unique cookies to click ""Start free trial"" per...",3200.0
2,Enrollments per day:,660.0
3,"Click-through-probability on ""Start free trial"":",0.08
4,"Probability of enrolling, given click:",0.20625
5,"Probability of payment, given enroll:",0.53
6,"Probability of payment, given click",0.109313


- Probability of enrolling, given click: 0.206, i.e. P(enrolling once the free trial button is clicked)

- Probability of payment, given enroll: 0.53, i.e. P(payment given already enrolled)

**So, out of every 100 students who enroll, about 53 make a payment.**

## Calculate standard deviation for evaluation metrics

**The more variant a metric is, the harder it is to reach a significant result.**

Assuming a sample size of 5,000 cookies visiting the course overview page per day (as given in the project's instructions), we want to estimate the standard deviation analytically, for the evaluation metrics only.

The sample size we are considering should be smaller than the "population" we collected and small enough to have two groups of that size. Both evaluation metrics use clicks, not pageviews, as their denominator, so the 5,000 pageviews are first rescaled to the expected number of clicks using the baseline click-through probability: 5000 * (3200/40000) = 400 clicks.

In [113]:
#sample size of 5,000 cookies visiting the course overview page per day
page_views=5000

## standard deviation= sqrt(p(1-p)/n)

prob_enrol = df_baseline.loc[4,'value']
sd_gc = np.sqrt((prob_enrol*(1-prob_enrol))/((page_views*df_baseline.iloc[1,1]/df_baseline.iloc[0,1])))

# i.e. sqrt(0.206*(1-0.206) / (5000*(3200/40000))):
# the numerator is prob_enrol*(1-prob_enrol); the denominator is the sample page views times the click-through rate (= expected clicks)

sd_gc

0.020230604137049392

In [114]:
# standard deviation of net conversion

prob_payment=df_baseline.loc[6,'value']

sd_nc= np.sqrt((prob_payment*(1-prob_payment))/(page_views*df_baseline.iloc[1,1]/df_baseline.iloc[0,1]))
sd_nc

0.01560154458248846

For students who click the website's free trial button:

- Mean of enrollment (gross conversion) is 0.206 and SD is 0.0202

- Mean of payment (net conversion) is 0.109 and SD is 0.0156
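The same calculation can be wrapped in a small helper that does not depend on the spreadsheet. This is a minimal sketch, assuming each click is an independent Bernoulli trial so that the standard deviation of the proportion is sqrt(p(1-p)/n); the function name `binomial_sd` is hypothetical.

```python
import math


def binomial_sd(p, n):
    """Analytic standard deviation of a proportion p estimated from n trials."""
    return math.sqrt(p * (1 - p) / n)


sample_page_views = 5000
click_through = 3200 / 40000  # baseline clicks / baseline page views
# Both metrics are per click, so rescale page views to expected clicks
expected_clicks = sample_page_views * click_through

sd_gross = binomial_sd(0.20625, expected_clicks)
sd_net = binomial_sd(0.109313, expected_clicks)
print(round(sd_gross, 4), round(sd_net, 4))
```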

# Sample Size Determination

**Baseline conversion.** The baseline conversion rate is the current conversion rate of the page you want to test. It is expressed as a percentage and is calculated as the number of successful actions taken on that page divided by the number of visitors who viewed the page.

In [115]:
alpha=0.05 # acceptable level of type 1 error: rejecting H0 when it is true

beta = 0.20 # acceptable level of type 2 error: failing to reject H0 when it is false

min_effect_size_gc= 0.01

min_effect_size_nc= 0.0075

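Sample sizes were calculated with the online calculator at https://www.evanmiller.org/ab-testing/sample-size.html, using each metric's baseline conversion rate, its minimum effect size (absolute), alpha = 0.05 and beta = 0.20 (i.e. 80% power).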
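For readers who prefer to reproduce the numbers in code, here is a hedged sketch using a common normal-approximation formula for comparing two proportions. It is an assumption that this matches the calculator's internals; it is offered only as a cross-check, and the function name `sample_size_per_group` is hypothetical. The result should land close to the calculator values used in the next cell.

```python
import math
from statistics import NormalDist


def sample_size_per_group(baseline, min_effect, alpha=0.05, beta=0.20):
    """Approximate per-group sample size for a two-sided test of two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(1 - beta)        # quantile for the desired power
    p1 = baseline
    p2 = baseline + min_effect
    sd_null = math.sqrt(2 * p1 * (1 - p1))               # both groups at baseline
    sd_alt = math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))    # groups differ by min_effect
    return (z_alpha * sd_null + z_beta * sd_alt) ** 2 / min_effect ** 2


n_gross = sample_size_per_group(0.20625, 0.01)
n_net = sample_size_per_group(0.109313, 0.0075)
print(round(n_gross), round(n_net))
```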

In [116]:
sample_size=pd.DataFrame({'SampleSize':[25835,27413]}, index = ['Gross Conversion','Net Conversion'])
sample_size

# the sample sizes below are numbers of clicks on the free trial button (per group)

Unnamed: 0,SampleSize
Gross Conversion,25835
Net Conversion,27413


In [117]:
# a sample here is a cookie that visited the course overview page and clicked the free trial button

ctr_free_trail=3200/40000 #(unique cookies clicking/unique cookies visiting)

page_views_needed=27413/ctr_free_trail
print("Sample size of page views needed for control & experiment groups is {} per group".format(round(page_views_needed)))

Sample size of page views needed for control & experiment groups is 342662 per group


# Sanity Checks to check whether control & exp groups are similar

In [119]:
df_control=pd.read_excel("Sanity_Check.xlsx",sheet_name="Control")
df_exp=pd.read_excel("Sanity_Check.xlsx",sheet_name="Experiment")

In [120]:
df_control.head()

Unnamed: 0,Date,Pageviews,Clicks,Enrollments,Payments
0,"Sat, Oct 11",7723,687,134.0,70.0
1,"Sun, Oct 12",9102,779,147.0,70.0
2,"Mon, Oct 13",10511,909,167.0,95.0
3,"Tue, Oct 14",9871,836,156.0,105.0
4,"Wed, Oct 15",10014,837,163.0,64.0


In [121]:
control=[]
exp=[]
for col in df_control.iloc[:,1:]:
    control.append(df_control[col].sum())
    exp.append(df_exp[col].sum())

In [122]:
# label both series with the metric column names (every column except Date)
control=pd.Series(control,index=df_control.columns[1:])
exp=pd.Series(exp,index=df_control.columns[1:])

In [123]:
df_results=pd.concat([control,exp],axis=1)
df_results.columns= ['Control','Experiment']
df_results

Unnamed: 0,Control,Experiment
Pageviews,345543.0,344660.0
Clicks,28378.0,28325.0
Enrollments,3785.0,3423.0
Payments,2033.0,1945.0


In [124]:
# dropping the non-invariant metrics: enrollments & payments
df_results.drop(index=['Enrollments',"Payments"],inplace=True)
df_results



Unnamed: 0,Control,Experiment
Pageviews,345543.0,344660.0
Clicks,28378.0,28325.0


We want to count the total number of cookie pageviews and clicks in each group and check whether the split between the groups differs significantly from the expected 50/50. A significant difference would imply a biased experiment whose results we should not rely on.

**Testing at 95% confidence level**

In [125]:
df_results['Total']=df_results.Control + df_results.Experiment
df_results['Prob'] = 0.5
df_results['StdErr'] = np.sqrt((df_results.Prob * (1- df_results.Prob))/df_results.Total)
df_results['MargErr']=1.96*df_results['StdErr']
df_results["CI_lower"] = df_results.Prob - df_results.MargErr
df_results["CI_upper"] = df_results.Prob + df_results.MargErr
df_results["Obs_val"] = df_results.Experiment/df_results.Total
df_results["Pass_Sanity"]=df_results.apply(lambda x: (x.Obs_val>x.CI_lower) and (x.Obs_val<x.CI_upper),axis=1)
df_results

Unnamed: 0,Control,Experiment,Total,Prob,StdErr,MargErr,CI_lower,CI_upper,Obs_val,Pass_Sanity
Pageviews,345543.0,344660.0,690203.0,0.5,0.000602,0.00118,0.49882,0.50118,0.49936,True
Clicks,28378.0,28325.0,56703.0,0.5,0.0021,0.004116,0.495884,0.504116,0.499533,True


There is no statistically significant difference between the groups for either invariant metric, so we are good to go.
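The same check can be expressed as a standalone function. This is a minimal sketch, assuming an expected 50/50 split and a normal approximation to the binomial; the function name `passes_sanity_check` is hypothetical, and the counts are the totals shown in the table above.

```python
import math


def passes_sanity_check(n_control, n_experiment, expected=0.5, z=1.96):
    """True if the experiment group's share lies inside the 95% CI around `expected`."""
    total = n_control + n_experiment
    std_err = math.sqrt(expected * (1 - expected) / total)
    margin = z * std_err
    observed = n_experiment / total
    return (expected - margin) < observed < (expected + margin)


pageviews_ok = passes_sanity_check(345543, 344660)
clicks_ok = passes_sanity_check(28378, 28325)
print(pageviews_ok, clicks_ok)
```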

# Effect Size Tests For Evaluation Metrics: Gross & Net Conversions

In [128]:
df_control.dropna(inplace=True)
df_exp.dropna(inplace=True)

### Effect size test for Gross Conversion

In [130]:
gcr_control = sum(df_control['Enrollments'])/sum(df_control['Clicks'])
gcr_exp = sum(df_exp['Enrollments'])/sum(df_exp['Clicks'])

diff_gcr = gcr_exp-gcr_control

print("Gross conversion rates Control: {}    & Exp: {}".format(round(gcr_control,3),round(gcr_exp,3)))
print("Difference of Gross Conversion between Control Group and Experiment Group is {}".format(round(diff_gcr,3)))

Gross conversion rates Control: 0.219    & Exp: 0.198
Difference of Gross Conversion between Control Group and Experiment Group is -0.021


**Gross conversion for the experiment group is about 2 percentage points lower than for the control group; whether the decline is statistically significant still needs to be checked.**

Side notes:
    
**Comparing Two Independent Proportions**
https://online.stat.psu.edu/stat800/node/53/

When comparing two proportions, the choice between a "pooled" and an "unpooled" variance estimate depends on the hypothesis. If we are testing "no difference" between the two proportions, we pool the variance. If we are testing for a specific difference (e.g. the difference between two proportions is 0.1, 0.02, etc., i.e. the value in H0 is a number other than 0), the unpooled estimate is used. In this example H0 is "no difference" (i.e. 0 is the null value), so we use the pooled estimate.

Standard error. Compute the standard error (SE) of the sampling distribution of the difference between two proportions.
SE = sqrt{ p * ( 1 - p ) * [ (1/n1) + (1/n2) ] }

p = (p1 * n1 + p2 * n2) / (n1 + n2)
where p is the pooled sample proportion, n1 is the size of sample 1, and n2 is the size of sample 2.




https://stattrek.com/hypothesis-test/difference-in-proportions.aspx
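The pooled formula above can be sketched as a small function. The counts below are illustrative only and are not taken from the experiment data; the function name `pooled_standard_error` is hypothetical.

```python
import math


def pooled_standard_error(x1, n1, x2, n2):
    """Pooled SE of the difference between two proportions x1/n1 and x2/n2."""
    # (p1*n1 + p2*n2) / (n1 + n2) simplifies to (x1 + x2) / (n1 + n2)
    p_pooled = (x1 + x2) / (n1 + n2)
    return math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))


# Illustrative counts (hypothetical, NOT the experiment's numbers)
x_control, n_control = 200, 1000
x_exp, n_exp = 150, 1000

se = pooled_standard_error(x_control, n_control, x_exp, n_exp)
diff = x_exp / n_exp - x_control / n_control
z = diff / se  # test statistic under H0: no difference
print(round(se, 4), round(z, 2))
```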


**Pooled Standard Error of gross conversions**

In [18]:
pooled_prob_gc= (sum(df_control['Enrollments'])+sum(df_exp['Enrollments']))/(sum(df_control['Clicks'])+sum(df_exp['Clicks']))

pooled_sd_gc=math.sqrt(pooled_prob_gc*(1-pooled_prob_gc)*(1/sum(df_control['Clicks'])+1/sum(df_exp['Clicks'])))


print("Pooled standard error of gross conversions is {}".format(round(pooled_sd_gc,4)))

margin_err_gc=pooled_sd_gc*1.96 #(1.96 is critical value)

lower_bound_gcr= diff_gcr - margin_err_gc
upper_bound_gcr= diff_gcr + margin_err_gc
print("95% confidence interval for proportion difference is ({},{})".format(round(lower_bound_gcr,4),round(upper_bound_gcr,4)))

Pooled standard error of gross conversions is 0.0044
95% confidence interval for proportion difference is (-0.0291,-0.012)


Minimum effect size ($Dmin$) = -0.01 (a decrease of 1 percentage point)

The 95% confidence interval (-0.0291, -0.012) lies entirely below 0, so the difference in gross conversion between the control and experiment groups is statistically significant. It also lies entirely below -0.01, so even at the upper bound of the interval the drop exceeds the minimum effect size, which makes the difference practically significant as well.

Thus we conclude that gross conversion has fallen.

**Checking via Z-score**

z = (p1 - p2) / SE

In [19]:
z_score_gcr=diff_gcr/pooled_sd_gc
z_score_gcr

-4.701830023753982

For this one-tailed test, a z-score of -4.7 corresponds to a p-value < .0001, so H0 is rejected: gross conversion has fallen.

### Effect size test for net conversions

In [131]:
ncr_control=sum(df_control['Payments'])/sum(df_control['Clicks'])
ncr_exp= sum(df_exp['Payments'])/sum(df_exp['Clicks'])
diff_ncr=ncr_exp-ncr_control


print("Net conversion rate Control: {}    & Exp: {}".format(round(ncr_control,3),round(ncr_exp,3)))
print("Difference of Net Conversion between Control Group and Experiment Group is {}".format(round(diff_ncr,3)))

Net conversion rate Control: 0.118    & Exp: 0.113
Difference of Net Conversion between Control Group and Experiment Group is -0.005


**Net conversion for the experiment group is about 0.5 percentage points lower than for the control group.**

**Pooled Standard Error of net conversions**

In [20]:
pooled_prob_nc= (sum(df_control['Payments']) + sum(df_exp['Payments']))/(sum(df_control['Clicks'])+sum(df_exp['Clicks']))

pooled_sd_nc= math.sqrt(pooled_prob_nc*(1-pooled_prob_nc)*(1/sum(df_control['Clicks'])+1/sum(df_exp['Clicks'])))
print("Pooled standard error of net conversions is {}".format(round(pooled_sd_nc,4)))

margin_err_nc=1.96*pooled_sd_nc
lower_bound_ncr=diff_ncr-margin_err_nc
upper_bound_ncr=diff_ncr+margin_err_nc
print("95% confidence interval for proportion difference is ({},{})".format(round(lower_bound_ncr,4),round(upper_bound_ncr,4)))

Pooled standard error of net conversions is 0.0034
95% confidence interval for proportion difference is (-0.0116,0.0019)


Minimum effect size ($Dmin$) = -0.0075

The 95% confidence interval (-0.0116, 0.0019) contains 0, so the difference in net conversion between the control and experiment groups is not statistically significant. The interval also contains -0.0075 (the practical significance boundary), so we cannot conclude that the change is practically significant either.
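The decision rule applied to both metrics can be summarized in code. This is a sketch of one reasonable reading of the rule: a result is statistically significant when the interval excludes 0, and practically significant when the whole interval lies beyond the minimum effect size in either direction. The function name `assess_interval` is hypothetical; the intervals are the ones printed above.

```python
def assess_interval(lower, upper, d_min):
    """Return (statistically_significant, practically_significant) for a CI."""
    # Statistically significant: 0 is outside the interval
    statistically_significant = not (lower <= 0 <= upper)
    # Practically significant: the entire interval is beyond +/- d_min
    practically_significant = upper < -abs(d_min) or lower > abs(d_min)
    return statistically_significant, practically_significant


gross_result = assess_interval(-0.0291, -0.012, 0.01)
net_result = assess_interval(-0.0116, 0.0019, 0.0075)
print("gross:", gross_result, "net:", net_result)
```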

In [21]:
z_score_ncr=diff_ncr/pooled_sd_nc
z_score_ncr

-1.4192001144365733

For this one-tailed test, a z-score of -1.42 corresponds to a p-value of about 0.078, so H0 is not rejected at the 5% level: there is no statistically significant fall in net conversion.
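The p-values quoted for both z-scores can be reproduced with the standard library instead of a lookup table. This is a minimal sketch, assuming a lower-tailed test (H1: the experiment rate is lower than the control rate); the function name `one_tailed_p_value` is hypothetical.

```python
from statistics import NormalDist


def one_tailed_p_value(z):
    """Lower-tail probability P(Z <= z) under the standard normal distribution."""
    return NormalDist().cdf(z)


# z-scores computed earlier for gross and net conversion
p_gross = one_tailed_p_value(-4.701830023753982)
p_net = one_tailed_p_value(-1.4192001144365733)
print(p_gross, p_net)
```

Here `p_gross` is far below 0.05 while `p_net` is above it, which matches the conclusions drawn from the confidence intervals.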

### Recommendation Based on Test Results

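Because the screener significantly reduced gross conversion without a statistically significant change in net conversion, I expect that filtering students by setting minimum time expectations will not materially impact net conversions, and thus will not impact the company's revenue, while at the same time reducing costs through fewer tutor hours spent on non-paying students.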