# Module 1: Data Science Fundamentals

## Sprint 2: Statistical Tests and Experiments

## Let's analyse a Udacity A/B test!

<div><img style="height: 350px;" src="https://upload.wikimedia.org/wikipedia/commons/3/3b/Udacity_logo.png" /></div>

## Background

For the last day of this sprint, we are analyzing the results of a real online A/B test that Udacity once ran. The dataset comes from Udacity's A/B testing course by Google, from which we watched a couple of lessons in Subproject 2 of this sprint.

Data is available at https://docs.google.com/spreadsheets/d/1Mu5u9GrybDdska-ljPXyBjTpdZIUev_6i7t4LRDfXM8/edit#gid=0

---------

## How to start?

The data consists of two sheets, one for each group of the test. Download the sheets as CSVs and union them to form a single dataset.

## Concepts to explore

In this project you will mostly focus on statistical hypothesis testing with real-life data.

## Requirements

* Visualize the 95% confidence intervals of the control and experiment groups for the [click-through rate](https://en.wikipedia.org/wiki/Click-through_rate) metric. Explain what the confidence interval means and how it relates to the population of a group.
* Verify whether the difference in each of the metrics between the control and experiment groups is statistically significant using a z-test at the 95% confidence level.
* Verify whether the difference in each of the metrics between the control and experiment groups is statistically significant using a t-test at the 95% confidence level.
* Compare the results of both test methods. Explain why they differ or do not differ much.
* Choose 1 method (either z or t) and explore the statistical significance of any metric under different confidence levels: 60%, 90%, 95%, 99%. If conclusions about significance differ under different confidence levels, explain why.
* Calculate p-values.

## Evaluation Criteria

- Correctness of used test methods.
- Soundness of explanations given.
- Adherence to the requirements.


## Sample correction questions

During a correction, you may get asked questions that test your understanding of covered topics.

- Why collect data from a sample rather than from the whole population?
- What is the Central Limit Theorem and why is it important?
- Explain confidence intervals and significance in statistics.
- Explain what a p-value is.

In [1]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
from scipy.stats import binom
from scipy.stats import norm
import math
from statsmodels.stats.proportion import proportions_ztest
from scipy.stats import ttest_ind



Importing the data

In [2]:
gsheetkey = '1Mu5u9GrybDdska-ljPXyBjTpdZIUev_6i7t4LRDfXM8'
data_url = f'https://docs.google.com/spreadsheet/ccc?key={gsheetkey}&output=xlsx'
control_df = pd.read_excel(data_url, sheet_name='Control')
control_df.head()

Unnamed: 0,Date,Pageviews,Clicks,Enrollments,Payments
0,"Sat, Oct 11",7723,687,134.0,70.0
1,"Sun, Oct 12",9102,779,147.0,70.0
2,"Mon, Oct 13",10511,909,167.0,95.0
3,"Tue, Oct 14",9871,836,156.0,105.0
4,"Wed, Oct 15",10014,837,163.0,64.0


In [3]:
control_df['group'] = 'Control'
control_df.head()

Unnamed: 0,Date,Pageviews,Clicks,Enrollments,Payments,group
0,"Sat, Oct 11",7723,687,134.0,70.0,Control
1,"Sun, Oct 12",9102,779,147.0,70.0,Control
2,"Mon, Oct 13",10511,909,167.0,95.0,Control
3,"Tue, Oct 14",9871,836,156.0,105.0,Control
4,"Wed, Oct 15",10014,837,163.0,64.0,Control


In [4]:
experiment_df = pd.read_excel(data_url, sheet_name='Experiment')
experiment_df.head()

Unnamed: 0,Date,Pageviews,Clicks,Enrollments,Payments
0,"Sat, Oct 11",7716,686,105.0,34.0
1,"Sun, Oct 12",9288,785,116.0,91.0
2,"Mon, Oct 13",10480,884,145.0,79.0
3,"Tue, Oct 14",9867,827,138.0,92.0
4,"Wed, Oct 15",9793,832,140.0,94.0


In [5]:
experiment_df['group'] = 'Experiment'
experiment_df.head()

Unnamed: 0,Date,Pageviews,Clicks,Enrollments,Payments,group
0,"Sat, Oct 11",7716,686,105.0,34.0,Experiment
1,"Sun, Oct 12",9288,785,116.0,91.0,Experiment
2,"Mon, Oct 13",10480,884,145.0,79.0,Experiment
3,"Tue, Oct 14",9867,827,138.0,92.0,Experiment
4,"Wed, Oct 15",9793,832,140.0,94.0,Experiment


Concatenated the two datasets

In [6]:
total_data = pd.concat([control_df, experiment_df])
total_data

Unnamed: 0,Date,Pageviews,Clicks,Enrollments,Payments,group
0,"Sat, Oct 11",7723,687,134.0,70.0,Control
1,"Sun, Oct 12",9102,779,147.0,70.0,Control
2,"Mon, Oct 13",10511,909,167.0,95.0,Control
3,"Tue, Oct 14",9871,836,156.0,105.0,Control
4,"Wed, Oct 15",10014,837,163.0,64.0,Control
...,...,...,...,...,...,...
32,"Wed, Nov 12",10042,802,,,Experiment
33,"Thu, Nov 13",9721,829,,,Experiment
34,"Fri, Nov 14",9304,770,,,Experiment
35,"Sat, Nov 15",8668,724,,,Experiment


In [7]:
total_grouped = total_data.groupby('group', as_index=False).sum()
total_grouped

Unnamed: 0,group,Pageviews,Clicks,Enrollments,Payments
0,Control,345543,28378,3785.0,2033.0
1,Experiment,344660,28325,3423.0,1945.0


Calculated the CTR

In [8]:
total_grouped['ctr'] = total_grouped['Clicks']/total_grouped['Pageviews']
total_grouped

Unnamed: 0,group,Pageviews,Clicks,Enrollments,Payments,ctr
0,Control,345543,28378,3785.0,2033.0,0.082126
1,Experiment,344660,28325,3423.0,1945.0,0.082182


# Sanity Check

In [10]:
def get_z_score(confidence = 95):
  # Two-sided critical value: half of the leftover probability goes in each tail
  sig = 1 - ((100 - confidence)/200)
  z_score = norm.ppf(sig)
  return round(z_score, 4)

In [11]:
z_score = get_z_score(confidence= 95)
print(z_score)

1.96
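The critical value depends only on the chosen confidence level. As a rough cross-check of `get_z_score`, the sketch below reproduces it with the standard library's `statistics.NormalDist` (Python 3.8+) instead of `scipy.stats.norm`. The helper name `critical_z` is made up for this illustration.

```python
from statistics import NormalDist

def critical_z(confidence=95):
    """Two-sided critical z-value for a confidence level given in percent."""
    alpha = (100 - confidence) / 100             # total probability left in both tails
    return NormalDist().inv_cdf(1 - alpha / 2)   # quantile that leaves alpha/2 in the upper tail

for level in (60, 90, 95, 99):
    print(level, round(critical_z(level), 4))
```

Higher confidence levels give larger critical values, which in turn widen the confidence interval.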


In [12]:
m_e = z_score * math.sqrt((0.5*0.5)/(total_grouped['Pageviews'][0] + total_grouped['Pageviews'][1]))
m_e

0.0011796078509768765

In [13]:
confidence_range = [0.5 - m_e , 0.5 + m_e]
confidence_range

[0.49882039214902313, 0.5011796078509769]
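The interval above is only half of the sanity check: the observed share of pageviews that landed in the control group still has to be compared against it. A minimal sketch, using the pageview totals from the grouped table and the same 1.96 critical value (variable names are illustrative):

```python
import math

control_pageviews = 345543      # from total_grouped
experiment_pageviews = 344660   # from total_grouped
z_score = 1.96                  # 95% confidence, as computed above

total = control_pageviews + experiment_pageviews
margin = z_score * math.sqrt(0.5 * 0.5 / total)   # margin of error around an expected 50/50 split
lower, upper = 0.5 - margin, 0.5 + margin

observed = control_pageviews / total               # share of traffic seen by the control group
passes = lower <= observed <= upper
print(f"observed share = {observed:.4f}, passes sanity check: {passes}")
```

If the observed share fell outside the interval, the random assignment itself would be suspect and the later tests could not be trusted.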

# Calculating the Confidence Interval



In [None]:
# Margin of error for a proportion: z * sqrt(p * (1 - p) / n), computed column-wise
total_grouped['margin_error'] = z_score * np.sqrt(total_grouped['ctr'] * (1 - total_grouped['ctr']) / total_grouped['Pageviews'])
total_grouped

Unnamed: 0,group,Pageviews,Clicks,Enrollments,Payments,ctr,margin_error
0,Control,345543,28378,3785.0,2033.0,0.082126,0.000915
1,Experiment,344660,28325,3423.0,1945.0,0.082182,0.000917


In [None]:
total_grouped['min_interval_ctr'] = total_grouped['ctr'] - total_grouped['margin_error']
total_grouped

Unnamed: 0,group,Pageviews,Clicks,Enrollments,Payments,ctr,margin_error,min_interval_ctr
0,Control,345543,28378,3785.0,2033.0,0.082126,0.000915,0.08121
1,Experiment,344660,28325,3423.0,1945.0,0.082182,0.000917,0.081266


In [None]:
total_grouped['max_interval_ctr'] = total_grouped['ctr'] + total_grouped['margin_error']
total_grouped

Unnamed: 0,group,Pageviews,Clicks,Enrollments,Payments,ctr,margin_error,min_interval_ctr,max_interval_ctr
0,Control,345543,28378,3785.0,2033.0,0.082126,0.000915,0.08121,0.083041
1,Experiment,344660,28325,3423.0,1945.0,0.082182,0.000917,0.081266,0.083099


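The 95% confidence interval for the control group's click-through rate is 0.0812 to 0.0830, and for the experiment group it is 0.0813 to 0.0831.

This does not mean that the true rate has a 95% chance of lying in one particular interval. It describes the reliability of the procedure: if the experiment were repeated many times and an interval built each time, about 95% of those intervals would contain the true click-through rate of the group's population. The two intervals overlap almost completely, which already hints that the groups do not differ on this metric.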
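One way to read a 95% confidence level is as a long-run coverage rate. The simulation below is a sketch under made-up assumptions (a "true" click-through rate of 0.08 and 2000 pageviews per simulated experiment, neither taken from the Udacity data). It builds the same normal-approximation interval many times and counts how often it contains the true rate.

```python
import math
import random

random.seed(0)
true_ctr = 0.08      # assumed population click-through rate (hypothetical)
n = 2000             # pageviews per simulated experiment (hypothetical)
z = 1.96             # 95% confidence
runs = 500

covered = 0
for _ in range(runs):
    # Simulate n pageviews, each resulting in a click with probability true_ctr
    clicks = sum(random.random() < true_ctr for _ in range(n))
    p_hat = clicks / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    if p_hat - margin <= true_ctr <= p_hat + margin:
        covered += 1

coverage = covered / runs
print(f"{coverage:.1%} of the intervals contain the true rate")
```

The printed share should land close to 95%, with some noise because only 500 experiments are simulated.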

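In [None]:
# Lower and upper confidence bounds, in the order [Control, Experiment]
lcb = total_grouped['min_interval_ctr'].tolist()
ucb = total_grouped['max_interval_ctr'].tolist()

In [None]:
# Draw each group's 95% confidence interval as a horizontal segment and mark the point estimate
sns.set()
plt.figure(figsize=(8,4))
for i in range(len(total_grouped)):
  plt.plot([lcb[i], ucb[i]], [i, i], marker='|')
  plt.plot(total_grouped['ctr'][i], i, 'o')
plt.yticks([0, 1], total_grouped['group'])
plt.title('95% confidence interval of click-through rate for the two groups')
plt.xlabel('Click-through rate')
plt.show()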

# 2 sample Z-test

In [None]:
def two_sample_z_test(c1, c2, n1, n2, x1 = 0, x2 = 0):
  p1 = c1/n1
  p2 = c2/n2
  p = (c1 + c2)/(n1 + n2)
  num = (p1 - p2) - (x1 - x2)
  denum = math.sqrt((p * (1 - p)) * (1/n1 + 1/n2))
  test_z_score = num / denum
  print('Test z score is {}'.format(test_z_score))
  p_value = 2 * (1 - norm.cdf(abs(test_z_score)))
  print('P-value is {}'.format( p_value))
  return test_z_score, p_value

In [None]:
c2 = total_grouped['Clicks'][0]
c1 = total_grouped['Clicks'][1]
n2 = total_grouped['Pageviews'][0]
n1 = total_grouped['Pageviews'][1]
c1, c2, n1, n2 

(28325, 28378, 344660, 345543)

In [None]:
z_test, z_p_val = two_sample_z_test(c1, c2, n1, n2)

Test z score is 0.08566094109242048
P-value is 0.9317359524473912


In [None]:
def sig_test(significance, p_value):
  if p_value > significance:
   print ("Fail to reject the null hypothesis!")
  else:
   print ("Reject the null hypothesis - this suggests the alternative hypothesis is true")

In [None]:
sig_test(0.05, z_p_val)

Fail to reject the null hypothesis!


In [None]:
pageviews = np.array([total_grouped['Pageviews'][0], total_grouped['Pageviews'][1]])
clicks = np.array([total_grouped['Clicks'][0], total_grouped['Clicks'][1]])

In [None]:
z_test_2, z_p_val_2 = proportions_ztest(count= clicks, nobs = pageviews, alternative='two-sided', prop_var=False)
z_test_2, z_p_val_2

(-0.08566094109242048, 0.9317359524473912)
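The hand-written test and `proportions_ztest` return the same p-value but z-scores of opposite sign. That is only because the groups were passed in opposite order (experiment first in the manual call, control first here). The sketch below re-implements the pooled z-test with the standard library to show this. `pooled_z_test` is an illustrative name, not part of statsmodels.

```python
import math
from statistics import NormalDist

def pooled_z_test(c1, n1, c2, n2):
    """Two-sided pooled two-proportion z-test; returns (z, p_value)."""
    p_pool = (c1 + c2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (c1 / n1 - c2 / n2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Click and pageview totals from total_grouped
z_exp_first, p_exp_first = pooled_z_test(28325, 344660, 28378, 345543)
z_ctl_first, p_ctl_first = pooled_z_test(28378, 345543, 28325, 344660)
print(z_exp_first, z_ctl_first)   # same magnitude, opposite sign
print(p_exp_first, p_ctl_first)   # identical two-sided p-values
```

For a two-sided test the sign carries no extra information, so the order only matters when reading the direction of the effect.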

At the 95% confidence level, the z-test fails to reject the null hypothesis: the difference in click-through rate between the two groups is not statistically significant.

# 2 sample T-test

In [None]:
t_test,t_test_pval = ttest_ind(control_df['Clicks'], experiment_df['Clicks'], equal_var=False)
print(f't_val is {t_test}', f'p_val is {t_test_pval}')

t_val is 0.09270642968639531 p_val is 0.9263950638615311


In [None]:
sig_test(0.05, t_test_pval)

Fail to reject the null hypothesis!


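At the 95% confidence level, the t-test also fails to reject the null hypothesis, so the difference is not statistically significant.

The t-test p-value (0.926) is slightly lower than the z-test p-value (0.932), but both are far above the 0.05 significance level, so the conclusion is the same. The two tests are not run on identical quantities: the z-test compares pooled click-through rates, while this t-test compares mean daily clicks. With dozens of daily observations per group, the t distribution is already close to the normal distribution, which is why the results stay close.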
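A t-test on click-through rate itself would use one CTR per day as the observation, rather than the daily click counts passed to `ttest_ind`. The sketch below computes Welch's t statistic and its approximate degrees of freedom for the five days shown in the `head()` outputs only, so the numbers are illustrative and not a result for the full experiment.

```python
import math
from statistics import mean, variance

# (pageviews, clicks) for the first five days, copied from the head() outputs
control = [(7723, 687), (9102, 779), (10511, 909), (9871, 836), (10014, 837)]
experiment = [(7716, 686), (9288, 785), (10480, 884), (9867, 827), (9793, 832)]

ctr_c = [clicks / views for views, clicks in control]
ctr_e = [clicks / views for views, clicks in experiment]

# Welch's t-test: each group contributes its own squared standard error
v_c = variance(ctr_c) / len(ctr_c)
v_e = variance(ctr_e) / len(ctr_e)
t_stat = (mean(ctr_c) - mean(ctr_e)) / math.sqrt(v_c + v_e)

# Welch-Satterthwaite approximation for the degrees of freedom
dof = (v_c + v_e) ** 2 / (v_c ** 2 / (len(ctr_c) - 1) + v_e ** 2 / (len(ctr_e) - 1))
print(f"t = {t_stat:.3f}, dof = {dof:.1f}")
```

The p-value would then come from a t distribution with `dof` degrees of freedom, for example via `scipy.stats.t.sf`.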

# New metric - Net conversion

In the Udacity course, the ratio payments/clicks is called net conversion (gross conversion is enrollments/clicks).

Here - https://classroom.udacity.com/courses/ud257/lessons/4126079196/concepts/41906885800923

Using a Z-test

In [None]:
total_grouped

Unnamed: 0,group,Pageviews,Clicks,Enrollments,Payments,ctr,margin_error,min_interval_ctr,max_interval_ctr
0,Control,345543,28378,3785.0,2033.0,0.082126,0.000915,0.08121,0.083041
1,Experiment,344660,28325,3423.0,1945.0,0.082182,0.000917,0.081266,0.083099


In [None]:
total_grouped['Conv_rate'] = total_grouped['Payments']/total_grouped['Clicks']
total_grouped

Unnamed: 0,group,Pageviews,Clicks,Enrollments,Payments,ctr,margin_error,min_interval_ctr,max_interval_ctr,Conv_rate
0,Control,345543,28378,3785.0,2033.0,0.082126,0.000915,0.08121,0.083041,0.07164
1,Experiment,344660,28325,3423.0,1945.0,0.082182,0.000917,0.081266,0.083099,0.068667
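One caveat with this ratio: `total_grouped` sums `Clicks` over every day, but the concatenated table shows that the later days have no `Payments` recorded (NaN), and `sum()` silently skips those. The denominator therefore covers more days than the numerator. A possible fix, sketched on four experiment rows copied from the outputs above, is to drop the days without payment data before aggregating.

```python
import pandas as pd

# Two complete days and two days without payment data, from the experiment sheet
sample = pd.DataFrame({
    "Date": ["Sat, Oct 11", "Sun, Oct 12", "Wed, Nov 12", "Thu, Nov 13"],
    "Clicks": [686, 785, 802, 829],
    "Payments": [34.0, 91.0, None, None],
})

naive_clicks = sample["Clicks"].sum()            # includes days with no payment data
matched = sample.dropna(subset=["Payments"])     # keep only days where payments were recorded
matched_clicks = matched["Clicks"].sum()
payments = matched["Payments"].sum()

print("mismatched denominator:", payments / naive_clicks)
print("matched denominator:   ", payments / matched_clicks)
```

Applying the same `dropna` to the full dataset before `groupby` would change the conversion rates and the z-test inputs below, so those results would need to be recomputed.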


In [None]:
c1 = total_grouped['Payments'][0]
c2 = total_grouped['Payments'][1]
n1 = total_grouped['Clicks'][0]
n2 = total_grouped['Clicks'][1]
c1, c2, n1, n2 

(2033.0, 1945.0, 28378, 28325)

In [None]:
z_test, z_p_val = two_sample_z_test(c1, c2, n1, n2)

Test z score is 1.3857862391515965
P-value is 0.16581218550913213


In [None]:
payments = np.array([total_grouped['Payments'][0], total_grouped['Payments'][1]])
clicks = np.array([total_grouped['Clicks'][0], total_grouped['Clicks'][1]])
z_test_2, z_p_val_2 = proportions_ztest(count= payments, nobs = clicks, alternative='two-sided', prop_var=False)
z_test_2, z_p_val_2

(1.3857862391515965, 0.16581218550913213)

In [None]:
def significance(confidence_level = 95):
  significance = (100 - confidence_level)/100
  return significance

In [None]:
sig = significance(confidence_level = 60)
sig

0.4

In [None]:
sig_test(sig, z_p_val)

Reject the null hypothesis - this suggests the alternative hypothesis is true


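At the 60% confidence level (significance level 0.4), the p-value of 0.166 is below the significance level, so we reject the null hypothesis. The difference counts as statistically significant here, but only because a 60% confidence level tolerates a 40% false-positive rate.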

In [None]:
sig = significance(confidence_level = 90)
sig_test(sig, z_p_val)

Fail to reject the null hypothesis!


At confidence levels of 90% and above, the p-value (0.166) is greater than the significance level (0.10 or smaller), so we fail to reject the null hypothesis: the difference is not statistically significant.

In [None]:
sig = significance(confidence_level = 95)
sig_test(sig, z_p_val)

Fail to reject the null hypothesis!


In [None]:
sig = significance(confidence_level = 99)
sig_test(sig, z_p_val)

Fail to reject the null hypothesis!
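The four checks above can be summarised in one loop. This is a small sketch that reuses the z-test p-value printed earlier and the same decision rule as `sig_test` (reject when the p-value does not exceed the significance level). The `decisions` dictionary is just an illustrative container.

```python
p_value = 0.16581218550913213   # z-test p-value for payments/clicks, copied from the output above

decisions = {}
for confidence_level in (60, 90, 95, 99):
    alpha = (100 - confidence_level) / 100            # significance level
    decisions[confidence_level] = p_value <= alpha    # True means "reject the null hypothesis"
    print(f"{confidence_level}% confidence: alpha = {alpha:.2f}, reject H0 = {decisions[confidence_level]}")
```

The p-value itself never changes. Only the threshold it is compared against moves, which is why the conclusion flips between the 60% and 90% levels.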
