__Montana Library Project__

In [1]:
import pandas as pd
import numpy as np
from scipy import stats

# Preliminary Questions


- Would you include all suggested variants in the experiment (Connect, Learn, Help, Services)?
    * No, I would either include only "Help" as it was the most popular in our anecdotal sample, or "Help" and "Services"  
- What is the “business value” that performing this experiment would add within the broader strategy of the University?
    * Right now, students might need help but are not aware that there is help available. Therefore, by making students aware through a more descriptive button, they will get the help they need, increasing their satisfaction and their learning experience. This is good for the university.  
- Which main metric would you choose to measure the success of a variant and perform the experiment on?
    * I would use the click-through-rate.  
- Which additional metrics would you choose to track?
    * Depends on the technical capabilities of the platform. Time spent on site might also be useful.  
- How would you define the null and the alternative hypotheses?
    * H0: There is no difference in CTR between the variations.
    * HA: There is a difference in CTR between the variations.  
- What threshold for statistical significance would you set?
    * I would go with the standard 0.05, as there are no particular risks for either type of error in this case. The classic value of 0.05 balances the probability of both errors nicely.  
- What is the minimum detectable effect (the smallest improvement you would care about) that you expect to detect?
    * A 50% improvement in CTR for the interact button (so from 2% to 3%)  
- Do you think this experiment would require a software engineering team to develop a custom platform, or could it be developed with external tools such as Google Optimize?
    * No idea since I have no clue about Google Optimize or other software solutions
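
The significance threshold and the minimum detectable effect named above together imply a rough traffic requirement. The following is a minimal sketch, not part of the original analysis, based on the standard normal-approximation formula for comparing two proportions. The 80% power target and the helper name `sample_size_per_variant` are assumptions made for illustration; the 2% baseline, the 3% target and the 0.05 threshold come from the answers above.

```python
import math
from scipy.stats import norm

def sample_size_per_variant(p_base, p_target, alpha=0.05, power=0.80):
    """Approximate visits needed per variant for a two-sided two-proportion z-test.

    power=0.80 is an assumed convention, not a value stated in the notebook.
    """
    z_alpha = norm.ppf(1 - alpha / 2)   # critical value for the significance level
    z_power = norm.ppf(power)           # quantile matching the desired power
    p_bar = (p_base + p_target) / 2     # average proportion used under H0
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_power * math.sqrt(p_base * (1 - p_base) + p_target * (1 - p_target))
    ) ** 2
    return math.ceil(numerator / (p_target - p_base) ** 2)

# Baseline CTR of 2% and target CTR of 3%, as in the answer above
n_required = sample_size_per_variant(0.02, 0.03)
print(n_required)
```

Under these assumptions the estimate lands at a few thousand visits per variant. It only approximates a pairwise comparison, so it should be read as a yardstick rather than an exact requirement for a test across five variations.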

# Import Data

In [2]:
# Import .csv files
v1_interact = pd.read_csv(R"data\csv\v1_interact.csv")
v2_connect = pd.read_csv(R"data\csv\v2_connect.csv")
v3_learn = pd.read_csv(R"data\csv\v3_learn.csv")
v4_help = pd.read_csv(R"data\csv\v4_help.csv")
v5_services = pd.read_csv(R"data\csv\v5_services.csv")

# Collect the dataframes in a list, ordered from v1 to v5
data = [v1_interact, v2_connect, v3_learn, v4_help, v5_services]

# Get important metrics

In [3]:
# Define total views of page for each variation (taken from heatmaps in data folder)
# Order from v1 to v5
total_visits = [10283, 2742, 2747, 3180, 2064]

# Get clicks, no-clicks and CTR for each variation
click = []
no_click = []
ctr_values = []

v_name = ['INTERACT', 'CONNECT', 'LEARN', 'HELP', 'SERVICES']

for variation, name, visits in zip(data, v_name, total_visits):
    # Sum the clicks recorded for the element that carries this variation's button label
    clicks = variation.loc[variation['Name'] == name, 'No. clicks'].sum()
    click.append(clicks)
    no_click.append(visits - clicks)
    ctr_values.append(round(clicks / visits, 3))


observed = pd.DataFrame([click, no_click, ctr_values],
                           columns = v_name,
                           index = ["click", "no_click", "ctr_value"])

# Create contingency table
cont_table = observed.drop('ctr_value', axis = 0)

print(observed)

            INTERACT   CONNECT     LEARN      HELP  SERVICES
click         42.000    53.000    21.000    38.000    45.000
no_click   10241.000  2689.000  2726.000  3142.000  2019.000
ctr_value      0.004     0.019     0.008     0.012     0.022


# Data Exploration Questions

- What was the click-through rate for each version?
    * See "observed" df
- Which version was the winner?
    * V5 ('services') has the highest CTR (0.022) and can therefore be considered "the winner"
- Do the results seem conclusive?
    * Not yet. No statistical test has been performed so far, so we can't evaluate how likely it is that these differences arose by chance.
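
One way to get a first impression of that uncertainty is to put an interval around each CTR. The sketch below is an illustration added for that purpose, not part of the original analysis. It computes 95% Wilson score intervals by hand; the helper name `wilson_interval` and the choice of the Wilson method are assumptions, while the click and visit counts are copied from the cells above.

```python
import math
from scipy.stats import norm

# Clicks and page views per variation, copied from the cells above
clicks = {'INTERACT': 42, 'CONNECT': 53, 'LEARN': 21, 'HELP': 38, 'SERVICES': 45}
visits = {'INTERACT': 10283, 'CONNECT': 2742, 'LEARN': 2747, 'HELP': 3180, 'SERVICES': 2064}

def wilson_interval(k, n, confidence=0.95):
    """Wilson score interval for a proportion of k successes in n trials."""
    z = norm.ppf(1 - (1 - confidence) / 2)
    p_hat = k / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

intervals = {name: wilson_interval(clicks[name], visits[name]) for name in clicks}

for name, (low, high) in intervals.items():
    ctr = clicks[name] / visits[name]
    print(f"{name:<9} CTR = {ctr:.4f}  95% CI = [{low:.4f}, {high:.4f}]")
```

Overlapping intervals hint that two variations may not be distinguishable, but they are no substitute for a formal test.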

# Perform Chi-Square test

In [4]:
# Perform test itself
chisq, pvalue, df, expected = stats.chi2_contingency(cont_table)

# Arrange results in df for easy access
output = {'chi_square': chisq, 'p_value': pvalue, 'df': df}
chi2_res_full = pd.DataFrame(data = output, index = ['value'])
chi2_res_full

       chi_square       p_value  df
value   96.743235  4.852334e-20   4
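
To make the result less of a black box, the statistic can be recomputed by hand. This is a minimal sketch added for illustration: it builds the expected counts under H0 from the row and column totals and sums the squared deviations. The variable names are hypothetical, and the counts are copied from the contingency table above.

```python
import numpy as np
from scipy import stats

# Rows: click / no_click; columns: INTERACT, CONNECT, LEARN, HELP, SERVICES
observed_counts = np.array([[42, 53, 21, 38, 45],
                            [10241, 2689, 2726, 3142, 2019]], dtype=float)

row_totals = observed_counts.sum(axis=1, keepdims=True)   # shape (2, 1)
col_totals = observed_counts.sum(axis=0, keepdims=True)   # shape (1, 5)
grand_total = observed_counts.sum()

# Expected counts if the click rate were identical for every variation
expected_counts = row_totals @ col_totals / grand_total

# Pearson chi-square statistic, degrees of freedom and p-value
chi2_manual = ((observed_counts - expected_counts) ** 2 / expected_counts).sum()
dof = (observed_counts.shape[0] - 1) * (observed_counts.shape[1] - 1)
p_manual = stats.chi2.sf(chi2_manual, dof)

print(chi2_manual, dof, p_manual)
```

The values should agree with the output of `stats.chi2_contingency`, since SciPy only applies the Yates continuity correction when there is a single degree of freedom.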


In [5]:
# define function for Chi-Square Test, returns df with results
def chi_square(contingency_table):
    chisq, pvalue, df, expected = stats.chi2_contingency(contingency_table)
    output = {'chi_square': chisq, 'p_value': pvalue, 'df': df}
    return pd.DataFrame(data = output, index = ['value'])

# define temporary variables for the while loop
tmp_chi2_res = chi2_res_full.copy()
tmp_observed = observed.copy()
tmp_pvalue = chi2_res_full.at['value', 'p_value']

# While the p-value of the chi-square test is below alpha, remove the variation
# with the lowest CTR and repeat the test on the remaining variations.
# The loop stops once the p-value is no longer below alpha, or when only one variation is left.
# Note: alpha is set to 0.1 here, which is looser than the 0.05 named in the preliminary questions.
alpha = 0.1

while tmp_pvalue < alpha:
    worst_variant = tmp_observed.loc['ctr_value'].idxmin()
    tmp_observed = tmp_observed.drop(worst_variant, axis=1)
    tmp_chi2_res = chi_square(tmp_observed.loc[['click', 'no_click']])
    tmp_pvalue = tmp_chi2_res.at['value', 'p_value']
    if len(tmp_observed.columns) == 1:
        break

# save final versions in new variables for clarity
best_variants = tmp_observed.copy()
final_chi2_res = tmp_chi2_res.copy()

# Interpretation and Evaluation

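- The last remaining variants are "Connect" and "Services". Since the loop stopped with two variants left, the chi-square test no longer finds a significant difference in CTR between them at the chosen alpha.
- Considering the extremely low drop-off and homepage-return rates of "Services", I conclude that it is the best variation and that it should be implemented, with fairly high certainty of an improvement.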

In [6]:
print(best_variants)

            CONNECT  SERVICES
click        53.000    45.000
no_click   2689.000  2019.000
ctr_value     0.019     0.022
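
The final comparison can also be rerun on its own to see how close the two remaining variants are. The sketch below is an added illustration with hypothetical variable names; the counts are copied from the table printed above. Note that for a 2x2 table `stats.chi2_contingency` applies the Yates continuity correction by default.

```python
from scipy import stats

# Rows: click / no_click; columns: CONNECT, SERVICES (values from best_variants)
final_table = [[53, 45],
               [2689, 2019]]

chi2_final, p_final, dof_final, _ = stats.chi2_contingency(final_table)
print(f"chi-square = {chi2_final:.3f}, p-value = {p_final:.3f}, df = {dof_final}")
```

A p-value above alpha here means CTR alone does not separate "Connect" from "Services", which is why the choice between them rests on the secondary metrics.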
