**1. Context**
*   Company: Udacity (in collaboration with Google)
*   Goal: Set clearer expectations to reduce dropouts without lowering net conversion, improving student experience and coaching effectiveness
*   Approach: A/B test to assess whether a time-commitment screener filters out unqualified students without reducing the number of paying users

**2. Project**

2.1 Status

  * Two choices on course page: Start Free Trial vs Access Course Materials
  * Free trial requires credit card and auto-bills after 14 days
  
2.2 Treatment

  Users clicking “Start Free Trial” are shown a time screener
  * <5 hrs/week → see a caution message, offered to view free materials instead
  * ≥5 hrs/week → proceed to enroll
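
The branching rule can be sketched as a small function. This is only an illustration: `route_trial_click` and its return labels are hypothetical names, and the 5-hour threshold is taken from the description above.

```python
def route_trial_click(hours_per_week: float, threshold: float = 5.0) -> str:
    """Return the step shown after a "Start Free Trial" click (hypothetical labels)."""
    if hours_per_week < threshold:
        # Below the threshold: caution message, with the free materials offered instead
        return "show_caution"
    # At or above the threshold: continue to enrollment as usual
    return "proceed_to_enroll"


print(route_trial_click(3))
print(route_trial_click(5))
```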

**3. Experiment Setup**

3.1 Unit of Diversion: cookie
  *   A visitor’s session, identified by a cookie, is randomly shown the control or treatment version of the page.
  *   If the user enrolls in the free trial, they are then tracked by user ID, which ensures no one can enroll multiple times.
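
The notebook does not say how cookies were assigned to groups. A common approach, assumed here purely for illustration, is to hash the cookie ID so the same cookie always lands in the same group; `assign_group` and the salt string are hypothetical.

```python
import hashlib


def assign_group(cookie_id: str, salt: str = "free-trial-screener") -> str:
    """Deterministically map a cookie to 'control' or 'experiment' (roughly 50/50)."""
    digest = hashlib.sha256(f"{salt}:{cookie_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 2  # even -> control, odd -> experiment
    return "experiment" if bucket == 1 else "control"


# Simulated cookie IDs, only to show that the split comes out close to even
groups = [assign_group(f"cookie-{i}") for i in range(10_000)]
share_experiment = groups.count("experiment") / len(groups)
print(f"Experiment share: {share_experiment:.2%}")
```

Because the assignment depends only on the cookie and the salt, a returning visitor sees the same version of the page as long as the cookie persists.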

3.2 Hypotheses

  3.2.1 Hypothesis 1: Gross Conversion Rate (Enrollments / Clicks)

  *   H₀ : The screener has no effect on the share of clicking users who enroll.
  *   H₁ : The screener reduces the share of clicking users who enroll, since low-commitment users leave.

  3.2.2 Hypothesis 2: Retention Rate (Payments / Enrollments)

  *   H₀ : The screener has no effect on number of users who stay past the trial.
  *   H₁ : The screener improves retention since better-qualified users enroll.

  3.2.3 Hypothesis 3: Net Conversion Rate (Payments / Clicks)

  *   H₀: The screener has no effect on overall paying users.
  *   H₁: The screener does affect the number of users who end up paying.
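
Stated compactly, with $d = p_{\text{exp}} - p_{\text{ctrl}}$ denoting the difference in each rate between the experiment and control groups. The symbols are notation introduced here for convenience; the directions follow the H₁ statements above (one-sided for gross conversion and retention, two-sided for net conversion).

```latex
\begin{aligned}
\text{Gross conversion:} \quad & H_0: d_{GC} = 0 & \quad & H_1: d_{GC} < 0 \\
\text{Retention:}        \quad & H_0: d_{R} = 0  & \quad & H_1: d_{R} > 0 \\
\text{Net conversion:}   \quad & H_0: d_{NC} = 0 & \quad & H_1: d_{NC} \neq 0
\end{aligned}
```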

**4. Metrics**

4.1 Invariant Metrics:

*   Cookies: Unique cookies on course overview page
*   Clicks: Clicks on Free Trial button
*   CTP: Click-through probability (Clicks / Pageviews, where Pageviews counts unique cookies on the course overview page)

4.2 Evaluation Metrics:

*   Gross Conversion: Enrollments / Clicks
*   Retention: Payments / Enrollments
*   Net Conversion: Payments / Clicks
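
The three evaluation metrics are linked: net conversion is gross conversion multiplied by retention. A short check using the baseline figures from this notebook (3,200 clicks, 660 enrollments, retention of 0.53); the variable names are illustrative.

```python
clicks = 3200
enrollments = 660
retention = 0.53  # Payments / Enrollments

gross_conversion = enrollments / clicks   # Enrollments / Clicks
payments = enrollments * retention        # implied paying users
net_conversion = payments / clicks        # Payments / Clicks

print(f"Gross conversion: {gross_conversion:.5f}")
print(f"Net conversion:   {net_conversion:.7f}")
print(f"GC x Retention:   {gross_conversion * retention:.7f}")
```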

In [52]:
from google.colab import drive
drive.mount('/content/drive')

import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.stats.proportion import proportions_ztest
from statsmodels.stats.proportion import binom_test
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize



In [53]:
baseline = {
    "Cookies": 40000,
    "Clicks": 3200,
    "Enrollments": 660,
    "CTP": 0.08,
    "Gross Conversion": 0.20625,
    "Retention": 0.53,
    "Net Conversion": 0.1093125
}

mde = {
    "Gross Conversion": 0.01,
    "Retention": 0.01,
    "Net Conversion": 0.0075
}

alpha = 0.05
power = 0.8

In [54]:
click_through_rate = baseline["CTP"]                  # Clicks / Cookies
gross_conversion_rate = baseline["Gross Conversion"]  # Enrollments / Clicks

# Define function to calculate user-level sample size
def get_user_sample_size(baseline_rate, mde):
    effect_size = proportion_effectsize(baseline_rate, baseline_rate + mde)
    analysis = NormalIndPower()
    sample_size = analysis.solve_power(effect_size=effect_size, power=power, alpha=alpha, alternative='two-sided')
    return round(sample_size)

# Convert user-level sample sizes to required cookie traffic
def scale_to_cookies(user_sample_size, conversion_chain):
    """
    Scale user-level sample size to required cookies using cumulative conversion rate.
    e.g. for Retention, divide by (CTP * GC)
    """
    return round(user_sample_size / conversion_chain)*2

# Sample size calculations
gc_user_sample = get_user_sample_size(baseline["Gross Conversion"], mde["Gross Conversion"])
re_user_sample = get_user_sample_size(baseline["Retention"], mde["Retention"])
nc_user_sample = get_user_sample_size(baseline["Net Conversion"], mde["Net Conversion"])

# Scale up to cookies
gc_cookie_sample = scale_to_cookies(gc_user_sample, click_through_rate)
re_cookie_sample = scale_to_cookies(re_user_sample, click_through_rate * gross_conversion_rate)
nc_cookie_sample = scale_to_cookies(nc_user_sample, click_through_rate)

# Output
print(f"Required cookies for Gross Conversion (for both groups): {gc_cookie_sample}")
print(f"Required cookies for Retention (for both groups): {re_cookie_sample}")
print(f"Required cookies for Net Conversion (for both groups): {nc_cookie_sample}")

Required cookies for Gross Conversion (for both groups): 653824
Required cookies for Retention (for both groups): 4733454
Required cookies for Net Conversion (for both groups): 699450
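
As a cross-check, the per-group sample size can be approximated in closed form from Cohen's h and the normal quantiles. This is a sketch of the usual normal approximation, not a statement of what `solve_power` does internally; it should land close to the statsmodels result rather than match it digit for digit. The function names are illustrative.

```python
from math import asin, sqrt
from statistics import NormalDist


def cohens_h(p1: float, p2: float) -> float:
    """Effect size for two proportions (arcsine transformation)."""
    return 2 * asin(sqrt(p2)) - 2 * asin(sqrt(p1))


def n_per_group(p_base: float, mde: float, alpha: float = 0.05, power: float = 0.8) -> float:
    """Approximate sample size per group for a two-sided, two-sample z-test."""
    h = cohens_h(p_base, p_base + mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return 2 * ((z_alpha + z_beta) / h) ** 2


n_gc = n_per_group(0.20625, 0.01)   # clicks needed per group for Gross Conversion
cookies_gc = 2 * n_gc / 0.08        # both groups, scaled by CTP = 0.08
print(f"Clicks per group: {n_gc:.0f}")
print(f"Cookies in total: {cookies_gc:.0f}")
```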


In [55]:
# Daily user volume
daily_traffic = {
    "Clicks": 3200,
    "Enrollments": 660
}

# Calculate test duration
gc_duration = round(gc_user_sample / (daily_traffic["Clicks"] / 2))  # divide by 2 → per group
re_duration = round(re_user_sample / (daily_traffic["Enrollments"] / 2))
nc_duration = round(nc_user_sample / (daily_traffic["Clicks"] / 2))

print(f"Estimated duration for Gross Conversion test: {gc_duration} days")
print(f"Estimated duration for Net Conversion test: {nc_duration} days")
print(f"Estimated duration for Retention test: {re_duration} days")

Estimated duration for Gross Conversion test: 16 days
Estimated duration for Net Conversion test: 17 days
Estimated duration for Retention test: 118 days


> Retention was excluded from this A/B test: at current traffic, reaching the required sample would take about 118 days, which is not feasible.

In [56]:
control = pd.read_csv("/content/drive/MyDrive/portfolio/a b testing udacity/control_data.csv")
experiment = pd.read_csv("/content/drive/MyDrive/portfolio/a b testing udacity/experiment_data.csv")
control.head()

Unnamed: 0,Date,Pageviews,Clicks,Enrollments,Payments
0,"Sat, Oct 11",7723,687,134.0,70.0
1,"Sun, Oct 12",9102,779,147.0,70.0
2,"Mon, Oct 13",10511,909,167.0,95.0
3,"Tue, Oct 14",9871,836,156.0,105.0
4,"Wed, Oct 15",10014,837,163.0,64.0


In [57]:
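# Totals per group. Series.sum() skips NaN by default, so Enrollments and Payments
# are summed only over the days on which they were recorded.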
clicks_control = control["Clicks"].sum()
clicks_experiment = experiment["Clicks"].sum()

enrollments_control = control["Enrollments"].sum()
enrollments_experiment = experiment["Enrollments"].sum()

payments_control = control["Payments"].sum()
payments_experiment = experiment["Payments"].sum()

pageviews_control = control["Pageviews"].sum()
pageviews_experiment = experiment["Pageviews"].sum()

In [58]:
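# Sanity Check
# Split balance: share of each total that falls in the experiment group.
# Pageviews and Clicks are invariant metrics and should sit near 50%;
# Enrollments and Payments can shift with the treatment and are shown for reference.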
clicks_ratio = clicks_experiment / (clicks_control + clicks_experiment)
enrollments_ratio = enrollments_experiment / (enrollments_control + enrollments_experiment)
payments_ratio = payments_experiment / (payments_control + payments_experiment)
pageviews_ratio = pageviews_experiment / (pageviews_control + pageviews_experiment)

print("Sanity Check: Split balance")
print(f"Clicks Split Ratio: {clicks_ratio:.2%}")
print(f"Enrollments Split Ratio: {enrollments_ratio:.2%}")
print(f"Payments Split Ratio: {payments_ratio:.2%}")
print(f"Pageviews Split Ratio: {pageviews_ratio:.2%}")

# Missing values
print("Sanity Check: Missing Values")
print("Control:")
print(control.isnull().sum())
print("\nExperiment:")
print(experiment.isnull().sum())

# Negative values (numeric columns only)
control_numeric = control.select_dtypes(include='number')
experiment_numeric = experiment.select_dtypes(include='number')

print("\n Sanity Check: Negative Values ")
print("Control:")
print((control_numeric < 0).sum())
print("\nExperiment:")
print((experiment_numeric < 0).sum())

Sanity Check: Split balance
Clicks Split Ratio: 49.95%
Enrollments Split Ratio: 47.49%
Payments Split Ratio: 48.89%
Pageviews Split Ratio: 49.94%
Sanity Check: Missing Values
Control:
Date            0
Pageviews       0
Clicks          0
Enrollments    14
Payments       14
dtype: int64

Experiment:
Date            0
Pageviews       0
Clicks          0
Enrollments    14
Payments       14
dtype: int64

 Sanity Check: Negative Values 
Control:
Pageviews      0
Clicks         0
Enrollments    0
Payments       0
dtype: int64

Experiment:
Pageviews      0
Clicks         0
Enrollments    0
Payments       0
dtype: int64
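
A split ratio near 50% can be checked more formally by asking whether the observed share falls inside a 95% interval around the expected 0.5. The sketch below uses a normal approximation to the binomial; `invariant_check` is an illustrative helper and the counts are hypothetical, not the experiment's totals.

```python
from math import sqrt


def invariant_check(n_control: int, n_experiment: int, expected_share: float = 0.5, z: float = 1.96):
    """Return (lower, upper, observed, passed) for the control share of the total."""
    total = n_control + n_experiment
    se = sqrt(expected_share * (1 - expected_share) / total)
    lower, upper = expected_share - z * se, expected_share + z * se
    observed = n_control / total
    return lower, upper, observed, lower <= observed <= upper


# Hypothetical counts for illustration only
lower, upper, observed, passed = invariant_check(10_000, 10_100)
print(f"Expected range: ({lower:.4f}, {upper:.4f}), observed: {observed:.4f}, pass: {passed}")
```

The same check applies to Pageviews and Clicks by passing each group's total.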


In [59]:
# Confirm that the days with missing Enrollments are the same in both groups
missing_days_control = control[control["Enrollments"].isnull()]["Date"]
missing_days_experiment = experiment[experiment["Enrollments"].isnull()]["Date"]
print(missing_days_control.equals(missing_days_experiment))

True


In [60]:
control_clean = control.dropna()
experiment_clean = experiment.dropna()
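
Enrollments and Payments are missing on 14 days while Clicks is recorded on every day, so a rate built from whole-file totals divides a partial numerator by a full denominator. One option is to take both totals from the rows that survive `dropna()`. The sketch below shows the difference on a hypothetical three-day frame; the numbers are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-dataset: the last day has clicks but no enrollment data yet
df = pd.DataFrame({
    "Clicks": [100, 120, 110],
    "Enrollments": [20.0, 30.0, np.nan],
    "Payments": [10.0, 12.0, np.nan],
})

complete = df.dropna()  # keep only days where every column is recorded

# Numerator covers 2 days, denominator covers 3 days
gc_all_clicks = df["Enrollments"].sum() / df["Clicks"].sum()
# Numerator and denominator cover the same 2 days
gc_matched = complete["Enrollments"].sum() / complete["Clicks"].sum()

print(f"Gross conversion, all clicks:   {gc_all_clicks:.4f}")
print(f"Gross conversion, matched days: {gc_matched:.4f}")
```

Applied to this notebook, the equivalent would be summing Clicks from `control_clean` and `experiment_clean` before forming the rates.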

In [61]:
# Z-test
# Conversion = proportion = successes / trials --> binomial distribution
# --> large sample --> binomial ≈ normal (by CLT) --> Z-test to compare two proportions

# Z-test for Gross Conversion (Enrollments / Clicks)
gc_counts = [enrollments_experiment, enrollments_control] # Number of successes in each group
gc_nobs = [clicks_experiment, clicks_control] # Number of observations/trials in each group

z_gc, p_gc = proportions_ztest(count=gc_counts, nobs=gc_nobs, alternative='smaller')

# Z-test for Net Conversion (Payments / Clicks)
nc_counts = [payments_experiment, payments_control]
nc_nobs = [clicks_experiment, clicks_control]

z_nc, p_nc = proportions_ztest(count=nc_counts, nobs=nc_nobs, alternative='two-sided')

# Results
gc_rate_ctrl = enrollments_control / clicks_control
gc_rate_exp = enrollments_experiment / clicks_experiment
gc_lift = gc_rate_exp - gc_rate_ctrl

nc_rate_ctrl = payments_control / clicks_control
nc_rate_exp = payments_experiment / clicks_experiment
nc_lift = nc_rate_exp - nc_rate_ctrl

print("Gross Conversion")
print(f"Control Rate:    {gc_rate_ctrl:.4f}")
print(f"Experiment Rate: {gc_rate_exp:.4f}")
print(f"Lift:            {gc_lift:.4%}")
print(f"Z-statistic:     {z_gc:.3f}")
print(f"p-value:         {p_gc:.5f}")
print("Significant" if p_gc < 0.05 else "Not Significant")

print("\nNet Conversion")
print(f"Control Rate:    {nc_rate_ctrl:.4f}")
print(f"Experiment Rate: {nc_rate_exp:.4f}")
print(f"Lift:            {nc_lift:.4%}")
print(f"Z-statistic:     {z_nc:.3f}")
print(f"p-value:         {p_nc:.5f}")
print("Significant" if p_nc < 0.05 else "Not Significant")

Gross Conversion
Control Rate:    0.1334
Experiment Rate: 0.1208
Lift:            -1.2531%
Z-statistic:     -4.479
p-value:         0.00000
Significant

Net Conversion
Control Rate:    0.0716
Experiment Rate: 0.0687
Lift:            -0.2973%
Z-statistic:     -1.386
p-value:         0.16581
Not Significant
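
For reference, the pooled two-proportion z-test can be reproduced by hand. This is a minimal sketch of the textbook formula, which is also the default variance choice in `proportions_ztest`; the helper name and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(x_exp: int, n_exp: int, x_ctrl: int, n_ctrl: int):
    """Return (z, p_smaller, p_two_sided) for experiment vs control."""
    p_exp, p_ctrl = x_exp / n_exp, x_ctrl / n_ctrl
    p_pool = (x_exp + x_ctrl) / (n_exp + n_ctrl)  # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_exp + 1 / n_ctrl))
    z = (p_exp - p_ctrl) / se
    p_smaller = NormalDist().cdf(z)                # H1: experiment < control
    p_two_sided = 2 * NormalDist().cdf(-abs(z))    # H1: experiment != control
    return z, p_smaller, p_two_sided


# Hypothetical counts for illustration only
z, p_smaller, p_two_sided = two_proportion_ztest(120, 1000, 150, 1000)
print(f"z = {z:.3f}, one-sided p = {p_smaller:.4f}, two-sided p = {p_two_sided:.4f}")
```

The one-sided p-value corresponds to `alternative='smaller'` (gross conversion) and the two-sided one to `alternative='two-sided'` (net conversion).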


In [62]:
# 95% Confidence Intervals
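def calc_ci(success_ctrl, n_ctrl, success_exp, n_exp, alpha=0.05):
    """Confidence interval for the difference in proportions (experiment - control)."""
    prop_ctrl = success_ctrl / n_ctrl
    prop_exp = success_exp / n_exp
    diff = prop_exp - prop_ctrl
    # Unpooled standard error of the difference
    se = np.sqrt((prop_ctrl * (1 - prop_ctrl)) / n_ctrl + (prop_exp * (1 - prop_exp)) / n_exp)
    # Critical value derived from alpha rather than a hard-coded 1.96
    z_crit = stats.norm.ppf(1 - alpha / 2)
    margin = z_crit * se
    return diff, (diff - margin, diff + margin)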

# Gross Conversion Confidence Interval
gc_diff, gc_ci = calc_ci(enrollments_control, clicks_control, enrollments_experiment, clicks_experiment)

# Net Conversion Confidence Interval
nc_diff, nc_ci = calc_ci(payments_control, clicks_control, payments_experiment, clicks_experiment)

# Print results
print("\n95% Confidence Intervals")
print(f"Gross Conversion Diff: {gc_diff:.4%}, CI: ({gc_ci[0]:.4%}, {gc_ci[1]:.4%})")
print(f"Net Conversion Diff:   {nc_diff:.4%}, CI: ({nc_ci[0]:.4%}, {nc_ci[1]:.4%})")


95% Confidence Intervals
Gross Conversion Diff: -1.2531%, CI: (-1.8013%, -0.7048%)
Net Conversion Diff:   -0.2973%, CI: (-0.7177%, 0.1232%)
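
One way to read these intervals is against the minimum detectable effects set earlier (0.01 for gross conversion, 0.0075 for net conversion). The helper below is an illustrative sketch, not part of the original analysis; it reports whether an interval excludes zero, whether it lies entirely beyond the MDE, and whether it rules out a change as large as the MDE in either direction.

```python
def read_interval(ci, mde):
    """Return (statistically_significant, practically_significant, mde_ruled_out)."""
    lower, upper = ci
    # Significant if the interval does not contain zero
    statistically_significant = lower > 0 or upper < 0
    # Practically significant only if the whole interval is at least `mde` from zero
    practically_significant = lower >= mde or upper <= -mde
    # A change of size `mde` is ruled out if the interval sits inside (-mde, mde)
    mde_ruled_out = -mde < lower and upper < mde
    return statistically_significant, practically_significant, mde_ruled_out


gc = read_interval((-0.018013, -0.007048), mde=0.01)
nc = read_interval((-0.007177, 0.001232), mde=0.0075)
print("Gross Conversion:", gc)  # (True, False, False)
print("Net Conversion:  ", nc)  # (False, False, True)
```

On these numbers the gross conversion drop is statistically significant, though its interval extends inside the 1-point MDE, and the net conversion interval stays within ±0.75 points.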


## Business Insight & Conclusion

### Gross Conversion (Enrollments / Clicks)
- Control: 13.34%, Experiment: 12.08%
- Lift: -1.25%, p-value: < 0.001
- Statistically significant drop
- Interpretation: The screener discourages low-commitment users from enrolling, which is the expected behavior.

### Net Conversion (Payments / Clicks)
- Control: 7.16%, Experiment: 6.87%
- Lift: -0.30%, p-value: 0.166
- Not statistically significant
- Interpretation: No statistically significant change in paying users was detected. The screener filters users at enrollment, and with fewer enrollments but similar payments, those who do enroll appear more likely to pay.

## Recommendation

**Launch the screener**

- It filters out less committed users
- No significant loss in paying users
- Likely improves learner experience and support efficiency

## Suggested Next Steps

1. Track long-term user outcomes (e.g. course completion, refund rate).
2. Test different screener versions (wording, threshold).
3. Run segment analysis by region, device, or traffic source.
4. Measure impact on customer support load and NPS.

**5. Conclusion**

5.1 Gross Conversion (Enrollments / Clicks)
* Control: 13.34%
* Experiment: 12.08%
* Observed Lift: –1.25%
* 95% Confidence Interval: (–1.80%, –0.70%)
* Statistical Significance: Yes (p < 0.001)
* Interpretation:
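The screener significantly reduces gross conversion by filtering out less-committed users. The entire confidence interval lies below zero, so the decrease is statistically significant, and the observed drop of 1.25 percentage points exceeds the 1-point minimum detectable effect, although the interval's upper bound (–0.70%) does not.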

5.2 Net Conversion (Payments / Clicks)

* Control: 7.16%
* Experiment: 6.87%
* Observed Lift: –0.30%
* 95% Confidence Interval: (–0.72%, +0.12%)
* Statistical Significance: No (p = 0.166)
* Interpretation: There is no statistically significant evidence that the screener impacts the number of paying users. The confidence interval includes both slight loss and slight gain, so the true effect may be neutral or very small in either direction.

**6. Recommendation**

Proceed with deploying the screener.

It reduces enrollment by unqualified users without a statistically significant reduction in paying customers. This improves overall lead quality, reduces coaching waste, and aligns with the intended business objective: clearer user expectations and stronger retention.

**7. Suggested Next Steps**

*   Monitor downstream metrics: course completion rate, refund rate, and customer satisfaction.
*   Test alternative screener designs: try adjusting the time threshold or improving the message tone.
*   Perform segmented analysis: evaluate whether the screener performs differently by device type, region, or acquisition channel.
*   Track long-term user value: assess if retained users under the screener model contribute more over time.