# E-commerce Landing Page A/B Test Analysis
## Combining Frequentist, Bayesian, and Power Analysis for Decision Making

**Business question:** Does a new landing page design increase conversion rates compared to the current design?

**Analytical approach:**
1. Frequentist Z-test with confidence intervals
2. Power analysis: what effect sizes could this test detect?
3. Bayesian inference: what is the probability each variant is better?
4. Segmented analysis with multiple testing correction
5. Business impact quantification

**Verdict:** The new page shows no statistically or practically significant improvement. Control conversion: 12.04%, Treatment: 11.88% (p=0.19). Recommendation: **keep current design**.

In [None]:
import sys, os
sys.path.insert(0, os.path.join(os.getcwd(), '..'))

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

from src.data_loader import load_ab_data, load_countries, clean_ab_data, get_cleaning_summary
from src.database import create_database, run_query, QUERIES
from src.statistics import (
    z_test, cohens_h, interpret_effect_size,
    power_analysis, bayesian_ab_test,
    segment_analysis,
)
from src.visualizations import (
    plot_conversion_rates, plot_power_curve,
    plot_bayesian_posteriors, plot_segment_results,
    plot_daily_conversion_trend,
)

sns.set_palette('husl')
%matplotlib inline
np.random.seed(42)

## 1. Data Loading & Cleaning

In [None]:
raw = load_ab_data()
countries = load_countries()
df = clean_ab_data(raw, countries)

summary = get_cleaning_summary(len(raw), len(df))
for k, v in summary.items():
    print(f'{k:20s}: {v:,}' if isinstance(v, int) else f'{k:20s}: {v}%')

print(f'\nDate range: {df.timestamp.min().date()} to {df.timestamp.max().date()}')
print(f'Countries: {df.country.value_counts().to_dict()}')

## 2. SQL-Based Exploration

Loading data into SQLite to demonstrate relational analytics: daily funnels, running totals, and time-based patterns.
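
The queries in `QUERIES` are defined in `src.database` and are not shown in this notebook. As a rough illustration of the "daily funnel plus running total" pattern, here is a minimal sketch against an in-memory SQLite table. The table name `events`, its columns, and the toy rows are assumptions made for the example, not the project's schema or data.

```python
import sqlite3

# Toy rows for illustration only: (day, group, converted)
rows = [
    ("2017-01-02", "control", 1),
    ("2017-01-02", "control", 0),
    ("2017-01-02", "treatment", 0),
    ("2017-01-03", "control", 1),
    ("2017-01-03", "treatment", 1),
    ("2017-01-03", "treatment", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (day TEXT, grp TEXT, converted INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)

# Step 1 (CTE): aggregate to one row per day and group.
# Step 2: a window function accumulates conversions over time within each group.
query = """
WITH daily AS (
    SELECT day, grp, COUNT(*) AS visitors, SUM(converted) AS conversions
    FROM events
    GROUP BY day, grp
)
SELECT day, grp, visitors, conversions,
       SUM(conversions) OVER (PARTITION BY grp ORDER BY day) AS running_conversions
FROM daily
ORDER BY grp, day
"""
running = conn.execute(query).fetchall()
conn.close()

for row in running:
    print(row)
```

Window functions require SQLite 3.25 or newer, which recent Python builds normally bundle.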

In [None]:
db_path = create_database(df)
print(f'Database: {db_path}')

In [None]:
# Daily conversion funnel (GROUP BY + DATE function)
daily = run_query(QUERIES['daily_conversion_funnel'])
display(daily.head(10))

In [None]:
# Country performance with traffic share (window function)
display(run_query(QUERIES['country_performance']))

In [None]:
# Cumulative conversions with running totals (cumulative window)
cumulative = run_query(QUERIES['cumulative_conversions'])
display(cumulative.tail(10))

In [None]:
# Day-of-week analysis (STRFTIME + CASE)
display(run_query(QUERIES['day_of_week_analysis']))

In [None]:
plot_daily_conversion_trend(daily)

## 3. Exploratory Analysis

In [None]:
conv_by_group = df.groupby('group').agg(
    visitors=('user_id', 'count'),
    conversions=('converted', 'sum'),
    conversion_rate=('converted', 'mean'),
).round(4)
display(conv_by_group)

In [None]:
plot_conversion_rates(df)

## 4. Frequentist Hypothesis Test (Z-Test)

**H0:** Treatment conversion rate = Control conversion rate  
**H1:** Treatment conversion rate != Control conversion rate  
**alpha:** 0.05
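
The `z_test` helper used below lives in `src.statistics` and its implementation is not shown here. For readers who want to see the mechanics, the following is a minimal sketch of a standard two-sided, two-proportion z-test. The function name, argument names, and return keys are assumptions for illustration and may differ from the project's helper.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_ctrl, n_ctrl, conv_treat, n_treat, alpha=0.05):
    """Two-sided z-test for a difference in conversion rates (treatment - control)."""
    p_ctrl = conv_ctrl / n_ctrl
    p_treat = conv_treat / n_treat
    diff = p_treat - p_ctrl

    # Under H0 both groups share one rate, so the test statistic uses the pooled estimate.
    p_pool = (conv_ctrl + conv_treat) / (n_ctrl + n_treat)
    se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n_ctrl + 1 / n_treat))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # The confidence interval does not assume H0, so it uses the unpooled standard error.
    se_unpooled = math.sqrt(
        p_ctrl * (1 - p_ctrl) / n_ctrl + p_treat * (1 - p_treat) / n_treat
    )
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return {
        "absolute_diff": diff,
        "z_stat": z,
        "p_value": p_value,
        "ci_lower": diff - z_crit * se_unpooled,
        "ci_upper": diff + z_crit * se_unpooled,
    }


# Illustrative counts only (not the notebook's data).
example = two_proportion_z_test(conv_ctrl=100, n_ctrl=1000, conv_treat=130, n_treat=1000)
print(example)
```

H0 is rejected at level alpha when `p_value < alpha`, which for a two-sided test corresponds closely to the confidence interval excluding zero.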

In [None]:
results = z_test(df)
print('Z-Test Results:')
print('-' * 50)
for k, v in results.items():
    if 'conv' in k or 'diff' in k or 'ci' in k:
        print(f'{k:25s}: {v:.4%}' if isinstance(v, float) and abs(v) < 1 else f'{k:25s}: {v}')
    else:
        print(f'{k:25s}: {v}')

h = cohens_h(results['treatment_conv'], results['control_conv'])
print(f"\nCohen's h: {h} ({interpret_effect_size(h)})")
print(f'\nConclusion: {"Reject H0 - significant difference" if results["p_value"] < 0.05 else "Fail to reject H0 - no significant difference"}')
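
Cohen's h, reported in the cell above, is the difference between arcsine-transformed proportions. A minimal sketch of the standard formula follows. The function names are hypothetical, and the interpretation thresholds (0.2, 0.5, 0.8) are Cohen's conventional cut-offs, which may not match what `interpret_effect_size` uses.

```python
import math


def cohens_h_sketch(p_treat, p_ctrl):
    """Cohen's h: difference of arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p_treat)) - 2 * math.asin(math.sqrt(p_ctrl))


def interpret_h(h):
    """Label |h| using Cohen's conventional thresholds (an assumption here)."""
    size = abs(h)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# Rates taken from the verdict at the top of this notebook.
h_example = cohens_h_sketch(p_treat=0.1188, p_ctrl=0.1204)
print(f"h = {h_example:.4f} ({interpret_h(h_example)})")
```

The transformation stabilises the variance of a proportion, so the same h means roughly the same detectability whether the baseline rate is 5% or 50%.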

## 5. Power Analysis

A critical question: **was this test large enough to detect a meaningful difference?** Power analysis answers this by computing the minimum detectable effect (MDE) given the sample size, and the sample size required for a target MDE.
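
Both quantities can be approximated with the usual normal-approximation formulas for a two-sided test. The sketch below is one common way to compute them, assuming equal group sizes, alpha = 0.05, and 80% power. The function names are hypothetical, and the `power_analysis` helper in `src.statistics` may use different defaults or a different approximation.

```python
import math
from statistics import NormalDist


def detectable_mde(baseline_rate, n_per_group, alpha=0.05, power=0.80):
    """Approximate absolute MDE for a two-sided test with equal group sizes."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Standard error of the difference, using the baseline variance for both groups.
    se = math.sqrt(2 * baseline_rate * (1 - baseline_rate) / n_per_group)
    return (z_alpha + z_beta) * se


def required_n_per_group(baseline_rate, mde, alpha=0.05, power=0.80):
    """Approximate users needed per group to detect an absolute lift of `mde`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p2 = baseline_rate + mde
    variance = baseline_rate * (1 - baseline_rate) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde ** 2)


# Illustrative inputs only: a 12% baseline and a hypothetical 100,000 users per group.
print(f"MDE: {detectable_mde(0.12, 100_000):.4%}")
print(f"n per group for a 1 pp lift: {required_n_per_group(0.12, 0.01):,}")
```

Because the required sample size scales with 1/mde², halving the target effect roughly quadruples the number of users needed.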

In [None]:
baseline = results['control_conv']
n_per_group = len(df[df['group'] == 'control'])

# What MDE can this test detect?
pa_actual = power_analysis(baseline_rate=baseline, n_per_group=n_per_group)
print(f'With {n_per_group:,} users per group and baseline rate {baseline:.4%}:')
print(f'  Minimum detectable effect: {pa_actual["detectable_mde"]:.4%} absolute')
print(f'  Relative MDE: {pa_actual["detectable_mde_pct"]:.2f}%')
print(f'  Observed difference: {results["absolute_diff"]:.4%}')
below_mde = abs(results["absolute_diff"]) < pa_actual["detectable_mde"]
print(f'  -> Observed difference is {"below" if below_mde else "at or above"} the MDE; the test is powered to detect effects of at least the MDE')

In [None]:
# How many users per group are needed to detect absolute lifts of 0.5, 1, and 2 percentage points?
for mde_pct in [0.5, 1.0, 2.0]:
    pa = power_analysis(baseline_rate=baseline, mde=mde_pct/100)
    print(f'To detect {mde_pct}% absolute lift: {pa["required_n_per_group"]:>10,} per group ({pa["required_n_total"]:>10,} total)')

In [None]:
plot_power_curve(baseline, n_per_group)

## 6. Bayesian A/B Test

Bayesian analysis provides direct probabilities: "What is the probability that the treatment is better?" This is often more intuitive for stakeholders than p-values.

We use a Beta-Binomial conjugate model with uninformative Beta(1,1) priors.
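
With a Beta(1,1) prior and binomial data, each group's posterior is Beta(1 + conversions, 1 + non-conversions), so the comparison reduces to sampling from two Beta distributions. The following is a minimal sketch of that idea using only the standard library. The function name and return keys are assumptions, and the project's `bayesian_ab_test` may sample differently (for example with NumPy) or return additional fields.

```python
import random


def beta_binomial_ab(n_ctrl, conv_ctrl, n_treat, conv_treat, n_samples=20_000, seed=42):
    """Monte Carlo comparison of two Beta posteriors under Beta(1, 1) priors."""
    rng = random.Random(seed)

    # Conjugate update: Beta(1, 1) prior + binomial likelihood.
    a_c, b_c = 1 + conv_ctrl, 1 + n_ctrl - conv_ctrl
    a_t, b_t = 1 + conv_treat, 1 + n_treat - conv_treat

    # Each draw is one plausible (treatment - control) difference in conversion rate.
    diffs = [
        rng.betavariate(a_t, b_t) - rng.betavariate(a_c, b_c)
        for _ in range(n_samples)
    ]

    return {
        "prob_treatment_better": sum(d > 0 for d in diffs) / n_samples,
        "expected_diff_mean": sum(diffs) / n_samples,
        # Expected loss: average conversion rate given up if the chosen variant is worse.
        "expected_loss_if_choose_treatment": sum(max(-d, 0.0) for d in diffs) / n_samples,
        "expected_loss_if_choose_control": sum(max(d, 0.0) for d in diffs) / n_samples,
    }


# Illustrative counts only (not the notebook's data).
tie = beta_binomial_ab(n_ctrl=1000, conv_ctrl=100, n_treat=1000, conv_treat=100)
clear_win = beta_binomial_ab(n_ctrl=1000, conv_ctrl=100, n_treat=1000, conv_treat=200)
print(tie["prob_treatment_better"], clear_win["prob_treatment_better"])
```

Expected loss complements the win probability: a variant can be unlikely to be better and still carry a tiny expected loss when the plausible differences are all close to zero.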

In [None]:
n_ctrl = len(df[df['group'] == 'control'])
conv_ctrl = int(df[df['group'] == 'control']['converted'].sum())
n_treat = len(df[df['group'] == 'treatment'])
conv_treat = int(df[df['group'] == 'treatment']['converted'].sum())

bayes = bayesian_ab_test(n_ctrl, conv_ctrl, n_treat, conv_treat)

print('Bayesian Analysis Results:')
print('-' * 50)
print(f'P(Treatment > Control):    {bayes["prob_treatment_better"]:.1%}')
print(f'P(Control > Treatment):    {bayes["prob_control_better"]:.1%}')
print(f'Expected difference:       {bayes["expected_diff_mean"]:.4%}')
print(f'95% Credible Interval:     [{bayes["expected_diff_ci_2.5"]:.4%}, {bayes["expected_diff_ci_97.5"]:.4%}]')
print(f'Expected loss (treatment): {bayes["expected_loss_if_choose_treatment"]:.4%}')
print(f'Expected loss (control):   {bayes["expected_loss_if_choose_control"]:.4%}')

In [None]:
plot_bayesian_posteriors(bayes['posterior_ctrl_samples'], bayes['posterior_treat_samples'])

## 7. Segmented Analysis with Multiple Testing Correction

When analyzing multiple subgroups (countries), the risk of false positives increases. Bonferroni correction divides the significance threshold by the number of tests: with 3 countries, alpha becomes 0.05/3 ≈ 0.017.
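
The correction itself is a one-line adjustment. The sketch below applies it to a set of made-up p-values. The segment labels and values are purely illustrative, and `segment_analysis` in `src.statistics` may structure its output differently.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the adjusted threshold and a per-segment significance flag."""
    adjusted_alpha = alpha / len(p_values)
    flags = {segment: p < adjusted_alpha for segment, p in p_values.items()}
    return adjusted_alpha, flags


# Hypothetical p-values for three segments (not the notebook's results).
toy_p_values = {"segment_a": 0.010, "segment_b": 0.030, "segment_c": 0.400}
adjusted_alpha, flags = bonferroni(toy_p_values)

print(f"adjusted alpha: {adjusted_alpha:.4f}")
for segment, significant in flags.items():
    print(f"{segment}: p={toy_p_values[segment]:.3f} significant={significant}")
```

In this toy input, `segment_b` would pass an uncorrected 0.05 threshold but fails the corrected one, which is exactly the kind of borderline result the correction is meant to filter out.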

In [None]:
country_results = segment_analysis(df, 'country')
display(country_results[['segment', 'n_total', 'control_conv', 'treatment_conv',
                          'absolute_diff', 'p_value', 'bonferroni_significant', 'adjusted_alpha']])

In [None]:
plot_segment_results(country_results)

No country shows a statistically significant difference after Bonferroni correction.

## 8. Business Impact

In [None]:
monthly_visitors = 100_000
revenue_per_conversion = 100

monthly_impact = monthly_visitors * results['absolute_diff'] * revenue_per_conversion
yearly_impact = monthly_impact * 12

print('Business Impact (assuming 100K monthly visitors, $100/conversion):')
print(f'  Monthly revenue impact: ${monthly_impact:,.2f}')
print(f'  Yearly revenue impact:  ${yearly_impact:,.2f}')
direction = 'lose' if yearly_impact < 0 else 'gain'
print(f'\n  -> At the observed point estimate, adopting the new page would {direction} ~${abs(yearly_impact):,.0f}/year')
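
The figure above is a point estimate. One way to convey its uncertainty is to push an interval on the conversion-rate difference through the same arithmetic. The sketch below does this under the same traffic and revenue assumptions as the cell above. The function name is hypothetical and the interval bounds are placeholders, not the notebook's confidence interval.

```python
def yearly_revenue_range(diff_low, diff_high, monthly_visitors=100_000,
                         revenue_per_conversion=100):
    """Map an interval on the absolute conversion difference to yearly revenue."""
    yearly_scale = monthly_visitors * revenue_per_conversion * 12
    return diff_low * yearly_scale, diff_high * yearly_scale


# Placeholder bounds for illustration; substitute the interval reported by the z-test.
low, high = yearly_revenue_range(diff_low=-0.004, diff_high=0.001)
print(f"Yearly revenue impact range: ${low:,.0f} to ${high:,.0f}")
```

An interval that spans zero means the data are compatible with both a loss and a gain, which is worth stating alongside any single dollar figure.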

## 9. Conclusions & Recommendations

### Statistical Evidence
| Method | Result | Conclusion |
|--------|--------|------------|
| Z-test | p=0.19, CI includes 0 | Not significant |
| Cohen's h | -0.005 (negligible) | No practical effect |
| Power analysis | MDE ~0.3%, observed -0.16% | Powered for lifts of ~0.3 pp or more; observed difference is below the MDE |
| Bayesian | P(treatment better) ~37% | Control likely better |
| Country segments | None significant (Bonferroni) | Consistent across markets |

### Business Recommendation
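**Keep the current landing page.** The new design performs slightly worse on the point estimate, and no statistical framework shows evidence of an improvement. At the observed difference, adopting it would put roughly $189K in annual revenue at risk (assuming 100K monthly visitors and $100 per conversion), although that difference is not statistically significant.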

### Next Steps
1. Test more substantial design changes (bolder CTAs, different layouts)
2. Run qualitative research (user interviews, heatmaps, session recordings)
3. Try multivariate testing to isolate specific page elements
4. Consider segmenting by user behavior (new vs returning) for targeted optimization
5. Implement continuous monitoring dashboards for ongoing conversion tracking