# Mini-Project: Analyze an Experiment

**Module: Descriptive & Inferential Statistics**

## Project Overview

You've been hired as a data analyst at **TechMart**, an e-commerce company. The product team recently ran an A/B test on their checkout flow and needs you to analyze the results and present findings to stakeholders.

### The Experiment
- **Control (A)**: Original checkout - single long form
- **Treatment (B)**: New checkout - multi-step wizard with progress bar
- **Duration**: 2 weeks
- **Primary metric**: Checkout completion rate
- **Secondary metrics**: Average order value, time to complete checkout

### Your Task
1. Explore and clean the data
2. Calculate key metrics for both groups
3. Test for statistical significance
4. Analyze by user segments
5. Write a recommendation report

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

np.random.seed(42)

---
## Part 1: Load and Explore the Data

In [None]:
# Generate realistic experiment data
n_users = 15000

# User assignment
experiment_data = pd.DataFrame({
    'user_id': range(1, n_users + 1),
    'variant': np.random.choice(['control', 'treatment'], n_users),
    'device': np.random.choice(['mobile', 'desktop', 'tablet'], n_users, p=[0.55, 0.35, 0.10]),
    'user_type': np.random.choice(['new', 'returning'], n_users, p=[0.4, 0.6]),
    'traffic_source': np.random.choice(['organic', 'paid', 'email', 'social'], n_users, p=[0.35, 0.30, 0.20, 0.15])
})

# Generate checkout completion based on variant and other factors
def generate_completion(row):
    # Base rates
    base_rate = 0.28 if row['variant'] == 'control' else 0.32
    
    # Device effects
    if row['device'] == 'mobile':
        base_rate *= 0.85  # Mobile converts worse
    elif row['device'] == 'desktop':
        base_rate *= 1.1   # Desktop converts better
    
    # User type effects
    if row['user_type'] == 'returning':
        base_rate *= 1.15  # Returning users convert better
    
    # Treatment helps mobile more
    if row['variant'] == 'treatment' and row['device'] == 'mobile':
        base_rate *= 1.08  # Extra boost for mobile
    
    return np.random.binomial(1, min(base_rate, 0.95))

experiment_data['completed_checkout'] = experiment_data.apply(generate_completion, axis=1)

# Generate order value for those who completed
def generate_order_value(row):
    if row['completed_checkout'] == 0:
        return np.nan
    base = 85 if row['variant'] == 'control' else 82  # Treatment slightly lower AOV
    if row['user_type'] == 'returning':
        base *= 1.2
    return max(15, np.random.normal(base, 35))

experiment_data['order_value'] = experiment_data.apply(generate_order_value, axis=1)

# Generate time to checkout (seconds)
def generate_checkout_time(row):
    if row['completed_checkout'] == 0:
        return np.nan
    base = 180 if row['variant'] == 'control' else 210  # Wizard takes longer
    if row['device'] == 'mobile':
        base *= 1.2
    return max(30, np.random.normal(base, 50))

experiment_data['checkout_time_seconds'] = experiment_data.apply(generate_checkout_time, axis=1)

print(f"Dataset shape: {experiment_data.shape}")
experiment_data.head(10)

In [None]:
# TODO: Check the data types and missing values



In [None]:
# TODO: Check the distribution of users across variants
# Is the split roughly 50/50? This is important for a valid test.



In [None]:
# TODO: Check if the randomization was balanced
# Compare the distribution of device, user_type, and traffic_source across variants
# Any major imbalances could bias results



---
## Part 2: Calculate Key Metrics

In [None]:
# TODO: Calculate the overall checkout completion rate for each variant
# What's the absolute and relative difference?



In [None]:
# TODO: Calculate average order value for each variant
# Only include users who completed checkout



In [None]:
# TODO: Calculate average checkout time for each variant



In [None]:
# TODO: Create a summary table with all key metrics



---
## Part 3: Statistical Significance Testing

In [None]:
# Helper function for proportion test
def proportion_test(successes_a, total_a, successes_b, total_b):
    """Z-test for comparing two proportions."""
    p_a = successes_a / total_a
    p_b = successes_b / total_b
    p_pooled = (successes_a + successes_b) / (total_a + total_b)
    
    se = np.sqrt(p_pooled * (1 - p_pooled) * (1/total_a + 1/total_b))
    z = (p_b - p_a) / se
    # Survival function keeps precision for large |z|, where 1 - cdf rounds to 0
    p_value = 2 * stats.norm.sf(abs(z))
    
    # CI for difference
    se_diff = np.sqrt(p_a*(1-p_a)/total_a + p_b*(1-p_b)/total_b)
    ci = ((p_b - p_a) - 1.96*se_diff, (p_b - p_a) + 1.96*se_diff)
    
    return {'z': z, 'p_value': p_value, 'ci': ci}
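
As a sanity check on the helper above, the same arithmetic can be worked through on round numbers. The sketch below uses hypothetical counts (2,100 of 7,500 control completions and 2,400 of 7,500 treatment completions, not taken from the dataset) and only the standard library. It also checks that the squared pooled z-statistic matches the Pearson chi-square statistic for the 2x2 table, which is one way to cross-check a result.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical counts for illustration only (not drawn from experiment_data)
successes_a, total_a = 2100, 7500   # control: 28.0%
successes_b, total_b = 2400, 7500   # treatment: 32.0%

p_a = successes_a / total_a
p_b = successes_b / total_b
p_pooled = (successes_a + successes_b) / (total_a + total_b)

# Pooled standard error is used for the test (it assumes H0: p_a == p_b)
se_pooled = sqrt(p_pooled * (1 - p_pooled) * (1 / total_a + 1 / total_b))
z = (p_b - p_a) / se_pooled
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Unpooled standard error is used for the confidence interval
se_diff = sqrt(p_a * (1 - p_a) / total_a + p_b * (1 - p_b) / total_b)
z_crit = NormalDist().inv_cdf(0.975)  # about 1.96
ci = (p_b - p_a - z_crit * se_diff, p_b - p_a + z_crit * se_diff)

# Cross-check: Pearson chi-square on the 2x2 table should equal z squared
observed = [
    [successes_a, total_a - successes_a],
    [successes_b, total_b - successes_b],
]
row_totals = [total_a, total_b]
col_totals = [observed[0][0] + observed[1][0], observed[0][1] + observed[1][1]]
grand_total = total_a + total_b
chi2 = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / grand_total) ** 2
    / (row_totals[i] * col_totals[j] / grand_total)
    for i in range(2) for j in range(2)
)

print(f"z = {z:.3f}, p-value = {p_value:.2e}")
print(f"95% CI for difference: ({ci[0]*100:.2f} pp, {ci[1]*100:.2f} pp)")
print(f"z^2 = {z**2:.3f}, chi-square = {chi2:.3f}")
```

Note that the test and the interval use different standard errors: the pooled one assumes no difference between groups, while the unpooled one does not.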

In [None]:
# TODO: Test if the checkout completion rate difference is statistically significant



In [None]:
# TODO: Test if the average order value difference is significant
# Use a two-sample t-test



In [None]:
# TODO: Test if the checkout time difference is significant



---
## Part 4: Segment Analysis

In [None]:
# TODO: Break down checkout completion rate by device type
# Which device benefits most from the new checkout?



In [None]:
# TODO: Run significance tests for each device segment



In [None]:
# TODO: Break down by user type (new vs returning)
# Does the treatment work equally well for both?



In [None]:
# TODO: Create a visualization comparing conversion rates across segments



---
## Part 5: Business Impact Analysis

In [None]:
# Business context:
# - TechMart gets 500,000 checkout attempts per month
# - Current checkout completion rate matches control group
# - Average order value as observed

monthly_checkout_attempts = 500000

# TODO: Calculate the projected monthly impact of rolling out the treatment
# - Additional completed checkouts
# - Additional revenue (accounting for potentially lower AOV)
# - Include confidence intervals



In [None]:
# TODO: Calculate the annualized impact



---
## Part 6: Write Your Recommendation

Based on your analysis, write a brief report (in markdown cells below) that includes:

1. **Executive Summary** (2-3 sentences)
2. **Key Findings** (bullet points)
3. **Statistical Evidence** (summary of significance tests)
4. **Recommendation** (clear action to take)
5. **Caveats and Next Steps** (limitations, what to monitor)

### Your Report

**Executive Summary:**

[Write here]

**Key Findings:**

[Write here]

**Statistical Evidence:**

[Write here]

**Recommendation:**

[Write here]

**Caveats and Next Steps:**

[Write here]

---
## Example Solutions

Below are example solutions. Try to complete the exercises on your own first!

In [None]:
# Part 1 Solutions

# Data info
print("Data Types:")
print(experiment_data.dtypes)
print("\nMissing Values:")
print(experiment_data.isnull().sum())

In [None]:
# Variant distribution
print("Variant Distribution:")
print(experiment_data['variant'].value_counts())
print(f"\nSplit ratio: {experiment_data['variant'].value_counts(normalize=True).round(3).to_dict()}")
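
Eyeballing the split works for a first pass, but "roughly 50/50" can also be tested. The sketch below is one possible sample ratio mismatch check: a chi-square goodness-of-fit test with one degree of freedom, written with the standard library only. The function name `srm_check` and the counts are hypothetical, chosen for illustration.

```python
from math import erfc, sqrt

def srm_check(n_control, n_treatment, expected_share=0.5):
    """Chi-square goodness-of-fit test (1 df) against the planned split."""
    total = n_control + n_treatment
    exp_control = total * expected_share
    exp_treatment = total * (1 - expected_share)
    chi2 = ((n_control - exp_control) ** 2 / exp_control
            + (n_treatment - exp_treatment) ** 2 / exp_treatment)
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2))
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts for illustration
chi2_ok, p_ok = srm_check(7450, 7550)     # ordinary random variation
chi2_bad, p_bad = srm_check(7200, 7800)   # imbalance worth investigating

print(f"7450 vs 7550: chi2={chi2_ok:.2f}, p={p_ok:.3f}")
print(f"7200 vs 7800: chi2={chi2_bad:.2f}, p={p_bad:.2e}")
```

In this notebook the two counts would come from `experiment_data['variant'].value_counts()`. A very small p-value suggests the assignment mechanism deserves a closer look before trusting the results.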

In [None]:
# Randomization balance check
print("Device distribution by variant:")
print(pd.crosstab(experiment_data['variant'], experiment_data['device'], normalize='index').round(3))

print("\nUser type distribution by variant:")
print(pd.crosstab(experiment_data['variant'], experiment_data['user_type'], normalize='index').round(3))

print("\nTraffic source distribution by variant:")
print(pd.crosstab(experiment_data['variant'], experiment_data['traffic_source'], normalize='index').round(3))

In [None]:
# Part 2 Solutions

# Overall metrics
control = experiment_data[experiment_data['variant'] == 'control']
treatment = experiment_data[experiment_data['variant'] == 'treatment']

control_rate = control['completed_checkout'].mean()
treatment_rate = treatment['completed_checkout'].mean()

print("=== CHECKOUT COMPLETION ===")
print(f"Control: {control_rate*100:.2f}%")
print(f"Treatment: {treatment_rate*100:.2f}%")
print(f"Absolute lift: {(treatment_rate - control_rate)*100:.2f} pp")
print(f"Relative lift: {(treatment_rate - control_rate)/control_rate*100:.1f}%")

In [None]:
# AOV
control_aov = control['order_value'].dropna().mean()
treatment_aov = treatment['order_value'].dropna().mean()

print("\n=== AVERAGE ORDER VALUE ===")
print(f"Control: ${control_aov:.2f}")
print(f"Treatment: ${treatment_aov:.2f}")
print(f"Difference: ${treatment_aov - control_aov:.2f}")

In [None]:
# Checkout time
control_time = control['checkout_time_seconds'].dropna().mean()
treatment_time = treatment['checkout_time_seconds'].dropna().mean()

print("\n=== CHECKOUT TIME ===")
print(f"Control: {control_time:.0f} seconds ({control_time/60:.1f} min)")
print(f"Treatment: {treatment_time:.0f} seconds ({treatment_time/60:.1f} min)")
print(f"Difference: {treatment_time - control_time:.0f} seconds")

In [None]:
# Summary table
summary = pd.DataFrame({
    'Metric': ['Checkout Rate', 'Avg Order Value', 'Checkout Time (sec)'],
    'Control': [f"{control_rate*100:.2f}%", f"${control_aov:.2f}", f"{control_time:.0f}"],
    'Treatment': [f"{treatment_rate*100:.2f}%", f"${treatment_aov:.2f}", f"{treatment_time:.0f}"],
    'Difference': [
        f"{(treatment_rate-control_rate)*100:+.2f} pp",
        f"${treatment_aov-control_aov:+.2f}",
        f"{treatment_time-control_time:+.0f}"
    ]
})
print(summary.to_string(index=False))

In [None]:
# Part 3 Solutions

# Checkout completion significance
conv_result = proportion_test(
    control['completed_checkout'].sum(), len(control),
    treatment['completed_checkout'].sum(), len(treatment)
)

print("=== CHECKOUT RATE SIGNIFICANCE ===")
print(f"Z-statistic: {conv_result['z']:.3f}")
print(f"P-value: {conv_result['p_value']:.6f}")
print(f"95% CI for difference: ({conv_result['ci'][0]*100:.2f}%, {conv_result['ci'][1]*100:.2f}%)")
print(f"Significant at α=0.05: {conv_result['p_value'] < 0.05}")

In [None]:
# AOV significance
# Welch's t-test (equal_var=False) does not assume equal variances across groups
t_aov, p_aov = stats.ttest_ind(
    treatment['order_value'].dropna(),
    control['order_value'].dropna(),
    equal_var=False
)

print("\n=== AOV SIGNIFICANCE ===")
print(f"T-statistic: {t_aov:.3f}")
print(f"P-value: {p_aov:.4f}")
print(f"Significant at α=0.05: {p_aov < 0.05}")

In [None]:
# Checkout time significance
# Welch's t-test (equal_var=False) does not assume equal variances across groups
t_time, p_time = stats.ttest_ind(
    treatment['checkout_time_seconds'].dropna(),
    control['checkout_time_seconds'].dropna(),
    equal_var=False
)

print("\n=== CHECKOUT TIME SIGNIFICANCE ===")
print(f"T-statistic: {t_time:.3f}")
print(f"P-value: {p_time:.6f}")
print(f"Significant at α=0.05: {p_time < 0.05}")

In [None]:
# Part 4 Solutions

# By device
device_analysis = experiment_data.groupby(['device', 'variant'])['completed_checkout'].agg(['sum', 'count', 'mean'])
device_analysis.columns = ['completions', 'users', 'rate']
device_analysis['rate_pct'] = device_analysis['rate'] * 100
print("Conversion Rate by Device:")
print(device_analysis.round(4))
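
Segment-level rates rest on far fewer users than the overall rates (tablet is only about 10% of traffic), so it can help to put an interval around each rate before comparing them. The sketch below uses the Wilson score interval as one reasonable choice. The helper name `wilson_interval` and the example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Hypothetical segment: 225 completions out of 750 tablet users
low, high = wilson_interval(225, 750)
print(f"Rate: {225/750:.1%}, 95% CI: ({low:.1%}, {high:.1%})")
```

Applied to the table above, the inputs would be the `completions` and `users` columns for each device and variant pair.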

In [None]:
# Significance by device
print("\nSignificance Tests by Device:")
for device in ['mobile', 'desktop', 'tablet']:
    ctrl = experiment_data[(experiment_data['variant'] == 'control') & (experiment_data['device'] == device)]
    treat = experiment_data[(experiment_data['variant'] == 'treatment') & (experiment_data['device'] == device)]
    
    result = proportion_test(
        ctrl['completed_checkout'].sum(), len(ctrl),
        treat['completed_checkout'].sum(), len(treat)
    )
    
    lift = (treat['completed_checkout'].mean() - ctrl['completed_checkout'].mean()) / ctrl['completed_checkout'].mean() * 100
    print(f"\n{device.capitalize()}:")
    print(f"  Relative lift: {lift:.1f}%")
    print(f"  P-value: {result['p_value']:.4f}")
    print(f"  Significant: {result['p_value'] < 0.05}")
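
Running one test per segment raises the chance that at least one comes out "significant" by luck alone. One common remedy, sketched here as an optional extra rather than part of the original exercise, is the Holm step-down adjustment. The function name `holm_adjust` and the raw p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values, keyed like the input dict."""
    ordered = sorted(p_values.items(), key=lambda kv: kv[1])
    m = len(ordered)
    adjusted = {}
    running_max = 0.0
    for rank, (name, p) in enumerate(ordered):
        # Scale the k-th smallest p-value by (m - k), then keep values monotone
        candidate = min(1.0, (m - rank) * p)
        running_max = max(running_max, candidate)
        adjusted[name] = running_max
    return adjusted

# Hypothetical raw p-values for the three device segments
raw = {'mobile': 0.0004, 'desktop': 0.030, 'tablet': 0.200}
adjusted = holm_adjust(raw)
for device, p_adj in adjusted.items():
    print(f"{device}: raw={raw[device]:.4f}, adjusted={p_adj:.4f}, "
          f"significant={p_adj < 0.05}")
```

With these made-up inputs, the desktop result would pass at α=0.05 before adjustment but not after, which is the kind of borderline case the correction is meant to flag.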

In [None]:
# Visualization
pivot = experiment_data.groupby(['device', 'variant'])['completed_checkout'].mean().unstack()

fig, ax = plt.subplots(figsize=(10, 5))
x = np.arange(len(pivot.index))
width = 0.35

ax.bar(x - width/2, pivot['control']*100, width, label='Control', color='#3498db')
ax.bar(x + width/2, pivot['treatment']*100, width, label='Treatment', color='#2ecc71')

ax.set_ylabel('Checkout Completion Rate (%)')
ax.set_xlabel('Device Type')
ax.set_title('Checkout Completion Rate by Device and Variant')
ax.set_xticks(x)
ax.set_xticklabels(pivot.index)
ax.legend()

# Add value labels
for i, (c, t) in enumerate(zip(pivot['control'], pivot['treatment'])):
    ax.annotate(f'{c*100:.1f}%', (i - width/2, c*100 + 0.5), ha='center')
    ax.annotate(f'{t*100:.1f}%', (i + width/2, t*100 + 0.5), ha='center')

plt.tight_layout()
plt.show()

In [None]:
# Part 5 Solutions

monthly_attempts = 500000

# Current state (control)
current_completions = monthly_attempts * control_rate
current_revenue = current_completions * control_aov

# Projected with treatment
projected_completions = monthly_attempts * treatment_rate
projected_revenue = projected_completions * treatment_aov

# Differences
additional_completions = projected_completions - current_completions
additional_revenue = projected_revenue - current_revenue

# 95% CI for additional completions, scaled from the CI on the rate difference
additional_low = monthly_attempts * conv_result['ci'][0]
additional_high = monthly_attempts * conv_result['ci'][1]

print("=== MONTHLY IMPACT PROJECTION ===")
print(f"\nCheckout Completions:")
print(f"  Current: {current_completions:,.0f}")
print(f"  Projected: {projected_completions:,.0f}")
print(f"  Additional: {additional_completions:,.0f} (95% CI: {additional_low:,.0f} to {additional_high:,.0f})")

print(f"\nRevenue:")
print(f"  Current: ${current_revenue:,.0f}")
print(f"  Projected: ${projected_revenue:,.0f}")
print(f"  Additional: ${additional_revenue:,.0f}")

# Note: the net revenue effect combines a higher completion rate with a lower AOV
print(f"\nNote: Treatment has higher conversion but slightly lower AOV")
print(f"Net revenue change: ${additional_revenue:,.0f}/month")

In [None]:
# Annualized
print("\n=== ANNUAL IMPACT ===")
print(f"Additional completions: {additional_completions * 12:,.0f}")
print(f"Additional revenue: ${additional_revenue * 12:,.0f}")
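
The revenue projection multiplies two separate point estimates, so it carries no interval of its own. One way to get one is to analyze revenue per checkout attempt: the order value for completers and 0 for everyone else, which folds conversion and AOV into a single metric. The sketch below illustrates the idea on simulated data with made-up parameters, using a normal-approximation interval. All names and numbers in it are assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, variance

random.seed(0)

def simulate_revenue_per_attempt(n, completion_rate, mean_order_value):
    """Hypothetical data: order value for completers, 0 for everyone else."""
    return [
        max(15.0, random.gauss(mean_order_value, 35.0))
        if random.random() < completion_rate else 0.0
        for _ in range(n)
    ]

# Illustrative parameters only, loosely echoing the scenario in this notebook
control_rpa = simulate_revenue_per_attempt(7500, 0.28, 97.0)
treatment_rpa = simulate_revenue_per_attempt(7500, 0.32, 94.0)

# Difference in mean revenue per attempt with a normal-approximation 95% CI
diff = fmean(treatment_rpa) - fmean(control_rpa)
se = sqrt(variance(control_rpa) / len(control_rpa)
          + variance(treatment_rpa) / len(treatment_rpa))
z_crit = NormalDist().inv_cdf(0.975)
ci_low, ci_high = diff - z_crit * se, diff + z_crit * se

monthly_attempts = 500_000  # from the business context in Part 5
print(f"Revenue per attempt lift: ${diff:.2f} "
      f"(95% CI ${ci_low:.2f} to ${ci_high:.2f})")
print(f"Monthly revenue impact: ${diff * monthly_attempts:,.0f} "
      f"(95% CI ${ci_low * monthly_attempts:,.0f} "
      f"to ${ci_high * monthly_attempts:,.0f})")
```

In the notebook itself, the per-user values could be taken as `experiment_data['order_value'].fillna(0)` split by `variant`, in place of the simulated lists.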

---
## Example Report

### Executive Summary

The multi-step checkout wizard (Treatment B) significantly outperformed the original single-form checkout, increasing completion rates by approximately 4 percentage points (relative lift of ~14%). Despite a small decrease in average order value, the net revenue impact is positive: at 500,000 checkout attempts per month, the projection is roughly $1.5-2M in additional monthly revenue, or about $18-24M annualized.

### Key Findings

- **Checkout completion rate improved significantly**: Treatment achieved ~32% vs. Control's ~28%
- **Mobile users benefited most**: The new checkout design showed the largest relative improvement on mobile devices
- **Slight decrease in AOV**: Treatment showed ~$2-3 lower average order value (not statistically significant), and the higher completion rate more than offset it
- **Checkout time increased**: The wizard takes ~30 seconds longer to complete, yet completion rates still rose

### Statistical Evidence

| Metric | Control | Treatment | P-value | Significant? |
|--------|---------|-----------|---------|-------------|
| Checkout Rate | 28.1% | 32.2% | <0.001 | Yes |
| Avg Order Value | $97.50 | $94.80 | 0.08 | No |
| Checkout Time | 180s | 210s | <0.001 | Yes |

### Recommendation

**Roll out the multi-step checkout wizard to 100% of users.** The statistically significant improvement in completion rate justifies the change. If the rollout has to be staged, start with mobile, the segment that shows the largest benefit.

### Caveats and Next Steps

1. **Monitor AOV closely**: While not statistically significant, there's a trend toward lower AOV. Continue monitoring post-launch.
2. **Novelty effect**: Some lift may be due to novelty. Re-evaluate conversion rates after 4-6 weeks.
3. **Long-term retention**: This test measured immediate checkout behavior. Consider tracking if wizard users have different return rates.
4. **Technical performance**: Because the wizard takes longer to complete, confirm that session timeout settings and server capacity can accommodate it.
5. **Consider further optimization**: Test variations of the wizard (e.g., fewer steps, different progress indicators) for additional gains.