# A/B Test Analysis: Conversion Rate Comparison

This notebook performs an A/B test analysis comparing conversion rates between two groups, A and B.
We use a two-sided, two-sample z-test for proportions to determine whether the difference in conversion rates is statistically significant.
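For reference, with its default arguments the two-sample case of `proportions_ztest` uses the pooled standard error below. The notation is ours: $x$ is the number of conversions (`count` in Step 3) and $n$ the number of observations (`nobs`).

$$
\hat p_A = \frac{x_A}{n_A}, \qquad \hat p_B = \frac{x_B}{n_B}, \qquad \hat p = \frac{x_A + x_B}{n_A + n_B}
$$

$$
z = \frac{\hat p_A - \hat p_B}{\sqrt{\hat p\,(1 - \hat p)\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}}, \qquad p\text{-value} = 2\left(1 - \Phi(|z|)\right)
$$

Because the numerator is $\hat p_A - \hat p_B$, a negative $z$ means group A converts less often than group B.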

## Step 1: Import necessary libraries and load data

- `pandas` for data handling
- `proportions_ztest` from `statsmodels` to perform the z-test for proportions
- Load the CSV data and inspect basic info
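Beyond `df.info()`, it can help to confirm that the labels the later steps rely on are the only ones present. The sketch below is a hypothetical helper, `check_ab_columns`, that assumes the `Group`, `Conversion` and `User ID` column names shown in the output; it is demonstrated on a tiny made-up frame, not the real dataset.

```python
import pandas as pd


def check_ab_columns(df: pd.DataFrame) -> dict:
    """Hypothetical helper: validate labels and report group sizes before testing."""
    # Any label outside the expected sets (including NaN) is reported
    bad_groups = set(df["Group"].unique()) - {"A", "B"}
    bad_labels = set(df["Conversion"].unique()) - {"Yes", "No"}
    if bad_groups:
        raise ValueError(f"Unexpected Group values: {sorted(bad_groups, key=str)}")
    if bad_labels:
        raise ValueError(f"Unexpected Conversion values: {sorted(bad_labels, key=str)}")
    return {
        # Rows per group, so an unbalanced split is visible up front
        "group_sizes": df["Group"].value_counts().to_dict(),
        # Repeated User IDs would mean some users are counted more than once
        "duplicate_ids": int(df["User ID"].duplicated().sum()),
    }


# Usage on a tiny made-up frame (not the real dataset)
demo = pd.DataFrame({
    "User ID": [1, 2, 3],
    "Group": ["A", "B", "A"],
    "Conversion": ["No", "Yes", "No"],
})
print(check_ab_columns(demo))
```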

In [1]:
import pandas as pd
from statsmodels.stats.proportion import proportions_ztest

# Load dataset
csv_file = r"E:\PythonProjects\PA-Projects\AB_Test_Analysis\data\ab_testing.csv"
df = pd.read_csv(csv_file)

df.info()
print("First 5 rows:")
print(df.head())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   User ID     5000 non-null   int64 
 1   Group       5000 non-null   object
 2   Page Views  5000 non-null   int64 
 3   Time Spent  5000 non-null   int64 
 4   Conversion  5000 non-null   object
 5   Device      5000 non-null   object
 6   Location    5000 non-null   object
dtypes: int64(3), object(4)
memory usage: 273.6+ KB
First 5 rows:
   User ID Group  Page Views  Time Spent Conversion   Device          Location
0    14292     B           3         424         No   Mobile  Northern Ireland
1    11682     A           9         342         No   Mobile          Scotland
2    19825     A           2         396         No  Desktop  Northern Ireland
3    16080     B           4         318         No  Desktop             Wales
4    18851     A           1         338        Yes  Desktop          Sco

## Step 2: Data preprocessing and conversion rate calculation

- Split the data into groups A and B
- Convert the `Conversion` column from "Yes"/"No" to binary 1/0
- Calculate each group's conversion rate as the mean of the binary column
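The same per-group numbers can also be produced with a single `groupby`, which avoids filtering each group by hand. This is a sketch with a hypothetical helper name, `conversion_summary`, assuming the only labels are "Yes" and "No"; it raises instead of silently dropping anything else, and the demo frame is made up.

```python
import pandas as pd


def conversion_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-group conversions, observations and rate, assuming 'Yes'/'No' labels."""
    converted = df["Conversion"].map({"Yes": 1, "No": 0})
    if converted.isna().any():
        # .map() turns any label other than "Yes"/"No" into NaN; fail loudly instead
        raise ValueError("Conversion contains values other than 'Yes'/'No'")
    # One groupby (keys sorted, so A before B) instead of one filter per group
    summary = converted.groupby(df["Group"]).agg(conversions="sum", observations="count")
    summary["rate"] = summary["conversions"] / summary["observations"]
    return summary


# Usage on a tiny made-up frame (not the real dataset)
demo = pd.DataFrame({
    "Group": ["A", "A", "B", "B", "B"],
    "Conversion": ["No", "Yes", "Yes", "Yes", "No"],
})
print(conversion_summary(demo))
```

Under these assumptions, `summary["conversions"].tolist()` and `summary["observations"].tolist()` correspond to the `count` and `nobs` lists built in Step 3, in the order A, B.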

In [2]:
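# Extract each group's Conversion column and map "Yes"/"No" to 1/0.
# Any other label would become NaN and be skipped by .mean(), .sum() and .count().
group_a = df[df['Group'] == 'A']['Conversion'].map({'Yes': 1, 'No': 0})
group_b = df[df['Group'] == 'B']['Conversion'].map({'Yes': 1, 'No': 0})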

# Calculate conversion rates
conv_rate_a = group_a.mean()
conv_rate_b = group_b.mean()

print(f"Conversion rate A: {conv_rate_a:.4f}")
print(f"Conversion rate B: {conv_rate_b:.4f}")

Conversion rate A: 0.0540
Conversion rate B: 0.1407


## Step 3: Perform z-test for difference in proportions

- Define the number of conversions and the total number of observations for each group
- Use `proportions_ztest` to calculate the z-statistic and the two-sided p-value
- Interpret the result against a significance level of alpha = 0.05
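To make the test less of a black box, here is a minimal from-scratch sketch of the pooled two-sample z-test using only the standard library. The helper names `pooled_ztest` and `two_sided_p` are hypothetical, and the counts in the usage line are illustrative, not taken from the dataset. Given the same counts, it should agree with `proportions_ztest(count, nobs)` up to floating-point error, since both use the pooled proportion under the null hypothesis.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value 2 * (1 - Phi(|z|)) for a standard normal statistic."""
    # 1 - Phi(x) = erfc(x / sqrt(2)) / 2, so the factor of 2 cancels
    return math.erfc(abs(z) / math.sqrt(2))


def pooled_ztest(x_a: int, n_a: int, x_b: int, n_b: int) -> tuple[float, float]:
    """Pooled two-sample z-test for H0: p_a == p_b (hypothetical helper)."""
    p_a, p_b = x_a / n_a, x_b / n_b
    # Under H0 both groups share one rate, estimated from the combined data
    # (undefined if p_pool is exactly 0 or 1, because the standard error is 0)
    p_pool = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se  # negative when group A converts less than group B
    return z, two_sided_p(z)


# Usage with illustrative counts (not taken from the dataset)
alpha = 0.05
z, p = pooled_ztest(x_a=50, n_a=1000, x_b=70, n_b=1000)
print(f"z = {z:.4f}, p = {p:.3g}, reject H0: {p < alpha}")
```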

In [3]:
count = [group_a.sum(), group_b.sum()]
nobs = [group_a.count(), group_b.count()]

# Perform z-test for proportions
stat, pval = proportions_ztest(count, nobs)

print(f"Z-statistic: {stat:.4f}")
print(f"P-value: {pval:.4f}")

alpha = 0.05
if pval < alpha:
    conclusion = "Reject null hypothesis - significant difference found."
else:
    conclusion = "Fail to reject null hypothesis - no significant difference."
print(conclusion)

Z-statistic: -10.3545
P-value: 0.0000
Reject null hypothesis - significant difference found.


## Step 4: Save the summarized results to CSV

- Create a results dictionary with each group's rate and counts plus the z-statistic, p-value and conclusion
- Create the output directory if it does not already exist
- Save the results as a CSV file for further review
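A slightly more portable variant of this step uses `pathlib`, whose `mkdir(parents=True, exist_ok=True)` folds the existence check and directory creation into one call. The helper name `save_summary` is hypothetical, and the demo writes to a temporary directory rather than the hard-coded drive path, so it runs on any machine.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_summary(summary: pd.DataFrame, output_dir: Path,
                 filename: str = "ab_test_summary.csv") -> Path:
    """Write summary to output_dir/filename, creating the directory tree if needed."""
    # exist_ok=True replaces the separate "does it exist?" check
    output_dir.mkdir(parents=True, exist_ok=True)
    out_path = output_dir / filename
    summary.to_csv(out_path, index=False)
    return out_path


# Usage: write into a throwaway directory instead of a hard-coded drive path
demo = pd.DataFrame({"Group": ["A", "B"], "Conversion Rate": [0.05, 0.07]})
with tempfile.TemporaryDirectory() as tmp:
    saved = save_summary(demo, Path(tmp) / "results")
    print(f"Results saved to {saved.name}")
```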

In [4]:
import os

output_dir = r"E:\PythonProjects\PA-Projects\AB_Test_Analysis\results"
# Create the directory (and any missing parents); no error if it already exists
os.makedirs(output_dir, exist_ok=True)

results = {
    'Group': ['A', 'B'],
    'Conversion Rate': [conv_rate_a, conv_rate_b],
    'Conversions': count,
    'Total Observations': nobs,
    'Z-statistic': [stat, ''],
    'P-value': [pval, ''],
    'Conclusion': [conclusion, '']
}

results_df = pd.DataFrame(results)
results_df.to_csv(os.path.join(output_dir, "ab_test_summary.csv"), index=False)
print("Results saved to ab_test_summary.csv")

Results saved to ab_test_summary.csv
