We analyze historical user behavior data to determine the **baseline click-through rate (CTR)**, which is essential for calculating the required sample size.  

Key aspects we examine:  
1. **Weekly CTR trends**  
2. **Monthly CTR fluctuations**  

By evaluating these trends, we gain insights into **CTR stability** and select a representative baseline value for sample size estimation.  

For the sample size calculation, we use a [tool](https://wawei225.github.io/sub_pages/ABtesting.html) and assume a **relative Minimum Detectable Effect (MDE) of 5%**.  


In [57]:
import sqlite3
import pandas as pd

In [58]:
event_hist_df = pd.read_csv('hist_data/event_table_daily.csv')
print(event_hist_df.head())

   user_id device            timestamp    event_name page_name section_name
0        1    ios  2024-09-19 15:20:00     view_page      home          NaN
1        1    ios  2024-09-19 15:20:01  view_section      home    discovery
2        1    ios  2024-09-19 15:24:39  view_section      home     trending
3        2    ios  2024-09-07 17:57:00     view_page      home          NaN
4        2    ios  2024-09-07 17:57:01  view_section      home    discovery


In [59]:
event_hist_df["timestamp"] = pd.to_datetime(event_hist_df["timestamp"])
event_hist_df["week"] = event_hist_df["timestamp"].dt.strftime("%Y-%U")
event_hist_df["date"] = event_hist_df["timestamp"].dt.strftime("%Y-%m-%d")  

In [60]:
view_df = event_hist_df[
    (event_hist_df['device'] == 'ios') & 
    (event_hist_df['event_name'] == 'view_section') & 
    (event_hist_df['section_name'] == 'trending')
]

click_df = event_hist_df[
    (event_hist_df['device'] == 'ios') & 
    (event_hist_df['event_name'] == 'click_item') & 
    (event_hist_df['section_name'] == 'trending')
]

# Count unique users per week
view_trending_weekly = view_df.groupby("week")["user_id"].nunique()
click_trending_weekly = click_df.groupby("week")["user_id"].nunique()

# Calculate weekly conversion rate
weekly_conversion_rate = (click_trending_weekly / view_trending_weekly).fillna(0) * 100

In [61]:
print(weekly_conversion_rate)

week
2024-35    25.546304
2024-36    26.071874
2024-37    26.159577
2024-38    25.907140
2024-39    26.354579
Name: user_id, dtype: float64


In [62]:
# Count unique users over the whole historical period
view_trending = view_df["user_id"].nunique()
click_trending = click_df["user_id"].nunique()
# Calculate overall conversion rate
conversion_rate = (click_trending / view_trending) * 100

In [63]:
print(conversion_rate)

25.95009981859452


Finally, we selected **25.95%** as the baseline user-level CTR because it is the overall conversion rate and sits close to the average weekly conversion rate.
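As a quick check on that choice, the sketch below summarizes the weekly rates printed earlier and compares their mean with the overall rate. The values are copied from the notebook output; the variable names (`weekly_ctr`, `weekly_spread`, `gap`) are illustrative and not part of the original analysis.

```python
from statistics import mean, pstdev

# Weekly user-level CTR (%) copied from the printed weekly_conversion_rate output.
weekly_ctr = {
    "2024-35": 25.546304,
    "2024-36": 26.071874,
    "2024-37": 26.159577,
    "2024-38": 25.907140,
    "2024-39": 26.354579,
}
# Overall user-level CTR (%) copied from the printed conversion_rate output.
overall_ctr = 25.95009981859452

values = list(weekly_ctr.values())
weekly_mean = mean(values)                   # average of the weekly rates
weekly_spread = max(values) - min(values)    # gap between best and worst week
weekly_std = pstdev(values)                  # population standard deviation
gap = abs(weekly_mean - overall_ctr)         # distance between the two candidates

print(f"Mean weekly CTR:   {weekly_mean:.2f}%")
print(f"Weekly range:      {weekly_spread:.2f} percentage points")
print(f"Weekly std dev:    {weekly_std:.2f} percentage points")
print(f"|mean - overall|:  {gap:.2f} percentage points")
```

A weekly range of less than one percentage point suggests the CTR is stable enough over this period for a single baseline value to be representative.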

With a 5% relative MDE, the required sample size is 18,002 users per group, or 36,004 users in total across groups A and B.
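For readers who want to sanity-check the tool's figure, here is a minimal sketch of a standard two-proportion sample size formula. The significance level (`alpha=0.05`, two-sided) and power (`0.80`) are assumptions, since the notebook does not state what the tool uses, and the function name `sample_size_per_group` is hypothetical. The result should land within a few percent of the tool's number rather than match it exactly, because the tool's formula and rounding are not documented here.

```python
import math
from statistics import NormalDist


def sample_size_per_group(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate users needed per group for a two-sided two-proportion z-test.

    alpha and power are assumed defaults, not values taken from the notebook.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)           # rate we want to be able to detect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)     # unpooled variance of the difference
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


n_per_group = sample_size_per_group(baseline=0.2595, relative_mde=0.05)
print(f"Users per group: {n_per_group}")
print(f"Total users (A + B): {2 * n_per_group}")
```

Because the MDE is relative, the absolute difference being detected is 5% of 25.95%, roughly 1.3 percentage points, which is why the required sample runs into the tens of thousands.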

In [65]:
# Count unique users per day
daily_unique_users = view_df.groupby("date")["user_id"].nunique()

# Compute the average daily unique users
avg_daily_unique_users = daily_unique_users.mean()

print(f"Average Daily Unique Users: {avg_daily_unique_users:.2f}")

required_users = 36004  # Total users needed (A + B groups)
experiment_days = required_users / avg_daily_unique_users

print(f"Estimated Experiment Duration: {experiment_days:.2f} days")

Average Daily Unique Users: 3538.68
Estimated Experiment Duration: 10.17 days


We suggest running the experiment for **14 days**, rounding the 10.17-day estimate up to two full weeks to account for weekly traffic patterns.
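One caveat on the duration estimate: dividing by average daily unique users treats each day's viewers as new, whereas a returning user only counts once toward the experiment sample. A hedged way to check this is to count cumulative unique users day by day. The sketch below uses toy data and a hypothetical helper, `days_to_reach`; neither comes from the original notebook.

```python
from collections import defaultdict


def days_to_reach(events, target_users):
    """Return the number of days until cumulative unique users reach target_users.

    events: iterable of (date_string, user_id) pairs, with dates as YYYY-MM-DD
    so that sorting the strings sorts them chronologically.
    Returns None if the target is never reached.
    """
    users_by_date = defaultdict(set)
    for date, user_id in events:
        users_by_date[date].add(user_id)

    seen = set()
    for day_number, date in enumerate(sorted(users_by_date), start=1):
        seen |= users_by_date[date]              # returning users are not double-counted
        if len(seen) >= target_users:
            return day_number
    return None


# Toy data: about 1.75 unique users per day on average, with many repeat visitors.
toy_events = [
    ("2024-09-01", 1), ("2024-09-01", 2),
    ("2024-09-02", 1), ("2024-09-02", 3),
    ("2024-09-03", 2), ("2024-09-03", 3),
    ("2024-09-04", 4),
]

# Naive estimate: 4 / 1.75 is about 2.3 days; the cumulative count needs 4 days.
print(days_to_reach(toy_events, target_users=4))
```

In the notebook, the same helper could be fed with `view_df[["date", "user_id"]].itertuples(index=False)`. If many viewers return on multiple days, the true time to reach 36,004 distinct users will be longer than 10.17 days, which is a further reason to prefer the 14-day window.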