# Case Study

This case study takes the perspective of a data analyst at the game company that develops Cookie Cats, a puzzle-based mobile game. In this game, players progress through a series of levels. One important feature is the gate, a level barrier that forces players to wait or pay before they can continue playing.

The developers want to know whether moving the gate from level 30 (gate_30) to level 40 (gate_40) affects the player retention rate.

Experiment goal:
Determine whether delaying the barrier (placing the gate at level 40) increases the number of players who return to play after 1 day and after 7 days.

# Hypotheses

H0 (Null Hypothesis): There is no difference in retention rate between gate_30 and gate_40.


H1 (Alternative Hypothesis): There is a difference in retention rate between gate_30 and gate_40.
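
Written formally, the same two-sided hypotheses can be stated once per retention horizon (1-day and 7-day). The symbols $p_{30}$ and $p_{40}$ are notation introduced here for illustration, standing for the true retention rates under gate_30 and gate_40; they do not appear in the original analysis.

```latex
H_0:\; p_{30} = p_{40}
\qquad \text{vs.} \qquad
H_1:\; p_{30} \neq p_{40}
```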

# Evaluation Metrics

Success Metrics <br><br>
Retention 1-day (%) → Percentage of players who return after 1 day.<br>
Retention 7-day (%) → Percentage of players who return after 7 days.<br>
Total game rounds played → Average number of game rounds played per player.<br><br>

Guardrail Metrics <br><br>
Total player churn → Retention must not drop so sharply that it causes a large-scale loss of players.<br>
Revenue from in-app purchases → If the gate change reduces in-app purchases, it could hurt the business.<br>
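
As a rough illustration of how the three success metrics could be computed per group, the sketch below uses a small made-up DataFrame. The column names mirror those in the Cookie Cats dataset, but the values and the `toy` and `metrics` names are hypothetical.

```python
import pandas as pd

# Hypothetical toy data that mirrors the dataset's columns; the values are made up.
toy = pd.DataFrame({
    "version": ["gate_30", "gate_30", "gate_40", "gate_40"],
    "sum_gamerounds": [10, 30, 20, 60],
    "retention_1": [True, False, True, True],
    "retention_7": [False, False, True, False],
})

# The mean of a boolean column is the share of True values,
# so multiplying by 100 gives the retention rate in percent.
metrics = toy.groupby("version").agg(
    retention_1_pct=("retention_1", lambda s: s.mean() * 100),
    retention_7_pct=("retention_7", lambda s: s.mean() * 100),
    avg_gamerounds=("sum_gamerounds", "mean"),
)
print(metrics)
```

The guardrail metrics (churn and in-app purchase revenue) are not covered here, since they would need columns that this sketch does not assume exist.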

[Cookie Cats Dataset](https://drive.google.com/file/d/1ncKF9hy6Apeu-3zk6BGxZJyW8rr57Aqk/view?usp=sharing)

# Sample Size

The sample size was computed with the A/B Test Sample Size Calculator from CXL, using the following parameters:

    Confidence Level: 95%
    Power Level: 80%
    Conversion Rate for Control (Baseline Retention Day-1): 49%
    Minimum Detectable Effect (MDE): 3% relative (retention expected to rise from 49% to about 50.47%)
    Number of Variants (not including control): 1
    Weekly Traffic: 100000
    One-sided or Two-sided Test: Two-sided

Calculation results:

    Sample Size per Group: 18,155 players
    Total Sample Size: 36,310 players
    Estimated Duration: 1 week
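
The calculator's exact formula is not stated here, so the following is only a sketch using a common normal-approximation formula for a two-sided test of two proportions. It assumes the 3% MDE is relative to the 49% baseline; the function name `sample_size_per_group` is hypothetical. Under those assumptions the result lands close to the 18,155 players per group reported above.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)             # expected rate in the variant
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # sum of the two binomial variances
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


n = sample_size_per_group(baseline=0.49, relative_mde=0.03)
print(n)
```

Small differences from the calculator's figure are expected, because different tools use slightly different variance terms and rounding.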



# Statistical Test


In [1]:
# import libraries
import pandas as pd
from statsmodels.stats.proportion import proportions_ztest

In [2]:
# load data
data_path = 'https://raw.githubusercontent.com/wahyughifari/AB-testing-CookieCats/refs/heads/main/Cookie_Cats_dataset.csv'

In [3]:
df = pd.read_csv(data_path)

df.head()

|  | userid | version | sum_gamerounds | retention_1 | retention_7 |
|---|---|---|---|---|---|
| 0 | 116 | gate_30 | 3 | False | False |
| 1 | 337 | gate_30 | 38 | True | False |
| 2 | 377 | gate_40 | 165 | True | False |
| 3 | 483 | gate_40 | 1 | False | False |
| 4 | 488 | gate_40 | 179 | True | True |


In [4]:
df

|  | userid | version | sum_gamerounds | retention_1 | retention_7 |
|---|---|---|---|---|---|
| 0 | 116 | gate_30 | 3 | False | False |
| 1 | 337 | gate_30 | 38 | True | False |
| 2 | 377 | gate_40 | 165 | True | False |
| 3 | 483 | gate_40 | 1 | False | False |
| 4 | 488 | gate_40 | 179 | True | True |
| ... | ... | ... | ... | ... | ... |
| 90184 | 9999441 | gate_40 | 97 | True | False |
| 90185 | 9999479 | gate_40 | 30 | False | False |
| 90186 | 9999710 | gate_30 | 28 | True | False |
| 90187 | 9999768 | gate_40 | 51 | True | False |


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 90189 entries, 0 to 90188
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   userid          90189 non-null  int64 
 1   version         90189 non-null  object
 2   sum_gamerounds  90189 non-null  int64 
 3   retention_1     90189 non-null  bool  
 4   retention_7     90189 non-null  bool  
dtypes: bool(2), int64(2), object(1)
memory usage: 2.2+ MB


In [6]:
# check for missing values
df.isnull().sum()

| Column | Missing values |
|---|---|
| userid | 0 |
| version | 0 |
| sum_gamerounds | 0 |
| retention_1 | 0 |
| retention_7 | 0 |


In [7]:
# check for duplicate rows
df.duplicated().sum()

np.int64(0)

In [8]:
# Player distribution per group
df['version'].value_counts().reset_index()

|  | version | count |
|---|---|---|
| 0 | gate_40 | 45489 |
| 1 | gate_30 | 44700 |


In [9]:
df.value_counts('version', normalize=True).reset_index()

|  | version | proportion |
|---|---|---|
| 0 | gate_40 | 0.504374 |
| 1 | gate_30 | 0.495626 |
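
One quick sanity check at this point is whether each group is at least as large as the per-group requirement from the Sample Size section. The sketch below hard-codes the group sizes from the `value_counts()` output and the 18,155 figure reported earlier; the variable names are hypothetical.

```python
# Group sizes copied from the value_counts() output;
# the requirement comes from the Sample Size section.
group_sizes = {"gate_40": 45489, "gate_30": 44700}
required_per_group = 18155

for version, n in group_sizes.items():
    status = "meets" if n >= required_per_group else "falls short of"
    print(f"{version}: {n:,} players {status} the required {required_per_group:,}")

# True only if every group reaches the required sample size.
all_sufficient = all(n >= required_per_group for n in group_sizes.values())
```

Both groups are well above the requirement, so the experiment is not underpowered relative to the planned design.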


In [10]:
# Convert boolean columns to integers (True = 1, False = 0).
# No fillna() is needed: the missing-value check above found no nulls.
df['retention_1'] = df['retention_1'].astype(int)
df['retention_7'] = df['retention_7'].astype(int)

In [11]:
df

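|  | userid | version | sum_gamerounds | retention_1 | retention_7 |
|---|---|---|---|---|---|
| 0 | 116 | gate_30 | 3 | 0 | 0 |
| 1 | 337 | gate_30 | 38 | 1 | 0 |
| 2 | 377 | gate_40 | 165 | 1 | 0 |
| 3 | 483 | gate_40 | 1 | 0 | 0 |
| 4 | 488 | gate_40 | 179 | 1 | 1 |
| ... | ... | ... | ... | ... | ... |
| 90184 | 9999441 | gate_40 | 97 | 1 | 0 |
| 90185 | 9999479 | gate_40 | 30 | 0 | 0 |
| 90186 | 9999710 | gate_30 | 28 | 1 | 0 |
| 90187 | 9999768 | gate_40 | 51 | 1 | 0 |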


In [12]:
# Retention rate 1-day
retention_1_gate30 = df[df['version'] == 'gate_30']['retention_1'].mean()
retention_1_gate40 = df[df['version'] == 'gate_40']['retention_1'].mean()

print(f"Retention 1-day (gate_30): {retention_1_gate30:.2%}")
print(f"Retention 1-day (gate_40): {retention_1_gate40:.2%}")

Retention 1-day (gate_30): 44.82%
Retention 1-day (gate_40): 44.23%


In [13]:
# Retention rate 7-day
retention_7_gate30 = df[df['version'] == 'gate_30']['retention_7'].mean()
retention_7_gate40 = df[df['version'] == 'gate_40']['retention_7'].mean()

print(f"Retention 7-day (gate_30): {retention_7_gate30:.2%}")
print(f"Retention 7-day (gate_40): {retention_7_gate40:.2%}")

Retention 7-day (gate_30): 19.02%
Retention 7-day (gate_40): 18.20%


In [14]:
# Count the players who returned on day 1
retained_1_gate30 = df[df['version'] == 'gate_30']['retention_1'].sum()
retained_1_gate40 = df[df['version'] == 'gate_40']['retention_1'].sum()

# Total players in each group
total_gate30 = df[df['version'] == 'gate_30']['retention_1'].count()
total_gate40 = df[df['version'] == 'gate_40']['retention_1'].count()

# Two-sided z-test for two proportions
stat, pval = proportions_ztest([retained_1_gate30, retained_1_gate40], [total_gate30, total_gate40])
print(f"Z-test for Retention 1-day: p-value = {pval}")

if pval < 0.05:
    print("Retention 1-day is significantly different between the two groups!")
else:
    print("No significant difference in Retention 1-day.")

Z-test for Retention 1-day: p-value = 0.07440965529691913
No significant difference in Retention 1-day.
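
To make the calculation behind the test more concrete, here is a minimal standard-library sketch of a pooled two-proportion z-test, which is the usual textbook form of the test run above. The helper `two_proportion_ztest` is hypothetical. The retained-player counts are reconstructed from the rounded rates (44.82% and 44.23%) and the group sizes, so they are approximations and the result will only roughly match the p-value printed above.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided z-test for equal proportions using the pooled variance."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                        # pooled proportion under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))  # standard error of the difference
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))          # two-sided p-value
    return z, p_value


# Counts reconstructed from rounded rates, so they are approximate.
n30, n40 = 44700, 45489
x30 = round(0.4482 * n30)
x40 = round(0.4423 * n40)

z, p_value = two_proportion_ztest(x30, n30, x40, n40)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```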


In [15]:
# Count the players who returned on day 7
retained_7_gate30 = df[df['version'] == 'gate_30']['retention_7'].sum()
retained_7_gate40 = df[df['version'] == 'gate_40']['retention_7'].sum()

# Two-sided z-test; group totals are reused from the 1-day test cell
stat, pval = proportions_ztest([retained_7_gate30, retained_7_gate40], [total_gate30, total_gate40])
print(f"Z-test for Retention 7-day: p-value = {pval}")

if pval < 0.05:
    print("Retention 7-day is significantly different between the two groups!")
else:
    print("No significant difference in Retention 7-day.")

Z-test for Retention 7-day: p-value = 0.001554249975614329
Retention 7-day is significantly different between the two groups!
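
The p-values say whether a difference is detectable, not how large it is. As a small worked example, the sketch below computes the absolute and relative change for gate_40 versus gate_30 from the rates printed by the cells above. The `rates` and `lifts` names are hypothetical.

```python
# Retention rates (in percent) as printed by the retention-rate cells.
rates = {
    "retention_1": {"gate_30": 44.82, "gate_40": 44.23},
    "retention_7": {"gate_30": 19.02, "gate_40": 18.20},
}

lifts = {}
for metric, r in rates.items():
    absolute = r["gate_40"] - r["gate_30"]    # difference in percentage points
    relative = absolute / r["gate_30"] * 100  # change as a percent of the gate_30 rate
    lifts[metric] = (absolute, relative)
    print(f"{metric}: {absolute:+.2f} pp absolute, {relative:+.2f}% relative")
```

Because the inputs are rates rounded to two decimals, the computed lifts carry the same rounding.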


Insight:

  * The experiment shows that moving the gate from level 30 to level 40 has a negative effect on 7-day retention.
  * 1-day retention does not differ significantly between the groups (44.82% vs. 44.23%, p ≈ 0.074), so there is no evidence that the gate position affects early retention.
  * 7-day retention is significantly lower for gate_40 (18.20% vs. 19.02%, p ≈ 0.0016), so H0 is rejected for this metric: players who meet the gate later are less likely to still be playing after a week.

Recommendations:

* Keep gate_30, because gate_40 does not improve retention and instead lowers 7-day retention.
* Explore other ways to improve retention, for example:
  * Adding a reward or bonus after players pass the gate.
  * Sending notifications or running special events around the gate level to keep players engaged.
