# Cookie Cats Data A/B Test Project

## Project Description:

The Cookie Cats data set was obtained from Kaggle. Cookie Cats is a hugely popular mobile puzzle game developed by Tactile Entertainment. It's a classic "connect three" style puzzle game where the player must connect tiles of the same color in order to clear the board and win the level. It also features singing cats. We're not kidding! As players progress through the game, they encounter gates that force them to either wait some time before they can progress or make an in-app purchase.

In this project, we will analyze the result of an A/B test where the first gate in Cookie Cats was moved from level 30 to level 40. In particular, we will analyze the impact on player retention. 

The Cookie_cats data set has 90189 rows and 5 columns: userid, version, sum_gamerounds, retention_1, and retention_7. “userid” is a unique number that identifies each player. “version” indicates whether the player was put in the control group (gate_30 - a gate at level 30) or the test group with the moved gate (gate_40 - a gate at level 40). “sum_gamerounds” is the number of game rounds played by the player during the first 14 days after install. “retention_1” indicates whether the player came back and played 1 day after installing the app. “retention_7” indicates whether the player came back and played 7 days after installing the app.

In this project, we conduct an A/B test under the working assumption that gate_40 will have higher player retention. The null hypothesis is that there is no difference in player retention between gate_30 and gate_40. As a secondary metric, we also compare the number of game rounds played during the first 14 days after install (sum_gamerounds) between the two groups and test whether the difference is significant. A t-test and a z-test are used, with a significance level of 0.05.
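Written out, with $p_{30}$ and $p_{40}$ standing for the retention rates of the gate_30 and gate_40 groups (notation introduced here for illustration), the tests used in this project, which are two-sided by default, correspond to:

```latex
H_0:\; p_{30} = p_{40}
\qquad
H_1:\; p_{30} \neq p_{40}
\qquad
\alpha = 0.05
```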

The test is analyzed in two stages. The first stage looks at player retention one day after install. Because the data set already provides the two groups (split almost 50:50, about 45000 players each), we use these sample sizes for the A/B test. The second stage looks at 7-day retention: is there any difference between the two groups?

Hence, the project covers the following tasks:
   * The distribution of game rounds
   * Overall 1-day retention
   * 1-day retention by AB-group
   * Should we be confident in the difference
   * Zooming in on the difference
   * The probability of a difference
   * 7-day retention by AB-group
   * Bootstrapping the difference again

Now, let's take a look at the data.

In [2]:
import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
from scipy import stats
%matplotlib inline

In [3]:
PATH = 'C:/Users/yuluc/OneDrive/Documents/Thinkful/Module 14/cookie_cats.csv'
experiment_data = pd.read_csv(PATH)
experiment_data.head(10)

,userid,version,sum_gamerounds,retention_1,retention_7
0,116,gate_30,3,False,False
1,337,gate_30,38,True,False
2,377,gate_40,165,True,False
3,483,gate_40,1,False,False
4,488,gate_40,179,True,True
5,540,gate_40,187,True,True
6,1066,gate_30,0,False,False
7,1444,gate_40,2,False,False
8,1574,gate_40,108,True,True
9,1587,gate_40,153,True,False


This data set has 5 columns, as mentioned in the description above. Let's look at some other basic information about it.

In [4]:
# Number of rows and columns in the data set.
experiment_data.shape

(90189, 5)

In [5]:
# How large are the test group (gate_40) and the control group (gate_30)?

test_size = len(experiment_data[experiment_data.version == 'gate_40'])
control_size = len(experiment_data[experiment_data.version == 'gate_30'])
print('test sample size:', test_size)
print('control sample size:', control_size)

test_proportion = test_size / (test_size + control_size)
print('test proportion:', round(test_proportion, 4))

test sample size: 45489
control sample size: 44700
test proportion: 0.5044


The test and control groups are almost the same size: the test group (gate_40) is slightly larger than the control group (gate_30), at about 50.4% of all players.

In [6]:
# Task 1: What is the sum_gamerounds distribution for the test and control groups?
experiment_data[['version', 'sum_gamerounds']].groupby('version').describe()

version,count,mean,std,min,25%,50%,75%,max
gate_30,44700.0,52.456264,256.716423,0.0,5.0,17.0,50.0,49854.0
gate_40,45489.0,51.298776,103.294416,0.0,5.0,16.0,52.0,2640.0


In [7]:
# Are the test and control groups significantly different?
stats.ttest_ind(experiment_data[experiment_data.version == 'gate_30'].sum_gamerounds,
                experiment_data[experiment_data.version == 'gate_40'].sum_gamerounds)

Ttest_indResult(statistic=0.8910426211362967, pvalue=0.37290868247405207)

The p-value of about 0.37 (greater than 0.05) means that there is no significant difference in the number of game rounds played during the first 14 days between the test group (gate_40) and the control group (gate_30).
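The two groups have very different standard deviations (256.7 vs. 103.3), driven in part by a maximum of 49854 rounds in gate_30, so it is worth checking that the conclusion does not depend on the equal-variance assumption of the default `ttest_ind`. The following is a minimal sketch, added for illustration and not part of the original analysis, that recomputes the test from the summary statistics in the table above, once with pooled variance and once as Welch's t-test:

```python
from scipy import stats

# Summary statistics for sum_gamerounds, copied from the describe() table above.
mean_30, std_30, n_30 = 52.456264, 256.716423, 44700
mean_40, std_40, n_40 = 51.298776, 103.294416, 45489

# Pooled-variance t-test: the same test that stats.ttest_ind runs by default.
pooled = stats.ttest_ind_from_stats(mean_30, std_30, n_30,
                                    mean_40, std_40, n_40,
                                    equal_var=True)

# Welch's t-test: does not assume the two groups share one variance.
welch = stats.ttest_ind_from_stats(mean_30, std_30, n_30,
                                   mean_40, std_40, n_40,
                                   equal_var=False)

print("pooled:", pooled.statistic, pooled.pvalue)
print("welch: ", welch.statistic, welch.pvalue)
```

Both variants stay well above the 0.05 threshold, so the lack of a significant difference in game rounds does not hinge on the variance assumption.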

Next, let's look at retention one day after install.

## After One Day

In [8]:
# Task 2: Overall 1-day retention.
print(experiment_data['retention_1'].value_counts())
experiment_data['retention_1'].describe()

False    50036
True     40153
Name: retention_1, dtype: int64


count     90189
unique        2
top       False
freq      50036
Name: retention_1, dtype: object

In [9]:
# Task 3: 1-day retention by AB-group.
experiment_data[['version','retention_1']].groupby('version').describe()

version,count,unique,top,freq
gate_30,44700,2,False,24666
gate_40,45489,2,False,25370


One day after install, slightly more players had left than had been retained in both groups (False > True): about 44.8% of gate_30 players and 44.2% of gate_40 players came back.
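The `describe()` output reports only the count of the most frequent value (`False`), so the retention rate has to be derived from it. A small sketch of that arithmetic, using the counts from the table above (the dictionary layout is just an illustrative choice):

```python
# Counts taken from the describe() output above: total players per group and
# how many of them have retention_1 == False.
groups = {
    "gate_30": {"players": 44700, "not_retained": 24666},
    "gate_40": {"players": 45489, "not_retained": 25370},
}

# Retention rate = retained players / all players in the group.
retention_rate = {
    version: (g["players"] - g["not_retained"]) / g["players"]
    for version, g in groups.items()
}

for version, rate in retention_rate.items():
    print(f"{version}: {rate:.4f}")
# gate_30: 0.4482
# gate_40: 0.4423
```

On the full DataFrame, the same rates come from `experiment_data.groupby('version')['retention_1'].mean()`, since the mean of a boolean column is the share of `True` values.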

In [10]:
# Task 4: should we be confident in the difference; zooming in on the difference; the probability of a difference.
# Conduct t-test and compute p-value for 1-day retention.
stats.ttest_ind(experiment_data[experiment_data.version == 'gate_30'].retention_1,
                experiment_data[experiment_data.version == 'gate_40'].retention_1)

Ttest_indResult(statistic=1.7840979256519656, pvalue=0.07441111525563184)

In [1]:
import statsmodels.api as sm
# proportions_ztest expects the number of retained players (successes) per group,
# not the retention proportions: gate_30 = 44700 - 24666, gate_40 = 45489 - 25370.
z1, p_value = sm.stats.proportions_ztest([20034, 20119], [44700, 45489])
print(round(p_value, 3))

0.074


The t-test gives a p-value of 0.0744, which is greater than 0.05, so there is no significant difference in 1-day player retention between the two groups. The z-test on the retained counts gives a p-value of about 0.074, which confirms the t-test result.

This result also indicates that one day might not be enough time to study the impact on player retention. Time plays a huge role in almost all experiments, and not all behaviors are immediate. Sometimes it takes a while to figure out a new feature.   
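The two-proportion z-test for 1-day retention can also be worked through by hand, which makes explicit that it operates on counts of retained players rather than on the proportions themselves. A minimal sketch using only the standard library; the helper name `two_proportion_ztest` is hypothetical, and the counts are derived from the 1-day table (count minus the `False` frequency):

```python
from math import erfc, sqrt


def two_proportion_ztest(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for equal proportions, using the pooled estimate."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis of no difference.
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    std_err = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Retained players after 1 day: gate_30 = 44700 - 24666, gate_40 = 45489 - 25370.
z, p = two_proportion_ztest(20034, 44700, 20119, 45489)
print(z, p)
```

With samples this large, the z statistic and p-value are nearly identical to those of the t-test on the boolean column, which is why the two tests should agree.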

## After Seven Days

One more week goes by, and the experiment is turned off at the end of the second week, in other words after 14 days in total. Now it's time to analyze what happened and see how players are doing.

In [26]:
# Task 5: Overall 7-day retention.
print(experiment_data['retention_7'].value_counts())
experiment_data['retention_7'].describe()

False    73408
True     16781
Name: retention_7, dtype: int64


count     90189
unique        2
top       False
freq      73408
Name: retention_7, dtype: object

In [27]:
# Task 6: 7-day retention by AB-group.
experiment_data[['version','retention_7']].groupby('version').describe()

version,count,unique,top,freq
gate_30,44700,2,False,36198
gate_40,45489,2,False,37210


Again, the results above show that most players had left 7 days after install.

Is the difference between the two groups the same as for 1-day retention? Let's run the t-test again to dig deeper.

In [None]:
# Task 7:t-test and p-value for 7-day retention
stats.ttest_ind(experiment_data[experiment_data.version == 'gate_30'].retention_7,
                experiment_data[experiment_data.version == 'gate_40'].retention_7)

In [2]:
import statsmodels.api as sm
# proportions_ztest expects the number of retained players (successes) per group,
# not the retention proportions: gate_30 = 44700 - 36198, gate_40 = 45489 - 37210.
z1, p_value = sm.stats.proportions_ztest([8502, 8279], [44700, 45489])
print(round(p_value, 3))

0.002


The t-test gives a p-value of 0.00155, which is less than 0.05, so the difference became significant after allowing for more time. Note the direction, however: 7-day retention is higher in the control group gate_30 (about 19.0%) than in the test group gate_40 (about 18.2%).

The z-test on the retained counts gives a p-value of about 0.002, also below 0.05, so it agrees with the t-test. Because retention_1 and retention_7 are binary (False or True) and the quantity of interest is the proportion of True values, the two-proportion z-test is the natural choice here, provided it is given counts rather than proportions.
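The task list at the top also mentions bootstrapping the difference. A minimal sketch of that idea for 7-day retention, assuming only the counts from the table above (8502 of 44700 players retained in gate_30, 8279 of 45489 in gate_40). Resampling a True/False column with replacement is equivalent to drawing the number of retained players from a binomial distribution, which is what the code does; the seed and the number of resamples are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(42)   # fixed seed so the sketch is reproducible
n_boot = 10_000                   # number of bootstrap resamples (arbitrary)

# 7-day counts derived from the table: group size and retained players.
n_30, kept_30 = 44700, 8502
n_40, kept_40 = 45489, 8279

# Each draw is the retention rate of one bootstrap resample of that group.
boot_30 = rng.binomial(n_30, kept_30 / n_30, size=n_boot) / n_30
boot_40 = rng.binomial(n_40, kept_40 / n_40, size=n_boot) / n_40

# Difference in 7-day retention, gate_30 minus gate_40, per resample.
diff = boot_30 - boot_40
prob_30_better = (diff > 0).mean()
ci_low, ci_high = np.percentile(diff, [2.5, 97.5])

print("P(gate_30 retention > gate_40 retention):", prob_30_better)
print("95% bootstrap interval for the difference:", ci_low, ci_high)
```

If almost all resampled differences are positive and the interval excludes zero, the bootstrap tells the same story as the significance tests: 7-day retention is higher with the gate at level 30.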

## Conclusion
Seven days after install, player retention differs significantly between the two groups according to the t-test, and a z-test run on the retained counts (rather than on the proportions) gives the same answer. Retention is higher with the gate at level 30 (about 19.0%) than at level 40 (about 18.2%), so moving the gate to level 40 did not improve retention. One day after install, the difference is not significant at the 0.05 level. The number of game rounds played during the first 14 days after install also shows no significant difference between the two groups. So, it might be worth working with the advertising and engineering teams to figure out the reasons. For example, they could look at decreasing the waiting time to progress, at the in-app purchase, and so on.