In [2]:
import pandas as pd  # type: ignore
from pandasql import sqldf # type: ignore
data = pd.read_csv('IndieZ.csv')

## **1\. User experience comparison**

**Main problem**: Identify how the user experience changed between the two versions.

**Hypothesis**: Given the reports of a bad user experience in the tutorial and my own experience after several test runs, it is apparent that the tutorial section failed to explain the game's rules. I theorized that, not knowing how to play, a good number of players would struggle at level 1 even though it is supposed to be the easiest level. To test this, I first wrote a query to establish the tutorial completion rate.

In [3]:
sqldf('''
SELECT 
    nop.version,
    nop.user as num_of_player,
    notc.user as tut_completed,
    (CAST(notc.user as FLOAT) / CAST(nop.user as FLOAT)) *100 
        as tut_complete_percentage
FROM (
    SELECT 
        version,
        COUNT(DISTINCT user) as user 
    FROM 
        data
    GROUP BY 
        version
) as nop
JOIN (
    SELECT 
        version,
        COUNT(DISTINCT user) as user
    FROM 
        data
    WHERE 
        quantity = -2 
    GROUP BY 
        version
) as notc
ON nop.version = notc.version 
''')

| | version | num_of_player | tut_completed | tut_complete_percentage |
|---|---|---|---|---|
| 0 | 1.5.2 | 6671 | 6341 | 95.053215 |
| 1 | 1.6.0 | 6929 | 6571 | 94.833309 |


The tutorial completion rate is fairly similar across both versions (about 95%). Using this as a baseline, I compiled another query to get the number of players who completed the tutorial but lost at level 1:

In [4]:
sqldf('''
SELECT 
    tut_complete.version,
    tut_complete.user as tut_completed_user,
    lv_1_lost.user as lv_1_lost_player,
    ROUND(CAST(lv_1_lost.user as FLOAT) / CAST(tut_complete.user as FLOAT) * 100, 2) as percentage
FROM (
    SELECT 
        version,
        COUNT(DISTINCT user) as user 
    FROM 
        data
    WHERE 
        quantity = -2 
    GROUP BY 
        version
) as tut_complete
JOIN (
    SELECT 
        version,
        COUNT(DISTINCT user) as user
    FROM 
        data
    WHERE 
        user IN (
            SELECT  
                user
            FROM 
                data
            WHERE 
                event_name = 'tutorial' AND quantity = -2) 
        AND level = 1 AND win = 0
    GROUP BY 
        version
) as lv_1_lost
ON tut_complete.version = lv_1_lost.version 
''')

| | version | tut_completed_user | lv_1_lost_player | percentage |
|---|---|---|---|---|
| 0 | 1.5.2 | 6341 | 1561 | 24.62 |
| 1 | 1.6.0 | 6571 | 1373 | 20.89 |


Following the same logic that links uncertainty about the game's rules to the level 1 loss rate, I hypothesized that the number of losses would drop significantly at level 2, since most players would understand how the game works by that point:

In [5]:
sqldf('''
SELECT 
    tut_complete.version,
    tut_complete.player as num_of_lv_2_player,
    lost.player as lv_2_lost_player,
    ROUND(CAST(lost.player as FLOAT) / CAST(tut_complete.player as FLOAT) * 100, 2) as percentage
FROM (
    SELECT 
        version,
        COUNT(DISTINCT user) as player 
    FROM 
        data
    WHERE 
        level = 2
    GROUP BY 
        version
) as tut_complete
JOIN (
    SELECT 
        version,
        COUNT(DISTINCT user) as player
    FROM 
        data
    WHERE 
        user IN (
            SELECT  
                user
            FROM 
                data
            WHERE 
                event_name = 'tutorial' AND quantity = -2) 
            AND 
                level = 2 AND win = 0
    GROUP BY 
        version
) as lost
ON tut_complete.version = lost.version
''')

| | version | num_of_lv_2_player | lv_2_lost_player | percentage |
|---|---|---|---|---|
| 0 | 1.5.2 | 6223 | 56 | 0.9 |
| 1 | 1.6.0 | 6396 | 48 | 0.75 |


As expected, the loss rate dropped dramatically, from more than 20% at level 1 to less than 1% at level 2, in both versions.

**Key findings**

- The bad user experience caused by the vague tutorial section can be identified from the number of players who struggle at the very first level of the game. This is further supported by the large difference between the loss rates of level 1 and level 2. The loss rates of levels beyond 2 most likely reflect each level's difficulty.

- The percentage of players who completed the tutorial but lost at the first level dropped from 24.62% in version 1.5.2 to 20.89% in version 1.6.0, which may indicate that the new tutorial provides a better user experience. However, further inspection is required to determine the real impact of version 1.6.0's changes.
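
One way to inspect the level 1 drop further is a two-proportion z-test on the counts reported in the level 1 table. The sketch below is not part of the original analysis; it uses only the standard library and assumes that the two versions contain separate, independent groups of players. The function name `two_proportion_z_test` is introduced here for illustration.

```python
import math

def two_proportion_z_test(successes_a, total_a, successes_b, total_b):
    """Two-sided z-test for the difference between two independent proportions."""
    p_a = successes_a / total_a
    p_b = successes_b / total_b
    # Pooled proportion under the null hypothesis that both rates are equal.
    pooled = (successes_a + successes_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Counts copied from the level 1 output: players who completed the tutorial,
# and those among them who lost at level 1, for versions 1.5.2 and 1.6.0.
z, p_value = two_proportion_z_test(1561, 6341, 1373, 6571)
print(f"z = {z:.2f}, p-value = {p_value:.2g}")
```

A small p-value would suggest that the difference in loss rates is unlikely to be sampling noise, although it says nothing about whether the tutorial change is the cause.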

## **2\. Versions evaluation**

**Main problem**: Whether the changes in version 1.6.0 are significant enough for it to replace version 1.5.2.

**Hypothesis**: Based on the earlier inspection, version 1.6.0 could benefit from further improvement before launch.

### **Retention rate** 

Using both SQL and pandas, I compiled a data frame to calculate the retention rate of each version:

In [14]:
df_152 = sqldf('''
    SELECT 
        day_diff,
        COUNT(DISTINCT user) num_of_player
    FROM 
        data 
    WHERE 
        version = '1.5.2' 
    GROUP BY 
        day_diff
''')
# Total number of distinct players on version 1.5.2 (denominator for the retention rate)
player_count_152 = data[data['version'] == '1.5.2'].groupby('user')['day_diff'].max().value_counts().sum() 

df_160 = sqldf('''
    SELECT 
        day_diff,
        COUNT(DISTINCT user) num_of_player
    FROM 
        data 
    WHERE 
        version = '1.6.0' 
    GROUP BY 
        day_diff
''')
# Total number of distinct players on version 1.6.0 (denominator for the retention rate)
player_count_160 = data[data['version'] == '1.6.0'].groupby('user')['day_diff'].max().value_counts().sum()

In [36]:
retention_rate = pd.DataFrame({
    'ver_152': df_152['num_of_player'],
    'retention_rate_152' : round(df_152['num_of_player'] 
                                 / player_count_152 * 100, 2),
    'ver_160': df_160['num_of_player'],
    'retention_rate_160' : round(df_160['num_of_player'] 
                                 / player_count_160 * 100, 2) 
})
retention_rate.index.name = 'day_dif'
retention_rate.drop(index = 0, inplace = True)
retention_rate

| day_dif | ver_152 | retention_rate_152 | ver_160 | retention_rate_160 |
|---|---|---|---|---|
| 1 | 1780 | 26.68 | 1981 | 28.59 |
| 2 | 891 | 13.36 | 1036 | 14.95 |
| 3 | 517 | 7.75 | 714 | 10.3 |
| 4 | 385 | 5.77 | 520 | 7.5 |
| 5 | 275 | 4.12 | 423 | 6.1 |
| 6 | 202 | 3.03 | 334 | 4.82 |
| 7 | 183 | 2.74 | 289 | 4.17 |
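
The retention frame lines up `df_152` and `df_160` by row position, which works as long as both query results contain the same `day_diff` values in the same order. A more defensive sketch aligns the two results on `day_diff` explicitly. The helper `retention_table` and the toy inputs are hypothetical and only illustrate the idea; they are not the original computation.

```python
import pandas as pd

def retention_table(counts_by_version, totals_by_version):
    """Build a retention table indexed by day_diff.

    counts_by_version: dict of label -> DataFrame with columns
        'day_diff' and 'num_of_player' (same shape as df_152 / df_160).
    totals_by_version: dict of label -> total distinct players.
    """
    columns = {}
    for label, counts in counts_by_version.items():
        players = counts.set_index('day_diff')['num_of_player']
        columns[f'ver_{label}'] = players
        columns[f'retention_rate_{label}'] = (
            players / totals_by_version[label] * 100
        ).round(2)
    # pandas aligns the Series on their day_diff index, so a missing day
    # becomes NaN instead of silently shifting rows.
    table = pd.DataFrame(columns)
    table.index.name = 'day_diff'
    # Row 0 is dropped as in the original cell (assumed to be the install day).
    return table.drop(index=0, errors='ignore')

# Hypothetical toy inputs, not the real dataset.
toy_152 = pd.DataFrame({'day_diff': [0, 1, 2], 'num_of_player': [100, 30, 10]})
toy_160 = pd.DataFrame({'day_diff': [0, 1, 3], 'num_of_player': [200, 70, 20]})
toy_table = retention_table(
    {'152': toy_152, '160': toy_160},
    {'152': 100, '160': 200},
)
print(toy_table)
```

With the real data, the same call would take `df_152`, `df_160`, `player_count_152` and `player_count_160` in place of the toy values.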


Day 1 retention favors version 1.6.0, and retention on the following days shows that players also dropped the game at a slightly lower rate. Since I lack experience in this field, I researched mobile game retention rates and concluded that a day 1 retention rate of 30% is desirable. With this in mind, I added a calculated field to evaluate how large the increase is.

In [37]:
retention_rate['increase_rate'] = round(retention_rate['retention_rate_160'] 
                                        / retention_rate['retention_rate_152'] * 100 - 100, 2)

In [38]:
retention_rate[retention_rate.index == 1]

| day_dif | ver_152 | retention_rate_152 | ver_160 | retention_rate_160 | increase_rate |
|---|---|---|---|---|---|
| 1 | 1780 | 26.68 | 1981 | 28.59 | 7.16 |


With a relative increase of 7.16% in day 1 retention (from 26.68% to 28.59%, or 1.91 percentage points), it is safe to conclude that the change is on the favorable side.
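
The 7.16% figure is a relative increase, which is easy to confuse with a gain in percentage points. A short worked sketch using the day 1 rates from the output and the 30% target mentioned earlier (the variable names are introduced here for illustration):

```python
rate_152 = 26.68  # day 1 retention of version 1.5.2, in percent
rate_160 = 28.59  # day 1 retention of version 1.6.0, in percent
target = 30.0     # desirable day 1 retention found during research, in percent

# Absolute gain, in percentage points.
absolute_gain = round(rate_160 - rate_152, 2)
# Relative gain, the same formula as the increase_rate column.
relative_gain = round(rate_160 / rate_152 * 100 - 100, 2)
# Remaining distance between version 1.6.0 and the target, in percentage points.
gap_to_target = round(target - rate_160, 2)

print(absolute_gain, relative_gain, gap_to_target)
```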

### **Checking a variety of other metrics**

This section compares several other relevant metrics.

#### ***The level where most players drop the game*** 

The initial idea was to compare the average of the max level reached by each player, but because of some outliers in the dataset, comparing the other summary statistics is more suitable.

In [39]:
max_lv_152 = data[data['version'] == '1.5.2'].groupby('user')['level'].max().describe()
max_lv_160 = data[data['version'] == '1.6.0'].groupby('user')['level'].max().describe()

max_lv = pd.DataFrame({
    'stat' : max_lv_152.index,
    'lv_152' : max_lv_152.values,
    'lv_160' : max_lv_160.values
})

max_lv 

| | stat | lv_152 | lv_160 |
|---|---|---|---|
| 0 | count | 6671.0 | 6929.0 |
| 1 | mean | 9.41478 | 10.321691 |
| 2 | std | 17.068243 | 19.521062 |
| 3 | min | 1.0 | 1.0 |
| 4 | 25% | 3.0 | 4.0 |
| 5 | 50% | 6.0 | 6.0 |
| 6 | 75% | 11.0 | 11.0 |
| 7 | max | 376.0 | 519.0 |


The max level statistics came out very similar for both versions: the median is 6 in each, meaning half of the players stopped at level 6 or earlier.
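
Since the mean is pulled up by a few extreme players, another outlier-resistant view is the share of players whose max level reaches a given threshold. The sketch below uses hypothetical toy values; `share_reaching` is a helper introduced here, not part of the original notebook.

```python
import pandas as pd

def share_reaching(max_levels, thresholds):
    """Percentage of players whose highest level is at least each threshold."""
    max_levels = pd.Series(max_levels)
    return pd.Series(
        {level: round((max_levels >= level).mean() * 100, 2) for level in thresholds},
        name='pct_players',
    )

# Hypothetical toy max levels with one extreme outlier, not the real dataset.
toy_max_levels = [1, 2, 3, 6, 6, 8, 11, 400]
reach = share_reaching(toy_max_levels, thresholds=[2, 6, 11])
print(reach)
```

With the real data, `max_levels` could be `data[data['version'] == '1.5.2'].groupby('user')['level'].max()`, computed once per version and compared side by side.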

#### ***Average total playtime across versions***

In [3]:
df_playtime = sqldf('''
SELECT 
    user,
    SUM(quantity) as play_time,
    version
FROM 
    data 
WHERE 
    event_name = 'game_end' 
GROUP BY 
    user, version
''')

playtime_152 = df_playtime[df_playtime['version'] == '1.5.2'].describe()
playtime_160 = df_playtime[df_playtime['version'] == '1.6.0'].describe()

playtime = pd.DataFrame({
    'ver_152' : playtime_152['play_time'],
    'ver_160' : playtime_160['play_time']
})
playtime

| | ver_152 | ver_160 |
|---|---|---|
| count | 6435.0 | 6607.0 |
| mean | 779.674903 | 988.136673 |
| std | 3482.261223 | 4342.895857 |
| min | 6.0 | 7.0 |
| 25% | 61.0 | 69.0 |
| 50% | 133.0 | 153.0 |
| 75% | 372.0 | 451.0 |
| max | 97182.0 | 118078.0 |


Playtime increased modestly in version 1.6.0, with higher values for the mean, the median, and both the first and third quartiles. As with the other statistics, however, the significance of this increase remains open to question.
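
One possible way to probe that question for a heavily skewed metric like playtime is a bootstrap confidence interval for the difference in medians. This is a sketch under stated assumptions, not the original method: it relies on NumPy, the function `bootstrap_median_diff` is introduced here, and the inputs are synthetic log-normal samples rather than the real playtime data.

```python
import numpy as np

def bootstrap_median_diff(sample_a, sample_b, n_resamples=2000, seed=0):
    """95% bootstrap confidence interval for median(sample_b) - median(sample_a)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(sample_a, dtype=float)
    b = np.asarray(sample_b, dtype=float)
    diffs = np.empty(n_resamples)
    for i in range(n_resamples):
        # Resample each group with replacement, keeping its original size.
        resampled_a = rng.choice(a, size=a.size, replace=True)
        resampled_b = rng.choice(b, size=b.size, replace=True)
        diffs[i] = np.median(resampled_b) - np.median(resampled_a)
    low, high = np.percentile(diffs, [2.5, 97.5])
    return low, high

# Synthetic, skewed toy samples standing in for per-player playtime.
toy_rng = np.random.default_rng(42)
toy_playtime_152 = toy_rng.lognormal(mean=5.0, sigma=1.0, size=2000)
toy_playtime_160 = toy_rng.lognormal(mean=5.5, sigma=1.0, size=2000)

low, high = bootstrap_median_diff(toy_playtime_152, toy_playtime_160)
print(f"95% CI for the median difference: [{low:.1f}, {high:.1f}]")
```

With the real data, the two samples would be the `play_time` column of `df_playtime` filtered by version. An interval that excludes zero would suggest the shift in the median is unlikely to be resampling noise.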

#### **_Average total user engagement_**

In [4]:
df_ue = sqldf('''
SELECT 
    user,
    COUNT(event_name) as user_engagement,
    version
FROM 
    data 
WHERE 
    event_name = 'user_engagement' 
GROUP BY 
    user, version
''')

ue_152 = df_ue[df_ue['version'] == '1.5.2'].describe()
ue_160 = df_ue[df_ue['version'] == '1.6.0'].describe()

ue = pd.DataFrame({
    'ver_152' : ue_152['user_engagement'],
    'ver_160' : ue_160['user_engagement']
})
ue

| | ver_152 | ver_160 |
|---|---|---|
| count | 6671.0 | 6911.0 |
| mean | 13.515815 | 15.350166 |
| std | 35.046254 | 34.540717 |
| min | 1.0 | 1.0 |
| 25% | 3.0 | 4.0 |
| 50% | 6.0 | 7.0 |
| 75% | 13.0 | 14.0 |
| max | 1276.0 | 952.0 |


All three quartiles are one engagement higher in version 1.6.0 than in version 1.5.2 (3 to 4, 6 to 7, and 13 to 14).
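
Every per-version comparison in this section treats the two versions as separate groups of players. A quick sketch to verify that no player has events recorded under both versions is shown below; the helper `users_in_multiple_versions` and the toy events are hypothetical, while the column names `user` and `version` come from the dataset.

```python
import pandas as pd

def users_in_multiple_versions(events):
    """Return the users that have events recorded under more than one version."""
    versions_per_user = events.groupby('user')['version'].nunique()
    return sorted(versions_per_user[versions_per_user > 1].index)

# Hypothetical toy events, not the real dataset.
toy_events = pd.DataFrame({
    'user': ['a', 'a', 'b', 'c', 'c'],
    'version': ['1.5.2', '1.5.2', '1.6.0', '1.5.2', '1.6.0'],
})
overlap = users_in_multiple_versions(toy_events)
print(overlap)
```

Running `users_in_multiple_versions(data)` on the real frame and getting an empty list would support comparing the per-user aggregates by version as done here.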

**Key findings**

- Version 1.6.0 shows improvement on all relevant metrics, including retention rate, playtime and user engagement count. However, the size of the increase could be deemed insignificant depending on the business objective.

**Acknowledgement**

- Because I lack experience in this field, my judgement of the significance of version 1.6.0's improvements might not be reliable. Nevertheless, I still believe that version 1.6.0 could benefit from further improvement before launch in order to yield better results.

## **3\. Recommendation**

**Keep a minimal aesthetic**

- When playing the game, I found the transition to the final artwork somewhat satisfying. This satisfaction could be enhanced with a minimal UI that uses only the colors necessary to instruct the player (slide 1). The transition from a nearly empty screen to a colorful illustration may be more satisfying for the user.

**Illustration**: https://view.genial.ly/661f45f4619969001404849b/presentation-indiez-tutorial-ui-demo 

**Emphasize the color**

- One issue I had with the tutorial was the red circle that highlights the number. I would prefer something similar to the illustration, for more visual clarity and a better understanding of the game's rules (slide 2).

- Since the game has a colorful theme, it would be reasonable to emphasize it with color. My idea for integrating the color theme into the game is to highlight different elements of the tutorial using different colors. For example, the tutorial in version 1.6.0 could be greatly improved by using a color highlighter and animation to instruct the player (slide 2). The same principle can be applied when the player completes a row or column (slide 3).

