# General remarks from preprocessing stage

It has been studied previously, that the membership data, the user log data, and the training member data (including the churn flag) contain no duplicates. Duplicates can be found in the transaction data, those have been removed before aggregating information. The user log data is too large to be dealt with in one go using pandas, due to insufficient memory of my laptop, thus aggregation has to be dealt with in chunks in pandas. All of that has been performed in the DataProcessor_KKBox notebook.

In [20]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import math

from sklearn import metrics
import datetime as dt
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score, cross_val_predict, RandomizedSearchCV, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score, f1_score

import seaborn as sns

import plotly.offline as pyoff
import plotly.graph_objs as go
import plotly.io as pio

## User Activity Time Evolution

Check activity of users by user logs, since we care about churn as inactivity of 30 days, group the user logs in intervals of 30 days. Churn is in fact defined as no renewal of subscription 30 days after a previous membership has been expired. In fact a membership might be ongoing, but a user shows now activity in terms of songs listened to. But user acitivity in terms of played songs can give us an idea about churn, retained or reactivated active users.

In [2]:
df_ul=pd.read_csv("data/user_logs.csv",usecols=['msno','date'])
df_ul["date"]=pd.to_datetime(df_ul["date"], format='%Y%m%d', errors='ignore')
df_ul.head()

Unnamed: 0,msno,date
0,rxIP2f2aN0rYNp+toI0Obt/N/FYQX8hcO1fTmmy2h34=,2015-05-13
1,rxIP2f2aN0rYNp+toI0Obt/N/FYQX8hcO1fTmmy2h34=,2015-07-09
2,yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8=,2015-01-05
3,yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8=,2015-03-06
4,yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8=,2015-05-01


In [3]:
df_ul_v2=pd.read_csv("data/user_logs_V2.csv",usecols=['msno','date'])
df_ul_v2["date"]=pd.to_datetime(df_ul_v2["date"], format='%Y%m%d', errors='ignore')
df_ul_v2

Unnamed: 0,msno,date
0,u9E91QDTvHLq6NXjEaWv8u4QIqhrHk72kE+w31Gnhdg=,2017-03-31
1,nTeWW/eOZA/UHKdD5L7DEqKKFTjaAj3ALLPoAWsU8n0=,2017-03-30
2,2UqkWXwZbIjs03dHLU9KHJNNEvEkZVzm69f3jCS+uLI=,2017-03-31
3,ycwLc+m2O0a85jSLALtr941AaZt9ai8Qwlg9n0Nql5U=,2017-03-31
4,EGcbTofOSOkMmQyN1NMLxHEXJ1yV3t/JdhGwQ9wXjnI=,2017-03-31
...,...,...
18396357,FGpiy2mB+vXLKziYRcY/xJcJEFJfRDfUqlU+p760f7E=,2017-03-14
18396358,iZRjKNMrw5ffEbfXODLhV/0tJLPbOH3am1WYDgqBf8Q=,2017-03-06
18396359,yztw4Y0EggG0w2wPkbMZx7ke7saSx7dLSfMheHZG/DQ=,2017-03-31
18396360,swCHwkNx30/aENjq30qqaLlm7bUUytbMXdz1bH7g0Jk=,2017-03-07


In [4]:
df_ul=df_ul.append(df_ul_v2)
df_ul

Unnamed: 0,msno,date
0,rxIP2f2aN0rYNp+toI0Obt/N/FYQX8hcO1fTmmy2h34=,2015-05-13
1,rxIP2f2aN0rYNp+toI0Obt/N/FYQX8hcO1fTmmy2h34=,2015-07-09
2,yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8=,2015-01-05
3,yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8=,2015-03-06
4,yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8=,2015-05-01
...,...,...
18396357,FGpiy2mB+vXLKziYRcY/xJcJEFJfRDfUqlU+p760f7E=,2017-03-14
18396358,iZRjKNMrw5ffEbfXODLhV/0tJLPbOH3am1WYDgqBf8Q=,2017-03-06
18396359,yztw4Y0EggG0w2wPkbMZx7ke7saSx7dLSfMheHZG/DQ=,2017-03-31
18396360,swCHwkNx30/aENjq30qqaLlm7bUUytbMXdz1bH7g0Jk=,2017-03-07


In [5]:
min_date=df_ul['date'].min()

In [6]:
#df_ul.groupby(["msno",pd.Grouper(key="date",freq="30D", origin='2015-01-01')])

In [7]:
max_date=df_ul['date'].max()

Using the full logging information in terms of active days seems not to work as desired due to high memory consumption while grouping the dates in 30 day intervals, checking on how many days the users had been active. Try now to operate in junks of 180 days, and grouping those per user to reduce the overall load, aka always 6 group periods and then sum up all at the end.

In [8]:
day_period=90
n_time_periods=math.ceil((max_date-min_date)/dt.timedelta(days=day_period))
time_periods=[min_date + i*dt.timedelta(days=day_period) for i in range (0,n_time_periods+1)]
time_periods

[Timestamp('2015-01-01 00:00:00'),
 Timestamp('2015-04-01 00:00:00'),
 Timestamp('2015-06-30 00:00:00'),
 Timestamp('2015-09-28 00:00:00'),
 Timestamp('2015-12-27 00:00:00'),
 Timestamp('2016-03-26 00:00:00'),
 Timestamp('2016-06-24 00:00:00'),
 Timestamp('2016-09-22 00:00:00'),
 Timestamp('2016-12-21 00:00:00'),
 Timestamp('2017-03-21 00:00:00'),
 Timestamp('2017-06-19 00:00:00')]

In [9]:
df_activity_per_user_list=[]
for tp in range(0,len(time_periods)-1):
    print("address periods",time_periods[tp],time_periods[tp+1])
    df_ul_in_period=df_ul[(df_ul['date']>=time_periods[tp]) & (df_ul['date']<time_periods[tp+1])].groupby(["msno",pd.Grouper(
        key="date",freq="30D", origin='2015-01-01')]).agg(
        activedays_per_msno=('msno','count')).reset_index()
    df_activity_per_user_list.append(df_ul_in_period)

address periods 2015-01-01 00:00:00 2015-04-01 00:00:00
address periods 2015-04-01 00:00:00 2015-06-30 00:00:00
address periods 2015-06-30 00:00:00 2015-09-28 00:00:00
address periods 2015-09-28 00:00:00 2015-12-27 00:00:00
address periods 2015-12-27 00:00:00 2016-03-26 00:00:00
address periods 2016-03-26 00:00:00 2016-06-24 00:00:00
address periods 2016-06-24 00:00:00 2016-09-22 00:00:00
address periods 2016-09-22 00:00:00 2016-12-21 00:00:00
address periods 2016-12-21 00:00:00 2017-03-21 00:00:00
address periods 2017-03-21 00:00:00 2017-06-19 00:00:00


In [11]:
large_df_activity_30day_intervalslarge_df_activity_30day_intervalslarge_df_activity_30day_intervals = pd.concat(df_activity_per_user_list, ignore_index=True)
large_df_activity_30day_intervals

Unnamed: 0,msno,date,activedays_per_msno
0,+++IZseRRiQS9aaSkH6cMYU6bGDcxUieAi/tH67sC5s=,2015-01-01,20
1,+++IZseRRiQS9aaSkH6cMYU6bGDcxUieAi/tH67sC5s=,2015-01-31,15
2,+++IZseRRiQS9aaSkH6cMYU6bGDcxUieAi/tH67sC5s=,2015-03-02,30
3,+++l/EXNMLTijfLBa8p2TUVVVp2aFGSuUI/h7mLmthw=,2015-01-01,12
4,+++l/EXNMLTijfLBa8p2TUVVVp2aFGSuUI/h7mLmthw=,2015-01-31,9
...,...,...,...
28702517,zzy0oyiTnRTo5Mbg23oKbBkf9eoaS7+eU4V+d14bzfY=,2017-03-21,8
28702518,zzy7iqSpfcRq7R4hmKKuhI+CJRs79a6pteqEggpiNO0=,2017-03-21,11
28702519,zzyHq6TK2+cBkeGFUHvh12Z7UxFZiSM7dOOSllSBPDw=,2017-03-21,10
28702520,zzz1Dc3P9s53HAowRTrm3fNsWju5yeN4YBfNDq7Z99Q=,2017-03-21,5


In [12]:
df_user_act_retention_30days = pd.crosstab(large_df_activity_30day_intervals['msno'], large_df_activity_30day_intervals['date'].dt.date).reset_index()

df_30days_mindate_of_msno = large_df_activity_30day_intervals.groupby('msno')['date'].min().reset_index()
df_30days_mindate_of_msno=df_30days_mindate_of_msno.rename(columns={"date": "min_activity_date"})

df_user_act_retention_30days= pd.merge(df_user_act_retention_30days,df_30days_mindate_of_msno,on="msno")
df_user_act_retention_30days

Unnamed: 0,msno,2015-01-01,2015-01-31,2015-03-02,2015-04-01,2015-05-01,2015-05-31,2015-06-30,2015-07-30,2015-08-29,...,2016-07-24,2016-08-23,2016-09-22,2016-10-22,2016-11-21,2016-12-21,2017-01-20,2017-02-19,2017-03-21,min_activity_date
0,+++4vcS9aMH7KWdfh5git6nA5fC5jjisd5H/NcM++WM=,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,2015-04-01
1,+++EI4HgyhgcJHIPXk/VRP7bt17+2joG39T6oEfJ+tc=,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,2016-03-26
2,+++FOrTS7ab3tIgIh8eWwX4FqRv8w/FoiOuyXsFvphY=,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,2016-08-23
3,+++IZseRRiQS9aaSkH6cMYU6bGDcxUieAi/tH67sC5s=,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,2015-01-01
4,+++TipL0Kt3JvgNE9ahuJ8o+drJAnQINtxD4c5GePXI=,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,2015-12-27
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5339417,zzzqx+aMPSFYjW71JqJ6T/hita+iVemVWzJTE4yQRx8=,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,2016-02-25
5339418,zzztPAN9xjMytpZ0RN2gU9mScDULJnHQZK8eZb4uELU=,1,1,1,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,2015-01-01
5339419,zzztsqkufVj9DPVJDM3FxDkhlbCL5z4aiYxgPSGkIK4=,0,1,1,1,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,2015-01-31
5339420,zzzueVTwIa5TjXnG2c77bohCVkuksqLkd5mQTP0wTwQ=,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,2017-02-19


In [13]:
check_dates_30d = df_user_act_retention_30days.columns[1:-1]
print(check_dates_30d)

Index([2015-01-01, 2015-01-31, 2015-03-02, 2015-04-01, 2015-05-01, 2015-05-31,
       2015-06-30, 2015-07-30, 2015-08-29, 2015-09-28, 2015-10-28, 2015-11-27,
       2015-12-27, 2016-01-26, 2016-02-25, 2016-03-26, 2016-04-25, 2016-05-25,
       2016-06-24, 2016-07-24, 2016-08-23, 2016-09-22, 2016-10-22, 2016-11-21,
       2016-12-21, 2017-01-20, 2017-02-19, 2017-03-21],
      dtype='object')


In [15]:
retention_array = []
for i in range(len(check_dates_30d)-1):
    retention_data = {}
    selected_check_date = check_dates_30d[i+1]
    prev_check_date = check_dates_30d[i]
    retention_data['min_activity_date'] = selected_check_date
    retention_data['TotalActiveUserCount'] = df_user_act_retention_30days[selected_check_date].sum()
    retention_data['NewUserCount'] = df_user_act_retention_30days[(df_user_act_retention_30days[selected_check_date]>0) & (selected_check_date==df_user_act_retention_30days['min_activity_date'].dt.date)][selected_check_date].sum()
    retention_data['RetainedUserCount'] = df_user_act_retention_30days[(df_user_act_retention_30days[selected_check_date]>0) & (df_user_act_retention_30days[prev_check_date]>0)][selected_check_date].sum()
    retention_data['InactiveUserCount'] = df_user_act_retention_30days[(df_user_act_retention_30days[selected_check_date]==0) & (df_user_act_retention_30days[prev_check_date]>0)][selected_check_date].count()
    #Customers reactivated after at least 1 period of inactivity
    retention_data['Reactivated_1Period_PLUS_UserCount'] = df_user_act_retention_30days[(df_user_act_retention_30days[selected_check_date]>0) & (df_user_act_retention_30days[prev_check_date]==0) & (prev_check_date>df_user_act_retention_30days['min_activity_date'].dt.date)][selected_check_date].count()
    # Customers reactivated after at least 2 periods of inactivity, we can only check that if we are two periods in
    if (i>0):
        second_prev_check_date = check_dates_30d[i-1] # if months[i-1].isna() == False else 0
        retention_data['Reactivated_2Period_PLUS_UserCount'] = df_user_act_retention_30days[(df_user_act_retention_30days[selected_check_date]>0) & (df_user_act_retention_30days[prev_check_date]==0) & (df_user_act_retention_30days[second_prev_check_date]==0) & (prev_check_date>df_user_act_retention_30days['min_activity_date'].dt.date)][selected_check_date].count()
    else:
        retention_data['Reactivated_2Period_PLUS_UserCount'] = 0
    # Customers reactivated after at least 3 periods of inactivity
    if (i>1):
        second_prev_check_date = check_dates_30d[i-1] # if months[i-1].isna() == False else 0
        third_prev_check_date = check_dates_30d[i-2] # if months[i-2].isna() == False else 0
        retention_data['Reactivated_3Period_PLUS_UserCount'] = df_user_act_retention_30days[(df_user_act_retention_30days[selected_check_date]>0) & (df_user_act_retention_30days[prev_check_date]==0) & (df_user_act_retention_30days[second_prev_check_date]==0) & (df_user_act_retention_30days[third_prev_check_date]==0) & (prev_check_date>df_user_act_retention_30days['min_activity_date'].dt.date)][selected_check_date].count()
    else:
        retention_data['Reactivated_3Period_PLUS_UserCount'] = 0
        
    # Customers reactivated after at ONLY 1 period of inactivity
    if(i==0):
        retention_data['Reactivated_1Period_ONLY_UserCount'] = df_user_act_retention_30days[(df_user_act_retention_30days[selected_check_date]>0) & (df_user_act_retention_30days[prev_check_date]==0) &  (prev_check_date>df_user_act_retention_30days['min_activity_date'].dt.date)][selected_check_date].count()
    else: 
        retention_data['Reactivated_1Period_ONLY_UserCount'] = df_user_act_retention_30days[(df_user_act_retention_30days[selected_check_date]>0) & (df_user_act_retention_30days[prev_check_date]==0) & (df_user_act_retention_30days[second_prev_check_date]>0) & (prev_check_date>df_user_act_retention_30days['min_activity_date'].dt.date)][selected_check_date].count() 
    
    # Customers reactivated after at ONLY 2 periods of inactivity
    if(i>0):
        second_prev_check_date = check_dates_30d[i-1] # if months[i-1].isna() == False else 0
        retention_data['Reactivated_2Period_ONLY_UserCount'] = df_user_act_retention_30days[(df_user_act_retention_30days[selected_check_date]>0) & (df_user_act_retention_30days[prev_check_date]==0) & (df_user_act_retention_30days[second_prev_check_date]==0) & (df_user_act_retention_30days[third_prev_check_date]>0) & (prev_check_date>df_user_act_retention_30days['min_activity_date'].dt.date)][selected_check_date].count()
    else:
        retention_data['Reactivated_2Period_ONLY_UserCount'] = 0
    # Customers reactivated after at ONLY 3 periods of inactivity
    third_prev_check_date = check_dates_30d[i-2] # if months[i-2].isna() == False else 0
    if(i>2):
        second_prev_check_date = check_dates_30d[i-1] # if months[i-1].isna() == False else 0
        fourth_prev_check_date = check_dates_30d[i-3] # if months[i-2].isna() == False else 0
        third_prev_check_date = check_dates_30d[i-2] # if months[i-2].isna() == False else 0
        retention_data['Reactivated_3Period_ONLY_UserCount'] = df_user_act_retention_30days[(df_user_act_retention_30days[selected_check_date]>0) & (df_user_act_retention_30days[prev_check_date]==0) & (df_user_act_retention_30days[second_prev_check_date]==0) & (df_user_act_retention_30days[third_prev_check_date]==0) & (prev_check_date>df_user_act_retention_30days['min_activity_date'].dt.date)][selected_check_date].count()
    else:
        retention_data['Reactivated_3Period_ONLY_UserCount'] = 0
    retention_array.append(retention_data)

In [16]:
tx_retention_plot = pd.DataFrame(retention_array)
tx_retention_plot["TotalActiveUserCount_prevPeriod"]=tx_retention_plot["TotalActiveUserCount"].shift(1)
tx_retention_plot["ActiveUserRate"]=tx_retention_plot["TotalActiveUserCount"]/tx_retention_plot["TotalActiveUserCount_prevPeriod"]
tx_retention_plot["NewUserRate"]=tx_retention_plot["NewUserCount"]/tx_retention_plot["TotalActiveUserCount_prevPeriod"]
tx_retention_plot["RetainedUserRate"]=tx_retention_plot["RetainedUserCount"]/tx_retention_plot["TotalActiveUserCount_prevPeriod"]
tx_retention_plot["ChurnRate"]=tx_retention_plot["InactiveUserCount"]/tx_retention_plot["TotalActiveUserCount_prevPeriod"]
tx_retention_plot["ReactivationRate"]=tx_retention_plot["Reactivated_1Period_PLUS_UserCount"]/tx_retention_plot["TotalActiveUserCount_prevPeriod"]
#print(np.max(tx_retention_plot["TotalActiveUserCount"]-tx_retention_plot["NewUserCount"]-tx_retention_plot["RetainedUserCount"]-tx_retention_plot["Reactivated_1Period_PLUS_UserCount"]))
tx_retention_plot.head(50)

Unnamed: 0,min_activity_date,TotalActiveUserCount,NewUserCount,RetainedUserCount,InactiveUserCount,Reactivated_1Period_PLUS_UserCount,Reactivated_2Period_PLUS_UserCount,Reactivated_3Period_PLUS_UserCount,Reactivated_1Period_ONLY_UserCount,Reactivated_2Period_ONLY_UserCount,Reactivated_3Period_ONLY_UserCount,TotalActiveUserCount_prevPeriod,ActiveUserRate,NewUserRate,RetainedUserRate,ChurnRate,ReactivationRate
0,2015-01-31,948986,210603,738383,191354,0,0,0,0,0,0,,,,,,
1,2015-03-02,936596,172092,739768,209218,24736,0,0,24736,0,0,948986.0,0.986944,0.181343,0.779535,0.220465,0.026066
2,2015-04-01,939930,145035,751170,185426,43725,13984,0,29741,13984,0,936596.0,1.00356,0.154853,0.802021,0.197979,0.046685
3,2015-05-01,917844,118848,754980,184950,44016,19584,6842,24432,12742,6842,939930.0,0.976503,0.126443,0.80323,0.19677,0.046829
4,2015-05-31,918345,116150,751769,166075,50426,23933,12765,26493,11168,12765,917844.0,1.000546,0.126547,0.81906,0.18094,0.05494
5,2015-06-30,866356,119409,713611,204734,33336,15762,8462,17574,7300,8462,918345.0,0.943388,0.130026,0.777062,0.222938,0.0363
6,2015-07-30,907676,167353,703728,162628,36595,17419,10249,19176,7170,10249,866356.0,1.047694,0.193169,0.812285,0.187715,0.04224
7,2015-08-29,907009,137981,730609,177067,38419,20697,11783,17722,8914,11783,907676.0,0.999265,0.152016,0.804923,0.195077,0.042327
8,2015-09-28,992513,214981,742114,164895,35418,20591,13955,14827,6636,13955,907009.0,1.09427,0.237022,0.818199,0.181801,0.039049
9,2015-10-28,1039647,242735,761638,230875,35274,20192,14093,15082,6099,14093,992513.0,1.04749,0.244566,0.767383,0.232617,0.03554


In [24]:
from datetime import datetime, timedelta,date

tx_retention_plot=tx_retention_plot[(tx_retention_plot["min_activity_date"]>datetime.strptime("2015-01-01", '%Y-%m-%d').date())
                                & (tx_retention_plot["min_activity_date"]<datetime.strptime("2017-03-21", '%Y-%m-%d').date())]

In [26]:
plot_data = [
    go.Scatter(
        x=tx_retention_plot['min_activity_date'],
        y=tx_retention_plot['TotalActiveUserCount'],
        name="TotalActiveUserCount"
    ),
    go.Scatter(
        x=tx_retention_plot['min_activity_date'],
        y=tx_retention_plot['NewUserCount'],
        name="NewUserCount"
    ),
    go.Scatter(
        x=tx_retention_plot['min_activity_date'],
        y=tx_retention_plot['RetainedUserCount'],
        name="RetainedUserCount"
    ),
    # Inactive customers keep adding up as we know from Recency, therefore this is not shown to avoid squeezing the chart
    go.Scatter(
       x=tx_retention_plot['min_activity_date'],
       y=tx_retention_plot['InactiveUserCount'],
       name="ChurnedUserCount"
    ),
    go.Scatter(
        x=tx_retention_plot['min_activity_date'],
        y=tx_retention_plot['Reactivated_1Period_PLUS_UserCount'],
        name="ReactivatedUserCount"
    )
]

plot_layout = go.Layout(
        xaxis={"type": "category"},
        title='Periodical (30 days) Users Evolution'
    )
fig = go.Figure(data=plot_data, layout=plot_layout)
pyoff.iplot(fig)
pio.write_image(fig, 'figures/KKBox_30d_UserEvolution_AllAbsoluteNumbers.png')

In [30]:
tx_retention_plot=tx_retention_plot[(tx_retention_plot["min_activity_date"]>datetime.strptime("2015-01-31", '%Y-%m-%d').date())
                                & (tx_retention_plot["min_activity_date"]<datetime.strptime("2017-03-21", '%Y-%m-%d').date())]

In [33]:
plot_data = [

    go.Scatter(
        x=tx_retention_plot['min_activity_date'],
        y=tx_retention_plot['RetainedUserRate'],
        name="Retained User Rate"
    ),
    go.Scatter(
        x=tx_retention_plot['min_activity_date'],
        y=tx_retention_plot['ReactivationRate'],
        name="Reactivated User Rate"
    ),
    go.Scatter(
        x=tx_retention_plot['min_activity_date'],
        y=tx_retention_plot['ChurnRate'],
        name="Churn Rate"
    )
]

plot_layout = go.Layout(
        xaxis={"type": "category"},
        title='Periodical (30 days) Users Evolution',
        yaxis={"tickformat": ',.0%',
                "range": [0,1.0]}
    )
fig = go.Figure(data=plot_data, layout=plot_layout)
fig.update_layout(
    font=dict(size=16),
    legend=dict(
    #title_text='Rate To Previous Active Users',
    yanchor="top",
    y=0.75,
    xanchor="right",
    x=0.80
))
pyoff.iplot(fig)
pio.write_image(fig, 'figures/KKBox_30d_UserEvolution_30d_UserEvolution_Retained_Reactivation_Rates.png')

### Overall User Evolution

Studied in periods of 30 days the retained user rate is rather flat around 80 %, the reactivated user rate is typically of the order of 3-4 %, thus the churn rate is around 20 %, slightly lower in the first quarter of 2017. Definitions are defined as activity in terms of users streaming songs.