
In [2]:
import pandas as pd
import numpy as np
import plotly.graph_objects as go
import plotly.express as px
import plotly.io as pio



df = pd.read_csv("https://raw.githubusercontent.com/weibb123/user_segmentation_4App/main/userbehaviour.csv")
df.head(5)

   userid  Average Screen Time  Average Spent on App (INR)  Left Review  \
0    1001                 17.0                       634.0            1   
1    1002                  0.0                        54.0            0   
2    1003                 37.0                       207.0            0   
3    1004                 32.0                       445.0            1   
4    1005                 45.0                       427.0            1   

   Ratings  New Password Request  Last Visited Minutes       Status  
0        9                     7                  2990    Installed  
1        4                     8                 24008  Uninstalled  
2        8                     5                   971    Installed  
3        6                     2                   799    Installed  
4        5                     6                  3668    Installed  


In [3]:
print(f'Average screen time = {df["Average Screen Time"].mean()}')
print(f'Highest screen time = {df["Average Screen Time"].max()}')
print(f'Lowest screen time = {df["Average Screen Time"].min()}')

Average screen time = 24.39039039039039
Highest screen time = 50.0
Lowest screen time = 0.0


In [4]:
print(f'Average amount spent on app (INR) = {df["Average Spent on App (INR)"].mean()}')
print(f'Highest amount spent on app (INR) = {df["Average Spent on App (INR)"].max()}')
print(f'Lowest amount spent on app (INR) = {df["Average Spent on App (INR)"].min()}')

Average amount spent on app (INR) = 424.4154154154154
Highest amount spent on app (INR) = 998.0
Lowest amount spent on app (INR) = 0.0


Next, look at the relationship between spending capacity and screen time for active users and for users who have uninstalled the app:



In [5]:
figure = px.scatter(data_frame = df,
                    x = "Average Screen Time",
                    y = "Average Spent on App (INR)",
                    size = "Average Spent on App (INR)",
                    color = "Status",
                    title = "Relationship Between Spending Capacity and Screentime",
                    trendline = "ols")
figure.show()

Users who kept the app installed show higher average screen time and higher average spending. In other words, users tend to stick with the app once they start using it.

Users who uninstalled the app tend to leave early, before really trying it out.
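
The reading of the scatter plot above can also be checked numerically by summarizing the two columns per `Status` group. This is a minimal sketch, assuming a frame with the same column names as the dataset; the `demo` rows are hypothetical stand-ins, and in the notebook the same `groupby` call would be applied to `df`.

```python
import pandas as pd

# Hypothetical rows with the same column names as the notebook's df.
demo = pd.DataFrame({
    "Average Screen Time": [17.0, 0.0, 37.0, 32.0, 45.0, 3.0],
    "Average Spent on App (INR)": [634.0, 54.0, 207.0, 445.0, 427.0, 30.0],
    "Status": ["Installed", "Uninstalled", "Installed",
               "Installed", "Installed", "Uninstalled"],
})

# Mean and median of screen time and spending for each Status group.
summary = (
    demo.groupby("Status")[["Average Screen Time", "Average Spent on App (INR)"]]
        .agg(["mean", "median"])
)
print(summary)
```

If the installed group shows clearly higher means and medians on both columns, that supports the interpretation drawn from the plot.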

Now let’s have a look at the relationship between the ratings given by users and the average screen time:



In [6]:
figure = px.scatter(data_frame = df, 
                    x="Average Screen Time",
                    y="Ratings", 
                    size="Ratings", 
                    color= "Status", 
                    title = "Relationship Between Ratings and Screentime",
                    trendline="ols")
figure.show()

Users who spend little time on the app tend to give low ratings and uninstall it.
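
One way to back up this observation is to bin screen time and compare the mean rating per bin. The sketch below is only illustrative: the `demo` rows and the bin edges (0, 10, 30, 50) are assumptions chosen for the example, not values taken from the analysis.

```python
import pandas as pd

# Hypothetical rows with the same column names as the notebook's df.
demo = pd.DataFrame({
    "Average Screen Time": [17.0, 0.0, 37.0, 32.0, 45.0, 3.0],
    "Ratings": [9, 4, 8, 6, 5, 2],
})

# Bucket screen time into low / medium / high ranges (bin edges are assumed).
demo["screen_bin"] = pd.cut(demo["Average Screen Time"],
                            bins=[0, 10, 30, 50],
                            include_lowest=True)

# Mean rating within each screen-time bucket.
rating_by_bin = demo.groupby("screen_bin", observed=True)["Ratings"].mean()
print(rating_by_bin)
```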

# K-means to find retained and lost users

Let's move on to app user segmentation, using the K-means algorithm to find which users are retained and which are lost for good.

In [11]:
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans

In [12]:
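# Features used to segment users.
clustering_data = df[["Average Screen Time", "Left Review", "Ratings", "Last Visited Minutes",
                      "Average Spent on App (INR)", "New Password Request"]]

# Note: the features are passed to K-means unscaled, so columns with large
# ranges (such as "Last Visited Minutes") dominate the distance computation.
kmeans = KMeans(n_clusters=3)
clusters = kmeans.fit_predict(clustering_data)  # one cluster label per user
df["segments"] = clusters

print(df.head(5))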





   userid  Average Screen Time  Average Spent on App (INR)  Left Review  \
0    1001                 17.0                       634.0            1   
1    1002                  0.0                        54.0            0   
2    1003                 37.0                       207.0            0   
3    1004                 32.0                       445.0            1   
4    1005                 45.0                       427.0            1   

   Ratings  New Password Request  Last Visited Minutes       Status  segments  
0        9                     7                  2990    Installed         0  
1        4                     8                 24008  Uninstalled         1  
2        8                     5                   971    Installed         0  
3        6                     2                   799    Installed         0  
4        5                     6                  3668    Installed         0  


In [13]:
print(df['segments'].value_counts())

0    910
2     45
1     44
Name: segments, dtype: int64
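
The clustering above is fitted on the raw feature values, so `Last Visited Minutes`, whose values run into the tens of thousands, carries far more weight in the distances than the other columns. Below is a minimal sketch of how min-max scaling could be applied before K-means. The `sample` frame is a hypothetical stand-in for `clustering_data`, and `n_clusters=2`, `n_init=10` and `random_state=0` are assumptions made only to keep the example small and reproducible. The segment counts shown above would likely change if the notebook's data were scaled this way.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in rows; in the notebook this would be `clustering_data`.
sample = pd.DataFrame({
    "Average Screen Time": [17.0, 0.0, 37.0, 32.0, 45.0, 2.0],
    "Last Visited Minutes": [2990, 24008, 971, 799, 3668, 30000],
    "Average Spent on App (INR)": [634.0, 54.0, 207.0, 445.0, 427.0, 20.0],
})

# fit_transform learns each column's min and max, then rescales it to [0, 1].
scaler = MinMaxScaler()
scaled = pd.DataFrame(scaler.fit_transform(sample),
                      columns=sample.columns, index=sample.index)

# n_init and random_state are fixed only to make this sketch reproducible.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(scaled)

print(scaled.round(2))
print(labels)
```

Unlike calling `MinMaxScaler(...)` on a column name, `fit_transform` actually returns the rescaled values, which are then what K-means sees.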


We need to map these clusters to meaningful labels in order to understand what they mean.

Looking at the data, the clusters can be read as follows:

0 -> retained

1 -> churn

2 -> needs attention
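
The mapping above can be applied with `Series.map`, after first profiling each cluster to see why a label fits. This is a sketch under stated assumptions: `df_demo` is a hypothetical stand-in for the notebook's `df`, and the `segment_name` column is a name introduced here for illustration. K-means numbers its clusters arbitrarily, so the 0/1/2 assignment holds only for the run shown above and should be re-checked after refitting.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's df with its "segments" column.
df_demo = pd.DataFrame({
    "userid": [1001, 1002, 1003, 1004],
    "Last Visited Minutes": [2990, 24008, 12000, 799],
    "segments": [0, 1, 2, 0],
})

# Profile each cluster: the mean of a feature per segment helps justify the labels.
profile = df_demo.groupby("segments")["Last Visited Minutes"].mean()
print(profile)

# Labels follow the mapping listed above; the numbering is specific to that run.
segment_names = {0: "Retained", 1: "Churn", 2: "Needs Attention"}
df_demo["segment_name"] = df_demo["segments"].map(segment_names)
print(df_demo)
```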

Next, let us visualize the segments:

In [14]:
PLOT = go.Figure()
# Add one scatter trace per segment so each gets its own color and legend entry.
for i in df["segments"].unique():
    segment = df[df["segments"] == i]
    PLOT.add_trace(go.Scatter(x = segment['Last Visited Minutes'],
                              y = segment['Average Spent on App (INR)'],
                              mode = 'markers', marker_size = 6, marker_line_width = 1,
                              name = str(i)))
PLOT.update_traces(hovertemplate='Last Visited Minutes: %{x} <br>Average Spent on App (INR): %{y}')

# Axis titles are set directly on the 2D layout.
PLOT.update_layout(width = 800, height = 800, autosize = True, showlegend = True,
                   yaxis_title = 'Average Spent on App (INR)',
                   xaxis_title = 'Last Visited Minutes')

