# Android Market on Google Play EDA 

Based on a DataCamp unguided project, with ***a few improvements***.

## 1. Introduction
<p><img src="https://assets.datacamp.com/production/project_1197/img/google_play_store.png" alt="Google Play logo"></p>
<p>Mobile apps are everywhere. They are easy to create and can be very lucrative from a business standpoint. Android in particular keeps expanding as an operating system and has captured more than 74% of the total market<sup><a href="https://www.statista.com/statistics/272698/global-market-share-held-by-mobile-operating-systems-since-2009">[1]</a></sup>.</p>
<p>The Google Play Store apps data has enormous potential to facilitate data-driven decisions and insights for businesses. In this notebook, we will analyze the Android app market by comparing ~10k apps in Google Play across different categories. We will also use the user reviews to draw a qualitative comparison between the apps.</p>
<p>The dataset you will use here was scraped from Google Play Store in September 2018 and was published on <a href="https://www.kaggle.com/lava18/google-play-store-apps">Kaggle</a>. Here are the details: <br>
<br></p>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
    <div style="font-size:20px"><b>datasets/apps.csv</b></div>
This file contains all the details of the apps on Google Play. There are 9 features that describe a given app.
<ul>
    <li><b>App:</b> Name of the app</li>
    <li><b>Category:</b> Category of the app. Some examples are: ART_AND_DESIGN, FINANCE, COMICS, BEAUTY etc.</li>
    <li><b>Rating:</b> The current average rating (out of 5) of the app on Google Play</li>
    <li><b>Reviews:</b> Number of user reviews given on the app</li>
    <li><b>Size:</b> Size of the app in MB (megabytes)</li>
    <li><b>Installs:</b> Number of times the app was downloaded from Google Play</li>
    <li><b>Type:</b> Whether the app is paid or free</li>
    <li><b>Price:</b> Price of the app in US$</li>
    <li><b>Last Updated:</b> Date on which the app was last updated on Google Play </li>

</ul>
</div>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
    <div style="font-size:20px"><b>datasets/user_reviews.csv</b></div>
This file contains a random sample of 100 <i><a href="https://www.androidpolice.com/2019/01/21/google-play-stores-redesigned-ratings-and-reviews-section-lets-you-easily-filter-by-star-rating/">most helpful first</a></i> user reviews for each app. The text in each review has been pre-processed and passed through a sentiment analyzer.
<ul>
    <li><b>App:</b> Name of the app on which the user review was provided. Matches the <code>App</code> column of the <code>apps.csv</code> file</li>
    <li><b>Review:</b> The pre-processed user review text</li>
    <li><b>Sentiment Category:</b> Sentiment category of the user review - Positive, Negative or Neutral</li>
    <li><b>Sentiment Score:</b> Sentiment score of the user review. It lies between [-1,1]. A higher score denotes a more positive sentiment.</li>

</ul>
</div>
<p>From here on, it will be your task to explore and manipulate the data until you are able to answer the three questions described in the instructions panel.<br></p>

## Importing modules

In [1]:
import pandas as pd

## Read the data

In [2]:
apps = pd.read_csv('datasets/apps.csv')
apps.info()
apps.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9659 entries, 0 to 9658
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   App           9659 non-null   object 
 1   Category      9659 non-null   object 
 2   Rating        8196 non-null   float64
 3   Reviews       9659 non-null   int64  
 4   Size          8432 non-null   float64
 5   Installs      9659 non-null   object 
 6   Type          9659 non-null   object 
 7   Price         9659 non-null   float64
 8   Last Updated  9659 non-null   object 
dtypes: float64(3), int64(1), object(5)
memory usage: 679.3+ KB


Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Last Updated
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19.0,"10,000+",Free,0.0,"January 7, 2018"
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14.0,"500,000+",Free,0.0,"January 15, 2018"
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7,"5,000,000+",Free,0.0,"August 1, 2018"
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25.0,"50,000,000+",Free,0.0,"June 8, 2018"
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8,"100,000+",Free,0.0,"June 20, 2018"


In [3]:
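# Installs is stored as text such as "10,000+"; strip the thousands
# separators and the trailing '+' so it can be converted to a number later
chars_to_remove = [',', '+']

for char in chars_to_remove:
    apps['Installs'] = apps['Installs'].apply(lambda x: x.replace(char, ''))
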
apps.head()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Last Updated
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19.0,10000,Free,0.0,"January 7, 2018"
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14.0,500000,Free,0.0,"January 15, 2018"
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7,5000000,Free,0.0,"August 1, 2018"
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25.0,50000000,Free,0.0,"June 8, 2018"
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8,100000,Free,0.0,"June 20, 2018"
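
The same character stripping can also be expressed with pandas' vectorized string methods instead of `apply`. The following is a minimal sketch on a small made-up Series (the sample values are illustrative and not read from `apps.csv`), assuming every entry consists only of digits, commas and an optional trailing `+`:

```python
import pandas as pd

# Toy stand-in for apps['Installs']; the values are made up for illustration.
installs = pd.Series(['10,000+', '500,000+', '5,000,000+', '0'])

# Remove each unwanted character literally.
# regex=False keeps '+' from being interpreted as a regex quantifier.
cleaned = installs
for char in [',', '+']:
    cleaned = cleaned.str.replace(char, '', regex=False)

# Only digits remain, so the strings can be converted to integers.
cleaned = cleaned.astype(int)
print(cleaned.tolist())
```

If a value contained anything other than digits after stripping, `astype(int)` would raise a `ValueError`, which is a useful early warning about unexpected entries.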


In [4]:
apps.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9659 entries, 0 to 9658
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   App           9659 non-null   object 
 1   Category      9659 non-null   object 
 2   Rating        8196 non-null   float64
 3   Reviews       9659 non-null   int64  
 4   Size          8432 non-null   float64
 5   Installs      9659 non-null   object 
 6   Type          9659 non-null   object 
 7   Price         9659 non-null   float64
 8   Last Updated  9659 non-null   object 
dtypes: float64(3), int64(1), object(5)
memory usage: 679.3+ KB


In [5]:
apps['Installs'] = apps['Installs'].astype(int)
apps.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9659 entries, 0 to 9658
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   App           9659 non-null   object 
 1   Category      9659 non-null   object 
 2   Rating        8196 non-null   float64
 3   Reviews       9659 non-null   int64  
 4   Size          8432 non-null   float64
 5   Installs      9659 non-null   int32  
 6   Type          9659 non-null   object 
 7   Price         9659 non-null   float64
 8   Last Updated  9659 non-null   object 
dtypes: float64(3), int32(1), int64(1), object(4)
memory usage: 641.5+ KB
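
`Installs` shows up as `int32` in the output above. This is probably because `astype(int)` used the default integer width of the environment the notebook ran in (an assumption; the notebook does not state the platform). The sketch below, on made-up values, requests the width explicitly and checks that the largest install count seen in this notebook (1,000,000,000 for Google Drive) still fits into 32 bits:

```python
import numpy as np
import pandas as pd

# Made-up cleaned install counts; the largest mirrors Google Drive's 1,000,000,000.
installs = pd.Series(['10000', '500000000', '1000000000'])

# Naming the dtype makes the result independent of the platform default.
as_int64 = installs.astype('int64')

# Largest value a signed 32-bit integer can hold.
int32_max = np.iinfo(np.int32).max

print(as_int64.dtype, int32_max, as_int64.max() <= int32_max)
```

The values fit either way here, so this is a matter of reproducibility across machines rather than a correctness fix.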


In [6]:
app_category_info = apps.groupby('Category').agg({'App':'count', 'Price':'mean', 'Rating':'mean'})
app_category_info

Category,App,Price,Rating
ART_AND_DESIGN,64,0.093281,4.357377
AUTO_AND_VEHICLES,85,0.158471,4.190411
BEAUTY,53,0.0,4.278571
BOOKS_AND_REFERENCE,222,0.539505,4.34497
BUSINESS,420,0.417357,4.098479
COMICS,56,0.0,4.181481
COMMUNICATION,315,0.263937,4.121484
DATING,171,0.160468,3.970149
EDUCATION,119,0.150924,4.364407
ENTERTAINMENT,102,0.078235,4.135294


In [7]:
app_category_info = app_category_info.rename(columns={'App':'App Count', 'Price':'Average Price', 'Rating':'Average Rating'})
app_category_info

Category,App Count,Average Price,Average Rating
ART_AND_DESIGN,64,0.093281,4.357377
AUTO_AND_VEHICLES,85,0.158471,4.190411
BEAUTY,53,0.0,4.278571
BOOKS_AND_REFERENCE,222,0.539505,4.34497
BUSINESS,420,0.417357,4.098479
COMICS,56,0.0,4.181481
COMMUNICATION,315,0.263937,4.121484
DATING,171,0.160468,3.970149
EDUCATION,119,0.150924,4.364407
ENTERTAINMENT,102,0.078235,4.135294
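
The aggregation and the renaming can also be done in a single step with pandas' named aggregation. This is a minimal sketch on a tiny made-up frame (`toy_apps` and its values are hypothetical; only the column names follow `apps.csv`):

```python
import pandas as pd

# Tiny made-up frame using the same column names as apps.csv.
toy_apps = pd.DataFrame({
    'App': ['a', 'b', 'c', 'd'],
    'Category': ['FINANCE', 'FINANCE', 'TOOLS', 'TOOLS'],
    'Price': [0.0, 2.0, 0.0, 0.0],
    'Rating': [4.0, 3.0, 5.0, None],
})

# Named aggregation: output column name -> (input column, function).
# The dict is unpacked because the output names contain spaces.
summary = toy_apps.groupby('Category').agg(**{
    'App Count': ('App', 'count'),
    'Average Price': ('Price', 'mean'),
    'Average Rating': ('Rating', 'mean'),
})
print(summary)
```

Note that `mean` ignores missing ratings, so the average rating of a category is taken over its rated apps only.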


In [8]:
reviews = pd.read_csv('datasets/user_reviews.csv')
reviews.info()
reviews.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 64295 entries, 0 to 64294
Data columns (total 4 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   App                 64295 non-null  object 
 1   Review              37427 non-null  object 
 2   Sentiment Category  37432 non-null  object 
 3   Sentiment Score     37432 non-null  float64
dtypes: float64(1), object(3)
memory usage: 2.0+ MB


Unnamed: 0,App,Review,Sentiment Category,Sentiment Score
0,10 Best Foods for You,I like eat delicious food. That's I'm cooking ...,Positive,1.0
1,10 Best Foods for You,This help eating healthy exercise regular basis,Positive,0.25
2,10 Best Foods for You,,,
3,10 Best Foods for You,Works great especially going grocery store,Positive,0.4
4,10 Best Foods for You,Best idea us,Positive,1.0
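
According to `reviews.info()`, only 37,432 of the 64,295 rows carry a sentiment score; the rest look like row 2 above. pandas' `mean` skips missing values by default, so those rows do not distort per-app averages, but dropping them up front makes row counts easier to reason about. A minimal sketch on made-up data (`toy_reviews` is hypothetical):

```python
import pandas as pd

# Made-up reviews; app 'y' has no scored review at all.
toy_reviews = pd.DataFrame({
    'App': ['x', 'x', 'x', 'y'],
    'Review': ['good', None, 'bad', None],
    'Sentiment Category': ['Positive', None, 'Negative', None],
    'Sentiment Score': [0.5, None, -0.1, None],
})

# Keep only rows that actually have a sentiment score.
scored = toy_reviews.dropna(subset=['Sentiment Score'])
print(len(toy_reviews), len(scored))

# The mean for 'x' is the same either way because NaN is skipped;
# 'y' yields NaN in the first case and disappears in the second.
mean_all = toy_reviews.groupby('App')['Sentiment Score'].mean()
mean_scored = scored.groupby('App')['Sentiment Score'].mean()
print(mean_all)
print(mean_scored)
```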


### Finance Apps

In [9]:
finance_apps = apps[apps['Category'] == 'FINANCE']
finance_apps.head()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Last Updated
837,K PLUS,FINANCE,4.4,124424,,10000000,Free,0.0,"June 26, 2018"
838,ING Banking,FINANCE,4.4,39041,,1000000,Free,0.0,"August 3, 2018"
839,Citibanamex Movil,FINANCE,3.6,52306,42.0,5000000,Free,0.0,"July 27, 2018"
840,The postal bank,FINANCE,3.7,36718,,5000000,Free,0.0,"July 16, 2018"
841,KTB Netbank,FINANCE,3.8,42644,19.0,5000000,Free,0.0,"June 28, 2018"


In [10]:
free_finance_apps = finance_apps[finance_apps['Type'] == 'Free']
free_finance_apps.head()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Last Updated
837,K PLUS,FINANCE,4.4,124424,,10000000,Free,0.0,"June 26, 2018"
838,ING Banking,FINANCE,4.4,39041,,1000000,Free,0.0,"August 3, 2018"
839,Citibanamex Movil,FINANCE,3.6,52306,42.0,5000000,Free,0.0,"July 27, 2018"
840,The postal bank,FINANCE,3.7,36718,,5000000,Free,0.0,"July 16, 2018"
841,KTB Netbank,FINANCE,3.8,42644,19.0,5000000,Free,0.0,"June 28, 2018"


In [11]:
merged_df_finance = pd.merge(free_finance_apps, reviews, on = 'App')
merged_df_finance.head()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Last Updated,Review,Sentiment Category,Sentiment Score
0,Citibanamex Movil,FINANCE,3.6,52306,42.0,5000000,Free,0.0,"July 27, 2018","Forget paying app, designed make fail payments...",Negative,-0.5
1,Citibanamex Movil,FINANCE,3.6,52306,42.0,5000000,Free,0.0,"July 27, 2018","It's working expected, talking best bank Mexic...",Positive,0.4
2,Citibanamex Movil,FINANCE,3.6,52306,42.0,5000000,Free,0.0,"July 27, 2018",It has many problems with Android 8.1. You can...,Positive,0.25
3,Citibanamex Movil,FINANCE,3.6,52306,42.0,5000000,Free,0.0,"July 27, 2018","I changed my phone to a Xiaomi Redmi Note 5, t...",Positive,0.175
4,Citibanamex Movil,FINANCE,3.6,52306,42.0,5000000,Free,0.0,"July 27, 2018",In her eagerness to make her look pretty with ...,Negative,-0.158333
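
`pd.merge` performs an inner join by default, so apps without any row in the reviews file drop out of the result. That is presumably why K PLUS and ING Banking, which head the free finance apps, do not appear at the top of the merged frame. The sketch below uses made-up frames (`toy_apps` and `toy_reviews` are hypothetical) to show the effect and one way to inspect unmatched rows with `indicator=True`:

```python
import pandas as pd

# Made-up data: app 'A' has no reviews, app 'C' has a review but no app record.
toy_apps = pd.DataFrame({'App': ['A', 'B'], 'Type': ['Free', 'Free']})
toy_reviews = pd.DataFrame({'App': ['B', 'B', 'C'],
                            'Sentiment Score': [0.4, -0.2, 0.9]})

# Default how='inner': only apps present in both frames survive.
inner = pd.merge(toy_apps, toy_reviews, on='App')
print(inner['App'].tolist())

# An outer join with indicator=True labels where each row came from.
outer = pd.merge(toy_apps, toy_reviews, on='App', how='outer', indicator=True)
print(outer[['App', '_merge']])
```

For the ranking in this notebook the inner join is the convenient choice, since an app without reviews has no sentiment score to rank by.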


In [12]:
app_finance_sentiment_score = merged_df_finance.groupby('App').agg({'Sentiment Score':'mean'})
app_finance_sentiment_score.head()

App,Sentiment Score
A+ Mobile,0.329592
ACE Elite,0.252171
Acorns - Invest Spare Change,0.046667
Amex Mobile,0.175666
Associated Credit Union Mobile,0.388093


In [13]:
# Top 5 free Finance apps by mean sentiment score

user_finance_feedback = app_finance_sentiment_score.sort_values(by='Sentiment Score', ascending=False)
user_finance_feedback.head(5)

App,Sentiment Score
BBVA Spain,0.515086
Associated Credit Union Mobile,0.388093
BankMobile Vibe App,0.353455
A+ Mobile,0.329592
Current debit card and app made for teens,0.327258


### Family Apps

In [14]:
family_apps = apps[apps['Category'] == 'FAMILY']
family_apps.head()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Last Updated
1575,Jewels Crush- Match 3 Puzzle,FAMILY,4.4,14774,19.0,1000000,Free,0.0,"July 23, 2018"
1576,Coloring & Learn,FAMILY,4.4,12753,51.0,5000000,Free,0.0,"July 17, 2018"
1577,Mahjong,FAMILY,4.5,33983,22.0,5000000,Free,0.0,"August 2, 2018"
1578,Super ABC! Learning games for kids! Preschool ...,FAMILY,4.6,20267,46.0,1000000,Free,0.0,"July 16, 2018"
1579,Toy Pop Cubes,FAMILY,4.5,5761,21.0,1000000,Free,0.0,"July 4, 2018"


In [15]:
free_family_apps = family_apps[family_apps['Type'] == 'Free']
free_family_apps.head()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Last Updated
1575,Jewels Crush- Match 3 Puzzle,FAMILY,4.4,14774,19.0,1000000,Free,0.0,"July 23, 2018"
1576,Coloring & Learn,FAMILY,4.4,12753,51.0,5000000,Free,0.0,"July 17, 2018"
1577,Mahjong,FAMILY,4.5,33983,22.0,5000000,Free,0.0,"August 2, 2018"
1578,Super ABC! Learning games for kids! Preschool ...,FAMILY,4.6,20267,46.0,1000000,Free,0.0,"July 16, 2018"
1579,Toy Pop Cubes,FAMILY,4.5,5761,21.0,1000000,Free,0.0,"July 4, 2018"


In [16]:
merged_df_family = pd.merge(free_family_apps, reviews, on = 'App')
merged_df_family.head()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Last Updated,Review,Sentiment Category,Sentiment Score
0,Coloring & Learn,FAMILY,4.4,12753,51.0,5000000,Free,0.0,"July 17, 2018",thanks nice 2 year old baby boy want color eve...,Positive,0.3
1,Coloring & Learn,FAMILY,4.4,12753,51.0,5000000,Free,0.0,"July 17, 2018",Love switch crayons different coloring books,Positive,0.25
2,Coloring & Learn,FAMILY,4.4,12753,51.0,5000000,Free,0.0,"July 17, 2018",Very easy 2 year old grandson use.,Positive,0.331667
3,Coloring & Learn,FAMILY,4.4,12753,51.0,5000000,Free,0.0,"July 17, 2018",I like whole lot everytime I color there's sma...,Negative,-0.025
4,Coloring & Learn,FAMILY,4.4,12753,51.0,5000000,Free,0.0,"July 17, 2018",Thank app. My 2 yrs old baby girl loved it.,Positive,0.4


In [17]:
app_fams_sentiment_score = merged_df_family.groupby('App').agg({'Sentiment Score':'mean'})
app_fams_sentiment_score.head()

App,Sentiment Score
A Call From Santa Claus!,0.008294
A Word A Day,0.277273
ABC Kids - Tracing & Phonics,0.36777
ABCmouse.com,0.182504
Akinator,-0.014899


In [18]:
# Top 5 free Family apps by mean sentiment score

user_feedback_fams = app_fams_sentiment_score.sort_values(by='Sentiment Score', ascending=False)
user_feedback_fams.head(5)

App,Sentiment Score
Comptia A+ 220-901 & 220-902,0.500792
All-in-One Mahjong 3 FREE,0.5
Coloring & Learn,0.413822
Drawing for Kids Learning Games for Toddlers age 3,0.386667
ABC Kids - Tracing & Phonics,0.36777


### Productivity Apps

In [19]:
productivity_apps = apps[apps['Category'] == 'PRODUCTIVITY']
productivity_apps.head()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Last Updated
2716,Microsoft Word,PRODUCTIVITY,4.5,2084126,,500000000,Free,0.0,"July 11, 2018"
2717,"All-In-One Toolbox: Cleaner, Booster, App Manager",PRODUCTIVITY,4.7,536926,,10000000,Free,0.0,"August 5, 2018"
2718,Adobe Acrobat Reader,PRODUCTIVITY,4.3,3016297,,100000000,Free,0.0,"April 17, 2018"
2719,"AVG Cleaner – Speed, Battery & Memory Booster",PRODUCTIVITY,4.4,1188154,24.0,10000000,Free,0.0,"June 14, 2018"
2720,Google Drive,PRODUCTIVITY,4.4,2731171,,1000000000,Free,0.0,"August 6, 2018"


In [20]:
free_productivity_apps = productivity_apps[productivity_apps['Type'] == 'Free']
free_productivity_apps.head()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Last Updated
2716,Microsoft Word,PRODUCTIVITY,4.5,2084126,,500000000,Free,0.0,"July 11, 2018"
2717,"All-In-One Toolbox: Cleaner, Booster, App Manager",PRODUCTIVITY,4.7,536926,,10000000,Free,0.0,"August 5, 2018"
2718,Adobe Acrobat Reader,PRODUCTIVITY,4.3,3016297,,100000000,Free,0.0,"April 17, 2018"
2719,"AVG Cleaner – Speed, Battery & Memory Booster",PRODUCTIVITY,4.4,1188154,24.0,10000000,Free,0.0,"June 14, 2018"
2720,Google Drive,PRODUCTIVITY,4.4,2731171,,1000000000,Free,0.0,"August 6, 2018"


In [21]:
merged_df_productivity = pd.merge(free_productivity_apps, reviews, on = 'App')
merged_df_productivity.head()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Last Updated,Review,Sentiment Category,Sentiment Score
0,"All-In-One Toolbox: Cleaner, Booster, App Manager",PRODUCTIVITY,4.7,536926,,10000000,Free,0.0,"August 5, 2018",The pop-up ads reminders annoying distracting....,Negative,-0.4
1,"All-In-One Toolbox: Cleaner, Booster, App Manager",PRODUCTIVITY,4.7,536926,,10000000,Free,0.0,"August 5, 2018",I wanted deny autostart several applications (...,Neutral,0.0
2,"All-In-One Toolbox: Cleaner, Booster, App Manager",PRODUCTIVITY,4.7,536926,,10000000,Free,0.0,"August 5, 2018",The works well. Been using years. I bought pro...,Negative,-0.4
3,"All-In-One Toolbox: Cleaner, Booster, App Manager",PRODUCTIVITY,4.7,536926,,10000000,Free,0.0,"August 5, 2018",I've using years love it. I'd recommend anyone...,Positive,0.375
4,"All-In-One Toolbox: Cleaner, Booster, App Manager",PRODUCTIVITY,4.7,536926,,10000000,Free,0.0,"August 5, 2018",One complaint I beta user get new phone. After...,Negative,-0.009091


In [22]:
app_productivity_sentiment_score = merged_df_productivity.groupby('App').agg({'Sentiment Score':'mean'})
app_productivity_sentiment_score.head()

App,Sentiment Score
7 Weeks - Habit & Goal Tracker,0.14282
ASUS Calling Screen,0.063164
ASUS Quick Memo,0.029167
ASUS SuperNote,0.192791
"AVG Cleaner – Speed, Battery & Memory Booster",0.273664


In [23]:
user_feedback_productivity = app_productivity_sentiment_score.sort_values(by='Sentiment Score', ascending=False)
user_feedback_productivity.head()

App,Sentiment Score
Google Slides,0.933333
Easy Voice Recorder,0.435837
Calendar+ Schedule Planner App,0.319071
Fake Call - Fake Caller ID,0.297794
G Cloud Backup,0.284192
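
A mean taken over very few reviews can be extreme, and a score as high as the 0.933333 for Google Slides may be such a case (the notebook does not show how many scored reviews each app has, so this is only a guess). One way to check is to aggregate the review count alongside the mean and apply a minimum. The sketch below uses made-up data, and `min_reviews` is a hypothetical threshold, not something from the original project:

```python
import pandas as pd

# Made-up merged data: 'P' has a single glowing review, 'Q' has four.
merged = pd.DataFrame({
    'App': ['P', 'Q', 'Q', 'Q', 'Q'],
    'Sentiment Score': [0.9, 0.5, 0.3, 0.4, 0.4],
})

# Mean score and number of scored reviews per app.
stats = merged.groupby('App')['Sentiment Score'].agg(['mean', 'count'])

# Hypothetical cut-off: rank only apps with at least this many scored reviews.
min_reviews = 3
ranked = stats[stats['count'] >= min_reviews].sort_values('mean', ascending=False)
print(ranked)
```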


## Declaring a function to find the top 5 apps in a category

In [24]:
def finder_big_five(category):
    """Print the five free apps in `category` with the highest mean sentiment score."""
    category = category.upper()  # categories are stored in upper case, e.g. 'SOCIAL'
    the_apps = apps[apps['Category'] == category]
    free_apps = the_apps[the_apps['Type'] == 'Free']
    # Inner join: apps without any row in reviews are dropped
    merged_apps = pd.merge(free_apps, reviews, on = 'App')
    apps_sentiment_score = merged_apps.groupby('App').agg({'Sentiment Score':'mean'})
    user_feedback = apps_sentiment_score.sort_values(by='Sentiment Score', ascending=False)
    print(user_feedback.head(5))

In [25]:
finder_big_five('Social')

                                    Sentiment Score
App                                                
Google+                                    0.368056
Couple - Relationship App                  0.304423
Dating App, Flirt & Chat : W-Match         0.301667
Hide Something - Photo, Video              0.240710
Banjo                                      0.230850


In [26]:
finder_big_five('Tools')

                              Sentiment Score
App                                          
CM Flashlight (Compass, SOS)         0.556250
ASUS Sound Recorder                  0.516771
Brightest Flashlight Free ®          0.492571
Flashlight HD LED                    0.452204
Flashlight                           0.405400


In [27]:
finder_big_five('Weather')

                                                    Sentiment Score
App                                                                
APE Weather ( Live Forecast)                               0.432323
Free live weather on screen                                0.394259
ForecaWeather                                              0.360907
GO Weather - Widget, Theme, Wallpaper, Efficient           0.333400
AccuWeather: Daily Forecast & Live Weather Reports         0.200723
