# Play store apps dataset

## Why is this dataset interesting?

The Google Play Store is Google's app marketplace. Most people access the Google Play Store when they want to install new apps onto their Android phones.

Like any market, the apps in the Play Store are subject to **supply** and **demand**. That is to say, certain kinds of apps get downloaded a lot while others don't, certain kinds of apps get paid for while others don't, and some categories of apps have lots and lots of competition while others have very little.

A dataset like this can help you spot opportunities.

## Ideas for questions this data can help you answer

* What categories of applications get a lot of downloads per day?
* What categories of applications don't get many downloads per day?
* In what app categories are there market leaders (one app that clearly is getting downloaded more than the others)?
* How many downloads per day might you expect if you took the time to build an app?
* What can the data tell you about monetization approaches?

In [1]:
import pandas as pd

In [2]:
data = pd.read_csv("datasets/google-play-store-11-2018.csv")

In [3]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 62694 entries, 0 to 62693
Data columns (total 18 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   app_id             62694 non-null  object 
 1   title              62694 non-null  object 
 2   reviews            62683 non-null  float64
 3   ratings            62683 non-null  float64
 4   min_installs       62694 non-null  int64  
 5   score              62683 non-null  float64
 6   offers_iap         62694 non-null  bool   
 7   ad_supported       62694 non-null  bool   
 8   released           61848 non-null  object 
 9   ratings_per_day    62694 non-null  int64  
 10  genre              62693 non-null  object 
 11  genre_id           62693 non-null  object 
 12  price              62694 non-null  float64
 13  rating_one_star    62694 non-null  int64  
 14  rating_two_star    62694 non-null  int64  
 15  rating_three_star  62694 non-null  int64  
 16  rating_four_star   626
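The `info()` output shows that a few columns (`reviews`, `ratings`, `score`, `released`, `genre`, `genre_id`) have fewer non-null entries than the 62,694 rows. This matters later: `groupby` drops rows whose key is missing, so the one app with no `genre_id` is silently excluded from the per-genre statistics. A minimal sketch of tallying the gaps directly, using a small hypothetical frame (`sample`) in place of the real CSV:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dataset; column names match the CSV header.
sample = pd.DataFrame({
    "app_id": ["a", "b", "c", "d"],
    "score": [4.5, np.nan, 3.9, 4.1],
    "released": ["Jan 1, 2018", None, "Mar 3, 2017", "Jul 9, 2016"],
    "genre_id": ["TOOLS", "TOOLS", None, "SOCIAL"],
})

# Count missing values per column, keeping only the columns that have any.
missing = sample.isna().sum()
missing = missing[missing > 0]
print(missing)

# groupby drops rows whose key is NaN by default (dropna=True),
# so the app with no genre_id is excluded from per-genre statistics.
print(sample.groupby("genre_id").size())
```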

In [4]:
# Give the columns more descriptive names
data.rename(
    columns={
        "title": "app_name",
        "reviews": "review_count",
        "ratings": "rating_count",
        "min_installs": "total_installs",
        "score": "avg_rating",
        "offers_iap": "has_in_app_purchase",
        "ad_supported": "has_ads",
        "released": "release_date",
    },
    inplace=True,
)
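With the monetization columns renamed, the monetization question from the list above can be approached by crossing price, `has_in_app_purchase` and `has_ads`, then comparing the median `ratings_per_day` of each combination. A minimal sketch on hypothetical rows (`apps` is made-up data and `is_paid` is an illustrative helper column, not part of the dataset); on the real data the same `groupby` would run on `data`:

```python
import pandas as pd

# Hypothetical rows using the column names produced by the rename above.
apps = pd.DataFrame({
    "app_name": ["A", "B", "C", "D", "E", "F"],
    "has_in_app_purchase": [True, True, False, False, True, False],
    "has_ads": [True, False, True, False, True, True],
    "price": [0.0, 2.99, 0.0, 0.99, 0.0, 0.0],
    "ratings_per_day": [120, 15, 40, 2, 300, 10],
})

# Hypothetical helper: label each app as free or paid based on its listed price.
apps["is_paid"] = apps["price"] > 0

# App count and median ratings per day for each monetization combination.
summary = (
    apps.groupby(["is_paid", "has_in_app_purchase", "has_ads"])["ratings_per_day"]
    .agg(["count", "median"])
)
print(summary)
```

The median is used rather than the mean so that one hit app does not dominate a combination.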

In [5]:
data['genre_id'].value_counts()

EDUCATION              6673
TOOLS                  6192
ENTERTAINMENT          3395
BOOKS_AND_REFERENCE    3052
LIFESTYLE              2797
BUSINESS               2717
HEALTH_AND_FITNESS     2558
PERSONALIZATION        2523
MUSIC_AND_AUDIO        2443
PRODUCTIVITY           2296
FINANCE                1982
GAME_SIMULATION        1814
PHOTOGRAPHY            1513
TRAVEL_AND_LOCAL       1356
GAME_PUZZLE            1342
MEDICAL                1329
GAME_CASUAL            1320
COMMUNICATION          1226
NEWS_AND_MAGAZINES     1154
SPORTS                 1048
GAME_ACTION             974
GAME_EDUCATIONAL        936
GAME_ARCADE             900
MAPS_AND_NAVIGATION     893
SHOPPING                873
FOOD_AND_DRINK          847
SOCIAL                  836
VIDEO_PLAYERS           744
GAME_CARD               649
GAME_BOARD              604
AUTO_AND_VEHICLES       574
GAME_RACING             541
GAME_ADVENTURE          513
GAME_ROLE_PLAYING       464
GAME_STRATEGY           457
ART_AND_DESIGN      
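The raw counts above measure how crowded each genre is, which speaks to the point about competition in the introduction. Normalizing them gives each genre's share of all listed apps, which is easier to compare at a glance. A minimal sketch on a hypothetical Series of genre labels (on the real data this would be `data['genre_id']`):

```python
import pandas as pd

# Hypothetical genre labels standing in for data['genre_id'].
genres = pd.Series(["EDUCATION", "EDUCATION", "TOOLS", "TOOLS", "TOOLS", "SOCIAL"])

# normalize=True returns proportions instead of raw counts, sorted descending.
share = genres.value_counts(normalize=True)
print(share.round(3))
```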

In [6]:
genre_group = data.groupby(['genre_id'])

In [7]:
genre_group['ratings_per_day'].mean().sort_values(ascending=False)

genre_id
GAME_SPORTS            278.671916
GAME_STRATEGY          258.185996
GAME_ACTION            247.476386
GAME_RACING            188.293900
GAME_ARCADE            158.313333
SOCIAL                 143.052632
GAME_CASUAL            138.944697
GAME_WORD              136.277778
GAME_ROLE_PLAYING      127.422414
COMMUNICATION          120.518760
GAME_ADVENTURE         102.709552
VIDEO_PLAYERS           83.897849
GAME_PUZZLE             72.210134
GAME_TRIVIA             64.151111
GAME_SIMULATION         63.355017
PHOTOGRAPHY             55.830139
GAME_CASINO             49.992718
WEATHER                 48.910569
GAME_MUSIC              46.621005
GAME_BOARD              39.562914
TOOLS                   34.911176
SHOPPING                30.974800
GAME_CARD               30.123267
PERSONALIZATION         23.774475
ENTERTAINMENT           21.762887
DATING                  21.285714
MUSIC_AND_AUDIO         21.066721
PRODUCTIVITY            20.542247
COMICS                  18.583815
LIBRA

From this we can see that highly stimulating gaming genres such as sports, strategy, action, racing, and word games have the highest average number of ratings per day (used here as a proxy for traffic), while most of the less stimulating, purely informational categories, such as vehicles, books, food, education, business, medical, and events, don't receive much new traffic.
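Because a single very popular app can pull a genre's mean far upward, the median is a more robust answer to the question "how many ratings per day might a typical new app expect?". A minimal sketch comparing mean, median, and max on a hypothetical genre with one runaway hit (the values are invented for illustration):

```python
import pandas as pd

# Hypothetical ratings_per_day values: nine modest apps and one runaway hit.
df = pd.DataFrame({
    "genre_id": ["SOCIAL"] * 10,
    "ratings_per_day": [5, 8, 10, 12, 15, 18, 20, 25, 30, 40000],
})

# The single outlier drags the mean into the thousands; the median stays small.
stats = df.groupby("genre_id")["ratings_per_day"].agg(["mean", "median", "max"])
print(stats)
```

Here the mean is 4014.3 while the median is 16.5, so ranking genres by mean mostly reflects their biggest apps rather than what a typical app sees.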

In [8]:
(genre_group['ratings_per_day'].max()-genre_group['ratings_per_day'].median()).sort_values(ascending=False)

genre_id
SOCIAL                 40524.0
COMMUNICATION          34482.0
GAME_ACTION            30937.0
GAME_STRATEGY          24918.0
TOOLS                  19338.0
GAME_CASUAL            17864.0
GAME_ADVENTURE         12795.0
GAME_ARCADE            12761.0
GAME_SPORTS            11039.0
VIDEO_PLAYERS          10341.0
PHOTOGRAPHY             9955.0
MUSIC_AND_AUDIO         7938.0
GAME_RACING             6202.0
PERSONALIZATION         5034.0
GAME_ROLE_PLAYING       4916.0
GAME_SIMULATION         4447.0
ENTERTAINMENT           4129.0
GAME_PUZZLE             4020.5
PRODUCTIVITY            3958.0
NEWS_AND_MAGAZINES      3899.0
GAME_TRIVIA             3559.0
EDUCATION               3361.0
GAME_BOARD              3067.0
GAME_WORD               3040.5
SHOPPING                3000.0
GAME_CARD               2323.0
FINANCE                 2069.0
HEALTH_AND_FITNESS      1923.0
MAPS_AND_NAVIGATION     1841.0
LIFESTYLE               1542.0
WEATHER                 1536.0
SPORTS                  1303.0

In [9]:
(genre_group['ratings_per_day'].max()-genre_group['ratings_per_day'].mean()).sort_values(ascending=False)

genre_id
SOCIAL                 40382.947368
COMMUNICATION          34361.481240
GAME_ACTION            30709.523614
GAME_STRATEGY          24679.814004
TOOLS                  19304.088824
GAME_CASUAL            17737.055303
GAME_ADVENTURE         12704.290448
GAME_ARCADE            12615.686667
GAME_SPORTS            10795.328084
VIDEO_PLAYERS          10261.102151
PHOTOGRAPHY             9902.169861
MUSIC_AND_AUDIO         7916.933279
GAME_RACING             6035.706100
PERSONALIZATION         5011.225525
GAME_ROLE_PLAYING       4805.577586
GAME_SIMULATION         4392.644983
ENTERTAINMENT           4108.237113
GAME_PUZZLE             3955.789866
PRODUCTIVITY            3937.457753
NEWS_AND_MAGAZINES      3888.573657
GAME_TRIVIA             3498.848889
EDUCATION               3357.250412
GAME_BOARD              3031.437086
SHOPPING                2969.025200
GAME_WORD               2927.722222
GAME_CARD               2296.876733
FINANCE                 2060.551463
HEALTH_AND_FITNESS 

From these gaps between the top app and a typical app, we can gather that there is a clear market leader in the following categories (the ten genres with the largest gaps):

* Social
* Communication
* Action Games
* Strategy Games
* Tools
* Casual Games
* Adventure Games
* Arcade Games
* Sports Games
* Video Players
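The max-minus-median gap is an absolute number, so genres that are busy overall tend to score highly almost by default. An alternative measure (not the one used above) is the share of a genre's total ratings per day captured by its single top app. A minimal sketch on hypothetical data, where `leader_share` is an illustrative name rather than anything from the dataset:

```python
import pandas as pd

# Hypothetical data: one genre dominated by a single app, one evenly split.
df = pd.DataFrame({
    "genre_id": ["SOCIAL"] * 4 + ["WEATHER"] * 4,
    "ratings_per_day": [9000, 400, 300, 300, 50, 45, 55, 50],
})

g = df.groupby("genre_id")["ratings_per_day"]

# Fraction of each genre's total daily ratings that belongs to its top app.
leader_share = (g.max() / g.sum()).sort_values(ascending=False)
print(leader_share)
```

A share near 1 means one app dominates the genre; a share near `1 / number_of_apps` means traffic is spread evenly.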

In [11]:
# True for each app whose ratings_per_day equals the maximum within its genre
idx = genre_group['ratings_per_day'].transform("max") == data['ratings_per_day']

In [18]:
data.loc[idx, ["genre_id","app_name"]]

| | genre_id | app_name |
|---|---|---|
| 2550 | SPORTS | Manchester United Official App |
| 4484 | TOOLS | Clean Master- Space Cleaner & Antivirus |
| 4881 | PRODUCTIVITY | Super-Bright LED Flashlight |
| 4967 | VIDEO_PLAYERS | YouTube |
| 5182 | ART_AND_DESIGN | Canva: Graphic design & poster, invitation maker |
| 5293 | COMMUNICATION | Messenger – Text and Video Chat for Free |
| 5923 | NEWS_AND_MAGAZINES | Twitter |
| 5996 | HEALTH_AND_FITNESS | Period Tracker - Period Calendar Ovulation Tra... |
| 6077 | MEDICAL | My Calendar - Period Tracker |
| 6093 | PARENTING | Pregnancy Tracker & Countdown to Baby Due Date |


This list shows the outlier in each category: the app with the most ratings per day in its genre.
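The boolean-mask approach keeps every app that ties for its genre's maximum, so a genre can contribute more than one row. If exactly one app per genre is wanted, `idxmax` returns the index label of the first maximum in each group. A minimal sketch contrasting the two on hypothetical rows (app names are invented):

```python
import pandas as pd

# Hypothetical rows; GAME_WORD has two apps tied at the maximum.
df = pd.DataFrame({
    "genre_id": ["TOOLS", "TOOLS", "GAME_WORD", "GAME_WORD", "GAME_WORD"],
    "app_name": ["Cleaner", "Flashlight", "Words A", "Words B", "Words C"],
    "ratings_per_day": [500, 120, 80, 80, 10],
})

# Mask approach: keeps every row equal to its group's max (ties included).
group_max = df.groupby("genre_id")["ratings_per_day"].transform("max")
mask = group_max == df["ratings_per_day"]
print(df.loc[mask, ["genre_id", "app_name"]])

# idxmax approach: exactly one row per genre (the first maximum encountered).
top_idx = df.groupby("genre_id")["ratings_per_day"].idxmax()
top = df.loc[top_idx, ["genre_id", "app_name"]]
print(top)
```

The mask returns three rows (both tied word games plus the top tool), while `idxmax` returns two, one per genre.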