## App Profile Recommendation

We only build apps that are free to download and install, and our main source of revenue is in-app ads. This means that the revenue of any given app depends on its number of users: the more users who see and engage with the ads, the better. Our goal for this project is to analyze Google Play and App Store data to help our developers understand what types of apps are likely to attract more users.

In [18]:
import pandas as pd
import warnings as wr
wr.filterwarnings('ignore')

Read the datasets

In [11]:
google_data = 'googleplaystore.csv'
apple_data = 'AppleStore.csv'

In [187]:
ggl = pd.read_csv(google_data)

In [188]:
ios = pd.read_csv(apple_data)

In [189]:
ggl.head(5)

| | App | Category | Rating | Reviews | Size | Installs | Type | Price | Content Rating | Genres | Last Updated | Current Ver | Android Ver |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Photo Editor & Candy Camera & Grid & ScrapBook | ART_AND_DESIGN | 4.1 | 159 | 19M | 10,000+ | Free | 0 | Everyone | Art & Design | January 7, 2018 | 1.0.0 | 4.0.3 and up |
| 1 | Coloring book moana | ART_AND_DESIGN | 3.9 | 967 | 14M | 500,000+ | Free | 0 | Everyone | Art & Design;Pretend Play | January 15, 2018 | 2.0.0 | 4.0.3 and up |
| 2 | U Launcher Lite – FREE Live Cool Themes, Hide ... | ART_AND_DESIGN | 4.7 | 87510 | 8.7M | 5,000,000+ | Free | 0 | Everyone | Art & Design | August 1, 2018 | 1.2.4 | 4.0.3 and up |
| 3 | Sketch - Draw & Paint | ART_AND_DESIGN | 4.5 | 215644 | 25M | 50,000,000+ | Free | 0 | Teen | Art & Design | June 8, 2018 | Varies with device | 4.2 and up |
| 4 | Pixel Draw - Number Art Coloring Book | ART_AND_DESIGN | 4.3 | 967 | 2.8M | 100,000+ | Free | 0 | Everyone | Art & Design;Creativity | June 20, 2018 | 1.1 | 4.4 and up |


In [165]:
ios.head(5)

| | id | track_name | size_bytes | currency | price | rating_count_tot | rating_count_ver | user_rating | user_rating_ver | ver | cont_rating | prime_genre | sup_devices.num | ipadSc_urls.num | lang.num | vpp_lic |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 284882215 | Facebook | 389879808 | USD | 0.0 | 2974676 | 212 | 3.5 | 3.5 | 95.0 | 4+ | Social Networking | 37 | 1 | 29 | 1 |
| 1 | 389801252 | Instagram | 113954816 | USD | 0.0 | 2161558 | 1289 | 4.5 | 4.0 | 10.23 | 12+ | Photo & Video | 37 | 0 | 29 | 1 |
| 2 | 529479190 | Clash of Clans | 116476928 | USD | 0.0 | 2130805 | 579 | 4.5 | 4.5 | 9.24.12 | 9+ | Games | 38 | 5 | 18 | 1 |
| 3 | 420009108 | Temple Run | 65921024 | USD | 0.0 | 1724546 | 3842 | 4.5 | 4.0 | 1.6.2 | 9+ | Games | 40 | 5 | 1 | 1 |
| 4 | 284035177 | Pandora - Music & Radio | 130242560 | USD | 0.0 | 1126879 | 3594 | 4.0 | 4.5 | 8.4.1 | 12+ | Music | 37 | 4 | 1 | 1 |


Clean the Google Play data and isolate free apps

In [190]:
# Replace null Rating values with the column mean
ggl['Rating'] = ggl['Rating'].fillna(ggl['Rating'].mean())

# Replace null Type values with the most frequent value
ggl['Type'] = ggl['Type'].fillna(ggl['Type'].mode()[0])

# Replace null Content Rating values with the most frequent value
ggl['Content Rating'] = ggl['Content Rating'].fillna(ggl['Content Rating'].mode()[0])
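Calling `fillna(..., inplace=True)` on a column selected with `df['col']` relies on chained assignment, which recent pandas versions warn about (a warning the `filterwarnings('ignore')` call above would hide). A minimal sketch of the assign-back pattern on a small hypothetical DataFrame, using the same mean and mode strategies as the cell above:

```python
import pandas as pd

# Hypothetical toy frame standing in for the Google Play data
df = pd.DataFrame({
    "Rating": [4.0, None, 5.0],
    "Type": ["Free", None, "Free"],
})

# Assign the filled column back rather than mutating a selection in place
df["Rating"] = df["Rating"].fillna(df["Rating"].mean())   # mean of 4.0 and 5.0
df["Type"] = df["Type"].fillna(df["Type"].mode()[0])       # most frequent value

print(df)
```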

In [214]:
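# Convert strings such as "10,000+" to numbers, in units of 10,000 installs
installs = ggl['Installs'].str.replace(',', '', regex=False).str.replace('+', '', regex=False)
ggl['Installs'] = pd.to_numeric(installs, errors='coerce') / 10000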
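The raw `Installs` values are strings such as `"10,000+"`, so a plain `astype('int64')` cannot parse them; the thousands separators and the trailing `+` have to go first. A minimal sketch on a hypothetical Series, where `"Free"` stands in for a malformed entry:

```python
import pandas as pd

# Hypothetical install strings in the same format as googleplaystore.csv
installs = pd.Series(["10,000+", "500,000+", "5,000,000+", "Free"])

# Remove "," and "+", then convert; errors="coerce" turns unparseable
# values (here "Free") into NaN instead of raising
numeric = pd.to_numeric(
    installs.str.replace(",", "", regex=False).str.replace("+", "", regex=False),
    errors="coerce",
)

# Express installs in units of 10,000, matching the scaling used above
result = numeric / 10_000
print(result)
```

Because the values are lower bounds ("10,000 or more"), the converted numbers are best read as install brackets rather than exact counts.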


In [192]:
# Keep only the first row for each app name
ggl.drop_duplicates(subset = 'App', keep = 'first', inplace = True)
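`drop_duplicates(subset='App', keep='first')` keeps whichever listing appears first in the file, which is not necessarily the most recent snapshot of an app. A minimal sketch on hypothetical data contrasting that with one alternative (not the method used here): sort by review count first, assuming `Reviews` has already been converted to a number and that the most-reviewed row is the latest.

```python
import pandas as pd

# Hypothetical toy data: the same app listed twice with different review counts
df = pd.DataFrame({
    "App": ["App A", "App A", "App B"],
    "Reviews": [100, 120, 50],
})

# Notebook's approach: keep the first listing of each app
first = df.drop_duplicates(subset="App", keep="first")

# Alternative: keep the listing with the most reviews
most_reviewed = (
    df.sort_values("Reviews", ascending=False)
      .drop_duplicates(subset="App", keep="first")
)

print(first)
print(most_reviewed)
```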

In [193]:
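# Keep only apps whose names consist solely of ASCII letters and spaces
# (a strict proxy for English: it also drops English names with digits or punctuation)
ggl = ggl[ggl['App'].str.contains(r'^[A-Za-z ]+$', regex=True, na=False)]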
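The pattern `^[A-Za-z ]+$` rejects any name containing a digit, `&`, `-`, or an emoji, so it also removes many English apps (such as "Sketch - Draw & Paint" from the sample rows). A minimal sketch comparing it with a looser heuristic, which is an assumption and not the method used here: treat a name as English if it has at most three characters outside the ASCII range. The helper `is_probably_english` and its threshold are hypothetical.

```python
import pandas as pd

names = pd.Series(["Coloring book moana", "Sketch - Draw & Paint", "中文应用名称", "Chat App 😜"])

# The notebook's filter: names made only of ASCII letters and spaces
strict = names[names.str.contains(r"^[A-Za-z ]+$", regex=True, na=False)]

# Hypothetical looser heuristic: allow up to max_non_ascii non-ASCII characters
def is_probably_english(name: str, max_non_ascii: int = 3) -> bool:
    return sum(ord(ch) > 127 for ch in name) <= max_non_ascii

loose = names[names.apply(is_probably_english)]

print(strict.tolist())
print(loose.tolist())
```

The looser rule keeps names with punctuation or a few emoji while still dropping names written mostly in non-Latin scripts; either filter will change the genre counts that follow.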

In [215]:
ggl_free = ggl[ggl['Type'] == 'Free']

Clean the Apple data and isolate free apps

In [208]:
# Keep only the first row for each app name
ios.drop_duplicates(subset = 'track_name', keep = 'first', inplace = True)

# rating_count_tot is already an integer; express it in hundreds of ratings
ios['rating_count_tot'] = ios['rating_count_tot'].astype('int64') / 100

In [201]:
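# Keep only apps whose names consist solely of ASCII letters and spaces
# (a strict proxy for English: it also drops English names with digits or punctuation)
ios = ios[ios['track_name'].str.contains(r'^[A-Za-z ]+$', regex=True, na=False)]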

In [211]:
ios_free = ios[ios['price'] == 0.0]

Most common genres among free apps

In [203]:
ggl_free['Category'].value_counts().head(10)

Category
FAMILY             787
TOOLS              422
GAME               388
BUSINESS           267
MEDICAL            212
FINANCE            202
PRODUCTIVITY       191
LIFESTYLE          182
SPORTS             169
PERSONALIZATION    156
Name: count, dtype: int64

In [204]:
ggl_free['Genres'].value_counts().head(10)

Genres
Tools              421
Business           267
Education          257
Entertainment      251
Medical            212
Finance            202
Productivity       191
Lifestyle          182
Sports             170
Personalization    156
Name: count, dtype: int64

In [205]:
ios_free['prime_genre'].value_counts().head(10)

prime_genre
Games                853
Entertainment        102
Education             55
Social Networking     44
Photo & Video         38
Utilities             31
Sports                29
Music                 27
Shopping              23
Productivity          21
Name: count, dtype: int64
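Raw counts are hard to compare across the two stores because the cleaned datasets differ in size. A minimal sketch on hypothetical genre labels showing `value_counts(normalize=True)`, which returns each genre's share of all apps instead of a count:

```python
import pandas as pd

# Hypothetical genre labels standing in for ios_free['prime_genre']
genres = pd.Series(["Games", "Games", "Games", "Education", "Music"])

# normalize=True gives proportions; scale to percentages and round
share = genres.value_counts(normalize=True).mul(100).round(1)
print(share)
```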

Average popularity of each genre: installs on Google Play, and total rating count on the App Store (which has no install column)

In [216]:
# Mean installs per genre (in units of 10,000), truncated to int, top 5
ggl_free.groupby('Genres')['Installs'].mean().astype('int64').sort_values(ascending = False).head(5)

Genres
Adventure;Action & Adventure    5050
Puzzle;Action & Adventure       5000
Communication                   3818
Arcade                          3519
Social                          3295
Name: Installs, dtype: int64
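A genre mean computed over only one or two apps can top this ranking on the strength of a single hit. A minimal sketch on hypothetical data that reports the number of apps behind each mean alongside it, using `groupby(...).agg(["mean", "count"])`:

```python
import pandas as pd

# Hypothetical toy data: installs already in units of 10,000
df = pd.DataFrame({
    "Genres": ["Arcade", "Arcade", "Arcade", "Puzzle"],
    "Installs": [100.0, 1.0, 1.0, 500.0],
})

# Mean installs and the number of apps per genre, highest mean first
summary = (
    df.groupby("Genres")["Installs"]
      .agg(["mean", "count"])
      .sort_values("mean", ascending=False)
)
print(summary)
```

Here "Puzzle" ranks first from a single app, which the `count` column makes visible.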

In [213]:
# Mean total rating count per genre (in hundreds), truncated to int, top 10
ios_free.groupby('prime_genre')['rating_count_tot'].mean().astype('int64').sort_values(ascending = False).head(10)

prime_genre
Reference            1709
Social Networking    1391
Photo & Video         742
Music                 490
Travel                480
News                  393
Food & Drink          347
Sports                286
Games                 282
Weather               261
Name: rating_count_tot, dtype: int64
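Rating counts are heavily skewed, so a genre's mean can be driven by a handful of very popular apps. A minimal sketch on hypothetical data placing the mean next to the median; a large gap between the two suggests the average reflects a few outliers rather than a typical app in that genre:

```python
import pandas as pd

# Hypothetical toy data standing in for ios_free (counts in hundreds)
df = pd.DataFrame({
    "prime_genre": ["Reference", "Reference", "Reference", "Games", "Games", "Games"],
    "rating_count_tot": [5000.0, 2.0, 3.0, 40.0, 50.0, 60.0],
})

# Mean and median rating count per genre, side by side
stats = df.groupby("prime_genre")["rating_count_tot"].agg(["mean", "median"])
print(stats)
```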