# Profitable App Profiles for the App Store and Google Play Markets
***

*In this guided project, we pretend we're working as data analysts for a company that builds Android and iOS apps to put on the Google Play Store and the App Store.*
***

The aim of this project is to examine the profiles of mobile apps that are popular on the App Store and the Google Play Store. The results will support data-driven decisions about which types of apps should be developed.

The goal is to develop a free app whose main source of revenue is in-app ads, which means revenue will depend on the number of users. The app will first be developed for the Android market. If it receives a good response from users, it will be developed further. If it becomes profitable after six months, it will be ported to iOS and published on the App Store. This project will help determine which app profiles are likely to attract more users.
***


## Cleaning the Datasets
***

### Opening and Exploring the Data

As of September 2018, there were approximately 2 million iOS apps on the App Store and approximately 2.1 million Android apps on the Google Play Store. Fortunately, datasets covering a sample of the apps on each store are available for download. The dataset for the App Store can be found [here](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) and the dataset for the Google Play Store can be found [here.](https://www.kaggle.com/lava18/google-play-store-apps)

Let's begin by opening the datasets so we can manipulate them later for analysis.

In [1]:
### Opening all the datasets, making them into a list of lists, and 
### setting their variables for headers and for the data itself

from csv import reader
### The Google Play data set ###
opened_file = open('googleplaystore.csv')
read_file = reader(opened_file)
android = list(read_file)
android_header = android[0]
android = android[1:]

### The App Store data set ###
opened_file = open('AppleStore.csv')
read_file = reader(opened_file)
ios = list(read_file)
ios_header = ios[0]
ios = ios[1:]
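
The loading steps above can also be wrapped in a small helper. The following is a minimal sketch, not the notebook's own code: `load_csv` is a hypothetical name, and it assumes the CSV files are UTF-8 encoded. Using `with` closes the file once it has been read.

```python
from csv import reader

def load_csv(path, encoding='utf8'):
    """Return (header, rows) for the CSV file at `path`.

    Assumes the first line of the file is the header row.
    """
    # newline='' is what the csv module documentation recommends
    # for file objects passed to reader()
    with open(path, encoding=encoding, newline='') as opened_file:
        rows = list(reader(opened_file))
    return rows[0], rows[1:]
```

With this helper, `android_header, android = load_csv('googleplaystore.csv')` would replace the five lines used for each dataset above.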

The following is a function to explore the datasets in a readable way. Included in the function is the option to output the number of rows and columns for a provided dataset.

In [2]:
### Function that takes in a dataset, and displays it from a start
### point to an end point also adding the option to output the number
### of columns and rows
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line between rows
        
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

The header and the first few rows of each dataset are printed to show the format, along with the number of rows and columns. The headers also help determine which columns are most relevant to our analysis.

In [3]:
### Print the header and the first few rows of each dataset ###
### Android ###
print('The header and the first 2 rows of the Google Play Store dataset')
print('\n')
print(android_header)
print('\n')
explore_data(android, 0, 2, True)

The header and the first 2 rows of the Google Play Store dataset


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


Here we see that the Google Play Store dataset contains 10,841 entries. At first glance, the most relevant columns for our analysis are 'App', 'Category', 'Reviews', 'Installs', 'Type', 'Price', and 'Genres'.

In [4]:
### iOS ###
print('The header and the first 2 rows of the Apple Store dataset')
print('\n')
print(ios_header)
print('\n')
explore_data(ios, 0, 2, True)

The header and the first 2 rows of the Apple Store dataset


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows: 7197
Number of columns: 16


There are 7,197 apps in the App Store dataset. The most relevant columns for our analysis are 'track_name', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', and 'prime_genre'. The column names are not self-explanatory; descriptions for each column header can be found [here.](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home)

### Deleting Wrong Data

The Google Play dataset has a dedicated [discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion), and we find that [one of the discussions](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015) describes an error for row 10,472. This incorrect row will be compared with a correct row to see what the error is.

In [5]:
### Comparison between a correct and incorrect row to help edit, or
### delete the problem

print(android[10472])  # incorrect row
print('\n')
print(android_header)  # header
print('\n')
print(android[0])      # correct row

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


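The row is missing its 'Category' value, so every later field is shifted one column to the left (the row has 12 fields instead of 13). As a result, the 'Rating' column holds 19, which is more than the maximum 5-star rating. Because the row is malformed, it will be deleted.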
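
A quick way to locate rows like this one is to compare each row's length with the header's: the faulty row printed above has 12 fields, while the header has 13. The sketch below uses a hypothetical helper, `find_malformed_rows`, on a shortened toy sample rather than the real dataset.

```python
def find_malformed_rows(dataset, header):
    """Return (index, row) pairs whose field count differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(dataset) if len(row) != expected]

# Toy data with three columns only (not the real dataset)
header = ['App', 'Category', 'Rating']
sample = [
    ['Photo Editor', 'ART_AND_DESIGN', '4.1'],
    ['Photo Frame', '1.9'],  # 'Category' is missing, so the row is one field short
]

print(find_malformed_rows(sample, header))
```

Running the same helper on `android` and `android_header` should point at any row with a missing or extra field, without relying on the discussion forum.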

In [6]:
### Delete the one row. Printing the before and after length to make sure
### it was done correctly

print(len(android))
del android[10472]  # DO NOT RUN THIS MORE THAN ONCE, WILL DELETE CORRECT ROWS
print(len(android))

10841
10840


### Removing Duplicate Entries

The next step is to delete duplicate entries. We don't want to analyze the same app more than once so we will keep only one entry per app. For instance, the app 'Slack' has more than one entry in the Google Play dataset.

In [7]:
### Showing that there are duplicates in the data

for app in android:
    name = app[0]
    if name == 'Slack':
        print(app)

['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51510', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']


Next, we need to check how many apps are entered more than once for each dataset.

In [8]:
### Showing how many duplicates there are for each dataset

print('Checking for duplicates in the Google Play Store')
duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of duplicate apps:', len(duplicate_apps))

Checking for duplicates in the Google Play Store
Number of duplicate apps: 1181


There are a total of 1,181 duplicate entries in the Google Play Store dataset.
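
The same count can be obtained without scanning a list for every row. This is a minimal sketch using `collections.Counter`; `count_duplicates` is a hypothetical helper, and the sample rows are toy data.

```python
from collections import Counter

def count_duplicates(dataset, name_index=0):
    """Count entries that repeat a name already seen earlier in the dataset."""
    counts = Counter(row[name_index] for row in dataset)
    # Every occurrence beyond the first one counts as a duplicate
    return sum(n - 1 for n in counts.values())

sample = [['Slack'], ['Slack'], ['Slack'], ['Instagram']]
print(count_duplicates(sample))  # 2
```

Checking `name in unique_apps` on a list takes longer as the list grows, whereas `Counter` uses hashing, which may matter for larger datasets.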

In [9]:
print('Checking for duplicates in the App Store')
duplicate_apps = []
unique_apps = []

for app in ios:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of duplicate apps:', len(duplicate_apps))

Checking for duplicates in the App Store
Number of duplicate apps: 0


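No value appears more than once in the App Store dataset. Note that the check above compares column 0, which in this dataset holds the app `id` rather than the app name (`track_name`, column 1).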

When examining the duplicate entries for Slack, we can see that the rows differ in the fourth field, which corresponds to the number of reviews. It appears the data was collected at different times. Since we want the most recent data, we will keep only the entry with the highest number of reviews.
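
As an illustration of this rule, the sketch below keeps the most-reviewed entry per app in a single pass. It is an alternative formulation, not the notebook's method; `keep_most_reviewed` is a hypothetical helper and the sample is the three Slack rows shown earlier, shortened to four fields.

```python
def keep_most_reviewed(dataset, name_index=0, reviews_index=3):
    """Keep one row per app name: the one with the highest review count."""
    best = {}
    for row in dataset:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Store the row if the name is new or this entry has more reviews
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

sample = [
    ['Slack', 'BUSINESS', '4.4', '51507'],
    ['Slack', 'BUSINESS', '4.4', '51507'],
    ['Slack', 'BUSINESS', '4.4', '51510'],
]

print(keep_most_reviewed(sample))
```

When several entries tie for the highest count, the first one encountered is kept.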

In [10]:
### Creating a dictionary of name:reviews pairs that stores, for each app,
### the highest number of reviews found across its entries

reviews_max = {}

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

In [11]:
### New clean list created without duplicates 

android_clean = [] # stores cleaned dataset
already_added = [] # stores app names

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)

Since we want to produce an app for the English-speaking market, we only want to analyze apps aimed at English speakers.

In [12]:
def english_app(a_string):
    non_ascii = 0
    for character in a_string:
        if ord(character) > 127:
            non_ascii += 1
    
    if non_ascii > 3:
        return False
    else:
        return True

A function called **english_app()** is created to search the datasets for apps with English names. Characters typically used in English, such as the letters of the alphabet and common symbols (!, #, ?, etc.), fall within the ASCII range of 0 to 127. However, English app names can also contain characters outside this range, such as emojis, currency symbols, and mathematical characters, so the function must account for that. To do so, the filter allows names with up to 3 characters outside the ASCII range. The filter may let through non-English apps whose names contain three or fewer non-ASCII characters, but the loss of English apps will be minimized.
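
To make the threshold concrete, here is a compact sketch of the same rule applied to a few names. `is_english_name` is a hypothetical stand-in for `english_app()`, and '日本語のアプリ' is a made-up name used only for illustration.

```python
def is_english_name(name, max_non_ascii=3):
    """Return True if `name` has at most `max_non_ascii` characters above code point 127."""
    non_ascii = sum(1 for character in name if ord(character) > 127)
    return non_ascii <= max_non_ascii

print(is_english_name('Instagram'))                      # True: no non-ASCII characters
print(is_english_name('Docs To Go™ Free Office Suite'))  # True: one non-ASCII character
print(is_english_name('Instachat 😜'))                    # True: one non-ASCII character
print(is_english_name('謎解き'))                          # True: exactly three, so it slips through
print(is_english_name('日本語のアプリ'))                   # False: seven non-ASCII characters
```

The fourth case shows the limitation described above: a short non-English name passes the filter.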

In [13]:
### Adding English apps to new datasets
android_english = []
ios_english = []

for app in android_clean:
    name = app[0]
    if english_app(name):
        android_english.append(app)
        
for app in ios:
    name = app[1]
    if english_app(name):
        ios_english.append(app)        

In [14]:
### Checking to see how many English apps are in the datasets
print('Android clean data')
explore_data(android_english, 0, 0, True)
print('\n')
print('iOS clean data')
explore_data(ios_english, 0, 0, True)

Android clean data
Number of rows: 9614
Number of columns: 13


iOS clean data
Number of rows: 6183
Number of columns: 16


After removing the non-English apps, 9,614 apps remain in the Google Play Store dataset and 6,183 in the App Store dataset.

### Isolating Free Apps

In [15]:
android_final = []
ios_final = []
for app in android_english:
    price = app[7]
    if price == '0':
        android_final.append(app)
        
for app in ios_english:
    price = app[4]
    if price == '0.0':
        ios_final.append(app)

android_num = str(len(android_final))
ios_num = str(len(ios_final))

print('There are ' + android_num + ' apps left in the android dataset')
print('There are ' + ios_num + ' apps left in the ios dataset')

There are 8864 apps left in the android dataset
There are 3222 apps left in the ios dataset


After removing incorrect data, duplicates, non-English apps, and apps that are not free, 8,864 apps remain in the Android dataset and 3,222 in the iOS dataset.
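
The filter above relies on the exact strings '0' and '0.0'. A slightly more tolerant check converts the price to a number first. This is a sketch only: `is_free` is a hypothetical helper, and it assumes paid Google Play prices carry a leading dollar sign (for example '$4.99'), which is not shown in the rows printed earlier.

```python
def is_free(price):
    """Return True if a price string represents zero."""
    # Assumption: any currency marker is a leading '$'
    return float(price.replace('$', '')) == 0.0

print(is_free('0'))      # True
print(is_free('0.0'))    # True
print(is_free('$4.99'))  # False
```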

## Data Analysis
***

Our goal is to find an app profile that will attract users.

In [16]:
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage 
    
    return table_percentages

In [17]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

The next step is to analyze the Google Play Store and App Store datasets to see which types of apps (e.g. games) are most common.

The App Store only has one column called 'prime_genre' that is related to a category/genre type.

In [18]:
### App Store only has one column related to category/genre
print('App Store "prime genre" frequency table')
print('\n')
display_table(ios_final, 11)

App Store "prime genre" frequency table


Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


*Games* make up over half (58.2%) of the free English apps in the App Store dataset. *Entertainment*, *photo & video*, *education*, and *social networking* make up 7.9%, 5.0%, 3.7%, and 3.3% respectively. Most apps are designed for personal entertainment (e.g. games, entertainment, social networking, music) while apps designed for practical uses (e.g. education, utilities, finance, weather) make up a significantly smaller share.

In [19]:
### The Google Play Store has two columns related to category/genre (1 and 9)

print('Google Play Store "category" frequency table')
print('\n')
display_table(android_final, 1)

Google Play Store "category" frequency table


FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
P

The *family* category is the most common, making up 18.9% of the free English apps in the Google Play Store dataset. *Games* make up only 9.7% of the apps, a significant difference from the App Store. *Tools*, *business*, and *lifestyle* round out the top five categories with 8.5%, 4.6%, and 3.9% of the apps; these three are more practical in purpose. Upon inspecting the Google Play Store, the *family* category consists mostly of games for younger audiences.

For both the App Store and the Google Play Store, the number of apps in a genre does not necessarily reflect the number of users. A large number of personal entertainment apps does not mean each has more users than its practical-use counterparts. A more thorough analysis is needed to find which app profiles have the most users.


In [20]:
print('Google Play Store "genres" frequency table')
print('\n')
display_table(android_final, 9)

Google Play Store "genres" frequency table


Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.93637

At first glance, there are significantly more genres than categories. Looking through them, particularly in the bottom half of the table, we can see that they become very specific (e.g. *music*, *entertainment; music & video*, *casual; music and video*). Because the genre breakdown is so fine-grained, we continue the analysis using the category column instead of the genre column.
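
The difference in granularity can be checked by counting the distinct values in each column. The sketch below uses a hypothetical helper, `count_distinct`, on the first two Google Play rows shown earlier, shortened to three fields for illustration.

```python
def count_distinct(dataset, index):
    """Return the number of distinct values found in column `index`."""
    return len({row[index] for row in dataset})

# Toy sample: app name, category, genres
sample = [
    ['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', 'Art & Design'],
    ['Coloring book moana', 'ART_AND_DESIGN', 'Art & Design;Pretend Play'],
]

print(count_distinct(sample, 1))  # 1 distinct category
print(count_distinct(sample, 2))  # 2 distinct genres
```

On the full dataset, the equivalent calls would be `count_distinct(android_final, 1)` and `count_distinct(android_final, 9)`.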

### Most popular apps by genre on the App Store
To gain a better understanding of the popularity of each genre, we must find the average number of installs for each genre. The Google Play Store dataset has a column for the total number of installs for each app, but the App Store does not. To circumvent this issue, we will use the number of ratings to substitute for the missing information.

Below, we calculate the average number of users per app on the App Store using the number of ratings as a proxy.

In [21]:
genres_ios = freq_table(ios_final, 11)
empty_dict = {}
for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in ios_final:
        genre_app = app[11]
        if genre_app == genre:
            total += float(app[5])
            len_genre += 1
    avg_rating = total / len_genre
    empty_dict[genre] = avg_rating
table = empty_dict
table_display = []
for key in table:
    key_val_as_tuple = (table[key], key)
    table_display.append(key_val_as_tuple)

table_sorted = sorted(table_display, reverse = True)
for entry in table_sorted:
    print(entry[1], ':', entry[0])

Navigation : 86090.33333333333
Reference : 74942.11111111111
Social Networking : 71548.34905660378
Music : 57326.530303030304
Weather : 52279.892857142855
Book : 39758.5
Food & Drink : 33333.92307692308
Finance : 31467.944444444445
Photo & Video : 28441.54375
Travel : 28243.8
Shopping : 26919.690476190477
Health & Fitness : 23298.015384615384
Sports : 23008.898550724636
Games : 22788.6696905016
News : 21248.023255813954
Productivity : 21028.410714285714
Utilities : 18684.456790123455
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Business : 7491.117647058823
Education : 7003.983050847458
Catalogs : 4004.0
Medical : 612.0


*Navigation* apps have the highest average number of user ratings. This average is suspect, however, because a few apps likely dominate the navigation genre. Rounding out the top four are *reference*, *social networking*, and *music* apps, which are likely in the same situation: the average is skewed by a few extremely popular apps.

To get an idea of the distribution of the number of ratings within a genre, we must look at individual apps within that genre. The **pop_apps_ios** function lists all apps in a given genre along with their number of ratings.

In [22]:
def pop_apps_ios(genre):
    for app in ios_final:
        if app[11] == genre:
            print(app[1], ':', app[5])

Let's take a look at the 'Navigation' genre.

In [23]:
print('Navigation apps by number of user ratings on the App Store')
pop_apps_ios('Navigation')

Navigation apps by number of user ratings on the App Store
Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


Here, we can see our suspicions were correct: the averages are skewed by a few immensely popular apps. In the *navigation* genre, Waze and Google Maps account for nearly all the user ratings. The same goes for the *music*, *social networking*, and *reference* genres, where a few apps hold a significant portion of the genre's total ratings. This is a problem because a genre can seem popular when only a few of its apps are, with the number of ratings dropping sharply after the most popular ones. Ignoring the most popular apps, however, the differences in the number of ratings from one app to the next are less drastic. Aside from those few apps, the apps within a genre are relatively comparable, which makes these genres reasonable initial profiles for development.
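
The skew can be quantified by comparing the mean with the median, using the six *navigation* rating counts printed above. This is a small illustrative sketch; the variable names are not part of the notebook.

```python
from statistics import mean, median

# Rating counts for the six Navigation apps listed above
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

# Share of all ratings held by the two most-rated apps
top_two = sum(sorted(navigation_ratings, reverse=True)[:2])
top_two_share = top_two / sum(navigation_ratings)

print(round(mean(navigation_ratings), 2))  # 86090.33
print(median(navigation_ratings))          # 8196.5
print(round(top_two_share * 100, 1))       # 96.8
```

The mean matches the genre average reported earlier, while the median is roughly ten times smaller, and the top two apps hold about 97% of the ratings.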

Upon further consideration, *navigation* does not appear to be a viable profile. The main function of a navigation app is live route finding, most often while driving, so it does not fit our requirement of an app whose main source of income is ads: it would be irresponsible to show ads that could distract a driver.

Let's continue to the *social networking* genre.

In [24]:
print('Social Networking apps by number of user ratings on the App Store')
pop_apps_ios('Social Networking')

Social Networking apps by number of user ratings on the App Store
Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633


As with the *navigation* genre, a social networking app is not a viable profile. The top two apps, Facebook and Pinterest, account for roughly half of the genre's total number of ratings, so it would be difficult to break into the market. Even if a unique app were developed, social networking apps often require many costly servers to operate and store user information.

Next, we look at the *music* genre.

In [25]:
print('Music apps by number of user ratings on the App Store')
pop_apps_ios('Music')

Music apps by number of user ratings on the App Store
Pandora - Music & Radio : 1126879
Spotify Music : 878563
Shazam - Discover music, artists, videos & lyrics : 402925
iHeartRadio – Free Music & Radio Stations : 293228
SoundCloud - Music & Audio : 135744
Magic Piano by Smule : 131695
Smule Sing! : 119316
TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110420
Amazon Music : 106235
SoundHound Song Search & Music Player : 82602
Sonos Controller : 48905
Bandsintown Concerts : 30845
Karaoke - Sing Karaoke, Unlimited Songs! : 28606
My Mixtapez Music : 26286
Sing Karaoke Songs Unlimited with StarMaker : 26227
Ringtones for iPhone & Ringtone Maker : 25403
Musi - Unlimited Music For YouTube : 25193
AutoRap by Smule : 18202
Spinrilla - Mixtapes For Free : 15053
Napster - Top Music & Radio : 14268
edjing Mix:DJ turntable to remix and scratch music : 13580
Free Music - MP3 Streamer & Playlist Manager Pro : 13443
Free Piano app by Yokee : 13016
Google Play Music : 10118
Certified Mixtapes - Hi

The same argument made for the previous two genres can be made for *music* apps. Additionally, paying for the rights to stream music can be costly, with [returns not occurring for extended periods of time.](https://www.businesswire.com/news/home/20190206005298/en/Spotify-Technology-S.A.-Announces-Financial-Results-Fourth)

Below, we look at the *reference* genre as a possible app profile.

In [26]:
print('Reference apps by number of user ratings on the App Store')
pop_apps_ios('Reference')

Reference apps by number of user ratings on the App Store
Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


The *reference* genre is filled with religious texts, dictionaries, thesauruses, and encyclopedias for video games. The genre is dominated by 'Bible' and 'Dictionary.com Dictionary & Thesaurus'. The lower-ranked apps are primarily encyclopedias for video games.

It may be worth looking into the *book* genre. Although it is not among the top four genres, it ranks sixth by average number of ratings.

In [27]:
print('Book apps by number of user ratings on the App Store')
pop_apps_ios('Book')

Book apps by number of user ratings on the App Store
Kindle – Read eBooks, Magazines & Textbooks : 252076
Audible – audio books, original series & podcasts : 105274
Color Therapy Adult Coloring Book for Adults : 84062
OverDrive – Library eBooks and Audiobooks : 65450
HOOKED - Chat Stories : 47829
BookShout: Read eBooks & Track Your Reading Goals : 879
Dr. Seuss Treasury — 50 best kids books : 451
Green Riding Hood : 392
Weirdwood Manor : 197
MangaZERO - comic reader : 9
ikouhoushi : 0
MangaTiara - love comic reader : 0
謎解き : 0
謎解き2016 : 0


Although the Kindle app has a quarter of a million ratings, quick research into the app suggests that unlimited access to books requires an Amazon Prime account. So, while Prime users will continue to use the Kindle app, people without a Prime account may gravitate towards a completely free app. An app could be developed that allows users to download and read books, or the app could have the books preloaded. Furthermore, users would likely spend more time within such an app, allowing for more opportunities to display ads.

Of the aforementioned app profiles, the *reference* genre appears promising. Although the genre is dominated by a few apps (namely the Bible and digitized dictionaries/thesauruses), these are quite distinct from the video game encyclopedias that make up most of the genre. A possible app profile could be a collection of religious texts from around the world, with an in-app dictionary so users can look up a word without leaving the app. A second choice could be in the *book* genre: the app could contain a collection of classic literature. Copyright permissions would not be required for certain books, as specified by [copyright laws for Canada, the United States](https://www.copyrightlaws.com/u-s-vs-canadian-copyright/) and the [United Kingdom.](https://copyrightservice.co.uk/copyright/uk_law_summary)

Now that we have investigated popular genres for apps in the App Store, let's turn our attention to the Google Play Store.

### Most popular apps by genre on the Google Play Store
The dataset for the Google Play Store includes the number of installs. However, the values are open-ended intervals (100+, 1,000+, etc.) rather than precise counts. Below, we check what percentage of apps falls into each install range.

In [28]:
display_table(android_final, 5)

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


Since we only want a rough idea of how many users there are for each category of app, we opt to keep the number of installs as the value at the start of the interval (e.g. 100+ becomes 100).

To work with the number of installs, we need to convert them from strings to floats while also removing commas and the plus symbols. Once the strings are converted to floats, the categories will be listed from the highest to lowest number of downloads.
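
The conversion described above can be sketched in isolation as follows. `parse_installs` is a hypothetical helper; the input strings are taken from the install ranges listed in the table above.

```python
def parse_installs(installs):
    """Convert an install range such as '1,000,000+' to its lower bound as a float."""
    return float(installs.replace('+', '').replace(',', ''))

print(parse_installs('1,000,000+'))  # 1000000.0
print(parse_installs('100+'))        # 100.0
print(parse_installs('0'))           # 0.0
```

Because the lower bound of each interval is used, the resulting averages understate the true number of installs.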

In [29]:
categories_android = freq_table(android_final, 1)
empty_dict = {}
for category in categories_android:
    total = 0
    len_category = 0
    for app in android_final:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace('+', '')
            n_installs = n_installs.replace(',', '')
            total += float(n_installs)
            len_category += 1
    # Compute the average once per category, after all of its apps are summed
    avg_n_installs = total / len_category
    empty_dict[category] = avg_n_installs
        
table = empty_dict
table_display = []
for key in table:
    key_val_as_tuple = (table[key], key)
    table_display.append(key_val_as_tuple)

table_sorted = sorted(table_display, reverse = True)
for entry in table_sorted:
    print(entry[1], ':', entry[0])

COMMUNICATION : 38456119.167247385
VIDEO_PLAYERS : 24727872.452830188
SOCIAL : 23253652.127118643
PHOTOGRAPHY : 17840110.40229885
PRODUCTIVITY : 16787331.344927534
GAME : 15588015.603248259
TRAVEL_AND_LOCAL : 13984077.710144928
ENTERTAINMENT : 11640705.88235294
TOOLS : 10801391.298666667
NEWS_AND_MAGAZINES : 9549178.467741935
BOOKS_AND_REFERENCE : 8767811.894736841
SHOPPING : 7036877.311557789
PERSONALIZATION : 5201482.6122448975
WEATHER : 5074486.197183099
HEALTH_AND_FITNESS : 4188821.9853479853
MAPS_AND_NAVIGATION : 4056941.7741935486
FAMILY : 3695641.8198090694
SPORTS : 3638640.1428571427
ART_AND_DESIGN : 1986335.0877192982
FOOD_AND_DRINK : 1924897.7363636363
EDUCATION : 1833495.145631068
BUSINESS : 1712290.1474201474
LIFESTYLE : 1437816.2687861272
FINANCE : 1387692.475609756
HOUSE_AND_HOME : 1331540.5616438356
DATING : 854028.8303030303
COMICS : 817657.2727272727
AUTO_AND_VEHICLES : 647317.8170731707
LIBRARIES_AND_DEMO : 638503.734939759
PARENTING : 542603.6206896552
BEAUTY : 51315

Next, we check the most popular categories in the Google Play Store. We suspect that the trend we saw in the App Store will appear here too: a few extremely popular apps make a category look more popular than it really is.

The four most popular categories in the Google Play Store are *communication*, *video players*, *social*, and *photography*.

Below, we build two functions to analyze the popularity of a specific category. The first function prints the number of installs for every app in a category, followed by the total number of apps in that category. The second function lists and counts the apps in a category that have at least a specified number of installs. This lets us check whether a few extremely popular apps are skewing the category average.

In [30]:
def pop_apps_android(category, list_apps = True):
    total = 0
    for app in android_final:
        if app[1] == category:
            total += 1
            if list_apps:
                print(app[0], ':', app[5])
    print('There are a total of', total, 'apps in the', category, 'category.')

def apps_over_n(category, minimum, list_apps=True):
    num = 0
    for app in android_final:
        n_installs = app[5]
        n_installs = n_installs.replace('+', '')
        n_installs = n_installs.replace(',', '')
        if (float(n_installs) >= minimum) and (app[1] == category):
            num += 1
            if list_apps:
                print(app[0], ':' , n_installs)
    print(num, 'app(s) with over', minimum, 'installs')

Let's take a closer look at the *communication*, *video players*, *social*, and *photography* categories.

In [31]:
print('Popular apps in the communication category')
pop_apps_android('COMMUNICATION')

Popular apps in the communication category
WhatsApp Messenger : 1,000,000,000+
Messenger for SMS : 10,000,000+
My Tele2 : 5,000,000+
imo beta free calls and text : 100,000,000+
Contacts : 50,000,000+
Call Free – Free Call : 5,000,000+
Web Browser & Explorer : 5,000,000+
Browser 4G : 10,000,000+
MegaFon Dashboard : 10,000,000+
ZenUI Dialer & Contacts : 10,000,000+
Cricket Visual Voicemail : 10,000,000+
TracFone My Account : 1,000,000+
Xperia Link™ : 10,000,000+
TouchPal Keyboard - Fun Emoji & Android Keyboard : 10,000,000+
Skype Lite - Free Video Call & Chat : 5,000,000+
My magenta : 1,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Seznam.cz : 1,000,000+
Antillean Gold Telegram (original version) : 100,000+
AT&T Visual Voicemail : 10,000,000+
GMX Mail : 10,000,000+
Omlet Chat : 10,000,000+
My Vodacom SA : 5,000,000+
Microsoft Edge : 5,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,

In [32]:
apps_over_n('COMMUNICATION', 1000000000)

WhatsApp Messenger : 1000000000
Messenger – Text and Video Chat for Free : 1000000000
Skype - free IM & video calls : 1000000000
Google Chrome: Fast & Secure : 1000000000
Gmail : 1000000000
Hangouts : 1000000000
6 app(s) with over 1000000000 installs


Here we can see that 6 apps in the *communication* category have at least one billion installs. These 6 apps are not only incredibly popular but also quite distinct from one another. Breaking into this market would require developing a unique app, which the 281 other apps in the category are already trying to do with varying degrees of success. With 287 apps categorized as *communication* (the count implied by the category average above), the market is heavily saturated.

In [33]:
print('Popular apps in the video player category')
pop_apps_android('VIDEO_PLAYERS', False)
apps_over_n('VIDEO_PLAYERS', 100000000, False)
print('\n')
print('Popular apps in the social category')
pop_apps_android('SOCIAL', False)
apps_over_n('SOCIAL', 100000000, False)
print('\n')
print('Popular apps in the photography category')
pop_apps_android('PHOTOGRAPHY', False)
apps_over_n('PHOTOGRAPHY', 100000000, False)

Popular apps in the video player category
There are a total of 159 apps in the VIDEO_PLAYERS category.
9 app(s) with over 100000000 installs


Popular apps in the social category
There are a total of 236 apps in the SOCIAL category.
13 app(s) with over 100000000 installs


Popular apps in the photography category
There are a total of 261 apps in the PHOTOGRAPHY category.
19 app(s) with over 100000000 installs


After examining the remaining three of the top four categories, we see a situation like the one in the *communication* category: each is dominated by a popular few. There are 9, 13, and 19 apps with at least 100 million installs in the *video players*, *social*, and *photography* categories respectively. These markets are likewise heavily saturated.
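One way to gauge how strongly a popular few inflate a category is to compare its average with and without the apps at or above a cutoff. The sketch below is illustrative only: `average_installs` and `sample_rows` are hypothetical, the rows are invented, and the index positions assume the same layout as `android_final`.

```python
def parse_installs(raw):
    """Convert an install bucket such as '100,000+' to the float 100000.0."""
    return float(raw.replace('+', '').replace(',', ''))


def average_installs(rows, category, cutoff=None, category_idx=1, installs_idx=5):
    """Average installs for a category, optionally ignoring apps at or above cutoff."""
    values = []
    for row in rows:
        if row[category_idx] != category:
            continue
        n_installs = parse_installs(row[installs_idx])
        if cutoff is None or n_installs < cutoff:
            values.append(n_installs)
    # Return None when no app qualifies, rather than dividing by zero
    return sum(values) / len(values) if values else None


# Hypothetical rows (assumption, not taken from the dataset)
sample_rows = [
    ['Giant Social', 'SOCIAL', '', '', '', '1,000,000,000+'],
    ['Small Social A', 'SOCIAL', '', '', '', '100,000+'],
    ['Small Social B', 'SOCIAL', '', '', '', '50,000+'],
]

with_giants = average_installs(sample_rows, 'SOCIAL')
without_giants = average_installs(sample_rows, 'SOCIAL', cutoff=100000000)
print('Average with all apps:', with_giants)
print('Average below the cutoff:', without_giants)
```

In the made-up sample, a single giant app lifts the average by several orders of magnitude, which is the kind of skew the counts above hint at.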

Since we decided that the two best genres to develop for in the iOS market were *reference* and *book*, let's look at the closest equivalent in the Google Play Store, the *books and reference* category.

In [34]:
pop_apps_android('BOOKS_AND_REFERENCE')

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

The *books and reference* category has a lot of variety. The category contains e-books, encyclopedias, dictionaries, recipe books, translation books, programming language tutorials and various religious texts.

In [35]:
apps_over_n('BOOKS_AND_REFERENCE', 100000000)

Google Play Books : 1000000000
Bible : 100000000
Amazon Kindle : 100000000
Wattpad 📖 Free Books : 100000000
Audiobooks from Audible : 100000000
5 app(s) with over 100000000 installs


Like the other app categories, *books and reference* has a few apps that skew its popularity. Although these apps are extremely popular, a closer look reveals that some of them require users to purchase books in order to read them. These include Google Play Books, Audible, and Amazon Kindle, the latter two of which are also available on the App Store. Wattpad appears to be the only truly free popular book app, but it differs from the other e-book apps in the Google Play Store: it is a platform where amateur authors submit their own books for users to read. Although the app hosts some professional works, they appear to be limited. While these apps have many downloads, they may not have many active users because of the paywalls inside the otherwise free apps.

In [56]:
# for app in android_final:
#     n_installs = app[5]
#     n_installs = n_installs.replace('+', '')
#     n_installs = n_installs.replace(',', '')
#     if (app[1] == 'BOOKS_AND_REFERENCE') and (float(n_installs) >= 10000) and (float(n_installs) <=1000000000):
#         print(app[0], ':' , n_installs)
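
The commented-out cell above appears to filter *books and reference* apps by a range of installs. A runnable variant might look like the sketch below, where `apps_in_range`, its parameters, and `sample_rows` are hypothetical and the rows are invented to mimic the assumed layout of `android_final`.

```python
def apps_in_range(rows, category, low, high,
                  name_idx=0, category_idx=1, installs_idx=5):
    """Return (name, installs) pairs for apps in a category with low <= installs <= high."""
    matches = []
    for row in rows:
        if row[category_idx] != category:
            continue
        n_installs = float(row[installs_idx].replace('+', '').replace(',', ''))
        if low <= n_installs <= high:
            matches.append((row[name_idx], n_installs))
    return matches


# Hypothetical rows (assumption, not taken from the dataset)
sample_rows = [
    ['Reader X', 'BOOKS_AND_REFERENCE', '', '', '', '1,000,000+'],
    ['Giant Books', 'BOOKS_AND_REFERENCE', '', '', '', '1,000,000,000+'],
    ['Tiny Dictionary', 'BOOKS_AND_REFERENCE', '', '', '', '500+'],
    ['Chat Y', 'COMMUNICATION', '', '', '', '5,000,000+'],
]

mid_range = apps_in_range(sample_rows, 'BOOKS_AND_REFERENCE', 10000, 100000000)
for name, n_installs in mid_range:
    print(name, ':', n_installs)
```

Choosing an upper bound below the giants' install counts keeps the listing focused on apps of moderate popularity, which may be a more realistic benchmark for a new app.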

## Conclusion

In this project, we analysed data about the App Store and the Google Play Store to help select an app profile that could be profitable in both stores. Our analysis found two profiles that could fit the goal of this project.

The first app profile is a classic literature collection. The app would consist of a library of classic literature from the likes of Jane Austen, Mark Twain, and Charles Dickens, to name a few. While similar apps such as Audible and Amazon Kindle appear on the App Store, they are not truly free, as they require users to subscribe to the service or pay for the books they want to read. The same is true of these apps in the Google Play Store, although it does not apply to every book-related app there. To compete with these apps, we suggest adding features such as bookmarks, tables of contents, dictionaries, a rating system, recommended books, and a 'read later' list. If the app receives a good user response, further content and features could be added, such as more books, a companion website for user-led discussions, and translations of books into other languages.

The second app profile is a collection of religious texts. Since most religious apps on the App Store and the Google Play Store focus on only one religion, a collection could draw users from across many religions. The collection could include texts from the most common religions, such as Christianity, Islam, Hinduism, Buddhism, and Judaism. The features would generally be the same as those of the first app. One feature unique to this app could be daily passages from the user's preferred religion, with accompanying texts, delivered as popup notifications on the user's phone. If the app receives a good user response, it could be developed further with translations into other languages.