# Most profitable Apple Store and Google Play Apps

This project sets out to analyze app data obtained from the App Store and Google Play. It is meant to give developers a better understanding of user demand and help them build more attractive apps.

As of 2018 there were approximately 2 million iOS apps available on the App Store and 2.1 million Android apps on Google Play. To analyze them we will use two relevant data sets:

1. A data set of approximately 10,000 Android apps from Google Play:

https://dq-content.s3.amazonaws.com/350/googleplaystore.csv

2. A data set of approximately 7,000 iOS apps from the App Store:

https://dq-content.s3.amazonaws.com/350/AppleStore.csv

To explore these data sets we will use the following **function**:



In [35]:
def explore_data(dataset, start, end, rows_and_columns=False):
    """Print rows dataset[start:end]; optionally report the data set's dimensions."""
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [36]:
from csv import reader

# 'with' closes the file as soon as its contents have been read into a list
with open(r'C:\Nauka\Python\Guided Project_ Profitable App Profiles for the App Store and Google Play Markets\googleplaystore.csv', encoding='utf8') as opened_file:
    google_data = list(reader(opened_file))
google_header = google_data[0]
google_no_header = google_data[1:]
    

In [37]:
from csv import reader

# 'with' closes the file as soon as its contents have been read into a list
with open(r'C:\Nauka\Python\Guided Project_ Profitable App Profiles for the App Store and Google Play Markets\AppleStore.csv', encoding='utf8') as opened_file:
    apple_data = list(reader(opened_file))
apple_header = apple_data[0]
apple_no_header = apple_data[1:]

## Data Sets

The Google Play data set contains **10841** records (header row excluded), split into **13** columns with the following headers:

In [38]:
explore_data(google_data,0, 1, rows_and_columns = True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


Number of rows: 10842
Number of columns: 13


If you're having trouble understanding what a particular column describes, see the original documentation for the data set:

https://www.kaggle.com/lava18/google-play-store-apps

The App Store data set contains **7197** records (header row excluded), split into **16** columns with the following headers:

In [39]:
explore_data(apple_data,0, 1, rows_and_columns = True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


Number of rows: 7198
Number of columns: 16


If you're having trouble understanding what a particular column describes, see the original documentation for the data set:

https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps


## Data Cleaning Process

#### 1. Removing entries with missing data

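According to this discussion of the Google Play data set (https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015), one of the entries is missing a value (its "Category"), which shifts the remaining columns one position to the left: the row below has only 12 fields instead of 13, and its "Rating" column holds the impossible value 19. The entry will be deleted.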

In [40]:
explore_data(google_no_header, 10472, 10473)

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']




In [41]:
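# Delete only while the row is still malformed, so re-running this cell is safe
if len(google_no_header[10472]) != len(google_header):
    del google_no_header[10472]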

In [42]:
explore_data(google_no_header, 10472, 10473)

['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']
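Deleting by a hard-coded index works only while that index still points at the faulty row. A more defensive sketch, assuming that every valid row has exactly as many fields as the header, first lists the rows with the wrong number of fields (the helper name `find_malformed_rows` and the miniature data set are hypothetical):

```python
def find_malformed_rows(rows, header):
    """Return (index, row) pairs whose field count differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]


# Hypothetical miniature data set: the second row is missing one field.
header = ['App', 'Category', 'Rating']
rows = [
    ['Sketch', 'ART_AND_DESIGN', '4.5'],
    ['Life Made WI-Fi Touchscreen Photo Frame', '1.9'],
    ['osmino Wi-Fi', 'TOOLS', '4.2'],
]
print(find_malformed_rows(rows, header))
```

Applied as `find_malformed_rows(google_no_header, google_header)` before the deletion, it should report the entry at index 10472; whether any other rows are malformed has not been checked here.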




#### 2. Removing duplicate entries

Several apps in the Google Play data set have multiple (duplicate) entries:

In [43]:
duplicate_apps = []
unique_apps = set()  # a set makes each membership test fast

for row in google_no_header:
    app_name = row[0]
    if app_name in unique_apps:
        duplicate_apps.append(app_name)  # every occurrence after the first
    else:
        unique_apps.add(app_name)

print("Number of duplicate entries:", len(duplicate_apps))
print('\n')
print("Examples of duplicate apps:", duplicate_apps[:10])

Number of duplicate entries: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']
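The loop above counts surplus entries (every occurrence of a name after the first) but does not say how often each app repeats. A compact alternative sketch uses `collections.Counter`; the helper name `duplicate_counts` and the sample rows are hypothetical:

```python
from collections import Counter


def duplicate_counts(rows, name_index=0):
    """Map each app name that appears more than once to its number of entries."""
    counts = Counter(row[name_index] for row in rows)
    return {name: n for name, n in counts.items() if n > 1}


# Hypothetical sample rows; only the app name (index 0) matters here.
sample = [['Box'], ['Slack'], ['Box'], ['Google Ads'], ['Box'], ['Slack']]
dupes = duplicate_counts(sample)
print(dupes)                               # {'Box': 3, 'Slack': 2}
print(sum(n - 1 for n in dupes.values()))  # 3 surplus rows to remove
```

Summing `n - 1` over the result counts exactly the occurrences after the first, so on `google_no_header` it equals the length of `duplicate_apps`.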


Below are a few examples of multiple entries for the same application:

In [44]:
for row in google_no_header:
    app_name = row[0]
    if app_name == "Facebook":
        print(google_header)
        print('\n')
        print(row)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Facebook', 'SOCIAL', '4.1', '78158306', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'August 3, 2018', 'Varies with device', 'Varies with device']
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Facebook', 'SOCIAL', '4.1', '78128208', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'August 3, 2018', 'Varies with device', 'Varies with device']


In [45]:
for row in google_no_header:
    app_name = row[0]
    if app_name == "Twitter":
        print(google_header)
        print('\n')
        print(row)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Twitter', 'NEWS_AND_MAGAZINES', '4.3', '11667403', 'Varies with device', '500,000,000+', 'Free', '0', 'Mature 17+', 'News & Magazines', 'August 6, 2018', 'Varies with device', 'Varies with device']
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Twitter', 'NEWS_AND_MAGAZINES', '4.3', '11667403', 'Varies with device', '500,000,000+', 'Free', '0', 'Mature 17+', 'News & Magazines', 'August 6, 2018', 'Varies with device', 'Varies with device']
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Twitter', 'NEWS_AND_MAGAZINES', '4.3', '11657972', 'Varies with device', '500,000,000+', 'Free', '0', 'Mature 17+', 'News & Magazines', 

In [46]:
for row in google_no_header:
    app_name = row[0]
    if app_name == "Instagram":
        print(google_header)
        print('\n')
        print(row)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['App', 'Cate

The duplicates will be removed based on the **number of reviews**. The rationale is that the entry with the most reviews is most likely the latest snapshot of the app, while entries with fewer reviews are probably out of date. Thus, for each app only the entry with the highest number of reviews will be kept.

To delete the surplus entries, we first build a **dictionary** with one key per app, whose value is the highest number of reviews found among that app's entries:

In [47]:
max_reviews = {}

for row in google_no_header:
    name = row[0]
    no_of_reviews = float(row[3])
    if name in max_reviews and max_reviews[name] < no_of_reviews:
        max_reviews[name] = no_of_reviews
    elif name not in max_reviews:
        max_reviews[name] = no_of_reviews
        
print(max_reviews['Instagram'])
print(len(max_reviews))
        

66577446.0
9659


By comparing the original data set to the dictionary max_reviews created above, a new cleaned up list of Google Play apps named **android_clean** will be created:

In [48]:
android_clean = []
already_added = []  # guards against ties: several entries sharing the max review count

for row in google_no_header:
    name = row[0]
    no_of_reviews = float(row[3])
    if no_of_reviews == max_reviews[name] and name not in already_added:
        android_clean.append(row)
        already_added.append(name)
        
print(len(android_clean))
print(len(already_added))

9659
9659
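The two-pass approach above (first `max_reviews`, then `android_clean`) can also be collapsed into a single pass that remembers the best row seen so far for each app. A sketch, with the hypothetical helper `dedupe_by_reviews` assuming the Google Play layout (name at index 0, reviews at index 3):

```python
def dedupe_by_reviews(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        reviews = float(row[reviews_index])
        # Strict '>' keeps the earliest row when review counts tie,
        # matching the tie-breaking of the two-pass version above.
        if name not in best or reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Hypothetical rows in the Google Play layout (name at 0, reviews at 3).
rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
]
clean = dedupe_by_reviews(rows)
print(len(clean))   # 2
print(clean[0][3])  # 66577446
```

One design difference: the result is ordered by each app's first appearance rather than by the position of the row that was kept, so the row order (though not the content) may differ from `android_clean`.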


#### 3. Removing entries for apps not in English

App names containing characters that are not commonly used in English will be removed from the data set. To determine whether a given app name contains such a character, we iterate over the name and check whether any character has a code point above 127, i.e. falls outside the ASCII range:

In [49]:
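def non_English(string):
    # Despite its name, this returns True for English names: it returns False
    # as soon as any character falls outside the ASCII range (code point > 127)
    for character in string:
        if ord(character) > 127:
            return False
    return True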
        
    
print(non_English('Instachat 😜'))

False


The above function, however, also rejects English app names that contain special symbols outside the ASCII range, such as "™" or emojis. To work around this, we decided to remove only records whose app name contains more than 3 non-ASCII characters. While not ideal, this should keep most English apps while still filtering out most non-English ones:

In [50]:
def non_English(string):
    """Return True if the string contains at most 3 non-ASCII characters."""
    non_ascii_count = 0
    for character in string:
        if ord(character) > 127:
            non_ascii_count += 1
        if non_ascii_count > 3:
            return False
    return True

print(non_English('爱奇艺艺'))

False
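For reference, both checks can be written more compactly. A sketch, assuming Python 3.7+ for `str.isascii()`: the strict test is equivalent to the first version of `non_English`, and the relaxed one counts non-ASCII characters with a generator expression, matching the second version (the helper names are hypothetical):

```python
def is_strict_english(name):
    """True when every character is ASCII (code point <= 127)."""
    return name.isascii()


def is_mostly_english(name, max_non_ascii=3):
    """True when the name contains at most `max_non_ascii` non-ASCII characters."""
    non_ascii = sum(1 for ch in name if ord(ch) > 127)
    return non_ascii <= max_non_ascii


print(is_strict_english('Instachat 😜'))                    # False
print(is_mostly_english('Instachat 😜'))                    # True
print(is_mostly_english('Docs To Go™ Free Office Suite'))  # True
print(is_mostly_english('爱奇艺艺'))                         # False
```

Making the threshold a parameter keeps the arbitrary "3" visible and easy to tune.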


Using the second version of the function, we created new data sets containing only English apps:

In [51]:
android_english = []

for row in android_clean:
    name = row[0]
    if non_English(name):
        android_english.append(row)
        
print(len(android_english))

9614


In [52]:
apple_english = []

for row in apple_no_header:
    name = row[1]
    if non_English(name):
        apple_english.append(row)

print(google_header)
explore_data(android_english, 0, 3, True)
print('\n')
explore_data(apple_english, 0, 3, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+

#### 4. Isolating free apps

Because the client is mainly concerned with developing free apps and generating profit from in-app ads, the last part of the cleaning process is isolating the free apps in each data set:

In [53]:
android_final = []

for row in android_english:
    price = row[7]
    if price == '0':
        android_final.append(row)
        
print(len(android_final))

8864


In [54]:
ios_final = []

for row in apple_english:
    price = float(row[4])
    if price == 0.0:
        ios_final.append(row)
        
print(len(ios_final))

3222
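The two stores encode prices differently: the App Store `price` column holds plain numbers such as `'0.0'`, while Google Play's `Price` column holds `'0'` for free apps and, we assume from the raw CSV, dollar-prefixed strings such as `'$4.99'` for paid ones. A sketch with hypothetical helpers that normalises both formats to floats so the same test works on either data set:

```python
def parse_price(raw):
    """Convert a price string such as '0', '0.0' or '$4.99' to a float."""
    return float(raw.strip().lstrip('$'))


def free_apps(rows, price_index):
    """Keep rows whose parsed price is exactly zero."""
    return [row for row in rows if parse_price(row[price_index]) == 0.0]


print(parse_price('$4.99'))  # 4.99
print(free_apps([['A', '0'], ['B', '$1.99'], ['C', '0.0']], 1))
```

With these helpers, `free_apps(android_english, 7)` and `free_apps(apple_english, 4)` should select the same rows as the two loops above, provided every price follows one of these formats; an unexpected format raises `ValueError` instead of being silently skipped.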


## Data Analysis

### Introduction

With our data sets prepared, we will now move on to the analysis. The goal of this project is to show developers which free apps generate the most traffic and therefore offer the most opportunities for monetization, so we will conduct a two-fold analysis: the first part looks at how many apps of each genre exist on both platforms, and the second estimates the number of users for each genre. Let's begin by identifying the most common genres among iOS and Android apps by building a frequency table for each platform.

### Apple Store frequency data

In [55]:
def freq_table(dataset, index):
    """Count how many rows share each value in the given column."""
    table = {}
    for row in dataset:
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    return table
            
print(freq_table(ios_final, 11))

{'Social Networking': 106, 'Photo & Video': 160, 'Games': 1874, 'Music': 66, 'Reference': 18, 'Health & Fitness': 65, 'Weather': 28, 'Utilities': 81, 'Travel': 40, 'Shopping': 84, 'News': 43, 'Navigation': 6, 'Lifestyle': 51, 'Entertainment': 254, 'Food & Drink': 26, 'Sports': 69, 'Book': 14, 'Finance': 36, 'Education': 118, 'Productivity': 56, 'Business': 17, 'Catalogs': 4, 'Medical': 6}


Unfortunately, the resulting table is **not sorted**. For better readability we redefine `freq_table` so that it returns each value's percentage of the total, and add a `display_table` function that prints the **prime_genre** values of the free English iOS apps sorted from the largest share to the smallest:

In [56]:


def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage 
    
    return table_percentages


def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])


        
display_table(ios_final, 11)


Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665
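The same sorted percentage table can be produced with `collections.Counter.most_common`, which does the counting and the descending sort in one step. A sketch with a hypothetical helper name that returns the pairs instead of only printing them:

```python
from collections import Counter


def percentage_table(dataset, index):
    """Return (value, percentage) pairs sorted from most to least common."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return [(value, count / total * 100) for value, count in counts.most_common()]


# Hypothetical rows with the genre in column 0.
rows = [['Games'], ['News'], ['Games'], ['Games']]
for genre, pct in percentage_table(rows, 0):
    print(genre, ':', pct)
# Games : 75.0
# News : 25.0
```

One caveat: `most_common` orders equal counts by first appearance, whereas sorting `(percentage, key)` tuples in reverse breaks ties by reverse alphabetical order of the key, so tied genres (such as Navigation and Medical) may be listed in a different order.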


As seen above, **games** account for the largest share of free English apps on the App Store, at roughly 58%. This does not, however, mean that games are the most popular genre, as app counts tell us nothing about the number of users or installs of those games. We will analyze user data later on.

### Google Play frequency data

In the Google Play data set there are two columns that could be of interest when analyzing the number of apps in a particular genre: **Category** and **Genres**. Here are the results for both of them:

**Category** frequency table:

In [57]:


display_table(android_final, 1)

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

**Genres** frequency table:

In [58]:


display_table(android_final, 9)

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

On Google Play, apps are spread across genres more evenly than on the App Store. Again, however, a genre's share of apps does not tell us how many users that genre actually has.

### Apple Store average user count per genre

To estimate the average user count per genre we will analyze the **"rating_count_tot"** column (the total number of user ratings of an app), assuming that the number of ratings is roughly indicative of how many users have installed a given app. We will then try to point out potentially interesting genres that we could enter with our own app.

In [59]:
prime_genre_ios = freq_table(ios_final, 11)

for genre in prime_genre_ios:
    total = 0
    len_genre = 0
    for row in ios_final:
        genre_app = row[-5]
        if genre_app == genre:
            ratings = float(row[5])
            total += ratings
            len_genre +=1
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)  

Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0
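The loop above rescans `ios_final` once per genre and prints the averages unsorted. A sketch that accumulates sums and counts in a single pass and returns the genres ranked by their average (the helper name `average_by_group` and the sample rows are hypothetical):

```python
from collections import defaultdict


def average_by_group(dataset, group_index, value_index):
    """Average a numeric column per group in one pass, sorted in descending order."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        key = row[group_index]
        totals[key] += float(row[value_index])
        counts[key] += 1
    averages = {key: totals[key] / counts[key] for key in totals}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rows: (genre, rating count).
rows = [['Navigation', '300'], ['Games', '10'], ['Navigation', '100'], ['Games', '30']]
print(average_by_group(rows, 0, 1))  # [('Navigation', 200.0), ('Games', 20.0)]
```

Called as `average_by_group(ios_final, 11, 5)`, it should return the same averages as the loop above, already ranked from highest to lowest.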


As we can see, **Games**, even though they represent more than 50% of the free English iOS apps, do not in fact have the highest average number of ratings. Because **Games** seem to be a rather saturated market, we will not consider them further. Looking at the data, the highest average user count actually belongs to the **Navigation** genre, followed by the **Reference**, **Social Networking** and **Music** genres. Other potentially interesting genres are **Book**, **Food & Drink**, **Weather** and **Finance**. All of these genres look potentially interesting to us; it is necessary, however, to take a closer look at each of them to better grasp their potential, or lack thereof:

#### Navigation genre

The category with the highest average, **Navigation**, is almost entirely occupied by two industry giants, Waze and Google Maps, which together account for almost 97% of all ratings in this category (499,957 of 516,542). This means that entering this category with a new app will most probably prove extremely challenging, especially considering that both apps are developed by Google and thus have a high degree of integration with Android software and phones. I would advise against entering this market.

In [60]:
for app in ios_final:
    if app[-5] == 'Navigation':
        print(app[1], ':', app[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5
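Averages like the ones above can be driven by a handful of very large apps, as this listing shows. A hypothetical helper that measures what share of a genre's ratings its top *n* apps capture makes such concentration explicit; the example reuses the six Navigation rating counts printed above:

```python
def top_n_share(rating_counts, n):
    """Fraction of all ratings in a genre held by its n most-rated apps."""
    ordered = sorted(rating_counts, reverse=True)
    total = sum(ordered)
    return sum(ordered[:n]) / total if total else 0.0


# Rating counts of the six free English Navigation apps listed above.
navigation = [345046, 154911, 12811, 3582, 187, 5]
print(round(top_n_share(navigation, 2), 3))  # 0.968
```

On the real data the counts could be gathered with `[float(app[5]) for app in ios_final if app[-5] == 'Navigation']`, and the same check repeated for any other genre discussed below.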


#### Social Networking genre

A similar conclusion can be drawn for the **Social Networking** category, where industry giants like Facebook, Pinterest and Tumblr, along with messaging apps like Skype, Messenger and WhatsApp, heavily influence the number of users. Again, entering such a saturated market would be extremely difficult, so we advise against it.

In [67]:
for app in ios_final:
    if app[-5] == 'Social Networking':
        print(app[1], ':', app[5])

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2

#### Music genre

As with **Navigation** and **Social Networking**, the **Music** genre appears to be dominated by firmly established brands, and because of that it should not be of interest to us. Additional problems, such as the cost of licensing and of storing or streaming content, also argue against it.

In [69]:
for app in ios_final:
    if app[-5] == 'Music':
        print(app[1], ':', app[5])

Pandora - Music & Radio : 1126879
Spotify Music : 878563
Shazam - Discover music, artists, videos & lyrics : 402925
iHeartRadio – Free Music & Radio Stations : 293228
SoundCloud - Music & Audio : 135744
Magic Piano by Smule : 131695
Smule Sing! : 119316
TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110420
Amazon Music : 106235
SoundHound Song Search & Music Player : 82602
Sonos Controller : 48905
Bandsintown Concerts : 30845
Karaoke - Sing Karaoke, Unlimited Songs! : 28606
My Mixtapez Music : 26286
Sing Karaoke Songs Unlimited with StarMaker : 26227
Ringtones for iPhone & Ringtone Maker : 25403
Musi - Unlimited Music For YouTube : 25193
AutoRap by Smule : 18202
Spinrilla - Mixtapes For Free : 15053
Napster - Top Music & Radio : 14268
edjing Mix:DJ turntable to remix and scratch music : 13580
Free Music - MP3 Streamer & Playlist Manager Pro : 13443
Free Piano app by Yokee : 13016
Google Play Music : 10118
Certified Mixtapes - Hip Hop Albums & Mixtapes : 9975
TIDAL : 7398
YouTube Mu

#### Reference genre

As we can see, the **Reference** genre is dominated by the Bible app and several dictionaries. Even though a single app very visibly leads this category, it still looks promising: a well-designed app that builds upon an existing book by adding an audio version, a built-in dictionary or daily quotes could potentially garner attention and users. Moreover, since the App Store seems to be dominated by games and other entertainment apps, a practical app might help us get around the market saturation. This seems like a potential niche we could occupy.

In [62]:
for app in ios_final:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


#### Other notable genres

Other less popular but still fairly attractive categories, such as **Book**, **Food & Drink**, **Weather** and **Finance**, each present their own set of problems, but some of them might still be worth considering.

The **Book** category, once again, seems to be dominated by industry giants, so entering this market might prove too difficult. As with the **Music** category, the additional costs of data storage and copyright licensing must also be taken into account.

In [29]:
for app in ios_final:
    if app[-5] == 'Book':
        print(app[1], ':', app[5])

Kindle – Read eBooks, Magazines & Textbooks : 252076
Audible – audio books, original series & podcasts : 105274
Color Therapy Adult Coloring Book for Adults : 84062
OverDrive – Library eBooks and Audiobooks : 65450
HOOKED - Chat Stories : 47829
BookShout: Read eBooks & Track Your Reading Goals : 879
Dr. Seuss Treasury — 50 best kids books : 451
Green Riding Hood : 392
Weirdwood Manor : 197
MangaZERO - comic reader : 9
ikouhoushi : 0
MangaTiara - love comic reader : 0
謎解き : 0
謎解き2016 : 0


The **Food & Drink** category, however, looks like something we should consider. It is dominated, once again, by large restaurant chains and delivery services, but we should not aim to compete with them. Instead, we should look into recipe and cooking apps, as this sub-genre seems to be rather empty apart from a single, albeit quite popular, app: Allrecipes Dinner Spinner. Moreover, users are likely to keep using such apps regularly over a long period of time, which would maximize in-app advertising profits:

In [34]:
for app in ios_final:
    if app[-5] == 'Food & Drink':
        print(app[1], ':', app[5])

Starbucks : 303856
Domino's Pizza USA : 258624
OpenTable - Restaurant Reservations : 113936
Allrecipes Dinner Spinner : 109349
DoorDash - Food Delivery : 25947
UberEATS: Uber for Food Delivery : 17865
Postmates - Food Delivery, Faster : 9519
Dunkin' Donuts - Get Offers, Coupons & Rewards : 9068
Chick-fil-A : 5665
McDonald's : 4050
Deliveroo: Restaurant Delivery - Order Food Nearby : 1702
SONIC Drive-In : 1645
Nowait Guest : 1625
7-Eleven, Inc. : 1356
Outback : 805
Bon Appetit : 750
Starbucks Keyboard : 457
Whataburger : 197
Delish Eatmoji Keyboard : 154
Lieferheld - Delicious food delivery service : 29
Lieferando.de : 29
McDo France : 22
Chefkoch - Rezepte, Kochen, Backen & Kochbuch : 20
Youmiam : 9
Marmiton Twist : 2
Open Food Facts : 1
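To gauge the recipe sub-genre more systematically than by eye, app names could be filtered by keywords. This is only a rough sketch: the helper `match_keywords` and its keyword list are hypothetical choices that will both miss recipe apps with uninformative names and catch unrelated ones:

```python
def match_keywords(apps, keywords):
    """Return (name, rating_count) pairs whose lowercased name contains any keyword."""
    lowered = [k.lower() for k in keywords]
    return [
        (name, count)
        for name, count in apps
        if any(k in name.lower() for k in lowered)
    ]


# A few (name, rating count) pairs taken from the Food & Drink listing above.
food_and_drink = [
    ("Starbucks", 303856),
    ("Allrecipes Dinner Spinner", 109349),
    ("Chefkoch - Rezepte, Kochen, Backen & Kochbuch", 20),
    ("McDonald's", 4050),
]
print(match_keywords(food_and_drink, ["recipe", "rezept", "koch"]))
```

On the real data, the input pairs could be built with `[(app[1], int(app[5])) for app in ios_final if app[-5] == 'Food & Drink']`.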


The **Finance** category includes banking, personal finance and money transfer apps. While creating a personal finance app would be difficult because of the extra expertise we would need from financial specialists, there is potential for simpler apps, such as an economic news aggregator. A quick look at the **News** category suggests that there may indeed be a small niche for such a focused app.

In [63]:
for app in ios_final:
    if app[-5] == 'Finance':
        print(app[1], ':', app[5])

Chase Mobile℠ : 233270
Mint: Personal Finance, Budget, Bills & Money : 232940
Bank of America - Mobile Banking : 119773
PayPal - Send and request money safely : 119487
Credit Karma: Free Credit Scores, Reports & Alerts : 101679
Capital One Mobile : 56110
Citi Mobile® : 48822
Wells Fargo Mobile : 43064
Chase Mobile : 34322
Square Cash - Send Money for Free : 23775
Capital One for iPad : 21858
Venmo : 21090
USAA Mobile : 19946
TaxCaster – Free tax refund calculator : 17516
Amex Mobile : 11421
TurboTax Tax Return App - File 2016 income taxes : 9635
Bank of America - Mobile Banking for iPad : 7569
Wells Fargo for iPad : 2207
Stash Invest: Investing & Financial Education : 1655
Digit: Save Money Without Thinking About It : 1506
IRS2Go : 1329
Capital One CreditWise - Credit score and report : 1019
U by BB&T : 790
Paribus - Rebates When Prices Drop : 768
KeyBank Mobile : 623
VyStar Mobile Banking for iPhone : 434
Sparkasse - Your mobile branch : 77
VyStar Mobile Banking for iPad : 57
Zaim : 4

In [70]:
for app in ios_final:
    if app[-5] == 'News':
        print(app[1], ':', app[5])

Twitter : 354058
Fox News : 132703
CNN: Breaking US & World News, Live Video : 112886
Reddit Official App: All That's Trending and Viral : 67560
USA TODAY : 61724
ABC News - US & World News + Live Video : 48407
NBC News : 32881
HuffPost - News, Politics & Entertainment : 29107
The Washington Post Classic : 18572
WIRED Magazine : 12074
CBS News - Watch Free Live Breaking News : 11691
The Guardian : 8176
AOL: News, Email, Weather & Video : 5233
SmartNews - Trending News & Stories : 4645
MSNBC : 3692
LotteryHUB : 2417
theSkimm : 1765
Quartz • News in a whole new way : 1267
Lotto Results - Mega Millions Powerball Lottery : 794
TopBuzz: Best Viral Videos, GIFs, TV & News : 692
Ticket Scanner for Powerball & MegaMillions Pool : 581
FOCUS Online - Aktuelle Nachrichten : 373
SPIEGEL ONLINE - Nachrichten : 299
n-tv Nachrichten : 273
CNN Politics : 254
Tagesschau : 233
Fresco — Be a part of the news : 219
News Break - Local & World Breaking News & Radio : 173
OPM Alert : 172
franceinfo - l'actua

In [71]:
for app in ios_final:
    if app[-5] == 'Business':
        print(app[1], ':', app[5])

Indeed Job Search : 38681
Flashlight ◎ : 24744
Adobe Acrobat Reader: View, Create, & Convert PDFs : 20069
Scanner App - PDF Document Scan : 11696
SayHi Translate : 8623
ADP Mobile Solutions : 8324
Sideline - 2nd Phone Number : 7907
Uber Driver : 3289
AirWatch Agent : 1150
VPN Go - Safe Fast & Stable VPN Proxy : 881
Cisco AnyConnect : 825
GreenVPN - Free & fast VPN with unlimited traffic : 464
iPlum Business Phone Number for Calling & Texting : 392
OPEN Forum : 200
Pulse Secure : 53
DingTalk : 40
Mon Espace - Pôle emploi : 11


The **Weather** genre presents two substantial problems that probably disqualify it from our consideration: a weather app would most likely require paid API access, and users usually don't spend enough time in the app to generate substantial advertising revenue.

In [64]:
for app in ios_final:
    if app[-5] == 'Weather':
        print(app[1], ':', app[5])

The Weather Channel: Forecast, Radar & Alerts : 495626
The Weather Channel App for iPad – best local forecast, radar map, and storm tracking : 208648
WeatherBug - Local Weather, Radar, Maps, Alerts : 188583
MyRadar NOAA Weather Radar Forecast : 150158
AccuWeather - Weather for Life : 144214
Yahoo Weather : 112603
Weather Underground: Custom Forecast & Local Radar : 49192
NOAA Weather Radar - Weather Forecast & HD Radar : 45696
Weather Live Free - Weather Forecast & Alerts : 35702
Storm Radar : 22792
QuakeFeed Earthquake Map, Alerts, and News : 6081
Moji Weather - Free Weather Forecast : 2333
Hurricane by American Red Cross : 1158
Forecast Bar : 375
Hurricane Tracker WESH 2 Orlando, Central Florida : 203
FEMA : 128
iWeather - World weather forecast : 80
Weather - Radar - Storm with Morecast App : 78
Yurekuru Call : 53
Weather & Radar : 37
WRAL Weather Alert : 25
Météo-France : 24
JaxReady : 22
Freddy the Frogcaster's Weather Station : 14
Almanac Long-Range Weather Forecast : 12
TodayAir

#### Conclusion

To conclude, I suggest taking a closer look at three genres: **Reference**, possibly with an app version of a well-established book enhanced with interactive features; **Food and Drink**, with a recipe app; and **Finance**, with an app aggregating business and financial news.

### Google Play average user count per genre

For the Google Play data we fortunately have the number of installs for every app, so app popularity is easier to gauge. Unfortunately, the numbers are imprecise: they are given as open-ended intervals such as `100,000+`, as the frequency table of the Installs column shows:

In [78]:
display_table(android_final, 5)

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343
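Because each label is an open-ended interval, the simplest usable number is the interval's lower bound. A minimal sketch of that conversion, using the same `replace` calls as the averaging cell of this notebook; `installs_lower_bound` is a hypothetical helper name, and it returns an `int` where the notebook uses `float`:

```python
def installs_lower_bound(installs):
    """Convert a Google Play 'Installs' label such as '1,000,000+' to a number.

    The labels are open-ended intervals, so the value returned is only the
    lower bound of the interval: an app listed as '1,000,000+' may have
    far more installs than that.
    """
    cleaned = installs.replace('+', '').replace(',', '')
    return int(cleaned)


for label in ['0', '0+', '5+', '1,000,000+', '1,000,000,000+']:
    print(label, '->', installs_lower_bound(label))
```

Treating every interval as its lower bound systematically underestimates popular apps, which is acceptable here because we only compare categories against each other.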


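Even though the data is imprecise, it is good enough for our purpose: determining which apps and which categories attract the most users. Below, each interval is treated as its lower bound (e.g. `1,000,000+` becomes 1,000,000) and the installs are averaged per category: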

In [31]:
prime_categories_android = freq_table(android_final, 1)

for category in prime_categories_android:
    total = 0          # sum of installs in this category
    len_category = 0   # number of apps in this category
    for app in android_final:
        app_category = app[1]
        if app_category == category:
            # '1,000,000+' -> 1000000.0 (lower bound of the interval)
            installs = float(app[5].replace('+', '').replace(',', ''))
            total += installs
            len_category += 1
    avg_installs = total / len_category
    print(category, ':', avg_installs)
        

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_
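The averaging cell (In [31]) scans the whole data set once per category. An equivalent single-pass version accumulates totals and counts in dictionaries instead. This is a sketch of an alternative, not the notebook's own code; it assumes the Google Play column layout used here (Category at index 1, Installs at index 5), and `average_installs_by_category` plus the sample rows are hypothetical:

```python
from collections import defaultdict


def average_installs_by_category(dataset, category_index=1, installs_index=5):
    """Average install lower bound per category, computed in one pass."""
    totals = defaultdict(float)  # category -> sum of installs
    counts = defaultdict(int)    # category -> number of apps
    for app in dataset:
        category = app[category_index]
        installs = float(app[installs_index].replace('+', '').replace(',', ''))
        totals[category] += installs
        counts[category] += 1
    return {category: totals[category] / counts[category] for category in totals}


# Hypothetical rows containing only the first six Google Play columns.
sample = [
    ['App A', 'GAME', '4.5', '100', '10M', '1,000,000+'],
    ['App B', 'GAME', '4.1', '50', '5M', '100,000+'],
    ['App C', 'WEATHER', '3.9', '10', '2M', '10,000+'],
]

for category, avg in average_installs_by_category(sample).items():
    print(category, ':', avg)
```

Returning a dictionary rather than printing also makes the averages easy to sort or reuse later.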

At first glance, Google Play appears far less skewed towards apps designed for fun and more balanced across categories. Interestingly, it has a **Personalization** category, which is absent from the App Store data; this reflects the more open nature of the Android operating system and the customization it allows. The top 10 categories by average number of installs are:

| Category | Average installs |
|---|---|
| COMMUNICATION | 38456119.17 |
| VIDEO_PLAYERS | 24727872.45 |
| SOCIAL | 23253652.13 |
| PHOTOGRAPHY | 17840110.40 |
| PRODUCTIVITY | 16787331.34 |
| GAME | 15588015.60 |
| TRAVEL_AND_LOCAL | 13984077.71 |
| ENTERTAINMENT | 11640705.88 |
| TOOLS | 10801391.30 |
| NEWS_AND_MAGAZINES | 9549178.47 |
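Rather than reading the top 10 off the printed output by hand, the averages can be stored in a dictionary and sorted. A minimal sketch, assuming the averages are kept in a dict mapping category to average installs (the averaging cell only prints them); `top_n` is a hypothetical helper, and the sample values are a few of the averages printed above, rounded to two decimals:

```python
def top_n(averages, n=10):
    """Return the n (category, average) pairs with the highest averages."""
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:n]


averages = {
    'COMMUNICATION': 38456119.17,
    'VIDEO_PLAYERS': 24727872.45,
    'SOCIAL': 23253652.13,
    'MEDICAL': 120550.62,
    'EVENTS': 253542.22,
}

for category, avg in top_n(averages, n=3):
    print(category, ':', avg)
```

If fewer than `n` categories are present, the slice simply returns all of them.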



In [77]:
for app in android_final:
    if app[1] == 'COMMUNICATION' and app[5] in ('1,000,000,000+',
                                                '500,000,000+',
                                                '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess