# Analysis of market opportunities for iOS and Android apps
### (A DataQuest guided project)

The objective of this project is to discover which mobile apps are the most profitable for our company to build.

For this project, I am a data analyst at a company that builds free apps monetized through in-app advertising. My analysis helps the business and our developers understand what kinds of apps are likely to attract many users.

Because there are millions of apps available, collecting data on all of them would take a lot of time. Therefore, I use two sample datasets, which can be found here: [Android](https://www.kaggle.com/lava18/google-play-store-apps/home) and [iOS](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home).

In [1]:
from csv import reader

def open_read_list(dataset):
    # Read a CSV file into a list of rows; 'with' closes the file afterwards
    with open(dataset, encoding='utf8') as opened_file:
        return list(reader(opened_file))
    
ios = open_read_list('AppleStore.csv')
ios_header = ios[0]
ios = ios[1:]

android = open_read_list('googleplaystore.csv')
android_header = android[0]
android = android[1:]

def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # separates consecutive rows with blank lines

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

print(ios_header,'\n\n')       
explore_data(ios,0,5,True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'] 


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 7197
Number of columns: 16


As we can see, the iOS dataset contains 7197 rows and 16 columns. At first glance, the following variables (columns) could be useful for an analysis of profitable apps:
1. 'track_name' and 'prime_genre' for an indication of what the application is about.
2. 'user_rating' (all versions) and 'user_rating_ver' (current version), both graded 1-5 with 5 being best, together with 'rating_count_tot' (number of ratings, all versions) and 'rating_count_ver' (current version), to measure popularity.
3. 'price' to differentiate free apps from paid apps and to analyse customers' willingness to pay.

Documentation on the variables in this dataset can be found [here](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home)
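Several cells below select columns by position (for example `-5` for 'prime_genre'). A minimal sketch of a name-to-position lookup, assuming the iOS header printed above, makes those positions explicit. The helper `column_index` is hypothetical and not part of the original project.

```python
def column_index(header):
    """Map each column name to its position in a header row."""
    return {name: position for position, name in enumerate(header)}

# Header of AppleStore.csv, copied from the output above
ios_header = ['id', 'track_name', 'size_bytes', 'currency', 'price',
              'rating_count_tot', 'rating_count_ver', 'user_rating',
              'user_rating_ver', 'ver', 'cont_rating', 'prime_genre',
              'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

ios_columns = column_index(ios_header)
print(ios_columns['prime_genre'])                     # 11
print(ios_columns['prime_genre'] - len(ios_header))   # -5, the negative index used later
```

With such a lookup, `app[ios_columns['price']]` reads more clearly than `app[4]`.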

In [2]:
print(android_header,'\n\n')       
explore_data(android,0,5,True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] 


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Ev

A first exploration of the Google Play Store dataset shows that it has 10841 rows and 13 columns. At first glance, the following variables (columns) could be useful for an analysis of profitable apps:
1. 'App', 'Category' and 'Genres' for an indication of what the application is about.
2. 'Rating' (graded 1-5, with 5 being best), the number of 'Reviews' and the number of 'Installs' to measure popularity.
3. 'Price' and 'Type' (Free or Paid) to differentiate free apps from paid apps and to analyse customers' willingness to pay.

Documentation on the variables in this dataset can be found on its [Kaggle page](https://www.kaggle.com/lava18/google-play-store-apps/home).

In the [discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015) of this dataset, an error is reported in row 10472.

In [3]:
print(android[10472])

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


Indeed, the row is missing its 'Category' value, so every following value shifts one column to the left. As a result, the 'Rating' column contains 19, which is impossible on a 5-point scale. I will remove this row.

In [4]:
del android[10472]
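Deleting by position is fragile: re-running `del android[10472]` removes a different, valid row each time. A sketch of an alternative that keeps only rows with one value per header column, so it is safe to run repeatedly; the faulty row printed above has 12 values against 13 header columns. The helper `drop_malformed` is hypothetical, and the shortened sample rows are illustrative only.

```python
def drop_malformed(rows, header):
    """Return only the rows that have exactly one value per header column."""
    return [row for row in rows if len(row) == len(header)]

header = ['App', 'Category', 'Rating']
rows = [
    ['Instagram', 'SOCIAL', '4.5'],
    ['Life Made WI-Fi Touchscreen Photo Frame', '1.9'],  # one value missing
]

clean = drop_malformed(rows, header)
print(len(clean))  # 1
```

Applied to the full data, this would be `android = drop_malformed(android, android_header)`; note that it drops every row of the wrong length, not only row 10472.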

The discussion section of the App Store dataset reports no comparable errors.

The next step is to check whether the dataset contains duplicates. Let's take a popular app and see how many times it appears.

In [5]:
for app in android:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


Instagram appears 4 times, which indicates that we should check both datasets for duplicates and remove them. Ideally, we would keep the most recent entry of every app, but as shown above, all four rows have the same 'Last Updated' date. Therefore, I keep the row with the highest number of reviews: more reviews means that variables like the rating are based on more data.
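Since duplicates should be checked in both datasets, a reusable counter is convenient. A minimal sketch using `collections.Counter`; `count_duplicates` is a hypothetical helper, and the name column is assumed to be 0 for Google Play and 1 for the App Store.

```python
from collections import Counter

def count_duplicates(dataset, name_column):
    """Return {name: occurrences} for every name that appears more than once."""
    counts = Counter(row[name_column] for row in dataset)
    return {name: n for name, n in counts.items() if n > 1}

sample = [['Instagram'], ['Box'], ['Instagram'], ['Slack'], ['Instagram']]
duplicates = count_duplicates(sample, 0)
print(duplicates)                              # {'Instagram': 3}
print(sum(n - 1 for n in duplicates.values()))  # 2 extra rows
```

The second number, the count of extra rows, is the same quantity that cell [6] reports for the Google Play data.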

In [6]:
duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of duplicate apps:', len(duplicate_apps), '\n')
print('Examples of duplicate apps:', duplicate_apps[:10])

Number of duplicate apps: 1181 

Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']


In [7]:
reviews_max = {}
for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

len(reviews_max)

9659

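The dictionary `reviews_max` maps each app name to the highest number of reviews found for that name: a name is added the first time it is encountered, and its value is updated whenever a row with the same name and a higher review count appears. Its length, 9659, is what we expect after deduplication: 10,840 rows (after deleting the faulty row) minus 1,181 duplicates.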
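The same dictionary can be built more compactly with `dict.get`, which returns a default when a key is missing. A sketch, assuming rows shaped like the Google Play data (name at index 0, review count at index 3); `max_reviews` is a hypothetical name, and the shortened sample rows reuse the Instagram review counts shown above.

```python
def max_reviews(dataset, name_column=0, reviews_column=3):
    """Map each app name to the highest review count seen for that name."""
    result = {}
    for row in dataset:
        name = row[name_column]
        n_reviews = float(row[reviews_column])
        # On first sight the default is the current count itself
        result[name] = max(result.get(name, n_reviews), n_reviews)
    return result

rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
]
print(max_reviews(rows))  # {'Instagram': 66577446.0}
```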

In [8]:
android_clean = []
already_added = []

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(app)
        already_added.append(name)
        
print(android_clean[:5])
len(android_clean)

[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'], ['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up'], ['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']]


9659

Next, I loop through the Google Play dataset again and keep each row whose review count equals the maximum stored in `reviews_max` for that app name. Each kept name is also added to the `already_added` list: some duplicates can share the same maximal number of reviews, and without this check such an app would be added more than once. The result, `android_clean`, has 9659 rows, one per unique app.
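The two passes can also be combined into one by storing the whole row with the highest review count. A sketch under the same column assumptions; `deduplicate` is a hypothetical helper and the second sample row is made up. Because only a strictly higher count replaces the stored row, each name keeps the first row with its maximal review count, the same row the `already_added` check selects. Python dictionaries preserve insertion order (3.7+), so the output follows each name's first appearance, which can differ from the order of the two-pass version.

```python
def deduplicate(dataset, name_column=0, reviews_column=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}  # name -> row with the most reviews seen so far
    for row in dataset:
        name = row[name_column]
        n_reviews = float(row[reviews_column])
        if name not in best or n_reviews > float(best[name][reviews_column]):
            best[name] = row
    return list(best.values())

rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Example App', 'TOOLS', '4.0', '100'],      # made-up row for illustration
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
]
clean = deduplicate(rows)
print([(row[0], row[3]) for row in clean])
# [('Instagram', '66577446'), ('Example App', '100')]
```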

Our company is interested in creating English apps, so we should filter out apps whose names contain characters not used in English. In ASCII, the characters commonly used in English text have code points (the values returned by `ord()`) from 0 to 127, so we filter out apps with names containing characters outside that range.
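Python 3.7 and later also provide `str.isascii()`, which performs this strict 0-127 check in a single call. The sketch below shows it on names from this section, together with the code points involved: the `™` sign and the emoji lie far outside the ASCII range.

```python
names = ['Instagram', 'Docs To Go™ Free Office Suite', 'Instachat 😜']

for name in names:
    # isascii() is True only if every character has a code point below 128
    print(name, name.isascii())

print(ord('A'), ord('™'), ord('😜'))  # 65 8482 128540
```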

In [9]:
def english(string):
    for character in string:
        if ord(character) > 127:
            return False
    return True

print(english('Instagram'))
print(english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(english('Docs To Go™ Free Office Suite'))
print(english('Instachat 😜'))
        

True
False
False
False


As we can see above, the function detects apps with non-English names, but it also rejects names containing emoji or symbols such as '™', because these fall outside the ASCII range too. We don't want to delete these apps, so to limit the data loss I only remove apps whose names contain more than 3 non-ASCII characters.

In [10]:
def english(string):
    non_ascii = 0
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
        if non_ascii > 3:
            return False
    return True

print(english('Instagram'))
print(english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(english('Docs To Go™ Free Office Suite'))
print(english('Instachat 😜'))

def filter_english(dataset, column):
    # Keep only the rows whose name (at position `column`) passes english()
    return [app for app in dataset if english(app[column])]

android_english = filter_english(android_clean, 0)
ios_english = filter_english(ios, 1)

print(len(android_english))
print(len(ios_english))


True
False
True
True
9614
6183


Next, we want to isolate the free apps in the dataset for our analysis.
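The two stores encode prices differently: free apps are stored as '0' on Google Play and as '0.0' on the App Store. A sketch of a single check that works for both by converting the price to a number; `is_free` is a hypothetical helper, and it assumes that paid Google Play prices carry a leading dollar sign (such as '$4.99'), which is worth verifying against the full data.

```python
def is_free(price):
    """Return True if a price string represents zero, e.g. '0', '0.0' or '$0'."""
    return float(price.replace('$', '')) == 0

print(is_free('0'), is_free('0.0'), is_free('$4.99'))  # True True False
```

Used on the filtered data, this would read `[app for app in android_english if is_free(app[7])]`.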

In [11]:
# Price column index: iOS = 4, Android = 7
# Free apps are stored as '0' on Google Play and as '0.0' on the App Store
def free(dataset, column, free_value):
    return [app for app in dataset if app[column] == free_value]

final_android = free(android_english, 7, '0')
final_ios = free(ios_english, 4, '0.0')

print(len(final_android))
print(len(final_ios))
    

8864
3222


We want to find an app profile that fits both the App Store and Google Play, since we want our app to be downloaded by as many people as possible. 

Instead of following a waterfall approach and building a large project up front, we want to minimize risk and shorten our feedback loops by first building a minimum viable product (MVP).

If the MVP gets a good response from users, we can develop it further based on their feedback.

If the app is profitable on Android (which has the largest user base), we can then build an iOS version and add it to the App Store.

In [12]:
print(android_header,'\n\n',ios_header)


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] 

 ['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


We can use the 'Genres' column for the android dataset, and the 'prime_genre' column for the ios dataset.
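The `freq_table` and `display_table` functions in this section can also be sketched with `collections.Counter`, whose `most_common()` method returns values already sorted by count. A minimal version, assuming the same list-of-rows layout; `freq_table_counter` is a hypothetical name and the sample rows are illustrative.

```python
from collections import Counter

def freq_table_counter(dataset, index):
    """Return (value, percentage) pairs sorted from most to least common."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return [(value, count / total * 100) for value, count in counts.most_common()]

rows = [['Games'], ['Games'], ['Games'], ['Music']]
for value, percentage in freq_table_counter(rows, 0):
    print(value, ':', percentage)
# Games : 75.0
# Music : 25.0
```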

In [13]:
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage 
    
    return table_percentages


def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
        
display_table(final_ios, -5) #prime genre

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


Examining the 'prime_genre' column of the App Store dataset shows that the most common genre is clearly 'Games', with 58% of the free English apps. 'Entertainment' is the runner-up. In general, apps for leisure activities are more common than practical apps, such as shopping or education apps.

In [14]:
display_table(final_android, -4) #genre

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

The genres in the Android dataset are more evenly distributed: the most common genre is 'Tools' (8%), and the runner-up is 'Entertainment'. At first glance, practical apps seem to make up a larger share of apps on Android than on iOS. The list also shows that many genres, such as 'Action', 'Arcade' and 'Puzzle', are subgenres of games.
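Part of this granularity comes from the 'Genres' column listing several genres separated by a semicolon, as in 'Art & Design;Pretend Play' in the first rows printed earlier. A sketch that counts each listed genre separately; `genre_counts` is a hypothetical helper, and whether to count every listed genre or only the first one is an analysis choice the original project does not make.

```python
from collections import Counter

def genre_counts(dataset, index):
    """Count each genre separately when a cell lists several, split on ';'."""
    counts = Counter()
    for row in dataset:
        counts.update(row[index].split(';'))
    return counts

rows = [['Art & Design'], ['Art & Design;Pretend Play'], ['Art & Design;Creativity']]
print(genre_counts(rows, 0).most_common())
```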

In [15]:
display_table(final_android, 1) #category

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

The category column shows a less granular picture. Here, 'Family' is number one and 'Game' is the runner-up. This suggests that gaming is indeed also very popular for android users. Examining the 'Family' category reveals that it contains a lot of games for children, which are also leisure activities. We can state that gaming apps are very popular, but the data suggest a high level of saturation in the gaming app market. It might be more interesting to explore a niche.

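Next, I compute a frequency table for the 'prime_genre' column of the App Store dataset, and then the average number of user ratings ('rating_count_tot') per genre. The App Store dataset has no install counts, so the number of ratings serves as a proxy for popularity.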

In [16]:
genres_ios = freq_table(final_ios, -5)
genres_ios

{'Book': 0.4345127250155183,
 'Business': 0.5276225946617008,
 'Catalogs': 0.12414649286157665,
 'Education': 3.662321539416512,
 'Entertainment': 7.883302296710118,
 'Finance': 1.1173184357541899,
 'Food & Drink': 0.8069522036002483,
 'Games': 58.16263190564867,
 'Health & Fitness': 2.0173805090006205,
 'Lifestyle': 1.5828677839851024,
 'Medical': 0.186219739292365,
 'Music': 2.0484171322160147,
 'Navigation': 0.186219739292365,
 'News': 1.3345747982619491,
 'Photo & Video': 4.9658597144630665,
 'Productivity': 1.7380509000620732,
 'Reference': 0.5586592178770949,
 'Shopping': 2.60707635009311,
 'Social Networking': 3.2898820608317814,
 'Sports': 2.1415270018621975,
 'Travel': 1.2414649286157666,
 'Utilities': 2.5139664804469275,
 'Weather': 0.8690254500310366}

In [17]:
for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in final_ios:
        genre_app = app[-5]
        if genre_app == genre:            
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)
    

Education : 7003.983050847458
Health & Fitness : 23298.015384615384
Business : 7491.117647058823
Navigation : 86090.33333333333
News : 21248.023255813954
Medical : 612.0
Shopping : 26919.690476190477
Reference : 74942.11111111111
Photo & Video : 28441.54375
Travel : 28243.8
Sports : 23008.898550724636
Music : 57326.530303030304
Catalogs : 4004.0
Weather : 52279.892857142855
Book : 39758.5
Games : 22788.6696905016
Productivity : 21028.410714285714
Lifestyle : 16485.764705882353
Social Networking : 71548.34905660378
Finance : 31467.944444444445
Food & Drink : 33333.92307692308
Utilities : 18684.456790123455
Entertainment : 14029.830708661417


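On average, the Navigation genre has the highest number of user ratings. It includes, for example, Waze and Google Maps. However, the genre contains only six apps, and these two account for almost all of its ratings, so the average is heavily skewed. The third navigation app by number of ratings is 'Geocaching®', which could also be seen as entertainment.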
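A quick check of how strongly two apps drive this average, using the six Navigation rating counts listed in cell [18]: the mean reproduces the 86,090 figure, while the median (a statistic the original analysis does not use) is about ten times lower.

```python
from statistics import mean, median

# rating_count_tot of the six free, English Navigation apps in the iOS data
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

print(round(mean(navigation_ratings), 2))  # 86090.33
print(median(navigation_ratings))          # 8196.5

# Share of all Navigation ratings held by Waze and Google Maps
share_top_two = (345046 + 154911) / sum(navigation_ratings)
print(round(share_top_two, 3))             # 0.968
```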

In [18]:
for app in final_ios:
    if app[-5] == 'Navigation':
        print(app[1],':', app[5]) #name and n of ratings

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


Examining the 'Reference' genre shows that it includes some apps that could be labeled 'Book', like Bible and Quran apps. Other popular apps are guides for popular games like Minecraft. Apps like these take relatively little effort to make: they mostly turn existing guides into apps.

One recommendation is therefore to create a reference app for popular games, series, lifestyles or sports (or any other up-and-coming subject). I would not recommend targeting already saturated subjects, like existing popular games (although new games could be a possibility).

In [19]:
categories_android = freq_table(final_android, 1)

In [35]:
for category in categories_android:
    total = 0
    len_category = 0
    for app in final_android:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace(',','')
            n_installs = n_installs.replace('+','')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)
    

FOOD_AND_DRINK : 1924897.7363636363
AUTO_AND_VEHICLES : 647317.8170731707
VIDEO_PLAYERS : 24727872.452830188
COMMUNICATION : 38456119.167247385
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
GAME : 15588015.603248259
BUSINESS : 1712290.1474201474
ENTERTAINMENT : 11640705.88235294
LIFESTYLE : 1437816.2687861272
PHOTOGRAPHY : 17840110.40229885
MEDICAL : 120550.61980830671
EDUCATION : 1833495.145631068
EVENTS : 253542.22222222222
LIBRARIES_AND_DEMO : 638503.734939759
BOOKS_AND_REFERENCE : 8767811.894736841
PARENTING : 542603.6206896552
BEAUTY : 513151.88679245283
HEALTH_AND_FITNESS : 4188821.9853479853
DATING : 854028.8303030303
COMICS : 817657.2727272727
FAMILY : 3695641.8198090694
NEWS_AND_MAGAZINES : 9549178.467741935
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
HOUSE_AND_HOME : 1331540.5616438356
MAPS_AND_NAVIGATION : 4056941.7741935486
PERSONALIZATION : 5201482.6122448975
FINANCE : 1387692.475609756
WEATHER : 5074486.197183099
TOOLS : 10801391.298666667

It is important to find a category or genre that is both popular and not dominated by a few large apps. Social and communication apps seem to be dominated by a few large apps, and people tend to use only one or a few of them rather than many. This suggests a high barrier to entry.
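One way to make "dominated by a few large apps" measurable is the share of a category's installs held by its largest apps. A sketch, assuming each 'Installs' string is treated as its lower bound (the '+' marks an open-ended range, so the install averages in this section are approximate as well). `parse_installs` and `top_share` are hypothetical helpers, and the sample uses four apps from the SOCIAL listing in this section.

```python
def parse_installs(installs):
    """Convert an install range such as '1,000,000+' to its lower bound."""
    return int(installs.replace(',', '').replace('+', ''))

def top_share(install_strings, n):
    """Fraction of total installs held by the n most-installed apps."""
    values = sorted((parse_installs(s) for s in install_strings), reverse=True)
    return sum(values[:n]) / sum(values)

# Facebook, Facebook Lite, Tumblr, Social network all in one 2018
social_sample = ['1,000,000,000+', '500,000,000+', '100,000,000+', '100,000+']
print(parse_installs('1,000,000+'))           # 1000000
print(round(top_share(social_sample, 2), 3))  # 0.937
```

Computed per category over all rows, a value close to 1 would support the "high barrier to entry" reading.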

In [32]:
for app in final_android:
    if app[1] == 'SOCIAL':
        print(app[0],':', app[5]) # name and number of installs

Facebook : 1,000,000,000+
Facebook Lite : 500,000,000+
Tumblr : 100,000,000+
Social network all in one 2018 : 100,000+
Pinterest : 100,000,000+
TextNow - free text + calls : 10,000,000+
Google+ : 1,000,000,000+
The Messenger App : 1,000,000+
Messenger Pro : 1,000,000+
Free Messages, Video, Chat,Text for Messenger Plus : 1,000,000+
Telegram X : 5,000,000+
The Video Messenger App : 100,000+
Jodel - The Hyperlocal App : 1,000,000+
Hide Something - Photo, Video : 5,000,000+
Love Sticker : 1,000,000+
Web Browser & Fast Explorer : 5,000,000+
LiveMe - Video chat, new friends, and make money : 10,000,000+
VidStatus app - Status Videos & Status Downloader : 5,000,000+
Love Images : 1,000,000+
Web Browser ( Fast & Secure Web Explorer) : 500,000+
SPARK - Live random video chat & meet new people : 5,000,000+
Golden telegram : 50,000+
Facebook Local : 1,000,000+
Meet – Talk to Strangers Using Random Video Chat : 5,000,000+
MobilePatrol Public Safety App : 1,000,000+
💘 WhatsLov: Smileys of love, sti

In [34]:
for app in final_android:
    if app[1] == 'COMMUNICATION':
        print(app[0],':', app[5]) # name and number of installs

WhatsApp Messenger : 1,000,000,000+
Messenger for SMS : 10,000,000+
My Tele2 : 5,000,000+
imo beta free calls and text : 100,000,000+
Contacts : 50,000,000+
Call Free – Free Call : 5,000,000+
Web Browser & Explorer : 5,000,000+
Browser 4G : 10,000,000+
MegaFon Dashboard : 10,000,000+
ZenUI Dialer & Contacts : 10,000,000+
Cricket Visual Voicemail : 10,000,000+
TracFone My Account : 1,000,000+
Xperia Link™ : 10,000,000+
TouchPal Keyboard - Fun Emoji & Android Keyboard : 10,000,000+
Skype Lite - Free Video Call & Chat : 5,000,000+
My magenta : 1,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Seznam.cz : 1,000,000+
Antillean Gold Telegram (original version) : 100,000+
AT&T Visual Voicemail : 10,000,000+
GMX Mail : 10,000,000+
Omlet Chat : 10,000,000+
My Vodacom SA : 5,000,000+
Microsoft Edge : 5,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Calls & Text by Mo+ : 5,000,000+
free 

Games and entertainment form a large, popular group that is not dominated by a handful of apps; people seem to enjoy playing many different and new games. However, there is a lot of competition, and the market seems saturated.

In [38]:
for app in final_android:
    if app[1] == 'GAME':
        print(app[0],':', app[5]) # name and number of installs

Solitaire : 10,000,000+
Sonic Dash : 100,000,000+
PAC-MAN : 100,000,000+
Bubble Witch 3 Saga : 50,000,000+
Race the Traffic Moto : 10,000,000+
Marble - Temple Quest : 10,000,000+
Shooting King : 10,000,000+
Geometry Dash World : 10,000,000+
Jungle Marble Blast : 5,000,000+
Roll the Ball® - slide puzzle : 100,000,000+
Block Craft 3D: Building Simulator Games For Free : 50,000,000+
Farm Fruit Pop: Party Time : 1,000,000+
Love Balls : 50,000,000+
Piano Tiles 2™ : 100,000,000+
Pokémon GO : 100,000,000+
Paint Hit : 10,000,000+
Snake VS Block : 50,000,000+
Rolly Vortex : 10,000,000+
Woody Puzzle : 1,000,000+
Stack Jump : 10,000,000+
The Cube : 5,000,000+
Extreme Car Driving Simulator : 100,000,000+
Bricks n Balls : 1,000,000+
The Fish Master! : 1,000,000+
Color Road : 10,000,000+
Draw In : 10,000,000+
PLANK! : 500,000+
Looper! : 1,000,000+
Trivia Crack : 100,000,000+
Will it Crush? : 5,000,000+
Tomb of the Mask : 5,000,000+
Baseball Boy! : 10,000,000+
Hello Stars : 10,000,000+
Tank Stars : 1

Game development is costly if we aim for high quality, and games are easy to copy if we settle for low quality. Making a reference app for popular games, series or any other popular subject seems like a better choice. My recommendation is to create a reference app whose format can be reused for new popular subjects.