# Guided Project: Profitable App Profiles for the App Store and Google Play Markets

An analysis of free-to-download apps that have the potential to generate advertising revenue.

The goal of this project is to help developers understand what types of apps are likely to attract users.

## Opening and Exploring the Data

In this project we examine two datasets:

* [A dataset](https://www.kaggle.com/datasets/lava18/google-play-store-apps) containing data about approximately 10,000 Android apps from Google Play; the data was collected in August 2018. You can download the data set directly from [this link](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv).

* [A dataset](https://www.kaggle.com/datasets/ramamet4/app-store-apple-data-set-10k-apps) containing data about approximately 7,000 iOS apps from the App Store; the data was collected in July 2017. You can download the data set directly from [this link](https://dq-content.s3.amazonaws.com/350/AppleStore.csv).

First, the two datasets are loaded and explored.

In [1]:
from csv import reader

### The Google Play dataset ###
opened_file = open('data/googleplaystore.csv', encoding='utf8')  # app names contain non-ASCII characters
read_file = reader(opened_file)
android = list(read_file)
opened_file.close()  # release the file handle once the rows are in memory
android_header = android[0]
android = android[1:]

### The App Store dataset ###
opened_file = open('data/AppleStore.csv', encoding='utf8')
read_file = reader(opened_file)
ios = list(read_file)
opened_file.close()
ios_header = ios[0]
ios = ios[1:]

We create a function to explore the datasets.

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

We use the function to explore the Google Play data.

In [3]:
print(android_header)
print("\n")
explore_data(android, 0, 3, True)  # explore_data() only prints, so there is no return value to store


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


We use the function to explore the App Store data.

In [4]:
print(ios_header)
print("\n")
explore_data(ios, 0, 3, True)  # explore_data() only prints, so there is no return value to store

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


Examining the column names of both datasets, the columns below may be of interest in determining the characteristics of a popular app:

Android columns: App, Category, Rating, Reviews, Installs, Content Rating and Genres.

iOS columns: track_name, rating_count_tot, user_rating, cont_rating and prime_genre.

## Deleting Wrong Data

The Google Play dataset has a dedicated [discussion section](https://www.kaggle.com/datasets/lava18/google-play-store-apps/discussion), and we can see that [one of the discussions](https://www.kaggle.com/datasets/lava18/google-play-store-apps/discussion/66015) describes an error for a certain row.

Below, we confirm the problem row by comparing its number of columns with that of a normal row.

In [5]:
print(android_header)
print('\n')
print(android[5])
print(len(android[5]))
print('\n')
print(android[10472])
print(len(android[10472]))

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']
13


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
12


Row 10472 has only 12 columns, versus 13 in a normal row: its Category value is missing, so every later value is shifted one column to the left. We delete row 10472 to keep the data clean.

In [6]:
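explore_data(android, 0, 0, True)  # row and column counts before the deletion
del android[10472]  # run this cell only once, or another row will be deleted
explore_data(android, 0, 0, True)  # row and column counts after the deletion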


Number of rows: 10841
Number of columns: 13
Number of rows: 10840
Number of columns: 13
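
The index 10472 was taken from the Kaggle discussion. As a minimal sketch of how such rows could be located without knowing the index in advance, the hypothetical helper `find_malformed_rows()` below compares every row length with the header length. The toy data stands in for the real `android` list and is an assumption for illustration only.

```python
def find_malformed_rows(dataset, header):
    """Return (index, row) pairs whose length differs from the header's."""
    return [(i, row) for i, row in enumerate(dataset) if len(row) != len(header)]

# Toy data standing in for the real `android` list
toy_header = ['App', 'Category', 'Rating']
toy_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],            # Category value missing, so the row is short
    ['App C', 'GAME', '3.8'],
]

bad = find_malformed_rows(toy_rows, toy_header)
print(bad)  # [(1, ['App B', '1.9'])]
```

If several rows were reported, deleting them from the highest index to the lowest would keep the remaining indexes valid.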


For the Apple data, we check for errors by comparing the number of columns in each row with the number of columns in the header.

In [7]:
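has_variance = False
for index, row in enumerate(ios):
    if len(row) != len(ios_header):
        has_variance = True
        print(row)    # print the row data
        print(index)  # print the row no.

# only report a clean result when no mismatching row was found
if not has_variance:
    print('No column variance detected')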

No column variance detected


## Removing Duplicate Entries

Exploring the datasets or reading the discussion section shows that some apps appear more than once. The Android data contains a total of 1,181 duplicate entries.

In [8]:
duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:15])



Number of duplicate apps: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']
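
The same count can be obtained more compactly with `collections.Counter` from the standard library. This is only a sketch on a toy list of names (an assumption for illustration, not the real data); it counts every occurrence beyond the first as a duplicate, which matches the loop above.

```python
from collections import Counter

# Toy list standing in for the app names in the first column of `android`
toy_names = ['Box', 'Slack', 'Box', 'Zenefits', 'Box', 'Slack']

name_counts = Counter(toy_names)

# Every occurrence beyond the first counts as a duplicate
n_duplicates = sum(count - 1 for count in name_counts.values())
print(n_duplicates)  # 3

# Names that appear more than once, in order of first appearance
repeated = [name for name, count in name_counts.items() if count > 1]
print(repeated)  # ['Box', 'Slack']
```

Because `Counter` is a dictionary, each lookup takes roughly constant time, whereas `name in unique_apps` scans a list that grows with the data.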


Instead of removing duplicates randomly, we keep the row with the highest number of reviews for each app, on the assumption that it holds the most recent data.

Above, we looped through the Google Play dataset and found 1,181 duplicates. After removing them, we should be left with 9,659 rows:

In [9]:
print('Expected length:', len(android) - 1181)

Expected length: 9659


To remove the duplicates, we will do the following:

* Create a dictionary, where each dictionary key is a unique app name and the corresponding dictionary value is the highest number of reviews of that app.

* Use the information stored in the dictionary and create a new dataset, which will have only one entry per app (and for each app, we'll only select the entry with the highest number of reviews).

In [10]:
reviews_max = {}

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

print(f'Number of apps in reviews_max is {len(reviews_max)}' )


Number of apps in reviews_max is 9659


We use the reviews_max dictionary to remove duplicate rows, keeping for each app the entry with the highest number of reviews.

We initialise two lists, 'android_clean' and 'already_added'.

We loop through the Android data set and extract each app's name and number of reviews. A row is appended to 'android_clean' only if its number of reviews matches the value in the reviews_max dictionary and its name is not yet in 'already_added'. The second condition is needed because several duplicate rows can share the same maximum number of reviews; without it, all of them would be kept.

In [11]:
android_clean = []
already_added = []

for app in android:
    name = app[0]
    n_reviews = float(app[3])

    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(app)
        already_added.append(name)

explore_data(android_clean,0,3,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13
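
As an alternative to the two-step approach above, the deduplication can be sketched in a single pass by storing the best row per app name in a dictionary. `dedupe_by_max_reviews()` is a hypothetical helper, and the toy rows are an assumption for illustration; it should select the same rows as the cells above, although the order of the result can differ.

```python
def dedupe_by_max_reviews(dataset, name_idx=0, reviews_idx=3):
    """Keep, for each app name, the first row with the highest review count."""
    best = {}
    for row in dataset:
        name = row[name_idx]
        n_reviews = float(row[reviews_idx])
        # A strict comparison keeps the earlier row when review counts tie
        if name not in best or n_reviews > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())

# Toy rows: App, Category, Rating, Reviews
toy_rows = [
    ['Box', 'BUSINESS', '4.2', '100'],
    ['Slack', 'BUSINESS', '4.4', '50'],
    ['Box', 'BUSINESS', '4.2', '120'],
    ['Box', 'BUSINESS', '4.2', '120'],  # exact duplicate of the previous row
]

deduped = dedupe_by_max_reviews(toy_rows)
print(len(deduped))  # 2
```

A dictionary lookup also avoids the repeated `name not in already_added` list scans, which get slower as the list grows.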


## Removing Non-English Apps

The App Store data has no duplicates, but it includes non-English apps, which we are not interested in for this project. The Android data contains non-English apps as well.

Here we use the built-in ord() function to detect English text: the characters commonly used in English all have code points in the ASCII range, 0 to 127. We'll write a function to help us achieve this.

In [12]:
def english_check(string):
    for letters in string:
        if ord(letters) > 127:
            return False
    return True

print(english_check('Instagram'))
print(english_check('Áà±Â•áËâ∫PPS -„ÄäÊ¨¢‰πêÈ¢Ç2„ÄãÁîµËßÜÂâßÁÉ≠Êí≠'))

True
False


However, we want to retain apps whose names contain special characters such as symbols and emoji, as these can still be English apps.

In [13]:
print(english_check('Docs To Go™ Free Office Suite'))
print(english_check('Instachat 😜'))

print(ord('™'))
print(ord('😜'))

False
False
8482
128540


To minimize the impact of data loss, we'll only remove an app if its name has more than three characters with corresponding numbers falling outside the ASCII range. This means all English apps with up to three emoji or other special characters will still be labeled as English. Our filter function is still not perfect, but it should be fairly effective.

Let's edit the function we created above, and then use it to filter out the non-English apps.

In [14]:
def english_check(string):
    english_criteria = 0
    
    for letters in string:
        if ord(letters) > 127:
            english_criteria += 1

    if english_criteria > 3:
        return False
    else: return True

print(english_check('Docs To Go‚Ñ¢ Free Office Suite'))
print(english_check('Instachat üòú'))
print(english_check('Áà±Â•áËâ∫PPS -„ÄäÊ¨¢‰πêÈ¢Ç2„ÄãÁîµËßÜÂâßÁÉ≠Êí≠'))

True
True
False
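
Because the threshold is a heuristic, some names are misclassified in both directions. The sketch below illustrates this with a hypothetical `looks_english()` function that applies the same rule as `english_check()` but takes the threshold as a parameter; the sample names are made up for illustration.

```python
def count_non_ascii(text):
    """Count the characters whose code point falls outside the ASCII range."""
    return sum(1 for ch in text if ord(ch) > 127)

def looks_english(text, max_non_ascii=3):
    """Same rule as english_check(), with a configurable threshold."""
    return count_non_ascii(text) <= max_non_ascii

print(looks_english('Docs To Go™ Free Office Suite'))  # True
print(looks_english('Fun Party 😜😜😜😜'))               # False: English, but four emoji
print(looks_english('Guten Morgen'))                   # True: German, but pure ASCII
```

Raising `max_non_ascii` would keep more emoji-heavy English names at the cost of letting more non-English names through.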


We now use the polished function to filter out non-English apps from both data sets.

In [15]:
english_android = []

for app in android_clean:
    name = app[0]
    if english_check(name):
        english_android.append(app)

print('Number of apps in Android before filtering =', len(android_clean))
print('Number of English apps in Android =', len(english_android))

Number of apps in Android before filtering = 9659
Number of English apps in Android = 9614


In [16]:
english_ios = []

for app in ios:
    name = app[1]
    if english_check(name):
        english_ios.append(app)

print('Number of apps in Apple before filtering =', len(ios))
print('Number of English apps in Apple =', len(english_ios))

Number of apps in Apple before filtering = 7197
Number of English apps in Apple = 6183


## Isolating the Free Apps

So far in the data cleaning process, we've done the following:

* Removed inaccurate data
* Removed duplicate app entries
* Removed non-English apps

Now we will isolate the free apps only.

In [17]:
android_final = []
ios_final = []

for app in english_android:
    price = app[7]
    if price == '0':
        android_final.append(app)
        
for app in english_ios:
    price = app[4]
    if price == '0.0':
        ios_final.append(app)
        
print('Number of Android Apps =', len(android_final))
print('Number of iOS Apps =', len(ios_final))

Number of Android Apps = 8864
Number of iOS Apps = 3222
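
The string comparisons above depend on the exact spelling of a zero price in each file ('0' for Google Play, '0.0' for the App Store). A slightly more tolerant check is sketched below with a hypothetical `is_free()` helper that converts the price to a number first. It assumes that paid prices may carry a leading '$', which is stripped before the conversion.

```python
def is_free(price_text):
    """Hypothetical helper: treat '0', '0.0' and '$0.00' alike as free."""
    # Assumption: a price is a plain number, optionally prefixed with '$'
    return float(price_text.replace('$', '')) == 0.0

for sample in ['0', '0.0', '$4.99', '1.99']:
    print(sample, is_free(sample))
```

A value that is not a price at all would raise a `ValueError` here, which is one more reason to remove malformed rows before this step.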


## Analysis: Most common Apps by Genre

As we mentioned in the introduction, our goal is to determine the kinds of apps that are likely to attract more users, because the number of people using our apps affects our revenue.

To minimize risks and overhead, our validation strategy for an app idea has three steps:

* Build a minimal Android version of the app, and add it to Google Play.
* If the app has a good response from users, we develop it further.
* If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

Because our end goal is to add the app on both Google Play and the App Store, we need to find app profiles that are successful in both markets. For instance, a profile that works well for both markets might be a productivity app that makes use of gamification.

Let's begin the analysis by determining the most common genres for each market. For this, we'll build a frequency table for the prime_genre column of the App Store data set, and the Genres and Category columns of the Google Play data set.

We create a function named freq_table() that takes in two inputs: dataset (which will be a list of lists) and index (which will be an integer).

The function should return the frequency table (as a dictionary) for any column we want. The frequencies should also be expressed as percentages.

In [18]:
def freq_table(dataset, index):
    frequency = {}
    total = 0

    for row in dataset:
        total += 1
        value = row[index]
        
        if value in frequency:
            frequency[value] += 1
        else:
            frequency[value] = 1

    frequency_percentages = {}
    for key in frequency:
        percentage = (frequency[key] / total) * 100
        frequency_percentages[key] = percentage 
    
    return frequency_percentages

freq_table(android_final,9)

    

{'Art & Design': 0.5979241877256317,
 'Art & Design;Creativity': 0.06768953068592057,
 'Auto & Vehicles': 0.9250902527075812,
 'Beauty': 0.5979241877256317,
 'Books & Reference': 2.1435018050541514,
 'Business': 4.591606498194946,
 'Comics': 0.6092057761732852,
 'Comics;Creativity': 0.01128158844765343,
 'Communication': 3.2378158844765346,
 'Dating': 1.861462093862816,
 'Education': 5.347472924187725,
 'Education;Creativity': 0.04512635379061372,
 'Education;Education': 0.33844765342960287,
 'Education;Pretend Play': 0.056407942238267145,
 'Education;Brain Games': 0.033844765342960284,
 'Entertainment': 6.069494584837545,
 'Entertainment;Brain Games': 0.078971119133574,
 'Entertainment;Creativity': 0.033844765342960284,
 'Entertainment;Music & Video': 0.16922382671480143,
 'Events': 0.7107400722021661,
 'Finance': 3.7003610108303246,
 'Food & Drink': 1.2409747292418771,
 'Health & Fitness': 3.0798736462093865,
 'House & Home': 0.8235559566787004,
 'Libraries & Demo': 0.936371841155234

The display_table() function you see below does the following:

* Takes in two parameters: dataset and index. Dataset will be a list of lists, and index will be an integer
* Generates a frequency table using the freq_table() function
* Transforms the frequency table into a list of tuples, then sorts the list in a descending order
* Prints the entries of the frequency table in descending order


In [19]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
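
Swapping each pair into a (value, key) tuple is one way to sort by frequency. As a sketch of an alternative, `sorted()` also accepts a `key` function, so the dictionary items can be sorted by value directly. `sorted_by_share()` is a hypothetical name and the toy table is rounded for illustration.

```python
def sorted_by_share(table):
    """Return (key, value) pairs ordered from the largest value to the smallest."""
    return sorted(table.items(), key=lambda item: item[1], reverse=True)

# Toy frequency table with rounded percentages
toy_table = {'Entertainment': 7.9, 'Games': 58.2, 'Photo & Video': 5.0}

for genre, pct in sorted_by_share(toy_table):
    print(f'{genre} : {pct:.2f}')
```

One difference to be aware of: entries with equal percentages keep their original relative order here, while the tuple version orders ties by key.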

We use the newly defined display_table() function to analyse prime_genre (iOS data), Category (Android) and Genres (Android).

In [20]:
#prime_genre from iOS dataset
display_table(ios_final,11)


Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


The top three genres among free English apps on the App Store are:

* Games 58%
* Entertainment 8%
* Photo & Video 5%

Games are by far the most common genre, and all three top genres are entertainment oriented.

Games are therefore the largest segment of the App Store, although the number of apps in a genre does not by itself show how many users those apps attract.

In [21]:
#Category from Android dataset
display_table(android_final,1)


FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

The top three categories (Category column) on Google Play are:

* Family 19%
* Game 10%
* Tools 8%

In [22]:
#Genres from Android dataset
display_table(android_final,9)


Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

The top three genres (Genres column) on Google Play are:
* Tools 8%
* Entertainment 6%
* Education 5%

On Google Play, although games are still popular, gaming and entertainment apps do not dominate the way they do on the App Store.

The Google Play catalogue leans toward more practical apps: family-friendly apps, tools, and apps that educate users.

Further investigation reveals that many Family apps are actually games designed for kids. Hence, gaming is still a leading genre on Android as well as on the App Store.

## Analysis: Most popular Apps by Genre on the App Store

The frequency tables we analyzed above showed us that apps designed for fun dominate the App Store, while Google Play shows a more balanced landscape of both practical and fun apps. Now, we'd like to determine the kind of apps with the most users.

One way to find out what genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the Installs column, but this information is missing for the App Store data set. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the rating_count_tot column.

Let's start with calculating the average number of user ratings per app genre on the App Store. To do that, we'll need to do the following:

* Isolate the apps of each genre
* Add up the user ratings for the apps of that genre
* Divide the sum by the number of apps belonging to that genre (not by the total number of apps)
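
The three steps above can be sketched on a toy data set as follows. The hypothetical `average_by_group()` helper makes a single pass over the rows and accumulates a running total and a count per genre; the toy rows and their numbers are made up for illustration.

```python
from collections import defaultdict

def average_by_group(dataset, group_idx, value_idx):
    """Average the values in column value_idx for each label in column group_idx."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        group = row[group_idx]
        totals[group] += float(row[value_idx])  # add up the ratings of the genre
        counts[group] += 1                      # count the apps of the genre
    # divide each sum by the number of apps in that genre, not by the total
    return {group: totals[group] / counts[group] for group in totals}

# Toy rows: name, genre, number of ratings
toy_rows = [
    ['App A', 'Navigation', '300'],
    ['App B', 'Navigation', '100'],
    ['App C', 'Games', '50'],
]

averages = average_by_group(toy_rows, 1, 2)
print(averages)  # {'Navigation': 200.0, 'Games': 50.0}
```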

In [23]:
genre_ios = freq_table(ios_final,11)

for genre in genre_ios:
    total = 0
    len_genre = 0
    for app in ios_final:
        genre_app = app[11]
        if genre_app == genre:
            n_rating = float(app[5])
            total += n_rating
            len_genre += 1
    avg_n_rating = total / len_genre
    print(genre, ':', avg_n_rating)




Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0


On average, Navigation apps received the highest number of user ratings, even though they make up a small portion of the App Store. This could be due to the usefulness of Navigation apps, along with in-app prompts asking users to leave a review.
    

In [24]:
for app in ios_final:
    if app[11] == 'Navigation':
        print(app[1], ':', app[5]) # print name and number of ratings

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


Further investigation reveals that the majority of the ratings come from Waze and Google Maps.
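
To quantify how strongly these two apps pull the average up, the short sketch below reuses the six rating counts printed above and compares the mean with the median, using the standard `statistics` module.

```python
from statistics import mean, median

# Rating counts of the six Navigation apps listed above, largest first
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

# Share of all Navigation ratings that belongs to the two largest apps
top_two_share = sum(navigation_ratings[:2]) / sum(navigation_ratings)

print(round(mean(navigation_ratings), 2))  # 86090.33
print(median(navigation_ratings))          # 8196.5
print(f'{top_two_share:.1%}')              # 96.8%
```

The median is roughly ten times smaller than the mean, so the genre average says more about Waze and Google Maps than about a typical Navigation app.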

## Analysis: Most popular Apps by Genre on the Android Store

Above, we used the number of user ratings as a proxy for popularity on the App Store. For the Google Play market we have data about the number of installs, so we should be able to get a clearer picture of genre popularity. However, the install numbers are not precise: most values are open-ended (100+, 1,000+, 5,000+, etc.):

In [25]:
display_table(android_final,5) # the Installs columns

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


For instance, we don't know whether an app with 100,000+ installs has 100,000 installs, 200,000, or 350,000. However, we don't need very precise data for our purposes; we only want to find out which app genres attract the most users.

We're going to leave the numbers as they are, which means that we'll consider that an app with 100,000+ installs has 100,000 installs, and an app with 1,000,000+ installs has 1,000,000 installs, and so on. To perform computations, however, we'll need to convert each install number from a string to a float. This means we need to remove the commas and the plus characters, or the conversion will fail and cause an error.
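
The string cleanup described above can be sketched as a small helper. `parse_installs()` is a hypothetical name; it treats the open-ended value as its lower bound, as explained in the previous paragraph.

```python
def parse_installs(text):
    """Convert an Installs string such as '1,000,000+' to a float."""
    cleaned = text.replace(',', '').replace('+', '')
    return float(cleaned)

print(parse_installs('1,000,000+'))  # 1000000.0
print(parse_installs('100+'))        # 100.0
print(parse_installs('0'))           # 0.0
```

Calling `float('1,000,000+')` directly raises a `ValueError`, which is why both characters have to be removed first.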

In [35]:
category_android = freq_table(android_final,1)

for category in category_android:
    total = 0
    len_category = 0
    for app in android_final:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace(',','')
            n_installs = n_installs.replace('+','')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = round(total / len_category)
    print(f"{category} : {avg_n_installs:,}")

        

ART_AND_DESIGN : 1,986,335
AUTO_AND_VEHICLES : 647,318
BEAUTY : 513,152
BOOKS_AND_REFERENCE : 8,767,812
BUSINESS : 1,712,290
COMICS : 817,657
COMMUNICATION : 38,456,119
DATING : 854,029
EDUCATION : 1,833,495
ENTERTAINMENT : 11,640,706
EVENTS : 253,542
FINANCE : 1,387,692
FOOD_AND_DRINK : 1,924,898
HEALTH_AND_FITNESS : 4,188,822
HOUSE_AND_HOME : 1,331,541
LIBRARIES_AND_DEMO : 638,504
LIFESTYLE : 1,437,816
GAME : 15,588,016
FAMILY : 3,695,642
MEDICAL : 120,551
SOCIAL : 23,253,652
SHOPPING : 7,036,877
PHOTOGRAPHY : 17,840,110
SPORTS : 3,638,640
TRAVEL_AND_LOCAL : 13,984,078
TOOLS : 10,801,391
PERSONALIZATION : 5,201,483
PRODUCTIVITY : 16,787,331
PARENTING : 542,604
WEATHER : 5,074,486
VIDEO_PLAYERS : 24,727,872
NEWS_AND_MAGAZINES : 9,549,178
MAPS_AND_NAVIGATION : 4,056,942


On Android, Communication apps have the highest average number of installs, at about 38 million.

In [46]:
for app in android_final:
    if app[1] == 'COMMUNICATION':
        print(f"{app[0]} : {int(app[5].replace(',','').replace('+','')):,}") # print name and number of installs

WhatsApp Messenger : 1,000,000,000
Messenger for SMS : 10,000,000
My Tele2 : 5,000,000
imo beta free calls and text : 100,000,000
Contacts : 50,000,000
Call Free – Free Call : 5,000,000
Web Browser & Explorer : 5,000,000
Browser 4G : 10,000,000
MegaFon Dashboard : 10,000,000
ZenUI Dialer & Contacts : 10,000,000
Cricket Visual Voicemail : 10,000,000
TracFone My Account : 1,000,000
Xperia Link™ : 10,000,000
TouchPal Keyboard - Fun Emoji & Android Keyboard : 10,000,000
Skype Lite - Free Video Call & Chat : 5,000,000
My magenta : 1,000,000
Android Messages : 100,000,000
Google Duo - High Quality Video Calls : 500,000,000
Seznam.cz : 1,000,000
Antillean Gold Telegram (original version) : 100,000
AT&T Visual Voicemail : 10,000,000
GMX Mail : 10,000,000
Omlet Chat : 10,000,000
My Vodacom SA : 5,000,000
Microsoft Edge : 5,000,000
Messenger – Text and Video Chat for Free : 1,000,000,000
imo free video calls and chat : 500,000,000
Calls & Text by Mo+ : 5,000,000
free video calls and chat :

The market is dominated by a handful of apps with more than a billion installs: WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts.

## Conclusions

In this project, we looked at which apps are popular on the App Store and Google Play, with the goal of identifying the genre or category of apps best placed to generate revenue through advertising.

On the App Store, Navigation apps have the highest average number of user ratings, and on Google Play, Communication apps have the highest average number of installs.

A few major players greatly skew these results, and it will be a challenge to produce an app that can compete against them.

Further analysis is required to find a niche within these genres and offer users something that the popular apps do not cover.