# Analyzing Mobile App Data

The goal of this project is to find profitable app profiles on Google Play and the App Store. Let's imagine that I'm doing this analysis for a company that builds Android and iOS mobile apps, and my job is to enable the team of developers to make data-driven decisions about the kind of apps they build. I will focus on free apps, where the main source of revenue is in-app ads. Because ad revenue depends largely on the number of users, I will analyze the data to understand what kinds of free apps attract the most users.

I work with two data sources:

1.) A dataset about approx. 10,000 Android apps from Google Play: [link](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv)

2.) A dataset about approx. 7,000 iOS apps from the App Store: [link](https://dq-content.s3.amazonaws.com/350/AppleStore.csv)

# Opening and Exploring Data

I start by opening the data sources:

In [2]:
from csv import reader

## Open IOS data
opened_file = open('AppleStore.csv')
read_file = reader(opened_file)
ios = list(read_file)
ios_header = ios[0]
ios = ios[1:]

## Open Android data
opened_file = open('googleplaystore.csv')
read_file = reader(opened_file)
android = list(read_file)
android_header = android[0]
android = android[1:]
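
The two loading steps above follow the same pattern and leave the files open. A minimal sketch of a reusable loader is shown below. `load_csv` is a hypothetical helper name, and `encoding='utf8'` is an assumption about how the files are encoded, not something stated in the dataset documentation.

```python
from csv import reader

def load_csv(path, encoding='utf8'):
    """Return (header, rows) for a CSV file. Hypothetical helper, not part of the original notebook."""
    # The with-block closes the file even if reading fails.
    with open(path, encoding=encoding, newline='') as opened_file:
        rows = list(reader(opened_file))
    return rows[0], rows[1:]

# Usage, assuming the CSV files sit next to the notebook:
# ios_header, ios = load_csv('AppleStore.csv')
# android_header, android = load_csv('googleplaystore.csv')
```

Passing an explicit encoding avoids depending on the platform default, which can differ between operating systems.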

To make the datasets easier to explore, I define a function named `explore_data()` that prints rows in a readable way and, optionally, reports the number of rows and columns.

In [3]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [4]:
print(ios_header)
print('\n')
explore_data(ios, 0, 5, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 7197
Number of columns: 16


There are 7,197 rows in the iOS dataset, and the columns that seem useful are 'track_name', 'price', 'rating_count_tot' and 'prime_genre'. Not all the column names are self-explanatory; the documentation can be found [here](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps).

In [5]:
print(android_header)
print('\n')
explore_data(android, 0, 5, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Eve

There are 10,841 rows in the Android dataset, of which 'App', 'Category', 'Reviews' and 'Installs' seem useful for our analysis. The documentation for the column names can be found [here](https://www.kaggle.com/lava18/google-play-store-apps).

# Deleting Wrong Data

The Google Play data set has a dedicated [discussion](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015) section, and we can see that one of the discussions outlines an error for row 10472. Let's print this row and compare it against the header and another row that is correct.

In [6]:
print(android[10472]) # incorrect row
print('\n')
print(android_header) # header
print('\n')
print(android[0]) # correct row

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


After printing these three rows, we can see that the 'Category' value is missing from row 10472, so every later value is shifted one column to the left. As a result, the 'Rating' column reads 19, even though the maximum rating on Google Play is 5. Let's remove the row.

In [7]:
print(len(android))
del android[10472]
print(len(android))

10841
10840
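
The faulty row above was found through the discussion page. As a more general check, one could flag every row whose number of columns differs from the header. The sketch below uses a hypothetical helper, `find_malformed_rows`, and shortened toy rows rather than the real dataset.

```python
def find_malformed_rows(dataset, header):
    """Return the indexes of rows whose column count differs from the header."""
    return [i for i, row in enumerate(dataset) if len(row) != len(header)]

# Toy data standing in for the real dataset (for illustration only)
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['Photo Editor', 'ART_AND_DESIGN', '4.1'],
    ['Life Made WI-Fi', '1.9'],              # 'Category' is missing
    ['Sketch', 'ART_AND_DESIGN', '4.5'],
]
print(find_malformed_rows(sample_rows, sample_header))  # [1]
```

On the real data this would be called as `find_malformed_rows(android, android_header)` before deleting anything, because deleting a row shifts the indexes of all rows after it.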


# Removing Duplicates

If we explore the Google Play dataset long enough or check the [discussion](https://www.kaggle.com/lava18/google-play-store-apps/discussion) section, we notice that some apps have duplicate entries. For example, Instagram has 4 entries:

In [8]:
for app in android:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


To find all the duplicate entries, I created two lists: one for storing the name of the duplicate apps and one for storing the name of the unique apps.

In [9]:
duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of duplicate apps:', len(duplicate_apps))
print('Examples of duplicate apps:', duplicate_apps[:10])

Number of duplicate apps: 1181
Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']


There are 1,181 cases where an app occurs more than once. We don't want to count the same app more than once when analyzing the data, so we need to remove the duplicate entries.

If we examine the Instagram entries above, we notice that the main difference is the number of reviews in the 'Reviews' column. The different counts suggest that the data was collected at different times, so keeping the entry with the highest number of reviews should give us the most recent data.

I will create an empty dictionary where the keys are the app names and the values are the number of reviews. I will loop through every row and check whether the app name already exists in the dictionary. If it exists and the number of reviews in the current row is higher, I will update the dictionary with that number. Otherwise, if the app name is not yet in the dictionary, I will create a new entry.

In [10]:
reviews_max = {} # Create an empty dictionary

for app in android:
    name = app[0] # assign the app name to a variable
    n_reviews = float(app[3]) # convert the number of reviews to float and assign it to a variable
    

    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

To make sure the code worked as expected, let's check the length of the dictionary. It should be 1,181 less than the length of the original dataset.

In [11]:
print('Expected length:', len(android) - 1181)
print('Dictionary length:', len(reviews_max))

Expected length: 9659
Dictionary length: 9659


To remove duplicate rows: 

* I created 2 empty lists: one to store the new, cleaned dataset, and another one to store the app names,
* I assigned the app names and the number of reviews to variables,
* I added the current row (`app`) to the `android_clean` list and the app name (`name`) to the `already_added` list if the number of reviews of the app matches the number of reviews in the `reviews_max` dictionary, and the name of the app is not already in the `already_added` list.

I added this second condition because, in some cases, more than one entry among the duplicates shares the highest number of reviews. If we only used the first condition, we would still end up with duplicate apps.

In [12]:
android_clean = [] # list to store the new cleaned dataset
already_added = [] # list to store app names

for app in android:
    name = app[0] #assign the name of the app to a variable
    n_reviews = float(app[3]) #assign the number of reviews to a variable
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)

Let's explore the new dataset with the previously defined `explore_data()` function and check that the number of rows is 9,659 (10,840 - 1,181).

In [13]:
explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13
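
The two-step approach above (first build `reviews_max`, then filter) could also be sketched as a single pass that stores the best row per app name directly. `keep_most_reviewed` is a hypothetical helper, and the sample rows are shortened toy data. The resulting rows follow the order in which each name first appears, which may differ from the order of `android_clean`.

```python
def keep_most_reviewed(dataset, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}  # app name -> row with the most reviews seen so far
    for row in dataset:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' keeps the earlier row when review counts tie.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

# Shortened toy rows: name, category, rating, reviews (the 'Box' values are made up)
sample = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
]
cleaned = keep_most_reviewed(sample)
print(len(cleaned))  # 2
```

A dictionary lookup is also faster than the `name not in already_added` list check, which scans the whole list for every row.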


# Removing Non-English Apps

Judging by their names, some apps in both datasets are not directed toward an English-speaking audience. We'll remove those from the analysis.

English text mostly uses characters from the ASCII standard, in which each character has a corresponding number between 0 and 127. We will take advantage of this and use the built-in `ord()` function to find the code number of each character in the app names.

In [14]:
def is_english(string):
    for character in string:
        if ord(character) > 127:
            return False
    return True

print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Instagram'))
print(is_english('Instachat 😜'))
print(is_english('Docs To Go™ Free Office Suite'))

False
True
False
False


The new function works as intended. However, some English app names use emojis and symbols that fall outside the ASCII range, so the function in its current form would remove useful apps. To minimize the data loss, I'll only remove an app if its name has more than 3 non-ASCII characters.

In [15]:
def is_english(string):
    non_ascii = 0
    
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    if non_ascii > 3:
        return False
    else:
        return True

Let's test the new function on a few examples:

In [16]:
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
True
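
The same rule can be written more compactly, with the threshold exposed as a parameter. This is only a sketch: `is_mostly_ascii` and `max_non_ascii` are hypothetical names, and the default of 3 mirrors the threshold chosen above.

```python
def is_mostly_ascii(name, max_non_ascii=3):
    """Return True if `name` has at most `max_non_ascii` characters outside the ASCII range."""
    non_ascii = sum(1 for character in name if ord(character) > 127)
    return non_ascii <= max_non_ascii

print(is_mostly_ascii('Docs To Go™ Free Office Suite'))  # True
print(is_mostly_ascii('Instachat 😜'))                    # True
print(is_mostly_ascii('爱奇艺PPS -《欢乐颂2》电视剧热播'))   # False
```

Making the threshold a parameter makes it easy to see how sensitive the row counts are to the choice of 3.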


The function is not perfect, but it is good enough for this analysis. Let's filter out the non-English apps from both datasets.

In [17]:
android_english = []
ios_english = []

for app in android_clean:
    name = app[0]
    if is_english(name):
        android_english.append(app)

for app in ios:
    name = app[1]
    if is_english(name):
        ios_english.append(app)
        
explore_data(android_english, 0, 3, True)
print('\n')
explore_data(ios_english, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 

After removing the non-English apps, we are left with 9,614 Android apps and 6,183 iOS apps.

# Isolating the Free Apps

So far in the data cleaning process, I:
* removed inaccurate data,
* removed duplicate entries,
* removed non-English apps.

As I mentioned in the introduction, this analysis focuses only on free apps, where the main source of revenue is in-app ads. As a next step, I will isolate the free apps.

In [18]:
android_final = []
ios_final = []

for app in android_english:
    app_price = app[7]
    if app_price == '0':
        android_final.append(app)
        
for app in ios_english:
    app_price = app[4]
    if app_price == '0.0':
        ios_final.append(app)
        
print(len(android_final))
print(len(ios_final))

8864
3222


After cleaning the datasets, we have 8,864 Android apps and 3,222 iOS apps for analysis.

# Most Common Apps by Genre
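
The filter above relies on exact string matches (`'0'` for Google Play, `'0.0'` for the App Store). A slightly more tolerant sketch converts the price to a number first. `is_free` is a hypothetical helper, and the assumption that paid prices may carry a leading `$` sign is mine, since the rows shown in this notebook only include free apps.

```python
def is_free(price):
    """Return True if a price string represents zero."""
    # Assumption: prices look like '0', '0.0' or '$4.99'.
    return float(price.strip().lstrip('$')) == 0.0

print(is_free('0'))      # True
print(is_free('0.0'))    # True
print(is_free('$4.99'))  # False
```

With such a helper, the same condition could be used for both datasets, for example `if is_free(app[7])` for Android and `if is_free(app[4])` for iOS.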

To repeat the aim of the project: I am trying to determine the kinds of apps that are likely to attract more users because the revenue is highly influenced by the number of people using the apps.

At our imaginary company, we want to minimize risks and overhead so the way we validate the app ideas is the following:

1. Build a minimal Android version of the app, and add it to Google Play
2. If the app has a good response from users, we develop it further
3. If the app is profitable after 6 months, we build an iOS version of the app and add it to the App Store.

Because the end goal is to have the app on both Google Play and the App Store, I need to find app profiles that are successful on both markets. I will begin my analysis by finding the most common genres in each market. For this, I'll build frequency tables for the `prime_genre` column of the App Store dataset and for the `Genres` and `Category` columns of the Google Play dataset.

I start by defining two new functions:
1. to generate frequency tables for any column and show percentages (`freq_table`)
2. to transform the frequency table into a list of tuples and sort the list in a descending order (`display_table`)

In [19]:
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
            
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage 
    
    return table_percentages
            
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
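
For comparison, the same two steps could be sketched with `collections.Counter` from the standard library. `freq_table_percent` and `display_sorted` are hypothetical names chosen so they do not clash with the functions above. Entries with identical percentages may be printed in a different order than with `display_table()`.

```python
from collections import Counter

def freq_table_percent(dataset, index):
    """Return {value: percentage of rows} for the column at `index`."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return {value: count / total * 100 for value, count in counts.items()}

def display_sorted(dataset, index):
    """Print the percentages from most to least frequent."""
    table = freq_table_percent(dataset, index)
    for value, percentage in sorted(table.items(), key=lambda item: item[1], reverse=True):
        print(value, ':', percentage)

# Toy rows for illustration only: name, genre
toy_rows = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
display_sorted(toy_rows, 1)
```

Sorting on the percentage directly avoids building the intermediate list of `(percentage, key)` tuples.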

Let's use `display_table()` on the columns `prime_genre`, `Genres` and `Category`.

In [20]:
display_table(ios_final, 11) # prime genre column

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


We can see that among the free English iOS apps, more than half are Games (58.16%). The second most frequent genre is Entertainment with just under 8% (7.88%), followed by Photo & Video with 4.97%. Only 3.66% of the apps are Education related, followed by Social Networking with 3.29%.

Overall, "fun" apps dominate (games, entertainment, photo & video, social networking, sports, music, etc.). However, a large number of apps in a genre doesn't mean that the genre is the most popular among users.

Let's see what we can learn from the Google Play data.

In [21]:
display_table(android_final, 1) # Category column

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

Interestingly, the Google Play `Category` data is significantly different: many of the apps are designed for practical purposes (family, tools, business, lifestyle, productivity, etc.), although games are still the second most frequent category with 9.72%.

However, if we check the Family category (18.9%) on [Google Play](https://play.google.com/store/apps/category/FAMILY?hl=en), we can see that it mostly consists of games for kids.

Overall, it's a more balanced mix of fun and practical apps than we saw in the App Store.

Let's see the `Genres` column now.

In [22]:
display_table(android_final, -4) # Genre column

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

The difference between the `Category` and the `Genres` columns is not very clear, but we can see that the `Genres` column has many more values, so it is more granular.

The `Category` column is more practical for our analysis because we only need to see the big picture at the moment.

To summarize, the App Store is dominated by apps for fun, whereas Google Play has a more balanced mix of practical and fun apps.

Now let's find out which kinds of apps have the most users.

# Most Popular Apps by Genre on the App Store

One way to find out which genres have the most users is to calculate the average number of installs for each app genre. However, this information is only available for the Android apps. As a workaround, I will use the `rating_count_tot` column, which contains the total number of user ratings.

I will start by generating a frequency table for the `prime_genre` column to get the unique app genres of the App Store dataset.

In [23]:
genres_ios = freq_table(ios_final, -5) # frequency table about prime_genre

for genre in genres_ios:
    total = 0 # variable to store the sum of user ratings
    len_genre = 0 # variable to store the number of apps specific to each genre
    for app in ios_final:
        genre_app = app[-5]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre # average number of ratings per app in the genre
    print(genre, ":", avg_n_ratings)

Shopping : 26919.690476190477
Music : 57326.530303030304
Weather : 52279.892857142855
Productivity : 21028.410714285714
Navigation : 86090.33333333333
Sports : 23008.898550724636
Games : 22788.6696905016
News : 21248.023255813954
Food & Drink : 33333.92307692308
Education : 7003.983050847458
Social Networking : 71548.34905660378
Business : 7491.117647058823
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Photo & Video : 28441.54375
Utilities : 18684.456790123455
Reference : 74942.11111111111
Book : 39758.5
Finance : 31467.944444444445
Medical : 612.0
Travel : 28243.8
Catalogs : 4004.0
Health & Fitness : 23298.015384615384


The genres with the highest average number of ratings are Navigation, Reference, Social Networking and Music. If we explore these genres a bit, we will see that the averages are heavily influenced by a few extremely popular apps that have 100,000+ ratings, while the others struggle to get past the 10,000 threshold.

In [24]:
for app in ios_final:
    if app[-5] == "Navigation":
        print(app[1], app[5])

Waze - GPS Navigation, Maps & Real-time Traffic 345046
Google Maps - Navigation & Transit 154911
Geocaching® 12811
CoPilot GPS – Car Navigation & Offline Maps 3582
ImmobilienScout24: Real Estate Search in Germany 187
Railway Route Search 5


The Navigation category is dominated by 2 main players, Waze and Google Maps, while the others have insignificant numbers. Altogether it's not a popular category, so I wouldn't recommend that my imaginary company develop Navigation apps.
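
One way to make this skew visible is to compare the mean with the median, which is much less sensitive to a few very large values. The sketch below uses the Navigation rating counts printed above and the standard `statistics` module. It is an illustration, not part of the original analysis.

```python
from statistics import mean, median

# Rating counts of the Navigation apps printed above
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

print(mean(navigation_ratings))    # about 86090.33, the genre average reported earlier
print(median(navigation_ratings))  # 8196.5
```

The median is roughly ten times smaller than the mean, which supports the reading that two apps account for almost all of the genre's ratings.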

In [25]:
for app in ios_final:
    if app[-5] == "Reference":
        print(app[1], app[5])

Bible 985920
Dictionary.com Dictionary & Thesaurus 200047
Dictionary.com Dictionary & Thesaurus for iPad 54175
Google Translate 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition 17588
Merriam-Webster Dictionary 16849
Night Sky 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools 4693
GUNS MODS for Minecraft PC Edition - Mods Tools 1497
Guides for Pokémon GO - Pokemon GO News and Cheats 826
WWDC 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free 718
VPN Express 14
Real Bike Traffic Rider Virtual Reality Glasses 8
教えて!goo 0
Jishokun-Japanese English Dictionary & Translator 0


The number one app in the Reference category is the Bible, followed by dictionaries and online game manuals. It could be interesting to check the reviews of these apps and try to come up with a new religion or dictionary app infused with some fun features, such as a quote of the day, quizzes, or seeing what other people highlight while reading.

In [26]:
for app in ios_final:
    if app[-5] == "Social Networking":
        print(app[1], app[5])

Facebook 2974676
Pinterest 1061624
Skype for iPhone 373519
Messenger 351466
Tumblr 334293
WhatsApp Messenger 287589
Kik 260965
ooVoo – Free Video Call, Text and Voice 177501
TextNow - Unlimited Text + Calls 164963
Viber Messenger – Text & Call 164249
Followers - Social Analytics For Instagram 112778
MeetMe - Chat and Meet New People 97072
We Heart It - Fashion, wallpapers, quotes, tattoos 90414
InsTrack for Instagram - Analytics Plus More 85535
Tango - Free Video Call, Voice and Chat 75412
LinkedIn 71856
Match™ - #1 Dating App. 60659
Skype for iPad 60163
POF - Best Dating App for Conversations 52642
Timehop 49510
Find My Family, Friends & iPhone - Life360 Locator 43877
Whisper - Share, Express, Meet 39819
Hangouts 36404
LINE PLAY - Your Avatar World 34677
WeChat 34584
Badoo - Meet New People, Chat, Socialize. 34428
Followers + for Instagram - Follower Analytics 28633
GroupMe 28260
Marco Polo Video Walkie Talkie 27662
Miitomo 23965
SimSimi 23530
Grindr - Gay and same sex guys chat, meet

The data shows that the Social Networking market is also unbalanced: a few giant players account for most of the ratings. It would be hard to compete with them, but I think there is still a market for niche networking apps. One example is an app that connects students or career changers with people who already work in the field they are interested in and who are willing to answer questions or act as mentors.

Now let's analyze the Google Play market a bit.

# Most Popular Apps by Genre on Google Play

For the App Store, I had to base my recommendations on the number of user ratings. Luckily, I have data about the number of installs for the Google Play apps, so I can get a clearer picture of genre popularity.

The only issue is that the install numbers are open-ended:

In [27]:
display_table(android_final, 5) # installs column

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


I don't need perfect precision about the number of users, these ranges are good enough for my comparison. 

I will leave the numbers as they are, which means I'll consider that an app with 100+ installs has 100 installs, and an app with 5,000+ installs has 5,000 installs and so on. 

I will need to convert the strings to floats by removing the commas and the plus signs. After that, I will calculate the average number of installs per category.
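
The conversion described above can be isolated in a small helper. `parse_installs` is a hypothetical name, and the result is the lower bound of the open-ended range, following the simplification described above.

```python
def parse_installs(installs):
    """Convert a string such as '1,000,000+' to a float (the lower bound of the range)."""
    return float(installs.replace(',', '').replace('+', ''))

print(parse_installs('1,000,000+'))  # 1000000.0
print(parse_installs('0'))           # 0.0
```

Keeping the parsing in one place means the same rule is applied wherever install counts are compared or averaged.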

In [28]:
categories_android = freq_table(android_final, 1)

for category in categories_android:
    total = 0 # variable to store the sum of installs specific to each genre
    len_category = 0 # variable to store the number of apps specific to each genre
    for app in android_final:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
            
    avg_installs = (total / len_category)
    print(category, " : ", avg_installs)

SPORTS  :  3638640.1428571427
EDUCATION  :  1833495.145631068
NEWS_AND_MAGAZINES  :  9549178.467741935
HEALTH_AND_FITNESS  :  4188821.9853479853
AUTO_AND_VEHICLES  :  647317.8170731707
MAPS_AND_NAVIGATION  :  4056941.7741935486
BOOKS_AND_REFERENCE  :  8767811.894736841
FAMILY  :  3695641.8198090694
VIDEO_PLAYERS  :  24727872.452830188
TRAVEL_AND_LOCAL  :  13984077.710144928
FINANCE  :  1387692.475609756
DATING  :  854028.8303030303
EVENTS  :  253542.22222222222
LIFESTYLE  :  1437816.2687861272
PERSONALIZATION  :  5201482.6122448975
COMMUNICATION  :  38456119.167247385
GAME  :  15588015.603248259
SHOPPING  :  7036877.311557789
MEDICAL  :  120550.61980830671
LIBRARIES_AND_DEMO  :  638503.734939759
BUSINESS  :  1712290.1474201474
HOUSE_AND_HOME  :  1331540.5616438356
ENTERTAINMENT  :  11640705.88235294
PRODUCTIVITY  :  16787331.344927534
FOOD_AND_DRINK  :  1924897.7363636363
SOCIAL  :  23253652.127118643
COMICS  :  817657.2727272727
ART_AND_DESIGN  :  1986335.0877192982
BEAUTY  :  513151.

The most popular category is Communication, with an average of about 38 million installs per app, but again the figure is heavily influenced by a few giants:

In [29]:
for app in android_final:
    if app[1] == 'COMMUNICATION' and app[5] == '1,000,000,000+':
        print(app[0], ":", app[5])

WhatsApp Messenger : 1,000,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
Skype - free IM & video calls : 1,000,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+


If I remove the apps that have 100 million or more installs, the average shrinks roughly tenfold:

In [30]:
under_100m = []

for app in android_final:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        under_100m.append(float(n_installs))
print(sum(under_100m) / len(under_100m))    

3603485.3884615386


We can see the same pattern for the categories that are next in line:

* Video players: YouTube, Google Play Movies & TV
* Social: Facebook, Instagram, Google+
* Productivity: Microsoft Word, Dropbox, Google Calendar

The problem is that these few popular apps can mislead us when judging how popular their category really is. Moreover, it's hard to compete against the giants.

The Books and Reference category seems popular on Google Play too. As my goal is to recommend an app genre that has potential on iOS as well, it makes sense to explore it:
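
To check how much the giants distort every category at once, the average could be recomputed in a single pass with an optional cutoff. This is a sketch under assumptions: `avg_installs_by_category` and `max_installs` are hypothetical names, and the default column indexes match the Google Play rows used above.

```python
from collections import defaultdict

def avg_installs_by_category(dataset, max_installs=None,
                             category_index=1, installs_index=5):
    """Average installs per category, optionally ignoring apps at or above `max_installs`."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        installs = float(row[installs_index].replace(',', '').replace('+', ''))
        if max_installs is not None and installs >= max_installs:
            continue  # skip the giants
        totals[row[category_index]] += installs
        counts[row[category_index]] += 1
    return {category: totals[category] / counts[category] for category in totals}

# Usage, assuming `android_final` from the cells above:
# avg_installs_by_category(android_final, max_installs=100_000_000)
```

Comparing the result with and without the cutoff shows which categories stay popular once the dominant apps are set aside. A category made up only of apps above the cutoff simply disappears from the result.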

In [31]:
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ":", app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

This category consists mostly of dictionaries, e-books and software development manuals. The Bible and the Quran are fairly popular too, so the idea I had after the iOS analysis still makes sense.

Let's look at the Social category by listing the medium-popular apps:

In [33]:
for app in android_final:
    if app[1] == 'SOCIAL' and (app[5] == '1,000,000+'
                                or app[5] == '5,000,000+'
                                or app[5] == '10,000,000+'
                                or app[5] == '50,000,000+'):
        print(app[0], ":", app[5])

TextNow - free text + calls : 10,000,000+
The Messenger App : 1,000,000+
Messenger Pro : 1,000,000+
Free Messages, Video, Chat,Text for Messenger Plus : 1,000,000+
Telegram X : 5,000,000+
Jodel - The Hyperlocal App : 1,000,000+
Hide Something - Photo, Video : 5,000,000+
Love Sticker : 1,000,000+
Web Browser & Fast Explorer : 5,000,000+
LiveMe - Video chat, new friends, and make money : 10,000,000+
VidStatus app - Status Videos & Status Downloader : 5,000,000+
Love Images : 1,000,000+
SPARK - Live random video chat & meet new people : 5,000,000+
Facebook Local : 1,000,000+
Meet – Talk to Strangers Using Random Video Chat : 5,000,000+
MobilePatrol Public Safety App : 1,000,000+
💘 WhatsLov: Smileys of love, stickers and GIF : 1,000,000+
HTC Social Plugin - Facebook : 10,000,000+
Quora : 10,000,000+
Kate Mobile for VK : 10,000,000+
Family GPS tracker KidControl + GPS by SMS Locator : 1,000,000+
Moment : 1,000,000+
Text Me: Text Free, Call Free, Second Phone Number : 10,000,000+
Text Free: 

Developing a social app still seems to be a valid idea: plenty of apps in this genre were installed by millions of people, so there is room for new social apps. I can even imagine work-related blog posts that help people decide which career suits them. Professionals could update their feed with photos of their daily job and short stories about accomplishments or stressful situations at work, and we could even engage enterprises to use our app for employer branding.

# Conclusions

In this project, I analyzed data about Google Play and App Store mobile apps with the purpose of finding an app profile that can be profitable to develop for both markets.

My conclusion is to focus on the Books and Social genres.