# Marketing of a new profitable app for the Google Play and App Store markets

The aim of this project is to find a viable, free-to-download app that can be developed further and marketed in both the Google Play and App Store markets. The app's main source of revenue will be in-app ads.

We will explore the data sets for both app stores, isolate English-language apps, and clean the data by removing duplicates and non-free apps. We will then break the results down by genre/category and by install base.

Cleaning and filtering the data will allow us to recommend the genre/category that gives a new, free app the best chance to enter both mobile markets, attract new users, and become profitable.


## Opening and Exploring Data

The data sets we are working with contain information about Android apps on the Google Play Store and iOS apps on the App Store. As of September 2018, there were approximately 2.1 million Android apps and 2 million iOS apps.

Collecting and analyzing data for millions of apps would be costly, so we will instead work with two freely available sample data sets that should still provide meaningful information.

There are two data sets that we will use, one for each of the app markets:
- A [data set](https://www.kaggle.com/lava18/google-play-store-apps/home) containing data about approximately ten thousand Android apps from Google Play
- A [data set](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home) containing data about approximately seven thousand iOS apps from the App Store

We start by opening the two data sets and then continue with exploring the data.

```python
from csv import reader

# The Google Play data set
# encoding='utf8' avoids garbled app names; newline='' is recommended for csv files.
with open('googleplaystore.csv', encoding='utf8', newline='') as opened_file:
    android = list(reader(opened_file))
android_header = android[0]
android = android[1:]

# The App Store data set
with open('AppleStore.csv', encoding='utf8', newline='') as opened_file:
    ios = list(reader(opened_file))
ios_header = ios[0]
ios = ios[1:]
```

```python
def explore_data(dataset, start, end, rows_and_columns=False):
    """Print rows start to end-1 of `dataset`, optionally followed by its dimensions."""
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')  # adds an empty line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
```

```python
print(android_header)
print('\n')
explore_data(android, 0, 3, True)
```

```
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13
```


The Google Play data set contains 10,841 apps. The key columns to focus on are 'App', 'Rating', 'Installs', and 'Price'.

Information about all the columns can be found [here](https://www.kaggle.com/lava18/google-play-store-apps/home).

```python
print(ios_header)
print('\n')
explore_data(ios, 0, 3, True)
```

```
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16
```


The App Store data set contains 7,197 apps. The key columns to focus on are 'track_name', 'currency', 'price', and 'user_rating'.

Information pertaining to all the columns can be found [here](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home).

## Deleting Wrong Data

In the Kaggle discussion forum for the Google Play data set, other users have reported an error in row 10472. We will take a look at this row and explain why we remove it from our analysis.

```python
print(android[10472])  # row reported as erroneous (run before deleting it)
print('\n')
print(android_header)  # header row, for comparison
print('\n')
print(android[0])      # a well-formed row, for comparison
```

```
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']
```

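```python
# Row 10472 lacks a 'Category' value, so it has one column fewer than the header.
# The length check keeps this cell from deleting a valid row if it is run twice.
if len(android[10472]) != len(android_header):
    del android[10472]
```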

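We deleted row 10472, which corresponds to the "Life Made WI-Fi Touchscreen Photo Frame" app. Compared with the header, the row has only 12 values instead of 13: its 'Category' value is missing, so every later value is shifted one column to the left. As a result, its 'Rating' reads 19, even though the maximum rating on Google Play is 5.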
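A more general way to catch rows like this is to compare every row's length with the header's length. The sketch below uses a hypothetical helper, `find_malformed_rows`, that is not part of the original analysis, and runs on a small made-up table so it is self-contained.

```python
def find_malformed_rows(header, rows):
    """Return (index, row) pairs for rows whose column count differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]


# Hypothetical three-column table; the second row is missing its 'Category' value.
header = ['App', 'Category', 'Rating']
rows = [
    ['App A', 'TOOLS', '4.2'],
    ['App B', '1.9'],
    ['App C', 'GAME', '4.5'],
]

for index, row in find_malformed_rows(header, rows):
    print(index, row)
```

Run against the Google Play data before the deletion, `find_malformed_rows(android_header, android)` would be expected to include row 10472; that is an expectation, not a result shown in this notebook.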

# Removing Duplicate Entries

## Part One

Next, we checked the Google Play data set for duplicate entries. Keeping duplicates would compromise the validity of the conclusions we draw from the data, because some apps would be counted more than once.

```python
duplicate_apps = []
unique_apps = set()  # a set makes the membership test fast

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.add(name)

print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[0:3])
```

```
Number of duplicate apps: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business']
```


There are 1,181 duplicate entries in the data set (each extra copy of an app counts once).

For each duplicated app, we will keep only the entry with the highest number of user reviews and remove the others. A higher review count suggests that the entry was collected more recently.
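The strategy can be sketched as a single pass that remembers, for each app name, the row with the most reviews seen so far. `deduplicate_by_max_reviews` is a hypothetical helper and the rows are made up; the notebook itself builds a dictionary of maximum review counts and then filters the rows in a second pass.

```python
def deduplicate_by_max_reviews(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the row with the highest review count.

    If two copies tie on reviews, the first one seen is kept.
    """
    best = {}  # app name -> row with the most reviews seen so far
    for row in rows:
        name = row[name_index]
        if name not in best or float(row[reviews_index]) > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Hypothetical rows shaped like the Google Play data: [App, Category, Rating, Reviews].
rows = [
    ['App A', 'BUSINESS', '4.2', '100'],
    ['App B', 'TOOLS', '4.4', '50'],
    ['App A', 'BUSINESS', '4.2', '120'],
    ['App A', 'BUSINESS', '4.2', '100'],
]

for row in deduplicate_by_max_reviews(rows):
    print(row)
```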

## Part Two

We start by building a dictionary that maps each app name to the highest number of reviews recorded for it.

```python
# Map each app name to the highest number of reviews seen for that name.
reviews_max = {}

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if name not in reviews_max or reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews

print('Expected length:', len(android) - len(duplicate_apps))
print('Actual length:', len(reviews_max))
```

```
Expected length: 9659
Actual length: 9659
```


The dictionary contains one key per unique app, each mapped to that app's highest review count. Its length matches the expected length, which is the number of rows minus the 1,181 duplicates.

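```python
android_clean = []
already_added = set()  # names already copied to android_clean

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    # Keep the row with the most reviews. The already_added check matters because
    # some duplicates share the same maximum review count.
    if reviews_max[name] == n_reviews and name not in already_added:
        android_clean.append(app)
        already_added.add(name)

print(len(android_clean))
```

```
9659
```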


# Removing Non-English Apps


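Next, we want to remove non-English apps. Characters used in English text fall within the ASCII (American Standard Code for Information Interchange) range, code points 0 to 127, so we can use Python's built-in ord() function to get each character's code point and count how many fall outside that range. The function below treats a name as non-English only when it contains more than three such characters.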

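```python
def is_English(string):
    """Return False if `string` has more than three characters outside ASCII (0-127)."""
    non_ascii = 0
    for character in string:
        if ord(character) > 127:
            non_ascii += 1

    # Allow up to three non-ASCII characters, such as ™ or an emoji.
    return non_ascii <= 3
```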

In [12]:
print(is_English('Instagram'))

print(is_English('Áà±Â•áËâ∫PPS -„ÄäÊ¨¢‰πêÈ¢Ç2„ÄãÁîµËßÜÂâßÁÉ≠Êí≠'))

print(is_English('Docs To Go‚Ñ¢ Free Office Suite'))

print(is_English('Instachat'))




True
False
True
True


Some English app names contain characters outside the ASCII range, such as the trademark symbol (™) or emoji. Discarding every name with a non-ASCII character would remove apps we want to keep in the data set. The ord() function shows the code points of two such characters:

```python
print(ord('™'))
print(ord('😜'))
```

```
8482
128540
```


This is why is_English tolerates up to three non-ASCII characters before classifying a name as non-English. The filter is not perfect, since some non-English names may slip through and some English names with many symbols may be removed, but it still filters out a significant number of non-English apps.

In [14]:
print(is_English('Docs To Go‚Ñ¢ Free Office Suite'))
print(is_English('Instachat üòú'))
print(is_English('Áà±Â•áËâ∫PPS -„ÄäÊ¨¢‰πêÈ¢Ç2„ÄãÁîµËßÜÂâßÁÉ≠Êí≠'))


True
True
False


As shown above, is_English keeps the Docs To Go and Instachat apps while still rejecting the Chinese-language name, so these entries will stay in the data. We now apply the function to both data sets.

```python
android_English = []
ios_English = []

for app in android_clean:
    name = app[0]
    if is_English(name):
        android_English.append(app)

for app in ios:
    name = app[1]
    if is_English(name):
        ios_English.append(app)

explore_data(android_English, 0, 3, True)
print('\n')
explore_data(ios_English, 0, 3, True)
```

```
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 6183
Number of columns: 16
```

After running every app name through the is_English function, 9,614 Android apps and 6,183 iOS apps remain.

Because we are most concerned with apps that are free to download, we will now isolate the free apps in each data set.
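Comparing prices as exact strings works only if every free app is stored as exactly '0' (Google Play) or '0.0' (App Store). A slightly more tolerant check parses the price as a number first. `is_free` is a hypothetical helper, and the sample prices are made up to illustrate the two formats.

```python
def is_free(price):
    """Return True if a price string such as '0', '0.0' or '$4.99' equals zero."""
    # Google Play prices may carry a leading '$'; App Store prices are plain numbers.
    return float(price.replace('$', '')) == 0.0


prices = ['0', '$4.99', '0.0', '0.99']
print([is_free(price) for price in prices])  # [True, False, True, False]
```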

```python
android_free = []
ios_free = []

for app in android_English:
    price = app[7]  # 'Price' column
    if price == '0':
        android_free.append(app)

for app in ios_English:
    price = app[4]  # 'price' column
    if price == '0.0':
        ios_free.append(app)

print('Number of free Android apps:', len(android_free))
print('Number of free iOS apps:', len(ios_free))
```

```
Number of free Android apps: 8864
Number of free iOS apps: 3222
```


# Most Common Apps by Genre

## Part One

Up until this point we have focused on cleaning the data by:

- Removing inaccurate data
- Removing duplicate entries
- Removing non-English apps
- Isolating free apps

We focused on free apps because our goal is to attract as many users as possible. Since the app's revenue comes mainly from in-app ads, it depends heavily on the number of people using the app, and making the app free allows for the largest possible user base.

To best achieve this goal, we are going to:

1. Build an Android version of the app.
2. If the app is well received, continue to enhance it through further development.
3. If the app is profitable after six months, build an iOS version of the app.

Because the ideal scenario is to have the app on both Google Play and the App Store, we want to find traits that make an app successful in both markets. One way to do this is to find the most common genres in each market using frequency tables for the 'Genres' and 'Category' columns of the Google Play data set and the 'prime_genre' column of the App Store data set.

## Part Two

We will now define two functions to build and display frequency tables for the app genres:

- One function will generate the frequency tables to show percentages
- The other function will sort the percentages in descending order
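The same two steps can be expressed more compactly with the standard library's collections.Counter. This is an alternative sketch, not the code that produced the tables below; the function names and sample rows are hypothetical.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Percentage frequency table for column `index`, built with collections.Counter."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {value: count / total * 100 for value, count in counts.items()}


def sorted_table(table):
    """Return (value, percentage) pairs sorted from most to least common."""
    return sorted(table.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rows where column 1 holds the category.
rows = [['a', 'GAME'], ['b', 'TOOLS'], ['c', 'GAME'], ['d', 'FAMILY']]
for category, percentage in sorted_table(freq_table_counter(rows, 1)):
    print(category, ':', percentage)
```

One difference from the tuple-based sort below: values with equal percentages keep their first-seen order here instead of being ordered by name.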

```python
def freq_table(dataset, index):
    """Return a dict mapping each value in column `index` to its share of rows, in percent."""
    table = {}
    for row in dataset:
        value = row[index]
        table[value] = table.get(value, 0) + 1

    total = len(dataset)
    return {key: (count / total) * 100 for key, count in table.items()}


def display_table(dataset, index):
    """Print the frequency table for column `index`, sorted in descending order."""
    table = freq_table(dataset, index)
    # Sort (percentage, key) tuples so the most common values come first.
    table_sorted = sorted(((percentage, key) for key, percentage in table.items()), reverse=True)
    for percentage, key in table_sorted:
        print(key, ':', percentage)
```

## Part Three

We now use these functions to build frequency tables for the 'Category' and 'Genres' columns of the Google Play data set and the 'prime_genre' column of the App Store data set.

```python
display_table(android_free, 1)  # Category
```

```
FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 
```

```python
display_table(android_free, 9)  # Genres
```

```
Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075
```

```python
display_table(ios_free, 11)  # prime_genre
```

```
Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665
```


The frequency tables above let us draw some conclusions about free, English-language apps on Google Play and the App Store. Looking at the 'Category' and 'Genres' columns for Google Play and the 'prime_genre' column for the App Store, we notice some trends in each data set.

In the 'Category' column of the Google Play data set, the three most common categories are:
 - FAMILY : 18.91%
 - GAME : 9.72%
 - TOOLS : 8.46%
 
In the 'Genres' column of the Google Play data set, the three most common genres are:

 - Tools : 8.45%
 - Entertainment : 6.07%
 - Education : 5.35%
 

Together, these columns suggest that the most common apps are designed for families, games, and tools. Although Family apps make up 18.91% of the data, a closer breakdown shows that the category consists mostly of games. Still, Google Play has a diverse spread of app types, with a larger share skewed toward games.

The 'Genres' column offers a finer level of granularity for categorizing each app. It shows a balance between apps for practical purposes (education, shopping, utilities, productivity, lifestyle) and apps for entertainment (games, photo and video, social networking, sports, music). While this is informative, we prefer a more concise categorization, so from here on we will use the 'Category' column.

In the 'prime_genre' column of the App Store data set, the three most common genres are:
 - Games : 58.16%
 - Entertainment : 7.88%
 - Photo & Video : 4.97%

App Store apps are skewed heavily toward entertainment: Games account for 58% of the apps, followed by Entertainment at 8%.

Compared with Google Play, the App Store shows far less diversity, with a much larger share of game apps.

In short, Google Play has a more balanced mix of app types, while the App Store has a much larger share of entertainment apps. These tables show how many apps each genre contains, but not how many users each genre attracts, so we will now look at the number of users per category/genre.

# Most Popular Apps by Genre on the Google Play Store

The Google Play data set includes the number of installs for each app. However, the numbers are grouped into open-ended ranges (100+, 1,000+, 5,000+, etc.) rather than exact counts. Because we only want to determine which genres attract the most users, we don't need precise numbers. We will use the lower bound of each range: an app with 1,000+ installs counts as 1,000, an app with 100,000+ as 100,000, and so on.

Note that these values are stored as strings, so we need to remove the '+' and ',' characters and convert them to floats before we can compute averages.
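The conversion can be wrapped in a small helper. `parse_installs` is a hypothetical name; the cells below perform the same `replace` and `float` steps inline.

```python
def parse_installs(installs):
    """Convert an install range such as '1,000,000+' to its lower bound as a float."""
    return float(installs.replace(',', '').replace('+', ''))


print(parse_installs('1,000,000+'))  # 1000000.0
print(parse_installs('0'))           # 0.0
```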

```python
display_table(android_free, 5)  # the Installs column
```

```
1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343
```


```python
# Average number of installs per category
num_android_users = freq_table(android_free, 1)

for category in num_android_users:
    total = 0
    len_category = 0
    for app in android_free:
        category_app = app[1]
        if category_app == category:
            # Strip '+' and ',' so that '1,000,000+' becomes 1000000.0.
            n_installs = app[5].replace('+', '').replace(',', '')
            total += float(n_installs)
            len_category += 1

    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)
```

```
LIBRARIES_AND_DEMO : 638503.734939759
COMMUNICATION : 38456119.167247385
AUTO_AND_VEHICLES : 647317.8170731707
COMICS : 817657.2727272727
DATING : 854028.8303030303
HEALTH_AND_FITNESS : 4188821.9853479853
NEWS_AND_MAGAZINES : 9549178.467741935
TRAVEL_AND_LOCAL : 13984077.710144928
GAME : 15588015.603248259
FINANCE : 1387692.475609756
ART_AND_DESIGN : 1986335.0877192982
EDUCATION : 1833495.145631068
FOOD_AND_DRINK : 1924897.7363636363
MAPS_AND_NAVIGATION : 4056941.7741935486
HOUSE_AND_HOME : 1331540.5616438356
PHOTOGRAPHY : 17840110.40229885
EVENTS : 253542.22222222222
BUSINESS : 1712290.1474201474
BOOKS_AND_REFERENCE : 8767811.894736841
SOCIAL : 23253652.127118643
FAMILY : 3695641.8198090694
BEAUTY : 513151.88679245283
PERSONALIZATION : 5201482.6122448975
PARENTING : 542603.6206896552
PRODUCTIVITY : 16787331.344927534
LIFESTYLE : 1437816.2687861272
ENTERTAINMENT : 11640705.88235294
WEATHER : 5074486.197183099
MEDICAL : 120550.61980830671
SHOPPING : 7036877.311557789
TOOLS : 10801391.29
```

Communication apps have the highest average number of installs: 38,456,119. However, this average is heavily skewed by a handful of apps with over one billion installs (WhatsApp, Messenger, Skype, Google Chrome, Gmail, and Hangouts) and several more with 100 to 500 million installs:
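To see how a few very large apps can distort an average, compare the mean with the median on a small, made-up list of install counts. The median is a swapped-in measure here, not one this analysis uses; it is less sensitive to extreme values.

```python
from statistics import mean, median

# Hypothetical install counts for ten apps in one category: nine modest apps
# and a single app with one billion installs.
installs = [10_000, 50_000, 100_000, 100_000, 500_000,
            500_000, 1_000_000, 1_000_000, 5_000_000, 1_000_000_000]

print(mean(installs))    # pulled far upward by the single outlier
print(median(installs))  # middle value, barely affected by the outlier
```

Here the mean is about 100.8 million while the median is 500,000, which mirrors how the Communication average overstates what a typical app in that category achieves.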

```python
for app in android_free:
    if app[1] == 'COMMUNICATION' and app[5] in ('1,000,000,000+',
                                                '500,000,000+',
                                                '100,000,000+'):
        print(app[0], ':', app[5])
```

```
WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Me
```

Excluding the apps with 100 million or more installs reduces the average dramatically, from about 38.5 million to about 3.6 million:

```python
# Average installs of Communication apps with fewer than 100 million installs
under_100_m = []

for app in android_free:
    if app[1] != 'COMMUNICATION':
        continue
    n_installs = float(app[5].replace(',', '').replace('+', ''))
    if n_installs < 100_000_000:
        under_100_m.append(n_installs)

print(sum(under_100_m) / len(under_100_m))
```

```
3603485.3884615386
```

A similar pattern appears in the video players category, which averages 24,727,872 installs. Again, the market is dominated by a few apps such as YouTube, MX Player, and Google Play Movies & TV. The same holds for the social category, where giants like Facebook and Instagram account for a large share of the installs, and for the productivity category, where Microsoft Word, Dropbox, Google Calendar, and Evernote make up the largest segment of installs.

These findings suggest that these categories appear more popular than they really are. Because each one is dominated by a handful of apps, competing in them would be difficult.
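One way to put a number on "dominated by a handful of apps" is the share of a category's installs held by its few largest apps. The sketch assumes install counts have already been converted to numbers; `top_n_share` and both sample lists are hypothetical.

```python
def top_n_share(installs, n=3):
    """Fraction of a category's total installs held by its `n` largest apps."""
    ranked = sorted(installs, reverse=True)
    total = sum(ranked)
    return sum(ranked[:n]) / total if total else 0.0


# Hypothetical categories: one dominated by two giants, one with a flatter spread.
dominated = [1_000_000_000, 500_000_000, 100_000, 50_000, 10_000]
spread = [5_000_000, 5_000_000, 1_000_000, 1_000_000, 1_000_000, 1_000_000]

print(round(top_n_share(dominated), 2))
print(round(top_n_share(spread), 2))
```

A share close to 1 means a new app would be competing against a few incumbents for most of the category's users; a lower share suggests installs are spread more evenly.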

For a category recommendation, we could consider Books and Reference. It is also popular, with an average of 8,767,811 installs, and it contains a wide variety of apps. Given our goal of finding an app with potential on both Google Play and the App Store, this category is worth keeping in mind.

Let's take a closer look at the apps in the Books and Reference category and how their installs are spread:

```python
for app in android_free:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])
```

```
E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
```

The install counts show more diversity, but a few apps still skew the average:

```python
for app in android_free:
    if app[1] == 'BOOKS_AND_REFERENCE' and app[5] in ('1,000,000,000+',
                                                      '500,000,000+',
                                                      '100,000,000+'):
        print(app[0], ':', app[5])
```

```
Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+
```


Only five apps have 100 million or more installs, so there is still room for new entrants without facing overwhelming competition from dominant apps in this category. We can look further into the popularity of apps in this category by listing those with between 1,000,000+ and 50,000,000+ installs:

```python
for app in android_free:
    if app[1] == 'BOOKS_AND_REFERENCE' and app[5] in ('1,000,000+',
                                                      '5,000,000+',
                                                      '10,000,000+',
                                                      '50,000,000+'):
        print(app[0], ':', app[5])
```

```
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra – free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+
```

Most of the apps in this range are e-book readers, along with collections of libraries and dictionaries.

Several apps are dedicated to reading the Quran, which suggests that an app built around a popular book can be both popular and potentially profitable on both Google Play and the App Store.

To stand out, the app should offer features beyond basic reading. Additional user functions could help it differentiate itself from the many apps already in this category.

# Most Popular Apps by Genre on the App Store

Now we will look at the most popular apps on the App Store. While install counts are available for Google Play apps, the App Store data set does not include them. Instead, we will use the average number of user ratings per genre, from the 'rating_count_tot' column, as a proxy for the number of users.
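The nested loop below revisits every app once per genre. A single pass with running totals gives the same averages. `average_by_group` is a hypothetical helper and the sample rows are made up; on the real data the call would be `average_by_group(ios_free, 11, 5)` (prime_genre and rating_count_tot), an untested assumption.

```python
from collections import defaultdict


def average_by_group(rows, group_index, value_index):
    """Average of the numeric column `value_index` for each value of `group_index`."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Hypothetical rows: [name, genre, rating count].
rows = [['App A', 'Navigation', '300'], ['App B', 'Navigation', '100'], ['App C', 'Games', '50']]

averages = average_by_group(rows, 1, 2)
# Print genres from the highest to the lowest average rating count.
for genre, avg in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', avg)
```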

```python
# Average number of user ratings (rating_count_tot, index 5) per genre (prime_genre, index 11)
num_ios_users = freq_table(ios_free, 11)

for genre in num_ios_users:
    total = 0
    len_genre = 0
    for app in ios_free:
        genre_app = app[11]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1

    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)
```

From the results, the most popular genre by average number of ratings is Navigation. This could be due to the widespread use and review of Google Maps and Waze.

```python
for app in ios_free:
    if app[11] == 'Navigation':
        print(app[1], ':', app[5])
```

```
Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5
```


In line with our assumption, Waze and Google Maps account for the bulk of the ratings by a large margin.

The Reference genre comes in second by average number of ratings. We can see which apps it contains using the same loop as above.

```python
for app in ios_free:
    if app[11] == 'Reference':
        print(app[1], ':', app[5])
```

```
Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0
```


The Reference genre's ratings come largely from Bible and dictionary apps. There could be an opportunity here for a new app: there is less competition, and compared with Google Play, the App Store lacks practical reference apps. A free reference app could fill one of these gaps in the App Store.

# Conclusion

After cleaning and analyzing the data for both the Google Play Store and the iOS App Store, we can make a recommendation for a new, profitable app for both mobile markets.

An app built around a popular book, offering features beyond basic reading capabilities, has the potential to be a profitable endeavor on both app stores.