## Profitable App Profiles for the App Store and Google Play

In this project we look at which types of apps are likely to attract the most users.  The apps are free to download and install, and the main source of revenue is in-app ads.  For any given app, revenue is mostly influenced by the number of users of that app.<br>We will collect and analyze data about mobile apps available on Google Play and the App Store.<br>
* The Google Play data set containing approximately 10,000 Android apps can be found [here](https://www.kaggle.com/lava18/google-play-store-apps)
* The App Store data set containing approximately 7,000 iOS apps can be found [here](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps)

These datasets are from 2018 and 2017, respectively.

In [1]:
import pandas as pd
import csv
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline 

In [2]:
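def create_file_list(file):
    # Read a CSV file into a list of rows; the with-block closes the file afterwards.
    with open(file, encoding='utf8') as file_open:
        file_read=csv.reader(file_open)
        return list(file_read)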
ios_all=create_file_list('AppleStore.csv')
ios_header=ios_all[0]
ios=ios_all[1:]
android_all=create_file_list('googleplaystore.csv')
android_header=android_all[0]
android=android_all[1:]

In [3]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice=dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:',len(dataset[0]))

In [4]:
#iOS
print('Headers-iOS','\n','\n',ios_header,'\n')
print('First few rows of data-iOS','\n')
print(explore_data(ios, 0,2,rows_and_columns=True))

Headers-iOS 
 
 ['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'] 

First few rows of data-iOS 

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows: 7197
Number of columns: 16
None


Not all of the column headers are self-explanatory; see the data documentation [here](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) for descriptions.  At first glance, some of the columns that may be useful are `id`, `track_name`, `price`, `rating_count_tot`, `user_rating`, `prime_genre`.

In [5]:
#Android
print('Headers-Android','\n','\n',android_header,'\n')
print('First few rows of data-Android','\n')
print(explore_data(android, 0,2,rows_and_columns=True))

Headers-Android 
 
 ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] 

First few rows of data-Android 

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13
None


Not all of the column headers are self-explanatory; see the data documentation [here](https://www.kaggle.com/lava18/google-play-store-apps) for descriptions.  At first glance, some of the columns that may be useful are
`App`, `Category`, `Rating`, `Reviews`, `Installs`, `Price`, `Genres`.

### Data Cleaning
* Detect inaccurate data, correct or remove
* Detect duplicate data, remove it
* Remove non-English apps (we are only interested in an English-speaking audience for this project)
* Remove apps that aren't free (we are only concerned with apps free to download and install for this project)

The Google Play [discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015) describes an error in row 10472.  Let us print this row and check it.  The category appears to be missing and the remaining values have shifted one column to the left.  We could research the correct category or delete the row.  Since it is just one row, we will delete it.

In [6]:
print(android_header)
print(android[10472])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


In [7]:
del android[10472]
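
The shifted row above has 12 values against 13 headers, so a length check is one way to catch this kind of error without relying on the discussion thread.  A minimal sketch, assuming a hypothetical helper named `find_malformed_rows` and small made-up sample rows rather than the real dataset:

```python
def find_malformed_rows(header, rows):
    """Return (index, row) pairs whose column count differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]

# Made-up sample data for illustration only.
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],            # category missing, values shifted left
    ['App C', 'GAME', '3.9'],
]

malformed = find_malformed_rows(sample_header, sample_rows)
print(malformed)
```

Applied to `android_header` and `android` before the deletion, a check like this would be expected to flag row 10472.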

### Removing Duplicates
The discussion section also indicates multiple duplicate entries.  We will define a function to create a list of the names of the duplicate apps and a list of the names of the unique apps.  We don't want to delete duplicates at random; first we must explore them to see how they differ and decide which one to keep.

In [8]:
#Function to create list of duplicate and unique apps
def duplicate_apps(app_list):
    duplicate_apps=[]
    unique_apps=[]
    for app in app_list:
        name=app[0]
        if name in unique_apps:
            duplicate_apps.append(name)
        else:
            unique_apps.append(name)
    print('Number of duplicate apps:', len(duplicate_apps),'\n')
    print('Examples of duplicate apps:', duplicate_apps[:20])
    return duplicate_apps, unique_apps

In [9]:
#Android
android_duplicate_apps, android_unique_apps = duplicate_apps(android)

Number of duplicate apps: 1181 

Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software', 'MailChimp - Email, Marketing Automation', 'Crew - Free Messaging and Scheduling', 'Asana: organize team projects', 'Google Analytics', 'AdWords Express']


Let us look at the Slack entries in the Android apps list and determine a criterion for deleting duplicates.

In [10]:
for app in android:
    name=app[0]
    if name == 'Slack':
        print(app)

['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51510', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']


The only difference between these duplicates is the fourth column, `Reviews` (index 3).  We will keep the row with the highest number of reviews.

In [11]:
print('Expected length for Android apps after removing duplicates:', len(android)-len(android_duplicate_apps))

Expected length for Android apps after removing duplicates: 9659


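Now we will check the iOS apps for duplicates.  Note that `duplicate_apps` compares index 0, which is `id` in the iOS dataset rather than the app name.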

In [12]:
#iOS
ios_duplicate_apps, ios_unique_apps = duplicate_apps(ios)

Number of duplicate apps: 0 

Examples of duplicate apps: []


There are no duplicates in the iOS app list.

In [13]:
def max_reviews(app_list):
    reviews_max={}
    for app in app_list:
        name=app[0]
        n_reviews=float(app[3])
        if name in reviews_max and reviews_max[name]<n_reviews:
            reviews_max[name]=n_reviews
        if name not in reviews_max:
            reviews_max[name]=n_reviews
    return reviews_max
def clean_app_list(app_list):
    reviews_max=max_reviews(app_list)
    clean=[]
    already_added=[]
    for app in app_list:
        name=app[0]
        n_reviews=float(app[3])
        if n_reviews == reviews_max[name] and name not in already_added:
            clean.append(app)
            already_added.append(name)
    return clean, already_added
        

In [14]:
android_clean, android_already_added=clean_app_list(android)

In [15]:
len(android_clean)

9659
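
The `name not in already_added` test scans a list on every iteration, which gets slower as the list grows.  As a sketch of an alternative, a dictionary keyed by app name can hold the best row seen so far.  The function name `dedupe_by_reviews` and the sample rows are illustrative assumptions, and unlike `clean_app_list` the result is ordered by each name's first appearance.

```python
def dedupe_by_reviews(app_list, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}
    for app in app_list:
        name = app[name_idx]
        n_reviews = float(app[reviews_idx])
        # Strict '>' keeps the earliest row when review counts tie.
        if name not in best or n_reviews > float(best[name][reviews_idx]):
            best[name] = app
    return list(best.values())

# Made-up rows shaped like the Android data (name at 0, reviews at 3).
sample = [
    ['Slack', 'BUSINESS', '4.4', '51507'],
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Slack', 'BUSINESS', '4.4', '51510'],
]
deduped = dedupe_by_reviews(sample)
print(deduped)
```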

### Removing non-English apps
According to the [ASCII](https://en.wikipedia.org/wiki/ASCII) (American Standard Code for Information Interchange) system, the numbers corresponding to English characters range from 0 to 127.  We will create a function to remove the apps with non-English characters.<br>To account for English apps with emojis and characters like `™`, we will only remove apps whose names contain more than three characters outside the ASCII range.  This allows us to keep English apps with up to three special characters.

In [16]:
def english(string):
    count=0
    for char in string:
        if ord(char) > 127:
            count+=1
    if count > 3:
        return False
    else:
        return True
    

In [17]:
print(english('Instagram'))
print(english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(english('Docs To Go™ Free Office Suite'))
print(english('Instachat 😜'))

True
False
True
True


In [18]:
android_english=[]
ios_english=[]
for app in android_clean:
    name=app[0]
    if english(name):
        android_english.append(app)
        
for app in ios:
    name=app[1]
    if english(name):
        ios_english.append(app)
        
print('Number of android english apps:',len(android_english))
print('Number of non english android apps deleted:',len(android_clean)-len(android_english))

print('Number of ios english apps:',len(ios_english))
print('Number of non english ios apps deleted:',len(ios)-len(ios_english))
        
        

Number of android english apps: 9614
Number of non english android apps deleted: 45
Number of ios english apps: 6183
Number of non english ios apps deleted: 1014


### Isolate the apps that are free to download

In [19]:
print('Headers-android','\n','\n',android_header,'\n')
print(explore_data(android_english, 0,2,rows_and_columns=True))
print(25 *'-')
print('Headers-iOS','\n','\n',ios_header,'\n')
print(explore_data(ios_english, 0,2,rows_and_columns=True))

Headers-android 
 
 ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] 

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 9614
Number of columns: 13
None
-------------------------
Headers-iOS 
 
 ['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'] 

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '

We will loop through each dataset, isolate the free apps into separate lists, and see how many apps remain.  The `price` index for iOS is 4.  The `Price` index for Android is 7.

In [20]:
def get_free(dataset, idx):
    free_apps=[]
    for app in dataset:
        price = app[idx]
        if price == '0' or price == '0.0':
            free_apps.append(app)
    return free_apps
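
The string comparison above works because free apps are recorded as `'0'` or `'0.0'` in these files.  A slightly more defensive sketch converts the price to a number first.  It assumes paid prices may carry a leading `$`; the helper name `is_free` is hypothetical.

```python
def is_free(price):
    """Return True when a price string represents zero."""
    cleaned = price.strip().lstrip('$')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Unparseable values are treated as not free rather than guessed at.
        return False

print(is_free('0'), is_free('0.0'), is_free('$4.99'), is_free('Everyone'))
```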

In [21]:
android_free=get_free(android_english, 7)
ios_free=get_free(ios_english, 4)
print('Headers-android','\n','\n',android_header,'\n')
print(explore_data(android_free, 0,2,rows_and_columns=True))
print(25 *'-')
print('Headers-iOS','\n','\n',ios_header,'\n')
print(explore_data(ios_free, 0,2,rows_and_columns=True))

Headers-android 
 
 ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] 

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 8864
Number of columns: 13
None
-------------------------
Headers-iOS 
 
 ['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'] 

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '

We have cleaned both datasets by:
* removing duplicates (keeping the entry with the most reviews)
* removing inaccurate data
* removing non-English apps
* isolating free apps

The aim is to determine the kinds of apps that are likely to attract more users because the revenue is influenced by the number of people using the apps.<br>
To minimize risk and overhead, the validation strategy is broken down into three steps:
1. Build a minimal Android version of the app, add it to Google Play
2. If the app has a good response, develop it further
3. If the app is profitable after six months, build an iOS version and add it to the App Store.

Ultimately, we want to add the app to both Google Play and the App Store.  Therefore, we need to find app profiles that are successful in both markets.  We begin our data exploration with the cleaned datasets.

### Data Exploration
#### Genres
In the Android dataset we will use `Category` and `Genres`, index 1 and 9.<br>
In the iOS dataset we will use `prime_genre`, index 11.<br>
We will create a function to generate frequency tables to show percentages and another function to display in descending order.

In [24]:
def freq_table(dataset, idx):
    freq_dict={}
    total_genre=len(dataset)
    for genre in dataset:
        genre=genre[idx]
        if genre in freq_dict:
            freq_dict[genre]+=(1/total_genre)*100
        else:
            freq_dict[genre]=(1/total_genre)*100
    return freq_dict
#Need to transform the dictionary into a list of tuples in order to sort in descending order.
def display_table(dataset, idx):
    table = freq_table(dataset, idx)
    table_display=[]
    for key in table:
        key_value_as_tuple=(table[key],key)
        table_display.append(key_value_as_tuple)
    
    table_sorted=sorted(table_display, reverse=True)
    for entry in table_sorted:
        print(entry[1],':',entry[0])
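
Adding `(1/total_genre)*100` once per row accumulates floating-point rounding error, which is why some percentages in the output below carry long trailing digits.  One possible alternative is to count first and divide once at the end.  This sketch uses `collections.Counter`; the name `freq_table_counts` and the sample rows are assumptions for illustration.

```python
from collections import Counter

def freq_table_counts(dataset, idx):
    """Return {value: percentage} for column idx, computed from integer counts."""
    counts = Counter(row[idx] for row in dataset)
    total = len(dataset)
    return {value: count / total * 100 for value, count in counts.items()}

# Made-up rows: app name at index 0, genre at index 1.
sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
table = freq_table_counts(sample, 1)

# Print from the largest percentage to the smallest, similar to display_table.
for value, pct in sorted(table.items(), key=lambda item: item[1], reverse=True):
    print(value, ':', pct)
```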

In [35]:
ios_genre=display_table(ios_free,11)

Games : 58.1626319056464
Entertainment : 7.883302296710134
Photo & Video : 4.965859714463075
Education : 3.6623215394165176
Social Networking : 3.2898820608317867
Shopping : 2.6070763500931133
Utilities : 2.5139664804469306
Sports : 2.1415270018621997
Music : 2.048417132216017
Health & Fitness : 2.0173805090006227
Productivity : 1.7380509000620747
Lifestyle : 1.5828677839851035
News : 1.3345747982619496
Travel : 1.2414649286157668
Finance : 1.1173184357541899
Weather : 0.8690254500310364
Food & Drink : 0.8069522036002481
Reference : 0.558659217877095
Business : 0.5276225946617009
Book : 0.4345127250155184
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


Among the free English apps in the iOS dataset, the `Games` genre comprises 58.2% of the apps, followed by `Entertainment` with 7.9% and `Photo & Video` with 5%.  The top genres lean towards entertainment rather than practical purposes such as travel, news, weather, or reference.  Based on this frequency table alone, the focus should be on gaming and entertainment apps.<br> To check whether entertainment genres are really the ones to focus on, we can look at the number of ratings and the ratings themselves.

In [36]:
android_genre=display_table(android_free, 9)

Tools : 8.449909747292507
Entertainment : 6.069494584837599
Education : 5.34747292418777
Business : 4.591606498194979
Productivity : 3.8921480144404565
Lifestyle : 3.8921480144404565
Finance : 3.7003610108303455
Medical : 3.5311371841155417
Sports : 3.46344765342962
Personalization : 3.3167870036101235
Communication : 3.2378158844765483
Action : 3.1024368231047053
Health & Fitness : 3.079873646209398
Photography : 2.944494584837555
News & Magazines : 2.7978339350180583
Social : 2.6624548736462152
Travel & Local : 2.3240072202166075
Shopping : 2.2450361010830324
Books & Reference : 2.14350180505415
Simulation : 2.041967509025268
Dating : 1.861462093862813
Arcade : 1.8501805054151597
Video Players & Editors : 1.771209386281586
Casual : 1.7599277978339327
Maps & Navigation : 1.398916967509025
Food & Drink : 1.2409747292418778
Puzzle : 1.1281588447653441
Racing : 0.9927797833935037
Role Playing : 0.9363718411552363
Libraries & Demo : 0.9363718411552363
Auto & Vehicles : 0.9250902527075828


In [37]:
android_category=display_table(android_free, 1)

FAMILY : 18.907942238266926
GAME : 9.724729241877363
TOOLS : 8.46119133574016
BUSINESS : 4.591606498194979
LIFESTYLE : 3.90342960288811
PRODUCTIVITY : 3.8921480144404565
FINANCE : 3.7003610108303455
MEDICAL : 3.5311371841155417
SPORTS : 3.3957581227436986
PERSONALIZATION : 3.3167870036101235
COMMUNICATION : 3.2378158844765483
HEALTH_AND_FITNESS : 3.079873646209398
PHOTOGRAPHY : 2.944494584837555
NEWS_AND_MAGAZINES : 2.7978339350180583
SOCIAL : 2.6624548736462152
TRAVEL_AND_LOCAL : 2.335288808664261
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.14350180505415
DATING : 1.861462093862813
VIDEO_PLAYERS : 1.7937725631768928
MAPS_AND_NAVIGATION : 1.398916967509025
FOOD_AND_DRINK : 1.2409747292418778
EDUCATION : 1.1620036101083042
ENTERTAINMENT : 0.9589350180505433
LIBRARIES_AND_DEMO : 0.9363718411552363
AUTO_AND_VEHICLES : 0.9250902527075828
HOUSE_AND_HOME : 0.8235559566787015
WEATHER : 0.8009927797833946
EVENTS : 0.7107400722021667
PARENTING : 0.6543321299638993
ART_AND_DESIGN : 0.6

The Android apps show a different trend, with the distribution spread more evenly rather than one genre holding a >50% majority as on iOS.  The top Android genres are `Tools` (8.5%), `Entertainment` (6.1%), `Education` (5.3%), `Business` (4.6%).<br>
The top Android categories are `FAMILY` (18.9%), `GAME` (9.7%), `TOOLS` (8.5%).<br>
Android apps trend more towards lifestyle and practical purposes as opposed to gaming.  Games are still among the top Android apps but not the vast majority.  Further exploration of the games, entertainment, and education genres will help decide which apps to focus on.

#### Users
We will find out which genres have the most users to determine popularity.  We will use `Installs` in the Android dataset and `rating_count_tot` in the iOS dataset, both at index 5.<br>
We will calculate the average number of user ratings per app genre on the App Store:
* Isolate apps of each genre
* Sum up user ratings for the apps of that genre
* Divide sum by the number of apps belonging to that genre

In [66]:
#iOS

ios_freq=freq_table(ios_free,11)

for genre in ios_freq:
    total=0
    len_genre=0
    for app in ios_free:
        genre_app=app[11]
        
        if genre_app == genre:
            ratings=float(app[5])
            total+=ratings
            len_genre+=1

    average_ratings=total/len_genre
    print(genre,':',average_ratings)


Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0


Based on the number of ratings, the `Navigation` genre has the highest average at 86,090 user ratings per app, followed by `Reference` with 74,942 and `Social Networking` with 71,548.  This contrasts sharply with the frequency table, where `Navigation` makes up only 0.2% of the free apps.  The two most common genres, `Games` and `Entertainment`, fall in the lower middle of the pack by this measure.
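
The nested loop above rescans every app once per genre.  As a sketch, the same averages can be gathered in a single pass by keeping a running total and count per genre.  The function name `average_by_group` and the sample rows are illustrative assumptions.

```python
from collections import defaultdict

def average_by_group(dataset, group_idx, value_idx):
    """Return {group: mean of value_idx} using one pass over the rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        totals[row[group_idx]] += float(row[value_idx])
        counts[row[group_idx]] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Made-up rows: genre at index 0, rating count at index 1.
sample = [['Games', '100'], ['Games', '300'], ['Weather', '50']]
averages = average_by_group(sample, 0, 1)
print(averages)
```

For the iOS data, the call would be `average_by_group(ios_free, 11, 5)`.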

In [65]:
#android number of installs
android_installs=display_table(android_free, 5)

1,000,000+ : 15.726534296029072
100,000+ : 11.552346570397244
10,000,000+ : 10.548285198556075
10,000+ : 10.198555956678813
1,000+ : 8.393501805054239
100+ : 6.915613718411619
5,000,000+ : 6.82536101083039
500,000+ : 5.561823104693188
50,000+ : 4.772111913357437
5,000+ : 4.512635379061404
10+ : 3.5424187725631953
500+ : 3.249097472924202
50,000,000+ : 2.3014440433213004
100,000,000+ : 2.1322202166064965
50+ : 1.9178700361010799
5+ : 0.7897111913357411
1+ : 0.5076714801444041
500,000,000+ : 0.2707581227436822
1,000,000,000+ : 0.22563176895306852
0+ : 0.04512635379061372
0 : 0.01128158844765343


The install numbers are not precise, and most are open-ended.  For example, we don't know whether 100,000+ means 100,000 or 200,000.  Since we don't need perfect precision here, we will treat 100,000+ as 100,000 and so on, removing the `+` and `,` characters so each string can be converted to a number.<br>
We will use the `Category` column to get the unique genres.
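
The cleaning step can be wrapped in a small helper so it is reusable across cells.  This is a sketch; `parse_installs` is a hypothetical name, and it assumes every value is a number that may contain `,` and end in `+`, as in the frequency table above.

```python
def parse_installs(installs):
    """Convert an install bucket such as '1,000,000+' to a float lower bound."""
    return float(installs.replace(',', '').replace('+', ''))

for raw in ['1,000,000+', '500+', '0']:
    print(raw, '->', parse_installs(raw))
```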

In [75]:
#using android_category
android_freq=freq_table(android_free,1)
for genre in android_freq:
    total=0
    len_category=0
    for app in android_free:
        category_app=app[1]
        if category_app == genre:
            installs=app[5]
            installs=installs.replace('+','')
            installs=installs.replace(',','')
            total+=float(installs)
            len_category+=1
    average_installs=total/len_category
    print(genre,':',average_installs)
 

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_

The `COMMUNICATION` genre has the largest average installs with over 38 million.  `VIDEO_PLAYERS` (about 24.7 million) and `SOCIAL` (about 23.3 million) follow in second and third place.<br> Any of these three genres would be a good place to focus.<br>
Comparing the top genres by number of user ratings for iOS and by number of installs for Google Play, the genre that ranks high in both is Social Networking/Social.<br>
Before suggesting a specific genre, we should look at the results in more detail, by app name, and see whether the data is skewed.  For example, giants like Facebook likely account for the high `SOCIAL` values, and apps like Messenger and WhatsApp could skew the communication data.

#### Apps per genre with highest average number of user ratings/installs
* iOS top genres:  `Navigation` (86090), `Reference` (74942), `Social Networking` (71548)
* Android top genres:  `COMMUNICATION` (38456119), `VIDEO_PLAYERS` (24727872), `SOCIAL` (23253652)

In [83]:
#iOS
print('Navigation','\n',10*'-')
for app in ios_free:    
    if app[11]=='Navigation':        
        print(app[1],':',app[5])        

Navigation 
 ----------
Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


Although `Navigation` has the highest average number of ratings, it is heavily skewed by Waze and Google Maps, which together have close to 500,000 ratings.  This makes `Navigation` seem more popular than it actually is.  We will see if `Reference` and `Social Networking` are also heavily skewed by industry giants.
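
To put a number on that skew, the rating counts printed above can be plugged in directly.  This sketch hard-codes the six values from the output; the variable names are illustrative.

```python
# Rating counts copied from the Navigation output above.
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

total = sum(navigation_ratings)
mean = total / len(navigation_ratings)
# Share of all ratings held by the two largest apps.
top_two_share = sum(sorted(navigation_ratings, reverse=True)[:2]) / total

print('Mean ratings per app:', round(mean))
print('Share held by top two apps:', round(top_two_share * 100, 1), '%')
```

The two largest apps account for roughly 97% of all ratings in the genre.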

In [84]:
print('Reference','\n',10*'-')
for app in ios_free:    
    if app[11]=='Reference':        
        print(app[1],':',app[5])

Reference 
 ----------
Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


In the `Reference` genre, Bible and Dictionary.com dominate.  We could look into some of the other popular apps, like Muslim Pro, and see if there is an opportunity to create add-ons for those apps.

In [85]:
print('Social Networking','\n',10*'-')
for app in ios_free:    
    if app[11]=='Social Networking':        
        print(app[1],':',app[5])

Social Networking 
 ----------
Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Wa

Giants like Facebook, Pinterest, and Skype dominate the `Social Networking` genre, so this genre also seems more popular than it actually is.<br>
Let us look at the Google Play apps and confirm the expectation of the same trend, where certain industry giants like Facebook, WhatsApp, etc. dominate the app market.

In [97]:
#android
print('COMMUNICATION','\n',10*'-')
for app in android_free:    
    if app[1]=='COMMUNICATION':        
        print(app[0],':',app[5]) 

COMMUNICATION 
 ----------
WhatsApp Messenger : 1,000,000,000+
Messenger for SMS : 10,000,000+
My Tele2 : 5,000,000+
imo beta free calls and text : 100,000,000+
Contacts : 50,000,000+
Call Free – Free Call : 5,000,000+
Web Browser & Explorer : 5,000,000+
Browser 4G : 10,000,000+
MegaFon Dashboard : 10,000,000+
ZenUI Dialer & Contacts : 10,000,000+
Cricket Visual Voicemail : 10,000,000+
TracFone My Account : 1,000,000+
Xperia Link™ : 10,000,000+
TouchPal Keyboard - Fun Emoji & Android Keyboard : 10,000,000+
Skype Lite - Free Video Call & Chat : 5,000,000+
My magenta : 1,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Seznam.cz : 1,000,000+
Antillean Gold Telegram (original version) : 100,000+
AT&T Visual Voicemail : 10,000,000+
GMX Mail : 10,000,000+
Omlet Chat : 10,000,000+
My Vodacom SA : 5,000,000+
Microsoft Edge : 5,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Calls & Tex

In [98]:
print('COMMUNICATION with over 100,000,000+ installs','\n',10*'-')
for app in android_free:    
    if app[1]=='COMMUNICATION' and (app[5]=='100,000,000+' or
                                    app[5]=='500,000,000+' or
                                    app[5]=='1,000,000,000+'):        
        print(app[0],':',app[5])

COMMUNICATION with over 100,000,000+ installs 
 ----------
WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser

If we remove these apps, the average decreases significantly.

In [99]:
under_100_m = []
for app in android_free:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        under_100_m.append(float(n_installs))
print('Including apps with over 100,000,000 installs 38456119')        
print('Under 100,000,000 installs',sum(under_100_m) / len(under_100_m))
print('Difference', 38456119 - 3603485)

Including apps with over 100,000,000 installs 38456119
Under 100,000,000 installs 3603485.3884615386
Difference 34852634
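
Another way to gauge skew without choosing a cutoff is to compare the mean with the median, since a few giants pull the mean up while leaving the median largely unchanged.  A sketch using the standard `statistics` module, with made-up install counts rather than the real data:

```python
import statistics

# Made-up install counts: several mid-sized apps and one giant.
installs = [100_000, 500_000, 1_000_000, 5_000_000, 1_000_000_000]

mean_installs = statistics.mean(installs)
median_installs = statistics.median(installs)

print('Mean:', mean_installs)
print('Median:', median_installs)
```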


In [100]:
#android
print('SOCIAL','\n',10*'-')
for app in android_free:    
    if app[1]=='SOCIAL':        
        print(app[0],':',app[5]) 

SOCIAL 
 ----------
Facebook : 1,000,000,000+
Facebook Lite : 500,000,000+
Tumblr : 100,000,000+
Social network all in one 2018 : 100,000+
Pinterest : 100,000,000+
TextNow - free text + calls : 10,000,000+
Google+ : 1,000,000,000+
The Messenger App : 1,000,000+
Messenger Pro : 1,000,000+
Free Messages, Video, Chat,Text for Messenger Plus : 1,000,000+
Telegram X : 5,000,000+
The Video Messenger App : 100,000+
Jodel - The Hyperlocal App : 1,000,000+
Hide Something - Photo, Video : 5,000,000+
Love Sticker : 1,000,000+
Web Browser & Fast Explorer : 5,000,000+
LiveMe - Video chat, new friends, and make money : 10,000,000+
VidStatus app - Status Videos & Status Downloader : 5,000,000+
Love Images : 1,000,000+
Web Browser ( Fast & Secure Web Explorer) : 500,000+
SPARK - Live random video chat & meet new people : 5,000,000+
Golden telegram : 50,000+
Facebook Local : 1,000,000+
Meet – Talk to Strangers Using Random Video Chat : 5,000,000+
MobilePatrol Public Safety App : 1,000,000+
💘 WhatsLov: 

In [101]:
under_100_m = []
for app in android_free:
    # Strip ',' and '+' so the install count can be converted to a number
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    # Keep only SOCIAL apps with fewer than 100,000,000 installs
    if (app[1] == 'SOCIAL') and (float(n_installs) < 100000000):
        under_100_m.append(float(n_installs))
# The overall average 23253652 is hard-coded, not recomputed here
print('Including apps with over 100,000,000 installs 23253652')
print('Under 100,000,000 installs',sum(under_100_m) / len(under_100_m))
print('Difference', 23253652 - (sum(under_100_m) / len(under_100_m)))

Including apps with over 100,000,000 installs 23253652
Under 100,000,000 installs 3084582.5201793723
Difference 20169069.479820628


In [102]:
#android
print('VIDEO PLAYERS','\n',10*'-')
for app in android_free:    
    if app[1]=='VIDEO_PLAYERS':        
        print(app[0],':',app[5]) 

VIDEO PLAYERS 
 ----------
YouTube : 1,000,000,000+
All Video Downloader 2018 : 1,000,000+
Video Downloader : 10,000,000+
HD Video Player : 1,000,000+
Iqiyi (for tablet) : 1,000,000+
Video Player All Format : 10,000,000+
Motorola Gallery : 100,000,000+
Free TV series : 100,000+
Video Player All Format for Android : 500,000+
VLC for Android : 100,000,000+
Code : 10,000,000+
Vote for : 50,000,000+
XX HD Video downloader-Free Video Downloader : 1,000,000+
OBJECTIVE : 1,000,000+
Music - Mp3 Player : 10,000,000+
HD Movie Video Player : 1,000,000+
YouCut - Video Editor & Video Maker, No Watermark : 5,000,000+
Video Editor,Crop Video,Movie Video,Music,Effects : 1,000,000+
YouTube Studio : 10,000,000+
video player for android : 10,000,000+
Vigo Video : 50,000,000+
Google Play Movies & TV : 1,000,000,000+
HTC Service － DLNA : 10,000,000+
VPlayer : 1,000,000+
MiniMovie - Free Video and Slideshow Editor : 50,000,000+
Samsung Video Library : 50,000,000+
OnePlus Gallery : 1,000,000+
LIKE – Magic Vi

We see the same trend with the Google Play apps: a handful of giants inflate the average number of installs.  We should look into other genres that are not dominated by these industry giants to see if we can find a viable niche to increase revenue.<br>The iOS market is geared towards the gaming genre, so it is of interest to explore a more practical, educational genre like reference.  We noticed that there may be some potential in that genre, so we will look at the Google Play genre `BOOKS_AND_REFERENCE` to infer whether it has potential there as well.
Other popular genres, including finance, health, and weather, would require expertise outside the scope of this project.
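
The comparison repeated in the cells above (average installs for a genre with and without the giants) could be wrapped in one helper. The following is a minimal sketch, not code from this notebook: `parse_installs`, `average_installs`, and the `sample` rows are hypothetical names, and the sample only mimics the layout used here (category at index 1, installs at index 5), with the other columns left blank.

```python
def parse_installs(raw):
    """Convert an install string such as '1,000,000+' to a float."""
    return float(raw.replace(',', '').replace('+', ''))


def average_installs(dataset, genre, cap=None):
    """Mean installs for `genre`; if `cap` is given, skip apps at or above it."""
    installs = [parse_installs(app[5]) for app in dataset if app[1] == genre]
    if cap is not None:
        installs = [n for n in installs if n < cap]
    # Avoid a ZeroDivisionError when no app matches the genre or the cap
    return sum(installs) / len(installs) if installs else 0.0


# Illustrative rows only (assumption): a stand-in for android_free
sample = [
    ['Google Play Books', 'BOOKS_AND_REFERENCE', '', '', '', '1,000,000,000+'],
    ['Wikipedia', 'BOOKS_AND_REFERENCE', '', '', '', '10,000,000+'],
    ['Cool Reader', 'BOOKS_AND_REFERENCE', '', '', '', '10,000,000+'],
    ['Facebook', 'SOCIAL', '', '', '', '1,000,000,000+'],
]

all_avg = average_installs(sample, 'BOOKS_AND_REFERENCE')
capped_avg = average_installs(sample, 'BOOKS_AND_REFERENCE', cap=100_000_000)
print('Including apps with over 100,000,000 installs', all_avg)
print('Under 100,000,000 installs', capped_avg)
print('Difference', all_avg - capped_avg)
```

Passing `android_free` in place of `sample` would give the per-genre figures without hard-coding the overall average in each cell.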

In [103]:
print('BOOKS_AND_REFERENCE','\n',10*'-')
for app in android_free:    
    if app[1]=='BOOKS_AND_REFERENCE':        
        print(app[0],':',app[5]) 

BOOKS_AND_REFERENCE 
 ----------
E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,

In [104]:
print('BOOKS_AND_REFERENCE with over 100,000,000+ installs','\n',10*'-')
for app in android_free:    
    if app[1]=='BOOKS_AND_REFERENCE' and (app[5]=='100,000,000+' or
                                    app[5]=='500,000,000+' or
                                    app[5]=='1,000,000,000+'):        
        print(app[0],':',app[5])

BOOKS_AND_REFERENCE with over 100,000,000+ installs 
 ----------
Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+


In [108]:
print('Apps with less than 100,000,000 installs','\n',10*'-')
under_100_m = []
for app in android_free:
    # Strip ',' and '+' so the install count can be converted to a number
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    # Keep and print only BOOKS_AND_REFERENCE apps with fewer than 100,000,000 installs
    if (app[1] == 'BOOKS_AND_REFERENCE') and (float(n_installs) < 100000000):
        under_100_m.append(float(n_installs))
        print(app[0],':',app[5])

Apps with less than 100,000,000 installs 
 ----------
E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC H

### Conclusion
The goal was to determine an area in which to introduce apps that may be profitable on both Google Play and the App Store.  The genres that appear most popular (social, gaming, communication, etc.) are heavily influenced by industry giants like Facebook, WhatsApp, and Google.  These giants yield very high installs and average numbers of ratings, skewing their genres to appear more popular than others.  Other popular genres are outside the scope of this project and may require hiring outside expertise.  These include finance, health & fitness, and weather.  The reference genre, although dominated by Bible and Dictionary.com, shows promise to be profitable in both markets.
There are a lot of reference books around language and religion in the Google Play `BOOKS_AND_REFERENCE` genre.  We could look into developing book apps for both markets.  Possibilities include creating audio books for the popular books, quizzes, built-in translations for religious books, prayer time apps with alarms and translations, or quotes.
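
The observation about language and religion titles could be checked roughly by tallying keywords in app names. This is only a sketch under stated assumptions: the `THEMES` keyword groups and the `count_themes` helper are hypothetical and chosen for illustration, and `sample_names` is a small hand-picked list of names that appear in the outputs above, not the full genre.

```python
from collections import Counter

# Hypothetical keyword groups (assumption); adjust to the themes of interest
THEMES = {
    'language': ('dictionary', 'grammar', 'english'),
    'religion': ('bible', 'quran', 'prophetic'),
}


def count_themes(names, themes=THEMES):
    """Count how many app names mention at least one keyword of each theme."""
    counts = Counter()
    for name in names:
        lowered = name.lower()
        for theme, keywords in themes.items():
            if any(word in lowered for word in keywords):
                counts[theme] += 1
    return counts


# Hand-picked names from the BOOKS_AND_REFERENCE output (illustration only)
sample_names = [
    'English Grammar Complete Handbook',
    'Offline English Dictionary',
    'English to Urdu Dictionary',
    'Bible',
    'Recipes of Prophetic Medicine for free',
    'Cool Reader',
]

theme_counts = count_themes(sample_names)
print(dict(theme_counts))
```

Substring matching is crude, so the counts are a rough signal at best. In the notebook, the names could be taken as `app[0]` for every app in `android_free` whose `app[1]` is `'BOOKS_AND_REFERENCE'`.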