# Dataquest Python Introduction Project: Profitable App Profiles for the App Store and Google Play Markets
---

In this project I act as a data analyst for a company that builds Android and iOS apps. The apps we build are available on both Google Play and the App Store for the English-speaking market, and are free to download. The company's main revenue source is in-app ads. Therefore, the revenue for any given app is primarily governed by the number of users for that app.

The goal of this project is to enable app developers to understand what types of apps are likely to attract more app users. The intention is that developers are able to leverage this analysis to build apps which will attract more users, and hence generate increased per-app advertising revenue.

## Create function `import_data`
- Requires file path as input parameter `file_path`
- Returns .csv data as a list-of-lists variable

In [1]:
from csv import reader

def import_data(file_path, encoding = "utf8"):
    # Open the file in a context manager so it is closed once it has been read
    with open(file_path, encoding = encoding) as opened_file:
        read_file = reader(opened_file)
        return list(read_file)
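
As a quick check of the reading pattern above, the sketch below writes a two-row CSV to a temporary file and reads it back as a list of lists. The helper name `read_csv_rows` and the sample rows are illustrative assumptions, not part of the project data.

```python
import csv
import os
import tempfile


def read_csv_rows(file_path, encoding="utf8"):
    """Return every row of a CSV file as a list of lists (hypothetical helper)."""
    # newline="" is what the csv module documentation recommends for file objects
    with open(file_path, encoding=encoding, newline="") as f:
        return list(csv.reader(f))


# Write a tiny sample file (made-up rows) and read it back
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False,
                                 encoding="utf8", newline="") as tmp:
    csv.writer(tmp).writerows([["App", "Installs"], ["Demo App", "1,000+"]])
    sample_path = tmp.name

rows = read_csv_rows(sample_path)
os.remove(sample_path)
print(rows)
```

The first row returned is the header, which is why later steps in this project treat index `0` of each data set separately.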


## Create function `explore_data`
- Takes 4 input parameters:
    - `dataset` is the list of lists to be explored
    - `start` and `end` are integers representing start and end index numbers of the slice to be explored
    - `rows_and_columns` requires a boolean argument. If `True`, then the number of rows and columns of `dataset` is printed (default is `False`)
- This function slices the data set and prints as separated rows.

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

## Retrieve data from .csv files
- Data is retrieved from the following sources:
    - App Store data from 2017 for ~7,000 iOS apps: [link][1]
    - Play Store data from 2018 for ~10,000 Android apps: [link][2]
- Reads filepaths into function `import_data`
- Saves the returned list-of-lists as variables:
    - `AppStore_data`
    - `PlayStore_data`
    
[1]: https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps
[2]: https://www.kaggle.com/lava18/google-play-store-apps

In [3]:
AppStore_file = r"C:\Users\tomel\Documents\Programming\Tutorials\Dataquest\Guided Project 1\AppleStore.csv"
PlayStore_file = r"C:\Users\tomel\Documents\Programming\Tutorials\Dataquest\Guided Project 1\googleplaystore.csv"

AppStore_data = import_data(AppStore_file)
PlayStore_data = import_data(PlayStore_file)

Initial exploration of the data sets enabled the identification of column headers and the associated indices, as shown in the following tables:

<h3><center>iOS App Store data</center></h3>


|Index No.|Header|Example Data
|:---------|:------|:---
|0|`'id'`|`'284882215'`
|1|`'track_name'`|`'Facebook'`
|2|`'size_bytes'`|`'389879808'`
|3|`'currency'`|`'USD'`
|4|`'price'`|`'0.0'`
|5|`'rating_count_total'`|`'2974676'`
|6|`'rating_count_ver'`|`'212'`
|7|`'user_rating'`|`'3.5'`
|8|`'user_rating_ver'`|`'3.5'`
|9|`'ver'`|`'95.0'`
|10|`'cont_rating'`|`'4+'`
|11|`'prime_genre'`|`'Social Networking'`
|12|`'sup_devices.num'`|`'37'`
|13|`'ipad_urls.num'`|`'1'`
|14|`'long.num'`|`'29'`
|15|`'vpp_lic'`|`'1'`


<h3><center>Play Store data</center></h3>

|Index No.|Header|Example Data
|:---------|:------|:---
|0|`'App'`|`'Photo Editor & Candy Camera & Grid & ScrapBook'`
|1|`'Category'`|`'ART_AND_DESIGN'`
|2|`'Rating'`|`'4.1'`
|3|`'Reviews'`|`'159'`
|4|`'Size'`|`'19M'`
|5|`'Installs'`|`'10,000+'`
|6|`'Type'`|`'Free'`
|7|`'Price'`|`'0'`
|8|`'Content Rating'`|`'Everyone'`
|9|`'Genres'`|`'Art & Design'`
|10|`'Last Updated'`|`'January 7, 2018'`
|11|`'Current Ver'`|`'1.0.0'`
|12|`'Android Ver'`|`'4.0.3 and up'`



## Data Cleaning

### 1. Play Store data entry no. 10473 (Life Made WI-Fi Touchscreen Photo Frame)
A quick check of the Kaggle discussion board for the Google Play data source highlights that entry no. 10473 (Life Made WI-Fi Touchscreen Photo Frame) is missing a category, and hence a column shift has happened.

If this entry still exists in position 10473 then it is deleted here.

In [4]:
if PlayStore_data[10473][0] == 'Life Made WI-Fi Touchscreen Photo Frame':
    del PlayStore_data[10473]
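
A column shift like the one described above would typically show up as a row whose length differs from the header's. The sketch below is one way to look for such rows in general rather than relying on a known index. `find_malformed_rows` and the toy rows are hypothetical and are not taken from the Kaggle data.

```python
def find_malformed_rows(dataset):
    """Return (index, row) pairs whose length differs from the header row."""
    expected = len(dataset[0])  # the header defines the expected column count
    return [(i, row) for i, row in enumerate(dataset) if len(row) != expected]


# Toy data: the third row is missing its category, so it is one column short
toy = [
    ["App", "Category", "Rating"],
    ["Sketch Pad", "ART_AND_DESIGN", "4.1"],
    ["Photo Frame", "1.9"],
]

bad_rows = find_malformed_rows(toy)
print(bad_rows)  # [(2, ['Photo Frame', '1.9'])]
```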

### 2. Check for duplicates in each dataset

In both the `AppStore_data` and `PlayStore_data` data sets there are duplicate entries. 

The function `duplicate_app_checker` prints the number of duplicate entries for a set and requires input parameters:

1. `dataset` : the variable name for the data set to analyse (i.e. `AppStore_data` or `PlayStore_data`)
2. `comp_col` : the index number used as the identifier for duplicate cells. For `AppStore_data` use `1`; for `PlayStore_data` use `0`.

    

In [5]:
def duplicate_app_checker(dataset, comp_col):
    if dataset is AppStore_data:  # identity check avoids comparing every row
        datasetname = "AppStore_data"
    else:
        datasetname = "PlayStore_data"
    
    unique_apps = [] 
    duplicate_apps = []
    
    for app in dataset: 
        app_name = app[comp_col] 

        if app_name not in unique_apps:
            unique_apps.append(app_name)
        else:
            duplicate_apps.append(app_name)


    if len(duplicate_apps) != 0:
        print ("DUPLICATE APPS IN ", datasetname, ": (count = " , len(duplicate_apps), ")")
        if len(duplicate_apps) > 3:
            print (duplicate_apps[0:3], "...", len(duplicate_apps)-3, """more entries.
             """)
        else:
            print (duplicate_apps, """
             """)

    else:
        print("""There are no duplicate apps in 
         """, datasetname)
        
    return duplicate_apps

AppStore_duplicates = duplicate_app_checker(AppStore_data, 1)
PlayStore_duplicates = duplicate_app_checker(PlayStore_data, 0)

DUPLICATE APPS IN  AppStore_data : (count =  2 )
['Mannequin Challenge', 'VR Roller Coaster'] 
             
DUPLICATE APPS IN  PlayStore_data : (count =  1181 )
['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business'] ... 1178 more entries.
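
The same kind of count can be obtained with `collections.Counter`, which avoids repeated list membership tests. This is a sketch on made-up app names; `count_duplicates` is a hypothetical helper and is not the function used in this notebook.

```python
from collections import Counter


def count_duplicates(dataset, comp_col):
    """Map each repeated name to the number of surplus entries it has."""
    counts = Counter(row[comp_col] for row in dataset)
    return {name: n - 1 for name, n in counts.items() if n > 1}


toy = [["Box", "10"], ["Box", "12"], ["Slack", "7"], ["Box", "11"]]
extras = count_duplicates(toy, 0)
print(extras)                # {'Box': 2}
print(sum(extras.values()))  # 2
```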
             


### 3. Remove duplicate apps

Duplicate apps need to be removed from both lists, and the entry that is retained should contain the most up-to-date data. Duplicate entries are compared using `rating_count_total` for `AppStore_data` and `Reviews` for `PlayStore_data`. The entry with the highest number of reviews (i.e. `rating_count_total`/`Reviews`) is assumed to be the most up-to-date version.

The function `remove_duplicates` returns a cleaned list-of-lists with non-up-to-date versions of apps removed.
This function takes 3 parameters:
1. `dataset` : the variable name for the data set to analyse (i.e. `AppStore_data` or `PlayStore_data`)
2. `name_index` : the index number of the app name. Use `1` for `AppStore_data` and `0` for `PlayStore_data`.
3. `reviews_index` : the index number used to assess the number of reviews. Use `5` for `AppStore_data` and `3` for `PlayStore_data`.

The function creates a dictionary `reviews_max` which takes a unique app name as the key and the maximum number of reviews for any entry with that unique app name as the dictionary value.

The function then loops through the data set again, and appends the entire app entry to a new list (`clean_list`) if both of the following criteria are met:
- The number of reviews for an entry is equal to the dictionary value for that app in `reviews_max`
- An entry for that app has not already been added to `clean_list`

The cleaned data sets are stored as lists of lists in the following variables:

App Store | Variable name
:---|:---
iOS App Store | `ios_non_dup`
Play Store| `play_non_dup`

In [6]:
def remove_duplicates(dataset, name_index, reviews_index):

    reviews_max = {}
    for app in dataset[1:]:
        name = app[name_index]
        n_reviews = float(app[reviews_index])
        
        if name in reviews_max and reviews_max[name] < n_reviews:
            reviews_max[name] = n_reviews
            
        elif name not in reviews_max:
            reviews_max[name] = n_reviews    
    
    clean_list = []
    already_added = []
    
    for app in dataset[1:]:
        name = app[name_index]
        n_reviews = float(app[reviews_index])
        
        if n_reviews == reviews_max[name] and name not in already_added:
            clean_list.append(app)
            already_added.append(name)
    return clean_list
        
ios_non_dup = remove_duplicates(AppStore_data, 1, 5)
play_non_dup = remove_duplicates(PlayStore_data, 0, 3)
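
To make the selection rule concrete, here is a small single-pass variant run on toy data. For each name it keeps the first entry with the highest review count. `keep_most_reviewed` and the sample rows (with made-up review counts) are hypothetical; this is a sketch of the same idea, not a replacement for `remove_duplicates`.

```python
def keep_most_reviewed(rows, name_index, reviews_index):
    """Keep one row per name: the first row with the highest review count."""
    best = {}  # name -> row with the most reviews seen so far
    for row in rows:
        name = row[name_index]
        # Strict '>' means a later tie does not replace the earlier row
        if name not in best or float(row[reviews_index]) > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


toy = [
    ["Instagram", "66577313"],
    ["Instagram", "66577446"],
    ["Box", "159872"],
    ["Instagram", "66577446"],
]
deduped = keep_most_reviewed(toy, 0, 1)
print(deduped)  # [['Instagram', '66577446'], ['Box', '159872']]
```

Note that the result here is ordered by the first appearance of each name, which can differ from the order produced by `remove_duplicates`.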

### 4. Remove non-English language apps

The function `lang_idef` requires a string as an input parameter, and returns the variable `is_eng` with a boolean value of `True` if fewer than 3 characters in the string have a corresponding number over 127.
According to the ASCII (American Standard Code for Information Interchange) system, characters commonly used in the English language all have corresponding numbers in the range 0-127.

However, some characters, such as emojis, have corresponding numbers greater than 127. In order to retain as many English-language apps as possible whose names contain characters like emojis, only app names containing 3 or more characters with corresponding numbers greater than 127 were removed. This technique is not fool-proof, but it is a simple way to filter out most non-English app names.

In [7]:
def lang_idef(string):
    foreign_char = 0
    is_eng = True
    
    for character in string:
        if ord(character) > 127:
            foreign_char += 1
            
            if foreign_char >= 3:
                is_eng = False
                
                return is_eng
        
    return is_eng

ios_clean = []
play_clean = []

for app in ios_non_dup:
    if lang_idef(app[1]):
        ios_clean.append(app)
        
for app in play_non_dup:
    if lang_idef(app[0]):
        play_clean.append(app)

### 5. Isolate only free apps

Because our company builds only free apps, this analysis uses a data set comprising only free apps.
Free apps are identified by the function `idef_free`, which takes 2 input parameters:
1. `dataset` : the variable name for the data set to analyse (i.e. `ios_clean` or `play_clean`)
2. `price_col` : the index of the column used to identify whether an app is free. Use `4` for `ios_clean` and `6` for `play_clean`.

For the `ios_clean` data set, an app is identified as free if the value in the `price_col` position is equal to `0`.
For the `play_clean` data set, an app is identified as free if the string in the `price_col` position is equal to `Free`.

The list-of-lists of clean data for free apps in each data set is stored in the following variables:

App Store | Variable name
:---|:---
iOS App Store | `ios_free`
Play Store| `play_free`

In [8]:
def idef_free(dataset, price_col):
    
    free_list = []
    
    for app in dataset:
        
        if dataset is ios_clean:
            price = float(app[price_col])
            if price == 0:
                free_list.append(app)
                
        elif dataset is play_clean:
            # Use price_col rather than a hard-coded index
            if app[price_col] == "Free":
                free_list.append(app)            
            
    return free_list

ios_free = idef_free(ios_clean, 4)
play_free = idef_free(play_clean, 6)
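
One alternative design, sketched below under stated assumptions, passes the store-specific rule in as a function so that the filter does not need to know which data set it has received. `filter_free`, the toy rows and the `"Paid"` value are hypothetical and used only for illustration.

```python
def filter_free(dataset, is_free):
    """Return the rows for which the is_free(row) predicate is true."""
    return [row for row in dataset if is_free(row)]


# Toy rows: only the columns needed for the example are filled in
ios_toy = [["1", "App A", "0", "USD", "0.0"], ["2", "App B", "0", "USD", "2.99"]]
play_toy = [["App C", "", "", "", "", "", "Free"], ["App D", "", "", "", "", "", "Paid"]]

ios_free_toy = filter_free(ios_toy, lambda row: float(row[4]) == 0)
play_free_toy = filter_free(play_toy, lambda row: row[6] == "Free")
print(ios_free_toy)
print(play_free_toy)
```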

## Validation Strategy

The company's validation strategy for building new apps follows three steps:
1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we develop it further.
3. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

Hence, it is key that this analysis covers apps on both the Play store and the iOS App store in order to identify app profiles which will likely be successful on both markets.

## Common app genres by market

The following section creates frequency tables for the genres of apps on both the iOS App Store and the Play Store.

The function `freq_table` creates a frequency table for the data set `dataset` for the entries found by index `index`.

The function `display_table` prints the contents of the input parameter `ft` (which should be the variable name for the frequency table returned by `freq_table`) in descending order.

These functions are used to generate the following frequency tables for the app genre in the iOS App Store (`ios_prime_genre_ft`), the app category in the Play Store (`play_category_ft`) and the app genres in the Play Store (`play_genres_ft`).

In [59]:
def freq_table(dataset, index):
    freq_dict = {}
    
    for app in dataset:
        if app[index] in freq_dict:
            freq_dict[app[index]] += 1
        else:
            freq_dict[app[index]] = 1
    
    for item in freq_dict:
        freq_dict[item] = round((freq_dict[item]/len(dataset))*100, 2)

    return freq_dict


def display_table(ft):
    table_display = []
    for key in ft:
        key_val_as_tuple = (ft[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
    return table_sorted
 

print ('\033[1m' + """ 
ios_prime_genre_ft
 """ + '\033[0m')
ios_prime_genre_ft = freq_table(ios_free, 11)
ios_prime_genre_ft = display_table(ios_prime_genre_ft)

print ('\033[1m' + """ 
play_category_ft
 """+ '\033[0m')
play_category_ft = freq_table(play_free, 1)
play_category_ft = display_table(play_category_ft)

print ('\033[1m' + """ 
play_genres_ft
 """+ '\033[0m')
play_genres_ft = freq_table(play_free, 9)
play_genres_ft = display_table(play_genres_ft)

ios_prime_genre_ft
Games : 58.23
Entertainment : 7.84
Photo & Video : 5.0
Education : 3.69
Social Networking : 3.31
Shopping : 2.59
Utilities : 2.47
Sports : 2.16
Music : 2.06
Health & Fitness : 2.03
Productivity : 1.75
Lifestyle : 1.56
News : 1.34
Travel : 1.25
Finance : 1.09
Weather : 0.87
Food & Drink : 0.81
Reference : 0.53
Business : 0.53
Book : 0.37
Navigation : 0.19
Medical : 0.19
Catalogs : 0.12
play_category_ft
FAMILY : 18.93
GAME : 9.7
TOOLS : 8.45
BUSINESS : 4.6
PRODUCTIVITY : 3.9
LIFESTYLE : 3.89
FINANCE : 3.71
MEDICAL : 3.54
SPORTS : 3.39
PERSONALIZATION : 3.32
COMMUNICATION : 3.23
HEALTH_AND_FITNESS : 3.09
PHOTOGRAPHY : 2.95
NEWS_AND_MAGAZINES : 2.8
SOCIAL : 2.67
TRAVEL_AND_LOCAL : 2.34
SHOPPING : 2.25
BOOKS_AND_REFERENCE : 2.14
DATING : 1.87
VIDEO_PLAYERS : 1.8
MAPS_AND_NAVIGATION : 1.39
FOOD_AND_DRINK : 1.24
EDUCATION : 1.16
ENTERTAINMENT : 0.96
LIBRARIES_AND_DEMO : 0.94
AUTO_AND_VEHICLES : 0.93
HOUSE_AND_HOME : 0.8
WEATHER : 0.79
EVENTS : 0.71
P
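
The percentage calculation behind these tables can be followed on a toy list of eight genres. This is a sketch only: `percentage_table` is a hypothetical stand-in for `freq_table` that works on a flat list, and the genre list is made up.

```python
def percentage_table(values):
    """Return {value: percentage of entries}, rounded to 2 decimal places."""
    counts = {}
    for value in values:
        counts[value] = counts.get(value, 0) + 1
    return {value: round(n / len(values) * 100, 2) for value, n in counts.items()}


genres = ["Games", "Games", "Games", "Education", "Music", "Games", "Music", "Games"]
table = percentage_table(genres)

# Sort by percentage, highest first, similar to display_table
for genre, pct in sorted(table.items(), key=lambda item: item[1], reverse=True):
    print(genre, ":", pct)
```

With 5 of 8 entries, "Games" comes out at 62.5%, "Music" at 25.0% and "Education" at 12.5%.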

### a) iOS App Store genre frequency analysis

"Games" is overwhelmingly the most common genre of free, English-language app on the iOS App Store, accounting for over half of these apps (58.23%). The next most common genre is "Entertainment", accounting for only 7.84%. Each of the remaining genres accounts for 5% or less of the apps that meet these criteria.

Genres that fall in the wider category of entertainment (e.g. games, photo and video, social networking, sports, music) account for 6 of the 9 most populated genres. Far fewer apps fall into the general category of practical/utility apps: only 3 of the top 9 genres belong to this domain, the most populous being "Education" (3.69%), then "Shopping" (2.59%), then "Utilities" (2.47%).

However, the insight provided by this data is limited. The fact that a genre is more widely available does not imply that its apps have the most users or are the most frequently used. If anything, it could be argued that the market for game apps is saturated, and that a larger share of users might be won by targeting a less crowded genre. However, further analysis is needed to confirm the number of users per genre.

### b) Play Store genre and category frequency analysis

The most common category of app in the Play Store is "Family", accounting for 18.93% of apps, almost twice the share of the next most common category, "Game" (9.7%). However, 8 of the next 10 most common categories (each accounting for 3-9% of apps) belong to the wider category of practical/utility apps.

Within the genres frequency table for the Play Store, one might initially suppose that there is a more even spread of app types. "Tools", "Entertainment", "Education" and "Business" are the most common four genres, each accounting for between 4.5-8.5% of the total number of apps. 

However, this frequency table is less clear, as several apps are labelled as having multiple genres, for example "Puzzle;Action & Adventure". There is also ambiguity over genre labels: for example, "Educational" and "Education" both describe the same genre of app.

As with the iOS App Store frequency table, the insight provided by these tables is limited, and further analysis should be performed to assess the number of users and frequency of use of each app category. The Play Store genre data is messy and offers little more insight than the Play Store category frequency table. Therefore, it is proposed that the Play Store genre frequency table is ignored in this analysis.

## iOS App Store mean ratings per app genre

As an indicator of the number of users of any given app on the iOS App Store, the total number of user ratings (`rating_count_total`) is used. In order to achieve a normalised comparison between app genres, the mean number of user ratings per app is calculated for each genre and stored in the dictionary `mean_gen_ratings`. The function `display_table` defined earlier is then used on this dictionary to print the mean number of ratings per app genre in descending order.

In [71]:
tot_app_ratings = {}
len_genre = {}
genre_list = []
for app in ios_free:
    
    if app[11] in len_genre:
        len_genre[app[11]] += 1
    else:
        len_genre[app[11]] = 1
        genre_list.append(app[11])
    
    if app[11] in tot_app_ratings:
        tot_app_ratings[app[11]] += float(app[5])
    else:
        tot_app_ratings[app[11]] = float(app[5])

mean_gen_ratings = {}
for genre in genre_list:
    mean_gen_ratings[genre] = round(tot_app_ratings[genre]/len_genre[genre], 2)

ios_mean_gen_ratings = display_table(mean_gen_ratings)   

Navigation : 86090.33
Reference : 79350.47
Social Networking : 71548.35
Music : 57326.53
Weather : 52279.89
Book : 46384.92
Food & Drink : 33333.92
Finance : 32367.03
Photo & Video : 28441.54
Travel : 28243.8
Shopping : 27230.73
Health & Fitness : 23298.02
Sports : 23008.9
Games : 22910.83
News : 21248.02
Productivity : 21028.41
Utilities : 19156.49
Lifestyle : 16815.48
Entertainment : 14195.36
Business : 7491.12
Education : 7003.98
Catalogs : 4004.0
Medical : 612.0


According to the mean number of user ratings by genre shown here, the three most-rated genres of app are:
1. Navigation
2. Reference
3. Social Networking

Each of these genres has a mean number of ratings over 70,000. This is a strong indication that these genres have a high number of users. However, it may be the case that users of certain app types (e.g. Navigation apps) are more likely to leave a review than users of other app types (e.g. Games).

It could also be the case that, although these app types have a very high number of reviews, the frequency of use may be very low. In order to achieve a high level of ad traffic on apps built by our company, we want to target a genre of app that is used frequently. For example, although there are a high number of ratings for "Navigation" apps, these types of apps may be used far less frequently than "Game" apps. Unfortunately the data used for the iOS App Store does not provide this kind of information, so the number of user ratings must suffice.

A more detailed breakdown of the number of user ratings per app for the top 3 genres is given below to give more transparency for this data.

In [77]:
print ('\033[1m' + """ 
"Navigation"
 """ + '\033[0m')

for app in ios_free:
    if app[11] == "Navigation":
        print (app[1], ":", app[5])
        
print ('\033[1m' + """ 
"Reference"
 """ + '\033[0m')
for app in ios_free:
    if app[11] == "Reference":
        print (app[1], ":", app[5])
        
print ('\033[1m' + """ 
"Social Networking"
 """ + '\033[0m')
for app in ios_free:
    if app[11] == "Social Networking":
        print (app[1], ":", app[5])

"Navigation"
Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5
"Reference"
Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for

Having this detailed breakdown allows a more insightful analysis. Although the "Navigation" genre has the highest mean number of user ratings, we can see here that the navigation market is actually dominated by 2 apps: "Waze" and "Google Maps". All other apps in this category receive a relatively low number of reviews. We can draw from this that gaining a foothold in the free, English-language "Navigation" app market is challenging.
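
The skew can be quantified from the six "Navigation" rating counts printed above. The sketch below compares the mean with the median and computes the share of ratings held by the top two apps; the median is introduced here as an illustration and is not a statistic used elsewhere in this analysis.

```python
from statistics import mean, median

# Rating counts copied from the "Navigation" listing above
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

nav_mean = mean(navigation_ratings)
nav_median = median(navigation_ratings)
top_two_share = sum(sorted(navigation_ratings, reverse=True)[:2]) / sum(navigation_ratings)

print(round(nav_mean, 2))       # 86090.33, matching the table above
print(nav_median)               # 8196.5
print(round(top_two_share, 3))  # share of all ratings held by the top two apps
```

The mean is roughly ten times the median, and the top two apps hold well over 90% of all ratings in the genre.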

This is also true for the "Reference" genre of apps, although less extreme. The high average number of ratings for this genre is mostly accounted for by 2 apps: "Bible" and "Dictionary.com". However, in this category there is a slightly higher number of apps in the range 10,000 - 100,000 user ratings. This indicates it is more achievable to build a "Reference" app that will draw a substantial number of users and generate a solid revenue stream.

The breakdown for the "Social Networking" category shows a considerably different pattern. There are far more apps of this genre available for free in the English language on the iOS App Store. The top 2 apps by number of ratings ("Facebook" and "Pinterest") have a huge number of user ratings, exceeding 1M. However, there are many more apps with between 10,000-500,000 user ratings. This indicates that many app builders are finding success in creating Social Networking apps that draw a high number of users. In addition to this, it is important to consider the contextual understanding that the use of Social Networking apps is not mutually exclusive: i.e. it is common for individuals to be members of and use multiple different social media apps. This may not be true for other genres, e.g. Navigation apps.

## Play Store mean installs per app category

As an indicator of the number of users of any given app on the Play Store, the lower limit of the total app installs (`Installs`) is used. The `Installs` data listed in the Play Store data set is in the form `1,000,000+`. For each entry we remove the commas and the trailing `+` sign, so the data point used for this example entry would be `1000000`.
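
The conversion described above can be sketched as a small helper. `parse_installs` is a hypothetical name, and it returns an `int` where the notebook code uses `float`.

```python
def parse_installs(installs):
    """Convert a string such as '1,000,000+' to its lower-bound integer."""
    return int(installs.replace(",", "").replace("+", ""))


print(parse_installs("1,000,000+"))  # 1000000
print(parse_installs("10,000+"))     # 10000
```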

In order to achieve a normalised comparison between app categories, the mean number of installs per category is calculated and stored in dictionary `mean_cat_installs`. The function `display_table` generated earlier is then used on this dictionary to print the mean number of installs per category in descending order.

In [104]:
tot_app_installs = {}
len_cat = {}
cat_list = []

for app in play_free:
    app_installs = app[5]
    app_installs = app_installs.replace(',', '')
    app_installs = app_installs.replace('+', '')
    app_installs = float(app_installs)
    
    if app[1] in len_cat:
        len_cat[app[1]] += 1
    else:
        len_cat[app[1]] = 1
        cat_list.append(app[1])
        
    if app[1] in tot_app_installs:
        tot_app_installs[app[1]] += app_installs
    else:
        tot_app_installs[app[1]] = app_installs
        
mean_cat_installs = {}
for cat in cat_list:
    mean_cat_installs[cat] = round(tot_app_installs[cat]/len_cat[cat], 2)

play_mean_cat_installs = display_table(mean_cat_installs) 

COMMUNICATION : 38590581.09
VIDEO_PLAYERS : 24727872.45
SOCIAL : 23253652.13
PHOTOGRAPHY : 17840110.4
PRODUCTIVITY : 16787331.34
GAME : 15544014.51
TRAVEL_AND_LOCAL : 13984077.71
ENTERTAINMENT : 11640705.88
TOOLS : 10830251.97
NEWS_AND_MAGAZINES : 9549178.47
BOOKS_AND_REFERENCE : 8814199.79
SHOPPING : 7036877.31
PERSONALIZATION : 5201482.61
WEATHER : 5145550.29
HEALTH_AND_FITNESS : 4188821.99
MAPS_AND_NAVIGATION : 4049274.63
FAMILY : 3697848.17
SPORTS : 3650602.28
ART_AND_DESIGN : 1986335.09
FOOD_AND_DRINK : 1924897.74
EDUCATION : 1833495.15
BUSINESS : 1712290.15
LIFESTYLE : 1446158.22
FINANCE : 1387692.48
HOUSE_AND_HOME : 1360598.04
DATING : 854028.83
COMICS : 832613.89
AUTO_AND_VEHICLES : 647317.82
LIBRARIES_AND_DEMO : 638503.73
PARENTING : 542603.62
BEAUTY : 513151.89
EVENTS : 253542.22
MEDICAL : 120550.62


This analysis shows that for free, English-language apps on the Play Store, the app categories which have the highest mean number of installs are:
1. Communication
2. Video Players
3. Social

We can also see that on average, "Communication" apps have over 10 million more installs than "Video Players" or "Social" apps.

However, there was no "Communication" genre on the iOS App Store, so in order to draw correlations between the data sets we need to know which apps make up the "Communication" category.

Below are printed all apps on the Play Store that have at least 1 billion installs, to see the major players in each category.

In [112]:
print ('\033[1m' + """ 
Apps with 1B+ installs
 """ + '\033[0m')

for app in play_free:
    app_installs = app[5]
    app_installs = app_installs.replace(',', '')
    app_installs = app_installs.replace('+', '')
    app_installs = float(app_installs)
    if app_installs >= 1000000000:
        print (app[1], app[0], ":", app[5])

Apps with 1B+ installs
BOOKS_AND_REFERENCE Google Play Books : 1,000,000,000+
COMMUNICATION WhatsApp Messenger : 1,000,000,000+
COMMUNICATION Messenger – Text and Video Chat for Free : 1,000,000,000+
COMMUNICATION Skype - free IM & video calls : 1,000,000,000+
COMMUNICATION Google Chrome: Fast & Secure : 1,000,000,000+
COMMUNICATION Gmail : 1,000,000,000+
COMMUNICATION Hangouts : 1,000,000,000+
GAME Subway Surfers : 1,000,000,000+
SOCIAL Facebook : 1,000,000,000+
SOCIAL Google+ : 1,000,000,000+
SOCIAL Instagram : 1,000,000,000+
PHOTOGRAPHY Google Photos : 1,000,000,000+
TRAVEL_AND_LOCAL Maps - Navigate & Explore : 1,000,000,000+
TRAVEL_AND_LOCAL Google Street View : 1,000,000,000+
TOOLS Google : 1,000,000,000+
PRODUCTIVITY Google Drive : 1,000,000,000+
VIDEO_PLAYERS YouTube : 1,000,000,000+
VIDEO_PLAYERS Google Play Movies & TV : 1,000,000,000+
FAMILY Google Play Games : 1,000,000,000+
NEWS_AND_MAGAZINES Google News : 1,000,000,000+


We can see that the category containing the most apps with 1B+ installs is indeed "Communication", with 6 apps:
- "WhatsApp Messenger"
- "Messenger"
- "Skype"
- "Google Chrome"
- "Gmail"
- "Hangouts"
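
The per-category count can be checked by tallying the categories in the 1B+ listing above. This sketch hard-codes the categories from that output rather than re-reading the data set.

```python
from collections import Counter

# Category of each app in the 1B+ listing above, in the order printed
categories_1b = (
    ["BOOKS_AND_REFERENCE"]
    + ["COMMUNICATION"] * 6
    + ["GAME"]
    + ["SOCIAL"] * 3
    + ["PHOTOGRAPHY"]
    + ["TRAVEL_AND_LOCAL"] * 2
    + ["TOOLS", "PRODUCTIVITY"]
    + ["VIDEO_PLAYERS"] * 2
    + ["FAMILY", "NEWS_AND_MAGAZINES"]
)

tally = Counter(categories_1b)
print(tally.most_common(2))  # [('COMMUNICATION', 6), ('SOCIAL', 3)]
```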

Below we will investigate further to see what genre these apps are listed as on the iOS Store.

In [117]:
print ('\033[1m' + """ 
What genre are the most installed 'Communication' apps on the Play Store listed as on the iOS App store?
 """ + '\033[0m')

# iOS App Store names of the six most installed Play Store "Communication" apps
target_names = (
    "WhatsApp Messenger",
    "Messenger",
    "Skype for iPhone",
    "Google Chrome – The Fast and Secure Web Browser",
    "Inbox by Gmail",
    "Hangouts",
)
for app in ios_free:
    if app[1] in target_names:
        print (app[1], ":", app[11])

What genre are the most installed 'Communication' apps on the Play Store listed as on the iOS App store?
Skype for iPhone : Social Networking
Messenger : Social Networking
WhatsApp Messenger : Social Networking
Google Chrome – The Fast and Secure Web Browser : Utilities
Hangouts : Social Networking
Inbox by Gmail : Productivity
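
The six results above can be tallied by genre. The sketch below hard-codes the (name, genre) pairs from that output and counts them with a plain dictionary; the variable names are illustrative.

```python
# (name, genre) pairs copied from the output above
ios_matches = [
    ("Skype for iPhone", "Social Networking"),
    ("Messenger", "Social Networking"),
    ("WhatsApp Messenger", "Social Networking"),
    ("Google Chrome – The Fast and Secure Web Browser", "Utilities"),
    ("Hangouts", "Social Networking"),
    ("Inbox by Gmail", "Productivity"),
]

genre_by_name = dict(ios_matches)  # name -> genre lookup table
genre_counts = {}
for genre in genre_by_name.values():
    genre_counts[genre] = genre_counts.get(genre, 0) + 1

print(genre_by_name["Hangouts"])  # Social Networking
print(genre_counts)
```

Four of the six apps fall under "Social Networking", with one each under "Utilities" and "Productivity".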


As we can see, most of these Play Store "Communication" apps are listed as "Social Networking" apps on the iOS App Store.
If we assume that this holds true for the majority of Play Store "Communication" apps, grouping them with the "Social" category would significantly boost the average number of installs of apps in that category on the Play Store.

This analysis suggests that social media apps are among the most installed apps on the Play Store. It does not confirm that social media apps are used regularly, and there is no way of inferring frequency of use from the Play Store data. As explained earlier, our company doesn't just want to build apps that will yield a high number of installs; it also wants to ensure that users will use our apps frequently.

## Conclusions

- Both data sets are extensive, and have been cleaned and filtered to avoid bugs and to contain only free, English-language apps.

- Neither data set contains data describing the frequency of use by users.

- As an indicator of the number of users of apps on the iOS App Store, the mean number of user ratings per app was analysed for each genre. This analysis showed that there is considerable scope to build a social media app that captures a significant number of users.

- For the Play Store, the lower bound of the total number of installs was used to calculate the mean number of installs per category. The "Communication" category ranked first for mean app installs, but did not feature as a genre on the iOS App Store. On further inspection, it was found that most of the most installed apps in the "Communication" category are listed as "Social Networking" apps on the iOS App Store. The "Social" category ranked third for mean installs on the Play Store.

- In conclusion, social media apps perform well on both the iOS App Store and the Play Store. There is a high number of high-performing social media apps, as identified in the analysis of each data set. This is encouraging when considering launching a new social media app on either (or both) platforms. Hence, it is recommended that the company design a social media app for initial release on the Play Store.