# New App Idea - Analyzing Mobile App Data

- This project involves analyzing data to identify which types of mobile apps are most likely to attract a large user base and suggest a MVP idea for an app based on the observed data.

## Introduction

In [3]:
# Reading 2 databases from Excel about Google Play and IOS Apps
import csv
open_apple = open('AppleStore.csv', encoding='utf8')
open_google = open('googleplaystore.csv', encoding='utf8')

ios = csv.reader(open_apple)
data_ios = list(ios)
data_ios_header = data_ios[0]
data_ios = data_ios[1:]

gplay = csv.reader(open_google)
data_gplay = list(gplay)
data_gplay_header = data_gplay[0]
data_gplay = data_gplay[1:]

print(data_ios[0:2])
print(data_gplay[0:2])

[['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'], ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']]
[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']]


Below, I create a function with the **def command** to have a quick preview about the data I am using. The function receive 4 parameters:
- dataset -> The database to explore. In this case, a list of lists
- start, end -> the slice of the data to see a preview
- rows_and_columns -> If the data has rows and columns, it is needed to assert with a "True"

Firstly, I create a variable called *dataset_slice* which attributes the start and end date choosen to the dataset. After, I iterate with a ***for loop** the dataset_slice to print the rows that are part of the slice choosen.
In the second part  of the equation, I use an **if statement** to test if the rows_and_columns parameter is active (True). In this case, the function **print the lenght** of the rows and columns. 

The examples below show the results. You are going to see a piece of the data and the numbers of rows and columns of the dataset.

In [5]:
def explore_data(dataset, start, end, rows_and_columns = False):
    dataset_slice = dataset[start:end]
    for i in dataset_slice:
        print(i)
        print('\n') # New empty line after each row
        
    if rows_and_columns == True:
        print('Number of Rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [6]:
explore_data(data_gplay,0,2,True)
explore_data(data_ios, 0, 3,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


Number of Rows: 10841
Number of columns: 13
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of Rows: 7197
Number of columns: 16


In [7]:
# The variable below was created just to have a good view about the variables of the study. I did it when I opened the Excel data.
print(data_gplay_header)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


## 1)Data Cleaning

#### Error Cleaning
**Documentação**

I got this data for the study in Kaggle and in the comment section some people aware that the Google Play dataset has an issue in the line 10472 (for the data without counting the header). Below, I used a **for loop** just to be sure about this information. I ran the **for loop** in *data_gplay* with an **if statement** inside. The **if statement** tests each row of the dataset and if the lenght of one row is smaller then the lenght of the header, thats indicates an error, so, if this difference is found, the entire row is printed and the index (number) of that row also. After that, I use the **del** command to delete that row.

Here is the link for the documentation:
- Google Play: [link](https://www.kaggle.com/datasets/lava18/google-play-store-apps)
- Apple Store: [link](https://www.kaggle.com/datasets/ramamet4/app-store-apple-data-set-10k-apps)

In [9]:
for i in data_gplay:
    if len(i) != len(data_gplay_header):
        print(i)
        print(data_gplay.index(i))
del data_gplay[10472] # incorrect row

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
10472


#### Duplicates Cleaning

In the code above, I just removed a row with an issue. Now I am going to remove the duplicate app reviews (more then one review for the same app) with a **for loop and an if statement**. 
I am not going to remove the duplicates randomly. I am going the remove the duplicates with less number of reviews, meaning they are the less updated ones.

Firstly, I create 2 empty lists to store in a divided way the duplicate apps and the unique apps. Iterating with a **for loop**, I create a variable *name*, which captures the names in the first column, after that, with a **if else statement**, I test if the *name* already appeared in the dataset. If its true, the name is appended to the *duplicate apps list*. With this, I find out if really there are duplicate apps and the quantity in the dataset.

In [12]:
duplicate_apps = []
unique_apps = []

for i in data_gplay: 
    name = i[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Some duplicate apps:', duplicate_apps[0:5])

Number of duplicate apps: 1181


Some duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings']


Now, to filter only one entry per app, I create an empty dictionary which the objective is to create a new data set where each key is an unique app. Again, there is a **for loop** the variable *name* is created and a new variable (*n_reviews*) is used to quantify the number of reviews (using the **float** method). Now, an **if elif statement** is used to add to the key (the app name) the max number of reviews for that app. Basically, if the number of reviews stored is smaller then the new one tested in the **for loop**, a new atributtion is made, otherwise, if the name still not in the dict, the **elif clause** adds this name to the dictionary.

In the end, I test if it worked, because the len of the *gplay_data* less duplicates needs to be equal to the lenght of the new dictionary. 

In [14]:
reviews_max = {}

for i in data_gplay:
    name = i[0]
    n_reviews = float(i[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
        
print('Expected lenght:', len(data_gplay) - 1181)
print ('Actual lenght:', len(reviews_max))

Expected lenght: 9659
Actual lenght: 9659


Now, I just add the filtered data without duplicates to a new list. One part to call attention here is why I created an *already_added list*. This list was created for the cases where one duplicate app has the same number of reviews in more then one row. So, to be added to the new list, the name cannot be in the *already_added list*, otherwise, the dataset would continue with duplicate entries.

In [16]:
data_gplay_clean = []
already_added = []

for i in data_gplay:
    name = i[0]
    n_reviews = float(i[3])
    
    if (n_reviews == reviews_max[name]) and (name  not in already_added):
        data_gplay_clean.append(i)
        already_added.append(name)

print(data_gplay_clean[0:3])

[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']]


In [17]:
explore_data(data_gplay_clean, 0, 2, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of Rows: 9659
Number of columns: 13


#### Non-English Apps Cleaning

In this challenge, the objective is to do a data analysis in english speaking apps, but in the dataset there are lots of apps which aren't directed for english speakers. This is why I also create the filter below.
" All these characters that are specific to English texts are encoded using the ASCII standard. Each ASCII character has a corresponding number between 0 and 127 associated with it, and we can take advantage of that to build a function that checks an app name and tells us whether it contains non-ASCII characters."

In [20]:
# testing the ascii charcaters
print(ord('a'))

97


Below, I create a function with the **def** command to *return True* if the string does not have more then 3 characters outside the english standard. To do this, I create a variable called *count* as a counter. If the **for loop** finds a character outside of the english standard, it adds +1 to the counter. If the counter exceeds 3, then the function *returns False*.

Why 3 and not 1 character? Because some special characters like emojis are not in the english standard, but can be part of the name of an english app. Of course, it is possible to miss some data with this 3 limit, but is the best way I founf to minimalize the loss.

In [22]:
def test_english(string):
    count = 0    
    for character in string:
        if ord(character) > 127:
            count += 1
    if count > 3:
            return False
    else: 
        return True

print(test_english('inter'))
print(test_english('Docs To Go™ Free Office Suite'))
print(test_english('Instachat 😜'))
    

True
True
True


Now, in a simple way, I create *2 empty lists* to filter only the english apps in the Google Play and Ios datasets and **appended**  the entire row if the created *test_english* function *returns True*

In [24]:
data_gplay_clean_english = []
data_ios_clean_english = []

for i in data_gplay_clean:
    name = i[0]
    
    if test_english(name):
        data_gplay_clean_english.append(i)
    
for i in data_ios:
    name = i[1]
    if test_english(name):
        data_ios_clean_english.append(i)
        

print(data_gplay_clean_english[0:3])
explore_data(data_gplay_clean_english,0,3,True)

print(data_ios_clean_english[0:3])
explore_data(data_ios_clean_english,0,3,True)

[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']]
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644

#### Paid Apps Cleaning

Now, the last step for data cleaning is to filter only for free apps. The 7 column indicates the price, and obviously, if the price is equal to 0, it is a free app (detail: the price number its in string format). **The *2 empty lists created* are the final dataset.**

In [27]:
gplay_cleaned = []
ios_cleaned = []
for i in data_gplay_clean_english:
    price = i[7]
    
    if price == '0':
        gplay_cleaned.append(i)
        
for i in data_ios_clean_english:
    price = i[4]
    if price == '0.0':
        ios_cleaned.append(i)

print(gplay_cleaned[0:3])
print(ios_cleaned[0:3])

[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']]
[['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'], ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'], ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']]


In [28]:
explore_data(gplay_cleaned,0,3,True)
explore_data(ios_cleaned,0,3,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of Rows: 8864
Number of columns: 13
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'G

## Analysing Data

- My objective is to find a mobile app that attracts the largest user base possible. In this way, makes sense to find an app type that fits Google Play and App Store, because it allows the app to have a bigger addressable market. First, I will inspect the columns to find useful variables to determine the most common genres in IOS and Android:

#### Creating Functions to Analyse Data

In [30]:
print(data_gplay_header)
print(data_ios_header)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


In [31]:
# Google: Genres and Category | IOS: prime_genre
def freq_table(dataset, index):
    table_dict = {}
    total = 0
    
    for i in dataset:
        total += 1
        value = i[index]
        if value in table_dict:
            table_dict[value] += 1
            
        else:
            table_dict[value] = 1
        
    table_percentages = {}
    for i in table_dict:
        percentage = (table_dict[i] / total) * 100
        table_percentages[i] = percentage
        
    return table_percentages
    print (table_percentages)    

In [32]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

##### Displaying (big) Tables

In [34]:
display_table(gplay_cleaned, 9) # Genre
print('\n')
display_table(gplay_cleaned, 1) # Category
print('\n')
display_table(ios_cleaned, 11) # Genre

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

#### Comments on Initial Data 

IOS:

The most common app genre for IOS is "Games", which are really predominant (58% of all apps). In the second place, far below, we have entertainment apps. Almost all top genres are designed for fun. With this information, we still can't decide an app profile. Also, the most common genres are also the ones with bigger competition.

GOOGLE:

On the other side, Google Store apps appear to be more focused in work/productivity in both views (Genre or Category)

Now, one way to see the most popular app class with the highest number of install per genre. At Google Play dataset, I already have this information because each row gives the number of install per app, but at the App Store dataset this information does not exist. To do a proxy calculation, I perform a **for loop** and a **nested for loop**. For each genre in ios_genres, I iterate through the entire ios_cleaned dataset. Then I check if the genre of the app matches the current genre in the outer loop. If it does, I extract the number of ratings for that app (n_ratings) and update the total and len_genre variables.

#### Finding Most Popular Genres Per App Population at IOS

In [39]:
ios_genres = freq_table(ios_cleaned, -5)

genre_avg_ratings = []

for genre in ios_genres:
    total = 0 # To store the sum of user ratings (number, not actual)
    len_genre = 0 # To store the genre-specific apps number
    for i in ios_cleaned:
        genre_app = i[-5]
        if genre_app == genre:            
            n_ratings = float(i[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    genre_avg_ratings.append((genre, avg_n_ratings))
    
# Sorting the list of tuples in descending order based on the average ratings
genre_avg_ratings_sorted = sorted(genre_avg_ratings, key=lambda x: x[1], reverse=True)

# Printing the sorted results
for genre, avg in genre_avg_ratings_sorted:
    print(genre, ':', avg)

Navigation : 86090.33333333333
Reference : 74942.11111111111
Social Networking : 71548.34905660378
Music : 57326.530303030304
Weather : 52279.892857142855
Book : 39758.5
Food & Drink : 33333.92307692308
Finance : 31467.944444444445
Photo & Video : 28441.54375
Travel : 28243.8
Shopping : 26919.690476190477
Health & Fitness : 23298.015384615384
Sports : 23008.898550724636
Games : 22788.6696905016
News : 21248.023255813954
Productivity : 21028.410714285714
Utilities : 18684.456790123455
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Business : 7491.117647058823
Education : 7003.983050847458
Catalogs : 4004.0
Medical : 612.0


##### Visualizing Results for IOS
Below I visualize some of the specific apps in each genre. Later I will compare this results to the Google Play results to decide a good genre to start a venture in both platforms. 

In [41]:
for i in ios_cleaned:
    if i[-5] == 'Navigation':
        print(i[1], ' ' , i[5])

Waze - GPS Navigation, Maps & Real-time Traffic   345046
Google Maps - Navigation & Transit   154911
Geocaching®   12811
CoPilot GPS – Car Navigation & Offline Maps   3582
ImmobilienScout24: Real Estate Search in Germany   187
Railway Route Search   5


Navigation app is clearly extremely concentrated and would make no sense to start a competition with Waze and Google in a small project.

In [43]:
for i in ios_cleaned:
    if i[-5] == 'Reference':
        print(i[1], ' ' , i[5])

Bible   985920
Dictionary.com Dictionary & Thesaurus   200047
Dictionary.com Dictionary & Thesaurus for iPad   54175
Google Translate   26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran   18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition   17588
Merriam-Webster Dictionary   16849
Night Sky   12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE)   8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools   4693
GUNS MODS for Minecraft PC Edition - Mods Tools   1497
Guides for Pokémon GO - Pokemon GO News and Cheats   826
WWDC   762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free   718
VPN Express   14
Real Bike Traffic Rider Virtual Reality Glasses   8
教えて!goo   0
Jishokun-Japanese English Dictionary & Translator   0


Another concentrated category, **but gave me an idea that I will share some cells below.**

In [45]:
for i in ios_cleaned:
    if i[-5] == 'Social Networking':
        print(i[1], ' ' , i[5])

Facebook   2974676
Pinterest   1061624
Skype for iPhone   373519
Messenger   351466
Tumblr   334293
WhatsApp Messenger   287589
Kik   260965
ooVoo – Free Video Call, Text and Voice   177501
TextNow - Unlimited Text + Calls   164963
Viber Messenger – Text & Call   164249
Followers - Social Analytics For Instagram   112778
MeetMe - Chat and Meet New People   97072
We Heart It - Fashion, wallpapers, quotes, tattoos   90414
InsTrack for Instagram - Analytics Plus More   85535
Tango - Free Video Call, Voice and Chat   75412
LinkedIn   71856
Match™ - #1 Dating App.   60659
Skype for iPad   60163
POF - Best Dating App for Conversations   52642
Timehop   49510
Find My Family, Friends & iPhone - Life360 Locator   43877
Whisper - Share, Express, Meet   39819
Hangouts   36404
LINE PLAY - Your Avatar World   34677
WeChat   34584
Badoo - Meet New People, Chat, Socialize.   34428
Followers + for Instagram - Follower Analytics   28633
GroupMe   28260
Marco Polo Video Walkie Talkie   27662
Miitomo   2

Everyone knows the famous social media, but apparently there are a lot of medium small size social medias with success...

In [47]:
for i in ios_cleaned:
    if i[-5] == 'Productivity':
        print(i[1], ' ' , i[5])

Evernote - stay organized   161065
Gmail - email by Google: secure, fast & organized   135962
iTranslate - Language Translator & Dictionary   123215
Yahoo Mail - Keeps You Organized!   113709
Google Docs   64259
Google Drive - free online storage   59255
Dropbox   49578
Microsoft Word   47999
Microsoft OneNote   39638
Microsoft Outlook - email and calendar   32807
Hotspot Shield Free VPN Proxy & Wi-Fi Privacy   32499
Documents 6 - File manager, PDF reader and browser   29110
Google Sheets   24602
Microsoft Excel   24430
Inbox by Gmail   21561
T-Mobile   19977
Paper by FiftyThree - Sketch, Diagram, Take Notes   18219
MyScript Calculator - Handwriting calculator   16555
VPN Proxy Master - Unlimited WiFi security VPN   13674
Microsoft OneDrive – File & photo cloud storage   12797
Ever - Capture Your Memories   12755
Speak & Translate － Voice and Text Translator   12062
Tayasui Sketches   11505
Drawing Desk - Draw, Paint, Doodle & Sketch board   11040
Microsoft PowerPoint   10939
Email - F

Productivity apps, besides Microsoft, Google and a few others don't look like a good market at App Store.

#### Finding Most Popular Genres Per App Population at Google Play

In [50]:
display_table(gplay_cleaned, 5)

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


It is possible to see that the number of installs is not very specific about the real number> to make our analysis faster, I am going to consider that '1,000,000+' equal to 1,000,000, so it is kinda a conservative estimation. Anyway, to do that i will perform a code similar to the one at App Store, but will use the **replace** method to make possible to transform the **string** in a **float**.  

In [52]:
gplay_genres = freq_table(gplay_cleaned, 1)

category_avg_installs = []

for category in gplay_genres:
    total = 0
    len_category = 0
    for i in gplay_cleaned:
        category_app = i[1]
        if category_app == category:
            n_installs = i[5]
            n_installs = n_installs.replace('+', '')
            n_installs = n_installs.replace(',', '')
            n_installs = float(n_installs)
            total += n_installs
            len_category += 1
    avg_installs = total / len_category
    category_avg_installs.append((category,avg_installs))
    
category_avg_installs_sorted = sorted(category_avg_installs, key = lambda x: x[1], reverse = True)

for category, avg in category_avg_installs_sorted:
    print(category, ':', avg)
    

COMMUNICATION : 38456119.167247385
VIDEO_PLAYERS : 24727872.452830188
SOCIAL : 23253652.127118643
PHOTOGRAPHY : 17840110.40229885
PRODUCTIVITY : 16787331.344927534
GAME : 15588015.603248259
TRAVEL_AND_LOCAL : 13984077.710144928
ENTERTAINMENT : 11640705.88235294
TOOLS : 10801391.298666667
NEWS_AND_MAGAZINES : 9549178.467741935
BOOKS_AND_REFERENCE : 8767811.894736841
SHOPPING : 7036877.311557789
PERSONALIZATION : 5201482.6122448975
WEATHER : 5074486.197183099
HEALTH_AND_FITNESS : 4188821.9853479853
MAPS_AND_NAVIGATION : 4056941.7741935486
FAMILY : 3695641.8198090694
SPORTS : 3638640.1428571427
ART_AND_DESIGN : 1986335.0877192982
FOOD_AND_DRINK : 1924897.7363636363
EDUCATION : 1833495.145631068
BUSINESS : 1712290.1474201474
LIFESTYLE : 1437816.2687861272
FINANCE : 1387692.475609756
HOUSE_AND_HOME : 1331540.5616438356
DATING : 854028.8303030303
COMICS : 817657.2727272727
AUTO_AND_VEHICLES : 647317.8170731707
LIBRARIES_AND_DEMO : 638503.734939759
PARENTING : 542603.6206896552
BEAUTY : 51315

##### Visualizing Results for Google Play

In [54]:
for i in gplay_cleaned:
    if i[1] == 'COMMUNICATION':
        print(i[0], ' ' , i[5])

WhatsApp Messenger   1,000,000,000+
Messenger for SMS   10,000,000+
My Tele2   5,000,000+
imo beta free calls and text   100,000,000+
Contacts   50,000,000+
Call Free – Free Call   5,000,000+
Web Browser & Explorer   5,000,000+
Browser 4G   10,000,000+
MegaFon Dashboard   10,000,000+
ZenUI Dialer & Contacts   10,000,000+
Cricket Visual Voicemail   10,000,000+
TracFone My Account   1,000,000+
Xperia Link™   10,000,000+
TouchPal Keyboard - Fun Emoji & Android Keyboard   10,000,000+
Skype Lite - Free Video Call & Chat   5,000,000+
My magenta   1,000,000+
Android Messages   100,000,000+
Google Duo - High Quality Video Calls   500,000,000+
Seznam.cz   1,000,000+
Antillean Gold Telegram (original version)   100,000+
AT&T Visual Voicemail   10,000,000+
GMX Mail   10,000,000+
Omlet Chat   10,000,000+
My Vodacom SA   5,000,000+
Microsoft Edge   5,000,000+
Messenger – Text and Video Chat for Free   1,000,000,000+
imo free video calls and chat   500,000,000+
Calls & Text by Mo+   5,000,000+
free 

In [55]:
for i in gplay_cleaned:
    if i[1] == 'SOCIAL':
        print(i[0], ' ' , i[5])

Facebook   1,000,000,000+
Facebook Lite   500,000,000+
Tumblr   100,000,000+
Social network all in one 2018   100,000+
Pinterest   100,000,000+
TextNow - free text + calls   10,000,000+
Google+   1,000,000,000+
The Messenger App   1,000,000+
Messenger Pro   1,000,000+
Free Messages, Video, Chat,Text for Messenger Plus   1,000,000+
Telegram X   5,000,000+
The Video Messenger App   100,000+
Jodel - The Hyperlocal App   1,000,000+
Hide Something - Photo, Video   5,000,000+
Love Sticker   1,000,000+
Web Browser & Fast Explorer   5,000,000+
LiveMe - Video chat, new friends, and make money   10,000,000+
VidStatus app - Status Videos & Status Downloader   5,000,000+
Love Images   1,000,000+
Web Browser ( Fast & Secure Web Explorer)   500,000+
SPARK - Live random video chat & meet new people   5,000,000+
Golden telegram   50,000+
Facebook Local   1,000,000+
Meet – Talk to Strangers Using Random Video Chat   5,000,000+
MobilePatrol Public Safety App   1,000,000+
💘 WhatsLov: Smileys of love, sti

In [56]:
for i in gplay_cleaned:
    if i[1] == 'BOOKS_AND_REFERENCE':
        print(i[0], ' ' , i[5])

E-Book Read - Read Book for free   50,000+
Download free book with green book   100,000+
Wikipedia   10,000,000+
Cool Reader   10,000,000+
Free Panda Radio Music   100,000+
Book store   1,000,000+
FBReader: Favorite Book Reader   10,000,000+
English Grammar Complete Handbook   500,000+
Free Books - Spirit Fanfiction and Stories   1,000,000+
Google Play Books   1,000,000,000+
AlReader -any text book reader   5,000,000+
Offline English Dictionary   100,000+
Offline: English to Tagalog Dictionary   500,000+
FamilySearch Tree   1,000,000+
Cloud of Books   1,000,000+
Recipes of Prophetic Medicine for free   500,000+
ReadEra – free ebook reader   1,000,000+
Anonymous caller detection   10,000+
Ebook Reader   5,000,000+
Litnet - E-books   100,000+
Read books online   5,000,000+
English to Urdu Dictionary   500,000+
eBoox: book reader fb2 epub zip   1,000,000+
English Persian Dictionary   500,000+
Flybook   500,000+
All Maths Formulas   1,000,000+
Ancestry   5,000,000+
HTC Help   10,000,000+
E

In [57]:
# Performing the same code to see the average rating (0-5) in each genre 
ios_ratings = freq_table(ios_cleaned, -5)

rating_avg = []

for genre in ios_ratings:
    total = 0
    len_ratings = 0
    for i in ios_cleaned:
        genre_app = i[-5]
        if genre_app == genre:            
            nn_ratings = float(i[7])
            total += nn_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    rating_avg.append((genre, avg_n_ratings))
    
# Sorting the list of tuples in descending order based on the average ratings
genre_avg_ratings_sorted = sorted(rating_avg, key=lambda x: x[1], reverse=True)

# Printing the sorted results
for genre, avg in genre_avg_ratings_sorted:
    print(genre, ':', avg)

Games : 3.5253960857409132
Social Networking : 3.4017857142857144
Photo & Video : 2.295955882352941
Entertainment : 0.31193615544760583
Education : 0.13640699523052463
Shopping : 0.1319224683544304
Utilities : 0.11896838602329451
Music : 0.11776672694394213
Health & Fitness : 0.10675381263616558
Sports : 0.0710446758481693
Productivity : 0.06997813183380194
Lifestyle : 0.06621004566210045
Travel : 0.05707855973813421
News : 0.05425904317386231
Weather : 0.04197158846319415
Finance : 0.04013875123885035
Food & Drink : 0.03249656121045392
Reference : 0.029596412556053813
Business : 0.020975761342448725
Book : 0.014376462721497826
Navigation : 0.00892510671323244
Medical : 0.0055762081784386614
Catalogs : 0.005121042830540037


### CONCLUSION
Looking inside the genres at App Store and the Categories at Google Play an idea emerged in my mind. Apparently there is a big demand for religious apps like The Bible, The Quran and others. At the same time, the social media at the mean of installs does not have a great success, but shows on average, a good number of installs. Another thing I realized is that social medias of live video chats also have a good number of downloads.

So, combining the 3, would make sense to test an MVP of a video chat social media where people could connect with others from their religion. I don't see this idea as plausible for subscription.