# App data analysis

This project is about understanding what kind of apps should the company be focusing on producing to acheive more users.

The goal is to analyze data to help our developers understand what type of apps are likely to attract more users.

Here are two data sets that seem suitable for our goals::
- Google Play: https://www.kaggle.com/datasets/lava18/google-play-store-apps
- App Store: https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps

First, we import the dataset as a list of lists

In [1]:
from csv import reader
#AppStore dataset
opened_appstore = open('AppleStore.csv')
read_appstore = reader(opened_appstore)
data_ios = list(read_appstore) #All ios data
header_ios = data_ios[0] #Header Ios data
ios = data_ios[1:] #AppStore data without header

#Google Play dataset
opened_googleplay = open('googleplaystore.csv')
read_googleplay = reader(opened_googleplay)
data_gplay = list(read_googleplay) #All android data
header_gplay = data_gplay[0] #Header android data
gplay = data_gplay[1:] #Android data without header


We create a function to explore the dataset by extracting a slice of the rows to be printed:
The **explore_data()** function does the following:

Takes in four parameters:
dataset(list of lists)
start and end, (integers that represent the starting and the ending indices of a slice from the dataset)
rows_and_columns(Boolean and has False as a default argument)

First it Slices the dataset using dataset[start:end]
Loops through the slice, and for each iteration, prints a row and adds a new line after that row using print('\n')
The \n in print('\n') is a special character that won't print. Instead, the \n character adds a new line, and we use print('\n') to add some blank space between rows
Prints the number of rows and columns if rows_and_columns is True
dataset shouldn't have a header row, or the function will print the wrong number of rows (one more row compared to the actual length)


In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [3]:
explore_data(gplay, 1, 3, 'rows_and_columns'==True)

['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']




In [4]:
explore_data(ios, 1, 3, True)

['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


After we explore what kind of data we will be using in this project, we need to see what columns of data could be useful for our purpose, so we print the header of each dataset to see what we could use.

In [5]:
print(data_ios[0])
print('\n')
print(data_gplay[0])

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


# Data Cleaning
Before beginning our analysis, we need to make sure the data we analyze is accurate, or the results of our analysis will be wrong. 
- Detect inaccurate data, and correct or remove it.
- Detect duplicate data, and remove the duplicates.

For this, we will:
- Remove non-English apps like 爱奇艺PPS -《欢乐颂2》电视剧热播.
- Remove apps that aren't free.

For this we read the discussions on https://www.kaggle.com/datasets/lava18/google-play-store-apps/discussion and realized there was a row without the information of one of its column. 
To see which app has missing data, we created a loop that compares the number of values stored in each row with the number of columns of the header row. 
When these number dont add up, the function prints out the row and the index of the row so we can later delete it

In [6]:
print(data_gplay[0],'\n')

print(len(header_gplay),'\n')

for row in gplay:
    if len(row) != len(header_gplay):
        print(row)
        print("\n")
        print("Index postion is:", gplay.index(row))

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] 

13 

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


Index postion is: 10472


With this information we print out the row in question, then we delete it and afterwards we check if the deletion was successful.

In [7]:
print(gplay[10472])

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


In [8]:
del (gplay[10472])

In [9]:
print(gplay[10472])

['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']


We see that the row in question no longer exists, so we repeat this process for the ios dataset to confirm that all of the rows have the neccessary data

In [10]:
print(header_ios,'\n')

print(len(header_ios))
for row in ios:
    if len(row) != len(header_ios):
        print(row)
        print("\n")
        print("Index postion is:", ios.index(row))

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'] 

16


# Duplicate apps
We see that there is no missing data in the ios dataset,so next we will create a loop to identify duplicated data.
This loop goes through each row of the dataset without header and adds them to a separate list of duplicate data if the row already exists in the unique list 

In [11]:
duplicate_apps = []
unique_apps = []

for apps in gplay:
    name = apps[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of duplicate android apps:', len(duplicate_apps))
print('\n')
print('Number of expected android apps without duplicates:', (len(gplay)-len(duplicate_apps)))
print('\n')
print('Examples of duplicate android apps:', duplicate_apps[:15])

duplicate_ios_apps = []
unique_ios_apps = []

print('\n')

for apps in ios:
    name = apps[0]
    if name in unique_ios_apps:
        duplicate_ios_apps.append(name)
    else:
        unique_ios_apps.append(name)
        
print('Number of duplicate ios apps:', len(duplicate_ios_apps))
print('\n')
print('Number of expected ios apps without duplicates:', len(ios))
print('\n')
print('Examples of duplicate ios apps:', duplicate_ios_apps[:15])


Number of duplicate android apps: 1181


Number of expected android apps without duplicates: 9659


Examples of duplicate android apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


Number of duplicate ios apps: 0


Number of expected ios apps without duplicates: 7197


Examples of duplicate ios apps: []


We can see that there are no duplicate data for the ios dataset but there are 1181 duplicated entries in the android dataset.

# Delete duplicated android
Seeing that there are many duplicated android apps, we need to delete those that are somehow less updated. For this, we recognized that the N° of reviews differs significantly, so we will try to mantain de data from the apps that have the most reviews.
1. Create a dictionary where each key is a unique app name and the corresponding dictionary value is the highest number of reviews of that app.
2. Use the dictionary we created  to remove the duplicate rows

For the first step we will iterate over the google apps to identify the names and reviews columns, and use them to create a new dictionary that has unique keys for each app associated with the highest value of reviews found in the dataset.

In [12]:
reviews_max = {}
for rows in gplay:
    name = rows[0]
    n_reviews = float(rows[3])
    if name in reviews_max and reviews_max[name]<n_reviews:
        reviews_max[name]:n_reviews
    elif name not in reviews_max:
        reviews_max[name]=n_reviews
        
print('Number of rows in android dictionary:',len(reviews_max))
print('Number of expected android apps without duplicates: 9659')
        

Number of rows in android dictionary: 9659
Number of expected android apps without duplicates: 9659


With this dictionary at hand, we can create a new list with all the clean data by adding only the rows of the apps that match the highest value of reviews saved in the dictionary.

In [13]:
gplay_clean = []
already_added = []

for rows in gplay:
    name = rows[0]
    n_reviews = float(rows[3])
    if n_reviews == reviews_max[name] and name not in already_added:
        gplay_clean.append(rows)
        already_added.append(name)
print('Length of clean android dataset:',len(gplay_clean))
#We don't need to do the same for the App Store data because there are no duplicates 

Length of clean android dataset: 9659


# Delete non_english apps
Write a function that takes in a string and returns False if there's any character in the string that doesn't belong to the set of common English characters; otherwise, the function returns True.
This function takes in a string and loops through the name of the app to see if each character belongs to the english dictionary, which acording to the ASCII correspond are associated to numbers below 127. If there are more than 3 characters outside this range, the function returns False. If not, then the name is in english it returns True.

In [14]:
def check_lang(name):
    count = 0 #Empty variable to store count of non encglish characters
    for characters in name:
       
        if ord(characters) > 127:
            count += 1
    if count > 3:
        return False
    else:
        return True

check_lang('爱奇艺PPS -《欢乐颂2》电视剧热播')

False

In [15]:
print(ios[813][1])
print(check_lang(ios[813][1]))

爱奇艺PPS -《欢乐颂2》电视剧热播
False


In [16]:
android_eng_apps = []
ios__eng_apps = []
for apps in gplay_clean:
    app_name = apps[0]
    if check_lang(app_name):
        android_eng_apps.append(apps)

for apps in ios:
    app_name = apps[1]
    if check_lang(app_name):
        ios__eng_apps.append(apps)
#pendiente hacer para ios (cual es la lista final, limpia? revisar todo proyecto, ordenar y seguir)

print('Length of android apps in english:', len(android_eng_apps))
print('\n')
print('Length of ios apps in english:', len(ios__eng_apps))
     
        

Length of android apps in english: 9614


Length of ios apps in english: 6183


We can see that we are left with:
- 9614 Android apps in english
- 6183 Ios apps in english

In [17]:
explore_data(android_eng_apps, 0, 3, True)
explore_data(ios__eng_apps, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 9614
Number of columns: 13
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 

# Isolating free apps
The final step in the cleaning process is isolating the free apps. For this we loop through each dataset to identify all the free apps and create a new list with only these apps.

# Problem: final android apps should be 8864 but it gives 8862

In [18]:
android_final = []
ios_final = []
fuera = []
for rows in android_eng_apps:
    price = rows[7]
    tag = rows[6]
    if price == '0':
        android_final.append(rows)
    else:
        fuera.append(rows)
for rows in ios__eng_apps:
    price = rows[4]
    if price == '0.0':
        ios_final.append(rows)
        
print('\n')
explore_data(android_final, 0, 3, True)
print('\n')
explore_data(ios_final, 0, 3, True)



['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 8862
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9

# Most Common Apps by Genre: Part One

Because our end goal is to add the app on both Google Play and the App Store, we need to find app profiles that are successful in both markets

We need to create frequency tables to determine the most common genres in each market (ios - android)
(Explain more blah blah)

In [19]:
print(header_gplay)
print('\n')
print(android_final[0])
print('\n')
print(header_ios)
print('\n')
print(ios_final[0])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


So for android we have category in [1] place and genres in [9] place that we could associate with Installs [5] or reviews [3]

For ios we have prime_genre in [-5] place that we could associate with rating count total [5]

In [20]:
#"Frequency table" function to use in "display_table" function
def freq_table(dataset,index): #Takes in a list of lists and an integer that defines the column
    frequency = {}
    total = 0 #Needed to calculate the %
    for app in dataset:
        total += 1
        column = app[index]
        if column in frequency:
            frequency[column] += 1
        else:
            frequency[column] = 1
    
    freq_in_perc = {}
    for key in frequency:
        percentage = (frequency[key]/total) * 100
        freq_in_perc[key] = percentage
    return freq_in_perc

In [21]:
freq_category_android = freq_table(android_final,1)
print(freq_category_android)

{'ART_AND_DESIGN': 0.6770480704129994, 'AUTO_AND_VEHICLES': 0.9252990295644324, 'BEAUTY': 0.598059128864816, 'BOOKS_AND_REFERENCE': 2.143985556307831, 'BUSINESS': 4.5926427443015125, 'COMICS': 0.6206273978785828, 'COMMUNICATION': 3.238546603475513, 'DATING': 1.8618821936357481, 'EDUCATION': 1.2863913337846988, 'ENTERTAINMENT': 1.128413450688332, 'EVENTS': 0.7109004739336493, 'FINANCE': 3.7011961182577298, 'FOOD_AND_DRINK': 1.2412547957571656, 'HEALTH_AND_FITNESS': 3.080568720379147, 'HOUSE_AND_HOME': 0.8350259535093659, 'LIBRARIES_AND_DEMO': 0.9365831640713158, 'LIFESTYLE': 3.9043105393816293, 'GAME': 9.873617693522906, 'FAMILY': 18.449559918754233, 'MEDICAL': 3.5206499661475967, 'SOCIAL': 2.663055743624464, 'SHOPPING': 2.2455427668697814, 'PHOTOGRAPHY': 2.945159106296547, 'SPORTS': 3.39652448657188, 'TRAVEL_AND_LOCAL': 2.335815842924848, 'TOOLS': 8.440532611148726, 'PERSONALIZATION': 3.3175355450236967, 'PRODUCTIVITY': 3.8930264048747465, 'PARENTING': 0.6544798013992327, 'WEATHER': 0.

In [22]:
def display_table(dataset, index):
    table = freq_table(dataset, index) #We use the previous function to take in the same parameters as this function
    table_display = [] #new empty list as tuple
    for key in table:
        key_val_as_tuple = (table[key], key) # We have to switch the value and key so the "sorted" function can work
        table_display.append(key_val_as_tuple) # Save it as a new list

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

In [23]:
#Ios english app "prime_genre" frecuency table
ios_prime_genre_freq = display_table(ios_final, -5)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


In [24]:
#Android english app "genre" frecuency table
android_genre_freq = display_table(android_final, 9)

Tools : 8.429248476641842
Entertainment : 6.070864364703228
Education : 5.348679756262695
Business : 4.5926427443015125
Productivity : 3.8930264048747465
Lifestyle : 3.8930264048747465
Finance : 3.7011961182577298
Medical : 3.5206499661475967
Sports : 3.4642292936131795
Personalization : 3.3175355450236967
Communication : 3.238546603475513
Action : 3.1031369893929135
Health & Fitness : 3.080568720379147
Photography : 2.945159106296547
News & Magazines : 2.798465357707064
Social : 2.663055743624464
Travel & Local : 2.324531708417964
Shopping : 2.2455427668697814
Books & Reference : 2.143985556307831
Simulation : 2.0424283457458814
Dating : 1.8618821936357481
Arcade : 1.8505980591288649
Video Players & Editors : 1.7716091175806816
Casual : 1.7603249830737984
Maps & Navigation : 1.399232678853532
Food & Drink : 1.2412547957571656
Puzzle : 1.128413450688332
Racing : 0.9930038366057323
Role Playing : 0.9365831640713158
Libraries & Demo : 0.9365831640713158
Auto & Vehicles : 0.92529902956443

In [25]:
#Android english app "category" frecuency table
android_category_freq = display_table(android_final, 1)

FAMILY : 18.449559918754233
GAME : 9.873617693522906
TOOLS : 8.440532611148726
BUSINESS : 4.5926427443015125
LIFESTYLE : 3.9043105393816293
PRODUCTIVITY : 3.8930264048747465
FINANCE : 3.7011961182577298
MEDICAL : 3.5206499661475967
SPORTS : 3.39652448657188
PERSONALIZATION : 3.3175355450236967
COMMUNICATION : 3.238546603475513
HEALTH_AND_FITNESS : 3.080568720379147
PHOTOGRAPHY : 2.945159106296547
NEWS_AND_MAGAZINES : 2.798465357707064
SOCIAL : 2.663055743624464
TRAVEL_AND_LOCAL : 2.335815842924848
SHOPPING : 2.2455427668697814
BOOKS_AND_REFERENCE : 2.143985556307831
DATING : 1.8618821936357481
VIDEO_PLAYERS : 1.782893252087565
MAPS_AND_NAVIGATION : 1.399232678853532
EDUCATION : 1.2863913337846988
FOOD_AND_DRINK : 1.2412547957571656
ENTERTAINMENT : 1.128413450688332
LIBRARIES_AND_DEMO : 0.9365831640713158
AUTO_AND_VEHICLES : 0.9252990295644324
HOUSE_AND_HOME : 0.8350259535093659
WEATHER : 0.8011735499887158
EVENTS : 0.7109004739336493
ART_AND_DESIGN : 0.6770480704129994
PARENTING : 0.65

# Most popular apps by genre


calculating the average number of user ratings per app genre on the App Store

In [26]:
freq_table_ios = freq_table(ios_final, -5)
for genre in freq_table_ios:
    total = 0
    len_genre = 0
    for apps in ios_final:
        genre_app = apps[-5]
        if genre_app == genre:
            rating_count = float(apps[5])
            total += rating_count
            len_genre += 1
    avg_usr_rating = total /len_genre
    print(genre, freq_table_ios[genre], ":", avg_usr_rating)


    

Social Networking 3.2898820608317814 : 71548.34905660378
Photo & Video 4.9658597144630665 : 28441.54375
Games 58.16263190564867 : 22788.6696905016
Music 2.0484171322160147 : 57326.530303030304
Reference 0.5586592178770949 : 74942.11111111111
Health & Fitness 2.0173805090006205 : 23298.015384615384
Weather 0.8690254500310366 : 52279.892857142855
Utilities 2.5139664804469275 : 18684.456790123455
Travel 1.2414649286157666 : 28243.8
Shopping 2.60707635009311 : 26919.690476190477
News 1.3345747982619491 : 21248.023255813954
Navigation 0.186219739292365 : 86090.33333333333
Lifestyle 1.5828677839851024 : 16485.764705882353
Entertainment 7.883302296710118 : 14029.830708661417
Food & Drink 0.8069522036002483 : 33333.92307692308
Sports 2.1415270018621975 : 23008.898550724636
Book 0.4345127250155183 : 39758.5
Finance 1.1173184357541899 : 31467.944444444445
Education 3.662321539416512 : 7003.983050847458
Productivity 1.7380509000620732 : 21028.410714285714
Business 0.5276225946617008 : 7491.117647

# Recomendation
Based on the ios dataset, we could recomend 

Android 


In [27]:
display_table(android_final, 5)

1,000,000+ : 15.730083502595352
100,000+ : 11.543669600541637
10,000,000+ : 10.550665763935905
10,000+ : 10.212141728729407
1,000+ : 8.395396073121193
100+ : 6.917174452719477
5,000,000+ : 6.82690137666441
500,000+ : 5.574362446400361
50,000+ : 4.773188896411646
5,000+ : 4.513653802753328
10+ : 3.5432182351613632
500+ : 3.2498307379823967
50,000,000+ : 2.279395170390431
100,000,000+ : 2.1214172872940646
50+ : 1.9183028661701647
5+ : 0.7898894154818324
1+ : 0.5077860528097494
500,000,000+ : 0.2708192281651997
1,000,000,000+ : 0.22568269013766643
0+ : 0.045136538027533285
0 : 0.011284134506883321


In [31]:
freq_table_android = freq_table(android_final, 1)
for category in freq_table_android:
    total = 0
    len_category = 0
    for apps in android_final:
        categories = apps[1]
        if categories == category:
            installs = (apps[5].replace('+', ''))
            installs_count = float(installs.replace(',', ''))
            total += installs_count
            len_category += 1
    avg_usr_installs = total /len_genre
    print(category, ":", avg_usr_installs)

ART_AND_DESIGN : 19053516.666666668
AUTO_AND_VEHICLES : 8846676.833333334
BEAUTY : 4532841.666666667
BOOKS_AND_REFERENCE : 277647376.6666667
BUSINESS : 116150348.33333333
COMICS : 7495191.666666667
COMMUNICATION : 1839484366.8333333
DATING : 23485792.833333332
EDUCATION : 58558333.333333336
ENTERTAINMENT : 352243333.3333333
EVENTS : 2662193.3333333335
FINANCE : 75860522.0
FOOD_AND_DRINK : 35289791.833333336
HEALTH_AND_FITNESS : 190591400.33333334
HOUSE_AND_HOME : 16202076.833333334
LIBRARIES_AND_DEMO : 8832635.0
LIFESTYLE : 82914071.5
GAME : 2309644908.3333335
FAMILY : 733465948.3333334
MEDICAL : 6272057.333333333
SOCIAL : 914643650.3333334
SHOPPING : 233389764.16666666
PHOTOGRAPHY : 774544802.5
SPORTS : 182538447.16666666
TRAVEL_AND_LOCAL : 482450681.0
TOOLS : 1333340579.0
PERSONALIZATION : 254872648.0
PRODUCTIVITY : 965271552.3333334
PARENTING : 5245168.333333333
WEATHER : 60048086.666666664
VIDEO_PLAYERS : 654455286.6666666
NEWS_AND_MAGAZINES : 394699376.6666667
MAPS_AND_NAVIGATION 