## Apple AppStore and Google Play data analysis

**In this project, we analyze data about mobile apps from the Apple AppStore and Google Play.
The goal is to help our developers understand which types of apps are the most promising.**

**Goals:**

* To collect and analyze data about mobile apps available on Google Play and the Apple AppStore
* To clean the data and prepare it for analysis.
* To analyze the cleaned data.



## Functions

In [1]:
############################################################################
#Open a csv file and return its rows as a list of lists

from csv import reader

def open_dataset (file_name):
    # The with-statement closes the file once all rows have been read
    with open (file_name, encoding='utf8', newline='') as opened_file:
        data = list (reader(opened_file))
    return data
    
############################################################################

In [2]:
############################################################################
#Print a slice of a data set and, optionally, the number of rows and columns

def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line between rows
        
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
        
############################################################################  

In [3]:
############################################################################
#Print a header row in readable form

def print_header(app_header):    
    separator  = '\n'
    joined =  separator.join(app_header)
    print (joined)                          
    print ('\n')

############################################################################

In [4]:
############################################################################
#Create a frequency table for any column in a data set

def freq_dict (index, any_list):
    frequency_dict = {}
    
    for row in any_list:
        name = row [index]
        if name in frequency_dict:
            frequency_dict [name] += 1
        else:
            frequency_dict [name] = 1
    return frequency_dict    
            
############################################################################   

In [5]:
############################################################################
#Return False if the string contains more than three non-ASCII characters
#(code point above 127); otherwise return True. Up to three such characters are
#tolerated so that English names with an emoji or a symbol such as ™ are kept.
def is_english(string):  
    count = 0
    for character in string:
        if ord(character) > 127:
            count = count +1
    if count > 3:
        return False
    else:
        return True
    
############################################################################
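
A quick check of the threshold, as a minimal sketch. The function is restated in a compact form so that the block runs on its own; the sample names are taken from outputs that appear later in this notebook.

```python
def is_english(string):
    # Count characters outside the ASCII range (code point above 127)
    non_ascii = sum(1 for character in string if ord(character) > 127)
    # Tolerate up to three such characters (emoji, trademark signs and similar)
    return non_ascii <= 3

print(is_english('Instagram'))                       # True
print(is_english('Docs To Go™ Free Office Suite'))   # True: one non-ASCII character
print(is_english('Wattpad 📖 Free Books'))            # True: one non-ASCII character
print(is_english('РИА Новости'))                     # False: ten non-ASCII characters
```

The check is a heuristic: a name written in another language with ASCII-only letters would still pass.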

In [6]:
############################################################################
#Print the first 10 rows of a data set in a readable format
def print_data_set (data_set):
    for row in data_set [:10]:
        print (row)
        print('\n')

############################################################################


In [7]:
############################################################################ 
#Isolate free apps

def free_apps (price_column, app):
    free_app = []
    no_free_app = []

    for row in app:
        price = row [price_column]
        if price == '0' or price == '0.0':
            free_app.append (row)
        else:
            no_free_app.append (row)
    return free_app, no_free_app        
            
############################################################################   



## Description of Google Play Store Apps columns:

* **App** - Application name.
* **Category** - Category the app belongs to.
* **Rating** - Overall user rating of the app (as when scraped).
* **Reviews** - Number of user reviews for the app (as when scraped).
* **Size** - Size of the app (as when scraped).
* **Installs** - Number of user downloads/installs for the app (as when scraped).
* **Type** - Paid or Free.
* **Price** - Price of the app (as when scraped).
* **Content Rating** - Age group the app is targeted at - Children / Mature 21+ / Adult.
* **Genres** - An app can belong to multiple genres (apart from its main category). For example, a musical family game will belong to Music, Game, Family genres.
* **Last Updated** - Date when the app was last updated on Play Store (as when scraped).
* **Current Ver** - Current version of the app available on Play Store (as when scraped).
* **Android Ver** - Min required Android version (as when scraped).

More information about this dataset is available here: [Google Play Store Apps](https://www.kaggle.com/lava18/google-play-store-apps)

In [8]:
android_apps = open_dataset ('googleplaystore.csv')
android_apps_header = android_apps [0]
android_apps = android_apps [1:]

print ('---------- Android apps ----------')
print ('\n')
print ('The header row: ')
print ('\n')
print_header (android_apps_header)
print ('\n')

explore_data (android_apps, 0,3, True)

---------- Android apps ----------


The header row: 


App
Category
Rating
Reviews
Size
Installs
Type
Price
Content Rating
Genres
Last Updated
Current Ver
Android Ver




['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13



## Description of Apple AppStore columns:
* **id** - Application ID
* **track_name** - App Name
* **size_bytes** - Size of the app (in bytes)
* **currency** - Currency Type
* **price** - Price amount
* **rating_count_tot** - User Rating counts (for all versions)
* **rating_count_ver** - User Rating counts (for the current version)
* **user_rating** - Average User Rating value (for all versions)
* **user_rating_ver** - Average User Rating value (for the current version)
* **ver** - Latest version code
* **cont_rating** - Content Rating
* **prime_genre** - Primary Genre
* **sup_devices.num** - Number of supporting devices
* **ipadSc_urls.num** - Number of screenshots shown for display
* **lang.num** - Number of supported languages
* **vpp_lic** - Vpp Device Based Licensing Enabled

More information about this dataset is available here: [Mobile App Store](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/#AppleStore.csv)

In [9]:
ios_apps = open_dataset ('AppleStore.csv')
ios_apps_header = ios_apps [0]
ios_apps = ios_apps [1:]

print ('---------- Ios apps ----------')
print ('\n')
print ('The header row: ')
print ('\n')
print_header (ios_apps_header)
print ('\n')

explore_data (ios_apps, 0,3, True)

---------- Ios apps ----------


The header row: 



id
track_name
size_bytes
currency
price
rating_count_tot
rating_count_ver
user_rating
user_rating_ver
ver
cont_rating
prime_genre
sup_devices.num
ipadSc_urls.num
lang.num
vpp_lic




['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']


Number of rows: 7197
Number of columns: 17




# Data cleaning

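*We need to delete rows with incorrect data from the Google Play dataset.* 

Row 10472 is malformed: it has 12 values instead of 13 because the Category value is missing, so every later value is shifted one column to the left. We need to delete this row. 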

In [10]:
print (android_apps [10472])

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
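
Rows like this one can also be found programmatically instead of by index. The sketch below compares each row's length with the header's length; `find_malformed_rows` is a hypothetical helper that is not part of the original notebook, and the sample rows are made up for illustration.

```python
def find_malformed_rows(header, dataset):
    """Return (index, row) pairs whose length differs from the header's length."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(dataset) if len(row) != expected]

# Tiny illustrative sample (hypothetical rows, not the real data set)
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'TOOLS', '4.2'],
    ['App B', '1.9'],             # the Category value is missing
    ['App C', 'SOCIAL', '4.5'],
]

print(find_malformed_rows(sample_header, sample_rows))
# [(1, ['App B', '1.9'])]
```

Calling `find_malformed_rows(android_apps_header, android_apps)` on the real data should report at least row 10472, since that row has 12 values against a 13-column header.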


In [11]:
#print ('The length of Google Play data set BEFORE we delete the row 10472:  ', len (android_apps))
del android_apps [10472]
#print ('The length of Google Play data set AFTER we delete the row 10472:  ', len (android_apps))

In [12]:
# A new row 10472
print (android_apps [10472])

['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']



# Part 1: Find duplicate data and remove the duplicates

## Frequency table

In the frequency table, each key is a unique app name and each value is the number of entries with that name.

The frequency table can help us:


* Detect duplicate data.
* Count the number of duplicates.
* Get the list of entry counts (the dictionary values).
* Find the maximum number of entries for a single app.
* Find the names of the apps that have the maximum number of entries.
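
The uses listed above can be sketched with `collections.Counter` from the standard library, which builds the same kind of frequency table as `freq_dict`. This is an alternative illustration rather than the notebook's own method, and the sample rows are hypothetical.

```python
from collections import Counter

# Hypothetical sample: each row holds only the app name (column 0)
sample_rows = [['ROBLOX'], ['Instagram'], ['ROBLOX'],
               ['Tumblr'], ['Instagram'], ['ROBLOX']]

# Frequency table: app name -> number of entries
name_counts = Counter(row[0] for row in sample_rows)

# Names that occur more than once
duplicates = {name: n for name, n in name_counts.items() if n > 1}

# Rows that would be dropped if one entry per app were kept
extra_rows = sum(n - 1 for n in duplicates.values())

# Largest number of entries, and the app(s) that reach it
max_entries = max(name_counts.values())
most_repeated = [name for name, n in name_counts.items() if n == max_entries]

print(duplicates)                   # {'ROBLOX': 3, 'Instagram': 2}
print(extra_rows)                   # 3
print(max_entries, most_repeated)   # 3 ['ROBLOX']
```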



In [13]:
# Create a frequency table for the Google Play Store Apps

freq_android_apps = freq_dict (0, android_apps)
print ('---------- Android apps ----------')
print ('\n')
print ("The number of rows in Google Play Store Apps: ", len (android_apps))
print ("The length of the frequency table: ", len(freq_android_apps))
print ("Number of duplicate apps: ", len (android_apps) - len(freq_android_apps))

---------- Android apps ----------


The number of rows in Google Play Store Apps:  10840
The length of the frequency table:  9659
Number of duplicate apps:  1181


*We can try some other interesting analyses.* 
*For example, we can find the names of the apps that have the most entries in the dataset.*

In [14]:
# The maximum number of entries for a single app

max_entries = max (freq_android_apps.values())
print (max_entries)

for key in freq_android_apps:
    if freq_android_apps[key] == max_entries:
        print ('App(s) with the max number of entries: \n', key, ':',  freq_android_apps [key])

9
App(s) with the max number of entries: 
 ROBLOX : 9


*To list the apps that appear more than once, grouped by the number of entries:*

In [15]:
for i in range(2, max_entries+1):
    print ('\n')
    print ('--------',i, 'times ----------')
    print ('\n')
    for key in freq_android_apps:
        if freq_android_apps[key] == i:
            print (key)



-------- 2 times ----------


Coloring book moana
Mcqueen Coloring pages
UNICORN - Color By Number & Pixel Art Coloring
Textgram - write on photos
Wattpad 📖 Free Books
Amazon Kindle
Dictionary - Merriam-Webster
NOOK: Read eBooks & Magazines
Oxford Dictionary of English : Free
Spanish English Translator
NOOK App for NOOK Devices
Ebook Reader
English Dictionary - Offline
Docs To Go™ Free Office Suite
OfficeSuite : Free Office + PDF Editor
Curriculum vitae App CV Builder Free Resume Maker
Facebook Pages Manager
Call Blocker
ZOOM Cloud Meetings
Facebook Ads Manager
SignEasy | Sign and Fill PDF and other Documents
Genius Scan - PDF Scanner
Tiny Scanner - PDF Scanner App
Fast Scanner : Free PDF Scan
Mobile Doc Scanner (MDScan) Lite
TurboScan: scan documents and receipts in PDF
Tiny Scanner Pro: PDF Doc Scan
Zenefits
FreshBooks Classic
Insightly CRM
HipChat - Chat Built for Teams
Xero Accounting Software
MailChimp - Email, Marketing Automation
Crew - Free Messaging and Scheduling
Asana: org

My Talking Tom
Pixel Art: Color by Number Game
Dream League Soccer 2018
Block Craft 3D: Building Simulator Games For Free
Hungry Shark Evolution
MARVEL Strike Force
Strawberry Shortcake BerryRush
Dog Run - Pet Dog Simulator
Toca Kitchen 2
PJ Masks: Moonlight Heroes
DisneyNOW – TV Shows & Games
Papumba Academy - Fun Learning For Kids
Equestria Girls
Frozen Free Fall
Elmo Calls by Sesame Street
Sago Mini Friends
Dr. Panda & Toto's Treehouse
Video Editor
Human Anatomy Atlas 2018: Complete 3D Human Body
Cardiac diagnosis (heart rate, arrhythmia)
Blood Pressure
Youper - AI Therapy
mySugr: the blood sugar tracker made just for you
Free Blood Pressure
Tumblr
Pinterest
Periscope - Live Video
Text Free: WiFi Calling App
SayHi Chat, Meet New People
POF Free Dating App
Amazon Shopping
Target - now with Cartwheel
Extreme Coupon Finder
Checkout 51: Grocery coupons
Gyft - Mobile Gift Card Wallet
RetailMeNot - Coupons, Deals & Discount Shopping
LivingSocial - Local Deals
Fancy
JackThreads: Men's Shop

In [16]:
# Print all rows of one duplicated app to confirm

for row in android_apps:
    if row [0] == 'Instagram':
        print (row)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


The rows differ only in the fourth value, which is the number of reviews. We shouldn't remove duplicates at random. We can, for example, keep the row with the highest number of reviews and remove the other entries for each app, or we can calculate the average number of reviews.
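
The first option, keeping the row with the highest number of reviews, could look like the sketch below. `keep_most_reviewed` is a hypothetical helper, the Instagram rows are shortened to their first four columns, and the `Example App` row is made up.

```python
def keep_most_reviewed(dataset, name_index=0, reviews_index=3):
    """Return one row per app name: the row with the largest review count."""
    best = {}
    for row in dataset:
        name = row[name_index]
        reviews = float(row[reviews_index])
        # Keep the first row seen for a name, or replace it with a better one
        if name not in best or reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

sample = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
    ['Example App', 'TOOLS', '4.0', '120'],   # hypothetical row
]

for row in keep_most_reviewed(sample):
    print(row)
# ['Instagram', 'SOCIAL', '4.5', '66577446']
# ['Example App', 'TOOLS', '4.0', '120']
```

The rest of this notebook uses the second option, the average number of reviews.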

In [17]:
# Create a list of unique names and a list of duplicate names

android_duplicate_apps = []
android_unique_apps = []

for row in android_apps:
    name = row[0]
    if name in android_unique_apps:
        android_duplicate_apps.append (name)
    else:
        android_unique_apps.append (name)

print (len(android_duplicate_apps))
print (len(android_unique_apps))           

1181
9659


To clean the data set, we create a dictionary in which each key is the name of a duplicated app and the corresponding value is the average number of reviews for that app.
We then use the information stored in the dictionary to create a new data set that has only one entry per app.

Every key in a dictionary is unique, so the keys of the frequency table give us a list of the unique app names in Google Play. 

In [18]:
unic_names_android_apps = list (freq_android_apps.keys())
len(unic_names_android_apps)

9659

Create a dictionary with the average number of reviews for each duplicated app

In [19]:
average_dictionary = {}

for element in android_duplicate_apps:
    value = 0
    for row in android_apps:
        if row [0] == element:
            value = value + float (row[3])
                
    average_value = value / freq_android_apps [element]
    average_dictionary [element] = round (average_value)      
    
print (len(average_dictionary))
print (average_dictionary ['ESPN']) 

798
521131
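
The cell above scans the whole data set once for every duplicate name. As a sketch of an alternative, the same averages can be collected in a single pass by accumulating a running total and a count per name. `average_reviews` is a hypothetical helper, the Instagram rows are shortened to four columns, and the `Example App` row is made up.

```python
from collections import defaultdict

def average_reviews(dataset, name_index=0, reviews_index=3):
    """Return {name: rounded average review count} for names that occur more than once."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        name = row[name_index]
        totals[name] += float(row[reviews_index])
        counts[name] += 1
    # Keep only duplicated names, as average_dictionary does
    return {name: round(totals[name] / counts[name])
            for name in totals if counts[name] > 1}

sample = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
    ['Example App', 'TOOLS', '4.0', '120'],   # hypothetical row, appears once
]

print(average_reviews(sample))   # {'Instagram': 66560497}
```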


In [20]:
#Create a new clean data set without duplicates
#Create two empty lists

android_apps_clean = []
android_apps_added = []

#Loop through the Google Play data set
for app in android_apps:
   
    if app[0] not in android_apps_added:
        if app[0] in average_dictionary:
            #Assign the average number of reviews to the Reviews column (index 3); the value is stored as an int, not a string
            app[3] = average_dictionary [app[0]]
        android_apps_clean.append (app)
        android_apps_added.append(app[0])
        
print (len (android_apps_clean)) 

for app in android_apps_clean:
    if app [0] == 'Instagram':
        print (app)    

9659
['Instagram', 'SOCIAL', '4.5', 66560497, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']



# Part 2: Remove apps with non-English names

Remove apps whose names contain more than three non-ASCII characters; such names are treated as non-English.

In [21]:
english_android_apps  = []
no_english_android_apps = []

for row in android_apps_clean:
    name = row[0]
    english = is_english(name)
    if english:
        english_android_apps.append (row)
    else:
        no_english_android_apps.append(row)
        
print (len (english_android_apps)) 
print (len (no_english_android_apps))
print (len (no_english_android_apps) + len (english_android_apps))

explore_data (no_english_android_apps ,0,5)
print ("---------------------------------------------------------------------------")
explore_data (english_android_apps,0,5)

9614
45
9659
['Flame - درب عقلك يوميا', 'EDUCATION', '4.6', '56065', '37M', '1,000,000+', 'Free', '0', 'Everyone', 'Education', 'July 26, 2018', '3.3', '4.1 and up']


['သိင်္ Astrology - Min Thein Kha BayDin', 'LIFESTYLE', '4.7', '2225', '15M', '100,000+', 'Free', '0', 'Everyone', 'Lifestyle', 'July 26, 2018', '4.2.1', '4.0.3 and up']


['РИА Новости', 'NEWS_AND_MAGAZINES', '4.5', '44274', '8.0M', '1,000,000+', 'Free', '0', 'Everyone', 'News & Magazines', 'August 6, 2018', '4.0.6', '4.4 and up']


['صور حرف H', 'ART_AND_DESIGN', '4.4', '13', '4.5M', '1,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 27, 2018', '2.0', '4.0.3 and up']


['L.POINT - 엘포인트 [ 포인트, 멤버십, 적립, 사용, 모바일 카드, 쿠폰, 롯데]', 'LIFESTYLE', '4.0', '45224', '49M', '5,000,000+', 'Free', '0', 'Everyone', 'Lifestyle', 'August 1, 2018', '6.5.1', '4.1 and up']


---------------------------------------------------------------------------
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M

In [22]:
english_ios_apps  = []
no_english_ios_apps = []

for row in ios_apps:
    name = row[2]
    english = is_english(name)
    if english:
        english_ios_apps.append (row)
    else:
        no_english_ios_apps.append(row)
        
print (len (english_ios_apps)) 
print (len (no_english_ios_apps))
print (len (no_english_ios_apps) + len (english_ios_apps))

explore_data (no_english_ios_apps ,0,5)
print ("---------------------------------------------------------------------------")
explore_data (english_ios_apps,0,5)

6183
1014
7197
['80', '299853944', '新浪新闻-阅读最新时事热门头条资讯视频', '115143680', 'USD', '0', '2229', '4', '3.5', '1', '6.2.1', '17+', 'News', '37', '0', '1', '1']


['96', '303191318', '同花顺-炒股、股票', '122886144', 'USD', '0', '1744', '0', '3.5', '0', '10.10.46', '4+', 'Finance', '37', '0', '1', '1']


['239', '331259725', '央视影音-海量央视内容高清直播', '54648832', 'USD', '0', '2070', '0', '2.5', '0', '6.2.0', '4+', 'Sports', '37', '0', '1', '1']


['268', '336141475', '优酷视频', '204959744', 'USD', '0', '4885', '0', '3.5', '0', '6.7.0', '12+', 'Entertainment', '38', '0', '2', '1']


['295', '340368403', 'クックパッド - No.1料理レシピ検索アプリ', '76644352', 'USD', '0', '115', '0', '3.5', '0', '17.5.1.0', '4+', 'Food & Drink', '37', '5', '1', '1']


---------------------------------------------------------------------------
['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', 


# Part 3: Remove paid apps

Remove the paid apps and keep only the free ones.
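
The `free_apps` helper treats a row as free only when its price is exactly the string `'0'` or `'0.0'`. A numeric comparison is a little more tolerant of formatting, as in the sketch below. `is_free` is a hypothetical helper, and a value such as `'$0.00'` is an assumption: it does not appear in the outputs shown in this notebook.

```python
def is_free(price):
    """Return True when a price string such as '0', '0.0' or '$0.00' equals zero."""
    cleaned = price.strip().lstrip('$')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Values that are not numbers (for example text from a shifted row)
        # are treated as not free
        return False

for price in ['0', '0.0', '$0.00', '$4.99', '3.99', 'Everyone']:
    print(price, is_free(price))
```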

In [23]:
# Split the English-named apps into free and paid ones; Price is column 7
free_android_apps , no_free_android_apps = free_apps (7, english_android_apps)

print ('---------- Android apps ----------')
print ('Length of Free Android apps: ', len (free_android_apps)) 
print ('Length of Non - Free Android apps: ', len (no_free_android_apps))
print ('Total length: ', len (no_free_android_apps) + len (free_android_apps))
print ("\n")
explore_data (no_free_android_apps ,0,5)
print ("---------------------------------------------------------------------------")
explore_data (free_android_apps,0,5)

---------- Android apps ----------
Length of Free Android apps:  8862
Length of Non - Free Android apps:  752
Total length:  9614


['TurboScan: scan documents and receipts in PDF', 'BUSINESS', '4.7', 11442, '6.8M', '100,000+', 'Paid', '$4.99', 'Everyone', 'Business', 'March 25, 2018', '1.5.2', '4.0 and up']


['Tiny Scanner Pro: PDF Doc Scan', 'BUSINESS', '4.8', 10295, '39M', '100,000+', 'Paid', '$4.99', 'Everyone', 'Business', 'April 11, 2017', '3.4.6', '3.0 and up']


['Puffin Browser Pro', 'COMMUNICATION', '4.0', '18247', 'Varies with device', '100,000+', 'Paid', '$3.99', 'Everyone', 'Communication', 'July 5, 2018', '7.5.3.20547', '4.1 and up']


['Moco+ - Chat, Meet People', 'DATING', '4.2', 1546, 'Varies with device', '10,000+', 'Paid', '$3.99', 'Mature 17+', 'Dating', 'June 19, 2018', '2.6.139', '4.1 and up']


['Calculator', 'DATING', '2.6', 20414, '6.2M', '1,000+', 'Paid', '$6.99', 'Everyone', 'Dating', 'October 25, 2017', '1.1.6', '4.0 and up']


-----------------------------

In [24]:
# Split the English-named apps into free and paid ones; price is column 5
free_ios_apps , no_free_ios_apps = free_apps (5, english_ios_apps)

print ('---------- IOS apps ----------')

print ('Length of Free IOS apps: ', len (free_ios_apps)) 
print ('Length of Non - Free IOS apps: ', len (no_free_ios_apps))
print ('Total length: ', len (no_free_ios_apps) + len (free_ios_apps))
print ("\n")
explore_data (no_free_ios_apps ,0,5)
print ("---------------------------------------------------------------------------")
explore_data (free_ios_apps,0,5)

---------- IOS apps ----------
Length of Free IOS apps:  3222
Length of Non - Free IOS apps:  2961
Total length:  6183


['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['6', '283619399', 'Shanghai Mahjong', '10485713', 'USD', '0.99', '8253', '5516', '4', '4', '1.8', '4+', 'Games', '47', '5', '1', '1']


['9', '284666222', 'PCalc - The Best Calculator', '49250304', 'USD', '9.99', '1117', '4', '4.5', '5', '3.6.6', '4+', 'Utilities', '37', '5', '1', '1']


['10', '284736660', 'Ms. PAC-MAN', '70023168', 'USD', '3.99', '7885', '40', '4', '4', '4.0.4', '4+', 'Games', '38', '0', '10', '1']


['11', '284791396', 'Solitaire by MobilityWare', '49618944', 'USD', '4.99', '76720', '4017', '4.5', '4.5', '4.10.1', '4+', 'Games', '38', '4', '11', '1']


---------------------------------------------------------------------------
['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065',


# Part 4: Inspect both data sets and generate frequency tables

Inspect both data sets and identify the columns we can use to generate frequency tables that show the most common genres in each market.

## Frequency table for the Category column of the Google Play data set

This frequency table shows how many free apps exist in each category.

In [25]:
freq_dict_category_android_apps = freq_dict (1, free_android_apps)
print ("Number of app categories: ", len (freq_dict_category_android_apps.keys()))
sorted_freq_dict_category_android_apps = []


for key in freq_dict_category_android_apps:
    tuples = (freq_dict_category_android_apps [key] , key)
    sorted_freq_dict_category_android_apps.append (tuples)
display_sorted_freq_dict_category_android_apps = sorted (sorted_freq_dict_category_android_apps, reverse = True)    

print ('---------- Android apps ----------')
for element in display_sorted_freq_dict_category_android_apps:
    print (element [1], ' : ', element [0])

Number of app categories:  33
---------- Android apps ----------
FAMILY  :  1635
GAME  :  875
TOOLS  :  748
BUSINESS  :  407
LIFESTYLE  :  346
PRODUCTIVITY  :  345
FINANCE  :  328
MEDICAL  :  312
SPORTS  :  301
PERSONALIZATION  :  294
COMMUNICATION  :  287
HEALTH_AND_FITNESS  :  273
PHOTOGRAPHY  :  261
NEWS_AND_MAGAZINES  :  248
SOCIAL  :  236
TRAVEL_AND_LOCAL  :  207
SHOPPING  :  199
BOOKS_AND_REFERENCE  :  190
DATING  :  165
VIDEO_PLAYERS  :  158
MAPS_AND_NAVIGATION  :  124
EDUCATION  :  114
FOOD_AND_DRINK  :  110
ENTERTAINMENT  :  100
LIBRARIES_AND_DEMO  :  83
AUTO_AND_VEHICLES  :  82
HOUSE_AND_HOME  :  74
WEATHER  :  71
EVENTS  :  63
ART_AND_DESIGN  :  60
PARENTING  :  58
COMICS  :  55
BEAUTY  :  53


This frequency table shows the percentage of free apps in each category.

In [26]:
persentage_category_freq_table = {}
for key in freq_dict_category_android_apps:
    total = len (free_android_apps)
    persentage_category_freq_table [key] = round(freq_dict_category_android_apps [key] / total * 100, 1)
    
print (len (free_android_apps)) 

sorted_android_category_dict = []
for key in persentage_category_freq_table:
    tuples = (persentage_category_freq_table [key] , key)
    sorted_android_category_dict.append (tuples)
display_sorted_android_category_dict = sorted (sorted_android_category_dict, reverse = True)    

print ('---------- Android apps ----------')
for element in display_sorted_android_category_dict:
    print (element [1], ' : ', element [0], "%")

8862
---------- Android apps ----------
FAMILY  :  18.4 %
GAME  :  9.9 %
TOOLS  :  8.4 %
BUSINESS  :  4.6 %
PRODUCTIVITY  :  3.9 %
LIFESTYLE  :  3.9 %
FINANCE  :  3.7 %
MEDICAL  :  3.5 %
SPORTS  :  3.4 %
PERSONALIZATION  :  3.3 %
COMMUNICATION  :  3.2 %
HEALTH_AND_FITNESS  :  3.1 %
PHOTOGRAPHY  :  2.9 %
NEWS_AND_MAGAZINES  :  2.8 %
SOCIAL  :  2.7 %
TRAVEL_AND_LOCAL  :  2.3 %
SHOPPING  :  2.2 %
BOOKS_AND_REFERENCE  :  2.1 %
DATING  :  1.9 %
VIDEO_PLAYERS  :  1.8 %
MAPS_AND_NAVIGATION  :  1.4 %
EDUCATION  :  1.3 %
FOOD_AND_DRINK  :  1.2 %
ENTERTAINMENT  :  1.1 %
LIBRARIES_AND_DEMO  :  0.9 %
AUTO_AND_VEHICLES  :  0.9 %
WEATHER  :  0.8 %
HOUSE_AND_HOME  :  0.8 %
PARENTING  :  0.7 %
EVENTS  :  0.7 %
ART_AND_DESIGN  :  0.7 %
COMICS  :  0.6 %
BEAUTY  :  0.6 %
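
The count, percentage and sort steps are repeated for every column in this part, so they could be folded into one helper. The sketch below is one way to do that; `percentage_table` is a hypothetical function and the sample rows are made up.

```python
def percentage_table(dataset, index):
    """Return (percentage, value) tuples for one column, sorted in descending order."""
    counts = {}
    for row in dataset:
        counts[row[index]] = counts.get(row[index], 0) + 1
    total = len(dataset)
    table = [(round(count / total * 100, 1), value)
             for value, count in counts.items()]
    # Tuples sort by percentage first, then by value, as in the cells above
    return sorted(table, reverse=True)

# Hypothetical sample: (app name, category)
sample = [['a', 'GAME'], ['b', 'FAMILY'], ['c', 'GAME'], ['d', 'TOOLS']]

for share, category in percentage_table(sample, 1):
    print(category, ' : ', share, '%')
```

With the real data, `percentage_table(free_android_apps, 1)` would be expected to reproduce the table above.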


Genres of the free apps in the FAMILY category

In [27]:
for row in free_android_apps:
    if row [1] == 'FAMILY':
        print (row [9])

Casual;Brain Games
Educational;Creativity
Puzzle;Brain Games
Educational;Education
Casual;Brain Games
Educational;Education
Casual;Brain Games
Casual;Brain Games
Casual;Brain Games
Education;Creativity
Educational;Education
Educational;Brain Games
Educational;Pretend Play
Education;Education
Educational;Education
Educational;Pretend Play
Casual;Action & Adventure
Entertainment;Education
Entertainment;Brain Games
Casual;Education
Educational;Education
Casual;Brain Games
Casual;Brain Games
Casual;Pretend Play
Educational;Education
Casual;Creativity
Music;Music & Video
Simulation;Action & Adventure
Educational;Education
Casual;Brain Games
Racing;Action & Adventure
Educational;Education
Casual;Pretend Play
Music;Music & Video
Educational;Education
Entertainment;Music & Video
Education;Education
Educational;Education
Educational;Brain Games
Educational;Pretend Play
Arcade;Pretend Play
Educational;Education
Action;Action & Adventure
Casual;Action & Adventure
Education;Education
Puzzle;Brain 

Education
Entertainment
Educational
Entertainment
Education
Casual
Strategy
Casual
Role Playing
Puzzle;Brain Games
Casual
Arcade;Action & Adventure
Entertainment
Casual
Simulation
Entertainment
Education
Entertainment
Education;Education
Education
Educational
Education
Simulation
Entertainment
Entertainment
Education
Entertainment
Entertainment
Entertainment
Strategy
Education
Simulation
Education
Simulation
Education
Entertainment
Simulation
Entertainment
Entertainment
Education
Education
Education
Simulation
Education
Simulation
Education
Simulation
Education
Strategy
Education
Education
Educational
Education
Education
Puzzle
Education
Education
Educational
Education
Education
Trivia;Education
Education
Puzzle
Education
Puzzle
Casual
Entertainment
Strategy
Education
Education
Education
Education
Education
Education
Simulation
Simulation
Simulation
Simulation
Casual;Pretend Play
Education
Entertainment
Entertainment
Entertainment;Creativity
Entertainment
Entertainment
Casual;Creativit

Casual
Role Playing
Role Playing
Education
Entertainment
Entertainment
Casual
Role Playing
Education
Entertainment
Entertainment
Entertainment
Education
Puzzle
Casual
Entertainment
Entertainment
Communication;Creativity
Casual
Education
Entertainment
Entertainment
Entertainment
Entertainment
Simulation
Entertainment
Entertainment
Simulation
Entertainment
Entertainment
Entertainment
Entertainment
Simulation
Education
Education
Education
Entertainment
Education
Education
Education
Entertainment
Entertainment
Education
Education
Puzzle
Entertainment
Entertainment
Entertainment
Strategy
Education
Education
Entertainment
Simulation
Simulation
Entertainment;Pretend Play
Simulation
Entertainment
Educational;Education
Puzzle
Puzzle
Puzzle
Puzzle
Education
Education
Puzzle
Education
Education
Puzzle
Puzzle
Simulation
Education
Casual
Education
Casual
Puzzle
Simulation
Art & Design;Creativity
Simulation
Education
Strategy
Entertainment
Simulation
Simulation
Simulation
Simulation
Simulation
Casua

## Frequency table for the Genres column of the Google Play data set

Number of apps in each genre

In [28]:
freq_dict_genre_android_apps = {}
freq_dict_genre_android_apps = freq_dict (9, free_android_apps)
print ("Amount of genres:  ", len (freq_dict_genre_android_apps.keys()))


sorted_android_genre_dict = []
for key in freq_dict_genre_android_apps:
    tuples = (freq_dict_genre_android_apps [key] , key)
    sorted_android_genre_dict.append (tuples)
display_sorted_android_genre_dict = sorted (sorted_android_genre_dict, reverse = True)    

print ('---------- Android apps ----------')
for element in display_sorted_android_genre_dict:
    print (element [1], ' : ', element [0])

Amount of genres:   114
---------- Android apps ----------
Tools  :  747
Entertainment  :  538
Education  :  474
Business  :  407
Productivity  :  345
Lifestyle  :  345
Finance  :  328
Medical  :  312
Sports  :  307
Personalization  :  294
Communication  :  287
Action  :  275
Health & Fitness  :  273
Photography  :  261
News & Magazines  :  248
Social  :  236
Travel & Local  :  206
Shopping  :  199
Books & Reference  :  190
Simulation  :  181
Dating  :  165
Arcade  :  164
Video Players & Editors  :  157
Casual  :  156
Maps & Navigation  :  124
Food & Drink  :  110
Puzzle  :  100
Racing  :  88
Role Playing  :  83
Libraries & Demo  :  83
Auto & Vehicles  :  82
Strategy  :  81
House & Home  :  74
Weather  :  71
Events  :  63
Adventure  :  60
Comics  :  54
Beauty  :  53
Art & Design  :  53
Parenting  :  44
Card  :  40
Casino  :  38
Trivia  :  37
Educational;Education  :  35
Educational  :  33
Board  :  33
Education;Education  :  30
Word  :  23
Casual;Pretend Play  :  21
Music  :  18
Racing
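
Many values in the Genres column combine several genres with a semicolon, for example `Educational;Education`, so the table counts each combination as a separate genre. One possible refinement, sketched below, is to split each value and count every genre once per app. `genre_counts` is a hypothetical helper, and the sample values are taken from the Genres output shown earlier.

```python
def genre_counts(dataset, genres_index=9):
    """Count how many apps list each individual genre."""
    counts = {}
    for row in dataset:
        parts = [part.strip() for part in row[genres_index].split(';')]
        # dict.fromkeys removes repeats within one app and keeps the order
        for genre in dict.fromkeys(parts):
            counts[genre] = counts.get(genre, 0) + 1
    return counts

# Each sample row holds only the Genres value, hence genres_index=0
sample = [['Educational;Education'], ['Education'],
          ['Casual;Pretend Play'], ['Education;Education']]

print(genre_counts(sample, genres_index=0))
# {'Educational': 1, 'Education': 3, 'Casual': 1, 'Pretend Play': 1}
```

With this counting, the totals add up to more than the number of apps, because an app with two genres is counted under both.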

## Frequency table for the prime_genre column of the App Store data set

In [29]:
freq_dict_genre_ios_apps = {}
freq_dict_genre_ios_apps = freq_dict (12, free_ios_apps)
print ("Amount of genres: ", len (freq_dict_genre_ios_apps.keys()))

sorted_freq_dict_genre_ios_apps = []
for key in freq_dict_genre_ios_apps:
    tuples = (freq_dict_genre_ios_apps [key] , key)
    sorted_freq_dict_genre_ios_apps.append (tuples)
display_sorted_freq_dict_genre_ios_apps = sorted (sorted_freq_dict_genre_ios_apps, reverse = True)    

print ('---------- Ios apps ----------')
for element in display_sorted_freq_dict_genre_ios_apps:
    print (element [1], ' : ', element [0])

Amount of genres:  23
---------- Ios apps ----------
Games  :  1874
Entertainment  :  254
Photo & Video  :  160
Education  :  118
Social Networking  :  106
Shopping  :  84
Utilities  :  81
Sports  :  69
Music  :  66
Health & Fitness  :  65
Productivity  :  56
Lifestyle  :  51
News  :  43
Travel  :  40
Finance  :  36
Weather  :  28
Food & Drink  :  26
Reference  :  18
Business  :  17
Book  :  14
Navigation  :  6
Medical  :  6
Catalogs  :  4


Display the percentages in descending order

In [30]:
persentage_prime_genre_freq_table = {}
for key in freq_dict_genre_ios_apps:
    total = len (free_ios_apps)
    persentage_prime_genre_freq_table [key] = round( freq_dict_genre_ios_apps [key] / total * 100, 1)

# Sort the App Store genres by percentage (the values are already rounded)
sorted_ios_prime_genre_dict = []
for key in persentage_prime_genre_freq_table:
    tuples = (persentage_prime_genre_freq_table [key] , key)
    sorted_ios_prime_genre_dict.append (tuples)
display_sorted_ios_prime_genre_dict = sorted (sorted_ios_prime_genre_dict, reverse = True)    

print ('---------- Ios apps ----------')
for element in display_sorted_ios_prime_genre_dict:
    print (element [1], ' : ', element [0], "%")    

---------- Ios apps ----------
Games  :  58.2 %
Entertainment  :  7.9 %
Photo & Video  :  5.0 %
Education  :  3.7 %
Social Networking  :  3.3 %
Shopping  :  2.6 %
Utilities  :  2.5 %
Sports  :  2.1 %
Music  :  2.0 %
Health & Fitness  :  2.0 %
Productivity  :  1.7 %
Lifestyle  :  1.6 %
News  :  1.3 %
Travel  :  1.2 %
Finance  :  1.1 %
Weather  :  0.9 %
Food & Drink  :  0.8 %
Reference  :  0.6 %
Business  :  0.5 %
Book  :  0.4 %
Navigation  :  0.2 %
Medical  :  0.2 %
Catalogs  :  0.1 %
