# Guided project: Apps Store database.

Our aim is to help our developers understand what type of apps are likely to attract more users on Google Play and the App Store. 

Collecting data for over 4 million apps requires a significant amount of time and money, so we'll try to analyze a sample of the data instead. To avoid spending resources on collecting new data ourselves, we should first try to see if we can find any relevant existing data at no cost. Luckily, there are two data sets that seem suitable for our goals:

- [A dataset](https://www.kaggle.com/lava18/google-play-store-apps) containing data about approximately 10,000 Android apps from Google Play; the data was collected in August 2018. You can download the data set directly from [this link](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv).
- [A dataset](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) containing data about approximately 7,000 iOS apps from the App Store; the data was collected in July 2017. You can download the data set directly from [this link](https://dq-content.s3.amazonaws.com/350/AppleStore.csv).

## Explore datasets

In [1]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [2]:
# Due to setup constraints, we read the CSV files directly
# from their URLs instead of downloading them first
from urllib.request import urlopen
from io import StringIO
import csv

In [3]:
android_link = "https://dq-content.s3.amazonaws.com/350/googleplaystore.csv"
ios_link = "https://dq-content.s3.amazonaws.com/350/AppleStore.csv"

In [4]:
android_data = urlopen(android_link).read().decode('utf8','ignore')
android_file = StringIO(android_data)
# turning the result into a list
# to be able to use the explore_data function provided
android_csv = list(csv.reader(android_file))

In [5]:
ios_data = urlopen(ios_link).read().decode('utf8','ignore')
ios_file = StringIO(ios_data)
# turning the result into a list 
# to be able to use the explore_data function provided
ios_csv = list(csv.reader(ios_file))
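The two cells above repeat the same download, decode and parse steps. A minimal sketch of a reusable helper, assuming the same URLs and the same lenient UTF-8 decoding; the function names `parse_csv_text` and `load_csv_from_url` are hypothetical and not part of the original project:

```python
import csv
from io import StringIO
from urllib.request import urlopen


def parse_csv_text(text):
    """Parse CSV text into a list of rows (each row is a list of strings)."""
    return list(csv.reader(StringIO(text)))


def load_csv_from_url(url):
    """Download a CSV file and parse it into a list of rows."""
    # 'ignore' mirrors the original cells: undecodable bytes are silently dropped
    raw = urlopen(url).read().decode('utf8', 'ignore')
    return parse_csv_text(raw)


# Usage (requires network access):
# android_csv = load_csv_from_url("https://dq-content.s3.amazonaws.com/350/googleplaystore.csv")
```

Splitting parsing from downloading keeps the parsing step testable without a network connection.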


In [6]:
print(type(android_csv))
print(android_csv[0])
print('\n')
print(android_csv[1])
print('\n')
    
print(type(ios_csv))
print(ios_csv[0])
print('\n')
print(ios_csv[1])
print('\n')
    

<class 'list'>
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


<class 'list'>
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']




In [7]:
explore_data(android_csv, 0, 5, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 10842
Number of columns: 13


In [8]:
explore_data(ios_csv, 0, 5, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows: 7198
Number of columns: 16


## Data cleaning

In the previous step, we opened the two data sets and explored the data. Before beginning our analysis, we need to make sure the data we analyze is accurate, or the results of our analysis will be wrong. This means that we need to do the following:

- Detect inaccurate data, and correct or remove it.
- Detect duplicate data, and remove the duplicates.

Recall that at our company, we only build apps that are free to download and install, and we design them for an English-speaking audience. This means that we'll need to do the following:

- Remove non-English apps like 爱奇艺PPS -《欢乐颂2》电视剧热播.
- Remove apps that aren't free.

### Detect and remove entries with an error

First, the classroom discussion pointed out a row that contains an error.

In [9]:
explore_data(android_csv, 10473, 10475, True)

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']


Number of rows: 10842
Number of columns: 13


In [10]:
for i in range(10472, 10475):
    print(i, android_csv[i][0], len(android_csv[i]))

10472 Xposed Wi-Fi-Pwd 13
10473 Life Made WI-Fi Touchscreen Photo Frame 12
10474 osmino Wi-Fi: free WiFi 13


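The row for "Life Made WI-Fi Touchscreen Photo Frame" has 12 columns instead of 13. Its Category value is missing, so every later value is shifted one column to the left: the Rating column, for example, holds '19', which is above the maximum rating of 5. We should delete this row.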

In [11]:
del android_csv[10473]

In [12]:
for i in range(10472, 10475):
    print(i, android_csv[i][0], len(android_csv[i]))

10472 Xposed Wi-Fi-Pwd 13
10473 osmino Wi-Fi: free WiFi 13
10474 Sat-Fi Voice 13


The faulty row is gone: index 10473 now holds "osmino Wi-Fi: free WiFi", which has the expected 13 columns.
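Here the faulty row was known in advance. A minimal sketch of a more general check, which lists every row whose length differs from the header's; `find_malformed_rows` is a hypothetical helper and the sample data are made up for illustration:

```python
def find_malformed_rows(dataset):
    """Return (index, row) pairs whose length differs from the header row's."""
    expected = len(dataset[0])  # the header defines the expected column count
    return [(i, row) for i, row in enumerate(dataset[1:], start=1)
            if len(row) != expected]


# Made-up sample: the last row is missing one column
sample = [
    ['App', 'Category', 'Rating'],
    ['Good app', 'TOOLS', '4.2'],
    ['Broken app', '1.9'],
]
print(find_malformed_rows(sample))  # [(2, ['Broken app', '1.9'])]
```

Run on `android_csv` before the deletion above, the result would include index 10473.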

### Detect and remove duplicates

In [13]:
for app in android_csv:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


The four Instagram entries differ only in the number of reviews (index 3), and two of them are identical. We can use this to build a criterion for removing the duplicates: the higher the number of reviews, the more recent the data should be. Rather than removing duplicates randomly, we'll only keep the row with the highest number of reviews and remove the other entries for any given app.

Another possible criterion is the number of installs, but we'll stick with the number of reviews.

In total, there are 1,181 duplicate entries, i.e. rows whose app name already appeared in an earlier row.

In [14]:
len(android_csv)

10841

In [15]:
duplicate_apps = []
unique_apps = []
for app in android_csv:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of duplicate apps: ', len(duplicate_apps))      
print('Percentage of duplicate apps: ',f"{len(duplicate_apps)/len(android_csv):.0%}")
print('\n')
print('Examples of duplicate apps: ', duplicate_apps[:15])       
            

Number of duplicate apps:  1181
Percentage of duplicate apps:  11%


Examples of duplicate apps:  ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']
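The loop above stores `unique_apps` in a list, so each membership test scans the whole list. A sketch of the same count with `collections.Counter`, assuming the header row is skipped; `count_duplicate_entries` is a hypothetical name:

```python
from collections import Counter


def count_duplicate_entries(dataset, name_index=0):
    """Count rows whose app name already appeared in an earlier row."""
    name_counts = Counter(row[name_index] for row in dataset[1:])  # skip the header
    # every occurrence after the first one is a duplicate entry
    return sum(count - 1 for count in name_counts.values())


sample = [['App'], ['Box'], ['Slack'], ['Box'], ['Box']]
print(count_duplicate_entries(sample))  # 2
```

Applied to `android_csv`, this should reproduce the duplicate count reported above.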


To remove the duplicates, we will do the following:

- Create a dictionary, where each dictionary key is a unique app name and the corresponding dictionary value is the highest number of reviews of that app.
- Use the information stored in the dictionary and create a new dataset, which will have only one entry per app (and for each app, we'll only select the entry with the highest number of reviews).
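The two steps can also be merged into a single pass that keeps, for each app name, the row with the most reviews seen so far. This is a sketch, not the approach used in the cells below; `deduplicate_by_reviews` is a hypothetical helper, ties are resolved by keeping the first row (as the `already_added` logic below does), and the sample rows are partly made up:

```python
def deduplicate_by_reviews(rows, name_index=0, reviews_index=3):
    """Keep one row per app: the one with the highest review count.

    On ties, the first row encountered is kept.
    """
    best = {}  # app name -> row with the most reviews so far
    for row in rows:
        name, reviews = row[name_index], float(row[reviews_index])
        if name not in best or reviews > float(best[name][reviews_index]):
            best[name] = row
    # dicts keep insertion order (Python 3.7+): apps appear in first-seen order
    return list(best.values())


sample = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Slack', 'BUSINESS', '4.4', '51507'],
]
for row in deduplicate_by_reviews(sample):
    print(row[0], row[3])
```

The row order of the result may differ from `android_clean`, because apps are ordered by their first appearance rather than by the position of their best row.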

In [16]:
android_csv[0]

['App',
 'Category',
 'Rating',
 'Reviews',
 'Size',
 'Installs',
 'Type',
 'Price',
 'Content Rating',
 'Genres',
 'Last Updated',
 'Current Ver',
 'Android Ver']

In [17]:
android_csv[0].index("Reviews")

3

In [18]:
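# map each app name to the highest number of reviews among its entries
reviews_max = {}

for app in android_csv[1:]:  # skip the header row
    name, n_reviews = app[0], float(app[3])  # index 3 is 'Reviews'
    if name in reviews_max:
        reviews_max[name] = max(n_reviews, reviews_max[name])
    else:
        reviews_max[name] = n_reviews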

Let's check this with Instagram.

In [19]:
instagram_review_entries = []
for app in android_csv[1:]:
    name = app[0]
    if name == 'Instagram':
        print(app[0], app[3])
        # convert to int so that max() compares numbers, not strings
        instagram_review_entries.append(int(app[3]))

print('Highest Instagram entry review: ', max(instagram_review_entries))


Instagram 66577313
Instagram 66577446
Instagram 66577313
Instagram 66509917
Highest Instagram entry review:  66577446


In [25]:
print(reviews_max['Instagram'])

66577446.0


In [24]:
len(reviews_max)

9659

In [62]:
android_clean = []
# already_added guards against apps whose highest number of reviews
# appears in more than one entry, so that each app is kept only once
already_added = set()

for row in android_csv[1:]:
    name, review = row[0], float(row[3])

    if reviews_max[name] == review and name not in already_added:
        android_clean.append(row)
        already_added.add(name)

Without duplicates, the android_clean dataset should have 9,659 rows, one per key in reviews_max.

In [64]:
len(android_clean)

9659

### Isolate the English-language apps

We use English for the apps we develop at our company, and we'd like to analyze only the apps that are designed for an English-speaking audience.

One way to do this is to remove each app with a name containing a symbol that isn't commonly used in English text. English text usually includes letters from the English alphabet, numbers composed of digits from 0 to 9, punctuation marks (., !, ?, ;), and other symbols (+, *, /).

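Each character we use in a string has a corresponding number associated with it. For instance, the corresponding number for character 'a' is 97, character 'A' is 65, and character '爱' is 29,233. We can get the corresponding number of each character using the ord() built-in function. The characters commonly used in English text all belong to the ASCII range, which covers the numbers 0 to 127, so any character with a number above 127 is a sign of a non-English name.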
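A quick check of the numbers quoted above, using the built-in `ord()` and its inverse `chr()`. The cut-off of 127 used in the next cell is the upper end of the ASCII range; `str.isascii()` (available since Python 3.7) performs the same all-characters test:

```python
# Code points of the characters mentioned above
print(ord('a'), ord('A'), ord('爱'))  # 97 65 29233
print(chr(97))                        # a

# ASCII covers code points 0-127, so a string is pure ASCII
# exactly when every character's code point is at most 127
name = 'Instagram'
print(all(ord(c) <= 127 for c in name), name.isascii())  # True True
print('Docs To Go™'.isascii())                           # False
```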

In [70]:
def is_english_string(string):
    for character in string:
        if ord(character) > 127:
            return False
    return True
        

In [72]:
test_list = ['Instagram','爱奇艺PPS -《欢乐颂2》电视剧热播',
             'Docs To Go™ Free Office Suite','Instachat 😜']

for name in test_list:
    print(name,is_english_string(name))
    #print('\n')

Instagram True
爱奇艺PPS -《欢乐颂2》电视剧热播 False
Docs To Go™ Free Office Suite False
Instachat 😜 False


We wrote a function that detects non-English app names, but it wrongly labels some English names, such as 'Docs To Go™ Free Office Suite' and 'Instachat 😜', as non-English. This is because emojis and characters like ™ fall outside the ASCII range: their corresponding numbers are over 127.

If we're going to use the function we've created, we'll lose useful data since many English apps will be incorrectly labeled as non-English. To minimize the impact of data loss, we'll only remove an app if its name has more than three characters with corresponding numbers falling outside the ASCII range. This means all English apps with up to three emoji or other special characters will still be labeled as English. Our filter function is still not perfect, but it should be fairly effective.

In [74]:
def is_english_string(string):
    non_english_characters = []
    for character in string:
        if ord(character) > 127:
            non_english_characters.append(character)
            if len(non_english_characters) > 3:
                return False
    return True

In [77]:
test_list=['Docs To Go™ Free Office Suite','Instachat 😜',
           '爱奇艺PPS -《欢乐颂2》电视剧热播']
for name in test_list:
    print(name,is_english_string(name))
    #print('\n')

Docs To Go™ Free Office Suite True
Instachat 😜 True
爱奇艺PPS -《欢乐颂2》电视剧热播 False


In [88]:
def get_english_apps(apps, name_index):
    english_apps = []
    for row in apps:
        if is_english_string(row[name_index]):
            english_apps.append(row)
    return english_apps


In [89]:
android_english = get_english_apps(android_clean,0)
ios_english = get_english_apps(ios_csv[1:],1)


In [90]:
print(len(android_clean),len(android_english))

9659 9614


In [91]:
print(len(ios_csv[1:]),len(ios_english))

7197 6183


### Isolating the free apps

So far in the data cleaning process, we've done the following:

- Removed inaccurate data
- Removed duplicate app entries
- Removed non-English apps

As we mentioned in the introduction, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads. Our datasets contain both free and non-free apps; we'll need to isolate only the free apps for our analysis.

In [94]:
# the price is at index 7 (the 8th column) for Android apps
android_csv[0][7]

'Price'

In [128]:
# display all price values
sorted(set([row[7] for row in android_english]))

['$0.99',
 '$1.00',
 '$1.04',
 '$1.20',
 '$1.26',
 '$1.29',
 '$1.49',
 '$1.50',
 '$1.59',
 '$1.61',
 '$1.70',
 '$1.75',
 '$1.76',
 '$1.96',
 '$1.97',
 '$1.99',
 '$10.00',
 '$10.99',
 '$109.99',
 '$11.99',
 '$12.99',
 '$13.99',
 '$14.00',
 '$14.99',
 '$15.46',
 '$15.99',
 '$154.99',
 '$16.99',
 '$17.99',
 '$18.99',
 '$19.40',
 '$19.90',
 '$19.99',
 '$2.00',
 '$2.49',
 '$2.50',
 '$2.56',
 '$2.59',
 '$2.60',
 '$2.90',
 '$2.95',
 '$2.99',
 '$200.00',
 '$24.99',
 '$25.99',
 '$28.99',
 '$29.99',
 '$299.99',
 '$3.02',
 '$3.04',
 '$3.08',
 '$3.28',
 '$3.49',
 '$3.61',
 '$3.88',
 '$3.90',
 '$3.95',
 '$3.99',
 '$30.99',
 '$33.99',
 '$37.99',
 '$379.99',
 '$389.99',
 '$39.99',
 '$394.99',
 '$399.99',
 '$4.29',
 '$4.49',
 '$4.59',
 '$4.60',
 '$4.77',
 '$4.80',
 '$4.84',
 '$4.85',
 '$4.99',
 '$400.00',
 '$46.99',
 '$5.00',
 '$5.49',
 '$5.99',
 '$6.49',
 '$6.99',
 '$7.49',
 '$7.99',
 '$74.99',
 '$79.99',
 '$8.49',
 '$8.99',
 '$89.99',
 '$9.00',
 '$9.99',
 '0']

Free Android apps have the price string '0'.

Let's do the same for iOS apps.

In [115]:
ios_csv[0][4]

'price'

In [129]:
# display all price values
sorted(set([row[4] for row in ios_csv[1:]]))

['0.0',
 '0.99',
 '1.99',
 '11.99',
 '12.99',
 '13.99',
 '14.99',
 '15.99',
 '16.99',
 '17.99',
 '18.99',
 '19.99',
 '2.99',
 '20.99',
 '21.99',
 '22.99',
 '23.99',
 '24.99',
 '249.99',
 '27.99',
 '29.99',
 '299.99',
 '3.99',
 '34.99',
 '39.99',
 '4.99',
 '47.99',
 '49.99',
 '5.99',
 '59.99',
 '6.99',
 '7.99',
 '74.99',
 '8.99',
 '9.99',
 '99.99']

Free iOS apps have the price string '0.0'.

We are now ready to isolate the free apps.

In [135]:
def get_free_apps(apps, price_index, free_price):
    free_apps = []    
    for row in apps:
        price = row[price_index]
        if price == free_price:
            free_apps.append(row)
    return free_apps

In [136]:
android_english_free = get_free_apps(android_english,7,'0')
ios_english_free = get_free_apps(ios_english,4,'0.0')

In [137]:
print(len(android_english),len(android_english_free))

9614 8864


In [139]:
print(len(ios_english),len(ios_english_free))

6183 3222
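Comparing raw strings works here because each store writes free prices in exactly one format ('0' for Android, '0.0' for iOS). A sketch of a format-agnostic alternative, assuming prices are plain numbers, optionally prefixed with '$' as in the two datasets; `is_free_price` is a hypothetical helper:

```python
def is_free_price(price):
    """Return True if a price string such as '0', '0.0' or '$0.00' equals zero."""
    # float() raises ValueError on non-numeric values, which would expose
    # malformed rows instead of silently dropping them
    return float(price.strip().lstrip('$')) == 0


for price in ['0', '0.0', '$0.99', '2.99']:
    print(price, is_free_price(price))
```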


## Determine the kinds of apps that are likely to attract more users.

So far, we've spent a good amount of time cleaning data, including the following:

- Removing inaccurate data
- Removing duplicate app entries
- Removing non-English apps
- Isolating the free apps

As we mentioned in the introduction, our goal is to determine the kinds of apps that are likely to attract more users because the number of people using our apps affects our revenue.

To minimize risks and overhead, our validation strategy for an app idea has three steps:

1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we develop it further.
3. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

Because our end goal is to add the app on both Google Play and the App Store, we need to find app profiles that are successful in both markets. For instance, a profile that works well for both markets might be a productivity app that makes use of gamification.

Let's begin the analysis by determining the most common genres for each market. For this, we'll need to build frequency tables for a few columns in our datasets.

In [141]:
android_headers = android_csv[0]
print(android_headers)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


In [143]:
android_english_free[1]

['U Launcher Lite – FREE Live Cool Themes, Hide Apps',
 'ART_AND_DESIGN',
 '4.7',
 '87510',
 '8.7M',
 '5,000,000+',
 'Free',
 '0',
 'Everyone',
 'Art & Design',
 'August 1, 2018',
 '1.2.4',
 '4.0.3 and up']

In [142]:
ios_headers = ios_csv[0]
print(ios_headers)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


In [144]:
ios_english_free[1]

['389801252',
 'Instagram',
 '113954816',
 'USD',
 '0.0',
 '2161558',
 '1289',
 '4.5',
 '4.0',
 '10.23',
 '12+',
 'Photo & Video',
 '37',
 '0',
 '29',
 '1']

We'll use the 'Genres' and 'Category' columns for Android apps and the 'prime_genre' column for iOS apps to generate frequency tables and determine the most common genres in each market.
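An Android 'Genres' value can combine several genres separated by a semicolon, for example 'Art & Design;Pretend Play' in the rows explored earlier. If we wanted to count each individual genre instead, one option (a sketch not used in the analysis below; `split_genre_counts` is a hypothetical name) is to split the values first:

```python
from collections import Counter


def split_genre_counts(dataset, index):
    """Count individual genres, splitting values like 'Casual;Brain Games'."""
    counts = Counter()
    for row in dataset:
        # set() counts a genre once per app, even for values like 'Education;Education'
        counts.update(set(row[index].split(';')))
    return counts


sample = [['Art & Design'], ['Art & Design;Pretend Play'], ['Casual;Pretend Play']]
print(split_genre_counts(sample, 0))
```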

In [183]:
android_index_genre = android_headers.index('Genres')
android_index_genre

9

In [182]:
android_index_category = android_headers.index('Category')
android_index_category

1

In [156]:
ios_index_genre = ios_headers.index('prime_genre')
ios_index_genre

11

In [184]:
# display all android category values
sorted(set([row[android_index_category] for row in android_english_free]))

['ART_AND_DESIGN',
 'AUTO_AND_VEHICLES',
 'BEAUTY',
 'BOOKS_AND_REFERENCE',
 'BUSINESS',
 'COMICS',
 'COMMUNICATION',
 'DATING',
 'EDUCATION',
 'ENTERTAINMENT',
 'EVENTS',
 'FAMILY',
 'FINANCE',
 'FOOD_AND_DRINK',
 'GAME',
 'HEALTH_AND_FITNESS',
 'HOUSE_AND_HOME',
 'LIBRARIES_AND_DEMO',
 'LIFESTYLE',
 'MAPS_AND_NAVIGATION',
 'MEDICAL',
 'NEWS_AND_MAGAZINES',
 'PARENTING',
 'PERSONALIZATION',
 'PHOTOGRAPHY',
 'PRODUCTIVITY',
 'SHOPPING',
 'SOCIAL',
 'SPORTS',
 'TOOLS',
 'TRAVEL_AND_LOCAL',
 'VIDEO_PLAYERS',
 'WEATHER']

In [154]:
# display all android genre values
sorted(set([row[android_index_genre] for row in android_english_free]))

['Action',
 'Action;Action & Adventure',
 'Adventure',
 'Adventure;Action & Adventure',
 'Adventure;Education',
 'Arcade',
 'Arcade;Action & Adventure',
 'Arcade;Pretend Play',
 'Art & Design',
 'Art & Design;Action & Adventure',
 'Art & Design;Creativity',
 'Art & Design;Pretend Play',
 'Auto & Vehicles',
 'Beauty',
 'Board',
 'Board;Action & Adventure',
 'Board;Brain Games',
 'Books & Reference',
 'Books & Reference;Education',
 'Business',
 'Card',
 'Card;Action & Adventure',
 'Casino',
 'Casual',
 'Casual;Action & Adventure',
 'Casual;Brain Games',
 'Casual;Creativity',
 'Casual;Education',
 'Casual;Music & Video',
 'Casual;Pretend Play',
 'Comics',
 'Comics;Creativity',
 'Communication',
 'Communication;Creativity',
 'Dating',
 'Education',
 'Education;Action & Adventure',
 'Education;Brain Games',
 'Education;Creativity',
 'Education;Education',
 'Education;Music & Video',
 'Education;Pretend Play',
 'Educational',
 'Educational;Action & Adventure',
 'Educational;Brain Games',
 ...

In [157]:
# display all ios genre values
sorted(set([row[ios_index_genre] for row in ios_english_free]))

['Book',
 'Business',
 'Catalogs',
 'Education',
 'Entertainment',
 'Finance',
 'Food & Drink',
 'Games',
 'Health & Fitness',
 'Lifestyle',
 'Medical',
 'Music',
 'Navigation',
 'News',
 'Photo & Video',
 'Productivity',
 'Reference',
 'Shopping',
 'Social Networking',
 'Sports',
 'Travel',
 'Utilities',
 'Weather']

In [174]:
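def freq_table(dataset, index):
    # count how many rows have each value in the given column
    frequency_table = {}
    for row in dataset:
        a_data_point = row[index]
        if a_data_point in frequency_table:
            frequency_table[a_data_point] += 1
        else:
            frequency_table[a_data_point] = 1

    # convert counts to proportions; computing the total once avoids
    # re-summing all counts for every key
    total = len(dataset)
    return {k: v / total for k, v in frequency_table.items()}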

In [178]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':',f"{entry[0]:.0%}")
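The two functions above can also be sketched with `collections.Counter`, whose `most_common()` method returns (value, count) pairs sorted by decreasing count. This is an alternative formulation, not the one that produced the tables below; `frequency_rows` and `display_table_counter` are hypothetical names, and one difference is that ties are ordered by first appearance rather than by reverse key order:

```python
from collections import Counter


def frequency_rows(dataset, index):
    """Return (value, share) pairs for a column, most common value first."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return [(value, count / total) for value, count in counts.most_common()]


def display_table_counter(dataset, index):
    """Print each value with its share of rows as a rounded percentage."""
    for value, share in frequency_rows(dataset, index):
        print(value, ':', f"{share:.0%}")


sample = [['Games'], ['Games'], ['Games'], ['Education']]
display_table_counter(sample, 0)  # Games : 75%, then Education : 25%
```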

In [185]:
display_table(android_english_free, android_index_category)

FAMILY : 19%
GAME : 10%
TOOLS : 8%
BUSINESS : 5%
LIFESTYLE : 4%
PRODUCTIVITY : 4%
FINANCE : 4%
MEDICAL : 4%
SPORTS : 3%
PERSONALIZATION : 3%
COMMUNICATION : 3%
HEALTH_AND_FITNESS : 3%
PHOTOGRAPHY : 3%
NEWS_AND_MAGAZINES : 3%
SOCIAL : 3%
TRAVEL_AND_LOCAL : 2%
SHOPPING : 2%
BOOKS_AND_REFERENCE : 2%
DATING : 2%
VIDEO_PLAYERS : 2%
MAPS_AND_NAVIGATION : 1%
FOOD_AND_DRINK : 1%
EDUCATION : 1%
ENTERTAINMENT : 1%
LIBRARIES_AND_DEMO : 1%
AUTO_AND_VEHICLES : 1%
HOUSE_AND_HOME : 1%
WEATHER : 1%
EVENTS : 1%
PARENTING : 1%
ART_AND_DESIGN : 1%
COMICS : 1%
BEAUTY : 1%


In [179]:
display_table(android_english_free, android_index_genre)

Tools : 8%
Entertainment : 6%
Education : 5%
Business : 5%
Productivity : 4%
Lifestyle : 4%
Finance : 4%
Medical : 4%
Sports : 3%
Personalization : 3%
Communication : 3%
Action : 3%
Health & Fitness : 3%
Photography : 3%
News & Magazines : 3%
Social : 3%
Travel & Local : 2%
Shopping : 2%
Books & Reference : 2%
Simulation : 2%
Dating : 2%
Arcade : 2%
Video Players & Editors : 2%
Casual : 2%
Maps & Navigation : 1%
Food & Drink : 1%
Puzzle : 1%
Racing : 1%
Role Playing : 1%
Libraries & Demo : 1%
Auto & Vehicles : 1%
Strategy : 1%
House & Home : 1%
Weather : 1%
Events : 1%
Adventure : 1%
Comics : 1%
Beauty : 1%
Art & Design : 1%
Parenting : 0%
Card : 0%
Casino : 0%
Trivia : 0%
Educational;Education : 0%
Board : 0%
Educational : 0%
Education;Education : 0%
Word : 0%
Casual;Pretend Play : 0%
Music : 0%
Racing;Action & Adventure : 0%
Puzzle;Brain Games : 0%
Entertainment;Music & Video : 0%
Casual;Brain Games : 0%
Casual;Action & Adventure : 0%
Arcade;Action & Adventure : 0%
Action;Action & Ad

In [180]:
display_table(ios_english_free, ios_index_genre)

Games : 58%
Entertainment : 8%
Photo & Video : 5%
Education : 4%
Social Networking : 3%
Shopping : 3%
Utilities : 3%
Sports : 2%
Music : 2%
Health & Fitness : 2%
Productivity : 2%
Lifestyle : 2%
News : 1%
Travel : 1%
Finance : 1%
Weather : 1%
Food & Drink : 1%
Reference : 1%
Business : 1%
Book : 0%
Navigation : 0%
Medical : 0%
Catalogs : 0%
