# Profitability potential of free iOS and Android Mobile Apps

The objective of this project is to create mobile app profiles for the Apple App Store and Google Play Store.

We want to enable app developers to make data-driven decisions about the kinds of apps they should focus on, based on which types of apps are likely to attract more users.

*As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play.*

Collecting data ourselves for all these apps is not feasible within our time and budget constraints. However, we've identified two suitable data sets for our goal:

* [A data set](https://www.kaggle.com/lava18/google-play-store-apps/home) collected in August 2018, containing data about approximately ten thousand Android apps from Google Play. 

* [A data set](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home) collected in July 2017, containing data about approximately seven thousand iOS apps from the App Store.

In [205]:
# Let's open the respective app stores
import csv

# Opening and reading the data sets

#iOS
with open('Data/AppleStore_clean.csv', 'r', encoding='utf8') as ios:
    ios_read = csv.reader(ios, delimiter=",")
    ios_header = next(ios_read)
    ios_apps = list(ios_read)

#Google
with open('Data/googleplaystore.csv', 'r', encoding='utf8') as google:
    google_read = csv.reader(google, delimiter=",")
    google_header = next(google_read)
    google_apps = list(google_read)

In [206]:
# To make it easier to read, we'll use the following function:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [207]:
# For example, let's open the header and the first 4 rows for each app store

# iOS
print(ios_header)
print('\n')
explore_data(ios_apps, 0, 4, True)

print('\n')

# Google
print(google_header)
print('\n')
explore_data(google_apps, 0, 4, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic', 'game_enab']


['281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1', '0']


['281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1', '0']


['281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1', '0']


['282614216', 'eBay: Best App to Buy, Sell, Save! Online Shopping', '128512000', 'USD', '0', '262241', '649', '4', '4.5', '5.10.0', '12+', 'Shopping', '37', '5', '9', '1', '0']


Number of rows: 7197
Number of columns: 17


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs',

#### Summary

The iOS App Store data set has 7197 apps and 17 columns. 

The columns of interest are: 
'track_name', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', and 'prime_genre'. 

*Note: Not all column names are self-explanatory in this case, but details about each column can be found in the data set [documentation](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home).*


The Google Play data set has 10841 apps and 13 columns. 

The columns of interest are:
'App', 'Category', 'Reviews', 'Installs', 'Type', 'Price', and 'Genres'.



## Data Cleaning

We're only interested in free apps, so we'll remove all non-free apps from both data sets.

The target market is English speakers, so we'll remove all non-English apps from both data sets.

We also want to remove or correct inaccurate data and remove duplicate entries.

### 1. Deleting wrong data

*Note: The Google Play data set has a discussion section, which outlines an error for row 10472.*

*Let's print that and the next row and compare them against the header.*

In [208]:
print(google_header)
print('\n')
explore_data(google_apps, 10472, 10474)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']




In [209]:
problem_line = dict(zip(google_header, google_apps[10472])) # This line will be correct after deleting the problematic entries
correct_line = dict(zip(google_header, google_apps[10473]))
print(problem_line)
print('\n')
print(correct_line)

{'App': 'Life Made WI-Fi Touchscreen Photo Frame', 'Category': '1.9', 'Rating': '19', 'Reviews': '3.0M', 'Size': '1,000+', 'Installs': 'Free', 'Type': '0', 'Price': 'Everyone', 'Content Rating': '', 'Genres': 'February 11, 2018', 'Last Updated': '1.0.19', 'Current Ver': '4.0 and up'}


{'App': 'osmino Wi-Fi: free WiFi', 'Category': 'TOOLS', 'Rating': '4.2', 'Reviews': '134203', 'Size': '4.1M', 'Installs': '10,000,000+', 'Type': 'Free', 'Price': '0', 'Content Rating': 'Everyone', 'Genres': 'Tools', 'Last Updated': 'August 7, 2018', 'Current Ver': '6.06.14', 'Android Ver': '4.4 and up'}


Row 10472 corresponds to the app "Life Made WI-Fi Touchscreen Photo Frame".

We can see several problems:
1. The category is missing (the assigned value is 1.9)
2. The rating is 19, but the maximum rating for a Google Play app is 5
3. "Installs" is assigned a "Free" label
4. "Price" is set to "Everyone"
5. "Content Rating" is empty

For all these reasons, we'll delete this app from the list.
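All of the problems listed above follow from one missing value, which shifts every later column to the left. A minimal sketch of how such rows could be found automatically, by comparing each row's length with the header's. The helper `find_malformed_rows` and the toy data are assumptions for illustration and are not part of the notebook.

```python
def find_malformed_rows(header, rows):
    """Return (index, row) pairs whose column count differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]


# Toy data for illustration only (not read from the real file)
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'TOOLS', '4.2'],
    ['App B', '1.9'],              # 'Category' is missing, so the values shift left
    ['App C', 'BUSINESS', '4.4'],
]

malformed = find_malformed_rows(sample_header, sample_rows)
for index, row in malformed:
    print(index, row)
```

On the real data, the same check would be called as `find_malformed_rows(google_header, google_apps)`.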

In [210]:
# To make sure we deleted the row, we'll look at row 10472 before and after
len(google_apps)
print(google_apps[10472])
#del(google_apps[10472]) # IMPORTANT: RUN THIS ONLY ONCE!
#print(google_apps[10472])
#len(google_apps)

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


### 2. Deleting duplicate entries

Let's see if we can find duplicate app entries in each data set

#### Google Play Store


In [253]:
# Let's start with the Google Play Store
duplicate_android_apps = []
unique_android_apps = []

for app in google_apps:
    name = app[0]
    if name in unique_android_apps:
        duplicate_android_apps.append(name)
    else:
        unique_android_apps.append(name)

print('Number of duplicate Android apps:', len(duplicate_android_apps))
print('\n')
print('Examples of duplicate Android apps:', duplicate_android_apps[:20])

Number of duplicate Android apps: 1181


Examples of duplicate Android apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software', 'MailChimp - Email, Marketing Automation', 'Crew - Free Messaging and Scheduling', 'Asana: organize team projects', 'Google Analytics', 'AdWords Express']
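The loop above checks each name against a growing list, which rescans the list for every app. As an alternative, duplicates can be counted in a single pass with `collections.Counter`. This is a sketch on toy rows; `count_duplicates` is a hypothetical helper and the review counts are made up.

```python
from collections import Counter


def count_duplicates(rows, name_index):
    """Return {name: number of extra entries} for names that appear more than once."""
    counts = Counter(row[name_index] for row in rows)
    return {name: n - 1 for name, n in counts.items() if n > 1}


# Toy rows for illustration: [name, reviews]
toy_rows = [
    ['Slack', '100'],
    ['Box', '200'],
    ['Slack', '101'],
    ['Slack', '100'],
    ['Box', '200'],
    ['Zenefits', '300'],
]

extra_entries = count_duplicates(toy_rows, 0)
print(extra_entries)
print('Number of duplicate entries:', sum(extra_entries.values()))
```

The sum of the extra entries corresponds to the length of `duplicate_android_apps` in the cell above.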


Since we want to keep only one copy of each app, an important question to address is how we determine which duplicate entry to remove.

We could remove the duplicate rows randomly, but there may be a better method.

Let's take a closer look at the duplicates.

In [254]:
# For example, let's look at "Slack" in the Google Play Store
for app in google_apps:
    name = app[0]
    if name == "Slack":
        print(app)

['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51510', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']


If you examine the rows we printed for the Slack app, the main difference is in the fourth position of each row, which corresponds to the number of reviews. The different numbers show the data was collected at different times.

We'll use this as the criterion for keeping rows: we'll keep the row that has the highest number of reviews, because the higher the number of reviews, the more reliable the ratings.

To do that, we will:

* Create a dictionary where each key is a unique app name, and the value is the highest number of reviews of that app
* Use the dictionary to create a new data set, which will have only one entry per app (and we only select the apps with the highest number of reviews)
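One detail to watch when implementing these two steps: `csv.reader` returns every field as a string, and strings are compared character by character, so `'9' < '10'` is `False`. The sketch below converts the review counts to integers before comparing them. It assumes every review field parses as an integer (so a malformed row would have to be removed first). The function names and toy rows are illustrative only.

```python
def build_max_reviews(rows, name_index, reviews_index):
    """Map each app name to its highest review count, compared as integers."""
    max_reviews = {}
    for row in rows:
        name = row[name_index]
        n_reviews = int(row[reviews_index])
        if name not in max_reviews or n_reviews > max_reviews[name]:
            max_reviews[name] = n_reviews
    return max_reviews


def deduplicate(rows, name_index, reviews_index):
    """Keep one row per app: the first row that holds the highest review count."""
    max_reviews = build_max_reviews(rows, name_index, reviews_index)
    clean = []
    already_added = set()  # set membership checks are fast
    for row in rows:
        name = row[name_index]
        if int(row[reviews_index]) == max_reviews[name] and name not in already_added:
            clean.append(row)
            already_added.add(name)
    return clean


# Toy rows for illustration: [name, reviews]
toy_rows = [
    ['Slack', '51507'],
    ['Slack', '51510'],
    ['Tiny App', '9'],
    ['Tiny App', '10'],
    ['Same Count', '42'],
    ['Same Count', '42'],
]

print('9' < '10')  # string comparison
print(9 < 10)      # numeric comparison
deduplicated = deduplicate(toy_rows, 0, 1)
print(deduplicated)
```

With string comparison, the 'Tiny App' entry with 9 reviews would win over the one with 10, which is why the counts are converted first.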

In [275]:
# Let's build the dictionary
max_reviews = {}

for app in google_apps:
    n_reviews = app[3]
    name = app[0]
    
    if name in max_reviews and max_reviews[name] < n_reviews:
        max_reviews[name] = n_reviews
        
    elif name not in max_reviews:
        max_reviews[name] = n_reviews

print(len(max_reviews))


9660


We previously found that there are 1,181 instances where an app shows more than once. 
Therefore the length of our dictionary (unique apps) should be equal to the difference between the length of our data set and 1,181.

In [276]:
print('Expected length:', len(google_apps) - 1181)
print('Actual length:', len(max_reviews))

Expected length: 9660
Actual length: 9660


We can now use the max_reviews dictionary to remove duplicate apps, which will allow us to keep entries with the highest number of reviews. 

In the code cell below, we:

* Initialize two empty lists, google_apps_clean and already_added.
* Loop through the Google Play data set, and for every iteration:
    * Isolate the name of the app and the number of reviews.
    * Add the current row (app) to the google_apps_clean list, and the app name (name) to the already_added list if:
        * The number of reviews of the current app matches the number of reviews of that app as described in the max_reviews dictionary; and
        * The name of the app is not already in the already_added list.

We need the second condition to account for those cases where the highest number of reviews of a duplicate app is the same for more than one entry (for example, the Box app has three entries, and the number of reviews is the same). If we just check for max_reviews[name] == n_reviews, we'll still end up with duplicate entries for some apps.


In [277]:
google_apps_clean = []
already_added = []

for app in google_apps:
    n_reviews = app[3]
    name = app[0]
    
    if (max_reviews[name] == n_reviews) and (name not in already_added):
        google_apps_clean.append(app)
        already_added.append(name) # make sure this is inside the if block

# Let's explore the new data set, and confirm that the number of rows is 9,660
explore_data(google_apps_clean, 0, 4, True)


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 9660
Number of columns: 13


#### Apple App Store

In [398]:
# Let's now look at the Apple App Store
duplicate_ios_apps = []
unique_ios_apps = []

for app in ios_apps:
    name = app[1]
    if name in unique_ios_apps:
        duplicate_ios_apps.append(name)
    else:
        unique_ios_apps.append(name)

print('Number of duplicate iOS apps:', len(duplicate_ios_apps))
print('\n')
print('Examples of duplicate iOS apps:', duplicate_ios_apps[:5])

for app in ios_apps:
    name = app[1]
    if name == 'NA':
        print(app)

Number of duplicate iOS apps: 1


Examples of duplicate iOS apps: ['Mannequin Challenge']


**Note: We had additional issues in the iOS data set**
    * Multiple entries (~3000) had no available data (i.e. NA).
    * We deleted these entries from the original csv file and saved the result as AppleStore_clean.csv, using the code below.
    
#import csv
#with open('AppleStore.csv', 'r') as inp, open('AppleStore_clean.csv', 'w') as out:
    #writer = csv.writer(out)
    #for row in csv.reader(inp):
        #if row[1] != "NA":
            #writer.writerow(row)
            
    * We're using AppleStore_clean.csv for the analysis


In [558]:
# Let's also look at "VR Roller Coaster" and "Mannequin Challenge" in the Apple App Store
for app in ios_apps:
    name = app[1]
    if name == "VR Roller Coaster" or name == "Mannequin Challenge":
        print(app)

['952877179', 'VR Roller Coaster', '169523200', 'USD', '0', '107', '102', '3.5', '3.5', '2.0.0', '4+', 'Games', '37', '5', '1', '1', '0']
['1173990889', 'Mannequin Challenge', '109705216', 'USD', '0', '668', '87', '3', '3', '1.4', '9+', 'Games', '37', '4', '1', '1', '0']
['1178454060', 'Mannequin Challenge', '59572224', 'USD', '0', '105', '58', '4', '4.5', '1.0.1', '4+', 'Games', '38', '5', '1', '1', '0']


If you examine the rows we printed for the VR Roller Coaster and Mannequin Challenge apps, one of the main differences is in the tenth position of each row, which corresponds to the version number. The different numbers show the data was collected for different app versions.

We'll keep the rows for the latest app version.

Since there are only 2 duplicates, we'll delete the corresponding rows.

In [561]:
# Let's find the index for the duplicate apps
    # These apps will not be found after running the code below
print(ios_apps.index(['1173990889', 'Mannequin Challenge', '109705216', 'USD', '0', '668', '87', '3', '3', '1.4', '9+', 'Games', '37', '4', '1', '1', '0']))
print(ios_apps.index(['1178454060', 'Mannequin Challenge', '59572224', 'USD', '0', '105', '58', '4', '4.5', '1.0.1', '4+', 'Games', '38', '5', '1', '1', '0']))

7091


ValueError: ['1178454060', 'Mannequin Challenge', '59572224', 'USD', '0', '105', '58', '4', '4.5', '1.0.1', '4+', 'Games', '38', '5', '1', '1', '0'] is not in list

We'll delete row 5603 for VR Roller Coaster app version 0.81 and row 7128 for Mannequin Challenge app version 1.0.1.

In [560]:
#len(ios_apps)
#print(ios_apps[5603])
#print(ios_apps[7128])
#del(ios_apps[7127]) # IMPORTANT: RUN THIS ONLY ONCE!
#del(ios_apps[7128]) # IMPORTANT: RUN THIS ONLY ONCE!
#print(ios_apps[5603])
#print(ios_apps[7128])
#len(ios_apps)
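Deleting by position is fragile: indexes shift after every deletion, and running the cell a second time removes a different row. A possible alternative is to filter on the `id` column, which gives the same result no matter how often it runs. This is a sketch only. `remove_by_id` is a hypothetical helper, and the rows are shortened to id, name and version.

```python
def remove_by_id(rows, ids_to_remove, id_index=0):
    """Return a new list without the rows whose id is in ids_to_remove."""
    ids_to_remove = set(ids_to_remove)
    return [row for row in rows if row[id_index] not in ids_to_remove]


# Shortened rows for illustration: [id, track_name, ver]
toy_ios = [
    ['1173990889', 'Mannequin Challenge', '1.4'],
    ['1178454060', 'Mannequin Challenge', '1.0.1'],
    ['281796108', 'Evernote - stay organized', '8.2.2'],
]

once = remove_by_id(toy_ios, ['1178454060'])
twice = remove_by_id(once, ['1178454060'])  # running it again changes nothing

print(len(toy_ios), len(once), len(twice))
```

Because the function returns a new list, the original data stays available for comparison.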

### 3. Deleting non-English apps

We're not interested in keeping non-English apps, so we'll remove them. 

One method is to remove apps whose name contains a symbol that is not commonly used in English text. English text usually includes letters from the English alphabet, numbers composed of digits from 0 to 9, punctuation marks (., !, ?, ;, etc.), and other symbols (+, *, /, etc.).

All of these characters are encoded using the ASCII standard. Each ASCII character has a corresponding number between 0 and 127, and we can take advantage of that to build a function that checks an app name and tells us whether it contains non-ASCII characters.
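As a small illustration of this idea, Python's built-in `ord()` returns the number associated with a character, so a name can be checked one character at a time. The helper name `has_non_ascii` is an assumption for this sketch. The sample names are taken from the data shown earlier.

```python
def has_non_ascii(text):
    """Return True if any character's code point falls outside 0-127."""
    return any(ord(character) > 127 for character in text)


print(ord('a'))   # 97, inside the ASCII range
print(ord('™'))   # 8482, outside the ASCII range

print(has_non_ascii('PAC-MAN Premium'))
print(has_non_ascii('Docs To Go™ Free Office Suite'))
```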

In [562]:
# Let's find some examples of potential non-English apps

# To facilitate this task, we'll write a function that prints every app name
# containing at least one non-ASCII character
def not_english(data_set, i):

    for app in data_set:
        name = app[i]
        for character in name:
            if ord(character) > 127:
                print(name)
                break  # print each name once, not once per non-ASCII character

In [563]:
# Apple App Store
not_english(ios_apps, 1)

Google – Search made just for mobile
Lifesum – Inspiring healthy lifestyle app
iHeartRadio – Free Music & Radio Stations
Line Rider iRide™
Lose It! – Weight Loss Program and Calorie Counter
Chase Mobile℠
大辞林
新浪新闻-阅读最新时事热门头条资讯视频
Citi Mobile®
Kindle – Read eBooks, Magazines & Textbooks
同花顺-炒股、股票
Match™ - #1 Dating App.
Brain Wave ™ - 32 Advanced Binaural Brainwave Entrainment Programs with iTunes Music and Relaxing Ambience
LEDit – The LED Banner App
20 Minutes.fr - l'actualité en continu
World Cup Table Tennis™
iStudiez Pro – Homework, Schedule, Grades
MindNode – Delightful Mi
天猫-购物
ケンタッキーフライドチキン　公式アプリ
翼支付-只为简单生活
冒険ダンジョン村
腾讯微云-安全备份共享文件和照片
宝宝树孕育-火爆的备孕怀孕育儿社区
宜人贷借款
Block Сity Wars: game and skin export to minecraft
Jurassic World™: The Game
开心消消乐®
ゆるドラシル -本格派神話RPG-
QRコードリーダー for iPhone - 無料で使えるQRコード読み取り用アプリ
がんばれルルロロ！かさねてブロック
オーブGETマルチ掲示板！ for モンスト
Filterra – Photo Editor, Effects for Pictures
パチスロ黄門ちゃま 喝
Keep - 移动健身教练 自由运动场
平安WiFi-手机必备的万能WiFi上网钥匙
麻雀物語３ 役満乱舞の究極大戦
Viva™ Slots Las Vegas Classic Casino Games
クリアした奴マジ天才
Color•多彩手帐
激ムズ！ねこじゃんぷ２
Flow Speed Control ● Professional Edition
Little Nugget® - capture pregnancy & baby pics
戦え！プリンセスドール
MINE［マイン］ファッション/コーディネート動画アプリ
机甲无双－燃即正义 殿堂级战斗手游
上司と秘密の2LDK　Love Happening
雨时
Future War：Reborn- Zombie Survival Tatics TPS
注文の多いブサ猫軒
DayDayCook － 日日煮
百盈足球-专业足篮比分赛事预测
ねこめし屋 -マンガも読めるネコゲーム料理店経営の無料育成シュミレーション-
ナイトメアハーレム 女性向け恋愛ゲーム！乙女げーむ
競馬予想 アプリ！馬券予想で収支アップ！競馬 jra 攻略！
You勇者 -HIKAKINとSEIKIN(ヒカキンとセイキン)と無料ロールプレイング
ぷかぷか
ゲームセンター倶楽部
任务客
口袋电玩-经典街机游戏真人秀版•玩转全民疯狂欢乐火拼炸诈扎金花送斗牛牛麻将VIP
[GP]パチスロ ヱヴァンゲリヲン〜決意の刻〜(パチスロゲーム)
なっとう-人気の納豆育成ゲーム-
罪と罰2 -犯人は誰だ!?- 謎解き推理ミステリーサスペンス
仙侠剑客—大型3D精品ARPG动作神作
オトナの教科書〜学校じゃ教わらない裏教育クイズ(画像付き)〜

In [564]:
# Android Play Store
not_english(google_apps_clean, 0)

U Launcher Lite – FREE Live Cool Themes, Hide Apps
CarMax – Cars for Sale: Search Used Car Inventory
AutoScout24 Switzerland – Find your new car
Zona Azul Digital Fácil SP CET - OFFICIAL São Paulo
ReadEra – free ebook reader
Docs To Go™ Free Office Suite
USPS MOBILE®
Invoice 2go — Professional Invoices and Estimates
Röhrich Werner Soundboard
Manga Net – Best Online Manga Reader
Truyện Vui Tý Quậy
Comic Es - Shojo manga / love comics free of charge ♪ ♪
漫咖 Comics - Manga,Novel and Stories
Tapas – Comics, Novels, and Stories
【Ranobbe complete free】 Novelba - Free app that you can read and write novels
Call Free – Free Call
Xperia Link™
Messenger – Text and Video Chat for Free
Dolphin Browser - Fast, Private & Adblock🐬
Sync.ME – Calle

The function works relatively well, but some English app names use emojis or other symbols (™, — (em dash), – (en dash), etc.) that fall outside of the ASCII range. 

e.g. "Google – Search made just for mobile" in the Apple App Store and "Dolphin Browser - Fast, Private & Adblock🐬" in the Google Play Store.

Because of this, we'll remove useful apps if we use the function in its current form.

We'll modify the function to exclude only apps whose names contain 3 or more characters that fall outside of the ASCII range.

In [568]:
# Creating a new function to exclude app names with 3 or more non-ASCII characters
def is_english(string):
    non_ascii = 0
    
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    
    if non_ascii >= 3:
        return False
    else:
        return True

# Testing out the function
print(is_english('新浪新闻-阅读最新时事热门头条资讯视频'))
print(is_english('Truyện Vui Tý Quậy'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

False
False
True
True


The function is still not perfect. Allowing up to 3 non-ASCII characters would miss some non-English apps (e.g. "Truyện Vui Tý Quậy"), so for our analysis we allow at most 2 non-ASCII characters per name, which keeps fewer non-English apps in the data.
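To make the effect of the cutoff easier to explore, the limit can be turned into a parameter. The sketch below is a hypothetical variant of the function, not the one used in the analysis. `count_non_ascii`, `is_english_with_limit` and `max_non_ascii` are names introduced here for illustration.

```python
def count_non_ascii(text):
    """Count the characters whose code point is above 127."""
    return sum(1 for character in text if ord(character) > 127)


def is_english_with_limit(text, max_non_ascii=2):
    """Treat a name as English if it has at most max_non_ascii non-ASCII characters."""
    return count_non_ascii(text) <= max_non_ascii


test_names = [
    'Docs To Go™ Free Office Suite',
    'Instachat 😜',
    'Truyện Vui Tý Quậy',
    '新浪新闻-阅读最新时事热门头条资讯视频',
]

for name in test_names:
    print(count_non_ascii(name), is_english_with_limit(name), name)
```

Calling `is_english_with_limit(name, max_non_ascii=3)` instead shows which names a looser cutoff would let through.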

We'll use the is_english function to filter out the non-English apps and create two new lists:
* android_apps_eng
* ios_apps_eng

In [569]:
android_apps_eng = []
ios_apps_eng = []

for app in google_apps_clean:
    name = app[0]
    if is_english(name):
        android_apps_eng.append(app)
        
for app in ios_apps:
    name = app[1]
    if is_english(name):
        ios_apps_eng.append(app)
        
explore_data(android_apps_eng, 0, 3, True)
print('\n')
explore_data(ios_apps_eng, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9598
Number of columns: 13


['281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1', '0']


['281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1', '0']


['281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583',

After filtering out non-English apps, we are left with two lists:
1. android_apps_eng containing 9598 apps
2. ios_apps_eng containing 6152 apps

### 4. Isolating free apps

**So far in the data cleaning process, we:**

- Removed inaccurate data
- Removed duplicate app entries
- Removed non-English apps

For our analysis, we're only interested in free apps.

Therefore we need to generate lists of apps containing only free apps (i.e. price = 0).
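A plain string comparison such as `price == '0'` only matches that exact text. If a price column were to write zero as '0.0', or prefix paid prices with a currency symbol such as '$4.99' (both are assumptions here, not something shown in the output above), a numeric check would be more forgiving. A minimal sketch with a hypothetical helper `is_free`:

```python
def is_free(price_text):
    """Return True when a price string represents zero."""
    cleaned = price_text.strip().lstrip('$')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Unparseable values (for example 'Everyone') are treated as not free
        return False


price_samples = ['0', '0.0', '$4.99', '3.99', 'Everyone']
price_results = [is_free(price) for price in price_samples]

for price, free in zip(price_samples, price_results):
    print(repr(price), free)
```

Treating unparseable values as not free also keeps a malformed row out of the final list.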

In [587]:
android_final = []
ios_final = []

for app in android_apps_eng:
    price = app[7]
    if price == '0':
        android_final.append(app)
        
for app in ios_apps_eng:
    price = app[4]
    if price == '0':
        ios_final.append(app)
        
print(len(android_final))
print(len(ios_final))

8846
3200


In [588]:
explore_data(android_final, 0, 3, True)
print('\n')
explore_data(ios_final, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 8846
Number of columns: 13


['281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1', '0']


['281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1', '0']


['282614216', 'eBay: Best App to Buy, Sell, Save! Online Shoppin

**At the end of our cleanup, we are left with the following for our analysis:**  
- **8846 Android apps in the Google Play Store**  
- **3200 iOS apps in the Apple App Store**        

----------------------
## Analysis

##### Note: Conclusions are only applicable to English apps


As we mentioned in the introduction, our aim is to determine the kinds of apps that are likely to attract more users because revenue is highly influenced by the number of people using our apps.

Because our end goal is to add the app on both Google Play and the App Store, we need to find app profiles that are successful on both markets. For instance, a profile that works well for both markets might be a productivity app that makes use of gamification.

Let's begin the analysis by getting a sense of the most common genres for each market. For this, we'll need to build frequency tables for a few columns in our data sets.

We'll build two functions we can use to analyze the frequency tables:

- One function to generate frequency tables that show percentages
- Another function that we can use to display the percentages in descending order

In [589]:
def freq_table(dataset, index):
    """Return a dict mapping each value in column `index` to its percentage."""
    table = {}
    total = 0

    # Count how many rows carry each value
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1

    # Convert the raw counts to percentages
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = round(percentage, 1)

    return table_percentages


def display_table(dataset, index):
    """Print the frequency table for column `index`, largest percentage first."""
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        # Put the percentage first so that sorting orders by percentage
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse=True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
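
As a quick check of how these percentages are computed, here is a small sketch on made-up rows. It uses `collections.Counter` as an alternative way to obtain the same table. The name `freq_table_counter` and the toy data are illustrative assumptions, not part of the original notebook.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Percentage frequency table for one column, built with Counter."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return {key: round(count / total * 100, 1) for key, count in counts.items()}


# Toy data set (made up): [name, genre]
toy_apps = [
    ['a', 'Games'], ['b', 'Games'], ['c', 'Games'],
    ['d', 'Weather'],
]

table = freq_table_counter(toy_apps, 1)
# Sort by percentage, largest first, similar to display_table
for genre, pct in sorted(table.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', pct)
```

On the toy rows this prints `Games : 75.0` followed by `Weather : 25.0`. Like `freq_table`, the sketch assumes a non-empty data set; an empty one would cause a division by zero.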

### **iOS Data**

We start by examining the frequency table for the prime_genre column of the App Store data set.

In [600]:
display_table(ios_final, 11)

Games : 58.2
Entertainment : 7.8
Photo & Video : 5.0
Education : 3.7
Social Networking : 3.3
Shopping : 2.6
Utilities : 2.5
Sports : 2.2
Music : 2.1
Health & Fitness : 2.0
Productivity : 1.8
Lifestyle : 1.6
News : 1.3
Travel : 1.2
Finance : 1.1
Weather : 0.9
Food & Drink : 0.8
Reference : 0.5
Business : 0.5
Book : 0.4
Navigation : 0.2
Medical : 0.2
Catalogs : 0.1


We see that more than half (58.2%) of the free English apps in the Apple App Store are games, followed by entertainment apps (7.8%) and photo and video apps (5.0%).

The Apple App Store is dominated by apps designed for fun (games, entertainment, photo and video, social networking, sports, music, etc.). Apps with practical purposes (education, shopping, utilities, productivity, lifestyle, etc.) represent a much smaller proportion. 

However, the fact that fun apps are the most numerous doesn't imply that they also have the greatest number of users; demand might not match supply.

### **Android Data**

Let's now examine the frequency tables for the category column (index 1) and the genre column (index 9) of the Google Play Store data set.

In [591]:
# Display for Category
display_table(android_final, 1)

FAMILY : 19.0
GAME : 9.7
TOOLS : 8.4
BUSINESS : 4.6
PRODUCTIVITY : 3.9
LIFESTYLE : 3.9
FINANCE : 3.7
MEDICAL : 3.5
SPORTS : 3.4
PERSONALIZATION : 3.3
COMMUNICATION : 3.2
HEALTH_AND_FITNESS : 3.1
PHOTOGRAPHY : 3.0
NEWS_AND_MAGAZINES : 2.8
SOCIAL : 2.7
TRAVEL_AND_LOCAL : 2.3
SHOPPING : 2.2
BOOKS_AND_REFERENCE : 2.1
DATING : 1.9
VIDEO_PLAYERS : 1.8
MAPS_AND_NAVIGATION : 1.4
FOOD_AND_DRINK : 1.2
EDUCATION : 1.2
ENTERTAINMENT : 1.0
LIBRARIES_AND_DEMO : 0.9
AUTO_AND_VEHICLES : 0.9
WEATHER : 0.8
HOUSE_AND_HOME : 0.8
PARENTING : 0.7
EVENTS : 0.7
COMICS : 0.6
BEAUTY : 0.6
ART_AND_DESIGN : 0.6


In [592]:
# Display for Genre
# The genre column gives us much more granular information about the data
display_table(android_final, 9)

Tools : 8.4
Entertainment : 6.1
Education : 5.4
Business : 4.6
Productivity : 3.9
Lifestyle : 3.9
Finance : 3.7
Sports : 3.5
Medical : 3.5
Personalization : 3.3
Communication : 3.2
Health & Fitness : 3.1
Action : 3.1
Photography : 3.0
News & Magazines : 2.8
Social : 2.7
Travel & Local : 2.3
Shopping : 2.2
Books & Reference : 2.1
Simulation : 2.0
Dating : 1.9
Video Players & Editors : 1.8
Casual : 1.8
Arcade : 1.8
Maps & Navigation : 1.4
Food & Drink : 1.2
Puzzle : 1.1
Racing : 1.0
Strategy : 0.9
Role Playing : 0.9
Libraries & Demo : 0.9
Auto & Vehicles : 0.9
Weather : 0.8
House & Home : 0.8
Events : 0.7
Adventure : 0.7
Comics : 0.6
Beauty : 0.6
Art & Design : 0.6
Parenting : 0.5
Trivia : 0.4
Educational;Education : 0.4
Educational : 0.4
Casino : 0.4
Card : 0.4
Board : 0.4
Word : 0.3
Education;Education : 0.3
Racing;Action & Adventure : 0.2
Puzzle;Brain Games : 0.2
Music : 0.2
Entertainment;Music & Video : 0.2
Casual;Pretend Play : 0.2
Simulation;Action & Adventure : 0.1
Parenting;Music

In contrast to the Apple App Store, Google Play has a good balance of apps with practical purposes (tools, education, business, productivity, etc.) and apps designed for fun (entertainment, travel & local, video players & editors, etc.).
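
Some values in the genre table above combine two labels with a semicolon (for example `Educational;Education` or `Puzzle;Brain Games`), which spreads related apps over several rows. One possible way to count individual labels is sketched below. The helpers `split_genres` and `label_counts` and the toy rows are hypothetical, and counting an app once per label is an assumption about how such rows should be treated.

```python
def split_genres(value):
    """Split a multi-label genre string such as 'Puzzle;Brain Games' into labels."""
    return [label.strip() for label in value.split(';') if label.strip()]


def label_counts(dataset, index):
    """Count every individual label, so an app with two labels counts twice."""
    counts = {}
    for row in dataset:
        for label in split_genres(row[index]):
            counts[label] = counts.get(label, 0) + 1
    return counts


# Toy rows (made up): [name, genres]
toy_rows = [
    ['a', 'Education'],
    ['b', 'Educational;Education'],
    ['c', 'Puzzle;Brain Games'],
]
print(label_counts(toy_rows, 1))
```

Because an app can contribute to more than one label, these counts no longer sum to the number of apps, so they should not be read as shares of the store.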

The information we've gathered so far doesn't inform us as to which genre/category is more popular in each store.

One way to find out what genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. 

For the Google Play data set, we can find this information in the Installs column, but this information is missing for the App Store data set. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the rating_count_tot column.

### Most Popular Apps by Genre on the Apple App Store

In [593]:
# Let's calculate the average number of user ratings per app genre on the App Store:

genres_ios = freq_table(ios_final, 11)

for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in ios_final:
        genre_app = app[11]
        if genre_app == genre:            
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = round(total / len_genre, 1)
    print(genre, ':', avg_n_ratings)

Productivity : 21028.4
Weather : 52279.9
Shopping : 27230.7
Reference : 79350.5
Finance : 32367.0
Music : 57326.5
Utilities : 19156.5
Travel : 28243.8
Social Networking : 71548.3
Sports : 23008.9
Health & Fitness : 23298.0
Games : 22921.3
Food & Drink : 33333.9
News : 21248.0
Book : 46384.9
Photo & Video : 28441.5
Entertainment : 14195.4
Business : 7491.1
Lifestyle : 16815.5
Education : 7004.0
Navigation : 86090.3
Medical : 612.0
Catalogs : 4004.0
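
The loop above scans the whole data set once per genre and prints the genres in no particular order. A minimal alternative sketch that makes a single pass and sorts the result is shown below. The function name `average_by_genre` and the toy rows are assumptions made for illustration.

```python
from collections import defaultdict


def average_by_genre(dataset, genre_index, value_index):
    """Return {genre: mean of the numeric column} using a single pass."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        genre = row[genre_index]
        totals[genre] += float(row[value_index])
        counts[genre] += 1
    return {genre: round(totals[genre] / counts[genre], 1) for genre in totals}


# Toy rows (made up): [name, n_ratings, genre]
toy_rows = [
    ['a', '100', 'Games'], ['b', '300', 'Games'],
    ['c', '50', 'Weather'],
]
averages = average_by_genre(toy_rows, genre_index=2, value_index=1)
# Print the genres from the highest average to the lowest
for genre, avg in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', avg)
```

For the App Store data in this notebook, the equivalent call would be `average_by_genre(ios_final, 11, 5)`.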


Let's take a closer look at each genre to see the type of apps it contains.

First, let's take a look at some of the genres with the highest average number of user ratings:

- **Navigation: 86090.3 avg ratings**
- **Social Networking: 71548.3 avg ratings**
- **Music: 57326.5 avg ratings**

In [594]:
for app in ios_final:
    if app[11] == 'Navigation':
        print(app[1], ':', app[5])  # print name and number of ratings

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Geocaching® : 12811
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5
CoPilot GPS – Car Navigation & Offline Maps : 3582
Google Maps - Navigation & Transit : 154911


In [595]:
for app in ios_final:
    if app[11] == 'Social Networking':
        print(app[1], ':', app[5])  # print name and number of ratings

Facebook : 2974676
LinkedIn : 71856
Skype for iPhone : 373519
Tumblr : 334293
Match™ - #1 Dating App. : 60659
WhatsApp Messenger : 287589
TextNow - Unlimited Text + Calls : 164963
Grindr - Gay and same sex guys chat, meet and date : 23201
imo video calls and chat : 18841
Ameba : 269
Weibo : 7265
Badoo - Meet New People, Chat, Socialize. : 34428
Kik : 260965
Qzone : 1649
Fake-A-Location Free ™ : 354
Tango - Free Video Call, Voice and Chat : 75412
MeetMe - Chat and Meet New People : 97072
SimSimi : 23530
Viber Messenger – Text & Call : 164249
Find My Family, Friends & iPhone - Life360 Locator : 43877
Weibo HD : 16772
POF - Best Dating App for Conversations : 52642
GroupMe : 28260
Lobi : 36
WeChat : 34584
ooVoo – Free Video Call, Text and Voice : 177501
Pinterest : 1061624
知乎 : 397
Qzone HD : 458
Skype for iPad : 60163
LINE : 11437
QQ : 9109
LOVOO - Dating Chat : 1985
QQ HD : 5058
Messenger : 351466
eHarmony™ Dating App - Meet Singles : 11124
YouNow: Live Stream Video Chat : 12079
Cougar 

In [596]:
for app in ios_final:
    if app[11] == 'Music':
        print(app[1], ':', app[5])  # print name and number of ratings

Pandora - Music & Radio : 1126879
Shazam - Discover music, artists, videos & lyrics : 402925
iHeartRadio – Free Music & Radio Stations : 293228
Deezer - Listen to your Favorite Music & Playlists : 4677
Sonos Controller : 48905
NRJ Radio : 38
radio.de - Der Radioplayer : 64
Spotify Music : 878563
SoundCloud - Music & Audio : 135744
Sing Karaoke Songs Unlimited with StarMaker : 26227
SoundHound Song Search & Music Player : 82602
Ringtones for iPhone & Ringtone Maker : 25403
Coach Guitar - Lessons & Easy Tabs For Beginners : 2416
TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110420
Magic Piano by Smule : 131695
QQ音乐HD : 224
The Singing Machine Mobile Karaoke App : 130
Bandsintown Concerts : 30845
PetitLyrics : 0
edjing Mix:DJ turntable to remix and scratch music : 13580
Smule Sing! : 119316
Amazon Music : 106235
AutoRap by Smule : 18202
My Mixtapez Music : 26286
Certified Mixtapes - Hip Hop Albums & Mixtapes : 9975
Karaoke - Sing Karaoke, Unlimited Songs! : 28606
Napster - Top Music 

*Note: We can see that a few non-English apps made their way through our filtering. However, as the total number of user ratings suggests, the popularity of these apps is minimal and should not significantly affect our analysis.*

The averages for Navigation and Social Networking appear to be heavily influenced by a handful of apps: Waze and Google Maps in the first case, Facebook and Skype in the second. A similar pattern applies to Music, where the average is pulled up by apps like Pandora, Shazam and Spotify.

This suggests that these genres may not be as popular as they seem. The average number of ratings is heavily skewed by a small number of very popular apps.
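
To make the skew concrete, the sketch below compares the mean with the median for the six Navigation apps, using the rating counts printed in the listing above. Using the median and dropping the two largest apps are illustrative choices, not steps from the original analysis.

```python
from statistics import mean, median

# Rating counts copied from the Navigation listing
navigation_ratings = [345046, 12811, 187, 5, 3582, 154911]

# The mean reproduces the genre average reported earlier
print('mean   :', round(mean(navigation_ratings), 1))   # 86090.3
# The median is far lower because most apps have few ratings
print('median :', median(navigation_ratings))           # 8196.5

# Mean after dropping the two largest apps (Waze and Google Maps)
without_top_two = sorted(navigation_ratings)[:-2]
print('mean without top two:', round(mean(without_top_two), 2))  # 4146.25
```

The mean is more than ten times the median here, which is what we would expect when two apps account for almost all of the ratings in a genre.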

Moreover, given the high cost of developing such apps and the heavy competition in that space, building one may not be desirable.

Let's look at potential genres of interest:

**Weather: 52279.9 avg ratings**

In [597]:
for app in ios_final:
    if app[11] == 'Weather':
        print(app[1], ':', app[5])  # print name and number of ratings

WeatherBug - Local Weather, Radar, Maps, Alerts : 188583
The Weather Channel: Forecast, Radar & Alerts : 495626
AccuWeather - Weather for Life : 144214
MyRadar NOAA Weather Radar Forecast : 150158
The Weather Channel App for iPad – best local forecast, radar map, and storm tracking : 208648
Météo-France : 24
Yurekuru Call : 53
QuakeFeed Earthquake Map, Alerts, and News : 6081
Moji Weather - Free Weather Forecast : 2333
FEMA : 128
Weather Underground: Custom Forecast & Local Radar : 49192
JaxReady : 22
Hurricane Tracker WESH 2 Orlando, Central Florida : 203
Hurricane by American Red Cross : 1158
Weather & Radar : 37
WRAL Weather Alert : 25
Yahoo Weather : 112603
Weather Live Free - Weather Forecast & Alerts : 35702
NOAA Weather Radar - Weather Forecast & HD Radar : 45696
iWeather - World weather forecast : 80
Almanac Long-Range Weather Forecast : 12
TodayAir : 0
Weather - Radar - Storm with Morecast App : 78
Storm Radar : 22792
WarnWetter : 0
wetter.com : 0
Forecast Bar : 375
Freddy the

While weather apps are generally popular, developing one may not be desirable for a couple of reasons:
1. Users don't typically spend much time in the app. A typical interaction involves opening the app and quickly glancing at the current weather and forecast. Therefore, the chances of making a large profit from ad-generated revenue are low.

2. The success of weather apps hinges heavily on their accuracy and reliability, which typically requires access to paid APIs.

**Reference: 79350.5 avg ratings**

In [599]:
for app in ios_final:
    if app[11] == 'Reference':
        print(app[1], ':', app[5])  # print name and number of ratings

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
Merriam-Webster Dictionary : 16849
Google Translate : 26786
Night Sky : 12122
WWDC : 762
Jishokun-Japanese English Dictionary & Translator : 0
VPN Express : 14
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Real Bike Traffic Rider Virtual Reality Glasses : 8


While the Reference genre is dominated by the Bible and the Dictionary.com apps, there is some potential in this category.

1. Books and other reference apps are relatively easy to develop, both in terms of cost and time. The content can even be free if it comes from public-domain books.

2. Because reading is involved, users spend a significant amount of time within the app. Additionally, in-app access to a dictionary and/or Wikipedia would allow users to look up words and additional information without leaving the app.

A possible idea could be to take another popular book and turn it into an app where we could add different features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes about the book, etc. This fits well with the fact that the App Store is dominated by for-fun apps. However, the market might be saturated with entertainment apps, which means practical apps might have a better chance to stand out. A reference book where users learn a skill might be a good option, covering both genres.

### Most Popular Apps by Genre on the Google Play Store

For the Google Play market, we actually have data about the number of installs, so we should be able to get a clearer picture about genre popularity. However, the install numbers don't seem precise enough — we can see that most values are open-ended (100+, 1,000+, 5,000+, etc.):