# Google Play and App Store Apps Analysis

In this project, we dive into the apps available on Google Play and the App Store to identify the types of apps that are most compelling to users. We are a team of data analysts working at a company that develops and publishes free mobile apps.

Our main source of revenue is in-app ads, so our revenue depends heavily on the number of users. Our goal is to help our developers understand which kinds of apps attract the most users.

# Opening and Exploring the Data

As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play.

Collecting data for over 4 million apps requires a significant amount of time and money, so we'll analyze a sample of the data instead. To avoid spending resources on collecting new data ourselves, we should first see whether we can find relevant existing data at no cost. Luckily, there are two data sets that seem suitable for our goals:

- A data set containing data about approximately 10,000 Android apps from Google Play; the data was collected in August 2018. You can download the data set directly from this [link](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv).
- A data set containing data about approximately 7,000 iOS apps from the App Store; the data was collected in July 2017. You can download the data set directly from this [link](https://dq-content.s3.amazonaws.com/350/AppleStore.csv).

In [3]:
from csv import reader

### open the Apple Store file ###
file = open('AppleStore.csv', encoding="utf8")
read_file = reader(file)
apple = list(read_file)
apple_header = apple[0]
apple_data = apple[1:]

### open the Google Play file ###
file = open('googleplaystore.csv', encoding="utf8")
read_file = reader(file)
android = list(read_file)
android_header = android[0]
android_data = android[1:]
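
The cell above leaves both file handles open. As a minimal sketch of an alternative, assuming each CSV file has one header row followed by data rows, a context manager closes the file automatically. The helper name `load_csv` is hypothetical and not part of the original notebook.

```python
from csv import reader

def load_csv(path, encoding="utf8"):
    """Return (header, rows) for a CSV file that has one header row.

    The `with` block closes the file even if reading fails.
    """
    with open(path, encoding=encoding, newline="") as f:
        rows = list(reader(f))
    return rows[0], rows[1:]

# Usage (assumes the files sit next to the notebook):
# apple_header, apple_data = load_csv('AppleStore.csv')
# android_header, android_data = load_csv('googleplaystore.csv')
```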

With both data sets loaded, we can start exploring them. To make them easier to explore, we define a function named explore_data() that we can use repeatedly to print rows in a readable way.

In [14]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')
    # Print the number of rows and columns if 'rows_and_columns' is True
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

print(apple_header)
print('\n')
explore_data(apple_data, 0,5,True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 7197
Number of columns: 16


We can see that the App Store data set contains 7197 rows and 16 columns. The columns most likely to help our analysis are the user rating and the genre.
Since not all of the column names are self-explanatory, we can use this [documentation](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home) for the explanations.

In [15]:
print(android_header)
print('\n')
explore_data(android_data, 0,5,True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Eve

We can see that the Google Play data set contains 10841 rows and 13 columns. The columns most likely to help our analysis are the category, rating, type, and genre.

# Data Cleaning

## Wrong Data

Now that we have imported the data, we want to clean it. First, we remove any wrong data. The [discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion) indicates that the wrong data lies only in the app ratings. Therefore, we check the ratings to verify that none of them exceeds the maximum value of 5.


In [59]:
outside = []
inside = []

### Android Data ###
# Column 2 holds the app rating; enumerate() tracks the row index
for i, rows in enumerate(android_data):
    rating = float(rows[2])
    if rating > 5:
        outside.append(rating)
        app_name = rows[0]
        to_delete = i
    else:
        inside.append(rating)
print('Android Data: \n')
print('Number of correct data: ',len(inside))
print('Number of wrong data: ',len(outside))
print('App Name: ',app_name)
print('Rating: ',outside[0])
print('index: ', to_delete)
print('Number of rows before deletion:', len(android_data))
del android_data[to_delete]
print('Number of rows after deletion:', len(android_data))

### Apple Data ###
# Column 7 holds the user rating
outside = []
inside = []
for rows in apple_data:
    rating = float(rows[7])
    if rating > 5:
        outside.append(rating)
    else:
        inside.append(rating)
print('\nApple Data: \n')
print('Number of correct data: ',len(inside))
print('Number of wrong data: ',len(outside))

Android Data: 

Number of correct data:  10840
Number of wrong data:  1
App Name:  Life Made WI-Fi Touchscreen Photo Frame
Rating:  19.0
index:  10472
Number of rows before deletion: 10841
Number of rows after deletion: 10840

Apple Data: 

Number of correct data:  7197
Number of wrong data:  0


We can see above that the App Store data set does not contain any wrong data. The Google Play data set, however, contains one wrong entry whose rating is equal to 19. We deleted this row, which reduced the number of rows in the Google Play data set from 10841 to 10840.
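
A complementary check, not part of the original analysis, is to compare each row's length with the header's, since a row with a missing or extra field shifts every later value into the wrong column. The sketch below is only one way such a check could look; `find_malformed_rows` is a hypothetical helper, and the rows are toy data rather than rows from the real data sets.

```python
def find_malformed_rows(header, rows):
    """Return (index, row) pairs whose field count differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]

# Toy data for illustration only.
header = ['App', 'Category', 'Rating']
rows = [
    ['Good App', 'TOOLS', '4.1'],
    ['Shifted App', '19'],          # one field missing, so '19' lands under Category
    ['Other App', 'GAME', '3.9'],
]
print(find_malformed_rows(header, rows))
```

On the notebook's data, the call would be `find_malformed_rows(android_header, android_data)`.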

## Remove Duplicates

Some of the apps inside the datasets are repeated more than once; this will affect our future analysis. After analyzing the data, we found out that the main difference between the duplicates is the number of reviews. Therefore, we are going to remove any duplicates and keep the row that has the highest number of reviews. 

In [115]:
### Apple Data ###

# Checking for duplicates

duplicate = []
non_duplicate = []
for rows in apple_data:
    app_name = rows[1]
    if app_name in non_duplicate:
        duplicate.append(rows)
    else:
        non_duplicate.append(app_name)

print('Apple Data:\n')
print('Number of Duplicates: ', len(duplicate))
print('Number of non Duplicates:', len(non_duplicate))

# Removing duplicates: map each app name to its highest number of reviews.
# The counts are converted to int because comparing them as strings would
# order them alphabetically ('999' > '1000')

dictionary = {}

for rows in apple_data:
    app_name = rows[1]
    n_reviews = int(rows[5])
    if app_name in dictionary and n_reviews > dictionary[app_name]:
        dictionary[app_name] = n_reviews
    elif app_name not in dictionary:
        dictionary[app_name] = n_reviews

# Transferring the rows from apple_data to apple_clean, using the dictionary
# to keep only the row with the highest number of reviews for each app

apple_clean = []
ignore = []

for rows in apple_data:
    app_name = rows[1]
    n_reviews = int(rows[5])
    if app_name not in ignore and n_reviews == dictionary[app_name]:
        apple_clean.append(rows)
        ignore.append(app_name)

print('Number of rows in the updated list for the apple data: ',len(apple_clean))

# Verifying the elimination of duplicates

print('\nDuplicates Verification:\n')
duplicate = []
non_duplicate = []
for rows in apple_clean:
    app_name = rows[1]    # same column as in the first check (app name)
    if app_name in non_duplicate:
        duplicate.append(rows)
    else:
        non_duplicate.append(app_name)

print('Number of Duplicates: ', len(duplicate))
print('Number of non Duplicates:', len(non_duplicate))

Apple Data:

Number of Duplicates:  2
Number of non Duplicates: 7195
Number of rows in the updated list for the apple data:  7195

Duplicates Verification:

Number of Duplicates:  0
Number of non Duplicates: 7195


In [114]:
### Android Data ###

# Checking for duplicates

duplicate = []
non_duplicate = []
for rows in android_data:
    app_name = rows[0]
    if app_name in non_duplicate:
        duplicate.append(rows)
    else:
        non_duplicate.append(app_name)

print('Android Data:\n')
print('Number of Duplicates: ', len(duplicate))
print('Number of non Duplicates:', len(non_duplicate))

# Removing duplicates: map each app name to its highest number of reviews.
# The counts are converted to int because comparing them as strings would
# order them alphabetically ('999' > '1000')

dictionary = {}

for rows in android_data:
    app_name = rows[0]
    n_reviews = int(rows[3])
    if app_name in dictionary and n_reviews > dictionary[app_name]:
        dictionary[app_name] = n_reviews
    elif app_name not in dictionary:
        dictionary[app_name] = n_reviews

# Transferring the rows from android_data to android_clean, using the dictionary
# to keep only the row with the highest number of reviews for each app

android_clean = []
ignore = []

for rows in android_data:
    app_name = rows[0]
    n_reviews = int(rows[3])
    if app_name not in ignore and n_reviews == dictionary[app_name]:
        android_clean.append(rows)
        ignore.append(app_name)

print('Number of rows in the updated list for the android data: ',len(android_clean))

# Verifying the elimination of duplicates

print('\nDuplicates Verification:\n')
duplicate = []
non_duplicate = []
for rows in android_clean:
    app_name = rows[0]
    if app_name in non_duplicate:
        duplicate.append(rows)
    else:
        non_duplicate.append(app_name)

print('Number of Duplicates: ', len(duplicate))
print('Number of non Duplicates:', len(non_duplicate))

Android Data:

Number of Duplicates:  1181
Number of non Duplicates: 9659
Number of rows in the updated list for the android data:  9659

Duplicates Verification:

Number of Duplicates:  0
Number of non Duplicates: 9659
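
The two cells above follow the same pattern, so the logic could be folded into one reusable function. The following is a sketch rather than the notebook's original code; `remove_duplicates` and its parameters are hypothetical names. It compares review counts as integers, since as strings `'999' > '1000'` evaluates to True.

```python
def remove_duplicates(rows, name_index, reviews_index):
    """Keep one row per app name: the one with the most reviews.

    Review counts are compared as integers, and the first row wins a tie.
    """
    best = {}  # app name -> row with the highest review count seen so far
    for row in rows:
        name = row[name_index]
        reviews = int(row[reviews_index])
        if name not in best or reviews > int(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

# Toy rows for illustration only: [name, reviews]
sample = [['A', '999'], ['B', '5'], ['A', '1000'], ['B', '5']]
print(remove_duplicates(sample, 0, 1))
```

With the notebook's data, the calls would be `remove_duplicates(apple_data, 1, 5)` and `remove_duplicates(android_data, 0, 3)`. The result is ordered by each name's first appearance.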


## Removing Non-English Apps

In the previous step, we removed the duplicate app entries from both data sets. We use English for the apps we develop at our company, and we'd like to analyze only the apps aimed at an English-speaking audience. However, if we explore the data long enough, we'll find that both data sets contain apps whose names suggest they are not aimed at an English-speaking audience.

In [128]:
print(apple_data[813][1])
print(apple_data[6731][1])


爱奇艺PPS -《欢乐颂2》电视剧热播
【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜


We're not interested in keeping these apps, so we'll remove them. One way to go about this is to remove each app whose name contains a symbol that is not commonly used in English text. English text usually includes letters from the English alphabet, numbers composed of digits from 0 to 9, punctuation marks (., !, ?, ;), and other symbols (+, *, /).

The numbers corresponding to the characters we commonly use in an English text are all in the range 0 to 127, according to the ASCII (American Standard Code for Information Interchange) system. Based on this number range, we can build a function that detects whether a character belongs to the set of common English characters or not. If the number is equal to or less than 127, then the character belongs to the set of common English characters.
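
To make the 0 to 127 range concrete, the short sketch below uses Python's built-in `ord()` to look up the number behind a few characters. The helper name `code_point_report` is hypothetical and exists only for this illustration.

```python
def code_point_report(text):
    """Return (character, code point, is_ascii) for each character in text."""
    return [(ch, ord(ch), ord(ch) <= 127) for ch in text]

for ch, code, is_ascii in code_point_report('aA5+™爱😜'):
    print(repr(ch), code, 'ASCII' if is_ascii else 'non-ASCII')
```

The letters, the digit, and the plus sign fall inside the ASCII range (for example, 'a' is 97), while '™' (8482), '爱' (29233), and '😜' (128540) fall outside it.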

We will build a function that returns True if every character in the app name is an ASCII character, and False as soon as it finds a non-ASCII character.

In [156]:
def is_english(string):
    for letter in string:
        if ord(letter) > 127:
            return False
    return True


We can see below that the function works. However, it classifies '😜' and '™' as non-English characters because their corresponding numbers are greater than 127.

In [162]:
print('Function Test:\n')
print('Instagram ==> ',is_english('Instagram'))
print('爱奇艺PPS -《欢乐颂2》电视剧热播 ==> ',is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print('Docs To Go™ Free Office Suite ==> ',is_english('Docs To Go™ Free Office Suite'))
print('Instachat 😜 ==> ',is_english('Instachat 😜'))


Function Test:

Instagram ==>  True
爱奇艺PPS -《欢乐颂2》电视剧热播 ==>  False
Docs To Go™ Free Office Suite ==>  False
Instachat 😜 ==>  False


If we're going to use the function we've created, we'll lose useful data since many English apps will be incorrectly labeled as non-English. To minimize the impact of data loss, we'll only remove an app if its name has more than three characters with corresponding numbers falling outside the ASCII range. This means all English apps with up to three emoji or other special characters will still be labeled as English. Our filter function is still not perfect, but it should be fairly effective.

In [186]:
def is_english(string):
    i = 0
    for letter in string:
        if ord(letter) > 127:
            i +=1
            if i > 3:
                return False
    return True

In [182]:
print('Function Test:\n')
print('Instagram ==> ',is_english('Instagram'))
print('爱奇艺PPS -《欢乐颂2》电视剧热播 ==> ',is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print('Docs To Go™ Free Office Suite ==> ',is_english('Docs To Go™ Free Office Suite'))
print('Instachat 😜 ==> ',is_english('Instachat 😜'))

Function Test:

Instagram ==>  True
爱奇艺PPS -《欢乐颂2》电视剧热播 ==>  False
Docs To Go™ Free Office Suite ==>  True
Instachat 😜 ==>  True


As we can see above, the function now identifies apps with English names correctly, even when a name contains a few special characters.
We will now apply this function to filter out the apps with non-English names from our data sets.

In [187]:
### Apple Data ###

apple_english = []
apple_non_english = []

for rows in apple_clean:
    app_name = rows[1]
    if is_english(app_name):
        apple_english.append(rows)
    else:
        apple_non_english.append(rows)

print('Number of Apps in the App Store Dataset with an English Name: ',len(apple_english))
print('Number of Apps in the App Store Dataset with a Non-English Name: ',len(apple_non_english))
    

Number of Apps in the App Store Dataset with an English Name:  6181
Number of Apps in the App Store Dataset with a Non-English Name:  1014


In [188]:
### Android Data ###

android_english = []
android_non_english = []

for rows in android_clean:
    app_name = rows[0]
    if is_english(app_name):
        android_english.append(rows)
    else:
        android_non_english.append(rows)

print('Number of Apps in the Google Play Store Dataset with an English Name: ',len(android_english))
print('Number of Apps in the Google Play Store Dataset with a Non-English Name: ',len(android_non_english))
    

Number of Apps in the Google Play Store Dataset with an English Name:  9614
Number of Apps in the Google Play Store Dataset with a Non-English Name:  45


We are now left with 9614 Android apps and 6181 iOS apps.

## Isolating the Free Apps

So far in the data cleaning process, we:

- Removed inaccurate data
- Removed duplicate app entries
- Removed non-English apps

As we mentioned in the introduction, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads. Our data sets contain both free and non-free apps; we'll need to isolate only the free apps for our analysis.

Isolating the free apps will be our last step in the data cleaning process. In the next section, we're going to start analyzing the data.

In [196]:
### Apple Data ###

apple_free = []
apple_paid = []

for rows in apple_english:
    price = float(rows[4])
    if price == 0:
        apple_free.append(rows)
    else:
        apple_paid.append(rows)

print('Number of Free Apps in the App Store Dataset: ',len(apple_free))
print('Number of Paid Apps in the App Store Dataset: ',len(apple_paid))

Number of Free Apps in the App Store Dataset:  3220
Number of Paid Apps in the App Store Dataset:  2961


In [208]:
### Android Data ###

android_free = []
android_paid = []

for rows in android_english:
    price = rows[7]
    if price == '0':
        android_free.append(rows)
    else:
        android_paid.append(rows)

print('Number of Free Apps in the Google Play Store Dataset: ',len(android_free))
print('Number of Paid Apps in the Google Play Store Dataset: ',len(android_paid))



Number of Free Apps in the Google Play Store Dataset:  8862
Number of Paid Apps in the Google Play Store Dataset:  752
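
The two cells above test the price in different ways: a float comparison for the App Store and a string comparison with '0' for Google Play. A single helper could cover both. The sketch below assumes that paid prices may carry a leading currency symbol such as '$4.99'; that format is an assumption and does not appear in the rows printed earlier. The name `is_free` is hypothetical.

```python
def is_free(price_text):
    """Return True when a price string represents zero.

    Assumes prices look like '0', '0.0', or '$4.99' (the '$' form is an
    assumption, not something shown in the notebook's printed rows).
    """
    return float(price_text.strip().lstrip('$')) == 0.0

for raw in ['0', '0.0', '$4.99', '2.99']:
    print(raw, '->', is_free(raw))
```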


So far, we have spent a good amount of time cleaning the data. We:

- Removed inaccurate data
- Removed duplicate app entries
- Removed non-English apps
- Isolated the free apps

We are left with 8862 apps from Google Play and 3220 apps from the App Store.

## Strategy

To minimize risks and overhead, our validation strategy for an app idea comprises three steps:

- Build a minimal Android version of the app, and add it to Google Play.
- If the app has a good response from users, we develop it further.
- If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

Because our end goal is to add the app on both Google Play and the App Store, we need to find app profiles that are successful on both markets. For instance, a profile that works well for both markets might be a productivity app that makes use of gamification.

## Most Common Genres

Let's begin the analysis by getting a sense of which genres are most common in each market. For this, we'll need to build frequency tables for a few columns in our data sets.

We'll build two functions we can use to analyze the frequency tables:

- One function to generate frequency tables that show percentages
- Another function we can use to display the percentages in a descending order

In [246]:
# Function that creates a dictionary containing the genre and the percentage

def freq_table(dataset, index):
    dictionary = {}
    for rows in dataset:
        column = rows[index]
        if column in dictionary:
            dictionary[column] += ( 1 / len(dataset) )* 100
        else:
            dictionary[column] = ( 1 / len(dataset) ) * 100
    return dictionary

# Function that arranges the values in a list of tuples in order to sort them

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
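
As an alternative sketch (not the notebook's original code), the same tables can be built with `collections.Counter`. Counting first and converting to percentages once avoids the small floating-point drift that comes from adding `1 / len(dataset)` repeatedly. The names `freq_table_pct` and `display_table_pct` are hypothetical.

```python
from collections import Counter

def freq_table_pct(dataset, index):
    """Map each value in column `index` to its share of rows, in percent."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {value: count / total * 100 for value, count in counts.items()}

def display_table_pct(dataset, index):
    """Print the percentages from highest to lowest."""
    table = freq_table_pct(dataset, index)
    for value, pct in sorted(table.items(), key=lambda item: item[1], reverse=True):
        print(f'{value} : {pct:.2f}')

# Toy rows for illustration only: [name, genre]
sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
display_table_pct(sample, 1)
```

Values that tie keep the order in which they first appear, which differs slightly from the tuple sort used above.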


Now we will check the output of the functions that we have created.

In [248]:
print('Apple Store Results:\n')
display_table(apple_free,-5)
print('\nGoogle Play Store Results:\n')
display_table(android_free,1)


Apple Store Results:

Games : 58.13664596273457
Entertainment : 7.888198757764019
Photo & Video : 4.968944099378889
Education : 3.66459627329192
Social Networking : 3.2919254658385046
Shopping : 2.6086956521739095
Utilities : 2.5155279503105556
Sports : 2.14285714285714
Music : 2.0496894409937862
Health & Fitness : 2.0186335403726683
Productivity : 1.7391304347826066
Lifestyle : 1.5838509316770168
News : 1.3354037267080732
Travel : 1.2422360248447193
Finance : 1.1180124223602474
Weather : 0.8695652173913038
Food & Drink : 0.8074534161490678
Reference : 0.5590062111801242
Business : 0.5279503105590062
Book : 0.43478260869565216
Navigation : 0.18633540372670807
Medical : 0.18633540372670807
Catalogs : 0.12422360248447205

Google Play Store Results:

FAMILY : 18.9347777025507
GAME : 9.69307154141295
TOOLS : 8.451816745655742
BUSINESS : 4.592642744301517
LIFESTYLE : 3.904310539381615
PRODUCTIVITY : 3.893026404874732
FINANCE : 3.7011961182577164
MEDICAL : 3.5206499661475843
SPORTS : 3.39652

From the results above, we can see that among the free English apps on the App Store, 58% are games, followed by entertainment apps at about 8% and photo and video apps at about 5%. This means that the majority of these apps are designed for entertainment, and that relatively few are dedicated to productivity, education, or health.

On Google Play, on the other hand, about 19% of the apps are family apps, followed by games at about 10%, tools at 8%, business at 5%, and lifestyle at 4%. The percentages here are more evenly spread, and no single genre dominates, with the partial exception of family and games (even those two make up only about 29% of the total, compared to about 66% for the top two genres on the App Store). We can also notice that the apps on Google Play lean more toward productivity, lifestyle, and education. This can be confirmed by checking the 'Genres' column in the Google Play data set, which is much more granular than the 'Category' column.
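
As a quick arithmetic check of the combined share of the top two genres in each store, the sketch below adds up the percentages from the two frequency tables above, rounded to two decimals. The variable names are hypothetical.

```python
# Percentages copied from the frequency tables above, rounded to two decimals.
app_store_top2 = {'Games': 58.14, 'Entertainment': 7.89}
google_play_top2 = {'FAMILY': 18.93, 'GAME': 9.69}

app_store_share = round(sum(app_store_top2.values()), 1)
google_play_share = round(sum(google_play_top2.values()), 1)

print('App Store top two  :', app_store_share)
print('Google Play top two:', google_play_share)
```

The top two genres cover roughly two thirds of the free App Store apps but under a third of the free Google Play apps.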

In [249]:
print('\nGoogle Play Store Results:\n')
display_table(android_free,-4)



Google Play Store Results:

Tools : 8.440532611148859
Entertainment : 6.070864364703282
Education : 5.348679756262725
Business : 4.592642744301517
Productivity : 3.893026404874732
Lifestyle : 3.893026404874732
Finance : 3.7011961182577164
Medical : 3.5206499661475843
Sports : 3.464229293613168
Personalization : 3.3175355450236856
Communication : 3.238546603475503
Action : 3.1031369893929037
Health & Fitness : 3.080568720379137
Photography : 2.945159106296538
News & Magazines : 2.7984653577070557
Social : 2.6630557436244566
Travel & Local : 2.324531708417959
Shopping : 2.245542766869776
Books & Reference : 2.1439855563078267
Simulation : 2.0424283457458774
Dating : 1.861882193635745
Arcade : 1.8505980591288618
Video Players & Editors : 1.771609117580679
Casual : 1.7490408485669124
Maps & Navigation : 1.3992326788535314
Food & Drink : 1.2412547957571658
Puzzle : 1.1284134506883332
Racing : 0.9930038366057341
Role Playing : 0.9365831640713173
Libraries & Demo : 0.9365831640713173
Auto & 


However, a dominance of a certain genre does not mean that this genre has the most users.


## Most Popular Apps by Genre on the App Store


The frequency tables we analyzed in the previous section showed us that the App Store is dominated by apps designed for fun, while Google Play shows a more balanced landscape of both practical and fun apps. Now, we'd like to get an idea about the kind of apps with the most users.

One way to find out which genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the Installs column, but this information is missing for the App Store data set. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the rating_count_tot column.

Let's start with calculating the average number of user ratings per app genre on the App Store. To do that, we'll need to:

- Isolate the apps of each genre.
- Sum up the user ratings for the apps of that genre.
- Divide the sum by the number of apps belonging to that genre (not by the total number of apps).

In [274]:
genres = freq_table(apple_free, -5)
tuple_list = []
for genre in genres:
    total = 0        # sum of the rating counts in this genre
    len_genre = 0    # number of apps in this genre
    for rows in apple_free:
        n_ratings = int(rows[5])
        genre_app = rows[-5]
        if genre == genre_app:
            total = total + n_ratings
            len_genre = len_genre + 1
    y = (total / len_genre, genre)
    tuple_list.append(y)
    
# Sort the results

table_sorted = sorted(tuple_list, reverse = True)
for entry in table_sorted:
    print(entry[1], ':', entry[0])

Navigation : 86090.33333333333
Reference : 74942.11111111111
Social Networking : 71548.34905660378
Music : 57326.530303030304
Weather : 52279.892857142855
Book : 39758.5
Food & Drink : 33333.92307692308
Finance : 31467.944444444445
Photo & Video : 28441.54375
Travel : 28243.8
Shopping : 26919.690476190477
Health & Fitness : 23298.015384615384
Sports : 23008.898550724636
Games : 22812.903311965812
News : 21248.023255813954
Productivity : 21028.410714285714
Utilities : 18684.456790123455
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Business : 7491.117647058823
Education : 7003.983050847458
Catalogs : 4004.0
Medical : 612.0
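
The nested loop above scans the whole data set once per genre. As a sketch of an alternative, the three steps can be done in a single pass by keeping a running total and a running count per genre. The helper name `average_by_group` is hypothetical.

```python
from collections import defaultdict

def average_by_group(rows, group_index, value_index):
    """Return {group: mean of int(row[value_index])} computed in one pass."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_index]
        totals[group] += int(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Toy rows for illustration only: [name, ratings, genre]
sample = [['a', '100', 'Games'], ['b', '300', 'Games'], ['c', '50', 'Music']]
averages = average_by_group(sample, 2, 1)
for genre, avg in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', avg)
```

On the notebook's data, the equivalent call would be `average_by_group(apple_free, -5, 5)`.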


As we can see from the table above, the average number of ratings per app is highest in the Navigation genre, with close to 86 thousand ratings per app, followed by Reference at 75 thousand, Social Networking at 72 thousand, Music at 57 thousand, and Weather at 52 thousand.

To verify our findings, we need to check whether these averages are skewed by a few very popular apps, so we look at the number of ratings of each app in the top genres.

In [278]:
for rows in apple_free:
    if rows[-5] == 'Navigation':
        print(rows[1],':',rows[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


In the Navigation genre, Waze and Google Maps account for about 97% of all ratings, which suggests that this genre is very hard to enter.
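
One way to make this skew visible is to compare the mean with the median, since the median is much less sensitive to a few very large values. The sketch below uses the six rating counts printed above together with Python's standard `statistics` module.

```python
from statistics import mean, median

# Rating counts of the six free Navigation apps listed above.
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

nav_mean = mean(navigation_ratings)
nav_median = median(navigation_ratings)

print('mean  :', nav_mean)
print('median:', nav_median)
```

The mean is about 86 thousand, matching the genre average computed earlier, while the median is 8196.5, roughly a tenth of the mean.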

In [280]:
for rows in apple_free:
    if rows[-5] == 'Reference':
        print(rows[1],':',rows[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


In the Reference genre, the Bible and Dictionary.com apps account for most of the ratings. The remaining apps are much closer to one another, each with fewer than 30 thousand ratings, which means that this might be a good market to compete in.

In [282]:
for rows in apple_free:
    if rows[-5] == 'Social Networking':
        print(rows[1],':',rows[5])

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2

The Social Networking genre is dominated by big players such as Facebook and Pinterest. However, we could compete with the less popular Social Networking apps, which have up to about 30 thousand ratings.

In [285]:
for rows in apple_free:
    if rows[-5] == 'Music':
        print(rows[1],':',rows[5])

Pandora - Music & Radio : 1126879
Spotify Music : 878563
Shazam - Discover music, artists, videos & lyrics : 402925
iHeartRadio – Free Music & Radio Stations : 293228
SoundCloud - Music & Audio : 135744
Magic Piano by Smule : 131695
Smule Sing! : 119316
TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110420
Amazon Music : 106235
SoundHound Song Search & Music Player : 82602
Sonos Controller : 48905
Bandsintown Concerts : 30845
Karaoke - Sing Karaoke, Unlimited Songs! : 28606
My Mixtapez Music : 26286
Sing Karaoke Songs Unlimited with StarMaker : 26227
Ringtones for iPhone & Ringtone Maker : 25403
Musi - Unlimited Music For YouTube : 25193
AutoRap by Smule : 18202
Spinrilla - Mixtapes For Free : 15053
Napster - Top Music & Radio : 14268
edjing Mix:DJ turntable to remix and scratch music : 13580
Free Music - MP3 Streamer & Playlist Manager Pro : 13443
Free Piano app by Yokee : 13016
Google Play Music : 10118
Certified Mixtapes - Hip Hop Albums & Mixtapes : 9975
TIDAL : 7398
YouTube Mu

The Music genre has a good number of apps with up to about 80 thousand ratings, and these apps vary in utility. This might be a good market to step into.

Since the App Store is dominated by apps designed for fun and games, we can consider making an app in the Reference genre that is both educational and fun. This app might contain educational material that is gamified to attract users.

## Most Popular Apps by Genre on Google Play

In the previous part, we came up with an app profile recommendation for the App Store based on the number of user ratings. We have data about the number of installs for the Google Play market, so we should be able to get a clearer picture of genre popularity. However, the install numbers don't seem precise enough: most values are open-ended (100+, 1,000+, 5,000+, etc.).

For instance, we don't know whether an app with 100,000+ installs has 100,000 installs, 200,000, or 350,000. However, we don't need very precise data for our purposes: we only want to find out which app genres attract the most users.

We're going to leave the numbers as they are, which means that we'll consider that an app with 100,000+ installs has 100,000 installs, an app with 1,000,000+ installs has 1,000,000 installs, and so on. To perform computations, however, we'll need to convert each install number from a string to a number. This means we need to remove the commas and the plus characters, otherwise the conversion will fail and raise an error.
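
The conversion described above can be sketched as a small helper. The name `parse_installs` is hypothetical, and the function assumes install values look like '10,000+' or '0'.

```python
def parse_installs(text):
    """Convert a string such as '1,000,000+' to the integer 1000000."""
    return int(text.replace(',', '').replace('+', ''))

for raw in ['100+', '10,000+', '1,000,000,000+', '0']:
    print(raw, '->', parse_installs(raw))
```

Calling `int('10,000+')` directly raises a `ValueError`, which is why the commas and the plus sign are removed first.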

In [317]:
categories = freq_table(android_free, 1)
tuple_list = []
for genre in categories:
    total = 0           # sum of the installs in this category
    len_category = 0    # number of apps in this category
    for rows in android_free:
        installs = rows[5]
        genre_app = rows[1]
        if genre == genre_app:
            # Remove the commas and the plus sign before converting
            installs = installs.replace(',','')
            installs = installs.replace('+','')
            total = total + int(installs)
            len_category = len_category + 1
    y = (total / len_category, genre)
    tuple_list.append(y)
    
# Sort the results

table_sorted = sorted(tuple_list, reverse = True)
for entry in table_sorted:
    print(entry[1], ':', entry[0])

COMMUNICATION : 38456119.167247385
VIDEO_PLAYERS : 24727872.452830188
SOCIAL : 23253652.127118643
PHOTOGRAPHY : 17805627.643678162
PRODUCTIVITY : 16787331.344927534
GAME : 15560965.599534342
TRAVEL_AND_LOCAL : 13984077.710144928
ENTERTAINMENT : 11640705.88235294
TOOLS : 10682301.033377837
NEWS_AND_MAGAZINES : 9549178.467741935
BOOKS_AND_REFERENCE : 8767811.894736841
SHOPPING : 7036877.311557789
PERSONALIZATION : 5201482.6122448975
WEATHER : 5074486.197183099
HEALTH_AND_FITNESS : 4188821.9853479853
MAPS_AND_NAVIGATION : 4056941.7741935486
FAMILY : 3694276.334922527
SPORTS : 3638640.1428571427
ART_AND_DESIGN : 1986335.0877192982
FOOD_AND_DRINK : 1924897.7363636363
EDUCATION : 1820673.076923077
BUSINESS : 1712290.1474201474
LIFESTYLE : 1437816.2687861272
FINANCE : 1387692.475609756
HOUSE_AND_HOME : 1331540.5616438356
DATING : 854028.8303030303
COMICS : 817657.2727272727
AUTO_AND_VEHICLES : 647317.8170731707
LIBRARIES_AND_DEMO : 638503.734939759
PARENTING : 542603.6206896552
BEAUTY : 51315

As we can see from the table above, the average number of installs per app is highest in the Communication category, with over 38 million installs per app, followed by Video Players with about 25 million, Social with about 23 million, Photography with about 18 million, and Productivity with about 17 million.

To verify our findings, we need to check whether these averages are skewed by a few very popular apps, so we look at the number of installs of the most installed apps inside these top categories.
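One rough way to gauge skew, sketched here under assumptions, is to compare the mean with the median for a category: a mean far above the median suggests that a few very large apps pull the average up. The helper name `summarize_installs` and the sample rows are hypothetical; the sketch only assumes that the category sits in column 1 and the install count in column 5, as in `android_free`.

```python
from statistics import mean, median


def summarize_installs(rows, category, category_idx=1, installs_idx=5):
    """Return (mean, median, count) of installs for one category.

    Hypothetical helper; raises statistics.StatisticsError if the
    category has no rows.
    """
    installs = [
        int(row[installs_idx].replace(',', '').replace('+', ''))
        for row in rows
        if row[category_idx] == category
    ]
    return mean(installs), median(installs), len(installs)


# Made-up rows for illustration only (not taken from the data set).
sample_rows = [
    ['App A', 'DEMO', '', '', '', '1,000,000,000+'],
    ['App B', 'DEMO', '', '', '', '10,000+'],
    ['App C', 'DEMO', '', '', '', '1,000+'],
]

avg, mid, count = summarize_installs(sample_rows, 'DEMO')
print('mean:', avg, 'median:', mid, 'apps:', count)
```

On the real data, the same helper could be called as `summarize_installs(android_free, 'COMMUNICATION')`.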

In [322]:
for rows in android_free:
    if rows[1] == 'COMMUNICATION' and rows[5] in ('1,000,000,000+',
                                                  '500,000,000+',
                                                  '100,000,000+'):
        print(rows[0], ':', rows[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

In [323]:
for rows in android_free:
    if rows[1] == 'VIDEO_PLAYERS' and rows[5] in ('1,000,000,000+',
                                                  '500,000,000+',
                                                  '100,000,000+'):
        print(rows[0], ':', rows[5])

YouTube : 1,000,000,000+
Motorola Gallery : 100,000,000+
VLC for Android : 100,000,000+
Google Play Movies & TV : 1,000,000,000+
MX Player : 500,000,000+
Dubsmash : 100,000,000+
VivaVideo - Video Editor & Photo Movie : 100,000,000+
VideoShow-Video Editor, Video Maker, Beauty Camera : 100,000,000+
Motorola FM Radio : 100,000,000+


In [325]:
for rows in android_free:
    if rows[1] == 'SOCIAL' and rows[5] in ('1,000,000,000+',
                                           '500,000,000+',
                                           '100,000,000+'):
        print(rows[0], ':', rows[5])

Facebook : 1,000,000,000+
Facebook Lite : 500,000,000+
Tumblr : 100,000,000+
Pinterest : 100,000,000+
Google+ : 1,000,000,000+
Badoo - Free Chat & Dating App : 100,000,000+
Tango - Live Video Broadcast : 100,000,000+
Instagram : 1,000,000,000+
Snapchat : 500,000,000+
LinkedIn : 100,000,000+
Tik Tok - including musical.ly : 100,000,000+
BIGO LIVE - Live Stream : 100,000,000+
VK : 100,000,000+


In [327]:
for rows in android_free:
    if rows[1] == 'PHOTOGRAPHY' and rows[5] in ('1,000,000,000+',
                                                '500,000,000+',
                                                '100,000,000+'):
        print(rows[0], ':', rows[5])

B612 - Beauty & Filter Camera : 100,000,000+
YouCam Makeup - Magic Selfie Makeovers : 100,000,000+
Sweet Selfie - selfie camera, beauty cam, photo edit : 100,000,000+
Google Photos : 1,000,000,000+
Retrica : 100,000,000+
Photo Editor Pro : 100,000,000+
BeautyPlus - Easy Photo Editor & Selfie Camera : 100,000,000+
PicsArt Photo Studio: Collage Maker & Pic Editor : 100,000,000+
Photo Collage Editor : 100,000,000+
Z Camera - Photo Editor, Beauty Selfie, Collage : 100,000,000+
PhotoGrid: Video & Pic Collage Maker, Photo Editor : 100,000,000+
Candy Camera - selfie, beauty camera, photo editor : 100,000,000+
YouCam Perfect - Selfie Photo Editor : 100,000,000+
Camera360: Selfie Photo Editor with Funny Sticker : 100,000,000+
S Photo Editor - Collage Maker , Photo Collage : 100,000,000+
AR effect : 100,000,000+
Cymera Camera- Photo Editor, Filter,Collage,Layout : 100,000,000+
LINE Camera - Photo editor : 100,000,000+
Photo Editor Collage Maker Pro : 100,000,000+


In [328]:
for rows in android_free:
    if rows[1] == 'PRODUCTIVITY' and rows[5] in ('1,000,000,000+',
                                                 '500,000,000+',
                                                 '100,000,000+'):
        print(rows[0], ':', rows[5])

Microsoft Word : 500,000,000+
Microsoft Outlook : 100,000,000+
Microsoft OneDrive : 100,000,000+
Microsoft OneNote : 100,000,000+
Google Keep : 100,000,000+
ES File Explorer File Manager : 100,000,000+
Dropbox : 500,000,000+
Google Docs : 100,000,000+
Microsoft PowerPoint : 100,000,000+
Samsung Notes : 100,000,000+
SwiftKey Keyboard : 100,000,000+
Google Drive : 1,000,000,000+
Adobe Acrobat Reader : 100,000,000+
Google Sheets : 100,000,000+
Microsoft Excel : 100,000,000+
WPS Office - Word, Docs, PDF, Note, Slide & Sheet : 100,000,000+
Google Slides : 100,000,000+
ColorNote Notepad Notes : 100,000,000+
Evernote – Organizer, Planner for Notes & Memos : 100,000,000+
Google Calendar : 500,000,000+
Cloud Print : 500,000,000+
CamScanner - Phone PDF Creator : 100,000,000+


As we can see from the output above, the Communication category is dominated by big names such as WhatsApp and Messenger, and the most popular apps in the Productivity category are mostly developed by Google and Microsoft. In the Photography category, however, the popular apps are not dominated by a few big names, so the market does not look saturated and we should be able to enter it.
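The five cells above apply the same filter to different categories. As a sketch, that filter could be wrapped in one reusable helper. The function name `popular_apps` and the sample rows are hypothetical, and the numeric threshold assumes that `'100,000,000+'`, `'500,000,000+'`, and `'1,000,000,000+'` are the only install buckets at or above 100 million.

```python
def popular_apps(rows, category, min_installs=100_000_000,
                 name_idx=0, category_idx=1, installs_idx=5):
    """Return (name, installs) pairs for apps in `category` whose
    install count is at least `min_installs`.

    Hypothetical helper; column positions default to those used for
    android_free in the cells above.
    """
    result = []
    for row in rows:
        if row[category_idx] != category:
            continue
        installs = int(row[installs_idx].replace(',', '').replace('+', ''))
        if installs >= min_installs:
            result.append((row[name_idx], installs))
    return result


# Made-up rows for illustration only (not taken from the data set).
sample_rows = [
    ['Big App', 'DEMO', '', '', '', '500,000,000+'],
    ['Small App', 'DEMO', '', '', '', '50,000+'],
    ['Other App', 'OTHER', '', '', '', '1,000,000,000+'],
]

for name, installs in popular_apps(sample_rows, 'DEMO'):
    print(name, ':', installs)
```

With the real data, a call such as `popular_apps(android_free, 'PHOTOGRAPHY')` would stand in for one of the repeated cells, and `len(...)` of the result gives a quick count of very popular apps per category.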

## Conclusion

After analyzing both the App Store and Google Play, we concluded that, because the two stores differ, we should develop an application that suits both. For the App Store, the app should be educational and fun, drawing on books or reference material. For Google Play, it should also be related to Photography to attract users there. The app could therefore combine learning and photography: for example, it might offer several games in which the user is asked to photograph an item and guess its name in a foreign language that the user wants to learn. We think that this type of application could be successful in the free apps market, since it would appeal to users on both stores.