In [1]:
from csv import reader

# Profitability of Applications: App Store and Google Play Markets

For this project, our aim is to determine profitable mobile app profiles. As the title suggests, we will look at applications in the Google Play Store and the App Store. This analysis lets us reach a data-driven conclusion that will help developers create an application that generates the most revenue.


With the majority of applications on these markets being free, the main source of revenue consists of in-app ads. This means the revenue generated from free applications is determined by the number of users. Therefore, *our goal* is to determine which *free* applications generate the most users. 

# Part I: Reading in the Data

The first step in our exploratory journey is to read in the data. We'll be using existing data for both markets.

-  The Google Play Store Data, which contains approximately ten thousand apps, is available to download from [this link](http://dq-content.s3.amazonaws.com/350/googleplaystore.csv). More details on how the data was obtained can be found [here](http://www.kaggle.com/lava18/google-play-store-apps).

-  The App Store Data, which contains approximately seven thousand apps, is available to download from [this link](http://dq-content.s3.amazonaws.com/350/AppleStore.csv). More details on how the data was obtained can be found [here](http://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps).


In [2]:
# App Store data
# encoding='utf8' is set explicitly because app names contain non-ASCII text
open_file = open('AppleStore.csv', encoding='utf8')
read_file = reader(open_file)
apple = list(read_file)
open_file.close()
apple_header = apple[0]
apple = apple[1:]

# Google Play Store data
open_file = open('googleplaystore.csv', encoding='utf8')
read_file = reader(open_file)
android = list(read_file)
open_file.close()
android_header = android[0]
android = android[1:]

To simplify things, we will create a function `explore_data()` that lets us explore the rows of our data in a more accessible way. It also prints the total number of rows and columns for any data set.

In [3]:
def explore_data(dataset, start, end, rows_and_columns=True):
    """
    Display rows of any dataset.
    start and end are the slice indices of the rows to show.
    Prints the total row and column counts when rows_and_columns is True.
    """
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [29]:
print(android_header)
print()
explore_data(android, 0, 2)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


Number of rows: 10840
Number of columns: 13


From the above, we see that our `android` dataset has 13 columns and 10,840 rows. The raw file holds 10,841 rows; the count shown here reflects the removal of one faulty row in Part II. More importantly, since we are trying to determine which _free_ applications generate the most revenue, the columns `'App'`, `'Category'`, `'Reviews'`, `'Installs'`, `'Type'`, `'Price'`, and `'Genres'` will aid our analysis.

Let's now look at the _iOS_ applications.

In [5]:
print(apple_header)
print()
explore_data(apple, 0, 2)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows: 7197
Number of columns: 16


In a similar manner, we notice that our `apple` dataset has 7197 rows and 16 columns. In particular, key information lies in the `'track_name', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', and 'prime_genre'`  columns. More information about the column names can be found [here](http://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home).

# Part II: Deleting Wrong Data

An error within our Google Play Store data has been brought to our attention. The error is contained in row 10472. Let's print this row and attempt to fix the error. 

In [6]:
print(android[10472])  # incorrect row
print('\n')
print(android_header)  # header
print('\n')
print(android[0])      # correct row

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


Although not immediately obvious, the error lies in the _Life Made WI-Fi Touchscreen Photo Frame_ row. The row has only 12 values against 13 column names: the `Category` value is missing, so every later value is shifted one column to the left. Based on the documentation found [here](http://www.kaggle.com/lava18/google-play-store-apps), the third column (index 2) holds a rating out of 5, yet in this row it reads 19. We'll therefore remove the row. The code below prints the row before deleting it to make sure the correct one is removed.

In [7]:
#delete row with data error
print(android[10472])
del android[10472]

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


In [8]:
#updated row
print(android[10472])
print()
print(len(android))

['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']

10840
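The faulty row above was reported to us, but a similar problem can be searched for directly. The sketch below is a minimal example on toy data, assuming that a malformed row shows up as a row whose number of values differs from the header (as was the case for row 10472). The helper name `find_malformed_rows` and the sample rows are hypothetical.

```python
# Toy header and rows; the second row is missing its 'Category' value,
# which mirrors the kind of error found in row 10472.
header = ['App', 'Category', 'Rating']
rows = [
    ['Instagram', 'SOCIAL', '4.5'],
    ['Photo Frame', '1.9'],
    ['Box', 'BUSINESS', '4.2'],
]

def find_malformed_rows(dataset, header):
    """Return the indices of rows whose length differs from the header."""
    expected = len(header)
    return [i for i, row in enumerate(dataset) if len(row) != expected]

bad = find_malformed_rows(rows, header)
print(bad)
```

If several indices are returned, deleting them from the highest index to the lowest avoids shifting the positions of rows that still need to be removed.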


# Part III: Removing Duplicate Entries

Exploring our Google Play Store Data, we will find that several applications have more than one entry. For example, the popular social media app, Instagram, has four entries. 

In [9]:
for app in android:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


Generalizing the above code, we can determine the total number of duplicate apps.

In [10]:
duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of Duplicate apps:', len(duplicate_apps))
print()
print('Examples of Duplicate apps:', duplicate_apps[:7])

Number of Duplicate apps: 1181

Examples of Duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits']


Running the code above, we see that our `android` data has 1,181 duplicate entries. Examples include 'Google My Business' and 'Box'.

This is a problem. We don't want to count any application more than once, since doing so would distort our results. We also don't want to remove duplicates at random.



One solution is to keep the row with the highest number of reviews (the fourth column, index 3). Why? The higher the number of reviews, the more reliable the ratings will be.

To do this, we will:

- Create a python dictionary where each key is a unique app name, and the value is the highest number of reviews of that app

- Use the dictionary to create a new data set with one entry per app, keeping for each app the row with the highest number of reviews

We begin by building the dictionary.

In [11]:

reviews_max = {} 

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

Now that the `reviews_max` dictionary is built, we can use it to remove the duplicate entries and create a new list `android_clean` containing no duplicates. Remember, we are only keeping the entries with the highest number of reviews. 

In [12]:
android_clean = []
already_added = []

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)
        
explore_data(android_clean,0,3)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


Our cleaned data has 9,659 rows and 13 columns. Earlier, we found 1,181 duplicate entries in the Google Play Store data, which held 10,840 rows once the faulty row was removed. Below, we check that the length of our dictionary equals the length of our data set minus 1,181.

In [30]:
print('Expected length:', len(android) - 1181)
print('Actual length:', len(reviews_max))
print('Both are equal as expected')

Expected length: 9659
Actual length: 9659
Both are equal as expected
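As a possible alternative to the two-step approach above, the deduplication can be sketched in a single pass by storing the best row per app name directly. This is only an illustration on toy data: the function name `dedupe_by_max_reviews` is hypothetical, the review count for 'Box' is made up, and the resulting row order follows the first appearance of each name, which may differ from the order produced by the cells above.

```python
def dedupe_by_max_reviews(dataset, name_index, reviews_index):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}  # app name -> row with the most reviews seen so far
    for row in dataset:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

# Toy rows: name, category, rating, reviews
sample = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
]
cleaned = dedupe_by_max_reviews(sample, 0, 3)
print(cleaned)
```

Looking names up in a dictionary also avoids the repeated list scans that `name not in already_added` performs on a list.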


# Part IV: Removing Non-English Apps

There are no language requirements for applications on either the Google Play Store or App Store. As a result, we can find a multitude of applications designed for non-English speakers. Below, we print a few examples of such cases.

In [32]:
print(apple[813][1])
print(apple[6731][1])

print(android_clean[4412][0])
print(android_clean[7940][0])

爱奇艺PPS -《欢乐颂2》电视剧热播
【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜
中国語 AQリスニング
لعبة تقدر تربح DZ


Our next step is to remove each application whose name contains non-English text symbols. 

To do this, we'll take advantage of the ASCII standard. English text is encoded in this standard and each character has a corresponding number between 0 and 127 associated with it. We can then build a function that filters each app name and determines whether it contains non-ASCII characters. 
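To make the 0 to 127 range concrete, the short sketch below prints the code point that Python's built-in `ord()` reports for a few sample characters and whether each falls inside the ASCII range. The sample characters are chosen only for illustration.

```python
# Letters and digits fall inside the ASCII range (0-127);
# the trademark sign, the Chinese character and the emoji do not.
samples = ['a', 'Z', '9', '™', '爱', '😜']

for character in samples:
    code_point = ord(character)
    print(repr(character), code_point, code_point <= 127)
```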

The function `check` below does this for us and we run sample cases to determine its reliability. 

In [34]:
tst_string1 = 'Instagram'
tst_string2 = '爱奇艺PPS -《欢乐颂2》电视剧热播'
tst_string3 = 'Docs To Go™ Free Office Suite'
tst_string4 = 'Instachat 😜'

def check(a_string):
    """
    Determine whether a string is likely to be English.
    Returns False if the string contains more than 3 non-ASCII characters,
    True otherwise.
    """
    non_ascii = 0

    for character in a_string:
        if ord(character) > 127:
            non_ascii += 1

    return non_ascii <= 3

In [36]:
print(check(tst_string1))
print(check(tst_string2))
print(check(tst_string3))
print(check(tst_string4))

True
False
True
True


The function works as we need it to. Note the threshold in the function: the condition `non_ascii > 3` means we only remove an application if its name has more than three non-ASCII characters. This keeps English names that contain a symbol or an emoji, such as 'Docs To Go™ Free Office Suite' and 'Instachat 😜'. We use our `check()` function below to filter out the non-English apps in both the Google Play Store and App Store data.

In [16]:
eng_ios = []
eng_android = []

def eng_data(data_set, c_index, a_list):
    """
    Using the check() function above, keeps only the apps whose name
    (found at column index c_index) is English and appends them to a_list.
    """
    for app in data_set:
        name = app[c_index]
        if check(name):
            a_list.append(app)

eng_data(android_clean, 0, eng_android )
eng_data(apple, 1, eng_ios)

In [17]:
explore_data(eng_android,0,3)
print()
explore_data(eng_ios,0,3)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', '

We see that we are now left with 9,614 Google Play apps and 6,183 iOS apps.

# Part V: Isolating the Free Apps

Recall that our goal is to determine which free apps generate the most revenue. Both of our data sets still contain a mixture of paid and free apps. The code below isolates the free applications in both data sets.

In [41]:
android_final = []
ios_final = []

for app in eng_android:
    price = app[7]
    if price == '0':
        android_final.append(app)
        
for app in eng_ios:
    price = app[4]
    if price == '0.0':
        ios_final.append(app)
        
print('Number of free Android apps:',len(android_final))
print('Number of free iOS apps:', len(ios_final))

Number of free Android apps: 8864
Number of free iOS apps: 3222
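The filter above compares price strings directly (`'0'` for Google Play and `'0.0'` for the App Store), which works only as long as free apps are always written exactly that way. A slightly more tolerant check can be sketched by converting the price to a number first. The helper `is_free` is hypothetical, and the handling of a leading `$` is an assumption about how paid prices may be formatted.

```python
def is_free(price):
    """Return True when a price string such as '0', '0.0' or '$0.00' is zero."""
    cleaned = price.strip().replace('$', '')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Unparseable values are treated as not free in this sketch.
        return False

for value in ['0', '0.0', '$4.99', '0.00', 'Everyone']:
    print(value, is_free(value))
```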


# Part VI: Finding the Most Common Apps by Genre

Now that we have filtered our data down to free applications, the next step is to determine which genres attract the most users. The more users an app has, the more ad revenue it generates.

To do this, we first need to figure out which app profiles are successful on both markets. 

We begin by building a frequency table for the `prime_genre` column of the App Store data. We will do the same with the Google Play Store data using the `Genres` and `Category` columns. 

However, to analyze the frequency tables we will need:

- A function to generate frequency tables that show percentages

- A function that we can use to display the percentages in descending order

In [20]:
#creates frequency table in percentages
def freq_table(dataset, index):
    table = {}
    table_percentages = {}
    
    for row in dataset:
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
            
    for value in table:
        percentage = (table[value] / len(dataset)) * 100
        table_percentages[value] = percentage 
    return table_percentages

#displays sorted frequency table in descending order
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
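For comparison, the same kind of sorted percentage table can be sketched with `collections.Counter` from the standard library. This is an illustrative variant on toy data, not a replacement for the functions above; the name `freq_table_counter` is hypothetical, and ties may be ordered differently than in `display_table()`.

```python
from collections import Counter

def freq_table_counter(dataset, index):
    """Return (value, percentage) pairs sorted from most to least common."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return [(value, count / total * 100) for value, count in counts.most_common()]

# Toy rows: app name, genre
sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
for genre, percentage in freq_table_counter(sample, 1):
    print(genre, ':', round(percentage, 2))
```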

In [23]:
display_table(ios_final, 11) #ios prime_genre

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


Analyzing the table above, we see that the most common genre is 'Games', which accounts for 58.16% of the free English apps. 'Entertainment' follows with roughly 8%, then 'Photo & Video' and 'Education' with about 5% and 3.66%, respectively.

Based on the above data, we can conclude, among the free English apps, the App Store is home to applications designed for fun. Moreover, practical applications aren't as popular compared to gaming apps. 

While fun apps are among the most common, that does not mean they hold the greatest number of users.

Let's take a look at the Genre and Category columns for the Google Play Store. 

In [42]:
display_table(android_final, 1) #android category

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

Taking a look at our table, the most common category in the Google Play data set is Family with 19%. As in the App Store, Games make up a significant portion near the top with 9.7%. Compared to the App Store, Google Play has a larger share of practical applications.

In [43]:
display_table(android_final, 9)#android, genres

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

The frequency table for Google Play Store apps using the Genre column is displayed above. Notice that the level of detail or granularity of the data is much higher compared to the Category frequency table. However, we will only continue working with the Category column. 

From our observations, we know the App Store contains more applications designed for fun, while the Google Play store contains practical applications.  

# Part VII: Most Popular Apps by Genre - App Store

Our next task is to determine which applications have the most users. To do this, we would like to calculate the average number of installs for each genre. However, there is one problem: the App Store data does not have an `Installs` column like the Google Play Store data. As a workaround, we will use the total number of user ratings, stored in `rating_count_tot`, as a proxy.

In [55]:
#calculates the avg # of user ratings per app genre on
#App store
genre_ios = freq_table(ios_final, -5)

for genre in genre_ios:
    tot = 0
    len_genre = 0
    for app in ios_final:
        genre_app = app[-5]
        if genre_app == genre:
            num_ratings = float(app[5])
            tot += num_ratings
            len_genre += 1
    avg = tot/len_genre
    print(genre, ':', avg)

Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0
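The averages above are printed in no particular order, and the nested loop scans the whole data set once per genre. As a sketch of an alternative, the per-genre averages can be accumulated in a single pass and then sorted before printing. The function name `average_by_group` and the toy rows are hypothetical.

```python
def average_by_group(dataset, group_index, value_index):
    """Return {group: mean of the value column}, computed in one pass."""
    totals = {}
    counts = {}
    for row in dataset:
        group = row[group_index]
        totals[group] = totals.get(group, 0.0) + float(row[value_index])
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}

# Toy rows: genre, number of ratings
sample = [
    ['Games', '100'],
    ['Games', '300'],
    ['Music', '50'],
]
averages = average_by_group(sample, 0, 1)

# Print from the highest average to the lowest.
for genre, avg in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', avg)
```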


In [None]:
#more details 

# Part VIII: Most Popular Apps by Genre - Google Play

We want to determine genre popularity, and the `Installs` column gives us the data to do so. However, the install counts are not precise. As shown below, the values are open-ended buckets rather than exact figures: we can't tell whether an application listed as 1,000,000+ has exactly 1 million installs or 1,234,432. For our purposes precise numbers aren't needed, so we will leave the values as they are and treat an application with 1,000,000+ installs as having exactly 1,000,000.

In [28]:
display_table(android_final, 5) #installs columns

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


In order to perform calculations, we need to convert each install value from a string into a `float`. This is readily done by removing the comma and plus characters. The loop below does this and also computes the average number of installs for each category (as before).
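The string conversion described above can be checked in isolation on a few sample values before it is applied to the whole data set. The helper name `parse_installs` is hypothetical; it performs the same two `replace()` calls followed by `float()`.

```python
def parse_installs(installs):
    """Convert a string such as '1,000,000+' to the float 1000000.0."""
    return float(installs.replace(',', '').replace('+', ''))

for value in ['1,000,000+', '500+', '0+', '0']:
    print(value, '->', parse_installs(value))
```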

In [29]:
categories_android = freq_table(android_final, 1)
for category in categories_android:
    tot = 0.
    len_category = 0
    for app in android_final:
        category_app = app[1]
        if category_app == category:
            num_installs = app[5]
            num_installs = num_installs.replace(',', '')
            num_installs = num_installs.replace('+', '')
            tot += float(num_installs)
            len_category += 1
    avg = tot/len_category
    print(category, ':', avg)

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_

Looking down the list, we see that communication applications have the most installs, averaging 38,456,119 per app.

# Conclusions
In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

We concluded that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets. The markets are already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.