# App Market Analysis for Maximizing User Engagement

### Introduction
This project focuses on analyzing mobile app market data to identify the types of free apps that attract the most users. Since the primary revenue source is through in-app advertisements, understanding user preferences and market trends is crucial. The goal is to leverage this analysis to guide app development strategies, ensuring that the apps created have the highest potential for user engagement and revenue generation.

### Data Cleaning and Exploratory Data Analysis
Given the vast number of apps available as of September 2018—approximately 2 million on the App Store and 2.1 million on Google Play—analyzing data for over 4 million apps is impractical due to the significant time and cost involved. Instead, we will focus on a representative sample of the data. To optimize our resources, we have identified two existing datasets that are well-suited for our analysis:

- A dataset of approximately 10,000 Android apps from Google Play, collected in August 2018.
- A dataset of approximately 7,000 iOS apps from the App Store, collected in July 2017.

These datasets provide a manageable yet comprehensive sample for conducting exploratory data analysis (EDA). The EDA will allow us to uncover patterns, identify trends, and gain insights into the types of apps that are most likely to attract users. By analyzing this data, we aim to guide our app development strategy toward maximizing user engagement and revenue generation.

In [1]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [2]:
from csv import reader

# The App Store data
with open('AppleStore.csv', encoding='utf8') as opened_file:
    ios = list(reader(opened_file))
ios_header = ios[0]
ios = ios[1:]

# Google Play data
with open('googleplaystore.csv', encoding='utf8') as opened_file:
    android = list(reader(opened_file))
android_header = android[0]
android = android[1:]

In [3]:
explore_data(ios, 1, 5, True)
print(ios_header)

['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 7197
Number of columns: 16
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


#### <center> iOS Apps Column Description

| Column Name | Description |
| :--- | ---: |
| 'id' | App ID |
| 'track_name' | Application name |
| 'size_bytes' | Memory size (in bytes) |
| 'currency' | Currency type |
| 'price' | Price amount |
| 'rating_count_tot' | User rating count (for all versions) |
| 'rating_count_ver' | User rating count (for the current version) |
| 'user_rating' | Average user rating (for all versions) |
| 'user_rating_ver' | Average user rating (for the current version) |
| 'ver' | Latest version code |
| 'cont_rating' | Content rating |
| 'prime_genre' | Primary genre |
| 'sup_devices.num' | Number of supported devices |
| 'ipadSc_urls.num' | Number of screenshots shown for display |
| 'lang.num' | Number of supported languages |
| 'vpp_lic' | VPP device-based licensing enabled |

<center> Source: https://www.kaggle.com/datasets/ramamet4/app-store-apple-data-set-10k-apps

This dataset includes 7,197 iOS apps. The columns that appear to be most useful are 'track_name', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', and 'prime_genre'.

In [4]:
explore_data(android, 1, 5, True)
print(android_header)

['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 10841
Number of columns: 13
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


#### <center> Android Apps Column Description
    
| Column Name | Description |
| :--- | ---: |
| 'App' | Application name |
| 'Category' | Category the app belongs to |
| 'Rating' | Overall user rating of the app (when scraped) |
| 'Reviews' | Number of user reviews (when scraped) |
| 'Size' | Size of the app (when scraped) |
| 'Installs' | Number of user downloads/installs for the app (when scraped) |
| 'Type' | Paid or Free |
| 'Price' | Price of the app (when scraped) |
| 'Content Rating' | Age group the app is targeted at - Children / Mature 21+ / Adult |
| 'Genres' | An app can belong to multiple genres (apart from its main category) |
| 'Last Updated' | N/A |
| 'Current Ver' | N/A |
| 'Android Ver' | N/A |
    
<center> Source: https://www.kaggle.com/datasets/lava18/google-play-store-apps  

The Google Play dataset contains 10,841 apps and 13 columns. The most relevant columns for our analysis appear to be 'App', 'Category', 'Reviews', 'Installs', 'Type', 'Price', and 'Genres'.

#### Wrong Data

In [5]:
len(android)

10841

In [6]:
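# Row 10472 has no 'Category' value, so every later field is shifted one column to the left
print(android_header)
print(android[10472])

# Delete the malformed row (run this cell only once, or a valid row will be removed)
del android[10472]
len(android)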

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


10840
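
The malformed row above was located by its index. A minimal sketch of how such rows could be found automatically, assuming that a valid row has exactly as many fields as the header (the helper name `find_malformed_rows` and the sample rows are hypothetical, not part of the original analysis):

```python
def find_malformed_rows(header, rows):
    """Return (index, row) pairs whose field count differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]


# Tiny illustrative sample, not the real dataset
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],            # 'Category' is missing, so the row is short
    ['App C', 'GAME', '4.5'],
]

malformed = find_malformed_rows(sample_header, sample_rows)
print(malformed)
```

On the real data this would be called as `find_malformed_rows(android_header, android)`. A length check only catches rows with missing or extra fields; a row with the right number of fields but wrong values would still need a separate check.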

#### Duplicate Data

In [7]:
duplicate_android_apps = []
unique_android_apps = []

for app in android:
    name = app[0]
    if name in unique_android_apps:
        duplicate_android_apps.append(name)
    else:
        unique_android_apps.append(name)

print('Number of duplicate android apps:', len(duplicate_android_apps))
print('\n')
print('Examples of duplicate android apps:', duplicate_android_apps[:10])

Number of duplicate android apps: 1181


Examples of duplicate android apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']


In [8]:
duplicate_ios_apps = []
unique_ios_apps = []

for app in ios:
    app_id = app[0]  # index 0 is the 'id' column, so this compares app IDs, not names
    if app_id in unique_ios_apps:
        duplicate_ios_apps.append(app_id)
    else:
        unique_ios_apps.append(app_id)

print('Number of duplicate ios apps:', len(duplicate_ios_apps))
print('\n')
print('Examples of duplicate ios apps:', duplicate_ios_apps[:10])

Number of duplicate ios apps: 0


Examples of duplicate ios apps: []


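The Google Play dataset contains 1,181 duplicate entries, for apps such as 'Quick PDF Scanner + OCR FREE', 'Google My Business', and 'ZOOM Cloud Meetings'. The iOS dataset, by contrast, has no duplicate app IDs. For each duplicated Android app, we keep only the entry with the highest number of reviews, on the assumption that it is the most recent record.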

In [9]:
#Removing duplicates
reviews_max = {}
for element in android:
    name = element[0]
    n_reviews = float(element[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

len(reviews_max)

9659

In [10]:
android_clean = []
already_added = []
for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)

print(android_clean[0:4])
print('\n')
print(already_added[0:4])

[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'], ['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']]


['Photo Editor & Candy Camera & Grid & ScrapBook', 'U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'Sketch - Draw & Paint', 'Pixel Draw - Number Art Coloring Book']


In [11]:
len(android_clean)

9659
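
The two cells above rely on `name not in already_added`, a list lookup that gets slower as the list grows. A possible single-pass alternative, sketched under the assumption that column 0 holds the app name and column 3 the review count (the function name `dedupe_by_max_reviews` and the sample rows are hypothetical):

```python
def dedupe_by_max_reviews(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}  # app name -> row with the most reviews seen so far
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' keeps the earliest row when review counts tie
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Illustrative rows only; the review counts are made up
sample = [
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Slack', 'BUSINESS', '4.4', '51507'],
    ['Box', 'BUSINESS', '4.2', '159904'],
]
deduped = dedupe_by_max_reviews(sample)
print(len(deduped))
```

The result is ordered by each name's first appearance, which may differ from the order of `android_clean`, although the set of kept rows should be the same.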

#### Addressing Non-English Apps

In this project, the goal is to focus on analyzing apps intended for an English-speaking audience. Since we're only interested in apps that are relevant to this audience, it's important to filter out any non-English apps from the datasets. One way to do this is by checking the app names for characters that aren't commonly used in English. In English, characters typically fall within a certain range in the ASCII system, from 0 to 127. If an app name contains characters outside this range, it likely means the app isn't designed for English speakers. We'll use this method to identify and remove non-English apps, ensuring that our analysis stays focused on the target audience.

In [12]:
def character_id_test(string):
    for character in string:
        if ord(character) > 127:
            return False
    return True

##test
print(character_id_test('Instachat 😜'))
print("The ord value for 😜 in Instachat 😜 is:", ord('😜'))
print(character_id_test('Instagram'))
print(character_id_test('Docs To Go™'))

False
The ord value for 😜 in Instachat 😜 is: 128540
True
False


The function initially designed to detect non-English app names ran into issues when it mislabeled some English apps, like 'Docs To Go™ Free Office Suite' and 'Instachat 😜,' as non-English. This happened because special characters like ™ and emojis are outside the standard ASCII range, leading the function to mistakenly exclude these apps. To avoid losing valuable data, the function was adjusted so that it only removes apps if their names contain more than three characters outside the ASCII range. This way, English apps with a few special characters or emojis are still correctly recognized as English.

In [13]:
#Change the character_id_test function
def character_id(string):
    non_ascii = 0
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    if non_ascii > 3:
        return False
    else:
        return True

##test
print(character_id('Instachat 😜'))
print(character_id('Instagram'))
print(character_id('Docs To Go™'))

True
True
True
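
The threshold is a heuristic, so it is worth checking a few names by hand. The sketch below restates the same rule in compact form (the name `is_probably_english` and its `max_non_ascii` parameter are hypothetical) and shows that a short non-English name can still pass. The second name appears in the iOS data; the third is an illustrative string, not an app from either dataset:

```python
def is_probably_english(name, max_non_ascii=3):
    """Return True when at most `max_non_ascii` characters fall outside ASCII."""
    non_ascii = sum(1 for character in name if ord(character) > 127)
    return non_ascii <= max_non_ascii


print(is_probably_english('Docs To Go™'))   # one non-ASCII character
print(is_probably_english('教えて!goo'))     # three non-ASCII characters
print(is_probably_english('こんにちは'))      # five non-ASCII characters
```

Names like the second one slip through, so the filter reduces the number of non-English apps rather than eliminating them.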


In [14]:
## Now that we have a function to identify apps with English names, apply it to both datasets

android_en = []
ios_en = []

for app in android_clean:
    name = app[0]
    if character_id(name):
        android_en.append(app)

for app in ios:
    name = app[1]
    if character_id(name):
        ios_en.append(app)

print("Cleaned Android Data For Apps With English Names:")
print(android_en[0:5])
print('\n')
print("Cleaned IOS Data For Apps With English Names:")
print(ios_en[0:5])

Cleaned Android Data For Apps With English Names:
[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'], ['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up'], ['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']]


Cleaned IOS Data For Apps With English Name

#### Identifying Free Apps

Since the company focuses on building free apps with revenue generated from in-app ads, the analysis will require isolating only the free apps from the datasets, as they currently include both free and paid apps.

In [15]:
android_final = []
ios_final = []

for app in android_en:
    name = app[0]
    price = app[7]
    if price == '0' or price == '0.0':
        android_final.append(app)
        
for app in ios_en:
    name = app[1]
    price = app[4]
    if price == '0' or price == '0.0':
        ios_final.append(app)

print('Final Cleaned Android App Data:')
explore_data(android_final, 1, 5, True)
print('\n')
print('Final Cleaned iOS App Data:')
explore_data(ios_final, 1, 5, True)

Final Cleaned Android App Data:
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of rows: 8864
Number of columns: 13


Final Cleaned iOS App Data:
['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans'

### Data Analysis

The goal is to figure out which types of apps are most likely to attract a large number of users, as this directly impacts revenue. To keep risks low and costs manageable, the strategy for validating a new app idea involves three steps: 
 - Create a minimal Android version and release it on Google Play
 - If users respond well, continue developing the app
 - If the app is profitable after six months, develop an iOS version and release it on the App Store. 

Since the ultimate goal is to have the app succeed on both platforms, it's important to identify app profiles that perform well in both markets. For example, a productivity app with gamification features might do well on both Google Play and the App Store. To start, the analysis will look at the most common genres in each market by building frequency tables for key columns in the datasets.

#### Identifying Most Common Genres

In [16]:
def freq_table(dataset, index):
    frequency_table = {}
    for row in dataset:
        value = row[index]
        if value in frequency_table:
            frequency_table[value] += 1
        else:
            frequency_table[value] = 1

    # Compute the total once instead of on every loop iteration
    total = sum(frequency_table.values())
    freq_percentages = {}
    for key in frequency_table:
        freq_percentages[key] = (frequency_table[key] / total) * 100

    return frequency_table, freq_percentages

In [17]:
def display_table(dataset, index):
    frequency_table, freq_percentages = freq_table(dataset, index)
    table_display = []
    for key in freq_percentages:
        key_val_as_tuple = (freq_percentages[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

In [18]:
print("Frequency Table For iOS 'prime_genre' column:")
display_table(ios_final, 11)

Frequency Table For iOS 'prime_genre' column:
Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


In the App Store dataset, "Games" are by far the most common genre, making up over 58% of all apps. The next most common genre is "Entertainment," but it’s much less frequent at around 8%. This shows that the App Store is largely geared towards entertainment, especially gaming, with fewer apps focused on practical uses like education, shopping, or productivity.

Given the heavy focus on games, it might seem like a good idea to develop a gaming app for the App Store. However, the high number of games also means a lot of competition, which could make it hard for new apps to gain traction. Just because a genre has a lot of apps doesn’t necessarily mean that every app in that genre will attract a lot of users.

In [19]:
print("Frequency Table For Android 'Category' column:")
display_table(android_final, 1)

print("Frequency Table For Android 'Genres' column:")
display_table(android_final, 9)

Frequency Table For Android 'Category' column:
FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
P

In the Google Play data, the most common categories are "Family" and "Games," with "Tools" also being significant. The Genres column reflects this diversity, showing a mix of practical and entertainment apps, such as tools, entertainment, and education. Compared to the App Store, Google Play has a wider range of app types, with a balance between practical and entertainment-focused apps. This suggests that the Android market offers more opportunities in different areas beyond just games, including tools and business apps.

When comparing the Genres and Category columns in the Google Play data, the patterns align, with both highlighting a variety of app types available. Unlike the App Store, where entertainment dominates, Google Play has a more even mix. Based on this, focusing on a utility or productivity app for Google Play could be a good strategy, as these areas seem less crowded compared to games on the App Store. However, it’s important to note that these frequency tables show the most common genres, not necessarily the ones with the most users, so further analysis would be needed to find out which genres actually attract the largest audiences.

#### Identifying Apps With Most Users

The frequency tables showed that the App Store is mostly filled with fun apps, while Google Play has a more balanced mix of practical and entertainment apps. Now, to figure out which types of apps have the most users, we’ll calculate the average number of installs for each genre. For Google Play, we can use the Installs column, but since the App Store data doesn’t include install numbers, we’ll use the total number of user ratings as a stand-in for popularity.

##### Most Popular Apps By Genre on iOS (using average number of user ratings)

In [20]:
frequency_table, freq_percentages = freq_table(ios_final, 11)
for genre in freq_percentages:
    total = 0
    len_genre = 0
    for app in ios_final:
        genre_app = app[11]
        if genre_app == genre:
            ratings = float(app[5])
            total += ratings
            len_genre += 1
    avg_n_ratings = total/len_genre  # average number of ratings, not the average rating score
    print(genre,':',avg_n_ratings)

Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0


The results show that Navigation, Reference, and Social Networking apps have the highest average number of user ratings, with 86,090, 74,942, and 71,548 respectively.

This suggests that users tend to engage more with apps in these categories, likely because they’re useful in everyday life. Social networking apps are popular because they help people stay connected, while Reference apps are handy for quick information. Navigation apps are essential for travel and getting around, which explains why they’re so frequently used and rated.

For developers, this information is helpful in identifying which types of apps are most likely to attract a large number of users. If the goal is to develop an app that will have high user interaction, focusing on these genres might be a good strategy. However, it’s also worth noting that a high number of ratings could mean there’s a lot of competition in these areas.

In [21]:
for app in ios_final:
    if app[11] == 'Navigation':
        print(app[1], ':', app[5]) # print name and number of ratings

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5
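
An average can be pulled upward by one or two very large apps. As a quick check, the sketch below compares the mean and median of the six Navigation rating counts printed above, using Python's standard `statistics` module (the variable names are illustrative):

```python
from statistics import mean, median

# Rating counts copied from the Navigation output above
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

nav_mean = mean(navigation_ratings)
nav_median = median(navigation_ratings)

print('Mean:', nav_mean)
print('Median:', nav_median)
```

The mean is roughly ten times the median, which suggests that Waze and Google Maps account for most of the genre's ratings and that the typical Navigation app is far less popular than the average implies.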


In [22]:
for app in ios_final:
    if app[11] == 'Social Networking':
        print(app[1], ':', app[5])

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2

In [23]:
for app in ios_final:
    if app[11] == 'Reference':
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


##### Most Popular Apps By Category on Android (using average number of installs)

In [24]:
display_table(android_final, 5)

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


The app profile recommendation for the App Store was based on the number of user ratings, but for Google Play, the number of installs provides a more direct view of genre popularity. The install numbers are not precise, since most values are open-ended (e.g., 100+, 1,000+, 5,000+), but they are still useful. To identify which genres attract the most users, we'll assume that an app with 100,000+ installs has exactly 100,000 installs, and so on. This gives a rough but consistent picture of which app genres are the most popular on Google Play.

In [25]:
frequency_table, freq_percentages = freq_table(android_final, 1)
for category in freq_percentages:
    total = 0
    len_category = 0
    for app in android_final:
        category_app = app[1]
        if category_app == category:
            installs = app[5]
            installs = installs.replace(',', '')
            installs = installs.replace('+', '')
            total += float(installs)
            len_category += 1
    avg_installs = total/len_category
    print(category,':',avg_installs)

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_
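
The averages above are printed in dictionary order, which makes the ranking hard to read. A small sketch of sorting them from highest to lowest, using a handful of values copied from the output (the dictionary name `avg_installs_by_category` is hypothetical; in the notebook the averages are only printed, not stored):

```python
# A few category averages copied from the output above
avg_installs_by_category = {
    'GAME': 15588015.603248259,
    'SOCIAL': 23253652.127118643,
    'PHOTOGRAPHY': 17840110.40229885,
    'COMMUNICATION': 38456119.167247385,
    'VIDEO_PLAYERS': 24727872.452830188,
    'PRODUCTIVITY': 16787331.344927534,
}

# Sort (category, average) pairs by the average, largest first
ranked = sorted(avg_installs_by_category.items(),
                key=lambda item: item[1], reverse=True)

for category, avg in ranked:
    print(f'{category}: {avg:,.0f}')
```

Storing the averages in a dictionary inside the loop, instead of printing them directly, would allow the full set of categories to be ranked the same way.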

The analysis of average installs for different app categories on Google Play highlights some interesting patterns in user interest. "Communication," "Video Players," and "Social" apps are among the most popular, with "Communication" apps averaging over 38 million installs, "Video Players" around 24.7 million, and "Social" apps over 23 million. These categories are popular because they meet everyday needs—people use communication and social apps to stay connected, and video players for entertainment.

While "Games" are also widely installed, averaging about 15.6 million installs, they aren't the top category on Google Play by installs, even though games are by far the most common genre on the App Store. Categories like "Photography" and "Productivity" have an even stronger following, with average installs of about 17.8 million and 16.8 million, respectively. This suggests that users on Google Play are not only interested in entertainment but also in apps that help them be productive or creative.

Based on these findings, a good app idea for Google Play might be one that falls into the "Productivity" or "Photography" categories. These types of apps attract a lot of users and could also do well on the App Store. For example, a productivity app that includes creative features like photo editing or project management tools could appeal to a wide audience on both platforms, combining the practicality that users value with the creative features that they enjoy.

##### Most Popular Apps By Genre on Android (using average user rating)

In [26]:
# Use a new name for the counts so the freq_table function is not overwritten
genre_counts, genre_freq_table = freq_table(android_final, 9)

In [30]:
genre_ratings = {}

for row in android_final:
    genre = row[9]  #genre index 9
    rating = float(row[2])  # rating index 2; a missing rating parses to nan and propagates into the genre total
    
    if genre in genre_ratings:
        genre_ratings[genre]['total_ratings'] += rating
        genre_ratings[genre]['count'] += 1
    else:
        genre_ratings[genre] = {'total_ratings': rating, 'count': 1}

# average rating for each genre
avg_genre_ratings = {}

for genre, values in genre_ratings.items():
    avg_rating = values['total_ratings'] / values['count']
    avg_genre_ratings[genre] = avg_rating
    print(genre, ':', avg_rating)

Art & Design : nan
Art & Design;Creativity : 4.35
Auto & Vehicles : nan
Beauty : nan
Books & Reference : nan
Business : nan
Comics : nan
Comics;Creativity : 4.8
Communication : nan
Dating : nan
Education : nan
Education;Creativity : 4.375
Education;Education : 4.303333333333332
Education;Pretend Play : 4.1
Education;Brain Games : 4.433333333333334
Entertainment : nan
Entertainment;Brain Games : 4.3
Entertainment;Creativity : 4.533333333333333
Entertainment;Music & Video : 4.180000000000001
Events : nan
Finance : nan
Food & Drink : nan
Health & Fitness : nan
House & Home : nan
Libraries & Demo : nan
Lifestyle : nan
Lifestyle;Pretend Play : 4.0
Card : nan
Arcade : nan
Puzzle : nan
Racing : nan
Sports : nan
Casual : nan
Simulation : nan
Adventure : nan
Trivia : nan
Action : nan
Word : nan
Role Playing : nan
Strategy : nan
Board : nan
Music : nan
Action;Action & Adventure : 4.288888888888888
Casual;Brain Games : 4.475
Educational;Creativity : 4.2
Puzzle;Brain Games : 4.333333333333332
Educ

The average user ratings for different genres in the Google Play dataset show that users tend to give higher ratings to apps that are creative or educational. For example, genres like "Art & Design;Creativity," "Educational;Brain Games," and "Simulation;Education" all have average ratings above 4.3, indicating that people really enjoy apps that combine learning with creative activities. Similarly, apps that mix entertainment with action, such as "Sports;Action & Adventure" and "Strategy;Action & Adventure," also receive high ratings, often above 4.5.

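On the other hand, many practical categories like "Business," "Finance," and "Productivity" are shown as "nan" in the results. This does not mean that these categories are empty or that users are less engaged with them: a single app with a missing rating is enough to turn the whole genre's average into nan. The missing ratings would need to be excluded before these genres can be compared with the others.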
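
One way to obtain usable averages for the genres reported as `nan` is to skip ratings that parse to nan before averaging. A minimal sketch, assuming missing ratings are stored as a string that `float()` converts to nan (the function name `avg_rating_skip_missing` and the sample rows are hypothetical):

```python
import math


def avg_rating_skip_missing(rows, genre_index, rating_index):
    """Average rating per genre, ignoring ratings that parse to nan."""
    totals = {}  # genre -> [sum of ratings, number of rated apps]
    for row in rows:
        rating = float(row[rating_index])
        if math.isnan(rating):
            continue  # skip apps without a rating
        entry = totals.setdefault(row[genre_index], [0.0, 0])
        entry[0] += rating
        entry[1] += 1
    return {genre: total / count for genre, (total, count) in totals.items()}


# Illustrative rows: name, genre, rating
sample = [
    ['App A', 'Business', '4.0'],
    ['App B', 'Business', 'NaN'],
    ['App C', 'Business', '4.5'],
    ['App D', 'Finance', 'NaN'],
]
averages = avg_rating_skip_missing(sample, 1, 2)
print(averages)
```

A genre with no rated apps at all, such as "Finance" in the sample, is left out of the result instead of being reported as nan. On the real data the call would be `avg_rating_skip_missing(android_final, 9, 2)`.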

Overall, it seems that apps focused on creativity, education, and interactive entertainment are particularly well-liked by users. These types of apps could be good areas to explore for developing new apps that people will enjoy and keep using, which matters for a business model built on in-app advertising.

### Conclusion
This project gave us useful insights into the App Store and Google Play markets. We found that the App Store is mostly dominated by games, while Google Play has a more varied mix of popular app types, including communication, productivity, and photography apps.

Based on these findings, a good strategy would be to develop an app that blends practical features with creative or entertainment elements. For example, a productivity app with photo editing or project management features could do well on both platforms.

Overall, understanding these trends can help guide app development to better meet user preferences and increase the chances of success in both markets.

In [31]:
def avg_rating_per_genre(dataset, genre_index, rating_index):
    genre_ratings = {}
    
    for row in dataset:
        genre = row[genre_index]
        rating = float(row[rating_index])
        
        if genre in genre_ratings:
            genre_ratings[genre].append(rating)
        else:
            genre_ratings[genre] = [rating]
    
    avg_ratings = {}
    for genre, ratings in genre_ratings.items():
        avg_ratings[genre] = sum(ratings) / len(ratings)
    
    return avg_ratings

# Calculate the average rating per genre
avg_genre_ratings = avg_rating_per_genre(android_final, 9, 2)  # 9 is the 'Genre' column index, 2 is the 'Rating' column index

# Print the average ratings for each genre
for genre, avg_rating in avg_genre_ratings.items():
    print(f"{genre}: {avg_rating}")


Art & Design: nan
Art & Design;Creativity: 4.35
Auto & Vehicles: nan
Beauty: nan
Books & Reference: nan
Business: nan
Comics: nan
Comics;Creativity: 4.8
Communication: nan
Dating: nan
Education: nan
Education;Creativity: 4.375
Education;Education: 4.303333333333332
Education;Pretend Play: 4.1
Education;Brain Games: 4.433333333333334
Entertainment: nan
Entertainment;Brain Games: 4.3
Entertainment;Creativity: 4.533333333333333
Entertainment;Music & Video: 4.180000000000001
Events: nan
Finance: nan
Food & Drink: nan
Health & Fitness: nan
House & Home: nan
Libraries & Demo: nan
Lifestyle: nan
Lifestyle;Pretend Play: 4.0
Card: nan
Arcade: nan
Puzzle: nan
Racing: nan
Sports: nan
Casual: nan
Simulation: nan
Adventure: nan
Trivia: nan
Action: nan
Word: nan
Role Playing: nan
Strategy: nan
Board: nan
Music: nan
Action;Action & Adventure: 4.288888888888888
Casual;Brain Games: 4.475
Educational;Creativity: 4.2
Puzzle;Brain Games: 4.333333333333332
Educational;Education: nan
Casual;Pretend Play: 4.