# Profitable App Profiles for the App Store and Google Play Markets

In this project, I am working as a data analyst for a company that develops free-to-download mobile apps available on both Google Play and the App Store. The company's primary revenue stream comes from in-app advertisements, making user engagement a critical factor for financial success.

By analyzing mobile app data, I aim to uncover insights into which types of apps attract the most users and drive higher engagement. The goal is to provide the developers with data-driven recommendations on the kinds of apps that are likely to achieve stronger user engagement. By identifying key characteristics of popular apps, the company can better align its development efforts with market demand and increase revenue through more effective ad exposure.


## Opening and Exploring the Data


To save on time and resources in collecting data for the millions of mobile apps, I will be using relevant existing data from Kaggle at no cost.

The two datasets I will be using are:

-   **[Google Play Store Apps](https://www.kaggle.com/datasets/lava18/google-play-store-apps):** Contains data on approximately ten thousand Android apps from Google Play.

-   **[Mobile App Store](https://www.kaggle.com/datasets/ramamet4/app-store-apple-data-set-10k-apps):** Contains data on approximately seven thousand iOS apps from the App Store.


My first task is to open each dataset and save it to its own variable so that I can explore it.


### Import the `pandas` library


To start, I'll import the `pandas` library:


In [65]:
import pandas as pd

### Open the Google Play dataset


With the `pandas` library imported, I'll use its `read_csv()` function to load the Google Play dataset as a `DataFrame` object and save it in a variable called `android`.


In [66]:
# Open the Google Play Store dataset as a DataFrame object, setting the header
android = pd.read_csv("../data/googleplaystore.csv", header=0)

### Open the App Store dataset


Now, I will repeat the above step for the App Store dataset:


In [67]:
# Open the iOS App Store dataset as a DataFrame object, setting the header and index column
ios = pd.read_csv("../data/AppleStore.csv", header=0, index_col=0)

### Create a function to help explore the datasets


To make exploring the datasets easier, I will create a function called `explore_data()` that will print the rows in a much more readable way.


In [68]:
def explore_data(dataset, row_count=5, rows_and_columns=False):
    """Display the first `row_count` rows of a DataFrame. Optionally print the number of rows and columns in the dataset."""
    # Display the first row_count rows of the dataset
    display(dataset.head(row_count))

    # Print the number of rows and columns if rows_and_columns is True
    if rows_and_columns:
        print("Number of rows: ", dataset.shape[0])
        print("Number of columns: ", dataset.shape[1])

### Use `explore_data()` to explore the datasets


With the `explore_data()` function, I'll print the first three rows from the Google Play Store dataset:


In [69]:
explore_data(android, 3, True)

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up


Number of rows:  10841
Number of columns:  13


The above output shows the header and the first three rows of the Google Play Store dataset.

The function also printed the number of rows and columns in the dataset. From this, we can see there are 10,841 apps and 13 columns in the dataset.

At a quick glance, the columns that might be useful for the purpose of my analysis are `'App'`, `'Category'`, `'Reviews'`, `'Installs'`, `'Type'`, `'Price'`, and `'Genres'`.


Now to repeat the same steps for the App Store dataset:


In [70]:
explore_data(ios, 3, True)

Unnamed: 0,id,track_name,size_bytes,currency,price,rating_count_tot,rating_count_ver,user_rating,user_rating_ver,ver,cont_rating,prime_genre,sup_devices.num,ipadSc_urls.num,lang.num,vpp_lic
1,281656475,PAC-MAN Premium,100788224,USD,3.99,21292,26,4.0,4.5,6.3.5,4+,Games,38,5,10,1
2,281796108,Evernote - stay organized,158578688,USD,0.0,161065,26,4.0,3.5,8.2.2,4+,Productivity,37,5,23,1
3,281940292,"WeatherBug - Local Weather, Radar, Maps, Alerts",100524032,USD,0.0,188583,2822,3.5,4.5,5.0.0,4+,Weather,37,5,3,1


Number of rows:  7197
Number of columns:  16


The App Store dataset has 7,197 apps and 16 columns.

The columns that look to be the most useful for this analysis are `'track_name'`, `'currency'`, `'price'`, `'rating_count_tot'`, `'rating_count_ver'`, and `'prime_genre'`.

More details about each column can be found in the data set [documentation](https://www.kaggle.com/datasets/ramamet4/app-store-apple-data-set-10k-apps).


## Data Cleaning


Now that the datasets are loaded, it's time to clean the data.

The data cleaning process will involve the following:

-   Detect inaccurate data, and correct or remove it
-   Detect duplicate data, and remove the duplicates
-   Remove non-English apps (since the company only builds apps for an English-speaking audience)
-   Remove apps that aren't free (since the company only develops free-to-download apps)


### Detecting and Deleting Inaccurate Data


In the [discussion section](https://www.kaggle.com/datasets/lava18/google-play-store-apps/discussion) for the Google Play Store dataset, there is [a discussion that outlines an error for row 10472](https://www.kaggle.com/datasets/lava18/google-play-store-apps/discussion/66015). Specifically, it states that row 10472 is missing a value under `Category`, which caused a column shift for the remaining columns.

Below, I will print the first row of the dataset and compare it to row 10472 (the row in question).


In [88]:
# Print the first row of the dataset and row 10472
android.loc[[0, 10472]]

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
10472,Life Made WI-Fi Touchscreen Photo Frame,1.9,19.0,3.0M,"1,000+",Free,0,Everyone,,"February 11, 2018",1.0.19,4.0 and up,


The above output shows that row 10472 corresponds to the app `Life Made WI-Fi Touchscreen Photo Frame`. As recorded in the dataset, it has a `Category` of `1.9` and a `Rating` of `19`. The maximum rating allowed in the Play Store, however, is 5.

Comparing the values of row 10472 with those of the first row, it appears that the `Category` value is missing from row 10472, which shifted every later value one column to the left and produced the inaccurate data (this is also mentioned in the [dataset's discussion section](https://www.kaggle.com/datasets/lava18/google-play-store-apps/discussion/66015)).
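
As a rough illustration of how a shifted row like this could be caught programmatically, the sketch below flags any row whose rating falls outside the 0 to 5 range. The toy rows and the `is_valid_rating()` helper are assumptions made up for this example; they are not part of the dataset or of the cleaning steps in this notebook.

```python
# Toy rows in the form [App, Category, Rating]; the second row is missing
# its Category, so every later value has shifted one column to the left
rows = [
    ["Photo Editor", "ART_AND_DESIGN", "4.1"],
    ["Life Made WI-Fi Touchscreen Photo Frame", "1.9", "19"],
]


def is_valid_rating(value):
    """Return True if value can be read as a number between 0 and 5."""
    try:
        rating = float(value)
    except ValueError:
        return False
    return 0 <= rating <= 5


# Collect the positions of rows whose rating is out of range
suspect = [i for i, row in enumerate(rows) if not is_valid_rating(row[2])]
print(suspect)  # [1]
```

A check like this would also flag apps whose rating is simply missing, so any flagged rows would still need a manual look before being dropped.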


The following will delete the erroneous row:


In [None]:
# Print the number of rows in the dataset before row deletion
print("Number of Rows Before Row Deletion: ", android.shape[0])
# Drop the row with the missing column
android.drop(10472, inplace=True)
# Print the number of rows in the dataset after row deletion
print("Number of Rows After Row Deletion: ", android.shape[0])

Number of Rows Before Row Deletion:  10841
Number of Rows After Row Deletion:  10840


### Detect and Remove Duplicate Data


Upon exploring the Google Play Store dataset, it appears there are duplicate entries.

For instance, the Instagram app appears in the dataset four times:


In [113]:
# Find all instances of "Instagram" in the dataset
android.loc[android["App"] == "Instagram"]

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
2545,Instagram,SOCIAL,4.5,66577313,Varies with device,"1,000,000,000+",Free,0,Teen,Social,"July 31, 2018",Varies with device,Varies with device
2604,Instagram,SOCIAL,4.5,66577446,Varies with device,"1,000,000,000+",Free,0,Teen,Social,"July 31, 2018",Varies with device,Varies with device
2611,Instagram,SOCIAL,4.5,66577313,Varies with device,"1,000,000,000+",Free,0,Teen,Social,"July 31, 2018",Varies with device,Varies with device
3909,Instagram,SOCIAL,4.5,66509917,Varies with device,"1,000,000,000+",Free,0,Teen,Social,"July 31, 2018",Varies with device,Varies with device


The above output shows four apps titled 'Instagram' with identical values in every column except `Reviews`. The discrepancy in the `Reviews` values is most likely due to the data being collected at different times.

The following counts the number of similar duplicates in the dataset:


In [None]:
# Find number of duplicate apps
num_duplicate_apps = len(android['App'])-len(android['App'].drop_duplicates())
print("Number of duplicate apps:", num_duplicate_apps, "\n")

Number of duplicate apps: 1181 



The above output shows there are 1,181 rows that repeat an app name already present elsewhere in the dataset.
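
Note that the figure above counts surplus rows, not the number of distinct apps that are repeated. The small sketch below uses `collections.Counter` on a made-up list of names (illustrative only, not taken from the dataset) to show the difference between the two counts.

```python
from collections import Counter

# Hypothetical app names: one app appears three times, another twice
names = ["Instagram", "Instagram", "Instagram", "Slack", "Slack", "Sketch"]

counts = Counter(names)

# Surplus rows: every occurrence beyond the first one for each name
extra_rows = sum(count - 1 for count in counts.values())

# Distinct apps that appear more than once
repeated_apps = sorted(name for name, count in counts.items() if count > 1)

print(extra_rows)     # 3
print(repeated_apps)  # ['Instagram', 'Slack']
```

The surplus-row count is the one that matters for the cleaning step, since it is the number of rows that will be removed.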

Removing duplicate data is important; otherwise, it will skew the analysis.

For this analysis, I will keep the duplicate that has the highest value in the `Reviews` column, since the highest review count suggests that it is the most recently collected data.


The code below creates a dictionary where each key is a unique app name, and the value is the highest number of reviews for that app. This dictionary will be used in the subsequent step to create a new dataset.


In [145]:
reviews_max = {}

# itertuples() puts the row index at position 0, so the App name is at
# position 1 and the Reviews count is at position 4
for app in android.itertuples():
    name = app[1]
    n_reviews = float(app[4])

    # Keep the highest review count seen so far for each app name
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    # Add app name to reviews_max if it does not already exist
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

# Print the expected number of apps in the dataset once all the duplicates have been removed
print("Expected length:", len(android) - num_duplicate_apps)
# Print the number of apps in the dataset after the removal of the duplicates
print("Actual length:", len(reviews_max))

Expected length: 9659
Actual length: 9659


In [None]:
# Empty dictionary to store app names and their highest review count
reviews_max = {}
# Iterate through the rows of the Google Play Store dataset (index=False keeps the App name at position 0)
for app in android.itertuples(index=False):
    # Store app name in a variable
    name = app[0]
    # Store the app's review count in a variable as a float data type
    n_reviews = float(app[3])
    # Replace the stored review count if the app's name exists in reviews_max and n_reviews is greater than the stored value
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    # Add app name to reviews_max if it does not already exist
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

# Print the expected number of apps in the dataset once all the duplicates have been removed
print("Expected length:", len(android) - num_duplicate_apps)
# Print the number of apps in the dataset after the removal of the duplicates
print("Actual length:", len(reviews_max))

Expected length: 9659
Actual length: 9659


Based on the output above, the dictionary now contains the expected number of apps once the duplicates have been removed from the original dataset.

Now that the correct length of the dataset has been verified, the `reviews_max` dictionary will be used to remove the duplicate rows:


In [63]:
# An empty list to store the clean data
android_clean = []
# An empty list to store the names of apps already added to android_clean
already_added = []

# Iterate through the rows of the Google Play Store dataset as plain lists
for app in android.values.tolist():
    # Store app name to a variable
    name = app[0]
    # Store the app's review count in a variable as a float data type
    n_reviews = float(app[3])
    # Compare n_reviews with the value in reviews_max for the same app. Add the app's data into the android_clean list if it hasn't been added already
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)

Now, when the `explore_data` function is called on the `android_clean` list, it should display the expected number of rows in the data set, 9,659.


In [64]:
explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'] 

['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'] 

['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'] 

Number of rows:  9659
Number of columns:  13


The duplicates have successfully been removed from the Google Play Store dataset.


The code below looks for duplicate apps in the iOS App Store dataset using the `id` column, which should be unique for each app:


In [None]:
# An empty list to store duplicate apps
duplicate_apps = []
# An empty list to store unique apps
unique_apps = []

# Loop through each app id in the App Store dataset
for app_id in ios["id"]:
    # Add the app's id to either the list of unique apps or duplicate apps based on whether it already exists in the unique_apps list
    if app_id in unique_apps:
        duplicate_apps.append(app_id)
    else:
        unique_apps.append(app_id)

# Print the number of duplicate apps
print("Number of duplicate apps:", len(duplicate_apps))

Number of duplicate apps: 0


Since the above output indicates there are no duplicate apps in the App Store dataset, no further steps are necessary.


### Removing non-English Apps


The app development company only develops apps for an English-speaking audience, so this analysis will focus only on such apps.

In both the iOS App Store and Google Play Store datasets, however, there are apps with names that suggest they are not designed for an English-speaking audience:


In [None]:
print(ios[814][2])
print(ios[1094][2], "\n")
print(android_clean[4412][0])
print(android_clean[7940][0])

搜狐新闻—新闻热点资讯掌上阅读软件
Dictionary ( قاموس عربي / انجليزي + ودجيت الترجمة) 

中国語 AQリスニング
لعبة تقدر تربح DZ


The focus of this section is removing apps similar to the ones in the above output.

According to the [American Standard Code for Information Interchange (ASCII)](https://en.wikipedia.org/wiki/ASCII), the characters that are commonly used in English text are within the number range of 0 to 127.

The code below will create a function that takes in a string parameter, iterates through each character of the string, and checks if its Unicode numerical code point is in the range of 0 to 127, inclusive:


In [67]:
def is_English(string):
    """Check each character's Unicode code point against the ASCII range (0 to 127). Return False if the string contains more than three non-ASCII characters, and True otherwise."""
    non_eng_chars = 0
    # Iterate through each character in the string
    for char in string:
        # Store the character's Unicode code point in a variable
        char_unicode = ord(char)
        # Count the character if it falls outside the ASCII range
        if char_unicode > 127:
            non_eng_chars += 1
            # Return False as soon as more than 3 non-ASCII characters are found
            if non_eng_chars > 3:
                return False
    # Return True if there are three or fewer non-ASCII characters in the string
    return True

Some apps have emojis and characters like `™` in their name, so to ensure those apps are not left out of the data set, the function will only deem an app as a non-English app if it has more than three non-ASCII characters.
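
To make the threshold more concrete, here is a small sketch that counts the non-ASCII characters in a few names. The `count_non_ascii()` helper is a hypothetical name introduced only for this illustration; it is not used elsewhere in the notebook.

```python
def count_non_ascii(name):
    """Count the characters whose Unicode code point is above 127."""
    return sum(1 for char in name if ord(char) > 127)


# Single symbols such as the trademark sign or an emoji sit far outside ASCII
print(ord("™"))   # 8482
print(ord("😜"))  # 128540

names = [
    "Instagram",
    "Docs To Go™ Free Office Suite",
    "Instachat 😜",
    "爱奇艺PPS -《欢乐颂2》电视剧热播",
]

# Names with at most three non-ASCII characters pass the filter
for name in names:
    n = count_non_ascii(name)
    print(name, n, "kept" if n <= 3 else "removed")
```

A name with a single `™` or emoji stays well under the limit, while a name written mostly in another script exceeds it quickly.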


The following will test the output of the `is_English` function:


In [None]:
# Function test
print(is_English("Instagram"))
print(is_English("爱奇艺PPS -《欢乐颂2》电视剧热播"))
print(is_English("Docs To Go™ Free Office Suite"))
print(is_English("Instachat 😜"))

True
False
True
True


Now, with the `is_English` function, each dataset will be checked for non-English characters. If the function returns `True` then the app will be appended to a new list for English only apps.


In [None]:
# Add app to android_eng if the app's name has 3 or fewer non-ASCII characters
android_eng = [app for app in android_clean if is_English(app[0])]

# Add app to ios_eng if the app's name has 3 or fewer non-ASCII characters.
# reset_index() restores the unnamed index column, so track_name sits at position 2
ios_eng = [app for app in ios.reset_index().values.tolist() if is_English(app[2])]

# Print the header row, first three rows, and the number of rows and columns in the Google Play Store dataset with English-only apps
print("Google Play Store - English Apps:")
explore_data(android_eng, 0, 3, True)

# Print the header row, first three rows, and the number of rows and columns in the iOS App Store dataset with English-only apps
print("\nApp Store - English Apps:")
explore_data(ios_eng, 0, 3, True)

Google Play Store - English Apps:
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'] 

['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'] 

['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'] 

Number of rows:  9614
Number of columns:  13

App Store - English Apps:
['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1'] 

['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1'] 

['3', '281940292', 'WeatherBug - Local

The Google Play Store dataset now has 9,614 apps, and the iOS App Store dataset has 6,183.


### Removing non-Free Apps


The final step in the data cleaning process for this analysis will be to remove all of the non-free apps, since the app development company only builds free-to-download mobile apps.


The code below keeps only the apps with a price of 0 in both datasets:


In [None]:
# Append the app's data to android_final if the price is 0
android_final = [app for app in android_eng if app[7] == "0"]
# Append the app's data to ios_final if the price is 0 (float() handles both "0" and 0.0)
ios_final = [app for app in ios_eng if float(app[5]) == 0]

# Print the number of rows in the Google Play Store dataset and the first three rows
print("Google Play Store Free Apps:", len(android_final))
explore_data(android_final, 0, 3)

# Print the number of rows in the iOS App Store dataset and the first three rows
print("\niOS App Store Free Apps:", len(ios_final))
explore_data(ios_final, 0, 3)

Google Play Store Free Apps: 8864
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'] 

['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'] 

['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'] 


iOS App Store Free Apps: 3222
['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1'] 

['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1'] 

['4', '282614216', 'eBay: Best App to Buy, S

The clean and final dataset for the Google Play Store contains 8,864 apps, and 3,222 for the iOS App Store.
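
The price check used for filtering depends on how each price is stored. As a minimal sketch, assuming prices might arrive as `"0"`, `"0.0"`, a float, or a dollar-prefixed string such as `"$4.99"`, a slightly more defensive test could look like the following. The `is_free()` helper is hypothetical and is not part of the filtering code in this notebook.

```python
def is_free(price):
    """Return True when a price value represents zero cost."""
    # Strip an optional leading dollar sign before converting to a number
    return float(str(price).lstrip("$")) == 0


# Example inputs covering the assumed formats
for value in ["0", "0.0", 0.0, "$4.99", 3.99]:
    print(repr(value), is_free(value))
```

This sketch would still raise a `ValueError` for non-numeric text, so it assumes the malformed rows have already been removed.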


## Data Analysis


With cleaned datasets, the data analysis can begin.

Because the main source of revenue for the app development company consists of in-app ads within free-to-download mobile apps, the number of people using the apps directly affects the company's revenue.

The goal of this analysis will be to draw insights based on the most common and popular apps in the Google Play Store and iOS App Store.


### Most Common Apps by Genre


The app development company would like to add an app to both the Google Play Store and the iOS App Store, so my goal is to find app profiles that are successful in both markets.

My plan is to build frequency tables to determine the most common genres for each market. To determine which columns will be the most useful, the code below will print the header rows for the Google Play Store and iOS App Store datasets:


In [None]:
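# Build the header rows from the DataFrame column names
android_header = list(android.columns)
# The leading empty string stands in for the unnamed index column of the iOS dataset
ios_header = [""] + list(ios.columns)

# Print the Google Play Store header
print("Google Play Store\n", android_header)
# Print the iOS App Store header
print("\niOS App Store\n", ios_header)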

Google Play Store
 ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']

iOS App Store
 ['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


For this analysis, the columns that will be the most useful from the Google Play Store dataset are `Genres` and `Category`, and the `prime_genre` column will be the most useful from the iOS App Store dataset.


Next, I'll create two functions to help analyze the frequency tables:

-   One function will generate the frequency tables that show percentages
-   Another function will display the percentages in descending order
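
As a compact illustration of these two steps, the sketch below builds a percentage table with `collections.Counter` and prints it in descending order. The `freq_percentages()` name and the toy genre list are assumptions made for this example; the functions actually used in this analysis are defined in the following cells.

```python
from collections import Counter


def freq_percentages(values):
    """Return a dict mapping each value to its share of the total, in percent."""
    counts = Counter(values)
    total = sum(counts.values())
    return {key: round(count / total * 100, 2) for key, count in counts.items()}


# Toy data: three games and one weather app
genres = ["Games", "Games", "Games", "Weather"]
table = freq_percentages(genres)

# Print the entries sorted by percentage, highest first
for genre, pct in sorted(table.items(), key=lambda item: item[1], reverse=True):
    print(f"{genre}: {pct}%")
# Games: 75.0%
# Weather: 25.0%
```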


The following code block creates a function called `freq_table()` that takes two inputs, `dataset` and `index`, and returns a frequency table as a dictionary from a specified column in the dataset.


In [None]:
def freq_table(dataset, index):
    """Returns a frequency table as a dictionary from a column in a dataset"""
    ft_dict = {}  # output dictionary
    total_num = 0  # used to count the total number of values

    # Loop through the dataset
    for row in dataset:
        # Store the row's index value in a variable
        row_index = row[index]
        # If the index value is already in ft_dict, add a tally to the frequency table. Otherwise, start the tally at 1 for row's index value
        if row_index in ft_dict:
            ft_dict[row_index] += 1
        else:
            ft_dict[row_index] = 1

        # Calculate the total number of values
        total_num += 1

    # Convert dictionary values to percentages, rounded to the nearest hundredth
    for key in ft_dict:
        ft_dict[key] = round((ft_dict[key] / total_num) * 100, 2)

    # return the frequency table as a dictionary
    return ft_dict

The following code creates a function called `sort_output()`, which prints the key-value pairs of a dictionary in descending order, ordered by the dictionary's values. This function will be used repeatedly in subsequent code blocks.


In [None]:
def sort_output(table, display_percentage=False):
    """This function takes in a dictionary and prints the key-value pairs, sorted by values descending."""
    # Store all dictionary values and keys into tuples
    tuple_list = [(table[key], key) for key in table]

    # Sort the list of tuples
    sorted_list = sorted(tuple_list, reverse=True)

    # Print the sorted list
    for entry in sorted_list:
        print(f"{entry[1]}: {entry[0]}{'%' if display_percentage else ''}")

The code block below creates a function called `display_table()`, which takes two parameters, `dataset` and `index`, and uses the `freq_table()` and `sort_output()` functions above to display each value in the chosen column along with its frequency percentage:


In [74]:
def display_table(dataset, index):
    """Displays a frequency table for the desired columns"""
    table = freq_table(dataset, index)

    # Print the table sorted by percentage, in descending order
    sort_output(table, display_percentage=True)

Using these two functions, the output for the Google Play Store's `Category` column is:


In [75]:
# Print the frequency table for the Google Play Store's Category column
display_table(android_final, 1)

FAMILY: 18.91%
GAME: 9.72%
TOOLS: 8.46%
BUSINESS: 4.59%
LIFESTYLE: 3.9%
PRODUCTIVITY: 3.89%
FINANCE: 3.7%
MEDICAL: 3.53%
SPORTS: 3.4%
PERSONALIZATION: 3.32%
COMMUNICATION: 3.24%
HEALTH_AND_FITNESS: 3.08%
PHOTOGRAPHY: 2.94%
NEWS_AND_MAGAZINES: 2.8%
SOCIAL: 2.66%
TRAVEL_AND_LOCAL: 2.34%
SHOPPING: 2.25%
BOOKS_AND_REFERENCE: 2.14%
DATING: 1.86%
VIDEO_PLAYERS: 1.79%
MAPS_AND_NAVIGATION: 1.4%
FOOD_AND_DRINK: 1.24%
EDUCATION: 1.16%
ENTERTAINMENT: 0.96%
LIBRARIES_AND_DEMO: 0.94%
AUTO_AND_VEHICLES: 0.93%
HOUSE_AND_HOME: 0.82%
WEATHER: 0.8%
EVENTS: 0.71%
PARENTING: 0.65%
ART_AND_DESIGN: 0.64%
COMICS: 0.62%
BEAUTY: 0.6%


The frequency table for the Google Play Store's `Genres` column:


In [76]:
# Print the frequency table for the Genres column from the Google Play Store dataset
display_table(android_final, -4)

Tools: 8.45%
Entertainment: 6.07%
Education: 5.35%
Business: 4.59%
Productivity: 3.89%
Lifestyle: 3.89%
Finance: 3.7%
Medical: 3.53%
Sports: 3.46%
Personalization: 3.32%
Communication: 3.24%
Action: 3.1%
Health & Fitness: 3.08%
Photography: 2.94%
News & Magazines: 2.8%
Social: 2.66%
Travel & Local: 2.32%
Shopping: 2.25%
Books & Reference: 2.14%
Simulation: 2.04%
Dating: 1.86%
Arcade: 1.85%
Video Players & Editors: 1.77%
Casual: 1.76%
Maps & Navigation: 1.4%
Food & Drink: 1.24%
Puzzle: 1.13%
Racing: 0.99%
Role Playing: 0.94%
Libraries & Demo: 0.94%
Auto & Vehicles: 0.93%
Strategy: 0.91%
House & Home: 0.82%
Weather: 0.8%
Events: 0.71%
Adventure: 0.68%
Comics: 0.61%
Beauty: 0.6%
Art & Design: 0.6%
Parenting: 0.5%
Card: 0.45%
Casino: 0.43%
Trivia: 0.42%
Educational;Education: 0.39%
Board: 0.38%
Educational: 0.37%
Education;Education: 0.34%
Word: 0.26%
Casual;Pretend Play: 0.24%
Music: 0.2%
Racing;Action & Adventure: 0.17%
Puzzle;Brain Games: 0.17%
Entertainment;Music & Video: 0.17%
Casual;

And finally, the frequency table for the iOS App Store's `prime_genre` column:


In [77]:
display_table(ios_final, -5)

Games: 58.16%
Entertainment: 7.88%
Photo & Video: 4.97%
Education: 3.66%
Social Networking: 3.29%
Shopping: 2.61%
Utilities: 2.51%
Sports: 2.14%
Music: 2.05%
Health & Fitness: 2.02%
Productivity: 1.74%
Lifestyle: 1.58%
News: 1.33%
Travel: 1.24%
Finance: 1.12%
Weather: 0.87%
Food & Drink: 0.81%
Reference: 0.56%
Business: 0.53%
Book: 0.43%
Navigation: 0.19%
Medical: 0.19%
Catalogs: 0.12%


#### Google Play Store Analysis


##### Category


`FAMILY` is the most common category in the Google Play Store dataset with a frequency of 18.91%.

The second most common category is `GAME` with a frequency of 9.72%, followed by `TOOLS` with 8.46% frequency.


With this information, an app that fits in the `FAMILY` category could be worth building. Additionally, a cross-sectional app that spans more than one category (e.g. a family-oriented game, or a tool app for families) could be more useful or popular than an app that targets only one category, and it could help the app stand out from the sea of other apps within a single category.


##### Genre


The most common value in the `Genres` column of the Google Play Store dataset is `Tools` with an 8.45% frequency, followed by `Entertainment` (6.07%) and `Education` (5.35%).


Similar to the `Category` column, free apps in the `Tools` genre are some of the most frequent apps in the Google Play Store. The `Entertainment` and `Education` genres pair well with the `GAME` and `FAMILY` categories, so it makes sense that they are also among the most frequent free apps in the Google Play Store.


#### iOS App Store Analysis


##### prime_genre


Based on the frequency tables, the most common `prime_genre` is `Games` with an overwhelming 58.16% frequency, more than half of all free English-language apps in the App Store dataset. Clearly, this should be a target genre for both mobile app stores, but especially for the iOS App Store.

The second and third most common genres in the iOS App Store are `Entertainment` and `Photo & Video` with 7.88% and 4.97% frequency, respectively.


#### Recommended App Profile


Since both the Google Play Store and the iOS App Store have a high number of free game applications (especially the iOS App Store), a game application should be an ideal app to create. A game app can also cross into other app categories / genres, allowing it to reach an even broader audience.

Although game applications make up 58.16% of the iOS App Store, it should be noted that a higher genre frequency does not directly mean there is access to a wider audience. More game applications could mean games are easier to develop or get approved to be in the iOS App Store. A higher frequency of game applications could also mean more competition within the app store.

However, for the app development company, a game application could be a low-risk project. I suggest first building a game app for Android and adding it to the Google Play Store, where games make up 9.72% of all free English-language applications. If the game gets a good response from users, it should be developed further. Then, if the app is profitable after about six months, I recommend building an iOS version of the same application for the iOS App Store.


### Most Popular Apps by Genre


As mentioned above, a high frequency in the tables does not imply a higher number of users. For the next part of this analysis, I will look at which apps have the most users.


#### The Most Popular Apps in the iOS App Store


Unlike the Google Play Store dataset, the iOS App Store dataset does not have an `Installs` column that can be used for this analysis, so the total number of user ratings will be used instead.

To calculate the average number of user ratings per app genre, the `rating_count_tot` column will be used.


The code block below uses a nested loop: for each genre in the iOS App Store dataset, it scans the whole dataset for apps in that genre, calculates the average number of user ratings per app, and then prints the averages for all genres.


In [None]:
# Create a dictionary of all the genres listed in the iOS App Store dataset
genre_dict = freq_table(ios_final, -5)

# This dictionary will store each genre as a key and its average number of user ratings as the value
avg_user_ratings = {}

# Iterate through the dictionary of genres
for genre in genre_dict:
    total = 0  # Will store the sum of the rating counts
    len_genre = 0  # Will store the number of apps specific to each genre

    # Loop over the iOS App Store dataset
    for app in ios_final:
        genre_app = app[-5]  # Stores the app's genre
        # If the app's genre matches the genre from genre_dict, add its rating count and add 1 to len_genre
        if genre_app == genre:
            user_ratings = float(app[6])  # Stores the app's total number of user ratings as a float
            total += user_ratings
            len_genre += 1

    # Add the genre and its average number of user ratings to the dictionary
    avg_user_ratings[genre] = round(total / len_genre, 2)

# Sort and print the average number of user ratings for each genre
sort_output(avg_user_ratings)

Navigation: 86090.33
Reference: 74942.11
Social Networking: 71548.35
Music: 57326.53
Weather: 52279.89
Book: 39758.5
Food & Drink: 33333.92
Finance: 31467.94
Photo & Video: 28441.54
Travel: 28243.8
Shopping: 26919.69
Health & Fitness: 23298.02
Sports: 23008.9
Games: 22788.67
News: 21248.02
Productivity: 21028.41
Utilities: 18684.46
Lifestyle: 16485.76
Entertainment: 14029.83
Business: 7491.12
Education: 7003.98
Catalogs: 4004.0
Medical: 612.0


The above table shows that `Navigation`, `Reference`, `Social Networking`, and `Music` have the highest average number of user ratings among all genres in the iOS App Store.
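
As a side note, the nested loop above rescans the whole dataset once per genre. A minimal single-pass sketch of the same per-genre average is shown here. It assumes rows are lists with the genre at index `-5` and the rating count at index `6`, as in the notebook; the names `average_ratings_by_genre` and `make_row`, and the 17-column padding, are illustrative assumptions and not part of the original analysis.

```python
from collections import defaultdict


def average_ratings_by_genre(rows, genre_idx=-5, count_idx=6):
    """Return {genre: average number of user ratings} in one pass over rows."""
    totals = defaultdict(float)  # Sum of rating counts per genre
    counts = defaultdict(int)  # Number of apps per genre
    for row in rows:
        genre = row[genre_idx]
        totals[genre] += float(row[count_idx])
        counts[genre] += 1
    return {genre: round(totals[genre] / counts[genre], 2) for genre in totals}


def make_row(name, rating_count, genre, width=17):
    """Build a padded sample row (the width of 17 columns is an assumption)."""
    row = [""] * width
    row[2] = name
    row[6] = str(rating_count)
    row[-5] = genre
    return row


# Three sample rows that reuse rating counts printed elsewhere in this notebook
sample_rows = [
    make_row("Waze", 345046, "Navigation"),
    make_row("Google Maps", 154911, "Navigation"),
    make_row("Bible", 985920, "Reference"),
]

averages = average_ratings_by_genre(sample_rows)
print(averages)
```

Each app is visited once, so the work grows with the number of apps rather than with the number of apps times the number of genres.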

Further analysis shows that, within each of these genres, the average is skewed by a few applications.


The code below defines a function that prints each application in a specified genre along with its number of user ratings, sorted by that number:


In [79]:
def print_ios_ratings(genre_name):
    """Prints all the applications within a specified genre and their number of user ratings, sorted by user ratings."""
    tmp_dict = {}  # This empty dict will store the apps in the specified genre and their user ratings

    # If an app's genre matches genre_name, add its name and number of user ratings to the dictionary
    for app in ios_final:
        if app[-5] == genre_name:
            tmp_dict[app[2]] = int(app[6])

    # Sort and print the number of user ratings for each app in the genre
    sort_output(tmp_dict)

The `Navigation` genre shows a skew toward two very popular applications:


In [None]:
print_ios_ratings("Navigation")

Waze - GPS Navigation, Maps & Real-time Traffic: 345046
Google Maps - Navigation & Transit: 154911
Geocaching®: 12811
CoPilot GPS – Car Navigation & Offline Maps: 3582
ImmobilienScout24: Real Estate Search in Germany: 187
Railway Route Search: 5


The above output shows that Waze and Google Maps have 499,957 user ratings combined. That's almost half a million user ratings for only two mobile apps!
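
To put that figure in context, the sketch below reuses the six rating counts printed above and works out the genre mean, the share of ratings held by the top two apps, and the mean once those two are set aside. The variable names are illustrative; only the numbers come from the output.

```python
# Rating counts copied from the Navigation output above
navigation_ratings = {
    "Waze - GPS Navigation, Maps & Real-time Traffic": 345046,
    "Google Maps - Navigation & Transit": 154911,
    "Geocaching®": 12811,
    "CoPilot GPS – Car Navigation & Offline Maps": 3582,
    "ImmobilienScout24: Real Estate Search in Germany": 187,
    "Railway Route Search": 5,
}

counts = sorted(navigation_ratings.values(), reverse=True)
total = sum(counts)
top_two_total = sum(counts[:2])

# Mean over all six apps (matches the Navigation figure in the genre table)
genre_mean = round(total / len(counts), 2)

# Percentage of all Navigation ratings that belong to the top two apps
top_two_share = round(100 * top_two_total / total, 2)

# Mean over the remaining four apps
mean_without_top_two = round((total - top_two_total) / (len(counts) - 2), 2)

print(f"Mean of all apps: {genre_mean}")
print(f"Top-two share of ratings (%): {top_two_share}")
print(f"Mean without top two: {mean_without_top_two}")
```

The two largest apps account for roughly 97% of the genre's ratings, and the mean falls from 86,090.33 to 4,146.25 when they are left out.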


A similar pattern applies to `Social Networking` and `Music` apps:


In [None]:
print_ios_ratings("Social Networking")

Facebook: 2974676
Pinterest: 1061624
Skype for iPhone: 373519
Messenger: 351466
Tumblr: 334293
WhatsApp Messenger: 287589
Kik: 260965
ooVoo – Free Video Call, Text and Voice: 177501
TextNow - Unlimited Text + Calls: 164963
Viber Messenger – Text & Call: 164249
Followers - Social Analytics For Instagram: 112778
MeetMe - Chat and Meet New People: 97072
We Heart It - Fashion, wallpapers, quotes, tattoos: 90414
InsTrack for Instagram - Analytics Plus More: 85535
Tango - Free Video Call, Voice and Chat: 75412
LinkedIn: 71856
Match™ - #1 Dating App.: 60659
Skype for iPad: 60163
POF - Best Dating App for Conversations: 52642
Timehop: 49510
Find My Family, Friends & iPhone - Life360 Locator: 43877
Whisper - Share, Express, Meet: 39819
Hangouts: 36404
LINE PLAY - Your Avatar World: 34677
WeChat: 34584
Badoo - Meet New People, Chat, Socialize.: 34428
Followers + for Instagram - Follower Analytics: 28633
GroupMe: 28260
Marco Polo Video Walkie Talkie: 27662
Miitomo: 23965
SimSimi: 23530
Grindr - G

In [None]:
print_ios_ratings("Music")

Pandora - Music & Radio: 1126879
Spotify Music: 878563
Shazam - Discover music, artists, videos & lyrics: 402925
iHeartRadio – Free Music & Radio Stations: 293228
SoundCloud - Music & Audio: 135744
Magic Piano by Smule: 131695
Smule Sing!: 119316
TuneIn Radio - MLB NBA Audiobooks Podcasts Music: 110420
Amazon Music: 106235
SoundHound Song Search & Music Player: 82602
Sonos Controller: 48905
Bandsintown Concerts: 30845
Karaoke - Sing Karaoke, Unlimited Songs!: 28606
My Mixtapez Music: 26286
Sing Karaoke Songs Unlimited with StarMaker: 26227
Ringtones for iPhone & Ringtone Maker: 25403
Musi - Unlimited Music For YouTube: 25193
AutoRap by Smule: 18202
Spinrilla - Mixtapes For Free: 15053
Napster - Top Music & Radio: 14268
edjing Mix:DJ turntable to remix and scratch music: 13580
Free Music - MP3 Streamer & Playlist Manager Pro: 13443
Free Piano app by Yokee: 13016
Google Play Music: 10118
Certified Mixtapes - Hip Hop Albums & Mixtapes: 9975
TIDAL: 7398
YouTube Music: 7109
Nicki Minaj: The

The `Social Networking` genre's output is heavily skewed by popular apps such as Facebook, Pinterest, Skype, etc.

The `Music` genre is heavily influenced by the ratings of apps such as Pandora, Spotify, Shazam, and iHeartRadio.


What can be inferred from the above output is that apps in the `Navigation`, `Social Networking`, and `Music` genres may not be as popular as they seem. The handful of giant apps makes up a small percentage of the total number of apps in each genre, yet those apps have significantly more user ratings than most other apps in the same genre.


As for `Reference` apps, which have the second-highest average number of user ratings, two apps skew the average: the Bible and Dictionary.com.


In [None]:
print_ios_ratings("Reference")

Bible: 985920
Dictionary.com Dictionary & Thesaurus: 200047
Dictionary.com Dictionary & Thesaurus for iPad: 54175
Google Translate: 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran: 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition: 17588
Merriam-Webster Dictionary: 16849
Night Sky: 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE): 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools: 4693
GUNS MODS for Minecraft PC Edition - Mods Tools: 1497
Guides for Pokémon GO - Pokemon GO News and Cheats: 826
WWDC: 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free: 718
VPN Express: 14
Real Bike Traffic Rider Virtual Reality Glasses: 8
教えて!goo: 0
Jishokun-Japanese English Dictionary & Translator: 0
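

One way to see how much the top apps pull the mean upward is to compare it with the median. The sketch below uses the 18 rating counts printed above and Python's standard `statistics` module. It is an illustrative check and not part of the original analysis.

```python
from statistics import mean, median

# Rating counts copied from the Reference output above, in printed order
reference_ratings = [
    985920, 200047, 54175, 26786, 18418, 17588, 16849, 12122, 8535,
    4693, 1497, 826, 762, 718, 14, 8, 0, 0,
]

genre_mean = round(mean(reference_ratings), 2)
genre_median = median(reference_ratings)

print(f"Mean: {genre_mean}")
print(f"Median: {genre_median}")
```

The median app has 6,614 ratings, far below the mean of 74,942.11 reported for `Reference` in the genre table.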


The popularity of the Bible and Dictionary.com apps could be useful for the app development company. Another possible app profile to consider is an app that takes a popular book and turns it into an app with unique features besides the book itself, such as daily quotes from the book or quizzes about it. In addition, building a dictionary into the app would let users look up the definitions of words without having to download or open an external dictionary app.

Earlier, it was determined that over half of the iOS App Store's mobile applications are classified as games, which suggests some oversaturation. A more practical app is therefore more likely to stand out from the plethora of entertainment apps.


### The Most Popular Apps in the Google Play Store


To determine which apps have the most users, I will be calculating the average number of installs for each app genre using the values in the `Installs` column of the Google Play Store dataset.

It's important to note that the `Installs` column is not precise. The output below shows that the values in the `Installs` column are open-ended (e.g., 5,000+, 1,000+, etc.):


In [84]:
display_table(android_final, 5)

1,000,000+: 15.73%
100,000+: 11.55%
10,000,000+: 10.55%
10,000+: 10.2%
1,000+: 8.39%
100+: 6.92%
5,000,000+: 6.83%
500,000+: 5.56%
50,000+: 4.77%
5,000+: 4.51%
10+: 3.54%
500+: 3.25%
50,000,000+: 2.3%
100,000,000+: 2.13%
50+: 1.92%
5+: 0.79%
1+: 0.51%
500,000,000+: 0.27%
1,000,000,000+: 0.23%
0+: 0.05%
0: 0.01%


For this analysis, the values will be left as they are, so if an app has 10,000+ installs, it will be assumed that the number of installs is 10,000.
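
That assumption amounts to stripping the `+` and `,` characters and treating what remains as a lower bound. A small helper sketch follows; the name `parse_installs` is hypothetical and does not appear in the notebook.

```python
def parse_installs(raw):
    """Convert an Installs string such as '10,000+' to its integer lower bound."""
    return int(raw.replace(",", "").replace("+", ""))


# A few of the bucket labels that appear in the Installs column
for label in ["10,000+", "1,000,000,000+", "0+", "0"]:
    print(label, "->", parse_installs(label))
```

Keeping the conversion in one function avoids repeating the two `replace()` calls wherever install counts are needed.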


The code block below takes the dictionary output from the `freq_table()` function, calculates the average number of installs for each category in the dictionary and outputs the results sorted by the average number of installs:


In [None]:
# Store all unique category names to a dictionary
android_genre = freq_table(android_final, 1)

# Iterate through each category in the Google Play Store
for category in android_genre:
    total = 0  # This will store the sum of installs specific to each genre
    len_category = 0  # This will store the number of apps specific to each genre
    # Iterate through each app in the Google Play Store
    for app in android_final:
        category_app = app[1]  # Store the app's genre
        # If the app's genre matches the genre of the outer loop, take its number of installs, remove all '+' and ',' characters, and convert it to a float to be added to the total variable
        if category_app == category:
            num_of_installs = app[5]
            num_of_installs = num_of_installs.replace("+", "")
            num_of_installs = num_of_installs.replace(",", "")
            total += float(num_of_installs)
            len_category += 1

    # Add genre and average number of installs to dictionary
    android_genre[category] = round(total / len_category, 2)

# Sort and print the average number of installs for each genre
sort_output(android_genre)

COMMUNICATION: 38456119.17
VIDEO_PLAYERS: 24727872.45
SOCIAL: 23253652.13
PHOTOGRAPHY: 17840110.4
PRODUCTIVITY: 16787331.34
GAME: 15588015.6
TRAVEL_AND_LOCAL: 13984077.71
ENTERTAINMENT: 11640705.88
TOOLS: 10801391.3
NEWS_AND_MAGAZINES: 9549178.47
BOOKS_AND_REFERENCE: 8767811.89
SHOPPING: 7036877.31
PERSONALIZATION: 5201482.61
WEATHER: 5074486.2
HEALTH_AND_FITNESS: 4188821.99
MAPS_AND_NAVIGATION: 4056941.77
FAMILY: 3695641.82
SPORTS: 3638640.14
ART_AND_DESIGN: 1986335.09
FOOD_AND_DRINK: 1924897.74
EDUCATION: 1833495.15
BUSINESS: 1712290.15
LIFESTYLE: 1437816.27
FINANCE: 1387692.48
HOUSE_AND_HOME: 1331540.56
DATING: 854028.83
COMICS: 817657.27
AUTO_AND_VEHICLES: 647317.82
LIBRARIES_AND_DEMO: 638503.73
PARENTING: 542603.62
BEAUTY: 513151.89
EVENTS: 253542.22
MEDICAL: 120550.62


On average, `COMMUNICATION` apps have the most installs, with an average of 38,456,119.17 installs per app.

Before making any decisions based on this information, it's important to take a closer look at the apps within the `COMMUNICATION` genre:


In [None]:
# An empty dictionary to store all the apps in the communication genre with their respective number of installs
comm_dict = {}

# Iterate through the Google Play Store dataset
for app in android_final:
    # For each app in the communication genre, remove any plus-signs or commas, convert the number to a float data type, and add it to the comm_dict
    if app[1] == "COMMUNICATION":
        num_of_installs = app[5]
        num_of_installs = num_of_installs.replace("+", "")
        num_of_installs = num_of_installs.replace(",", "")
        comm_dict[app[0]] = float(num_of_installs)

# Sort and print the comm_dict
sort_output(comm_dict)

WhatsApp Messenger: 1000000000.0
Skype - free IM & video calls: 1000000000.0
Messenger – Text and Video Chat for Free: 1000000000.0
Hangouts: 1000000000.0
Google Chrome: Fast & Secure: 1000000000.0
Gmail: 1000000000.0
imo free video calls and chat: 500000000.0
Viber Messenger: 500000000.0
UC Browser - Fast Download Private & Secure: 500000000.0
LINE: Free Calls & Messages: 500000000.0
Google Duo - High Quality Video Calls: 500000000.0
imo beta free calls and text: 100000000.0
Yahoo Mail – Stay Organized: 100000000.0
Who: 100000000.0
WeChat: 100000000.0
UC Browser Mini -Tiny Fast Private & Secure: 100000000.0
Truecaller: Caller ID, SMS spam blocking & Dialer: 100000000.0
Telegram: 100000000.0
Opera Mini - fast web browser: 100000000.0
Opera Browser: Fast and Secure: 100000000.0
Messenger Lite: Free Calls & Messages: 100000000.0
Kik: 100000000.0
KakaoTalk: Free Calls & Text: 100000000.0
GO SMS Pro - Messenger, Free Themes, Emoji: 100000000.0
Firefox Browser fast & private: 100000000.0
BB

The output above shows that the average number of installs of `COMMUNICATION` apps is heavily skewed by a few apps with one billion or more installs (e.g., WhatsApp, Skype, Facebook Messenger, Google Hangouts, and Gmail), as well as a number of apps in the 500 million+ and 100 million+ brackets.

The function `print_apps_with_100m_plus_installs()` below singles out these apps:


In [None]:
def print_apps_with_100m_plus_installs(genre):
    """Prints every app in the given Google Play Store genre that has 100 million or more installs, followed by a count of those apps."""
    num_of_apps = 0  # This will count the number of apps with 100 million or more installs

    # Iterate through the Google Play Store dataset
    for app in android_final:
        # If the app belongs to the genre and has 100 million or more installs, add one to the count and print the app's name along with its number of installs
        if app[1] == genre and (
            app[5] in ["1,000,000,000+", "500,000,000+", "100,000,000+"]
        ):
            num_of_apps += 1
            print(app[0], ":", app[5])

    # Print the number of apps with 100 million or more installs
    print(f"\nNumber of apps with 100,000,000 or more installs: {num_of_apps}")

Running the above function for all the `COMMUNICATION` apps produces the following:


In [None]:
print_apps_with_100m_plus_installs("COMMUNICATION")

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

A similar pattern can be seen for `VIDEO_PLAYERS` (which has the second-highest average number of installs at 24,727,872.45), `SOCIAL` (with 23,253,652.13 average installs), `PHOTOGRAPHY` (with 17,840,110.40), and `PRODUCTIVITY` (with 16,787,331.34):


In [None]:
print_apps_with_100m_plus_installs("VIDEO_PLAYERS")

YouTube : 1,000,000,000+
Motorola Gallery : 100,000,000+
VLC for Android : 100,000,000+
Google Play Movies & TV : 1,000,000,000+
MX Player : 500,000,000+
Dubsmash : 100,000,000+
VivaVideo - Video Editor & Photo Movie : 100,000,000+
VideoShow-Video Editor, Video Maker, Beauty Camera : 100,000,000+
Motorola FM Radio : 100,000,000+

Number of apps with 100,000,000 or more installs: 9


In [None]:
print_apps_with_100m_plus_installs("SOCIAL")

Facebook : 1,000,000,000+
Facebook Lite : 500,000,000+
Tumblr : 100,000,000+
Pinterest : 100,000,000+
Google+ : 1,000,000,000+
Badoo - Free Chat & Dating App : 100,000,000+
Tango - Live Video Broadcast : 100,000,000+
Instagram : 1,000,000,000+
Snapchat : 500,000,000+
LinkedIn : 100,000,000+
Tik Tok - including musical.ly : 100,000,000+
BIGO LIVE - Live Stream : 100,000,000+
VK : 100,000,000+

Number of apps with 100,000,000 or more installs: 13


In [None]:
print_apps_with_100m_plus_installs("PHOTOGRAPHY")

B612 - Beauty & Filter Camera : 100,000,000+
YouCam Makeup - Magic Selfie Makeovers : 100,000,000+
Sweet Selfie - selfie camera, beauty cam, photo edit : 100,000,000+
Google Photos : 1,000,000,000+
Retrica : 100,000,000+
Photo Editor Pro : 100,000,000+
BeautyPlus - Easy Photo Editor & Selfie Camera : 100,000,000+
PicsArt Photo Studio: Collage Maker & Pic Editor : 100,000,000+
Photo Collage Editor : 100,000,000+
Z Camera - Photo Editor, Beauty Selfie, Collage : 100,000,000+
PhotoGrid: Video & Pic Collage Maker, Photo Editor : 100,000,000+
Candy Camera - selfie, beauty camera, photo editor : 100,000,000+
YouCam Perfect - Selfie Photo Editor : 100,000,000+
Camera360: Selfie Photo Editor with Funny Sticker : 100,000,000+
S Photo Editor - Collage Maker , Photo Collage : 100,000,000+
AR effect : 100,000,000+
Cymera Camera- Photo Editor, Filter,Collage,Layout : 100,000,000+
LINE Camera - Photo editor : 100,000,000+
Photo Editor Collage Maker Pro : 100,000,000+

Number of apps with 100,000,000

In [None]:
print_apps_with_100m_plus_installs("PRODUCTIVITY")

Microsoft Word : 500,000,000+
Microsoft Outlook : 100,000,000+
Microsoft OneDrive : 100,000,000+
Microsoft OneNote : 100,000,000+
Google Keep : 100,000,000+
ES File Explorer File Manager : 100,000,000+
Dropbox : 500,000,000+
Google Docs : 100,000,000+
Microsoft PowerPoint : 100,000,000+
Samsung Notes : 100,000,000+
SwiftKey Keyboard : 100,000,000+
Google Drive : 1,000,000,000+
Adobe Acrobat Reader : 100,000,000+
Google Sheets : 100,000,000+
Microsoft Excel : 100,000,000+
WPS Office - Word, Docs, PDF, Note, Slide & Sheet : 100,000,000+
Google Slides : 100,000,000+
ColorNote Notepad Notes : 100,000,000+
Evernote – Organizer, Planner for Notes & Memos : 100,000,000+
Google Calendar : 500,000,000+
Cloud Print : 500,000,000+
CamScanner - Phone PDF Creator : 100,000,000+

Number of apps with 100,000,000 or more installs: 22


Nine apps (including major apps like YouTube, Google Play Movies & TV, and MX Player) dominate the `VIDEO_PLAYERS` market, each with 100 million or more installs.

Instagram, Facebook, Google+, and ten other apps in the `SOCIAL` genre have 100 million or more installs.

The `PHOTOGRAPHY` genre has 19 apps with 100 million or more installs, including Google Photos.

Lastly, the `PRODUCTIVITY` genre has 22 apps with 100 million or more installs, including major applications like Dropbox, Microsoft Word, Google Calendar, and Evernote.


This shows that these genres are dominated by a few giant apps that are difficult to compete with.
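
One rough way to check how much these giants inflate a genre's average is to recompute it with them excluded. The sketch below assumes `(name, installs)` pairs. The helper `average_installs`, its `max_installs` cutoff, and the two "Hypothetical" rows are illustrative assumptions and are not taken from the dataset.

```python
def average_installs(apps, max_installs=None):
    """Average the install lower bounds, optionally keeping only apps below max_installs."""
    values = [int(installs.replace(",", "").replace("+", "")) for _, installs in apps]
    if max_installs is not None:
        values = [value for value in values if value < max_installs]
    if not values:
        return 0.0
    return round(sum(values) / len(values), 2)


# Two apps from the COMMUNICATION output plus two made-up smaller apps
sample_apps = [
    ("WhatsApp Messenger", "1,000,000,000+"),
    ("Telegram", "100,000,000+"),
    ("Hypothetical Chat A", "1,000,000+"),
    ("Hypothetical Chat B", "500,000+"),
]

avg_all = average_installs(sample_apps)
avg_under_100m = average_installs(sample_apps, max_installs=100_000_000)

print(f"All apps: {avg_all}")
print(f"Apps under 100 million installs: {avg_under_100m}")
```

In this toy sample, the average drops from 275,375,000 to 750,000 once the two largest apps are removed, which illustrates how a few giants can dominate a genre-level mean.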


Although the `GAME` genre appears to be quite popular, my earlier analysis of the iOS App Store indicated that this segment of the market is relatively saturated. For this reason, I sought to identify an alternative app category that may offer greater opportunity.

The `BOOKS_AND_REFERENCE` genre also demonstrates notable popularity, with an average install count of 8,767,811.89. This category warrants further exploration, as my earlier findings suggest it holds potential for success on the iOS App Store. Since my objective is to recommend an app genre with profitability prospects across both the iOS App Store and Google Play Store, this genre emerged as a promising candidate.


Taking a look at the apps in the `BOOKS_AND_REFERENCE` genre and their number of installs:


In [None]:
# This empty dictionary will store the name and number of installs for each app in the BOOKS_AND_REFERENCE genre
book_ref_dict = {}

# Loop through the Google Play Store dataset
for app in android_final:
    # For each app in the books and reference genre, remove any plus-signs or commas, convert the number to a float data type, and add it to the book_ref_dict
    if app[1] == "BOOKS_AND_REFERENCE":
        num_of_installs = app[5]
        num_of_installs = num_of_installs.replace("+", "")
        num_of_installs = num_of_installs.replace(",", "")
        book_ref_dict[app[0]] = float(num_of_installs)

# Sort and print the book_ref_dict
sort_output(book_ref_dict)

Google Play Books: 1000000000.0
Wattpad 📖 Free Books: 100000000.0
Bible: 100000000.0
Audiobooks from Audible: 100000000.0
Amazon Kindle: 100000000.0
Wikipedia: 10000000.0
Spanish English Translator: 10000000.0
Quran for Android: 10000000.0
Oxford Dictionary of English : Free: 10000000.0
NOOK: Read eBooks & Magazines: 10000000.0
Moon+ Reader: 10000000.0
JW Library: 10000000.0
HTC Help: 10000000.0
FBReader: Favorite Book Reader: 10000000.0
English Hindi Dictionary: 10000000.0
English Dictionary - Offline: 10000000.0
Dictionary.com: Find Definitions for English Words: 10000000.0
Dictionary - Merriam-Webster: 10000000.0
Dictionary: 10000000.0
Cool Reader: 10000000.0
Aldiko Book Reader: 10000000.0
Al-Quran (Free): 10000000.0
Al'Quran Bahasa Indonesia: 10000000.0
Al Quran Indonesia: 10000000.0
Read books online: 5000000.0
English to Hindi Dictionary: 5000000.0
Ebook Reader: 5000000.0
Dictionary - WordWeb: 5000000.0
Bible KJV: 5000000.0
Ancestry: 5000000.0
AlReader -any text book reader: 5000

The books and reference genre encompasses a wide range of applications, including ebook readers, digital library collections, dictionaries, and educational resources such as programming or language tutorials. Upon closer examination, I observed that a small number of highly popular apps appear to disproportionately influence the average install figures:


In [None]:
print_apps_with_100m_plus_installs("BOOKS_AND_REFERENCE")

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+

Number of apps with 100,000,000 or more installs: 5


Although the genre contains only a limited number of exceptionally popular apps, the market still appears to hold potential. To generate viable app ideas, I focused on examining applications with moderate levels of popularity, specifically those with download counts between one million and one hundred million.


In [None]:
# Loop through the Google Play Store dataset
for app in android_final:
    # Find and print the names of each app in the book and reference genre with their number of installs if the number of installs is between one million and one hundred million
    if app[1] == "BOOKS_AND_REFERENCE" and app[5] in [
        "1,000,000+",
        "5,000,000+",
        "10,000,000+",
        "50,000,000+",
    ]:
        print(app[0], ":", app[5])

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra – free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+
H

This niche appears to be largely dominated by ebook readers, library collections, and dictionary applications. Given the level of existing competition, developing a similar app may not be the most strategic approach.

I also observed a significant number of apps centered around the Quran, which indicates that creating an app based on a well-known book can be a viable and profitable strategy. This suggests that adapting a popular or recently published book into an app may hold potential for success in both the Google Play and iOS App Store markets.

However, since the market is already saturated with basic library apps, it would be important to offer additional features that enhance the user experience. These could include daily excerpts or quotes, audio narration, interactive quizzes, or a discussion forum to foster engagement around the content.


## Conclusion


In this project, I conducted an analysis of mobile app data from both the iOS App Store and Google Play Store with the objective of identifying an app profile that has the potential to generate profit across both platforms.

Based on the findings, I concluded that developing an app based on a popular book, particularly a recent publication, could be a viable strategy for profitability in both markets. Given the existing saturation of library-style apps, it would be necessary to differentiate the product by incorporating additional features. These may include daily quotes, an audio narration of the book, interactive quizzes, or a community forum for discussion.
