# Analyzing Profitable Apps in the App Store and Google Play Store

## Introduction

The main objective of this project is to determine which kinds of mobile apps in the two most popular app markets (the App Store and the Google Play Store) are likely to be profitable. For context, we are data analysts at a company that specializes in developing iOS and Android apps, and our task is to analyze the data to learn which apps are worth building and which are not.

All of the apps we develop are free to install, so our revenue comes from in-app ads and in-app purchases. Under this model, the more users an app has, the higher the revenue. We therefore need to analyze the data to understand which types of apps attract the most downloads.

## Opening and Exploring the Data

In [8]:
from csv import reader

def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row
    
    # display the number of rows and columns in the dataset
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
        
# open the two datasets (AppleStore.csv & googleplaystore.csv)
# the explicit encoding avoids platform-dependent decoding errors,
# and the with-blocks close each file once it has been read
with open("AppleStore.csv", encoding="utf8") as ios_file:
    ios_content = list(reader(ios_file))
ios_header = ios_content[0]
ios_data = ios_content[1:]

with open("googleplaystore.csv", encoding="utf8") as android_file:
    android_content = list(reader(android_file))
android_header = android_content[0]
android_data = android_content[1:]

# display the header and the first three rows of the iOS dataset
# (the row count printed below includes the header row)
print("App Store - First 3 Rows")
explore_data(ios_content, 0, 4, True)
print("")

App Store - First 3 Rows
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7198
Number of columns: 16



<b>Comment:</b> A few columns in the App Store dataset could help with our objective: the rating columns (such as `rating_count_tot` and `user_rating`), the genre (`prime_genre`), and possibly the currency (`currency`).
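
Because each row is a plain list, these columns have to be addressed by position. As a minimal sketch, the header row printed above can be used to look up those positions instead of counting them by hand. The helper name `column_indices` is hypothetical and not part of the original notebook, and the header is copied here only to keep the example self-contained.

```python
# Header row copied from the App Store output above.
ios_header = ['id', 'track_name', 'size_bytes', 'currency', 'price',
              'rating_count_tot', 'rating_count_ver', 'user_rating',
              'user_rating_ver', 'ver', 'cont_rating', 'prime_genre',
              'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

def column_indices(header, names):
    """Map each column name to its position in the header row (hypothetical helper)."""
    return {name: header.index(name) for name in names}

# Columns of interest named in the comment above.
ios_columns = column_indices(
    ios_header,
    ['rating_count_tot', 'user_rating', 'prime_genre', 'currency'],
)
print(ios_columns)
```

With this mapping, a value can be read as `row[ios_columns['prime_genre']]` rather than by a hard-coded index.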

In [9]:
# display the header and the first three rows of the Android dataset
# (the row count printed below includes the header row)
print("Google Play Store - First 3 Rows")
explore_data(android_content, 0, 4, True)

Google Play Store - First 3 Rows
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10842
Number of columns: 13


<b>Comment:</b> Here, several columns look useful for our analysis. We expect the genre (`Genres`), rating (`Rating`), type (`Type`), category (`Category`), and number of installs (`Installs`) to be relevant to the success of an app.
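
The `Installs` values are stored as strings such as `'10,000+'`, so they need to be converted before they can be compared or averaged. The sketch below shows one possible conversion, assuming every value follows the pattern seen in the rows above (digits, thousands separators, and a trailing `+`). The helper name `parse_installs` is hypothetical, and a value that does not match this pattern would raise a `ValueError`.

```python
def parse_installs(installs):
    """Convert an 'Installs' string such as '10,000+' to an int (hypothetical helper).

    The result is a lower bound: '10,000+' means at least 10,000 installs.
    """
    cleaned = installs.replace(',', '').replace('+', '')
    return int(cleaned)

# Sample values copied from the three Google Play rows shown above.
sample_installs = ['10,000+', '500,000+', '5,000,000+']
parsed = [parse_installs(value) for value in sample_installs]
print(parsed)
```

Since each figure is a lower bound rather than an exact count, comparisons between apps based on it are approximate.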