# App Ad Analysis

I will be analyzing apps that are free and get most of their revenue through in app ads. 
The objective is to attract more users to a increase revenue for a company I'm working with. To do that, we will need to see what type of free apps perform well on both the **IOS App Store** and the **Google Play Store**.

In [27]:
from csv import reader        #Opening up the IOS App Store
file = open('AppleStore.csv')
file_reader = reader(file)
ios = list(file_reader)
ios_header = ios[0]
ios = ios[1:]

file = open('googleplaystore.csv') #Opening up the Google Play Store
file_reader = reader(file)
ggle = list(file_reader)
ggle_header = ggle[0]
ggle = ggle[1:]

In [28]:
#This Function will be used to display the data

def explore_data(dataset, start, end, rows_and_columns=False):
    
    dataset_slice = dataset[start:end]
    for x in dataset_slice:
        print(x)
        print('\n') # This is to skip a line
        
    if rows_and_columns:      #Used to count the number of rows and columns
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))


### Now to display the two data sets

I'll start off with displaying the header and three of the rows of the **IOS App Store**.

In [29]:

print(ios_header)
print('\n')
print(explore_data(ios, 0, 3, True))

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16
None


Now to display the header and three of the rows of the **Google Play Store**.

In [30]:
print(ggle_header)
print('\n')
print(explore_data(ggle, 0, 3, True))

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13
None


### Now to clean the data!

Some of the rows can have more columns then they are supposed to have. It's important that any error row is found, and deleted. 

In [31]:
for row in ggle:
    gheader_length = len(ggle_header)
    g_row_length = len(row)
    
    if gheader_length != g_row_length:
        print(row)
        print(ggle.index(row))
        

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
10472


There was only a faulty row in the **Google Play Store** data set. Now that the faulty row has been found, it's time to delete it. 

In [32]:
del ggle[10472]

##### Now to look for any rows that may be duplicates.

We're going to check one app name to see if there are any duplicates of it. 

In [33]:
for x in ggle:
    name = x[0]
    if name == 'Instagram':
        print(x)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


It turns out there are duplicates as we can see above. Now to separate them from the apps that are unique. 

In [34]:
duplicate_apps = []
unique_apps = []

for app in ggle:    #Going to loop throught the google play dataset
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name) #Putting the duplicates in a list
    else:
        unique_apps.append(name) #Putting the unique apps in a list
        
print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:15])

Number of duplicate apps: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


I won't be deleting the duplicates randomly. Instead, I'll only keep the duplicate app with the highest rating score.

In [35]:
ggle_dict = {}


for app in ggle:
    name = app[0]
    ratings = float(app[3])
    
    if name in ggle_dict and ggle_dict[name] < ratings:
        ggle_dict[name]  = ratings
        
    elif name not in ggle_dict:
        ggle_dict[name] = ratings
        

Now let's check to see if the expected length equals the actual length from the dictionary.

In [36]:
print('Expected length:', len(ggle) - 1181)
print('Actual length:',  len(ggle_dict))

Expected length: 9659
Actual length: 9659


It looks like they match up. 

#### Now lets actually clean the dataset of all duplicates and display the results. 

In [37]:
android_clean = []
already_added = []

for x in ggle:
    name = x[0]
    ratings = float(x[3])
    
    if (ggle_dict[name] == ratings) and (name not in already_added):
        android_clean.append(x)
        already_added.append(name)

In [38]:
explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


#### Now I'm going to try and delete any data that uses characters outside of the english language so we can analyze apps that are targeted towards english speakers. 

In [39]:
def character_check(string):
    
    instances = 0
    
    for x in string:
        char = ord(x)
        english_c = 127
        
        if char > english_c:
            instances += 1 
            
    if instances > 3:
            return False
    return True
        
        
print(character_check('Instagram'))
print(character_check('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(character_check('Docs To Go™ Free Office Suite'))
print(character_check('Instachat 😜'))

        

True
False
True
True


In [40]:
ggle_english = []
ios_english = []

for x in android_clean:
    name = x[0]
    if character_check(name):
        ggle_english.append(x)
        
for y in ios:
    name = y[1]
    if character_check(name):
        ios_english.append(y)
        
print(explore_data(ggle_english, 0, 3, True))
print('\n')
print(explore_data(ios_english, 0, 3, True))

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13
None


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '

#### Now we need to isolate the free apps in the dataset so we can analyze

In [41]:
ggle_clean = []
ios_clean = []



for x in ggle_english:
    price = x[7]
    price_2 = x[6]
    if price == '0' or price_2 == 'Free':
        ggle_clean.append(x)
        
for y in ios_english:
    i_price = y[4]
    if i_price == '0.0':
        ios_clean.append(y)
    
        

        
print(explore_data(ggle_clean, 0, 3, True))
print('\n')
print(explore_data(ios_clean, 0, 3, True))

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 8864
Number of columns: 13
None


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '

We want apps that perform well on both the google play store and the App Store.

* Reason being:
* 1. We usually upload apps to the **Google Play Store** first.
* 2. If that apps does well we add it to the **IOS App Store**.
* 3. if the app is doing well on both stores that's a sign it's doing well with users. 


We will now display frequency table percentages of the genres of apps from the **IOS App Store** and the **Google Play Store**. 

In [42]:
def freq_table(dataset, index):
    
    freq_dict = {}
    
    for x in dataset:
        genre = x[index]
        total = len(dataset)
        
        if genre in freq_dict:
            freq_dict[genre] += 1
        else:
            freq_dict[genre] = 1
            
    perc_freq = {}
    for y in freq_dict:     #Now looping through the dictionary
        percentage = (freq_dict[y] / total) * 100
        perc_freq[y] = round(percentage, 2)
        
    return perc_freq

def display_table(dataset, index):
    freq_dict = freq_table(dataset, index)
    table_display = []
    for y in freq_dict:
        key_val_as_tuple = (freq_dict[y], y)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse=True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])



print(display_table(ios_clean, -5))
print('\n')
print(display_table(ggle_clean, 1))
print('\n')
print(display_table(ggle_clean, 5))
        
        
        

Games : 58.16
Entertainment : 7.88
Photo & Video : 4.97
Education : 3.66
Social Networking : 3.29
Shopping : 2.61
Utilities : 2.51
Sports : 2.14
Music : 2.05
Health & Fitness : 2.02
Productivity : 1.74
Lifestyle : 1.58
News : 1.33
Travel : 1.24
Finance : 1.12
Weather : 0.87
Food & Drink : 0.81
Reference : 0.56
Business : 0.53
Book : 0.43
Navigation : 0.19
Medical : 0.19
Catalogs : 0.12
None


FAMILY : 18.91
GAME : 9.72
TOOLS : 8.46
BUSINESS : 4.59
LIFESTYLE : 3.9
PRODUCTIVITY : 3.89
FINANCE : 3.7
MEDICAL : 3.53
SPORTS : 3.4
PERSONALIZATION : 3.32
COMMUNICATION : 3.24
HEALTH_AND_FITNESS : 3.08
PHOTOGRAPHY : 2.94
NEWS_AND_MAGAZINES : 2.8
SOCIAL : 2.66
TRAVEL_AND_LOCAL : 2.34
SHOPPING : 2.25
BOOKS_AND_REFERENCE : 2.14
DATING : 1.86
VIDEO_PLAYERS : 1.79
MAPS_AND_NAVIGATION : 1.4
FOOD_AND_DRINK : 1.24
EDUCATION : 1.16
ENTERTAINMENT : 0.96
LIBRARIES_AND_DEMO : 0.94
AUTO_AND_VEHICLES : 0.93
HOUSE_AND_HOME : 0.82
WEATHER : 0.8
EVENTS : 0.71
PARENTING : 0.65
ART_AND_DESIGN : 0.64
COMICS : 0.62


It seems like the App Store has majority gaming apps. This doesn't mean its the most popular genre to the users. The same thing goes for familly and education. We might have to take a look at the `user_count_ratings

In [43]:
prime_genre = freq_table(ios_english, -5)

ios_dict = {}
for app in prime_genre:
    total = 0
    len_genre = 0
    
    for x in ios_english:
        genre_app = x[-5]
        if genre_app == app:
            users = float(x[5])
            total += users
            len_genre += 1
            
    avg_users = total / len_genre
    
    ios_dict[app] = round(avg_users, 2)
    
    
    sort_ios = sorted(ios_dict.items(), key=lambda x:x[1], reverse=True)
for y in sort_ios:
    print(y[0], y[1])


Social Networking 60253.85
Music 29047.11
Reference 27037.19
Shopping 26635.01
Finance 23353.53
Weather 23145.25
Food & Drink 19934.39
Navigation 19370.82
Travel 19030.18
News 16980.32
Games 15586.76
Sports 15350.91
Photo & Video 14688.72
Health & Fitness 10802.16
Book 10359.2
Lifestyle 8930.37
Entertainment 8862.41
Productivity 8508.09
Utilities 7927.53
Business 5149.32
Catalogs 3465.0
Education 2472.28
Medical 648.95


Social networking has the highest average of users on the IOS Apple Store. Games comes in at third. I came to the conclusion that social networking is the best genre to focus on because the although it only takes up about 3.29 percent of the genres'. It has the highest amount of users on average.   

In [44]:
category_freq = freq_table(ggle_clean, 1)
display_freq = {}
for category in category_freq:
    total = 0
    len_category = 0
    for x in ggle_clean:
        category_app = x[1]
        
        if category_app == category:
            
            installs = x[5]
            installs = installs.replace('+', '')
            installs = installs.replace(',', '')
            total += float(installs)
            len_genre += 1
            
            
    avg_installs = total / len_genre      #Average calculations
    
    display_freq[category] = round(avg_installs, 2) #converting the results into a dictionary
    
    sort_avg = sorted(display_freq.items(), key=lambda x:x[1], reverse=True)
for i in sort_avg:
    print(i[0],i[1])     #displaying the results in descending order


COMMUNICATION 9580647.74
BOOKS_AND_REFERENCE 4133707.84
GAME 3688407.75
ART_AND_DESIGN 1451552.56
FAMILY 1164485.0
TOOLS 1067893.95
SOCIAL 935218.46
BUSINESS 860372.95
PHOTOGRAPHY 735819.98
PRODUCTIVITY 704149.46
ENTERTAINMENT 657448.5
HEALTH_AND_FITNESS 501776.39
VIDEO_PLAYERS 461850.31
TRAVEL_AND_LOCAL 423449.98
AUTO_AND_VEHICLES 331750.38
NEWS_AND_MAGAZINES 270311.18
FINANCE 240064.94
SHOPPING 230812.36
PERSONALIZATION 194065.47
LIFESTYLE 178886.89
SPORTS 165218.08
EDUCATION 132992.96
BEAUTY 127685.68
DATING 106996.78
FOOD_AND_DRINK 105552.72
MAPS_AND_NAVIGATION 56619.11
COMICS 51989.77
WEATHER 43127.67
HOUSE_AND_HOME 41327.58
LIBRARIES_AND_DEMO 21764.19
EVENTS 10186.96
MEDICAL 6699.63
PARENTING 3799.47


Communication has the highest amount of users in the Google Play Store. This mirrors "Social Networking" in the IOS App Store. 

In [45]:
for app in ggle_clean: 
    if app[1] == 'COMMUNICATION':
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
Messenger for SMS : 10,000,000+
My Tele2 : 5,000,000+
imo beta free calls and text : 100,000,000+
Contacts : 50,000,000+
Call Free – Free Call : 5,000,000+
Web Browser & Explorer : 5,000,000+
Browser 4G : 10,000,000+
MegaFon Dashboard : 10,000,000+
ZenUI Dialer & Contacts : 10,000,000+
Cricket Visual Voicemail : 10,000,000+
TracFone My Account : 1,000,000+
Xperia Link™ : 10,000,000+
TouchPal Keyboard - Fun Emoji & Android Keyboard : 10,000,000+
Skype Lite - Free Video Call & Chat : 5,000,000+
My magenta : 1,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Seznam.cz : 1,000,000+
Antillean Gold Telegram (original version) : 100,000+
AT&T Visual Voicemail : 10,000,000+
GMX Mail : 10,000,000+
Omlet Chat : 10,000,000+
My Vodacom SA : 5,000,000+
Microsoft Edge : 5,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Calls & Text by Mo+ : 5,000,000+
free 

While communication/social networking may technically be the most popular, it doesn't mean it actually is. Tech juggernauts like Facebook or Google are some of the few outliers skewing the data. They are too big to compete against so I don't think nit would be a wise decision to create apps in the Social Media space. 

## Let's dig a little deeper

Let's work our way down from `COMMUNICATION` to `BOOKS AND REFERENCE`. 

In [46]:
for app in ggle_clean:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

# Conclusion

The `Books and Reference` genre seems to be a more balanced in regards to users using a more diverse set of apps. Creating a book app looks to be the best option to attact a the most amount of users for the ad revenue. 