# Analysis of Profitable App Profiles for the App Store and Google Play Markets

## This project is intended to showcase the skills I've learned in Dataquest's 'Introduction to Python' course.

Here I assume the role of an analyst working for an app developer. The company's mission is to build apps that are free to download and install. The main source of revenue for these apps is advertising. The goal of the project is to analyze data from Apple's App Store and Google Play in order to produce insights that help the company's developers understand the characteristics of popular apps.

In [1]:
from csv import reader

opened_file = open('AppleStore.csv', encoding="utf8")
read_file = reader(opened_file)
ios = list(read_file)
opened_file.close() # release the file handle once the rows are in memory
ios_header = ios[0]
ios = ios[1:]


In [2]:
opened_file = open('googleplaystore.csv', encoding="utf8")
read_file = reader(opened_file) # reader was imported in the first cell
android = list(read_file)
opened_file.close() # release the file handle once the rows are in memory
android_header = android[0]
android = android[1:]
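
The two loading cells above repeat the same steps. As a minimal sketch (the helper name `load_csv` is my own and not part of the course material), the steps could be wrapped in one function that also closes the file automatically:

```python
import csv

def load_csv(path, encoding="utf8"):
    """Return (header, rows) for a CSV file; the file is closed on exit."""
    with open(path, encoding=encoding, newline="") as f:
        rows = list(csv.reader(f))
    return rows[0], rows[1:]

# Hypothetical usage, assuming both CSV files sit next to the notebook:
# ios_header, ios = load_csv('AppleStore.csv')
# android_header, android = load_csv('googleplaystore.csv')
```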

**For data exploration, the following function can be used to print rows in a readable way:**

In [3]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

**Below is a sample of data from the AppStore dataset**

In [4]:
print(ios_header)
print('\n')
explore_data(ios, 0 , 3, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


**Below is a sample of data from the Google Play dataset**

In [5]:
print(android_header)
print('\n')
explore_data(android, 0, 3, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


**Documentation for both data sets can be found through the following links:**

Apple:
https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps

Google:
https://www.kaggle.com/lava18/google-play-store-apps


## Data Cleaning, part 1 (duplicate entries)

**Per review of the Google Play documentation & community discussion, row 10472 is missing its 'Category' value, which shifts every later column one position to the left. For the purposes of this project, I will simply delete the entry.**

In [6]:
print(android_header) #android header
print('\n')
print(android[10472]) #problem entry - notice that the row contains only 12 columns when 13 are expected


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


In [7]:
del android[10472]

In [8]:
print(len(android)) #revised length of android after delete

10840


To check for duplicate entries, I'll create two separate lists using loops:

In [9]:
android_unique_apps = []
android_duplicate_apps = []

for app in android:
    app_name = app[0]
    
    if app_name in android_unique_apps:
        android_duplicate_apps.append(app_name)
    else:
        android_unique_apps.append(app_name)

print(len(android_unique_apps))
print(len(android_duplicate_apps))

9659
1181
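
Checking `app_name in android_unique_apps` scans the whole list on every iteration, which gets slow as the list grows. A possible alternative, sketched here with made-up sample rows and a hypothetical helper name, tracks the names already seen in a set:

```python
def split_unique_and_duplicates(rows, name_index):
    """Return (unique_names, duplicate_names), using a set for fast lookups."""
    seen = set()
    unique_names = []
    duplicate_names = []
    for row in rows:
        name = row[name_index]
        if name in seen:
            duplicate_names.append(name)
        else:
            seen.add(name)
            unique_names.append(name)
    return unique_names, duplicate_names

# Made-up sample rows for illustration only
sample = [['Uber', '4928420'], ['Waze', '100'], ['Uber', '4921866']]
unique_names, duplicate_names = split_unique_and_duplicates(sample, 0)
print(len(unique_names), len(duplicate_names))  # 2 1
```

Called as `split_unique_and_duplicates(android, 0)`, this should give the same two lists as the loop above.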


Further investigating duplicates in the Google Play Store, I examine duplicate Uber entries:

In [10]:
for app in android:
    name = app[0]
    if name == 'Uber':
        print(app)

['Uber', 'MAPS_AND_NAVIGATION', '4.2', '4928420', 'Varies with device', '100,000,000+', 'Free', '0', 'Everyone', 'Maps & Navigation', 'August 6, 2018', 'Varies with device', 'Varies with device']
['Uber', 'MAPS_AND_NAVIGATION', '4.2', '4921866', 'Varies with device', '100,000,000+', 'Free', '0', 'Everyone', 'Maps & Navigation', 'August 2, 2018', 'Varies with device', 'Varies with device']


The results above tell us that the two entries were collected at different times (4928420 reviews vs 4921866 reviews). Below, I will use this information to keep the entry with the most reviews and delete the entries with fewer reviews (older).

**Per review of the App Store documentation & community discussion, there are potentially duplicate entries. Below I will identify any duplicates using lists**

In [11]:
ios_unique_apps = []
ios_duplicate_apps = []

for app in ios:
    app_name = app[1]
    
    if app_name in ios_unique_apps:
        ios_duplicate_apps.append(app_name)
    else:
        ios_unique_apps.append(app_name)

print(len(ios_unique_apps))
print(len(ios_duplicate_apps))

7195
2


Further investigating duplicates in the App store:

In [12]:
print(ios_duplicate_apps)

['Mannequin Challenge', 'VR Roller Coaster']


In [13]:
for app in ios:
    app_name = app[1]
    
    if app_name == 'Mannequin Challenge':
        print(app)

['1173990889', 'Mannequin Challenge', '109705216', 'USD', '0.0', '668', '87', '3.0', '3.0', '1.4', '9+', 'Games', '37', '4', '1', '1']
['1178454060', 'Mannequin Challenge', '59572224', 'USD', '0.0', '105', '58', '4.0', '4.5', '1.0.1', '4+', 'Games', '38', '5', '1', '1']


Different version of app? 1st entry is Content Rating = 9+. 2nd entry = 4+

In [14]:
for app in ios:
    app_name = app[1]
    
    if app_name == 'VR Roller Coaster':
        print(app)

['952877179', 'VR Roller Coaster', '169523200', 'USD', '0.0', '107', '102', '3.5', '3.5', '2.0.0', '4+', 'Games', '37', '5', '1', '1']
['1089824278', 'VR Roller Coaster', '240964608', 'USD', '0.0', '67', '44', '3.5', '4.0', '0.81', '4+', 'Games', '38', '0', '1', '1']


Appears to be two entries from different times, similar to the Uber duplicate within the Google Play store above

Further investigation reveals that **there are in fact no duplicates in the App Store**. The suspected duplicates of 'Mannequin Challenge' and 'VR Roller Coaster' are distinct apps, as evidenced by their different ids.

In [15]:
ios_unique_apps = []
ios_duplicate_apps = []

for app in ios:
    app_id = app[0]
    
    if app_id in ios_unique_apps:
        ios_duplicate_apps.append(app_id)
    else:
        ios_unique_apps.append(app_id)

print(len(ios_unique_apps))
print(len(ios_duplicate_apps))

7197
0


## Data Cleaning, part 1.2 (Google)

I will create a dictionary that lists each unique app and the maximum number of reviews. This will allow us to create a new clean data set for android that references the max number of reviews

In [16]:
reviews_max = {}

for row in android:
    name = row[0]
    n_reviews = float(row[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max:
        reviews_max[name] = n_reviews


In [17]:
len(reviews_max)

9659

The length of the dictionary (9659) is equal to the expected value of unique apps identified above (9659)

In [18]:
android_clean = []
already_added = []

for row in android:
    name = row[0]
    n_reviews = float(row[3])

    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(row)
        already_added.append(name)
     
    


In [19]:
explore_data(android_clean, 0, 5, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of rows: 9659
Number of columns: 13


The new dataset 'android_clean' includes only one entry for each unique app
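
The two-step approach above (build `reviews_max`, then filter) could also be done in a single pass. This is only a sketch under my own naming (`keep_max_reviews` is hypothetical), using made-up four-column rows:

```python
def keep_max_reviews(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row seen with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Replace the stored row only when this one has strictly more reviews
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

# Made-up sample rows for illustration only
sample = [
    ['Uber', 'MAPS_AND_NAVIGATION', '4.2', '4921866'],
    ['Waze', 'MAPS_AND_NAVIGATION', '4.6', '100'],
    ['Uber', 'MAPS_AND_NAVIGATION', '4.2', '4928420'],
]
deduplicated = keep_max_reviews(sample)
print(len(deduplicated))  # 2
```

One difference to be aware of: the rows come back in the order each name first appears, whereas `android_clean` follows the position of each max-review row, so the two lists may be ordered differently even though they hold the same apps.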

## Data Cleaning part 2.0 (foreign-language apps)

Both datasets include instances of foreign-language apps, as evidenced by the output below. Because this company is only interested in English-language apps, we will remove these entries from our datasets.

In [20]:
print(ios[813][0:2])
print(ios[6731][0:2])
print('\n')
print(android_clean[4412][0])

print(android_clean[4412][0])

['445375097', '爱奇艺PPS -《欢乐颂2》电视剧热播']
['1120021683', '【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜']


中国語 AQリスニング
中国語 AQリスニング


The loop below evaluates each character in a string and checks whether its code point (the number returned by ord) is less than or equal to 127, the last value in the ASCII range. If the check returns True, every character falls within ASCII, the character set commonly used for English text. A False result means that at least one character falls outside that range.
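
As a quick illustration of what `ord` returns (the string 'Go™' is just an example I picked from one of the app names in this notebook):

```python
# Print each character, its code point, and whether it is within ASCII
for character in 'Go™':
    print(character, ord(character), ord(character) <= 127)

codes = [ord(character) for character in 'Go™']
print(codes)  # [71, 111, 8482]
```

The letters fall inside the ASCII range, while the trademark symbol does not.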

In [21]:
ASCII_agg = []

for row in android_clean:
    name = row[0]
    
    if name == 'Instagram':

        for character in name:
            
            ASCII_val = ord(character)
            
            if ASCII_val <= 127:
                ASCII_agg.append(True)
            else:
                ASCII_agg.append(False)
                
print(all(ASCII_agg))
            
            
            
            
            
             

True


Converting the loop above to function:

In [22]:
def is_english(name):
    
    ASCII_agg = []
    for character in name:
        
        ASCII_val = ord(character)
            
        if ASCII_val <= 127:
            ASCII_agg.append(True)
        else:
            ASCII_agg.append(False)
                
    return all(ASCII_agg)


        

Testing the function:

In [23]:
print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
False
False
False


The function appears to work as designed, but some English app names use emojis or other symbols (™, — (em dash), – (en dash), etc.) that fall outside of the ASCII range. Because of this, we'll remove useful apps if we use the function in its current form.

Note: above I designed the function as a loop that appends to an initially empty list. After the loop completes, I use the all function to evaluate the list in aggregate. There is a much simpler solution, however; see below for an alternative:

In [24]:
def is_english_alt(name):
    
    for character in name:
        if ord(character) > 127:
            return False
        
    return True

In [25]:
print(is_english_alt('Instagram'))
print(is_english_alt('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english_alt('Docs To Go™ Free Office Suite'))
print(is_english_alt('Instachat 😜'))

True
False
False
False


To minimize the impact of data loss, we'll only remove an app if its name has more than three characters with corresponding numbers falling outside the ASCII range. Not perfect, but effective:

In [26]:
def is_english_alt(name):
    
    running_list = []
    
    for character in name:
        if ord(character) > 127:
            running_list.append('False')
    
    if len(running_list) > 3:
        return False
        
    return True

In [27]:
print(is_english_alt('Docs To Go™ Free Office Suite'))
print(is_english_alt('Instachat 😜'))
print(is_english_alt('爱奇艺PPS -《欢乐颂2》电视剧热播'))


True
True
False


As expected, the modified function appropriately identifies the first two apps as English and the third as non-English. Below I will apply this to the android_clean and ios data sets in order to create two new English-only lists.
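
The threshold rule does not need an intermediate list. A minimal sketch that counts the non-ASCII characters directly (`is_probably_english` and the `max_non_ascii` parameter are hypothetical names of mine):

```python
def is_probably_english(name, max_non_ascii=3):
    """Return True when at most max_non_ascii characters fall outside ASCII."""
    non_ascii = sum(1 for character in name if ord(character) > 127)
    return non_ascii <= max_non_ascii

print(is_probably_english('Docs To Go™ Free Office Suite'))  # True
print(is_probably_english('Instachat 😜'))  # True
print(is_probably_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))  # False
```

Keeping the threshold as a parameter makes it easy to try stricter or looser cut-offs without rewriting the function.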

In [28]:
android_clean_english = []
android_clean_non_english = []
ios_clean_english = []
ios_clean_non_english = []

for row in android_clean:
    name = row[0]
    if is_english_alt(name):
        android_clean_english.append(row)
    else:
        android_clean_non_english.append(row)
        
for row in ios:
    name = row[1]
    if is_english_alt(name):
        ios_clean_english.append(row)
    else:
        ios_clean_non_english.append(row)
        

In [29]:
print('android_clean_english:', len(android_clean_english))
print('android_clean_non_english:', len(android_clean_non_english))
print('Total:', len(android_clean_english) + len(android_clean_non_english))
print('ios_clean_english:', len(ios_clean_english))
print('ios_clean_non_english:', len(ios_clean_non_english))
print('Total:', len(ios_clean_english) + len(ios_clean_non_english))

android_clean_english: 9614
android_clean_non_english: 45
Total: 9659
ios_clean_english: 6183
ios_clean_non_english: 1014
Total: 7197


Using the is_english_alt function, I filtered out 45 non-English apps from the android dataset and 1,014 apps from the ios dataset. (Sidenote: ios has many more non-English apps than android does... why?)

In [30]:
print(ios_header)
explore_data(ios_clean_english, 0, 2)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']




In [31]:
ios_final = []

for row in ios_clean_english:
    price = row[4]
    if price == '0.0':
        ios_final.append(row)

In [32]:
explore_data(ios_final, 0, 2, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows: 3222
Number of columns: 16


Using a for loop, I kept only the apps with a price equal to '0.0' (free) and added them to the new list ios_final. The total number of free ios apps ready for analysis is 3,222 (compared to 6,183 total apps in the ios_clean_english dataset).
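
Comparing the price as text works only when every free app uses exactly the same string. A more forgiving check, sketched under the assumption that paid prices may carry a leading '$' (not shown in the rows printed here), converts the text to a number first. The helper name `is_free` is hypothetical:

```python
def is_free(price_text):
    """Return True when a price string such as '0', '0.0' or '$0.00' means free."""
    cleaned = price_text.strip().lstrip('$')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        return False  # unparseable prices are treated as not free

print(is_free('0.0'), is_free('0'), is_free('$4.99'))  # True True False
```

The same function could then be used for both stores, for example `[row for row in ios_clean_english if is_free(row[4])]`.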

In [33]:
print(android_header)
explore_data(android_clean_english, 0, 2, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 9614
Number of columns: 13


In [34]:
android_final = []

for row in android_clean_english:
    price = row[7]
    if price == '0':
        android_final.append(row)

In [35]:
explore_data(android_final,0,5,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of rows: 8864
Number of columns: 13


Using a for loop, I kept only the apps with a price equal to '0' (free) and added them to the new list android_final. The total number of free android apps ready for analysis is 8,864 (compared to 9,614 total apps in the android_clean_english dataset).
**end data cleaning**

# Analysis:

The goal of the company is to develop an app for release on both the Google Play platform and the App Store platform. To minimize development costs, the company's validation strategy is to first develop a 'minimal' android version of the app and add it to Google Play. If the app is successful, the company will devote resources to develop the app further and build an iOS version for release on the App Store.

## Analysis 1.1: Identifying the most common app genres on the Google Play platform

In [36]:
android_genres = {}

for app in android_final:
    genre = app[1]
    if genre in android_genres:
        android_genres[genre] += 1
    else:
        android_genres[genre] = 1
        
print(android_genres)
        

{'ART_AND_DESIGN': 57, 'AUTO_AND_VEHICLES': 82, 'BEAUTY': 53, 'BOOKS_AND_REFERENCE': 190, 'BUSINESS': 407, 'COMICS': 55, 'COMMUNICATION': 287, 'DATING': 165, 'EDUCATION': 103, 'ENTERTAINMENT': 85, 'EVENTS': 63, 'FINANCE': 328, 'FOOD_AND_DRINK': 110, 'HEALTH_AND_FITNESS': 273, 'HOUSE_AND_HOME': 73, 'LIBRARIES_AND_DEMO': 83, 'LIFESTYLE': 346, 'GAME': 862, 'FAMILY': 1676, 'MEDICAL': 313, 'SOCIAL': 236, 'SHOPPING': 199, 'PHOTOGRAPHY': 261, 'SPORTS': 301, 'TRAVEL_AND_LOCAL': 207, 'TOOLS': 750, 'PERSONALIZATION': 294, 'PRODUCTIVITY': 345, 'PARENTING': 58, 'WEATHER': 71, 'VIDEO_PLAYERS': 159, 'NEWS_AND_MAGAZINES': 248, 'MAPS_AND_NAVIGATION': 124}


## Analysis 1.2: Identifying the most common app genres on the App Store platform

In [37]:
ios_genres = {}

for app in ios_final:
    genre = app[11]
    if genre in ios_genres:
        ios_genres[genre] += 1
    else:
        ios_genres[genre] = 1
        
print(ios_genres)

{'Social Networking': 106, 'Photo & Video': 160, 'Games': 1874, 'Music': 66, 'Reference': 18, 'Health & Fitness': 65, 'Weather': 28, 'Utilities': 81, 'Travel': 40, 'Shopping': 84, 'News': 43, 'Navigation': 6, 'Lifestyle': 51, 'Entertainment': 254, 'Food & Drink': 26, 'Sports': 69, 'Book': 14, 'Finance': 36, 'Education': 118, 'Productivity': 56, 'Business': 17, 'Catalogs': 4, 'Medical': 6}


Above I created dictionaries using a for loop. Below I will create a function that will automate this process.

In [54]:
def freq_table(dataset, index):
    
    dataset_freq = {}
    
    for row in dataset:
        variable = row[index]
        if variable in dataset_freq:
            dataset_freq[variable] += 1
        else:
            dataset_freq[variable] = 1
    return dataset_freq
            
    

Below is a predefined function provided by Dataquest that converts the frequency dictionary into a list of tuples, sorts it by value in descending order, and prints the result.

In [44]:
#not my code:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
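
For comparison, the standard library's `collections.Counter` can build and sort a frequency table in one step. This is a sketch with a hypothetical function name and made-up sample rows, not the Dataquest implementation:

```python
from collections import Counter

def sorted_freq_table(dataset, index):
    """Return (value, count) pairs for one column, most frequent first."""
    return Counter(row[index] for row in dataset).most_common()

# Made-up sample rows for illustration only
sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music']]
for value, count in sorted_freq_table(sample, 1):
    print(value, ':', count)
# Games : 2
# Music : 1
```

Ties may come out in a different order than with `display_table`, which sorts `(count, key)` tuples in reverse, while `most_common` keeps tied values in the order they were first seen.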

Using the display_table function provided, below are the most common genres of apps in the App Store:

In [52]:
display_table(ios_final, 11) #'prime_genre'

Games : 1874
Entertainment : 254
Photo & Video : 160
Education : 118
Social Networking : 106
Shopping : 84
Utilities : 81
Sports : 69
Music : 66
Health & Fitness : 65
Productivity : 56
Lifestyle : 51
News : 43
Travel : 40
Finance : 36
Weather : 28
Food & Drink : 26
Reference : 18
Business : 17
Book : 14
Navigation : 6
Medical : 6
Catalogs : 4


Results for the App Store... the most common genre for free, English-language apps is Games (1,874 apps), with Entertainment a distant second (254 apps). This is not surprising, as there are many popular games, whereas social media is dominated by a few ubiquitous platforms.

Using the display_table function provided, below are the most common genres of apps in the Google Play store:

In [50]:
display_table(android_final, 1) #'category'


FAMILY : 1676
GAME : 862
TOOLS : 750
BUSINESS : 407
LIFESTYLE : 346
PRODUCTIVITY : 345
FINANCE : 328
MEDICAL : 313
SPORTS : 301
PERSONALIZATION : 294
COMMUNICATION : 287
HEALTH_AND_FITNESS : 273
PHOTOGRAPHY : 261
NEWS_AND_MAGAZINES : 248
SOCIAL : 236
TRAVEL_AND_LOCAL : 207
SHOPPING : 199
BOOKS_AND_REFERENCE : 190
DATING : 165
VIDEO_PLAYERS : 159
MAPS_AND_NAVIGATION : 124
FOOD_AND_DRINK : 110
EDUCATION : 103
ENTERTAINMENT : 85
LIBRARIES_AND_DEMO : 83
AUTO_AND_VEHICLES : 82
HOUSE_AND_HOME : 73
WEATHER : 71
EVENTS : 63
PARENTING : 58
ART_AND_DESIGN : 57
COMICS : 55
BEAUTY : 53


In [51]:
display_table(android_final, 9) #'genres'

Tools : 749
Entertainment : 538
Education : 474
Business : 407
Productivity : 345
Lifestyle : 345
Finance : 328
Medical : 313
Sports : 307
Personalization : 294
Communication : 287
Action : 275
Health & Fitness : 273
Photography : 261
News & Magazines : 248
Social : 236
Travel & Local : 206
Shopping : 199
Books & Reference : 190
Simulation : 181
Dating : 165
Arcade : 164
Video Players & Editors : 157
Casual : 156
Maps & Navigation : 124
Food & Drink : 110
Puzzle : 100
Racing : 88
Role Playing : 83
Libraries & Demo : 83
Auto & Vehicles : 82
Strategy : 81
House & Home : 73
Weather : 71
Events : 63
Adventure : 60
Comics : 54
Beauty : 53
Art & Design : 53
Parenting : 44
Card : 40
Casino : 38
Trivia : 37
Educational;Education : 35
Board : 34
Educational : 33
Education;Education : 30
Word : 23
Casual;Pretend Play : 21
Music : 18
Racing;Action & Adventure : 15
Puzzle;Brain Games : 15
Entertainment;Music & Video : 15
Casual;Brain Games : 12
Casual;Action & Adventure : 12
Arcade;Action & Advent

Google Play results... it's interesting to note that Google Play includes a very granular 'genres' column in addition to a more aggregated 'category' column. Consistent with the App Store, games are among the most common apps on Google Play. One difference that jumps out at me is the prevalence of the 'Tools' category (750) in the Google Play store compared to only 81 apps in the 'Utilities' prime_genre of the App Store. If the goal is to produce a cross-platform app, one might avoid developing a Tool/Utility.

## Analysis 2.1: Identifying the most popular apps by genre on the App Store

Below I calculate the average number of user ratings by App Genre in the App Store as a proxy for # of downloads:

In [68]:
ios_genre_freq = freq_table(ios_final, -5)

for genre in ios_genre_freq:
    total = 0
    len_genre = 0
    for app in ios_final:
        genre_app = app[-5]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1    
    avg_ratings = total / len_genre
    print(genre, ':', avg_ratings)
  





Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0


From this table we can conclude that the most popular apps by genre in the App Store are as follows:
    1) Navigation
    2) Reference
    3) Social Networking
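
The nested loop above rescans `ios_final` once per genre. A single-pass version is sketched below with hypothetical names and made-up sample rows; it accumulates a running total and count per genre, then divides:

```python
from collections import defaultdict

def average_by_genre(rows, genre_index, value_index):
    """Return {genre: average of the numeric column} in one pass over rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        genre = row[genre_index]
        totals[genre] += float(row[value_index])
        counts[genre] += 1
    return {genre: totals[genre] / counts[genre] for genre in totals}

# Made-up sample rows for illustration only
sample = [['a', 'Games', '10'], ['b', 'Games', '20'], ['c', 'Music', '5']]
averages = average_by_genre(sample, 1, 2)

# Print from highest to lowest average so the top genres are easy to spot
for genre, avg in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', avg)
# Games : 15.0
# Music : 5.0
```

For the App Store data, the call would be `average_by_genre(ios_final, 11, 5)`.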
    

## Analysis 2.2: Identifying the most popular apps by genre on Google Play

For the Google Play store we have data on the number of installations for each app so we can take a different approach
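
The Installs column stores values as text such as '10,000+', so they need to be converted to numbers before averaging. A minimal sketch of that conversion (`parse_installs` is a hypothetical helper name):

```python
def parse_installs(installs_text):
    """Convert an install bucket such as '10,000+' to the integer 10000."""
    return int(installs_text.replace(',', '').replace('+', ''))

print(parse_installs('10,000+'))       # 10000
print(parse_installs('100,000,000+'))  # 100000000
```

Since each value appears to be a lower bound (the trailing '+'), the resulting averages are best read as rough estimates rather than exact install counts.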

In [70]:
android_genre_freq = freq_table(android_final, 1)

for genre in android_genre_freq:
    total = 0
    len_genre = 0
    for app in android_final:
        genre_app = app[1]
        if genre_app == genre:
            n_installs = app[5]
            n_installs = n_installs.replace(',','')
            n_installs = n_installs.replace('+','')
            total += float(n_installs)
            len_genre += 1    
    avg_installs = total / len_genre # average installs, not ratings
    print(genre, ':', avg_installs)
  

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_

From this table we can conclude that the most popular apps by genre in the Google Play store are as follows: 1) Communication 2) Video Players 3) Social