## Data Analysis for Attractive Apps

Our goal for this project is to analyze data to help our developers understand what type of apps are likely to attract more users.

Here we write the code that reads the data. The two datasets we 
are focusing on are AppleStore.csv and googleplaystore.csv.

In [1]:
from csv import reader

# Read each CSV into a list of rows; the with block closes the file afterwards.
with open('AppleStore.csv', encoding='utf8') as opened_file:
    ios_data = list(reader(opened_file))

with open('googleplaystore.csv', encoding='utf8') as opened_file:
    android_data = list(reader(opened_file))

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
        
        

In [3]:
explore_data(android_data[1:], 0, 3, True)
print('\n')
explore_data(ios_data[1:], 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+

Below is code to print the column names. Once we see the names, it will be easier to tell which columns we should use in our analysis.


First let's look at the column names in the Apple file, followed by those in the Google Play file:

In [4]:
print(ios_data[0])
print('\n')
print(android_data[0])

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
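
Once the column names are known, it can help to look columns up by name instead of remembering their positions. The following is only a sketch: `column_index` and `android_cols` are hypothetical names that do not appear elsewhere in this notebook, and the header list is copied from the output above.

```python
def column_index(header):
    """Map each column name to its position in the header row."""
    return {name: position for position, name in enumerate(header)}

# Header row copied from the Google Play output above.
android_header = ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs',
                  'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated',
                  'Current Ver', 'Android Ver']

android_cols = column_index(android_header)
print(android_cols['Reviews'])  # 3
print(android_cols['Price'])    # 7
```

With the real data, `column_index(android_data[0])` would build the same mapping directly from the header row.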


---


We are now going to clean the data from both Apple and Google. We want to remove apps that cost money and apps that are not in English. 

After looking at the rows, I noticed that row 10474 of the Google Play file (counting the header as row 1) has an error. The Category entry is missing, so the Rating value sits in the Category position and every column to its right is shifted one place to the left. 

I must use index 10473 to get that row, because Python lists are zero-indexed: the header row is at index 0, so row 10474 of the file is at index 10473. 

In [5]:
print(android_data[10473])

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
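
The row printed above has 12 values while the header has 13, which suggests a way to find such rows without reading through the file by eye. A minimal sketch, assuming a malformed row always shows up as a wrong column count; `find_malformed_rows` is a hypothetical helper, and the sample data is made up for illustration.

```python
def find_malformed_rows(dataset):
    """Return (index, row) pairs whose length differs from the header's length.

    Assumes dataset[0] is the header row.
    """
    expected = len(dataset[0])
    return [
        (index, row)
        for index, row in enumerate(dataset[1:], start=1)
        if len(row) != expected
    ]

# Tiny made-up dataset: the second data row is missing one value.
sample = [
    ['App', 'Category', 'Rating'],
    ['Good App', 'TOOLS', '4.2'],
    ['Broken App', '1.9'],
]

print(find_malformed_rows(sample))  # [(2, ['Broken App', '1.9'])]
```

Calling `find_malformed_rows(android_data)` would report indices that can be passed straight to `android_data[...]`, because the header is counted at index 0.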


We will remove the row using the del statement.

In [6]:
del android_data[10473]

Let's check that it worked. Reprinting index 10473 should now show a different app: 

In [7]:
print(android_data[10473])

['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']


**It worked!!**
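
One thing to watch with `del` in a notebook: running the cell a second time deletes whichever row now sits at that index, even though that row is valid. A possible guard, sketched below, deletes only when the row's length does not match the header. `delete_row_if_malformed` is a hypothetical helper and the sample data is made up.

```python
def delete_row_if_malformed(dataset, index):
    """Delete dataset[index] only if its length differs from the header's.

    Returns True if a row was deleted, False otherwise.
    Assumes dataset[0] is the header row.
    """
    if len(dataset[index]) != len(dataset[0]):
        del dataset[index]
        return True
    return False

sample = [
    ['App', 'Category', 'Rating'],
    ['Broken App', '1.9'],
    ['Good App', 'TOOLS', '4.2'],
]

print(delete_row_if_malformed(sample, 1))  # True: the short row is removed
print(delete_row_if_malformed(sample, 1))  # False: the row now at index 1 is valid
print(len(sample))                         # 2
```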

---


We suspect the Google Play Store file has duplicate rows. Let's confirm this with the following code. (Note that the code only checks for repeated App names.)

In [8]:
duplicate = []
unique = []

for row in android_data:
    name = row[0]
    if name in unique:
        duplicate.append(name)
    else:
        unique.append(name)
    
print('Number of duplicates', len(duplicate))    
print('\n')        
print('Examples of duplicates:', duplicate[:15])

Number of duplicates 1181


Examples of duplicates: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']
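
The same check can also be written with `collections.Counter`, which additionally shows how many times each name occurs. This is an alternative sketch, not the approach used in the cell above; `repeated_names` is a hypothetical helper and the sample rows are made up.

```python
from collections import Counter

def repeated_names(dataset, name_index=0):
    """Return {name: occurrences} for names that appear more than once.

    Assumes dataset[0] is the header row, which is skipped.
    """
    counts = Counter(row[name_index] for row in dataset[1:])
    return {name: count for name, count in counts.items() if count > 1}

sample = [
    ['App', 'Category'],
    ['Box', 'BUSINESS'],
    ['Slack', 'BUSINESS'],
    ['Box', 'BUSINESS'],
    ['Box', 'BUSINESS'],
]

repeated = repeated_names(sample)
print(repeated)  # {'Box': 3}

# Rows beyond the first occurrence of each name.
extra_rows = sum(count - 1 for count in repeated.values())
print(extra_rows)  # 2
```

Here `extra_rows` counts every occurrence after the first, which is the same quantity the loop above collects in its `duplicate` list.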


I want to see which parts of the rows are duplicated.

For example, if the row for ZOOM Cloud Meetings is entirely duplicated, then I will feel comfortable deleting the duplicate. However, if something differs between the rows (even though the App names match), carelessly deleting a row could lose valuable data. Let's look at all the duplicated data and see whether there are discrepancies between the repeated occurrences of each App name.

I have separated each group of duplicates with a long dashed line. An app that appears more than twice is listed in the duplicate list more than once, so its group is printed more than once.


In [9]:
for entry in duplicate:
    
    for instance in android_data:
        if instance[0] == entry :
            print(instance)
            print('\n')
    print('------------------------------------------------')
    
    

['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']


['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']


['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80804', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']


------------------------------------------------
['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Box', 'BUSINESS'

['LINE: Free Calls & Messages', 'COMMUNICATION', '4.2', '10790289', 'Varies with device', '500,000,000+', 'Free', '0', 'Everyone', 'Communication', 'July 26, 2018', 'Varies with device', 'Varies with device']


['LINE: Free Calls & Messages', 'COMMUNICATION', '4.2', '10790289', 'Varies with device', '500,000,000+', 'Free', '0', 'Everyone', 'Communication', 'July 26, 2018', 'Varies with device', 'Varies with device']


['LINE: Free Calls & Messages', 'COMMUNICATION', '4.2', '10790092', 'Varies with device', '500,000,000+', 'Free', '0', 'Everyone', 'Communication', 'July 26, 2018', 'Varies with device', 'Varies with device']


------------------------------------------------
['KakaoTalk: Free Calls & Text', 'COMMUNICATION', '4.3', '2546527', 'Varies with device', '100,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 3, 2018', 'Varies with device', 'Varies with device']


['KakaoTalk: Free Calls & Text', 'COMMUNICATION', '4.3', '2546527', 'Varies with device', '100,000,000+', 

------------------------------------------------
['Meet4U - Chat, Love, Singles!', 'DATING', '4.2', '40035', '6.5M', '1,000,000+', 'Free', '0', 'Mature 17+', 'Dating', 'July 27, 2018', '1.31.3', '4.0.3 and up']


['Meet4U - Chat, Love, Singles!', 'DATING', '4.2', '40039', '6.5M', '1,000,000+', 'Free', '0', 'Mature 17+', 'Dating', 'July 27, 2018', '1.31.3', '4.0.3 and up']


------------------------------------------------
['95Live -SG#1 Live Streaming App', 'DATING', '4.1', '4953', '15M', '1,000,000+', 'Free', '0', 'Teen', 'Dating', 'August 1, 2018', '8.7.2', '4.2 and up']


['95Live -SG#1 Live Streaming App', 'DATING', '4.1', '4954', '15M', '1,000,000+', 'Free', '0', 'Teen', 'Dating', 'August 1, 2018', '8.7.2', '4.2 and up']


------------------------------------------------
['Just She - Top Lesbian Dating', 'DATING', '1.9', '953', '19M', '100,000+', 'Free', '0', 'Mature 17+', 'Dating', 'July 18, 2018', '6.3.7', '5.0 and up']


['Just She - Top Lesbian Dating', 'DATING', '1.9', '953',

['Duolingo: Learn Languages Free', 'EDUCATION', '4.7', '6290507', 'Varies with device', '100,000,000+', 'Free', '0', 'Everyone', 'Education;Education', 'August 1, 2018', 'Varies with device', 'Varies with device']


['Duolingo: Learn Languages Free', 'EDUCATION', '4.7', '6290507', 'Varies with device', '100,000,000+', 'Free', '0', 'Everyone', 'Education;Education', 'August 1, 2018', 'Varies with device', 'Varies with device']


['Duolingo: Learn Languages Free', 'EDUCATION', '4.7', '6290507', 'Varies with device', '100,000,000+', 'Free', '0', 'Everyone', 'Education;Education', 'August 1, 2018', 'Varies with device', 'Varies with device']


['Duolingo: Learn Languages Free', 'FAMILY', '4.7', '6294400', 'Varies with device', '100,000,000+', 'Free', '0', 'Everyone', 'Education;Education', 'August 1, 2018', 'Varies with device', 'Varies with device']


['Duolingo: Learn Languages Free', 'FAMILY', '4.7', '6294397', 'Varies with device', '100,000,000+', 'Free', '0', 'Everyone', 'Education;Ed


['A&E - Watch Full Episodes of TV Shows', 'ENTERTAINMENT', '4.0', '29706', '19M', '1,000,000+', 'Free', '0', 'Teen', 'Entertainment', 'July 16, 2018', '3.1.4', '4.4 and up']


['A&E - Watch Full Episodes of TV Shows', 'FAMILY', '4.0', '29708', '19M', '1,000,000+', 'Free', '0', 'Teen', 'Entertainment', 'July 16, 2018', '3.1.4', '4.4 and up']


------------------------------------------------
['VH1', 'ENTERTAINMENT', '4.1', '27424', '17M', '1,000,000+', 'Free', '0', 'Teen', 'Entertainment', 'July 8, 2018', '11.45.0', '4.4 and up']


['VH1', 'ENTERTAINMENT', '4.1', '27424', '17M', '1,000,000+', 'Free', '0', 'Teen', 'Entertainment', 'July 8, 2018', '11.45.0', '4.4 and up']


['VH1', 'ENTERTAINMENT', '4.1', '27424', '17M', '1,000,000+', 'Free', '0', 'Teen', 'Entertainment', 'July 8, 2018', '11.45.0', '4.4 and up']


------------------------------------------------
['Lifetime - Watch Full Episodes & Original Movies', 'ENTERTAINMENT', '4.0', '35928', '19M', '1,000,000+', 'Free', '0', 'Teen',



['Run with Map My Run', 'HEALTH_AND_FITNESS', '4.5', '183669', '57M', '5,000,000+', 'Free', '0', 'Everyone', 'Health & Fitness', 'July 27, 2018', '18.7.1', '5.0 and up']


['Run with Map My Run', 'HEALTH_AND_FITNESS', '4.5', '183669', '57M', '5,000,000+', 'Free', '0', 'Everyone', 'Health & Fitness', 'July 27, 2018', '18.7.1', '5.0 and up']


------------------------------------------------
['Weight Loss Running by Verv', 'HEALTH_AND_FITNESS', '4.5', '27393', '59M', '1,000,000+', 'Free', '0', 'Mature 17+', 'Health & Fitness', 'July 16, 2018', '6.5.3', '4.1 and up']


['Weight Loss Running by Verv', 'HEALTH_AND_FITNESS', '4.5', '27396', '59M', '1,000,000+', 'Free', '0', 'Mature 17+', 'Health & Fitness', 'July 16, 2018', '6.5.3', '4.1 and up']


['Weight Loss Running by Verv', 'HEALTH_AND_FITNESS', '4.5', '27396', '59M', '1,000,000+', 'Free', '0', 'Mature 17+', 'Health & Fitness', 'July 16, 2018', '6.5.3', '4.1 and up']


------------------------------------------------
['Nike+ Run Club


['Food Calorie Calculator', 'HEALTH_AND_FITNESS', '4.2', '1324', '4.0M', '100,000+', 'Free', '0', 'Everyone', 'Health & Fitness', 'January 29, 2018', '10.2.0', '4.0 and up']


------------------------------------------------
['Calorie Counter - MyFitnessPal', 'HEALTH_AND_FITNESS', '4.6', '1873516', 'Varies with device', '50,000,000+', 'Free', '0', 'Everyone', 'Health & Fitness', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Calorie Counter - MyFitnessPal', 'HEALTH_AND_FITNESS', '4.6', '1873523', 'Varies with device', '50,000,000+', 'Free', '0', 'Everyone', 'Health & Fitness', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Calorie Counter - MyFitnessPal', 'HEALTH_AND_FITNESS', '4.6', '1873520', 'Varies with device', '50,000,000+', 'Free', '0', 'Everyone', 'Health & Fitness', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Calorie Counter - MyFitnessPal', 'HEALTH_AND_FITNESS', '4.6', '1873520', 'Varies with device', '50,000,000+', 'F


------------------------------------------------
['My Talking Angela', 'GAME', '4.5', '9881829', '99M', '100,000,000+', 'Free', '0', 'Everyone', 'Casual', 'July 3, 2018', '3.7.2.51', '4.1 and up']


['My Talking Angela', 'GAME', '4.5', '9881908', '99M', '100,000,000+', 'Free', '0', 'Everyone', 'Casual', 'July 3, 2018', '3.7.2.51', '4.1 and up']


['My Talking Angela', 'GAME', '4.5', '9883367', '99M', '100,000,000+', 'Free', '0', 'Everyone', 'Casual', 'July 3, 2018', '3.7.2.51', '4.1 and up']


['My Talking Angela', 'FAMILY', '4.5', '9876369', '99M', '100,000,000+', 'Free', '0', 'Everyone', 'Casual', 'July 3, 2018', '3.7.2.51', '4.1 and up']


------------------------------------------------
['Bubble Shooter', 'GAME', '4.5', '148897', '46M', '10,000,000+', 'Free', '0', 'Everyone', 'Casual', 'July 17, 2018', '1.20.1', '4.0.3 and up']


['Bubble Shooter', 'GAME', '4.5', '148895', '46M', '10,000,000+', 'Free', '0', 'Everyone', 'Casual', 'July 17, 2018', '1.20.1', '4.0.3 and up']


['Bubbl

['Magic Tiles 3', 'GAME', '4.5', '592282', 'Varies with device', '50,000,000+', 'Free', '0', 'Everyone', 'Music', 'August 3, 2018', '5.13.007', '4.1 and up']


['Magic Tiles 3', 'GAME', '4.5', '592504', 'Varies with device', '50,000,000+', 'Free', '0', 'Everyone', 'Music', 'August 3, 2018', '5.13.007', '4.1 and up']


['Magic Tiles 3', 'GAME', '4.5', '592504', 'Varies with device', '50,000,000+', 'Free', '0', 'Everyone', 'Music', 'August 3, 2018', '5.13.007', '4.1 and up']


------------------------------------------------
['Bowmasters', 'GAME', '4.7', '1534466', 'Varies with device', '50,000,000+', 'Free', '0', 'Teen', 'Action', 'July 23, 2018', '2.12.5', '4.1 and up']


['Bowmasters', 'GAME', '4.7', '1535084', 'Varies with device', '50,000,000+', 'Free', '0', 'Teen', 'Action', 'July 23, 2018', '2.12.5', '4.1 and up']


['Bowmasters', 'GAME', '4.7', '1535973', 'Varies with device', '50,000,000+', 'Free', '0', 'Teen', 'Action', 'July 23, 2018', '2.12.5', '4.1 and up']


['Bowmasters', 

['Temple Run 2', 'GAME', '4.3', '8119154', '62M', '500,000,000+', 'Free', '0', 'Everyone', 'Action', 'July 5, 2018', '1.49.1', '4.0 and up']


['Temple Run 2', 'GAME', '4.3', '8116142', '62M', '500,000,000+', 'Free', '0', 'Everyone', 'Action', 'July 5, 2018', '1.49.1', '4.0 and up']


------------------------------------------------
['Flow Free', 'GAME', '4.3', '1295557', '11M', '100,000,000+', 'Free', '0', 'Everyone', 'Puzzle', 'April 11, 2018', '4.0', '4.1 and up']


['Flow Free', 'GAME', '4.3', '1295606', '11M', '100,000,000+', 'Free', '0', 'Everyone', 'Puzzle', 'April 11, 2018', '4.0', '4.1 and up']


['Flow Free', 'GAME', '4.3', '1295625', '11M', '100,000,000+', 'Free', '0', 'Everyone', 'Puzzle', 'April 11, 2018', '4.0', '4.1 and up']


['Flow Free', 'GAME', '4.3', '1295625', '11M', '100,000,000+', 'Free', '0', 'Everyone', 'Puzzle', 'April 11, 2018', '4.0', '4.1 and up']


['Flow Free', 'FAMILY', '4.3', '1295293', '11M', '100,000,000+', 'Free', '0', 'Everyone', 'Puzzle', 'April 11



['Bowmasters', 'GAME', '4.7', '1536349', 'Varies with device', '50,000,000+', 'Free', '0', 'Teen', 'Action', 'July 23, 2018', '2.12.5', '4.1 and up']


['Bowmasters', 'GAME', '4.7', '1535581', 'Varies with device', '50,000,000+', 'Free', '0', 'Teen', 'Action', 'July 23, 2018', '2.12.5', '4.1 and up']


------------------------------------------------
['Talking Tom Gold Run', 'GAME', '4.6', '2698348', '78M', '100,000,000+', 'Free', '0', 'Everyone', 'Action', 'July 31, 2018', '2.8.2.59', '4.1 and up']


['Talking Tom Gold Run', 'GAME', '4.6', '2698882', '78M', '100,000,000+', 'Free', '0', 'Everyone', 'Action', 'July 31, 2018', '2.8.2.59', '4.1 and up']


['Talking Tom Gold Run', 'GAME', '4.6', '2698889', '78M', '100,000,000+', 'Free', '0', 'Everyone', 'Action', 'July 31, 2018', '2.8.2.59', '4.1 and up']


['Talking Tom Gold Run', 'GAME', '4.6', '2694969', '78M', '100,000,000+', 'Free', '0', 'Everyone', 'Action', 'July 31, 2018', '2.8.2.59', '4.1 and up']


-----------------------------

['Magic Tiles 3', 'GAME', '4.5', '592504', 'Varies with device', '50,000,000+', 'Free', '0', 'Everyone', 'Music', 'August 3, 2018', '5.13.007', '4.1 and up']


['Magic Tiles 3', 'GAME', '4.5', '592504', 'Varies with device', '50,000,000+', 'Free', '0', 'Everyone', 'Music', 'August 3, 2018', '5.13.007', '4.1 and up']


------------------------------------------------
['Block Puzzle Classic Legend !', 'GAME', '4.2', '17039', '4.9M', '5,000,000+', 'Free', '0', 'Everyone', 'Puzzle', 'April 13, 2018', '2.9', '2.3.3 and up']


['Block Puzzle Classic Legend !', 'GAME', '4.2', '17044', '4.9M', '5,000,000+', 'Free', '0', 'Everyone', 'Puzzle', 'April 13, 2018', '2.9', '2.3.3 and up']


------------------------------------------------
['Pixel Art: Color by Number Game', 'GAME', '4.7', '1125017', '25M', '10,000,000+', 'Free', '0', 'Everyone', 'Puzzle', 'July 17, 2018', '3.9.2', '4.3 and up']


['Pixel Art: Color by Number Game', 'GAME', '4.7', '1125438', '25M', '10,000,000+', 'Free', '0', 'Everyon

------------------------------------------------
['DC Super Hero Girls™', 'GAME', '4.3', '43055', '95M', '5,000,000+', 'Free', '0', 'Everyone', 'Action;Action & Adventure', 'June 29, 2018', '2.8.0', '4.0 and up']


['DC Super Hero Girls™', 'FAMILY', '4.3', '43060', '95M', '5,000,000+', 'Free', '0', 'Everyone', 'Action;Action & Adventure', 'June 29, 2018', '2.8.0', '4.0 and up']


['DC Super Hero Girls™', 'FAMILY', '4.3', '43060', '95M', '5,000,000+', 'Free', '0', 'Everyone', 'Action;Action & Adventure', 'June 29, 2018', '2.8.0', '4.0 and up']


['DC Super Hero Girls™', 'FAMILY', '4.3', '43090', '95M', '5,000,000+', 'Free', '0', 'Everyone', 'Action;Action & Adventure', 'June 29, 2018', '2.8.0', '4.0 and up']


------------------------------------------------
['Strawberry Shortcake BerryRush', 'GAME', '4.3', '525517', '48M', '10,000,000+', 'Free', '0', 'Everyone', 'Action;Action & Adventure', 'October 15, 2017', '1.2.3', '2.3 and up']


['Strawberry Shortcake BerryRush', 'FAMILY', '4.3',

['2017 EMRA Antibiotic Guide', 'MEDICAL', '4.4', '12', '3.8M', '1,000+', 'Paid', '$16.99', 'Everyone', 'Medical', 'January 27, 2017', '1.0.5', '4.0.3 and up']


------------------------------------------------
['Essential Anatomy 3', 'MEDICAL', '4.1', '1533', '42M', '50,000+', 'Paid', '$11.99', 'Mature 17+', 'Medical', 'August 7, 2014', '1.1.3', '4.0.3 and up']


['Essential Anatomy 3', 'MEDICAL', '4.1', '1533', '42M', '50,000+', 'Paid', '$11.99', 'Mature 17+', 'Medical', 'August 7, 2014', '1.1.3', '4.0.3 and up']


------------------------------------------------
['EMT PASS', 'MEDICAL', '3.4', '51', '2.4M', '1,000+', 'Paid', '$29.99', 'Everyone', 'Medical', 'October 22, 2014', '2.0.2', '4.0 and up']


['EMT PASS', 'MEDICAL', '3.4', '51', '2.4M', '1,000+', 'Paid', '$29.99', 'Everyone', 'Medical', 'October 22, 2014', '2.0.2', '4.0 and up']


------------------------------------------------
['Block Buddy', 'MEDICAL', '4.0', '15', '5.0M', '1,000+', 'Paid', '$14.99', 'Everyone', 'Medical',


['Pinterest', 'SOCIAL', '4.6', '4300936', 'Varies with device', '100,000,000+', 'Free', '0', 'Teen', 'Social', 'August 3, 2018', 'Varies with device', 'Varies with device']


------------------------------------------------
['MeetMe: Chat & Meet New People', 'SOCIAL', '4.2', '1259849', '76M', '50,000,000+', 'Free', '0', 'Mature 17+', 'Social', 'August 3, 2018', 'Varies with device', '4.1 and up']


['MeetMe: Chat & Meet New People', 'SOCIAL', '4.2', '1259894', '76M', '50,000,000+', 'Free', '0', 'Mature 17+', 'Social', 'August 3, 2018', 'Varies with device', '4.1 and up']


['MeetMe: Chat & Meet New People', 'SOCIAL', '4.2', '1259894', '76M', '50,000,000+', 'Free', '0', 'Mature 17+', 'Social', 'August 3, 2018', 'Varies with device', '4.1 and up']


['MeetMe: Chat & Meet New People', 'SOCIAL', '4.2', '1259894', '76M', '50,000,000+', 'Free', '0', 'Mature 17+', 'Social', 'August 3, 2018', 'Varies with device', '4.1 and up']


['MeetMe: Chat & Meet New People', 'SOCIAL', '4.2', '1259723', 


['LivingSocial - Local Deals', 'SHOPPING', '4.1', '28523', '29M', '5,000,000+', 'Free', '0', 'Everyone', 'Shopping', 'August 3, 2018', '18.10.157066', '4.4 and up']


------------------------------------------------
['Amazon Shopping', 'SHOPPING', '4.3', '909226', '42M', '100,000,000+', 'Free', '0', 'Teen', 'Shopping', 'July 31, 2018', '16.14.0.100', '4.4 and up']


['Amazon Shopping', 'SHOPPING', '4.3', '909204', '42M', '100,000,000+', 'Free', '0', 'Teen', 'Shopping', 'July 31, 2018', '16.14.0.100', '4.4 and up']


['Amazon Shopping', 'SHOPPING', '4.3', '908525', '42M', '100,000,000+', 'Free', '0', 'Teen', 'Shopping', 'July 31, 2018', '16.14.0.100', '4.4 and up']


------------------------------------------------
['RetailMeNot - Coupons, Deals & Discount Shopping', 'SHOPPING', '4.4', '210208', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Shopping', 'August 1, 2018', 'Varies with device', 'Varies with device']


['RetailMeNot - Coupons, Deals & Discount Shopping', 'S


['Shutterfly: Free Prints, Photo Books, Cards, Gifts', 'PHOTOGRAPHY', '4.6', '98716', '59M', '5,000,000+', 'Free', '0', 'Everyone', 'Photography', 'August 1, 2018', '5.13.1', '5.0 and up']


['Shutterfly: Free Prints, Photo Books, Cards, Gifts', 'PHOTOGRAPHY', '4.6', '98716', '59M', '5,000,000+', 'Free', '0', 'Everyone', 'Photography', 'August 1, 2018', '5.13.1', '5.0 and up']


['Shutterfly: Free Prints, Photo Books, Cards, Gifts', 'PHOTOGRAPHY', '4.6', '98717', '59M', '5,000,000+', 'Free', '0', 'Everyone', 'Photography', 'August 1, 2018', '5.13.1', '5.0 and up']


------------------------------------------------
['InstaBeauty -Makeup Selfie Cam', 'PHOTOGRAPHY', '4.3', '654419', 'Varies with device', '50,000,000+', 'Free', '0', 'Everyone', 'Photography', 'February 1, 2018', 'Varies with device', '4.0.3 and up']


['InstaBeauty -Makeup Selfie Cam', 'PHOTOGRAPHY', '4.3', '654418', 'Varies with device', '50,000,000+', 'Free', '0', 'Everyone', 'Photography', 'February 1, 2018', 'Varies w


------------------------------------------------
['theScore: Live Sports Scores, News, Stats & Videos', 'SPORTS', '4.4', '133825', '34M', '10,000,000+', 'Free', '0', 'Everyone 10+', 'Sports', 'July 25, 2018', '6.17.2', '4.4 and up']


['theScore: Live Sports Scores, News, Stats & Videos', 'SPORTS', '4.4', '133825', '34M', '10,000,000+', 'Free', '0', 'Everyone 10+', 'Sports', 'July 25, 2018', '6.17.2', '4.4 and up']


['theScore: Live Sports Scores, News, Stats & Videos', 'SPORTS', '4.4', '133833', '34M', '10,000,000+', 'Free', '0', 'Everyone 10+', 'Sports', 'July 25, 2018', '6.17.2', '4.4 and up']


['theScore: Live Sports Scores, News, Stats & Videos', 'SPORTS', '4.4', '133833', '34M', '10,000,000+', 'Free', '0', 'Everyone 10+', 'Sports', 'July 25, 2018', '6.17.2', '4.4 and up']


['theScore: Live Sports Scores, News, Stats & Videos', 'SPORTS', '4.4', '133833', '34M', '10,000,000+', 'Free', '0', 'Everyone 10+', 'Sports', 'July 25, 2018', '6.17.2', '4.4 and up']


--------------------



------------------------------------------------
['FotMob - Live Soccer Scores', 'SPORTS', '4.7', '410384', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Sports', 'July 31, 2018', 'Varies with device', 'Varies with device']


['FotMob - Live Soccer Scores', 'SPORTS', '4.7', '410395', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Sports', 'July 31, 2018', 'Varies with device', 'Varies with device']


------------------------------------------------
['Yahoo Fantasy Sports - #1 Rated Fantasy App', 'SPORTS', '4.2', '277902', 'Varies with device', '5,000,000+', 'Free', '0', 'Mature 17+', 'Sports', 'August 2, 2018', 'Varies with device', 'Varies with device']


['Yahoo Fantasy Sports - #1 Rated Fantasy App', 'SPORTS', '4.2', '277900', 'Varies with device', '5,000,000+', 'Free', '0', 'Mature 17+', 'Sports', 'August 2, 2018', 'Varies with device', 'Varies with device']


['Yahoo Fantasy Sports - #1 Rated Fantasy App', 'SPORTS', '4.2', '277904', 'Varies with 

['Orbitz - Hotels, Flights & Package Deals', 'TRAVEL_AND_LOCAL', '4.4', '33256', 'Varies with device', '1,000,000+', 'Free', '0', 'Everyone', 'Travel & Local', 'July 31, 2018', 'Varies with device', 'Varies with device']


------------------------------------------------
['Skyscanner', 'TRAVEL_AND_LOCAL', '4.5', '481545', '29M', '10,000,000+', 'Free', '0', 'Everyone', 'Travel & Local', 'August 6, 2018', '5.48', '4.4 and up']


['Skyscanner', 'TRAVEL_AND_LOCAL', '4.5', '481546', '29M', '10,000,000+', 'Free', '0', 'Everyone', 'Travel & Local', 'August 6, 2018', '5.48', '4.4 and up']


['Skyscanner', 'TRAVEL_AND_LOCAL', '4.5', '481546', '29M', '10,000,000+', 'Free', '0', 'Everyone', 'Travel & Local', 'August 6, 2018', '5.48', '4.4 and up']


['Skyscanner', 'TRAVEL_AND_LOCAL', '4.5', '481546', '29M', '10,000,000+', 'Free', '0', 'Everyone', 'Travel & Local', 'August 6, 2018', '5.48', '4.4 and up']


['Skyscanner', 'TRAVEL_AND_LOCAL', '4.5', '481546', '29M', '10,000,000+', 'Free', '0', 'Ever

['Google Calendar', 'PRODUCTIVITY', '4.2', '858230', 'Varies with device', '500,000,000+', 'Free', '0', 'Everyone', 'Productivity', 'August 6, 2018', 'Varies with device', 'Varies with device']


------------------------------------------------
['Google Drive', 'PRODUCTIVITY', '4.4', '2731171', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Productivity', 'August 6, 2018', 'Varies with device', 'Varies with device']


['Google Drive', 'PRODUCTIVITY', '4.4', '2731211', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Productivity', 'August 6, 2018', 'Varies with device', 'Varies with device']


['Google Drive', 'PRODUCTIVITY', '4.4', '2731211', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Productivity', 'August 6, 2018', 'Varies with device', 'Varies with device']


['Google Drive', 'PRODUCTIVITY', '4.4', '2728941', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Productivity', 'July 30, 2018', 'Varies with devic

['Google News', 'NEWS_AND_MAGAZINES', '3.9', '877635', '13M', '1,000,000,000+', 'Free', '0', 'Teen', 'News & Magazines', 'August 1, 2018', '5.2.0', '4.4 and up']


['Google News', 'NEWS_AND_MAGAZINES', '3.9', '877643', '13M', '1,000,000,000+', 'Free', '0', 'Teen', 'News & Magazines', 'August 1, 2018', '5.2.0', '4.4 and up']


['Google News', 'NEWS_AND_MAGAZINES', '3.9', '878065', '13M', '1,000,000,000+', 'Free', '0', 'Teen', 'News & Magazines', 'August 1, 2018', '5.2.0', '4.4 and up']


------------------------------------------------
['BuzzFeed: News, Tasty, Quizzes', 'NEWS_AND_MAGAZINES', '4.3', '131028', '13M', '5,000,000+', 'Free', '0', 'Teen', 'News & Magazines', 'July 30, 2018', '5.38', '4.4 and up']


['BuzzFeed: News, Tasty, Quizzes', 'NEWS_AND_MAGAZINES', '4.3', '131028', '13M', '5,000,000+', 'Free', '0', 'Teen', 'News & Magazines', 'July 30, 2018', '5.38', '4.4 and up']


------------------------------------------------
['Flipboard: News For Our Time', 'NEWS_AND_MAGAZINES', '


['Candy Crush Saga', 'GAME', '4.4', '22428456', '74M', '500,000,000+', 'Free', '0', 'Everyone', 'Casual', 'July 5, 2018', '1.129.0.2', '4.1 and up']


['Candy Crush Saga', 'GAME', '4.4', '22429716', '74M', '500,000,000+', 'Free', '0', 'Everyone', 'Casual', 'July 5, 2018', '1.129.0.2', '4.1 and up']


['Candy Crush Saga', 'GAME', '4.4', '22430188', '74M', '500,000,000+', 'Free', '0', 'Everyone', 'Casual', 'July 5, 2018', '1.129.0.2', '4.1 and up']


['Candy Crush Saga', 'GAME', '4.4', '22430188', '74M', '500,000,000+', 'Free', '0', 'Everyone', 'Casual', 'July 5, 2018', '1.129.0.2', '4.1 and up']


['Candy Crush Saga', 'FAMILY', '4.4', '22419455', '74M', '500,000,000+', 'Free', '0', 'Everyone', 'Casual', 'July 5, 2018', '1.129.0.2', '4.1 and up']


------------------------------------------------
['Google Chrome: Fast & Secure', 'COMMUNICATION', '4.3', '9642995', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 1, 2018', 'Varies with device', 'Va


['Maps - Navigate & Explore', 'TRAVEL_AND_LOCAL', '4.3', '9231613', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Travel & Local', 'July 31, 2018', 'Varies with device', 'Varies with device']


------------------------------------------------
['AliExpress - Smarter Shopping, Better Living', 'SHOPPING', '4.6', '5916606', 'Varies with device', '100,000,000+', 'Free', '0', 'Teen', 'Shopping', 'August 6, 2018', 'Varies with device', 'Varies with device']


['AliExpress - Smarter Shopping, Better Living', 'SHOPPING', '4.6', '5916569', 'Varies with device', '100,000,000+', 'Free', '0', 'Teen', 'Shopping', 'August 6, 2018', 'Varies with device', 'Varies with device']


['AliExpress - Smarter Shopping, Better Living', 'SHOPPING', '4.6', '5917485', 'Varies with device', '100,000,000+', 'Free', '0', 'Teen', 'Shopping', 'August 6, 2018', 'Varies with device', 'Varies with device']


['AliExpress - Smarter Shopping, Better Living', 'SHOPPING', '4.6', '5911055', 'Varies with de

['Opera Browser: Fast and Secure', 'COMMUNICATION', '4.4', '2473795', 'Varies with device', '100,000,000+', 'Free', '0', 'Everyone', 'Communication', 'July 31, 2018', '47.1.2249.129326', 'Varies with device']


------------------------------------------------
['O-Star', 'DATING', '4.4', '59', '38M', '5,000+', 'Free', '0', 'Everyone', 'Dating', 'July 19, 2018', '1.0.1', '4.3 and up']


['O-Star', 'DATING', '4.4', '59', '38M', '5,000+', 'Free', '0', 'Everyone', 'Dating', 'July 19, 2018', '1.0.1', '4.3 and up']


['O-Star', 'DATING', '4.4', '59', '38M', '5,000+', 'Free', '0', 'Everyone', 'Dating', 'July 19, 2018', '1.0.1', '4.3 and up']


------------------------------------------------
['PicsArt Photo Studio: Collage Maker & Pic Editor', 'PHOTOGRAPHY', '4.5', '7594559', '34M', '100,000,000+', 'Free', '0', 'Teen', 'Photography', 'August 6, 2018', '9.40.3', '4.0.3 and up']


['PicsArt Photo Studio: Collage Maker & Pic Editor', 'PHOTOGRAPHY', '4.5', '7590099', '34M', '100,000,000+', 'Free',



['A&E - Watch Full Episodes of TV Shows', 'ENTERTAINMENT', '4.0', '29706', '19M', '1,000,000+', 'Free', '0', 'Teen', 'Entertainment', 'July 16, 2018', '3.1.4', '4.4 and up']


['A&E - Watch Full Episodes of TV Shows', 'FAMILY', '4.0', '29708', '19M', '1,000,000+', 'Free', '0', 'Teen', 'Entertainment', 'July 16, 2018', '3.1.4', '4.4 and up']


------------------------------------------------
['Camera FV-5 Lite', 'PHOTOGRAPHY', '4.0', '130081', '5.6M', '10,000,000+', 'Free', '0', 'Everyone', 'Photography', 'November 10, 2017', '3.31.4', '4.0 and up']


['Camera FV-5 Lite', 'PHOTOGRAPHY', '4.0', '130063', '5.6M', '10,000,000+', 'Free', '0', 'Everyone', 'Photography', 'November 10, 2017', '3.31.4', '4.0 and up']


------------------------------------------------
['Cardiac diagnosis (heart rate, arrhythmia)', 'MEDICAL', '4.4', '8', '6.5M', '100+', 'Paid', '$12.99', 'Everyone', 'Medical', 'July 25, 2018', '7', '3.0 and up']


['Cardiac diagnosis (heart rate, arrhythmia)', 'MEDICAL', '4.4',

['American Airlines', 'TRAVEL_AND_LOCAL', '3.7', '16980', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Travel & Local', 'July 18, 2018', 'Varies with device', 'Varies with device']


['American Airlines', 'TRAVEL_AND_LOCAL', '3.7', '16973', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Travel & Local', 'July 18, 2018', 'Varies with device', 'Varies with device']


------------------------------------------------
['Anthem BC Anywhere', 'MEDICAL', '2.6', '496', '24M', '100,000+', 'Free', '0', 'Everyone', 'Medical', 'July 27, 2018', '8.0.226', '4.4 and up']


['Anthem BC Anywhere', 'MEDICAL', '2.6', '496', '24M', '100,000+', 'Free', '0', 'Everyone', 'Medical', 'July 27, 2018', '8.0.226', '4.4 and up']


------------------------------------------------
['Transit: Real-Time Transit App', 'MAPS_AND_NAVIGATION', '4.2', '43269', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Maps & Navigation', 'July 18, 2018', '4.4.7', 'Varies with device']


['

['No Crop & Square for Instagram', 'PHOTOGRAPHY', '4.6', '819694', '25M', '10,000,000+', 'Free', '0', 'Everyone', 'Photography', 'June 26, 2018', '4.2.3', '4.0 and up']


------------------------------------------------
['Hungry Shark World', 'GAME', '4.5', '1242855', '27M', '50,000,000+', 'Free', '0', 'Teen', 'Action', 'July 18, 2018', '3.0.0', '4.2 and up']


['Hungry Shark World', 'GAME', '4.5', '1243017', '27M', '50,000,000+', 'Free', '0', 'Teen', 'Action', 'July 18, 2018', '3.0.0', '4.2 and up']


------------------------------------------------
['iBP Blood Pressure', 'MEDICAL', '4.4', '578', '704k', '10,000+', 'Paid', '$0.99', 'Everyone', 'Medical', 'November 30, 2014', '7.0.1', '2.2 and up']


['iBP Blood Pressure', 'MEDICAL', '4.4', '578', '704k', '10,000+', 'Paid', '$0.99', 'Everyone', 'Medical', 'November 30, 2014', '7.0.1', '2.2 and up']


------------------------------------------------
['Blood Pressure', 'MEDICAL', '4.2', '33033', '7.4M', '5,000,000+', 'Free', '0', 'Everyo

['CM Locker - Security Lockscreen', 'TOOLS', '4.6', '3090680', 'Varies with device', '100,000,000+', 'Free', '0', 'Everyone', 'Tools', 'July 31, 2018', 'Varies with device', 'Varies with device']


------------------------------------------------
['CM Flashlight (Compass, SOS)', 'TOOLS', '4.4', '166367', '1.8M', '5,000,000+', 'Free', '0', 'Everyone', 'Tools', 'May 25, 2018', '1.4.0', '2.3 and up']


['CM Flashlight (Compass, SOS)', 'TOOLS', '4.4', '166363', '1.8M', '5,000,000+', 'Free', '0', 'Everyone', 'Tools', 'May 25, 2018', '1.4.0', '2.3 and up']


------------------------------------------------
['Ruler', 'HOUSE_AND_HOME', '4.7', '126', '1.9M', '10,000+', 'Free', '0', 'Everyone', 'House & Home', 'August 21, 2017', '1.0', '4.0 and up']


['Ruler', 'TOOLS', '4.5', '27180', '4.1M', '1,000,000+', 'Free', '0', 'Everyone', 'Tools', 'July 6, 2018', '3.24', '4.1 and up']


------------------------------------------------
['QuickPic - Photo Gallery with Google Drive Support', 'PHOTOGRAPHY'

['Duolingo: Learn Languages Free', 'EDUCATION', '4.7', '6290507', 'Varies with device', '100,000,000+', 'Free', '0', 'Everyone', 'Education;Education', 'August 1, 2018', 'Varies with device', 'Varies with device']


['Duolingo: Learn Languages Free', 'FAMILY', '4.7', '6294400', 'Varies with device', '100,000,000+', 'Free', '0', 'Everyone', 'Education;Education', 'August 1, 2018', 'Varies with device', 'Varies with device']


['Duolingo: Learn Languages Free', 'FAMILY', '4.7', '6294397', 'Varies with device', '100,000,000+', 'Free', '0', 'Everyone', 'Education;Education', 'August 1, 2018', 'Varies with device', 'Varies with device']


['Duolingo: Learn Languages Free', 'FAMILY', '4.7', '6297590', 'Varies with device', '100,000,000+', 'Free', '0', 'Everyone', 'Education;Education', 'August 6, 2018', 'Varies with device', 'Varies with device']


------------------------------------------------
['Free phone calls, free texting SMS on free number', 'SOCIAL', '4.5', '412725', '35M', '10,000,

['Dictionary - Merriam-Webster', 'BOOKS_AND_REFERENCE', '4.5', '454412', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Books & Reference', 'May 18, 2018', 'Varies with device', 'Varies with device']


------------------------------------------------
['Edmodo', 'EDUCATION', '4.1', '200058', '18M', '10,000,000+', 'Free', '0', 'Everyone', 'Education', 'July 20, 2018', '9.12.5', '4.0.3 and up']


['Edmodo', 'FAMILY', '4.1', '200214', '18M', '10,000,000+', 'Free', '0', 'Everyone', 'Education', 'August 6, 2018', '9.12.6', '4.0.3 and up']


------------------------------------------------
['busuu: Learn Languages - Spanish, English & More', 'EDUCATION', '4.3', '206527', '21M', '10,000,000+', 'Free', '0', 'Everyone 10+', 'Education', 'August 1, 2018', '13.9.0.161', '5.0 and up']


['busuu: Learn Languages - Spanish, English & More', 'EDUCATION', '4.3', '206532', '21M', '10,000,000+', 'Free', '0', 'Everyone 10+', 'Education', 'August 1, 2018', '13.9.0.161', '5.0 and up']


['bu


------------------------------------------------
['All Social Networks', 'SOCIAL', '4.2', '22492', '1.5M', '1,000,000+', 'Free', '0', 'Everyone', 'Social', 'May 21, 2018', '2.4.12', '4.0 and up']


['All Social Networks', 'SOCIAL', '4.2', '22650', '1.5M', '1,000,000+', 'Free', '0', 'Everyone', 'Social', 'May 21, 2018', '2.4.12', '4.0 and up']


------------------------------------------------
['Premier League - Official App', 'SPORTS', '4.3', '63580', '24M', '5,000,000+', 'Free', '0', 'Everyone', 'Sports', 'July 20, 2018', '1.1.5', '4.1 and up']


['Premier League - Official App', 'SPORTS', '4.3', '63782', '24M', '5,000,000+', 'Free', '0', 'Everyone', 'Sports', 'August 7, 2018', '1.1.5', '4.1 and up']


------------------------------------------------
['Farm Heroes Saga', 'GAME', '4.4', '7614130', '70M', '100,000,000+', 'Free', '0', 'Everyone', 'Casual', 'July 26, 2018', '5.1.8', '2.3 and up']


['Farm Heroes Saga', 'GAME', '4.4', '7614271', '70M', '100,000,000+', 'Free', '0', 'Everyo

Some duplicate rows are identical in every column. Many others share the same app name but differ in the number of reviews, because over time more and more people review the app. Rather than removing duplicates at random, we keep one entry per app with the highest review count and delete the rest.

The below code should help with this:

---


In [10]:
reviews_max = {}

for row in android_data[1:]:
    name = row[0]
    n_reviews = float(row[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
    if name not in reviews_max:
        reviews_max[name] = n_reviews
        
print( 'The number of unique Apps on the Play Store : ',len(reviews_max))        

The number of unique Apps on the Play Store :  9659


We will now use the dictionary we created above to remove the duplicate rows:

In [11]:
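android_clean = []
already_added = []

# Keep the first row seen for each app name and overwrite its review
# count (index 3) with the maximum stored in reviews_max.
# Note: this also modifies the row inside android_data.
for row in android_data[1:]:
    name = row[0]
    if name not in already_added:
        already_added.append(name)
        row[3] = reviews_max[name]
        android_clean.append(row)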
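
The cell above keeps the first row it sees for each app and patches in the highest review count. As an alternative, here is a minimal sketch that keeps the complete row that actually holds the highest review count. The function name `keep_most_reviewed` and the toy rows are assumptions made for illustration; they are not part of the original notebook.

```python
# Toy rows (name, category, rating, reviews); the values are made up.
sample_rows = [
    ['App A', 'TOOLS', '4.0', '100'],
    ['App A', 'TOOLS', '4.1', '250'],
    ['App B', 'GAME', '4.5', '10'],
    ['App A', 'TOOLS', '4.0', '100'],
]

def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Return one row per app name: the row with the highest review count."""
    best = {}  # name -> row with the most reviews seen so far
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    # dicts preserve insertion order (Python 3.7+), so apps stay in first-seen order
    return list(best.values())

deduped = keep_most_reviewed(sample_rows)
for row in deduped:
    print(row)
```

This version does not modify the input rows, and the dictionary lookup avoids repeatedly scanning a list of names.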
        

Because I know the app named "imo free video calls and chat" appears more than once, I isolate it below to check that the cleaned data kept the highest review count:

In [12]:
# Print every cleaned row for this app; exactly one row should appear.
for row in android_clean:
    if row[0] == 'imo free video calls and chat':
        print(row)
        
        

['imo free video calls and chat', 'COMMUNICATION', '4.3', 4785988.0, '11M', '500,000,000+', 'Free', '0', 'Everyone', 'Communication', 'June 8, 2018', '9.8.000000010501', '4.0 and up']


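The check worked!

Next, we need a way to flag non-English app names. A first attempt treats any character outside the ASCII range (code point above 127) as non-English: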


In [13]:
def charreview1(string):
    for character in string:
        if ord(character) > 127:
            return False
    return True
        
print(charreview1('Instagram'))   
print(charreview1('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(charreview1('Docs To Go™ Free Office Suite'))   
print(charreview1('Instachat 😜'))           


True
False
False
False


Note: Instachat is an English app, but the function returned False because the emoji falls outside the ASCII range. The same happens with the ™ symbol in "Docs To Go™". Here's how we can fix this so we don't lose data:

One rule we will set: an app name with up to three emoji or other non-ASCII characters still counts as English. A name with more than three is not treated as English, meaning we will remove it from the data.

With this new rule in mind, we will rewrite this function: 

In [14]:
def is_this_english(string):
    i = 0    
    for character in string:

        if ord(character) > 127 and i < 4:
            i+=1
                
        if i == 4:
            return False
        
    return True
        
print(is_this_english('Instagram'))   
print(is_this_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_this_english('Docs To Go™ Free Office Suite'))   
print(is_this_english('Instachat 😜'))           


True
False
True
True
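
The same rule can also be written by counting the non-ASCII characters first and then comparing the count to a threshold. This is only a sketch: the names `count_non_ascii`, `is_probably_english`, and the `max_non_ascii` parameter are assumptions introduced here, not part of the original notebook.

```python
def count_non_ascii(text):
    """Count characters whose code point falls outside the ASCII range (0-127)."""
    return sum(1 for character in text if ord(character) > 127)

def is_probably_english(text, max_non_ascii=3):
    """Apply the rule above: allow at most max_non_ascii non-ASCII characters."""
    return count_non_ascii(text) <= max_non_ascii

names = [
    'Instagram',
    '爱奇艺PPS -《欢乐颂2》电视剧热播',
    'Docs To Go™ Free Office Suite',
    'Instachat 😜',
]
for name in names:
    print(name, ':', is_probably_english(name))
```

Keeping the threshold as a parameter makes it easy to try a stricter or looser rule without rewriting the function.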


---

We will use this new function to filter out non-English apps from both datasets. We'll loop through each dataset. If an app name is identified as English, we will append the whole row to a separate list.

We also know that there are no duplicates in the iOS dataset, meaning we can check the raw data directly for non-English apps.

In [15]:
english_only_apps_android = []
for row in android_clean:
    name = row[0]
    if is_this_english(name):
        english_only_apps_android.append(row)
        
print('The number of unique English-only ' 
      'apps for Android is:',len(english_only_apps_android))        
        
print('\n')        
                
    
english_only_apps_ios = []
for row in ios_data[1:]:
    if is_this_english(row[1]):
        english_only_apps_ios.append(row)        
        
print('The number of unique English-only '
      'apps for ios is:', len(english_only_apps_ios))        

The number of unique English-only apps for Android is: 9614


The number of unique English-only apps for ios is: 6183


Our datasets contain both free and non-free apps; we'll need to isolate only the free apps for our analysis.


In [18]:
android_free = []
ios_free = []

for row in english_only_apps_android:
    if row[7] == '0':
        android_free.append(row)
        
print('The number of free unique English-only '
      'apps for Android is:', len(android_free))          
        
for row in english_only_apps_ios:
    if  row[4]== '0.0':
        ios_free.append(row)

print('The number of free unique English-only '
      'apps for ios is:', len(ios_free))      

The number of free unique English-only apps for Android is: 8862
The number of free unique English-only apps for ios is: 3222
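
The cell above compares price strings directly (`'0'` for Android, `'0.0'` for iOS), which works only as long as the free price is always written exactly that way. A slightly more defensive sketch converts the price to a number first. The helpers `parse_price` and `is_free` are hypothetical names introduced here, and the sketch assumes prices look like the ones printed earlier in this notebook (`'0'`, `'0.0'`, `'$12.99'`).

```python
def parse_price(raw):
    """Convert a price string such as '0', '0.0' or '$12.99' to a float."""
    return float(raw.replace('$', '').strip())

def is_free(raw):
    """Return True when the parsed price is zero."""
    return parse_price(raw) == 0.0

for price in ['0', '0.0', '$0.99', '$12.99']:
    print(repr(price), '->', parse_price(price), is_free(price))
```

With this helper, the same test could be applied to both datasets, for example `is_free(row[7])` for Android rows and `is_free(row[4])` for iOS rows.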


---


New goal: determine the kinds of free apps that draw in more users. More users means more revenue from ads.

To minimize risk, our validation strategy for an app idea has three steps:

1) Build a minimal Android version of the app and add it to Google Play.
2) Develop it further if it draws users.
3) If the app is profitable after six months, build an iOS version for the App Store.


We will look through the cleaned data to see what type of app is likely to succeed.

The columns most helpful for this objective are the following:

In the Google data: Genres, Category

In the Apple data: prime_genre

---


In [39]:
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percent = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percent[key] = percentage 
    
    return table_percent

    
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table: #this is genre and percentage table generated from other function
        tuple_pair = (table[key], key)
        #'Key' is the genre 
        # 'table[key]', is the percentage associated with that genre
        
        table_display.append(tuple_pair)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])    
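
For comparison, the standard library's `collections.Counter` can do the counting and sorting in fewer lines. This is a sketch, not a replacement for the functions above; the name `freq_table_counter` and the toy rows are assumptions made for illustration.

```python
from collections import Counter

def freq_table_counter(dataset, index):
    """Return {value: percentage} for one column, most common value first."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return {key: count / total * 100 for key, count in counts.most_common()}

# Toy rows (name, category); the values are made up.
sample = [['a', 'GAME'], ['b', 'GAME'], ['c', 'TOOLS'], ['d', 'GAME']]
table = freq_table_counter(sample, 1)
for key, percentage in table.items():
    print(key, ':', percentage)
```

`Counter.most_common()` returns the entries sorted by count in descending order, so no separate list of tuples is needed for sorting.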
    

Now, generate frequency tables for the columns prime_genre, Genres, and Category.

In [38]:
# Apple Prime Genre Freq Table 

display_table(ios_free, -5)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


In [41]:
# Google Genres Freq Table 

display_table(android_free, -4)

Tools : 8.429248476641842
Entertainment : 6.070864364703228
Education : 5.348679756262695
Business : 4.5926427443015125
Productivity : 3.8930264048747465
Lifestyle : 3.8930264048747465
Finance : 3.7011961182577298
Medical : 3.5206499661475967
Sports : 3.4642292936131795
Personalization : 3.3175355450236967
Communication : 3.238546603475513
Action : 3.1031369893929135
Health & Fitness : 3.080568720379147
Photography : 2.945159106296547
News & Magazines : 2.798465357707064
Social : 2.663055743624464
Travel & Local : 2.324531708417964
Shopping : 2.2455427668697814
Books & Reference : 2.143985556307831
Simulation : 2.0424283457458814
Dating : 1.8618821936357481
Arcade : 1.8505980591288649
Video Players & Editors : 1.7716091175806816
Casual : 1.7603249830737984
Maps & Navigation : 1.399232678853532
Food & Drink : 1.2412547957571656
Puzzle : 1.128413450688332
Racing : 0.9930038366057323
Role Playing : 0.9365831640713158
Libraries & Demo : 0.9365831640713158
Auto & Vehicles : 0.92529902956443

In [42]:
# Google Category Freq Table 

display_table(android_free, 1)

FAMILY : 18.449559918754233
GAME : 9.873617693522906
TOOLS : 8.440532611148726
BUSINESS : 4.5926427443015125
LIFESTYLE : 3.9043105393816293
PRODUCTIVITY : 3.8930264048747465
FINANCE : 3.7011961182577298
MEDICAL : 3.5206499661475967
SPORTS : 3.39652448657188
PERSONALIZATION : 3.3175355450236967
COMMUNICATION : 3.238546603475513
HEALTH_AND_FITNESS : 3.080568720379147
PHOTOGRAPHY : 2.945159106296547
NEWS_AND_MAGAZINES : 2.798465357707064
SOCIAL : 2.663055743624464
TRAVEL_AND_LOCAL : 2.335815842924848
SHOPPING : 2.2455427668697814
BOOKS_AND_REFERENCE : 2.143985556307831
DATING : 1.8618821936357481
VIDEO_PLAYERS : 1.782893252087565
MAPS_AND_NAVIGATION : 1.399232678853532
EDUCATION : 1.2863913337846988
FOOD_AND_DRINK : 1.2412547957571656
ENTERTAINMENT : 1.128413450688332
LIBRARIES_AND_DEMO : 0.9365831640713158
AUTO_AND_VEHICLES : 0.9252990295644324
HOUSE_AND_HOME : 0.8350259535093659
WEATHER : 0.8011735499887158
EVENTS : 0.7109004739336493
ART_AND_DESIGN : 0.6770480704129994
PARENTING : 0.65

---


Now let's take a look at these tables and see what we can conclude: 
 
**For the Prime Genre in iOS**
- Games are the most common genre, at about 58% of free apps.
- Entertainment (about 8%) and Photo & Video (about 5%) come in second and third. All three genres are geared toward personal enjoyment rather than personal growth; genres focused on growth would be Productivity, Education, Finance, etc. It's easy to be lazy, and if the App Store encourages that by making more "lazy-generating" apps available, it could slowly create a productivity divide and, in the long run, widen the income gap. This statement may be far-fetched, but the scenario is plausible.
- However, we are only looking at the availability of apps, not the actual time each one is used. We would need more data to establish any correlation with a scenario like the one described.


**For the Genres and Category in the Google Play Store**

- Tools, Entertainment, Games and Family are the most popular. However, games do not take up nearly as large a share as they do on the App Store; the Play Store has more apps for practical purposes. The Genres column is also very granular, which makes a comparison with the App Store a bit harder. More information is necessary.

- Again, these tables only show the types of free apps available. They show nothing about the amount of time spent on the apps.



---

GOAL: Determine the kind of apps with the most users.

In [79]:
# iOS App Store: calculate the average number of user ratings per app genre

genres_ios = freq_table(ios_free, -5)

for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in ios_free:
        genre_app = app[-5]
        if genre_app == genre:            
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
         
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)




Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0


Navigation apps have the highest average number of user ratings. Because Waze and Google Maps contribute heavily to this figure, it would be best to recalculate the averages without those two apps.
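
One way such a recalculation could look is sketched below: a single pass that accumulates totals per genre and skips a set of excluded app names. The function `average_by_group`, its parameters, and the toy rows are assumptions made for illustration; the numbers are invented and are not taken from the datasets.

```python
def average_by_group(rows, group_index, value_index, name_index, exclude=()):
    """Average a numeric column per group, skipping rows whose name is excluded."""
    excluded = set(exclude)
    totals = {}
    counts = {}
    for row in rows:
        if row[name_index] in excluded:
            continue
        group = row[group_index]
        totals[group] = totals.get(group, 0.0) + float(row[value_index])
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}

# Toy rows (name, genre, rating count); the values are made up.
sample = [
    ['Big Maps', 'Navigation', '900'],
    ['Small Maps', 'Navigation', '100'],
    ['Notes', 'Productivity', '50'],
]
with_all = average_by_group(sample, 1, 2, 0)
without_big = average_by_group(sample, 1, 2, 0, exclude=['Big Maps'])
print(with_all)
print(without_big)
```

For the iOS data in this notebook, the call would presumably be `average_by_group(ios_free, -5, 5, 1, exclude=[...])`, with the exact app names copied from the dataset.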



In [84]:
categories_android = freq_table(android_free,1)

for category in categories_android:
    total = 0 
    len_category = 0 
    for app in android_free:
        category_app = app[1]
        if category_app == category:
            num_installs = app[5]
            num_installs = num_installs.replace('+','')
            num_installs = num_installs.replace(',','')
            total += float(num_installs)
            len_category += 1
    avg_n_installs = total/len_category
    print(category, ':', avg_n_installs)
            
            
            


ART_AND_DESIGN : 1905351.6666666667
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 3082017.543859649
ENTERTAINMENT : 21134600.0
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1313681.9054054054
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15837565.085714286
FAMILY : 2691618.159021407
MEDICAL : 120616.48717948717
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17805627.643678162
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10695245.286096256
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24852732.40506329
NEWS_AND_MAGAZINE

Communication apps seem to have the most installs on average. The largest players in this category (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts) seem to hold most of the share, so the average is skewed toward them.

It would be best to avoid developing an app in a space dominated by giants. The books and reference category does not seem to be heavily skewed toward a small handful of companies, meaning there is potential to become a big player in that space. The output of the code below shows the reasoning:

In [86]:
for app in android_free:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+'
                                            or app[5] == '500,000,000+'
                                            or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

Wattpad 📖 Free Books : 100,000,000+
Amazon Kindle : 100,000,000+
Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Audiobooks from Audible : 100,000,000+
