## Title: Predicting the Features That Make a Free App Attractive to Users on Google Play and the App Store

**Goal**: To find the features that make an app attractive to users. The app will be published on both Google Play (Android) and the Apple App Store (iOS). The app is free and still in development, and the more users download it, the more revenue the company collects. We therefore need to find the deciding factors that make an app appeal to more users.


### Explore the Dataset

By September 2018, there were around 2 million iOS apps on the App Store and 2.1 million Android apps on Google Play. Collecting data for all of these roughly 4 million apps would be time-consuming and resource-intensive, so we use two sample datasets instead:

- `googleplaystore.csv`: ten thousand Android apps from Google Play
- `AppleStore.csv` : seven thousand iOS apps from App Store

To explore the data, we define a function named `explore_data()` that takes the dataset, the indices of the start and end rows, and an optional Boolean parameter that also prints the number of rows and columns.

In [58]:
#define a function to explore a dataset
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') 

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
        print('\n') 

In [59]:
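from csv import reader

def read_dataset(path):
    #open the file (UTF-8 so that non-English app names are decoded correctly)
    with open(path, encoding='utf8', newline='') as f:
        data = list(reader(f))

    #return the header row and the data rows separately
    return data[0], data[1:]

In [60]:
#read data files
ios_header, ios_data = read_dataset('AppleStore.csv')
android_header, android_data = read_dataset('googleplaystore.csv')

#shorter names used in the rest of the notebook
ios = ios_data
android = android_data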

In [61]:
print('--------------IOS dataset---------------','\n')

explore_data(ios_data,0,4, True)

print('--------------Android dataset---------------','\n')
explore_data(android_data,0,4,True)

--------------IOS dataset--------------- 

['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']


['4', '282614216', 'eBay: Best App to Buy, Sell, Save! Online Shopping', '128512000', 'USD', '0', '262241', '649', '4', '4.5', '5.10.0', '12+', 'Shopping', '37', '5', '9', '1']


Number of rows: 7197
Number of columns: 17


--------------Android dataset--------------- 

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', '

In [62]:
#column names of the data set
print(ios_header)
print('\n')
print(android_header)

['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


#### Columns that could be useful from the datasets:

1. ios: `prime_genre, cont_rating, user_rating_ver, size_bytes`
2. android: `Size, Genres, Rating, Reviews`

### Data Cleaning
We are interested only in free apps whose names are in English, so we keep only the data points that satisfy both requirements.

In [67]:
print(android_header,'\n')
print(android[10472],'\n')
print(android[10471])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] 

['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up'] 

['Xposed Wi-Fi-Pwd', 'PERSONALIZATION', '3.5', '1042', '404k', '100,000+', 'Free', '0', 'Everyone', 'Personalization', 'August 5, 2014', '3.0.0', '4.0.3 and up']


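**Row 10472 of the Android data (not counting the header) is malformed**: it has no `Genres` value and fewer columns than the header, so we delete that row. Run the deletion cell only once, since every run removes whatever row currently sits at index 10472. The row printed at index 10472 above looks complete, most likely because the deletion cell (`In [12]`) had already been run when that output was produced.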

In [12]:
# delete that row
del android[10472]
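Rather than spotting malformed rows by eye, one could list every row whose length differs from the header's length. The sketch below uses a hypothetical helper, `find_malformed_rows`, which is not part of the original notebook; run on the original `android` list with `android_header`, it should flag the row deleted above, since that row has fewer columns than the header.

```python
def find_malformed_rows(header, dataset):
    """Return the indices of rows whose length differs from the header's length.

    Hypothetical helper: a row with a missing value has fewer columns than the
    header, which shifts every later value one column to the left.
    """
    expected = len(header)
    return [i for i, row in enumerate(dataset) if len(row) != expected]


# toy example with three columns; the second data row is one value short
toy_header = ['App', 'Category', 'Rating']
toy_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '19'],
    ['App C', 'GAME', '4.5'],
]
print(find_malformed_rows(toy_header, toy_rows))  # [1]
```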

## Counting duplicate apps in the Android data
The Google Play dataset contains many duplicate entries. For example, Instagram appears four times:

In [69]:
#duplicate entries for Instagram in the Google Play data
for app in android:
    if app[0] == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


In [70]:
#make two lists: the first occurrence of each app name, and every repeated occurrence
duplicate_apps = []
unique_apps= []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

In [71]:
print('Number of unique apps : ', len(unique_apps))
print('Number of duplicate apps : ', len(duplicate_apps))

Number of unique apps :  9659
Number of duplicate apps :  1181


In [74]:
print('Some of the duplicate apps in Android: {}'.format(duplicate_apps[:10]))

Some of the duplicate apps in Android: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']


### Creating a frequency table for duplicate apps

In [79]:
# create the dictionary
duplicate = {}

#every duplicated name also appears once in unique_apps, so start each count at 1
for app in duplicate_apps:
    duplicate[app] = 1

#add one for every repeated occurrence in the list
for app in duplicate_apps:
    duplicate[app] += 1

#check for Instagram       
print('duplicate apps for Instagram :', duplicate['Instagram'] ,'\n')

duplicate apps for Instagram : 4 
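The same frequency table can be built in a single pass with `collections.Counter` from the standard library. This is a sketch: `duplicate_counts` is a hypothetical helper name, and it assumes the app name sits in column 0 as in the Google Play data. Like the `duplicate` dictionary above, it counts every entry of a duplicated app, so Instagram would get 4 on the real data.

```python
from collections import Counter


def duplicate_counts(dataset, name_index=0):
    """Map each app name that appears more than once to its total number of entries."""
    counts = Counter(row[name_index] for row in dataset)
    return {name: n for name, n in counts.items() if n > 1}


# toy rows shaped like the Google Play data (only the name column matters here)
rows = [['Instagram'], ['Box'], ['Instagram'], ['Slack'], ['Instagram'], ['Box']]
print(duplicate_counts(rows))  # {'Instagram': 3, 'Box': 2}
```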



In [78]:
duplicate

{'Quick PDF Scanner + OCR FREE': 3,
 'Box': 3,
 'Google My Business': 3,
 'ZOOM Cloud Meetings': 2,
 'join.me - Simple Meetings': 3,
 'Zenefits': 2,
 'Google Ads': 3,
 'Slack': 3,
 'FreshBooks Classic': 2,
 'Insightly CRM': 2,
 'QuickBooks Accounting: Invoicing & Expenses': 3,
 'HipChat - Chat Built for Teams': 2,
 'Xero Accounting Software': 2,
 'MailChimp - Email, Marketing Automation': 2,
 'Crew - Free Messaging and Scheduling': 2,
 'Asana: organize team projects': 2,
 'Google Analytics': 2,
 'AdWords Express': 2,
 'Accounting App - Zoho Books': 2,
 'Invoice & Time Tracking - Zoho': 2,
 'Invoice 2go — Professional Invoices and Estimates': 2,
 'SignEasy | Sign and Fill PDF and other Documents': 2,
 'Genius Scan - PDF Scanner': 2,
 'Tiny Scanner - PDF Scanner App': 2,
 'Fast Scanner : Free PDF Scan': 2,
 'Mobile Doc Scanner (MDScan) Lite': 2,
 'TurboScan: scan documents and receipts in PDF': 2,
 'Tiny Scanner Pro: PDF Doc Scan': 2,
 'Docs To Go™ Free Office Suite': 2,
 'OfficeSuite : 

### Removing duplicate apps
Instead of removing duplicates at random, we keep only the entry with the highest number of user reviews, assuming that the entry with the most reviews is the most recent one.

Create a dictionary where each key is a unique app name and the corresponding dictionary value is the highest number of reviews of that app.

In [None]:
# dictionary: app name -> highest number of reviews
unique_app_dict = {}

for app in android:
    name = app[0]
    n_reviews = float(app[3])

    #keep only the highest review count seen for each app name
    if name not in unique_app_dict or unique_app_dict[name] < n_reviews:
        unique_app_dict[name] = n_reviews
        
print('length of unique_app_dict = ', len(unique_app_dict),'\n')

#check the total counts of unique_apps with the new dictionary length
print(len(unique_apps) == len(unique_app_dict))


The `unique_app_dict` dictionary now maps every unique app name to its highest number of reviews.

### A function that returns each app's name and its maximum number of reviews/ratings

In [None]:
# function returning a dictionary: app name -> maximum number of reviews/ratings
def maximum_review(dataset, name_index, reviews_index):
    unique_app_dict = {}

    for app in dataset:
        name = app[name_index]
        n_reviews = float(app[reviews_index])
        #keep only the highest review count seen for each app name
        if name not in unique_app_dict or unique_app_dict[name] < n_reviews:
            unique_app_dict[name] = n_reviews

    return unique_app_dict

Passing the column positions as arguments avoids comparing the whole dataset with `ios` on every iteration. In `AppleStore.csv` the app name is `track_name` (index 2; index 1 is the numeric `id`) and the number of ratings is `rating_count_tot` (index 6). In `googleplaystore.csv` they are `App` (index 0) and `Reviews` (index 3).

Now we use the dictionary of maximum reviews to remove the duplicate rows and build a clean dataset.
Why did I get 10054 entries (instead of 9659) when I filled the clean list using only the condition `if name in unique_app_dict and n_reviews == unique_app_dict[name]`? Because some apps have several rows that share the same maximum review count (Instagram, for example, appears twice with 66577313 reviews), and that condition appends every one of them. Also checking the names that were already added ensures that each app is appended only once.

In [20]:
#for ios: track_name is at index 2, rating_count_tot at index 6
ios_clean = []
already_added_ios = set()

unique_app_dict_ios = maximum_review(ios, 2, 6)
for app in ios:
    name = app[2]
    n_reviews = float(app[6])

    #keep the first row whose review count equals the app's maximum
    if name not in already_added_ios and n_reviews == unique_app_dict_ios[name]:
        ios_clean.append(app)
        already_added_ios.add(name)

print('ios clean = ', len(ios_clean))
print('ios  = ', len(ios))

ios clean =  7195
ios  =  7197


iOS duplicate app information: 2 duplicate app names (`Mannequin Challenge` and `VR Roller Coaster`), leaving 7195 unique names.

In [21]:
#new dataset list

# For android: keep one row per app, the one with its maximum number of reviews
android_clean = []
already_added = set()

for app in android:
    name = app[0]
    n_reviews = float(app[3])

    #if the app has not been added yet and this row has its maximum review count
    if name not in already_added and n_reviews == unique_app_dict[name]:
        android_clean.append(app)
        already_added.add(name)
        
print(len(android_clean))

9659
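The two passes above (build the dictionary of maximum reviews, then filter) can also be folded into a single pass that keeps, for each name, the first row with the highest review count. A minimal sketch: `keep_max_reviews` and its index parameters are illustrative names, not part of the original notebook, and the rows are toy data. Because ties keep the first row seen, each app appears exactly once, which is the behaviour the `already_added` check provides; the order of the returned rows may differ from `android_clean`.

```python
def keep_max_reviews(dataset, name_index, reviews_index):
    """Return one row per app name: the first row with that name's highest review count."""
    best = {}  # app name -> row with the most reviews seen so far
    for row in dataset:
        name = row[name_index]
        reviews = float(row[reviews_index])
        # strict '>' means a later row with an equal count does not replace the earlier one
        if name not in best or reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# toy rows shaped like the Google Play data: [App, Category, Rating, Reviews]
rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Box', 'BUSINESS', '4.2', '159872'],
]
clean = keep_max_reviews(rows, name_index=0, reviews_index=3)
print(len(clean))  # 2
```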


Next, we write a function that takes a string and returns `False` if any character in it falls outside the set of common English characters (ASCII, code points 0 to 127); otherwise it returns `True`.

-  <font color=Blue> If I write the following code, the function returns as soon as it has checked the first character of the name, so the loop **never** checks the remaining characters. </font>

``` python
def check_language(name):
    for char in name:
        print('char = ', char)
        if ord(char) <= 127:
            return True
        else:
            return False
```
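-  So the correct code checks every character and returns `True` only after the loop has finished (**pay attention to the print statement**: it prints every character that is checked):
```python
def check_language(name):
    for char in name:
        print('char = ', char)
        if ord(char) > 127:
            return False
    # reached only when every character is ASCII
    return True
```
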
In [22]:
# function to detect non-English app names
def check_language(name):
    for char in name:
        if ord(char) > 127:
            return False
    # without this line the function returns None for an all-ASCII name
    return True

print(check_language('Instagram'))
print(check_language('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(check_language('Docs To Go™ Free Office Suite'))
print(check_language('Instachat 😜'))

True
False
False
False


There is still a problem: the function labels 'Docs To Go™ Free Office Suite' and 'Instachat 😜' as non-English, because the characters ™ and 😜 have code points far above 127 (`ord('™')` is 8482 and `ord('😜')` is 128540). If we use this function as it is, we will lose useful data, since many English apps will be incorrectly labeled as non-English. To limit the data loss, we remove an app only if its name contains more than three characters outside the ASCII range. This way, English apps with up to three emoji or other special characters are still labeled as English. The filter is still not perfect, but it should be fairly effective.

In [23]:
def check_language(name):
    count = 0
    for char in name:
        #print('char = ', char)
        if ord(char) > 127:
            count += 1
            #print('count =', count)
    if count > 3:
        return False
    else:
        return True
print(check_language('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(check_language('Docs To Go™ Free Office Suite'))
print(check_language('Instachat 😜'))  

False
True
True
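The threshold rule can be written more compactly by counting the non-ASCII characters with a generator expression. In this sketch `is_english` and its `max_non_ascii` parameter are hypothetical names; with the default of 3 it should return the same results as `check_language` above, while making the threshold easy to change.

```python
def is_english(name, max_non_ascii=3):
    """Treat a name as English unless it has more than `max_non_ascii` non-ASCII characters."""
    non_ascii = sum(1 for char in name if ord(char) > 127)
    return non_ascii <= max_non_ascii


print(is_english('Docs To Go™ Free Office Suite'))    # True (1 non-ASCII character)
print(is_english('Instachat 😜'))                      # True (1 non-ASCII character)
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))       # False
```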


Use the new function to filter out non-English apps from both data sets. Loop through each data set. If an app name is identified as English, append the whole row to a separate list.

In [24]:
#remove non-English app name 

## for android_clean data
android_eng = []
for app in android_clean:
    #print(app[0])
    if check_language(app[0]):
        android_eng.append(app)
        
print('android', len(android_eng))

##for ios: the app name (track_name) is at index 2; index 1 is the numeric id
ios_eng = []
for app in ios:
    if check_language(app[2]):
        ios_eng.append(app)

print('ios', len(ios_eng))

android 9614
ios 6183


In [25]:
# explore the data sets
print('android data \n')
explore_data(android_eng,0,3)
print('ios data \n')
explore_data(ios_eng,0,3)

android data 

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


ios data 

['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5'

**Observation** : 
- The clean **android** dataset (after removing the duplicate apps) contains 9659 apps, of which 9614 have English names.
- For **ios**, 6183 of the 7197 apps have English names.

### Next: remove the non-free apps

In [26]:
android_free = []
ios_free = []

##android
for app in android_eng:
    price = app[7]
    if price == '0' :
        android_free.append(app)
##ios: the price is at index 5 (index 4 is the currency, e.g. 'USD');
##compare it as a number so that both '0' and '0.0' count as free
for app in ios_eng:
    price = float(app[5])
    if price == 0:
        ios_free.append(app)

print('Free ios apps  = ', len(ios_free))
print('Free android apps = ', len(android_free))

Free android apps =  8864
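Comparing prices as strings is fragile: depending on the export, a free price may be written '0' or '0.0', and Google Play prices for paid apps carry a dollar sign (for example '$4.99'). The sketch below shows a numeric check; `is_free` is a hypothetical helper and assumes every price is either a plain number or a number prefixed with '$'.

```python
def is_free(price):
    """Return True if a price string such as '0', '0.0' or '$0.00' equals zero."""
    # drop surrounding whitespace and a leading dollar sign, then compare numerically
    return float(price.strip().lstrip('$')) == 0


for price in ['0', '0.0', '$0.00', '$4.99', '3.99']:
    print(price, is_free(price))
```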


The aim is to determine the kinds of apps that are more attractive to users because our revenue is highly influenced by the number of people using our apps.

To minimize risks and overhead, our validation strategy for an app idea consists of three steps:

- Build a minimal Android version of the app, and add it to Google Play.
- If the app has a good response from users, we develop it further.
- If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

Because our end goal is to add the app on both Google Play and the App Store, we need to find app profiles that are successful on both markets.

### Creating a frequency table to identify the deciding factors for an attractive app: genre and percentage frequency

In [27]:
def generate_frequency(dataset, index):
    genres = {}
    entries = 0
    
    for app in dataset:
        # assign value
        genre = app[index]
        #total entry in the dataset
        entries += 1   
        
        ##fill the dictionary    
        if genre in genres:
            #print(genre)
            genres[genre] +=1
        else:
            genres[genre] =1
    #print(genres) 
    
    #genres percentage frequency table
    genres_percentage = {}
    
    for name in genres:
        fraction = genres[name]/entries
        genres_percentage[name] = fraction*100 
        
    return genres_percentage

In [28]:
ios_genres = generate_frequency(ios_free, 12)          # index 12 is prime_genre (11 is cont_rating)
android_genres = generate_frequency(android_free, 9)   # index 9 is Genres

print(ios_genres)


### arrange the frequency table items in descending order of popularity

In [29]:
def return_value(pairs):
        new_pairs = ()
        for key in pairs:
            #print(key, pairs[key])
            new_pairs = (pairs[key],key)
            print(new_pairs[0], new_pairs[1])
        return new_pairs

In [30]:
def display_frequency(dataset, index):

    #call the frequency table
    pairs = generate_frequency(dataset, index)

    #turn each (key, value) into a (value, key) tuple so that sorting orders by percentage
    new_pairs = []
    for key in pairs:
        new_pairs.append((pairs[key], key))
    display = sorted(new_pairs, reverse=True)

    # print it nicely in order
    for key in display:
        print(key[1], ' : ', key[0])

In [31]:
display_frequency(ios_free, 12)    # index 12 is prime_genre

**Analyze the frequency table you generated for the prime_genre column of
the App Store data set.**

- What is the most common genre? What is the runner-up?
Among free English apps, Games is by far the most common genre (58%), followed by Entertainment (7%) and Photo & Video (5%).
- What other patterns do you see?
Most surprising to me, Education (3.66%) and Social Networking (3.28%) have similar shares, whereas Shopping, Sports and Utilities apps make up only small percentages.

- What is the general impression — are most of the apps designed for practical purposes (education, shopping, utilities, productivity, lifestyle) or more for entertainment (games, photo and video, social networking, sports, music)?
Most of the apps are designed for entertainment rather than for practical purposes.

- Can you recommend an app profile for the App Store market based on this frequency table alone? If there's a large number of apps for a particular genre, does that also imply that apps of that genre generally have a large number of users?
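I think games are popular with English-speaking users, which may be why the App Store has so many of them and why companies keep producing them for revenue. However, this table counts apps, not users: a large number of apps in a genre does not by itself imply that those apps have a large number of users (it may also mean strong competition), so the recommendation needs to be checked against user numbers.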

## Why couldn't I sort with `key=return_value(pairs)`?
```
<ipython-input-128-351a0432a3c4> in display_frequency(pairs)
      8 
      9     #rearranged = return_value(pairs)
---> 10     values = sorted(pairs,key=return_value(pairs), reverse=True)
     11     print(values)
     12 #     new_pairs = []

TypeError: 'tuple' object is not callable
```
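The error occurs because `key=return_value(pairs)` calls `return_value` immediately and passes its result, a tuple, to `sorted`. The `key` argument must be a callable that `sorted` applies to each element, so `sorted` fails when it tries to call that tuple. That's why the `key` argument could not be used like this: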
```
values = sorted(pairs,key=return_value(pairs), reverse=True)
```
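The fix is to hand `sorted` a function rather than the result of calling one. In this minimal sketch `sorted_by_value` is a hypothetical helper, and the numbers are rounded values copied from the Google Play `Genres` table in this notebook. Either form could replace the manual (value, key) tuples in `display_frequency`.

```python
def sorted_by_value(table):
    """Return (key, value) pairs ordered from the largest value to the smallest."""
    # `key` receives a function; sorted() calls it on every (key, value) item
    return sorted(table.items(), key=lambda item: item[1], reverse=True)


# a few rounded values from the Google Play Genres table
genres = {'Education': 5.35, 'Tools': 8.45, 'Business': 4.59, 'Entertainment': 6.07}

for name, percentage in sorted_by_value(genres):
    print(name, ' : ', percentage)

# alternatively, sort only the keys by their values: dict.get is already a callable
print(sorted(genres, key=genres.get, reverse=True))
# ['Tools', 'Entertainment', 'Education', 'Business']
```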

In [32]:
display_frequency(android_free,9)

Tools  :  8.449909747292418
Entertainment  :  6.069494584837545
Education  :  5.347472924187725
Business  :  4.591606498194946
Productivity  :  3.892148014440433
Lifestyle  :  3.892148014440433
Finance  :  3.7003610108303246
Medical  :  3.531137184115524
Sports  :  3.463447653429603
Personalization  :  3.3167870036101084
Communication  :  3.2378158844765346
Action  :  3.1024368231046933
Health & Fitness  :  3.0798736462093865
Photography  :  2.944494584837545
News & Magazines  :  2.7978339350180503
Social  :  2.6624548736462095
Travel & Local  :  2.3240072202166067
Shopping  :  2.2450361010830324
Books & Reference  :  2.1435018050541514
Simulation  :  2.0419675090252705
Dating  :  1.861462093862816
Arcade  :  1.8501805054151623
Video Players & Editors  :  1.7712093862815883
Casual  :  1.7599277978339352
Maps & Navigation  :  1.3989169675090252
Food & Drink  :  1.2409747292418771
Puzzle  :  1.128158844765343
Racing  :  0.9927797833935018
Role Playing  :  0.9363718411552346
Libraries & D

In [33]:
# category
display_frequency(android_free,1)

FAMILY  :  18.907942238267147
GAME  :  9.724729241877256
TOOLS  :  8.461191335740072
BUSINESS  :  4.591606498194946
LIFESTYLE  :  3.9034296028880866
PRODUCTIVITY  :  3.892148014440433
FINANCE  :  3.7003610108303246
MEDICAL  :  3.531137184115524
SPORTS  :  3.395758122743682
PERSONALIZATION  :  3.3167870036101084
COMMUNICATION  :  3.2378158844765346
HEALTH_AND_FITNESS  :  3.0798736462093865
PHOTOGRAPHY  :  2.944494584837545
NEWS_AND_MAGAZINES  :  2.7978339350180503
SOCIAL  :  2.6624548736462095
TRAVEL_AND_LOCAL  :  2.33528880866426
SHOPPING  :  2.2450361010830324
BOOKS_AND_REFERENCE  :  2.1435018050541514
DATING  :  1.861462093862816
VIDEO_PLAYERS  :  1.7937725631768955
MAPS_AND_NAVIGATION  :  1.3989169675090252
FOOD_AND_DRINK  :  1.2409747292418771
EDUCATION  :  1.1620036101083033
ENTERTAINMENT  :  0.9589350180505415
LIBRARIES_AND_DEMO  :  0.9363718411552346
AUTO_AND_VEHICLES  :  0.9250902527075812
HOUSE_AND_HOME  :  0.8235559566787004
WEATHER  :  0.8009927797833934
EVENTS  :  0.7107400

**Analyze the frequency table you generated for the Category and Genres column of the Google Play data set.**

- What are the most common genres?
Tools (8.5%) and Entertainment (6%) are the most common genres.
- What other patterns do you see?

Entertainment: 6.06%, Education: 5.34%, Business: 4.59%, Productivity: 3.89%, Lifestyle: 3.89%, Finance: 3.70%, Medical: 3.53%, Sports: 3.46%.
Apart from Tools, Google Play offers apps of many different genres in similar percentages, with practical apps about as common as fun apps.

For the `Category` column, FAMILY makes up about 18.9% of the apps, followed by GAME (9.7%) and TOOLS (8.5%).
The `Genres` column has more distinct values than the `Category` column, so genres segment the apps in more detail than categories do.

- Compare the patterns you see for the Google Play market with those you saw for the App Store market.
Google Play has a larger share of practical (utility) apps, whereas the App Store is dominated by games.
- Can you recommend an app profile based on what you found so far? Do the frequency tables you generated reveal the most frequent app genres or what genres have the most users?
Google Play is more versatile: its English apps span many categories (family, utilities, tools, etc.) in similar percentages. Note that these tables show how frequent each genre is, not which genres have the most users.
**Open question: why do games appear to be part of the FAMILY category?**

## What kinds of apps are most popular?

Let's start with calculating the average number of user ratings per app genre on the App Store. To do that, we'll need to:

- Isolate the apps of each genre.
- Sum up the user ratings for the apps of that genre.
- Divide the sum by the number of apps belonging to that genre (not by the total number of apps).

To calculate the average number of user ratings for each genre, we'll use a for loop inside of another for loop.

In [34]:
def user_response(dataset, genre_index, install_index):
    #genre -> average number of installs (or ratings)
    app_response = {}

    #the frequency table provides the list of distinct genres
    genres = generate_frequency(dataset, genre_index)

    #loop over each genre
    for genre in genres:

        total_install = 0
        app_count = 0

        #sum the installs of the apps belonging to this genre only
        for app in dataset:
            if app[genre_index] == genre:
                #Google Play installs look like '10,000+': strip '+' and ','
                #(App Store rating counts such as '21292' are unaffected)
                app_install = app[install_index].replace('+', '').replace(',', '')
                total_install += float(app_install)
                app_count += 1

        #divide by the number of apps in this genre, not by the total number of apps
        app_response[genre] = total_install / app_count

    return app_response
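The nested loop in `user_response` scans the entire dataset once per genre. A single pass that keeps a running sum and count per genre gives the same averages. The sketch below assumes, as `user_response` does, that installs may contain '+' and ',' while App Store rating counts are plain numbers; `parse_count` and `average_by_group` are hypothetical helper names, and the rows are toy data. Keep in mind that Google Play install figures are lower bounds ('10,000+' means at least 10,000), so these averages are approximate.

```python
from collections import defaultdict


def parse_count(value):
    """Convert '10,000+' (Google Play installs) or '21292' (App Store ratings) to a float."""
    return float(value.replace('+', '').replace(',', ''))


def average_by_group(dataset, group_index, value_index):
    """Average value per group, computed in a single pass over the dataset."""
    totals = defaultdict(float)  # group -> sum of values
    counts = defaultdict(int)    # group -> number of apps in the group
    for row in dataset:
        group = row[group_index]
        totals[group] += parse_count(row[value_index])
        counts[group] += 1
    # divide each group's sum by that group's own app count
    return {group: totals[group] / counts[group] for group in totals}


# toy rows shaped like Google Play data: [App, Category, Installs]
rows = [
    ['A', 'TOOLS', '1,000+'],
    ['B', 'TOOLS', '3,000+'],
    ['C', 'GAME', '10,000,000+'],
]
print(average_by_group(rows, group_index=1, value_index=2))
# {'TOOLS': 2000.0, 'GAME': 10000000.0}
```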

In [35]:
#Google Play: Category is at index 1, Installs at index 5
android_apps = user_response(android_free, 1, 5)

app_list = []

for genre in android_apps:
    app_tuple = (android_apps[genre], genre)
    app_list.append(app_tuple)

genre_sorted = sorted(app_list, reverse=True)

#now print according to popularity
for genre in genre_sorted:
    print(genre[1], ' : ', genre[0])

COMMUNICATION  :  38456119.167247385
VIDEO_PLAYERS  :  24727872.452830188
SOCIAL  :  23253652.127118643
PHOTOGRAPHY  :  17840110.40229885
PRODUCTIVITY  :  16787331.344927534
GAME  :  15588015.603248259
TRAVEL_AND_LOCAL  :  13984077.710144928
ENTERTAINMENT  :  11640705.88235294
TOOLS  :  10801391.298666667
NEWS_AND_MAGAZINES  :  9549178.467741935
BOOKS_AND_REFERENCE  :  8767811.894736841
SHOPPING  :  7036877.311557789
PERSONALIZATION  :  5201482.6122448975
WEATHER  :  5074486.197183099
HEALTH_AND_FITNESS  :  4188821.9853479853
MAPS_AND_NAVIGATION  :  4056941.7741935486
FAMILY  :  3695641.8198090694
SPORTS  :  3638640.1428571427
ART_AND_DESIGN  :  1986335.0877192982
FOOD_AND_DRINK  :  1924897.7363636363
EDUCATION  :  1833495.145631068
BUSINESS  :  1712290.1474201474
LIFESTYLE  :  1437816.2687861272
FINANCE  :  1387692.475609756
HOUSE_AND_HOME  :  1331540.5616438356
DATING  :  854028.8303030303
COMICS  :  817657.2727272727
AUTO_AND_VEHICLES  :  647317.8170731707
LIBRARIES_AND_DEMO  :  638

### Analyze the results and try to come up with at least one app profile recommendation for the App Store. Note that there's no fixed answer here, and it's perfectly fine if the app profile you recommended is different from the one recommended in the solution notebook.

- Increase the number of apps in Travel & Local, Finance and Books & Reference: each of these categories makes up only a small share of the free English Google Play apps (2.3%, 3.7% and 2.1%), while Travel & Local and Books & Reference apps average many installs (about 14.0 million and 8.8 million).

```python
# import matplotlib.pyplot as plt

# lists = sorted(user_response(android_free, 1, 5).items()) # sorted by key, return a list of tuples

# x, y = zip(*lists) # unpack a list of pairs into two tuples

# plt.plot(x, y)
# plt.show()
```
### How can this be done for the Android dataset?

In [36]:
print(user_response(android_free, 1, 5))

{'ART_AND_DESIGN': 1986335.0877192982, 'AUTO_AND_VEHICLES': 647317.8170731707, 'BEAUTY': 513151.88679245283, 'BOOKS_AND_REFERENCE': 8767811.894736841, 'BUSINESS': 1712290.1474201474, 'COMICS': 817657.2727272727, 'COMMUNICATION': 38456119.167247385, 'DATING': 854028.8303030303, 'EDUCATION': 1833495.145631068, 'ENTERTAINMENT': 11640705.88235294, 'EVENTS': 253542.22222222222, 'FINANCE': 1387692.475609756, 'FOOD_AND_DRINK': 1924897.7363636363, 'HEALTH_AND_FITNESS': 4188821.9853479853, 'HOUSE_AND_HOME': 1331540.5616438356, 'LIBRARIES_AND_DEMO': 638503.734939759, 'LIFESTYLE': 1437816.2687861272, 'GAME': 15588015.603248259, 'FAMILY': 3695641.8198090694, 'MEDICAL': 120550.61980830671, 'SOCIAL': 23253652.127118643, 'SHOPPING': 7036877.311557789, 'PHOTOGRAPHY': 17840110.40229885, 'SPORTS': 3638640.1428571427, 'TRAVEL_AND_LOCAL': 13984077.710144928, 'TOOLS': 10801391.298666667, 'PERSONALIZATION': 5201482.6122448975, 'PRODUCTIVITY': 16787331.344927534, 'PARENTING': 542603.6206896552, 'WEATHER': 50

In [37]:
## From the solution notebook: average number of ratings per App Store genre
genres_ios = generate_frequency(ios_free, -5)   # index -5 is prime_genre

for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in ios_free:
        genre_app = app[-5]
        if genre_app == genre:
            #rating_count_tot is at index 6 in this file (index 5 is the price)
            n_ratings = float(app[6])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)