## Python Project
This project analyzes mobile applications for Android and iOS. The target is free applications whose main source of revenue is in-app advertising. The goal is to analyze the data to help developers understand which types of apps are likely to attract and engage users.

The analysis uses two data sets, each from a different mobile app store. The first contains approximately 10,000 Android apps from Google Play; the second contains approximately 7,000 iOS apps from the App Store.

The Android apps data was collected in August 2018, and the App Store data was collected in July of 2017.

`explore_data()` function:

Takes in four parameters:
1. `dataset`: list of lists.
2. `start`: int. starting index of slice.
3. `end`: int. ending index of slice.
4. `rows_and_columns`: bool. default to `False`

`dataset_slice`: variable. holds the slice of the dataset.

`for row in dataset_slice`: loops through the slice and, on each iteration, prints a row followed by blank lines.

`if rows_and_columns`: when it evaluates to `True`, prints the number of rows and columns of the full dataset.

In [1]:
def explore_data(dataset, start, end, rows_and_columns=False):
    
    dataset_slice = dataset[start:end]
    
    for row in dataset_slice:
        print(row)
        print('\n')
        
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
    

`open_data()` function:

Takes in one parameter:
1. `file_name`: the path of the data set.

`opened_file`: file object. the `open()` function stores the file object in the `opened_file` variable.

from the `csv` module, import the `reader` function.

- `read_file`: variable. stores the reader object returned by `reader`, which iterates over the contents of `opened_file`.
- `apps_data`: list of lists. stores the contents of `read_file`.
- `header`: list. stores the first row of the list, the column names of the `csv` file.
- `data`: list of lists. stores the remaining rows from the `csv` file.

The function returns two values: `header` and `data`.

In [2]:
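from csv import reader

def open_data(file_name):
    # Read the whole CSV into memory; the with-block closes the file afterwards.
    with open(file_name, encoding='utf8') as opened_file:
        read_file = reader(opened_file)
        apps_data = list(read_file)

    # The first row holds the column names; the rest are app records.
    header = apps_data[0]
    data = apps_data[1:]

    return header, data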
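
The header/data split can be tried without the real files. The following is a minimal sketch, assuming a small in-memory CSV: the sample text and the `sample_csv` name are made up for illustration, and `io.StringIO` stands in for the opened file.

```python
from csv import reader
from io import StringIO

# Hypothetical two-column CSV used only for illustration.
sample_csv = "App,Price\nAlpha,0\nBeta,1.99\n"

# StringIO behaves like an opened text file, so csv.reader accepts it.
rows = list(reader(StringIO(sample_csv)))

header = rows[0]   # first row: column names
data = rows[1:]    # remaining rows: one list of strings per app

print(header)
print(data)
```

Every value comes back as a string, which is why later cells convert the review counts with `float()` before comparing them.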

`android_apps_header` and `android_apps_data`: hold the return values of the `open_data()` function for `googleplaystore.csv`.
`apple_apps_header` and `apple_apps_data`: hold the return values of the `open_data()` function for `AppleStore.csv`.

In [3]:
android_apps_header, android_apps_data = open_data('googleplaystore.csv')
apple_apps_header, apple_apps_data = open_data('AppleStore.csv')

In [4]:
print(android_apps_header)
explore_data(android_apps_data, 0, 10, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Every

`row_check` function: takes in two parameters and checks whether each row in the dataset has the expected number of columns.
1. `data`: the dataset we are checking.
2. `header`: used to check the column length of each row.

The function uses a `for` loop to iterate through each row in `data`.
It then checks whether the length of the `row` is equal to the length of the `header`.
If it isn't, it sets `missing_data` to `True` and prints out the `row` and the index value associated with the `row`.

If no missing data is found, the function prints the message `No missing data found.`

In [5]:
def row_check(data, header):
    missing_data = False
    # enumerate gives each row's own position, even if identical rows exist
    for index, row in enumerate(data):
        if len(row) != len(header):
            missing_data = True
            print(row)
            print(index)
    if not missing_data:
        print('No missing data found.')
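
When the offending rows need to be removed afterwards, it can help to collect their positions instead of printing them. The following is a minimal sketch, assuming the same list-of-lists layout; `find_bad_rows` and the sample rows are hypothetical and not part of the notebook.

```python
def find_bad_rows(data, header):
    """Return the indexes of rows whose length differs from the header's."""
    return [i for i, row in enumerate(data) if len(row) != len(header)]

# Hypothetical sample: the second row is missing a column.
sample_header = ['App', 'Category', 'Rating']
sample_data = [
    ['Alpha', 'TOOLS', '4.1'],
    ['Beta', '3.9'],
    ['Gamma', 'GAME', '4.5'],
]

bad = find_bad_rows(sample_data, sample_header)
print(bad)

# Delete from the highest index down so earlier deletions
# do not shift the positions still waiting to be removed.
for i in sorted(bad, reverse=True):
    del sample_data[i]
```

Deleting by a collected index also avoids hard-coding a row number in a cell that might be run more than once.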

In [6]:
row_check(android_apps_data, android_apps_header)

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
10472


In [7]:
del android_apps_data[10472]
row_check(android_apps_data, android_apps_header)

No missing data found.


In [8]:
print(apple_apps_header)
explore_data(apple_apps_data, 0, 10, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


['429047995', 'Pinterest', '74778624', 'USD', '0.0', '106162

In [9]:
row_check(apple_apps_data, apple_apps_header)

No missing data found.


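`duplicate_apps_android` and `duplicate_apps_apple` functions: count duplicate and unique apps. The Android version identifies an app by its name (`app[0]`); the Apple version identifies it by its `id` (`app[0]`) and stores the name (`app[1]`).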

In [10]:
def duplicate_apps_android(apps_data):
    duplicate_apps = []
    unique_apps = []
    
    for app in apps_data:
        app_name = app[0]
        if app_name in unique_apps:
            duplicate_apps.append(app_name)
        else:
            unique_apps.append(app_name)
            
    print('Number of duplicate apps:', len(duplicate_apps))
    print('Number of unique apps:', len(unique_apps))
    

In [11]:
def duplicate_apps_apple(apps_data):
    duplicate_apps = {}
    unique_apps = {}
    
    for app in apps_data:
        app_id = app[0]
        app_name = app[1]
        if app_id in unique_apps:
            duplicate_apps[app_id] = app_name
        else:
            unique_apps[app_id] = app_name
            
    print('Number of duplicate apps:', len(duplicate_apps))
    print('Number of unique apps:', len(unique_apps))
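
Checking `app_name in unique_apps` against a list rescans the list on every row, which gets slow as the data grows. As an alternative sketch (not the notebook's method), `collections.Counter` produces the same two totals as `duplicate_apps_android` in one pass; `count_duplicates` and the sample rows are hypothetical.

```python
from collections import Counter

def count_duplicates(apps_data, key_index):
    """Return (number of duplicate rows, number of unique keys)."""
    counts = Counter(app[key_index] for app in apps_data)
    n_unique = len(counts)
    # Every occurrence beyond the first one counts as a duplicate.
    n_duplicates = sum(counts.values()) - n_unique
    return n_duplicates, n_unique

sample = [['Alpha'], ['Beta'], ['Alpha'], ['Alpha']]
n_dup, n_uniq = count_duplicates(sample, 0)
print('Number of duplicate apps:', n_dup)
print('Number of unique apps:', n_uniq)
```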

In [12]:
duplicate_apps_apple(apple_apps_data)

Number of duplicate apps: 0
Number of unique apps: 7197


In [13]:
duplicate_apps_android(android_apps_data)

Number of duplicate apps: 1181
Number of unique apps: 9659


In [14]:
def max_reviews_android(app_data):
    reviews_max = {}
    for app in app_data:
        name = app[0]
        n_reviews = float(app[3])
        
        if name in reviews_max and reviews_max[name] < n_reviews:
            reviews_max[name] = n_reviews
        elif name not in reviews_max:
            reviews_max[name] = n_reviews
    return reviews_max

In [15]:
android_max = max_reviews_android(android_apps_data)
print(len(android_max))

9659


In [16]:
def is_english(str):
    foreign_char = 0
    
    for char in str:
        if ord(char) > 127: 
            foreign_char += 1
            
        if foreign_char > 3:
            return False
        else:
            return True

In [17]:
def clean_android_data(apps_data, dupl_dict):
    android_clean = []
    already_added = []
    
    for app in apps_data:
        name = app[0]
        n_reviews = float(app[3])
        
        if (dupl_dict[name] == n_reviews) and (name not in already_added):
            android_clean.append(app)
            already_added.append(name)
            
    return android_clean
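
Taken together, `max_reviews_android` and `clean_android_data` keep one row per app name: the one with the highest review count. Below is a compact sketch of the same idea on made-up rows, assuming the name is in column 0 and the review count in column 1. `keep_most_reviewed` is a hypothetical helper, and it uses a set rather than a list for the "already added" check.

```python
def keep_most_reviewed(apps_data, name_index=0, reviews_index=1):
    # First pass: highest review count seen for each app name.
    reviews_max = {}
    for app in apps_data:
        name = app[name_index]
        n_reviews = float(app[reviews_index])
        if name not in reviews_max or reviews_max[name] < n_reviews:
            reviews_max[name] = n_reviews

    # Second pass: keep the first row that matches that maximum.
    cleaned = []
    already_added = set()
    for app in apps_data:
        name = app[name_index]
        n_reviews = float(app[reviews_index])
        if reviews_max[name] == n_reviews and name not in already_added:
            cleaned.append(app)
            already_added.add(name)
    return cleaned

sample = [['Alpha', '10'], ['Beta', '5'], ['Alpha', '25'], ['Alpha', '25']]
print(keep_most_reviewed(sample))
```

The `already_added` check matters when two rows tie on the maximum review count: without it, both copies would survive.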

In [18]:
clean_android_list = clean_android_data(android_apps_data, android_max)
explore_data(clean_android_list, 0, 10, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


['Smoke Effect Photo Maker - Smoke Editor', 'ART_AND_DESIGN', '3.8', '178', '19M', '50,000+

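`free_app_android` and `free_app_apple` functions: keep only the apps whose price is `'0'` (Android, `app[7]`) or `'0.0'` (Apple, `app[4]`).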

In [19]:
def free_app_android(apps_data):
    android_clean = []
    for app in apps_data:
        price = app[7]
        if price == '0':
            android_clean.append(app)
    return android_clean
        

In [20]:
def free_app_apple(apps_data):
    apple_clean = []
    for app in apps_data:
        price = app[4]
        if price == '0.0':
            apple_clean.append(app)
    return apple_clean
        

In [21]:
android_english = []
ios_english = []

for app in clean_android_list:
    name = app[0]
    
    if is_english(name):
        android_english.append(app)
        
for app in apple_apps_data:
    name = app[1]
    if is_english(name):
        ios_english.append(app)
        
explore_data(android_english, 0, 10, True)
print('\n')
explore_data(ios_english, 0, 10, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


['Smoke Effect Photo Maker - Smoke Editor', 'ART_AND_DESIGN', '3.8', '178', '19M', '50,000+

In [22]:
android_final = free_app_android(android_english)
apple_final = free_app_apple(ios_english)

print(len(android_final))
print(len(apple_final))

8905
4056


## Validation Strategy
The aim is to determine the kinds of apps that are likely to attract more users, since revenue is influenced by the number of people using the apps.

To minimize risks and overhead, the validation strategy consists of three steps:
1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we develop it further.
3. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

The end goal is to add the app to both Google Play and the App Store, so we need to find app profiles that are successful in both markets.

Build a frequency table for the `prime_genre` column of the App Store data set, and for the `Genres` and `Category` columns of the Google Play data set.

One function generates a frequency table that shows percentages.

A second function displays the percentages in descending order.

`display_table()` function.
- Takes in two parameters: `dataset` and `index`. `dataset` is expected to be a list of lists, and `index` is expected to be an integer.
- Generates a frequency table using the `freq_table()` function.
- Transforms the frequency table into a list of tuples, then sorts the list in a descending order.
- Prints the entries of the frequency table in descending order.

In [27]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

In [24]:
def freq_table(dataset, index):
    table = {}
    total = 0

    # Count how often each value occurs in the chosen column.
    for row in dataset:
        total += 1
        value = row[index]

        if value in table:
            table[value] += 1
        else:
            table[value] = 1

    # Convert the counts to percentages once, after counting is done.
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage

    return table_percentages
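
The same percentages can also be produced with `collections.Counter`, whose `most_common()` method returns entries sorted by count. This is an alternative sketch rather than the notebook's approach; `freq_percentages` and the sample rows are hypothetical.

```python
from collections import Counter

def freq_percentages(dataset, index):
    """Return (value, percentage) pairs sorted from most to least frequent."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return [(value, count / total * 100) for value, count in counts.most_common()]

sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
result = freq_percentages(sample, 1)
for value, pct in result:
    print(value, ':', pct)
```

One difference from `display_table()`: when two values have the same count, `most_common()` keeps them in first-seen order, whereas sorting `(percentage, key)` tuples in reverse orders ties by key.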

In [25]:
category = 1
genres = 9
prime_genre = 11

In [28]:
display_table(apple_final, prime_genre)

Games : 55.64595660749507
Entertainment : 8.234714003944774
Photo & Video : 4.117357001972387
Social Networking : 3.5256410256410255
Education : 3.2544378698224854
Shopping : 2.983234714003945
Utilities : 2.687376725838264
Lifestyle : 2.3175542406311638
Finance : 2.0710059171597637
Sports : 1.947731755424063
Health & Fitness : 1.8737672583826428
Music : 1.6518737672583828
Book : 1.6272189349112427
Productivity : 1.5285996055226825
News : 1.4299802761341223
Travel : 1.3806706114398422
Food & Drink : 1.0601577909270217
Weather : 0.7642998027613412
Reference : 0.4930966469428008
Navigation : 0.4930966469428008
Business : 0.4930966469428008
Catalogs : 0.22189349112426035
Medical : 0.19723865877712032


In [29]:
display_table(android_final, category)

FAMILY : 18.97810218978102
GAME : 9.70241437394722
TOOLS : 8.433464345873105
BUSINESS : 4.581695676586187
LIFESTYLE : 3.9303761931499155
PRODUCTIVITY : 3.885457608085345
FINANCE : 3.6833239752947784
MEDICAL : 3.5148792813026386
SPORTS : 3.3801235261089273
PERSONALIZATION : 3.312745648512072
COMMUNICATION : 3.2341381246490735
HEALTH_AND_FITNESS : 3.065693430656934
PHOTOGRAPHY : 2.9421673217293653
NEWS_AND_MAGAZINES : 2.829870859067939
SOCIAL : 2.6501965188096577
TRAVEL_AND_LOCAL : 2.3245367770915215
SHOPPING : 2.2459292532285233
BOOKS_AND_REFERENCE : 2.1785513756316677
DATING : 1.8528916339135317
VIDEO_PLAYERS : 1.7967434025828188
MAPS_AND_NAVIGATION : 1.4149354295339696
FOOD_AND_DRINK : 1.235261089275688
EDUCATION : 1.167883211678832
ENTERTAINMENT : 0.9545199326221224
LIBRARIES_AND_DEMO : 0.9320606400898372
AUTO_AND_VEHICLES : 0.9208309938236946
HOUSE_AND_HOME : 0.8197641774284109
WEATHER : 0.7973048848961257
EVENTS : 0.7074677147669848
PARENTING : 0.6513194834362718
ART_AND_DESIGN : 0