# App Store / Google Play - Free Apps Data Analysis 

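**About:** Analysis of free apps listed on the App Store and Google Play, based on the `AppleStore.csv` and `googleplaystore.csv` data sets.  
**Goal:** Find an app profile (genre or category) that suits a free app published on both stores.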


In [1]:
from csv import reader 

def open_f(data):
    # Read the whole CSV into a list of rows; the with-block closes the file
    with open(data, encoding='utf8') as opened:
        return list(reader(opened))

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
        
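
# Google Play only: map each app name (column 0) to its value in `column`
def explore_data_column(dataset, column, rows_and_columns=False):
    dict_f = {}
    for row in dataset:
        dict_f[row[0]] = row[column]

    # Print before returning, otherwise this branch is unreachable
    if rows_and_columns:
        print('Number of columns:', len(dataset[0]))

    return dict_f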
        

In [3]:
gplay_data = open_f('googleplaystore.csv')
astore_data = open_f('AppleStore.csv')

google = gplay_data[1:]
google_h = gplay_data[0]

apple = astore_data[1:]
apple_h = astore_data[0]

In [4]:
dup_apps = []
orig_apps = []
 

def dup_finder(data):
    for app in data:
        name = app[0]
        if name in orig_apps:
            dup_apps.append(name)
        else:
            orig_apps.append(name)

In [5]:
del google[10472] # remove the row with an invalid rating

In [6]:
dup_finder(google)

In [7]:
print(len(dup_apps))

1181


### Duplication Issue with Play Store Data

The documentation and discussions for the data set report that the Play Store data contains duplicate entries.

To verify this, we used the `dup_finder` function to collect the duplicated app names.


In [8]:
print('Number of duplicates',len(dup_apps))

Number of duplicates 1181


These duplicates won't be removed randomly. For each app, the entry with the highest number of reviews is kept, since a higher review count suggests it is the most recent entry; the others are removed.

Two loops do this. The first goes through the Google Play data set and records the highest review count for each app name. The second adds exactly one entry per app: the one whose review count matches that maximum.

In [9]:
google_appname_rev = {}

for i in google:
    name = i[0]
    reviews = float(i[3])
    
    if name in google_appname_rev and google_appname_rev[name] < reviews:
        google_appname_rev[name] = reviews 
    
    elif name not in google_appname_rev:
        google_appname_rev[name] = reviews 

In [10]:
google_clean = []
already_added = []

for i in google:
    name = i[0]
    reviews = float(i[3])
    
    if reviews == google_appname_rev[name] and name not in already_added:
        google_clean.append(i)
        already_added.append(name)


In [11]:
print('New Length:',len(google_clean)) #clean data set 

New Length: 9659
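
The two loops above can also be collapsed into a single pass. The sketch below is only an illustration, not the notebook's code: the helper `dedupe_by_max_reviews` and the sample rows are hypothetical, and it assumes the app name sits at index 0 and the review count at index 3, as in the Google Play rows.

```python
def dedupe_by_max_reviews(rows, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the one with the highest review count."""
    best = {}  # app name -> best row seen so far
    for row in rows:
        name = row[name_idx]
        reviews = float(row[reviews_idx])
        # Strict '>' keeps the first row when review counts tie
        if name not in best or reviews > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())


# Made-up rows: [name, category, rating, reviews]
sample = [
    ['App A', 'TOOLS', '4.1', '100'],
    ['App B', 'GAME', '4.5', '50'],
    ['App A', 'TOOLS', '4.1', '250'],
]
deduped = dedupe_by_max_reviews(sample)
print(len(deduped))  # 2
```

Note that the result is ordered by the first appearance of each name, which may differ from the row order produced by the two-loop version.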


In [12]:
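def eng_check(stri):
    # Treat a name as non-English once it has 4 characters outside ASCII (code point > 127)
    err = 0
    for i in stri:
        if ord(i) > 127:
            err += 1
            if err == 4:
                return False
    return True
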

In [13]:
google_clean_eng = []
google_noneng = []
apple_clean_eng = []
apple_noneng = []

for i in google_clean:
    name = i[0]
    if eng_check(name) == False:
        google_noneng.append(i)
    else:
        google_clean_eng.append(i)
        
for i in apple:
    name = i[1]
    if eng_check(name) == False:
        apple_noneng.append(i)
    else:
        apple_clean_eng.append(i)

In [14]:
print('Google Non-Eng Apps',len(google_noneng))
print('Google Eng Apps',len(google_clean_eng))
print('Apple Non-Eng Apps',len(apple_noneng))
print('Apple Eng Apps',len(apple_clean_eng))

Google Non-Eng Apps 45
Google Eng Apps 9614
Apple Non-Eng Apps 1014
Apple Eng Apps 6183


### Isolating Free Apps

In [15]:
def free_apps_finder(data, list_to_add, app_store = False):
    
    column = 7
    
    if app_store == True:
        column = 4
    
    for i in data:
        price_raw = i[column]
        if "$" in i[column]:
            price = float(price_raw[1:])
        else:
            price = float(price_raw)
            
        if price == 0:
            list_to_add.append(i)

In [16]:
# isolate free apps

google_free_apps = []
apple_free_apps = []

free_apps_finder(google_clean_eng,google_free_apps)
free_apps_finder(apple_clean_eng,apple_free_apps,True)


In [17]:
print("Number of Free Apps on Google Play",len(google_free_apps))
print("Number of Free Apps on Apple Store",len(apple_free_apps))

Number of Free Apps on Google Play 8864
Number of Free Apps on Apple Store 3222
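
The price handling inside `free_apps_finder` can be isolated into a small helper. This is a sketch under the assumption that prices appear either as plain numbers (`'0'`, `'0.0'`) or with a leading dollar sign (`'$4.99'`); `parse_price` is a hypothetical name and not part of the notebook.

```python
def parse_price(raw):
    """Convert a price string such as '$4.99' or '0' to a float."""
    # lstrip('$') removes a leading dollar sign if present and is a no-op otherwise
    return float(raw.strip().lstrip('$'))


print(parse_price('$4.99'))  # 4.99
print(parse_price('0'))      # 0.0
```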


# Finding an App Profile

Since our goal is to build a free app for both stores, we want to find an existing profile that can serve as an example or role model.

We will start by analysing genres to find the best one for the upcoming app.

In [18]:
def freq_table(dataset, index):
    counter = {}
    counterpr = {}
    for i in dataset:
        ind = i[index]
        if ind in counter:
            counter[ind] += 1
        else:
            counter[ind] = 1            
    for i in counter:
        counterpr[i] = round((counter[i] / len(dataset)*100),2)
        
    return counterpr
            

In [19]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
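
For comparison, the same percentage table can be built with `collections.Counter` from the standard library. This is a sketch rather than a replacement for the functions above; `freq_table_pct` and the sample rows are hypothetical.

```python
from collections import Counter


def freq_table_pct(rows, index):
    """Return {value: percentage of rows}, ordered from most to least frequent."""
    counts = Counter(row[index] for row in rows)
    total = len(rows)
    # most_common() yields (value, count) pairs sorted by count, descending
    return {value: round(count / total * 100, 2)
            for value, count in counts.most_common()}


# Made-up rows: [name, category]
rows = [['a', 'X'], ['b', 'X'], ['c', 'Y'], ['d', 'X']]
print(freq_table_pct(rows, 1))  # {'X': 75.0, 'Y': 25.0}
```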

#### Freq. Table For Play Store Categories

In [20]:
display_table(google_free_apps,1)

FAMILY : 18.91
GAME : 9.72
TOOLS : 8.46
BUSINESS : 4.59
LIFESTYLE : 3.9
PRODUCTIVITY : 3.89
FINANCE : 3.7
MEDICAL : 3.53
SPORTS : 3.4
PERSONALIZATION : 3.32
COMMUNICATION : 3.24
HEALTH_AND_FITNESS : 3.08
PHOTOGRAPHY : 2.94
NEWS_AND_MAGAZINES : 2.8
SOCIAL : 2.66
TRAVEL_AND_LOCAL : 2.34
SHOPPING : 2.25
BOOKS_AND_REFERENCE : 2.14
DATING : 1.86
VIDEO_PLAYERS : 1.79
MAPS_AND_NAVIGATION : 1.4
FOOD_AND_DRINK : 1.24
EDUCATION : 1.16
ENTERTAINMENT : 0.96
LIBRARIES_AND_DEMO : 0.94
AUTO_AND_VEHICLES : 0.93
HOUSE_AND_HOME : 0.82
WEATHER : 0.8
EVENTS : 0.71
PARENTING : 0.65
ART_AND_DESIGN : 0.64
COMICS : 0.62
BEAUTY : 0.6


#### Freq. Table For Play Store Genres

In [21]:
display_table(google_free_apps,-4)

Tools : 8.45
Entertainment : 6.07
Education : 5.35
Business : 4.59
Productivity : 3.89
Lifestyle : 3.89
Finance : 3.7
Medical : 3.53
Sports : 3.46
Personalization : 3.32
Communication : 3.24
Action : 3.1
Health & Fitness : 3.08
Photography : 2.94
News & Magazines : 2.8
Social : 2.66
Travel & Local : 2.32
Shopping : 2.25
Books & Reference : 2.14
Simulation : 2.04
Dating : 1.86
Arcade : 1.85
Video Players & Editors : 1.77
Casual : 1.76
Maps & Navigation : 1.4
Food & Drink : 1.24
Puzzle : 1.13
Racing : 0.99
Role Playing : 0.94
Libraries & Demo : 0.94
Auto & Vehicles : 0.93
Strategy : 0.91
House & Home : 0.82
Weather : 0.8
Events : 0.71
Adventure : 0.68
Comics : 0.61
Beauty : 0.6
Art & Design : 0.6
Parenting : 0.5
Card : 0.45
Casino : 0.43
Trivia : 0.42
Educational;Education : 0.39
Board : 0.38
Educational : 0.37
Education;Education : 0.34
Word : 0.26
Casual;Pretend Play : 0.24
Music : 0.2
Racing;Action & Adventure : 0.17
Puzzle;Brain Games : 0.17
Entertainment;Music & Video : 0.17
Casual;

#### Freq. Table For App Store Prime Genres

In [22]:
display_table(apple_free_apps,-5)

Games : 58.16
Entertainment : 7.88
Photo & Video : 4.97
Education : 3.66
Social Networking : 3.29
Shopping : 2.61
Utilities : 2.51
Sports : 2.14
Music : 2.05
Health & Fitness : 2.02
Productivity : 1.74
Lifestyle : 1.58
News : 1.33
Travel : 1.24
Finance : 1.12
Weather : 0.87
Food & Drink : 0.81
Reference : 0.56
Business : 0.53
Book : 0.43
Navigation : 0.19
Medical : 0.19
Catalogs : 0.12


### Isolating Most Popular Genres by Install/Reviews

The number of reviews gives a rough idea of how much competition there is in a given genre.

From the numbers below, it is clear that 'Navigation', 'Reference' and 'Social Networking' are the most competitive genres, with 86090, 74942 and 71548 reviews per app on average.

Health & Fitness might be a good category for a new app, since it has an average market share (2.02%) and a decent number of reviews per app (23298).

##### App Store:

In [23]:
apple_genre_table = freq_table(apple_free_apps,-5)
apple_installs_table = {}

for genre in apple_genre_table:
    total = 0
    len_genre = 0 
    for i in apple_free_apps:
        genre_app = i[-5]
        if genre_app == genre:
            n_r = float(i[5])
            total += n_r
            len_genre += 1
    avgres = round(total / len_genre)
    print('App Genre -',genre,'- Revs -', avgres)
#     print('App Genre - ',genre,'\nAvg. App Reviews',avgnumb,'\n')    

App Genre - Photo & Video - Revs - 28442
App Genre - Travel - Revs - 28244
App Genre - Food & Drink - Revs - 33334
App Genre - Business - Revs - 7491
App Genre - Social Networking - Revs - 71548
App Genre - Navigation - Revs - 86090
App Genre - Shopping - Revs - 26920
App Genre - News - Revs - 21248
App Genre - Reference - Revs - 74942
App Genre - Sports - Revs - 23009
App Genre - Book - Revs - 39758
App Genre - Weather - Revs - 52280
App Genre - Utilities - Revs - 18684
App Genre - Medical - Revs - 612
App Genre - Music - Revs - 57327
App Genre - Education - Revs - 7004
App Genre - Finance - Revs - 31468
App Genre - Catalogs - Revs - 4004
App Genre - Games - Revs - 22789
App Genre - Lifestyle - Revs - 16486
App Genre - Entertainment - Revs - 14030
App Genre - Health & Fitness - Revs - 23298
App Genre - Productivity - Revs - 21028
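
The nested loop above rescans the full data set once per genre. The averages can also be collected in a single pass, as sketched below; `average_by_group` and the sample rows are hypothetical, and the column positions are passed in rather than taken from the real files.

```python
from collections import defaultdict


def average_by_group(rows, group_idx, value_idx):
    """Return {group: rounded mean of the value column} in one pass over rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_idx]
        totals[group] += float(row[value_idx])
        counts[group] += 1
    return {group: round(totals[group] / counts[group]) for group in totals}


# Made-up rows: [genre, review count]
sample_rows = [['Games', '100'], ['Games', '200'], ['Music', '50']]
print(average_by_group(sample_rows, 0, 1))  # {'Games': 150, 'Music': 50}
```

With the App Store rows this would be called with the genre index `-5` and the review-count index `5`, matching the cell above.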


##### Play Store:

From the numbers below, it is clear that 'COMMUNICATION', 'VIDEO_PLAYERS' and 'SOCIAL' are the most competitive categories, with 38456119, 24727872 and 23253652 installs per app on average.

PRODUCTIVITY might be a good category for a new app, since it has an average market share (3.89%) and a decent number of installs per app (16787331).

In [34]:
google_genrecat_table = freq_table(google_free_apps,1)

for category in google_genrecat_table:
    total = 0
    len_cat = 0
    for i in google_free_apps:
        cat_app = i[1]
        if cat_app == category:
            installs = float(i[5].replace('+','').replace(',', ''))
            total += installs
            len_cat += 1
                
    avginst = round(total / len_cat)
    print(category, avginst)
                

MEDICAL 120551
SOCIAL 23253652
HEALTH_AND_FITNESS 4188822
FOOD_AND_DRINK 1924898
BOOKS_AND_REFERENCE 8767812
BEAUTY 513152
BUSINESS 1712290
TRAVEL_AND_LOCAL 13984078
EVENTS 253542
EDUCATION 1833495
HOUSE_AND_HOME 1331541
PHOTOGRAPHY 17840110
FINANCE 1387692
SPORTS 3638640
PRODUCTIVITY 16787331
TOOLS 10801391
VIDEO_PLAYERS 24727872
MAPS_AND_NAVIGATION 4056942
PARENTING 542604
WEATHER 5074486
LIBRARIES_AND_DEMO 638504
LIFESTYLE 1437816
ART_AND_DESIGN 1986335
FAMILY 3695642
PERSONALIZATION 5201483
COMMUNICATION 38456119
COMICS 817657
NEWS_AND_MAGAZINES 9549178
GAME 15588016
ENTERTAINMENT 11640706
AUTO_AND_VEHICLES 647318
SHOPPING 7036877
DATING 854029
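
The output above is printed in dictionary order, which makes the ranking hard to read. The sketch below shows the install-string conversion as a standalone helper and sorts a few of the averages printed above in descending order. `parse_installs` is a hypothetical name, and the dictionary holds only a subset of the printed values.

```python
def parse_installs(raw):
    """Convert an install string such as '1,000,000+' to an int."""
    # The trailing '+' suggests the value is a lower bound, not an exact count
    return int(raw.replace(',', '').replace('+', ''))


print(parse_installs('1,000,000+'))  # 1000000

# A subset of the averages printed above
avg_installs = {
    'MEDICAL': 120551,
    'SOCIAL': 23253652,
    'PHOTOGRAPHY': 17840110,
    'PRODUCTIVITY': 16787331,
    'VIDEO_PLAYERS': 24727872,
    'COMMUNICATION': 38456119,
}

# Sort by average installs, highest first
ranked = sorted(avg_installs.items(), key=lambda kv: kv[1], reverse=True)
for category, avg in ranked[:3]:
    print(category, avg)
```

Because installs are reported as brackets, these averages are best read as rough lower estimates.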
