## Profitable Mobile App Profile Analysis for the Google Play Store and App Store

The project analyzes the profiles of free mobile apps in the Google Play Store and the App Store. Its goal is to give developers and product managers data-driven recommendations for future projects.

In [1]:
def explore_data(dataset,start,end,rows_and_columns=False):
    dataset_slice= dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')
    if rows_and_columns:
        print('Number of rows:',len(dataset))
        print('Number of columns: ',len(dataset[0])) # Column count is taken from the first row

In [2]:
from csv import reader

# Read the App Store data
# Forward slashes avoid the invalid escape sequences '\A' and '\g'
opened_file = open('Data/AppleStore.csv',encoding='utf8')
read_file = reader(opened_file)
ios = list(read_file)
opened_file.close()
ios_dataset_header = ios[0]
ios_dataset = ios[1:]


# Read the Google Play Store data
opened_file = open('Data/googleplaystore.csv',encoding='utf8')
read_file = reader(opened_file)
android = list(read_file)
opened_file.close()
android_dataset_header = android[0]
android_dataset= android[1:]

print('IOS: \n',ios_dataset_header)
print('\n')
print('Android: \n ',android_dataset_header)

IOS: 
 ['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


Android: 
  ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
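The two reading steps above repeat the same pattern, so they could be wrapped in a helper. The following is only a sketch: `load_csv` is a hypothetical name, and the demo file is a made-up two-row CSV written to a temporary directory, not one of the real data sets.

```python
import csv
import os
import tempfile


def load_csv(path):
    """Return (header, rows) for a CSV file, closing the file when done."""
    with open(path, encoding='utf8', newline='') as f:
        rows = list(csv.reader(f))
    return rows[0], rows[1:]


# Demo on a small temporary file (made-up data, not the real data sets).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo.csv')
    with open(demo_path, 'w', encoding='utf8', newline='') as f:
        f.write('App,Category\nBox,BUSINESS\nSlack,BUSINESS\n')
    demo_header, demo_rows = load_csv(demo_path)

print(demo_header)
print(demo_rows)
```

Using `with` closes the file even if reading fails, which the plain `open` calls do not guarantee.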


# Data Cleaning
### Deleting Wrong Data
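Row 10472 of the Google Play Store data set (counting from 0, header excluded) is malformed: its Category value is missing, so every later value sits one column to the left. The problem was reported in the Kaggle discussion for this data set, so the row is deleted.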

In [3]:
explore_data(android_dataset,10472,10473,True)
print('\n')
explore_data(android_dataset,10473,10474,True)

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


Number of rows: 10841
Number of columns:  13


['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']


Number of rows: 10841
Number of columns:  13


In [4]:
del android_dataset[10472]
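Deleting by a fixed index works here because the bad row is already known. A more general check, sketched below under the assumption that a malformed row always has a different number of columns than the header, compares each row's length with the header's. `find_malformed_rows` and the toy rows are hypothetical.

```python
def find_malformed_rows(header, rows):
    """Return (index, row) pairs whose column count differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]


# Toy data: the second row is missing its Category value.
toy_header = ['App', 'Category', 'Rating']
toy_rows = [
    ['Box', 'BUSINESS', '4.2'],
    ['Photo Frame', '1.9'],
    ['Slack', 'BUSINESS', '4.4'],
]

bad = find_malformed_rows(toy_header, toy_rows)
print(bad)

# Delete from the end so earlier indexes stay valid.
for i, _ in reversed(bad):
    del toy_rows[i]
print(len(toy_rows))
```

Unlike a bare `del` with a fixed index, this can be re-run safely: a second run finds nothing left to delete.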

### Detecting and Deleting Duplicated Entries

The Google Play Store data contains many duplicated entries: the same app appears in more than one row. Only one row per app should be kept for the analysis.

In [5]:
for app in android_dataset:
    app_name= app[0]
    if app_name == 'Box': #Facebook, Instagram, etc..
        print(app_name)
    

Box
Box
Box


In [6]:
duplicated_apps=[]
unique_apps=[]
for app in android_dataset:
    app_name = app[0]
    if app_name in unique_apps:
        duplicated_apps.append(app_name)
    else:
        unique_apps.append(app_name)
print(f"There are {len(duplicated_apps)} duplicated entries in Google Play Store data set")
print('\n')
print(f"Some of the duplicated applications are: \n{duplicated_apps[:10]}")
print('\n')
print(f"The expected data set length is :{len(android_dataset)-len(duplicated_apps)}")


There are 1181 duplicated entries in Google Play Store data set


Some of the duplicated applications are: 
['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']


The expected data set length is :9659


As shown above, there are 1181 duplicated rows. Which row should be kept for each app?

The code below prints all Instagram entries in the Google Play Store data set. The rows differ only in the fourth value (index 3), which is the number of reviews. This suggests the data was collected at different times.

The row with the highest number of reviews should be the most recent one, so for each app we keep only that row.

In [7]:
for app in android_dataset:
    app_name = app[0]
    if app_name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


In [8]:
reviews_max = {}
for app in android_dataset:
    name = app[0]
    n_reviews=float(app[3])
    if name in reviews_max and reviews_max[name]<n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name]= n_reviews
        
        
print(f"Expected length {len(android_dataset)-len(duplicated_apps)} ")
print('\n')        
print(f"Actual length {len(reviews_max)} ")
print('\n')
print(f"Highest number of reviews for Instagram is: {reviews_max['Instagram']}") # Can be checked in the previous step
        

Expected length 9659 


Actual length 9659 


Highest number of reviews for Instagram is: 66577446.0


In [9]:
android_clean = [] # Will store cleaned data
already_added = [] # Just for app name storing

for app in android_dataset:
    name = app[0]
    n_reviews = float(app[3])
    # already_added guards against apps whose maximum review count appears in more than one row
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(app)
        already_added.append(name)
        
print(len(android_clean))
explore_data(android_clean,0,2,True)
    
 

9659
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 9659
Number of columns:  13
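The two passes above (building `reviews_max`, then filtering) could also be folded into a single pass. The sketch below is an alternative, not the method used in this notebook: `keep_most_reviewed` is a hypothetical name, the rows are shortened to four columns, and the Box values are made up.

```python
def keep_most_reviewed(rows, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_idx]
        # Strict '>' keeps the first row when review counts tie.
        if name not in best or float(row[reviews_idx]) > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())


toy = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Box', 'BUSINESS', '4.2', '159'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
]
deduped = keep_most_reviewed(toy)
for row in deduped:
    print(row)
```

A dictionary lookup is also faster than `name not in already_added` on a list, although the order of the resulting rows can differ from `android_clean`, because each app stays at the position of its first occurrence.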


### Removing Non English Apps
The characters commonly used in English text fall in the ASCII range 0-127. Using `ord(c)`, it is easy to check whether a character falls in that range. This is a useful criterion, but applied strictly it would discard some English apps, so the code handles the following case as well.

Some English app names contain special characters or emojis. For example, Docs To Go™ Free Office Suite and Instachat 😜 both contain a character whose code point is above 127. To keep such apps, a name is treated as English as long as it has at most three characters outside the ASCII range.

In [10]:
def isEnglish(a_name):
    nonEnglish=0
    for char in a_name:
        if  ord(char) > 127:
            nonEnglish +=1 
        if nonEnglish >3:
            return False
    
    return True

print(isEnglish('Docs To Go™ Free Office Suite'))
print(isEnglish('Instagram'))
print(isEnglish('Instachat 😜😜😜😜'))
print(isEnglish('Docs To Go™ Free Office Suite'))


True
True
False
True
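To see why these results come out as they do, it helps to count the non-ASCII characters in each sample name. This is a small sketch; `count_non_ascii` is a hypothetical helper that applies the same `ord(char) > 127` test as `isEnglish`.

```python
def count_non_ascii(name):
    """Count the characters in name whose code point is above 127."""
    return sum(1 for char in name if ord(char) > 127)


samples = ['Docs To Go™ Free Office Suite', 'Instagram', 'Instachat 😜😜😜😜']
for name in samples:
    n = count_non_ascii(name)
    # Same rule as isEnglish: more than three non-ASCII characters means non-English.
    print(name, '->', n, 'non-ASCII,', 'kept' if n <= 3 else 'dropped')
```

The trademark sign counts once, so the first name is kept, while the four emojis push the last name over the limit.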


In [11]:
ios_english=[]
android_english=[]

for app in android_clean:
    name = app[0]
    if isEnglish(name):
        android_english.append(app)

for app in ios_dataset:
    name = app[1]
    if isEnglish(name):
        ios_english.append(app)

explore_data(ios_english,0,3,True)
print('\n')
explore_data(android_english,0,3,True)
        



['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 6183
Number of columns:  16


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Var

### Remove Paid Apps
The project analyzes only free applications, so paid applications are removed from both data sets.

In [12]:
free_ios =[]
free_android=[]

for app in ios_english:
    price = float(app[4])
    if price == 0.0:
        free_ios.append(app)

        
for app in android_english:
    app_type = app[6]
    if app_type == 'Free':
        free_android.append(app)

explore_data(free_ios,0,3,True)
print('\n')
explore_data(free_android,0,3,True)
        

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 3222
Number of columns:  16


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Var

## Most Common Apps by Genre

In [13]:
def freq_table(data_set,index):
    freq_dict= {}
    total = 0
    
    for data in data_set:
        total +=1
        col = data[index]
        if col in freq_dict:
            freq_dict[col] +=1
        else:
            freq_dict[col] = 1
            
    percentages_table = {}
    for key in freq_dict:
        percentage = (freq_dict[key]/total)*100
        percentages_table[key]= percentage
    
    return percentages_table


In [14]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
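The same frequency table can be built with `collections.Counter` from the standard library. The sketch below is an alternative for illustration: `freq_percentages` is a hypothetical name and the rows are made up.

```python
from collections import Counter


def freq_percentages(rows, index):
    """Return (value, percentage) pairs for one column, most common first."""
    counts = Counter(row[index] for row in rows)
    total = sum(counts.values())
    return [(value, count / total * 100) for value, count in counts.most_common()]


# Toy rows with the genre in the last column (made-up data).
toy_rows = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
table = freq_percentages(toy_rows, -1)
for value, pct in table:
    print(value, ':', pct)
```

One difference from `display_table`: when two values have the same percentage, `most_common` keeps them in first-seen order, whereas sorting `(percentage, key)` tuples in reverse orders ties by key.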

In [15]:

display_table(free_ios, -5)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


In [16]:
display_table(free_android,1) #category

FAMILY : 18.898792733837304
GAME : 9.725826469592688
TOOLS : 8.462146000225657
BUSINESS : 4.592124562789123
LIFESTYLE : 3.9038700214374367
PRODUCTIVITY : 3.8925871601038025
FINANCE : 3.7007785174320205
MEDICAL : 3.5315355974275078
SPORTS : 3.396141261423897
PERSONALIZATION : 3.317161232088458
COMMUNICATION : 3.2381812027530184
HEALTH_AND_FITNESS : 3.0802211440821394
PHOTOGRAPHY : 2.944826808078529
NEWS_AND_MAGAZINES : 2.798149610741284
SOCIAL : 2.6627552747376737
TRAVEL_AND_LOCAL : 2.335552296062281
SHOPPING : 2.245289405393208
BOOKS_AND_REFERENCE : 2.1437436533904997
DATING : 1.8616721200496444
VIDEO_PLAYERS : 1.7939749520478394
MAPS_AND_NAVIGATION : 1.399074805370642
FOOD_AND_DRINK : 1.241114746699763
EDUCATION : 1.1621347173643235
ENTERTAINMENT : 0.9590432133589079
LIBRARIES_AND_DEMO : 0.9364774906916393
AUTO_AND_VEHICLES : 0.9251946293580051
HOUSE_AND_HOME : 0.8236488773552973
WEATHER : 0.8010831546880289
EVENTS : 0.7108202640189552
PARENTING : 0.6544059573507841
ART_AND_DESIGN : 0

In [17]:
display_table(free_android,-4)

Tools : 8.450863138892023
Entertainment : 6.070179397495204
Education : 5.348076272142616
Business : 4.592124562789123
Productivity : 3.8925871601038025
Lifestyle : 3.8925871601038025
Finance : 3.7007785174320205
Medical : 3.5315355974275078
Sports : 3.463838429425702
Personalization : 3.317161232088458
Communication : 3.2381812027530184
Action : 3.102786866749408
Health & Fitness : 3.0802211440821394
Photography : 2.944826808078529
News & Magazines : 2.798149610741284
Social : 2.6627552747376737
Travel & Local : 2.324269434728647
Shopping : 2.245289405393208
Books & Reference : 2.1437436533904997
Simulation : 2.042197901387792
Dating : 1.8616721200496444
Arcade : 1.8503892587160102
Video Players & Editors : 1.771409229380571
Casual : 1.7601263680469368
Maps & Navigation : 1.399074805370642
Food & Drink : 1.241114746699763
Puzzle : 1.128286133363421
Racing : 0.9928917973598104
Role Playing : 0.9364774906916393
Libraries & Demo : 0.9364774906916393
Auto & Vehicles : 0.9251946293580051
S

# Most Popular Apps by Genre on the App Store

One way to find out which genres are the most popular is to calculate the average number of installs for each genre. The Google Play data set has this information in the Installs column, but the App Store data set does not. As a workaround, the total number of user ratings, found in the rating_count_tot column, can be used as a proxy.

The code below calculates the average number of user ratings per app genre on the App Store.

In [18]:
ios_genres= freq_table(free_ios,-5)

for genre in ios_genres:
    total=0
    len_genre=0
    for app in free_ios:
        genre_app=app[-5]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre,':',avg_n_ratings)
    

Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0
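The nested loop above scans the whole data set once per genre. The averages can also be collected in a single pass, as in this sketch. `average_by_group` is a hypothetical helper and the toy numbers are made up.

```python
from collections import defaultdict


def average_by_group(rows, group_idx, value_idx):
    """Return {group: mean of the value column}, reading each row once."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[group_idx]] += float(row[value_idx])
        counts[row[group_idx]] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Toy rows: (name, genre, rating count); the numbers are made up.
toy_rows = [
    ['a', 'Navigation', '100'],
    ['b', 'Navigation', '300'],
    ['c', 'Medical', '50'],
]
averages = average_by_group(toy_rows, 1, 2)
print(averages)
```

For the App Store data the call would presumably look like `average_by_group(free_ios, -5, 5)`, using the same column indexes as the cell above.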


On average, navigation apps have the highest number of user ratings. However, the App Store data shows that this figure is heavily influenced by Waze and Google Maps, which together have close to half a million user ratings.

In [19]:
for app in free_ios:
    if app[-5] =='Navigation':
        print(app[1], ':', app[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5
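A quick way to quantify this skew is to compare the mean with the median for the six values printed above. The sketch below uses the standard `statistics` module. The median is not part of the original analysis and is added here only as an illustration.

```python
from statistics import mean, median

# Rating counts of the six free Navigation apps printed above.
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

print('mean  :', mean(navigation_ratings))
print('median:', median(navigation_ratings))

# Share of all Navigation ratings that belongs to the two largest apps.
top_two_share = (345046 + 154911) / sum(navigation_ratings)
print('top two share:', round(top_two_share, 3))
```

The median (8196.5) is about a tenth of the mean, which confirms that the genre average says little about a typical navigation app.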


The same pattern appears in social networking apps, where Facebook, Instagram, Pinterest, etc. have a huge number of user ratings, and in music apps such as Spotify and Shazam.

Reference apps have 74942 user ratings on average, but again two apps stand out: the Bible and Dictionary.com have very high rating counts.

In [20]:
for app in free_ios:
    if app[-5] == 'Reference':
        print(app[1],':',app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


Based on this analysis, one possible app profile is to take a popular book and turn it into an app that offers features beyond the raw text of the book, for instance daily quotes, an audio version, flashcards, or quizzes. A dictionary could also be built in, so users don't need to leave the app to look up words.

This idea fits well with the fact that the App Store is dominated by for-fun apps. A practical app might have more of a chance to stand out among the huge number of apps on the App Store.

Other popular genres are weather, book, food and drink, and finance. The book genre overlaps with the idea above, but the other genres are less attractive for the following reasons:

 * Weather apps. People usually do not spend much time in the app itself, so the chance of making a profit through ads is low. Reliable weather data might also require a paid API.
 * Food and drink. There are huge competitors in the market, like Delivery Hero and McDonald's, and a popular food and drink app requires actual cooking and a delivery service.
 * Finance apps. A successful finance app requires good domain knowledge, which means hiring a finance expert or financial consultant.

# Most Popular Apps by Category on the Google Play Store


In [23]:
display_table(free_android,5) #5 is the installs column

1,000,000+ : 15.728308699086089
100,000+ : 11.55365000564143
10,000,000+ : 10.549475346947986
10,000+ : 10.199706645605326
1,000+ : 8.394448832223853
100+ : 6.916393997517771
5,000,000+ : 6.826131106848697
500,000+ : 5.562450637481666
50,000+ : 4.772650344127271
5,000+ : 4.513144533453684
10+ : 3.542818458761142
500+ : 3.2494640640866526
50,000,000+ : 2.3017037120613786
100,000,000+ : 2.1324607920568655
50+ : 1.9180864267178157
5+ : 0.7898002933543946
1+ : 0.5077287600135394
500,000,000+ : 0.270788672007221
1,000,000,000+ : 0.2256572266726842
0+ : 0.045131445334536835


As can be seen, the install counts are not precise: it's not possible to tell whether an app listed with 100,000+ installs has 100,000, 150,000, or 370,000 installs. However, the purpose of this analysis is only to get an idea of which apps attract the most users, so this precision is enough.

Let's leave the numbers as they are, meaning an app with 100,000+ installs is counted as having 100,000 installs.

The table above also shows that the values in the Installs column contain '+' and ','. These characters must be removed before a value can be converted to a number. The code below computes the average number of installs for each category.

In [22]:
android_categories= freq_table(free_android,1)

for category in android_categories:
    total=0
    len_category= 0
    for app in free_android:
        app_category= app[1]
        if app_category == category:
            total += float(app[5].replace('+','').replace(',',''))
            len_category +=1
            
    avg_installs= total/len_category
    print(category,':',avg_installs)
            
        

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3697848.1731343283
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_
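The string cleanup used inside the loop above can be isolated in a small helper, which makes the conversion easier to check on its own. `parse_installs` is a hypothetical name. The sketch returns an `int` rather than the `float` used above, which is an arbitrary choice.

```python
def parse_installs(text):
    """Convert an Installs string such as '1,000,000+' to an int lower bound."""
    return int(text.replace('+', '').replace(',', ''))


for raw in ['0+', '100,000+', '1,000,000,000+']:
    print(raw, '->', parse_installs(raw))
```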

Communication apps have the most installs on average. The reason is similar to what we saw in the App Store data: a few apps, such as WhatsApp, Skype, and Gmail, have over one billion installs and pull the average up.

In [24]:
for app in free_android:
    if app[1] =='COMMUNICATION' and (app[5] =='1,000,000,000+' or app[5]=='500,000,000+' or app[5]=='100,000,000+'):
        print(app[0],':',app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

As discussed for the App Store, the books and reference category is also popular on Android devices. Let's investigate this category in more depth for the Google Play Store.

In [25]:
for app in free_android:
    if app[1]=='BOOKS_AND_REFERENCE':
        print(app[0],':',app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

Again, a few apps dominate the category, but books and reference is still of interest.

Some of the dominating apps are Google Play Books (1,000,000,000+), Amazon Kindle (100,000,000+), and Audiobooks from Audible (100,000,000+).

Apart from these dominating apps, the market still shows potential. Let's look at apps with between 1,000,000 and 100,000,000 installs.

In [26]:
for app in free_android:
    if app[1]=='BOOKS_AND_REFERENCE' and (app[5]=='1,000,000+' or app[5]=='5,000,000+' or app[5]=='10,000,000+' or app[5]=='50,000,000+'):
        print(app[0],':',app[5])

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra – free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+
H
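The chained equality checks in the cell above can also be written as a numeric range test, which avoids listing every install bucket by hand. This is a sketch: `installs_in_range` is a hypothetical helper, and the upper bound is exclusive so that the 100,000,000+ bucket is left out, as in the cell above.

```python
def installs_in_range(text, low, high):
    """True if the parsed Installs value lies in [low, high)."""
    value = int(text.replace('+', '').replace(',', ''))
    return low <= value < high


# A few (app, installs) pairs copied from the Books & Reference output.
toy_apps = [
    ('Wikipedia', '10,000,000+'),
    ('Google Play Books', '1,000,000,000+'),
    ('Offline English Dictionary', '100,000+'),
    ('Ebook Reader', '5,000,000+'),
]
selected = [name for name, installs in toy_apps
            if installs_in_range(installs, 1_000_000, 100_000_000)]
print(selected)
```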

The output above shows that this range is dominated by ebook readers, dictionaries, and libraries, so building a similar app is not a good idea because of the competition.

As with the App Store, building an app around a popular book can be profitable. For example, there are many popular apps built around the Quran.

In summary, building an app around a popular book, with some features beyond the raw content, looks promising. Possible features include:
 - Daily quotes from the book
 - Audio version of the book
 - Quizzes on the book
 - A forum where people can discuss the book
 - A library of book compendiums.

# Conclusion

The project analyzed data from the App Store and the Google Play Store with the aim of recommending an app profile that can be profitable in both markets.

The analysis suggests that an app built around a popular book, with added features (audio version, quizzes, etc.), can be profitable on both the App Store and the Google Play Store.