# Profitable App Profiles for the App Store and Google Play Markets

Our aim in this project is to find mobile app profiles that are profitable for the App Store and Google Play markets. We're working as data analysts for a company that builds Android and iOS mobile apps, and our job is to enable our team of developers to make data-driven decisions about the kinds of apps they build.

At our company, we only build apps that are free to download and install, and our main source of revenue is in-app ads. This means that our revenue for any given app depends mostly on the number of people who use it. Our goal for this project is to analyze data to help our developers understand what kinds of apps are likely to attract more users.



**Analyzing Data**

In [1]:
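from csv import reader


class Data:
    def __init__(self, datapath):
        # Read the file as UTF-8 explicitly; the platform default codec
        # (e.g. 'charmap' on Windows) can fail on these files.
        # The with-block also closes the file once it has been read.
        with open(datapath, encoding='utf8') as opened_file:
            self.dataset = list(reader(opened_file))
        self.header = self.dataset[0]  # first row: column names
        self.data = self.dataset[1:]   # remaining rows: one app each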

    def explore_data(self, start, end, rows_and_columns=False):
        dataset_slice = self.data[start:end]
        for row in dataset_slice:
            print(row)
            print('\n') 

        if rows_and_columns:
            print('Number of rows:', len(self.data)) 
            print('Number of columns:', len(self.data[0]))

    def delete_wrong_data(self, index):
        # Remove the row at the given position (0-based, header excluded).
        del self.data[index]

    def verify_check_duplicate_entries(self):
        duplicate_apps = []
        unique_apps = set()  # set lookups stay fast as the data grows

        for app in self.data:
            name = app[0]
            if name in unique_apps:
                duplicate_apps.append(name)
            else:
                unique_apps.add(name)

        print('Number of duplicate apps:', len(duplicate_apps))
        print('\n')
        print('Examples of duplicate apps:', duplicate_apps[:15])

    def build_dictionary(self):
        # Map each app name to the highest review count seen for it.
        self.reviews_max = {}

        for app in self.data:
            name = app[0]
            n_reviews = float(app[3])

            if name not in self.reviews_max or self.reviews_max[name] < n_reviews:
                self.reviews_max[name] = n_reviews

        # 1181 is hard-coded; presumably the number of duplicate entries
        # reported for the Google Play data.
        print('Expected length:', len(self.data) - 1181)
        print('Actual length:', len(self.reviews_max))

    def delete_duplicate(self):
        # Requires build_dictionary() to have been called first.
        data_clean = []
        already_added = set()

        for app in self.data:
            name = app[0]
            n_reviews = float(app[3])

            # Keep the first row that carries the maximum review count.
            if (self.reviews_max[name] == n_reviews) and (name not in already_added):
                data_clean.append(app)
                already_added.add(name)
        self.data = data_clean

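    def is_english(self, string):
        # Count characters outside the ASCII range (code points above 127).
        non_ascii = 0
        for character in string:
            if ord(character) > 127:
                non_ascii += 1

        # Allow up to three, so names with a symbol such as ™ or an emoji pass.
        return non_ascii <= 3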
  
    def delete_non_english(self, index):
        # index: position of the app-name column in each row.
        english_data = []

        for app in self.data:
            name = app[index]
            if self.is_english(name):
                english_data.append(app)
        self.data = english_data
  
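    def isolating_free(self, free_price, index):
        # free_price: the string that marks a free app (e.g. '0' or '0.0').
        # index: position of the price column in each row.
        final_data = []

        for app in self.data:
            price = app[index]
            if price == free_price:
                final_data.append(app)
        self.data = final_data
        print(len(final_data))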

    def freq_table(self, index):
        # Count how often each value occurs in the column at `index`.
        table = {}
        total = 0

        for row in self.data:
            total += 1
            value = row[index]
            if value in table:
                table[value] += 1
            else:
                table[value] = 1

        # Convert the raw counts to percentages of all rows.
        table_percentages = {}

        for key in table:
            percentage = (table[key] / total) * 100
            table_percentages[key] = percentage

        return table_percentages
  
    def display_table(self, index):
        # Print the frequency table for one column, most common value first.
        table = self.freq_table(index)
        table_display = []
        for key in table:
            key_val_as_tuple = (table[key], key)
            table_display.append(key_val_as_tuple)

        table_sorted = sorted(table_display, reverse=True)
        for entry in table_sorted:
            print(entry[1], ':', entry[0])

In [2]:
ios = Data('AppleStore.csv')
ios.explore_data(0, 3, True)

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 8137: character maps to <undefined>
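
A `UnicodeDecodeError` like this one is what Python raises when `open` is called without an `encoding` argument on a platform whose default codec is not UTF-8, and the file contains bytes that codec cannot map. The sketch below reproduces the situation on a small temporary file and shows that passing `encoding` explicitly avoids it. The helper name `load_csv`, the sample row, and the use of `cp1252` as the stand-in for the failing default codec are assumptions made for illustration; they are not part of the notebook.

```python
import csv
import os
import tempfile


def load_csv(path, encoding="utf8"):
    """Read a CSV file into a list of rows using an explicit encoding."""
    with open(path, encoding=encoding, newline="") as f:
        return list(csv.reader(f))


# Write a tiny UTF-8 file containing curly quotes; the closing quote
# encodes to bytes ending in 0x9d, the byte named in the error message.
tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".csv")
tmp.write("name,genre\n“Quoted” App,Games\n".encode("utf8"))
tmp.close()

rows = load_csv(tmp.name)  # decodes cleanly as UTF-8

try:
    load_csv(tmp.name, encoding="cp1252")  # stand-in for a Windows default
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

os.remove(tmp.name)
print(rows[1], decode_failed)
```

Being explicit about the encoding makes the loader behave the same way on every operating system.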

In [None]:
android = Data('googleplaystore.csv')
android.explore_data(0, 3, True)

In [None]:
android.explore_data(10472, 10473)

In [None]:
android.delete_wrong_data(10472)

In [None]:
android.verify_check_duplicate_entries()

In [None]:
android.build_dictionary()

In [40]:
android.delete_duplicate()

In [41]:
android.explore_data(0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13
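
The duplicate removal keeps, for every app name, the row with the highest review count, on the reasoning that more reviews suggests a more recent snapshot. The same idea can be sketched in a single pass with a dictionary keyed by name. The function name `keep_most_reviewed` and the sample rows are hypothetical; the column positions (name at 0, reviews at 3) follow the rows printed above.

```python
def keep_most_reviewed(rows, name_idx=0, reviews_idx=3):
    """Return one row per app name: the one with the most reviews.

    On a tie, the row that appears first in `rows` is kept.
    """
    best = {}
    for row in rows:
        name = row[name_idx]
        n_reviews = float(row[reviews_idx])
        if name not in best or n_reviews > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())


# Made-up rows: "App A" appears twice with different review counts.
sample = [
    ["App A", "TOOLS", "4.1", "100"],
    ["App B", "GAME", "4.5", "50"],
    ["App A", "TOOLS", "4.2", "250"],
]
deduped = keep_most_reviewed(sample)
for row in deduped:
    print(row)
```

Because the dictionary holds the winning row itself, no second pass over the data is needed.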


In [42]:
print(android.is_english('Instagram'))
print(android.is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))

True
False


In [46]:
print(android.is_english('Docs To Go™ Free Office Suite'))
print(android.is_english('Instachat 😜'))

True
True
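
The results above follow from the threshold in `is_english`: a name is rejected only when it contains more than three characters outside the ASCII range. A standalone sketch of that rule, using the hypothetical helpers `count_non_ascii` and `looks_english`, makes the counts visible for the same four names:

```python
def count_non_ascii(text):
    """Number of characters whose code point is above 127."""
    return sum(1 for ch in text if ord(ch) > 127)


def looks_english(text, max_non_ascii=3):
    """Heuristic: accept a name with at most `max_non_ascii` such characters."""
    return count_non_ascii(text) <= max_non_ascii


names = [
    "Instagram",
    "爱奇艺PPS -《欢乐颂2》电视剧热播",
    "Docs To Go™ Free Office Suite",
    "Instachat 😜",
]
for name in names:
    print(name, count_non_ascii(name), looks_english(name))
```

This is only a heuristic. A name with four or more emoji would be rejected, and a name in another language that uses mostly unaccented Latin letters would be accepted.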


In [48]:
android.delete_non_english(0)
ios.delete_non_english(1)

In [50]:
android.isolating_free('0', 3)
ios.isolating_free('0.0', 7)

591
283
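
Since `isolating_free` takes a positional column index, it may be worth confirming that the index really points at the price column. In the Android rows printed earlier, index 3 holds values such as `'159'` and `'87510'`, which look like review counts, while index 7 holds `'0'` right after `'Free'`. One way to avoid counting columns by hand is to look the index up by name in the header row. In the sketch below, the helper `column_index` is hypothetical and the header list is an assumption about the Google Play file (the notebook never prints it); the sample row is copied from the output above.

```python
def column_index(header, column_name):
    """Return the position of `column_name` in the header row."""
    return header.index(column_name)


# Assumed header of googleplaystore.csv (13 columns, as reported above).
android_header = [
    "App", "Category", "Rating", "Reviews", "Size", "Installs", "Type",
    "Price", "Content Rating", "Genres", "Last Updated", "Current Ver",
    "Android Ver",
]

# First row from the earlier explore_data output.
row = [
    "Photo Editor & Candy Camera & Grid & ScrapBook", "ART_AND_DESIGN",
    "4.1", "159", "19M", "10,000+", "Free", "0", "Everyone",
    "Art & Design", "January 7, 2018", "1.0.0", "4.0.3 and up",
]

price_idx = column_index(android_header, "Price")
reviews_idx = column_index(android_header, "Reviews")
print("Price column:", price_idx, "value:", row[price_idx])
print("Reviews column:", reviews_idx, "value:", row[reviews_idx])
```

With the real object, the lookup would use `android.header` in place of the assumed list, and the same check applies to the iOS data.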


In [55]:
ios.display_table(-5)

Games : 39.2226148409894
Education : 12.7208480565371
Entertainment : 7.06713780918728
Photo & Video : 6.713780918727916
Utilities : 6.36042402826855
Health & Fitness : 4.593639575971731
Sports : 2.8268551236749118
Social Networking : 2.4734982332155475
Book : 2.4734982332155475
News : 2.1201413427561837
Lifestyle : 1.76678445229682
Weather : 1.4134275618374559
Travel : 1.4134275618374559
Productivity : 1.4134275618374559
Music : 1.4134275618374559
Finance : 1.4134275618374559
Shopping : 1.0600706713780919
Reference : 1.0600706713780919
Navigation : 0.7067137809187279
Medical : 0.7067137809187279
Food & Drink : 0.7067137809187279
Business : 0.35335689045936397
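
The percentages above are printed at full floating-point precision, which makes the table harder to scan. As an alternative to `freq_table` and `display_table`, the following sketch builds the same kind of table with `collections.Counter` from the standard library and rounds each percentage. The function name `freq_table_pct` and the sample rows are made up for illustration.

```python
from collections import Counter


def freq_table_pct(rows, index, ndigits=2):
    """Return (value, percentage) pairs for one column, most common first."""
    counts = Counter(row[index] for row in rows)
    total = sum(counts.values())
    return [
        (value, round(count / total * 100, ndigits))
        for value, count in counts.most_common()
    ]


# Made-up rows: three Games apps and one Education app.
sample = [
    ["a", "Games"],
    ["b", "Games"],
    ["c", "Education"],
    ["d", "Games"],
]
table = freq_table_pct(sample, 1)
for value, pct in table:
    print(value, ":", pct)
```

`Counter.most_common()` already returns the values ordered by count, so no separate sorting step is needed.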


In [56]:
android.display_table(1) #Category

BUSINESS : 13.19796954314721
FAMILY : 11.844331641285956
MEDICAL : 10.490693739424705
TOOLS : 7.2758037225042305
PRODUCTIVITY : 6.598984771573605
LIFESTYLE : 5.414551607445008
PERSONALIZATION : 5.245346869712352
COMMUNICATION : 4.906937394247039
HEALTH_AND_FITNESS : 4.060913705583756
SPORTS : 3.8917089678511
BOOKS_AND_REFERENCE : 3.7225042301184432
DATING : 3.2148900169204735
NEWS_AND_MAGAZINES : 2.707275803722504
FINANCE : 2.5380710659898478
TRAVEL_AND_LOCAL : 2.3688663282571913
SOCIAL : 2.030456852791878
GAME : 2.030456852791878
FOOD_AND_DRINK : 1.5228426395939088
SHOPPING : 1.015228426395939
EVENTS : 1.015228426395939
PHOTOGRAPHY : 0.8460236886632826
HOUSE_AND_HOME : 0.8460236886632826
VIDEO_PLAYERS : 0.676818950930626
AUTO_AND_VEHICLES : 0.676818950930626
WEATHER : 0.5076142131979695
MAPS_AND_NAVIGATION : 0.5076142131979695
LIBRARIES_AND_DEMO : 0.338409475465313
BEAUTY : 0.338409475465313
ART_AND_DESIGN : 0.1692047377326565


In [57]:
android.display_table(-4)

Business : 13.19796954314721
Medical : 10.490693739424705
Tools : 7.2758037225042305
Productivity : 6.598984771573605
Lifestyle : 5.414551607445008
Personalization : 5.245346869712352
Education : 5.245346869712352
Communication : 4.906937394247039
Health & Fitness : 4.060913705583756
Entertainment : 4.060913705583756
Sports : 3.8917089678511
Books & Reference : 3.7225042301184432
Dating : 3.2148900169204735
News & Magazines : 2.707275803722504
Finance : 2.5380710659898478
Travel & Local : 2.3688663282571913
Social : 2.030456852791878
Food & Drink : 1.5228426395939088
Shopping : 1.015228426395939
Events : 1.015228426395939
Photography : 0.8460236886632826
House & Home : 0.8460236886632826
Arcade : 0.8460236886632826
Video Players & Editors : 0.676818950930626
Auto & Vehicles : 0.676818950930626
Weather : 0.5076142131979695
Strategy : 0.5076142131979695
Puzzle : 0.5076142131979695
Maps & Navigation : 0.5076142131979695
Trivia : 0.338409475465313
Role Playing : 0.338409475465313
Libraries