# Alppaka: Analysis Of Profitable App Profiles for the App Store and Google Play Markets (refactored to pandas)

## What is this project about?
Analyze data from the App Store and the Google Play Store to identify the most profitable mobile apps in order to suggest data-driven decisions for type of features or/and products that should to be implemented.

## What is its goal?
Develop personal knowledge and essential skills for data analysis in Python, in this case especially with Pandas library.

## Next steps
- <s>Import pandas and matplotlib</s>
- Data visualization
- More conclusions

## Resources
- Dataquest.io:
https://app.dataquest.io/m/350/guided-project%3A-profitable-app-profiles-for-the-app-store-and-google-play-markets
- App Store data set:
https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home
- Google Play Store data set: https://www.kaggle.com/lava18/google-play-store-apps/home



In [158]:
from csv import reader
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns
import numpy as np
from pathlib import Path
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()


from scipy.stats import shapiro
from scipy.stats import ttest_ind
from scipy.stats import f

# Google Play Store data set
android = open('googleplaystore.csv')
df_android = pd.read_csv('googleplaystore.csv')

# Apple Store data set
ios = open('AppleStore.csv')
df_ios = pd.read_csv('AppleStore.csv')

In [159]:
# Checking the data types - must be pandas data frames
print(type(df_android))
print(type(df_ios))

<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>


In [160]:
df_android.head()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up


In [161]:
df_ios.head()

Unnamed: 0,id,track_name,size_bytes,currency,price,rating_count_tot,rating_count_ver,user_rating,user_rating_ver,ver,cont_rating,prime_genre,sup_devices.num,ipadSc_urls.num,lang.num,vpp_lic
0,284882215,Facebook,389879808,USD,0.0,2974676,212,3.5,3.5,95.0,4+,Social Networking,37,1,29,1
1,389801252,Instagram,113954816,USD,0.0,2161558,1289,4.5,4.0,10.23,12+,Photo & Video,37,0,29,1
2,529479190,Clash of Clans,116476928,USD,0.0,2130805,579,4.5,4.5,9.24.12,9+,Games,38,5,18,1
3,420009108,Temple Run,65921024,USD,0.0,1724546,3842,4.5,4.0,1.6.2,9+,Games,40,5,1,1
4,284035177,Pandora - Music & Radio,130242560,USD,0.0,1126879,3594,4.0,4.5,8.4.1,12+,Music,37,4,1,1


In [162]:
# Check number of apps and columns for Google Play Store

print("Number of apps:", len(df_android))
print("Number of columns:", len(df_android.columns))

Number of apps: 10841
Number of columns: 13


In [163]:
# Check number of apps and columns for Apple Store

print("Number of apps:", len(df_ios))
print("Number of columns:", len(df_ios.columns))

Number of apps: 7197
Number of columns: 16


## Data cleaning

- Checking values of columns
- Checking the row/s with n/a value
- Removing duplicates entities
- Removing non-English apps
- Isolating the free apps

### Checking values of columns for Android

In [164]:
# Check values of column 'Category'
df_android['Category'].unique()

array(['ART_AND_DESIGN', 'AUTO_AND_VEHICLES', 'BEAUTY',
       'BOOKS_AND_REFERENCE', 'BUSINESS', 'COMICS', 'COMMUNICATION',
       'DATING', 'EDUCATION', 'ENTERTAINMENT', 'EVENTS', 'FINANCE',
       'FOOD_AND_DRINK', 'HEALTH_AND_FITNESS', 'HOUSE_AND_HOME',
       'LIBRARIES_AND_DEMO', 'LIFESTYLE', 'GAME', 'FAMILY', 'MEDICAL',
       'SOCIAL', 'SHOPPING', 'PHOTOGRAPHY', 'SPORTS', 'TRAVEL_AND_LOCAL',
       'TOOLS', 'PERSONALIZATION', 'PRODUCTIVITY', 'PARENTING', 'WEATHER',
       'VIDEO_PLAYERS', 'NEWS_AND_MAGAZINES', 'MAPS_AND_NAVIGATION',
       '1.9'], dtype=object)

In [165]:
# Check which row contains value `1.9` in the column `Category`
array = ['1.9']
df_android.loc[df_android['Category'].isin(array)]

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
10472,Life Made WI-Fi Touchscreen Photo Frame,1.9,19.0,3.0M,"1,000+",Free,0,Everyone,,"February 11, 2018",1.0.19,4.0 and up,


In [166]:
# Delete row 10472
df_android = df_android.drop(10472, axis=0)
df_android.head()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up


In [167]:
# Check values of column 'Rating'
df_android['Rating'].unique()

array([4.1, 3.9, 4.7, 4.5, 4.3, 4.4, 3.8, 4.2, 4.6, 3.2, 4. , nan, 4.8,
       4.9, 3.6, 3.7, 3.3, 3.4, 3.5, 3.1, 5. , 2.6, 3. , 1.9, 2.5, 2.8,
       2.7, 1. , 2.9, 2.3, 2.2, 1.7, 2. , 1.8, 2.4, 1.6, 2.1, 1.4, 1.5,
       1.2])

In [168]:
# Check values of column 'Reviews'
df_android['Reviews'].unique()

array(['159', '967', '87510', ..., '603', '1195', '398307'], dtype=object)

In [169]:
# Check values of column 'Size'
df_android['Size'].unique()

array(['19M', '14M', '8.7M', '25M', '2.8M', '5.6M', '29M', '33M', '3.1M',
       '28M', '12M', '20M', '21M', '37M', '2.7M', '5.5M', '17M', '39M',
       '31M', '4.2M', '7.0M', '23M', '6.0M', '6.1M', '4.6M', '9.2M',
       '5.2M', '11M', '24M', 'Varies with device', '9.4M', '15M', '10M',
       '1.2M', '26M', '8.0M', '7.9M', '56M', '57M', '35M', '54M', '201k',
       '3.6M', '5.7M', '8.6M', '2.4M', '27M', '2.5M', '16M', '3.4M',
       '8.9M', '3.9M', '2.9M', '38M', '32M', '5.4M', '18M', '1.1M',
       '2.2M', '4.5M', '9.8M', '52M', '9.0M', '6.7M', '30M', '2.6M',
       '7.1M', '3.7M', '22M', '7.4M', '6.4M', '3.2M', '8.2M', '9.9M',
       '4.9M', '9.5M', '5.0M', '5.9M', '13M', '73M', '6.8M', '3.5M',
       '4.0M', '2.3M', '7.2M', '2.1M', '42M', '7.3M', '9.1M', '55M',
       '23k', '6.5M', '1.5M', '7.5M', '51M', '41M', '48M', '8.5M', '46M',
       '8.3M', '4.3M', '4.7M', '3.3M', '40M', '7.8M', '8.8M', '6.6M',
       '5.1M', '61M', '66M', '79k', '8.4M', '118k', '44M', '695k', '1.6M',
     

In [170]:
# Check which rows contain value `Varies with device`
array = ['Varies with device']
df_android.loc[df_android['Size'].isin(array)]

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
37,Floor Plan Creator,ART_AND_DESIGN,4.1,36639,Varies with device,"5,000,000+",Free,0,Everyone,Art & Design,"July 14, 2018",Varies with device,2.3.3 and up
42,Textgram - write on photos,ART_AND_DESIGN,4.4,295221,Varies with device,"10,000,000+",Free,0,Everyone,Art & Design,"July 30, 2018",Varies with device,Varies with device
52,Used Cars and Trucks for Sale,AUTO_AND_VEHICLES,4.6,17057,Varies with device,"1,000,000+",Free,0,Everyone,Auto & Vehicles,"July 30, 2018",Varies with device,Varies with device
67,Ulysse Speedometer,AUTO_AND_VEHICLES,4.3,40211,Varies with device,"5,000,000+",Free,0,Everyone,Auto & Vehicles,"July 30, 2018",Varies with device,Varies with device
68,REPUVE,AUTO_AND_VEHICLES,3.9,356,Varies with device,"100,000+",Free,0,Everyone,Auto & Vehicles,"May 25, 2018",Varies with device,Varies with device
73,PDD-UA,AUTO_AND_VEHICLES,4.8,736,Varies with device,"100,000+",Free,0,Everyone,Auto & Vehicles,"July 29, 2018",2.9,2.3.3 and up
85,CarMax – Cars for Sale: Search Used Car Inventory,AUTO_AND_VEHICLES,4.4,21777,Varies with device,"1,000,000+",Free,0,Everyone,Auto & Vehicles,"August 4, 2018",Varies with device,Varies with device
88,AutoScout24 Switzerland – Find your new car,AUTO_AND_VEHICLES,4.6,13372,Varies with device,"1,000,000+",Free,0,Everyone,Auto & Vehicles,"August 3, 2018",Varies with device,Varies with device
89,Zona Azul Digital Fácil SP CET - OFFICIAL São ...,AUTO_AND_VEHICLES,4.6,7880,Varies with device,"100,000+",Free,0,Everyone,Auto & Vehicles,"May 10, 2018",4.6.5,Varies with device
92,Fuelio: Gas log & costs,AUTO_AND_VEHICLES,4.6,65786,Varies with device,"1,000,000+",Free,0,Everyone,Auto & Vehicles,"August 2, 2018",Varies with device,4.0.3 and up


Since there is a lot of rows containing `Varies with device` value, none of the rows are deleted.

In [171]:
# Check values of column 'Installs'
df_android['Installs'].unique()

array(['10,000+', '500,000+', '5,000,000+', '50,000,000+', '100,000+',
       '50,000+', '1,000,000+', '10,000,000+', '5,000+', '100,000,000+',
       '1,000,000,000+', '1,000+', '500,000,000+', '50+', '100+', '500+',
       '10+', '1+', '5+', '0+', '0'], dtype=object)

In [172]:
# Check values of column 'Type'
df_android['Type'].unique()

array(['Free', 'Paid', nan], dtype=object)

In [173]:
# Check values of column 'Price'
df_android['Price'].unique()

array(['0', '$4.99', '$3.99', '$6.99', '$1.49', '$2.99', '$7.99', '$5.99',
       '$3.49', '$1.99', '$9.99', '$7.49', '$0.99', '$9.00', '$5.49',
       '$10.00', '$24.99', '$11.99', '$79.99', '$16.99', '$14.99',
       '$1.00', '$29.99', '$12.99', '$2.49', '$10.99', '$1.50', '$19.99',
       '$15.99', '$33.99', '$74.99', '$39.99', '$3.95', '$4.49', '$1.70',
       '$8.99', '$2.00', '$3.88', '$25.99', '$399.99', '$17.99',
       '$400.00', '$3.02', '$1.76', '$4.84', '$4.77', '$1.61', '$2.50',
       '$1.59', '$6.49', '$1.29', '$5.00', '$13.99', '$299.99', '$379.99',
       '$37.99', '$18.99', '$389.99', '$19.90', '$8.49', '$1.75',
       '$14.00', '$4.85', '$46.99', '$109.99', '$154.99', '$3.08',
       '$2.59', '$4.80', '$1.96', '$19.40', '$3.90', '$4.59', '$15.46',
       '$3.04', '$4.29', '$2.60', '$3.28', '$4.60', '$28.99', '$2.95',
       '$2.90', '$1.97', '$200.00', '$89.99', '$2.56', '$30.99', '$3.61',
       '$394.99', '$1.26', '$1.20', '$1.04'], dtype=object)

In [174]:
# Check values of column 'Content Rating'
df_android['Content Rating'].unique()

array(['Everyone', 'Teen', 'Everyone 10+', 'Mature 17+',
       'Adults only 18+', 'Unrated'], dtype=object)

In [175]:
# Check values of column 'Genres'
df_android['Genres'].unique()

array(['Art & Design', 'Art & Design;Pretend Play',
       'Art & Design;Creativity', 'Art & Design;Action & Adventure',
       'Auto & Vehicles', 'Beauty', 'Books & Reference', 'Business',
       'Comics', 'Comics;Creativity', 'Communication', 'Dating',
       'Education;Education', 'Education', 'Education;Creativity',
       'Education;Music & Video', 'Education;Action & Adventure',
       'Education;Pretend Play', 'Education;Brain Games', 'Entertainment',
       'Entertainment;Music & Video', 'Entertainment;Brain Games',
       'Entertainment;Creativity', 'Events', 'Finance', 'Food & Drink',
       'Health & Fitness', 'House & Home', 'Libraries & Demo',
       'Lifestyle', 'Lifestyle;Pretend Play',
       'Adventure;Action & Adventure', 'Arcade', 'Casual', 'Card',
       'Casual;Pretend Play', 'Action', 'Strategy', 'Puzzle', 'Sports',
       'Music', 'Word', 'Racing', 'Casual;Creativity',
       'Casual;Action & Adventure', 'Simulation', 'Adventure', 'Board',
       'Trivia', 'Role 

In [176]:
# Check values of column 'Last Updated'
df_android['Last Updated'].unique()

array(['January 7, 2018', 'January 15, 2018', 'August 1, 2018', ...,
       'January 20, 2014', 'February 16, 2014', 'March 23, 2014'],
      dtype=object)

In [177]:
# Check values of column 'Current Ver'
df_android['Current Ver'].unique()

array(['1.0.0', '2.0.0', '1.2.4', ..., '1.0.612928', '0.3.4', '2.0.148.0'],
      dtype=object)

In [178]:
# Check values of column 'Android Ver'
df_android['Android Ver'].unique()

array(['4.0.3 and up', '4.2 and up', '4.4 and up', '2.3 and up',
       '3.0 and up', '4.1 and up', '4.0 and up', '2.3.3 and up',
       'Varies with device', '2.2 and up', '5.0 and up', '6.0 and up',
       '1.6 and up', '1.5 and up', '2.1 and up', '7.0 and up',
       '5.1 and up', '4.3 and up', '4.0.3 - 7.1.1', '2.0 and up',
       '3.2 and up', '4.4W and up', '7.1 and up', '7.0 - 7.1.1',
       '8.0 and up', '5.0 - 8.0', '3.1 and up', '2.0.1 and up',
       '4.1 - 7.1.1', nan, '5.0 - 6.0', '1.0 and up', '2.2 - 7.1.1',
       '5.0 - 7.1.1'], dtype=object)

### Checking values of columns for Apple Store

In [179]:
df_ios.head()

Unnamed: 0,id,track_name,size_bytes,currency,price,rating_count_tot,rating_count_ver,user_rating,user_rating_ver,ver,cont_rating,prime_genre,sup_devices.num,ipadSc_urls.num,lang.num,vpp_lic
0,284882215,Facebook,389879808,USD,0.0,2974676,212,3.5,3.5,95.0,4+,Social Networking,37,1,29,1
1,389801252,Instagram,113954816,USD,0.0,2161558,1289,4.5,4.0,10.23,12+,Photo & Video,37,0,29,1
2,529479190,Clash of Clans,116476928,USD,0.0,2130805,579,4.5,4.5,9.24.12,9+,Games,38,5,18,1
3,420009108,Temple Run,65921024,USD,0.0,1724546,3842,4.5,4.0,1.6.2,9+,Games,40,5,1,1
4,284035177,Pandora - Music & Radio,130242560,USD,0.0,1126879,3594,4.0,4.5,8.4.1,12+,Music,37,4,1,1


In [180]:
# Check values of column 'size_bytes'
df_ios['size_bytes'].unique()

array([389879808, 113954816, 116476928, ...,  16808960,  91468800,
        83026944])

In [181]:
# Check values of column 'currency'
df_ios['currency'].unique()

array(['USD'], dtype=object)

In [182]:
# Check values of column 'price'
df_ios['price'].unique()

array([  0.  ,   1.99,   0.99,   6.99,   2.99,   7.99,   4.99,   9.99,
         3.99,   8.99,   5.99,  14.99,  13.99,  19.99,  17.99,  15.99,
        24.99,  20.99,  29.99,  12.99,  39.99,  74.99,  16.99, 249.99,
        11.99,  27.99,  49.99,  59.99,  22.99,  18.99,  99.99,  21.99,
        34.99, 299.99,  23.99,  47.99])

In [183]:
# Check values of column 'rating_count_tot'
df_ios['rating_count_tot'].unique()

array([2974676, 2161558, 2130805, ...,       2,       1,       0])

In [184]:
# Check values of column 'rating_count_ver'
df_ios['rating_count_ver'].unique()

array([ 212, 1289,  579, ...,  219,  218,  200])

In [185]:
# Check values of column 'user_rating'
df_ios['user_rating'].unique()

array([3.5, 4.5, 4. , 3. , 5. , 2.5, 2. , 1.5, 1. , 0. ])

In [186]:
# Check values of column 'user_rating_ver'
df_ios['user_rating_ver'].unique()

array([3.5, 4. , 4.5, 5. , 3. , 0. , 2.5, 1.5, 2. , 1. ])

In [187]:
# Check values of column 'ver'
df_ios['ver'].unique()

array(['95.0', '10.23', '9.24.12', ..., '1.1.21', '4.7.02', '1.0.2.5'],
      dtype=object)

In [188]:
# Check values of column 'cont_rating'
df_ios['cont_rating'].unique()

array(['4+', '12+', '9+', '17+'], dtype=object)

In [189]:
# Check values of column 'prime_genre'
df_ios['prime_genre'].unique()

array(['Social Networking', 'Photo & Video', 'Games', 'Music',
       'Reference', 'Health & Fitness', 'Weather', 'Utilities', 'Travel',
       'Shopping', 'News', 'Navigation', 'Lifestyle', 'Entertainment',
       'Food & Drink', 'Sports', 'Book', 'Finance', 'Education',
       'Productivity', 'Business', 'Catalogs', 'Medical'], dtype=object)

In [190]:
# Check values of column 'sup_devices.num'
df_ios['sup_devices.num'].unique()

array([37, 38, 40, 43, 39, 12, 24, 47, 45, 25, 26, 11, 35, 16, 36,  9, 15,
       33, 13, 23])

In [191]:
# Check values of column 'ipadSc_urls.num'
df_ios['ipadSc_urls.num'].unique()

array([1, 0, 5, 4, 3, 2])

In [192]:
# Check values of column 'lang.num'
df_ios['lang.num'].unique()

array([29, 18,  1, 27, 45, 24, 10, 13, 11, 19, 33, 16, 12, 30,  5,  7,  3,
        9, 26, 32, 36, 22, 20,  2,  8, 35, 34,  6, 15, 14,  4, 17, 21, 23,
       43, 42, 46, 56, 39, 31, 25, 75, 69, 47, 37, 41, 28,  0, 55, 58, 40,
       59, 63, 50, 74, 68, 54])

In [193]:
# Check values of column 'vpp_lic'
df_ios['vpp_lic'].unique()

array([1, 0])

All the columns have appropriate value types (numbers, letter etc.)

### Checking the row/s with n/a value

In [194]:
# Check for missing values in Google Play Store
null_columns=df_android.columns[df_android.isnull().any()]
df_android[null_columns].isnull().sum()

Rating         1474
Type              1
Current Ver       8
Android Ver       2
dtype: int64

In [195]:
# Check for missing values in Apple Store
null_columns=df_ios.columns[df_ios.isnull().any()]
df_ios[null_columns].isnull().sum()

Series([], dtype: float64)

`Google Play Store` had some missing values in the data set. This is important to notice while analysing for example `rating` since the number of the missing data is quite high (1485 rows).
However, `Apple Store` is fully completed - no missing values.

### Checking duplicate entries

In [196]:
# Check for duplicates in Google Play Store
print("Number of unique apps:", df_android['App'].nunique())
print("Number of all apps:", len(df_android))

Number of unique apps: 9659
Number of all apps: 10840


In [197]:
# Delete duplicates in Google Play Store (leave app with higher number of reviews)
df_android = df_android.sort_values('Reviews', ascending=False)
df_android = df_android.drop_duplicates(subset='App', keep='first')
print(len(df_android))

9659


In [198]:
# Check for duplicates in Apple Store
print("Number of unique apps:", df_ios['track_name'].nunique())
print("Number of all apps:", len(df_ios))

Number of unique apps: 7195
Number of all apps: 7197


In [199]:
# Delete duplicates in Apple Store (leave app with higher number of reviews)
df_ios = df_ios.sort_values('rating_count_tot', ascending=False)
df_ios = df_ios.drop_duplicates(subset='track_name', keep='first')
print(len(df_ios))

7195


The process of deleting duplicate rows was done properly. The length of `df_android` and `df_ios` in now equal to the length of unique app names in those data sets.

In [200]:
# Check how the Google Play Store data frames look like
df_android.head()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
2989,GollerCepte Live Score,SPORTS,4.2,9992,31M,"1,000,000+",Free,0,Everyone,Sports,"May 23, 2018",6.5,4.1 and up
4970,Ad Block REMOVER - NEED ROOT,TOOLS,3.3,999,91k,"100,000+",Free,0,Everyone,Tools,"December 17, 2013",3.2,2.2 and up
2723,SnipSnap Coupon App,SHOPPING,4.2,9975,18M,"1,000,000+",Free,0,Everyone,Shopping,"January 22, 2018",1.4,4.3 and up
3079,US Open Tennis Championships 2018,SPORTS,4.0,9971,33M,"1,000,000+",Free,0,Everyone,Sports,"June 5, 2018",7.1,5.0 and up
3229,DreamTrips,TRAVEL_AND_LOCAL,4.7,9971,22M,"500,000+",Free,0,Teen,Travel & Local,"August 6, 2018",1.28.1,5.0 and up


In [201]:
# Check how the Apple Store data frames look like
df_ios.head()

Unnamed: 0,id,track_name,size_bytes,currency,price,rating_count_tot,rating_count_ver,user_rating,user_rating_ver,ver,cont_rating,prime_genre,sup_devices.num,ipadSc_urls.num,lang.num,vpp_lic
0,284882215,Facebook,389879808,USD,0.0,2974676,212,3.5,3.5,95.0,4+,Social Networking,37,1,29,1
1,389801252,Instagram,113954816,USD,0.0,2161558,1289,4.5,4.0,10.23,12+,Photo & Video,37,0,29,1
2,529479190,Clash of Clans,116476928,USD,0.0,2130805,579,4.5,4.5,9.24.12,9+,Games,38,5,18,1
3,420009108,Temple Run,65921024,USD,0.0,1724546,3842,4.5,4.0,1.6.2,9+,Games,40,5,1,1
4,284035177,Pandora - Music & Radio,130242560,USD,0.0,1126879,3594,4.0,4.5,8.4.1,12+,Music,37,4,1,1


Google Play Store data frame looks a bit messy.

In [202]:
df_android = df_android.sort_values('Category', ascending=True)
df_android.head(1000)

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
33,Easy Origami Ideas,ART_AND_DESIGN,4.2,1015,11M,"100,000+",Free,0,Everyone,Art & Design,"January 6, 2018",1.1.0,4.1 and up
12,Tattoo Name On My Photo Editor,ART_AND_DESIGN,4.2,44829,20M,"10,000,000+",Free,0,Teen,Art & Design,"April 2, 2018",3.8,4.1 and up
8888,Spring flowers theme couleurs d t space,ART_AND_DESIGN,5.0,1,2.9M,100+,Free,0,Everyone,Art & Design,"April 18, 2018",1.0.2,4.0 and up
9,Kids Paint Free - Drawing Fun,ART_AND_DESIGN,4.7,121,3.1M,"10,000+",Free,0,Everyone,Art & Design;Creativity,"July 3, 2018",2.8,4.0.3 and up
16,Photo Designer - Write your name with shapes,ART_AND_DESIGN,4.7,3632,5.5M,"500,000+",Free,0,Everyone,Art & Design,"July 31, 2018",3.1,4.1 and up
25,Harley Quinn wallpapers HD,ART_AND_DESIGN,4.8,192,6.0M,"10,000+",Free,0,Everyone,Art & Design,"April 25, 2018",1.5,3.0 and up
47,Little Teddy Bear Colouring Book Game,ART_AND_DESIGN,4.2,85,8.0M,"100,000+",Free,0,Everyone,Art & Design,"December 17, 2017",2.0.0,4.1 and up
4193,صور حرف H,ART_AND_DESIGN,4.4,13,4.5M,"1,000+",Free,0,Everyone,Art & Design,"March 27, 2018",2.0,4.0.3 and up
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
10,Text on Photo - Fonteee,ART_AND_DESIGN,4.4,13880,28M,"1,000,000+",Free,0,Everyone,Art & Design,"October 27, 2017",1.0.4,4.1 and up


### Removing non-English apps

In [203]:
df_android['App']

33                                      Easy Origami Ideas
12                          Tattoo Name On My Photo Editor
8888               Spring flowers theme couleurs d t space
9                            Kids Paint Free - Drawing Fun
16            Photo Designer - Write your name with shapes
                               ...                        
10612                    Clearwater, FL - weather and more
3626     The Weather Channel: Rain Forecast & Storm Alerts
3650                                             Info BMKG
9562                                       Weather 14 Days
3660                         New 2018 Weather App & Widget
Name: App, Length: 9659, dtype: object

In [204]:
# Add a new row to show if the app name is English for Google Play Store

df_android['Lang'] = df_android['App'].apply(lambda x: 'English' if all(ord(letter) < 127 for word in x for letter in word) else 'non-English')

In [205]:
df_android.head()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver,Lang
33,Easy Origami Ideas,ART_AND_DESIGN,4.2,1015,11M,"100,000+",Free,0,Everyone,Art & Design,"January 6, 2018",1.1.0,4.1 and up,English
12,Tattoo Name On My Photo Editor,ART_AND_DESIGN,4.2,44829,20M,"10,000,000+",Free,0,Teen,Art & Design,"April 2, 2018",3.8,4.1 and up,English
8888,Spring flowers theme couleurs d t space,ART_AND_DESIGN,5.0,1,2.9M,100+,Free,0,Everyone,Art & Design,"April 18, 2018",1.0.2,4.0 and up,English
9,Kids Paint Free - Drawing Fun,ART_AND_DESIGN,4.7,121,3.1M,"10,000+",Free,0,Everyone,Art & Design;Creativity,"July 3, 2018",2.8,4.0.3 and up,English
16,Photo Designer - Write your name with shapes,ART_AND_DESIGN,4.7,3632,5.5M,"500,000+",Free,0,Everyone,Art & Design,"July 31, 2018",3.1,4.1 and up,English


In [206]:
df_android.groupby('Lang').size()

Lang
English        9117
non-English     542
dtype: int64

In [209]:
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', -1)

df_android[df_android['Lang'] == 'non-English']

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver,Lang
4193,صور حرف H,ART_AND_DESIGN,4.4,13,4.5M,"1,000+",Free,0,Everyone,Art & Design,"March 27, 2018",2.0,4.0.3 and up,non-English
2,"U Launcher Lite – FREE Live Cool Themes, Hide Apps",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up,non-English
85,CarMax – Cars for Sale: Search Used Car Inventory,AUTO_AND_VEHICLES,4.4,21777,Varies with device,"1,000,000+",Free,0,Everyone,Auto & Vehicles,"August 4, 2018",Varies with device,Varies with device,non-English
88,AutoScout24 Switzerland – Find your new car,AUTO_AND_VEHICLES,4.6,13372,Varies with device,"1,000,000+",Free,0,Everyone,Auto & Vehicles,"August 3, 2018",Varies with device,Varies with device,non-English
7183,Билеты ПДД CD 2019 PRO,AUTO_AND_VEHICLES,,21,16M,100+,Paid,$1.49,Everyone,Auto & Vehicles,"July 27, 2018",1.49,4.0 and up,non-English
89,Zona Azul Digital Fácil SP CET - OFFICIAL São Paulo,AUTO_AND_VEHICLES,4.6,7880,Varies with device,"100,000+",Free,0,Everyone,Auto & Vehicles,"May 10, 2018",4.6.5,Varies with device,non-English
6244,B y H Niños ES,BOOKS_AND_REFERENCE,4.6,53,16M,"5,000+",Free,0,Everyone,Books & Reference,"September 22, 2015",1.0.2,2.3 and up,non-English
5346,Al Quran Free - القرآن (Islam),BOOKS_AND_REFERENCE,4.7,1777,23M,"50,000+",Free,0,Everyone,Books & Reference,"February 15, 2015",1.1,2.2 and up,non-English
9777,FAHREDDİN er-RÂZİ TEFSİRİ,BOOKS_AND_REFERENCE,,9,20M,"1,000+",Free,0,Everyone,Books & Reference,"March 19, 2018",1.1,4.0.3 and up,non-English
6165,Cъновник BG,BOOKS_AND_REFERENCE,,13,4.1M,"1,000+",Free,0,Everyone,Books & Reference,"January 21, 2017",250,4.0 and up,non-English


In [152]:
df_android.loc[[2, 85, 88, 10173, 4715, 8631, 5105, 192, 6264, 8991, 8980, 7963, ], 'Lang']

'non-English'

In [273]:
print('- ', ord('-')) # < than 127
print(',', ord(',')) # < than 127
print(':', ord(':')) # < than 127
print('.', ord('.')) # < than 127
print('&', ord('&')) # < than 127
print('•', ord('•')) # > than 127
print('°', ord('°')) # > than 127
print('®', ord('®')) # > than 127
print('™', ord('™')) # > than 127
print('★', ord('★')) # > than 127
print('✨', ord('✨')) # > than 127
print('⏰', ord('⏰')) # > than 127
print('📏', ord('📏')) # > than 127
print('#', ord('#')) # < than 127
print('·', ord('·')) # > than 127
print('💘', ord('💘')) # > than 127
print('😘', ord('😘')) # > than 127
print('🔥', ord('🔥')) # > than 127
print('😜', ord('😜')) # > than 127
print('’', ord('’'))
print('🏆', ord('🏆'))
print('/', ord('/'))
print('"', ord('"'))
print('💎', ord('💎'))
print('🌏', ord('🌏'))
print('🚀', ord('🚀'))
print('+', ord('+'))
print('|', ord('|'))
print('(', ord('('))
print(')', ord(')'))
print('🎨', ord('🎨'))
print('😂', ord('😂'))
print('💞', ord('💞'))
#print(ord('🗓️'))
print('℠', ord('℠'))
print('🔔', ord('🔔'))
print('🏠', ord('🏠'))
#print(ord('🇺🇸'))
print('🌸', ord('🌸'))
print('🔫', ord('🔫'))
print('💣', ord('💣'))
print('🍀', ord('🍀'))
print('👍', ord('👍'))
print('►', ord('►'))
print('?', ord('?'))
print('!', ord('!'))
print('❤', ord('❤'))
print('»', ord('»'))
print('📖', ord('📖'))
print('🦄', ord('🦄'))
#print(ord('✔️'))
print('♪', ord('♪'))
print('🐶', ord('🐶'))
print('🎈', ord('🎈'))
print('🐕', ord('🐕'))
print('∞', ord('∞'))
print('🐈', ord('🐈'))
print('😍', ord('😍'))
print('🐬', ord('🐬'))
print('–', ord('–'))
print('—', ord('—'))
print('、', ord('、'))
print('♥', ord('♥'))
print('😄', ord('😄'))
print('👍', ord('👍'))
print('【', ord('【'))
print('】', ord('】'))
print('≡', ord('≡'))
print('V', ord('V'))
#print('', ord(''))

-  45
, 44
: 58
. 46
& 38
• 8226
° 176
® 174
™ 8482
★ 9733
✨ 10024
⏰ 9200
📏 128207
# 35
· 183
💘 128152
😘 128536
🔥 128293
😜 128540
’ 8217
🏆 127942
/ 47
" 34
💎 128142
🌏 127759
🚀 128640
+ 43
| 124
( 40
) 41
🎨 127912
😂 128514
💞 128158
℠ 8480
🔔 128276
🏠 127968
🌸 127800
🔫 128299
💣 128163
🍀 127808
👍 128077
► 9658
? 63
! 33
❤ 10084
» 187
📖 128214
🦄 129412
♪ 9834
🐶 128054
🎈 127880
🐕 128021
∞ 8734
🐈 128008
😍 128525
🐬 128044
– 8211
— 8212
、 12289
♥ 9829
😄 128516
👍 128077
【 12304
】 12305
≡ 8801
V 86


In [270]:
# Add above symbols so that apps with those signs are English still for Google Play Store
symbols = [8226, 176, 174, 8482, 9733, 10024, 9200, 128207, 183, 128152, 128536, 128293, 128540, 8217, 127942, 128142, 127759, 128640, 127912, 128514, 128158, 8480, 128276, 127968, 127800, 128299, 128163, 127808, 128077, 9658, 10084, 187, 128214, 129412, 9834, 128054, 127880, 128021, 8734, 128008, 128525, 128044, 8211, 8212, 12289, 9829, 128516, 128077, 12304, 12305, 8801]
df_android['Lang'] = df_android['App'].apply(lambda x: 'English' if all(ord(letter) < 127 or ord(letter) in symbols for word in x for letter in word) else 'non-English')

In [271]:
df_android.groupby('Lang').size()

Lang
English        9458
non-English    201 
dtype: int64

In [272]:
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', -1)

df_android[df_android['Lang'] == 'non-English']

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver,Lang
4193,صور حرف H,ART_AND_DESIGN,4.4,13,4.5M,"1,000+",Free,0,Everyone,Art & Design,"March 27, 2018",2.0,4.0.3 and up,non-English
7183,Билеты ПДД CD 2019 PRO,AUTO_AND_VEHICLES,,21,16M,100+,Paid,$1.49,Everyone,Auto & Vehicles,"July 27, 2018",1.49,4.0 and up,non-English
89,Zona Azul Digital Fácil SP CET - OFFICIAL São Paulo,AUTO_AND_VEHICLES,4.6,7880,Varies with device,"100,000+",Free,0,Everyone,Auto & Vehicles,"May 10, 2018",4.6.5,Varies with device,non-English
6244,B y H Niños ES,BOOKS_AND_REFERENCE,4.6,53,16M,"5,000+",Free,0,Everyone,Books & Reference,"September 22, 2015",1.0.2,2.3 and up,non-English
5346,Al Quran Free - القرآن (Islam),BOOKS_AND_REFERENCE,4.7,1777,23M,"50,000+",Free,0,Everyone,Books & Reference,"February 15, 2015",1.1,2.2 and up,non-English
9777,FAHREDDİN er-RÂZİ TEFSİRİ,BOOKS_AND_REFERENCE,,9,20M,"1,000+",Free,0,Everyone,Books & Reference,"March 19, 2018",1.1,4.0.3 and up,non-English
6165,Cъновник BG,BOOKS_AND_REFERENCE,,13,4.1M,"1,000+",Free,0,Everyone,Books & Reference,"January 21, 2017",250,4.0 and up,non-English
10669,Pistolet FN GP35 expliqué,BOOKS_AND_REFERENCE,,2,7.9M,5+,Paid,$5.99,Everyone,Books & Reference,"August 19, 2014",Android 2.0 - 2014,1.6 and up,non-English
5698,日本AV历史,BOOKS_AND_REFERENCE,4.1,215,30M,"10,000+",Free,0,Teen,Books & Reference,"March 6, 2018",1.2,4.0 and up,non-English
8160,Modlitební knížka CZ,BOOKS_AND_REFERENCE,,4,18M,500+,Free,0,Everyone,Books & Reference,"February 4, 2018",4.0,4.0.3 and up,non-English


Most of the apps are correctly assigned to be `non-English` judging by the name, however some of them seem to be English. But due to the fact that in their names there is one foreign sign, the app is not assigned as `English`. I want to change some of them to have `English` value in the column `Lang` manually.

In [149]:
# Change English apps tagged `non_English` to be set as `English`
# Names of apps: 


In [69]:
# Add a new row to show if the app name is English for Apple Store

df_ios['lang'] = df_ios['track_name'].apply(lambda x: 'English' if all(ord(letter) < 127 for word in x for letter in word) else 'non-English')

In [70]:
df_ios.head()

Unnamed: 0,id,track_name,size_bytes,currency,price,rating_count_tot,rating_count_ver,user_rating,user_rating_ver,ver,cont_rating,prime_genre,sup_devices.num,ipadSc_urls.num,lang.num,vpp_lic,lang
0,284882215,Facebook,389879808,USD,0.0,2974676,212,3.5,3.5,95.0,4+,Social Networking,37,1,29,1,English
1,389801252,Instagram,113954816,USD,0.0,2161558,1289,4.5,4.0,10.23,12+,Photo & Video,37,0,29,1,English
2,529479190,Clash of Clans,116476928,USD,0.0,2130805,579,4.5,4.5,9.24.12,9+,Games,38,5,18,1,English
3,420009108,Temple Run,65921024,USD,0.0,1724546,3842,4.5,4.0,1.6.2,9+,Games,40,5,1,1,English
4,284035177,Pandora - Music & Radio,130242560,USD,0.0,1126879,3594,4.0,4.5,8.4.1,12+,Music,37,4,1,1,English


In [71]:
df_ios.groupby('lang').size()

lang
English        5705
non-English    1490
dtype: int64

In [72]:
df_ios[df_ios['lang'] == 'non-English'].head(1490)

Unnamed: 0,id,track_name,size_bytes,currency,price,rating_count_tot,rating_count_ver,user_rating,user_rating_ver,ver,cont_rating,prime_genre,sup_devices.num,ipadSc_urls.num,lang.num,vpp_lic,lang
24,284815942,Google – Search made just for mobile,179979264,USD,0.00,479440,203,3.5,4.0,27.0,17+,Utilities,37,4,33,1,non-English
26,466965151,The Sims™ FreePlay,695603200,USD,0.00,446880,1832,4.5,4.0,5.29.0,12+,Games,38,5,12,1,non-English
31,543186831,8 Ball Pool™,86776832,USD,0.00,416736,19076,4.5,4.5,3.9.1,4+,Games,38,5,10,1,non-English
42,297368629,Lose It! – Weight Loss Program and Calorie Cou...,182054912,USD,0.00,373835,402,4.0,4.5,8.0.2,4+,Health & Fitness,37,3,1,1,non-English
46,366247306,▻Sudoku,71002112,USD,0.00,359832,17119,4.5,5.0,5.4,4+,Games,40,5,7,1,non-English
52,403858572,Fruit Ninja®,163801088,USD,0.00,327025,82,4.5,4.0,2.5.1,4+,Games,38,5,13,1,non-English
60,290638154,iHeartRadio – Free Music & Radio Stations,116443136,USD,0.00,293228,110,4.0,3.0,8.0.0,12+,Music,37,5,2,1,non-English
67,497595276,The Simpsons™: Tapped Out,86079488,USD,0.00,274501,44,4.0,4.0,4.27.0,12+,Games,38,5,18,1,non-English
68,597986893,Plants vs. Zombies™ 2,98507776,USD,0.00,267394,1763,4.5,4.5,6.0.1,9+,Games,37,5,6,1,non-English
74,1094591345,Pokémon GO,290762752,USD,0.00,257627,1284,3.0,3.5,1.33.1,9+,Games,37,5,8,1,non-English


In [73]:
# Update `android_clean` and `ios` with just English apps

df_android = df_android[df_android['Lang'] == 'English']
df_ios = df_ios[df_ios['lang'] == 'English']

In [74]:
df_android.shape

(9117, 14)

The number of rows is equal to the number of English apps - the update was done properly for Google Play dataframe.

In [75]:
df_ios.shape

(5705, 17)

The number of rows is equal to the number of English apps - the update was done properly for Apple Store dataframe.

The data sets for Google Play Store and App Store were updated to include just English apps.

### Isolating the free apps

In [26]:
# Number of the free and non-free apps

# Google Play Store

print('Number of free apps (Google): ', len(df_android[df_android['Type'] == 'Free']))
print('Number of paid apps (Google): ', len(df_android[df_android['Type'] != 'Free']))

# Apple Store

print('Number of free apps (Apple): ', len(df_ios[df_ios['price'] == '0.0']))
print('Number of paid apps (Apple): ', len(df_ios[df_ios['price'] != '0.0']))

Number of free apps (Google):  105
Number of paid apps (Google):  5
Number of free apps (Apple):  0
Number of paid apps (Apple):  98


  result = method(y)


In [27]:
df_android.head()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver,Lang
172,Ancestry,BOOKS_AND_REFERENCE,4.3,64513,Varies with device,"5,000,000+",Free,0,Everyone,Books & Reference,"July 31, 2018",Varies with device,Varies with device,English
3941,Bible,BOOKS_AND_REFERENCE,4.7,2440695,Varies with device,"100,000,000+",Free,0,Teen,Books & Reference,"August 2, 2018",Varies with device,Varies with device,English
8293,Dictionary,BOOKS_AND_REFERENCE,4.5,264260,Varies with device,"10,000,000+",Free,0,Everyone,Books & Reference,"June 22, 2018",Varies with device,Varies with device,English
7827,CS,BUSINESS,,5,8.3M,100+,Free,0,Everyone,Business,"August 13, 2015",Release 1.0,4.1 and up,English
294,Slack,BUSINESS,4.4,51510,Varies with device,"5,000,000+",Free,0,Everyone,Business,"August 2, 2018",Varies with device,Varies with device,English


In [32]:
# Most common genres in each market

df_android_genres = df_android.groupby(['Genres']).size().reset_index(name='count') 
print(df_android_genres)

                         Genres  count
0                        Action      1
1                     Adventure      1
2                        Arcade      5
3                         Board      2
4             Books & Reference      3
5                      Business      3
6              Card;Brain Games      1
7                        Casino      1
8                        Casual      3
9                        Comics      1
10                Communication      5
11                       Dating      2
12                    Education      3
13         Education;Creativity      1
14                Entertainment      2
15  Entertainment;Music & Video      2
16                       Events      3
17                      Finance      7
18                 Food & Drink      1
19             Health & Fitness      2
20             Libraries & Demo      3
21                    Lifestyle      6
22            Maps & Navigation      1
23                      Medical      3
24             News & Mag

In [29]:
# Most common genres in each market
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage 
    
    return table_percentages


def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

In [30]:
display_table(ios_english, -5)

Games : 54.860100274947435
Entertainment : 7.261846999838266
Education : 6.6310852337053205
Photo & Video : 5.515122109008572
Utilities : 3.4449296458030085
Productivity : 2.7171276079573023
Health & Fitness : 2.6686074721009216
Music : 2.215752870774705
Social Networking : 2.037845705967977
Sports : 1.6820313763545207
Lifestyle : 1.6011644832605532
Shopping : 1.3747371825974446
Weather : 1.1159631246967492
Travel : 0.9704027171276078
News : 0.9218825812712276
Book : 0.8895358240336406
Reference : 0.8571890667960537
Business : 0.8571890667960537
Finance : 0.7924955523208799
Food & Drink : 0.7116286592269124
Navigation : 0.452854601326217
Medical : 0.3396409509946628
Catalogs : 0.08086689309396733


The data represents percentage distribution of apps genres in App Store.

In [57]:
display_table(android_clean_english, -4)

Tools : 8.602038693571874
Entertainment : 5.793634283336801
Education : 5.231953401289786
Business : 4.358227584772207
Medical : 4.108591637195756
Personalization : 3.900561680882047
Productivity : 3.879758685250676
Lifestyle : 3.775743707093822
Finance : 3.588516746411483
Sports : 3.442895776991887
Communication : 3.2660703141252343
Action : 3.110047846889952
Health & Fitness : 2.995631370917412
Photography : 2.9124193883919283
News & Magazines : 2.600374453921365
Social : 2.485957977948825
Travel & Local : 2.26752652381943
Books & Reference : 2.26752652381943
Shopping : 2.090701060952777
Simulation : 1.9762845849802373
Arcade : 1.9138755980861244
Dating : 1.768254628666528
Casual : 1.7162471395881007
Video Players & Editors : 1.674641148325359
Maps & Navigation : 1.3417932182234242
Puzzle : 1.2377782400665696
Food & Drink : 1.1649677553567712
Role Playing : 1.0817557728312877
Strategy : 0.9777407946744331
Racing : 0.9465363012273768
Libraries & Demo : 0.8737258165175785
Auto & Vehicl

In [58]:
display_table(android_clean_english, 1) # Category

FAMILY : 19.325982941543582
GAME : 9.819013938007073
TOOLS : 8.61244019138756
BUSINESS : 4.358227584772207
MEDICAL : 4.108591637195756
PERSONALIZATION : 3.900561680882047
PRODUCTIVITY : 3.879758685250676
LIFESTYLE : 3.786145204909507
FINANCE : 3.588516746411483
SPORTS : 3.3804867900977738
COMMUNICATION : 3.2660703141252343
HEALTH_AND_FITNESS : 2.995631370917412
PHOTOGRAPHY : 2.9124193883919283
NEWS_AND_MAGAZINES : 2.600374453921365
SOCIAL : 2.485957977948825
TRAVEL_AND_LOCAL : 2.2779280216351157
BOOKS_AND_REFERENCE : 2.26752652381943
SHOPPING : 2.090701060952777
DATING : 1.768254628666528
VIDEO_PLAYERS : 1.6954441439567296
MAPS_AND_NAVIGATION : 1.3417932182234242
FOOD_AND_DRINK : 1.1649677553567712
EDUCATION : 1.1025587684626585
ENTERTAINMENT : 0.9049303099646349
LIBRARIES_AND_DEMO : 0.8737258165175785
AUTO_AND_VEHICLES : 0.8737258165175785
WEATHER : 0.8217183274391513
HOUSE_AND_HOME : 0.7593093405450385
EVENTS : 0.6656958602038693
PARENTING : 0.6240898689411275
ART_AND_DESIGN : 0.6240

The data represents percentage distribution of apps genres and category
in Google Play Store.

In [32]:
#The average number of user ratings per app genre on the App Store

genres_ios = freq_table(ios_english, -5)

for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in ios_english:
        genre_app = app[-5]
        if genre_app == genre:            
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)

Social Networking : 60253.84920634921
Photo & Video : 14688.715542521993
Games : 15586.759433962265
Music : 29047.109489051094
Reference : 27037.188679245282
Health & Fitness : 10802.157575757576
Weather : 23145.246376811596
Utilities : 7927.525821596244
Travel : 19030.183333333334
Shopping : 26635.011764705883
News : 16980.315789473683
Navigation : 19370.821428571428
Lifestyle : 8930.373737373737
Entertainment : 8862.409799554565
Food & Drink : 19934.386363636364
Sports : 15350.913461538461
Book : 10359.2
Finance : 23353.530612244896
Education : 2472.278048780488
Productivity : 8508.089285714286
Business : 5149.320754716981
Catalogs : 3465.0
Medical : 648.952380952381


#### Conclusions

The highest number of user ratings per app on the App Store were within entertirement apps including social networking, photo & video apps related and game apps.

In [61]:
#The average number of user ratings per app genre on the Google Play Store

category_android = freq_table(android_clean_english, 1)

for category in category_android:
    total = 0
    len_category = 0
    for app in android_clean_english:
        category_app = app[1]
        if category_app == category:            
            n_installs = app[5]
            n_installs = n_installs.replace('+', '')
            n_installs = n_installs.replace(',', '')
            total += float(n_installs)
            len_category += 1
    avg_n_ratings = total / len_category
    print(category, ':', avg_n_ratings)

AUTO_AND_VEHICLES : 632501.3214285715
FOOD_AND_DRINK : 1891060.2767857143
SPORTS : 3373767.6861538463
ART_AND_DESIGN : 1887285.0
HOUSE_AND_HOME : 1331540.5616438356
DATING : 828971.2176470588
TRAVEL_AND_LOCAL : 13218662.767123288
WEATHER : 4570892.658227848
BUSINESS : 1663758.627684964
LIFESTYLE : 1369954.7774725275
HEALTH_AND_FITNESS : 3972300.388888889
MAPS_AND_NAVIGATION : 3900634.7286821706
SOCIAL : 22961790.384937238
MEDICAL : 96944.49873417722
SHOPPING : 6966908.880597015
PERSONALIZATION : 4086652.4853333333
COMICS : 817657.2727272727
EDUCATION : 1782566.0377358492
BOOKS_AND_REFERENCE : 7641777.871559633
LIBRARIES_AND_DEMO : 630903.6904761905
BEAUTY : 513151.88679245283
FAMILY : 3345018.516684607
TOOLS : 9785955.211352658
PARENTING : 525351.8333333334
VIDEO_PLAYERS : 24121489.079754602
PRODUCTIVITY : 15530942.008042896
PHOTOGRAPHY : 16636241.267857144
COMMUNICATION : 35153714.17515924
NEWS_AND_MAGAZINES : 9472807.04
ENTERTAINMENT : 11375402.298850575
EVENTS : 249580.640625
FINANC

#### Conclusions

The highest number of user ratings per app on the Google App Store were within video apps but also educational/ growth apps including those for books or education.