# Before you start:
- Read the README.md file
- Comment as much as you can and use the resources in the README.md file
- Happy learning!

In [40]:
#Import your libraries
import numpy as np
import pandas as pd

# Introduction

In this lab, we will use two datasets. Both contain variables that describe apps from the Google Play Store. We will apply our knowledge of feature extraction to process these datasets and prepare them for use with an ML algorithm.

# Challenge 1 - Loading and Extracting Features from the First Dataset

#### In this challenge, our goals are: 

* Explore the dataset.
* Identify the columns with missing values.
* Either replace the missing values in each column or drop the columns.
* Convert each column to the appropriate type.

#### The first dataset contains different information describing the apps. 

Load the dataset into the variable `google_play` in the cell below. The dataset is in the file `googleplaystore.csv`

In [41]:
gp = pd.read_csv('../../data/googleplaystore.csv')

#### Examine all variables and their types in the following cell

In [42]:
# Your code here:
gp.dtypes

App                object
Category           object
Rating            float64
Reviews            object
Size               object
Installs           object
Type               object
Price              object
Content Rating     object
Genres             object
Last Updated       object
Current Ver        object
Android Ver        object
dtype: object

#### Since this dataset only contains one numeric column, let's skip the `describe()` function and look at the first 5 rows using the `head()` function

In [43]:
# Your code here:
gp.describe()

Unnamed: 0,Rating
count,9367.0
mean,4.193338
std,0.537431
min,1.0
25%,4.0
50%,4.3
75%,4.5
max,19.0


In [44]:
gp.sample(10)

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
335,Messenger – Text and Video Chat for Free,COMMUNICATION,4.0,56642847,Varies with device,"1,000,000,000+",Free,0,Everyone,Communication,"August 1, 2018",Varies with device,Varies with device
4094,EBookDroid - PDF & DJVU Reader,PRODUCTIVITY,4.5,75951,Varies with device,"5,000,000+",Free,0,Everyone,Productivity,"July 8, 2018",Varies with device,Varies with device
2079,Leo and Tig,FAMILY,4.5,47644,27M,"1,000,000+",Free,0,Everyone,Adventure;Action & Adventure,"May 31, 2018",10.180530,4.0.3 and up
815,Teacher's Gradebook - Additio,EDUCATION,4.2,3241,16M,"100,000+",Free,0,Everyone,Education,"August 1, 2018",4.8.2,4.0.3 and up
9809,2019 Tricks Es File Explores,TOOLS,3.0,4,2.0M,"1,000+",Free,0,Everyone,Tools,"March 16, 2016",1.0,2.2 and up
2283,Lab Values + Medical Reference,MEDICAL,4.5,133,2.8M,"10,000+",Paid,$2.99,Everyone,Medical,"August 2, 2014",3.0,2.3 and up
8271,DC Rider,MAPS_AND_NAVIGATION,3.6,258,6.9M,"50,000+",Free,0,Everyone,Maps & Navigation,"December 24, 2017",2.1,4.0 and up
4821,Zello PTT Walkie Talkie,SOCIAL,4.4,695576,Varies with device,"50,000,000+",Free,0,Everyone,Social,"August 2, 2018",Varies with device,Varies with device
1473,Mortgage by Zillow: Calculator & Rates,HOUSE_AND_HOME,4.3,4435,7.9M,"500,000+",Free,0,Everyone,House & Home,"March 2, 2016",2.6.0.287,4.0 and up
5490,True Skate,SPORTS,4.4,129409,73M,"1,000,000+",Paid,$1.99,Everyone,Sports,"August 4, 2018",1.5.1,4.0.3 and up


#### We can see that there are a few columns that could be coerced to numeric.

Start with the `Reviews` column. We can find out which value is causing this column to be of object type by locating its non-numeric values. To do this, we recall the `to_numeric()` function, which lets us coerce all non-numeric data to null. We can then use the `isnull()` function to subset our dataframe using the True/False column that this function generates.

In the cell below, transform the Reviews column to numeric and assign this new column to the variable `Reviews_numeric`. Make sure to coerce the errors.

In [45]:
# Your code here:
gp['reviews_numeric'] = pd.to_numeric(gp.Reviews, errors = 'coerce')

Next, create a column containing True/False values using the `isnull()` function. Assign this column to the `Reviews_isnull` variable.

In [46]:
gp['reviews_isnull'] = gp.reviews_numeric.isna()

Finally, subset `google_play` with `Reviews_isnull`. This should give you all the rows that contain non-numeric characters.

Your output should look like:

![Reviews_bool.png](../images/reviews-bool.png)

In [47]:
## this is a very weird way to get to this lol
gp.loc[gp.reviews_isnull, ['Reviews', 'reviews_numeric']]

Unnamed: 0,Reviews,reviews_numeric
10472,3.0M,
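
The coerce-then-mask pattern used in the cells above can also be tried in isolation. The following is a minimal sketch on made-up values (the `reviews` Series is illustrative and is not taken from the dataset):

```python
import pandas as pd

# Toy stand-in for the Reviews column; the values are invented for illustration.
reviews = pd.Series(['159', '967', '3.0M', '87510'])

# errors='coerce' turns every value that cannot be parsed into NaN.
as_numbers = pd.to_numeric(reviews, errors='coerce')

# isnull() gives a True/False mask; indexing with it keeps the offending rows.
bad_rows = reviews[as_numbers.isnull()]
print(bad_rows.tolist())
```

Indexing the original strings with the mask, rather than the coerced numbers, shows what the unparseable text actually was.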


#### We see that Google Play is using a shorthand for millions. 

Let's write a function to transform this data.

Steps:

1. Create a function that returns the correct numeric values of *Reviews*.
1. Define a test string with `M` in the last character.
1. Test your function with the test string. Make sure your function works correctly. If not, modify your function and test again.

In [48]:
# Your code here

def convert_string_to_numeric(s):
    """
    Convert a string value to numeric. If the last character of the string is `M`, obtain the 
    numeric part of the string, multiply it with 1,000,000, then return the result. Otherwise, 
    convert the string to numeric value and return the result.
    
    Args:
        s: The Reviews score in string format.

    Returns:
        The correct numeric value of the Reviews score.
    """
    if s.endswith('M'):
        return float(s[:-1])*10**6
    else:
        return float(s)

test_string = '4.0M'

convert_string_to_numeric(test_string) == 4000000

True

The last step is to apply the function to the `Reviews` column in the following cell:

In [49]:
# Your code here:
gp.Reviews = gp.Reviews.apply(convert_string_to_numeric)

Check the non-numeric `Reviews` row again. It should have been fixed now and you should see:

![Reviews_bool_fixed.png](../images/reviews-bool-fixed.png)

In [50]:
# Your code here
gp.Reviews.isna().sum()

0

Also check the variable types of `google_play`. The `Reviews` column should be a `float64` type now.

In [51]:
gp.dtypes

App                 object
Category            object
Rating             float64
Reviews            float64
Size                object
Installs            object
Type                object
Price               object
Content Rating      object
Genres              object
Last Updated        object
Current Ver         object
Android Ver         object
reviews_numeric    float64
reviews_isnull        bool
dtype: object

#### The next column we will look at is `Size`. We start by looking at all unique values in `Size`:

*Hint: use `unique()` ([documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.unique.html))*.

In [59]:
# Your code here:
gp.Size.unique()

array(['19M', '14M', '8.7M', '25M', '2.8M', '5.6M', '29M', '33M', '3.1M',
       '28M', '12M', '20M', '21M', '37M', '2.7M', '5.5M', '17M', '39M',
       '31M', '4.2M', '7.0M', '23M', '6.0M', '6.1M', '4.6M', '9.2M',
       '5.2M', '11M', '24M', 'Varies with device', '9.4M', '15M', '10M',
       '1.2M', '26M', '8.0M', '7.9M', '56M', '57M', '35M', '54M', '201k',
       '3.6M', '5.7M', '8.6M', '2.4M', '27M', '2.5M', '16M', '3.4M',
       '8.9M', '3.9M', '2.9M', '38M', '32M', '5.4M', '18M', '1.1M',
       '2.2M', '4.5M', '9.8M', '52M', '9.0M', '6.7M', '30M', '2.6M',
       '7.1M', '3.7M', '22M', '7.4M', '6.4M', '3.2M', '8.2M', '9.9M',
       '4.9M', '9.5M', '5.0M', '5.9M', '13M', '73M', '6.8M', '3.5M',
       '4.0M', '2.3M', '7.2M', '2.1M', '42M', '7.3M', '9.1M', '55M',
       '23k', '6.5M', '1.5M', '7.5M', '51M', '41M', '48M', '8.5M', '46M',
       '8.3M', '4.3M', '4.7M', '3.3M', '40M', '7.8M', '8.8M', '6.6M',
       '5.1M', '61M', '66M', '79k', '8.4M', '118k', '44M', '695k', '1.6M',
     

You should have seen lots of unique values of the app sizes.

#### While we can convert most of the `Size` values to numeric in the same way we converted the `Reviews` values, there is one value that is impossible to convert.

What is that badass value? Enter it in the next cell and calculate the proportion of its occurrence to the total number of records of `google_play`.

In [66]:
[val for val in gp.Size.unique() if not val.endswith(('k', 'M'))]

## There are actually 2 values that are impossible to convert to numeric...

['Varies with device', '1,000+']
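
For readers who want to see what a conversion would look like before deciding what to do with the column, here is a rough sketch. The helper name `parse_size` and the decimal multipliers (`k` = 1,000 and `M` = 1,000,000) are assumptions made for illustration; the dataset does not state which units it uses.

```python
import math

# Assumed multipliers for the suffixes seen in the Size values above.
SUFFIX_MULTIPLIERS = {'k': 1_000, 'M': 1_000_000}

def parse_size(value):
    """Hypothetical helper: return the size as a float, or NaN if it is not a size."""
    suffix = value[-1:]
    if suffix in SUFFIX_MULTIPLIERS:
        try:
            return float(value[:-1]) * SUFFIX_MULTIPLIERS[suffix]
        except ValueError:
            return math.nan
    return math.nan

sizes = ['19M', '201k', 'Varies with device', '1,000+']
parsed = [parse_size(s) for s in sizes]

# Share of values that could not be converted in this toy list.
unconvertible_share = sum(math.isnan(p) for p in parsed) / len(parsed)
print(parsed, unconvertible_share)
```

Returning NaN instead of raising keeps the unconvertible rows visible, so their proportion can be measured.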

#### While this column may be useful for other types of analysis, we opt to drop it from our dataset. 

There are two reasons. First, the majority of the data are ordinal, but a sizeable proportion are missing because we cannot convert them to numerical values. Ordinal data are both numerical and categorical, and they can usually be ranked (e.g. 82k is smaller than 91M). In contrast, non-ordinal categorical data such as blood type and eye color cannot be ranked. Second, as a categorical column, it has too many unique values to produce meaningful insights. Therefore, in our case the simplest strategy is to drop the column.

Drop the column in the cell below (use `inplace=True`)

In [67]:
gp.drop('Size', inplace = True, axis = 1)

#### Now let's look at how many missing values are in each column. 

This will give us an idea of whether we should come up with a missing data strategy or give up on the column altogether. In the next cell, find the number of missing values in each column:

*Hint: use the `isna()` and `sum()` functions.*

In [68]:
gp.isna().sum()

App                   0
Category              0
Rating             1474
Reviews               0
Installs              0
Type                  1
Price                 0
Content Rating        1
Genres                0
Last Updated          0
Current Ver           8
Android Ver           3
reviews_numeric       1
reviews_isnull        0
dtype: int64

You should find the column with the most missing values is now `Rating`.

#### What is the proportion of the missing values in `Rating` to the total number of records?

Enter your answer in the cell below.

In [71]:
gp.isna().sum()/gp.shape[0]
## 14% more or less

App                0.000000
Category           0.000000
Rating             0.135965
Reviews            0.000000
Installs           0.000000
Type               0.000092
Price              0.000000
Content Rating     0.000092
Genres             0.000000
Last Updated       0.000000
Current Ver        0.000738
Android Ver        0.000277
reviews_numeric    0.000092
reviews_isnull     0.000000
dtype: float64

A sizeable proportion of the `Rating` column is missing. A few other columns also contain several missing values.
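
As a side note, the same proportions can be computed in one step, because the mean of a True/False column is the share of True values. A small sketch on an invented frame (the column names mirror the dataset, the values do not):

```python
import numpy as np
import pandas as pd

# Toy frame with two of four ratings missing; values are made up.
df = pd.DataFrame({
    'Rating': [4.1, np.nan, 3.9, np.nan],
    'Type': ['Free', 'Paid', 'Free', 'Free'],
})

# isna() yields booleans, so mean() is the fraction of missing values per column.
missing_share = df.isna().mean()
print(missing_share)
```

This is equivalent to `df.isna().sum() / df.shape[0]`, just shorter.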

#### We opt to preserve these columns and remove the rows containing missing data.

In particular, we don't want to drop the `Rating` column because:

* It is one of the most important columns in our dataset. 

* Since the dataset is not a time series, the loss of these rows will not have a negative impact on our ability to analyze the data. It will, however, cause us to lose some meaningful observations. But the loss is limited compared to the gain we receive by preserving these columns.

In the cell below, remove all rows containing at least one missing value. Use the `dropna()` function ([documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html)). Assign the new dataframe to the variable `google_missing_removed`.

In [72]:
# Your code here:
gmr =  gp.dropna()

From now on, we use the `google_missing_removed` variable instead of `google_play`.

#### Next, we look at the `Last Updated` column.

The `Last Updated` column seems to contain a date, though it is classified as an object type. Let's convert this column using the `pd.to_datetime` function ([documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_datetime.html)).

In [108]:
gmr.rename(columns = {'Last Updated': 'last_updated'}, inplace = True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return super().rename(**kwargs)


In [109]:
gmr.last_updated

0         January 7, 2018
1        January 15, 2018
2          August 1, 2018
3            June 8, 2018
4           June 20, 2018
               ...       
10834       June 18, 2017
10836       July 25, 2017
10837        July 6, 2018
10839    January 19, 2015
10840       July 25, 2018
Name: last_updated, Length: 9360, dtype: object

In [114]:
gmr.last_updated = pd.to_datetime(gmr.last_updated)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self[name] = value
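
The "value is trying to be set on a copy" messages above are a common stumbling block. One way that usually avoids them is to take an explicit copy after dropping rows and to assign columns with bracket notation. This is a sketch on made-up rows, not a rerun of the cells above; the names `raw` and `clean` are illustrative.

```python
import numpy as np
import pandas as pd

# Toy rows; the second one has a missing rating and will be dropped.
raw = pd.DataFrame({
    'App': ['a', 'b', 'c'],
    'Rating': [4.1, np.nan, 4.5],
    'Last Updated': ['January 7, 2018', 'June 8, 2018', 'August 1, 2018'],
})

# .copy() makes the result an independent DataFrame rather than a possible view.
clean = raw.dropna().copy()

# rename() returns a new frame, so inplace=True is not needed.
clean = clean.rename(columns={'Last Updated': 'last_updated'})

# Bracket assignment replaces the string column with parsed dates.
clean['last_updated'] = pd.to_datetime(clean['last_updated'])
print(clean.dtypes)
```

The original `raw` frame keeps its `Last Updated` strings, which makes it easy to compare before and after.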


#### The last column we will transform is `Price`. 

We start by looking at the unique values of this column.

In [116]:
gmr.Price.unique()

array(['0', '$4.99', '$3.99', '$6.99', '$7.99', '$5.99', '$2.99', '$3.49',
       '$1.99', '$9.99', '$7.49', '$0.99', '$9.00', '$5.49', '$10.00',
       '$24.99', '$11.99', '$79.99', '$16.99', '$14.99', '$29.99',
       '$12.99', '$2.49', '$10.99', '$1.50', '$19.99', '$15.99', '$33.99',
       '$39.99', '$3.95', '$4.49', '$1.70', '$8.99', '$1.49', '$3.88',
       '$399.99', '$17.99', '$400.00', '$3.02', '$1.76', '$4.84', '$4.77',
       '$1.61', '$2.50', '$1.59', '$6.49', '$1.29', '$299.99', '$379.99',
       '$37.99', '$18.99', '$389.99', '$8.49', '$1.75', '$14.00', '$2.00',
       '$3.08', '$2.59', '$19.40', '$3.90', '$4.59', '$15.46', '$3.04',
       '$13.99', '$4.29', '$3.28', '$4.60', '$1.00', '$2.95', '$2.90',
       '$1.97', '$2.56', '$1.20'], dtype=object)

Since all prices are ordinal data without exceptions, we can transform this column by removing the dollar sign and converting to numeric. We can create a new column called `Price Numerical` and drop the original column.

We will achieve our goal in three steps. Follow the instructions of each step below.

#### First we remove the dollar sign. Do this in the next cell by applying the `str.replace` function to the column to replace `$` with an empty string (`''`).

In [138]:
# Your code here:
gmr.Price = pd.to_numeric(gmr.Price.str.replace('$', '', regex=False), errors = 'coerce')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self[name] = value


#### Second step, coerce the `Price Numerical` column to numeric.

In [139]:
# done above

**Finally, drop the original `Price` column.**

In [140]:
# left it in the same column, don't need to create and drop columns
# i can rename it if you want, but... why?

Now check the variable types of `google_missing_removed`. Make sure:

* `Size` and `Price` columns have been removed.
* `Rating`, `Reviews`, and `Price Numerical` have the type of `float64`.
* `Last Updated` has the type of `datetime64`.

In [141]:
gmr.dtypes

App                        object
Category                   object
Rating                    float64
Reviews                   float64
Installs                   object
Type                       object
Price                     float64
Content Rating             object
Genres                     object
last_updated       datetime64[ns]
Current Ver                object
Android Ver                object
reviews_numeric           float64
reviews_isnull               bool
last_update_dt     datetime64[ns]
dtype: object

# Challenge 2 - Loading and Extracting Features from the Second Dataset

Load the second dataset to the variable `google_reviews`. The data is in the file `googleplaystore_user_reviews.csv`.

In [142]:
# Your code here:
gr = pd.read_csv('../../data/googleplaystore_user_reviews.csv')

#### This dataset contains the top 100 reviews for each app. 

Let's examine this dataset using the `head` function

In [143]:
gr.sample(10)

Unnamed: 0,App,Translated_Review,Sentiment,Sentiment_Polarity,Sentiment_Subjectivity
41730,Extreme Coupon Finder,,,,
38302,ESPN,,,,
2740,ADW Launcher 2,,,,
28417,Colorfy: Coloring Book for Adults - Free,"It okay. It could better, like pictures colour...",Positive,0.066667,0.633333
18726,Burner - Free Phone Number,,,,
22062,Calorie Counter - MyFitnessPal,Have able post pictures news feed.,Positive,0.5,0.625
42833,"Face Filter, Selfie Editor - Sweet Camera",How undo private pictures.,Neutral,0.0,0.375
31310,Cute Pet Puppies,,,,
59777,Hangouts,,,,
50623,Fuelio: Gas log & costs,Very good!,Positive,1.0,0.78


#### The main piece of information we would like to extract from this dataset is the proportion of positive reviews of each app. 

Columns like `Sentiment_Polarity` and `Sentiment_Subjectivity` are not of interest to us because we do not yet know how to use them. We do not care about `Translated_Review` because natural language processing is too complex for us at present (in fact, the `Sentiment`, `Sentiment_Polarity`, and `Sentiment_Subjectivity` columns were derived from `Translated_Review` by the data scientists).

What we care about in this challenge is `Sentiment`. To be more precise, we care about **what is the proportion of *Positive* sentiment of each app**. This will require us to aggregate the `Sentiment` data by `App` in order to calculate the proportions.

Now that you are clear about what we are trying to achieve, follow the steps below, which will walk you toward our goal.

#### Our first step will be to remove all rows with missing sentiment. 

In the next cell, drop all rows with missing data using the `dropna()` function and assign this new dataframe to `review_missing_removed`.

In [147]:
# Your code here:
gr.dropna(inplace = True)
## we should stop creating dataframes left and right :/ just sayin'

#### Now, use the `value_counts()` function ([documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.value_counts.html)) to get a sense of how many apps are in this dataset and their review counts.

In [154]:
# Your code here:
gr.App.value_counts()

Bowmasters                                 312
Helix Jump                                 273
Angry Birds Classic                        273
Calorie Counter - MyFitnessPal             254
Duolingo: Learn Languages Free             240
                                          ... 
GPS Map Free                                 1
CBS News                                     1
Google Slides                                1
Bed Time Fan - White Noise Sleep Sounds      1
Google Trips - Travel Planner                1
Name: App, Length: 865, dtype: int64

#### Now the tough part comes. Let's plan how we will achieve our goal:

1. We will count the number of reviews that contain *Positive* in the `Sentiment` column.

1. We will create a new dataframe to contain the `App` name, the number of positive reviews, and the total number of reviews of each app.

1. We will then loop over the new dataframe to calculate the positive review proportion of each app.

#### Step 1: Count the number of positive reviews.

In the following cell, write a function that takes a column and returns the number of times *Positive* appears in the column. 

*Hint: One option is to use the `np.where()` function ([documentation](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.where.html)).*

In [157]:
# Your code below

def positive_function(x):
    """
    Count how many times the string `Positive` appears in a column (exact string match).
    
    Args:
        x: data column
    
    Returns:
        The number of occurrences of `Positive` in the column data.
    """
    return len([i for i in x if i == 'Positive'])


positive_function(['sadas', 'das', 'Positive'])

1
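
The hint mentions `np.where()`. A possible variant along those lines is sketched below; the name `count_positive` is made up for this example and it is meant as an alternative to the list comprehension above, not a replacement.

```python
import numpy as np

def count_positive(column):
    """Count exact matches of 'Positive' using np.where (illustrative variant)."""
    values = np.asarray(column, dtype=object)
    # np.where maps matches to 1 and everything else to 0; the sum is the count.
    flags = np.where(values == 'Positive', 1, 0)
    return int(flags.sum())

print(count_positive(['sadas', 'das', 'Positive']))
```

Converting to an object array first lets the function accept plain lists as well as pandas columns.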

#### Step 2: Create a new dataframe to contain the `App` name, the number of positive reviews, and the total number of reviews of each app

We will group `review_missing_removed` by the `App` column, then aggregate the grouped dataframe on the number of positive reviews and the total review counts of each app. The result will be assigned to a new variable `google_agg`. The [documentation for `agg()`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.DataFrameGroupBy.agg.html) explains how to achieve this. Take a moment to read the documentation and search for examples, because it is fairly complex.

When you obtain `google_agg`, check its values to make sure it has an `App` column as its index as well as a `Positive` column and a `Total` column. Your output should look like:

![Positive Reviews Agg](../images/positive-review-agg.png)

*Hint: Use `positive_function` you created earlier as part of the param passed to the `agg()` function in order to aggregate the number of positive reviews.*

#### Bonus:

As of Pandas v0.23.4, you may supply either a list or a dict to `agg()`. If you pass a list, you'll need to rename the columns so that their names are `Positive` and `Total`. Passing a dict lets you create the aggregated columns with the desired names without renaming them. However, you will probably encounter a warning indicating that supplying a dict to `agg()` this way is deprecated. It's up to you which approach to use. Try both. Either is fine as long as it works.

In [175]:
gr.Sentiment

0        Positive
1        Positive
3        Positive
4        Positive
5        Positive
           ...   
64222    Positive
64223    Positive
64226    Negative
64227    Positive
64230    Negative
Name: Sentiment, Length: 37427, dtype: object

In [184]:
google_agg = gr.groupby('App').agg({'App': {'Total' : 'count'}, 'Sentiment': {'Positive': positive_function}}).droplevel(0, axis = 1)
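
The nested-dict form used above relies on a renaming feature that was deprecated and is rejected by pandas 1.0 and later. On pandas 0.25 or newer, named aggregation gives the same two columns. The sketch below uses an invented `reviews` frame and a stand-in `count_positive` function; it counts `Sentiment` for the total, which assumes rows with missing values have already been dropped.

```python
import pandas as pd

# Toy reviews; app names and sentiments are made up.
reviews = pd.DataFrame({
    'App': ['A', 'A', 'A', 'B', 'B'],
    'Sentiment': ['Positive', 'Negative', 'Positive', 'Neutral', 'Positive'],
})

def count_positive(column):
    """Stand-in for positive_function: number of exact 'Positive' entries."""
    return int((column == 'Positive').sum())

# Each keyword becomes an output column: name=(source column, aggregation).
agg = reviews.groupby('App').agg(
    Total=('Sentiment', 'count'),
    Positive=('Sentiment', count_positive),
)
print(agg)
```

Because the output names are given directly, no `droplevel` or renaming step is needed afterwards.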

Print the first 5 rows of `google_agg` to check it.

In [185]:
# Your code here
google_agg.sample(10)

App,Total,Positive
"Hide App, Private Dating, Safe Chat - PrivacyHider",53,21
HD Movie Video Player,31,21
Candy Crush Jelly Saga,73,21
Die TK-App – alles im Griff,27,16
Divar,37,19
2GIS: directory & navigator,40,23
Google Earth,38,23
Age Calculator,32,19
Banco do Brasil,40,22
Doodle Jump,90,56


#### Add a derived column to `google_agg` that is the ratio of the `Positive` and the `Total` columns. Call this column `Positive Ratio`. 

Make sure to account for the case where the denominator is zero using the `np.where()` function.

In [194]:
## Make sure to account for the case where the denominator is zero using the `np.where()` function. ??? there is never a case where we don't have a value for total because this is an aggregate, what?
google_agg['positive_ratio'] = google_agg.Positive/google_agg.Total
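
For completeness, here is one way the zero-denominator guard could be written with `np.where()`. It is a sketch on invented counts, and using `0.0` as the fallback ratio is an assumption; another placeholder such as NaN may suit some analyses better.

```python
import numpy as np
import pandas as pd

# Toy aggregate; app 'C' has no reviews at all, so its Total is zero.
agg = pd.DataFrame({'Positive': [2, 0, 0], 'Total': [4, 3, 0]},
                   index=['A', 'B', 'C'])

# pandas division by zero yields NaN or inf instead of raising an exception.
ratio = agg['Positive'] / agg['Total']

# Keep the ratio where Total is positive, otherwise fall back to 0.0.
agg['positive_ratio'] = np.where(agg['Total'] > 0, ratio, 0.0)
print(agg)
```

In an aggregate built with `groupby`, every app has at least one row, so the guard only matters if the counts come from somewhere else.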

#### Now drop the `Positive` and `Total` columns. Do this with `inplace=True`.

In [197]:
# Your code here:
google_agg.drop(['Total', 'Positive'], axis = 1, inplace = True)

Print the first 5 rows of `google_agg`. Your output should look like:

![Positive Reviews Agg](../images/positive-review-ratio.png)

In [199]:
# Your code here:
google_agg.sample(10)

App,positive_ratio
Diary with lock,0.878788
Home Street – Home Design Game,0.736842
"Evernote – Organizer, Planner for Notes & Memos",0.679487
BaBe Lite - Baca Berita Hemat Kuota,0.615385
Easy Recipes,0.575
Credit Sesame,0.552632
Fun Kid Racing,0.727273
Free Sports TV,0.592593
AppLock,0.461538
"HelloTalk — Chat, Speak & Learn Foreign Languages",0.666667


# Challenge 3 - Join the Dataframes

In this part of the lab, we will join the two dataframes and obtain a dataframe that contains features we can use in our ML algorithm.

In the next cell, join the `google_missing_removed` dataframe with the `google_agg` dataframe on the `App` column. Assign this dataframe to the variable `google`.

In [205]:
google_agg


App,positive_ratio
10 Best Foods for You,0.835052
104 找工作 - 找工作 找打工 找兼職 履歷健檢 履歷診療室,0.775000
11st,0.589744
1800 Contacts - Lens Store,0.800000
1LINE – One Line with One Touch,0.710526
...,...
Hotels.com: Book Hotel Rooms & Find Vacation Deals,0.573529
Hotspot Shield Free VPN Proxy & Wi-Fi Security,0.500000
Hotstar,0.437500
Hotwire Hotel & Car Rental App,0.484848


In [212]:
# Your code here:
gmr = gmr.merge(google_agg, how = 'inner',right_index=True, left_on='App')
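
To see what an inner join on a column against an index does, a small sketch on invented frames may help (`apps` and `ratios` are illustrative names, not the lab's variables):

```python
import pandas as pd

# Left frame: App is an ordinary column.
apps = pd.DataFrame({'App': ['A', 'B', 'C'], 'Rating': [4.1, 3.9, 4.6]})

# Right frame: the app name is the index, as in the aggregated reviews.
ratios = pd.DataFrame({'positive_ratio': [0.67, 0.50]}, index=['A', 'B'])

# Match the left App column against the right index; 'inner' keeps only matches.
joined = apps.merge(ratios, how='inner', left_on='App', right_index=True)
print(joined)
```

App `C` has no entry in `ratios`, so the inner join drops it. Switching to `how='left'` would keep it with a missing `positive_ratio`.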

#### Let's look at the final result using the `head()` function. Your final product should look like:

![Final Product](../images/google-final-head.png)

In [214]:
# Your code here:
gmr.sample(10)

Unnamed: 0,App,Category,Rating,Reviews,Installs,Type,Price,Content Rating,Genres,last_updated,Current Ver,Android Ver,reviews_numeric,reviews_isnull,last_update_dt,positive_ratio
3185,Fly Delta,TRAVEL_AND_LOCAL,3.7,27560.0,"5,000,000+",Free,0.0,Everyone,Travel & Local,2018-07-31,4.13.2,5.0 and up,27560.0,False,2018-07-31,0.661972
3724,CNN Breaking US & World News,NEWS_AND_MAGAZINES,4.0,293080.0,"10,000,000+",Free,0.0,Everyone 10+,News & Magazines,2018-08-06,5.17,4.4 and up,293080.0,False,2018-08-06,0.580357
982,Comedy Central,ENTERTAINMENT,3.9,22378.0,"1,000,000+",Free,0.0,Teen,Entertainment,2018-07-08,11.45.0,4.4 and up,22378.0,False,2018-07-08,0.525
812,HomeWork,EDUCATION,4.3,16195.0,"1,000,000+",Free,0.0,Everyone,Education,2016-09-20,8.5.2,4.0 and up,16195.0,False,2016-09-20,1.0
1095,Acorns - Invest Spare Change,FINANCE,4.3,45957.0,"1,000,000+",Free,0.0,Everyone,Finance,2018-07-31,Varies with device,Varies with device,45957.0,False,2018-07-31,0.4
3849,GPS Traffic Speedcam Route Planner by ViaMichelin,MAPS_AND_NAVIGATION,4.3,63920.0,"5,000,000+",Free,0.0,Everyone,Maps & Navigation,2018-07-28,7.16.8,4.0.3 and up,63920.0,False,2018-07-28,0.694444
3664,ForecaWeather,WEATHER,4.2,18425.0,"1,000,000+",Free,0.0,Everyone,Weather,2017-07-29,3.1.7,4.1 and up,18425.0,False,2017-07-29,0.763158
3510,"Evernote – Organizer, Planner for Notes & Memos",PRODUCTIVITY,4.6,1488396.0,"100,000,000+",Free,0.0,Everyone,Productivity,2018-08-03,Varies with device,Varies with device,1488396.0,False,2018-08-03,0.679487
2756,"AliExpress - Smarter Shopping, Better Living",SHOPPING,4.6,5916569.0,"100,000,000+",Free,0.0,Teen,Shopping,2018-08-06,Varies with device,Varies with device,5916569.0,False,2018-08-06,0.644444
3363,Birds Sounds Ringtones & Wallpapers,PERSONALIZATION,4.6,5073.0,"1,000,000+",Free,0.0,Everyone 10+,Personalization,2017-09-25,1.1,4.0 and up,5073.0,False,2017-09-25,0.72973
