From b6ebfbf7516cb3b0d88cfeb588f6d8de7d6b5dc5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E2=80=9Cdanielmdepaoli=E2=80=9D?=
 <“danielmdepaoli@gmail.com”>
Date: Sat, 26 Aug 2023 13:09:37 +0100
Subject: [PATCH] Lab Done

---
 ...pervisedLearningFeatureExtractionLab.ipynb | 1837 +++++++++++++++++
 .../notebooks}/googleplaystore.csv            |    0
 .../googleplaystore_user_reviews.csv          |    0
 your-code/notebooks/main.ipynb                |  773 -------
 4 files changed, 1837 insertions(+), 773 deletions(-)
 create mode 100755 your-code/notebooks/SupervisedLearningFeatureExtractionLab.ipynb
 rename {data => your-code/notebooks}/googleplaystore.csv (100%)
 rename {data => your-code/notebooks}/googleplaystore_user_reviews.csv (100%)
 delete mode 100755 your-code/notebooks/main.ipynb
diff --git a/your-code/notebooks/SupervisedLearningFeatureExtractionLab.ipynb b/your-code/notebooks/SupervisedLearningFeatureExtractionLab.ipynb
new file mode 100755
index 0000000..104263a
--- /dev/null
+++ b/your-code/notebooks/SupervisedLearningFeatureExtractionLab.ipynb
@@ -0,0 +1,1837 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Before you start:\n",
+    "- Read the README.md file\n",
+    "- Comment as much as you can and use the resources in the README.md file\n",
+    "- Happy learning!"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "import numpy as np\n",
+    "import matplotlib.pyplot as plt\n",
+    "import seaborn as sn\n",
+    "from sklearn.linear_model import LinearRegression\n",
+    "from sklearn.neighbors import KNeighborsRegressor\n",
+    "from sklearn.preprocessing import MinMaxScaler\n",
+    "import datetime"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Introduction\n",
+    "\n",
+    "In this lab, we will use two datasets. Both contain variables that describe apps from the Google Play Store. We will apply our knowledge of feature extraction to process these datasets and prepare them for use with an ML algorithm."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Challenge 1 - Loading and Extracting Features from the First Dataset"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### In this challenge, our goals are: \n",
+    "\n",
+    "* Explore the dataset.\n",
+    "* Identify the columns with missing values.\n",
+    "* Either replace the missing values in each column or drop the column.\n",
+    "* Convert each column to the appropriate type."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### The first dataset contains different information describing the apps. \n",
+    "\n",
+    "Load the dataset into the variable `google_play` in the cell below. The dataset is in the file `googleplaystore.csv`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>App</th>\n",
+       "      <th>Category</th>\n",
+       "      <th>Rating</th>\n",
+       "      <th>Reviews</th>\n",
+       "      <th>Size</th>\n",
+       "      <th>Installs</th>\n",
+       "      <th>Type</th>\n",
+       "      <th>Price</th>\n",
+       "      <th>Content Rating</th>\n",
+       "      <th>Genres</th>\n",
+       "      <th>Last Updated</th>\n",
+       "      <th>Current Ver</th>\n",
+       "      <th>Android Ver</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>Photo Editor &amp; Candy Camera &amp; Grid &amp; ScrapBook</td>\n",
+       "      <td>ART_AND_DESIGN</td>\n",
+       "      <td>4.1</td>\n",
+       "      <td>159</td>\n",
+       "      <td>19M</td>\n",
+       "      <td>10,000+</td>\n",
+       "      <td>Free</td>\n",
+       "      <td>0</td>\n",
+       "      <td>Everyone</td>\n",
+       "      <td>Art &amp; Design</td>\n",
+       "      <td>January 7, 2018</td>\n",
+       "      <td>1.0.0</td>\n",
+       "      <td>4.0.3 and up</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>Coloring book moana</td>\n",
+       "      <td>ART_AND_DESIGN</td>\n",
+       "      <td>3.9</td>\n",
+       "      <td>967</td>\n",
+       "      <td>14M</td>\n",
+       "      <td>500,000+</td>\n",
+       "      <td>Free</td>\n",
+       "      <td>0</td>\n",
+       "      <td>Everyone</td>\n",
+       "      <td>Art &amp; Design;Pretend Play</td>\n",
+       "      <td>January 15, 2018</td>\n",
+       "      <td>2.0.0</td>\n",
+       "      <td>4.0.3 and up</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>U Launcher Lite – FREE Live Cool Themes, Hide ...</td>\n",
+       "      <td>ART_AND_DESIGN</td>\n",
+       "      <td>4.7</td>\n",
+       "      <td>87510</td>\n",
+       "      <td>8.7M</td>\n",
+       "      <td>5,000,000+</td>\n",
+       "      <td>Free</td>\n",
+       "      <td>0</td>\n",
+       "      <td>Everyone</td>\n",
+       "      <td>Art &amp; Design</td>\n",
+       "      <td>August 1, 2018</td>\n",
+       "      <td>1.2.4</td>\n",
+       "      <td>4.0.3 and up</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>Sketch - Draw &amp; Paint</td>\n",
+       "      <td>ART_AND_DESIGN</td>\n",
+       "      <td>4.5</td>\n",
+       "      <td>215644</td>\n",
+       "      <td>25M</td>\n",
+       "      <td>50,000,000+</td>\n",
+       "      <td>Free</td>\n",
+       "      <td>0</td>\n",
+       "      <td>Teen</td>\n",
+       "      <td>Art &amp; Design</td>\n",
+       "      <td>June 8, 2018</td>\n",
+       "      <td>Varies with device</td>\n",
+       "      <td>4.2 and up</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>Pixel Draw - Number Art Coloring Book</td>\n",
+       "      <td>ART_AND_DESIGN</td>\n",
+       "      <td>4.3</td>\n",
+       "      <td>967</td>\n",
+       "      <td>2.8M</td>\n",
+       "      <td>100,000+</td>\n",
+       "      <td>Free</td>\n",
+       "      <td>0</td>\n",
+       "      <td>Everyone</td>\n",
+       "      <td>Art &amp; Design;Creativity</td>\n",
+       "      <td>June 20, 2018</td>\n",
+       "      <td>1.1</td>\n",
+       "      <td>4.4 and up</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>...</th>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>10836</th>\n",
+       "      <td>Sya9a Maroc - FR</td>\n",
+       "      <td>FAMILY</td>\n",
+       "      <td>4.5</td>\n",
+       "      <td>38</td>\n",
+       "      <td>53M</td>\n",
+       "      <td>5,000+</td>\n",
+       "      <td>Free</td>\n",
+       "      <td>0</td>\n",
+       "      <td>Everyone</td>\n",
+       "      <td>Education</td>\n",
+       "      <td>July 25, 2017</td>\n",
+       "      <td>1.48</td>\n",
+       "      <td>4.1 and up</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>10837</th>\n",
+       "      <td>Fr. Mike Schmitz Audio Teachings</td>\n",
+       "      <td>FAMILY</td>\n",
+       "      <td>5.0</td>\n",
+       "      <td>4</td>\n",
+       "      <td>3.6M</td>\n",
+       "      <td>100+</td>\n",
+       "      <td>Free</td>\n",
+       "      <td>0</td>\n",
+       "      <td>Everyone</td>\n",
+       "      <td>Education</td>\n",
+       "      <td>July 6, 2018</td>\n",
+       "      <td>1.0</td>\n",
+       "      <td>4.1 and up</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>10838</th>\n",
+       "      <td>Parkinson Exercices FR</td>\n",
+       "      <td>MEDICAL</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>3</td>\n",
+       "      <td>9.5M</td>\n",
+       "      <td>1,000+</td>\n",
+       "      <td>Free</td>\n",
+       "      <td>0</td>\n",
+       "      <td>Everyone</td>\n",
+       "      <td>Medical</td>\n",
+       "      <td>January 20, 2017</td>\n",
+       "      <td>1.0</td>\n",
+       "      <td>2.2 and up</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>10839</th>\n",
+       "      <td>The SCP Foundation DB fr nn5n</td>\n",
+       "      <td>BOOKS_AND_REFERENCE</td>\n",
+       "      <td>4.5</td>\n",
+       "      <td>114</td>\n",
+       "      <td>Varies with device</td>\n",
+       "      <td>1,000+</td>\n",
+       "      <td>Free</td>\n",
+       "      <td>0</td>\n",
+       "      <td>Mature 17+</td>\n",
+       "      <td>Books &amp; Reference</td>\n",
+       "      <td>January 19, 2015</td>\n",
+       "      <td>Varies with device</td>\n",
+       "      <td>Varies with device</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>10840</th>\n",
+       "      <td>iHoroscope - 2018 Daily Horoscope &amp; Astrology</td>\n",
+       "      <td>LIFESTYLE</td>\n",
+       "      <td>4.5</td>\n",
+       "      <td>398307</td>\n",
+       "      <td>19M</td>\n",
+       "      <td>10,000,000+</td>\n",
+       "      <td>Free</td>\n",
+       "      <td>0</td>\n",
+       "      <td>Everyone</td>\n",
+       "      <td>Lifestyle</td>\n",
+       "      <td>July 25, 2018</td>\n",
+       "      <td>Varies with device</td>\n",
+       "      <td>Varies with device</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "<p>10841 rows × 13 columns</p>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "                                                     App             Category  \\\n",
+       "0         Photo Editor & Candy Camera & Grid & ScrapBook       ART_AND_DESIGN   \n",
+       "1                                    Coloring book moana       ART_AND_DESIGN   \n",
+       "2      U Launcher Lite – FREE Live Cool Themes, Hide ...       ART_AND_DESIGN   \n",
+       "3                                  Sketch - Draw & Paint       ART_AND_DESIGN   \n",
+       "4                  Pixel Draw - Number Art Coloring Book       ART_AND_DESIGN   \n",
+       "...                                                  ...                  ...   \n",
+       "10836                                   Sya9a Maroc - FR               FAMILY   \n",
+       "10837                   Fr. Mike Schmitz Audio Teachings               FAMILY   \n",
+       "10838                             Parkinson Exercices FR              MEDICAL   \n",
+       "10839                      The SCP Foundation DB fr nn5n  BOOKS_AND_REFERENCE   \n",
+       "10840      iHoroscope - 2018 Daily Horoscope & Astrology            LIFESTYLE   \n",
+       "\n",
+       "       Rating Reviews                Size     Installs  Type Price  \\\n",
+       "0         4.1     159                 19M      10,000+  Free     0   \n",
+       "1         3.9     967                 14M     500,000+  Free     0   \n",
+       "2         4.7   87510                8.7M   5,000,000+  Free     0   \n",
+       "3         4.5  215644                 25M  50,000,000+  Free     0   \n",
+       "4         4.3     967                2.8M     100,000+  Free     0   \n",
+       "...       ...     ...                 ...          ...   ...   ...   \n",
+       "10836     4.5      38                 53M       5,000+  Free     0   \n",
+       "10837     5.0       4                3.6M         100+  Free     0   \n",
+       "10838     NaN       3                9.5M       1,000+  Free     0   \n",
+       "10839     4.5     114  Varies with device       1,000+  Free     0   \n",
+       "10840     4.5  398307                 19M  10,000,000+  Free     0   \n",
+       "\n",
+       "      Content Rating                     Genres      Last Updated  \\\n",
+       "0           Everyone               Art & Design   January 7, 2018   \n",
+       "1           Everyone  Art & Design;Pretend Play  January 15, 2018   \n",
+       "2           Everyone               Art & Design    August 1, 2018   \n",
+       "3               Teen               Art & Design      June 8, 2018   \n",
+       "4           Everyone    Art & Design;Creativity     June 20, 2018   \n",
+       "...              ...                        ...               ...   \n",
+       "10836       Everyone                  Education     July 25, 2017   \n",
+       "10837       Everyone                  Education      July 6, 2018   \n",
+       "10838       Everyone                    Medical  January 20, 2017   \n",
+       "10839     Mature 17+          Books & Reference  January 19, 2015   \n",
+       "10840       Everyone                  Lifestyle     July 25, 2018   \n",
+       "\n",
+       "              Current Ver         Android Ver  \n",
+       "0                   1.0.0        4.0.3 and up  \n",
+       "1                   2.0.0        4.0.3 and up  \n",
+       "2                   1.2.4        4.0.3 and up  \n",
+       "3      Varies with device          4.2 and up  \n",
+       "4                     1.1          4.4 and up  \n",
+       "...                   ...                 ...  \n",
+       "10836                1.48          4.1 and up  \n",
+       "10837                 1.0          4.1 and up  \n",
+       "10838                 1.0          2.2 and up  \n",
+       "10839  Varies with device  Varies with device  \n",
+       "10840  Varies with device  Varies with device  \n",
+       "\n",
+       "[10841 rows x 13 columns]"
+      ]
+     },
+     "execution_count": 5,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "google_play = pd.read_csv('googleplaystore.csv')\n",
+    "google_play"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Examine all variables and their types in the following cell"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 20,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "App                object\n",
+       "Category           object\n",
+       "Rating            float64\n",
+       "Reviews            object\n",
+       "Size               object\n",
+       "Installs           object\n",
+       "Type               object\n",
+       "Price              object\n",
+       "Content Rating     object\n",
+       "Genres             object\n",
+       "Last Updated       object\n",
+       "Current Ver        object\n",
+       "Android Ver        object\n",
+       "Reviews_isnull       bool\n",
+       "dtype: object"
+      ]
+     },
+     "execution_count": 20,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "google_play.dtypes"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Since this dataset only contains one numeric column, let's skip the `describe()` function and look at the first 5 rows using the `head()` function"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>App</th>\n",
+       "      <th>Category</th>\n",
+       "      <th>Rating</th>\n",
+       "      <th>Reviews</th>\n",
+       "      <th>Size</th>\n",
+       "      <th>Installs</th>\n",
+       "      <th>Type</th>\n",
+       "      <th>Price</th>\n",
+       "      <th>Content Rating</th>\n",
+       "      <th>Genres</th>\n",
+       "      <th>Last Updated</th>\n",
+       "      <th>Current Ver</th>\n",
+       "      <th>Android Ver</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>Photo Editor &amp; Candy Camera &amp; Grid &amp; ScrapBook</td>\n",
+       "      <td>ART_AND_DESIGN</td>\n",
+       "      <td>4.1</td>\n",
+       "      <td>159</td>\n",
+       "      <td>19M</td>\n",
+       "      <td>10,000+</td>\n",
+       "      <td>Free</td>\n",
+       "      <td>0</td>\n",
+       "      <td>Everyone</td>\n",
+       "      <td>Art &amp; Design</td>\n",
+       "      <td>January 7, 2018</td>\n",
+       "      <td>1.0.0</td>\n",
+       "      <td>4.0.3 and up</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>Coloring book moana</td>\n",
+       "      <td>ART_AND_DESIGN</td>\n",
+       "      <td>3.9</td>\n",
+       "      <td>967</td>\n",
+       "      <td>14M</td>\n",
+       "      <td>500,000+</td>\n",
+       "      <td>Free</td>\n",
+       "      <td>0</td>\n",
+       "      <td>Everyone</td>\n",
+       "      <td>Art &amp; Design;Pretend Play</td>\n",
+       "      <td>January 15, 2018</td>\n",
+       "      <td>2.0.0</td>\n",
+       "      <td>4.0.3 and up</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>U Launcher Lite – FREE Live Cool Themes, Hide ...</td>\n",
+       "      <td>ART_AND_DESIGN</td>\n",
+       "      <td>4.7</td>\n",
+       "      <td>87510</td>\n",
+       "      <td>8.7M</td>\n",
+       "      <td>5,000,000+</td>\n",
+       "      <td>Free</td>\n",
+       "      <td>0</td>\n",
+       "      <td>Everyone</td>\n",
+       "      <td>Art &amp; Design</td>\n",
+       "      <td>August 1, 2018</td>\n",
+       "      <td>1.2.4</td>\n",
+       "      <td>4.0.3 and up</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>Sketch - Draw &amp; Paint</td>\n",
+       "      <td>ART_AND_DESIGN</td>\n",
+       "      <td>4.5</td>\n",
+       "      <td>215644</td>\n",
+       "      <td>25M</td>\n",
+       "      <td>50,000,000+</td>\n",
+       "      <td>Free</td>\n",
+       "      <td>0</td>\n",
+       "      <td>Teen</td>\n",
+       "      <td>Art &amp; Design</td>\n",
+       "      <td>June 8, 2018</td>\n",
+       "      <td>Varies with device</td>\n",
+       "      <td>4.2 and up</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>Pixel Draw - Number Art Coloring Book</td>\n",
+       "      <td>ART_AND_DESIGN</td>\n",
+       "      <td>4.3</td>\n",
+       "      <td>967</td>\n",
+       "      <td>2.8M</td>\n",
+       "      <td>100,000+</td>\n",
+       "      <td>Free</td>\n",
+       "      <td>0</td>\n",
+       "      <td>Everyone</td>\n",
+       "      <td>Art &amp; Design;Creativity</td>\n",
+       "      <td>June 20, 2018</td>\n",
+       "      <td>1.1</td>\n",
+       "      <td>4.4 and up</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "                                                 App        Category  Rating  \\\n",
+       "0     Photo Editor & Candy Camera & Grid & ScrapBook  ART_AND_DESIGN     4.1   \n",
+       "1                                Coloring book moana  ART_AND_DESIGN     3.9   \n",
+       "2  U Launcher Lite – FREE Live Cool Themes, Hide ...  ART_AND_DESIGN     4.7   \n",
+       "3                              Sketch - Draw & Paint  ART_AND_DESIGN     4.5   \n",
+       "4              Pixel Draw - Number Art Coloring Book  ART_AND_DESIGN     4.3   \n",
+       "\n",
+       "  Reviews  Size     Installs  Type Price Content Rating  \\\n",
+       "0     159   19M      10,000+  Free     0       Everyone   \n",
+       "1     967   14M     500,000+  Free     0       Everyone   \n",
+       "2   87510  8.7M   5,000,000+  Free     0       Everyone   \n",
+       "3  215644   25M  50,000,000+  Free     0           Teen   \n",
+       "4     967  2.8M     100,000+  Free     0       Everyone   \n",
+       "\n",
+       "                      Genres      Last Updated         Current Ver  \\\n",
+       "0               Art & Design   January 7, 2018               1.0.0   \n",
+       "1  Art & Design;Pretend Play  January 15, 2018               2.0.0   \n",
+       "2               Art & Design    August 1, 2018               1.2.4   \n",
+       "3               Art & Design      June 8, 2018  Varies with device   \n",
+       "4    Art & Design;Creativity     June 20, 2018                 1.1   \n",
+       "\n",
+       "    Android Ver  \n",
+       "0  4.0.3 and up  \n",
+       "1  4.0.3 and up  \n",
+       "2  4.0.3 and up  \n",
+       "3    4.2 and up  \n",
+       "4    4.4 and up  "
+      ]
+     },
+     "execution_count": 8,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "google_play.head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### We can see that there are a few columns that could be coerced to numeric.\n",
+    "\n",
+    "Start with the `Reviews` column. We can find out which value is forcing this column to the object type by locating its non-numeric values. To do this, recall the `to_numeric()` function, which lets us coerce all non-numeric data to null. We can then use the `isnull()` function to subset our dataframe with the True/False column that this function generates.\n",
+    "\n",
+    "In the cell below, transform the Reviews column to numeric and assign this new column to the variable `Reviews_numeric`. Make sure to coerce the errors."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 24,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "google_play['Reviews'] = pd.to_numeric(google_play['Reviews'], errors='coerce')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Next, create a column containing True/False values using the `isnull()` function. Assign this column to the `Reviews_isnull` variable."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 25,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "google_play['Reviews_isnull'] = google_play['Reviews'].isnull()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Finally, subset `google_play` with `Reviews_isnull`. This should give you all the rows whose `Reviews` value contains non-numeric characters.\n",
+    "\n",
+    "Your output should look like:\n",
+    "\n",
+    "![Reviews_bool.png](../images/reviews-bool.png)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 26,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "non_numeric_rows = google_play[google_play['Reviews_isnull']]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 28,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>App</th>\n",
+       "      <th>Category</th>\n",
+       "      <th>Rating</th>\n",
+       "      <th>Reviews</th>\n",
+       "      <th>Size</th>\n",
+       "      <th>Installs</th>\n",
+       "      <th>Type</th>\n",
+       "      <th>Price</th>\n",
+       "      <th>Content Rating</th>\n",
+       "      <th>Genres</th>\n",
+       "      <th>Last Updated</th>\n",
+       "      <th>Current Ver</th>\n",
+       "      <th>Android Ver</th>\n",
+       "      <th>Reviews_isnull</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>10472</th>\n",
+       "      <td>Life Made WI-Fi Touchscreen Photo Frame</td>\n",
+       "      <td>1.9</td>\n",
+       "      <td>19.0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>1,000+</td>\n",
+       "      <td>Free</td>\n",
+       "      <td>0</td>\n",
+       "      <td>Everyone</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>February 11, 2018</td>\n",
+       "      <td>1.0.19</td>\n",
+       "      <td>4.0 and up</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "                                           App Category  Rating  Reviews  \\\n",
+       "10472  Life Made WI-Fi Touchscreen Photo Frame      1.9    19.0      NaN   \n",
+       "\n",
+       "         Size Installs Type     Price Content Rating             Genres  \\\n",
+       "10472  1,000+     Free    0  Everyone            NaN  February 11, 2018   \n",
+       "\n",
+       "      Last Updated Current Ver Android Ver  Reviews_isnull  \n",
+       "10472       1.0.19  4.0 and up         NaN            True  "
+      ]
+     },
+     "execution_count": 28,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "non_numeric_rows"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### We see that Google Play is using a shorthand for millions. \n",
+    "\n",
+    "Let's write a function to transform this data.\n",
+    "\n",
+    "Steps:\n",
+    "\n",
+    "1. Create a function that returns the correct numeric values of *Reviews*.\n",
+    "1. Define a test string with `M` as the last character.\n",
+    "1. Test your function with the test string. Make sure your function works correctly. If not, modify your function and test again."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 38,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "4000000.0"
+      ]
+     },
+     "execution_count": 38,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# Convert a shorthand string such as '4.0M' to its numeric value; return any other input unchanged\n",
+    "\n",
+    "def convert_string_to_numeric(s):\n",
+    "    if isinstance(s, str) and s[-1] == 'M':\n",
+    "        return float(s[:-1]) * 1_000_000\n",
+    "    else:\n",
+    "        return s\n",
+    "\n",
+    "test_string = '4.0M'\n",
+    "\n",
+    "convert_string_to_numeric(test_string)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 40,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "3000000.0"
+      ]
+     },
+     "execution_count": 40,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "convert_string_to_numeric('3.0M')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The last step is to apply the function to the `Reviews` column in the following cell:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 35,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "google_play['Reviews'] = google_play['Reviews'].apply(convert_string_to_numeric)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Check the non-numeric `Reviews` row again. It should now be fixed, and you should see:\n",
+    "\n",
+    "![Reviews_bool_fixed.png](../images/reviews-bool-fixed.png)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 65,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>App</th>\n",
+       "      <th>Category</th>\n",
+       "      <th>Rating</th>\n",
+       "      <th>Reviews</th>\n",
+       "      <th>Size</th>\n",
+       "      <th>Installs</th>\n",
+       "      <th>Type</th>\n",
+       "      <th>Price</th>\n",
+       "      <th>Content Rating</th>\n",
+       "      <th>Genres</th>\n",
+       "      <th>Last Updated</th>\n",
+       "      <th>Current Ver</th>\n",
+       "      <th>Android Ver</th>\n",
+       "      <th>Reviews_isnull</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>10472</th>\n",
+       "      <td>Life Made WI-Fi Touchscreen Photo Frame</td>\n",
+       "      <td>1.9</td>\n",
+       "      <td>19.0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>1,000+</td>\n",
+       "      <td>Free</td>\n",
+       "      <td>0</td>\n",
+       "      <td>Everyone</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>February 11, 2018</td>\n",
+       "      <td>1.0.19</td>\n",
+       "      <td>4.0 and up</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "                                           App Category  Rating  Reviews  \\\n",
+       "10472  Life Made WI-Fi Touchscreen Photo Frame      1.9    19.0      NaN   \n",
+       "\n",
+       "         Size Installs Type     Price Content Rating             Genres  \\\n",
+       "10472  1,000+     Free    0  Everyone            NaN  February 11, 2018   \n",
+       "\n",
+       "      Last Updated Current Ver Android Ver  Reviews_isnull  \n",
+       "10472       1.0.19  4.0 and up         NaN            True  "
+      ]
+     },
+     "execution_count": 65,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "non_numeric_rows "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Also check the variable types of `google_play`. The `Reviews` column should be a `float64` type now."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 41,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "App                object\n",
+       "Category           object\n",
+       "Rating            float64\n",
+       "Reviews           float64\n",
+       "Size               object\n",
+       "Installs           object\n",
+       "Type               object\n",
+       "Price              object\n",
+       "Content Rating     object\n",
+       "Genres             object\n",
+       "Last Updated       object\n",
+       "Current Ver        object\n",
+       "Android Ver        object\n",
+       "Reviews_isnull       bool\n",
+       "dtype: object"
+      ]
+     },
+     "execution_count": 41,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "google_play.dtypes"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### The next column we will look at is `Size`. We start by looking at all unique values in `Size`:\n",
+    "\n",
+    "*Hint: use `unique()` ([documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.unique.html))*."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 42,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "array(['19M', '14M', '8.7M', '25M', '2.8M', '5.6M', '29M', '33M', '3.1M',\n",
+       "       '28M', '12M', '20M', '21M', '37M', '2.7M', '5.5M', '17M', '39M',\n",
+       "       '31M', '4.2M', '7.0M', '23M', '6.0M', '6.1M', '4.6M', '9.2M',\n",
+       "       '5.2M', '11M', '24M', 'Varies with device', '9.4M', '15M', '10M',\n",
+       "       '1.2M', '26M', '8.0M', '7.9M', '56M', '57M', '35M', '54M', '201k',\n",
+       "       '3.6M', '5.7M', '8.6M', '2.4M', '27M', '2.5M', '16M', '3.4M',\n",
+       "       '8.9M', '3.9M', '2.9M', '38M', '32M', '5.4M', '18M', '1.1M',\n",
+       "       '2.2M', '4.5M', '9.8M', '52M', '9.0M', '6.7M', '30M', '2.6M',\n",
+       "       '7.1M', '3.7M', '22M', '7.4M', '6.4M', '3.2M', '8.2M', '9.9M',\n",
+       "       '4.9M', '9.5M', '5.0M', '5.9M', '13M', '73M', '6.8M', '3.5M',\n",
+       "       '4.0M', '2.3M', '7.2M', '2.1M', '42M', '7.3M', '9.1M', '55M',\n",
+       "       '23k', '6.5M', '1.5M', '7.5M', '51M', '41M', '48M', '8.5M', '46M',\n",
+       "       '8.3M', '4.3M', '4.7M', '3.3M', '40M', '7.8M', '8.8M', '6.6M',\n",
+       "       '5.1M', '61M', '66M', '79k', '8.4M', '118k', '44M', '695k', '1.6M',\n",
+       "       '6.2M', '18k', '53M', '1.4M', '3.0M', '5.8M', '3.8M', '9.6M',\n",
+       "       '45M', '63M', '49M', '77M', '4.4M', '4.8M', '70M', '6.9M', '9.3M',\n",
+       "       '10.0M', '8.1M', '36M', '84M', '97M', '2.0M', '1.9M', '1.8M',\n",
+       "       '5.3M', '47M', '556k', '526k', '76M', '7.6M', '59M', '9.7M', '78M',\n",
+       "       '72M', '43M', '7.7M', '6.3M', '334k', '34M', '93M', '65M', '79M',\n",
+       "       '100M', '58M', '50M', '68M', '64M', '67M', '60M', '94M', '232k',\n",
+       "       '99M', '624k', '95M', '8.5k', '41k', '292k', '11k', '80M', '1.7M',\n",
+       "       '74M', '62M', '69M', '75M', '98M', '85M', '82M', '96M', '87M',\n",
+       "       '71M', '86M', '91M', '81M', '92M', '83M', '88M', '704k', '862k',\n",
+       "       '899k', '378k', '266k', '375k', '1.3M', '975k', '980k', '4.1M',\n",
+       "       '89M', '696k', '544k', '525k', '920k', '779k', '853k', '720k',\n",
+       "       '713k', '772k', '318k', '58k', '241k', '196k', '857k', '51k',\n",
+       "       '953k', '865k', '251k', '930k', '540k', '313k', '746k', '203k',\n",
+       "       '26k', '314k', '239k', '371k', '220k', '730k', '756k', '91k',\n",
+       "       '293k', '17k', '74k', '14k', '317k', '78k', '924k', '902k', '818k',\n",
+       "       '81k', '939k', '169k', '45k', '475k', '965k', '90M', '545k', '61k',\n",
+       "       '283k', '655k', '714k', '93k', '872k', '121k', '322k', '1.0M',\n",
+       "       '976k', '172k', '238k', '549k', '206k', '954k', '444k', '717k',\n",
+       "       '210k', '609k', '308k', '705k', '306k', '904k', '473k', '175k',\n",
+       "       '350k', '383k', '454k', '421k', '70k', '812k', '442k', '842k',\n",
+       "       '417k', '412k', '459k', '478k', '335k', '782k', '721k', '430k',\n",
+       "       '429k', '192k', '200k', '460k', '728k', '496k', '816k', '414k',\n",
+       "       '506k', '887k', '613k', '243k', '569k', '778k', '683k', '592k',\n",
+       "       '319k', '186k', '840k', '647k', '191k', '373k', '437k', '598k',\n",
+       "       '716k', '585k', '982k', '222k', '219k', '55k', '948k', '323k',\n",
+       "       '691k', '511k', '951k', '963k', '25k', '554k', '351k', '27k',\n",
+       "       '82k', '208k', '913k', '514k', '551k', '29k', '103k', '898k',\n",
+       "       '743k', '116k', '153k', '209k', '353k', '499k', '173k', '597k',\n",
+       "       '809k', '122k', '411k', '400k', '801k', '787k', '237k', '50k',\n",
+       "       '643k', '986k', '97k', '516k', '837k', '780k', '961k', '269k',\n",
+       "       '20k', '498k', '600k', '749k', '642k', '881k', '72k', '656k',\n",
+       "       '601k', '221k', '228k', '108k', '940k', '176k', '33k', '663k',\n",
+       "       '34k', '942k', '259k', '164k', '458k', '245k', '629k', '28k',\n",
+       "       '288k', '775k', '785k', '636k', '916k', '994k', '309k', '485k',\n",
+       "       '914k', '903k', '608k', '500k', '54k', '562k', '847k', '957k',\n",
+       "       '688k', '811k', '270k', '48k', '329k', '523k', '921k', '874k',\n",
+       "       '981k', '784k', '280k', '24k', '518k', '754k', '892k', '154k',\n",
+       "       '860k', '364k', '387k', '626k', '161k', '879k', '39k', '970k',\n",
+       "       '170k', '141k', '160k', '144k', '143k', '190k', '376k', '193k',\n",
+       "       '246k', '73k', '658k', '992k', '253k', '420k', '404k', '1,000+',\n",
+       "       '470k', '226k', '240k', '89k', '234k', '257k', '861k', '467k',\n",
+       "       '157k', '44k', '676k', '67k', '552k', '885k', '1020k', '582k',\n",
+       "       '619k'], dtype=object)"
+      ]
+     },
+     "execution_count": 42,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "google_play['Size'].unique()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "You should have seen lots of unique values of the app sizes.\n",
+    "\n",
+    "#### While we can convert most of the `Size` values to numeric in the same way we converted the `Reviews` values, there is one value that is impossible to convert.\n",
+    "\n",
+    "What is that value? Enter it in the next cell and calculate the proportion of its occurrences relative to the total number of records in `google_play`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
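+    "# Share of records whose Size is the unconvertible value 'Varies with device'\n",
+    "(google_play['Size'] == 'Varies with device').mean()"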
+   ]
+  },
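+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a side note, the convertible `Size` values could be turned into numbers before deciding what to do with the column. The snippet below is only a sketch under assumptions: `size_to_kb` is a hypothetical helper that is not part of the lab, it treats `M` as 1024 `k`, and it maps everything it cannot parse (such as `Varies with device`) to `NaN`.\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "\n",
+    "def size_to_kb(value):\n",
+    "    # Hypothetical helper: convert strings such as '19M' or '201k' to kilobytes.\n",
+    "    units = {'k': 1.0, 'M': 1024.0}\n",
+    "    text = str(value).strip()\n",
+    "    if not text or text[-1] not in units:\n",
+    "        return np.nan\n",
+    "    try:\n",
+    "        return float(text[:-1]) * units[text[-1]]\n",
+    "    except ValueError:\n",
+    "        return np.nan\n",
+    "\n",
+    "# Made-up sample that mirrors the kinds of values seen in the column.\n",
+    "sizes = pd.Series(['19M', '8.7M', '201k', 'Varies with device', '1,000+'])\n",
+    "sizes_kb = sizes.map(size_to_kb)\n",
+    "```\n",
+    "\n",
+    "Counting the `NaN` values in `sizes_kb` then shows how much of the column would be lost by the conversion."
+   ]
+  },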
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### While this column may be useful for other types of analysis, we opt to drop it from our dataset. \n",
+    "\n",
+    "There are two reasons. First, most of the values are ordinal, but a sizeable proportion would become missing because we cannot convert them to numerical values. Ordinal data are categorical values with a natural order, so they can be ranked (e.g. 82k is smaller than 91M). In contrast, non-ordinal categorical data such as blood type and eye color cannot be ranked. Second, as a categorical column, it has too many unique values to produce meaningful insights. Therefore, in our case the simplest strategy is to drop the column.\n",
+    "\n",
+    "Drop the column in the cell below (use `inplace=True`)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 43,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "google_play.drop('Size', axis=1, inplace=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Now let's look at how many missing values are in each column. \n",
+    "\n",
+    "This will give us an idea of whether we should come up with a missing data strategy or give up on the column altogether. In the next cell, find the number of missing values in each column: \n",
+    "\n",
+    "*Hint: use the `isna()` and `sum()` functions.*"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 45,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "App                  0\n",
+       "Category             0\n",
+       "Rating            1474\n",
+       "Reviews              1\n",
+       "Installs             0\n",
+       "Type                 1\n",
+       "Price                0\n",
+       "Content Rating       1\n",
+       "Genres               0\n",
+       "Last Updated         0\n",
+       "Current Ver          8\n",
+       "Android Ver          3\n",
+       "Reviews_isnull       0\n",
+       "dtype: int64"
+      ]
+     },
+     "execution_count": 45,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "missing_values = google_play.isna().sum()\n",
+    "missing_values"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "You should find the column with the most missing values is now `Rating`.\n",
+    "\n",
+    "#### What is the proportion of the missing values in `Rating` to the total number of records?\n",
+    "\n",
+    "Enter your answer in the cell below."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 46,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "0.13596531685268887"
+      ]
+     },
+     "execution_count": 46,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "missing_rating_proportion = google_play['Rating'].isna().sum() / len(google_play)\n",
+    "missing_rating_proportion"
+   ]
+  },
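+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The same proportion can be read off for every column at once, because the mean of a boolean mask is the share of `True` values. A minimal sketch on made-up data (the `toy` frame and its values are assumptions for illustration, not rows from the dataset):\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "\n",
+    "toy = pd.DataFrame({\n",
+    "    'Rating': [4.1, np.nan, 3.9, np.nan],\n",
+    "    'Type': ['Free', 'Paid', None, 'Free'],\n",
+    "})\n",
+    "\n",
+    "# isna() gives a boolean mask; the mean of that mask is the missing share per column.\n",
+    "missing_share = toy.isna().mean()\n",
+    "```"
+   ]
+  },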
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "A sizeable proportion of the `Rating` column is missing. A few other columns also contain several missing values.\n",
+    "\n",
+    "#### We opt to preserve these columns and remove the rows containing missing data.\n",
+    "\n",
+    "In particular, we don't want to drop the `Rating` column because:\n",
+    "\n",
+    "* It is one of the most important columns in our dataset. \n",
+    "\n",
+    "* Since the dataset is not a time series, the loss of these rows will not have a negative impact on our ability to analyze the data. It will, however, cause us to lose some meaningful observations. But the loss is limited compared to the gain we receive by preserving these columns.\n",
+    "\n",
+    "In the cell below, remove all rows containing at least one missing value. Use the `dropna()` function ([documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html)). Assign the new dataframe to the variable `google_missing_removed`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 47,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "google_missing_removed = google_play.dropna()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "From now on, we use the `google_missing_removed` variable instead of `google_play`.\n",
+    "\n",
+    "#### Next, we look at the `Last Updated` column.\n",
+    "\n",
+    "The `Last Updated` column seems to contain a date, though it is classified as an object type. Let's convert this column using the `pd.to_datetime` function ([documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_datetime.html))."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 48,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/var/folders/d9/783c_j055nj0b0y037qm_1zc0000gn/T/ipykernel_61933/1540321181.py:1: SettingWithCopyWarning: \n",
+      "A value is trying to be set on a copy of a slice from a DataFrame.\n",
+      "Try using .loc[row_indexer,col_indexer] = value instead\n",
+      "\n",
+      "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
+      "  google_missing_removed['Last Updated'] = pd.to_datetime(google_missing_removed['Last Updated'])\n"
+     ]
+    }
+   ],
+   "source": [
+    "google_missing_removed['Last Updated'] = pd.to_datetime(google_missing_removed['Last Updated'])"
+   ]
+  },
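+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The `SettingWithCopyWarning` printed above is raised because `google_missing_removed` was derived from `google_play`, and pandas flags assignments to frames that may be a view of another frame. One common way to avoid it is to take an explicit `.copy()` right after `dropna()`. The sketch below uses a made-up frame (`toy` and `cleaned` are illustrative names), and the explicit `format` string is an assumption based on values like `February 11, 2018`.\n",
+    "\n",
+    "```python\n",
+    "import pandas as pd\n",
+    "\n",
+    "toy = pd.DataFrame({\n",
+    "    'Rating': [4.1, None, 3.9],\n",
+    "    'Last Updated': ['January 17, 2018', 'June 20, 2018', 'February 11, 2018'],\n",
+    "})\n",
+    "\n",
+    "# An explicit copy makes the cleaned frame independent of the original.\n",
+    "cleaned = toy.dropna().copy()\n",
+    "\n",
+    "# Parse month-name dates; the format string is assumed from the sample values.\n",
+    "cleaned['Last Updated'] = pd.to_datetime(cleaned['Last Updated'], format='%B %d, %Y')\n",
+    "```"
+   ]
+  },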
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### The last column we will transform is `Price`. \n",
+    "\n",
+    "We start by looking at the unique values of this column."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 49,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "array(['0', '$4.99', '$3.99', '$6.99', '$7.99', '$5.99', '$2.99', '$3.49',\n",
+       "       '$1.99', '$9.99', '$7.49', '$0.99', '$9.00', '$5.49', '$10.00',\n",
+       "       '$24.99', '$11.99', '$79.99', '$16.99', '$14.99', '$29.99',\n",
+       "       '$12.99', '$2.49', '$10.99', '$1.50', '$19.99', '$15.99', '$33.99',\n",
+       "       '$39.99', '$3.95', '$4.49', '$1.70', '$8.99', '$1.49', '$3.88',\n",
+       "       '$399.99', '$17.99', '$400.00', '$3.02', '$1.76', '$4.84', '$4.77',\n",
+       "       '$1.61', '$2.50', '$1.59', '$6.49', '$1.29', '$299.99', '$379.99',\n",
+       "       '$37.99', '$18.99', '$389.99', '$8.49', '$1.75', '$14.00', '$2.00',\n",
+       "       '$3.08', '$2.59', '$19.40', '$3.90', '$4.59', '$15.46', '$3.04',\n",
+       "       '$13.99', '$4.29', '$3.28', '$4.60', '$1.00', '$2.95', '$2.90',\n",
+       "       '$1.97', '$2.56', '$1.20'], dtype=object)"
+      ]
+     },
+     "execution_count": 49,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "google_missing_removed['Price'].unique()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Since every price is either `0` or a dollar amount, without exception, we can transform this column by removing the dollar sign and converting to numeric. We can create a new column called `Price Numerical` and drop the original column.\n",
+    "\n",
+    "We will achieve our goal in three steps. Follow the instructions of each step below.\n",
+    "\n",
+    "#### First we remove the dollar sign. Do this in the next cell by applying the `str.replace` function to the column to replace `$` with an empty string (`''`)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 50,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/var/folders/d9/783c_j055nj0b0y037qm_1zc0000gn/T/ipykernel_61933/4234198823.py:1: FutureWarning: The default value of regex will change from True to False in a future version. In addition, single character regular expressions will *not* be treated as literal strings when regex=True.\n",
+      "  google_missing_removed['Price Numerical'] = google_missing_removed['Price'].str.replace('$', '')\n",
+      "/var/folders/d9/783c_j055nj0b0y037qm_1zc0000gn/T/ipykernel_61933/4234198823.py:1: SettingWithCopyWarning: \n",
+      "A value is trying to be set on a copy of a slice from a DataFrame.\n",
+      "Try using .loc[row_indexer,col_indexer] = value instead\n",
+      "\n",
+      "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
+      "  google_missing_removed['Price Numerical'] = google_missing_removed['Price'].str.replace('$', '')\n"
+     ]
+    }
+   ],
+   "source": [
+    "google_missing_removed['Price Numerical'] = google_missing_removed['Price'].str.replace('$', '')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Second step, coerce the `Price Numerical` column to numeric."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 51,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/var/folders/d9/783c_j055nj0b0y037qm_1zc0000gn/T/ipykernel_61933/3587452444.py:1: SettingWithCopyWarning: \n",
+      "A value is trying to be set on a copy of a slice from a DataFrame.\n",
+      "Try using .loc[row_indexer,col_indexer] = value instead\n",
+      "\n",
+      "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
+      "  google_missing_removed['Price Numerical'] = pd.to_numeric(google_missing_removed['Price Numerical'], errors='coerce')\n"
+     ]
+    }
+   ],
+   "source": [
+    "google_missing_removed['Price Numerical'] = pd.to_numeric(google_missing_removed['Price Numerical'], errors='coerce')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**Finally, drop the original `Price` column.**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 52,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/var/folders/d9/783c_j055nj0b0y037qm_1zc0000gn/T/ipykernel_61933/2559333238.py:1: SettingWithCopyWarning: \n",
+      "A value is trying to be set on a copy of a slice from a DataFrame\n",
+      "\n",
+      "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
+      "  google_missing_removed.drop('Price', axis=1, inplace=True)\n"
+     ]
+    }
+   ],
+   "source": [
+    "google_missing_removed.drop('Price', axis=1, inplace=True)"
+   ]
+  },
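+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The steps above can also be written so that the dollar sign is always matched literally. In a regular expression `$` means end of string, which is the ambiguity the `FutureWarning` above points at; passing `regex=False` removes it. A sketch on a made-up Series (`prices` is an illustrative name, not a variable from the lab):\n",
+    "\n",
+    "```python\n",
+    "import pandas as pd\n",
+    "\n",
+    "prices = pd.Series(['0', '$4.99', '$399.99'])\n",
+    "\n",
+    "# regex=False treats '$' as a plain character rather than a regex anchor.\n",
+    "stripped = prices.str.replace('$', '', regex=False)\n",
+    "\n",
+    "# errors='coerce' would turn anything still unparseable into NaN.\n",
+    "price_numerical = pd.to_numeric(stripped, errors='coerce')\n",
+    "```"
+   ]
+  },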
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now check the variable types of `google_missing_removed`. Make sure:\n",
+    "\n",
+    "* `Size` and `Price` columns have been removed.\n",
+    "* `Rating`, `Reviews`, and `Price Numerical` have the type of `float64`.\n",
+    "* `Last Updated` has the type of `datetime64`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 53,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "App                        object\n",
+       "Category                   object\n",
+       "Rating                    float64\n",
+       "Reviews                   float64\n",
+       "Installs                   object\n",
+       "Type                       object\n",
+       "Content Rating             object\n",
+       "Genres                     object\n",
+       "Last Updated       datetime64[ns]\n",
+       "Current Ver                object\n",
+       "Android Ver                object\n",
+       "Reviews_isnull               bool\n",
+       "Price Numerical           float64\n",
+       "dtype: object"
+      ]
+     },
+     "execution_count": 53,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "google_missing_removed.dtypes"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Challenge 2 - Loading and Extracting Features from the Second Dataset"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Load the second dataset to the variable `google_reviews`. The data is in the file `googleplaystore_user_reviews.csv`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 54,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "google_reviews = pd.read_csv(\"googleplaystore_user_reviews.csv\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### This dataset contains the top 100 reviews for each app. \n",
+    "\n",
+    "Let's examine this dataset using the `head` function."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 55,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>App</th>\n",
+       "      <th>Translated_Review</th>\n",
+       "      <th>Sentiment</th>\n",
+       "      <th>Sentiment_Polarity</th>\n",
+       "      <th>Sentiment_Subjectivity</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>10 Best Foods for You</td>\n",
+       "      <td>I like eat delicious food. That's I'm cooking ...</td>\n",
+       "      <td>Positive</td>\n",
+       "      <td>1.00</td>\n",
+       "      <td>0.533333</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>10 Best Foods for You</td>\n",
+       "      <td>This help eating healthy exercise regular basis</td>\n",
+       "      <td>Positive</td>\n",
+       "      <td>0.25</td>\n",
+       "      <td>0.288462</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>10 Best Foods for You</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>10 Best Foods for You</td>\n",
+       "      <td>Works great especially going grocery store</td>\n",
+       "      <td>Positive</td>\n",
+       "      <td>0.40</td>\n",
+       "      <td>0.875000</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>10 Best Foods for You</td>\n",
+       "      <td>Best idea us</td>\n",
+       "      <td>Positive</td>\n",
+       "      <td>1.00</td>\n",
+       "      <td>0.300000</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "                     App                                  Translated_Review  \\\n",
+       "0  10 Best Foods for You  I like eat delicious food. That's I'm cooking ...   \n",
+       "1  10 Best Foods for You    This help eating healthy exercise regular basis   \n",
+       "2  10 Best Foods for You                                                NaN   \n",
+       "3  10 Best Foods for You         Works great especially going grocery store   \n",
+       "4  10 Best Foods for You                                       Best idea us   \n",
+       "\n",
+       "  Sentiment  Sentiment_Polarity  Sentiment_Subjectivity  \n",
+       "0  Positive                1.00                0.533333  \n",
+       "1  Positive                0.25                0.288462  \n",
+       "2       NaN                 NaN                     NaN  \n",
+       "3  Positive                0.40                0.875000  \n",
+       "4  Positive                1.00                0.300000  "
+      ]
+     },
+     "execution_count": 55,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "google_reviews.head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### The main piece of information we would like to extract from this dataset is the proportion of positive reviews of each app. \n",
+    "\n",
+    "Columns like `Sentiment_Polarity` and `Sentiment_Subjectivity` are not of interest to us because we do not yet know how to use them. We do not care about `Translated_Review` because natural language processing is too complex for us at present (in fact, the `Sentiment`, `Sentiment_Polarity`, and `Sentiment_Subjectivity` columns were derived from `Translated_Review` by the data scientists). \n",
+    "\n",
+    "What we care about in this challenge is `Sentiment`. To be more precise, we care about **the proportion of *Positive* sentiment for each app**. This will require us to aggregate the `Sentiment` data by `App` in order to calculate the proportions.\n",
+    "\n",
+    "Now that you are clear about what we are trying to achieve, follow the steps below, which walk you through to our goal."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Our first step will be to remove all rows with missing sentiment. \n",
+    "\n",
+    "In the next cell, drop all rows with a missing `Sentiment` value using the `dropna()` function and assign this new dataframe to `review_missing_removed`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 56,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "review_missing_removed = google_reviews.dropna(subset=['Sentiment'])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Now, use the `value_counts()` function ([documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.value_counts.html)) to get a sense of how many apps are in this dataset and their review counts."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 57,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "App                                                 Translated_Review                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Sentiment  Sentiment_Polarity  Sentiment_Subjectivity\n",
+       "FastMeet: Chat, Dating, Love                        Good                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       Positive    0.700000           0.600000                  8\n",
+       "BestCam Selfie-selfie, beauty camera, photo editor  Good                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       Positive    0.700000           0.600000                  7\n",
+       "Bubble Shooter                                      Good                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       Positive    0.700000           0.600000                  7\n",
+       "Candy Crush Saga                                    I love game TOO many pop ups. I want open game play. But 7 notices first every time open game. Once daily enough. I level 2475 NEVER hit jackpot what's purpose wheel. What perks committed player can't hit jackpot there. The move suggestion Too fast.                                                                                                                                                                                                                                  Positive    0.022727           0.430303                  6\n",
+       "Duolingo: Learn Languages Free                      Duolingo deserves place number 1 education apps. The problem really teach much pronounciation, rules exceptions words 'The'. Duolingo, could please add in, I think would possible contest position number 1. Also, I think would awesome really helpful could choose accent pronounciation words could in. Please consider request, and, need money this, I'm sure everyone would ok amount ads increased slightly whilst work this. Thanks duo, saved sanity whilst trying learn German  Positive    0.263333           0.435556                  6\n",
+       "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       ..\n",
+       "Calorie Counter & Diet Tracker                      I've using weeks now. I My Fitness Pal user years. But I found I like now. I accurate serving counts log food whereas MFP seemingly lost feature decimals. It's interactive social. I gave 4 stars instead 5 food database comprehensive accurate I'd like.                                                                                                                                                                                                                                Positive    0.277778           0.444444                  1\n",
+       "                                                    I've using days like far. Easy navigate scanner really handy. Would like take supplements consideration daily nutrition totals daily tally number fruit, vegetable, meat, grain, dairy, fat servings would help diets require track food groups, i.e. DASH. Something similar water tracker. But overall, I like app.                                                                                                                                                                      Positive    0.161905           0.447619                  1\n",
+       "                                                    I've used Spark People web-based tool, back 2012 & 2013 gotten inconvenient foods still needed manually entered. With works phone, I've found food database become enormous; I enter single food, even brand names available!                                                                                                                                                                                                                                                              Negative   -0.034286           0.502857                  1\n",
+       "                                                    I've tried several good ones best best! It's actually fun notifying things I committed gives much needed push.                                                                                                                                                                                                                                                                                                                                                                             Positive    0.533333           0.266667                  1\n",
+       "Housing-Real Estate & Property                      Worst app. We get nothing Time waste . They update properties details. They delete sold properties. Both sellers buyers get time waste. Don't install waste                                                                                                                                                                                                                                                                                                                                Negative   -0.400000           0.250000                  1\n",
+       "Length: 29692, dtype: int64"
+      ]
+     },
+     "execution_count": 57,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "review_missing_removed.value_counts()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Now the tough part comes. Let's plan how we will achieve our goal:\n",
+    "\n",
+    "1. We will count the number of reviews that contain *Positive* in the `Sentiment` column.\n",
+    "\n",
+    "1. We will create a new dataframe to contain the `App` name, the number of positive reviews, and the total number of reviews of each app.\n",
+    "\n",
+    "1. We will then loop over the new dataframe to calculate the positive review proportion of each app."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Step 1: Count the number of positive reviews.\n",
+    "\n",
+    "In the following cell, write a function that takes a column and returns the number of times *Positive* appears in the column. \n",
+    "\n",
+    "*Hint: One option is to use the `np.where()` function ([documentation](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.where.html)).*"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 59,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Your code below\n",
+    "\n",
+    "def positive_function(x):\n",
+    "    \"\"\"\n",
+    "    Count how many times the string 'Positive' appears in a column (exact string match).\n",
+    "    \n",
+    "    Args:\n",
+    "        x: data column\n",
+    "    \n",
+    "    Returns:\n",
+    "        The number of occurrences of 'Positive' in the column data.\n",
+    "    \"\"\"\n",
+    "    count_positive = np.sum(np.where(x == 'Positive', 1, 0))\n",
+    "    return count_positive"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Step 2: Create a new dataframe to contain the `App` name, the number of positive reviews, and the total number of reviews of each app\n",
+    "\n",
+    "We will group `review_missing_removed` by the `App` column, then aggregate the grouped dataframe to get the number of positive reviews and the total review count of each app. The result will be assigned to a new variable `google_agg`. See the [documentation on how to achieve it](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.DataFrameGroupBy.agg.html). Take a moment to read the documentation and look up examples, because it is fairly complex.\n",
+    "\n",
+    "When you obtain `google_agg`, check its values to make sure it has an `App` column as its index as well as a `Positive` column and a `Total` column. Your output should look like:\n",
+    "\n",
+    "![Positive Reviews Agg](../images/positive-review-agg.png)\n",
+    "\n",
+    "*Hint: Use `positive_function` you created earlier as part of the param passed to the `agg()` function in order to aggregate the number of positive reviews.*\n",
+    "\n",
+    "#### Bonus:\n",
+    "\n",
+    "As of Pandas v0.23.4, you may supply either a list or a dict to `agg()`. If you pass a list, you'll need to rename the columns so that their names are `Positive` and `Total`. Passing a dict lets you create the aggregated columns with the desired names without renaming them. However, you will probably encounter a warning that supplying a dict to `agg()` is deprecated. Try both ways out; either is fine as long as it works."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 60,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "google_agg = review_missing_removed.groupby('App').agg(\n",
+    "    Positive=('Sentiment', positive_function),  # Number of positive reviews\n",
+    "    Total=('Sentiment', 'count')                # Total number of reviews (a repeated dict key would overwrite the first)\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Print the first 5 rows of `google_agg` to check it."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 61,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>Total</th>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>App</th>\n",
+       "      <th></th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>10 Best Foods for You</th>\n",
+       "      <td>194</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>104 找工作 - 找工作 找打工 找兼職 履歷健檢 履歷診療室</th>\n",
+       "      <td>40</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>11st</th>\n",
+       "      <td>40</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1800 Contacts - Lens Store</th>\n",
+       "      <td>80</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1LINE – One Line with One Touch</th>\n",
+       "      <td>38</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "                                  Total\n",
+       "App                                    \n",
+       "10 Best Foods for You               194\n",
+       "104 找工作 - 找工作 找打工 找兼職 履歷健檢 履歷診療室     40\n",
+       "11st                                 40\n",
+       "1800 Contacts - Lens Store           80\n",
+       "1LINE – One Line with One Touch      38"
+      ]
+     },
+     "execution_count": 61,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "google_agg.head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Add a derived column to `google_agg` that is the ratio of the `Positive` column to the `Total` column. Call this column `Positive Ratio`.\n",
+    "\n",
+    "Make sure to account for the case where the denominator is zero using the `np.where()` function."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Now drop the `Positive` and `Total` columns. Do this with `inplace=True`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Your code here:\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Print the first 5 rows of `google_agg`. Your output should look like:\n",
+    "\n",
+    "![Positive Reviews Agg](../images/positive-review-ratio.png)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Your code here:\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Challenge 3 - Join the Dataframes\n",
+    "\n",
+    "In this part of the lab, we will join the two dataframes and obtain a dataframe that contains features we can use in our ML algorithm.\n",
+    "\n",
+    "In the next cell, join the `google_missing_removed` dataframe with the `google_agg` dataframe on the `App` column. Assign this dataframe to the variable `google`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Your code here:\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Let's look at the final result using the `head()` function. Your final product should look like:\n",
+    "\n",
+    "![Final Product](../images/google-final-head.png)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Your code here:\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.9"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/data/googleplaystore.csv b/your-code/notebooks/googleplaystore.csv
similarity index 100%
rename from data/googleplaystore.csv
rename to your-code/notebooks/googleplaystore.csv
diff --git a/data/googleplaystore_user_reviews.csv b/your-code/notebooks/googleplaystore_user_reviews.csv
similarity index 100%
rename from data/googleplaystore_user_reviews.csv
rename to your-code/notebooks/googleplaystore_user_reviews.csv
diff --git a/your-code/notebooks/main.ipynb b/your-code/notebooks/main.ipynb
deleted file mode 100755
index 07928c1..0000000
--- a/your-code/notebooks/main.ipynb
+++ /dev/null
@@ -1,773 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Before your start:\n",
-    "- Read the README.md file\n",
-    "- Comment as much as you can and use the resources in the README.md file\n",
-    "- Happy learning!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "#Import your libraries\n",
-    "import numpy as np\n",
-    "import pandas as pd"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Introduction\n",
-    "\n",
-    "In this lab, we will use two datasets. Both datasets contain variables that describe apps from the Google Play Store. We will use our knowledge in feature extraction to process these datasets and prepare them for the use of a ML algorithm."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Challenge 1 - Loading and Extracting Features from the First Dataset"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### In this challenge, our goals are: \n",
-    "\n",
-    "* Exploring the dataset.\n",
-    "* Identify the columns with missing values.\n",
-    "* Either replacing the missing values in each column or drop the columns.\n",
-    "* Conver each column to the appropriate type."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### The first dataset contains different information describing the apps. \n",
-    "\n",
-    "Load the dataset into the variable `google_play` in the cell below. The dataset is in the file `googleplaystore.csv`"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Your code here:\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### Examine all variables and their types in the following cell"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Your code here:\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### Since this dataset only contains one numeric column, let's skip the `describe()` function and look at the first 5 rows using the `head()` function"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Your code here:\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### We can see that there are a few columns that could be coerced to numeric.\n",
-    "\n",
-    "Start with the reviews column. We can evaluate what value is causing this column to be of object type finding the non-numeric values in this column. To do this, we recall the `to_numeric()` function. With this function, we are able to coerce all non-numeric data to null. We can then use the `isnull()` function to subset our dataframe using the True/False column that this function generates.\n",
-    "\n",
-    "In the cell below, transform the Reviews column to numeric and assign this new column to the variable `Reviews_numeric`. Make sure to coerce the errors."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Your code here:\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Next, create a column containing True/False values using the `isnull()` function. Assign this column to the `Reviews_isnull` variable."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Your code here:\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Finally, subset the `google_play` with `Reviews_isnull`. This should give you all the rows that contain non-numeric characters.\n",
-    "\n",
-    "Your output should look like:\n",
-    "\n",
-    "![Reviews_bool.png](../images/reviews-bool.png)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Your code here:\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### We see that Google Play is using a shorthand for millions. \n",
-    "\n",
-    "Let's write a function to transform this data.\n",
-    "\n",
-    "Steps:\n",
-    "\n",
-    "1. Create a function that returns the correct numeric values of *Reviews*.\n",
-    "1. Define a test string with `M` in the last character.\n",
-    "1. Test your function with the test string. Make sure your function works correctly. If not, modify your functions and test again."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Your code here\n",
-    "\n",
-    "def convert_string_to_numeric(s):\n",
-    "    \"\"\"\n",
-    "    Convert a string value to numeric. If the last character of the string is `M`, obtain the \n",
-    "    numeric part of the string, multiply it with 1,000,000, then return the result. Otherwise, \n",
-    "    convert the string to numeric value and return the result.\n",
-    "    \n",
-    "    Args:\n",
-    "        s: The Reviews score in string format.\n",
-    "\n",
-    "    Returns:\n",
-    "        The correct numeric value of the Reviews score.\n",
-    "    \"\"\"\n",
-    "    return np.NaN\n",
-    "\n",
-    "test_string = '4.0M'\n",
-    "\n",
-    "convert_string_to_numeric(test_string) == 4000000"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "The last step is to apply the function to the `Reviews` column in the following cell:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Your code here:\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Check the non-numeric `Reviews` row again. It should have been fixed now and you should see:\n",
-    "\n",
-    "![Reviews_bool_fixed.png](../images/reviews-bool-fixed.png)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Your code here\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Also check the variable types of `google_play`. The `Reviews` column should be a `float64` type now."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Your code here:\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### The next column we will look at is `Size`. We start by looking at all unique values in `Size`:\n",
-    "\n",
-    "*Hint: use `unique()` ([documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.unique.html))*."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Your code here:\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "You should have seen lots of unique values of the app sizes.\n",
-    "\n",
-    "#### While we can convert most of the `Size` values to numeric in the same way we converted the `Reviews` values, there is one value that is impossible to convert.\n",
-    "\n",
-    "What is that badass value? Enter it in the next cell and calculate the proportion of its occurence to the total number of records of `google_play`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Your code here:\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### While this column may be useful for other types of analysis, we opt to drop it from our dataset. \n",
-    "\n",
-    "There are two reasons. First, the majority of the data are ordinal but a sizeable proportion are missing because we cannot convert them to numerical values. Ordinal data are both numerical and categorical, and they usually can be ranked (e.g. 82k is smaller than 91M). In contrast, non-ordinal categorical data such as blood type and eye color cannot be ranked. The second reason is as a categorical column, it has too many unique values to produce meaningful insights. Therefore, in our case the simplest strategy would be to drop the column.\n",
-    "\n",
-    "Drop the column in the cell below (use `inplace=True`)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Your code here:\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### Now let's look at how many missing values are in each column. \n",
-    "\n",
-    "This will give us an idea of whether we should come up with a missing data strategy or give up on the column all together. In the next column, find the number of missing values in each column: \n",
-    "\n",
-    "*Hint: use the `isna()` and `sum()` functions.*"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Your code here:\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "You should find the column with the most missing values is now `Rating`.\n",
-    "\n",
-    "#### What is the proportion of the missing values in `Rating` to the total number of records?\n",
-    "\n",
-    "Enter your answer in the cell below."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Your code here:\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "A sizeable proportion of the `Rating` column is missing. A few other columns also contain several missing values.\n",
-    "\n",
-    "#### We opt to preserve these columns and remove the rows containing missing data.\n",
-    "\n",
-    "In particular, we don't want to drop the `Rating` column because:\n",
-    "\n",
-    "* It is one of the most important columns in our dataset. \n",
-    "\n",
-    "* Since the dataset is not a time series, the loss of these rows will not have a negative impact on our ability to analyze the data. It will, however, cause us to lose some meaningful observations. But the loss is limited compared to the gain we receive by preserving these columns.\n",
-    "\n",
-    "In the cell below, remove all rows containing at least one missing value. Use the `dropna()` function ([documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html)). Assign the new dataframe to the variable `google_missing_removed`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Your code here:\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "From now on, we use the `google_missing_removed` variable instead of `google_play`.\n",
-    "\n",
-    "#### Next, we look at the `Last Updated` column.\n",
-    "\n",
-    "The `Last Updated` column seems to contain a date, though it is classified as an object type. Let's convert this column using the `pd.to_datetime` function ([documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_datetime.html))."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Your code here:\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### The last column we will transform is `Price`. \n",
-    "\n",
-    "We start by looking at the unique values of this column."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Your code here:\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Since all prices are ordinal data without exceptions, we can tranform this column by removing the dollar sign and converting to numeric. We can create a new column called `Price Numerical` and drop the original column.\n",
-    "\n",
-    "We will achieve our goal in three steps. Follow the instructions of each step below.\n",
-    "\n",
-    "#### First we remove the dollar sign. Do this in the next cell by applying the `str.replace` function to the column to replace `$` with an empty string (`''`)."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Your code here:\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### Second step, coerce the `Price Numerical` column to numeric."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Your code here:\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "**Finally, drop the original `Price` column.**"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Your code here:\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Now check the variable types of `google_missing_removed`. Make sure:\n",
-    "\n",
-    "* `Size` and `Price` columns have been removed.\n",
-    "* `Rating`, `Reviews`, and `Price Numerical` have the type of `float64`.\n",
-    "* `Last Updated` has the type of `datetime64`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Your code here"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Challenge 2 - Loading and Extracting Features from the Second Dataset"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Load the second dataset to the variable `google_reviews`. The data is in the file `googleplaystore_user_reviews.csv`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Your code here:\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### This dataset contains the top 100 reviews for each app. \n",
-    "\n",
-    "Let's examine this dataset using the `head` function"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Your code here\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### The main piece of information we would like to extract from this dataset is the proportion of positive reviews of each app. \n",
-    "\n",
-    "Columns like `Sentiment_Polarity` and `Sentiment_Subjectivity` are not to our interests because we have no clue how to use them. We do not care about `Translated_Review` because natural language processing is too complex for us at present (in fact the `Sentiment`, `Sentiment_Polarity`, and `Sentiment_Subjectivity` columns are derived from `Translated_Review` the data scientists). \n",
-    "\n",
-    "What we care about in this challenge is `Sentiment`. To be more precise, we care about **what is the proportion of *Positive* sentiment of each app**. This will require us to aggregate the `Sentiment` data by `App` in order to calculate the proportions.\n",
-    "\n",
-    "Now that you are clear about what we are trying to achieve, follow the steps below that will walk you through towards our goal."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### Our first step will be to remove all rows with missing sentiment. \n",
-    "\n",
-    "In the next cell, drop all rows with missing data using the `dropna()` function and assign this new dataframe to `review_missing_removed`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Your code here:\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### Now, use the `value_counts()` function ([documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.value_counts.html)) to get a sense on how many apps are in this dataset and their review counts."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Your code here:\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### Now the tough part comes. Let's plan how we will achieve our goal:\n",
-    "\n",
-    "1. We will count the number of reviews that contain *Positive* in the `Sentiment` column.\n",
-    "\n",
-    "1. We will create a new dataframe to contain the `App` name, the number of positive reviews, and the total number of reviews of each app.\n",
-    "\n",
-    "1. We will then loop the new dataframe to calculate the postivie review portion of each app."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### Step 1: Count the number of positive reviews.\n",
-    "\n",
-    "In the following cell, write a function that takes a column and returns the number of times *Positive* appears in the column. \n",
-    "\n",
-    "*Hint: One option is to use the `np.where()` function ([documentation](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.where.html)).*"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Your code below\n",
-    "\n",
-    "def positive_function(x):\n",
-    "    \"\"\"\n",
-    "    Count how many times the string `Positive` appears in a column (exact string match).\n",
-    "    \n",
-    "    Args:\n",
-    "        x: data column\n",
-    "    \n",
-    "    Returns:\n",
-    "        The number of occurrences of `Positive` in the column data.\n",
-    "    \"\"\"\n",
-    "    return 0"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### Step 2: Create a new dataframe to contain the `App` name, the number of positive reviews, and the total number of reviews of each app\n",
-    "\n",
-    "We will group `review_missing_removed` by the `App` column, then aggregate the grouped dataframe on the number of positive reviews and the total review counts of each app. The result will be assigned to a new variable `google_agg`. Here is the ([documentation on how to achieve it](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.DataFrameGroupBy.agg.html)). Take a moment or two to read the documentation and google examples because it is pretty complex.\n",
-    "\n",
-    "When you obtain `google_agg`, check its values to make sure it has an `App` column as its index as well as a `Positive` column and a `Total` column. Your output should look like:\n",
-    "\n",
-    "![Positive Reviews Agg](../images/positive-review-agg.png)\n",
-    "\n",
-    "*Hint: Use `positive_function` you created earlier as part of the param passed to the `agg()` function in order to aggregate the number of positive reviews.*\n",
-    "\n",
-    "#### Bonus:\n",
-    "\n",
-    "As of Pandas v0.23.4, you may opt to supply an array or an object to `agg()`. If you use the array param, you'll need to rename the columns so that their names are `Positive` and `Total`. Using the object param will allow you to create the aggregated columns with the desirable names without renaming them. However, you will probably encounter a warning indicating supplying an object to `agg()` will become outdated. It's up to you which way you will use. Try both ways out. Any way is fine as long as it works."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Your code here:\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Print the first 5 rows of `google_agg` to check it."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Your code here\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### Add a derived column to `google_agg` that is the ratio of the `Positive` and the `Total` columns. Call this column `Positive Ratio`. \n",
-    "\n",
-    "Make sure to account for the case where the denominator is zero using the `np.where()` function."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Your code here:\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### Now drop the `Positive` and `Total` columns. Do this with `inplace=True`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Your code here:\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Print the first 5 rows of `google_agg`. Your output should look like:\n",
-    "\n",
-    "![Positive Reviews Agg](../images/positive-review-ratio.png)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Your code here:\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Challenge 3 - Join the Dataframes\n",
-    "\n",
-    "In this part of the lab, we will join the two dataframes and obtain a dataframe that contains features we can use in our ML algorithm.\n",
-    "\n",
-    "In the next cell, join the `google_missing_removed` dataframe with the `google_agg` dataframe on the `App` column. Assign this dataframe to the variable `google`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Your code here:\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### Let's look at the final result using the `head()` function. Your final product should look like:\n",
-    "\n",
-    "![Final Product](../images/google-final-head.png)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Your code here:\n"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.7.2"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}

	App	Category	Rating	Reviews	Size	Installs	Type	Price	Content Rating	Genres	Last Updated	Current Ver	Android Ver
0	Photo Editor & Candy Camera & Grid & ScrapBook	ART_AND_DESIGN	4.1	159	19M	10,000+	Free	0	Everyone	Art & Design	January 7, 2018	1.0.0	4.0.3 and up
1	Coloring book moana	ART_AND_DESIGN	3.9	967	14M	500,000+	Free	0	Everyone	Art & Design;Pretend Play	January 15, 2018	2.0.0	4.0.3 and up
2	U Launcher Lite – FREE Live Cool Themes, Hide ...	ART_AND_DESIGN	4.7	87510	8.7M	5,000,000+	Free	0	Everyone	Art & Design	August 1, 2018	1.2.4	4.0.3 and up
3	Sketch - Draw & Paint	ART_AND_DESIGN	4.5	215644	25M	50,000,000+	Free	0	Teen	Art & Design	June 8, 2018	Varies with device	4.2 and up
4	Pixel Draw - Number Art Coloring Book	ART_AND_DESIGN	4.3	967	2.8M	100,000+	Free	0	Everyone	Art & Design;Creativity	June 20, 2018	1.1	4.4 and up
...	...	...	...	...	...	...	...	...	...	...	...	...	...
10836	Sya9a Maroc - FR	FAMILY	4.5	38	53M	5,000+	Free	0	Everyone	Education	July 25, 2017	1.48	4.1 and up
10837	Fr. Mike Schmitz Audio Teachings	FAMILY	5.0	4	3.6M	100+	Free	0	Everyone	Education	July 6, 2018	1.0	4.1 and up
10838	Parkinson Exercices FR	MEDICAL	NaN	3	9.5M	1,000+	Free	0	Everyone	Medical	January 20, 2017	1.0	2.2 and up
10839	The SCP Foundation DB fr nn5n	BOOKS_AND_REFERENCE	4.5	114	Varies with device	1,000+	Free	0	Mature 17+	Books & Reference	January 19, 2015	Varies with device	Varies with device
10840	iHoroscope - 2018 Daily Horoscope & Astrology	LIFESTYLE	4.5	398307	19M	10,000,000+	Free	0	Everyone	Lifestyle	July 25, 2018	Varies with device	Varies with device
	App	Translated_Review	Sentiment	Sentiment_Polarity	Sentiment_Subjectivity
0	10 Best Foods for You	I like eat delicious food. That's I'm cooking ...	Positive	1.00	0.533333
1	10 Best Foods for You	This help eating healthy exercise regular basis	Positive	0.25	0.288462
2	10 Best Foods for You	NaN	NaN	NaN	NaN
3	10 Best Foods for You	Works great especially going grocery store	Positive	0.40	0.875000
4	10 Best Foods for You	Best idea us	Positive	1.00	0.300000
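
The per-app aggregation that the lab asks for (a `Positive` count and a `Total` count per `App`) can be sketched as follows. This is a minimal illustration, not the notebook's own solution: the toy rows are made up, `positive_function` is a hypothetical stand-in for the helper the lab refers to (its real definition is not shown here), and named aggregation assumes pandas 0.25 or later, which avoids both the rename step and the dict deprecation warning.

```python
import pandas as pd

# Toy stand-in for the user-reviews table; the values are invented for illustration.
reviews = pd.DataFrame({
    "App": ["A", "A", "A", "B", "B"],
    "Sentiment": ["Positive", "Negative", None, "Positive", "Positive"],
})


def positive_function(sentiment):
    """Hypothetical helper: return 1 for a 'Positive' label, otherwise 0."""
    return 1 if sentiment == "Positive" else 0


# Encode each review as 1 (positive) or 0 (anything else, including missing).
reviews["Positive Sentiment"] = reviews["Sentiment"].apply(positive_function)

# Named aggregation (pandas >= 0.25): output column name = (input column, function).
google_agg = reviews.groupby("App").agg(
    Positive=("Positive Sentiment", "sum"),   # number of positive reviews
    Total=("Positive Sentiment", "count"),    # number of review rows per app
)

print(google_agg)
```

Because the encoded column never contains missing values, `count` here counts every review row, including rows whose original `Sentiment` was missing. Whether such rows should count toward `Total` is a modelling choice the lab leaves open.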
	Total
App
10 Best Foods for You	194
104 找工作 - 找工作找打工找兼職履歷健檢履歷診療室	40
11st	40
1800 Contacts - Lens Store	80
1LINE – One Line with One Touch	38
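
The remaining steps (computing `Positive Ratio` with a guard against a zero denominator, dropping the helper columns in place, and joining on `App`) might look like the sketch below. The data is invented, `apps` is a hypothetical stand-in for `google_missing_removed`, and the inner join is an assumption, since the lab text does not say which join type to use.

```python
import numpy as np
import pandas as pd

# Toy aggregated table indexed by App; the values are invented for illustration.
google_agg = pd.DataFrame(
    {"Positive": [3, 0, 0], "Total": [4, 2, 0]},
    index=pd.Index(["A", "B", "C"], name="App"),
)

# np.where evaluates both branches, so the division itself still runs for every row.
# Masking zero totals to NaN first keeps that division well defined.
safe_total = google_agg["Total"].where(google_agg["Total"] > 0)
google_agg["Positive Ratio"] = np.where(
    google_agg["Total"] > 0,
    google_agg["Positive"] / safe_total,
    0.0,  # assumed fallback when an app has no reviews
)

# Drop the helper columns in place, as the lab requests.
google_agg.drop(columns=["Positive", "Total"], inplace=True)

# Hypothetical stand-in for google_missing_removed.
apps = pd.DataFrame({"App": ["A", "B", "D"], "Rating": [4.1, 3.9, 4.7]})

# Inner join on App (an assumption): apps without aggregated reviews are dropped.
google = apps.merge(google_agg.reset_index(), on="App", how="inner")

print(google.head())
```

Using `how="left"` instead would keep every app and leave `Positive Ratio` as `NaN` where no reviews exist, which may be preferable if those rows are to be imputed later.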