https://www.kaggle.com/datasets/iamsouravbanerjee/customer-shopping-trends-dataset?select=shopping_trends.csv



# Applied Data Lab

# Project 01: Customer Shopping Trends Dataset

![Person Shopping gif](https://cdn.dribbble.com/users/1948198/screenshots/4377223/dribble.gif)

## About Dataset


## Project Objective:

### Step 1: Handling Null Values
#### Objective:
The first step is to handle null values in the dataset effectively, ensuring that we can work with clean and complete data for analysis.

1.1. **Identify Null Values:**
   - Identify which columns in the dataset have null values and assess the extent of missing data in each.

1.2. **Approaches to Handle Null Values:**
   - **Dropping Null Records:**
     - Utilize the straightforward approach of removing records with null values. Note that this may result in data loss, but it can be necessary for certain cases.
   - **Using `fillna()` Method:** [Pandas DataFrame fillna() Method](https://www.w3schools.com/python/pandas/ref_df_fillna.asp#:~:text=The%20fillna()%20method%20replaces,in%20the%20original%20DataFrame%20instead)
     - Employ the `fillna()` method from the Pandas DataFrame to replace null values with meaningful data, considering appropriate strategies for each column.
   - **Utilizing Non-Null Fields:**
     - Use columns that have no null values as grouping keys: for example, group the data by "boat type" and fill the null values in the other columns based on each group.
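
The three approaches above can be sketched on a small stand-in frame. This is a minimal illustration, not the project's reference solution: the column names (`Boat Type`, `Length`, `Material`) and the choice of the median as fill value are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real dataset; column names are assumptions.
df = pd.DataFrame({
    "Boat Type": ["Motor Yacht", "Motor Yacht", "Sport Boat", "Sport Boat", "Sport Boat"],
    "Length": [10.0, np.nan, 6.0, 8.0, np.nan],
    "Material": ["GRP", None, "GRP", None, "Wood"],
})

# 1.1 Count nulls per column, as absolute numbers and as a share of all rows.
null_counts = df.isna().sum()
null_share = df.isna().mean()

# 1.2a Drop every record that contains at least one null.
dropped = df.dropna()

# 1.2b Fill each column with its own replacement value.
filled = df.fillna({"Material": "Unknown", "Length": df["Length"].median()})

# 1.2c Fill Length with the median of the rows that share the same Boat Type.
group_filled = df.copy()
group_filled["Length"] = df["Length"].fillna(
    df.groupby("Boat Type")["Length"].transform("median")
)
```

Comparing `len(dropped)` with `len(df)` shows how much data the drop strategy costs, which helps decide between dropping and filling.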

### Step 2: Extracting Additional Features
#### Objective:
Enhance the dataset by extracting additional features from existing columns, providing more insights for analysis.

2.1. **Extract Location Information:**
   - The "location" field holds the country and city in a single string. Split it on the `»` separator (which shows up as `Â»` when the file's UTF-8 text is decoded as Latin-1) and extract the city or other relevant parts.
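
One possible way to do the split is shown below. The column name `Location`, the sample values, and the new column names `Country` and `City` are assumptions for illustration; the pattern accepts both the clean `»` and the mis-decoded `Â»` form of the separator.

```python
import pandas as pd

# Sample values are made up; the real column may be named differently.
df = pd.DataFrame({
    "Location": [
        "Switzerland » Lake Geneva » Vésenaz",
        "Germany Â» Bönningstedt",
        "Italy",
    ]
})

# Split on the separator, tolerating surrounding spaces and the stray "Â".
parts = df["Location"].str.split(r"\s*Â?»\s*", regex=True)

# First segment is treated as the country, second (if present) as the city.
df["Country"] = parts.str[0].str.strip()
df["City"] = parts.str[1]
```

Rows without a separator get a missing value in `City`, so they can be handled together with the other nulls from Step 1.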

2.2. **Explore "Type" Field:**
   - Use methods such as `unique()` or `value_counts()` on the "type" field to see which values the column contains.
   - The field may indicate whether a boat is "new" or "used." To create a new column that distinguishes "new" from "used" boats, split the values on commas (`,`) or use a regular expression to extract this information.
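
A rough sketch of the regular-expression variant follows. The column name `Type`, the sample strings, and the new column name `Condition` are assumptions; the real values should be checked with `value_counts()` first.

```python
import re

import pandas as pd

# Made-up sample values; inspect the real column before relying on the pattern.
df = pd.DataFrame({
    "Type": [
        "new boat from stock",
        "Used boat,Diesel",
        "Used boat",
        "new boat on order,Unleaded",
        None,
    ]
})

# Explore the distinct values, including missing ones.
counts = df["Type"].value_counts(dropna=False)

# Pull out the word "new" or "used" regardless of case; no match gives NaN.
df["Condition"] = (
    df["Type"]
    .str.extract(r"\b(new|used)\b", flags=re.IGNORECASE, expand=False)
    .str.lower()
)
```

Values that mention neither word stay missing in `Condition`, which makes them easy to review separately.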

2.3. **Create Ranges for Size:**
   - Use attributes like "length" and "width" to define size ranges by finding the minimum and maximum values for these attributes. Create categories like big, small, and medium boats based on the ranges.
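
As a sketch of the idea, the range between the minimum and maximum length can be cut into three equal-width bins with `pd.cut`. The column name `Length`, the use of length alone, and equal-width bins are all assumptions; quantile-based bins or a combination of length and width would be equally valid choices.

```python
import numpy as np
import pandas as pd

# Made-up lengths in metres.
df = pd.DataFrame({"Length": [3.0, 6.0, 9.0, 12.0, 15.0]})

# Find the observed range and build three equal-width bins across it.
lo, hi = df["Length"].min(), df["Length"].max()
edges = np.linspace(lo, hi, 4)  # 4 edges delimit 3 bins

# include_lowest=True keeps the minimum value inside the first bin.
df["Size"] = pd.cut(
    df["Length"],
    bins=edges,
    labels=["small", "medium", "big"],
    include_lowest=True,
)
```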

2.4. **Convert Price to a Common Currency:**
   - Standardize the currency of the "price" field to a common currency, e.g., USD, using the provided conversion rates.

| Currency                | Official Name          | Dollar Equivalent (Approx.) |
|-------------------------|------------------------|------------------------------|
| EUR                     | Euro                   | 1 EUR ≈ 1.12 USD             |
| CHF                     | Swiss Franc            | 1 CHF ≈ 1.08 USD             |
| £ (GBP)                 | British Pound Sterling | 1 GBP ≈ 1.32 USD             |
| DKK                     | Danish Krone           | 1 DKK ≈ 0.16 USD             |
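
The conversion could look roughly like this, using the rates from the table. The sketch assumes that the "price" field is a string of the form `<currency> <amount>` in a column named `Price`; both are assumptions, as are the new column names.

```python
import pandas as pd

# Approximate rates taken from the table above.
RATES_TO_USD = {"EUR": 1.12, "CHF": 1.08, "£": 1.32, "GBP": 1.32, "DKK": 0.16}

# Made-up sample prices in the assumed "<currency> <amount>" layout.
df = pd.DataFrame({"Price": ["EUR 3490", "CHF 3337", "£ 3500", "DKK 25900"]})

# Split once on whitespace: column 0 is the currency, column 1 the amount.
parts = df["Price"].str.split(n=1, expand=True)
df["Currency"] = parts[0]
df["Amount"] = pd.to_numeric(parts[1])

# Look up each row's rate and convert; unknown currencies become NaN.
df["Price_USD"] = df["Amount"] * df["Currency"].map(RATES_TO_USD)
```

If the pound sign was mis-decoded when the file was read, the corresponding key in `RATES_TO_USD` has to match the string that actually appears in the data.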

### Step 3: Analyzing Boat Prices
#### Objective:
Perform a comprehensive analysis of boat prices based on specific criteria.

3.1. **Filter Data:**
   - Filter the dataset to include boats built from 2019 to the most recent year.

3.2. **Calculate Median Views:**
   - Calculate the median number of views of boat listings in the filtered dataset.

3.3. **Calculate Mean and Median Prices:**
   - Calculate the mean and median prices of boats in the filtered dataset, where the number of views in the last 7 days is greater than the calculated median views.

3.4. **Analyze by Type and Size:**
   - Group the dataset by "used" and "new" boat types, as well as "medium," "big," and "small" boat sizes. Calculate the mean and median prices for each of these combinations to provide insights into how prices vary based on boat type and size.
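
Steps 3.1 to 3.4 can be chained as in the sketch below. All column names (`Year Built`, `Number of views last 7 days`, `Price_USD`, `Condition`, `Size`) and all values are assumptions for illustration, and grouping the filtered data rather than the full dataset in 3.4 is one possible reading of the task.

```python
import pandas as pd

# Made-up data with the features assumed to exist after Step 2.
df = pd.DataFrame({
    "Year Built": [2018, 2019, 2020, 2020, 2021, 2021],
    "Number of views last 7 days": [500, 100, 300, 50, 400, 200],
    "Price_USD": [5000, 10000, 30000, 20000, 50000, 40000],
    "Condition": ["used", "used", "new", "used", "new", "new"],
    "Size": ["small", "small", "medium", "small", "big", "medium"],
})

# 3.1 Keep boats built in 2019 or later.
recent = df[df["Year Built"] >= 2019]

# 3.2 Median number of views within the filtered data.
median_views = recent["Number of views last 7 days"].median()

# 3.3 Mean and median price of listings viewed more often than the median.
popular = recent[recent["Number of views last 7 days"] > median_views]
mean_price = popular["Price_USD"].mean()
median_price = popular["Price_USD"].median()

# 3.4 Mean and median price for each condition/size combination.
by_type_size = recent.groupby(["Condition", "Size"])["Price_USD"].agg(["mean", "median"])
```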

## Project Starts Here

## Setting Up the Path
In this cell, the `PATH` variable is set to the current working directory, i.e. the directory where the notebook is open, so that the dataset file can be loaded from this location.

In [None]:
import pandas as pd

In [None]:
# Run this cell
import os
PATH = os.getcwd() + '/'
PATH

'/content/'

**ONLY FOR GOOGLE COLAB USERS**

For those who are using **Google Colab**, uncomment and run the cell below.

**Note**: You have to replace the value of the variable `YOUR_PATH_TO_DATASET_DIRECTORY` with the path to the folder in your Google Drive where the dataset is stored.



In [None]:
# from google.colab import drive
# drive.mount('/content/drive/')
# YOUR_PATH_TO_DATASET_DIRECTORY = "work/Applied_Data_Lab/phase_2"
# PATH = "/content/drive/MyDrive/"+YOUR_PATH_TO_DATASET_DIRECTORY+"/"
# PATH

Importing the `boat_data.csv` file into the `data` variable.
