# Capstone Project: Data Analytics for Shopping Cart Database

## A) Business Background

### Business Objective
Using historical shopping cart data, we aim to increase sales and revenue by providing insights and recommendations to the product management, sales, and marketing teams.

### Business Question
How can we use historical shopping cart data to increase sales and revenue by providing insights and recommendations to the product management, sales, and marketing teams?

### Hypothesis
If we can identify and understand the key drivers of sales and revenue performance across product categories over the 10 months from January to October 2021, then we will be able to increase sales significantly.

## B) Data Preparation

### Data Collection
We will collect historical shopping cart data from January to October 2021. This data includes customer ID, product ID, order quantity, order date, delivery date, total sales, price per product, and customer age.

#### **Dataset:**

* https://docs.google.com/spreadsheets/d/1Q16Wmij2wxoziNtpc3ubbf1Ms3fjOGy5jNMVU5UHbCo/edit#gid=403512802


#### **Data Dictionary:**

* https://docs.google.com/spreadsheets/d/1Q16Wmij2wxoziNtpc3ubbf1Ms3fjOGy5jNMVU5UHbCo/edit#gid=510760296

### Libraries Used

#### **NumPy (import numpy as np)**
NumPy is a library for array manipulation and mathematical operations on arrays. It provides an efficient n-dimensional array data structure and vectorized functions for working with numerical data.
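As a small illustration of vectorized arithmetic, the sketch below recomputes `total_price` from `price_per_unit` and `quantity_sales` for the five preview rows shown in the Import Dataset section. The numbers are typed in by hand (already converted from the `"$106,00"` text format), so this is an illustration rather than part of the pipeline.

```python
import numpy as np

# price_per_unit and quantity_sales for the first five rows of the dataset
price_per_unit = np.array([106.0, 118.0, 96.0, 106.0, 113.0])
quantity_sales = np.array([2, 1, 3, 2, 3])

# Element-wise multiplication over whole arrays, no Python loop needed
total_price = price_per_unit * quantity_sales
print(total_price)        # [212. 118. 288. 212. 339.]
print(total_price.sum())  # 1169.0
```

The results match the `total_price` column of the preview, which is a quick way to sanity-check a derived column.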

#### **Pandas (import pandas as pd)**
Pandas is a library used for data manipulation and analysis. Pandas provides data structures such as DataFrame, which make it easy to process and analyze tabular data.
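Since the hypothesis is about drivers of revenue across product categories, a typical pandas step is a `groupby` aggregation. The minimal sketch below uses a hand-typed subset of the five preview rows (prices already numeric); on the real data the same call would run on the full `df` after its price columns are converted to numbers.

```python
import pandas as pd

# Hand-typed subset of the first five rows (prices already converted to numbers)
sample = pd.DataFrame({
    "product_id": [218, 481, 2, 1002, 691],
    "product_type": ["Shirt", "Jacket", "Shirt", "Trousers", "Jacket"],
    "quantity_sales": [2, 1, 3, 2, 3],
    "total_price": [212.0, 118.0, 288.0, 212.0, 339.0],
})

# Units sold and revenue per product category, highest revenue first
by_category = (
    sample.groupby("product_type")[["quantity_sales", "total_price"]]
    .sum()
    .sort_values("total_price", ascending=False)
)
print(by_category)
```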

#### **Matplotlib (import matplotlib.pyplot as plt)**
Matplotlib is a library for creating 2D visualizations. With Matplotlib, you can create many types of plots, such as line plots, bar plots, and scatter plots.
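A bar chart is a natural fit for comparing revenue across product categories. The sketch below assumes the per-category totals have already been computed; the values are the category sums of the five preview rows, typed in by hand. In a notebook the figure is rendered when the cell finishes.

```python
import matplotlib.pyplot as plt

# Revenue per product category, summed by hand from the five preview rows
categories = ["Shirt", "Jacket", "Trousers"]
revenue = [500.0, 457.0, 212.0]

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(categories, revenue, color="steelblue")  # one bar per category
ax.set_xlabel("Product type")
ax.set_ylabel("Revenue")
ax.set_title("Revenue by product type (sample rows)")
fig.tight_layout()
```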

#### **Seaborn (import seaborn as sns)**
Seaborn is a library built on top of Matplotlib that provides a high-level interface for creating statistical plots. Seaborn makes it easier to create well-styled plots and provides additional functions for adding statistical elements to them.

### Import Libraries

```python
# Import necessary libraries
import numpy as np  # numerical array operations
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt  # data visualization
import seaborn as sns  # statistical data visualization
```

### Import Dataset

```python
# Import the dataset from the Google Sheets spreadsheet
sheet_url = 'https://docs.google.com/spreadsheets/d/1Q16Wmij2wxoziNtpc3ubbf1Ms3fjOGy5jNMVU5UHbCo/edit#gid=403512802'
# Rewrite the "edit" URL into the CSV export URL for the same sheet tab (gid)
sheet_url_replace = sheet_url.replace('/edit#gid=', '/export?format=csv&gid=')

print(sheet_url_replace)  # show the CSV export link

df = pd.read_csv(sheet_url_replace)  # load the CSV into a pandas DataFrame
df.head(5)  # preview the first 5 rows to see how the dataset looks
```

https://docs.google.com/spreadsheets/d/1Q16Wmij2wxoziNtpc3ubbf1Ms3fjOGy5jNMVU5UHbCo/export?format=csv&gid=403512802


| Unnamed: 0 | sales_id | order_id | product_id | price_per_unit | quantity_sales | total_price | customer_id | payment | order_date | delivery_date | ... | home_address | city | state | product_type | product_name | size | colour | price | quantity_products | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 218 | $106,00 | 2 | $212,00 | 64 | $30.811,00 | 8/30/2021 | 9/24/2021 | ... | 4927 Alice MeadowApt. 960 | Sanfordborough | South Australia | Shirt | Chambray | L | orange | $105,00 | 44 | A orange coloured, L sized, Chambray Shirt |
| 1 | 1 | 1 | 481 | $118,00 | 1 | $118,00 | 64 | $30.811,00 | 8/30/2021 | 9/24/2021 | ... | 4927 Alice MeadowApt. 960 | Sanfordborough | South Australia | Jacket | Puffer | S | indigo | $110,00 | 62 | A indigo coloured, S sized, Puffer Jacket |
| 2 | 2 | 1 | 2 | $96,00 | 3 | $288,00 | 64 | $30.811,00 | 8/30/2021 | 9/24/2021 | ... | 4927 Alice MeadowApt. 960 | Sanfordborough | South Australia | Shirt | Oxford Cloth | M | red | $114,00 | 54 | A red coloured, M sized, Oxford Cloth Shirt |
| 3 | 3 | 1 | 1002 | $106,00 | 2 | $212,00 | 64 | $30.811,00 | 8/30/2021 | 9/24/2021 | ... | 4927 Alice MeadowApt. 960 | Sanfordborough | South Australia | Trousers | Wool | M | blue | $111,00 | 52 | A blue coloured, M sized, Wool Trousers |
| 4 | 4 | 1 | 691 | $113,00 | 3 | $339,00 | 64 | $30.811,00 | 8/30/2021 | 9/24/2021 | ... | 4927 Alice MeadowApt. 960 | Sanfordborough | South Australia | Jacket | Parka | S | indigo | $119,00 | 53 | A indigo coloured, S sized, Parka Jacket |
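The preview shows that the monetary columns are stored as text with a `$` sign, a dot as thousands separator and a comma as decimal separator (e.g. `"$30.811,00"`), and that the dates are `M/D/YYYY` strings. The sketch below converts them before any numeric analysis. It assumes every currency value follows that same pattern; the helper `parse_currency` and the `delivery_days` column are hypothetical names, and on the real data you would apply the same steps to `df` instead of the small `raw` frame.

```python
import pandas as pd

def parse_currency(series: pd.Series) -> pd.Series:
    """Convert strings like "$30.811,00" (dot = thousands, comma = decimal) to floats."""
    cleaned = (
        series.str.replace("$", "", regex=False)
        .str.replace(".", "", regex=False)   # drop the thousands separator
        .str.replace(",", ".", regex=False)  # comma decimal -> dot decimal
    )
    return pd.to_numeric(cleaned)

# Values copied from the first three preview rows
raw = pd.DataFrame({
    "price_per_unit": ["$106,00", "$118,00", "$96,00"],
    "quantity_sales": [2, 1, 3],
    "total_price": ["$212,00", "$118,00", "$288,00"],
    "payment": ["$30.811,00", "$30.811,00", "$30.811,00"],
    "order_date": ["8/30/2021", "8/30/2021", "8/30/2021"],
    "delivery_date": ["9/24/2021", "9/24/2021", "9/24/2021"],
})

money_cols = ["price_per_unit", "total_price", "payment"]
raw[money_cols] = raw[money_cols].apply(parse_currency)

# Parse month/day/year strings into datetimes
for col in ["order_date", "delivery_date"]:
    raw[col] = pd.to_datetime(raw[col], format="%m/%d/%Y")

# Hypothetical derived feature: days between order and delivery
raw["delivery_days"] = (raw["delivery_date"] - raw["order_date"]).dt.days

# Sanity check: unit price x quantity should reproduce the stored total
consistent = (raw["price_per_unit"] * raw["quantity_sales"] == raw["total_price"]).all()
print(raw.dtypes)
print("totals consistent:", consistent)
```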


## C) Data Cleaning

Data cleaning was carried out by our data cleaning team in the spreadsheet before import, following the team's cleaning guidelines. This process included removing or imputing missing values, eliminating duplicate records, and ensuring that each column is in the correct format.
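Because cleaning happened in the spreadsheet, a quick re-check after loading can confirm that the exported CSV still meets those criteria. The helper `cleaning_report` below is a hypothetical sketch, shown on a toy frame with one missing value and one duplicated row; in the notebook you would call `cleaning_report(df)`.

```python
import pandas as pd

def cleaning_report(frame: pd.DataFrame) -> dict:
    """Summarise missing values, duplicate rows, and non-numeric columns."""
    return {
        "rows": len(frame),
        "missing_values": int(frame.isna().sum().sum()),
        "duplicate_rows": int(frame.duplicated().sum()),
        "non_numeric_columns": frame.select_dtypes(exclude="number").columns.tolist(),
    }

# Toy frame for illustration only: row 2 duplicates row 1, row 3 has a missing price
toy = pd.DataFrame({
    "sales_id": [0, 1, 1, 2],
    "product_id": [218, 481, 481, 2],
    "total_price": ["$212,00", "$118,00", "$118,00", None],
})
print(cleaning_report(toy))
```

If the ID columns (such as `sales_id`) are always unique, a full-row `duplicated()` will never flag anything; passing `subset=` with the business columns to `DataFrame.duplicated` checks for repeated records that differ only by ID.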

## D) Exploratory Data Analysis (EDA)