---
<a name = Section2></a>
# **1. Problem Statement**
---

- Retail companies rely heavily on **customer purchases** to generate **revenue** and maintain **profitability**. 

- With thousands of products and customers, understanding sales patterns, customer behavior, and product performance can be **challenging** but is crucial for **strategic decision-making**.

- Analyzing this data helps businesses identify **high-value customers**, forecast **demand**, and optimize **pricing strategies** for better **profitability**.


---

<a name = Section21></a>
### **Scenario**

- Imagine you are part of the **data science team** at a leading retail company.

- The company is facing challenges with **declining profits** in some regions, **ineffective discounts**, and the inability to **target customers effectively**.

- You have been tasked with analyzing the company’s **order and sales data** to:

  - Uncover **key insights** about customer behavior and product performance.
  - Identify **trends** in sales, discounts, and profitability across different regions, categories, and segments.
  - Provide **actionable recommendations** to improve **sales** and **profitability**.

- The objectives are to:

  - Perform a **statistical analysis** of factors affecting **profitability** and **sales**.
  - Conduct an **Exploratory Data Analysis** (EDA) with **visualizations** and **storytelling** to help stakeholders understand the trends.
  - Offer **data-driven recommendations** to improve the **business’s bottom line**.

## 2. Data Dictionary & Description
The dataset contains information about customer orders and product sales in a retail environment. Each record represents one product line within an order, with details about the customer, product, and transaction; an order with several products spans several rows that share the same Order ID.

| Records | Features |
|---------|----------|
| 9994    | 21       |

## Feature Details

| ID  | Feature Name        | Description of the Feature                                      |
|-----|---------------------|----------------------------------------------------------------|
| 01  | Row ID              | Unique identifier for each row in the dataset.                 |
| 02  | Order ID            | Unique identifier for each order.                              |
| 03  | Order Date          | Date when the order was placed.                                |
| 04  | Ship Date           | Date when the order was shipped.                               |
| 05  | Ship Mode           | Mode of shipping (e.g., Standard, Second Class).               |
| 06  | Customer ID         | Unique identifier for the customer.                            |
| 07  | Customer Name       | Full name of the customer.                                     |
| 08  | Segment             | Customer segment (e.g., Consumer, Corporate).                  |
| 09  | Country             | Country where the customer is located.                        |
| 10  | City                | City where the customer is located.                           |
| 11  | State               | State where the customer is located.                          |
| 12  | Postal Code         | Postal code of the customer's location.                       |
| 13  | Region              | Region of the customer.                                       |
| 14  | Product ID          | Unique identifier for the product.                            |
| 15  | Category            | Category of the product (e.g., Furniture, Office Supplies).   |
| 16  | Sub-Category        | Sub-category of the product (e.g., Bookcases, Chairs).        |
| 17  | Product Name        | Full name of the product.                                     |
| 18  | Sales               | Total sales amount for the product in the order.              |
| 19  | Quantity            | Number of units of the product sold.                          |
| 20  | Discount            | Discount applied, stored as a fraction (e.g., 0.2 = 20%).      |
| 21  | Profit              | Profit generated from the order.                              |
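
The dictionary above can double as a lightweight schema check before any analysis starts. The following is a minimal sketch, assuming the column names in the loaded frame match the "Feature Name" column exactly. `EXPECTED_COLUMNS`, `missing_columns`, and `unexpected_columns` are hypothetical names introduced here for illustration; they are not part of the original notebook.

```python
import pandas as pd

# Column names copied from the data dictionary above.
EXPECTED_COLUMNS = [
    "Row ID", "Order ID", "Order Date", "Ship Date", "Ship Mode",
    "Customer ID", "Customer Name", "Segment", "Country", "City",
    "State", "Postal Code", "Region", "Product ID", "Category",
    "Sub-Category", "Product Name", "Sales", "Quantity", "Discount",
    "Profit",
]


def missing_columns(df: pd.DataFrame) -> list:
    """Return dictionary columns that are absent from df (hypothetical helper)."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]


def unexpected_columns(df: pd.DataFrame) -> list:
    """Return columns of df that the dictionary does not describe (hypothetical helper)."""
    return [col for col in df.columns if col not in EXPECTED_COLUMNS]


# Demo on an empty frame that lacks "Profit" and carries a stray index column.
demo = pd.DataFrame(columns=EXPECTED_COLUMNS[:-1] + ["Unnamed: 0"])
print(missing_columns(demo))     # ['Profit']
print(unexpected_columns(demo))  # ['Unnamed: 0']
```

Running both helpers on the real frame right after loading would surface renamed or extra columns early, before they cause a confusing `KeyError` further down.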

## Dataset Link 
link = "https://raw.githubusercontent.com/vasudevgupta31/acadamic_datasets/master/eda_2_sample_%20superstore_large.csv"

## **3. Importing Libraries**

In [6]:
#-------------------------------------------------------------------------------------------------------------------------------
import pandas as pd                                                 # Importing pandas for tabular data handling
from ydata_profiling import ProfileReport                           # Importing ProfileReport for automated data profiling
pd.set_option('display.max_columns', None)                          # Show every column when a DataFrame is displayed
pd.set_option('display.max_rows', 20)                               # Show at most 20 rows when a DataFrame is displayed
pd.set_option('display.float_format', lambda x: '%.5f' % x)         # Show floats with 5 decimals instead of scientific notation
#-------------------------------------------------------------------------------------------------------------------------------
import numpy as np                                                  # Importing NumPy (Numerical Python)
#-------------------------------------------------------------------------------------------------------------------------------
import matplotlib.pyplot as plt                                     # Importing pyplot interface of matplotlib
import seaborn as sns                                               # Importing seaborn library for statistical visualization
#-------------------------------------------------------------------------------------------------------------------------------
import warnings                                                     # Importing warnings to control warning messages
warnings.filterwarnings("ignore")                                   # Suppress all warnings

In [8]:
data = pd.read_csv(filepath_or_buffer="https://raw.githubusercontent.com/vasudevgupta31/acadamic_datasets/master/eda_2_sample_%20superstore_large.csv", 
                   encoding='latin1')
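
The `encoding='latin1')` argument matters when a file contains bytes that are not valid UTF-8, which is pandas' default. Whether the superstore file actually contains such bytes is an assumption here; the sketch below only illustrates the mechanism on a hypothetical two-line CSV built in memory.

```python
import io

import pandas as pd

# Hypothetical CSV with one non-ASCII character, encoded as Latin-1.
raw = "Product Name,Sales\nCafé Table,10.5\n".encode("latin1")

# Reading with the default UTF-8 encoding fails on the Latin-1 byte for "é".
try:
    pd.read_csv(io.BytesIO(raw))
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Declaring the encoding explicitly decodes the same bytes correctly.
df = pd.read_csv(io.BytesIO(raw), encoding="latin1")
print(utf8_ok)
print(df.loc[0, "Product Name"])
```

Latin-1 maps every byte to a character, so it never raises a decode error. If the file were really in another encoding, the text would load silently with wrong characters, so spot-checking a few names is worthwhile.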

In [10]:
data.head()

Unnamed: 0,Row ID,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,Country,City,State,Postal Code,Region,Product ID,Category,Sub-Category,Product Name,Sales,Quantity,Discount,Profit
0,1,CA-2016-152156,11/8/2016,11/11/2016,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420,South,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.96,2,0.0,41.9136
1,2,CA-2016-152156,11/8/2016,11/11/2016,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420,South,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.94,3,0.0,219.582
2,3,CA-2016-138688,6/12/2016,6/16/2016,Second Class,DV-13045,Darrin Van Huff,Corporate,United States,Los Angeles,California,90036,West,OFF-LA-10000240,Office Supplies,Labels,Self-Adhesive Address Labels for Typewriters b...,14.62,2,0.0,6.8714
3,4,US-2015-108966,10/11/2015,10/18/2015,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,33311,South,FUR-TA-10000577,Furniture,Tables,Bretford CR4500 Series Slim Rectangular Table,957.5775,5,0.45,-383.031
4,5,US-2015-108966,10/11/2015,10/18/2015,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,33311,South,OFF-ST-10000760,Office Supplies,Storage,Eldon Fold 'N Roll Cart System,22.368,2,0.2,2.5164
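
The preview shows dates stored as month/day/year strings and at least one row with a negative profit. As a minimal sketch of how such columns could be prepared for analysis, the block below rebuilds three of the rows shown above (a subset of columns only) and derives two illustrative fields. The column names `Days to Ship` and `Profit Margin` are hypothetical additions, not part of the dataset.

```python
import pandas as pd

# Three rows copied from the data.head() output above (subset of columns).
sample = pd.DataFrame({
    "Order Date": ["11/8/2016", "6/12/2016", "10/11/2015"],
    "Ship Date": ["11/11/2016", "6/16/2016", "10/18/2015"],
    "Sales": [261.96, 14.62, 957.5775],
    "Discount": [0.0, 0.0, 0.45],
    "Profit": [41.9136, 6.8714, -383.031],
})

# Dates are month/day/year strings, so parse them with an explicit format.
for col in ["Order Date", "Ship Date"]:
    sample[col] = pd.to_datetime(sample[col], format="%m/%d/%Y")

# Hypothetical derived columns: shipping delay in days and profit per unit of sales.
sample["Days to Ship"] = (sample["Ship Date"] - sample["Order Date"]).dt.days
sample["Profit Margin"] = sample["Profit"] / sample["Sales"]

print(sample[["Discount", "Days to Ship", "Profit Margin"]])
```

On these three rows the only discounted line (45%) is also the only loss-making one, which is the kind of pattern the later analysis would need to test on the full dataset rather than infer from a preview.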

