# Blinkit Sales: Exploratory Data Analysis with Python

### Purpose:
This project aims to analyze sales data to uncover trends, optimize inventory management, and evaluate marketing effectiveness. By leveraging Python libraries such as Pandas, Matplotlib, and NumPy, I will explore patterns in order performance, delivery efficiency, and inventory levels, leading to actionable business insights.


### Methodology:
1. Data Cleaning & Preprocessing:
   - Load and inspect the datasets using Pandas.
   - Handle missing values and inconsistent data entries.
   - Convert categorical data into appropriate formats.
   - Merge relevant datasets to create a unified view for analysis.
2. Exploratory Data Analysis (EDA):
   - Generate summary statistics for key numerical and categorical variables.
   - Analyze sales distribution and seasonal trends.
   - Identify top-selling and least-selling products.
   - Investigate correlations between pricing, sales volume, and marketing efforts.
3. Data Visualization:
   - Create bar charts and histograms to show sales trends across different categories.
   - Develop time-series plots to analyze seasonal patterns in sales.
   - Use heatmaps to identify relationships between different features (e.g., sales and marketing spend).
   - Visualize inventory turnover and stockout trends.
4. Research Questions:
   - What are the top-selling products and categories based on sales revenue and order volume?
   - How does pricing impact sales performance?
   - What is the relationship between marketing spend and sales growth?
   - How efficient is the delivery process, and what factors influence delivery times?
   - Are there seasonal patterns in sales, and how do they affect inventory levels?
   - How can inventory management be optimized to reduce stockouts and overstocking?
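
As a rough illustration of step 1 (cleaning and merging), the sketch below uses small in-memory frames rather than the real CSVs. The `items` frame and its `order_id`, `product_id`, and `quantity` columns are assumptions made for the example, and all values are made up. Filling missing prices with the median is only one possible choice.

```python
import pandas as pd

# Toy product table; column names follow the products file, values are made up.
products = pd.DataFrame({
    "product_id": [1, 2, 3],
    "category": ["Pharmacy", "pharmacy ", None],
    "price": [100.0, None, 50.0],
})

# Hypothetical order-items table (column names are assumptions).
items = pd.DataFrame({
    "order_id": [10, 10, 11],
    "product_id": [1, 2, 3],
    "quantity": [2, 1, 4],
})

# Normalize inconsistent text entries, then fill missing values.
products["category"] = (
    products["category"].str.strip().str.title().fillna("Unknown")
)
products["price"] = products["price"].fillna(products["price"].median())

# Store the low-cardinality text column as a categorical.
products["category"] = products["category"].astype("category")

# Merge into one view; validate= raises if product_id is not unique in products.
merged = items.merge(products, on="product_id", how="left", validate="many_to_one")
merged["line_total"] = merged["quantity"] * merged["price"]
print(merged)
```

The `validate="many_to_one"` argument is a cheap guard against accidental row duplication when the lookup table has repeated keys.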

### Tools & Technologies:
- Python Libraries: Pandas, NumPy, and Matplotlib.
- Jupyter Notebook: For code execution and analysis.

### Documentation Overview:

This dataset consists of nine CSV files, each covering a different category of information. For this initial analysis in Python, I will keep the files separate; later, I will use SQL to manipulate the tables and produce further insights. This document is organized by file, and the EDA process is repeated for each one. I will clean and explore each file individually, then lay out all of my findings at the end in a report containing key business insights and recommendations.

### File 1: Blinkit Products
This file contains key product information. Upon viewing it, I developed these questions to answer:
- What is the average list price and profit margin across all products?
- Which products have the highest and lowest profit margin?
- fill

\
*Note: the 'mrp' column stands for Maximum Retail Price.*

In [28]:
# importing libraries
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# loading csv into a dataframe
product_df = pd.read_csv('/Users/tovi/Documents/blinkit_products.csv')
product_df

Unnamed: 0,product_id,product_name,category,brand,price,mrp,margin_percentage,shelf_life_days,min_stock_level,max_stock_level
0,153019,Onions,Fruits & Vegetables,Aurora LLC,947.95,1263.93,25.0,3,13,88
1,11422,Potatoes,Fruits & Vegetables,Ramaswamy-Tata,127.16,169.55,25.0,3,20,65
2,669378,Potatoes,Fruits & Vegetables,Chadha and Sons,212.14,282.85,25.0,3,23,70
3,848226,Tomatoes,Fruits & Vegetables,Barad and Sons,209.59,279.45,25.0,3,10,51
4,890623,Onions,Fruits & Vegetables,"Sangha, Nagar and Varty",354.52,472.69,25.0,3,27,55
...,...,...,...,...,...,...,...,...,...,...
263,444361,Pain Reliever,Pharmacy,"Prakash, Bawa and Kale",822.63,1028.29,20.0,365,20,71
264,679284,Cough Syrup,Pharmacy,Pant LLC,877.89,1097.36,20.0,365,28,95
265,240179,Cough Syrup,Pharmacy,Ram-Suri,90.56,113.20,20.0,365,20,56
266,673058,Cough Syrup,Pharmacy,Balan-Madan,765.76,957.20,20.0,365,30,94
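
In the rows displayed above, `margin_percentage` appears to equal the gap between `mrp` and `price`, expressed as a percentage of `mrp`. This is an observation from the visible sample, not a documented definition, so the sketch below treats it as an assumption and checks it on a few rows copied from the table.

```python
import numpy as np
import pandas as pd

# Rows copied from the product table shown above.
sample = pd.DataFrame({
    "price": [947.95, 127.16, 822.63, 90.56],
    "mrp": [1263.93, 169.55, 1028.29, 113.20],
    "margin_percentage": [25.0, 25.0, 20.0, 20.0],
})

# Assumed relationship: margin is the discount from MRP, as a percentage of MRP.
sample["implied_margin"] = (sample["mrp"] - sample["price"]) / sample["mrp"] * 100

# Allow a small tolerance because prices are rounded to two decimals.
sample["matches"] = np.isclose(
    sample["implied_margin"], sample["margin_percentage"], atol=0.05
)
print(sample.round(2))
```

Running the same comparison on the full `product_df` would show whether the relationship holds for every product or only for the rows displayed.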


In [29]:
# general descriptive stats
product_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 268 entries, 0 to 267
Data columns (total 10 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   product_id         268 non-null    int64  
 1   product_name       268 non-null    object 
 2   category           268 non-null    object 
 3   brand              268 non-null    object 
 4   price              268 non-null    float64
 5   mrp                268 non-null    float64
 6   margin_percentage  268 non-null    float64
 7   shelf_life_days    268 non-null    int64  
 8   min_stock_level    268 non-null    int64  
 9   max_stock_level    268 non-null    int64  
dtypes: float64(3), int64(4), object(3)
memory usage: 21.1+ KB


In [30]:
# checking for null values
null_check = product_df.isnull().sum()
print(null_check)

product_id           0
product_name         0
category             0
brand                0
price                0
mrp                  0
margin_percentage    0
shelf_life_days      0
min_stock_level      0
max_stock_level      0
dtype: int64


There are no null values, so this file is ready for analysis. First, let's take a look at some averages:

In [32]:
product_df[['price', 'margin_percentage']].mean()

price                488.356828
margin_percentage     27.779851
dtype: float64

The average product list price is about 488.36 (in the dataset's currency units), and the average profit margin across all products is about 27.78%.
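
The second question for this file (which products have the highest and lowest profit margin) could be approached along these lines. This is a minimal sketch on a handful of rows copied from the product table shown earlier, not the full analysis; the per-category averages are an extra step added for illustration.

```python
import pandas as pd

# Rows copied from the product table shown earlier (a subset, for illustration).
sample = pd.DataFrame({
    "product_name": ["Onions", "Potatoes", "Pain Reliever", "Cough Syrup"],
    "category": ["Fruits & Vegetables", "Fruits & Vegetables", "Pharmacy", "Pharmacy"],
    "price": [947.95, 127.16, 822.63, 877.89],
    "margin_percentage": [25.0, 25.0, 20.0, 20.0],
})

# Many products share the same margin, so filter on the extreme values
# instead of picking a single row with idxmax/idxmin.
max_margin = sample["margin_percentage"].max()
min_margin = sample["margin_percentage"].min()
highest = sample[sample["margin_percentage"] == max_margin]
lowest = sample[sample["margin_percentage"] == min_margin]

# Average price and margin per category.
by_category = sample.groupby("category")[["price", "margin_percentage"]].mean()

print(highest[["product_name", "margin_percentage"]])
print(lowest[["product_name", "margin_percentage"]])
print(by_category)
```

On the full `product_df` the extremes will differ from this sample: an overall mean of 27.78% implies that some products have margins above 25%.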

### File 2: Blinkit Deliveries
This file contains key information on delivery times.\
Research Questions:
- How frequently are items not delivered on time?
- Are any delivery partners, customers, or stores having repeated issues with delivery?

In [35]:
# loading csv into a dataframe
delivery_df = pd.read_csv('/Users/tovi/Documents/blinkit_orders.csv')
delivery_df.head()

Unnamed: 0,order_id,customer_id,order_date,promised_delivery_time,actual_delivery_time,delivery_status,order_total,payment_method,delivery_partner_id,store_id
0,1961864118,30065862,2024-07-17 08:34:01,2024-07-17 08:52:01,2024-07-17 08:47:01,On Time,3197.07,Cash,63230,4771
1,1549769649,9573071,2024-05-28 13:14:29,2024-05-28 13:25:29,2024-05-28 13:27:29,On Time,976.55,Cash,14983,7534
2,9185164487,45477575,2024-09-23 13:07:12,2024-09-23 13:25:12,2024-09-23 13:29:12,On Time,839.05,UPI,39859,9886
3,9644738826,88067569,2023-11-24 16:16:56,2023-11-24 16:34:56,2023-11-24 16:33:56,On Time,440.23,Card,61497,7917
4,5427684290,83298567,2023-11-20 05:00:39,2023-11-20 05:17:39,2023-11-20 05:18:39,On Time,2526.68,Cash,84315,2741


In [36]:
# general descriptive stats
delivery_df.info()


In [37]:
# checking for null values
null_check = delivery_df.isnull().sum()
print(null_check)

order_id                  0
customer_id               0
order_date                0
promised_delivery_time    0
actual_delivery_time      0
delivery_status           0
order_total               0
payment_method            0
delivery_partner_id       0
store_id                  0
dtype: int64
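
To start on the first research question (how often orders miss the promised time), one option is to compute the delay in minutes directly from the two timestamp columns. The sketch below does this on the five rows shown in `delivery_df.head()`; it assumes the timestamp columns are read from the CSV as strings and need converting, which is typical but not confirmed here.

```python
import pandas as pd

# Rows copied from delivery_df.head() above.
sample = pd.DataFrame({
    "order_id": [1961864118, 1549769649, 9185164487, 9644738826, 5427684290],
    "promised_delivery_time": [
        "2024-07-17 08:52:01", "2024-05-28 13:25:29", "2024-09-23 13:25:12",
        "2023-11-24 16:34:56", "2023-11-20 05:17:39",
    ],
    "actual_delivery_time": [
        "2024-07-17 08:47:01", "2024-05-28 13:27:29", "2024-09-23 13:29:12",
        "2023-11-24 16:33:56", "2023-11-20 05:18:39",
    ],
    "delivery_status": ["On Time"] * 5,
    "delivery_partner_id": [63230, 14983, 39859, 61497, 84315],
})

# Timestamps read from CSV are usually plain strings; convert them first.
for col in ["promised_delivery_time", "actual_delivery_time"]:
    sample[col] = pd.to_datetime(sample[col])

# Positive values mean the order arrived after the promised time.
sample["delay_minutes"] = (
    sample["actual_delivery_time"] - sample["promised_delivery_time"]
).dt.total_seconds() / 60

# Share of orders per status, and mean delay per delivery partner.
status_share = sample["delivery_status"].value_counts(normalize=True)
delay_by_partner = sample.groupby("delivery_partner_id")["delay_minutes"].mean()

print(sample[["order_id", "delivery_status", "delay_minutes"]])
print(status_share)
```

In these five rows, three orders arrive one to four minutes after the promised time yet are labelled "On Time", so `delivery_status` seems to allow some tolerance. The exact threshold is not documented here. The same `groupby` pattern applies to `customer_id` and `store_id` for the second research question.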


### File 3: Blinkit Units
The third file contains information on unit sales and pricing:

In [None]:
df3 = pd.read_csv('/Users/tovi/Documents/blinkit_order_items.csv')
df3

The fourth file:

In [None]:
df4 = pd.read_csv('/Users/tovi/Documents/blinkit_marketing_performance.csv')
df4

The fifth file: **NOTE: new version or original version?**

In [None]:
df5 = pd.read_csv('/Users/tovi/Documents/blinkit_inventoryNew.csv')
df5

The sixth file: **NOTE**

In [None]:
df6 = pd.read_csv('/Users/tovi/Documents/blinkit_inventory.csv')
df6

The seventh file: **NOTE: may be excluded**

In [None]:
df7 = pd.read_csv('/Users/tovi/Documents/blinkit_delivery_performance.csv')
df7

The eighth file:

In [None]:
df8 = pd.read_csv('/Users/tovi/Documents/blinkit_customers.csv')
df8

The ninth file:

In [None]:
df9 = pd.read_csv('/Users/tovi/Documents/blinkit_customer_feedback.csv')
df9
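
Since every file is loaded the same way from the same folder, the repeated `pd.read_csv` calls could be replaced by one small helper. The sketch below is one way to do it; `load_csvs` is a hypothetical helper name, and the demo writes a throwaway CSV with a made-up `customer_id` column so that it runs without the real data.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_csvs(data_dir, names):
    """Read <data_dir>/<name>.csv for each name into a dict of DataFrames."""
    data_dir = Path(data_dir)
    return {name: pd.read_csv(data_dir / f"{name}.csv") for name in names}


# Demo with a throwaway folder so the sketch runs anywhere; in the notebook,
# data_dir would be the folder that holds the Blinkit CSVs.
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"customer_id": [1, 2]}).to_csv(
        Path(tmp) / "blinkit_customers.csv", index=False
    )
    frames = load_csvs(tmp, ["blinkit_customers"])

for name, frame in frames.items():
    print(name, frame.shape)
```

Keeping the frames in a dictionary keyed by file name avoids opaque names such as `df3` and makes it easy to loop over all files for the same checks (`info()`, null counts).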

**TO DO**
- Create a separate document for the SQL analysis and its documentation.
- Upload the CSVs to pgAdmin.
- Select which files to use for the Python EDA (as opposed to SQL): upload, view, and narrow down the list, then explain at the beginning why certain files were left out of the Python analysis.
- Decide whether any files should be combined.
- Choose the top 2-3 files for the Matplotlib and NumPy demo.
- Build a Tableau dashboard.

## Exploratory 