# Project - Cohort Analysis for Assessing Customer Retention in the E-commerce Industry

## 02 - Exploratory Data Analysis (EDA) - Sales & Customer Behavior

In this notebook, we explore the dataset to better understand:

- Overall sales trends over time
- Top-selling products
- Customer activity and purchasing frequency
- Patterns in purchase behavior (quantity, price, time)

This will help us identify key insights before moving on to modeling or segmentation.

Dataset: `Cleaned_Dataset_ecommerce.csv`

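```python
# Import necessary libraries
import os

import matplotlib.pyplot as plt
import pandas as pd

# Path to the cleaned dataset, relative to this notebook's folder
dataset_path = os.path.join("..", "dataset", "Cleaned_Dataset_ecommerce.csv")
# Alternative: load the pickled copy instead
# dataset_path = os.path.join("..", "dataset", "Cleaned_Dataset_ecommerce.pkl")

df = pd.read_csv(dataset_path)
# df = pd.read_pickle(dataset_path)

# Only the last expression in a cell is displayed, so print the shape explicitly
print(df.shape)
df
```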

|  | InvoiceNo | InvoiceDate | CustomerID | StockCode | Description | Quantity | UnitPrice | TotalPrice | Country |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 536365 | 2010-12-01 08:26:00 | 17850.0 | SC1734 | Electronics | 65 | 10.23 | 664.95 | Egypt |
| 1 | 536365 | 2010-12-01 08:26:00 | 17850.0 | SC2088 | Furniture | 95 | 19.61 | 1862.95 | Mali |
| 2 | 536365 | 2010-12-01 08:26:00 | 17850.0 | SC3463 | Books | 78 | 61.49 | 4796.22 | Mali |
| 3 | 536365 | 2010-12-01 08:26:00 | 17850.0 | SC6228 | Toys | 15 | 24.73 | 370.95 | South Africa |
| 4 | 536365 | 2010-12-01 08:26:00 | 17850.0 | SC2149 | Toys | 50 | 38.83 | 1941.50 | Rwanda |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 406824 | 581587 | 2011-12-09 12:50:00 | 12680.0 | SC5440 | Furniture | 25 | 36.53 | 913.25 | Mozambique |
| 406825 | 581587 | 2011-12-09 12:50:00 | 12680.0 | SC3576 | Cloths | 94 | 90.65 | 8521.10 | Tunisia |
| 406826 | 581587 | 2011-12-09 12:50:00 | 12680.0 | SC5283 | Furniture | 14 | 40.07 | 560.98 | Somalia |
| 406827 | 581587 | 2011-12-09 12:50:00 | 12680.0 | SC1099 | Cosmetics | 45 | 79.83 | 3592.35 | Senegal |
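
In every row shown above, `TotalPrice` equals `Quantity * UnitPrice` (for example, 65 × 10.23 = 664.95). Before aggregating revenue it can be worth confirming that this holds across the whole dataset. The sketch below runs the check on the five preview rows only; `check_total_price` is a hypothetical helper, not part of the original notebook, and the 0.01 tolerance is an assumption to absorb floating-point rounding.

```python
import pandas as pd

# Five rows copied from the preview above (only the columns needed for the check).
sample = pd.DataFrame({
    "Quantity": [65, 95, 78, 15, 50],
    "UnitPrice": [10.23, 19.61, 61.49, 24.73, 38.83],
    "TotalPrice": [664.95, 1862.95, 4796.22, 370.95, 1941.50],
})


def check_total_price(frame: pd.DataFrame, tol: float = 0.01) -> pd.Series:
    """Hypothetical helper: True where TotalPrice matches Quantity * UnitPrice within tol."""
    expected = frame["Quantity"] * frame["UnitPrice"]
    return (frame["TotalPrice"] - expected).abs() <= tol


matches = check_total_price(sample)
print(matches.all())  # True for these five rows
```

On the full data, `(~check_total_price(df)).sum()` would count the rows where the identity does not hold.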


### Sales Trends

We analyze sales performance over time to identify business trends, seasonality, and growth patterns.  
This includes tracking revenue by month and visualizing key changes that inform strategic decisions.
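
This notebook groups revenue on a monthly `Period`, which only produces months that contain at least one transaction. When looking for seasonality, a month with no sales should appear as zero rather than disappear. A hedged alternative is `resample` on the datetime column; the transactions below are hypothetical, and `"MS"` (month-start labels) is a choice made for this sketch.

```python
import pandas as pd

# Hypothetical transactions; note there is no sale at all in January 2011.
tx = pd.DataFrame({
    "InvoiceDate": pd.to_datetime([
        "2010-12-01 08:26:00",
        "2010-12-05 10:00:00",
        "2011-02-15 12:00:00",
    ]),
    "TotalPrice": [100.0, 50.0, 25.0],
})

# Resample to calendar months labelled by their first day; empty months sum to 0.
monthly = tx.set_index("InvoiceDate")["TotalPrice"].resample("MS").sum()
print(monthly)
```

Here January 2011 appears with a total of 0.0, whereas a `groupby` on `to_period('M')` would simply have no row for it.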

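```python
# Parse dates; unparseable values become NaT instead of raising an error
df['InvoiceDate'] = pd.to_datetime(df['InvoiceDate'], errors='coerce')
# Truncate each timestamp to its calendar month (e.g. 2010-12)
df['InvoiceMonth'] = df['InvoiceDate'].dt.to_period('M')
df
```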

|  | InvoiceNo | InvoiceDate | CustomerID | StockCode | Description | Quantity | UnitPrice | TotalPrice | Country | InvoiceMonth |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 536365 | 2010-12-01 08:26:00 | 17850.0 | SC1734 | Electronics | 65 | 10.23 | 664.95 | Egypt | 2010-12 |
| 1 | 536365 | 2010-12-01 08:26:00 | 17850.0 | SC2088 | Furniture | 95 | 19.61 | 1862.95 | Mali | 2010-12 |
| 2 | 536365 | 2010-12-01 08:26:00 | 17850.0 | SC3463 | Books | 78 | 61.49 | 4796.22 | Mali | 2010-12 |
| 3 | 536365 | 2010-12-01 08:26:00 | 17850.0 | SC6228 | Toys | 15 | 24.73 | 370.95 | South Africa | 2010-12 |
| 4 | 536365 | 2010-12-01 08:26:00 | 17850.0 | SC2149 | Toys | 50 | 38.83 | 1941.50 | Rwanda | 2010-12 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 406824 | 581587 | 2011-12-09 12:50:00 | 12680.0 | SC5440 | Furniture | 25 | 36.53 | 913.25 | Mozambique | 2011-12 |
| 406825 | 581587 | 2011-12-09 12:50:00 | 12680.0 | SC3576 | Cloths | 94 | 90.65 | 8521.10 | Tunisia | 2011-12 |
| 406826 | 581587 | 2011-12-09 12:50:00 | 12680.0 | SC5283 | Furniture | 14 | 40.07 | 560.98 | Somalia | 2011-12 |
| 406827 | 581587 | 2011-12-09 12:50:00 | 12680.0 | SC1099 | Cosmetics | 45 | 79.83 | 3592.35 | Senegal | 2011-12 |
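
The conversion above relies on two behaviours: `errors='coerce'` turns any value that cannot be parsed into `NaT` rather than raising, and `.dt.to_period('M')` keeps only the year and month. The sketch below illustrates both on two hypothetical raw values.

```python
import pandas as pd

# Hypothetical raw values: one valid timestamp and one malformed string.
raw = pd.Series(["2010-12-01 08:26:00", "not a date"])

# errors="coerce" replaces anything unparseable with NaT instead of raising.
dates = pd.to_datetime(raw, errors="coerce")

# to_period("M") truncates to year-month; NaT stays missing.
months = dates.dt.to_period("M")

print(months.iloc[0])       # 2010-12
print(months.isna().sum())  # 1
```

Because coercion is silent, checking `df['InvoiceDate'].isna().sum()` after the conversion shows whether any rows lost their date.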


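```python
# Total revenue per calendar month (one row per InvoiceMonth, in chronological order)
monthly_revenue = df.groupby('InvoiceMonth')['TotalPrice'].sum().reset_index()
# Rank months by revenue; reset_index() keeps the previous (chronological) position as an "index" column
monthly_revenue = monthly_revenue.sort_values("TotalPrice", ascending=False).reset_index()
```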
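
The ranked table carries an extra `index` column because `reset_index()` without arguments moves the existing index into a new column. The toy example below (hypothetical values) contrasts that default with `reset_index(drop=True)`, which discards the old index instead.

```python
import pandas as pd

# Toy monthly totals (hypothetical values), in chronological order.
monthly = pd.DataFrame({
    "InvoiceMonth": pd.period_range("2011-01", periods=3, freq="M"),
    "TotalPrice": [100.0, 300.0, 200.0],
})

ranked = monthly.sort_values("TotalPrice", ascending=False)

# Default: the old index (here 1, 2, 0) becomes a column named "index".
with_index_col = ranked.reset_index()
# drop=True: the old index is discarded and a fresh 0..n-1 index is created.
without_index_col = ranked.reset_index(drop=True)

print(list(with_index_col.columns))     # ['index', 'InvoiceMonth', 'TotalPrice']
print(list(without_index_col.columns))  # ['InvoiceMonth', 'TotalPrice']
```

Keeping the column is harmless here, since it records each month's chronological position, but `drop=True` is the usual choice when that position is not needed.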

```python
monthly_revenue
```

|  | index | InvoiceMonth | TotalPrice |
|---|---|---|---|
| 0 | 11 | 2011-11 | 167733100.0 |
| 1 | 10 | 2011-10 | 128742500.0 |
| 2 | 9 | 2011-09 | 103628100.0 |
| 3 | 5 | 2011-05 | 74341510.0 |
| 4 | 3 | 2011-03 | 71372100.0 |
| 5 | 6 | 2011-06 | 71096390.0 |
| 6 | 7 | 2011-07 | 70407890.0 |
| 7 | 8 | 2011-08 | 69943260.0 |
| 8 | 0 | 2010-12 | 67923670.0 |
| 9 | 4 | 2011-04 | 59140030.0 |
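
Sorting by revenue shows that 2011-09 through 2011-11 are the strongest months, but it loses the calendar order needed to read growth. A minimal sketch, using three consecutive months copied from the table above, restores chronological order and computes month-over-month growth with `pct_change`. The `MoMGrowth` column name is an assumption for illustration. `pct_change` compares adjacent rows, so on the full data any month with no rows should first be filled in (for example by reindexing against a complete `pd.period_range`), otherwise the "previous row" may not be the previous month.

```python
import pandas as pd

# Three consecutive months from the table above, in revenue-sorted order.
monthly_revenue = pd.DataFrame({
    "InvoiceMonth": pd.PeriodIndex(["2011-11", "2011-10", "2011-09"], freq="M"),
    "TotalPrice": [167733100.0, 128742500.0, 103628100.0],
})

# Period values sort chronologically, restoring calendar order.
chrono = monthly_revenue.sort_values("InvoiceMonth").reset_index(drop=True)

# Month-over-month growth: this month / previous month - 1 (first row is NaN).
chrono["MoMGrowth"] = chrono["TotalPrice"].pct_change()
print(chrono)
```

For these three months, revenue grows by roughly 24% from September to October and roughly 30% from October to November.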


```python
# Total units sold per country (countries in alphabetical order)
Quantity_by_country = df.groupby('Country')['Quantity'].sum().reset_index()
```

```python
Quantity_by_country
```

|  | Country | Quantity |
|---|---|---|
| 0 | Algeria | 736181 |
| 1 | Benin | 738133 |
| 2 | Burkina Faso | 728332 |
| 3 | Cameroon | 730881 |
| 4 | Cote d'Ivoire | 740229 |
| 5 | Egypt | 729299 |
| 6 | Ethiopia | 732940 |
| 7 | Ghana | 735354 |
| 8 | Kenya | 732247 |
| 9 | Libya | 739206 |
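
In the rows shown, the country totals are tightly clustered, so any ranking separates countries by small margins. One quick way to quantify that spread is the range relative to the mean. The sketch below computes it on the ten displayed rows only; `quantity_by_country` and `relative_range` are illustrative names, not from the original notebook.

```python
import pandas as pd

# The ten country totals displayed above (alphabetical order).
quantity_by_country = pd.DataFrame({
    "Country": ["Algeria", "Benin", "Burkina Faso", "Cameroon", "Cote d'Ivoire",
                "Egypt", "Ethiopia", "Ghana", "Kenya", "Libya"],
    "Quantity": [736181, 738133, 728332, 730881, 740229,
                 729299, 732940, 735354, 732247, 739206],
})

q = quantity_by_country["Quantity"]
# Relative range: (max - min) / mean, a rough measure of how much countries differ.
relative_range = (q.max() - q.min()) / q.mean()
print(f"{relative_range:.2%}")
```

For these rows the gap between the highest and lowest total is under 2% of the mean, which is worth keeping in mind before drawing conclusions from the country ranking.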


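```python
# Rank countries by total quantity; reset_index() keeps the previous (alphabetical) position as an "index" column
Quantity_by_country = Quantity_by_country.sort_values('Quantity', ascending=False).reset_index()
Quantity_by_country
```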

|  | index | Country | Quantity |
|---|---|---|---|
| 0 | 23 | Togo | 741223 |
| 1 | 20 | South Africa | 740589 |
| 2 | 4 | Cote d'Ivoire | 740229 |
| 3 | 15 | Nigeria | 739708 |
| 4 | 9 | Libya | 739206 |
| 5 | 1 | Benin | 738133 |
| 6 | 27 | Zimbabwe | 737522 |
| 7 | 19 | Somalia | 736219 |
| 8 | 0 | Algeria | 736181 |
| 9 | 18 | Sierra Leone | 735477 |
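
If only the leading countries are needed, `DataFrame.nlargest` returns the top rows by a column in one call, without sorting the whole frame first. The sketch below applies it to the ten countries displayed in the sorted table (a subset of the full data); `top3` is an illustrative name.

```python
import pandas as pd

# The ten countries shown in the sorted table above (subset of the full data).
quantity_by_country = pd.DataFrame({
    "Country": ["Algeria", "Benin", "Cote d'Ivoire", "Libya", "Nigeria",
                "Somalia", "South Africa", "Togo", "Zimbabwe", "Sierra Leone"],
    "Quantity": [736181, 738133, 740229, 739206, 739708,
                 736219, 740589, 741223, 737522, 735477],
})

# The three rows with the largest Quantity, in descending order.
top3 = quantity_by_country.nlargest(3, "Quantity").reset_index(drop=True)
print(top3)
```

The result matches the first three rows of the sorted table: Togo, South Africa and Cote d'Ivoire.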
