# **Global E-Commerce Transactions Analysis Using Python**


# 3. Exploratory Data Analysis (EDA):

This notebook performs exploratory data analysis on the "Clean_Global_E_Commerce_Transactions.csv" dataset to uncover trends, patterns, and insights across **Countries, Customers, Products, and Transactions**.

---

In [None]:
# Libraries:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

pd.set_option("display.max_columns", None)

In [None]:
# Dataset:

df = pd.read_csv("Clean_Global_E_Commerce_Transactions.csv")
df.head()

--- 
1) Overall Revenue Overview:

In [None]:
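# Sum of all order values in USD
total_revenue = df["Order_Value_USD"].sum()
# Row count; equals the number of orders if each row is one transaction
total_order = df.shape[0]

total_revenue, total_order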

---
2) Country-wise Revenue & Orders:

In [None]:
country_summary = (
    df.groupby("Country")
      .agg(
          Total_Orders=("Transaction_ID", "count"),
          Total_Revenue=("Order_Value_USD", "sum"),
          Avg_Order_Value=("Order_Value_USD", "mean")
      )
      .sort_values(by="Total_Revenue", ascending=False)
)

country_summary

- Visualization: Country Revenue (Bar Chart):

In [None]:
plt.figure(figsize = (10, 5))

sns.barplot(
    data = country_summary.reset_index(),
    x="Country",
    y="Total_Revenue"
)

plt.title("Total Revenue by Country")
plt.xticks(rotation = 45)
plt.grid(axis = 'y')
plt.show()
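
To put a number on how concentrated revenue is across countries, one option is to compute each country's share of total revenue along with a running cumulative share. The sketch below uses a small made-up series in place of `country_summary["Total_Revenue"]`; the helper name `revenue_concentration` and the toy figures are illustrative assumptions, not part of the dataset.

```python
import pandas as pd

# Toy stand-in for country_summary["Total_Revenue"]; labels and figures are made up.
toy_revenue = pd.Series(
    {"A": 500.0, "B": 300.0, "C": 150.0, "D": 50.0},
    name="Total_Revenue",
)

def revenue_concentration(revenue: pd.Series) -> pd.DataFrame:
    """Return each group's revenue share and cumulative share, in percent."""
    ordered = revenue.sort_values(ascending=False)
    share = ordered / ordered.sum() * 100
    return pd.DataFrame({
        "Revenue_Share_Pct": share,
        "Cumulative_Share_Pct": share.cumsum(),
    })

concentration = revenue_concentration(toy_revenue)
print(concentration)
```

On the real data, the same helper could be called as `revenue_concentration(country_summary["Total_Revenue"])`; the row where the cumulative share first crosses a chosen threshold (say 80%) shows how many countries account for that much revenue.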

---
3) Top 10 Cities by Revenue:

In [None]:
top_cities = (
    df.groupby("Drop_City")["Order_Value_USD"]
      .sum()
      .sort_values(ascending = False)
      .head(10)
)

top_cities

---
4) Product Category Analysis:

In [None]:
product_summary = (
    df.groupby("Product_Category")
      .agg(
          Orders = ("Transaction_ID", "count"),
          Revenue = ("Order_Value_USD", "sum"),
          Avg_Value = ("Order_Value_USD", "mean")
      )
      .sort_values(by = "Revenue", ascending = False)
)

product_summary

- Visualization: Product Revenue Share (Pie Chart):

In [None]:
plt.figure(figsize = (7,7))

plt.pie(
    product_summary["Revenue"],
    labels = product_summary.index,
    autopct = "%1.1f%%",
    startangle = 140
)

plt.title("Revenue Share by Product Category")
plt.show()

---
5) Customer Demographics:

In [None]:
# Age Distribution (Histogram):

plt.figure(figsize = (8,5))

sns.histplot(
    df["Customer_Age"],
    bins = 20,
    kde = True
)

plt.title("Customer Age Distribution")
plt.grid(axis = 'y')
plt.show()

---
6) Ratings Analysis:

In [None]:
rating_summary = df["Customer_Rating"].value_counts().sort_index()
rating_summary

- Visualization: Customer Ratings Distribution (Bar Chart):

In [None]:
plt.figure(figsize = (8,5))

sns.barplot(
    x = rating_summary.index,
    y = rating_summary.values
)

plt.title("Customer Ratings Distribution")
plt.xlabel("Rating")
plt.ylabel("Count")
plt.grid(axis = 'y')
plt.show()
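
A count per rating shows the shape of the distribution, but two summary numbers make "how positive" easier to state: the mean rating and the share of ratings at or above some threshold. The sketch below assumes `Customer_Rating` is numeric on a 1 to 5 scale and treats 4 and above as positive; both the threshold and the toy values are assumptions made for illustration.

```python
import pandas as pd

# Toy ratings; the values are made up and assume a numeric 1-5 scale.
toy_ratings = pd.Series([5, 4, 4, 3, 2, 5, 4, 1], name="Customer_Rating")

# Assumption: ratings of 4 or 5 count as "positive".
positive_threshold = 4

mean_rating = toy_ratings.mean()
# Mean of a boolean mask is the fraction of True values.
positive_share_pct = (toy_ratings >= positive_threshold).mean() * 100

print(f"Mean rating: {mean_rating:.2f}")
print(f"Share of ratings >= {positive_threshold}: {positive_share_pct:.1f}%")
```

With the real data, `toy_ratings` would be replaced by `df["Customer_Rating"]`.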

---
7) Repeat vs New Customers:

In [None]:
repeat_summary = df["Repeat_Customer"].value_counts(normalize = True) * 100
repeat_summary
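
The percentages above describe the share of transactions, not the share of revenue. A possible follow-up is to split order count, revenue, and average order value by customer type. The sketch below uses a toy frame and assumes `Repeat_Customer` holds "Yes"/"No" labels; the actual encoding in the dataset may differ.

```python
import pandas as pd

# Toy data; the "Yes"/"No" encoding and all values are assumptions.
toy = pd.DataFrame({
    "Repeat_Customer": ["Yes", "No", "Yes", "No", "Yes"],
    "Order_Value_USD": [100.0, 50.0, 200.0, 50.0, 100.0],
})

repeat_revenue = (
    toy.groupby("Repeat_Customer")
       .agg(
           Orders=("Order_Value_USD", "count"),
           Revenue=("Order_Value_USD", "sum"),
           Avg_Order_Value=("Order_Value_USD", "mean"),
       )
)

# Each group's share of total revenue, in percent
repeat_revenue["Revenue_Share_Pct"] = (
    repeat_revenue["Revenue"] / repeat_revenue["Revenue"].sum() * 100
)

print(repeat_revenue)
```

Comparing `Revenue_Share_Pct` with the transaction share shows whether repeat customers also place larger orders on average.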

---
8) Order Status Breakdown:

In [None]:
order_status_summary = df["Order_Status"].value_counts(normalize = True) * 100
order_status_summary
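
An overall breakdown can hide differences between segments. One way to check is a row-normalized cross-tabulation of order status by product category. The sketch below runs on toy data; the status labels ("Completed", "Cancelled") and category names are hypothetical and may not match the values in the dataset.

```python
import pandas as pd

# Toy data; category and status labels are made up for illustration.
toy = pd.DataFrame({
    "Product_Category": [
        "Electronics", "Electronics", "Electronics", "Electronics",
        "Travel", "Travel",
    ],
    "Order_Status": [
        "Completed", "Completed", "Cancelled", "Completed",
        "Completed", "Cancelled",
    ],
})

# normalize="index" makes each row (category) sum to 1; scale to percent.
status_by_category = (
    pd.crosstab(toy["Product_Category"], toy["Order_Status"], normalize="index") * 100
).round(1)

print(status_by_category)
```

On the real data, the two toy columns would be replaced by `df["Product_Category"]` and `df["Order_Status"]`.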

---

# Key EDA Insights:

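- The majority of revenue comes from a **few top-performing countries** (see `country_summary`).
- **Electronics and Travel** generate the highest revenue among product categories (see `product_summary`).
- Repeat customers account for a significant share of transactions (see `repeat_summary`).
- Most orders are **successfully completed** (see `order_status_summary`).
- Customer ratings are generally **positive** (see `rating_summary`).

---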