# **Global E-Commerce Transactions Analysis Using Python**


# 2. Data Cleaning & Preprocessing:

In this notebook, we clean and preprocess the global e-commerce dataset to ensure accuracy, consistency, and readiness for advanced analysis.

---

In [None]:
# Libraries:

import pandas as pd
import numpy as np

pd.set_option("display.max_columns", None)

In [None]:
# Dataset:

df = pd.read_csv('Global_E-Commerce_Transactions.csv')
df.head()

In [None]:
# Convert Date Column:

df['Transaction_Date'] = pd.to_datetime(df['Transaction_Date'], format = '%d-%m-%Y')
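
With an explicit `format`, parsing is strict: any value that does not match day-month-year raises an error. The sketch below uses made-up values to show one way to flag such rows instead, assuming it is acceptable to turn them into `NaT` and inspect them afterwards. The `sample_dates` series is illustrative and not taken from the dataset.

```python
import pandas as pd

# Hypothetical sample values; the last one does not match day-month-year.
sample_dates = pd.Series(["05-01-2024", "31-12-2023", "2024/01/05"])

# errors="coerce" converts unparseable entries to NaT instead of raising.
parsed = pd.to_datetime(sample_dates, format="%d-%m-%Y", errors="coerce")

# Count the entries that failed to parse so they can be reviewed.
n_unparsed = int(parsed.isna().sum())
print(parsed)
print("Unparsed rows:", n_unparsed)
```

If the count is non-zero on the real data, those rows probably deserve a look before any date-based features are derived.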

In [None]:
# Data Type Validation:

df.dtypes

In [None]:
# Missing Value Handling:

df.isnull().sum()

---
Missing "State_Region" values are replaced with "Not Applicable".

In [None]:
df["State_Region"] = df["State_Region"].fillna("Not Applicable")

---
Missing "Customer_Rating" values are filled with the column median.

In [None]:
df["Customer_Rating"] = df["Customer_Rating"].fillna(df["Customer_Rating"].median())
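
A small sketch of what the median fill does, using invented ratings rather than the dataset. `Series.median()` skips missing values by default, so the fill value is computed from the observed ratings only.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings with one missing entry.
ratings = pd.Series([5.0, 4.0, np.nan, 1.0, 4.0])

# median() ignores NaN: the observed values are 1, 4, 4, 5.
median_rating = ratings.median()
mean_rating = ratings.mean()

# Replace the missing entry with the median of the observed values.
filled = ratings.fillna(median_rating)
print("Median:", median_rating, "Mean:", mean_rating)
print(filled.tolist())
```

Here the single low rating pulls the mean down to 3.5 while the median stays at 4.0, which is one reason the median is often chosen for skewed rating data.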

---
Duplicate Removal:

In [None]:
df = df.drop_duplicates()
df.shape
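
`drop_duplicates()` with no arguments removes only rows that match on every column. The sketch below, on a toy frame, counts such rows first so the effect of the step is visible. The `Transaction_ID` column is a hypothetical name used for illustration.

```python
import pandas as pd

# Toy frame: the third row is an exact copy of the second.
toy = pd.DataFrame({
    "Transaction_ID": ["T1", "T2", "T2", "T3"],  # hypothetical column
    "Order_Value_USD": [10.0, 25.0, 25.0, 25.0],
})

# duplicated() marks every repeat of an earlier row (first occurrence is kept).
n_dupes = int(toy.duplicated().sum())

deduped = toy.drop_duplicates()
print("Duplicate rows:", n_dupes)
print("Shape after removal:", deduped.shape)
```

Rows "T2" and "T3" share the same order value but are not duplicates, because they differ in at least one column.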

---
Date-Based Features:

In [None]:
df["Year"]  = df["Transaction_Date"].dt.year
df["Month"] = df["Transaction_Date"].dt.month
df["Month_Name"] = df["Transaction_Date"].dt.month_name()
df["Day"] = df["Transaction_Date"].dt.day
df["Weekday"] = df["Transaction_Date"].dt.day_name()
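
For reference, this is what each `.dt` accessor returns on two made-up dates. The values are illustrative only.

```python
import pandas as pd

# Two hypothetical transaction dates in day-month-year format.
dates = pd.Series(pd.to_datetime(["05-01-2024", "31-12-2023"], format="%d-%m-%Y"))

# Same accessors as in the cell above, collected into one frame.
features = pd.DataFrame({
    "Year": dates.dt.year,
    "Month": dates.dt.month,
    "Month_Name": dates.dt.month_name(),
    "Day": dates.dt.day,
    "Weekday": dates.dt.day_name(),
})
print(features)
```

`Month` is numeric and sorts chronologically, while `Month_Name` and `Weekday` are plain strings and sort alphabetically unless an order is supplied.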

---
Order Value Buckets:

In [None]:
df["Order_Value_Segment"] = pd.cut(
    df["Order_Value_USD"],
    bins = [0, 100, 300, 700, 2000, np.inf],
    labels = ["Low", "Medium", "High", "Premium", "Luxury"]
)
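
By default `pd.cut` builds right-closed intervals, so the bins above are (0, 100], (100, 300], and so on. The sketch below checks the edge cases on invented values. Whether the dataset actually contains zero-value orders is not known here, so the `include_lowest=True` variant is shown only as an option.

```python
import numpy as np
import pandas as pd

bins = [0, 100, 300, 700, 2000, np.inf]
labels = ["Low", "Medium", "High", "Premium", "Luxury"]

# Hypothetical order values chosen to sit on or near the bin edges.
values = pd.Series([0.0, 50.0, 100.0, 100.01, 700.0, 2500.0])

# Default behaviour: an upper edge belongs to the lower bin, and 0 falls outside.
segments = pd.cut(values, bins=bins, labels=labels)

# include_lowest=True also places the lowest edge (0) in the first bin.
segments_with_zero = pd.cut(values, bins=bins, labels=labels, include_lowest=True)

print(pd.DataFrame({"value": values, "default": segments, "include_lowest": segments_with_zero}))
```

With the default settings an order of exactly 0 gets a missing segment, and an order of exactly 100 is labelled "Low" rather than "Medium".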

---
Outlier Removal (IQR Method):

In [None]:
Q1 = df["Order_Value_USD"].quantile(0.25)
Q3 = df["Order_Value_USD"].quantile(0.75)
IQR = Q3 - Q1

lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

df = df[(df["Order_Value_USD"] >= lower_bound) & 
        (df["Order_Value_USD"] <= upper_bound)]
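
A worked example of the 1.5 × IQR rule on a small invented series, to make the bounds concrete. The numbers are illustrative and say nothing about the real distribution of "Order_Value_USD".

```python
import pandas as pd

# Hypothetical order values with one extreme entry.
order_values = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 1000.0])

q1 = order_values.quantile(0.25)  # 30.0
q3 = order_values.quantile(0.75)  # 70.0
iqr = q3 - q1                     # 40.0

lower_bound = q1 - 1.5 * iqr      # -30.0
upper_bound = q3 + 1.5 * iqr      # 130.0

# Keep only the values inside [lower_bound, upper_bound].
kept = order_values[(order_values >= lower_bound) & (order_values <= upper_bound)]
print("Bounds:", lower_bound, upper_bound)
print("Removed:", len(order_values) - len(kept))
```

Note that this filter drops rows rather than capping them, and that any row with a missing order value is dropped as well, because comparisons against `NaN` evaluate to `False`.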

---
Export Clean DataSet:

In [None]:
df.to_csv("Clean_Global_E_Commerce_Transactions.csv", index = False)
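
CSV stores everything as text, so the datetime dtype of "Transaction_Date" is not preserved on export. The sketch below demonstrates this on a toy frame, writing to an in-memory buffer instead of a file, and shows that `parse_dates` restores the dtype on reload.

```python
import io

import pandas as pd

# Toy frame with a datetime column, standing in for the cleaned dataset.
toy = pd.DataFrame({
    "Transaction_Date": pd.to_datetime(["05-01-2024"], format="%d-%m-%Y"),
    "Order_Value_USD": [42.0],
})

# Write to an in-memory buffer rather than to disk.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)

# Reload without any hints: the date column comes back as text.
buffer.seek(0)
plain = pd.read_csv(buffer)

# Reload with parse_dates: the date column is datetime again.
buffer.seek(0)
typed = pd.read_csv(buffer, parse_dates=["Transaction_Date"])

print(plain.dtypes)
print(typed.dtypes)
```

A downstream notebook that reads the exported file would likely need the same `parse_dates` argument, or a repeat of the `pd.to_datetime` step.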

---
Dataset After Cleaning:

In [None]:
df.shape

---
# Cleaning Summary:

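- "Transaction_Date" converted to datetime using the day-month-year format.
- Missing "State_Region" values set to "Not Applicable"; missing "Customer_Rating" values filled with the median.
- Exact duplicate rows removed.
- Time-based features (Year, Month, Month_Name, Day, Weekday) and a value-based feature (Order_Value_Segment) created.
- Outliers in "Order_Value_USD" removed using the 1.5 × IQR rule.
- Clean dataset exported to "Clean_Global_E_Commerce_Transactions.csv" for analysis.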

---