# Data Loading

In [None]:
import pandas as pd

# load dataset
df = pd.read_csv("../data/superstore.csv")

# view first rows
df.head()


In [None]:
df.info()


# Data Cleaning & Preprocessing


This section checks the dataset for missing values and duplicate rows,
converts Order Date to a datetime type, and reviews summary statistics to spot potential outliers.


In [None]:
# check missing values
df.isnull().sum()


In [None]:
# check duplicates
df.duplicated().sum()
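The two cells above only count missing values and duplicates. If either count is non-zero, one simple way to handle them is to drop exact duplicate rows and then drop rows with missing values. The sketch below uses a small made-up frame (the `raw`, `deduped`, and `clean` names and the values are hypothetical). Dropping is just one option; imputing may be better when missing values are common.

```python
import pandas as pd

# Small made-up frame standing in for the Superstore data (hypothetical values)
raw = pd.DataFrame({
    "Order ID": ["A-1", "A-1", "A-2", "A-3"],
    "Sales": [100.0, 100.0, None, 50.0],
})

# Drop exact duplicate rows, keeping the first occurrence
deduped = raw.drop_duplicates()

# Drop rows that still have a missing value in any column
clean = deduped.dropna().reset_index(drop=True)

print(f"{len(raw)} rows -> {len(deduped)} after dedup -> {len(clean)} after dropna")
```

In the notebook, the same two calls would be applied to `df` directly, for example `df = df.drop_duplicates().dropna()`.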


In [None]:
# convert order date to datetime format
# format="mixed" (pandas >= 2.0) infers the date format for each value individually
df["Order Date"] = pd.to_datetime(df["Order Date"], format="mixed")
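`format="mixed"` is flexible, but it can misread ambiguous dates if the file actually uses day/month order. A stricter alternative is to pass an explicit format with `errors="coerce"`, so that unparseable strings become `NaT` and can be counted. The sketch below assumes month/day/year strings; that is an assumption about this CSV, and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample of raw date strings; the last one is deliberately malformed
raw_dates = pd.Series(["11/8/2016", "6/12/2016", "10/11/2015", "not a date"])

# Parse with an explicit month/day/year format (assumed); failures become NaT
parsed = pd.to_datetime(raw_dates, format="%m/%d/%Y", errors="coerce")

# Count values that failed to parse so they can be inspected rather than kept silently
n_bad = parsed.isna().sum()
print(parsed)
print("unparsed rows:", n_bad)
```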


In [None]:
# only a cell's last expression is displayed, so print the head explicitly
print(df["Order Date"].head())
df.info()


In [None]:
df.describe()
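`describe()` shows the quartiles and extremes but does not flag outliers by itself. One common convention (not necessarily the author's) is the interquartile-range rule, which marks values below Q1 − 1.5·IQR or above Q3 + 1.5·IQR. In the sketch below, the helper `iqr_outliers` and the sample values are hypothetical. In retail data, very large orders can be genuine, so the mask is better used to inspect rows than to drop them automatically.

```python
import pandas as pd

def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (s < lower) | (s > upper)

# Hypothetical sales values; 5000 is an obvious high outlier
sales = pd.Series([20.0, 35.0, 40.0, 52.0, 60.0, 75.0, 5000.0])
mask = iqr_outliers(sales)
print(sales[mask])
```

In the notebook, `iqr_outliers(df["Sales"])` would give the corresponding mask for the real data.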


# Exploratory Data Analysis (EDA)


## Monthly Sales Trend


In [None]:
df["Month"] = df["Order Date"].dt.month

monthly_sales = df.groupby("Month")["Sales"].sum()

monthly_sales


In [None]:
# replace month numbers with month names
df["Month"] = df["Order Date"].dt.month_name()

# total sales per month (all years combined), ranked highest first
monthly_sales = df.groupby("Month")["Sales"].sum().sort_values(ascending=False)

monthly_sales


Sales show strong seasonality, with the highest revenue in November and December, which indicates increased demand during the holiday season. February has the lowest sales, suggesting an opportunity for promotional strategies.
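Sorting by sales ranks the months but hides the shape of the year. To read the seasonal pattern as a trend, the monthly totals can be reindexed into calendar order. The sketch below uses a small hypothetical `orders` frame in place of `df`. Note that these totals combine all years in the data; grouping by `df["Order Date"].dt.to_period("M")` instead would keep each year's months separate.

```python
import calendar
import pandas as pd

# Hypothetical orders; in the notebook this would be df with a parsed "Order Date"
orders = pd.DataFrame({
    "Order Date": pd.to_datetime(["2016-02-10", "2016-11-05", "2016-12-20", "2017-11-30"]),
    "Sales": [100.0, 250.0, 300.0, 150.0],
})

# Month names in calendar order: January ... December
month_order = calendar.month_name[1:]

monthly = (
    orders.groupby(orders["Order Date"].dt.month_name())["Sales"]
    .sum()
    .reindex(month_order, fill_value=0.0)  # months with no orders show 0
)
print(monthly)
```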

## Top-Selling Products


In [None]:
# top 10 selling products
top_products = (
    df.groupby("Product Name")["Sales"].sum()
      .sort_values(ascending=False)
      .head(10)
)

top_products


In [None]:
top_products.to_frame()


The top products generate a large share of total revenue, led by office equipment such as copiers and binding machines.
This indicates strong demand for business technology products and can help the company focus its marketing, inventory, and profit strategies on high-performing items.
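The "large share of total revenue" can be quantified by dividing the top products' combined sales by total sales. The sketch below uses hypothetical product names and values; in the notebook the same ratio would be `top_products.sum() / df["Sales"].sum()`.

```python
import pandas as pd

# Hypothetical product-level sales (product names and values are made up)
sales = pd.DataFrame({
    "Product Name": ["Copier A", "Binder B", "Chair C", "Pen D", "Copier A", "Desk E"],
    "Sales": [500.0, 300.0, 100.0, 20.0, 400.0, 80.0],
})

n = 2  # the notebook uses the top 10; 2 keeps this toy example small
by_product = sales.groupby("Product Name")["Sales"].sum().sort_values(ascending=False)
top_n = by_product.head(n)

# Fraction of all revenue contributed by the top-n products
share = top_n.sum() / by_product.sum()
print(f"Top {n} products: {share:.1%} of total sales")
```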

## Regional Sales Performance


In [None]:
# sales by region
region_sales = df.groupby("Region")["Sales"].sum()

region_sales


In [None]:
region_sales.to_frame()


The West and East regions have the highest sales, while the South has the lowest, indicating a need for improved marketing and expansion strategies in the South.
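Ranking the regions and expressing each as a percentage of total sales makes the gap between the strongest and weakest regions explicit. The sketch below uses hypothetical totals in place of the notebook's `region_sales`.

```python
import pandas as pd

# Hypothetical region totals, standing in for region_sales in the notebook
region_sales = pd.Series(
    {"Central": 200.0, "East": 300.0, "South": 100.0, "West": 400.0}, name="Sales"
)

# Rank regions and express each as a percentage of total sales
summary = (
    region_sales.sort_values(ascending=False)
    .to_frame()
    .assign(Share=lambda d: 100 * d["Sales"] / d["Sales"].sum())
)
print(summary.round(1))
```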

## Profit Analysis (Category-Wise)


In [None]:
# profit by category → convert to DataFrame
category_profit = df.groupby("Category")["Profit"].sum().reset_index()

category_profit


Technology generates the highest total profit, while Furniture generates the lowest, indicating a need for improvement in the Furniture category.
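Total profit partly reflects how much each category sells. Profitability in the usual sense is the margin, total profit divided by total sales, which the sketch below computes per category. The `orders` frame and its values are hypothetical; in the notebook the same grouping would run on `df`.

```python
import pandas as pd

# Hypothetical order lines; in the notebook this would be df
orders = pd.DataFrame({
    "Category": ["Technology", "Technology", "Furniture", "Furniture", "Office Supplies"],
    "Sales": [1000.0, 500.0, 800.0, 400.0, 300.0],
    "Profit": [200.0, 100.0, 20.0, -10.0, 45.0],
})

totals = orders.groupby("Category")[["Sales", "Profit"]].sum()
# Margin = total profit / total sales for each category
totals["Margin"] = totals["Profit"] / totals["Sales"]
ranked = totals.sort_values("Margin", ascending=False)
print(ranked)
```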