# Sales & Profit Performance Analysis (Superstore Dataset)


## Project Overview
This project analyzes retail sales data from the Superstore dataset to understand sales performance, profitability, and customer segments.


In [None]:
import pandas as pd

df = pd.read_csv("SampleSuperstore.csv")


In [25]:
df.head()


| | Ship Mode | Segment | Country | City | State | Postal Code | Region | Category | Sub-Category | Sales | Quantity | Discount | Profit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Second Class | Consumer | United States | Henderson | Kentucky | 42420 | South | Furniture | Bookcases | 261.96 | 2 | 0.0 | 41.9136 |
| 1 | Second Class | Consumer | United States | Henderson | Kentucky | 42420 | South | Furniture | Chairs | 731.94 | 3 | 0.0 | 219.582 |
| 2 | Second Class | Corporate | United States | Los Angeles | California | 90036 | West | Office Supplies | Labels | 14.62 | 2 | 0.0 | 6.8714 |
| 3 | Standard Class | Consumer | United States | Fort Lauderdale | Florida | 33311 | South | Furniture | Tables | 957.5775 | 5 | 0.45 | -383.031 |
| 4 | Standard Class | Consumer | United States | Fort Lauderdale | Florida | 33311 | South | Office Supplies | Storage | 22.368 | 2 | 0.2 | 2.5164 |


## Data Cleaning
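Column names were standardized to snake_case so every column can be referenced consistently: surrounding whitespace was trimmed, names were lowercased, and spaces were replaced with underscores. The hyphen in `sub-category` was then replaced in a separate rename step.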


In [26]:
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")
df.columns


Index(['ship_mode', 'segment', 'country', 'city', 'state', 'postal_code',
       'region', 'category', 'sub-category', 'sales', 'quantity', 'discount',
       'profit'],
      dtype='object')

In [27]:
df.rename(columns={"sub-category": "sub_category"}, inplace=True)
df.columns


Index(['ship_mode', 'segment', 'country', 'city', 'state', 'postal_code',
       'region', 'category', 'sub_category', 'sales', 'quantity', 'discount',
       'profit'],
      dtype='object')
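
The two cleaning steps above could also be done in one pass. The sketch below is one possible way to do it, not the notebook's own code: `normalize_columns` is a hypothetical helper name, and the regular expression assumes that runs of spaces and hyphens are the only separators that need replacing.

```python
import pandas as pd


def normalize_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `frame` with snake_case column names (hypothetical helper)."""
    cleaned = (
        frame.columns.str.strip()                   # trim surrounding whitespace
        .str.lower()                                # lowercase every name
        .str.replace(r"[\s\-]+", "_", regex=True)   # spaces and hyphens -> underscore
    )
    result = frame.copy()
    result.columns = cleaned
    return result


# Empty frame carrying a few of the original header names, for illustration only.
sample = pd.DataFrame(columns=["Ship Mode", "Postal Code", "Sub-Category", " Sales "])
normalized_names = list(normalize_columns(sample).columns)
print(normalized_names)
```

Returning a copy leaves the input frame unchanged, which avoids the need for `inplace=True`.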

## Sales and Profit Analysis
The following analysis totals sales and profit for the whole dataset, then breaks down sales by product category and customer segment, and profit by region.


In [28]:
df[["sales", "profit"]].sum()


sales     2.297201e+06
profit    2.863970e+05
dtype: float64
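
The two totals can be combined into an overall profit margin (total profit divided by total sales). The sketch below re-enters the printed totals by hand, so it inherits their rounding; in the notebook itself the same ratio could be taken directly from `df[["sales", "profit"]].sum()`. The name `overall_margin` is chosen here for illustration.

```python
import pandas as pd

# Totals as printed in the output above (rounded to seven significant digits).
totals = pd.Series({"sales": 2.297201e06, "profit": 2.863970e05})

# Profit margin = total profit / total sales.
overall_margin = totals["profit"] / totals["sales"]
print(f"Overall profit margin: {overall_margin:.1%}")
```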

### Sales by Category


In [29]:
df.groupby("category")["sales"].sum().sort_values(ascending=False)


category
Technology         836154.0330
Furniture          741999.7953
Office Supplies    719047.0320
Name: sales, dtype: float64

### Profit by Region


In [30]:
df.groupby("region")["profit"].sum().sort_values(ascending=False)


region
West       108418.4489
East        91522.7800
South       46749.4303
Central     39706.3625
Name: profit, dtype: float64

### Sales by Segment


In [31]:
df.groupby("segment")["sales"].sum().sort_values(ascending=False)


segment
Consumer       1.161401e+06
Corporate      7.061464e+05
Home Office    4.296531e+05
Name: sales, dtype: float64
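
The three breakdowns above each look at a single metric. A small helper can report sales, profit, and margin together for any grouping column. This is a minimal sketch under stated assumptions: `summarize_by` is a hypothetical name, and the five rows are copied from the `df.head()` output as a stand-in for the full dataset, so the figures it prints do not match the full-dataset totals.

```python
import pandas as pd

# Five rows copied from the df.head() output; a stand-in for the full dataset.
sample = pd.DataFrame(
    {
        "segment": ["Consumer", "Consumer", "Corporate", "Consumer", "Consumer"],
        "region": ["South", "South", "West", "South", "South"],
        "category": ["Furniture", "Furniture", "Office Supplies",
                     "Furniture", "Office Supplies"],
        "sales": [261.96, 731.94, 14.62, 957.5775, 22.368],
        "profit": [41.9136, 219.582, 6.8714, -383.031, 2.5164],
    }
)


def summarize_by(frame: pd.DataFrame, dimension: str) -> pd.DataFrame:
    """Total sales, total profit, and profit margin per group (hypothetical helper)."""
    summary = frame.groupby(dimension).agg(
        sales=("sales", "sum"),
        profit=("profit", "sum"),
    )
    summary["margin"] = summary["profit"] / summary["sales"]
    return summary.sort_values("sales", ascending=False)


by_category = summarize_by(sample, "category")
print(by_category)
```

Calling `summarize_by(df, "region")` or `summarize_by(df, "segment")` on the cleaned frame would give the same three columns for the other dimensions.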

## Key Insights

- **Technology** has the highest total sales (about 836,000, roughly 36% of all sales), followed by Furniture (about 742,000) and Office Supplies (about 719,000).
- The **West region** generates the highest total profit (about 108,000, roughly 38% of overall profit), ahead of East, South, and Central.
- The **Consumer segment** contributes the largest share of total sales (about 1.16 million, roughly 51%), making it the largest segment by revenue. Profit by segment was not computed here, so this does not by itself show that it is the most profitable segment.
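
The share each leader holds can be checked from the printed outputs. The sketch below re-enters those outputs by hand, so the segment figures carry the rounding of their scientific notation; `share_of_total` is a hypothetical helper name.

```python
import pandas as pd


def share_of_total(series: pd.Series) -> pd.Series:
    """Each group's fraction of the series total (hypothetical helper)."""
    return series / series.sum()


# Values copied from the groupby outputs in the analysis section.
category_sales = pd.Series(
    {"Technology": 836154.0330, "Furniture": 741999.7953, "Office Supplies": 719047.0320}
)
region_profit = pd.Series(
    {"West": 108418.4489, "East": 91522.7800, "South": 46749.4303, "Central": 39706.3625}
)
segment_sales = pd.Series(
    {"Consumer": 1.161401e06, "Corporate": 7.061464e05, "Home Office": 4.296531e05}
)

category_share = share_of_total(category_sales)
region_share = share_of_total(region_profit)
segment_share = share_of_total(segment_sales)

for label, shares in [
    ("Sales by category", category_share),
    ("Profit by region", region_share),
    ("Sales by segment", segment_share),
]:
    print(label)
    print(shares.map("{:.1%}".format))
```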
