# Inventory Optimization Case Study for Superstore Dataset

This project analyzes the Superstore dataset to support inventory decisions by ranking product categories and sub-categories by sales. It was prepared as a case study for a Trainee Analyst role at MathCo.

## Setup and Data Exploration

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load dataset
df = pd.read_csv('data/superstore.csv')

# Explore dataset
print('Dataset Info:')
df.info()  # info() prints its report directly and returns None, so no print() is needed
print('\nSummary Statistics for Sales:')
print(df['Sales'].describe())
print('\nMissing Values:')
print(df[['Sales', 'Category', 'Sub-Category']].isnull().sum())
print('\nCategory Values:', df['Category'].unique())
print('Sub-Category Values:', df['Sub-Category'].unique())
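
The exploration above prints what is in the data but does not stop the notebook if something is off. A minimal sketch of an explicit check is shown here. The helper name `check_sales_frame` and the two-row `sample` frame are hypothetical and only illustrate the idea. The column list is taken from the columns this notebook uses.

```python
import pandas as pd

# Columns the later cells rely on (taken from the code in this notebook).
REQUIRED_COLUMNS = ["Order ID", "Category", "Sub-Category", "Sales"]


def check_sales_frame(frame: pd.DataFrame) -> list:
    """Return a list of readable problems; an empty list means all checks passed."""
    problems = []
    missing = [c for c in REQUIRED_COLUMNS if c not in frame.columns]
    if missing:
        # Without the required columns the remaining checks cannot run.
        problems.append(f"missing columns: {missing}")
        return problems
    if not pd.api.types.is_numeric_dtype(frame["Sales"]):
        problems.append("Sales is not numeric")
    elif (frame["Sales"] < 0).any():
        problems.append("Sales contains negative values")
    null_counts = frame[REQUIRED_COLUMNS].isnull().sum()
    for col, count in null_counts[null_counts > 0].items():
        problems.append(f"{col} has {count} missing values")
    return problems


# Hypothetical two-row sample, not real Superstore data.
sample = pd.DataFrame({
    "Order ID": ["O-1", "O-2"],
    "Category": ["Technology", None],
    "Sub-Category": ["Phones", "Chairs"],
    "Sales": [120.0, 80.0],
})
issues = check_sales_frame(sample)
print(issues)
```

In the notebook this could be called as `check_sales_frame(df)` right after loading, before any aggregation runs.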

## Analysis

In [None]:
# Aggregate sales by category and by sub-category.
# 'mean' and 'count' on Sales are computed per row (order line), not per order;
# 'nunique' on Order ID gives the number of distinct orders.
category_analysis = df.groupby('Category').agg({
    'Sales': ['sum', 'mean', 'count'],
    'Order ID': 'nunique'
}).round(2)
category_analysis.columns = ['Total_Sales', 'Avg_Sales_per_Line', 'Line_Count', 'Unique_Orders']
category_analysis = category_analysis.sort_values('Total_Sales', ascending=False)

subcategory_analysis = df.groupby('Sub-Category').agg({
    'Sales': ['sum', 'mean', 'count'],
    'Order ID': 'nunique'
}).round(2)
subcategory_analysis.columns = ['Total_Sales', 'Avg_Sales_per_Line', 'Line_Count', 'Unique_Orders']
subcategory_analysis = subcategory_analysis.sort_values('Total_Sales', ascending=False)

print('Sales by Category:\n', category_analysis)
print('\nSales by Sub-Category:\n', subcategory_analysis)

# Save to CSV
category_analysis.to_csv('category_analysis.csv')
subcategory_analysis.to_csv('subcategory_analysis.csv')
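
Absolute totals are easier to compare once they are expressed as a share of overall sales. The sketch below is one possible extension, not part of the original analysis. The function name `add_sales_share` and the `toy` data are hypothetical, and the figures are made up for illustration.

```python
import pandas as pd

# Hypothetical toy rows standing in for the Superstore data (not real figures).
toy = pd.DataFrame({
    "Category": ["Technology", "Technology", "Furniture", "Office Supplies"],
    "Sales": [500.0, 300.0, 150.0, 50.0],
})


def add_sales_share(frame: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Total sales per group plus each group's percentage of overall sales."""
    summary = (
        frame.groupby(group_col)["Sales"]
        .sum()
        .sort_values(ascending=False)
        .to_frame("Total_Sales")
    )
    total = summary["Total_Sales"].sum()
    summary["Sales_Share_Pct"] = (summary["Total_Sales"] / total * 100).round(2)
    # The running total shows how few groups account for most of the revenue.
    summary["Cumulative_Share_Pct"] = summary["Sales_Share_Pct"].cumsum().round(2)
    return summary


share_table = add_sales_share(toy, "Category")
print(share_table)
```

With the notebook's data, the equivalent calls would be `add_sales_share(df, 'Category')` and `add_sales_share(df, 'Sub-Category')`.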

## Visualizations

In [None]:
# Sales by Category
plt.figure(figsize=(8, 6))
sns.barplot(x='Total_Sales', y=category_analysis.index, data=category_analysis)
plt.title('Total Sales by Category')
plt.xlabel('Total Sales ($)')
plt.savefig('sales_by_category.png')
plt.show()

# Sales by Sub-Category (Top 10)
top10 = subcategory_analysis.head(10)  # table is already sorted by Total_Sales, descending
plt.figure(figsize=(10, 6))
sns.barplot(x='Total_Sales', y=top10.index, data=top10)
plt.title('Top 10 Sub-Categories by Sales')
plt.xlabel('Total Sales ($)')
plt.savefig('sales_by_subcategory.png')
plt.show()

## Insights and Recommendations
- **High Performers**: The Technology category has the highest total sales. At the sub-category level, Phones (Technology) and Chairs (Furniture) rank highest, indicating strong demand.
- **Low Performers**: Sub-categories like Bookcases and Tables have lower sales, which may point to weak demand. Sales figures alone cannot confirm overstocking; that would require stock-level data.
- **Recommendations**:
  - Increase inventory for Technology (especially Phones) to meet demand.
  - Reduce stock for low-sales sub-categories like Bookcases to minimize costs.
  - Explore marketing strategies for underperforming sub-categories to boost sales.
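
The high and low performers named above can be read off the sorted table, but labeling them in code makes the selection repeatable. The following is a minimal sketch under stated assumptions: the helper `flag_performers`, the cutoff `n`, and the `toy_summary` values are all hypothetical. Taking a fixed number of rows from each end of the ranking is only one possible rule.

```python
import pandas as pd

# Hypothetical sub-category totals for illustration only (not real results).
toy_summary = pd.DataFrame(
    {"Total_Sales": [900.0, 700.0, 400.0, 120.0, 60.0, 20.0]},
    index=pd.Index(["A", "B", "C", "D", "E", "F"], name="Sub-Category"),
)


def flag_performers(summary: pd.DataFrame, n: int = 2) -> pd.DataFrame:
    """Label the n highest and n lowest rows by Total_Sales."""
    if n < 1 or 2 * n > len(summary):
        raise ValueError("n must be at least 1 and at most half the number of rows")
    ranked = summary.sort_values("Total_Sales", ascending=False).copy()
    ranked["Performance"] = "Mid"
    col = ranked.columns.get_loc("Performance")
    ranked.iloc[:n, col] = "High"   # top of the ranking
    ranked.iloc[-n:, col] = "Low"   # bottom of the ranking
    return ranked


flagged = flag_performers(toy_summary, n=2)
print(flagged)
```

Applied to the notebook's table, `flag_performers(subcategory_analysis, n=3)` would list the three strongest and three weakest sub-categories by sales, which can then be compared with the bullets above.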