# Sales Forecasting & Product Mix Optimization Analysis

This notebook contains the analysis for JioMart regional stores in Tamil Nadu. We'll perform:
1. Exploratory Data Analysis
2. Sales Forecasting using ARIMA (Prophet is imported but not fitted in this notebook)
3. Product Mix Optimization
4. Regional Performance Analysis

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.arima.model import ARIMA
from prophet import Prophet
from sklearn.metrics import mean_absolute_error, mean_squared_error

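# Set plotting style (the bare 'seaborn' style name is deprecated in recent
# Matplotlib releases, so apply the seaborn theme directly)
sns.set_theme()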
plt.rcParams['figure.figsize'] = (12, 6)

## Data Loading and Initial Exploration

In [None]:
# Load the dataset
df = pd.read_csv('../data/sales_data.csv')

# Convert date to datetime
df['Date'] = pd.to_datetime(df['Date'])

# Display basic information
print("Dataset Shape:", df.shape)
print("\nColumns:", df.columns.tolist())
print("\nMissing Values:")
print(df.isnull().sum())

# Basic statistics
df.describe()

## Exploratory Data Analysis

In [None]:
# City-wise revenue distribution
plt.figure(figsize=(15, 8))
sns.boxplot(x='City', y='Revenue', data=df)
plt.title('Revenue Distribution by City')
plt.xticks(rotation=45)
plt.show()

# Category-wise sales
category_sales = df.groupby('Category')['Revenue'].sum().sort_values(ascending=False)
plt.figure(figsize=(12, 6))
category_sales.plot(kind='bar')
plt.title('Total Revenue by Category')
plt.ylabel('Revenue')
plt.show()

## Time Series Analysis

In [None]:
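# Monthly sales trend (convert the PeriodIndex to month-start timestamps
# so that statsmodels and matplotlib can work with the index)
monthly_sales = df.groupby(df['Date'].dt.to_period('M'))['Revenue'].sum().to_timestamp()

# Decompose time series (period=12 for monthly data; needs at least 24 months)
decomposition = seasonal_decompose(monthly_sales, model='additive', period=12)
decomposition.plot()
plt.show()

# Check stationarity (ADF null hypothesis: the series has a unit root,
# so a small p-value suggests the series is stationary)
adf_result = adfuller(monthly_sales)
print('ADF Statistic:', adf_result[0])
print('p-value:', adf_result[1])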

## Sales Forecasting

In [None]:
# Prepare data for ARIMA
arima_data = monthly_sales.to_frame().reset_index()
arima_data.columns = ['Date', 'Revenue']

# Split data
train = arima_data.iloc[:-12]  # Last 12 months as test
test = arima_data.iloc[-12:]

# Train ARIMA model
model = ARIMA(train['Revenue'], order=(1, 1, 1))
model_fit = model.fit()

# Forecast
forecast = model_fit.forecast(steps=12)

# Calculate metrics
mae = mean_absolute_error(test['Revenue'], forecast)
rmse = np.sqrt(mean_squared_error(test['Revenue'], forecast))

print(f'ARIMA Model Performance:\nMAE: {mae:.2f}\nRMSE: {rmse:.2f}')

# Plot results
plt.figure(figsize=(12, 6))
plt.plot(train['Date'], train['Revenue'], label='Train')
plt.plot(test['Date'], test['Revenue'], label='Test')
plt.plot(test['Date'], forecast, label='Forecast')
plt.legend()
plt.title('ARIMA Forecast')
plt.show()
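
An ARIMA(1, 1, 1) score is easier to judge next to a simple reference. The sketch below scores a seasonal-naive baseline, where each test month is predicted by the same month one year earlier. The helper names `seasonal_naive` and `forecast_errors` and the toy series are illustrative assumptions, not part of the notebook's data.

```python
import numpy as np
import pandas as pd


def seasonal_naive(train, horizon, season_length=12):
    """Forecast by repeating the last full season of the training series."""
    values = np.asarray(train, dtype=float)
    if len(values) < season_length:
        raise ValueError("need at least one full season of training data")
    last_season = values[-season_length:]
    # Tile the last season so horizons longer than one season also work
    repeats = int(np.ceil(horizon / season_length))
    return np.tile(last_season, repeats)[:horizon]


def forecast_errors(actual, predicted):
    """Return MAE and RMSE for two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = actual - predicted
    return {"MAE": float(np.mean(np.abs(errors))),
            "RMSE": float(np.sqrt(np.mean(errors ** 2)))}


# Toy monthly series (hypothetical values): linear trend plus a yearly cycle
steps = np.arange(36)
months = pd.date_range("2021-01-01", periods=36, freq="MS")
toy = pd.Series(100 + 2 * steps + 10 * np.sin(2 * np.pi * steps / 12), index=months)

# Hold out the last 12 months, mirroring the split used for ARIMA
toy_train, toy_test = toy.iloc[:-12], toy.iloc[-12:]
baseline = seasonal_naive(toy_train, horizon=12)
baseline_scores = forecast_errors(toy_test, baseline)
print(baseline_scores)
```

In the notebook, the equivalent call would be `seasonal_naive(train['Revenue'], 12)` scored against `test['Revenue']`. If the ARIMA errors are not clearly lower than the baseline's, the model order is probably worth revisiting.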

## Product Mix Optimization

In [None]:
# Calculate product performance metrics
product_perf = df.groupby(['Product_Name', 'Category']).agg({
    'Units_Sold': 'sum',
    'Revenue': 'sum',
    'Profit_Margin': 'mean'
}).reset_index()

# Calculate ROI proxy: estimated profit per unit sold
# (assumes Profit_Margin is a fraction of Revenue)
product_perf['ROI'] = (product_perf['Revenue'] * product_perf['Profit_Margin']) / product_perf['Units_Sold']

# Identify underperforming products
threshold = product_perf['ROI'].quantile(0.25)  # Bottom 25%
underperforming = product_perf[product_perf['ROI'] < threshold]

print('Top 10 Underperforming Products:\n')
print(underperforming.sort_values('ROI').head(10))
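
The `ROI` column averages `Profit_Margin` across rows with equal weight, so a low-revenue row counts as much as a high-revenue one, and a product with zero units sold produces a division by zero. A hedged alternative, assuming `Profit_Margin` is a fraction of revenue (0.20 meaning 20%), is to compute profit per row first and aggregate afterwards. The toy rows and the column names `weighted_margin` and `profit_per_unit` are illustrative.

```python
import numpy as np
import pandas as pd

# Toy rows (hypothetical values) using the notebook's column names
toy = pd.DataFrame({
    "Product_Name": ["Rice", "Rice", "Soap", "Soap"],
    "Category": ["Grocery", "Grocery", "Personal Care", "Personal Care"],
    "Units_Sold": [10, 90, 0, 0],
    "Revenue": [100.0, 900.0, 0.0, 0.0],
    "Profit_Margin": [0.5, 0.1, 0.3, 0.3],
})

# Profit per row, assuming Profit_Margin is a fraction of Revenue
toy["Profit"] = toy["Revenue"] * toy["Profit_Margin"]

perf = (toy.groupby(["Product_Name", "Category"], as_index=False)
           .agg(Units_Sold=("Units_Sold", "sum"),
                Revenue=("Revenue", "sum"),
                Profit=("Profit", "sum")))

# Replace zero denominators with NaN so the ratios are NaN instead of inf
perf["weighted_margin"] = perf["Profit"] / perf["Revenue"].replace(0, np.nan)
perf["profit_per_unit"] = perf["Profit"] / perf["Units_Sold"].replace(0, np.nan)
print(perf)
```

For the Rice rows, the unweighted mean margin would be 0.30, while the revenue-weighted margin is 0.14, which shows how much the choice of aggregation can move a product across the bottom-quartile threshold.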

## Regional Performance Analysis

In [None]:
# Store-wise performance
store_perf = df.groupby(['City', 'Store_ID']).agg({
    'Revenue': 'sum',
    'Profit_Margin': 'mean',
    'Units_Sold': 'sum'
}).reset_index()

# Calculate store efficiency (revenue per unit sold)
store_perf['Efficiency'] = store_perf['Revenue'] / store_perf['Units_Sold']

# Visualize store performance
plt.figure(figsize=(15, 8))
sns.scatterplot(data=store_perf, x='Units_Sold', y='Revenue', 
    hue='City', size='Profit_Margin', sizes=(50, 200))
plt.title('Store Performance Analysis')
plt.xlabel('Total Units Sold')
plt.ylabel('Total Revenue')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()
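
`Efficiency` is revenue per unit sold, which depends on each city's product mix, so comparing stores within the same city may be fairer than a statewide ranking. A minimal sketch follows, using hypothetical store rows and the assumed column names `Efficiency_vs_City` and `Rank_in_City`:

```python
import pandas as pd

# Toy store rows (hypothetical values) using the notebook's column names
toy_stores = pd.DataFrame({
    "City": ["Chennai", "Chennai", "Madurai", "Madurai"],
    "Store_ID": ["S1", "S2", "S3", "S4"],
    "Revenue": [1000.0, 600.0, 300.0, 500.0],
    "Units_Sold": [100, 100, 50, 50],
})
toy_stores["Efficiency"] = toy_stores["Revenue"] / toy_stores["Units_Sold"]

# Ratio of each store's efficiency to the mean efficiency of its city
city_mean = toy_stores.groupby("City")["Efficiency"].transform("mean")
toy_stores["Efficiency_vs_City"] = toy_stores["Efficiency"] / city_mean

# Rank stores inside each city, 1 = most efficient
toy_stores["Rank_in_City"] = (toy_stores.groupby("City")["Efficiency"]
                                        .rank(ascending=False, method="min")
                                        .astype(int))
print(toy_stores)
```

A ratio above 1 marks a store that earns more per unit than its city average. The same two `groupby` steps should apply to `store_perf` unchanged, since it carries the same columns.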

## Recommendations

1. **Sales Forecasting:**
   - Implement ARIMA model for monthly sales predictions
   - Monitor forecast accuracy and update models periodically

2. **Product Mix Optimization:**
   - Review and potentially discontinue underperforming SKUs
   - Focus on high ROI products in each category
   - Consider category-specific promotions

3. **Regional Strategy:**
   - Tailor marketing strategies based on city-wise performance
   - Optimize inventory levels based on regional demand patterns
   - Implement targeted promotions in high-potential cities
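
The recommendation to monitor forecast accuracy can be made concrete with a rolling-origin backtest: refit on an expanding window, forecast the next step, and average the errors. The sketch below is one possible setup, not the notebook's method. The function names, the toy series, and the last-value forecaster are assumptions chosen to keep the example dependency-free.

```python
import numpy as np


def rolling_origin_mae(series, forecaster, initial_train=24, horizon=1):
    """Average MAE over expanding-window forecasts.

    `forecaster(history, horizon)` must return `horizon` predictions.
    """
    values = np.asarray(series, dtype=float)
    abs_errors = []
    # Move the forecast origin forward one observation at a time
    for end in range(initial_train, len(values) - horizon + 1):
        prediction = np.asarray(forecaster(values[:end], horizon), dtype=float)
        actual = values[end:end + horizon]
        abs_errors.append(np.mean(np.abs(actual - prediction)))
    if not abs_errors:
        raise ValueError("series too short for the requested split")
    return float(np.mean(abs_errors))


def last_value_forecaster(history, horizon):
    """Placeholder model: repeat the most recent observation."""
    return np.repeat(history[-1], horizon)


# Toy series (hypothetical values): steady growth of 5 per month
toy_series = 100 + 5 * np.arange(30)
score = rolling_origin_mae(toy_series, last_value_forecaster,
                           initial_train=24, horizon=1)
print(f"Rolling-origin MAE: {score:.2f}")
```

An ARIMA forecaster could be plugged in with a wrapper such as `lambda history, n: ARIMA(history, order=(1, 1, 1)).fit().forecast(steps=n)`. Tracking this score after each month of new data gives a simple signal for when the model needs refitting.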