# Sales Insights Analysis

This notebook explores a supermarket sales dataset.
It loads and cleans the data, then summarizes total sales by city, by product line, and over time.

## 1. Import Required Libraries

We'll use pandas for data manipulation, NumPy for selecting numeric columns, and matplotlib and seaborn for visualization.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set the style for better-looking plots
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

## 2. Load the Data

Load the supermarket sales dataset from a CSV file.

In [None]:
# Load the dataset
df = pd.read_csv('supermarket_sales.csv')
print("Data loaded successfully!")
print(f"Dataset shape: {df.shape}")
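
Before going further, it can help to confirm that the columns used by later cells are present. The sketch below is a minimal check under stated assumptions: the column list is taken from the code in this notebook, while the helper `missing_columns` and the one-row frame are hypothetical and stand in for the real CSV.

```python
import pandas as pd

# Columns the later cells rely on (taken from this notebook's own code)
REQUIRED_COLUMNS = ['City', 'Product line', 'Total', 'Date']

def missing_columns(frame, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the frame."""
    return [col for col in required if col not in frame.columns]

# Made-up one-row frame standing in for the real dataset
sample = pd.DataFrame({
    'City': ['City A'],
    'Total': [100.0],
    'Date': ['2019-01-05'],
})

missing = missing_columns(sample)
print(f"Missing columns: {missing}")
```

Calling `missing_columns(df)` on the loaded data should give an empty list if the file matches what the rest of the notebook expects.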

## 3. Data Inspection

Let's examine the structure and content of our dataset.

In [None]:
# Display first few rows
print("First 5 rows of the dataset:")
df.head()

In [None]:
# Get information about the dataset
print("Dataset Information:")
df.info()

In [None]:
# Statistical summary of numerical columns
print("Statistical Summary:")
df.describe()

In [None]:
# Check for missing values
print("Missing values in each column:")
missing_values = df.isnull().sum()
print(missing_values)
print(f"\nTotal missing values: {missing_values.sum()}")
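
Raw counts are easier to judge next to the share of rows they represent. As a small sketch on made-up data (the frame below is hypothetical), the mean of the boolean mask gives the fraction of missing values per column:

```python
import numpy as np
import pandas as pd

# Made-up frame with one missing numeric value out of four rows
sample = pd.DataFrame({
    'Total': [10.0, np.nan, 30.0, 40.0],
    'City': ['A', 'B', 'B', 'A'],
})

# isnull() yields booleans, so the column mean is the missing fraction
missing_pct = sample.isnull().mean() * 100
print(missing_pct)
```

The same expression applied to `df` would report the percentage of missing values in each column of the dataset.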

## 4. Data Cleaning

Handle any missing values and prepare the data for analysis.

In [None]:
# Check for missing values and clean if necessary
if df.isnull().sum().sum() > 0:
    print("Missing values detected. Cleaning data...")
    
    # For numerical columns, fill with median
    # (assign the result back; calling fillna(inplace=True) on df[col] is
    # chained assignment and may not update df in recent pandas versions)
    numerical_cols = df.select_dtypes(include=[np.number]).columns
    for col in numerical_cols:
        if df[col].isnull().sum() > 0:
            df[col] = df[col].fillna(df[col].median())
    
    # For categorical columns, fill with mode (most frequent value)
    categorical_cols = df.select_dtypes(include=['object']).columns
    for col in categorical_cols:
        if df[col].isnull().sum() > 0:
            df[col] = df[col].fillna(df[col].mode()[0])
    
    print("Data cleaning completed.")
else:
    print("No missing values found. Data is clean!")

# Convert Date column to datetime format for time-based analysis
df['Date'] = pd.to_datetime(df['Date'])
print("\nDate column converted to datetime format.")
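
To see what the median and mode fills do, here is a minimal sketch on a made-up frame (the values are hypothetical and not taken from the dataset). It assigns the filled column back to the frame rather than filling in place.

```python
import numpy as np
import pandas as pd

# Made-up frame with one gap per column type
toy = pd.DataFrame({
    'Total': [10.0, np.nan, 30.0, 50.0],
    'City': ['A', 'B', None, 'B'],
})

# Numeric gap -> column median (the median of 10, 30, 50 is 30)
toy['Total'] = toy['Total'].fillna(toy['Total'].median())

# Categorical gap -> most frequent value ('B' appears twice)
toy['City'] = toy['City'].fillna(toy['City'].mode()[0])

print(toy)
```

Median and mode are simple defaults. Whether they suit the real data depends on how many values are missing and why.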

## 5. Sales Analysis by City

Visualize total sales across different cities.

In [None]:
# Calculate total sales by city
sales_by_city = df.groupby('City')['Total'].sum().sort_values(ascending=False)

# Create bar plot
plt.figure(figsize=(10, 6))
sales_by_city.plot(kind='bar', color='skyblue', edgecolor='black')
plt.title('Total Sales by City', fontsize=16, fontweight='bold')
plt.xlabel('City', fontsize=12)
plt.ylabel('Total Sales', fontsize=12)
plt.xticks(rotation=45)
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

print("\nSales by City:")
print(sales_by_city)
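
Totals can be complemented with each city's share of overall sales. The sketch below uses a made-up frame with hypothetical city labels and amounts. The grouping step mirrors the cell above.

```python
import pandas as pd

# Made-up transactions; city labels and amounts are hypothetical
sample = pd.DataFrame({
    'City': ['A', 'B', 'A', 'C'],
    'Total': [100.0, 50.0, 25.0, 25.0],
})

city_totals = sample.groupby('City')['Total'].sum().sort_values(ascending=False)

# Share of overall sales per city, in percent
city_share = (city_totals / city_totals.sum() * 100).round(1)
print(city_share)
```

On the real data, the same division applied to `sales_by_city` would give the percentage contributed by each city.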

## 6. Sales Analysis by Product Line

Examine sales performance across different product categories.

In [None]:
# Calculate total sales by product line
sales_by_product = df.groupby('Product line')['Total'].sum().sort_values(ascending=False)

# Create horizontal bar plot for better readability
plt.figure(figsize=(10, 8))
sales_by_product.plot(kind='barh', color='coral', edgecolor='black')
plt.title('Total Sales by Product Line', fontsize=16, fontweight='bold')
plt.xlabel('Total Sales', fontsize=12)
plt.ylabel('Product Line', fontsize=12)
plt.grid(axis='x', alpha=0.3)
plt.tight_layout()
plt.show()

print("\nSales by Product Line:")
print(sales_by_product)
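
A total alone does not show whether a product line sells through many small transactions or a few large ones. As a sketch on made-up data (the product line names and amounts are hypothetical), `agg` can return the sum, mean, and transaction count together:

```python
import pandas as pd

# Made-up transactions; product line names are hypothetical
sample = pd.DataFrame({
    'Product line': ['Line X', 'Line Y', 'Line X', 'Line Y', 'Line Y'],
    'Total': [40.0, 10.0, 60.0, 20.0, 30.0],
})

# One row per product line: total sales, average transaction, number of transactions
product_summary = (
    sample.groupby('Product line')['Total']
    .agg(['sum', 'mean', 'count'])
    .sort_values('sum', ascending=False)
)
print(product_summary)
```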

In [None]:
# Create a pie chart for product line sales distribution
plt.figure(figsize=(10, 8))
plt.pie(sales_by_product, labels=sales_by_product.index, autopct='%1.1f%%', startangle=90)
plt.title('Sales Distribution by Product Line', fontsize=16, fontweight='bold')
plt.axis('equal')
plt.tight_layout()
plt.show()

## 7. Sales Analysis Over Time

Analyze sales trends over time to identify patterns.

In [None]:
# Calculate daily sales
daily_sales = df.groupby('Date')['Total'].sum().sort_index()

# Create line plot for sales over time
plt.figure(figsize=(14, 6))
plt.plot(daily_sales.index, daily_sales.values, marker='o', linestyle='-', color='green', linewidth=2, markersize=6)
plt.title('Daily Sales Over Time', fontsize=16, fontweight='bold')
plt.xlabel('Date', fontsize=12)
plt.ylabel('Total Sales', fontsize=12)
plt.grid(True, alpha=0.3)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

print("\nDaily Sales Statistics:")
print(f"Average daily sales: ${daily_sales.mean():.2f}")
print(f"Maximum daily sales: ${daily_sales.max():.2f}")
print(f"Minimum daily sales: ${daily_sales.min():.2f}")
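
Daily totals tend to be noisy, so a rolling mean can make the trend easier to read. The sketch below is an illustration on a made-up series. The 3-observation window is an arbitrary choice for this small example, and a 7-observation window might suit real daily data better.

```python
import pandas as pd

# Made-up daily totals over five consecutive days
dates = pd.date_range('2019-01-01', periods=5, freq='D')
daily = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], index=dates)

# Rolling mean over the last 3 observations;
# min_periods=1 lets the first rows use the values available so far
rolling = daily.rolling(window=3, min_periods=1).mean()
print(rolling)
```

Note that the window counts rows, not calendar days. Dates without any transactions do not appear in a grouped series, so they are skipped rather than treated as zero.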

In [None]:
# Sales by month
df['Month'] = df['Date'].dt.month_name()
monthly_sales = df.groupby('Month')['Total'].sum()

# Order months properly
month_order = ['January', 'February', 'March', 'April', 'May', 'June', 
               'July', 'August', 'September', 'October', 'November', 'December']
monthly_sales = monthly_sales.reindex([m for m in month_order if m in monthly_sales.index])

# Create bar plot
plt.figure(figsize=(12, 6))
monthly_sales.plot(kind='bar', color='purple', edgecolor='black')
plt.title('Total Sales by Month', fontsize=16, fontweight='bold')
plt.xlabel('Month', fontsize=12)
plt.ylabel('Total Sales', fontsize=12)
plt.xticks(rotation=45)
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

print("\nMonthly Sales:")
print(monthly_sales)
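
Grouping by month name combines the same month from different years into one bar. If the data could span more than one year, grouping by a year-month period keeps them apart and sorts chronologically without a manual month list. This is a sketch on made-up dates, not a claim about the date range of the dataset.

```python
import pandas as pd

# Made-up transactions spanning two calendar years
sample = pd.DataFrame({
    'Date': pd.to_datetime(['2019-01-15', '2019-02-10', '2019-01-20', '2020-01-05']),
    'Total': [100.0, 50.0, 25.0, 10.0],
})

# Group by year-month period; the resulting index is sorted chronologically
monthly = sample.groupby(sample['Date'].dt.to_period('M'))['Total'].sum()
print(monthly)
```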

## 8. Summary

The analysis covered three views of the data:
- Total sales by city (bar chart)
- Total sales by product line (horizontal bar chart and pie chart)
- Daily and monthly sales totals over time

These summaries are a starting point for deeper investigation of sales patterns.