# Comparing sales on holidays versus non-holidays

This Python script visualizes sales data from a Walmart dataset, comparing average sales on holidays versus non-holidays. It uses the pandas library for data manipulation and plotly.express for creating the line plot.

The code loads the Walmart sales dataset, parses the `Date` column, averages `Weekly_Sales` by month and by holiday flag, and draws a line plot with one line for holiday weeks and one for non-holiday weeks. The color coding distinguishes the two categories and shows how average sales vary between them over time.

- ![Holidays versus non-holidays Line Plot (PNG)](../visualizations/holiday_vs_non_holiday_sales.png)
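
To make the monthly aggregation in the cell below concrete, here is a minimal sketch on a few made-up rows shaped like the Walmart CSV (the sales values are invented for illustration, not taken from the dataset). It shows how `dt.to_period('M')` turns each weekly date into a month label and how `groupby(...).mean()` averages `Weekly_Sales` per month and flag.

```python
import pandas as pd

# Hypothetical toy rows in the same shape as the Walmart CSV (values are made up)
toy = pd.DataFrame({
    'Date': ['05-02-2010', '12-02-2010', '19-02-2010', '05-03-2010'],
    'Weekly_Sales': [100.0, 160.0, 120.0, 90.0],
    'Holiday_Flag': [0, 1, 0, 0],
})

# Same steps as the notebook cell: parse dates, label flags, bucket by month
toy['Date'] = pd.to_datetime(toy['Date'], format='%d-%m-%Y')
toy['Holiday_Flag'] = toy['Holiday_Flag'].map({0: 'Non-Holiday', 1: 'Holiday'})
toy['Month'] = toy['Date'].dt.to_period('M').astype(str)

# One row per (month, flag) pair that actually occurs in the data
monthly = toy.groupby(['Month', 'Holiday_Flag'])['Weekly_Sales'].mean().reset_index()
print(monthly)
```

March has no flagged week in this toy data, so there is no `Holiday` row for it. In the real plot, `px.line` connects only the months that contain a holiday week, so the holiday line between those points does not represent sales in the months in between.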

```python
import pandas as pd
import plotly.express as px
import plotly.io as pio

# Load the dataset
data = pd.read_csv('../data/raw/Walmart_Store_sales.csv')

# Convert 'Date' to datetime format
data['Date'] = pd.to_datetime(data['Date'], format='%d-%m-%Y')

# Replace 0 and 1 with 'Non-Holiday' and 'Holiday'
data['Holiday_Flag'] = data['Holiday_Flag'].map({0: 'Non-Holiday', 1: 'Holiday'})

# Aggregate the data by month and Holiday_Flag
data['Month'] = data['Date'].dt.to_period('M').astype(str)  # Convert Period to string
monthly_sales = data.groupby(['Month', 'Holiday_Flag'])['Weekly_Sales'].mean().reset_index()

# Create Line Plot
fig = px.line(monthly_sales, x='Month', y='Weekly_Sales', color='Holiday_Flag',
              title='Average Monthly Sales on Holidays vs Non-Holidays',
              labels={'Holiday_Flag': 'Holiday Flag'})
fig.update_layout(yaxis_title='Average Weekly Sales', xaxis_title='Month')

# Show the plot
fig.show()

# Save the image
pio.write_image(fig, '../visualizations/holiday_vs_non_holiday_sales.png')
```


# Comparison of sales in the first and second half of the year

This Python script calculates the average weekly sales for the first half (January to June) and the second half (July to December) of the year and visualizes them in a bar plot. It uses pandas for data manipulation and plotly.express for creating the bar plot.

The plot allows a straightforward comparison of sales performance between the two halves, making it easy to see how sales in the first half of the year differ from those in the second half.

- ![Comparison of Sales in first & second half (PNG)](../visualizations/first_vs_second_half.png)
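
The data used in this notebook runs from 2010-02-05 to 2012-10-26, so the calendar halves are not equally covered (for example, the second half of 2012 stops in October). The sketch below, on invented toy values, applies the same month-based labelling rule and also breaks the averages down by year, which can reveal differences that a single pooled bar per half would hide. It is an optional extra check, not part of the original analysis.

```python
import pandas as pd

# Made-up weekly rows spanning two years (illustrative values only)
toy = pd.DataFrame({
    'Date': pd.to_datetime(['2010-03-05', '2010-09-10', '2011-03-04', '2011-09-09', '2011-12-23']),
    'Weekly_Sales': [100.0, 110.0, 105.0, 115.0, 200.0],
})

# Same labelling rule as the notebook: months 1-6 -> first half, 7-12 -> second half
toy['Half'] = toy['Date'].dt.month.map(lambda m: 'First Half' if m <= 6 else 'Second Half')
toy['Year'] = toy['Date'].dt.year

# Pooled average per half (what the bar chart shows)
pooled = toy.groupby('Half')['Weekly_Sales'].mean()

# Average per half within each year, one column per half
by_year = toy.groupby(['Year', 'Half'])['Weekly_Sales'].mean().unstack('Half')
print(pooled)
print(by_year)
```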


```python
import pandas as pd
import plotly.express as px
import plotly.io as pio

# Load the dataset
data = pd.read_csv('../data/raw/Walmart_Store_sales.csv')

# Convert 'Date' to datetime format
data['Date'] = pd.to_datetime(data['Date'], format='%d-%m-%Y')

# Add a column to indicate the half of the year
data['Half'] = data['Date'].dt.month.apply(lambda x: 'First Half' if x <= 6 else 'Second Half')

# Calculate average sales for each half of the year
avg_sales = data.groupby('Half')['Weekly_Sales'].mean().reset_index()
avg_sales.columns = ['Half', 'Average_Weekly_Sales']

# Create Bar Plot
fig = px.bar(avg_sales, x='Half', y='Average_Weekly_Sales',
             title='Comparison of Average Weekly Sales: First Half vs Second Half',
             labels={'Half': 'Year Half'})
fig.update_layout(yaxis_title='Average Weekly Sales')
fig.show()

# Save the image
pio.write_image(fig, '../visualizations/first_vs_second_half.png')
```


# Comparison of sales on specific holidays (Labor Day, Thanksgiving, Super Bowl, Christmas)

This script analyzes Walmart store sales data by visualizing sales during the weeks of specific holidays (Labor Day, Thanksgiving, Super Bowl, and Christmas) compared to non-holiday weeks. It uses pandas for data manipulation and plotly.graph_objects with plotly.subplots for creating the interactive scatter plots.

It produces one scatter plot per holiday, arranged in a 2x2 grid, showing how sales in these special weeks compare with the rest of the year.

- ![Combined Holidays Scatter Plot (PNG)](../visualizations/combined_holiday_sales.png)
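
Every `Date` in this dataset appears to fall on a Friday (one row per store per week, starting on Friday 2010-02-05), so a calendar rule such as "a Monday in September" for Labor Day never matches a row. The sketch below, on a few toy rows, shows this and then labels holiday weeks from `Holiday_Flag` plus the month instead. It assumes `Holiday_Flag` marks only these four holidays and that each falls in its own month (Super Bowl in February, Labor Day in September, Thanksgiving in November, Christmas in December); the mapping `holiday_by_month` is a hypothetical name introduced here.

```python
import pandas as pd

# Hypothetical rows shaped like the Walmart CSV; each Date is a Friday ending a sales week
toy = pd.DataFrame({
    'Date': pd.to_datetime(['12-02-2010', '19-02-2010', '10-09-2010', '26-11-2010', '31-12-2010'],
                           format='%d-%m-%Y'),
    'Holiday_Flag': [1, 0, 1, 1, 1],
})

# A weekday test such as "Monday in September" cannot match any of these Friday-dated rows
print(toy['Date'].dt.day_name().unique())

# Assumed mapping: each flagged holiday falls in its own month
holiday_by_month = {2: 'Super Bowl', 9: 'Labor Day', 11: 'Thanksgiving', 12: 'Christmas'}
toy['Holiday'] = 'Non-Holiday'
flagged = toy['Holiday_Flag'] == 1
toy.loc[flagged, 'Holiday'] = toy.loc[flagged, 'Date'].dt.month.map(holiday_by_month)
print(toy)
```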


```python
import pandas as pd
import plotly.subplots as sp
import plotly.graph_objects as go

# Load the dataset
data = pd.read_csv('../data/raw/Walmart_Store_sales.csv')

# Convert 'Date' to datetime format
data['Date'] = pd.to_datetime(data['Date'], format='%d-%m-%Y')

# Filter data for the date range (copy to avoid modifying a slice)
start_date = '2010-02-05'
end_date = '2012-10-26'
data = data[(data['Date'] >= start_date) & (data['Date'] <= end_date)].copy()

# Create binary columns for specific holidays.
# Each 'Date' falls on a Friday (one row per store per week), so weekday tests such as
# "a Monday in September" never match. Instead, combine Holiday_Flag with the month
# in which each flagged holiday falls (assumes one flagged holiday per month).
holiday_months = {'Labor_Day': 9, 'Thanksgiving': 11, 'Super_Bowl': 2, 'Christmas': 12}
for holiday, month in holiday_months.items():
    data[holiday] = ((data['Holiday_Flag'] == 1) & (data['Date'].dt.month == month)).astype(int)

# Aggregate data by week (average across stores)
holiday_cols = list(holiday_months)
data['Week'] = data['Date'].dt.to_period('W').apply(lambda r: r.start_time)
weekly_data = data.groupby(['Week'] + holiday_cols)['Weekly_Sales'].mean().reset_index()

# Weeks that are none of the four holidays form the non-holiday baseline
non_holiday = weekly_data[weekly_data[holiday_cols].sum(axis=1) == 0]

# Create subplots for each holiday
fig = sp.make_subplots(rows=2, cols=2, subplot_titles=['Labor Day', 'Thanksgiving', 'Super Bowl', 'Christmas'])

# Plot for each holiday: grey non-holiday weeks plus the highlighted holiday weeks
for i, (holiday, color) in enumerate([('Labor_Day', 'blue'), ('Thanksgiving', 'orange'), ('Super_Bowl', 'red'), ('Christmas', 'green')]):
    row = i // 2 + 1
    col = i % 2 + 1
    holiday_data = weekly_data[weekly_data[holiday] == 1]

    # Non-holiday baseline (shown in the legend only once)
    fig.add_trace(
        go.Scatter(x=non_holiday['Week'], y=non_holiday['Weekly_Sales'], mode='markers',
                   marker=dict(color='lightgrey'), name='Non-Holiday', showlegend=(i == 0)),
        row=row, col=col
    )
    # Weeks of this specific holiday
    fig.add_trace(
        go.Scatter(x=holiday_data['Week'], y=holiday_data['Weekly_Sales'], mode='markers',
                   marker=dict(color=color, size=10), name=holiday),
        row=row, col=col
    )

fig.update_layout(title_text='Sales on Various Holidays vs Non-Holidays', height=800, width=1000)
fig.update_xaxes(title_text='Week')
fig.update_yaxes(title_text='Average Weekly Sales')

# Save the figure as an image file
fig.write_image('../visualizations/combined_holiday_sales.png')

# Show the figure
fig.show()
```



# Combining Visualizations into a Single Image

- ![Combined visualizations (PNG)](../visualizations/combined_visualizations.png)
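
The cell below pastes three images into a 2x2 grid, which leaves the bottom-right cell blank. If the number of images changes, the grid size and paste positions can be computed rather than hard-coded. The helper `grid_offsets` below is hypothetical and pure Python; the cell sizes in the example are arbitrary.

```python
import math

def grid_offsets(n_images, cell_width, cell_height, cols=2):
    """Hypothetical helper: return the canvas size and the (x, y) paste offset of each image."""
    rows = math.ceil(n_images / cols)  # enough rows to hold every image
    offsets = [((i % cols) * cell_width, (i // cols) * cell_height) for i in range(n_images)]
    return (cols * cell_width, rows * cell_height), offsets

# Three images in two columns, and the same three stacked in one column
print(grid_offsets(3, 1000, 800))
print(grid_offsets(3, 1000, 800, cols=1))
```

The canvas size would feed `Image.new`, and each offset would be passed to `combined_image.paste`; `cols=1` stacks the plots vertically with no empty cell.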

```python
from PIL import Image

# Paths to your images
image_paths = [
    '../visualizations/combined_holiday_sales.png',
    '../visualizations/first_vs_second_half.png',
    '../visualizations/holiday_vs_non_holiday_sales.png',
]

# Load the images
images = [Image.open(img_path) for img_path in image_paths]

# Determine the size of the final image
widths, heights = zip(*(img.size for img in images))
max_width = max(widths)
max_height = max(heights)

# Create a new image with a size that can contain all three images in a 2x2 grid
total_width = max_width * 2
total_height = max_height * 2
combined_image = Image.new('RGB', (total_width, total_height), (255, 255, 255))

# Paste each image into the combined image
for index, img in enumerate(images):
    row = index // 2
    col = index % 2
    x = col * max_width
    y = row * max_height
    combined_image.paste(img, (x, y))

# Save the combined image
combined_image.save('../visualizations/combined_visualizations.png')
combined_image.show()
```


# Outcomes and summary of the graphs presented

## Outcomes

### 1. Sales on various holidays vs non-holidays

- Labor Day weeks (blue dots) tend to have higher average weekly sales than non-holiday weeks.
- Thanksgiving weeks (orange dots) also show an increase over non-holiday weeks.
- Super Bowl weeks (red dots) show higher average weekly sales than non-holiday weeks.
- Christmas weeks (green dots) show higher sales than non-holiday weeks.
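
The comparisons above are read visually from the scatter plots. A minimal sketch that puts a number on each one computes every holiday's mean weekly sales as a percentage of the non-holiday mean. It assumes a DataFrame with a per-week `Holiday` label (`'Non-Holiday'` or a holiday name); the values below are invented and say nothing about the real data.

```python
import pandas as pd

# Made-up weekly averages with a hypothetical 'Holiday' label per week
weeks = pd.DataFrame({
    'Holiday': ['Non-Holiday', 'Non-Holiday', 'Super Bowl', 'Labor Day', 'Thanksgiving', 'Christmas'],
    'Weekly_Sales': [100.0, 120.0, 115.0, 105.0, 150.0, 99.0],
})

# Mean of all non-holiday weeks is the baseline
baseline = weeks.loc[weeks['Holiday'] == 'Non-Holiday', 'Weekly_Sales'].mean()
holiday_means = weeks[weeks['Holiday'] != 'Non-Holiday'].groupby('Holiday')['Weekly_Sales'].mean()

# Percentage difference of each holiday's mean relative to the non-holiday mean
lift_pct = (holiday_means / baseline - 1) * 100
print(lift_pct.round(1))
```

Between 2010-02-05 and 2012-10-26 each holiday covers only two or three flagged weeks, so such percentages are indicative rather than conclusive.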
  
### 2. Comparison of average weekly sales: first half vs second half

The bar chart comparing the average weekly sales of the first and second halves of the year shows similar values for both halves.

### 3. Average monthly sales on holidays vs non-holidays

- A line graph compares monthly average sales for holiday weeks (blue line) and non-holiday weeks (red line).
- Sales spikes are noticeable during holiday periods, indicating higher sales in months with holidays than in non-holiday periods.
  
## Summary

The visualizations collectively indicate that holidays have a noticeable impact on sales: average weekly and monthly sales rise during holiday weeks such as Labor Day, Thanksgiving, Super Bowl, and Christmas, as illustrated by both the scatter plots and the line graph. By contrast, the comparison of the first and second halves of the year shows that average weekly sales are relatively consistent across the year. Together, these patterns suggest that holidays are critical periods for consumer spending and that planning sales operations and marketing strategies around them can help maximize revenue.