# PO Data Analysis Tool

This notebook provides interactive analysis of Purchase Order pricing data including:
- Monthly price trend visualization
- Weighted average price calculator
- Deviation analysis from average
- Deviation banding and distribution
- Benchmark comparison between date ranges

## Section 1: Setup & Data Loading
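
The cleaning cell in this section turns price strings such as `' $41.70 '` into numbers. The sketch below isolates that step as a small function so it can be tried on sample values. The helper name `parse_price` and the sample strings are hypothetical, and stripping commas as thousands separators is an assumption about how the export formats large prices.

```python
import pandas as pd

def parse_price(raw: pd.Series) -> pd.Series:
    """Turn currency-formatted strings into floats; unparseable cells become NaN."""
    cleaned = (
        raw.astype(str)
        .str.replace(r'[$,]', '', regex=True)  # drop '$' and (assumed) thousands separators
        .str.strip()                           # drop leading/trailing spaces
    )
    return pd.to_numeric(cleaned, errors='coerce')

# Hypothetical sample values, not taken from 'PO Data.csv'
sample = pd.Series([' $41.70 ', '$1,234.50', 'TBD', None])
prices = parse_price(sample)
print(prices.tolist())
```

Counting `prices.isna()` after parsing is a quick way to see how many rows failed to convert before they are filtered out or averaged.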

In [None]:
# =============================================================================
# SECTION 1: LIBRARY IMPORTS
# =============================================================================
# pandas: Data manipulation and analysis
# matplotlib: Static plotting (used as fallback)
# plotly: Interactive visualizations (primary charting library)
# ipywidgets: Interactive UI widgets (date pickers, dropdowns, buttons)
# =============================================================================

import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import ipywidgets as widgets
from IPython.display import display, HTML
import warnings
warnings.filterwarnings('ignore')

print("Libraries loaded successfully!")

In [None]:
# =============================================================================
# LOAD RAW DATA
# =============================================================================
# skiprows=1: The CSV has an empty first row that needs to be skipped
# The file contains 60+ columns but we only use a subset for analysis
# =============================================================================

df_raw = pd.read_csv('PO Data.csv', skiprows=1)  # Skip the first empty row
print(f"Raw data loaded: {len(df_raw)} rows")
df_raw.head()

In [None]:
# =============================================================================
# DATA CLEANING
# =============================================================================
# This cell performs critical data cleaning steps. PAY ATTENTION to the
# Purchase Price parsing - the raw values are currency-formatted strings,
# not numbers.
# =============================================================================

df = df_raw.copy()

# -----------------------------------------------------------------------------
# PURCHASE PRICE PARSING - CRITICAL!
# -----------------------------------------------------------------------------
# The ' Purchase Price ' column (note: has spaces in name!) contains values
# formatted as ' $41.70 ' (with leading/trailing spaces AND a dollar sign).
#
# WHAT DOESN'T WORK:
#   df['Price'] = df[' Purchase Price '].str.replace('$', '', regex=False).astype(float)
#   The literal replace itself is fine (regex=False treats '$' as a plain
#   character). The fragile part is most likely .astype(float), which raises
#   ValueError as soon as one cell is not a clean number (whitespace-only
#   cells, text placeholders, thousands separators like '1,234.50').
#
# SOLUTION: Strip '$' and ',' and surrounding spaces, then use pd.to_numeric
# with errors='coerce' so unparseable cells become NaN instead of raising.
# With regex=True an unescaped '$' would mean "end of string", so it is kept
# inside a character class here.
# -----------------------------------------------------------------------------
df['Purchase_Price'] = pd.to_numeric(
    df[' Purchase Price '].astype(str).str.replace(r'[$,]', '', regex=True).str.strip(),
    errors='coerce'  # Convert unparseable values to NaN instead of raising error
)

# -----------------------------------------------------------------------------
# PO DATE PARSING
# -----------------------------------------------------------------------------
# Format: M/DD/YY (e.g., '1/29/25' = January 29, 2025)
# IMPORTANT: Always use 'PO Date' for time-based analysis!
# Do NOT use other date fields (XFD, ETA, Delivery dates, etc.)
# -----------------------------------------------------------------------------
df['PO_Date'] = pd.to_datetime(df['PO Date'], format='%m/%d/%y', errors='coerce')

# -----------------------------------------------------------------------------
# ORDERED QUANTITY
# -----------------------------------------------------------------------------
# Column name has apostrophe: "Ordered Q'ty"
# Convert to numeric and fill NaN with 0
# -----------------------------------------------------------------------------
df['Ordered_Qty'] = pd.to_numeric(df["Ordered Q'ty"], errors='coerce').fillna(0).astype(int)

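# -----------------------------------------------------------------------------
# FILTER OUT UNUSABLE ROWS
# -----------------------------------------------------------------------------
# Rows with Ordered Qty = 0 don't contribute to weighted averages, so we
# exclude them.
# Rows whose price or PO date failed to parse (NaN / NaT after coercion) are
# excluded too: a NaN price is skipped in SUM(Price * Qty) but its quantity
# would still count in SUM(Qty), pulling the weighted average down, and a NaT
# date would be grouped under a bogus 'NaT' month.
# -----------------------------------------------------------------------------
df = df[
    (df['Ordered_Qty'] > 0)
    & df['Purchase_Price'].notna()
    & df['PO_Date'].notna()
].copy()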

# -----------------------------------------------------------------------------
# ADD HELPER COLUMNS
# -----------------------------------------------------------------------------
# PO_Month: For grouping by month (format: '2025-01')
# Total_Value: Price * Qty for weighted average calculations
# Product_Label: Human-readable label for charts (Material ID + Short Name)
# -----------------------------------------------------------------------------
df['PO_Month'] = df['PO_Date'].dt.to_period('M').astype(str)
df['Total_Value'] = df['Purchase_Price'] * df['Ordered_Qty']
df['Product_Label'] = df['Material'].astype(str) + ' - ' + df['Short Name'].fillna('')

# -----------------------------------------------------------------------------
# DISPLAY SUMMARY
# -----------------------------------------------------------------------------
print(f"Cleaned data: {len(df)} rows")
print(f"Date range: {df['PO_Date'].min().strftime('%Y-%m-%d')} to {df['PO_Date'].max().strftime('%Y-%m-%d')}")
print(f"Unique products (Materials): {df['Material'].nunique()}")

df[['PO#', 'Material', 'Short Name', 'Size', 'PO_Date', 'PO_Month', 'Ordered_Qty', 'Purchase_Price', 'Total_Value']].head(10)

## Section 2: Monthly Price Trend Visualization

This section shows the weighted average price trend for each product over time, grouped by PO Date month.
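
As a quick check of the weighting, the sketch below reproduces the worked example from the code comments in this section (10 units at $50 and 100 units at $45) on a two-row toy frame. The column names mirror the cleaned dataframe; the material ID `'A'` and the `Simple_Avg` column are made up for illustration.

```python
import pandas as pd

# Toy data: one product, one month, two orders of very different size
orders = pd.DataFrame({
    'Material': ['A', 'A'],
    'PO_Month': ['2025-01', '2025-01'],
    'Purchase_Price': [50.0, 45.0],
    'Ordered_Qty': [10, 100],
})
orders['Total_Value'] = orders['Purchase_Price'] * orders['Ordered_Qty']

grouped = orders.groupby(['Material', 'PO_Month']).agg(
    Total_Value=('Total_Value', 'sum'),       # numerator: SUM(Price * Qty)
    Ordered_Qty=('Ordered_Qty', 'sum'),       # denominator: SUM(Qty)
    Simple_Avg=('Purchase_Price', 'mean'),    # unweighted mean, for comparison only
).reset_index()
grouped['Weighted_Avg_Price'] = grouped['Total_Value'] / grouped['Ordered_Qty']

print(grouped[['Simple_Avg', 'Weighted_Avg_Price']].round(2))
```

The simple mean treats both orders equally, while the weighted figure sits much closer to the price of the 100-unit order.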

In [None]:
# =============================================================================
# CALCULATE MONTHLY WEIGHTED AVERAGE PRICE PER PRODUCT
# =============================================================================
# Formula: Weighted Avg = SUM(Price * Qty) / SUM(Qty)
#
# This gives more weight to larger orders, which is the correct way to 
# calculate average price when order sizes vary.
#
# Example: If you buy 10 units at $50 and 100 units at $45:
#   Simple average: ($50 + $45) / 2 = $47.50
#   Weighted average: (10*$50 + 100*$45) / 110 = $45.45  <-- This is correct!
# =============================================================================

def calculate_monthly_trends(data):
    """
    Calculate weighted average price per Material per month.
    
    Groups by Material ID (not Short Name) to aggregate all sizes/colors.
    Returns dataframe with one row per product-month combination.
    """
    monthly = data.groupby(['Material', 'PO_Month']).agg({
        'Total_Value': 'sum',      # Sum of (Price * Qty) for weighted avg numerator
        'Ordered_Qty': 'sum',      # Sum of Qty for weighted avg denominator
        'Short Name': 'first',     # Keep product name for display
        'Product_Label': 'first'   # Keep label for chart legends
    }).reset_index()
    
    # Weighted Average = Total Value / Total Quantity
    monthly['Weighted_Avg_Price'] = monthly['Total_Value'] / monthly['Ordered_Qty']
    monthly = monthly.sort_values(['Material', 'PO_Month'])
    
    return monthly

monthly_trends = calculate_monthly_trends(df)
print(f"Monthly trend data points: {len(monthly_trends)}")
monthly_trends.head(10)

In [None]:
# =============================================================================
# INTERACTIVE MONTHLY TREND CHART
# =============================================================================
# This creates a multi-select widget to choose which products to display
# on the trend chart. Hold Ctrl/Cmd to select multiple products.
#
# The chart is a Plotly line chart with markers, allowing hover to see
# exact values.
# =============================================================================

# Build dropdown options: (display_label, material_id) pairs
products = df[['Material', 'Product_Label']].drop_duplicates().sort_values('Product_Label')
product_options = [(row['Product_Label'], row['Material']) for _, row in products.iterrows()]

# Multi-select widget for product selection
product_selector = widgets.SelectMultiple(
    options=product_options,
    value=[product_options[0][1]] if product_options else [],  # Default: first product
    description='Products:',
    disabled=False,
    layout=widgets.Layout(width='600px', height='200px')
)

# Output area for the chart
trend_output = widgets.Output()

def update_trend_chart(change):
    """
    Callback function triggered when product selection changes.
    Filters monthly_trends data and renders a new Plotly line chart.
    """
    with trend_output:
        trend_output.clear_output(wait=True)
        selected_materials = list(product_selector.value)
        
        if not selected_materials:
            print("Please select at least one product.")
            return
        
        # Filter to selected products only
        filtered = monthly_trends[monthly_trends['Material'].isin(selected_materials)]
        
        # Create interactive line chart
        fig = px.line(
            filtered, 
            x='PO_Month', 
            y='Weighted_Avg_Price', 
            color='Product_Label',  # Different color per product
            markers=True,           # Show data points
            title='Monthly Weighted Average Price Trend by Product',
            labels={
                'PO_Month': 'Month',
                'Weighted_Avg_Price': 'Weighted Avg Price ($)',
                'Product_Label': 'Product'
            }
        )
        
        fig.update_layout(
            xaxis_title='PO Date (Month)',
            yaxis_title='Weighted Average Price ($)',
            hovermode='x unified',  # Show all values at same x position
            height=500
        )
        
        fig.show()

# Connect the callback to the widget
product_selector.observe(update_trend_chart, names='value')

# Display the interface
print("Select products to view their price trends (hold Ctrl/Cmd to select multiple):")
display(product_selector)
display(trend_output)

# Trigger initial chart render
update_trend_chart(None)

## Section 3: Weighted Average Price Calculator

Calculate weighted average price for a selected date range, optionally filtered by product.
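
The calculation behind the widgets can also be run without any UI. The following is a minimal sketch, assuming a frame with the same `Material`, `PO_Date`, `Total_Value` and `Ordered_Qty` columns as the cleaned data; the function name `weighted_avg_in_range` and the toy rows are hypothetical.

```python
import pandas as pd

def weighted_avg_in_range(data, start, end, materials=None):
    """Weighted average price per Material for PO dates in [start, end]."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    in_range = data[(data['PO_Date'] >= start) & (data['PO_Date'] <= end)]
    if materials is not None:  # None plays the role of 'All Products'
        in_range = in_range[in_range['Material'].isin(materials)]
    summary = in_range.groupby('Material').agg(
        Total_Value=('Total_Value', 'sum'),
        Ordered_Qty=('Ordered_Qty', 'sum'),
    )
    summary['Weighted_Avg_Price'] = summary['Total_Value'] / summary['Ordered_Qty']
    return summary.reset_index()

# Toy rows for illustration only
toy = pd.DataFrame({
    'Material': ['A', 'A', 'B', 'A'],
    'PO_Date': pd.to_datetime(['2025-01-10', '2025-01-20', '2025-01-15', '2025-03-01']),
    'Purchase_Price': [40.0, 42.0, 10.0, 50.0],
    'Ordered_Qty': [100, 300, 50, 10],
})
toy['Total_Value'] = toy['Purchase_Price'] * toy['Ordered_Qty']

january = weighted_avg_in_range(toy, '2025-01-01', '2025-01-31')
print(january)
```

Both range endpoints are inclusive, matching the `>=` / `<=` comparison used in the interactive cell.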

In [None]:
# =============================================================================
# WEIGHTED AVERAGE CALCULATOR - INTERACTIVE
# =============================================================================
# This section allows you to:
# 1. Select a date range (start and end dates)
# 2. Optionally filter to specific products
# 3. Calculate weighted average price for each product in that range
#
# Output includes: Weighted Avg, Total Quantity, Total FOB Value
# =============================================================================

# Get date boundaries from the data
min_date = df['PO_Date'].min().date()
max_date = df['PO_Date'].max().date()

# Date picker widgets
start_date_picker = widgets.DatePicker(
    description='Start Date:',
    value=min_date,
    disabled=False
)

end_date_picker = widgets.DatePicker(
    description='End Date:',
    value=max_date,
    disabled=False
)

# Product filter - includes "All Products" option
product_filter = widgets.SelectMultiple(
    options=[('All Products', 'ALL')] + product_options,
    value=['ALL'],  # Default to all products
    description='Products:',
    layout=widgets.Layout(width='600px', height='150px')
)

# Button to trigger calculation
calc_button = widgets.Button(description='Calculate', button_style='primary')
calc_output = widgets.Output()

def calculate_weighted_avg(b):
    """
    Calculate and display weighted average prices for selected criteria.
    
    Weighted Average = SUM(Price * Qty) / SUM(Qty)
    
    Groups by Material ID to aggregate all sizes/colors.
    """
    with calc_output:
        calc_output.clear_output(wait=True)
        
        # Convert widget dates to pandas Timestamps for comparison
        start = pd.Timestamp(start_date_picker.value)
        end = pd.Timestamp(end_date_picker.value)
        
        # Filter by date range
        filtered = df[(df['PO_Date'] >= start) & (df['PO_Date'] <= end)].copy()
        
        # Filter by products if specific ones selected (not 'ALL')
        selected_products = list(product_filter.value)
        if 'ALL' not in selected_products:
            filtered = filtered[filtered['Material'].isin(selected_products)]
        
        if len(filtered) == 0:
            print("No data found for the selected criteria.")
            return
        
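        # Calculate weighted averages per product.
        # Group by Material only (as in calculate_monthly_trends) so rows with a
        # missing or inconsistent Short Name still roll up into one product.
        summary = filtered.groupby('Material').agg({
            'Short Name': 'first',  # Keep product name for display
            'Total_Value': 'sum',   # Numerator for weighted avg
            'Ordered_Qty': 'sum'    # Denominator for weighted avg
        }).reset_index()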
        
        summary['Weighted_Avg_Price'] = summary['Total_Value'] / summary['Ordered_Qty']
        summary = summary.rename(columns={
            'Material': 'Material ID',
            'Short Name': 'Product Name',
            'Ordered_Qty': 'Total Quantity',
            'Total_Value': 'Total FOB Value ($)',
            'Weighted_Avg_Price': 'Weighted Avg Price ($)'
        })
        
        summary = summary.sort_values('Product Name')
        
        # Display results
        print(f"\n=== Weighted Average Price Summary ===")
        print(f"Date Range: {start.strftime('%Y-%m-%d')} to {end.strftime('%Y-%m-%d')}")
        print(f"Total Line Items: {len(filtered)}")
        print(f"Products: {len(summary)}\n")
        
        # Format numbers for display
        display_df = summary.copy()
        display_df['Weighted Avg Price ($)'] = display_df['Weighted Avg Price ($)'].apply(lambda x: f'${x:.2f}')
        display_df['Total FOB Value ($)'] = display_df['Total FOB Value ($)'].apply(lambda x: f'${x:,.2f}')
        display_df['Total Quantity'] = display_df['Total Quantity'].apply(lambda x: f'{x:,}')
        
        display(display_df)

# Connect button click to callback
calc_button.on_click(calculate_weighted_avg)

# Display the interface
print("Select date range and products to calculate weighted average prices:")
display(widgets.HBox([start_date_picker, end_date_picker]))
print("\nFilter by products (select 'All Products' or specific products):")
display(product_filter)
display(calc_button)
display(calc_output)

## Section 4: Deviation Analysis

Calculate how individual line items deviate from the weighted average price.
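
The deviation arithmetic can be sketched on a toy frame as follows. This version uses `groupby(...).transform('sum')` to attach each product's weighted average to its rows, which is an alternative to the groupby-then-merge approach used in the interactive cell, not a copy of it. The function name `add_deviations` and the sample rows are hypothetical.

```python
import pandas as pd

def add_deviations(data):
    """Attach the per-Material weighted average and each row's deviation from it."""
    out = data.copy()
    by_material = out.groupby('Material')
    out['Weighted_Avg'] = (
        by_material['Total_Value'].transform('sum')
        / by_material['Ordered_Qty'].transform('sum')
    )
    out['Dollar_Deviation'] = out['Purchase_Price'] - out['Weighted_Avg']
    out['Pct_Deviation'] = out['Dollar_Deviation'] / out['Weighted_Avg'] * 100
    return out

# Toy rows for illustration only
items = pd.DataFrame({
    'Material': ['A', 'A'],
    'Purchase_Price': [40.0, 44.0],
    'Ordered_Qty': [100, 300],
})
items['Total_Value'] = items['Purchase_Price'] * items['Ordered_Qty']

deviations = add_deviations(items)
print(deviations[['Purchase_Price', 'Weighted_Avg', 'Dollar_Deviation', 'Pct_Deviation']])
```

Here the weighted average is $43.00, so the $40.00 line sits $3.00 below it and the $44.00 line $1.00 above it.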

In [None]:
# =============================================================================
# DEVIATION ANALYSIS - INTERACTIVE
# =============================================================================
# This section shows how individual line items deviate from the weighted
# average price for their product.
#
# For each line item, we calculate:
#   Dollar Deviation = Item Price - Weighted Average
#   Percent Deviation = (Dollar Deviation / Weighted Average) * 100
#
# Use the threshold filter to find outliers:
#   - "Above Average" shows items priced HIGHER than average
#   - "Below Average" shows items priced LOWER than average
# =============================================================================

# Date pickers (separate from Section 3 to allow different ranges)
dev_start_picker = widgets.DatePicker(
    description='Start Date:',
    value=min_date,
    disabled=False
)

dev_end_picker = widgets.DatePicker(
    description='End Date:',
    value=max_date,
    disabled=False
)

# Product filter
dev_product_filter = widgets.SelectMultiple(
    options=[('All Products', 'ALL')] + product_options,
    value=['ALL'],
    description='Products:',
    layout=widgets.Layout(width='600px', height='150px')
)

# Threshold filter dropdown
threshold_type = widgets.Dropdown(
    options=[
        ('All Items', 'all'),           # Show everything
        ('Above Average', 'above'),     # Items with positive deviation
        ('Below Average', 'below')      # Items with negative deviation
    ],
    value='all',
    description='Filter:'
)

# Threshold amount (only applies when above/below selected)
threshold_value = widgets.FloatText(
    value=0.0,
    description='Threshold ($):',
    disabled=False
)

dev_button = widgets.Button(description='Analyze Deviations', button_style='primary')
dev_output = widgets.Output()

def analyze_deviations(b):
    """
    Calculate deviation for each line item from its product's weighted average.
    
    Process:
    1. Filter data by date range and products
    2. Calculate weighted average per product (within the date range)
    3. Merge averages back to line items
    4. Calculate dollar and percent deviation
    5. Apply threshold filter if selected
    """
    with dev_output:
        dev_output.clear_output(wait=True)
        
        start = pd.Timestamp(dev_start_picker.value)
        end = pd.Timestamp(dev_end_picker.value)
        
        # Filter by date
        filtered = df[(df['PO_Date'] >= start) & (df['PO_Date'] <= end)].copy()
        
        # Filter by products if not 'ALL'
        selected_products = list(dev_product_filter.value)
        if 'ALL' not in selected_products:
            filtered = filtered[filtered['Material'].isin(selected_products)]
        
        if len(filtered) == 0:
            print("No data found for the selected criteria.")
            return
        
        # Calculate weighted average per product for the date range
        # This is the baseline to compare individual items against
        product_avgs = filtered.groupby('Material').agg({
            'Total_Value': 'sum',
            'Ordered_Qty': 'sum'
        }).reset_index()
        product_avgs['Weighted_Avg'] = product_avgs['Total_Value'] / product_avgs['Ordered_Qty']
        product_avgs = product_avgs[['Material', 'Weighted_Avg']]
        
        # Merge weighted averages back to each line item
        result = filtered.merge(product_avgs, on='Material')
        
        # Calculate deviations
        result['Dollar_Deviation'] = result['Purchase_Price'] - result['Weighted_Avg']
        result['Pct_Deviation'] = (result['Dollar_Deviation'] / result['Weighted_Avg']) * 100
        
        # Apply threshold filter
        if threshold_type.value == 'above':
            # Show items that are AT LEAST threshold_value above average
            result = result[result['Dollar_Deviation'] >= threshold_value.value]
        elif threshold_type.value == 'below':
            # Show items that are AT LEAST threshold_value below average
            result = result[result['Dollar_Deviation'] <= -threshold_value.value]
        
        # Prepare display dataframe
        display_cols = [
            'PO#', 'Material', 'Short Name', 'Size', 'PO_Date', 
            'Purchase_Price', 'Weighted_Avg', 'Dollar_Deviation', 'Pct_Deviation', 'Ordered_Qty'
        ]
        result_display = result[display_cols].copy()
        result_display = result_display.rename(columns={
            'Short Name': 'Product',
            'PO_Date': 'PO Date',
            'Purchase_Price': 'Unit Price',
            'Weighted_Avg': 'Wtd Avg',
            'Dollar_Deviation': 'Dev ($)',
            'Pct_Deviation': 'Dev (%)',
            'Ordered_Qty': 'Qty'
        })
        
        # Sort by deviation (largest positive first)
        result_display = result_display.sort_values('Dev ($)', ascending=False)
        
        # Display results header
        print(f"\n=== Deviation Analysis ===")
        print(f"Date Range: {start.strftime('%Y-%m-%d')} to {end.strftime('%Y-%m-%d')}")
        print(f"Filter: {threshold_type.value}")
        if threshold_type.value != 'all':
            print(f"Threshold: ${threshold_value.value:.2f}")
        print(f"Results: {len(result_display)} line items\n")
        
        # Format for display
        result_display['PO Date'] = result_display['PO Date'].dt.strftime('%Y-%m-%d')
        result_display['Unit Price'] = result_display['Unit Price'].apply(lambda x: f'${x:.2f}')
        result_display['Wtd Avg'] = result_display['Wtd Avg'].apply(lambda x: f'${x:.2f}')
        result_display['Dev ($)'] = result_display['Dev ($)'].apply(
            lambda x: f'+${x:.2f}' if x >= 0 else f'-${abs(x):.2f}'
        )
        result_display['Dev (%)'] = result_display['Dev (%)'].apply(
            lambda x: f'+{x:.2f}%' if x >= 0 else f'{x:.2f}%'
        )
        
        # Show first 100 rows (for performance)
        display(result_display.head(100))
        
        if len(result_display) > 100:
            print(f"\n... showing first 100 of {len(result_display)} results")

dev_button.on_click(analyze_deviations)

# Display the interface
print("Analyze how individual line items deviate from the weighted average price:")
display(widgets.HBox([dev_start_picker, dev_end_picker]))
print("\nFilter by products:")
display(dev_product_filter)
print("\nDeviation threshold filter:")
display(widgets.HBox([threshold_type, threshold_value]))
display(dev_button)
display(dev_output)

## Section 5: Deviation Banding

Categorize items into $1 deviation bands and visualize the distribution.
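
For reference, the $1 banding rule can be written compactly with `math.floor`. The sketch below is intended to produce the same labels as the `assign_band` function in this section for finite inputs; the name `band_label` and the `cap` parameter are hypothetical, and missing values (NaN) are not handled here.

```python
import math

def band_label(deviation, cap=5):
    """Map a dollar deviation to a $1-wide band label, capped at +/- `cap` dollars."""
    side = 'below' if deviation < 0 else 'above'
    lower = math.floor(abs(deviation))  # whole dollars away from the average
    if lower >= cap:
        return f'${cap}+ {side}'
    return f'${lower} to ${lower + 1} {side}'

# Same examples as the assign_band docstring, plus two boundary values
for value in [-6.50, -2.30, 0.50, 3.99, -1.0, 0.0]:
    print(value, '->', band_label(value))
```

Boundary values follow the original chain: exactly -1.00 falls in "$1 to $2 below", and exactly 0.00 falls in "$0 to $1 above".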

In [None]:
# =============================================================================
# DEVIATION BANDING - HISTOGRAM AND PIE CHART
# =============================================================================
# This section categorizes items into $1 deviation bands and visualizes
# the distribution using:
#   1. Histogram: Count of items in each band (red=below avg, green=above)
#   2. Pie Chart: Percentage distribution across bands
#
# Bands range from "$5+ below" to "$5+ above" in $1 increments.
# This helps identify if pricing is mostly consistent or has many outliers.
# =============================================================================

# Date pickers
band_start_picker = widgets.DatePicker(
    description='Start Date:',
    value=min_date,
    disabled=False
)

band_end_picker = widgets.DatePicker(
    description='End Date:',
    value=max_date,
    disabled=False
)

# Product filter
band_product_filter = widgets.SelectMultiple(
    options=[('All Products', 'ALL')] + product_options,
    value=['ALL'],
    description='Products:',
    layout=widgets.Layout(width='600px', height='150px')
)

band_button = widgets.Button(description='Generate Banding Charts', button_style='primary')
band_output = widgets.Output()

def assign_band(deviation):
    """
    Assign a dollar deviation value to a named band.
    
    Bands are $1 wide, from "$5+ below" to "$5+ above".
    Missing deviations (NaN) return None so they are left out of the band
    counts instead of falling through to "$5+ above".
    
    Examples:
        deviation = -6.50  -> "$5+ below"
        deviation = -2.30  -> "$2 to $3 below"
        deviation = 0.50   -> "$0 to $1 above"
        deviation = 3.99   -> "$3 to $4 above"
    """
    if pd.isna(deviation):
        return None
    if deviation <= -5:
        return '$5+ below'
    elif deviation <= -4:
        return '$4 to $5 below'
    elif deviation <= -3:
        return '$3 to $4 below'
    elif deviation <= -2:
        return '$2 to $3 below'
    elif deviation <= -1:
        return '$1 to $2 below'
    elif deviation < 0:
        return '$0 to $1 below'
    elif deviation < 1:
        return '$0 to $1 above'
    elif deviation < 2:
        return '$1 to $2 above'
    elif deviation < 3:
        return '$2 to $3 above'
    elif deviation < 4:
        return '$3 to $4 above'
    elif deviation < 5:
        return '$4 to $5 above'
    else:
        return '$5+ above'

# Define band order for consistent chart display (furthest below average to furthest above)
band_order = [
    '$5+ below', '$4 to $5 below', '$3 to $4 below', '$2 to $3 below',
    '$1 to $2 below', '$0 to $1 below', '$0 to $1 above', '$1 to $2 above',
    '$2 to $3 above', '$3 to $4 above', '$4 to $5 above', '$5+ above'
]

def generate_banding(b):
    """
    Generate histogram and pie chart showing deviation band distribution.
    
    Process:
    1. Calculate deviations for each line item
    2. Assign each to a band
    3. Count items per band
    4. Generate visualizations
    """
    with band_output:
        band_output.clear_output(wait=True)
        
        start = pd.Timestamp(band_start_picker.value)
        end = pd.Timestamp(band_end_picker.value)
        
        # Filter by date
        filtered = df[(df['PO_Date'] >= start) & (df['PO_Date'] <= end)].copy()
        
        # Filter by products if not 'ALL'
        selected_products = list(band_product_filter.value)
        if 'ALL' not in selected_products:
            filtered = filtered[filtered['Material'].isin(selected_products)]
        
        if len(filtered) == 0:
            print("No data found for the selected criteria.")
            return
        
        # Calculate weighted average per product
        product_avgs = filtered.groupby('Material').agg({
            'Total_Value': 'sum',
            'Ordered_Qty': 'sum'
        }).reset_index()
        product_avgs['Weighted_Avg'] = product_avgs['Total_Value'] / product_avgs['Ordered_Qty']
        
        # Calculate deviations
        result = filtered.merge(product_avgs[['Material', 'Weighted_Avg']], on='Material')
        result['Dollar_Deviation'] = result['Purchase_Price'] - result['Weighted_Avg']
        
        # Assign each item to a band
        result['Band'] = result['Dollar_Deviation'].apply(assign_band)
        
        # Count items per band (reindex to ensure all bands appear even if empty)
        band_counts = result['Band'].value_counts().reindex(band_order, fill_value=0).reset_index()
        band_counts.columns = ['Band', 'Count']
        
        print(f"\n=== Deviation Banding Analysis ===")
        print(f"Date Range: {start.strftime('%Y-%m-%d')} to {end.strftime('%Y-%m-%d')}")
        print(f"Total Line Items: {len(result)}\n")
        
        # ---------------------------------------------------------------------
        # HISTOGRAM
        # ---------------------------------------------------------------------
        # Red bars for "below average" bands, green for "above average"
        colors = ['#d62728'] * 6 + ['#2ca02c'] * 6  # 6 below, 6 above
        
        fig_hist = go.Figure(data=[
            go.Bar(
                x=band_counts['Band'],
                y=band_counts['Count'],
                marker_color=colors,
                text=band_counts['Count'],   # Show count on each bar
                textposition='outside'
            )
        ])
        
        fig_hist.update_layout(
            title='Deviation Banding Distribution (Histogram)',
            xaxis_title='Deviation Band',
            yaxis_title='Number of Line Items',
            xaxis_tickangle=-45,  # Rotate labels for readability
            height=500
        )
        
        fig_hist.show()
        
        # ---------------------------------------------------------------------
        # PIE CHART
        # ---------------------------------------------------------------------
        # Only show bands that have items (filter out zero-count bands)
        fig_pie = px.pie(
            band_counts[band_counts['Count'] > 0], 
            values='Count', 
            names='Band',
            title='Deviation Distribution (Pie Chart)',
            category_orders={'Band': band_order}  # Maintain consistent order
        )
        
        fig_pie.update_layout(height=500)
        fig_pie.show()
        
        # Summary table
        print("\nBand Summary:")
        band_counts['Percentage'] = (band_counts['Count'] / band_counts['Count'].sum() * 100).round(1).astype(str) + '%'
        display(band_counts)

band_button.on_click(generate_banding)

# Display the interface
print("Categorize items into deviation bands and visualize distribution:")
display(widgets.HBox([band_start_picker, band_end_picker]))
print("\nFilter by products:")
display(band_product_filter)
display(band_button)
display(band_output)

## Section 6: Benchmark Comparison

Compare pricing between two different date ranges to identify price changes.
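
The core of the comparison is two weighted averages per product joined with an outer merge. The following is a minimal sketch on toy data, assuming frames with `Material`, `Total_Value` and `Ordered_Qty` columns; the helper names are hypothetical, and it joins on `Material` alone rather than on `Material` plus `Short Name`.

```python
import pandas as pd

def weighted_avg_by_material(data, value_name):
    """Weighted average price per Material, returned as a two-column frame."""
    totals = data.groupby('Material')[['Total_Value', 'Ordered_Qty']].sum()
    avg = (totals['Total_Value'] / totals['Ordered_Qty']).rename(value_name)
    return avg.reset_index()

def compare_periods(baseline, comparison):
    """Outer-join the two periods so products seen in only one still appear."""
    result = weighted_avg_by_material(baseline, 'Baseline_Avg').merge(
        weighted_avg_by_material(comparison, 'Compare_Avg'),
        on='Material', how='outer',
    )
    result['Change_Dollar'] = result['Compare_Avg'] - result['Baseline_Avg']
    result['Change_Pct'] = result['Change_Dollar'] / result['Baseline_Avg'] * 100
    return result

def make_rows(materials, prices, quantities):
    """Build a toy frame (illustration only)."""
    rows = pd.DataFrame({'Material': materials, 'Purchase_Price': prices, 'Ordered_Qty': quantities})
    rows['Total_Value'] = rows['Purchase_Price'] * rows['Ordered_Qty']
    return rows

baseline_rows = make_rows(['A', 'B'], [40.0, 10.0], [100, 50])
comparison_rows = make_rows(['A', 'C'], [44.0, 20.0], [100, 10])

changes = compare_periods(baseline_rows, comparison_rows)
print(changes)
```

Products that appear in only one period keep a NaN on the missing side, so their change columns are NaN as well rather than being dropped.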

In [None]:
# =============================================================================
# BENCHMARK COMPARISON - COMPARE TWO DATE RANGES
# =============================================================================
# This section allows you to compare pricing between two time periods:
#   - Baseline Period: Your reference/historical period
#   - Comparison Period: The period you want to compare against baseline
#
# For each product, it calculates:
#   - Baseline weighted average price
#   - Comparison weighted average price
#   - Dollar change (Comparison - Baseline)
#   - Percent change ((Comparison - Baseline) / Baseline * 100)
#
# Use this to identify products with significant price changes over time.
# =============================================================================

# Baseline date range
print("=== Baseline Date Range ===")
baseline_start = widgets.DatePicker(
    description='Start:',
    value=min_date,
    disabled=False
)

baseline_end = widgets.DatePicker(
    description='End:',
    value=max_date,
    disabled=False
)

# Comparison date range
print("=== Comparison Date Range ===")
compare_start = widgets.DatePicker(
    description='Start:',
    value=min_date,
    disabled=False
)

compare_end = widgets.DatePicker(
    description='End:',
    value=max_date,
    disabled=False
)

# Product filter
benchmark_product_filter = widgets.SelectMultiple(
    options=[('All Products', 'ALL')] + product_options,
    value=['ALL'],
    description='Products:',
    layout=widgets.Layout(width='600px', height='150px')
)

benchmark_button = widgets.Button(description='Compare Ranges', button_style='primary')
benchmark_output = widgets.Output()

def compare_benchmarks(b):
    """
    Compare weighted average prices between two date ranges.
    
    Process:
    1. Calculate weighted averages for baseline period
    2. Calculate weighted averages for comparison period
    3. Merge results (outer join to catch products in only one period)
    4. Calculate dollar and percent change
    5. Display comparison table with summary statistics
    """
    with benchmark_output:
        benchmark_output.clear_output(wait=True)
        
        # Get date ranges from widgets
        b_start = pd.Timestamp(baseline_start.value)
        b_end = pd.Timestamp(baseline_end.value)
        c_start = pd.Timestamp(compare_start.value)
        c_end = pd.Timestamp(compare_end.value)
        
        # Filter baseline data
        baseline = df[(df['PO_Date'] >= b_start) & (df['PO_Date'] <= b_end)].copy()
        
        # Filter comparison data
        comparison = df[(df['PO_Date'] >= c_start) & (df['PO_Date'] <= c_end)].copy()
        
        # Filter by products if not 'ALL'
        selected_products = list(benchmark_product_filter.value)
        if 'ALL' not in selected_products:
            baseline = baseline[baseline['Material'].isin(selected_products)]
            comparison = comparison[comparison['Material'].isin(selected_products)]
        
        if len(baseline) == 0 or len(comparison) == 0:
            print("Insufficient data in one or both date ranges.")
            return
        
        # Calculate weighted averages for BASELINE period
        baseline_avg = baseline.groupby(['Material', 'Short Name']).agg({
            'Total_Value': 'sum',
            'Ordered_Qty': 'sum'
        }).reset_index()
        baseline_avg['Baseline_Avg'] = baseline_avg['Total_Value'] / baseline_avg['Ordered_Qty']
        baseline_avg['Baseline_Qty'] = baseline_avg['Ordered_Qty']
        baseline_avg = baseline_avg[['Material', 'Short Name', 'Baseline_Avg', 'Baseline_Qty']]
        
        # Calculate weighted averages for COMPARISON period
        compare_avg = comparison.groupby(['Material', 'Short Name']).agg({
            'Total_Value': 'sum',
            'Ordered_Qty': 'sum'
        }).reset_index()
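        # A zero total quantity becomes NaN (displayed as N/A) instead of inf
        compare_avg['Compare_Avg'] = compare_avg['Total_Value'] / compare_avg['Ordered_Qty'].where(compare_avg['Ordered_Qty'] != 0)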
        compare_avg['Compare_Qty'] = compare_avg['Ordered_Qty']
        compare_avg = compare_avg[['Material', 'Short Name', 'Compare_Avg', 'Compare_Qty']]
        
        # Merge results (outer join to include products in only one period)
        result = baseline_avg.merge(compare_avg, on=['Material', 'Short Name'], how='outer')
        
        # Calculate changes
        result['Change_Dollar'] = result['Compare_Avg'] - result['Baseline_Avg']
        result['Change_Pct'] = (result['Change_Dollar'] / result['Baseline_Avg']) * 100
        
        # Sort by absolute dollar change (largest changes first)
        result = result.sort_values('Change_Dollar', ascending=False, key=abs)
        
        # Display header
        print("\n=== Benchmark Comparison ===")
        print(f"Baseline: {b_start.strftime('%Y-%m-%d')} to {b_end.strftime('%Y-%m-%d')}")
        print(f"Comparison: {c_start.strftime('%Y-%m-%d')} to {c_end.strftime('%Y-%m-%d')}")
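        # The outer join keeps products found in only one period, so report both counts
        in_both = int(result[['Baseline_Avg', 'Compare_Avg']].notna().all(axis=1).sum())
        print(f"Products Listed: {len(result)} ({in_both} with data in both periods)\n")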
        
        # Format for display
        display_df = result.copy()
        display_df = display_df.rename(columns={
            'Short Name': 'Product',
            'Baseline_Avg': 'Baseline Avg',
            'Compare_Avg': 'Comparison Avg',
            'Change_Dollar': 'Change ($)',
            'Change_Pct': 'Change (%)',
            'Baseline_Qty': 'Baseline Qty',
            'Compare_Qty': 'Compare Qty'
        })
        
        # Format numbers (handle NaN for products only in one period)
        display_df['Baseline Avg'] = display_df['Baseline Avg'].apply(
            lambda x: f'${x:.2f}' if pd.notna(x) else 'N/A'
        )
        display_df['Comparison Avg'] = display_df['Comparison Avg'].apply(
            lambda x: f'${x:.2f}' if pd.notna(x) else 'N/A'
        )
        display_df['Change ($)'] = display_df['Change ($)'].apply(
            lambda x: f'+${x:.2f}' if pd.notna(x) and x >= 0 else (f'-${abs(x):.2f}' if pd.notna(x) else 'N/A')
        )
        display_df['Change (%)'] = display_df['Change (%)'].apply(
            lambda x: f'+{x:.2f}%' if pd.notna(x) and x >= 0 else (f'{x:.2f}%' if pd.notna(x) else 'N/A')
        )
        display_df['Baseline Qty'] = display_df['Baseline Qty'].apply(
            lambda x: f'{int(x):,}' if pd.notna(x) else 'N/A'
        )
        display_df['Compare Qty'] = display_df['Compare Qty'].apply(
            lambda x: f'{int(x):,}' if pd.notna(x) else 'N/A'
        )
        
        display(display_df[['Material', 'Product', 'Baseline Avg', 'Comparison Avg', 
                           'Change ($)', 'Change (%)', 'Baseline Qty', 'Compare Qty']])
        
        # Summary statistics
        valid_changes = result.dropna(subset=['Change_Dollar'])
        if len(valid_changes) > 0:
            increased = len(valid_changes[valid_changes['Change_Dollar'] > 0])
            decreased = len(valid_changes[valid_changes['Change_Dollar'] < 0])
            unchanged = len(valid_changes[valid_changes['Change_Dollar'] == 0])
            avg_change = valid_changes['Change_Dollar'].mean()
            
            print("\nSummary:")
            print(f"  Products with price increase: {increased}")
            print(f"  Products with price decrease: {decreased}")
            print(f"  Products unchanged: {unchanged}")
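            # Put the sign before the dollar sign to match the 'Change ($)' column
            sign = '+' if avg_change >= 0 else '-'
            print(f"  Average price change per product (unweighted): {sign}${abs(avg_change):.2f}")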

benchmark_button.on_click(compare_benchmarks)

# Display the interface
print("Compare pricing between two date ranges:\n")
print("Baseline Period:")
display(widgets.HBox([baseline_start, baseline_end]))
print("\nComparison Period:")
display(widgets.HBox([compare_start, compare_end]))
print("\nFilter by products:")
display(benchmark_product_filter)
display(benchmark_button)
display(benchmark_output)

---

## Quick Reference

| Section | Purpose |
|---------|----------|
| Section 2 | View monthly price trends per product |
| Section 3 | Calculate weighted average prices for a date range |
| Section 4 | Analyze line-item deviations from the average |
| Section 5 | Visualize deviation distribution with banding |
| Section 6 | Compare prices between two periods |
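
The Section 6 comparison can also be tried without the widgets. The block below is a minimal sketch, not the notebook's own code: `weighted_avg_by_product` is a hypothetical helper and the `toy` DataFrame is made-up data that only borrows the notebook's column names (`Material`, `Short Name`, `PO_Date`, `Ordered_Qty`, `Total_Value`). It shows why the weighted average is computed as total value divided by total quantity, and how the outer join leaves `NaN` for products that appear in only one period.

```python
import pandas as pd

def weighted_avg_by_product(frame, label):
    """Hypothetical helper: quantity-weighted average price per product,
    computed as sum(Total_Value) / sum(Ordered_Qty)."""
    grouped = frame.groupby(['Material', 'Short Name'], as_index=False).agg(
        Total_Value=('Total_Value', 'sum'),
        Ordered_Qty=('Ordered_Qty', 'sum'),
    )
    grouped[f'{label}_Avg'] = grouped['Total_Value'] / grouped['Ordered_Qty']
    return grouped[['Material', 'Short Name', f'{label}_Avg']]

# Made-up data for illustration only
toy = pd.DataFrame({
    'Material': ['A100', 'A100', 'A100', 'B200', 'C300'],
    'Short Name': ['Widget', 'Widget', 'Widget', 'Gadget', 'Gizmo'],
    'PO_Date': pd.to_datetime(['2024-01-10', '2024-01-20', '2024-02-05',
                               '2024-01-15', '2024-02-12']),
    'Ordered_Qty': [10, 30, 20, 5, 8],
    'Total_Value': [100.0, 360.0, 260.0, 50.0, 96.0],
})

# Baseline = January, comparison = February
jan = (toy['PO_Date'] >= pd.Timestamp('2024-01-01')) & (toy['PO_Date'] <= pd.Timestamp('2024-01-31'))
feb = (toy['PO_Date'] >= pd.Timestamp('2024-02-01')) & (toy['PO_Date'] <= pd.Timestamp('2024-02-29'))

# Outer join keeps products that appear in only one period (their other side is NaN)
result = weighted_avg_by_product(toy[jan], 'Baseline').merge(
    weighted_avg_by_product(toy[feb], 'Compare'),
    on=['Material', 'Short Name'],
    how='outer',
)
result['Change_Dollar'] = result['Compare_Avg'] - result['Baseline_Avg']
result['Change_Pct'] = result['Change_Dollar'] / result['Baseline_Avg'] * 100

print(result)
# A100: Baseline_Avg 11.50 (460 / 40), Compare_Avg 13.00 (260 / 20), Change_Dollar 1.50
# B200 and C300 appear in only one period, so their change columns are NaN
```

For A100, a simple mean of the two January unit prices (10.00 and 12.00) would give 11.00, while the weighted average gives 11.50 because the larger order carries more weight.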