# PO Data Analysis Tool

This notebook provides interactive analysis of Purchase Order pricing data including:
- Monthly price trend visualization
- Weighted average price calculator
- Deviation analysis from average
- Deviation banding and distribution
- Benchmark comparison between date ranges

## Section 1: Setup & Data Loading

In [1]:
# =============================================================================
# SECTION 1: LIBRARY IMPORTS
# =============================================================================
# pandas: Data manipulation and analysis
# matplotlib: Static plotting (used as fallback)
# plotly: Interactive visualizations (primary charting library)
# ipywidgets: Interactive UI widgets (date pickers, dropdowns, buttons)
# =============================================================================

import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import ipywidgets as widgets
from IPython.display import display, HTML
import warnings
warnings.filterwarnings('ignore')

print("Libraries loaded successfully!")

Libraries loaded successfully!


In [2]:
# =============================================================================
# LOAD RAW DATA
# =============================================================================
# skiprows=1: The CSV has an empty first row that needs to be skipped
# The file contains 60+ columns but we only use a subset for analysis
# =============================================================================

df_raw = pd.read_csv('PO Data.csv', skiprows=1)  # Skip the first empty row
print(f"Raw data loaded: {len(df_raw)} rows")
df_raw.head()

Raw data loaded: 3230 rows


,PO#,PO Item Number,Material,B-grade (Y/N),Short Name,Color,Gender,Width,Size,UPC,...,Plant,Destination,Item Category,Acount Assignment,ExtFOB,ExtLanded,WSP,ExtWhlse,MPS Kit Material,MPS Kit Quantity
0,4500175676,10,10027887,Production,MNS FR TWL DRST WRK ST WHT MULT,Multi 1,Men,,SML,889360000000.0,...,DFW1,DFW1,0,,0,0,87.75,0,,0
1,4500175676,20,10027887,Production,MNS FR TWL DRST WRK ST WHT MULT,Multi 1,Men,,MED,889360000000.0,...,DFW1,DFW1,0,,0,0,87.75,0,,0
2,4500175676,30,10027887,Production,MNS FR TWL DRST WRK ST WHT MULT,Multi 1,Men,,LG,889360000000.0,...,DFW1,DFW1,0,,0,0,87.75,0,,0
3,4500175676,40,10027887,Production,MNS FR TWL DRST WRK ST WHT MULT,Multi 1,Men,,XL,889360000000.0,...,DFW1,DFW1,0,,0,0,87.75,0,,0
4,4500175676,50,10027887,Production,MNS FR TWL DRST WRK ST WHT MULT,Multi 1,Men,,XXL,889360000000.0,...,DFW1,DFW1,0,,0,0,87.75,0,,0
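The two file quirks this cell and the next one work around (a first row with no header content, and a ` Purchase Price ` header and values padded with spaces) can be reproduced on a tiny in-memory CSV. This is a minimal sketch: the rows below are hypothetical and keep only four of the export's columns.

```python
import io

import pandas as pd

# Hypothetical toy export: a first row of bare commas, then the real header.
# Note the spaces around ' Purchase Price ' in both the header and the values.
raw_csv = (
    ",,,\n"
    "PO#,Material, Purchase Price ,Ordered Q'ty\n"
    "4500175676,10027887, $41.70 ,12\n"
    "4500175676,10027887, $41.70 ,8\n"
)

# skiprows=1 drops the comma-only row so the second line becomes the header
sample = pd.read_csv(io.StringIO(raw_csv), skiprows=1)

# read_csv does not strip whitespace from header names or text values
print(list(sample.columns))
print(sample[' Purchase Price '].tolist())
```

Because the header keeps its padding, the column must be addressed as `' Purchase Price '` (or the columns stripped once with `df.columns.str.strip()`).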


In [None]:
# =============================================================================
# DATA CLEANING AND PO-LEVEL AGGREGATION
# =============================================================================
# This cell performs critical data cleaning steps. PAY ATTENTION to the
# Purchase Price parsing - there's a known issue with the dollar sign.
#
# IMPORTANT: We aggregate rows at the PO# + Material level because multiple
# rows with the same PO# just represent different SIZES of the same order.
# The price is the same across sizes, so we sum the quantities.
# =============================================================================

df = df_raw.copy()

# -----------------------------------------------------------------------------
# PURCHASE PRICE PARSING - CRITICAL!
# -----------------------------------------------------------------------------
# The ' Purchase Price ' column (note: has spaces in name!) contains values
# formatted as ' $41.70 ' (with leading/trailing spaces AND a dollar sign).
#
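# WHAT DOESN'T WORK RELIABLY:
#   df['Price'] = df[' Purchase Price '].str.replace('$', '', regex=False).astype(float)
#   With regex=False the '$' is removed as a literal character, so the replace
#   itself is fine; the fragile part is .astype(float), which raises ValueError
#   on any entry that is still not a plain number after the '$' is gone.
#
# SOLUTION: Use regex=True with the escaped dollar sign '\\$'. In a regex an
# unescaped '$' means "end of string", so it would match nothing removable and
# the dollar sign would survive. pd.to_numeric(errors='coerce') then turns any
# remaining unparseable entry into NaN instead of raising.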
# -----------------------------------------------------------------------------
df['Purchase_Price'] = pd.to_numeric(
    df[' Purchase Price '].astype(str).str.replace('\\$', '', regex=True).str.strip(),
    errors='coerce'  # Convert unparseable values to NaN instead of raising error
)

# -----------------------------------------------------------------------------
# PO DATE PARSING
# -----------------------------------------------------------------------------
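# Format: month/day/two-digit year, month not zero-padded
# (e.g., '1/29/25' = January 29, 2025). '%m' and '%d' accept both one- and
# two-digit values, so padding does not matter.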
# IMPORTANT: Always use 'PO Date' for time-based analysis!
# Do NOT use other date fields (XFD, ETA, Delivery dates, etc.)
# -----------------------------------------------------------------------------
df['PO_Date'] = pd.to_datetime(df['PO Date'], format='%m/%d/%y', errors='coerce')

# -----------------------------------------------------------------------------
# ORDERED QUANTITY
# -----------------------------------------------------------------------------
# Column name has apostrophe: "Ordered Q'ty"
# Convert to numeric and fill NaN with 0
# -----------------------------------------------------------------------------
df['Ordered_Qty'] = pd.to_numeric(df["Ordered Q'ty"], errors='coerce').fillna(0).astype(int)

# -----------------------------------------------------------------------------
# FILTER OUT ZERO-QUANTITY ROWS
# -----------------------------------------------------------------------------
# Rows with Ordered Qty = 0 don't contribute to weighted averages
# and would cause division issues, so we exclude them
# -----------------------------------------------------------------------------
df = df[df['Ordered_Qty'] > 0].copy()

print(f"After initial cleaning: {len(df)} rows (before PO aggregation)")

# -----------------------------------------------------------------------------
# PO-LEVEL AGGREGATION - COMBINE SIZE VARIATIONS
# -----------------------------------------------------------------------------
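# Multiple rows with the same PO# + Material represent different SIZES of the
# same order. Sizes are irrelevant for price analysis, so we sum quantities.
#
# Purchase_Price is part of the group key: if sizes within one PO carry
# different prices, they stay as separate rows rather than being merged.
# groupby also drops any row whose key columns are NaN/NaT (pandas default
# dropna=True), so lines with an unparseable price or date, or a blank
# Short Name, are excluded at this step.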
#
# Category columns preserved for filtering (from CSV):
#   - Material Category Description
#   - Material Line Description
#   - Material Type Description (actually "Material Product Type Description")
#   - Material Class Description
#   - Construction Type Description
# -----------------------------------------------------------------------------
df = df.groupby(['PO#', 'Material', 'Short Name', 'PO_Date', 'Purchase_Price']).agg({
    'Ordered_Qty': 'sum',                              # Sum quantities across all sizes
    'Vendor Name': 'first',                            # Keep vendor info
    'Color': 'first',                                  # Keep color info
    'Material Category Description': 'first',          # Category for filtering
    'Material Line Description': 'first',              # Line for filtering
    'Material Product Type Description': 'first',      # Type for filtering
    'Material Class Description': 'first',             # Class for filtering
    'Construction Type Description': 'first'           # Construction type for filtering
}).reset_index()

print(f"After PO aggregation: {len(df)} rows (combined sizes within same PO)")

# -----------------------------------------------------------------------------
# ADD HELPER COLUMNS
# -----------------------------------------------------------------------------
# PO_Month: For grouping by month (format: '2025-01')
# Total_Value: Price * Qty for weighted average calculations
# Product_Label: Human-readable label for charts (Material ID + Short Name)
# -----------------------------------------------------------------------------
df['PO_Month'] = df['PO_Date'].dt.to_period('M').astype(str)
df['Total_Value'] = df['Purchase_Price'] * df['Ordered_Qty']
df['Product_Label'] = df['Material'].astype(str) + ' - ' + df['Short Name'].fillna('')

# -----------------------------------------------------------------------------
# CREATE PRODUCT-LEVEL LOOKUP FOR FILTERING
# -----------------------------------------------------------------------------
# This maps each Material to its category attributes (for filter dropdowns)
# Using the exact columns from the CSV file
# -----------------------------------------------------------------------------
product_categories = df.groupby('Material').agg({
    'Short Name': 'first',
    'Product_Label': 'first',
    'Material Category Description': 'first',
    'Material Line Description': 'first',
    'Material Product Type Description': 'first',
    'Material Class Description': 'first',
    'Construction Type Description': 'first'
}).reset_index()

print(f"\nDate range: {df['PO_Date'].min().strftime('%Y-%m-%d')} to {df['PO_Date'].max().strftime('%Y-%m-%d')}")
print(f"Unique POs: {df['PO#'].nunique()}")
print(f"Unique products (Materials): {df['Material'].nunique()}")

# Show category breakdowns for filtering (handle NaN values by filtering them out)
print(f"\nFilter categories available:")
mat_categories = [v for v in df['Material Category Description'].unique() if pd.notna(v)]
mat_lines = [v for v in df['Material Line Description'].unique() if pd.notna(v)]
mat_types = [v for v in df['Material Product Type Description'].unique() if pd.notna(v)]
mat_classes = [v for v in df['Material Class Description'].unique() if pd.notna(v)]
construction = [v for v in df['Construction Type Description'].unique() if pd.notna(v)]

print(f"  Material Category: {len(mat_categories)} ({', '.join(mat_categories[:3])}{'...' if len(mat_categories) > 3 else ''})")
print(f"  Material Line: {len(mat_lines)} ({', '.join(mat_lines[:3])}{'...' if len(mat_lines) > 3 else ''})")
print(f"  Material Type: {len(mat_types)} ({', '.join(mat_types[:3])}{'...' if len(mat_types) > 3 else ''})")
print(f"  Material Class: {len(mat_classes)} ({', '.join(mat_classes[:3])}{'...' if len(mat_classes) > 3 else ''})")
print(f"  Construction Type: {len(construction)} ({', '.join(construction[:3]) if construction else 'None'}{'...' if len(construction) > 3 else ''})")

print("\nSample data (note: Ordered_Qty is now aggregated across sizes):")
df[['PO#', 'Material', 'Short Name', 'PO_Date', 'PO_Month', 'Ordered_Qty', 'Purchase_Price', 'Total_Value']].head(10)
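The cleaning and PO-level aggregation above can be checked end to end on a handful of rows. This is a minimal sketch on hypothetical data (PO numbers 1001/1002, Material 111, and the $43.20 size upcharge are invented for illustration); it uses the same parsing calls and the same group keys as the cell, minus the category columns.

```python
import re

import pandas as pd

# Why the escape matters: as a regex, '$' is the end-of-string anchor,
# so re.sub replaces an empty match and the dollar sign survives.
print(repr(re.sub('$', '', ' $41.70 ')))    # ' $41.70 ' (unchanged)
print(repr(re.sub(r'\$', '', ' $41.70 ')))  # ' 41.70 '

# Hypothetical size-level rows for two POs
rows = pd.DataFrame({
    'PO#': [1001, 1001, 1001, 1001, 1002],
    'Material': [111, 111, 111, 111, 111],
    'Short Name': ['SHIRT A'] * 5,
    'PO Date': ['1/29/25', '1/29/25', '1/29/25', '1/29/25', '2/3/25'],
    ' Purchase Price ': [' $41.70 ', ' $41.70 ', ' $43.20 ', ' $41.70 ', ' n/a '],
    "Ordered Q'ty": [10, 20, 5, 0, 7],
})

# Same parsing steps as the cleaning cell
rows['Purchase_Price'] = pd.to_numeric(
    rows[' Purchase Price '].str.replace(r'\$', '', regex=True).str.strip(),
    errors='coerce',  # ' n/a ' becomes NaN
)
rows['PO_Date'] = pd.to_datetime(rows['PO Date'], format='%m/%d/%y', errors='coerce')
rows['Ordered_Qty'] = pd.to_numeric(rows["Ordered Q'ty"], errors='coerce').fillna(0).astype(int)
rows = rows[rows['Ordered_Qty'] > 0]  # drops the zero-quantity size row

# Same group keys: the two $41.70 sizes merge (10 + 20), the $43.20 size stays
# separate, and PO 1002 disappears because its price key is NaN.
po_level = (
    rows.groupby(['PO#', 'Material', 'Short Name', 'PO_Date', 'Purchase_Price'])['Ordered_Qty']
    .sum()
    .reset_index()
)
print(po_level)
```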

## Section 2: Monthly Price Trend Visualization

This section shows the weighted average price trend for each product over time, grouped by PO Date month.
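The weighted average used throughout this section is SUM(Price × Qty) / SUM(Qty). A minimal pure-Python sketch, using the same 10-unit and 100-unit example as the comments in the next cell (`weighted_average` is a hypothetical helper, not a function defined in this notebook):

```python
def weighted_average(prices, quantities):
    """Quantity-weighted mean price: sum(price * qty) / sum(qty)."""
    total_qty = sum(quantities)
    if total_qty == 0:
        raise ValueError("total quantity is zero; weighted average is undefined")
    return sum(p * q for p, q in zip(prices, quantities)) / total_qty


prices = [50.0, 45.0]
quantities = [10, 100]

simple = sum(prices) / len(prices)
weighted = weighted_average(prices, quantities)
print(f"simple ${simple:.2f}, weighted ${weighted:.2f}")  # simple $47.50, weighted $45.45
```

The zero-quantity guard mirrors why the cleaning step drops rows with `Ordered_Qty = 0`: a product or month whose quantities sum to zero has no defined weighted price.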

In [4]:
# =============================================================================
# CALCULATE MONTHLY WEIGHTED AVERAGE PRICE PER PRODUCT
# =============================================================================
# Formula: Weighted Avg = SUM(Price * Qty) / SUM(Qty)
#
# This gives more weight to larger orders, which is the correct way to 
# calculate average price when order sizes vary.
#
# Example: If you buy 10 units at $50 and 100 units at $45:
#   Simple average: ($50 + $45) / 2 = $47.50
#   Weighted average: (10*$50 + 100*$45) / 110 = $45.45  <-- This is correct!
# =============================================================================

def calculate_monthly_trends(data):
    """
    Calculate weighted average price per Material per month.
    
    Groups by Material ID (not Short Name); sizes were already combined
    during the PO-level aggregation step.
    Returns dataframe with one row per product-month combination.
    """
    monthly = data.groupby(['Material', 'PO_Month']).agg({
        'Total_Value': 'sum',      # Sum of (Price * Qty) for weighted avg numerator
        'Ordered_Qty': 'sum',      # Sum of Qty for weighted avg denominator
        'Short Name': 'first',     # Keep product name for display
        'Product_Label': 'first'   # Keep label for chart legends
    }).reset_index()
    
    # Weighted Average = Total Value / Total Quantity
    monthly['Weighted_Avg_Price'] = monthly['Total_Value'] / monthly['Ordered_Qty']
    monthly = monthly.sort_values(['Material', 'PO_Month'])
    
    return monthly

monthly_trends = calculate_monthly_trends(df)
print(f"Monthly trend data points: {len(monthly_trends)}")
monthly_trends.head(10)

Monthly trend data points: 256


,Material,PO_Month,Total_Value,Ordered_Qty,Short Name,Product_Label,Weighted_Avg_Price
0,10012250,2025-01,67403.34,1206,MNS FR BASIC LS WRK SHRT BLD BLUE STRP,10012250 - MNS FR BASIC LS WRK SHRT BLD BLUE STRP,55.89
1,10012250,2025-02,33166.49,593,MNS FR BASIC LS WRK SHRT BLD BLUE STRP,10012250 - MNS FR BASIC LS WRK SHRT BLD BLUE STRP,55.93
2,10012250,2025-03,33340.24,596,MNS FR BASIC LS WRK SHRT BLD BLUE STRP,10012250 - MNS FR BASIC LS WRK SHRT BLD BLUE STRP,55.94
3,10012250,2025-04,33423.68,596,MNS FR BASIC LS WRK SHRT BLD BLUE STRP,10012250 - MNS FR BASIC LS WRK SHRT BLD BLUE STRP,56.08
4,10012250,2025-06,61431.78,1191,MNS FR BASIC LS WRK SHRT BLD BLUE STRP,10012250 - MNS FR BASIC LS WRK SHRT BLD BLUE STRP,51.58
5,10012250,2025-07,30584.94,599,MNS FR BASIC LS WRK SHRT BLD BLUE STRP,10012250 - MNS FR BASIC LS WRK SHRT BLD BLUE STRP,51.06
6,10012250,2025-08,58821.12,1152,MNS FR BASIC LS WRK SHRT BLD BLUE STRP,10012250 - MNS FR BASIC LS WRK SHRT BLD BLUE STRP,51.06
7,10012250,2025-10,29410.56,576,MNS FR BASIC LS WRK SHRT BLD BLUE STRP,10012250 - MNS FR BASIC LS WRK SHRT BLD BLUE STRP,51.06
8,10012250,2025-11,29410.56,576,MNS FR BASIC LS WRK SHRT BLD BLUE STRP,10012250 - MNS FR BASIC LS WRK SHRT BLD BLUE STRP,51.06
9,10012250,2025-12,29410.56,576,MNS FR BASIC LS WRK SHRT BLD BLUE STRP,10012250 - MNS FR BASIC LS WRK SHRT BLD BLUE STRP,51.06


In [None]:
# =============================================================================
# INTERACTIVE MONTHLY TREND CHART
# =============================================================================
# This creates a multi-select widget to choose which products to display
# on the trend chart. Hold Ctrl/Cmd to select multiple products.
#
# BY DEFAULT: All products are shown. User can deselect to focus on specific ones.
#
# The chart is a Plotly line chart with markers, allowing hover to see
# exact values. Each product gets a distinct color.
#
# LAYOUT: Chart is TALL (at least 900px, growing with the number of selected products).
# Legend is placed BELOW the chart in a scrollable area to maximize chart space.
# =============================================================================

# Build dropdown options: (display_label, material_id) pairs
products = df[['Material', 'Product_Label']].drop_duplicates().sort_values('Product_Label')
product_options = [(row['Product_Label'], row['Material']) for _, row in products.iterrows()]

# Get all material IDs for default selection (show ALL products initially)
all_material_ids = [mat_id for _, mat_id in product_options]

# Multi-select widget for product selection - DEFAULT TO ALL SELECTED
product_selector = widgets.SelectMultiple(
    options=product_options,
    value=all_material_ids,  # Default: ALL products selected
    description='Products:',
    disabled=False,
    layout=widgets.Layout(width='600px', height='250px')
)

# Buttons for quick selection
select_all_btn = widgets.Button(description='Select All', button_style='info')
clear_all_btn = widgets.Button(description='Clear All', button_style='warning')

def select_all(b):
    product_selector.value = all_material_ids
    
def clear_all(b):
    product_selector.value = []

select_all_btn.on_click(select_all)
clear_all_btn.on_click(clear_all)

# Output area for the chart
trend_output = widgets.Output()

def update_trend_chart(change):
    """
    Callback function triggered when product selection changes.
    Filters monthly_trends data and renders a new Plotly line chart.
    Each product gets a distinct color from Plotly's color palette.
    
    CHART HEIGHT: at least 900px, growing with the number of selected products.
    LEGEND: Placed below chart to maximize chart area.
    """
    with trend_output:
        trend_output.clear_output(wait=True)
        selected_materials = list(product_selector.value)
        
        if not selected_materials:
            print("Please select at least one product (or click 'Select All').")
            return
        
        # Filter to selected products only
        filtered = monthly_trends[monthly_trends['Material'].isin(selected_materials)]
        
        # Create interactive line chart with distinct colors per product
        fig = px.line(
            filtered, 
            x='PO_Month', 
            y='Weighted_Avg_Price', 
            color='Product_Label',  # Different color per product
            markers=True,           # Show data points
            title=f'Monthly Weighted Average Price Trend ({len(selected_materials)} products)',
            labels={
                'PO_Month': 'Month',
                'Weighted_Avg_Price': 'Weighted Avg Price ($)',
                'Product_Label': 'Product'
            },
            color_discrete_sequence=px.colors.qualitative.Dark24  # Use 24-color palette for many products
        )
        
        # Dynamic height: taller when more products selected
        num_products = len(selected_materials)
        chart_height = max(900, 600 + num_products * 5)  # Minimum 900px, scales with products
        
        fig.update_layout(
            xaxis_title='PO Date (Month)',
            yaxis_title='Weighted Average Price ($)',
            hovermode='x unified',  # Show all values at same x position
            height=chart_height,    # Much taller chart for readability
            # Legend at the bottom, horizontal, with scroll if needed
            legend=dict(
                orientation="h",        # Horizontal legend
                yanchor="top",
                y=-0.15,               # Position below chart
                xanchor="center",
                x=0.5,
                font=dict(size=9),
                itemsizing='constant',
                traceorder='normal'
            ),
            # Margins to accommodate legend below
            margin=dict(
                t=60,    # Top margin for title
                b=200,   # Bottom margin for legend (increased for many items)
                l=60,
                r=40
            )
        )
        
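        # Draw both lines and markers (already implied by markers=True above).
        # Clicking a legend entry to hide/show a trace is Plotly's default behavior.
        fig.update_traces(mode='lines+markers')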
        
        fig.show()

# Connect the callback to the widget
product_selector.observe(update_trend_chart, names='value')

# Display the interface
print("Select products to view their price trends:")
print("(Hold Ctrl/Cmd to select multiple, or use buttons below)")
print("TIP: Click on legend items to show/hide individual products")
display(widgets.HBox([select_all_btn, clear_all_btn]))
display(product_selector)
display(trend_output)

# Trigger initial chart render with ALL products
update_trend_chart(None)
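The dynamic-height rule in `update_trend_chart` only starts growing past the 900px floor once more than 60 products are selected, since 600 + 60 × 5 = 900. A small sketch of the same arithmetic (`chart_height` is a hypothetical name for the expression used inline in the callback):

```python
def chart_height(num_products):
    # Same expression as update_trend_chart: 900px floor, +5px per product
    return max(900, 600 + num_products * 5)


for n in (10, 60, 74, 120):
    print(n, chart_height(n))
```

With all 74 products selected (the count reported by the pivot table below), the chart renders at 970px.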

In [6]:
# =============================================================================
# MONTHLY PRICE TREND DATA TABLE
# =============================================================================
# These tables show the data behind the chart above:
# - Pivot table: one row per product, one column per month ('-' = no POs that month)
# - Detail table: one row per Product + Month combination
# - Weighted Avg Price = SUM(Price * Qty) / SUM(Qty) for that month
# - Total Qty = total units ordered that month
# =============================================================================

# Create a pivot table view: Products as rows, Months as columns
pivot_price = monthly_trends.pivot_table(
    index=['Material', 'Short Name'],
    columns='PO_Month',
    values='Weighted_Avg_Price',
    aggfunc='first'
).round(2)

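# Format prices with $ sign ('-' for months without POs).
# DataFrame.applymap is deprecated since pandas 2.1 (renamed DataFrame.map);
# mapping each column with Series.map works on all pandas versions.
pivot_price_display = pivot_price.apply(lambda col: col.map(lambda x: f'${x:.2f}' if pd.notna(x) else '-'))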

print("=== Monthly Weighted Average Price by Product ===")
print(f"(Rows: {len(pivot_price)} products, Columns: {len(pivot_price.columns)} months)\n")
display(pivot_price_display)

# Also show a detailed table sorted by product and month
print("\n\n=== Detailed Monthly Trend Data ===")
detail_table = monthly_trends[['Material', 'Short Name', 'PO_Month', 'Weighted_Avg_Price', 'Ordered_Qty']].copy()
detail_table = detail_table.sort_values(['Short Name', 'PO_Month'])
detail_table['Weighted_Avg_Price'] = detail_table['Weighted_Avg_Price'].apply(lambda x: f'${x:.2f}')
detail_table['Ordered_Qty'] = detail_table['Ordered_Qty'].apply(lambda x: f'{x:,}')
detail_table = detail_table.rename(columns={
    'Short Name': 'Product',
    'PO_Month': 'Month',
    'Weighted_Avg_Price': 'Wtd Avg Price',
    'Ordered_Qty': 'Total Qty'
})
display(detail_table)

=== Monthly Weighted Average Price by Product ===
(Rows: 74 products, Columns: 13 months)



Material,Short Name,2025-01,2025-02,2025-03,2025-04,2025-05,2025-06,2025-07,2025-08,2025-09,2025-10,2025-11,2025-12,2026-01
10012250,MNS FR BASIC LS WRK SHRT BLD BLUE STRP,$55.89,$55.93,$55.94,$56.08,-,$51.58,$51.06,$51.06,-,$51.06,$51.06,$51.06,-
10012251,MNS FR SOLID LS WRK SHRT KHAKI,$54.80,$54.85,$55.19,-,-,$50.36,$50.74,$50.74,-,$50.74,$50.74,$50.74,-
10012253,MNS FR SOLID LS WRK SHRT SILVER FOX,$54.64,$54.22,$54.67,$54.72,-,$52.75,$50.74,$50.74,-,$50.74,$50.74,$50.74,-
10013513,MNS FR BASIC LS WRK SHRT BLUE MULTI,$54.99,$54.78,-,$54.89,-,$53.61,$51.28,$51.28,-,$51.28,$51.28,$51.28,-
10014857,MNS FR GAUGE WORK SHIRT,$59.23,$58.07,$58.68,$59.65,-,$57.39,$55.49,$55.49,-,$55.49,$55.49,$55.49,-
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
20001046,MNS FR SOLID LS WRK SHRT NAVY,-,$40.34,-,-,-,$40.34,-,-,-,-,-,-,-
20001047,MNS FR SOLID LS WRK SHRT KHAKI,-,$37.42,-,-,-,-,-,-,-,-,-,-,-
20001050,MNS FR AIR INHRNT WRK SHR CHAR HTHR901,-,-,-,-,-,$48.36,-,-,$48.36,-,-,-,-
20001051,MNS FR SOLID LS WRK SHRT SILVER FOX,-,$37.42,-,-,-,-,-,-,-,-,-,-,-




=== Detailed Monthly Trend Data ===


,Material,Product,Month,Wtd Avg Price,Total Qty
242,20000684,FR Vented LS Wrk Shrt Malbec 35432,2025-02,$50.34,23
243,20000684,FR Vented LS Wrk Shrt Malbec 35432,2025-06,$50.34,29
244,20000684,FR Vented LS Wrk Shrt Malbec 35432,2025-09,$50.34,1
239,20000683,FR Vented LS Wrk Shrt Stell Blue 35433,2025-02,$50.34,25
240,20000683,FR Vented LS Wrk Shrt Stell Blue 35433,2025-06,$50.34,19
...,...,...,...,...,...
237,20000556,WMS FR FTHRLGT LS WRKSHRT GNMTL 0335,2025-09,$28.53,2
203,10079699,WMS FR GAUGE LS WRK SHRT WHT MLT,2025-12,$49.32,1152
186,10071321,WMS FR PAIGE LOGO LS WRK SHRT CSTL FJRD,2025-07,$34.58,812
178,10071082,WMS FR RUTH LS SNP WRK SHRT INDG,2025-07,$40.20,987
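The products-by-months table is a long-to-wide pivot of `monthly_trends`. A minimal sketch on a few rows whose values are copied from the pivot output above (only three months and two Materials are kept), showing how a month with no POs becomes `'-'` after formatting:

```python
import pandas as pd

# Long format: one row per Material + month (subset of the output above)
long = pd.DataFrame({
    'Material': [10012250, 10012250, 10012250, 10013513, 10013513],
    'Short Name': ['MNS FR BASIC LS WRK SHRT BLD BLUE STRP'] * 3
                  + ['MNS FR BASIC LS WRK SHRT BLUE MULTI'] * 2,
    'PO_Month': ['2025-02', '2025-03', '2025-04', '2025-02', '2025-04'],
    'Weighted_Avg_Price': [55.93, 55.94, 56.08, 54.78, 54.89],
})

# Wide format: rows = products, columns = months; missing combinations are NaN
wide = long.pivot_table(index=['Material', 'Short Name'], columns='PO_Month',
                        values='Weighted_Avg_Price', aggfunc='first')

# Column-wise Series.map avoids the DataFrame.applymap deprecation
wide_display = wide.apply(lambda col: col.map(lambda x: f'${x:.2f}' if pd.notna(x) else '-'))
print(wide_display)
```

`aggfunc='first'` is safe here because `monthly_trends` already holds exactly one value per product and month.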


## Section 3: Weighted Average Price Calculator

Calculate weighted average price for a selected date range, optionally filtered by product.
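The filtering the Calculate button applies can be sketched as a standalone function. `filter_po_lines` and the three toy rows are hypothetical; the logic mirrors the callback: the end date is inclusive because `PO_Date` values carry no time of day, `'ALL'` anywhere in the selection disables the product filter, and an empty selection matches nothing.

```python
import datetime as dt

import pandas as pd


def filter_po_lines(data, start_date, end_date, selected):
    """Hypothetical helper mirroring the Calculate callback's filters."""
    start, end = pd.Timestamp(start_date), pd.Timestamp(end_date)
    # Midnight timestamps: '<= end' keeps POs dated exactly on the end day
    out = data[(data['PO_Date'] >= start) & (data['PO_Date'] <= end)]
    if 'ALL' not in selected:
        out = out[out['Material'].isin(selected)]
    return out


data = pd.DataFrame({
    'PO_Date': pd.to_datetime(['2025-01-29', '2025-02-28', '2025-03-01']),
    'Material': [111, 222, 111],
})

range_start, range_end = dt.date(2025, 1, 29), dt.date(2025, 2, 28)
all_rows = filter_po_lines(data, range_start, range_end, ['ALL'])
one_product = filter_po_lines(data, range_start, range_end, [111])
nothing = filter_po_lines(data, range_start, range_end, [])
print(len(all_rows), len(one_product), len(nothing))  # 2 1 0
```

If `PO_Date` ever carried a time component, POs later in the day on the end date would fall outside `<= end`; comparing against `end + pd.Timedelta(days=1)` with `<` would be the usual fix.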

In [7]:
# =============================================================================
# WEIGHTED AVERAGE CALCULATOR - INTERACTIVE
# =============================================================================
# This section allows you to:
# 1. Select a date range (start and end dates)
# 2. Optionally filter to specific products
# 3. Calculate weighted average price for each product in that range
#
# Output includes:
# - Summary table with Weighted Avg, Total Quantity, Total FOB Value
# - Excel-format table showing monthly weighted average prices within the selected range
# - Excel-format table showing monthly ordered quantities within the selected range
# =============================================================================

# Get date boundaries from the data
min_date = df['PO_Date'].min().date()
max_date = df['PO_Date'].max().date()

# Date picker widgets
start_date_picker = widgets.DatePicker(
    description='Start Date:',
    value=min_date,
    disabled=False
)

end_date_picker = widgets.DatePicker(
    description='End Date:',
    value=max_date,
    disabled=False
)

# Product filter - includes "All Products" option
product_filter = widgets.SelectMultiple(
    options=[('All Products', 'ALL')] + product_options,
    value=['ALL'],  # Default to all products
    description='Products:',
    layout=widgets.Layout(width='600px', height='150px')
)

# Button to trigger calculation
calc_button = widgets.Button(description='Calculate', button_style='primary')
calc_output = widgets.Output()

def calculate_weighted_avg(b):
    """
    Calculate and display weighted average prices for selected criteria.
    
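    Outputs:
    1. Summary table with overall weighted averages per product
    2. Excel-format table showing monthly weighted average prices within the selected date range
    3. Excel-format table showing monthly ordered quantities within the selected date range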
    """
    with calc_output:
        calc_output.clear_output(wait=True)
        
        # Convert widget dates to pandas Timestamps for comparison
        start = pd.Timestamp(start_date_picker.value)
        end = pd.Timestamp(end_date_picker.value)
        
        # Filter by date range
        filtered = df[(df['PO_Date'] >= start) & (df['PO_Date'] <= end)].copy()
        
        # Filter by products if specific ones selected (not 'ALL')
        selected_products = list(product_filter.value)
        if 'ALL' not in selected_products:
            filtered = filtered[filtered['Material'].isin(selected_products)]
        
        if len(filtered) == 0:
            print("No data found for the selected criteria.")
            return
        
        # =================================================================
        # HEADER - Show selected date range prominently
        # =================================================================
        print("=" * 80)
        print(f"WEIGHTED AVERAGE PRICE ANALYSIS")
        print("=" * 80)
        print(f"Selected Date Range: {start.strftime('%Y-%m-%d')} to {end.strftime('%Y-%m-%d')}")
        print(f"Total PO Line Items: {len(filtered)}")
        print(f"Unique Products: {filtered['Material'].nunique()}")
        print("=" * 80)
        
        # =================================================================
        # TABLE 1: Overall Weighted Average Summary
        # =================================================================
        summary = filtered.groupby(['Material', 'Short Name']).agg({
            'Total_Value': 'sum',   # Numerator for weighted avg
            'Ordered_Qty': 'sum'    # Denominator for weighted avg
        }).reset_index()
        
        summary['Weighted_Avg_Price'] = summary['Total_Value'] / summary['Ordered_Qty']
        summary = summary.rename(columns={
            'Material': 'Material ID',
            'Short Name': 'Product Name',
            'Ordered_Qty': 'Total Quantity',
            'Total_Value': 'Total FOB Value ($)',
            'Weighted_Avg_Price': 'Weighted Avg Price ($)'
        })
        
        summary = summary.sort_values('Product Name')
        
        print(f"\n--- OVERALL WEIGHTED AVERAGES (for entire date range) ---\n")
        
        # Format numbers for display
        display_df = summary.copy()
        display_df['Weighted Avg Price ($)'] = display_df['Weighted Avg Price ($)'].apply(lambda x: f'${x:.2f}')
        display_df['Total FOB Value ($)'] = display_df['Total FOB Value ($)'].apply(lambda x: f'${x:,.2f}')
        display_df['Total Quantity'] = display_df['Total Quantity'].apply(lambda x: f'{x:,}')
        
        display(display_df)
        
        # =================================================================
        # TABLE 2: Monthly Trend (Excel-format pivot table)
        # =================================================================
        print(f"\n\n--- MONTHLY PRICE TREND (within selected date range) ---")
        print(f"Date Range: {start.strftime('%Y-%m-%d')} to {end.strftime('%Y-%m-%d')}\n")
        
        # Calculate monthly weighted averages within the filtered data
        monthly = filtered.groupby(['Material', 'Short Name', 'PO_Month']).agg({
            'Total_Value': 'sum',
            'Ordered_Qty': 'sum'
        }).reset_index()
        monthly['Monthly_Avg'] = monthly['Total_Value'] / monthly['Ordered_Qty']
        
        # Create pivot table: rows = products, columns = months
        pivot_price = monthly.pivot_table(
            index=['Material', 'Short Name'],
            columns='PO_Month',
            values='Monthly_Avg',
            aggfunc='first'
        ).round(2)
        
        # Overall weighted avg per product for the last column, formatted like the
        # monthly cells (summary still holds the unformatted numbers)
        overall_avg = summary.set_index(['Material ID', 'Product Name'])['Weighted Avg Price ($)'].map(lambda x: f'${x:.2f}')
        
        # Format prices with $ sign for display
        pivot_display = pivot_price.copy()
        for col in pivot_display.columns:
            pivot_display[col] = pivot_display[col].apply(lambda x: f'${x:.2f}' if pd.notna(x) else '-')
        
        # Add overall weighted average as the last column
        pivot_display['Overall Wtd Avg'] = overall_avg
        
        print("Excel-Format Table: Monthly Weighted Average Price by Product")
        print("(Each cell = weighted avg price for that product in that month)")
        print("")
        display(pivot_display)
        
        # =================================================================
        # TABLE 3: Monthly Quantity Trend
        # =================================================================
        print(f"\n\n--- MONTHLY QUANTITY TREND (within selected date range) ---")
        print(f"Date Range: {start.strftime('%Y-%m-%d')} to {end.strftime('%Y-%m-%d')}\n")
        
        pivot_qty = monthly.pivot_table(
            index=['Material', 'Short Name'],
            columns='PO_Month',
            values='Ordered_Qty',
            aggfunc='sum'
        ).fillna(0).astype(int)
        
        # Total qty per product for the last column, with thousands separators
        total_qty = summary.set_index(['Material ID', 'Product Name'])['Total Quantity'].map(lambda x: f'{x:,}')
        
        # Format quantities
        pivot_qty_display = pivot_qty.copy()
        for col in pivot_qty_display.columns:
            pivot_qty_display[col] = pivot_qty_display[col].apply(lambda x: f'{x:,}' if x > 0 else '-')
        
        pivot_qty_display['Total Qty'] = total_qty
        
        print("Excel-Format Table: Monthly Quantity Ordered by Product")
        print("(Each cell = total quantity ordered for that product in that month)")
        print("")
        display(pivot_qty_display)

# Connect button click to callback
calc_button.on_click(calculate_weighted_avg)

# Display the interface
print("Select date range and products to calculate weighted average prices:")
display(widgets.HBox([start_date_picker, end_date_picker]))
print("\nFilter by products (select 'All Products' or specific products):")
display(product_filter)
display(calc_button)
display(calc_output)

Select date range and products to calculate weighted average prices:


HBox(children=(DatePicker(value=datetime.date(2025, 1, 29), description='Start Date:', step=1), DatePicker(val…


Filter by products (select 'All Products' or specific products):


SelectMultiple(description='Products:', index=(0,), layout=Layout(height='150px', width='600px'), options=(('A…

Button(button_style='primary', description='Calculate', style=ButtonStyle())

Output()
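The "Overall Wtd Avg" column is recomputed from all line items in the range, so it is generally not the plain mean of the monthly columns: months with more units pull it harder. A small pure-Python check using the first two months of Material 10012250, with totals copied from the `monthly_trends` output in Section 2:

```python
# (Total_Value, Ordered_Qty) per month for Material 10012250, from the output above
months = {
    '2025-01': (67403.34, 1206),
    '2025-02': (33166.49, 593),
}

monthly_avgs = {m: value / qty for m, (value, qty) in months.items()}

# Overall weighted average: pool the numerators and denominators first
overall = sum(v for v, _ in months.values()) / sum(q for _, q in months.values())

# Unweighted mean of the monthly averages, for comparison only
mean_of_monthly = sum(monthly_avgs.values()) / len(monthly_avgs)

print({m: round(a, 2) for m, a in monthly_avgs.items()})  # {'2025-01': 55.89, '2025-02': 55.93}
print(round(overall, 2), round(mean_of_monthly, 2))        # 55.9 55.91
```

The gap is a cent here, but it widens when monthly quantities differ a lot, which is why the calculator pools `Total_Value` and `Ordered_Qty` instead of averaging the monthly prices.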

## Section 4: Deviation Analysis

Calculate how individual line items deviate from the weighted average price.
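The two deviation measures defined in the next cell are Dollar Deviation = Price - Weighted Average and Percent Deviation = Dollar Deviation / Weighted Average × 100. A minimal sketch, assuming the weighted average is taken over the same lines being compared (the notebook uses each product's weighted average over the selected date range); `deviations` is a hypothetical helper and the prices reuse the $50 / $45 example from Section 2:

```python
def deviations(prices, quantities):
    """Return (dollar, percent) deviation of each line from the weighted average."""
    wavg = sum(p * q for p, q in zip(prices, quantities)) / sum(quantities)
    return [(p - wavg, (p - wavg) / wavg * 100) for p in prices]


devs = deviations([50.0, 45.0], [10, 100])
for dollar, pct in devs:
    print(f"{dollar:+.2f} ({pct:+.1f}%)")
# +4.55 (+10.0%)
# -0.45 (-1.0%)
```

Positive values are lines bought above the average, negative values below it; the large order sits close to the average because it dominates the weights.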

In [8]:
# =============================================================================
# DEVIATION ANALYSIS - INTERACTIVE
# =============================================================================
# This section shows how individual PO line items deviate from the weighted
# average price for their product.
#
# For each PO line item, we calculate:
#   Dollar Deviation = Item Price - Weighted Average
#   Percent Deviation = (Dollar Deviation / Weighted Average) * 100
#
# OUTPUT TABLES:
# 1. Monthly Deviation Summary (Excel format) - Shows each product's monthly
#    weighted avg price deviation from the overall weighted avg
# 2. Individual PO Line Item Deviations - Detailed list of each PO
#
# NOTE: After PO-level aggregation, each row represents one PO + Material
# combination (sizes are already aggregated together).
# =============================================================================

# Date pickers (separate from Section 3 to allow different ranges)
dev_start_picker = widgets.DatePicker(
    description='Start Date:',
    value=min_date,
    disabled=False
)

dev_end_picker = widgets.DatePicker(
    description='End Date:',
    value=max_date,
    disabled=False
)

# Product filter
dev_product_filter = widgets.SelectMultiple(
    options=[('All Products', 'ALL')] + product_options,
    value=['ALL'],
    description='Products:',
    layout=widgets.Layout(width='600px', height='150px')
)

# Threshold filter dropdown
threshold_type = widgets.Dropdown(
    options=[
        ('All Items', 'all'),           # Show everything
        ('Above Average', 'above'),     # Items with positive deviation
        ('Below Average', 'below')      # Items with negative deviation
    ],
    value='all',
    description='Filter:'
)

# Threshold amount (only applies when above/below selected)
threshold_value = widgets.FloatText(
    value=0.0,
    description='Threshold ($):',
    disabled=False
)

dev_button = widgets.Button(description='Analyze Deviations', button_style='primary')
dev_output = widgets.Output()

def analyze_deviations(b):
    """
    Calculate deviation for each PO line item from its product's weighted average.
    
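    Output:
    1-3. Monthly summary tables (Excel format)
       - Rows: Products (Material ID + Short Name)
       - Columns: Months
       - Values: monthly weighted avg deviation from the overall weighted avg
         in dollars (Table 1) and percent (Table 2), plus the monthly
         weighted avg prices themselves (Table 3)
    
    4. Individual PO line items with their dollar and percent deviations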
    """
    with dev_output:
        dev_output.clear_output(wait=True)
        
        start = pd.Timestamp(dev_start_picker.value)
        end = pd.Timestamp(dev_end_picker.value)
        
        # Filter by date
        filtered = df[(df['PO_Date'] >= start) & (df['PO_Date'] <= end)].copy()
        
        # Filter by products if not 'ALL'
        selected_products = list(dev_product_filter.value)
        if 'ALL' not in selected_products:
            filtered = filtered[filtered['Material'].isin(selected_products)]
        
        if len(filtered) == 0:
            print("No data found for the selected criteria.")
            return
        
        # =================================================================
        # HEADER
        # =================================================================
        print("=" * 80)
        print("DEVIATION ANALYSIS")
        print("=" * 80)
        print(f"Selected Date Range: {start.strftime('%Y-%m-%d')} to {end.strftime('%Y-%m-%d')}")
        print(f"Total PO Line Items: {len(filtered)}")
        print(f"Unique Products: {filtered['Material'].nunique()}")
        print("=" * 80)
        
        # =================================================================
        # Calculate OVERALL weighted average per product (for the entire date range)
        # This is the baseline to compare monthly averages against
        # =================================================================
        product_avgs = filtered.groupby(['Material', 'Short Name']).agg({
            'Total_Value': 'sum',
            'Ordered_Qty': 'sum'
        }).reset_index()
        product_avgs['Overall_Weighted_Avg'] = product_avgs['Total_Value'] / product_avgs['Ordered_Qty']
        
        # =================================================================
        # TABLE 1: MONTHLY DEVIATION SUMMARY (Excel Format)
        # Shows how each month's weighted avg deviates from overall weighted avg
        # =================================================================
        print("\n--- MONTHLY PRICE DEVIATION FROM OVERALL WEIGHTED AVERAGE ---")
        print(f"Date Range: {start.strftime('%Y-%m-%d')} to {end.strftime('%Y-%m-%d')}")
        print("(Positive = month's avg was ABOVE overall avg, Negative = BELOW)\n")
        
        # Calculate monthly weighted averages
        monthly = filtered.groupby(['Material', 'Short Name', 'PO_Month']).agg({
            'Total_Value': 'sum',
            'Ordered_Qty': 'sum'
        }).reset_index()
        monthly['Monthly_Avg'] = monthly['Total_Value'] / monthly['Ordered_Qty']
        
        # Merge with overall averages
        monthly = monthly.merge(
            product_avgs[['Material', 'Overall_Weighted_Avg']], 
            on='Material'
        )
        
        # Calculate deviation: Monthly Avg - Overall Avg
        monthly['Monthly_Deviation'] = monthly['Monthly_Avg'] - monthly['Overall_Weighted_Avg']
        monthly['Monthly_Deviation_Pct'] = (monthly['Monthly_Deviation'] / monthly['Overall_Weighted_Avg']) * 100
        
        # Create pivot table for DOLLAR deviation
        pivot_dev = monthly.pivot_table(
            index=['Material', 'Short Name'],
            columns='PO_Month',
            values='Monthly_Deviation',
            aggfunc='first'
        ).round(2)
        
        # Overall weighted avg per product, appended as the last column of Tables 1 and 3
        overall_avg_series = product_avgs.set_index(['Material', 'Short Name'])['Overall_Weighted_Avg'].round(2)
        
        # Format the deviation values with +/- signs ('-' = no orders that month)
        def fmt_signed_dollar(x):
            if pd.isna(x):
                return '-'
            if x > 0:
                return f'+${x:.2f}'
            if x < 0:
                return f'-${abs(x):.2f}'
            return '$0.00'
        pivot_dev_display = pivot_dev.copy()
        for col in pivot_dev_display.columns:
            pivot_dev_display[col] = pivot_dev_display[col].apply(fmt_signed_dollar)
        
        # Add overall weighted avg column
        pivot_dev_display['Overall Wtd Avg'] = overall_avg_series.apply(lambda x: f'${x:.2f}')
        
        print("Table 1: Monthly Deviation in DOLLARS from Overall Weighted Average")
        print("(Last column shows the overall weighted avg for reference)\n")
        display(pivot_dev_display)
        
        # Create pivot table for PERCENTAGE deviation
        pivot_pct = monthly.pivot_table(
            index=['Material', 'Short Name'],
            columns='PO_Month',
            values='Monthly_Deviation_Pct',
            aggfunc='first'
        ).round(2)
        
        # Format percentage values with +/- signs ('-' = no orders that month)
        def fmt_signed_pct(x):
            if pd.isna(x):
                return '-'
            if x > 0:
                return f'+{x:.1f}%'
            if x < 0:
                return f'{x:.1f}%'
            return '0.0%'
        pivot_pct_display = pivot_pct.copy()
        for col in pivot_pct_display.columns:
            pivot_pct_display[col] = pivot_pct_display[col].apply(fmt_signed_pct)
        
        print("\n\nTable 2: Monthly Deviation in PERCENTAGE from Overall Weighted Average\n")
        display(pivot_pct_display)
        
        # =================================================================
        # TABLE 3: Monthly Weighted Average Prices (for context)
        # =================================================================
        pivot_price = monthly.pivot_table(
            index=['Material', 'Short Name'],
            columns='PO_Month',
            values='Monthly_Avg',
            aggfunc='first'
        ).round(2)
        
        pivot_price_display = pivot_price.copy()
        for col in pivot_price_display.columns:
            pivot_price_display[col] = pivot_price_display[col].apply(
                lambda x: f'${x:.2f}' if pd.notna(x) else '-'
            )
        pivot_price_display['Overall Wtd Avg'] = overall_avg_series.apply(lambda x: f'${x:.2f}')
        
        print("\n\nTable 3: Monthly Weighted Average Prices (for reference)\n")
        display(pivot_price_display)
        
        # =================================================================
        # TABLE 4: INDIVIDUAL PO LINE ITEM DEVIATIONS
        # =================================================================
        print("\n" + "=" * 80)
        print("INDIVIDUAL PO LINE ITEM DEVIATIONS")
        print("=" * 80)
        
        # Merge weighted averages back to each line item
        result = filtered.merge(product_avgs[['Material', 'Overall_Weighted_Avg']], on='Material')
        
        # Calculate deviations for individual items
        result['Dollar_Deviation'] = result['Purchase_Price'] - result['Overall_Weighted_Avg']
        result['Pct_Deviation'] = (result['Dollar_Deviation'] / result['Overall_Weighted_Avg']) * 100
        
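        # Apply threshold filter (inclusive: 'above' keeps Dev ($) >= threshold,
        # 'below' keeps Dev ($) <= -threshold; a $0 threshold also keeps items exactly at average)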
        if threshold_type.value == 'above':
            result = result[result['Dollar_Deviation'] >= threshold_value.value]
        elif threshold_type.value == 'below':
            result = result[result['Dollar_Deviation'] <= -threshold_value.value]
        
        # Prepare display dataframe
        display_cols = [
            'PO#', 'Material', 'Short Name', 'PO_Date', 
            'Purchase_Price', 'Overall_Weighted_Avg', 'Dollar_Deviation', 'Pct_Deviation', 'Ordered_Qty'
        ]
        result_display = result[display_cols].copy()
        result_display = result_display.rename(columns={
            'Short Name': 'Product',
            'PO_Date': 'PO Date',
            'Purchase_Price': 'Unit Price',
            'Overall_Weighted_Avg': 'Wtd Avg',
            'Dollar_Deviation': 'Dev ($)',
            'Pct_Deviation': 'Dev (%)',
            'Ordered_Qty': 'Qty'
        })
        
        # Sort by deviation (largest positive first)
        result_display = result_display.sort_values('Dev ($)', ascending=False)
        
        # Display results header
        print(f"Filter: {threshold_type.value}")
        if threshold_type.value != 'all':
            print(f"Threshold: ${threshold_value.value:.2f}")
        print(f"Results: {len(result_display)} PO line items")
        print(f"Total Quantity: {result['Ordered_Qty'].sum():,} units\n")
        
        # Format for display
        result_display['PO Date'] = result_display['PO Date'].dt.strftime('%Y-%m-%d')
        result_display['Unit Price'] = result_display['Unit Price'].apply(lambda x: f'${x:.2f}')
        result_display['Wtd Avg'] = result_display['Wtd Avg'].apply(lambda x: f'${x:.2f}')
        result_display['Dev ($)'] = result_display['Dev ($)'].apply(
            lambda x: f'+${x:.2f}' if x >= 0 else f'-${abs(x):.2f}'
        )
        result_display['Dev (%)'] = result_display['Dev (%)'].apply(
            lambda x: f'+{x:.2f}%' if x >= 0 else f'{x:.2f}%'
        )
        result_display['Qty'] = result_display['Qty'].apply(lambda x: f'{x:,}')
        
        # Show first 100 rows (for performance)
        display(result_display.head(100))
        
        if len(result_display) > 100:
            print(f"\n... showing first 100 of {len(result_display)} results")

dev_button.on_click(analyze_deviations)

# Display the interface
print("Analyze how PO line items deviate from the weighted average price:")
print("(Each row = one PO + Material combination, sizes already aggregated)")
display(widgets.HBox([dev_start_picker, dev_end_picker]))
print("\nFilter by products:")
display(dev_product_filter)
print("\nDeviation threshold filter (for individual PO table):")
display(widgets.HBox([threshold_type, threshold_value]))
display(dev_button)
display(dev_output)

Analyze how PO line items deviate from the weighted average price:
(Each row = one PO + Material combination, sizes already aggregated)

Filter by products:

Deviation threshold filter (for individual PO table):
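
Table 1 compares each month's weighted average with the product's overall weighted average for the whole range. Below is a minimal sketch of that calculation on hypothetical data. It assumes `PO_Month` is a month label like the notebook's column.

```python
import pandas as pd

# Hypothetical monthly data for one product (made-up numbers)
lines = pd.DataFrame({
    "Material": ["A", "A", "A"],
    "PO_Month": ["2025-01", "2025-02", "2025-02"],
    "Total_Value": [1000.0, 550.0, 600.0],
    "Ordered_Qty": [100, 50, 50],
})

# Overall weighted average per product across the whole date range
overall = lines.groupby("Material")[["Total_Value", "Ordered_Qty"]].sum()
overall_avg = overall["Total_Value"] / overall["Ordered_Qty"]

# Weighted average per product per month
monthly = lines.groupby(["Material", "PO_Month"], as_index=False)[["Total_Value", "Ordered_Qty"]].sum()
monthly["Monthly_Avg"] = monthly["Total_Value"] / monthly["Ordered_Qty"]

# Each month's deviation from the overall average, pivoted to products x months
monthly["Monthly_Deviation"] = monthly["Monthly_Avg"] - monthly["Material"].map(overall_avg)
pivot = monthly.pivot_table(index="Material", columns="PO_Month",
                            values="Monthly_Deviation", aggfunc="first")
print(pivot.round(2))
```

January averages $10.00 and February $11.50 against an overall $10.75, so the pivot shows -0.75 and +0.75 for the two months.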

## Section 5: Deviation Banding

Group PO line items into $1 deviation bands (unit price minus the product's weighted average) and chart how much quantity falls in each band.
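
The banding rule in `assign_band` reduces to two pieces: which side of the average the item falls on (below if the deviation is negative, otherwise above) and its whole-dollar magnitude `floor(|deviation|)`, capped at 5. Below is a sketch of an array-based equivalent. `assign_band_vectorized` is a hypothetical helper, not part of the notebook, and it assumes the deviations contain no NaN values.

```python
import numpy as np
import pandas as pd

def assign_band_vectorized(dev: pd.Series) -> pd.Series:
    """Hypothetical array-based counterpart of assign_band() (not in the notebook).

    Assumes `dev` holds dollar deviations with no NaN values.
    """
    # Whole-dollar magnitude of the deviation, capped at 5 for the "$5+" bands
    k = np.minimum(np.floor(dev.abs()), 5).astype(int)
    # Negative deviations are "below"; zero and positive are "above" (as in assign_band)
    side = pd.Series(np.where(dev < 0, " below", " above"), index=dev.index)
    # "$k to $k+1" for k < 5, "$5+" otherwise
    mag = ("$" + k.astype(str) + " to $" + (k + 1).astype(str)).where(k < 5, "$5+")
    return mag + side

# Same inputs as the examples in the assign_band docstring
deviations = pd.Series([-6.50, -2.30, 0.50, 3.99])
print(assign_band_vectorized(deviations).tolist())
```

Inside `generate_banding`, a helper like this could replace the row-by-row `apply`, e.g. `result['Band'] = assign_band_vectorized(result['Dollar_Deviation'])`. Flooring the magnitude reproduces the original boundaries: an exact -$1.00 and an exact +$1.00 both land in a "$1 to $2" band.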

In [None]:
# =============================================================================
# DEVIATION BANDING - HISTOGRAM AND PIE CHART
# =============================================================================
# This section categorizes PO line items into $1 deviation bands and visualizes
# the distribution using:
#   1. Histogram: TOTAL QUANTITY in each band (not count of items!)
#   2. Pie Chart: Percentage distribution by quantity across bands
#
# WHY TOTAL QUANTITY? Because what matters is "HOW MUCH YOU BOUGHT AT THAT PRICE".
# A single PO for 10,000 units at a bad price matters more than 10 POs of 100 units
# each (1,000 units total) at the same price.
#
# Bands range from "$5+ below" to "$5+ above" in $1 increments.
#
# COLORS:
#   GREEN = Below average (savings - you paid less than average!)
#   RED = Above average (loss - you paid more than average!)
# =============================================================================

# Date pickers
band_start_picker = widgets.DatePicker(
    description='Start Date:',
    value=min_date,
    disabled=False
)

band_end_picker = widgets.DatePicker(
    description='End Date:',
    value=max_date,
    disabled=False
)

# Product filter
band_product_filter = widgets.SelectMultiple(
    options=[('All Products', 'ALL')] + product_options,
    value=['ALL'],
    description='Products:',
    layout=widgets.Layout(width='600px', height='150px')
)

band_button = widgets.Button(description='Generate Banding Charts', button_style='primary')
band_output = widgets.Output()

def assign_band(deviation):
    """
    Assign a dollar deviation value to a named band.
    
    Bands are $1 wide, from "$5+ below" to "$5+ above".
    
    Examples:
        deviation = -6.50  -> "$5+ below"
        deviation = -2.30  -> "$2 to $3 below"
        deviation = 0.50   -> "$0 to $1 above"
        deviation = 3.99   -> "$3 to $4 above"
    """
    if deviation <= -5:
        return '$5+ below'
    elif deviation <= -4:
        return '$4 to $5 below'
    elif deviation <= -3:
        return '$3 to $4 below'
    elif deviation <= -2:
        return '$2 to $3 below'
    elif deviation <= -1:
        return '$1 to $2 below'
    elif deviation < 0:
        return '$0 to $1 below'
    elif deviation < 1:
        return '$0 to $1 above'
    elif deviation < 2:
        return '$1 to $2 above'
    elif deviation < 3:
        return '$2 to $3 above'
    elif deviation < 4:
        return '$3 to $4 above'
    elif deviation < 5:
        return '$4 to $5 above'
    else:
        return '$5+ above'

# Define band order for consistent chart display (furthest below average -> furthest above)
band_order = [
    '$5+ below', '$4 to $5 below', '$3 to $4 below', '$2 to $3 below',
    '$1 to $2 below', '$0 to $1 below', '$0 to $1 above', '$1 to $2 above',
    '$2 to $3 above', '$3 to $4 above', '$4 to $5 above', '$5+ above'
]

def generate_banding(b):
    """
    Generate histogram and pie chart showing deviation band distribution.
    
    IMPORTANT: Uses TOTAL QUANTITY per band, not count of line items!
    This reflects the actual volume of purchasing at each price deviation level.
    """
    with band_output:
        band_output.clear_output(wait=True)
        
        start = pd.Timestamp(band_start_picker.value)
        end = pd.Timestamp(band_end_picker.value)
        
        # Filter by date
        filtered = df[(df['PO_Date'] >= start) & (df['PO_Date'] <= end)].copy()
        
        # Filter by products if not 'ALL'
        selected_products = list(band_product_filter.value)
        if 'ALL' not in selected_products:
            filtered = filtered[filtered['Material'].isin(selected_products)]
        
        if len(filtered) == 0:
            print("No data found for the selected criteria.")
            return
        
        # Calculate weighted average per product
        product_avgs = filtered.groupby('Material').agg({
            'Total_Value': 'sum',
            'Ordered_Qty': 'sum'
        }).reset_index()
        product_avgs['Weighted_Avg'] = product_avgs['Total_Value'] / product_avgs['Ordered_Qty']
        
        # Calculate deviations
        result = filtered.merge(product_avgs[['Material', 'Weighted_Avg']], on='Material')
        result['Dollar_Deviation'] = result['Purchase_Price'] - result['Weighted_Avg']
        
        # Assign each item to a band
        result['Band'] = result['Dollar_Deviation'].apply(assign_band)
        
        # ---------------------------------------------------------------------
        # SUM TOTAL QUANTITY per band (not count!)
        # This shows HOW MUCH was purchased at each deviation level
        # ---------------------------------------------------------------------
        band_qty = result.groupby('Band')['Ordered_Qty'].sum().reindex(band_order, fill_value=0).reset_index()
        band_qty.columns = ['Band', 'Total_Qty']
        
        # Also get count for reference
        band_counts = result['Band'].value_counts().reindex(band_order, fill_value=0).reset_index()
        band_counts.columns = ['Band', 'PO_Count']
        
        # Merge for display
        band_summary = band_qty.merge(band_counts, on='Band')
        
        print(f"\n=== Deviation Banding Analysis ===")
        print(f"Date Range: {start.strftime('%Y-%m-%d')} to {end.strftime('%Y-%m-%d')}")
        print(f"Total PO Line Items: {len(result)}")
        print(f"Total Quantity: {result['Ordered_Qty'].sum():,} units\n")
        
        # ---------------------------------------------------------------------
        # HISTOGRAM - BY TOTAL QUANTITY
        # ---------------------------------------------------------------------
        # GREEN for "below average" bands (savings - you paid less!)
        # RED for "above average" bands (loss - you paid more!)
        colors = ['#2ca02c'] * 6 + ['#d62728'] * 6  # 6 below (green), 6 above (red)

        fig_hist = go.Figure(data=[
            go.Bar(
                x=band_summary['Band'],
                y=band_summary['Total_Qty'],
                marker_color=colors,
                text=band_summary['Total_Qty'].apply(lambda x: f'{x:,}'),
                textposition='outside'
            )
        ])

        # Calculate appropriate y-axis max for auto-scaling
        max_qty = band_summary['Total_Qty'].max()
        y_max = max_qty * 1.2 if max_qty > 0 else 100  # 20% headroom for text labels

        fig_hist.update_layout(
            title='Deviation Banding by TOTAL QUANTITY Purchased<br><sub>Green = Below Average (Savings) | Red = Above Average (Paid More)</sub>',
            xaxis_title='Deviation Band',
            yaxis_title='Total Quantity (units)',
            xaxis_tickangle=-45,
            height=500,
            # Fixed y-axis range derived from the data; the 20% headroom keeps the
            # outside text labels from being clipped
            yaxis=dict(
                range=[0, y_max],
                autorange=False  # Use our calculated range
            ),
            # Ensure the chart fits well
            margin=dict(t=80, b=100)  # Top margin for subtitle, bottom for rotated labels
        )

        fig_hist.show()
        
        # ---------------------------------------------------------------------
        # PIE CHART - BY TOTAL QUANTITY
        # ---------------------------------------------------------------------
        # Use same color mapping: green for below, red for above
        pie_colors = {
            '$5+ below': '#2ca02c', '$4 to $5 below': '#2ca02c', '$3 to $4 below': '#2ca02c',
            '$2 to $3 below': '#2ca02c', '$1 to $2 below': '#2ca02c', '$0 to $1 below': '#2ca02c',
            '$0 to $1 above': '#d62728', '$1 to $2 above': '#d62728', '$2 to $3 above': '#d62728',
            '$3 to $4 above': '#d62728', '$4 to $5 above': '#d62728', '$5+ above': '#d62728'
        }
        
        pie_data = band_summary[band_summary['Total_Qty'] > 0]
        
        fig_pie = go.Figure(data=[
            go.Pie(
                labels=pie_data['Band'],
                values=pie_data['Total_Qty'],
                marker_colors=[pie_colors.get(band, '#888888') for band in pie_data['Band']],
                sort=False
            )
        ])
        
        fig_pie.update_layout(
            title='Quantity Distribution by Deviation Band<br><sub>Green = Savings | Red = Paid More</sub>',
            height=500
        )
        fig_pie.show()
        
        # Summary table with both count and quantity
        print("\nBand Summary:")
        band_summary['Qty %'] = (band_summary['Total_Qty'] / band_summary['Total_Qty'].sum() * 100).round(1).astype(str) + '%'
        band_summary['Total_Qty'] = band_summary['Total_Qty'].apply(lambda x: f'{x:,}')
        display(band_summary)
        
        # Additional stats
        below_qty = result[result['Dollar_Deviation'] < 0]['Ordered_Qty'].sum()
        above_qty = result[result['Dollar_Deviation'] >= 0]['Ordered_Qty'].sum()
        total_qty = result['Ordered_Qty'].sum()
        print(f"\nSummary:")
        print(f"  Quantity bought BELOW average: {below_qty:,} ({below_qty/total_qty*100:.1f}%) - GREEN = SAVINGS")
        print(f"  Quantity bought AT/ABOVE average: {above_qty:,} ({above_qty/total_qty*100:.1f}%) - RED = PAID MORE")

band_button.on_click(generate_banding)

# Display the interface
print("Categorize purchases into deviation bands by TOTAL QUANTITY:")
print("(Shows HOW MUCH you bought at each price deviation level)")
print("COLOR KEY: Green = Below Avg (Savings) | Red = Above Avg (Paid More)")
display(widgets.HBox([band_start_picker, band_end_picker]))
print("\nFilter by products:")
display(band_product_filter)
display(band_button)
display(band_output)
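
A small hypothetical example checks the rationale for summing quantity instead of counting line items. It pits one 10,000-unit PO above average against ten 100-unit POs below average.

```python
import pandas as pd

# Hypothetical line items: one large PO above average, ten small POs below average
lines = pd.DataFrame({
    "Band": ["$1 to $2 above"] + ["$0 to $1 below"] * 10,
    "Ordered_Qty": [10_000] + [100] * 10,
})

# Share of LINE ITEMS per band vs share of UNITS per band
by_count = lines["Band"].value_counts()
by_qty = lines.groupby("Band")["Ordered_Qty"].sum()

print((by_count / by_count.sum() * 100).round(1))
print((by_qty / by_qty.sum() * 100).round(1))
```

By count, 10 of the 11 lines (about 91%) are below average. By quantity, about 91% of the units were bought above average, and that quantity view is what the banding charts present.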

## Section 6: Benchmark Comparison

Compare weighted average prices between two date ranges to identify price changes and estimate their dollar impact on comparison-period quantities.
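
The gain/loss arithmetic used in this section can be sketched with hypothetical per-product averages. Product C appears only in the baseline and product D only in the comparison period. The outer join therefore leaves their deltas as NaN, and `sum()` skips them.

```python
import pandas as pd

# Hypothetical per-product weighted averages for each period (made-up numbers)
baseline = pd.DataFrame({"Material": ["A", "B", "C"], "Baseline_Avg": [10.00, 5.00, 8.00]})
compare = pd.DataFrame({
    "Material": ["A", "B", "D"],
    "Compare_Avg": [10.50, 4.80, 7.00],
    "Compare_Qty": [1_000, 500, 300],
})

# Outer join keeps products seen in only one period; their deltas come out NaN
result = baseline.merge(compare, on="Material", how="outer")
result["Price_Change"] = result["Compare_Avg"] - result["Baseline_Avg"]

# Positive = paid more than baseline (loss); negative = paid less (savings)
result["Gain_Loss"] = result["Price_Change"] * result["Compare_Qty"]

print(result)
# Series.sum() skips NaN, so C and D do not affect the net figure
print(f"Net Gain/Loss: ${result['Gain_Loss'].sum():,.2f}")
```

Product A contributes +$500.00 ($0.50 more on 1,000 units) and product B contributes -$100.00 ($0.20 less on 500 units). The net result is +$400.00 relative to the baseline.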

In [None]:
# =============================================================================
# BENCHMARK COMPARISON - COMPARE TWO DATE RANGES
# =============================================================================
# This section allows you to compare pricing between two time periods:
#   - Baseline Period: Your reference/historical period (used as price benchmark)
#   - Comparison Period: The period you want to analyze against baseline
#
# OUTPUT TABLES:
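# 1. Product-by-Product Comparison with Gain/Loss calculation
#    - Price Delta = Comparison weighted avg - Baseline weighted avg
#    - Gain/Loss = Price Delta × Comparison Period Quantity
#      (positive = paid MORE than baseline, i.e. a loss; negative = savings)
#    - Total Gain/Loss across all products present in BOTH periods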
#
# 2. Monthly Comparison vs Baseline Benchmark
#    - Uses Baseline weighted avg price as the benchmark
#    - Shows each comparison month's delta and gain/loss vs benchmark
# =============================================================================

# Baseline date range
print("=== Baseline Date Range ===")
baseline_start = widgets.DatePicker(
    description='Start:',
    value=min_date,
    disabled=False
)

baseline_end = widgets.DatePicker(
    description='End:',
    value=max_date,
    disabled=False
)

# Comparison date range
print("=== Comparison Date Range ===")
compare_start = widgets.DatePicker(
    description='Start:',
    value=min_date,
    disabled=False
)

compare_end = widgets.DatePicker(
    description='End:',
    value=max_date,
    disabled=False
)

# Product filter
benchmark_product_filter = widgets.SelectMultiple(
    options=[('All Products', 'ALL')] + product_options,
    value=['ALL'],
    description='Products:',
    layout=widgets.Layout(width='600px', height='150px')
)

benchmark_button = widgets.Button(description='Compare Ranges', button_style='primary')
benchmark_output = widgets.Output()

def compare_benchmarks(b):
    """
    Compare weighted average prices between two date ranges.
    
    Shows:
    1. Period summaries
    2. Product-by-product comparison with Gain/Loss
    3. Monthly breakdown vs baseline benchmark
    """
    with benchmark_output:
        benchmark_output.clear_output(wait=True)
        
        # Get date ranges from widgets
        b_start = pd.Timestamp(baseline_start.value)
        b_end = pd.Timestamp(baseline_end.value)
        c_start = pd.Timestamp(compare_start.value)
        c_end = pd.Timestamp(compare_end.value)
        
        # Filter baseline data
        baseline = df[(df['PO_Date'] >= b_start) & (df['PO_Date'] <= b_end)].copy()
        
        # Filter comparison data
        comparison = df[(df['PO_Date'] >= c_start) & (df['PO_Date'] <= c_end)].copy()
        
        # Filter by products if not 'ALL'
        selected_products = list(benchmark_product_filter.value)
        if 'ALL' not in selected_products:
            baseline = baseline[baseline['Material'].isin(selected_products)]
            comparison = comparison[comparison['Material'].isin(selected_products)]
        
        # =================================================================
        # BASELINE PERIOD SUMMARY
        # =================================================================
        print("=" * 70)
        print("BASELINE PERIOD (Price Benchmark)")
        print("=" * 70)
        print(f"Date Range: {b_start.strftime('%Y-%m-%d')} to {b_end.strftime('%Y-%m-%d')}")
        
        if len(baseline) == 0:
            print("  NO DATA in baseline period!")
            print("=" * 70)
        else:
            baseline_total_qty = baseline['Ordered_Qty'].sum()
            baseline_total_value = baseline['Total_Value'].sum()
            baseline_overall_avg = baseline_total_value / baseline_total_qty
            
            print(f"  PO Line Items:    {len(baseline):,}")
            print(f"  Unique POs:       {baseline['PO#'].nunique():,}")
            print(f"  Total Quantity:   {baseline_total_qty:,} units")
            print(f"  Total FOB Value:  ${baseline_total_value:,.2f}")
            print(f"  Overall Wtd Avg:  ${baseline_overall_avg:.2f}")
            print("=" * 70)
        
        # =================================================================
        # COMPARISON PERIOD SUMMARY
        # =================================================================
        print("\nCOMPARISON PERIOD")
        print("=" * 70)
        print(f"Date Range: {c_start.strftime('%Y-%m-%d')} to {c_end.strftime('%Y-%m-%d')}")
        
        if len(comparison) == 0:
            print("  NO DATA in comparison period!")
            print("=" * 70)
        else:
            compare_total_qty = comparison['Ordered_Qty'].sum()
            compare_total_value = comparison['Total_Value'].sum()
            compare_overall_avg = compare_total_value / compare_total_qty
            
            print(f"  PO Line Items:    {len(comparison):,}")
            print(f"  Unique POs:       {comparison['PO#'].nunique():,}")
            print(f"  Total Quantity:   {compare_total_qty:,} units")
            print(f"  Total FOB Value:  ${compare_total_value:,.2f}")
            print(f"  Overall Wtd Avg:  ${compare_overall_avg:.2f}")
            print("=" * 70)
        
        if len(baseline) == 0 or len(comparison) == 0:
            print("\nCannot compare - one or both periods have no data.")
            return
        
        # =================================================================
        # OVERALL CHANGE SUMMARY
        # =================================================================
        print("\nOVERALL CHANGE")
        print("-" * 70)
        price_change = compare_overall_avg - baseline_overall_avg
        price_change_pct = (price_change / baseline_overall_avg) * 100
        
        print(f"  Price Change:     ${price_change:+.2f} ({price_change_pct:+.1f}%)")
        print("-" * 70)
        
        # =================================================================
        # TABLE 1: PRODUCT-BY-PRODUCT COMPARISON WITH GAIN/LOSS
        # =================================================================
        print("\n" + "=" * 70)
        print("TABLE 1: PRODUCT-BY-PRODUCT COMPARISON WITH GAIN/LOSS")
        print("=" * 70)
        print("Gain/Loss = (Comparison Price - Baseline Price) × Comparison Qty")
        print("Positive = you paid MORE than baseline (loss)")
        print("Negative = you paid LESS than baseline (savings)")
        
        # Calculate weighted averages for BASELINE period
        baseline_avg = baseline.groupby(['Material', 'Short Name']).agg({
            'Total_Value': 'sum',
            'Ordered_Qty': 'sum',
            'PO#': 'nunique'
        }).reset_index()
        baseline_avg['Baseline_Avg'] = baseline_avg['Total_Value'] / baseline_avg['Ordered_Qty']
        baseline_avg['Baseline_Qty'] = baseline_avg['Ordered_Qty']
        baseline_avg['Baseline_POs'] = baseline_avg['PO#']
        baseline_avg = baseline_avg[['Material', 'Short Name', 'Baseline_Avg', 'Baseline_Qty', 'Baseline_POs']]
        
        # Calculate weighted averages for COMPARISON period
        compare_avg = comparison.groupby(['Material', 'Short Name']).agg({
            'Total_Value': 'sum',
            'Ordered_Qty': 'sum',
            'PO#': 'nunique'
        }).reset_index()
        compare_avg['Compare_Avg'] = compare_avg['Total_Value'] / compare_avg['Ordered_Qty']
        compare_avg['Compare_Qty'] = compare_avg['Ordered_Qty']
        compare_avg['Compare_POs'] = compare_avg['PO#']
        compare_avg = compare_avg[['Material', 'Short Name', 'Compare_Avg', 'Compare_Qty', 'Compare_POs']]
        
        # Merge results (outer join to include products in only one period)
        result = baseline_avg.merge(compare_avg, on=['Material', 'Short Name'], how='outer')
        
        # Calculate changes
        result['Price_Change'] = result['Compare_Avg'] - result['Baseline_Avg']
        result['Price_Change_Pct'] = (result['Price_Change'] / result['Baseline_Avg']) * 100
        
        # GAIN/LOSS CALCULATION: Price Delta × Comparison Quantity
        # Positive = paid more than baseline (loss), Negative = paid less (savings)
        result['Gain_Loss'] = result['Price_Change'] * result['Compare_Qty']
        
        # Sort by absolute gain/loss (largest impact first)
        result = result.sort_values('Gain_Loss', ascending=False, key=lambda x: x.abs(), na_position='last')
        
        # Count products by change direction
        valid_changes = result.dropna(subset=['Price_Change'])
        increased = len(valid_changes[valid_changes['Price_Change'] > 0.01])
        decreased = len(valid_changes[valid_changes['Price_Change'] < -0.01])
        unchanged = len(valid_changes[(valid_changes['Price_Change'] >= -0.01) & (valid_changes['Price_Change'] <= 0.01)])
        only_baseline = len(result[result['Compare_Avg'].isna()])
        only_compare = len(result[result['Baseline_Avg'].isna()])
        
        print(f"\nProducts with price INCREASE:  {increased}")
        print(f"Products with price DECREASE:  {decreased}")
        print(f"Products UNCHANGED:            {unchanged}")
        if only_baseline > 0:
            print(f"Products ONLY in baseline:     {only_baseline}")
        if only_compare > 0:
            print(f"Products ONLY in comparison:   {only_compare}")
        
        # Format for display
        display_df = result.copy()
        display_df = display_df.rename(columns={
            'Short Name': 'Product',
            'Baseline_Avg': 'Base Price',
            'Compare_Avg': 'Comp Price',
            'Price_Change': 'Price Δ',
            'Price_Change_Pct': 'Price Δ%',
            'Baseline_Qty': 'Base Qty',
            'Compare_Qty': 'Comp Qty',
            'Baseline_POs': 'Base POs',
            'Compare_POs': 'Comp POs',
            'Gain_Loss': 'Gain/Loss ($)'
        })
        
        # Format numbers
        display_df['Base Price'] = display_df['Base Price'].apply(lambda x: f'${x:.2f}' if pd.notna(x) else '-')
        display_df['Comp Price'] = display_df['Comp Price'].apply(lambda x: f'${x:.2f}' if pd.notna(x) else '-')
        display_df['Price Δ'] = display_df['Price Δ'].apply(
            lambda x: f'+${x:.2f}' if pd.notna(x) and x > 0 else (f'-${abs(x):.2f}' if pd.notna(x) and x < 0 else ('$0.00' if pd.notna(x) else '-'))
        )
        display_df['Price Δ%'] = display_df['Price Δ%'].apply(
            lambda x: f'+{x:.1f}%' if pd.notna(x) and x > 0 else (f'{x:.1f}%' if pd.notna(x) else '-')
        )
        display_df['Base Qty'] = display_df['Base Qty'].apply(lambda x: f'{int(x):,}' if pd.notna(x) else '-')
        display_df['Comp Qty'] = display_df['Comp Qty'].apply(lambda x: f'{int(x):,}' if pd.notna(x) else '-')
        display_df['Base POs'] = display_df['Base POs'].apply(lambda x: f'{int(x)}' if pd.notna(x) else '-')
        display_df['Comp POs'] = display_df['Comp POs'].apply(lambda x: f'{int(x)}' if pd.notna(x) else '-')
        display_df['Gain/Loss ($)'] = display_df['Gain/Loss ($)'].apply(
            lambda x: f'+${x:,.2f}' if pd.notna(x) and x > 0 else (f'-${abs(x):,.2f}' if pd.notna(x) and x < 0 else ('$0.00' if pd.notna(x) else '-'))
        )
        
        print("\n")
        display(display_df[['Material', 'Product', 'Base POs', 'Base Price', 'Base Qty', 
                           'Comp POs', 'Comp Price', 'Comp Qty', 'Price Δ', 'Price Δ%', 'Gain/Loss ($)']])
        
        # =================================================================
        # TOTAL GAIN/LOSS SUMMARY
        # =================================================================
        # Positive Gain_Loss = paid more than the baseline price (a loss);
        # negative Gain_Loss = paid less than the baseline price (a saving)
        total_gain_loss = result['Gain_Loss'].sum()
        total_losses = result.loc[result['Gain_Loss'] > 0, 'Gain_Loss'].sum()
        total_savings = result.loc[result['Gain_Loss'] < 0, 'Gain_Loss'].sum()
        
        print("\n" + "=" * 70)
        print("TOTAL GAIN/LOSS SUMMARY")
        print("=" * 70)
        if total_gain_loss >= 0:
            print(f"  TOTAL GAIN/LOSS:     +${total_gain_loss:,.2f} (you paid MORE than baseline)")
        else:
            print(f"  TOTAL GAIN/LOSS:     -${abs(total_gain_loss):,.2f} (you SAVED vs baseline)")
        print(f"  Total Losses:        +${total_losses:,.2f}")
        print(f"  Total Savings:       -${abs(total_savings):,.2f}")
        print("=" * 70)
        
        # =================================================================
        # TABLE 2: MONTHLY COMPARISON VS BASELINE BENCHMARK
        # =================================================================
        print("\n\n" + "=" * 70)
        print("TABLE 2: MONTHLY COMPARISON VS BASELINE BENCHMARK")
        print("=" * 70)
        print("Baseline weighted avg price is used as the benchmark for each product.")
        print("Each column shows a month in the comparison period.")
        print("")
        
        # Benchmark = baseline weighted-average price per product (total value / total qty)
        baseline_benchmark = baseline.groupby(['Material', 'Short Name']).agg({
            'Total_Value': 'sum',
            'Ordered_Qty': 'sum'
        }).reset_index()
        baseline_benchmark['Benchmark_Price'] = baseline_benchmark['Total_Value'] / baseline_benchmark['Ordered_Qty']
        baseline_benchmark = baseline_benchmark[['Material', 'Short Name', 'Benchmark_Price']]
        
        # Calculate monthly weighted averages and quantities for comparison period
        monthly_comp = comparison.groupby(['Material', 'Short Name', 'PO_Month']).agg({
            'Total_Value': 'sum',
            'Ordered_Qty': 'sum'
        }).reset_index()
        monthly_comp['Monthly_Avg'] = monthly_comp['Total_Value'] / monthly_comp['Ordered_Qty']
        
        # Left join on product: products with no baseline purchases get a NaN benchmark
        monthly_comp = monthly_comp.merge(baseline_benchmark, on=['Material', 'Short Name'], how='left')
        
        # Calculate delta vs benchmark
        monthly_comp['Delta_vs_Benchmark'] = monthly_comp['Monthly_Avg'] - monthly_comp['Benchmark_Price']
        
        # Calculate gain/loss for each month: Delta × Monthly Qty
        monthly_comp['Monthly_Gain_Loss'] = monthly_comp['Delta_vs_Benchmark'] * monthly_comp['Ordered_Qty']
        
        # -----------------------------------------------------------------
        # TABLE 2A: Price Delta vs Baseline Benchmark (by month)
        # -----------------------------------------------------------------
        print("Table 2A: Price Delta vs Baseline Benchmark ($ difference)")
        print("(Comparison Period Monthly Price - Baseline Benchmark Price)\n")
        
        pivot_delta = monthly_comp.pivot_table(
            index=['Material', 'Short Name'],
            columns='PO_Month',
            values='Delta_vs_Benchmark',
            aggfunc='first'
        ).round(2)
        
        # Add benchmark price as first column
        benchmark_series = baseline_benchmark.set_index(['Material', 'Short Name'])['Benchmark_Price'].round(2)
        
        pivot_delta_display = pivot_delta.copy()
        for col in pivot_delta_display.columns:
            pivot_delta_display[col] = pivot_delta_display[col].apply(
                lambda x: f'+${x:.2f}' if pd.notna(x) and x > 0 else (f'-${abs(x):.2f}' if pd.notna(x) and x < 0 else ('-' if pd.isna(x) else '$0.00'))
            )
        
        # Insert benchmark price column at the beginning
        pivot_delta_display.insert(0, 'Benchmark Price', benchmark_series.apply(lambda x: f'${x:.2f}' if pd.notna(x) else '-'))
        
        display(pivot_delta_display)
        
        # -----------------------------------------------------------------
        # TABLE 2B: Gain/Loss vs Baseline Benchmark (by month)
        # -----------------------------------------------------------------
        print("\n\nTable 2B: Gain/Loss vs Baseline Benchmark ($ amount = Delta × Qty)")
        print("(Positive = paid more than benchmark, Negative = savings)\n")
        
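        # Each (product, month) group holds exactly one row, so 'first' returns that row's value.
        # Unlike 'sum', it does not turn a missing benchmark (NaN) into $0.00, which keeps
        # this table consistent with Table 2A.
        pivot_gainloss = monthly_comp.pivot_table(
            index=['Material', 'Short Name'],
            columns='PO_Month',
            values='Monthly_Gain_Loss',
            aggfunc='first'
        ).round(2)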
        
        pivot_gainloss_display = pivot_gainloss.copy()
        for col in pivot_gainloss_display.columns:
            pivot_gainloss_display[col] = pivot_gainloss_display[col].apply(
                lambda x: f'+${x:,.2f}' if pd.notna(x) and x > 0 else (f'-${abs(x):,.2f}' if pd.notna(x) and x < 0 else ('-' if pd.isna(x) else '$0.00'))
            )
        
        # Add row total (total gain/loss per product across all months)
        product_total_gainloss = pivot_gainloss.sum(axis=1).round(2)
        pivot_gainloss_display['TOTAL'] = product_total_gainloss.apply(
            lambda x: f'+${x:,.2f}' if x > 0 else (f'-${abs(x):,.2f}' if x < 0 else '$0.00')
        )
        
        display(pivot_gainloss_display)
        
        # Monthly totals
        print("\n\nMonthly Gain/Loss Totals:")
        monthly_totals = pivot_gainloss.sum(axis=0)
        for month in monthly_totals.index:
            val = monthly_totals[month]
            if val >= 0:
                print(f"  {month}: +${val:,.2f}")
            else:
                print(f"  {month}: -${abs(val):,.2f}")
        
        grand_total = monthly_totals.sum()
        print("-" * 40)
        if grand_total >= 0:
            print(f"  GRAND TOTAL: +${grand_total:,.2f}")
        else:
            print(f"  GRAND TOTAL: -${abs(grand_total):,.2f}")

benchmark_button.on_click(compare_benchmarks)

# Display the interface
print("Compare pricing between two date ranges:\n")
print("BASELINE Period (your price benchmark):")
display(widgets.HBox([baseline_start, baseline_end]))
print("\nCOMPARISON Period (analyze against baseline benchmark):")
display(widgets.HBox([compare_start, compare_end]))
print("\nFilter by products:")
display(benchmark_product_filter)
display(benchmark_button)
display(benchmark_output)
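
The Table 2 arithmetic (baseline weighted-average benchmark, monthly delta, and gain/loss = delta × quantity) can be checked by hand on a small dataset. The sketch below is an illustration, not the notebook's code: the two-product data is invented, and the helper names `weighted_benchmark` and `monthly_gain_loss` are hypothetical. It assumes only the column names the cell above already uses (`Material`, `Short Name`, `PO_Month`, `Total_Value`, `Ordered_Qty`).

```python
import pandas as pd

# Toy data, invented for illustration; column names match the notebook's
baseline = pd.DataFrame({
    'Material':    ['A', 'A', 'B'],
    'Short Name':  ['Widget', 'Widget', 'Gadget'],
    'Total_Value': [100.0, 420.0, 50.0],   # 10 @ $10, 30 @ $14, 5 @ $10
    'Ordered_Qty': [10, 30, 5],
})
comparison = pd.DataFrame({
    'Material':    ['A', 'A', 'B'],
    'Short Name':  ['Widget', 'Widget', 'Gadget'],
    'PO_Month':    ['2024-06', '2024-07', '2024-06'],
    'Total_Value': [140.0, 240.0, 45.0],   # 10 @ $14, 20 @ $12, 5 @ $9
    'Ordered_Qty': [10, 20, 5],
})

KEYS = ['Material', 'Short Name']

def weighted_benchmark(baseline):
    """Hypothetical helper: baseline weighted-average price per product, sum(value) / sum(qty)."""
    agg = baseline.groupby(KEYS, as_index=False)[['Total_Value', 'Ordered_Qty']].sum()
    agg['Benchmark_Price'] = agg['Total_Value'] / agg['Ordered_Qty']
    return agg[KEYS + ['Benchmark_Price']]

def monthly_gain_loss(baseline, comparison):
    """Hypothetical helper: per product-month weighted price, delta vs benchmark, delta x qty."""
    monthly = comparison.groupby(KEYS + ['PO_Month'], as_index=False)[['Total_Value', 'Ordered_Qty']].sum()
    monthly['Monthly_Avg'] = monthly['Total_Value'] / monthly['Ordered_Qty']
    monthly = monthly.merge(weighted_benchmark(baseline), on=KEYS, how='left')
    monthly['Delta_vs_Benchmark'] = monthly['Monthly_Avg'] - monthly['Benchmark_Price']
    monthly['Monthly_Gain_Loss'] = monthly['Delta_vs_Benchmark'] * monthly['Ordered_Qty']
    return monthly

monthly = monthly_gain_loss(baseline, comparison)
# Widget benchmark = (100 + 420) / (10 + 30) = $13.00
#   June: ($14.00 - $13.00) x 10 = +$10.00    July: ($12.00 - $13.00) x 20 = -$20.00
# Gadget benchmark = $10.00
#   June: ($9.00 - $10.00) x 5 = -$5.00
table_2b = monthly.pivot_table(index=KEYS, columns='PO_Month',
                               values='Monthly_Gain_Loss', aggfunc='first')
print(table_2b)
print('Grand total:', table_2b.sum().sum())   # -15.0
```

Note that the benchmark weights each baseline PO line by its quantity, so the 30 units bought at $14 pull the Widget benchmark well above the simple mean of $12.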

---

## Quick Reference

| Section | Purpose |
|---------|----------|
| Section 2 | View monthly price trends per product |
| Section 3 | Calculate weighted average prices for a date range |
| Section 4 | Analyze line item deviations from average |
| Section 5 | Visualize deviation distribution with banding |
| Section 6 | Compare prices between two periods |
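
Several Section 6 outputs (the `Price Δ` and `Gain/Loss ($)` columns and the Table 2A/2B cells) apply the same signed-dollar rule through separate inline lambdas: a leading `+` or `-` before the dollar sign, `$0.00` for exactly zero, and `-` for a missing value. A minimal sketch of that rule as one function follows; `format_signed_dollars` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

def format_signed_dollars(x, thousands=True):
    """Hypothetical helper: +$1,234.50 / -$0.40 / $0.00, or '-' when the value is missing.

    thousands=True adds comma separators (as in the Gain/Loss columns);
    thousands=False matches the per-unit price columns.
    """
    if pd.isna(x):
        return '-'
    spec = ',.2f' if thousands else '.2f'
    if x > 0:
        return f'+${x:{spec}}'
    if x < 0:
        return f'-${abs(x):{spec}}'
    return '$0.00'
```

Used with pandas, it would be applied as `series.apply(format_signed_dollars)` for dollar totals, or `series.apply(format_signed_dollars, thousands=False)` for per-unit deltas, since `Series.apply` forwards extra keyword arguments to the function.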