%md
## Building `instacart.silver.dim_product` (Product Dimension)

This step creates a **lightweight, analytics-ready product dimension** that captures product-level ordering and reorder behavior.  
It transforms detailed transactional data into compact, reusable product features suitable for reporting and machine learning.

---

### What This Code Does

**Input Sources:** `bronze.products`, `bronze.order_products_prior`

The process performs the following steps:

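1. **Read product and order-product data** from the bronze layer (`bronze.products` and `bronze.order_products_prior`), keeping every product via a `LEFT JOIN`.  
2. **Aggregate by `product_id`** to compute key metrics:
   - **Total orders** (`total_orders`) and **reorder rate** (`reorder_rate`, the share of order lines flagged as reorders) → measures of product popularity and loyalty.  
   - **Frequently reordered flag** (`is_frequently_reordered`) → `TRUE` when the reorder rate exceeds 0.5.  
   - **Average add-to-cart position** (`avg_add_to_cart_order`) → proxy for product importance in the basket sequence.  
   - **Pack density** (`pack_density`) → order lines for the product divided by the distinct orders containing it. It equals 1.0 whenever each (order, product) pair occurs at most once in the source, and rises above 1.0 only when a product is listed more than once in the same order.  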
3. **Persist results** as a Delta table in the silver schema for optimized analytical queries.
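
The aggregation in step 2 can be sketched on toy data. The example below is a minimal illustration that uses Python's built-in `sqlite3` as a stand-in for Spark SQL. The rows and product names are made up, and it counts `op.product_id` rather than `*` on the assumption that a product with no order lines should report 0 orders under a `LEFT JOIN`.

```python
import sqlite3

# In-memory database standing in for the bronze tables (toy data, not real Instacart rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER, product_name TEXT);
CREATE TABLE order_products (
    order_id INTEGER, product_id INTEGER,
    add_to_cart_order INTEGER, reordered INTEGER
);
INSERT INTO products VALUES (1, 'Banana'), (2, 'Oat Milk'), (3, 'Saffron');
INSERT INTO order_products VALUES
    (100, 1, 1, 0), (101, 1, 2, 1), (102, 1, 1, 1),
    (100, 2, 2, 0), (102, 2, 3, 0);
""")

rows = conn.execute("""
SELECT
    p.product_id,
    COUNT(op.product_id)       AS total_orders,          -- 0 when nothing matched
    AVG(op.reordered)          AS reorder_rate,          -- NULL when nothing matched
    AVG(op.add_to_cart_order)  AS avg_add_to_cart_order,
    COUNT(op.product_id) * 1.0
        / NULLIF(COUNT(DISTINCT op.order_id), 0) AS pack_density
FROM products p
LEFT JOIN order_products op ON p.product_id = op.product_id
GROUP BY p.product_id, p.product_name
ORDER BY p.product_id
""").fetchall()

# Map product_id -> (total_orders, reorder_rate, avg_add_to_cart_order, pack_density)
metrics = {row[0]: row[1:] for row in rows}
for product_id, values in metrics.items():
    print(product_id, values)
```

In this toy data, product 3 has no order lines, so its count is 0 and the averaged metrics come back as NULL (`None` in Python) instead of misleading numbers.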

---

### Why We Use a Surrogate Key (`product_sk`)

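- Gives fact tables a **single numeric join key** and a column that Delta `OPTIMIZE ... ZORDER BY` can cluster on.  
- Provides a **numeric identifier that is independent of upstream product IDs and naming conventions**. Because `product_sk` is generated by the identity column at insert time, a full `INSERT OVERWRITE` rebuild may assign different values, so tables that store `product_sk` should be rebuilt together with this dimension.  
- Keeps clustering choices simple: there is one numeric column to cluster on instead of free-text product names.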

---

### Why This Modeling Helps

- **Compression:** Reduces item-level transactional data to one summarized row per product, enabling faster lookups and smaller storage.  
- **Analytical readiness:** Facilitates normalization or z-scoring of reorder and position metrics for use in ML pipelines.  
- **Actionable insights:** Supports questions such as:
  - Which products are most frequently reordered?  
  - What items tend to be added early vs. late in the cart?  
  - Which products serve as anchor items for larger baskets?  
- **Scalability:** Provides a reusable and consistent product-level reference for joining with fact tables or modeling product performance trends.
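
As a rough illustration of the z-scoring mentioned under analytical readiness, the sketch below standardizes a handful of reorder rates with the Python standard library. The helper `z_scores` and the sample values are hypothetical; in a real pipeline the values would come from `reorder_rate` in `dim_product`, with NULL rates (products that were never ordered) filtered out first.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to zero mean and unit population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so there is no spread to scale by.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical reorder rates for three products.
reorder_rates = [0.2, 0.5, 0.8]
z = z_scores(reorder_rates)
print(z)
```

The same helper applies unchanged to `avg_add_to_cart_order`. Using the population standard deviation is a design choice here; a sample standard deviation (`statistics.stdev`) would also be reasonable.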

---

**Result:**  
A **`dim_product` Delta table** with one row per product that consolidates ordering and reorder patterns into an analytics-friendly structure.


In [0]:
%sql
USE CATALOG instacart;

In [0]:
%sql
-- Create silver schema if not already done
CREATE SCHEMA IF NOT EXISTS instacart.silver
LOCATION 'abfss://processed-data@datastorage00578.dfs.core.windows.net/Instacart/silver/';

-- Build the dim_product table
CREATE TABLE IF NOT EXISTS instacart.silver.dim_product (
  product_sk BIGINT GENERATED ALWAYS AS IDENTITY,
  product_id INT,
  product_name STRING,
  total_orders BIGINT,
  reorder_rate FLOAT,
  avg_add_to_cart_order FLOAT,
  is_frequently_reordered BOOLEAN,
  pack_density FLOAT
)
USING DELTA
LOCATION 'abfss://processed-data@datastorage00578.dfs.core.windows.net/Instacart/silver/dim_product';

-- Create temp views from bronze data
CREATE OR REPLACE TEMP VIEW v_products AS
SELECT product_id, product_name
FROM instacart.bronze.products;

CREATE OR REPLACE TEMP VIEW v_order_products AS
SELECT order_id, product_id, add_to_cart_order, reordered
FROM instacart.bronze.order_products_prior;

-- Aggregate product-level behavior
INSERT OVERWRITE instacart.silver.dim_product (
  product_id,
  product_name,
  total_orders,
  reorder_rate,
  avg_add_to_cart_order,
  is_frequently_reordered,
  pack_density
)
SELECT
  p.product_id,
  p.product_name,
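  -- Count matched order lines only, so products with no orders get 0 instead of 1
  COUNT(op.product_id)                    AS total_orders,
  AVG(CAST(op.reordered AS FLOAT))        AS reorder_rate,
  AVG(op.add_to_cart_order)               AS avg_add_to_cart_order,
  -- A NULL reorder rate (no orders) falls through to FALSE
  CASE WHEN AVG(CAST(op.reordered AS FLOAT)) > 0.5 THEN TRUE ELSE FALSE END AS is_frequently_reordered,
  -- Order lines per distinct order; TRY_DIVIDE returns NULL when there are no orders
  TRY_DIVIDE(COUNT(op.product_id), COUNT(DISTINCT op.order_id)) AS pack_density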
FROM v_products p
LEFT JOIN v_order_products op
  ON p.product_id = op.product_id
GROUP BY p.product_id, p.product_name;


In [0]:
%sql
-- Preview the first rows of the dimension
SELECT * FROM instacart.silver.dim_product LIMIT 20;

In [0]:
%sql
-- Row count; expected to match the number of rows in bronze.products
SELECT COUNT(*) FROM instacart.silver.dim_product;