We will modularize the Day 2 ecommerce pipeline into three separate notebooks (Bronze, Silver, Gold) and wire them together using a parameterized orchestrator notebook and a Databricks Job.

# Notebook Structure We Are Building
Workspace folder: /Users/you@email.com/ecommerce/

  00_orchestrator.ipynb     <- Master notebook: reads params, calls all three steps
  01_bronze_ingest.ipynb    <- Reads raw CSV, writes Bronze Delta table
  02_silver_clean.ipynb     <- Reads Bronze, validates, deduplicates, writes Silver
  03_gold_features.ipynb    <- Reads Silver, builds user feature table, writes Gold
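
The orchestrator's job, calling the three steps in order, can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the notebook's actual code: the relative paths in `STEPS`, the helper name `run_pipeline`, and the convention that every child notebook exits with a string starting with `SUCCESS` are all hypothetical. The runner is passed in as an argument so the logic can be tried outside Databricks; inside a notebook you would pass `dbutils.notebook.run`.

```python
from typing import Callable, Dict, List

# Notebook names come from the folder listing above; the relative paths are an assumption.
STEPS: List[str] = ['./01_bronze_ingest', './02_silver_clean', './03_gold_features']


def run_pipeline(
    run_notebook: Callable[[str, int, Dict[str, str]], str],
    params: Dict[str, str],
    timeout_seconds: int = 3600,
) -> Dict[str, str]:
    """Run each step in order and stop at the first one that does not report success."""
    results: Dict[str, str] = {}
    for path in STEPS:
        # Same argument order as dbutils.notebook.run(path, timeout_seconds, arguments)
        result = run_notebook(path, timeout_seconds, params)
        results[path] = result
        # Assumed convention: each child notebook exits with a string starting with 'SUCCESS'
        if not str(result).startswith('SUCCESS'):
            raise RuntimeError(f'{path} did not report success: {result!r}')
    return results


# Inside Databricks the call would look like:
# run_pipeline(dbutils.notebook.run,
#              {'catalog': 'ecommerce', 'run_date': '2026-02-25', 'write_mode': 'overwrite'})
```

Stopping at the first failing step keeps Silver and Gold from running on a Bronze table that was not refreshed.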


# Add Widget Parameters to Notebook
> What We Are Doing
> - Add widgets to the bronze ingest notebook so it accepts three parameters: the Unity Catalog name, a processing date, and a write mode. This makes the notebook reusable: the same notebook processes October data today and November data tomorrow, driven by Job parameters.


In [0]:
# Define all widget parameters at the TOP of the notebook
# Always define widgets in the very first cell — Jobs inject parameters before execution

dbutils.widgets.text(
    name         = 'catalog',
    defaultValue = 'ecommerce',
    label        = 'Unity Catalog Name'
)

dbutils.widgets.text(
    name         = 'run_date',
    defaultValue = '2026-02-25',
    label        = 'Processing Date (YYYY-MM-DD)'
)

dbutils.widgets.dropdown(
    name         = 'write_mode',
    defaultValue = 'overwrite',
    choices      = ['overwrite', 'append'],
    label        = 'Write Mode'
)

print('Widgets registered successfully')


Widgets registered successfully


In [0]:
# Read widget values
# IMPORTANT: All widget values are strings. Cast explicitly whenever a non-string type is needed
# (the three values below are used as strings, so no cast is required here)

CATALOG    = dbutils.widgets.get('catalog')
RUN_DATE   = dbutils.widgets.get('run_date')
WRITE_MODE = dbutils.widgets.get('write_mode')

print(f'CATALOG:    {CATALOG}')
print(f'RUN_DATE:   {RUN_DATE}')
print(f'WRITE_MODE: {WRITE_MODE}')


# Optional: make sure the target schema exists before writing
# spark.sql(f'CREATE SCHEMA IF NOT EXISTS {CATALOG}.bronze')


CATALOG:    ecommerce
RUN_DATE:   
WRITE_MODE: overwrite
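
Because every widget value arrives as a string, and the output above shows RUN_DATE printing as empty, it may be worth validating the values before any data is written. The following is a minimal sketch, not part of the original notebook: the helper names `parse_run_date` and `check_write_mode` are hypothetical, and the only assumptions are the YYYY-MM-DD format from the widget label and the two dropdown choices.

```python
from datetime import date, datetime

# Same choices as the write_mode dropdown widget
ALLOWED_WRITE_MODES = ('overwrite', 'append')


def parse_run_date(raw: str) -> date:
    """Convert a YYYY-MM-DD widget string to a date, rejecting empty or malformed input."""
    if not raw or not raw.strip():
        raise ValueError('run_date is empty; expected YYYY-MM-DD')
    # strptime raises ValueError when the string does not match the format
    return datetime.strptime(raw.strip(), '%Y-%m-%d').date()


def check_write_mode(raw: str) -> str:
    """Return the write mode unchanged if it is one of the allowed choices."""
    if raw not in ALLOWED_WRITE_MODES:
        raise ValueError(f'write_mode must be one of {ALLOWED_WRITE_MODES}, got {raw!r}')
    return raw


# In the notebook this would be applied to the widget values, for example:
# RUN_DATE = parse_run_date(dbutils.widgets.get('run_date'))
# WRITE_MODE = check_write_mode(dbutils.widgets.get('write_mode'))
```

Failing fast here means a Job started with a missing or mistyped date stops before it tags any Bronze rows.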


In [0]:
# Bronze ingestion (same logic as your Day 2 Cell 7, now driven by widget values)
from pyspark.sql.functions import current_timestamp, lit

# Source events, stored in Delta format in a Unity Catalog volume, before they are loaded into the Bronze layer
DELTA_PATH = '/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta/events/'

# The source is already Delta, so the CSV/JSON parser options ('mode',
# 'columnNameOfCorruptRecord') do not apply and are omitted here
bronze_df = spark.read.format('delta').load(DELTA_PATH)

bronze_df = bronze_df \
    .withColumn('_ingested_at', current_timestamp()) \
    .withColumn('_run_date', lit(RUN_DATE))   # tag with the processing date

bronze_df.write.format('delta') \
    .mode(WRITE_MODE) \
    .saveAsTable(f'{CATALOG}.bronze.events_br')

record_count = bronze_df.count()
print(f'Bronze ingestion complete: {record_count:,} records written')

# Return success signal to the orchestrator
dbutils.notebook.exit(f'SUCCESS: {record_count} records')
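
On the orchestrator side, the string passed to `dbutils.notebook.exit` comes back as the return value of `dbutils.notebook.run`. A minimal sketch of turning that string back into a number is shown below. The helper name `parse_record_count` is hypothetical, and it assumes the exit message keeps the exact `SUCCESS: <n> records` shape used above.

```python
import re

# Matches the exit message format used by the bronze notebook: 'SUCCESS: <n> records'
_EXIT_PATTERN = re.compile(r'^SUCCESS: (\d+) records$')


def parse_record_count(exit_value: str) -> int:
    """Extract the record count from a child notebook's exit string."""
    match = _EXIT_PATTERN.match(exit_value.strip())
    if match is None:
        raise ValueError(f'Unexpected exit value: {exit_value!r}')
    return int(match.group(1))


# In the orchestrator this might be used as:
# result = dbutils.notebook.run('./01_bronze_ingest', 3600, {'run_date': '2026-02-25'})
# bronze_rows = parse_record_count(result)
```

Parsing the count lets the orchestrator log it or refuse to continue when zero records were ingested.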
