# 00_volume_to_bronze

**Project:** School Energy Load Forecasting  
**Purpose:**  
- Ingest raw `.txt` files from Unity Catalog Volume  
- Store every line as-is: no parsing, no schema assumptions  
- Create Bronze Delta table  

**Output table:**  
`workspace.school_energy_ml.bronze_energy_raw`

In [0]:
from pyspark.sql import functions as F


In [0]:
# ===== Configuration =====

CATALOG = "workspace"
SCHEMA = "school_energy_ml"
BRONZE_TABLE = f"{CATALOG}.{SCHEMA}.bronze_energy_raw"

RAW_PATH = "/Volumes/workspace/school_energy_ml/raw_files/*.txt"
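
Before reading, it can help to confirm that the glob in `RAW_PATH` matches at least one file, so that an empty or misspelled Volume path is caught early. The following is a minimal sketch, assuming the Volume is reachable through ordinary Python file APIs on the cluster; `list_raw_files` is a hypothetical helper and not part of the original notebook.

```python
import glob
from typing import List


def list_raw_files(pattern: str) -> List[str]:
    """Return the paths matching a glob pattern, sorted for stable output."""
    return sorted(glob.glob(pattern))


# Same pattern as RAW_PATH in the configuration cell.
RAW_PATH = "/Volumes/workspace/school_energy_ml/raw_files/*.txt"

raw_files = list_raw_files(RAW_PATH)
print(f"{len(raw_files)} file(s) match {RAW_PATH}")
```

If the count is zero, stopping here is likely cheaper than debugging a failed read further down.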

In [0]:
spark.sql(f"CREATE SCHEMA IF NOT EXISTS {CATALOG}.{SCHEMA}")

## Read raw files

In [0]:
bronze_df = (
    spark.read
    .text(RAW_PATH)
    .withColumnRenamed("value", "raw_line")
    .withColumn("source_file", F.col("_metadata.file_path"))  # ✅ Unity Catalog compatible
    .withColumn("ingestion_time", F.current_timestamp())
    .withColumn("layer", F.lit("bronze"))
)

## Quick sanity queries

In [0]:
bronze_df.limit(20).display()
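
A per-file line count is a cheap additional check that every source file contributed rows. The sketch below relies only on the `source_file` column added in the read cell and on the standard `groupBy`, `count`, and `orderBy` DataFrame methods; `lines_per_file` is a hypothetical helper name.

```python
def lines_per_file(df):
    """Count raw lines per source file; expects a `source_file` column."""
    return df.groupBy("source_file").count().orderBy("source_file")


# In the notebook (assumes `bronze_df` from the read cell):
# lines_per_file(bronze_df).display()
```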

## Write Bronze Delta Table

In [0]:
(
    bronze_df
    .write
    .format("delta")
    .mode("overwrite")                  # replace the table contents on every run
    .option("overwriteSchema", "true")  # allow the schema to change on overwrite
    .saveAsTable(BRONZE_TABLE)
)

## Validate the Bronze table

In [0]:
spark.table(BRONZE_TABLE).limit(20).display()
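
Beyond eyeballing the first rows, the write can be checked by comparing the number of rows read with the number stored. This is a minimal sketch, assuming both arguments are DataFrames exposing `count()`; `check_row_count` is a hypothetical helper. Note that calling `count()` on `bronze_df` re-reads the raw files, so the comparison only holds if they have not changed since the write.

```python
def check_row_count(source_df, table_df) -> int:
    """Raise if the written table's row count differs from the source's."""
    expected = source_df.count()
    actual = table_df.count()
    if expected != actual:
        raise ValueError(
            f"Row count mismatch: read {expected}, table has {actual}"
        )
    return actual


# In the notebook (assumes `bronze_df` and `BRONZE_TABLE` from earlier cells):
# check_row_count(bronze_df, spark.table(BRONZE_TABLE))
```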