
# Bronze Layer: Raw Data Ingestion

## Purpose
Ingest raw data from source volumes into the Bronze layer of the Databricks Lakehouse following the medallion architecture pattern.

## What This Notebook Does
- Reads multiple CSV source datasets from CRM and ERP systems
- Ingests data as-is, with no cleansing or business transformations (Spark only infers column types on read)
- Writes data to Bronze layer in Delta Lake format for ACID compliance
- Uses overwrite mode to replace existing tables with latest source data
- Maintains source system lineage through the table naming convention

## Source Systems
- **CRM:** Customer information, product catalog, sales transactions
- **ERP:** Customer master data, location hierarchy, product categories

## Output
Delta tables are stored in the `workspace.bronze` schema and named with the pattern `{source_system}_{table_name}`, for example `crm_cust_info`.
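
The naming pattern above can be derived from a source file path rather than typed by hand. The following is a minimal sketch, assuming the table name is simply the lower-cased file name without its extension; `bronze_table_name` and `qualified_name` are hypothetical helpers, not part of this notebook.

```python
from pathlib import PurePosixPath


def bronze_table_name(source_system: str, path: str) -> str:
    """Build '{source_system}_{table_name}' from a source file path.

    Assumption: the table name is the file stem, lower-cased.
    """
    stem = PurePosixPath(path).stem  # file name without the '.csv' suffix
    return f"{source_system}_{stem}".lower()


def qualified_name(table: str, schema: str = "workspace.bronze") -> str:
    """Prefix a table name with the Bronze schema used in this notebook."""
    return f"{schema}.{table}"


table = bronze_table_name(
    "erp", "/Volumes/workspace/bronze/raw_sources/source_erp/CUST_AZ12.csv"
)
print(qualified_name(table))
```

Deriving the name this way keeps the source-system prefix consistent across all tables, at the cost of tying table names to file names.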

# Define Ingestion Configuration

In [0]:
INGESTION_CONFIG = [
    {
        "source": "crm",
        "path": "/Volumes/workspace/bronze/raw_sources/source_crm/cust_info.csv",
        "table": "crm_cust_info"
    },
    {
        "source": "crm",
        "path": "/Volumes/workspace/bronze/raw_sources/source_crm/prd_info.csv",
        "table": "crm_prd_info"
    },
    {
        "source": "crm",
        "path": "/Volumes/workspace/bronze/raw_sources/source_crm/sales_details.csv",
        "table": "crm_sales_details"
    },
    {
        "source": "erp",
        "path": "/Volumes/workspace/bronze/raw_sources/source_erp/CUST_AZ12.csv",
        "table": "erp_cust_az12"
    },
    {
        "source": "erp",
        "path": "/Volumes/workspace/bronze/raw_sources/source_erp/LOC_A101.csv",
        "table": "erp_loc_a101"
    },
    {
        "source": "erp",
        "path": "/Volumes/workspace/bronze/raw_sources/source_erp/PX_CAT_G1V2.csv",
        "table": "erp_px_cat_g1v2"
    }
]
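
Because the write step uses overwrite mode, a duplicated table name in the configuration would silently replace an earlier table. The following is a minimal sketch of a pre-flight check, assuming each entry carries the `source`, `path`, and `table` keys shown above; `validate_config` is a hypothetical helper, and the sample entries are illustrative only.

```python
REQUIRED_KEYS = {"source", "path", "table"}


def validate_config(config):
    """Return a list of problems found; an empty list means the config looks consistent."""
    problems = []
    seen_tables = set()
    for i, item in enumerate(config):
        missing = REQUIRED_KEYS - item.keys()
        if missing:
            problems.append(f"entry {i}: missing keys {sorted(missing)}")
            continue
        # The table name should carry its source system as a prefix.
        if not item["table"].startswith(f"{item['source']}_"):
            problems.append(f"entry {i}: table '{item['table']}' lacks prefix '{item['source']}_'")
        # This notebook only reads CSV files.
        if not item["path"].lower().endswith(".csv"):
            problems.append(f"entry {i}: path is not a .csv file")
        # A repeated table name would be overwritten by the later entry.
        if item["table"] in seen_tables:
            problems.append(f"entry {i}: duplicate table '{item['table']}'")
        seen_tables.add(item["table"])
    return problems


sample = [
    {"source": "crm", "path": "/Volumes/workspace/bronze/raw_sources/source_crm/cust_info.csv", "table": "crm_cust_info"},
    {"source": "erp", "path": "/Volumes/workspace/bronze/raw_sources/source_erp/LOC_A101.csv", "table": "crm_cust_info"},
]
for problem in validate_config(sample):
    print(problem)
```

Running a check like this before the ingestion loop surfaces configuration mistakes before any table is replaced.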


# Ingest Files into Bronze Tables

In [0]:
%python

for item in INGESTION_CONFIG:
    print(f"Ingesting {item['source']} → workspace.bronze.{item['table']}")

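    # Read the CSV using its header row; Spark infers column types from the data
    df = (
        spark.read
             .option("header", "true")
             .option("inferSchema", "true")
             .csv(item["path"])
    )

    # Replace the Bronze Delta table with the latest contents of the source file
    (
        df.write
          .mode("overwrite")
          .format("delta")
          .saveAsTable(f"workspace.bronze.{item['table']}")
    )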
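
As written, the loop stops at the first file that fails to read or write, leaving later tables untouched. One possible variation is to record failures and continue. This is a sketch only: `ingest_all` and `fake_ingest` are hypothetical names, and in the notebook the `ingest_one` callable would wrap the Spark read and write shown above.

```python
def ingest_all(config, ingest_one):
    """Call ingest_one(item) for every entry and collect failures instead of stopping."""
    succeeded = []
    failed = {}
    for item in config:
        try:
            ingest_one(item)
            succeeded.append(item["table"])
        except Exception as exc:  # broad on purpose: any failure is recorded, not raised
            failed[item["table"]] = str(exc)
    return succeeded, failed


def fake_ingest(item):
    """Stand-in for the Spark read/write step; fails for one table to show the behaviour."""
    if item["table"] == "erp_loc_a101":
        raise FileNotFoundError(item["path"])


demo_config = [
    {"source": "crm", "path": "/demo/cust_info.csv", "table": "crm_cust_info"},
    {"source": "erp", "path": "/demo/LOC_A101.csv", "table": "erp_loc_a101"},
    {"source": "erp", "path": "/demo/CUST_AZ12.csv", "table": "erp_cust_az12"},
]
succeeded, failed = ingest_all(demo_config, fake_ingest)
print("succeeded:", succeeded)
print("failed:", failed)
```

Whether to continue after a failure is a design choice: continuing refreshes as many tables as possible, while failing fast avoids a Bronze layer that mixes fresh and stale tables.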
