# Epic 1 – Data Foundation Platform

## Feature 1.1: Raw Data Ingestion (Sprint 1)


### 1) 🟥 Create raw landing folders in DBFS (`/FileStore/retail/raw/contoso/`, `/FileStore/retail/raw/eurostyle/`) and document paths in the runbook.  
[DBX-DE-Assoc][Medallion][Platform]


```sql
%sql
CREATE EXTERNAL LOCATION IF NOT EXISTS loc_raw
  URL 'abfss://raw@stescontosoma.dfs.core.windows.net/'
  WITH (STORAGE CREDENTIAL `ws_es_contoso_ma`);

DESCRIBE EXTERNAL LOCATION loc_raw;
```

```python
# Create the raw landing directories in DBFS
dbutils.fs.mkdirs("/FileStore/retail/raw/contoso")
dbutils.fs.mkdirs("/FileStore/retail/raw/eurostyle")

# Verify that they exist
display(dbutils.fs.ls("/FileStore/retail/raw"))
```

### 2) 🟥 Upload Contoso CSVs to the raw path; record file names, row counts, and approximate sizes.
[DBX-DE-Assoc][Medallion] 
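
The inventory can be scripted. The sketch below is plain Python and makes three assumptions: the uploaded files can be read through a local path (for example the `/dbfs/...` mount, where the workspace exposes it), they are UTF-8 encoded, and each has exactly one header row. `inventory_csvs` is a hypothetical helper and is not part of any Databricks API.

```python
import csv
import os


def inventory_csvs(folder: str) -> list:
    """Return file name, data-row count and size in KB for each .csv in `folder`."""
    report = []
    for name in sorted(os.listdir(folder)):
        if not name.lower().endswith(".csv"):
            continue  # ignore non-CSV files such as notes or checksums
        path = os.path.join(folder, name)
        with open(path, newline="", encoding="utf-8") as fh:
            # csv.reader copes with quoted fields that contain newlines;
            # subtract 1 for the header row.
            rows = sum(1 for _ in csv.reader(fh)) - 1
        report.append({
            "file_name": name,
            "row_count": max(rows, 0),
            "size_kb": round(os.path.getsize(path) / 1024, 1),
        })
    return report


# Example (assumed local mount path):
# for entry in inventory_csvs("/dbfs/FileStore/retail/raw/contoso"):
#     print(entry)
```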

### 3) 🟥 Ingest Contoso to Delta Bronze with lineage columns (`ingest_ts`, `source_system='CONTOSO'`) as `bronze.sales_contoso`.
[DBX-DE-Assoc][Delta-Basics][Autoloader][CopyInto][Medallion]
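
The lineage rule is easy to reason about outside Spark. This plain-Python sketch uses a hypothetical `add_lineage` helper. It stamps every row of one batch with the same `ingest_ts` and a constant `source_system`. In a PySpark job the equivalent would typically be `withColumn` with `F.current_timestamp()` and `F.lit("CONTOSO")`, whether Auto Loader or `COPY INTO` performs the load. The same helper applies to EuroStyle with `source_system='EUROSTYLE'`.

```python
from datetime import datetime, timezone


def add_lineage(rows, source_system, ingest_ts=None):
    """Return copies of `rows` with `ingest_ts` and `source_system` appended.

    One timestamp is taken per call, so every row of a batch shares it.
    """
    if ingest_ts is None:
        ingest_ts = datetime.now(timezone.utc)
    return [
        {**row, "ingest_ts": ingest_ts, "source_system": source_system}
        for row in rows
    ]


batch = [{"order_id": "1001", "sku": "A-1"}, {"order_id": "1002", "sku": "B-7"}]
bronze_rows = add_lineage(batch, "CONTOSO")  # input rows are not modified
```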

### 4) 🟥 Create a BI-friendly Contoso view `bronze.v_sales_contoso` with trimmed, typed columns for Power BI DirectQuery.
[DBX-DA-Assoc][SQL-Basics][Dashboards]
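
In the view itself this amounts to `TRIM` plus a cast on each column. In Databricks SQL, `TRY_CAST` returns NULL on a bad value instead of failing. The plain-Python sketch below applies the same row-level rules. The column names and target types in `TYPES` are assumptions, not the actual Contoso schema.

```python
from datetime import date
from decimal import Decimal, InvalidOperation

# Assumed target types for a few Contoso columns; replace with the real schema.
TYPES = {
    "order_id": str,
    "order_date": date.fromisoformat,  # expects YYYY-MM-DD
    "quantity": int,
    "unit_price": Decimal,
}


def to_bi_row(raw: dict) -> dict:
    """Trim strings, turn blanks into None, and cast known columns (None on failure)."""
    out = {}
    for col, value in raw.items():
        if isinstance(value, str):
            value = value.strip()
        cast = TYPES.get(col)
        if value in ("", None):
            out[col] = None  # blank strings become NULL for the BI layer
        elif cast is None:
            out[col] = value  # untyped column: keep the trimmed value
        else:
            try:
                out[col] = cast(value)
            except (ValueError, InvalidOperation):
                out[col] = None  # like TRY_CAST: a bad value becomes NULL
    return out
```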

### 5) 🟥 Register tables and views in the metastore (Unity Catalog or the legacy workspace Hive metastore) and add table comments.
[DBX-DE-Assoc][UC-Permissions]

### 6) 🟥 Validate Contoso column types (dates, numerics), handle any corrupt records, and log the issues found.
[DBX-DE-Assoc][Delta-Basics]
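
Here is a minimal validation pass. It assumes the hypothetical columns in `PARSERS`, ISO-8601 dates, and rows held as raw CSV strings. Each row is parsed, and every failure is logged with its row number, column, and raw value, so issues are recorded rather than silently dropped. On the Spark side, the CSV reader's `mode` option (`PERMISSIVE`, `DROPMALFORMED`, `FAILFAST`), together with `columnNameOfCorruptRecord`, controls how malformed lines are surfaced.

```python
from datetime import date
from decimal import Decimal, InvalidOperation

# Assumed columns to validate; each parser raises on a bad value.
PARSERS = {"order_date": date.fromisoformat, "quantity": int, "unit_price": Decimal}


def validate(rows):
    """Split raw string rows into (valid_rows, issues); blanks are reported too."""
    valid, issues = [], []
    for i, row in enumerate(rows, start=1):
        row_ok = True
        for col, parse in PARSERS.items():
            value = (row.get(col) or "").strip()
            try:
                parse(value)
            except (ValueError, InvalidOperation) as exc:
                row_ok = False
                issues.append({"row": i, "column": col, "value": value,
                               "error": type(exc).__name__})
        if row_ok:
            valid.append(row)
    return valid, issues
```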

### 7) 🟨 Run a Power BI DirectQuery smoke test against `bronze.v_sales_contoso`; capture the steps and a screenshot in the README.
[DBX-DA-Assoc][Dashboards][MS-PL300][Visualize]

### 8) 🟥 Upload EuroStyle CSVs to the raw path and capture source metadata (provenance, date obtained).
[DBX-DE-Assoc][Medallion]  
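
Provenance can be captured as a small JSON manifest stored next to the raw files. The sketch below assumes local file access to the upload folder. `build_manifest`, `sha256_of`, and the manifest field names are hypothetical. The SHA-256 checksum is an addition: it lets a later run detect whether a file changed.

```python
import hashlib
import json
import os
from datetime import date


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file so large CSVs are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder, provenance, obtained_date):
    """Describe every CSV in `folder` plus the shared provenance fields."""
    files = [
        {
            "file_name": name,
            "size_bytes": os.path.getsize(os.path.join(folder, name)),
            "sha256": sha256_of(os.path.join(folder, name)),
        }
        for name in sorted(os.listdir(folder))
        if name.lower().endswith(".csv")
    ]
    return {
        "source_system": "EUROSTYLE",
        "provenance": provenance,  # free text: where and from whom the files came
        "obtained_date": obtained_date.isoformat(),
        "files": files,
    }


# Usage (assumed path):
# print(json.dumps(build_manifest("/dbfs/FileStore/retail/raw/eurostyle",
#                                 "<source description>", date.today()), indent=2))
```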

### 9) 🟥 Ingest EuroStyle to Delta Bronze with lineage columns (`ingest_ts`, `source_system='EUROSTYLE'`) as `bronze.sales_eurostyle`.  
[DBX-DE-Assoc][Delta-Basics][Autoloader][CopyInto][Medallion] 

### 10) 🟥 Create and check in `docs/column_mapping.csv` with the columns `source_name`, `unified_name`, `target_type`.  
[DBX-DE-Prof][Modeling]
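
It is worth validating the mapping when it is loaded, before it drives any transformation. This sketch assumes the three-column header above and a hypothetical whitelist of Spark SQL type names (`ALLOWED_TYPES`). It rejects a wrong header, empty names, unknown types, and a `source_name` mapped to two different targets.

```python
import csv
import io

REQUIRED = ["source_name", "unified_name", "target_type"]
# Assumed whitelist of Spark SQL type names allowed in the mapping.
ALLOWED_TYPES = {"string", "int", "bigint", "double", "date", "timestamp", "decimal(18,2)"}


def load_mapping(text: str) -> dict:
    """Parse mapping CSV text into {source_name: (unified_name, target_type)}."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != REQUIRED:
        raise ValueError(f"expected header {REQUIRED}, got {reader.fieldnames}")
    mapping = {}
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        src, target, typ = ((row[c] or "").strip() for c in REQUIRED)
        typ = typ.lower()
        if not src or not target:
            raise ValueError(f"line {line_no}: empty source_name or unified_name")
        if typ not in ALLOWED_TYPES:
            raise ValueError(f"line {line_no}: unsupported target_type {typ!r}")
        if src in mapping and mapping[src] != (target, typ):
            raise ValueError(f"line {line_no}: {src!r} mapped to two different targets")
        mapping[src] = (target, typ)
    return mapping
```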


### 11) 🟥 Apply initial schema alignment across brands using the mapping and naming conventions (snake_case, consistent date/decimal types); update the runbook.  
[DBX-DE-Prof][Modeling]  
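
When a column has no mapping entry, a deterministic snake_case fallback keeps names consistent. The regex-based `to_snake_case` below is one possible convention and is an assumption, not a Databricks function. `align_row` prefers the mapping (here `{source_name: unified_name}`) whenever an entry exists.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert e.g. 'OrderDate', 'Customer ID', 'SKUCode' to snake_case."""
    name = re.sub(r"[^0-9A-Za-z]+", "_", name.strip())     # spaces/punctuation -> _
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)     # camelCase boundary
    name = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", name)  # acronym boundary
    return re.sub(r"_+", "_", name).strip("_").lower()


def align_row(row: dict, mapping: dict) -> dict:
    """Rename columns via the mapping when present, else fall back to snake_case."""
    return {mapping.get(col, to_snake_case(col)): value for col, value in row.items()}
```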



### 12) 🟥 Reconcile raw→Bronze row counts per brand (within a ±1% tolerance, or with the variance explained) and persist the counts to `monitor.dq_bronze_daily`.  
[DBX-DE-Prof][Monitoring-Logs]  
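
The reconciliation rule has three outcomes:

- PASS when |bronze − raw| / raw ≤ 1%.
- EXPLAINED when the difference is outside that band but a variance note is attached.
- FAIL otherwise.

The sketch builds one monitoring row per brand and run, ready to be appended to `monitor.dq_bronze_daily`. The field names and the hypothetical `reconcile` helper are assumptions about that table's layout.

```python
from datetime import date


def reconcile(brand, raw_count, bronze_count, run_date, tolerance=0.01, explanation=None):
    """Return one monitoring row comparing raw vs Bronze counts for a brand."""
    diff = bronze_count - raw_count
    # An empty raw side that still produced Bronze rows counts as 100% variance.
    pct = abs(diff) / raw_count if raw_count else (0.0 if bronze_count == 0 else 1.0)
    if pct <= tolerance:
        status = "PASS"
    elif explanation:
        status = "EXPLAINED"
    else:
        status = "FAIL"
    return {
        "run_date": run_date.isoformat(),
        "brand": brand,
        "raw_count": raw_count,
        "bronze_count": bronze_count,
        "diff": diff,
        "diff_pct": round(pct * 100, 3),
        "status": status,
        "explanation": explanation,
    }
```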


### 13) 🟥 Compute a basic DQ summary: null rates on keys, duplicate rate on `(order_id, sku, customer_id, order_date)`, top countries/currencies; publish a one-pager.  
[DBX-DE-Prof][Monitoring-Logs]  
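
Here is a sketch of the summary metrics over rows held as dictionaries. `country` and `currency` are assumed column names. The duplicate rate is defined here as the share of rows beyond the first occurrence of each `(order_id, sku, customer_id, order_date)` key. That is one reasonable definition among several.

```python
from collections import Counter

DUP_KEY = ("order_id", "sku", "customer_id", "order_date")


def dq_summary(rows, key_cols=DUP_KEY, top_n=3):
    """Null rates per key column, duplicate rate on the composite key, top countries/currencies."""
    total = len(rows)
    if total == 0:
        return {"rows": 0}
    null_rates = {
        c: sum(1 for r in rows if r.get(c) in (None, "")) / total for c in key_cols
    }
    key_counts = Counter(tuple(r.get(c) for c in key_cols) for r in rows)
    # Every row beyond the first occurrence of a key counts as a duplicate.
    duplicates = sum(n - 1 for n in key_counts.values() if n > 1)
    return {
        "rows": total,
        "null_rates": null_rates,
        "duplicate_rate": duplicates / total,
        "top_countries": Counter(r.get("country") for r in rows).most_common(top_n),
        "top_currencies": Counter(r.get("currency") for r in rows).most_common(top_n),
    }
```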



### 14) 🟥 Enforce basic Delta constraints where feasible (NOT NULL on business keys, simple CHECKs); record violations.  
[DBX-DE-Assoc][Delta-Basics]  
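
Delta expresses these rules as `ALTER TABLE ... ALTER COLUMN ... SET NOT NULL` and `ALTER TABLE ... ADD CONSTRAINT ... CHECK (...)`. Adding a constraint fails if existing rows already violate it. Profiling the data first makes violations easy to record. The rules below are hypothetical examples: the column names, `quantity > 0`, and `unit_price >= 0`. In this sketch the CHECK rules skip NULLs and leave them to the NOT NULL rules.

```python
# Hypothetical rules mirroring the intended Delta constraints.
NOT_NULL = ["order_id", "sku", "order_date"]
CHECKS = {
    "quantity_positive": lambda r: r.get("quantity") is None or r["quantity"] > 0,
    "price_non_negative": lambda r: r.get("unit_price") is None or r["unit_price"] >= 0,
}


def find_violations(rows):
    """Return (row_number, constraint_name) for every rule a row fails."""
    violations = []
    for i, row in enumerate(rows, start=1):
        for col in NOT_NULL:
            if row.get(col) in (None, ""):
                violations.append((i, f"{col}_not_null"))
        for name, rule in CHECKS.items():
            if not rule(row):
                violations.append((i, name))
    return violations
```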



### 15) 🟥 Implement an idempotent re-run strategy (deterministic overwrite by date window via `replaceWhere`, or `MERGE` on business keys) and verify repeatability.  
[DBX-DE-Assoc][Delta-MERGE][Delta-Basics][Medallion]
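
Either strategy can be checked for repeatability by running it twice and comparing the results. The plain-Python sketch below emulates both on an in-memory list of rows, under these assumptions:

- `replace_where` and `merge` are hypothetical helpers.
- The business key is assumed to be `(order_id, sku)`.
- Dates are ISO strings, so string comparison orders them correctly.

In Delta, the real operations are a write with the `replaceWhere` option and `MERGE INTO ... WHEN MATCHED THEN UPDATE ... WHEN NOT MATCHED THEN INSERT`. The duplicate-key guard mirrors Delta, whose `MERGE` raises an error when several source rows match the same target row.

```python
KEY = ("order_id", "sku")  # assumed business key


def replace_where(table, batch, start, end):
    """Overwrite the [start, end] order_date window: drop those rows, then append the batch."""
    if any(not (start <= r["order_date"] <= end) for r in batch):
        raise ValueError("batch contains rows outside the replace window")
    kept = [r for r in table if not (start <= r["order_date"] <= end)]
    return kept + list(batch)


def merge(table, batch, key=KEY):
    """Upsert on business keys: matching rows are replaced, new keys are inserted."""
    keys = [tuple(r[k] for k in key) for r in batch]
    if len(keys) != len(set(keys)):
        raise ValueError("batch has duplicate business keys; deduplicate before merging")
    index = {tuple(r[k] for k in key): r for r in table}
    for k, r in zip(keys, batch):
        index[k] = r
    return list(index.values())
```

Running either function a second time with the same batch leaves the table unchanged, which is the repeatability property the task asks to verify.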