# Delta Lake Column Mapping Demonstration

## Overview
This notebook demonstrates **Delta Lake Column Mapping**, a key feature for schema evolution in data lakes. With column mapping enabled, renaming and dropping columns become metadata-only operations that do not rewrite existing data files. Queries and pipelines that reference the old column name still need to be updated.

## What You'll Learn
* **Column Mapping Fundamentals**: Understanding the difference between tables with column mapping disabled (mode `none`, the default) and name-based column mapping (mode `name`)
* **Schema Evolution**: How to safely rename and drop columns in Delta tables
* **Best Practices**: When and why to enable column mapping for production workloads
* **Common Pitfalls**: What happens when you try schema evolution without column mapping

## Prerequisites
* Basic understanding of Delta Lake and SQL
* Familiarity with data lake concepts
* Understanding of schema evolution challenges

## Notebook Structure
1. **Setup**: Create demo catalog, schema, and sample data
2. **Comparison**: Create tables with and without column mapping
3. **Schema Evolution**: Demonstrate rename and drop operations
4. **Results Analysis**: Compare outcomes and understand limitations

---

 ![Delta Lake Logo](https://delta.io/_astro/delta-lake-logo.Bqi7mgVq_Kp5oj.webp)
  <br><br> Delta Lake OSS (Open Source Software) is an open-source storage framework designed to bring reliable ACID transactions, scalable metadata handling, and unified batch and streaming data processing to data lakes, enabling the construction of modern "lakehouse" architectures.

## Key Features
**ACID Transactions**: Delta Lake ensures data reliability and consistency by providing serializability, the strongest level of isolation for transactions.

**Scalable Metadata**: It efficiently manages petabyte-scale tables and billions of partitions, making large-scale analytics practical.

**Time Travel**: Users can access and revert to earlier versions of datasets, supporting auditing and rollbacks.

**Schema Enforcement & Evolution**: Delta Lake prevents "bad" data from corrupting datasets and supports gradual schema updates.

**Unified Batch/Streaming**: The same table can serve both streaming and batch processing seamlessly.

**Openness**: Delta Lake OSS is governed by the Linux Foundation and is community-driven without control by any single company.

**Multi-Engine Support**: Works natively with engines like Apache Spark, Flink, Hive, Trino, and Presto, and provides APIs in multiple programming languages (Scala, Java, Python, Rust, Ruby).

In [0]:
%sql

drop catalog if exists demo_youssefM cascade;

In [0]:
%sql

create catalog demo_youssefM;
use catalog demo_youssefM;
create schema delta;
use schema delta;

In [0]:
%sql


CREATE OR REPLACE VIEW fake_data_view AS
SELECT
  1 AS id,
  'Alice' AS `first name`,
  'Engineering' AS department
UNION ALL
SELECT
  2 AS id,
  'Bob' AS `first name`,
  'Sales' AS department
UNION ALL
SELECT
  3 AS id,
  'Carol' AS `first name`,
  'Marketing' AS department;

select * from fake_data_view

id,first name,department
1,Alice,Engineering
2,Bob,Sales
3,Carol,Marketing


In [0]:
%sql
create table customer as select * from fake_data_view;

---------------------------------------------------------------------------
AnalysisException                         Traceback (most recent call last)
File <command-5858405212978737>, line 1
----> 1 get_ipython().run_cell_magic('sql', '', 'create table customer as select * from fake_data_view;\n')

File /databricks/python/lib/python3.12/site-packages/IPython/core/interactiveshell.py:2543, in InteractiveShell.run_cell_magic(self, magic_name, line, cell)
   2541 with self.builtin_trap:
   2542     args = (magic_arg_s, cell)
-> 2543     result = fn(*args, **kwargs)
   2545

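❌ FAILED: Creating a table from `fake_data_view` without column mapping
- This cell uses the default Delta Lake behavior, where column mapping is off (`delta.columnMapping.mode` is `none`)
- In this mode column names are written directly into the Parquet data files, so a name containing a space, such as `first name`, is rejected. This is the most likely cause of the `AnalysisException` above, whose message is cut off in the output
- Tables in this mode also cannot rename or drop columns, as the later cells show
- Column mapping can be set at creation time with `TBLPROPERTIES`, or enabled later on an existing table with `ALTER TABLE ... SET TBLPROPERTIES ('delta.columnMapping.mode' = 'name')`. Enabling it upgrades the table protocol (reader version 2, writer version 5), so older clients may no longer be able to read the table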
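The failing statement selects a column named `first name`, and the space in that name is what a table without column mapping cannot store. The following is a minimal pure-Python sketch of that naming rule, not Delta Lake's own code. The character set in `INVALID_CHARS` is an assumption based on the Parquet naming restriction that Spark and Delta apply, and `invalid_columns` is a hypothetical helper introduced only for this illustration. It checks the column names of `fake_data_view` and of `fake_data_view_2`, which is defined further down.

```python
# Characters assumed to be rejected in column names when column mapping is off
# (space, comma, semicolon, braces, parentheses, newline, tab, equals sign).
INVALID_CHARS = set(" ,;{}()\n\t=")


def invalid_columns(columns):
    """Return the column names that contain at least one disallowed character."""
    return [name for name in columns if INVALID_CHARS & set(name)]


# Column names of the two demo views used in this notebook.
view_1 = ["id", "first name", "department"]
view_2 = ["id", "firstname", "department"]

print(invalid_columns(view_1))  # ['first name']
print(invalid_columns(view_2))  # []
```

Under this rule only the first view has a problematic column, which is consistent with `customer_without_CM` being created from `fake_data_view_2` later in the notebook.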

In [0]:
%sql


CREATE TABLE customer_with_CM
USING DELTA
TBLPROPERTIES (
  'delta.columnMapping.mode' = 'name'
)
AS
SELECT * FROM fake_data_view;




✅ SUCCESS: Creating a Delta table with column mapping enabled
- KEY FEATURE: `'delta.columnMapping.mode' = 'name'` enables name-based column mapping
- The column `first name` is accepted here because the space only appears in the logical name kept in the table metadata, not in the data files
- Renaming columns becomes possible without rewriting data files
- Dropping columns becomes possible without rewriting data files
- Column mapping does not have to be set at creation time: it can also be enabled on an existing table with `ALTER TABLE ... SET TBLPROPERTIES`, which upgrades the table protocol

In [0]:
%sql

select * from customer_with_CM



In [0]:
%sql

CREATE OR REPLACE VIEW fake_data_view_2 AS
SELECT
  1 AS id,
  'Alice' AS `firstname`,
  'Engineering' AS department
UNION ALL
SELECT
  2 AS id,
  'Bob' AS `firstname`,
  'Sales' AS department
UNION ALL
SELECT
  3 AS id,
  'Carol' AS `firstname`,
  'Marketing' AS department;



In [0]:
%sql

create table customer_without_CM as select * from fake_data_view_2;



In [0]:
%sql


ALTER TABLE customer_with_CM RENAME COLUMN `first name` TO full_name;



✅ SUCCESS: Renaming a column in a table WITH column mapping enabled
-  This operation succeeds because, with column mapping, a rename only updates the table metadata
-  The `first name` column (with a space) is renamed to `full_name` (no space)
-  Column mapping maintains the relationship between logical and physical column names, so the data files keep their physical names and are not rewritten
-  Queries and applications that still reference `first name` must be updated to use `full_name`
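The relationship between logical and physical column names can be pictured with a small toy model. This is a sketch for intuition only, not how Delta Lake is implemented: the physical names (`col-1`, `col-2`, `col-3`) and the helpers `rename_column` and `read` are made up for the example.

```python
# Toy model: table metadata maps each logical (user-facing) column name to a
# physical name used inside the data files. The physical names are invented.
metadata = {"id": "col-1", "first name": "col-2", "department": "col-3"}

# One "data file" row, keyed by physical name. A rename never touches this.
data_file = [{"col-1": 1, "col-2": "Alice", "col-3": "Engineering"}]


def rename_column(mapping, old, new):
    """Return a new mapping where `old` is called `new`; the physical name is kept."""
    return {(new if name == old else name): phys for name, phys in mapping.items()}


def read(mapping, rows):
    """Resolve logical names to physical names when reading rows."""
    return [{name: row[phys] for name, phys in mapping.items()} for row in rows]


renamed = rename_column(metadata, "first name", "full_name")
print(renamed["full_name"])      # col-2
print(read(renamed, data_file))  # [{'id': 1, 'full_name': 'Alice', 'department': 'Engineering'}]
```

Only the dictionary changes in this model; the row stored under `col-2` is the same object before and after the rename.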

In [0]:
%sql

ALTER TABLE customer_without_CM RENAME COLUMN `firstname` TO full_name;



❌ FAILED: Attempting to rename a column in a table WITHOUT column mapping
-  This operation FAILS because, without column mapping, the logical column name is also the physical column name in the Parquet files
-  Renaming would therefore require rewriting every data file, so Delta Lake rejects the statement
-  This demonstrates why column mapping is required for renaming columns
-  The table would first have to be switched to column mapping mode `name` before the rename could succeed

In [0]:
%sql

ALTER TABLE customer_with_CM drop COLUMN full_name;



✅ SUCCESS: Dropping a column in a table WITH column mapping enabled
- This operation succeeds because, with column mapping, a drop only updates the table metadata
- The `full_name` column (originally `first name`) is removed from the table schema
- The physical data files remain intact: the dropped column's values stay in the files until those files are rewritten
- No existing data file is changed, but queries that select `full_name` will fail afterwards
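A drop can be pictured the same way as a metadata-only change. The sketch below is a toy model under stated assumptions, not Delta Lake's implementation: the physical names and the helpers `drop_column` and `visible_rows` are hypothetical.

```python
# Toy model of a metadata-only drop. Physical names are hypothetical.
schema = {"id": "col-1", "full_name": "col-2", "department": "col-3"}
data_file = [{"col-1": 1, "col-2": "Alice", "col-3": "Engineering"}]


def drop_column(mapping, name):
    """Return the mapping without `name`; the data files are not rewritten."""
    return {col: phys for col, phys in mapping.items() if col != name}


def visible_rows(mapping, rows):
    """Readers only see columns that are still present in the metadata."""
    return [{col: row[phys] for col, phys in mapping.items()} for row in rows]


after_drop = drop_column(schema, "full_name")
print(visible_rows(after_drop, data_file))  # [{'id': 1, 'department': 'Engineering'}]
print("col-2" in data_file[0])              # True: the values are still in the file
```

Because the values stay in the files, removing them physically takes a separate rewrite step. On Databricks this is, to the best of my knowledge, what `REORG TABLE ... APPLY (PURGE)` is for; check the documentation for your runtime before relying on it.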

In [0]:
%sql

ALTER TABLE customer_without_CM drop COLUMN firstname;



❌ FAILED: Attempting to drop a column in a table WITHOUT column mapping
- This operation FAILS because Delta Lake only supports `DROP COLUMN` on tables with column mapping enabled
- Without column mapping, removing a column would require rewriting every data file that contains it
- This is the second limitation of tables without column mapping shown in this notebook
- Renaming and dropping columns require column mapping; adding columns does not

In [0]:
%sql

select * from customer_with_CM



In [0]:
%sql

select * from customer_without_CM

