# Delta Lake Core Features

This notebook explores core **Delta Lake** features.

**Key Features Covered:**
1.  Table Creation
2.  ACID Transactions
3.  Time Travel
4.  Schema Evolution
5.  Vacuum

In [None]:
import connector
spark = connector.create_spark_session("delta-demo")

## 1. Create a Delta Table
We'll create the table with the `delta` provider and address it by path (``delta.`<path>` ``). Because the session's default catalog is configured as `DeltaCatalog`, tables registered by name in `spark_catalog` work as well.

In [None]:
# We'll use an S3 path for the Delta table location
table_path = "s3a://polaris/delta/demo/events"

print(f"Creating Delta table at {table_path}...")
spark.sql(f"DROP TABLE IF EXISTS delta.`{table_path}`")

spark.sql(f"""
    CREATE TABLE delta.`{table_path}` (
        event_id BIGINT,
        event_type STRING,
        ts TIMESTAMP
    )
    USING delta
""")

## 2. ACID Transactions
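Every write to a Delta table, whether an append or an overwrite, is committed atomically as a new table version in the transaction log, so readers never see a partially written result. The `CREATE TABLE` above is version 0, so the append below produces version 1 and the overwrite produces version 2.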
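
To make the atomic-commit idea concrete, here is a minimal pure-Python sketch, assuming a local directory stands in for `_delta_log`. It is a simplified illustration, not Delta's own code: `try_commit` and the `part-*.parquet` file names are hypothetical. The only detail borrowed from Delta is that each commit file is named after its version, zero-padded to 20 digits.

```python
import json
import os
import tempfile


def commit_file_name(version):
    # Delta names each commit file after its version, zero-padded to 20 digits.
    return f"{version:020d}.json"


def try_commit(log_dir, version, actions):
    """Write commit `version` only if no other writer has claimed it (toy model)."""
    path = os.path.join(log_dir, commit_file_name(version))
    try:
        # O_EXCL makes creation fail if the file already exists,
        # so exactly one writer can win a given version number.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")
    return True


log_dir = tempfile.mkdtemp()  # stand-in for <table>/_delta_log
first = try_commit(log_dir, 1, [{"add": {"path": "part-0001.parquet"}}])
second = try_commit(log_dir, 1, [{"add": {"path": "part-0002.parquet"}}])
print(first, second)  # True False
```

In this toy model the losing writer would re-read the log and retry at the next version, which is the general shape of optimistic concurrency control.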

In [None]:
# Append Data (Version 1)
print("Appending data (v1)...")
spark.sql(f"""
    INSERT INTO delta.`{table_path}` VALUES
    (1, 'login', TIMESTAMP '2023-01-01 10:00:00'),
    (2, 'logout', TIMESTAMP '2023-01-01 10:30:00')
""")
spark.read.format("delta").load(table_path).show()

In [None]:
# Overwrite Data (Version 2)
print("Overwriting data (v2)...")
spark.sql(f"""
    INSERT OVERWRITE delta.`{table_path}` VALUES
    (3, 'purchase', TIMESTAMP '2023-01-02 12:00:00')
""")
spark.read.format("delta").load(table_path).show()

## 3. Time Travel
We can query previous versions of the table, either by version number (`versionAsOf`) or by timestamp (`timestampAsOf`).
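
Time travel is possible because an overwrite does not delete old data files right away. It only records them as removed in the transaction log. The sketch below is a toy model of that idea, assuming each commit is a list of `add`/`remove` actions. The `snapshot_at` function and the file names are hypothetical, and real commits carry much more metadata.

```python
def snapshot_at(commits, version):
    """Return the set of live data files after replaying commits 0..version."""
    live = set()
    for actions in commits[: version + 1]:
        for kind, path in actions:
            if kind == "add":
                live.add(path)
            elif kind == "remove":
                live.discard(path)
    return live


# Hypothetical log mirroring this notebook: create, append, overwrite.
commits = [
    [],                                    # v0: CREATE TABLE (metadata only)
    [("add", "part-a.parquet")],           # v1: INSERT INTO
    [("remove", "part-a.parquet"),         # v2: INSERT OVERWRITE
     ("add", "part-b.parquet")],
]

print(snapshot_at(commits, 1))  # {'part-a.parquet'}
print(snapshot_at(commits, 2))  # {'part-b.parquet'}
```

Reading version 1 still needs `part-a.parquet` to exist on storage, which is why vacuuming it later breaks time travel to that version.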

In [None]:
# Show history
history = spark.sql(f"DESCRIBE HISTORY delta.`{table_path}`")
history.select("version", "timestamp", "operation", "operationParameters").show(truncate=False)

In [None]:
# Query Version 1 (The Append)
print("Querying Version 1...")
spark.read.format("delta").option("versionAsOf", 1).load(table_path).show()
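
The same snapshot can also be requested in SQL. The sketch below builds such a query as a string. It assumes a Delta Lake release that supports the `VERSION AS OF` / `TIMESTAMP AS OF` SQL syntax (older releases may only offer the DataFrame reader options). The helper `time_travel_sql` is a hypothetical name introduced for illustration.

```python
def time_travel_sql(path, version=None, timestamp=None):
    """Build a SELECT against a path-based Delta table at a past snapshot."""
    if (version is None) == (timestamp is None):
        raise ValueError("pass exactly one of version or timestamp")
    if version is not None:
        clause = f"VERSION AS OF {int(version)}"
    else:
        clause = f"TIMESTAMP AS OF '{timestamp}'"
    return f"SELECT * FROM delta.`{path}` {clause}"


query = time_travel_sql("s3a://polaris/delta/demo/events", version=1)
print(query)
# In the notebook, this could be run as: spark.sql(query).show()
```

The DataFrame equivalent for timestamps is `.option("timestampAsOf", ...)` in place of `versionAsOf`.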

## 4. Schema Evolution
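With the `mergeSchema` option, Delta adds columns that appear in the incoming DataFrame but not in the table. Existing rows read as `NULL` for the new column. The DataFrame below also omits `ts`, so the new row gets `NULL` there.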

In [None]:
from pyspark.sql.types import StructType, StructField, StringType, LongType

# Create a DataFrame with a NEW column 'user_id'
data = [(4, 'click', 'user_123')]
schema = StructType([
    StructField("event_id", LongType(), True),
    StructField("event_type", StringType(), True),
    StructField("user_id", StringType(), True) # New column
])
df = spark.createDataFrame(data, schema)

print("Appending data with new column (mergeSchema)...")
df.write.format("delta").option("mergeSchema", "true").mode("append").save(table_path)

spark.read.format("delta").load(table_path).show()
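
To show what the merge does to the column list, here is a toy model in plain Python, assuming flat schemas given as `(name, type)` pairs. `merge_schema` and `align_row` are hypothetical helpers. The real implementation also handles nested fields and other cases that this sketch ignores.

```python
def merge_schema(table_cols, incoming_cols):
    """Table columns keep their order; unseen incoming columns are appended."""
    merged = list(table_cols)
    for name, dtype in incoming_cols:
        existing = dict(merged).get(name)
        if existing is None:
            merged.append((name, dtype))
        elif existing != dtype:
            raise TypeError(f"type mismatch for {name}: {existing} vs {dtype}")
    return merged


def align_row(row, merged_cols):
    # Columns missing from the row are filled with None (NULL in Spark).
    return tuple(row.get(name) for name, _ in merged_cols)


table_cols = [("event_id", "bigint"), ("event_type", "string"), ("ts", "timestamp")]
incoming_cols = [("event_id", "bigint"), ("event_type", "string"), ("user_id", "string")]

merged = merge_schema(table_cols, incoming_cols)
print([name for name, _ in merged])
# ['event_id', 'event_type', 'ts', 'user_id']
print(align_row({"event_id": 4, "event_type": "click", "user_id": "user_123"}, merged))
# (4, 'click', None, 'user_123')
```

This matches what the `show()` output above should display: `user_id` appears as the last column, and the new row has no `ts`.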

## 5. Vacuum
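`VACUUM` deletes data files that are no longer referenced by the current table version and are older than the retention period. This saves storage, but time travel to versions that depend on the deleted files stops working.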

In [None]:
# VACUUM refuses a retention period shorter than the 7-day default unless this
# safety check is disabled. We disable it here for demo purposes only.
spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")

print("Vacuuming table (retain 0 hours)...")
spark.sql(f"VACUUM delta.`{table_path}` RETAIN 0 HOURS")

# The history entries remain, but the data files behind older versions may
# have been deleted, so reading those versions can now fail
spark.sql(f"DESCRIBE HISTORY delta.`{table_path}`").select("version", "operation").show()
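
Before deleting anything, it can help to preview what a vacuum would remove. The sketch below builds a `VACUUM ... DRY RUN` statement, which lists the files that would be deleted without deleting them. `vacuum_sql` is a hypothetical helper, and the 168-hour default is simply the 7-day retention expressed in hours.

```python
def vacuum_sql(path, retain_hours=168, dry_run=True):
    """Build a VACUUM statement; 168 hours matches the 7-day default."""
    stmt = f"VACUUM delta.`{path}` RETAIN {int(retain_hours)} HOURS"
    if dry_run:
        stmt += " DRY RUN"
    return stmt


stmt = vacuum_sql("s3a://polaris/delta/demo/events")
print(stmt)
# In the notebook, this could be run as: spark.sql(stmt).show(truncate=False)
```

Defaulting `dry_run` to `True` is a deliberate choice here, so that a real deletion has to be requested explicitly.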

In [None]:
spark.stop()