# Apache Iceberg Core Features

This notebook explores core **Apache Iceberg** features using the Polaris catalog.

**Key Features Covered:**
1.  Table Creation (Partitioned)
2.  ACID Operations (Insert, Update, Delete)
3.  Schema Evolution
4.  Time Travel

In [None]:
import connector
spark = connector.create_spark_session("iceberg-demo")
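
The `connector` module is local to this project and its contents are not shown here. As a rough sketch of the kind of settings such a helper might apply, the snippet below assembles Spark configuration keys for an Iceberg REST catalog named `polaris`. The function name `iceberg_rest_conf`, the URI, and the warehouse name are hypothetical placeholders, and authentication settings are deliberately left out.

```python
from typing import Dict


def iceberg_rest_conf(catalog: str, uri: str, warehouse: str) -> Dict[str, str]:
    """Build Spark config entries for an Iceberg REST catalog (illustrative only)."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Enables Iceberg SQL extensions such as row-level UPDATE and DELETE.
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Registers the catalog name used in identifiers like polaris.sales_db.orders.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": uri,              # assumption: placeholder endpoint
        f"{prefix}.warehouse": warehouse,  # assumption: placeholder warehouse name
    }


conf = iceberg_rest_conf("polaris", "http://localhost:8181/api/catalog", "demo_warehouse")
for key, value in conf.items():
    print(f"{key} = {value}")
```

Each entry could be passed to `SparkSession.builder.config(key, value)` before calling `getOrCreate()`; the real helper may set additional or different options.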

## 1. Create a Partitioned Table
We'll create a table `orders` in the `sales_db` namespace, partitioned by day (derived from the `ts` column with the `days(ts)` transform).
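
To make the `days(ts)` transform concrete, here is a small pure-Python sketch of the idea behind it: each timestamp maps to the number of whole days since 1970-01-01, and rows that share that value land in the same partition. The helper `day_partition` is illustrative and not part of any Iceberg API; it assumes naive timestamps are in UTC, whereas Spark interprets timestamp literals in the session time zone.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def day_partition(ts: datetime) -> int:
    """Whole days since the Unix epoch (illustrative stand-in for days(ts))."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # assumption: naive values are UTC
    return (ts - EPOCH).days


timestamps = [
    datetime(2023, 1, 1, 10, 0, 0),
    datetime(2023, 1, 1, 11, 0, 0),
    datetime(2023, 1, 2, 9, 0, 0),
]
partitions = [day_partition(ts) for ts in timestamps]
print(partitions)  # the first two values match, the third is one day later
```

Because the partition value is derived from `ts`, queries that filter on `ts` can skip whole partitions without the writer maintaining a separate date column.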

In [None]:
spark.sql("CREATE DATABASE IF NOT EXISTS polaris.sales_db")
spark.sql("DROP TABLE IF EXISTS polaris.sales_db.orders")

print("Creating partitioned table 'polaris.sales_db.orders'...")
spark.sql("""
    CREATE TABLE polaris.sales_db.orders (
        order_id BIGINT,
        customer_id BIGINT,
        amount DOUBLE,
        ts TIMESTAMP
    )
    USING iceberg
    PARTITIONED BY (days(ts))
""").show()

## 2. ACID Operations
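Iceberg provides ACID guarantees: each write below commits atomically as a new table snapshot, so readers see either the full result of a statement or none of it.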

In [None]:
# INSERT
print("Inserting data...")
spark.sql("""
    INSERT INTO polaris.sales_db.orders VALUES 
    (1, 100, 50.0, TIMESTAMP '2023-01-01 10:00:00'),
    (2, 101, 25.5, TIMESTAMP '2023-01-01 11:00:00'),
    (3, 100, 100.0, TIMESTAMP '2023-01-02 09:00:00')
""")
spark.table("polaris.sales_db.orders").show()

In [None]:
# UPDATE (Row-level update)
print("Updating order 1 amount to 55.0...")
spark.sql("UPDATE polaris.sales_db.orders SET amount = 55.0 WHERE order_id = 1")
spark.table("polaris.sales_db.orders").show()

In [None]:
# DELETE
print("Deleting order 2...")
spark.sql("DELETE FROM polaris.sales_db.orders WHERE order_id = 2")
spark.table("polaris.sales_db.orders").show()
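
The insert, update, and delete above each commit a new snapshot. The toy model below is a conceptual sketch of that idea in plain Python, not Iceberg code: the class `ToyTable` and its methods are invented for illustration. A commit builds the complete new state first and only then publishes it, so a failed write leaves the table untouched and older snapshots remain readable.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple

Row = Tuple[int, int, float]  # (order_id, customer_id, amount)


@dataclass
class ToyTable:
    """Append-only list of immutable snapshots (illustrative model only)."""
    snapshots: List[Tuple[Row, ...]] = field(default_factory=lambda: [()])

    def current(self) -> Tuple[Row, ...]:
        return self.snapshots[-1]

    def commit(self, transform: Callable[[Tuple[Row, ...]], Iterable[Row]]) -> int:
        new_rows = tuple(transform(self.current()))  # built fully before publishing
        self.snapshots.append(new_rows)              # the single publishing step
        return len(self.snapshots) - 1


table = ToyTable()
s_insert = table.commit(lambda rows: rows + ((1, 100, 50.0), (2, 101, 25.5), (3, 100, 100.0)))
s_update = table.commit(lambda rows: [(o, c, 55.0) if o == 1 else (o, c, a) for o, c, a in rows])
s_delete = table.commit(lambda rows: [r for r in rows if r[0] != 2])


def failing_write(rows):
    raise RuntimeError("simulated writer crash")


try:
    table.commit(failing_write)
except RuntimeError:
    pass  # nothing was published, so the table is unchanged

print(table.current())            # state after the delete
print(table.snapshots[s_insert])  # the original insert is still readable
```

Real Iceberg commits swap a metadata pointer in the catalog rather than appending to a Python list, but the reader-visible behavior this sketch mimics is the same: all or nothing.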

## 3. Schema Evolution
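Iceberg tracks columns by unique ID rather than by name or position, so adding or renaming a column is a metadata-only change that does not rewrite existing data files.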

In [None]:
print("Adding column 'discount' and renaming 'amount' to 'total_amount'...")
spark.sql("ALTER TABLE polaris.sales_db.orders ADD COLUMN discount DOUBLE")
spark.sql("ALTER TABLE polaris.sales_db.orders RENAME COLUMN amount TO total_amount")

spark.table("polaris.sales_db.orders").printSchema()
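
Why can a column be renamed without touching data? The sketch below models the mechanism in plain Python: stored rows are keyed by a stable field ID, and the schema maps each ID to its current name. The specific ID values and the `read_rows` helper are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Optional

# Schema metadata: stable field ID -> current column name (IDs are made up).
schema: Dict[int, str] = {1: "order_id", 2: "customer_id", 3: "amount", 4: "ts"}

# "Data file" contents keyed by field ID, written before the schema changed.
data_file: List[Dict[int, object]] = [
    {1: 1, 2: 100, 3: 55.0, 4: "2023-01-01 10:00:00"},
    {1: 3, 2: 100, 3: 100.0, 4: "2023-01-02 09:00:00"},
]


def read_rows(schema: Dict[int, str], rows: List[Dict[int, object]]) -> List[Dict[str, Optional[object]]]:
    """Project stored rows through the current schema; missing IDs read as None."""
    return [{name: row.get(field_id) for field_id, name in schema.items()} for row in rows]


schema[5] = "discount"      # ADD COLUMN: a new ID, absent from old files
schema[3] = "total_amount"  # RENAME COLUMN: same ID, new name

result = read_rows(schema, data_file)
print(result[0])
```

The old file is never modified: the renamed column still resolves through ID 3, and the added column reads as null for rows written before it existed.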

## 4. Time Travel
Because every commit produces a snapshot, we can query the table as it existed at any snapshot that is still retained.

In [None]:
# List snapshots to get IDs and timestamps
spark.sql("SELECT committed_at, snapshot_id, operation, summary FROM polaris.sales_db.orders.snapshots").show(truncate=False)

In [None]:
# Query the first snapshot (the initial insert, before the update and delete).
# The snapshot ID is fetched programmatically; when running interactively you can
# instead paste an ID from the listing above.
first_snapshot = spark.sql("SELECT snapshot_id FROM polaris.sales_db.orders.snapshots ORDER BY committed_at ASC LIMIT 1").collect()[0][0]

print(f"Querying snapshot {first_snapshot}...")
spark.read.option("snapshot-id", first_snapshot).table("polaris.sales_db.orders").show()
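
Time travel is also available in SQL. The sketch below builds such queries as strings using the `VERSION AS OF` and `TIMESTAMP AS OF` clauses, which assume Spark 3.3 or later with the Iceberg runtime. The helper `time_travel_sql` is a hypothetical convenience function, not part of Spark or Iceberg, and the snapshot ID and timestamp shown are placeholders.

```python
from typing import Optional


def time_travel_sql(table: str,
                    snapshot_id: Optional[int] = None,
                    as_of: Optional[str] = None) -> str:
    """Return a SELECT that reads `table` at a snapshot ID or at a timestamp."""
    if (snapshot_id is None) == (as_of is None):
        raise ValueError("pass exactly one of snapshot_id or as_of")
    if snapshot_id is not None:
        return f"SELECT * FROM {table} VERSION AS OF {int(snapshot_id)}"
    return f"SELECT * FROM {table} TIMESTAMP AS OF '{as_of}'"


by_id = time_travel_sql("polaris.sales_db.orders", snapshot_id=1234567890)  # placeholder ID
by_time = time_travel_sql("polaris.sales_db.orders", as_of="2023-01-01 12:00:00")  # placeholder time
print(by_id)
print(by_time)
```

With a live session, `spark.sql(time_travel_sql("polaris.sales_db.orders", snapshot_id=first_snapshot)).show()` should return the same rows as the `snapshot-id` read option used above.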

In [None]:
spark.stop()