# ACID - Atomicity Guarantee

Delta Lake provides its atomicity guarantee through the transaction log, stored in the `_delta_log` directory. The transaction log is the **single source of truth** for your table.

## LogStore

Object storage systems generally [don't provide the capabilities](https://docs.delta.io/0.8.0/delta-storage.html#microsoft-azure-storage) required for atomic guarantees out of the box. Delta Lake transactional operations therefore typically go through the LogStore API implementation instead of accessing the storage system directly. `LogStore` is an abstraction of transaction log stores, used to read and write Delta log files. The Spark pool uses the following implementation:

In [23]:
println(spark.conf.get("spark.delta.logStore.class"))
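
To make the atomicity requirement concrete, here is a minimal sketch of a "put if absent" commit on a local file system. It is not the LogStore implementation; `tryCommit` is a hypothetical helper, and the temporary directory only stands in for `_delta_log`. It shows the property a log store has to offer: when two writers try to create the same commit file, exactly one succeeds.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{FileAlreadyExistsException, Files, Path, StandardOpenOption}

// Hypothetical helper, not part of Delta Lake: write a commit file only if
// no file with that name exists yet. CREATE_NEW makes the existence check
// and the file creation a single atomic step on a local file system.
def tryCommit(logDir: Path, fileName: String, body: String): Boolean =
  try {
    Files.write(
      logDir.resolve(fileName),
      body.getBytes(StandardCharsets.UTF_8),
      StandardOpenOption.CREATE_NEW,
      StandardOpenOption.WRITE)
    true // this writer won the commit
  } catch {
    case _: FileAlreadyExistsException => false // another writer got there first
  }

// A temporary directory stands in for the table's _delta_log folder.
val logDir = Files.createTempDirectory("delta_log_sketch")
val first  = tryCommit(logDir, "00000000000000000000.json", """{"commitInfo":{}}""")
val second = tryCommit(logDir, "00000000000000000000.json", """{"commitInfo":{}}""")
println(s"first=$first, second=$second")
```

The second call returns `false` because the file already exists. Many object stores lack an equivalent atomic primitive, which is why the choice of LogStore implementation matters.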

## Working with transaction logs

When a Delta table is created:

- A new folder is created where all the data and corresponding metadata are stored
- A sub-directory, called `_delta_log`, is created to store the transaction log
- All the files under that folder together constitute the table's transaction log
- Every change (CRUD) to the table is recorded as an ordered, atomic commit under that folder
- Each commit is written out as a JSON file, starting with `00000000000000000000.json` (the version number, zero-padded to 20 digits)
- Each numeric JSON file increment represents a **new version** of the table
- By default, Delta Lake creates a **checkpoint file in Parquet** format after every 10th commit (i.e. every 10 transactions)
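
The naming and checkpoint rules above can be sketched in plain Scala. The helper names are hypothetical, and the interval of 10 is assumed to be the default (it can be changed per table). The rule that version 0 gets no checkpoint is also an assumption of this sketch.

```scala
// Assumed default: one checkpoint every 10 commits.
val checkpointInterval = 10

// Commit files are named after the table version, zero-padded to 20 digits.
def commitFileName(version: Long): String = f"$version%020d.json"

// Checkpoint files use the same zero-padded version with a different suffix.
def checkpointFileName(version: Long): String = f"$version%020d.checkpoint.parquet"

// Sketch of the rule: versions 10, 20, 30, ... get a checkpoint; version 0 does not.
def hasCheckpoint(version: Long): Boolean =
  version != 0 && version % checkpointInterval == 0

// Print the files one would expect for a few sample versions.
Seq(0L, 9L, 10L, 11L, 20L).foreach { v =>
  val checkpoint = if (hasCheckpoint(v)) s" + ${checkpointFileName(v)}" else ""
  println(s"version $v -> ${commitFileName(v)}$checkpoint")
}
```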

In [3]:
val deltaLogPath = "/poc/delatalake/nyc_yellow_taxi_trips/_delta_log/00000000000000000000.json"
val transLogDf = spark.read.json(deltaLogPath)

In [20]:
transLogDf.printSchema

### add

In [14]:
{
    val addDf = transLogDf.select("add").filter("add IS NOT NULL")
    display(addDf.limit(2))
}

### commitInfo

In [15]:
{
    val commitInfoDf = transLogDf.select("commitInfo").filter("commitInfo IS NOT NULL")
    display(commitInfoDf)
}

### metadata

In [17]:
{
    val metadataDf = transLogDf.select("metadata").filter("metadata IS NOT NULL")
    display(metadataDf)
}

### protocol

In [18]:
{
    val protocolDf = transLogDf.select("protocol").filter("protocol IS NOT NULL")
    display(protocolDf)
}

## Working with checkpoint file

In [24]:
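{
    // deltaLogPath points at a JSON commit file, so it cannot be read as Parquet.
    // Pass the path of a *.checkpoint.parquet file from the same _delta_log folder instead
    // (checkpointPath is a placeholder, not defined in this notebook):
    //spark.read.parquet(checkpointPath)
}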
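
A checkpoint lets a reader skip the JSON commits that precede it. The sketch below shows one way to pick the files needed to reconstruct a given table version: the latest checkpoint at or before that version, plus the JSON commits after it. The file listing and `filesToRead` are hypothetical, and the sketch ignores details such as multi-part checkpoints.

```scala
// Hypothetical listing of a _delta_log directory (file names only):
// commits 0 to 12 plus one checkpoint at version 10.
val logFiles: Seq[String] =
  (0L to 12L).map(v => f"$v%020d.json") :+ "00000000000000000010.checkpoint.parquet"

// Patterns that extract the version from commit and checkpoint file names.
val Commit     = """(\d{20})\.json""".r
val Checkpoint = """(\d{20})\.checkpoint\.parquet""".r

// Files to read, in order, to rebuild the table state at `target`.
def filesToRead(files: Seq[String], target: Long): Seq[String] = {
  // Checkpoints that are not newer than the requested version.
  val checkpoints = files.collect {
    case f @ Checkpoint(v) if v.toLong <= target => (v.toLong, f)
  }
  val latest = if (checkpoints.isEmpty) None else Some(checkpoints.maxBy(_._1))
  // Replay starts right after the checkpoint, or at version 0 without one.
  val start = latest.map(_._1 + 1).getOrElse(0L)
  val commits = files.collect {
    case f @ Commit(v) if v.toLong >= start && v.toLong <= target => (v.toLong, f)
  }.sortBy(_._1).map(_._2)
  latest.map(_._2).toSeq ++ commits
}

filesToRead(logFiles, 12L).foreach(println)
```

For version 12 this selects the version 10 checkpoint followed by commits 11 and 12. For a version below 10 it falls back to replaying the JSON commits from version 0.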