![capa](images/capa.png)

### Evolution of Data Architectures

![data_architectures](images/data_architectures.png)

![lakehouse](images/data_lakehouse.png)

![deltalake_logo](images/deltalake_logo.png)

![deltalake_01](images/deltalake_oferece.png)

![delta_lake_oferece](images/deltalake_oferece_01.png)

![deltalake](images/deltalake.png)

## Hands-On: Data Lakehouse with Delta Lake

In [None]:
from pyspark.sql import SparkSession

# Start a Spark session with the Delta Lake package, SQL extensions, and catalog enabled.
spark = SparkSession.builder.appName("Lakehouse") \
    .config("spark.jars.packages", "io.delta:delta-core_2.12:0.8.0") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
    .config("spark.databricks.delta.schema.autoMerge.enabled","true") \
    .config("spark.databricks.delta.autoOptimize.optimizeWrite","true") \
    .config("spark.databricks.delta.optimizeWrite.enabled","true") \
    .config("spark.databricks.delta.vacuum.parallelDelete.enabled","true") \
    .getOrCreate()

# Import after the session is created: with spark.jars.packages, the delta
# Python module only becomes importable once the package has been loaded.
from delta.tables import DeltaTable
from pyspark.sql.functions import *

In [2]:
spark

In [3]:
!rm -rf tmp/

![star_wars](images/star_wars.png)

### **Creating a Delta table**

* To create a Delta table, write a DataFrame in the `delta` format.

* Existing Spark SQL code can be reused by changing the traditional format (Parquet, CSV, JSON, etc.) to `delta`.
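As a minimal sketch of the second point, the helper below writes the same DataFrame in whichever format is requested. `save_as` and the paths in the usage comments are hypothetical names introduced for illustration; the only Spark calls used are `DataFrame.write`, `format`, `mode`, and `save`.

```python
def save_as(df, path, fmt="delta", mode="overwrite"):
    """Write a Spark DataFrame to `path` in the given format.

    Hypothetical helper: the only difference between a Parquet write and a
    Delta write is the string passed to `format`.
    """
    df.write.format(fmt).mode(mode).save(path)


# Usage with the notebook's DataFrame (requires a running Spark session
# configured with the Delta package); both paths are made up for the example:
# save_as(df_star_wars, "tmp/characters_parquet", fmt="parquet")
# save_as(df_star_wars, "tmp/characters_delta", fmt="delta")
```

Keeping the format as a parameter makes it easy to migrate an existing pipeline one table at a time.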

In [4]:
path = 'tmp/sample.parquet'  # directory that will hold the Delta table; the .parquet suffix is only a name

In [5]:
df_star_wars = spark.createDataFrame(
    [
        (1, 'Luke Skywalker', 1.72,'azul','19BBY','masculino','Tatooine','Humano'),
        (2, 'C-3PO',1.67,'amarelo','112BBY','NA','Tatooine','Droid'),
        (3, 'R2-D2', 0.67, 'vermelho','33BBY','NA','Naboo','Droid'),
        (4, 'Anakin Skywalker', 1.88, 'azul','41.9BBY','masculino','Tatooine','Humano'),
        (5, 'Leia Organa', 1.50,'castanho','19BBY','feminino','Alderaan','Humano'),
        (6, 'Han Solo', 1.80, 'castanho', '29BBY', 'masculino', 'Corellia', 'Humano'),
        (7, 'Yoda', 0.66, 'castanho', '896BBY', 'masculino', None, 'Yoda Especie')
     
    ],
        ['id', 'nome', 'altura', 'cor_dos_olhos','data_nascimento','sexo','planeta','especie']
)

df_star_wars.show()

+---+----------------+------+-------------+---------------+---------+--------+------------+
| id|            nome|altura|cor_dos_olhos|data_nascimento|     sexo| planeta|     especie|
+---+----------------+------+-------------+---------------+---------+--------+------------+
|  1|  Luke Skywalker|  1.72|         azul|          19BBY|masculino|Tatooine|      Humano|
|  2|           C-3PO|  1.67|      amarelo|         112BBY|       NA|Tatooine|       Droid|
|  3|           R2-D2|  0.67|     vermelho|          33BBY|       NA|   Naboo|       Droid|
|  4|Anakin Skywalker|  1.88|         azul|        41.9BBY|masculino|Tatooine|      Humano|
|  5|     Leia Organa|   1.5|     castanho|          19BBY| feminino|Alderaan|      Humano|
|  6|        Han Solo|   1.8|     castanho|          29BBY|masculino|Corellia|      Humano|
|  7|            Yoda|  0.66|     castanho|         896BBY|masculino|    null|Yoda Especie|
+---+----------------+------+-------------+---------------+---------+--------+------------+

In [6]:
df_star_wars.write.format("delta").save(path)

![delta_table](images/delta_table.png)

### **Reading the data**

In [10]:
df_delta = spark.read.format("delta").load(path)
df_delta.show(truncate=False)

+---+----------------+------+-------------+---------------+---------+--------+------------+
|id |nome            |altura|cor_dos_olhos|data_nascimento|sexo     |planeta |especie     |
+---+----------------+------+-------------+---------------+---------+--------+------------+
|4  |Anakin Skywalker|1.88  |azul         |41.9BBY        |masculino|Tatooine|Humano      |
|5  |Leia Organa     |1.5   |castanho     |19BBY          |feminino |Alderaan|Humano      |
|1  |Luke Skywalker  |1.72  |azul         |19BBY          |masculino|Tatooine|Humano      |
|6  |Han Solo        |1.8   |castanho     |29BBY          |masculino|Corellia|Humano      |
|7  |Yoda            |0.66  |castanho     |896BBY         |masculino|null    |Yoda Especie|
|2  |C-3PO           |1.67  |amarelo      |112BBY         |NA       |Tatooine|Droid       |
|3  |R2-D2           |0.67  |vermelho     |33BBY          |NA       |Naboo   |Droid       |
+---+----------------+------+-------------+---------------+---------+--------+------------+

![upsert](images/upsert.png)

![update](images/update.png)

**Conditional update without overwriting**

Delta Lake provides APIs for:

* Conditional updates
* Deletes
* Merges (upserts) of data

In [9]:
df_star_wars_new = spark.createDataFrame(
    [
        (1, 'Luke Skywalker', 1.72,'azul','19BBY','masculino','Tatooine','Humano'),
        (2, 'C-3PO',1.67,'amarelo','112BBY','NA','Tatooine','Droid'),
        (3, 'R2-D2', 0.67, 'vermelho','33BBY','NA','Naboo','Droid'),
        (4, 'Darth Vader', 2.02, 'azul','41.9BBY','amarelo','Tatooine','Humano'),
        (5, 'Leia Organa', 1.50,'castanho','19BBY','feminino','Alderaan','Humano'),
        (6, 'Han Solo', 1.80, 'castanho', '29BBY', 'masculino', 'Corellia', 'Humano'),
        (7, 'Yoda', 0.66, 'castanho', '896BBY', 'masculino', None, 'Yoda Especie'),
        (8, 'Chewbacca', 2.28, 'azul', '200BBY', 'masculino', 'Kashyyyk', 'Wookiee'),
        (9, 'Boba Fett', 1.83, 'castanho', '31.5BBY', 'masculino', 'Kamino', 'Humano'),
        (10, 'Palpatine', 1.70, 'amarelo', '82BBY', 'masculino', 'Naboo', 'Humano'),
        
     
    ],
        ['id', 'nome', 'altura', 'cor_dos_olhos','data_nascimento','sexo','planeta','especie']
)

df_star_wars_new.show()

+---+--------------+------+-------------+---------------+---------+--------+------------+
| id|          nome|altura|cor_dos_olhos|data_nascimento|     sexo| planeta|     especie|
+---+--------------+------+-------------+---------------+---------+--------+------------+
|  1|Luke Skywalker|  1.72|         azul|          19BBY|masculino|Tatooine|      Humano|
|  2|         C-3PO|  1.67|      amarelo|         112BBY|       NA|Tatooine|       Droid|
|  3|         R2-D2|  0.67|     vermelho|          33BBY|       NA|   Naboo|       Droid|
|  4|   Darth Vader|  2.02|         azul|        41.9BBY|  amarelo|Tatooine|      Humano|
|  5|   Leia Organa|   1.5|     castanho|          19BBY| feminino|Alderaan|      Humano|
|  6|      Han Solo|   1.8|     castanho|          29BBY|masculino|Corellia|      Humano|
|  7|          Yoda|  0.66|     castanho|         896BBY|masculino|    null|Yoda Especie|
|  8|     Chewbacca|  2.28|         azul|         200BBY|masculino|Kashyyyk|     Wookiee|
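|  9|     Boba Fett|  1.83|     castanho|        31.5BBY|masculino|  Kamino|      Humano|
| 10|     Palpatine|   1.7|      amarelo|          82BBY|masculino|   Naboo|      Humano|
+---+--------------+------+-------------+---------------+---------+--------+------------+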

![merge_tables](images/merge_tables.png)

In [10]:
table = DeltaTable.forPath(spark, path)

In [11]:
# Upsert: rows with a matching id are updated, the rest are inserted.
(
    table.alias("persisteddata")
    .merge(df_star_wars_new.alias("newdata"), "persisteddata.id = newdata.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
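To make the effect of `whenMatchedUpdateAll` combined with `whenNotMatchedInsertAll` concrete, here is a plain-Python model of the row-level semantics. It is only a sketch: `upsert` and the two small sample lists are invented for illustration, and the code says nothing about how Delta Lake rewrites files internally.

```python
def upsert(target_rows, source_rows, key="id"):
    """Return `target_rows` after an upsert from `source_rows`, matched on `key`.

    Plain-Python model of whenMatchedUpdateAll + whenNotMatchedInsertAll.
    Assumes each key appears at most once in `source_rows`.
    """
    merged = {row[key]: dict(row) for row in target_rows}
    for row in source_rows:
        # Matched key: every column is overwritten. Unmatched key: the row is inserted.
        merged[row[key]] = dict(row)
    return list(merged.values())


persisted = [{"id": 4, "nome": "Anakin Skywalker"}, {"id": 7, "nome": "Yoda"}]
new_data = [{"id": 4, "nome": "Darth Vader"}, {"id": 8, "nome": "Chewbacca"}]

result = upsert(persisted, new_data)
for row in result:
    print(row)  # id 4 is updated, id 7 is untouched, id 8 is inserted
```

Unlike this model, Delta Lake rejects a merge in which several source rows match the same target row, since the resulting update would be ambiguous.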

In [12]:
table.toDF().show(truncate=False)

+---+--------------+------+-------------+---------------+---------+--------+------------+
|id |nome          |altura|cor_dos_olhos|data_nascimento|sexo     |planeta |especie     |
+---+--------------+------+-------------+---------------+---------+--------+------------+
|5  |Leia Organa   |1.5   |castanho     |19BBY          |feminino |Alderaan|Humano      |
|1  |Luke Skywalker|1.72  |azul         |19BBY          |masculino|Tatooine|Humano      |
|9  |Boba Fett     |1.83  |castanho     |31.5BBY        |masculino|Kamino  |Humano      |
|6  |Han Solo      |1.8   |castanho     |29BBY          |masculino|Corellia|Humano      |
|4  |Darth Vader   |2.02  |azul         |41.9BBY        |amarelo  |Tatooine|Humano      |
|8  |Chewbacca     |2.28  |azul         |200BBY         |masculino|Kashyyyk|Wookiee     |
|10 |Palpatine     |1.7   |amarelo      |82BBY          |masculino|Naboo   |Humano      |
|7  |Yoda          |0.66  |castanho     |896BBY         |masculino|null    |Yoda Especie|
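|2  |C-3PO         |1.67  |amarelo      |112BBY         |NA       |Tatooine|Droid       |
|3  |R2-D2         |0.67  |vermelho     |33BBY          |NA       |Naboo   |Droid       |
+---+--------------+------+-------------+---------------+---------+--------+------------+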

![log_transaction](images/log_transaction.png)

![update_data](images/update_data.png)

![update_darth](images/update_darth.png)

In [21]:
deltaTable = DeltaTable.forPath(spark, path)

deltaTable.update("id = 4", { "cor_dos_olhos": "'amarelo'", "sexo": "'masculino'"} )
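The `update` call above passes `"'amarelo'"` with nested quotes because the values of the `set` dictionary are SQL expressions rather than Python values. The sketch below wraps that detail in a helper; `update_columns` is a hypothetical name, and the helper assumes string values that contain no single quotes.

```python
def update_columns(delta_table, row_id, new_values):
    """Set string columns on the row whose id equals `row_id`.

    Hypothetical helper around DeltaTable.update(condition, set). Each value
    is wrapped in single quotes because `set` expects SQL expressions.
    """
    assignments = {}
    for column, value in new_values.items():
        if "'" in value:
            # Assumption of this sketch: no escaping is attempted.
            raise ValueError(f"unsupported quote in value for {column!r}")
        assignments[column] = f"'{value}'"
    delta_table.update(condition=f"id = {int(row_id)}", set=assignments)
    return assignments


# Usage with the notebook's table (requires a running Spark session):
# update_columns(deltaTable, 4, {"cor_dos_olhos": "amarelo", "sexo": "masculino"})
```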

In [23]:
table.toDF().show(truncate=False)

+---+--------------+------+-------------+---------------+---------+--------+------------+
|id |nome          |altura|cor_dos_olhos|data_nascimento|sexo     |planeta |especie     |
+---+--------------+------+-------------+---------------+---------+--------+------------+
|4  |Darth Vader   |2.02  |amarelo      |41.9BBY        |masculino|Tatooine|Humano      |
|5  |Leia Organa   |1.5   |castanho     |19BBY          |feminino |Alderaan|Humano      |
|1  |Luke Skywalker|1.72  |azul         |19BBY          |masculino|Tatooine|Humano      |
|9  |Boba Fett     |1.83  |castanho     |31.5BBY        |masculino|Kamino  |Humano      |
|6  |Han Solo      |1.8   |castanho     |29BBY          |masculino|Corellia|Humano      |
|8  |Chewbacca     |2.28  |azul         |200BBY         |masculino|Kashyyyk|Wookiee     |
|10 |Palpatine     |1.7   |amarelo      |82BBY          |masculino|Naboo   |Humano      |
|7  |Yoda          |0.66  |castanho     |896BBY         |masculino|null    |Yoda Especie|
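|2  |C-3PO         |1.67  |amarelo      |112BBY         |NA       |Tatooine|Droid       |
|3  |R2-D2         |0.67  |vermelho     |33BBY          |NA       |Naboo   |Droid       |
+---+--------------+------+-------------+---------------+---------+--------+------------+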

![delete](images/delete.png)

![delete_boba_fett](images/delete_boba_fett.png)

In [24]:
table = DeltaTable.forPath(spark, path)
table.delete("id = 9") 

In [25]:
table = DeltaTable.forPath(spark, path)
table.toDF().show(truncate=False)

+---+--------------+------+-------------+---------------+---------+--------+------------+
|id |nome          |altura|cor_dos_olhos|data_nascimento|sexo     |planeta |especie     |
+---+--------------+------+-------------+---------------+---------+--------+------------+
|4  |Darth Vader   |2.02  |amarelo      |41.9BBY        |masculino|Tatooine|Humano      |
|5  |Leia Organa   |1.5   |castanho     |19BBY          |feminino |Alderaan|Humano      |
|1  |Luke Skywalker|1.72  |azul         |19BBY          |masculino|Tatooine|Humano      |
|6  |Han Solo      |1.8   |castanho     |29BBY          |masculino|Corellia|Humano      |
|8  |Chewbacca     |2.28  |azul         |200BBY         |masculino|Kashyyyk|Wookiee     |
|10 |Palpatine     |1.7   |amarelo      |82BBY          |masculino|Naboo   |Humano      |
|7  |Yoda          |0.66  |castanho     |896BBY         |masculino|null    |Yoda Especie|
|2  |C-3PO         |1.67  |amarelo      |112BBY         |NA       |Tatooine|Droid       |
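|3  |R2-D2         |0.67  |vermelho     |33BBY          |NA       |Naboo   |Droid       |
+---+--------------+------+-------------+---------------+---------+--------+------------+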

![history_01](images/history_01.png)

![history](images/history.png)

In [47]:
df_star_wars = spark.read.format("delta").option("versionAsOf", 0).load(path)
df_star_wars.show()

+---+----------------+------+-------------+---------------+---------+--------+------------+
| id|            nome|altura|cor_dos_olhos|data_nascimento|     sexo| planeta|     especie|
+---+----------------+------+-------------+---------------+---------+--------+------------+
|  4|Anakin Skywalker|  1.88|         azul|        41.9BBY|masculino|Tatooine|      Humano|
|  5|     Leia Organa|   1.5|     castanho|          19BBY| feminino|Alderaan|      Humano|
|  1|  Luke Skywalker|  1.72|         azul|          19BBY|masculino|Tatooine|      Humano|
|  6|        Han Solo|   1.8|     castanho|          29BBY|masculino|Corellia|      Humano|
|  7|            Yoda|  0.66|     castanho|         896BBY|masculino|    null|Yoda Especie|
|  2|           C-3PO|  1.67|      amarelo|         112BBY|       NA|Tatooine|       Droid|
|  3|           R2-D2|  0.67|     vermelho|          33BBY|       NA|   Naboo|       Droid|
+---+----------------+------+-------------+---------------+---------+--------+------------+

In [48]:
df_star_wars = spark.read.format("delta").option("versionAsOf", 1).load(path)
df_star_wars.show()

+---+--------------+------+-------------+---------------+---------+--------+------------+
| id|          nome|altura|cor_dos_olhos|data_nascimento|     sexo| planeta|     especie|
+---+--------------+------+-------------+---------------+---------+--------+------------+
|  5|   Leia Organa|   1.5|     castanho|          19BBY| feminino|Alderaan|      Humano|
|  1|Luke Skywalker|  1.72|         azul|          19BBY|masculino|Tatooine|      Humano|
|  9|     Boba Fett|  1.83|     castanho|        31.5BBY|masculino|  Kamino|      Humano|
|  6|      Han Solo|   1.8|     castanho|          29BBY|masculino|Corellia|      Humano|
|  8|     Chewbacca|  2.28|         azul|         200BBY|masculino|Kashyyyk|     Wookiee|
|  4|   Darth Vader|  2.02|         azul|        41.9BBY|  amarelo|Tatooine|      Humano|
| 10|     Palpatine|   1.7|      amarelo|          82BBY|masculino|   Naboo|      Humano|
|  7|          Yoda|  0.66|     castanho|         896BBY|masculino|    null|Yoda Especie|
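|  2|         C-3PO|  1.67|      amarelo|         112BBY|       NA|Tatooine|       Droid|
|  3|         R2-D2|  0.67|     vermelho|          33BBY|       NA|   Naboo|       Droid|
+---+--------------+------+-------------+---------------+---------+--------+------------+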

In [53]:
df_star_wars = spark.read.format("delta").option("versionAsOf", 5).load(path)
df_star_wars.show()

+---+--------------+------+-------------+---------------+---------+--------+------------+
| id|          nome|altura|cor_dos_olhos|data_nascimento|     sexo| planeta|     especie|
+---+--------------+------+-------------+---------------+---------+--------+------------+
|  4|   Darth Vader|  2.02|      amarelo|        41.9BBY|masculino|Tatooine|      Humano|
|  5|   Leia Organa|   1.5|     castanho|          19BBY| feminino|Alderaan|      Humano|
|  1|Luke Skywalker|  1.72|         azul|          19BBY|masculino|Tatooine|      Humano|
|  6|      Han Solo|   1.8|     castanho|          29BBY|masculino|Corellia|      Humano|
|  8|     Chewbacca|  2.28|         azul|         200BBY|masculino|Kashyyyk|     Wookiee|
| 10|     Palpatine|   1.7|      amarelo|          82BBY|masculino|   Naboo|      Humano|
|  7|          Yoda|  0.66|     castanho|         896BBY|masculino|    null|Yoda Especie|
|  2|         C-3PO|  1.67|      amarelo|         112BBY|       NA|Tatooine|       Droid|
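|  3|         R2-D2|  0.67|     vermelho|          33BBY|       NA|   Naboo|       Droid|
+---+--------------+------+-------------+---------------+---------+--------+------------+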
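The reads above pick a table version with `versionAsOf`. One rough way to picture why every version stays readable: each commit in the transaction log records which data files it added and removed, and a version is what you get by replaying commits 0 through N. The sketch below models that idea in plain Python. The `Commit` structure and the file names are invented for illustration and do not mirror Delta Lake's actual log format.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    """One hypothetical log entry: files added and removed by an operation."""
    operation: str
    add: set = field(default_factory=set)
    remove: set = field(default_factory=set)


def files_at_version(log, version):
    """Replay commits 0..version and return the data files live at that version."""
    if not 0 <= version < len(log):
        raise ValueError(f"version {version} is not in the log")
    live = set()
    for commit in log[: version + 1]:
        live -= commit.remove
        live |= commit.add
    return live


log = [
    Commit("WRITE", add={"part-0.parquet"}),
    Commit("MERGE", add={"part-1.parquet"}, remove={"part-0.parquet"}),
    Commit("DELETE", add={"part-2.parquet"}, remove={"part-1.parquet"}),
]

print(files_at_version(log, 0))  # files written by the initial WRITE
print(files_at_version(log, 2))  # files live after the DELETE
```

In this model a removed file is only dropped from the live set, not from storage, which is why older versions remain readable until `vacuum` physically deletes the files.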

![history_02](images/history_02.png)

In [68]:
fullHistoryDF = deltaTable.history() 

![history_03](images/history_03.png)

In [71]:
lastOperationDF = deltaTable.history(1) 

## Accessing the most recent change

In [72]:
lastOperationDF.show(truncate=True)

+-------+-------------------+------+--------+---------+--------------------+----+--------+---------+-----------+--------------+-------------+--------------------+--------------------+
|version|          timestamp|userId|userName|operation| operationParameters| job|notebook|clusterId|readVersion|isolationLevel|isBlindAppend|    operationMetrics|        userMetadata|
+-------+-------------------+------+--------+---------+--------------------+----+--------+---------+-----------+--------------+-------------+--------------------+--------------------+
|      6|2021-11-28 17:40:47|  null|    null|    WRITE|[mode -> Overwrit...|null|    null|     null|          5|          null|        false|[numFiles -> 5, n...|overwritten-for-f...|
+-------+-------------------+------+--------+---------+--------------------+----+--------+---------+-----------+--------------+-------------+--------------------+--------------------+
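With `truncate=True`, the wide history output cuts off several columns. As a sketch, the helper below condenses history rows into one line per commit. `summarize_history` is a hypothetical name; it assumes the rows have already been collected into dictionaries (for example with `Row.asDict()`) and relies only on the `version`, `operation`, and `timestamp` columns visible in the output above.

```python
def summarize_history(rows):
    """Return one 'v<version> <operation> at <timestamp>' line per history row.

    `rows` is any iterable of mappings with the `version`, `operation`, and
    `timestamp` keys shown in the history output.
    """
    return [
        f"v{row['version']} {row['operation']} at {row['timestamp']}"
        for row in rows
    ]


# With the notebook's table (requires a running Spark session):
# rows = [r.asDict() for r in deltaTable.history().collect()]
# print("\n".join(summarize_history(rows)))

# Values copied from the history output shown in this notebook.
sample = [{"version": 6, "operation": "WRITE", "timestamp": "2021-11-28 17:40:47"}]
print(summarize_history(sample))
```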



![vacuum](images/vacuum.png)

In [74]:
deltaTable = DeltaTable.forPath(spark, path)

In [79]:
deltaTable.vacuum()

DataFrame[]

### The vacuum retention period can be configured.
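Called without arguments, `vacuum()` uses Delta Lake's default retention threshold of 7 days; passing a number of hours selects a different period. The sketch below wraps that call. `vacuum_keeping` is a hypothetical helper, and its refusal of periods shorter than the default is a design choice of this example.

```python
DEFAULT_RETENTION_HOURS = 7 * 24  # Delta Lake's default retention threshold (7 days)


def vacuum_keeping(delta_table, days):
    """Vacuum files that the table has not referenced for more than `days` days.

    Hypothetical wrapper around DeltaTable.vacuum(retentionHours). It refuses
    periods shorter than the default, since those can delete files that time
    travel or concurrent readers still need.
    """
    hours = days * 24
    if hours < DEFAULT_RETENTION_HOURS:
        raise ValueError(f"retention of {hours} hours is below the 168-hour default")
    return delta_table.vacuum(hours)


# Usage with the notebook's table (requires a running Spark session):
# vacuum_keeping(deltaTable, days=30)
```

Delta Lake applies a similar guard itself: a retention below the default is rejected unless `spark.databricks.delta.retentionDurationCheck.enabled` is set to `false`.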

### References:

[1] [Gerenciamento de dados: dos Dados ao Lakehouse](https://blog.compass.uol/tech/gerenciamento-de-dados-dos-dados-ao-lakehouse/)

[2] [Quickstart Delta Lake](https://docs.delta.io/latest/quick-start.html)

[3] [5 razões para utilizar o Delta Lake](https://ichi.pro/pt/5-razoes-para-escolher-o-formato-delta-lake-em-databricks-239587988596605)

[4] [Data Warehouse x Data Lake x Data Lakehouse](https://www.striim.com/data-warehouse-vs-data-lake-vs-data-lakehouse-an-overview/)