![capa](images/capa.png)

![data_architectures](images/data_architectures.png)

![lakehouse](images/data_lakehouse.png)

![deltalake_logo](images/deltalake_logo.png)

![similares](images/similares.png)

![deltalake_01](images/deltalake_oferece.png)

![delta_lake_oferece](images/deltalake_oferece_01.png)

![deltalake](images/deltalake.png)

## Hands-On: Data Lakehouse with Delta Lake

In [1]:
import pyspark

# Delta Lake is loaded as a Spark package; the extension and catalog settings enable Delta SQL support.
spark = pyspark.sql.SparkSession.builder.appName("Lakehouse") \
    .config("spark.jars.packages", "io.delta:delta-core_2.12:0.8.0") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
    .config("spark.databricks.delta.schema.autoMerge.enabled","true") \
    .config("spark.databricks.delta.autoOptimize.optimizeWrite","true") \
    .config("spark.databricks.delta.optimizeWrite.enabled","true") \
    .config("spark.databricks.delta.vacuum.parallelDelete.enabled","true") \
    .getOrCreate()

# Imported after the session is created, once the Delta package has been loaded.
from delta.tables import DeltaTable
# Namespaced import avoids shadowing Python built-ins such as sum and max.
from pyspark.sql import functions as F

In [2]:
spark

In [3]:
path = 'tmp/sample.parquet'

![star_wars](images/star_wars.png)

### **Creating a Delta table**

* To create a Delta table, write a DataFrame in the delta format.

* You can take existing Spark SQL code and switch the traditional format (parquet, csv, json, etc.) to delta.

In [4]:
df_star_wars = spark.createDataFrame(
    [
        (1, 'Luke Skywalker', 1.72,'azul','19BBY','masculino','Tatooine','Humano'),
        (2, 'C-3PO',1.67,'amarelo','112BBY','NA','Tatooine','Droid'),
        (3, 'R2-D2', 0.67, 'vermelho','33BBY','NA','Naboo','Droid'),
        (4, 'Anakin Skywalker', 1.88, 'azul','41.9BBY','masculino','Tatooine','Humano'),
        (5, 'Leia Organa', 1.50,'castanho','19BBY','feminino','Alderaan','Humano'),
        (6, 'Han Solo', 1.80, 'castanho', '29BBY', 'masculino', 'Corellia', 'Humano'),
        (7, 'Yoda', 0.66, 'castanho', '896BBY', 'masculino', None, 'Yoda Especie')
     
    ],
        ['id', 'nome', 'altura', 'cor_dos_olhos','data_nascimento','sexo','planeta','especie']
)

df_star_wars.show()

+---+----------------+------+-------------+---------------+---------+--------+------------+
| id|            nome|altura|cor_dos_olhos|data_nascimento|     sexo| planeta|     especie|
+---+----------------+------+-------------+---------------+---------+--------+------------+
|  1|  Luke Skywalker|  1.72|         azul|          19BBY|masculino|Tatooine|      Humano|
|  2|           C-3PO|  1.67|      amarelo|         112BBY|       NA|Tatooine|       Droid|
|  3|           R2-D2|  0.67|     vermelho|          33BBY|       NA|   Naboo|       Droid|
|  4|Anakin Skywalker|  1.88|         azul|        41.9BBY|masculino|Tatooine|      Humano|
|  5|     Leia Organa|   1.5|     castanho|          19BBY| feminino|Alderaan|      Humano|
|  6|        Han Solo|   1.8|     castanho|          29BBY|masculino|Corellia|      Humano|
|  7|            Yoda|  0.66|     castanho|         896BBY|masculino|    null|Yoda Especie|
+---+----------------+------+-------------+---------------+---------+--------+------------+

In [5]:
df_star_wars.write.mode("overwrite").format("delta").save(path)
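On disk, a Delta table is a directory of Parquet data files plus a `_delta_log` folder holding one JSON commit file per version. The sketch below is a standard-library illustration of reading one such commit. `read_commit` and `build_demo_table` are hypothetical helpers, and the demo writes a tiny hand-made log to a temporary directory rather than touching the table created above.

```python
import json
import tempfile
from pathlib import Path


def read_commit(table_path, version):
    """Return the list of actions recorded in one Delta commit file."""
    # Commit files are named after the zero-padded, 20-digit version number.
    commit_file = Path(table_path) / "_delta_log" / f"{version:020d}.json"
    with commit_file.open() as fh:
        # Each non-empty line is a standalone JSON action.
        return [json.loads(line) for line in fh if line.strip()]


def build_demo_table():
    """Create a throwaway directory with one hand-written commit (illustrative only)."""
    root = Path(tempfile.mkdtemp())
    log_dir = root / "_delta_log"
    log_dir.mkdir()
    actions = [
        {"commitInfo": {"operation": "WRITE"}},
        {"add": {"path": "part-00000.snappy.parquet", "dataChange": True}},
    ]
    body = "\n".join(json.dumps(action) for action in actions) + "\n"
    (log_dir / f"{0:020d}.json").write_text(body)
    return root


demo_actions = read_commit(build_demo_table(), 0)
print([next(iter(action)) for action in demo_actions])
```

Pointing `read_commit` at the notebook's `path` instead of the demo directory should list the actions written by the cell above, although the exact fields depend on the Delta version in use.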

![delta_table](images/delta_table.png)

![delta_table_01](images/delta_table_01.png)

### **Leitura dos Dados**

In [6]:
df_delta = spark.read.format("delta").load(path)
df_delta.orderBy("id").show(truncate=False)

+---+----------------+------+-------------+---------------+---------+--------+------------+
|id |nome            |altura|cor_dos_olhos|data_nascimento|sexo     |planeta |especie     |
+---+----------------+------+-------------+---------------+---------+--------+------------+
|1  |Luke Skywalker  |1.72  |azul         |19BBY          |masculino|Tatooine|Humano      |
|2  |C-3PO           |1.67  |amarelo      |112BBY         |NA       |Tatooine|Droid       |
|3  |R2-D2           |0.67  |vermelho     |33BBY          |NA       |Naboo   |Droid       |
|4  |Anakin Skywalker|1.88  |azul         |41.9BBY        |masculino|Tatooine|Humano      |
|5  |Leia Organa     |1.5   |castanho     |19BBY          |feminino |Alderaan|Humano      |
|6  |Han Solo        |1.8   |castanho     |29BBY          |masculino|Corellia|Humano      |
|7  |Yoda            |0.66  |castanho     |896BBY         |masculino|null    |Yoda Especie|
+---+----------------+------+-------------+---------------+---------+--------+------------+

![merge_tables](images/merge_tables.png)

In [7]:
df_star_wars_new = spark.createDataFrame(
    [
        (1, 'Luke Skywalker', 1.72,'azul','19BBY','masculino','Tatooine','Humano'),
        (2, 'C-3PO',1.67,'amarelo','112BBY','NA','Tatooine','Droid'),
        (3, 'R2-D2', 0.67, 'vermelho','33BBY','NA','Naboo','Droid'),
        (4, 'Anakin Skywalker', 1.88, 'azul','41.9BBY','masculino','Tatooine','Humano'),
        (5, 'Leia Organa', 1.50,'castanho','19BBY','feminino','Alderaan','Humano'),
        (6, 'Han Solo', 1.80, 'castanho', '29BBY', 'masculino', 'Corellia', 'Humano'),
        (7, 'Yoda', 0.66, 'castanho', '896BBY', 'masculino', None, 'Yoda Especie'),
        (8, 'Chewbacca', 2.28, 'azul', '200BBY','masculino', 'Kashyyyk', 'Wookiee'),
        (9, 'Boba Fett', 1.83, 'castanho', '31.5BBY','masculino','Kamino', 'Humano'),
        (10, 'Palpatine', 1.70, 'amarelo', '82BBY','masculino','Naboo','Humano'),
        
     
    ],
        ['id', 'nome', 'altura', 'cor_dos_olhos','data_nascimento','sexo','planeta','especie']
)

df_star_wars_new.show()

+---+----------------+------+-------------+---------------+---------+--------+------------+
| id|            nome|altura|cor_dos_olhos|data_nascimento|     sexo| planeta|     especie|
+---+----------------+------+-------------+---------------+---------+--------+------------+
|  1|  Luke Skywalker|  1.72|         azul|          19BBY|masculino|Tatooine|      Humano|
|  2|           C-3PO|  1.67|      amarelo|         112BBY|       NA|Tatooine|       Droid|
|  3|           R2-D2|  0.67|     vermelho|          33BBY|       NA|   Naboo|       Droid|
|  4|Anakin Skywalker|  1.88|         azul|        41.9BBY|masculino|Tatooine|      Humano|
|  5|     Leia Organa|   1.5|     castanho|          19BBY| feminino|Alderaan|      Humano|
|  6|        Han Solo|   1.8|     castanho|          29BBY|masculino|Corellia|      Humano|
|  7|            Yoda|  0.66|     castanho|         896BBY|masculino|    null|Yoda Especie|
|  8|       Chewbacca|  2.28|         azul|         200BBY|masculino|Kashyyyk|  

In [8]:
table = DeltaTable.forPath(spark, path)

# Upsert: rows whose id already exists are updated; all others are inserted.
(
    table.alias("persisteddata")
    .merge(df_star_wars_new.alias("newdata"), "persisteddata.id = newdata.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
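The merge above behaves like an upsert keyed on `id`. As a rough mental model only (plain Python dictionaries, not how Delta Lake executes a merge), the matched and not-matched branches can be sketched as follows. `merge_upsert` and the two small row lists are hypothetical.

```python
def merge_upsert(target_rows, new_rows, key="id"):
    """Return (merged_rows, n_updated, n_inserted) for a whole-row upsert."""
    merged = {row[key]: dict(row) for row in target_rows}
    updated = inserted = 0
    for row in new_rows:
        if row[key] in merged:
            updated += 1   # matched: every column is replaced (whenMatchedUpdateAll)
        else:
            inserted += 1  # not matched: the row is added (whenNotMatchedInsertAll)
        merged[row[key]] = dict(row)
    return [merged[k] for k in sorted(merged)], updated, inserted


persisted = [{"id": 1, "nome": "Luke Skywalker"}, {"id": 7, "nome": "Yoda"}]
incoming = [{"id": 7, "nome": "Yoda"}, {"id": 8, "nome": "Chewbacca"}]

rows, n_updated, n_inserted = merge_upsert(persisted, incoming)
print(n_updated, n_inserted, [row["id"] for row in rows])
```

Rows that exist only in the target (id 1 here) are left untouched, which is why the notebook's table keeps all seven original characters after the merge.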

In [9]:
table.toDF().orderBy("id").show(truncate=False)

+---+----------------+------+-------------+---------------+---------+--------+------------+
|id |nome            |altura|cor_dos_olhos|data_nascimento|sexo     |planeta |especie     |
+---+----------------+------+-------------+---------------+---------+--------+------------+
|1  |Luke Skywalker  |1.72  |azul         |19BBY          |masculino|Tatooine|Humano      |
|2  |C-3PO           |1.67  |amarelo      |112BBY         |NA       |Tatooine|Droid       |
|3  |R2-D2           |0.67  |vermelho     |33BBY          |NA       |Naboo   |Droid       |
|4  |Anakin Skywalker|1.88  |azul         |41.9BBY        |masculino|Tatooine|Humano      |
|5  |Leia Organa     |1.5   |castanho     |19BBY          |feminino |Alderaan|Humano      |
|6  |Han Solo        |1.8   |castanho     |29BBY          |masculino|Corellia|Humano      |
|7  |Yoda            |0.66  |castanho     |896BBY         |masculino|null    |Yoda Especie|
|8  |Chewbacca       |2.28  |azul         |200BBY         |masculino|Kashyyyk|Wo

![log_transaction](images/log_transaction.png)

![new_files](images/new_files.png)

![update](images/update.png)

In [10]:
table = DeltaTable.forPath(spark, path)
# Values in the set dict are SQL expressions, so string literals carry their own quotes.
table.update(
    "id = 4",
    {"nome": "'Darth Vader'", "altura": "2.02", "cor_dos_olhos": "'amarelo'"},
)
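Because the string values passed to `update` are SQL expressions, a text value needs its own pair of single quotes inside the Python string, which is easy to forget. One possible convenience, sketched here with a hypothetical `sql_literal` helper that is not part of Delta Lake, is to build the dictionary from plain Python values. Quote escaping is deliberately left out of this sketch.

```python
def sql_literal(value):
    """Render a plain Python value as a SQL literal string (simplified)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # checked before int, since bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        if "'" in value or "\\" in value:
            # Escaping rules are out of scope here; fail loudly instead of guessing.
            raise ValueError("strings containing quotes or backslashes are not handled")
        return f"'{value}'"
    raise TypeError(f"unsupported type: {type(value).__name__}")


new_values = {"nome": "Darth Vader", "altura": 2.02, "cor_dos_olhos": "amarelo"}
set_clause = {column: sql_literal(value) for column, value in new_values.items()}
print(set_clause)
```

The resulting `set_clause` matches the dictionary written by hand in the cell above, so it could be passed as the second argument of `table.update`.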

In [11]:
table.toDF().orderBy("id").show(truncate=False)

+---+--------------+------+-------------+---------------+---------+--------+------------+
|id |nome          |altura|cor_dos_olhos|data_nascimento|sexo     |planeta |especie     |
+---+--------------+------+-------------+---------------+---------+--------+------------+
|1  |Luke Skywalker|1.72  |azul         |19BBY          |masculino|Tatooine|Humano      |
|2  |C-3PO         |1.67  |amarelo      |112BBY         |NA       |Tatooine|Droid       |
|3  |R2-D2         |0.67  |vermelho     |33BBY          |NA       |Naboo   |Droid       |
|4  |Darth Vader   |2.02  |amarelo      |41.9BBY        |masculino|Tatooine|Humano      |
|5  |Leia Organa   |1.5   |castanho     |19BBY          |feminino |Alderaan|Humano      |
|6  |Han Solo      |1.8   |castanho     |29BBY          |masculino|Corellia|Humano      |
|7  |Yoda          |0.66  |castanho     |896BBY         |masculino|null    |Yoda Especie|
|8  |Chewbacca     |2.28  |azul         |200BBY         |masculino|Kashyyyk|Wookiee     |
|9  |Boba 

![delete](images/delete.png)

In [12]:
table = DeltaTable.forPath(spark, path)
table.delete("id=9")

In [13]:
table.toDF().orderBy("id").show(truncate=False)

+---+--------------+------+-------------+---------------+---------+--------+------------+
|id |nome          |altura|cor_dos_olhos|data_nascimento|sexo     |planeta |especie     |
+---+--------------+------+-------------+---------------+---------+--------+------------+
|1  |Luke Skywalker|1.72  |azul         |19BBY          |masculino|Tatooine|Humano      |
|2  |C-3PO         |1.67  |amarelo      |112BBY         |NA       |Tatooine|Droid       |
|3  |R2-D2         |0.67  |vermelho     |33BBY          |NA       |Naboo   |Droid       |
|4  |Darth Vader   |2.02  |amarelo      |41.9BBY        |masculino|Tatooine|Humano      |
|5  |Leia Organa   |1.5   |castanho     |19BBY          |feminino |Alderaan|Humano      |
|6  |Han Solo      |1.8   |castanho     |29BBY          |masculino|Corellia|Humano      |
|7  |Yoda          |0.66  |castanho     |896BBY         |masculino|null    |Yoda Especie|
|8  |Chewbacca     |2.28  |azul         |200BBY         |masculino|Kashyyyk|Wookiee     |
|10 |Palpa

![schema_validation](images/schema_validation.png)

In [14]:
df_star_wars_schema = spark.createDataFrame(
    [
        (1, 'Luke Skywalker', 1.72,'azul','19BBY','masculino','Tatooine','Humano','Jedi'),
        (11, 'Obi-Wan Kenobi', 1.82,'azul','64BBY','masculino','Eriadu','Humano','Jedi'),
     
    ],
        ['id', 'nome', 'altura', 'cor_dos_olhos','data_nascimento','sexo','planeta','especie', 'titulo']
)
df_star_wars_schema.show(truncate=False)

+---+--------------+------+-------------+---------------+---------+--------+-------+------+
|id |nome          |altura|cor_dos_olhos|data_nascimento|sexo     |planeta |especie|titulo|
+---+--------------+------+-------------+---------------+---------+--------+-------+------+
|1  |Luke Skywalker|1.72  |azul         |19BBY          |masculino|Tatooine|Humano |Jedi  |
|11 |Obi-Wan Kenobi|1.82  |azul         |64BBY          |masculino|Eriadu  |Humano |Jedi  |
+---+--------------+------+-------------+---------------+---------+--------+-------+------+



In [18]:
table = DeltaTable.forPath(spark, path)
# The source has an extra column (titulo); with autoMerge enabled the merge adds it to the table.
(
    table.alias("persisteddata")
    .merge(df_star_wars_schema.alias("newdata"), "persisteddata.id = newdata.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
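With `spark.databricks.delta.schema.autoMerge.enabled` set in the session, this merge also introduces the new `titulo` column, and rows that were not touched end up with a null in it. The plain-Python sketch below mimics that outcome under simplifying assumptions (rows as dictionaries, the target columns being a subset of the source columns). `merge_with_new_columns` is a hypothetical name, not a Delta Lake function.

```python
def merge_with_new_columns(target_rows, new_rows, key="id"):
    """Upsert new_rows into target_rows, widening the schema to the union of columns."""
    # Collect column names in first-seen order so existing columns keep their position.
    columns = []
    for row in list(target_rows) + list(new_rows):
        for name in row:
            if name not in columns:
                columns.append(name)

    merged = {row[key]: dict(row) for row in target_rows}
    for row in new_rows:
        merged[row[key]] = dict(row)  # update when matched, insert otherwise

    # Rows that lack a column get None, the analogue of null in the table.
    return [{c: merged[k].get(c) for c in columns} for k in sorted(merged)]


persisted = [{"id": 1, "nome": "Luke Skywalker"}, {"id": 2, "nome": "C-3PO"}]
incoming = [
    {"id": 1, "nome": "Luke Skywalker", "titulo": "Jedi"},
    {"id": 11, "nome": "Obi-Wan Kenobi", "titulo": "Jedi"},
]

evolved = merge_with_new_columns(persisted, incoming)
for row in evolved:
    print(row)
```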

In [19]:
table.toDF().orderBy("id").show(truncate=False)

+---+--------------+------+-------------+---------------+---------+--------+------------+------+
|id |nome          |altura|cor_dos_olhos|data_nascimento|sexo     |planeta |especie     |titulo|
+---+--------------+------+-------------+---------------+---------+--------+------------+------+
|1  |Luke Skywalker|1.72  |azul         |19BBY          |masculino|Tatooine|Humano      |Jedi  |
|2  |C-3PO         |1.67  |amarelo      |112BBY         |NA       |Tatooine|Droid       |null  |
|3  |R2-D2         |0.67  |vermelho     |33BBY          |NA       |Naboo   |Droid       |null  |
|4  |Darth Vader   |2.02  |amarelo      |41.9BBY        |masculino|Tatooine|Humano      |null  |
|5  |Leia Organa   |1.5   |castanho     |19BBY          |feminino |Alderaan|Humano      |null  |
|6  |Han Solo      |1.8   |castanho     |29BBY          |masculino|Corellia|Humano      |null  |
|7  |Yoda          |0.66  |castanho     |896BBY         |masculino|null    |Yoda Especie|null  |
|8  |Chewbacca     |2.28  |azu

![history_01](images/history_01.png)

![history](images/history.png)

In [20]:
# Original dataset (version 0)
df_star_wars = spark.read.format("delta").option("versionAsOf", 0).load(path)
df_star_wars.orderBy("id").show()

+---+----------------+------+-------------+---------------+---------+--------+------------+
| id|            nome|altura|cor_dos_olhos|data_nascimento|     sexo| planeta|     especie|
+---+----------------+------+-------------+---------------+---------+--------+------------+
|  1|  Luke Skywalker|  1.72|         azul|          19BBY|masculino|Tatooine|      Humano|
|  2|           C-3PO|  1.67|      amarelo|         112BBY|       NA|Tatooine|       Droid|
|  3|           R2-D2|  0.67|     vermelho|          33BBY|       NA|   Naboo|       Droid|
|  4|Anakin Skywalker|  1.88|         azul|        41.9BBY|masculino|Tatooine|      Humano|
|  5|     Leia Organa|   1.5|     castanho|          19BBY| feminino|Alderaan|      Humano|
|  6|        Han Solo|   1.8|     castanho|          29BBY|masculino|Corellia|      Humano|
|  7|            Yoda|  0.66|     castanho|         896BBY|masculino|    null|Yoda Especie|
+---+----------------+------+-------------+---------------+---------+--------+------------+
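Reading with `versionAsOf` can be pictured as replaying the transaction log up to the requested version: files named in `add` actions join the snapshot and files named in `remove` actions leave it. The sketch below illustrates that idea with an invented three-commit log. `live_files` is a hypothetical helper, the file names are made up, and checkpoints and other action types are ignored.

```python
def live_files(commits, version):
    """Return the set of data files that make up the table at the given version."""
    if not 0 <= version < len(commits):
        raise ValueError(f"version {version} does not exist")
    files = set()
    for actions in commits[: version + 1]:
        for action in actions:
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return files


# Invented log: an initial write, an append, then an update that rewrites one file.
commits = [
    [{"add": {"path": "part-a.parquet"}}],
    [{"add": {"path": "part-b.parquet"}}],
    [{"remove": {"path": "part-a.parquet"}}, {"add": {"path": "part-c.parquet"}}],
]

print(sorted(live_files(commits, 0)))
print(sorted(live_files(commits, 2)))
```

The removed file still exists on disk after version 2, which is what keeps version 0 readable until the file is physically cleaned up.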

In [21]:
df_star_wars = spark.read.format("delta").option("versionAsOf", 2).load(path)
df_star_wars.orderBy("id").show()

+---+--------------+------+-------------+---------------+---------+--------+------------+
| id|          nome|altura|cor_dos_olhos|data_nascimento|     sexo| planeta|     especie|
+---+--------------+------+-------------+---------------+---------+--------+------------+
|  1|Luke Skywalker|  1.72|         azul|          19BBY|masculino|Tatooine|      Humano|
|  2|         C-3PO|  1.67|      amarelo|         112BBY|       NA|Tatooine|       Droid|
|  3|         R2-D2|  0.67|     vermelho|          33BBY|       NA|   Naboo|       Droid|
|  4|   Darth Vader|  2.02|      amarelo|        41.9BBY|masculino|Tatooine|      Humano|
|  5|   Leia Organa|   1.5|     castanho|          19BBY| feminino|Alderaan|      Humano|
|  6|      Han Solo|   1.8|     castanho|          29BBY|masculino|Corellia|      Humano|
|  7|          Yoda|  0.66|     castanho|         896BBY|masculino|    null|Yoda Especie|
|  8|     Chewbacca|  2.28|         azul|         200BBY|masculino|Kashyyyk|     Wookiee|
|  9|     

![history_02](images/history_02.png)

## History retrieval does not yet work in every Delta Lake environment

In [22]:
fullHistoryDF = table.history() 

In [23]:
fullHistoryDF.show()

+-------+-------------------+------+--------+---------+--------------------+----+--------+---------+-----------+--------------+-------------+--------------------+------------+
|version|          timestamp|userId|userName|operation| operationParameters| job|notebook|clusterId|readVersion|isolationLevel|isBlindAppend|    operationMetrics|userMetadata|
+-------+-------------------+------+--------+---------+--------------------+----+--------+---------+-----------+--------------+-------------+--------------------+------------+
|      5|2021-12-01 22:10:00|  null|    null|    MERGE|[predicate -> (pe...|null|    null|     null|          4|          null|        false|[numTargetRowsCop...|        null|
|      4|2021-12-01 22:09:29|  null|    null|    MERGE|[predicate -> (pe...|null|    null|     null|          3|          null|        false|[numTargetRowsCop...|        null|
|      3|2021-12-01 22:08:56|  null|    null|   DELETE|[predicate -> ["(...|null|    null|     null|          2|        

In [24]:
lastOperationDF = table.history(1) 

In [25]:
lastOperationDF.show(truncate=True)

+-------+-------------------+------+--------+---------+--------------------+----+--------+---------+-----------+--------------+-------------+--------------------+------------+
|version|          timestamp|userId|userName|operation| operationParameters| job|notebook|clusterId|readVersion|isolationLevel|isBlindAppend|    operationMetrics|userMetadata|
+-------+-------------------+------+--------+---------+--------------------+----+--------+---------+-----------+--------------+-------------+--------------------+------------+
|      5|2021-12-01 22:10:00|  null|    null|    MERGE|[predicate -> (pe...|null|    null|     null|          4|          null|        false|[numTargetRowsCop...|        null|
+-------+-------------------+------+--------+---------+--------------------+----+--------+---------+-----------+--------------+-------------+--------------------+------------+



In [26]:
df_star_wars = spark.read.format("delta").option("versionAsOf", 5).load(path)
df_star_wars.orderBy("id").show()

+---+--------------+------+-------------+---------------+---------+--------+------------+------+
| id|          nome|altura|cor_dos_olhos|data_nascimento|     sexo| planeta|     especie|titulo|
+---+--------------+------+-------------+---------------+---------+--------+------------+------+
|  1|Luke Skywalker|  1.72|         azul|          19BBY|masculino|Tatooine|      Humano|  Jedi|
|  2|         C-3PO|  1.67|      amarelo|         112BBY|       NA|Tatooine|       Droid|  null|
|  3|         R2-D2|  0.67|     vermelho|          33BBY|       NA|   Naboo|       Droid|  null|
|  4|   Darth Vader|  2.02|      amarelo|        41.9BBY|masculino|Tatooine|      Humano|  null|
|  5|   Leia Organa|   1.5|     castanho|          19BBY| feminino|Alderaan|      Humano|  null|
|  6|      Han Solo|   1.8|     castanho|          29BBY|masculino|Corellia|      Humano|  null|
|  7|          Yoda|  0.66|     castanho|         896BBY|masculino|    null|Yoda Especie|  null|
|  8|     Chewbacca|  2.28|   

In [28]:
fullHistoryDF.select("version","timestamp")\
    .orderBy("version")\
    .show(truncate=False)

+-------+-------------------+
|version|timestamp          |
+-------+-------------------+
|0      |2021-12-01 22:08:01|
|1      |2021-12-01 22:08:15|
|2      |2021-12-01 22:08:49|
|3      |2021-12-01 22:08:56|
|4      |2021-12-01 22:09:29|
|5      |2021-12-01 22:10:00|
+-------+-------------------+
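The version and timestamp pairs above are enough to answer "which version was current at a given moment?". The sketch below assumes that a timestamp resolves to the latest version committed at or before it, which is how Delta's `timestampAsOf` read option is generally described. `version_as_of` is a hypothetical helper working on a hard-coded copy of the history shown above.

```python
from bisect import bisect_right
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# (version, commit timestamp) pairs copied from the history output above.
HISTORY = [
    (0, "2021-12-01 22:08:01"),
    (1, "2021-12-01 22:08:15"),
    (2, "2021-12-01 22:08:49"),
    (3, "2021-12-01 22:08:56"),
    (4, "2021-12-01 22:09:29"),
    (5, "2021-12-01 22:10:00"),
]


def version_as_of(history, timestamp):
    """Return the latest version committed at or before the given timestamp."""
    commits = sorted((datetime.strptime(ts, FMT), version) for version, ts in history)
    when = datetime.strptime(timestamp, FMT)
    idx = bisect_right([committed_at for committed_at, _ in commits], when)
    if idx == 0:
        raise ValueError("timestamp is earlier than the first commit")
    return commits[idx - 1][1]


print(version_as_of(HISTORY, "2021-12-01 22:08:50"))
```

For the timestamp used here the result is version 2, the state right after the update to Darth Vader and before the delete.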



![storage](images/storage.png)

![vacuum](images/vacuum.png)

In [29]:
table = DeltaTable.forPath(spark, path)

### VACUUM takes its retention period as a parameter, in hours.

In [30]:
table.vacuum(1000)

DataFrame[]
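Since the argument is in hours, `vacuum(1000)` keeps roughly 41.7 days of history. The sketch below illustrates the selection rule under a simplifying assumption: only files no longer referenced by the current table version are considered, and they are deleted once older than the retention threshold. `vacuum_candidates`, the file names, and the timestamps are all invented for illustration. The 168-hour default mirrors Delta Lake's default retention of 7 days.

```python
from datetime import datetime, timedelta


def vacuum_candidates(unreferenced_files, now, retention_hours=168.0):
    """Return the unreferenced files that are older than the retention threshold."""
    cutoff = now - timedelta(hours=retention_hours)
    return sorted(
        path for path, modified_at in unreferenced_files.items() if modified_at < cutoff
    )


# Invented example: two files that the current table version no longer references.
stale_files = {
    "part-a.parquet": datetime(2021, 12, 1, 22, 8),
    "part-b.parquet": datetime(2022, 1, 15, 0, 0),
}
now = datetime(2022, 2, 1, 0, 0)

print(vacuum_candidates(stale_files, now, retention_hours=1000))
print(vacuum_candidates(stale_files, now))
```

A longer retention period keeps more old files around, so more historical versions stay readable at the cost of extra storage.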

![contacts](images/contacts.png)

In [41]:
!rm -rf tmp/