# Welcome to Data + AI Summit 2021
<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTe0AKMu5fuPm_vCOaeILtMxGYQABZQbaMpvg&usqp=CAU" width=100/><img src="https://docs.delta.io/latest/_static/delta-lake-logo.png" width=150/> <img src="https://avatars.githubusercontent.com/u/10746780?s=280&v=4" width=100/><img src="https://www.mlflow.org/docs/latest/_static/MLflow-logo-final-black.png" width=100/>
### Becoming a Data-Driven Organization with a Modern Lakehouse

In this demonstration, we build a unified pipeline on the Lakehouse architecture:

- Ingest streaming data into a raw (bronze) table
- Ingest existing data with a batch process
- Build data engineering pipelines using **Bronze/Silver/Gold layers with Delta**
- Leverage the **Lakehouse architecture for Databricks SQL (visualization) and downstream ML pipelines**

## Databricks components
* Databricks Runtime 4.2 or greater

## Datasets Used
* Read Wikipedia edits in real time, with a multitude of different languages. 
* Aggregate the anonymous edits by country, over a window, to see who's editing the English Wikipedia over time.
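
As a rough illustration of the windowed aggregation described in the second bullet, the sketch below counts anonymous edits per country in fixed (tumbling) windows using plain Python. The function names, the 60-second window, and the sample edits are assumptions made for illustration; they are not part of the notebook or of the live stream.

```python
from collections import Counter
from datetime import datetime, timezone


def window_start(ts: datetime, window_seconds: int) -> datetime:
    """Return the start of the tumbling window that contains ts."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % window_seconds, tz=timezone.utc)


def count_anonymous_edits(edits, window_seconds=60):
    """Count anonymous edits per (window start, country code)."""
    counts = Counter()
    for edit in edits:
        # Skip registered users and edits that could not be geocoded.
        if edit["isAnonymous"] and edit["countryCode3"] is not None:
            key = (window_start(edit["timestamp"], window_seconds), edit["countryCode3"])
            counts[key] += 1
    return counts


# Hypothetical sample edits, not taken from the live stream.
sample_edits = [
    {"timestamp": datetime(2021, 5, 18, 13, 37, 5, tzinfo=timezone.utc), "countryCode3": "USA", "isAnonymous": True},
    {"timestamp": datetime(2021, 5, 18, 13, 37, 50, tzinfo=timezone.utc), "countryCode3": "USA", "isAnonymous": True},
    {"timestamp": datetime(2021, 5, 18, 13, 38, 10, tzinfo=timezone.utc), "countryCode3": "GBR", "isAnonymous": True},
    {"timestamp": datetime(2021, 5, 18, 13, 38, 20, tzinfo=timezone.utc), "countryCode3": None, "isAnonymous": False},
]
window_counts = count_anonymous_edits(sample_edits)
```

In Structured Streaming, a comparable grouping would typically combine `window(col("timestamp"), "1 minute")` from `pyspark.sql.functions` with the country code column. The streaming cells in this notebook group by country only, without a window.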

In [0]:
%run "./Includes/Classroom-Setup-07"

##  Delta Medallion Architecture




<div><img src="https://files.training.databricks.com/images/eLearning/Delta/delta.png" style="height: 350px"/></div><br/>

In [0]:
bronzePath     = basePath + "/wikipedia/bronze.delta"
silverPath     = basePath + "/wikipedia/silver.delta"
checkpointPath = basePath + "/checkpoints"

In [0]:
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, DoubleType, BooleanType

schema = StructType([
  StructField("channel", StringType(), True),
  StructField("comment", StringType(), True),
  StructField("delta", IntegerType(), True),
  StructField("flag", StringType(), True),
  StructField("geocoding", StructType([                 
    StructField("city", StringType(), True),
    StructField("country", StringType(), True),
    StructField("countryCode2", StringType(), True),
    StructField("countryCode3", StringType(), True),
    StructField("stateProvince", StringType(), True),
    StructField("latitude", DoubleType(), True),
    StructField("longitude", DoubleType(), True),
  ]), True),
  StructField("isAnonymous", BooleanType(), True),    
  StructField("isNewPage", BooleanType(), True),
  StructField("isRobot", BooleanType(), True),
  StructField("isUnpatrolled", BooleanType(), True),
  StructField("namespace", StringType(), True),         
  StructField("page", StringType(), True),             
  StructField("pageURL", StringType(), True),         
  StructField("timestamp", StringType(), True),     
  StructField("url", StringType(), True),
  StructField("user", StringType(), True),            
  StructField("userURL", StringType(), True),
  StructField("wikipediaURL", StringType(), True),
  StructField("wikipedia", StringType(), True),         
])
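
To make the role of the schema more concrete, here is a plain-Python sketch of what applying a nullable schema to a JSON message amounts to: declared fields that are missing come back as nulls, and fields that are not declared are dropped. The `FIELDS` subset, the `parse_edit` name, and the sample messages are assumptions for illustration. Returning `None` for an unparseable message is a simplification; how Spark's `from_json` represents such records depends on the Spark version and parser mode.

```python
import json

# A subset of the top-level fields declared in the schema above.
FIELDS = ["channel", "comment", "delta", "isAnonymous", "page", "timestamp", "user"]


def parse_edit(raw: str):
    """Parse one JSON message; return None when the payload is not a JSON object."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    # Declared fields missing from the message become None, like nullable columns.
    # Fields that are not declared (for example "extra" below) are dropped.
    return {name: record.get(name) for name in FIELDS}


# Hypothetical messages for illustration.
good = parse_edit('{"page": "GMC Hummer EV", "delta": 12, "extra": "ignored"}')
bad = parse_edit("not json at all")
```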

In [0]:
from pyspark.sql.functions import from_json, col
(spark.readStream
  .format("kafka")  
  .option("kafka.bootstrap.servers", "server1.databricks.training:9092") 
  .option("subscribe", "en")
  .load()
  .withColumn("json", from_json(col("value").cast("string"), schema))
  .select(col("timestamp").alias("kafka_timestamp"), col("json.*"))
  .writeStream
  .format("delta")
  .option("checkpointLocation", checkpointPath + "/bronze")
  .outputMode("append")
  .queryName("stream_1p")
  .start(bronzePath)
)

Wait until the stream is done initializing...

In [0]:
untilStreamIsReady("stream_1p")
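
`untilStreamIsReady` comes from the Classroom-Setup include, and its source is not shown in this notebook. As a hypothetical sketch of what such a helper might do, the function below polls `spark.streams.active` until the named query has reported at least one progress update. The function name, parameters, and defaults are assumptions; only `streams.active`, `name`, and `recentProgress` are existing PySpark attributes.

```python
import time


def wait_until_stream_ready(spark, name, min_batches=1, timeout_s=60.0, poll_s=0.5):
    """Poll until the named streaming query has reported min_batches progress updates.

    Returns True once the query is ready, or False if timeout_s elapses first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        for query in spark.streams.active:
            if query.name == name and len(query.recentProgress) >= min_batches:
                return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

Returning a boolean rather than blocking forever lets the calling cell decide how to react when a stream never starts, for example because the Kafka source is unreachable.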

Take a look at the first three rows of the raw table, queried directly by path without explicitly creating a table.

In [0]:
bronzeDF = spark.sql("SELECT * FROM delta.`{}` limit 3".format(bronzePath))
display(bronzeDF)

kafka_timestamp,channel,comment,delta,flag,geocoding,isAnonymous,isNewPage,isRobot,isUnpatrolled,namespace,page,pageURL,timestamp,url,user,userURL,wikipediaURL,wikipedia
1969-12-31T23:59:59.999+0000,#en.wikipedia,[[WP:AES|���]]Created page with ' __NOINDEX__ Please do not edit this page. ���{{,,,,,,,,,,,,,,,,


In [0]:
from pyspark.sql.functions import unix_timestamp, col

(spark.readStream
  .format("delta")
  .load(bronzePath)
  .select(col("wikipedia"),
          col("isAnonymous"),
          col("namespace"),
          col("page"),
          col("pageURL"),
          col("geocoding"),
          unix_timestamp(col("timestamp"), "yyyy-MM-dd'T'HH:mm:ss.SSSX").cast("timestamp").alias("timestamp"),
          col("user"))
  .writeStream
  .format("delta")
  .option("checkpointLocation", checkpointPath + "/silver")
  .outputMode("append")
  .queryName("stream_2p")
  .start(silverPath)
)
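
The silver step turns the string `timestamp` into a proper timestamp. `unix_timestamp` returns whole seconds, so fractional seconds in the input are discarded before the cast. The plain-Python sketch below mirrors that conversion for a single value; the function name and the sample string are assumptions for illustration, and Python's `strptime` directives are not a one-to-one match for the Spark pattern used above.

```python
from datetime import datetime, timezone


def parse_edit_timestamp(raw: str) -> datetime:
    """Parse a string such as '2021-05-18T13:37:05.123Z' to whole seconds in UTC."""
    parsed = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%f%z")
    # Drop the fractional part to mirror the second-level precision of unix_timestamp.
    return parsed.astimezone(timezone.utc).replace(microsecond=0)


# Hypothetical input value for illustration.
parsed_ts = parse_edit_timestamp("2021-05-18T13:37:05.123Z")
```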

Wait until the stream is done initializing...

In [0]:
untilStreamIsReady("stream_2p")

In [0]:
silverDF = spark.sql("SELECT * FROM delta.`{}` limit 3".format(silverPath))
display(silverDF)

wikipedia,isAnonymous,namespace,page,pageURL,geocoding,timestamp,user
en,True,article,GMC Hummer EV,http://en.wikipedia.org/wiki/GMC_Hummer_EV,"List(null, United States, US, USA, null, 39.7599983215332, 39.7599983215332)",2021-05-18T13:37:05.000+0000,2600:6C64:507F:E6E1:4521:AFF8:5B35:BA8A
en,False,article,Hamza ibn Abdul-Muttalib,http://en.wikipedia.org/wiki/Hamza_ibn_Abdul-Muttalib,"List(null, null, null, null, null, null, null)",2021-05-18T13:37:05.000+0000,Iylaq
en,False,article,Lucius Aurelius Agaclytus,http://en.wikipedia.org/wiki/Lucius_Aurelius_Agaclytus,"List(null, null, null, null, null, null, null)",2021-05-18T13:37:06.000+0000,*Treker


In [0]:
from pyspark.sql.functions import col

# Count anonymous article edits per country, most active countries first.
goldDF = (spark.readStream
  .format("delta")
  .load(silverPath)
  .withColumn("countryCode", col("geocoding.countryCode3"))
  .filter(col("namespace") == "article")
  .filter(col("countryCode").isNotNull())  # edits without geocoding have a null country code
  .filter(col("isAnonymous"))
  .groupBy(col("countryCode"))
  .count()
  .withColumnRenamed("count", "total")
  .orderBy(col("total").desc())
)
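
For readers less familiar with the DataFrame API, the same gold-level logic can be sketched in plain Python over a handful of rows: keep anonymous article edits that have a country code, count them per country, and sort by count. The function name and the sample rows (including the `talk` namespace value) are assumptions for illustration.

```python
from collections import Counter


def edits_by_country(rows):
    """Count anonymous article edits per three-letter country code, largest first."""
    counts = Counter(
        row["geocoding"]["countryCode3"]
        for row in rows
        if row["namespace"] == "article"
        and row["isAnonymous"]
        and row["geocoding"]["countryCode3"] is not None
    )
    return counts.most_common()


# Hypothetical rows shaped like the silver table.
sample_rows = [
    {"namespace": "article", "isAnonymous": True, "geocoding": {"countryCode3": "USA"}},
    {"namespace": "article", "isAnonymous": False, "geocoding": {"countryCode3": None}},
    {"namespace": "article", "isAnonymous": True, "geocoding": {"countryCode3": "GBR"}},
    {"namespace": "article", "isAnonymous": True, "geocoding": {"countryCode3": "GBR"}},
    {"namespace": "talk", "isAnonymous": True, "geocoding": {"countryCode3": "USA"}},
]
totals = edits_by_country(sample_rows)  # [('GBR', 2), ('USA', 1)]
```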

## Creating Visualizations (aka "platinum" level)

In [0]:
display(goldDF, streamName = "stream_3p")

countryCode,total
GBR,38
USA,36
AUS,7
IND,7
POL,7
IDN,4
BRA,3
SWE,3
AZE,3
CAN,3


Wait for the stream to initialize.

In [0]:
untilStreamIsReady("stream_3p")

Make sure all streams are stopped.

In [0]:
for s in spark.streams.active:
    s.stop()

## Batch Pipelines

In [0]:
%sh pip install faker

In [0]:
%python
# Setup: pandas holds the generated rows, Faker produces the synthetic values
import pandas as pd
from faker import Faker

fake = Faker()

###VARIABLE DEFINITIONS
c_current_cdemo_sk = ["c1", "c2", "c3", "c4", "c5"]
c_current_hdemo_sk = ["h1", "h2", "h3", "h4", "h5"]
salutations = ["Mr.", "Mrs.", "Ms", "Dr", "Prof", "None"]
print("customer_id : ", fake.uuid4())
print("customer_sk : ", fake.sha256())
print("current_cdemo_sk : ", fake.words(1, c_current_cdemo_sk, True))
print("current_hdemo_sk : ", fake.words(1, c_current_hdemo_sk, True))
print("current_addr_sk : ", fake.address())
print("first_shipto_date_sk", fake.date())
print("c_first_sales_date_sk", fake.date())
print("c_salutation : ", fake.words(1, salutations, True)[0])
print("c_first_name : ", fake.first_name())
print("c_last_name : ", fake.last_name())
print("c_preferred_cust_flag : ", fake.words(1, ["Y", "N"], True)[0])
print("c_birth_year : ", fake.year())
print("c_birth_country : ", fake.words(1, ["AB", "BC", "CD"], True)[0])
print("c_email_address : ", fake.email())
print("c_last_review_date : ", fake.date())


df1 = pd.DataFrame(columns=("c_customer_id", "c_customer_sk", "c_current_cdemo_sk", "c_current_hdemo_sk", "c_first_shipto_date_sk", "c_first_sales_date_sk", "c_first_name", "c_last_name", "c_preferred_cust_flag", "c_email_address", "c_last_review_date"))

# Generate 100 synthetic customer records, one DataFrame row per record
for i in range(100):
  userRecord = [fake.uuid4(),
                fake.sha256(),
                fake.words(1, c_current_cdemo_sk, True)[0],
                fake.words(1, c_current_hdemo_sk, True)[0],
                fake.date(),
                fake.date(),
                fake.first_name(),
                fake.last_name(),
                fake.words(1, ["Y", "N"], True)[0],
                fake.email(),
                fake.date()]
  df1.loc[i] = userRecord
  
  
customer_data_bronze = spark.createDataFrame(df1)
display(customer_data_bronze)

c_customer_id,c_customer_sk,c_current_cdemo_sk,c_current_hdemo_sk,c_first_shipto_date_sk,c_first_sales_date_sk,c_first_name,c_last_name,c_preferred_cust_flag,c_email_address,c_last_review_date
7c6dd528-9aef-4992-9369-328fdbdf7903,8f171270a702a4940daeef58516fa8b3d9df39a426b680a14180386541148cdd,c2,h3,1974-11-17,1998-10-16,Leslie,Green,N,davidsmith@hotmail.com,1994-01-06
45b35dcc-1512-4b30-a9e5-3e1f18dced6b,2e4cab5c3451173f7d6e32d4a419904f3799fd65b683f6a60f5ca2f5d3f6b103,c2,h2,1981-02-12,1984-10-24,Jennifer,Walter,Y,thompsoncourtney@howard.com,1971-03-04
36577dbe-3f6c-415c-b851-5adbdad17fc3,ad158fefceb8aa15932360f2d89cdb61ed8b854348b19f423e56d76cf5cb4615,c4,h5,1974-05-15,1984-04-18,Brandi,Clark,Y,richardjoan@yahoo.com,1985-01-01
1cd0ba4b-41e3-400c-9469-31a2be8740ff,29d2e90956ab983275d8be2309d5375bc245de7938885e0c7f85b4e0d4a5c663,c2,h3,2018-12-29,2011-02-09,Pamela,Yang,Y,beckdavid@foster.net,1995-04-07
42ea1ff8-9157-43b7-907b-143a8ac10922,5491dae2af3538dfed6bfc04b2e8ce6f85277b9ed4cf379340b9c36605fd423c,c4,h2,1982-10-11,1973-09-25,Steven,Brown,Y,samantha44@parker.biz,1973-11-05
47e8b17b-3b0b-446e-903f-f768b4cf42ec,3d82c864c043c272812fb0a5bb16c7ee95437c7cd4fcdfe068f68530278d59c7,c5,h2,1984-11-26,2000-12-02,Debra,Miller,N,dosborne@good.com,1998-08-17
ec53aec8-5cc8-49eb-b961-b7df5e6608d7,931edb45a4f6d77b9c697cc917312bd47d09d5d17b2cd7bb83fe189bff9fec4b,c1,h1,1983-05-13,1996-07-09,Donna,Henry,Y,sjohnson@hotmail.com,1982-03-01
c06f1c7f-6769-4f65-99f4-2bd3fd4644da,3bbdc8781cfa31187cba01b3768a76233f6d26819aabd48699fd824ce2475cd8,c3,h4,1990-03-19,2004-10-13,Nathan,Wilson,Y,perezryan@ward-robles.com,2004-03-02
6c3d6360-e1de-4890-b1bf-270daab50106,8072906ebbc4e4a0cd5bb2c2c07daedfbe40ab70a116cfee73c5d68826c54e8c,c2,h2,1980-08-23,1992-04-14,Jordan,Gibson,Y,barbara71@gmail.com,1981-01-19
2e7e6336-71c0-444e-adcd-693aaf26ecd9,7c1f325039c2ab0105eb7c313c6c4b6851e506b8c1c3f440984e6a9c887c7ad4,c1,h2,2002-12-28,1994-06-29,Jennifer,Stevens,Y,anthonybryant@hart-washington.biz,2001-10-04
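
The generator above is unseeded, so each run of the cell produces different customers; the gold table output later in this notebook, for instance, lists different customer IDs from the rows displayed here. Where repeatable demo data matters, a seeded generator is one option. The sketch below uses only the standard library; the function name, the seed value, and the reduced set of columns are assumptions for illustration.

```python
import random
import uuid


def make_customers(n, seed=42):
    """Generate n synthetic customer rows that are identical for a given seed."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        rows.append({
            # Build a version-4 UUID from seeded random bits so IDs repeat across runs.
            "c_customer_id": str(uuid.UUID(int=rng.getrandbits(128), version=4)),
            "c_current_cdemo_sk": rng.choice(["c1", "c2", "c3", "c4", "c5"]),
            "c_current_hdemo_sk": rng.choice(["h1", "h2", "h3", "h4", "h5"]),
            "c_preferred_cust_flag": rng.choice(["Y", "N"]),
        })
    return rows


customers = make_customers(5)
```

Faker also exposes a `Faker.seed(...)` class method for the same purpose, which would keep the richer providers (names, emails, dates) used in the cell above.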


In [0]:
customer_data_bronze.write.mode("overwrite").format("delta").saveAsTable("default.customer_data_bronze")

In [0]:
%python
import pyspark.sql.functions as f
# Pseudonymize identifiable info in the original table
customer_data_silver = customer_data_bronze.withColumn("c_email_address_pseudonym", f.sha2(customer_data_bronze['c_email_address'], 256)) 
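
The pseudonym column above is a hex-encoded SHA-256 digest of the email address. The plain-Python sketch below shows the same kind of digest for a single value, plus a keyed (HMAC) variant. The keyed variant is not what the notebook does: it is included as an assumption-laden alternative, because an unkeyed hash of a guessable value such as an email address can be reversed by hashing candidate addresses. The function names and `secret_key` are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(value: str) -> str:
    """Hex-encoded SHA-256 digest of the UTF-8 bytes of value."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def pseudonymize_keyed(value: str, secret_key: bytes) -> str:
    """Keyed variant (HMAC-SHA-256); secret_key is a hypothetical secret stored outside the data."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical email address and key for illustration.
plain_digest = pseudonymize("pat@example.com")
keyed_digest = pseudonymize_keyed("pat@example.com", b"demo-only-key")
```

Both functions are deterministic, so the same email always maps to the same pseudonym, which is what allows the lookup table to be joined back later.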

In [0]:
display(customer_data_silver)

c_customer_id,c_customer_sk,c_current_cdemo_sk,c_current_hdemo_sk,c_first_shipto_date_sk,c_first_sales_date_sk,c_first_name,c_last_name,c_preferred_cust_flag,c_email_address,c_last_review_date,c_email_address_pseudonym
7c6dd528-9aef-4992-9369-328fdbdf7903,8f171270a702a4940daeef58516fa8b3d9df39a426b680a14180386541148cdd,c2,h3,1974-11-17,1998-10-16,Leslie,Green,N,davidsmith@hotmail.com,1994-01-06,6423b68d988a6047b8993a90ebc0e6e58355a46fcf41430e6494ddc4dddbe22a
45b35dcc-1512-4b30-a9e5-3e1f18dced6b,2e4cab5c3451173f7d6e32d4a419904f3799fd65b683f6a60f5ca2f5d3f6b103,c2,h2,1981-02-12,1984-10-24,Jennifer,Walter,Y,thompsoncourtney@howard.com,1971-03-04,059dc6d7e569c4450e63d41a7fb6582fe1f22d41fc2f0f3ad8bcdac77d5b10b9
36577dbe-3f6c-415c-b851-5adbdad17fc3,ad158fefceb8aa15932360f2d89cdb61ed8b854348b19f423e56d76cf5cb4615,c4,h5,1974-05-15,1984-04-18,Brandi,Clark,Y,richardjoan@yahoo.com,1985-01-01,84265488e4a06de685f4fdd3d7ce635393738b4c77701bf22a93c929876807b0
1cd0ba4b-41e3-400c-9469-31a2be8740ff,29d2e90956ab983275d8be2309d5375bc245de7938885e0c7f85b4e0d4a5c663,c2,h3,2018-12-29,2011-02-09,Pamela,Yang,Y,beckdavid@foster.net,1995-04-07,ee1ea9d4a1aa726ec9aa97fedbfff9d9f14b688cf972986ae6c1b924299f27c8
42ea1ff8-9157-43b7-907b-143a8ac10922,5491dae2af3538dfed6bfc04b2e8ce6f85277b9ed4cf379340b9c36605fd423c,c4,h2,1982-10-11,1973-09-25,Steven,Brown,Y,samantha44@parker.biz,1973-11-05,b437e829bfd035e189fe1d2565883665f4048ead4ad53b7a24a2e3d6ba89213e
47e8b17b-3b0b-446e-903f-f768b4cf42ec,3d82c864c043c272812fb0a5bb16c7ee95437c7cd4fcdfe068f68530278d59c7,c5,h2,1984-11-26,2000-12-02,Debra,Miller,N,dosborne@good.com,1998-08-17,8e5b2f448577c6a7c96a54a6cff9fcf7c3f2bb173c9e2f23bd46608248fb186c
ec53aec8-5cc8-49eb-b961-b7df5e6608d7,931edb45a4f6d77b9c697cc917312bd47d09d5d17b2cd7bb83fe189bff9fec4b,c1,h1,1983-05-13,1996-07-09,Donna,Henry,Y,sjohnson@hotmail.com,1982-03-01,835a3b1322cddc90ea96fbfdaff3135bf30a8ca4cae09fe1f0500a549b42d642
c06f1c7f-6769-4f65-99f4-2bd3fd4644da,3bbdc8781cfa31187cba01b3768a76233f6d26819aabd48699fd824ce2475cd8,c3,h4,1990-03-19,2004-10-13,Nathan,Wilson,Y,perezryan@ward-robles.com,2004-03-02,1effb8944448bed3b672dab0eacb2cc6707ad54778411be53fdb413f467d2606
6c3d6360-e1de-4890-b1bf-270daab50106,8072906ebbc4e4a0cd5bb2c2c07daedfbe40ab70a116cfee73c5d68826c54e8c,c2,h2,1980-08-23,1992-04-14,Jordan,Gibson,Y,barbara71@gmail.com,1981-01-19,fde5df7f042cdaec545a5f9691811c6e74aaf0186abe48059c7ba939bf091e82
2e7e6336-71c0-444e-adcd-693aaf26ecd9,7c1f325039c2ab0105eb7c313c6c4b6851e506b8c1c3f440984e6a9c887c7ad4,c1,h2,2002-12-28,1994-06-29,Jennifer,Stevens,Y,anthonybryant@hart-washington.biz,2001-10-04,42ed8355d46ccd9fb26552eaca54ddadf4d5e0c5099b749afe9c2c1ec87a2298


In [0]:
%python
cust_lookup_silver2 = customer_data_silver.select("c_email_address_pseudonym", "c_email_address", "c_first_name", "c_last_name")

cust_lookup_silver2.write.mode("overwrite").format("delta").saveAsTable("default.customer_lookup_silver2")

In [0]:
%python
# Drop the identifiable columns from the original table
(customer_data_silver
  .drop("c_email_address", "c_first_name", "c_last_name")
  .write.mode("overwrite").format("delta")
  .saveAsTable("default.customer_pseudo_gold"))
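
One way to read this design: the gold table no longer carries names or emails, while the lookup table holds the mapping from pseudonym back to the person, so re-identification requires access to both. The plain-Python sketch below shows that join on the pseudonym column. The function name and the sample rows are assumptions for illustration; the pseudonym values are shortened placeholders, not real digests.

```python
def reidentify(gold_rows, lookup_rows):
    """Attach name and email back onto pseudonymized rows via the lookup table."""
    lookup = {row["c_email_address_pseudonym"]: row for row in lookup_rows}
    joined = []
    for row in gold_rows:
        match = lookup.get(row["c_email_address_pseudonym"])
        if match is not None:  # inner join: skip rows without a lookup entry
            joined.append({**row, **match})
    return joined


# Hypothetical rows shaped like the two tables written above.
lookup_rows = [
    {"c_email_address_pseudonym": "a1", "c_email_address": "pat@example.com",
     "c_first_name": "Pat", "c_last_name": "Lee"},
]
gold_rows = [
    {"c_customer_id": "id-1", "c_email_address_pseudonym": "a1"},
    {"c_customer_id": "id-2", "c_email_address_pseudonym": "zz"},
]
reidentified = reidentify(gold_rows, lookup_rows)
```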

In [0]:
%sql
select * from default.customer_pseudo_gold

c_customer_id,c_customer_sk,c_current_cdemo_sk,c_current_hdemo_sk,c_first_shipto_date_sk,c_first_sales_date_sk,c_preferred_cust_flag,c_last_review_date,c_email_address_pseudonym
46f7ec95-de70-43e2-9734-b4ba143f184d,4236043226a817d9bc26e4e1bb2549777c8cc89c4097c382e89e5527a71f335e,c4,h5,1988-05-13,1977-08-12,N,1973-04-04,da5389d056db1fdee5a1897f8a0d22fa03080ff9939705ab1571747ff45057a1
c82efb6c-0427-4f6e-aa5f-7469803b3607,f6d8f2048df41eeac5d8b64184c36c236a5a83d973758ad1ec81873e35d78ad6,c3,h4,2014-08-20,2006-04-01,Y,1974-01-15,73ff72fb9412674b14914a193f8b784a5f2b0fa5e56722d07a79007cde4c7568
b6f5f599-8e4b-4bdf-bafd-c4d348222eaf,4f82e95f33e034fcc867c5a4ee486a9742fecf5060561905db979ecbb621667e,c2,h3,1973-04-08,1990-01-07,N,1974-03-13,f98b4f80a77f74a9de8f0ef2b617752a6b02da9347b42827f1052a9776a0340d
c9330920-119f-4f0f-ad10-72e6fc13c800,0bed0244ba0478571bb97a062b69df58521fc160f2075a2b79e47fa788d6277a,c1,h3,1974-05-22,2006-09-04,Y,2001-08-12,11fc843bda71f5777eefc7a927ab472353fa917a95e4b6019ffeb25954c1d84e
231bbb3d-eafb-4936-9393-aad65d746e90,ae38fe25bcf38204b57da1952dc2686ec68d58ca8bb513a0b7e3901885cf2046,c3,h1,1981-07-19,1999-05-06,Y,1998-10-18,f567e60607eaf5403b53a414a9c0e7b600b6a73ad9af42007cbccc7558f7c951
f0a992ac-bdb1-405a-bcc4-40e16e5c587d,c97283eff96fe92ecb76356ff56b1603d7382f0ad488195e77e742294cc68e63,c1,h1,1983-08-27,2009-06-16,N,1980-08-02,cd6f6d7053050f9a2ced4762f8006d7ba8ed49e86cbf57f38a9fc1af53ba74b3
0c49f3f6-1f59-44d0-a3b5-5a2cba80e0d8,6b8b0114f36fa9fc6dba47640a8c09018b3b577e3db62ab87999764c6317d64b,c3,h2,1977-06-30,1993-01-19,Y,2012-03-15,9770bfb17d40e5737df78b2b09873b8d1f2796b403629b701b06c6f3cf84ce18
a54d3b8d-ef52-4d30-9e50-af36f8368a28,4076e706ba31e0724ebaa05783151e5eea75270df45556c2cda45e6d356b9e0a,c1,h3,2016-12-27,1995-12-29,Y,1996-06-28,0357e5b7cfad5c572f81136902b6fcb776ff9995c959924043f6e00208a9fb3d
08aac947-1f43-4ea2-8e07-b8139f9f99da,0e61c158aca4438f5dc2da0a1c9a6d8385bc8477c9ffe0eecd8035133a5c1301,c1,h3,2020-04-11,1971-08-07,Y,2002-07-06,705023d4a04baf3af96c6cd80bed787ebddefd50dc81405bf14005c23d9e7aa9
102ce1de-db4c-4a66-8617-4b0b94f0953d,6d5eccdd6e9ac827bc0d991086bc6bfdd110a93a7b746a0b840e8603d7548424,c4,h2,1973-12-15,1986-08-07,N,2015-03-13,0b4357e25be940a047831fd1824c85d99935ecac5fdceb4d87cea5e0581d339b


### 1. Using Databricks SQL as the Serving Layer for Business Users and Data Analysts
### 2. MLflow for ML experiments

## Summary
In this lesson we:
* Learned about the Databricks Delta (reference) architecture.
* Used the Databricks Delta architecture to craft bronze, silver, gold and platinum queries.
* Visualized a key metric: anonymous edits to English Wikipedia per country.
* Queried the streaming Delta tables directly by path, without explicitly creating tables along the way.

## Additional Topics & Resources
* <a href="https://learning.oreilly.com/library/view/delta-lake-the/9781098104580/#" target="_blank">Delta Lake: The Definitive Guide</a>
* <a href="http://lambda-architecture.net/#" target="_blank">Lambda Architecture</a>
* <a href="https://bennyaustin.wordpress.com/2010/05/02/kimball-and-inmon-dw-models/#" target="_blank">Data Warehouse Models</a>
* <a href="https://people.apache.org//~pwendell/spark-nightly/spark-branch-2.1-docs/latest/structured-streaming-kafka-integration.html#" target="_blank">Reading structured streams from Kafka</a>
* <a href="http://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#creating-a-kafka-source-stream#" target="_blank">Create a Kafka Source Stream</a>
* <a href="https://docs.databricks.com/delta/delta-intro.html#case-study-multi-hop-pipelines#" target="_blank">Multi Hop Pipelines</a>

###### Get the Early Release of ```Delta Lake: The Definitive Guide```

<img src="https://learning.oreilly.com/library/view/delta-lake-the/9781098104580/assets/cover.png" width=300/>

&copy; 2020 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="http://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="http://help.databricks.com/">Support</a>