


# Using the Delta Live Tables UI

This demo explores the Delta Live Tables (DLT) UI. By the end of this lesson, you will be able to:

* Deploy a DLT pipeline
* Explore the resultant DAG
* Execute an update of the pipeline

## Classroom Setup

Run the following cell to configure your working environment for this course.

In [0]:
%run ./Includes/Classroom-Setup-04.1

## Generate Pipeline Configuration
Delta Live Tables (DLT) pipelines can be written in either SQL or Python. This course provides examples in both languages. The code cell below selects the SQL example, which we will look at first.

We are going to manually configure a pipeline using the DLT UI. Configuring this pipeline will require parameters unique to a given user. Run the cell to print out values you'll use to configure your pipeline in subsequent steps.

In [0]:
pipeline_language = "SQL"
# pipeline_language = "Python"

DA.print_pipeline_config(pipeline_language)

## Create and Configure a Pipeline

Complete the following to configure the pipeline.

Steps:
1. Right-click the **Workflows** button on the left sidebar, and open the link in a new browser tab. Click **Delta Live Tables** in the upper-left corner, and click **Create Pipeline** in the upper-right corner. Then, return to this tab to complete the next steps.
2. Configure the pipeline as specified below. You'll need the values provided in the cell output above for this step.

| Setting | Instructions |
|--|--|
| Pipeline name | Enter the **Pipeline Name** provided above |
| Product edition | Choose **Advanced** |
| Pipeline mode | Choose **Triggered** |
| Paths | Use the navigator to select or enter all three notebook paths provided above |
| Storage options | Choose **Unity Catalog**  |
| Catalog | Choose your **Catalog** provided above |
| Target schema | Enter **default** |
| Cluster policy | Choose the **Policy** provided above |
| Cluster mode | Choose **Fixed size** to disable auto scaling for your cluster |
| Workers | Enter **1**  |
| Photon Acceleration | Check this checkbox to enable |
| Channel | Choose **Current** |
| Configuration | Click **Add Configuration** and input the **Key** and **Value** in the table below |


| Key                 | Value                                      |
| ------------------- | ------------------------------------------ |
| **`source`** | Enter the **source** provided above |

<br>

3. Click the **Create** button.
4. Verify that the pipeline mode is set to **Development**.
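For orientation, the settings entered in the form can be pictured as the JSON document that the pipeline's **JSON** view displays. The following is a minimal sketch, assuming the field names used by the Databricks pipeline settings JSON (`edition`, `continuous`, `development`, `libraries`, `clusters`, and so on). The function `build_pipeline_settings` and every value in angle brackets are hypothetical placeholders, not output from this course.

```python
import json


def build_pipeline_settings(name, notebook_paths, catalog, policy_id, source):
    """Assemble a settings dictionary mirroring the form fields above.

    Hypothetical helper for illustration only; field names are assumed
    to match the pipeline settings JSON and should be verified in the UI.
    """
    return {
        "name": name,
        "edition": "ADVANCED",        # Product edition: Advanced
        "continuous": False,          # Pipeline mode: Triggered
        "development": True,          # Development mode
        "channel": "CURRENT",         # Channel: Current
        "photon": True,               # Photon Acceleration enabled
        "catalog": catalog,           # Unity Catalog catalog
        "target": "default",          # Target schema
        "libraries": [{"notebook": {"path": p}} for p in notebook_paths],
        "clusters": [
            # Fixed size: a set worker count instead of an autoscale range
            {"label": "default", "policy_id": policy_id, "num_workers": 1}
        ],
        "configuration": {"source": source},  # keys are case-sensitive
    }


settings = build_pipeline_settings(
    name="<pipeline name from the cell output>",
    notebook_paths=["<notebook path 1>", "<notebook path 2>", "<notebook path 3>"],
    catalog="<your catalog>",
    policy_id="<policy id>",
    source="<source from the cell output>",
)
print(json.dumps(settings, indent=2))
```

Comparing a dictionary like this with the JSON shown in the UI is one way to spot a setting that was missed in the form.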

## Check Your Pipeline Configuration

1. In the Databricks workspace, open the Delta Live Tables (DLT) UI.

2. Select your pipeline configuration.

3. Review the pipeline configuration settings to ensure they are correctly configured according to the provided instructions.

4. **Important:** Remove the maintenance cluster if it is currently part of your pipeline configuration. This is required to successfully validate the pipeline configuration. Do this by clicking JSON in the upper-right corner and removing the code related to the maintenance cluster.

5. Once you've confirmed that the pipeline configuration is set up correctly and the maintenance cluster has been removed, proceed to the next steps for validating and running the pipeline.
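The maintenance cluster edit in step 4 amounts to dropping one entry from the `clusters` list in the pipeline JSON. Here is a minimal sketch of that edit, assuming cluster entries are identified by a `label` field with the values `default` and `maintenance`. The helper `remove_maintenance_cluster` and the sample settings are hypothetical.

```python
import copy


def remove_maintenance_cluster(settings):
    """Return a copy of the settings without any maintenance cluster entry."""
    cleaned = copy.deepcopy(settings)  # leave the caller's dictionary untouched
    cleaned["clusters"] = [
        cluster
        for cluster in cleaned.get("clusters", [])
        if cluster.get("label") != "maintenance"
    ]
    return cleaned


# Hypothetical settings containing both cluster entries
example_settings = {
    "name": "<your pipeline name>",
    "clusters": [
        {"label": "default", "num_workers": 1},
        {"label": "maintenance"},
    ],
}

cleaned_settings = remove_maintenance_cluster(example_settings)
print([cluster["label"] for cluster in cleaned_settings["clusters"]])
```

In the UI, the equivalent action is deleting the whole `{ ... }` object whose label is `maintenance`, along with the comma that separates it from the default cluster entry.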



In [0]:
DA.validate_pipeline_config(pipeline_language)

#### Additional Notes on Pipeline Configuration
Here are a few notes regarding the pipeline settings above:

- **Pipeline mode** - This specifies how the pipeline will be run. Choose the mode based on latency and cost requirements.
  - `Triggered` pipelines run once and then shut down until the next manual or scheduled update.
  - `Continuous` pipelines run continuously, ingesting new data as it arrives.
- **Notebook libraries** - Even though these documents are standard Databricks Notebooks, the SQL syntax is specialized to DLT table declarations. We will be exploring the syntax in the exercise that follows.
- **Storage location** - This optional field allows the user to specify a location to store logs, tables, and other information related to pipeline execution. If not specified, DLT will automatically generate a directory.
- **Catalog and Target schema** - These parameters are necessary to make data available outside the pipeline.
- **Cluster mode**, **Min Workers**, **Max Workers** - These fields control the worker configuration for the underlying cluster processing the pipeline. Here, we set the number of workers to 1 because using DLT with Unity Catalog requires at least one worker.
- **`Configuration variables`** - Key-value pairs that we add here will be passed to the notebooks used in the pipeline. We will look at the one variable we are using, **`source`**, in the next lesson. Please note that keys are case-sensitive.
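To illustrate why the case of a configuration key matters, the sketch below mimics `${key}` substitution with Python's standard `string.Template`. It is only an analogy: it is not how DLT resolves configuration values internally, and the path and query text are hypothetical.

```python
from string import Template

# Hypothetical configuration, as entered under "Configuration" in the UI
configuration = {"source": "/path/provided/by/setup"}

# A query that refers to the configuration value by key
query = Template('SELECT * FROM cloud_files("${source}/orders", "json")')
rendered = query.substitute(configuration)
print(rendered)

# The same lookup with a differently cased key fails, because
# "Source" and "source" are distinct keys.
try:
    Template("${Source}/orders").substitute(configuration)
except KeyError as err:
    print(f"Unknown configuration key: {err}")
```

If a pipeline reports a missing or unresolved value, checking the exact spelling and case of the key in the pipeline settings is a sensible first step.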

## Full Refresh, Validate, Start
Click the dropdown immediately to the right of the **`Start`** button. There are two additional options (other than "Start").

- Full refresh - All live tables are updated to reflect the current state of their input data sources. For all streaming tables, Delta Live Tables attempts to clear all data from each table and then load all data from the streaming source.

  --**IMPORTANT NOTE**--  
  Because a full refresh clears all data from your current tables and uses the current state of data sources, it is possible for you to lose data if your data sources no longer contain the data you need. Be very careful when running full refreshes.

- Validate - Builds a directed acyclic graph (DAG) and runs a syntax check but does not actually perform any data updates.
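The three choices in the dropdown can also be thought of as variations of a single "start update" request. The sketch below builds the request body for each choice, assuming the start-update request of the Databricks Pipelines REST API accepts the boolean flags `full_refresh` and `validate_only`. The function `build_update_request` is hypothetical, and no request is actually sent.

```python
def build_update_request(action):
    """Return an assumed request body for one of the three update actions."""
    payloads = {
        "start": {},                            # regular update
        "full_refresh": {"full_refresh": True},  # clear and reload tables
        "validate": {"validate_only": True},     # build the DAG, no data updates
    }
    if action not in payloads:
        raise ValueError(f"Unknown action: {action!r}")
    return payloads[action]


for action in ("start", "full_refresh", "validate"):
    print(action, build_update_request(action))
```

Verify the flag names against the current API reference before relying on them in automation.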

## Validating Pipelines
Click the dropdown next to the **`Start`** button and click **`Validate`**.

DLT builds a graph in the graph window and generates log entries at the bottom of the window. Our pipeline passes all checks. Let's introduce an error:

1. In the **`Pipeline details`** section (to the right of the DAG), click the first **`Source code`** link. The first source code notebook opens in a new window. We will discuss DLT source code in the next lesson. For now, continue through the next steps.
 
  - You may get a note that this notebook is associated with a pipeline. If you do, click the "`x`" to dismiss the dialog box.

2. Scroll to the first code cell in the notebook and remove the word `CREATE` from the SQL command. This will create a syntax error in this notebook.

  - Note that we do not need to "Save" the notebook.

3. Return to the pipeline definition and run `Validate` again by clicking the dropdown next to `Start` and clicking **`Validate`**.

The validation fails. Click the log entry marked in red to get more details about the error. We see that there was a syntax error. We can also view the stack trace by clicking the "+" button. 

4. Fix the error we introduced, and re-run **`Validate`**.

## Run a Pipeline

Now that we have the pipeline validated, let's run it.

1. We are running the pipeline in development mode. Development mode speeds up iterative development by reusing the cluster (rather than creating a new cluster for each run) and by disabling retries so that you can quickly identify and fix errors. Refer to the <a href="https://docs.databricks.com/data-engineering/delta-live-tables/delta-live-tables-user-guide.html#optimize-execution" target="_blank">documentation</a> for more information on this feature.
2. Click **Start** to begin the pipeline run.

The initial run will take several minutes while a cluster is provisioned. Subsequent runs will be appreciably quicker.

## Explore the DAG

As the pipeline completes, the execution flow is graphed. 

Select a table to review its details.

Select **orders_silver**. Notice the results reported in the **Data Quality** section. 

With each triggered update, all newly arriving data will be processed through your pipeline. Metrics are always reported for the current run.

## DLT Source Notebooks
In the next six lessons, we are going to examine the source notebooks that make up our pipeline. There are two versions of the source notebooks: one set written in SQL, and one set written in Python.

So far, we have only run the SQL notebooks. If we were to run the Python versions of the notebooks, the resulting pipeline would be exactly the same. Whether you choose SQL or Python is a matter of preference. In fact, a single pipeline can contain some notebooks written in SQL and others written in Python. Note, however, that each notebook can only contain one language or the other.

There are some differences you should know about, and these differences are outlined in the table below:

## Python vs SQL
| Python | SQL | Notes |
|--------|--------|--------|
| Python API | Proprietary SQL API |  |
| No syntax check | Has syntax checks | In Python, if you run a DLT notebook cell on its own, it will show an error, whereas in SQL it will check whether the command is syntactically valid and tell you. In both cases, individual notebook cells are not supposed to be run for DLT pipelines. |
| A note on imports | None | The dlt module should be explicitly imported into your Python notebook libraries. In SQL, this is not the case. |
| Tables as DataFrames | Tables as query results | The Python DataFrame API allows for multiple transformations of a dataset by stringing multiple API calls together. In SQL, by contrast, those same transformations must be saved in temporary tables as they are transformed. |
| `@dlt.table()` | `SELECT` statement | In SQL, the core logic of your query, containing the transformations you make to your data, is contained in the `SELECT` statement. In Python, data transformations are specified in the function decorated with `@dlt.table()`, and the decorator's arguments configure options for the table. |
| `@dlt.table(comment = "Python comment", table_properties = {"quality": "silver"})` | `COMMENT "SQL comment"` `TBLPROPERTIES ("quality" = "silver")` | This is how you add comments and table properties in Python vs. SQL |
| Python Metaprogramming | N/A | You can use Python inner functions with Delta Live Tables to programmatically create multiple tables to reduce code redundancy. |
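The metaprogramming row of the table above can be sketched as follows. Because the real `dlt` module is only available inside a running pipeline, this example uses a small stand-in decorator named `table` that merely records what it decorates. The stand-in, the `registry` dictionary, the table names, and the query strings are all hypothetical. In an actual DLT notebook you would `import dlt`, use `@dlt.table(...)`, and return a Spark DataFrame from the decorated function.

```python
# Stand-in for the `dlt` module so this sketch runs outside a pipeline.
# Hypothetical: it only records definitions; it does not create tables.
registry = {}


def table(name, comment=None, table_properties=None):
    """Mimic the shape of @dlt.table(name=..., comment=..., table_properties=...)."""
    def decorator(fn):
        registry[name] = {
            "fn": fn,
            "comment": comment,
            "table_properties": table_properties or {},
        }
        return fn
    return decorator


def define_silver_table(source_name):
    """Define one silver table; source_name is bound separately on each call."""
    @table(
        name=f"{source_name}_silver",
        comment=f"Cleaned {source_name} records",
        table_properties={"quality": "silver"},
    )
    def build():
        # In DLT, this inner function would return a transformed DataFrame.
        return f"SELECT * FROM {source_name}_bronze"

    return build


# One loop replaces a hand-written definition per table.
for source_name in ["orders", "customers"]:
    define_silver_table(source_name)

print(sorted(registry))
print(registry["orders_silver"]["fn"]())
```

Wrapping the decorated function in an outer function is the design choice that matters here: each call captures its own `source_name`, so the definitions do not all end up pointing at the last value of the loop variable.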

## Regarding Lesson 2
In the next lesson, lesson 2, we will be examining the syntax for two DLT source notebooks. You will then have the opportunity to work through a lab for the third notebook. As stated above, there are two sets of notebooks. Please go through both sets to see the differences in syntax between the two languages. 

The SQL notebooks are located here: [2A - SQL Pipelines]($./2A - SQL Pipelines)  
The Python notebooks are located here: [2B - Python Pipelines]($./2B - Python Pipelines)


&copy; 2024 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the 
<a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/><a href="https://databricks.com/privacy-policy">Privacy Policy</a> | 
<a href="https://databricks.com/terms-of-use">Terms of Use</a> | 
<a href="https://help.databricks.com/">Support</a>