
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning">
</div>


# Explore Scheduling Options

In the last lesson, we manually triggered our job. In this lesson, we will explore three other types of triggers we can use in our Databricks Workflow Jobs:
1. Scheduled
1. File arrival
1. Continuous

Let's get started by running the setup script in the next cell.

In [0]:
%run ./Includes/Classroom-Setup-05.1.1

1. Run the next cell to automatically set up the single notebook job we created in the last lesson.
1. After the cell runs, click the generated link to open the job in a new tab.
1. Return to these instructions.

In [0]:
%python
DA.create_job_v1()

## Explore Scheduling Options
Steps:
1. Click the **Tasks** tab.
1. On the right-hand side of the Jobs UI, locate the **Job Details** section.
1. Under the **Schedules & Triggers** section, select the **Add trigger** button to explore the options. There are three options (in addition to manual):
   * **Scheduled** uses a cron scheduling UI.
      - This UI provides extensive options for time-based scheduling of your Jobs. The UI can also show the resulting schedule in cron syntax, which you can edit directly when you need a configuration that the UI does not offer.
   * **Continuous** starts a new run shortly after the previous run finishes, so the job runs over and over with only a small amount of time between runs.
   * **File arrival** monitors either an external location or a volume for new files. Note the **Advanced** settings, where you can change the time to wait between checks and the time to wait after a new file arrives before starting a run.
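
To make the cron syntax mentioned above a little more concrete, here is a minimal sketch that splits a cron expression into named fields. It assumes the Quartz field order (seconds, minutes, hours, day of month, month, day of week, optional year); the helper name `describe_quartz_cron` and the example expression are illustrative and do not come from this lesson.

```python
# Field order assumed here: Quartz cron syntax (six required fields plus an optional year).
QUARTZ_FIELDS = [
    "seconds", "minutes", "hours", "day_of_month", "month", "day_of_week", "year",
]


def describe_quartz_cron(expression: str) -> dict:
    """Split a cron expression into named fields. Field values are not validated."""
    parts = expression.split()
    if not 6 <= len(parts) <= 7:
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    # zip stops at the shorter sequence, so "year" only appears when it was given.
    return dict(zip(QUARTZ_FIELDS, parts))


# Hypothetical schedule: every day at 06:30:00.
fields = describe_quartz_cron("0 30 6 * * ?")
print(fields)
```

Reading an expression field by field like this can help when you edit the generated cron syntax by hand.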

## Databricks Volumes
We are going to configure our job to monitor a volume for new data files. Volumes are Unity Catalog objects representing a logical volume of storage in a cloud object storage location. Volumes provide capabilities for accessing, storing, governing, and organizing files.

You can use volumes to store and access files in any format, including structured, semi-structured, and unstructured data.
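
Files in a volume are addressed with paths of the form `/Volumes/<catalog>/<schema>/<volume>/<path>`, which is the pattern used for the trigger path later in this lesson. The sketch below builds such a path; the helper name `volume_path` and the catalog and schema names are placeholders, not values from this course.

```python
from posixpath import join


def volume_path(catalog: str, schema: str, volume: str, *parts: str) -> str:
    """Build a /Volumes/<catalog>/<schema>/<volume>/... path from its components."""
    return join("/Volumes", catalog, schema, volume, *parts)


# "my_catalog" and "my_schema" are placeholder names for illustration only.
path = volume_path("my_catalog", "my_schema", "trigger_storage_location", "babynames.csv")
print(path)
```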

## File Arrival Trigger
Let's configure a file arrival trigger. We will first add a volume that we will use as the storage location to monitor:

In [0]:
CREATE VOLUME trigger_storage_location

Run the following cell to get the path to this volume:

In [0]:
%python
print(f"/Volumes/{DA.catalog_name}/{DA.schema_name}/trigger_storage_location/")

Complete the following:
1. Select **File arrival** for the trigger type
1. Paste the path above into the **Storage location** field
1. Click **Test connection** to verify that the path is correct
   * You should see **Success**. If not, verify that you have run the cell above and copied all of the cell output into **Storage location**
1. Click **Save**
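
For comparison, the same trigger can be described as job settings rather than through the UI. The following is a sketch only: the field names (`trigger`, `file_arrival`, `url`, `min_time_between_triggers_seconds`, `wait_after_last_change_seconds`, `pause_status`) reflect our understanding of the Jobs API 2.1 and should be checked against the current API reference. The helper `file_arrival_trigger` and the example path are hypothetical.

```python
import json


def file_arrival_trigger(url, min_time_between_triggers_seconds=None,
                         wait_after_last_change_seconds=None):
    """Build a job-settings fragment for a file arrival trigger (assumed field names)."""
    file_arrival = {"url": url}
    # The two optional values correspond to the **Advanced** settings in the UI.
    if min_time_between_triggers_seconds is not None:
        file_arrival["min_time_between_triggers_seconds"] = min_time_between_triggers_seconds
    if wait_after_last_change_seconds is not None:
        file_arrival["wait_after_last_change_seconds"] = wait_after_last_change_seconds
    return {"trigger": {"pause_status": "UNPAUSED", "file_arrival": file_arrival}}


# Placeholder catalog and schema names; use the path printed by the cell above.
settings = file_arrival_trigger(
    "/Volumes/my_catalog/my_schema/trigger_storage_location/",
    wait_after_last_change_seconds=60,
)
print(json.dumps(settings, indent=2))
```

The sketch only builds the settings dictionary; it does not call any Databricks endpoint.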

## Reconfigure the Task
We are going to reconfigure the single task to run a notebook when files arrive in the storage location we configured above.
1. Change the **Task name** to "View_New_CSV_Data"
1. Click **Path** and update the notebook to "Lesson 3 Notebooks/View Baby Names"

## Task Parameters
The notebook we will be using to view our baby names needs to know the name of the catalog and schema we are using. We can configure this with **Task parameters**. In the "real world," parameters let us reuse the same notebook against different catalogs and schemas without changing its code.
1. Under **Parameters**, click **Add**
1. For **Key**, type "catalog"
1. For **Value**, paste the catalog name from the cell below
1. Repeat the steps above with the schema name, using the key "schema"
1. Click **Save task**

In [0]:
%python
print(f"Catalog name: {DA.catalog_name}")
print(f"Schema name: {DA.schema_name}")
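
The task we just configured in the UI can also be pictured as a settings fragment. This is a rough sketch that assumes the Jobs API 2.1 field names `task_key`, `notebook_task`, `notebook_path`, and `base_parameters`; the helper `notebook_task` and the parameter values are hypothetical, and the API would expect the full workspace path to the notebook rather than the short path shown in the UI.

```python
def notebook_task(task_key, notebook_path, parameters):
    """Build a task fragment whose parameters are passed to the notebook (assumed field names)."""
    return {
        "task_key": task_key,
        "notebook_task": {
            "notebook_path": notebook_path,
            # Keys here must match the widget names used inside the notebook.
            "base_parameters": dict(parameters),
        },
    }


task = notebook_task(
    "View_New_CSV_Data",
    "Lesson 3 Notebooks/View Baby Names",  # shortened path, as shown in this lesson
    {"catalog": "my_catalog", "schema": "my_schema"},  # placeholder values
)
print(task)
```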

As soon as we configured our trigger, Databricks began monitoring the storage location for newly arrived files. Let's take a look at the status of our job runs.
1. In the upper-left corner, click the **Runs** tab.

   We should see a **Trigger status**. If not, wait about a minute. If you still don't see one, double-check the steps above to ensure you configured the **File arrival** trigger correctly.

   Note that the trigger has been evaluated, but it has not found any new files, so the job has not run.

2. Run the cell below to add a file to our **Storage location** volume, and wait about a minute.

   You should see a run triggered automatically.

3. Click on the **Start time** to view the run. The notebook simply displays the contents of the CSV file.


In [0]:
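%python
import requests

# Download the sample baby names CSV and stop if the request fails
response = requests.get('https://health.data.ny.gov/api/views/jxy9-yhdk/rows.csv', timeout=60)
response.raise_for_status()
csvfile = response.content.decode('utf-8')

# Write the file into the monitored volume (True overwrites an existing file)
dbutils.fs.put(f"/Volumes/{DA.catalog_name}/{DA.schema_name}/trigger_storage_location/babynames.csv", csvfile, True)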


## Using Task Parameters
Before finishing this lesson, let's look at how we used our task parameters. The run that we viewed above shows us the code contained in the "View Baby Names" notebook. We can see that we needed to write code that created two widgets: "catalog" and "schema." This registers the task parameters that we want to use. 

We access the values passed in the job task configuration by using `dbutils.widgets.get()`. You can see this in action in the third line of code.

In summary, here are the steps for configuring and using task parameters.

In the notebook that will be used as a task:
1. Configure a widget whose name matches the **Key** that will be used for the task parameter
1. Use `dbutils.widgets.get()` and pass the name of the widget/parameter as a string

In the task configuration:
1. Add a parameter with the **Key** set to the name of the widget configured above
1. Add whichever **Value** you wish
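
The steps above can be sketched in code. Because `dbutils` only exists inside Databricks, this example uses a small hypothetical stand-in class, `WidgetsStandIn`, that mimics just the `text()` and `get()` calls so the flow can be run anywhere. It is not the code from the "View Baby Names" notebook, and the parameter values and table name are placeholders.

```python
class WidgetsStandIn:
    """Hypothetical, simplified stand-in for dbutils.widgets (illustration only)."""

    def __init__(self, task_parameters=None):
        self._defaults = {}
        # Values supplied by the job's task parameters take precedence over defaults.
        self._task_parameters = dict(task_parameters or {})

    def text(self, name, default_value, label=None):
        # Register a text widget with a default value.
        self._defaults[name] = default_value

    def get(self, name):
        # Return the task parameter if one was passed, otherwise the widget default.
        if name in self._task_parameters:
            return self._task_parameters[name]
        return self._defaults[name]


# In a real notebook these calls would be dbutils.widgets.text(...) and dbutils.widgets.get(...).
widgets = WidgetsStandIn({"catalog": "my_catalog", "schema": "my_schema"})
widgets.text("catalog", "")
widgets.text("schema", "")

catalog = widgets.get("catalog")
schema = widgets.get("schema")
qualified_name = f"{catalog}.{schema}.some_table"  # placeholder table name
print(qualified_name)
```

The point to notice is that the widget name in the notebook and the **Key** in the task configuration are the same string.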

One final note: You can manually trigger a run using different parameters by going to the job configuration page (click **Edit task** from the **Run output** page), clicking the down arrow next to **Run now** and selecting **Run now with different parameters**.
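
As a rough illustration of what running with different parameters means, the sketch below merges the values supplied for a single run over the task's configured parameters, with the run's values winning when a key appears in both. This mirrors our understanding of how the Jobs API 2.1 combines `base_parameters` with the `notebook_params` of a run-now request; the helper names, the job ID, and all values are hypothetical.

```python
def effective_parameters(base_parameters, notebook_params):
    """Merge per-run parameters over the task's configured parameters."""
    merged = dict(base_parameters)
    merged.update(notebook_params)  # values supplied for this run take precedence
    return merged


def run_now_payload(job_id, notebook_params):
    """Build a request body for a one-off run (assumed field names)."""
    return {"job_id": job_id, "notebook_params": dict(notebook_params)}


configured = {"catalog": "my_catalog", "schema": "my_schema"}  # placeholder values
override = {"schema": "another_schema"}

payload = run_now_payload(12345, override)  # 12345 is a made-up job ID
params = effective_parameters(configured, override)
print(payload)
print(params)
```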

Run the following cell to delete the tables and files associated with this lesson.

In [0]:
%python
DA.cleanup()


&copy; 2024 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the 
<a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/><a href="https://databricks.com/privacy-policy">Privacy Policy</a> | 
<a href="https://databricks.com/terms-of-use">Terms of Use</a> | 
<a href="https://help.databricks.com/">Support</a>