# Unit 1 Introduction to Apache Airflow and DAGs

# Introduction

Welcome to our first lesson on "Automating Retraining with Apache Airflow"! In this lesson, we'll start our journey into the world of workflow automation with Apache Airflow, a powerful open-source platform that allows us to programmatically author, schedule, and monitor workflows.

As machine learning practitioners, we often need to retrain our models regularly as new data becomes available. This process involves several steps: data extraction, preprocessing, model training, evaluation, and deployment. Manually executing these steps can be time-consuming and error-prone. This is where Apache Airflow comes in—it provides a framework to automate and orchestrate complex computational workflows.

In this course, we focus on **Apache Airflow 2.x** and its modern **TaskFlow API**. The TaskFlow API, introduced in Airflow 2, allows us to define workflows using Python functions and decorators, making DAGs more readable and maintainable compared to the older operator-based approach. All examples and practices in this course will use Airflow 2 and the TaskFlow API.

By the end of this lesson, we'll understand what Apache Airflow is, learn about **Directed Acyclic Graphs (DAGs)**, and implement a simple workflow using Airflow's **TaskFlow** API.

# Understanding Apache Airflow and DAGs

Apache Airflow is a platform created by Airbnb (now an Apache Software Foundation project) to programmatically author, schedule, and monitor workflows. At its core, Airflow uses Directed Acyclic Graphs (DAGs) to represent workflows. But what exactly is a DAG? Let's break it down:

  * **Graph:** A graph is a mathematical structure made up of nodes connected by edges, and in the context of Airflow, it helps us visually and logically organize the sequence and dependencies of tasks in our workflow.
  * **Directed:** The relationships between tasks have a specific direction. Task A may lead to Task B, but not vice versa.
  * **Acyclic:** There are no cycles or loops in the workflow. You can't create circular dependencies where Task A depends on Task B, which depends on Task A.

In Airflow, each task in a workflow is represented as a **node** in the DAG, and the dependencies between tasks are represented as **directed edges**. This allows us to define complex workflows with multiple tasks and their dependencies in a clear, programmatic way.

For example, a simple ML retraining workflow might include these tasks in sequence: extract new data, preprocess data, train model, evaluate model, and deploy model (if evaluation metrics exceed a threshold). Airflow ensures these tasks execute in the correct order and handles scheduling, retries on failure, and provides visibility into the workflow's execution.
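To make the ordering and the "acyclic" requirement concrete, here is a minimal sketch that models the retraining steps as a dependency graph using Python's standard `graphlib` module. It is not Airflow code, and the task names are illustrative labels taken from the example above.

```python
from graphlib import CycleError, TopologicalSorter

# Map each task to the set of tasks it depends on (its upstream tasks).
# Task names are illustrative, taken from the retraining example.
retraining_steps = {
    "extract": set(),
    "preprocess": {"extract"},
    "train": {"preprocess"},
    "evaluate": {"train"},
    "deploy": {"evaluate"},
}

# A valid execution order visits every task after all of its upstream tasks.
execution_order = list(TopologicalSorter(retraining_steps).static_order())
print(execution_order)

# Making "extract" depend on "deploy" creates a cycle,
# so no valid order exists and the graph is no longer a DAG.
cyclic_steps = {**retraining_steps, "extract": {"deploy"}}
try:
    list(TopologicalSorter(cyclic_steps).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True
print("cycle detected:", has_cycle)
```

Because the graph is a simple chain, only one execution order is possible. A workflow with independent branches would allow several valid orders.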

# Creating Your First Airflow DAG

Now that we understand the concept of DAGs, let's create a simple Airflow DAG. We'll start with the basic structure and imports:

```python
from datetime import datetime, timedelta
from airflow.decorators import dag, task

# Define default arguments for the DAG
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}
```

In this code, we import `datetime` and `timedelta` from Python's `datetime` module and the `dag` and `task` decorators from Airflow. The `default_args` dictionary defines default settings that Airflow applies to every task in the DAG: who owns the task (`owner`), whether a task run waits for the same task's previous run to succeed (`depends_on_past`), email notification preferences, and retry configuration. With these settings, Airflow will automatically retry a failed task once after waiting for 1 minute.
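The retry settings can be pictured with a small plain-Python simulation. This is only a sketch of the idea (one initial attempt plus up to `retries` further attempts, with a pause between them), not Airflow's implementation; `run_with_retries` and `flaky_task` are hypothetical names.

```python
import time
from datetime import timedelta


def run_with_retries(func, retries, retry_delay):
    """Call func, retrying up to `retries` times after a failure."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return func(), attempts
        except Exception:
            if attempts > retries:
                raise  # Out of retries: the failure is final.
            time.sleep(retry_delay.total_seconds())


# Hypothetical task that fails on its first attempt and then succeeds.
calls = {"count": 0}


def flaky_task():
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("transient failure")
    return "ok"


# A zero delay keeps the demo fast; the lesson's setting is timedelta(minutes=1).
result, attempts = run_with_retries(flaky_task, retries=1, retry_delay=timedelta(0))
print(result, attempts)
```

With `retries=1`, a task gets at most two attempts in total before it is marked as failed.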

# Defining the DAG with TaskFlow API

With our default arguments in place, we can now define the DAG using Airflow's TaskFlow API, which provides a more intuitive way to define workflows:

```python
@dag(
    dag_id='mlops_pipeline',  # Unique identifier for the DAG
    description='A simple two-task DAG using TaskFlow API',
    default_args=default_args,
    schedule='@daily',  # Run daily
    start_date=datetime(2023, 1, 1),  # Start date
    catchup=False,  # Don't run for past dates
    tags=['intro', 'basic'],
)
def hello_airflow_dag():
    """
    This DAG demonstrates the basics of Airflow with just two tasks:
    1. A task that says hello
    2. A task that says goodbye and uses the result from the first task
    """
```

Here, we use the `@dag` decorator to transform our Python function into an Airflow DAG. The parameters define critical aspects of our workflow's behavior: the `dag_id` provides a unique name, `schedule` sets it to run daily, and `start_date` indicates when scheduling should begin. (The `schedule` parameter is available from Airflow 2.4 onward; earlier 2.x releases use `schedule_interval` instead.) Setting `catchup=False` prevents Airflow from creating runs for the past intervals between `start_date` and today, which is especially useful when first deploying a DAG with a start date in the past. The docstring documents what our simple workflow will do, which helps anyone maintaining this code.
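The effect of `catchup` is easier to see with some date arithmetic. The sketch below is a simplified model that assumes fixed 24-hour intervals and a hypothetical "now"; Airflow's real scheduler is more involved, and `completed_daily_intervals` is a made-up helper.

```python
from datetime import datetime, timedelta


def completed_daily_intervals(start_date, now):
    """Return the start of every full daily interval between start_date and now."""
    starts = []
    current = start_date
    while current + timedelta(days=1) <= now:
        starts.append(current)
        current += timedelta(days=1)
    return starts


start_date = datetime(2023, 1, 1)
now = datetime(2023, 1, 5, 12, 0)  # Hypothetical moment the DAG is first enabled.

intervals = completed_daily_intervals(start_date, now)

# catchup=True: one run per completed interval since start_date.
runs_with_catchup = intervals
# catchup=False: only the most recent completed interval is scheduled.
runs_without_catchup = intervals[-1:]

print(len(runs_with_catchup), runs_without_catchup)
```

In this model, enabling the DAG on January 5 with `catchup=True` would queue four runs, while `catchup=False` would queue a single run for the interval that started on January 4.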

# Creating Tasks in Your Workflow

Now it's time to define the individual tasks for our DAG using the `@task` decorator:

```python
    @task(task_id="hello_task")
    def say_hello():
        """Simple function that prints a greeting."""
        print("Hello from Airflow!")
        return "Hello"

    @task(task_id="goodbye_task")
    def say_goodbye(first_task_result):
        """
        Function that uses the result from the first task.
        
        Args:
            first_task_result: The result returned by the first task
        """
        print(f"Previous task said: {first_task_result}")
        print("Goodbye from Airflow!")
```

The `@task` decorator transforms regular Python functions into Airflow tasks. Our first task, `say_hello`, simply prints a message and returns the string "Hello". The second task, `say_goodbye`, takes the output from the first task as a parameter, allowing us to demonstrate how data flows between tasks in Airflow. This is one of the **powerful features** of the TaskFlow API—it automatically handles the serialization, storage, and retrieval of data between tasks, making workflow development more intuitive and less boilerplate-heavy.

**Note:** Although it looks like the value is passed directly as a Python variable, Airflow actually passes data between tasks using its XCom (cross-communication) mechanism. The TaskFlow API makes this seamless, but under the hood, the result is serialized and stored by Airflow, not passed in-memory like a normal Python function call.
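As a rough mental model of that mechanism, the sketch below stores a task's return value in a dictionary after serializing it and reads it back for the downstream task. It is a toy: `xcom_store`, `xcom_push`, and `xcom_pull` here are stand-in names written for this example, and JSON is assumed as the serialization format for simplicity. The key `"return_value"` matches the key Airflow uses for a task's returned value.

```python
import json

# Toy stand-in for Airflow's metadata database; the real storage differs.
xcom_store = {}


def xcom_push(dag_id, task_id, value, key="return_value"):
    """Serialize a value and store it under (dag_id, task_id, key)."""
    xcom_store[(dag_id, task_id, key)] = json.dumps(value)


def xcom_pull(dag_id, task_id, key="return_value"):
    """Load and deserialize a previously stored value."""
    return json.loads(xcom_store[(dag_id, task_id, key)])


# hello_task finishes and its return value is stored.
xcom_push("mlops_pipeline", "hello_task", "Hello")

# goodbye_task starts later, possibly in another process, and reads it back.
received = xcom_pull("mlops_pipeline", "hello_task")
print(received)

# In this toy model, a value that JSON cannot serialize cannot be passed.
try:
    xcom_push("mlops_pipeline", "hello_task", {1, 2, 3})
    set_is_serializable = True
except TypeError:
    set_is_serializable = False
print("set serializable:", set_is_serializable)
```

The practical takeaway is that task return values should be small and serializable. Large artifacts such as datasets or trained models are usually written to external storage, with only a reference passed between tasks.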

# Orchestrating the Workflow

The final step is defining how our tasks should interact. In the TaskFlow API, this happens naturally through function calls:

```python
    # Define the task dependencies
    # With TaskFlow API, dependencies are created by function calls
    first_result = say_hello()  # Registers hello_task; returns a placeholder for its result
    say_goodbye(first_result)   # Passing it on makes goodbye_task depend on hello_task

# Create the DAG instance
dag = hello_airflow_dag()
```

This is where the **elegant simplicity** of the TaskFlow API shines. With classic operators, you set the order in which tasks run explicitly, using the bitshift operators `>>` and `<<` (these remain available in Airflow 2 and are still needed when tasks don't exchange data). With the TaskFlow API, you simply call your decorated functions and pass their outputs along: this automatically creates the correct order and dependencies. Note that calling `say_hello()` inside the DAG function does not run the task or return the string "Hello"; it returns a placeholder (an `XComArg`) for the task's future result, which we assign to `first_result`. By passing this placeholder to `say_goodbye()`, we tell Airflow that the second task should wait for the first one to finish and use its result.

The final line creates our DAG object and assigns it to the variable `dag`. This is a common convention, but it's not a strict requirement, since Airflow will still discover the DAG as long as the function is called at the module level.

When the DAG runs, it executes `say_hello` first, then passes the returned value to `say_goodbye` (via XCom). In the Airflow UI logs, you'd see "Hello from Airflow!" from the first task, followed by "Previous task said: Hello" and "Goodbye from Airflow!" from the second task.
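It can help to see how a decorator might turn function calls into dependency declarations. The following is a toy model written for this lesson, not Airflow's implementation: `toy_task`, `Placeholder`, and `dependencies` are hypothetical names. Calling a decorated function records which tasks feed into it and returns a placeholder instead of running the function body.

```python
# Toy model only: these names mimic the lesson's code but are not Airflow's classes.
dependencies = {}  # task_id -> list of upstream task_ids


class Placeholder:
    """Stands in for a task's future result while the DAG is being defined."""

    def __init__(self, task_id):
        self.task_id = task_id


def toy_task(task_id):
    def decorator(func):
        def register(*args):
            # Calling the decorated function records dependencies;
            # it does not run func.
            upstream = [a.task_id for a in args if isinstance(a, Placeholder)]
            dependencies[task_id] = upstream
            return Placeholder(task_id)
        return register
    return decorator


@toy_task("hello_task")
def say_hello():
    return "Hello"


@toy_task("goodbye_task")
def say_goodbye(first_task_result):
    print(f"Previous task said: {first_task_result}")


first_result = say_hello()
say_goodbye(first_result)

print(type(first_result).__name__)  # A Placeholder, not a str
print(dependencies)
```

Nothing is printed by `say_goodbye` here, because defining the graph and executing the tasks are separate phases. In Airflow, execution happens later, when the scheduler runs each task.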

# Understanding the DAG Run Output

When your DAG runs, Airflow generates detailed logs that provide insight into the execution process. While these logs can be quite verbose, they are invaluable for debugging and monitoring. Let's look at a simplified version of the output you might see for our `hello_airflow_dag`:

```text
[2025-05-07T14:46:45.518+0000] {dag.py:4435} INFO - dagrun id: mlops_pipeline
[2025-05-07T14:46:45.552+0000] {dag.py:4396} INFO - [DAG TEST] starting task_id=hello_task map_index=-1
Hello from Airflow!
[2025-05-07 14:46:45,606] {python.py:240} INFO - Done. Returned value was: Hello
[2025-05-07T14:46:45.619+0000] {taskinstance.py:353} INFO - Marking task as SUCCESS. dag_id=mlops_pipeline, task_id=hello_task, ...
[2025-05-07T14:46:45.645+0000] {dag.py:4396} INFO - [DAG TEST] starting task_id=goodbye_task map_index=-1
Previous task said: Hello
Goodbye from Airflow!
[2025-05-07 14:46:45,670] {python.py:240} INFO - Done. Returned value was: None
[2025-05-07T14:46:45.673+0000] {taskinstance.py:353} INFO - Marking task as SUCCESS. dag_id=mlops_pipeline, task_id=goodbye_task, ...
[2025-05-07T14:46:45.686+0000] {dagrun.py:854} INFO - Marking run <DagRun mlops_pipeline ...> successful
```

Let's break down what these key lines tell us:

  * `INFO - dagrun id: mlops_pipeline`: This indicates the start of a new DAG run for our `mlops_pipeline` DAG.
  * The next set of lines shows the execution of `hello_task`:
      * `INFO - [DAG TEST] starting task_id=hello_task ...`: Airflow begins executing the `hello_task`.
      * `Hello from Airflow!`: This is the `print()` output from our `say_hello` function.
      * `INFO - Done. Returned value was: Hello`: The task completed and returned "Hello", which Airflow passes via XComs.
      * `INFO - Marking task as SUCCESS. ... task_id=hello_task, ...`: The `hello_task` finished successfully.
  * Following that, we see the execution of `goodbye_task`:
      * `INFO - [DAG TEST] starting task_id=goodbye_task ...`: Airflow starts the `goodbye_task`.
      * `Previous task said: Hello` and `Goodbye from Airflow!`: These are the `print()` outputs from `say_goodbye`, confirming it received the "Hello" string from the first task.
      * `INFO - Done. Returned value was: None`: The `say_goodbye` task completed (returning `None` as it has no explicit return).
      * `INFO - Marking task as SUCCESS. ... task_id=goodbye_task, ...`: The `goodbye_task` also finished successfully.
  * `INFO - Marking run <DagRun mlops_pipeline ...> successful`: Finally, this line confirms that the entire DAG run completed successfully.

This output confirms that our tasks executed in the correct order, data was passed between them as expected, and the overall workflow was successful. In the Airflow UI, you would see a graphical representation of this execution, with green boxes indicating successfully completed tasks.
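When logs get long, a short script can pull out the lines that matter. The sketch below scans a few of the sample lines above for task outcomes with a regular expression. The pattern is an assumption based only on the format shown in this lesson; log formats can differ between Airflow versions.

```python
import re

# Sample lines copied from the output above (trailing fields elided as shown).
log_lines = [
    "[2025-05-07T14:46:45.619+0000] {taskinstance.py:353} INFO - Marking task as SUCCESS. dag_id=mlops_pipeline, task_id=hello_task, ...",
    "Previous task said: Hello",
    "[2025-05-07T14:46:45.673+0000] {taskinstance.py:353} INFO - Marking task as SUCCESS. dag_id=mlops_pipeline, task_id=goodbye_task, ...",
]

# Capture the final state, the DAG id, and the task id from status lines.
status_pattern = re.compile(r"Marking task as (\w+)\. dag_id=(\w+), task_id=(\w+)")

task_states = {}
for line in log_lines:
    match = status_pattern.search(line)
    if match:
        state, dag_id, task_id = match.groups()
        task_states[task_id] = state

print(task_states)
```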

# Where to Place Your DAG Code

For Airflow to discover and execute your DAGs, your Python files must be placed in a specific directory known as the **DAGs folder**. By default, this is the `dags/` directory inside your Airflow home directory (`$AIRFLOW_HOME/dags/`), but it can be configured differently in your Airflow settings.

When Airflow runs, it continuously scans the DAGs folder for Python files. Any file that leaves a DAG object at the module level (for example, by calling a `@dag`-decorated function at the top level of the file) will be automatically detected and made available to the Airflow scheduler. This means you don't need to manually register your DAGs: just save your `.py` file in the correct folder, and Airflow will handle the rest.
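The scanning step can be approximated in a few lines. As far as I understand Airflow's default "safe mode", it first filters for files whose text mentions both `dag` and `airflow` before importing them; the sketch below imitates only that filter. `candidate_dag_files` is a hypothetical helper, and the real parser goes further by importing each file and collecting the DAG objects it defines.

```python
import tempfile
from pathlib import Path


def candidate_dag_files(dags_folder):
    """Return names of .py files whose text mentions both 'dag' and 'airflow'."""
    candidates = []
    for path in sorted(Path(dags_folder).rglob("*.py")):
        text = path.read_text(encoding="utf-8").lower()
        if "dag" in text and "airflow" in text:
            candidates.append(path.name)
    return candidates


with tempfile.TemporaryDirectory() as folder:
    # Hypothetical files standing in for a real DAGs folder.
    Path(folder, "hello_pipeline.py").write_text(
        "from airflow.decorators import dag, task\n", encoding="utf-8"
    )
    Path(folder, "helpers.py").write_text(
        "def add(a, b):\n    return a + b\n", encoding="utf-8"
    )
    found = candidate_dag_files(folder)

print(found)
```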

**Best practices** for organizing your DAG code:

  * Place each DAG in its own Python file for clarity and maintainability.
  * If you have shared code (such as utility functions or custom operators), consider placing them in a separate `utils/` or `plugins/` directory and importing them into your DAG files.

Please note that, in this course, the CodeSignal environment is pre-configured so that any DAG code you write is automatically placed in the correct folder. You don't need to worry about file placement or Airflow configuration—just focus on writing your DAGs, and they'll be picked up and executed by Airflow behind the scenes.

# Conclusion and Next Steps

In this lesson, we've taken our first steps with Apache Airflow by creating a simple DAG with two tasks. We've learned about the core concepts of Airflow: defining workflows as DAGs, creating tasks with the `@task` decorator, and establishing dependencies between them. The TaskFlow API has made this process intuitive by letting us express workflows as regular Python functions while handling the complexities of data passing and dependency management behind the scenes.

While we've built a simple example, these fundamentals form the building blocks for creating complex, production-grade ML retraining pipelines. As you practice these concepts, experiment with adding more tasks, passing different types of data between them, and visualizing the resulting workflow graphs. In the upcoming lessons, we'll expand on these basics to build more sophisticated workflows that handle real machine learning tasks from data processing to model deployment.

## Transform a Function into a DAG

You’ve just learned how Apache Airflow 2.x uses the TaskFlow API to define workflows as Python functions and how tasks are organized in a DAG. Now, let’s put that knowledge into practice by turning a regular Python function into a real Airflow DAG.

Your task is to add the missing `@dag` decorator above the `hello_airflow_dag` function. This decorator tells Airflow to treat the function as a workflow. Make sure to include all the required parameters:

  * `dag_id`: set this to `'mlops_pipeline'`
  * `description`: provide a short explanation of the DAG
  * `default_args`: use the `default_args` dictionary already defined in the code
  * `schedule`: set to `'@daily'`
  * `start_date`: set to some date using `datetime(YYYY, MM, DD)`
  * `catchup`: set to `False`
  * `tags`: include two tags in a list

Once you add the decorator with the correct parameters, your function will become a proper Airflow DAG. This is a key step in automating workflows with Airflow, so take your time and ensure each parameter is included.

```python
"""
Basic Introduction to Airflow - Simple DAG

This module defines a minimal Airflow DAG with only two tasks to demonstrate
the core concepts of Directed Acyclic Graphs (DAGs) in Airflow using the TaskFlow API.
"""

from datetime import datetime, timedelta
from airflow.decorators import dag, task

# Define default arguments for the DAG
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

# TODO: Add the @dag decorator above the function below.
# The decorator should include these parameters:
# - dag_id: set to 'mlops_pipeline'
# - description: a short description of the DAG
# - default_args: use the default_args dictionary above
# - schedule: set to '@daily'
# - start_date: set to datetime(2023, 1, 1)
# - catchup: set to False
# - tags: include 'intro' and 'basic' in a list
def hello_airflow_dag():
    """
    This DAG demonstrates the basics of Airflow with just two tasks:
    1. A task that says hello
    2. A task that says goodbye and uses the result from the first task
    """
    
    # Define the first task using the @task decorator
    @task(task_id="hello_task")
    def say_hello():
        """Simple function that prints a greeting."""
        print("Hello from Airflow!")
        return "Hello"
    
    # Define the second task using the @task decorator
    @task(task_id="goodbye_task")
    def say_goodbye(first_task_result):
        """
        Function that uses the result from the first task.
        
        Args:
            first_task_result: The result returned by the first task
        """
        print(f"Previous task said: {first_task_result}")
        print("Goodbye from Airflow!")
    
    # Define the task dependencies
    # With TaskFlow API, dependencies are created by function calls
    first_result = say_hello()  # Register the first task (it runs later, when the DAG executes)
    say_goodbye(first_result)   # Pass the result to the second task

# Create the DAG instance
hello_airflow_dag()

```

```python
"""
Basic Introduction to Airflow - Simple DAG

This module defines a minimal Airflow DAG with only two tasks to demonstrate
the core concepts of Directed Acyclic Graphs (DAGs) in Airflow using the TaskFlow API.
"""

from datetime import datetime, timedelta
from airflow.decorators import dag, task

# Define default arguments for the DAG
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

# TODO: Add the @dag decorator above the function below.
# The decorator should include these parameters:
# - dag_id: set to 'mlops_pipeline'
# - description: a short description of the DAG
# - default_args: use the default_args dictionary above
# - schedule: set to '@daily'
# - start_date: set to datetime(2023, 1, 1)
# - catchup: set to False
# - tags: include 'intro' and 'basic' in a list
@dag(
    dag_id='mlops_pipeline',
    description='A simple two-task DAG using TaskFlow API',
    default_args=default_args,
    schedule='@daily',
    start_date=datetime(2023, 1, 1),
    catchup=False,
    tags=['intro', 'basic'],
)
def hello_airflow_dag():
    """
    This DAG demonstrates the basics of Airflow with just two tasks:
    1. A task that says hello
    2. A task that says goodbye and uses the result from the first task
    """

    # Define the first task using the @task decorator
    @task(task_id="hello_task")
    def say_hello():
        """Simple function that prints a greeting."""
        print("Hello from Airflow!")
        return "Hello"

    # Define the second task using the @task decorator
    @task(task_id="goodbye_task")
    def say_goodbye(first_task_result):
        """
        Function that uses the result from the first task.

        Args:
            first_task_result: The result returned by the first task
        """
        print(f"Previous task said: {first_task_result}")
        print("Goodbye from Airflow!")

    # Define the task dependencies
    # With TaskFlow API, dependencies are created by function calls
    first_result = say_hello()  # Register the first task (it runs later, when the DAG executes)
    say_goodbye(first_result)   # Pass the result to the second task

# Create the DAG instance
hello_airflow_dag()
```

## Turn Functions into Airflow Tasks

You’ve just seen how Airflow uses the TaskFlow API to organize workflows as Python functions inside a DAG. Now, let’s focus on how individual steps in your workflow become Airflow tasks.

In the code below, the two functions that should act as tasks are missing their `@task` decorators. Your job is to add these decorators so Airflow knows to treat them as tasks within the DAG.

Here’s what you need to do:

  * Add the `@task` decorator above the `say_hello` function, and set its `task_id` to `"hello_task"`.
  * Add the `@task` decorator above the `say_goodbye` function, and set its `task_id` to `"goodbye_task"`.
  * Define the task dependencies in the body of the `hello_airflow_dag` function.

The `@task` decorator is what turns a regular Python function into a task that Airflow can schedule and monitor. Make sure you place the decorator directly above each function definition.

Once you’ve added the decorators and defined the task dependencies, your DAG will be ready to run both tasks as part of the workflow. This is a key step in building automated pipelines with Airflow!

```python
"""
Basic Introduction to Airflow - Simple DAG

This module defines a minimal Airflow DAG with only two tasks to demonstrate
the core concepts of Directed Acyclic Graphs (DAGs) in Airflow using the TaskFlow API.
"""

from datetime import datetime, timedelta
from airflow.decorators import dag, task

# Define default arguments for the DAG
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

# Define the DAG using the TaskFlow API
@dag(
    dag_id='mlops_pipeline',  # Unique identifier for the DAG
    description='A simple two-task DAG using TaskFlow API',
    default_args=default_args,
    schedule='@daily',  # Run daily
    start_date=datetime(2023, 1, 1),  # Start date
    catchup=False,  # Don't run for past dates
    tags=['intro', 'basic'],
)
def hello_airflow_dag():
    """
    This DAG demonstrates the basics of Airflow with just two tasks:
    1. A task that says hello
    2. A task that says goodbye and uses the result from the first task
    """

    # TODO: Add the @task decorator above this function.
    # This decorator turns the function into an Airflow task.
    def say_hello():
        """Simple function that prints a greeting."""
        print("Hello from Airflow!")
        return "Hello"

    # TODO: Add the @task decorator above this function.
    # This decorator turns the function into an Airflow task.
    def say_goodbye(first_task_result):
        """
        Function that uses the result from the first task.

        Args:
            first_task_result: The result returned by the first task
        """
        print(f"Previous task said: {first_task_result}")
        print("Goodbye from Airflow!")

    # TODO: Define the task dependencies using function calls.
    # With TaskFlow API, dependencies are created by calling the task functions
    # and passing the result from one to the next.


# Create the DAG instance
hello_airflow_dag()
```

```python
"""
Basic Introduction to Airflow - Simple DAG

This module defines a minimal Airflow DAG with only two tasks to demonstrate
the core concepts of Directed Acyclic Graphs (DAGs) in Airflow using the TaskFlow API.
"""

from datetime import datetime, timedelta
from airflow.decorators import dag, task

# Define default arguments for the DAG
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

# Define the DAG using the TaskFlow API
@dag(
    dag_id='mlops_pipeline',  # Unique identifier for the DAG
    description='A simple two-task DAG using TaskFlow API',
    default_args=default_args,
    schedule='@daily',  # Run daily
    start_date=datetime(2023, 1, 1),  # Start date
    catchup=False,  # Don't run for past dates
    tags=['intro', 'basic'],
)
def hello_airflow_dag():
    """
    This DAG demonstrates the basics of Airflow with just two tasks:
    1. A task that says hello
    2. A task that says goodbye and uses the result from the first task
    """

    # TODO: Add the @task decorator above this function.
    # This decorator turns the function into an Airflow task.
    @task(task_id="hello_task")
    def say_hello():
        """Simple function that prints a greeting."""
        print("Hello from Airflow!")
        return "Hello"

    # TODO: Add the @task decorator above this function.
    # This decorator turns the function into an Airflow task.
    @task(task_id="goodbye_task")
    def say_goodbye(first_task_result):
        """
        Function that uses the result from the first task.

        Args:
            first_task_result: The result returned by the first task
        """
        print(f"Previous task said: {first_task_result}")
        print("Goodbye from Airflow!")

    # TODO: Define the task dependencies using function calls.
    # With TaskFlow API, dependencies are created by calling the task functions
    # and passing the result from one to the next.
    first_result = say_hello()
    say_goodbye(first_result)


# Create the DAG instance
hello_airflow_dag()
```

## Controlling Workflow Timing in Airflow

You’ve just practiced turning Python functions into Airflow tasks using the TaskFlow API. Now, let’s focus on how to control when your workflow runs by adjusting the DAG’s schedule.

In the code below, the DAG is set to run once per day using the @daily schedule. Your task is to change this so the DAG runs every hour instead. This is done by updating the schedule parameter in the @dag decorator.

Understanding how to set the schedule is an important part of building reliable workflows in Airflow.

```python
"""
Basic Introduction to Airflow - Simple DAG

This module defines a minimal Airflow DAG with only two tasks to demonstrate
the core concepts of Directed Acyclic Graphs (DAGs) in Airflow using the TaskFlow API.
"""

from datetime import datetime, timedelta
from airflow.decorators import dag, task

# Define default arguments for the DAG
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

# Define the DAG using the TaskFlow API
@dag(
    dag_id='mlops_pipeline',  # Unique identifier for the DAG
    description='A simple two-task DAG using TaskFlow API',
    default_args=default_args,
    # TODO: Change the schedule parameter below from '@daily' to '@hourly'
    # This will make the DAG run every hour instead of once per day.
    schedule='@daily',  # Run daily
    start_date=datetime(2023, 1, 1),  # Start date
    catchup=False,  # Don't run for past dates
    tags=['intro', 'basic'],
)
def hello_airflow_dag():
    """
    This DAG demonstrates the basics of Airflow with just two tasks:
    1. A task that says hello
    2. A task that says goodbye and uses the result from the first task
    """
    
    # Define the first task using the @task decorator
    @task(task_id="hello_task")
    def say_hello():
        """Simple function that prints a greeting."""
        print("Hello from Airflow!")
        return "Hello"
    
    # Define the second task using the @task decorator
    @task(task_id="goodbye_task")
    def say_goodbye(first_task_result):
        """
        Function that uses the result from the first task.
        
        Args:
            first_task_result: The result returned by the first task
        """
        print(f"Previous task said: {first_task_result}")
        print("Goodbye from Airflow!")
    
    # Define the task dependencies
    # With TaskFlow API, dependencies are created by function calls
    first_result = say_hello()  # Register the first task (it runs later, when the DAG executes)
    say_goodbye(first_result)   # Pass the result to the second task

# Create the DAG instance
hello_airflow_dag()
```

```python
"""
Basic Introduction to Airflow - Simple DAG

This module defines a minimal Airflow DAG with only two tasks to demonstrate
the core concepts of Directed Acyclic Graphs (DAGs) in Airflow using the TaskFlow API.
"""

from datetime import datetime, timedelta
from airflow.decorators import dag, task

# Define default arguments for the DAG
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

# Define the DAG using the TaskFlow API
@dag(
    dag_id='mlops_pipeline',  # Unique identifier for the DAG
    description='A simple two-task DAG using TaskFlow API',
    default_args=default_args,
    # TODO: Change the schedule parameter below from '@daily' to '@hourly'
    # This will make the DAG run every hour instead of once per day.
    schedule='@hourly',  # Run hourly
    start_date=datetime(2023, 1, 1),  # Start date
    catchup=False,  # Don't run for past dates
    tags=['intro', 'basic'],
)
def hello_airflow_dag():
    """
    This DAG demonstrates the basics of Airflow with just two tasks:
    1. A task that says hello
    2. A task that says goodbye and uses the result from the first task
    """
    
    # Define the first task using the @task decorator
    @task(task_id="hello_task")
    def say_hello():
        """Simple function that prints a greeting."""
        print("Hello from Airflow!")
        return "Hello"
    
    # Define the second task using the @task decorator
    @task(task_id="goodbye_task")
    def say_goodbye(first_task_result):
        """
        Function that uses the result from the first task.
        
        Args:
            first_task_result: The result returned by the first task
        """
        print(f"Previous task said: {first_task_result}")
        print("Goodbye from Airflow!")
    
    # Define the task dependencies
    # With TaskFlow API, dependencies are created by function calls
    first_result = say_hello()  # Register the first task (it runs later, when the DAG executes)
    say_goodbye(first_result)   # Pass the result to the second task

# Create the DAG instance
hello_airflow_dag()
```
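To see what the change from `@daily` to `@hourly` means in practice, the sketch below counts the schedule points that fall within a single day. The cron equivalents are the ones listed in Airflow's documentation for these presets; the fixed `timedelta` steps and the `scheduled_times` helper are simplifications made up for this illustration.

```python
from datetime import datetime, timedelta

# Cron equivalents of common presets, as listed in Airflow's documentation.
PRESET_TO_CRON = {
    "@hourly": "0 * * * *",
    "@daily": "0 0 * * *",
    "@weekly": "0 0 * * 0",
}

# Simplified fixed intervals used only for this illustration.
PRESET_TO_INTERVAL = {
    "@hourly": timedelta(hours=1),
    "@daily": timedelta(days=1),
    "@weekly": timedelta(weeks=1),
}


def scheduled_times(preset, start, end):
    """List the schedule points from start (inclusive) to end (exclusive)."""
    step = PRESET_TO_INTERVAL[preset]
    times = []
    current = start
    while current < end:
        times.append(current)
        current += step
    return times


day_start = datetime(2023, 1, 1)
day_end = datetime(2023, 1, 2)

hourly = scheduled_times("@hourly", day_start, day_end)
daily = scheduled_times("@daily", day_start, day_end)
print(len(hourly), len(daily))
```

Switching to `@hourly` multiplies the number of runs by 24, so it is worth checking that each run finishes comfortably within its interval.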

## Adding a Third Task to Your DAG

You’ve just practiced building a simple Airflow DAG with two connected tasks and learned how to pass data from one task to another. Now, let’s make your workflow a bit more interesting by adding a third step.

Your goal is to introduce a new task called `finalize_greeting` into the DAG. This task should take the output from `say_goodbye`, perform a transformation on it (for example, add extra text or reformat the message), and return the new result. Here’s what you need to do:

  * Define a new function called `finalize_greeting` and decorate it with `@task(task_id="finalize_task")`.
  * Ensure this function takes the output from `say_goodbye` as its input.
  * Inside the function, transform the input in some way (for example, add a friendly remark or change the format), print the final message, and return it.
  * Update the task dependencies so that the output from `say_goodbye` is passed to `finalize_greeting`.

This will help you see how to chain multiple tasks together and build more flexible workflows in Airflow.

```python
"""
Basic Introduction to Airflow - Simple DAG

This module defines a minimal Airflow DAG with two tasks to demonstrate
the core concepts of Directed Acyclic Graphs (DAGs) in Airflow using the TaskFlow API.
"""

from datetime import datetime, timedelta
from airflow.decorators import dag, task

# Define default arguments for the DAG
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

# Define the DAG using the TaskFlow API
@dag(
    dag_id='mlops_pipeline',  # Unique identifier for the DAG
    description='A simple two-task DAG using TaskFlow API',
    default_args=default_args,
    schedule='@daily',  # Run daily
    start_date=datetime(2023, 1, 1),  # Start date
    catchup=False,  # Don't run for past dates
    tags=['intro', 'basic'],
)
def hello_airflow_dag():
    """
    This DAG demonstrates the basics of Airflow with just two tasks:
    1. A task that says hello
    2. A task that says goodbye and uses the result from the first task
    """

    @task(task_id="hello_task")
    def say_hello():
        """Simple function that prints a greeting."""
        print("Hello from Airflow!")
        return "Hello"

    @task(task_id="goodbye_task")
    def say_goodbye(first_task_result):
        """
        Function that uses the result from the first task.

        Args:
            first_task_result: The result returned by the first task
        """
        print(f"Previous task said: {first_task_result}")
        print("Goodbye from Airflow!")
        # The goodbye task should return a value so the next task can use it
        return "Goodbye"

    # TODO: Define a new task called finalize_greeting using the @task decorator.
    # - The function should take the output from say_goodbye as input
    # - It should transform this input in some way, creating a reformatted "final message"
    # - It should print this final message and return it

    # Define the task dependencies
    first_result = say_hello()  # Register the first task (it runs later, when the DAG executes)
    goodbye_result = say_goodbye(first_result)
    # TODO: Pass the goodbye_result to the finalize_greeting task
    # finalize_greeting(goodbye_result)

# Create the DAG instance
hello_airflow_dag()

```

```python
"""
Basic Introduction to Airflow - Simple DAG

This module defines a minimal Airflow DAG with two tasks to demonstrate
the core concepts of Directed Acyclic Graphs (DAGs) in Airflow using the TaskFlow API.
"""

from datetime import datetime, timedelta
from airflow.decorators import dag, task

# Define default arguments for the DAG
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

# Define the DAG using the TaskFlow API
@dag(
    dag_id='mlops_pipeline',  # Unique identifier for the DAG
    description='A simple two-task DAG using TaskFlow API',
    default_args=default_args,
    schedule='@daily',  # Run daily
    start_date=datetime(2023, 1, 1),  # Start date
    catchup=False,  # Don't run for past dates
    tags=['intro', 'basic'],
)
def hello_airflow_dag():
    """
    This DAG demonstrates the basics of Airflow with just two tasks:
    1. A task that says hello
    2. A task that says goodbye and uses the result from the first task
    """

    @task(task_id="hello_task")
    def say_hello():
        """Simple function that prints a greeting."""
        print("Hello from Airflow!")
        return "Hello"

    @task(task_id="goodbye_task")
    def say_goodbye(first_task_result):
        """
        Function that uses the result from the first task.

        Args:
            first_task_result: The result returned by the first task
        """
        print(f"Previous task said: {first_task_result}")
        print("Goodbye from Airflow!")
        # The goodbye task should return a value so the next task can use it
        return "Goodbye"

    # TODO: Define a new task called finalize_greeting using the @task decorator.
    # - The function should take the output from say_goodbye as input
    # - It should transform this input in some way, creating a reformatted "final message"
    # - It should print this final message and return it
    @task(task_id="finalize_task")
    def finalize_greeting(goodbye_message):
        """
        Function that takes the output from the previous task and finalizes the message.

        Args:
            goodbye_message: The result returned by the previous task
        """
        final_message = f"Final message: '{goodbye_message}' - It was a pleasure!"
        print(final_message)
        return final_message

    # Define the task dependencies
    first_result = say_hello()  # Register the first task (it runs later, when the DAG executes)
    goodbye_result = say_goodbye(first_result)
    # TODO: Pass the goodbye_result to the finalize_greeting task
    finalize_greeting(goodbye_result)

# Create the DAG instance
hello_airflow_dag()
```
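One way to check the logic of a chain like this before involving Airflow is to run the task bodies as plain Python functions. The sketch below copies the bodies of the last two tasks into undecorated functions (the `_logic` suffix is just a naming choice for this example) and composes them directly.

```python
def say_goodbye_logic(first_task_result):
    """Same body as the goodbye task, without the @task decorator."""
    print(f"Previous task said: {first_task_result}")
    print("Goodbye from Airflow!")
    return "Goodbye"


def finalize_greeting_logic(goodbye_message):
    """Same body as the finalize task, without the @task decorator."""
    final_message = f"Final message: '{goodbye_message}' - It was a pleasure!"
    print(final_message)
    return final_message


# Plain function calls run immediately and pass values in memory,
# which makes the data flow easy to verify.
final = finalize_greeting_logic(say_goodbye_logic("Hello"))
```

Inside a DAG, the same chain would pass these values through XCom instead of in memory, but the transformation applied at each step is identical.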

## Measuring Your Workflow Output

## Build a Time Formatting Workflow

## Build a Three Step Greeting Workflow