# Unit 1 Introduction to Apache Airflow and DAGs

# Introduction

Welcome to our first lesson on "Automating Retraining with Apache Airflow"! In this lesson, we'll start our journey into the world of workflow automation with Apache Airflow, a powerful open-source platform that allows us to programmatically author, schedule, and monitor workflows.

As machine learning practitioners, we often need to retrain our models regularly as new data becomes available. This process involves several steps: data extraction, preprocessing, model training, evaluation, and deployment. Manually executing these steps can be time-consuming and error-prone. This is where Apache Airflow comes in—it provides a framework to automate and orchestrate complex computational workflows.

In this course, we focus on **Apache Airflow 2.x** and its modern **TaskFlow API**. The TaskFlow API, introduced in Airflow 2, allows us to define workflows using Python functions and decorators, making DAGs more readable and maintainable compared to the older operator-based approach. All examples and practices in this course will use Airflow 2 and the TaskFlow API.

By the end of this lesson, we'll understand what Apache Airflow is, learn about **Directed Acyclic Graphs (DAGs)**, and implement a simple workflow using Airflow's **TaskFlow** API.

# Understanding Apache Airflow and DAGs

Apache Airflow is a platform created by Airbnb (now an Apache Software Foundation project) to programmatically author, schedule, and monitor workflows. At its core, Airflow uses Directed Acyclic Graphs (DAGs) to represent workflows. But what exactly is a DAG? Let's break it down:

  * **Graph:** A graph is a mathematical structure made up of nodes connected by edges, and in the context of Airflow, it helps us visually and logically organize the sequence and dependencies of tasks in our workflow.
  * **Directed:** The relationships between tasks have a specific direction. Task A may lead to Task B, but not vice versa.
  * **Acyclic:** There are no cycles or loops in the workflow. You can't create circular dependencies where Task A depends on Task B, which depends on Task A.

In Airflow, each task in a workflow is represented as a **node** in the DAG, and the dependencies between tasks are represented as **directed edges**. This allows us to define complex workflows with multiple tasks and their dependencies in a clear, programmatic way.

For example, a simple ML retraining workflow might include these tasks in sequence: extract new data, preprocess data, train model, evaluate model, and deploy model (if evaluation metrics exceed a threshold). Airflow ensures these tasks execute in the correct order, handles scheduling and retries on failure, and provides visibility into the workflow's execution.
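To make the "directed" and "acyclic" properties concrete, here is a minimal sketch that models the retraining sequence above with Python's standard `graphlib` module. It does not use Airflow, and the task names are illustrative labels taken from the example rather than real Airflow task IDs.

```python
from graphlib import CycleError, TopologicalSorter

# Each key is a task; its value is the set of tasks that must finish first.
retraining_pipeline = {
    "extract_data": set(),
    "preprocess_data": {"extract_data"},
    "train_model": {"preprocess_data"},
    "evaluate_model": {"train_model"},
    "deploy_model": {"evaluate_model"},
}

# A valid execution order exists because the graph has no cycles.
execution_order = list(TopologicalSorter(retraining_pipeline).static_order())
print(execution_order)

# Making the first task depend on the last one creates a cycle,
# so no valid execution order exists any more.
cyclic_pipeline = dict(retraining_pipeline, extract_data={"deploy_model"})
try:
    list(TopologicalSorter(cyclic_pipeline).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True
print("cycle detected:", has_cycle)
```

The second half shows why the "acyclic" requirement matters: once a circular dependency exists, there is no task that can safely run first.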

# Creating Your First Airflow DAG

Now that we understand the concept of DAGs, let's create a simple Airflow DAG. We'll start with the basic structure and imports:

```python
from datetime import datetime, timedelta
from airflow.decorators import dag, task

# Define default arguments for the DAG
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}
```

In this code, we're importing the necessary modules from Python's `datetime` library and Airflow's decorators. The `default_args` dictionary defines default settings that Airflow applies to every task in the DAG unless a task overrides them: who owns the task (`owner`), whether a task run must wait for its own previous run to succeed (`depends_on_past`), email notification preferences, and retry configuration. With the values above, for instance, Airflow will automatically retry a failed task once after waiting for 1 minute.
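The retry settings can be illustrated with a small plain-Python sketch. This is not how Airflow implements retries internally; `effective_settings`, `run_with_retries`, and `flaky_task` are hypothetical helpers written only to show how shared defaults can be overridden per task and what "retry once" means.

```python
import time
from datetime import timedelta

default_args = {
    "owner": "airflow",
    "retries": 1,
    "retry_delay": timedelta(minutes=1),
}


def effective_settings(defaults, **task_overrides):
    """Return the shared defaults with any per-task overrides applied on top."""
    return {**defaults, **task_overrides}


def run_with_retries(fn, retries, retry_delay):
    """Call fn, retrying up to `retries` extra times after a failure."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return fn(), attempts
        except Exception:
            if attempts > retries:
                raise
            time.sleep(retry_delay.total_seconds())


# Hypothetical flaky task: fails on the first call, succeeds on the second.
calls = {"count": 0}


def flaky_task():
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("transient failure")
    return "ok"


# Override retry_delay to zero so the demo does not actually wait a minute.
settings = effective_settings(default_args, retry_delay=timedelta(0))
result, attempts = run_with_retries(
    flaky_task, settings["retries"], settings["retry_delay"]
)
print(result, attempts)
```

With `retries` set to 1, the task gets two attempts in total: the original run plus one retry.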

# Defining the DAG with TaskFlow API

With our default arguments in place, we can now define the DAG using Airflow's TaskFlow API, which provides a more intuitive way to define workflows:

```python
@dag(
    dag_id='mlops_pipeline',  # Unique identifier for the DAG
    description='A simple two-task DAG using TaskFlow API',
    default_args=default_args,
    schedule='@daily',  # Run daily
    start_date=datetime(2023, 1, 1),  # Start date
    catchup=False,  # Don't run for past dates
    tags=['intro', 'basic'],
)
def hello_airflow_dag():
    """
    This DAG demonstrates the basics of Airflow with just two tasks:
    1. A task that says hello
    2. A task that says goodbye and uses the result from the first task
    """
```

Here, we use the `@dag` decorator to transform our Python function into an Airflow DAG. The parameters define critical aspects of our workflow's behavior: the `dag_id` provides a unique name, `schedule` sets it to run daily, and `start_date` indicates when scheduling should begin. Note that the `schedule` parameter is available from Airflow 2.4 onward; earlier 2.x releases use `schedule_interval` instead. Setting `catchup=False` tells the scheduler not to create runs for all the intervals missed since `start_date` and to run only the most recent one, which is especially useful when first deploying a DAG with a start date in the past. The docstring documents what our simple workflow will do, which helps anyone maintaining this code.
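The effect of `catchup` is easier to see with concrete dates. The following is a simplified model in plain Python, not Airflow's scheduler logic; `intervals_to_run` is a hypothetical helper, and the "current time" is fixed so the result is reproducible.

```python
from datetime import datetime, timedelta


def intervals_to_run(start_date, now, interval, catchup):
    """List the (start, end) intervals that have fully elapsed by `now`.

    With catchup=True every elapsed interval is returned; with
    catchup=False only the most recent one is kept.
    """
    completed = []
    start = start_date
    while start + interval <= now:
        completed.append((start, start + interval))
        start += interval
    return completed if catchup else completed[-1:]


start_date = datetime(2023, 1, 1)
now = datetime(2023, 1, 4, 12, 0)  # pretend the DAG is deployed at this moment
daily = timedelta(days=1)

backfilled = intervals_to_run(start_date, now, daily, catchup=True)
latest_only = intervals_to_run(start_date, now, daily, catchup=False)
print(len(backfilled))
print(latest_only)
```

In this model, deploying three and a half days after the start date yields three runs with catch-up enabled and a single run without it.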

# Creating Tasks in Your Workflow

Now it's time to define the individual tasks for our DAG using the `@task` decorator:

```python
    @task(task_id="hello_task")
    def say_hello():
        """Simple function that prints a greeting."""
        print("Hello from Airflow!")
        return "Hello"

    @task(task_id="goodbye_task")
    def say_goodbye(first_task_result):
        """
        Function that uses the result from the first task.
        
        Args:
            first_task_result: The result returned by the first task
        """
        print(f"Previous task said: {first_task_result}")
        print("Goodbye from Airflow!")
```

The `@task` decorator transforms regular Python functions into Airflow tasks. Our first task, `say_hello`, simply prints a message and returns the string "Hello". The second task, `say_goodbye`, takes the output from the first task as a parameter, allowing us to demonstrate how data flows between tasks in Airflow. This is one of the **powerful features** of the TaskFlow API—it automatically handles the serialization, storage, and retrieval of data between tasks, making workflow development more intuitive and less boilerplate-heavy.

**Note:** Although it looks like the value is passed directly as a Python variable, Airflow actually passes data between tasks using its XCom (cross-communication) mechanism. The TaskFlow API makes this seamless, but under the hood, the result is serialized and stored by Airflow, not passed in-memory like a normal Python function call.
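As a rough mental model of that mechanism, the sketch below stores task results as serialized text and reads them back. `ToyXComStore` and `metrics_task` are hypothetical names, and JSON is an assumption made for illustration; Airflow's actual XCom backend and serializer may treat types differently. The key `return_value` is the one Airflow uses for a task's returned value.

```python
import json


class ToyXComStore:
    """In-memory stand-in for the place where task results are stored."""

    def __init__(self):
        self._rows = {}

    def push(self, task_id, value, key="return_value"):
        # Values are stored as serialized text, not as live Python objects.
        self._rows[(task_id, key)] = json.dumps(value)

    def pull(self, task_id, key="return_value"):
        # The consumer receives a fresh object rebuilt from the stored text.
        return json.loads(self._rows[(task_id, key)])


store = ToyXComStore()

# The first task's return value is pushed, then pulled by the second task.
store.push("hello_task", "Hello")
received = store.pull("hello_task")

# A round trip through JSON can change types: the tuple comes back as a list.
store.push("metrics_task", {"scores": (0.91, 0.88)})
metrics = store.pull("metrics_task")

print(received)
print(metrics)
```

The practical takeaway is that values passed between tasks should be small and serializable, since they are copied through storage rather than shared in memory.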

# Orchestrating the Workflow

The final step is defining how our tasks should interact. In the TaskFlow API, this happens naturally through function calls:

```python
    # Define the task dependencies
    # With TaskFlow API, dependencies are created by function calls
    first_result = say_hello()  # Register the first task; returns a reference to its future result
    say_goodbye(first_result)   # Pass that reference so the second task runs after the first

# Create the DAG instance
dag = hello_airflow_dag()
```

This is where the **elegant simplicity** of the TaskFlow API shines. With classic operators, you set the order explicitly using the bitshift operators `>>` and `<<` (these remain available in Airflow 2 and can be combined with TaskFlow tasks). With the TaskFlow API, you simply call your task functions and pass data between them, and Airflow infers the correct order and dependencies. Note that calling `say_hello()` inside the DAG function does not run the task or return the string "Hello". It returns a placeholder (an `XComArg`) that stands for the task's future result, which we assign to `first_result`. By passing this placeholder to `say_goodbye()`, we tell Airflow that the second task should wait for the first one to finish and then receive its result.

The final line calls the decorated function, which creates our DAG object, and assigns it to the variable `dag`. The assignment is a common convention rather than a strict requirement: in recent Airflow 2 releases, a DAG created with `@dag` is discovered as long as the function is called at the module level.

When the DAG runs, it executes `say_hello` first, then passes the returned value to `say_goodbye` (via XCom). In the Airflow UI logs, you'd see "Hello from Airflow!" from the first task, followed by "Previous task said: Hello" and "Goodbye from Airflow!" from the second task.
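The idea that calling a task function builds the graph instead of running the function can be sketched with a toy decorator. This is a simplified illustration under stated assumptions, not Airflow's implementation: `ToyDag` and `TaskRef` are hypothetical classes, and `TaskRef` only loosely plays the role of Airflow's placeholder object.

```python
from graphlib import TopologicalSorter


class TaskRef:
    """Placeholder returned when a toy task is called during graph construction."""

    def __init__(self, task_id):
        self.task_id = task_id


class ToyDag:
    def __init__(self):
        self.tasks = {}     # task_id -> (function, call arguments)
        self.upstream = {}  # task_id -> set of upstream task_ids

    def task(self, task_id):
        def decorator(fn):
            def register(*args):
                # Calling the decorated function only records it; fn is not run here.
                self.tasks[task_id] = (fn, args)
                self.upstream[task_id] = {
                    arg.task_id for arg in args if isinstance(arg, TaskRef)
                }
                return TaskRef(task_id)

            return register

        return decorator

    def run(self):
        """Execute the recorded tasks in dependency order."""
        results = {}
        order = list(TopologicalSorter(self.upstream).static_order())
        for task_id in order:
            fn, args = self.tasks[task_id]
            resolved = [
                results[arg.task_id] if isinstance(arg, TaskRef) else arg
                for arg in args
            ]
            results[task_id] = fn(*resolved)
        return order, results


toy = ToyDag()


@toy.task("hello_task")
def say_hello():
    return "Hello"


@toy.task("goodbye_task")
def say_goodbye(first_task_result):
    return f"Previous task said: {first_task_result}"


first_result = say_hello()  # a TaskRef, not the string "Hello"
say_goodbye(first_result)   # records the dependency on hello_task

order, results = toy.run()
print(order)
print(results["goodbye_task"])
```

Passing `first_result` into `say_goodbye` is what creates the edge between the two tasks; the real values only appear later, when the recorded functions are executed in order.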

# Understanding the DAG Run Output

When your DAG runs, Airflow generates detailed logs that provide insight into the execution process. While these logs can be quite verbose, they are invaluable for debugging and monitoring. Let's look at a simplified version of the output you might see for our `hello_airflow_dag`:

```text
[2025-05-07T14:46:45.518+0000] {dag.py:4435} INFO - dagrun id: mlops_pipeline
[2025-05-07T14:46:45.552+0000] {dag.py:4396} INFO - [DAG TEST] starting task_id=hello_task map_index=-1
Hello from Airflow!
[2025-05-07 14:46:45,606] {python.py:240} INFO - Done. Returned value was: Hello
[2025-05-07T14:46:45.619+0000] {taskinstance.py:353} INFO - Marking task as SUCCESS. dag_id=mlops_pipeline, task_id=hello_task, ...
[2025-05-07T14:46:45.645+0000] {dag.py:4396} INFO - [DAG TEST] starting task_id=goodbye_task map_index=-1
Previous task said: Hello
Goodbye from Airflow!
[2025-05-07 14:46:45,670] {python.py:240} INFO - Done. Returned value was: None
[2025-05-07T14:46:45.673+0000] {taskinstance.py:353} INFO - Marking task as SUCCESS. dag_id=mlops_pipeline, task_id=goodbye_task, ...
[2025-05-07T14:46:45.686+0000] {dagrun.py:854} INFO - Marking run <DagRun mlops_pipeline ...> successful
```

Let's break down what these key lines tell us:

  * `INFO - dagrun id: mlops_pipeline`: This indicates the start of a new DAG run for our `mlops_pipeline` DAG.
  * The next set of lines shows the execution of `hello_task`:
      * `INFO - [DAG TEST] starting task_id=hello_task ...`: Airflow begins executing the `hello_task`.
      * `Hello from Airflow!`: This is the `print()` output from our `say_hello` function.
      * `INFO - Done. Returned value was: Hello`: The task completed and returned "Hello", which Airflow passes via XComs.
      * `INFO - Marking task as SUCCESS. ... task_id=hello_task, ...`: The `hello_task` finished successfully.
  * Following that, we see the execution of `goodbye_task`:
      * `INFO - [DAG TEST] starting task_id=goodbye_task ...`: Airflow starts the `goodbye_task`.
      * `Previous task said: Hello` and `Goodbye from Airflow!`: These are the `print()` outputs from `say_goodbye`, confirming it received the "Hello" string from the first task.
      * `INFO - Done. Returned value was: None`: The `say_goodbye` task completed (returning `None` as it has no explicit return).
      * `INFO - Marking task as SUCCESS. ... task_id=goodbye_task, ...`: The `goodbye_task` also finished successfully.
  * `INFO - Marking run <DagRun mlops_pipeline ...> successful`: Finally, this line confirms that the entire DAG run completed successfully.

This output confirms that our tasks executed in the correct order, data was passed between them as expected, and the overall workflow was successful. In the Airflow UI, you would see a graphical representation of this execution, with green boxes indicating successfully completed tasks.
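When logs get long, it can help to pull out only the status lines. The snippet below is a small sketch that scans a few of the log lines shown above with a regular expression; the pattern is an assumption based on this sample output, and the exact log format may differ between Airflow versions.

```python
import re

# A few lines copied from the sample output above.
log_lines = [
    "[2025-05-07T14:46:45.552+0000] {dag.py:4396} INFO - [DAG TEST] starting task_id=hello_task map_index=-1",
    "Hello from Airflow!",
    "[2025-05-07T14:46:45.619+0000] {taskinstance.py:353} INFO - Marking task as SUCCESS. dag_id=mlops_pipeline, task_id=hello_task, ...",
    "[2025-05-07T14:46:45.673+0000] {taskinstance.py:353} INFO - Marking task as SUCCESS. dag_id=mlops_pipeline, task_id=goodbye_task, ...",
]

# Capture the final state, DAG id, and task id from "Marking task as ..." lines.
status_pattern = re.compile(r"Marking task as (\w+)\. dag_id=(\w+), task_id=(\w+)")

task_states = {}
for line in log_lines:
    match = status_pattern.search(line)
    if match:
        state, dag_id, task_id = match.groups()
        task_states[task_id] = state

print(task_states)
```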

# Where to Place Your DAG Code

For Airflow to discover and execute your DAGs, your Python files must be placed in a specific directory known as the **DAGs folder**. By default, this is the `dags/` directory inside your Airflow home directory (`$AIRFLOW_HOME/dags/`), but it can be configured differently in your Airflow settings.

When Airflow runs, it periodically scans the DAGs folder for Python files. Any file that defines a DAG at the module level (for example, by calling a `@dag`-decorated function) will be automatically detected and made available to the Airflow scheduler. This means you don't need to manually register your DAGs: just save your `.py` file in the correct folder, and Airflow will handle the rest.
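As a rough picture of that discovery step, the sketch below scans a temporary folder and keeps only the `.py` files that mention both "airflow" and "dag", which mirrors the quick pre-filter Airflow applies by default before importing a file. The helper functions and file names are hypothetical, and real discovery involves more than this check (the file must also be imported and produce a DAG object).

```python
import tempfile
from pathlib import Path


def looks_like_dag_file(path):
    """Heuristic: the file mentions both 'airflow' and 'dag' (case-insensitive)."""
    content = path.read_text(encoding="utf-8").lower()
    return "airflow" in content and "dag" in content


def candidate_dag_files(dags_folder):
    """Return the names of .py files under dags_folder that pass the heuristic."""
    return sorted(
        path.name
        for path in Path(dags_folder).rglob("*.py")
        if looks_like_dag_file(path)
    )


with tempfile.TemporaryDirectory() as dags_folder:
    folder = Path(dags_folder)
    # A file that imports Airflow's decorators: a likely DAG definition.
    (folder / "hello_pipeline.py").write_text(
        "from airflow.decorators import dag, task\n", encoding="utf-8"
    )
    # A plain helper module: skipped by the heuristic.
    (folder / "helpers.py").write_text(
        "def add(a, b):\n    return a + b\n", encoding="utf-8"
    )
    # Not a Python file: never considered.
    (folder / "notes.txt").write_text("airflow dag notes\n", encoding="utf-8")

    found = candidate_dag_files(folder)

print(found)
```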

**Best practices** for organizing your DAG code:

  * Place each DAG in its own Python file for clarity and maintainability.
  * If you have shared code (such as utility functions or custom operators), consider placing them in a separate `utils/` or `plugins/` directory and importing them into your DAG files.

Please note that, in this course, the CodeSignal environment is pre-configured so that any DAG code you write is automatically placed in the correct folder. You don't need to worry about file placement or Airflow configuration—just focus on writing your DAGs, and they'll be picked up and executed by Airflow behind the scenes.

# Conclusion and Next Steps

In this lesson, we've taken our first steps with Apache Airflow by creating a simple DAG with two tasks. We've learned about the core concepts of Airflow: defining workflows as DAGs, creating tasks with the `@task` decorator, and establishing dependencies between them. The TaskFlow API has made this process intuitive by letting us express workflows as regular Python functions while handling the complexities of data passing and dependency management behind the scenes.

While we've built a simple example, these fundamentals form the building blocks for creating complex, production-grade ML retraining pipelines. As you practice these concepts, experiment with adding more tasks, passing different types of data between them, and visualizing the resulting workflow graphs. In the upcoming lessons, we'll expand on these basics to build more sophisticated workflows that handle real machine learning tasks from data processing to model deployment.

## Transform a Function into a DAG

You’ve just learned how Apache Airflow 2.x uses the TaskFlow API to define workflows as Python functions and how tasks are organized in a DAG. Now, let’s put that knowledge into practice by turning a regular Python function into a real Airflow DAG.

Your task is to add the missing @dag decorator above the hello_airflow_dag function. This decorator tells Airflow to treat the function as a workflow. Make sure to include all the required parameters:

  * `dag_id`: set this to `'mlops_pipeline'`
  * `description`: provide a short explanation of the DAG
  * `default_args`: use the `default_args` dictionary already defined in the code
  * `schedule`: set to `'@daily'`
  * `start_date`: set to some date using `datetime(YYYY, MM, DD)`
  * `catchup`: set to `False`
  * `tags`: include two tags in a list

Once you add the decorator with the correct parameters, your function will become a proper Airflow DAG. This is a key step in automating workflows with Airflow, so take your time and ensure each parameter is included.

```python
"""
Basic Introduction to Airflow - Simple DAG

This module defines a minimal Airflow DAG with only two tasks to demonstrate
the core concepts of Directed Acyclic Graphs (DAGs) in Airflow using the TaskFlow API.
"""

from datetime import datetime, timedelta
from airflow.decorators import dag, task

# Define default arguments for the DAG
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

# TODO: Add the @dag decorator above the function below.
# The decorator should include these parameters:
# - dag_id: set to 'mlops_pipeline'
# - description: a short description of the DAG
# - default_args: use the default_args dictionary above
# - schedule: set to '@daily'
# - start_date: set to datetime(2023, 1, 1)
# - catchup: set to False
# - tags: include 'intro' and 'basic' in a list
def hello_airflow_dag():
    """
    This DAG demonstrates the basics of Airflow with just two tasks:
    1. A task that says hello
    2. A task that says goodbye and uses the result from the first task
    """
    
    # Define the first task using the @task decorator
    @task(task_id="hello_task")
    def say_hello():
        """Simple function that prints a greeting."""
        print("Hello from Airflow!")
        return "Hello"
    
    # Define the second task using the @task decorator
    @task(task_id="goodbye_task")
    def say_goodbye(first_task_result):
        """
        Function that uses the result from the first task.
        
        Args:
            first_task_result: The result returned by the first task
        """
        print(f"Previous task said: {first_task_result}")
        print("Goodbye from Airflow!")
    
    # Define the task dependencies
    # With TaskFlow API, dependencies are created by function calls
    first_result = say_hello()  # Execute the first task
    say_goodbye(first_result)   # Pass the result to the second task

# Create the DAG instance
hello_airflow_dag()

```

```python
"""
Basic Introduction to Airflow - Simple DAG

This module defines a minimal Airflow DAG with only two tasks to demonstrate
the core concepts of Directed Acyclic Graphs (DAGs) in Airflow using the TaskFlow API.
"""

from datetime import datetime, timedelta
from airflow.decorators import dag, task

# Define default arguments for the DAG
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

# TODO: Add the @dag decorator above the function below.
# The decorator should include these parameters:
# - dag_id: set to 'mlops_pipeline'
# - description: a short description of the DAG
# - default_args: use the default_args dictionary above
# - schedule: set to '@daily'
# - start_date: set to datetime(2023, 1, 1)
# - catchup: set to False
# - tags: include 'intro' and 'basic' in a list
@dag(
    dag_id='mlops_pipeline',
    description='A simple two-task DAG using TaskFlow API',
    default_args=default_args,
    schedule='@daily',
    start_date=datetime(2023, 1, 1),
    catchup=False,
    tags=['intro', 'basic'],
)
def hello_airflow_dag():
    """
    This DAG demonstrates the basics of Airflow with just two tasks:
    1. A task that says hello
    2. A task that says goodbye and uses the result from the first task
    """
    
    # Define the first task using the @task decorator
    @task(task_id="hello_task")
    def say_hello():
        """Simple function that prints a greeting."""
        print("Hello from Airflow!")
        return "Hello"
    
    # Define the second task using the @task decorator
    @task(task_id="goodbye_task")
    def say_goodbye(first_task_result):
        """
        Function that uses the result from the first task.
        
        Args:
            first_task_result: The result returned by the first task
        """
        print(f"Previous task said: {first_task_result}")
        print("Goodbye from Airflow!")
    
    # Define the task dependencies
    # With TaskFlow API, dependencies are created by function calls
    first_result = say_hello()  # Execute the first task
    say_goodbye(first_result)   # Pass the result to the second task

# Create the DAG instance
hello_airflow_dag()
```

## Turn Functions into Airflow Tasks

You’ve just seen how Airflow uses the TaskFlow API to organize workflows as Python functions inside a DAG. Now, let’s focus on how individual steps in your workflow become Airflow tasks.

In the code below, the two functions that should act as tasks are missing their @task decorators. Your job is to add these decorators so Airflow knows to treat them as tasks within the DAG.

Here’s what you need to do:

  * Add the `@task` decorator above the `say_hello` function, and set its `task_id` to `"hello_task"`.
  * Add the `@task` decorator above the `say_goodbye` function, and set its `task_id` to `"goodbye_task"`.
  * Define the task dependencies in the body of the `hello_airflow_dag` function.

The `@task` decorator is what turns a regular Python function into a task that Airflow can schedule and monitor. Make sure you place the decorator directly above each function definition.

Once you’ve added the decorators and defined the task dependencies, your DAG will be ready to run both tasks as part of the workflow. This is a key step in building automated pipelines with Airflow!

```python
"""
Basic Introduction to Airflow - Simple DAG

This module defines a minimal Airflow DAG with only two tasks to demonstrate
the core concepts of Directed Acyclic Graphs (DAGs) in Airflow using the TaskFlow API.
"""

from datetime import datetime, timedelta
from airflow.decorators import dag, task

# Define default arguments for the DAG
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

# Define the DAG using the TaskFlow API
@dag(
    dag_id='mlops_pipeline',  # Unique identifier for the DAG
    description='A simple two-task DAG using TaskFlow API',
    default_args=default_args,
    schedule='@daily',  # Run daily
    start_date=datetime(2023, 1, 1),  # Start date
    catchup=False,  # Don't run for past dates
    tags=['intro', 'basic'],
)
def hello_airflow_dag():
    """
    This DAG demonstrates the basics of Airflow with just two tasks:
    1. A task that says hello
    2. A task that says goodbye and uses the result from the first task
    """

    # TODO: Add the @task decorator above this function.
    # This decorator turns the function into an Airflow task.
    def say_hello():
        """Simple function that prints a greeting."""
        print("Hello from Airflow!")
        return "Hello"

    # TODO: Add the @task decorator above this function.
    # This decorator turns the function into an Airflow task.
    def say_goodbye(first_task_result):
        """
        Function that uses the result from the first task.

        Args:
            first_task_result: The result returned by the first task
        """
        print(f"Previous task said: {first_task_result}")
        print("Goodbye from Airflow!")

    # TODO: Define the task dependencies using function calls.
    # With TaskFlow API, dependencies are created by calling the task functions
    # and passing the result from one to the next.


# Create the DAG instance
hello_airflow_dag()
```

```python
"""
Basic Introduction to Airflow - Simple DAG

This module defines a minimal Airflow DAG with only two tasks to demonstrate
the core concepts of Directed Acyclic Graphs (DAGs) in Airflow using the TaskFlow API.
"""

from datetime import datetime, timedelta
from airflow.decorators import dag, task

# Define default arguments for the DAG
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

# Define the DAG using the TaskFlow API
@dag(
    dag_id='mlops_pipeline',  # Unique identifier for the DAG
    description='A simple two-task DAG using TaskFlow API',
    default_args=default_args,
    schedule='@daily',  # Run daily
    start_date=datetime(2023, 1, 1),  # Start date
    catchup=False,  # Don't run for past dates
    tags=['intro', 'basic'],
)
def hello_airflow_dag():
    """
    This DAG demonstrates the basics of Airflow with just two tasks:
    1. A task that says hello
    2. A task that says goodbye and uses the result from the first task
    """

    # TODO: Add the @task decorator above this function.
    # This decorator turns the function into an Airflow task.
    @task(task_id="hello_task")
    def say_hello():
        """Simple function that prints a greeting."""
        print("Hello from Airflow!")
        return "Hello"

    # TODO: Add the @task decorator above this function.
    # This decorator turns the function into an Airflow task.
    @task(task_id="goodbye_task")
    def say_goodbye(first_task_result):
        """
        Function that uses the result from the first task.

        Args:
            first_task_result: The result returned by the first task
        """
        print(f"Previous task said: {first_task_result}")
        print("Goodbye from Airflow!")

    # TODO: Define the task dependencies using function calls.
    # With TaskFlow API, dependencies are created by calling the task functions
    # and passing the result from one to the next.
    first_result = say_hello()
    say_goodbye(first_result)


# Create the DAG instance
hello_airflow_dag()
```

## Controlling Workflow Timing in Airflow

You’ve just practiced turning Python functions into Airflow tasks using the TaskFlow API. Now, let’s focus on how to control when your workflow runs by adjusting the DAG’s schedule.

In the code below, the DAG is set to run once per day using the @daily schedule. Your task is to change this so the DAG runs every hour instead. This is done by updating the schedule parameter in the @dag decorator.

Understanding how to set the schedule is an important part of building reliable workflows in Airflow.
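For orientation, the two presets used in this exercise correspond to standard cron expressions. The sketch below lists those equivalents and computes the next hourly and daily boundary after a given moment in plain Python; `next_boundary` is a hypothetical helper for illustration, not an Airflow function.

```python
from datetime import datetime, timedelta

# Cron expressions equivalent to the two schedule presets.
CRON_EQUIVALENTS = {
    "@hourly": "0 * * * *",  # at minute 0 of every hour
    "@daily": "0 0 * * *",   # at midnight every day
}


def next_boundary(moment, preset):
    """Return the first hourly or daily boundary strictly after `moment`."""
    if preset == "@hourly":
        return moment.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    if preset == "@daily":
        return moment.replace(
            hour=0, minute=0, second=0, microsecond=0
        ) + timedelta(days=1)
    raise ValueError(f"unsupported preset: {preset}")


moment = datetime(2023, 1, 1, 14, 46)
next_hourly = next_boundary(moment, "@hourly")
next_daily = next_boundary(moment, "@daily")
print(next_hourly)
print(next_daily)
```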

```python
"""
Basic Introduction to Airflow - Simple DAG

This module defines a minimal Airflow DAG with only two tasks to demonstrate
the core concepts of Directed Acyclic Graphs (DAGs) in Airflow using the TaskFlow API.
"""

from datetime import datetime, timedelta
from airflow.decorators import dag, task

# Define default arguments for the DAG
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

# Define the DAG using the TaskFlow API
@dag(
    dag_id='mlops_pipeline',  # Unique identifier for the DAG
    description='A simple two-task DAG using TaskFlow API',
    default_args=default_args,
    # TODO: Change the schedule parameter below from '@daily' to '@hourly'
    # This will make the DAG run every hour instead of once per day.
    schedule='@daily',  # Run daily
    start_date=datetime(2023, 1, 1),  # Start date
    catchup=False,  # Don't run for past dates
    tags=['intro', 'basic'],
)
def hello_airflow_dag():
    """
    This DAG demonstrates the basics of Airflow with just two tasks:
    1. A task that says hello
    2. A task that says goodbye and uses the result from the first task
    """
    
    # Define the first task using the @task decorator
    @task(task_id="hello_task")
    def say_hello():
        """Simple function that prints a greeting."""
        print("Hello from Airflow!")
        return "Hello"
    
    # Define the second task using the @task decorator
    @task(task_id="goodbye_task")
    def say_goodbye(first_task_result):
        """
        Function that uses the result from the first task.
        
        Args:
            first_task_result: The result returned by the first task
        """
        print(f"Previous task said: {first_task_result}")
        print("Goodbye from Airflow!")
    
    # Define the task dependencies
    # With TaskFlow API, dependencies are created by function calls
    first_result = say_hello()  # Execute the first task
    say_goodbye(first_result)   # Pass the result to the second task

# Create the DAG instance
hello_airflow_dag()
```

```python
"""
Basic Introduction to Airflow - Simple DAG

This module defines a minimal Airflow DAG with only two tasks to demonstrate
the core concepts of Directed Acyclic Graphs (DAGs) in Airflow using the TaskFlow API.
"""

from datetime import datetime, timedelta
from airflow.decorators import dag, task

# Define default arguments for the DAG
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

# Define the DAG using the TaskFlow API
@dag(
    dag_id='mlops_pipeline',  # Unique identifier for the DAG
    description='A simple two-task DAG using TaskFlow API',
    default_args=default_args,
    # TODO: Change the schedule parameter below from '@daily' to '@hourly'
    # This will make the DAG run every hour instead of once per day.
    schedule='@hourly',  # Run hourly
    start_date=datetime(2023, 1, 1),  # Start date
    catchup=False,  # Don't run for past dates
    tags=['intro', 'basic'],
)
def hello_airflow_dag():
    """
    This DAG demonstrates the basics of Airflow with just two tasks:
    1. A task that says hello
    2. A task that says goodbye and uses the result from the first task
    """
    
    # Define the first task using the @task decorator
    @task(task_id="hello_task")
    def say_hello():
        """Simple function that prints a greeting."""
        print("Hello from Airflow!")
        return "Hello"
    
    # Define the second task using the @task decorator
    @task(task_id="goodbye_task")
    def say_goodbye(first_task_result):
        """
        Function that uses the result from the first task.
        
        Args:
            first_task_result: The result returned by the first task
        """
        print(f"Previous task said: {first_task_result}")
        print("Goodbye from Airflow!")
    
    # Define the task dependencies
    # With TaskFlow API, dependencies are created by function calls
    first_result = say_hello()  # Execute the first task
    say_goodbye(first_result)   # Pass the result to the second task

# Create the DAG instance
hello_airflow_dag()
```

## Adding a Third Task to Your DAG

You’ve just practiced building a simple Airflow DAG with two connected tasks and learned how to pass data from one task to another. Now, let’s make your workflow a bit more interesting by adding a third step.

Your goal is to introduce a new task called finalize_greeting into the DAG. This task should take the output from say_goodbye, perform a transformation on it (for example, add extra text or reformat the message), and return the new result. Here’s what you need to do:

  * Define a new function called `finalize_greeting` and decorate it with `@task(task_id="finalize_task")`.
  * Ensure this function takes the output from `say_goodbye` as its input.
  * Inside the function, transform the input in some way (for example, add a friendly remark or change the format), print the final message, and return it.
  * Update the task dependencies so that the output from `say_goodbye` is passed to `finalize_greeting`.

This will help you see how to chain multiple tasks together and build more flexible workflows in Airflow.

```python
"""
Basic Introduction to Airflow - Simple DAG

This module defines a minimal Airflow DAG with two tasks to demonstrate
the core concepts of Directed Acyclic Graphs (DAGs) in Airflow using the TaskFlow API.
"""

from datetime import datetime, timedelta
from airflow.decorators import dag, task

# Define default arguments for the DAG
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

# Define the DAG using the TaskFlow API
@dag(
    dag_id='mlops_pipeline',  # Unique identifier for the DAG
    description='A simple two-task DAG using TaskFlow API',
    default_args=default_args,
    schedule='@daily',  # Run daily
    start_date=datetime(2023, 1, 1),  # Start date
    catchup=False,  # Don't run for past dates
    tags=['intro', 'basic'],
)
def hello_airflow_dag():
    """
    This DAG demonstrates the basics of Airflow with just two tasks:
    1. A task that says hello
    2. A task that says goodbye and uses the result from the first task
    """

    @task(task_id="hello_task")
    def say_hello():
        """Simple function that prints a greeting."""
        print("Hello from Airflow!")
        return "Hello"

    @task(task_id="goodbye_task")
    def say_goodbye(first_task_result):
        """
        Function that uses the result from the first task.

        Args:
            first_task_result: The result returned by the first task
        """
        print(f"Previous task said: {first_task_result}")
        print("Goodbye from Airflow!")
        # The goodbye task should return a value so the next task can use it
        return "Goodbye"

    # TODO: Define a new task called finalize_greeting using the @task decorator.
    # - The function should take the output from say_goodbye as input
    # - It should transform this input in some way, creating a reformatted "final message"
    # - It should print this final message and return it

    # Define the task dependencies
    first_result = say_hello()  # Add the first task; returns a reference to its output
    goodbye_result = say_goodbye(first_result)
    # TODO: Pass the goodbye_result to the finalize_greeting task
    # finalize_greeting(goodbye_result)

# Create the DAG instance
hello_airflow_dag()

```

```python
"""
Basic Introduction to Airflow - Simple DAG

This module defines a minimal Airflow DAG with three tasks to demonstrate
the core concepts of Directed Acyclic Graphs (DAGs) in Airflow using the TaskFlow API.
"""

from datetime import datetime, timedelta
from airflow.decorators import dag, task

# Define default arguments for the DAG
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

# Define the DAG using the TaskFlow API
@dag(
    dag_id='mlops_pipeline',  # Unique identifier for the DAG
    description='A simple three-task DAG using TaskFlow API',
    default_args=default_args,
    schedule='@daily',  # Run daily
    start_date=datetime(2023, 1, 1),  # Start date
    catchup=False,  # Don't run for past dates
    tags=['intro', 'basic'],
)
def hello_airflow_dag():
    """
    This DAG demonstrates the basics of Airflow with three tasks:
    1. A task that says hello
    2. A task that says goodbye and uses the result from the first task
    3. A task that finalizes the greeting using the result from the second task
    """

    @task(task_id="hello_task")
    def say_hello():
        """Simple function that prints a greeting."""
        print("Hello from Airflow!")
        return "Hello"

    @task(task_id="goodbye_task")
    def say_goodbye(first_task_result):
        """
        Function that uses the result from the first task.

        Args:
            first_task_result: The result returned by the first task
        """
        print(f"Previous task said: {first_task_result}")
        print("Goodbye from Airflow!")
        # The goodbye task should return a value so the next task can use it
        return "Goodbye"

    # Third task: takes the output of say_goodbye, reformats it into a
    # "final message", prints that message, and returns it.
    @task(task_id="finalize_task")
    def finalize_greeting(goodbye_message):
        """
        Function that takes the output from the previous task and finalizes the message.

        Args:
            goodbye_message: The result returned by the previous task
        """
        final_message = f"Final message: '{goodbye_message}' - It was a pleasure!"
        print(final_message)
        return final_message

    # Define the task dependencies
    first_result = say_hello()  # Add the first task; returns a reference to its output
    goodbye_result = say_goodbye(first_result)
    # Pass goodbye_result to finalize_greeting so it runs after goodbye_task
    finalize_greeting(goodbye_result)

# Create the DAG instance
hello_airflow_dag()
```
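
Passing `goodbye_result` into `finalize_greeting` is what creates the dependency between the two tasks, so the three tasks form a simple chain. As a rough illustration of the resulting run order (not a description of how Airflow's scheduler is implemented), the same dependencies can be written as a plain dictionary and sorted with Python's standard `graphlib` module. The `upstream` dictionary below is a hypothetical stand-in for the DAG and is not part of the Airflow API.

```python
from graphlib import TopologicalSorter

# Hypothetical stand-in for the DAG: each task_id maps to the set of
# task_ids whose output it consumes (its upstream tasks).
upstream = {
    "hello_task": set(),
    "goodbye_task": {"hello_task"},
    "finalize_task": {"goodbye_task"},
}

# A topological sort lists every task after all of its upstream tasks,
# which is the property a valid DAG run order must satisfy.
run_order = list(TopologicalSorter(upstream).static_order())
print(run_order)  # ['hello_task', 'goodbye_task', 'finalize_task']
```

Because the graph is a single chain, there is only one valid order. Adding an edge from `finalize_task` back to `hello_task` would create a cycle, and `graphlib` would raise a `CycleError`, which mirrors the "acyclic" requirement of a DAG.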

## Measuring Your Workflow Output

You’ve just expanded your Airflow DAG to include three connected tasks, building a simple workflow that passes data from one step to the next. Now, let’s make your pipeline even more useful by adding a task that measures the length of your final greeting message.

Your goal is to add a new task called `message_length` to the DAG. This task should run after `finalize_greeting` and calculate how many characters are in the finalized greeting message.

Here’s what you need to do:

  * Define a new function named `message_length` and decorate it with `@task(task_id="length_task")`.
  * Ensure this function takes the finalized greeting message as its input.
  * Inside the function, calculate the length of the message using Python’s `len()` function, print the result, and return the length as an integer.
  * Update the task dependencies so that the output from `finalize_greeting` is passed to your new `message_length` task.

This exercise will help you practice adding new steps to your workflow and working with task outputs in Airflow.

```python
"""
Basic Introduction to Airflow - Simple DAG

This module defines a minimal Airflow DAG with three tasks to demonstrate
the core concepts of Directed Acyclic Graphs (DAGs) in Airflow using the TaskFlow API.
"""

from datetime import datetime, timedelta
from airflow.decorators import dag, task

# Define default arguments for the DAG
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

# Define the DAG using the TaskFlow API
@dag(
    dag_id='mlops_pipeline',  # Unique identifier for the DAG
    description='A simple three-task DAG using TaskFlow API',
    default_args=default_args,
    schedule='@daily',  # Run daily
    start_date=datetime(2023, 1, 1),  # Start date
    catchup=False,  # Don't run for past dates
    tags=['intro', 'basic'],
)
def hello_airflow_dag():
    """
    This DAG demonstrates the basics of Airflow with three tasks:
    1. A task that says hello
    2. A task that says goodbye and uses the result from the first task
    3. A task that finalizes the greeting by transforming the output from the second task
    """

    @task(task_id="hello_task")
    def say_hello():
        """Simple function that prints a greeting."""
        print("Hello from Airflow!")
        return "Hello"

    @task(task_id="goodbye_task")
    def say_goodbye(first_task_result):
        """
        Function that uses the result from the first task.

        Args:
            first_task_result: The result returned by the first task
        """
        print(f"Previous task said: {first_task_result}")
        print("Goodbye from Airflow!")
        return "Goodbye"

    @task(task_id="finalize_task")
    def finalize_greeting(goodbye_result):
        """
        Finalizes the greeting by transforming the output from say_goodbye.

        Args:
            goodbye_result: The result returned by the say_goodbye task
        """
        final_message = f"{goodbye_result}! Have a great day :)"
        print(f"Final message: {final_message}")
        return final_message

    # TODO: Define a new task called message_length using the @task decorator.
    # - The function should take the finalized greeting message as input (e.g., final_message)
    # - It should calculate the length of the message, print it, and return the length as an integer.

    # Define the task dependencies
    first_result = say_hello()  # Add the first task; returns a reference to its output
    goodbye_result = say_goodbye(first_result)   # Pass the result to the second task
    final_message = finalize_greeting(goodbye_result)  # Pass the result to the third task

    # TODO: Pass the final_message to the message_length task and capture the result if you want
    

# Create the DAG instance
hello_airflow_dag()

```

Here is the completed code with the new `message_length` task and the updated dependencies.

The `message_length` function is decorated with `@task`, takes the output of `finalize_greeting` as its input, and is connected to the end of the workflow.

```python
"""
Basic Introduction to Airflow - Simple DAG

This module defines a minimal Airflow DAG with four tasks to demonstrate
the core concepts of Directed Acyclic Graphs (DAGs) in Airflow using the TaskFlow API.
"""

from datetime import datetime, timedelta
from airflow.decorators import dag, task

# Define default arguments for the DAG
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

# Define the DAG using the TaskFlow API
@dag(
    dag_id='mlops_pipeline',  # Unique identifier for the DAG
    description='A simple four-task DAG using TaskFlow API',
    default_args=default_args,
    schedule='@daily',  # Run daily
    start_date=datetime(2023, 1, 1),  # Start date
    catchup=False,  # Don't run for past dates
    tags=['intro', 'basic'],
)
def hello_airflow_dag():
    """
    This DAG demonstrates the basics of Airflow with four tasks:
    1. A task that says hello
    2. A task that says goodbye and uses the result from the first task
    3. A task that finalizes the greeting by transforming the output from the second task
    4. A task that measures the length of the finalized greeting
    """

    @task(task_id="hello_task")
    def say_hello():
        """Simple function that prints a greeting."""
        print("Hello from Airflow!")
        return "Hello"

    @task(task_id="goodbye_task")
    def say_goodbye(first_task_result):
        """
        Function that uses the result from the first task.

        Args:
            first_task_result: The result returned by the first task
        """
        print(f"Previous task said: {first_task_result}")
        print("Goodbye from Airflow!")
        return "Goodbye"

    @task(task_id="finalize_task")
    def finalize_greeting(goodbye_result):
        """
        Finalizes the greeting by transforming the output from say_goodbye.

        Args:
            goodbye_result: The result returned by the say_goodbye task
        """
        final_message = f"{goodbye_result}! Have a great day :)"
        print(f"Final message: {final_message}")
        return final_message

    # Define a new task called message_length using the @task decorator.
    # The function takes the finalized greeting message as input (e.g., final_message)
    # It calculates the length of the message, prints it, and returns the length as an integer.
    @task(task_id="length_task")
    def message_length(final_message):
        """Calculates and returns the length of a message."""
        length = len(final_message)
        print(f"The final message is {length} characters long.")
        return length

    # Define the task dependencies
    first_result = say_hello()  # Add the first task; returns a reference to its output
    goodbye_result = say_goodbye(first_result)   # Pass the result to the second task
    final_message = finalize_greeting(goodbye_result)  # Pass the result to the third task

    # Pass the final_message to the message_length task
    message_length(final_message)

# Create the DAG instance
hello_airflow_dag()

```
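
Before deploying a DAG like this, it can help to check the task logic on its own. The sketch below assumes undecorated copies of the four functions (the same return values as the tasks above, with the `print` calls omitted) and runs them as ordinary Python, so no Airflow installation is needed. It only checks the data transformations, not scheduling, retries, or XCom behavior.

```python
# Undecorated copies of the task functions, for a quick local check.
def say_hello():
    return "Hello"


def say_goodbye(first_task_result):
    # The original task only prints first_task_result, so it is unused here.
    return "Goodbye"


def finalize_greeting(goodbye_result):
    return f"{goodbye_result}! Have a great day :)"


def message_length(final_message):
    return len(final_message)


# Chain the calls in the same order as the DAG's dependency chain.
final_message = finalize_greeting(say_goodbye(say_hello()))
length = message_length(final_message)

print(final_message)  # Goodbye! Have a great day :)
print(length)         # 28
```

Keeping the logic in small functions that take and return plain values makes this kind of check straightforward, since each task can be exercised without a running scheduler.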

## Build a Time Formatting Workflow

You’ve just practiced building Airflow DAGs by adding and connecting tasks using the TaskFlow API. Now, let’s take it a step further by creating a brand new DAG from scratch.

Your goal is to write a complete Airflow DAG that gets the current time and formats it into a readable string. This will help you put together everything you’ve learned so far.

Here’s what you need to do:

  * Start by defining your DAG function.
  * Inside your DAG function, create two tasks using the `@task` decorator:
      * The first task, `get_current_time`, should return the current date and time.
      * The second task, `format_time`, should take the date and time from the first task and format it as a string like "January 1, 2023, 12:00 AM".
  * Make sure to set up the task dependency chain so that the output of `get_current_time` is passed to `format_time`.
  * At the end, create the DAG instance by calling your DAG function.

This exercise will help you see how all the pieces fit together when building a workflow from the ground up.


```python
"""
Time Formatter DAG

This module will define an Airflow DAG using the TaskFlow API that gets the current datetime
and formats it into a readable string. The workflow should have two tasks:
1. get_current_time: Returns the current datetime.
2. format_time: Formats the datetime into a string like "January 1, 2023, 12:00 AM".
"""

from datetime import datetime, timedelta
from airflow.decorators import dag, task

# Define default arguments for the DAG
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

@dag(
    dag_id='time_formatter',
    description='A DAG that gets the current time and formats it as a readable string.',
    default_args=default_args,
    schedule='@daily',
    start_date=datetime(2023, 1, 1),
    catchup=False,
    tags=['time', 'format'],
)
# TODO: Define the DAG function (e.g., time_formatter_dag).

# Inside this function, define two tasks using the @task decorator:

    # TODO: Define get_current_time task
    # - The function should return the current datetime (use datetime.now())

    # TODO: Define format_time task
    # - The function should take a datetime object as input
    # - It should format the datetime as a string like "January 1, 2023, 12:00 AM"
    #   (Hint: use strftime with "%B %-d, %Y, %I:%M %p"; note that %-d is a
    #   platform-specific flag that works on Linux and macOS but not on Windows)
    # - The function should print and return the formatted string

    # TODO: Set up the task dependency chain:
    # - Call get_current_time() and pass its result to format_time()

# TODO: At the end of the file, create the DAG instance by calling your DAG function.
```

Great work! Building a new DAG from scratch is a fantastic way to solidify your understanding of Airflow's core concepts. Here is the completed code that defines a new DAG with two connected tasks to get and format the current time.

```python
"""
Time Formatter DAG

This module defines an Airflow DAG using the TaskFlow API that gets the current datetime
and formats it into a readable string. The workflow has two tasks:
1. get_current_time: Returns the current datetime.
2. format_time: Formats the datetime into a string like "January 1, 2023, 12:00 AM".
"""

from datetime import datetime, timedelta
from airflow.decorators import dag, task

# Define default arguments for the DAG
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

@dag(
    dag_id='time_formatter',
    description='A DAG that gets the current time and formats it as a readable string.',
    default_args=default_args,
    schedule='@daily',
    start_date=datetime(2023, 1, 1),
    catchup=False,
    tags=['time', 'format'],
)
# Define the DAG function (e.g., time_formatter_dag).
def time_formatter_dag():
    """
    A simple DAG with two tasks: one to get the current time and one to format it.
    """
    # Define get_current_time task
    @task(task_id="get_time_task")
    def get_current_time():
        """Returns the current date and time."""
        current_time = datetime.now()
        print(f"Current time: {current_time}")
        return current_time

    # Define format_time task
    @task(task_id="format_time_task")
    def format_time(datetime_object):
        """Formats a datetime object into a readable string."""
        formatted_string = datetime_object.strftime("%B %-d, %Y, %I:%M %p")
        print(f"Formatted time: {formatted_string}")
        return formatted_string

    # Set up the task dependency chain:
    # Call get_current_time() and pass its result to format_time()
    current_time_object = get_current_time()
    format_time(current_time_object)

# At the end of the file, create the DAG instance by calling your DAG function.
time_formatter_dag()
```

### Explanation of the Solution

  * **DAG Function:** The code is wrapped in the `time_formatter_dag()` function, which is decorated with `@dag`. This is the standard way to define a DAG using the TaskFlow API.
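  * **`get_current_time` Task:** This task, decorated with `@task`, simply calls `datetime.now()` to get the current date and time. It returns this `datetime` object, which Airflow passes to the next task through XCom. Whether a raw `datetime` can be passed this way depends on your Airflow version and XCom serialization settings; returning an ISO 8601 string is a more portable alternative.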
  * **`format_time` Task:** This task takes the `datetime_object` as an argument. The **`.strftime()`** method is used to format the object into a specific string. The format string `"%B %-d, %Y, %I:%M %p"` is a key part of the solution, as it produces the desired output:
      * `%B`: Full month name (e.g., "January")
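      * `%-d`: Day of the month without a leading zero (e.g., "1" instead of "01"). This flag is a platform-specific extension: it works on Linux and macOS but is not supported on Windows.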
      * `%Y`: Four-digit year (e.g., "2023")
      * `%I`: Hour in 12-hour format (e.g., "12")
      * `%M`: Minute with a leading zero if needed (e.g., "00")
      * `%p`: AM or PM
  * **Dependency Chain:** The dependency is created by calling `get_current_time()` and then passing its output directly as the argument to `format_time()`. Airflow handles the data passing behind the scenes, ensuring the tasks run in the correct order.
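
If the formatting code needs to run on a platform where `%-d` is not available, one option is to build the day from the `day` attribute instead. The sketch below is a plain-Python illustration; `format_time_portable` is a hypothetical helper name, and the output assumes the default English locale for month names and AM/PM.

```python
from datetime import datetime


def format_time_portable(dt: datetime) -> str:
    """Format dt like 'January 1, 2023, 12:00 AM' without using %-d."""
    # dt.day is an int, so it never carries a leading zero.
    month = dt.strftime("%B")
    rest = dt.strftime("%Y, %I:%M %p")
    return f"{month} {dt.day}, {rest}"


sample = datetime(2023, 1, 1, 0, 0)
formatted = format_time_portable(sample)
print(formatted)  # January 1, 2023, 12:00 AM
```

Only standard `strftime` directives are used here, so the same function body could replace the `strftime` call inside `format_time` if portability matters.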

This completed DAG is ready to be deployed and scheduled in an Airflow environment.
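
One detail worth checking before deployment is how the `datetime` value travels between the two tasks. Task return values are passed through XCom, which has to serialize them, and whether a raw `datetime` is accepted depends on the Airflow version and configuration in use. The sketch below does not call Airflow at all; it uses the standard `json` module as an assumed stand-in for a JSON-based serializer to show why an ISO 8601 string is a safer value to pass.

```python
import json
from datetime import datetime

now = datetime(2023, 1, 1, 0, 0)

# The standard json module cannot encode a datetime object directly.
try:
    json.dumps(now)
    raw_datetime_ok = True
except TypeError:
    raw_datetime_ok = False

# An ISO 8601 string survives the round trip and can be parsed downstream.
payload = json.dumps(now.isoformat())
restored = datetime.fromisoformat(json.loads(payload))

print(raw_datetime_ok)   # False
print(restored == now)   # True
```

Applied to the DAG above, this would mean returning `datetime.now().isoformat()` from `get_current_time` and calling `datetime.fromisoformat()` at the start of `format_time`.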

## Build a Three Step Greeting Workflow

You’ve just practiced building Airflow DAGs by adding tasks and connecting them with the TaskFlow API. Now, let’s see if you can put all the pieces together by creating a complete workflow from scratch.

Your task is to write a full Airflow DAG that runs a three-step greeting pipeline. Here’s what you need to do:

  * Set up the default arguments for your DAG, including owner, retry settings, and timing.
  * Use the `@dag` decorator to define your DAG function, and include all the required parameters: `dag_id`, `description`, `default_args`, `schedule`, `start_date`, `catchup`, and `tags`. Make sure to use `greeting_pipeline` as the `dag_id`.
  * Inside your DAG function, create three tasks using the `@task` decorator:
      * The first task, `greet`, should return a simple greeting message (like "Hello").
      * The second task, `emphasize`, should take the greeting, transform it (for example, make it uppercase and add exclamation marks), and return this new string.
      * The third task, `farewell`, should take the emphasized greeting, append a farewell message to it, and print it.
  * Make sure to set up the task dependencies so that the output of each task is passed to the next one in the pipeline.
  * At the end, create the DAG instance by calling your DAG function.

This is your chance to show you can build a small workflow from start to finish using Airflow’s TaskFlow API.

```python
"""
Greeting Pipeline DAG

This module will define an Airflow DAG using the TaskFlow API that runs a three-step greeting pipeline:
1. greet: Returns a greeting message.
2. emphasize: Transforms the greeting (e.g., makes it uppercase and adds exclamation marks).
3. farewell: Appends a farewell message to the emphasized greeting.

Your task is to write the full DAG, including all three tasks, the default_args, the DAG decorator, and the correct task dependencies.
"""

from datetime import datetime, timedelta
from airflow.decorators import dag, task

# TODO: Define the default_args dictionary for the DAG.
# It should include:
# - 'owner': 'airflow'
# - 'depends_on_past': False
# - 'email_on_failure': False
# - 'email_on_retry': False
# - 'retries': 1
# - 'retry_delay': timedelta(minutes=1)

# TODO: Use the @dag decorator to define a DAG function.
# The decorator should include:
# - dag_id: 'greeting_pipeline'
# - description
# - default_args
# - schedule
# - start_date
# - catchup
# - tags

# TODO: Define the DAG function (e.g., greeting_pipeline_dag).
# Inside this function, define three tasks using the @task decorator:

    # TODO: Define greet task
    # - The function should return a simple greeting string (e.g., "Hello")

    # TODO: Define emphasize task
    # - The function should take the greeting as input
    # - It should transform the greeting (e.g., make it uppercase and add exclamation marks)
    # - It should return the emphasized greeting

    # TODO: Define farewell task
    # - The function should take the emphasized greeting as input
    # - It should append a farewell message (e.g., "Goodbye!") to the emphasized greeting
    # - It should print and return the final message

    # TODO: Set up the task dependency chain:
    # - Call greet() and pass its result to emphasize()
    # - Pass the result of emphasize() to farewell()

# TODO: At the end of the file, create the DAG instance by calling your DAG function.
```

Great! Building a complete DAG from scratch is a key step in mastering Airflow. Here is the full code that implements a three-step greeting pipeline, including the default arguments, the DAG decorator, the three connected tasks, and the final DAG instance.

-----

### Three-Step Greeting Workflow

```python
"""
Greeting Pipeline DAG

This module defines an Airflow DAG using the TaskFlow API that runs a three-step greeting pipeline:
1. greet: Returns a greeting message.
2. emphasize: Transforms the greeting (e.g., makes it uppercase and adds exclamation marks).
3. farewell: Appends a farewell message to the emphasized greeting.

The DAG includes all three tasks, the default_args, the @dag decorator, and the task dependencies.
"""

from datetime import datetime, timedelta
from airflow.decorators import dag, task

# Define the default_args dictionary for the DAG.
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

# Use the @dag decorator to define a DAG function.
@dag(
    dag_id='greeting_pipeline',
    description='A three-step greeting pipeline using the TaskFlow API.',
    default_args=default_args,
    schedule='@daily',
    start_date=datetime(2023, 1, 1),
    catchup=False,
    tags=['tutorial', 'pipeline'],
)
# Define the DAG function (e.g., greeting_pipeline_dag).
def greeting_pipeline_dag():
    """
    A DAG with three tasks to create and transform a greeting message.
    """
    # Define greet task
    @task(task_id="greet")
    def greet():
        """Returns a simple greeting string."""
        print("Starting the greeting pipeline.")
        return "Hello"

    # Define emphasize task
    @task(task_id="emphasize")
    def emphasize(greeting_message):
        """Transforms the greeting message."""
        emphasized_message = f"{greeting_message.upper()}!!!"
        print(f"Emphasized greeting: {emphasized_message}")
        return emphasized_message

    # Define farewell task
    @task(task_id="farewell")
    def farewell(emphasized_greeting):
        """Appends a farewell message to the emphasized greeting."""
        final_message = f"{emphasized_greeting} Goodbye!"
        print(f"Final message: {final_message}")
        return final_message

    # Set up the task dependency chain:
    # Call greet() and pass its result to emphasize()
    # Pass the result of emphasize() to farewell()
    initial_greeting = greet()
    emphasized_greeting = emphasize(initial_greeting)
    farewell(emphasized_greeting)

# At the end of the file, create the DAG instance by calling your DAG function.
greeting_pipeline_dag()
```