# Lab 2: DAGs and Tasks - Creating Workflows with Task SDK

## 🎯 Objectives
- Learn how to create DAGs with the Task SDK (`@dag` and `@task` decorators)
- Understand how to define tasks and dependencies
- Handle errors and retries
- Create a simple ETL pipeline

## 📋 Prerequisites
- Completed Lab 1: Airflow Basics
- Airflow cluster is running
- Basic Python knowledge

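## 🏗️ Task SDK Overview
Airflow 3.x introduces the Task SDK (the `airflow.sdk` package), which exposes these decorators:
- `@dag`: Defines a DAG from a Python function
- `@task`: Defines a Python task
- `@task.bash`: Defines a Bash task
- `@task.docker`: Defines a Docker task (requires the Docker provider package)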


## 1. Import Libraries


```python
# Import the Airflow Task SDK
from datetime import datetime, timedelta

import airflow
import pendulum
from airflow.sdk import dag, task  # decorators used throughout this lab

print("✅ Airflow Task SDK imported successfully!")
print(f"📦 Airflow version: {airflow.__version__}")
```


## 2. Create Simple DAG with @dag Decorator


```python
# Create a simple DAG with the @dag decorator
@dag(
    dag_id="simple_dag_example",
    schedule=None,  # No schedule: runs only when triggered manually
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    catchup=False,
    tags=["tutorial", "example"],
)
def simple_dag():
    """
    ### Simple DAG Example
    Simple DAG with a single task.
    """

    @task
    def hello_task():
        """Print a hello message."""
        print("Hello from Airflow Task SDK!")
        return "Task completed"

    hello_task()


# Create the DAG instance
dag_instance = simple_dag()

print("✅ DAG created successfully!")
print(f"DAG ID: {dag_instance.dag_id}")
# Use `t` as the loop variable so it does not read like the `task` decorator
print(f"Tasks: {[t.task_id for t in dag_instance.tasks]}")
```
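
Calling `hello_task()` inside the `@dag`-decorated function does not print anything at that point. It only adds a task to the DAG, and the function body runs later, when the task executes. The library-free toy below illustrates that deferred-call idea. It is a conceptual sketch, not how Airflow is implemented, and `ToyDag` and `toy_task` are hypothetical names.

```python
from typing import Callable


class ToyDag:
    """Hypothetical stand-in for a DAG: it only records tasks."""

    def __init__(self, dag_id: str) -> None:
        self.dag_id = dag_id
        self.tasks: dict[str, Callable[[], object]] = {}

    def run(self) -> dict[str, object]:
        # Execute the recorded functions; in this toy, insertion order is run order.
        return {task_id: fn() for task_id, fn in self.tasks.items()}


def toy_task(dag: ToyDag):
    """Hypothetical decorator: calling the decorated function registers it."""

    def decorator(fn):
        def register() -> str:
            dag.tasks[fn.__name__] = fn  # record the function, do not execute it
            return fn.__name__

        return register

    return decorator


toy = ToyDag("toy_dag_example")
calls: list[str] = []


@toy_task(toy)
def hello_task() -> str:
    calls.append("ran")
    return "Task completed"


hello_task()                     # registers the task only
calls_before_run = list(calls)   # still empty: the body has not run yet
results = toy.run()              # now the function body executes
print(list(toy.tasks), calls_before_run, results)
```

The same separation explains why the `print` calls inside a task show up in the task logs rather than in the notebook cell that defines the DAG.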


## 3. Create ETL Pipeline with Task Dependencies

We will create a simple ETL pipeline with 3 tasks:
1. Extract: Get data
2. Transform: Transform data
3. Load: Save data
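
Because each step consumes the previous step's output, the three tasks form a linear chain, and a task may only start once everything upstream of it has finished. As a library-free illustration of how a run order follows from declared dependencies, the sketch below uses Python's standard `graphlib` module. The `upstream` mapping is a hypothetical stand-in, and this is not Airflow's scheduler.

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks it depends on (its upstream tasks).
upstream = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

# static_order() yields every task after all of its upstream tasks.
run_order = list(TopologicalSorter(upstream).static_order())
print(run_order)  # ['extract', 'transform', 'load']
```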


```python
# ETL pipeline example
@dag(
    dag_id="etl_pipeline_example",
    schedule="@daily",
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    catchup=False,
    tags=["etl", "pipeline"],
)
def etl_pipeline():
    """
    ### ETL Pipeline
    Extract, Transform, Load pipeline example.
    """

    @task
    def extract() -> dict:
        """Extract data from the source."""
        print("Extracting data...")
        # Simulate data extraction with hard-coded records
        data = {
            "users": [
                {"id": 1, "name": "Alice", "age": 30},
                {"id": 2, "name": "Bob", "age": 25},
                {"id": 3, "name": "Charlie", "age": 35},
            ],
            # Timezone-aware timestamp instead of a naive datetime.now()
            "timestamp": pendulum.now("UTC").isoformat(),
        }
        print(f"Extracted {len(data['users'])} records")
        return data

    @task
    def transform(data: dict) -> dict:
        """Transform the extracted data into summary statistics."""
        print("Transforming data...")
        users = data["users"]

        # Calculate statistics
        total_age = sum(user["age"] for user in users)
        avg_age = total_age / len(users)

        transformed = {
            "total_users": len(users),
            "average_age": avg_age,
            "timestamp": data["timestamp"],
        }
        print(f"Transformed: {transformed}")
        return transformed

    @task
    def load(data: dict) -> str:
        """Load the transformed data into the destination."""
        print("Loading data...")
        print(f"Loading {data['total_users']} users with avg age {data['average_age']:.2f}")
        return f"Loaded {data['total_users']} records successfully"

    # Passing one task's return value to the next defines the dependencies:
    # extract -> transform -> load
    extracted_data = extract()
    transformed_data = transform(extracted_data)
    load(transformed_data)


# Create the DAG
etl_dag = etl_pipeline()

print("✅ ETL Pipeline DAG created!")
print(f"Tasks: {[t.task_id for t in etl_dag.tasks]}")
```
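
The arithmetic inside `transform` does not depend on Airflow, so it can be checked as an ordinary function before it is wrapped in `@task`. A minimal sketch, assuming the same record shape as the `extract` output above: `summarize_users` is a hypothetical helper name, and the empty-input guard is an addition that the lab code does not have.

```python
def summarize_users(users: list[dict]) -> dict:
    """Return the user count and mean age for a list of user records."""
    if not users:
        # Guard added for illustration: avoids ZeroDivisionError on empty input.
        return {"total_users": 0, "average_age": None}
    total_age = sum(user["age"] for user in users)
    return {"total_users": len(users), "average_age": total_age / len(users)}


sample_users = [
    {"id": 1, "name": "Alice", "age": 30},
    {"id": 2, "name": "Bob", "age": 25},
    {"id": 3, "name": "Charlie", "age": 35},
]
summary = summarize_users(sample_users)
print(summary)  # {'total_users': 3, 'average_age': 30.0}
```

Keeping the computation in a plain function like this makes it easy to cover with unit tests, while the `@task` wrapper stays a thin layer that passes data in and out.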


## 4. Error Handling and Retries

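Airflow can automatically retry a failed task. Set `retries` and `retry_delay` for every task in a DAG through `default_args`, or per task as arguments to `@task`; task-level values take precedence over `default_args`.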


```python
# DAG with retry logic
@dag(
    dag_id="retry_example",
    schedule=None,
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    catchup=False,
    default_args={
        "retries": 3,  # Default for every task: retry up to 3 times
        "retry_delay": timedelta(minutes=1),  # Wait 1 minute between retries
    },
    tags=["retry", "error-handling"],
)
def retry_example():
    """
    ### Retry Example
    DAG with retry logic to handle failures.
    """

    # Task-level arguments override default_args: 2 retries, 30 seconds apart
    @task(retries=2, retry_delay=timedelta(seconds=30))
    def unreliable_task():
        """Task that may fail."""
        import random

        # Simulate a random failure (50% chance)
        if random.random() < 0.5:
            print("Task failed! Will retry...")
            raise RuntimeError("Random failure occurred")
        print("Task succeeded!")
        return "Success"

    @task
    def always_succeed():
        """Task that always succeeds."""
        print("This task always succeeds")
        return "Done"

    # always_succeed runs only after unreliable_task has succeeded
    unreliable_task() >> always_succeed()


# Create the DAG
retry_dag = retry_example()

print("✅ Retry Example DAG created!")
print("💡 This DAG demonstrates retry logic for error handling")
```
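
In Airflow, `retries` counts additional attempts after the first one, so `retries=2` allows up to three attempts in total. The plain-Python sketch below mimics that counting. It is an illustration only: `run_with_retries` and `flaky` are hypothetical names, and the real scheduler reschedules task instances rather than looping in-process.

```python
import time
from datetime import timedelta


def run_with_retries(fn, retries: int, retry_delay: timedelta):
    """Call fn until it succeeds, allowing `retries` extra attempts."""
    max_attempts = retries + 1  # the first try plus the retries
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(), attempt
        except Exception as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            print(f"Attempt {attempt} failed ({exc}); retrying")
            time.sleep(retry_delay.total_seconds())


attempts_seen = []


def flaky():
    """Fail on the first two calls, then succeed."""
    attempts_seen.append(1)
    if len(attempts_seen) < 3:
        raise RuntimeError("simulated failure")
    return "Success"


# Zero delay keeps the demo fast; the lab DAG waits 30 seconds between attempts.
result, attempts_used = run_with_retries(flaky, retries=2, retry_delay=timedelta(seconds=0))

# With a 50% failure chance per attempt, all three attempts fail with probability 0.5 ** 3.
p_task_fails = 0.5 ** (2 + 1)
print(result, attempts_used, p_task_fails)
```

Under these assumptions, `unreliable_task` ends in a failed state about one run in eight, in which case `always_succeed` does not run.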


## 5. Summary and Next Steps

### ✅ What we learned:
1. Create DAGs with the `@dag` decorator
2. Create tasks with the `@task` decorator
3. Define task dependencies
4. Handle errors with retries

### 📚 Next Lab:
- **Lab 3**: Operators and Hooks
- Use BashOperator, PythonOperator
- SQLExecuteQueryOperator
- Custom operators and hooks

### 🔗 Useful Links:
- [Task SDK Documentation](https://airflow.apache.org/docs/apache-airflow/3.1.1/task-sdk/index.html)
- [DAG Best Practices](https://airflow.apache.org/docs/apache-airflow/3.1.1/best-practices/index.html)
