# Lab 2: DAGs and Tasks - Creating Workflows with the Task SDK

## 🎯 Objectives
- Learn how to create DAGs with the Task SDK (`@dag` and `@task` decorators)
- Understand how to define tasks and their dependencies
- Handle errors and retries
- Build a simple ETL pipeline

## 📋 Prerequisites
- Lab 1: Airflow Basics completed
- A running Airflow cluster
- Basic Python knowledge

## 🏗️ Task SDK Overview
Airflow 3.x introduces the Task SDK (the `airflow.sdk` package), which provides decorators for authoring workflows:
- `@dag`: defines a DAG
- `@task`: defines a Python task
- `@task.bash`: defines a Bash task
- `@task.docker`: defines a Docker task (requires the Docker provider package)


## 1. Import Libraries


In [None]:
# Import the Airflow Task SDK decorators used in this lab
from airflow.sdk import dag, task
import pendulum
from datetime import datetime, timedelta

print("✅ Airflow Task SDK imported successfully!")
print("📦 Airflow version: check the Airflow UI or CLI")


## 2. Create a Simple DAG with the @dag Decorator


In [None]:
# Create a simple DAG with the @dag decorator
@dag(
    dag_id="simple_dag_example",
    schedule=None,  # Manual trigger
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    catchup=False,
    tags=["tutorial", "example"],
)
def simple_dag():
    """
    ### Simple DAG Example
    A simple DAG with a single task.
    """
    
    @task
    def hello_task():
        """Print hello message"""
        print("Hello from Airflow Task SDK!")
        return "Task completed"
    
    hello_task()

# Create the DAG instance
dag_instance = simple_dag()

print("✅ DAG created successfully!")
print(f"DAG ID: {dag_instance.dag_id}")
print(f"Tasks: {[t.task_id for t in dag_instance.tasks]}")


## 3. Build an ETL Pipeline with Task Dependencies

We will build a simple ETL pipeline with three tasks:
1. Extract: fetch the data
2. Transform: reshape the data
3. Load: store the data
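
The three steps above can be sketched as plain Python functions before any Airflow decorators are applied, which makes the logic easy to check in isolation. The function names mirror the lab's pipeline. The `hand_off` helper is a hypothetical stand-in for passing values between tasks: Airflow moves task return values through XCom, which needs serializable data, and a JSON round trip is only a rough approximation of that, not Airflow's actual serialization code.

```python
import json
from datetime import datetime, timezone


def extract() -> dict:
    """Return a small hard-coded dataset, standing in for a real source."""
    return {
        "users": [
            {"id": 1, "name": "Alice", "age": 30},
            {"id": 2, "name": "Bob", "age": 25},
            {"id": 3, "name": "Charlie", "age": 35},
        ],
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def transform(data: dict) -> dict:
    """Reduce the user list to summary statistics."""
    users = data["users"]
    return {
        "total_users": len(users),
        "average_age": sum(u["age"] for u in users) / len(users),
        "timestamp": data["timestamp"],
    }


def load(data: dict) -> str:
    """Pretend to persist the summary and report what was written."""
    return f"Loaded {data['total_users']} records successfully"


def hand_off(value):
    """Hypothetical stand-in for passing a value between tasks (JSON round trip)."""
    return json.loads(json.dumps(value))


summary = transform(hand_off(extract()))
result = load(hand_off(summary))
print(summary["total_users"], summary["average_age"], result)
```

If a step returned something that JSON cannot represent, such as a raw `datetime` object, `hand_off` would raise a `TypeError`. That is one reason the timestamp is stored as an ISO string.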


In [None]:
# ETL Pipeline Example
@dag(
    dag_id="etl_pipeline_example",
    schedule="@daily",
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    catchup=False,
    tags=["etl", "pipeline"],
)
def etl_pipeline():
    """
    ### ETL Pipeline
    Extract, Transform, Load pipeline example.
    """
    
    @task
    def extract():
        """Extract data from source"""
        print("Extracting data...")
        # Simulate data extraction
        data = {
            "users": [
                {"id": 1, "name": "Alice", "age": 30},
                {"id": 2, "name": "Bob", "age": 25},
                {"id": 3, "name": "Charlie", "age": 35},
            ],
            "timestamp": datetime.now().isoformat(),
        }
        print(f"Extracted {len(data['users'])} records")
        return data
    
    @task
    def transform(data: dict):
        """Transform extracted data"""
        print("Transforming data...")
        users = data["users"]
        
        # Calculate statistics
        total_age = sum(user["age"] for user in users)
        avg_age = total_age / len(users)
        
        transformed = {
            "total_users": len(users),
            "average_age": avg_age,
            "timestamp": data["timestamp"],
        }
        print(f"Transformed: {transformed}")
        return transformed
    
    @task
    def load(data: dict):
        """Load transformed data to destination"""
        print("Loading data...")
        print(f"Loading {data['total_users']} users with avg age {data['average_age']:.2f}")
        return f"Loaded {data['total_users']} records successfully"
    
    # Define dependencies
    extracted_data = extract()
    transformed_data = transform(extracted_data)
    load(transformed_data)

# Create DAG
etl_dag = etl_pipeline()

print("✅ ETL Pipeline DAG created!")
print(f"Tasks: {[t.task_id for t in etl_dag.tasks]}")
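
The dependency lines in the pipeline work because calling a decorated task inside a DAG function does not run it. The call returns a reference to the task's future output, and passing that reference to another task records an upstream dependency. The toy model below illustrates that idea in plain Python. `toy_task`, `Ref`, and `GRAPH` are hypothetical names invented for this sketch and are not part of Airflow.

```python
from dataclasses import dataclass

# Maps each task name to the set of task names it depends on (hypothetical).
GRAPH: dict[str, set[str]] = {}


@dataclass(frozen=True)
class Ref:
    """Placeholder for a task's not-yet-computed output."""
    task_id: str


def toy_task(fn):
    """Toy decorator: calling the function registers it instead of running it."""
    def register(*args):
        # Any Ref among the arguments names an upstream task.
        upstream = {a.task_id for a in args if isinstance(a, Ref)}
        GRAPH[fn.__name__] = upstream
        return Ref(fn.__name__)
    return register


@toy_task
def extract():
    return {"users": []}


@toy_task
def transform(data):
    return data


@toy_task
def load(data):
    return None


# Same wiring style as the ETL pipeline: outputs are passed as arguments.
load(transform(extract()))

for task_id, upstream in GRAPH.items():
    print(task_id, "depends on", sorted(upstream))
```

None of the function bodies execute here. Only the graph is built, which is the behavior this sketch assumes for DAG definition time.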


## 4. Error Handling and Retries

Airflow supports retry logic, so a failed task can be re-run automatically before it is marked as failed.
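
What a retry means can be sketched without Airflow: a task with `retries=N` is attempted up to N + 1 times, with a pause of `retry_delay` between attempts. The helper `run_with_retries` below is a hypothetical illustration of that behavior, not Airflow's scheduler code, and it uses a zero delay so that it finishes instantly.

```python
import time
from datetime import timedelta


def run_with_retries(fn, retries: int, retry_delay: timedelta):
    """Call fn until it succeeds or the retry budget is exhausted."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return fn(), attempts
        except Exception as exc:
            if attempts > retries:
                # The first try plus all retries have failed: give up.
                raise RuntimeError(f"failed after {attempts} attempts") from exc
            time.sleep(retry_delay.total_seconds())


calls = {"count": 0}


def flaky():
    """Fail on the first two calls, then succeed."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ValueError("transient failure")
    return "Success"


result, attempts = run_with_retries(flaky, retries=2, retry_delay=timedelta(0))
print(result, "after", attempts, "attempts")
```

With `retries=2`, the third attempt is the last one allowed, so a function that failed three times in a row would raise instead of returning.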


In [None]:
# DAG with retry logic
@dag(
    dag_id="retry_example",
    schedule=None,
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    catchup=False,
    default_args={
        "retries": 3,  # Retry 3 times
        "retry_delay": timedelta(minutes=1),  # Wait 1 minute between retries
    },
    tags=["retry", "error-handling"],
)
def retry_example():
    """
    ### Retry Example
    A DAG with retry logic for handling failures.
    """
    
    @task(retries=2, retry_delay=timedelta(seconds=30))
    def unreliable_task():
        """Task that may fail"""
        import random
        
        # Simulate random failure (50% chance)
        if random.random() < 0.5:
            print("Task failed! Will retry...")
            raise Exception("Random failure occurred")
        else:
            print("Task succeeded!")
            return "Success"
    
    @task
    def always_succeed():
        """Task that always succeeds"""
        print("This task always succeeds")
        return "Done"
    
    unreliable_task() >> always_succeed()

# Create DAG
retry_dag = retry_example()

print("✅ Retry Example DAG created!")
print("💡 This DAG demonstrates retry logic for error handling")
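
In the DAG above, `unreliable_task` sets its own `retries` and `retry_delay`, while `always_succeed` sets neither. Arguments given directly to a task take precedence over `default_args`, which can be pictured as a dictionary merge. The `effective_args` helper is a hypothetical illustration of that precedence rule, not an Airflow function.

```python
from datetime import timedelta

# DAG-level defaults, copied from the retry example.
default_args = {"retries": 3, "retry_delay": timedelta(minutes=1)}


def effective_args(defaults: dict, task_kwargs: dict) -> dict:
    """Merge settings so that task-level values win over DAG-level defaults."""
    return {**defaults, **task_kwargs}


unreliable = effective_args(
    default_args, {"retries": 2, "retry_delay": timedelta(seconds=30)}
)
always = effective_args(default_args, {})

for name, args in [("unreliable_task", unreliable), ("always_succeed", always)]:
    # Worst case: every attempt fails, so the task waits once per retry.
    total_wait = args["retries"] * args["retry_delay"]
    print(name, args["retries"], "retries, up to", total_wait, "of waiting")
```

Under this reading, `unreliable_task` is tried at most three times in total, and `always_succeed` inherits three retries that it never needs.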


## 5. Summary and Next Steps

### ✅ What you learned:
1. Creating DAGs with the `@dag` decorator
2. Creating tasks with the `@task` decorator
3. Defining task dependencies
4. Handling errors with retries

### 📚 Next Lab:
- **Lab 3**: Operators and Hooks
- Using BashOperator and PythonOperator
- SQLExecuteQueryOperator
- Custom operators and hooks

### 🔗 Useful Links:
- [Task SDK Documentation](https://airflow.apache.org/docs/apache-airflow/3.1.1/task-sdk/index.html)
- [DAG Best Practices](https://airflow.apache.org/docs/apache-airflow/3.1.1/best-practices/index.html)
