# Lab 4: Task Dependencies and Branching - Managing Workflow Logic

## 🎯 Objectives
- Understand how to define task dependencies with bitshift operators
- Use BranchPythonOperator for conditional logic
- Apply trigger rules to handle failures
- Create dynamic tasks with task mapping
- Use TaskGroups to organize tasks
- Build complex conditional workflows

## 📋 Prerequisites
- Completed Labs 1-3
- Familiarity with operators and tasks
- A running Airflow cluster

## 🏗️ Dependencies Overview
Task dependencies in Airflow can be defined with:
- **Bitshift operators**: `>>` (set downstream), `<<` (set upstream)
- **Methods**: `set_downstream()`, `set_upstream()`
- **Lists**: declare several dependencies at once, e.g. `task_a >> [task_b, task_c]`
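Why does `[task_b, task_c] >> task_d` work when Python lists have no `>>` operator? Python falls back to the right operand's reflected method (`__rrshift__`). The toy model below illustrates that mechanism in plain Python. It is a simplified sketch, not Airflow's implementation: `Node` is a hypothetical stand-in for an operator, and only the `set_downstream()`/`set_upstream()` names mirror the methods listed above.

```python
# Toy model (NOT Airflow's implementation) of how `>>`, `<<`, and lists
# can be wired through Python's operator overloading.
from __future__ import annotations


class Node:
    """A hypothetical stand-in for a task that records its edges."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.downstream: set[str] = set()
        self.upstream: set[str] = set()

    def set_downstream(self, other: Node | list[Node]) -> None:
        for node in other if isinstance(other, list) else [other]:
            self.downstream.add(node.task_id)
            node.upstream.add(self.task_id)

    def set_upstream(self, other: Node | list[Node]) -> None:
        for node in other if isinstance(other, list) else [other]:
            node.set_downstream(self)

    def __rshift__(self, other):
        # a >> b (or a >> [b, c]): b depends on a; return `other` so chains continue
        self.set_downstream(other)
        return other

    def __lshift__(self, other):
        # a << b: a depends on b
        self.set_upstream(other)
        return other

    def __rrshift__(self, other):
        # [a, b] >> c: lists have no __rshift__, so Python calls c.__rrshift__([a, b])
        self.set_upstream(other)
        return self

    def __rlshift__(self, other):
        # [a, b] << c: a and b depend on c
        self.set_downstream(other)
        return self


start, a, b, c, d, end = (
    Node(n) for n in ["start", "task_a", "task_b", "task_c", "task_d", "end"]
)
start >> a >> [b, c] >> d >> end
print(sorted(d.upstream))  # ['task_b', 'task_c']
```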


## 1. Import Libraries and Setup


In [None]:
# Import Airflow building blocks for dependencies and branching (Airflow 3.x)
# `dag` is the decorator used by every DAG in this lab; TaskGroup is exposed by airflow.sdk
from airflow.sdk import DAG, TaskGroup, dag, task
from airflow.providers.standard.operators.empty import EmptyOperator
from airflow.providers.standard.operators.python import BranchPythonOperator, PythonOperator
from airflow.providers.standard.operators.bash import BashOperator

import pendulum
from datetime import datetime, timedelta
import random

print("✅ Airflow dependency and branching modules imported successfully!")


## 2. Bitshift Operators - Defining Dependencies

Airflow uses the bitshift operators `>>` and `<<` to define dependencies in a readable, Pythonic way: `a >> b` makes `b` downstream of `a`, and `a << b` makes `b` upstream of `a`.


In [None]:
# DAG using bitshift operators
@dag(
    dag_id="bitshift_dependencies_example",
    schedule=None,
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    catchup=False,
    tags=["dependencies", "bitshift"],
)
def bitshift_dependencies_dag():
    """
    ### Bitshift Operators Example
    Sample DAG that uses bitshift operators to define dependencies.
    """
    
    # Create tasks
    start = EmptyOperator(task_id="start")
    task_a = EmptyOperator(task_id="task_a")
    task_b = EmptyOperator(task_id="task_b")
    task_c = EmptyOperator(task_id="task_c")
    task_d = EmptyOperator(task_id="task_d")
    end = EmptyOperator(task_id="end")
    
    # Approach 1: a single dependency with >>
    # start >> task_a means task_a depends on start (runs after it)
    start >> task_a
    
    # Approach 2: multiple downstream tasks
    # task_a runs before both task_b and task_c
    task_a >> [task_b, task_c]
    
    # Approach 3: multiple upstream tasks
    # task_d depends on both task_b and task_c
    [task_b, task_c] >> task_d
    
    # Approach 4: continue the chain
    task_d >> end
    
    # Or, equivalently, on one line:
    # start >> task_a >> [task_b, task_c] >> task_d >> end

# Create DAG
bitshift_dag = bitshift_dependencies_dag()

print("✅ Bitshift Dependencies DAG created!")
print(f"Tasks: {[task.task_id for task in bitshift_dag.tasks]}")
print("\n📊 Dependency Flow:")
print("start → task_a → [task_b, task_c] → task_d → end")
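The bitshift calls only declare a graph: any task whose upstream tasks have all finished may run, so independent tasks can run concurrently. As an illustration outside Airflow, the sketch below feeds the dependencies of the DAG above to Python's standard `graphlib.TopologicalSorter` and groups tasks into "waves" that could run in parallel. It ignores scheduler details such as pools, concurrency limits, and retries.

```python
# Sketch: the bitshift example's dependencies as a plain mapping of
# task -> upstream tasks, ordered with the standard library (no Airflow).
from graphlib import TopologicalSorter

# Each key lists the tasks it depends on (its upstream tasks).
upstream = {
    "start": set(),
    "task_a": {"start"},
    "task_b": {"task_a"},
    "task_c": {"task_a"},
    "task_d": {"task_b", "task_c"},
    "end": {"task_d"},
}

sorter = TopologicalSorter(upstream)
sorter.prepare()
waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # tasks whose upstream tasks are all done
    waves.append(ready)
    sorter.done(*ready)

for i, wave in enumerate(waves):
    print(i, wave)
# 0 ['start']
# 1 ['task_a']
# 2 ['task_b', 'task_c']
# 3 ['task_d']
# 4 ['end']
```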


## 3. BranchPythonOperator - Conditional Branching

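BranchPythonOperator lets you choose which branch runs based on a condition. The callable must return the task_id (or a list of task_ids) of the task(s) directly downstream of the branch that should run next; the other directly downstream tasks are skipped.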


In [None]:
# DAG with BranchPythonOperator
@dag(
    dag_id="branching_example",
    schedule=None,
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    catchup=False,
    tags=["branching", "conditional"],
)
def branching_dag():
    """
    ### Branching Example
    Sample DAG that uses BranchPythonOperator for conditional logic.
    """
    
    start = EmptyOperator(task_id="start")
    
    # Branch callable: returns the task_id (or list of task_ids) to follow
    def choose_branch(**context):
        """Choose a branch based on the day of the week of the run."""
        # data_interval_start may be None in Airflow 3 for manually triggered
        # runs without a logical date, so fall back to the current time.
        run_date = context.get("data_interval_start") or pendulum.now("UTC")
        day_of_week = run_date.weekday()  # Monday == 0 ... Sunday == 6
        
        # Monday (0) or Friday (4): run both branches
        if day_of_week in (0, 4):
            return ["branch_a", "branch_b"]
        # Weekend: run only branch_a
        elif day_of_week >= 5:
            return "branch_a"
        # Tuesday to Thursday: run only branch_b
        else:
            return "branch_b"
    
    branching = BranchPythonOperator(
        task_id="branching",
        python_callable=choose_branch,
    )
    
    # Branch tasks
    branch_a = EmptyOperator(task_id="branch_a")
    branch_b = EmptyOperator(task_id="branch_b")
    
    # Join task: runs after the branches. The default all_success rule would
    # skip it, because a skipped branch counts as "not success".
    join = EmptyOperator(
        task_id="join",
        trigger_rule="none_failed_min_one_success",  # No failures and at least 1 branch succeeded
    )
    
    end = EmptyOperator(task_id="end")
    
    # Define dependencies
    start >> branching
    branching >> branch_a >> join
    branching >> branch_b >> join
    join >> end

# Create DAG
branching_dag_instance = branching_dag()

print("✅ Branching DAG created!")
print(f"Tasks: {[task.task_id for task in branching_dag_instance.tasks]}")
print("\n💡 Branch logic:")
print("  - Monday/Friday: run both branches")
print("  - Weekend: run branch_a only")
print("  - Tuesday-Thursday: run branch_b only")
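Because a branch callable is ordinary Python, its routing can be checked without a running Airflow instance by passing a hand-built context. The sketch below reproduces the weekday logic of `choose_branch` above using only the standard `datetime` module; the fake context contains just the `data_interval_start` key, an assumption that matches what this callable reads.

```python
# Sketch: exercise the lab's weekday branching logic with fake contexts.
# Assumption: the callable only reads "data_interval_start" from the context.
from datetime import datetime, timezone


def choose_branch(**context):
    """Mirror of the lab's branch callable (weekday-based routing)."""
    day_of_week = context["data_interval_start"].weekday()  # Monday == 0 ... Sunday == 6
    if day_of_week in (0, 4):  # Monday or Friday: both branches
        return ["branch_a", "branch_b"]
    if day_of_week >= 5:  # Saturday or Sunday
        return "branch_a"
    return "branch_b"  # Tuesday to Thursday


# 2024-01-01 was a Monday, so days 1..7 cover a full week.
for day in range(1, 8):
    fake_context = {"data_interval_start": datetime(2024, 1, day, tzinfo=timezone.utc)}
    run_date = fake_context["data_interval_start"]
    print(run_date.strftime("%a"), choose_branch(**fake_context))
```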


## 4. Trigger Rules - Handling Task Failures

Trigger rules determine when a task runs based on the states of its upstream tasks. Common trigger rules:
- `all_success` (default): all upstream tasks succeeded
- `all_failed`: all upstream tasks are in a `failed` or `upstream_failed` state
- `all_done`: all upstream tasks have finished, regardless of their state
- `one_failed`: at least one upstream task has failed (fires without waiting for the others)
- `one_success`: at least one upstream task has succeeded (fires without waiting for the others)
- `none_failed`: no upstream task has failed; each one succeeded or was skipped
- `none_failed_min_one_success`: no upstream task has failed, and at least one succeeded
- `none_skipped`: no upstream task was skipped
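The rules above can be read as predicates over upstream states. The sketch below is a simplified model, not Airflow's scheduler logic: it assumes every upstream task has already finished and only answers whether the rule would let the task run, ignoring which state (such as `skipped` or `upstream_failed`) Airflow assigns otherwise. `would_run` is a hypothetical helper.

```python
# Simplified model of trigger rules (assumption: all upstream tasks finished).
# Not Airflow code; it only answers "would this rule let the task run?".
FAILED_STATES = {"failed", "upstream_failed"}


def would_run(rule: str, upstream_states: list[str]) -> bool:
    failed = sum(s in FAILED_STATES for s in upstream_states)
    success = upstream_states.count("success")
    skipped = upstream_states.count("skipped")
    total = len(upstream_states)
    checks = {
        "all_success": success == total,
        "all_failed": failed == total,
        "all_done": True,  # every upstream task is already finished in this model
        "one_failed": failed >= 1,
        "one_success": success >= 1,
        "none_failed": failed == 0,
        "none_failed_min_one_success": failed == 0 and success >= 1,
        "none_skipped": skipped == 0,
    }
    return checks[rule]


# A join after a branch: one branch ran, the other was skipped.
branch_outcome = ["success", "skipped"]
for rule in ["all_success", "none_failed", "none_failed_min_one_success", "one_success"]:
    print(f"{rule:30} -> {would_run(rule, branch_outcome)}")
```

The usage shows why a join after a branch needs `none_failed_min_one_success` (or `none_failed`) instead of the default `all_success`. The two differ only when every upstream task was skipped.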


In [None]:
# DAG with trigger rules
@dag(
    dag_id="trigger_rules_example",
    schedule=None,
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    catchup=False,
    tags=["trigger-rules", "error-handling"],
)
def trigger_rules_dag():
    """
    ### Trigger Rules Example
    Sample DAG illustrating several trigger rules.
    """
    
    start = EmptyOperator(task_id="start")
    
    # Task that may fail
    def task_that_might_fail(**context):
        """Fail randomly about half of the time."""
        import random
        if random.random() < 0.5:
            raise Exception("Random failure occurred")
        return "Success"
    
    might_fail = PythonOperator(
        task_id="might_fail",
        python_callable=task_that_might_fail,
    )
    
    # Task that always succeeds
    always_succeed = EmptyOperator(task_id="always_succeed")
    
    # Task with trigger_rule="all_done": runs regardless of upstream states
    cleanup_all_done = EmptyOperator(
        task_id="cleanup_all_done",
        trigger_rule="all_done",
    )
    
    # Task with trigger_rule="one_failed": runs only if an upstream task failed (otherwise skipped)
    handle_failure = EmptyOperator(
        task_id="handle_failure",
        trigger_rule="one_failed",
    )
    
    # Task with trigger_rule="none_failed_min_one_success": runs if nothing failed and at least one upstream succeeded
    final_task = EmptyOperator(
        task_id="final_task",
        trigger_rule="none_failed_min_one_success",
    )
    
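    # `end` is the only leaf task and always runs, so the DAG run may be marked
    # successful even if might_fail failed.
    end = EmptyOperator(
        task_id="end",
        trigger_rule="all_done",  # Always runs
    )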
    
    # Define dependencies
    start >> [might_fail, always_succeed]
    [might_fail, always_succeed] >> cleanup_all_done
    might_fail >> handle_failure
    [cleanup_all_done, handle_failure] >> final_task >> end

# Create DAG
trigger_rules_dag_instance = trigger_rules_dag()

print("✅ Trigger Rules DAG created!")
print(f"Tasks: {[task.task_id for task in trigger_rules_dag_instance.tasks]}")
print("\n📊 Trigger Rules:")
print("  - cleanup_all_done: all_done (runs regardless of status)")
print("  - handle_failure: one_failed (runs if might_fail failed)")
print("  - final_task: none_failed_min_one_success (runs if nothing failed and at least one succeeded)")
print("  - end: all_done (always runs)")


## 5. Dynamic Task Mapping - Creating Tasks Dynamically

Dynamic Task Mapping creates multiple task instances from a single task definition at runtime, based on input data. It is especially useful for:
- Processing multiple files
- Batch operations
- Parallel processing
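The number of mapped instances follows from the input. In Airflow, constant arguments are passed with `.partial()` and mapped ones with `.expand()`; expanding over several keyword arguments maps over their cross product. The plain-Python sketch below only counts and computes what those instances would produce; `process` is a hypothetical function, and nothing here calls Airflow.

```python
# Plain-Python sketch of how many mapped task instances an expansion yields.
# `process` is a hypothetical function; no Airflow objects are involved.
from itertools import product


def process(item: dict, multiplier: int) -> dict:
    return {"item_id": item["id"], "result": item["value"] * multiplier}


items = [{"id": 1, "value": 100}, {"id": 2, "value": 200}, {"id": 3, "value": 300}]

# Like expand(item=items) with a constant multiplier: one instance per element
single = [process(item, 2) for item in items]

# Like expand(item=items, multiplier=[2, 10]): cross product -> 3 x 2 instances
cross = [process(item, m) for item, m in product(items, [2, 10])]

print(len(single), sum(r["result"] for r in single))  # 3 1200
print(len(cross))  # 6
```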


In [None]:
# DAG with dynamic task mapping
@dag(
    dag_id="dynamic_task_mapping_example",
    schedule=None,
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    catchup=False,
    tags=["dynamic", "mapping"],
)
def dynamic_mapping_dag():
    """
    ### Dynamic Task Mapping Example
    Sample DAG that uses dynamic task mapping to create tasks at runtime.
    """
    
    # Task that builds the list of items to process
    @task
    def get_items_to_process():
        """Return the list of items to process."""
        items = [
            {"id": 1, "name": "item_1", "value": 100},
            {"id": 2, "name": "item_2", "value": 200},
            {"id": 3, "name": "item_3", "value": 300},
        ]
        print(f"Generated {len(items)} items to process")
        return items
    
    # Mapped task: Airflow creates one task instance per input item
    @task
    def process_item(item: dict):
        """Process a single item; mapped over every item."""
        print(f"Processing {item['name']} with value {item['value']}")
        result = item['value'] * 2
        print(f"Result: {result}")
        return {"item_id": item['id'], "result": result}
    
    # Task that aggregates the results
    @task
    def aggregate_results(results: list):
        """Aggregate the results from all mapped task instances."""
        total = sum(r['result'] for r in results)
        print(f"Aggregated total: {total}")
        return total
    
    # Define the workflow with dynamic mapping
    items = get_items_to_process()
    # .expand() creates one process_item instance per item at runtime
    processed_items = process_item.expand(item=items)
    aggregate_results(processed_items)

# Create DAG
dynamic_mapping_dag_instance = dynamic_mapping_dag()

print("✅ Dynamic Task Mapping DAG created!")
print("\n💡 At runtime, dynamic mapping creates:")
print("  - 1 task instance for get_items_to_process")
print("  - 3 task instances for process_item (one per item)")
print("  - 1 task instance for aggregate_results")


## 6. TaskGroups - Organizing Tasks

TaskGroups let you group related tasks together to:
- Make the graph easier to read in the UI
- Organize code more clearly
- Apply shared settings (such as `default_args`) to a group of tasks
- Build sub-workflows
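When two groups are chained (for example `extract_group >> transform_group`), each leaf task of the first group (a task with no downstream task inside that group) is connected to each root task of the second (a task with no upstream task inside it). The toy model below reproduces that rule in plain Python for the groups used in the next cell; `roots`, `leaves`, and `connect_groups` are hypothetical helpers, not Airflow APIs.

```python
# Toy model (not Airflow code) of `group_a >> group_b`: every leaf of the first
# group gets an edge to every root of the second group.
def roots(tasks: set[str], edges: set[tuple[str, str]]) -> set[str]:
    """Tasks in the group with no upstream task inside the group."""
    return {t for t in tasks if not any(dst == t and src in tasks for src, dst in edges)}


def leaves(tasks: set[str], edges: set[tuple[str, str]]) -> set[str]:
    """Tasks in the group with no downstream task inside the group."""
    return {t for t in tasks if not any(src == t and dst in tasks for src, dst in edges)}


def connect_groups(group_a, group_b, edges):
    """Return the cross-group edges that `group_a >> group_b` would add."""
    return {(leaf, root) for leaf in leaves(group_a, edges) for root in roots(group_b, edges)}


# Task ids are prefixed with the group id, as TaskGroup does by default.
extract = {
    "extract_group.extract_source_a",
    "extract_group.extract_source_b",
    "extract_group.extract_source_c",
}
transform = {"transform_group.transform_a", "transform_group.transform_b"}
load = {"load_group.load_destination_a", "load_group.load_destination_b"}
edges = {
    ("extract_group.extract_source_a", "extract_group.extract_source_b"),
    ("extract_group.extract_source_a", "extract_group.extract_source_c"),
    ("transform_group.transform_a", "transform_group.transform_b"),
}

for src, dst in sorted(connect_groups(extract, transform, edges)):
    print(src, "->", dst)
# extract_group.extract_source_b -> transform_group.transform_a
# extract_group.extract_source_c -> transform_group.transform_a
```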


In [None]:
# DAG with TaskGroups
@dag(
    dag_id="taskgroup_example",
    schedule=None,
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    catchup=False,
    tags=["taskgroup", "organization"],
)
def taskgroup_dag():
    """
    ### TaskGroup Example
    Sample DAG that uses TaskGroups to organize tasks.
    """
    
    start = EmptyOperator(task_id="start")
    
    # TaskGroup 1: Data Extraction
    with TaskGroup("extract_group") as extract_group:
        extract_a = EmptyOperator(task_id="extract_source_a")
        extract_b = EmptyOperator(task_id="extract_source_b")
        extract_c = EmptyOperator(task_id="extract_source_c")
        
        # Dependencies inside the group
        extract_a >> [extract_b, extract_c]
    
    # TaskGroup 2: Data Transformation
    with TaskGroup("transform_group") as transform_group:
        transform_a = EmptyOperator(task_id="transform_a")
        transform_b = EmptyOperator(task_id="transform_b")
        
        transform_a >> transform_b
    
    # TaskGroup 3: Data Loading
    with TaskGroup("load_group") as load_group:
        load_a = EmptyOperator(task_id="load_destination_a")
        load_b = EmptyOperator(task_id="load_destination_b")
        
        # No dependency between load_a and load_b, so they can run in parallel
    
    end = EmptyOperator(task_id="end")
    
    # Define dependencies between groups
    start >> extract_group >> transform_group >> load_group >> end

# Create DAG
taskgroup_dag_instance = taskgroup_dag()

print("✅ TaskGroup DAG created!")
print(f"Tasks: {[task.task_id for task in taskgroup_dag_instance.tasks]}")
print("\n📊 Task Groups (task_ids are prefixed with the group_id, e.g. extract_group.extract_source_a):")
print("  - extract_group: extract_source_a → [extract_source_b, extract_source_c]")
print("  - transform_group: transform_a → transform_b")
print("  - load_group: [load_destination_a, load_destination_b] (parallel)")


## 7. Complex Branching Scenario - Real-world Example

This section builds a more complex workflow with multiple branches and conditions to show how branching is used in practice.
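A frequent branching bug is returning a string that is not the task_id of a task directly downstream of the branch. One way to guard against it, sketched below, is to keep the routing in a small pure function and check its return values against the known downstream task_ids. `route_by_quality` and `DOWNSTREAM_TASK_IDS` are hypothetical names; the thresholds match the `check_data_quality` callable in the next cell.

```python
# Sketch: separate scoring from routing so the routing can be tested.
# `route_by_quality` and DOWNSTREAM_TASK_IDS are hypothetical names.
DOWNSTREAM_TASK_IDS = {"high_quality_process", "medium_quality_process", "low_quality_process"}


def route_by_quality(quality_score: float) -> str:
    """Map a quality score to the task_id of the branch to follow."""
    if quality_score >= 0.9:
        return "high_quality_process"
    if quality_score >= 0.8:
        return "medium_quality_process"
    return "low_quality_process"


for score in (0.95, 0.9, 0.85, 0.75):
    branch = route_by_quality(score)
    # A branch callable must return ids of tasks directly downstream of it.
    assert branch in DOWNSTREAM_TASK_IDS, f"unknown branch {branch!r}"
    print(score, "->", branch)
```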


In [None]:
# Complex Branching Scenario
@dag(
    dag_id="complex_branching_scenario",
    schedule="@daily",
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    catchup=False,
    tags=["complex", "branching", "real-world"],
)
def complex_branching_dag():
    """
    ### Complex Branching Scenario
    Sample DAG with more complex branching logic for a realistic use case.
    """
    
    start = EmptyOperator(task_id="start")
    
    # Check data quality
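    def check_data_quality(**context):
        """Check data quality (simulated) and return the task_id to follow."""
        # Simulate a data quality score between 0.7 and 1.0
        import random
        quality_score = random.uniform(0.7, 1.0)
        
        # Return values must be the task_ids of tasks directly downstream
        # of this branch task
        if quality_score >= 0.9:
            return "high_quality_process"
        elif quality_score >= 0.8:
            return "medium_quality_process"
        else:
            return "low_quality_process"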
    
    quality_check = BranchPythonOperator(
        task_id="check_data_quality",
        python_callable=check_data_quality,
    )
    
    # Different processing paths based on quality
    high_quality_process = EmptyOperator(task_id="high_quality_process")
    medium_quality_process = EmptyOperator(task_id="medium_quality_process")
    low_quality_process = EmptyOperator(task_id="low_quality_process")
    
    # Data cleaning for low quality
    clean_data = EmptyOperator(task_id="clean_data")
    low_quality_process >> clean_data
    
    # Join after processing
    join_processing = EmptyOperator(
        task_id="join_processing",
        trigger_rule="none_failed_min_one_success",
    )
    
    # Final validation
    validate_output = EmptyOperator(task_id="validate_output")
    
    # Notification based on result
    def send_notification(**context):
        """Send a notification about the run (placeholder)."""
        # Placeholder: a real implementation would inspect upstream results
        # (for example via XCom) before deciding what to send.
        print("Sending notification...")
        return "notification_sent"
    
    send_notif = PythonOperator(
        task_id="send_notification",
        python_callable=send_notification,
    )
    
    end = EmptyOperator(
        task_id="end",
        trigger_rule="all_done",
    )
    
    # Define dependencies
    start >> quality_check
    quality_check >> high_quality_process >> join_processing
    quality_check >> medium_quality_process >> join_processing
    quality_check >> low_quality_process
    clean_data >> join_processing
    join_processing >> validate_output >> send_notif >> end

# Create DAG
complex_branching_dag_instance = complex_branching_dag()

print("✅ Complex Branching DAG created!")
print(f"Tasks: {[task.task_id for task in complex_branching_dag_instance.tasks]}")
print("\n📊 Workflow:")
print("  start → check_data_quality → [high/medium/low_quality_process]")
print("  low_quality_process → clean_data → join_processing")
print("  join_processing → validate_output → send_notification → end")


## 8. Summary and Next Steps

### ✅ What you learned:
1. Bitshift operators (>>, <<) - defining dependencies
2. BranchPythonOperator - conditional branching
3. Trigger rules - handling failed and skipped tasks
4. Dynamic Task Mapping - creating tasks dynamically
5. TaskGroups - organizing tasks
6. Complex branching scenarios - real-world use cases
7. Best practices and common pitfalls

### 📚 Next Lab:
- **Lab 5**: XCom and Data Sharing
  - XCom push/pull
  - Task return values
  - Custom XCom backends
  - Data passing best practices

### 🔗 Useful Links:
- [Task Dependencies](https://airflow.apache.org/docs/apache-airflow/3.1.1/core-concepts/dags.html#task-dependencies)
- [Branching](https://airflow.apache.org/docs/apache-airflow/3.1.1/core-concepts/dags.html#branching)
- [Trigger Rules](https://airflow.apache.org/docs/apache-airflow/3.1.1/core-concepts/tasks.html#trigger-rules)
- [Dynamic Task Mapping](https://airflow.apache.org/docs/apache-airflow/3.1.1/core-concepts/dynamic-task-mapping.html)
- [TaskGroups](https://airflow.apache.org/docs/apache-airflow/3.1.1/core-concepts/taskgroup.html)

### 💡 Exercises:
1. Create a DAG that branches based on data quality
2. Use dynamic mapping to process multiple files
3. Create TaskGroups for an ETL pipeline
4. Implement error handling with trigger rules
5. Build a complex workflow with multiple branches
