# Lab 1: Airflow Basics - Introduction to Apache Airflow

## 🎯 Objectives
- Understand the architecture and components of Apache Airflow
- Learn how to use the Airflow web UI
- Get familiar with Airflow CLI commands
- Create and run your first DAG

## 📋 Prerequisites
- An Airflow cluster is running (`docker compose up -d`)
- Access to Airflow UI at http://localhost:8080
- Basic understanding of Python and workflows

## 🏗️ Airflow Architecture Overview
```
┌────────────┐
│ Web UI     │  Port 8080 - Manage and monitor DAGs
└─────┬──────┘
      │
┌─────▼──────┐
│ Scheduler  │  Schedule and trigger tasks
└─────┬──────┘
      │
┌─────▼──────┐
│ Executor   │  Execute tasks (LocalExecutor)
└─────┬──────┘
      │
┌─────▼──────┐
│ Metadata   │  PostgreSQL - Store metadata
└────────────┘
```

## 📚 Key Concepts
- **DAG (Directed Acyclic Graph)**: Workflow defined in Python
- **Task**: Smallest unit of work in a DAG
- **Operator**: Template for a task (BashOperator, PythonOperator, etc.)
- **Scheduler**: Component that schedules and triggers DAG runs
- **Executor**: Component that executes tasks
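
To make the "directed acyclic graph" idea concrete, here is a minimal sketch that uses only the Python standard library, not Airflow itself. The task names are hypothetical. It shows that a set of tasks with dependencies has a valid run order only when the graph contains no cycle, which is the guarantee a scheduler relies on.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task names; each key maps a task to the set of tasks it depends on
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}


def run_order(deps):
    """Return one valid execution order for a dependency mapping."""
    return list(TopologicalSorter(deps).static_order())


print(run_order(dependencies))
# ['extract', 'transform', 'load', 'notify']

# A cycle has no valid order, so the sorter refuses it
try:
    run_order({"a": {"b"}, "b": {"a"}})
except CycleError as exc:
    print(f"Not a DAG: {exc.args[0]}")
```

Airflow performs a comparable check when it parses a DAG file, so a dependency loop is reported as an import error rather than at run time.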


## 1. Check Airflow Installation


In [None]:
# Check Airflow version and connection
import subprocess
import requests
import json

# Check Airflow CLI
try:
    result = subprocess.run(
        ["docker", "compose", "exec", "-T", "airflow-webserver", "airflow", "version"],
        capture_output=True,
        text=True,
        timeout=10
    )
    if result.returncode == 0:
        print("✅ Airflow CLI accessible")
        print(result.stdout)
    else:
        # A non-zero exit code means the command ran but failed
        print(f"⚠️  Airflow CLI returned exit code {result.returncode}")
        print(result.stderr)
except Exception as e:
    print(f"⚠️  Airflow CLI check failed: {e}")
    print("💡 Make sure Airflow is running: docker compose up -d")

# Check Airflow UI
try:
    response = requests.get("http://localhost:8080/health", timeout=5)
    if response.status_code == 200:
        print("\n✅ Airflow Web UI is accessible at http://localhost:8080")
        print(f"   Health check: {response.json()}")
    else:
        print(f"\n⚠️  Airflow UI returned status code: {response.status_code}")
except Exception as e:
    print(f"\n⚠️  Cannot connect to Airflow UI: {e}")
    print("💡 Make sure Airflow webserver is running: docker compose up -d")
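
A 200 response from the health endpoint does not by itself mean every component is healthy, because the JSON body reports a status per component. The sketch below assumes the payload is a mapping of component name to a dict with a `status` field, which is the shape commonly returned by Airflow's health endpoint. The sample payload is illustrative, not captured from a real cluster, and `unhealthy_components` is a hypothetical helper name.

```python
def unhealthy_components(health):
    """Return the names of components whose status is not 'healthy'."""
    return sorted(
        name
        for name, info in health.items()
        if isinstance(info, dict) and info.get("status") != "healthy"
    )


# Illustrative payload (assumed shape), not real cluster output
sample = {
    "metadatabase": {"status": "healthy"},
    "scheduler": {"status": "unhealthy", "latest_scheduler_heartbeat": None},
}

print(unhealthy_components(sample))
# ['scheduler']
```

In the cell above, the helper could be applied to `response.json()` to flag a stopped scheduler even when the web server itself responds.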


## 2. List DAGs


In [None]:
# List all DAGs
result = subprocess.run(
    ["docker", "compose", "exec", "-T", "airflow-webserver", "airflow", "dags", "list"],
    capture_output=True,
    text=True,
    timeout=10
)

print("📋 Available DAGs:")
print("=" * 60)
print(result.stdout)
print("=" * 60)

# Parse the table output; columns are separated by "|"
lines = result.stdout.strip().split('\n')
dag_list = []
for line in lines[2:]:  # Skip the header row and the separator row
    if line.strip():
        parts = [part.strip() for part in line.split('|')]
        if len(parts) >= 2:
            dag_list.append({
                'dag_id': parts[0],
                # Assumes the owner is the third column; the column order can vary by Airflow version
                'owner': parts[2] if len(parts) > 2 else 'N/A',
                'status': parts[-1] if len(parts) > 2 else 'N/A'
            })

if dag_list:
    print(f"\n✅ Found {len(dag_list)} DAG(s):")
    for dag in dag_list:
        print(f"   - {dag['dag_id']} (Owner: {dag['owner']})")
else:
    print("\n⚠️  No DAGs found. Make sure DAGs are in the dags/ directory.")
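
Parsing the human-readable table is fragile because column order and widths can change. The Airflow CLI also accepts `--output json` on `airflow dags list`, which is easier to consume. The sketch below parses that JSON from a string. The key names are assumptions (they differ between Airflow versions, so both spellings are tried), `parse_dags_json` is a hypothetical helper, and the sample string is illustrative.

```python
import json


def parse_dags_json(stdout):
    """Parse the JSON printed by `airflow dags list --output json`."""
    rows = json.loads(stdout)
    return [
        {
            "dag_id": row["dag_id"],
            # Key names differ between Airflow versions, so try both spellings
            "owner": row.get("owners", row.get("owner", "N/A")),
            "paused": row.get("is_paused", row.get("paused", "N/A")),
        }
        for row in rows
    ]


# Illustrative output, not captured from a real cluster
sample_stdout = '[{"dag_id": "hello_world", "owner": "airflow", "paused": "True"}]'

for dag in parse_dags_json(sample_stdout):
    print(f"{dag['dag_id']} (Owner: {dag['owner']}, Paused: {dag['paused']})")
```

To use it with the cell above, append `"--output", "json"` to the command list and pass `result.stdout` to the helper. If the container prints warnings before the JSON, those lines would need to be stripped first.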


## 3. View DAG Details


In [None]:
# View details of hello_world DAG
dag_id = "hello_world"

print(f"📊 DAG Details: {dag_id}")
print("=" * 60)

# Show DAG structure
result = subprocess.run(
    ["docker", "compose", "exec", "-T", "airflow-webserver", "airflow", "dags", "show", dag_id],
    capture_output=True,
    text=True,
    timeout=10
)

if result.returncode == 0:
    print(result.stdout)
else:
    print(f"⚠️  DAG '{dag_id}' not found or error occurred")
    print(result.stderr)
    print("\n💡 Available DAGs:")
    subprocess.run(
        ["docker", "compose", "exec", "-T", "airflow-webserver", "airflow", "dags", "list"],
        timeout=10
    )
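
When no image or save option is given, `airflow dags show` prints the DAG as Graphviz DOT text. As a rough sketch, the task dependencies can be pulled out of that text with a regular expression. This assumes simple `a -> b` edge lines with plain or quoted task IDs; it is not a full DOT parser, and both the sample text and the `dot_edges` helper are hypothetical.

```python
import re

# Matches `upstream -> downstream`, with optional double quotes around each ID
EDGE = re.compile(r'"?([\w.\-]+)"?\s*->\s*"?([\w.\-]+)"?')


def dot_edges(dot_text):
    """Return (upstream, downstream) pairs found in DOT text."""
    return [(m.group(1), m.group(2)) for m in EDGE.finditer(dot_text)]


# Illustrative DOT text with hypothetical task IDs
sample_dot = """
digraph hello_world {
    say_hello -> say_goodbye
    say_goodbye -> finish
}
"""

for upstream, downstream in dot_edges(sample_dot):
    print(f"{upstream} runs before {downstream}")
```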


## 4. Using Airflow REST API

Airflow provides a REST API for programmatic interaction. We will use the API to:
- Get DAG information
- Trigger DAG runs
- View task status


In [None]:
# Configure Airflow API
AIRFLOW_URL = "http://localhost:8080"
AIRFLOW_USERNAME = "airflow"
AIRFLOW_PASSWORD = "airflow"

# Helper function to call API
def airflow_api_call(endpoint, method="GET", data=None):
    """Call Airflow REST API"""
    url = f"{AIRFLOW_URL}/api/v1/{endpoint}"
    auth = (AIRFLOW_USERNAME, AIRFLOW_PASSWORD)
    
    try:
        if method == "GET":
            response = requests.get(url, auth=auth, timeout=10)
        elif method == "POST":
            response = requests.post(url, auth=auth, json=data, timeout=10)
        else:
            raise ValueError(f"Unsupported method: {method}")
        
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as e:
        print(f"❌ API call failed: {e}")
        return None

# Get list of DAGs from API
print("📋 Getting DAGs from REST API...")
dag_list_response = airflow_api_call("dags")

if dag_list_response:
    dags = dag_list_response.get("dags", [])
    print(f"\n✅ Found {len(dags)} DAG(s) via API:")
    for dag in dags[:10]:  # Show first 10
        print(f"   - {dag['dag_id']} (Is Paused: {dag.get('is_paused', False)})")
    
    if len(dags) > 10:
        print(f"   ... and {len(dags) - 10} more")
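
The list endpoints of the REST API are paginated, so a single call may return only the first page of DAGs. The sketch below assumes the response carries a `dags` list and a `total_entries` count and that the endpoint accepts `limit` and `offset` query parameters. `fetch_all` and the fake page function are hypothetical names used so the example runs without a cluster.

```python
def fetch_all(fetch_page, page_size=100):
    """Collect every DAG by requesting pages until total_entries is reached."""
    items, offset = [], 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        batch = page.get("dags", [])
        items.extend(batch)
        offset += len(batch)
        # Stop on an empty page or once all reported entries are collected
        if not batch or offset >= page.get("total_entries", 0):
            return items


# Stand-in for the real API so the sketch runs offline
FAKE_DAGS = [{"dag_id": f"dag_{i}"} for i in range(5)]


def fake_fetch_page(limit, offset):
    return {"dags": FAKE_DAGS[offset:offset + limit], "total_entries": len(FAKE_DAGS)}


all_dags = fetch_all(fake_fetch_page, page_size=2)
print(len(all_dags))
# 5
```

With the helper from the cell above, `fetch_page` could be `lambda limit, offset: airflow_api_call(f"dags?limit={limit}&offset={offset}")`. Because `airflow_api_call` returns `None` on failure, a real version would need to handle that case before calling `.get`.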


## 5. Trigger DAG Run

We can trigger a DAG run through the REST API or the CLI.


In [None]:
# Trigger DAG run via REST API
from datetime import datetime

dag_id = "hello_world"

print(f"🚀 Triggering DAG: {dag_id}")

# Trigger DAG with a unique, timestamp-based run ID
trigger_response = airflow_api_call(
    f"dags/{dag_id}/dagRuns",
    method="POST",
    data={
        "dag_run_id": f"manual_run_{datetime.now().strftime('%Y%m%d_%H%M%S')}",
        "conf": {}
    }
)

if trigger_response:
    print("✅ DAG run triggered successfully!")
    print(f"   DAG Run ID: {trigger_response.get('dag_run_id')}")
    print(f"   State: {trigger_response.get('state')}")
    print("\n💡 Check the Airflow UI to see the DAG run: http://localhost:8080")
else:
    print("⚠️  Failed to trigger DAG run")
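
The cell above covers the REST API path. For the CLI path, a possible sketch is to build the `airflow dags trigger` command and hand it to `subprocess.run`, as the earlier cells do. The helper name `build_trigger_command` is hypothetical, and the service name `airflow-webserver` is taken from the earlier cells and may differ in your compose file. The `--run-id` and `--conf` options belong to the `airflow dags trigger` command.

```python
import json


def build_trigger_command(dag_id, run_id=None, conf=None,
                          service="airflow-webserver"):
    """Build the docker compose command that triggers a DAG via the CLI."""
    cmd = ["docker", "compose", "exec", "-T", service,
           "airflow", "dags", "trigger", dag_id]
    if run_id:
        cmd += ["--run-id", run_id]
    if conf:
        # The CLI expects the run configuration as a JSON string
        cmd += ["--conf", json.dumps(conf)]
    return cmd


print(" ".join(build_trigger_command("hello_world", run_id="manual_run_demo")))
```

Building the command as a list rather than a shell string avoids quoting problems when `conf` contains spaces or quotes. The result can be passed directly to `subprocess.run(cmd, capture_output=True, text=True)`.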


## 6. View DAG Runs and Task Status


In [None]:
import pandas as pd
import time

dag_id = "hello_world"

# Wait a bit for DAG run to be created
print("⏳ Waiting for DAG run to be created...")
time.sleep(2)

# Get list of DAG runs
print(f"\n📊 Getting DAG runs for: {dag_id}")
dag_runs_response = airflow_api_call(f"dags/{dag_id}/dagRuns?limit=5")

if dag_runs_response:
    dag_runs = dag_runs_response.get("dag_runs", [])
    
    if dag_runs:
        print(f"\n✅ Found {len(dag_runs)} DAG run(s):")
        print("=" * 80)
        
        for run in dag_runs:
            print(f"\nDAG Run ID: {run.get('dag_run_id')}")
            print(f"  State: {run.get('state')}")
            print(f"  Start Date: {run.get('start_date')}")
            print(f"  End Date: {run.get('end_date') or 'N/A'}")  # null while the run is still in progress
            print(f"  Duration: {run.get('duration', 'N/A')} seconds")
        
        # Get task instances for the latest DAG run
        latest_run_id = dag_runs[0]['dag_run_id']
        print(f"\n📋 Task instances for DAG run: {latest_run_id}")
        
        tasks_response = airflow_api_call(
            f"dags/{dag_id}/dagRuns/{latest_run_id}/taskInstances"
        )
        
        if tasks_response:
            tasks = tasks_response.get("task_instances", [])
            print(f"\n✅ Found {len(tasks)} task instance(s):")
            for task in tasks:
                print(f"   - {task.get('task_id')}: {task.get('state')}")
    else:
        print("⚠️  No DAG runs found")
        print("💡 Trigger a DAG run first or wait for scheduled runs")
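
A fixed `time.sleep(2)` only gives the run time to appear; it does not wait for it to finish. One possible alternative is to poll the run state until it reaches a terminal value. In the sketch below, `wait_for_terminal_state` and its parameters are hypothetical, and `get_state` stands in for a call that reads `state` from the DAG run endpoint. The state sequence is simulated so the example runs without a cluster.

```python
import time

# DAG run states that will not change any further
TERMINAL_STATES = {"success", "failed"}


def wait_for_terminal_state(get_state, timeout=60.0, interval=2.0, sleep=time.sleep):
    """Poll get_state() until it returns a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"DAG run still in state {state!r} after {timeout}s")
        sleep(interval)


# Simulated sequence of states; sleeping is disabled to keep the demo instant
states = iter(["queued", "running", "running", "success"])
print(wait_for_terminal_state(lambda: next(states), sleep=lambda _: None))
# success
```

Against a real cluster, `get_state` could wrap `airflow_api_call(f"dags/{dag_id}/dagRuns/{run_id}")` and return the `state` field of the response, or `None` if the call failed.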


## 7. Summary and Next Steps

### ✅ What we learned:
1. Basic Airflow architecture
2. How to use Airflow CLI
3. How to use Airflow REST API
4. How to trigger and monitor DAG runs

### 📚 Next Lab:
- **Lab 2**: DAGs and Tasks - Learn how to create DAGs with the Task SDK
- Create DAGs with @dag and @task decorators
- Define task dependencies
- Handle errors and retries

### 🔗 Useful Links:
- [Airflow Documentation](https://airflow.apache.org/docs/apache-airflow/3.1.1/)
- [Airflow Task SDK](https://airflow.apache.org/docs/apache-airflow/3.1.1/task-sdk/index.html)
- [Airflow REST API](https://airflow.apache.org/docs/apache-airflow/3.1.1/stable-rest-api-ref.html)
