```xml
<VSCode.Cell language="markdown">
# Distributed Patterns for Microservices

Essential patterns for managing distributed transactions, resilience, and observability in microservices.

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                    DISTRIBUTED PATTERNS                                      │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   TRANSACTIONS           RESILIENCE              OBSERVABILITY              │
│   ────────────           ──────────              ─────────────              │
│                                                                              │
│   ┌────────────┐         ┌────────────┐          ┌────────────┐             │
│   │   SAGA     │         │  Circuit   │          │ Distributed│             │
│   │  Pattern   │         │  Breaker   │          │  Tracing   │             │
│   └────────────┘         └────────────┘          └────────────┘             │
│                                                                              │
│   ┌────────────┐         ┌────────────┐          ┌────────────┐             │
│   │ Two-Phase  │         │  Bulkhead  │          │ Correlation│             │
│   │  Commit    │         │   Pattern  │          │    IDs     │             │
│   └────────────┘         └────────────┘          └────────────┘             │
│                                                                              │
│   ┌────────────┐         ┌────────────┐          ┌────────────┐             │
│   │Idempotency │         │   Retry    │          │  Metrics   │             │
│   │   Keys     │         │ + Backoff  │          │  & Logs    │             │
│   └────────────┘         └────────────┘          └────────────┘             │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
```
</VSCode.Cell>
<VSCode.Cell language="markdown">
## The Saga Pattern

Managing distributed transactions without two-phase commit.

```
PROBLEM: Cross-service transactions

  Order Service           Payment Service          Inventory Service
       │                        │                        │
       │   ┌─ Transaction ─┐    │                        │
       │   │ Create Order  │    │                        │
       │   │ Charge Card ──┼────┼─▶ Fails!               │
       │   │ Reserve Stock ┼────┼────────────────────────┼─▶ Never happens
       │   └───────────────┘    │                        │
       │                        │                        │
       ▼                        ▼                        ▼
  Order created           Charge failed            Stock not reserved
  but invalid!            no payment!              INCONSISTENT!


SOLUTION: Saga - sequence of local transactions with compensating actions

SAGA TYPES:
┌─────────────────────────────────────────────────────────────────────────────┐
│                                                                              │
│  CHOREOGRAPHY                         ORCHESTRATION                         │
│  ─────────────                        ─────────────                         │
│                                                                              │
│  Services react to events             Central orchestrator coordinates      │
│                                                                              │
│  Order ──event──▶ Payment             Orchestrator                          │
│    │                 │                     │                                │
│    │              event                    ├──▶ Order                       │
│    │                 │                     ├──▶ Payment                     │
│    └◀── event ◀──────┘                     └──▶ Inventory                   │
│         (decentralized)                        (centralized control)        │
│                                                                              │
│  ✓ Simple, decoupled                  ✓ Easier to understand                │
│  ✗ Hard to track, debug               ✓ Explicit failure handling           │
│  ✗ Cyclic dependencies risk           ✗ Single point of failure             │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
```
</VSCode.Cell>
<VSCode.Cell language="markdown">
## Orchestration Saga Example

```
ORDER SAGA: Create Order → Reserve Inventory → Charge Payment → Confirm

HAPPY PATH:
┌───────────────────────────────────────────────────────────────────────────┐
│                                                                            │
│  ┌────────────┐     ┌────────────┐     ┌────────────┐     ┌────────────┐  │
│  │   Create   │────▶│  Reserve   │────▶│   Charge   │────▶│  Confirm   │  │
│  │   Order    │     │  Inventory │     │  Payment   │     │   Order    │  │
│  └────────────┘     └────────────┘     └────────────┘     └────────────┘  │
│                                                                            │
└───────────────────────────────────────────────────────────────────────────┘

FAILURE + COMPENSATION (Payment fails):
┌───────────────────────────────────────────────────────────────────────────┐
│                                                                            │
│  ┌────────────┐     ┌────────────┐     ┌────────────┐                     │
│  │   Create   │────▶│  Reserve   │────▶│   Charge   │──── ✗ FAILS         │
│  │   Order    │     │  Inventory │     │  Payment   │                     │
│  └────────────┘     └────────────┘     └────────────┘                     │
│        │                  │                                                │
│        │                  │          COMPENSATING ACTIONS                  │
│        │                  │          ────────────────────                  │
│        ▼                  ▼                                                │
│  ┌────────────┐     ┌────────────┐                                        │
│  │   Cancel   │◀────│  Release   │     Rollback in reverse order          │
│  │   Order    │     │  Inventory │                                        │
│  └────────────┘     └────────────┘                                        │
│                                                                            │
└───────────────────────────────────────────────────────────────────────────┘
```
</VSCode.Cell>
<VSCode.Cell language="python">
# Saga Pattern: Orchestration Implementation

from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Optional, Any
from uuid import uuid4
import time

class SagaStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    COMPENSATING = "compensating"
    FAILED = "failed"

class StepStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    COMPENSATED = "compensated"
    FAILED = "failed"

@dataclass
class SagaStep:
    """A step in the saga with action and compensation."""
    name: str
    action: Callable[[Dict], Dict]
    compensation: Callable[[Dict], None]
    status: StepStatus = StepStatus.PENDING
    result: Optional[Dict] = None

@dataclass
class Saga:
    """Saga orchestrator managing distributed transaction."""
    saga_id: str
    steps: List[SagaStep]
    status: SagaStatus = SagaStatus.PENDING
    context: Dict = field(default_factory=dict)
    completed_steps: List[str] = field(default_factory=list)
    
    def execute(self) -> bool:
        """Execute saga steps in order, compensate on failure."""
        self.status = SagaStatus.RUNNING
        
        for i, step in enumerate(self.steps):
            print(f"[SAGA {self.saga_id[:8]}] Executing: {step.name}")
            step.status = StepStatus.RUNNING
            
            try:
                result = step.action(self.context)
                step.result = result
                self.context.update(result or {})
                step.status = StepStatus.COMPLETED
                self.completed_steps.append(step.name)
                print(f"[SAGA {self.saga_id[:8]}] ✓ {step.name} completed")
                
            except Exception as e:
                print(f"[SAGA {self.saga_id[:8]}] ✗ {step.name} failed: {e}")
                step.status = StepStatus.FAILED
                self._compensate(i - 1)
                return False
        
        self.status = SagaStatus.COMPLETED
        print(f"[SAGA {self.saga_id[:8]}] Saga completed successfully")
        return True
    
    def _compensate(self, from_step: int):
        """Execute compensating actions in reverse order."""
        self.status = SagaStatus.COMPENSATING
        print(f"[SAGA {self.saga_id[:8]}] Starting compensation...")
        
        for i in range(from_step, -1, -1):
            step = self.steps[i]
            if step.status == StepStatus.COMPLETED:
                print(f"[SAGA {self.saga_id[:8]}] Compensating: {step.name}")
                try:
                    step.compensation(self.context)
                    step.status = StepStatus.COMPENSATED
                    print(f"[SAGA {self.saga_id[:8]}] ✓ {step.name} compensated")
                except Exception as e:
                    print(f"[SAGA {self.saga_id[:8]}] ✗ Compensation failed: {e}")
        
        self.status = SagaStatus.FAILED

# === SERVICE SIMULATORS ===

class OrderService:
    orders = {}
    
    @classmethod
    def create_order(cls, ctx: Dict) -> Dict:
        order_id = str(uuid4())[:8]
        cls.orders[order_id] = {"id": order_id, "status": "pending", **ctx}
        return {"order_id": order_id}
    
    @classmethod
    def cancel_order(cls, ctx: Dict):
        order_id = ctx.get("order_id")
        if order_id in cls.orders:
            cls.orders[order_id]["status"] = "cancelled"

class InventoryService:
    reservations = {}
    
    @classmethod
    def reserve_stock(cls, ctx: Dict) -> Dict:
        reservation_id = str(uuid4())[:8]
        cls.reservations[reservation_id] = {
            "order_id": ctx["order_id"],
            "product_id": ctx.get("product_id"),
            "quantity": ctx.get("quantity", 1)
        }
        return {"reservation_id": reservation_id}
    
    @classmethod
    def release_stock(cls, ctx: Dict):
        reservation_id = ctx.get("reservation_id")
        if reservation_id in cls.reservations:
            del cls.reservations[reservation_id]

class PaymentService:
    payments = {}
    should_fail = False  # For testing failures
    
    @classmethod
    def charge_payment(cls, ctx: Dict) -> Dict:
        if cls.should_fail:
            raise Exception("Payment declined")
        
        payment_id = str(uuid4())[:8]
        cls.payments[payment_id] = {
            "order_id": ctx["order_id"],
            "amount": ctx.get("amount", 0)
        }
        return {"payment_id": payment_id}
    
    @classmethod
    def refund_payment(cls, ctx: Dict):
        payment_id = ctx.get("payment_id")
        if payment_id in cls.payments:
            del cls.payments[payment_id]

# === SAGA FACTORY ===

def create_order_saga(customer_id: str, product_id: str, 
                      quantity: int, amount: float) -> Saga:
    """Create an order saga with all steps."""
    
    initial_context = {
        "customer_id": customer_id,
        "product_id": product_id,
        "quantity": quantity,
        "amount": amount
    }
    
    return Saga(
        saga_id=str(uuid4()),
        context=initial_context,
        steps=[
            SagaStep(
                name="Create Order",
                action=OrderService.create_order,
                compensation=OrderService.cancel_order
            ),
            SagaStep(
                name="Reserve Inventory",
                action=InventoryService.reserve_stock,
                compensation=InventoryService.release_stock
            ),
            SagaStep(
                name="Charge Payment",
                action=PaymentService.charge_payment,
                compensation=PaymentService.refund_payment
            ),
        ]
    )

# === DEMO ===
print("=== Saga Pattern Demo ===\n")

# Successful saga
print("--- Test 1: Successful Order ---")
saga1 = create_order_saga("cust-1", "prod-1", 2, 99.99)
success = saga1.execute()
print(f"Saga succeeded: {success}")
print(f"Context: {saga1.context}\n")

# Failed saga (payment fails)
print("--- Test 2: Failed Order (Payment Declined) ---")
PaymentService.should_fail = True
saga2 = create_order_saga("cust-2", "prod-2", 1, 49.99)
success = saga2.execute()
print(f"Saga succeeded: {success}")
print(f"Final status: {saga2.status.value}")

# Reset
PaymentService.should_fail = False
</VSCode.Cell>
<VSCode.Cell language="markdown">
## Choreography Saga with Events

```python
# Event-driven saga (choreography style)

from dataclasses import dataclass

# Event definitions
@dataclass
class OrderCreated:
    order_id: str
    customer_id: str
    items: list
    total: float

@dataclass  
class InventoryReserved:
    order_id: str
    reservation_id: str

@dataclass
class InventoryReservationFailed:
    order_id: str
    reason: str

@dataclass
class PaymentCompleted:
    order_id: str
    payment_id: str

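@dataclass
class PaymentFailed:
    order_id: str
    reason: str

@dataclass
class OrderCancelled:
    order_id: str

def publish(event) -> None:
    """Placeholder transport: a real service would send this to a message broker."""
    print(f"[EVENT] {event}")

# Event handlers in each service
# (reserve_stock, process_payment, etc. are each service's own business logic, omitted here)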
class InventoryEventHandler:
    def on_order_created(self, event: OrderCreated):
        try:
            reservation = self.reserve_stock(event.items)
            publish(InventoryReserved(event.order_id, reservation.id))
        except Exception as e:
            publish(InventoryReservationFailed(event.order_id, str(e)))

class PaymentEventHandler:
    def on_inventory_reserved(self, event: InventoryReserved):
        try:
            payment = self.process_payment(event.order_id)
            publish(PaymentCompleted(event.order_id, payment.id))
        except Exception as e:
            publish(PaymentFailed(event.order_id, str(e)))

class OrderEventHandler:
    def on_payment_completed(self, event: PaymentCompleted):
        self.confirm_order(event.order_id)
    
    def on_payment_failed(self, event: PaymentFailed):
        self.cancel_order(event.order_id)
        publish(OrderCancelled(event.order_id))

class InventoryCompensationHandler:
    def on_order_cancelled(self, event):
        self.release_reservation(event.order_id)
```
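
The handlers above rely on `publish` to route events between services. A minimal sketch of that wiring, assuming a synchronous in-process bus and trimmed-down event classes. The `EventBus` class and its `subscribe` method are hypothetical names used for illustration; a real deployment would typically put a message broker in this role.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List, Type


@dataclass
class OrderCreated:
    order_id: str


@dataclass
class InventoryReserved:
    order_id: str


class EventBus:
    """Hypothetical synchronous bus: routes each event to handlers by type."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[Type, List[Callable]] = defaultdict(list)
        self.log: List[str] = []  # Event names in publish order, for tracing

    def subscribe(self, event_type: Type, handler: Callable) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event) -> None:
        self.log.append(type(event).__name__)
        for handler in self._handlers[type(event)]:
            handler(event)


bus = EventBus()
confirmed: List[str] = []

# Inventory reacts to OrderCreated and emits the next event in the chain
bus.subscribe(OrderCreated, lambda e: bus.publish(InventoryReserved(e.order_id)))
# Order service reacts to InventoryReserved (payment step omitted for brevity)
bus.subscribe(InventoryReserved, lambda e: confirmed.append(e.order_id))

bus.publish(OrderCreated("order-1"))
print(bus.log)
print(confirmed)
```

No component here knows the whole flow: each subscriber sees only the event it reacts to, which is the decoupling (and the tracing difficulty) that choreography trades on.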
</VSCode.Cell>
<VSCode.Cell language="markdown">
## Circuit Breaker Pattern

Prevent cascading failures by "tripping" when a service fails repeatedly.

```
CIRCUIT BREAKER STATES:

     ┌────────┐   failures ≥ threshold   ┌────────┐
     │ CLOSED │ ────────────────────────▶│  OPEN  │
     │(normal)│                          │(reject)│
     └────────┘                          └────────┘
          ▲                                   │
          │                           timeout expires
          │     success      ┌───────────┐    ▼
          └──────────────────│ HALF-OPEN │◀───┘
                 failure ───▶│  (test)   │───▶ back to OPEN
                             └───────────┘

STATE BEHAVIOR:
┌────────────┬─────────────────────────────────────────────────────────────┐
│ CLOSED     │ Normal operation, requests pass through, failures counted  │
├────────────┼─────────────────────────────────────────────────────────────┤
│ OPEN       │ Fail-fast, reject immediately, don't call downstream       │
├────────────┼─────────────────────────────────────────────────────────────┤
│ HALF-OPEN  │ Allow a few trial requests to check if service recovered   │
└────────────┴─────────────────────────────────────────────────────────────┘

CONFIGURATION:
┌────────────────────────┬────────────────────────────────────────────────┐
│ failure_threshold      │ Number of failures before opening (e.g., 5)   │
├────────────────────────┼────────────────────────────────────────────────┤
│ success_threshold      │ Successes in half-open to close (e.g., 3)     │
├────────────────────────┼────────────────────────────────────────────────┤
│ timeout                │ Time in open state before half-open (e.g., 30s)│
├────────────────────────┼────────────────────────────────────────────────┤
│ failure_rate_threshold │ % of calls that fail (e.g., 50%)              │
└────────────────────────┴────────────────────────────────────────────────┘
```
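
The `failure_rate_threshold` row describes tripping on the share of failed calls rather than on a consecutive count. A minimal sketch of that check, assuming a fixed-size sliding window over recent outcomes. The `FailureRateWindow` class and its `minimum_calls` parameter are hypothetical and are not part of the implementation in this notebook.

```python
from collections import deque


class FailureRateWindow:
    """Tracks the last `size` call outcomes and reports the failure rate."""

    def __init__(self, size: int = 10, threshold: float = 0.5,
                 minimum_calls: int = 5):
        self._outcomes = deque(maxlen=size)  # True = failure, False = success
        self.threshold = threshold
        self.minimum_calls = minimum_calls

    def record(self, failed: bool) -> None:
        self._outcomes.append(failed)

    def failure_rate(self) -> float:
        if not self._outcomes:
            return 0.0
        return sum(self._outcomes) / len(self._outcomes)

    def should_open(self) -> bool:
        # Require a minimum sample so one early failure does not trip the circuit
        if len(self._outcomes) < self.minimum_calls:
            return False
        return self.failure_rate() >= self.threshold


window = FailureRateWindow(size=10, threshold=0.5, minimum_calls=5)
for failed in [False, True, False, True]:
    window.record(failed)
too_few_calls = window.should_open()  # Only 4 calls recorded so far

window.record(True)  # Fifth call fails: 3 of 5 failed
print(too_few_calls, window.failure_rate(), window.should_open())
```

A rate-based trigger tolerates occasional successes mixed into a failing stream, which a consecutive-failure counter would treat as a reset.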
</VSCode.Cell>
<VSCode.Cell language="python">
# Circuit Breaker Implementation

import time
from enum import Enum
from dataclasses import dataclass, field
from typing import Callable, Optional, Any
from functools import wraps
from datetime import datetime
import threading

class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

@dataclass
class CircuitBreakerConfig:
    """Configuration for circuit breaker."""
    failure_threshold: int = 5
    success_threshold: int = 3
    timeout_seconds: int = 30
    half_open_max_calls: int = 3

class CircuitBreaker:
    """
    Circuit Breaker with proper state management.
    Thread-safe implementation.
    """
    
    def __init__(self, name: str, config: Optional[CircuitBreakerConfig] = None):
        self.name = name
        self.config = config or CircuitBreakerConfig()
        self._state = CircuitState.CLOSED
        self._failure_count = 0
        self._success_count = 0
        self._last_failure_time: Optional[float] = None
        self._half_open_calls = 0
        self._lock = threading.Lock()
        
        # Metrics
        self.total_calls = 0
        self.total_failures = 0
        self.total_rejections = 0
    
    @property
    def state(self) -> CircuitState:
        with self._lock:
            self._check_state_transition()
            return self._state
    
    def _check_state_transition(self):
        """Check if state should transition based on timeout."""
        if self._state == CircuitState.OPEN:
            if self._last_failure_time:
                elapsed = time.time() - self._last_failure_time
                if elapsed >= self.config.timeout_seconds:
                    self._transition_to(CircuitState.HALF_OPEN)
    
    def _transition_to(self, new_state: CircuitState):
        old_state = self._state
        self._state = new_state
        
        if new_state == CircuitState.CLOSED:
            self._failure_count = 0
            self._success_count = 0
        elif new_state == CircuitState.HALF_OPEN:
            self._half_open_calls = 0
            self._success_count = 0
        elif new_state == CircuitState.OPEN:
            self._last_failure_time = time.time()
        
        print(f"[CB:{self.name}] {old_state.value} → {new_state.value}")
    
    def call(self, func: Callable, *args, **kwargs) -> Any:
        """Execute function through circuit breaker."""
        with self._lock:
            self._check_state_transition()
            
            if self._state == CircuitState.OPEN:
                self.total_rejections += 1
                raise CircuitOpenError(
                    f"Circuit {self.name} is OPEN. "
                    f"Retry after {self.config.timeout_seconds}s"
                )
            
            if self._state == CircuitState.HALF_OPEN:
                if self._half_open_calls >= self.config.half_open_max_calls:
                    self.total_rejections += 1
                    raise CircuitOpenError("Half-open call limit reached")
                self._half_open_calls += 1
            
            # Count the call while the lock is held so the metric stays thread-safe
            self.total_calls += 1
        
        try:
            result = func(*args, **kwargs)
            self._on_success()
            return result
        except Exception:
            self._on_failure()
            raise
    
    def _on_success(self):
        with self._lock:
            if self._state == CircuitState.HALF_OPEN:
                self._success_count += 1
                if self._success_count >= self.config.success_threshold:
                    self._transition_to(CircuitState.CLOSED)
            elif self._state == CircuitState.CLOSED:
                self._failure_count = 0  # Reset on success
    
    def _on_failure(self):
        with self._lock:
            self.total_failures += 1
            self._failure_count += 1
            self._last_failure_time = time.time()
            
            if self._state == CircuitState.HALF_OPEN:
                self._transition_to(CircuitState.OPEN)
            elif self._state == CircuitState.CLOSED:
                if self._failure_count >= self.config.failure_threshold:
                    self._transition_to(CircuitState.OPEN)
    
    def get_stats(self) -> dict:
        return {
            "name": self.name,
            "state": self._state.value,
            "total_calls": self.total_calls,
            "total_failures": self.total_failures,
            "total_rejections": self.total_rejections,
            "failure_count": self._failure_count
        }

class CircuitOpenError(Exception):
    pass

# Decorator version
def circuit_breaker(name: str, config: Optional[CircuitBreakerConfig] = None):
    cb = CircuitBreaker(name, config)
    
    def decorator(func: Callable):
        @wraps(func)
        def wrapper(*args, **kwargs):
            return cb.call(func, *args, **kwargs)
        wrapper.circuit_breaker = cb
        return wrapper
    return decorator

# === DEMO ===
print("=== Circuit Breaker Demo ===\n")

cb = CircuitBreaker("payment-service", CircuitBreakerConfig(
    failure_threshold=3,
    success_threshold=2,
    timeout_seconds=5
))

def unreliable_payment(succeed: bool):
    if not succeed:
        raise ConnectionError("Payment service unavailable")
    return {"status": "success"}

# Simulate failures
for i in range(5):
    try:
        cb.call(unreliable_payment, succeed=False)
    except CircuitOpenError as e:
        print(f"Call {i+1}: Rejected (circuit open)")
    except ConnectionError:
        print(f"Call {i+1}: Failed (error)")

print(f"\nCircuit state: {cb.state.value}")
print(f"Stats: {cb.get_stats()}")

# Wait for timeout and test recovery
print(f"\nWaiting for timeout ({cb.config.timeout_seconds}s)...")
time.sleep(cb.config.timeout_seconds + 1)

print(f"Circuit state after timeout: {cb.state.value}")

# Successful calls to close circuit
for i in range(3):
    try:
        result = cb.call(unreliable_payment, succeed=True)
        print(f"Recovery call {i+1}: Success")
    except Exception as e:
        print(f"Recovery call {i+1}: {e}")

print(f"\nFinal circuit state: {cb.state.value}")
</VSCode.Cell>
<VSCode.Cell language="markdown">
## Bulkhead Pattern

Isolate failures to prevent resource exhaustion.

```
BULKHEAD: Partition resources to contain failures

WITHOUT BULKHEAD                    WITH BULKHEAD
───────────────────                 ─────────────────────

┌─────────────────────┐             ┌─────────┬─────────┬─────────┐
│   Shared Thread     │             │  Pool A │  Pool B │  Pool C │
│       Pool          │             │ (10)    │ (10)    │ (10)    │
│  ████████████████   │             │ ████    │         │ ██      │
│     (50 threads)    │             │ ████    │         │         │
│                     │             └─────────┴─────────┴─────────┘
│ All threads used by │                  │
│ slow Service A      │             Service A slow,
│                     │             B and C unaffected!
│ Services B, C       │
│ starved!            │
└─────────────────────┘

BULKHEAD TYPES:
┌─────────────────────┬─────────────────────────────────────────────────────┐
│ Thread Pool         │ Separate thread pools per dependency               │
├─────────────────────┼─────────────────────────────────────────────────────┤
│ Semaphore           │ Limit concurrent calls with semaphores             │
├─────────────────────┼─────────────────────────────────────────────────────┤
│ Connection Pool     │ Separate DB/HTTP connection pools                  │
├─────────────────────┼─────────────────────────────────────────────────────┤
│ Container/Pod       │ K8s resource limits (CPU, memory)                  │
└─────────────────────┴─────────────────────────────────────────────────────┘
```
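
The scenario in the diagram, where a saturated Service A leaves other services unaffected, can be sketched for coroutine-based code as well. This is a minimal illustration that assumes a fail-fast policy (reject instead of wait); the `AsyncBulkhead` class and the `slow_service` coroutine are hypothetical names.

```python
import asyncio


class BulkheadFullError(Exception):
    """Raised when a bulkhead has no free slot."""


class AsyncBulkhead:
    """Hypothetical fail-fast bulkhead for coroutines."""

    def __init__(self, name: str, max_concurrent: int):
        self.name = name
        self.max_concurrent = max_concurrent
        self._active = 0  # No lock needed: asyncio runs these on one thread

    async def call(self, coro_func, *args):
        if self._active >= self.max_concurrent:
            raise BulkheadFullError(f"{self.name} at capacity")
        self._active += 1
        try:
            return await coro_func(*args)
        finally:
            self._active -= 1


async def slow_service(label: str) -> str:
    await asyncio.sleep(0.05)
    return f"{label} ok"


async def main():
    pool_a = AsyncBulkhead("service-a", max_concurrent=2)
    pool_b = AsyncBulkhead("service-b", max_concurrent=2)
    calls = [pool_a.call(slow_service, f"a{i}") for i in range(4)]
    calls.append(pool_b.call(slow_service, "b0"))
    # return_exceptions=True collects rejections instead of raising the first one
    return await asyncio.gather(*calls, return_exceptions=True)


results = asyncio.run(main())
for r in results:
    print(type(r).__name__, r)
```

The first two calls to service A take its slots, the next two are rejected, and the call to service B still succeeds. In a notebook, where an event loop is already running, `await main()` takes the place of `asyncio.run(main())`.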
</VSCode.Cell>
<VSCode.Cell language="python">
# Bulkhead Pattern Implementation

import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, Any
from functools import wraps
import time

class BulkheadFullError(Exception):
    """Raised when bulkhead is at capacity."""
    pass

@dataclass
class BulkheadConfig:
    max_concurrent: int = 10
    max_wait_seconds: float = 1.0

class SemaphoreBulkhead:
    """Semaphore-based bulkhead limiting concurrent access."""
    
    def __init__(self, name: str, config: BulkheadConfig = None):
        self.name = name
        self.config = config or BulkheadConfig()
        self._semaphore = threading.Semaphore(self.config.max_concurrent)
        self._active_count = 0
        self._rejected_count = 0
        self._lock = threading.Lock()
    
    def call(self, func: Callable, *args, **kwargs) -> Any:
        """Execute function within bulkhead constraints."""
        acquired = self._semaphore.acquire(
            blocking=True, 
            timeout=self.config.max_wait_seconds
        )
        
        if not acquired:
            with self._lock:
                self._rejected_count += 1
            raise BulkheadFullError(
                f"Bulkhead {self.name} at capacity "
                f"({self.config.max_concurrent} concurrent calls)"
            )
        
        with self._lock:
            self._active_count += 1
        
        try:
            return func(*args, **kwargs)
        finally:
            with self._lock:
                self._active_count -= 1
            self._semaphore.release()
    
    def get_stats(self) -> dict:
        with self._lock:
            return {
                "name": self.name,
                "max_concurrent": self.config.max_concurrent,
                "active": self._active_count,
                "rejected": self._rejected_count
            }

class ThreadPoolBulkhead:
    """Thread pool-based bulkhead with dedicated executors per service."""
    
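    def __init__(self, name: str, max_workers: int = 10, queue_size: int = 100):
        self.name = name
        self.max_workers = max_workers
        self._executor = ThreadPoolExecutor(
            max_workers=max_workers,
            thread_name_prefix=f"bulkhead-{name}"
        )
        # ThreadPoolExecutor queues without limit, so cap in-flight work
        # (running + queued) with a semaphore to enforce queue_size.
        self._slots = threading.BoundedSemaphore(max_workers + queue_size)
        self._lock = threading.Lock()  # Guards the counters below
        self._submitted = 0
        self._completed = 0
        self._failed = 0
    
    def submit(self, func: Callable, *args, **kwargs):
        """Submit function to dedicated thread pool; reject if the queue is full."""
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError(f"Bulkhead {self.name} queue is full")
        try:
            future = self._executor.submit(func, *args, **kwargs)
        except Exception:
            self._slots.release()  # Executor refused the task (e.g. after shutdown)
            raise
        with self._lock:
            self._submitted += 1
        future.add_done_callback(self._on_complete)
        return future
    
    def _on_complete(self, future):
        self._slots.release()
        with self._lock:
            # A cancelled future has no exception to inspect; count it as failed
            if future.cancelled() or future.exception() is not None:
                self._failed += 1
            else:
                self._completed += 1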
    
    def get_stats(self) -> dict:
        return {
            "name": self.name,
            "max_workers": self.max_workers,
            "submitted": self._submitted,
            "completed": self._completed,
            "failed": self._failed
        }
    
    def shutdown(self, wait: bool = True):
        self._executor.shutdown(wait=wait)

class BulkheadRegistry:
    """Registry for managing multiple bulkheads."""
    
    def __init__(self):
        self._bulkheads: Dict[str, SemaphoreBulkhead] = {}
    
    def get_or_create(self, name: str, config: BulkheadConfig = None) -> SemaphoreBulkhead:
        if name not in self._bulkheads:
            self._bulkheads[name] = SemaphoreBulkhead(name, config)
        return self._bulkheads[name]
    
    def get_all_stats(self) -> Dict[str, dict]:
        return {name: bh.get_stats() for name, bh in self._bulkheads.items()}

# Decorator
def bulkhead(name: str, max_concurrent: int = 10, max_wait: float = 1.0):
    _bulkhead = SemaphoreBulkhead(name, BulkheadConfig(max_concurrent, max_wait))
    
    def decorator(func: Callable):
        @wraps(func)
        def wrapper(*args, **kwargs):
            return _bulkhead.call(func, *args, **kwargs)
        wrapper.bulkhead = _bulkhead
        return wrapper
    return decorator

# === DEMO ===
print("=== Bulkhead Pattern Demo ===\n")

# Create bulkheads for different services
payment_bulkhead = SemaphoreBulkhead("payment", BulkheadConfig(max_concurrent=3))
inventory_bulkhead = SemaphoreBulkhead("inventory", BulkheadConfig(max_concurrent=5))

def slow_payment_call(order_id: str):
    time.sleep(0.5)
    return f"Payment processed for {order_id}"

def fast_inventory_call(product_id: str):
    time.sleep(0.1)
    return f"Stock checked for {product_id}"

# Simulate concurrent calls
threads = []

def make_payment_call(i):
    try:
        result = payment_bulkhead.call(slow_payment_call, f"order-{i}")
        print(f"  Payment {i}: Success")
    except BulkheadFullError:
        print(f"  Payment {i}: Rejected (bulkhead full)")

# Start 6 payment calls (bulkhead allows 3)
print("Starting 6 payment calls (bulkhead limit: 3)...")
for i in range(6):
    t = threading.Thread(target=make_payment_call, args=(i,))
    threads.append(t)
    t.start()
    time.sleep(0.1)  # Stagger starts

for t in threads:
    t.join()

print(f"\nPayment bulkhead stats: {payment_bulkhead.get_stats()}")
</VSCode.Cell>
<VSCode.Cell language="markdown">
## Retry with Exponential Backoff

```
RETRY STRATEGY: Handle transient failures gracefully

EXPONENTIAL BACKOFF:
────────────────────

  Attempt 1  ──▶ Fail ──▶ Wait 1s
  Attempt 2  ──▶ Fail ──▶ Wait 2s
  Attempt 3  ──▶ Fail ──▶ Wait 4s
  Attempt 4  ──▶ Fail ──▶ Wait 8s
  Attempt 5  ──▶ Fail ──▶ Give up

  Formula: delay = min(base * 2^(attempt - 1), max_delay)   (attempt counted from 1)


WITH JITTER (Prevents thundering herd):
───────────────────────────────────────

  Attempt │ Base Delay │ With Jitter (±25%)
  ────────┼────────────┼────────────────────
     1    │    1s      │   0.75s - 1.25s
     2    │    2s      │   1.5s - 2.5s
     3    │    4s      │   3s - 5s
     4    │    8s      │   6s - 10s


RETRY BEST PRACTICES:
┌────────────────────────┬────────────────────────────────────────────────────┐
│ Only retry transient   │ 5xx errors, timeouts, connection errors           │
│ errors                 │ NOT 4xx client errors (except 408, 429)           │
├────────────────────────┼────────────────────────────────────────────────────┤
│ Only retry idempotent  │ GET, PUT, DELETE (usually)                        │
│ operations             │ Careful with POST                                  │
├────────────────────────┼────────────────────────────────────────────────────┤
│ Set max retries        │ Typically 3-5                                      │
├────────────────────────┼────────────────────────────────────────────────────┤
│ Use jitter             │ Randomize delay to avoid thundering herd          │
├────────────────────────┼────────────────────────────────────────────────────┤
│ Set max delay          │ Cap the backoff (e.g., 30-60 seconds)             │
├────────────────────────┼────────────────────────────────────────────────────┤
│ Circuit breaker combo  │ Stop retrying if circuit is open                  │
└────────────────────────┴────────────────────────────────────────────────────┘
```
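
The "Careful with POST" row exists because retrying a non-idempotent request can apply its effect twice. One common remedy is the idempotency key listed in the overview: the client attaches a unique key to the request, and the server returns the stored result for any repeat of that key. A minimal sketch, assuming an in-memory store. The `PaymentAPI` class and its `charge` method are hypothetical; a real service would persist keys, usually with an expiry.

```python
from typing import Dict
from uuid import uuid4


class PaymentAPI:
    """Hypothetical server that deduplicates charges by idempotency key."""

    def __init__(self) -> None:
        self._results: Dict[str, dict] = {}  # idempotency key -> stored response
        self.charges_made = 0

    def charge(self, idempotency_key: str, amount: float) -> dict:
        # A repeated key means a retry: return the original response unchanged
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges_made += 1
        response = {"payment_id": str(uuid4())[:8], "amount": amount}
        self._results[idempotency_key] = response
        return response


api = PaymentAPI()
key = str(uuid4())  # Generated once by the client, reused on every retry

first = api.charge(key, 49.99)
retry = api.charge(key, 49.99)  # e.g. the first response was lost in transit

print(first == retry)    # Both calls return the same payment
print(api.charges_made)  # The card was charged only once
```

With the key in place, the retry logic can treat the POST like any other idempotent call, since a duplicate delivery no longer produces a duplicate charge.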
</VSCode.Cell>
<VSCode.Cell language="python">
# Retry with Exponential Backoff

import random
import time
from dataclasses import dataclass
from typing import Callable, Tuple, Optional, Set
from functools import wraps

@dataclass
class RetryConfig:
    """Configuration for retry behavior."""
    max_retries: int = 3
    base_delay: float = 1.0
    max_delay: float = 30.0
    jitter: float = 0.25  # ±25%
    exponential_base: float = 2.0
    retryable_exceptions: Tuple = (ConnectionError, TimeoutError)

class RetryError(Exception):
    """Raised when all retries are exhausted."""
    def __init__(self, message: str, last_exception: Exception):
        super().__init__(message)
        self.last_exception = last_exception

def calculate_delay(attempt: int, config: RetryConfig) -> float:
    """Calculate delay with exponential backoff and jitter."""
    # Exponential backoff
    delay = config.base_delay * (config.exponential_base ** attempt)
    
    # Apply max delay cap
    delay = min(delay, config.max_delay)
    
    # Apply jitter
    jitter_range = delay * config.jitter
    delay += random.uniform(-jitter_range, jitter_range)
    
    return max(0, delay)  # Ensure non-negative

def retry_with_backoff(config: Optional[RetryConfig] = None):
    """Decorator for retry with exponential backoff."""
    cfg = config or RetryConfig()
    
    def decorator(func: Callable):
        @wraps(func)
        def wrapper(*args, **kwargs):
            last_exception = None
            
            for attempt in range(cfg.max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except cfg.retryable_exceptions as e:
                    last_exception = e
                    
                    if attempt == cfg.max_retries:
                        # Chain the original error so the traceback shows the root cause
                        raise RetryError(
                            f"All {cfg.max_retries} retries exhausted",
                            last_exception
                        ) from e
                    
                    delay = calculate_delay(attempt, cfg)
                    print(f"  Attempt {attempt + 1} failed: {e}. "
                          f"Retrying in {delay:.2f}s...")
                    time.sleep(delay)
            
        return wrapper
    return decorator

class RetryableClient:
    """Generic call wrapper with built-in retry logic and attempt counters."""
    
    def __init__(self, config: Optional[RetryConfig] = None):
        self.config = config or RetryConfig()
        self.total_attempts = 0
        self.successful_attempts = 0
        self.failed_attempts = 0
    
    def call(self, func: Callable, *args, **kwargs):
        """Execute function with retry logic."""
        last_exception = None
        
        for attempt in range(self.config.max_retries + 1):
            self.total_attempts += 1
            
            try:
                result = func(*args, **kwargs)
                self.successful_attempts += 1
                return result
                
            except self.config.retryable_exceptions as e:
                last_exception = e
                self.failed_attempts += 1
                
                if attempt == self.config.max_retries:
                    break
                
                delay = calculate_delay(attempt, self.config)
                print(f"  Attempt {attempt + 1}/{self.config.max_retries + 1} "
                      f"failed. Retry in {delay:.2f}s")
                time.sleep(delay)
        
        raise RetryError(
            f"Failed after {self.config.max_retries + 1} attempts",
            last_exception
        )

# === DEMO ===
print("=== Retry with Backoff Demo ===\n")

call_count = 0

@retry_with_backoff(RetryConfig(max_retries=3, base_delay=0.5, max_delay=5.0))
def flaky_service():
    global call_count
    call_count += 1
    
    if call_count < 3:  # Fail first 2 attempts
        raise ConnectionError(f"Connection failed (attempt {call_count})")
    return {"status": "success", "attempts": call_count}

try:
    result = flaky_service()
    print(f"Success after {result['attempts']} attempts: {result}")
except RetryError as e:
    print(f"All retries failed: {e.last_exception}")

# Demo with client
print("\n--- RetryableClient Demo ---")
client = RetryableClient(RetryConfig(max_retries=2, base_delay=0.3))

attempt_counter = 0
def always_fails():
    global attempt_counter
    attempt_counter += 1
    raise TimeoutError(f"Timeout (attempt {attempt_counter})")

try:
    client.call(always_fails)
except RetryError as e:
    print(f"Final failure: {e.last_exception}")
    print(f"Stats: total={client.total_attempts}, "
          f"success={client.successful_attempts}, "
          f"failed={client.failed_attempts}")
</VSCode.Cell>
<VSCode.Cell language="markdown">
## Idempotency Keys

Ensure operations are safe to retry without duplicate side effects.

```
PROBLEM: Duplicate requests

  Client ───Request───▶ Server ───Process───▶ Database
    │                                              │
    │ ◀───Timeout (response lost)──────────────────┘
    │
    └───Retry Request───▶ Server ───Process───▶ Database
                                                   │
                                   Duplicate record! ✗


SOLUTION: Idempotency Key

  Client ───Request + Idempotency-Key: abc123───▶ Server
    │                                               │
    │                                    Check: seen abc123?
    │                                         │
    │                                    No ──┼──▶ Process, store abc123
    │                                         │
    │ ◀───Timeout─────────────────────────────┘
    │
    └───Retry + Idempotency-Key: abc123───▶ Server
                                               │
                                    Check: seen abc123?
                                         │
                                    Yes ─┼──▶ Return cached response
                                              (no duplicate!)


IMPLEMENTATION APPROACHES:
┌────────────────────────┬────────────────────────────────────────────────────┐
│ Database unique        │ Use idempotency key as unique constraint          │
│ constraint             │ Second insert fails, return original              │
├────────────────────────┼────────────────────────────────────────────────────┤
│ Idempotency store      │ Store key → response mapping                      │
│ (Redis/DB)             │ TTL-based expiration (24h typical)                │
├────────────────────────┼────────────────────────────────────────────────────┤
│ Natural idempotency    │ Some operations are naturally idempotent          │
│                        │ PUT /users/123 is idempotent                      │
├────────────────────────┼────────────────────────────────────────────────────┤
│ Optimistic locking     │ Version numbers prevent duplicate updates         │
└────────────────────────┴────────────────────────────────────────────────────┘
```
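
The "database unique constraint" row can be sketched with the standard-library `sqlite3` module. This is a minimal illustration under assumptions: the `payments` table, its columns, and the `create_payment` helper are invented for the example, and an in-memory database stands in for a real one.

```python
import sqlite3

# In-memory database for the sketch; table and column names are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE payments (
        id              INTEGER PRIMARY KEY AUTOINCREMENT,
        idempotency_key TEXT NOT NULL UNIQUE,
        order_id        TEXT NOT NULL,
        amount_cents    INTEGER NOT NULL
    )
    """
)


def create_payment(key: str, order_id: str, amount_cents: int) -> dict:
    """Insert a payment once per key; a repeated key returns the original row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO payments (idempotency_key, order_id, amount_cents) "
                "VALUES (?, ?, ?)",
                (key, order_id, amount_cents),
            )
        created = True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint rejected the duplicate insert.
        created = False

    row = conn.execute(
        "SELECT id, order_id, amount_cents FROM payments WHERE idempotency_key = ?",
        (key,),
    ).fetchone()
    return {"id": row[0], "order_id": row[1], "amount_cents": row[2],
            "created": created}


first = create_payment("abc123", "order-123", 9999)
retry = create_payment("abc123", "order-123", 9999)
print(first)
print(retry)
```

Because the database enforces uniqueness, two concurrent requests with the same key cannot both insert a row, which a separate check-then-insert in application code does not guarantee.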
</VSCode.Cell>
<VSCode.Cell language="python">
# Idempotency Implementation

from dataclasses import dataclass, field
from typing import Optional, Dict, Any, Callable
from datetime import datetime, timedelta
from functools import wraps
import hashlib
import json
import threading

@dataclass
class IdempotencyRecord:
    """Stored idempotency record."""
    key: str
    response: Any
    status_code: int
    created_at: datetime
    expires_at: datetime
    
    def is_expired(self) -> bool:
        return datetime.utcnow() > self.expires_at

class IdempotencyStore:
    """
    In-memory idempotency store.
    In production, use Redis or database.
    """
    
    def __init__(self, ttl_hours: int = 24):
        self._store: Dict[str, IdempotencyRecord] = {}
        self._lock = threading.Lock()
        self.ttl = timedelta(hours=ttl_hours)
    
    def get(self, key: str) -> Optional[IdempotencyRecord]:
        with self._lock:
            record = self._store.get(key)
            if record and not record.is_expired():
                return record
            elif record:
                del self._store[key]
            return None
    
    def set(self, key: str, response: Any, status_code: int):
        with self._lock:
            now = datetime.utcnow()
            self._store[key] = IdempotencyRecord(
                key=key,
                response=response,
                status_code=status_code,
                created_at=now,
                expires_at=now + self.ttl
            )
    
    def exists(self, key: str) -> bool:
        return self.get(key) is not None

class IdempotentEndpoint:
    """
    Wrapper for making endpoints idempotent.
    """
    
    def __init__(self, store: Optional[IdempotencyStore] = None):
        self.store = store or IdempotencyStore()
    
    def handle(
        self,
        idempotency_key: str,
        handler: Callable[[], tuple[Any, int]]
    ) -> tuple[Any, int]:
        """
        Execute handler idempotently.
        
        Args:
            idempotency_key: Unique key for this request
            handler: Function returning (response, status_code)
        
        Returns:
            (response, status_code) - from cache or handler
        """
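        # Check for existing response.
        # Note: check-then-execute is not atomic here, so two concurrent requests
        # with the same key could both run the handler. A production store would
        # reserve the key atomically first (e.g. a unique constraint or Redis SET NX).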
        existing = self.store.get(idempotency_key)
        if existing:
            print(f"[IDEMPOTENCY] Returning cached response for key: {idempotency_key[:16]}...")
            return existing.response, existing.status_code
        
        # Execute handler
        response, status_code = handler()
        
        # Store response (only for successful operations)
        if 200 <= status_code < 300:
            self.store.set(idempotency_key, response, status_code)
            print(f"[IDEMPOTENCY] Stored response for key: {idempotency_key[:16]}...")
        
        return response, status_code

def generate_idempotency_key(*args) -> str:
    """Generate deterministic key from arguments."""
    content = json.dumps(args, sort_keys=True, default=str)
    return hashlib.sha256(content.encode()).hexdigest()

# === PAYMENT SERVICE EXAMPLE ===

class PaymentService:
    """Payment service with idempotency."""
    
    def __init__(self):
        self.idempotent = IdempotentEndpoint()
        self.payments: Dict[str, dict] = {}
        self._payment_counter = 0
    
    def create_payment(
        self,
        idempotency_key: str,
        order_id: str,
        amount: float,
        currency: str = "USD"
    ) -> tuple[dict, int]:
        """
        Create payment with idempotency guarantee.
        """
        def _process_payment():
            self._payment_counter += 1
            payment_id = f"pay_{self._payment_counter}"
            
            payment = {
                "id": payment_id,
                "order_id": order_id,
                "amount": amount,
                "currency": currency,
                "status": "completed",
                "created_at": datetime.utcnow().isoformat()
            }
            self.payments[payment_id] = payment
            
            print(f"[PAYMENT] Processed payment {payment_id} for ${amount}")
            return payment, 201
        
        return self.idempotent.handle(idempotency_key, _process_payment)

# === DEMO ===
print("=== Idempotency Demo ===\n")

payment_service = PaymentService()

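# Derive the idempotency key from request details (demo only). In practice the
# client usually generates a unique key (e.g. a UUID) per logical operation and
# reuses it on every retry.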
key = generate_idempotency_key("order-123", 99.99, "USD")

# First request
print("Request 1 (original):")
response1, status1 = payment_service.create_payment(
    key, "order-123", 99.99, "USD"
)
print(f"  Response: {response1['id']}, Status: {status1}")

# Duplicate request (retry scenario)
print("\nRequest 2 (retry - same key):")
response2, status2 = payment_service.create_payment(
    key, "order-123", 99.99, "USD"
)
print(f"  Response: {response2['id']}, Status: {status2}")

# Verify same payment returned
print(f"\nSame payment returned: {response1['id'] == response2['id']}")
print(f"Total payments created: {len(payment_service.payments)}")

# Different request (new key)
print("\nRequest 3 (different order - new key):")
key2 = generate_idempotency_key("order-456", 49.99, "USD")
response3, status3 = payment_service.create_payment(
    key2, "order-456", 49.99, "USD"
)
print(f"  Response: {response3['id']}, Status: {status3}")
print(f"Total payments created: {len(payment_service.payments)}")
</VSCode.Cell>
<VSCode.Cell language="markdown">
## Distributed Tracing

Track requests across service boundaries.

```
DISTRIBUTED TRACING: Follow a request through multiple services

┌───────────────────────────────────────────────────────────────────────────────┐
│  Trace ID: abc-123-def                                                        │
├───────────────────────────────────────────────────────────────────────────────┤
│                                                                                │
│  [API Gateway]            0ms ████████████████████████████████████████ 250ms  │
│   │                                                                            │
│   └─[Order Service]      20ms ████████████████████████████████ 200ms          │
│      │                                                                         │
│      ├─[User Service]    40ms ████████ 80ms                                   │
│      │                                                                         │
│      ├─[Inventory Svc]   85ms ████████████ 140ms                              │
│      │                                                                         │
│      └─[Payment Svc]    145ms ████████████████ 195ms                          │
│                                                                                │
└───────────────────────────────────────────────────────────────────────────────┘

TRACE CONTEXT PROPAGATION:
┌───────────────────────────────────────────────────────────────────────────────┐
│                                                                                │
│  HTTP Headers (W3C Trace Context):                                            │
│                                                                                │
│  traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01         │
│               │  │                                  │                │        │
│               │  │                                  │                └─ flags │
│               │  │                                  └─ parent span ID         │
│               │  └─ trace ID                                                  │
│               └─ version                                                      │
│                                                                                │
│  tracestate: vendor1=value1,vendor2=value2                                    │
│                                                                                │
└───────────────────────────────────────────────────────────────────────────────┘

SPAN ATTRIBUTES:
┌────────────────────────┬────────────────────────────────────────────────────┐
│ service.name           │ order-service                                      │
├────────────────────────┼────────────────────────────────────────────────────┤
│ http.method            │ POST                                               │
├────────────────────────┼────────────────────────────────────────────────────┤
│ http.url               │ /api/orders                                        │
├────────────────────────┼────────────────────────────────────────────────────┤
│ http.status_code       │ 201                                                │
├────────────────────────┼────────────────────────────────────────────────────┤
│ db.system              │ postgresql                                         │
├────────────────────────┼────────────────────────────────────────────────────┤
│ db.statement           │ INSERT INTO orders...                              │
├────────────────────────┼────────────────────────────────────────────────────┤
│ error                  │ true/false                                         │
└────────────────────────┴────────────────────────────────────────────────────┘
```
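
To make the `traceparent` layout above concrete, here is a small sketch that validates a version `00` header, splits it into its four fields, and builds the header a service would send downstream. The function names `parse_traceparent` and `child_traceparent` are hypothetical; real services would normally rely on an OpenTelemetry SDK instead of hand-rolled parsing.

```python
import re
import secrets
from typing import Optional

# Version "00" layout: 2 hex - 32 hex trace ID - 16 hex parent span ID - 2 hex flags
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(header: str) -> Optional[dict]:
    """Split a traceparent header into its fields, or return None if malformed."""
    match = _TRACEPARENT_RE.match(header.strip())
    if not match:
        return None
    version, trace_id, parent_span_id, flags = match.groups()
    # All-zero IDs are invalid in W3C Trace Context.
    if trace_id == "0" * 32 or parent_span_id == "0" * 16:
        return None
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_span_id": parent_span_id,
        "sampled": bool(int(flags, 16) & 0x01),  # lowest bit is the sampled flag
    }


def child_traceparent(ctx: dict) -> str:
    """Build the outgoing header: same trace ID, fresh span ID as the new parent."""
    span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    flags = "01" if ctx["sampled"] else "00"
    return f"00-{ctx['trace_id']}-{span_id}-{flags}"


incoming = "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"
ctx = parse_traceparent(incoming)
print(ctx)
print(child_traceparent(ctx))
```

The trace ID stays constant across every hop, while each service replaces the parent span ID with the ID of its own span before calling the next service.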
</VSCode.Cell>
<VSCode.Cell language="python">
# Distributed Tracing Implementation

from dataclasses import dataclass, field
from typing import Optional, Dict, List, Any
from datetime import datetime
from uuid import uuid4
from contextlib import contextmanager
import threading
import time

@dataclass
class Span:
    """A single span in a trace."""
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]
    operation_name: str
    service_name: str
    start_time: datetime
    end_time: Optional[datetime] = None
    duration_ms: Optional[float] = None
    status: str = "OK"
    tags: Dict[str, Any] = field(default_factory=dict)
    logs: List[Dict] = field(default_factory=list)
    
    def finish(self):
        self.end_time = datetime.utcnow()
        self.duration_ms = (self.end_time - self.start_time).total_seconds() * 1000
    
    def set_tag(self, key: str, value: Any):
        self.tags[key] = value
    
    def log(self, message: str, **kwargs):
        self.logs.append({
            "timestamp": datetime.utcnow().isoformat(),
            "message": message,
            **kwargs
        })
    
    def set_error(self, error: Exception):
        self.status = "ERROR"
        self.set_tag("error", True)
        self.set_tag("error.message", str(error))
        self.set_tag("error.type", type(error).__name__)

class TraceContext:
    """Thread-local trace context."""
    _local = threading.local()
    
    @classmethod
    def get_current_span(cls) -> Optional[Span]:
        return getattr(cls._local, 'current_span', None)
    
    @classmethod
    def set_current_span(cls, span: Optional[Span]):
        cls._local.current_span = span
    
    @classmethod
    def get_trace_id(cls) -> Optional[str]:
        span = cls.get_current_span()
        return span.trace_id if span else None

class Tracer:
    """Simple distributed tracer."""
    
    def __init__(self, service_name: str, collector=None):
        self.service_name = service_name
        self.spans: List[Span] = []
        self._collector = collector  # Would send to Jaeger/Zipkin
    
    @contextmanager
    def start_span(self, operation_name: str, parent: Optional[Span] = None):
        """Start a new span as a context manager."""
        # Get or create trace context
        current_span = parent or TraceContext.get_current_span()
        
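        # Use .hex so the IDs contain no hyphens; str(uuid4()) includes them and
        # would break the hyphen-delimited traceparent header.
        trace_id = current_span.trace_id if current_span else uuid4().hex  # 32 hex chars
        parent_span_id = current_span.span_id if current_span else None
        
        span = Span(
            trace_id=trace_id,
            span_id=uuid4().hex[:16],  # 16 hex chars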
            parent_span_id=parent_span_id,
            operation_name=operation_name,
            service_name=self.service_name,
            start_time=datetime.utcnow()
        )
        
        # Set as current span
        previous_span = TraceContext.get_current_span()
        TraceContext.set_current_span(span)
        
        try:
            yield span
        except Exception as e:
            span.set_error(e)
            raise
        finally:
            span.finish()
            self.spans.append(span)
            TraceContext.set_current_span(previous_span)
            
            # Export span
            if self._collector:
                self._collector.collect(span)
    
    def inject_headers(self, headers: Dict[str, str]):
        """Inject trace context into outgoing request headers."""
        span = TraceContext.get_current_span()
        if span:
            # W3C Trace Context format
            headers['traceparent'] = f"00-{span.trace_id}-{span.span_id}-01"
    
    def extract_context(self, headers: Dict[str, str]) -> Optional[Dict]:
        """Extract trace context from incoming request headers."""
        traceparent = headers.get('traceparent')
        if traceparent:
            parts = traceparent.split('-')
            if len(parts) >= 4:  # version, trace ID, parent span ID, flags
                return {
                    'trace_id': parts[1],
                    'parent_span_id': parts[2]
                }
        return None
    
    def print_trace(self):
        """Print trace visualization."""
        if not self.spans:
            return
        
        trace_id = self.spans[0].trace_id
        print(f"\n=== Trace: {trace_id} ===\n")
        
        # Sort by start time
        sorted_spans = sorted(self.spans, key=lambda s: s.start_time)
        
        for span in sorted_spans:
            # Child spans are drawn one level below the root span
            indent = "  └─" if span.parent_span_id else ""
            
            status_icon = "✓" if span.status == "OK" else "✗"
            print(f"{indent}[{span.service_name}] {span.operation_name} "
                  f"({span.duration_ms:.1f}ms) {status_icon}")

# === DEMO: Multi-Service Tracing ===
print("=== Distributed Tracing Demo ===")

# Create tracers for different services
gateway_tracer = Tracer("api-gateway")
order_tracer = Tracer("order-service")
payment_tracer = Tracer("payment-service")

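# Simulate request flow (in-process: spans nest via the thread-local
# TraceContext; across real services the context travels in HTTP headers)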
def simulate_request():
    # API Gateway receives request
    with gateway_tracer.start_span("HTTP GET /orders/123") as gateway_span:
        gateway_span.set_tag("http.method", "GET")
        gateway_span.set_tag("http.url", "/orders/123")
        time.sleep(0.01)
        
        # Forward to Order Service
        with order_tracer.start_span("GetOrder") as order_span:
            order_span.set_tag("order.id", "123")
            time.sleep(0.02)
            order_span.log("Fetching order from database")
            
            # Call Payment Service
            with payment_tracer.start_span("GetPaymentStatus") as payment_span:
                payment_span.set_tag("payment.order_id", "123")
                time.sleep(0.015)
                payment_span.log("Payment status: completed")
        
        gateway_span.set_tag("http.status_code", 200)

simulate_request()

# Print traces
gateway_tracer.print_trace()

# Show all spans
print("\n=== All Spans ===")
all_spans = gateway_tracer.spans + order_tracer.spans + payment_tracer.spans
for span in all_spans:
    print(f"  {span.service_name}/{span.operation_name}: "
          f"{span.duration_ms:.1f}ms, tags={span.tags}")
</VSCode.Cell>
<VSCode.Cell language="markdown">
## Combining Patterns: Resilient Service Call

```
COMPLETE RESILIENCE STACK:

  Request
     │
     ▼
  ┌──────────────────────────────────────────────────────────────┐
  │                    RATE LIMITER                               │
  │        (Prevent overload - token bucket/sliding window)       │
  └──────────────────────────────┬───────────────────────────────┘
                                 │
                                 ▼
  ┌──────────────────────────────────────────────────────────────┐
  │                      BULKHEAD                                 │
  │              (Isolate - semaphore/thread pool)                │
  └──────────────────────────────┬───────────────────────────────┘
                                 │
                                 ▼
  ┌──────────────────────────────────────────────────────────────┐
  │                   CIRCUIT BREAKER                             │
  │            (Fail fast when service is down)                   │
  └──────────────────────────────┬───────────────────────────────┘
                                 │
                                 ▼
  ┌──────────────────────────────────────────────────────────────┐
  │                 RETRY + BACKOFF                               │
  │           (Handle transient failures)                         │
  └──────────────────────────────┬───────────────────────────────┘
                                 │
                                 ▼
  ┌──────────────────────────────────────────────────────────────┐
  │                    TIMEOUT                                    │
  │              (Don't wait forever)                             │
  └──────────────────────────────┬───────────────────────────────┘
                                 │
                                 ▼
                          Actual Call
```
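
The outermost and innermost layers of the stack above, the rate limiter and the timeout, can be sketched in a few lines each. The names `TokenBucket` and `call_with_timeout` are illustrative assumptions, the bucket is not thread-safe as written, and the timeout only stops waiting for the worker thread instead of cancelling the underlying work.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


class TokenBucket:
    """Allow bursts of up to `capacity` calls, refilled at `rate` tokens/second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock  # injectable so tests can control time
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, never above capacity.
        self._tokens = min(self.capacity,
                           self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


_executor = ThreadPoolExecutor(max_workers=4)


def call_with_timeout(func, timeout_seconds: float, *args, **kwargs):
    """Run func in a worker thread and stop waiting after timeout_seconds."""
    future = _executor.submit(func, *args, **kwargs)
    try:
        return future.result(timeout=timeout_seconds)
    except FutureTimeout:
        # The worker thread cannot be killed; we only stop waiting for it.
        raise TimeoutError(f"call exceeded {timeout_seconds}s") from None


bucket = TokenBucket(rate=1.0, capacity=2)
print([bucket.allow() for _ in range(3)])

try:
    call_with_timeout(time.sleep, 0.05, 0.5)  # sleeps 0.5s, waits only 0.05s
except TimeoutError as exc:
    print(f"timed out: {exc}")
```

Placing the rate limiter first rejects excess traffic before it consumes a bulkhead slot, and placing the timeout last bounds each individual attempt so a retry never waits on a hung call.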
</VSCode.Cell>
<VSCode.Cell language="python">
# Combined Resilience Patterns

from dataclasses import dataclass
from typing import Callable, Any, Optional
from functools import wraps
import time
import threading

@dataclass
class ResilienceConfig:
    """Combined configuration for all resilience patterns."""
    # Circuit breaker
    cb_failure_threshold: int = 5
    cb_reset_timeout: int = 30
    
    # Retry
    max_retries: int = 3
    retry_base_delay: float = 1.0
    retry_max_delay: float = 10.0
    
    # Bulkhead
    max_concurrent: int = 10
    
    # Timeout
    timeout_seconds: float = 5.0

# Defined here so this cell runs on its own; an earlier cell may already
# define equivalent exceptions.
class CircuitOpenError(Exception):
    """Raised when the circuit breaker rejects a call."""

class BulkheadFullError(Exception):
    """Raised when the bulkhead has no free slots."""

class ResilientClient:
    """
    Client with full resilience stack:
    Bulkhead → Circuit Breaker → Retry → Timeout
    """
    
    def __init__(self, name: str, config: Optional[ResilienceConfig] = None):
        self.name = name
        self.config = config or ResilienceConfig()
        
        # Circuit breaker state
        self._cb_state = "closed"
        self._cb_failures = 0
        self._cb_last_failure = 0
        
        # Bulkhead
        self._semaphore = threading.Semaphore(self.config.max_concurrent)
        
        # Metrics
        self.metrics = {
            "total_calls": 0,
            "successful_calls": 0,
            "failed_calls": 0,
            "circuit_opens": 0,
            "bulkhead_rejections": 0,
            "retries": 0
        }
        self._lock = threading.Lock()
    
    def call(self, func: Callable, *args, **kwargs) -> Any:
        """Execute function with all resilience patterns."""
        with self._lock:
            self.metrics["total_calls"] += 1
        
        # 1. Bulkhead check (non-blocking: reject instead of queueing)
        if not self._semaphore.acquire(blocking=False):
            with self._lock:
                self.metrics["bulkhead_rejections"] += 1
            raise BulkheadFullError(f"{self.name}: Bulkhead at capacity")
        
        try:
            # 2. Circuit breaker check
            if not self._check_circuit():
                raise CircuitOpenError(f"{self.name}: Circuit is open")
            
            # 3. Retry with backoff
            return self._retry_call(func, *args, **kwargs)
            
        finally:
            self._semaphore.release()
    
    def _check_circuit(self) -> bool:
        """Check and update circuit breaker state."""
        with self._lock:
            if self._cb_state == "open":
                # Check if timeout has elapsed
                if time.time() - self._cb_last_failure >= self.config.cb_reset_timeout:
                    self._cb_state = "half-open"
                    return True
                return False
            return True
    
    def _record_success(self):
        with self._lock:
            self._cb_failures = 0
            if self._cb_state == "half-open":
                self._cb_state = "closed"
            self.metrics["successful_calls"] += 1
    
    def _record_failure(self):
        with self._lock:
            self._cb_failures += 1
            self._cb_last_failure = time.time()
            self.metrics["failed_calls"] += 1
            
            if self._cb_state == "half-open":
                self._cb_state = "open"
                self.metrics["circuit_opens"] += 1
            elif self._cb_failures >= self.config.cb_failure_threshold:
                self._cb_state = "open"
                self.metrics["circuit_opens"] += 1
    
    def _retry_call(self, func: Callable, *args, **kwargs) -> Any:
        """Execute with retry and timeout."""
        last_exception = None
        
        for attempt in range(self.config.max_retries + 1):
            try:
                # 4. Timeout wrapper would go here
                # In production, use asyncio.wait_for or threading timeout
                result = func(*args, **kwargs)
                self._record_success()
                return result
                
            except Exception as e:
                last_exception = e
                self._record_failure()
                
                # Stop retrying once the breaker has opened
                if self._cb_state == "open":
                    break
                
                if attempt < self.config.max_retries:
                    self.metrics["retries"] += 1
                    delay = min(
                        self.config.retry_base_delay * (2 ** attempt),
                        self.config.retry_max_delay
                    )
                    time.sleep(delay)
        
        raise last_exception
    
    def get_status(self) -> dict:
        return {
            "name": self.name,
            "circuit_state": self._cb_state,
            "metrics": self.metrics
        }

# === DEMO ===
print("=== Combined Resilience Patterns Demo ===\n")

client = ResilientClient("payment-service", ResilienceConfig(
    cb_failure_threshold=3,
    cb_reset_timeout=5,
    max_retries=2,
    max_concurrent=5
))

call_count = 0

def flaky_payment(order_id: str):
    global call_count
    call_count += 1
    
    if call_count <= 4:  # First 4 calls fail
        raise ConnectionError(f"Payment failed (call {call_count})")
    return {"status": "success", "order_id": order_id}

# Make calls
for i in range(10):
    try:
        result = client.call(flaky_payment, f"order-{i}")
        print(f"Call {i+1}: Success - {result}")
    except CircuitOpenError:
        print(f"Call {i+1}: Circuit Open (fast fail)")
    except BulkheadFullError:
        print(f"Call {i+1}: Bulkhead Full (rejected)")
    except Exception as e:
        print(f"Call {i+1}: Failed - {e}")
    
    time.sleep(0.5)

print(f"\n=== Final Status ===")
status = client.get_status()
print(f"Circuit state: {status['circuit_state']}")
print(f"Metrics: {status['metrics']}")
</VSCode.Cell>
<VSCode.Cell language="markdown">
## Summary: Pattern Selection Guide

| Pattern | Use When | Key Benefit |
|---------|----------|-------------|
| **Saga** | Multi-service transactions | Eventual consistency |
| **Circuit Breaker** | Calling unreliable services | Fail fast, prevent cascade |
| **Bulkhead** | Resource isolation needed | Limit blast radius |
| **Retry + Backoff** | Transient failures expected | Auto-recovery |
| **Idempotency** | Requests may be retried or delivered twice | Safe retries |
| **Distributed Tracing** | Debugging requests that span services | Request visibility |
| **Timeout** | Preventing hung requests | Resource protection |
</VSCode.Cell>
<VSCode.Cell language="markdown">
## Further Reading

- **Topics in this Knowledge Base:**
  - [Microservices Overview](./00_overview.ipynb)
  - [Service Communication](./02_service_communication.ipynb)
  - [API Gateway & Service Mesh](./03_api_gateway_service_mesh.ipynb)
  - [Backend: Resilience Patterns](../../backend/resilience_performance/00_overview.ipynb)

- **External Resources:**
  - [Microservices Patterns (Chris Richardson)](https://microservices.io/patterns/)
  - [OpenTelemetry Documentation](https://opentelemetry.io/docs/)
  - [Resilience4j Library](https://resilience4j.readme.io/)
</VSCode.Cell>
```