# Resilience & Performance Patterns

Essential patterns for building fault-tolerant, high-performance backend systems.

```
┌─────────────────────────────────────────────────────────────────┐
│                    RESILIENCE PATTERNS                          │
├─────────────────┬─────────────────┬─────────────────────────────┤
│  Circuit Breaker│   Retry/Backoff │   Timeout/Deadline          │
│  ───────────────│   ───────────── │   ────────────────          │
│  Stop cascading │   Transient     │   Prevent hanging           │
│  failures       │   fault recovery│   requests                  │
├─────────────────┴─────────────────┴─────────────────────────────┤
│  Bulkhead          │   Fallback       │   Performance Opt       │
│  ────────          │   ────────       │   ───────────────       │
│  Isolate failures  │   Graceful       │   Pooling, Async I/O    │
│  limit blast radius│   degradation    │   throughput gains      │
└─────────────────────────────────────────────────────────────────┘
```

## 1. Circuit Breaker Pattern

Prevents cascading failures by "tripping" when a service fails repeatedly.

```
     ┌────────┐   failures ≥ threshold   ┌────────┐
     │ CLOSED │ ────────────────────────▶│  OPEN  │
     │(normal)│                          │(reject)│
     └────────┘                          └────────┘
          ▲                                   │
          │                           timeout expires
          │     success      ┌───────────┐    ▼
          └──────────────────│ HALF-OPEN │◀───┘
                 failure ───▶│  (test)   │───▶ back to OPEN
                             └───────────┘

• CLOSED  → Requests flow normally; failures counted
• OPEN    → Requests fail-fast; no calls to downstream
• HALF-OPEN → Allow limited test requests to check recovery
```
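
The transitions in the diagram can also be written as a small pure function, which keeps them easy to test apart from timing and failure counting. This is a minimal sketch: the `State` enum, the `next_state` name, and the event strings are illustrative assumptions, not part of any library.

```python
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


def next_state(state, event, failures=0, threshold=5):
    """Return the next breaker state for a hypothetical event name.

    Events (illustrative): "success", "failure", "timeout_expired".
    `failures` is the failure count including the current failure.
    """
    if state is State.CLOSED:
        if event == "failure" and failures >= threshold:
            return State.OPEN
        return State.CLOSED
    if state is State.OPEN:
        # Only the reset timer moves an open circuit forward.
        return State.HALF_OPEN if event == "timeout_expired" else State.OPEN
    # HALF_OPEN: the outcome of the test request decides.
    if event == "success":
        return State.CLOSED
    if event == "failure":
        return State.OPEN
    return State.HALF_OPEN


print(next_state(State.CLOSED, "failure", failures=5))
print(next_state(State.OPEN, "timeout_expired"))
print(next_state(State.HALF_OPEN, "success"))
```

Keeping the transition logic free of clocks and counters means each arrow in the diagram maps to one assertion in a test.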

In [None]:
import time
from enum import Enum
from functools import wraps

class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

class CircuitBreaker:
    """Circuit Breaker implementation with configurable thresholds."""
    
    def __init__(self, failure_threshold=5, reset_timeout=30):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.state = CircuitState.CLOSED
        self.failure_count = 0
        self.last_failure_time = None
    
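    def __call__(self, func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # OPEN: fail fast until the reset timeout has elapsed,
            # then let a trial call through in HALF_OPEN.
            if self.state == CircuitState.OPEN:
                if time.monotonic() - self.last_failure_time >= self.reset_timeout:
                    self.state = CircuitState.HALF_OPEN
                else:
                    raise CircuitOpenError("Circuit OPEN - failing fast")

            try:
                result = func(*args, **kwargs)
            except Exception:
                self._on_failure()
                raise
            self._on_success()
            return result
        return wrapper

    def _on_success(self):
        # Any success closes the circuit and clears the failure count.
        self.state = CircuitState.CLOSED
        self.failure_count = 0

    def _on_failure(self):
        self.failure_count += 1
        # monotonic() is unaffected by system clock adjustments.
        self.last_failure_time = time.monotonic()
        # A failed HALF_OPEN trial reopens immediately; otherwise trip
        # once the consecutive-failure threshold is reached.
        if (self.state == CircuitState.HALF_OPEN
                or self.failure_count >= self.failure_threshold):
            self.state = CircuitState.OPEN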

class CircuitOpenError(Exception):
    pass

# Usage Example
import random

@CircuitBreaker(failure_threshold=3, reset_timeout=10)
def call_external_api(endpoint):
    if random.random() < 0.7:  # Simulate 70% failure rate
        raise ConnectionError("Service unavailable")
    return {"status": "success"}

print("Circuit Breaker ready - threshold: 3 failures, reset: 10s")

## 2. Retry with Exponential Backoff & Timeouts

```
EXPONENTIAL BACKOFF:
Attempt │  Delay   │  With Jitter (±20%)
────────┼──────────┼─────────────────────
   1    │   1s     │   0.8s - 1.2s
   2    │   2s     │   1.6s - 2.4s
   3    │   4s     │   3.2s - 4.8s
   4    │   8s     │   6.4s - 9.6s

Formula: delay = min(base × 2^(attempt−1), max_delay) × (1 ± jitter)

BEST PRACTICES:
✓ Add jitter to prevent thundering herd
✓ Set max retries (3-5 typically)
✓ Only retry idempotent operations
✓ Only retry transient errors (5xx, timeouts)
✗ Don't retry 4xx client errors (except 408 and 429)
```
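
The delay table above can be reproduced with a few lines of arithmetic. This is a sketch under two assumptions: attempts are numbered from 1, and jitter is applied as a multiplicative fraction of the capped delay. The helper names `backoff_delay` and `jitter_bounds` are illustrative.

```python
def backoff_delay(attempt, base=1.0, max_delay=30.0):
    """Un-jittered delay before retry number `attempt` (1-indexed)."""
    return min(base * 2 ** (attempt - 1), max_delay)


def jitter_bounds(delay, jitter=0.2):
    """Lowest and highest delay after applying +/- `jitter` as a fraction."""
    return delay * (1 - jitter), delay * (1 + jitter)


for attempt in range(1, 5):
    delay = backoff_delay(attempt)
    low, high = jitter_bounds(delay)
    print(f"attempt {attempt}: {delay:.1f}s  ({low:.1f}s - {high:.1f}s)")
```

Once the exponential term exceeds `max_delay`, every later attempt waits the capped value, so the schedule flattens instead of growing without bound.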

In [None]:
import random
import asyncio
import time
from functools import wraps

def retry_with_backoff(max_retries=3, base_delay=1, max_delay=30, jitter=0.2,
                       retry_on=(Exception,)):
    """Decorator for retry with exponential backoff and jitter.

    Narrow `retry_on` to transient errors (e.g. ConnectionError, TimeoutError)
    so that permanent failures are not retried.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    if attempt == max_retries:
                        raise
                    # Cap the exponential delay, then apply +/- jitter.
                    delay = min(base_delay * (2 ** attempt), max_delay)
                    delay *= (1 + random.uniform(-jitter, jitter))
                    print(f"Attempt {attempt+1} failed. Retrying in {delay:.2f}s...")
                    time.sleep(delay)
        return wrapper
    return decorator

# Async timeout pattern
async def fetch_with_timeout(coro, timeout_seconds=5):
    """Execute coroutine with timeout; the coroutine is cancelled on expiry."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout_seconds)
    except asyncio.TimeoutError as exc:
        raise TimeoutError(f"Operation timed out after {timeout_seconds}s") from exc

@retry_with_backoff(max_retries=3, base_delay=0.1)
def flaky_operation():
    if random.random() < 0.6:
        raise ConnectionError("Temporary failure")
    return "Success!"

print("Retry and Timeout patterns ready")
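
The decorator above sleeps with `time.sleep`, which would block the event loop if used inside a coroutine. A possible async variant that combines a per-attempt timeout with backoff is sketched below. The names `retry_async`, `make_call`, and `flaky` are hypothetical, and the delays are kept tiny so the example finishes quickly.

```python
import asyncio
import random


async def retry_async(make_call, max_retries=3, base_delay=0.01,
                      max_delay=1.0, jitter=0.2, timeout=0.5):
    """Await make_call() with a per-attempt timeout and backoff between tries.

    `make_call` is a zero-argument function returning a fresh coroutine,
    because a coroutine object cannot be awaited twice.
    """
    for attempt in range(max_retries + 1):
        try:
            return await asyncio.wait_for(make_call(), timeout=timeout)
        except (ConnectionError, asyncio.TimeoutError):
            if attempt == max_retries:
                raise
            delay = min(base_delay * 2 ** attempt, max_delay)
            delay *= 1 + random.uniform(-jitter, jitter)
            await asyncio.sleep(delay)  # yields to other tasks while waiting


calls = {"count": 0}


async def flaky():
    """Hypothetical operation that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("Temporary failure")
    return "Success!"


result = asyncio.run(retry_async(flaky))
print(result, "after", calls["count"], "attempts")
```

Only `ConnectionError` and timeouts are retried here; any other exception propagates on the first attempt, in line with the advice to retry transient errors only.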

## 3. Bulkhead Pattern

Isolate components to contain failures and prevent resource exhaustion.

```
WITHOUT BULKHEAD                   WITH BULKHEAD
───────────────────               ─────────────────────
┌─────────────────┐               ┌─────┬─────┬─────┐
│ Shared Pool     │               │Pool │Pool │Pool │
│ ████████████████│ ◀─ All        │ A   │ B   │ C   │
│ All services    │    exhausted  │ ███ │     │ ██  │
└─────────────────┘               └─────┴─────┴─────┘
     Total failure                   A fails, B & C work!

IMPLEMENTATIONS:
┌────────────────┬───────────────────────────────────┐
│ Thread Pool    │ Separate pools per service        │
│ Semaphore      │ Limit concurrent calls            │
│ Container/Pod  │ K8s resource limits per service   │
└────────────────┴───────────────────────────────────┘
```

In [None]:
from concurrent.futures import ThreadPoolExecutor
from functools import wraps
from threading import Semaphore

class BulkheadSemaphore:
    """Semaphore-based bulkhead limiting concurrent access."""

    def __init__(self, name: str, max_concurrent: int):
        self.name = name
        self.semaphore = Semaphore(max_concurrent)

    def __call__(self, func):
        @wraps(func)  # preserve the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            # Non-blocking acquire: reject immediately instead of queueing.
            if not self.semaphore.acquire(blocking=False):
                raise BulkheadFullError(f"{self.name} bulkhead at capacity")
            try:
                return func(*args, **kwargs)
            finally:
                self.semaphore.release()
        return wrapper

class BulkheadFullError(Exception):
    pass

# Thread pool bulkhead - isolated pools per service
class ServiceBulkheads:
    def __init__(self):
        self.pools = {
            "payment": ThreadPoolExecutor(max_workers=10),
            "inventory": ThreadPoolExecutor(max_workers=5),
            "notification": ThreadPoolExecutor(max_workers=3),
        }
    
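    def submit(self, service: str, func, *args, **kwargs):
        # Each service queues on its own pool, so a slow service cannot
        # starve the others of worker threads.
        return self.pools[service].submit(func, *args, **kwargs)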

payment_bulkhead = BulkheadSemaphore("payment", max_concurrent=5)

@payment_bulkhead
def process_payment(order_id):
    return {"order_id": order_id, "status": "processed"}

print("Bulkhead patterns ready: semaphore & thread pool isolation")
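
The semaphore bulkhead above targets threaded code. For coroutine-based services, a similar fail-fast limit might look like the sketch below, which relies on `asyncio.Semaphore.locked()` to detect that no permit is free. The names `AsyncBulkhead`, `AsyncBulkheadFullError`, and `slow_payment` are hypothetical.

```python
import asyncio


class AsyncBulkheadFullError(Exception):
    pass


class AsyncBulkhead:
    """Fail-fast concurrency limit for coroutines (illustrative sketch)."""

    def __init__(self, name, max_concurrent):
        self.name = name
        self._sem = asyncio.Semaphore(max_concurrent)

    async def run(self, coro_fn, *args):
        # locked() is True when no permit is free, so reject instead of queueing.
        if self._sem.locked():
            raise AsyncBulkheadFullError(f"{self.name} bulkhead at capacity")
        async with self._sem:
            return await coro_fn(*args)


async def slow_payment(order_id):
    await asyncio.sleep(0.05)  # simulated I/O
    return order_id


async def main():
    bulkhead = AsyncBulkhead("payment", max_concurrent=2)
    return await asyncio.gather(
        *(bulkhead.run(slow_payment, i) for i in range(3)),
        return_exceptions=True,
    )


results = asyncio.run(main())
print(results)
```

With a limit of two, the first two calls proceed and the third is rejected at once, so a backlog in one dependency surfaces as a fast error rather than a growing queue.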

## 4. Fallback Strategies

```
FALLBACK CHAIN:
  ┌──────────┐   fail   ┌───────────┐  fail  ┌─────────┐  fail  ┌─────────┐
  │ Primary  │ ────────▶│ Secondary │ ──────▶│  Cache  │ ──────▶│ Default │
  │ Service  │          │  Service  │        │ (stale) │        │  Value  │
  └──────────┘          └───────────┘        └─────────┘        └─────────┘

STRATEGIES:
┌────────────────┬─────────────────────────────────────────────┐
│ Cache Fallback │ Return cached/stale data when service down │
│ Default Value  │ Return safe default (empty list, zero)     │
│ Alternate Svc  │ Call backup/secondary service              │
│ Queue for Later│ Accept request, process asynchronously     │
└────────────────┴─────────────────────────────────────────────┘
```

In [None]:
from functools import wraps

def with_fallback(*fallbacks):
    """Decorator that tries fallback functions on failure."""
    def decorator(primary):
        @wraps(primary)
        def wrapper(*args, **kwargs):
            last_error = None
            try:
                return primary(*args, **kwargs)
            except Exception as e:
                last_error = e
                print(f"Primary failed: {e}")

            for i, fallback in enumerate(fallbacks):
                try:
                    return fallback(*args, **kwargs)
                except Exception as e:
                    last_error = e
                    print(f"Fallback {i+1} failed: {e}")

            # Chain the last failure so the root cause stays in the traceback.
            raise RuntimeError("All fallbacks exhausted") from last_error
        return wrapper
    return decorator

# Example: Product lookup with fallback chain
_cache = {"prod_123": {"name": "Widget", "price": 29.99, "cached": True}}

def from_primary(pid): raise ConnectionError("DB down")
def from_replica(pid): raise ConnectionError("Replica down")
def from_cache(pid):
    # Raise on a miss so the chain moves on to the next fallback.
    if pid not in _cache:
        raise KeyError("Miss")
    return _cache[pid]
def default_val(pid): return {"name": "Unknown", "error": "Unavailable"}

@with_fallback(from_replica, from_cache, default_val)
def get_product(product_id):
    return from_primary(product_id)

result = get_product("prod_123")
print(f"Result: {result}")
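
The "Queue for Later" strategy from the table is not covered by the chain above, since it accepts the request instead of returning data. A minimal sketch follows, assuming an in-memory `queue.Queue` stands in for a durable queue. The names `send_notification`, `notify_or_enqueue`, and `retry_queue` are hypothetical.

```python
import queue

retry_queue = queue.Queue()  # stand-in for a durable queue (assumption)


def send_notification(user_id, message):
    """Hypothetical downstream call that is currently failing."""
    raise ConnectionError("Notification service down")


def notify_or_enqueue(user_id, message):
    """Try the call; on failure, park the request for a background worker."""
    try:
        send_notification(user_id, message)
        return {"status": "sent"}
    except ConnectionError:
        retry_queue.put({"user_id": user_id, "message": message})
        return {"status": "queued"}


response = notify_or_enqueue("user_42", "Order shipped")
print(response, "pending:", retry_queue.qsize())
```

This only suits operations whose result the caller does not need right away, and a real deployment would need the queue to survive process restarts.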

## 5. Performance Optimization

```
CONNECTION POOLING:
  WITHOUT POOL              WITH POOL
  ────────────              ─────────
  Request → Connect         Request ─┐
  Request → Connect              Pool├──▶ Reuse (skips setup cost)
  Request → Connect         Request ─┘
           ↓
  ~50ms setup each          ~0.1ms from pool
  (figures are illustrative; actual cost depends on network and TLS)

ASYNC I/O MODEL:
  SYNC (Blocking)           ASYNC (Non-blocking)
  ────────────────          ─────────────────────
  Thread 1: [====wait====]  Task 1: [==]     [==]
  Thread 2: [====wait====]  Task 2:    [===]    [=]
  Thread 3: [====wait====]  Task 3:      [==] [===]
       ↓                           ↓
  3 threads, 3 requests     1 thread, 3 concurrent requests
```
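
The difference between the two columns of the async diagram can be observed with simulated latency. The sketch below assumes `asyncio.sleep` as a stand-in for a network wait, and the helper names are illustrative. Exact timings will vary by machine.

```python
import asyncio
import time


async def fake_request(i, latency=0.05):
    await asyncio.sleep(latency)  # stands in for a network wait
    return i


async def run_sequential(n):
    # Each request waits for the previous one to finish.
    return [await fake_request(i) for i in range(n)]


async def run_concurrent(n):
    # All requests wait at the same time on one thread.
    return await asyncio.gather(*(fake_request(i) for i in range(n)))


def timed(coro_fn, n):
    start = time.perf_counter()
    results = asyncio.run(coro_fn(n))
    return results, time.perf_counter() - start


seq_results, seq_time = timed(run_sequential, 5)
con_results, con_time = timed(run_concurrent, 5)
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

The sequential run takes roughly the sum of the latencies while the concurrent run takes roughly the longest single latency. That gain applies to I/O-bound work only; CPU-bound tasks would not overlap this way.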

In [None]:
import asyncio
import queue
import threading

class ConnectionPool:
    """Simple thread-safe connection pool."""
    
    def __init__(self, create_conn, max_size=10):
        self._create = create_conn
        self._pool = queue.Queue(maxsize=max_size)
        self._size = 0
        self._lock = threading.Lock()
        self._max = max_size
    
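    def acquire(self, timeout=5):
        try:
            return self._pool.get_nowait()
        except queue.Empty:
            pass
        # No idle connection: reserve a slot under the lock if below the cap.
        with self._lock:
            can_create = self._size < self._max
            if can_create:
                self._size += 1
        if can_create:
            try:
                return self._create()
            except Exception:
                # Give the slot back if connecting fails.
                with self._lock:
                    self._size -= 1
                raise
        # Pool is at capacity: wait for a release (raises queue.Empty on timeout).
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put_nowait(conn)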

# Async concurrent fetch with bounded concurrency
async def fetch_all(urls, max_concurrent=10):
    """Fetch multiple URLs with semaphore-bounded concurrency."""
    semaphore = asyncio.Semaphore(max_concurrent)
    
    async def fetch_one(url):
        async with semaphore:
            await asyncio.sleep(0.1)  # Simulated I/O
            return {"url": url, "status": 200}
    
    return await asyncio.gather(*[fetch_one(u) for u in urls])

print("Performance patterns summary:")
print("─" * 40)
print("│ Connection Pool  │ Skips setup cost  │")
print("│ Async I/O        │ Overlaps I/O waits│")
print("│ Batching         │ Reduce round trips│")
print("─" * 40)
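
Batching appears in the summary but has no example of its own. The sketch below shows the idea by counting round trips to a hypothetical bulk endpoint. The names `chunked`, `fetch_many`, and `fetch_in_batches` are assumptions for illustration, and `fetch_many` only simulates a backend.

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


stats = {"round_trips": 0}


def fetch_many(ids):
    """Hypothetical bulk endpoint: one round trip for the whole batch."""
    stats["round_trips"] += 1
    return {i: {"id": i} for i in ids}


def fetch_in_batches(ids, batch_size=100):
    results = {}
    for batch in chunked(ids, batch_size):
        results.update(fetch_many(batch))
    return results


records = fetch_in_batches(list(range(250)), batch_size=100)
print(len(records), "records in", stats["round_trips"], "round trips")
```

Fetching 250 records one at a time would cost 250 round trips; with a batch size of 100 it costs 3. The right batch size depends on payload limits and latency of the backend in question.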

## Quick Reference

| Pattern | When to Use | Key Libraries |
|---------|-------------|---------------|
| **Circuit Breaker** | External service calls | `pybreaker` (Python), `resilience4j` (Java) |
| **Retry + Backoff** | Transient failures | `tenacity`, `backoff` |
| **Timeout** | Any I/O operation | `asyncio.wait_for` |
| **Bulkhead** | Resource isolation | Thread pools, semaphores |
| **Fallback** | Graceful degradation | Custom decorators |
| **Connection Pool** | DB/HTTP clients | `sqlalchemy`, `httpx` |
| **Async I/O** | High concurrency I/O | `asyncio`, `aiohttp` |

**Key Takeaways:**
- Always set **timeouts** on external calls
- Combine **circuit breaker + retry** for robust external calls
- Use **bulkheads** to isolate failure domains
- **Pool connections** to reduce latency overhead
- Use **async I/O** for I/O-bound workloads with high concurrency