<a href="https://colab.research.google.com/github/tonyjosephsebastians/AI-Design-patterns/blob/main/GROUP_1_%E2%80%94_%E2%80%9CRequests_are_slow_or_unreliable%E2%80%9D.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🔵 GROUP 1: Requests Are Slow or Unreliable
## Foundation Patterns (Start Here)

This group addresses the **most common and dangerous failure** in AI systems:

> **AI workloads are slow, flaky, and expensive, but beginners design APIs as if they were fast and reliable.**

We will intentionally build a **bad system first**, feel the pain, and then fix it step by step.

---

## 🎯 Patterns Covered in Group 1

We learn these **together** because they solve the *same class of problems*:

1. Sync vs Async Execution  
2. Long-Running Task Pattern  
3. Job / Workflow Pattern  
4. Retry + Backoff Pattern  
5. Timeout Pattern  
6. Circuit Breaker Pattern  
7. Partial Result Pattern  
8. Graceful Degradation Pattern  

---

# 🧪 STEP 1: CREATE THE FAILURE  
**(DO THIS FIRST. DO NOT SKIP.)**

---

## 🎯 Goal of Step 1

Understand **why synchronous APIs break** for AI workloads.

If you don't *feel* this failure, the patterns will feel abstract.

---

## 🧱 What We Will Build (INTENTIONALLY BAD)

A naive API that:
- uploads a document
- does "heavy AI processing" (OCR, embeddings, etc.)
- **blocks the HTTP request**

⚠️ This is exactly how many beginner AI APIs are built.

---

## 📌 Colab Cell 1: Install Dependencies

```bash
!pip install fastapi uvicorn nest_asyncio
```


In [20]:
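import time
import nest_asyncio
from fastapi import FastAPI
from fastapi.responses import JSONResponse

# Allow uvicorn to run inside the notebook's already-running event loop
nest_asyncio.apply()

app = FastAPI()

@app.post("/documents")
def upload_document():
    # INTENTIONALLY BAD: the request blocks for the whole "AI processing" time
    time.sleep(20)  # simulate heavy AI work (OCR, embeddings, ...)
    return JSONResponse({"status": "done"})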

In [21]:
import uvicorn
import asyncio

config = uvicorn.Config(app, host="0.0.0.0", port=8000, loop="asyncio")
server = uvicorn.Server(config)
asyncio.run(server.serve())

INFO:     Started server process [259]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [259]


KeyboardInterrupt: 

# PATTERN #1: Sync vs Async Execution

Anything that may take more than a few seconds must be asynchronous.

## Why Sync APIs Fail for AI

Sync APIs assume:
- fast execution
- reliable downstream services
- no retries

AI workloads are:
- slow
- flaky
- retry-prone
- rate-limited (429s)

❌ Sync + AI = broken system

| Type of Work | API Style |
|---|---|
| Chat responses | Sync (often streaming) |
| OCR, embeddings, indexing | Async |
| Agent workflows | Async |
| Batch extraction | Async |

Async by default for AI pipelines.
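To make the cost of blocking concrete without starting a server, here is a minimal sketch that models a sync endpoint as a small thread pool. The pool size, the 0.2 s stand-in for the 20 s of processing, and the names `WORKERS`, `blocking_handler`, and `simulate` are all assumptions made for illustration; a real server's worker model will differ.

```python
import time
from concurrent.futures import ThreadPoolExecutor

WORKERS = 2          # hypothetical number of server worker threads
WORK_SECONDS = 0.2   # scaled-down stand-in for 20 s of "AI processing"
REQUESTS = 6         # requests that arrive at the same moment

def blocking_handler(submitted_at: float) -> float:
    """Simulate a sync endpoint; return the latency the client observed."""
    time.sleep(WORK_SECONDS)
    return time.monotonic() - submitted_at

def simulate() -> list[float]:
    # Every request is submitted at once, but only WORKERS run concurrently;
    # the rest wait in the queue while earlier requests block.
    with ThreadPoolExecutor(max_workers=WORKERS) as pool:
        start = time.monotonic()
        futures = [pool.submit(blocking_handler, start) for _ in range(REQUESTS)]
        return [f.result() for f in futures]

latencies = simulate()
for i, latency in enumerate(latencies, 1):
    print(f"request {i}: {latency:.2f}s")
```

With these numbers the last requests wait roughly three times as long as the first ones, even though each does the same amount of work. Scale `WORK_SECONDS` back up to 20 and the queueing delay quickly exceeds typical client and proxy timeouts.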

# STEP 2: APPLY THE FIRST FIX  
**Introduce an Async Boundary**

## 🎯 Goal of Step 2

- Return immediately to the client
- Move heavy work out of the request lifecycle

This introduces two patterns at once:
- Sync vs Async Execution
- Long-Running Task Pattern

In [None]:
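import uuid
import time
from fastapi import FastAPI, BackgroundTasks
from fastapi.responses import JSONResponse

# Fresh app: routes match in registration order, so re-using the Step 1 app
# would keep serving the blocking POST /documents handler.
app = FastAPI()

jobs = {}  # in-memory job store (temporary; lost on restart)

def process_document(job_id: str):
    # Runs after the response has been sent
    jobs[job_id]["status"] = "running"
    time.sleep(20)  # simulate heavy AI work
    jobs[job_id]["status"] = "done"

@app.post("/documents")
def upload_document(background_tasks: BackgroundTasks):
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "queued"}  # a dict, so the worker can update it
    background_tasks.add_task(process_document, job_id)
    return JSONResponse({"job_id": job_id, "status": "queued"})

In [None]:
@app.get("/jobs/{job_id}")
def get_job_status(job_id: str):
    return jobs.get(job_id, {"error": "job not found"})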


## PATTERNS YOU JUST LEARNED

### ✅ Pattern 1: Sync vs Async Execution
- API responds immediately
- Work continues in background

### ✅ Pattern 2: Long-Running Task Pattern
- Heavy AI work detached from HTTP request
- No blocking of workers

### ✅ Pattern 3: Job / Workflow Pattern
- Explicit job identifier
- Client polls job status
- Work is trackable
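The client side of "client polls job status" could look roughly like the sketch below. `poll_job`, `TERMINAL_STATES`, and the injected `fetch_status` callable are hypothetical names; in practice `fetch_status` would wrap an HTTP `GET /jobs/{job_id}` call, and the terminal set would grow once later steps add `succeeded` and `failed`.

```python
import time
from typing import Callable

TERMINAL_STATES = {"done"}  # assumption: Step 2 only has one terminal state

def poll_job(fetch_status: Callable[[], dict], *, interval: float = 1.0,
             timeout: float = 60.0) -> dict:
    """Call fetch_status() until the job is terminal or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status()
        if job.get("status") in TERMINAL_STATES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {job.get('status')!r} after {timeout}s")
        time.sleep(interval)

# Usage with a fake status source standing in for GET /jobs/{job_id}
_responses = iter([{"status": "queued"}, {"status": "running"}, {"status": "done"}])
final = poll_job(lambda: next(_responses), interval=0.01, timeout=5.0)
print(final)
```

The client-side deadline matters: without it, a job that gets stuck in `running` would make the client poll forever, which is exactly the gap the next steps address on the server side.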

Current limitations:
- ❌ No failure handling
- ❌ No retries
- ❌ No timeout protection
- ❌ No circuit breaker
- ❌ No progress tracking

# 🔵 GROUP 1: Step 3  
## Job States + Failure Handling  
**(Workflow Pattern + State Pattern, GoF)**

---

## 🧠 Why Step 3 exists (Failure First)

Right now, our async version stores only:

```python
jobs[job_id] = {"status": "queued"}
```

### What breaks?

- If background processing crashes, the job can get stuck forever
- There is no progress visibility
- Failures are not captured
- No clear lifecycle guarantees

So we need a real job lifecycle (state machine).

We will:
- Define explicit job states
- Track progress
- Capture errors
- Enforce a simple, predictable lifecycle

Patterns learned here:
- ✅ Job / Workflow Pattern
- ✅ State Pattern (GoF)
- ✅ Partial Result (foundation via progress reporting)

## Job State Machine

Production-friendly lifecycle:

    queued → running → succeeded
                     ↘ failed

(We can add `cancelled` later.)
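One way to actually enforce that lifecycle is a small transition table. This is a sketch, not part of the notebook's cells: `JobStatus` mirrors the enum defined in this step, while `ALLOWED_TRANSITIONS`, `InvalidTransition`, and `transition` are hypothetical names introduced here.

```python
from enum import Enum

class JobStatus(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"

# Allowed moves; terminal states map to an empty set
ALLOWED_TRANSITIONS = {
    JobStatus.QUEUED: {JobStatus.RUNNING},
    JobStatus.RUNNING: {JobStatus.SUCCEEDED, JobStatus.FAILED},
    JobStatus.SUCCEEDED: set(),
    JobStatus.FAILED: set(),
}

class InvalidTransition(Exception):
    pass

def transition(job: dict, new_status: JobStatus) -> None:
    """Move job to new_status, or raise if the lifecycle forbids it."""
    current = job["status"]
    if new_status not in ALLOWED_TRANSITIONS[current]:
        raise InvalidTransition(f"{current.value} -> {new_status.value}")
    job["status"] = new_status

job = {"status": JobStatus.QUEUED}
transition(job, JobStatus.RUNNING)
transition(job, JobStatus.SUCCEEDED)
print(job["status"].value)
```

Routing every status change through one function like this means a bug that tries to move a finished job back to `running` fails loudly instead of silently corrupting the record.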

In [None]:
from enum import Enum

class JobStatus(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


In [None]:
import uuid

jobs = {}

def create_job():
    job_id = str(uuid.uuid4())
    jobs[job_id] = {
        "id": job_id,
        "status": JobStatus.QUEUED,
        "progress": 0,
        "error": None
    }
    return job_id


In [None]:
import time

def process_document(job_id: str):
    try:
        jobs[job_id]["status"] = JobStatus.RUNNING
        jobs[job_id]["progress"] = 10

        # Simulate work steps (OCR, chunking, embeddings...)
        time.sleep(2)
        jobs[job_id]["progress"] = 40

        time.sleep(2)
        jobs[job_id]["progress"] = 70

        # Simulate a deterministic failure for learning
        if job_id.endswith("7"):
            raise RuntimeError("Embedding service failed (simulated)")

        time.sleep(2)
        jobs[job_id]["progress"] = 100
        jobs[job_id]["status"] = JobStatus.SUCCEEDED

    except Exception as e:
        jobs[job_id]["status"] = JobStatus.FAILED
        jobs[job_id]["error"] = str(e)


In [None]:
from fastapi import FastAPI, BackgroundTasks, HTTPException

app = FastAPI()

@app.post("/documents")
def upload_document(background_tasks: BackgroundTasks):
    job_id = create_job()
    background_tasks.add_task(process_document, job_id)
    return {"job_id": job_id, "status": jobs[job_id]["status"]}

@app.get("/jobs/{job_id}")
def get_job(job_id: str):
    job = jobs.get(job_id)
    if not job:
        raise HTTPException(status_code=404, detail="Job not found")
    return job


### ✅ Job / Workflow Pattern

Long-running operations become jobs with:
- job id
- status
- progress
- error

### ✅ State Pattern (GoF)

Job behavior depends on state:
- queued
- running
- succeeded
- failed

### ✅ Partial Result (foundation)

Progress reporting enables:
- better UX
- partial completion later

# 🔵 GROUP 1: Step 4  
## Retry + Exponential Backoff  
**(Retry Pattern + Backoff Strategy)**

---

## 🧠 Why Step 4 exists (Failure First)

In Step 3, we added:
- job states (`queued/running/succeeded/failed`)
- progress updates
- error capture

But the system still fails badly in real life because AI dependencies are **flaky**:

- LLM APIs return **429 Too Many Requests**
- network hiccups happen
- embedding services fail temporarily
- search indexes sometimes time out

### ❌ What breaks without retries?
- One transient failure marks the job as `FAILED` permanently  
- Users re-run jobs manually (wasting money and time)  
- The system looks unreliable

So we need:
✅ automatic retry  
✅ exponential backoff  
✅ a max retry limit  

---

## 🎯 Goal of Step 4

We will:
1. Simulate a **flaky external service**
2. Add a **retry wrapper**
3. Use **exponential backoff**
4. Update the job status properly when retries fail

Patterns learned here:
- ✅ Retry Pattern  
- ✅ Exponential Backoff Strategy  

---

This function fails randomly to mimic:
- 429 throttling
- network instability

In [None]:
import random

class RateLimitError(Exception):
    pass

def flaky_embedding_call():
    r = random.random()

    # ~35% chance of "429"
    if r < 0.35:
        raise RateLimitError("429 Too Many Requests (simulated)")

    # ~10% chance of other transient failure
    if r < 0.45:
        raise RuntimeError("Temporary network failure (simulated)")

    return "ok"

In [None]:
import time

def retry_with_backoff(fn, *, max_retries=5, base_delay=0.5, max_delay=8.0):
    """
    Retry Pattern + Exponential Backoff
    - base_delay grows exponentially: base_delay * 2^attempt
    - max_delay caps the sleep
    """
    attempt = 0
    last_err = None

    while attempt <= max_retries:
        try:
            return fn()
        except Exception as e:
            last_err = e
            if attempt == max_retries:
                break

            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(delay)
            attempt += 1

    raise last_err
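
To see what that wrapper actually waits, the sketch below computes the sleep schedule from the same formula, `min(max_delay, base_delay * 2**attempt)`. It also shows a jittered variant, a common refinement (often called "full jitter") that the notebook's wrapper does not use. `backoff_delays` and `jittered` are hypothetical helper names.

```python
import random

def backoff_delays(max_retries: int, base_delay: float, max_delay: float) -> list[float]:
    """Sleep before retry k (0-based): min(max_delay, base_delay * 2**k)."""
    return [min(max_delay, base_delay * (2 ** k)) for k in range(max_retries)]

def jittered(delay: float, rng: random.Random) -> float:
    """Full jitter: pick a uniform random sleep in [0, delay]."""
    return rng.uniform(0.0, delay)

# Defaults of retry_with_backoff: max_retries=5, base_delay=0.5, max_delay=8.0
schedule = backoff_delays(max_retries=5, base_delay=0.5, max_delay=8.0)
print(schedule)        # [0.5, 1.0, 2.0, 4.0, 8.0]
print(sum(schedule))   # 15.5 seconds of waiting in the worst case

rng = random.Random(42)  # seeded only to make the demo repeatable
print([round(jittered(d, rng), 2) for d in schedule])
```

Jitter matters when many jobs fail at the same moment: without it they all retry in lockstep and hit the rate limit together again.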


In [None]:
def process_document(job_id: str):
    try:
        jobs[job_id]["status"] = JobStatus.RUNNING
        jobs[job_id]["progress"] = 10
        jobs[job_id]["error"] = None

        # Step A: OCR/chunking simulation
        time.sleep(1)
        jobs[job_id]["progress"] = 35

        # Step B: Embeddings (flaky) with retries
        # Store how many retries we used for observability
        retries_used = 0

        def call_with_count():
            nonlocal retries_used
            retries_used += 1
            return flaky_embedding_call()

        result = retry_with_backoff(call_with_count, max_retries=4, base_delay=0.5, max_delay=4.0)

        jobs[job_id]["progress"] = 75
        jobs[job_id]["retries_used"] = retries_used - 1  # first call isn't a "retry"

        # Step C: Indexing simulation
        time.sleep(1)
        jobs[job_id]["progress"] = 100
        jobs[job_id]["status"] = JobStatus.SUCCEEDED

    except Exception as e:
        jobs[job_id]["status"] = JobStatus.FAILED
        jobs[job_id]["error"] = str(e)


## Patterns Learned

### ✅ Retry Pattern
- transient failures should be retried automatically
- improves reliability without user intervention

### ✅ Exponential Backoff Strategy
- spacing out retries reduces load
- avoids hammering services during outages/throttling

# 🔵 GROUP 1: Step 5  
## Timeouts + Circuit Breaker  
**(Timeout Pattern + Circuit Breaker Pattern)**

---

## 🧠 Why Step 5 exists (Failure First)

In Step 4, we added:
- retries
- exponential backoff

But retries alone are not enough.

### ❌ What breaks without timeouts?
- A dependency can **hang forever** (LLM call stuck, search stuck)
- Your job stays `RUNNING` forever
- Workers get stuck and stop processing other jobs

### ❌ What breaks without a circuit breaker?
- If a dependency is **down** (or constantly returning 429/500), retries keep hammering it
- You waste time and money
- You cause cascading failures across your system

So we need:
✅ **Timeouts**: "don't wait forever"  
✅ **Circuit breaker**: "stop calling a broken dependency temporarily"

---

## 🎯 Goal of Step 5

We will:
1. Add a **timeout wrapper** around external calls
2. Add a **circuit breaker** that:
   - opens after too many failures
   - blocks calls for a cooldown period
   - attempts recovery (half-open)

Patterns learned:
- ✅ Timeout Pattern
- ✅ Circuit Breaker Pattern

---


In [None]:
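import concurrent.futures

class CallTimeoutError(TimeoutError):
    """Raised when a guarded call exceeds its time budget.

    Subclasses the built-in TimeoutError instead of shadowing it,
    so `except TimeoutError` still catches it.
    """

_executor = concurrent.futures.ThreadPoolExecutor(max_workers=8)

def run_with_timeout(fn, timeout_seconds: float):
    """
    Timeout Pattern:
    - run fn in a worker thread
    - if it doesn't finish in timeout_seconds => raise CallTimeoutError

    Caveat: Python cannot kill a running thread, so a timed-out fn keeps
    occupying a pool worker until it returns on its own.
    """
    future = _executor.submit(fn)
    try:
        return future.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError:
        future.cancel()  # only has an effect if fn has not started yet
        raise CallTimeoutError(f"Timed out after {timeout_seconds}s") from None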
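
One caveat of a thread-based timeout is worth seeing in action: the caller stops waiting, but the work itself is not interrupted. The sketch below is a standalone demonstration with scaled-down durations; `slow_call` and the `finished` event are hypothetical names used only here.

```python
import concurrent.futures
import threading
import time

executor = concurrent.futures.ThreadPoolExecutor(max_workers=2)
finished = threading.Event()

def slow_call():
    time.sleep(0.3)      # stand-in for a hanging dependency
    finished.set()       # proves the work carried on after the timeout
    return "ok"

future = executor.submit(slow_call)
try:
    future.result(timeout=0.05)
    timed_out = False
except concurrent.futures.TimeoutError:
    timed_out = True

still_running = not finished.is_set()        # checked right after the timeout
finished_later = finished.wait(timeout=2.0)  # the thread completes on its own
executor.shutdown(wait=True)

print("timed out:", timed_out)
print("still running after timeout:", still_running)
print("finished later:", finished_later)
```

In other words, this style of timeout frees the job from waiting but does not free the pool worker. If the dependency offers its own timeout parameter (most HTTP clients do), setting that as well is usually the more reliable option.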

✅ Part B: Circuit Breaker Pattern

In [None]:
import time
from dataclasses import dataclass

class CircuitOpenError(Exception):
    pass

@dataclass
class CircuitBreaker:
    failure_threshold: int = 3      # consecutive failures before opening
    recovery_seconds: int = 10      # how long to stay open
    half_open_trial: int = 1        # trial calls allowed in half-open

    _failures: int = 0
    _state: str = "CLOSED"          # CLOSED, OPEN, HALF_OPEN
    _opened_at: float = 0.0
    _half_open_used: int = 0

    def allow_call(self) -> None:
        now = time.monotonic()  # monotonic clock: immune to wall-clock adjustments

        if self._state == "OPEN":
            if now - self._opened_at >= self.recovery_seconds:
                # Cooldown elapsed: let a limited number of trial calls through
                self._state = "HALF_OPEN"
                self._half_open_used = 0
            else:
                raise CircuitOpenError("Circuit is OPEN: calls blocked temporarily")

        if self._state == "HALF_OPEN":
            if self._half_open_used >= self.half_open_trial:
                raise CircuitOpenError("Circuit HALF_OPEN: trial already used")
            self._half_open_used += 1

    def record_success(self) -> None:
        self._failures = 0
        self._state = "CLOSED"

    def record_failure(self) -> None:
        self._failures += 1
        # A failed half-open trial also lands here: the count is already at the
        # threshold, so the circuit re-opens and the cooldown restarts.
        if self._failures >= self.failure_threshold:
            self._state = "OPEN"
            self._opened_at = time.monotonic()


In [None]:
breaker = CircuitBreaker(failure_threshold=3, recovery_seconds=10, half_open_trial=1)

def guarded_external_call(fn, *, timeout_seconds=3.0):
    """
    Combines:
    - Circuit Breaker Pattern (stop calling when dependency is failing)
    - Timeout Pattern (avoid hanging forever)
    """
    breaker.allow_call()
    try:
        result = run_with_timeout(fn, timeout_seconds=timeout_seconds)
        breaker.record_success()
        return result
    except Exception:
        breaker.record_failure()
        raise
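
To watch the breaker move through its states without waiting out a real cooldown, the sketch below uses a compact breaker with an injectable clock. `DemoBreaker` is a hypothetical, simplified stand-in for the `CircuitBreaker` above (it omits the half-open trial limit), written only so the walkthrough is self-contained.

```python
class CircuitOpenError(Exception):
    pass

class DemoBreaker:
    """Compact breaker with an injectable clock (for illustration only)."""

    def __init__(self, clock, failure_threshold=3, recovery_seconds=10):
        self.clock = clock
        self.failure_threshold = failure_threshold
        self.recovery_seconds = recovery_seconds
        self.failures = 0
        self.state = "CLOSED"
        self.opened_at = 0.0

    def allow_call(self):
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.recovery_seconds:
                raise CircuitOpenError("blocked")
            self.state = "HALF_OPEN"   # cooldown over: permit a trial call

    def record_success(self):
        self.failures = 0
        self.state = "CLOSED"

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()

now = [0.0]                      # fake clock we can advance by hand
demo = DemoBreaker(clock=lambda: now[0])
trace = []

for _ in range(3):               # three failures in a row
    demo.allow_call()
    demo.record_failure()
trace.append(demo.state)         # breaker has opened

try:
    demo.allow_call()            # still inside the cooldown window
except CircuitOpenError:
    trace.append("blocked")

now[0] += 10                     # cooldown elapses
demo.allow_call()
trace.append(demo.state)         # trial call permitted
demo.record_success()
trace.append(demo.state)         # dependency recovered

print(trace)  # ['OPEN', 'blocked', 'HALF_OPEN', 'CLOSED']
```

Injecting the clock is a design choice for testability; the notebook's breaker reads the system clock directly, which is fine for a demo but makes the cooldown hard to unit-test.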


In [None]:
import random

def flaky_or_hanging_embedding_call():
    r = random.random()

    # 20% chance to hang longer than timeout
    if r < 0.20:
        time.sleep(10)  # will exceed timeout_seconds
        return "ok"

    # 30% chance of throttling
    if r < 0.50:
        raise RateLimitError("429 Too Many Requests (simulated)")

    # 10% chance of other transient failure
    if r < 0.60:
        raise RuntimeError("Temporary network failure (simulated)")

    return "ok"


Update your job processor to use timeout + circuit breaker

In [None]:
def process_document(job_id: str):
    try:
        jobs[job_id]["status"] = JobStatus.RUNNING
        jobs[job_id]["progress"] = 10
        jobs[job_id]["error"] = None

        # Step A: chunking simulation
        time.sleep(1)
        jobs[job_id]["progress"] = 35

        # Step B: embeddings with:
        # - circuit breaker
        # - timeout
        # - retry + backoff (from Step 4)
        retries_used = 0

        def call_embedding():
            return guarded_external_call(
                flaky_or_hanging_embedding_call,
                timeout_seconds=2.0
            )

        def call_with_count():
            nonlocal retries_used
            retries_used += 1
            return call_embedding()

        # bounded retries (still important!)
        retry_with_backoff(call_with_count, max_retries=3, base_delay=0.5, max_delay=4.0)

        jobs[job_id]["retries_used"] = max(0, retries_used - 1)
        jobs[job_id]["progress"] = 75

        # Step C: indexing simulation
        time.sleep(1)
        jobs[job_id]["progress"] = 100
        jobs[job_id]["status"] = JobStatus.SUCCEEDED

    except CircuitOpenError as e:
        jobs[job_id]["status"] = JobStatus.FAILED
        jobs[job_id]["error"] = f"CIRCUIT_OPEN: {str(e)}"

    except TimeoutError as e:
        jobs[job_id]["status"] = JobStatus.FAILED
        jobs[job_id]["error"] = f"TIMEOUT: {str(e)}"

    except Exception as e:
        jobs[job_id]["status"] = JobStatus.FAILED
        jobs[job_id]["error"] = str(e)
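
One interaction is easy to miss: `retry_with_backoff` from Step 4 retries every exception, so it also sleeps and retries when the breaker raises `CircuitOpenError`, even though the circuit will stay open for the whole cooldown. A possible refinement, sketched here under the assumption that open-circuit errors should fail fast, is to let the retry wrapper skip certain exception types. `retry_transient` and its `give_up_on` parameter are hypothetical names.

```python
import time

class CircuitOpenError(Exception):
    """Stand-in for the breaker error defined earlier in this step."""

def retry_transient(fn, *, max_retries=3, base_delay=0.5, max_delay=4.0,
                    give_up_on=(CircuitOpenError,), sleep=time.sleep):
    """Retry fn with exponential backoff, but re-raise give_up_on errors at once."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except give_up_on:
            raise                      # retrying cannot help while the circuit is open
        except Exception:
            if attempt == max_retries:
                raise                  # retry budget exhausted: surface the last error
            sleep(min(max_delay, base_delay * (2 ** attempt)))

calls = {"n": 0}

def always_open():
    calls["n"] += 1
    raise CircuitOpenError("circuit is open")

try:
    retry_transient(always_open, sleep=lambda seconds: None)
except CircuitOpenError:
    print("gave up after", calls["n"], "call")  # gave up after 1 call
```

Whether to fail fast or keep retrying across the cooldown is a judgment call; failing fast frees the worker sooner, while retrying may rescue a job if the cooldown is short compared with the total backoff time.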


## Patterns Learned

### ✅ Timeout Pattern
- Prevents "hang forever" behavior
- Keeps workers healthy
- Forces fast failure

### ✅ Circuit Breaker Pattern
- Stops hammering a broken dependency
- Protects your system during outages
- Enables controlled recovery

# 🔵 GROUP 1: Step 6  
## Partial Results + Graceful Degradation  
**(Partial Result Pattern + Graceful Degradation Pattern)**

---

## 🧠 Why Step 6 exists (Failure First)

By the end of Step 5 we had added:
- retries + backoff
- **timeouts**
- circuit breaker

Now the system is *safer*, but still frustrating:

### ❌ What breaks without partial results?
If a job fails at 80%, the user gets **nothing**, even though:
- OCR might be done
- chunking might be done
- some embeddings might be done
- indexing might be partially done

That wastes time and money.

### ❌ What breaks without graceful degradation?
When dependencies fail (LLM down, embedding slow), the system should still:
- return a simpler result
- reduce features
- fall back to cached or keyword-only search
- keep UX usable

So we need:
✅ store partial artifacts as we go  
✅ return something useful even when the "best path" fails  

---

## 🎯 Goal of Step 6

We will:
1. Track **stages** of the job (OCR → chunk → embed → index)
2. Store **partial outputs** in the job record
3. If a stage fails, **degrade gracefully** instead of total failure
4. Make the job result explain *what it did and didn't do*

Patterns learned:
- ✅ Partial Result Pattern  
- ✅ Graceful Degradation Pattern  

In [None]:
from enum import Enum

class PipelineStage(str, Enum):
    OCR = "ocr"
    CHUNK = "chunk"
    EMBED = "embed"
    INDEX = "index"

In [None]:
def create_job():
    job_id = str(uuid.uuid4())
    jobs[job_id] = {
        "id": job_id,
        "status": JobStatus.QUEUED,
        "progress": 0,
        "error": None,
        "completed_stages": [],
        "warnings": [],
        "artifacts": {
            "text_preview": None,
            "chunks_preview": [],
            "embeddings_done": 0,
            "indexed": False,
        },
        "result": None
    }
    return job_id

Step 6C: Stage functions (simulate partial outputs)

In [None]:
def do_ocr():
    # pretend we extracted text from a scanned PDF
    time.sleep(1)
    return "This is extracted text from the PDF. It contains clauses and definitions..."

def do_chunking(text: str):
    # pretend we chunked the text
    time.sleep(1)
    chunks = [text[i:i+40] for i in range(0, min(len(text), 160), 40)]
    return chunks

In [None]:
def embed_chunk(chunk: str):
    # use guarded call from Step 5 (timeout + circuit breaker)
    return guarded_external_call(flaky_or_hanging_embedding_call, timeout_seconds=2.0)

def index_chunks(chunks):
    # indexing simulation
    time.sleep(1)
    return True


Step 6D: Graceful degradation rules

When embedding fails, we will:
- still mark OCR + chunk done
- set a warning
- degrade to "keyword-only mode" (simulated)
- mark the job as succeeded in degraded mode (or "succeeded_with_warnings")

To keep it simple, we'll keep the status `SUCCEEDED` but include warnings.
(Production systems often use `SUCCEEDED_WITH_WARNINGS`.)
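
If you did want the separate status, a minimal sketch could look like this. The extra enum member and the `final_status` helper are assumptions for illustration; the notebook's own `JobStatus` does not define them.

```python
from enum import Enum

class JobStatus(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    SUCCEEDED_WITH_WARNINGS = "succeeded_with_warnings"  # hypothetical extra state
    FAILED = "failed"

def final_status(warnings: list[str]) -> JobStatus:
    """Pick the terminal status for a job that ran to the end."""
    return JobStatus.SUCCEEDED_WITH_WARNINGS if warnings else JobStatus.SUCCEEDED

clean = final_status([])
degraded = final_status(["Degraded: index skipped"])
print(clean.value, degraded.value)  # succeeded succeeded_with_warnings
```

A distinct status lets clients and dashboards spot degraded jobs without inspecting the warnings list, at the cost of one more state every consumer has to handle.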

In [22]:
def process_document(job_id: str):
    try:
        jobs[job_id]["status"] = JobStatus.RUNNING
        jobs[job_id]["progress"] = 5

        # Stage 1: OCR
        text = do_ocr()
        jobs[job_id]["completed_stages"].append(PipelineStage.OCR)
        jobs[job_id]["artifacts"]["text_preview"] = text[:120]
        jobs[job_id]["progress"] = 25

        # Stage 2: Chunking
        chunks = do_chunking(text)
        jobs[job_id]["completed_stages"].append(PipelineStage.CHUNK)
        jobs[job_id]["artifacts"]["chunks_preview"] = chunks[:3]
        jobs[job_id]["progress"] = 50

        # Stage 3: Embeddings (best effort)
        embeddings_done = 0
        for c in chunks:
            try:
                # Retry (Step 4) + Timeout/Circuit (Step 5) are applied here
                retry_with_backoff(lambda: embed_chunk(c), max_retries=2, base_delay=0.5, max_delay=2.0)
                embeddings_done += 1
            except Exception as e:
                # Graceful degradation: stop embedding, continue with what we have
                jobs[job_id]["warnings"].append(
                    f"Degraded: embeddings stopped early due to error: {type(e).__name__}"
                )
                break

        jobs[job_id]["artifacts"]["embeddings_done"] = embeddings_done
        if embeddings_done > 0:
            jobs[job_id]["completed_stages"].append(PipelineStage.EMBED)
        jobs[job_id]["progress"] = 75

        # Stage 4: Indexing (only if embeddings done, otherwise degrade)
        if embeddings_done > 0:
            ok = index_chunks(chunks)
            jobs[job_id]["artifacts"]["indexed"] = bool(ok)
            jobs[job_id]["completed_stages"].append(PipelineStage.INDEX)
        else:
            # Degrade: keyword-only / basic text search mode
            jobs[job_id]["warnings"].append(
                "Degraded: index skipped; system will use keyword-only search for this document."
            )

        jobs[job_id]["progress"] = 100
        jobs[job_id]["status"] = JobStatus.SUCCEEDED

        # Final result summary (what the system achieved)
        jobs[job_id]["result"] = {
            "mode": "full_rag" if jobs[job_id]["artifacts"]["indexed"] else "keyword_only",
            "completed_stages": [s.value for s in jobs[job_id]["completed_stages"]],
            "warnings": jobs[job_id]["warnings"],
        }

    except Exception as e:
        # If something truly fatal happens very early
        jobs[job_id]["status"] = JobStatus.FAILED
        jobs[job_id]["error"] = str(e)


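## What to Observe When Testing

1. Create jobs using `POST /documents`
2. Poll `GET /jobs/{job_id}`

You should see outcomes like:

### ✅ Success (full path)
- completed_stages: ocr, chunk, embed, index
- mode: full_rag
- warnings: []

### ✅ Success with degradation
- completed_stages: ocr, chunk
- mode: keyword_only
- warnings explain why it degraded

If only some chunks were embedded before the error, the job still indexes what it has: `embed` and `index` appear in completed_stages, mode stays full_rag, and warnings records that embedding stopped early.

### ❌ Failure (rare)
- only if OCR/chunking fails in our simple demo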

## 🧠 Patterns Learned

### ✅ Partial Result Pattern

Store useful intermediate outputs:
- extracted text preview
- chunk previews
- number of embeddings completed

The job still provides value even if later stages fail.

### ✅ Graceful Degradation Pattern

When the "best path" fails, the system switches to a simpler mode:
- skip indexing
- fall back to keyword-only search

The user still gets usable output plus clear warnings.