From 486296cbbf1984120b5ad5a8686a2953a95b92ad Mon Sep 17 00:00:00 2001
From: "Gabe@w7dev" <gabor@w7-7.net>
Date: Sun, 1 Feb 2026 10:44:09 +0000
Subject: [PATCH 1/3] docs: restructure INITIAL-9 into modular three-phase
 roadmap

Decompose monolithic INITIAL-9 into three specialized technical phases:

- INITIAL-9: RAG Knowledge Base ("The Memory")
  - pgvector + OpenAI embeddings
  - Markdown/OpenAPI-aware chunking
  - Semantic retrieval endpoints

- INITIAL-10: Agentic Layer ("The Brain")
  - PydanticAI agents (Experiment Orchestrator, RAG Assistant)
  - Tool orchestration with structured outputs
  - Human-in-the-loop approval workflow

- INITIAL-11: ForecastLab Dashboard ("The Face")
  - React 19 + Vite + shadcn/ui
  - TanStack Table/Query for data management
  - Recharts for time series visualization
  - Agent chat interface with streaming

Update PHASE-index.md and DAILY-FLOW.md to align with new structure.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---
 INITIAL-10.md       | 421 ++++++++++++++++++++++++++++++++++++++++++++
 INITIAL-11.md       | 417 +++++++++++++++++++++++++++++++++++++++++++
 INITIAL-9.md        | 352 ++++++++++++++++++++++++++++++++----
 docs/DAILY-FLOW.md  |  36 ++--
 docs/PHASE-index.md |  35 ++--
 5 files changed, 1204 insertions(+), 57 deletions(-)
 create mode 100644 INITIAL-10.md
 create mode 100644 INITIAL-11.md

diff --git a/INITIAL-10.md b/INITIAL-10.md
new file mode 100644
index 00000000..1c510772
--- /dev/null
+++ b/INITIAL-10.md
@@ -0,0 +1,421 @@
+# INITIAL-10.md — Agentic Layer (The Brain)
+
+## Architectural Role
+
+**"The Brain"** - Autonomous decision-making, tool orchestration, and structured outputs using PydanticAI.
+
+This phase provides intelligent orchestration capabilities:
+- Experiment automation (config generation → backtest → deploy)
+- RAG-powered Q&A with evidence-grounded answers and citations
+- Human-in-the-loop approval for sensitive operations
+- Structured, schema-enforced outputs
+
+---
+
+## Tech Stack
+
+| Component | Technology | Purpose |
+|-----------|------------|---------|
+| Agent Framework | [PydanticAI](https://ai.pydantic.dev/) | Type-safe agent orchestration |
+| Tool System | [Function Tools](https://ai.pydantic.dev/tools/) | API binding |
+| Tool Groups | [Toolsets](https://ai.pydantic.dev/toolsets/) | Grouped tool management |
+| LLM Provider | Anthropic Claude / OpenAI GPT-4 | Configurable provider |
+| Streaming | [PydanticAI Streaming](https://ai.pydantic.dev/results/#streamed-results) | Real-time responses |
+
+---
+
+## FEATURE
+
+### Experiment Orchestrator Agent
+Autonomous experiment workflow management:
+- **Tools**: `list_models`, `run_backtest`, `compare_runs`, `create_alias`, `archive_run`
+- **Workflow**: Generate configs → Run backtests → Analyze metrics → Select best → Deploy alias
+- **Output**: Structured `ExperimentReport` with methodology, results, and recommendations
+
+### RAG Assistant Agent
+Evidence-grounded question answering:
+- **Tools**: `retrieve_context` (from INITIAL-9), `format_citation`
+- **Workflow**: Parse query → Retrieve chunks → Synthesize answer → Format citations
+- **Output**: Structured `RAGResponse` with answer, citations, and confidence score
+
+### Agent Session Management
+- Session state persistence for multi-turn conversations
+- Tool call logging with correlation IDs
+- Human-in-the-loop approval for sensitive actions
+- Graceful LLM API failure handling with retries
+
+---
+
+## ENDPOINTS
+
+### POST /agents/experiment/run
+Execute an experiment workflow with the Orchestrator Agent.
+
+**Request**:
+```json
+{
+  "objective": "Find the best model configuration for store S001, product P001",
+  "constraints": {
+    "model_types": ["moving_average", "seasonal_naive"],
+    "min_train_size": 60,
+    "max_splits": 5
+  },
+  "auto_deploy": false,
+  "session_id": "optional-session-id"
+}
+```
+
+**Response**:
+```json
+{
+  "session_id": "sess_abc123",
+  "status": "completed",
+  "report": {
+    "objective": "Find the best model configuration for store S001, product P001",
+    "methodology": "Evaluated 6 configurations using 5-fold expanding window CV",
+    "experiments_run": 6,
+    "best_run": {
+      "run_id": "run_xyz789",
+      "model_type": "moving_average",
+      "config": {"window": 14},
+      "metrics": {
+        "mae": 12.5,
+        "smape": 15.2,
+        "wape": 0.08
+      }
+    },
+    "baseline_comparison": {
+      "vs_naive": {
+        "mae_improvement_pct": 23.5,
+        "smape_improvement_pct": 18.2
+      }
+    },
+    "recommendation": "Deploy moving_average with window=14",
+    "approval_required": true,
+    "pending_action": "create_alias"
+  },
+  "tool_calls": [
+    {
+      "tool": "list_models",
+      "args": {},
+      "result_summary": "Found 4 model types"
+    },
+    {
+      "tool": "run_backtest",
+      "args": {"model_type": "moving_average", "window": 7},
+      "result_summary": "MAE: 15.2"
+    }
+  ],
+  "tokens_used": 2450,
+  "duration_ms": 45000
+}
+```
+
+### POST /agents/experiment/approve
+Approve a pending action from an experiment session.
+
+**Request**:
+```json
+{
+  "session_id": "sess_abc123",
+  "action": "create_alias",
+  "approved": true,
+  "comment": "Approved for staging deployment"
+}
+```
+
+**Response**:
+```json
+{
+  "session_id": "sess_abc123",
+  "action": "create_alias",
+  "status": "executed",
+  "result": {
+    "alias_name": "production",
+    "run_id": "run_xyz789"
+  }
+}
+```
+
+### POST /agents/rag/query
+Query with answer generation using the RAG Assistant Agent.
+
+**Request**:
+```json
+{
+  "query": "How does the backtesting module prevent data leakage?",
+  "session_id": "optional-session-id",
+  "include_sources": true
+}
+```
+
+**Response**:
+```json
+{
+  "session_id": "sess_def456",
+  "answer": "The backtesting module prevents data leakage through several mechanisms:\n\n1. **Time-based splits only**: The TimeSeriesSplitter uses expanding or sliding window strategies, never random splits.\n\n2. **Gap parameter**: Configurable gap between train and test sets simulates operational latency.\n\n3. **Lag feature validation**: Features are computed with explicit cutoff dates to prevent future data access.",
+  "confidence": 0.92,
+  "citations": [
+    {
+      "source_type": "markdown",
+      "source_path": "docs/PHASE/5-BACKTESTING.md",
+      "chunk_id": "chunk_abc123",
+      "snippet": "TimeSeriesSplitter uses time-based splits (expanding/sliding window)...",
+      "relevance_score": 0.94
+    },
+    {
+      "source_type": "markdown",
+      "source_path": "CLAUDE.md",
+      "chunk_id": "chunk_def456",
+      "snippet": "Backtesting uses time-based splits (rolling/expanding), never random split...",
+      "relevance_score": 0.89
+    }
+  ],
+  "tokens_used": 1250,
+  "duration_ms": 3200
+}
+```
+
+### GET /agents/status/{session_id}
+Check agent session status.
+
+**Response**:
+```json
+{
+  "session_id": "sess_abc123",
+  "agent_type": "experiment_orchestrator",
+  "status": "awaiting_approval",
+  "created_at": "2026-02-01T10:30:00Z",
+  "last_activity": "2026-02-01T10:35:00Z",
+  "pending_action": {
+    "action": "create_alias",
+    "details": {
+      "alias_name": "production",
+      "run_id": "run_xyz789"
+    }
+  },
+  "tool_calls_count": 8,
+  "tokens_used": 2450
+}
+```
+
+### WS /agents/stream
+WebSocket endpoint for streaming responses.
+
+**Client → Server**:
+```json
+{
+  "type": "query",
+  "agent": "rag_assistant",
+  "payload": {
+    "query": "Explain the model registry workflow"
+  }
+}
+```
+
+**Server → Client (streaming)**:
+```json
+{"type": "token", "content": "The"}
+{"type": "token", "content": " model"}
+{"type": "token", "content": " registry"}
+{"type": "tool_call", "tool": "retrieve_context", "status": "started"}
+{"type": "tool_call", "tool": "retrieve_context", "status": "completed", "summary": "Found 5 relevant chunks"}
+{"type": "token", "content": " tracks..."}
+{"type": "complete", "session_id": "sess_xyz", "tokens_used": 850}
+```
+
+---
+
+## AGENT DEFINITIONS
+
+### Experiment Orchestrator Agent
+
+```python
+from pydantic_ai import Agent
+from pydantic import BaseModel
+
+class ExperimentReport(BaseModel):
+    """Structured output for experiment results."""
+    objective: str
+    methodology: str
+    experiments_run: int
+    best_run: RunSummary
+    baseline_comparison: BaselineComparison
+    recommendation: str
+    approval_required: bool
+    pending_action: str | None
+
+experiment_agent = Agent(
+    model="anthropic:claude-sonnet-4-20250514",
+    result_type=ExperimentReport,
+    system_prompt="""You are an ML experiment orchestrator for retail demand forecasting.
+
+Your goal is to find the best model configuration through systematic experimentation.
+Always:
+1. Start with baseline models (naive, seasonal_naive)
+2. Compare against baselines with improvement percentages
+3. Use time-based backtesting with appropriate train/test splits
+4. Recommend the best configuration with justification
+5. Request approval before deployment actions""",
+    tools=[list_models, run_backtest, compare_runs, create_alias, archive_run]
+)
+```
+
+### RAG Assistant Agent
+
+```python
+class RAGResponse(BaseModel):
+    """Structured output for RAG queries."""
+    answer: str
+    confidence: float  # 0.0 - 1.0
+    citations: list[Citation]
+    insufficient_context: bool = False
+
+rag_agent = Agent(
+    model="anthropic:claude-sonnet-4-20250514",
+    result_type=RAGResponse,
+    system_prompt="""You are a documentation assistant for ForecastLabAI.
+
+Your responses must be evidence-grounded:
+- Only answer based on retrieved context
+- Include citations for all claims
+- If context is insufficient, set insufficient_context=True and explain what's missing
+- Never hallucinate information not in the retrieved chunks""",
+    tools=[retrieve_context, format_citation]
+)
+```
+
+---
+
+## TOOL DEFINITIONS
+
+### list_models
+```python
+@tool
+async def list_models(ctx: RunContext[AgentDeps]) -> list[ModelInfo]:
+    """List available forecasting models with their configurations.
+
+    Use this to discover what model types can be experimented with.
+    Returns model_type, default_config, and description.
+    """
+    ...
+```
+
+### run_backtest
+```python
+@tool
+async def run_backtest(
+    ctx: RunContext[AgentDeps],
+    model_type: str,
+    config: dict[str, Any],
+    store_id: str,
+    product_id: str,
+    n_splits: int = 5
+) -> BacktestResult:
+    """Run a backtest for a model configuration.
+
+    Use this to evaluate model performance with time-series CV.
+    Returns per-fold and aggregated metrics (MAE, sMAPE, WAPE).
+    """
+    ...
+```
+
+### retrieve_context
+```python
+@tool
+async def retrieve_context(
+    ctx: RunContext[AgentDeps],
+    query: str,
+    top_k: int = 5
+) -> list[RetrievedChunk]:
+    """Retrieve relevant documentation chunks for a query.
+
+    Use this before answering any question about the system.
+    Returns chunks with content, source_path, and relevance_score.
+    """
+    ...
+```
+
+---
+
+## CONFIGURATION (Settings)
+
+```python
+# app/core/config.py additions
+
+# Agent LLM Configuration
+agent_default_model: str = "anthropic:claude-sonnet-4-20250514"
+agent_fallback_model: str = "openai:gpt-4o"
+agent_temperature: float = 0.1
+agent_max_tokens: int = 4096
+
+# Agent Execution Configuration
+agent_max_tool_calls: int = 10
+agent_timeout_seconds: int = 120
+agent_retry_attempts: int = 3
+agent_retry_delay_seconds: float = 1.0
+
+# Human-in-the-Loop Configuration
+agent_require_approval: list[str] = ["create_alias", "archive_run"]
+agent_approval_timeout_minutes: int = 60
+
+# Streaming Configuration
+agent_enable_streaming: bool = True
+agent_stream_chunk_size: int = 10  # tokens per chunk
+
+# Session Configuration
+agent_session_ttl_minutes: int = 120
+agent_max_sessions_per_user: int = 5
+```
+
+---
+
+## SUCCESS CRITERIA
+
+- [ ] Agents produce schema-enforced structured outputs
+- [ ] Tool calls are logged with correlation IDs and timing
+- [ ] Human-in-the-loop approval blocks sensitive actions
+- [ ] Graceful handling of LLM API failures with retries
+- [ ] WebSocket streaming delivers tokens in real-time
+- [ ] Session state persists across multiple requests
+- [ ] Unit tests with mocked LLM responses
+- [ ] Integration tests with real LLM calls (rate-limited)
+- [ ] Structured logging for all agent operations
+- [ ] Token usage tracked per session for cost monitoring
+
+---
+
+## CROSS-MODULE INTEGRATION
+
+| Direction | Module | Integration Point |
+|-----------|--------|-------------------|
+| **← RAG Layer** | INITIAL-9 | Uses `retrieve_context` tool |
+| **← Registry** | Phase 6 | Uses `list_runs`, `compare_runs`, `create_alias` tools |
+| **← Backtesting** | Phase 5 | Uses `run_backtest` tool |
+| **← Forecasting** | Phase 4 | Uses `list_models`, `train_model` tools |
+| **→ Dashboard** | INITIAL-11 | Provides chat interface backend |
+| **→ Jobs** | Phase 7 | Creates job records for audit trail |
+
+---
+
+## DOCUMENTATION LINKS
+
+- [PydanticAI Documentation](https://ai.pydantic.dev/)
+- [PydanticAI Agents](https://ai.pydantic.dev/agents/)
+- [PydanticAI Tools](https://ai.pydantic.dev/tools/)
+- [PydanticAI Toolsets](https://ai.pydantic.dev/toolsets/)
+- [PydanticAI Built-in Tools](https://ai.pydantic.dev/builtin-tools/)
+- [PydanticAI Streaming Results](https://ai.pydantic.dev/results/#streamed-results)
+- [PydanticAI GitHub](https://github.com/pydantic/pydantic-ai)
+- [Anthropic Claude API](https://docs.anthropic.com/en/api)
+
+---
+
+## OTHER CONSIDERATIONS
+
+- **Structured Outputs**: All agent responses are Pydantic models, never raw text
+- **Tool Docstrings**: Follow guidance in CLAUDE.md for agent-optimized tool documentation
+- **Cost Control**: Track and limit token usage per session
+- **Audit Trail**: All tool calls logged with request correlation for debugging
+- **Fallback Provider**: Automatic failover to fallback model on primary failure
+- **Approval Workflow**: Pending actions expire after `agent_approval_timeout_minutes`
diff --git a/INITIAL-11.md b/INITIAL-11.md
new file mode 100644
index 00000000..3138f3c6
--- /dev/null
+++ b/INITIAL-11.md
@@ -0,0 +1,417 @@
+# INITIAL-11.md — ForecastLab Dashboard (The Face)
+
+## Architectural Role
+
+**"The Face"** - User interface, data visualization, and agent interaction using React 19 + shadcn/ui.
+
+This phase provides the visual layer for:
+- Data exploration with server-side pagination and filtering
+- Time series visualization with interactive charts
+- Agent chat interface with streaming responses
+- Admin panel for system management
+
+---
+
+## Tech Stack
+
+| Component | Technology | Purpose |
+|-----------|------------|---------|
+| Framework | React 19 + [Vite](https://vite.dev/) | Fast build, HMR |
+| Components | [shadcn/ui](https://ui.shadcn.com/) | Accessible, customizable UI |
+| Data Tables | [TanStack Table](https://tanstack.com/table/latest) | Server-side data grids |
+| Data Fetching | [TanStack Query](https://tanstack.com/query/latest) | Caching, invalidation |
+| Charts | [Recharts](https://recharts.org/) | Time series visualization |
+| Styling | Tailwind CSS 4 | Utility-first CSS |
+| State | React 19 `use()` + TanStack Query | Server state management |
+
+---
+
+## FEATURE
+
+### Data Explorer
+Interactive data tables with full server-side capabilities:
+- **Tables**: Sales, Stores, Products, Model Runs, Jobs
+- **Features**: Pagination, sorting, filtering, column visibility
+- **Export**: CSV download for selected/all rows
+- **Pattern**: [shadcn/ui Data Table](https://ui.shadcn.com/docs/components/data-table)
+
+### Time Series Visualizers
+Charts for forecasting analysis:
+- **Actual vs Predicted**: Line chart with confidence intervals
+- **Backtest Folds**: Train/test split visualization
+- **Metric Comparison**: Bar charts for model comparison
+- **Interactive**: Tooltips, zoom, pan, brush selection
+
+### Agent Chat Interface
+Real-time interaction with AI agents:
+- **Streaming**: WebSocket-based token streaming
+- **Citations**: Rendered with source links
+- **Tool Calls**: Collapsible visualization of agent actions
+- **History**: Session sidebar with conversation threads
+
+### Admin Panel
+System management and monitoring:
+- **RAG Sources**: Index/delete documentation sources
+- **Model Aliases**: Manage deployment aliases
+- **Health Dashboard**: Service status, recent errors
+- **Job Monitor**: Active and historical job status
+
+---
+
+## PAGE STRUCTURE
+
+### /dashboard
+Main dashboard with KPI summary cards and quick actions.
+
+### /explorer/sales
+Sales data explorer with date range filtering.
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│  Sales Explorer                                    [Export] │
+├─────────────────────────────────────────────────────────────┤
+│  Filters: [Date Range] [Store ▼] [Product ▼] [Search...]    │
+├─────────────────────────────────────────────────────────────┤
+│  Date        │ Store   │ Product │ Quantity │ Revenue       │
+│  2026-01-15  │ S001    │ P001    │ 150      │ $2,250.00     │
+│  2026-01-15  │ S001    │ P002    │ 75       │ $1,125.00     │
+│  ...         │ ...     │ ...     │ ...      │ ...           │
+├─────────────────────────────────────────────────────────────┤
+│  Page 1 of 50  │  [< Prev]  [1] [2] [3] ... [50]  [Next >]  │
+└─────────────────────────────────────────────────────────────┘
+```
+
+### /explorer/runs
+Model run explorer with comparison capabilities.
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│  Model Runs                              [Compare Selected] │
+├─────────────────────────────────────────────────────────────┤
+│  [☐] │ Run ID    │ Model    │ Status  │ MAE   │ Created     │
+│  [☐] │ run_abc   │ MA(14)   │ SUCCESS │ 12.5  │ 2h ago      │
+│  [☐] │ run_def   │ SN(7)    │ SUCCESS │ 15.2  │ 3h ago      │
+│  [☐] │ run_ghi   │ Naive    │ SUCCESS │ 18.9  │ 5h ago      │
+├─────────────────────────────────────────────────────────────┤
+│  Showing 3 of 127 runs                                      │
+└─────────────────────────────────────────────────────────────┘
+```
+
+### /visualize/forecast
+Forecast visualization with actual vs predicted overlay.
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│  Forecast: Store S001, Product P001                         │
+├─────────────────────────────────────────────────────────────┤
+│  [Store ▼] [Product ▼] [Model Run ▼] [Date Range]          │
+├─────────────────────────────────────────────────────────────┤
+│                                                             │
+│  200 ─┤                               ╭──────              │
+│       │                          ╭────╯    Predicted       │
+│  150 ─┤                     ╭────╯                         │
+│       │                ╭────╯      ───── Actual            │
+│  100 ─┤           ╭────╯           - - - Confidence        │
+│       │      ╭────╯                                        │
+│   50 ─┤ ╭────╯                                             │
+│       │─╯                                                   │
+│    0 ─┼────────────────────────────────────────────────    │
+│       Jan 1     Jan 15    Feb 1     Feb 15    Mar 1        │
+│                                                             │
+├─────────────────────────────────────────────────────────────┤
+│  MAE: 12.5  │  sMAPE: 15.2%  │  WAPE: 8.1%  │  Bias: -2.3  │
+└─────────────────────────────────────────────────────────────┘
+```
+
+### /visualize/backtest
+Backtest fold visualization.
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│  Backtest: run_abc123 (5-fold Expanding Window)             │
+├─────────────────────────────────────────────────────────────┤
+│                                                             │
+│  Fold 1: ████████████░░░░  MAE: 14.2  sMAPE: 16.8%         │
+│  Fold 2: █████████████████░░░░  MAE: 13.1  sMAPE: 15.4%    │
+│  Fold 3: ███████████████████████░░░░  MAE: 12.8  sMAPE: 14.9│
+│  Fold 4: █████████████████████████████░░░░  MAE: 11.9      │
+│  Fold 5: ███████████████████████████████████░░░░  MAE: 11.2│
+│                                                             │
+│  █ Train   ░ Test                                          │
+├─────────────────────────────────────────────────────────────┤
+│  Aggregated: MAE: 12.6 ± 1.1  │  Stability: 0.91           │
+└─────────────────────────────────────────────────────────────┘
+```
+
+### /chat
+Agent chat interface with streaming.
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│  ForecastLab Assistant                                      │
+├────────────┬────────────────────────────────────────────────┤
+│  Sessions  │                                                │
+│  ─────────│  How does backtesting prevent data leakage?    │
+│  Today     │                                                │
+│  ◉ Current │  The backtesting module prevents data leakage │
+│  ○ 10:30am │  through several mechanisms:                   │
+│  ○ 9:15am  │                                                │
+│  Yesterday │  1. **Time-based splits**: Uses expanding...   │
+│  ○ 4:45pm  │                                                │
+│            │  📚 Citations:                                  │
+│            │  [1] docs/PHASE/5-BACKTESTING.md               │
+│            │  [2] CLAUDE.md                                 │
+│            │                                                │
+│            │  ──────────────────────────────────────────    │
+│            │  🔧 Tool: retrieve_context (5 chunks found)    │
+│            │  ──────────────────────────────────────────    │
+├────────────┴────────────────────────────────────────────────┤
+│  [Type your question...]                          [Send ➤] │
+└─────────────────────────────────────────────────────────────┘
+```
+
+### /admin
+Admin panel for system management.
+
+---
+
+## COMPONENTS
+
+### DataTable (shadcn/ui pattern)
+
+```tsx
+// components/data-table/data-table.tsx
+import {
+  ColumnDef,
+  flexRender,
+  getCoreRowModel,
+  useReactTable,
+} from "@tanstack/react-table"
+
+interface DataTableProps<TData, TValue> {
+  columns: ColumnDef<TData, TValue>[]
+  data: TData[]
+  pageCount: number
+  pageIndex: number
+  pageSize: number
+  onPaginationChange: (pagination: PaginationState) => void
+  onSortingChange: (sorting: SortingState) => void
+  onFilterChange: (filters: ColumnFiltersState) => void
+}
+
+export function DataTable<TData, TValue>({
+  columns,
+  data,
+  pageCount,
+  ...props
+}: DataTableProps<TData, TValue>) {
+  const table = useReactTable({
+    data,
+    columns,
+    pageCount,
+    manualPagination: true,
+    manualSorting: true,
+    manualFiltering: true,
+    getCoreRowModel: getCoreRowModel(),
+    // ...
+  })
+
+  return (
+    <Table>
+      <TableHeader>...</TableHeader>
+      <TableBody>...</TableBody>
+    </Table>
+  )
+}
+```
+
+### TimeSeriesChart
+
+```tsx
+// components/charts/time-series-chart.tsx
+import { LineChart, Line, XAxis, YAxis, Tooltip, Legend } from 'recharts'
+
+interface TimeSeriesChartProps {
+  data: { date: string; actual: number; predicted?: number }[]
+  showConfidence?: boolean
+  height?: number
+}
+
+export function TimeSeriesChart({ data, showConfidence, height = 400 }: TimeSeriesChartProps) {
+  return (
+    <LineChart data={data} height={height}>
+      <XAxis dataKey="date" />
+      <YAxis />
+      <Tooltip />
+      <Legend />
+      <Line type="monotone" dataKey="actual" stroke="#2563eb" name="Actual" />
+      {data[0]?.predicted !== undefined && (
+        <Line type="monotone" dataKey="predicted" stroke="#16a34a" name="Predicted" strokeDasharray="5 5" />
+      )}
+    </LineChart>
+  )
+}
+```
+
+### ChatMessage
+
+```tsx
+// components/chat/chat-message.tsx
+interface ChatMessageProps {
+  role: 'user' | 'assistant'
+  content: string
+  citations?: Citation[]
+  toolCalls?: ToolCall[]
+  isStreaming?: boolean
+}
+
+export function ChatMessage({ role, content, citations, toolCalls, isStreaming }: ChatMessageProps) {
+  return (
+    <div className={cn("flex", role === 'user' ? 'justify-end' : 'justify-start')}>
+      <div className="max-w-[80%] rounded-lg p-4 bg-muted">
+        <Markdown>{content}</Markdown>
+        {isStreaming && <span className="animate-pulse">▋</span>}
+        {citations && <CitationList citations={citations} />}
+        {toolCalls && <ToolCallList toolCalls={toolCalls} />}
+      </div>
+    </div>
+  )
+}
+```
+
+---
+
+## API HOOKS (TanStack Query)
+
+```tsx
+// hooks/use-sales.ts
+export function useSales(params: SalesQueryParams) {
+  return useQuery({
+    queryKey: ['sales', params],
+    queryFn: () => api.get('/analytics/drilldowns', { params }),
+    placeholderData: keepPreviousData,
+  })
+}
+
+// hooks/use-runs.ts
+export function useRuns(params: RunsQueryParams) {
+  return useQuery({
+    queryKey: ['runs', params],
+    queryFn: () => api.get('/registry/runs', { params }),
+  })
+}
+
+// hooks/use-chat.ts
+export function useChat(sessionId?: string) {
+  const [messages, setMessages] = useState<Message[]>([])
+  const ws = useWebSocket(`${WS_URL}/agents/stream`)
+
+  const sendMessage = useCallback((content: string) => {
+    ws.send(JSON.stringify({ type: 'query', agent: 'rag_assistant', payload: { query: content } }))
+  }, [ws])
+
+  return { messages, sendMessage, isConnected: ws.readyState === WebSocket.OPEN }
+}
+```
+
+---
+
+## CONFIGURATION (Environment)
+
+```env
+# .env.example for frontend
+
+# API Configuration
+VITE_API_BASE_URL=http://localhost:8123
+VITE_WS_URL=ws://localhost:8123/agents/stream
+
+# Feature Flags
+VITE_ENABLE_AGENT_CHAT=true
+VITE_ENABLE_ADMIN_PANEL=true
+
+# Visualization
+VITE_DEFAULT_PAGE_SIZE=25
+VITE_MAX_CHART_POINTS=365
+```
+
+---
+
+## EXAMPLES
+
+### examples/ui/README.md
+```markdown
+# Dashboard Page Map
+
+| Page | API Endpoints | Description |
+|------|---------------|-------------|
+| /dashboard | GET /analytics/kpis | KPI summary cards |
+| /explorer/sales | GET /analytics/drilldowns | Sales data table |
+| /explorer/runs | GET /registry/runs | Model run table |
+| /visualize/forecast | GET /forecasting/predict | Forecast chart |
+| /visualize/backtest | GET /backtesting/results/{run_id} | Fold visualization |
+| /chat | WS /agents/stream | Agent chat |
+| /admin | GET /rag/sources, GET /registry/aliases | Admin panel |
+
+## Running the Dashboard
+
+\`\`\`bash
+cd frontend
+pnpm install
+pnpm dev
+\`\`\`
+
+Open http://localhost:5173
+```
+
+---
+
+## SUCCESS CRITERIA
+
+- [ ] Data tables handle 10k+ rows with virtual scrolling
+- [ ] Server-side pagination, sorting, filtering all functional
+- [ ] Charts render smoothly with 365+ data points
+- [ ] WebSocket chat shows streaming tokens in real-time
+- [ ] Citations render as clickable source links
+- [ ] Tool calls displayed in collapsible sections
+- [ ] Responsive design works on tablet and mobile
+- [ ] Lighthouse performance score > 90
+- [ ] Accessibility: keyboard navigation, screen reader support
+- [ ] Dark/light theme toggle
+
+---
+
+## CROSS-MODULE INTEGRATION
+
+| Direction | Module | Integration Point |
+|-----------|--------|-------------------|
+| **← RAG Layer** | INITIAL-9 | Displays indexed sources, allows re-indexing |
+| **← Agentic Layer** | INITIAL-10 | Chat interface, experiment status display |
+| **← Registry** | Phase 6 | Run leaderboard, comparison views |
+| **← Analytics** | Phase 7 | KPI dashboard, drilldown charts |
+| **← Jobs** | Phase 7 | Job status monitoring |
+| **← Dimensions** | Phase 7 | Store/product selectors |
+
+---
+
+## DOCUMENTATION LINKS
+
+- [shadcn/ui Documentation](https://ui.shadcn.com/)
+- [shadcn/ui Data Table](https://ui.shadcn.com/docs/components/data-table)
+- [shadcn/ui Table](https://ui.shadcn.com/docs/components/table)
+- [TanStack Table](https://tanstack.com/table/latest)
+- [TanStack Query](https://tanstack.com/query/latest)
+- [Recharts](https://recharts.org/)
+- [Vite Documentation](https://vite.dev/)
+- [React 19 Documentation](https://react.dev/)
+- [Tailwind CSS 4](https://tailwindcss.com/)
+
+---
+
+## OTHER CONSIDERATIONS
+
+- **No Hardcoded URLs**: API base URL from environment variable only
+- **Error Boundaries**: Graceful error handling with retry options
+- **Loading States**: Skeleton components for all async data
+- **Optimistic Updates**: Instant UI feedback for mutations
+- **Caching**: TanStack Query manages cache invalidation
+- **Bundle Size**: Code splitting per route for fast initial load
diff --git a/INITIAL-9.md b/INITIAL-9.md
index e82c4453..da491760 100644
--- a/INITIAL-9.md
+++ b/INITIAL-9.md
@@ -1,33 +1,319 @@
-# INITIAL-9.md — Dashboard + RAG + Agentic Layer (PydanticAI)
-
-## FEATURE:
-- Dashboard (React + Vite + shadcn/ui Data Table):
-  - Data Explorer (tables, filters, export)
-  - Model Runs (leaderboard, compare)
-  - Train & Predict (forms, status)
-  - Predictions (tabular view)
-- RAG assistant (pgvector):
-  - indexed sources: README.md, /docs/*, OpenAPI export, run reports
-  - retrieve top-k → answer with citations
-- Optional PydanticAI:
-  - agent with tools:
-    - experiment orchestrator (generate configs → backtest → select best → report)
-    - rag assistant (query → retrieve → structured answer)
-  - enforced structured outputs
-
-## EXAMPLES:
-- `examples/ui/README.md` — page map + API mapping (no hardcoded base URL).
-- `examples/rag/index_docs.py` — chunk+embed+store (Settings-driven).
-- `examples/rag/query.http` — Q&A returning a citations schema.
-- `examples/agent/` — best-practice agent setup (providers, tools, dependencies).
-
-## DOCUMENTATION:
-- shadcn/ui Data Table pattern + TanStack Table
-- pgvector similarity search + indexing strategies
-- PydanticAI docs (include link in README as a code block)
-
-## OTHER CONSIDERATIONS:
-- Required: `.env.example` for frontend (`VITE_API_BASE_URL`).
-- RAG must be evidence-grounded: if no support, return “not found” (no hallucinations).
-- Stable citation schema: source_type, source_id/path, chunk_id, snippet/span.
-- Embedding model + dimension must come from Settings (never hardcoded).
+# INITIAL-9.md — RAG Knowledge Base (The Memory)
+
+## Architectural Role
+
+**"The Memory"** - Vector storage, document ingestion, and semantic retrieval infrastructure.
+
+This phase provides the foundational knowledge layer that enables:
+- Indexed documentation and run reports for AI-assisted search
+- Semantic retrieval with relevance scoring
+- Evidence-grounded context for the Agentic Layer (INITIAL-10)
+
+---
+
+## Tech Stack
+
+| Component | Technology | Purpose |
+|-----------|------------|---------|
+| Vector Store | PostgreSQL 16 + [pgvector](https://github.com/pgvector/pgvector) | Similarity search |
+| Embeddings | [OpenAI text-embedding-3-small](https://platform.openai.com/docs/models/text-embedding-3-small) | 1536-dim vectors (configurable) |
+| Chunking | Markdown-aware, OpenAPI endpoint-aware | Semantic boundaries |
+| Index Type | HNSW (default) or IVFFlat | Approximate nearest neighbor |
+
+---
+
+## FEATURE
+
+### Database Layer
+- `document_chunk` table with vector column (`embedding VECTOR(1536)`)
+- HNSW index for cosine similarity search
+- Unique constraint `(source_id, chunk_index)` for idempotent re-indexing
+- Metadata JSONB for source type, heading hierarchy, timestamps
+
+### Ingestion Pipeline
+- **Markdown Chunker**: Heading-aware splitting (configurable size/overlap)
+- **OpenAPI Chunker**: Endpoint-based granularity (one chunk per operation)
+- **Embedding Service**: Async batch processing with rate limiting
+- **Source Registry**: Track indexed sources with version/hash for change detection
+
+### Retrieval Engine
+- Top-k semantic search with configurable similarity threshold
+- Metadata filtering (source_type, date_range, tags)
+- Relevance score normalization (0.0 - 1.0)
+- Context window assembly for downstream consumption
+
+---
+
+## ENDPOINTS
+
+### POST /rag/index
+Index documents from various sources.
+
+**Request**:
+```json
+{
+  "source_type": "markdown",
+  "source_path": "docs/ARCHITECTURE.md",
+  "metadata": {
+    "category": "documentation",
+    "version": "1.0.0"
+  }
+}
+```
+
+**Response**:
+```json
+{
+  "source_id": "src_abc123",
+  "chunks_created": 15,
+  "tokens_processed": 4250,
+  "duration_ms": 1234.56,
+  "status": "indexed"
+}
+```
+
+### POST /rag/retrieve
+Semantic search across indexed documents.
+
+**Request**:
+```json
+{
+  "query": "How does backtesting prevent data leakage?",
+  "top_k": 5,
+  "similarity_threshold": 0.7,
+  "filters": {
+    "source_type": ["markdown", "openapi"],
+    "category": "documentation"
+  }
+}
+```
+
+**Response**:
+```json
+{
+  "results": [
+    {
+      "chunk_id": "chunk_xyz789",
+      "source_id": "src_abc123",
+      "source_path": "docs/PHASE/5-BACKTESTING.md",
+      "content": "TimeSeriesSplitter uses time-based splits (expanding/sliding window) to prevent leakage...",
+      "relevance_score": 0.92,
+      "metadata": {
+        "heading": "Leakage Prevention",
+        "section_path": ["Phase 5: Backtesting", "Implementation", "Leakage Prevention"]
+      }
+    }
+  ],
+  "query_embedding_time_ms": 45.2,
+  "search_time_ms": 12.8,
+  "total_chunks_searched": 1250
+}
+```
+
+### GET /rag/sources
+List all indexed sources with metadata.
+
+**Response**:
+```json
+{
+  "sources": [
+    {
+      "source_id": "src_abc123",
+      "source_type": "markdown",
+      "source_path": "docs/ARCHITECTURE.md",
+      "chunk_count": 15,
+      "indexed_at": "2026-02-01T10:30:00Z",
+      "content_hash": "sha256:abc123..."
+    }
+  ],
+  "total_sources": 12,
+  "total_chunks": 450
+}
+```
+
+### DELETE /rag/sources/{source_id}
+Remove an indexed source and all its chunks.
+
+**Response**:
+```json
+{
+  "source_id": "src_abc123",
+  "chunks_deleted": 15,
+  "status": "deleted"
+}
+```
+
+---
+
+## DATABASE SCHEMA
+
+```sql
+-- Enable pgvector extension
+CREATE EXTENSION IF NOT EXISTS vector;
+
+-- Document source registry
+CREATE TABLE document_source (
+    source_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    source_type VARCHAR(50) NOT NULL,  -- 'markdown', 'openapi', 'run_report'
+    source_path TEXT NOT NULL,
+    content_hash VARCHAR(64) NOT NULL,  -- SHA-256 for change detection
+    metadata JSONB DEFAULT '{}',
+    indexed_at TIMESTAMPTZ DEFAULT NOW(),
+    updated_at TIMESTAMPTZ DEFAULT NOW(),
+    UNIQUE (source_type, source_path)
+);
+
+-- Document chunks with embeddings
+CREATE TABLE document_chunk (
+    chunk_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    source_id UUID NOT NULL REFERENCES document_source(source_id) ON DELETE CASCADE,
+    chunk_index INTEGER NOT NULL,
+    content TEXT NOT NULL,
+    embedding VECTOR(1536),  -- Configurable dimension
+    token_count INTEGER NOT NULL,
+    metadata JSONB DEFAULT '{}',  -- heading, section_path, etc.
+    created_at TIMESTAMPTZ DEFAULT NOW(),
+    UNIQUE (source_id, chunk_index)
+);
+
+-- HNSW index for cosine similarity
+CREATE INDEX idx_chunk_embedding_hnsw
+ON document_chunk
+USING hnsw (embedding vector_cosine_ops)
+WITH (m = 16, ef_construction = 64);
+
+-- Metadata filtering index
+CREATE INDEX idx_chunk_metadata ON document_chunk USING gin (metadata);
+```
+
+---
+
+## EXAMPLES
+
+### examples/rag/index_docs.py
+```python
+"""Index documentation into RAG knowledge base."""
+import asyncio
+from pathlib import Path
+import httpx
+
+async def index_markdown_docs():
+    """Index all markdown docs from docs/ directory."""
+    async with httpx.AsyncClient(base_url="http://localhost:8123") as client:
+        docs_dir = Path("docs")
+        for md_file in docs_dir.rglob("*.md"):
+            response = await client.post(
+                "/rag/index",
+                json={
+                    "source_type": "markdown",
+                    "source_path": str(md_file),
+                    "metadata": {"category": "documentation"}
+                }
+            )
+            result = response.json()
+            print(f"Indexed {md_file}: {result['chunks_created']} chunks")
+
+if __name__ == "__main__":
+    asyncio.run(index_markdown_docs())
+```
+
+### examples/rag/query.http
+```http
+### Semantic search query
+POST http://localhost:8123/rag/retrieve
+Content-Type: application/json
+
+{
+  "query": "How do I configure backtesting splits?",
+  "top_k": 5,
+  "similarity_threshold": 0.7
+}
+
+### List all indexed sources
+GET http://localhost:8123/rag/sources
+
+### Re-index after documentation update
+POST http://localhost:8123/rag/index
+Content-Type: application/json
+
+{
+  "source_type": "markdown",
+  "source_path": "README.md",
+  "metadata": {"category": "overview"}
+}
+```
+
+---
+
+## CONFIGURATION (Settings)
+
+```python
+# app/core/config.py additions
+
+# RAG Embedding Configuration
+rag_embedding_model: str = "text-embedding-3-small"
+rag_embedding_dimension: int = 1536
+rag_embedding_batch_size: int = 100
+
+# RAG Chunking Configuration
+rag_chunk_size: int = 512  # tokens
+rag_chunk_overlap: int = 50  # tokens
+rag_min_chunk_size: int = 100  # minimum tokens per chunk
+
+# RAG Retrieval Configuration
+rag_top_k: int = 5
+rag_similarity_threshold: float = 0.7
+rag_max_context_tokens: int = 4000
+
+# RAG Index Configuration
+rag_index_type: Literal["hnsw", "ivfflat"] = "hnsw"
+rag_hnsw_m: int = 16
+rag_hnsw_ef_construction: int = 64
+```
+
+---
+
+## SUCCESS CRITERIA
+
+- [ ] pgvector extension enabled and tested in docker-compose
+- [ ] Markdown chunker respects heading boundaries
+- [ ] OpenAPI chunker produces one chunk per endpoint
+- [ ] Embeddings generated via async batch processing
+- [ ] Retrieval returns top-k with normalized relevance scores
+- [ ] Re-indexing is idempotent (content_hash change detection)
+- [ ] Unique constraint prevents duplicate chunks
+- [ ] HNSW index provides sub-100ms search latency
+- [ ] Integration tests with real embeddings (mocked in unit tests)
+- [ ] Structured logging for all index/retrieve operations
+
+---
+
+## CROSS-MODULE INTEGRATION
+
+| Direction | Module | Integration Point |
+|-----------|--------|-------------------|
+| **→ Agentic Layer** | INITIAL-10 | Provides `retrieve_context` tool for RAG Assistant agent |
+| **→ Dashboard** | INITIAL-11 | Sources list displayed in Admin panel |
+| **← Registry** | Phase 6 | Run reports indexed as knowledge sources |
+| **← Jobs** | Phase 7 | Indexing operations tracked as jobs |
+
+---
+
+## DOCUMENTATION LINKS
+
+- [pgvector GitHub](https://github.com/pgvector/pgvector)
+- [pgvector Tutorial (DataCamp)](https://www.datacamp.com/tutorial/pgvector-tutorial)
+- [OpenAI Embeddings Guide](https://platform.openai.com/docs/guides/embeddings)
+- [OpenAI API Reference](https://platform.openai.com/docs/api-reference/embeddings)
+- [Neon pgvector Docs](https://neon.com/docs/extensions/pgvector)
+- [HNSW Algorithm Paper](https://arxiv.org/abs/1603.09320)
+
+---
+
+## OTHER CONSIDERATIONS
+
+- **Evidence-Grounded**: Retrieval returns raw chunks only; no answer generation in this layer
+- **Idempotency**: Content hash comparison prevents unnecessary re-embedding
+- **Rate Limiting**: Respect OpenAI API rate limits during batch embedding
+- **Cost Tracking**: Log token counts for embedding cost monitoring
+- **Dimension Flexibility**: Support for other embedding models (e.g., 3072-dim text-embedding-3-large)
diff --git a/docs/DAILY-FLOW.md b/docs/DAILY-FLOW.md
index 66521dbc..7ecba511 100644
--- a/docs/DAILY-FLOW.md
+++ b/docs/DAILY-FLOW.md
@@ -162,21 +162,29 @@ gh run watch <RUN_ID>
 
 ---
 
-## Következő Phase: Forecasting (PRP-5)
+## Következő Phases (INITIAL-9 → INITIAL-11)
 
-```bash
-# Kezdés
-git checkout dev
-git pull origin dev
-git checkout -b feat/prp-5-forecasting
+A projekt a moduláris három-fázisú roadmap szerint halad:
 
-# Fejlesztés...
-# PR → dev → main → release → phase-4 snapshot
 ```
+Phase 8: RAG Knowledge Base ("The Memory")
+         ↓
+Phase 9: Agentic Layer ("The Brain")
+         ↓
+Phase 10: ForecastLab Dashboard ("The Face")
+```
+
+### Phase 8: RAG Knowledge Base (INITIAL-9)
+- pgvector embeddings + semantic retrieval
+- Markdown/OpenAPI chunking
+- POST /rag/index, POST /rag/retrieve endpoints
+
+### Phase 9: Agentic Layer (INITIAL-10)
+- PydanticAI agents (Experiment Orchestrator, RAG Assistant)
+- Tool orchestration + structured outputs
+- WebSocket streaming
 
-### PRP-5 Scope (INITIAL-5)
-- Model zoo: naive, seasonal naive, moving average
-- Unified model interface: fit/predict, serialize/load
-- Scikit-learn Pipeline: Scaling → Encoding → Regressor
-- Joblib-based ModelBundle persistence
-- Multi-horizon recursive forecasting
+### Phase 10: Dashboard (INITIAL-11)
+- React 19 + Vite + shadcn/ui
+- Data tables + time series charts
+- Agent chat interface
diff --git a/docs/PHASE-index.md b/docs/PHASE-index.md
index 280fa43b..b655d0c9 100644
--- a/docs/PHASE-index.md
+++ b/docs/PHASE-index.md
@@ -17,8 +17,8 @@ This document indexes all implementation phases of the ForecastLabAI project.
 | 6 | Model Registry | Completed | PRP-7 | [6-MODEL_REGISTRY.md](./PHASE/6-MODEL_REGISTRY.md) |
 | 7 | Serving Layer | Completed | PRP-8 | [7-SERVING_LAYER.md](./PHASE/7-SERVING_LAYER.md) |
 | 8 | RAG Knowledge Base | Pending | PRP-9 | - |
-| 9 | Dashboard | Pending | PRP-10 | - |
-| 10 | Agentic Layer | Pending | - | - |
+| 9 | Agentic Layer | Pending | PRP-10 | - |
+| 10 | ForecastLab Dashboard | Pending | PRP-11 | - |
 
 ---
 
@@ -277,14 +277,29 @@ jobs_retention_days: int = 30
 
 ## Pending Phases
 
-### Phase 8: RAG Knowledge Base
-pgvector embeddings with evidence-grounded answers and citations.
-
-### Phase 9: Dashboard
-React + Vite + shadcn/ui frontend with data tables and visualizations.
-
-### Phase 10: Agentic Layer (Optional)
-PydanticAI integration for experiment orchestration.
+### Phase 8: RAG Knowledge Base ("The Memory")
+Vector storage, document ingestion, and semantic retrieval infrastructure.
+- PostgreSQL 16 + pgvector extension
+- OpenAI text-embedding-3-small embeddings (1536 dimensions)
+- Markdown-aware and OpenAPI endpoint-aware chunking
+- HNSW index for cosine similarity search
+- Endpoints: POST /rag/index, POST /rag/retrieve, GET /rag/sources, DELETE /rag/sources/{id}
+
+### Phase 9: Agentic Layer ("The Brain")
+Autonomous decision-making, tool orchestration, and structured outputs using PydanticAI.
+- Experiment Orchestrator Agent (backtest → compare → deploy workflow)
+- RAG Assistant Agent (query → retrieve → answer with citations)
+- Human-in-the-loop approval for sensitive operations
+- WebSocket streaming for real-time responses
+- Endpoints: POST /agents/experiment/run, POST /agents/rag/query, WS /agents/stream
+
+### Phase 10: ForecastLab Dashboard ("The Face")
+User interface, data visualization, and agent interaction.
+- React 19 + Vite + shadcn/ui + Tailwind CSS 4
+- TanStack Table for server-side data grids
+- TanStack Query for data fetching and caching
+- Recharts for time series visualization
+- Agent chat interface with streaming and citations
 
 ---
 

From e6a8fdaac564636bc1d778e9651b7dc128bcd9e4 Mon Sep 17 00:00:00 2001
From: "Gabe@w7dev" <gabor@w7-7.net>
Date: Sun, 1 Feb 2026 10:55:42 +0000
Subject: [PATCH 2/3] docs(prp): add PRP-9 RAG Knowledge Base implementation
 plan
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Comprehensive PRP for INITIAL-9 RAG Knowledge Base feature:

- pgvector + SQLAlchemy 2.0 integration patterns
- Markdown-aware and OpenAPI-aware chunking
- Async OpenAI embeddings with batch processing
- HNSW index for cosine similarity search
- 15 ordered implementation tasks
- 5-level validation loop (syntax → types → unit → integration → smoke)
- Full ORM models and Pydantic schemas
- Known gotchas and anti-patterns documented

Confidence score: 8.5/10

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---
 PRPs/PRP-9-rag-knowledge-base.md | 776 +++++++++++++++++++++++++++++++
 1 file changed, 776 insertions(+)
 create mode 100644 PRPs/PRP-9-rag-knowledge-base.md

diff --git a/PRPs/PRP-9-rag-knowledge-base.md b/PRPs/PRP-9-rag-knowledge-base.md
new file mode 100644
index 00000000..011ef88b
--- /dev/null
+++ b/PRPs/PRP-9-rag-knowledge-base.md
@@ -0,0 +1,776 @@
+# PRP-9: RAG Knowledge Base ("The Memory")
+
+**Feature**: INITIAL-9.md — RAG Knowledge Base
+**Status**: Ready for Implementation
+**Confidence Score**: 8.5/10
+
+---
+
+## Goal
+
+Build the RAG Knowledge Base layer providing:
+1. **Document ingestion** with markdown-aware and OpenAPI-aware chunking
+2. **Vector storage** using PostgreSQL + pgvector for embeddings
+3. **Semantic retrieval** with configurable top-k and similarity thresholds
+4. **Idempotent re-indexing** via content hash comparison
+
+This is the foundational "Memory" layer that INITIAL-10 (Agentic Layer) will consume via the `retrieve_context` tool.
+
+---
+
+## Why
+
+- **Agent-Ready**: Provides `retrieve_context` tool for INITIAL-10 RAG Assistant
+- **Evidence-Grounded**: Returns raw chunks with citations (no hallucination)
+- **Cost-Effective**: Uses existing PostgreSQL (no new infrastructure)
+- **Portfolio Value**: Demonstrates full-stack RAG implementation
+
+---
+
+## What
+
+### Endpoints
+
+| Method | Path | Description |
+|--------|------|-------------|
+| `POST` | `/rag/index` | Index document (markdown/openapi) |
+| `POST` | `/rag/retrieve` | Semantic search with filters |
+| `GET` | `/rag/sources` | List indexed sources |
+| `DELETE` | `/rag/sources/{source_id}` | Remove source and chunks |
+
+### Success Criteria
+
+- [ ] pgvector extension enabled via migration
+- [ ] Markdown chunker respects heading boundaries
+- [ ] OpenAPI chunker produces one chunk per endpoint
+- [ ] Async batch embedding with OpenAI API
+- [ ] HNSW index for sub-100ms retrieval
+- [ ] Idempotent re-indexing (content_hash change detection)
+- [ ] 80+ unit tests, 15+ integration tests
+- [ ] All validation gates green (ruff, mypy, pyright, pytest)
+
+---
+
+## All Needed Context
+
+### Documentation & References
+
+```yaml
+# CRITICAL - pgvector SQLAlchemy Integration
+- url: https://github.com/pgvector/pgvector-python
+  why: "Official pgvector Python library - Vector column, HNSW index, cosine_distance"
+
+- url: https://github.com/pgvector/pgvector-python/blob/master/README.md
+  why: "SQLAlchemy 2.0 patterns, Index creation with postgresql_ops"
+
+# pgvector Indexing
+- url: https://neon.com/blog/understanding-vector-search-and-hnsw-index-with-pgvector
+  why: "HNSW vs IVFFlat tradeoffs, index tuning parameters"
+
+# OpenAI Embeddings
+- url: https://platform.openai.com/docs/api-reference/embeddings
+  why: "Embeddings API reference - batch processing, input limits (8192 tokens)"
+
+- url: https://platform.openai.com/docs/guides/embeddings
+  why: "Best practices, token counting with tiktoken cl100k_base"
+
+# Markdown Chunking
+- url: https://python.langchain.com/docs/how_to/markdown_header_metadata_splitter/
+  why: "MarkdownHeaderTextSplitter pattern for heading-aware splitting"
+
+# Codebase Patterns (CRITICAL)
+- file: app/features/registry/models.py
+  why: "ORM pattern with JSONB, TimestampMixin, Index creation"
+
+- file: app/features/registry/schemas.py
+  why: "Pydantic v2 patterns - ConfigDict, field_validator, from_attributes"
+
+- file: app/features/registry/routes.py
+  why: "FastAPI patterns - APIRouter, response_model, HTTPException"
+
+- file: app/features/registry/service.py
+  why: "Async service pattern - get_settings(), structured logging"
+
+- file: app/features/registry/tests/conftest.py
+  why: "Test fixtures - db_session, client, cleanup patterns"
+
+# ADR
+- file: docs/ADR/ADR-0003-vector-storage-pgvector-in-postgres.md
+  why: "Architectural decision for pgvector over dedicated vector DB"
+```
+
+### Current Codebase Tree (Relevant Parts)
+
+```
+app/
+├── core/
+│   ├── config.py          # Settings singleton - ADD RAG settings here
+│   ├── database.py         # Base, get_db, get_engine
+│   ├── logging.py          # get_logger, structured logging
+│   └── exceptions.py       # ForecastLabError base class
+├── shared/
+│   └── models.py           # TimestampMixin
+├── features/
+│   ├── registry/           # REFERENCE: Follow this pattern exactly
+│   │   ├── models.py
+│   │   ├── schemas.py
+│   │   ├── routes.py
+│   │   ├── service.py
+│   │   ├── storage.py
+│   │   └── tests/
+│   └── rag/                # NEW: Create this vertical slice
+├── main.py                  # Include rag router here
+docker-compose.yml           # Already uses pgvector/pgvector:pg16
+alembic/versions/            # Add migration for pgvector extension + tables
+```
+
+### Desired Codebase Tree (Files to Create)
+
+```
+app/features/rag/
+├── __init__.py              # Export router
+├── models.py                # DocumentSource, DocumentChunk ORM models
+├── schemas.py               # IndexRequest/Response, RetrieveRequest/Response, etc.
+├── routes.py                # FastAPI router with /rag/* endpoints
+├── service.py               # RAGService - indexing and retrieval logic
+├── chunkers.py              # MarkdownChunker, OpenAPIChunker classes
+├── embeddings.py            # EmbeddingService - async OpenAI API calls
+├── tests/
+│   ├── __init__.py
+│   ├── conftest.py          # db_session, client fixtures
+│   ├── test_schemas.py      # Schema validation tests
+│   ├── test_chunkers.py     # Chunking logic tests (unit, no DB)
+│   ├── test_embeddings.py   # Embedding tests with mocked API
+│   ├── test_service.py      # Service tests (unit + integration)
+│   └── test_routes.py       # Route integration tests
+
+alembic/versions/
+└── xxxx_create_rag_tables.py  # Migration with CREATE EXTENSION vector
+
+examples/rag/
+├── index_docs.py            # Example: index docs/ directory
+└── query.http               # HTTP client examples
+```
+
+### Known Gotchas & Library Quirks
+
+```python
+# CRITICAL: pgvector SQLAlchemy requires explicit import
+from pgvector.sqlalchemy import Vector  # NOT from sqlalchemy
+
+# CRITICAL: HNSW index requires vector_cosine_ops for cosine distance
+Index(
+    "ix_embedding_hnsw",
+    DocumentChunk.embedding,
+    postgresql_using="hnsw",
+    postgresql_with={"m": 16, "ef_construction": 64},
+    postgresql_ops={"embedding": "vector_cosine_ops"},  # MUST match query distance
+)
+
+# CRITICAL: Cosine distance query uses cosine_distance method
+from pgvector.sqlalchemy import Vector
+stmt = select(DocumentChunk).order_by(
+    DocumentChunk.embedding.cosine_distance(query_embedding)  # NOT <=> operator
+).limit(top_k)
+
+# CRITICAL: OpenAI embeddings input limit is 8192 tokens per text
+# Use tiktoken to count tokens before sending to API
+import tiktoken
+enc = tiktoken.get_encoding("cl100k_base")
+tokens = enc.encode(text)
+if len(tokens) > 8191:
+    # Truncate or split
+
+# CRITICAL: OpenAI API returns embeddings in same order as input
+# But batch requests should be <= 2048 inputs per call
+
+# CRITICAL: Pydantic v2 uses ConfigDict, not class Config
+from pydantic import BaseModel, ConfigDict
+class MySchema(BaseModel):
+    model_config = ConfigDict(from_attributes=True, extra="forbid")
+
+# CRITICAL: SQLAlchemy 2.0 uses Mapped[] and mapped_column()
+from sqlalchemy.orm import Mapped, mapped_column
+embedding = mapped_column(Vector(1536))  # Vector column
+
+# CRITICAL: Alembic migration needs op.execute for CREATE EXTENSION
+op.execute("CREATE EXTENSION IF NOT EXISTS vector")
+```
+
+---
+
+## Implementation Blueprint
+
+### Data Models
+
+#### ORM Models (models.py)
+
+```python
+"""RAG knowledge base ORM models."""
+from __future__ import annotations
+import uuid
+from datetime import datetime
+from typing import Any
+from sqlalchemy import (
+    DateTime, Index, Integer, String, Text, UniqueConstraint, ForeignKey,
+)
+from sqlalchemy.dialects.postgresql import JSONB
+from sqlalchemy.orm import Mapped, mapped_column, relationship
+from pgvector.sqlalchemy import Vector
+from app.core.database import Base
+from app.shared.models import TimestampMixin
+
+
+class DocumentSource(TimestampMixin, Base):
+    """Registered document source for indexing."""
+    __tablename__ = "document_source"
+
+    id: Mapped[int] = mapped_column(Integer, primary_key=True)
+    source_id: Mapped[str] = mapped_column(String(32), unique=True, index=True)
+    source_type: Mapped[str] = mapped_column(String(50), index=True)  # markdown, openapi
+    source_path: Mapped[str] = mapped_column(Text, nullable=False)
+    content_hash: Mapped[str] = mapped_column(String(64), nullable=False)  # SHA-256
+    metadata_: Mapped[dict[str, Any] | None] = mapped_column("metadata", JSONB, nullable=True)
+    indexed_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), nullable=False)
+
+    # Relationship
+    chunks: Mapped[list[DocumentChunk]] = relationship(
+        back_populates="source", cascade="all, delete-orphan"
+    )
+
+    __table_args__ = (
+        UniqueConstraint("source_type", "source_path", name="uq_source_type_path"),
+    )
+
+
+class DocumentChunk(TimestampMixin, Base):
+    """Indexed document chunk with embedding."""
+    __tablename__ = "document_chunk"
+
+    id: Mapped[int] = mapped_column(Integer, primary_key=True)
+    chunk_id: Mapped[str] = mapped_column(String(32), unique=True, index=True)
+    source_id: Mapped[int] = mapped_column(
+        Integer, ForeignKey("document_source.id", ondelete="CASCADE"), index=True
+    )
+    chunk_index: Mapped[int] = mapped_column(Integer, nullable=False)
+    content: Mapped[str] = mapped_column(Text, nullable=False)
+    embedding = mapped_column(Vector(1536), nullable=True)  # Dimension from settings
+    token_count: Mapped[int] = mapped_column(Integer, nullable=False)
+    metadata_: Mapped[dict[str, Any] | None] = mapped_column("metadata", JSONB, nullable=True)
+
+    # Relationship
+    source: Mapped[DocumentSource] = relationship(back_populates="chunks")
+
+    __table_args__ = (
+        UniqueConstraint("source_id", "chunk_index", name="uq_source_chunk_index"),
+        Index(
+            "ix_chunk_embedding_hnsw",
+            "embedding",
+            postgresql_using="hnsw",
+            postgresql_with={"m": 16, "ef_construction": 64},
+            postgresql_ops={"embedding": "vector_cosine_ops"},
+        ),
+        Index("ix_chunk_metadata_gin", "metadata", postgresql_using="gin"),
+    )
+```
+
+#### Pydantic Schemas (schemas.py)
+
+```python
+"""Pydantic schemas for RAG API contracts."""
+from datetime import datetime
+from typing import Any, Literal
+from pydantic import BaseModel, ConfigDict, Field, field_validator
+
+
+class IndexRequest(BaseModel):
+    """Request to index a document."""
+    model_config = ConfigDict(extra="forbid")
+
+    source_type: Literal["markdown", "openapi"] = Field(
+        ..., description="Type of document to index"
+    )
+    source_path: str = Field(..., min_length=1, max_length=500)
+    content: str | None = Field(None, description="Optional content override")
+    metadata: dict[str, Any] | None = Field(None, description="Custom metadata")
+
+
+class IndexResponse(BaseModel):
+    """Response from indexing operation."""
+    model_config = ConfigDict(from_attributes=True)
+
+    source_id: str
+    source_path: str
+    chunks_created: int
+    tokens_processed: int
+    duration_ms: float
+    status: Literal["indexed", "updated", "unchanged"]
+
+
+class RetrieveRequest(BaseModel):
+    """Request for semantic search."""
+    model_config = ConfigDict(extra="forbid")
+
+    query: str = Field(..., min_length=1, max_length=2000)
+    top_k: int = Field(default=5, ge=1, le=50)
+    similarity_threshold: float = Field(default=0.7, ge=0.0, le=1.0)
+    filters: dict[str, Any] | None = Field(None, description="Metadata filters")
+
+
+class ChunkResult(BaseModel):
+    """Single chunk in retrieval results."""
+    model_config = ConfigDict(from_attributes=True)
+
+    chunk_id: str
+    source_id: str
+    source_path: str
+    source_type: str
+    content: str
+    relevance_score: float
+    metadata: dict[str, Any] | None = None
+
+
+class RetrieveResponse(BaseModel):
+    """Response from retrieval operation."""
+    results: list[ChunkResult]
+    query_embedding_time_ms: float
+    search_time_ms: float
+    total_chunks_searched: int
+
+
+class SourceResponse(BaseModel):
+    """Source details response."""
+    model_config = ConfigDict(from_attributes=True)
+
+    source_id: str
+    source_type: str
+    source_path: str
+    chunk_count: int
+    content_hash: str
+    indexed_at: datetime
+    metadata: dict[str, Any] | None = None
+
+
+class SourceListResponse(BaseModel):
+    """List of indexed sources."""
+    sources: list[SourceResponse]
+    total_sources: int
+    total_chunks: int
+
+
+class DeleteResponse(BaseModel):
+    """Response from delete operation."""
+    source_id: str
+    chunks_deleted: int
+    status: Literal["deleted"]
+```
+
+---
+
+## Task List
+
+### Task 1: Add Dependencies to pyproject.toml
+
+```yaml
+MODIFY: pyproject.toml
+ADD to dependencies:
+  - "pgvector>=0.3.0"      # pgvector SQLAlchemy support
+  - "openai>=1.40.0"       # OpenAI API client (async)
+  - "tiktoken>=0.7.0"      # Token counting for chunk size
+  - "httpx>=0.28.0"        # Already in dev, may need in main for async HTTP
+```
+
+### Task 2: Add RAG Settings to config.py
+
+```yaml
+MODIFY: app/core/config.py
+ADD after "jobs_retention_days" (~line 65):
+  # RAG Embedding Configuration
+  rag_embedding_model: str = "text-embedding-3-small"
+  rag_embedding_dimension: int = 1536
+  rag_embedding_batch_size: int = 100
+  openai_api_key: str = ""  # Required for embeddings
+
+  # RAG Chunking Configuration
+  rag_chunk_size: int = 512  # tokens
+  rag_chunk_overlap: int = 50  # tokens
+  rag_min_chunk_size: int = 100
+
+  # RAG Retrieval Configuration
+  rag_top_k: int = 5
+  rag_similarity_threshold: float = 0.7
+  rag_max_context_tokens: int = 4000
+
+  # RAG Index Configuration
+  rag_index_type: Literal["hnsw", "ivfflat"] = "hnsw"
+  rag_hnsw_m: int = 16
+  rag_hnsw_ef_construction: int = 64
+```
+
+### Task 3: Create Alembic Migration
+
+```yaml
+CREATE: alembic/versions/xxxx_create_rag_tables.py
+PATTERN: Follow app/features/registry migration pattern
+
+Pseudocode:
+def upgrade():
+    # Enable pgvector extension
+    op.execute("CREATE EXTENSION IF NOT EXISTS vector")
+
+    # Create document_source table
+    op.create_table("document_source", ...)
+
+    # Create document_chunk table with Vector column
+    op.create_table("document_chunk",
+        sa.Column("embedding", Vector(1536), nullable=True),
+        ...
+    )
+
+    # Create HNSW index
+    op.create_index(
+        "ix_chunk_embedding_hnsw",
+        "document_chunk",
+        ["embedding"],
+        postgresql_using="hnsw",
+        postgresql_with={"m": 16, "ef_construction": 64},
+        postgresql_ops={"embedding": "vector_cosine_ops"},
+    )
+```
+
+### Task 4: Create ORM Models
+
+```yaml
+CREATE: app/features/rag/models.py
+MIRROR: app/features/registry/models.py pattern
+CRITICAL:
+  - Use pgvector.sqlalchemy.Vector for embedding column
+  - Add HNSW index in __table_args__
+  - Use TimestampMixin
+  - Cascade delete from source to chunks
+```
+
+### Task 5: Create Pydantic Schemas
+
+```yaml
+CREATE: app/features/rag/schemas.py
+MIRROR: app/features/registry/schemas.py pattern
+INCLUDE:
+  - IndexRequest, IndexResponse
+  - RetrieveRequest, RetrieveResponse, ChunkResult
+  - SourceResponse, SourceListResponse
+  - DeleteResponse
+```
+
+### Task 6: Create Chunker Classes
+
+```yaml
+CREATE: app/features/rag/chunkers.py
+
+Classes:
+  BaseChunker (ABC):
+    - chunk(content: str) -> list[ChunkData]
+
+  MarkdownChunker(BaseChunker):
+    - Split on heading boundaries (# ## ###)
+    - Respect chunk_size and chunk_overlap from settings
+    - Extract heading hierarchy for metadata
+    - Use tiktoken cl100k_base for token counting
+
+  OpenAPIChunker(BaseChunker):
+    - Parse OpenAPI JSON/YAML
+    - One chunk per endpoint (path + method)
+    - Include operation summary, description, parameters
+
+CRITICAL:
+  - Use tiktoken for token counting (cl100k_base encoding)
+  - Never exceed 8191 tokens per chunk (OpenAI limit)
+```
+
+### Task 7: Create Embedding Service
+
+```yaml
+CREATE: app/features/rag/embeddings.py
+
+Class EmbeddingService:
+  __init__(self):
+    - Load settings (api_key, model, dimension, batch_size)
+    - Initialize AsyncOpenAI client
+
+  async def embed_texts(self, texts: list[str]) -> list[list[float]]:
+    - Batch texts into groups of batch_size
+    - Call OpenAI embeddings API for each batch
+    - Handle rate limits with exponential backoff
+    - Return embeddings in same order as input
+
+  async def embed_query(self, query: str) -> list[float]:
+    - Single text embedding for retrieval queries
+
+CRITICAL:
+  - Use openai.AsyncOpenAI for async calls
+  - Validate token count before API call
+  - Log token usage for cost tracking
+```
+
+### Task 8: Create RAG Service
+
+```yaml
+CREATE: app/features/rag/service.py
+MIRROR: app/features/registry/service.py pattern
+
+Class RAGService:
+  async def index_document(self, db, request: IndexRequest) -> IndexResponse:
+    - Read content from source_path (or use provided content)
+    - Compute SHA-256 content hash
+    - Check if source exists with same hash (skip if unchanged)
+    - Chunk content using appropriate chunker
+    - Generate embeddings for all chunks
+    - Upsert source record
+    - Delete old chunks, insert new chunks
+    - Return IndexResponse with stats
+
+  async def retrieve(self, db, request: RetrieveRequest) -> RetrieveResponse:
+    - Generate query embedding
+    - Build pgvector similarity query with cosine_distance
+    - Apply metadata filters if provided
+    - Execute query, compute relevance scores
+    - Return top-k results above threshold
+
+  async def list_sources(self, db) -> SourceListResponse:
+    - Query all sources with chunk counts
+    - Return paginated list
+
+  async def delete_source(self, db, source_id: str) -> DeleteResponse:
+    - Find source by source_id
+    - Delete (cascades to chunks)
+    - Return delete count
+
+CRITICAL:
+  - Use cosine_distance for similarity (NOT l2_distance)
+  - Relevance score = 1 - cosine_distance (normalized to 0-1)
+  - Handle source not found with 404
+```
+
+### Task 9: Create FastAPI Routes
+
+```yaml
+CREATE: app/features/rag/routes.py
+MIRROR: app/features/registry/routes.py pattern
+
+Routes:
+  POST /rag/index -> IndexResponse (201 CREATED)
+  POST /rag/retrieve -> RetrieveResponse (200 OK)
+  GET /rag/sources -> SourceListResponse (200 OK)
+  DELETE /rag/sources/{source_id} -> DeleteResponse (200 OK)
+
+CRITICAL:
+  - Use structured logging with rag.* event prefix
+  - Handle OpenAI API errors gracefully
+  - Validate source_id format
+```
+
+### Task 10: Register Router in main.py
+
+```yaml
+MODIFY: app/main.py
+ADD import: from app.features.rag.routes import router as rag_router
+ADD router: app.include_router(rag_router)
+```
+
+### Task 11: Create Test Fixtures
+
+```yaml
+CREATE: app/features/rag/tests/conftest.py
+MIRROR: app/features/registry/tests/conftest.py
+
+Fixtures:
+  - db_session: Async session with cleanup (delete test-* sources)
+  - client: AsyncClient with db override
+  - sample_markdown_content: Test markdown with headings
+  - sample_openapi_content: Test OpenAPI spec
+  - mock_embedding_service: Mocked EmbeddingService for unit tests
+```
+
+### Task 12: Create Unit Tests
+
+```yaml
+CREATE: app/features/rag/tests/test_schemas.py
+  - Test IndexRequest validation
+  - Test RetrieveRequest validation (query length, threshold bounds)
+
+CREATE: app/features/rag/tests/test_chunkers.py
+  - Test MarkdownChunker respects heading boundaries
+  - Test MarkdownChunker respects chunk_size
+  - Test MarkdownChunker extracts heading metadata
+  - Test OpenAPIChunker creates one chunk per endpoint
+  - Test chunk token counts are within limits
+
+CREATE: app/features/rag/tests/test_embeddings.py
+  - Test embed_texts batching logic
+  - Test embed_query returns correct dimension
+  - Mock OpenAI API responses
+
+CREATE: app/features/rag/tests/test_service.py (unit)
+  - Test content hash computation
+  - Test idempotent re-indexing logic
+  - Test relevance score normalization
+```
+
+### Task 13: Create Integration Tests
+
+```yaml
+CREATE: app/features/rag/tests/test_routes.py
+@pytest.mark.integration tests:
+  - test_index_markdown_creates_chunks
+  - test_index_same_content_returns_unchanged
+  - test_index_updated_content_re_indexes
+  - test_retrieve_returns_relevant_chunks
+  - test_retrieve_respects_threshold
+  - test_list_sources_returns_all
+  - test_delete_source_removes_chunks
+  - test_delete_nonexistent_returns_404
+```
+
+### Task 14: Create Examples
+
+```yaml
+CREATE: examples/rag/index_docs.py
+  - Script to index docs/ directory
+
+CREATE: examples/rag/query.http
+  - HTTP client examples for all endpoints
+```
+
+### Task 15: Update .env.example
+
+```yaml
+MODIFY: .env.example
+ADD:
+  # RAG Configuration
+  OPENAI_API_KEY=sk-...
+  RAG_EMBEDDING_MODEL=text-embedding-3-small
+  RAG_CHUNK_SIZE=512
+  RAG_TOP_K=5
+```
+
+---
+
+## Validation Loop
+
+### Level 1: Syntax & Style
+
+```bash
+# Run FIRST - fix any errors before proceeding
+uv run ruff check app/features/rag/ --fix
+uv run ruff format app/features/rag/
+
+# Expected: No errors
+```
+
+### Level 2: Type Checking
+
+```bash
+# MUST be green
+uv run mypy app/features/rag/
+uv run pyright app/features/rag/
+
+# Expected: 0 errors on both
+```
+
+### Level 3: Unit Tests
+
+```bash
+# No database required
+uv run pytest app/features/rag/tests/ -v -m "not integration"
+
+# Expected: All pass
+# If failing: Read error, fix code, re-run
+```
+
+### Level 4: Integration Tests
+
+```bash
+# Requires PostgreSQL running
+docker-compose up -d
+
+# Run migrations
+uv run alembic upgrade head
+
+# Run integration tests
+uv run pytest app/features/rag/tests/ -v -m integration
+
+# Expected: All pass
+```
+
+### Level 5: Manual Smoke Test
+
+```bash
+# Start API
+uv run uvicorn app.main:app --reload --port 8123
+
+# Index a document
+curl -X POST http://localhost:8123/rag/index \
+  -H "Content-Type: application/json" \
+  -d '{"source_type": "markdown", "source_path": "README.md"}'
+
+# Expected: {"source_id": "...", "chunks_created": N, ...}
+
+# Retrieve
+curl -X POST http://localhost:8123/rag/retrieve \
+  -H "Content-Type: application/json" \
+  -d '{"query": "What is ForecastLabAI?", "top_k": 3}'
+
+# Expected: {"results": [...], ...}
+
+# List sources
+curl http://localhost:8123/rag/sources
+
+# Delete source
+curl -X DELETE http://localhost:8123/rag/sources/{source_id}
+```
+
+---
+
+## Final Validation Checklist
+
+- [ ] All tests pass: `uv run pytest app/features/rag/tests/ -v`
+- [ ] No linting errors: `uv run ruff check app/features/rag/`
+- [ ] No type errors: `uv run mypy app/features/rag/ && uv run pyright app/features/rag/`
+- [ ] Migration applies cleanly: `uv run alembic upgrade head`
+- [ ] Manual smoke test successful
+- [ ] Structured logging events follow `rag.*` prefix
+- [ ] Content hash prevents duplicate embeddings
+- [ ] HNSW index used for similarity queries
+
+---
+
+## Anti-Patterns to Avoid
+
+- ❌ Don't use `l2_distance` when you want cosine similarity
+- ❌ Don't forget to enable pgvector extension in migration
+- ❌ Don't exceed 8191 tokens per embedding input
+- ❌ Don't use sync OpenAI client - use AsyncOpenAI
+- ❌ Don't hardcode embedding dimensions - use settings
+- ❌ Don't catch all exceptions - be specific
+- ❌ Don't skip content hash comparison (wastes API calls)
+- ❌ Don't create new patterns when registry patterns work
+
+---
+
+## Confidence Score: 8.5/10
+
+**Strengths:**
+- Docker already has pgvector image
+- Clear patterns from registry module to follow
+- Comprehensive documentation available
+- ADR decision already made
+
+**Risks:**
+- OpenAI API rate limits during bulk indexing
+- HNSW index creation on large datasets may be slow
+- tiktoken token counting edge cases
+
+**Mitigations:**
+- Implement exponential backoff for API calls
+- Create index after initial data load
+- Extensive unit tests for chunking edge cases

From 16d6b5a7d2cead156672c1dfff1525cf88f3947a Mon Sep 17 00:00:00 2001
From: "Gabe@w7dev" <gabor@w7-7.net>
Date: Sun, 1 Feb 2026 11:01:58 +0000
Subject: [PATCH 3/3] docs(prp): add PRP-10 Agentic Layer implementation plan
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Comprehensive PRP for INITIAL-10 Agentic Layer feature:

- PydanticAI agent framework integration
- Experiment Orchestrator Agent (backtest → compare → deploy)
- RAG Assistant Agent (query → retrieve → answer with citations)
- Human-in-the-loop approval workflow for sensitive actions
- WebSocket streaming for real-time token delivery
- Session persistence with JSONB message history
- 17 ordered implementation tasks
- Tool definitions for registry, backtesting, forecasting, RAG
- Full Pydantic schemas and ORM models

Confidence score: 7.5/10

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---
 PRPs/PRP-10-agentic-layer.md | 920 +++++++++++++++++++++++++++++++++++
 1 file changed, 920 insertions(+)
 create mode 100644 PRPs/PRP-10-agentic-layer.md

diff --git a/PRPs/PRP-10-agentic-layer.md b/PRPs/PRP-10-agentic-layer.md
new file mode 100644
index 00000000..6cade0dc
--- /dev/null
+++ b/PRPs/PRP-10-agentic-layer.md
@@ -0,0 +1,920 @@
+# PRP-10: Agentic Layer ("The Brain")
+
+**Feature**: INITIAL-10.md — Agentic Layer
+**Status**: Ready for Implementation
+**Confidence Score**: 7.5/10
+
+---
+
+## Goal
+
+Build the Agentic Layer using PydanticAI providing:
+1. **Experiment Orchestrator Agent** - Autonomous model experimentation workflow
+2. **RAG Assistant Agent** - Evidence-grounded Q&A with citations
+3. **Human-in-the-Loop Approval** - Blocking sensitive actions until approved
+4. **WebSocket Streaming** - Real-time token delivery to clients
+5. **Session Management** - Persistent state across multi-turn conversations
+
+This is the "Brain" layer that orchestrates tools from INITIAL-9 (RAG), Phase 5 (Backtesting), and Phase 6 (Registry).
+
+---
+
+## Why
+
+- **Autonomous Experimentation**: Agent runs backtests, compares results, deploys winners
+- **Evidence-Grounded Answers**: RAG-powered Q&A prevents hallucination
+- **Safety Controls**: Human approval for deployment actions
+- **Real-Time UX**: Streaming responses for responsive chat interface
+- **Portfolio Value**: Demonstrates modern AI agent architecture
+
+---
+
+## What
+
+### Endpoints
+
+| Method | Path | Description |
+|--------|------|-------------|
+| `POST` | `/agents/experiment/run` | Execute experiment workflow |
+| `POST` | `/agents/experiment/approve` | Approve pending action |
+| `POST` | `/agents/rag/query` | Query with answer generation |
+| `GET` | `/agents/status/{session_id}` | Check session status |
+| `WS` | `/agents/stream` | WebSocket for streaming |
+
+### Success Criteria
+
+- [ ] Agents produce schema-enforced structured outputs
+- [ ] Tool calls logged with correlation IDs and timing
+- [ ] Human-in-the-loop blocks sensitive actions
+- [ ] WebSocket streaming delivers tokens in real-time
+- [ ] Session state persists across requests
+- [ ] Graceful LLM API failure handling with retries
+- [ ] 60+ unit tests with mocked LLM responses
+- [ ] 15+ integration tests (rate-limited real LLM calls)
+- [ ] All validation gates green
+
+---
+
+## All Needed Context
+
+### Documentation & References
+
+```yaml
+# CRITICAL - PydanticAI Documentation
+- url: https://ai.pydantic.dev/
+  why: "Official PydanticAI docs - main reference"
+
+- url: https://ai.pydantic.dev/agents/
+  why: "Agent constructor, result_type, system_prompt, run/run_stream methods"
+
+- url: https://ai.pydantic.dev/tools/
+  why: "@agent.tool decorator, RunContext, deps_type, tool parameters"
+
+- url: https://ai.pydantic.dev/output/
+  why: "AgentRunResult, StreamedRunResult, token usage tracking"
+
+- url: https://ai.pydantic.dev/examples/chat-app/
+  why: "FastAPI + streaming integration example"
+
+- url: https://github.com/pydantic/pydantic-ai
+  why: "Source code for edge cases"
+
+# Anthropic API (fallback reference)
+- url: https://docs.anthropic.com/en/api
+  why: "Claude model IDs, rate limits, error codes"
+
+# Codebase Patterns (CRITICAL)
+- file: app/features/registry/service.py
+  why: "Service pattern - __init__, get_settings(), structured logging"
+
+- file: app/features/jobs/service.py
+  why: "Job execution pattern - state machine, error handling, audit trail"
+
+- file: app/features/backtesting/service.py
+  why: "BacktestingService - the agent will call this via tools"
+
+- file: app/features/registry/routes.py
+  why: "Route patterns - APIRouter, response_model, HTTPException"
+
+- file: app/features/registry/tests/conftest.py
+  why: "Test fixtures - db_session, client, async patterns"
+
+# RAG Integration (INITIAL-9 dependency)
+- file: PRPs/PRP-9-rag-knowledge-base.md
+  why: "RAG layer the agent will consume via retrieve_context tool"
+```
+
+### Current Codebase Tree (Relevant Parts)
+
+```
+app/
+├── core/
+│   ├── config.py          # Settings - ADD agent settings
+│   ├── database.py         # get_db dependency
+│   ├── logging.py          # get_logger
+│   └── exceptions.py       # ForecastLabError base
+├── features/
+│   ├── backtesting/        # Agent tool: run_backtest
+│   ├── registry/           # Agent tools: list_runs, compare_runs, create_alias
+│   ├── forecasting/        # Agent tool: list_models
+│   ├── rag/                # INITIAL-9 - Agent tool: retrieve_context
+│   └── agents/             # NEW: Create this vertical slice
+├── main.py                  # Include agents router + WebSocket
+```
+
+### Desired Codebase Tree (Files to Create)
+
+```
+app/features/agents/
+├── __init__.py              # Export router
+├── models.py                # AgentSession ORM model
+├── schemas.py               # Request/Response Pydantic schemas
+├── routes.py                # REST endpoints
+├── websocket.py             # WebSocket endpoint handler
+├── service.py               # AgentService orchestration
+├── agents/
+│   ├── __init__.py
+│   ├── base.py              # Base agent configuration
+│   ├── experiment.py        # Experiment Orchestrator Agent
+│   └── rag_assistant.py     # RAG Assistant Agent
+├── tools/
+│   ├── __init__.py
+│   ├── registry_tools.py    # list_runs, compare_runs, create_alias
+│   ├── backtesting_tools.py # run_backtest
+│   ├── forecasting_tools.py # list_models
+│   └── rag_tools.py         # retrieve_context, format_citation
+├── deps.py                  # AgentDeps dataclass for dependency injection
+├── tests/
+│   ├── __init__.py
+│   ├── conftest.py          # Fixtures with mocked LLM
+│   ├── test_schemas.py
+│   ├── test_tools.py
+│   ├── test_agents.py
+│   ├── test_service.py
+│   └── test_routes.py
+
+alembic/versions/
+└── xxxx_create_agent_sessions_table.py
+
+examples/agents/
+├── experiment_demo.py
+├── rag_query.http
+└── websocket_client.py
+```
+
+### Known Gotchas & Library Quirks
+
+```python
+# CRITICAL: PydanticAI model identifier format
+# Use "anthropic:claude-sonnet-4-20250514" NOT "claude-sonnet-4-20250514"
+agent = Agent(model="anthropic:claude-sonnet-4-20250514")
+
+# CRITICAL: deps_type must match RunContext generic parameter
+agent = Agent(
+    model="anthropic:claude-sonnet-4-20250514",
+    deps_type=AgentDeps,  # Your dependency dataclass
+)
+
+@agent.tool
+def my_tool(ctx: RunContext[AgentDeps], param: str) -> str:
+    # ctx.deps is typed as AgentDeps
+    db = ctx.deps.db
+    ...
+
+# CRITICAL: Use @agent.tool for context access, @agent.tool_plain without
+@agent.tool_plain
+def roll_dice() -> str:
+    """No RunContext needed here."""
+    return str(random.randint(1, 6))
+
+# CRITICAL: output_type (not result_type) for structured outputs
+agent = Agent(
+    model="...",
+    output_type=ExperimentReport,  # NOT result_type
+)
+
+# CRITICAL: run() is async, run_sync() is sync wrapper
+result = await agent.run(prompt, deps=deps)  # Async
+result = agent.run_sync(prompt, deps=deps)   # Sync
+
+# CRITICAL: Streaming requires async context manager
+async with agent.run_stream(prompt, deps=deps) as result:
+    async for text in result.stream_text():
+        yield text
+
+# CRITICAL: Access token usage after run completes
+print(result.usage())  # RunUsage(input_tokens=X, output_tokens=Y)
+
+# CRITICAL: Message history for multi-turn
+result2 = await agent.run(
+    "follow-up question",
+    deps=deps,
+    message_history=result.messages,  # Previous messages
+)
+
+# CRITICAL: Tool docstrings become schema descriptions
+@agent.tool
+async def run_backtest(
+    ctx: RunContext[AgentDeps],
+    model_type: str,
+    config: dict[str, Any],
+) -> BacktestResult:
+    """Run a backtest for a model configuration.
+
+    Use this to evaluate model performance with time-series CV.
+    Returns per-fold and aggregated metrics (MAE, sMAPE, WAPE).
+
+    Args:
+        model_type: Type of model (naive, seasonal_naive, moving_average)
+        config: Model-specific configuration
+    """
+    ...
+
+# CRITICAL: FastAPI WebSocket pattern
+from fastapi import WebSocket, WebSocketDisconnect
+
+@router.websocket("/agents/stream")
+async def websocket_stream(websocket: WebSocket):
+    await websocket.accept()
+    try:
+        while True:
+            data = await websocket.receive_json()
+            # Process and stream response
+            async for chunk in stream_agent_response(data):
+                await websocket.send_json(chunk)
+    except WebSocketDisconnect:
+        pass
+
+# CRITICAL: PydanticAI retry mechanism
+from pydantic_ai import ModelRetry
+
+@agent.tool
+async def risky_tool(ctx: RunContext[AgentDeps]) -> str:
+    try:
+        return await external_api()
+    except APIError as e:
+        raise ModelRetry(f"API failed: {e}. Please try again.") from e
+```
+
+---
+
+## Implementation Blueprint
+
+### Data Models
+
+#### ORM Model (models.py)
+
+```python
+"""Agent session persistence."""
+from __future__ import annotations
+from datetime import datetime
+from enum import Enum
+from typing import Any
+from sqlalchemy import DateTime, Integer, String, Text
+from sqlalchemy.dialects.postgresql import JSONB
+from sqlalchemy.orm import Mapped, mapped_column
+from app.core.database import Base
+from app.shared.models import TimestampMixin
+
+
+class SessionStatus(str, Enum):
+    """Agent session states."""
+    ACTIVE = "active"
+    AWAITING_APPROVAL = "awaiting_approval"
+    COMPLETED = "completed"
+    EXPIRED = "expired"
+    FAILED = "failed"
+
+
+class AgentType(str, Enum):
+    """Available agent types."""
+    EXPERIMENT_ORCHESTRATOR = "experiment_orchestrator"
+    RAG_ASSISTANT = "rag_assistant"
+
+
+class AgentSession(TimestampMixin, Base):
+    """Persistent agent session for multi-turn conversations."""
+    __tablename__ = "agent_session"
+
+    id: Mapped[int] = mapped_column(Integer, primary_key=True)
+    session_id: Mapped[str] = mapped_column(String(32), unique=True, index=True)
+    agent_type: Mapped[str] = mapped_column(String(50), index=True)
+    status: Mapped[str] = mapped_column(String(30), default=SessionStatus.ACTIVE.value)
+
+    # Message history for multi-turn
+    message_history: Mapped[list[dict[str, Any]]] = mapped_column(JSONB, default=list)
+
+    # Pending approval
+    pending_action: Mapped[dict[str, Any] | None] = mapped_column(JSONB, nullable=True)
+
+    # Usage tracking
+    total_tokens_used: Mapped[int] = mapped_column(Integer, default=0)
+    tool_calls_count: Mapped[int] = mapped_column(Integer, default=0)
+
+    # Timing
+    last_activity: Mapped[datetime] = mapped_column(DateTime(timezone=True))
+    expires_at: Mapped[datetime] = mapped_column(DateTime(timezone=True))
+```
+
+#### Dependencies (deps.py)
+
+```python
+"""Agent dependencies for tool access."""
+from dataclasses import dataclass
+from sqlalchemy.ext.asyncio import AsyncSession
+
+
+@dataclass
+class AgentDeps:
+    """Dependencies passed to agent tools via RunContext."""
+    db: AsyncSession
+    session_id: str
+    request_id: str | None = None
+```
+
+#### Pydantic Schemas (schemas.py)
+
+```python
+"""Agent API schemas."""
+from datetime import datetime
+from typing import Any, Literal
+from pydantic import BaseModel, ConfigDict, Field
+
+
+# === Experiment Agent ===
+
+class ExperimentConstraints(BaseModel):
+    """Constraints for experiment search."""
+    model_config = ConfigDict(extra="forbid")
+
+    model_types: list[str] = Field(default_factory=lambda: ["naive", "seasonal_naive"])
+    min_train_size: int = Field(default=60, ge=30)
+    max_splits: int = Field(default=5, ge=1, le=20)
+
+
+class ExperimentRequest(BaseModel):
+    """Request to run experiment workflow."""
+    model_config = ConfigDict(extra="forbid")
+
+    objective: str = Field(..., min_length=10, max_length=500)
+    store_id: int = Field(..., ge=1)
+    product_id: int = Field(..., ge=1)
+    constraints: ExperimentConstraints = Field(default_factory=ExperimentConstraints)
+    auto_deploy: bool = False
+    session_id: str | None = None
+
+
+class RunSummary(BaseModel):
+    """Summary of a model run."""
+    run_id: str
+    model_type: str
+    config: dict[str, Any]
+    metrics: dict[str, float]
+
+
+class BaselineComparison(BaseModel):
+    """Comparison against baseline models."""
+    vs_naive: dict[str, float] | None = None
+    vs_seasonal_naive: dict[str, float] | None = None
+
+
+class ExperimentReport(BaseModel):
+    """Structured output from Experiment Agent."""
+    objective: str
+    methodology: str
+    experiments_run: int
+    best_run: RunSummary | None
+    baseline_comparison: BaselineComparison | None
+    recommendation: str
+    approval_required: bool
+    pending_action: str | None = None
+
+
+class ToolCallSummary(BaseModel):
+    """Summary of a tool call."""
+    tool: str
+    args: dict[str, Any]
+    result_summary: str
+    duration_ms: float
+
+
+class ExperimentResponse(BaseModel):
+    """Response from experiment workflow."""
+    session_id: str
+    status: Literal["completed", "awaiting_approval", "failed"]
+    report: ExperimentReport | None = None
+    tool_calls: list[ToolCallSummary] = Field(default_factory=list)
+    tokens_used: int = 0
+    duration_ms: float = 0
+
+
+# === Approval ===
+
+class ApprovalRequest(BaseModel):
+    """Request to approve/reject pending action."""
+    model_config = ConfigDict(extra="forbid")
+
+    session_id: str
+    action: str
+    approved: bool
+    comment: str | None = Field(None, max_length=500)
+
+
+class ApprovalResponse(BaseModel):
+    """Response from approval action."""
+    session_id: str
+    action: str
+    status: Literal["executed", "rejected"]
+    result: dict[str, Any] | None = None
+
+
+# === RAG Agent ===
+
+class RAGQueryRequest(BaseModel):
+    """Request for RAG-powered Q&A."""
+    model_config = ConfigDict(extra="forbid")
+
+    query: str = Field(..., min_length=5, max_length=2000)
+    session_id: str | None = None
+    include_sources: bool = True
+
+
+class Citation(BaseModel):
+    """Citation from RAG retrieval."""
+    source_type: str
+    source_path: str
+    chunk_id: str
+    snippet: str
+    relevance_score: float
+
+
+class RAGQueryResponse(BaseModel):
+    """Response from RAG query."""
+    session_id: str
+    answer: str
+    confidence: float = Field(..., ge=0.0, le=1.0)
+    citations: list[Citation] = Field(default_factory=list)
+    insufficient_context: bool = False
+    tokens_used: int = 0
+    duration_ms: float = 0
+
+
+# === Session Status ===
+
+class SessionStatusResponse(BaseModel):
+    """Session status details."""
+    session_id: str
+    agent_type: str
+    status: str
+    created_at: datetime
+    last_activity: datetime
+    pending_action: dict[str, Any] | None = None
+    tool_calls_count: int
+    tokens_used: int
+
+
+# === WebSocket Messages ===
+
+class WSMessage(BaseModel):
+    """WebSocket message from client."""
+    type: Literal["query", "approve", "cancel"]
+    agent: Literal["rag_assistant", "experiment_orchestrator"]
+    payload: dict[str, Any]
+
+
+class WSEvent(BaseModel):
+    """WebSocket event to client."""
+    type: Literal["token", "tool_call", "complete", "error"]
+    content: str | None = None
+    tool: str | None = None
+    status: str | None = None
+    summary: str | None = None
+    session_id: str | None = None
+    tokens_used: int | None = None
+```
+
+---
+
+## Task List
+
+### Task 1: Add Dependencies to pyproject.toml
+
+```yaml
+MODIFY: pyproject.toml
+ADD to dependencies:
+  - "pydantic-ai>=0.1.0"    # PydanticAI agent framework
+  - "anthropic>=0.40.0"      # Anthropic SDK for Claude
+  - "websockets>=13.0"       # WebSocket support (already in uvicorn[standard])
+```
+
+### Task 2: Add Agent Settings to config.py
+
+```yaml
+MODIFY: app/core/config.py
+ADD after RAG settings:
+
+  # Agent LLM Configuration
+  agent_default_model: str = "anthropic:claude-sonnet-4-20250514"
+  agent_fallback_model: str = "openai:gpt-4o"
+  agent_temperature: float = 0.1
+  agent_max_tokens: int = 4096
+  anthropic_api_key: str = ""  # Required
+
+  # Agent Execution Configuration
+  agent_max_tool_calls: int = 10
+  agent_timeout_seconds: int = 120
+  agent_retry_attempts: int = 3
+  agent_retry_delay_seconds: float = 1.0
+
+  # Human-in-the-Loop Configuration
+  agent_require_approval: list[str] = ["create_alias", "archive_run"]
+  agent_approval_timeout_minutes: int = 60
+
+  # Session Configuration
+  agent_session_ttl_minutes: int = 120
+  agent_max_sessions_per_user: int = 5
+
+  # Streaming Configuration
+  agent_enable_streaming: bool = True
+```
+
+### Task 3: Create Alembic Migration
+
+```yaml
+CREATE: alembic/versions/xxxx_create_agent_sessions_table.py
+PATTERN: Follow existing migration patterns
+
+Key columns:
+  - session_id (String 32, unique, indexed)
+  - agent_type (String 50, indexed)
+  - status (String 30)
+  - message_history (JSONB)
+  - pending_action (JSONB, nullable)
+  - total_tokens_used (Integer)
+  - tool_calls_count (Integer)
+  - last_activity (DateTime TZ)
+  - expires_at (DateTime TZ)
+  - created_at, updated_at (TimestampMixin)
+```
+
+### Task 4: Create ORM Models
+
+```yaml
+CREATE: app/features/agents/models.py
+MIRROR: app/features/registry/models.py pattern
+INCLUDE:
+  - SessionStatus enum
+  - AgentType enum
+  - AgentSession model with JSONB columns
+```
+
+### Task 5: Create Dependencies Dataclass
+
+```yaml
+CREATE: app/features/agents/deps.py
+CONTENT:
+  - AgentDeps dataclass
+  - Fields: db (AsyncSession), session_id, request_id
+```
+
+### Task 6: Create Pydantic Schemas
+
+```yaml
+CREATE: app/features/agents/schemas.py
+MIRROR: app/features/registry/schemas.py pattern
+INCLUDE:
+  - ExperimentRequest, ExperimentResponse, ExperimentReport
+  - ApprovalRequest, ApprovalResponse
+  - RAGQueryRequest, RAGQueryResponse, Citation
+  - SessionStatusResponse
+  - WSMessage, WSEvent
+```
+
+### Task 7: Create Tool Modules
+
+```yaml
+CREATE: app/features/agents/tools/registry_tools.py
+TOOLS:
+  - list_runs(ctx, filters) -> list[RunSummary]
+  - compare_runs(ctx, run_id_a, run_id_b) -> CompareResult
+  - create_alias(ctx, alias_name, run_id) -> AliasResult
+  - archive_run(ctx, run_id) -> ArchiveResult
+
+CREATE: app/features/agents/tools/backtesting_tools.py
+TOOLS:
+  - run_backtest(ctx, model_type, config, store_id, product_id, n_splits) -> BacktestResult
+
+CREATE: app/features/agents/tools/forecasting_tools.py
+TOOLS:
+  - list_models(ctx) -> list[ModelInfo]
+
+CREATE: app/features/agents/tools/rag_tools.py
+TOOLS:
+  - retrieve_context(ctx, query, top_k) -> list[RetrievedChunk]
+  - format_citation(ctx, chunk) -> Citation
+
+CRITICAL for all tools:
+  - Use @agent.tool decorator (not @agent.tool_plain) for db access
+  - First param is RunContext[AgentDeps]
+  - Detailed docstrings for LLM schema
+  - Structured logging with timing
+```
+
+### Task 8: Create Agent Definitions
+
+```yaml
+CREATE: app/features/agents/agents/base.py
+CONTENT:
+  - get_agent_settings() helper
+  - Common model configuration
+
+CREATE: app/features/agents/agents/experiment.py
+CONTENT:
+  - ExperimentReport output schema
+  - experiment_agent = Agent(...)
+  - System prompt for experiment orchestration
+  - Tools: list_models, run_backtest, compare_runs, create_alias
+
+CREATE: app/features/agents/agents/rag_assistant.py
+CONTENT:
+  - RAGResponse output schema
+  - rag_agent = Agent(...)
+  - System prompt for evidence-grounded answers
+  - Tools: retrieve_context, format_citation
+```
+
+### Task 9: Create Agent Service
+
+```yaml
+CREATE: app/features/agents/service.py
+MIRROR: app/features/jobs/service.py pattern
+
+Class AgentService:
+  async def run_experiment(self, db, request) -> ExperimentResponse:
+    - Create/resume session
+    - Build AgentDeps
+    - Run experiment_agent with tools
+    - Capture tool calls and timing
+    - Handle approval_required check
+    - Update session state
+    - Return structured response
+
+  async def run_rag_query(self, db, request) -> RAGQueryResponse:
+    - Create/resume session
+    - Run rag_agent with tools
+    - Extract citations from tool results
+    - Return structured response
+
+  async def approve_action(self, db, request) -> ApprovalResponse:
+    - Load session
+    - Validate pending_action matches
+    - Execute action if approved
+    - Update session status
+    - Return result
+
+  async def get_session_status(self, db, session_id) -> SessionStatusResponse:
+    - Load session
+    - Return status details
+
+  async def stream_response(self, db, message) -> AsyncGenerator[WSEvent]:
+    - Route to appropriate agent
+    - Use run_stream for token-by-token delivery
+    - Yield WSEvent for each chunk
+```
+
+### Task 10: Create REST Routes
+
+```yaml
+CREATE: app/features/agents/routes.py
+MIRROR: app/features/registry/routes.py pattern
+
+Routes:
+  POST /agents/experiment/run -> ExperimentResponse
+  POST /agents/experiment/approve -> ApprovalResponse
+  POST /agents/rag/query -> RAGQueryResponse
+  GET /agents/status/{session_id} -> SessionStatusResponse
+
+CRITICAL:
+  - Structured logging with agents.* prefix
+  - Handle LLM API errors gracefully
+  - Timeout handling
+```
+
+### Task 11: Create WebSocket Handler
+
+```yaml
+CREATE: app/features/agents/websocket.py
+PATTERN: FastAPI WebSocket with async iteration
+
+Key functions:
+  websocket_stream(websocket: WebSocket):
+    - Accept connection
+    - Receive JSON messages
+    - Parse WSMessage
+    - Call service.stream_response()
+    - Send WSEvent for each chunk
+    - Handle disconnect gracefully
+
+CRITICAL:
+  - Use asyncio.wait_for for timeout
+  - Catch WebSocketDisconnect
+  - Log all events with correlation ID
+```
+
+### Task 12: Register Router in main.py
+
+```yaml
+MODIFY: app/main.py
+ADD import: from app.features.agents.routes import router as agents_router
+ADD import: from app.features.agents.websocket import websocket_stream
+ADD router: app.include_router(agents_router)
+ADD websocket: app.add_api_websocket_route("/agents/stream", websocket_stream)
+```
+
+### Task 13: Create Test Fixtures
+
+```yaml
+CREATE: app/features/agents/tests/conftest.py
+FIXTURES:
+  - db_session: Async session with cleanup
+  - client: AsyncClient with db override
+  - mock_anthropic: Mock Anthropic API responses
+  - sample_experiment_request: Test request
+  - sample_rag_request: Test request
+```
+
+### Task 14: Create Unit Tests
+
+```yaml
+CREATE: app/features/agents/tests/test_schemas.py
+  - Test all request/response validation
+
+CREATE: app/features/agents/tests/test_tools.py
+  - Test each tool function with mocked deps
+  - Test tool return types
+  - Test error handling
+
+CREATE: app/features/agents/tests/test_agents.py
+  - Test agent with mocked LLM
+  - Test structured output parsing
+  - Test tool call ordering
+```
+
+### Task 15: Create Integration Tests
+
+```yaml
+CREATE: app/features/agents/tests/test_routes.py
+@pytest.mark.integration:
+  - test_experiment_run_creates_session
+  - test_experiment_approval_workflow
+  - test_rag_query_returns_citations
+  - test_session_status_returns_details
+  - test_websocket_streaming (with TestClient)
+```
+
+### Task 16: Create Examples
+
+```yaml
+CREATE: examples/agents/experiment_demo.py
+  - Full experiment workflow demo
+
+CREATE: examples/agents/rag_query.http
+  - HTTP client examples
+
+CREATE: examples/agents/websocket_client.py
+  - Python WebSocket client example
+```
+
+### Task 17: Update .env.example
+
+```yaml
+MODIFY: .env.example
+ADD:
+  # Agent Configuration
+  ANTHROPIC_API_KEY=sk-ant-...
+  AGENT_DEFAULT_MODEL=anthropic:claude-sonnet-4-20250514
+  AGENT_MAX_TOOL_CALLS=10
+  AGENT_TIMEOUT_SECONDS=120
+```
+
+---
+
+## Validation Loop
+
+### Level 1: Syntax & Style
+
+```bash
+# Run FIRST
+uv run ruff check app/features/agents/ --fix
+uv run ruff format app/features/agents/
+
+# Expected: No errors
+```
+
+### Level 2: Type Checking
+
+```bash
+# MUST be green
+uv run mypy app/features/agents/
+uv run pyright app/features/agents/
+
+# Expected: 0 errors
+```
+
+### Level 3: Unit Tests
+
+```bash
+# No LLM calls required (mocked)
+uv run pytest app/features/agents/tests/ -v -m "not integration"
+
+# Expected: All pass
+```
+
+### Level 4: Integration Tests
+
+```bash
+# Requires PostgreSQL + API keys
+docker-compose up -d
+uv run alembic upgrade head
+uv run pytest app/features/agents/tests/ -v -m integration
+
+# Expected: All pass (rate-limited)
+```
+
+### Level 5: Manual Smoke Test
+
+```bash
+# Start API
+uv run uvicorn app.main:app --reload --port 8123
+
+# RAG Query
+curl -X POST http://localhost:8123/agents/rag/query \
+  -H "Content-Type: application/json" \
+  -d '{"query": "How does backtesting prevent data leakage?"}'
+
+# Expected: {"session_id": "...", "answer": "...", "citations": [...]}
+
+# Experiment (requires indexed RAG data)
+curl -X POST http://localhost:8123/agents/experiment/run \
+  -H "Content-Type: application/json" \
+  -d '{
+    "objective": "Find best model for store 1, product 1",
+    "store_id": 1,
+    "product_id": 1
+  }'
+
+# Expected: {"session_id": "...", "status": "completed", "report": {...}}
+
+# WebSocket test
+python examples/agents/websocket_client.py
+```
+
+---
+
+## Final Validation Checklist
+
+- [ ] All tests pass: `uv run pytest app/features/agents/tests/ -v`
+- [ ] No linting errors: `uv run ruff check app/features/agents/`
+- [ ] No type errors: `uv run mypy && pyright`
+- [ ] Migration applies: `uv run alembic upgrade head`
+- [ ] Manual smoke tests pass
+- [ ] Structured logging with `agents.*` prefix
+- [ ] Tool calls logged with timing
+- [ ] Session state persists across requests
+- [ ] Approval workflow blocks sensitive actions
+- [ ] WebSocket streaming works
+
+---
+
+## Anti-Patterns to Avoid
+
+- ❌ Don't use `result_type` - use `output_type` in PydanticAI
+- ❌ Don't forget `deps_type` when using `RunContext[AgentDeps]`
+- ❌ Don't use `@agent.tool_plain` when db access needed
+- ❌ Don't forget to handle `WebSocketDisconnect`
+- ❌ Don't block on LLM calls without timeout
+- ❌ Don't store raw message_history as strings - use JSONB
+- ❌ Don't skip structured logging for tool calls
+- ❌ Don't hardcode model names - use settings
+
+---
+
+## Confidence Score: 7.5/10
+
+**Strengths:**
+- PydanticAI has excellent documentation
+- Clear FastAPI integration patterns
+- Existing service patterns to follow
+- Tool integrations with existing modules
+
+**Risks:**
+- PydanticAI is relatively new (versioning may change)
+- WebSocket streaming with tools is complex
+- LLM rate limits may affect tests
+- Message history serialization edge cases
+
+**Mitigations:**
+- Pin PydanticAI version in pyproject.toml
+- Comprehensive mocking for unit tests
+- Rate-limited integration tests
+- JSONB for flexible message storage