feat: release v0.2.10 — demo showcase page + e2e pipeline by w7-mgfcode · Pull Request #134 · w7-mgfcode/ForecastLabAI

w7-mgfcode · 2026-05-18T07:30:25Z

Summary

Promotes dev → main to cut v0.2.10. release-please will open a Release PR off this merge (pre-1.0: feat: → PATCH bump, 0.2.9 → 0.2.10).

Commits since v0.2.9 (`main`)

Commit	Type	Change
`51d9149`	feat(api,ui)	In-product demo showcase page — `/showcase` streams the live demo pipeline (#132/#133)
`7034e48`	chore(repo)	Bump authlib + fastmcp to clear Socket-flagged CVEs (#130/#131)
`1b4447d`	feat(api,docs)	E2E demo pipeline + showcase script (#128/#129)
`a864ae3`	feat(data)	`MarkdownGenerator` age_days trigger via heuristic (#94/#127)

Pre-merge verification

✅ Conflict-free — git merge-tree --write-tree origin/main origin/dev exits clean
✅ dev CI green on HEAD 51d9149 (Lint & Format, Type Check, Test, Migration Check)
✅ main's 0.2.9 release commits are absorbed automatically (no conflict)

Release note

PR title is intentionally feat: so the merge-commit subject triggers the release-please bump regardless of merge method (see docs/_base/RUNBOOKS.md → "release-please skipped the bump").

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

New Features
- Added end-to-end demo pipeline with make demo, make demo-quick, and make demo-clean commands for quick product walkthroughs.
- Added in-product Showcase page (/showcase) displaying live pipeline execution with step-by-step progress visualization.
- Added demo API endpoints for running and monitoring pipeline execution with WebSocket support.
Documentation
- Updated README with demo instructions and quick-start guidance.
- Added demo troubleshooting runbooks and API contracts.

#94) (#127)

Resolves #94 via the heuristic path documented in the issue. No schema column, no Alembic migration, no FIFO cohort tracking: the trigger reads the existing per-(store, product) on_hand_qty series itself and fires when inventory has been "unrefreshed" past `cfg.age_days_threshold`.

Decision rationale (schema column vs. heuristic):
- The schema column path would add `oldest_unit_age_days` to `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO cohort tracking in `InventorySnapshotGenerator`. No downstream consumer reads this column today, so adding it for one generator trigger violates the "don't design for hypothetical future requirements" rule in CLAUDE.md.
- The heuristic path is self-contained in MarkdownGenerator, deterministic (preserves the zero-rng-draw regression invariant), and additive (no migration, no model change). 354 LOC net, all inside one slice.

Heuristic spec:
- A "refresh" is a day where `on_hand_qty` rose by >= `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day.
- Age at day t = days since the most recent refresh (or since `dates[0]` if no refresh has been observed).
- Firing requires age >= `age_days_threshold` AND on_hand >= `markdown_min_units_remaining`: never mark down an empty shelf.
- After firing, the refresh anchor resets to the day AFTER the markdown window ends, so back-to-back fires can't happen and the next age clock starts from a "clear shelf" baseline.

Wiring: `MarkdownGenerator.generate()` gains an optional kwarg `inventory_records: list[dict[str, Any]] | None = None` which `core.py` passes through from `InventorySnapshotGenerator`. Disabled-path and non-age_days-path behavior is byte-identical (kwarg ignored).

Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete `NotImplementedError` test. Coverage: no-records defensive, threshold not-met, threshold met, spike resets age, post-fire reset avoids back-to-back, low-inventory skip, unknown-product skip, rng non-consumption.

Validation (local):
- ruff check + format: clean
- mypy --strict: 0 issues, 192 files
- pyright --strict: 0 errors
- pytest -m "not integration": 969 passed (+7 vs pre-PR)

Closes #94.
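The heuristic spec above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code in `markdowns.py`: the function name `age_days_fire_dates`, the `window_days` parameter, the treatment of a rise from zero stock as a refresh, and the choice to ignore days inside an active markdown window are all hypothetical. Only the 0.3 spike threshold and the four rules come from the commit message.

```python
from datetime import date, timedelta

# Assumed to mirror _AGE_DAYS_SPIKE_THRESHOLD from the commit message (30% jump).
SPIKE_THRESHOLD = 0.3


def age_days_fire_dates(
    dates: list[date],
    on_hand: list[int],
    age_days_threshold: int,
    min_units_remaining: int,
    window_days: int,  # hypothetical: length of one markdown window
) -> list[date]:
    """Return the days on which an age-based markdown would start."""
    fires: list[date] = []
    if not dates:
        return fires
    anchor = dates[0]  # no refresh observed yet: the age clock starts at dates[0]
    for i, day in enumerate(dates):
        if day < anchor:
            # Assumption: days inside an active markdown window are skipped.
            continue
        if i > 0:
            prev, cur = on_hand[i - 1], on_hand[i]
            rise = cur - prev
            # A "refresh" is a relative jump of at least 30% vs the prior day.
            # Assumption: any rise from an empty shelf also counts as a refresh.
            if rise > 0 and (prev == 0 or rise / prev >= SPIKE_THRESHOLD):
                anchor = day
        age = (day - anchor).days
        if age >= age_days_threshold and on_hand[i] >= min_units_remaining:
            fires.append(day)
            # Reset the anchor to the day AFTER the markdown window ends.
            anchor = day + timedelta(days=window_days)
    return fires


if __name__ == "__main__":
    days = [date(2024, 10, 1) + timedelta(days=i) for i in range(10)]
    # Flat stock, threshold 3, two-day window: fires on 2024-10-04 and 2024-10-09.
    print(age_days_fire_dates(days, [100] * 10, 3, 10, 2))
```

The sketch draws no random numbers, which matches the determinism the commit message calls out as a requirement.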

* docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128)

Adds the planning documents for the end-to-end demo pipeline work tracked in #128. Implementation commits follow on this branch.
- INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics, open questions resolved in the PRP).
- PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6 commits, additive only: no schema changes, no API edits).

* feat(data): add demo_minimal scenario preset (#128)

Tiny preset that powers the upcoming `make demo` target. Three stores × ten products × 92 days (2024-10-01..2024-12-31): wide enough to satisfy an expanding backtest with n_splits=3, horizon=14, min_train_size=30 (needs >= 72 days, 92 leaves margin), small enough to keep the demo loop comfortable on a laptop. Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10, modest promotion + stockout probabilities) so backtest WAPE stays non-NaN across all three baseline models.
- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch
- app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios
- tests cover the new preset and the updated scenario count

* feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128)

Single-file async driver that walks the published HTTP surface (precheck -> reset? -> seed -> status -> features -> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the shape of scripts/seed_random.py and scripts/check_db.py.
- HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s timeout (default 5 s is too short for /seeder/generate); surfaces RFC 7807 problem+json bodies as a typed StepError that echoes title / detail / request_id (never the raw body, so it is secrets-safe).
- DemoContext + StepOutcome dataclasses thread cross-step references.
- Reporter renders the output-formatting.md glyphs (verbose by default, --quiet collapses to one line per step).
- Per-step error handling converts httpx + StepError into fail outcomes; precheck failure exits 2, any other failure exits 1, green exits 0.
- Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings() to honor the no-os.environ-in-feature-code rule.
- Registry handshake uses the mandatory pending -> running -> success transition and the wire alias "model_config" (not "model_config_data"); artifact_hash is computed client-side via sha256 since we share the FS with the API on this single-host system.
- Winner selection: lowest aggregated WAPE, skipping NaN folds.

Also adds scripts/__init__.py so tests can `import scripts.run_demo` without invoking the file as a script.

* feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128)

Wraps scripts/run_demo.py so reviewers can run the full end-to-end demo with one command. Recipes mirror the three modes the script supports: full run, skip-seed iteration, destructive reset.

Make targets:
- demo: docker compose up -d + alembic + run_demo
- demo-quick: run_demo --skip-seed (no compose/migration touch)
- demo-clean: full reset (--reset) before seeding
- help: default goal; lists targets + preconditions

Tab-indented recipes and .PHONY declarations per make conventions. Preconditions (Postgres on :5433, uvicorn on :8123) documented in the help block; the script itself enforces them via the precheck step and exits 2 on failure.

* test(api): unit + integration coverage for run_demo (#128)

Unit (`tests/test_run_demo_unit.py`, 32 cases):
- argparse defaults + all-flags variants
- DemoContext defaults (no leaking state across runs)
- _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None
- _model_config_payload: discriminated-union shape per baseline; rejects unsupported model_type (defends the "no lightgbm in PRP-15" boundary)
- Reporter: glyph mapping; verbose + quiet output; summary green / failure / over-budget soft-warn branches
- StepError formats RFC 7807 (title/detail/request_id) without leaking the raw response body
- HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError
- Step payload sanity: seed sends demo_minimal+correct dims+ISO dates; features sends cutoff_date as ISO; train fires three model_types in parallel; agent step skips with ⏭️ when no LLM key

Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration):
- Skips if Postgres on :5433 isn't reachable
- Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default)
- Runs scripts/run_demo.py --reset against it; asserts exit 0 + canonical "runs=3 winner=... alias=demo-production" summary
- Second case asserts a bogus URL exits 2 (no silent success)
- Cleans up uvicorn on teardown with terminate/kill fallback
- Resolves `uv` via shutil.which to keep ruff S607 happy and avoid PATH-dependent exec at test time

* ci(repo): nightly e2e demo workflow (#128)

Adds .github/workflows/e2e-nightly.yml, which runs scripts/run_demo.py against a fresh Postgres+pgvector service every night at 07:00 UTC (plus on-demand via workflow_dispatch). Catches regressions in the documented end-to-end pipeline before they bleed into the per-PR gate.

Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally NOT a required status check on dev or main. Flake-budget lives in the nightly slot, not in ci.yml.
- pgvector/pgvector:pg16 service container (same as ci.yml `test` job)
- uvicorn started in background; /health polled with a 30 s deadline
- run_demo.py called with --seed 42 (deterministic)
- LLM-key env vars intentionally absent, so the agent step auto-skips via ⏭️, keeping the workflow self-contained
- uvicorn logs uploaded as artifact on failure (7-day retention) so postmortems can read what the API was doing when the script broke
- astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md
- permissions: contents: read (least-privilege)

* docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128)

Discoverability layer for PRP-15.
- README.md: new 'Try it: end-to-end demo' step right after the curl /health verification; shows the canonical final-line summary so reviewers know what green looks like.
- docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section documenting all three Make targets.
- docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common Incidents entry with a 7-point diagnosis flow keyed to the script's step names + a postmortem-capture recipe.
- docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows added to the Document Index table.

Pure additive; no existing content removed or renamed.

* fix(data): update /seeder/scenarios route test for demo_minimal preset (#128)

Companion to feat(data): add demo_minimal scenario preset. The route-level assertion in TestListScenarios.test_returns_scenarios still expected 6 scenarios; bumping to 7 and adding the demo_minimal name membership check to match the service-layer + config-layer tests already updated in 005c189.

* fix(api): harden run_demo for integration test + real DB (#128)

Three real failures surfaced when first running the integration test against docker-compose Postgres + a freshly booted uvicorn; all three are now closed (items 1-3 below). Items 4 and 5 are smaller adjustments made in the same pass.

scripts/run_demo.py:
1. step_status: discover the real (store_id, product_id) from /dimensions/stores + /dimensions/products instead of hardcoding 1. Postgres auto-increment does NOT reset after delete, so the freshly seeded IDs are NOT 1 (they were ~150-260 on this branch after a few delete/seed cycles).
2. step_register: copy the trained-model artifact into the registry's own root (settings.registry_artifact_root) and record a registry-relative URI. The registry verify endpoint resolves artifact_uri against its own root, which is separate from where /forecasting/train writes (settings.forecast_model_artifacts_dir). Pre-fix, verify returned 404 even though the artifact existed on disk.
3. step_agent: skip with the soft-skip glyph on any LLM provider failure (invalid key, model unavailable, 5xx), and make _llm_key_present provider-aware so it matches the right env var to the configured agent_default_model. Pre-fix, an .env with anthropic/openai keys but a Gemini default model failed hard at chat-time.
4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate for demo_minimal can spend 60-90 s on slower laptops once you include inventory + prices + promotions inserts.
5. step_seed detail string: GenerateResult.records_created uses 'sales' (singular), not 'sales_daily'; cosmetic fix.

tests/test_e2e_demo.py:
- Redirect uvicorn stdout to a temp file rather than subprocess.PIPE. The seeder + structlog produce enough INFO log volume to fill a 64-KB pipe buffer; once full, uvicorn blocks on write and seeder requests hang for the full --timeout. Verified locally: integration suite now passes in ~6.5 s instead of timing out at 120 s.
- Cleanup leaves the log file on disk only when the test failed (postmortem-friendly).

tests/test_run_demo_unit.py:
- Bump test_defaults timeout expectation to match the new 120 s default.

End-to-end manual run on this machine: 11 steps, wall_clock=2 s, exit 0. Integration test: 2 passed in 6.48 s.
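The pipe-buffer hang described for tests/test_e2e_demo.py is a general subprocess pitfall: a child that writes to `subprocess.PIPE` blocks once the OS buffer fills if nobody reads the other end. The sketch below shows the temp-file redirect pattern in isolation. The helper name `run_with_log_file` is hypothetical, and the real test keeps uvicorn running in the background rather than waiting for exit as this sketch does.

```python
import os
import subprocess
import sys
import tempfile


def run_with_log_file(cmd: list[str], timeout_s: float) -> tuple[int, int]:
    """Run cmd with stdout/stderr redirected to a temp file.

    Returns (exit code, bytes logged). A file never applies back-pressure,
    so a chatty child cannot block on a full pipe buffer.
    """
    with tempfile.TemporaryFile() as log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        returncode = proc.wait(timeout=timeout_s)
        # Ask the OS for the file size; the child wrote through its own handle.
        size = os.fstat(log.fileno()).st_size
        return returncode, size


if __name__ == "__main__":
    # A child that emits far more than a typical 64 KB pipe buffer.
    noisy = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 200_000)"]
    print(run_with_log_file(noisy, timeout_s=30.0))
```

With `stdout=subprocess.PIPE` and no reader, the same child could stall partway through its output and `wait()` would only return on timeout, which matches the 120 s hang reported in the commit.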

… (#131)

Headline:
- authlib 1.6.6 -> 1.7.2 (clears GHSA-wvwj-cvrp-7pv5, JWS signature verification bypass; patched at >= 1.6.9)
- fastmcp 2.14.4 -> 3.2.4 (clears GHSA-vv7q-7jx5-f767, OpenAPI Provider SSRF + path traversal; patched at >= 3.2.0)

Both CVEs were flagged on PR #129 by Socket Security and are pre-existing on dev (not introduced by #128).

Wider scope (read before merging): `uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers a full re-resolve of the dependency graph. Because dev's uv.lock had drifted from pyproject.toml (the project's constraint envelope had loosened over time), this single command also brings the lockfile in sync with current pyproject.toml. Net diff: 243 insertions / 369 deletions on uv.lock; no other files touched.

Transitive cascades worth flagging:
- anthropic 0.77.0 -> 0.102.0 (pydantic-ai-slim extra)
- pydantic-graph 1.51.0 -> 1.96.0
- temporalio 1.20.0 -> 1.27.2
- alembic 1.18.1 -> 1.18.4
- aws-* and cohere transitives bumped along
- griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched)
- Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus exporter, pydocket, redis, rsa, sortedcontainers. These were transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in.

Verification on this host:
- uv sync --extra dev -> green
- ruff check . -> clean
- mypy --strict app/ -> 192 files clean
- pyright app/ -> 0 errors (50 warnings, pre-existing)
- pytest -m 'not integration' -> 969 passed

Known install quirk: griffelib 2.0.2 ships a top-level `griffe/` package whose RECORD files don't always materialize on first install when uv replaces an older `griffe` dist in the same sync. A clean venv install (which CI does via `uv sync --frozen`) is unaffected; local devs who upgrade in place may need a one-shot `uv pip install --force-reinstall griffelib` if `import griffe` fails.

* feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132)

New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It drives the published API surface in-process via httpx.ASGITransport (no cross-slice imports, satisfying the vertical-slice rule) and streams one StepEvent per pipeline step: precheck -> reset -> seed -> status -> features -> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup. A module-level asyncio.Lock enforces single-flight; concurrent runs get an RFC 7807 409. The orchestration is a faithful in-process port of scripts/run_demo.py (PR #129). Implements PRP-17.

* test(api): cover the demo slice pipeline, routes, and e2e integration (#132)

Unit tests mock the in-process HTTP client to exercise step sequencing, winner selection, and fail-fast; route tests cover POST /demo/run (200 + 409) and the WS /demo/stream handler. The integration test seeds demo_minimal and asserts an end-to-end green run against real Postgres. Implements PRP-17.

* feat(ui): add showcase page streaming the live demo pipeline (#132)

New /showcase route and nav entry. The page opens a one-shot WebSocket to /demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders the 11 pipeline steps as live status cards: glyph, detail, duration, the backtest per-model WAPE breakdown with the winner highlighted, and a pass/fail summary banner. Also block-scopes a pre-existing no-case-declarations lint error in chat.tsx so pnpm lint is green for this PR. Implements PRP-17.

* test(ui): add vitest setup and use-demo-pipeline hook coverage (#132)

Adds the frontend test stack (vitest + jsdom + @testing-library/react), a test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the pure event reducer (idle -> running -> pass transitions, summary assembly, error phase) and a renderHook smoke test. The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented fix for pnpm 11's esbuild build-script gate. Implements PRP-17.

* docs(docs): document the demo slice and showcase page (#132)

Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the /demo/run + /demo/stream rows and a WebSocket Events section in API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and REPO_MAP_INDEX rows for the demo slice and showcase page. Implements PRP-17.

coderabbitai · 2026-05-18T07:30:38Z

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

This PR introduces a complete end-to-end demo pipeline for ForecastLabAI, enabling one-command demonstration of the forecasting system through both CLI (make demo) and web UI (/showcase). The implementation spans planning documents, backend orchestration service, CLI driver script, frontend showcase page, seeder support infrastructure, and CI automation, with comprehensive test coverage across all layers.

Changes

End-to-End Demo Pipeline & Showcase

Layer / File(s)	Summary
Planning Documents & Specifications `PRPs/PRP-15-e2e-demo-pipeline.md`, `PRPs/PRP-17-demo-showcase-page.md`, `INITIAL-14.md`, `README.md`, `Makefile`, `docs/*`	Comprehensive PRPs and requirement documents defining the demo pipeline architecture with 11 ordered steps (precheck, reset, seed, status, features, train, backtest, register, verify, agent, cleanup), success criteria, testing strategy, and technical implementation details for both the CLI driver and the in-product showcase UI. Updated README and runbooks document the demo workflow and troubleshooting guides.
Seeder Support: demo_minimal & age_days `app/shared/seeder/config.py`, `app/shared/seeder/generators/markdowns.py`, `app/shared/seeder/core.py`, `app/features/seeder/service.py`, `app/shared/seeder/tests/*`	Adds `demo_minimal` scenario preset (3 stores, 10 products, Oct–Dec 2024) and implements age_days markdown trigger for rapid demo-specific data generation. The seeder now passes inventory_records to markdown generator, enabling spike-based refresh detection and age-based promotions without RNG consumption.
Backend Demo Slice: Core Architecture `app/features/demo/schemas.py`, `app/features/demo/pipeline.py`, `app/features/demo/service.py`	Typed Pydantic schemas (DemoRunRequest, StepEvent, DemoRunResult) and async 11-step pipeline orchestrator using httpx.ASGITransport to drive the FastAPI app as a black-box HTTP consumer. Module-level asyncio.Lock enforces single-flight execution; run_pipeline streams per-step events with exception-to-fail conversion and winner selection by lowest valid WAPE.
Backend Demo Slice: Routes & Integration `app/features/demo/routes.py`, `app/features/demo/__init__.py`, `app/main.py`	POST /demo/run endpoint returns aggregated DemoRunResult; WS /demo/stream endpoint streams StepEvent updates with error event fallback on validation/busy conditions. Wired into main FastAPI app with explicit module exports.
Backend Demo Tests `app/features/demo/tests/conftest.py`, `app/features/demo/tests/test_pipeline.py`, `app/features/demo/tests/test_routes.py`, `app/features/demo/tests/test_schemas.py`	Comprehensive unit tests covering schema validation and defaults, route success/error/busy paths, HTTP and WebSocket behavior, pipeline step sequencing, winner selection, artifact registration, and fail-fast behavior with monkeypatched HTTP clients and fake settings.
CLI Demo Driver `scripts/run_demo.py`, `scripts/__init__.py`	Standalone async HTTP-driven pipeline exercising documented endpoints with RFC 7807-aware error handling, structured step outcomes, Reporter output formatting (verbose/quiet modes), CLI flags (seed, skip-seed, reset, api-url, timeout), and canonical final-line output for CI grep. Handles transport errors as precondition failures (exit code 2) vs. step failures (exit code 1).
CLI Driver Unit Tests `tests/test_run_demo_unit.py`	Pure-Python unit test suite with mocked HTTP boundary validating CLI argument parsing, winner selection logic, model config payloads, reporter output formatting, step request payloads, agent LLM-key skipping, and module constants correctness.
E2E Integration Tests `tests/test_e2e_demo.py`, `tests/test_demo_showcase_integration.py`	Subprocess-based integration tests spawning isolated uvicorn server; test_e2e_demo.py validates CLI exit codes and success/failure output substrings; test_demo_showcase_integration.py exercises HTTP POST /demo/run against real Postgres-backed stack, verifying winner registration and alias resolution.
Frontend Types & Constants `frontend/src/types/api.ts`, `frontend/src/lib/constants.ts`	TypeScript API type definitions (DemoStepStatus, EventType, StepEvent, DemoRunRequest, DemoRunResult) and route constant updates including DEMO_WS_URL derivation from VITE_API_BASE_URL and EXPLORER.RUNS path addition.
Frontend State Management `frontend/src/hooks/use-demo-pipeline.ts`, `frontend/src/hooks/use-demo-pipeline.test.ts`	React hook managing 11-step pipeline state via one-shot WebSocket connection, pure reducer function (applyEvent) for state transitions, initial state builders, socket lifecycle (auto-reconnect on start, disconnect on completion/error), and comprehensive test coverage of phase transitions and event handling.
Frontend UI Components & Page `frontend/src/components/demo/demo-step-card.tsx`, `frontend/src/pages/showcase.tsx`	DemoStepCard component with per-status glyphs, duration formatting, and conditional backtest WAPE/register run_id + alias detail rendering. ShowcasePage wires pipeline state, provides run controls (button, reseed/resetDb checkboxes), renders live step cards, and displays summary banner with winner link to runs explorer on completion.
Frontend App Integration & Build `frontend/src/App.tsx`, `frontend/package.json`, `frontend/vitest.config.ts`, `frontend/tsconfig.node.json`	Lazy-loaded ShowcasePage route under ROUTES.SHOWCASE with Suspense fallback. Package.json adds test script (vitest run) and dev dependencies (`@testing-library/react/dom`, jsdom, vitest). Vitest config sets jsdom environment and @ alias; tsconfig.node.json includes vitest.config.ts.
CI Nightly Workflow & Misc `.github/workflows/e2e-nightly.yml`, `frontend/src/pages/chat.tsx`	GitHub Actions nightly e2e workflow (07:00 UTC cron + manual dispatch) provisioning pgvector Postgres, applying migrations, starting uvicorn on port 8123, waiting for health readiness, running demo pipeline with 60s timeout, and uploading uvicorn logs on failure. Includes chat.tsx reformatting without behavior changes.
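The "waiting for health readiness" step in the nightly workflow row is a poll-with-deadline loop. A minimal Python sketch of that pattern follows. The workflow itself may well do this in shell, and the function names, the 0.5 s interval, and the 2 s per-request timeout are assumptions; only the /health endpoint, port 8123, and the 30 s deadline come from the PR text.

```python
import time
import urllib.request
from collections.abc import Callable


def wait_until_healthy(
    probe: Callable[[], bool],
    deadline_s: float = 30.0,   # deadline quoted in the nightly-workflow commit
    interval_s: float = 0.5,    # assumed polling interval
) -> bool:
    """Call probe() until it returns True or the deadline passes."""
    stop_at = time.monotonic() + deadline_s
    while True:
        try:
            if probe():
                return True
        except OSError:
            # Connection refused / reset while the server is still booting.
            pass
        if time.monotonic() >= stop_at:
            return False
        time.sleep(interval_s)


def http_health_probe(url: str) -> Callable[[], bool]:
    """Build a probe that treats HTTP 200 from url as healthy."""
    def probe() -> bool:
        # urllib raises URLError/HTTPError (both OSError subclasses) on failure.
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    return probe


if __name__ == "__main__":
    ok = wait_until_healthy(http_health_probe("http://localhost:8123/health"))
    raise SystemExit(0 if ok else 2)
```

Exiting 2 on an unhealthy API mirrors the driver's convention that precondition failures use exit code 2, though whether the workflow reuses that code here is an assumption.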

Sequence Diagram(s)

sequenceDiagram
    participant Browser as Browser/CLI
    participant API as FastAPI App
    participant Pipeline as Demo Pipeline
    participant Registry as Registry Slice
    participant Seeder as Seeder Slice
    
    Browser->>API: GET /health (precheck)
    API-->>Browser: 200 OK
    
    Browser->>Seeder: POST /seeder/generate (demo_minimal)
    Seeder-->>Browser: seed_records, date_range
    
    Browser->>API: POST /featuresets/compute
    API-->>Browser: feature rows
    
    par Training
        Browser->>API: POST /forecasting/train (model_1)
        Browser->>API: POST /forecasting/train (model_2)
        Browser->>API: POST /forecasting/train (model_3)
    end
    API-->>Browser: train results
    
    Browser->>API: POST /backtesting/run (model_1)
    Browser->>API: POST /backtesting/run (model_2)
    Browser->>API: POST /backtesting/run (model_3)
    API-->>Browser: WAPE metrics, select winner
    
    Browser->>Registry: POST /registry/runs (artifact + metadata)
    Browser->>Registry: PATCH /registry/runs/{id} (pending→running→success)
    Browser->>Registry: POST /registry/aliases (demo-production)
    Registry-->>Browser: run_id, alias confirmed
    
    Browser->>Browser: Overall status: pass/fail, wall_clock

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related issues

w7-mgfcode/ForecastLabAI#128: Directly implements the end-to-end demo pipeline CLI driver and infrastructure described in the issue.
w7-mgfcode/ForecastLabAI#132: Directly implements the in-product demo showcase slice with routes and frontend UI described in the issue.

Possibly related PRs

w7-mgfcode/ForecastLabAI#97: Both PRs modify MarkdownGenerator in app/shared/seeder/generators/markdowns.py—PR#97 introduced the class with age_days as unimplemented; this PR completes the implementation via _emit_age_days() with inventory-based spike detection.
w7-mgfcode/ForecastLabAI#68: This PR's demo pipeline depends on the /seeder/*, /featuresets/compute, and other REST endpoints implemented in PR#68, establishing a critical integration dependency.
w7-mgfcode/ForecastLabAI#25: The demo driver invokes the /featuresets/compute API that was introduced in PR#25.

Suggested reviewers

w7-learn

🐰 A rabbit's ode to the demo pipeline:

Eleven steps through forecasts made,
From health checks to metrics displayed,
Seeds and features, train and test,
A showcase pipeline at its best! ✨

With WebSockets streaming live,
The dashboard comes alive and thrives,
One command to rule them all—
make demo answers the call! 🎯

sourcery-ai

Sorry @w7-mgfcode, your pull request is larger than the review limit of 150000 diff characters

socket-security · 2026-05-18T07:31:24Z

Review the following changes in direct dependencies.

Package	Supply Chain Security	Vulnerability	Quality
npm/jsdom@29.1.1
pypi/pandas@3.0.0 ⏵ 3.0.3	⁺¹
npm/vitest@4.1.6
npm/@testing-library/dom@10.4.1
npm/@testing-library/react@16.3.2
pypi/pydantic-ai@1.51.0 ⏵ 1.96.0		⁺¹⁶	^-10
pypi/anthropic@0.77.0 ⏵ 0.102.0	^-2
pypi/openai@2.16.0 ⏵ 2.36.0	⁺¹
pypi/alembic@1.18.1 ⏵ 1.18.4

…ts (#136) (#137) * chore(main): release 0.2.9 (#126) * feat: release v0.2.10 — demo showcase page + e2e pipeline (#134) * feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127) Resolves #94 via the heuristic path documented in the issue. No schema column, no Alembic migration, no FIFO cohort tracking — the trigger self-reads the existing per-(store, product) on_hand_qty series and fires when inventory has been "unrefreshed" past `cfg.age_days_threshold`. Decision rationale (schema column vs. heuristic): - The schema column path would add `oldest_unit_age_days` to `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO cohort tracking in `InventorySnapshotGenerator`. No downstream consumer reads this column today — adding it for one generator trigger violates the "don't design for hypothetical future requirements" rule in CLAUDE.md. - The heuristic path is self-contained in MarkdownGenerator, deterministic (preserves the zero-rng-draw regression invariant), and additive (no migration, no model change). 354 LOC net, all inside one slice. Heuristic spec: - A "refresh" is a day where `on_hand_qty` rose by >= `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day. - Age at day t = days since most recent refresh (or `dates[0]` if no refresh has been observed). - Firing requires age >= `age_days_threshold` AND on_hand >= `markdown_min_units_remaining` — never markdown an empty shelf. - After firing, refresh anchor resets to the day AFTER the markdown window ends, so back-to-back fires can't happen and the next age clock starts from a "clear shelf" baseline. Wiring: `MarkdownGenerator.generate()` gains an optional kwarg `inventory_records: list[dict[str, Any]] | None = None` which `core.py` passes through from `InventorySnapshotGenerator`. Disabled-path and non-age_days-path behavior is byte-identical (kwarg ignored). Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete `NotImplementedError` test. 
Coverage: no-records defensive, threshold not-met, threshold met, spike resets age, post-fire reset avoids back-to-back, low-inventory skip, unknown-product skip, rng non-consumption. Validation (local): - ruff check + format: clean - mypy --strict: 0 issues, 192 files - pyright --strict: 0 errors - pytest -m "not integration": 969 passed (+7 vs pre-PR) Closes #94. * feat(api,docs): e2e demo pipeline + showcase script (#128) (#129) * docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128) Adds the planning documents for the end-to-end demo pipeline work tracked in #128. Implementation commits follow on this branch. - INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics, open questions resolved in the PRP). - PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6 commits, additive only — no schema changes, no API edits). * feat(data): add demo_minimal scenario preset (#128) Tiny preset that powers the upcoming `make demo` target. Three stores × ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy an expanding backtest with n_splits=3, horizon=14, min_train_size=30 (needs >= 72 days, 92 leaves margin), small enough to keep the demo loop comfortable on a laptop. Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10, modest promotion + stockout probabilities) so backtest WAPE stays non-NaN across all three baseline models. - app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch - app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios - tests cover the new preset and the updated scenario count * feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128) Single-file async driver that walks the published HTTP surface (precheck -> reset? -> seed -> status -> features -> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the shape of scripts/seed_random.py and scripts/check_db.py. 
- HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s timeout (default 5 s is too short for /seeder/generate); surfaces RFC 7807 problem+json bodies as a typed StepError that echoes title / detail / request_id (never the raw body — secrets-safe). - DemoContext + StepOutcome dataclasses thread cross-step references. - Reporter renders the output-formatting.md glyphs (verbose by default, --quiet collapses to one line per step). - Per-step error handling converts httpx + StepError into fail outcomes; precheck failure exits 2, any other failure exits 1, green exits 0. - Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings() to honor the no-os.environ-in-feature-code rule. - Registry handshake uses the mandatory pending -> running -> success transition and the wire alias "model_config" (not "model_config_data"); artifact_hash is computed client-side via sha256 since we share the FS with the API on this single-host system. - Winner selection: lowest aggregated WAPE, skipping NaN folds. Also adds scripts/__init__.py so tests can `import scripts.run_demo` without invoking the file as a script. * feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128) Wraps scripts/run_demo.py so reviewers can run the full end-to-end demo with one command. Recipes mirror the three modes the script supports: full run, skip-seed iteration, destructive reset. Make targets: - demo — docker compose up -d + alembic + run_demo - demo-quick — run_demo --skip-seed (no compose/migration touch) - demo-clean — full reset (--reset) before seeding - help — default goal; lists targets + preconditions Tab-indented recipes and .PHONY declarations per make conventions. Preconditions (Postgres on :5433, uvicorn on :8123) documented in the help block; the script itself enforces them via the precheck step and exits 2 on failure. 

* test(api): unit + integration coverage for run_demo (#128)

Unit (`tests/test_run_demo_unit.py`, 32 cases):
- argparse defaults + all-flags variants
- DemoContext defaults (no leaking state across runs)
- _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None
- _model_config_payload: discriminated-union shape per baseline; rejects unsupported model_type (defends the "no lightgbm in PRP-15" boundary)
- Reporter: glyph mapping; verbose + quiet output; summary green / failure / over-budget soft-warn branches
- StepError formats RFC 7807 (title/detail/request_id) without leaking the raw response body
- HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError
- Step payload sanity: seed sends demo_minimal+correct dims+ISO dates; features sends cutoff_date as ISO; train fires three model_types in parallel; agent step skips with ⏭️ when no LLM key

Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration):
- Skips if Postgres on :5433 isn't reachable
- Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default)
- Runs scripts/run_demo.py --reset against it; asserts exit 0 + canonical "runs=3 winner=... alias=demo-production" summary
- Second case asserts a bogus URL exits 2 (no silent success)
- Cleans up uvicorn on teardown with terminate/kill fallback
- Resolves `uv` via shutil.which to keep ruff S607 happy and avoid PATH-dependent exec at test time

* ci(repo): nightly e2e demo workflow (#128)

Adds .github/workflows/e2e-nightly.yml, which runs scripts/run_demo.py against a fresh Postgres+pgvector service every night at 07:00 UTC (plus on-demand via workflow_dispatch). Catches regressions in the documented end-to-end pipeline before they bleed into the per-PR gate.

Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally NOT a required status check on dev or main. Flake-budget lives in the nightly slot, not in ci.yml.

- pgvector/pgvector:pg16 service container (same as ci.yml `test` job)
- uvicorn started in background; /health polled with a 30 s deadline
- run_demo.py called with --seed 42 (deterministic)
- LLM-key env vars intentionally absent; agent step auto-skips via ⏭️, keeping the workflow self-contained
- uvicorn logs uploaded as artifact on failure (7-day retention) so postmortems can read what the API was doing when the script broke
- astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md
- permissions: contents: read (least-privilege)

* docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128)

Discoverability layer for PRP-15.

- README.md: new 'Try it: end-to-end demo' step right after the curl /health verification; shows the canonical final-line summary so reviewers know what green looks like.
- docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section documenting all three Make targets.
- docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common Incidents entry with a 7-point diagnosis flow keyed to the script's step names + a postmortem-capture recipe.
- docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows added to the Document Index table.

Pure additive; no existing content removed or renamed.

* fix(data): update /seeder/scenarios route test for demo_minimal preset (#128)

Companion to feat(data): add demo_minimal scenario preset. The route-level assertion in TestListScenarios.test_returns_scenarios still expected 6 scenarios; bumping to 7 and adding the demo_minimal name membership check to match the service-layer + config-layer tests already updated in 005c189.

* fix(api): harden run_demo for integration test + real DB (#128)

Three real failures surfaced when first running the integration test against docker-compose Postgres + a freshly booted uvicorn; all three are now closed:

scripts/run_demo.py:
1. step_status: discover the real (store_id, product_id) from /dimensions/stores + /dimensions/products instead of hardcoding 1. Postgres auto-increment does NOT reset after delete, so the freshly seeded IDs are NOT 1 (they were ~150-260 on this branch after a few delete/seed cycles).
2. step_register: copy the trained-model artifact into the registry's own root (settings.registry_artifact_root) and record a registry-relative URI. The registry verify endpoint resolves artifact_uri against its own root, which is separate from where /forecasting/train writes (settings.forecast_model_artifacts_dir). Pre-fix, verify returned 404 even though the artifact existed on disk.
3. step_agent: skip with the soft-skip glyph on any LLM provider failure (invalid key, model unavailable, 5xx), and make _llm_key_present provider-aware so it matches the right env var to the configured agent_default_model. Pre-fix, an .env with anthropic/openai keys but a Gemini default model failed hard at chat-time.
4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate for demo_minimal can spend 60-90 s on slower laptops once you include inventory + prices + promotions inserts.
5. step_seed detail string: GenerateResult.records_created uses 'sales' (singular), not 'sales_daily'; cosmetic fix.

tests/test_e2e_demo.py:
- Redirect uvicorn stdout to a temp file rather than subprocess.PIPE. The seeder + structlog produce enough INFO log volume to fill a 64-KB pipe buffer; once full, uvicorn blocks on write and seeder requests hang for the full --timeout. Verified locally: integration suite now passes in ~6.5 s instead of timing out at 120 s.
- Cleanup leaves the log file on disk only when the test failed (postmortem-friendly).

tests/test_run_demo_unit.py:
- Bump test_defaults timeout expectation to match the new 120 s default.

End-to-end manual run on this machine: 11 steps, wall_clock=2 s, exit 0. Integration test: 2 passed in 6.48 s.
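
The pipe-buffer hang fixed in tests/test_e2e_demo.py is a general subprocess pitfall: a child writing to `subprocess.PIPE` blocks once the OS buffer fills if the parent never drains it. Below is a minimal standard-library sketch of the file-redirect approach, including the terminate/kill fallback mentioned for teardown. `run_with_log_file` and its parameters are hypothetical names for illustration, not the test's real helper.

```python
from __future__ import annotations

import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_log_file(cmd: list[str], timeout_s: float = 30.0) -> tuple[int, Path]:
    """Run cmd with stdout/stderr sent to a temp file; return (exit code, log path)."""
    # delete=False keeps the log on disk for inspection; the caller removes it.
    with tempfile.NamedTemporaryFile("wb", suffix=".log", delete=False) as log:
        # A file has no fixed-size buffer to fill, so the child never blocks on write.
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        try:
            returncode = proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            proc.terminate()  # polite stop first
            try:
                proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                proc.kill()  # hard stop if terminate was ignored
                proc.wait()
            raise
    return returncode, Path(log.name)


if __name__ == "__main__":
    # The child emits well over 64 KB, which would stall on an undrained pipe.
    noisy = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 200_000)"]
    code, log_path = run_with_log_file(noisy)
    print(code, log_path.stat().st_size)
    log_path.unlink()
```

Keeping the log on disk until the caller deletes it mirrors the postmortem-friendly cleanup described above, where the file is only removed when the test passed.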

* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131)

Headline:
- authlib 1.6.6 -> 1.7.2 (clears GHSA-wvwj-cvrp-7pv5, a JWS signature verification bypass; patched at >= 1.6.9)
- fastmcp 2.14.4 -> 3.2.4 (clears GHSA-vv7q-7jx5-f767, OpenAPI Provider SSRF + path traversal; patched at >= 3.2.0)

Both CVEs were flagged on PR #129 by Socket Security and are pre-existing on dev (not introduced by #128).

Wider scope (read before merging): `uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers a full re-resolve of the dependency graph. Because dev's uv.lock had drifted from pyproject.toml (the project's constraint envelope had loosened over time), this single command also brings the lockfile in sync with current pyproject.toml. Net diff: 243 insertions / 369 deletions on uv.lock; no other files touched.

Transitive cascades worth flagging:
- anthropic 0.77.0 -> 0.102.0 (pydantic-ai-slim extra)
- pydantic-graph 1.51.0 -> 1.96.0
- temporalio 1.20.0 -> 1.27.2
- alembic 1.18.1 -> 1.18.4
- aws-* and cohere transitives bumped along
- griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched)
- Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus exporter, pydocket, redis, rsa, sortedcontainers. These were transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in.

Verification on this host:
- uv sync --extra dev -> green
- ruff check . -> clean
- mypy --strict app/ -> 192 files clean
- pyright app/ -> 0 errors (50 warnings, pre-existing)
- pytest -m 'not integration' -> 969 passed

Known install quirk: griffelib 2.0.2 ships a top-level `griffe/` package whose RECORD files don't always materialize on first install when uv replaces an older `griffe` dist in the same sync. A clean venv install (which CI does via `uv sync --frozen`) is unaffected; local devs who upgrade in place may need a one-shot `uv pip install --force-reinstall griffelib` if `import griffe` fails.
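
To confirm that an environment actually carries the patched releases named above, the installed versions can be compared against the advisory minimums (authlib >= 1.6.9, fastmcp >= 3.2.0). The following is a rough sketch that assumes plain dotted numeric versions. `is_patched` and `check_installed` are illustrative helpers that are not part of this PR, and version strings with pre-release tags would need a real version parser instead.

```python
from __future__ import annotations

from importlib import metadata

# Minimum patched versions as stated in the commit message above.
PATCHED_MINIMUMS = {"authlib": "1.6.9", "fastmcp": "3.2.0"}


def _as_tuple(version: str) -> tuple[int, ...]:
    # Assumption: purely numeric dotted versions such as "1.7.2".
    return tuple(int(part) for part in version.split("."))


def is_patched(installed: str, minimum: str) -> bool:
    """True when installed >= minimum, comparing numeric components."""
    a, b = _as_tuple(installed), _as_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so "3.2" and "3.2.0" compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_installed() -> dict[str, bool | None]:
    """Map each package to True/False, or None when it is not installed."""
    report: dict[str, bool | None] = {}
    for name, minimum in PATCHED_MINIMUMS.items():
        try:
            report[name] = is_patched(metadata.version(name), minimum)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    print(check_installed())
```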

* feat(api,ui): in-product demo showcase page (#132) (#133)

* feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132)

New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It drives the published API surface in-process via httpx.ASGITransport (no cross-slice imports, satisfying the vertical-slice rule) and streams one StepEvent per pipeline step: precheck -> reset -> seed -> status -> features -> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup. A module-level asyncio.Lock enforces single-flight; concurrent runs get an RFC 7807 409. The orchestration is a faithful in-process port of scripts/run_demo.py (PR #129).

Implements PRP-17.

* test(api): cover the demo slice pipeline, routes, and e2e integration (#132)

Unit tests mock the in-process HTTP client to exercise step sequencing, winner selection, and fail-fast; route tests cover POST /demo/run (200 + 409) and the WS /demo/stream handler. The integration test seeds demo_minimal and asserts an end-to-end green run against real Postgres.

Implements PRP-17.

* feat(ui): add showcase page streaming the live demo pipeline (#132)

New /showcase route and nav entry. The page opens a one-shot WebSocket to /demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders the 11 pipeline steps as live status cards: glyph, detail, duration, the backtest per-model WAPE breakdown with the winner highlighted, and a pass/fail summary banner. Also block-scopes a pre-existing no-case-declarations lint error in chat.tsx so pnpm lint is green for this PR.

Implements PRP-17.

* test(ui): add vitest setup and use-demo-pipeline hook coverage (#132)

Adds the frontend test stack (vitest + jsdom + @testing-library/react), a test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the pure event reducer (idle -> running -> pass transitions, summary assembly, error phase) and a renderHook smoke test. The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented fix for pnpm 11's esbuild build-script gate.

Implements PRP-17.

* docs(docs): document the demo slice and showcase page (#132)

Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the /demo/run + /demo/stream rows and a WebSocket Events section in API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and REPO_MAP_INDEX rows for the demo slice and showcase page.

Implements PRP-17.

* chore(main): release 0.2.10 (#135)

#158)

* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)
* feat(api,docs): e2e demo pipeline + showcase script (#128) (#129)
* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131)
* feat(api,ui): in-product demo showcase page (#132) (#133)
* chore(repo): back-merge main into dev to absorb v0.2.10 release commits (#136) (#137)
* chore(main): release 0.2.9 (#126)
* feat: release v0.2.10 - demo showcase page + e2e pipeline (#134)
* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)

Resolves #94 via the heuristic path documented in the issue. No schema column, no Alembic migration, no FIFO cohort tracking: the trigger self-reads the existing per-(store, product) on_hand_qty series and fires when inventory has been "unrefreshed" past `cfg.age_days_threshold`.

Decision rationale (schema column vs. heuristic):
- The schema column path would add `oldest_unit_age_days` to `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO cohort tracking in `InventorySnapshotGenerator`. No downstream consumer reads this column today; adding it for one generator trigger violates the "don't design for hypothetical future requirements" rule in CLAUDE.md.
- The heuristic path is self-contained in MarkdownGenerator, deterministic (preserves the zero-rng-draw regression invariant), and additive (no migration, no model change). 354 LOC net, all inside one slice.

Heuristic spec:
- A "refresh" is a day where `on_hand_qty` rose by >= `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day.
- Age at day t = days since most recent refresh (or `dates[0]` if no refresh has been observed).
- Firing requires age >= `age_days_threshold` AND on_hand >= `markdown_min_units_remaining`: never markdown an empty shelf.
- After firing, refresh anchor resets to the day AFTER the markdown window ends, so back-to-back fires can't happen and the next age clock starts from a "clear shelf" baseline. Wiring: `MarkdownGenerator.generate()` gains an optional kwarg `inventory_records: list[dict[str, Any]] | None = None` which `core.py` passes through from `InventorySnapshotGenerator`. Disabled-path and non-age_days-path behavior is byte-identical (kwarg ignored). Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete `NotImplementedError` test. Coverage: no-records defensive, threshold not-met, threshold met, spike resets age, post-fire reset avoids back-to-back, low-inventory skip, unknown-product skip, rng non-consumption. Validation (local): - ruff check + format: clean - mypy --strict: 0 issues, 192 files - pyright --strict: 0 errors - pytest -m "not integration": 969 passed (+7 vs pre-PR) Closes #94. * feat(api,docs): e2e demo pipeline + showcase script (#128) (#129) * docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128) Adds the planning documents for the end-to-end demo pipeline work tracked in #128. Implementation commits follow on this branch. - INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics, open questions resolved in the PRP). - PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6 commits, additive only — no schema changes, no API edits). * feat(data): add demo_minimal scenario preset (#128) Tiny preset that powers the upcoming `make demo` target. Three stores × ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy an expanding backtest with n_splits=3, horizon=14, min_train_size=30 (needs >= 72 days, 92 leaves margin), small enough to keep the demo loop comfortable on a laptop. Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10, modest promotion + stockout probabilities) so backtest WAPE stays non-NaN across all three baseline models. 
- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch - app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios - tests cover the new preset and the updated scenario count * feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128) Single-file async driver that walks the published HTTP surface (precheck -> reset? -> seed -> status -> features -> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the shape of scripts/seed_random.py and scripts/check_db.py. - HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s timeout (default 5 s is too short for /seeder/generate); surfaces RFC 7807 problem+json bodies as a typed StepError that echoes title / detail / request_id (never the raw body — secrets-safe). - DemoContext + StepOutcome dataclasses thread cross-step references. - Reporter renders the output-formatting.md glyphs (verbose by default, --quiet collapses to one line per step). - Per-step error handling converts httpx + StepError into fail outcomes; precheck failure exits 2, any other failure exits 1, green exits 0. - Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings() to honor the no-os.environ-in-feature-code rule. - Registry handshake uses the mandatory pending -> running -> success transition and the wire alias "model_config" (not "model_config_data"); artifact_hash is computed client-side via sha256 since we share the FS with the API on this single-host system. - Winner selection: lowest aggregated WAPE, skipping NaN folds. Also adds scripts/__init__.py so tests can `import scripts.run_demo` without invoking the file as a script. * feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128) Wraps scripts/run_demo.py so reviewers can run the full end-to-end demo with one command. Recipes mirror the three modes the script supports: full run, skip-seed iteration, destructive reset. 
Make targets:
- demo: docker compose up -d + alembic + run_demo
- demo-quick: run_demo --skip-seed (no compose/migration touch)
- demo-clean: full reset (--reset) before seeding
- help: default goal; lists targets + preconditions

Tab-indented recipes and .PHONY declarations per make conventions. Preconditions (Postgres on :5433, uvicorn on :8123) documented in the help block; the script itself enforces them via the precheck step and exits 2 on failure.

* test(api): unit + integration coverage for run_demo (#128)

Unit (`tests/test_run_demo_unit.py`, 32 cases):
- argparse defaults + all-flags variants
- DemoContext defaults (no leaking state across runs)
- _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None
- _model_config_payload: discriminated-union shape per baseline; rejects unsupported model_type (defends the "no lightgbm in PRP-15" boundary)
- Reporter: glyph mapping; verbose + quiet output; summary green / failure / over-budget soft-warn branches
- StepError formats RFC 7807 (title/detail/request_id) without leaking the raw response body
- HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError
- Step payload sanity: seed sends demo_minimal+correct dims+ISO dates; features sends cutoff_date as ISO; train fires three model_types in parallel; agent step skips with ⏭️ when no LLM key

Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration):
- Skips if Postgres on :5433 isn't reachable
- Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default)
- Runs scripts/run_demo.py --reset against it; asserts exit 0 + canonical "runs=3 winner=... alias=demo-production" summary
- Second case asserts a bogus URL exits 2 (no silent success)
- Cleans up uvicorn on teardown with terminate/kill fallback
- Resolves `uv` via shutil.which to keep ruff S607 happy and avoid PATH-dependent exec at test time

* ci(repo): nightly e2e demo workflow (#128)

Adds .github/workflows/e2e-nightly.yml, which runs scripts/run_demo.py against a fresh Postgres+pgvector service every night at 07:00 UTC (plus on-demand via workflow_dispatch). Catches regressions in the documented end-to-end pipeline before they bleed into the per-PR gate.

Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally NOT a required status check on dev or main. Flake-budget lives in the nightly slot, not in ci.yml.

- pgvector/pgvector:pg16 service container (same as ci.yml `test` job)
- uvicorn started in background; /health polled with a 30 s deadline
- run_demo.py called with --seed 42 (deterministic)
- LLM-key env vars intentionally absent; the agent step auto-skips via ⏭️, keeping the workflow self-contained
- uvicorn logs uploaded as artifact on failure (7-day retention) so postmortems can read what the API was doing when the script broke
- astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md
- permissions: contents: read (least-privilege)

* docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128)

Discoverability layer for PRP-15.

- README.md: new 'Try it: end-to-end demo' step right after the curl /health verification; shows the canonical final-line summary so reviewers know what green looks like.
- docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section documenting all three Make targets.
- docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common Incidents entry with a 7-point diagnosis flow keyed to the script's step names + a postmortem-capture recipe.
- docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows added to the Document Index table.
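The winner-selection rule from the run_demo commits above (lowest aggregated WAPE, skipping NaN folds; all-NaN and empty inputs yield None) can be sketched roughly as follows. This is a minimal illustration, not the script's actual `_select_winner`: the function name, the input shape, the model names and the use of a plain mean as the aggregate are all assumptions.

```python
import math
from typing import Optional


def select_winner(wape_by_model: dict[str, list[float]]) -> Optional[str]:
    """Return the model with the lowest mean WAPE over its non-NaN folds.

    Assumption: "aggregated" means an unweighted mean over valid folds.
    Returns None when the input is empty or every fold of every model is NaN.
    """
    best_name: Optional[str] = None
    best_score = math.inf
    for name, fold_wapes in wape_by_model.items():
        valid = [w for w in fold_wapes if not math.isnan(w)]
        if not valid:
            continue  # a model with only NaN folds cannot win
        score = sum(valid) / len(valid)
        if score < best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical per-fold WAPE values for three illustrative baselines.
nan = float("nan")
results = {
    "naive": [0.30, 0.50],
    "seasonal_naive": [0.20, nan],
    "moving_average": [nan, nan],
}
winner = select_winner(results)
```

Skipping NaN folds per model (rather than discarding the whole model) keeps a single degenerate fold from disqualifying an otherwise strong candidate.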
Purely additive; no existing content removed or renamed.

* fix(data): update /seeder/scenarios route test for demo_minimal preset (#128)

Companion to feat(data): add demo_minimal scenario preset. The route-level assertion in TestListScenarios.test_returns_scenarios still expected 6 scenarios; this bumps it to 7 and adds the demo_minimal name membership check to match the service-layer + config-layer tests already updated in 005c189.

* fix(api): harden run_demo for integration test + real DB (#128)

Three real failures surfaced when first running the integration test against docker-compose Postgres + a freshly booted uvicorn; all three are now closed (items 1-3), along with two smaller follow-ups (items 4-5).

scripts/run_demo.py:
1. step_status: discover the real (store_id, product_id) from /dimensions/stores + /dimensions/products instead of hardcoding 1. Postgres auto-increment does NOT reset after delete, so the freshly seeded IDs are NOT 1 (they were ~150-260 on this branch after a few delete/seed cycles).
2. step_register: copy the trained-model artifact into the registry's own root (settings.registry_artifact_root) and record a registry-relative URI. The registry verify endpoint resolves artifact_uri against its own root, which is separate from where /forecasting/train writes (settings.forecast_model_artifacts_dir). Pre-fix, verify returned 404 even though the artifact existed on disk.
3. step_agent: skip with the soft-skip glyph on any LLM provider failure (invalid key, model unavailable, 5xx), and make _llm_key_present provider-aware so it matches the right env var to the configured agent_default_model. Pre-fix, an .env with anthropic/openai keys but a Gemini default model failed hard at chat-time.
4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate for demo_minimal can spend 60-90 s on slower laptops once you include inventory + prices + promotions inserts.
5. step_seed detail string: GenerateResult.records_created uses 'sales' (singular), not 'sales_daily'; cosmetic fix.
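Item 2 above (copy the artifact under the registry root, record a registry-relative URI, hash it client-side with sha256) might look roughly like this. It is a sketch under stated assumptions: `stage_artifact`, the `<run_id>/<filename>` layout and the argument names are hypothetical, not the script's real helper.

```python
import hashlib
import shutil
from pathlib import Path


def stage_artifact(src: Path, registry_root: Path, run_id: str) -> tuple[str, str]:
    """Copy `src` under `registry_root` and return (relative_uri, sha256_hex).

    Assumption: artifacts are laid out as <registry_root>/<run_id>/<filename>.
    """
    dest = registry_root / run_id / src.name
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)
    # Hash the copied file so the digest describes what the registry will serve.
    digest = hashlib.sha256(dest.read_bytes()).hexdigest()
    # A root-relative URI lets the verify side resolve it against its own root.
    return dest.relative_to(registry_root).as_posix(), digest
```

Recording the URI relative to the registry root, rather than an absolute path into the training output directory, is what lets a verifier that only knows its own root find the file.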
tests/test_e2e_demo.py:
- Redirect uvicorn stdout to a temp file rather than subprocess.PIPE. The seeder + structlog produce enough INFO log volume to fill a 64-KB pipe buffer; once full, uvicorn blocks on write and seeder requests hang for the full --timeout. Verified locally: integration suite now passes in ~6.5 s instead of timing out at 120 s.
- Cleanup leaves the log file on disk only when the test failed (postmortem-friendly).

tests/test_run_demo_unit.py:
- Bump test_defaults timeout expectation to match the new 120 s default.

End-to-end manual run on this machine: 11 steps, wall_clock=2 s, exit 0. Integration test: 2 passed in 6.48 s.

* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131)

Headline:
- authlib 1.6.6 -> 1.7.2 (clears GHSA-wvwj-cvrp-7pv5, a JWS signature verification bypass; patched at >= 1.6.9)
- fastmcp 2.14.4 -> 3.2.4 (clears GHSA-vv7q-7jx5-f767, OpenAPI Provider SSRF + path traversal; patched at >= 3.2.0)

Both CVEs were flagged on PR #129 by Socket Security and are pre-existing on dev (not introduced by #128).

Wider scope (read before merging): `uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers a full re-resolve of the dependency graph. Because dev's uv.lock had drifted from pyproject.toml (the project's constraint envelope had loosened over time), this single command also brings the lockfile in sync with current pyproject.toml. Net diff: 243 insertions / 369 deletions on uv.lock; no other files touched.

Transitive cascades worth flagging:
- anthropic 0.77.0 -> 0.102.0 (pydantic-ai-slim extra)
- pydantic-graph 1.51.0 -> 1.96.0
- temporalio 1.20.0 -> 1.27.2
- alembic 1.18.1 -> 1.18.4
- aws-* and cohere transitives bumped along
- griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched)
- Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus exporter, pydocket, redis, rsa, sortedcontainers. These were transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in.
Verification on this host:
- uv sync --extra dev -> green
- ruff check . -> clean
- mypy --strict app/ -> 192 files clean
- pyright app/ -> 0 errors (50 warnings, pre-existing)
- pytest -m 'not integration' -> 969 passed

Known install quirk: griffelib 2.0.2 ships a top-level `griffe/` package whose RECORD files don't always materialize on first install when uv replaces an older `griffe` dist in the same sync. A clean venv install (which CI does via `uv sync --frozen`) is unaffected; local devs who upgrade in place may need a one-shot `uv pip install --force-reinstall griffelib` if `import griffe` fails.

* feat(api,ui): in-product demo showcase page (#132) (#133)

* feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132)

New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It drives the published API surface in-process via httpx.ASGITransport (no cross-slice imports, satisfying the vertical-slice rule) and streams one StepEvent per pipeline step: precheck -> reset -> seed -> status -> features -> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup. A module-level asyncio.Lock enforces single-flight; concurrent runs get an RFC 7807 409. The orchestration is a faithful in-process port of scripts/run_demo.py (PR #129). Implements PRP-17.

* test(api): cover the demo slice pipeline, routes, and e2e integration (#132)

Unit tests mock the in-process HTTP client to exercise step sequencing, winner selection, and fail-fast; route tests cover POST /demo/run (200 + 409) and the WS /demo/stream handler. The integration test seeds demo_minimal and asserts an end-to-end green run against real Postgres. Implements PRP-17.

* feat(ui): add showcase page streaming the live demo pipeline (#132)

New /showcase route and nav entry.
The page opens a one-shot WebSocket to /demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders the 11 pipeline steps as live status cards: glyph, detail, duration, the backtest per-model WAPE breakdown with the winner highlighted, and a pass/fail summary banner. Also block-scopes a pre-existing no-case-declarations lint error in chat.tsx so pnpm lint is green for this PR. Implements PRP-17.

* test(ui): add vitest setup and use-demo-pipeline hook coverage (#132)

Adds the frontend test stack (vitest + jsdom + @testing-library/react), a test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the pure event reducer (idle -> running -> pass transitions, summary assembly, error phase) and a renderHook smoke test. The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented fix for pnpm 11's esbuild build-script gate. Implements PRP-17.

* docs(docs): document the demo slice and showcase page (#132)

Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the /demo/run + /demo/stream rows and a WebSocket Events section in API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and REPO_MAP_INDEX rows for the demo slice and showcase page. Implements PRP-17.

* chore(main): release 0.2.10 (#135)
* docs(docs): fix broken PRP-0/INITIAL-0 relative links in phase 0 doc (#138) (#139)
* docs(repo): fix readme dev-deps command and stale .github template placeholders (#140) (#141)
* docs(docs): fill DEV_GUIDE.md onboarding stub sections (#142) (#143)
* docs(repo): refresh stale CLAUDE.md note, document /demo API, align PR template (#144) (#145)
* fix(ui): chart series render black — drop hsl() wrapper on oklch chart vars (#149) (#153)

index.css defines --chart-N twice: legacy shadcn-v3 HSL triplets and Tailwind-4 / shadcn-v4 complete oklch() colours. The oklch definitions win the cascade, so at runtime --chart-N is a full colour.
The chart components still wrapped it as hsl(var(--chart-N)) → hsl(oklch(...)), which is invalid CSS, so recharts fell back to a black fill/stroke, which is invisible on the dark theme. Reference var(--chart-N) directly in backtest-folds-chart.tsx and time-series-chart.tsx. Verified in a browser: the backtest per-fold bars and the forecast line now render in colour.

* fix(jobs): backtest job result keeps fold metrics, stability and baselines (#148) (#152)

* fix(jobs): backtest job result keeps fold metrics, stability and baselines (#148)

_execute_backtest ran BacktestingService.run_backtest (which computes per-fold metrics, stability indices and a naive/seasonal baseline comparison) but stored only four aggregated values and discarded the rest. The dashboard (/visualize/backtest) reads aggregated_metrics.{*_mean, stability_index}, fold_metrics[] and baseline_comparison, so it showed "0 folds", all-zero metrics and an empty chart.

Add _shape_backtest_result(), which flattens a BacktestResponse into the contract the dashboard expects, and _finite(), which coerces NaN/inf to 0.0 so the result stays JSONB-safe (stability is NaN with fewer than two folds).

Add app/features/jobs/tests/test_service.py with unit coverage for the shaping logic: fold metrics, *_mean keys, stability, baseline comparison, the no-baselines path, and NaN coercion.

* refactor(jobs): centralize backtest metric keys and surface drift (#148)

Addresses review feedback on PR #152.

- Hoist the dashboard's metric set into _BACKTEST_METRICS and the headline stability metric into _STABILITY_METRIC, so the hardcoded keys live in one documented place instead of being repeated across the shaping logic.
- Log jobs.backtest_metrics_missing when an expected metric is absent from the backtest response, so a future rename in the backtesting service fails loud instead of silently emitting 0.0.
- Document the WAPE stability convention in the _shape_backtest_result docstring.
- Tests: assert backtest_id / model_type / duration_ms pass through unchanged, and add a regression test for the missing-metric default path.

* fix(ui): forecast page reads forecasts/forecast from predict job result (#147) (#151)

The /visualize/forecast page never rendered the chart for a valid completed predict job. It read job.result.predictions with field `predicted`, but POST /jobs (job_type="predict") returns job.result.forecasts with field `forecast`. forecastData was therefore always undefined and the page fell through to "No prediction data available in job result".

Read result.forecasts with field `forecast`, and pass predictedKey="forecast" to TimeSeriesChart (which already supports a configurable data key). Verified in a browser: entering a completed predict job ID now renders the 14-day forecast line chart with correct tooltip values.

* fix(registry): tolerate multiple matches in _find_duplicate (#146) (#150)

Under the default registry_duplicate_policy="detect", duplicate runs are created intentionally, so multiple non-archived model_run rows can share one config hash. _find_duplicate used scalar_one_or_none(), which raised MultipleResultsFound once two duplicates existed; POST /registry/runs then returned HTTP 500. This made the demo/Showcase register step fail deterministically on any DB with repeated runs.

Order the lookup by created_at desc, LIMIT 1, and use scalars().first() so it returns the most recent matching run instead of asserting a single match. Add an integration regression test that POSTs an identical run three times under the detect policy and asserts all three return 201.

* fix(ui): derive TimeSeriesChart line stroke from the config key (#156) (#157)

TimeSeriesChart builds chartConfig with dynamic keys ([actualKey] / [predictedKey]), so shadcn's ChartContainer injects --color-<key> CSS variables. The <Line> elements, however, hardcoded stroke="var(--color-actual)" and stroke="var(--color-predicted)".
The forecast page passes predictedKey="forecast", so the injected variable is --color-forecast; var(--color-predicted) was undefined, the stroke was invalid, and SVG fell back to its initial value `none`, so the forecast line was invisible.

Build the stroke from the key: stroke={`var(--color-${actualKey})`} / stroke={`var(--color-${predictedKey})`}. Verified in a browser: the forecast line now renders in colour.

* feat(ui): job picker dropdown on forecast and backtest pages (#154) (#155)

The visualization pages only accepted a job ID typed into a text box, so users had to already know the ID. Add a JobPicker component: a dropdown of completed jobs of the relevant type (predict / backtest), newest first, with each option labelled by short id, model and timestamp.

- New shared component src/components/common/job-picker.tsx, used by both forecast.tsx and backtest.tsx.
- The manual job-ID input stays alongside the dropdown for pasting an ID.
- The most recent completed job auto-loads on mount so a chart shows immediately without interaction.

No backend change: GET /jobs?job_type=&status=completed already exists. Verified in a browser on both pages.
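The NaN/inf coercion described for _finite() in the backtest job fix (#148) above can be illustrated with a short sketch. The helper below is an assumption about its shape, not the real implementation: the name `finite`, the handling of None and non-numeric input, and the sample metric keys are illustrative only.

```python
import json
import math
from typing import Any


def finite(value: Any, default: float = 0.0) -> float:
    """Coerce NaN, +/-inf, None or non-numeric input to `default`."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return default
    return number if math.isfinite(number) else default


# Hypothetical aggregated metrics; stability is NaN with fewer than two folds.
raw_metrics = {"wape_mean": 0.18, "stability_index": float("nan")}
safe_metrics = {key: finite(val) for key, val in raw_metrics.items()}

# allow_nan=False makes json.dumps raise on NaN/inf, so this succeeding
# shows the coerced dict is strict-JSON (and therefore JSONB) safe.
payload = json.dumps(safe_metrics, allow_nan=False)
```

Strict JSON has no NaN or Infinity literals, which is why a value like a NaN stability index has to be mapped to a sentinel before it is stored in a JSONB column.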

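The _find_duplicate fix (#146) above swaps "assert exactly one match" for "take the most recent match". The sketch below shows the same query shape using the standard-library sqlite3 module in place of the project's SQLAlchemy session, so it is a swapped-in technique for illustration; the table layout and column names other than created_at are assumptions.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE model_run ("
    " id INTEGER PRIMARY KEY, config_hash TEXT, created_at TEXT, archived INTEGER)"
)
conn.executemany(
    "INSERT INTO model_run (config_hash, created_at, archived) VALUES (?, ?, ?)",
    [
        ("abc", "2026-05-01T10:00:00", 0),
        ("abc", "2026-05-02T10:00:00", 0),  # newest non-archived duplicate
        ("abc", "2026-05-03T10:00:00", 1),  # archived, must be ignored
        ("zzz", "2026-05-04T10:00:00", 0),
    ],
)


def find_duplicate(config_hash: str) -> Optional[tuple]:
    """Return (id, created_at) of the newest non-archived match, or None."""
    return conn.execute(
        "SELECT id, created_at FROM model_run"
        " WHERE config_hash = ? AND archived = 0"
        " ORDER BY created_at DESC LIMIT 1",
        (config_hash,),
    ).fetchone()


latest = find_duplicate("abc")
missing = find_duplicate("does-not-exist")
```

Because the query is bounded to one row, any number of intentional duplicates under a detect-style policy yields a single deterministic answer instead of an error.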
* chore(main): release 0.2.9 (#126)
* feat: release v0.2.10 — demo showcase page + e2e pipeline (#134)
* chore(main): release 0.2.10 (#135)
* feat: release v0.2.11 — visualization fixes, job picker, demo showcase (#158)

* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)

Resolves #94 via the heuristic path documented in the issue. No schema column, no Alembic migration, no FIFO cohort tracking: the trigger self-reads the existing per-(store, product) on_hand_qty series and fires when inventory has been "unrefreshed" past `cfg.age_days_threshold`.

Decision rationale (schema column vs. heuristic):
- The schema column path would add `oldest_unit_age_days` to `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO cohort tracking in `InventorySnapshotGenerator`. No downstream consumer reads this column today; adding it for one generator trigger violates the "don't design for hypothetical future requirements" rule in CLAUDE.md.
- The heuristic path is self-contained in MarkdownGenerator, deterministic (preserves the zero-rng-draw regression invariant), and additive (no migration, no model change). 354 LOC net, all inside one slice.

Heuristic spec:
- A "refresh" is a day where `on_hand_qty` rose by >= `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day.
- Age at day t = days since most recent refresh (or since `dates[0]` if no refresh has been observed).
- Firing requires age >= `age_days_threshold` AND on_hand >= `markdown_min_units_remaining`, so an empty shelf is never marked down.
- After firing, refresh anchor resets to the day AFTER the markdown window ends, so back-to-back fires can't happen and the next age clock starts from a "clear shelf" baseline. Wiring: `MarkdownGenerator.generate()` gains an optional kwarg `inventory_records: list[dict[str, Any]] | None = None` which `core.py` passes through from `InventorySnapshotGenerator`. Disabled-path and non-age_days-path behavior is byte-identical (kwarg ignored). Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete `NotImplementedError` test. Coverage: no-records defensive, threshold not-met, threshold met, spike resets age, post-fire reset avoids back-to-back, low-inventory skip, unknown-product skip, rng non-consumption. Validation (local): - ruff check + format: clean - mypy --strict: 0 issues, 192 files - pyright --strict: 0 errors - pytest -m "not integration": 969 passed (+7 vs pre-PR) Closes #94. * feat(api,docs): e2e demo pipeline + showcase script (#128) (#129) * docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128) Adds the planning documents for the end-to-end demo pipeline work tracked in #128. Implementation commits follow on this branch. - INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics, open questions resolved in the PRP). - PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6 commits, additive only — no schema changes, no API edits). * feat(data): add demo_minimal scenario preset (#128) Tiny preset that powers the upcoming `make demo` target. Three stores × ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy an expanding backtest with n_splits=3, horizon=14, min_train_size=30 (needs >= 72 days, 92 leaves margin), small enough to keep the demo loop comfortable on a laptop. Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10, modest promotion + stockout probabilities) so backtest WAPE stays non-NaN across all three baseline models. 
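
The 72-day floor quoted above is consistent with an expanding-window backtest that needs `min_train_size` days for the first training window plus `n_splits` test windows of `horizon` days each. A minimal sketch of that arithmetic follows. The helper name `min_required_days` is hypothetical, and the assumption that folds advance by one full horizon with no gap is inferred from the quoted numbers, not taken from the backtesting service.

```python
from datetime import date


def min_required_days(n_splits: int, horizon: int, min_train_size: int) -> int:
    """Smallest series length that fits every fold (assumed: no gap, step == horizon)."""
    return min_train_size + n_splits * horizon


# Parameters quoted in the commit message for the demo backtest.
required = min_required_days(n_splits=3, horizon=14, min_train_size=30)

# demo_minimal covers 2024-10-01..2024-12-31 inclusive.
available = (date(2024, 12, 31) - date(2024, 10, 1)).days + 1

margin = available - required
print(f"required={required} available={available} margin={margin}")
```

Under these assumptions the preset leaves a 20-day margin over the minimum.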
- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch - app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios - tests cover the new preset and the updated scenario count * feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128) Single-file async driver that walks the published HTTP surface (precheck -> reset? -> seed -> status -> features -> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the shape of scripts/seed_random.py and scripts/check_db.py. - HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s timeout (default 5 s is too short for /seeder/generate); surfaces RFC 7807 problem+json bodies as a typed StepError that echoes title / detail / request_id (never the raw body — secrets-safe). - DemoContext + StepOutcome dataclasses thread cross-step references. - Reporter renders the output-formatting.md glyphs (verbose by default, --quiet collapses to one line per step). - Per-step error handling converts httpx + StepError into fail outcomes; precheck failure exits 2, any other failure exits 1, green exits 0. - Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings() to honor the no-os.environ-in-feature-code rule. - Registry handshake uses the mandatory pending -> running -> success transition and the wire alias "model_config" (not "model_config_data"); artifact_hash is computed client-side via sha256 since we share the FS with the API on this single-host system. - Winner selection: lowest aggregated WAPE, skipping NaN folds. Also adds scripts/__init__.py so tests can `import scripts.run_demo` without invoking the file as a script. * feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128) Wraps scripts/run_demo.py so reviewers can run the full end-to-end demo with one command. Recipes mirror the three modes the script supports: full run, skip-seed iteration, destructive reset. 
Make targets: - demo — docker compose up -d + alembic + run_demo - demo-quick — run_demo --skip-seed (no compose/migration touch) - demo-clean — full reset (--reset) before seeding - help — default goal; lists targets + preconditions Tab-indented recipes and .PHONY declarations per make conventions. Preconditions (Postgres on :5433, uvicorn on :8123) documented in the help block; the script itself enforces them via the precheck step and exits 2 on failure. * test(api): unit + integration coverage for run_demo (#128) Unit (`tests/test_run_demo_unit.py`, 32 cases): - argparse defaults + all-flags variants - DemoContext defaults (no leaking state across runs) - _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None - _model_config_payload: discriminated-union shape per baseline; rejects unsupported model_type (defends the "no lightgbm in PRP-15" boundary) - Reporter: glyph mapping; verbose + quiet output; summary green / failure / over-budget soft-warn branches - StepError formats RFC 7807 (title/detail/request_id) without leaking the raw response body - HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError - Step payload sanity: seed sends demo_minimal+correct dims+ISO dates; features sends cutoff_date as ISO; train fires three model_types in parallel; agent step skips with ⏭️ when no LLM key Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration): - Skips if Postgres on :5433 isn't reachable - Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default) - Runs scripts/run_demo.py --reset against it; asserts exit 0 + canonical "runs=3 winner=... 
alias=demo-production" summary - Second case asserts a bogus URL exits 2 (no silent success) - Cleans up uvicorn on teardown with terminate/kill fallback - Resolves `uv` via shutil.which to keep ruff S607 happy and avoid PATH-dependent exec at test time * ci(repo): nightly e2e demo workflow (#128) Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py against a fresh Postgres+pgvector service every night at 07:00 UTC (plus on-demand via workflow_dispatch). Catches regressions in the documented end-to-end pipeline before they bleed into the per-PR gate. Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally NOT a required status check on dev or main. Flake-budget lives in the nightly slot, not in ci.yml. - pgvector/pgvector:pg16 service container (same as ci.yml `test` job) - uvicorn started in background; /health polled with a 30 s deadline - run_demo.py called with --seed 42 (deterministic) - LLM-key env vars intentionally absent — agent step auto-skips via ⏭️, keeping the workflow self-contained - uvicorn logs uploaded as artifact on failure (7-day retention) so postmortems can read what the API was doing when the script broke - astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md - permissions: contents: read (least-privilege) * docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128) Discoverability layer for PRP-15. - README.md: new 'Try it: end-to-end demo' step right after the curl /health verification; shows the canonical final-line summary so reviewers know what green looks like. - docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section documenting all three Make targets. - docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common Incidents entry with a 7-point diagnosis flow keyed to the script's step names + a postmortem-capture recipe. - docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows added to the Document Index table. 
Pure additive; no existing content removed or renamed. * fix(data): update /seeder/scenarios route test for demo_minimal preset (#128) Companion to feat(data): add demo_minimal scenario preset — the route-level assertion in TestListScenarios.test_returns_scenarios still expected 6 scenarios; bumping to 7 and adding the demo_minimal name membership check to match the service-layer + config-layer tests already updated in 005c189. * fix(api): harden run_demo for integration test + real DB (#128) Three real failures surfaced when first running the integration test against docker-compose Postgres + a freshly booted uvicorn; all three are now closed: scripts/run_demo.py: 1. step_status: discover the real (store_id, product_id) from /dimensions/stores + /dimensions/products instead of hardcoding 1. Postgres auto-increment does NOT reset after delete, so the freshly seeded IDs are NOT 1 (they were ~150-260 on this branch after a few delete/seed cycles). 2. step_register: copy the trained-model artifact into the registry's own root (settings.registry_artifact_root) and record a registry- relative URI. The registry verify endpoint resolves artifact_uri against its own root, which is separate from where /forecasting/train writes (settings.forecast_model_artifacts_dir). Pre-fix, verify returned 404 even though the artifact existed on disk. 3. step_agent: skip with the soft-skip glyph on any LLM provider failure (invalid key, model unavailable, 5xx), and make _llm_key_present provider-aware so it matches the right env var to the configured agent_default_model. Pre-fix, an .env with anthropic/openai keys but a Gemini default model failed hard at chat-time. 4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate for demo_minimal can spend 60-90 s on slower laptops once you include inventory + prices + promotions inserts. 5. step_seed detail string: GenerateResult.records_created uses 'sales' (singular), not 'sales_daily'; cosmetic fix. 
tests/test_e2e_demo.py: - Redirect uvicorn stdout to a temp file rather than subprocess.PIPE. The seeder + structlog produce enough INFO log volume to fill a 64-KB pipe buffer; once full, uvicorn blocks on write and seeder requests hang for the full --timeout. Verified locally: integration suite now passes in ~6.5 s instead of timing out at 120 s. - Cleanup leaves the log file on disk only when the test failed (postmortem-friendly). tests/test_run_demo_unit.py: - Bump test_defaults timeout expectation to match the new 120 s default. End-to-end manual run on this machine: 11 steps, wall_clock=2 s, exit 0. Integration test: 2 passed in 6.48 s. * chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131) Headline: - authlib 1.6.6 -> 1.7.2 (clears GHSA-wvwj-cvrp-7pv5 — JWS signature verification bypass; patched at >= 1.6.9) - fastmcp 2.14.4 -> 3.2.4 (clears GHSA-vv7q-7jx5-f767 — OpenAPI Provider SSRF + path traversal; patched at >= 3.2.0) Both CVEs were flagged on PR #129 by Socket Security and are pre-existing on dev (not introduced by #128). Wider scope — read before merging: `uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers a full re-resolve of the dependency graph. Because dev's uv.lock had drifted from pyproject.toml (the project's constraint envelope had loosened over time), this single command also brings the lockfile in sync with current pyproject.toml. Net diff: 243 insertions / 369 deletions on uv.lock; no other files touched. Transitive cascades worth flagging: - anthropic 0.77.0 -> 0.102.0 (pydantic-ai-slim extra) - pydantic-graph 1.51.0 -> 1.96.0 - temporalio 1.20.0 -> 1.27.2 - alembic 1.18.1 -> 1.18.4 - aws-* and cohere transitives bumped along - griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched) - Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus exporter, pydocket, redis, rsa, sortedcontainers — these were transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in. 
Verification on this host: - uv sync --extra dev -> green - ruff check . -> clean - mypy --strict app/ -> 192 files clean - pyright app/ -> 0 errors (50 warnings, pre-existing) - pytest -m 'not integration' -> 969 passed Known install quirk: griffelib 2.0.2 ships a top-level `griffe/` package whose RECORD files don't always materialize on first install when uv replaces an older `griffe` dist in the same sync. A clean venv install (which CI does via `uv sync --frozen`) is unaffected; local devs who upgrade in place may need a one-shot `uv pip install --force-reinstall griffelib` if `import griffe` fails. * feat(api,ui): in-product demo showcase page (#132) (#133) * feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132) New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It drives the published API surface in-process via httpx.ASGITransport (no cross-slice imports, satisfying the vertical-slice rule) and streams one StepEvent per pipeline step: precheck -> reset -> seed -> status -> features -> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup. A module-level asyncio.Lock enforces single-flight; concurrent runs get an RFC 7807 409. The orchestration is a faithful in-process port of scripts/run_demo.py (PR #129). Implements PRP-17. * test(api): cover the demo slice pipeline, routes, and e2e integration (#132) Unit tests mock the in-process HTTP client to exercise step sequencing, winner selection, and fail-fast; route tests cover POST /demo/run (200 + 409) and the WS /demo/stream handler. The integration test seeds demo_minimal and asserts an end-to-end green run against real Postgres. Implements PRP-17. * feat(ui): add showcase page streaming the live demo pipeline (#132) New /showcase route and nav entry. 
The page opens a one-shot WebSocket to /demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders the 11 pipeline steps as live status cards: glyph, detail, duration, the backtest per-model WAPE breakdown with the winner highlighted, and a pass/fail summary banner. Also block-scopes a pre-existing no-case-declarations lint error in chat.tsx so pnpm lint is green for this PR. Implements PRP-17. * test(ui): add vitest setup and use-demo-pipeline hook coverage (#132) Adds the frontend test stack (vitest + jsdom + @testing-library/react), a test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the pure event reducer (idle -> running -> pass transitions, summary assembly, error phase) and a renderHook smoke test. The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented fix for pnpm 11's esbuild build-script gate. Implements PRP-17. * docs(docs): document the demo slice and showcase page (#132) Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the /demo/run + /demo/stream rows and a WebSocket Events section in API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and REPO_MAP_INDEX rows for the demo slice and showcase page. Implements PRP-17. * chore(repo): back-merge main into dev to absorb v0.2.10 release commits (#136) (#137) * chore(main): release 0.2.9 (#126) * feat: release v0.2.10 — demo showcase page + e2e pipeline (#134) * feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127) Resolves #94 via the heuristic path documented in the issue. No schema column, no Alembic migration, no FIFO cohort tracking — the trigger self-reads the existing per-(store, product) on_hand_qty series and fires when inventory has been "unrefreshed" past `cfg.age_days_threshold`. Decision rationale (schema column vs. 
heuristic): - The schema column path would add `oldest_unit_age_days` to `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO cohort tracking in `InventorySnapshotGenerator`. No downstream consumer reads this column today — adding it for one generator trigger violates the "don't design for hypothetical future requirements" rule in CLAUDE.md. - The heuristic path is self-contained in MarkdownGenerator, deterministic (preserves the zero-rng-draw regression invariant), and additive (no migration, no model change). 354 LOC net, all inside one slice. Heuristic spec: - A "refresh" is a day where `on_hand_qty` rose by >= `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day. - Age at day t = days since most recent refresh (or `dates[0]` if no refresh has been observed). - Firing requires age >= `age_days_threshold` AND on_hand >= `markdown_min_units_remaining` — never markdown an empty shelf. - After firing, refresh anchor resets to the day AFTER the markdown window ends, so back-to-back fires can't happen and the next age clock starts from a "clear shelf" baseline. Wiring: `MarkdownGenerator.generate()` gains an optional kwarg `inventory_records: list[dict[str, Any]] | None = None` which `core.py` passes through from `InventorySnapshotGenerator`. Disabled-path and non-age_days-path behavior is byte-identical (kwarg ignored). Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete `NotImplementedError` test. Coverage: no-records defensive, threshold not-met, threshold met, spike resets age, post-fire reset avoids back-to-back, low-inventory skip, unknown-product skip, rng non-consumption. Validation (local): - ruff check + format: clean - mypy --strict: 0 issues, 192 files - pyright --strict: 0 errors - pytest -m "not integration": 969 passed (+7 vs pre-PR) Closes #94. 
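
The heuristic spec above can be sketched as a small pure function over one (store, product) `on_hand_qty` series. This is an illustration under stated assumptions, not the `MarkdownGenerator` code: the function name, the `markdown_duration_days` parameter, the treatment of a restock from zero as a refresh, and the choice to ignore refreshes during the markdown window are all hypothetical.

```python
def age_days_fire_days(
    on_hand: list[int],
    age_days_threshold: int,
    min_units_remaining: int,
    markdown_duration_days: int,
    spike_threshold: float = 0.3,
) -> list[int]:
    """Return the day indices on which the age_days trigger would fire."""
    if markdown_duration_days < 1:
        raise ValueError("markdown_duration_days must be >= 1")

    fires: list[int] = []
    anchor = 0  # index of the most recent refresh; day 0 if none seen yet
    t = 1
    while t < len(on_hand):
        prev, cur = on_hand[t - 1], on_hand[t]
        rise = cur - prev
        # A refresh is a day-over-day jump of at least spike_threshold.
        # Assumption: restocking an empty shelf (prev == 0) also counts.
        if rise > 0 and (prev == 0 or rise / prev >= spike_threshold):
            anchor = t
        age = t - anchor
        # Fire only on aged inventory that still has units on the shelf.
        if age >= age_days_threshold and cur >= min_units_remaining:
            fires.append(t)
            # The markdown runs on days t .. t + duration - 1; restart the
            # age clock on the day after the window and resume from there.
            anchor = t + markdown_duration_days
            t = anchor
            continue
        t += 1
    return fires


flat = [100] * 20
print(age_days_fire_days(flat, age_days_threshold=5,
                         min_units_remaining=10, markdown_duration_days=2))
```

The function draws no random numbers, which mirrors the zero-rng-draw invariant the commit message calls out.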
* feat(api,docs): e2e demo pipeline + showcase script (#128) (#129) * docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128) Adds the planning documents for the end-to-end demo pipeline work tracked in #128. Implementation commits follow on this branch. - INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics, open questions resolved in the PRP). - PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6 commits, additive only — no schema changes, no API edits). * feat(data): add demo_minimal scenario preset (#128) Tiny preset that powers the upcoming `make demo` target. Three stores × ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy an expanding backtest with n_splits=3, horizon=14, min_train_size=30 (needs >= 72 days, 92 leaves margin), small enough to keep the demo loop comfortable on a laptop. Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10, modest promotion + stockout probabilities) so backtest WAPE stays non-NaN across all three baseline models. - app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch - app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios - tests cover the new preset and the updated scenario count * feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128) Single-file async driver that walks the published HTTP surface (precheck -> reset? -> seed -> status -> features -> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the shape of scripts/seed_random.py and scripts/check_db.py. - HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s timeout (default 5 s is too short for /seeder/generate); surfaces RFC 7807 problem+json bodies as a typed StepError that echoes title / detail / request_id (never the raw body — secrets-safe). - DemoContext + StepOutcome dataclasses thread cross-step references. 
- Reporter renders the output-formatting.md glyphs (verbose by default, --quiet collapses to one line per step). - Per-step error handling converts httpx + StepError into fail outcomes; precheck failure exits 2, any other failure exits 1, green exits 0. - Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings() to honor the no-os.environ-in-feature-code rule. - Registry handshake uses the mandatory pending -> running -> success transition and the wire alias "model_config" (not "model_config_data"); artifact_hash is computed client-side via sha256 since we share the FS with the API on this single-host system. - Winner selection: lowest aggregated WAPE, skipping NaN folds. Also adds scripts/__init__.py so tests can `import scripts.run_demo` without invoking the file as a script. * feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128) Wraps scripts/run_demo.py so reviewers can run the full end-to-end demo with one command. Recipes mirror the three modes the script supports: full run, skip-seed iteration, destructive reset. Make targets: - demo — docker compose up -d + alembic + run_demo - demo-quick — run_demo --skip-seed (no compose/migration touch) - demo-clean — full reset (--reset) before seeding - help — default goal; lists targets + preconditions Tab-indented recipes and .PHONY declarations per make conventions. Preconditions (Postgres on :5433, uvicorn on :8123) documented in the help block; the script itself enforces them via the precheck step and exits 2 on failure. 
* test(api): unit + integration coverage for run_demo (#128) Unit (`tests/test_run_demo_unit.py`, 32 cases): - argparse defaults + all-flags variants - DemoContext defaults (no leaking state across runs) - _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None - _model_config_payload: discriminated-union shape per baseline; rejects unsupported model_type (defends the "no lightgbm in PRP-15" boundary) - Reporter: glyph mapping; verbose + quiet output; summary green / failure / over-budget soft-warn branches - StepError formats RFC 7807 (title/detail/request_id) without leaking the raw response body - HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError - Step payload sanity: seed sends demo_minimal+correct dims+ISO dates; features sends cutoff_date as ISO; train fires three model_types in parallel; agent step skips with ⏭️ when no LLM key Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration): - Skips if Postgres on :5433 isn't reachable - Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default) - Runs scripts/run_demo.py --reset against it; asserts exit 0 + canonical "runs=3 winner=... alias=demo-production" summary - Second case asserts a bogus URL exits 2 (no silent success) - Cleans up uvicorn on teardown with terminate/kill fallback - Resolves `uv` via shutil.which to keep ruff S607 happy and avoid PATH-dependent exec at test time * ci(repo): nightly e2e demo workflow (#128) Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py against a fresh Postgres+pgvector service every night at 07:00 UTC (plus on-demand via workflow_dispatch). Catches regressions in the documented end-to-end pipeline before they bleed into the per-PR gate. Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally NOT a required status check on dev or main. Flake-budget lives in the nightly slot, not in ci.yml. 
- pgvector/pgvector:pg16 service container (same as ci.yml `test` job) - uvicorn started in background; /health polled with a 30 s deadline - run_demo.py called with --seed 42 (deterministic) - LLM-key env vars intentionally absent — agent step auto-skips via ⏭️, keeping the workflow self-contained - uvicorn logs uploaded as artifact on failure (7-day retention) so postmortems can read what the API was doing when the script broke - astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md - permissions: contents: read (least-privilege) * docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128) Discoverability layer for PRP-15. - README.md: new 'Try it: end-to-end demo' step right after the curl /health verification; shows the canonical final-line summary so reviewers know what green looks like. - docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section documenting all three Make targets. - docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common Incidents entry with a 7-point diagnosis flow keyed to the script's step names + a postmortem-capture recipe. - docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows added to the Document Index table. Pure additive; no existing content removed or renamed. * fix(data): update /seeder/scenarios route test for demo_minimal preset (#128) Companion to feat(data): add demo_minimal scenario preset — the route-level assertion in TestListScenarios.test_returns_scenarios still expected 6 scenarios; bumping to 7 and adding the demo_minimal name membership check to match the service-layer + config-layer tests already updated in 005c189. * fix(api): harden run_demo for integration test + real DB (#128) Three real failures surfaced when first running the integration test against docker-compose Postgres + a freshly booted uvicorn; all three are now closed: scripts/run_demo.py: 1. 
step_status: discover the real (store_id, product_id) from /dimensions/stores + /dimensions/products instead of hardcoding 1. Postgres auto-increment does NOT reset after delete, so the freshly seeded IDs are NOT 1 (they were ~150-260 on this branch after a few delete/seed cycles). 2. step_register: copy the trained-model artifact into the registry's own root (settings.registry_artifact_root) and record a registry- relative URI. The registry verify endpoint resolves artifact_uri against its own root, which is separate from where /forecasting/train writes (settings.forecast_model_artifacts_dir). Pre-fix, verify returned 404 even though the artifact existed on disk. 3. step_agent: skip with the soft-skip glyph on any LLM provider failure (invalid key, model unavailable, 5xx), and make _llm_key_present provider-aware so it matches the right env var to the configured agent_default_model. Pre-fix, an .env with anthropic/openai keys but a Gemini default model failed hard at chat-time. 4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate for demo_minimal can spend 60-90 s on slower laptops once you include inventory + prices + promotions inserts. 5. step_seed detail string: GenerateResult.records_created uses 'sales' (singular), not 'sales_daily'; cosmetic fix. tests/test_e2e_demo.py: - Redirect uvicorn stdout to a temp file rather than subprocess.PIPE. The seeder + structlog produce enough INFO log volume to fill a 64-KB pipe buffer; once full, uvicorn blocks on write and seeder requests hang for the full --timeout. Verified locally: integration suite now passes in ~6.5 s instead of timing out at 120 s. - Cleanup leaves the log file on disk only when the test failed (postmortem-friendly). tests/test_run_demo_unit.py: - Bump test_defaults timeout expectation to match the new 120 s default. End-to-end manual run on this machine: 11 steps, wall_clock=2 s, exit 0. Integration test: 2 passed in 6.48 s. 
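
The pipe-buffer hang described above is easy to reproduce in general: a child process whose stdout is `subprocess.PIPE` blocks once the OS pipe buffer fills if nobody reads it, whereas a regular file accepts writes without a reader. The sketch below shows the file-based pattern with a stand-in child process instead of uvicorn; the helper name `run_with_log_file` is hypothetical.

```python
import os
import subprocess
import sys
import tempfile


def run_with_log_file(cmd: list[str], timeout_s: float = 60.0) -> tuple[int, int]:
    """Run cmd with stdout/stderr sent to a temp file; return (exit code, bytes logged)."""
    with tempfile.TemporaryFile() as log:
        # A file never back-pressures the child, unlike an unread pipe.
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        returncode = proc.wait(timeout=timeout_s)
        size = log.seek(0, os.SEEK_END)
        return returncode, size


# Stand-in for a chatty server: writes well past a 64 KB pipe buffer.
chatty = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 200_000)"]
returncode, size = run_with_log_file(chatty)
print(returncode, size)
```

With `stdout=subprocess.PIPE` and a bare `proc.wait()`, the same child could block on write, which is the failure mode the integration test hit.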
* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131) Headline: - authlib 1.6.6 -> 1.7.2 (clears GHSA-wvwj-cvrp-7pv5 — JWS signature verification bypass; patched at >= 1.6.9) - fastmcp 2.14.4 -> 3.2.4 (clears GHSA-vv7q-7jx5-f767 — OpenAPI Provider SSRF + path traversal; patched at >= 3.2.0) Both CVEs were flagged on PR #129 by Socket Security and are pre-existing on dev (not introduced by #128). Wider scope — read before merging: `uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers a full re-resolve of the dependency graph. Because dev's uv.lock had drifted from pyproject.toml (the project's constraint envelope had loosened over time), this single command also brings the lockfile in sync with current pyproject.toml. Net diff: 243 insertions / 369 deletions on uv.lock; no other files touched. Transitive cascades worth flagging: - anthropic 0.77.0 -> 0.102.0 (pydantic-ai-slim extra) - pydantic-graph 1.51.0 -> 1.96.0 - temporalio 1.20.0 -> 1.27.2 - alembic 1.18.1 -> 1.18.4 - aws-* and cohere transitives bumped along - griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched) - Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus exporter, pydocket, redis, rsa, sortedcontainers — these were transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in. Verification on this host: - uv sync --extra dev -> green - ruff check . -> clean - mypy --strict app/ -> 192 files clean - pyright app/ -> 0 errors (50 warnings, pre-existing) - pytest -m 'not integration' -> 969 passed Known install quirk: griffelib 2.0.2 ships a top-level `griffe/` package whose RECORD files don't always materialize on first install when uv replaces an older `griffe` dist in the same sync. A clean venv install (which CI does via `uv sync --frozen`) is unaffected; local devs who upgrade in place may need a one-shot `uv pip install --force-reinstall griffelib` if `import griffe` fails. 
* feat(api,ui): in-product demo showcase page (#132) (#133) * feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132) New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It drives the published API surface in-process via httpx.ASGITransport (no cross-slice imports, satisfying the vertical-slice rule) and streams one StepEvent per pipeline step: precheck -> reset -> seed -> status -> features -> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup. A module-level asyncio.Lock enforces single-flight; concurrent runs get an RFC 7807 409. The orchestration is a faithful in-process port of scripts/run_demo.py (PR #129). Implements PRP-17. * test(api): cover the demo slice pipeline, routes, and e2e integration (#132) Unit tests mock the in-process HTTP client to exercise step sequencing, winner selection, and fail-fast; route tests cover POST /demo/run (200 + 409) and the WS /demo/stream handler. The integration test seeds demo_minimal and asserts an end-to-end green run against real Postgres. Implements PRP-17. * feat(ui): add showcase page streaming the live demo pipeline (#132) New /showcase route and nav entry. The page opens a one-shot WebSocket to /demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders the 11 pipeline steps as live status cards: glyph, detail, duration, the backtest per-model WAPE breakdown with the winner highlighted, and a pass/fail summary banner. Also block-scopes a pre-existing no-case-declarations lint error in chat.tsx so pnpm lint is green for this PR. Implements PRP-17. * test(ui): add vitest setup and use-demo-pipeline hook coverage (#132) Adds the frontend test stack (vitest + jsdom + @testing-library/react), a test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the pure event reducer (idle -> running -> pass transitions, summary assembly, error phase) and a renderHook smoke test. 
The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented fix for pnpm 11's esbuild build-script gate. Implements PRP-17. * docs(docs): document the demo slice and showcase page (#132) Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the /demo/run + /demo/stream rows and a WebSocket Events section in API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and REPO_MAP_INDEX rows for the demo slice and showcase page. Implements PRP-17. * chore(main): release 0.2.10 (#135) * docs(docs): fix broken PRP-0/INITIAL-0 relative links in phase 0 doc (#138) (#139) * docs(repo): fix readme dev-deps command and stale .github template placeholders (#140) (#141) * docs(docs): fill DEV_GUIDE.md onboarding stub sections (#142) (#143) * docs(repo): refresh stale CLAUDE.md note, document /demo API, align PR template (#144) (#145) * fix(ui): chart series render black — drop hsl() wrapper on oklch chart vars (#149) (#153) index.css defines --chart-N twice: legacy shadcn-v3 HSL triplets and Tailwind-4 / shadcn-v4 complete oklch() colours. The oklch definitions win the cascade, so at runtime --chart-N is a full colour. The chart components still wrapped it as hsl(var(--chart-N)) → hsl(oklch(...)), which is invalid CSS, so recharts fell back to a black fill/stroke — invisible on the dark theme. Reference var(--chart-N) directly in backtest-folds-chart.tsx and time-series-chart.tsx. Verified in a browser: the backtest per-fold bars and the forecast line now render in colour. * fix(jobs): backtest job result keeps fold metrics, stability and baselines (#148) (#152) * fix(jobs): backtest job result keeps fold metrics, stability and baselines (#148) _execute_backtest ran BacktestingService.run_backtest — which computes per-fold metrics, stability indices and a naive/seasonal baseline comparison — but stored only four aggregated values and discarded the rest. 
The dashboard (/visualize/backtest) reads aggregated_metrics.{*_mean, stability_index}, fold_metrics[] and baseline_comparison, so it showed "0 folds", all-zero metrics and an empty chart.

Add _shape_backtest_result(), which flattens a BacktestResponse into the contract the dashboard expects, and _finite(), which coerces NaN/inf to 0.0 so the result stays JSONB-safe (stability is NaN with fewer than two folds).

Add app/features/jobs/tests/test_service.py with unit coverage for the shaping logic: fold metrics, *_mean keys, stability, baseline comparison, the no-baselines path, and NaN coercion.

* refactor(jobs): centralize backtest metric keys and surface drift (#148)

Addresses review feedback on PR #152.

- Hoist the dashboard's metric set into _BACKTEST_METRICS and the headline stability metric into _STABILITY_METRIC, so the hardcoded keys live in one documented place instead of being repeated across the shaping logic.
- Log jobs.backtest_metrics_missing when an expected metric is absent from the backtest response, so a future rename in the backtesting service fails loud instead of silently emitting 0.0.
- Document the WAPE stability convention in the _shape_backtest_result docstring.
- Tests: assert backtest_id / model_type / duration_ms pass through unchanged, and add a regression test for the missing-metric default path.

* fix(ui): forecast page reads forecasts/forecast from predict job result (#147) (#151)

The /visualize/forecast page never rendered the chart for a valid completed predict job. It read job.result.predictions with field `predicted`, but POST /jobs (job_type="predict") returns job.result.forecasts with field `forecast`. forecastData was therefore always undefined and the page fell through to "No prediction data available in job result".

Read result.forecasts with field `forecast`, and pass predictedKey="forecast" to TimeSeriesChart (which already supports a configurable data key).
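
As an aside on the backtest job result fix (#148) above: NaN and infinity are not valid JSON values, so they have to be coerced before the result is stored as JSONB. A minimal sketch of that coercion follows. The helper name `finite` and the sample metric keys are illustrative, not the project's `_finite()` or its actual result shape.

```python
import json
import math


def finite(value: float) -> float:
    """Coerce NaN and +/-inf to 0.0 so the value is representable in strict JSON."""
    return float(value) if math.isfinite(value) else 0.0


# Hypothetical aggregated metrics; stability is NaN with fewer than two folds.
raw = {"wape_mean": 0.21, "stability_index": float("nan")}
safe = {key: finite(value) for key, value in raw.items()}

# allow_nan=False makes json.dumps raise ValueError on NaN/inf, so a
# successful dump confirms the payload is strict JSON.
payload = json.dumps(safe, allow_nan=False)
print(payload)
```

The trade-off noted in the refactor commit still applies: a silent 0.0 can hide a missing metric, which is why that commit also logs when an expected key is absent.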
Verified in a browser: entering a completed predict job ID now renders the 14-day forecast line chart with correct tooltip values. * fix(registry): tolerate multiple matches in _find_duplicate (#146) (#150) Under the default registry_duplicate_policy="detect", duplicate runs are created intentionally, so multiple non-archived model_run rows can share one config hash. _find_duplicate used scalar_one_or_none(), which raised MultipleResultsFound once two duplicates existed — POST /registry/runs then returned HTTP 500. This made the demo/Showcase register step fail deterministically on any DB with repeated runs. Order the lookup by created_at desc, LIMIT 1, and use scalars().first() so it returns the most recent matching run instead of asserting a single match. Add an integration regression test that POSTs an identical run three times under the detect policy and asserts all three return 201. * fix(ui): derive TimeSeriesChart line stroke from the config key (#156) (#157) TimeSeriesChart builds chartConfig with dynamic keys ([actualKey] / [predictedKey]), so shadcn's ChartContainer injects --color-<key> CSS variables. The <Line> elements, however, hardcoded stroke="var(--color-actual)" and stroke="var(--color-predicted)". The forecast page passes predictedKey="forecast", so the injected variable is --color-forecast; var(--color-predicted) was undefined, the stroke was invalid, and SVG fell back to its initial value `none` — the forecast line was invisible. Build the stroke from the key: stroke={`var(--color-${actualKey})`} / stroke={`var(--color-${predictedKey})`}. Verified in a browser: the forecast line now renders in colour. * feat(ui): job picker dropdown on forecast and backtest pages (#154) (#155) The visualization pages only accepted a job ID typed into a text box, so users had to already know the ID. 
Add a JobPicker component: a dropdown of completed jobs of the relevant type (predict / backtest), newest first, with each option labelled by short id, model and timestamp.
- New shared component src/components/common/job-picker.tsx, used by both forecast.tsx and backtest.tsx.
- The manual job-ID input stays alongside the dropdown for pasting an ID.
- The most recent completed job auto-loads on mount so a chart shows immediately without interaction.

No backend change: GET /jobs?job_type=&status=completed already exists. Verified in a browser on both pages.

* chore(main): release 0.2.11 (#159)
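The NaN/inf coercion described in the backtest job fix (#148) above can be sketched as follows. This is a minimal illustration, not the repository's `_finite()`; the names `finite_or_zero` and `sanitize_metrics` and the sample metric keys are assumptions made for the example. It shows why the coercion matters: strict JSON has no NaN or Infinity literal, so a result containing them cannot be serialized in strict mode.

```python
import json
import math


def finite_or_zero(value: float) -> float:
    """Return value unchanged when finite, else 0.0 (hypothetical stand-in for _finite)."""
    return value if math.isfinite(value) else 0.0


def sanitize_metrics(metrics: dict[str, float]) -> dict[str, float]:
    """Coerce every metric so the mapping can be stored as strict JSON."""
    return {name: finite_or_zero(v) for name, v in metrics.items()}


# Illustrative keys only: stability is NaN when there are fewer than two folds.
raw = {"wape_mean": 0.18, "stability_index": float("nan")}
safe = sanitize_metrics(raw)

# allow_nan=False makes json.dumps raise ValueError on NaN/inf,
# so a successful call confirms the payload is strict-JSON safe.
encoded = json.dumps(safe, allow_nan=False)
```

One trade-off of this approach is that a coerced 0.0 is indistinguishable from a genuine zero, which is presumably why the follow-up refactor logs missing metrics instead of relying on the default alone.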

…178) * feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127) Resolves #94 via the heuristic path documented in the issue. No schema column, no Alembic migration, no FIFO cohort tracking — the trigger self-reads the existing per-(store, product) on_hand_qty series and fires when inventory has been "unrefreshed" past `cfg.age_days_threshold`. Decision rationale (schema column vs. heuristic): - The schema column path would add `oldest_unit_age_days` to `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO cohort tracking in `InventorySnapshotGenerator`. No downstream consumer reads this column today — adding it for one generator trigger violates the "don't design for hypothetical future requirements" rule in CLAUDE.md. - The heuristic path is self-contained in MarkdownGenerator, deterministic (preserves the zero-rng-draw regression invariant), and additive (no migration, no model change). 354 LOC net, all inside one slice. Heuristic spec: - A "refresh" is a day where `on_hand_qty` rose by >= `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day. - Age at day t = days since most recent refresh (or `dates[0]` if no refresh has been observed). - Firing requires age >= `age_days_threshold` AND on_hand >= `markdown_min_units_remaining` — never markdown an empty shelf. - After firing, refresh anchor resets to the day AFTER the markdown window ends, so back-to-back fires can't happen and the next age clock starts from a "clear shelf" baseline. Wiring: `MarkdownGenerator.generate()` gains an optional kwarg `inventory_records: list[dict[str, Any]] | None = None` which `core.py` passes through from `InventorySnapshotGenerator`. Disabled-path and non-age_days-path behavior is byte-identical (kwarg ignored). Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete `NotImplementedError` test. 
Coverage: no-records defensive, threshold not-met, threshold met, spike resets age, post-fire reset avoids back-to-back, low-inventory skip, unknown-product skip, rng non-consumption. Validation (local): - ruff check + format: clean - mypy --strict: 0 issues, 192 files - pyright --strict: 0 errors - pytest -m "not integration": 969 passed (+7 vs pre-PR) Closes #94. * feat(api,docs): e2e demo pipeline + showcase script (#128) (#129) * docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128) Adds the planning documents for the end-to-end demo pipeline work tracked in #128. Implementation commits follow on this branch. - INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics, open questions resolved in the PRP). - PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6 commits, additive only — no schema changes, no API edits). * feat(data): add demo_minimal scenario preset (#128) Tiny preset that powers the upcoming `make demo` target. Three stores × ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy an expanding backtest with n_splits=3, horizon=14, min_train_size=30 (needs >= 72 days, 92 leaves margin), small enough to keep the demo loop comfortable on a laptop. Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10, modest promotion + stockout probabilities) so backtest WAPE stays non-NaN across all three baseline models. - app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch - app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios - tests cover the new preset and the updated scenario count * feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128) Single-file async driver that walks the published HTTP surface (precheck -> reset? -> seed -> status -> features -> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the shape of scripts/seed_random.py and scripts/check_db.py. 
- HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s timeout (default 5 s is too short for /seeder/generate); surfaces RFC 7807 problem+json bodies as a typed StepError that echoes title / detail / request_id (never the raw body — secrets-safe). - DemoContext + StepOutcome dataclasses thread cross-step references. - Reporter renders the output-formatting.md glyphs (verbose by default, --quiet collapses to one line per step). - Per-step error handling converts httpx + StepError into fail outcomes; precheck failure exits 2, any other failure exits 1, green exits 0. - Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings() to honor the no-os.environ-in-feature-code rule. - Registry handshake uses the mandatory pending -> running -> success transition and the wire alias "model_config" (not "model_config_data"); artifact_hash is computed client-side via sha256 since we share the FS with the API on this single-host system. - Winner selection: lowest aggregated WAPE, skipping NaN folds. Also adds scripts/__init__.py so tests can `import scripts.run_demo` without invoking the file as a script. * feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128) Wraps scripts/run_demo.py so reviewers can run the full end-to-end demo with one command. Recipes mirror the three modes the script supports: full run, skip-seed iteration, destructive reset. Make targets: - demo — docker compose up -d + alembic + run_demo - demo-quick — run_demo --skip-seed (no compose/migration touch) - demo-clean — full reset (--reset) before seeding - help — default goal; lists targets + preconditions Tab-indented recipes and .PHONY declarations per make conventions. Preconditions (Postgres on :5433, uvicorn on :8123) documented in the help block; the script itself enforces them via the precheck step and exits 2 on failure. 
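The winner-selection rule listed for scripts/run_demo.py above (lowest aggregated WAPE, NaN skipped) could look roughly like this. It is a sketch under assumptions: `select_winner` is a hypothetical name rather than the script's `_select_winner`, the model names are placeholders, and it treats "skipping NaN" as ignoring any model whose aggregated WAPE is not finite.

```python
import math
from typing import Optional


def select_winner(wape_by_model: dict[str, float]) -> Optional[str]:
    """Pick the model with the lowest finite WAPE, or None if there is none."""
    # Drop NaN/inf scores first; min() over NaN values gives order-dependent results.
    finite = {model: wape for model, wape in wape_by_model.items() if math.isfinite(wape)}
    if not finite:
        return None
    return min(finite, key=finite.__getitem__)


# Placeholder model names and scores, for illustration only.
scores = {"naive": 0.31, "seasonal_naive": float("nan"), "moving_average": 0.27}
winner = select_winner(scores)
```

Filtering before calling `min` matters because comparisons against NaN are always false, so an unfiltered `min` could return a NaN-scored model depending on dictionary order.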
* test(api): unit + integration coverage for run_demo (#128) Unit (`tests/test_run_demo_unit.py`, 32 cases): - argparse defaults + all-flags variants - DemoContext defaults (no leaking state across runs) - _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None - _model_config_payload: discriminated-union shape per baseline; rejects unsupported model_type (defends the "no lightgbm in PRP-15" boundary) - Reporter: glyph mapping; verbose + quiet output; summary green / failure / over-budget soft-warn branches - StepError formats RFC 7807 (title/detail/request_id) without leaking the raw response body - HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError - Step payload sanity: seed sends demo_minimal+correct dims+ISO dates; features sends cutoff_date as ISO; train fires three model_types in parallel; agent step skips with ⏭️ when no LLM key Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration): - Skips if Postgres on :5433 isn't reachable - Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default) - Runs scripts/run_demo.py --reset against it; asserts exit 0 + canonical "runs=3 winner=... alias=demo-production" summary - Second case asserts a bogus URL exits 2 (no silent success) - Cleans up uvicorn on teardown with terminate/kill fallback - Resolves `uv` via shutil.which to keep ruff S607 happy and avoid PATH-dependent exec at test time * ci(repo): nightly e2e demo workflow (#128) Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py against a fresh Postgres+pgvector service every night at 07:00 UTC (plus on-demand via workflow_dispatch). Catches regressions in the documented end-to-end pipeline before they bleed into the per-PR gate. Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally NOT a required status check on dev or main. Flake-budget lives in the nightly slot, not in ci.yml. 
- pgvector/pgvector:pg16 service container (same as ci.yml `test` job) - uvicorn started in background; /health polled with a 30 s deadline - run_demo.py called with --seed 42 (deterministic) - LLM-key env vars intentionally absent — agent step auto-skips via ⏭️, keeping the workflow self-contained - uvicorn logs uploaded as artifact on failure (7-day retention) so postmortems can read what the API was doing when the script broke - astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md - permissions: contents: read (least-privilege) * docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128) Discoverability layer for PRP-15. - README.md: new 'Try it: end-to-end demo' step right after the curl /health verification; shows the canonical final-line summary so reviewers know what green looks like. - docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section documenting all three Make targets. - docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common Incidents entry with a 7-point diagnosis flow keyed to the script's step names + a postmortem-capture recipe. - docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows added to the Document Index table. Pure additive; no existing content removed or renamed. * fix(data): update /seeder/scenarios route test for demo_minimal preset (#128) Companion to feat(data): add demo_minimal scenario preset — the route-level assertion in TestListScenarios.test_returns_scenarios still expected 6 scenarios; bumping to 7 and adding the demo_minimal name membership check to match the service-layer + config-layer tests already updated in 005c189. * fix(api): harden run_demo for integration test + real DB (#128) Three real failures surfaced when first running the integration test against docker-compose Postgres + a freshly booted uvicorn; all three are now closed: scripts/run_demo.py: 1. 
step_status: discover the real (store_id, product_id) from /dimensions/stores + /dimensions/products instead of hardcoding 1. Postgres auto-increment does NOT reset after delete, so the freshly seeded IDs are NOT 1 (they were ~150-260 on this branch after a few delete/seed cycles). 2. step_register: copy the trained-model artifact into the registry's own root (settings.registry_artifact_root) and record a registry- relative URI. The registry verify endpoint resolves artifact_uri against its own root, which is separate from where /forecasting/train writes (settings.forecast_model_artifacts_dir). Pre-fix, verify returned 404 even though the artifact existed on disk. 3. step_agent: skip with the soft-skip glyph on any LLM provider failure (invalid key, model unavailable, 5xx), and make _llm_key_present provider-aware so it matches the right env var to the configured agent_default_model. Pre-fix, an .env with anthropic/openai keys but a Gemini default model failed hard at chat-time. 4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate for demo_minimal can spend 60-90 s on slower laptops once you include inventory + prices + promotions inserts. 5. step_seed detail string: GenerateResult.records_created uses 'sales' (singular), not 'sales_daily'; cosmetic fix. tests/test_e2e_demo.py: - Redirect uvicorn stdout to a temp file rather than subprocess.PIPE. The seeder + structlog produce enough INFO log volume to fill a 64-KB pipe buffer; once full, uvicorn blocks on write and seeder requests hang for the full --timeout. Verified locally: integration suite now passes in ~6.5 s instead of timing out at 120 s. - Cleanup leaves the log file on disk only when the test failed (postmortem-friendly). tests/test_run_demo_unit.py: - Bump test_defaults timeout expectation to match the new 120 s default. End-to-end manual run on this machine: 11 steps, wall_clock=2 s, exit 0. Integration test: 2 passed in 6.48 s. 
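The pipe-buffer hang described for tests/test_e2e_demo.py above is a general subprocess pitfall, and the fix can be sketched in isolation. The helper name `run_with_log_file` and the noisy child command are assumptions for this example and do not come from the test file. The child's output goes to a temporary file, so it can never block on a full pipe that nobody is reading.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_log_file(cmd: list[str], timeout_s: float = 30.0) -> tuple[int, Path]:
    """Run cmd with stdout/stderr redirected to a temp file; return (exit code, log path)."""
    # delete=False keeps the log on disk so it can be inspected after a failure.
    log = tempfile.NamedTemporaryFile(prefix="server-", suffix=".log", delete=False)
    with log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        try:
            returncode = proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
            raise
    return returncode, Path(log.name)


# A child that writes well over 64 KB; with stdout=subprocess.PIPE and no reader,
# a process like this would stall once the pipe buffer filled.
noisy = [sys.executable, "-c", "print('x' * 200_000)"]
code, log_path = run_with_log_file(noisy)
```

A caller that only wants the log for postmortems can delete `log_path` on success and keep it on failure, matching the cleanup behaviour the commit describes.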
* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131)

  Headline:
  - authlib 1.6.6 -> 1.7.2 (clears GHSA-wvwj-cvrp-7pv5, a JWS signature verification bypass; patched at >= 1.6.9)
  - fastmcp 2.14.4 -> 3.2.4 (clears GHSA-vv7q-7jx5-f767, an OpenAPI Provider SSRF + path traversal; patched at >= 3.2.0)

  Both CVEs were flagged on PR #129 by Socket Security and are pre-existing on dev (not introduced by #128).

  Wider scope (read before merging): `uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers a full re-resolve of the dependency graph. Because dev's uv.lock had drifted from pyproject.toml (the project's constraint envelope had loosened over time), this single command also brings the lockfile in sync with current pyproject.toml. Net diff: 243 insertions / 369 deletions on uv.lock; no other files touched.

  Transitive cascades worth flagging:
  - anthropic 0.77.0 -> 0.102.0 (pydantic-ai-slim extra)
  - pydantic-graph 1.51.0 -> 1.96.0
  - temporalio 1.20.0 -> 1.27.2
  - alembic 1.18.1 -> 1.18.4
  - aws-* and cohere transitives bumped along
  - griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched)
  - Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus exporter, pydocket, redis, rsa, sortedcontainers. These were transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in.

  Verification on this host:
  - uv sync --extra dev -> green
  - ruff check . -> clean
  - mypy --strict app/ -> 192 files clean
  - pyright app/ -> 0 errors (50 warnings, pre-existing)
  - pytest -m 'not integration' -> 969 passed

  Known install quirk: griffelib 2.0.2 ships a top-level `griffe/` package whose RECORD files don't always materialize on first install when uv replaces an older `griffe` dist in the same sync. A clean venv install (which CI does via `uv sync --frozen`) is unaffected; local devs who upgrade in place may need a one-shot `uv pip install --force-reinstall griffelib` if `import griffe` fails.
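The "patched at >=" floors quoted in the dependency bump above can be checked with a small comparison. This is a sketch, not part of the repository: `parse_version`, `is_patched` and `PATCHED_FLOORS` are hypothetical names, and the parser only handles plain dotted numeric versions (a real check would more likely use a dedicated version-parsing library).

```python
# Patched floors taken from the advisory notes in the commit message.
PATCHED_FLOORS = {"authlib": "1.6.9", "fastmcp": "3.2.0"}


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.7.2' into (1, 7, 2) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


def is_patched(package: str, installed: str) -> bool:
    """True when the installed version is at or above the patched floor."""
    return parse_version(installed) >= parse_version(PATCHED_FLOORS[package])


before = {"authlib": "1.6.6", "fastmcp": "2.14.4"}
after = {"authlib": "1.7.2", "fastmcp": "3.2.4"}

still_vulnerable = [name for name, version in before.items() if not is_patched(name, version)]
now_patched = [name for name, version in after.items() if is_patched(name, version)]
```

Comparing integer tuples rather than raw strings avoids the classic mistake where "1.10.0" sorts before "1.9.0" lexicographically.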
* feat(api,ui): in-product demo showcase page (#132) (#133) * feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132) New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It drives the published API surface in-process via httpx.ASGITransport (no cross-slice imports, satisfying the vertical-slice rule) and streams one StepEvent per pipeline step: precheck -> reset -> seed -> status -> features -> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup. A module-level asyncio.Lock enforces single-flight; concurrent runs get an RFC 7807 409. The orchestration is a faithful in-process port of scripts/run_demo.py (PR #129). Implements PRP-17. * test(api): cover the demo slice pipeline, routes, and e2e integration (#132) Unit tests mock the in-process HTTP client to exercise step sequencing, winner selection, and fail-fast; route tests cover POST /demo/run (200 + 409) and the WS /demo/stream handler. The integration test seeds demo_minimal and asserts an end-to-end green run against real Postgres. Implements PRP-17. * feat(ui): add showcase page streaming the live demo pipeline (#132) New /showcase route and nav entry. The page opens a one-shot WebSocket to /demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders the 11 pipeline steps as live status cards: glyph, detail, duration, the backtest per-model WAPE breakdown with the winner highlighted, and a pass/fail summary banner. Also block-scopes a pre-existing no-case-declarations lint error in chat.tsx so pnpm lint is green for this PR. Implements PRP-17. * test(ui): add vitest setup and use-demo-pipeline hook coverage (#132) Adds the frontend test stack (vitest + jsdom + @testing-library/react), a test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the pure event reducer (idle -> running -> pass transitions, summary assembly, error phase) and a renderHook smoke test. 
The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented fix for pnpm 11's esbuild build-script gate. Implements PRP-17. * docs(docs): document the demo slice and showcase page (#132) Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the /demo/run + /demo/stream rows and a WebSocket Events section in API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and REPO_MAP_INDEX rows for the demo slice and showcase page. Implements PRP-17. * chore(repo): back-merge main into dev to absorb v0.2.10 release commits (#136) (#137) * chore(main): release 0.2.9 (#126) * feat: release v0.2.10 — demo showcase page + e2e pipeline (#134) * feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127) Resolves #94 via the heuristic path documented in the issue. No schema column, no Alembic migration, no FIFO cohort tracking — the trigger self-reads the existing per-(store, product) on_hand_qty series and fires when inventory has been "unrefreshed" past `cfg.age_days_threshold`. Decision rationale (schema column vs. heuristic): - The schema column path would add `oldest_unit_age_days` to `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO cohort tracking in `InventorySnapshotGenerator`. No downstream consumer reads this column today — adding it for one generator trigger violates the "don't design for hypothetical future requirements" rule in CLAUDE.md. - The heuristic path is self-contained in MarkdownGenerator, deterministic (preserves the zero-rng-draw regression invariant), and additive (no migration, no model change). 354 LOC net, all inside one slice. Heuristic spec: - A "refresh" is a day where `on_hand_qty` rose by >= `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day. - Age at day t = days since most recent refresh (or `dates[0]` if no refresh has been observed). - Firing requires age >= `age_days_threshold` AND on_hand >= `markdown_min_units_remaining` — never markdown an empty shelf. 
- After firing, refresh anchor resets to the day AFTER the markdown window ends, so back-to-back fires can't happen and the next age clock starts from a "clear shelf" baseline. Wiring: `MarkdownGenerator.generate()` gains an optional kwarg `inventory_records: list[dict[str, Any]] | None = None` which `core.py` passes through from `InventorySnapshotGenerator`. Disabled-path and non-age_days-path behavior is byte-identical (kwarg ignored). Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete `NotImplementedError` test. Coverage: no-records defensive, threshold not-met, threshold met, spike resets age, post-fire reset avoids back-to-back, low-inventory skip, unknown-product skip, rng non-consumption. Validation (local): - ruff check + format: clean - mypy --strict: 0 issues, 192 files - pyright --strict: 0 errors - pytest -m "not integration": 969 passed (+7 vs pre-PR) Closes #94. * feat(api,docs): e2e demo pipeline + showcase script (#128) (#129) * docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128) Adds the planning documents for the end-to-end demo pipeline work tracked in #128. Implementation commits follow on this branch. - INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics, open questions resolved in the PRP). - PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6 commits, additive only — no schema changes, no API edits). * feat(data): add demo_minimal scenario preset (#128) Tiny preset that powers the upcoming `make demo` target. Three stores × ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy an expanding backtest with n_splits=3, horizon=14, min_train_size=30 (needs >= 72 days, 92 leaves margin), small enough to keep the demo loop comfortable on a laptop. Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10, modest promotion + stockout probabilities) so backtest WAPE stays non-NaN across all three baseline models. 
- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch - app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios - tests cover the new preset and the updated scenario count * feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128) Single-file async driver that walks the published HTTP surface (precheck -> reset? -> seed -> status -> features -> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the shape of scripts/seed_random.py and scripts/check_db.py. - HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s timeout (default 5 s is too short for /seeder/generate); surfaces RFC 7807 problem+json bodies as a typed StepError that echoes title / detail / request_id (never the raw body — secrets-safe). - DemoContext + StepOutcome dataclasses thread cross-step references. - Reporter renders the output-formatting.md glyphs (verbose by default, --quiet collapses to one line per step). - Per-step error handling converts httpx + StepError into fail outcomes; precheck failure exits 2, any other failure exits 1, green exits 0. - Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings() to honor the no-os.environ-in-feature-code rule. - Registry handshake uses the mandatory pending -> running -> success transition and the wire alias "model_config" (not "model_config_data"); artifact_hash is computed client-side via sha256 since we share the FS with the API on this single-host system. - Winner selection: lowest aggregated WAPE, skipping NaN folds. Also adds scripts/__init__.py so tests can `import scripts.run_demo` without invoking the file as a script. * feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128) Wraps scripts/run_demo.py so reviewers can run the full end-to-end demo with one command. Recipes mirror the three modes the script supports: full run, skip-seed iteration, destructive reset. 
Make targets: - demo — docker compose up -d + alembic + run_demo - demo-quick — run_demo --skip-seed (no compose/migration touch) - demo-clean — full reset (--reset) before seeding - help — default goal; lists targets + preconditions Tab-indented recipes and .PHONY declarations per make conventions. Preconditions (Postgres on :5433, uvicorn on :8123) documented in the help block; the script itself enforces them via the precheck step and exits 2 on failure. * test(api): unit + integration coverage for run_demo (#128) Unit (`tests/test_run_demo_unit.py`, 32 cases): - argparse defaults + all-flags variants - DemoContext defaults (no leaking state across runs) - _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None - _model_config_payload: discriminated-union shape per baseline; rejects unsupported model_type (defends the "no lightgbm in PRP-15" boundary) - Reporter: glyph mapping; verbose + quiet output; summary green / failure / over-budget soft-warn branches - StepError formats RFC 7807 (title/detail/request_id) without leaking the raw response body - HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError - Step payload sanity: seed sends demo_minimal+correct dims+ISO dates; features sends cutoff_date as ISO; train fires three model_types in parallel; agent step skips with ⏭️ when no LLM key Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration): - Skips if Postgres on :5433 isn't reachable - Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default) - Runs scripts/run_demo.py --reset against it; asserts exit 0 + canonical "runs=3 winner=... 
alias=demo-production" summary - Second case asserts a bogus URL exits 2 (no silent success) - Cleans up uvicorn on teardown with terminate/kill fallback - Resolves `uv` via shutil.which to keep ruff S607 happy and avoid PATH-dependent exec at test time * ci(repo): nightly e2e demo workflow (#128) Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py against a fresh Postgres+pgvector service every night at 07:00 UTC (plus on-demand via workflow_dispatch). Catches regressions in the documented end-to-end pipeline before they bleed into the per-PR gate. Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally NOT a required status check on dev or main. Flake-budget lives in the nightly slot, not in ci.yml. - pgvector/pgvector:pg16 service container (same as ci.yml `test` job) - uvicorn started in background; /health polled with a 30 s deadline - run_demo.py called with --seed 42 (deterministic) - LLM-key env vars intentionally absent — agent step auto-skips via ⏭️, keeping the workflow self-contained - uvicorn logs uploaded as artifact on failure (7-day retention) so postmortems can read what the API was doing when the script broke - astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md - permissions: contents: read (least-privilege) * docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128) Discoverability layer for PRP-15. - README.md: new 'Try it: end-to-end demo' step right after the curl /health verification; shows the canonical final-line summary so reviewers know what green looks like. - docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section documenting all three Make targets. - docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common Incidents entry with a 7-point diagnosis flow keyed to the script's step names + a postmortem-capture recipe. - docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows added to the Document Index table. 
Pure additive; no existing content removed or renamed. * fix(data): update /seeder/scenarios route test for demo_minimal preset (#128) Companion to feat(data): add demo_minimal scenario preset — the route-level assertion in TestListScenarios.test_returns_scenarios still expected 6 scenarios; bumping to 7 and adding the demo_minimal name membership check to match the service-layer + config-layer tests already updated in 005c189. * fix(api): harden run_demo for integration test + real DB (#128) Three real failures surfaced when first running the integration test against docker-compose Postgres + a freshly booted uvicorn; all three are now closed: scripts/run_demo.py: 1. step_status: discover the real (store_id, product_id) from /dimensions/stores + /dimensions/products instead of hardcoding 1. Postgres auto-increment does NOT reset after delete, so the freshly seeded IDs are NOT 1 (they were ~150-260 on this branch after a few delete/seed cycles). 2. step_register: copy the trained-model artifact into the registry's own root (settings.registry_artifact_root) and record a registry- relative URI. The registry verify endpoint resolves artifact_uri against its own root, which is separate from where /forecasting/train writes (settings.forecast_model_artifacts_dir). Pre-fix, verify returned 404 even though the artifact existed on disk. 3. step_agent: skip with the soft-skip glyph on any LLM provider failure (invalid key, model unavailable, 5xx), and make _llm_key_present provider-aware so it matches the right env var to the configured agent_default_model. Pre-fix, an .env with anthropic/openai keys but a Gemini default model failed hard at chat-time. 4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate for demo_minimal can spend 60-90 s on slower laptops once you include inventory + prices + promotions inserts. 5. step_seed detail string: GenerateResult.records_created uses 'sales' (singular), not 'sales_daily'; cosmetic fix. 
tests/test_e2e_demo.py: - Redirect uvicorn stdout to a temp file rather than subprocess.PIPE. The seeder + structlog produce enough INFO log volume to fill a 64-KB pipe buffer; once full, uvicorn blocks on write and seeder requests hang for the full --timeout. Verified locally: integration suite now passes in ~6.5 s instead of timing out at 120 s. - Cleanup leaves the log file on disk only when the test failed (postmortem-friendly). tests/test_run_demo_unit.py: - Bump test_defaults timeout expectation to match the new 120 s default. End-to-end manual run on this machine: 11 steps, wall_clock=2 s, exit 0. Integration test: 2 passed in 6.48 s. * chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131) Headline: - authlib 1.6.6 -> 1.7.2 (clears GHSA-wvwj-cvrp-7pv5 — JWS signature verification bypass; patched at >= 1.6.9) - fastmcp 2.14.4 -> 3.2.4 (clears GHSA-vv7q-7jx5-f767 — OpenAPI Provider SSRF + path traversal; patched at >= 3.2.0) Both CVEs were flagged on PR #129 by Socket Security and are pre-existing on dev (not introduced by #128). Wider scope — read before merging: `uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers a full re-resolve of the dependency graph. Because dev's uv.lock had drifted from pyproject.toml (the project's constraint envelope had loosened over time), this single command also brings the lockfile in sync with current pyproject.toml. Net diff: 243 insertions / 369 deletions on uv.lock; no other files touched. Transitive cascades worth flagging: - anthropic 0.77.0 -> 0.102.0 (pydantic-ai-slim extra) - pydantic-graph 1.51.0 -> 1.96.0 - temporalio 1.20.0 -> 1.27.2 - alembic 1.18.1 -> 1.18.4 - aws-* and cohere transitives bumped along - griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched) - Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus exporter, pydocket, redis, rsa, sortedcontainers — these were transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in. 
Verification on this host: - uv sync --extra dev -> green - ruff check . -> clean - mypy --strict app/ -> 192 files clean - pyright app/ -> 0 errors (50 warnings, pre-existing) - pytest -m 'not integration' -> 969 passed Known install quirk: griffelib 2.0.2 ships a top-level `griffe/` package whose RECORD files don't always materialize on first install when uv replaces an older `griffe` dist in the same sync. A clean venv install (which CI does via `uv sync --frozen`) is unaffected; local devs who upgrade in place may need a one-shot `uv pip install --force-reinstall griffelib` if `import griffe` fails. * feat(api,ui): in-product demo showcase page (#132) (#133) * feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132) New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It drives the published API surface in-process via httpx.ASGITransport (no cross-slice imports, satisfying the vertical-slice rule) and streams one StepEvent per pipeline step: precheck -> reset -> seed -> status -> features -> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup. A module-level asyncio.Lock enforces single-flight; concurrent runs get an RFC 7807 409. The orchestration is a faithful in-process port of scripts/run_demo.py (PR #129). Implements PRP-17. * test(api): cover the demo slice pipeline, routes, and e2e integration (#132) Unit tests mock the in-process HTTP client to exercise step sequencing, winner selection, and fail-fast; route tests cover POST /demo/run (200 + 409) and the WS /demo/stream handler. The integration test seeds demo_minimal and asserts an end-to-end green run against real Postgres. Implements PRP-17. * feat(ui): add showcase page streaming the live demo pipeline (#132) New /showcase route and nav entry. 
The page opens a one-shot WebSocket to /demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders the 11 pipeline steps as live status cards: glyph, detail, duration, the backtest per-model WAPE breakdown with the winner highlighted, and a pass/fail summary banner. Also block-scopes a pre-existing no-case-declarations lint error in chat.tsx so pnpm lint is green for this PR. Implements PRP-17. * test(ui): add vitest setup and use-demo-pipeline hook coverage (#132) Adds the frontend test stack (vitest + jsdom + @testing-library/react), a test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the pure event reducer (idle -> running -> pass transitions, summary assembly, error phase) and a renderHook smoke test. The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented fix for pnpm 11's esbuild build-script gate. Implements PRP-17. * docs(docs): document the demo slice and showcase page (#132) Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the /demo/run + /demo/stream rows and a WebSocket Events section in API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and REPO_MAP_INDEX rows for the demo slice and showcase page. Implements PRP-17. * chore(main): release 0.2.10 (#135) * docs(docs): fix broken PRP-0/INITIAL-0 relative links in phase 0 doc (#138) (#139) * docs(repo): fix readme dev-deps command and stale .github template placeholders (#140) (#141) * docs(docs): fill DEV_GUIDE.md onboarding stub sections (#142) (#143) * docs(repo): refresh stale CLAUDE.md note, document /demo API, align PR template (#144) (#145) * fix(ui): chart series render black — drop hsl() wrapper on oklch chart vars (#149) (#153) index.css defines --chart-N twice: legacy shadcn-v3 HSL triplets and Tailwind-4 / shadcn-v4 complete oklch() colours. The oklch definitions win the cascade, so at runtime --chart-N is a full colour. 
The chart components still wrapped it as hsl(var(--chart-N)) → hsl(oklch(...)), which is invalid CSS, so recharts fell back to a black fill/stroke — invisible on the dark theme. Reference var(--chart-N) directly in backtest-folds-chart.tsx and time-series-chart.tsx. Verified in a browser: the backtest per-fold bars and the forecast line now render in colour. * fix(jobs): backtest job result keeps fold metrics, stability and baselines (#148) (#152) * fix(jobs): backtest job result keeps fold metrics, stability and baselines (#148) _execute_backtest ran BacktestingService.run_backtest — which computes per-fold metrics, stability indices and a naive/seasonal baseline comparison — but stored only four aggregated values and discarded the rest. The dashboard (/visualize/backtest) reads aggregated_metrics.{*_mean, stability_index}, fold_metrics[] and baseline_comparison, so it showed "0 folds", all-zero metrics and an empty chart. Add _shape_backtest_result(), which flattens a BacktestResponse into the contract the dashboard expects, and _finite(), which coerces NaN/inf to 0.0 so the result stays JSONB-safe (stability is NaN with fewer than two folds). Add app/features/jobs/tests/test_service.py with unit coverage for the shaping logic: fold metrics, *_mean keys, stability, baseline comparison, the no-baselines path, and NaN coercion. * refactor(jobs): centralize backtest metric keys and surface drift (#148) Addresses review feedback on PR #152. - Hoist the dashboard's metric set into _BACKTEST_METRICS and the headline stability metric into _STABILITY_METRIC, so the hardcoded keys live in one documented place instead of being repeated across the shaping logic. - Log jobs.backtest_metrics_missing when an expected metric is absent from the backtest response, so a future rename in the backtesting service fails loud instead of silently emitting 0.0. - Document the WAPE stability convention in the _shape_backtest_result docstring. 
- Tests: assert backtest_id / model_type / duration_ms pass through unchanged, and add a regression test for the missing-metric default path. * fix(ui): forecast page reads forecasts/forecast from predict job result (#147) (#151) The /visualize/forecast page never rendered the chart for a valid completed predict job. It read job.result.predictions with field `predicted`, but POST /jobs (job_type="predict") returns job.result.forecasts with field `forecast`. forecastData was therefore always undefined and the page fell through to "No prediction data available in job result". Read result.forecasts with field `forecast`, and pass predictedKey="forecast" to TimeSeriesChart (which already supports a configurable data key). Verified in a browser: entering a completed predict job ID now renders the 14-day forecast line chart with correct tooltip values. * fix(registry): tolerate multiple matches in _find_duplicate (#146) (#150) Under the default registry_duplicate_policy="detect", duplicate runs are created intentionally, so multiple non-archived model_run rows can share one config hash. _find_duplicate used scalar_one_or_none(), which raised MultipleResultsFound once two duplicates existed — POST /registry/runs then returned HTTP 500. This made the demo/Showcase register step fail deterministically on any DB with repeated runs. Order the lookup by created_at desc, LIMIT 1, and use scalars().first() so it returns the most recent matching run instead of asserting a single match. Add an integration regression test that POSTs an identical run three times under the detect policy and asserts all three return 201. * fix(ui): derive TimeSeriesChart line stroke from the config key (#156) (#157) TimeSeriesChart builds chartConfig with dynamic keys ([actualKey] / [predictedKey]), so shadcn's ChartContainer injects --color-<key> CSS variables. The <Line> elements, however, hardcoded stroke="var(--color-actual)" and stroke="var(--color-predicted)". 
The forecast page passes predictedKey="forecast", so the injected variable is --color-forecast; var(--color-predicted) was undefined, the stroke was invalid, and SVG fell back to its initial value `none` — the forecast line was invisible. Build the stroke from the key: stroke={`var(--color-${actualKey})`} / stroke={`var(--color-${predictedKey})`}. Verified in a browser: the forecast line now renders in colour. * feat(ui): job picker dropdown on forecast and backtest pages (#154) (#155) The visualization pages only accepted a job ID typed into a text box, so users had to already know the ID. Add a JobPicker component: a dropdown of completed jobs of the relevant type (predict / backtest), newest first, with each option labelled by short id, model and timestamp. - New shared component src/components/common/job-picker.tsx, used by both forecast.tsx and backtest.tsx. - The manual job-ID input stays alongside the dropdown for pasting an ID. - The most recent completed job auto-loads on mount so a chart shows immediately without interaction. No backend change — GET /jobs?job_type=&status=completed already exists. Verified in a browser on both pages. * chore(repo): back-merge main into dev after v0.2.11 (#160) (#161) * chore(main): release 0.2.9 (#126) * feat: release v0.2.10 — demo showcase page + e2e pipeline (#134) * feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127) Resolves #94 via the heuristic path documented in the issue. No schema column, no Alembic migration, no FIFO cohort tracking — the trigger self-reads the existing per-(store, product) on_hand_qty series and fires when inventory has been "unrefreshed" past `cfg.age_days_threshold`. Decision rationale (schema column vs. heuristic): - The schema column path would add `oldest_unit_age_days` to `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO cohort tracking in `InventorySnapshotGenerator`. 
No downstream consumer reads this column today — adding it for one generator trigger violates the "don't design for hypothetical future requirements" rule in CLAUDE.md. - The heuristic path is self-contained in MarkdownGenerator, deterministic (preserves the zero-rng-draw regression invariant), and additive (no migration, no model change). 354 LOC net, all inside one slice. Heuristic spec: - A "refresh" is a day where `on_hand_qty` rose by >= `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day. - Age at day t = days since most recent refresh (or `dates[0]` if no refresh has been observed). - Firing requires age >= `age_days_threshold` AND on_hand >= `markdown_min_units_remaining` — never markdown an empty shelf. - After firing, refresh anchor resets to the day AFTER the markdown window ends, so back-to-back fires can't happen and the next age clock starts from a "clear shelf" baseline. Wiring: `MarkdownGenerator.generate()` gains an optional kwarg `inventory_records: list[dict[str, Any]] | None = None` which `core.py` passes through from `InventorySnapshotGenerator`. Disabled-path and non-age_days-path behavior is byte-identical (kwarg ignored). Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete `NotImplementedError` test. Coverage: no-records defensive, threshold not-met, threshold met, spike resets age, post-fire reset avoids back-to-back, low-inventory skip, unknown-product skip, rng non-consumption. Validation (local): - ruff check + format: clean - mypy --strict: 0 issues, 192 files - pyright --strict: 0 errors - pytest -m "not integration": 969 passed (+7 vs pre-PR) Closes #94. * feat(api,docs): e2e demo pipeline + showcase script (#128) (#129) * docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128) Adds the planning documents for the end-to-end demo pipeline work tracked in #128. Implementation commits follow on this branch. - INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics, open questions resolved in the PRP). 
- PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6 commits, additive only — no schema changes, no API edits). * feat(data): add demo_minimal scenario preset (#128) Tiny preset that powers the upcoming `make demo` target. Three stores × ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy an expanding backtest with n_splits=3, horizon=14, min_train_size=30 (needs >= 72 days, 92 leaves margin), small enough to keep the demo loop comfortable on a laptop. Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10, modest promotion + stockout probabilities) so backtest WAPE stays non-NaN across all three baseline models. - app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch - app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios - tests cover the new preset and the updated scenario count * feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128) Single-file async driver that walks the published HTTP surface (precheck -> reset? -> seed -> status -> features -> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the shape of scripts/seed_random.py and scripts/check_db.py. - HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s timeout (default 5 s is too short for /seeder/generate); surfaces RFC 7807 problem+json bodies as a typed StepError that echoes title / detail / request_id (never the raw body — secrets-safe). - DemoContext + StepOutcome dataclasses thread cross-step references. - Reporter renders the output-formatting.md glyphs (verbose by default, --quiet collapses to one line per step). - Per-step error handling converts httpx + StepError into fail outcomes; precheck failure exits 2, any other failure exits 1, green exits 0. - Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings() to honor the no-os.environ-in-feature-code rule. 
- Registry handshake uses the mandatory pending -> running -> success transition and the wire alias "model_config" (not "model_config_data"); artifact_hash is computed client-side via sha256 since we share the FS with the API on this single-host system.
- Winner selection: lowest aggregated WAPE, skipping NaN folds.

Also adds scripts/__init__.py so tests can `import scripts.run_demo` without invoking the file as a script.

* feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128)

Wraps scripts/run_demo.py so reviewers can run the full end-to-end demo with one command. Recipes mirror the three modes the script supports: full run, skip-seed iteration, destructive reset.

Make targets:
- demo: docker compose up -d + alembic + run_demo
- demo-quick: run_demo --skip-seed (no compose/migration touch)
- demo-clean: full reset (--reset) before seeding
- help: default goal; lists targets + preconditions

Tab-indented recipes and .PHONY declarations per make conventions. Preconditions (Postgres on :5433, uvicorn on :8123) are documented in the help block; the script itself enforces them via the precheck step and exits 2 on failure.
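The winner-selection rule stated in the run_demo.py commit above (lowest aggregated WAPE, NaN skipped, nothing selected when no finite score exists) can be sketched roughly as below. This is an illustration only, not the script's `_select_winner`: the function name `select_winner_sketch`, the `{model_type: wape}` input shape, and the model names in the usage lines are all assumptions made for the example.

```python
import math
from typing import Mapping, Optional


def select_winner_sketch(wape_by_model: Mapping[str, float]) -> Optional[str]:
    """Return the model with the lowest finite WAPE, or None if there is none."""
    # Drop NaN/inf scores so a degenerate backtest can never win by accident.
    finite = {model: wape for model, wape in wape_by_model.items() if math.isfinite(wape)}
    if not finite:
        # Covers both the empty input and the all-NaN input.
        return None
    return min(finite, key=finite.__getitem__)


if __name__ == "__main__":
    # Placeholder model names; the real baseline names are not given in this message.
    scores = {"model_a": 0.31, "model_b": float("nan"), "model_c": 0.24}
    print(select_winner_sketch(scores))
    print(select_winner_sketch({"model_a": float("nan")}))
```

Filtering before taking the minimum matters because comparisons against NaN are always false, so a plain `min()` over raw scores can return a NaN entry depending on ordering.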
* test(api): unit + integration coverage for run_demo (#128) Unit (`tests/test_run_demo_unit.py`, 32 cases): - argparse defaults + all-flags variants - DemoContext defaults (no leaking state across runs) - _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None - _model_config_payload: discriminated-union shape per baseline; rejects unsupported model_type (defends the "no lightgbm in PRP-15" boundary) - Reporter: glyph mapping; verbose + quiet output; summary green / failure / over-budget soft-warn branches - StepError formats RFC 7807 (title/detail/request_id) without leaking the raw response body - HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError - Step payload sanity: seed sends demo_minimal+correct dims+ISO dates; features sends cutoff_date as ISO; train fires three model_types in parallel; agent step skips with ⏭️ when no LLM key Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration): - Skips if Postgres on :5433 isn't reachable - Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default) - Runs scripts/run_demo.py --reset against it; asserts exit 0 + canonical "runs=3 winner=... alias=demo-production" summary - Second case asserts a bogus URL exits 2 (no silent success) - Cleans up uvicorn on teardown with terminate/kill fallback - Resolves `uv` via shutil.which to keep ruff S607 happy and avoid PATH-dependent exec at test time * ci(repo): nightly e2e demo workflow (#128) Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py against a fresh Postgres+pgvector service every night at 07:00 UTC (plus on-demand via workflow_dispatch). Catches regressions in the documented end-to-end pipeline before they bleed into the per-PR gate. Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally NOT a required status check on dev or main. Flake-budget lives in the nightly slot, not in ci.yml. 
- pgvector/pgvector:pg16 service container (same as ci.yml `test` job) - uvicorn started in background; /health polled with a 30 s deadline - run_demo.py called with --seed 42 (deterministic) - LLM-key env vars intentionally absent — agent step auto-skips via ⏭️, keeping the workflow self-contained - uvicorn logs uploaded as artifact on failure (7-day retention) so postmortems can read what the API was doing when the script broke - astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md - permissions: contents: read (least-privilege) * docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128) Discoverability layer for PRP-15. - README.md: new 'Try it: end-to-end demo' step right after the curl /health verification; shows the canonical final-line summary so reviewers know what green looks like. - docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section documenting all three Make targets. - docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common Incidents entry with a 7-point diagnosis flow keyed to the script's step names + a postmortem-capture recipe. - docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows added to the Document Index table. Pure additive; no existing content removed or renamed. * fix(data): update /seeder/scenarios route test for demo_minimal preset (#128) Companion to feat(data): add demo_minimal scenario preset — the route-level assertion in TestListScenarios.test_returns_scenarios still expected 6 scenarios; bumping to 7 and adding the demo_minimal name membership check to match the service-layer + config-layer tests already updated in 005c189. * fix(api): harden run_demo for integration test + real DB (#128) Three real failures surfaced when first running the integration test against docker-compose Postgres + a freshly booted uvicorn; all three are now closed: scripts/run_demo.py: 1. 
step_status: discover the real (store_id, product_id) from /dimensions/stores + /dimensions/products instead of hardcoding 1. Postgres auto-increment does NOT reset after delete, so the freshly seeded IDs are NOT 1 (they were ~150-260 on this branch after a few delete/seed cycles).
2. step_register: copy the trained-model artifact into the registry's own root (settings.registry_artifact_root) and record a registry-relative URI. The registry verify endpoint resolves artifact_uri against its own root, which is separate from where /forecasting/train writes (settings.forecast_model_artifacts_dir). Pre-fix, verify returned 404 even though the artifact existed on disk.
3. step_agent: skip with the soft-skip glyph on any LLM provider failure (invalid key, model unavailable, 5xx), and make _llm_key_present provider-aware so it matches the right env var to the configured agent_default_model. Pre-fix, an .env with anthropic/openai keys but a Gemini default model failed hard at chat-time.
4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate for demo_minimal can spend 60-90 s on slower laptops once you include inventory + prices + promotions inserts.
5. step_seed detail string: GenerateResult.records_created uses 'sales' (singular), not 'sales_daily'; cosmetic fix.

tests/test_e2e_demo.py:
- Redirect uvicorn stdout to a temp file rather than subprocess.PIPE. The seeder + structlog produce enough INFO log volume to fill a 64-KB pipe buffer; once full, uvicorn blocks on write and seeder requests hang for the full --timeout. Verified locally: integration suite now passes in ~6.5 s instead of timing out at 120 s.
- Cleanup leaves the log file on disk only when the test failed (postmortem-friendly).

tests/test_run_demo_unit.py:
- Bump test_defaults timeout expectation to match the new 120 s default.

End-to-end manual run on this machine: 11 steps, wall_clock=2 s, exit 0. Integration test: 2 passed in 6.48 s.
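The pipe-buffer fix described for tests/test_e2e_demo.py comes down to handing the child process a real file instead of an unread `subprocess.PIPE`. A minimal sketch of that pattern follows; it is not the test's actual fixture. The helper name `run_with_log_file` and the noisy child command are assumptions for illustration, and the real test boots uvicorn rather than a one-shot command.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_log_file(cmd: list[str], timeout_s: float = 30.0) -> tuple[int, Path]:
    """Run cmd with stdout/stderr sent to a temp file; return (exit code, log path)."""
    # delete=False keeps the log on disk after the handle closes, so a caller
    # can inspect it on failure and remove it on success.
    with tempfile.NamedTemporaryFile(mode="wb", suffix=".log", delete=False) as log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        try:
            returncode = proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
            raise
    return returncode, Path(log.name)


if __name__ == "__main__":
    # A child that writes about 1 MB, far more than a typical pipe buffer holds.
    noisy = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 1_000_000)"]
    code, log_path = run_with_log_file(noisy)
    print(code, log_path.stat().st_size)
    log_path.unlink()
```

With `stdout=subprocess.PIPE` and no reader, the same child would block once the pipe filled; a file has no such back-pressure, which is why the parent can simply `wait()`.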
* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131)

Headline:
- authlib 1.6.6 -> 1.7.2 (clears GHSA-wvwj-cvrp-7pv5: JWS signature verification bypass; patched at >= 1.6.9)
- fastmcp 2.14.4 -> 3.2.4 (clears GHSA-vv7q-7jx5-f767: OpenAPI Provider SSRF + path traversal; patched at >= 3.2.0)

Both CVEs were flagged on PR #129 by Socket Security and are pre-existing on dev (not introduced by #128).

Wider scope (read before merging): `uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers a full re-resolve of the dependency graph. Because dev's uv.lock had drifted from pyproject.toml (the project's constraint envelope had loosened over time), this single command also brings the lockfile in sync with current pyproject.toml. Net diff: 243 insertions / 369 deletions on uv.lock; no other files touched.

Transitive cascades worth flagging:
- anthropic 0.77.0 -> 0.102.0 (pydantic-ai-slim extra)
- pydantic-graph 1.51.0 -> 1.96.0
- temporalio 1.20.0 -> 1.27.2
- alembic 1.18.1 -> 1.18.4
- aws-* and cohere transitives bumped along
- griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched)
- Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus exporter, pydocket, redis, rsa, sortedcontainers; these were transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in.

Verification on this host:
- uv sync --extra dev -> green
- ruff check . -> clean
- mypy --strict app/ -> 192 files clean
- pyright app/ -> 0 errors (50 warnings, pre-existing)
- pytest -m 'not integration' -> 969 passed

Known install quirk: griffelib 2.0.2 ships a top-level `griffe/` package whose RECORD files don't always materialize on first install when uv replaces an older `griffe` dist in the same sync. A clean venv install (which CI does via `uv sync --frozen`) is unaffected; local devs who upgrade in place may need a one-shot `uv pip install --force-reinstall griffelib` if `import griffe` fails.
* feat(api,ui): in-product demo showcase page (#132) (#133) * feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132) New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It drives the published API surface in-process via httpx.ASGITransport (no cross-slice imports, satisfying the vertical-slice rule) and streams one StepEvent per pipeline step: precheck -> reset -> seed -> status -> features -> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup. A module-level asyncio.Lock enforces single-flight; concurrent runs get an RFC 7807 409. The orchestration is a faithful in-process port of scripts/run_demo.py (PR #129). Implements PRP-17. * test(api): cover the demo slice pipeline, routes, and e2e integration (#132) Unit tests mock the in-process HTTP client to exercise step sequencing, winner selection, and fail-fast; route tests cover POST /demo/run (200 + 409) and the WS /demo/stream handler. The integration test seeds demo_minimal and asserts an end-to-end green run against real Postgres. Implements PRP-17. * feat(ui): add showcase page streaming the live demo pipeline (#132) New /showcase route and nav entry. The page opens a one-shot WebSocket to /demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders the 11 pipeline steps as live status cards: glyph, detail, duration, the backtest per-model WAPE breakdown with the winner highlighted, and a pass/fail summary banner. Also block-scopes a pre-existing no-case-declarations lint error in chat.tsx so pnpm lint is green for this PR. Implements PRP-17. * test(ui): add vitest setup and use-demo-pipeline hook coverage (#132) Adds the frontend test stack (vitest + jsdom + @testing-library/react), a test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the pure event reducer (idle -> running -> pass transitions, summary assembly, error phase) and a renderHook smoke test. 
The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented fix for pnpm 11's esbuild build-script gate. Implements PRP-17. * docs(docs): document the demo slice and showcase page (#132) Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the /demo/run + /demo/stream rows and a WebSocket Events section in API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and REPO_MAP_INDEX rows for the demo slice and showcase page. Implements PRP-17. * chore(main): release 0.2.10 (#135) * feat: release v0.2.11 — visualization fixes, job picker, demo showcase (#158) * feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127) Resolves #94 via the heuristic path documented in the issue. No schema column, no Alembic migration, no FIFO cohort tracking — the trigger self-reads the existing per-(store, product) on_hand_qty series and fires when inventory has been "unrefreshed" past `cfg.age_days_threshold`. Decision rationale (schema column vs. heuristic): - The schema column path would add `oldest_unit_age_days` to `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO cohort tracking in `InventorySnapshotGenerator`. No downstream consumer reads this column today — adding it for one generator trigger violates the "don't design for hypothetical future requirements" rule in CLAUDE.md. - The heuristic path is self-contained in MarkdownGenerator, deterministic (preserves the zero-rng-draw regression invariant), and additive (no migration, no model change). 354 LOC net, all inside one slice. Heuristic spec: - A "refresh" is a day where `on_hand_qty` rose by >= `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day. - Age at day t = days since most recent refresh (or `dates[0]` if no refresh has been observed). - Firing requires age >= `age_days_threshold` AND on_hand >= `markdown_min_units_remaining` — never markdown an empty shelf. 
- After firing, refresh anchor resets to the day AFTER the markdown window ends, so back-to-back fires can't happen and the next age clock starts from a "clear shelf" baseline. Wiring: `MarkdownGenerator.generate()` gains an optional kwarg `inventory_records: list[dict[str, Any]] | None = None` which `core.py` passes through from `InventorySnapshotGenerator`. Disabled-path and non-age_days-path behavior is byte-identical (kwarg ignored). Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete `NotImplementedError` test. Coverage: no-records defensive, threshold not-met, threshold met, spike resets age, post-fire reset avoids back-to-back, low-inventory skip, unknown-product skip, rng non-consumption. Validation (local): - ruff check + format: clean - mypy --strict: 0 issues, 192 files - pyright --strict: 0 errors - pytest -m "not integration": 969 passed (+7 vs pre-PR) Closes #94. * feat(api,docs): e2e demo pipeline + showcase script (#128) (#129) * docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128) Adds the planning documents for the end-to-end demo pipeline work tracked in #128. Implementation commits follow on this branch. - INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics, open questions resolved in the PRP). - PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6 commits, additive only — no schema changes, no API edits). * feat(data): add demo_minimal scenario preset (#128) Tiny preset that powers the upcoming `make demo` target. Three stores × ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy an expanding backtest with n_splits=3, horizon=14, min_train_size=30 (needs >= 72 days, 92 leaves margin), small enough to keep the demo loop comfortable on a laptop. Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10, modest promotion + stockout probabilities) so backtest WAPE stays non-NaN across all three baseline models. 
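The ">= 72 days" figure quoted for demo_minimal is consistent with an expanding-window layout in which the first fold trains on `min_train_size` days and each of the `n_splits` folds then consumes `horizon` further days. The sketch below only reproduces that arithmetic; the helper name `min_days_required` is hypothetical, and the backtesting service's actual split logic (gap handling, for instance) is not described in this message.

```python
def min_days_required(n_splits: int, horizon: int, min_train_size: int) -> int:
    """Smallest series length that fits n_splits expanding folds (assumed layout, no gap)."""
    # The first fold needs min_train_size days of history, and every fold
    # needs its own horizon-sized test window after the training data.
    return min_train_size + n_splits * horizon


if __name__ == "__main__":
    required = min_days_required(n_splits=3, horizon=14, min_train_size=30)
    available = 92  # 2024-10-01..2024-12-31 inclusive, per the preset description
    print(required, available - required)
```

Under that assumption the 92-day preset leaves 20 days of slack beyond the minimum.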
- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch - app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios - tests cover the new preset and the updated scenario count * feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128) Single-file async driver that walks the published HTTP surface (precheck -> reset? -> seed -> status -> features -> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the shape of scripts/seed_random.py and scripts/check_db.py. - HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s timeout (default 5 s is too short for /seeder/generate); surfaces RFC 7807 problem+json bodies as a typed StepError that echoes title / detail / request_id (never the raw body — secrets-safe). - DemoContext + StepOutcome dataclasses thread cross-step references. - Reporter renders the output-formatting.md glyphs (verbose by default, --quiet collapses to one line per step). - Per-step error handling converts httpx + StepError into fail outcomes; precheck failure exits 2, any other failure exits 1, green exits 0. - Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings() to honor the no-os.environ-in-feature-code rule. - Registry handshake uses the mandatory pending -> running -> success transition and the wire alias "model_config" (not "model_config_data"); artifact_hash is computed client-side via sha256 since we share the FS with the API on this single-host system. - Winner selection: lowest aggregated WAPE, skipping NaN folds. Also adds scripts/__init__.py so tests can `import scripts.run_demo` without invoking the file as a script. * feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128) Wraps scripts/run_demo.py so reviewers can run the full end-to-end demo with one command. Recipes mirror the three modes the script supports: full run, skip-seed iteration, destructive reset. 
Make targets: - demo — docker compose up -d + alembic + run_demo - demo-quick — run_demo --skip-seed (no compose/migration touch) - demo-clean — full reset (--reset) before seeding - help — default goal; lists targets + preconditions Tab-indented recipes and .PHONY declarations per make conventions. Preconditions (Postgres on :5433, uvicorn on :8123) documented in the help block; the script itself enforces them via the precheck step and exits 2 on failure. * test(api): unit + integration coverage for run_demo (#128) Unit (`tests/test_run_demo_unit.py`, 32 cases): - argparse defaults + all-flags variants - DemoContext defaults (no leaking state across runs) - _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None - _model_config_payload: discriminated-union shape per baseline; rejects unsupported model_type (defends the "no lightgbm in PRP-15" boundary) - Reporter: glyph mapping; verbose + quiet output; summary green / failure / over-budget soft-warn branches - StepError formats RFC 7807 (title/detail/request_id) without leaking the raw response body - HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError - Step payload sanity: seed sends demo_minimal+correct dims+ISO dates; features sends cutoff_date as ISO; train fires three model_types in parallel; agent step skips with ⏭️ when no LLM key Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration): - Skips if Postgres on :5433 isn't reachable - Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default) - Runs scripts/run_demo.py --reset against it; asserts exit 0 + canonical "runs=3 winner=... 
alias=demo-production" summary - Second case asserts a bogus URL exits 2 (no silent success) - Cleans up uvicorn on teardown with terminate/kill fallback - Resolves `uv` via shutil.which to keep ruff S607 happy and avoid PATH-dependent exec at test time * ci(repo): nightly e2e demo workflow (#128) Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py against a fresh Postgres+pgvector service every night at 07:00 UTC (plus on-demand via workflow_dispatch). Catches regressions in the documented end-to-end pipeline before they bleed into the per-PR gate. Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally NOT a required status check on dev or main. Flake-budget lives in the nightly slot, not in ci.yml. - pgvector/pgvector:pg16 service container (same as ci.yml `test` job) - uvicorn started in background; /health polled with a 30 s deadline - run_demo.py called with --seed 42 (deterministic) - LLM-key env vars intentionally absent — agent step auto-skips via ⏭️, keeping the workflow self-contained - uvicorn logs uploaded as artifact on failure (7-day retention) so postmortems can read what the API was doing when the script broke - astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md - permissions: contents: read (least-privilege) * docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128) Discoverability layer for PRP-15. - README.md: new 'Try it: end-to-end demo' step right after the curl /health verification; shows the canonical final-line summary so reviewers know what green looks like. - docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section documenting all three Make targets. - docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common Incidents entry with a 7-point diagnosis flow keyed to the script's step names + a postmortem-capture recipe. - docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows added to the Document Index table. 
Purely additive; no existing content removed or renamed.

* fix(data): update /seeder/scenarios route test for demo_minimal preset (#128)

Companion to feat(data): add demo_minimal scenario preset. The route-level assertion in TestListScenarios.test_returns_scenarios still expected 6 scenarios; this bumps it to 7 and adds the demo_minimal name membership check to match the service-layer + config-layer tests already updated in 005c189.

* fix(api): harden run_demo for integration test + real DB (#128)

Three real failures surfaced when first running the integration test against docker-compose Postgres + a freshly booted uvicorn; all three are now closed:

scripts/run_demo.py:
1. step_status: discover the real (store_id, product_id) from /dimensions/stores + /dimensions/products instead of hardcoding 1. Postgres auto-increment does NOT reset after delete, so the freshly seeded IDs are NOT 1 (they were ~150-260 on this branch after a few delete/seed cycles).
2. step_register: copy the trained-model artifact into the registry's own root (settings.registry_artifact_root) and record a registry-relative URI. The registry verify endpoint resolves artifact_uri against its own root, which is separate from where /forecasting/train writes (settings.forecast_model_artifacts_dir). Pre-fix, verify returned 404 even though the artifact existed on disk.
3. step_agent: skip with the soft-skip glyph on any LLM provider failure (invalid key, model unavailable, 5xx), and make _llm_key_present provider-aware so it matches the right env var to the configured agent_default_model. Pre-fix, an .env with anthropic/openai keys but a Gemini default model failed hard at chat-time.
4.…

…191) (#192) * feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127) Resolves #94 via the heuristic path documented in the issue. No schema column, no Alembic migration, no FIFO cohort tracking — the trigger self-reads the existing per-(store, product) on_hand_qty series and fires when inventory has been "unrefreshed" past `cfg.age_days_threshold`. Decision rationale (schema column vs. heuristic): - The schema column path would add `oldest_unit_age_days` to `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO cohort tracking in `InventorySnapshotGenerator`. No downstream consumer reads this column today — adding it for one generator trigger violates the "don't design for hypothetical future requirements" rule in CLAUDE.md. - The heuristic path is self-contained in MarkdownGenerator, deterministic (preserves the zero-rng-draw regression invariant), and additive (no migration, no model change). 354 LOC net, all inside one slice. Heuristic spec: - A "refresh" is a day where `on_hand_qty` rose by >= `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day. - Age at day t = days since most recent refresh (or `dates[0]` if no refresh has been observed). - Firing requires age >= `age_days_threshold` AND on_hand >= `markdown_min_units_remaining` — never markdown an empty shelf. - After firing, refresh anchor resets to the day AFTER the markdown window ends, so back-to-back fires can't happen and the next age clock starts from a "clear shelf" baseline. Wiring: `MarkdownGenerator.generate()` gains an optional kwarg `inventory_records: list[dict[str, Any]] | None = None` which `core.py` passes through from `InventorySnapshotGenerator`. Disabled-path and non-age_days-path behavior is byte-identical (kwarg ignored). Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete `NotImplementedError` test. 
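The heuristic spec above can be sketched as a small pure function. This is an illustrative reading of the commit message, not the code in `MarkdownGenerator`: the function name `find_age_days_fires`, the `markdown_window_days` parameter, and the rule that any rise from an empty shelf counts as a refresh are all assumptions made for the example.

```python
from __future__ import annotations

from datetime import date, timedelta

# Mirrors the 0.3 (30% jump) value quoted for _AGE_DAYS_SPIKE_THRESHOLD.
SPIKE_THRESHOLD = 0.3


def _is_refresh(prev_qty: int, qty: int) -> bool:
    """True when on-hand quantity rose by at least 30% vs the prior day."""
    if prev_qty <= 0:
        # Assumption: any restock of an empty shelf counts as a refresh.
        return qty > 0
    return (qty - prev_qty) / prev_qty >= SPIKE_THRESHOLD


def find_age_days_fires(
    dates: list[date],
    on_hand: list[int],
    age_days_threshold: int,
    min_units_remaining: int,
    markdown_window_days: int,  # hypothetical parameter for this sketch
) -> list[date]:
    """Return the days on which the age trigger would fire for one series."""
    if not dates:
        return []
    fires: list[date] = []
    anchor = dates[0]  # no refresh observed yet: age counts from dates[0]
    for i, day in enumerate(dates):
        if day < anchor:
            # Inside a markdown window; this sketch skips those days entirely.
            continue
        if i > 0 and _is_refresh(on_hand[i - 1], on_hand[i]):
            anchor = day  # a spike resets the age clock
        age = (day - anchor).days
        if age >= age_days_threshold and on_hand[i] >= min_units_remaining:
            fires.append(day)
            # Reset to the day after the markdown window ends.
            anchor = day + timedelta(days=markdown_window_days)
    return fires
```

The function draws no random numbers, which is the property the zero-rng-draw invariant relies on; the same inputs always give the same fire dates.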
Coverage: no-records defensive, threshold not-met, threshold met, spike resets age, post-fire reset avoids back-to-back, low-inventory skip, unknown-product skip, rng non-consumption. Validation (local): - ruff check + format: clean - mypy --strict: 0 issues, 192 files - pyright --strict: 0 errors - pytest -m "not integration": 969 passed (+7 vs pre-PR) Closes #94. * feat(api,docs): e2e demo pipeline + showcase script (#128) (#129) * docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128) Adds the planning documents for the end-to-end demo pipeline work tracked in #128. Implementation commits follow on this branch. - INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics, open questions resolved in the PRP). - PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6 commits, additive only — no schema changes, no API edits). * feat(data): add demo_minimal scenario preset (#128) Tiny preset that powers the upcoming `make demo` target. Three stores × ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy an expanding backtest with n_splits=3, horizon=14, min_train_size=30 (needs >= 72 days, 92 leaves margin), small enough to keep the demo loop comfortable on a laptop. Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10, modest promotion + stockout probabilities) so backtest WAPE stays non-NaN across all three baseline models. - app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch - app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios - tests cover the new preset and the updated scenario count * feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128) Single-file async driver that walks the published HTTP surface (precheck -> reset? -> seed -> status -> features -> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the shape of scripts/seed_random.py and scripts/check_db.py. 
- HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s timeout (default 5 s is too short for /seeder/generate); surfaces RFC 7807 problem+json bodies as a typed StepError that echoes title / detail / request_id (never the raw body — secrets-safe). - DemoContext + StepOutcome dataclasses thread cross-step references. - Reporter renders the output-formatting.md glyphs (verbose by default, --quiet collapses to one line per step). - Per-step error handling converts httpx + StepError into fail outcomes; precheck failure exits 2, any other failure exits 1, green exits 0. - Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings() to honor the no-os.environ-in-feature-code rule. - Registry handshake uses the mandatory pending -> running -> success transition and the wire alias "model_config" (not "model_config_data"); artifact_hash is computed client-side via sha256 since we share the FS with the API on this single-host system. - Winner selection: lowest aggregated WAPE, skipping NaN folds. Also adds scripts/__init__.py so tests can `import scripts.run_demo` without invoking the file as a script. * feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128) Wraps scripts/run_demo.py so reviewers can run the full end-to-end demo with one command. Recipes mirror the three modes the script supports: full run, skip-seed iteration, destructive reset. Make targets: - demo — docker compose up -d + alembic + run_demo - demo-quick — run_demo --skip-seed (no compose/migration touch) - demo-clean — full reset (--reset) before seeding - help — default goal; lists targets + preconditions Tab-indented recipes and .PHONY declarations per make conventions. Preconditions (Postgres on :5433, uvicorn on :8123) documented in the help block; the script itself enforces them via the precheck step and exits 2 on failure. 
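The winner-selection rule described for scripts/run_demo.py (lowest aggregated WAPE, skipping NaN) might look roughly like the sketch below. The function name `select_winner`, the dict-based input shape, and the model names in the usage lines are assumptions for illustration; the script's actual helper and payloads may differ.

```python
from __future__ import annotations

import math


def select_winner(wape_by_model: dict[str, float]) -> str | None:
    """Pick the model with the lowest WAPE, ignoring NaN scores.

    Returns None when there is nothing to compare (empty input or all NaN).
    """
    finite = {
        model: wape
        for model, wape in wape_by_model.items()
        if not math.isnan(wape)
    }
    if not finite:
        return None
    return min(finite, key=lambda model: finite[model])


if __name__ == "__main__":
    # Hypothetical model names; the NaN entry is skipped, not treated as best.
    scores = {"naive": 0.42, "seasonal_naive": float("nan"), "moving_average": 0.31}
    print(select_winner(scores))
```

Filtering NaN before calling `min` matters because comparisons against NaN are always false, so an unfiltered `min` can return a NaN-scored model depending on iteration order.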
* test(api): unit + integration coverage for run_demo (#128) Unit (`tests/test_run_demo_unit.py`, 32 cases): - argparse defaults + all-flags variants - DemoContext defaults (no leaking state across runs) - _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None - _model_config_payload: discriminated-union shape per baseline; rejects unsupported model_type (defends the "no lightgbm in PRP-15" boundary) - Reporter: glyph mapping; verbose + quiet output; summary green / failure / over-budget soft-warn branches - StepError formats RFC 7807 (title/detail/request_id) without leaking the raw response body - HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError - Step payload sanity: seed sends demo_minimal+correct dims+ISO dates; features sends cutoff_date as ISO; train fires three model_types in parallel; agent step skips with ⏭️ when no LLM key Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration): - Skips if Postgres on :5433 isn't reachable - Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default) - Runs scripts/run_demo.py --reset against it; asserts exit 0 + canonical "runs=3 winner=... alias=demo-production" summary - Second case asserts a bogus URL exits 2 (no silent success) - Cleans up uvicorn on teardown with terminate/kill fallback - Resolves `uv` via shutil.which to keep ruff S607 happy and avoid PATH-dependent exec at test time * ci(repo): nightly e2e demo workflow (#128) Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py against a fresh Postgres+pgvector service every night at 07:00 UTC (plus on-demand via workflow_dispatch). Catches regressions in the documented end-to-end pipeline before they bleed into the per-PR gate. Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally NOT a required status check on dev or main. Flake-budget lives in the nightly slot, not in ci.yml. 
- pgvector/pgvector:pg16 service container (same as ci.yml `test` job) - uvicorn started in background; /health polled with a 30 s deadline - run_demo.py called with --seed 42 (deterministic) - LLM-key env vars intentionally absent — agent step auto-skips via ⏭️, keeping the workflow self-contained - uvicorn logs uploaded as artifact on failure (7-day retention) so postmortems can read what the API was doing when the script broke - astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md - permissions: contents: read (least-privilege) * docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128) Discoverability layer for PRP-15. - README.md: new 'Try it: end-to-end demo' step right after the curl /health verification; shows the canonical final-line summary so reviewers know what green looks like. - docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section documenting all three Make targets. - docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common Incidents entry with a 7-point diagnosis flow keyed to the script's step names + a postmortem-capture recipe. - docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows added to the Document Index table. Pure additive; no existing content removed or renamed. * fix(data): update /seeder/scenarios route test for demo_minimal preset (#128) Companion to feat(data): add demo_minimal scenario preset — the route-level assertion in TestListScenarios.test_returns_scenarios still expected 6 scenarios; bumping to 7 and adding the demo_minimal name membership check to match the service-layer + config-layer tests already updated in 005c189. * fix(api): harden run_demo for integration test + real DB (#128) Three real failures surfaced when first running the integration test against docker-compose Postgres + a freshly booted uvicorn; all three are now closed: scripts/run_demo.py: 1. 
step_status: discover the real (store_id, product_id) from /dimensions/stores + /dimensions/products instead of hardcoding 1. Postgres auto-increment does NOT reset after delete, so the freshly seeded IDs are NOT 1 (they were ~150-260 on this branch after a few delete/seed cycles). 2. step_register: copy the trained-model artifact into the registry's own root (settings.registry_artifact_root) and record a registry- relative URI. The registry verify endpoint resolves artifact_uri against its own root, which is separate from where /forecasting/train writes (settings.forecast_model_artifacts_dir). Pre-fix, verify returned 404 even though the artifact existed on disk. 3. step_agent: skip with the soft-skip glyph on any LLM provider failure (invalid key, model unavailable, 5xx), and make _llm_key_present provider-aware so it matches the right env var to the configured agent_default_model. Pre-fix, an .env with anthropic/openai keys but a Gemini default model failed hard at chat-time. 4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate for demo_minimal can spend 60-90 s on slower laptops once you include inventory + prices + promotions inserts. 5. step_seed detail string: GenerateResult.records_created uses 'sales' (singular), not 'sales_daily'; cosmetic fix. tests/test_e2e_demo.py: - Redirect uvicorn stdout to a temp file rather than subprocess.PIPE. The seeder + structlog produce enough INFO log volume to fill a 64-KB pipe buffer; once full, uvicorn blocks on write and seeder requests hang for the full --timeout. Verified locally: integration suite now passes in ~6.5 s instead of timing out at 120 s. - Cleanup leaves the log file on disk only when the test failed (postmortem-friendly). tests/test_run_demo_unit.py: - Bump test_defaults timeout expectation to match the new 120 s default. End-to-end manual run on this machine: 11 steps, wall_clock=2 s, exit 0. Integration test: 2 passed in 6.48 s. 
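The pipe-buffer hang described above is a general `subprocess` pitfall: a child that writes to `subprocess.PIPE` blocks once the buffer fills if nobody drains it. A minimal sketch of the temp-file alternative follows; the helper name `run_with_log_file` is hypothetical and the test suite's actual fixture is not shown in this PR text.

```python
from __future__ import annotations

import os
import subprocess
import sys
import tempfile


def run_with_log_file(cmd: list[str]) -> tuple[int, str]:
    """Run cmd with stdout/stderr redirected to a temp file, not a pipe.

    A file never fills up the way a pipe buffer does, so a chatty child
    cannot block on write while the parent is busy elsewhere.
    Returns (exit code, path to the log file); the caller owns the file.
    """
    with tempfile.NamedTemporaryFile("w+", suffix=".log", delete=False) as log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        returncode = proc.wait()
    return returncode, log.name


if __name__ == "__main__":
    # The child prints far more than 64 KB; with an undrained PIPE plus
    # wait() this pattern can deadlock, with a file it completes normally.
    code, path = run_with_log_file(
        [sys.executable, "-c", "print('x' * 200_000)"]
    )
    print(code, os.path.getsize(path))
    os.remove(path)  # keep the file instead when a postmortem is needed
```

Keeping the log on disk only for failed runs, as the commit message describes, then reduces to deciding whether to call `os.remove` after the assertions.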
* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131) Headline: - authlib 1.6.6 -> 1.7.2 (clears GHSA-wvwj-cvrp-7pv5 — JWS signature verification bypass; patched at >= 1.6.9) - fastmcp 2.14.4 -> 3.2.4 (clears GHSA-vv7q-7jx5-f767 — OpenAPI Provider SSRF + path traversal; patched at >= 3.2.0) Both CVEs were flagged on PR #129 by Socket Security and are pre-existing on dev (not introduced by #128). Wider scope — read before merging: `uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers a full re-resolve of the dependency graph. Because dev's uv.lock had drifted from pyproject.toml (the project's constraint envelope had loosened over time), this single command also brings the lockfile in sync with current pyproject.toml. Net diff: 243 insertions / 369 deletions on uv.lock; no other files touched. Transitive cascades worth flagging: - anthropic 0.77.0 -> 0.102.0 (pydantic-ai-slim extra) - pydantic-graph 1.51.0 -> 1.96.0 - temporalio 1.20.0 -> 1.27.2 - alembic 1.18.1 -> 1.18.4 - aws-* and cohere transitives bumped along - griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched) - Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus exporter, pydocket, redis, rsa, sortedcontainers — these were transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in. Verification on this host: - uv sync --extra dev -> green - ruff check . -> clean - mypy --strict app/ -> 192 files clean - pyright app/ -> 0 errors (50 warnings, pre-existing) - pytest -m 'not integration' -> 969 passed Known install quirk: griffelib 2.0.2 ships a top-level `griffe/` package whose RECORD files don't always materialize on first install when uv replaces an older `griffe` dist in the same sync. A clean venv install (which CI does via `uv sync --frozen`) is unaffected; local devs who upgrade in place may need a one-shot `uv pip install --force-reinstall griffelib` if `import griffe` fails. 
* feat(api,ui): in-product demo showcase page (#132) (#133) * feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132) New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It drives the published API surface in-process via httpx.ASGITransport (no cross-slice imports, satisfying the vertical-slice rule) and streams one StepEvent per pipeline step: precheck -> reset -> seed -> status -> features -> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup. A module-level asyncio.Lock enforces single-flight; concurrent runs get an RFC 7807 409. The orchestration is a faithful in-process port of scripts/run_demo.py (PR #129). Implements PRP-17. * test(api): cover the demo slice pipeline, routes, and e2e integration (#132) Unit tests mock the in-process HTTP client to exercise step sequencing, winner selection, and fail-fast; route tests cover POST /demo/run (200 + 409) and the WS /demo/stream handler. The integration test seeds demo_minimal and asserts an end-to-end green run against real Postgres. Implements PRP-17. * feat(ui): add showcase page streaming the live demo pipeline (#132) New /showcase route and nav entry. The page opens a one-shot WebSocket to /demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders the 11 pipeline steps as live status cards: glyph, detail, duration, the backtest per-model WAPE breakdown with the winner highlighted, and a pass/fail summary banner. Also block-scopes a pre-existing no-case-declarations lint error in chat.tsx so pnpm lint is green for this PR. Implements PRP-17. * test(ui): add vitest setup and use-demo-pipeline hook coverage (#132) Adds the frontend test stack (vitest + jsdom + @testing-library/react), a test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the pure event reducer (idle -> running -> pass transitions, summary assembly, error phase) and a renderHook smoke test. 
The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented fix for pnpm 11's esbuild build-script gate. Implements PRP-17. * docs(docs): document the demo slice and showcase page (#132) Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the /demo/run + /demo/stream rows and a WebSocket Events section in API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and REPO_MAP_INDEX rows for the demo slice and showcase page. Implements PRP-17. * chore(repo): back-merge main into dev to absorb v0.2.10 release commits (#136) (#137) * chore(main): release 0.2.9 (#126) * feat: release v0.2.10 — demo showcase page + e2e pipeline (#134)
* chore(main): release 0.2.10 (#135) * docs(docs): fix broken PRP-0/INITIAL-0 relative links in phase 0 doc (#138) (#139) * docs(repo): fix readme dev-deps command and stale .github template placeholders (#140) (#141) * docs(docs): fill DEV_GUIDE.md onboarding stub sections (#142) (#143) * docs(repo): refresh stale CLAUDE.md note, document /demo API, align PR template (#144) (#145) * fix(ui): chart series render black — drop hsl() wrapper on oklch chart vars (#149) (#153) index.css defines --chart-N twice: legacy shadcn-v3 HSL triplets and Tailwind-4 / shadcn-v4 complete oklch() colours. The oklch definitions win the cascade, so at runtime --chart-N is a full colour.
The chart components still wrapped it as hsl(var(--chart-N)) → hsl(oklch(...)), which is invalid CSS, so recharts fell back to a black fill/stroke — invisible on the dark theme. Reference var(--chart-N) directly in backtest-folds-chart.tsx and time-series-chart.tsx. Verified in a browser: the backtest per-fold bars and the forecast line now render in colour. * fix(jobs): backtest job result keeps fold metrics, stability and baselines (#148) (#152) * fix(jobs): backtest job result keeps fold metrics, stability and baselines (#148) _execute_backtest ran BacktestingService.run_backtest — which computes per-fold metrics, stability indices and a naive/seasonal baseline comparison — but stored only four aggregated values and discarded the rest. The dashboard (/visualize/backtest) reads aggregated_metrics.{*_mean, stability_index}, fold_metrics[] and baseline_comparison, so it showed "0 folds", all-zero metrics and an empty chart. Add _shape_backtest_result(), which flattens a BacktestResponse into the contract the dashboard expects, and _finite(), which coerces NaN/inf to 0.0 so the result stays JSONB-safe (stability is NaN with fewer than two folds). Add app/features/jobs/tests/test_service.py with unit coverage for the shaping logic: fold metrics, *_mean keys, stability, baseline comparison, the no-baselines path, and NaN coercion. * refactor(jobs): centralize backtest metric keys and surface drift (#148) Addresses review feedback on PR #152. - Hoist the dashboard's metric set into _BACKTEST_METRICS and the headline stability metric into _STABILITY_METRIC, so the hardcoded keys live in one documented place instead of being repeated across the shaping logic. - Log jobs.backtest_metrics_missing when an expected metric is absent from the backtest response, so a future rename in the backtesting service fails loud instead of silently emitting 0.0. - Document the WAPE stability convention in the _shape_backtest_result docstring. 
- Tests: assert backtest_id / model_type / duration_ms pass through unchanged, and add a regression test for the missing-metric default path. * fix(ui): forecast page reads forecasts/forecast from predict job result (#147) (#151) The /visualize/forecast page never rendered the chart for a valid completed predict job. It read job.result.predictions with field `predicted`, but POST /jobs (job_type="predict") returns job.result.forecasts with field `forecast`. forecastData was therefore always undefined and the page fell through to "No prediction data available in job result". Read result.forecasts with field `forecast`, and pass predictedKey="forecast" to TimeSeriesChart (which already supports a configurable data key). Verified in a browser: entering a completed predict job ID now renders the 14-day forecast line chart with correct tooltip values. * fix(registry): tolerate multiple matches in _find_duplicate (#146) (#150) Under the default registry_duplicate_policy="detect", duplicate runs are created intentionally, so multiple non-archived model_run rows can share one config hash. _find_duplicate used scalar_one_or_none(), which raised MultipleResultsFound once two duplicates existed — POST /registry/runs then returned HTTP 500. This made the demo/Showcase register step fail deterministically on any DB with repeated runs. Order the lookup by created_at desc, LIMIT 1, and use scalars().first() so it returns the most recent matching run instead of asserting a single match. Add an integration regression test that POSTs an identical run three times under the detect policy and asserts all three return 201. * fix(ui): derive TimeSeriesChart line stroke from the config key (#156) (#157) TimeSeriesChart builds chartConfig with dynamic keys ([actualKey] / [predictedKey]), so shadcn's ChartContainer injects --color-<key> CSS variables. The <Line> elements, however, hardcoded stroke="var(--color-actual)" and stroke="var(--color-predicted)". 
The forecast page passes predictedKey="forecast", so the injected variable is --color-forecast; var(--color-predicted) was undefined, the stroke was invalid, and SVG fell back to its initial value `none` — the forecast line was invisible. Build the stroke from the key: stroke={`var(--color-${actualKey})`} / stroke={`var(--color-${predictedKey})`}. Verified in a browser: the forecast line now renders in colour. * feat(ui): job picker dropdown on forecast and backtest pages (#154) (#155) The visualization pages only accepted a job ID typed into a text box, so users had to already know the ID. Add a JobPicker component: a dropdown of completed jobs of the relevant type (predict / backtest), newest first, with each option labelled by short id, model and timestamp. - New shared component src/components/common/job-picker.tsx, used by both forecast.tsx and backtest.tsx. - The manual job-ID input stays alongside the dropdown for pasting an ID. - The most recent completed job auto-loads on mount so a chart shows immediately without interaction. No backend change — GET /jobs?job_type=&status=completed already exists. Verified in a browser on both pages. * chore(repo): back-merge main into dev after v0.2.11 (#160) (#161) * chore(main): release 0.2.9 (#126) * feat: release v0.2.10 — demo showcase page + e2e pipeline (#134) * feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127) Resolves #94 via the heuristic path documented in the issue. No schema column, no Alembic migration, no FIFO cohort tracking — the trigger self-reads the existing per-(store, product) on_hand_qty series and fires when inventory has been "unrefreshed" past `cfg.age_days_threshold`. Decision rationale (schema column vs. heuristic): - The schema column path would add `oldest_unit_age_days` to `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO cohort tracking in `InventorySnapshotGenerator`. 
No downstream consumer reads this column today — adding it for one generator trigger violates the "don't design for hypothetical future requirements" rule in CLAUDE.md. - The heuristic path is self-contained in MarkdownGenerator, deterministic (preserves the zero-rng-draw regression invariant), and additive (no migration, no model change). 354 LOC net, all inside one slice. Heuristic spec: - A "refresh" is a day where `on_hand_qty` rose by >= `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day. - Age at day t = days since most recent refresh (or `dates[0]` if no refresh has been observed). - Firing requires age >= `age_days_threshold` AND on_hand >= `markdown_min_units_remaining` — never markdown an empty shelf. - After firing, refresh anchor resets to the day AFTER the markdown window ends, so back-to-back fires can't happen and the next age clock starts from a "clear shelf" baseline. Wiring: `MarkdownGenerator.generate()` gains an optional kwarg `inventory_records: list[dict[str, Any]] | None = None` which `core.py` passes through from `InventorySnapshotGenerator`. Disabled-path and non-age_days-path behavior is byte-identical (kwarg ignored). Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete `NotImplementedError` test. Coverage: no-records defensive, threshold not-met, threshold met, spike resets age, post-fire reset avoids back-to-back, low-inventory skip, unknown-product skip, rng non-consumption. Validation (local): - ruff check + format: clean - mypy --strict: 0 issues, 192 files - pyright --strict: 0 errors - pytest -m "not integration": 969 passed (+7 vs pre-PR) Closes #94. * feat(api,docs): e2e demo pipeline + showcase script (#128) (#129) * docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128) Adds the planning documents for the end-to-end demo pipeline work tracked in #128. Implementation commits follow on this branch. - INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics, open questions resolved in the PRP). 
- PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6 commits, additive only — no schema changes, no API edits). * feat(data): add demo_minimal scenario preset (#128) Tiny preset that powers the upcoming `make demo` target. Three stores × ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy an expanding backtest with n_splits=3, horizon=14, min_train_size=30 (needs >= 72 days, 92 leaves margin), small enough to keep the demo loop comfortable on a laptop. Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10, modest promotion + stockout probabilities) so backtest WAPE stays non-NaN across all three baseline models. - app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch - app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios - tests cover the new preset and the updated scenario count * feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128) Single-file async driver that walks the published HTTP surface (precheck -> reset? -> seed -> status -> features -> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the shape of scripts/seed_random.py and scripts/check_db.py. - HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s timeout (default 5 s is too short for /seeder/generate); surfaces RFC 7807 problem+json bodies as a typed StepError that echoes title / detail / request_id (never the raw body — secrets-safe). - DemoContext + StepOutcome dataclasses thread cross-step references. - Reporter renders the output-formatting.md glyphs (verbose by default, --quiet collapses to one line per step). - Per-step error handling converts httpx + StepError into fail outcomes; precheck failure exits 2, any other failure exits 1, green exits 0. - Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings() to honor the no-os.environ-in-feature-code rule. 
- Registry handshake uses the mandatory pending -> running -> success transition and the wire alias "model_config" (not "model_config_data"); artifact_hash is computed client-side via sha256 since we share the FS with the API on this single-host system. - Winner selection: lowest aggregated WAPE, skipping NaN folds. Also adds scripts/__init__.py so tests can `import scripts.run_demo` without invoking the file as a script. * feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128) Wraps scripts/run_demo.py so reviewers can run the full end-to-end demo with one command. Recipes mirror the three modes the script supports: full run, skip-seed iteration, destructive reset. Make targets: - demo — docker compose up -d + alembic + run_demo - demo-quick — run_demo --skip-seed (no compose/migration touch) - demo-clean — full reset (--reset) before seeding - help — default goal; lists targets + preconditions Tab-indented recipes and .PHONY declarations per make conventions. Preconditions (Postgres on :5433, uvicorn on :8123) documented in the help block; the script itself enforces them via the precheck step and exits 2 on failure. 
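The winner-selection rule from the run_demo.py commit above (lowest aggregated WAPE, NaN skipped) can be sketched in a few lines. This is a minimal illustration, not the script's actual `_select_winner`; the function name, the mapping shape, and the model names in the usage note are hypothetical.

```python
import math
from typing import Mapping, Optional


def select_winner(wape_by_model: Mapping[str, float]) -> Optional[str]:
    """Return the model with the lowest non-NaN WAPE, or None if there is none."""
    # Drop NaN scores first: NaN compares false against everything, so leaving
    # it in would make min() order-dependent.
    usable = {m: w for m, w in wape_by_model.items() if not math.isnan(w)}
    if not usable:
        # Covers both the empty input and the all-NaN input.
        return None
    return min(usable, key=usable.__getitem__)
```

For example, `select_winner({"naive": 0.31, "seasonal_naive": float("nan"), "moving_average": 0.27})` returns `"moving_average"`, while an empty or all-NaN mapping returns `None`.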
* test(api): unit + integration coverage for run_demo (#128) Unit (`tests/test_run_demo_unit.py`, 32 cases): - argparse defaults + all-flags variants - DemoContext defaults (no leaking state across runs) - _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None - _model_config_payload: discriminated-union shape per baseline; rejects unsupported model_type (defends the "no lightgbm in PRP-15" boundary) - Reporter: glyph mapping; verbose + quiet output; summary green / failure / over-budget soft-warn branches - StepError formats RFC 7807 (title/detail/request_id) without leaking the raw response body - HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError - Step payload sanity: seed sends demo_minimal+correct dims+ISO dates; features sends cutoff_date as ISO; train fires three model_types in parallel; agent step skips with ⏭️ when no LLM key Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration): - Skips if Postgres on :5433 isn't reachable - Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default) - Runs scripts/run_demo.py --reset against it; asserts exit 0 + canonical "runs=3 winner=... alias=demo-production" summary - Second case asserts a bogus URL exits 2 (no silent success) - Cleans up uvicorn on teardown with terminate/kill fallback - Resolves `uv` via shutil.which to keep ruff S607 happy and avoid PATH-dependent exec at test time * ci(repo): nightly e2e demo workflow (#128) Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py against a fresh Postgres+pgvector service every night at 07:00 UTC (plus on-demand via workflow_dispatch). Catches regressions in the documented end-to-end pipeline before they bleed into the per-PR gate. Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally NOT a required status check on dev or main. Flake-budget lives in the nightly slot, not in ci.yml. 
- pgvector/pgvector:pg16 service container (same as ci.yml `test` job) - uvicorn started in background; /health polled with a 30 s deadline - run_demo.py called with --seed 42 (deterministic) - LLM-key env vars intentionally absent, so the agent step auto-skips via ⏭️ and the workflow stays self-contained - uvicorn logs uploaded as artifact on failure (7-day retention) so postmortems can read what the API was doing when the script broke - astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md - permissions: contents: read (least-privilege) * docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128) Discoverability layer for PRP-15. - README.md: new 'Try it: end-to-end demo' step right after the curl /health verification; shows the canonical final-line summary so reviewers know what green looks like. - docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section documenting all three Make targets. - docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common Incidents entry with a 7-point diagnosis flow keyed to the script's step names + a postmortem-capture recipe. - docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows added to the Document Index table. Purely additive; no existing content removed or renamed. * fix(data): update /seeder/scenarios route test for demo_minimal preset (#128) Companion to feat(data): add demo_minimal scenario preset. The route-level assertion in TestListScenarios.test_returns_scenarios still expected 6 scenarios; this bumps it to 7 and adds the demo_minimal name membership check, matching the service-layer + config-layer tests already updated in 005c189. * fix(api): harden run_demo for integration test + real DB (#128) Three real failures surfaced when first running the integration test against docker-compose Postgres + a freshly booted uvicorn; all three are now closed, along with a few smaller adjustments: scripts/run_demo.py: 1.
step_status: discover the real (store_id, product_id) from /dimensions/stores + /dimensions/products instead of hardcoding 1. Postgres auto-increment does NOT reset after delete, so the freshly seeded IDs are NOT 1 (they were ~150-260 on this branch after a few delete/seed cycles). 2. step_register: copy the trained-model artifact into the registry's own root (settings.registry_artifact_root) and record a registry-relative URI. The registry verify endpoint resolves artifact_uri against its own root, which is separate from where /forecasting/train writes (settings.forecast_model_artifacts_dir). Pre-fix, verify returned 404 even though the artifact existed on disk. 3. step_agent: skip with the soft-skip glyph on any LLM provider failure (invalid key, model unavailable, 5xx), and make _llm_key_present provider-aware so it matches the right env var to the configured agent_default_model. Pre-fix, an .env with anthropic/openai keys but a Gemini default model failed hard at chat-time. 4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate for demo_minimal can spend 60-90 s on slower laptops once you include inventory + prices + promotions inserts. 5. step_seed detail string: GenerateResult.records_created uses 'sales' (singular), not 'sales_daily'; cosmetic fix. tests/test_e2e_demo.py: - Redirect uvicorn stdout to a temp file rather than subprocess.PIPE. The seeder + structlog produce enough INFO log volume to fill a 64-KB pipe buffer; once full, uvicorn blocks on write and seeder requests hang for the full --timeout. Verified locally: integration suite now passes in ~6.5 s instead of timing out at 120 s. - Cleanup leaves the log file on disk only when the test failed (postmortem-friendly). tests/test_run_demo_unit.py: - Bump test_defaults timeout expectation to match the new 120 s default. End-to-end manual run on this machine: 11 steps, wall_clock=2 s, exit 0. Integration test: 2 passed in 6.48 s.
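The pipe-buffer hang described for tests/test_e2e_demo.py above is a general subprocess pitfall: a child writing to `subprocess.PIPE` blocks once the buffer fills if nobody drains it. Below is a minimal sketch of the temp-file redirect, assuming a generic child command; `run_with_log_file` and the chatty child are hypothetical stand-ins, not the test's actual fixture.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List, Tuple


def run_with_log_file(cmd: List[str], timeout_s: float = 30.0) -> Tuple[int, Path]:
    """Run cmd with stdout/stderr sent to a temp file instead of a pipe."""
    # delete=False keeps the log on disk after close, so a failing test can
    # leave it behind for a postmortem.
    log = tempfile.NamedTemporaryFile(prefix="server-", suffix=".log", delete=False)
    with log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        try:
            returncode = proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
            raise
    return returncode, Path(log.name)


# A child that writes 256 KB without anyone reading it: well past a 64-KB
# pipe buffer, but harmless when the destination is a file.
chatty = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 262144)"]
code, log_path = run_with_log_file(chatty)
```

A file never exerts back-pressure on the writer, which is why the redirect removes the hang; the caller decides whether to delete `log_path` on success.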
* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131) Headline: - authlib 1.6.6 -> 1.7.2 (clears GHSA-wvwj-cvrp-7pv5 — JWS signature verification bypass; patched at >= 1.6.9) - fastmcp 2.14.4 -> 3.2.4 (clears GHSA-vv7q-7jx5-f767 — OpenAPI Provider SSRF + path traversal; patched at >= 3.2.0) Both CVEs were flagged on PR #129 by Socket Security and are pre-existing on dev (not introduced by #128). Wider scope — read before merging: `uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers a full re-resolve of the dependency graph. Because dev's uv.lock had drifted from pyproject.toml (the project's constraint envelope had loosened over time), this single command also brings the lockfile in sync with current pyproject.toml. Net diff: 243 insertions / 369 deletions on uv.lock; no other files touched. Transitive cascades worth flagging: - anthropic 0.77.0 -> 0.102.0 (pydantic-ai-slim extra) - pydantic-graph 1.51.0 -> 1.96.0 - temporalio 1.20.0 -> 1.27.2 - alembic 1.18.1 -> 1.18.4 - aws-* and cohere transitives bumped along - griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched) - Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus exporter, pydocket, redis, rsa, sortedcontainers — these were transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in. Verification on this host: - uv sync --extra dev -> green - ruff check . -> clean - mypy --strict app/ -> 192 files clean - pyright app/ -> 0 errors (50 warnings, pre-existing) - pytest -m 'not integration' -> 969 passed Known install quirk: griffelib 2.0.2 ships a top-level `griffe/` package whose RECORD files don't always materialize on first install when uv replaces an older `griffe` dist in the same sync. A clean venv install (which CI does via `uv sync --frozen`) is unaffected; local devs who upgrade in place may need a one-shot `uv pip install --force-reinstall griffelib` if `import griffe` fails. 
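A quick local check that an environment actually sits at or above the patched floors quoted above (authlib >= 1.6.9, fastmcp >= 3.2.0) might look like the sketch below. The helper names are hypothetical, and the parser only handles plain dotted releases; a real audit would use a proper version parser.

```python
from importlib import metadata
from typing import Dict, Tuple

# Patched floors as quoted in the commit message above.
PATCHED_FLOORS: Dict[str, str] = {"authlib": "1.6.9", "fastmcp": "3.2.0"}


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a plain dotted release such as '1.7.2' (no pre/post-release handling)."""
    return tuple(int(part) for part in text.split("."))


def meets_floor(installed: str, floor: str) -> bool:
    """True when the installed release is at or above the patched floor."""
    return parse_version(installed) >= parse_version(floor)


def audit_installed() -> Dict[str, str]:
    """Report, per package, whether the installed version clears its floor."""
    report: Dict[str, str] = {}
    for name, floor in PATCHED_FLOORS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        try:
            report[name] = "ok" if meets_floor(installed, floor) else f"below {floor}"
        except ValueError:
            # Version string was not a plain dotted release.
            report[name] = f"unparsed ({installed})"
    return report
```

With the versions from the bump, `meets_floor("1.7.2", "1.6.9")` and `meets_floor("3.2.4", "3.2.0")` are both true, while the pre-bump `1.6.6` and `2.14.4` fall below their floors.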
* feat(api,ui): in-product demo showcase page (#132) (#133) * chore(main): release 0.2.10 (#135) * feat: release v0.2.11 — visualization fixes, job picker, demo showcase (#158)

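The NaN/inf coercion described in the backtest job result fix (#148) earlier in this log can be illustrated as follows. This is a sketch in the spirit of `_finite()`, not its actual body; the function name and the metric values are hypothetical, and `wape_mean` is only an example of the `*_mean` key pattern.

```python
import json
import math


def finite_or_zero(value: float) -> float:
    """Coerce NaN and +/-inf to 0.0; pass finite values through unchanged."""
    return float(value) if math.isfinite(value) else 0.0


# Stability is NaN with fewer than two folds, per the commit message.
metrics = {"wape_mean": 0.27, "stability_index": float("nan")}
safe = {key: finite_or_zero(val) for key, val in metrics.items()}

# allow_nan=False makes json.dumps reject NaN/inf instead of emitting the
# non-standard tokens that a strict JSON consumer would refuse.
payload = json.dumps(safe, allow_nan=False)
```

Serialising the raw `metrics` dict with `allow_nan=False` raises `ValueError`, which is the failure mode the coercion avoids.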
* feat: cut v0.2.13 — explorer interactivity, knowledge & guide pages (#191) (#192) * feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127) Resolves #94 via the heuristic path documented in the issue. No schema column, no Alembic migration, no FIFO cohort tracking — the trigger self-reads the existing per-(store, product) on_hand_qty series and fires when inventory has been "unrefreshed" past `cfg.age_days_threshold`. Decision rationale (schema column vs. heuristic): - The schema column path would add `oldest_unit_age_days` to `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO cohort tracking in `InventorySnapshotGenerator`. No downstream consumer reads this column today — adding it for one generator trigger violates the "don't design for hypothetical future requirements" rule in CLAUDE.md. - The heuristic path is self-contained in MarkdownGenerator, deterministic (preserves the zero-rng-draw regression invariant), and additive (no migration, no model change). 354 LOC net, all inside one slice. Heuristic spec: - A "refresh" is a day where `on_hand_qty` rose by >= `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day. - Age at day t = days since most recent refresh (or `dates[0]` if no refresh has been observed). - Firing requires age >= `age_days_threshold` AND on_hand >= `markdown_min_units_remaining` — never markdown an empty shelf. - After firing, refresh anchor resets to the day AFTER the markdown window ends, so back-to-back fires can't happen and the next age clock starts from a "clear shelf" baseline. Wiring: `MarkdownGenerator.generate()` gains an optional kwarg `inventory_records: list[dict[str, Any]] | None = None` which `core.py` passes through from `InventorySnapshotGenerator`. Disabled-path and non-age_days-path behavior is byte-identical (kwarg ignored). Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete `NotImplementedError` test. 
Coverage: no-records defensive, threshold not-met, threshold met, spike resets age, post-fire reset avoids back-to-back, low-inventory skip, unknown-product skip, rng non-consumption. Validation (local): - ruff check + format: clean - mypy --strict: 0 issues, 192 files - pyright --strict: 0 errors - pytest -m "not integration": 969 passed (+7 vs pre-PR) Closes #94. * feat(api,docs): e2e demo pipeline + showcase script (#128) (#129) * docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128) Adds the planning documents for the end-to-end demo pipeline work tracked in #128. Implementation commits follow on this branch. - INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics, open questions resolved in the PRP). - PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6 commits, additive only — no schema changes, no API edits). * feat(data): add demo_minimal scenario preset (#128) Tiny preset that powers the upcoming `make demo` target. Three stores × ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy an expanding backtest with n_splits=3, horizon=14, min_train_size=30 (needs >= 72 days, 92 leaves margin), small enough to keep the demo loop comfortable on a laptop. Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10, modest promotion + stockout probabilities) so backtest WAPE stays non-NaN across all three baseline models. - app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch - app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios - tests cover the new preset and the updated scenario count * feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128) Single-file async driver that walks the published HTTP surface (precheck -> reset? -> seed -> status -> features -> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the shape of scripts/seed_random.py and scripts/check_db.py. 
- HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s timeout (default 5 s is too short for /seeder/generate); surfaces RFC 7807 problem+json bodies as a typed StepError that echoes title / detail / request_id (never the raw body — secrets-safe). - DemoContext + StepOutcome dataclasses thread cross-step references. - Reporter renders the output-formatting.md glyphs (verbose by default, --quiet collapses to one line per step). - Per-step error handling converts httpx + StepError into fail outcomes; precheck failure exits 2, any other failure exits 1, green exits 0. - Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings() to honor the no-os.environ-in-feature-code rule. - Registry handshake uses the mandatory pending -> running -> success transition and the wire alias "model_config" (not "model_config_data"); artifact_hash is computed client-side via sha256 since we share the FS with the API on this single-host system. - Winner selection: lowest aggregated WAPE, skipping NaN folds. Also adds scripts/__init__.py so tests can `import scripts.run_demo` without invoking the file as a script. * feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128) Wraps scripts/run_demo.py so reviewers can run the full end-to-end demo with one command. Recipes mirror the three modes the script supports: full run, skip-seed iteration, destructive reset. Make targets: - demo — docker compose up -d + alembic + run_demo - demo-quick — run_demo --skip-seed (no compose/migration touch) - demo-clean — full reset (--reset) before seeding - help — default goal; lists targets + preconditions Tab-indented recipes and .PHONY declarations per make conventions. Preconditions (Postgres on :5433, uvicorn on :8123) documented in the help block; the script itself enforces them via the precheck step and exits 2 on failure. 
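The winner-selection rule described for scripts/run_demo.py above (lowest aggregated WAPE, NaN skipped, no winner when nothing is usable) might look roughly like the sketch below. The function name and the model names in the usage lines are hypothetical; the real `_select_winner` signature is not shown in this PR.

```python
from __future__ import annotations

import math


def select_winner(wape_by_model: dict[str, float]) -> str | None:
    """Pick the model with the lowest aggregated WAPE, ignoring NaN scores."""
    usable = {model: wape for model, wape in wape_by_model.items() if not math.isnan(wape)}
    if not usable:
        return None  # empty input or all-NaN: there is no winner
    return min(usable, key=usable.__getitem__)


# Hypothetical model names, for illustration only.
print(select_winner({"naive": 0.31, "seasonal_naive": 0.27, "moving_average": float("nan")}))
print(select_winner({}))
```

Filtering NaN before calling `min` matters because comparisons against NaN are always false, so a NaN score could otherwise win or lose depending on dictionary order.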
* test(api): unit + integration coverage for run_demo (#128) Unit (`tests/test_run_demo_unit.py`, 32 cases): - argparse defaults + all-flags variants - DemoContext defaults (no leaking state across runs) - _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None - _model_config_payload: discriminated-union shape per baseline; rejects unsupported model_type (defends the "no lightgbm in PRP-15" boundary) - Reporter: glyph mapping; verbose + quiet output; summary green / failure / over-budget soft-warn branches - StepError formats RFC 7807 (title/detail/request_id) without leaking the raw response body - HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError - Step payload sanity: seed sends demo_minimal+correct dims+ISO dates; features sends cutoff_date as ISO; train fires three model_types in parallel; agent step skips with ⏭️ when no LLM key Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration): - Skips if Postgres on :5433 isn't reachable - Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default) - Runs scripts/run_demo.py --reset against it; asserts exit 0 + canonical "runs=3 winner=... alias=demo-production" summary - Second case asserts a bogus URL exits 2 (no silent success) - Cleans up uvicorn on teardown with terminate/kill fallback - Resolves `uv` via shutil.which to keep ruff S607 happy and avoid PATH-dependent exec at test time * ci(repo): nightly e2e demo workflow (#128) Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py against a fresh Postgres+pgvector service every night at 07:00 UTC (plus on-demand via workflow_dispatch). Catches regressions in the documented end-to-end pipeline before they bleed into the per-PR gate. Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally NOT a required status check on dev or main. Flake-budget lives in the nightly slot, not in ci.yml. 
- pgvector/pgvector:pg16 service container (same as ci.yml `test` job) - uvicorn started in background; /health polled with a 30 s deadline - run_demo.py called with --seed 42 (deterministic) - LLM-key env vars intentionally absent — agent step auto-skips via ⏭️, keeping the workflow self-contained - uvicorn logs uploaded as artifact on failure (7-day retention) so postmortems can read what the API was doing when the script broke - astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md - permissions: contents: read (least-privilege) * docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128) Discoverability layer for PRP-15. - README.md: new 'Try it: end-to-end demo' step right after the curl /health verification; shows the canonical final-line summary so reviewers know what green looks like. - docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section documenting all three Make targets. - docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common Incidents entry with a 7-point diagnosis flow keyed to the script's step names + a postmortem-capture recipe. - docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows added to the Document Index table. Pure additive; no existing content removed or renamed. * fix(data): update /seeder/scenarios route test for demo_minimal preset (#128) Companion to feat(data): add demo_minimal scenario preset — the route-level assertion in TestListScenarios.test_returns_scenarios still expected 6 scenarios; bumping to 7 and adding the demo_minimal name membership check to match the service-layer + config-layer tests already updated in 005c189. * fix(api): harden run_demo for integration test + real DB (#128) Three real failures surfaced when first running the integration test against docker-compose Postgres + a freshly booted uvicorn; all three are now closed: scripts/run_demo.py: 1. 
step_status: discover the real (store_id, product_id) from /dimensions/stores + /dimensions/products instead of hardcoding 1. Postgres auto-increment does NOT reset after delete, so the freshly seeded IDs are NOT 1 (they were ~150-260 on this branch after a few delete/seed cycles). 2. step_register: copy the trained-model artifact into the registry's own root (settings.registry_artifact_root) and record a registry- relative URI. The registry verify endpoint resolves artifact_uri against its own root, which is separate from where /forecasting/train writes (settings.forecast_model_artifacts_dir). Pre-fix, verify returned 404 even though the artifact existed on disk. 3. step_agent: skip with the soft-skip glyph on any LLM provider failure (invalid key, model unavailable, 5xx), and make _llm_key_present provider-aware so it matches the right env var to the configured agent_default_model. Pre-fix, an .env with anthropic/openai keys but a Gemini default model failed hard at chat-time. 4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate for demo_minimal can spend 60-90 s on slower laptops once you include inventory + prices + promotions inserts. 5. step_seed detail string: GenerateResult.records_created uses 'sales' (singular), not 'sales_daily'; cosmetic fix. tests/test_e2e_demo.py: - Redirect uvicorn stdout to a temp file rather than subprocess.PIPE. The seeder + structlog produce enough INFO log volume to fill a 64-KB pipe buffer; once full, uvicorn blocks on write and seeder requests hang for the full --timeout. Verified locally: integration suite now passes in ~6.5 s instead of timing out at 120 s. - Cleanup leaves the log file on disk only when the test failed (postmortem-friendly). tests/test_run_demo_unit.py: - Bump test_defaults timeout expectation to match the new 120 s default. End-to-end manual run on this machine: 11 steps, wall_clock=2 s, exit 0. Integration test: 2 passed in 6.48 s. 
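The pipe-buffer fix in tests/test_e2e_demo.py can be illustrated with a minimal sketch: send the child's output to a temp file so a chatty process never blocks on a full pipe that nobody reads. The helper name and the short-lived child command are assumptions for the example; the real test boots a long-running uvicorn instead.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List, Tuple


def run_with_log_file(cmd: List[str], timeout_s: float = 30.0) -> Tuple[int, Path]:
    """Run cmd with stdout/stderr redirected to a temp file instead of a pipe."""
    log = tempfile.NamedTemporaryFile(prefix="child-", suffix=".log", delete=False)
    with log:
        # With stdout=subprocess.PIPE and no reader, the child would block once
        # the OS pipe buffer fills; a regular file has no such limit.
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        try:
            returncode = proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
            raise
    return returncode, Path(log.name)


if __name__ == "__main__":
    # The child writes well over 64 KB, enough to stall an unread pipe.
    code, log_path = run_with_log_file([sys.executable, "-c", "print('x' * 200_000)"])
    print(code, log_path.stat().st_size)
    log_path.unlink()  # the real test keeps the log only when the test failed
```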
* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131)

Headline:
- authlib 1.6.6 -> 1.7.2 (clears GHSA-wvwj-cvrp-7pv5, a JWS signature verification bypass; patched at >= 1.6.9)
- fastmcp 2.14.4 -> 3.2.4 (clears GHSA-vv7q-7jx5-f767, an OpenAPI Provider SSRF + path traversal; patched at >= 3.2.0)

Both CVEs were flagged on PR #129 by Socket Security and are pre-existing on dev (not introduced by #128).

Wider scope (read before merging): `uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers a full re-resolve of the dependency graph. Because dev's uv.lock had drifted from pyproject.toml (the project's constraint envelope had loosened over time), this single command also brings the lockfile in sync with current pyproject.toml. Net diff: 243 insertions / 369 deletions on uv.lock; no other files touched.

Transitive cascades worth flagging:
- anthropic 0.77.0 -> 0.102.0 (pydantic-ai-slim extra)
- pydantic-graph 1.51.0 -> 1.96.0
- temporalio 1.20.0 -> 1.27.2
- alembic 1.18.1 -> 1.18.4
- aws-* and cohere transitives bumped along
- griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched)
- Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus exporter, pydocket, redis, rsa, sortedcontainers. These were transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in.

Verification on this host:
- uv sync --extra dev -> green
- ruff check . -> clean
- mypy --strict app/ -> 192 files clean
- pyright app/ -> 0 errors (50 warnings, pre-existing)
- pytest -m 'not integration' -> 969 passed

Known install quirk: griffelib 2.0.2 ships a top-level `griffe/` package whose RECORD files don't always materialize on first install when uv replaces an older `griffe` dist in the same sync. A clean venv install (which CI does via `uv sync --frozen`) is unaffected; local devs who upgrade in place may need a one-shot `uv pip install --force-reinstall griffelib` if `import griffe` fails.
* feat(api,ui): in-product demo showcase page (#132) (#133) * feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132) New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It drives the published API surface in-process via httpx.ASGITransport (no cross-slice imports, satisfying the vertical-slice rule) and streams one StepEvent per pipeline step: precheck -> reset -> seed -> status -> features -> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup. A module-level asyncio.Lock enforces single-flight; concurrent runs get an RFC 7807 409. The orchestration is a faithful in-process port of scripts/run_demo.py (PR #129). Implements PRP-17. * test(api): cover the demo slice pipeline, routes, and e2e integration (#132) Unit tests mock the in-process HTTP client to exercise step sequencing, winner selection, and fail-fast; route tests cover POST /demo/run (200 + 409) and the WS /demo/stream handler. The integration test seeds demo_minimal and asserts an end-to-end green run against real Postgres. Implements PRP-17. * feat(ui): add showcase page streaming the live demo pipeline (#132) New /showcase route and nav entry. The page opens a one-shot WebSocket to /demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders the 11 pipeline steps as live status cards: glyph, detail, duration, the backtest per-model WAPE breakdown with the winner highlighted, and a pass/fail summary banner. Also block-scopes a pre-existing no-case-declarations lint error in chat.tsx so pnpm lint is green for this PR. Implements PRP-17. * test(ui): add vitest setup and use-demo-pipeline hook coverage (#132) Adds the frontend test stack (vitest + jsdom + @testing-library/react), a test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the pure event reducer (idle -> running -> pass transitions, summary assembly, error phase) and a renderHook smoke test. 
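The single-flight behaviour described for the demo slice above (a module-level `asyncio.Lock`, with concurrent runs rejected) could be sketched as follows. The exception class and function name are hypothetical; in the real slice the rejection surfaces as an RFC 7807 409 response rather than a bare exception.

```python
import asyncio
from typing import Any, Awaitable, Callable

_run_lock = asyncio.Lock()  # module-level: one demo run per process at a time


class DemoAlreadyRunning(Exception):
    """Hypothetical error that a route handler would map to a 409 problem response."""


async def run_single_flight(pipeline: Callable[[], Awaitable[Any]]) -> Any:
    """Run the pipeline unless another run already holds the lock."""
    # locked() followed by an uncontended acquire has no suspension point in
    # between, so two tasks on the same event loop cannot both pass this check.
    if _run_lock.locked():
        raise DemoAlreadyRunning("a demo run is already in progress")
    async with _run_lock:
        return await pipeline()
```

Rejecting instead of queueing keeps a second caller from silently waiting behind a long pipeline run, which fits the fail-fast 409 the commit describes.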
The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented fix for pnpm 11's esbuild build-script gate. Implements PRP-17. * docs(docs): document the demo slice and showcase page (#132) Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the /demo/run + /demo/stream rows and a WebSocket Events section in API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and REPO_MAP_INDEX rows for the demo slice and showcase page. Implements PRP-17. * chore(repo): back-merge main into dev to absorb v0.2.10 release commits (#136) (#137) * chore(main): release 0.2.9 (#126) * feat: release v0.2.10 — demo showcase page + e2e pipeline (#134)
* chore(main): release 0.2.10 (#135) * docs(docs): fix broken PRP-0/INITIAL-0 relative links in phase 0 doc (#138) (#139) * docs(repo): fix readme dev-deps command and stale .github template placeholders (#140) (#141) * docs(docs): fill DEV_GUIDE.md onboarding stub sections (#142) (#143) * docs(repo): refresh stale CLAUDE.md note, document /demo API, align PR template (#144) (#145) * fix(ui): chart series render black — drop hsl() wrapper on oklch chart vars (#149) (#153) index.css defines --chart-N twice: legacy shadcn-v3 HSL triplets and Tailwind-4 / shadcn-v4 complete oklch() colours. The oklch definitions win the cascade, so at runtime --chart-N is a full colour.
The chart components still wrapped it as hsl(var(--chart-N)) → hsl(oklch(...)), which is invalid CSS, so recharts fell back to a black fill/stroke — invisible on the dark theme. Reference var(--chart-N) directly in backtest-folds-chart.tsx and time-series-chart.tsx. Verified in a browser: the backtest per-fold bars and the forecast line now render in colour. * fix(jobs): backtest job result keeps fold metrics, stability and baselines (#148) (#152) * fix(jobs): backtest job result keeps fold metrics, stability and baselines (#148) _execute_backtest ran BacktestingService.run_backtest — which computes per-fold metrics, stability indices and a naive/seasonal baseline comparison — but stored only four aggregated values and discarded the rest. The dashboard (/visualize/backtest) reads aggregated_metrics.{*_mean, stability_index}, fold_metrics[] and baseline_comparison, so it showed "0 folds", all-zero metrics and an empty chart. Add _shape_backtest_result(), which flattens a BacktestResponse into the contract the dashboard expects, and _finite(), which coerces NaN/inf to 0.0 so the result stays JSONB-safe (stability is NaN with fewer than two folds). Add app/features/jobs/tests/test_service.py with unit coverage for the shaping logic: fold metrics, *_mean keys, stability, baseline comparison, the no-baselines path, and NaN coercion. * refactor(jobs): centralize backtest metric keys and surface drift (#148) Addresses review feedback on PR #152. - Hoist the dashboard's metric set into _BACKTEST_METRICS and the headline stability metric into _STABILITY_METRIC, so the hardcoded keys live in one documented place instead of being repeated across the shaping logic. - Log jobs.backtest_metrics_missing when an expected metric is absent from the backtest response, so a future rename in the backtesting service fails loud instead of silently emitting 0.0. - Document the WAPE stability convention in the _shape_backtest_result docstring. 
- Tests: assert backtest_id / model_type / duration_ms pass through unchanged, and add a regression test for the missing-metric default path. * fix(ui): forecast page reads forecasts/forecast from predict job result (#147) (#151) The /visualize/forecast page never rendered the chart for a valid completed predict job. It read job.result.predictions with field `predicted`, but POST /jobs (job_type="predict") returns job.result.forecasts with field `forecast`. forecastData was therefore always undefined and the page fell through to "No prediction data available in job result". Read result.forecasts with field `forecast`, and pass predictedKey="forecast" to TimeSeriesChart (which already supports a configurable data key). Verified in a browser: entering a completed predict job ID now renders the 14-day forecast line chart with correct tooltip values. * fix(registry): tolerate multiple matches in _find_duplicate (#146) (#150) Under the default registry_duplicate_policy="detect", duplicate runs are created intentionally, so multiple non-archived model_run rows can share one config hash. _find_duplicate used scalar_one_or_none(), which raised MultipleResultsFound once two duplicates existed — POST /registry/runs then returned HTTP 500. This made the demo/Showcase register step fail deterministically on any DB with repeated runs. Order the lookup by created_at desc, LIMIT 1, and use scalars().first() so it returns the most recent matching run instead of asserting a single match. Add an integration regression test that POSTs an identical run three times under the detect policy and asserts all three return 201. * fix(ui): derive TimeSeriesChart line stroke from the config key (#156) (#157) TimeSeriesChart builds chartConfig with dynamic keys ([actualKey] / [predictedKey]), so shadcn's ChartContainer injects --color-<key> CSS variables. The <Line> elements, however, hardcoded stroke="var(--color-actual)" and stroke="var(--color-predicted)". 
The forecast page passes predictedKey="forecast", so the injected variable is --color-forecast; var(--color-predicted) was undefined, the stroke was invalid, and SVG fell back to its initial value `none` — the forecast line was invisible. Build the stroke from the key: stroke={`var(--color-${actualKey})`} / stroke={`var(--color-${predictedKey})`}. Verified in a browser: the forecast line now renders in colour. * feat(ui): job picker dropdown on forecast and backtest pages (#154) (#155) The visualization pages only accepted a job ID typed into a text box, so users had to already know the ID. Add a JobPicker component: a dropdown of completed jobs of the relevant type (predict / backtest), newest first, with each option labelled by short id, model and timestamp. - New shared component src/components/common/job-picker.tsx, used by both forecast.tsx and backtest.tsx. - The manual job-ID input stays alongside the dropdown for pasting an ID. - The most recent completed job auto-loads on mount so a chart shows immediately without interaction. No backend change — GET /jobs?job_type=&status=completed already exists. Verified in a browser on both pages. * chore(repo): back-merge main into dev after v0.2.11 (#160) (#161) * chore(main): release 0.2.9 (#126) * feat: release v0.2.10 — demo showcase page + e2e pipeline (#134) * feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127) Resolves #94 via the heuristic path documented in the issue. No schema column, no Alembic migration, no FIFO cohort tracking — the trigger self-reads the existing per-(store, product) on_hand_qty series and fires when inventory has been "unrefreshed" past `cfg.age_days_threshold`. Decision rationale (schema column vs. heuristic): - The schema column path would add `oldest_unit_age_days` to `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO cohort tracking in `InventorySnapshotGenerator`. 
No downstream consumer reads this column today — adding it for one generator trigger violates the "don't design for hypothetical future requirements" rule in CLAUDE.md. - The heuristic path is self-contained in MarkdownGenerator, deterministic (preserves the zero-rng-draw regression invariant), and additive (no migration, no model change). 354 LOC net, all inside one slice. Heuristic spec: - A "refresh" is a day where `on_hand_qty` rose by >= `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day. - Age at day t = days since most recent refresh (or `dates[0]` if no refresh has been observed). - Firing requires age >= `age_days_threshold` AND on_hand >= `markdown_min_units_remaining` — never markdown an empty shelf. - After firing, refresh anchor resets to the day AFTER the markdown window ends, so back-to-back fires can't happen and the next age clock starts from a "clear shelf" baseline. Wiring: `MarkdownGenerator.generate()` gains an optional kwarg `inventory_records: list[dict[str, Any]] | None = None` which `core.py` passes through from `InventorySnapshotGenerator`. Disabled-path and non-age_days-path behavior is byte-identical (kwarg ignored). Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete `NotImplementedError` test. Coverage: no-records defensive, threshold not-met, threshold met, spike resets age, post-fire reset avoids back-to-back, low-inventory skip, unknown-product skip, rng non-consumption. Validation (local): - ruff check + format: clean - mypy --strict: 0 issues, 192 files - pyright --strict: 0 errors - pytest -m "not integration": 969 passed (+7 vs pre-PR) Closes #94. * feat(api,docs): e2e demo pipeline + showcase script (#128) (#129) * docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128) Adds the planning documents for the end-to-end demo pipeline work tracked in #128. Implementation commits follow on this branch. - INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics, open questions resolved in the PRP). 
- PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6 commits, additive only — no schema changes, no API edits). * feat(data): add demo_minimal scenario preset (#128) Tiny preset that powers the upcoming `make demo` target. Three stores × ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy an expanding backtest with n_splits=3, horizon=14, min_train_size=30 (needs >= 72 days, 92 leaves margin), small enough to keep the demo loop comfortable on a laptop. Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10, modest promotion + stockout probabilities) so backtest WAPE stays non-NaN across all three baseline models. - app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch - app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios - tests cover the new preset and the updated scenario count * feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128) Single-file async driver that walks the published HTTP surface (precheck -> reset? -> seed -> status -> features -> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the shape of scripts/seed_random.py and scripts/check_db.py. - HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s timeout (default 5 s is too short for /seeder/generate); surfaces RFC 7807 problem+json bodies as a typed StepError that echoes title / detail / request_id (never the raw body — secrets-safe). - DemoContext + StepOutcome dataclasses thread cross-step references. - Reporter renders the output-formatting.md glyphs (verbose by default, --quiet collapses to one line per step). - Per-step error handling converts httpx + StepError into fail outcomes; precheck failure exits 2, any other failure exits 1, green exits 0. - Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings() to honor the no-os.environ-in-feature-code rule. 
- Registry handshake uses the mandatory pending -> running -> success transition and the wire alias "model_config" (not "model_config_data"); artifact_hash is computed client-side via sha256 since we share the FS with the API on this single-host system.
- Winner selection: lowest aggregated WAPE, skipping NaN folds.

Also adds scripts/__init__.py so tests can `import scripts.run_demo` without invoking the file as a script.

* feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128)

Wraps scripts/run_demo.py so reviewers can run the full end-to-end demo with one command. Recipes mirror the three modes the script supports: full run, skip-seed iteration, destructive reset.

Make targets:
- demo: docker compose up -d + alembic + run_demo
- demo-quick: run_demo --skip-seed (no compose/migration touch)
- demo-clean: full reset (--reset) before seeding
- help: default goal; lists targets + preconditions

Tab-indented recipes and .PHONY declarations per make conventions. Preconditions (Postgres on :5433, uvicorn on :8123) documented in the help block; the script itself enforces them via the precheck step and exits 2 on failure.
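The winner-selection rule from the run_demo.py commit above (lowest aggregated WAPE, NaN skipped, no winner when nothing is finite) could look roughly like the sketch below. The function name `select_winner`, the input shape (one aggregated WAPE per model), and the model names are illustrative assumptions, not the script's actual `_select_winner` signature.

```python
import math
from typing import Optional


def select_winner(wape_by_model: dict[str, float]) -> Optional[str]:
    """Return the model with the lowest finite WAPE, or None if there is none.

    Hypothetical helper: the input maps a model name to its aggregated WAPE.
    """
    # Drop NaN/inf entries so a broken backtest can never win by accident.
    finite = {name: wape for name, wape in wape_by_model.items() if math.isfinite(wape)}
    if not finite:
        return None
    return min(finite, key=lambda name: finite[name])


# Model names below are placeholders for the three baseline models.
scores = {"naive": 0.31, "seasonal_naive": float("nan"), "moving_average": 0.27}
winner = select_winner(scores)
no_winner = select_winner({"naive": float("nan")})
empty = select_winner({})
```

Filtering before taking the minimum matters because comparisons against NaN are always false, so `min` over unfiltered values would depend on dictionary order.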
* test(api): unit + integration coverage for run_demo (#128)

Unit (`tests/test_run_demo_unit.py`, 32 cases):
- argparse defaults + all-flags variants
- DemoContext defaults (no leaking state across runs)
- _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None
- _model_config_payload: discriminated-union shape per baseline; rejects unsupported model_type (defends the "no lightgbm in PRP-15" boundary)
- Reporter: glyph mapping; verbose + quiet output; summary green / failure / over-budget soft-warn branches
- StepError formats RFC 7807 (title/detail/request_id) without leaking the raw response body
- HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError
- Step payload sanity: seed sends demo_minimal+correct dims+ISO dates; features sends cutoff_date as ISO; train fires three model_types in parallel; agent step skips with ⏭️ when no LLM key

Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration):
- Skips if Postgres on :5433 isn't reachable
- Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default)
- Runs scripts/run_demo.py --reset against it; asserts exit 0 + canonical "runs=3 winner=... alias=demo-production" summary
- Second case asserts a bogus URL exits 2 (no silent success)
- Cleans up uvicorn on teardown with terminate/kill fallback
- Resolves `uv` via shutil.which to keep ruff S607 happy and avoid PATH-dependent exec at test time

* ci(repo): nightly e2e demo workflow (#128)

Adds .github/workflows/e2e-nightly.yml: runs scripts/run_demo.py against a fresh Postgres+pgvector service every night at 07:00 UTC (plus on-demand via workflow_dispatch). Catches regressions in the documented end-to-end pipeline before they bleed into the per-PR gate.

Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally NOT a required status check on dev or main. Flake-budget lives in the nightly slot, not in ci.yml.

- pgvector/pgvector:pg16 service container (same as ci.yml `test` job)
- uvicorn started in background; /health polled with a 30 s deadline
- run_demo.py called with --seed 42 (deterministic)
- LLM-key env vars intentionally absent; agent step auto-skips via ⏭️, keeping the workflow self-contained
- uvicorn logs uploaded as artifact on failure (7-day retention) so postmortems can read what the API was doing when the script broke
- astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md
- permissions: contents: read (least-privilege)

* docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128)

Discoverability layer for PRP-15.

- README.md: new 'Try it: end-to-end demo' step right after the curl /health verification; shows the canonical final-line summary so reviewers know what green looks like.
- docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section documenting all three Make targets.
- docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common Incidents entry with a 7-point diagnosis flow keyed to the script's step names + a postmortem-capture recipe.
- docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows added to the Document Index table.

Pure additive; no existing content removed or renamed.

* fix(data): update /seeder/scenarios route test for demo_minimal preset (#128)

Companion to feat(data): add demo_minimal scenario preset. The route-level assertion in TestListScenarios.test_returns_scenarios still expected 6 scenarios; bumping to 7 and adding the demo_minimal name membership check to match the service-layer + config-layer tests already updated in 005c189.

* fix(api): harden run_demo for integration test + real DB (#128)

Three real failures surfaced when first running the integration test against docker-compose Postgres + a freshly booted uvicorn; all three are now closed:

scripts/run_demo.py:
1. step_status: discover the real (store_id, product_id) from /dimensions/stores + /dimensions/products instead of hardcoding 1. Postgres auto-increment does NOT reset after delete, so the freshly seeded IDs are NOT 1 (they were ~150-260 on this branch after a few delete/seed cycles).
2. step_register: copy the trained-model artifact into the registry's own root (settings.registry_artifact_root) and record a registry-relative URI. The registry verify endpoint resolves artifact_uri against its own root, which is separate from where /forecasting/train writes (settings.forecast_model_artifacts_dir). Pre-fix, verify returned 404 even though the artifact existed on disk.
3. step_agent: skip with the soft-skip glyph on any LLM provider failure (invalid key, model unavailable, 5xx), and make _llm_key_present provider-aware so it matches the right env var to the configured agent_default_model. Pre-fix, an .env with anthropic/openai keys but a Gemini default model failed hard at chat-time.
4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate for demo_minimal can spend 60-90 s on slower laptops once you include inventory + prices + promotions inserts.
5. step_seed detail string: GenerateResult.records_created uses 'sales' (singular), not 'sales_daily'; cosmetic fix.

tests/test_e2e_demo.py:
- Redirect uvicorn stdout to a temp file rather than subprocess.PIPE. The seeder + structlog produce enough INFO log volume to fill a 64-KB pipe buffer; once full, uvicorn blocks on write and seeder requests hang for the full --timeout. Verified locally: integration suite now passes in ~6.5 s instead of timing out at 120 s.
- Cleanup leaves the log file on disk only when the test failed (postmortem-friendly).

tests/test_run_demo_unit.py:
- Bump test_defaults timeout expectation to match the new 120 s default.

End-to-end manual run on this machine: 11 steps, wall_clock=2 s, exit 0. Integration test: 2 passed in 6.48 s.
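The pipe-buffer hang described for tests/test_e2e_demo.py above is a general subprocess pitfall: a child that writes to `subprocess.PIPE` blocks once the buffer fills if nobody drains it. A minimal sketch of the temp-file alternative follows; `run_with_log_file` and the noisy child command are hypothetical stand-ins, not the test's actual uvicorn fixture.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_log_file(cmd: list[str], timeout_s: float = 30.0) -> tuple[int, Path]:
    """Run cmd with stdout/stderr sent to a temp file; return (exit code, log path)."""
    # delete=False keeps the file on disk so it can be read after the run.
    with tempfile.NamedTemporaryFile(prefix="child-", suffix=".log", delete=False) as log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        try:
            returncode = proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # Do not leave an orphaned child behind if it overruns.
            proc.kill()
            proc.wait()
            raise
    return returncode, Path(log.name)


# The child writes about 1 MB, far more than a 64-KB pipe buffer could hold.
noisy_child = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 1_000_000)"]
exit_code, log_path = run_with_log_file(noisy_child)
log_size = log_path.stat().st_size
log_path.unlink()  # a real test might keep the log only when the run failed
```

With `stdout=subprocess.PIPE` and a plain `proc.wait()`, the same child could block forever on write; a file has no such back-pressure.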
* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131)

Headline:
- authlib 1.6.6 -> 1.7.2 (clears GHSA-wvwj-cvrp-7pv5: JWS signature verification bypass; patched at >= 1.6.9)
- fastmcp 2.14.4 -> 3.2.4 (clears GHSA-vv7q-7jx5-f767: OpenAPI Provider SSRF + path traversal; patched at >= 3.2.0)

Both CVEs were flagged on PR #129 by Socket Security and are pre-existing on dev (not introduced by #128).

Wider scope (read before merging): `uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers a full re-resolve of the dependency graph. Because dev's uv.lock had drifted from pyproject.toml (the project's constraint envelope had loosened over time), this single command also brings the lockfile in sync with current pyproject.toml. Net diff: 243 insertions / 369 deletions on uv.lock; no other files touched.

Transitive cascades worth flagging:
- anthropic 0.77.0 -> 0.102.0 (pydantic-ai-slim extra)
- pydantic-graph 1.51.0 -> 1.96.0
- temporalio 1.20.0 -> 1.27.2
- alembic 1.18.1 -> 1.18.4
- aws-* and cohere transitives bumped along
- griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched)
- Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus exporter, pydocket, redis, rsa, sortedcontainers; these were transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in.

Verification on this host:
- uv sync --extra dev -> green
- ruff check . -> clean
- mypy --strict app/ -> 192 files clean
- pyright app/ -> 0 errors (50 warnings, pre-existing)
- pytest -m 'not integration' -> 969 passed

Known install quirk: griffelib 2.0.2 ships a top-level `griffe/` package whose RECORD files don't always materialize on first install when uv replaces an older `griffe` dist in the same sync. A clean venv install (which CI does via `uv sync --frozen`) is unaffected; local devs who upgrade in place may need a one-shot `uv pip install --force-reinstall griffelib` if `import griffe` fails.
* feat(api,ui): in-product demo showcase page (#132) (#133) * feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132) New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It drives the published API surface in-process via httpx.ASGITransport (no cross-slice imports, satisfying the vertical-slice rule) and streams one StepEvent per pipeline step: precheck -> reset -> seed -> status -> features -> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup. A module-level asyncio.Lock enforces single-flight; concurrent runs get an RFC 7807 409. The orchestration is a faithful in-process port of scripts/run_demo.py (PR #129). Implements PRP-17. * test(api): cover the demo slice pipeline, routes, and e2e integration (#132) Unit tests mock the in-process HTTP client to exercise step sequencing, winner selection, and fail-fast; route tests cover POST /demo/run (200 + 409) and the WS /demo/stream handler. The integration test seeds demo_minimal and asserts an end-to-end green run against real Postgres. Implements PRP-17. * feat(ui): add showcase page streaming the live demo pipeline (#132) New /showcase route and nav entry. The page opens a one-shot WebSocket to /demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders the 11 pipeline steps as live status cards: glyph, detail, duration, the backtest per-model WAPE breakdown with the winner highlighted, and a pass/fail summary banner. Also block-scopes a pre-existing no-case-declarations lint error in chat.tsx so pnpm lint is green for this PR. Implements PRP-17. * test(ui): add vitest setup and use-demo-pipeline hook coverage (#132) Adds the frontend test stack (vitest + jsdom + @testing-library/react), a test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the pure event reducer (idle -> running -> pass transitions, summary assembly, error phase) and a renderHook smoke test. 
The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented fix for pnpm 11's esbuild build-script gate. Implements PRP-17. * docs(docs): document the demo slice and showcase page (#132) Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the /demo/run + /demo/stream rows and a WebSocket Events section in API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and REPO_MAP_INDEX rows for the demo slice and showcase page. Implements PRP-17. * chore(main): release 0.2.10 (#135) * feat: release v0.2.11 — visualization fixes, job picker, demo showcase (#158) * feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127) Resolves #94 via the heuristic path documented in the issue. No schema column, no Alembic migration, no FIFO cohort tracking — the trigger self-reads the existing per-(store, product) on_hand_qty series and fires when inventory has been "unrefreshed" past `cfg.age_days_threshold`. Decision rationale (schema column vs. heuristic): - The schema column path would add `oldest_unit_age_days` to `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO cohort tracking in `InventorySnapshotGenerator`. No downstream consumer reads this column today — adding it for one generator trigger violates the "don't design for hypothetical future requirements" rule in CLAUDE.md. - The heuristic path is self-contained in MarkdownGenerator, deterministic (preserves the zero-rng-draw regression invariant), and additive (no migration, no model change). 354 LOC net, all inside one slice. Heuristic spec: - A "refresh" is a day where `on_hand_qty` rose by >= `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day. - Age at day t = days since most recent refresh (or `dates[0]` if no refresh has been observed). - Firing requires age >= `age_days_threshold` AND on_hand >= `markdown_min_units_remaining` — never markdown an empty shelf. 
- After firing, refresh anchor resets to the day AFTER the markdown window ends, so back-to-back fires can't happen and the next age clock starts from a "clear shelf" baseline. Wiring: `MarkdownGenerator.generate()` gains an optional kwarg `inventory_records: list[dict[str, Any]] | None = None` which `core.py` passes through from `InventorySnapshotGenerator`. Disabled-path and non-age_days-path behavior is byte-identical (kwarg ignored). Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete `NotImplementedError` test. Coverage: no-records defensive, threshold not-met, threshold met, spike resets age, post-fire reset avoids back-to-back, low-inventory skip, unknown-product skip, rng non-consumption. Validation (local): - ruff check + format: clean - mypy --strict: 0 issues, 192 files - pyright --strict: 0 errors - pytest -m "not integration": 969 passed (+7 vs pre-PR) Closes #94. * feat(api,docs): e2e demo pipeline + showcase script (#128) (#129) * docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128) Adds the planning documents for the end-to-end demo pipeline work tracked in #128. Implementation commits follow on this branch. - INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics, open questions resolved in the PRP). - PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6 commits, additive only — no schema changes, no API edits). * feat(data): add demo_minimal scenario preset (#128) Tiny preset that powers the upcoming `make demo` target. Three stores × ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy an expanding backtest with n_splits=3, horizon=14, min_train_size=30 (needs >= 72 days, 92 leaves margin), small enough to keep the demo loop comfortable on a laptop. Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10, modest promotion + stockout probabilities) so backtest WAPE stays non-NaN across all three baseline models. 
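The "needs >= 72 days" figure quoted above is consistent with simple arithmetic, assuming the expanding backtest uses non-overlapping test folds of `horizon` days with no gap after the initial training window. That fold layout is an assumption here, and `min_days_required` is a hypothetical helper, not part of the backtesting service.

```python
from datetime import date


def min_days_required(n_splits: int, horizon: int, min_train_size: int) -> int:
    """Smallest series length that fits n_splits back-to-back test folds.

    Assumes an expanding window whose first fold trains on min_train_size days
    and whose test folds do not overlap (step == horizon, gap == 0).
    """
    return min_train_size + n_splits * horizon


required = min_days_required(n_splits=3, horizon=14, min_train_size=30)

# Length of the demo_minimal date range, both endpoints included.
available = (date(2024, 12, 31) - date(2024, 10, 1)).days + 1
margin = available - required
```

Under these assumptions the preset leaves a margin of 20 days beyond the minimum.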
- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch - app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios - tests cover the new preset and the updated scenario count * feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128) Single-file async driver that walks the published HTTP surface (precheck -> reset? -> seed -> status -> features -> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the shape of scripts/seed_random.py and scripts/check_db.py. - HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s timeout (default 5 s is too short for /seeder/generate); surfaces RFC 7807 problem+json bodies as a typed StepError that echoes title / detail / request_id (never the raw body — secrets-safe). - DemoContext + StepOutcome dataclasses thread cross-step references. - Reporter renders the output-formatting.md glyphs (verbose by default, --quiet collapses to one line per step). - Per-step error handling converts httpx + StepError into fail outcomes; precheck failure exits 2, any other failure exits 1, green exits 0. - Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings() to honor the no-os.environ-in-feature-code rule. - Registry handshake uses the mandatory pending -> running -> success transition and the wire alias "model_config" (not "model_config_data"); artifact_hash is computed client-side via sha256 since we share the FS with the API on this single-host system. - Winner selection: lowest aggregated WAPE, skipping NaN folds. Also adds scripts/__init__.py so tests can `import scripts.run_demo` without invoking the file as a script. * feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128) Wraps scripts/run_demo.py so reviewers can run the full end-to-end demo with one command. Recipes mirror the three modes the script supports: full run, skip-seed iteration, destructive reset. 
Make targets: - demo — docker compose up -d + alembic + run_demo - demo-quick — run_demo --skip-seed (no compose/migration touch) - demo-clean — full reset (--reset) before seeding - help — default goal; lists targets + preconditions Tab-indented recipes and .PHONY declarations per make conventions. Preconditions (Postgres on :5433, uvicorn on :8123) documented in the help block; the script itself enforces them via the precheck step and exits 2 on failure. * test(api): unit + integration coverage for run_demo (#128) Unit (`tests/test_run_demo_unit.py`, 32 cases): - argparse defaults + all-flags variants - DemoContext defaults (no leaking state across runs) - _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None - _model_config_payload: discriminated-union shape per baseline; rejects unsupported model_type (defends the "no lightgbm in PRP-15" boundary) - Reporter: glyph mapping; verbose + quiet output; summary green / failure / over-budget soft-warn branches - StepError formats RFC 7807 (title/detail/request_id) without leaking the raw response body - HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError - Step payload sanity: seed sends demo_minimal+correct dims+ISO dates; features sends cutoff_date as ISO; train fires three model_types in parallel; agent step skips with ⏭️ when no LLM key Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration): - Skips if Postgres on :5433 isn't reachable - Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default) - Runs scripts/run_demo.py --reset against it; asserts exit 0 + canonical "runs=3 winner=... 
alias=demo-production" summary - Second case asserts a bogus URL exits 2 (no silent success) - Cleans up uvicorn on teardown with terminate/kill fallback - Resolves `uv` via shutil.which to keep ruff S607 happy and avoid PATH-dependent exec at test time * ci(repo): nightly e2e demo workflow (#128) Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py against a fresh Postgres+pgvector service every night at 07:00 UTC (plus on-demand via workflow_dispatch). Catches regressions in the documented end-to-end pipeline before they bleed into the per-PR gate. Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally NOT a required status check on dev or main. Flake-budget lives in the nightly slot, not in ci.yml. - pgvector/pgvector:pg16 service container (same as ci.yml `test` job) - uvicorn started in background; /health polled with a 30 s deadline - run_demo.py called with --seed 42 (deterministic) - LLM-key env vars intentionally absent — agent step auto-skips via ⏭️, keeping the workflow self-contained - uvicorn logs uploaded as artifact on failure (7-day retention) so postmortems can read what the API was doing when the script broke - astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md - permissions: contents: read (least-privilege) * docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128) Discoverability layer for PRP-15. - README.md: new 'Try it: end-to-end demo' step right after the curl /health verification; shows the canonical final-line summary so reviewers know what green looks like. - docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section documenting all three Make targets. - docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common Incidents entry with a 7-point diagnosis flow keyed to the script's step names + a postmortem-capture recipe. - docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows added to the Document Index table. 
Pure additive; no existing content removed or renamed. * fix(data): update /seeder/scenarios route test for demo_minimal preset (#128) Companion to feat(data): add demo_minimal scenario preset — the route-level assertion in TestListScenarios.test_returns_scenarios still expected 6 scenarios; bumping to 7 and adding the demo_minimal name membership check to match the service-layer + config-layer tests already updated in 005c189. * fix(api): harden run_demo for integration test + real DB (#128) Three real failures surfaced when first running the integration test against docker-compose Postgres + a freshly booted uvicorn; all three are now closed: scripts/run_demo.py: 1. step_status: discover the real (store_id, product_id) from /dimensions/stores + /dimensions/products instead of hardcoding 1. Postgres auto-increment does NOT reset after delete, so the freshly seeded IDs are NOT 1 (they were ~150-260 on this branch after a few delete/seed cycles). 2. step_register: copy the trained-model artifact into the registry's own root (settings.registry_artifact_root) and record a registry- relative URI. The registry verify endpoint resolves artifact_uri against its own root, which is separate from where /forecasting/train writes (settings.forecast_model_artifacts_dir). Pre-fix, verify returned 404 even though the artifact existed on disk. 3. step_agent: skip with the soft-skip glyph on any LLM provider failure (invalid key, model unavailable, 5xx), and make _llm_key_present provider-aware so it matches the right env var to the configured agent_default_model. Pre-fix, an .env with anthr…

…#202) (#203) * feat: cut v0.2.13 — explorer interactivity, knowledge & guide pages (#191) (#192) * feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127) Resolves #94 via the heuristic path documented in the issue. No schema column, no Alembic migration, no FIFO cohort tracking — the trigger self-reads the existing per-(store, product) on_hand_qty series and fires when inventory has been "unrefreshed" past `cfg.age_days_threshold`. Decision rationale (schema column vs. heuristic): - The schema column path would add `oldest_unit_age_days` to `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO cohort tracking in `InventorySnapshotGenerator`. No downstream consumer reads this column today — adding it for one generator trigger violates the "don't design for hypothetical future requirements" rule in CLAUDE.md. - The heuristic path is self-contained in MarkdownGenerator, deterministic (preserves the zero-rng-draw regression invariant), and additive (no migration, no model change). 354 LOC net, all inside one slice. Heuristic spec: - A "refresh" is a day where `on_hand_qty` rose by >= `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day. - Age at day t = days since most recent refresh (or `dates[0]` if no refresh has been observed). - Firing requires age >= `age_days_threshold` AND on_hand >= `markdown_min_units_remaining` — never markdown an empty shelf. - After firing, refresh anchor resets to the day AFTER the markdown window ends, so back-to-back fires can't happen and the next age clock starts from a "clear shelf" baseline. Wiring: `MarkdownGenerator.generate()` gains an optional kwarg `inventory_records: list[dict[str, Any]] | None = None` which `core.py` passes through from `InventorySnapshotGenerator`. Disabled-path and non-age_days-path behavior is byte-identical (kwarg ignored). Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete `NotImplementedError` test. 
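The heuristic spec above can be read as a single pass over one (store, product) `on_hand_qty` series. The sketch below works on day indices instead of dates and is not the MarkdownGenerator code: `find_markdown_starts` and `markdown_duration_days` are hypothetical names, and treating a rise from zero stock as a refresh is an added assumption.

```python
SPIKE_THRESHOLD = 0.3  # a 30% day-over-day rise counts as a refresh


def find_markdown_starts(
    on_hand: list[int],
    age_days_threshold: int,
    min_units_remaining: int,
    markdown_duration_days: int,
) -> list[int]:
    """Return the day indices on which an age-based markdown would start."""
    starts: list[int] = []
    anchor = 0  # most recent refresh; the first day if none has been seen
    t = 1
    while t < len(on_hand):
        prev, cur = on_hand[t - 1], on_hand[t]
        # Refresh: stock rose by at least the spike threshold vs the prior day.
        if cur > prev and cur - prev >= SPIKE_THRESHOLD * prev:
            anchor = t
        age = t - anchor
        # Fire only on aged stock that is still on the shelf.
        if age >= age_days_threshold and cur >= min_units_remaining:
            starts.append(t)
            # The window covers t .. t + duration - 1; restart the age clock on
            # the day after it ends, which rules out back-to-back fires.
            anchor = t + markdown_duration_days
            t = anchor
            continue
        t += 1
    return starts


flat = find_markdown_starts([100] * 10, 5, 10, 3)
restocked = find_markdown_starts([100] * 4 + [140] * 6, 5, 10, 3)
nearly_empty = find_markdown_starts([5] * 10, 5, 10, 3)
```

The function draws no random numbers, which matches the zero-rng-draw invariant the commit message calls out.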
Coverage: no-records defensive, threshold not-met, threshold met, spike resets age, post-fire reset avoids back-to-back, low-inventory skip, unknown-product skip, rng non-consumption. Validation (local): - ruff check + format: clean - mypy --strict: 0 issues, 192 files - pyright --strict: 0 errors - pytest -m "not integration": 969 passed (+7 vs pre-PR) Closes #94. * feat(api,docs): e2e demo pipeline + showcase script (#128) (#129) * docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128) Adds the planning documents for the end-to-end demo pipeline work tracked in #128. Implementation commits follow on this branch. - INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics, open questions resolved in the PRP). - PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6 commits, additive only — no schema changes, no API edits). * feat(data): add demo_minimal scenario preset (#128) Tiny preset that powers the upcoming `make demo` target. Three stores × ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy an expanding backtest with n_splits=3, horizon=14, min_train_size=30 (needs >= 72 days, 92 leaves margin), small enough to keep the demo loop comfortable on a laptop. Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10, modest promotion + stockout probabilities) so backtest WAPE stays non-NaN across all three baseline models. - app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch - app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios - tests cover the new preset and the updated scenario count * feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128) Single-file async driver that walks the published HTTP surface (precheck -> reset? -> seed -> status -> features -> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the shape of scripts/seed_random.py and scripts/check_db.py. 
- HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s timeout (default 5 s is too short for /seeder/generate); surfaces RFC 7807 problem+json bodies as a typed StepError that echoes title / detail / request_id (never the raw body — secrets-safe). - DemoContext + StepOutcome dataclasses thread cross-step references. - Reporter renders the output-formatting.md glyphs (verbose by default, --quiet collapses to one line per step). - Per-step error handling converts httpx + StepError into fail outcomes; precheck failure exits 2, any other failure exits 1, green exits 0. - Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings() to honor the no-os.environ-in-feature-code rule. - Registry handshake uses the mandatory pending -> running -> success transition and the wire alias "model_config" (not "model_config_data"); artifact_hash is computed client-side via sha256 since we share the FS with the API on this single-host system. - Winner selection: lowest aggregated WAPE, skipping NaN folds. Also adds scripts/__init__.py so tests can `import scripts.run_demo` without invoking the file as a script. * feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128) Wraps scripts/run_demo.py so reviewers can run the full end-to-end demo with one command. Recipes mirror the three modes the script supports: full run, skip-seed iteration, destructive reset. Make targets: - demo — docker compose up -d + alembic + run_demo - demo-quick — run_demo --skip-seed (no compose/migration touch) - demo-clean — full reset (--reset) before seeding - help — default goal; lists targets + preconditions Tab-indented recipes and .PHONY declarations per make conventions. Preconditions (Postgres on :5433, uvicorn on :8123) documented in the help block; the script itself enforces them via the precheck step and exits 2 on failure. 
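The winner-selection rule described for scripts/run_demo.py (lowest aggregated WAPE, skipping NaN) can be sketched as follows. The function name, its signature, and the model names in the usage note are assumptions for illustration; the real `_select_winner` may take a different input shape.

```python
import math
from typing import Mapping, Optional


def select_winner(wape_by_model: Mapping[str, float]) -> Optional[str]:
    """Pick the model with the lowest aggregated WAPE, ignoring NaN scores.

    Returns None when there is nothing to choose from (empty input or
    every score is NaN), so the caller can fail the step explicitly.
    """
    candidates = {
        model: wape
        for model, wape in wape_by_model.items()
        if not math.isnan(wape)
    }
    if not candidates:
        return None
    return min(candidates, key=candidates.__getitem__)
```

For example, `select_winner({"naive": 0.31, "seasonal_naive": 0.22, "moving_average": float("nan")})` picks `"seasonal_naive"`. Filtering NaN before calling `min` matters because comparisons against NaN are always false, which would make the result depend on dictionary order.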
* test(api): unit + integration coverage for run_demo (#128) Unit (`tests/test_run_demo_unit.py`, 32 cases): - argparse defaults + all-flags variants - DemoContext defaults (no leaking state across runs) - _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None - _model_config_payload: discriminated-union shape per baseline; rejects unsupported model_type (defends the "no lightgbm in PRP-15" boundary) - Reporter: glyph mapping; verbose + quiet output; summary green / failure / over-budget soft-warn branches - StepError formats RFC 7807 (title/detail/request_id) without leaking the raw response body - HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError - Step payload sanity: seed sends demo_minimal+correct dims+ISO dates; features sends cutoff_date as ISO; train fires three model_types in parallel; agent step skips with ⏭️ when no LLM key Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration): - Skips if Postgres on :5433 isn't reachable - Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default) - Runs scripts/run_demo.py --reset against it; asserts exit 0 + canonical "runs=3 winner=... alias=demo-production" summary - Second case asserts a bogus URL exits 2 (no silent success) - Cleans up uvicorn on teardown with terminate/kill fallback - Resolves `uv` via shutil.which to keep ruff S607 happy and avoid PATH-dependent exec at test time * ci(repo): nightly e2e demo workflow (#128) Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py against a fresh Postgres+pgvector service every night at 07:00 UTC (plus on-demand via workflow_dispatch). Catches regressions in the documented end-to-end pipeline before they bleed into the per-PR gate. Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally NOT a required status check on dev or main. Flake-budget lives in the nightly slot, not in ci.yml. 
- pgvector/pgvector:pg16 service container (same as ci.yml `test` job) - uvicorn started in background; /health polled with a 30 s deadline - run_demo.py called with --seed 42 (deterministic) - LLM-key env vars intentionally absent — agent step auto-skips via ⏭️, keeping the workflow self-contained - uvicorn logs uploaded as artifact on failure (7-day retention) so postmortems can read what the API was doing when the script broke - astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md - permissions: contents: read (least-privilege) * docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128) Discoverability layer for PRP-15. - README.md: new 'Try it: end-to-end demo' step right after the curl /health verification; shows the canonical final-line summary so reviewers know what green looks like. - docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section documenting all three Make targets. - docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common Incidents entry with a 7-point diagnosis flow keyed to the script's step names + a postmortem-capture recipe. - docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows added to the Document Index table. Pure additive; no existing content removed or renamed. * fix(data): update /seeder/scenarios route test for demo_minimal preset (#128) Companion to feat(data): add demo_minimal scenario preset — the route-level assertion in TestListScenarios.test_returns_scenarios still expected 6 scenarios; bumping to 7 and adding the demo_minimal name membership check to match the service-layer + config-layer tests already updated in 005c189. * fix(api): harden run_demo for integration test + real DB (#128) Three real failures surfaced when first running the integration test against docker-compose Postgres + a freshly booted uvicorn; all three are now closed: scripts/run_demo.py: 1. 
step_status: discover the real (store_id, product_id) from /dimensions/stores + /dimensions/products instead of hardcoding 1. Postgres auto-increment does NOT reset after delete, so the freshly seeded IDs are NOT 1 (they were ~150-260 on this branch after a few delete/seed cycles). 2. step_register: copy the trained-model artifact into the registry's own root (settings.registry_artifact_root) and record a registry- relative URI. The registry verify endpoint resolves artifact_uri against its own root, which is separate from where /forecasting/train writes (settings.forecast_model_artifacts_dir). Pre-fix, verify returned 404 even though the artifact existed on disk. 3. step_agent: skip with the soft-skip glyph on any LLM provider failure (invalid key, model unavailable, 5xx), and make _llm_key_present provider-aware so it matches the right env var to the configured agent_default_model. Pre-fix, an .env with anthropic/openai keys but a Gemini default model failed hard at chat-time. 4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate for demo_minimal can spend 60-90 s on slower laptops once you include inventory + prices + promotions inserts. 5. step_seed detail string: GenerateResult.records_created uses 'sales' (singular), not 'sales_daily'; cosmetic fix. tests/test_e2e_demo.py: - Redirect uvicorn stdout to a temp file rather than subprocess.PIPE. The seeder + structlog produce enough INFO log volume to fill a 64-KB pipe buffer; once full, uvicorn blocks on write and seeder requests hang for the full --timeout. Verified locally: integration suite now passes in ~6.5 s instead of timing out at 120 s. - Cleanup leaves the log file on disk only when the test failed (postmortem-friendly). tests/test_run_demo_unit.py: - Bump test_defaults timeout expectation to match the new 120 s default. End-to-end manual run on this machine: 11 steps, wall_clock=2 s, exit 0. Integration test: 2 passed in 6.48 s. 
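The pipe-buffer hang fixed in tests/test_e2e_demo.py is easy to reproduce in isolation. The sketch below is not the test's code; the helper name and the stand-in child process are assumptions. It shows the pattern the commit describes: send the child's output to a temp file instead of `subprocess.PIPE`, so a chatty child never blocks on a full pipe, and stop it with a terminate/kill fallback.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_log_file(cmd: list[str], timeout_s: float = 30.0) -> tuple[int, Path]:
    """Run cmd with stdout/stderr redirected to a temp file, not a pipe."""
    # delete=False keeps the log on disk so a failed run can be inspected.
    log = tempfile.NamedTemporaryFile(prefix="child-", suffix=".log", delete=False)
    with log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        try:
            returncode = proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            proc.terminate()  # polite stop first
            try:
                proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                proc.kill()  # hard stop if terminate is ignored
                proc.wait()
            raise
    return returncode, Path(log.name)


# Stand-in for a chatty server: writes well over 64 KB to stdout.
chatty = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 200_000)"]
exit_code, log_path = run_with_log_file(chatty)
log_size = log_path.stat().st_size
log_path.unlink()  # the run succeeded, so the log is no longer needed
```

With `stdout=subprocess.PIPE` and nobody reading the pipe, the same child can block once the OS pipe buffer fills; a file has no such limit.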
* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131)

  Headline:
  - authlib 1.6.6 -> 1.7.2 (clears GHSA-wvwj-cvrp-7pv5: JWS signature verification bypass; patched at >= 1.6.9)
  - fastmcp 2.14.4 -> 3.2.4 (clears GHSA-vv7q-7jx5-f767: OpenAPI Provider SSRF + path traversal; patched at >= 3.2.0)

  Both CVEs were flagged on PR #129 by Socket Security and are pre-existing on dev (not introduced by #128).

  Wider scope (read before merging): `uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers a full re-resolve of the dependency graph. Because dev's uv.lock had drifted from pyproject.toml (the project's constraint envelope had loosened over time), this single command also brings the lockfile in sync with current pyproject.toml. Net diff: 243 insertions / 369 deletions on uv.lock; no other files touched.

  Transitive cascades worth flagging:
  - anthropic 0.77.0 -> 0.102.0 (pydantic-ai-slim extra)
  - pydantic-graph 1.51.0 -> 1.96.0
  - temporalio 1.20.0 -> 1.27.2
  - alembic 1.18.1 -> 1.18.4
  - aws-* and cohere transitives bumped along
  - griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched)
  - Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus exporter, pydocket, redis, rsa, sortedcontainers. These were transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in.

  Verification on this host:
  - uv sync --extra dev -> green
  - ruff check . -> clean
  - mypy --strict app/ -> 192 files clean
  - pyright app/ -> 0 errors (50 warnings, pre-existing)
  - pytest -m 'not integration' -> 969 passed

  Known install quirk: griffelib 2.0.2 ships a top-level `griffe/` package whose RECORD files don't always materialize on first install when uv replaces an older `griffe` dist in the same sync. A clean venv install (which CI does via `uv sync --frozen`) is unaffected; local devs who upgrade in place may need a one-shot `uv pip install --force-reinstall griffelib` if `import griffe` fails.
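The version bumps above can be checked against the patched minimums quoted in the commit (authlib >= 1.6.9, fastmcp >= 3.2.0). The helper below is a hypothetical sketch, not part of the repository, and it only handles plain dotted numeric versions.

```python
# Patched minimums as quoted in the commit message above.
PATCHED_MINIMUMS = {"authlib": "1.6.9", "fastmcp": "3.2.0"}


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.7.2' into (1, 7, 2); assumes purely numeric components."""
    return tuple(int(part) for part in version.split("."))


def is_patched(package: str, installed: str) -> bool:
    """True when the installed version is at or above the patched minimum."""
    return parse_version(installed) >= parse_version(PATCHED_MINIMUMS[package])
```

Comparing integer tuples avoids the string-comparison trap where `"2.14.4"` would sort after `"3.2.0"` character by character in some positions. In a real environment the installed version string can be read with `importlib.metadata.version("authlib")`, and `packaging.version.Version` is the more robust choice when pre-release tags are possible.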
* feat(api,ui): in-product demo showcase page (#132) (#133) * feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132) New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It drives the published API surface in-process via httpx.ASGITransport (no cross-slice imports, satisfying the vertical-slice rule) and streams one StepEvent per pipeline step: precheck -> reset -> seed -> status -> features -> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup. A module-level asyncio.Lock enforces single-flight; concurrent runs get an RFC 7807 409. The orchestration is a faithful in-process port of scripts/run_demo.py (PR #129). Implements PRP-17. * test(api): cover the demo slice pipeline, routes, and e2e integration (#132) Unit tests mock the in-process HTTP client to exercise step sequencing, winner selection, and fail-fast; route tests cover POST /demo/run (200 + 409) and the WS /demo/stream handler. The integration test seeds demo_minimal and asserts an end-to-end green run against real Postgres. Implements PRP-17. * feat(ui): add showcase page streaming the live demo pipeline (#132) New /showcase route and nav entry. The page opens a one-shot WebSocket to /demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders the 11 pipeline steps as live status cards: glyph, detail, duration, the backtest per-model WAPE breakdown with the winner highlighted, and a pass/fail summary banner. Also block-scopes a pre-existing no-case-declarations lint error in chat.tsx so pnpm lint is green for this PR. Implements PRP-17. * test(ui): add vitest setup and use-demo-pipeline hook coverage (#132) Adds the frontend test stack (vitest + jsdom + @testing-library/react), a test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the pure event reducer (idle -> running -> pass transitions, summary assembly, error phase) and a renderHook smoke test. 
The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented fix for pnpm 11's esbuild build-script gate. Implements PRP-17. * docs(docs): document the demo slice and showcase page (#132) Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the /demo/run + /demo/stream rows and a WebSocket Events section in API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and REPO_MAP_INDEX rows for the demo slice and showcase page. Implements PRP-17. * chore(repo): back-merge main into dev to absorb v0.2.10 release commits (#136) (#137) * chore(main): release 0.2.9 (#126) * feat: release v0.2.10 — demo showcase page + e2e pipeline (#134) * chore(main): release 0.2.10 (#135) * docs(docs): fix broken PRP-0/INITIAL-0 relative links in phase 0 doc (#138) (#139) * docs(repo): fix readme dev-deps command and stale .github template placeholders (#140) (#141) * docs(docs): fill DEV_GUIDE.md onboarding stub sections (#142) (#143) * docs(repo): refresh stale CLAUDE.md note, document /demo API, align PR template (#144) (#145) * fix(ui): chart series render black — drop hsl() wrapper on oklch chart vars (#149) (#153) index.css defines --chart-N twice: legacy shadcn-v3 HSL triplets and Tailwind-4 / shadcn-v4 complete oklch() colours. The oklch definitions win the cascade, so at runtime --chart-N is a full colour.
The chart components still wrapped it as hsl(var(--chart-N)) → hsl(oklch(...)), which is invalid CSS, so recharts fell back to a black fill/stroke — invisible on the dark theme. Reference var(--chart-N) directly in backtest-folds-chart.tsx and time-series-chart.tsx. Verified in a browser: the backtest per-fold bars and the forecast line now render in colour. * fix(jobs): backtest job result keeps fold metrics, stability and baselines (#148) (#152) * fix(jobs): backtest job result keeps fold metrics, stability and baselines (#148) _execute_backtest ran BacktestingService.run_backtest — which computes per-fold metrics, stability indices and a naive/seasonal baseline comparison — but stored only four aggregated values and discarded the rest. The dashboard (/visualize/backtest) reads aggregated_metrics.{*_mean, stability_index}, fold_metrics[] and baseline_comparison, so it showed "0 folds", all-zero metrics and an empty chart. Add _shape_backtest_result(), which flattens a BacktestResponse into the contract the dashboard expects, and _finite(), which coerces NaN/inf to 0.0 so the result stays JSONB-safe (stability is NaN with fewer than two folds). Add app/features/jobs/tests/test_service.py with unit coverage for the shaping logic: fold metrics, *_mean keys, stability, baseline comparison, the no-baselines path, and NaN coercion. * refactor(jobs): centralize backtest metric keys and surface drift (#148) Addresses review feedback on PR #152. - Hoist the dashboard's metric set into _BACKTEST_METRICS and the headline stability metric into _STABILITY_METRIC, so the hardcoded keys live in one documented place instead of being repeated across the shaping logic. - Log jobs.backtest_metrics_missing when an expected metric is absent from the backtest response, so a future rename in the backtesting service fails loud instead of silently emitting 0.0. - Document the WAPE stability convention in the _shape_backtest_result docstring. 
- Tests: assert backtest_id / model_type / duration_ms pass through unchanged, and add a regression test for the missing-metric default path. * fix(ui): forecast page reads forecasts/forecast from predict job result (#147) (#151) The /visualize/forecast page never rendered the chart for a valid completed predict job. It read job.result.predictions with field `predicted`, but POST /jobs (job_type="predict") returns job.result.forecasts with field `forecast`. forecastData was therefore always undefined and the page fell through to "No prediction data available in job result". Read result.forecasts with field `forecast`, and pass predictedKey="forecast" to TimeSeriesChart (which already supports a configurable data key). Verified in a browser: entering a completed predict job ID now renders the 14-day forecast line chart with correct tooltip values. * fix(registry): tolerate multiple matches in _find_duplicate (#146) (#150) Under the default registry_duplicate_policy="detect", duplicate runs are created intentionally, so multiple non-archived model_run rows can share one config hash. _find_duplicate used scalar_one_or_none(), which raised MultipleResultsFound once two duplicates existed — POST /registry/runs then returned HTTP 500. This made the demo/Showcase register step fail deterministically on any DB with repeated runs. Order the lookup by created_at desc, LIMIT 1, and use scalars().first() so it returns the most recent matching run instead of asserting a single match. Add an integration regression test that POSTs an identical run three times under the detect policy and asserts all three return 201. * fix(ui): derive TimeSeriesChart line stroke from the config key (#156) (#157) TimeSeriesChart builds chartConfig with dynamic keys ([actualKey] / [predictedKey]), so shadcn's ChartContainer injects --color-<key> CSS variables. The <Line> elements, however, hardcoded stroke="var(--color-actual)" and stroke="var(--color-predicted)". 
The forecast page passes predictedKey="forecast", so the injected variable is --color-forecast; var(--color-predicted) was undefined, the stroke was invalid, and SVG fell back to its initial value `none`, so the forecast line was invisible.

Build the stroke from the key: stroke={`var(--color-${actualKey})`} / stroke={`var(--color-${predictedKey})`}. Verified in a browser: the forecast line now renders in colour.

* feat(ui): job picker dropdown on forecast and backtest pages (#154) (#155)

The visualization pages only accepted a job ID typed into a text box, so users had to already know the ID. Add a JobPicker component: a dropdown of completed jobs of the relevant type (predict / backtest), newest first, with each option labelled by short id, model and timestamp.
- New shared component src/components/common/job-picker.tsx, used by both forecast.tsx and backtest.tsx.
- The manual job-ID input stays alongside the dropdown for pasting an ID.
- The most recent completed job auto-loads on mount so a chart shows immediately without interaction.

No backend change: GET /jobs?job_type=&status=completed already exists. Verified in a browser on both pages.

* chore(repo): back-merge main into dev after v0.2.11 (#160) (#161)

* chore(main): release 0.2.9 (#126)

* feat: release v0.2.10 — demo showcase page + e2e pipeline (#134)

* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)

Resolves #94 via the heuristic path documented in the issue. No schema column, no Alembic migration, no FIFO cohort tracking: the trigger self-reads the existing per-(store, product) on_hand_qty series and fires when inventory has been "unrefreshed" past `cfg.age_days_threshold`.

Decision rationale (schema column vs. heuristic):
- The schema column path would add `oldest_unit_age_days` to `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO cohort tracking in `InventorySnapshotGenerator`.
No downstream consumer reads this column today; adding it for one generator trigger violates the "don't design for hypothetical future requirements" rule in CLAUDE.md.
- The heuristic path is self-contained in MarkdownGenerator, deterministic (preserves the zero-rng-draw regression invariant), and additive (no migration, no model change). 354 LOC net, all inside one slice.

Heuristic spec:
- A "refresh" is a day where `on_hand_qty` rose by >= `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day.
- Age at day t = days since most recent refresh (or `dates[0]` if no refresh has been observed).
- Firing requires age >= `age_days_threshold` AND on_hand >= `markdown_min_units_remaining`, so an empty shelf is never marked down.
- After firing, refresh anchor resets to the day AFTER the markdown window ends, so back-to-back fires can't happen and the next age clock starts from a "clear shelf" baseline.

Wiring: `MarkdownGenerator.generate()` gains an optional kwarg `inventory_records: list[dict[str, Any]] | None = None` which `core.py` passes through from `InventorySnapshotGenerator`. Disabled-path and non-age_days-path behavior is byte-identical (kwarg ignored).

Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete `NotImplementedError` test. Coverage: no-records defensive, threshold not-met, threshold met, spike resets age, post-fire reset avoids back-to-back, low-inventory skip, unknown-product skip, rng non-consumption.

Validation (local):
- ruff check + format: clean
- mypy --strict: 0 issues, 192 files
- pyright --strict: 0 errors
- pytest -m "not integration": 969 passed (+7 vs pre-PR)

Closes #94.

* feat(api,docs): e2e demo pipeline + showcase script (#128) (#129)

* docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128)

Adds the planning documents for the end-to-end demo pipeline work tracked in #128. Implementation commits follow on this branch.
- INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics, open questions resolved in the PRP).
- PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6 commits, additive only: no schema changes, no API edits).

* feat(data): add demo_minimal scenario preset (#128)

Tiny preset that powers the upcoming `make demo` target. Three stores × ten products × 92 days (2024-10-01..2024-12-31): wide enough to satisfy an expanding backtest with n_splits=3, horizon=14, min_train_size=30 (needs >= 72 days, i.e. 30 + 3 × 14; 92 leaves margin), small enough to keep the demo loop comfortable on a laptop.

Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10, modest promotion + stockout probabilities) so backtest WAPE stays non-NaN across all three baseline models.
- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch
- app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios
- tests cover the new preset and the updated scenario count

* feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128)

Single-file async driver that walks the published HTTP surface (precheck -> reset? -> seed -> status -> features -> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the shape of scripts/seed_random.py and scripts/check_db.py.
- HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s timeout (default 5 s is too short for /seeder/generate); surfaces RFC 7807 problem+json bodies as a typed StepError that echoes title / detail / request_id (never the raw body, so it is secrets-safe).
- DemoContext + StepOutcome dataclasses thread cross-step references.
- Reporter renders the output-formatting.md glyphs (verbose by default, --quiet collapses to one line per step).
- Per-step error handling converts httpx + StepError into fail outcomes; precheck failure exits 2, any other failure exits 1, green exits 0.
- Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings() to honor the no-os.environ-in-feature-code rule.
- Registry handshake uses the mandatory pending -> running -> success transition and the wire alias "model_config" (not "model_config_data"); artifact_hash is computed client-side via sha256 since we share the FS with the API on this single-host system.
- Winner selection: lowest aggregated WAPE, skipping NaN folds.

Also adds scripts/__init__.py so tests can `import scripts.run_demo` without invoking the file as a script.

* feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128)

Wraps scripts/run_demo.py so reviewers can run the full end-to-end demo with one command. Recipes mirror the three modes the script supports: full run, skip-seed iteration, destructive reset.

Make targets:
- demo: docker compose up -d + alembic + run_demo
- demo-quick: run_demo --skip-seed (no compose/migration touch)
- demo-clean: full reset (--reset) before seeding
- help: default goal; lists targets + preconditions

Tab-indented recipes and .PHONY declarations per make conventions. Preconditions (Postgres on :5433, uvicorn on :8123) documented in the help block; the script itself enforces them via the precheck step and exits 2 on failure.
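The winner-selection rule described for scripts/run_demo.py (lowest aggregated WAPE, skipping NaN) could look roughly like the sketch below. It is an illustration only: the function name `select_winner`, the dict input shape, the `None` return for "no usable score", and the model names in the usage lines are assumptions, not the script's actual code.

```python
import math
from typing import Optional


def select_winner(wape_by_model: dict[str, float]) -> Optional[str]:
    """Return the model with the lowest finite WAPE, or None if none is usable.

    Hypothetical helper: NaN/inf scores are skipped rather than compared,
    because NaN never orders correctly against real numbers.
    """
    finite = {name: wape for name, wape in wape_by_model.items() if math.isfinite(wape)}
    if not finite:
        return None
    # min() over the keys, ordered by their WAPE value.
    return min(finite, key=finite.__getitem__)


# Usage with made-up scores (model names are placeholders).
scores = {"naive": 0.31, "seasonal_naive": 0.22, "moving_average": float("nan")}
winner = select_winner(scores)
no_winner = select_winner({"naive": float("nan")})
```

Filtering before calling `min()` keeps the NaN handling explicit; relying on comparison order with NaN present would make the result depend on dict ordering.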
* test(api): unit + integration coverage for run_demo (#128)

Unit (`tests/test_run_demo_unit.py`, 32 cases):
- argparse defaults + all-flags variants
- DemoContext defaults (no leaking state across runs)
- _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None
- _model_config_payload: discriminated-union shape per baseline; rejects unsupported model_type (defends the "no lightgbm in PRP-15" boundary)
- Reporter: glyph mapping; verbose + quiet output; summary green / failure / over-budget soft-warn branches
- StepError formats RFC 7807 (title/detail/request_id) without leaking the raw response body
- HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError
- Step payload sanity: seed sends demo_minimal+correct dims+ISO dates; features sends cutoff_date as ISO; train fires three model_types in parallel; agent step skips with ⏭️ when no LLM key

Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration):
- Skips if Postgres on :5433 isn't reachable
- Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default)
- Runs scripts/run_demo.py --reset against it; asserts exit 0 + canonical "runs=3 winner=... alias=demo-production" summary
- Second case asserts a bogus URL exits 2 (no silent success)
- Cleans up uvicorn on teardown with terminate/kill fallback
- Resolves `uv` via shutil.which to keep ruff S607 happy and avoid PATH-dependent exec at test time

* ci(repo): nightly e2e demo workflow (#128)

Adds .github/workflows/e2e-nightly.yml, which runs scripts/run_demo.py against a fresh Postgres+pgvector service every night at 07:00 UTC (plus on-demand via workflow_dispatch). Catches regressions in the documented end-to-end pipeline before they bleed into the per-PR gate.

Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally NOT a required status check on dev or main. Flake-budget lives in the nightly slot, not in ci.yml.
- pgvector/pgvector:pg16 service container (same as ci.yml `test` job)
- uvicorn started in background; /health polled with a 30 s deadline
- run_demo.py called with --seed 42 (deterministic)
- LLM-key env vars intentionally absent, so the agent step auto-skips via ⏭️, keeping the workflow self-contained
- uvicorn logs uploaded as artifact on failure (7-day retention) so postmortems can read what the API was doing when the script broke
- astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md
- permissions: contents: read (least-privilege)

* docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128)

Discoverability layer for PRP-15.
- README.md: new 'Try it: end-to-end demo' step right after the curl /health verification; shows the canonical final-line summary so reviewers know what green looks like.
- docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section documenting all three Make targets.
- docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common Incidents entry with a 7-point diagnosis flow keyed to the script's step names + a postmortem-capture recipe.
- docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows added to the Document Index table.

Pure additive; no existing content removed or renamed.

* fix(data): update /seeder/scenarios route test for demo_minimal preset (#128)

Companion to feat(data): add demo_minimal scenario preset. The route-level assertion in TestListScenarios.test_returns_scenarios still expected 6 scenarios; bumping to 7 and adding the demo_minimal name membership check to match the service-layer + config-layer tests already updated in 005c189.

* fix(api): harden run_demo for integration test + real DB (#128)

Three real failures surfaced when first running the integration test against docker-compose Postgres + a freshly booted uvicorn; all three are now closed (items 1-3 below; items 4 and 5 are smaller follow-ups):

scripts/run_demo.py:
1. step_status: discover the real (store_id, product_id) from /dimensions/stores + /dimensions/products instead of hardcoding 1. Postgres auto-increment does NOT reset after delete, so the freshly seeded IDs are NOT 1 (they were ~150-260 on this branch after a few delete/seed cycles).
2. step_register: copy the trained-model artifact into the registry's own root (settings.registry_artifact_root) and record a registry-relative URI. The registry verify endpoint resolves artifact_uri against its own root, which is separate from where /forecasting/train writes (settings.forecast_model_artifacts_dir). Pre-fix, verify returned 404 even though the artifact existed on disk.
3. step_agent: skip with the soft-skip glyph on any LLM provider failure (invalid key, model unavailable, 5xx), and make _llm_key_present provider-aware so it matches the right env var to the configured agent_default_model. Pre-fix, an .env with anthropic/openai keys but a Gemini default model failed hard at chat-time.
4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate for demo_minimal can spend 60-90 s on slower laptops once you include inventory + prices + promotions inserts.
5. step_seed detail string: GenerateResult.records_created uses 'sales' (singular), not 'sales_daily'; cosmetic fix.

tests/test_e2e_demo.py:
- Redirect uvicorn stdout to a temp file rather than subprocess.PIPE. The seeder + structlog produce enough INFO log volume to fill a 64-KB pipe buffer; once full, uvicorn blocks on write and seeder requests hang for the full --timeout. Verified locally: integration suite now passes in ~6.5 s instead of timing out at 120 s.
- Cleanup leaves the log file on disk only when the test failed (postmortem-friendly).

tests/test_run_demo_unit.py:
- Bump test_defaults timeout expectation to match the new 120 s default.

End-to-end manual run on this machine: 11 steps, wall_clock=2 s, exit 0. Integration test: 2 passed in 6.48 s.
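The pipe-buffer hang described for tests/test_e2e_demo.py can be avoided by handing the child process a real file instead of subprocess.PIPE. The following is a minimal sketch of that idea, not the test's actual fixture: the helper name `run_with_log_file`, the 30 s timeout, and the noisy child command are assumptions made for illustration.

```python
import os
import subprocess
import sys
import tempfile


def run_with_log_file(cmd: list[str], timeout_s: float = 30.0) -> tuple[int, int]:
    """Run cmd with stdout/stderr redirected to a temp file.

    Returns (exit code, bytes written to the log). Because the child writes
    to a file rather than a pipe, it can never block on a full pipe buffer
    while the parent is busy doing something else.
    """
    with tempfile.TemporaryFile() as log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        returncode = proc.wait(timeout=timeout_s)
        log.seek(0, os.SEEK_END)  # jump to the end to measure the log size
        return returncode, log.tell()


# A stand-in for a chatty server: writes ~200 KB, well past a 64 KB pipe buffer.
noisy_child = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 200_000)"]
code, size = run_with_log_file(noisy_child)
```

With `stdout=subprocess.PIPE` and a parent that calls `wait()` without reading, the same child would stall once the pipe filled; the file redirect removes that coupling and leaves a log behind for postmortems if the caller chooses to keep it.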
* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131)

Headline:
- authlib 1.6.6 -> 1.7.2 (clears GHSA-wvwj-cvrp-7pv5, a JWS signature verification bypass; patched at >= 1.6.9)
- fastmcp 2.14.4 -> 3.2.4 (clears GHSA-vv7q-7jx5-f767, an OpenAPI Provider SSRF + path traversal; patched at >= 3.2.0)

Both CVEs were flagged on PR #129 by Socket Security and are pre-existing on dev (not introduced by #128).

Wider scope (read before merging): `uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers a full re-resolve of the dependency graph. Because dev's uv.lock had drifted from pyproject.toml (the project's constraint envelope had loosened over time), this single command also brings the lockfile in sync with current pyproject.toml. Net diff: 243 insertions / 369 deletions on uv.lock; no other files touched.

Transitive cascades worth flagging:
- anthropic 0.77.0 -> 0.102.0 (pydantic-ai-slim extra)
- pydantic-graph 1.51.0 -> 1.96.0
- temporalio 1.20.0 -> 1.27.2
- alembic 1.18.1 -> 1.18.4
- aws-* and cohere transitives bumped along
- griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched)
- Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus exporter, pydocket, redis, rsa, sortedcontainers; these were transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in.

Verification on this host:
- uv sync --extra dev -> green
- ruff check . -> clean
- mypy --strict app/ -> 192 files clean
- pyright app/ -> 0 errors (50 warnings, pre-existing)
- pytest -m 'not integration' -> 969 passed

Known install quirk: griffelib 2.0.2 ships a top-level `griffe/` package whose RECORD files don't always materialize on first install when uv replaces an older `griffe` dist in the same sync. A clean venv install (which CI does via `uv sync --frozen`) is unaffected; local devs who upgrade in place may need a one-shot `uv pip install --force-reinstall griffelib` if `import griffe` fails.
* feat(api,ui): in-product demo showcase page (#132) (#133)

* feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132)

New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It drives the published API surface in-process via httpx.ASGITransport (no cross-slice imports, satisfying the vertical-slice rule) and streams one StepEvent per pipeline step: precheck -> reset -> seed -> status -> features -> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup. A module-level asyncio.Lock enforces single-flight; concurrent runs get an RFC 7807 409. The orchestration is a faithful in-process port of scripts/run_demo.py (PR #129). Implements PRP-17.

* test(api): cover the demo slice pipeline, routes, and e2e integration (#132)

Unit tests mock the in-process HTTP client to exercise step sequencing, winner selection, and fail-fast; route tests cover POST /demo/run (200 + 409) and the WS /demo/stream handler. The integration test seeds demo_minimal and asserts an end-to-end green run against real Postgres. Implements PRP-17.

* feat(ui): add showcase page streaming the live demo pipeline (#132)

New /showcase route and nav entry. The page opens a one-shot WebSocket to /demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders the 11 pipeline steps as live status cards: glyph, detail, duration, the backtest per-model WAPE breakdown with the winner highlighted, and a pass/fail summary banner. Also block-scopes a pre-existing no-case-declarations lint error in chat.tsx so pnpm lint is green for this PR. Implements PRP-17.

* test(ui): add vitest setup and use-demo-pipeline hook coverage (#132)

Adds the frontend test stack (vitest + jsdom + @testing-library/react), a test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the pure event reducer (idle -> running -> pass transitions, summary assembly, error phase) and a renderHook smoke test.
The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented fix for pnpm 11's esbuild build-script gate. Implements PRP-17.

* docs(docs): document the demo slice and showcase page (#132)

Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the /demo/run + /demo/stream rows and a WebSocket Events section in API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and REPO_MAP_INDEX rows for the demo slice and showcase page. Implements PRP-17.

* chore(main): release 0.2.10 (#135)

* feat: release v0.2.11 — visualization fixes, job picker, demo showcase (#158)

…
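The single-flight behaviour described for the /demo slice (a module-level asyncio.Lock, with a concurrent run rejected as a 409) can be sketched in plain asyncio. This is a hedged illustration rather than the slice's code: `PipelineBusyError`, `run_single_flight`, and the stand-in pipeline are hypothetical names, and a real route handler would translate the exception into an RFC 7807 409 response.

```python
import asyncio
from typing import Awaitable, Callable


class PipelineBusyError(Exception):
    """Hypothetical error a route handler would map to HTTP 409."""


# Module-level lock: at most one pipeline run at a time in this process.
_run_lock = asyncio.Lock()


async def run_single_flight(pipeline: Callable[[], Awaitable[str]]) -> str:
    """Run pipeline() unless another run holds the lock; reject instead of queueing."""
    if _run_lock.locked():
        raise PipelineBusyError("a demo run is already in progress")
    # No await happens between the check above and the acquire here,
    # so another task cannot slip in between the two.
    async with _run_lock:
        return await pipeline()


async def _demo() -> tuple[str, str]:
    async def slow_pipeline() -> str:
        await asyncio.sleep(0.05)  # stand-in for the real multi-step run
        return "pass"

    first = asyncio.create_task(run_single_flight(slow_pipeline))
    await asyncio.sleep(0)  # yield once so the first run takes the lock
    try:
        await run_single_flight(slow_pipeline)
        second = "accepted"
    except PipelineBusyError:
        second = "rejected"
    return await first, second


results = asyncio.run(_demo())
```

Rejecting when the lock is held, instead of awaiting it, is what makes the second caller fail fast rather than silently queue behind a long-running demo.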

w7-mgfcode added 4 commits May 14, 2026 06:11

sourcery-ai Bot reviewed May 18, 2026

w7-mgfcode merged commit 2ea68ae into main May 18, 2026
11 of 12 checks passed

This was referenced May 18, 2026

chore(main): release 0.2.10 #135

Merged

chore(repo): back-merge main into dev to absorb v0.2.10 release commits (#136) #137

Merged

coderabbitai Bot mentioned this pull request May 18, 2026

feat: release v0.2.14 — UI interactivity, AI admin console, agent reliability fixes #201

Merged

w7-mgfcode merged 4 commits into main from dev

w7-mgfcode commented May 18, 2026 (edited by coderabbitai Bot)

coderabbitai Bot commented May 18, 2026 (edited)

Review failed


sourcery-ai Bot left a comment

socket-security Bot commented May 18, 2026
