From ae37ca521eb9510c135def4a1e3730e137fb014b Mon Sep 17 00:00:00 2001
From: Gabor Szabo <168316277+w7-mgfcode@users.noreply.github.com>
Date: Mon, 18 May 2026 21:51:32 +0200
Subject: [PATCH 1/2] =?UTF-8?q?feat:=20cut=20v0.2.13=20=E2=80=94=20explore?=
 =?UTF-8?q?r=20interactivity,=20knowledge=20&=20guide=20pages=20(#191)=20(?=
 =?UTF-8?q?#192)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127)

Resolves #94 via the heuristic path documented in the issue. No schema column, no Alembic migration, no FIFO cohort tracking: the trigger self-reads the existing per-(store, product) on_hand_qty series and fires when inventory has been "unrefreshed" past `cfg.age_days_threshold`.

Decision rationale (schema column vs. heuristic):
- The schema column path would add `oldest_unit_age_days` to `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO cohort tracking in `InventorySnapshotGenerator`. No downstream consumer reads this column today; adding it for one generator trigger violates the "don't design for hypothetical future requirements" rule in CLAUDE.md.
- The heuristic path is self-contained in MarkdownGenerator, deterministic (preserves the zero-rng-draw regression invariant), and additive (no migration, no model change). 354 LOC net, all inside one slice.

Heuristic spec:
- A "refresh" is a day where `on_hand_qty` rose by >= `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day.
- Age at day t = days since most recent refresh (or `dates[0]` if no refresh has been observed).
- Firing requires age >= `age_days_threshold` AND on_hand >= `markdown_min_units_remaining`, so an empty shelf is never marked down.
- After firing, refresh anchor resets to the day AFTER the markdown window ends, so back-to-back fires can't happen and the next age clock starts from a "clear shelf" baseline.
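For reviewers, the heuristic spec above can be sketched roughly as follows. This is an illustration only, not the code in MarkdownGenerator: the function name, the `markdown_duration_days` parameter, the choice to treat a restock from zero as a refresh, and the choice to ignore spikes while a markdown window is open are all assumptions made for the sketch.

```python
from __future__ import annotations

# Assumed mirror of _AGE_DAYS_SPIKE_THRESHOLD (0.3 = 30% day-over-day jump).
AGE_DAYS_SPIKE_THRESHOLD = 0.3


def age_days_fire_indices(
    on_hand: list[int],
    age_days_threshold: int,
    min_units_remaining: int,
    markdown_duration_days: int,  # hypothetical: length of the markdown window
) -> list[int]:
    """Return the day indices at which the age_days trigger would fire."""
    fires: list[int] = []
    anchor = 0  # index of the latest refresh; day 0 until a refresh is seen
    for t, qty in enumerate(on_hand):
        if t < anchor:
            # Assumption: days inside an open markdown window are skipped.
            continue
        if t > 0:
            prev = on_hand[t - 1]
            if prev == 0:
                # Assumption: any restock from an empty shelf is a refresh.
                refreshed = qty > 0
            else:
                refreshed = (qty - prev) / prev >= AGE_DAYS_SPIKE_THRESHOLD
            if refreshed:
                anchor = t
        age = t - anchor
        if age >= age_days_threshold and qty >= min_units_remaining:
            fires.append(t)
            # Reset the anchor to the day after the markdown window ends.
            anchor = t + markdown_duration_days
    return fires
```

No random draws are involved, which is consistent with the zero-rng-draw invariant the commit describes.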
Wiring: `MarkdownGenerator.generate()` gains an optional kwarg `inventory_records: list[dict[str, Any]] | None = None` which `core.py` passes through from `InventorySnapshotGenerator`. Disabled-path and non-age_days-path behavior is byte-identical (kwarg ignored). Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete `NotImplementedError` test. Coverage: no-records defensive, threshold not-met, threshold met, spike resets age, post-fire reset avoids back-to-back, low-inventory skip, unknown-product skip, rng non-consumption. Validation (local): - ruff check + format: clean - mypy --strict: 0 issues, 192 files - pyright --strict: 0 errors - pytest -m "not integration": 969 passed (+7 vs pre-PR) Closes #94. * feat(api,docs): e2e demo pipeline + showcase script (#128) (#129) * docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128) Adds the planning documents for the end-to-end demo pipeline work tracked in #128. Implementation commits follow on this branch. - INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics, open questions resolved in the PRP). - PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6 commits, additive only — no schema changes, no API edits). * feat(data): add demo_minimal scenario preset (#128) Tiny preset that powers the upcoming `make demo` target. Three stores × ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy an expanding backtest with n_splits=3, horizon=14, min_train_size=30 (needs >= 72 days, 92 leaves margin), small enough to keep the demo loop comfortable on a laptop. Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10, modest promotion + stockout probabilities) so backtest WAPE stays non-NaN across all three baseline models. 
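The sizing argument above (92 days against a 72-day minimum) can be checked with a few lines. This is a sketch under the assumption that an expanding backtest needs `min_train_size` days plus one non-overlapping `horizon`-day test window per split, with no gap; both helper names are hypothetical.

```python
from datetime import date


def min_days_required(n_splits: int, horizon: int, min_train_size: int) -> int:
    """Minimum series length, assuming contiguous non-overlapping test windows."""
    return min_train_size + n_splits * horizon


def days_inclusive(start: date, end: date) -> int:
    """Number of calendar days in [start, end], counting both endpoints."""
    return (end - start).days + 1


required = min_days_required(n_splits=3, horizon=14, min_train_size=30)
available = days_inclusive(date(2024, 10, 1), date(2024, 12, 31))
margin = available - required
```

With the values quoted in the commit, `required` is 72, `available` is 92, and the margin is 20 days.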
- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch - app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios - tests cover the new preset and the updated scenario count * feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128) Single-file async driver that walks the published HTTP surface (precheck -> reset? -> seed -> status -> features -> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the shape of scripts/seed_random.py and scripts/check_db.py. - HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s timeout (default 5 s is too short for /seeder/generate); surfaces RFC 7807 problem+json bodies as a typed StepError that echoes title / detail / request_id (never the raw body — secrets-safe). - DemoContext + StepOutcome dataclasses thread cross-step references. - Reporter renders the output-formatting.md glyphs (verbose by default, --quiet collapses to one line per step). - Per-step error handling converts httpx + StepError into fail outcomes; precheck failure exits 2, any other failure exits 1, green exits 0. - Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings() to honor the no-os.environ-in-feature-code rule. - Registry handshake uses the mandatory pending -> running -> success transition and the wire alias "model_config" (not "model_config_data"); artifact_hash is computed client-side via sha256 since we share the FS with the API on this single-host system. - Winner selection: lowest aggregated WAPE, skipping NaN folds. Also adds scripts/__init__.py so tests can `import scripts.run_demo` without invoking the file as a script. * feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128) Wraps scripts/run_demo.py so reviewers can run the full end-to-end demo with one command. Recipes mirror the three modes the script supports: full run, skip-seed iteration, destructive reset. 
Make targets: - demo — docker compose up -d + alembic + run_demo - demo-quick — run_demo --skip-seed (no compose/migration touch) - demo-clean — full reset (--reset) before seeding - help — default goal; lists targets + preconditions Tab-indented recipes and .PHONY declarations per make conventions. Preconditions (Postgres on :5433, uvicorn on :8123) documented in the help block; the script itself enforces them via the precheck step and exits 2 on failure. * test(api): unit + integration coverage for run_demo (#128) Unit (`tests/test_run_demo_unit.py`, 32 cases): - argparse defaults + all-flags variants - DemoContext defaults (no leaking state across runs) - _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None - _model_config_payload: discriminated-union shape per baseline; rejects unsupported model_type (defends the "no lightgbm in PRP-15" boundary) - Reporter: glyph mapping; verbose + quiet output; summary green / failure / over-budget soft-warn branches - StepError formats RFC 7807 (title/detail/request_id) without leaking the raw response body - HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError - Step payload sanity: seed sends demo_minimal+correct dims+ISO dates; features sends cutoff_date as ISO; train fires three model_types in parallel; agent step skips with ⏭️ when no LLM key Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration): - Skips if Postgres on :5433 isn't reachable - Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default) - Runs scripts/run_demo.py --reset against it; asserts exit 0 + canonical "runs=3 winner=... 
alias=demo-production" summary - Second case asserts a bogus URL exits 2 (no silent success) - Cleans up uvicorn on teardown with terminate/kill fallback - Resolves `uv` via shutil.which to keep ruff S607 happy and avoid PATH-dependent exec at test time * ci(repo): nightly e2e demo workflow (#128) Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py against a fresh Postgres+pgvector service every night at 07:00 UTC (plus on-demand via workflow_dispatch). Catches regressions in the documented end-to-end pipeline before they bleed into the per-PR gate. Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally NOT a required status check on dev or main. Flake-budget lives in the nightly slot, not in ci.yml. - pgvector/pgvector:pg16 service container (same as ci.yml `test` job) - uvicorn started in background; /health polled with a 30 s deadline - run_demo.py called with --seed 42 (deterministic) - LLM-key env vars intentionally absent — agent step auto-skips via ⏭️, keeping the workflow self-contained - uvicorn logs uploaded as artifact on failure (7-day retention) so postmortems can read what the API was doing when the script broke - astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md - permissions: contents: read (least-privilege) * docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128) Discoverability layer for PRP-15. - README.md: new 'Try it: end-to-end demo' step right after the curl /health verification; shows the canonical final-line summary so reviewers know what green looks like. - docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section documenting all three Make targets. - docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common Incidents entry with a 7-point diagnosis flow keyed to the script's step names + a postmortem-capture recipe. - docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows added to the Document Index table. 
Pure additive; no existing content removed or renamed.

* fix(data): update /seeder/scenarios route test for demo_minimal preset (#128)

Companion to feat(data): add demo_minimal scenario preset. The route-level assertion in TestListScenarios.test_returns_scenarios still expected 6 scenarios; this bumps it to 7 and adds the demo_minimal name membership check, matching the service-layer + config-layer tests already updated in 005c189.

* fix(api): harden run_demo for integration test + real DB (#128)

Three real failures surfaced when first running the integration test against docker-compose Postgres + a freshly booted uvicorn; all three are now closed (items 1-3). Items 4-5 are smaller adjustments made in the same pass.

scripts/run_demo.py:
1. step_status: discover the real (store_id, product_id) from /dimensions/stores + /dimensions/products instead of hardcoding 1. Postgres auto-increment does NOT reset after delete, so the freshly seeded IDs are NOT 1 (they were ~150-260 on this branch after a few delete/seed cycles).
2. step_register: copy the trained-model artifact into the registry's own root (settings.registry_artifact_root) and record a registry-relative URI. The registry verify endpoint resolves artifact_uri against its own root, which is separate from where /forecasting/train writes (settings.forecast_model_artifacts_dir). Pre-fix, verify returned 404 even though the artifact existed on disk.
3. step_agent: skip with the soft-skip glyph on any LLM provider failure (invalid key, model unavailable, 5xx), and make _llm_key_present provider-aware so it matches the right env var to the configured agent_default_model. Pre-fix, an .env with anthropic/openai keys but a Gemini default model failed hard at chat-time.
4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate for demo_minimal can spend 60-90 s on slower laptops once you include inventory + prices + promotions inserts.
5. step_seed detail string: GenerateResult.records_created uses 'sales' (singular), not 'sales_daily'; cosmetic fix.
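Item 2 above (copy the artifact into the registry root, then record a registry-relative URI) might look roughly like this. It is a minimal sketch, not the code in step_register: `stage_artifact`, the per-run subdirectory layout, and the `run_id` argument are hypothetical, and the sha256 digest mirrors the client-side hashing mentioned for the registry handshake.

```python
import hashlib
import shutil
from pathlib import Path


def stage_artifact(source: Path, registry_root: Path, run_id: str) -> tuple[str, str]:
    """Copy `source` under `registry_root` and return (relative_uri, sha256_hex).

    Hypothetical layout: <registry_root>/<run_id>/<original file name>.
    """
    dest_dir = registry_root / run_id
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / source.name
    shutil.copy2(source, dest)  # copies file contents and metadata
    digest = hashlib.sha256(dest.read_bytes()).hexdigest()
    # A URI relative to the registry root, so a verifier that resolves
    # against that root can find the file.
    relative_uri = dest.relative_to(registry_root).as_posix()
    return relative_uri, digest
```

Returning a path relative to the registry root, rather than the absolute training-output path, is what lets a verifier that resolves URIs against its own root locate the file.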
tests/test_e2e_demo.py: - Redirect uvicorn stdout to a temp file rather than subprocess.PIPE. The seeder + structlog produce enough INFO log volume to fill a 64-KB pipe buffer; once full, uvicorn blocks on write and seeder requests hang for the full --timeout. Verified locally: integration suite now passes in ~6.5 s instead of timing out at 120 s. - Cleanup leaves the log file on disk only when the test failed (postmortem-friendly). tests/test_run_demo_unit.py: - Bump test_defaults timeout expectation to match the new 120 s default. End-to-end manual run on this machine: 11 steps, wall_clock=2 s, exit 0. Integration test: 2 passed in 6.48 s. * chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131) Headline: - authlib 1.6.6 -> 1.7.2 (clears GHSA-wvwj-cvrp-7pv5 — JWS signature verification bypass; patched at >= 1.6.9) - fastmcp 2.14.4 -> 3.2.4 (clears GHSA-vv7q-7jx5-f767 — OpenAPI Provider SSRF + path traversal; patched at >= 3.2.0) Both CVEs were flagged on PR #129 by Socket Security and are pre-existing on dev (not introduced by #128). Wider scope — read before merging: `uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers a full re-resolve of the dependency graph. Because dev's uv.lock had drifted from pyproject.toml (the project's constraint envelope had loosened over time), this single command also brings the lockfile in sync with current pyproject.toml. Net diff: 243 insertions / 369 deletions on uv.lock; no other files touched. Transitive cascades worth flagging: - anthropic 0.77.0 -> 0.102.0 (pydantic-ai-slim extra) - pydantic-graph 1.51.0 -> 1.96.0 - temporalio 1.20.0 -> 1.27.2 - alembic 1.18.1 -> 1.18.4 - aws-* and cohere transitives bumped along - griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched) - Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus exporter, pydocket, redis, rsa, sortedcontainers — these were transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in. 
Verification on this host: - uv sync --extra dev -> green - ruff check . -> clean - mypy --strict app/ -> 192 files clean - pyright app/ -> 0 errors (50 warnings, pre-existing) - pytest -m 'not integration' -> 969 passed Known install quirk: griffelib 2.0.2 ships a top-level `griffe/` package whose RECORD files don't always materialize on first install when uv replaces an older `griffe` dist in the same sync. A clean venv install (which CI does via `uv sync --frozen`) is unaffected; local devs who upgrade in place may need a one-shot `uv pip install --force-reinstall griffelib` if `import griffe` fails. * feat(api,ui): in-product demo showcase page (#132) (#133) * feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132) New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It drives the published API surface in-process via httpx.ASGITransport (no cross-slice imports, satisfying the vertical-slice rule) and streams one StepEvent per pipeline step: precheck -> reset -> seed -> status -> features -> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup. A module-level asyncio.Lock enforces single-flight; concurrent runs get an RFC 7807 409. The orchestration is a faithful in-process port of scripts/run_demo.py (PR #129). Implements PRP-17. * test(api): cover the demo slice pipeline, routes, and e2e integration (#132) Unit tests mock the in-process HTTP client to exercise step sequencing, winner selection, and fail-fast; route tests cover POST /demo/run (200 + 409) and the WS /demo/stream handler. The integration test seeds demo_minimal and asserts an end-to-end green run against real Postgres. Implements PRP-17. * feat(ui): add showcase page streaming the live demo pipeline (#132) New /showcase route and nav entry. 
The page opens a one-shot WebSocket to /demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders the 11 pipeline steps as live status cards: glyph, detail, duration, the backtest per-model WAPE breakdown with the winner highlighted, and a pass/fail summary banner. Also block-scopes a pre-existing no-case-declarations lint error in chat.tsx so pnpm lint is green for this PR. Implements PRP-17. * test(ui): add vitest setup and use-demo-pipeline hook coverage (#132) Adds the frontend test stack (vitest + jsdom + @testing-library/react), a test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the pure event reducer (idle -> running -> pass transitions, summary assembly, error phase) and a renderHook smoke test. The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented fix for pnpm 11's esbuild build-script gate. Implements PRP-17. * docs(docs): document the demo slice and showcase page (#132) Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the /demo/run + /demo/stream rows and a WebSocket Events section in API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and REPO_MAP_INDEX rows for the demo slice and showcase page. Implements PRP-17. * chore(repo): back-merge main into dev to absorb v0.2.10 release commits (#136) (#137) * chore(main): release 0.2.9 (#126) * feat: release v0.2.10 — demo showcase page + e2e pipeline (#134)
* chore(main): release 0.2.10 (#135) * docs(docs): fix broken PRP-0/INITIAL-0 relative links in phase 0 doc (#138) (#139) * docs(repo): fix readme dev-deps command and stale .github template placeholders (#140) (#141) * docs(docs): fill DEV_GUIDE.md onboarding stub sections (#142) (#143) * docs(repo): refresh stale CLAUDE.md note, document /demo API, align PR template (#144) (#145) * fix(ui): chart series render black — drop hsl() wrapper on oklch chart vars (#149) (#153) index.css defines --chart-N twice: legacy shadcn-v3 HSL triplets and Tailwind-4 / shadcn-v4 complete oklch() colours. The oklch definitions win the cascade, so at runtime --chart-N is a full colour. The chart components still wrapped it as hsl(var(--chart-N)) → hsl(oklch(...)), which is invalid CSS, so recharts fell back to a black fill/stroke — invisible on the dark theme. Reference var(--chart-N) directly in backtest-folds-chart.tsx and time-series-chart.tsx. Verified in a browser: the backtest per-fold bars and the forecast line now render in colour. * fix(jobs): backtest job result keeps fold metrics, stability and baselines (#148) (#152) * fix(jobs): backtest job result keeps fold metrics, stability and baselines (#148) _execute_backtest ran BacktestingService.run_backtest — which computes per-fold metrics, stability indices and a naive/seasonal baseline comparison — but stored only four aggregated values and discarded the rest.
The dashboard (/visualize/backtest) reads aggregated_metrics.{*_mean, stability_index}, fold_metrics[] and baseline_comparison, so it showed "0 folds", all-zero metrics and an empty chart. Add _shape_backtest_result(), which flattens a BacktestResponse into the contract the dashboard expects, and _finite(), which coerces NaN/inf to 0.0 so the result stays JSONB-safe (stability is NaN with fewer than two folds). Add app/features/jobs/tests/test_service.py with unit coverage for the shaping logic: fold metrics, *_mean keys, stability, baseline comparison, the no-baselines path, and NaN coercion. * refactor(jobs): centralize backtest metric keys and surface drift (#148) Addresses review feedback on PR #152. - Hoist the dashboard's metric set into _BACKTEST_METRICS and the headline stability metric into _STABILITY_METRIC, so the hardcoded keys live in one documented place instead of being repeated across the shaping logic. - Log jobs.backtest_metrics_missing when an expected metric is absent from the backtest response, so a future rename in the backtesting service fails loud instead of silently emitting 0.0. - Document the WAPE stability convention in the _shape_backtest_result docstring. - Tests: assert backtest_id / model_type / duration_ms pass through unchanged, and add a regression test for the missing-metric default path. * fix(ui): forecast page reads forecasts/forecast from predict job result (#147) (#151) The /visualize/forecast page never rendered the chart for a valid completed predict job. It read job.result.predictions with field `predicted`, but POST /jobs (job_type="predict") returns job.result.forecasts with field `forecast`. forecastData was therefore always undefined and the page fell through to "No prediction data available in job result". Read result.forecasts with field `forecast`, and pass predictedKey="forecast" to TimeSeriesChart (which already supports a configurable data key). 
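A minimal sketch of the corrected read path, written in Python against the job payload rather than the page's own TypeScript. Only the `forecasts` list and its `forecast` field come from the description above; the `date` key, the helper name and the sample payload are assumptions made for illustration.

```python
from typing import Any


def extract_forecast_series(job: dict[str, Any]) -> list[tuple[str, float]]:
    """Return (date, forecast) pairs from a completed predict job.

    Reads job["result"]["forecasts"] and the per-point "forecast" field,
    the keys named in the fix above. The "date" key is an assumption.
    """
    result = job.get("result") or {}
    points = result.get("forecasts") or []
    return [(p["date"], float(p["forecast"])) for p in points]


# Hypothetical payload, shaped after the description above.
job = {
    "result": {
        "forecasts": [
            {"date": "2025-01-01", "forecast": 12.5},
            {"date": "2025-01-02", "forecast": 13.0},
        ]
    }
}
series = extract_forecast_series(job)

# The old keys ("predictions" / "predicted") are absent from the payload,
# so a reader that relies on them finds no data to chart.
legacy = job["result"].get("predictions")
```

Reading the old `predictions` key yields nothing here, which matches the "No prediction data available in job result" fall-through described above.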
Verified in a browser: entering a completed predict job ID now renders the 14-day forecast line chart with correct tooltip values. * fix(registry): tolerate multiple matches in _find_duplicate (#146) (#150) Under the default registry_duplicate_policy="detect", duplicate runs are created intentionally, so multiple non-archived model_run rows can share one config hash. _find_duplicate used scalar_one_or_none(), which raised MultipleResultsFound once two duplicates existed — POST /registry/runs then returned HTTP 500. This made the demo/Showcase register step fail deterministically on any DB with repeated runs. Order the lookup by created_at desc, LIMIT 1, and use scalars().first() so it returns the most recent matching run instead of asserting a single match. Add an integration regression test that POSTs an identical run three times under the detect policy and asserts all three return 201. * fix(ui): derive TimeSeriesChart line stroke from the config key (#156) (#157) TimeSeriesChart builds chartConfig with dynamic keys ([actualKey] / [predictedKey]), so shadcn's ChartContainer injects per-key --color-* CSS variables. The line elements, however, hardcoded stroke="var(--color-actual)" and stroke="var(--color-predicted)". The forecast page passes predictedKey="forecast", so the injected variable is --color-forecast; var(--color-predicted) was undefined, the stroke was invalid, and SVG fell back to its initial value `none` — the forecast line was invisible. Build the stroke from the key: stroke={`var(--color-${actualKey})`} / stroke={`var(--color-${predictedKey})`}. Verified in a browser: the forecast line now renders in colour. * feat(ui): job picker dropdown on forecast and backtest pages (#154) (#155) The visualization pages only accepted a job ID typed into a text box, so users had to already know the ID. Add a JobPicker component: a dropdown of completed jobs of the relevant type (predict / backtest), newest first, with each option labelled by short id, model and timestamp.
- New shared component src/components/common/job-picker.tsx, used by both forecast.tsx and backtest.tsx. - The manual job-ID input stays alongside the dropdown for pasting an ID. - The most recent completed job auto-loads on mount so a chart shows immediately without interaction. No backend change — GET /jobs?job_type=&status=completed already exists. Verified in a browser on both pages. * chore(repo): back-merge main into dev after v0.2.11 (#160) (#161) * chore(main): release 0.2.9 (#126) * feat: release v0.2.10 — demo showcase page + e2e pipeline (#134) * feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127) Resolves #94 via the heuristic path documented in the issue. No schema column, no Alembic migration, no FIFO cohort tracking — the trigger self-reads the existing per-(store, product) on_hand_qty series and fires when inventory has been "unrefreshed" past `cfg.age_days_threshold`. Decision rationale (schema column vs. heuristic): - The schema column path would add `oldest_unit_age_days` to `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO cohort tracking in `InventorySnapshotGenerator`. No downstream consumer reads this column today — adding it for one generator trigger violates the "don't design for hypothetical future requirements" rule in CLAUDE.md. - The heuristic path is self-contained in MarkdownGenerator, deterministic (preserves the zero-rng-draw regression invariant), and additive (no migration, no model change). 354 LOC net, all inside one slice. Heuristic spec: - A "refresh" is a day where `on_hand_qty` rose by >= `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day. - Age at day t = days since most recent refresh (or `dates[0]` if no refresh has been observed). - Firing requires age >= `age_days_threshold` AND on_hand >= `markdown_min_units_remaining` — never markdown an empty shelf. 
- After firing, refresh anchor resets to the day AFTER the markdown window ends, so back-to-back fires can't happen and the next age clock starts from a "clear shelf" baseline. Wiring: `MarkdownGenerator.generate()` gains an optional kwarg `inventory_records: list[dict[str, Any]] | None = None` which `core.py` passes through from `InventorySnapshotGenerator`. Disabled-path and non-age_days-path behavior is byte-identical (kwarg ignored). Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete `NotImplementedError` test. Coverage: no-records defensive, threshold not-met, threshold met, spike resets age, post-fire reset avoids back-to-back, low-inventory skip, unknown-product skip, rng non-consumption. Validation (local): - ruff check + format: clean - mypy --strict: 0 issues, 192 files - pyright --strict: 0 errors - pytest -m "not integration": 969 passed (+7 vs pre-PR) Closes #94. * feat(api,docs): e2e demo pipeline + showcase script (#128) (#129) * docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128) Adds the planning documents for the end-to-end demo pipeline work tracked in #128. Implementation commits follow on this branch. - INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics, open questions resolved in the PRP). - PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6 commits, additive only — no schema changes, no API edits). * feat(data): add demo_minimal scenario preset (#128) Tiny preset that powers the upcoming `make demo` target. Three stores × ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy an expanding backtest with n_splits=3, horizon=14, min_train_size=30 (needs >= 72 days, 92 leaves margin), small enough to keep the demo loop comfortable on a laptop. Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10, modest promotion + stockout probabilities) so backtest WAPE stays non-NaN across all three baseline models. 
- app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch - app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios - tests cover the new preset and the updated scenario count * feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128) Single-file async driver that walks the published HTTP surface (precheck -> reset? -> seed -> status -> features -> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the shape of scripts/seed_random.py and scripts/check_db.py. - HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s timeout (default 5 s is too short for /seeder/generate); surfaces RFC 7807 problem+json bodies as a typed StepError that echoes title / detail / request_id (never the raw body — secrets-safe). - DemoContext + StepOutcome dataclasses thread cross-step references. - Reporter renders the output-formatting.md glyphs (verbose by default, --quiet collapses to one line per step). - Per-step error handling converts httpx + StepError into fail outcomes; precheck failure exits 2, any other failure exits 1, green exits 0. - Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings() to honor the no-os.environ-in-feature-code rule. - Registry handshake uses the mandatory pending -> running -> success transition and the wire alias "model_config" (not "model_config_data"); artifact_hash is computed client-side via sha256 since we share the FS with the API on this single-host system. - Winner selection: lowest aggregated WAPE, skipping NaN folds. Also adds scripts/__init__.py so tests can `import scripts.run_demo` without invoking the file as a script. * feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128) Wraps scripts/run_demo.py so reviewers can run the full end-to-end demo with one command. Recipes mirror the three modes the script supports: full run, skip-seed iteration, destructive reset. 
Make targets: - demo — docker compose up -d + alembic + run_demo - demo-quick — run_demo --skip-seed (no compose/migration touch) - demo-clean — full reset (--reset) before seeding - help — default goal; lists targets + preconditions Tab-indented recipes and .PHONY declarations per make conventions. Preconditions (Postgres on :5433, uvicorn on :8123) documented in the help block; the script itself enforces them via the precheck step and exits 2 on failure. * test(api): unit + integration coverage for run_demo (#128) Unit (`tests/test_run_demo_unit.py`, 32 cases): - argparse defaults + all-flags variants - DemoContext defaults (no leaking state across runs) - _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None - _model_config_payload: discriminated-union shape per baseline; rejects unsupported model_type (defends the "no lightgbm in PRP-15" boundary) - Reporter: glyph mapping; verbose + quiet output; summary green / failure / over-budget soft-warn branches - StepError formats RFC 7807 (title/detail/request_id) without leaking the raw response body - HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError - Step payload sanity: seed sends demo_minimal+correct dims+ISO dates; features sends cutoff_date as ISO; train fires three model_types in parallel; agent step skips with ⏭️ when no LLM key Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration): - Skips if Postgres on :5433 isn't reachable - Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default) - Runs scripts/run_demo.py --reset against it; asserts exit 0 + canonical "runs=3 winner=... 
alias=demo-production" summary - Second case asserts a bogus URL exits 2 (no silent success) - Cleans up uvicorn on teardown with terminate/kill fallback - Resolves `uv` via shutil.which to keep ruff S607 happy and avoid PATH-dependent exec at test time * ci(repo): nightly e2e demo workflow (#128) Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py against a fresh Postgres+pgvector service every night at 07:00 UTC (plus on-demand via workflow_dispatch). Catches regressions in the documented end-to-end pipeline before they bleed into the per-PR gate. Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally NOT a required status check on dev or main. Flake-budget lives in the nightly slot, not in ci.yml. - pgvector/pgvector:pg16 service container (same as ci.yml `test` job) - uvicorn started in background; /health polled with a 30 s deadline - run_demo.py called with --seed 42 (deterministic) - LLM-key env vars intentionally absent — agent step auto-skips via ⏭️, keeping the workflow self-contained - uvicorn logs uploaded as artifact on failure (7-day retention) so postmortems can read what the API was doing when the script broke - astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md - permissions: contents: read (least-privilege) * docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128) Discoverability layer for PRP-15. - README.md: new 'Try it: end-to-end demo' step right after the curl /health verification; shows the canonical final-line summary so reviewers know what green looks like. - docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section documenting all three Make targets. - docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common Incidents entry with a 7-point diagnosis flow keyed to the script's step names + a postmortem-capture recipe. - docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows added to the Document Index table. 
Pure additive; no existing content removed or renamed. * fix(data): update /seeder/scenarios route test for demo_minimal preset (#128) Companion to feat(data): add demo_minimal scenario preset — the route-level assertion in TestListScenarios.test_returns_scenarios still expected 6 scenarios; bumping to 7 and adding the demo_minimal name membership check to match the service-layer + config-layer tests already updated in 005c189. * fix(api): harden run_demo for integration test + real DB (#128) Three real failures surfaced when first running the integration test against docker-compose Postgres + a freshly booted uvicorn; all three are now closed: scripts/run_demo.py: 1. step_status: discover the real (store_id, product_id) from /dimensions/stores + /dimensions/products instead of hardcoding 1. Postgres auto-increment does NOT reset after delete, so the freshly seeded IDs are NOT 1 (they were ~150-260 on this branch after a few delete/seed cycles). 2. step_register: copy the trained-model artifact into the registry's own root (settings.registry_artifact_root) and record a registry- relative URI. The registry verify endpoint resolves artifact_uri against its own root, which is separate from where /forecasting/train writes (settings.forecast_model_artifacts_dir). Pre-fix, verify returned 404 even though the artifact existed on disk. 3. step_agent: skip with the soft-skip glyph on any LLM provider failure (invalid key, model unavailable, 5xx), and make _llm_key_present provider-aware so it matches the right env var to the configured agent_default_model. Pre-fix, an .env with anthropic/openai keys but a Gemini default model failed hard at chat-time. 4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate for demo_minimal can spend 60-90 s on slower laptops once you include inventory + prices + promotions inserts. 5. step_seed detail string: GenerateResult.records_created uses 'sales' (singular), not 'sales_daily'; cosmetic fix. 
tests/test_e2e_demo.py: - Redirect uvicorn stdout to a temp file rather than subprocess.PIPE. The seeder + structlog produce enough INFO log volume to fill a 64-KB pipe buffer; once full, uvicorn blocks on write and seeder requests hang for the full --timeout. Verified locally: integration suite now passes in ~6.5 s instead of timing out at 120 s. - Cleanup leaves the log file on disk only when the test failed (postmortem-friendly). tests/test_run_demo_unit.py: - Bump test_defaults timeout expectation to match the new 120 s default. End-to-end manual run on this machine: 11 steps, wall_clock=2 s, exit 0. Integration test: 2 passed in 6.48 s. * chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131) Headline: - authlib 1.6.6 -> 1.7.2 (clears GHSA-wvwj-cvrp-7pv5 — JWS signature verification bypass; patched at >= 1.6.9) - fastmcp 2.14.4 -> 3.2.4 (clears GHSA-vv7q-7jx5-f767 — OpenAPI Provider SSRF + path traversal; patched at >= 3.2.0) Both CVEs were flagged on PR #129 by Socket Security and are pre-existing on dev (not introduced by #128). Wider scope — read before merging: `uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers a full re-resolve of the dependency graph. Because dev's uv.lock had drifted from pyproject.toml (the project's constraint envelope had loosened over time), this single command also brings the lockfile in sync with current pyproject.toml. Net diff: 243 insertions / 369 deletions on uv.lock; no other files touched. Transitive cascades worth flagging: - anthropic 0.77.0 -> 0.102.0 (pydantic-ai-slim extra) - pydantic-graph 1.51.0 -> 1.96.0 - temporalio 1.20.0 -> 1.27.2 - alembic 1.18.1 -> 1.18.4 - aws-* and cohere transitives bumped along - griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched) - Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus exporter, pydocket, redis, rsa, sortedcontainers — these were transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in. 
Verification on this host: - uv sync --extra dev -> green - ruff check . -> clean - mypy --strict app/ -> 192 files clean - pyright app/ -> 0 errors (50 warnings, pre-existing) - pytest -m 'not integration' -> 969 passed Known install quirk: griffelib 2.0.2 ships a top-level `griffe/` package whose RECORD files don't always materialize on first install when uv replaces an older `griffe` dist in the same sync. A clean venv install (which CI does via `uv sync --frozen`) is unaffected; local devs who upgrade in place may need a one-shot `uv pip install --force-reinstall griffelib` if `import griffe` fails. * feat(api,ui): in-product demo showcase page (#132) (#133) * feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132) New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It drives the published API surface in-process via httpx.ASGITransport (no cross-slice imports, satisfying the vertical-slice rule) and streams one StepEvent per pipeline step: precheck -> reset -> seed -> status -> features -> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup. A module-level asyncio.Lock enforces single-flight; concurrent runs get an RFC 7807 409. The orchestration is a faithful in-process port of scripts/run_demo.py (PR #129). Implements PRP-17. * test(api): cover the demo slice pipeline, routes, and e2e integration (#132) Unit tests mock the in-process HTTP client to exercise step sequencing, winner selection, and fail-fast; route tests cover POST /demo/run (200 + 409) and the WS /demo/stream handler. The integration test seeds demo_minimal and asserts an end-to-end green run against real Postgres. Implements PRP-17. * feat(ui): add showcase page streaming the live demo pipeline (#132) New /showcase route and nav entry. 
The page opens a one-shot WebSocket to /demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders the 11 pipeline steps as live status cards: glyph, detail, duration, the backtest per-model WAPE breakdown with the winner highlighted, and a pass/fail summary banner. Also block-scopes a pre-existing no-case-declarations lint error in chat.tsx so pnpm lint is green for this PR. Implements PRP-17. * test(ui): add vitest setup and use-demo-pipeline hook coverage (#132) Adds the frontend test stack (vitest + jsdom + @testing-library/react), a test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the pure event reducer (idle -> running -> pass transitions, summary assembly, error phase) and a renderHook smoke test. The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented fix for pnpm 11's esbuild build-script gate. Implements PRP-17. * docs(docs): document the demo slice and showcase page (#132) Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the /demo/run + /demo/stream rows and a WebSocket Events section in API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and REPO_MAP_INDEX rows for the demo slice and showcase page. Implements PRP-17. * chore(main): release 0.2.10 (#135) * feat: release v0.2.11 — visualization fixes, job picker, demo showcase (#158) * feat(data): implement MarkdownGenerator age_days trigger via heuristic (#94) (#127) Resolves #94 via the heuristic path documented in the issue. No schema column, no Alembic migration, no FIFO cohort tracking — the trigger self-reads the existing per-(store, product) on_hand_qty series and fires when inventory has been "unrefreshed" past `cfg.age_days_threshold`. Decision rationale (schema column vs. heuristic): - The schema column path would add `oldest_unit_age_days` to `inventory_snapshot_daily`, plus an Alembic migration, plus FIFO cohort tracking in `InventorySnapshotGenerator`. 
No downstream consumer reads this column today — adding it for one generator trigger violates the "don't design for hypothetical future requirements" rule in CLAUDE.md. - The heuristic path is self-contained in MarkdownGenerator, deterministic (preserves the zero-rng-draw regression invariant), and additive (no migration, no model change). 354 LOC net, all inside one slice. Heuristic spec: - A "refresh" is a day where `on_hand_qty` rose by >= `_AGE_DAYS_SPIKE_THRESHOLD` (0.3 = 30% jump) vs the prior day. - Age at day t = days since most recent refresh (or `dates[0]` if no refresh has been observed). - Firing requires age >= `age_days_threshold` AND on_hand >= `markdown_min_units_remaining` — never markdown an empty shelf. - After firing, refresh anchor resets to the day AFTER the markdown window ends, so back-to-back fires can't happen and the next age clock starts from a "clear shelf" baseline. Wiring: `MarkdownGenerator.generate()` gains an optional kwarg `inventory_records: list[dict[str, Any]] | None = None` which `core.py` passes through from `InventorySnapshotGenerator`. Disabled-path and non-age_days-path behavior is byte-identical (kwarg ignored). Tests: +7 new in `TestAgeDaysTrigger`, -1 obsolete `NotImplementedError` test. Coverage: no-records defensive, threshold not-met, threshold met, spike resets age, post-fire reset avoids back-to-back, low-inventory skip, unknown-product skip, rng non-consumption. Validation (local): - ruff check + format: clean - mypy --strict: 0 issues, 192 files - pyright --strict: 0 errors - pytest -m "not integration": 969 passed (+7 vs pre-PR) Closes #94. * feat(api,docs): e2e demo pipeline + showcase script (#128) (#129) * docs(docs): add INITIAL-14 + PRP-15 e2e demo pipeline plan (#128) Adds the planning documents for the end-to-end demo pipeline work tracked in #128. Implementation commits follow on this branch. - INITIAL-14.md: PRD for `make demo` (problem, solution, success metrics, open questions resolved in the PRP). 
- PRPs/PRP-15-e2e-demo-pipeline.md: full execution plan (16 tasks → 6 commits, additive only — no schema changes, no API edits). * feat(data): add demo_minimal scenario preset (#128) Tiny preset that powers the upcoming `make demo` target. Three stores × ten products × 92 days (2024-10-01..2024-12-31) — wide enough to satisfy an expanding backtest with n_splits=3, horizon=14, min_train_size=30 (needs >= 72 days, 92 leaves margin), small enough to keep the demo loop comfortable on a laptop. Mirrors the retail_standard tuning (mild linear trend, noise_sigma=0.10, modest promotion + stockout probabilities) so backtest WAPE stays non-NaN across all three baseline models. - app/shared/seeder/config.py: add DEMO_MINIMAL enum + from_scenario branch - app/features/seeder/service.py: add ScenarioInfo entry in list_scenarios - tests cover the new preset and the updated scenario count * feat(api,docs): scripts/run_demo.py end-to-end pipeline driver (#128) Single-file async driver that walks the published HTTP surface (precheck -> reset? -> seed -> status -> features -> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup). Mirrors the shape of scripts/seed_random.py and scripts/check_db.py. - HttpClient: thin httpx.AsyncClient wrapper with explicit 60 s timeout (default 5 s is too short for /seeder/generate); surfaces RFC 7807 problem+json bodies as a typed StepError that echoes title / detail / request_id (never the raw body — secrets-safe). - DemoContext + StepOutcome dataclasses thread cross-step references. - Reporter renders the output-formatting.md glyphs (verbose by default, --quiet collapses to one line per step). - Per-step error handling converts httpx + StepError into fail outcomes; precheck failure exits 2, any other failure exits 1, green exits 0. - Agent step (10/11) skips with ⏭️ when neither OPENAI_API_KEY nor ANTHROPIC_API_KEY is set; reads via app.core.config.get_settings() to honor the no-os.environ-in-feature-code rule. 
- Registry handshake uses the mandatory pending -> running -> success transition and the wire alias "model_config" (not "model_config_data"); artifact_hash is computed client-side via sha256 since we share the FS with the API on this single-host system. - Winner selection: lowest aggregated WAPE, skipping NaN folds. Also adds scripts/__init__.py so tests can `import scripts.run_demo` without invoking the file as a script. * feat(repo): top-level Makefile with demo / demo-quick / demo-clean (#128) Wraps scripts/run_demo.py so reviewers can run the full end-to-end demo with one command. Recipes mirror the three modes the script supports: full run, skip-seed iteration, destructive reset. Make targets: - demo — docker compose up -d + alembic + run_demo - demo-quick — run_demo --skip-seed (no compose/migration touch) - demo-clean — full reset (--reset) before seeding - help — default goal; lists targets + preconditions Tab-indented recipes and .PHONY declarations per make conventions. Preconditions (Postgres on :5433, uvicorn on :8123) documented in the help block; the script itself enforces them via the precheck step and exits 2 on failure. 
* test(api): unit + integration coverage for run_demo (#128) Unit (`tests/test_run_demo_unit.py`, 32 cases): - argparse defaults + all-flags variants - DemoContext defaults (no leaking state across runs) - _select_winner: lowest-WAPE, NaN-skip, all-NaN -> None, empty -> None - _model_config_payload: discriminated-union shape per baseline; rejects unsupported model_type (defends the "no lightgbm in PRP-15" boundary) - Reporter: glyph mapping; verbose + quiet output; summary green / failure / over-budget soft-warn branches - StepError formats RFC 7807 (title/detail/request_id) without leaking the raw response body - HttpClient (mocked httpx.AsyncClient): 2xx, 204, non-2xx -> StepError - Step payload sanity: seed sends demo_minimal+correct dims+ISO dates; features sends cutoff_date as ISO; train fires three model_types in parallel; agent step skips with ⏭️ when no LLM key Integration (`tests/test_e2e_demo.py`, @pytest.mark.integration): - Skips if Postgres on :5433 isn't reachable - Boots uvicorn on :8124 as a subprocess (avoids the dev :8123 default) - Runs scripts/run_demo.py --reset against it; asserts exit 0 + canonical "runs=3 winner=... alias=demo-production" summary - Second case asserts a bogus URL exits 2 (no silent success) - Cleans up uvicorn on teardown with terminate/kill fallback - Resolves `uv` via shutil.which to keep ruff S607 happy and avoid PATH-dependent exec at test time * ci(repo): nightly e2e demo workflow (#128) Adds .github/workflows/e2e-nightly.yml — runs scripts/run_demo.py against a fresh Postgres+pgvector service every night at 07:00 UTC (plus on-demand via workflow_dispatch). Catches regressions in the documented end-to-end pipeline before they bleed into the per-PR gate. Per PRP-15 + INITIAL-14 risk note: this workflow is intentionally NOT a required status check on dev or main. Flake-budget lives in the nightly slot, not in ci.yml. 
- pgvector/pgvector:pg16 service container (same as ci.yml `test` job) - uvicorn started in background; /health polled with a 30 s deadline - run_demo.py called with --seed 42 (deterministic) - LLM-key env vars intentionally absent — agent step auto-skips via ⏭️, keeping the workflow self-contained - uvicorn logs uploaded as artifact on failure (7-day retention) so postmortems can read what the API was doing when the script broke - astral-sh/setup-uv pinned by 40-char SHA per security-patterns.md - permissions: contents: read (least-privilege) * docs(docs): cross-link make demo from README + RUNBOOKS + REPO_MAP_INDEX (#128) Discoverability layer for PRP-15. - README.md: new 'Try it: end-to-end demo' step right after the curl /health verification; shows the canonical final-line summary so reviewers know what green looks like. - docs/DAILY-FLOW.md: new 'First-Run Smoke (Demo Pipeline)' section documenting all three Make targets. - docs/_base/RUNBOOKS.md: new 'make demo fails at step X' Common Incidents entry with a 7-point diagnosis flow keyed to the script's step names + a postmortem-capture recipe. - docs/_base/REPO_MAP_INDEX.md: Makefile and scripts/run_demo.py rows added to the Document Index table. Pure additive; no existing content removed or renamed. * fix(data): update /seeder/scenarios route test for demo_minimal preset (#128) Companion to feat(data): add demo_minimal scenario preset — the route-level assertion in TestListScenarios.test_returns_scenarios still expected 6 scenarios; bumping to 7 and adding the demo_minimal name membership check to match the service-layer + config-layer tests already updated in 005c189. * fix(api): harden run_demo for integration test + real DB (#128) Three real failures surfaced when first running the integration test against docker-compose Postgres + a freshly booted uvicorn; all three are now closed: scripts/run_demo.py: 1. 
step_status: discover the real (store_id, product_id) from /dimensions/stores + /dimensions/products instead of hardcoding 1. Postgres auto-increment does NOT reset after delete, so the freshly seeded IDs are NOT 1 (they were ~150-260 on this branch after a few delete/seed cycles). 2. step_register: copy the trained-model artifact into the registry's own root (settings.registry_artifact_root) and record a registry- relative URI. The registry verify endpoint resolves artifact_uri against its own root, which is separate from where /forecasting/train writes (settings.forecast_model_artifacts_dir). Pre-fix, verify returned 404 even though the artifact existed on disk. 3. step_agent: skip with the soft-skip glyph on any LLM provider failure (invalid key, model unavailable, 5xx), and make _llm_key_present provider-aware so it matches the right env var to the configured agent_default_model. Pre-fix, an .env with anthropic/openai keys but a Gemini default model failed hard at chat-time. 4. Bumped DEFAULT_TIMEOUT_S from 60 to 120 because /seeder/generate for demo_minimal can spend 60-90 s on slower laptops once you include inventory + prices + promotions inserts. 5. step_seed detail string: GenerateResult.records_created uses 'sales' (singular), not 'sales_daily'; cosmetic fix. tests/test_e2e_demo.py: - Redirect uvicorn stdout to a temp file rather than subprocess.PIPE. The seeder + structlog produce enough INFO log volume to fill a 64-KB pipe buffer; once full, uvicorn blocks on write and seeder requests hang for the full --timeout. Verified locally: integration suite now passes in ~6.5 s instead of timing out at 120 s. - Cleanup leaves the log file on disk only when the test failed (postmortem-friendly). tests/test_run_demo_unit.py: - Bump test_defaults timeout expectation to match the new 120 s default. End-to-end manual run on this machine: 11 steps, wall_clock=2 s, exit 0. Integration test: 2 passed in 6.48 s. 
* chore(repo): bump authlib + fastmcp to clear Socket-flagged CVEs (#130) (#131) Headline: - authlib 1.6.6 -> 1.7.2 (clears GHSA-wvwj-cvrp-7pv5 — JWS signature verification bypass; patched at >= 1.6.9) - fastmcp 2.14.4 -> 3.2.4 (clears GHSA-vv7q-7jx5-f767 — OpenAPI Provider SSRF + path traversal; patched at >= 3.2.0) Both CVEs were flagged on PR #129 by Socket Security and are pre-existing on dev (not introduced by #128). Wider scope — read before merging: `uv lock --upgrade-package authlib --upgrade-package fastmcp` triggers a full re-resolve of the dependency graph. Because dev's uv.lock had drifted from pyproject.toml (the project's constraint envelope had loosened over time), this single command also brings the lockfile in sync with current pyproject.toml. Net diff: 243 insertions / 369 deletions on uv.lock; no other files touched. Transitive cascades worth flagging: - anthropic 0.77.0 -> 0.102.0 (pydantic-ai-slim extra) - pydantic-graph 1.51.0 -> 1.96.0 - temporalio 1.20.0 -> 1.27.2 - alembic 1.18.1 -> 1.18.4 - aws-* and cohere transitives bumped along - griffe 1.15.0 dropped in favor of griffelib 2.0.2 (fastmcp 3.x switched) - Removed: cloudpickle, diskcache, fakeredis, invoke, lupa, prometheus exporter, pydocket, redis, rsa, sortedcontainers — these were transitives of fastmcp 2.x that fastmcp 3.x no longer pulls in. Verification on this host: - uv sync --extra dev -> green - ruff check . -> clean - mypy --strict app/ -> 192 files clean - pyright app/ -> 0 errors (50 warnings, pre-existing) - pytest -m 'not integration' -> 969 passed Known install quirk: griffelib 2.0.2 ships a top-level `griffe/` package whose RECORD files don't always materialize on first install when uv replaces an older `griffe` dist in the same sync. A clean venv install (which CI does via `uv sync --frozen`) is unaffected; local devs who upgrade in place may need a one-shot `uv pip install --force-reinstall griffelib` if `import griffe` fails. 
* feat(api,ui): in-product demo showcase page (#132) (#133) * feat(api): add demo slice driving the e2e pipeline via /demo endpoints (#132) New app/features/demo slice exposing POST /demo/run and WS /demo/stream. It drives the published API surface in-process via httpx.ASGITransport (no cross-slice imports, satisfying the vertical-slice rule) and streams one StepEvent per pipeline step: precheck -> reset -> seed -> status -> features -> train x3 -> backtest x3 -> register -> verify -> agent -> cleanup. A module-level asyncio.Lock enforces single-flight; concurrent runs get an RFC 7807 409. The orchestration is a faithful in-process port of scripts/run_demo.py (PR #129). Implements PRP-17. * test(api): cover the demo slice pipeline, routes, and e2e integration (#132) Unit tests mock the in-process HTTP client to exercise step sequencing, winner selection, and fail-fast; route tests cover POST /demo/run (200 + 409) and the WS /demo/stream handler. The integration test seeds demo_minimal and asserts an end-to-end green run against real Postgres. Implements PRP-17. * feat(ui): add showcase page streaming the live demo pipeline (#132) New /showcase route and nav entry. The page opens a one-shot WebSocket to /demo/stream via a use-demo-pipeline hook (wrapping useWebSocket) and renders the 11 pipeline steps as live status cards: glyph, detail, duration, the backtest per-model WAPE breakdown with the winner highlighted, and a pass/fail summary banner. Also block-scopes a pre-existing no-case-declarations lint error in chat.tsx so pnpm lint is green for this PR. Implements PRP-17. * test(ui): add vitest setup and use-demo-pipeline hook coverage (#132) Adds the frontend test stack (vitest + jsdom + @testing-library/react), a test script, and vitest.config.ts. use-demo-pipeline.test.ts covers the pure event reducer (idle -> running -> pass transitions, summary assembly, error phase) and a renderHook smoke test. 
The package.json pnpm.onlyBuiltDependencies entry is the RUNBOOKS-documented fix for pnpm 11's esbuild build-script gate. Implements PRP-17. * docs(docs): document the demo slice and showcase page (#132) Adds the PRP-17 spec; a 'Try it in the browser' pointer in README; the /demo/run + /demo/stream rows and a WebSocket Events section in API_CONTRACTS; a 'Showcase pipeline fails' runbook incident; and REPO_MAP_INDEX rows for the demo slice and showcase page. Implements PRP-17. * chore(repo): back-merge main into dev to absorb v0.2.10 release commits (#136) (#137) * chore(main): release 0.2.9 (#126) * feat: release v0.2.10 — demo showcase page + e2e pipeline (#134)
* chore(main): release 0.2.10 (#135) * docs(docs): fix broken PRP-0/INITIAL-0 relative links in phase 0 doc (#138) (#139) * docs(repo): fix readme dev-deps command and stale .github template placeholders (#140) (#141) * docs(docs): fill DEV_GUIDE.md onboarding stub sections (#142) (#143) * docs(repo): refresh stale CLAUDE.md note, document /demo API, align PR template (#144) (#145) * fix(ui): chart series render black — drop hsl() wrapper on oklch chart vars (#149) (#153) index.css defines --chart-N twice: legacy shadcn-v3 HSL triplets and Tailwind-4 / shadcn-v4 complete oklch() colours. The oklch definitions win the cascade, so at runtime --chart-N is a full colour.
The chart components still wrapped it as hsl(var(--chart-N)) → hsl(oklch(...)), which is invalid CSS, so recharts fell back to a black fill/stroke — invisible on the dark theme. Reference var(--chart-N) directly in backtest-folds-chart.tsx and time-series-chart.tsx. Verified in a browser: the backtest per-fold bars and the forecast line now render in colour. * fix(jobs): backtest job result keeps fold metrics, stability and baselines (#148) (#152) * fix(jobs): backtest job result keeps fold metrics, stability and baselines (#148) _execute_backtest ran BacktestingService.run_backtest — which computes per-fold metrics, stability indices and a naive/seasonal baseline comparison — but stored only four aggregated values and discarded the rest. The dashboard (/visualize/backtest) reads aggregated_metrics.{*_mean, stability_index}, fold_metrics[] and baseline_comparison, so it showed "0 folds", all-zero metrics and an empty chart. Add _shape_backtest_result(), which flattens a BacktestResponse into the contract the dashboard expects, and _finite(), which coerces NaN/inf to 0.0 so the result stays JSONB-safe (stability is NaN with fewer than two folds). Add app/features/jobs/tests/test_service.py with unit coverage for the shaping logic: fold metrics, *_mean keys, stability, baseline comparison, the no-baselines path, and NaN coercion. * refactor(jobs): centralize backtest metric keys and surface drift (#148) Addresses review feedback on PR #152. - Hoist the dashboard's metric set into _BACKTEST_METRICS and the headline stability metric into _STABILITY_METRIC, so the hardcoded keys live in one documented place instead of being repeated across the shaping logic. - Log jobs.backtest_metrics_missing when an expected metric is absent from the backtest response, so a future rename in the backtesting service fails loud instead of silently emitting 0.0. - Document the WAPE stability convention in the _shape_backtest_result docstring. 
- Tests: assert backtest_id / model_type / duration_ms pass through unchanged, and add a regression test for the missing-metric default path. * fix(ui): forecast page reads forecasts/forecast from predict job result (#147) (#151) The /visualize/forecast page never rendered the chart for a valid completed predict job. It read job.result.predictions with field `predicted`, but POST /jobs (job_type="predict") returns job.result.forecasts with field `forecast`. forecastData was therefore always undefined and the page fell through to "No prediction data available in job result". Read result.forecasts with field `forecast`, and pass predictedKey="forecast" to TimeSeriesChart (which already supports a configurable data key). Verified in a browser: entering a completed predict job ID now renders the 14-day forecast line chart with correct tooltip values. * fix(registry): tolerate multiple matches in _find_duplicate (#146) (#150) Under the default registry_duplicate_policy="detect", duplicate runs are created intentionally, so multiple non-archived model_run rows can share one config hash. _find_duplicate used scalar_one_or_none(), which raised MultipleResultsFound once two duplicates existed — POST /registry/runs then returned HTTP 500. This made the demo/Showcase register step fail deterministically on any DB with repeated runs. Order the lookup by created_at desc, LIMIT 1, and use scalars().first() so it returns the most recent matching run instead of asserting a single match. Add an integration regression test that POSTs an identical run three times under the detect policy and asserts all three return 201. * fix(ui): derive TimeSeriesChart line stroke from the config key (#156) (#157) TimeSeriesChart builds chartConfig with dynamic keys ([actualKey] / [predictedKey]), so shadcn's ChartContainer injects --color- CSS variables. The elements, however, hardcoded stroke="var(--color-actual)" and stroke="var(--color-predicted)". 
The forecast page passes predictedKey="forecast", so the injected variable is --color-forecast; var(--color-predicted) was undefined, the stroke was invalid, and SVG fell back to its initial value `none` — the forecast line was invisible. Build the stroke from the key: stroke={`var(--color-${actualKey})`} / stroke={`var(--color-${predictedKey})`}. Verified in a browser: the forecast line now renders in colour. * feat(ui): job picker dropdown on forecast and backtest pages (#154) (#155) The visualization pages only accepted a job ID typed into a text box, so users had to already know the ID. Add a JobPicker component: a dropdown of completed jobs of the relevant type (predict / backtest), newest first, with each option labelled by short id, model and timestamp. - New shared component src/components/common/job-picker.tsx, used by both forecast.tsx and backtest.tsx. - The manual job-ID input stays alongside the dropdown for pasting an ID. - The most recent completed job auto-loads on mount so a chart shows immediately without interaction. No backend change — GET /jobs?job_type=&status=completed already exists. Verified in a browser on both pages. * chore(main): release 0.2.11 (#159) * feat(api,ui): add AI model admin console with Ollama support (#162) (#163) * feat(api,ui): add AI model admin console with Ollama support (#162) * fix(db): register AppConfig model in alembic env for schema-drift check (#162) * fix(agents): handle model tool-retry crash gracefully (#164) (#165) A casual message to the Experiment agent could crash the WebSocket stream with a raw 'Tool ... exceeded max retries count of 1' error when the model produced an invalid tool call. - Catch PydanticAI's UnexpectedModelBehavior in stream_chat and chat; surface a clean, recoverable error event / message instead of leaking the internal exception string. 
- Make tool_compare_backtest_results tolerant of missing/empty args (return a self-correcting hint) so a malformed call no longer burns the retry budget and crashes the run. - Add a conversational-fallback line to the experiment system prompt so greetings are answered without invoking workflow tools. - Add regression tests for both the chat and stream-chat paths. * fix(agents): round-trip agent message history through pydantic-ai type adapter (#166) (#167) * fix(agents): round-trip agent message history through pydantic-ai type adapter (#166) Multi-turn agent chat crashed with a dict-has-no-attribute-conversation_id error: _deserialize_messages returned the raw stored dicts unchanged, but PydanticAI 1.96 requires real ModelMessage objects (it accesses msg.conversation_id on every history item). - _serialize_messages now uses ModelMessagesTypeAdapter.dump_python (mode=json), so stored history can be round-tripped. - _deserialize_messages uses ModelMessagesTypeAdapter.validate_python, degrading to an empty history (with a warning) when stored data predates this format instead of crashing the run. - Replace the serialization tests with a real round-trip test and a legacy-format fallback test. * fix(agents): broaden deserialize-failure handling and log session id (#166) Address code-review feedback on the message-history round-trip fix: - _deserialize_messages now catches any Exception, not only ValidationError, so a malformed stored record (wrong shape, type errors) can never crash an otherwise-valid agent run. - The warning logs exc_info (full type, message, traceback) instead of just str(e), and includes session_id so a failure can be correlated with the specific stored record. - Add a regression test that a non-ValidationError adapter failure also degrades to an empty history. 
* fix(agents): apply configured agent_retry_attempts to the agents (#170) (#171) Settings.agent_retry_attempts (default 3, set in .env) was never passed to the PydanticAI Agent constructor, so both agents silently used the framework default of 1. Agent runs failed with "Exceeded maximum output retries (1)" — a weaker model got only one attempt to emit a valid structured ExperimentReport / RAGAnswer. - Add get_agent_retries() helper in agents/base.py. - Pass output_retries and tool_retries to the experiment and rag_assistant Agent constructors (PydanticAI 1.96 deprecated the combined retries kwarg in favour of the two explicit ones). - Add tests asserting both agents are built with the configured budget. * fix(agents): complete tool-using runs — sequential session use + PromptedOutput (#172, #173) (#174) * fix(agents): run agent tool calls sequentially over the shared session (#172) When the model emitted multiple DB-touching tool calls in one turn, PydanticAI executed them concurrently. Every agent tool shares the single AgentDeps.db AsyncSession, and SQLAlchemy forbids concurrent operations on one session, so the run failed intermittently with "InvalidRequestError: concurrent operations are not permitted". - Wrap agent.run() (chat) and agent.run_stream() (stream_chat) in Agent.parallel_tool_call_execution_mode("sequential"). - Add a regression test asserting chat() runs under sequential mode. * fix(agents): use PromptedOutput so weaker models can produce structured output (#173) Both agents declared output_type as a plain model, which PydanticAI serves via its default ToolOutput mode (the model must call a hidden final_result tool). Weaker/local models answer in plain prose instead, PydanticAI rejects it as json_invalid, and the run fails with "Exceeded maximum output retries". - Wrap the experiment and rag_assistant output_type in PromptedOutput, which places the JSON schema in the prompt and parses the model's text reply. Works for local and cloud models alike. 
- Add tests asserting both agents build with a PromptedOutputSchema. * refactor(agents): centralize sequential-tool-execution policy and harden agent tests (#173) Addresses code-review feedback on PR #174: - Extract the duplicated Agent.parallel_tool_call_execution_mode("sequential") wrapping from chat() and stream_chat() into a _sequential_tool_execution() helper, so the issue-#172 execution-mode policy lives in one place. - Replace test reliance on private internals. test_base.py no longer asserts agent._output_schema's class-name string; it now verifies PromptedOutput behaviorally via the public FunctionModel test double (no final_result output tool registered, plain-text JSON reply parsed into the schema). - test_service.py drops the private _parallel_execution_mode_ctx_var import and asserts the public Agent.parallel_tool_call_execution_mode API instead. - Add test_stream_chat_runs_tools_sequentially mirroring the chat() test so the streaming path is covered against issue #172 regressions. * fix(agents): correct prompt tool names and recover from tool errors (#175) (#177) Two coupled robustness fixes for the agent layer, both surfaced by a capture_run_messages diagnostic. The changes share base.py and experiment.py, so they land in one commit. #175 — the experiment prompt named tools as run_backtest / list_runs / compare_backtest_results, but the registered tools are tool_-prefixed (tool_run_backtest, ...). Weaker models trusted the prompt and called unknown tool names. TOOL_USAGE_INSTRUCTIONS and the EXPERIMENT_SYSTEM_PROMPT workflow now use the exact registered names. #176 — a tool raising a plain exception aborted the whole run (observed: ValueError "No data found for store=..."). New recoverable() decorator wraps every async DB-touching tool so an expected ValueError becomes a ModelRetry the model can correct from; other exceptions still propagate. 
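The recoverable() idea from #176 can be sketched as below. This is an illustration under stated assumptions, not the code in agents/base.py: the local `ModelRetry` class is a stand-in for the framework's retry exception so the example runs with the standard library alone, and `tool_lookup` is a hypothetical tool used only to exercise the decorator.

```python
import asyncio
import functools
from typing import Any, Awaitable, Callable, TypeVar

T = TypeVar("T")


class ModelRetry(Exception):
    """Local stand-in for the framework's retry signal (assumption)."""


def recoverable(func: Callable[..., Awaitable[T]]) -> Callable[..., Awaitable[T]]:
    """Convert an expected ValueError from an async tool into ModelRetry."""

    @functools.wraps(func)
    async def wrapper(*args: Any, **kwargs: Any) -> T:
        try:
            return await func(*args, **kwargs)
        except ValueError as exc:
            # Expected, correctable failure: hand the message back to the
            # model so it can retry with different arguments.
            raise ModelRetry(str(exc)) from exc
        # Any other exception type is not caught and propagates unchanged.

    return wrapper


@recoverable
async def tool_lookup(store_id: int) -> str:
    """Hypothetical tool: fails in two different ways depending on input."""
    if store_id < 0:
        raise ValueError(f"No data found for store={store_id}")
    if store_id == 0:
        raise RuntimeError("database unavailable")
    return f"store {store_id} ok"


if __name__ == "__main__":
    print(asyncio.run(tool_lookup(7)))
    try:
        asyncio.run(tool_lookup(-1))
    except ModelRetry as exc:
        print(f"retry requested: {exc}")
```

Limiting the conversion to ValueError keeps genuine faults (connection errors, programming bugs) visible instead of turning them into retries the model cannot fix.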
- Add recoverable() to agents/base.py; decorate the 6 experiment tools and the 2 rag_assistant tools (tool_plain pure tools left alone). - Tests: prompt names use tool_* ; recoverable converts ValueError to ModelRetry, passes other exceptions through, is transparent on success. * fix(data): anchor seeded data window to the current date (#181) (#182) * fix(agents): wire FallbackModel so a primary 503 retries the fallback (#183) (#184) * fix(agents): wire FallbackModel so a primary 503 retries the fallback (#183) * test(agents): assert FallbackModel wiring order and primary fail-fast (#183) * feat(ui): Knowledge page + Agent Guide page (PRP-19) (#186) * feat(api): expose agent session limits on GET /config/ai (#185) * feat(ui): add knowledge and agent guide pages with nav (#185) * test(ui): cover knowledge-utils pure helpers (#185) * docs(docs): document the knowledge and agent guide pages (#185) * feat(ui): Explorer interactivity — detail views, richer tables, Sales charts, cross-filtering (PRP-20) (#188) * feat(analytics): add GET /analytics/timeseries aggregated sales endpoint (#187) * feat(dimensions): add sort_by/sort_order to store and product listings (#187) * feat(ui): add explorer detail pages, sortable tables, and sales charts (#187) * test(ui): cover the csv-export pure helper (#187) * docs(docs): document the explorer interactivity extension (#187) * feat(ui): Explorer interactivity — Model Runs & Jobs detail views, comparison, verify, sorting (PRP-21) (#190) * feat(registry): add sort_by/sort_order to model-run listing (#189) * feat(jobs): add sort_by/sort_order to job listing (#189) * test(registry,jobs): cover list-endpoint sorting (#189) * feat(ui): add run/job detail and run-comparison pages (#189) * feat(ui): make Runs and Jobs tables interactive (#189) * docs(docs): document explorer runs/jobs interactivity (#189) --- .../PRP-19-knowledge-and-agent-guide-pages.md | 952 +++++++++++++++ PRPs/PRP-20-explorer-interactivity.md | 1002 ++++++++++++++++ 
 ...PRP-21-explorer-runs-jobs-interactivity.md | 1022 +++++++++++++++++
 README.md | 5 +
 app/features/agents/agents/base.py | 51 +
 app/features/agents/agents/experiment.py | 10 +-
 app/features/agents/agents/rag_assistant.py | 10 +-
 app/features/agents/tests/test_base.py | 62 +
 app/features/analytics/routes.py | 95 ++
 app/features/analytics/schemas.py | 63 +
 app/features/analytics/service.py | 106 +-
 app/features/analytics/tests/conftest.py | 156 ++-
 .../tests/test_routes_integration.py | 180 +++
 app/features/analytics/tests/test_schemas.py | 77 ++
 app/features/config/schemas.py | 7 +
 app/features/config/service.py | 5 +
 app/features/config/tests/test_routes.py | 21 +
 app/features/config/tests/test_schemas.py | 32 +
 app/features/config/tests/test_service.py | 18 +
 app/features/demo/pipeline.py | 14 +-
 app/features/dimensions/routes.py | 28 +
 app/features/dimensions/service.py | 52 +-
 app/features/dimensions/tests/conftest.py | 151 +++
 app/features/dimensions/tests/test_sort.py | 156 +++
 app/features/jobs/routes.py | 16 +
 app/features/jobs/service.py | 28 +-
 app/features/jobs/tests/conftest.py | 108 +-
 app/features/jobs/tests/test_routes.py | 115 ++
 app/features/registry/routes.py | 15 +
 app/features/registry/service.py | 29 +-
 app/features/registry/tests/test_routes.py | 91 ++
 app/features/seeder/schemas.py | 10 +-
 app/features/seeder/service.py | 36 +-
 app/features/seeder/tests/test_service.py | 13 +-
 app/shared/seeder/config.py | 45 +-
 app/shared/seeder/tests/test_config.py | 26 +-
 app/shared/seeder/tests/test_core.py | 13 +-
 docs/_base/API_CONTRACTS.md | 9 +-
 docs/_base/REPO_MAP_INDEX.md | 7 +
 frontend/src/App.tsx | 63 +
 frontend/src/components/charts/index.ts | 1 +
 .../components/charts/revenue-bar-chart.tsx | 56 +
 frontend/src/components/common/index.ts | 1 +
 frontend/src/components/common/json-block.tsx | 28 +
 .../data-table/data-table-column-header.tsx | 41 +
 .../data-table/data-table-view-options.tsx | 45 +
 .../src/components/data-table/data-table.tsx | 21 +
 frontend/src/components/data-table/index.ts | 2 +
 frontend/src/hooks/index.ts | 2 +
 frontend/src/hooks/use-jobs.ts | 8 +-
 frontend/src/hooks/use-lifecycle-curve.ts | 27 +
 frontend/src/hooks/use-products.ts | 8 +-
 frontend/src/hooks/use-rag-sources.ts | 18 +-
 frontend/src/hooks/use-runs.ts | 35 +-
 frontend/src/hooks/use-stores.ts | 8 +-
 frontend/src/hooks/use-timeseries.ts | 41 +
 frontend/src/lib/constants.ts | 13 +
 frontend/src/lib/csv-export.test.ts | 56 +
 frontend/src/lib/csv-export.ts | 41 +
 frontend/src/lib/knowledge-utils.test.ts | 91 ++
 frontend/src/lib/knowledge-utils.ts | 43 +
 frontend/src/pages/chat.tsx | 12 +-
 frontend/src/pages/explorer/job-detail.tsx | 208 ++++
 frontend/src/pages/explorer/jobs.tsx | 212 ++--
 .../src/pages/explorer/product-detail.tsx | 215 ++++
 frontend/src/pages/explorer/products.tsx | 155 ++-
 frontend/src/pages/explorer/run-compare.tsx | 271 +++++
 frontend/src/pages/explorer/run-detail.tsx | 272 +++++
 frontend/src/pages/explorer/runs.tsx | 185 ++-
 frontend/src/pages/explorer/sales.tsx | 157 ++-
 frontend/src/pages/explorer/store-detail.tsx | 200 ++++
 frontend/src/pages/explorer/stores.tsx | 178 ++-
 frontend/src/pages/guide.tsx | 362 ++++++
 frontend/src/pages/knowledge.tsx | 344 ++++++
 frontend/src/types/api.ts | 85 ++
 scripts/run_demo.py | 14 +-
 scripts/seed_random.py | 16 +-
 tests/test_demo_showcase_integration.py | 11 +-
 tests/test_run_demo_unit.py | 14 +-
 79 files changed, 8051 insertions(+), 345 deletions(-)
 create mode 100644 PRPs/PRP-19-knowledge-and-agent-guide-pages.md
 create mode 100644 PRPs/PRP-20-explorer-interactivity.md
 create mode 100644 PRPs/PRP-21-explorer-runs-jobs-interactivity.md
 create mode 100644 app/features/analytics/tests/test_routes_integration.py
 create mode 100644 app/features/analytics/tests/test_schemas.py
 create mode 100644 app/features/dimensions/tests/test_sort.py
 create mode 100644 app/features/jobs/tests/test_routes.py
 create mode 100644 frontend/src/components/charts/revenue-bar-chart.tsx
 create mode
100644 frontend/src/components/common/json-block.tsx create mode 100644 frontend/src/components/data-table/data-table-column-header.tsx create mode 100644 frontend/src/components/data-table/data-table-view-options.tsx create mode 100644 frontend/src/hooks/use-lifecycle-curve.ts create mode 100644 frontend/src/hooks/use-timeseries.ts create mode 100644 frontend/src/lib/csv-export.test.ts create mode 100644 frontend/src/lib/csv-export.ts create mode 100644 frontend/src/lib/knowledge-utils.test.ts create mode 100644 frontend/src/lib/knowledge-utils.ts create mode 100644 frontend/src/pages/explorer/job-detail.tsx create mode 100644 frontend/src/pages/explorer/product-detail.tsx create mode 100644 frontend/src/pages/explorer/run-compare.tsx create mode 100644 frontend/src/pages/explorer/run-detail.tsx create mode 100644 frontend/src/pages/explorer/store-detail.tsx create mode 100644 frontend/src/pages/guide.tsx create mode 100644 frontend/src/pages/knowledge.tsx diff --git a/PRPs/PRP-19-knowledge-and-agent-guide-pages.md b/PRPs/PRP-19-knowledge-and-agent-guide-pages.md new file mode 100644 index 00000000..2991a0df --- /dev/null +++ b/PRPs/PRP-19-knowledge-and-agent-guide-pages.md @@ -0,0 +1,952 @@ +name: "PRP-19 — Knowledge page + Agent Guide page (in-product self-documentation)" +description: | + Add two new React pages to the ForecastLabAI dashboard, frontend-led and + fully additive: + + 1. **Knowledge** (`/knowledge`) — presents, in detail, *what ForecastLabAI + currently knows*: the RAG knowledge base (indexed sources + a live semantic + search box) plus a summary of the live system state the agents can query + (seeded data, registered model runs, deployment aliases). + 2. **Agent Guide** (`/guide`) — explains, in detail, *how to use the Chat + agents*: the two agent types, their tools, the human-in-the-loop approval + flow, session limits, the streaming protocol, and copy-paste example prompts. 
+ + Frontend-led, with one small additive backend change: the existing + `GET /config/ai` response gains read-only agent-limit fields so the Guide + shows session limits live. No new backend slice, no migration, no new env var. + Every other endpoint these pages consume already exists with a frontend hook. + +## Purpose +Close the in-product self-documentation gap. Today a dashboard visitor can open +`/chat` and talk to an agent, but nothing in the UI tells them (a) what the +RAG assistant actually has indexed to answer from, or (b) how the agents work, +what they can do, or how the approval gate behaves. The two pages turn implicit +system knowledge into a visible, browsable surface — a natural onboarding pair: +**Knowledge** = "what it knows" → **Agent Guide** = "how to ask it". + +> **PRP numbering:** `PRP-16` is reserved (Phase-2 LightGBM, per PRP-15). +> `PRP-17` (Showcase) and `PRP-18` (AI Model console) are used. This is `PRP-19`. + +## Core Principles +1. **Context is King** — every endpoint shape, hook name, schema field, and + pattern referenced below is linked to a real source file + line. +2. **Reuse existing patterns** — both pages are lazy routes registered exactly + like `Showcase` (PRP-17); data comes through existing TanStack Query hooks + (`useRagSources`, `useSeederStatus`, `useAIConfig`, …); UI uses existing + shadcn primitives (`Card`, `Badge`, `Input`, `Tabs`, `Button`). No new + streaming primitive, no new fetch wrapper. +3. **Additive only** — no new backend slice, no Alembic migration, no new + `.env` var. The one backend change is additive: read-only agent-limit fields + appended to the existing `AIModelConfig` (`GET /config/ai`) response. Plus + one new hook (`useRetrieve`), three new TS interfaces, two new pages, one + pure-helper module. +4. **Read-only, no duplication** — the Knowledge page is *presentational*. It + does NOT duplicate Admin's RAG management (index / delete) — those stay in + `frontend/src/pages/admin.tsx`. 
It adds the semantic-search exploration that + Admin lacks. +5. **Strict gates honored** — `pnpm tsc --noEmit` + `pnpm lint` + `pnpm test` + green; AND because the `config` slice `.py` files change, the repo-wide + `ruff`/`mypy`/`pyright`/`pytest` CI jobs must be run and stay green — the + `/config/ai` change ships with `config` slice tests. +6. **UI through skills** — pages built via `frontend-design` + `shadcn-ui` and + dogfooded via `webapp-testing` / `agent-browser` per `.claude/rules/ui-design.md`. + A green type-check is NOT proof the UI works. + +--- + +## Goal +Two new nav items route to two new pages. + +**`/knowledge` — Knowledge** +- A **Knowledge Base** section: `total_sources` / `total_chunks` summary, a + read-only list of every indexed RAG source (path, type badge, chunk count, + indexed date), and a **semantic search box** that POSTs to `/rag/retrieve` + and renders the matching chunks with relevance scores + source citations. +- A **Live System State** section: the seeded-data summary (stores / products / + sales / date range), the count of registered model runs, and the deployment + aliases — i.e. what the *experiment* agent can query through its tools. +- A short explainer tying it together: "the RAG assistant answers from the + Knowledge Base; the experiment agent acts on the Live System State." + +**`/guide` — Agent Guide** +- Describes the **two agents** (`rag_assistant`, `experiment`), each with its + purpose, its exact tool names, and what it returns. +- Walks through **how a chat session works**: pick agent → Start Session → + send a message → streamed text + tool-call chips → approval prompts → + New Session. +- Explains the **human-in-the-loop approval gate** (`create_alias`, + `archive_run`). +- Lists **session limits** (token budget, tool-call cap, timeout, TTL, retries) + — rendered **live** from `/config/ai`, which is extended to return them. +- Gives **copy-paste example prompts** per agent. 
+- Surfaces the **currently configured agent model** (live, from `/config/ai`) + and links to Chat and Admin → AI Models. +- Reachable both from a flat top-level nav item AND from a help link on the + Chat page. + +## Why +- **Portfolio identity.** `.claude/rules/product-vision.md` principle 1 — + "portfolio-grade, end-to-end … every phase ships working code". The agentic + layer (PRP-10) and RAG layer (PRP-9) are fully built but invisible as + *capabilities* — a reviewer has to read code to learn what the agents do. +- **Onboarding.** A first-time user opening `/chat` has no idea what to ask the + RAG assistant (it can only answer from indexed docs) or that the experiment + agent can run real backtests. These two pages remove that guesswork. +- **Low-cost surface.** Almost everything needed already exists server-side; + the only backend work is a small additive `/config/ai` extension. This is + high-value-per-line work: mostly composition of shipped endpoints into two + polished pages. + +## What +Frontend-led. Two lazy-loaded pages mirroring the `Showcase` registration +(PRP-17), two new `ROUTES` entries, two `NAV_ITEMS` entries, a help link to +`/guide` on the Chat page, one new mutation hook (`useRetrieve` for +`POST /rag/retrieve`), three new TS interfaces, and one pure-helper module with +a vitest. Plus one additive backend change: the `config` slice's `AIModelConfig` +schema + `get_effective_config` service gain read-only agent-limit fields +(`agent_max_tool_calls`, `agent_timeout_seconds`, `agent_retry_attempts`, +`agent_session_ttl_minutes`, `agent_require_approval`) so the Guide's limits are +live; shipped with `config` slice tests. No migration, no new env var. + +### Success Criteria +- [ ] `GET /knowledge` in the running SPA renders the Knowledge Base section + (source list + summary) and the Live System State section. 
+- [ ] The semantic search box on `/knowledge` POSTs `/rag/retrieve` and renders + `ChunkResult`s with a relevance score; an empty query is rejected client-side; + a `502` (no embedding provider) shows a graceful "search unavailable" state + while the source list still renders. +- [ ] An empty knowledge base shows a friendly empty state pointing at + Admin → RAG Sources (not a crash, not a blank card). +- [ ] `GET /guide` renders both agent cards with the **exact** tool names from + the agent definitions, the approval-gate explainer, the example prompts, + and the session limits + agent model rendered **live** from `/config/ai`. +- [ ] Both pages appear in the top nav (desktop + mobile sheet) and in `App.tsx` + as lazy ``s wrapped in ``; the Chat page links to `/guide`. +- [ ] `cd frontend && pnpm tsc --noEmit && pnpm lint && pnpm test --run` all clean. +- [ ] `frontend/src/lib/knowledge-utils.test.ts` passes (pure-helper coverage). +- [ ] `GET /config/ai` returns the five additive agent-limit fields; the `config` + slice tests (`test_schemas.py`/`test_service.py`/`test_routes.py`) cover + them and `ruff`/`mypy`/`pyright`/`pytest` stay green. +- [ ] Only the `config` slice changes server-side; no Alembic migration; no + `.env`/`.env.example` var. +- [ ] Admin's RAG index/delete management is untouched and NOT duplicated. +- [ ] Both pages dogfooded in a real browser (screenshot captured). + +--- + +## All Needed Context + +### Documentation & References +```yaml +- url: https://tanstack.com/query/latest/docs/framework/react/guides/queries + why: useQuery (GET) vs useMutation (POST) — the Knowledge search is a mutation + critical: | + GET data → useQuery({ queryKey, queryFn }). POST actions → useMutation({ + mutationFn }). The repo's hooks follow this exactly (see use-rag-sources.ts). + Semantic search is a POST → a useMutation, NOT a useQuery. 
+ +- url: https://reactrouter.com/en/main/route/lazy + why: react-router v6 route registration; the repo lazy-loads every page + critical: Mirror App.tsx — `lazy(() => import('@/pages/x'))` + ``. + +- file: PRPs/PRP-17-demo-showcase-page.md + why: The most recent "add a new page" PRP. Its frontend tasks (constants + + App.tsx + lazy route + nav entry) are the exact pattern to copy. + critical: This PRP follows PRP-17's frontend half precisely; the only deltas + are "two pages instead of one" and "no backend slice". + +- file: frontend/src/App.tsx + why: Lazy-route registration. Add `KnowledgePage` and `GuidePage` lazily and a + `` / `` exactly + like the existing `ShowcasePage` block (lines 12, 42-49). + critical: Pages are `lazy(() => import(...))`; each route element is wrapped in + `}>`. + +- file: frontend/src/lib/constants.ts + why: ROUTES + NAV_ITEMS. Add `KNOWLEDGE: '/knowledge'` and `GUIDE: '/guide'` + to ROUTES, and two NAV_ITEMS entries. + critical: | + NAV_ITEMS is `as const`. Knowledge and Agent Guide are flat top-level items + (not grouped). Place `Knowledge` after `Visualize` and `Agent Guide` after + `Chat` so the nav reads: Dashboard · Showcase · Explorer · Visualize · + Knowledge · Chat · Agent Guide · Admin (a "know it → chat → how to chat" + cluster). No new WS URL needed. + +- file: frontend/src/pages/admin.tsx + why: THE reference page. `RagSourcesPanel` (lines 116-253) already lists + `/rag/sources` data — copy its source-row markup. `SeederPanel`'s `StatCard` + (lines 769-785) is the data-summary tile to reuse on the Knowledge page. + critical: | + - admin.tsx keeps all sub-components in ONE file (RagSourcesPanel, + AliasesPanel, SeederPanel, StatCard helpers). Mirror that: knowledge.tsx + and guide.tsx each hold their own internal function components — do NOT + create a components/knowledge/ directory. + - The Knowledge page is READ-ONLY. 
Copy the source LIST markup but DROP the + "Index Document" dialog and the per-row delete AlertDialog — those are + management actions that stay in Admin. + - Reuse loading/error states: `` and + ``. + +- file: frontend/src/hooks/use-rag-sources.ts + why: Existing RAG hooks. `useRagSources()` (GET /rag/sources) is reused as-is. + ADD a new `useRetrieve()` mutation hook here for POST /rag/retrieve. + critical: | + useRagSources already returns SourceListResponse. The new useRetrieve wraps + `api('/rag/retrieve', { method: 'POST', body })`. It is a + useMutation (no cache invalidation needed — search is ephemeral). + +- file: frontend/src/lib/api.ts + why: The `api()` fetch wrapper + `ApiError` (carries the RFC 7807 + ProblemDetail) + `getErrorMessage()`. + critical: | + `api('/rag/retrieve', { method: 'POST', body: {...} })` JSON-encodes `body`. + On non-2xx it throws `ApiError` with `.status` and `.detail`. The Knowledge + search must catch this: `502` → "search unavailable, configure an embedding + provider"; other → `getErrorMessage(err)`. + +- file: app/features/rag/routes.py + why: The RAG endpoints the Knowledge page consumes. + critical: | + - GET /rag/sources → SourceListResponse (no embeddings needed — always works) + - POST /rag/retrieve → RetrieveResponse (needs an embedding provider; + returns 502 application/problem+json if embedding generation fails — see + routes.py:214-224). The page must degrade gracefully on 502. + +- file: app/features/rag/schemas.py + why: AUTHORITATIVE wire shapes. Mirror these field-for-field into types/api.ts. + critical: | + RetrieveRequest (model_config = ConfigDict(extra="forbid") — send NOTHING + extra): query:str(1..2000), top_k:int(1..50, default 5), + similarity_threshold:float|null(0..1, default from settings — OMIT to use + the server default), filters:dict|null. + ChunkResult: chunk_id, source_id, source_path, source_type, content, + relevance_score:float(0..1), metadata:dict|null. 
+ RetrieveResponse: results:ChunkResult[], query_embedding_time_ms:float, + search_time_ms:float, total_chunks_searched:int. + SourceResponse (already typed as `RagSource` in types/api.ts:157): source_id, + source_type, source_path, chunk_count, content_hash, indexed_at, metadata. + +- file: frontend/src/types/api.ts + why: TS type surface. `RagSource` + `SourceListResponse` (lines 157-171), + `AgentType` (line 199), `AIModelConfig`/`ProviderHealth` (lines 360-415) + already exist. ADD `RetrieveRequest`, `ChunkResult`, `RetrieveResponse` + near the `// === RAG ===` block (line 156). + critical: snake_case field names on the wire — match the Pydantic models exactly. + +- file: app/features/agents/agents/experiment.py + why: The experiment agent's EXACT tool names + behavior for the Guide page. + critical: | + Tools (use these EXACT names on the Guide page): tool_list_runs, + tool_get_run, tool_run_backtest, tool_compare_backtest_results, + tool_compare_runs, tool_create_alias (REQUIRES APPROVAL), + tool_archive_run (REQUIRES APPROVAL). The system prompt (lines 45-72) + describes the workflow — paraphrase it, do not invent capabilities. + +- file: app/features/agents/agents/rag_assistant.py + why: The RAG assistant's EXACT tool names + behavior for the Guide page. + critical: | + Tools: tool_retrieve_context, tool_format_citations, tool_check_evidence, + tool_list_sources. It answers ONLY from retrieved evidence, cites + source_path:chunk_id, and says "I don't have enough information" when the + knowledge base lacks coverage (system prompt lines 38-67). + +- file: app/features/agents/agents/base.py + why: Shared agent behavior + the approval helper for the Guide page. + critical: | + `requires_approval(name)` checks `settings.agent_require_approval`. + SYSTEM_PROMPT_HEADER / SAFETY_INSTRUCTIONS (lines 269-294) state the safety + contract — the Guide's "approval" section paraphrases SAFETY_INSTRUCTIONS. 
+ +- file: app/core/config.py + why: The agent session limits to state on the Guide page (lines 147-172). + critical: | + Defaults to quote on the Guide (label them "default"): agent_max_tokens=4096, + agent_max_tool_calls=10, agent_timeout_seconds=120, agent_retry_attempts=3, + agent_require_approval=["create_alias","archive_run"], + agent_session_ttl_minutes=120, agent_default_model="anthropic:claude-sonnet-4-5". + The LIVE model is shown via /config/ai (useAIConfig) — the static numbers + above are config defaults; phrase them as "default" since an operator can + change them in Admin → AI Models. + +- file: frontend/src/hooks/use-config.ts + why: `useAIConfig()` (GET /config/ai) — the Guide page uses it to show the + currently-configured agent model AND the (now live) session limits. + critical: Reuse the hook as-is; do NOT add a config hook. The hook's response + type `AIModelConfig` in types/api.ts gains the five new agent-limit fields. + +- file: app/features/config/schemas.py + why: `AIModelConfig` (GET /config/ai response, lines 65-83). Extend it with + read-only agent-limit fields so the Guide renders limits live. + critical: | + ADD to AIModelConfig (NOT to AIModelConfigUpdate — these stay read-only, + not operator-settable here): agent_max_tool_calls:int, + agent_timeout_seconds:int, agent_retry_attempts:int, + agent_session_ttl_minutes:int, agent_require_approval:list[str]. + agent_max_tokens is ALREADY present — do not re-add it. + +- file: app/features/config/service.py + why: `get_effective_config` (line 129) builds AIModelConfig from the Settings + singleton. Populate the five new fields from `settings.*`. + critical: The new fields are sourced from Settings exactly like the existing + agent_* fields (app/core/config.py lines 147-172) — pure read, no DB, no + migration. Mirror the existing `agent_max_tokens=settings.agent_max_tokens` + line. 
+ +- file: app/features/config/tests/ + why: test_schemas.py / test_service.py / test_routes.py — extend each so the + five new fields are covered (construction, service mapping from Settings, + and the GET /config/ai route response). Required by test-requirements.md. + +- file: frontend/src/pages/chat.tsx + why: The actual chat flow the Guide page describes — keep the Guide accurate + to it: pick agent in a Select → "Start Session" → type → stream → approval + prompt → "New Session". + critical: | + Client → server WS frame is `{ session_id, message }`. Server → client + events: text_delta, tool_call_start, tool_call_end, approval_required, + complete, error (see types/api.ts:185-197 AgentEventType). Describe these + accurately; do not invent event names. + +- file: docs/_base/API_CONTRACTS.md + why: Cross-check the /rag and /agents endpoint contracts + WS event list. + critical: The "WebSocket Events (/agents/stream)" section is the source of + truth for the Guide's streaming description. + +- file: frontend/src/hooks/use-demo-pipeline.test.ts + why: The vitest pattern — test PURE exported helpers (applyEvent, + createInitialSteps), not the React component. `knowledge-utils.test.ts` + mirrors this. + +- file: frontend/src/lib/date-utils.ts & frontend/src/lib/status-utils.ts + why: Precedent for a `lib/*.ts` pure-helper module. `knowledge-utils.ts` joins + them — pure functions, no React, easy to unit-test. + +- file: frontend/src/hooks/use-runs.ts & frontend/src/hooks/use-seeder.ts + why: The Live System State section reuses these. use-seeder.ts exports + `useSeederStatus()` (GET /seeder/status → SeederStatus). use-runs.ts exports + the runs + aliases hooks used by admin.tsx (`useAliases`) and + explorer/runs.tsx. + critical: VERIFY the exact export names in use-runs.ts before wiring — reuse + whatever it exports for runs (paginated) + aliases; do not add new hooks. 
+ +- file: .claude/rules/ui-design.md + why: UI built/dogfooded via frontend-design + shadcn-ui + webapp-testing. +- file: .claude/rules/output-formatting.md + why: If the Guide uses status glyphs, reuse the ✅/⚠️/⏭️ vocabulary. +- file: .claude/rules/test-requirements.md + why: New TS component owning non-trivial state SHOULD have a vitest — satisfied + by extracting pure helpers into knowledge-utils.ts and testing them. +- file: .claude/rules/commit-format.md + why: `type(scope): description (#issue)`; scope `ui` for frontend/**, `docs` + for README/docs. Open the tracking issue FIRST. +- file: .claude/rules/branch-naming.md + why: `/` off dev → `feat/knowledge-and-guide-pages`. +``` + +### Current Codebase tree (relevant) +```bash +frontend/src/ +├── App.tsx # MOD — add /knowledge + /guide lazy routes +├── lib/ +│ ├── api.ts # reuse api() + ApiError + getErrorMessage +│ ├── constants.ts # MOD — ROUTES + NAV_ITEMS +│ ├── date-utils.ts # precedent: pure lib helper module +│ ├── status-utils.ts # precedent: pure lib helper module +│ └── knowledge-utils.ts # NEW — pure helpers for the Knowledge page +├── types/api.ts # MOD — +RetrieveRequest, ChunkResult, RetrieveResponse +├── hooks/ +│ ├── use-rag-sources.ts # MOD — +useRetrieve mutation +│ ├── use-seeder.ts # reuse useSeederStatus +│ ├── use-runs.ts # reuse runs + aliases hooks +│ └── use-config.ts # reuse useAIConfig +├── pages/ +│ ├── admin.tsx # reference (RagSourcesPanel, StatCard) — UNCHANGED +│ ├── chat.tsx # reference for the Guide's accuracy — UNCHANGED +│ ├── showcase.tsx # reference page registration (PRP-17) +│ ├── knowledge.tsx # NEW — the Knowledge page +│ └── guide.tsx # NEW — the Agent Guide page +└── components/ + ├── ui/ # reuse Card, Badge, Input, Button, Tabs, Separator + └── common/ # reuse LoadingState, ErrorDisplay +``` + +### Desired Codebase tree (files added / changed) +```bash +NEW frontend/src/pages/knowledge.tsx # Knowledge page (KB + live state) +NEW frontend/src/pages/guide.tsx # Agent 
Guide page +NEW frontend/src/lib/knowledge-utils.ts # pure helpers (testable, no React) +NEW frontend/src/lib/knowledge-utils.test.ts # vitest — pure-helper coverage +MOD frontend/src/types/api.ts # +RetrieveRequest/ChunkResult/RetrieveResponse; +5 AIModelConfig fields +MOD frontend/src/hooks/use-rag-sources.ts # +useRetrieve mutation hook +MOD frontend/src/lib/constants.ts # +KNOWLEDGE/GUIDE routes, +2 NAV_ITEMS +MOD frontend/src/App.tsx # +2 lazy imports, +2 s +MOD frontend/src/pages/chat.tsx # + help link to /guide +MOD app/features/config/schemas.py # +5 read-only agent-limit fields on AIModelConfig +MOD app/features/config/service.py # populate the 5 fields in get_effective_config +MOD app/features/config/tests/test_schemas.py # cover the new fields +MOD app/features/config/tests/test_service.py # cover get_effective_config mapping +MOD app/features/config/tests/test_routes.py # cover GET /config/ai response +MOD README.md # mention the two new pages in the feature list +MOD docs/_base/REPO_MAP_INDEX.md # +rows for knowledge.tsx + guide.tsx +KEEP frontend/src/pages/admin.tsx # UNCHANGED — management stays here +KEEP all other app/** (backend) # UNCHANGED — only the config slice changes +``` + +### Known Gotchas & Library Quirks +```typescript +// CRITICAL: FRONTEND-LED PRP with ONE additive backend change — the config +// slice only (schemas.py + service.py + tests). No new slice, no Alembic +// migration, no .env var. Because .py files DO change, the repo-wide +// ruff/mypy/pyright/pytest gates genuinely apply — run them (see Validation +// Level 4), do not assume they pass trivially. The three pnpm gates still +// gate the frontend half. + +// CRITICAL: /rag/retrieve needs an embedding provider (OpenAI key or Ollama). +// With none configured it returns 502 application/problem+json. 
The Knowledge +// page MUST degrade gracefully: the source LIST (GET /rag/sources) needs NO +// embeddings and always works; only the SEARCH box can 502 — catch ApiError, +// show "Semantic search unavailable — configure an embedding provider in +// Admin → AI Models", keep the rest of the page functional. + +// CRITICAL: RetrieveRequest is ConfigDict(extra="forbid"). Send ONLY +// { query, top_k } (+ optional similarity_threshold/filters). Any stray field +// → 422. OMIT similarity_threshold entirely to use the server-side default. + +// CRITICAL: search is a useMutation, NOT a useQuery. The query string is +// user-typed and submitted on click/Enter — it is an imperative action with +// ephemeral results, exactly the useMutation shape. (useQuery would re-fire +// on every keystroke / refetch.) + +// CRITICAL: the Knowledge page is READ-ONLY. Do NOT add index/delete actions — +// they already live in Admin → RAG Sources (admin.tsx RagSourcesPanel). The +// Knowledge page COPIES the source-row display markup but DROPS the dialog +// and the delete AlertDialog. Duplicating management UI is the anti-pattern. + +// CRITICAL: the Guide page must use the EXACT agent tool names from the agent +// definitions (experiment.py / rag_assistant.py). Do not paraphrase tool +// names. A user copying "tool_run_backtest" into chat must match reality. + +// GOTCHA: agent limit numbers (4096 tokens, 10 tool calls, 120s, TTL 120 min) +// are config DEFAULTS — an operator can change them. Label them "default" on +// the Guide. The LIVE agent model comes from /config/ai (useAIConfig); render +// that dynamically, not a hardcoded model string. + +// GOTCHA: empty knowledge base — a fresh DB has zero RAG sources. The Knowledge +// Base section must show a friendly empty state ("No documents indexed yet — +// add some in Admin → RAG Sources, or run the RAG seeder scenario"), not a +// blank card and not a crash. + +// GOTCHA: NAV_ITEMS is declared `as const`. 
Adding two flat entries is fine; +// keep the object shape `{ label, href }` identical to the existing flat +// items (Dashboard/Showcase/Chat/Admin) so top-nav.tsx's `'items' in item` +// discriminator still works. + +// GOTCHA: react-router lazy route — the page file MUST `export default` the +// component (App.tsx does `lazy(() => import('@/pages/knowledge'))`). Named +// helper exports from the SAME file are allowed, but the Knowledge page's +// pure helpers live in lib/knowledge-utils.ts so they are import-cheap to +// unit-test (mirrors use-demo-pipeline.ts exporting applyEvent et al.). + +// GOTCHA: new frontend files use LF line endings (the repo's CRLF note in +// memory applies to .py files only). Match the surrounding .tsx files — they +// are LF. eslint.config.js + tsc are the enforcers. + +// GOTCHA: every commit needs an open issue (commit-format.md). Open the +// tracking issue BEFORE the first commit. No AI co-author trailer, ever. +``` + +### Known Tradeoffs (decided — do not re-litigate) +```yaml +interpretation: + decision: "ForecastLab's current knowledge" = the RAG knowledge base (what the + rag_assistant answers from) PLUS the live system state (what the experiment + agent acts on: seeded data, runs, aliases). The Knowledge page shows both. + why: The agentic layer has two agents with two distinct knowledge surfaces. + Showing only the RAG corpus would under-represent "what the system knows" + and would also thinly duplicate Admin's RAG tab. Showing both makes the page + a genuine "knowledge dashboard" and a true counterpart to the Agent Guide. + status: confirmed — Resolved Decision 1 keeps both the RAG corpus and the + Live System State section; not scoped down to RAG-only. +minimal-backend: + decision: no NEW backend slice and no /knowledge or /guide API. The only + server-side change is additive: read-only agent-limit fields on the existing + AIModelConfig (GET /config/ai) response. 
+ why: Every page datum except the live session limits is already served + (/rag/sources, /rag/retrieve, /seeder/status, /registry/runs, + /registry/aliases, /config/ai). The maintainer chose live limits over static + text (Resolved Decision 3), and /config/ai is the natural, already-existing + home for them — extending it beats a new endpoint. +guide-content-plus-live-config: + decision: the Guide page is hand-authored content + live /config/ai data (the + configured model AND the session limits). + why: It is documentation; the prose (agents, tools, approval flow, example + prompts) is stable. The two things that legitimately drift — the model and + the limits — are both fetched live from /config/ai. +search-is-mutation: + decision: semantic search uses useMutation, not useQuery. + why: it is a user-initiated imperative action with throwaway results. +``` + +--- + +## Implementation Blueprint + +### Data models / types (`frontend/src/types/api.ts`, add near line 156 `// === RAG ===`) +```typescript +// Append to the existing RAG block — mirror app/features/rag/schemas.py exactly. 
+
+export interface RetrieveRequest {
+  query: string
+  top_k?: number // 1..50, server default 5
+  similarity_threshold?: number // 0..1; OMIT to use the server default
+  filters?: Record<string, unknown> | null
+}
+
+export interface ChunkResult {
+  chunk_id: string
+  source_id: string
+  source_path: string
+  source_type: string
+  content: string
+  relevance_score: number // 0..1
+  metadata: Record<string, unknown> | null
+}
+
+export interface RetrieveResponse {
+  results: ChunkResult[]
+  query_embedding_time_ms: number
+  search_time_ms: number
+  total_chunks_searched: number
+}
+```
+
+### Backend change (`app/features/config/schemas.py` + `service.py`)
+```python
+# schemas.py: append to AIModelConfig (the GET /config/ai response model),
+# NOT to AIModelConfigUpdate (these are read-only, not operator-settable here):
+agent_max_tool_calls: int = Field(description="Per-session tool-call cap")
+agent_timeout_seconds: int = Field(description="Per-run agent timeout (seconds)")
+agent_retry_attempts: int = Field(description="Agent retry attempts on failure")
+agent_session_ttl_minutes: int = Field(description="Session time-to-live (minutes)")
+agent_require_approval: list[str] = Field(
+    description="Tool names gated by human-in-the-loop approval"
+)
+# agent_max_tokens is ALREADY on AIModelConfig; do not re-add it.
+
+# service.py, get_effective_config(): populate each from the Settings singleton,
+# mirroring the existing `agent_max_tokens=settings.agent_max_tokens` line.
+```
+
+### Frontend type extension (`frontend/src/types/api.ts`, existing `AIModelConfig`)
+```typescript
+// EXTEND the existing AIModelConfig interface (~line 360) with the five fields
+// the backend now returns (snake_case, matching the Pydantic model):
+//   agent_max_tool_calls: number
+//   agent_timeout_seconds: number
+//   agent_retry_attempts: number
+//   agent_session_ttl_minutes: number
+//   agent_require_approval: string[]
+// agent_max_tokens already exists on AIModelConfig; do not duplicate it.
+```
+
+### Hook (`frontend/src/hooks/use-rag-sources.ts`, append)
+```typescript
+// Pseudocode: mirror the existing useIndexDocument mutation shape.
+import type { RetrieveRequest, RetrieveResponse } from '@/types/api'
+
+export function useRetrieve() {
+  return useMutation({
+    mutationFn: (body: RetrieveRequest) =>
+      api<RetrieveResponse>('/rag/retrieve', { method: 'POST', body }),
+    // no onSuccess cache invalidation; search results are ephemeral
+  })
+}
+```
+
+### Pure helpers (`frontend/src/lib/knowledge-utils.ts`)
+```typescript
+// Pure, React-free, unit-testable. Exact helper set is implementer's choice;
+// at minimum provide these two so knowledge-utils.test.ts has real coverage:
+
+import type { RagSource, ChunkResult } from '@/types/api'
+
+/** Relevance score (0..1) → a display percentage string, e.g. 0.873 -> "87%". */
+export function formatRelevance(score: number): string { /* clamp 0..1, round */ }
+
+/** Group indexed sources by source_type for the "by type" summary. */
+export function groupSourcesByType(sources: RagSource[]): Record<string, RagSource[]> { /* ... */ }
+
+/** Optional: short, single-line excerpt of a chunk for the result card. */
+export function chunkExcerpt(chunk: ChunkResult, maxChars?: number): string { /* ... */ }
+```
+
+### Knowledge page (`frontend/src/pages/knowledge.tsx`)
+```text
+export default function KnowledgePage()
+Layout (build with frontend-design + shadcn-ui; mirror admin.tsx structure):
+
+- Header: Knowledge + one sentence: "Everything ForecastLabAI can
+  currently draw on: the RAG knowledge base its assistant answers from, and the
+  live data its experiment agent acts on."
+
+- SECTION 1, Knowledge Base (Card):
+  * useRagSources() → SourceListResponse.
+  * CardDescription: "{total_sources} sources • {total_chunks} chunks".
+  * Source list: read-only rows (path, {source_type},
+    "{chunk_count} chunks", "Indexed {date}"). COPY the row markup from
+    admin.tsx RagSourcesPanel lines 209-243 MINUS the delete AlertDialog.
+  * Empty state when sources.length === 0 → friendly message + link to
+    ROUTES.ADMIN ("Index documents in Admin → RAG Sources").
+  * isLoading → LoadingState; error → ErrorDisplay.
+
+- SECTION 2, Semantic Search (Card, inside or below Section 1):
+  * Controlled Input for the query + a "Search"

Compare runs

+

+ Pick two model runs to compare their configuration and metrics side by side. +

+ + + + + Select runs + + The comparison is deep-linkable — the URL carries the two run ids. + + + + selectRun('a', id)} /> + selectRun('b', id)} /> + + + + {(!a || !b) && ( + + + Select two runs above to see the comparison. + + + )} + + {a && b && compareQuery.error && ( + void compareQuery.refetch()} /> + )} + + {a && b && compareQuery.isLoading && } + + {a && b && comparison && ( + <> + + + Profile + Side-by-side registry records. + + + + + + Field + Run A + Run B + + + + + Run ID + + {comparison.run_a.run_id} + + + {comparison.run_b.run_id} + + + + Model type + {comparison.run_a.model_type} + {comparison.run_b.model_type} + + + Status + + + {comparison.run_a.status} + + + + + {comparison.run_b.status} + + + + + Data window + + {comparison.run_a.data_window_start} → {comparison.run_a.data_window_end} + + + {comparison.run_b.data_window_start} → {comparison.run_b.data_window_end} + + + + Config hash + + {comparison.run_a.config_hash} + + + {comparison.run_b.config_hash} + + + + Created + {fmtDate(comparison.run_a.created_at)} + {fmtDate(comparison.run_b.created_at)} + + +
+
+
+ + + + Config diff + Keys whose values differ between the two runs. + + + {Object.keys(comparison.config_diff).length === 0 ? ( +

+ The two runs share an identical configuration. +

+ ) : ( + + )} +
+
+ + + + Metrics diff + + Δ is Run B minus Run A — sign only, not a quality judgement. + + + + {Object.keys(comparison.metrics_diff).length === 0 ? ( +

No metrics to compare.

+ ) : ( + + + + Metric + Run A + Run B + Δ + + + + {Object.entries(comparison.metrics_diff).map(([metric, m]) => ( + + {metric} + {m.a != null ? formatNumber(m.a, 4) : '—'} + {m.b != null ? formatNumber(m.b, 4) : '—'} + + + + + ))} + +
+ )} +
+
+ + )} + + ) +} diff --git a/frontend/src/pages/explorer/run-detail.tsx b/frontend/src/pages/explorer/run-detail.tsx new file mode 100644 index 00000000..1a41ca5e --- /dev/null +++ b/frontend/src/pages/explorer/run-detail.tsx @@ -0,0 +1,272 @@ +import { useState } from 'react' +import { Link, useParams } from 'react-router-dom' +import { format } from 'date-fns' +import { + AlertTriangle, + ArrowLeft, + CheckCircle2, + GitCompare, + Loader2, + ShieldCheck, +} from 'lucide-react' +import { useRun, useVerifyArtifact } from '@/hooks/use-runs' +import { JsonBlock } from '@/components/common/json-block' +import { ErrorDisplay } from '@/components/common/error-display' +import { LoadingState } from '@/components/common/loading-state' +import { StatusBadge } from '@/components/common/status-badge' +import { getStatusVariant } from '@/lib/status-utils' +import { Button } from '@/components/ui/button' +import { Card, CardContent, CardDescription, CardHeader, CardTitle } from '@/components/ui/card' +import { formatNumber, getErrorMessage } from '@/lib/api' +import { ROUTES } from '@/lib/constants' + +function fmtDate(value: string | null | undefined): string { + return value ? format(new Date(value), 'MMM d, yyyy HH:mm') : '—' +} + +function Field({ label, value, mono = false }: { label: string; value: string; mono?: boolean }) { + return ( +
+
{label}
+
{value}
+
+ ) +} + +export default function RunDetailPage() { + const { runId } = useParams() + const runQuery = useRun(runId ?? '', !!runId) + + // The verify GET is button-gated: disabled until the first click, then refetch. + const [verifyOn, setVerifyOn] = useState(false) + const verifyQuery = useVerifyArtifact(runId ?? '', verifyOn) + + if (!runId) { + return ( +
+

Run Detail

+ +
+ ) + } + + if (runQuery.error) { + return ( +
+

Run Detail

+ void runQuery.refetch()} /> +
+ ) + } + + if (runQuery.isLoading || !runQuery.data) { + return + } + + const run = runQuery.data + + function handleVerify() { + if (!verifyOn) setVerifyOn(true) + else void verifyQuery.refetch() + } + + return ( +
+
+
+ +
+

{run.run_id}

+ {run.status} +
+

{run.model_type}

+
+ +
+ + + + Run profile + Registry record for this model run. + + +
+ +
+
Store
+
+ + #{run.store_id} + +
+
+
+
Product
+
+ + #{run.product_id} + +
+
+ + + + + + +
+
+
+ + {run.status === 'failed' && run.error_message && ( + + + Error + + +

{run.error_message}

+
+
+ )} + + + + Metrics + Evaluation metrics recorded for this run. + + + + + + +
+ + + Model config + + + + + + + + Feature config + + + + + +
+ + + + Runtime info + Environment captured at training time. + + + + + + + {run.agent_context && ( + + + Agent context + The agent session that created this run. + + + + + + )} + + + + Artifact + Stored model artifact and SHA-256 integrity check. + + +
+ + + +
+ +
+ + {!run.artifact_uri && ( + This run has no artifact. + )} +
+ + {verifyOn && !verifyQuery.isFetching && verifyQuery.error && ( +
+ + {getErrorMessage(verifyQuery.error)} +
+ )} + + {verifyOn && + !verifyQuery.isFetching && + verifyQuery.data && + (verifyQuery.data.verified ? ( +
+ + + Artifact verified — the stored checksum matches. + {verifyQuery.data.computed_hash && ( + + {verifyQuery.data.computed_hash} + + )} + +
+ ) : ( +
+ + + Integrity check failed — the artifact does not match its stored hash. + {verifyQuery.data.error && ( + {verifyQuery.data.error} + )} + +
+ ))} +
+
+
+ ) +} diff --git a/frontend/src/pages/explorer/runs.tsx b/frontend/src/pages/explorer/runs.tsx index 31b1c436..c6fb43fc 100644 --- a/frontend/src/pages/explorer/runs.tsx +++ b/frontend/src/pages/explorer/runs.tsx @@ -1,26 +1,32 @@ -import { useState } from 'react' +import { Link, useNavigate, useSearchParams } from 'react-router-dom' import { format } from 'date-fns' -import { ColumnDef, PaginationState } from '@tanstack/react-table' +import { ColumnDef, OnChangeFn, PaginationState, SortingState } from '@tanstack/react-table' +import { Download, GitCompare } from 'lucide-react' import { useRuns } from '@/hooks/use-runs' import { DataTable } from '@/components/data-table/data-table' import { DataTableToolbar } from '@/components/data-table/data-table-toolbar' +import { DataTableColumnHeader } from '@/components/data-table/data-table-column-header' import { StatusBadge } from '@/components/common/status-badge' import { getStatusVariant } from '@/lib/status-utils' import { ErrorDisplay } from '@/components/common/error-display' +import { Button } from '@/components/ui/button' +import { toCsv, downloadCsv, type CsvColumn } from '@/lib/csv-export' import type { ModelRun, RunStatus } from '@/types/api' -import { DEFAULT_PAGE_SIZE } from '@/lib/constants' +import { DEFAULT_PAGE_SIZE, ROUTES } from '@/lib/constants' const columns: ColumnDef[] = [ { accessorKey: 'run_id', header: 'Run ID', + enableSorting: false, + enableHiding: false, cell: ({ row }) => ( {row.original.run_id.substring(0, 8)}... 
), }, { accessorKey: 'status', - header: 'Status', + header: ({ column }) => , cell: ({ row }) => ( {row.original.status} @@ -29,20 +35,21 @@ const columns: ColumnDef[] = [ }, { accessorKey: 'model_type', - header: 'Model Type', + header: ({ column }) => , cell: ({ row }) => {row.original.model_type}, }, { accessorKey: 'store_id', - header: 'Store', + header: ({ column }) => , }, { accessorKey: 'product_id', - header: 'Product', + header: ({ column }) => , }, { accessorKey: 'data_window_start', header: 'Data Window', + enableSorting: false, cell: ({ row }) => ( {format(new Date(row.original.data_window_start), 'MMM d')} -{' '} @@ -53,6 +60,7 @@ const columns: ColumnDef[] = [ { accessorKey: 'metrics', header: 'MAE', + enableSorting: false, cell: ({ row }) => { const mae = row.original.metrics?.mae return mae !== undefined ? mae.toFixed(2) : '-' @@ -60,42 +68,95 @@ const columns: ColumnDef[] = [ }, { accessorKey: 'created_at', - header: 'Created', + header: ({ column }) => , cell: ({ row }) => format(new Date(row.original.created_at), 'MMM d, HH:mm'), }, ] +const csvColumns: CsvColumn[] = [ + { key: 'run_id', header: 'Run ID' }, + { key: 'status', header: 'Status' }, + { key: 'model_type', header: 'Model Type' }, + { key: 'store_id', header: 'Store' }, + { key: 'product_id', header: 'Product' }, + { key: 'data_window_start', header: 'Data Window Start' }, + { key: 'data_window_end', header: 'Data Window End' }, + { key: 'created_at', header: 'Created' }, +] + export default function RunsExplorerPage() { - const [pagination, setPagination] = useState({ - pageIndex: 0, + const navigate = useNavigate() + const [searchParams, setSearchParams] = useSearchParams() + + // URL query string is the single source of truth for filter/sort/page state, + // so a pasted URL reproduces the exact view. + const modelType = searchParams.get('model_type') ?? undefined + const status = searchParams.get('status') ?? 
undefined + const page = Number(searchParams.get('page')) || 1 + const sortBy = searchParams.get('sort_by') ?? undefined + const sortOrder: 'asc' | 'desc' = searchParams.get('sort_order') === 'desc' ? 'desc' : 'asc' + + const pagination: PaginationState = { + pageIndex: page - 1, pageSize: DEFAULT_PAGE_SIZE, - }) - const [filters, setFilters] = useState>({}) + } + const sorting: SortingState = sortBy ? [{ id: sortBy, desc: sortOrder === 'desc' }] : [] const { data, isLoading, error, refetch } = useRuns({ - page: pagination.pageIndex + 1, + page, pageSize: pagination.pageSize, - modelType: filters.modelType, - status: filters.status as RunStatus | undefined, + modelType, + status: status as RunStatus | undefined, + sortBy, + sortOrder: sortBy ? sortOrder : undefined, }) + function updateParams(updates: Record) { + setSearchParams((prev) => { + const next = new URLSearchParams(prev) + for (const [key, value] of Object.entries(updates)) { + if (value === undefined || value === '') next.delete(key) + else next.set(key, value) + } + return next + }) + } + + const handlePaginationChange: OnChangeFn = (updater) => { + const next = typeof updater === 'function' ? updater(pagination) : updater + updateParams({ page: String(next.pageIndex + 1) }) + } + + const handleSortingChange: OnChangeFn = (updater) => { + const next = typeof updater === 'function' ? updater(sorting) : updater + const first = next[0] + updateParams({ + sort_by: first?.id, + sort_order: first ? (first.desc ? 'desc' : 'asc') : undefined, + page: '1', + }) + } + const handleFilterChange = (key: string, value: string | undefined) => { - setFilters((prev) => ({ ...prev, [key]: value })) - setPagination((prev) => ({ ...prev, pageIndex: 0 })) + const paramKey = key === 'modelType' ? 
'model_type' : key + updateParams({ [paramKey]: value, page: '1' }) } const handleReset = () => { - setFilters({}) - setPagination({ pageIndex: 0, pageSize: DEFAULT_PAGE_SIZE }) + setSearchParams(new URLSearchParams()) } - const hasActiveFilters = Object.values(filters).some(Boolean) + const handleExport = () => { + downloadCsv('model-runs.csv', toCsv(data?.runs ?? [], csvColumns)) + } + + const hasActiveFilters = !!modelType || !!status || !!sortBy if (error) { return (

Model Runs

- + void refetch()} />
) } @@ -104,44 +165,62 @@ export default function RunsExplorerPage() { return (
-

Model Runs

- - +
+

Model Runs

+ +
+ +
+ + +
navigate(`/explorer/runs/${run.run_id}`)} + enableColumnVisibility isLoading={isLoading} emptyMessage="No model runs found." /> diff --git a/frontend/src/pages/explorer/sales.tsx b/frontend/src/pages/explorer/sales.tsx index b2a7b9b9..ba32036b 100644 --- a/frontend/src/pages/explorer/sales.tsx +++ b/frontend/src/pages/explorer/sales.tsx @@ -1,54 +1,169 @@ import { useState } from 'react' +import { useSearchParams } from 'react-router-dom' import { format, subDays } from 'date-fns' +import { X } from 'lucide-react' import { DateRange } from 'react-day-picker' import { useDrilldowns } from '@/hooks/use-drilldowns' +import { useTimeseries } from '@/hooks/use-timeseries' import { DateRangePicker } from '@/components/common/date-range-picker' -import { dateRangeToStrings } from '@/lib/date-utils' import { ErrorDisplay } from '@/components/common/error-display' import { LoadingState } from '@/components/common/loading-state' +import { RevenueBarChart } from '@/components/charts/revenue-bar-chart' +import { TimeSeriesChart } from '@/components/charts/time-series-chart' +import { Badge } from '@/components/ui/badge' import { Card, CardContent, CardDescription, CardHeader, CardTitle } from '@/components/ui/card' import { Tabs, TabsContent, TabsList, TabsTrigger } from '@/components/ui/tabs' +import { dateRangeToStrings, stringsToDateRange } from '@/lib/date-utils' import { formatCurrency, formatNumber } from '@/lib/api' import type { DrilldownDimension } from '@/types/api' export default function SalesExplorerPage() { - const [dateRange, setDateRange] = useState({ - from: subDays(new Date(), 30), - to: new Date(), - }) - const [dimension, setDimension] = useState('store') + const [searchParams, setSearchParams] = useSearchParams() + + // dimension + cross-filter state live in the URL so the view is shareable. + const dimension = (searchParams.get('dimension') as DrilldownDimension | null) ?? 
'store' + const storeIdParam = searchParams.get('store_id') + const productIdParam = searchParams.get('product_id') + const storeId = storeIdParam ? Number(storeIdParam) : undefined + const productId = productIdParam ? Number(productIdParam) : undefined + + const startParam = searchParams.get('start_date') + const endParam = searchParams.get('end_date') + const [dateRange, setDateRange] = useState(() => + startParam + ? stringsToDateRange(startParam, endParam ?? undefined) + : { from: subDays(new Date(), 30), to: new Date() } + ) const { startDate, endDate } = dateRangeToStrings(dateRange) + const rangeReady = !!startDate && !!endDate - const { data, isLoading, error, refetch } = useDrilldowns({ + function updateParams(updates: Record) { + setSearchParams((prev) => { + const next = new URLSearchParams(prev) + for (const [key, value] of Object.entries(updates)) { + if (value === undefined || value === '') next.delete(key) + else next.set(key, value) + } + return next + }) + } + + const handleDateChange = (range: DateRange | undefined) => { + setDateRange(range) + const { startDate: nextStart, endDate: nextEnd } = dateRangeToStrings(range) + updateParams({ start_date: nextStart, end_date: nextEnd }) + } + + const drilldown = useDrilldowns({ dimension, startDate: startDate ?? '', endDate: endDate ?? '', + storeId, + productId, maxItems: 20, - enabled: !!startDate && !!endDate, + enabled: rangeReady, + }) + + const timeseries = useTimeseries({ + startDate: startDate ?? '', + endDate: endDate ?? '', + granularity: 'day', + storeId, + productId, + enabled: rangeReady, }) - if (error) { + if (drilldown.error) { return (

Sales Explorer

- + void drilldown.refetch()} />
) } + const items = drilldown.data?.items ?? [] + const points = timeseries.data?.points ?? [] + const dimensionLabel = dimension.charAt(0).toUpperCase() + dimension.slice(1) + return (

Sales Explorer

- + +
+ + {(storeId !== undefined || productId !== undefined) && ( +
+ {storeId !== undefined && ( + + Filtered to store #{storeId} + + + )} + {productId !== undefined && ( + + Filtered to product #{productId} + + + )} +
+ )} + +
+ {items.length > 0 ? ( + ({ + label: item.dimension_value, + revenue: Number(item.metrics.total_revenue), + }))} + /> + ) : ( + + + Revenue by {dimensionLabel} + No sales data for the selected period. + + + )} + {points.length > 0 ? ( + ({ + date: p.period, + actual: Number(p.metrics.total_revenue), + }))} + showPredicted={false} + /> + ) : ( + + + Revenue over time + No sales data for the selected period. + + + )}
- setDimension(v as DrilldownDimension)}> + updateParams({ dimension: v })}> By Store By Product @@ -58,22 +173,22 @@ export default function SalesExplorerPage() { - {isLoading ? ( + {drilldown.isLoading ? ( ) : ( - Sales by {dimension.charAt(0).toUpperCase() + dimension.slice(1)} + Sales by {dimensionLabel} - {data?.total_items ?? 0} items found for{' '} + {drilldown.data?.total_items ?? 0} items found for{' '} {startDate && format(new Date(startDate), 'MMM d, yyyy')} -{' '} {endDate && format(new Date(endDate), 'MMM d, yyyy')} - {data?.items.length ? ( + {items.length ? (
- {data.items.map((item, idx) => ( + {items.map((item, idx) => (
0 + + const [dateRange, setDateRange] = useState({ + from: subDays(new Date(), 30), + to: new Date(), + }) + const { startDate, endDate } = dateRangeToStrings(dateRange) + const rangeReady = !!startDate && !!endDate + + const storeQuery = useStore(id, validId) + const kpiQuery = useKPIs({ + startDate: startDate ?? '', + endDate: endDate ?? '', + storeId: id, + enabled: validId && rangeReady, + }) + const timeseriesQuery = useTimeseries({ + startDate: startDate ?? '', + endDate: endDate ?? '', + granularity: 'day', + storeId: id, + enabled: validId && rangeReady, + }) + const topProductsQuery = useDrilldowns({ + dimension: 'product', + startDate: startDate ?? '', + endDate: endDate ?? '', + storeId: id, + maxItems: 10, + enabled: validId && rangeReady, + }) + + if (!validId) { + return ( +
+

Store Detail

+ +
+ ) + } + + if (storeQuery.error) { + return ( +
+

Store Detail

+ void storeQuery.refetch()} /> +
+ ) + } + + const store = storeQuery.data + const metrics = kpiQuery.data?.metrics + const points = timeseriesQuery.data?.points ?? [] + const topProducts = topProductsQuery.data?.items ?? [] + + return ( +
+
+
+ +

{store?.name ?? 'Store'}

+ {store && ( +

+ {store.code} + {store.region ? ` · ${store.region}` : ''} +

+ )} +
+
+ + +
+
+ + + + Store profile + Dimension record for this store. + + +
+
+
Code
+
{store?.code ?? '-'}
+
+
+
Region
+
{store?.region ?? '-'}
+
+
+
City
+
{store?.city ?? '-'}
+
+
+
Type
+
{store?.store_type ?? '-'}
+
+
+
+
+ +
+ + + + +
+ + {points.length > 0 ? ( + ({ + date: p.period, + actual: Number(p.metrics.total_revenue), + }))} + showPredicted={false} + /> + ) : ( + + + Revenue over time + No sales in the selected period. + + + )} + + {topProducts.length > 0 ? ( + ({ + label: item.dimension_value, + revenue: Number(item.metrics.total_revenue), + }))} + /> + ) : ( + + + Top products + No product sales in the selected period. + + + )} +
+ ) +} diff --git a/frontend/src/pages/explorer/stores.tsx b/frontend/src/pages/explorer/stores.tsx index 36f19d35..99914107 100644 --- a/frontend/src/pages/explorer/stores.tsx +++ b/frontend/src/pages/explorer/stores.tsx @@ -1,9 +1,13 @@ -import { useState } from 'react' -import { ColumnDef, PaginationState } from '@tanstack/react-table' +import { useNavigate, useSearchParams } from 'react-router-dom' +import { ColumnDef, OnChangeFn, PaginationState, SortingState } from '@tanstack/react-table' +import { Download } from 'lucide-react' import { useStores } from '@/hooks/use-stores' import { DataTable } from '@/components/data-table/data-table' import { DataTableToolbar } from '@/components/data-table/data-table-toolbar' +import { DataTableColumnHeader } from '@/components/data-table/data-table-column-header' import { ErrorDisplay } from '@/components/common/error-display' +import { Button } from '@/components/ui/button' +import { toCsv, downloadCsv, type CsvColumn } from '@/lib/csv-export' import type { Store } from '@/types/api' import { DEFAULT_PAGE_SIZE } from '@/lib/constants' @@ -11,74 +15,124 @@ const columns: ColumnDef[] = [ { accessorKey: 'id', header: 'ID', + enableSorting: false, + enableHiding: false, cell: ({ row }) => {row.original.id}, }, { accessorKey: 'code', - header: 'Code', + header: ({ column }) => , cell: ({ row }) => {row.original.code}, }, { accessorKey: 'name', - header: 'Name', + header: ({ column }) => , }, { accessorKey: 'region', - header: 'Region', + header: ({ column }) => , cell: ({ row }) => row.original.region ?? '-', }, { accessorKey: 'city', - header: 'City', + header: ({ column }) => , cell: ({ row }) => row.original.city ?? '-', }, { accessorKey: 'store_type', - header: 'Type', + header: ({ column }) => , cell: ({ row }) => row.original.store_type ?? 
'-', }, ] +const csvColumns: CsvColumn[] = [ + { key: 'id', header: 'ID' }, + { key: 'code', header: 'Code' }, + { key: 'name', header: 'Name' }, + { key: 'region', header: 'Region' }, + { key: 'city', header: 'City' }, + { key: 'store_type', header: 'Type' }, +] + export default function StoresExplorerPage() { - const [pagination, setPagination] = useState({ - pageIndex: 0, + const navigate = useNavigate() + const [searchParams, setSearchParams] = useSearchParams() + + // URL query string is the single source of truth for filter/sort/page state, + // so a pasted URL reproduces the exact view. + const search = searchParams.get('search') ?? '' + const region = searchParams.get('region') ?? undefined + const storeType = searchParams.get('store_type') ?? undefined + const page = Number(searchParams.get('page')) || 1 + const sortBy = searchParams.get('sort_by') ?? undefined + const sortOrder: 'asc' | 'desc' = searchParams.get('sort_order') === 'desc' ? 'desc' : 'asc' + + const pagination: PaginationState = { + pageIndex: page - 1, pageSize: DEFAULT_PAGE_SIZE, - }) - const [search, setSearch] = useState('') - const [filters, setFilters] = useState>({}) + } + const sorting: SortingState = sortBy ? [{ id: sortBy, desc: sortOrder === 'desc' }] : [] - // Convert 0-indexed pageIndex to 1-indexed page for API const { data, isLoading, error, refetch } = useStores({ - page: pagination.pageIndex + 1, + page, pageSize: pagination.pageSize, search: search.length >= 2 ? search : undefined, - region: filters.region, - storeType: filters.storeType, + region, + storeType, + sortBy, + sortOrder: sortBy ? 
sortOrder : undefined, }) - const handleFilterChange = (key: string, value: string | undefined) => { - setFilters((prev) => ({ ...prev, [key]: value })) - setPagination((prev) => ({ ...prev, pageIndex: 0 })) + function updateParams(updates: Record) { + setSearchParams((prev) => { + const next = new URLSearchParams(prev) + for (const [key, value] of Object.entries(updates)) { + if (value === undefined || value === '') next.delete(key) + else next.set(key, value) + } + return next + }) + } + + const handlePaginationChange: OnChangeFn = (updater) => { + const next = typeof updater === 'function' ? updater(pagination) : updater + updateParams({ page: String(next.pageIndex + 1) }) + } + + const handleSortingChange: OnChangeFn = (updater) => { + const next = typeof updater === 'function' ? updater(sorting) : updater + const first = next[0] + updateParams({ + sort_by: first?.id, + sort_order: first ? (first.desc ? 'desc' : 'asc') : undefined, + page: '1', + }) } const handleSearchChange = (value: string) => { - setSearch(value) - setPagination((prev) => ({ ...prev, pageIndex: 0 })) + updateParams({ search: value || undefined, page: '1' }) + } + + const handleFilterChange = (key: string, value: string | undefined) => { + const paramKey = key === 'storeType' ? 'store_type' : key + updateParams({ [paramKey]: value, page: '1' }) } const handleReset = () => { - setSearch('') - setFilters({}) - setPagination({ pageIndex: 0, pageSize: DEFAULT_PAGE_SIZE }) + setSearchParams(new URLSearchParams()) } - const hasActiveFilters = !!search || Object.values(filters).some(Boolean) + const handleExport = () => { + downloadCsv('stores.csv', toCsv(data?.stores ?? [], csvColumns)) + } + + const hasActiveFilters = !!search || !!region || !!storeType || !!sortBy if (error) { return (

Stores

- + void refetch()} />
) } @@ -89,43 +143,53 @@ export default function StoresExplorerPage() {

Stores

- +
+ + +
navigate(`/explorer/stores/${store.id}`)} + enableColumnVisibility isLoading={isLoading} emptyMessage="No stores found." /> diff --git a/frontend/src/pages/guide.tsx b/frontend/src/pages/guide.tsx new file mode 100644 index 00000000..787a24e8 --- /dev/null +++ b/frontend/src/pages/guide.tsx @@ -0,0 +1,362 @@ +import { Link } from 'react-router-dom' +import { + Bot, + Search, + FlaskConical, + ShieldCheck, + Workflow, + Gauge, + MessageSquare, + ArrowRight, + Settings, + AlertTriangle, +} from 'lucide-react' +import { useAIConfig } from '@/hooks/use-config' +import { Button } from '@/components/ui/button' +import { Card, CardContent, CardDescription, CardHeader, CardTitle } from '@/components/ui/card' +import { Badge } from '@/components/ui/badge' +import { Skeleton } from '@/components/ui/skeleton' +import { + Table, + TableBody, + TableCell, + TableHead, + TableHeader, + TableRow, +} from '@/components/ui/table' +import { ROUTES } from '@/lib/constants' + +// Tool inventories — kept verbatim in sync with the agent definitions +// (app/features/agents/agents/experiment.py + rag_assistant.py). The +// `approval` flag mirrors agent_require_approval (create_alias / archive_run). +interface ToolInfo { + name: string + desc: string + approval?: boolean +} + +const RAG_TOOLS: ToolInfo[] = [ + { name: 'tool_retrieve_context', desc: 'Semantic search over the indexed knowledge base.' }, + { name: 'tool_list_sources', desc: 'List indexed sources and chunk counts.' }, + { name: 'tool_format_citations', desc: 'Turn retrieval results into stable citations.' }, + { name: 'tool_check_evidence', desc: 'Decide whether the evidence is sufficient to answer.' }, +] + +const EXPERIMENT_TOOLS: ToolInfo[] = [ + { name: 'tool_list_runs', desc: 'Browse existing model runs in the registry.' }, + { name: 'tool_get_run', desc: 'Fetch the full detail of one model run.' }, + { name: 'tool_run_backtest', desc: 'Run a time-series backtest for a store / product.' 
}, + { + name: 'tool_compare_backtest_results', + desc: 'Compare two backtest results and recommend a winner.', + }, + { name: 'tool_compare_runs', desc: 'Diff two registered runs (config + metrics).' }, + { name: 'tool_create_alias', desc: 'Promote a successful run to a deployment alias.', approval: true }, + { name: 'tool_archive_run', desc: 'Archive a model run.', approval: true }, +] + +const SESSION_STEPS = [ + 'Open Chat, pick an agent type, and click "Start Session".', + 'Type a message and send it.', + 'Watch the reply stream token-by-token; tool calls appear as chips (start → end).', + 'If the agent proposes a guarded action, an approval prompt appears — approve or reject it.', + '"New Session" starts a fresh conversation with a clean history.', +] + +const RAG_PROMPTS = [ + 'What forecasting models does ForecastLabAI support?', + 'How does backtesting prevent data leakage?', + 'What is in your knowledge base?', +] + +const EXPERIMENT_PROMPTS = [ + 'Backtest a seasonal_naive model for store 1 product 1 over the last 90 days and compare it to the naive baseline.', + 'List the most recent model runs and tell me which has the lowest WAPE.', +] + +export default function GuidePage() { + const { data: config, isLoading: configLoading } = useAIConfig() + + return ( +
+ {/* Header */} +
+

Agent Guide

+

How to use the Chat agents.

+
+ + {/* Live model callout */} + {config && ( +
+ + + Agents currently run on{' '} + {config.agent_default_model}. + + + + Manage in Admin → AI Models + +
+ )} + + {/* The two agents */} +
+ + See what it can answer from → Knowledge + + + } + /> + + See the runs it acts on → Model Runs + + + } + /> +
+ + {/* How a chat session works */} + + +
+ + How a chat session works +
+ + Each session is one conversation. Replies stream over a WebSocket — text arrives as{' '} + text_delta events and tool + calls as tool_call_start /{' '} + tool_call_end events. + +
+ +
    + {SESSION_STEPS.map((step, i) => ( +
  1. + + {i + 1} + + {step} +
  2. + ))} +
+
+
+ + {/* Human-in-the-loop approval */} + + +
+ + Human-in-the-loop approval +
+ + Tools that change registry state never run unattended. + +
+ +

+ When an agent calls a guarded tool, the run pauses and the Chat page shows an approval + prompt. The action only proceeds once you approve it; rejecting it returns control to + the agent. This keeps every mutation of the model registry under human control. +

+
+ Approval-gated tools: + {config ? ( + config.agent_require_approval.map((tool) => ( + + + {tool} + + )) + ) : ( + + )} +
+
+
+ + {/* Session limits */} + + +
+ + Session limits +
+ + Live from GET /config/ai. + These are the configured defaults — an operator can change them in Admin → AI Models. + +
+ + {configLoading && } + {config && ( + + + + Limit + Value + + + + + Token budget per session + {config.agent_max_tokens.toLocaleString()} tokens + + + Tool calls per session + {config.agent_max_tool_calls} + + + Per-run timeout + {config.agent_timeout_seconds} seconds + + + Retry attempts + {config.agent_retry_attempts} + + + Session time-to-live + {config.agent_session_ttl_minutes} minutes + + + Approval-gated tools + + {config.agent_require_approval.join(', ') || 'none'} + + + +
+ )} + {!configLoading && !config && ( +

+ Session limits are unavailable right now — the configuration endpoint could not be + reached. +

+ )} +
+
+ + {/* Example prompts */} + + +
+ + Example prompts +
+ Copy one of these into Chat to get started. +
+ + + + +
+ + {/* CTA */} +
+ +
+
+ ) +} + +function AgentCard({ + icon: Icon, + title, + agentId, + purpose, + tools, + footer, +}: { + icon: React.ComponentType<{ className?: string }> + title: string + agentId: string + purpose: string + tools: ToolInfo[] + footer: React.ReactNode +}) { + return ( + + +
+ + {title} + + {agentId} + +
+ {purpose} +
+ +
+

Tools

+
    + {tools.map((tool) => ( +
  • +
    + {tool.name} + {tool.approval && ( + + + requires approval + + )} +
    +

    {tool.desc}

    +
  • + ))} +
+
+
{footer}
+
+
+ ) +} + +function PromptList({ title, prompts }: { title: string; prompts: string[] }) { + return ( +
+

{title}

+ {prompts.map((prompt) => ( + + {prompt} + + ))} +
+ ) +} diff --git a/frontend/src/pages/knowledge.tsx b/frontend/src/pages/knowledge.tsx new file mode 100644 index 00000000..1335ae88 --- /dev/null +++ b/frontend/src/pages/knowledge.tsx @@ -0,0 +1,344 @@ +import { useState } from 'react' +import { Link } from 'react-router-dom' +import { format } from 'date-fns' +import { + Library, + Search, + FileText, + Loader2, + Store, + Package, + TrendingUp, + CalendarRange, + Database, + Tag, + ArrowRight, + FolderOpen, +} from 'lucide-react' +import { useRagSources, useRetrieve } from '@/hooks/use-rag-sources' +import { useSeederStatus } from '@/hooks/use-seeder' +import { useRuns, useAliases } from '@/hooks/use-runs' +import { LoadingState } from '@/components/common/loading-state' +import { ErrorDisplay } from '@/components/common/error-display' +import { Button } from '@/components/ui/button' +import { Card, CardContent, CardDescription, CardHeader, CardTitle } from '@/components/ui/card' +import { Badge } from '@/components/ui/badge' +import { Input } from '@/components/ui/input' +import { Skeleton } from '@/components/ui/skeleton' +import { ApiError, getErrorMessage } from '@/lib/api' +import { ROUTES } from '@/lib/constants' +import { formatRelevance, chunkExcerpt, groupSourcesByType } from '@/lib/knowledge-utils' + +export default function KnowledgePage() { + return ( +
+ {/* Header */} +
+

Knowledge

+

+ Everything ForecastLabAI can currently draw on — the RAG knowledge base its assistant + answers from, and the live data its experiment agent acts on. +

+
+ + + + +
+ ) +} + +// === Section 1 — Knowledge Base (indexed RAG sources, read-only) === + +function KnowledgeBaseSection() { + const { data, isLoading, error, refetch } = useRagSources() + + if (error) { + return + } + if (isLoading) { + return + } + + const sources = data?.sources ?? [] + const byType = groupSourcesByType(sources) + + return ( + + +
+ + Knowledge Base +
+ + {data?.total_sources ?? 0} sources • {data?.total_chunks ?? 0} chunks + {sources.length > 0 && ( + <> + {' • '} + {Object.entries(byType) + .map(([type, items]) => `${items.length} ${type}`) + .join(', ')} + + )} + +
+ + {sources.length > 0 ? ( +
+ {sources.map((source) => ( +
+
+ +
+

{source.source_path}

+

+ {source.chunk_count} chunks • Indexed{' '} + {format(new Date(source.indexed_at), 'MMM d, yyyy')} +

+
+
+ + {source.source_type} + +
+ ))} +
+ ) : ( +
+ +
+

No documents indexed yet

+

+ The RAG assistant has nothing to answer from. Index documents in Admin → RAG + Sources, or run the RAG seeder scenario. +

+
+ +
+ )} +
+
+ ) +} + +// === Section 2 — Semantic Search (POST /rag/retrieve) === + +function SemanticSearchSection() { + const [query, setQuery] = useState('') + const retrieve = useRetrieve() + + const trimmed = query.trim() + + const handleSubmit = (e: React.FormEvent) => { + e.preventDefault() + if (!trimmed || retrieve.isPending) return + retrieve.mutate({ query: trimmed, top_k: 5 }) + } + + const searchUnavailable = retrieve.error instanceof ApiError && retrieve.error.status === 502 + const results = retrieve.data?.results ?? [] + + return ( + + +
+ + Semantic Search +
+ + Search the indexed knowledge base the way the RAG assistant does — by meaning, not + keywords. + +
+ +
+ setQuery(e.target.value)} + placeholder="e.g. How does backtesting prevent data leakage?" + aria-label="Semantic search query" + /> + +
+ + {searchUnavailable && ( +

+ Semantic search is unavailable — configure an embedding provider in{' '} + + Admin → AI Models + + . The source list above does not need embeddings and still works. +

+ )} + + {retrieve.isError && !searchUnavailable && ( +

+ {getErrorMessage(retrieve.error)} +

+ )} + + {retrieve.isSuccess && results.length === 0 && ( +

+ No matching content found. Try rephrasing the query. +

+ )} + + {results.length > 0 && ( +
+

+ {results.length} match{results.length === 1 ? '' : 'es'} •{' '} + {retrieve.data?.total_chunks_searched ?? 0} chunks searched in{' '} + {Math.round(retrieve.data?.search_time_ms ?? 0)} ms +

+ {results.map((chunk) => ( +
+
+

+ {chunk.source_path} + ({chunk.source_type}) +

+ + {formatRelevance(chunk.relevance_score)} match + +
+

{chunkExcerpt(chunk)}

+
+ ))} +
+ )} +
+
+ ) +} + +// === Section 3 — Live System State (what the experiment agent acts on) === + +function StatCard({ + icon: Icon, + label, + value, +}: { + icon: React.ComponentType<{ className?: string }> + label: string + value: string | number +}) { + return ( +
+ +

{typeof value === 'number' ? value.toLocaleString() : value}

+

{label}

+
+ ) +} + +function LiveSystemStateSection() { + const { data: status, isLoading: statusLoading } = useSeederStatus() + const { data: runs, isLoading: runsLoading } = useRuns({ page: 1, pageSize: 1 }) + const { data: aliases, isLoading: aliasesLoading } = useAliases() + + const dateRange = + status?.date_range_start && status?.date_range_end + ? `${status.date_range_start} → ${status.date_range_end}` + : 'No data' + + return ( + + +
+ + Live System State +
+ + The seeded data and registered models the experiment agent can query through its tools. + +
+ + {/* Seeded data tiles */} + {statusLoading ? ( +
+ {Array.from({ length: 4 }).map((_, i) => ( + + ))} +
+ ) : ( +
+ + + + +
+ )} + + {/* Registry summary */} +
+
+
+ +

Registered model runs

+
+

+ {runsLoading ? '—' : (runs?.total ?? 0).toLocaleString()} +

+ + Browse all runs + + +
+ +
+
+ +

Deployment aliases

+
+ {aliasesLoading ? ( +

Loading…

+ ) : aliases && aliases.length > 0 ? ( +
    + {aliases.map((alias) => ( +
  • + {alias.alias_name} + {alias.model_type} +
  • + ))} +
+ ) : ( +

No aliases yet.

+ )} +
+
+ + {/* Explainer */} +

+ The RAG assistant answers from the Knowledge Base above; the experiment agent acts on this + Live System State. Learn how to use them in the{' '} + + Agent Guide + + , or start a conversation in{' '} + + Chat + + . +

+
+
+ ) +} diff --git a/frontend/src/types/api.ts b/frontend/src/types/api.ts index c356c6ab..ecd51d02 100644 --- a/frontend/src/types/api.ts +++ b/frontend/src/types/api.ts @@ -76,6 +76,46 @@ export interface DrilldownResponse { product_id: number | null } +// Bucket size for GET /analytics/timeseries. +export type TimeGranularity = 'day' | 'week' | 'month' | 'quarter' + +// One aggregated period of the sales time series. +export interface TimeSeriesPoint { + period: string // ISO date (bucket start) + metrics: KPIMetrics +} + +// Response from GET /analytics/timeseries (points ascending by period). +export interface TimeSeriesResponse { + granularity: TimeGranularity + points: TimeSeriesPoint[] + total_points: number + start_date: string + end_date: string + store_id: number | null + product_id: number | null + category: string | null +} + +// One day of a product's lifecycle demand curve. +export interface LifecyclePoint { + date: string // ISO date + stage: string + multiplier: number +} + +// Response from GET /dimensions/products/{id}/lifecycle-curve. +export interface LifecycleCurveResponse { + product_id: number + sku: string + launch_date: string | null + discontinue_date: string | null + start_date: string + end_date: string + points: LifecyclePoint[] + total: number +} + // === Registry === export type RunStatus = 'pending' | 'running' | 'success' | 'failed' | 'archived' @@ -125,6 +165,17 @@ export interface RunCompareResponse { metrics_diff: Record } +// Response from GET /registry/runs/{run_id}/verify (SHA-256 integrity check). +// On a checksum mismatch the endpoint returns HTTP 200 with verified:false + error. 
+export interface ArtifactVerifyResponse { + verified: boolean + run_id: string + artifact_uri: string + stored_hash?: string + computed_hash?: string + error?: string +} + // === Jobs === export type JobType = 'train' | 'predict' | 'backtest' export type JobStatus = 'pending' | 'running' | 'completed' | 'failed' | 'cancelled' @@ -181,6 +232,35 @@ export interface IndexDocumentResponse { chunks_created: number } +// Semantic-search request for POST /rag/retrieve. +// Mirrors app/features/rag/schemas.py RetrieveRequest (extra="forbid" — send +// nothing beyond these fields). Omit similarity_threshold to use the server default. +export interface RetrieveRequest { + query: string + top_k?: number // 1..50, server default 5 + similarity_threshold?: number // 0..1 + filters?: Record | null +} + +// One matching chunk from a semantic search. +export interface ChunkResult { + chunk_id: string + source_id: string + source_path: string + source_type: string + content: string + relevance_score: number // 0..1 + metadata: Record | null +} + +// Response from POST /rag/retrieve. 
+export interface RetrieveResponse { + results: ChunkResult[] + query_embedding_time_ms: number + search_time_ms: number + total_chunks_searched: number +} + // === Agents WebSocket === export type AgentEventType = | 'text_delta' @@ -373,6 +453,11 @@ export interface AIModelConfig { agent_temperature: number agent_max_tokens: number agent_thinking_budget: number | null + agent_max_tool_calls: number + agent_timeout_seconds: number + agent_retry_attempts: number + agent_session_ttl_minutes: number + agent_require_approval: string[] rag_embedding_provider: string rag_embedding_model: string rag_embedding_dimension: number diff --git a/scripts/run_demo.py b/scripts/run_demo.py index 03d26913..8acfc255 100644 --- a/scripts/run_demo.py +++ b/scripts/run_demo.py @@ -43,7 +43,7 @@ import time from collections.abc import Awaitable, Callable from dataclasses import dataclass, field -from datetime import date, timedelta +from datetime import UTC, date, datetime, timedelta from pathlib import Path from typing import Any, Final @@ -75,8 +75,10 @@ DEMO_SCENARIO: Final[str] = "demo_minimal" DEMO_SEED_STORES: Final[int] = 3 DEMO_SEED_PRODUCTS: Final[int] = 10 -DEMO_SEED_START: Final[date] = date(2024, 10, 1) -DEMO_SEED_END: Final[date] = date(2024, 12, 31) +# Seed window is anchored to *today* so the demo always runs on +# current-looking data; it spans DEMO_SEED_SPAN_DAYS back from today (92 days +# inclusive). Must stay >= 72 for a non-NaN backtest WAPE. 
+DEMO_SEED_SPAN_DAYS: Final[int] = 91 DEMO_MODEL_TYPES: Final[tuple[str, ...]] = ("naive", "seasonal_naive", "moving_average") @@ -411,6 +413,8 @@ async def step_seed(ctx: DemoContext, client: HttpClient) -> StepOutcome: detail="--skip-seed set", duration_ms=(time.monotonic() - start) * 1000, ) + seed_end = datetime.now(UTC).date() + seed_start = seed_end - timedelta(days=DEMO_SEED_SPAN_DAYS) body = await client.request( "seed", "POST", @@ -420,8 +424,8 @@ async def step_seed(ctx: DemoContext, client: HttpClient) -> StepOutcome: "seed": ctx.seed, "stores": DEMO_SEED_STORES, "products": DEMO_SEED_PRODUCTS, - "start_date": DEMO_SEED_START.isoformat(), - "end_date": DEMO_SEED_END.isoformat(), + "start_date": seed_start.isoformat(), + "end_date": seed_end.isoformat(), "sparsity": 0.0, "dry_run": False, }, diff --git a/scripts/seed_random.py b/scripts/seed_random.py index 27263e96..c544ad43 100644 --- a/scripts/seed_random.py +++ b/scripts/seed_random.py @@ -40,6 +40,8 @@ RetailPatternConfig, SparsityConfig, TimeSeriesConfig, + default_seed_end_date, + default_seed_start_date, ) from app.shared.seeder.rag_scenario import run_rag_scenario @@ -112,8 +114,10 @@ def load_config_from_yaml(path: Path) -> SeederConfig: # Parse date range date_range = data.get("date_range", {}) - start_date = parse_date(date_range["start"]) if "start" in date_range else date(2024, 1, 1) - end_date = parse_date(date_range["end"]) if "end" in date_range else date(2024, 12, 31) + start_date = ( + parse_date(date_range["start"]) if "start" in date_range else default_seed_start_date() + ) + end_date = parse_date(date_range["end"]) if "end" in date_range else default_seed_end_date() # Parse time series config ts_data = data.get("time_series", {}) @@ -243,14 +247,14 @@ def create_parser() -> argparse.ArgumentParser: parser.add_argument( "--start-date", type=parse_date, - default=date(2024, 1, 1), - help="Start of date range (default: 2024-01-01)", + default=default_seed_start_date(), + help="Start of 
date range (default: one year before today)", ) parser.add_argument( "--end-date", type=parse_date, - default=date(2024, 12, 31), - help="End of date range (default: 2024-12-31)", + default=default_seed_end_date(), + help="End of date range (default: today)", ) parser.add_argument( "--sparsity", diff --git a/tests/test_demo_showcase_integration.py b/tests/test_demo_showcase_integration.py index a9538551..5c0c5b38 100644 --- a/tests/test_demo_showcase_integration.py +++ b/tests/test_demo_showcase_integration.py @@ -7,14 +7,21 @@ ``integration`` so it is excluded from the fast unit run. """ +from datetime import timedelta + import pytest +from app.shared.seeder.config import DEMO_MINIMAL_SPAN_DAYS, default_seed_end_date + pytestmark = pytest.mark.integration async def test_demo_run_pipeline_end_to_end(client): """Seed demo_minimal, run the demo pipeline, and verify the registered winner.""" # Precondition: seed the demo_minimal scenario so skip_seed=true has data. + # The window is anchored to today, mirroring the demo pipeline's own seed step. 
+ seed_end = default_seed_end_date() + seed_start = seed_end - timedelta(days=DEMO_MINIMAL_SPAN_DAYS) seed_resp = await client.post( "/seeder/generate", json={ @@ -22,8 +29,8 @@ async def test_demo_run_pipeline_end_to_end(client): "seed": 42, "stores": 3, "products": 10, - "start_date": "2024-10-01", - "end_date": "2024-12-31", + "start_date": seed_start.isoformat(), + "end_date": seed_end.isoformat(), "sparsity": 0.0, "dry_run": False, }, diff --git a/tests/test_run_demo_unit.py b/tests/test_run_demo_unit.py index 4140d95a..0630b1d1 100644 --- a/tests/test_run_demo_unit.py +++ b/tests/test_run_demo_unit.py @@ -8,16 +8,19 @@ from __future__ import annotations import math +from datetime import timedelta from typing import Any from unittest.mock import AsyncMock import pytest +from app.shared.seeder.config import default_seed_end_date from scripts import run_demo from scripts.run_demo import ( DEMO_ALIAS, DEMO_HORIZON, DEMO_MODEL_TYPES, + DEMO_SEED_SPAN_DAYS, GLYPHS, DemoArgs, DemoContext, @@ -341,7 +344,11 @@ class TestStepPayloads: async def test_step_seed_sends_demo_minimal( self, ) -> None: - """Seed step posts demo_minimal scenario with correct dims + dates.""" + """Seed step posts demo_minimal scenario with correct dims + dates. + + The seed window is anchored to *today* and runs DEMO_SEED_SPAN_DAYS + backwards, so the demo always seeds current-looking data. 
+ """ calls: list[dict[str, Any]] = [] class _RecordingClient: @@ -374,8 +381,9 @@ async def request( assert body["seed"] == 42 assert body["stores"] == 3 assert body["products"] == 10 - assert body["start_date"] == "2024-10-01" - assert body["end_date"] == "2024-12-31" + today = default_seed_end_date() + assert body["end_date"] == today.isoformat() + assert body["start_date"] == (today - timedelta(days=DEMO_SEED_SPAN_DAYS)).isoformat() @pytest.mark.asyncio async def test_step_seed_skipped(self) -> None: From f4737389f44f1f40c6fc9caf6f1d7fd02f8eeb18 Mon Sep 17 00:00:00 2001 From: Gabor Szabo <168316277+w7-mgfcode@users.noreply.github.com> Date: Mon, 18 May 2026 21:53:36 +0200 Subject: [PATCH 2/2] chore(main): release 0.2.13 (#193) --- .release-please-manifest.json | 2 +- CHANGELOG.md | 7 +++++++ pyproject.toml | 2 +- 3 files changed, 9 insertions(+), 2 deletions(-) diff --git a/.release-please-manifest.json b/.release-please-manifest.json index bf1a075f..45f2163e 100644 --- a/.release-please-manifest.json +++ b/.release-please-manifest.json @@ -1,3 +1,3 @@ { - ".": "0.2.12" + ".": "0.2.13" } diff --git a/CHANGELOG.md b/CHANGELOG.md index 101e404c..8d99b4ab 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,12 @@ # Changelog +## [0.2.13](https://github.com/w7-mgfcode/ForecastLabAI/compare/v0.2.12...v0.2.13) (2026-05-18) + + +### Features + +* cut v0.2.13 — explorer interactivity, knowledge & guide pages ([#191](https://github.com/w7-mgfcode/ForecastLabAI/issues/191)) ([#192](https://github.com/w7-mgfcode/ForecastLabAI/issues/192)) ([ae37ca5](https://github.com/w7-mgfcode/ForecastLabAI/commit/ae37ca521eb9510c135def4a1e3730e137fb014b)) + ## [0.2.12](https://github.com/w7-mgfcode/ForecastLabAI/compare/v0.2.11...v0.2.12) (2026-05-18) diff --git a/pyproject.toml b/pyproject.toml index 46dcc482..7a42992e 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -1,6 +1,6 @@ [project] name = "forecastlabai" -version = "0.2.12" +version = "0.2.13" description = 
"Portfolio-grade end-to-end retail demand forecasting system" readme = "README.md" requires-python = ">=3.12"