fix(agents): constrain experiment read-only queries to read tools

## Summary

In a manual Chat UI test, the **Experiment Agent** derailed a read-only query into an unrelated what-if scenario proposal.

**User prompt:** "List the most recent model runs and tell me which has the lowest WAPE."

**Observed wrong answer:** "Proposed what-if for store 123 / product 456 toward the objective ''. Cut price 15% ..."

## Investigation evidence

- Correct agent (`experiment`), fresh session, no stale context.
- Models at the time: primary `ollama:llama3.1:8b`, fallback `ollama:qwen3:8b`.
- The model first called the **correct** read tool: `tool_list_runs({"status":"success"})`, which returned run data including WAPE.
- The model then produced **malformed structured output missing `ExperimentReport.summary`**.
- A PydanticAI output-validation retry occurred.
- On the retry, the local 8B model **derailed and called `tool_propose_scenario` with hallucinated `store_id=123`/`product_id=456`**.
- The final answer summarized the unrelated scenario proposal.

## Root cause

Two factors combined. First, the local 8B model is unreliable under a `PromptedOutput` validation retry. Second, the experiment agent exposes scenario/write/planning tools during a read-only task, and its prompt has no guard telling the model (a) to stick to read tools for read-only intents and (b) that an output-format retry is a *reformat*, not a reason to start a new action.
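One way to picture the missing guard is as a per-turn tool allowlist. The following is a minimal sketch, not the agent's actual code: `TurnState`, `allowed_tools`, and the grouping of tools into read and write sets are hypothetical names introduced here for illustration. Only `tool_list_runs` and `tool_propose_scenario` come from the transcript above.

```python
from dataclasses import dataclass

# Assumption: only these two tool names appear in the issue; the split into
# read and write sets is illustrative, not the agent's real registry.
READ_TOOLS = frozenset({"tool_list_runs"})
WRITE_TOOLS = frozenset({"tool_propose_scenario"})


@dataclass(frozen=True)
class TurnState:
    """Hypothetical per-turn state used to decide which tools are exposed."""

    read_only_intent: bool      # the user asked a read-only question
    is_validation_retry: bool   # the previous output failed schema validation


def allowed_tools(state: TurnState) -> frozenset:
    """Return the tool names the model may call on this turn."""
    if state.is_validation_retry:
        # A retry only reformats the previous result, so no tools are exposed.
        return frozenset()
    if state.read_only_intent:
        return READ_TOOLS
    return READ_TOOLS | WRITE_TOOLS
```

Under this sketch, the retry in the transcript would have had no tools available, so the hallucinated `tool_propose_scenario` call could not have been made regardless of what the prompt said.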

## Desired behavior

- Read-only queries answer using **read-only tools only**.
- Validation retries **only reformat** the previous result into `ExperimentReport` — they never call new scenario/write tools.
- Read-only intents never trigger scenario/write tools unless the user explicitly asks to create/save/promote/archive/run something. These intents include:
  - top products
  - sales/revenue/units summaries
  - forecast summaries
  - registry aliases and deployment status
  - model-run and metric comparisons (WAPE/MAE/RMSE)
  - backtest metrics
  - RAG/document questions
- Ambiguous rankings (e.g. "top products") → ask a clarifying question ("Top by revenue, units sold, forecasted demand, or model error?").
- If no read-only tool exists for the requested metric, state the limitation rather than invent data.
- Never invent `store_id`/`product_id`/`run_id` values.
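The rules above could be approximated by a small pre-check that runs before the model is called. This is a rough keyword heuristic, assumed for illustration only: `Intent`, `classify`, and the regular expressions are hypothetical, and a production guard would more likely live in the agent prompt as the Acceptance section describes.

```python
import re
from enum import Enum


class Intent(Enum):
    READ_ONLY = "read_only"
    WRITE = "write"
    CLARIFY = "clarify"


# Explicit write verbs from the list above. "run" counts only when followed by
# an article or determiner, so the noun in "model runs" does not match.
_WRITE = re.compile(
    r"\b(?:create|save|promote|archive)\b|\brun\s+(?:a|an|the|this|that|new)\b",
    re.IGNORECASE,
)
# Ranking words that are ambiguous unless a metric is also named.
_RANKING = re.compile(r"\b(?:top|best|worst)\b", re.IGNORECASE)
_METRIC = re.compile(
    r"\b(?:revenue|units?|demand|forecast(?:ed)?|error|wape|mae|rmse)\b",
    re.IGNORECASE,
)

CLARIFYING_QUESTION = "Top by revenue, units sold, forecasted demand, or model error?"


def classify(prompt: str) -> Intent:
    """Crude intent check: explicit write verb, ambiguous ranking, or read-only."""
    if _WRITE.search(prompt):
        return Intent.WRITE
    if _RANKING.search(prompt) and not _METRIC.search(prompt):
        return Intent.CLARIFY
    return Intent.READ_ONLY
```

With this sketch, the prompt from the Summary classifies as `READ_ONLY`, while a bare "What are the top products?" classifies as `CLARIFY` and would be answered with `CLARIFYING_QUESTION`. A keyword check like this will misclassify some phrasings, so it is best treated as a backstop rather than the primary guard.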

## Secondary validation gap

`tool_propose_scenario` accepted non-existent store/product IDs (123/456) and returned a normal proposal. It should reject non-existent store/product pairs with a clear, non-persistable validation error.
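A minimal sketch of the requested check follows. Everything in it is an assumption made for illustration: `propose_scenario_checked`, `UnknownEntityError`, `ScenarioProposal`, the `known_pairs` set (a stand-in for a database lookup), and the `persisted` list (a stand-in for whatever storage the real tool writes to, if any).

```python
from dataclasses import dataclass


class UnknownEntityError(ValueError):
    """Raised when a store/product pair does not exist."""


@dataclass(frozen=True)
class ScenarioProposal:
    store_id: int
    product_id: int
    price_change_pct: float


def propose_scenario_checked(store_id, product_id, price_change_pct, *, known_pairs, persisted):
    """Validate the entity pair first; build and persist only after it passes."""
    if (store_id, product_id) not in known_pairs:
        # Fail before any proposal object exists, so nothing can be saved.
        raise UnknownEntityError(
            f"store_id={store_id} / product_id={product_id} does not exist"
        )
    proposal = ScenarioProposal(store_id, product_id, price_change_pct)
    persisted.append(proposal)
    return proposal
```

Placing the existence check before the proposal is constructed means a failure leaves no partial state behind, which is what makes the error non-persistable.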

## Acceptance

- Generalized read-only intent guard added to the experiment-agent prompt.
- Regression tests covering the exact WAPE case and broader read-only questions (top products, highest forecasted demand, current deployment alias) — no live model calls.
- `propose_scenario` rejects non-existent entity pairs (123/456 covered explicitly); persists nothing on failure.
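The regression tests could replay the failing transcript as a scripted sequence of tool calls instead of calling a live model. The sketch below assumes a hypothetical `run_turn` helper that stands in for the agent loop; the real tests would exercise the actual agent with a stubbed model.

```python
import unittest

# Assumption: the read-tool allowlist contains only the tool seen in the transcript.
READ_TOOLS = {"tool_list_runs"}


def run_turn(scripted_calls, read_only):
    """Replay scripted tool calls (a stand-in for a model) through the guard.

    Returns (executed, blocked) lists of tool names.
    """
    executed, blocked = [], []
    for name in scripted_calls:
        if read_only and name not in READ_TOOLS:
            blocked.append(name)
        else:
            executed.append(name)
    return executed, blocked


class ReadOnlyGuardRegression(unittest.TestCase):
    def test_wape_query_never_reaches_scenario_tool(self):
        # Same order as the transcript: a correct read call, then the derailed call.
        executed, blocked = run_turn(
            ["tool_list_runs", "tool_propose_scenario"], read_only=True
        )
        self.assertEqual(executed, ["tool_list_runs"])
        self.assertEqual(blocked, ["tool_propose_scenario"])

    def test_explicit_scenario_request_is_allowed(self):
        executed, blocked = run_turn(["tool_propose_scenario"], read_only=False)
        self.assertEqual(executed, ["tool_propose_scenario"])
        self.assertEqual(blocked, [])
```

Because the model's behavior is scripted, the tests are deterministic and need no Ollama instance. The same pattern extends to the other read-only questions listed above by adding one scripted case per prompt.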


fix(agents): constrain experiment read-only queries to read tools #347
