fix(agents): stop experiment read-only tool-call loop by w7-mgfcode · Pull Request #350 · w7-mgfcode/ForecastLabAI

w7-mgfcode · 2026-06-01T02:58:25Z

Root cause

Follow-up to #347 / PR #348. The read-only intent guard correctly stops the Experiment Agent from derailing into scenario/write tools. On a weak local model (ollama:llama3.1:8b), however, read-only queries now call the read tool repeatedly and never finish, which surfaces as "invalid tool call".

Live evidence (session 3b5f965b…, three queries, identical pattern):

Model calls tool_list_runs → it returns the data (10 successful runs incl. WAPE) → model re-calls tool_list_runs 3 more times (tool_call_count 1→2→3→4) → "Exceeded maximum output retries (3)" → UnexpectedModelBehavior → error event.
Zero scenario/write tool calls: the #348 guard (fix(agents): constrain experiment read-only queries) works; the derail is gone.
The answer was available throughout: lowest WAPE = naive run 2fad611b… (18.93).

The #348 guard correctly closed the propose_scenario "escape hatch" that the model previously used to emit something (a wrong but complete answer), so the weak model now loops until its retries are exhausted instead. This is purely a llama3.1:8b structured-output weakness: not a regression, not a data issue.
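To make the re-call pattern in the evidence concrete, here is a minimal sketch of how such a loop could be flagged from a session trace. The ToolCall shape and the find_recall_loops helper are hypothetical illustrations; they are not part of the repository or of this PR.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ToolCall:
    """One tool invocation from a session trace (hypothetical shape)."""

    name: str
    returned_data: bool  # True if the tool produced a usable result


def find_recall_loops(calls: Iterable[ToolCall]) -> dict[str, int]:
    """Count calls made to a tool after that tool had already returned data."""
    already_returned: set[str] = set()
    redundant: dict[str, int] = {}
    for call in calls:
        if call.name in already_returned:
            # The data was already available, so this call is a redundant re-call.
            redundant[call.name] = redundant.get(call.name, 0) + 1
        elif call.returned_data:
            already_returned.add(call.name)
    return redundant


# The observed pattern: one successful call followed by three re-calls.
trace = [ToolCall("tool_list_runs", True)] * 4
print(find_recall_loops(trace))  # {'tool_list_runs': 3}
```

A check like this only diagnoses the loop after the fact; the fix in this PR works at the prompt level instead.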

Fix summary

Prompt-only hardening of READ_ONLY_INTENT_GUARD (agents/base.py) — a new "FINISH IN ONE PASS — do not loop" section:

Call each read-only tool at most once per question.
The moment a read tool returns, STOP calling tools and write the ExperimentReport.summary from what it returned.
NEVER re-call a tool that already returned (the exact 4× tool_list_runs loop).
If a read tool returns an empty result, say so in the summary ("No model runs found.") instead of retrying.
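The four rules above could be expressed as a prompt section along the following lines. This is a sketch only: the wording approximates the summary in this description, and FINISH_IN_ONE_PASS_SECTION and with_guard are hypothetical names; the actual READ_ONLY_INTENT_GUARD text lives in agents/base.py and may differ.

```python
# Hypothetical approximation of the new guard section; not the repository's exact text.
FINISH_IN_ONE_PASS_SECTION = "\n".join(
    [
        "FINISH IN ONE PASS (do not loop)",
        "- Call each read-only tool AT MOST ONCE per question.",
        "- The moment a read tool returns, STOP calling tools and write "
        "ExperimentReport.summary from what it returned.",
        "- NEVER call a tool again that has already returned.",
        "- If a read tool returns an empty result, say so in the summary "
        '("No model runs found.") instead of retrying.',
    ]
)


def with_guard(base_prompt: str, guard_section: str = FINISH_IN_ONE_PASS_SECTION) -> str:
    """Append the guard section to a system prompt, separated by a blank line."""
    return f"{base_prompt.rstrip()}\n\n{guard_section}"


prompt = with_guard("You are the Experiment Agent.")
print(prompt)
```

Keeping the rules in one named section makes them easy to assert on in deterministic tests without calling a model.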

No tool surfaces added, no mutation surfaces widened, HITL gates untouched, no API contract change.

Tests added

app/features/agents/tests/test_read_only_guard.py:

test_guard_forbids_tool_call_loops — asserts "FINISH IN ONE PASS", "AT MOST ONCE", "NEVER call a tool again that has already returned", "STOP calling tools".
test_guard_handles_empty_tool_result — asserts the empty-result → summarize (don't retry) rule.

Deterministic, no live model calls. The existing test_prompts_only_reference_registered_tool_names invariant and test_guard_is_delivered_in_system_prompt_to_model continue to pass (guard reaches the model).
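As an illustration of what such deterministic checks can look like, here is a minimal sketch. GUARD_STAND_IN and missing_phrases are hypothetical; the real tests would presumably import READ_ONLY_INTENT_GUARD from the agents package instead of using a stand-in string, and their exact assertions may differ.

```python
# Phrases the loop test is described as asserting on.
REQUIRED_LOOP_PHRASES = (
    "FINISH IN ONE PASS",
    "AT MOST ONCE",
    "NEVER call a tool again that has already returned",
    "STOP calling tools",
)

# Stand-in for the real guard text (assumption: the real constant is imported instead).
GUARD_STAND_IN = (
    "FINISH IN ONE PASS (do not loop)\n"
    "Call each read-only tool AT MOST ONCE per question.\n"
    "When a read tool returns, STOP calling tools and write the summary.\n"
    "NEVER call a tool again that has already returned.\n"
    'If a read tool returns an empty result, say "No model runs found." '
    "instead of retrying.\n"
)


def missing_phrases(guard_text: str, required: tuple[str, ...]) -> list[str]:
    """Return the required phrases that do not appear in the guard text."""
    return [phrase for phrase in required if phrase not in guard_text]


def test_guard_forbids_tool_call_loops() -> None:
    assert missing_phrases(GUARD_STAND_IN, REQUIRED_LOOP_PHRASES) == []


def test_guard_handles_empty_tool_result() -> None:
    assert "No model runs found." in GUARD_STAND_IN
    assert "instead of retrying" in GUARD_STAND_IN
```

Because these are plain string checks, they run without a model or network access, which matches the "no live model calls" constraint above.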

Validation results

ruff check . → All checks passed · ruff format --check . → 334 files formatted
mypy app/ → only the pre-existing xgboost/lightgbm optional-extra import errors in untouched files; none in changed files
pyright (changed files) → 0 errors
pytest -m "not integration" → 1692 passed, 12 skipped

Notes / limitations

Prompt-level mitigation. It directly targets the observed re-call loop and should let the 8B model terminate with a summary in the common case, but cannot fully guarantee structured-output reliability on a weak local model. For consistently robust structured analytical queries, a cloud model remains the stronger option (not changed here).
No live model/agent calls were made; verification is via the persisted logs/DB and deterministic tests.

Closes #349

fix(agents): stop read-only tool-call loop in experiment guard (#349)

bcb80de

w7-mgfcode merged commit 082391d into dev Jun 1, 2026
8 checks passed

This was referenced Jun 1, 2026

fix(agents): salvage experiment answer when weak model fails structured output #351 (Closed)
fix(agents): salvage experiment answer when weak model fails structured output #352 (Merged)
fix(agents): stop experiment read-only tool-call loop on weak models #349 (Closed)

w7-mgfcode merged 1 commit into dev from fix/agents-read-tool-loop-guard
