
Bound collection Agent runtime defaults #44

Draft

giaphutran12 wants to merge 2 commits into codex/collection-runner-port from codex/collection-runtime-agent-defaults

giaphutran12 (Collaborator) commented May 22, 2026

Summary

  • make the BigSet collection runner keep TinyFish Agent/browser calls off unless COLLECTION_AGENT_ENABLE_AGENT=true is set
  • add a bounded per-run Agent poll timeout for opt-in Agent runs, read from COLLECTION_AGENT_POLL_TIMEOUT_MS when AGENT_POLL_TIMEOUT_MS is unset
  • thread the timeout through the vendored collection pipeline (acquisition, repair, page processing, and the TinyFish polling path) instead of relying on import-time env mutation
  • extend the collection runner tests to prove the default no-Agent behavior and the timeout behavior on explicit Agent opt-in
  • document the runtime default so cron and benchmark runs stay cheap and repeatable by default
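The default-off gate and the timeout precedence listed above can be sketched as a pure function over an env snapshot. This is a minimal illustration under stated assumptions, not the runner's actual code: `resolveCollectionAgentOptions`, `CollectionAgentRunOptions`, the strict `=== "true"` check, and the `fallbackTimeoutMs` parameter are all hypothetical, since the PR does not show the implementation or state a default bound.

```typescript
// Hypothetical shape of the per-run options; field names are illustrative.
interface CollectionAgentRunOptions {
  agentEnabled: boolean;
  // Left undefined when Agent runs are disabled: no poll timeout applies.
  agentPollTimeoutMs?: number;
}

type Env = Record<string, string | undefined>;

// Parses a positive integer millisecond value; unset or invalid input yields undefined.
function parseTimeoutMs(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined;
  const n = Number(raw);
  return Number.isInteger(n) && n > 0 ? n : undefined;
}

// Resolves options from an explicit env snapshot at run time (not at import time).
// fallbackTimeoutMs is a hypothetical bound; the PR does not state the real default.
function resolveCollectionAgentOptions(
  env: Env,
  fallbackTimeoutMs: number,
): CollectionAgentRunOptions {
  // Assumption: only the exact string "true" opts in to Agent/browser calls.
  const agentEnabled = env.COLLECTION_AGENT_ENABLE_AGENT === "true";
  if (!agentEnabled) return { agentEnabled: false };
  // An explicit AGENT_POLL_TIMEOUT_MS wins; otherwise use the collection-specific value.
  const agentPollTimeoutMs =
    parseTimeoutMs(env.AGENT_POLL_TIMEOUT_MS) ??
    parseTimeoutMs(env.COLLECTION_AGENT_POLL_TIMEOUT_MS) ??
    fallbackTimeoutMs;
  return { agentEnabled, agentPollTimeoutMs };
}

// Usage: a default run leaves the Agent off and carries no timeout.
console.log(resolveCollectionAgentOptions({}, 120_000));
// Opt-in with only the collection-specific timeout set resolves to 90000 ms.
console.log(
  resolveCollectionAgentOptions(
    { COLLECTION_AGENT_ENABLE_AGENT: "true", COLLECTION_AGENT_POLL_TIMEOUT_MS: "90000" },
    120_000,
  ),
);
```

Resolving from a snapshot passed in per run, instead of mutating the environment when a module is imported, keeps the logic testable and lets each run see its own values.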

Verification

  • npm --prefix backend test -- test/collection-agent-runner.test.ts (the repo script ran the full backend suite: 58/58 passed)
  • npm --prefix backend run build
  • make verify-self-healing
  • critic subagent review: aligned as infra hardening; do not frame as quality or default-ready work
  • code-reviewer requested changes on warm-process timeout caching; fixed by passing the timeout as an explicit per-run option and adding a warm-module test
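The warm-process fix above (passing the timeout as an explicit per-run option) can be illustrated with a bounded poll loop whose deadline comes from its arguments rather than from a module-level constant read once at import. This is a sketch only: `pollUntilDone`, `PollStatus`, and the interval handling are hypothetical stand-ins for the TinyFish polling path, which this PR does not show.

```typescript
// Hypothetical status shape for one poll of an Agent run.
type PollStatus = { done: true; result: string } | { done: false };

// Illustrative anti-pattern this replaces: a timeout read once at import time,
// e.g. `const TIMEOUT_MS = Number(process.env.AGENT_POLL_TIMEOUT_MS)`, stays
// frozen for every later run in the same warm process.

// Bounded poll loop: the deadline is computed per call from opts.timeoutMs.
async function pollUntilDone(
  poll: () => Promise<PollStatus>,
  opts: { timeoutMs: number; intervalMs: number },
): Promise<string> {
  const deadline = Date.now() + opts.timeoutMs;
  for (;;) {
    const status = await poll();
    if (status.done) return status.result;
    // Give up if waiting another interval would overrun this run's deadline.
    if (Date.now() + opts.intervalMs > deadline) {
      throw new Error(`Agent poll timed out after ${opts.timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, opts.intervalMs));
  }
}
```

Because the timeout is an argument, two runs in the same warm process can use different bounds; a module-level constant would keep whichever value was present when the module was first imported.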

Real Benchmark Evidence

With keys loaded execution-only and COLLECTION_AGENT_ENABLE_AGENT unset:

  • COLLECTION_AGENT_PIPELINE_MODULE=./backend/BigSet_Data_Collection_Agent/src/orchestrator/pipeline.ts
  • BIGSET_COLLECTION_BENCHMARK_RUNNER_MODULE=./backend/src/pipeline/collection-agent-runner.ts
  • 2-prompt default run: 0/2 passed, 2 failed, 0 blocked, 7 rows, 12 evidence quotes, 0 browser/Agent runs, cost $0.009562

Conclusion: this PR fixes budget/runtime behavior, not source quality. By default, the collection lane no longer accidentally takes the long TinyFish Agent path, but source, domain, and evidence quality remain the next problem.

Notes

Do not merge. This stacks on top of PR #43.

coderabbitai Bot commented May 22, 2026

Important

Review skipped

Draft detected.

