
Port collection runner into self-healing path #43

Draft
giaphutran12 wants to merge 2 commits into codex/migration-plan-status-refresh from codex/collection-runner-port



giaphutran12 commented May 22, 2026

Summary

  • vendor the collection pipeline runtime source needed by the BigSet populate stack
  • add backend/src/pipeline/collection-agent-runner.ts, exporting runCollectionPopulatePipeline(input) for POPULATE_COLLECTION_RUNNER_MODULE
  • map collection pipeline output into PopulateRuntimeResult rows, evidence, usage, metrics, and validation issues
  • require COLLECTION_AGENT_PIPELINE_MODULE explicitly so built backend does not silently import vendored TypeScript source
  • fix collection metrics so initial and repair TinyFish Agent dispatches are counted together without fake agentRuns=1
  • add a runner unit test proving recipe instructions, benchmark metadata, required columns, output mapping, and repair metrics flow through the runner contract
  • fix benchmark failure text so claim-support failures name missing claim-support entities, not missing entity-coverage entities
  • add @tiny-fish/sdk to backend dependencies for the vendored TinyFish integration
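
Two of the fixes above (the explicit `COLLECTION_AGENT_PIPELINE_MODULE` requirement and the combined dispatch count) can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `DispatchMetrics`, `requirePipelineModulePath`, and `countAgentRuns` are hypothetical names, and the real metric fields may differ.

```typescript
// Hypothetical metric shape for illustration; the PR's actual fields may differ.
interface DispatchMetrics {
  initialAgentDispatches: number;
  repairAgentDispatches: number;
}

// Resolve the pipeline module path from the environment and fail loudly when
// it is missing, rather than silently falling back to vendored TypeScript source.
function requirePipelineModulePath(
  env: Record<string, string | undefined>,
): string {
  const modulePath = env.COLLECTION_AGENT_PIPELINE_MODULE?.trim();
  if (!modulePath) {
    throw new Error(
      "COLLECTION_AGENT_PIPELINE_MODULE must be set explicitly; no default is applied",
    );
  }
  return modulePath;
}

// Count initial and repair dispatches together. A run with no Agent
// dispatches reports 0 instead of a placeholder value of 1.
function countAgentRuns(metrics: DispatchMetrics): number {
  return metrics.initialAgentDispatches + metrics.repairAgentDispatches;
}
```

With this shape, a no-Agent run reports `countAgentRuns(...) === 0`, and a build that lacks the environment variable fails at startup instead of importing an unexpected module.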

Verification

  • node --check benchmarks/dataset-agent/run-benchmark.mjs
  • npm --prefix backend test -- test/collection-agent-runner.test.ts
  • npm --prefix backend run build
  • make verify-self-healing
  • code-reviewer subagent re-review: no findings; prior P1/P2 blockers resolved

Real Benchmark Evidence

With API keys loaded for execution only:

  • explicit pipeline env, no-Agent collection run, 2 prompts:
    • COLLECTION_AGENT_ENABLE_AGENT=false
    • COLLECTION_AGENT_PIPELINE_MODULE=./backend/BigSet_Data_Collection_Agent/src/orchestrator/pipeline.ts
    • BIGSET_COLLECTION_BENCHMARK_RUNNER_MODULE=./backend/src/pipeline/collection-agent-runner.ts
    • result: 2/2 passed, 7 rows, 13 evidence quotes, 13 source URLs, cost $0.010813
  • earlier default collection run, 2 prompts: 0/2 passed, 1 failed, 1 blocked by 10-minute timeout; saas-pricing-pages produced 3 rows, score 0.967, cost $0.006087
  • explicit pipeline env, no-Agent collection run, full 16-prompt benchmark:
    • result: 4/16 passed, 12 failed, 0 blocked
    • cost $0.100698; wall time 16m 13s
    • output volume: 131 rows, 195 evidence quotes, 94 source URLs
    • calls/tokens: 93 search calls, 206 fetch calls, 1,020,923 total tokens, 0 browser/Agent runs
    • passed: hcmc-bakery-products, california-insurance-prices, la-coke-menu-lol, pastry-things-menlo
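
For comparing runs, the full-benchmark totals above reduce to a few per-prompt ratios. The sketch below is only illustrative arithmetic over the reported numbers; `BenchmarkTotals` and `summarize` are hypothetical names, not part of the benchmark runner.

```typescript
// Hypothetical totals shape; the values below are copied from the 16-prompt run.
interface BenchmarkTotals {
  prompts: number;
  passed: number;
  failed: number;
  blocked: number;
  costUsd: number;
  totalTokens: number;
}

interface BenchmarkSummary {
  passRate: number;
  costPerPromptUsd: number;
  tokensPerPrompt: number;
}

function summarize(t: BenchmarkTotals): BenchmarkSummary {
  // Sanity check: every prompt must land in exactly one outcome bucket.
  if (t.passed + t.failed + t.blocked !== t.prompts) {
    throw new Error("outcome counts do not add up to the prompt count");
  }
  return {
    passRate: t.passed / t.prompts,
    costPerPromptUsd: t.costUsd / t.prompts,
    tokensPerPrompt: t.totalTokens / t.prompts,
  };
}

const fullRun: BenchmarkTotals = {
  prompts: 16,
  passed: 4,
  failed: 12,
  blocked: 0,
  costUsd: 0.100698,
  totalTokens: 1_020_923,
};

const summary = summarize(fullRun);
console.log(summary.passRate); // 0.25
console.log(summary.costPerPromptUsd.toFixed(6)); // "0.006294"
console.log(Math.round(summary.tokensPerPrompt)); // 63808
```

So the no-Agent lane passes a quarter of the prompts at roughly $0.0063 and about 64k tokens per prompt.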

Conclusion: the real collection runner now executes through the self-healing benchmark path. The cheap Search/Fetch + LLM lane produces rows and passes some prompts, but at 4/16 on the full benchmark its quality is not yet good enough to be the default. The TinyFish Agent path still needs timeout/polling work (one of the two prompts in the earlier default run was blocked by the 10-minute timeout) before it should become the default.

Notes

Do not merge yet. This PR stays stacked on PR #42.


coderabbitai Bot commented May 22, 2026

Important

Review skipped

Draft detected.



giaphutran12 self-assigned this May 22, 2026