Open harness for running, measuring, and visualizing agent benchmarks. Includes adapters for AutomationBench, τ-bench, LeRobot, and WorkArena.
benchmark typescript eval agents ai-agents opentelemetry llm lerobot swe-bench tau-bench automationbench scene-otel workarena
Updated May 3, 2026 · TypeScript