Retail LLM agent optimization and evaluation showcase built on top of tau2-bench, focused on execution-chain improvements, route comparison, and reproducible benchmark demos.
Updated Apr 15, 2026 · Python
Open harness for running, measuring, and visualizing agent benchmarks, with adapters for AutomationBench, τ-bench, LeRobot, and WorkArena.