[Leaderboard] Pi Coding Agent — Claude Opus 4.6 — 62.6% Pass@1 by satish860 · Pull Request #31 · ucbepic/DataAgentBench

satish860 · 2026-04-05T13:54:07Z

Pi Coding Agent — Leaderboard Submission

Agent name: Pi Coding Agent
Backbone LLM: Claude Opus 4.6 (via OpenRouter)
Hints: Yes (db_description_withhint.txt)
Trials: 5 per query
Pass@1: 62.6%

Architecture

A TypeScript coding agent built on the Pi SDK that:

Pre-indexes each dataset via automated SQL/MongoDB introspection — schemas, sample rows, value ranges, and join key analysis. No manual input, no data leakage.
For each query, reads the index, then writes and executes Node.js scripts that connect directly to PostgreSQL, SQLite, DuckDB, and MongoDB.
Iterates on errors — if the script crashes, the agent patches and reruns until it produces a final answer.
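
The patch-and-rerun loop in the last step can be sketched roughly as follows. This is a minimal illustration, not the submission's actual code: `runWithRepair`, `Execute`, `Patch`, and the `maxAttempts` cap are hypothetical names and assumptions (the PR does not state a retry limit), and a real agent would run the script in a subprocess and call the model asynchronously rather than use the synchronous stubs shown here.

```typescript
// Outcome of executing one generated script.
interface RunResult {
  ok: boolean;    // true if the script exited cleanly
  output: string; // the answer on success, the error text on failure
}

// Hypothetical hooks: `execute` runs a script, `patch` asks the model for a fix.
type Execute = (script: string) => RunResult;
type Patch = (script: string, error: string) => string;

function runWithRepair(
  script: string,
  execute: Execute,
  patch: Patch,
  maxAttempts: number = 5, // assumed cap, not taken from the PR
): { answer: string | null; attempts: number } {
  let current = script;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = execute(current);
    if (result.ok) {
      return { answer: result.output, attempts: attempt };
    }
    // The script crashed: hand the error back and try the patched version.
    current = patch(current, result.output);
  }
  return { answer: null, attempts: maxAttempts };
}

// Stubs standing in for a real script runner and a real model call.
const fakeExecute: Execute = (s) =>
  s.indexOf("fixed") !== -1
    ? { ok: true, output: "42" }
    : { ok: false, output: "ReferenceError: x is not defined" };
const fakePatch: Patch = (s, _error) => s + " // fixed";

const outcome = runWithRepair("console.log(x)", fakeExecute, fakePatch);
console.log(outcome); // { answer: '42', attempts: 2 }
```

Capping the number of attempts is a design choice made for the sketch: it keeps a script that can never be repaired from looping indefinitely, at the cost of returning no answer for that trial.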

Key differences from the built-in agent

	Built-in Agent	Pi Coding Agent
Language	Python	TypeScript / Node.js
Tool interface	4 constrained tools	Full read/write/bash
DB context	Runtime exploration	Pre-built index (automated)
Script style	Iterative queries	One complete script per query

Results summary

Dataset	Pass@1
agnews	0.60
bookreview	0.87
crmarenapro	0.83
googlelocal	0.70
music_brainz_20k	0.67
stockindex	0.67
stockmarket	0.60
yelp	0.74
GITHUB_REPOS	0.40
PANCANCER_ATLAS	0.67
DEPS_DEV_V1	0.00
PATENTS	0.00
Overall	62.6%


satish860 · 2026-04-06T11:51:01Z

@shreyashankar - Can you help with the PR?

shreyashankar · 2026-04-22T00:19:52Z

Hello! Sorry, I just got back to the US from traveling and am now looking at the PRs. We will add your name to the leaderboard shortly!

Add PR #31, #32, #38 to leaderboard

shreyashankar · 2026-04-22T01:27:52Z

Hi @satish860 — verified with common_scaffold/validate/validate.py:

Pass@1 = 0.5603 (stratified) — average across the 12 datasets of the per-dataset average across queries of c/n. This is the leaderboard metric.
Pass@1 = 0.626 (micro) — total passes / total runs across all 270 trials. Equal weight per (query, run); matches the 62.6% in your PR title.
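
To make the difference between the two averages concrete, here is a small sketch of both definitions as described above. It is not the code in common_scaffold/validate/validate.py; the `QueryResult` shape, the function names, and the toy data are assumptions made for illustration only.

```typescript
// One record per query: c passing runs out of n trials.
interface QueryResult {
  dataset: string;
  passes: number; // c
  trials: number; // n
}

const mean = (xs: number[]): number =>
  xs.reduce((sum, x) => sum + x, 0) / xs.length;

// Micro: total passes / total runs, equal weight per (query, run).
function microPassAt1(results: QueryResult[]): number {
  const totalPasses = results.reduce((sum, r) => sum + r.passes, 0);
  const totalTrials = results.reduce((sum, r) => sum + r.trials, 0);
  return totalPasses / totalTrials;
}

// Stratified: average c/n over queries within each dataset,
// then average those per-dataset means, equal weight per dataset.
function stratifiedPassAt1(results: QueryResult[]): number {
  const ratesByDataset: Record<string, number[]> = {};
  for (const r of results) {
    if (!ratesByDataset[r.dataset]) ratesByDataset[r.dataset] = [];
    ratesByDataset[r.dataset].push(r.passes / r.trials);
  }
  const perDataset = Object.keys(ratesByDataset).map((d) => mean(ratesByDataset[d]));
  return mean(perDataset);
}

// Toy data (invented): an easy dataset with three queries, a hard one with one.
const toyResults: QueryResult[] = [
  { dataset: "easy", passes: 5, trials: 5 },
  { dataset: "easy", passes: 5, trials: 5 },
  { dataset: "easy", passes: 5, trials: 5 },
  { dataset: "hard", passes: 0, trials: 5 },
];

const toyMicro = microPassAt1(toyResults);
const toyStratified = stratifiedPassAt1(toyResults);
console.log(toyMicro);      // 0.75 (15 passes out of 20 runs)
console.log(toyStratified); // 0.5  (mean of 1.0 and 0.0)
```

In the toy data the dataset with more queries pulls the micro average up, while the stratified average weights both datasets equally. The same effect shows up in this submission: averaging the twelve rounded per-dataset values from the results table gives 6.75 / 12 ≈ 0.5625, close to the verified 0.5603 (the small gap is presumably rounding in the table), and the two datasets at 0.00 count for a full sixth of that figure.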

0.5603 puts you at #1 on the leaderboard, which now links back to this PR. Thanks for the submission! Closing.

…, Cohere (ucbepic#38) to leaderboard

Verified Pass@1 numbers were re-computed from the raw submission JSONs using common_scaffold/validate/validate.py:

PR ucbepic#31 Pi Coding Agent + Claude Opus 4.6 → 0.5603 (#1)
PR ucbepic#32 Oracle Forge (Tenacious) + Sonnet 4.6 → 0.4554 (#4)
PR ucbepic#38 Oracle Forge (Cohere) + Gemini 2.0 F. → 0.128 (#10)

Adds a Submission column on both the README table and the website leaderboard linking each submission to its PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…mission PRs

Pi was linking to mariozechner/pi-coding-agent (the SDK author), not the team that made the submission. Cohere was linking to their source repo. Both now link to the PR they opened on this repo, matching the pattern already used for Tenacious (ucbepic#32).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Add Pi Coding Agent (Claude Opus 4.6) leaderboard submission - 62.6% …

023ab66

…Pass@1

shreyashankar mentioned this pull request Apr 22, 2026

Add PR #31, #32, #38 to leaderboard #39

Merged

2 tasks

shreyashankar added a commit that referenced this pull request Apr 22, 2026

Merge pull request #39 from ucbepic/leaderboard/add-pr31-32-38

4ceeb67

Add PR #31, #32, #38 to leaderboard

shreyashankar closed this Apr 22, 2026

shreyashankar mentioned this pull request Apr 22, 2026

Fix team links on leaderboard #40

Merged

1 task

satish860 wants to merge 1 commit into ucbepic:main from satish860:add-pi-coding-agent-submission
