v0.0.6 — 60/70 triggered, leaderboard reframe, manual workflow

@wuyoscar wuyoscar released this 29 May 08:53
· 41 commits to main since this release
7c5a73a

ISC Arena

  • No longer a "Top 100" ranking — now a tracked-model list: any triggered model stays in, nothing is trimmed.
  • Rank / Arena-Score columns dropped; groupings relabelled Split 1 / 2 / 3.
  • Model-name normalization — variants (Thinking / High / Chat / Reasoning / Instruct / Exp / dated / Preview) merged into one clean base name; a model is 🔴 if any variant triggered, with demo links merged.
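
The normalization rule above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names, the row layout (`model`, `triggered`, `demos`), the token-based matching, and the date patterns are all assumptions made for the example.

```python
import re

# Variant markers listed in the release notes, lower-cased for matching.
VARIANT_WORDS = {"thinking", "high", "chat", "reasoning", "instruct", "exp", "preview"}

# Assumption: "dated" variants carry a YYYY-MM-DD, YYYYMMDD, or 4-digit token.
DATE_TOKEN = re.compile(r"^(\d{4}-\d{2}-\d{2}|\d{8}|\d{4})$")


def base_name(name):
    """Drop variant and date tokens, keeping the remaining words in order."""
    kept = [
        token
        for token in name.split()
        if token.lower() not in VARIANT_WORDS and not DATE_TOKEN.match(token)
    ]
    return " ".join(kept)


def merge_variants(rows):
    """Collapse rows that share a base name.

    A merged model counts as triggered if any variant triggered,
    and demo links are unioned in first-seen order.
    """
    merged = {}
    for row in rows:
        entry = merged.setdefault(
            base_name(row["model"]), {"triggered": False, "demos": []}
        )
        entry["triggered"] = entry["triggered"] or row["triggered"]
        for link in row.get("demos", []):
            if link not in entry["demos"]:
                entry["demos"].append(link)
    return merged


# Hypothetical rows, not taken from the real leaderboard data.
rows = [
    {"model": "Example 1 Thinking", "triggered": False, "demos": []},
    {"model": "Example 1 Preview 20250101", "triggered": True, "demos": ["demos/a.md"]},
]
board = merge_variants(rows)
```

Here both hypothetical rows collapse into a single `Example 1` entry that is marked triggered and carries the one demo link. A plain 4-digit token is treated as a date in this sketch, which would misfire on any model whose real name contains a 4-digit number.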

Coverage — 60 / 70 triggered

  • New: Claude Opus 4.8, Claude Haiku 4.5, Kimi K2.6, plus single-turn template batches (Kimi K2, DeepSeek V3, Mimo V2 Flash, GPT-5, o1, o4-mini, GPT-5 Mini, and more Qwen/DeepSeek/MiniMax variants).
  • Claude Sonnet 4: refused the single-turn template but triggered under the agent loop; the workflow made the difference, not the prompt.

Chart & workflow

  • Progress figure is now a static PNG badge (opens cleanly on GitHub; no SVG sandbox error), left-aligned, not a link.
  • Removed the auto-update GitHub Action and leaderboard_history.json — the board is now manually maintained: edit data → gen_leaderboard.py + gen_leaderboard_chart.py → commit.
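
The manual regeneration step could be wrapped in a small helper like the one below. It is only a sketch: the two script names come from the notes above, but running them with no arguments from the repository root, and the helper names themselves, are assumptions. Editing the data and committing stay manual.

```python
import subprocess
import sys

# Script names are taken from the release notes; invoking them with
# no arguments is an assumption.
SCRIPTS = ["gen_leaderboard.py", "gen_leaderboard_chart.py"]


def regeneration_commands(python=sys.executable):
    """Return the commands to run, in order, after the data has been edited."""
    return [[python, script] for script in SCRIPTS]


def regenerate(dry_run=True):
    """Print the commands, or run them and stop at the first failure."""
    for cmd in regeneration_commands():
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling `regenerate()` only prints the two commands; `regenerate(dry_run=False)` runs them, after which the regenerated files can be reviewed and committed by hand.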

Docs

  • All 7 language READMEs kept consistent (generated leaderboard, synced Updates) and rewritten in a colder, terser tone.
  • Closed 3 inactive issues.