feat(eval): Locomo bot eval add check #1629

Merged
chenjw merged 2 commits into main from feature_eval_check on Apr 22, 2026


@yeshion23333 (Collaborator) commented Apr 22, 2026

Description

This PR focuses on preflight checks and account consistency for the Vikingbot LoCoMo evaluation, to avoid the case where an import succeeds but the evaluation cannot find the imported context.

  • Added preflight_eval_config.py to validate and fix the root key and account-related settings in ov.conf and ovcli.conf. When the root key is configured for the first time, the script exits and asks for a server restart.
  • Added preflight_eval_runtime.py to centralize ACCOUNT/OPENVIKING_URL resolution and, before import, to check that OpenViking is available and that the account exists (with optional interactive auto-creation of the account).
  • Updated both run_full_eval.sh and import_and_eval_one.sh to use the shared preflight logic, which removes the duplicated code, and to rely on OPENVIKING_CONFIG_FILE so that a single config path is used throughout the whole flow.
  • Added an --account option to import_to_ov.py and passed it through to the SDK client.
  • Updated benchmark/locomo/README.md with the relationship between the config files and troubleshooting guidance.
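To illustrate the idea of resolving ACCOUNT and OPENVIKING_URL in one place, here is a minimal sketch. It is not the code from this PR: the function names, the precedence order (CLI flag, then environment variable, then config value, then a fallback), the config keys `account` and `url`, and both placeholder defaults are assumptions made for illustration. Only the `--account` flag and the `ACCOUNT`/`OPENVIKING_URL` names come from the description above, and treating them as environment variables is itself an assumption.

```python
import argparse
from typing import Mapping, Optional, Sequence

# Placeholder fallbacks, assumed for this sketch only (not taken from the PR).
PLACEHOLDER_URL = "http://localhost:8000"
PLACEHOLDER_ACCOUNT = "default"


def first_non_empty(*candidates: Optional[str]) -> Optional[str]:
    """Return the first candidate that is neither None nor an empty string."""
    for candidate in candidates:
        if candidate:
            return candidate
    return None


def build_parser() -> argparse.ArgumentParser:
    """Parser with the --account option mentioned in the PR description."""
    parser = argparse.ArgumentParser(description="Hypothetical import script")
    parser.add_argument("--account", default=None, help="account to import into")
    return parser


def resolve_runtime(
    argv: Sequence[str],
    env: Mapping[str, str],
    config: Mapping[str, str],
) -> dict:
    """Resolve account and URL once so every later step uses the same values.

    Assumed precedence: CLI flag > environment variable > config value > fallback.
    """
    args = build_parser().parse_args(list(argv))
    account = first_non_empty(
        args.account, env.get("ACCOUNT"), config.get("account")
    ) or PLACEHOLDER_ACCOUNT
    url = first_non_empty(
        env.get("OPENVIKING_URL"), config.get("url")
    ) or PLACEHOLDER_URL
    return {"account": account, "url": url}
```

In a real script the call would look like `resolve_runtime(sys.argv[1:], os.environ, loaded_config)`, with the result handed to both the import step and the evaluation step. Resolving the values once is what keeps the two steps on the same account, which is the mismatch this PR is meant to prevent.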

Related Issue

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test update

Changes Made

Testing

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have tested this on the following platforms:
    • Linux
    • macOS
    • Windows

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Screenshots (if applicable)

Additional Notes

@github-actions

PR Code Suggestions ✨

No code suggestions found for the PR.

@chenjw merged commit ce42389 into main on Apr 22, 2026
5 of 6 checks passed
@chenjw deleted the feature_eval_check branch on April 22, 2026 03:14
@github-project-automation (bot) moved this from Backlog to Done in the OpenViking project on Apr 22, 2026