feat: add Bailian memory library LoCoMo benchmark evaluation scripts by yangxinxin-7 · Pull Request #1664 · volcengine/OpenViking

yangxinxin-7 · 2026-04-23T07:37:07Z

Summary

Add the benchmark/locomo/bailian_memory/ directory for evaluating the LoCoMo long-term conversation memory dataset with Alibaba Cloud Bailian (ModelStudio) Memory:
ingest.py: ingests LoCoMo conversations into the Bailian memory library, with resume support and per-sample/per-session filtering
eval.py: runs QA evaluation via SearchMemory retrieval plus a Qwen LLM, with concurrent threads and resume support
delete_user.py: cleans up memory nodes for the specified users
README.md: full setup guide, usage instructions, and notes on why user profile extraction is not recommended for the LoCoMo scenario
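The summary does not show how resume support is implemented. One common approach, sketched here under that caveat, is to read the IDs already present in a previous run's output CSV and skip them. The helper names `load_done_ids` and `pending_items` are hypothetical, and the `question_id` column is assumed from the code quoted in the review below.

```python
import csv
import os


def load_done_ids(path, id_field="question_id"):
    """Return the set of IDs already present in a previous run's CSV output."""
    if not os.path.exists(path):
        return set()  # first run: nothing to skip
    with open(path, encoding="utf-8", newline="") as f:
        return {row[id_field] for row in csv.DictReader(f) if row.get(id_field)}


def pending_items(items, done_ids, id_field="question_id"):
    """Keep only the items whose ID has not been processed yet, in order."""
    return [item for item in items if item[id_field] not in done_ids]
```

A rerun would then call `pending_items(all_questions, load_done_ids(output_path))` and process only what is left.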

github-actions · 2026-04-23T07:38:30Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🏅 Score: 80
🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Performance: Quadratic time in judge-only mode

In `run_judge_only`, each `grade_one` thread rewrites the entire CSV file every time it grades a single row. This leads to O(km) time complexity, where k is the number of ungraded rows and m is the total number of rows, which can be slow for large datasets. Consider batching writes or updating only the necessary rows.

    def grade_one(idx: int) -> None:
        row = rows[idx]
        label, reasoning = judge_answer(
            row.get("question", ""),
            row.get("answer", ""),
            row.get("response", ""),
            args.judge_base_url,
            judge_token,
            args.judge_model,
        )
        row["result"] = label
        row["reasoning"] = reasoning
        with file_lock:
            tmp = args.output + ".tmp"
            with open(tmp, "w", encoding="utf-8", newline="") as f:
                writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
                writer.writeheader()
                writer.writerows(rows)
            os.replace(tmp, args.output)
        print(f" Graded {row.get('question_id', '?')}: {label}", file=sys.stderr)

Documentation: Missing parameters

The README documents `--model` and `--top-k` parameters for `eval.py` that are not implemented in the current code, which will cause user confusion.

    # Specify the model and retrieval count
    python eval.py --model qwen-max --top-k 15 --threads 20
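For the quadratic-write concern in judge-only mode, one way to apply the "batch the writes" suggestion is to flush the CSV only every N graded rows, plus once at the end. The sketch below is not the PR's code: `grade_all`, `write_rows_atomically`, `flush_every`, and the `judge` callable are hypothetical names introduced for illustration, and `judge` stands in for the real `judge_answer` call with its credentials already bound.

```python
import csv
import os
import threading
from concurrent.futures import ThreadPoolExecutor


def write_rows_atomically(path, fieldnames, rows):
    """Write all rows to a temp file, then atomically swap it into place."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    os.replace(tmp, path)


def grade_all(rows, fieldnames, output, judge, threads=4, flush_every=50):
    """Grade rows whose 'result' is empty; return the number of CSV writes.

    `judge` is a hypothetical callable:
    (question, answer, response) -> (label, reasoning).
    """
    lock = threading.Lock()
    state = {"since_flush": 0, "flushes": 0}

    def grade_one(idx):
        row = rows[idx]
        # The slow judge call runs outside the lock so threads overlap.
        label, reasoning = judge(
            row.get("question", ""), row.get("answer", ""), row.get("response", "")
        )
        with lock:
            # Mutate and (occasionally) write under the same lock so a
            # flush never sees a half-updated row.
            row["result"] = label
            row["reasoning"] = reasoning
            state["since_flush"] += 1
            if state["since_flush"] >= flush_every:
                write_rows_atomically(output, fieldnames, rows)
                state["since_flush"] = 0
                state["flushes"] += 1

    pending = [i for i, r in enumerate(rows) if not r.get("result")]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(grade_one, pending))  # list() surfaces worker exceptions

    # Final flush so the last partial batch is never lost.
    write_rows_atomically(output, fieldnames, rows)
    return state["flushes"] + 1
```

With k ungraded rows and a batch size of b, the file is rewritten about k/b + 1 times instead of k times, at the cost of losing at most b - 1 graded rows if the process is killed between flushes.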

github-actions · 2026-04-23T07:41:18Z

PR Code Suggestions ✨

No code suggestions found for the PR.

feat: bailian

175c494

github-project-automation Bot added this to OpenViking project Apr 23, 2026

github-project-automation Bot moved this to Backlog in OpenViking project Apr 23, 2026

yangxinxin-7 requested a review from yeshion23333 April 23, 2026 07:38

github-actions Bot added the Review effort 3/5 label Apr 23, 2026

yeshion23333 approved these changes Apr 23, 2026

yeshion23333 merged commit e706a8c into volcengine:main Apr 23, 2026
5 of 6 checks passed

github-project-automation Bot moved this from Backlog to Done in OpenViking project Apr 23, 2026

yangxinxin-7 mentioned this pull request Apr 23, 2026

Revert "feat: add Bailian memory library LoCoMo benchmark evaluation scripts" #1665

Merged

yeshion23333 pushed a commit that referenced this pull request Apr 23, 2026

Revert "feat: bailian (#1664)" (#1665)

be0b375

This reverts commit e706a8c.

yeshion23333 merged 1 commit into volcengine:main from yangxinxin-7:benchmark/bailian-memory
