Fix/language output #2164

Merged
chenjw merged 114 commits into volcengine:main from fujiajie666:fix/language_output
May 21, 2026

@fujiajie666
Collaborator

Description

Fixed a bug where generated memories did not match the language of the input messages they were extracted from.

Related Issue

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test update

Changes Made

  • The primary language of user_messages now determines the output language of extracted memories.
  • Fix the bug where a single foreign language (other than Chinese or English) was used as the memory language.
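
To illustrate the first change, here is a minimal sketch of choosing a primary language from user messages only. It is not the code in this PR: the function name `detect_primary_language`, the `"zh"`/`"en"` codes, and the crude character-counting heuristic are all assumptions made for illustration.

```python
from collections import Counter


def detect_primary_language(user_messages, fallback="en"):
    """Hypothetical helper: pick 'zh' or 'en' by majority of script characters.

    Only Chinese ideographs and ASCII letters are counted; text in any other
    script contributes nothing, so it can never become the chosen language.
    """
    counts = Counter()
    for text in user_messages:
        for ch in text:
            if "\u4e00" <= ch <= "\u9fff":  # CJK Unified Ideographs block
                counts["zh"] += 1
            elif ch.isascii() and ch.isalpha():  # plain Latin letters
                counts["en"] += 1
    if not counts:
        # No recognizable characters at all (digits, punctuation, empty input).
        return fallback
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    print(detect_primary_language(["你好，请帮我总结一下今天的会议", "ok"]))
    print(detect_primary_language(["Please summarize the meeting", "谢谢"]))
```

Counting characters favors longer messages and treats one ideograph like one letter, so a real implementation would likely need a weighting or per-message vote; the sketch only shows the "primary language of the user text wins" idea.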

Testing

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have tested this on the following platforms:
    • Linux
    • macOS
    • Windows

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Screenshots (if applicable)

Additional Notes

chenjw and others added 20 commits May 19, 2026 21:31
Expand _resolve_links so shared page ids resolve across every operation URI instead of collapsing to a single path. Align the page-id and extract-loop tests with the current API contract and the multi-URI link behavior.

🤖 Generated with [Aiden x Claude Code]

Co-Authored-By: Aiden
Expand the TAU-2 benchmark harness for trajectory-memory retrieval, corpus reuse, and fixed-first-user evidence runs so prompt and eval changes can be exercised together. Align trajectory extraction with reusable operation contracts and update usage-audit docs/runtime to match the new workflow.

🤖 Generated with [Aiden x Claude Code]

Co-Authored-By: Aiden
Apply the remaining formatter-driven cleanup in the memory modules so the working tree stays clean before the next behavior changes. This keeps helper signatures and string literals aligned with current lint output.

🤖 Generated with [Aiden x Claude Code]

Co-Authored-By: Aiden
@github-actions

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🏅 Score: 85
🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Prefer user text over assistant text for language detection in SessionExtractContextProvider

Relevant files:

  • openviking/session/memory/session_extract_context_provider.py
  • tests/session/memory/test_memory_react_system_prompt.py

Sub-PR theme: Overhaul language detection logic with fallback, Latin script support, and system locale/timezone hints

Relevant files:

  • openviking/session/memory/utils/language.py
  • tests/storage/test_semantic_processor_language.py

Sub-PR theme: Update MemoryExtractor to use new language detection functions

Relevant files:

  • openviking/session/memory_extractor.py
  • tests/session/test_memory_extractor_language.py

⚡ Recommended focus areas for review

Use of private function from another module

MemoryExtractor._detect_output_language imports and uses _detect_language_from_text, a private function (starts with underscore) from openviking.session.memory.utils.language. This couples the two modules unnecessarily; consider exposing a public API for language detection instead of relying on private functions.

from openviking.session.memory.utils.language import _detect_language_from_text

user_text = "\n".join(
    str(getattr(m, "content", "") or "")
    for m in messages
    if getattr(m, "role", "") == "user" and getattr(m, "content", None)
)

if not user_text:
    return fallback

return _detect_language_from_text(user_text, fallback)
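
One possible shape for the public API the reviewer suggests is sketched below, under stated assumptions. The name `detect_language_from_messages` is hypothetical, and `_detect_language_from_text` here is a simplified stand-in so the example runs on its own; the real helper in openviking.session.memory.utils.language is not reproduced.

```python
from types import SimpleNamespace


def _detect_language_from_text(text, fallback):
    # Stand-in for the private helper named in the review (assumption):
    # report "zh" if any Chinese ideograph is present, otherwise the fallback.
    has_cjk = any("\u4e00" <= ch <= "\u9fff" for ch in text)
    return "zh" if has_cjk else fallback


def detect_language_from_messages(messages, fallback="en"):
    """Hypothetical public entry point: detect language from user-role text only."""
    user_text = "\n".join(
        str(getattr(m, "content", "") or "")
        for m in messages
        if getattr(m, "role", "") == "user" and getattr(m, "content", None)
    )
    if not user_text:
        # No user content to inspect, so keep the caller's default.
        return fallback
    return _detect_language_from_text(user_text, fallback)


if __name__ == "__main__":
    msgs = [
        SimpleNamespace(role="assistant", content="你好"),
        SimpleNamespace(role="user", content="hello"),
    ]
    # Assistant text is ignored; only the user message is inspected.
    print(detect_language_from_messages(msgs))
```

With a wrapper like this living next to the private helper, a caller such as MemoryExtractor would import only the public name, and the underscore-prefixed function could change without touching other modules.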

@github-actions

PR Code Suggestions ✨

No code suggestions found for the PR.

@chenjw chenjw merged commit 3acc63c into volcengine:main May 21, 2026
4 of 5 checks passed
@github-project-automation github-project-automation Bot moved this from Backlog to Done in OpenViking project May 21, 2026
@fujiajie666 fujiajie666 deleted the fix/language_output branch May 22, 2026 03:50