Skip to content

feat: add memory extract telemetry breakdown#735

Merged
qin-ctx merged 1 commit intomainfrom
memory_extract_telemetry
Mar 19, 2026
Merged

feat: add memory extract telemetry breakdown#735
qin-ctx merged 1 commit intomainfrom
memory_extract_telemetry

Conversation

@zhoujh01
Copy link
Collaborator

@zhoujh01 zhoujh01 commented Mar 18, 2026

Description

Related Issue

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test update

Changes Made

核心变更分 4 块:

  • Telemetry 能力增强
    • 给 memory extract 增加了更细的 telemetry breakdown
    • 给 resource 处理链路增加了 telemetry breakdown
    • telemetry summary 会省略值为 0 的字段,返回更干净
  • API / Console 行为更新
    • resources/temp_upload 现在支持返回 telemetry
    • console 上传资源时会把 upload 和 add_resource 两段 telemetry 一起展示
    • 相关后端入口在 openviking/server/routers/resources.py、前端在 openviking/console/static/
      app.js
  • 核心实现调整
    • 改了 openviking/telemetry/operation.py
    • 补充了 openviking/session/memory_extractor.py、openviking/service/resource_service.py、
      openviking/utils/resource_processor.py 的 telemetry 采集
    • 还处理了 openviking/session/compressor.py 的代码整理/冲突结果
    • 新增了用户向的双语 guide:
      • docs/en/guides/07-operation-telemetry.md
      • docs/zh/guides/07-operation-telemetry.md
    • 同步更新了 CONTRIBUTING.md、CONTRIBUTING_CN.md、CONTRIBUTING_JA.md
    • 新增/更新了 telemetry、resources、sessions 相关测试

Testing

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have tested this on the following platforms:
    • Linux
    • macOS
    • Windows

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Screenshots (if applicable)

Additional Notes

@github-actions
Copy link

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Early Return Telemetry

Verify that telemetry counters are correctly handled when the function exits early (e.g., no candidates, empty messages). Ensure that partial telemetry data is still useful and consistent.

if not messages:
    return []

context = {"messages": messages}
if not ctx:
    return []

self._pending_semantic_changes.clear()
telemetry = get_current_telemetry()
telemetry.set("memory.extract.candidates.total", 0)
telemetry.set("memory.extract.candidates.standard", 0)
telemetry.set("memory.extract.candidates.tool_skill", 0)
telemetry.set("memory.extract.created", 0)
telemetry.set("memory.extract.merged", 0)
telemetry.set("memory.extract.deleted", 0)
telemetry.set("memory.extract.skipped", 0)

with telemetry.measure("memory.extract.total"):
    try:
        if strict_extract_errors:
            # Intentionally let extraction errors bubble up so caller (task tracker)
            # can mark background commit tasks as failed with an explicit error.
            candidates = await self.extractor.extract_strict(context, user, session_id)
        else:
            candidates = await self.extractor.extract(context, user, session_id)

        if not candidates:
            return []
Gauge Usage for Durations

Confirm that using gauges to accumulate durations (instead of counters) is appropriate for the telemetry system's expectations and downstream consumers.

def add_duration(self, key: str, duration_ms: float) -> None:
    if not self.enabled:
        return
    gauge_key = key if key.endswith(".duration_ms") else f"{key}.duration_ms"
    try:
        normalized_duration = max(float(duration_ms), 0.0)
    except (TypeError, ValueError):
        normalized_duration = 0.0
    with self._lock:
        existing = self._gauges.get(gauge_key, 0.0)
        try:
            existing_value = float(existing)
        except (TypeError, ValueError):
            existing_value = 0.0
        self._gauges[gauge_key] = existing_value + normalized_duration

@contextmanager
def measure(self, key: str) -> Iterator[None]:
    if not self.enabled:
        yield
        return

    start = time.perf_counter()
    try:
        yield
    finally:
        self.add_duration(key, (time.perf_counter() - start) * 1000)

@github-actions
Copy link

PR Code Suggestions ✨

No code suggestions found for the PR.

Copy link
Contributor

@qin-ptr qin-ptr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

This PR adds detailed telemetry breakdown for memory extract operations, including candidates statistics (total/standard/tool_skill), action results (created/merged/deleted/skipped), and 12 stage-wise timing breakdowns. The implementation uses measure() context manager for automatic timing and accumulation, with comprehensive documentation improvements.

Blocking Issues

  1. [Bug] PR Description - The PR description is empty (only template placeholders remain). Must add description explaining what was changed (adding memory extract telemetry breakdown) and why (for performance analysis and problem diagnosis).

  2. [Design] Telemetry initialization strategy inconsistent with documentation - see inline comment on compressor.py:239

  3. [Bug] Potential undefined property in JavaScript output - see inline comment on app.js:1740

Non-blocking Suggestions

  • Add test coverage for VLM unavailable scenario
  • Fix documentation format inconsistency ("范围" vs "备注")

Please address the blocking issues before merging. See inline comments for detailed explanations and suggested fixes.

@zhoujh01 zhoujh01 force-pushed the memory_extract_telemetry branch from 098b8b9 to 4b33da4 Compare March 18, 2026 09:12
@qin-ctx qin-ctx self-assigned this Mar 19, 2026
@zhoujh01 zhoujh01 force-pushed the memory_extract_telemetry branch from 4b33da4 to de4e8d5 Compare March 19, 2026 06:53
feat: add resource telemetry breakdown

telemetry: omit zero-valued summary fields

feat(resources): add temp upload telemetry support

docs: move telemetry guide out of design

docs: sync contributor build requirements
@zhoujh01 zhoujh01 force-pushed the memory_extract_telemetry branch from de4e8d5 to 7190994 Compare March 19, 2026 07:07
@qin-ctx qin-ctx merged commit f467de4 into main Mar 19, 2026
6 checks passed
@qin-ctx qin-ctx deleted the memory_extract_telemetry branch March 19, 2026 07:11
@github-project-automation github-project-automation bot moved this from Backlog to Done in OpenViking project Mar 19, 2026
chethanuk added a commit to chethanuk/OpenViking that referenced this pull request Mar 19, 2026
- Add .pr_agent.toml with 15 repo-specific review rules derived from real
  bug history (PRs volcengine#505, volcengine#728, volcengine#749, volcengine#740/volcengine#745, volcengine#754, volcengine#735, volcengine#767)
- Rules structured as WHEN/THEN/BECAUSE for deterministic enforcement
- Add 8 custom labels (memory-pipeline, async-change, api-breaking, etc.)
- Add ignore patterns for lock files, third_party, build artifacts
- Enable score review, TODO scan, split-PR detection, security audit
- Configure improve tool with quality threshold and extended mode
- Configure describe tool with PR diagrams and semantic file types
- Update workflow: ark-code-latest model, checkout step for .pr_agent.toml,
  move all config from inline YAML to .pr_agent.toml (single source of truth)
qin-ctx pushed a commit that referenced this pull request Mar 19, 2026
…#780)

- Add .pr_agent.toml with 15 repo-specific review rules derived from real
  bug history (PRs #505, #728, #749, #740/#745, #754, #735, #767)
- Rules structured as WHEN/THEN/BECAUSE for deterministic enforcement
- Add 8 custom labels (memory-pipeline, async-change, api-breaking, etc.)
- Add ignore patterns for lock files, third_party, build artifacts
- Enable score review, TODO scan, split-PR detection, security audit
- Configure improve tool with quality threshold and extended mode
- Configure describe tool with PR diagrams and semantic file types
- Update workflow: ark-code-latest model, checkout step for .pr_agent.toml,
  move all config from inline YAML to .pr_agent.toml (single source of truth)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants