feat(metrics): add vikingbot feedback observability#2037
Conversation
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
PR Code Suggestions ✨No code suggestions found for the PR. |
# Conflicts: # bot/vikingbot/agent/memory.py
|
发群里请相关同学review吧 |
yeshion23333
left a comment
There was a problem hiding this comment.
I found two blocking issues and two follow-up suggestions in the new feedback observability path. The most serious problem is that the default server metrics bootstrap now has a hard import-time dependency on the vikingbot package and its transitive runtime requirements.
| from typing import Any, ClassVar | ||
|
|
||
| from openviking.metrics.core.base import MetricCollector | ||
|
|
There was a problem hiding this comment.
[Bug] (blocking) This collector now imports vikingbot.config.loader at module import time, and openviking.metrics.bootstrap imports FeedbackCollector unconditionally. That means enabling the default server metrics path now requires the full vikingbot dependency chain to be importable, even in deployments that only use the server. I was able to reproduce this locally by importing create_default_collector_manager(), which failed in the bot dependency chain with ModuleNotFoundError. Before this PR, the default metrics bootstrap did not depend on bot-only runtime packages. Please decouple the collector from vikingbot.config.loader and inject the bot data path from server-side config (or at least defer the dependency so unsupported environments degrade gracefully instead of failing during import).
| return None | ||
|
|
||
| feedback_events = metadata.get("feedback_events", []) | ||
| response_outcomes = metadata.get("response_outcomes", {}) |
There was a problem hiding this comment.
[Design] (blocking) The denominator for the exported rates is currently len(_collect_response_ids(...)), i.e. every persisted assistant message with a response_id. The numerators, however, only come from the new feedback_events / response_outcomes contract in metadata. In mixed historical data, this will systematically understate coverage and resolution because old responses that were never part of the new observability contract are still counted in responses_total. For example, if an older session has 100 assistant responses but only the newest 10 are tracked in response_outcomes, a dashboard reader will see those rates as if all 100 responses were eligible. Please either change the denominator to tracked responses or export a separate tracked-response total and document the current semantics explicitly.
| @@ -80,6 +81,7 @@ def create_default_collector_manager(*, app=None, service=None) -> CollectorMana | |||
| manager = CollectorManager() | |||
| manager.register(QueueCollector(data_source=QueuePipelineStateDataSource())) | |||
| manager.register(TaskTrackerCollector(data_source=TaskStateDataSource())) | |||
There was a problem hiding this comment.
[Suggestion] (non-blocking) Registering FeedbackCollector() with no explicit constructor inputs makes the bootstrap path rely on hidden defaults inside the collector. That is what forces the current test to monkeypatch load_config, and it also makes deployment configuration harder to reason about. Consider passing the bot data path (or a dedicated feedback metrics config object) from the bootstrap layer so the dependency stays explicit and testable.
| finally: | ||
| shutdown_metrics(app=app) | ||
|
|
||
| async def test_metrics_endpoint_exports_feedback_metrics(self, monkeypatch, tmp_path): |
There was a problem hiding this comment.
[Suggestion] (non-blocking) This endpoint test only writes the metadata record and does not include any persisted assistant message rows. That still proves the endpoint can expose feedback metrics, but it does not exercise the real denominator path used by responses_total, feedback_coverage, and the derived rates, because those currently come from scanning assistant messages in the JSONL body. Adding at least one assistant record here would make this test much better at catching regressions in the session-file contract and the rate semantics.
Description
Add VikingBot feedback observability to OpenViking metrics by exporting scrape-time feedback and outcome aggregates from persisted bot sessions. This also documents the new metric families and provides Grafana/Prometheus examples plus validation guides.
Related Issue
Type of Change
Changes Made
FeedbackCollectorand bot-side feedback aggregation logic to exportopenviking_feedback_*andopenviking_feedback_channel_*gauges from persisted VikingBot session data.Testing
Checklist
Screenshots (if applicable)
N/A
Additional Notes
bot/sessions/*.jsonlon each scrape.valid="1"indicates a fresh successful snapshot, whilevalid="0"indicates fallback to the last successful snapshot after collector refresh failure.compare_json.py,log,ov_conf/, andscripts/.