SCAL-308281 Add tool and upstream health metrics#138
SCAL-308281 Add tool and upstream health metrics#138kanwarbajwa wants to merge 2 commits intomainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive metrics recording framework for MCP tools and upstream ThoughtSpot API calls, facilitating the tracking of invocation counts, durations, and outcomes. Key changes include the addition of a tool-metrics utility, support for tenant-scoped event identities, and the integration of metrics collection within the base MCP server and ThoughtSpot service. The review feedback identifies several improvement opportunities, such as refining identity-setting logic to handle falsy values, enhancing observability by including user IDs, refactoring redundant metric recording code into finally blocks, and correcting a logic error in the streaming message counter.
## Summary Add MCP tool-level metrics and ThoughtSpot upstream health metrics with tenant-scoped Analytics Engine identity. ## What Changed - wrap MCP tool handling in a per-tool recorder that records tool call counts and latency - attach trusted `tenantId` from ThoughtSpot session info to Analytics Engine events - instrument ThoughtSpot API calls for upstream count and latency metrics - track streaming upstream starts and parsed stream messages with a dedicated background recorder - add focused tests for recorder identity, tool flush behavior, and streaming metrics ## Semantics - tool metrics stay low-cardinality through `tool_name`, `api_surface`, `api_version`, and `outcome` - tenant identity flows only through Analytics Engine `eventIdentity`, not metric labels - background metrics flush via `waitUntil(...)` when available and never block MCP responses ## Validation - `npm run lint` - `npm test -- --coverage.enabled=false test/metrics/runtime/metrics-recorder.spec.ts test/metrics/runtime/request-metrics.spec.ts test/thoughtspot/thoughtspot-service.spec.ts test/servers/mcp-server.spec.ts` - `npm test`
30f3c62 to
07140fc
Compare
|
@gemini-code-assist please review again |
## Summary Tighten upstream and API version metric semantics so tenant analytics answer rollout and health questions correctly. ## What Changed - count unusable streaming responses as upstream errors before stream processing starts - add upstream_operation to stream-message metrics so future streamed calls stay distinct - split request and tool version metrics into effective api_version and selector-oriented api_version_mode - stamp unversioned token routes as following the latest served surface - extend tests for streaming failure accounting and version-label behavior ## Semantics - api_version=default answers which tenants are still on the legacy/v1 surface - api_version_mode answers which tenants are pinned versus following latest
07140fc to
12e9bd6
Compare
There was a problem hiding this comment.
Code Review
This pull request implements a comprehensive metrics instrumentation framework to track tool invocations and upstream service calls. Key changes include adding an api_version_mode dimension to distinguish between pinned and latest API versions, enhancing the MetricsRecorder to support tenant and user identity, and integrating these metrics into the MCP server base and ThoughtSpot service. I have no feedback to provide on these changes.
Summary
Add MCP tool-level metrics and ThoughtSpot upstream health metrics with tenant-scoped Analytics Engine identity.
What Changed
tenantIdfrom ThoughtSpot session info to Analytics Engine eventsSemantics
tool_name,api_surface,api_version, andoutcomeeventIdentity, not metric labelswaitUntil(...)when available and never block MCP responses