Skip to content

SCAL-308281 Add tool and upstream health metrics#138

Open
kanwarbajwa wants to merge 2 commits intomainfrom
SCAL-308281_ToolAndUpstreamHealth
Open

SCAL-308281 Add tool and upstream health metrics#138
kanwarbajwa wants to merge 2 commits intomainfrom
SCAL-308281_ToolAndUpstreamHealth

Conversation

@kanwarbajwa
Copy link
Copy Markdown
Contributor

Summary

Add MCP tool-level metrics and ThoughtSpot upstream health metrics with tenant-scoped Analytics Engine identity.

What Changed

  • wrap MCP tool handling in a per-tool recorder that records tool call counts and latency
  • attach trusted tenantId from ThoughtSpot session info to Analytics Engine events
  • instrument ThoughtSpot API calls for upstream count and latency metrics
  • track streaming upstream starts and parsed stream messages with a dedicated background recorder
  • add focused tests for recorder identity, tool flush behavior, and streaming metrics

Semantics

  • tool metrics stay low-cardinality through tool_name, api_surface, api_version, and outcome
  • tenant identity flows only through Analytics Engine eventIdentity, not metric labels
  • background metrics flush via waitUntil(...) when available and never block MCP responses

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a comprehensive metrics recording framework for MCP tools and upstream ThoughtSpot API calls, facilitating the tracking of invocation counts, durations, and outcomes. Key changes include the addition of a tool-metrics utility, support for tenant-scoped event identities, and the integration of metrics collection within the base MCP server and ThoughtSpot service. The review feedback identifies several improvement opportunities, such as refining identity-setting logic to handle falsy values, enhancing observability by including user IDs, refactoring redundant metric recording code into finally blocks, and correcting a logic error in the streaming message counter.

Comment thread src/metrics/runtime/metrics-recorder.ts
Comment thread src/servers/mcp-server-base.ts
Comment thread src/servers/mcp-server-base.ts
Comment thread src/thoughtspot/thoughtspot-service.ts
Comment thread src/streaming-utils.ts
## Summary

Add MCP tool-level metrics and ThoughtSpot upstream health metrics with tenant-scoped
Analytics Engine identity.

## What Changed

- wrap MCP tool handling in a per-tool recorder that records tool call counts and latency
- attach trusted `tenantId` from ThoughtSpot session info to Analytics Engine events
- instrument ThoughtSpot API calls for upstream count and latency metrics
- track streaming upstream starts and parsed stream messages with a dedicated background recorder
- add focused tests for recorder identity, tool flush behavior, and streaming metrics

## Semantics

- tool metrics stay low-cardinality through `tool_name`, `api_surface`, `api_version`, and `outcome`
- tenant identity flows only through Analytics Engine `eventIdentity`, not metric labels
- background metrics flush via `waitUntil(...)` when available and never block MCP responses

## Validation

- `npm run lint`
- `npm test -- --coverage.enabled=false test/metrics/runtime/metrics-recorder.spec.ts test/metrics/runtime/request-metrics.spec.ts test/thoughtspot/thoughtspot-service.spec.ts test/servers/mcp-server.spec.ts`
- `npm test`
@kanwarbajwa kanwarbajwa force-pushed the SCAL-308281_ToolAndUpstreamHealth branch from 30f3c62 to 07140fc Compare May 6, 2026 08:24
@kanwarbajwa
Copy link
Copy Markdown
Contributor Author

@gemini-code-assist please review again

## Summary

Tighten upstream and API version metric semantics so tenant analytics answer
rollout and health questions correctly.

## What Changed

- count unusable streaming responses as upstream errors before stream processing starts
- add upstream_operation to stream-message metrics so future streamed calls stay distinct
- split request and tool version metrics into effective api_version and selector-oriented
  api_version_mode
- stamp unversioned token routes as following the latest served surface
- extend tests for streaming failure accounting and version-label behavior

## Semantics

- api_version=default answers which tenants are still on the legacy/v1 surface
- api_version_mode answers which tenants are pinned versus following latest
@kanwarbajwa kanwarbajwa force-pushed the SCAL-308281_ToolAndUpstreamHealth branch from 07140fc to 12e9bd6 Compare May 6, 2026 08:28
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements a comprehensive metrics instrumentation framework to track tool invocations and upstream service calls. Key changes include adding an api_version_mode dimension to distinguish between pinned and latest API versions, enhancing the MetricsRecorder to support tenant and user identity, and integrating these metrics into the MCP server base and ThoughtSpot service. I have no feedback to provide on these changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant