
Batch time-series endpoint and embedded run scores #508

Merged
nadheesh merged 4 commits into wso2:main from nadheesh:main
Mar 9, 2026

Conversation

nadheesh (Contributor) commented Mar 9, 2026

Summary

Resolves #471 (batch time-series endpoint for multiple evaluators) and #472 (embed run scores in the ListMonitorRuns response).

Test plan

  • Backend tests pass (make dev-test)
  • Lint passes (make lint)
  • Go build succeeds

Summary by CodeRabbit

  • New Features

    • Added pagination (limit, offset) and optional includeScores to monitor runs listing.
    • Monitor runs can include per-evaluator score summaries.
    • Scores time-series now accepts multiple evaluators and returns a batch time-series response.
    • Frontend updated to request/display batch time-series and optional run scores (hooks and UI components adapted).
  • Documentation

    • Public API spec updated with new query params and batch time-series/score schemas.


coderabbitai Bot commented Mar 9, 2026

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough


Adds optional includeScores to the monitor runs list to embed per-run evaluator summaries and introduces a batch time-series endpoint accepting multiple evaluators (comma-separated) that returns per-evaluator series in a single response. Frontend clients, spec, models, services, repos, tests, and wiring updated accordingly.

Changes

  • Monitor runs + Scores (agent-manager-service/controllers/monitor_controller.go, agent-manager-service/services/monitor_manager.go, agent-manager-service/models/monitor.go, agent-manager-service/spec/model_monitor_run_response.go): Added the includeScores query param and propagated it through the service signature; the service loads evaluator summaries when true and attaches a scores field to MonitorRunResponse.
  • Batch time-series API surface (agent-manager-service/controllers/monitor_scores_controller.go, agent-manager-service/spec/api_default.go, agent-manager-service/docs/api_v1_openapi.yaml): Replaced the single evaluator param with evaluators (comma-separated); the controller now parses multiple evaluators and returns BatchTimeSeriesResponse; OpenAPI and generated client surfaces updated.
  • Batch time-series models (agent-manager-service/models/score.go, agent-manager-service/spec/model_batch_time_series_evaluator_series.go, agent-manager-service/spec/model_batch_time_series_response.go): Removed the single TimeSeriesResponse model; added BatchTimeSeriesEvaluatorSeries and BatchTimeSeriesResponse models with corresponding spec constructors, accessors, and nullable wrappers.
  • Score repository, batch APIs (agent-manager-service/repositories/evaluation_score_repository.go): Extended ScoreRepository with batch methods GetEvaluatorsByMonitorAndRunIDs, GetEvaluatorsTraceAggregated, and GetEvaluatorsTimeSeriesAggregated; added BatchTraceAggregation and BatchTimeBucketAggregation types and implementations (a signature sketch follows this list).
  • Monitor scores service logic (agent-manager-service/services/monitor_scores_service.go, agent-manager-service/services/monitor_manager.go): Added GetEvaluatorsTimeSeries, implementing probe-then-branch (sparse trace-level vs. dense bucketed) for multiple evaluators; the monitor manager now holds scoreRepo and supports the includeScores flow.
  • Wiring / DI (agent-manager-service/wiring/wire_gen.go): Updated the NewMonitorManagerService signature to accept and forward scoreRepository; adjusted initialization call sites.
  • Converters / Utils (agent-manager-service/utils/makeresults.go): Added ConvertToBatchTimeSeriesResponse to map internal batch models to spec responses.
  • Tests, service/repo stubs (agent-manager-service/tests/monitor_scores_test.go): Extended test stubs and configurableScoreRepo with new batch methods and fixtures; updated test expectations for batch aggregations and granularity handling.
  • Client libraries & hooks (console/workspaces/libs/api-client/src/apis/monitors.ts, console/workspaces/libs/api-client/src/hooks/monitors.ts, console/workspaces/libs/types/src/api/monitors.ts): Client and hooks now send evaluators (comma-separated / string[]) and return BatchTimeSeriesResponse; added includeScores to the list-runs params and updated type surfaces.
  • Frontend components (console/workspaces/pages/eval/src/subComponents/PerformanceByEvaluatorCard.tsx, console/workspaces/pages/eval/src/subComponents/RunSummaryCard.tsx): The performance card consumes the batch evaluators array; RunSummaryCard requests includeScores=true and renders run scores from the embedded scores instead of issuing per-run API calls.

Sequence Diagram

sequenceDiagram
    participant Client as Frontend Client
    participant Controller as Monitor Scores Controller
    participant Service as Monitor Scores Service
    participant Repo as Score Repository
    participant DB as Database

    Client->>Controller: GET /monitors/{m}/scores/timeseries?evaluators=eval1,eval2&startTime&endTime
    Controller->>Service: GetEvaluatorsTimeSeries(monitorID, [eval1,eval2], startTime, endTime)

    Service->>Repo: Probe sparsity for evaluatorNames
    Repo->>DB: Query traces/spans for evaluatorNames
    DB-->>Repo: probe results (sparse/dense)
    Repo-->>Service: probe outcome

    alt Sparse path (trace-level)
        Service->>Repo: GetEvaluatorsTraceAggregated(monitorID, evaluatorNames, startTime, endTime, limit)
        Repo->>DB: Fetch trace-level aggregates per evaluator
        DB-->>Repo: trace aggregates
    else Dense path (bucketed)
        Service->>Repo: GetEvaluatorsTimeSeriesAggregated(monitorID, evaluatorNames, startTime, endTime, granularity)
        Repo->>DB: Fetch time-bucket aggregates per evaluator
        DB-->>Repo: bucket aggregates
    end

    Repo-->>Service: Batch aggregation results
    Service-->>Controller: BatchTimeSeriesResponse (preserve input evaluator order)
    Controller-->>Client: JSON { monitorName, granularity, evaluators: [{evaluatorName, points: [...]}] }

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I hopped through code with nimble paws,

Turned many calls into just one clause,
Evaluators now gather in a single run,
Runs carry scores so the UI’s work is done,
A little rabbit cheers: efficient and fun!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 65.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions that are missing them.
  • Description check (❓ Inconclusive): the PR description provides essential information but lacks several template sections. It includes Purpose (resolves #471, #472) and a basic test plan, but is missing Goals, Approach, User stories, Release note, Documentation, Training, Certification, Marketing, Detailed test coverage, Security checks, Samples, Related PRs, Migrations, Test environment, and Learning sections. Resolution: expand the description with a Goals section explaining the solutions and an Approach section with implementation details, and at minimum mark non-applicable sections such as Documentation, Training, Certification, Marketing, Samples, and Learning as N/A with explanations.
✅ Passed checks (3 passed)
  • Title check (✅ Passed): the title "Batch time-series endpoint and embedded run scores" accurately and concisely summarizes the main changes, covering both key objectives: the new batch endpoint and the embedded scores feature.
  • Linked Issues check (✅ Passed): the implementation comprehensively addresses both linked issues, #471 (batch time-series with an evaluators parameter) and #472 (includeScores parameter for ListMonitorRuns with embedded evaluator scores), including all required schema changes, repository methods, service updates, and frontend adjustments.
  • Out of Scope Changes check (✅ Passed): all code changes are directly aligned with the two linked issues. Backend changes implement batch time-series and embedded scores, frontend changes adapt to the new API contracts, and test updates support the new functionality.



@coderabbitai (Bot) left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
agent-manager-service/docs/api_v1_openapi.yaml (1)

5576-5628: ⚠️ Potential issue | 🟠 Major

Keep the legacy evaluator query param in the contract.

The linked objective says this route should add evaluators while keeping evaluator for backward compatibility. Right now the spec removes evaluator, and agent-manager-service/controllers/monitor_scores_controller.go:197-230 hard-fails when evaluators is absent. Existing clients that still send evaluator=... will now get a 400 instead of continuing to work.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agent-manager-service/docs/api_v1_openapi.yaml` around lines 5576-5628: the
OpenAPI contract removed the legacy single "evaluator" query param causing older
clients to break; update the GET
/orgs/{orgName}/projects/{projName}/agents/{agentName}/monitors/{monitorName}/scores/timeseries
(operationId getMonitorScoresTimeSeries) to include a non-required query
parameter named "evaluator" (type: string, in: query) alongside the new
"evaluators" param and note in its description that it is legacy and equivalent
to a single-item evaluators list; this will align the spec with
agent-manager-service/controllers/monitor_scores_controller.go (which currently
accepts/depends on either) and preserve backward compatibility.
console/workspaces/pages/eval/src/subComponents/PerformanceByEvaluatorCard.tsx (1)

112-119: ⚠️ Potential issue | 🟡 Minor

Format the x-axis from the returned granularity.

The batch endpoint can now return minute/hour/day/week, but the chart still always formats buckets with HH:mm. For day and week series that renders misleading 00:00 labels and makes the trend harder to read.

Suggested fix
-      const label = date.toLocaleString(undefined, {
-        month: "short",
-        day: "numeric",
-        hour: "2-digit",
-        minute: "2-digit",
-        hour12: false,
-      });
+      const label = formatBucketLabel(date, timeSeriesByEvaluator.granularity);

const formatBucketLabel = (date: Date, granularity?: string) => {
  switch (granularity) {
    case "day":
    case "week":
      return date.toLocaleDateString(undefined, {
        month: "short",
        day: "numeric",
      });
    default:
      return date.toLocaleString(undefined, {
        month: "short",
        day: "numeric",
        hour: "2-digit",
        minute: "2-digit",
        hour12: false,
      });
  }
};

Also applies to: 128-136

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@console/workspaces/pages/eval/src/subComponents/PerformanceByEvaluatorCard.tsx`
around lines 112-119: the x-axis is always formatted as "HH:mm"; introduce a
formatter that uses the returned granularity and apply it when building series
and rendering ticks: add a formatBucketLabel(date, granularity?) function (use
the suggested switch for "day"/"week" vs default) and use
timeSeriesByEvaluator.granularity (or the returned granularity field) when
formatting each point in evaluators.forEach (where seriesMap[evaluatorName] is
created) and when supplying the tick label formatter for the chart (the code
block around lines 128-136 that renders X-axis ticks). Ensure you replace the
hardcoded HH:mm formatting with calls to formatBucketLabel so day/week show "Mon
1" style and minute/hour use date+time.
agent-manager-service/spec/api_default.go (1)

4769-4786: ⚠️ Potential issue | 🟠 Major

Keep the legacy single-evaluator path in the generated client.

Line 4848 now makes evaluators mandatory, and the old Evaluator(...) builder is gone. That breaks the compatibility goal from #471 for existing Go callers that still send the single evaluator query param. Please model both params in the OpenAPI spec, keep them optional, and regenerate so old callers keep compiling while new callers can batch.

Also applies to: 4848-4854

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agent-manager-service/spec/api_default.go` around lines 4769-4786: the
generated client removed the legacy single-evaluator builder and made evaluators
mandatory; update the OpenAPI spec to define both query params "evaluator"
(single) and "evaluators" (comma-separated) as optional, then regenerate the
client so ApiGetMonitorScoresTimeSeriesRequest contains both builder methods
(Evaluator(evaluator string) and Evaluators(evaluators string)) and neither
field is required; ensure the generated getters/setters for Evaluator and
Evaluators exist and accept pointer/optional values so existing callers using
Evaluator(...) keep compiling while new callers can use Evaluators(...).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@agent-manager-service/controllers/monitor_scores_controller.go`:
- Around line 209-218: The handler currently rejects requests that use the
legacy "evaluator" query param; update the parsing logic so if
r.URL.Query().Get("evaluators") is empty it falls back to
r.URL.Query().Get("evaluator") and passes that value into parseEvaluatorsList
(or constructs a single-item slice) so evaluatorNames contains at least one
name; specifically modify the block that reads evaluatorsParam and builds
evaluatorNames (referencing parseEvaluatorsList and the evaluatorNames variable)
to accept either "evaluators" (CSV) or the single legacy "evaluator" parameter
before returning the 400.

In `@agent-manager-service/docs/api_v1_openapi.yaml`:
- Around line 5279-5292: Update the OpenAPI parameter docs for pagination to
match the server's enforcement: in the getAgentTraceScores operation (and other
scores/traces endpoints), change the limit query parameter to include "minimum:
1" and "maximum: 100" and set its default to 100 (instead of 20), and add
"minimum: 0" to the offset parameter while keeping default 0; this aligns the
YAML parameter definitions with the checks implemented in monitor_controller.go
(the limit>0, cap to 100, and non-negative offset logic).

In `@agent-manager-service/utils/makeresults.go`:
- Around line 739-752: The code currently only sets response.Scores when
len(run.Scores) > 0, which drops non-nil empty slices; change the conditional to
check for a non-nil slice (if run.Scores != nil) so requested-but-empty score
lists are preserved, then create scores := make([]spec.EvaluatorScoreSummary,
len(run.Scores)) and populate it (loop will be skipped for len==0) and assign
response.Scores = scores; update references to run.Scores and response.Scores in
makeresults.go accordingly.

---

Outside diff comments:
In `@agent-manager-service/docs/api_v1_openapi.yaml`:
- Around line 5576-5628: The OpenAPI contract removed the legacy single
"evaluator" query param causing older clients to break; update the GET
/orgs/{orgName}/projects/{projName}/agents/{agentName}/monitors/{monitorName}/scores/timeseries
(operationId getMonitorScoresTimeSeries) to include a non-required query
parameter named "evaluator" (type: string, in: query) alongside the new
"evaluators" param and note in its description that it is legacy and equivalent
to a single-item evaluators list; this will align the spec with
agent-manager-service/controllers/monitor_scores_controller.go (which currently
accepts/depends on either) and preserve backward compatibility.

In `@agent-manager-service/spec/api_default.go`:
- Around line 4769-4786: The generated client removed the legacy
single-evaluator builder and made evaluators mandatory; update the OpenAPI spec
to define both query params "evaluator" (single) and "evaluators"
(comma-separated) as optional, then regenerate the client so
ApiGetMonitorScoresTimeSeriesRequest contains both builder methods
(Evaluator(evaluator string) and Evaluators(evaluators string)) and neither
field is required; ensure the generated getters/setters for Evaluator and
Evaluators exist and accept pointer/optional values so existing callers using
Evaluator(...) keep compiling while new callers can use Evaluators(...).

In
`@console/workspaces/pages/eval/src/subComponents/PerformanceByEvaluatorCard.tsx`:
- Around line 112-119: The x-axis is always formatted as "HH:mm"; introduce a
formatter that uses the returned granularity and apply it when building series
and rendering ticks: add a formatBucketLabel(date, granularity?) function (use
the suggested switch for "day"/"week" vs default) and use
timeSeriesByEvaluator.granularity (or the returned granularity field) when
formatting each point in evaluators.forEach (where seriesMap[evaluatorName] is
created) and when supplying the tick label formatter for the chart (the code
block around lines 128-136 that renders X-axis ticks). Ensure you replace the
hardcoded HH:mm formatting with calls to formatBucketLabel so day/week show "Mon
1" style and minute/hour use date+time.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5cb4551e-eecb-4df5-825b-8e222d040a5f

📥 Commits

Reviewing files that changed from the base of the PR and between 929031a and dbfd1cd.

⛔ Files ignored due to path filters (1)
  • console/common/config/rush/pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (20)
  • agent-manager-service/controllers/monitor_controller.go
  • agent-manager-service/controllers/monitor_scores_controller.go
  • agent-manager-service/docs/api_v1_openapi.yaml
  • agent-manager-service/models/monitor.go
  • agent-manager-service/models/score.go
  • agent-manager-service/repositories/evaluation_score_repository.go
  • agent-manager-service/services/monitor_manager.go
  • agent-manager-service/services/monitor_scores_service.go
  • agent-manager-service/spec/api_default.go
  • agent-manager-service/spec/model_batch_time_series_evaluator_series.go
  • agent-manager-service/spec/model_batch_time_series_response.go
  • agent-manager-service/spec/model_monitor_run_response.go
  • agent-manager-service/tests/monitor_scores_test.go
  • agent-manager-service/utils/makeresults.go
  • agent-manager-service/wiring/wire_gen.go
  • console/workspaces/libs/api-client/src/apis/monitors.ts
  • console/workspaces/libs/api-client/src/hooks/monitors.ts
  • console/workspaces/libs/types/src/api/monitors.ts
  • console/workspaces/pages/eval/src/subComponents/PerformanceByEvaluatorCard.tsx
  • console/workspaces/pages/eval/src/subComponents/RunSummaryCard.tsx

@coderabbitai (Bot) left a comment

🧹 Nitpick comments (1)
agent-manager-service/tests/monitor_scores_test.go (1)

304-395: Add one real multi-evaluator contract test.

The updated suite covers basic validation, but it still doesn't lock down the new CSV parsing rules or a successful multi-evaluator response. A few cases, such as evaluators=a, b ,a (duplicates and whitespace), the legacy ?evaluator=latency, and more than 50 unique names, plus one happy-path assertion on ordered evaluator series, would make regressions here much harder to miss.

Based on learnings: in wso2/agent-manager, update all code paths handling GET /monitors/{monitorName}/scores/timeseries to accept the new evaluators query parameter (plural, comma-separated) and remove any usage or parsing of the old singular evaluator parameter.

Also applies to: 403-445

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agent-manager-service/tests/monitor_scores_test.go` around lines 304-395:
Add a real multi-evaluator contract test and remove legacy singular parsing:
update the test suite around TestGetScoresTimeSeries_Validation and
TestGetScoresTimeSeries_ValidRanges (using newScoresHandler and the same base
path) to include cases for evaluators=a,b,a (duplicates), evaluators with >50
unique names (expect 400), and a request using evaluator=latency (singular) to
confirm it is rejected; also add one happy-path integration-style test that
sends evaluators=latency,throughput (comma-separated) and asserts the response
contains ordered evaluator time series for those names. Then change the server
code path that handles GET /monitors/{monitorName}/scores/timeseries (the
handler invoked by handler.ServeHTTP/newScoresHandler) to parse the plural
"evaluators" query param as a comma-separated list, enforce max-unique limit
(50), dedupe when appropriate, and remove any parsing/acceptance of the old
"evaluator" singular parameter so only the plural CSV form is supported.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@agent-manager-service/tests/monitor_scores_test.go`:
- Around line 304-395: Add a real multi-evaluator contract test and remove
legacy singular parsing: update the test suite around
TestGetScoresTimeSeries_Validation and TestGetScoresTimeSeries_ValidRanges
(using newScoresHandler and the same base path) to include cases for
evaluators=a,b,a (duplicates), evaluators with >50 unique names (expect 400),
and a request using evaluator=latency (singular) to confirm it is rejected; also
add one happy-path integration-style test that sends
evaluators=latency,throughput (comma-separated) and asserts the response
contains ordered evaluator time series for those names. Then change the server
code path that handles GET /monitors/{monitorName}/scores/timeseries (the
handler invoked by handler.ServeHTTP/newScoresHandler) to parse the plural
"evaluators" query param as a comma-separated list, enforce max-unique limit
(50), dedupe when appropriate, and remove any parsing/acceptance of the old
"evaluator" singular parameter so only the plural CSV form is supported.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 47b35a72-d201-4f66-8b98-3b238237b676

📥 Commits

Reviewing files that changed from the base of the PR and between dbfd1cd and 71d5d3e.

📒 Files selected for processing (5)
  • agent-manager-service/controllers/monitor_scores_controller.go
  • agent-manager-service/models/score.go
  • agent-manager-service/services/monitor_scores_service.go
  • agent-manager-service/tests/monitor_scores_test.go
  • console/workspaces/pages/eval/src/subComponents/RunSummaryCard.tsx

nadheesh merged commit 4c1c134 into wso2:main on Mar 9, 2026
8 of 9 checks passed


Development

Successfully merging this pull request may close these issues:

  • Embed run scores in ListMonitorRuns response (#472)
  • Batch time-series endpoint for multiple evaluators (#471)