
Batch time-series endpoint and embedded run scores #508

Merged
nadheesh merged 4 commits into wso2:main from nadheesh:main
Mar 9, 2026

Conversation

nadheesh (Contributor) commented Mar 9, 2026

Summary

Resolves #471 (batch time-series endpoint for multiple evaluators) and #472 (embed run scores in the ListMonitorRuns response).

Test plan

  • Backend tests pass (make dev-test)
  • Lint passes (make lint)
  • Go build succeeds

Summary by CodeRabbit

  • New Features

    • Added pagination (limit, offset) and optional includeScores to monitor runs listing.
    • Monitor runs can include per-evaluator score summaries.
    • Scores time-series now accepts multiple evaluators and returns a batch time-series response.
    • Frontend updated to request/display batch time-series and optional run scores (hooks and UI components adapted).
  • Documentation

    • Public API spec updated with new query params and batch time-series/score schemas.


coderabbitai Bot commented Mar 9, 2026

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough


Adds optional includeScores to the monitor runs list to embed per-run evaluator summaries and introduces a batch time-series endpoint accepting multiple evaluators (comma-separated) that returns per-evaluator series in a single response. Frontend clients, spec, models, services, repos, tests, and wiring updated accordingly.

Changes

  • Monitor runs + Scores (agent-manager-service/controllers/monitor_controller.go, agent-manager-service/services/monitor_manager.go, agent-manager-service/models/monitor.go, agent-manager-service/spec/model_monitor_run_response.go): Added the includeScores query param and propagated it through the service signature; the service loads evaluator summaries when true and attaches a scores field to MonitorRunResponse.
  • Batch time-series API surface (agent-manager-service/controllers/monitor_scores_controller.go, agent-manager-service/spec/api_default.go, agent-manager-service/docs/api_v1_openapi.yaml): Replaced the single evaluator param with evaluators (comma-separated); the controller now parses multiple evaluators and returns BatchTimeSeriesResponse; OpenAPI and generated client surfaces updated.
  • Batch time-series models (agent-manager-service/models/score.go, agent-manager-service/spec/model_batch_time_series_evaluator_series.go, agent-manager-service/spec/model_batch_time_series_response.go): Removed the single TimeSeriesResponse model; added BatchTimeSeriesEvaluatorSeries and BatchTimeSeriesResponse models with corresponding spec constructors, accessors, and nullable wrappers.
  • Score repository, batch APIs (agent-manager-service/repositories/evaluation_score_repository.go): Extended ScoreRepository with batch methods GetEvaluatorsByMonitorAndRunIDs, GetEvaluatorsTraceAggregated, and GetEvaluatorsTimeSeriesAggregated; added BatchTraceAggregation and BatchTimeBucketAggregation types and implementations (a signature sketch follows this list).
  • Monitor scores service logic (agent-manager-service/services/monitor_scores_service.go, agent-manager-service/services/monitor_manager.go): Added GetEvaluatorsTimeSeries, implementing probe-then-branch (sparse trace-level vs. dense bucketed) for multiple evaluators; the monitor manager now holds scoreRepo and supports the includeScores flow.
  • Wiring / DI (agent-manager-service/wiring/wire_gen.go): Updated the NewMonitorManagerService signature to accept and forward scoreRepository; adjusted initialization call sites.
  • Converters / Utils (agent-manager-service/utils/makeresults.go): Added ConvertToBatchTimeSeriesResponse to map internal batch models to spec responses.
  • Tests, service/repo stubs (agent-manager-service/tests/monitor_scores_test.go): Extended test stubs and configurableScoreRepo with new batch methods and fixtures; updated test expectations for batch aggregations and granularity handling.
  • Client libraries & hooks (console/workspaces/libs/api-client/src/apis/monitors.ts, console/workspaces/libs/api-client/src/hooks/monitors.ts, console/workspaces/libs/types/src/api/monitors.ts): Client and hooks now send evaluators (comma-separated / string[]) and return BatchTimeSeriesResponse; added includeScores to the list-runs params and updated type surfaces.
  • Frontend components (console/workspaces/pages/eval/src/subComponents/PerformanceByEvaluatorCard.tsx, console/workspaces/pages/eval/src/subComponents/RunSummaryCard.tsx): The performance card consumes the batch evaluators array; RunSummaryCard requests includeScores=true and renders run scores from the embedded scores instead of issuing per-run API calls.

Sequence Diagram

sequenceDiagram
    participant Client as Frontend Client
    participant Controller as Monitor Scores Controller
    participant Service as Monitor Scores Service
    participant Repo as Score Repository
    participant DB as Database

    Client->>Controller: GET /monitors/{m}/scores/timeseries?evaluators=eval1,eval2&startTime&endTime
    Controller->>Service: GetEvaluatorsTimeSeries(monitorID, [eval1,eval2], startTime, endTime)

    Service->>Repo: Probe sparsity for evaluatorNames
    Repo->>DB: Query traces/spans for evaluatorNames
    DB-->>Repo: probe results (sparse/dense)
    Repo-->>Service: probe outcome

    alt Sparse path (trace-level)
        Service->>Repo: GetEvaluatorsTraceAggregated(monitorID, evaluatorNames, startTime, endTime, limit)
        Repo->>DB: Fetch trace-level aggregates per evaluator
        DB-->>Repo: trace aggregates
    else Dense path (bucketed)
        Service->>Repo: GetEvaluatorsTimeSeriesAggregated(monitorID, evaluatorNames, startTime, endTime, granularity)
        Repo->>DB: Fetch time-bucket aggregates per evaluator
        DB-->>Repo: bucket aggregates
    end

    Repo-->>Service: Batch aggregation results
    Service-->>Controller: BatchTimeSeriesResponse (preserve input evaluator order)
    Controller-->>Client: JSON { monitorName, granularity, evaluators: [{evaluatorName, points: [...]}] }

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I hopped through code with nimble paws,

Turned many calls into just one clause,
Evaluators now gather in a single run,
Runs carry scores so the UI’s work is done,
A little rabbit cheers: efficient and fun!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 65.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions that are missing them.
  • Description check (❓ Inconclusive): the PR description provides essential information but lacks several template sections. It includes Purpose (resolves #471, #472) and a basic test plan, but is missing Goals, Approach, User stories, Release note, Documentation, Training, Certification, Marketing, Detailed test coverage, Security checks, Samples, Related PRs, Migrations, Test environment, and Learning sections. Resolution: expand the description with a Goals section explaining the solutions and an Approach section with implementation details, and at minimum mark non-applicable sections such as Documentation, Training, Certification, Marketing, Samples, and Learning as N/A with explanations.
✅ Passed checks (3 passed)
  • Title check (✅ Passed): the title "Batch time-series endpoint and embedded run scores" accurately and concisely summarizes the main changes, covering both key objectives: the new batch endpoint and the embedded scores feature.
  • Linked Issues check (✅ Passed): the implementation comprehensively addresses both linked issues, #471 (batch time-series with an evaluators parameter) and #472 (includeScores parameter for ListMonitorRuns with embedded evaluator scores), including all required schema changes, repository methods, service updates, and frontend adjustments.
  • Out of Scope Changes check (✅ Passed): all code changes are directly aligned with the two linked issues. Backend changes implement batch time-series and embedded scores, frontend changes adapt to the new API contracts, and test updates support the new functionality.



@coderabbitai (Bot) left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
agent-manager-service/docs/api_v1_openapi.yaml (1)

5576-5628: ⚠️ Potential issue | 🟠 Major

Keep the legacy evaluator query param in the contract.

The linked objective says this route should add evaluators while keeping evaluator for backward compatibility. Right now the spec removes evaluator, and agent-manager-service/controllers/monitor_scores_controller.go:197-230 hard-fails when evaluators is absent. Existing clients that still send evaluator=... will now get a 400 instead of continuing to work.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agent-manager-service/docs/api_v1_openapi.yaml` around lines 5576-5628: the
OpenAPI contract removed the legacy single "evaluator" query param causing older
clients to break; update the GET
/orgs/{orgName}/projects/{projName}/agents/{agentName}/monitors/{monitorName}/scores/timeseries
(operationId getMonitorScoresTimeSeries) to include a non-required query
parameter named "evaluator" (type: string, in: query) alongside the new
"evaluators" param and note in its description that it is legacy and equivalent
to a single-item evaluators list; this will align the spec with
agent-manager-service/controllers/monitor_scores_controller.go (which currently
accepts/depends on either) and preserve backward compatibility.
console/workspaces/pages/eval/src/subComponents/PerformanceByEvaluatorCard.tsx (1)

112-119: ⚠️ Potential issue | 🟡 Minor

Format the x-axis from the returned granularity.

The batch endpoint can now return minute/hour/day/week, but the chart still always formats buckets with HH:mm. For day and week series that renders misleading 00:00 labels and makes the trend harder to read.

Suggested fix
-      const label = date.toLocaleString(undefined, {
-        month: "short",
-        day: "numeric",
-        hour: "2-digit",
-        minute: "2-digit",
-        hour12: false,
-      });
+      const label = formatBucketLabel(date, timeSeriesByEvaluator.granularity);

const formatBucketLabel = (date: Date, granularity?: string) => {
  switch (granularity) {
    case "day":
    case "week":
      return date.toLocaleDateString(undefined, {
        month: "short",
        day: "numeric",
      });
    default:
      return date.toLocaleString(undefined, {
        month: "short",
        day: "numeric",
        hour: "2-digit",
        minute: "2-digit",
        hour12: false,
      });
  }
};

Also applies to: 128-136

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@console/workspaces/pages/eval/src/subComponents/PerformanceByEvaluatorCard.tsx`
around lines 112-119: the x-axis is always formatted as "HH:mm"; introduce a
formatter that uses the returned granularity and apply it when building series
and rendering ticks: add a formatBucketLabel(date, granularity?) function (use
the suggested switch for "day"/"week" vs default) and use
timeSeriesByEvaluator.granularity (or the returned granularity field) when
formatting each point in evaluators.forEach (where seriesMap[evaluatorName] is
created) and when supplying the tick label formatter for the chart (the code
block around lines 128-136 that renders X-axis ticks). Ensure you replace the
hardcoded HH:mm formatting with calls to formatBucketLabel so day/week show "Mon
1" style and minute/hour use date+time.
agent-manager-service/spec/api_default.go (1)

4769-4786: ⚠️ Potential issue | 🟠 Major

Keep the legacy single-evaluator path in the generated client.

Line 4848 now makes evaluators mandatory, and the old Evaluator(...) builder is gone. That breaks the compatibility goal from #471 for existing Go callers that still send the single evaluator query param. Please model both params in the OpenAPI spec, keep them optional, and regenerate so old callers keep compiling while new callers can batch.

Also applies to: 4848-4854

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agent-manager-service/spec/api_default.go` around lines 4769-4786: the
generated client removed the legacy single-evaluator builder and made evaluators
mandatory; update the OpenAPI spec to define both query params "evaluator"
(single) and "evaluators" (comma-separated) as optional, then regenerate the
client so ApiGetMonitorScoresTimeSeriesRequest contains both builder methods
(Evaluator(evaluator string) and Evaluators(evaluators string)) and neither
field is required; ensure the generated getters/setters for Evaluator and
Evaluators exist and accept pointer/optional values so existing callers using
Evaluator(...) keep compiling while new callers can use Evaluators(...).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@agent-manager-service/controllers/monitor_scores_controller.go`:
- Around line 209-218: The handler currently rejects requests that use the
legacy "evaluator" query param; update the parsing logic so if
r.URL.Query().Get("evaluators") is empty it falls back to
r.URL.Query().Get("evaluator") and passes that value into parseEvaluatorsList
(or constructs a single-item slice) so evaluatorNames contains at least one
name; specifically modify the block that reads evaluatorsParam and builds
evaluatorNames (referencing parseEvaluatorsList and the evaluatorNames variable)
to accept either "evaluators" (CSV) or the single legacy "evaluator" parameter
before returning the 400.

In `@agent-manager-service/docs/api_v1_openapi.yaml`:
- Around line 5279-5292: Update the OpenAPI parameter docs for pagination to
match the server's enforcement: in the getAgentTraceScores operation (and other
scores/traces endpoints), change the limit query parameter to include "minimum:
1" and "maximum: 100" and set its default to 100 (instead of 20), and add
"minimum: 0" to the offset parameter while keeping default 0; this aligns the
YAML parameter definitions with the checks implemented in monitor_controller.go
(the limit>0, cap to 100, and non-negative offset logic).

In `@agent-manager-service/utils/makeresults.go`:
- Around line 739-752: The code currently only sets response.Scores when
len(run.Scores) > 0, which drops non-nil empty slices; change the conditional to
check for a non-nil slice (if run.Scores != nil) so requested-but-empty score
lists are preserved, then create scores := make([]spec.EvaluatorScoreSummary,
len(run.Scores)) and populate it (loop will be skipped for len==0) and assign
response.Scores = scores; update references to run.Scores and response.Scores in
makeresults.go accordingly.

---

Outside diff comments:
In `@agent-manager-service/docs/api_v1_openapi.yaml`:
- Around line 5576-5628: The OpenAPI contract removed the legacy single
"evaluator" query param causing older clients to break; update the GET
/orgs/{orgName}/projects/{projName}/agents/{agentName}/monitors/{monitorName}/scores/timeseries
(operationId getMonitorScoresTimeSeries) to include a non-required query
parameter named "evaluator" (type: string, in: query) alongside the new
"evaluators" param and note in its description that it is legacy and equivalent
to a single-item evaluators list; this will align the spec with
agent-manager-service/controllers/monitor_scores_controller.go (which currently
accepts/depends on either) and preserve backward compatibility.

In `@agent-manager-service/spec/api_default.go`:
- Around line 4769-4786: The generated client removed the legacy
single-evaluator builder and made evaluators mandatory; update the OpenAPI spec
to define both query params "evaluator" (single) and "evaluators"
(comma-separated) as optional, then regenerate the client so
ApiGetMonitorScoresTimeSeriesRequest contains both builder methods
(Evaluator(evaluator string) and Evaluators(evaluators string)) and neither
field is required; ensure the generated getters/setters for Evaluator and
Evaluators exist and accept pointer/optional values so existing callers using
Evaluator(...) keep compiling while new callers can use Evaluators(...).

In
`@console/workspaces/pages/eval/src/subComponents/PerformanceByEvaluatorCard.tsx`:
- Around line 112-119: The x-axis is always formatted as "HH:mm"; introduce a
formatter that uses the returned granularity and apply it when building series
and rendering ticks: add a formatBucketLabel(date, granularity?) function (use
the suggested switch for "day"/"week" vs default) and use
timeSeriesByEvaluator.granularity (or the returned granularity field) when
formatting each point in evaluators.forEach (where seriesMap[evaluatorName] is
created) and when supplying the tick label formatter for the chart (the code
block around lines 128-136 that renders X-axis ticks). Ensure you replace the
hardcoded HH:mm formatting with calls to formatBucketLabel so day/week show "Mon
1" style and minute/hour use date+time.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5cb4551e-eecb-4df5-825b-8e222d040a5f

📥 Commits

Reviewing files that changed from the base of the PR and between 929031a and dbfd1cd.

⛔ Files ignored due to path filters (1)
  • console/common/config/rush/pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (20)
  • agent-manager-service/controllers/monitor_controller.go
  • agent-manager-service/controllers/monitor_scores_controller.go
  • agent-manager-service/docs/api_v1_openapi.yaml
  • agent-manager-service/models/monitor.go
  • agent-manager-service/models/score.go
  • agent-manager-service/repositories/evaluation_score_repository.go
  • agent-manager-service/services/monitor_manager.go
  • agent-manager-service/services/monitor_scores_service.go
  • agent-manager-service/spec/api_default.go
  • agent-manager-service/spec/model_batch_time_series_evaluator_series.go
  • agent-manager-service/spec/model_batch_time_series_response.go
  • agent-manager-service/spec/model_monitor_run_response.go
  • agent-manager-service/tests/monitor_scores_test.go
  • agent-manager-service/utils/makeresults.go
  • agent-manager-service/wiring/wire_gen.go
  • console/workspaces/libs/api-client/src/apis/monitors.ts
  • console/workspaces/libs/api-client/src/hooks/monitors.ts
  • console/workspaces/libs/types/src/api/monitors.ts
  • console/workspaces/pages/eval/src/subComponents/PerformanceByEvaluatorCard.tsx
  • console/workspaces/pages/eval/src/subComponents/RunSummaryCard.tsx

@coderabbitai (Bot) left a comment

🧹 Nitpick comments (1)
agent-manager-service/tests/monitor_scores_test.go (1)

304-395: Add one real multi-evaluator contract test.

The updated suite covers basic validation, but it still doesn't lock down the new CSV parsing rules or a successful multi-evaluator response. A few cases, such as evaluators=a, b ,a (duplicates and whitespace), the legacy ?evaluator=latency, and more than 50 unique names, plus one happy-path assertion on ordered evaluator series, would make regressions here much harder to miss.

Based on learnings: in wso2/agent-manager, update all code paths handling GET /monitors/{monitorName}/scores/timeseries to accept the new evaluators query parameter (plural, comma-separated) and remove any usage or parsing of the old singular evaluator parameter.

Also applies to: 403-445

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agent-manager-service/tests/monitor_scores_test.go` around lines 304-395:
Add a real multi-evaluator contract test and remove legacy singular parsing:
update the test suite around TestGetScoresTimeSeries_Validation and
TestGetScoresTimeSeries_ValidRanges (using newScoresHandler and the same base
path) to include cases for evaluators=a,b,a (duplicates), evaluators with >50
unique names (expect 400), and a request using evaluator=latency (singular) to
confirm it is rejected; also add one happy-path integration-style test that
sends evaluators=latency,throughput (comma-separated) and asserts the response
contains ordered evaluator time series for those names. Then change the server
code path that handles GET /monitors/{monitorName}/scores/timeseries (the
handler invoked by handler.ServeHTTP/newScoresHandler) to parse the plural
"evaluators" query param as a comma-separated list, enforce max-unique limit
(50), dedupe when appropriate, and remove any parsing/acceptance of the old
"evaluator" singular parameter so only the plural CSV form is supported.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@agent-manager-service/tests/monitor_scores_test.go`:
- Around line 304-395: Add a real multi-evaluator contract test and remove
legacy singular parsing: update the test suite around
TestGetScoresTimeSeries_Validation and TestGetScoresTimeSeries_ValidRanges
(using newScoresHandler and the same base path) to include cases for
evaluators=a,b,a (duplicates), evaluators with >50 unique names (expect 400),
and a request using evaluator=latency (singular) to confirm it is rejected; also
add one happy-path integration-style test that sends
evaluators=latency,throughput (comma-separated) and asserts the response
contains ordered evaluator time series for those names. Then change the server
code path that handles GET /monitors/{monitorName}/scores/timeseries (the
handler invoked by handler.ServeHTTP/newScoresHandler) to parse the plural
"evaluators" query param as a comma-separated list, enforce max-unique limit
(50), dedupe when appropriate, and remove any parsing/acceptance of the old
"evaluator" singular parameter so only the plural CSV form is supported.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 47b35a72-d201-4f66-8b98-3b238237b676

📥 Commits

Reviewing files that changed from the base of the PR and between dbfd1cd and 71d5d3e.

📒 Files selected for processing (5)
  • agent-manager-service/controllers/monitor_scores_controller.go
  • agent-manager-service/models/score.go
  • agent-manager-service/services/monitor_scores_service.go
  • agent-manager-service/tests/monitor_scores_test.go
  • console/workspaces/pages/eval/src/subComponents/RunSummaryCard.tsx

nadheesh merged commit 4c1c134 into wso2:main on Mar 9, 2026
8 of 9 checks passed


Development

Successfully merging this pull request may close these issues:

  • Embed run scores in ListMonitorRuns response (#472)
  • Batch time-series endpoint for multiple evaluators (#471)