fix(evaluators): update route api by nina-kollman · Pull Request #3977 · traceloop/openllmetry

nina-kollman · 2026-04-12T06:58:35Z

I have added tests that cover my changes.
If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
(If applicable) I have updated the documentation accordingly.

Summary by CodeRabbit

Breaking Changes
- Evaluator execution calls now require an experiment identifier so runs are associated with a parent experiment.
- Evaluator execution endpoints are now scoped under experiment/run/task paths.
- Execution requests must include an evaluator identifier.
- Task creation endpoint path changed to use the plural "tasks".

coderabbitai · 2026-04-12T06:58:51Z

Walkthrough

Evaluator execution routing changed from evaluator-slug-scoped endpoints to experiment/run/task-scoped endpoints; ExecuteEvaluatorRequest now requires evaluator_slug; run_experiment_evaluator and trigger_experiment_evaluator accept and propagate experiment_slug; task creation endpoint path pluralized to /tasks.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Evaluator Execution & API Calls `packages/traceloop-sdk/traceloop/sdk/evaluator/evaluator.py` | Renamed `_execute_evaluator_request` → `_execute_experiment_evaluator_request`; executor signature expanded to accept `experiment_slug`, `experiment_run_id`, `task_id`; POST route changed from evaluator-scoped `/v2/evaluators/slug/{evaluator_slug}/execute` to experiment/run/task-scoped `/v2/experiments/{experiment_slug}/runs/{experiment_run_id}/tasks/{task_id}`; `run_experiment_evaluator()` and `trigger_experiment_evaluator()` now accept `experiment_slug` and pass it to the new executor. |
| Request Model `packages/traceloop-sdk/traceloop/sdk/evaluator/model.py` | `ExecuteEvaluatorRequest` updated to include required field `evaluator_slug: str`. |
| Experiment Task Endpoint & Calls `packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py` | Calls to evaluator methods in `run_single_row()` updated to pass `experiment_slug`; task creation POST path changed from `/experiments/{experiment_slug}/runs/{experiment_run_id}/task` → `/experiments/{experiment_slug}/runs/{experiment_run_id}/tasks`. |
| Guardrails Caller `packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py` | `Guardrails.execute_evaluator` now supplies `experiment_slug="guardrail"` when invoking `_evaluator.run_experiment_evaluator` (uses same dummy IDs for task/run). |
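To make the route and request-model changes in the table concrete, here is a minimal sketch. The two path templates are copied from the summary above; the helper names, the `ExecuteEvaluatorRequestSketch` dataclass, and its `input` field are hypothetical stand-ins and are not the SDK's actual code.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ExecuteEvaluatorRequestSketch:
    """Stand-in for the request model; only evaluator_slug comes from the PR."""

    evaluator_slug: str  # required: omitting it raises TypeError at construction
    input: Dict[str, Any] = field(default_factory=dict)  # hypothetical field


def old_execute_path(evaluator_slug: str) -> str:
    """Evaluator-scoped route described as the pre-PR path."""
    return f"/v2/evaluators/slug/{evaluator_slug}/execute"


def new_execute_path(experiment_slug: str, experiment_run_id: str, task_id: str) -> str:
    """Experiment/run/task-scoped route described as the post-PR path."""
    return f"/v2/experiments/{experiment_slug}/runs/{experiment_run_id}/tasks/{task_id}"
```

Because the evaluator slug no longer appears in the URL, it has to travel in the request body, which is presumably why the model gained the required `evaluator_slug` field.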

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 I hopped through code, made routes refine,
From slugs to runs, each task in line.
A tiny change, yet neat and spry,
Parameters dance and endpoints fly.
Happy hops—evaluations sigh! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title 'fix(evaluators): update route api' is vague and overly generic, using non-descriptive language that fails to convey the specific nature of the changes. | Revise the title to be more specific and descriptive, such as 'fix(evaluators): update evaluator endpoint routes to use experiment context' or similar, to clearly indicate what API routes were changed and why. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

packages/traceloop-sdk/traceloop/sdk/evaluator/evaluator.py (1)

102-113: ⚠️ Potential issue | 🔴 Critical

Required experiment_slug breaks an existing caller path.

This signature change is not fully propagated: packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py (context snippet Line 195-213) still calls run_experiment_evaluator(...) without experiment_slug, which will raise a runtime TypeError.

Proposed follow-up fix (caller update)

-            result = await self._evaluator.run_experiment_evaluator(
+            result = await self._evaluator.run_experiment_evaluator(
                 evaluator_slug=slug,
+                experiment_slug="guardrail",
                 task_id="guardrail",
                 experiment_id="guardrail",
                 experiment_run_id="guardrail",
                 input=data,
                 timeout_in_sec=120,
                 evaluator_version=evaluator_version,
                 evaluator_config=evaluator_config,
             )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@packages/traceloop-sdk/traceloop/sdk/evaluator/evaluator.py` around lines 102
- 113, The new required parameter experiment_slug in run_experiment_evaluator
breaks existing callers (e.g., guardrails.py) — make the change non-breaking by
reverting experiment_slug to an optional parameter with a default (e.g., None)
in the run_experiment_evaluator signature or update all callers to pass the new
argument; specifically, modify the function definition of
run_experiment_evaluator to accept experiment_slug: Optional[str] = None (or
update the caller in guardrails.py to pass a valid experiment_slug when invoking
run_experiment_evaluator) and add any necessary None-handling inside
run_experiment_evaluator where experiment_slug is used.
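The failure mode and the "optional with a default" alternative from this comment can be sketched as follows. Both runner functions are simplified stand-ins that only return the route they would call, and `FALLBACK_EXPERIMENT_SLUG` is a hypothetical choice of None-handling, not something the review or the SDK prescribes.

```python
import asyncio
from typing import Optional

FALLBACK_EXPERIMENT_SLUG = "guardrail"  # hypothetical fallback, for illustration only


async def strict_runner(
    *, evaluator_slug: str, experiment_slug: str, task_id: str, experiment_run_id: str
) -> str:
    """Required experiment_slug: callers that omit it fail at call time."""
    return f"/v2/experiments/{experiment_slug}/runs/{experiment_run_id}/tasks/{task_id}"


async def lenient_runner(
    *,
    evaluator_slug: str,
    task_id: str,
    experiment_run_id: str,
    experiment_slug: Optional[str] = None,
) -> str:
    """Optional experiment_slug with explicit None-handling."""
    slug = experiment_slug if experiment_slug is not None else FALLBACK_EXPERIMENT_SLUG
    return f"/v2/experiments/{slug}/runs/{experiment_run_id}/tasks/{task_id}"


def call_without_slug(runner):
    """Call a runner without experiment_slug; return the route or the TypeError."""
    try:
        # Argument binding happens here, before any coroutine is created.
        coro = runner(evaluator_slug="my-eval", task_id="guardrail", experiment_run_id="guardrail")
    except TypeError as exc:
        return exc
    return asyncio.run(coro)
```

With these stand-ins, `call_without_slug(strict_runner)` returns a `TypeError`, while `call_without_slug(lenient_runner)` returns a route built from the fallback slug. Updating the caller, as the proposed diff does, avoids having to pick a fallback at all.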

packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py (1)

193-207: ⚠️ Potential issue | 🟠 Major

Please add regression tests for the route/signature migration.

These changes alter both evaluator invocation wiring and task endpoint paths. With the PR checklist item still unchecked, this should be covered by tests to prevent silent runtime/API regressions.

Also applies to: 441-442

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py` around lines
193 - 207, Add regression tests that exercise the migrated evaluator invocation
wiring and the task endpoint path/signature changes: write tests that call the
code path that uses _evaluator.trigger_experiment_evaluator (and the branch that
previously populated eval_results[evaluator_slug]) to ensure the evaluator
invocation is passed the correct evaluator_slug, experiment_slug,
evaluator_config, task_id and experiment_run_id; additionally add
integration-style tests hitting the task endpoint(s) that were renamed/moved to
verify request routing and payload signature still reach the new handler and
that the evaluator trigger is invoked with expected params. Use mocks/spies for
trigger_experiment_evaluator to assert call arguments and include both success
and error paths so changes to wiring or routes will fail tests if regressed.
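One possible shape for the route half of such a regression test is sketched below. The `create_task` function is a hypothetical stand-in for the SDK's task-creation call, and the HTTP client's `post(path, payload)` method is an assumption; only the pluralized path template comes from the PR summary.

```python
import asyncio
from unittest.mock import AsyncMock


async def create_task(http_client, experiment_slug: str, experiment_run_id: str, payload: dict):
    """Hypothetical stand-in for the code that creates an experiment task."""
    path = f"/experiments/{experiment_slug}/runs/{experiment_run_id}/tasks"
    return await http_client.post(path, payload)


def test_task_creation_uses_plural_path():
    # AsyncMock records awaited calls, so the exact path can be asserted.
    http_client = AsyncMock()
    asyncio.run(create_task(http_client, "my-exp", "run-1", {"input": {}}))
    http_client.post.assert_awaited_once_with(
        "/experiments/my-exp/runs/run-1/tasks", {"input": {}}
    )
```

A test against the real SDK would patch its HTTP client in the same way, so that reverting the path to the singular `/task` makes the assertion fail.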

🧹 Nitpick comments (1)

packages/traceloop-sdk/traceloop/sdk/evaluator/evaluator.py (1)
105-105: Docstrings should include experiment_slug in Args.

Both public method signatures now require experiment_slug, but the parameter documentation wasn’t updated.

Also applies to: 152-152
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/traceloop-sdk/traceloop/sdk/evaluator/evaluator.py` at line 105,
Update the docstrings for the public methods in this module that now accept the
parameter experiment_slug: add an "experiment_slug: str" entry to each method's
Args section describing what the slug represents and how it's used (match the
existing docstring style/format in evaluator.py); ensure both docstrings for the
two public methods whose signatures include experiment_slug are updated so the
Args block lists experiment_slug alongside the other parameters.
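For illustration, an `Args` block that documents `experiment_slug` might look like the sketch below. The signature is abbreviated and the parameter descriptions are assumptions; the real method in evaluator.py may take different parameters and use a different docstring style.

```python
from typing import Any, Dict


async def run_experiment_evaluator(
    evaluator_slug: str,
    experiment_slug: str,
    task_id: str,
    experiment_run_id: str,
    input: Dict[str, Any],
) -> None:
    """Run an evaluator for one task of an experiment run (abbreviated sketch).

    Args:
        evaluator_slug: Slug of the evaluator to execute.
        experiment_slug: Slug of the parent experiment; used to build the
            experiment/run/task-scoped route.
        task_id: Identifier of the task being evaluated.
        experiment_run_id: Identifier of the experiment run the task belongs to.
        input: Input payload passed to the evaluator.
    """
    # Body omitted: this sketch only demonstrates the docstring layout.
```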

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 069e810a-c03a-4763-8fe3-9bf822a5b0fd

📥 Commits

Reviewing files that changed from the base of the PR and between 786d49f and 950d369.

📒 Files selected for processing (2)

packages/traceloop-sdk/traceloop/sdk/evaluator/evaluator.py
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py (1)
202-212: ⚠️ Potential issue | 🟠 Major

Add a regression test for the new experiment-scoped call contract.

Line 204 introduces a required routing parameter (experiment_slug) in this guardrails path. Please add a unit test that verifies Guardrails.execute_evaluator() forwards experiment_slug="guardrail" and the expected dummy IDs to Evaluator.run_experiment_evaluator(...), to prevent silent breakage on future API route changes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py` around lines
202 - 212, Add a unit test that mocks the Evaluator.run_experiment_evaluator and
asserts Guardrails.execute_evaluator forwards the new experiment-scoped routing
parameters: ensure the test calls Guardrails.execute_evaluator (with suitable
dummy input `data`) and verifies the mock was called with
experiment_slug="guardrail", evaluator_slug equal to the passed `slug`, and the
expected dummy IDs for task_id, experiment_id, and experiment_run_id all set to
"guardrail", plus the evaluator_version, evaluator_config, timeout_in_sec=120
and input=data; use the Guardrails.execute_evaluator and
Evaluator.run_experiment_evaluator symbols to locate and instrument the call
(patch/magicmock) and add this test to the guardrails tests suite.
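A sketch of the requested test follows. Because the real `Guardrails` constructor is not shown in this review, `GuardrailsStandIn` is a hypothetical class that mirrors only the call from the diff; a real test would import `Guardrails` from the SDK and patch its `_evaluator` instead.

```python
import asyncio
from unittest.mock import AsyncMock


class GuardrailsStandIn:
    """Hypothetical stand-in mirroring the call shown in the review diff."""

    def __init__(self, evaluator):
        self._evaluator = evaluator

    async def execute_evaluator(self, slug, data, evaluator_version=None, evaluator_config=None):
        return await self._evaluator.run_experiment_evaluator(
            evaluator_slug=slug,
            experiment_slug="guardrail",
            task_id="guardrail",
            experiment_id="guardrail",
            experiment_run_id="guardrail",
            input=data,
            timeout_in_sec=120,
            evaluator_version=evaluator_version,
            evaluator_config=evaluator_config,
        )


def test_execute_evaluator_forwards_guardrail_context():
    evaluator = AsyncMock()
    guardrails = GuardrailsStandIn(evaluator)
    data = {"text": "hello"}
    asyncio.run(guardrails.execute_evaluator("my-eval", data, evaluator_version="v1"))
    # Fails if any routing parameter is dropped or renamed.
    evaluator.run_experiment_evaluator.assert_awaited_once_with(
        evaluator_slug="my-eval",
        experiment_slug="guardrail",
        task_id="guardrail",
        experiment_id="guardrail",
        experiment_run_id="guardrail",
        input=data,
        timeout_in_sec=120,
        evaluator_version="v1",
        evaluator_config=None,
    )
```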

🧹 Nitpick comments (1)

packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py (1)

202-207: Extract the repeated "guardrail" context ID into a constant.

Using one constant here reduces typo/divergence risk across experiment_slug, task_id, experiment_id, and experiment_run_id.

♻️ Proposed refactor

+GUARDRAIL_CONTEXT_ID = "guardrail"
...
         result = await self._evaluator.run_experiment_evaluator(
             evaluator_slug=slug,
-            experiment_slug="guardrail",
-            task_id="guardrail",
-            experiment_id="guardrail",
-            experiment_run_id="guardrail",
+            experiment_slug=GUARDRAIL_CONTEXT_ID,
+            task_id=GUARDRAIL_CONTEXT_ID,
+            experiment_id=GUARDRAIL_CONTEXT_ID,
+            experiment_run_id=GUARDRAIL_CONTEXT_ID,
             input=data,
             timeout_in_sec=120,
             evaluator_version=evaluator_version,

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py` around lines
202 - 207, The four repeated literal "guardrail" context IDs passed into
self._evaluator.run_experiment_evaluator (experiment_slug, task_id,
experiment_id, experiment_run_id) should be consolidated into a single constant
(e.g., GUARDRAIL_CONTEXT = "guardrail") declared near the top of the module or
class; replace the four literal usages in the call to
_evaluator.run_experiment_evaluator with that constant to avoid duplication and
typos (update any other occurrences in guardrails.py that use the same literal
to use the constant as well).

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b431977d-c1dd-4ba3-b59c-90b621aab12d

📥 Commits

Reviewing files that changed from the base of the PR and between 609a671 and dfa9b27.

📒 Files selected for processing (1)

packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py

nina-kollman added 3 commits April 12, 2026 09:30

to exp request

9cbf5f8

exp run

438fb5e

change

950d369

coderabbitai bot reviewed Apr 12, 2026

nina-kollman added 2 commits April 12, 2026 10:19

added slug

609a671

fix lint

dfa9b27

coderabbitai bot reviewed Apr 12, 2026

OzBenSimhonTraceloop approved these changes Apr 12, 2026

nina-kollman merged commit fb57594 into main Apr 12, 2026
12 of 13 checks passed

nina-kollman merged 5 commits into main from nk/eval_routes
