fix(evaluators): update route api #3977

Merged
nina-kollman merged 5 commits into main from nk/eval_routes
Apr 12, 2026

@nina-kollman
Contributor

@nina-kollman nina-kollman commented Apr 12, 2026

  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • (If applicable) I have updated the documentation accordingly.

Summary by CodeRabbit

  • Breaking Changes
    • Evaluator execution calls now require an experiment identifier so runs are associated with a parent experiment.
    • Evaluator execution endpoints are now scoped under experiment/run/task paths.
    • Execution requests must include an evaluator identifier.
    • Task creation endpoint path changed to use the plural "tasks".

@coderabbitai

coderabbitai bot commented Apr 12, 2026

📝 Walkthrough

Evaluator execution routing changed from evaluator-slug-scoped endpoints to experiment/run/task-scoped endpoints; ExecuteEvaluatorRequest now requires evaluator_slug; run_experiment_evaluator and trigger_experiment_evaluator accept and propagate experiment_slug; task creation endpoint path pluralized to /tasks.
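
To make the routing change concrete, here is a minimal sketch of the old and new POST paths side by side. Only the two path templates and the required `evaluator_slug` field come from the walkthrough; the helper names `old_execute_path` and `new_execute_path` are hypothetical, and the dataclass is an illustrative stand-in rather than the SDK's actual model.

```python
from dataclasses import dataclass


@dataclass
class ExecuteEvaluatorRequest:
    """Illustrative stand-in for the SDK request model; other fields are omitted."""

    # Required by this PR. The new URL no longer carries the evaluator slug,
    # so the request body has to identify the evaluator.
    evaluator_slug: str


def old_execute_path(evaluator_slug: str) -> str:
    # Evaluator-scoped route used before this PR (hypothetical helper name).
    return f"/v2/evaluators/slug/{evaluator_slug}/execute"


def new_execute_path(experiment_slug: str, experiment_run_id: str, task_id: str) -> str:
    # Experiment/run/task-scoped route introduced by this PR (hypothetical helper name).
    return f"/v2/experiments/{experiment_slug}/runs/{experiment_run_id}/tasks/{task_id}"
```

Under these assumptions, a caller that previously needed only the evaluator slug now needs the experiment slug, run ID, and task ID to build the URL, plus the evaluator slug in the request body.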

Changes

| Cohort / File(s) | Summary |
|---|---|
| Evaluator Execution & API Calls: packages/traceloop-sdk/traceloop/sdk/evaluator/evaluator.py | Renamed _execute_evaluator_request → _execute_experiment_evaluator_request; executor signature expanded to accept experiment_slug, experiment_run_id, task_id; POST route changed from evaluator-scoped /v2/evaluators/slug/{evaluator_slug}/execute to experiment/run/task-scoped /v2/experiments/{experiment_slug}/runs/{experiment_run_id}/tasks/{task_id}; run_experiment_evaluator() and trigger_experiment_evaluator() now accept experiment_slug and pass it to the new executor. |
| Request Model: packages/traceloop-sdk/traceloop/sdk/evaluator/model.py | ExecuteEvaluatorRequest updated to include required field evaluator_slug: str. |
| Experiment Task Endpoint & Calls: packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py | Calls to evaluator methods in run_single_row() updated to pass experiment_slug; task creation POST path changed from /experiments/{experiment_slug}/runs/{experiment_run_id}/task → /experiments/{experiment_slug}/runs/{experiment_run_id}/tasks. |
| Guardrails Caller: packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py | Guardrails.execute_evaluator now supplies experiment_slug="guardrail" when invoking _evaluator.run_experiment_evaluator (uses same dummy IDs for task/run). |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 I hopped through code, made routes refine,
From slugs to runs, each task in line.
A tiny change, yet neat and spry,
Parameters dance and endpoints fly.
Happy hops—evaluations sigh! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title 'fix(evaluators): update route api' is vague and overly generic, using non-descriptive language that fails to convey the specific nature of the changes. | Revise the title to be more specific and descriptive, such as 'fix(evaluators): update evaluator endpoint routes to use experiment context' or similar, to clearly indicate what API routes were changed and why. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
packages/traceloop-sdk/traceloop/sdk/evaluator/evaluator.py (1)

102-113: ⚠️ Potential issue | 🔴 Critical

Required experiment_slug breaks an existing caller path.

This signature change is not fully propagated: packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py (context snippet Line 195-213) still calls run_experiment_evaluator(...) without experiment_slug, which will raise a runtime TypeError.

Proposed follow-up fix (caller update)
-            result = await self._evaluator.run_experiment_evaluator(
+            result = await self._evaluator.run_experiment_evaluator(
                 evaluator_slug=slug,
+                experiment_slug="guardrail",
                 task_id="guardrail",
                 experiment_id="guardrail",
                 experiment_run_id="guardrail",
                 input=data,
                 timeout_in_sec=120,
                 evaluator_version=evaluator_version,
                 evaluator_config=evaluator_config,
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/traceloop-sdk/traceloop/sdk/evaluator/evaluator.py` around lines 102
- 113, The new required parameter experiment_slug in run_experiment_evaluator
breaks existing callers (e.g., guardrails.py) — make the change non-breaking by
reverting experiment_slug to an optional parameter with a default (e.g., None)
in the run_experiment_evaluator signature or update all callers to pass the new
argument; specifically, modify the function definition of
run_experiment_evaluator to accept experiment_slug: Optional[str] = None (or
update the caller in guardrails.py to pass a valid experiment_slug when invoking
run_experiment_evaluator) and add any necessary None-handling inside
run_experiment_evaluator where experiment_slug is used.
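
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, assuming keyword-only signatures; `run_required` and `run_optional` are hypothetical stand-ins for `run_experiment_evaluator`, and raising `ValueError` on `None` is just one possible form of the None-handling the prompt mentions.

```python
import asyncio
from typing import Optional


async def run_required(*, evaluator_slug: str, experiment_slug: str) -> str:
    # Hypothetical stand-in: experiment_slug is required, as in the new signature.
    return f"{experiment_slug}/{evaluator_slug}"


async def run_optional(*, evaluator_slug: str, experiment_slug: Optional[str] = None) -> str:
    # Alternative from the review: default to None and handle it explicitly.
    # Raising here is an assumption; a real implementation might fall back instead.
    if experiment_slug is None:
        raise ValueError("experiment_slug is required to build the route")
    return f"{experiment_slug}/{evaluator_slug}"


def call_without_slug() -> str:
    """Call the required-parameter variant the way an un-updated caller would."""
    try:
        # The TypeError is raised when the function is called, before any await.
        asyncio.run(run_required(evaluator_slug="my-eval"))
    except TypeError as exc:
        return type(exc).__name__
    return "no error"
```

Either approach resolves the break; updating the caller keeps the signature strict, while the optional default moves the check to runtime inside the function.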
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py (1)

193-207: ⚠️ Potential issue | 🟠 Major

Please add regression tests for the route/signature migration.

These changes alter both evaluator invocation wiring and task endpoint paths. With the PR checklist item still unchecked, this should be covered by tests to prevent silent runtime/API regressions.

Also applies to: 441-442

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py` around lines
193 - 207, Add regression tests that exercise the migrated evaluator invocation
wiring and the task endpoint path/signature changes: write tests that call the
code path that uses _evaluator.trigger_experiment_evaluator (and the branch that
previously populated eval_results[evaluator_slug]) to ensure the evaluator
invocation is passed the correct evaluator_slug, experiment_slug,
evaluator_config, task_id and experiment_run_id; additionally add
integration-style tests hitting the task endpoint(s) that were renamed/moved to
verify request routing and payload signature still reach the new handler and
that the evaluator trigger is invoked with expected params. Use mocks/spies for
trigger_experiment_evaluator to assert call arguments and include both success
and error paths so changes to wiring or routes will fail tests if regressed.
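
One possible shape for the route half of such a regression test is sketched below. `RecordingClient` and `create_task` are hypothetical stand-ins written for this example; the real SDK's HTTP client and task creation method will differ, and only the pluralized path template comes from the PR.

```python
from typing import Any, Dict, List, Tuple


class RecordingClient:
    """Hypothetical fake HTTP client that records POST calls instead of sending them."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, Any]]] = []

    def post(self, path: str, body: Dict[str, Any]) -> Dict[str, Any]:
        self.calls.append((path, body))
        return {"id": "task-1"}


def create_task(
    client: RecordingClient,
    experiment_slug: str,
    experiment_run_id: str,
    body: Dict[str, Any],
) -> Dict[str, Any]:
    # Stand-in for the SDK's task creation call; note the plural "tasks" segment.
    return client.post(
        f"/experiments/{experiment_slug}/runs/{experiment_run_id}/tasks", body
    )


def test_task_creation_uses_plural_tasks_path() -> None:
    client = RecordingClient()
    create_task(client, "exp", "run-1", {"input": {"q": "hi"}})

    path, _ = client.calls[0]
    assert path == "/experiments/exp/runs/run-1/tasks"
    # Guard against a regression to the old singular path.
    assert not path.endswith("/task")
```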
🧹 Nitpick comments (1)
packages/traceloop-sdk/traceloop/sdk/evaluator/evaluator.py (1)

105-105: Docstrings should include experiment_slug in Args.

Both public method signatures now require experiment_slug, but the parameter documentation wasn’t updated.

Also applies to: 152-152

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/traceloop-sdk/traceloop/sdk/evaluator/evaluator.py` at line 105,
Update the docstrings for the public methods in this module that now accept the
parameter experiment_slug: add an "experiment_slug: str" entry to each method's
Args section describing what the slug represents and how it's used (match the
existing docstring style/format in evaluator.py); ensure both docstrings for the
two public methods whose signatures include experiment_slug are updated so the
Args block lists experiment_slug alongside the other parameters.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 069e810a-c03a-4763-8fe3-9bf822a5b0fd

📥 Commits

Reviewing files that changed from the base of the PR and between 786d49f and 950d369.

📒 Files selected for processing (2)
  • packages/traceloop-sdk/traceloop/sdk/evaluator/evaluator.py
  • packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py


@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py (1)

202-212: ⚠️ Potential issue | 🟠 Major

Add a regression test for the new experiment-scoped call contract.

Line 204 introduces a required routing parameter (experiment_slug) in this guardrails path. Please add a unit test that verifies Guardrails.execute_evaluator() forwards experiment_slug="guardrail" and the expected dummy IDs to Evaluator.run_experiment_evaluator(...), to prevent silent breakage on future API route changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py` around lines
202 - 212, Add a unit test that mocks the Evaluator.run_experiment_evaluator and
asserts Guardrails.execute_evaluator forwards the new experiment-scoped routing
parameters: ensure the test calls Guardrails.execute_evaluator (with suitable
dummy input `data`) and verifies the mock was called with
experiment_slug="guardrail", evaluator_slug equal to the passed `slug`, and the
expected dummy IDs for task_id, experiment_id, and experiment_run_id all set to
"guardrail", plus the evaluator_version, evaluator_config, timeout_in_sec=120
and input=data; use the Guardrails.execute_evaluator and
Evaluator.run_experiment_evaluator symbols to locate and instrument the call
(patch/magicmock) and add this test to the guardrails tests suite.
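
A test along these lines could look like the sketch below. `GuardrailsStandIn` is a hypothetical minimal stand-in whose call mirrors the keyword arguments shown in the review's diff; a real test would patch the evaluator on the actual `Guardrails` class, whose constructor and `execute_evaluator` signature are not shown in this PR.

```python
import asyncio
from unittest.mock import AsyncMock


class GuardrailsStandIn:
    """Hypothetical minimal stand-in for Guardrails; the real class does more."""

    def __init__(self, evaluator) -> None:
        self._evaluator = evaluator

    async def execute_evaluator(
        self, slug, data, evaluator_version=None, evaluator_config=None
    ):
        # Mirrors the call shape shown in the review's diff.
        return await self._evaluator.run_experiment_evaluator(
            evaluator_slug=slug,
            experiment_slug="guardrail",
            task_id="guardrail",
            experiment_id="guardrail",
            experiment_run_id="guardrail",
            input=data,
            timeout_in_sec=120,
            evaluator_version=evaluator_version,
            evaluator_config=evaluator_config,
        )


def test_execute_evaluator_forwards_experiment_slug() -> None:
    evaluator = AsyncMock()  # child attributes are awaitable mocks as well
    guardrails = GuardrailsStandIn(evaluator)
    data = {"text": "hello"}

    asyncio.run(guardrails.execute_evaluator("my-eval", data, evaluator_version="v1"))

    # Fails if any routing argument is dropped, renamed, or changed.
    evaluator.run_experiment_evaluator.assert_awaited_once_with(
        evaluator_slug="my-eval",
        experiment_slug="guardrail",
        task_id="guardrail",
        experiment_id="guardrail",
        experiment_run_id="guardrail",
        input=data,
        timeout_in_sec=120,
        evaluator_version="v1",
        evaluator_config=None,
    )
```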
🧹 Nitpick comments (1)
packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py (1)

202-207: Extract the repeated "guardrail" context ID into a constant.

Using one constant here reduces typo/divergence risk across experiment_slug, task_id, experiment_id, and experiment_run_id.

♻️ Proposed refactor
+GUARDRAIL_CONTEXT_ID = "guardrail"
...
         result = await self._evaluator.run_experiment_evaluator(
             evaluator_slug=slug,
-            experiment_slug="guardrail",
-            task_id="guardrail",
-            experiment_id="guardrail",
-            experiment_run_id="guardrail",
+            experiment_slug=GUARDRAIL_CONTEXT_ID,
+            task_id=GUARDRAIL_CONTEXT_ID,
+            experiment_id=GUARDRAIL_CONTEXT_ID,
+            experiment_run_id=GUARDRAIL_CONTEXT_ID,
             input=data,
             timeout_in_sec=120,
             evaluator_version=evaluator_version,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py` around lines
202 - 207, The four repeated literal "guardrail" context IDs passed into
self._evaluator.run_experiment_evaluator (experiment_slug, task_id,
experiment_id, experiment_run_id) should be consolidated into a single constant
(e.g., GUARDRAIL_CONTEXT = "guardrail") declared near the top of the module or
class; replace the four literal usages in the call to
_evaluator.run_experiment_evaluator with that constant to avoid duplication and
typos (update any other occurrences in guardrails.py that use the same literal
to use the constant as well).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b431977d-c1dd-4ba3-b59c-90b621aab12d

📥 Commits

Reviewing files that changed from the base of the PR and between 609a671 and dfa9b27.

📒 Files selected for processing (1)
  • packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py

@nina-kollman nina-kollman merged commit fb57594 into main Apr 12, 2026
12 of 13 checks passed

2 participants