Skip to content

fix(traceloop-sdk): Add csv and json support to experiment #864

Merged
nina-kollman merged 4 commits intomainfrom
nk/exp_result
Jan 18, 2026
Merged

fix(traceloop-sdk): Add csv and json support to experiment #864
nina-kollman merged 4 commits intomainfrom
nk/exp_result

Conversation

@nina-kollman
Copy link
Copy Markdown
Contributor

@nina-kollman nina-kollman commented Jan 15, 2026

Fixes: TLP-1233


Important

Add CSV and JSON export functionality to experiment results in Traceloop SDK with tests for validation and error handling.

  • New Features:
    • Add toCsvString() and toJsonString() methods in Experiment class for exporting results.
    • Automatically export results to experiment_results.csv and experiment_results.json in sample_experiment.ts.
  • Behavior:
    • Export methods use last run's experiment slug and run ID if not provided.
    • Throws error if experiment_slug or run_id is missing.
  • Tests:
    • Add experiment-export.test.ts to test CSV/JSON export, parameter validation, and error handling.

This description was created by Ellipsis for f4417c3. You can customize this summary. It will automatically update as commits are pushed.


Summary by CodeRabbit

  • New Features

    • Added CSV and JSON export of experiment results, with optional automatic runtime export to files after runs.
    • Exports accept explicit parameters or will default to the most recent run when parameters are omitted.
  • Tests

    • Added tests for CSV/JSON export, parameter validation, error handling, and last-run default behavior.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Jan 15, 2026

📝 Walkthrough

Walkthrough

Adds CSV and JSON export methods to the Experiment client that resolve slug/runId (with last-used fallbacks); the sample app writes exported CSV/JSON files at runtime; tests exercising exports via Polly.js were added.

Changes

Cohort / File(s) Summary
Core export functionality
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts
Added private _lastExperimentSlug and _lastRunId; added resolveExportParams(...); added toCsvString(...) and toJsonString(...) which call /export/csv and /export/json, validate responses, and return strings.
Tests
packages/traceloop-sdk/test/experiment-export.test.ts
New Polly.js-based test suite (HTTP/Fetch adapters, filesystem persister) covering CSV/JSON exports with explicit params, missing-param error cases, and last-run fallback behavior.
Sample application
packages/sample-app/src/sample_experiment.ts
Added fs import and runtime calls to client.experiment.toCsvString() and toJsonString(); writes experiment_results.csv and experiment_results.json.

Sequence Diagram(s)

sequenceDiagram
  actor User
  participant SampleApp as Sample App
  participant Experiment as Experiment Client
  participant API as Traceloop API
  participant FS as Filesystem

  User->>SampleApp: trigger export flow
  SampleApp->>Experiment: call toCsvString()/toJsonString()
  Experiment->>Experiment: resolveExportParams(experimentSlug?, runId?)
  Experiment->>API: GET /export/csv or /export/json (slug, runId)
  API-->>Experiment: 200 + CSV/JSON payload
  Experiment-->>SampleApp: return string result
  SampleApp->>FS: write experiment_results.csv / experiment_results.json
  FS-->>SampleApp: write confirmation
  SampleApp-->>User: export complete
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I hopped through code to fetch the rows,
CSV and JSON where the data grows.
I kept the slug and chased the run,
Wrote files at dusk and danced at sun.
🥕✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding CSV and JSON export support to the experiment functionality in traceloop-sdk.
Linked Issues check ✅ Passed The PR fully implements the coding requirements from TLP-1233: adds toCsvString() and toJsonString() methods with parameter resolution, error handling, and comprehensive test coverage.
Out of Scope Changes check ✅ Passed All changes are directly related to implementing CSV/JSON export functionality: experiment methods, sample app demonstration, and dedicated test suite; no unrelated modifications detected.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings


📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 77c0da9 and f4417c3.

📒 Files selected for processing (1)
  • packages/traceloop-sdk/test/experiment-export.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/traceloop-sdk/test/experiment-export.test.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and test

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

Looks good to me! 👍

Reviewed everything up to c14fb77 in 2 minutes and 6 seconds. Click for details.
  • Reviewed 309 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/sample-app/src/sample_experiment.ts:164
  • Draft comment:
    Consider adding error handling around the synchronous file writes. Using fs.writeFileSync can throw errors (e.g. due to permission issues); wrapping these calls in try-catch or using asynchronous alternatives (fs.promises.writeFile) would make the export more robust.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% Looking at the full context, the fs.writeFileSync calls on lines 168 and 171 are already inside a try-catch block that starts at line 115 and has a catch block at line 173. This catch block handles errors and logs them appropriately. The comment suggests adding error handling, but error handling is already present. The comment is technically incorrect because it claims error handling is needed when it already exists. The suggestion about using async alternatives is a code quality suggestion, but it's not actionable as a "must fix" issue - it's more of a preference, and the current code is already protected by the try-catch. Could the comment be referring to more granular error handling specifically for the file operations, separate from the broader experiment error handling? Perhaps the author wants specific error messages for file write failures rather than generic experiment operation errors? While more granular error handling could be beneficial, the comment states that the code "can throw errors" and needs wrapping in try-catch, which is already done. The comment doesn't specifically ask for more granular error messages. The existing error handling is sufficient and the comment is factually incorrect about the lack of error handling. The comment should be deleted because it incorrectly suggests that error handling is missing when the code is already wrapped in a try-catch block (lines 115-181). The existing error handling will catch any errors from fs.writeFileSync.
2. packages/traceloop-sdk/src/lib/client/experiment/experiment.ts:500
  • Draft comment:
    Storing the last experiment slug and run ID in instance variables (_lastExperimentSlug, _lastRunId) works for sequential runs but may lead to race conditions in concurrent scenarios. Consider documenting this behavior or refactoring for concurrency if needed.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This comment is speculative - it says "may lead to race conditions in concurrent scenarios" which is a hypothetical concern. The comment doesn't demonstrate that there IS a race condition problem, just that there COULD be one. The rules explicitly state "Do NOT make speculative comments, such as 'If X, then Y is an issue', Only comment if it is definitely an issue." This comment also suggests "Consider documenting this behavior or refactoring for concurrency if needed" which is vague and not actionable - it doesn't provide a clear code change. The comment is essentially asking the author to think about whether this could be a problem, rather than pointing out a definite issue. Without evidence that this class is actually used concurrently or that concurrent usage is a design requirement, this is purely speculative. Could there be evidence that this class is designed to be used concurrently? Perhaps the architecture requires thread-safety or concurrent experiment runs? Maybe the comment is valid if there's a clear use case where multiple experiments run simultaneously on the same instance. Even if concurrent usage were possible, the comment doesn't provide concrete evidence that it's a problem in practice. The comment uses "may lead to" language which is speculative. Without seeing actual concurrent usage patterns or requirements in the codebase, this remains a hypothetical concern. The rules are clear: only comment if it's definitely an issue, not if it might be one. This comment should be deleted because it's speculative ("may lead to race conditions") rather than pointing out a definite issue. It also provides vague suggestions ("Consider documenting" or "refactoring if needed") rather than actionable code changes.
3. packages/traceloop-sdk/test/experiment-export.test.ts:128
  • Draft comment:
    The tests mainly cover error scenarios when export parameters are missing. It would be beneficial to add tests that simulate successful CSV/JSON export responses to validate the correct functioning of toCsvString and toJsonString methods.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.

Workflow ID: wflow_HZOqBUlNC6IBySHU

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@packages/traceloop-sdk/test/experiment-export.test.ts`:
- Around line 85-104: Test currently swallows all errors in the try/catch which
can hide failures; change the handler around
client.experiment.toJsonString(experimentSlug, runId) to either let errors
bubble (remove the catch) so the test fails on real errors, or in the catch
inspect the error/message and call this.skip() for the known "no valid
recordings" condition and rethrow for anything else; specifically update the
anonymous test function (the "should export experiment results as JSON with
explicit parameters" it block) to use this.skip() when the error indicates no
recordings (or a 404) and otherwise rethrow the error so failures are not
silently ignored, referencing client.experiment.toJsonString, experimentSlug and
runId.
- Around line 65-83: The test "should export experiment results as CSV with
explicit parameters" currently swallows errors in its try-catch, so replace the
silent catch by either (A) wiring a deterministic fixture/Polly recording and
asserting the happy-path result from
client.experiment.toCsvString(experimentSlug, runId) (remove the try-catch and
assert csvData truthiness/length/type), or (B) mark the test pending/skipped
(e.g., using it.skip or describe.skip) until fixtures are available; ensure you
reference the same test name and the call to client.experiment.toCsvString along
with the experimentSlug/runId variables when implementing the chosen fix.
🧹 Nitpick comments (2)
packages/sample-app/src/sample_experiment.ts (1)

163-172: Consider adding success logging after file writes.

The export functionality is correctly implemented within the existing error handling block. For better demonstration and debugging visibility, consider adding success messages after the file writes.

💡 Suggested improvement
     // Export to CSV and JSON using last run's experiment slug and run ID
     const csvData = await client.experiment.toCsvString();
     fs.writeFileSync("experiment_results.csv", csvData);
+    console.log("   ✅ Exported experiment_results.csv");

     const jsonData = await client.experiment.toJsonString();
     fs.writeFileSync("experiment_results.json", jsonData);
+    console.log("   ✅ Exported experiment_results.json");
packages/traceloop-sdk/test/experiment-export.test.ts (1)

128-143: Test name is misleading - rename to match actual behavior.

The test is named "should use last run values when not provided" but actually tests error handling when no previous run exists. Consider renaming to accurately reflect what's being tested.

💡 Suggested rename
-    it("should use last run values when not provided", async function () {
-      // This test would require running an actual experiment first
-      // For now, we'll just verify the error handling
+    it("should throw error when no previous run exists and parameters are omitted", async function () {
       try {
         await client.experiment.toCsvString();
         assert.fail("Should have thrown an error");
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 1d1f642 and c14fb77.

📒 Files selected for processing (3)
  • packages/sample-app/src/sample_experiment.ts
  • packages/traceloop-sdk/src/lib/client/experiment/experiment.ts
  • packages/traceloop-sdk/test/experiment-export.test.ts
🧰 Additional context used
📓 Path-based instructions (2)
packages/{instrumentation-*,traceloop-sdk}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Import AI/LLM semantic attribute constants from @traceloop/ai-semantic-conventions rather than hardcoding strings

Files:

  • packages/traceloop-sdk/src/lib/client/experiment/experiment.ts
  • packages/traceloop-sdk/test/experiment-export.test.ts
packages/traceloop-sdk/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

packages/traceloop-sdk/**/*.{ts,tsx}: Use the provided decorators (@workflow, @task, @agent) for workflow/task/agent spans instead of re-implementing them
For manual LLM operations, use trace.withLLMSpan from @traceloop/node-server-sdk

Files:

  • packages/traceloop-sdk/src/lib/client/experiment/experiment.ts
  • packages/traceloop-sdk/test/experiment-export.test.ts
🧠 Learnings (5)
📓 Common learnings
Learnt from: nina-kollman
Repo: traceloop/openllmetry-js PR: 672
File: packages/traceloop-sdk/src/lib/client/experiment/experiment.ts:212-231
Timestamp: 2025-08-22T12:44:43.893Z
Learning: In the Traceloop SDK experiment system, JSONL datasets use a custom format where the first line contains column definitions and subsequent lines contain the actual data rows. The parseJsonlToRows method in packages/traceloop-sdk/src/lib/client/experiment/experiment.ts intentionally skips the first line (index 0) to avoid parsing the column definition as data.
Learnt from: galzilber
Repo: traceloop/openllmetry-js PR: 643
File: packages/traceloop-sdk/test/datasets-final.test.ts:129-143
Timestamp: 2025-08-12T13:57:42.538Z
Learning: The Traceloop SDK includes a response transformer utility (transformApiResponse) that automatically converts snake_case API responses to camelCase for SDK consumers. This means interface definitions and tests should use camelCase field names (like datasetId, datasetSlug, createdAt, updatedAt) rather than the raw API's snake_case format.
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/{instrumentation-*,traceloop-sdk}/**/*.{ts,tsx} : Import AI/LLM semantic attribute constants from traceloop/ai-semantic-conventions rather than hardcoding strings

Applied to files:

  • packages/sample-app/src/sample_experiment.ts
  • packages/traceloop-sdk/test/experiment-export.test.ts
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/traceloop-sdk/**/*.{ts,tsx} : For manual LLM operations, use trace.withLLMSpan from traceloop/node-server-sdk

Applied to files:

  • packages/sample-app/src/sample_experiment.ts
  • packages/traceloop-sdk/test/experiment-export.test.ts
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/traceloop-sdk/src/lib/node-server-sdk.ts : Register new instrumentation packages during SDK initialization

Applied to files:

  • packages/sample-app/src/sample_experiment.ts
📚 Learning: 2025-12-30T22:05:07.422Z
Learnt from: galzilber
Repo: traceloop/openllmetry-js PR: 853
File: packages/sample-app/src/sample_security_evaluators.ts:37-46
Timestamp: 2025-12-30T22:05:07.422Z
Learning: In packages/sample-app/src/**/*.ts, allow demonstration code to omit defensive error handling and bounds checking for API responses. This pattern applies to sample/demo code only and should not be interpreted as a best practice for production code. Reviewers should still require basic guards for critical paths (security, integrity) and clearly document that the code is a demonstration with comments indicating assumptions.

Applied to files:

  • packages/sample-app/src/sample_experiment.ts
🪛 GitHub Actions: CI
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts

[warning] 1-1: Prettier formatting check failed. Run 'prettier --write' to fix code style issues in this file.

packages/sample-app/src/sample_experiment.ts

[warning] 1-1: Prettier formatting check failed. Run 'prettier --write' to fix code style issues in this file.

🔇 Additional comments (7)
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts (5)

498-517: LGTM!

The resolveExportParams helper cleanly handles parameter resolution with fallback to last-used values and provides clear error messages when required parameters are missing.


525-550: LGTM!

The toCsvString method correctly leverages the existing handleResponse logic that handles text/csv content-type responses, and includes appropriate error handling and validation.


558-584: LGTM!

The toJsonString method appropriately handles both string and object responses, ensuring consistent string output for the caller.


172-176: LGTM!

The tracking of last-used experiment parameters is correctly placed after successful experiment completion, enabling the export methods to use these values when parameters are not explicitly provided.


487-489: No changes needed—the API response fields are correctly transformed.

The handleResponse() method at line 485 applies transformApiResponse() to JSON responses from the /v2/experiments/run-in-github endpoint. The RunInGithubResponse interface correctly defines fields in camelCase (experimentSlug, runId), and lines 487-489 properly access these transformed fields.

packages/traceloop-sdk/test/experiment-export.test.ts (2)

106-126: LGTM!

The error handling tests correctly verify that appropriate errors are thrown when required parameters are missing, with proper assertions on both error type and message content.


19-62: LGTM!

The Polly configuration and test setup are well-structured with appropriate handling for recording vs replay modes and proper security measures for stripping authorization headers from persisted recordings.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment thread packages/traceloop-sdk/test/experiment-export.test.ts
Comment thread packages/traceloop-sdk/test/experiment-export.test.ts
Copy link
Copy Markdown
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

Looks good to me! 👍

Reviewed 77c0da9 in 57 seconds. Click for details.
  • Reviewed 50 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/sample-app/src/sample_experiment.ts:169
  • Draft comment:
    Removed an extra blank line; this is a minor stylistic update.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
2. packages/traceloop-sdk/src/lib/client/experiment/experiment.ts:525
  • Draft comment:
    Refactored parameter formatting in toCsvString and toJsonString improves readability. Error handling and endpoint usage remain consistent.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_LKjqbKitiA8rAmky

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

Copy link
Copy Markdown
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

Looks good to me! 👍

Reviewed f4417c3 in 47 seconds. Click for details.
  • Reviewed 77 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/traceloop-sdk/test/experiment-export.test.ts:66
  • Draft comment:
    Good use of this.skip() for conditional testing. Consider extracting the skip logic into a helper function or a beforeEach hook to reduce duplication across tests.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None
2. packages/traceloop-sdk/test/experiment-export.test.ts:87
  • Draft comment:
    Conditional skip is also applied here for the JSON export test. Again, consider refactoring to avoid code repetition.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None
3. packages/traceloop-sdk/test/experiment-export.test.ts:76
  • Draft comment:
    Removal of try/catch in favor of letting errors propagate makes the tests clearer. This ensures tests fail on unexpected errors rather than quietly logging them.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_HQq15HFrlHybLADf

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

@nina-kollman nina-kollman merged commit 114d2b8 into main Jan 18, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants