fix(traceloop-sdk): Add csv and json support to experiment by nina-kollman · Pull Request #864 · traceloop/openllmetry-js

nina-kollman · 2026-01-15T10:08:44Z

Fixes: TLP-1233

Important

Add CSV and JSON export functionality to experiment results in Traceloop SDK with tests for validation and error handling.

New Features:
- Add toCsvString() and toJsonString() methods in Experiment class for exporting results.
- Automatically export results to experiment_results.csv and experiment_results.json in sample_experiment.ts.
Behavior:
- Export methods use last run's experiment slug and run ID if not provided.
- Throws error if experiment_slug or run_id is missing.
Tests:
- Add experiment-export.test.ts to test CSV/JSON export, parameter validation, and error handling.

^{This description was created by}^{for f4417c3. You can customize this summary. It will automatically update as commits are pushed.}

Summary by CodeRabbit

New Features
- Added CSV and JSON export of experiment results, with optional automatic runtime export to files after runs.
- Exports accept explicit parameters or will default to the most recent run when parameters are omitted.
Tests
- Added tests for CSV/JSON export, parameter validation, error handling, and last-run default behavior.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

coderabbitai · 2026-01-15T10:08:56Z

📝 Walkthrough

Walkthrough

Adds CSV and JSON export methods to the Experiment client that resolve slug/runId (with last-used fallbacks); the sample app writes exported CSV/JSON files at runtime; tests exercising exports via Polly.js were added.

Changes

Cohort / File(s)	Summary
Core export functionality `packages/traceloop-sdk/src/lib/client/experiment/experiment.ts`	Added private `_lastExperimentSlug` and `_lastRunId`; added `resolveExportParams(...)`; added `toCsvString(...)` and `toJsonString(...)` which call `/export/csv` and `/export/json`, validate responses, and return strings.
Tests `packages/traceloop-sdk/test/experiment-export.test.ts`	New Polly.js-based test suite (HTTP/Fetch adapters, filesystem persister) covering CSV/JSON exports with explicit params, missing-param error cases, and last-run fallback behavior.
Sample application `packages/sample-app/src/sample_experiment.ts`	Added `fs` import and runtime calls to `client.experiment.toCsvString()` and `toJsonString()`; writes `experiment_results.csv` and `experiment_results.json`.

Sequence Diagram(s)

sequenceDiagram
  actor User
  participant SampleApp as Sample App
  participant Experiment as Experiment Client
  participant API as Traceloop API
  participant FS as Filesystem

  User->>SampleApp: trigger export flow
  SampleApp->>Experiment: call toCsvString()/toJsonString()
  Experiment->>Experiment: resolveExportParams(experimentSlug?, runId?)
  Experiment->>API: GET /export/csv or /export/json (slug, runId)
  API-->>Experiment: 200 + CSV/JSON payload
  Experiment-->>SampleApp: return string result
  SampleApp->>FS: write experiment_results.csv / experiment_results.json
  FS-->>SampleApp: write confirmation
  SampleApp-->>User: export complete

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I hopped through code to fetch the rows,
CSV and JSON where the data grows.
I kept the slug and chased the run,
Wrote files at dusk and danced at sun.
🥕✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: adding CSV and JSON export support to the experiment functionality in traceloop-sdk.
Linked Issues check	✅ Passed	The PR fully implements the coding requirements from TLP-1233: adds toCsvString() and toJsonString() methods with parameter resolution, error handling, and comprehensive test coverage.
Out of Scope Changes check	✅ Passed	All changes are directly related to implementing CSV/JSON export functionality: experiment methods, sample app demonstration, and dedicated test suite; no unrelated modifications detected.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 77c0da9 and f4417c3.

📒 Files selected for processing (1)

packages/traceloop-sdk/test/experiment-export.test.ts

🚧 Files skipped from review as they are similar to previous changes (1)

packages/traceloop-sdk/test/experiment-export.test.ts

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Build and test

_{✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

ellipsis-dev

Important

Looks good to me! 👍

Reviewed everything up to c14fb77 in 2 minutes and 6 seconds. Click for details.

Reviewed 309 lines of code in 3 files
Skipped 0 files when reviewing.
Skipped posting 3 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. packages/sample-app/src/sample_experiment.ts:164

Draft comment:
Consider adding error handling around the synchronous file writes. Using fs.writeFileSync can throw errors (e.g. due to permission issues); wrapping these calls in try-catch or using asynchronous alternatives (fs.promises.writeFile) would make the export more robust.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% Looking at the full context, the fs.writeFileSync calls on lines 168 and 171 are already inside a try-catch block that starts at line 115 and has a catch block at line 173. This catch block handles errors and logs them appropriately. The comment suggests adding error handling, but error handling is already present. The comment is technically incorrect because it claims error handling is needed when it already exists. The suggestion about using async alternatives is a code quality suggestion, but it's not actionable as a "must fix" issue - it's more of a preference, and the current code is already protected by the try-catch. Could the comment be referring to more granular error handling specifically for the file operations, separate from the broader experiment error handling? Perhaps the author wants specific error messages for file write failures rather than generic experiment operation errors? While more granular error handling could be beneficial, the comment states that the code "can throw errors" and needs wrapping in try-catch, which is already done. The comment doesn't specifically ask for more granular error messages. The existing error handling is sufficient and the comment is factually incorrect about the lack of error handling. The comment should be deleted because it incorrectly suggests that error handling is missing when the code is already wrapped in a try-catch block (lines 115-181). The existing error handling will catch any errors from fs.writeFileSync.

2. packages/traceloop-sdk/src/lib/client/experiment/experiment.ts:500

Draft comment:
Storing the last experiment slug and run ID in instance variables (_lastExperimentSlug, _lastRunId) works for sequential runs but may lead to race conditions in concurrent scenarios. Consider documenting this behavior or refactoring for concurrency if needed.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This comment is speculative - it says "may lead to race conditions in concurrent scenarios" which is a hypothetical concern. The comment doesn't demonstrate that there IS a race condition problem, just that there COULD be one. The rules explicitly state "Do NOT make speculative comments, such as 'If X, then Y is an issue', Only comment if it is definitely an issue." This comment also suggests "Consider documenting this behavior or refactoring for concurrency if needed" which is vague and not actionable - it doesn't provide a clear code change. The comment is essentially asking the author to think about whether this could be a problem, rather than pointing out a definite issue. Without evidence that this class is actually used concurrently or that concurrent usage is a design requirement, this is purely speculative. Could there be evidence that this class is designed to be used concurrently? Perhaps the architecture requires thread-safety or concurrent experiment runs? Maybe the comment is valid if there's a clear use case where multiple experiments run simultaneously on the same instance. Even if concurrent usage were possible, the comment doesn't provide concrete evidence that it's a problem in practice. The comment uses "may lead to" language which is speculative. Without seeing actual concurrent usage patterns or requirements in the codebase, this remains a hypothetical concern. The rules are clear: only comment if it's definitely an issue, not if it might be one. This comment should be deleted because it's speculative ("may lead to race conditions") rather than pointing out a definite issue. It also provides vague suggestions ("Consider documenting" or "refactoring if needed") rather than actionable code changes.

3. packages/traceloop-sdk/test/experiment-export.test.ts:128

Draft comment:
The tests mainly cover error scenarios when export parameters are missing. It would be beneficial to add tests that simulate successful CSV/JSON export responses to validate the correct functioning of toCsvString and toJsonString methods.
Reason this comment was not posted:
Comment looked like it was already resolved.

Workflow ID: wflow_HZOqBUlNC6IBySHU

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In `@packages/traceloop-sdk/test/experiment-export.test.ts`:
- Around line 85-104: Test currently swallows all errors in the try/catch which
can hide failures; change the handler around
client.experiment.toJsonString(experimentSlug, runId) to either let errors
bubble (remove the catch) so the test fails on real errors, or in the catch
inspect the error/message and call this.skip() for the known "no valid
recordings" condition and rethrow for anything else; specifically update the
anonymous test function (the "should export experiment results as JSON with
explicit parameters" it block) to use this.skip() when the error indicates no
recordings (or a 404) and otherwise rethrow the error so failures are not
silently ignored, referencing client.experiment.toJsonString, experimentSlug and
runId.
- Around line 65-83: The test "should export experiment results as CSV with
explicit parameters" currently swallows errors in its try-catch, so replace the
silent catch by either (A) wiring a deterministic fixture/Polly recording and
asserting the happy-path result from
client.experiment.toCsvString(experimentSlug, runId) (remove the try-catch and
assert csvData truthiness/length/type), or (B) mark the test pending/skipped
(e.g., using it.skip or describe.skip) until fixtures are available; ensure you
reference the same test name and the call to client.experiment.toCsvString along
with the experimentSlug/runId variables when implementing the chosen fix.

🧹 Nitpick comments (2)

packages/sample-app/src/sample_experiment.ts (1)
163-172: Consider adding success logging after file writes.

The export functionality is correctly implemented within the existing error handling block. For better demonstration and debugging visibility, consider adding success messages after the file writes.
💡 Suggested improvement
     // Export to CSV and JSON using last run's experiment slug and run ID
     const csvData = await client.experiment.toCsvString();
     fs.writeFileSync("experiment_results.csv", csvData);
+    console.log("   ✅ Exported experiment_results.csv");

     const jsonData = await client.experiment.toJsonString();
     fs.writeFileSync("experiment_results.json", jsonData);
+    console.log("   ✅ Exported experiment_results.json");
packages/traceloop-sdk/test/experiment-export.test.ts (1)
128-143: Test name is misleading - rename to match actual behavior.

The test is named "should use last run values when not provided" but actually tests error handling when no previous run exists. Consider renaming to accurately reflect what's being tested.
💡 Suggested rename
-    it("should use last run values when not provided", async function () {
-      // This test would require running an actual experiment first
-      // For now, we'll just verify the error handling
+    it("should throw error when no previous run exists and parameters are omitted", async function () {
       try {
         await client.experiment.toCsvString();
         assert.fail("Should have thrown an error");

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 1d1f642 and c14fb77.

📒 Files selected for processing (3)

packages/sample-app/src/sample_experiment.ts
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts
packages/traceloop-sdk/test/experiment-export.test.ts

🧰 Additional context used

📓 Path-based instructions (2)

packages/{instrumentation-*,traceloop-sdk}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Import AI/LLM semantic attribute constants from @traceloop/ai-semantic-conventions rather than hardcoding strings

Files:

packages/traceloop-sdk/src/lib/client/experiment/experiment.ts
packages/traceloop-sdk/test/experiment-export.test.ts

packages/traceloop-sdk/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

packages/traceloop-sdk/**/*.{ts,tsx}: Use the provided decorators (@workflow, @task, @agent) for workflow/task/agent spans instead of re-implementing them
For manual LLM operations, use trace.withLLMSpan from @traceloop/node-server-sdk

Files:

packages/traceloop-sdk/src/lib/client/experiment/experiment.ts
packages/traceloop-sdk/test/experiment-export.test.ts

🧠 Learnings (5)

📓 Common learnings

Learnt from: nina-kollman
Repo: traceloop/openllmetry-js PR: 672
File: packages/traceloop-sdk/src/lib/client/experiment/experiment.ts:212-231
Timestamp: 2025-08-22T12:44:43.893Z
Learning: In the Traceloop SDK experiment system, JSONL datasets use a custom format where the first line contains column definitions and subsequent lines contain the actual data rows. The parseJsonlToRows method in packages/traceloop-sdk/src/lib/client/experiment/experiment.ts intentionally skips the first line (index 0) to avoid parsing the column definition as data.

Learnt from: galzilber
Repo: traceloop/openllmetry-js PR: 643
File: packages/traceloop-sdk/test/datasets-final.test.ts:129-143
Timestamp: 2025-08-12T13:57:42.538Z
Learning: The Traceloop SDK includes a response transformer utility (transformApiResponse) that automatically converts snake_case API responses to camelCase for SDK consumers. This means interface definitions and tests should use camelCase field names (like datasetId, datasetSlug, createdAt, updatedAt) rather than the raw API's snake_case format.

📚 Learning: 2025-08-24T22:08:07.023Z

Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/{instrumentation-*,traceloop-sdk}/**/*.{ts,tsx} : Import AI/LLM semantic attribute constants from traceloop/ai-semantic-conventions rather than hardcoding strings

Applied to files:

packages/sample-app/src/sample_experiment.ts
packages/traceloop-sdk/test/experiment-export.test.ts

📚 Learning: 2025-08-24T22:08:07.023Z

Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/traceloop-sdk/**/*.{ts,tsx} : For manual LLM operations, use trace.withLLMSpan from traceloop/node-server-sdk

Applied to files:

packages/sample-app/src/sample_experiment.ts
packages/traceloop-sdk/test/experiment-export.test.ts

📚 Learning: 2025-08-24T22:08:07.023Z

Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/traceloop-sdk/src/lib/node-server-sdk.ts : Register new instrumentation packages during SDK initialization

Applied to files:

packages/sample-app/src/sample_experiment.ts

📚 Learning: 2025-12-30T22:05:07.422Z

Learnt from: galzilber
Repo: traceloop/openllmetry-js PR: 853
File: packages/sample-app/src/sample_security_evaluators.ts:37-46
Timestamp: 2025-12-30T22:05:07.422Z
Learning: In packages/sample-app/src/**/*.ts, allow demonstration code to omit defensive error handling and bounds checking for API responses. This pattern applies to sample/demo code only and should not be interpreted as a best practice for production code. Reviewers should still require basic guards for critical paths (security, integrity) and clearly document that the code is a demonstration with comments indicating assumptions.

Applied to files:

packages/sample-app/src/sample_experiment.ts

🪛 GitHub Actions: CI

packages/traceloop-sdk/src/lib/client/experiment/experiment.ts

[warning] 1-1: Prettier formatting check failed. Run 'prettier --write' to fix code style issues in this file.

packages/sample-app/src/sample_experiment.ts

[warning] 1-1: Prettier formatting check failed. Run 'prettier --write' to fix code style issues in this file.

🔇 Additional comments (7)

packages/traceloop-sdk/src/lib/client/experiment/experiment.ts (5)

498-517: LGTM!

The resolveExportParams helper cleanly handles parameter resolution with fallback to last-used values and provides clear error messages when required parameters are missing.

525-550: LGTM!

The toCsvString method correctly leverages the existing handleResponse logic that handles text/csv content-type responses, and includes appropriate error handling and validation.

558-584: LGTM!

The toJsonString method appropriately handles both string and object responses, ensuring consistent string output for the caller.

172-176: LGTM!

The tracking of last-used experiment parameters is correctly placed after successful experiment completion, enabling the export methods to use these values when parameters are not explicitly provided.

487-489: No changes needed—the API response fields are correctly transformed.

The handleResponse() method at line 485 applies transformApiResponse() to JSON responses from the /v2/experiments/run-in-github endpoint. The RunInGithubResponse interface correctly defines fields in camelCase (experimentSlug, runId), and lines 487-489 properly access these transformed fields.

packages/traceloop-sdk/test/experiment-export.test.ts (2)

106-126: LGTM!

The error handling tests correctly verify that appropriate errors are thrown when required parameters are missing, with proper assertions on both error type and message content.

19-62: LGTM!

The Polly configuration and test setup are well-structured with appropriate handling for recording vs replay modes and proper security measures for stripping authorization headers from persisted recordings.

_{✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.}

ellipsis-dev

Important

Looks good to me! 👍

Reviewed 77c0da9 in 57 seconds. Click for details.

Reviewed 50 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 2 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. packages/sample-app/src/sample_experiment.ts:169

Draft comment:
Removed an extra blank line; this is a minor stylistic update.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

2. packages/traceloop-sdk/src/lib/client/experiment/experiment.ts:525

Draft comment:
Refactored parameter formatting in toCsvString and toJsonString improves readability. Error handling and endpoint usage remain consistent.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_LKjqbKitiA8rAmky

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

ellipsis-dev

Important

Looks good to me! 👍

Reviewed f4417c3 in 47 seconds. Click for details.

Reviewed 77 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 3 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. packages/traceloop-sdk/test/experiment-export.test.ts:66

Draft comment:
Good use of this.skip() for conditional testing. Consider extracting the skip logic into a helper function or a beforeEach hook to reduce duplication across tests.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50% None

2. packages/traceloop-sdk/test/experiment-export.test.ts:87

Draft comment:
Conditional skip is also applied here for the JSON export test. Again, consider refactoring to avoid code repetition.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50% None

3. packages/traceloop-sdk/test/experiment-export.test.ts:76

Draft comment:
Removal of try/catch in favor of letting errors propagate makes the tests clearer. This ensures tests fail on unexpected errors rather than quietly logging them.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_HQq15HFrlHybLADf

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

add json

c14fb77

ellipsis-dev bot reviewed Jan 15, 2026

View reviewed changes

coderabbitai bot reviewed Jan 15, 2026

View reviewed changes

Comment thread packages/traceloop-sdk/test/experiment-export.test.ts

Comment thread packages/traceloop-sdk/test/experiment-export.test.ts

pretty

77c0da9

ellipsis-dev bot reviewed Jan 15, 2026

View reviewed changes

test fix

f4417c3

ellipsis-dev bot reviewed Jan 15, 2026

View reviewed changes

galzilber approved these changes Jan 18, 2026

View reviewed changes

Merge branch 'main' into nk/exp_result

b0f6758

nina-kollman merged commit 114d2b8 into main Jan 18, 2026
10 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(traceloop-sdk): Add csv and json support to experiment #864

fix(traceloop-sdk): Add csv and json support to experiment #864
nina-kollman merged 4 commits intomainfrom
nk/exp_result

nina-kollman commented Jan 15, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Jan 15, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

ellipsis-dev bot left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

ellipsis-dev bot left a comment

Uh oh!

ellipsis-dev bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

nina-kollman commented Jan 15, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Jan 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Uh oh!

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

nina-kollman commented Jan 15, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 15, 2026 •

edited

Loading