fix(traceloop-sdk): Add csv and json support to experiment #864
fix(traceloop-sdk): Add csv and json support to experiment #864nina-kollman merged 4 commits intomainfrom
Conversation
📝 WalkthroughWalkthroughAdds CSV and JSON export methods to the Experiment client that resolve slug/runId (with last-used fallbacks); the sample app writes exported CSV/JSON files at runtime; tests exercising exports via Polly.js were added. Changes
Sequence Diagram(s)sequenceDiagram
actor User
participant SampleApp as Sample App
participant Experiment as Experiment Client
participant API as Traceloop API
participant FS as Filesystem
User->>SampleApp: trigger export flow
SampleApp->>Experiment: call toCsvString()/toJsonString()
Experiment->>Experiment: resolveExportParams(experimentSlug?, runId?)
Experiment->>API: GET /export/csv or /export/json (slug, runId)
API-->>Experiment: 200 + CSV/JSON payload
Experiment-->>SampleApp: return string result
SampleApp->>FS: write experiment_results.csv / experiment_results.json
FS-->>SampleApp: write confirmation
SampleApp-->>User: export complete
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
📜 Recent review detailsConfiguration used: defaults Review profile: CHILL Plan: Pro Disabled knowledge base sources:
📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
✏️ Tip: You can disable this entire section by setting Comment |
There was a problem hiding this comment.
Important
Looks good to me! 👍
Reviewed everything up to c14fb77 in 2 minutes and 6 seconds. Click for details.
- Reviewed
309lines of code in3files - Skipped
0files when reviewing. - Skipped posting
3draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/sample-app/src/sample_experiment.ts:164
- Draft comment:
Consider adding error handling around the synchronous file writes. Using fs.writeFileSync can throw errors (e.g. due to permission issues); wrapping these calls in try-catch or using asynchronous alternatives (fs.promises.writeFile) would make the export more robust. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% Looking at the full context, the fs.writeFileSync calls on lines 168 and 171 are already inside a try-catch block that starts at line 115 and has a catch block at line 173. This catch block handles errors and logs them appropriately. The comment suggests adding error handling, but error handling is already present. The comment is technically incorrect because it claims error handling is needed when it already exists. The suggestion about using async alternatives is a code quality suggestion, but it's not actionable as a "must fix" issue - it's more of a preference, and the current code is already protected by the try-catch. Could the comment be referring to more granular error handling specifically for the file operations, separate from the broader experiment error handling? Perhaps the author wants specific error messages for file write failures rather than generic experiment operation errors? While more granular error handling could be beneficial, the comment states that the code "can throw errors" and needs wrapping in try-catch, which is already done. The comment doesn't specifically ask for more granular error messages. The existing error handling is sufficient and the comment is factually incorrect about the lack of error handling. The comment should be deleted because it incorrectly suggests that error handling is missing when the code is already wrapped in a try-catch block (lines 115-181). The existing error handling will catch any errors from fs.writeFileSync.
2. packages/traceloop-sdk/src/lib/client/experiment/experiment.ts:500
- Draft comment:
Storing the last experiment slug and run ID in instance variables (_lastExperimentSlug, _lastRunId) works for sequential runs but may lead to race conditions in concurrent scenarios. Consider documenting this behavior or refactoring for concurrency if needed. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This comment is speculative - it says "may lead to race conditions in concurrent scenarios" which is a hypothetical concern. The comment doesn't demonstrate that there IS a race condition problem, just that there COULD be one. The rules explicitly state "Do NOT make speculative comments, such as 'If X, then Y is an issue', Only comment if it is definitely an issue." This comment also suggests "Consider documenting this behavior or refactoring for concurrency if needed" which is vague and not actionable - it doesn't provide a clear code change. The comment is essentially asking the author to think about whether this could be a problem, rather than pointing out a definite issue. Without evidence that this class is actually used concurrently or that concurrent usage is a design requirement, this is purely speculative. Could there be evidence that this class is designed to be used concurrently? Perhaps the architecture requires thread-safety or concurrent experiment runs? Maybe the comment is valid if there's a clear use case where multiple experiments run simultaneously on the same instance. Even if concurrent usage were possible, the comment doesn't provide concrete evidence that it's a problem in practice. The comment uses "may lead to" language which is speculative. Without seeing actual concurrent usage patterns or requirements in the codebase, this remains a hypothetical concern. The rules are clear: only comment if it's definitely an issue, not if it might be one. This comment should be deleted because it's speculative ("may lead to race conditions") rather than pointing out a definite issue. It also provides vague suggestions ("Consider documenting" or "refactoring if needed") rather than actionable code changes.
3. packages/traceloop-sdk/test/experiment-export.test.ts:128
- Draft comment:
The tests mainly cover error scenarios when export parameters are missing. It would be beneficial to add tests that simulate successful CSV/JSON export responses to validate the correct functioning of toCsvString and toJsonString methods. - Reason this comment was not posted:
Comment looked like it was already resolved.
Workflow ID: wflow_HZOqBUlNC6IBySHU
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@packages/traceloop-sdk/test/experiment-export.test.ts`:
- Around line 85-104: Test currently swallows all errors in the try/catch which
can hide failures; change the handler around
client.experiment.toJsonString(experimentSlug, runId) to either let errors
bubble (remove the catch) so the test fails on real errors, or in the catch
inspect the error/message and call this.skip() for the known "no valid
recordings" condition and rethrow for anything else; specifically update the
anonymous test function (the "should export experiment results as JSON with
explicit parameters" it block) to use this.skip() when the error indicates no
recordings (or a 404) and otherwise rethrow the error so failures are not
silently ignored, referencing client.experiment.toJsonString, experimentSlug and
runId.
- Around line 65-83: The test "should export experiment results as CSV with
explicit parameters" currently swallows errors in its try-catch, so replace the
silent catch by either (A) wiring a deterministic fixture/Polly recording and
asserting the happy-path result from
client.experiment.toCsvString(experimentSlug, runId) (remove the try-catch and
assert csvData truthiness/length/type), or (B) mark the test pending/skipped
(e.g., using it.skip or describe.skip) until fixtures are available; ensure you
reference the same test name and the call to client.experiment.toCsvString along
with the experimentSlug/runId variables when implementing the chosen fix.
🧹 Nitpick comments (2)
packages/sample-app/src/sample_experiment.ts (1)
163-172: Consider adding success logging after file writes.The export functionality is correctly implemented within the existing error handling block. For better demonstration and debugging visibility, consider adding success messages after the file writes.
💡 Suggested improvement
// Export to CSV and JSON using last run's experiment slug and run ID const csvData = await client.experiment.toCsvString(); fs.writeFileSync("experiment_results.csv", csvData); + console.log(" ✅ Exported experiment_results.csv"); const jsonData = await client.experiment.toJsonString(); fs.writeFileSync("experiment_results.json", jsonData); + console.log(" ✅ Exported experiment_results.json");packages/traceloop-sdk/test/experiment-export.test.ts (1)
128-143: Test name is misleading - rename to match actual behavior.The test is named "should use last run values when not provided" but actually tests error handling when no previous run exists. Consider renaming to accurately reflect what's being tested.
💡 Suggested rename
- it("should use last run values when not provided", async function () { - // This test would require running an actual experiment first - // For now, we'll just verify the error handling + it("should throw error when no previous run exists and parameters are omitted", async function () { try { await client.experiment.toCsvString(); assert.fail("Should have thrown an error");
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
packages/sample-app/src/sample_experiment.tspackages/traceloop-sdk/src/lib/client/experiment/experiment.tspackages/traceloop-sdk/test/experiment-export.test.ts
🧰 Additional context used
📓 Path-based instructions (2)
packages/{instrumentation-*,traceloop-sdk}/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Import AI/LLM semantic attribute constants from
@traceloop/ai-semantic-conventionsrather than hardcoding strings
Files:
packages/traceloop-sdk/src/lib/client/experiment/experiment.tspackages/traceloop-sdk/test/experiment-export.test.ts
packages/traceloop-sdk/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
packages/traceloop-sdk/**/*.{ts,tsx}: Use the provided decorators (@workflow,@task,@agent) for workflow/task/agent spans instead of re-implementing them
For manual LLM operations, use trace.withLLMSpan from@traceloop/node-server-sdk
Files:
packages/traceloop-sdk/src/lib/client/experiment/experiment.tspackages/traceloop-sdk/test/experiment-export.test.ts
🧠 Learnings (5)
📓 Common learnings
Learnt from: nina-kollman
Repo: traceloop/openllmetry-js PR: 672
File: packages/traceloop-sdk/src/lib/client/experiment/experiment.ts:212-231
Timestamp: 2025-08-22T12:44:43.893Z
Learning: In the Traceloop SDK experiment system, JSONL datasets use a custom format where the first line contains column definitions and subsequent lines contain the actual data rows. The parseJsonlToRows method in packages/traceloop-sdk/src/lib/client/experiment/experiment.ts intentionally skips the first line (index 0) to avoid parsing the column definition as data.
Learnt from: galzilber
Repo: traceloop/openllmetry-js PR: 643
File: packages/traceloop-sdk/test/datasets-final.test.ts:129-143
Timestamp: 2025-08-12T13:57:42.538Z
Learning: The Traceloop SDK includes a response transformer utility (transformApiResponse) that automatically converts snake_case API responses to camelCase for SDK consumers. This means interface definitions and tests should use camelCase field names (like datasetId, datasetSlug, createdAt, updatedAt) rather than the raw API's snake_case format.
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/{instrumentation-*,traceloop-sdk}/**/*.{ts,tsx} : Import AI/LLM semantic attribute constants from traceloop/ai-semantic-conventions rather than hardcoding strings
Applied to files:
packages/sample-app/src/sample_experiment.tspackages/traceloop-sdk/test/experiment-export.test.ts
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/traceloop-sdk/**/*.{ts,tsx} : For manual LLM operations, use trace.withLLMSpan from traceloop/node-server-sdk
Applied to files:
packages/sample-app/src/sample_experiment.tspackages/traceloop-sdk/test/experiment-export.test.ts
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/traceloop-sdk/src/lib/node-server-sdk.ts : Register new instrumentation packages during SDK initialization
Applied to files:
packages/sample-app/src/sample_experiment.ts
📚 Learning: 2025-12-30T22:05:07.422Z
Learnt from: galzilber
Repo: traceloop/openllmetry-js PR: 853
File: packages/sample-app/src/sample_security_evaluators.ts:37-46
Timestamp: 2025-12-30T22:05:07.422Z
Learning: In packages/sample-app/src/**/*.ts, allow demonstration code to omit defensive error handling and bounds checking for API responses. This pattern applies to sample/demo code only and should not be interpreted as a best practice for production code. Reviewers should still require basic guards for critical paths (security, integrity) and clearly document that the code is a demonstration with comments indicating assumptions.
Applied to files:
packages/sample-app/src/sample_experiment.ts
🪛 GitHub Actions: CI
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts
[warning] 1-1: Prettier formatting check failed. Run 'prettier --write' to fix code style issues in this file.
packages/sample-app/src/sample_experiment.ts
[warning] 1-1: Prettier formatting check failed. Run 'prettier --write' to fix code style issues in this file.
🔇 Additional comments (7)
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts (5)
498-517: LGTM!The
resolveExportParamshelper cleanly handles parameter resolution with fallback to last-used values and provides clear error messages when required parameters are missing.
525-550: LGTM!The
toCsvStringmethod correctly leverages the existinghandleResponselogic that handlestext/csvcontent-type responses, and includes appropriate error handling and validation.
558-584: LGTM!The
toJsonStringmethod appropriately handles both string and object responses, ensuring consistent string output for the caller.
172-176: LGTM!The tracking of last-used experiment parameters is correctly placed after successful experiment completion, enabling the export methods to use these values when parameters are not explicitly provided.
487-489: No changes needed—the API response fields are correctly transformed.The
handleResponse()method at line 485 appliestransformApiResponse()to JSON responses from the/v2/experiments/run-in-githubendpoint. TheRunInGithubResponseinterface correctly defines fields in camelCase (experimentSlug,runId), and lines 487-489 properly access these transformed fields.packages/traceloop-sdk/test/experiment-export.test.ts (2)
106-126: LGTM!The error handling tests correctly verify that appropriate errors are thrown when required parameters are missing, with proper assertions on both error type and message content.
19-62: LGTM!The Polly configuration and test setup are well-structured with appropriate handling for recording vs replay modes and proper security measures for stripping authorization headers from persisted recordings.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
There was a problem hiding this comment.
Important
Looks good to me! 👍
Reviewed 77c0da9 in 57 seconds. Click for details.
- Reviewed
50lines of code in2files - Skipped
0files when reviewing. - Skipped posting
2draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/sample-app/src/sample_experiment.ts:169
- Draft comment:
Removed an extra blank line; this is a minor stylistic update. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
2. packages/traceloop-sdk/src/lib/client/experiment/experiment.ts:525
- Draft comment:
Refactored parameter formatting in toCsvString and toJsonString improves readability. Error handling and endpoint usage remain consistent. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
Workflow ID: wflow_LKjqbKitiA8rAmky
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Important
Looks good to me! 👍
Reviewed f4417c3 in 47 seconds. Click for details.
- Reviewed
77lines of code in1files - Skipped
0files when reviewing. - Skipped posting
3draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/traceloop-sdk/test/experiment-export.test.ts:66
- Draft comment:
Good use of this.skip() for conditional testing. Consider extracting the skip logic into a helper function or a beforeEach hook to reduce duplication across tests. - Reason this comment was not posted:
Confidence changes required:33%<= threshold50%None
2. packages/traceloop-sdk/test/experiment-export.test.ts:87
- Draft comment:
Conditional skip is also applied here for the JSON export test. Again, consider refactoring to avoid code repetition. - Reason this comment was not posted:
Confidence changes required:33%<= threshold50%None
3. packages/traceloop-sdk/test/experiment-export.test.ts:76
- Draft comment:
Removal of try/catch in favor of letting errors propagate makes the tests clearer. This ensures tests fail on unexpected errors rather than quietly logging them. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
Workflow ID: wflow_HQq15HFrlHybLADf
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Fixes: TLP-1233
Important
Add CSV and JSON export functionality to experiment results in Traceloop SDK with tests for validation and error handling.
toCsvString()andtoJsonString()methods inExperimentclass for exporting results.experiment_results.csvandexperiment_results.jsoninsample_experiment.ts.experiment_slugorrun_idis missing.experiment-export.test.tsto test CSV/JSON export, parameter validation, and error handling.This description was created by
for f4417c3. You can customize this summary. It will automatically update as commits are pushed.
Summary by CodeRabbit
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.