-
Notifications
You must be signed in to change notification settings - Fork 46
fix(exp): Add run in github experiment #837
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…to nk/exp_in_github
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughAdds GitHub Actions–aware experiment execution to the SDK, new GitHub-related interfaces and exports, a GitHub experiment sample and npm script in the sample app, and conditional logging guards in an existing sample. Changes
Sequence Diagram(s)sequenceDiagram
participant Caller as Caller (local/CI)
participant SDK as Traceloop SDK (Experiment)
participant Backend as Traceloop Backend
participant Executor as Local Executor (task runner)
participant OpenAI as OpenAI API
Note over Caller,SDK: Caller invokes Experiment.run(task, options)
alt GITHUB_ACTIONS=true (GitHub-backed)
Caller->>SDK: run(task, options)
SDK->>SDK: getGithubContext()
SDK->>Backend: fetch dataset rows (if needed)
loop per dataset row
SDK->>Executor: execute task for row
Executor->>OpenAI: request model
OpenAI-->>Executor: model response
Executor-->>SDK: TaskResult (output/error)
end
SDK->>Backend: POST /experiment/run with results + github_context + metadata
Backend-->>SDK: RunInGithubResponse
SDK-->>Caller: return RunInGithubResponse
else Local execution
Caller->>SDK: run(task, options)
SDK->>Executor: runLocally -> executeTasksLocally(rows)
Executor->>OpenAI: call model per row
Executor-->>SDK: ExperimentRunResult (taskResults, evaluations)
SDK-->>Caller: return ExperimentRunResult
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
…to nk/exp_in_github
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caution
Changes requested ❌
Reviewed everything up to 5573148 in 2 minutes and 45 seconds. Click for details.
- Reviewed
416lines of code in5files - Skipped
0files when reviewing. - Skipped posting
7draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/sample-app/package.json:41
- Draft comment:
Script 'run:github_experiment' added looks correct. Ensure that 'tsx' is installed as a dev dependency for consistency. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
2. packages/sample-app/src/sample_github_experiment.ts:55
- Draft comment:
Consider adding error handling around the OpenAI API call to capture and log API-specific errors. - Reason this comment was not posted:
Confidence changes required:50%<= threshold50%None
3. packages/sample-app/src/sample_github_experiment.ts:91
- Draft comment:
Using 'if ("taskResults" in results)' to distinguish GitHub vs local responses may be brittle if response shapes change. - Reason this comment was not posted:
Confidence changes required:50%<= threshold50%None
4. packages/traceloop-sdk/src/lib/client/experiment/experiment.ts:138
- Draft comment:
In runLocally, task execution isn’t wrapped in a try/catch. If one task fails, it aborts the entire run. Consider handling individual task errors (respecting a potential stopOnError option). - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
5. packages/traceloop-sdk/src/lib/client/experiment/experiment.ts:408
- Draft comment:
Tasks in executeTasksLocally are processed sequentially. Consider parallelizing execution for performance benefits on large datasets. - Reason this comment was not posted:
Confidence changes required:50%<= threshold50%None
6. packages/traceloop-sdk/src/lib/client/experiment/experiment.ts:381
- Draft comment:
getGithubContext() relies on GITHUB_REF matching a PR pattern. Ensure this assumption holds for all GitHub Action events you intend to support. - Reason this comment was not posted:
Confidence changes required:50%<= threshold50%None
7. packages/traceloop-sdk/src/lib/interfaces/experiment.interface.ts:98
- Draft comment:
Typo: Consider using the canonical capitalization "GitHub" rather than "Github" in the interface names (GitHubContext, RunInGitHubOptions, RunInGitHubResponse) for consistency with the official branding. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% This is a style/naming comment about capitalization. While technically correct that "GitHub" is the official branding, this falls into the category of minor style preferences. The rules state to not make comments that are "obvious or unimportant" and to only comment if there's "clearly a code change required." Naming conventions like this are subjective - while GitHub's official branding uses capital H, many codebases use "Github" in their code. This is not a functional issue, and changing it would be a breaking change if these interfaces are already being used. The comment doesn't point to a bug or clear problem, just a stylistic preference. However, consistent naming with official branding could be considered a legitimate code quality issue, especially for public-facing APIs. If this is a new interface being added, now would be the best time to fix it before it's used elsewhere. The comment is actionable and clear about what should change. While the comment is actionable, it's still a minor stylistic preference rather than a functional issue. The rules emphasize not making "obvious or unimportant" comments. Capitalization preferences, especially when both forms are widely used in code, fall into this category. This is the type of thing that could be caught by a linter or style guide, not a manual code review. This comment should be deleted. It's a minor stylistic preference about capitalization that doesn't represent a functional issue or clear code quality problem. While technically correct about GitHub's branding, it's not important enough to warrant a review comment.
Workflow ID: wflow_qBGrssmshqMBi0FF
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts (1)
1-19: Address Prettier formatting issues.The CI pipeline reports formatting issues. Run
prettier --writeto fix code style.
🧹 Nitpick comments (5)
packages/sample-app/package.json (1)
41-41: Inconsistent script pattern compared to other run scripts.Other scripts follow the
npm run build && node dist/src/...pattern, while this new script usesnpx tsxdirectly. This bypasses TypeScript compilation and runs the source file directly. Consider aligning with the existing pattern for consistency:- "run:github_experiment": "npx tsx src/sample_github_experiment.ts", + "run:github_experiment": "npm run build && node dist/src/sample_github_experiment.js",Alternatively, if the intent is faster iteration during development, this is acceptable but could benefit from a comment or naming convention (e.g.,
dev:github_experiment).packages/traceloop-sdk/src/lib/interfaces/experiment.interface.ts (1)
105-110: Consider consolidatingTaskResultwith the existingTaskResponsetype.
TaskResultis very similar toTaskResponse(lines 33-40). The main difference is thatTaskResulthas optionaloutputwhileTaskResponsehas requiredoutput. Consider whether these can be unified or ifTaskResultcan extend/reuseTaskResponsewithPartial<>orPick<>to reduce duplication.// Alternative: Derive TaskResult from TaskResponse export type TaskResult = Omit<TaskResponse, 'output' | 'timestamp' | 'input_schema_mapping'> & { output?: Record<string, any>; };packages/sample-app/src/sample_github_experiment.ts (1)
49-75: Add defensive checks for input data.The task function assumes
row.queryexists and is a string. If the dataset row lacks this field, the code will fail at runtime. Consider adding validation:const researchTask: ExperimentTaskFunction = async ( row: TaskInput, ): Promise<TaskOutput> => { - console.log(`Processing question: ${row.query}`); - const question = row.query as string; + const question = row.query; + if (typeof question !== "string" || !question) { + throw new Error("Missing or invalid 'query' field in input row"); + } + console.log(`Processing question: ${question}`);This provides clearer error messages when dataset schema doesn't match expectations.
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts (2)
404-434: Consider addingstopOnErrorsupport for consistency with local execution.
runLocallyhas access tostopOnErroroption (fromExperimentRunOptions), butexecuteTasksLocallyalways continues on error. Consider adding an option to halt early on failure for parity:private async executeTasksLocally<TInput, TOutput>( task: ExperimentTaskFunction<TInput, TOutput>, rows: Record<string, any>[], stopOnError = false, ): Promise<TaskResult[]> { // ... for (const row of rows) { try { // ... } catch (error) { results.push({ /* ... */ }); if (stopOnError) { break; } } } }
479-483: Duplicatedcreated_from: "github"metadata assignment.
created_from: "github"is set in bothrun()(line 87) andrunInGithub()(line 482). Sincerun()always sets it when delegating torunInGithub(), the assignment here is redundant when called viarun(). However, it's needed ifrunInGithub()is called directly. Consider documenting this or ensuring only one location is responsible.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (5)
packages/sample-app/package.json(1 hunks)packages/sample-app/src/sample_github_experiment.ts(1 hunks)packages/traceloop-sdk/src/lib/client/experiment/experiment.ts(3 hunks)packages/traceloop-sdk/src/lib/interfaces/experiment.interface.ts(1 hunks)packages/traceloop-sdk/src/lib/node-server-sdk.ts(1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
packages/*/package.json
📄 CodeRabbit inference engine (CLAUDE.md)
Use workspace:* for intra-repo package dependencies in package.json
Files:
packages/sample-app/package.json
packages/{instrumentation-*,traceloop-sdk}/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Import AI/LLM semantic attribute constants from @traceloop/ai-semantic-conventions rather than hardcoding strings
Files:
packages/traceloop-sdk/src/lib/node-server-sdk.tspackages/traceloop-sdk/src/lib/interfaces/experiment.interface.tspackages/traceloop-sdk/src/lib/client/experiment/experiment.ts
packages/traceloop-sdk/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
packages/traceloop-sdk/**/*.{ts,tsx}: Use the provided decorators (@workflow, @task, @agent) for workflow/task/agent spans instead of re-implementing them
For manual LLM operations, use trace.withLLMSpan from @traceloop/node-server-sdk
Files:
packages/traceloop-sdk/src/lib/node-server-sdk.tspackages/traceloop-sdk/src/lib/interfaces/experiment.interface.tspackages/traceloop-sdk/src/lib/client/experiment/experiment.ts
packages/traceloop-sdk/src/lib/node-server-sdk.ts
📄 CodeRabbit inference engine (CLAUDE.md)
Register new instrumentation packages during SDK initialization
Files:
packages/traceloop-sdk/src/lib/node-server-sdk.ts
🧠 Learnings (6)
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/{instrumentation-*,traceloop-sdk}/**/*.{ts,tsx} : Import AI/LLM semantic attribute constants from traceloop/ai-semantic-conventions rather than hardcoding strings
Applied to files:
packages/traceloop-sdk/src/lib/node-server-sdk.tspackages/traceloop-sdk/src/lib/interfaces/experiment.interface.tspackages/sample-app/src/sample_github_experiment.ts
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/traceloop-sdk/src/lib/node-server-sdk.ts : Register new instrumentation packages during SDK initialization
Applied to files:
packages/traceloop-sdk/src/lib/node-server-sdk.ts
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/traceloop-sdk/**/*.{ts,tsx} : For manual LLM operations, use trace.withLLMSpan from traceloop/node-server-sdk
Applied to files:
packages/traceloop-sdk/src/lib/node-server-sdk.tspackages/sample-app/src/sample_github_experiment.ts
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/traceloop-sdk/**/*.{ts,tsx} : Use the provided decorators (workflow, task, agent) for workflow/task/agent spans instead of re-implementing them
Applied to files:
packages/traceloop-sdk/src/lib/node-server-sdk.tspackages/traceloop-sdk/src/lib/interfaces/experiment.interface.tspackages/sample-app/src/sample_github_experiment.ts
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/traceloop-sdk/package.json : When adding a new instrumentation package, add it to the main SDK dependencies
Applied to files:
packages/traceloop-sdk/src/lib/node-server-sdk.ts
📚 Learning: 2025-08-13T08:00:42.668Z
Learnt from: galzilber
Repo: traceloop/openllmetry-js PR: 643
File: packages/traceloop-sdk/src/lib/interfaces/dataset.interface.ts:27-27
Timestamp: 2025-08-13T08:00:42.668Z
Learning: In packages/traceloop-sdk/src/lib/interfaces/dataset.interface.ts, when the user said "there is no date" in response to a comment about removing "date" from column type unions, they may have been referring to local changes not yet reflected in the PR branch, as the code still contained "date" in the union types at lines 27 and 43.
Applied to files:
packages/traceloop-sdk/src/lib/interfaces/experiment.interface.ts
🧬 Code graph analysis (1)
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts (2)
packages/traceloop-sdk/src/lib/interfaces/experiment.interface.ts (7)
ExperimentTaskFunction(6-11)ExperimentRunOptions(20-29)ExperimentRunResult(42-48)RunInGithubResponse(121-125)GithubContext(98-103)TaskResult(105-110)RunInGithubOptions(112-119)packages/traceloop-sdk/src/lib/client/dataset/row.ts (1)
data(33-35)
🪛 GitHub Actions: CI
packages/sample-app/src/sample_github_experiment.ts
[warning] 1-1: Prettier formatting issues detected. Run 'prettier --write' to fix code style.
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts
[warning] 1-1: Prettier formatting issues detected. Run 'prettier --write' to fix code style.
🔇 Additional comments (6)
packages/traceloop-sdk/src/lib/node-server-sdk.ts (1)
28-31: LGTM!The new GitHub-related interfaces are correctly exported from the SDK's public API, enabling consumers to use these types when integrating with GitHub workflows.
packages/traceloop-sdk/src/lib/interfaces/experiment.interface.ts (1)
98-103: LGTM!The
GithubContext,RunInGithubOptions, andRunInGithubResponseinterfaces are well-structured and appropriately typed for the GitHub Actions integration.Also applies to: 112-125
packages/sample-app/src/sample_github_experiment.ts (2)
91-102: Type narrowing for result handling is appropriate.The check
"taskResults" in resultscorrectly discriminates betweenExperimentRunResult(local) andRunInGithubResponse(GitHub). This aligns with the union return type ofExperiment.run.
1-9: Address Prettier formatting issues.The CI pipeline reports formatting issues. Run
prettier --writeto fix code style before merging.⛔ Skipped due to learnings
Learnt from: CR Repo: traceloop/openllmetry-js PR: 0 File: CLAUDE.md:0-0 Timestamp: 2025-08-24T22:08:07.023Z Learning: Applies to packages/traceloop-sdk/**/*.{ts,tsx} : For manual LLM operations, use trace.withLLMSpan from traceloop/node-server-sdkLearnt from: CR Repo: traceloop/openllmetry-js PR: 0 File: CLAUDE.md:0-0 Timestamp: 2025-08-24T22:08:07.023Z Learning: Applies to packages/traceloop-sdk/src/lib/node-server-sdk.ts : Register new instrumentation packages during SDK initializationLearnt from: CR Repo: traceloop/openllmetry-js PR: 0 File: CLAUDE.md:0-0 Timestamp: 2025-08-24T22:08:07.023Z Learning: Applies to packages/{instrumentation-*,traceloop-sdk}/**/*.{ts,tsx} : Import AI/LLM semantic attribute constants from traceloop/ai-semantic-conventions rather than hardcoding stringsLearnt from: nina-kollman Repo: traceloop/openllmetry-js PR: 672 File: packages/traceloop-sdk/src/lib/client/experiment/experiment.ts:212-231 Timestamp: 2025-08-22T12:44:43.893Z Learning: In the Traceloop SDK experiment system, JSONL datasets use a custom format where the first line contains column definitions and subsequent lines contain the actual data rows. The parseJsonlToRows method in packages/traceloop-sdk/src/lib/client/experiment/experiment.ts intentionally skips the first line (index 0) to avoid parsing the column definition as data.packages/traceloop-sdk/src/lib/client/experiment/experiment.ts (2)
361-399: LGTM - GitHub context extraction is well-structured.The method properly validates the GitHub Actions environment, extracts required environment variables, parses the PR number from the ref, and constructs a proper PR URL. The error messages are clear and actionable.
440-525: TherunInGithubmethod is well-implemented overall.The method correctly:
- Generates experiment slugs with fallback
- Validates the task function
- Fetches dataset rows and executes tasks locally
- Prepares the payload with proper snake_case field naming for the API
- Handles both camelCase and snake_case response fields (lines 516-518)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed 11092cd in 2 minutes and 4 seconds. Click for details.
- Reviewed
96lines of code in4files - Skipped
0files when reviewing. - Skipped posting
6draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/sample-app/package.json:41
- Draft comment:
Updated 'run:github_experiment' script to compile TypeScript before execution. This change aligns with the other commands and promotes consistency. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
2. packages/sample-app/src/sample_experiment.ts:129
- Draft comment:
Added conditional checks for 'taskResults' before logging experiment results to prevent runtime errors if the property is missing. Optionally, consider logging an alternative message when the expected property is absent. - Reason this comment was not posted:
Confidence changes required:33%<= threshold50%None
3. packages/sample-app/src/sample_github_experiment.ts:56
- Draft comment:
Changed the model from 'gpt-4' to 'gpt-4o'. Verify that 'gpt-4o' is the intended model identifier to avoid potential API errors. - Reason this comment was not posted:
Comment looked like it was already resolved.
4. packages/traceloop-sdk/src/lib/client/experiment/experiment.ts:359
- Draft comment:
Removed the strict GitHub Actions check in getGithubContext. Now the method only verifies that necessary GitHub environment variables are present. Ensure this behavior is intended to allow execution outside of strict CI environments. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%The comment is asking the PR author to ensure that the behavior is intended, which violates the rule against asking for confirmation of intention. It does not provide a specific suggestion or point out a clear issue with the code.
5. packages/traceloop-sdk/src/lib/client/experiment/experiment.ts:492
- Draft comment:
Updated the API endpoint to use '/v2/experiments/run-in-github'. Ensure the backend is updated to support this versioned route. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%The comment is asking the PR author to ensure that the backend is updated to support the new versioned route. This falls under the rule of not asking the author to ensure behavior is intended or tested. Therefore, this comment should be removed.
6. packages/sample-app/src/sample_github_experiment.ts:56
- Draft comment:
Typo found: The model identifier 'gpt-4o' on line 56 appears to be a typographical error. It may have been intended to be 'gpt-4'. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50% The comment assumes "gpt-4o" is a typo for "gpt-4", but this is incorrect. GPT-4o (GPT-4 Omni) is a real, valid model from OpenAI that was released in 2024. The change from "gpt-4" to "gpt-4o" is clearly intentional - it's an upgrade to a newer model. The automated tool that generated this comment doesn't recognize "gpt-4o" as a valid model identifier. This is a false positive and the comment should be deleted. Could there be a scenario where the author accidentally typed "gpt-4o" when they meant "gpt-4"? Perhaps in some contexts or codebases, "gpt-4o" might not be available or supported? While it's theoretically possible the author made a mistake, the burden of proof is on the comment to show strong evidence of an error. "gpt-4o" is a well-known, widely-used model identifier, and there's no indication this is a typo. The change is clearly intentional, and the comment provides no evidence otherwise. This comment should be deleted. It incorrectly identifies "gpt-4o" as a typo when it's actually a valid OpenAI model identifier. The change from "gpt-4" to "gpt-4o" appears to be an intentional upgrade to a newer model.
Workflow ID: wflow_PTHu7ezeIr1ln65L
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed 0619959 in 1 minute and 14 seconds. Click for details.
- Reviewed
80lines of code in2files - Skipped
0files when reviewing. - Skipped posting
6draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/sample-app/src/sample_github_experiment.ts:57
- Draft comment:
Minor formatting update: The system message content is now split across multiple lines. This is purely stylistic and does not affect functionality. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
2. packages/sample-app/src/sample_github_experiment.ts:77
- Draft comment:
Minor formatting change: The log statement for GitHub Actions context is reformatted into a multi-line call for improved readability. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
3. packages/sample-app/src/sample_github_experiment.ts:105
- Draft comment:
Minor formatting change: The log message indicating GitHub execution results is split across multiple lines. No functional changes. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
4. packages/traceloop-sdk/src/lib/client/experiment/experiment.ts:70
- Draft comment:
Documentation update: Redundant JSDoc lines about GitHub Actions context have been removed to reduce clutter. Ensure the remaining comments accurately reflect behavior. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
5. packages/traceloop-sdk/src/lib/client/experiment/experiment.ts:84
- Draft comment:
Minor formatting improvement: The conditional assignment for experimentMetadata is now split across multiple lines for better readability without changing functionality. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
6. packages/traceloop-sdk/src/lib/client/experiment/experiment.ts:506
- Draft comment:
Minor formatting update: The returned experimentSlug assignment is split over multiple lines for consistency and readability. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
Workflow ID: wflow_LTjT9pQ39G8WqEH4
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts (1)
79-93: Logic bugs remain in GitHub Actions branch delegation.The issues flagged in previous reviews are still present:
Line 81: Empty string fallback
options.datasetSlug || ""will pass an empty string torunInGithub, which eventually callsgetDatasetRows()that throws "Dataset slug is required". Validate upfront.Lines 85-87: The condition checks
options.relatedRefbut then spreadsoptions.aux. This is inverted logic—either checkoptions.auxor ensure consistency.if (process.env.GITHUB_ACTIONS === "true") { + if (!options.datasetSlug) { + throw new Error("datasetSlug is required when running in GitHub Actions"); + } return await this.runInGithub(task, { - datasetSlug: options.datasetSlug || "", + datasetSlug: options.datasetSlug, datasetVersion: options.datasetVersion, evaluators: options.evaluators, experimentSlug: options.experimentSlug, - experimentMetadata: options.relatedRef - ? { ...options.aux, created_from: "github" } - : { created_from: "github" }, + experimentMetadata: { ...options.aux, created_from: "github" }, experimentRunMetadata: { ...(options.relatedRef && { related_ref: options.relatedRef }), ...(options.aux && { aux: options.aux }), }, }); }
🧹 Nitpick comments (2)
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts (1)
499-505: Redundant response check loses detailed error information.The check at lines 499-502 is redundant because
handleResponse()at line 505 already validatesresponse.okand provides better error handling by parsing the JSON error body. The current sequence throws a generic message beforehandleResponse()can extract the API's detailed error.Either remove the redundant check and let
handleResponse()handle errors, or if you need custom handling here, parse the error body:- if (!response.ok) { - throw new Error( - `Failed to submit GitHub experiment: ${response.status} ${response.statusText}`, - ); - } - const data = await this.handleResponse(response);packages/sample-app/src/sample_github_experiment.ts (1)
49-76: Consider defensive handling for input type.Line 53 uses an unsafe type assertion
row.query as string. If the dataset contains non-string query values, this could produce unexpected behavior.For more robust samples that users might copy:
- const question = row.query as string; + const question = String(row.query ?? "");
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
packages/sample-app/src/sample_github_experiment.ts(1 hunks)packages/traceloop-sdk/src/lib/client/experiment/experiment.ts(3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
packages/{instrumentation-*,traceloop-sdk}/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Import AI/LLM semantic attribute constants from @traceloop/ai-semantic-conventions rather than hardcoding strings
Files:
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts
packages/traceloop-sdk/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
packages/traceloop-sdk/**/*.{ts,tsx}: Use the provided decorators (@workflow, @task, @agent) for workflow/task/agent spans instead of re-implementing them
For manual LLM operations, use trace.withLLMSpan from @traceloop/node-server-sdk
Files:
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts
🧠 Learnings (3)
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/{instrumentation-*,traceloop-sdk}/**/*.{ts,tsx} : Import AI/LLM semantic attribute constants from traceloop/ai-semantic-conventions rather than hardcoding strings
Applied to files:
packages/sample-app/src/sample_github_experiment.ts
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/traceloop-sdk/**/*.{ts,tsx} : For manual LLM operations, use trace.withLLMSpan from traceloop/node-server-sdk
Applied to files:
packages/sample-app/src/sample_github_experiment.ts
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/traceloop-sdk/**/*.{ts,tsx} : Use the provided decorators (workflow, task, agent) for workflow/task/agent spans instead of re-implementing them
Applied to files:
packages/sample-app/src/sample_github_experiment.ts
🔇 Additional comments (6)
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts (3)
98-104: LGTM!Clean extraction of local execution logic into a dedicated private method.
358-391: LGTM!Well-structured context extraction with proper validation and clear error messages. The restriction to PR events is appropriately enforced with a descriptive error.
393-426: LGTM!Proper per-row error isolation allows partial experiment completion. Error handling is consistent and preserves metadata for debugging.
packages/sample-app/src/sample_github_experiment.ts (3)
11-35: LGTM!Solid initialization pattern with proper error handling for SDK init failure and client validation.
78-121: LGTM!Clean execution flow with proper type discrimination between local and GitHub execution results. Comprehensive error handling includes stack traces for debugging.
124-128: LGTM!Standard top-level error handler for async entry point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed d3573c7 in 1 minute and 16 seconds. Click for details.
- Reviewed
13lines of code in1files - Skipped
0files when reviewing. - Skipped posting
2draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/sample-app/src/sample_github_experiment.ts:53
- Draft comment:
Added semicolon for style consistency. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%This comment is purely informative and does not provide any actionable feedback or suggestions. It simply states that a semicolon was added for style consistency, which does not require any action from the PR author.
2. packages/sample-app/src/sample_github_experiment.ts:56
- Draft comment:
Typo detected: The model name 'gpt-4o' appears to be misspelled. Did you mean 'gpt-4'? - Reason this comment was not posted:
Comment was on unchanged code.
Workflow ID: wflow_iV7xkfrkTZQy1tmG
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
packages/traceloop-sdk/src/lib/interfaces/experiment.interface.ts (1)
98-127: DifferentiateTaskResponsevsTaskResultto avoid confusionYou now have two very similar per-row types (
TaskResponsefor local runs andTaskResultfor GitHub runs). The shapes are close enough that it’s easy for consumers (and future maintainers) to mix them up. Consider either:
- Adding short doc comments explaining when each is used (SDK-local vs GitHub run payload), or
- Extracting a shared base type and extending it for each use-case, so the naming difference is enforced by the type hierarchy instead of being purely conventional.
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts (2)
381-414: Per-row error capture inexecuteTasksLocallyis good; consider richer error metadataThe per-row try/catch in
executeTasksLocallyis a nice improvement for the GitHub path, ensuring one bad row doesn’t abort the entire run and exposingerroralongside the originalinput.You might consider also including
error.name(or a simple classification) inmetadataso the backend / UI can distinguish different failure types without parsing the message:metadata: { rowId: row.id, timestamp: Date.now(), errorType: error instanceof Error ? error.name : "UnknownError", }
420-502: ReusevalidateRunOptionsfor GitHub runs to keep evaluator validation consistentIn
runLocallyyou callvalidateRunOptionsto enforce that the task is a function and evaluators have valid names/versions. InrunInGithub, you recheck only thattaskis a function and then map evaluators directly to slugs without any structural validation.For consistency and earlier feedback on invalid evaluator configs, consider reusing the existing validator here:
// Near the top of runInGithub, before getGithubContext(): this.validateRunOptions(task, { datasetSlug, datasetVersion, evaluators, experimentSlug, });This would let you drop the manual
tasktypeof check and ensure both local and GitHub paths reject malformed evaluator entries in the same way.packages/sample-app/src/sample_github_experiment.ts (1)
125-128: Make top-level error logging robust to non-Error throwsEverywhere else you defensively format errors with
error instanceof Error ? error.message : String(error), but the top-levelmain().catchassumes anError:main().catch((error) => { console.error("💥 Application failed:", error.message); process.exit(1); });To avoid logging
undefinedif something throws a non-Error (string, etc.), mirror the earlier pattern:-main().catch((error) => { - console.error("💥 Application failed:", error.message); - process.exit(1); -}); +main().catch((error) => { + console.error( + "💥 Application failed:", + error instanceof Error ? error.message : String(error), + ); + process.exit(1); +});
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
packages/sample-app/src/sample_github_experiment.ts(1 hunks)packages/traceloop-sdk/src/lib/client/experiment/experiment.ts(3 hunks)packages/traceloop-sdk/src/lib/interfaces/experiment.interface.ts(1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
packages/{instrumentation-*,traceloop-sdk}/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Import AI/LLM semantic attribute constants from @traceloop/ai-semantic-conventions rather than hardcoding strings
Files:
packages/traceloop-sdk/src/lib/client/experiment/experiment.tspackages/traceloop-sdk/src/lib/interfaces/experiment.interface.ts
packages/traceloop-sdk/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
packages/traceloop-sdk/**/*.{ts,tsx}: Use the provided decorators (@workflow, @task, @agent) for workflow/task/agent spans instead of re-implementing them
For manual LLM operations, use trace.withLLMSpan from @traceloop/node-server-sdk
Files:
packages/traceloop-sdk/src/lib/client/experiment/experiment.tspackages/traceloop-sdk/src/lib/interfaces/experiment.interface.ts
🧠 Learnings (4)
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/{instrumentation-*,traceloop-sdk}/**/*.{ts,tsx} : Import AI/LLM semantic attribute constants from traceloop/ai-semantic-conventions rather than hardcoding strings
Applied to files:
packages/sample-app/src/sample_github_experiment.tspackages/traceloop-sdk/src/lib/client/experiment/experiment.tspackages/traceloop-sdk/src/lib/interfaces/experiment.interface.ts
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/traceloop-sdk/**/*.{ts,tsx} : For manual LLM operations, use trace.withLLMSpan from traceloop/node-server-sdk
Applied to files:
packages/sample-app/src/sample_github_experiment.ts
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/traceloop-sdk/**/*.{ts,tsx} : Use the provided decorators (workflow, task, agent) for workflow/task/agent spans instead of re-implementing them
Applied to files:
packages/sample-app/src/sample_github_experiment.ts
📚 Learning: 2025-08-13T08:00:42.668Z
Learnt from: galzilber
Repo: traceloop/openllmetry-js PR: 643
File: packages/traceloop-sdk/src/lib/interfaces/dataset.interface.ts:27-27
Timestamp: 2025-08-13T08:00:42.668Z
Learning: In packages/traceloop-sdk/src/lib/interfaces/dataset.interface.ts, when the user said "there is no date" in response to a comment about removing "date" from column type unions, they may have been referring to local changes not yet reflected in the PR branch, as the code still contained "date" in the union types at lines 27 and 43.
Applied to files:
packages/traceloop-sdk/src/lib/interfaces/experiment.interface.ts
🧬 Code graph analysis (1)
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts (1)
packages/traceloop-sdk/src/lib/interfaces/experiment.interface.ts (7)
ExperimentTaskFunction(6-11)ExperimentRunOptions(20-29)RunInGithubOptions(112-121)ExperimentRunResult(42-48)RunInGithubResponse(123-127)GithubContext(98-103)TaskResult(105-110)
🔇 Additional comments (3)
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts (3)
75-84:runnow returns a union type; ensure callers narrow appropriatelyThe new signature
async run<TInput, TOutput>( task: ExperimentTaskFunction<TInput, TOutput>, options: ExperimentRunOptions | RunInGithubOptions = {}, ): Promise<ExperimentRunResult | RunInGithubResponse>changes the public API from a single result type to a union. That’s reasonable given the auto GitHub dispatch, but any existing callers that assumed
ExperimentRunResultwill now need to narrow the type (e.g.'taskResults' in result) similar to the new GitHub sample.It’s worth double‑checking downstream usages and updating docs/examples to reflect this unioned return, so consumers aren’t surprised by the new shape when running under GitHub Actions.
86-182: Local path extraction intorunLocallylooks correctPulling the non‑GitHub logic into
runLocallypreserves the previous behavior: same experiment initialization, dataset loading, per‑row task execution, evaluator invocation, and aggregated result shape. The separation makes the GitHub path easier to reason about without changing local semantics.
346-379: GitHub context extraction and PR validation are solid
getGithubContextcorrectly validates all required env vars (repository, ref, sha, actor) and enforces that the ref corresponds to a pull request (refs/pull/<n>/...), producing clear error messages when assumptions aren’t met. This should fail fast in misconfigured workflows while giving enough information to debug.
🧪 Traceloop Experiment ResultsExperiment: research-ts Evaluators Results
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed 9db388d in 45 seconds. Click for details.
- Reviewed
75lines of code in1files - Skipped
0files when reviewing. - Skipped posting
2draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/traceloop-sdk/src/lib/client/experiment/experiment.ts:388
- Draft comment:
Using Promise.all improves concurrency but ensure tasks are safe to run in parallel and consider throttling for large datasets. - Reason this comment was not posted:
Confidence changes required:33%<= threshold50%None
2. packages/traceloop-sdk/src/lib/client/experiment/experiment.ts:488
- Draft comment:
Removing explicit mapping of response fields relies on transformApiResponse. Ensure it always provides the expected keys (e.g. experimentSlug fallback) to avoid breaking consumers. - Reason this comment was not posted:
Confidence changes required:33%<= threshold50%None
Workflow ID: wflow_8Z2ebIW5B8PxSVM2
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed b4a5572 in 51 seconds. Click for details.
- Reviewed
13lines of code in1files - Skipped
0files when reviewing. - Skipped posting
1draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/traceloop-sdk/src/lib/client/experiment/experiment.ts:407
- Draft comment:
The change adds a trailing comma after the arrow function block. This is stylistic and aligns with common formatting practices. Ensure it conforms with the project's lint/style guidelines. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
Workflow ID: wflow_MsnkOpJob6XIw1Ci
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts(3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
packages/{instrumentation-*,traceloop-sdk}/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Import AI/LLM semantic attribute constants from @traceloop/ai-semantic-conventions rather than hardcoding strings
Files:
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts
packages/traceloop-sdk/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
packages/traceloop-sdk/**/*.{ts,tsx}: Use the provided decorators (@workflow, @task, @agent) for workflow/task/agent spans instead of re-implementing them
For manual LLM operations, use trace.withLLMSpan from @traceloop/node-server-sdk
Files:
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts
🧠 Learnings (1)
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/{instrumentation-*,traceloop-sdk}/**/*.{ts,tsx} : Import AI/LLM semantic attribute constants from traceloop/ai-semantic-conventions rather than hardcoding strings
Applied to files:
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts
🧬 Code graph analysis (1)
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts (1)
packages/traceloop-sdk/src/lib/interfaces/experiment.interface.ts (7)
ExperimentTaskFunction(6-11)ExperimentRunOptions(20-29)RunInGithubOptions(112-121)ExperimentRunResult(42-48)RunInGithubResponse(123-127)GithubContext(98-103)TaskResult(105-110)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and test
🔇 Additional comments (4)
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts (4)
15-18: LGTM! New imports support GitHub Actions integration.The imported types (GithubContext, TaskResult, RunInGithubOptions, RunInGithubResponse) are properly used throughout the new GitHub Actions functionality.
346-379: Implementation is sound; ensure environment check is consistent with caller.The
getGithubContext()implementation correctly:
- Validates all required GitHub environment variables
- Extracts PR number with appropriate regex
- Restricts usage to PR events with clear error messages
However, ensure the environment detection in
run()(line 79) is consistent with the expectations here. Ifrun()only checksGITHUB_ACTIONS === "true"but this method requires additional variables likeGITHUB_REPOSITORY,GITHUB_REF, etc., the error will occur late in execution rather than at the routing decision point.
381-412: Excellent implementation using Promise.all and map!This method addresses previous feedback perfectly:
- Uses
Promise.allwithmapfor efficient parallel task execution (as requested in past review)- Proper error handling captures individual task failures without stopping the entire run
- Clear result structure with either
outputorerrorper taskThe implementation is clean and type-safe.
Based on past review feedback requesting this pattern.
414-495: Well-structured implementation that addresses previous concerns.The
runInGithub()method correctly:
- Validates task function upfront
- Executes tasks locally and captures results
- Properly merges
experimentMetadata(lines 455-458) - addressing ellipsis-dev's feedback about it being ignored- Correctly handles both
relatedRefandauxconditionally (lines 460-464) - fixing the previous logic bug mentioned by coderabbitai- Constructs complete
github_contextpayload with all required fields- Provides comprehensive error wrapping
The metadata construction at lines 455-464 is now correct:
const mergedExperimentMetadata = { ...(experimentMetadata || {}), created_from: "github", }; const mergedExperimentRunMetadata = { ...(experimentRunMetadata || {}), ...(relatedRef && { related_ref: relatedRef }), ...(aux && { aux: aux }), };This properly spreads all provided metadata before adding GitHub-specific fields.
Based on past review feedback that has been addressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts (1)
74-84: Unsafe cast toRunInGithubOptionsand lack of upfrontdatasetSlugvalidation in GitHub path.
optionsis typed asExperimentRunOptions | RunInGithubOptions, butdatasetSlugis optional onExperimentRunOptions. The cast:return await this.runInGithub(task, options as RunInGithubOptions);bypasses that and lets
runInGithubassume a non-emptydatasetSlug, even though callers ofrunare not forced to provide it. At runtimegetDatasetRowswill still throw, but the error is late and the type contract ofrunInGithubis effectively weakened.Consider validating
datasetSlugbefore delegating, and only then asserting the type, e.g.:if (process.env.GITHUB_ACTIONS === "true") { if (!("datasetSlug" in options) || !options.datasetSlug) { throw new Error("datasetSlug is required when running in GitHub Actions"); } return await this.runInGithub(task, options as RunInGithubOptions); }This removes the unsound cast and gives a clearer error specific to the GitHub flow.
🧹 Nitpick comments (1)
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts (1)
418-495: GitHub experiment flow is well-structured; consider reusing evaluator validation.The
runInGithubflow is coherent: it validates the task, pulls GitHub context, loads dataset rows, executes tasks locally, builds evaluator slugs, mergesexperimentMetadata(now correctly preserving caller metadata and addingcreated_from: "github"), foldsrelatedRef/auxinto run metadata, and posts a single payload to the backend. That addresses the prior metadata-construction concern.One improvement: evaluator validation here diverges from the local path.
validateRunOptionsalready enforces non-empty evaluator names/versions, butrunInGithubcurrently just maps evaluators without checks. To keep behavior consistent and avoid malformed evaluator definitions getting through only on GitHub runs, you could reuse the existing validator:if (!task || typeof task !== "function") { throw new Error("Task function is required and must be a function"); } this.validateRunOptions(task, { evaluators });This keeps all evaluator-related rules in one place.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts(3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
packages/{instrumentation-*,traceloop-sdk}/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Import AI/LLM semantic attribute constants from @traceloop/ai-semantic-conventions rather than hardcoding strings
Files:
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts
packages/traceloop-sdk/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
packages/traceloop-sdk/**/*.{ts,tsx}: Use the provided decorators (@workflow, @task, @agent) for workflow/task/agent spans instead of re-implementing them
For manual LLM operations, use trace.withLLMSpan from @traceloop/node-server-sdk
Files:
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts
🧠 Learnings (1)
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/{instrumentation-*,traceloop-sdk}/**/*.{ts,tsx} : Import AI/LLM semantic attribute constants from traceloop/ai-semantic-conventions rather than hardcoding strings
Applied to files:
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts
🧬 Code graph analysis (1)
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts (1)
packages/traceloop-sdk/src/lib/interfaces/experiment.interface.ts (7)
ExperimentTaskFunction(6-11)ExperimentRunOptions(20-29)RunInGithubOptions(112-121)ExperimentRunResult(42-48)RunInGithubResponse(123-127)GithubContext(98-103)TaskResult(105-110)
🔇 Additional comments (4)
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts (4)
15-18: New GitHub-related types wired correctly into client.Importing
GithubContext,TaskResult,RunInGithubOptions, andRunInGithubResponsefrom the shared interfaces keeps the client aligned with the central type definitions; no issues here.
89-182: Local execution refactor intorunLocallypreserves existing behavior.The extraction of the previous
runlogic intorunLocallylooks behaviorally equivalent: experiment slug generation, dataset row loading, per-row task execution, evaluator invocation, and result shaping are unchanged and still guarded byvalidateRunOptions. No new issues spotted.
346-379: GitHub context extraction is PR-only; ensure this matches the intended usage.
getGithubContextcorrectly derives repository, commit SHA, actor, and PR URL fromGITHUB_*env vars, but it hard-fails whenGITHUB_REFdoesn’t matchrefs/pull/<number>/…. That meansrunInGithubwill currently only work for pull-request-triggered workflows, notpushorworkflow_dispatchruns on branches.If that’s intentional, it’s fine, but it should be documented and you may want a more targeted error message or pre-check at call sites to avoid surprises when workflows run on non-PR events.
384-412: Parallel task execution withPromise.allis a good improvement; behavior is clear.
executeTasksLocallyruns the task for all rows in parallel and returns structuredTaskResultentries, capturing errors per row without throwing. This aligns well with the GitHub use case and is cleanly separated from the localExperimentRunResultpath.
Important
Add GitHub Actions support for experiments with new
runInGithub()method and sample experiment.runInGithub()method inExperimentclass inexperiment.tsfor GitHub Actions execution.sample_github_experiment.tsfor GitHub-backed experiment sample.run:github_experimentscript inpackage.json.GithubContext,TaskResult,RunInGithubOptions, andRunInGithubResponsetypes inexperiment.interface.ts.run()method inExperimentclass to detect GitHub Actions and route torunInGithub().sample_experiment.tsto print results only iftaskResultsexist.This description was created by
for b4a5572. You can customize this summary. It will automatically update as commits are pushed.
Summary by CodeRabbit
New Features
Bug Fixes
✏️ Tip: You can customize this high-level summary in your review settings.