fix(experiment): Finetune experiment docs #103
Conversation
**Note**: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

**Caution**: Review failed. The pull request is closed.

**Walkthrough**

Adds new Datasets documentation (quick start and SDK usage), expands Experiments docs with introduction and result-overview pages, rewrites the experiments running-from-code page to an SDK-centric flow with explicit task I/O contracts and TypeScript typings, and updates site navigation in `mint.json`.
**Sequence Diagram(s)**

```mermaid
sequenceDiagram
autonumber
participant Dev as Developer
participant SDK as Traceloop SDK
participant SVC as Traceloop Service
rect rgba(230,240,255,0.35)
note over Dev,SDK: Dataset lifecycle (SDK usage)
Dev->>SDK: initialize(config)
SDK->>SVC: auth + feature sync
Dev->>SDK: create/fromCSV/fromDataFrame
SDK->>SVC: create dataset (draft)
Dev->>SDK: addColumn / addRow(s)
SDK->>SVC: mutate schema/data (draft)
Dev->>SDK: publish(version)
SDK->>SVC: create immutable snapshot
Dev->>SDK: get / getVersionCSV
SDK->>SVC: retrieve dataset/version
end
```
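In code, that lifecycle looks roughly like the TypeScript sketch below. The `client.datasets` accessor and the exact method signatures are assumptions read off the diagram, and the slug, column, and row values are illustrative:

```typescript
import * as traceloop from "@traceloop/node-server-sdk";

// Initialize once; the API key is read from TRACELOOP_API_KEY.
traceloop.initialize({ traceloopSyncEnabled: true });
await traceloop.waitForInitialization();
const client = traceloop.getClient();

// Create a draft dataset, shape its schema and rows, then publish
// an immutable snapshot that experiments can reference by version.
const dataset = await client.datasets.create({
  slug: "medical-questions",
  description: "Questions used for experiment runs",
});
await dataset.addColumn({ name: "question", type: "string" });
await dataset.addRow({ question: "What are the benefits of vitamin D?" });
await dataset.publish(); // creates an immutable version, e.g. "v1"
```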
```mermaid
sequenceDiagram
autonumber
participant Dev as Developer
participant SDK as Traceloop SDK
participant SVC as Traceloop Service
rect rgba(235,255,235,0.35)
note over Dev,SDK: Experiment run via SDK (updated flow)
Dev->>SDK: initialize → waitForInitialization → getClient
Dev->>SDK: experiment.run({ datasetSlug, datasetVersion, task, evaluators, ... })
loop for each dataset row
SDK->>SDK: invoke task(input: TaskInput) -> TaskOutput
SDK->>SVC: log task input/output
SDK->>SVC: run evaluators and log results
end
SDK-->>Dev: run summary and logs
end
```
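And the experiment flow in the same style; the option names follow the diagram, while the evaluator slug is a placeholder:

```typescript
import * as traceloop from "@traceloop/node-server-sdk";
import type { TaskInput, TaskOutput } from "@traceloop/node-server-sdk";

traceloop.initialize();
await traceloop.waitForInitialization();
const client = traceloop.getClient();

// Invoked once per dataset row; the returned fields must cover the
// evaluators' input schema, as the docs in this PR spell out.
const task = async (row: TaskInput): Promise<TaskOutput> => {
  const text = `echo: ${row.question}`; // stand-in for a real LLM call
  return { completion: text, text };
};

await client.experiment.run({
  datasetSlug: "medical-questions",
  datasetVersion: "v1",
  task,
  evaluators: ["medical-advice-check"],
});
```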
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
---
Caution
Changes requested ❌
Reviewed everything up to f1f981e in 2 minutes and 28 seconds.
- Reviewed 670 lines of code in 6 files
- Skipped 10 files when reviewing
- Skipped posting 9 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. `datasets/quick-start.mdx:1`
   - Draft comment: Clear and concise Quick Start docs for datasets.
   - Reason this comment was not posted: Confidence changes required: 0% <= threshold 50%
2. `datasets/sdk-usage.mdx:39`
   - Draft comment: Ensure the getting started guide link (/openllmetry/...) reflects the correct branding if intended.
   - Reason this comment was not posted: Confidence changes required: 33% <= threshold 50%
3. `experiments/introduction.mdx:1`
   - Draft comment: Introduction page is well-structured and clearly outlines experiment capabilities.
   - Reason this comment was not posted: Confidence changes required: 0% <= threshold 50%
4. `experiments/result-overview.mdx:1`
   - Draft comment: Result Overview details are clear and include sufficient visual aids.
   - Reason this comment was not posted: Confidence changes required: 0% <= threshold 50%
5. `experiments/running-from-code.mdx:57`
   - Draft comment: Typo: 'define' should be 'defines' in the task function description.
   - Reason this comment was not posted: Confidence changes required: 33% <= threshold 50%
6. `mint.json:155`
   - Draft comment: Navigation updates for Datasets and Experiments groups look good; verify ordering and branding consistency.
   - Reason this comment was not posted: Confidence changes required: 33% <= threshold 50%
7. `datasets/sdk-usage.mdx:41`
   - Draft comment: Typo alert: The URL path '/openllmetry/getting-started-python' appears to have a misspelling. It might be intended as '/opentelemetry/getting-started-python'.
   - Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%. Looking at the context, this is Traceloop SDK documentation, not OpenTelemetry. The URL path '/openllmetry' could be intentional: it might be a play on words combining 'LLM' (Large Language Models) with 'telemetry'. Without access to the actual website structure or knowing if this URL actually works, I can't be certain this is actually a typo. I might be overthinking this: if it's clearly a typo, it should be fixed, and the unusual spelling could confuse users. However, without being able to verify the correct URL structure or knowing the intentional branding decisions, making assumptions about the correct spelling could introduce errors. Since we can't verify the intended URL structure and this could be intentional branding, we should err on the side of caution and not keep this comment.
8. `experiments/running-from-code.mdx:57`
   - Draft comment: Typo: It should be "defines" instead of "define" in the sentence "Create a task function that define how your AI system processes each dataset row."
   - Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50%. While this is technically correct grammar, fixing minor grammatical issues in documentation is not a critical code change. The meaning is perfectly clear either way. The rules state we should not make purely informative comments or comments about obvious/unimportant issues. The grammar error could make the documentation look less professional, and poor grammar in documentation could reflect badly on the project's quality. While professional documentation is important, this is such a minor grammatical issue that it doesn't significantly impact readability or understanding. The rules explicitly state not to make unimportant comments. Delete this comment as it points out a trivial grammatical issue that doesn't meaningfully impact the documentation's effectiveness.
9. `experiments/running-from-code.mdx:43`
   - Draft comment: Typographical error in the URL: "openllmetry" seems to be a typo; perhaps it should be "opentelemetry".
   - Reason this comment was not posted: Based on historical feedback, this comment is too similar to comments previously marked by users as bad.
Workflow ID: wflow_NFxINPhicVyyo7oj
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
---
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
experiments/introduction.mdx (1)

**32-32**: Remove stray "32" character. It will render as visible text at the end of the page.

```diff
-32
```

datasets/sdk-usage.mdx (1)

**227-227**: Remove stray "227" character at EOF. Likely an artifact of code generation that will render.

```diff
-227
```

experiments/result-overview.mdx (1)

**45-45**: Remove stray trailing number.

```diff
-45
```

experiments/running-from-code.mdx (1)

**259-267**: Double initialization and API mismatch.
- You call both `Traceloop.init()` and `Traceloop.client()`. Earlier, `init()` returned the client; keep one pattern.

```diff
-# Initialize Traceloop
-Traceloop.init()
-client = Traceloop.client()
+# Initialize Traceloop
+client = Traceloop.init()
```
🧹 Nitpick comments (30)
experiments/introduction.mdx (3)
**5-5**: Tighten phrasing: "change of flow" → "flow change". Reads more naturally and avoids awkward construction.

```diff
-Building reliable LLM applications means knowing whether a new prompt, model, or change of flow actually makes things better.
+Building reliable LLM applications means knowing whether a new prompt, model, or flow change actually makes things better.
```
**7-13**: Add alt text and lazy-loading to images for accessibility and performance. Current `img` tags have no alt; consider descriptive alts and lazy loading.

```diff
-<Frame>
-  <img
-    className="block dark:hidden"
-    src="/img/experiment/exp-list-light.png"
-  />
-  <img className="hidden dark:block" src="/img/experiment/exp-list-dark.png" />
-</Frame>
+<Frame>
+  <img
+    className="block dark:hidden"
+    src="/img/experiment/exp-list-light.png"
+    alt="Experiments list view in light theme"
+    loading="lazy"
+  />
+  <img
+    className="hidden dark:block"
+    src="/img/experiment/exp-list-dark.png"
+    alt="Experiments list view in dark theme"
+    loading="lazy"
+  />
+</Frame>
```
**18-31**: Polish card titles and copy; fix pluralization.
- "Compare Experiment Runs Results" is awkward. Prefer "Compare Experiment Runs."
- "evaluator input" → "evaluator inputs." Minor copy tightening elsewhere.

```diff
-  <Card title="Run Multiple Evaluators" icon="list-check">
-    Execute multiple evaluation checks against your dataset
+  <Card title="Run Multiple Evaluators" icon="list-check">
+    Execute multiple evaluation checks against your dataset.
   </Card>
-  <Card title="View Complete Results" icon="table">
-    See all experiment run outputs in a comprehensive table view with relevant indicators and detailed reasoning
+  <Card title="View Complete Results" icon="table">
+    See all experiment run outputs in a comprehensive table with indicators and reasoning.
   </Card>
-  <Card title="Compare Experiment Runs Results" icon="code-compare">
-    Run the same experiment across different dataset versions to see how it affects your workflow
+  <Card title="Compare Experiment Runs" icon="code-compare">
+    Run the same experiment across dataset versions to see how it affects your workflow.
   </Card>
   <Card title="Custom Task Pipelines" icon="code">
-    Add a tailored task to the experiment to create evaluator input. For example: LLM calls, semantic search, etc.
+    Create a custom task pipeline that generates evaluator inputs (e.g., LLM calls, semantic search).
   </Card>
```

datasets/quick-start.mdx (6)
**5-11**: Add alt text and lazy-loading to dataset images. Improves accessibility and page performance.

```diff
 <Frame>
   <img
     className="block dark:hidden"
     src="/img/dataset/dataset-list-light.png"
+    alt="Dataset list view in light theme"
+    loading="lazy"
   />
-  <img className="hidden dark:block" src="/img/dataset/dataset-list-dark.png" />
+  <img
+    className="hidden dark:block"
+    src="/img/dataset/dataset-list-dark.png"
+    alt="Dataset list view in dark theme"
+    loading="lazy"
+  />
 </Frame>
```
**13-15**: Tighten phrasing. "available in the SDK" → "available via the SDK." Also remove extra space.

```diff
-Datasets are simple data tables that you can use to manage your data for experiments and evaluation of your AI applications.
-Datasets are available in the SDK, and they enable you to create versioned snapshots for reproducible testing.
+Datasets are simple data tables to manage data for experiments and evaluation of your AI applications.
+Datasets are available via the SDK and enable versioned snapshots for reproducible testing.
```
**26-30**: Make list items parallel and add terminal punctuation. Minor style improvement flagged by the linter.

```diff
-- **Text**: For prompts, model responses, or any textual data
-- **Number**: For numerical values, scores, or metrics
-- **Boolean**: For true/false flags or binary classifications
+- **Text**: Prompts, model responses, or any textual data.
+- **Number**: Numerical values, scores, or metrics.
+- **Boolean**: True/false flags or binary classifications.
```
**40-46**: Add alt text to the second image set.

```diff
 <Frame>
   <img
     className="block dark:hidden"
     src="/img/dataset/dataset-view-light.png"
+    alt="Dataset edit view in light theme"
+    loading="lazy"
   />
-  <img className="hidden dark:block" src="/img/dataset/dataset-view-dark.png" />
+  <img
+    className="hidden dark:block"
+    src="/img/dataset/dataset-view-dark.png"
+    alt="Dataset edit view in dark theme"
+    loading="lazy"
+  />
 </Frame>
```
**49-53**: Grammar: "Published versions". Fixes adjective form and article.

```diff
-1. Click **Publish Version** to create a stable snapshot
-2. Published versions are immutable
-3. Publish versions are accessible in the SDK
+1. Click **Publish Version** to create a stable snapshot.
+2. Published versions are immutable.
+3. Published versions are accessible via the SDK.
```
**57-61**: Optional: Link to SDK usage page from "version history" step. Helps users discover how to fetch versions programmatically.

```diff
-You can access all published versions of your dataset by opening the version history modal. This allows you to:
+You can access all published versions of your dataset by opening the version history modal. You can also fetch specific versions via the SDK (see Datasets → SDK Usage). This allows you to:
```

datasets/sdk-usage.mdx (4)
**2-2**: Capitalize title consistently: "SDK Usage".

```diff
-title: "SDK usage"
+title: "SDK Usage"
```
**61-70**: Grammar: possessive "patients' medical questions".

```diff
-  description="Dataset with patients medical questions"
+  description="Dataset with patients' medical questions"
```
**175-205**: Align row schema with earlier defined columns. The row example includes fields (prompt, response, model) that weren't added in the "Adding a Column" step for the manual dataset path. Consider either:
- defining those columns beforehand, or
- using a row that matches the defined schema (see the sketch after this diff).

```diff
-const rowData = {
-  user_id: userId,
-  prompt: prompt,
-  response: `This is the model response`,
-  model: "gpt-3.5-turbo",
-  satisfaction_score: 1,
-};
+const rowData = {
+  "user-id": userId,
+  "satisfaction-score": 1,
+};
```
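A sketch of the aligned flow, reusing `myDataset` and `userId` from the reviewed examples; the column-definition shape here is an assumption, not a confirmed SDK signature:

```typescript
// Define the columns first, then add a row that matches that schema.
await myDataset.addColumn([
  { name: "user-id", type: "string" },
  { name: "satisfaction-score", type: "number" },
]);

await myDataset.addRow({ "user-id": userId, "satisfaction-score": 1 });
```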
**20-35**: Use a Minimal Initialization Example with Advanced Options Commented. The SDK's `initialize` function supports both `disableBatch` and `traceloopSyncEnabled` as valid options. For most users, you can rely on your `TRACELOOP_API_KEY` environment variable and call `traceloop.initialize()` with no arguments; advanced flags can then be added as needed for development or prompt-registry workflows (traceloop.com, deepwiki.com).

File: datasets/sdk-usage.mdx
Lines: 20-35

Suggested diff:

```diff
 import * as traceloop from "@traceloop/node-server-sdk";

-// Initialize with comprehensive configuration
-traceloop.initialize({
-  appName: "your-app-name",
-  apiKey: process.env.TRACELOOP_API_KEY,
-  disableBatch: true,
-  traceloopSyncEnabled: true,
-});
+// Minimal initialization (uses TRACELOOP_API_KEY env var)
+traceloop.initialize();
+
+// Advanced options (uncomment if needed):
+// traceloop.initialize({ disableBatch: true });
+// traceloop.initialize({ traceloopSyncEnabled: true });

 // Wait for initialization to complete
 await traceloop.waitForInitialization();

 // Get the client instance for dataset operations
 const client = traceloop.getClient();
```

experiments/result-overview.mdx (4)
**6-12**: Add alt text and lazy-loading to images.

```diff
 <Frame>
   <img
     className="block dark:hidden"
     src="/img/experiment/exp-list-light.png"
+    alt="Experiments overview list in light theme"
+    loading="lazy"
   />
-  <img className="hidden dark:block" src="/img/experiment/exp-list-dark.png" />
+  <img
+    className="hidden dark:block"
+    src="/img/experiment/exp-list-dark.png"
+    alt="Experiments overview list in dark theme"
+    loading="lazy"
+  />
 </Frame>
```
**17-23**: Add alt text and lazy-loading to run list images.

```diff
 <Frame>
   <img
     className="block dark:hidden"
     src="/img/experiment/exp-run-list-light.png"
+    alt="Experiment run list view in light theme"
+    loading="lazy"
   />
-  <img className="hidden dark:block" src="/img/experiment/exp-run-list-dark.png" />
+  <img
+    className="hidden dark:block"
+    src="/img/experiment/exp-run-list-dark.png"
+    alt="Experiment run list view in dark theme"
+    loading="lazy"
+  />
 </Frame>
```
**36-42**: Add alt text and lazy-loading to single run images.

```diff
 <Frame>
   <img
     className="block dark:hidden"
     src="/img/experiment/exp-run-light.png"
+    alt="Experiment run details view in light theme"
+    loading="lazy"
   />
-  <img className="hidden dark:block" src="/img/experiment/exp-run-dark.png" />
+  <img
+    className="hidden dark:block"
+    src="/img/experiment/exp-run-dark.png"
+    alt="Experiment run details view in dark theme"
+    loading="lazy"
+  />
 </Frame>
```
**5-15**: Optional cross-links to adjacent docs. Add "See also" links to Introduction and Running via SDK to streamline navigation.

```diff
-All experiments are logged in the Traceloop platform. Each experiment is executed through the SDK.
+All experiments are logged in the Traceloop platform. Each experiment is executed through the SDK.
+
+See also: [Introduction](/experiments/introduction) · [Run via SDK](/experiments/running-from-code)
```

experiments/running-from-code.mdx (9)
**21-37**: TypeScript initialization looks good; ensure minimal-first guidance. The advanced options are helpful, but starting with a minimal example improves DX. Consider mirroring the change made in SDK Usage to show minimal init first, then advanced options.
**57-62**: Grammar fixes in task-function description.

```diff
-Create a task function that define how your AI system processes each dataset row. The task is one of the experiments parameters, it will run it on each dataset row.
+Create a task function that defines how your AI system processes each dataset row. The task is one of the experiment's parameters and will run on each dataset row.
```
**65-67**: Include missing typing imports for the Python signature snippet. Without imports, the snippet may confuse readers.

```diff
-task: Callable[[Optional[Dict[str, Any]]], Dict[str, Any]]
+from typing import Callable, Optional, Dict, Any
+
+task: Callable[[Optional[Dict[str, Any]]], Dict[str, Any]]
```

**183-191**: Align dataset slug/version with earlier examples for consistency. Earlier sections use "medical-questions" and publish to v1; keep consistent to reduce confusion.

```diff
-    dataset_slug="medical-q",
-    dataset_version="v1",
+    dataset_slug="medical-questions",
+    dataset_version="v1",
```
**209-231**: Unify Python OpenAI usage style. This section uses `openai.ChatCompletion.acreate` whereas earlier you used `AsyncOpenAI`. Pick one style for consistency.

```diff
-    response = await openai.ChatCompletion.acreate(
+    response = await openai_client.chat.completions.create(
         model="gpt-4",
         messages=[
             {"role": "system", "content": "Be very careful and conservative in your response."},
             {"role": "user", "content": input_data["question"]}
         ]
     )
-    return {"response": response.choices[0].message.content}
+    return {"response": (response.choices or [None])[0].message.content}
```
**268-283**: Add OpenAI key handling in the complete example. The example uses OpenAI but doesn't show API key setup; add a quick note or environment read for completeness.

```diff
-import openai
+import openai, os
@@
-    response = await openai.ChatCompletion.acreate(
+    openai.api_key = os.getenv("OPENAI_API_KEY")
+    response = await openai.ChatCompletion.acreate(
```
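The same fail-fast idea works for the TypeScript snippets in this PR; `requireEnv` below is a hypothetical helper, not part of any SDK:

```typescript
import { OpenAI } from "openai";

// Hypothetical helper: fail fast when the key is missing instead of
// letting the first API call error out mid-experiment.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) throw new Error(`Missing required env var: ${name}`);
  return value;
}

const openai = new OpenAI({ apiKey: requireEnv("OPENAI_API_KEY") });
```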
**349-356**: Parameters list: call out error-handling flags. Since you show `stop_on_error`/`stopOnError` earlier, include it here for completeness.

```diff
 - `experiment_slug` (str): Unique identifier for this experiment
+- `stop_on_error` (bool): Whether to stop on the first task error (Python). In TypeScript, use `stopOnError`.
```
**365-371**: Add best practice about the evaluator-task contract. Highlight that task outputs must include evaluator-required fields (already noted above), repeated here for reinforcement.

```diff
 4. **Use appropriate evaluators** that match your use case
 5. **Compare multiple approaches** systematically to find the best solution
+6. **Ensure task outputs include evaluator input fields** so evaluators can run without errors.
```
**194-201**: Highlight parameter-order differences between Python and TypeScript

The Python and TypeScript SDKs use different call signatures for `experiment.run()`. In Python you supply only named parameters (e.g. `dataset_slug="…"`, `evaluators=[…]`), whereas in TypeScript you pass the experiment task as the first positional argument and an options object second. Adding an explicit callout will help readers avoid confusion.

Please update `experiments/running-from-code.mdx` around the TypeScript snippet (lines 194-201). Above the TS example, insert a note along the lines of:

```diff
+<!-- Note: Unlike the Python SDK, which uses only keyword arguments (e.g. `dataset_slug="…"`) for all settings, the TypeScript SDK takes the experiment task as its first positional argument followed by an options object. -->
```
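Concretely, the TypeScript call shape this note describes would look roughly like the following, with `client` and `medicalTask` coming from the earlier snippets and the slugs being illustrative:

```typescript
// Task first, options object second (per the note above).
await client.experiment.run(medicalTask, {
  datasetSlug: "medical-questions",
  datasetVersion: "v1",
  evaluators: ["medical-advice-check"],
  experimentSlug: "medical-info",
  stopOnError: false,
});
```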
**143-145**: Two groups named "Quick Start" may confuse users; consider disambiguating. We have two separate groups with the same label ("Quick Start") for different products/areas. Suggest renaming the Hub one to "Hub Quick Start" (or similar) to clarify the sidebar.

```diff
 {
-  "group": "Quick Start",
+  "group": "Hub Quick Start",
   "pages": ["hub/getting-started", "hub/configuration"]
 },
```

Also applies to: 80-90
**34-60**: Optional: Add anchors for Datasets and Experiments for quicker access. If these are now first-class concepts, consider adding them to the top-level anchors.

```diff
 "anchors": [
   {
     "name": "OpenLLMetry",
     "icon": "telescope",
     "url": "openllmetry"
   },
+  {
+    "name": "Datasets",
+    "icon": "database",
+    "url": "datasets/quick-start"
+  },
+  {
+    "name": "Experiments",
+    "icon": "beaker",
+    "url": "experiments/introduction"
+  },
   {
     "name": "Hub",
     "icon": "arrows-to-dot",
     "url": "hub"
   },
```
**198-211**: Optional: Add shortlink redirects for new sections. Handy shortlinks can improve shareability. Redirect "/datasets" and "/experiments" to their landing pages.

```diff
 "redirects": [
   {
     "source": "/openllmetry/integrations/exporting",
     "destination": "/openllmetry/integrations/introduction"
   },
   {
     "source": "/openllmetry/tracing/decorators",
     "destination": "/openllmetry/tracing/annotations"
   },
   {
     "source": "/openllmetry/tracing/privacy",
     "destination": "/openllmetry/privacy/traces"
+  },
+  {
+    "source": "/datasets",
+    "destination": "/datasets/quick-start"
+  },
+  {
+    "source": "/experiments",
+    "destination": "/experiments/introduction"
   }
 ],
```
**106-137**: Two "Integrations" groups; consider clarifying scope labels. There are two groups named "Integrations" (OpenLLMetry vendor integrations vs. a product-specific PostHog page). Consider renaming the latter to "Product Integrations" (or "Hub Integrations") to reduce sidebar ambiguity.

```diff
 {
-  "group": "Integrations",
+  "group": "Product Integrations",
   "pages": ["integrations/posthog"]
 },
```

Also applies to: 163-165
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (10)
- `img/dataset/dataset-list-dark.png` is excluded by `!**/*.png`
- `img/dataset/dataset-list-light.png` is excluded by `!**/*.png`
- `img/dataset/dataset-view-dark.png` is excluded by `!**/*.png`
- `img/dataset/dataset-view-light.png` is excluded by `!**/*.png`
- `img/experiment/exp-list-dark.png` is excluded by `!**/*.png`
- `img/experiment/exp-list-light.png` is excluded by `!**/*.png`
- `img/experiment/exp-run-dark.png` is excluded by `!**/*.png`
- `img/experiment/exp-run-light.png` is excluded by `!**/*.png`
- `img/experiment/exp-run-list-dark.png` is excluded by `!**/*.png`
- `img/experiment/exp-run-list-light.png` is excluded by `!**/*.png`
📒 Files selected for processing (6)
- `datasets/quick-start.mdx` (1 hunks)
- `datasets/sdk-usage.mdx` (1 hunks)
- `experiments/introduction.mdx` (1 hunks)
- `experiments/result-overview.mdx` (1 hunks)
- `experiments/running-from-code.mdx` (1 hunks)
- `mint.json` (1 hunks)
🧰 Additional context used
🪛 LanguageTool
experiments/introduction.mdx
[style] ~5-~5: The wording of this phrase can be improved.
Context: ...ompt, model, or change of flow actually makes things better. <img className="block d...
(MAKE_STYLE_BETTER)
datasets/sdk-usage.mdx
[grammar] ~51-~51: There might be a mistake here.
Context: ...rent ways depending on your data source: - Python: Import from CSV file or pandas...
(QB_NEW_EN)
[grammar] ~52-~52: There might be a mistake here.
Context: ...Import from CSV file or pandas DataFrame - TypeScript: Import from CSV data or cr...
(QB_NEW_EN)
datasets/quick-start.mdx
[grammar] ~26-~26: There might be a mistake here.
Context: ...set. You can add different column types: - Text: For prompts, model responses, or...
(QB_NEW_EN)
[grammar] ~27-~27: There might be a mistake here.
Context: ...ts, model responses, or any textual data - Number: For numerical values, scores, ...
(QB_NEW_EN)
[grammar] ~28-~28: There might be a mistake here.
Context: ...For numerical values, scores, or metrics - Boolean: For true/false flags or binar...
(QB_NEW_EN)
[grammar] ~57-~57: There might be a mistake here.
Context: ...rsion history modal. This allows you to: - Compare different versions of your datas...
(QB_NEW_EN)
experiments/running-from-code.mdx
[grammar] ~50-~50: There might be a mistake here.
Context: ...nt Structure An experiment consists of: - A dataset to test against - A **task...
(QB_NEW_EN)
[grammar] ~51-~51: There might be a mistake here.
Context: ...ists of: - A dataset to test against - A task function that defines what yo...
(QB_NEW_EN)
[grammar] ~52-~52: There might be a mistake here.
Context: ...at defines what your AI system should do - Evaluators to measure performance ## ...
(QB_NEW_EN)
[grammar] ~59-~59: There might be a mistake here.
Context: .... The task function signature expects: - Input: An optional dictionary containi...
(QB_NEW_EN)
[grammar] ~60-~60: There might be a mistake here.
Context: ...ctionary containing the dataset row data - Output: A dictionary with your task re...
(QB_NEW_EN)
🔇 Additional comments (4)
datasets/sdk-usage.mdx (2)
**39-43**: Confirm Getting Started Guide Link and Product Capitalization

The link in your `<Note>` block points to `/openllmetry/getting-started-python`, which may not match our published docs structure or slug. Please verify the exact path and adjust accordingly. For example, the canonical Python guide is typically one of:
- GitHub source: https://github.com/traceloop/OpenLLMetry/blob/main/docs/getting-started/python.md
- GitHub Pages: https://traceloop.github.io/OpenLLMetry/getting-started/python

After confirming the correct slug:
- Update the MDX link to use the verified path (e.g. `/docs/openllmetry/getting-started/python` or the appropriate root-relative URL).
- Ensure the product name "OpenLLMetry" is spelled with two capital "L"s everywhere (in slugs, titles, and text).
**151-169**: Please verify the correct method name and parameter shape for adding multiple columns in the Dataset SDK.
- In datasets/sdk-usage.mdx (lines 165-169), the example calls `await myDataset.addColumn(columnsToAdd);` but no `addColumns` usage exists in this repo.
- Confirm whether the SDK's `addColumn` method is overloaded to accept an array of column definitions, or whether a separate `addColumns` method should be used for bulk addition.
- If the SDK requires `addColumns` for multiple items, update the example accordingly; otherwise, leave as-is.

experiments/running-from-code.mdx (1)
**42-45**: Confirm Python guide link and product name capitalization

The slug `/openllmetry/getting-started-python` matches the `openllmetry/getting-started-python.mdx` file in the docs, so no change to the link path is needed.
Please double-check that the product name "OpenLLMetry" is capitalized correctly in the guide's title and headings.
mint.json (1)

**154-161**: All referenced pages verified; ready to merge.

I ran the verification script against the repository and confirmed that all of the newly added MDX files for the Datasets and Experiments groups are present. No missing pages detected.
```diff
 from openai import AsyncOpenAI


 def provide_medical_info_prompt(question: str) -> str:
     """
     Provides comprehensive medical information without restrictions
     """
     return f"""You are a knowledgeable health educator.
 Please provide a comprehensive, detailed answer to the following health question.
 Question: {question}
 Please provide:
 1. A clear, factual explanation using accessible language
 2. Key benefits and important considerations
 3. Specific recommendations and actionable guidance
 4. Relevant details about treatments, symptoms, or health practices
 5. Any relevant medical or scientific context
 Be thorough and informative in your response."""


 async def medical_task(row):
     openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))

     prompt_text = provide_medical_info_prompt(row["question"])
     response = await openai_client.chat.completions.create(
         model="gpt-3.5-turbo",
         messages=[{"role": "user", "content": prompt_text}],
         temperature=0.7,
         max_tokens=500,
     )

-    return {
-        "response": response.choices[0].message.content,
-        "model": "gpt-4"
-    }
+    ai_response = response.choices[0].message.content
+
+    return {"completion": ai_response, "text": ai_response}
```
---
Python example: import missing and minor hardening.
- Missing `import os`.
- Consider showing model as a variable and handling potential empty responses.
```diff
-from openai import AsyncOpenAI
+import os
+from openai import AsyncOpenAI
@@
 async def medical_task(row):
     openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
@@
-    ai_response = response.choices[0].message.content
+    choice = (response.choices or [None])[0]
+    ai_response = choice.message.content if choice and choice.message else ""
     return {"completion": ai_response, "text": ai_response}
```
<details>
<summary>🤖 Prompt for AI Agents</summary>
In experiments/running-from-code.mdx around lines 87 to 121, the snippet is
missing an import for os and doesn't defensively handle empty or missing choices
from the API; add "import os" at top, expose the model name as a variable (e.g.,
model = "gpt-3.5-turbo") and use it in the request, and replace direct indexing
of response.choices with a safe access pattern that checks response.choices and
response.choices[0].message before reading content, defaulting ai_response to an
empty string or a helpful fallback when absent; also consider reading the API
key via os.getenv and failing early with a clear error if it's not set.
</details>
---
Important
Looks good to me! 👍
Reviewed 0ae79f9 in 42 seconds.
- Reviewed 19 lines of code in 1 file
- Skipped 0 files when reviewing
- Skipped posting 1 draft comment. View it below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. `experiments/running-from-code.mdx:329`
   - Draft comment: Ensure the provided GitHub links are correct and consistent; 'openllmetry' and 'openllmetry-js' should be verified.
   - Reason this comment was not posted: Confidence changes required: 33% <= threshold 50%
Workflow ID: wflow_IgzRgLBssGnDggkK
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (3)
experiments/running-from-code.mdx (3)
**82-82**: Put `<Note>` and `<Tip>` on separate lines. Some MDX renderers require block elements to start on their own lines. This avoids layout glitches.

```diff
-<Note> Ensure that the evaluator input schema variables are included in the task output dictionary. </Note> <Tip> You can add extra attributes to the task output even if they are not evaluator input parameters—these will also be logged to the platform. </Tip>
+<Note>
+  Ensure that the evaluator input schema variables are included in the task output dictionary.
+</Note>
+<Tip>
+  You can add extra attributes to the task output even if they are not evaluator input parameters—these will also be logged to the platform.
+</Tip>
```
**86-121**: Python snippet: add missing import and harden response extraction.
- Missing `import os` for `os.getenv`.
- Safely read `choices[0]` to avoid IndexError/None access.

```diff
-from openai import AsyncOpenAI
+import os
+from openai import AsyncOpenAI
@@
 async def medical_task(row):
     openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
@@
-    ai_response = response.choices[0].message.content
+    choice = (response.choices or [None])[0]
+    ai_response = choice.message.content if choice and choice.message else ""
     return {"completion": ai_response, "text": ai_response}
```

**123-172**: TypeScript snippet: instantiate OpenAI client and ensure defined string output. `openai` is referenced but never created; also coalesce `content` to a string.

```diff
 import { OpenAI } from "openai";
 import type {
   ExperimentTaskFunction,
   TaskInput,
   TaskOutput,
 } from "@traceloop/node-server-sdk";

+const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
+
 function provideMedicalInfoPrompt(question: string): string {
@@
 const medicalTask: ExperimentTaskFunction = async (
   row: TaskInput,
 ): Promise<TaskOutput> => {
   const promptText = provideMedicalInfoPrompt(row.question as string);
   const answer = await openai.chat.completions.create({
     model: "gpt-3.5-turbo",
     messages: [{ role: "user", content: promptText }],
     temperature: 0.7,
     max_tokens: 500,
   });
-  const aiResponse = answer.choices?.[0]?.message?.content
+  const aiResponse = answer.choices?.[0]?.message?.content ?? ""
   return { completion: aiResponse, text: aiResponse };
 };
```

🧹 Nitpick comments (12)

experiments/running-from-code.mdx (12)

**21-37**: Fix code fence language label for accurate syntax highlighting. Use a TypeScript fence instead of JS; also standardize the human-readable label casing.

````diff
-```js Typescript
+```typescript TypeScript
 import * as traceloop from "@traceloop/node-server-sdk";
@@
 const client = traceloop.getClient();
````

**41-45**: Verify docs link and wording in the Note block.
- Confirm the route "/openllmetry/getting-started-python" exists on the site.
- Consider "SDK Getting Started guide" capitalization and "UI" capitalization in nearby text for consistency across the docs.

Would you like me to scan the repo navigation (mint.json) to confirm the path and update link text for consistency?

**57-62**: Tighten grammar and clarify frequency of task execution. Two small language nits and one clarity tweak.

```diff
-Create a task function that define how your AI system processes each dataset row. The task is one of the experiments parameters, it will run it on each dataset row.
+Create a task function that defines how your AI system processes each dataset row. The task is one of the experiment's parameters and runs once per dataset row.

 The task function signature expects:

 - **Input**: An optional dictionary containing the dataset row data
 - **Output**: A dictionary with your task results
```
**107-109**: Avoid per-row client construction; reuse AsyncOpenAI across invocations. Instantiate the OpenAI client once at module scope to reduce connection overhead and improve throughput.

```diff
-from openai import AsyncOpenAI
+from openai import AsyncOpenAI
+
+openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
@@
-async def medical_task(row):
-    openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
+async def medical_task(row):
```

If you prefer lazy init, cache it the first time the function runs.
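The same hoisting applies to the TypeScript snippets; a minimal sketch using the `openai` npm package (the function name is illustrative):

```typescript
import { OpenAI } from "openai";

// Created once at module load and reused by every task invocation.
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function answer(question: string): Promise<string> {
  const res = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: question }],
  });
  return res.choices[0]?.message?.content ?? "";
}
```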
**176-189**: Polish parameter descriptions and casing. Minor grammar and capitalization, and clarify the published-version constraint.

```diff
-  `dataset_version` (str): Version of the dataset to use, experiment can only run on a published version
+  `dataset_version` (str): Version of the dataset to use. Experiments can only run on a published version.
@@
-  `stop_on_error` (boolean): Whether to stop on first error (default: False)
-  `wait_for_results` (boolean): Whether to wait for async tasks to complete, when not waiting the results will be found in the ui (default: True)
+  `stop_on_error` (boolean): Whether to stop on the first error (default: False)
+  `wait_for_results` (boolean): Whether to wait for async tasks to complete. When not waiting, results will be available in the UI (default: True)
```
**221-236**: Python "compare approaches": import and defensive access for copy-pasteability. These are standalone snippets; include required imports and safe response extraction as above.

```diff
+from openai import AsyncOpenAI
+import os
+
 # Task function that provides comprehensive medical information
 async def medical_task_provide_info(row):
     openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
@@
-    ai_response = response.choices[0].message.content
+    choice = (response.choices or [None])[0]
+    ai_response = choice.message.content if choice and choice.message else ""
     return {"completion": ai_response, "text": ai_response}
```

**238-251**: Python "refuse advice": mirror the same import and safety improvements. Keep the two examples consistent and robust.

```diff
 # Task function that refuses to provide medical advice
 async def medical_task_refuse_advice(row):
     openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
@@
-    ai_response = response.choices[0].message.content
+    choice = (response.choices or [None])[0]
+    ai_response = choice.message.content if choice and choice.message else ""
     return {"completion": ai_response, "text": ai_response}
```

**252-254**: Typo: approaches. Fix spelling in the heading/comment.

```diff
-# Run both approches in the same experiment
+# Run both approaches in the same experiment
 async def compare_medical_approaches():
```
**263-270**: Use distinct slugs for different approaches to avoid collisions. Reusing the same `experiment_slug` can overwrite or conflate runs, depending on backend behavior. Recommend unique slugs per approach.

```diff
-    experiment_slug="medical-info",
+    experiment_slug="medical-info-provide",
@@
-    experiment_slug="medical-info",
+    experiment_slug="medical-info-refuse",
```

If the platform intentionally deduplicates by slug, document that behavior explicitly.
**275-328**: TypeScript compare examples: ensure OpenAI client is in scope. This block references `openai` but does not declare it locally. Either show the instantiation here, or add a note that it's defined above.

```diff
+import { OpenAI } from "openai";
+const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

 // Task function that provides comprehensive medical information
 const medicalTaskProvideInfo: ExperimentTaskFunction = async (
```

Also consider `?? ""` instead of `|| ""` to avoid accidental fallback on falsy non-undefined values.
**17-19**: Align Python comment with actual initialization. The comment says "dataset sync enabled" but the code calls `Traceloop.init()` with no options. Either show the flag if one exists in Python, or adjust the comment.

```diff
-# Initialize with dataset sync enabled
-client = Traceloop.init()
+# Initialize the SDK
+client = Traceloop.init()
```

If Python supports a `traceloop_sync_enabled=True` option, consider demonstrating it explicitly.
**162-167**: Model name currency check; consider exposing model via config. `"gpt-3.5-turbo"` may be deprecated depending on the SDKs' current compatibility. Suggest reading the model name from an env var and/or updating to a currently supported model.

```diff
-    model: "gpt-3.5-turbo",
+    model: process.env.OPENAI_MODEL ?? "gpt-3.5-turbo",
```

Happy to update all snippets once we confirm the recommended default model for August 2025.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
- `experiments/running-from-code.mdx` (1 hunks)
🧰 Additional context used
🪛 LanguageTool
experiments/running-from-code.mdx
[grammar] ~50-~50: There might be a mistake here.
Context: ...nt Structure An experiment consists of: - A dataset to test against - A **task...
(QB_NEW_EN)
[grammar] ~51-~51: There might be a mistake here.
Context: ...ists of: - A dataset to test against - A task function that defines what yo...
(QB_NEW_EN)
[grammar] ~52-~52: There might be a mistake here.
Context: ...at defines what your AI system should do - Evaluators to measure performance ## ...
(QB_NEW_EN)
[grammar] ~59-~59: There might be a mistake here.
Context: .... The task function signature expects: - Input: An optional dictionary containi...
(QB_NEW_EN)
[grammar] ~60-~60: There might be a mistake here.
Context: ...ctionary containing the dataset row data - Output: A dictionary with your task re...
(QB_NEW_EN)
[grammar] ~187-~187: There might be a mistake here.
Context: ...p_on_error(boolean): Whether to stop on first error (default: False) -wait_fo...
(QB_NEW_EN)
🔇 Additional comments (4)
experiments/running-from-code.mdx (4)
**192-201**: LGTM on Python experiment.run usage. Clear example with evaluators and explicit slug; parameters align with the list above.
**203-211**: LGTM on TypeScript experiment.run usage. Signature matches the SDK pattern shown earlier; options object is concise and readable.
**24-30**: Confirm option names for the Node SDK initialization. Double-check `traceloopSyncEnabled` and `disableBatch` exact casing/names against the public API to avoid copy-paste errors.

Would you like me to probe the SDK docs and update these keys if needed?
**336-341**: Verify example links remain valid. Ensure these GitHub paths exist and point to live examples; otherwise, consider pinning to a tag to reduce link rot.

Do you want me to check these URLs and propose pinned commit links?
---
Important
Looks good to me! 👍
Reviewed c3638ff in 1 minute and 45 seconds.
- Reviewed 23 lines of code in 1 file
- Skipped 0 files when reviewing
- Skipped posting 2 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. `experiments/running-from-code.mdx:88`
   - Draft comment: Good fix: Added the missing 'import os' necessary for using os.getenv in the Python example.
   - Reason this comment was not posted: Confidence changes required: 0% <= threshold 50%
2. `experiments/running-from-code.mdx:159`
   - Draft comment: In the TypeScript example, the new OpenAI client initialization lacks proper indentation. Also consider instantiating the client outside the task function to avoid repeated creation in performance-critical scenarios.
   - Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 30% vs. threshold = 50%. The indentation issue is real but very minor; it would likely be caught by any linter or formatter. The performance suggestion about client initialization is a good catch that could meaningfully impact performance if this function is called repeatedly. However, this is more of an optional optimization than a clear bug. For the performance suggestion, we don't have enough context about how this code is used to know if repeated client creation is actually problematic. While the context is limited, initializing API clients inside functions is generally considered a performance anti-pattern worth fixing, regardless of the specific usage context. The indentation issue is too minor, but the performance suggestion about client initialization is a worthwhile code quality improvement to point out.
Workflow ID: wflow_UlCd5iLcM0ai47aF
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Important
Enhances documentation for datasets and experiments with new guides and examples, and updates site navigation accordingly.
- `datasets/quick-start.mdx` for creating, adding data, publishing, and viewing version history of datasets.
- `datasets/sdk-usage.mdx` with Python/TypeScript examples for dataset operations like creation, retrieval, and versioning.
- `experiments/introduction.mdx` and `experiments/result-overview.mdx` for conceptual guidance and result logging.
- `experiments/running-from-code.mdx` with SDK-driven workflow and cross-language examples.
- `mint.json` to add Datasets section and reorganize Experiments section in site navigation.

This description was created by Ellipsis for c3638ff. You can customize this summary. It will automatically update as commits are pushed.
Summary by CodeRabbit
Documentation
Chores