Conversation

@nina-kollman (Contributor) commented Aug 25, 2025

Important

Enhances documentation for datasets and experiments with new guides and examples, and updates site navigation accordingly.

  • Documentation:
    • Added datasets/quick-start.mdx for creating, adding data, publishing, and viewing version history of datasets.
    • Added datasets/sdk-usage.mdx with Python/TypeScript examples for dataset operations like creation, retrieval, and versioning.
    • Added experiments/introduction.mdx and experiments/result-overview.mdx for conceptual guidance and result logging.
    • Updated experiments/running-from-code.mdx with SDK-driven workflow and cross-language examples.
  • Chores:
    • Updated mint.json to add Datasets section and reorganize Experiments section in site navigation.

This description was created by Ellipsis for c3638ff. You can customize this summary. It will automatically update as commits are pushed.


Summary by CodeRabbit

  • Documentation

    • Added Datasets Quick Start (creation, data entry, publishing, version history).
    • Added Datasets SDK Usage with end-to-end Python/TypeScript examples for init, create/import, schema changes, row insertion, publish, and retrieval.
    • Added Experiments Introduction and Result Overview pages with visuals and conceptual guidance.
    • Updated Run via SDK guide with a clearer SDK-driven workflow and cross-language examples.
  • Chores

    • Updated site navigation: added Datasets section and reorganized Experiments pages.

@coderabbitai bot (Contributor) commented Aug 25, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Caution

Review failed

The pull request is closed.

Walkthrough

Adds new Datasets documentation (quick start and SDK usage), expands Experiments docs with introduction and result-overview pages, rewrites the experiments running-from-code page to an SDK-centric flow with explicit task I/O contracts and TypeScript typings, and updates site navigation in mint.json to include the new Datasets group and expanded Experiments pages.
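To make the task I/O contract mentioned above concrete, here is a minimal TypeScript sketch. The type names (`ExperimentTaskFunction`, `TaskInput`, `TaskOutput`) and the `row.question as string` access pattern come from the review excerpts later in this thread; the echo logic and the `question` field are illustrative assumptions, not the docs' actual example.

```typescript
import type {
  ExperimentTaskFunction,
  TaskInput,
  TaskOutput,
} from "@traceloop/node-server-sdk";

// One invocation per dataset row: the row's fields come in, result fields go out.
const task: ExperimentTaskFunction = async (
  row: TaskInput,
): Promise<TaskOutput> => {
  const answer = `echo: ${row.question as string}`; // stand-in for a real LLM call
  // Fields required by the evaluators' input schema must appear in the output;
  // extra fields are also logged to the platform (per the docs' Note/Tip).
  return { completion: answer, text: answer };
};
```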

Changes

| Cohort / File(s) | Summary of Changes |
| --- | --- |
| **Datasets docs**<br>`datasets/quick-start.mdx`, `datasets/sdk-usage.mdx` | Added Quick Start and SDK Usage MDX pages: `quick-start.mdx` introduces dataset concepts and a four-step workflow with Frame/Steps/Tip components; `sdk-usage.mdx` provides end-to-end Python and TypeScript SDK examples (initialization, create/import, retrieve, schema mutations, row operations, publish) with CodeGroup examples and notes on API key and client initialization (see the sketch after the first diagram below). |
| **Experiments docs**<br>`experiments/introduction.mdx`, `experiments/result-overview.mdx`, `experiments/running-from-code.mdx` | Added introduction and result-overview pages with explanatory Frames and CardGroup visuals. Major rewrite of `running-from-code.mdx` to an SDK-centric "Run via SDK" flow: formalized SDK initialization (Python/TS), defined task input/output contracts, added TypeScript typings (`ExperimentTaskFunction`, `TaskInput`, `TaskOutput`) in examples, replaced samples with typed medical examples, and demonstrated `experiment.run` usage and logging. |
| **Navigation**<br>`mint.json` | Updated navigation: added Datasets group with `datasets/quick-start` and `datasets/sdk-usage`; expanded Experiments group to include `experiments/introduction`, `experiments/result-overview`, and `experiments/running-from-code` (replacing the prior single-page experiments entry). |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Dev as Developer
  participant SDK as Traceloop SDK
  participant SVC as Traceloop Service

  rect rgba(230,240,255,0.35)
  note over Dev,SDK: Dataset lifecycle (SDK usage)
  Dev->>SDK: initialize(config)
  SDK->>SVC: auth + feature sync
  Dev->>SDK: create/fromCSV/fromDataFrame
  SDK->>SVC: create dataset (draft)
  Dev->>SDK: addColumn / addRow(s)
  SDK->>SVC: mutate schema/data (draft)
  Dev->>SDK: publish(version)
  SDK->>SVC: create immutable snapshot
  Dev->>SDK: get / getVersionCSV
  SDK->>SVC: retrieve dataset/version
  end
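A hedged TypeScript sketch of the lifecycle in the diagram above: the initialization calls (`initialize`, `waitForInitialization`, `getClient`) and the array form of `addColumn` appear verbatim in the review comments below, while the `client.datasets` accessor, option names, and column/row shapes are assumptions for illustration only.

```typescript
import * as traceloop from "@traceloop/node-server-sdk";

traceloop.initialize(); // reads TRACELOOP_API_KEY from the environment
await traceloop.waitForInitialization();
const client = traceloop.getClient();

// Create a draft dataset (accessor and option names are assumed).
const dataset = await client.datasets.create({
  slug: "medical-questions",
  description: "Dataset with patients' medical questions",
});

// Mutate schema and data while the dataset is still a draft.
await dataset.addColumn([{ slug: "question", type: "text" }]); // column shape assumed
await dataset.addRow({ question: "Is ibuprofen safe for asthmatics?" });

// Publish an immutable snapshot, then fetch it back by version.
await dataset.publish();
const csv = await dataset.getVersionCSV("v1"); // signature assumed
```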
sequenceDiagram
  autonumber
  participant Dev as Developer
  participant SDK as Traceloop SDK
  participant SVC as Traceloop Service

  rect rgba(235,255,235,0.35)
  note over Dev,SDK: Experiment run via SDK (updated flow)
  Dev->>SDK: initialize → waitForInitialization → getClient
  Dev->>SDK: experiment.run({ datasetSlug, datasetVersion, task, evaluators, ... })
  loop for each dataset row
    SDK->>SDK: invoke task(input: TaskInput) -> TaskOutput
    SDK->>SVC: log task input/output
    SDK->>SVC: run evaluators and log results
  end
  SDK-->>Dev: run summary and logs
  end
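And the corresponding run call, again as a sketch: per a review note later in this thread, the TypeScript SDK takes the task as the first positional argument with an options object second. The evaluator slug is a placeholder, and `waitForResults` is an assumed TS spelling of the Python `wait_for_results` parameter.

```typescript
// `task` is the ExperimentTaskFunction sketched earlier; `client` comes from getClient().
const results = await client.experiment.run(task, {
  datasetSlug: "medical-questions",
  datasetVersion: "v1",
  evaluators: ["example-evaluator"], // placeholder slug
  experimentSlug: "medical-info",
  stopOnError: false,
  waitForResults: true, // option name assumed
});
```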

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


Poem

Hop hop — new pages in sight,
Datasets versioned, tidy and bright.
Experiments typed, tasks running true,
Docs stitched up — a carrot for you 🥕
I nibble lines and twirl — hooray!

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 0ae79f9 and c3638ff.

📒 Files selected for processing (1)
  • experiments/running-from-code.mdx (1 hunks)

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@ellipsis-dev bot left a comment

Caution

Changes requested ❌

Reviewed everything up to f1f981e in 2 minutes and 28 seconds.
  • Reviewed 670 lines of code in 6 files
  • Skipped 10 files when reviewing.
  • Skipped posting 9 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. datasets/quick-start.mdx:1
  • Draft comment:
    Clear and concise Quick Start docs for datasets.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
2. datasets/sdk-usage.mdx:39
  • Draft comment:
    Ensure the getting started guide link (/openllmetry/...) reflects the correct branding if intended.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None
3. experiments/introduction.mdx:1
  • Draft comment:
    Introduction page is well-structured and clearly outlines experiment capabilities.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
4. experiments/result-overview.mdx:1
  • Draft comment:
    Result Overview details are clear and include sufficient visual aids.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
5. experiments/running-from-code.mdx:57
  • Draft comment:
    Typo: 'define' should be 'defines' in the task function description.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None
6. mint.json:155
  • Draft comment:
    Navigation updates for Datasets and Experiments groups look good; verify ordering and branding consistency.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None
7. datasets/sdk-usage.mdx:41
  • Draft comment:
    Typo alert: The URL path '/openllmetry/getting-started-python' appears to have a misspelling. It might be intended as '/opentelemetry/getting-started-python'.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% Looking at the context, this is a Traceloop SDK documentation, not OpenTelemetry. The URL path '/openllmetry' could be intentional - it might be a play on words combining 'LLM' (Large Language Models) with 'telemetry'. Without access to the actual website structure or knowing if this URL actually works, I can't be certain this is actually a typo. I might be overthinking this - if it's clearly a typo, it should be fixed. Also, the unusual spelling could confuse users. However, without being able to verify the correct URL structure or knowing the intentional branding decisions, making assumptions about the correct spelling could introduce errors. Since we can't verify the intended URL structure and this could be intentional branding, we should err on the side of caution and not keep this comment.
8. experiments/running-from-code.mdx:57
  • Draft comment:
    Typo: It should be "defines" instead of "define" in the sentence "Create a task function that define how your AI system processes each dataset row."
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50% While this is technically correct grammar, fixing minor grammatical issues in documentation is not a critical code change. The meaning is perfectly clear either way. The rules state we should not make purely informative comments or comments about obvious/unimportant issues. The grammar error could make the documentation look less professional. Poor grammar in documentation could reflect badly on the project's quality. While professional documentation is important, this is such a minor grammatical issue that it doesn't significantly impact readability or understanding. The rules explicitly state not to make unimportant comments. Delete this comment as it points out a trivial grammatical issue that doesn't meaningfully impact the documentation's effectiveness.
9. experiments/running-from-code.mdx:43
  • Draft comment:
    Typographical error in the URL: "openllmetry" seems to be a typo; perhaps it should be "opentelemetry".
  • Reason this comment was not posted:
    Based on historical feedback, this comment is too similar to comments previously marked by users as bad.

Workflow ID: wflow_NFxINPhicVyyo7oj

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
experiments/introduction.mdx (1)

32-32: Remove stray “32” character.

It will render as visible text at the end of the page.

-32
+
datasets/sdk-usage.mdx (1)

227-227: Remove stray “227” character at EOF.

Likely an artifact of code generation that will render.

-227
+
experiments/result-overview.mdx (1)

45-45: Remove stray trailing number.

-45
+
experiments/running-from-code.mdx (1)

259-267: Double initialization and API mismatch.

  • You call both Traceloop.init() and Traceloop.client(). Earlier, init() returned the client; keep one pattern.
-# Initialize Traceloop
-Traceloop.init()
-client = Traceloop.client()
+# Initialize Traceloop
+client = Traceloop.init()
🧹 Nitpick comments (30)
experiments/introduction.mdx (3)

5-5: Tighten phrasing: “change of flow” → “flow change”.

Reads more naturally and avoids awkward construction.

-Building reliable LLM applications means knowing whether a new prompt, model, or change of flow actually makes things better.
+Building reliable LLM applications means knowing whether a new prompt, model, or flow change actually makes things better.

7-13: Add alt text and lazy-loading to images for accessibility and performance.

Current tags have no alt; consider descriptive alts and lazy loading.

-<Frame>
-  <img
-    className="block dark:hidden"
-    src="/img/experiment/exp-list-light.png"
-  />
-  <img className="hidden dark:block" src="/img/experiment/exp-list-dark.png" />
-</Frame>
+<Frame>
+  <img
+    className="block dark:hidden"
+    src="/img/experiment/exp-list-light.png"
+    alt="Experiments list view in light theme"
+    loading="lazy"
+  />
+  <img
+    className="hidden dark:block"
+    src="/img/experiment/exp-list-dark.png"
+    alt="Experiments list view in dark theme"
+    loading="lazy"
+  />
+</Frame>

18-31: Polish card titles and copy; fix pluralization.

  • “Compare Experiment Runs Results” is awkward. Prefer “Compare Experiment Runs.”
  • “evaluator input” → “evaluator inputs.” Minor copy tightening elsewhere.
- <Card title="Run Multiple Evaluators" icon="list-check">
-    Execute multiple evaluation checks against your dataset
+ <Card title="Run Multiple Evaluators" icon="list-check">
+    Execute multiple evaluation checks against your dataset.
   </Card>
-   <Card title="View Complete Results" icon="table">
-    See all experiment run outputs in a comprehensive table view with relevant indicators and detailed reasoning
+   <Card title="View Complete Results" icon="table">
+    See all experiment run outputs in a comprehensive table with indicators and reasoning.
   </Card>
-  <Card title="Compare Experiment Runs Results" icon="code-compare">
-    Run the same experiment across different dataset versions to see how it affects your workflow
+  <Card title="Compare Experiment Runs" icon="code-compare">
+    Run the same experiment across dataset versions to see how it affects your workflow.
   </Card>
   <Card title="Custom Task Pipelines" icon="code">
-    Add a tailored task to the experiment to create evaluator input. For example: LLM calls, semantic search, etc. 
+    Create a custom task pipeline that generates evaluator inputs (e.g., LLM calls, semantic search).
   </Card>
datasets/quick-start.mdx (6)

5-11: Add alt text and lazy-loading to dataset images.

Improves accessibility and page performance.

 <Frame>
   <img
     className="block dark:hidden"
     src="/img/dataset/dataset-list-light.png"
+    alt="Dataset list view in light theme"
+    loading="lazy"
   />
-  <img className="hidden dark:block" src="/img/dataset/dataset-list-dark.png" />
+  <img
+    className="hidden dark:block"
+    src="/img/dataset/dataset-list-dark.png"
+    alt="Dataset list view in dark theme"
+    loading="lazy"
+  />
 </Frame>

13-15: Tighten phrasing.

“available in the SDK” → “available via the SDK.” Also remove extra space.

-Datasets are simple data tables that you can use to manage your data for experiments and evaluation of your AI applications.
-Datasets are available in the SDK, and they enable you to create versioned snapshots for reproducible testing. 
+Datasets are simple data tables to manage data for experiments and evaluation of your AI applications.
+Datasets are available via the SDK and enable versioned snapshots for reproducible testing.

26-30: Make list items parallel and add terminal punctuation.

Minor style improvement flagged by the linter.

-- **Text**: For prompts, model responses, or any textual data
-- **Number**: For numerical values, scores, or metrics
-- **Boolean**: For true/false flags or binary classifications
+- **Text**: Prompts, model responses, or any textual data.
+- **Number**: Numerical values, scores, or metrics.
+- **Boolean**: True/false flags or binary classifications.

40-46: Add alt text to the second image set.

 <Frame>
   <img
     className="block dark:hidden"
     src="/img/dataset/dataset-view-light.png"
+    alt="Dataset edit view in light theme"
+    loading="lazy"
   />
-  <img className="hidden dark:block" src="/img/dataset/dataset-view-dark.png" />
+  <img
+    className="hidden dark:block"
+    src="/img/dataset/dataset-view-dark.png"
+    alt="Dataset edit view in dark theme"
+    loading="lazy"
+  />
 </Frame>

49-53: Grammar: “Published versions”.

Fixes adjective form and article.

-1. Click **Publish Version** to create a stable snapshot
-2. Published versions are immutable
-3. Publish versions are accessible in the SDK
+1. Click **Publish Version** to create a stable snapshot.
+2. Published versions are immutable.
+3. Published versions are accessible via the SDK.

57-61: Optional: Link to SDK usage page from “version history” step.

Helps users discover how to fetch versions programmatically.

-You can access all published versions of your dataset by opening the version history modal. This allows you to:
+You can access all published versions of your dataset by opening the version history modal. You can also fetch specific versions via the SDK (see Datasets → SDK Usage). This allows you to:
datasets/sdk-usage.mdx (4)

2-2: Capitalize title consistently: “SDK Usage”.

-title: "SDK usage"
+title: "SDK Usage"

61-70: Grammar: possessive “patients’ medical questions”.

-    description="Dataset with patients medical questions"
+    description="Dataset with patients' medical questions"

175-205: Align row schema with earlier defined columns.

The row example includes fields (prompt, response, model) that weren’t added in the “Adding a Column” step for the manual dataset path. Consider either:

  • defining those columns beforehand, or
  • using a row that matches the defined schema.
-const rowData = {
-  user_id: userId,
-  prompt: prompt,
-  response: `This is the model response`,
-  model: "gpt-3.5-turbo",
-  satisfaction_score: 1,
-};
+const rowData = {
+  "user-id": userId,
+  "satisfaction-score": 1,
+};

20-35: Use a Minimal Initialization Example with Advanced Options Commented

The SDK’s initialize function supports both disableBatch and traceloopSyncEnabled as valid options. For most users, you can rely on your TRACELOOP_API_KEY environment variable and call traceloop.initialize() with no arguments; advanced flags can then be added as needed for development or prompt‐registry workflows (traceloop.com, deepwiki.com).

File: datasets/sdk-usage.mdx
Lines: 20–35

Suggested diff:

 import * as traceloop from "@traceloop/node-server-sdk";
 
-// Initialize with comprehensive configuration
-traceloop.initialize({
-  appName: "your-app-name",
-  apiKey: process.env.TRACELOOP_API_KEY,
-  disableBatch: true,
-  traceloopSyncEnabled: true,
-});
+// Minimal initialization (uses TRACELOOP_API_KEY env var)
+traceloop.initialize();
+
+// Advanced options (uncomment if needed):
+// traceloop.initialize({ disableBatch: true });
+// traceloop.initialize({ traceloopSyncEnabled: true });
 
 // Wait for initialization to complete
 await traceloop.waitForInitialization();
 
 // Get the client instance for dataset operations
 const client = traceloop.getClient();
experiments/result-overview.mdx (4)

6-12: Add alt text and lazy-loading to images.

 <Frame>
   <img
     className="block dark:hidden"
     src="/img/experiment/exp-list-light.png"
+    alt="Experiments overview list in light theme"
+    loading="lazy"
   />
-  <img className="hidden dark:block" src="/img/experiment/exp-list-dark.png" />
+  <img
+    className="hidden dark:block"
+    src="/img/experiment/exp-list-dark.png"
+    alt="Experiments overview list in dark theme"
+    loading="lazy"
+  />
 </Frame>

17-23: Add alt text and lazy-loading to run list images.

 <Frame>
   <img
     className="block dark:hidden"
     src="/img/experiment/exp-run-list-light.png"
+    alt="Experiment run list view in light theme"
+    loading="lazy"
   />
-  <img className="hidden dark:block" src="/img/experiment/exp-run-list-dark.png" />
+  <img
+    className="hidden dark:block"
+    src="/img/experiment/exp-run-list-dark.png"
+    alt="Experiment run list view in dark theme"
+    loading="lazy"
+  />
 </Frame>

36-42: Add alt text and lazy-loading to single run images.

 <Frame>
   <img
     className="block dark:hidden"
     src="/img/experiment/exp-run-light.png"
+    alt="Experiment run details view in light theme"
+    loading="lazy"
   />
-  <img className="hidden dark:block" src="/img/experiment/exp-run-dark.png" />
+  <img
+    className="hidden dark:block"
+    src="/img/experiment/exp-run-dark.png"
+    alt="Experiment run details view in dark theme"
+    loading="lazy"
+  />
 </Frame>

5-15: Optional cross-links to adjacent docs.

Add “See also” links to Introduction and Running via SDK to streamline navigation.

-All experiments are logged in the Traceloop platform. Each experiment is executed through the SDK.
+All experiments are logged in the Traceloop platform. Each experiment is executed through the SDK.
+
+See also: [Introduction](/experiments/introduction) · [Run via SDK](/experiments/running-from-code)
experiments/running-from-code.mdx (9)

21-37: TypeScript initialization looks good; ensure minimal-first guidance.

The advanced options are helpful, but starting with a minimal example improves DX. Consider mirroring the change made in SDK Usage to show minimal init first, then advanced options.


57-62: Grammar fixes in task-function description.

-Create a task function that define how your AI system processes each dataset row. The task is one of the experiments parameters, it will run it on each dataset row. 
+Create a task function that defines how your AI system processes each dataset row. The task is one of the experiment’s parameters and will run on each dataset row.

65-67: Include missing typing imports for the Python signature snippet.

Without imports, the snippet may confuse readers.

-from typing import Callable, Optional, Dict, Any
-task: Callable[[Optional[Dict[str, Any]]], Dict[str, Any]]
+from typing import Callable, Optional, Dict, Any
+
+task: Callable[[Optional[Dict[str, Any]]], Dict[str, Any]]

183-191: Align dataset slug/version with earlier examples for consistency.

Earlier sections use “medical-questions” and publish to v1; keep consistent to reduce confusion.

-    dataset_slug="medical-q",
-    dataset_version="v1",
+    dataset_slug="medical-questions",
+    dataset_version="v1",

209-231: Unify Python OpenAI usage style.

This section uses openai.ChatCompletion.acreate whereas earlier you used AsyncOpenAI. Pick one style for consistency.

-    response = await openai.ChatCompletion.acreate(
-        model="gpt-4",
-        messages=[
+    response = await openai_client.chat.completions.create(
+        model="gpt-4",
+        messages=[
             {"role": "system", "content": "Be very careful and conservative in your response."},
             {"role": "user", "content": input_data["question"]}
-        ]
-    )
-    return {"response": response.choices[0].message.content}
+        ],
+    )
+    return {"response": response.choices[0].message.content if response.choices else ""}

268-283: Add OpenAI key handling in the complete example.

The example uses OpenAI but doesn’t show API key setup; add a quick note or environment read for completeness.

-import openai
+import openai, os
@@
-    response = await openai.ChatCompletion.acreate(
+    openai.api_key = os.getenv("OPENAI_API_KEY")
+    response = await openai.ChatCompletion.acreate(

349-356: Parameters list: call out error-handling flags.

Since you show stop_on_error/stopOnError earlier, include it here for completeness.

 - `experiment_slug` (str): Unique identifier for this experiment
+- `stop_on_error` (bool): Whether to stop on the first task error (Python). In TypeScript, use `stopOnError`.

365-371: Add best practice about evaluator–task contract.

Highlight that task outputs must include evaluator-required fields (already noted above), repeated here for reinforcement.

 4. **Use appropriate evaluators** that match your use case
 5. **Compare multiple approaches** systematically to find the best solution
+6. **Ensure task outputs include evaluator input fields** so evaluators can run without errors.

194-201: Highlight parameter-order differences between Python and TypeScript

The Python and TypeScript SDKs use different call signatures for experiment.run(). In Python you supply only named parameters (e.g. dataset_slug="…", evaluators=[…]), whereas in TypeScript you pass the experiment task as the first positional argument and an options object second. Adding an explicit callout will help readers avoid confusion.

Please update experiments/running-from-code.mdx around the TypeScript snippet (lines 194–201):

• Above the TS example, insert a note along the lines of:

+ <!-- Note: Unlike the Python SDK, which uses only keyword arguments (e.g. `dataset_slug="…"`) for all settings, the TypeScript SDK takes the experiment task as its first positional argument followed by an options object. -->
mint.json (4)

143-145: Two groups named “Quick Start” may confuse users; consider disambiguating.

We have two separate groups with the same label (“Quick Start”) for different products/areas. Suggest renaming the Hub one to “Hub Quick Start” (or similar) to clarify the sidebar.

   {
-      "group": "Quick Start",
+      "group": "Hub Quick Start",
       "pages": ["hub/getting-started", "hub/configuration"]
   },

Also applies to: 80-90


34-60: Optional: Add anchors for Datasets and Experiments for quicker access.

If these are now first-class concepts, consider adding them to the top-level anchors.

   "anchors": [
     {
       "name": "OpenLLMetry",
       "icon": "telescope",
       "url": "openllmetry"
     },
+    {
+      "name": "Datasets",
+      "icon": "database",
+      "url": "datasets/quick-start"
+    },
+    {
+      "name": "Experiments",
+      "icon": "beaker",
+      "url": "experiments/introduction"
+    },
     {
       "name": "Hub",
       "icon": "arrows-to-dot",
       "url": "hub"
     },

198-211: Optional: Add shortlink redirects for new sections.

Handy shortlinks can improve shareability. Redirect “/datasets” and “/experiments” to their landing pages.

   "redirects": [
     {
       "source": "/openllmetry/integrations/exporting",
       "destination": "/openllmetry/integrations/introduction"
     },
     {
       "source": "/openllmetry/tracing/decorators",
       "destination": "/openllmetry/tracing/annotations"
     },
     {
       "source": "/openllmetry/tracing/privacy",
       "destination": "/openllmetry/privacy/traces"
+    },
+    {
+      "source": "/datasets",
+      "destination": "/datasets/quick-start"
+    },
+    {
+      "source": "/experiments",
+      "destination": "/experiments/introduction"
     }
   ],

106-137: Two “Integrations” groups; consider clarifying scope labels.

There are two groups named “Integrations” (OpenLLMetry vendor integrations vs. a product-specific PostHog page). Consider renaming the latter to “Product Integrations” (or “Hub Integrations”) to reduce sidebar ambiguity.

   {
-    "group": "Integrations",
+    "group": "Product Integrations",
     "pages": ["integrations/posthog"]
   },

Also applies to: 163-165

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 2a0c952 and f1f981e.

⛔ Files ignored due to path filters (10)
  • img/dataset/dataset-list-dark.png is excluded by !**/*.png
  • img/dataset/dataset-list-light.png is excluded by !**/*.png
  • img/dataset/dataset-view-dark.png is excluded by !**/*.png
  • img/dataset/dataset-view-light.png is excluded by !**/*.png
  • img/experiment/exp-list-dark.png is excluded by !**/*.png
  • img/experiment/exp-list-light.png is excluded by !**/*.png
  • img/experiment/exp-run-dark.png is excluded by !**/*.png
  • img/experiment/exp-run-light.png is excluded by !**/*.png
  • img/experiment/exp-run-list-dark.png is excluded by !**/*.png
  • img/experiment/exp-run-list-light.png is excluded by !**/*.png
📒 Files selected for processing (6)
  • datasets/quick-start.mdx (1 hunks)
  • datasets/sdk-usage.mdx (1 hunks)
  • experiments/introduction.mdx (1 hunks)
  • experiments/result-overview.mdx (1 hunks)
  • experiments/running-from-code.mdx (1 hunks)
  • mint.json (1 hunks)
🧰 Additional context used
🪛 LanguageTool
experiments/introduction.mdx

[style] ~5-~5: The wording of this phrase can be improved.
Context: ...ompt, model, or change of flow actually makes things better. <img className="block d...

(MAKE_STYLE_BETTER)

datasets/sdk-usage.mdx

[grammar] ~51-~51: There might be a mistake here.
Context: ...rent ways depending on your data source: - Python: Import from CSV file or pandas...

(QB_NEW_EN)


[grammar] ~52-~52: There might be a mistake here.
Context: ...Import from CSV file or pandas DataFrame - TypeScript: Import from CSV data or cr...

(QB_NEW_EN)

datasets/quick-start.mdx

[grammar] ~26-~26: There might be a mistake here.
Context: ...set. You can add different column types: - Text: For prompts, model responses, or...

(QB_NEW_EN)


[grammar] ~27-~27: There might be a mistake here.
Context: ...ts, model responses, or any textual data - Number: For numerical values, scores, ...

(QB_NEW_EN)


[grammar] ~28-~28: There might be a mistake here.
Context: ...For numerical values, scores, or metrics - Boolean: For true/false flags or binar...

(QB_NEW_EN)


[grammar] ~57-~57: There might be a mistake here.
Context: ...rsion history modal. This allows you to: - Compare different versions of your datas...

(QB_NEW_EN)

experiments/running-from-code.mdx

[grammar] ~50-~50: There might be a mistake here.
Context: ...nt Structure An experiment consists of: - A dataset to test against - A **task...

(QB_NEW_EN)


[grammar] ~51-~51: There might be a mistake here.
Context: ...ists of: - A dataset to test against - A task function that defines what yo...

(QB_NEW_EN)


[grammar] ~52-~52: There might be a mistake here.
Context: ...at defines what your AI system should do - Evaluators to measure performance ## ...

(QB_NEW_EN)


[grammar] ~59-~59: There might be a mistake here.
Context: .... The task function signature expects: - Input: An optional dictionary containi...

(QB_NEW_EN)


[grammar] ~60-~60: There might be a mistake here.
Context: ...ctionary containing the dataset row data - Output: A dictionary with your task re...

(QB_NEW_EN)

🔇 Additional comments (4)
datasets/sdk-usage.mdx (2)

39-43: Confirm Getting Started Guide Link and Product Capitalization

The link in your <Note> block points to /openllmetry/getting-started-python, which may not match our published docs structure or slug. Please verify the exact path and adjust accordingly. For example, the canonical Python guide is typically one of:

  • GitHub source: https://github.com/traceloop/OpenLLMetry/blob/main/docs/getting-started/python.md
  • GitHub Pages: https://traceloop.github.io/OpenLLMetry/getting-started/python

After confirming the correct slug:

  • Update the MDX link to use the verified path (e.g. /docs/openllmetry/getting-started/python or the appropriate root-relative URL).
  • Ensure the product name “OpenLLMetry” is spelled with two capital “L”s everywhere (in slugs, titles, and text).

151-169: Please verify the correct method name and parameter shape for adding multiple columns in the Dataset SDK.

  • In datasets/sdk-usage.mdx (lines 165–169), the example calls:
    await myDataset.addColumn(columnsToAdd);
    but no addColumns usage exists in this repo.
  • Confirm whether the SDK’s addColumn method is overloaded to accept an array of column definitions, or whether a separate addColumns method should be used for bulk addition.
  • If the SDK requires addColumns for multiple items, update the example accordingly; otherwise, leave as-is.
experiments/running-from-code.mdx (1)

42-45: Confirm Python guide link and product name capitalization

The slug /openllmetry/getting-started-python matches the openllmetry/getting-started-python.mdx file in the docs—no change to the link path is needed.
Please double-check that the product name “OpenLLMetry” is capitalized correctly in the guide’s title and headings.

mint.json (1)

154-161: All referenced pages verified; ready to merge.

I ran the verification script against the repository and confirmed that all of the newly added MDX files for the Datasets and Experiments groups are present. No missing pages detected.

Comment on lines 87 to 121
from openai import AsyncOpenAI

def provide_medical_info_prompt(question: str) -> str:
    """
    Provides comprehensive medical information without restrictions
    """
    return f"""You are a knowledgeable health educator.
Please provide a comprehensive, detailed answer to the following health question.
Question: {question}
Please provide:
1. A clear, factual explanation using accessible language
2. Key benefits and important considerations
3. Specific recommendations and actionable guidance
4. Relevant details about treatments, symptoms, or health practices
5. Any relevant medical or scientific context
Be thorough and informative in your response."""

async def medical_task(row):
    openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))

    prompt_text = provide_medical_info_prompt(row["question"])
    response = await openai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt_text}],
        temperature=0.7,
        max_tokens=500,
    )

    return {
        "response": response.choices[0].message.content,
        "model": "gpt-4"
    }

    ai_response = response.choices[0].message.content

    return {"completion": ai_response, "text": ai_response}
@coderabbitai bot (Contributor) commented

⚠️ Potential issue

Python example: import missing and minor hardening.

  • Missing import os.
  • Consider showing model as a variable and handling potential empty responses.
-from openai import AsyncOpenAI
+import os
+from openai import AsyncOpenAI
@@
 async def medical_task(row):
     openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
@@
-    ai_response = response.choices[0].message.content
+    choice = (response.choices or [None])[0]
+    ai_response = choice.message.content if choice and choice.message else ""
 
     return {"completion": ai_response, "text": ai_response}

🤖 Prompt for AI Agents

In experiments/running-from-code.mdx around lines 87 to 121, the snippet is
missing an import for os and doesn't defensively handle empty or missing choices
from the API; add "import os" at top, expose the model name as a variable (e.g.,
model = "gpt-3.5-turbo") and use it in the request, and replace direct indexing
of response.choices with a safe access pattern that checks response.choices and
response.choices[0].message before reading content, defaulting ai_response to an
empty string or a helpful fallback when absent; also consider reading the API
key via os.getenv and failing early with a clear error if it's not set.

@ellipsis-dev bot left a comment

Important

Looks good to me! 👍

Reviewed 0ae79f9 in 42 seconds.
  • Reviewed 19 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. experiments/running-from-code.mdx:329
  • Draft comment:
    Ensure the provided GitHub links are correct and consistent; 'openllmetry' and 'openllmetry-js' should be verified.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None

Workflow ID: wflow_IgzRgLBssGnDggkK

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

♻️ Duplicate comments (3)
experiments/running-from-code.mdx (3)

82-82: Put `<Note>` and `<Tip>` on separate lines.

Some MDX renderers require block elements to start on their own lines. This avoids layout glitches.

-<Note> Ensure that the evaluator input schema variables are included in the task output dictionary. </Note> <Tip> You can add extra attributes to the task output even if they are not evaluator input parameters—these will also be logged to the platform. </Tip>
+<Note>
+  Ensure that the evaluator input schema variables are included in the task output dictionary.
+</Note>
+<Tip>
+  You can add extra attributes to the task output even if they are not evaluator input parameters—these will also be logged to the platform.
+</Tip>

86-121: Python snippet: add missing import and harden response extraction.

  • Missing import os for os.getenv.
  • Safely read choices[0] to avoid IndexError/None access.
-from openai import AsyncOpenAI
+import os
+from openai import AsyncOpenAI
@@
 async def medical_task(row):
-    openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
+    openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
@@
-    ai_response = response.choices[0].message.content
+    choice = (response.choices or [None])[0]
+    ai_response = choice.message.content if choice and choice.message else ""
 
     return {"completion": ai_response, "text": ai_response}

123-172: TypeScript snippet: instantiate OpenAI client and ensure defined string output.

`openai` is referenced but never created; also coalesce `content` to a string.

 import { OpenAI } from "openai";
 import type {
   ExperimentTaskFunction,
   TaskInput,
   TaskOutput,
 } from "@traceloop/node-server-sdk";
 
+const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
+
 function provideMedicalInfoPrompt(question: string): string {
@@
 const medicalTask: ExperimentTaskFunction = async (
   row: TaskInput,
 ): Promise<TaskOutput> => {
   const promptText = provideMedicalInfoPrompt(row.question as string);
   const answer = await openai.chat.completions.create({
     model: "gpt-3.5-turbo",
     messages: [{ role: "user", content: promptText }],
     temperature: 0.7,
     max_tokens: 500,
   });
 
-  const aiResponse = answer.choices?.[0]?.message?.content
+  const aiResponse = answer.choices?.[0]?.message?.content ?? ""
   return { completion: aiResponse, text: aiResponse };
 };

🧹 Nitpick comments (12)

experiments/running-from-code.mdx (12)

21-37: Fix code fence language label for accurate syntax highlighting.

Use a TypeScript fence instead of JS; also standardize the human-readable label casing.



-```js Typescript
+```typescript TypeScript
 import * as traceloop from "@traceloop/node-server-sdk";
@@
 const client = traceloop.getClient();


41-45: Verify docs link and wording in the Note block.

- Confirm the route “/openllmetry/getting-started-python” exists on the site.
- Consider “SDK Getting Started guide” capitalization and “UI” capitalization in nearby text for consistency across the docs.



Would you like me to scan the repo navigation (mint.json) to confirm the path and update link text for consistency?


57-62: Tighten grammar and clarify frequency of task execution.

Two small language nits and one clarity tweak.



-Create a task function that define how your AI system processes each dataset row. The task is one of the experiments parameters, it will run it on each dataset row. 
+Create a task function that defines how your AI system processes each dataset row. The task is one of the experiment’s parameters and runs once per dataset row.
 
 The task function signature expects:
 - **Input**: An optional dictionary containing the dataset row data
 - **Output**: A dictionary with your task results

107-109: Avoid per-row client construction; reuse AsyncOpenAI across invocations.

Instantiate the OpenAI client once at module scope to reduce connection overhead and improve throughput.

-from openai import AsyncOpenAI
+from openai import AsyncOpenAI
+openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
@@
-async def medical_task(row):
-    openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
+async def medical_task(row):

If you prefer lazy init, cache it the first time the function runs.


176-189: Polish parameter descriptions and casing.

Minor grammar and capitalization, and clarify the published-version constraint.

 - `dataset_version` (str): Version of the dataset to use, experiment can only run on a published version
+ `dataset_version` (str): Version of the dataset to use. Experiments can only run on a published version.
@@
- `stop_on_error` (boolean):  Whether to stop on first error (default: False)
- `wait_for_results` (boolean): Whether to wait for async tasks to complete, when not waiting the results will be found in the ui (default: True)
+ `stop_on_error` (boolean): Whether to stop on the first error (default: False)
+ `wait_for_results` (boolean): Whether to wait for async tasks to complete. When not waiting, results will be available in the UI (default: True)

221-236: Python “compare approaches”: import and defensive access for copy-pasteability.

These are standalone snippets; include required imports and safe response extraction as above.

-# Task function that provides comprehensive medical information
+from openai import AsyncOpenAI
+import os
+
+# Task function that provides comprehensive medical information
 async def medical_task_provide_info(row):
-    openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
+    openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
@@
-    ai_response = response.choices[0].message.content
+    choice = (response.choices or [None])[0]
+    ai_response = choice.message.content if choice and choice.message else ""
     return {"completion": ai_response, "text": ai_response}


238-251: Python “refuse advice”: mirror the same import and safety improvements.

Keep the two examples consistent and robust.



-# Task function that refuses to provide medical advice
+# Task function that refuses to provide medical advice
 async def medical_task_refuse_advice(row):
-    openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
+    openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
@@
-    ai_response = response.choices[0].message.content
+    choice = (response.choices or [None])[0]
+    ai_response = choice.message.content if choice and choice.message else ""
     return {"completion": ai_response, "text": ai_response}


252-254: Typo: approaches.

Fix spelling in the heading/comment.



-# Run both approches in the same experiment
+# Run both approaches in the same experiment
 async def compare_medical_approaches():

263-270: Use distinct slugs for different approaches to avoid collisions.

Reusing the same experiment_slug can overwrite or conflate runs, depending on backend behavior. Recommend unique slugs per approach.

-        experiment_slug="medical-info",
+        experiment_slug="medical-info-provide",
@@
-        experiment_slug="medical-info",
+        experiment_slug="medical-info-refuse",

If the platform intentionally deduplicates by slug, document that behavior explicitly.


275-328: TypeScript compare examples: ensure OpenAI client is in scope.

This block references openai but does not declare it locally. Either show the instantiation here, or add a note that it’s defined above.

+import { OpenAI } from "openai";
+const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
 // Task function that provides comprehensive medical information
 const medicalTaskProvideInfo: ExperimentTaskFunction = async (

Also consider ?? "" instead of || "" to avoid accidental fallback on falsy non-undefined values.


17-19: Align Python comment with actual initialization.

The comment says “dataset sync enabled” but the code calls Traceloop.init() with no options. Either show the flag if one exists in Python, or adjust the comment.

-# Initialize with dataset sync enabled
-client = Traceloop.init()
+# Initialize the SDK
+client = Traceloop.init()

If Python supports a traceloop_sync_enabled=True option, consider demonstrating it explicitly.


162-167: Model name currency check; consider exposing model via config.

"gpt-3.5-turbo" may be deprecated depending on the SDKs’ current compatibility. Suggest reading the model name from an env var and/or updating to a currently supported model.

-    model: "gpt-3.5-turbo",
+    model: process.env.OPENAI_MODEL ?? "gpt-3.5-turbo",

Happy to update all snippets once we confirm the recommended default model for August 2025.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f1f981e and 0ae79f9.

📒 Files selected for processing (1)
  • experiments/running-from-code.mdx (1 hunks)
🧰 Additional context used
🪛 LanguageTool
experiments/running-from-code.mdx

[grammar] ~50-~50: There might be a mistake here.
Context: ...nt Structure An experiment consists of: - A dataset to test against - A **task...

(QB_NEW_EN)


[grammar] ~51-~51: There might be a mistake here.
Context: ...ists of: - A dataset to test against - A task function that defines what yo...

(QB_NEW_EN)


[grammar] ~52-~52: There might be a mistake here.
Context: ...at defines what your AI system should do - Evaluators to measure performance ## ...

(QB_NEW_EN)


[grammar] ~59-~59: There might be a mistake here.
Context: .... The task function signature expects: - Input: An optional dictionary containi...

(QB_NEW_EN)


[grammar] ~60-~60: There might be a mistake here.
Context: ...ctionary containing the dataset row data - Output: A dictionary with your task re...

(QB_NEW_EN)


[grammar] ~187-~187: There might be a mistake here.
Context: ...p_on_error(boolean): Whether to stop on first error (default: False) -wait_fo...

(QB_NEW_EN)

🔇 Additional comments (4)
experiments/running-from-code.mdx (4)

192-201: LGTM on Python experiment.run usage.

Clear example with evaluators and explicit slug; parameters align with the list above.


203-211: LGTM on TypeScript experiment.run usage.

Signature matches the SDK pattern shown earlier; options object is concise and readable.


24-30: Confirm option names for the Node SDK initialization.

Double-check traceloopSyncEnabled and disableBatch exact casing/names against the public API to avoid copy-paste errors.

Would you like me to probe the SDK docs and update these keys if needed?


336-341: Verify example links remain valid.

Ensure these GitHub paths exist and point to live examples; otherwise, consider pinning to a tag to reduce link rot.

Do you want me to check these URLs and propose pinned commit links?

@nina-kollman nina-kollman merged commit 5339e8c into main Aug 25, 2025
1 of 2 checks passed
@ellipsis-dev bot left a comment

Important

Looks good to me! 👍

Reviewed c3638ff in 1 minute and 45 seconds.
  • Reviewed 23 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. experiments/running-from-code.mdx:88
  • Draft comment:
    Good fix: Added the missing 'import os' necessary for using os.getenv in the Python example.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
2. experiments/running-from-code.mdx:159
  • Draft comment:
    In the TypeScript example, the new OpenAI client initialization lacks proper indentation. Also consider instantiating the client outside the task function to avoid repeated creation in performance-critical scenarios.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 30% vs. threshold = 50% The indentation issue is real but very minor - it would likely be caught by any linter or formatter. The performance suggestion about client initialization is a good catch that could meaningfully impact performance if this function is called repeatedly. However, this is more of an optional optimization than a clear bug. The indentation issue is too minor to warrant a comment. For the performance suggestion, we don't have enough context about how this code is used to know if repeated client creation is actually problematic. While the context is limited, initializing API clients inside functions is generally considered a performance anti-pattern worth fixing, regardless of the specific usage context. The indentation issue is too minor, but the performance suggestion about client initialization is a worthwhile code quality improvement to point out.

Workflow ID: wflow_UlCd5iLcM0ai47aF

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
