datasets/quick-start.mdx (new file, 63 additions)
---
title: "Quick Start"
---

<Frame>
<img
className="block dark:hidden"
src="/img/dataset/dataset-list-light.png"
/>
<img className="hidden dark:block" src="/img/dataset/dataset-list-dark.png" />
</Frame>

Datasets are simple data tables for managing the data you use in experiments and evaluations of your AI applications.
Datasets are available in the SDK, and they support versioned snapshots for reproducible testing.
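
If you want to pull a dataset from code, the SDK exposes it by its slug. A minimal Python sketch, reusing the `medical-questions` slug from the SDK usage guide:

```python
from traceloop.sdk import Traceloop

client = Traceloop.init()

# Fetch the current draft of a dataset by its slug
dataset = client.datasets.get_by_slug("medical-questions")

# Fetch a published, immutable version as CSV for reproducible testing
snapshot = client.datasets.get_version_csv(slug="medical-questions", version="v1")
```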

<Steps>
<Step title="Create a new dataset">

Click **New Dataset** to create a dataset. Give it a descriptive name that reflects its purpose or use case, add a description to help your team understand its context, and provide a slug so you can reference the dataset from the SDK.

</Step>

<Step title="Add your data">

Add rows and columns to structure your dataset.
You can add different column types:
- **Text**: For prompts, model responses, or any textual data
- **Number**: For numerical values, scores, or metrics
- **Boolean**: For true/false flags or binary classifications
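
These types map directly to the values you pass when adding rows from the SDK. A minimal sketch, reusing the `product-inventory` example from the SDK usage guide:

```python
from traceloop.sdk import Traceloop

client = Traceloop.init()
my_dataset = client.datasets.get_by_slug("product-inventory")

# One row mixing the three column types
my_dataset.add_rows([{
    "product": "Webcam",        # Text
    "price": 59.99,             # Number
    "in_stock": True,           # Boolean
    "category": "Accessories",  # Text
}])
```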

<Tip>
Use meaningful column names that clearly describe what each field contains.
This makes it easier to work with your dataset in code, keeps evaluator inputs unambiguous, and helps team members collaborate.
</Tip>

</Step>

<Step title="Publish your dataset version">

<Frame>
<img
className="block dark:hidden"
src="/img/dataset/dataset-view-light.png"
/>
<img className="hidden dark:block" src="/img/dataset/dataset-view-dark.png" />
</Frame>

Once you're satisfied with your dataset structure and data, click **Publish Version** to create a stable snapshot. Published versions are immutable and remain accessible from the SDK.
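
Publishing is also available from the SDK. A minimal sketch, assuming a dataset fetched by slug as in the SDK usage guide:

```python
from traceloop.sdk import Traceloop

client = Traceloop.init()
my_dataset = client.datasets.get_by_slug("medical-questions")

# Publish the current draft as a new immutable version
published_version = my_dataset.publish()
```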

</Step>

<Step title="View your version history">

You can access all published versions of your dataset by opening the version history modal. This allows you to:
- Compare different versions of your dataset
- Track changes over time
- Switch between versions
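
Published versions can also be pulled side by side from the SDK. A minimal sketch, assuming versions `v1` and `v2` exist and that `get_version_csv` returns the CSV content as a string:

```python
from traceloop.sdk import Traceloop

client = Traceloop.init()

# Fetch two published versions as CSV to compare them
v1_csv = client.datasets.get_version_csv(slug="medical-questions", version="v1")
v2_csv = client.datasets.get_version_csv(slug="medical-questions", version="v2")
print("identical" if v1_csv == v2_csv else "versions differ")
```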

</Step>
</Steps>
datasets/sdk-usage.mdx (new file, 226 additions)
---
title: "SDK usage"
description: "Access your managed datasets with the Traceloop SDK"
---

## SDK Initialization

First, initialize the Traceloop SDK.

<CodeGroup>

```python Python
from traceloop.sdk import Traceloop

# Initialize the SDK; the returned client exposes dataset operations
client = Traceloop.init()
```

```ts TypeScript
import * as traceloop from "@traceloop/node-server-sdk";

// Initialize with dataset sync enabled
traceloop.initialize({
  appName: "your-app-name",
  apiKey: process.env.TRACELOOP_API_KEY,
  disableBatch: true,
  traceloopSyncEnabled: true,
});

// Wait for initialization to complete
await traceloop.waitForInitialization();

// Get the client instance for dataset operations
const client = traceloop.getClient();
```

</CodeGroup>

<Note>
Make sure you've created an API key and set it as an environment variable
`TRACELOOP_API_KEY` before you start. Check out the SDK's [getting started
guide](/openllmetry/getting-started-python) for more information.
</Note>

The SDK fetches your datasets from Traceloop servers. Changes made through the SDK to a draft dataset version are immediately visible in the UI.

## Dataset Operations

### Create a dataset

You can create datasets in different ways depending on your data source:
- **Python**: Import from CSV file or pandas DataFrame
- **TypeScript**: Import from CSV data or create manually

<CodeGroup>

```python Python
import pandas as pd
from traceloop.sdk import Traceloop

client = Traceloop.init()

# Create dataset from CSV file
dataset_csv = client.datasets.from_csv(
    file_path="path/to/your/data.csv",
    slug="medical-questions",
    name="Medical Questions",
    description="Dataset with patients' medical questions",
)

# Create dataset from pandas DataFrame
data = {
    "product": ["Laptop", "Mouse", "Keyboard", "Monitor"],
    "price": [999.99, 29.99, 79.99, 299.99],
    "in_stock": [True, True, False, True],
    "category": ["Electronics", "Accessories", "Accessories", "Electronics"],
}
df = pd.DataFrame(data)

dataset_df = client.datasets.from_dataframe(
    df=df,
    slug="product-inventory",
    name="Product Inventory",
    description="Sample product inventory data",
)
```

```ts TypeScript
const client = traceloop.getClient();

// Option 1: Create dataset manually
const myDataset = await client.datasets.create({
  name: "Medical Questions",
  slug: "medical-questions",
  description: "Dataset with patients' medical questions"
});

// Option 2: Create and import from CSV data
const csvData = `user_id,prompt,response,model,satisfaction_score
user_001,"What is React?","React is a JavaScript library...","gpt-3.5-turbo",4
user_002,"Explain Docker","Docker is a containerization platform...","gpt-3.5-turbo",5`;

await myDataset.fromCSV(csvData, { hasHeader: true });
```

</CodeGroup>

### Get a dataset
A dataset can be retrieved by its slug, which is shown on the dataset page in the UI.
<CodeGroup>

```python Python
# Get dataset by slug - current draft version
my_dataset = client.datasets.get_by_slug("medical-questions")

# Get specific version as CSV
dataset_csv = client.datasets.get_version_csv(
    slug="medical-questions",
    version="v2",
)
```

```ts TypeScript
// Get dataset by slug - current draft version
const myDataset = await client.datasets.get("medical-questions");

// Get specific version as CSV
const datasetCsv = await client.datasets.getVersionCSV("medical-questions", "v1");
```

</CodeGroup>

### Adding Columns

<CodeGroup>

```python Python
from traceloop.sdk.dataset import ColumnType

# Add a new column to your dataset
new_column = my_dataset.add_column(
    slug="confidence_score",
    name="Confidence Score",
    col_type=ColumnType.NUMBER,
)
```

```ts TypeScript
// Define the schema by adding multiple columns
const columnsToAdd = [
  {
    name: "User ID",
    slug: "user-id",
    type: "string" as const,
    description: "Unique identifier for the user"
  },
  {
    name: "Satisfaction score",
    slug: "satisfaction-score",
    type: "number" as const,
    description: "User satisfaction rating (1-5)"
  }
];

await myDataset.addColumn(columnsToAdd);
console.log("Schema defined with multiple columns");
```

</CodeGroup>

### Adding Rows

Each row maps column slugs to their values:
<CodeGroup>

```python Python
# Add new rows to your dataset
row_data = {
    "product": "TV Screen",
    "price": 1500.0,
    "in_stock": True,
    "category": "Electronics",
}

my_dataset.add_rows([row_data])
```

```ts TypeScript
// Add an individual row to the dataset
const userId = "user_001";
const prompt = "Explain machine learning in simple terms";

const rowData = {
  user_id: userId,
  prompt: prompt,
  response: "This is the model response",
  model: "gpt-3.5-turbo",
  satisfaction_score: 1,
};

await myDataset.addRow(rowData);
```

</CodeGroup>

## Dataset Versions

### Publish a dataset
Dataset versions and history can be viewed in the UI. Versioning lets you run the same evaluations and experiments against different versions of a dataset, making meaningful comparisons possible.
<CodeGroup>

```python Python
# Publish the current dataset state as a new version
published_version = my_dataset.publish()
```

```ts TypeScript
// Publish the current dataset state as a new version
const publishedVersion = await myDataset.publish();
```

</CodeGroup>

experiments/introduction.mdx (new file, 31 additions)
---
title: "Introduction"
---

Building reliable LLM applications means knowing whether a new prompt, model, or flow change actually makes things better.

<Frame>
<img
className="block dark:hidden"
src="/img/experiment/exp-list-light.png"
/>
<img className="hidden dark:block" src="/img/experiment/exp-list-dark.png" />
</Frame>

Experiments in Traceloop give teams a structured workflow for testing and comparing results across different prompts, models, and evaluator checks, all against real datasets.

## What You Can Do with Experiments

<CardGroup cols={2}>
<Card title="Run Multiple Evaluators" icon="list-check">
Execute multiple evaluation checks against your dataset
</Card>
<Card title="View Complete Results" icon="table">
See all experiment run outputs in a comprehensive table view with relevant indicators and detailed reasoning
</Card>
<Card title="Compare Experiment Runs Results" icon="code-compare">
Run the same experiment across different dataset versions to see how it affects your workflow
</Card>
<Card title="Custom Task Pipelines" icon="code">
Add a tailored task to the experiment that produces the evaluator input, such as LLM calls or semantic search
</Card>
</CardGroup>
experiments/result-overview.mdx (new file, 44 additions)
---
title: "Result Overview"
---

Every experiment is executed through the SDK, and all experiments are logged in the Traceloop platform.

<Frame>
<img
className="block dark:hidden"
src="/img/experiment/exp-list-light.png"
/>
<img className="hidden dark:block" src="/img/experiment/exp-list-dark.png" />
</Frame>

## Experiment Runs
An experiment can be run multiple times against different datasets and tasks. All runs are logged in the Traceloop platform to enable easy comparison.

<Frame>
<img
className="block dark:hidden"
src="/img/experiment/exp-run-list-light.png"
/>
<img className="hidden dark:block" src="/img/experiment/exp-run-list-dark.png" />
</Frame>

## Experiment Tasks

An experiment run is made up of multiple tasks, where each task represents the experiment flow applied to a single dataset row.

The task logging captures the following (a minimal sketch follows the screenshot below):

- **Task input**: the data taken from the dataset row.
- **Task outputs**: the results produced by running the task, which are then passed as input to the evaluator.
- **Evaluator results**: the evaluator's assessment based on the task outputs.

<Frame>
<img
className="block dark:hidden"
src="/img/experiment/exp-run-light.png"
/>
<img className="hidden dark:block" src="/img/experiment/exp-run-dark.png" />
</Frame>
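
To make the flow concrete, here is a minimal, hypothetical Python sketch of a task function. The function names and row fields are illustrative assumptions, not the documented API; only the input → outputs → evaluator flow mirrors the description above.

```python
# Hypothetical sketch: names below are illustrative, not the documented API.

def call_my_llm(prompt: str) -> str:
    # Stand-in for a real model call (e.g. an OpenAI chat completion)
    return f"Model answer to: {prompt}"

def my_task(row: dict) -> dict:
    """One task: takes a single dataset row as input, returns outputs for the evaluator."""
    prompt = row["prompt"]             # task input, taken from the dataset row
    completion = call_my_llm(prompt)   # the experiment flow applied to this row
    return {"completion": completion}  # task outputs, passed as input to the evaluator
```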

