diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..86ce0f3 --- /dev/null +++ b/.gitignore @@ -0,0 +1,2 @@ +# Claude Code settings +.claude/ diff --git a/datasets/quick-start.mdx b/datasets/quick-start.mdx index 8f7ebf5..f8420db 100644 --- a/datasets/quick-start.mdx +++ b/datasets/quick-start.mdx @@ -2,6 +2,9 @@ title: "Quick Start" --- +Datasets are simple data tables that you can use to manage your data for experiments and evaluation of your AI applications. +Datasets are available in the SDK, and they enable you to create versioned snapshots for reproducible testing. + -Datasets are simple data tables that you can use to manage your data for experiments and evaluation of your AI applications. -Datasets are available in the SDK, and they enable you to create versioned snapshots for reproducible testing. - diff --git a/evaluators/custom-evaluator.mdx b/evaluators/custom-evaluator.mdx new file mode 100644 index 0000000..97a82fa --- /dev/null +++ b/evaluators/custom-evaluator.mdx @@ -0,0 +1,47 @@ +--- +title: "Custom Evaluators" +description: "Define an evaluator for your specific needs " +--- + +Create your own evaluator to match your specific needs. You can start right away with custom criteria for full flexibility, or use one of our recommended formats as a starting point. + + + + + + + +## Do It Yourself + +This option lets you write the evaluator prompt from scratch by adding the desired messages (System, Assistant, User, or Developer) and configuring the model along with its settings. + + + + + + +## Generate Evaluator + +The evaluator prompt can be automatically configured by Traceloop by clicking on the **Generate Evaluator** button. +To enable the button, map the column you want to evaluate (such as an LLM response) and add any additional data columns required for prompt creation. +Describe the evaluator’s purpose and reference the relevant data columns in the description. 
+ +The system generates a prompt template that you can edit and customize as needed. + + +## Test Evaluator + +Before creating an evaluator, you can test it on existing Playground data. +This allows you to refine and correct the evaluator prompt before saving the final version. + +## Execute Evaluator + +Evaluators can be executed in [playground columns](../playgrounds/columns/column-management) and in [experiments through the SDK](../experiments/running-from-code). + + diff --git a/evaluators/evaluator-library.mdx b/evaluators/evaluator-library.mdx new file mode 100644 index 0000000..cb6b287 --- /dev/null +++ b/evaluators/evaluator-library.mdx @@ -0,0 +1,82 @@ +--- +title: "Evaluator Library" +description: "Select from pre-built quality checks or create custom evaluators to systematically assess AI outputs" +--- + +The Evaluator Library provides a comprehensive collection of pre-built quality checks designed to systematically assess AI outputs. You can choose from existing evaluators or create custom ones tailored to your specific needs. 
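To make the idea concrete, here is a minimal sketch of the kind of check a content-analysis evaluator performs. The function name, thresholds, and result shape are illustrative assumptions, not the actual Traceloop implementation:

```python
# Illustrative sketch of a word-count check, similar in spirit to the
# built-in content-analysis evaluators. Thresholds and the result shape
# are assumptions for illustration only.
def word_count_check(output: str, min_words: int = 5, max_words: int = 200) -> dict:
    """Return a result string and a pass/fail flag for the given output."""
    count = len(output.split())
    return {
        "result": f"word_count={count}",
        "pass": min_words <= count <= max_words,
    }
```

With the defaults above, a six-word response passes, while a one-word reply fails the minimum-length check.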
+ + + + + + +## Made by Traceloop + +Traceloop provides several pre-configured evaluators for common assessment tasks: + +### Content Analysis Evaluators + +**Character Count** +- Analyze response length and verbosity +- Helps ensure responses meet length requirements + +**Character Count Ratio** +- Measure the ratio of characters to the input +- Useful for assessing response proportionality + +**Word Count** +- Ensure appropriate response detail level +- Track output length consistency + +**Word Count Ratio** +- Measure the ratio of words to the input +- Compare input/output verbosity + +### Quality Assessment Evaluators + +**Answer Relevancy** +- Verify responses address the query +- Ensure AI outputs stay on topic + +**Faithfulness** +- Detect hallucinations and verify facts +- Maintain accuracy and truthfulness + +### Safety & Security Evaluators + +**PII Detection** +- Identify personal information in responses +- Protect user privacy and data security + +**Profanity Detection** +- Monitor for inappropriate language +- Maintain content quality standards + +**Secrets Detection** +- Monitor for sensitive information leakage +- Prevent accidental exposure of credentials + +## Custom Evaluators + +In addition to the pre-built evaluators, you can create custom evaluators with: + +### Inputs +- **string**: Text-based input parameters +- Support for multiple input types + +### Outputs +- **results**: String-based evaluation results +- **pass**: Boolean indicator for pass/fail status + +## Usage + +1. Browse the available evaluators in the library +2. Select evaluators that match your assessment needs +3. Configure input parameters as required +4. Use the "Use evaluator" button to integrate into your workflow +5. Monitor outputs and pass/fail status for systematic quality assessment + +The Evaluator Library streamlines the process of implementing comprehensive AI output assessment, ensuring consistent quality and safety standards across your applications. 
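The custom-evaluator contract above (string inputs, a `results` string plus a boolean `pass`) can be sketched as a small data class. The field names follow the docs, but the rendering is hypothetical and is not SDK code:

```python
from dataclasses import dataclass

@dataclass
class EvaluatorResult:
    # Mirrors the output schema described above: a string result plus a
    # boolean pass/fail indicator ("pass" is a Python keyword, so the
    # field is named "passed" here).
    results: str
    passed: bool

def contains_secrets(text: str) -> EvaluatorResult:
    # Toy custom evaluator in the spirit of Secrets Detection: flag
    # outputs that look like they embed an API key. The pattern is a
    # simplistic illustration, not a production detector.
    flagged = "sk-" in text or "AKIA" in text
    return EvaluatorResult(
        results="secret-like token found" if flagged else "clean",
        passed=not flagged,
    )
```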
\ No newline at end of file diff --git a/evaluators/intro.mdx b/evaluators/intro.mdx new file mode 100644 index 0000000..d970426 --- /dev/null +++ b/evaluators/intro.mdx @@ -0,0 +1,41 @@ +--- +title: "Introduction" +description: "Evaluating workflows and LLM outputs" +--- + +The evaluation library is a core feature of Traceloop, providing comprehensive tools to assess LLM outputs, data quality, and performance across various dimensions. Whether you need automated scoring or human judgment, the evaluation system has you covered. + +## Why Do We Need Evaluators? + +LLM agents are more complex than single-turn completions. +They operate across multiple steps, use tools, and depend on context and external systems like memory or APIs. This complexity introduces new failure modes: agents may hallucinate tools, get stuck in loops, or produce final answers that hide earlier mistakes. + +Evaluators make these issues visible by checking correctness, relevance, task completion, tool usage, memory retention, safety, and style. They ensure outputs remain consistent even when dependencies shift and provide a structured way to measure reliability. Evaluation is continuous, extending into production through automated tests, drift detection, quality gates, and online monitoring. +In short, evaluators turn raw outputs into trustworthy systems by providing measurable, repeatable checks that give teams the confidence to deploy at scale. + +## Evaluator Types + +The system supports: +- **Custom evaluators** - Create your own evaluation logic tailored to specific needs +- **Built-in evaluators** - Pre-configured evaluators by Traceloop for common assessment tasks + +In the Evaluator Library, select the evaluator you want to define. +You can either create a custom evaluator by clicking **New Evaluator** or choose one of the prebuilt **Made by Traceloop** evaluators. + + + + + + +Clicking an existing evaluator displays its input and output schema.
This information is useful when you execute the evaluator [through the SDK](../experiments/running-from-code). + +## Where to Use Evaluators + +Evaluators can be used in three main contexts within Traceloop: + +- **[Playgrounds](../playgrounds/quick-start)** - Test and iterate on your evaluators interactively, compare different configurations, and validate evaluation logic before deployment +- **[Experiments](../experiments/introduction)** - Run systematic evaluations across datasets programmatically using the SDK, track performance metrics over time, and easily compare experiment results +- **[Monitors](../monitoring/introduction)** - Continuously evaluate your LLM applications in production with real-time monitoring and alerting on quality degradation diff --git a/evaluators/made-by-traceloop.mdx b/evaluators/made-by-traceloop.mdx new file mode 100644 index 0000000..a481e3b --- /dev/null +++ b/evaluators/made-by-traceloop.mdx @@ -0,0 +1,87 @@ +--- +title: "Made by Traceloop" +description: "Pre-configured evaluators by Traceloop for common assessment tasks" +--- + +The Evaluator Library provides a comprehensive collection of pre-built quality checks designed to systematically assess AI outputs. + +Each evaluator comes with a predefined input and output schema. When using an evaluator, you’ll need to map your data to its input schema. + + + + + +## Evaluator Types + + + + Analyze response length and verbosity to ensure outputs meet specific length requirements. + + + + Measure the ratio of characters to the input to assess response proportionality and expansion. + + + + Ensure appropriate response detail level by tracking the total number of words in outputs. + + + + Measure the ratio of words to the input to compare input/output verbosity and expansion patterns. + + + + Verify responses address the query to ensure AI outputs stay on topic and remain relevant. + + + + Detect hallucinations and verify facts to maintain accuracy and truthfulness in AI responses.
+ + + + Identify personal information exposure to protect user privacy and ensure data security compliance. + + + + Flag inappropriate language use to maintain content quality standards and professional communication. + + + + Monitor for credential and key leaks to prevent accidental exposure of sensitive information. + + + + Validate SQL queries to ensure proper syntax and structure in database-related AI outputs. + + + + Validate JSON responses to ensure proper formatting and structure in API-related outputs. + + + + Validate regex patterns to ensure correct regular expression syntax and functionality. + + + + Validate placeholder regex patterns to ensure proper template and variable replacement structures. + + + + Validate semantic similarity between expected and actual responses to measure content alignment. + + + + Validate agent goal accuracy to ensure AI systems achieve their intended objectives effectively. + + + + Validate topic adherence to ensure responses stay focused on the specified subject matter. + + + + Measure text perplexity from logprobs to assess the predictability and coherence of generated text. 
+ + diff --git a/img/evaluator/eval-custom-dark.png b/img/evaluator/eval-custom-dark.png new file mode 100644 index 0000000..865a42f Binary files /dev/null and b/img/evaluator/eval-custom-dark.png differ diff --git a/img/evaluator/eval-custom-light.png b/img/evaluator/eval-custom-light.png new file mode 100644 index 0000000..b46988c Binary files /dev/null and b/img/evaluator/eval-custom-light.png differ diff --git a/img/evaluator/eval-do-it-yourself-dark.png b/img/evaluator/eval-do-it-yourself-dark.png new file mode 100644 index 0000000..dd38cc7 Binary files /dev/null and b/img/evaluator/eval-do-it-yourself-dark.png differ diff --git a/img/evaluator/eval-do-it-yourself-light.png b/img/evaluator/eval-do-it-yourself-light.png new file mode 100644 index 0000000..3a54245 Binary files /dev/null and b/img/evaluator/eval-do-it-yourself-light.png differ diff --git a/img/evaluator/eval-library-dark.png b/img/evaluator/eval-library-dark.png new file mode 100644 index 0000000..ddef98e Binary files /dev/null and b/img/evaluator/eval-library-dark.png differ diff --git a/img/evaluator/eval-library-light.png b/img/evaluator/eval-library-light.png new file mode 100644 index 0000000..c3689a9 Binary files /dev/null and b/img/evaluator/eval-library-light.png differ diff --git a/img/evaluator/eval-made-by-traceloop-dark.png b/img/evaluator/eval-made-by-traceloop-dark.png new file mode 100644 index 0000000..3b07fb8 Binary files /dev/null and b/img/evaluator/eval-made-by-traceloop-dark.png differ diff --git a/img/evaluator/eval-made-by-traceloop-light.png b/img/evaluator/eval-made-by-traceloop-light.png new file mode 100644 index 0000000..92ea738 Binary files /dev/null and b/img/evaluator/eval-made-by-traceloop-light.png differ diff --git a/img/playground/play-action-dark.png b/img/playground/play-action-dark.png new file mode 100644 index 0000000..4fc39b6 Binary files /dev/null and b/img/playground/play-action-dark.png differ diff --git a/img/playground/play-action-light.png 
b/img/playground/play-action-light.png new file mode 100644 index 0000000..77a7f93 Binary files /dev/null and b/img/playground/play-action-light.png differ diff --git a/img/playground/play-column-list-dark.png b/img/playground/play-column-list-dark.png new file mode 100644 index 0000000..e486f4d Binary files /dev/null and b/img/playground/play-column-list-dark.png differ diff --git a/img/playground/play-column-list-light.png b/img/playground/play-column-list-light.png new file mode 100644 index 0000000..09b512a Binary files /dev/null and b/img/playground/play-column-list-light.png differ diff --git a/img/playground/play-column-options-dark.png b/img/playground/play-column-options-dark.png new file mode 100644 index 0000000..4ad127e Binary files /dev/null and b/img/playground/play-column-options-dark.png differ diff --git a/img/playground/play-column-options-light.png b/img/playground/play-column-options-light.png new file mode 100644 index 0000000..447f7dd Binary files /dev/null and b/img/playground/play-column-options-light.png differ diff --git a/img/playground/play-column-settings-dark.png b/img/playground/play-column-settings-dark.png new file mode 100644 index 0000000..fa869d1 Binary files /dev/null and b/img/playground/play-column-settings-dark.png differ diff --git a/img/playground/play-column-settings-light.png b/img/playground/play-column-settings-light.png new file mode 100644 index 0000000..5f1bea8 Binary files /dev/null and b/img/playground/play-column-settings-light.png differ diff --git a/img/playground/play-full-table-dark.png b/img/playground/play-full-table-dark.png new file mode 100644 index 0000000..75ae548 Binary files /dev/null and b/img/playground/play-full-table-dark.png differ diff --git a/img/playground/play-full-table-light.png b/img/playground/play-full-table-light.png new file mode 100644 index 0000000..94b61d0 Binary files /dev/null and b/img/playground/play-full-table-light.png differ diff --git a/img/playground/play-json-dark.png 
b/img/playground/play-json-dark.png new file mode 100644 index 0000000..3db6462 Binary files /dev/null and b/img/playground/play-json-dark.png differ diff --git a/img/playground/play-json-light.png b/img/playground/play-json-light.png new file mode 100644 index 0000000..aea5d72 Binary files /dev/null and b/img/playground/play-json-light.png differ diff --git a/img/playground/play-list-dark.png b/img/playground/play-list-dark.png new file mode 100644 index 0000000..a6a26cb Binary files /dev/null and b/img/playground/play-list-dark.png differ diff --git a/img/playground/play-list-light.png b/img/playground/play-list-light.png new file mode 100644 index 0000000..1235165 Binary files /dev/null and b/img/playground/play-list-light.png differ diff --git a/img/playground/play-multi-select-dark.png b/img/playground/play-multi-select-dark.png new file mode 100644 index 0000000..2a2dfd1 Binary files /dev/null and b/img/playground/play-multi-select-dark.png differ diff --git a/img/playground/play-multi-select-light.png b/img/playground/play-multi-select-light.png new file mode 100644 index 0000000..9012e0e Binary files /dev/null and b/img/playground/play-multi-select-light.png differ diff --git a/img/playground/play-number-col-dark.png b/img/playground/play-number-col-dark.png new file mode 100644 index 0000000..8c97e7b Binary files /dev/null and b/img/playground/play-number-col-dark.png differ diff --git a/img/playground/play-number-col-light.png b/img/playground/play-number-col-light.png new file mode 100644 index 0000000..17a503f Binary files /dev/null and b/img/playground/play-number-col-light.png differ diff --git a/img/playground/play-number-col-summary-dark.png b/img/playground/play-number-col-summary-dark.png new file mode 100644 index 0000000..cd6f58d Binary files /dev/null and b/img/playground/play-number-col-summary-dark.png differ diff --git a/img/playground/play-number-col-summary-light.png b/img/playground/play-number-col-summary-light.png new file mode 100644 
index 0000000..9c27128 Binary files /dev/null and b/img/playground/play-number-col-summary-light.png differ diff --git a/img/playground/play-prompt-column-dark.png b/img/playground/play-prompt-column-dark.png new file mode 100644 index 0000000..5c9bd75 Binary files /dev/null and b/img/playground/play-prompt-column-dark.png differ diff --git a/img/playground/play-prompt-column-light.png b/img/playground/play-prompt-column-light.png new file mode 100644 index 0000000..991152f Binary files /dev/null and b/img/playground/play-prompt-column-light.png differ diff --git a/img/playground/play-prompt-structure-output-dark.png b/img/playground/play-prompt-structure-output-dark.png new file mode 100644 index 0000000..4f3b432 Binary files /dev/null and b/img/playground/play-prompt-structure-output-dark.png differ diff --git a/img/playground/play-prompt-structure-output-light.png b/img/playground/play-prompt-structure-output-light.png new file mode 100644 index 0000000..11d60d5 Binary files /dev/null and b/img/playground/play-prompt-structure-output-light.png differ diff --git a/img/playground/play-prompt-write-dark.png b/img/playground/play-prompt-write-dark.png new file mode 100644 index 0000000..2f7ab7a Binary files /dev/null and b/img/playground/play-prompt-write-dark.png differ diff --git a/img/playground/play-prompt-write-light.png b/img/playground/play-prompt-write-light.png new file mode 100644 index 0000000..c06280f Binary files /dev/null and b/img/playground/play-prompt-write-light.png differ diff --git a/img/playground/play-single-select-creation-dark.png b/img/playground/play-single-select-creation-dark.png new file mode 100644 index 0000000..01e09e0 Binary files /dev/null and b/img/playground/play-single-select-creation-dark.png differ diff --git a/img/playground/play-single-select-creation-light.png b/img/playground/play-single-select-creation-light.png new file mode 100644 index 0000000..c140486 Binary files /dev/null and 
b/img/playground/play-single-select-creation-light.png differ diff --git a/img/playground/play-single-select-dark.png b/img/playground/play-single-select-dark.png new file mode 100644 index 0000000..8a4b4a3 Binary files /dev/null and b/img/playground/play-single-select-dark.png differ diff --git a/img/playground/play-single-select-light.png b/img/playground/play-single-select-light.png new file mode 100644 index 0000000..3630985 Binary files /dev/null and b/img/playground/play-single-select-light.png differ diff --git a/mint.json b/mint.json index 7992629..8c8e528 100644 --- a/mint.json +++ b/mint.json @@ -151,10 +151,28 @@ "group": "Prompt Management", "pages": ["prompts/quick-start", "prompts/registry", "prompts/sdk-usage"] }, + { + "group": "Playgrounds", + "pages": [ + "playgrounds/quick-start", + { + "group": "Columns", + "pages": [ + "playgrounds/columns/data-columns", + "playgrounds/columns/prompt", + "playgrounds/columns/column-management" + ] + } + ] + }, { "group": "Datasets", "pages": ["datasets/quick-start", "datasets/sdk-usage"] }, + { + "group": "Evaluators", + "pages": ["evaluators/intro", "evaluators/custom-evaluator", "evaluators/made-by-traceloop"] + }, { "group": "Experiments", "pages": ["experiments/introduction", "experiments/result-overview", "experiments/running-from-code"] diff --git a/playgrounds/columns/column-management.mdx b/playgrounds/columns/column-management.mdx new file mode 100644 index 0000000..b8320d2 --- /dev/null +++ b/playgrounds/columns/column-management.mdx @@ -0,0 +1,54 @@ +--- +title: "Column Management" +description: "Reorder, hide, edit, duplicate, and delete Playground columns" +--- + +Columns in the Playground can be reordered, edited, or deleted at any time to adapt your workspace as your analysis evolves. Understanding how to manage columns effectively helps you maintain organized and efficient playgrounds. + +## Column Settings +Column Settings lets you hide specific columns from the Playground and reorder them as needed.
To open the settings, click the Playground Action button and select Column Settings. + + +To change the column order, use the six-dot handle on the right side of each column and drag the column into the desired position. + +To hide a column, toggle its switch in the menu. + + +Columns can also be reordered by dragging them to your desired position in the playground. + + + + + + +## Column Actions + +Each column has a menu that lets you manage and customize it. From this menu, you can: +- Rename the column directly by editing its title +- Edit the column configuration +- Duplicate the column to create a copy with the same settings +- Delete the column if it’s no longer needed + + + + diff --git a/playgrounds/columns/data-columns.mdx b/playgrounds/columns/data-columns.mdx new file mode 100644 index 0000000..2f84887 --- /dev/null +++ b/playgrounds/columns/data-columns.mdx @@ -0,0 +1,116 @@ +--- +title: "Data Columns" +--- + +Columns are the building blocks of playgrounds, defining what kind of data you can store, process, and analyze. + + + + + + + +**Need to reorder, edit, or delete columns?** + + Learn how to effectively manage your columns in the [Column Management](./column-management) guide. + + +## 📝 Data Input Columns +Store and manage static data entered manually or imported from external sources. + +### Text field +Free-form text input with multiline support + +### Numeric +Integer and floating-point values + + + + +The last row allows you to choose a calculation method for the column, such as average, median, minimum, maximum, or sum. + + + + + + + + +### Single select +Single-choice columns let you define a set of predefined options and restrict each cell to one selection. +To create one, set the column name and add options in the Create Column drawer. +In the values box, type an option and press Enter to save it. Once added, it will appear as a colored label.
+ +In the table, each cell will then allow you to select only one of the defined options. +This column type is especially useful for manual tagging with a single tag. + + + + + + + + + + + +### Multi select +Multi-select columns let you define a set of predefined options and allow each cell to contain multiple selections. The setup process is the same as for single-select columns: define the column name, add options in the Create Column drawer, and save them as labels. + +In the table, each cell can then include several of the defined options. This column type is especially useful for manual tagging with multiple tags. + + + + + +### JSON +A JSON column allows you to store and edit structured JSON objects directly in the Playground. Each cell can contain a JSON value, making it easy to work with complex data structures. + +When editing a cell, an Edit JSON panel opens with syntax highlighting and formatting support, so you can quickly add or update fields. + + + + diff --git a/playgrounds/columns/prompt.mdx b/playgrounds/columns/prompt.mdx new file mode 100644 index 0000000..70fe014 --- /dev/null +++ b/playgrounds/columns/prompt.mdx @@ -0,0 +1,89 @@ +--- +title: "Prompt Column" +description: "Execute LLM prompts with full model configuration" +--- + +### Prompt +A Prompt column allows you to define a custom prompt and run it directly on your Playground data. +You can compose prompts with messages (system, user, assistant, or developer), insert playground variables, and configure which model to use. +Each row in your playground will be passed through the prompt, and the model’s response will be stored in the column. + + +Prompt columns make it easy to test different prompts against real data and compare model outputs side by side. + + + + + +## Prompt Writing + +Write your prompt messages by selecting a specific role: System, User, Assistant, or Developer.
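For illustration, a prompt built from these roles, together with a playground variable, might be represented like this. The storage format shown is an assumption; only the role names come from the UI:

```python
# Hypothetical representation of a prompt column's messages. The
# {{ticket_text}} placeholder stands in for a mapped playground column.
messages = [
    {"role": "system", "content": "You are a concise support assistant."},
    {"role": "user", "content": "Summarize this ticket: {{ticket_text}}"},
]

def render(content: str, variables: dict) -> str:
    """Substitute {{name}} placeholders with mapped column values."""
    for name, value in variables.items():
        content = content.replace("{{" + name + "}}", value)
    return content

rendered = render(messages[1]["content"], {"ticket_text": "App crashes on login."})
# rendered == "Summarize this ticket: App crashes on login."
```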
+ +You can insert variables into the prompt using curly brackets (e.g., `{{variable_name}}`) or by adding column variables with the top-right `+` button in the message box. These variables can then be mapped to existing column data, allowing your prompt to dynamically adapt to your playground data. + + + + + +## Configuration Options + +### Model Selection +You can connect to a wide range of LLM providers and models. Common choices include OpenAI (GPT-4o, GPT-4o-mini), Anthropic (Claude-3.5-Sonnet, Claude-3-Opus), and Google (Gemini-2.5 family). +Other providers such as Groq and DeepSeek may also be supported, and additional integrations will continue to be added over time. + +### Structured Output +Structured output can be enabled for models that support it. You can define a schema in several ways: + +- **JSON Editor** - Write a JSON structure directly in the editor +- **Visual Editor** - Add parameters interactively, specifying their names and types +- **Generate Schema** - Use the "Generate schema" button on the top right to automatically create a schema based on your written prompt + + + + + +## Tools +Tools let you extend prompts by allowing the model to call custom functions with structured arguments. Instead of plain text, the model can return a validated tool-call object that follows your schema. + +To create a tool, give it a name and description so the model knows when to use it. Then define its parameters with a name, description, type (string, number, boolean, etc.), and whether they are required.
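A tool defined with a name, description, and typed parameters might look roughly like this. This is a sketch in the widely used JSON-Schema function-calling style; the `get_weather` name and its parameters are invented, and the exact format Traceloop expects may differ:

```python
# Hypothetical tool definition for illustration: a name and description
# the model can use to decide when to call the tool, plus typed
# parameters with a required list.
get_weather_tool = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
            "units": {"type": "string", "description": "celsius or fahrenheit"},
        },
        # Only "city" is required; "units" is optional.
        "required": ["city"],
    },
}
```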
+### Advanced Settings +Fine-tune model behavior with the following options: +- **Temperature** (0.0-1.0): Control randomness and creativity +- **Max Tokens**: Limit model output length (1-8000+ depending on model) +- **Top P**: Nucleus sampling parameter (0.0-1.0) +- **Frequency Penalty**: Reduce repetition (0.0 to 1.0) +- **Presence Penalty**: Encourage topic diversity (0.0 to 1.0) +- **Logprobs**: When enabled, returns the probability scores for generated tokens +- **Thinking Budget** (512-24576): Sets the number of tokens the model can use for internal reasoning before producing the final output. +A higher budget allows more complex reasoning but increases cost and runtime +- **Exclude Reasoning from Response**: If enabled, the model hides its internal reasoning steps and only outputs the final response + +## Prompt Execution + +A prompt can be executed across all cells in a column or on a specific cell. + +Prompt outputs can be mapped to different columns by clicking a cell and selecting the mapping icon, or by double-clicking the cell. \ No newline at end of file diff --git a/playgrounds/quick-start.mdx b/playgrounds/quick-start.mdx new file mode 100644 index 0000000..f985dd1 --- /dev/null +++ b/playgrounds/quick-start.mdx @@ -0,0 +1,75 @@ +--- +title: "Quick Start" +--- + +Playgrounds are interactive spreadsheets where you can organize your data, experiment with LLMs, evaluate outputs, and analyze the results. +Think of them as powerful workbenches for AI development that combine the flexibility of a spreadsheet with the power of LLM evaluation and execution. +They’re designed for everyone, from product managers and analysts to QA, data engineers, and software developers. + + + + + + + +Playgrounds can be used to build datasets for experiments and evaluation. Once you've structured your data in a playground, you can export it to a dataset and publish a version for reproducible testing. Learn more about [datasets](../datasets/quick-start).
+ + +## Playground Structure + +A playground is organized as a table-like structure with three fundamental components: **rows**, **columns**, and **cells**. Understanding how these work together is essential for effective playground usage. + +### Rows + +Rows represent individual **data points** or **test cases** in your playground. Each row is a complete record that spans all columns. +Each row in the Playground is independent, can be executed on its own, and maintains an order that can be rearranged as needed. + +### Row Operations +- **Add Row**: Create new rows manually or through bulk operations +- **Generate Rows**: Use the AI row generator to create new rows based on the existing data in your Playground +- **Delete Row**: Remove unwanted rows individually or in bulk +- **Execute Row**: Execute all cells in a specific row + +## Columns + +Columns are the building blocks of playgrounds, defining what kind of data you can store, process, and analyze. They come in different types to handle various data formats and use cases: + +**Data Input Columns** store static data such as text, JSON, numbers, and tags. + +**Prompt Columns** execute LLM prompts directly on your data with full model configuration, allowing you to test different prompts and compare outputs side by side. + +**Evaluation Columns** assess AI outputs and data quality using pre-built evaluators or custom evaluators tailored to your specific needs. Learn more about [evaluators](../evaluators/intro). + +You can manage columns by reordering, hiding, editing, duplicating, or deleting them as your analysis evolves. Learn more about [column types](./columns/data-columns) and [column management](./columns/column-management). + +## Create a Playground + +Data can be imported from different sources: + +1. CSV files +2. JSON files +3. From a dataset +4. From production spans + +You can create a Playground from scratch and import data later.
Simply set a name for the Playground and start adding columns, rows, and data. + + + + + + +## Running a Playground + +Execute all cells in your playground by clicking the play button in the top right corner. This runs all prompt columns and evaluation columns across every row, allowing you to process your entire dataset at once. + +You can also run individual cells, rows, or columns by clicking on their respective play buttons to test specific configurations. For example, you might run a single agent execution, test one user input, or evaluate a specific chat conversation. + + +Ready to build more sophisticated playgrounds? Dive into the [complete documentation](./index) or explore specific [column types](./columns/data-columns) to unlock the full power of Traceloop Playgrounds! +