docs(monitor): update the monitor doc #106

---
title: "Defining Monitors"
description: "Learn how to create and configure monitors to evaluate your LLM outputs"
---

Monitors in Traceloop allow you to continuously evaluate your LLM outputs in real time. This guide walks you through the process of creating and configuring monitors for your specific use cases.

## Creating a Monitor

To create a monitor, complete these steps:

<Steps>
  <Step title="Send Traces">
    Connect the SDK to your system and add decorators to your flow. See [OpenLLMetry](/openllmetry/introduction) for setup instructions.
  </Step>
  <Step title="Choose an Evaluator">
    Select the evaluation logic that will run on matching spans. You can define your own custom evaluators or use the pre-built ones provided by Traceloop. See [Evaluators](/evaluators/intro) for more details.
  </Step>
  <Step title="Define a Span Filter">
    Set the criteria that determine which spans the monitor will evaluate.
  </Step>
  <Step title="Configure Settings">
    Set up how the monitor operates, including sampling rates and other advanced options.
  </Step>
</Steps>
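For reference, a minimal Python setup with OpenLLMetry might look like the sketch below (the app and workflow names are illustrative; see the OpenLLMetry docs for full, per-language instructions):

```python
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import workflow

# Initialize the SDK so spans are exported to Traceloop.
Traceloop.init(app_name="my_llm_app")

@workflow(name="answer_question")  # annotates the flow so its spans can be filtered
def answer_question(question: str) -> str:
    # Call your LLM provider here; instrumented calls are traced automatically.
    return "stub answer"
```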

### Basic Monitor Setup

Navigate to the Monitors page and click the **New** button to open the Evaluator Library. Choose the evaluator you want to run in your monitor.
Next, you will be able to configure which spans will be monitored.

## Span Filtering

The span filtering modal shows the actual spans from your system, letting you see how your chosen filters apply to real data.
Add filters by clicking the <kbd>+</kbd> button.

<Frame>
  <img className="block dark:hidden"
    src="/img/monitor/monitor-filter-light.png" />
  <img className="hidden dark:block"
    src="/img/monitor/monitor-filter-dark.png" />
</Frame>

### Filter Options

- **Environment**: Filter by a specific environment
- **Workflow Name**: Filter by the workflow name defined in your system
- **Service Name**: Target spans from specific services or applications
- **AI Data**: Filter based on LLM-specific attributes like model name, token usage, streaming status, and other AI-related metadata
- **Attributes**: Filter based on span attributes

<img className="block dark:hidden"
  src="/img/monitor/monitor-filter-options-light.png"
  style={{maxWidth: '500px'}}
/>
<img className="hidden dark:block"
  src="/img/monitor/monitor-filter-options-dark.png"
  style={{maxWidth: '500px'}}
/>

## Monitor Settings

### Map Input

You need to map the appropriate span fields to the evaluator's input schema.
Browse the available span field options; once you select a field, the real data is displayed immediately so you can see how it maps to the input.

<Frame>
  <img className="block dark:hidden"
    src="/img/monitor/monitor-settings-light.png" />
  <img className="hidden dark:block"
    src="/img/monitor/monitor-settings-dark.png" />
</Frame>

When the field data is not plain text, you can use JSON key mapping or a regex to extract the specific content you need.

For example, if your content is an array and you want to extract the "text" field from the object:

```json
[{"type":"text","text":"explain who are you and what can you do in one sentence"}]
```

You can use a JSON key mapping like `0.text` to extract just the text content. The mapping is applied to the Preview table, so you can see the extracted result in real time.

<Frame>
  <img className="block dark:hidden"
    src="/img/monitor/monitor-json-light.png" />
  <img className="hidden dark:block"
    src="/img/monitor/monitor-json-dark.png" />
</Frame>
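To see how such a key path resolves, here is a small conceptual sketch in Python. It is illustrative only, not Traceloop's implementation; `resolve_key_path` is a hypothetical helper:

```python
import json

def resolve_key_path(data, path: str):
    """Resolve a dot-separated key path like '0.text', treating
    numeric segments as list indices. Illustrative sketch only."""
    for segment in path.split("."):
        if isinstance(data, list):
            data = data[int(segment)]
        else:
            data = data[segment]
    return data

content = json.loads(
    '[{"type":"text","text":"explain who are you and what can you do in one sentence"}]'
)
print(resolve_key_path(content, "0.text"))
# -> explain who are you and what can you do in one sentence
```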

You can use a regex like `text":"(.+?)"` to extract just the text content. The regex is applied to the Preview table, so you can see the extracted result in real time.

<Frame>
  <img className="block dark:hidden"
    src="/img/monitor/monitor-regex-light.png" />
  <img className="hidden dark:block"
    src="/img/monitor/monitor-regex-dark.png" />
</Frame>
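As a quick check of what that pattern captures, assuming the example content shown earlier:

```python
import re

span_field = '[{"type":"text","text":"explain who are you and what can you do in one sentence"}]'

# The lazy group (.+?) stops at the next quote, capturing only the text value.
match = re.search(r'text":"(.+?)"', span_field)
if match:
    print(match.group(1))
# -> explain who are you and what can you do in one sentence
```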

### Advanced

You can set a **Rate sample** to control the percentage of spans within the selected filter group that the monitor will run on. For example, a 10% rate means the monitor evaluates roughly one in ten matching spans.

---
title: "Introduction"
description: "Detect hallucinations and regressions in the quality of your LLMs"
---

One of the key features of Traceloop is the ability to monitor the quality of your LLM outputs in **real time**. It helps you detect hallucinations and regressions in the quality of your models and prompts.

To start monitoring your LLM outputs, make sure you have installed OpenLLMetry and configured it to send data to Traceloop. If you haven't done that yet, follow the instructions in the [Getting Started](/openllmetry/getting-started) guide.

Next, if you're not using a [supported LLM framework](/openllmetry/tracing/supported#frameworks), [make sure to annotate workflows and tasks](/openllmetry/tracing/annotations).
You can then define any of the following [monitors](https://app.traceloop.com/monitors/prd) to track the quality of your LLM outputs.

<Frame>
  <img className="block dark:hidden" src="/img/monitor/monitor-list-light.png" />
  <img className="hidden dark:block" src="/img/monitor/monitor-list-dark.png" />
</Frame>

## What is a Monitor?

A monitor is an evaluator that runs in real time on a group of defined spans with specific characteristics. For every span that matches the group filter, the monitor runs the evaluator and logs the result. This lets you continuously assess the quality and performance of your LLM outputs as they are generated in production.
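Conceptually, a monitor behaves like the following sketch (illustrative pseudocode, not Traceloop internals):

```python
import random

def run_monitor(span_stream, span_filter, map_input, evaluator, sample_rate=1.0):
    """Illustrative model of a monitor's behavior."""
    results = []
    for span in span_stream:
        # Only spans matching the filter group are considered, and only a
        # sampled fraction of those are evaluated.
        if span_filter(span) and random.random() < sample_rate:
            value = map_input(span)           # map span fields to the evaluator's input
            results.append(evaluator(value))  # logged and shown on the monitor page
    return results
```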

Monitors can use two types of evaluators:

- **LLM-as-a-Judge**: Uses a large language model to evaluate outputs based on semantic qualities. You can create custom evaluators with this method by writing prompts that capture your own criteria.
- **Traceloop built-in evaluators**: Deterministic evaluations for structural validation, safety checks, and syntactic analysis.
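As an illustration of the LLM-as-a-Judge approach, a custom relevance judge might be prompted like this. This is a sketch; `call_llm` is a placeholder for your own LLM client, not a Traceloop API:

```python
JUDGE_PROMPT = """You are an evaluator. Given a question and an answer,
rate how relevant the answer is to the question on a scale of 1 to 5.
Question: {question}
Answer: {answer}
Respond with only the number."""

def judge_relevance(call_llm, question: str, answer: str) -> int:
    # call_llm is any function that sends a prompt to an LLM and returns text.
    response = call_llm(JUDGE_PROMPT.format(question=question, answer=answer))
    return int(response.strip())
```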

All monitors connect to our comprehensive [Evaluators](/evaluators/intro) library, allowing you to choose the right evaluation approach for your specific use case.

---
title: "Using Monitors"
description: "Learn how to view, analyze, and act on monitor results in your LLM applications"
---

Once you've created monitors, Traceloop continuously evaluates your LLM outputs and provides insights into their performance. This guide explains how to interpret and act on monitor results.

## Monitor Dashboard

The Monitor Dashboard provides an overview of all active monitors and their current status.
It shows each monitor's health, the number of times it has run, and the most recent execution time.

<Frame>
  <img className="block dark:hidden" src="/img/monitor/monitor-list-light.png" />
  <img className="hidden dark:block" src="/img/monitor/monitor-list-dark.png" />
</Frame>

## Viewing Monitor Results

### Real-time Monitoring

Monitor results are displayed in real time as your LLM applications generate new spans. You can view:

- **Run Details**: The span value that was evaluated and its result
- **Trend Analysis**: Performance over time
- **Volume Metrics**: Number of evaluations performed
- **Evaluator Output Rates**: Such as success rates for threshold-based evaluators

### Monitor Results Page

Click on any monitor to access its detailed results page. The monitor page provides comprehensive analytics and span-level details.

#### Chart Visualizations

The monitor page includes multiple chart views to help you analyze your data; you can switch between chart types using the selector in the top-right corner.

**Line Chart View** - Shows evaluation trends over time:

<Frame>
  <img className="block dark:hidden" src="/img/monitor/monitor-page-line-light.png" />
  <img className="hidden dark:block" src="/img/monitor/monitor-page-line-dark.png" />
</Frame>

**Bar Chart View** - Displays evaluation results in time buckets:

<Frame>
  <img className="block dark:hidden" src="/img/monitor/monitor-page-buckets-light.png" />
  <img className="hidden dark:block" src="/img/monitor/monitor-page-buckets-dark.png" />
</Frame>

#### Filtering and Time Controls

The top toolbar provides filtering options:

- **Environment**: Filter by production, staging, etc.
- **Time Range**: 24h, 7d, 14d, or custom ranges
- **Metric**: Select which evaluator output property to measure
- **Bucket Size**: 6h, Hourly, Daily, etc.
- **Aggregation**: Choose average, median, sum, min, max, or count
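Conceptually, the chart groups evaluation results into time buckets and applies the chosen aggregation to each bucket. A small pandas sketch on illustrative data:

```python
import pandas as pd

# Illustrative evaluation scores with timestamps.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-01 00:10", "2024-01-01 03:20", "2024-01-01 07:05",
    ]),
    "score": [0.9, 0.7, 0.8],
})

# 6-hour buckets, averaged (analogous to Bucket Size = 6h with Aggregation = average).
buckets = df.resample("6h", on="timestamp")["score"].mean()
print(buckets)
```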

#### Matching Spans Table

The bottom section shows all spans that matched your monitor's filter criteria:

- **Timestamp**: When the evaluation occurred
- **Input**: The actual content that was mapped to be evaluated
- **Output**: The evaluation result/score
- **Completed Runs**: Total completed evaluations, whether successful or errored
- **Error Runs**: Failed evaluation attempts

Each row includes a link icon to view the full span details in the trace explorer:

<Frame>
  <img className="block dark:hidden" src="/img/trace/trace-light.png" />
  <img className="hidden dark:block" src="/img/trace/trace-dark.png" />
</Frame>

For further information on tracing, refer to [OpenLLMetry](/openllmetry/introduction).

<Tip>
  Ready to set up an evaluator for your monitor? Learn more about creating and configuring evaluators in the [Evaluators](/evaluators/intro) section.
</Tip>